The Oxford Handbook of Experimental Syntax 9780198797722, 0198797729

This volume showcases the contributions that formal experimental methods can make to syntactic research in the 21st cent

English Pages 705 Year 2023

Edited by Jon Sprouse

Table of contents :
Cover
Half-title
EXPERIMENTAL SYNTAX
Copyright
Contents
Preface
List of figures and tables
The contributors
Part I Judgment methods in syntactic theory
Chapter 1 Acceptability judgments
Chapter 2 Acceptability judgments of binding and coreference: Methodological considerations
Chapter 3 (Quantifier) scope judgments
Chapter 4 Experimental syntax and linguistic fieldwork
Annotated bibliography for Part I
Part II Acquisition methods in syntactic theory
Chapter 5 Behavioral acquisition methods with infants
Chapter 6 Behavioral acquisition methods with preschool-age children
Chapter 7 Modeling syntactic acquisition
Chapter 8 Artificial language learning
Annotated bibliography for Part II
Part III Psycholinguistic methods in syntactic theory
Chapter 9 Self-paced reading
Chapter 10 Eye-tracking and experimental syntax
Chapter 11 Speed–accuracy trade-off modeling and its interface with experimental syntax
Chapter 12 Formal methods in experimental syntax
Chapter 13 Investigating syntactic structure and processing in the auditory modality
Chapter 14 Language-processing experiments in the field
Annotated bibliography for Part III
Part IV Neurolinguistic methods in syntactic theory
Chapter 15 Electrophysiological methods
Chapter 16 Hemodynamic methods
Chapter 17 Aphasia and syntax
Annotated bibliography for Part IV
Chapter 18 The future of experimental syntax
Index

THE OXFORD HANDBOOK OF

EXPERIMENTAL SYNTAX

OXFORD HANDBOOKS IN LINGUISTICS

RECENTLY PUBLISHED

THE OXFORD HANDBOOK OF REFERENCE Edited by Jeanette Gundel and Barbara Abbott

THE OXFORD HANDBOOK OF EXPERIMENTAL SEMANTICS AND PRAGMATICS Edited by Chris Cummins and Napoleon Katsos

THE OXFORD HANDBOOK OF EVENT STRUCTURE Edited by Robert Truswell

THE OXFORD HANDBOOK OF LANGUAGE ATTRITION Edited by Monika S. Schmid and Barbara Köpke

THE OXFORD HANDBOOK OF NEUROLINGUISTICS Edited by Greig I. de Zubicaray and Niels O. Schiller

THE OXFORD HANDBOOK OF ENGLISH GRAMMAR Edited by Bas Aarts, Jill Bowie, and Gergana Popova

THE OXFORD HANDBOOK OF AFRICAN LANGUAGES Edited by Rainer Vossen and Gerrit J. Dimmendaal

THE OXFORD HANDBOOK OF NEGATION Edited by Viviane Déprez and M. Teresa Espinal

THE OXFORD HANDBOOK OF LANGUAGE CONTACT Edited by Anthony P. Grant

THE OXFORD HANDBOOK OF LANGUAGE AND RACE Edited by H. Samy Alim, Angela Reyes, and Paul V. Kroskrity

THE OXFORD HANDBOOK OF LANGUAGE PROSODY Edited by Carlos Gussenhoven and Aoju Chen

THE OXFORD HANDBOOK OF LANGUAGES OF THE CAUCASUS Edited by Maria Polinsky

THE OXFORD HANDBOOK OF GRAMMATICAL NUMBER Edited by Patricia Cabredo Hofherr and Jenny Doetjes

THE OXFORD HANDBOOK OF COMPUTATIONAL LINGUISTICS Second edition Edited by Ruslan Mitkov

THE OXFORD HANDBOOK OF THE MENTAL LEXICON Edited by Anna Papafragou, John C. Trueswell, and Lila R. Gleitman

THE OXFORD HANDBOOK OF ETHIOPIAN LANGUAGES Edited by Ronny Meyer, Bedilu Wakjira, and Zelealem Leyew

THE OXFORD HANDBOOK OF EXPERIMENTAL SYNTAX Edited by Jon Sprouse

For a complete list of Oxford Handbooks in Linguistics please see pp. 675–78.

THE OXFORD HANDBOOK OF

EXPERIMENTAL SYNTAX

Edited by

JON SPROUSE

Great Clarendon Street, Oxford OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© editorial matter and organization Jon Sprouse 2023
© the chapters their several authors 2023

The moral rights of the authors have been asserted

First Edition published in 2023

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2022935323

ISBN 978–0–19–879772–2

DOI: 10.1093/oxfordhb/9780198797722.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

Contents

Preface
List of figures and tables
The contributors

PART I JUDGMENT METHODS IN SYNTACTIC THEORY

1. Acceptability judgments
Jon Sprouse

2. Acceptability judgments of binding and coreference: Methodological considerations
Elsi Kaiser and Jeffrey Runner

3. (Quantifier) scope judgments
Kriszta Eszter Szendrői

4. Experimental syntax and linguistic fieldwork
Maria Polinsky

Annotated bibliography for Part I

PART II ACQUISITION METHODS IN SYNTACTIC THEORY

5. Behavioral acquisition methods with infants
Laurel Perkins and Jeffrey Lidz

6. Behavioral acquisition methods with preschool-age children
Kristen Syrett

7. Modeling syntactic acquisition
Lisa S. Pearl

8. Artificial language learning
Jennifer Culbertson

Annotated bibliography for Part II

PART III PSYCHOLINGUISTIC METHODS IN SYNTACTIC THEORY

9. Self-paced reading
Masaya Yoshida

10. Eye-tracking and experimental syntax
Dave Kush and Brian Dillon

11. Speed–accuracy trade-off modeling and its interface with experimental syntax
Stephani Foraker, Ian Cunnings, and Andrea E. Martin

12. Formal methods in experimental syntax
Tim Hunter

13. Investigating syntactic structure and processing in the auditory modality
Mara Breen and Katy Carlson

14. Language-processing experiments in the field
Matthew Wagers and Sandra Chung

Annotated bibliography for Part III

PART IV NEUROLINGUISTIC METHODS IN SYNTACTIC THEORY

15. Electrophysiological methods
Jon Sprouse and Diogo Almeida

16. Hemodynamic methods
Jonathan R. Brennan

17. Aphasia and syntax
William Matchin and Corianne Rogalsky

Annotated bibliography for Part IV

18. The future of experimental syntax
The contributors

Index

Preface

The field of syntax has always been interdisciplinary. Part of this is simply the nature of cognitive science—the immensity of the problem posed by human cognition requires a concerted effort from multiple disciplines. And part of this is the nature of syntactic theory: It mediates between sound and meaning, it is a theory of the representations constructed during sentence-processing, and it is a theory of the end state for language acquisition. As technology has advanced, so too have the methods that syntacticians have brought to bear on the central questions of the field. The past two decades in particular have seen an explosion in the use of various experimental methods for probing the syntax of human languages. This Handbook is an attempt to bring these strands of research together into a single volume. I have three goals for this Handbook: (i) to provide high-level reviews of the experimental work that has been driving the field of experimental syntax, (ii) to inspire new research that will push the boundaries of the theory of syntax, and (iii) to provide high-level methodological guidance for researchers who wish to incorporate experimental methods into their own research. I hope readers will agree that the contributors to this volume have created chapters that succeed in all three goals. For this handbook, I have intentionally defined experimental syntax in the broadest possible terms—as the use of any (and all) experimental methods in service of syntactic theory. I am aware that the term experimental syntax is sometimes used in a narrower sense that is more or less synonymous with formal acceptability judgment experiments (I have used it that way myself in my own work), but I believe this synonymy is merely a symptom of the important role that acceptability judgments play in syntactic theory, and not a meaningful delimiter of the types of methods that syntacticians can profitably employ in their research. 
The space of possible methods is large—too large for any single volume. In assembling this Handbook, I have chosen to focus on methods that are (i) relatively well-understood, (ii) relatively practical in terms of the equipment required, and (iii) relatively likely to yield information that is relevant to syntactic theory. All of these choices are subjective. I do not intend the exclusion of any given method to mean that it does not, or could not, fall under the broad definition of experimental syntax. In fact, what I hope this Handbook shows is that this broader definition of experimental syntax is still in its infancy. We have not yet explored all of the methods that could potentially contribute to theories of syntax, nor have we seen the full potential of the methods that we have explored. As such, this handbook is aspirational—it is simultaneously a snapshot of the knowledge we have collected to date and a pointer to the kind of work that will be possible in the future.

One critical component of all experimental work is a linking hypothesis—a hypothesis that links the observed data to the unobserved theoretical constructs that are under investigation. In experimental syntax, we need hypotheses that link each of the methods discussed in this handbook back to syntactic theory. Because there is likely no method that provides a direct link to syntactic theory (at least at our current level of technology), for each and every method in this handbook, creating a linking hypothesis between the data and syntactic theory entails creating (or investigating) a linking hypothesis between syntactic theory and another component of the theory of language, such as the theory of sentence-processing or the theory of language acquisition. Many of the chapters in this Handbook discuss this issue in detail, so I will not belabor the point here. The practical consequence of this is that there is a theoretical theme throughout this handbook—the linking of syntactic theory to other components of a complete theory of language. I have organized the Handbook around this theme. There are four sections, each corresponding to the linking hypotheses necessary to leverage the methods in each section in service of syntactic theory: (i) judgment methods, which require a link between the theory of offline judgments and syntax, (ii) acquisition methods, which require a link between the theory of the language-acquisition process and syntax, (iii) psycholinguistic methods, which require a link between the theory of sentence-processing and syntax, and (iv) neurolinguistic methods, which require a link between neurobiology, sentence-processing, and syntax.

A few notes on the organization of the Handbook are in order. First, I have asked Oxford University Press to keep the references for each chapter with that chapter (and not in a global reference list at the end of the Handbook).
My hope is that this will allow the chapters in this Handbook to truly serve as a guide for exploring the potential theoretical contributions that each method can make to syntax, with the reference lists serving as a first reading list. Second, because these methods are not useful if they cannot be learned by new researchers, I have asked the contributors of each chapter to create an annotated list of resources for learning the method in their chapter. I have collated the lists by section, compiling them into four stand-alone chapters that occur at the end of each section. Finally, I have asked each contributor to write a mini-essay about what they see as the future of experimental syntax. My hope is that these mini-essays will provide inspiration to readers who are considering adopting experimental syntax into their own research programs, and also serve as a sort of time capsule by which we can measure the progress of the field in future years. I have collected these mini-essays into a single chapter at the end of the handbook.

This volume could not exist without the talent, energy, and effort of innumerable colleagues. First, I would like to thank the commissioning editor at Oxford University Press, Julia Steer, for laying the foundation of this volume by encouraging me to explore a broad definition of experimental syntax rather than a narrow review of work to date. Second, I would like to thank each of the contributors for sharing both their visions and their expertise. If this volume succeeds in any of its three goals, it will be because of their hard work and dedication, both in writing their chapters and in doing the kind of research that pushes the boundaries of the field. Finally, I would like to thank everyone who has supported me throughout my career—advisors, collaborators, colleagues, students, family, and friends. Science is a community effort. And I am grateful beyond words for the community that I have somehow been given in this life.

List of figures and tables

Figures

1.1 The two predictions of the 2×2 design for whether-islands (left panel and center panel), and the observed results of an actual experiment (right panel)
1.2 Three demonstrations of the continuous nature of acceptability judgments
2.1 Scene verification display from the experiment by Kaiser et al. (2009)
2.2 Picture selection display from the experiment by Kaiser et al. (2009)
3.1 Outcome of example test story from Conroy et al.’s (2009) TVJT task
7.1 Model of the acquisition process adapted from Lidz and Gagliardi (2015)
11.1 SAT function for one condition, illustrating the three phases of processing
11.2 Idealized differences in the three phases of the SAT functions for two conditions
11.3 Idealized differences in the finishing time distributions corresponding to the SAT differences shown in Fig. 11.2
12.1 We can use surprisal to formulate a linking hypothesis which, taken together with a probability distribution over sentences, produces empirical predictions about sentence comprehension difficulty
12.2 Since surprisal can act as a test of probability distributions and probability distributions can be seen as consequences of hypothesized grammars, surprisal can act as a test of hypothesized grammars
12.3 Graphical illustration of lc-predict and lc-connect
13.1 Example item from Breen et al. (2010) designed to elicit naturalistic productions
14.1 A sample itemset from Sussman and Sedivy (2003)
14.2 Some culturally specific illustrations created for the Chamorro Psycholinguistics na Project
16.1 The BOLD signal
16.2 Schematic representation of syntax-related brain regions of the left hemisphere
16.3 Linking hypotheses connect properties of the grammar with neural signals
17.1 Acceptability judgment data reproduced from Linebarger et al. (1983)
17.2 Functional neuroanatomy of language and working memory (WM) as relevant to our proposal
17.3 A schematic of healthy and agrammatic sentence/phrase production with respect to the dorsal and ventral pathways to articulation

Tables

3.1 Percentage of surface scope response for comprehension question
3.2 Summary of experimental findings of the language acquisition studies reviewed in this paper
4.1 Experimental paradigm for studying subject preference, morphologically ergative languages
7.1 The qualitative fit Yang discovered between the unambiguous data advantage (Adv) perceived by a VarLearner in its acquisitional intake and the observed age of acquisition (AoA) in children for six parameter values across different languages
7.2 Optional infinitive examples in child-produced speech in different languages, and their intended meaning
8.1 Summary of key artificial language learning methods
12.1 A first illustration of bottom-up parsing
12.2 The effect of center-embedding on bottom-up parsing
12.3 The effect of left-embedding on bottom-up parsing
12.4 The effect of right-embedding on bottom-up parsing
12.5 A first illustration of top-down parsing
12.6 The effect of center-embedding on top-down parsing
12.7 The effect of left-embedding on top-down parsing
12.8 The effect of right-embedding on top-down parsing
12.9 A first illustration of left-corner parsing
12.10 The effect of center-embedding on left-corner parsing
12.11 The effect of left-embedding on left-corner parsing
12.12 The effect of right-embedding on left-corner parsing
16.1 Syntactic representations stand in a many-to-many relationship with sentence-processing operations
16.2 Summary of brain regions related to syntax
16.3 Examples of phrases and sentences made of real words or nonsense pseudo-words
16.4 Hypothetical counts for utterances with intransitive verbs
17.1 Examples of stimuli from each condition

The contributors

Diogo Almeida is an Associate Professor of Psychology at New York University Abu Dhabi. His research capitalizes on behavioral and electrophysiological data (EEG and MEG) to investigate questions about linguistic representations and processes at multiple levels (phonology, morphology, and syntax). He holds an MA in cognitive science from the École des Hautes Études en Sciences Sociales (2003) and a PhD in linguistics from the University of Maryland (2009), and completed his post-doctoral training at the University of California, Irvine.

Mara Breen is an Associate Professor in the Department of Psychology and Education at Mount Holyoke College. Her research explores the role of prosody in speech perception and production. Using behavioral techniques, eye-tracking, and event-related potentials, she investigates how speakers use prosodic cues to provide meaning, how listeners use prosody to comprehend, and how imagined prosody during reading can affect understanding. In addition, she explores how sound cues are processed similarly across music and language. Her work appears in journals such as Cognition, Journal of Memory and Language, Journal of Experimental Psychology: General, and Language, Cognition, and Neuroscience.

Jonathan R. Brennan is an Associate Professor of Linguistics and Psychology at the University of Michigan, where he directs the Computational Neurolinguistics Laboratory. He received a PhD in linguistics from New York University in 2010 and completed post-doctoral training at the Children’s Hospital of Philadelphia.

Katy Carlson is a Professor of English in the Department of English at Morehead State University. Her research concentrates primarily on how prosody can affect sentence-processing, with special interests in focus effects in ellipsis sentences and prosodic influences on attachment. She studies both pitch accents and prosodic boundaries, and has published in journals such as Language and Speech, Glossa, and Language, Cognition and Neuroscience.

Sandra Chung is Distinguished Professor (emerita) of Linguistics at the University of California, Santa Cruz. Her research investigates theoretical issues in syntax and other areas through fieldwork on Chamorro and other Austronesian languages. She has collaborated on research in semantics with William A. Ladusaw, and on research in psycholinguistics with Matthew Wagers. Since 2009 she has been involved in a community-based effort in the Northern Mariana Islands to upgrade the documentation of the Chamorro language.

Jennifer Culbertson is a Professor in the Department of Linguistics and English Language at the University of Edinburgh, and a founding member of the Centre for Language Evolution. She uses experimental and computational tools to investigate how the human cognitive system shapes linguistic typology. She received the Robert J. Glushko Prize for Outstanding Doctoral Dissertations in Cognitive Science in 2012, was elected to the Young Academy of Europe in 2019, and currently holds a European Research Council Starting Grant.

Ian Cunnings is an Associate Professor of Psycholinguistics in the School of Psychology and Clinical Language Sciences at the University of Reading, UK. His main research interests are in sentence and discourse-processing in different populations of speakers. His work has examined the memory-encoding, storage, and retrieval mechanisms that subserve the resolution of different types of linguistic dependencies during language comprehension. His most recent research examines how these different memory operations can inform our understanding of the factors that influence successful sentence comprehension in native and non-native speaker populations.

Brian Dillon is an Associate Professor in the Department of Linguistics at the University of Massachusetts, Amherst. His research focuses on adult sentence comprehension, aiming to understand how linguistic constraints are deployed in real-time to constrain sentence processing. In his work, he integrates insights from linguistic theory with process-level cognitive models, with a particular interest in the processing of agreement and pronominal reference.

Stephani Foraker is an Associate Professor in the Department of Psychology at State University of New York College at Buffalo, specializing in cognition and psycholinguistics. Her main research interests are in sentence and discourse processing, focusing on the role of memory and focal attention. She has used the speed–accuracy tradeoff (SAT) procedure to investigate long-distance dependencies and pronoun resolution, training under Brian McElree, who pioneered the application of SAT to psycholinguistic issues. Her current research examines the contribution of hand gestures as part of encoding, storage, and retrieval operations, particularly in anaphora resolution.

Tim Hunter is an Associate Professor in the Department of Linguistics at the University of California, Los Angeles. The bulk of his research uses computational perspectives to investigate the formal properties of natural language grammar, with one main goal being to clarify the consequences of taking linguistic theories to be testable cognitive hypotheses. This line of work includes studies connecting minimalist syntax to experimental work in language-processing, and studies of the relationship between determiners’ truth-conditions and verification procedures. He has also worked on the argument/adjunct distinction and the syntax of ellipsis.

Elsi Kaiser is a Professor in the Department of Linguistics at the University of Southern California. She received her PhD in linguistics from the University of Pennsylvania, after a BA in Germanic languages and literatures from Princeton University and an MA in psychology from the University of Pennsylvania. Her research focuses on the processes and representations involved in comprehension and production, especially in domains involving multiple aspects of linguistic representation (syntax, semantics, pragmatics), such as reference resolution. She has investigated multiple languages (e.g. Finnish, Estonian, French, German, and Dutch, including collaborative work on Bangla/Bengali, Hindi, Italian, Korean, Chinese, and Vietnamese).

Dave Kush is an Assistant Professor of Linguistics at University of Toronto. He is interested in sentence-processing, syntactic theory, and cross-linguistic variation.

Jeffrey Lidz is Distinguished Scholar-Teacher and Professor of Linguistics at the University of Maryland. His research explores language acquisition from the perspective of comparative syntax and semantics, focusing on the relative contributions of experience, extralinguistic cognition, and domain-specific knowledge in learners’ discovery of linguistic structure and linguistic meaning.

Andrea E. Martin is a Lise Meitner Research Group Leader at the Max Planck Institute for Psycholinguistics, and a Principal Investigator at the Donders Centre for Cognitive Neuroimaging at Radboud University in Nijmegen, the Netherlands. Her work has spanned structural and semantic aspects of sentence processing. She has used the speed–accuracy trade-off procedure and cognitive neuroimaging to study the role of memory in sentence processing via ellipsis, a line of research begun with Brian McElree, who pioneered application of SAT to psycholinguistic issues. The current focus of her lab, Language and Computation in Neural Systems, is on developing theories and models of language representation and processing which harness the computational power of neural oscillations, such that formal properties (viz., constituency, compositionality) can be realized in biological and artificial neural networks.
William Matchin is an Assistant Professor of Communication Sciences and Disorders in the Arnold School of Public Health at the University of South Carolina. As part of the Center for the Study of Aphasia Recovery, he directs the NeuroSyntax lab, using functional neuroimaging and lesion–symptom mapping and incorporating insights of linguistic theory to understand the architecture of language in the brain. He is currently investigating the nature of grammatical deficits in aphasia, including paragrammatism and agrammatism.

Lisa S. Pearl is a Professor in the Department of Language Science at the University of California, Irvine. Her research lies at the interface of language development, computation, and information extraction, including both cognitively oriented research and applied linguistic research that combines theoretical and computational methods. Her cognitively oriented research focuses on child language acquisition, with a particular focus on theory evaluation via acquisition-modeling, and how children’s input affects their linguistic development.

Laurel Perkins is an Assistant Professor in the Department of Linguistics at the University of California, Los Angeles. She earned her PhD in linguistics from the University of Maryland and held a postdoctoral fellowship in the Laboratoire de Sciences Cognitives et Psycholinguistique at the École Normale Supérieure. Her research studies the earliest stages of syntax acquisition in infancy, drawing from formal linguistics, developmental psychology, and computational cognitive modelling. She is a recipient of a Post-Doctoral Study Grant from the Fyssen Foundation, a Doctoral Dissertation Improvement Grant from the National Science Foundation, and a Glushko Dissertation Prize from the Cognitive Science Society.

Maria Polinsky is Professor of Linguistics, Associate Director of the Language Science Center, and Director of Research Field Stations at the University of Maryland. She has conducted extensive primary work on several languages of the Caucasus, Austronesian languages, and Chukchi. She is also engaged in a comprehensive research program on heritage languages. Her work emphasizes the importance of lesser-studied languages for theoretical linguistics. Recent publications include Deconstructing Ergativity (2016), Heritage Languages and Their Speakers (2018), and The Oxford Handbook of Languages of the Caucasus (2021).

Corianne Rogalsky is an Associate Professor of Speech and Hearing Science in the College of Health Solutions at Arizona State University (ASU). As Director of ASU’s Communication Neuroscience and Neuroimaging Lab, Rogalsky uses behavioral and neuroimaging techniques to better understand the neural and cognitive resources that support effective communication in everyday life for individuals who have experienced a brain injury such as a stroke. Rogalsky’s current focus is investigating how executive functions such as selective attention and working memory support speech comprehension in neurotypical adults, and how that support may change after a stroke.
Jeffrey Runner is a Professor of Linguistics and Brain & Cognitive Sciences, Dean of the College, and Vice Provost and University Dean for Undergraduate Education at the University of Rochester. He earned a BA in linguistics at the University of California, Santa Cruz, in 1989 and a PhD in linguistics at the University of Massachusetts at Amherst in 1995. He joined the department of Linguistics at the University of Rochester in 1994. His research uses experimental methodologies to investigate natural language syntax. In 2017, he became dean of the College in Arts, Sciences and Engineering, and is responsible for the curricular, co-curricular, and extra-curricular undergraduate experience.

Jon Sprouse is a Professor of Psychology at New York University Abu Dhabi. He received an AB in linguistics from Princeton University (2003) and a PhD in linguistics from the University of Maryland (2007). His research focuses on the use of experimental syntax techniques, including acceptability judgments, EEG, and computational modeling, to explore fundamental questions in syntax. He has authored over forty journal articles and book chapters on experimental syntax. His work has been recognized by the Best Paper in Language award, the Early Career award, and the C. L. Baker mid-career award from the Linguistic Society of America.


Kristen Syrett is an Associate Professor in the Department of Linguistics at Rutgers, the State University of New Jersey–New Brunswick, with a co-appointment at the Center for Cognitive Science (RuCCS). She is the Director of the Laboratory for Developmental Language Studies. Her research focuses on semantics and its interface with pragmatics and syntax in language acquisition and development, and on experimental semantics and pragmatics in adult psycholinguistics.

Kriszta Eszter Szendrői is a Professor of Theoretical and Experimental Linguistics at the University of Vienna. She works on information structure, including its syntax and prosody, using both theoretical and experimental means, working with both adults and children. She has also worked on the syntax of scope, especially on the acquisition of scope. She is also interested in the interactions between information structure and scope. On a different note, for the past few years she has been leading a research project studying the grammar of Contemporary Hasidic Yiddish.

Matthew Wagers is Professor of Linguistics at the University of California, Santa Cruz, where he has taught since 2009. The focus of his research is how syntactic information is represented in memory and how morphological cues guide incremental interpretation. These are cross-cut by an interest in broadening the contribution of psycholinguistically under-investigated languages to theory development. He holds a PhD in linguistics from the University of Maryland (2008) and an AB in Molecular Biology from Princeton University (2003).

Masaya Yoshida is an Associate Professor in the Department of Linguistics at Northwestern University. His research interests include online sentence processing and syntax. He has worked on the syntax and processing of ellipsis constructions, long-distance dependencies, and islands. Some of his recent studies have explored the structure associated with the ellipsis site in clausal ellipsis constructions, and how structure in the ellipsis site is built during online sentence processing.

PART I

JUDGMENT METHODS IN SYNTACTIC THEORY

CHAPTER 1

ACCEPTABILITY JUDGMENTS

Jon Sprouse

1.1 Introduction


The goal of experimental syntax, at least to my mind, is straightforward: to use experimental methods to collect data that is relevant for the construction and evaluation of syntactic theories. For data types that can only be collected using a formal experiment, such as reaction times or EEG, the work of experimental syntax is simply the work of leveraging these methods for questions in theoretical syntax. However, things appear to be a bit more complicated when the data type in question is acceptability judgments, as acceptability judgments can be collected both relatively informally, as is typical in much of the syntax literature, or relatively formally, as is typical in the experimental syntax literature. I take the coexistence of these two methods of judgment collection to imply that the goal of experimental syntax with respect to acceptability judgments is not simply to collect acceptability judgments, because that is what is done in all syntactic work, but rather to explore ways in which the formal collection of judgments can add new insights over and above those that derive from informal methods. Therefore, my goal in this chapter is to identify four areas in which formal judgment experiments have made substantial contributions—two that lean toward methodological issues, and two that lean toward theoretical issues—and to review the current state of the evidence that we have for each of those areas. To be clear, this chapter is not intended as an exhaustive review of all possible areas in which formal judgment experiments could potentially make a contribution; rather, it is intended as a starting point for thinking about the kinds of questions in theoretical syntax that might benefit from formal acceptability judgment experiments. My hope is that these questions will help to inspire new questions, and new work, in the growing field of experimental syntax.

Before delving into the primary content of this chapter, I would like to briefly mention a few assumptions (and/or decisions) that I am making. The first is that I assume, following many working syntacticians, that acceptability judgments are in principle valuable for the construction and evaluation of syntactic theories. I will, therefore, not attempt to motivate the use of acceptability judgments in general (see Schütze 1996 for a comprehensive discussion of this). The second is that I will assume a relatively minimal linking hypothesis between acceptability judgments and the cognitive properties of sentence processing. Under this linking hypothesis, an acceptability judgment is a relatively automatic behavioral response that arises when a speaker comprehends a sentence, and this behavioral response is impacted by a large number of cognitive factors, such as the grammaticality of the sentence, the processing dynamics of the sentence, the sentence-processing resources required by the sentence, the meaning of the sentence, the plausibility of the sentence relative to the real world, and even the properties of the specific task that is given to the speaker. I believe wholeheartedly that a more precise linking hypothesis would be helpful for using judgments as evidence in syntax; however, I also believe that the minimal linking hypothesis above is more than sufficient to begin to explore the value of formal acceptability judgment experiments in syntax. My third assumption is that there is no substantive qualitative difference between “informal” and “formal” judgment experiments. Both are experiments in the sense that they involve the manipulation of one variable (syntactic structure) to reveal a causal relationship with another variable (acceptability).
Therefore both involve all of the components that typify psychology experiments: a set of conditions, a set of items in each condition, a set of participants, a task for the participants to complete using the items, and a process for analyzing the results of the task. The difference appears to me to be primarily quantitative, in that “formal” experiments tend to involve more conditions, more items per condition, more participants, and more complex analysis processes. To my mind, the labels “informal” and “formal” simply point toward different ends of this quantitative spectrum. In practice, when I say that formal experiments are valuable in some way, what I mean is that increasing the number of conditions, items, or participants, and/or increasing the complexity of the analysis, can yield insights that fewer conditions, items, participants, and/or less complex analyses cannot. The labels “informal” and “formal” are a more concise way to express this idea. My fourth assumption is that Schütze (1996) already provides a comprehensive review of experimental syntax work that was published before 1996. Therefore, in order to provide something new for the field, I will focus here on work published after 1996. Finally, this chapter is not a how-to for constructing formal judgment experiments. The goal is for this to be the chapter one reads, either before or after reading a how-to, for inspiration about the types of questions one can ask with the method. I will provide some references for learning acceptability judgment methods in the annotated bibliography for Part I of this Handbook.

1.2 The validity and reliability of acceptability judgments


Perhaps the most frequently asked question in the experimental syntax literature is to what extent the informally collected judgments that have been published in the literature can be trusted to form the empirical basis of syntactic theory. This question has arisen since the earliest days of generative grammar (Hill 1961; Spencer 1973); it played a central role in the two books that ushered in the most recent wave of interest in experimental syntax (Schütze 1996; Cowart 1997); and it has given rise to a number of high-level debates in the experimental syntax literature over the past 15 years or so (see Edelman and Christiansen 2003; Ferreira 2005; Wasow and Arnold 2005; Featherston 2007; Gibson and Fedorenko 2013 for some concerns about informal methods; see Marantz 2005 and Phillips 2009 for some rebuttals, and Myers 2009 for a proposal that attempts to split the difference between informally collected judgments and full-scale formal experiments). The existence of this question is understandable. First, informally collected judgments form the vast majority of the data points published in the (generative) syntax literature. Second, the properties of informal collection methods are not identical to the properties of the formal experimental methods that are often used in other domains of cognitive science: Informal methods often involve a smaller number of participants, those participants are often professional linguists instead of naïve participants, the participants are often presented with a smaller number of items, and the results are often only analyzed descriptively (without inferential statistics). If one believes that the properties of formal experiments are what they are to ensure the quality of the data, then it is logically possible that the differences between informal methods and formal experiments could lead to lower-quality data. The consequences of this cannot be overstated.
If there are systemic problems with informally collected judgments, then there are likely to be systemic problems with (generative) syntactic theories. This question touches upon a number of issues in psychometrics and the broader philosophy of measurement. The first question is: What do we mean when we say that data can be “trusted” to form the basis for a theory? Psychometric theories have identified a number of properties that good measurement methods should have. Here I will mention two (and only in a coarse-grained way, setting aside subtypes of these properties): validity and reliability. A measurement method is valid if it measures the property it is purported to measure. A measurement method is reliable if it yields consistent results under repeated measurements (with unchanged conditions). The concerns about informal methods that have figured most prominently in the literature appear to be a combination of concerns about validity and reliability, such as the concern that small sample sizes will lead to an undue influence of random variation, the concern that a small number of experimental items will lead to an undue influence of lexical properties, and the concern that the participation of professional linguists will lead to theoretical bias. In each case, the concern seems to be that informally collected judgments

will not reflect the true acceptability of the sentence (validity), and furthermore that the judgments themselves will be inconsistent over repeated measurements (reliability). This leads to a second question: How does one establish validity for the measurement of a cognitive property like acceptability? The direct method for establishing validity is to compare the results of the measurement method with a second, previously validated, measurement method. This is obviously unavailable for most cognitive properties—if cognitive scientists had a method to directly measure the cognitive property of interest, we would not bother with the unvalidated measurement method. That leaves only indirect methods of validation. One indirect method is to ask whether the theory that results from the data has the properties of a good scientific theory. This, of course, interacts with broader issues in the philosophy of science about what properties a good theory would have, so I will not attempt to provide an exhaustive list. But two possible criteria are: (i) making potentially falsifiable predictions, and (ii) explaining multiple phenomena with a relatively small number of theoretical constructs. In the case of acceptability judgments, I would argue that the resulting theory of syntax does, indeed, have these properties. Another indirect method is to ask whether other data types provide corroborating evidence, modulo the linking theories between the data types and the underlying theory. In the case of acceptability judgments, we can ask whether the resulting syntactic theory can be linked to a sentence-processing theory in a way that makes potentially falsifiable predictions about other psycholinguistic measures, such as reading times, eye movements, or EEG, and ultimately whether these measures corroborate the syntactic theory. I would argue that the current results in the literature connecting syntactic theories and sentence-processing theories are promising. 
That said, indirect methods cannot guarantee validity. It is logically possible that acceptability judgments could give rise to a theory that has all of the hallmarks of a good theory, but that does not ultimately explain human syntax (perhaps the resulting theory is actually about probability, or plausibility, or even prescriptive grammatical rules). This leads to the final question: How does one establish reliability? In principle, establishing reliability is relatively straightforward, as it simply entails replicating the measurement. The exact replication can vary based on the type of reliability one is interested in: Between-participant (or inter-rater) reliability asks whether the same judgments are obtained with different sets of participants; within-participant (or test–retest) reliability asks whether one set of participants will give the same judgments at two different times; between-task reliability asks whether different judgment tasks will yield the same judgments (either between-participant or within-participant). In practice, establishing the reliability of informal methods is complicated by their informality. By definition, informal methods control the various properties of the judgment collection process less strictly than formal methods, making a strict replication difficult if not impossible. One way to circumvent this problem is to compare the results of informal methods, perhaps as reported in the syntactic literature, with the results of formal experiments. This would be a type of between-task reliability for informal and formal methods, and to the extent that the two sets of results converge, it would establish a kind of reliability for informal methods. Many of the results reported below test precisely this kind of reliability. But

it is important to note that while convergence between the two methods can be interpreted as establishing a type of reliability for both, divergence between the two methods can be interpreted in three ways: It could be the case that informal methods are unreliable, or it could be the case that formal methods are unreliable, or both. It is tempting to assume that formal methods enjoy some sort of epistemological priority in a divergence (i.e. that formal methods reveal the ground truth), but as many linguists have pointed out, it is easy to imagine experimental materials that lead to unreliable judgments from non-linguist participants, but not from linguist participants (such as garden path sentences like The horse raced past the barn fell). Resolving the source of the divergence between two methods requires follow-up experiments that manipulate specific hypotheses for the divergence. To my knowledge, though there have been many studies of the convergence/divergence between informal and formal methods, there have been no systematic studies of the source of the divergences that do arise (presumably because, as we will see presently, there are relatively few divergences between the methods). In reviewing the evidence collected so far on the convergence between informal and formal methods for judgment collection, it is useful to make a distinction between studies that sampled the data points to retest with bias, and studies that sampled the data points to retest randomly. Biased sampling means that the data points were chosen because of some property that they have; in these studies, this is typically the belief that the specific data points are invalid or unreliable. Typically this belief comes from debates in the literature about the status of the data point, or the researchers’ own (informally collected) judgments. 
Biased sampling studies can be used to establish that the convergence between informal methods and formal methods is not perfect by showing that the data points in question do not replicate using formal methods. But biased sampling cannot be used to estimate a specific convergence rate. A biased sample could either overestimate or underestimate the actual convergence rate by virtue of the biased selection procedure: A procedure that focuses on selecting known invalid or unreliable data points will almost certainly underestimate the true convergence rate; similarly, a procedure that focuses on selecting likely uncontroversial data points (e.g. a judgment for This is a pen) is likely to overestimate the convergence rate. There are only two options for determining the true convergence rate: An exhaustive comparison of all data points, which would establish the convergence rate with certainty, or a random sampling procedure, which would estimate the convergence rate within a margin of error determined by the size of the random sample relative to the size of the population in question. Biased sampling studies dominated much of the debate about the validity and reliability of informally collected judgments until relatively recently, presumably because of the time and financial cost associated with testing large numbers of data points prior to the creation of crowdsourcing platforms like Amazon Mechanical Turk. Here I will briefly review some of the more prominent biased sampling studies. Wasow and Arnold (2005) tested a claim from Chomsky (1957) that the ordering preference between NPs and particles in verb–particle constructions is based on the complexity of the NP, not the length. They found that the judgments follow Chomsky’s reported judgments when averaged over the entire sample of participants, but that some individual participants

do not report Chomsky’s judgment pattern. A similar result was obtained by Langendoen, Kalish-Landon, and Dore (1973) when they tested the claim by Fillmore (1965) that wh-movement is impossible out of the first object position of a ditransitive verb (*Who did you show __ the woman?). Eighty-seven out of 109 responses in their experiment indicated a second object interpretation, in line with Fillmore’s claim, but 22 indicated a first object interpretation, contrary to Fillmore’s claim. Though the results corroborate Fillmore’s claim in the aggregate (the proportion is highly significant by sign test), it is possible to interpret the participants who show the opposite pattern as presenting a potential problem for Fillmore’s claim (e.g. Gibson and Fedorenko 2013). These two studies demonstrate the difficulty of defining convergence between informal and formal methods. If one assumes that judgments are variable in the way that other behavioral methods are variable, such that the correct way to analyze the results is to look at the mean across all participants in the sample, then these two examples are convergences. If, instead, one assumes that judgments will show less variability, perhaps because of a belief in a grammar that creates a stark binary contrast between grammatical and ungrammatical sentence types (see Section 1.5), then the participants who fail to show the predicted pattern constitute a divergence. To my knowledge, these two interpretations have not been investigated in detail for these data points, or for any data points that have been claimed to be divergences between informal and formal methods in the judgment literature. Another prominent biased sampling study is Clifton, Fanselow, and Frazier’s (2006) test of Kayne’s (1983) claim that a third wh-word can rectify what would otherwise be a superiority violation (i.e. What can who do about it when? is better than What can who do about it?).
In a formal experiment, they found that the two sentences had identical ratings. Fedorenko and Gibson (2010) replicate this finding. However, as Clifton et al. (2006) note, this example illustrates the complexity inherent in defining a theoretically relevant data point. If one assumes that Kayne’s claim is that the three-wh condition should be more acceptable than a two-wh condition, then this is a divergence. But, if one assumes that Kayne’s claim is that the three-wh condition is more acceptable than it would be predicted to be given that it is a superiority violation, then this is in fact a convergence. Clifton et al. demonstrate elsewhere in their study that the number of wh-words in a sentence leads to a linear decrease in acceptability: one wh-word is more acceptable than two, and two wh-words are more acceptable than three. It is thus surprising that the three-wh superiority condition is equal in acceptability to the two-wh superiority condition, suggesting that there is a relative increase in acceptability in this configuration over what would otherwise be expected. To my knowledge, this difference in interpretation of the Kayne effect has not been investigated further. The most recent biased sampling study is Linzen and Oseki’s (2018) study of Hebrew and Japanese. They searched several issues of top theoretical journals for “subtle contrasts” that they found “potentially questionable” based on their own native speaker judgments. They identified 14 questionable judgments for each language, and retested them in formal experiments. For Hebrew they found that 7 replicated and 7 failed to replicate. For Japanese they found that 10 replicated and 4 failed to replicate. Like

previous biased sampling studies, this study demonstrates that there are some number of divergences between informal and formal methods. It also demonstrates that judgments by professional linguists can be used to identify questionable judgments (though it is not possible to calculate the accuracy of this without information about how many data points were considered during the sampling stage). As is the case with all biased sampling studies, it is impossible to use these numbers to estimate the divergence rate for the two methods (as noted by the authors themselves). These 11 divergences could represent a small sample of a much larger number of divergences, or they could represent a substantial proportion of the divergences in the literature. Exhaustive sampling and random sampling studies have recently supplanted biased sampling studies in the literature of reliability. This makes sense, given that (i) exhaustive and random sampling studies can provide the overall convergence rate between methods, and (ii) online crowdsourcing platforms have made exhaustive and random sampling studies much more practical. Sprouse and Almeida (2012) exhaustively tested all of the English acceptability judgment data points in Adger’s (2003) textbook Core syntax. They defined convergence relatively conservatively—statistical significance (using both null hypothesis testing and Bayes factor analysis) in the direction reported by Adger—but crucially assumed that there would be variability in judgments. They found that 98% of the data points replicated in their formal experiments. Sprouse, Schütze, and Almeida (2013) randomly sampled 300 data points (forming 150 two-condition phenomena) from articles published in the journal Linguistic Inquiry between 2001 and 2010, and tested these 150 two-condition phenomena using three different judgment tasks. 
Using the same relatively conservative criteria as Sprouse and Almeida (2012), they found that 95% of the sampled data points replicated in their formal experiments. Given the size of their sample relative to the number of data points published between 2001 and 2010, the 95% result can be used as an estimate for the overall convergence rate for judgments in LI 2001–2010 with a margin of error of ±5. Sprouse et al. internally replicated their own results with a different sample of participants (for between-participants reliability), and found the same 95% convergence rate. Mahowald, Graff, Hartman, and Gibson (2016) closely replicated this finding (observing a 92% convergence rate), crucially with a novel random sample of phenomena, and with a crowdsourced approach to materials construction (students from a large psychology class at MIT created the materials) rather than relying on professional linguists to create the materials (Mahowald et al. use these results to provide guidelines for the sample size of experiments, a topic that we discuss in more detail in the next section). Moving beyond English, Song, Choe, and Oh (2014) investigated Korean by testing 118 pairs of sentences sampled from two years of the journal Studies in Generative Grammar. Using a rating scale task, they found that 99 out of the 118 pairs replicated by the strictest measure of statistical significance (p < .05 in the predicted direction), yielding a convergence rate of at least 84%. Chen, Xu, and Xie (2020) investigated Mandarin by testing all of the data points in Huang, Li, and Li’s (2009) textbook The Syntax of Chinese. Using a rating scale task, they found that 141 out of 158 tested pairs reached significance in the predicted direction, yielding a convergence rate of at least 89%.
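The arithmetic behind these convergence rates is simple enough to sketch. The Python below is only an illustration: the function names are hypothetical, the counts are the ones reported above for Korean and Mandarin, and the margin of error uses a textbook normal approximation for a proportion with an optional finite population correction. That formula is an assumption made here for illustration; it is not claimed to be the calculation behind the ±5 figure reported for the Linguistic Inquiry sample.

```python
import math


def convergence_rate(replicated: int, tested: int) -> float:
    """Proportion of tested data points that replicated."""
    return replicated / tested


def margin_of_error(p_hat: float, n: int, population=None, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sampled proportion.

    Assumption: normal approximation. If a (hypothetical) population size
    is supplied, a finite population correction shrinks the margin.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


# Counts reported in the text for the Korean and Mandarin studies.
korean = convergence_rate(99, 118)
mandarin = convergence_rate(141, 158)
print(f"Korean: {korean:.3f}, Mandarin: {mandarin:.3f}")

# Illustrative margin for a 95% rate observed on a sample of 150 phenomena.
moe = margin_of_error(0.95, 150)
print(f"approximate margin of error: {moe:.3f}")
```

An exhaustive comparison, such as a test of every data point in a textbook, needs no margin of error for that textbook; the margin matters only when a random sample stands in for a larger population.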

Exhaustive and random sampling studies provide us with several pieces of information that may be relevant for assessing the validity and reliability of informally collected judgments. First, they provide convergence rates between the two methods that we can attempt to interpret. The interpretation of these rates is ultimately subjective, so it is up to individual researchers to determine what level of convergence is required to increase confidence in the validity or reliability of informal methods (see e.g. the discussion formed by Gibson and Fedorenko 2013, Sprouse and Almeida 2013, and Gibson, Piantadosi, and Fedorenko 2013). My subjective opinion is that the observed convergence rates in English of 92–98% are impressive. I know of no other area of cognitive science that has replication rates this high (see Open Science Collaboration 2015 for estimated replication rates in other areas of psychology, all in the range of 36–53%, using similar definitions of replication to the one used in the judgment studies reported above). Second, these results begin to establish a set of known divergences: Linzen and Oseki (2018) find 7 Hebrew and 4 Japanese divergences; under one count, Sprouse and Almeida 2012 find 6 (English) divergences in Adger (2003); under one count, Sprouse et al. (2013) find 9 (English) divergences in Linguistic Inquiry. Under one count, Mahowald et al. (2016) find 8 English divergences. Though these are relatively few compared to the hundreds of data points that have been investigated, they still deserve follow-up work, as the divergence itself does not tell us which result better reflects reality. Relatedly, these results provide some information about the types of divergences that we find. The vast majority of the divergences involve a null result in the formal experiments. These null results are ambiguous between a true divergence and low statistical power for the size of the effect to be detected. 
Very few of the divergences involve a sign reversal—an effect in the formal experiment that is opposite in direction to the effect reported using informal methods. This suggests that the most egregious sources of differences between the two methods, such as a contamination of theoretical bias from professional linguists, are relatively rare (see also Dabrowska 2010 for evidence that the differences in judgments between professional linguists and non-linguists when using a rating scale at most differ quantitatively, not qualitatively). Finally, it should be noted that the vast majority of this work has been done in English, and has focused on standard acceptability judgments that do not involve coreference, prosody, or multiple interpretations. Though the results of these studies have been encouraging, there is very clearly a need for studies in other languages, and a need for studies on different judgment types.
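To make the distinction between convergences, null results, and sign reversals concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each data point is summarized as two counts (responses in the direction reported informally and responses in the opposite direction) and applies a two-sided exact sign test. The function names and the .05 threshold are assumptions of this sketch; the studies reviewed above used their own, richer analyses. The example counts of 87 and 22 are the ones reported earlier in this section for the Langendoen et al. responses.

```python
from math import comb


def sign_test_p(predicted: int, opposite: int) -> float:
    """Two-sided exact sign test p-value under a 50/50 null hypothesis."""
    n = predicted + opposite
    k = max(predicted, opposite)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify(predicted: int, opposite: int, alpha: float = 0.05) -> str:
    """Label a data point by the outcome of the formal test."""
    if sign_test_p(predicted, opposite) >= alpha:
        return "null result"  # ambiguous: true divergence or low power
    return "convergence" if predicted > opposite else "sign reversal"


print(classify(87, 22))  # responses mostly in the reported direction
print(classify(22, 87))  # the same split in the opposite direction
print(classify(10, 9))   # too close to call
```

Note that the "null result" label inherits the ambiguity described above: a non-significant test cannot by itself distinguish a true divergence from an underpowered experiment.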

1.3 The differences among acceptability judgment tasks


A second set of questions that formal judgment experiments are particularly well-suited to explore concerns the differences among judgment tasks. The results of the exhaustive and random sampling convergence studies discussed in Section 1.2 suggest that, at

least when it comes to detecting differences between sentence types, informal methods provide valid and reliable results. Furthermore, the relatively narrow range of results in those studies suggests that all of the formal experimental tasks employed by those studies provide roughly similar information. This is reassuring, as it suggests that all of the major methods for collecting acceptability judgments tap into the same underlying cognitive states and/or cognitive processes. But there are still a number of finer-grained questions that one could ask about the differences between tasks in order to optimize the use of formal experimental methods to address specific hypotheses in the syntactic literature. In this section, I will review a number of the most prominent questions that have been asked in the literature, as well as point out some of the questions that have not, to my knowledge, been systematically investigated yet. The first question one could ask is whether different tasks provide different pieces of information about acceptability. The answer is almost certainly yes, given that different tasks ask participants to provide acceptability judgments in different ways. Here I will mention three classes of tasks to illustrate these differences. Rating tasks ask participants to rate individual sentences along a scale with some number of points (such as 1 through 7). Rating tasks can provide information about the rating of individual sentences along what is assumed to be a linear rating scale with regular intervals between the numbers on the scale. This information can be used to describe the absolute acceptability of the sentence, or the ratings of two sentences can be used to calculate the size of the difference between them. Categorization tasks ask participants to assign sentences to some number of nominal categories (without the assumption of regular linear distances between the categories).
The most common example is the two-alternative forced-choice task that asks participants to rate sentences as acceptable or unacceptable. Categorization tasks can be used to determine the category membership of sentences in cases where theories make predictions about some number of acceptability categories. But they provide only coarse-grained information about the absolute acceptability of the sentences. They also provide only coarse information about the size of the difference between two sentence types (in the form of the difference in the proportion of responses to each sentence). Selection tasks ask participants to choose a sentence out of a set of sentences. The most common example is a two-alternative forced-choice task that asks participants to select the more acceptable sentence in a pair. Selection tasks are extremely sensitive to the presence or absence of a difference between sentence types (because the explicit task is to detect a difference), but provide no information about individual acceptability, provide no information about categories, and only provide very coarse-grained information about the size of differences between sentence types (through the proportion of selections). Given these differences, at the most general level, the choice of task should reflect the type of information that is necessary to test the hypothesis under consideration. The second question one could ask is whether there are optimizations to be made within each class of tasks—in other words, if there is an optimal instantiation of a specific task. Bard, Robertson, and Sorace (1996) and Cowart (1997) initiated a new line of research in judgment methodology by asking just this question about rating tasks. As


jon sprouse

both note, the psychophysicist Stanley Smith Stevens observed that typical rating tasks potentially suffer from (at least) two drawbacks. The first is that the number of response options is typically finite. If the number of options is too small, participants may be able to perceive differences among the stimuli of the experiment that they cannot report using the response scale. The second is that the rating scale assumes that the intervals between the response points are uniform—that is, the interval between 1 and 2 is the same size as the interval between 4 and 5. Though this may turn out to be true, it cannot be guaranteed for every participant or every stimulus. Stevens (1956) proposed a new type of rating task called magnitude estimation to eliminate these two potential drawbacks. Bard et al. (1996) and Cowart (1997) adapted Stevens’ magnitude estimation task to acceptability judgments. In the most typical version of magnitude estimation of acceptability, one sentence, called the standard, is presented to the participants along with a number, called the modulus, which represents the standard’s acceptability. The standard is typically chosen such that it is somewhere in the middle range of acceptability, and the modulus is typically set to be a number that is easy to divide and multiply, such as 100. Participants are then shown other sentences without any acceptability numbers associated with them. They are told to rate each sentence as a multiple or fraction of the standard. If the sentence is twice as acceptable, the rating should be 200; if the sentence is half as acceptable, the rating should be 50. The fundamental idea is that the standard becomes a type of perceptual measurement unit that participants use to measure the acceptability of the sentences in the experiment. Because the response scale is the positive real number line, participants can report as many distinctions among stimuli as they want. 
And because there is just one interval (the acceptability of the standard), there is no risk of unequal interval distances. Both Bard et al. (1996) and Cowart (1997) demonstrated the potential utility of magnitude estimation (and also some of the potential drawbacks of finite response scales). Their results inspired a number of syntacticians to explore the utility of magnitude estimation across a number of phenomena (e.g., Keller 2000; Sorace 2000; Featherston 2005a). Ultimately, the rising popularity of magnitude estimation led a number of researchers to directly test its purported advantages over rating tasks. Weskott and Fanselow (2011) compared the results of magnitude estimation and a standard rating task for three phenomena in German, and found that magnitude estimation leads to higher variability, and consequently smaller standardized effect sizes and less statistical power. They conclude that this may be a consequence of the larger response scale in magnitude estimation. Sprouse (2011) adapted methods from the psychophysics literature to test one of the fundamental cognitive assumptions of magnitude estimation—that participants can make ratio judgments of the acceptability of a sentence (i.e. that it is a multiple of the acceptability of the standard). The results suggest that participants cannot make ratio judgments of acceptability. This runs contrary to the results for other types of psychophysical judgments in the literature, which has historically demonstrated that participants can make ratio judgments of physical stimuli like brightness and loudness. The impossibility of ratio judgments of acceptability is likely because acceptability has no true zero point that represents the absence of all acceptability. Without a true zero,

acceptability judgments


ratio judgments are impossible. The Sprouse (2011) results suggest that participants cannot do true magnitude estimation with acceptability; therefore Stevens’ arguments that magnitude estimation is superior to other rating tasks simply do not apply to acceptability judgments. Participants in magnitude estimation experiments must be covertly converting the magnitude estimation task into some other type of rating task that they can actually perform, and this covert rating task likely suffers from the same potential drawbacks as other rating tasks. This in turn leads to a further interpretation of the Weskott and Fanselow (2011) results. In both cases participants are completing a rating task; but in the case of magnitude estimation, the rating task that they are adopting is leading to more variability and lower statistical power (perhaps due to the unbounded response scale, or perhaps due to other issues that arise when participants are asked to perform a task that they cannot cognitively complete). As such, there is no argument, either logical or empirical, to support the use of magnitude estimation over other rating tasks for acceptability judgments.

The third question that one could ask is whether different tasks yield different levels of sensitivity to acceptability judgment differences. Bard et al. (1996) began this line of questioning by comparing several conditions in both a rating task and magnitude estimation. They report that magnitude estimation allows participants to report more levels of acceptability than a rating task with 5 points. Bader and Häussler (2010) compared magnitude estimation to a two-alternative categorization task for 16 sentence types (forming one 2×2 design and two 2×3 designs) in order to create a signal detection model for acceptability judgments. In the process, they report that the two tasks yield the same pattern of statistically significant results at two sample sizes: 24 and 36 participants. 
As previously mentioned, Weskott and Fanselow (2011) compared magnitude estimation and rating tasks for three phenomena, and found more variability for magnitude estimation, which in turn decreases statistical power. Sprouse and Almeida (2017) tested 47 two-condition phenomena taken from Linguistic Inquiry (from Sprouse et al. 2013) using all four types of tasks mentioned previously—a 7-point rating task, a two-alternative categorization task, a two-alternative selection task, and magnitude estimation. They used resampling simulations to estimate statistical power for each phenomenon and each task at a range of sample sizes from 5 to 100 participants. Their results suggest that the selection task has the most statistical power for detecting differences between two conditions (as expected given the logic of the task), that the rating task and magnitude estimation tasks have similar statistical power (both less than the selection task), and that the categorization task has the least statistical power (again, as expected given that it will, by definition, have trouble detecting differences between two conditions that fall into the same category). Their results can also be used to estimate the number of participants necessary for a given level of statistical power for a wide range of effect sizes. Langsford, Perfors, Hendrickson, Kennedy, and Navarro (2018) extend these findings (using the Sprouse et al. 2013 materials) in two ways: (i) they focus on test–retest reliability (instead of statistical power), and (ii) they include an additional task from psychophysics, the Thurstone method (Thurstone 1927). The Thurstone method is


a combination of a two-alternative selection task in which the pairs of sentences are random, and a modeling procedure which converts judgments of the random pairings into an ordering among all of the test items along an inferred acceptability scale. Their results suggest that the Thurstone method is not superior to the best performing traditional methods, as traditional rating tasks and selection tasks demonstrate the best combination of within-participant and between-participant test–retest reliability. That said, the high degree of correlation between the Thurstone method, which makes very few assumptions about the nature of acceptability, and the traditional rating task, which makes many assumptions by virtue of the structure of the rating scale, adds an extra dimension of validation to the traditional rating task. In fact, spread throughout all of the papers discussed here is quite a bit of information about the correlation among the various methods. In all cases, the methods appear to be yielding highly correlated results, modulo the differences in the types of information the tasks yield, and minor differences in reliability. Most recently, Marty et al. (2020) expanded the number of tasks investigated (using the Sprouse and Almeida 2017 materials and methods) in order to quantify the individual contributions of the different elements of the tasks, such as the number of responses available to participants and the number of sentence types presented in a single trial. Though the finer details of the results are beyond the scope of this brief chapter, the overall pattern once again supports the conclusion that there is a supremely high degree of correlation among different types of judgment tasks. Though there has obviously been much work on the effects of different tasks, there are still a number of properties of judgment tasks that have not been systematically investigated. What is the contribution of task instructions? 
Cowart (1997) has some initial results on this, suggesting that there is relatively little impact of the instructions, but I know of no other systematic studies. What is the effect of the number of judgments per condition per participant on statistical power? The existing studies all used one judgment per condition, so they offer only a minimum estimate. What is the effect of the number of fillers (unrelated, and typically unanalyzed, items) in the experiment? It is often assumed that the judgments of individual sentences can be pushed higher or lower by including different types of fillers (e.g. extremely unacceptable fillers could lead to higher ratings for otherwise unacceptable sentences). What is the effect of the number of distinct items created for each condition? This interacts with the debate in the statistical literature about random effects and item generalizability (e.g. Clark 1973; Wike and Church 1976; Raaijmakers, Schrijnemakers, and Gremmen 1999; Barr, Levy, Scheepers, and Tily 2013). These questions also interact with the proposal by Myers (2009) that there may be a middle ground between typical informal experimental methods and fully formal experimental methods that he calls “small-scale” experiments. The answers to these questions could both inform the construction of full-fledged formal experiments and also help determine exactly how small scale experiments can be for different syntactic questions.
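The resampling approach to statistical power mentioned above can be illustrated with a minimal sketch. It is not the procedure of Sprouse and Almeida (2017): the pool of per-participant differences is invented, the exact sign test stands in for whatever tests a real study would use, and the function names `sign_test_p` and `estimate_power` are hypothetical. The sketch repeatedly draws samples of a given size from the pool, tests each sample, and reports the proportion of samples that reach significance.

```python
import math
import random


def sign_test_p(diffs):
    """Two-sided exact sign test on paired differences; zero differences are dropped."""
    pos = sum(1 for d in diffs if d > 0)
    neg = sum(1 for d in diffs if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0  # no informative pairs, so no evidence of a difference
    k = min(pos, neg)
    # Probability of a split at least this lopsided under a fair coin, doubled.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def estimate_power(pool, sample_size, n_sims=1000, alpha=0.05, seed=0):
    """Proportion of resampled experiments of size `sample_size` with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Draw participants from the pool with replacement.
        sample = rng.choices(pool, k=sample_size)
        if sign_test_p(sample) < alpha:
            hits += 1
    return hits / n_sims


# Hypothetical pool (invented numbers): each value is one participant's
# rating difference between two conditions on a 7-point scale.
HYPOTHETICAL_POOL = [2, 1, 0, 3, 1, -1, 2, 1, 0, 1, 2, -1, 1, 3, 0, 1, 2, 1, -2, 1]

if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, estimate_power(HYPOTHETICAL_POOL, n))
```

Running the loop over several sample sizes gives a rough power curve for the invented pool; with real data, the pool would be the observed per-participant differences from a large sample.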


1.4 The source of acceptability judgment effects


As briefly discussed in Section 1.1, acceptability judgments are typically assumed to be impacted by a number of factors. This means that the source of any given acceptability judgment effect is ambiguous: The effect could be due to a syntactic constraint violation, a violation in a different part of the grammar, some component of sentence processing (beyond the recognition of a syntactic violation), word or construction frequency, sentence plausibility, or any number of other factors that impact sentence comprehension. The primary tool for dealing with this ambiguity is experimental design—constructing conditions to isolate a potential syntactic effect to the exclusion of all other possible types of judgment effects. Experimental design can be leveraged this way with either informal or formal judgment methods; and, in fact, many syntax articles that use informally collected judgments include explicit manipulations designed to exclude extra-syntactic explanations for acceptability effects. That said, formal judgment experiments can contribute to the investigation of the source of judgment effects in two ways—by formalizing the process of designing an experiment to isolate the factors that contribute to acceptability (through factorial logic), and, in the instances where it is impossible to logically separate syntactic effects from other possible effects, by quantifying judgments in a way that allows us to test predictions that involve data types beyond offline acceptability judgments. In this section, I will use island effects as an example phenomenon to illustrate these two properties of formal judgment experiments. Factorial logic is a formalization of the process that all experimentalists use to test for the presence of an effect (including syntacticians who use informal experiments to collect acceptability judgments). 
The term “factor” means a property that can be manipulated, such as some dimension of the structure of a sentence; the term “level” is used to refer to the specific values that a factor can take. Factors can be continuous (an infinite number of levels) or categorical (a finite number of levels). The goal of factorial logic is to isolate effects using subtraction logic. As a concrete example, we can look at a factorial design for island effects. As a first definition, we can define island effects as the low acceptability that arises when the tail of a long-distance dependency is contained within a specific structure, called an island structure. The whether-island sentence in (1d) below is a classic example: the tail of the wh-dependency is contained within the embedded whether clause. The space of possible sources for this effect is large; it is bounded by the list of factors that we believe contribute to acceptability judgments (i.e. the linking hypothesis that was briefly discussed in Section 1.1 and in the previous paragraph). For this example, we will consider two possible sources. The first possibility is that there is a constraint in the grammar that specifically targets the syntactic structure of (1d) and rules it ungrammatical (which then leads to unacceptability when coupled with an appropriate linking hypothesis between ungrammaticality and acceptability). A second possibility proposed by Kluender and Kutas (1993) is that the sentence in (1d) is


grammatical, but that the low acceptability is the result of the combination of two types of processing complexity: the complexity associated with processing a long-distance dependency, and the complexity associated with processing the island structure itself (in this case, the embedded whether clause). The Kluender and Kutas (1993) theory suggests that there are (at least) two acceptability judgment effects that any investigation of island effects will want to quantify: The effect of a long-distance dependency, and the effect of parsing the island structure. We can use factorial logic to isolate these two effects with two factors. The first we can call dependency length. At its simplest, we can define two levels for dependency length—short and long. The subtraction between these two will isolate the effect of a long-distance dependency. The second we can call structure. Again, at its simplest, we can define two levels for structure: non-island and island. The subtraction between the two will isolate the effect of parsing an island structure. With two factors, each with two levels, we have a 2×2 design (each digit in this label represents a factor, and each value of the digit represents the number of levels), which yields four conditions, as in (1):

(1)                                                 dependency   structure
    a. Who __ thinks that Mary wrote a book?        short        non-island
    b. What do you think that Mary wrote __?        long         non-island
    c. Who __ wonders whether Mary wrote a book?    short        island
    d. What do you wonder whether Mary wrote __?    long         island

The factorial design in (1) lends itself to subtraction logic in the following way. The subtraction of (1a–1b) isolates the effect of processing a long-distance dependency by subtracting the short and long levels of dependency length while holding structure constant (and not involving the potentially ungrammatical sentence). The subtraction of (1a–1c) isolates the effect of processing the island structure by subtracting the non-island and island levels of structure while holding dependency length constant. Armed with these quantities, it becomes possible to test two competing classes of theories of what it means to be an island effect. If an island effect is completely reducible to the combination of the dependency length and structure processing effects, then we would predict that the difference (1a–1d) should be a linear sum of the differences (1a–1b) and (1a–1c); in other words: (1a–1d) = (1a–1b) + (1a–1c). In contrast, if island effects are more than just the linear combination of these two processing effects, as predicted by grammatical theories, as well as more complex sentence-processing based theories, we would expect that the difference (1a–1d) will be larger than the linear sum of these differences; in other words: (1a–1d) = (1a–1b) + (1a–1c) + X, where X is some additional effect that is not isolated by any of the factors. In statistical terms, we would say that the simple reductionist theory in which island effects are the result of linearly combining the two processing effects predicts two main effects, one for dependency length and one for structure, but no interaction of the two factors. The class of more complex theories predicts that there will be a superadditive interaction between the two factors, such that

fig. 1.1 The two predictions of the 2×2 design for whether-islands (left panel: simple reductionist; center panel: superadditive), and the observed results of an actual experiment (right panel: observed results). In each panel the vertical axis is mean judgment and the horizontal axis is dependency length (short, long), with separate lines for the non-island and island structures.

combining the long and island levels leads to a larger effect than the linear sum of the two factors alone. The left panel of Figure 1.1 illustrates the prediction of the simple reductionist theory. The parallel lines indicate that the acceptability of the long/island condition is the linear sum of the two factors. The center panel of Figure 1.1 illustrates a superadditive interaction in which the acceptability of the long/island condition is lower than one would expect based on the two factors alone. Finally, the right panel of Figure 1.1 shows the observed results of a real acceptability judgment experiment using this design (32 participants each rated two tokens of each condition, along with 9 practice items and 14 fillers, using a 7-point rating task). The fact that we observe a superadditive interaction in the real experiment suggests that the simple reductionist theory is incorrect. Whether-island effects are not the simple linear sum of the effects of long-distance dependencies and island structures. This result suggests that we must explore the class of more complex theories to explain whether-island effects.

A number of studies have used this factorial logic to explore the variation in the presence and absence of different types of island effects across the world’s languages. Here is a non-exhaustive list covering an interesting subset of languages: Arabic: Tucker, Idrissi, Sprouse, and Almeida (2019); Danish: Christensen, Kizach, and Nyvad (2013); English: Sprouse (2007); Sprouse, Wagers, and Phillips (2012); Goodall (2015); Hofmeister, Culicover, and Winkler (2015); Atkinson, Apple, Rawlins, and Omaki (2016); Italian: Sprouse, Caponigro, Greco, and Cecchetto (2016); Japanese: Sprouse, Fukuda, Ono, and Kluender (2011); Korean: Kim and Goodall (2016); Norwegian: Kush, Lohndal, and Sprouse (2018); Slovenian: Stepanov, Mušič, and Stateva (2018); Spanish: Pañeda et al. (2020). 
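The subtraction logic for the 2×2 design in (1) can be written out as a short sketch. The condition means below are invented for illustration, and the function name `decompose_island_effect` is hypothetical; the arithmetic itself follows the equation (1a–1d) = (1a–1b) + (1a–1c) + X given above. Note that X is algebraically the same as the difference between the two length effects, (1c–1d) minus (1a–1b), that is, the interaction term of the design.

```python
def decompose_island_effect(means):
    """Apply the subtraction logic of the 2x2 design to four condition means.

    `means` maps the labels "a", "b", "c", "d" to the mean judgments of
    conditions (1a)-(1d): short/non-island, long/non-island,
    short/island, long/island.
    """
    length_effect = means["a"] - means["b"]       # (1a-1b): dependency length
    structure_effect = means["a"] - means["c"]    # (1a-1c): island structure
    total_effect = means["a"] - means["d"]        # (1a-1d)
    additive_prediction = length_effect + structure_effect
    # X: whatever part of (1a-1d) the two isolated effects do not account for.
    x = total_effect - additive_prediction
    return {
        "length_effect": length_effect,
        "structure_effect": structure_effect,
        "total_effect": total_effect,
        "additive_prediction": additive_prediction,
        "X": x,
    }


# Hypothetical condition means (invented numbers, not data from any study).
additive_pattern = {"a": 1.0, "b": 0.5, "c": 0.25, "d": -0.25}
superadditive_pattern = {"a": 1.0, "b": 0.5, "c": 0.25, "d": -1.0}

if __name__ == "__main__":
    for label, means in (("additive", additive_pattern),
                         ("superadditive", superadditive_pattern)):
        print(label, decompose_island_effect(means))
```

Under the simple reductionist pattern X comes out as zero, while a superadditive pattern yields a positive X. In a real experiment, whether X differs reliably from zero would be assessed with a test of the interaction rather than by inspecting the means.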
Before we explore the class of more complex theories, it is worth noting that factorial designs like the one discussed here can be used to define a necessary condition for the existence of a syntactic explanation for an acceptability judgment effect. A syntactic explanation entails that the acceptability effect cannot be fully explained by the other factors that impact acceptability (e.g. processing effects, semantic effects, task effects).


In statistical terms, a syntactic explanation entails that there is no factorial design consisting solely of non-syntactic factors that leads to linear additivity; all such designs will yield superadditivity (because, by definition, they do not contain a factor to capture the syntactic constraint). Another way to view this is that, given a factorial design consisting exclusively of factors that lie outside of the theory of syntax, linear additivity as illustrated in the left panel of Figure 1.1 is deductive evidence for a simple reductionist approach to acceptability judgment effects, while superadditivity as illustrated in the center panel of Figure 1.1 is ambiguous between a syntactic explanation and a complex interaction of extra-syntactic factors. In short, superadditivity in these designs is a necessary, but not sufficient, condition for a syntactic explanation. This is not a new observation. Careful work in theoretical syntax has always incorporated both factorial logic and this necessary condition for the existence of syntactic explanations. This is easily seen in the syntax literature where factorial designs are common albeit rarely described using factorial terminology (see also Myers 2009), and where researchers often demonstrate that non-syntactic factors cannot completely explain the judgment effect. The point here is simply that formal experiments allow us to make this logic explicit, and potentially allow us to isolate a larger number of factors simultaneously, insofar as they make quantifying judgments (and keeping track of multiple conditions) a bit easier. At this point it is clear that the superadditive pattern for whether-islands in Figure 1.1 could be due either to a syntactic constraint targeting condition (1d), or to an interaction between the processing of long-distance dependencies and the processing of island structures (which only arises in condition 1d). Teasing these two explanations apart is not trivial. 
This brings us to the second benefit of formal judgment experiments when it comes to identifying the source of acceptability judgment effects—the ability to test hypotheses that go beyond offline acceptability judgments. One example of this is looking for relationships between acceptability judgments and other data types that might be indicative of a causal relationship between extra-syntactic factors (in the case of islands, sentence-processing factors) and the acceptability effect. A number of studies have explored this approach. Stowe (1986) was perhaps the first, using self-paced reading to demonstrate (i) that the parser attempts to complete long-distance dependencies at the first available gap location (a strategy that Frazier and Flores d’Arcais 1989 later named the active filler strategy), and (ii) that the parser does not attempt to complete dependencies inside of finite subject islands. Phillips (2006) reviews a number of studies that extend Stowe’s finding using different islands and different data types (such as eyetracking and ERPs). Phillips (2006) also demonstrates that the behavior of the parser is even more sophisticated than previously thought, as it not only suppresses the active filling strategy inside of islands but also allows active gap-filling inside of islands that can participate in parasitic gap constructions (Engdahl 1983; see also Culicover 2001 for a review of parasitic gaps). The crucial fact here is that these islands do give rise to unacceptability when there is only one gap in the construction (inside of the island). Phillips’s result suggests that this unacceptability cannot be due to the inability of the parser to complete a dependency inside of these islands, because there is reading-time


evidence that the parser does complete the dependency. This dissociation between acceptability and first-pass sentence processing means that the resulting unacceptability must be due to some later process, such as a check against the grammatical requirements of a parasitic gap configuration (i.e. a second gap outside of the island). Sprouse, Wagers, and Phillips (2012) take a slightly different approach. Instead of investigating real-time sentence-processing effects, they investigate the relationship between acceptability judgments and individual variation in working memory capacity. The rationale behind this is a specific proposal by Kluender and Kutas (1993) that island effects are the result of limited working memory resources, such that the simultaneous processing of long-distance dependencies and island structures taxes working memory resources beyond their capacity, leading to the perception of unacceptability. Sprouse et al. interpret this proposal to predict that there should be a detectable inverse relationship between working memory capacity and the size of island effects as quantified by the superadditive interaction in acceptability judgments—as working memory increases, the size of the island effect should decrease. They tested a large number of participants on two working memory tasks (serial recall and n-back) and four island types, yet found no evidence of a relationship (see also Michel 2014 for a replication using a third working memory task, serial recall). Yoshida et al. (2014) test a second prediction of the Kluender and Kutas working memory theory: If island effects are due to working memory capacity, island effects should arise for any dependency that triggers the same working memory requirements as wh-dependencies. It has long been established in the syntactic literature that island effects, as defined as acceptability judgment effects, do not arise for binding. But Yoshida et al. 
demonstrate that island effects also have no impact on the real-time processing of binding dependencies. Their study builds upon previous work showing that the parser attempts to resolve binding dependencies in which an anaphor appears before its antecedent (called cataphora, or backwards anaphora) at the first possible opportunity during real-time processing (van Gompel and Liversedge 2003; Sturt 2003; Kazanina, Lau, Lieberman, Yoshida, and Phillips 2007). This process shares a number of similarities with active gap filling, including recruiting the same areas of the left inferior frontal gyrus in the brain (Matchin, Sprouse, and Hickok 2014). Despite these similarities, Yoshida et al. demonstrate that island structures do not suppress the search for an antecedent for binding during real-time processing, contrary to the plausible prediction of the working memory approach to island effects. Another classic example of testing predictions beyond offline acceptability judgments can be found in the syntactic satiation literature. Syntactic satiation is the phenomenon by which acceptability judgments for a specific sentence type appear to increase with repeated exposures to that sentence type. Syntactic satiation has long been informally reported by professional linguists, particularly over the course of weeks or months working on a specific phenomenon. The satiation studied in the judgment literature is different, as the goal has been to induce satiation in non-linguist participants over short (single-experiment) timescales, thus potentially linking the effect to syntactic priming, as recently discussed in Do and Kaiser (2017), or implicit learning, as in Luka


and Barsalou (2005). The first systematic study that I am aware of is Snyder (2000). He tested seven different sentence types instantiating distinct syntactic violations, and found that three violation types showed evidence of satiation over the course of five exposures, but that four other violation types did not. Though these results were not designed to probe the source of the satiation effects (as it was just a first study), Snyder suggested that one possibility might be that satiation differs based on the source of the acceptability judgment effect. His specific claim was that judgment effects that satiate may be due to sentence-processing sources, while judgment effects that do not satiate may be due to grammatical sources. The underlying idea appears to be either that the difficult parsing processes themselves might get easier with time, or that licit grammatical representations might somehow become easier to construct with time. In either case, the prediction would be that acceptability judgment effects due to sentence processing should satiate, whereas effects due to grammatical violations should not. The complication with the satiation literature is that satiation effects have proved particularly variable across experiments. Several authors have attempted to replicate and extend Snyder’s results, with mixed results (see Hiramatsu 2000; Sprouse 2009; Francom 2009; Goodall 2011; and Do and Kaiser 2017 for English islands among other violation types; see Christensen et al. 2013 for Danish islands). 
This has led some authors to raise the possibility that the source of satiation effects may not be as theoretically deep as Snyder suggested, perhaps instead reflecting: (i) task effects such as identifiability (and therefore correctability) of the violations (Francom 2009), (ii) response strategies intended to equalize the number of responses along the provided response scales (Sprouse 2009), (iii) mere exposure effects (Luka and Barsalou 2005), or (iv) syntactic priming (Do and Kaiser 2017). Though this section has reviewed a number of studies that have attempted to identify the source of judgment effects, there are still directions that need to be explored. For syntactic theory as a whole, there is relatively little formal experimental research into the source of acceptability judgments for the vast majority of phenomena. This could be because there are no other potential candidate theories to explain the judgment effects; that is, it could be the case that island effects are a special case in that there are potential extra-syntactic explanations, like the Kluender and Kutas (1993) working memory theory. Or this could be because experimental methods have only recently been adopted on the scale necessary to do this kind of research. For island effects, there is still relatively little formal experimental research into grammatical explanations outside of syntax, like the semantic approaches of Szabolcsi and Zwarts (1993) or Abrusán (2011), or the pragmatic approaches of Erteschik-Shir (1973) or Goldberg (2007). At the level of sentence processing, there are still a number of open questions about the processing of long-distance dependencies, including to what extent active gap-filling is suppressed in islands other than subject islands, and to what extent there are second-pass gap-filling mechanisms in constructions like parasitic gaps. 
And when it comes to satiation, much work still needs to be done to both solidify the empirical results and develop a causal theory of satiation (both at short, single-experiment timescales and at the longer timescales experienced by professional linguists).
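The individual-differences logic discussed earlier in this section, in which the working memory account is taken to predict an inverse relationship between working memory capacity and the size of the island effect, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis of any cited study: the participant data are invented, the names `interaction_score` and `pearson_r` are hypothetical, and a plain Pearson correlation stands in for whatever model a real study would fit.

```python
import math


def interaction_score(ratings):
    """Size of the superadditive interaction for one participant.

    `ratings` maps "a", "b", "c", "d" to that participant's mean ratings
    for conditions (1a)-(1d). The score is the length effect inside the
    island structure minus the length effect outside it.
    """
    return (ratings["c"] - ratings["d"]) - (ratings["a"] - ratings["b"])


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical participants (invented numbers): a working memory score
# paired with mean ratings for the four conditions in (1).
participants = [
    (3.0, {"a": 6.5, "b": 6.0, "c": 6.0, "d": 2.0}),
    (4.5, {"a": 6.0, "b": 5.5, "c": 5.5, "d": 2.5}),
    (5.0, {"a": 6.5, "b": 5.5, "c": 6.0, "d": 1.5}),
    (6.5, {"a": 7.0, "b": 6.5, "c": 6.0, "d": 2.0}),
]

if __name__ == "__main__":
    wm_scores = [wm for wm, _ in participants]
    effects = [interaction_score(r) for _, r in participants]
    # A reliably negative r would fit the predicted inverse relationship.
    print(pearson_r(wm_scores, effects))
```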


1.5 The architecture of the grammar



Any acceptability judgment collection method, be it relatively informal or relatively formal, can contribute to investigations of the architecture of the grammar. Acceptability judgments are, after all, the primary data type of syntactic theory; and the architecture of the grammar is, after all, the primary object of study of syntactic theory. The special relevance of formal judgment methods for questions about the architecture of the grammar is that formal methods allow us to quantify acceptability judgments with a level of precision that makes it readily apparent that acceptability is a continuous measure. Figure 1.2 shows this in three ways. The left panel shows the mean acceptability for the 300 individual sentence types from Sprouse et al. (2013), arranged in order of increasing acceptability. It is clear that there are no substantial step-like breaks in these means to indicate quantized acceptability. The center panel shows the mean effect size for the 150 pairwise phenomena from Sprouse et al. (2013), arranged in order of increasing size. Again, there are no substantial step-like breaks to indicate quantized effect sizes. The right panel shows the same pairwise phenomena, ordered by effect size, but with the ratings of each sentence type represented by the end points of the lines. There is nothing particularly novel in these results. The continuous nature of acceptability has been acknowledged since the earliest days of generative syntax (Chomsky 1957). However, the rise of formal experimental methods for judgment collection has made it easier than ever to demonstrate the continuous nature of acceptability, which in turn has led to a renewed interest in determining its source. This is a complex question. Here I simply want to touch upon three of the questions that drive current research into continuous acceptability and what it may, or may not, reveal about the architecture of the grammar. 
The first question we can ask is whether the grammar itself is categorical—yielding some discrete number of levels of grammaticality—or continuous—yielding an infinite number of levels of grammaticality. Both types of grammars can explain continuous acceptability, albeit in different ways. For categorical grammars, continuous acceptability

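fig. 1.2 Three demonstrations of the continuous nature of acceptability judgments. Panel and axis labels recoverable from the figure: gradience by sentence type; effect size (Cohen's d); gradience by phenomenon; mean judgment (z-transformed); phenomena (ordered by effect size).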

must be entirely the result of continuous extra-grammatical factors, such as sentence-processing mechanisms, plausibility, or even task effects (as Armstrong, Gleitman, and Gleitman (1983) showed, even concepts that are categorical by definition, such as ‘even number’, will receive continuous judgments under certain rating tasks). For continuous grammars, continuous acceptability is the result of a combination of continuous extra-grammatical factors (sentence-processing mechanisms, plausibility, task effects) and the infinite levels of grammaticality made available by the grammar. What this means in practice is that continuous acceptability does not by itself decide between the two architectures. Instead, we must build relatively complete theories of both types, to see which better explains the phenomena we wish to explain (continuous acceptability judgments, sentence-processing facts, language acquisition facts, etc.).

Assuming that we want to pursue the strategy of constructing both grammar types to evaluate their ability to predict continuous acceptability, the second question we can ask is how to make grammatical theories continuous. One option is to directly add abstract weights to the constraints that already exist in familiar grammatical theories. Two famous examples of this direct approach are Keller’s (2000) Linear Optimality Theory (see also Keller 2006, which includes a nice comparison of Linear Optimality Theory to other grammar types, such as Harmonic Grammar and Stochastic Optimality Theory), and Featherston’s (2005b) Decathlon model. Though these direct approaches to gradience are relatively successful at explaining continuous acceptability judgments, the direct approach leads to two complications. The first is that, even for continuous grammars, continuous acceptability is not solely driven by the grammar; it is also driven by continuous extra-grammatical factors. This means that we cannot determine constraint weights directly from acceptability judgments. 
We still need a relatively complete theory of extra-grammatical factors to help us determine how much of the continuous acceptability is due to the grammar and how much is due to factors outside the grammar. The second complication is that, if abstract constraint weights vary cross-linguistically, they must be learned by children acquiring the language. We already know from the acquisition literature that learning syntactic constraints is a complex problem in its own right. Adding weights to these constraints compounds the acquisition problem. Not only does this dramatically increase the space of possible grammars that children must explore (the space of all possible combinations of constraints and all possible combinations of weights), it also raises difficult questions about what would count as evidence for the acquisition of constraint weights. As researchers, we can measure acceptability from native speakers in an experiment, and then use that information to develop the weights for a continuous grammar; but children likely do not have access to the (presumably internal) acceptability judgments of the speakers around them. This suggests that purely abstract weights are likely only viable under a strongly nativist theory in which either the constraints are innately specified, the weights are innately specified, or both. A second option is to avoid purely abstract constraint weights, and instead link the constraint weights to continuous quantities that exist in the language system for independent reasons. The space of possible quantities that could be used to ground


constraint weights is relatively large, and therefore beyond the scope of this chapter. However, I would like to mention that one robust area of research involves using frequencies of occurrence (as estimated through natural language corpora) to derive probabilities that can be added in various ways to different grammar types to yield gradient outputs. Classic examples of this can be found in the phonology literature, where Stochastic Optimality Theory (e.g. Boersma and Hayes 2001) and Maximum Entropy grammars (e.g. Jäger 2007) provide two ways to convert frequencies into grammar-internal quantities that can yield gradience like we see with continuous acceptability. Classic examples of this can also be found in the computational linguistics literature, where various grammar types, such as context-free grammars, have long been augmented with probabilities to capture the gradience that we see in production frequencies. Hunter and Dyer (2013) have extended this work to develop probability distributions for minimalist grammars (in the sense of Stabler 1997). And Bresnan (2007) has demonstrated the utility of systematically investigating the psychological factors that can influence gradience in production through a case study of the dative alternation in English. To my knowledge, none of these frameworks have been implemented as comprehensively in service of explaining continuous acceptability as the direct approach to abstract constraint weights; but these previous studies do show that it is possible, in principle, to ground continuous acceptability in an independently motivated quantity. Assuming that we wish to pursue the strategy of explaining continuous acceptability through something like a continuous grammar, the final question we can ask is whether the best strategy is to look for ways to add continuous properties to existing grammatical architectures (e.g. 
Optimality Theory, Minimalism), or whether we should instead begin to consider new grammatical architectures altogether. The integrated connectionist/symbolic (ICS) cognitive architecture developed by Paul Smolensky and colleagues (most recently in the Harmonic Grammar framework; Smolensky and Legendre 2006) has long been at the forefront of this line of research. The ICS architecture provides a linking theory between continuously valued neural networks and categorical symbolic grammatical theories like Optimality Theory and Harmonic Grammar that are more familiar within linguistics. This raises the possibility of grounding quantities like continuous acceptability in lower-level quantities of the cognitive architecture itself, such as connection weights or Harmony. Moving in a slightly different direction, Lau, Clark, and Lappin (2017) suggest that adding probabilities directly in the grammar can not only help to capture the continuous nature of acceptability but also make grammatical architectures that linguists have typically dismissed as inadequate to capture human grammars more viable, such as n-gram models (cf. Chomsky 1956) and simple recurrent neural networks (cf. Marcus 2001) that are trained solely on surface word strings from a naturally occurring corpus. There is some debate about how successful these models are at explaining syntactic phenomena (e.g. Sprouse, Yankama, Indurkhya, Fong, and Berwick 2018), but the broader point still holds—the addition of continuous quantities to grammatical theories could motivate a re-evaluation of the adequacy of different grammatical architectures. The recent explosion of work in machine


learning using neural networks (of which Lau et al. 2017 is part) is poised to dramatically expand the range of possible theories that linguists might consider for explaining acceptability judgments (see also Warstadt, Singh, and Bowman 2018 for a neural net that was trained to perform categorical judgments).
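Returning to the direct approach to constraint weights discussed earlier in this section, the following is a minimal sketch of a weighted-constraint grammar in the general spirit of Linear Optimality Theory and Harmonic Grammar: each candidate sentence is scored by the negative weighted sum of its constraint violations. The constraint names, weights, and violation profiles are invented for illustration and do not come from Keller (2000) or any other cited work, and the sketch deliberately ignores the extra-grammatical factors that would also contribute to observed acceptability.

```python
def harmony(violations, weights):
    """Negative weighted sum of constraint violations (higher means better-formed)."""
    return -sum(weights[constraint] * count for constraint, count in violations.items())

# Hypothetical constraint weights (assumptions for illustration only).
weights = {"SUBJACENCY": 2.0, "AGREEMENT": 1.5, "WORD_ORDER": 0.5}

# Hypothetical candidates, each mapped to its constraint violation counts.
candidates = {
    "s1": {},
    "s2": {"WORD_ORDER": 1},
    "s3": {"AGREEMENT": 1, "WORD_ORDER": 1},
    "s4": {"SUBJACENCY": 1, "AGREEMENT": 1},
}

# Rank candidates from highest to lowest harmony.
ranked = sorted(candidates, key=lambda s: harmony(candidates[s], weights), reverse=True)
for s in ranked:
    print(s, harmony(candidates[s], weights))
```

Because violations add up, a model of this kind yields as many distinct grammaticality levels as there are distinct weighted sums, which is one way a grammar can produce gradient output without any appeal to processing factors.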

1.6 Conclusion


Acceptability judgments have been the primary data type in (generative) syntactic theory for over 60 years, and will likely continue to be, at the very least, a substantial component of the empirical base of syntactic theory for many years to come. This is because judgments provide the kind of information that syntacticians need to construct syntactic theories—information about which sentence types are licensed by the grammar and which are not (i.e. assuming a linking hypothesis in which grammar is one of the factors influencing acceptability judgments). Formal experimental methods for judgment collection provide another useful tool for syntacticians to probe the nature of the grammar. Formal methods can be used to resolve both methodological questions, such as questions about validity, reliability, and sensitivity, and theoretical questions, such as questions about the source of acceptability effects and the architecture of the grammar. This chapter has reviewed some of the work that has been done to date on these questions. But to my mind, the work of using formal experimental methods has really just begun. It will be up to the next generations of syntacticians to figure out how to leverage the power of formal experimental methods, across new empirical domains and new theoretical questions, in order to push the boundaries of syntactic theory.

References

Abrusán, Márta. 2011. Wh-islands in degree questions: A semantic approach. Semantics and Pragmatics 4: 1–44.
Adger, David. 2003. Core syntax: A minimalist approach. Oxford: Oxford University Press.
Armstrong, Sharon Lee, Lila R. Gleitman, and Henry Gleitman. 1983. What some concepts might not be. Cognition 13: 263–308.
Atkinson, Emily, Aaron Apple, Kyle Rawlins, and Akira Omaki. 2016. Similarity of wh-phrases and acceptability variation in wh-islands. Frontiers in Psychology 6: 2048.
Bader, Markus, and Jana Häussler. 2010. Toward a model of grammaticality judgments. Journal of Linguistics 46: 273–330.
Bard, Ellen Gurman, Dan Robertson, and Antonella Sorace. 1996. Magnitude estimation of linguistic acceptability. Language 72: 32–68.
Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68: 255–278.
Boersma, Paul, and Bruce Hayes. 2001. Empirical tests of the gradual learning algorithm. Linguistic Inquiry 32: 45–86.
Bresnan, Joan. 2007. Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Sam Featherston and Wolfgang Sternefeld (eds), Roots: Linguistics in search of its evidential base, 77–96. Berlin: Mouton de Gruyter.
Chen, Zhong, Yuhang Xu, and Zhiguo Xie. 2020. Assessing introspective linguistic judgments quantitatively: The case of The Syntax of Chinese. Journal of East Asian Linguistics 29: 311–336.
Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on Information Theory 2: 113–124.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Christensen, Ken Ramshøj, Johannes Kizach, and Anne Mette Nyvad. 2013. Escape from the island: Grammaticality and (reduced) acceptability of wh-island violations in Danish. Journal of Psycholinguistic Research 42: 51–70.
Clark, Herbert H. 1973. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior 12: 335–359.
Clifton Jr, Charles, Gisbert Fanselow, and Lyn Frazier. 2006. Amnestying superiority violations: Processing multiple questions. Linguistic Inquiry 37: 51–68.
Cowart, Wayne. 1997. Experimental syntax. Thousand Oaks, CA: Sage.
Culicover, Peter. 2001. Parasitic gaps: A history. In Peter Culicover and Paul Postal (eds), Parasitic gaps, 3–68. Cambridge, MA: MIT Press.
Dąbrowska, Ewa. 2010. Naive v. expert intuitions: An empirical study of acceptability judgments. Linguistic Review 27: 1–23.
Do, Monica L., and Elsi Kaiser. 2017. The relationship between syntactic satiation and syntactic priming: A first look. Frontiers in Psychology 8: 1851.
Edelman, Shimon, and Morten H. Christiansen. 2003. How seriously should we take Minimalist syntax? Trends in Cognitive Science 7: 60–61.
Engdahl, Elisabet. 1983. Parasitic gaps. Linguistics and Philosophy 6: 5–34.
Erteschik-Shir, Nomi. 1973. On the nature of island constraints. PhD dissertation, MIT.
Featherston, Sam. 2005a. Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115: 1525–1550.
Featherston, Sam. 2005b. The Decathlon model of empirical syntax. In M. Reis and S. Kepser (eds), Linguistic evidence: Empirical, theoretical, and computational perspectives, 187–208. Berlin: Mouton de Gruyter.
Featherston, Sam. 2007. Data in generative grammar: The stick and the carrot. Theoretical Linguistics 33: 269–318.
Fedorenko, Evelina, and Edward Gibson. 2010. Adding a third wh-element does not increase the acceptability of object-initial multiple-wh-questions. Syntax 13: 183–195.
Ferreira, Fernanda. 2005. Psycholinguistics, formal grammars, and cognitive science. Linguistic Review 22: 365–380.
Fillmore, Charles J. 1965. Indirect Object constructions in English and ordering of transformations. The Hague: Mouton.
Francom, Jerid Cole. 2009. Experimental Syntax: Exploring the effect of repeated exposure to anomalous syntactic structure. Evidence from rating and reading tasks. PhD dissertation, University of Arizona.
Frazier, Lyn, and Giovanni B. Flores d’Arcais. 1989. Filler driven parsing: A study of gap filling in Dutch. Journal of Memory and Language 28: 331–344.
Gibson, Edward, and Evelina Fedorenko. 2013. The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes 28: 88–124.
Gibson, Edward, Steven T. Piantadosi, and Evelina Fedorenko. 2013. Quantitative methods in syntax/semantics research: A response to Sprouse and Almeida (2013). Language and Cognitive Processes 28: 229–240.
Goldberg, Adele. 2007. Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.
Goodall, Grant. 2011. Syntactic satiation and the inversion effect in English and Spanish wh-questions. Syntax 14: 29–47.
Goodall, Grant. 2015. The D-linking effect on extraction from islands and non-islands. Frontiers in Psychology 5: 1493. doi:10.3389/fpsyg.2014.01493
Hill, Archibald A. 1961. Grammaticality. Word 17: 1–10.
Hiramatsu, Kazuko. 2000. Accessing linguistic competence: Evidence from children’s and adults’ acceptability judgments. PhD dissertation, University of Connecticut.
Hofmeister, Philip, Peter Culicover, and Susanne Winkler. 2015. Effects of processing on the acceptability of “frozen” extraposed constituents. Syntax 18: 464–483.
Huang, C.-T. James, Y.-H. Audrey Li, and Yafei Li. 2009. The syntax of Chinese. Cambridge: Cambridge University Press.
Hunter, Tim, and Chris Dyer. 2013. Distributions on Minimalist grammar derivations. Proceedings of the 13th Meeting on the Mathematics of Language.
Jäger, Gerhard. 2007. Maximum entropy models and stochastic Optimality Theory. In Annie Zaenen et al. (eds), Architectures, rules, and preferences: Variations on themes by Joan W. Bresnan, 467–479. Stanford, CA: CSLI.
Kayne, Richard S. 1983. Connectedness. Linguistic Inquiry 14: 223–249.
Kazanina, Nina, Ellen F. Lau, Moti Lieberman, Masaya Yoshida, and Colin Phillips. 2007. The effect of syntactic constraints on the processing of backwards anaphora. Journal of Memory and Language 56: 384–409.
Keller, Frank. 2000. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. PhD dissertation, University of Edinburgh.
Keller, Frank. 2006. Linear Optimality Theory as a model of gradience in grammar. In Gisbert Fanselow, Caroline Fery, Ralph Vogel, and Matthias Schlesewsky (eds), Gradience in grammar: Generative perspectives, 270–287. Oxford: Oxford University Press.
Kim, Boyoung, and Grant Goodall. 2016. Islands and non-islands in native and heritage Korean. Frontiers in Psychology 7: 134.
Kluender, Robert, and Marta Kutas. 1993. Subjacency as a processing phenomenon. Language and Cognitive Processes 8: 573–633.
Kush, Dave, Terje Lohndal, and Jon Sprouse. 2018. Investigating variation in island effects: A case study of Norwegian wh-extraction. Natural Language and Linguistic Theory 36: 743–779.
Langendoen, D. Terence, Nancy Kalish-Landon, and John Dore. 1973. Dative questions: A study in the relation of acceptability to grammaticality of an English sentence type. Cognition 2: 451–478.
Langsford, Steven, Amy Perfors, Andrew T. Hendrickson, Lauren A. Kennedy, and Danielle J. Navarro. 2018. Quantifying sentence acceptability measures: Reliability, bias, and variability. Glossa 3: 37.
Lau, Jey H., Alexander Clark, and Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41(5): 1201–1241.
Linzen, Tal, and Yohei Oseki. 2018. The reliability of acceptability judgments across languages. Glossa 3: 100.
Luka, Barbara J., and Lawrence W. Barsalou. 2005. Structural facilitation: Mere exposure effects for grammatical acceptability as evidence for syntactic priming in comprehension. Journal of Memory and Language 52: 436–459.
Mahowald, Kyle, Peter Graff, Jeremy Hartman, and Edward Gibson. 2016. SNAP judgments: A small N acceptability paradigm (SNAP) for linguistic acceptability judgments. Language 92: 619–635.
Marantz, Alec. 2005. Generative linguistics within the cognitive neuroscience of language. Linguistic Review 22: 429–445.
Marcus, Gary F. 2001. The algebraic mind: Integrating connectionism and cognitive science. Cambridge, MA: MIT Press.
Marty, Paul, Emmanuelle Chemla, and Jon Sprouse. 2020. The effect of three basic task features on the sensitivity of acceptability judgment tasks. Glossa 5: 1–23.
Matchin, William, Jon Sprouse, and Greg Hickok. 2014. A structural distance effect for backward anaphora in Broca’s area: An fMRI study. Brain and Language 138: 1–11.
Michel, Dan. 2014. Individual cognitive measures and working memory accounts of syntactic island phenomena. Doctoral dissertation, University of California, San Diego.
Myers, James. 2009. The design and analysis of small-scale syntactic judgment experiments. Lingua 119: 425–444.
Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349: 943.
Pañeda, Claudia, Sol Lago, Elena Vares, João Veríssimo, and Claudia Felser. 2020. Island effects in Spanish comprehension. Glossa 5(1): 1–30.
Phillips, Colin. 2006. The real-time status of island phenomena. Language 82: 795–823.
Phillips, Colin. 2009. Should we impeach armchair linguists? Japanese/Korean Linguistics 17: 49–64.
Raaijmakers, Jeroen G. W., Joseph M. C. Schrijnemakers, and Frans Gremmen. 1999. How to deal with “the language-as-fixed-effect fallacy”: Common misconceptions and alternative solutions. Journal of Memory and Language 41: 416–426.
Schütze, Carson T. 1996. The empirical base of linguistics: Grammaticality judgments and linguistic methodology. Chicago: University of Chicago Press.
Smolensky, Paul, and Géraldine Legendre. 2006. The harmonic mind: From neural computation to optimality-theoretic grammar, vol. 1. Cambridge, MA: MIT Press.
Song, Sanghoun, Jae-Woong Choe, and Eunjeong Oh. 2014. FAQ: Do non-linguists share the same intuition as linguists? Language Research 50(2): 357–386.
Sorace, Antonella. 2000. Gradients in auxiliary selection with intransitive verbs. Language 76: 859–890.
Spencer, Nancy Jane. 1973. Differences between linguists and nonlinguists in intuitions of grammaticality-acceptability. Journal of Psycholinguistic Research 2: 83–98.
Sprouse, Jon. 2007. A program for experimental syntax. Doctoral dissertation, University of Maryland, College Park.
Sprouse, Jon. 2009. Revisiting satiation. Linguistic Inquiry 40: 329–341.
Sprouse, Jon. 2011. A test of the cognitive assumptions of magnitude estimation: Commutativity does not hold for acceptability judgments. Language 87: 274–288.
Sprouse, Jon, and Diogo Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s Core Syntax. Journal of Linguistics 48: 609–652.
Sprouse, Jon, and Diogo Almeida. 2013. The empirical status of data in syntax: A reply to Gibson and Fedorenko. Language and Cognitive Processes 28: 222–228.
Sprouse, Jon, and Diogo Almeida. 2017. Design sensitivity and statistical power in acceptability judgment experiments. Glossa 2(1): 14.
Sprouse, Jon, Matt Wagers, and Colin Phillips. 2012. A test of the relation between working memory capacity and syntactic island effects. Language 88(1): 82–123.
Sprouse, Jon, Ivano Caponigro, Ciro Greco, and Carlo Cecchetto. 2016. Experimental syntax and the variation of island effects in English and Italian. Natural Language and Linguistic Theory 34: 307–344.
Sprouse, Jon, Shin Fukuda, Hajime Ono, and Robert Kluender. 2011. Reverse island effects and the backward search for a licensor in multiple wh-questions. Syntax 14(2): 179–203.
Sprouse, Jon, Carson T. Schütze, and Diogo Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134: 219–248.
Sprouse, Jon, Beracah Yankama, Sagar Indurkhya, Sandiway Fong, and Robert C. Berwick. 2018. Colorless green ideas do sleep furiously: Gradient acceptability and the nature of the grammar. Linguistic Review 35: 575–599.
Stabler, Edward P. 1997. Derivational minimalism. In Christian Retoré (ed.), Logical aspects of computational linguistics. New York: Springer.
Stepanov, Arthur, Manca Mušič, and Penka Stateva. 2018. Two (non-)islands in Slovenian: A study in experimental syntax. Linguistics 56: 435–476.
Stevens, Stanley Smith. 1956. The direct estimation of sensory magnitudes: Loudness. American Journal of Psychology 6: 1–25.
Stowe, Laurie A. 1986. Parsing WH-constructions: Evidence for on-line gap location. Language and Cognitive Processes 1: 227–245.
Sturt, Patrick. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48: 542–562.
Szabolcsi, Anna, and Frans Zwarts. 1993. Weak islands and an algebraic semantics for scope taking. Natural Language Semantics 1: 235–284.
Thurstone, Louis L. 1927. A law of comparative judgment. Psychological Review 34: 273–286.
Tucker, Matthew, Ali Idrissi, Jon Sprouse, and Diogo Almeida. 2019. Resumption ameliorates different islands differentially: Acceptability data from Modern Standard Arabic. In Matthew Tucker (ed.), Perspectives on Arabic Linguistics 30: 159–193.
van Gompel, Roger P. G., and Simon P. Liversedge. 2003. The influence of morphological information on cataphoric pronoun assignment. Journal of Experimental Psychology: Learning, Memory, and Cognition 29: 128–139.
Warstadt, Alex, Amanpreet Singh, and Samuel R. Bowman. 2018. Neural network acceptability judgments. arXiv preprint arXiv:1805.12471.
Wasow, Thomas, and Jennifer Arnold. 2005. Intuitions in linguistic argumentation. Lingua 115: 1481–1496.
Weskott, Thomas, and Gisbert Fanselow. 2011. On the informativity of different measures of linguistic acceptability. Language 87: 249–273.
Wike, Edward L., and James D. Church. 1976. Comments on Clark’s “The language-as-fixed-effect fallacy”. Journal of Memory and Language 15: 249–255.
Yoshida, Masaya, Nina Kazanina, Leticia Pablos, and Patrick Sturt. 2014. On the origin of islands. Language, Cognition and Neuroscience 29: 761–770.

Chapter 2

Acceptability judgments of binding and coreference: Methodological considerations

Elsi Kaiser and Jeffrey Runner

2.1 Introduction


Constraints on binding/coreference¹ are a key component of many aspects of syntactic theory: they contribute fundamental evidence that is relevant for theories of binding and are often also used to test other aspects of grammar, such as claims regarding syntactic structure. Thus, in addition to ‘regular’ acceptability judgments, judgments regarding the acceptability or availability of different binding/coreference options form a major data type used by syntacticians. Crucially, rather than flagging the sentence itself as (un)acceptable, with binding/coreference judgments, (un)acceptability is typically contingent on a particular relation between an anaphor and its antecedent. The typical linguistic notation, subscripts, is exemplified in (1), which conveys that ‘himself’ can be bound by the local subject Andy but not by the matrix subject Mark (according to Condition A of the Binding Theory).

(1) Mark_j told Kate that Andy_i impressed himself_i/*j

¹ In this chapter, we use the terms ‘binding’ and ‘coreference’ largely interchangeably. The semantic distinction between discourse-level coreference and semantic binding is not crucial for the methodological issues discussed here. We do not intend “coreference” to be construed as excluding “binding,” but see e.g. Bach and Partee (1980) and Reinhart (1983) on the semantic distinction often reflected in the terms “coreference” vs. “binding.”


Thus, in the case of examples like (1), the sentence itself as a whole is not unacceptable. If one were to present sentence (1) to a naïve, non-linguistically trained participant without any subscripts, they would presumably judge it to be acceptable. (In the present chapter, we follow Cowart 1997, Schütze and Sprouse 2013, and others in talking about “acceptability judgments” rather than “grammaticality judgments.” We assume that when people report whether a particular sentence is possible in their language (i.e. “acceptable”), their response is influenced both by the grammar and also by factors such as plausibility, ease of processing, and frequency—see e.g. Schütze 1996 for discussion of these issues.) The fact that sentence (1) is not unacceptable per se (only a particular coindexation is unacceptable) means that if a researcher wants to conduct experiments investigating the acceptability of different coreference and binding possibilities with non-specialist, naïve participants—as is typically done in research in psycholinguistics and cognitive psychology—the researcher needs to ensure that participants are considering the intended coreference pattern. Especially in the case of large-scale questionnaires or surveys that participants complete on their own, without extensive interaction with the experimenter (whether over the internet or in a lab setting), one runs the risk of getting uninterpretable or misleading data if participants misunderstand the task. For example, participants who misunderstood the subscript notation may rate sentences such as (1) as acceptable regardless of subscript configuration—thereby leading researchers to conclude incorrectly that reflexives can be bound by non-local subjects. Collecting reliable and interpretable acceptability judgments of binding and coreference thus requires a distinct methodology from standard acceptability judgments. 
Not surprisingly, as we discuss below, use of subscripts with non-linguistically trained participants is not recommended. (On the broader significance, not specific to coreference judgments, of using non-linguists as participants, see e.g. Dąbrowska 2010, but see also Sprouse et al. 2013.)

The structure of this chapter is as follows. We start by reviewing one of the first experimental investigations of Binding Theory, by Gordon and Hendrick (1997), which provides foundational evidence that naïve, non-syntactically trained participants can provide meaningful information about Binding Theory in an experimental setting (Section 2.1.1). In Sections 2.2 and 2.3, we turn to two fundamental methodological issues regarding the experimental investigation of binding/coreference judgments, namely: (a) how participants provide their responses (Section 2.2) and (b) how experimenters can indicate the intended coreference configuration to non-linguistically trained participants (Section 2.3). In Section 2.4, we consider other tasks used to investigate binding and coreference patterns without asking about acceptability per se (e.g. picture-selection tasks or antecedent-selection tasks) and how they can complement the acceptability-based methods described in Sections 2.2 and 2.3. (For a more in-depth discussion of non-acceptability-based tasks used to investigate coreference, see Kaiser 2021.) In Section 2.5, we turn to some general methodological considerations important for experimental work. Section 2.6 concludes the chapter by discussing open questions and future directions.


This chapter aims to provide information that is primarily helpful for linguists interested in conducting experiments on binding and coreference. With this goal in mind, we do not discuss psycholinguistic models of reference resolution, and instead focus on providing guidance for researchers already familiar with linguistic theories about binding and coreference who are interested in designing experiments to be used with naïve adult native speaker participants to assess the acceptability of different referential configurations.

2.1.1 An early investigation: Gordon and Hendrick (1997)

Gordon and Hendrick (1997) is a classic early demonstration that experimental techniques can be used with naïve, non-linguistically trained participants to test the predictions of linguistic theory. Their goal was to investigate the degree to which the conditions of the Binding Theory accurately predicted naïve native speakers’ interpretations of pronouns, reflexives and names (R-expressions). This work is important for several reasons. First, it illustrates the possibility of indicating coreference in an experimental setting. Second, it is a relatively early example of using experimental design in a study focusing on what has been considered a syntactic phenomenon (another such foundational demonstration is in Cowart 1997). And third, and most important for the purposes of this volume, it illustrates that naïve speakers can be reliable and meaningful sources of judgment data, even on the more complex judgments regarding the acceptability of coreference options.

One of Gordon and Hendrick’s foundational experiments tests Condition C, which predicts that an R-expression (e.g. a name) cannot corefer with another expression that c-commands it. The first experiment tested this claim by manipulating sequences of noun phrases and whether or not the first c-commanded the second. Simplifying somewhat, the basic design of the experiment was 3×2: 3 NP sequences (Name–Pronoun, Name–Name, Pronoun–Name) × 2 c-command levels (c-command, no c-command). This resulted in six conditions, as shown in example (2). (Gordon and Hendrick did not use subscripts in the materials they presented to participants; instead, they used boldface font to indicate coreference. We added subscripts to example (2) for ease of exposition.)

(2a) Name–Pronoun, no C-command: John_i’s roommates met him_i at the restaurant.
(2b) Name–Pronoun, yes C-command: John_i met his_i roommates at the restaurant.
(2c) Name–Name, no C-command: John_i’s roommates met John_i at the restaurant.
(2d) Name–Name, yes C-command: John_i met John_i’s roommates at the restaurant.
(2e) Pronoun–Name, no C-command: His_i roommates met John_i at the restaurant.
(2f) Pronoun–Name, yes C-command: He_i met John_i’s roommates at the restaurant.
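The 3×2 design in (2) can also be written out mechanically, which is a convenient starting point when building stimulus lists. The sketch below enumerates the six conditions and flags those in which a name is c-commanded by a coreferential first NP, which is the configuration that Condition C rules out. The variable and function names are hypothetical, and the code is only an illustration of the design, not a reconstruction of Gordon and Hendrick's materials or procedure.

```python
from itertools import product

NP_SEQUENCES = ["Name-Pronoun", "Name-Name", "Pronoun-Name"]
C_COMMAND = [False, True]

def violates_condition_c(sequence, c_command):
    """True when the second NP is a name that the coreferential first NP c-commands."""
    return c_command and sequence.endswith("Name")

# Cross the two factors; the resulting order matches examples (2a)-(2f).
conditions = []
for label, (sequence, c_command) in zip("abcdef", product(NP_SEQUENCES, C_COMMAND)):
    conditions.append({
        "example": f"2{label}",
        "sequence": sequence,
        "c_command": c_command,
        "condition_c_violation": violates_condition_c(sequence, c_command),
    })

for cond in conditions:
    print(cond)
```

Keeping the factor levels in explicit lists makes it straightforward to add a level, or a further factor, without rewriting the condition labels by hand.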

According to the Binding Theory, the sequence in which the first NP c-commands a name should induce a Condition C violation (examples 2d, 2f). This should result in participants rejecting sentences where the name is coreferential with the first NP.


elsi kaiser and jeffrey runner

Otherwise, the Binding Theory should be satisfied and the examples are predicted to be acceptable (at least as far as the Binding Theory is concerned). Gordon and Hendrick tested this with 45 undergraduate students who “had been exposed to the systematic study of language, but had not yet studied syntax” (Gordon and Hendrick 1997: 336); in other words, participants who had not received any explicit information about Binding Theory. Participants were asked to indicate whether coreference between the two boldface items was possible (see Section 2.3 for more detailed discussion of this method). There were six versions of each sentence for a total of 36 sentences to be judged; and there were six different orderings of the sentences. (See Section 2.5 for information about current best practices for stimulus presentation, randomization etc.)

As expected, participants largely accepted all Name–Pronoun sequences (ex. 2a, 2b), regardless of c-command (“yes” proportions around .99). Somewhat contrary to expectations, participants disfavored Pronoun–Name sequences (ex. 2e, 2f), showing no sensitivity to c-command: “yes” proportions were .28 for Pronoun–Name without c-command (2e) and .25 for Pronoun–Name with c-command (2f). However, Name–Name sequences were influenced by c-command in the direction predicted by Binding Theory: “yes” proportions were .37 without c-command (2c) and .17 with c-command (2d).

Gordon and Hendrick point out that if Condition C of the Binding Theory were the only constraint on coreference between various types of NPs, we would expect a substantial effect of c-command, especially in Pronoun–Name sequences, where none was found. Instead they interpret these results as supporting a Centering Theory-based approach (see e.g. Grosz, Joshi, and Weinstein 1995) which predicts a dispreference for a referent introduced in a syntactically prominent position (here a subject) to be subsequently referred to with a name.
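To make the size of the c-command effect in each NP sequence easier to compare, the reported “yes” proportions can be tabulated and differenced. This is only a descriptive sketch under stated assumptions: the Name–Pronoun cells are reported as “around .99”, so the value 0.99 is an approximation, and the identifiers `YES_PROPORTIONS` and `c_command_effect` are hypothetical. No inferential statistics are implied.

```python
# "Yes" proportions for the first experiment as reported in the text.
# The Name-Pronoun cells are given only as "around .99", so 0.99 is approximate.
YES_PROPORTIONS = {
    "Name-Pronoun": {"no_c_command": 0.99, "c_command": 0.99},
    "Name-Name": {"no_c_command": 0.37, "c_command": 0.17},
    "Pronoun-Name": {"no_c_command": 0.28, "c_command": 0.25},
}


def c_command_effect(sequence: str) -> float:
    """Drop in the "yes" proportion when the first NP c-commands the second."""
    cell = YES_PROPORTIONS[sequence]
    return round(cell["no_c_command"] - cell["c_command"], 2)


effects = {seq: c_command_effect(seq) for seq in YES_PROPORTIONS}

for seq, effect in effects.items():
    print(f"{seq}: {effect:+.2f}")
```

The differences make the asymmetry visible at a glance: a drop of .20 for Name–Name sequences versus .03 for Pronoun–Name sequences.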
The experiment described above examined Condition C, which constrains the coreferential possibilities of referential expressions. But a more central focus of syntactic theory has been on the distribution and interpretations of pronouns and reflexives. Gordon and Hendrick test the judgments on these NP types in another experiment, where they looked at sequences of coreferential NPs where NP1 either c-commands or does not c-command NP2. The experiment was conducted as a second part of another experiment, using the same procedure as the experiment described above. Forty-eight students participated. Principle A predicts that a reflexive (anaphor) should be acceptable if there is a c-commanding antecedent in the same sentence, and Principle B predicts that a pronoun should be unacceptable in the same configuration. And again, Principle C predicts that a referential expression should be unacceptable if it refers to another c-commanding expression.

acceptability judgments of binding and coreference


The experimental conditions of this study are listed in (3), along with sample stimuli:

(3a) Name–Pronoun, no C-command: Joanᵢ’s father respects herᵢ.
(3b) Pronoun–Name, no C-command: Herᵢ father respects Joanᵢ.
(3c) Name–Name, no C-command: Joanᵢ’s father respects Joanᵢ.
(3d) Pronoun–Anaphor, no C-command: Herᵢ father respects herselfᵢ.
(3e) Name–Anaphor, no C-command: Joanᵢ’s father respects herselfᵢ.
(3f) Name–Pronoun, yes C-command: Joanᵢ respects herᵢ.
(3g) Pronoun–Name, yes C-command: Sheᵢ respects Joanᵢ.
(3h) Name–Anaphor, yes C-command: Joanᵢ respects herselfᵢ.

Here the effects of c-command are robust. As predicted by Principle A, when a reflexive corefers with a c-commanding antecedent, as in (3h), the sentence is judged acceptable (.94 “yes” responses); when it does not have a c-commanding antecedent, as in (3d) and (3e), the sentence is judged unacceptable (.04, .06 “yes”, respectively). And as predicted by Principle B, when a pronoun is coreferential with a non-c-commanding antecedent, as in (3a), the sentence is acceptable (.94 “yes”); when it does have a c-commanding antecedent, as in (3f), the sentence is judged unacceptable (.06 “yes”). As in the first experiment, both sentence types with Pronoun–Name sequences are degraded, though c-command does seem to play a role (.33 “yes” without c-command (3b), .12 “yes” with c-command (3g)). And finally the one Name–Name sequence (3c) is also somewhat degraded (.62 “yes”) even though there is no c-command between the two NPs.

For our purposes, the results of these two experiments are important because they illustrate the possibility of using experimental techniques to probe theoretically interesting questions. In particular, they demonstrate that non-syntacticians who are naïve to the hypotheses being tested can reliably provide acceptability judgments that can be used to test the predictions of linguistic theories, in this case the Binding Theory principles governing the interpretation and distribution of NP types. These two experiments also serve to illustrate one of the important design decisions that must be made in developing binding experiments: how to indicate coreference to naïve (non-linguistically trained) participants. In Gordon and Hendrick’s experiments, coreference was indicated by boldface font. As we discuss in Section 2.3, there are also other ways of accomplishing this.
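The comparison between the Binding Theory predictions and the proportions reported for the conditions in (3) can be laid out as a small table in code. The sketch below is illustrative only and rests on several assumptions: the 0.5 cut-off for counting a condition as “accepted” is arbitrary, the prediction rule collapses Principles A, B and C into a single check that presupposes a local antecedent (as in all of the sample stimuli), and all identifiers are hypothetical.

```python
# condition label: (first NP, second NP, first c-commands second?, observed "yes" proportion)
EXPERIMENT_2 = {
    "3a": ("Name", "Pronoun", False, 0.94),
    "3b": ("Pronoun", "Name", False, 0.33),
    "3c": ("Name", "Name", False, 0.62),
    "3d": ("Pronoun", "Anaphor", False, 0.04),
    "3e": ("Name", "Anaphor", False, 0.06),
    "3f": ("Name", "Pronoun", True, 0.06),
    "3g": ("Pronoun", "Name", True, 0.12),
    "3h": ("Name", "Anaphor", True, 0.94),
}

ACCEPT_THRESHOLD = 0.5  # arbitrary cut-off, assumed for illustration


def predicted_acceptable(second_np: str, c_command: bool) -> bool:
    """Simplified Binding Theory prediction for a local, coreferential NP pair."""
    if second_np == "Anaphor":
        return c_command      # Principle A: needs a c-commanding antecedent
    return not c_command      # Principles B and C: must not be c-commanded


mismatches = []
for label, (_first, second, c_command, observed) in EXPERIMENT_2.items():
    predicted = predicted_acceptable(second, c_command)
    accepted = observed >= ACCEPT_THRESHOLD
    if predicted != accepted:
        mismatches.append(label)
    print(f"({label}) predicted={predicted!s:5} observed={observed:.2f}")

print("Conditions diverging from the prediction:", mismatches)
```

With this cut-off, the only condition that diverges from the categorical prediction is the Pronoun–Name sequence without c-command, in line with the degradation of Pronoun–Name sequences noted above; a binary cut-off necessarily hides gradient differences such as the somewhat degraded Name–Name condition.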

2.2 Different ways of querying the acceptability of a coreference relation


In the next two sections we consider two fundamental methodological issues regarding the experimental investigation of binding/coreference judgments, namely: (a) how participants provide their responses to the experimental stimuli (i.e., what the nature of the dependent variable is, discussed here in Section 2.2), and (b) how experimenters indicate the intended coreference configuration to participants who (presumably) are not linguists and thus are not trained in the metalinguistic subscript convention that is used to indicate coreference in theoretical linguistics (Section 2.3). (See also Kaiser, 2021, for related discussion.)

In what follows we review three possible options for eliciting participants’ acceptability judgments: (i) binary questions, (ii) scale-based questions that allow more gradient responses, and (iii) Magnitude Estimation. These methods differ both in the “grain size” that they offer to participants for making distinctions and in the mathematical properties of the resulting data, which has implications for statistical analysis (see e.g. Cowart 1997; Bard et al. 1996).

Perhaps the simplest approach is a basic yes/no two-alternative forced-choice question, where participants indicate whether a certain sentence is acceptable under the indicated coreference interpretation. This binary distinction echoes the traditional split into grammatical vs. ungrammatical. Gordon and Hendrick (1997) used a binary task in some of their studies, and instructed participants to indicate by checking a box whether a sentence “would be acceptable if the boldface NPs it contained were coreferential” (Gordon and Hendrick 1997: 337; see also Section 2.3 below).

A more rarely used but potentially very informative extension of the yes/no binary acceptability task is the speeded acceptability judgment task: In the speeded version, participants see sentences word-by-word, and each word is visible for 300–400 ms. As in the untimed version, participants provide a binary yes/no acceptability response, but now are asked to do it as quickly as possible after the end of the sentence (e.g. within a “deadline” of 2 seconds; see Bader and Häussler 2010; Wagers, Lau, and Phillips 2009).
Thus, this method provides information both about people’s final yes/no (acceptable/not acceptable) choices under time pressure, as well as the speed with which they make these decisions. A recent example of speeded acceptability being used to investigate coreference judgments comes from Lago et al. (2018), who used this method to investigate processing of possessive pronouns in German by native speakers of Spanish and English. A comparison of the reaction times in the speeded-acceptability task and a self-paced reading task that Lago et al. also conducted suggests that the reaction time component of speeded-acceptability judgment tasks can yield a sensitive means of tapping into processing. A two-alternative forced-choice task has the advantage of being relatively easy to explain to participants, but its binary nature does not offer participants any means to indicate, for any individual item, an intermediate assessment of acceptability. Use of an n-point (Likert-type) scale allows participants to indicate intermediate levels of acceptability, which can also provide more fine-grained data that could potentially pick up on more subtle effects that may not be detectable with a binary set-up. Empirically, the jury is still out on how data elicited using a binary task compares to data elicited with n-point scales (see e.g. Langsford et al. 2018; Weskott and Fanselow 2011; Sprouse and Almeida 2017 for divergent views). A scale-based approach was used by Gordon and Hendrick in their fourth experiment. They had a fully labelled six-point scale:

1. Completely Unacceptable
2. Unacceptable
3. Just Barely Unacceptable
4. Just Barely Acceptable
5. Acceptable
6. Completely Acceptable

It is worth noting that one may not need to label all six steps of the scale. In fact, Cowart (1997) recommends that in order to (increase one’s chances of being able to) elicit interval data, it is best to only label the two ends of the scale, not all scale points. The fully labelled scale above will yield (at most) ordinal data, but the statistical inferences that one can make based on interval data are more powerful. Thus, many researchers only label the two ends of the scale. While this does not guarantee interval data, it is widely viewed as a step in the right direction. (Only labelling the end points also allows one to sidestep the sometimes difficult task of trying to identify the right verbal labels for the different points on the scale.) Participants can then be instructed to use the intermediate (unlabeled) points on the scale for intermediate judgments.

In addition to scales where the two extremes are “acceptable” vs “unacceptable,” scales can also be constructed where the two extremes are two different antecedent choices (see ex. 3). These scales do not measure acceptability but instead offer a measure of which referent is preferred as the antecedent (see e.g. Kaiser 2015).

(3) Andrew heard from Bob about the picture of himself on the wall.
    Who is shown in the picture?
    Andrew 1 2 3 4 5 6 Bob

When using scales, and odd-numbered scales in particular, experimenters are faced with the question of how to interpret responses at the middle of the scale; in other words, how to distinguish uncertainty from (certainly) intermediate acceptability. Consider sentence (4) in a situation where, following Gordon and Hendrick, participants are asked to assess if the words in boldface can refer to the same person.

(4) Andrew heard from Lisa about a picture of him on the wall.
    (a) Unacceptable 1 2 3 4 5 6 7 Acceptable (odd number of points)
    (b) Unacceptable 1 2 3 4 5 6 Acceptable (even number)

In (4a), a seven-point scale is shown. If a participant strongly feels that a coreference relation between ‘Andrew’ and ‘him’ has an intermediate acceptability level, they could opt for “4”: this is a case of a participant being highly confident in the coreference relation having an intermediate acceptability level. However, what about a situation where the participant is uncertain about whether ‘Andrew’ and ‘him’ can refer to the same person? In this case, they would probably also select “4”: this is a case where a “4” occurs because of low confidence. Thus, response choices at or near the midpoint of the scale bring up questions about the danger of conflating level of acceptability with level of certainty (see also Ionin and Zyzik 2014 for related discussion). Even with even-numbered scales (4b), the same challenge can persist: When someone is uncertain about their response, they might choose a number near the middle of the scale, and when someone is certain that the sentence involves an intermediate level of acceptability, they could also choose a number near the middle of the scale. (It may be, though, that an exact midpoint, as offered by an odd-numbered scale, is more likely to be construed as a means of responding “I don’t know” (low certainty) than the middle of the scale on an even-numbered scale. This is an empirical question.)

One way to address this concern is to include an additional scale for participants to rate their confidence in their responses, or to offer a separate/additional “I don’t know” answer choice that is distinct from the scale (see e.g. Montrul, Dias, and Santos 2011; Rebuschat 2013; Ionin and Zyzik 2014). Including a separate confidence rating scale for all items in a study would effectively mean doubling the number of questions that participants need to answer, which (given that people will get tired after a large number of questions) can have consequences for the number of items that can be included per condition and thus for statistical power.

A third means of asking participants to indicate their acceptability judgment is by means of Magnitude Estimation. In contrast to binary tasks or n-point scales, Magnitude Estimation allows participants to use as fine-grained a scale as they wish, and without pre-determined end-points. Magnitude Estimation has a long tradition in psychophysics research (see Stevens 1975), and was introduced to linguistics by Bard et al. (1996) and Cowart (1997).
In a Magnitude Estimation study, participants are instructed to estimate the “magnitude” of stimuli relative to an intermediate baseline (a “reference” stimulus), by providing numerical values for each experimental stimulus that are proportional to the numerical value (“modulus”) given to the reference stimulus.² The reference sentence is typically held constant throughout the experiment. For example, in a loudness perception experiment, a participant might give the modulus sound a value of, say, 30. Then, if the next sound is judged to be twice as loud, it should receive a value of 60. If the next sound is judged to be half as loud as the modulus, it should be rated 15. Participants can use all positive numbers at whatever grain size they prefer, including fractions and decimals. More relevantly for our purposes, Keller (2000) used ex. (5) as the modulus in an experiment investigating the acceptability of sentences like (5’) and (5”). In these examples, intended coreference is shown by capitalization (see Section 2.3 for further discussion).

² However, see Sprouse (2008) and Featherston (2008) for concerns that in the linguistic domain, people fail to use the modulus as a proportional unit of measurement when judging the acceptability of sentences. A related method that avoids this issue is known as “thermometer judgments,” and has been pioneered by Featherston and colleagues (see Featherston 2008 for an overview). Participants are asked to indicate how natural a particular linguistic stimulus sounds. They are given two reference sentences (rather than just one, as is typically done in Magnitude Estimation), one that is relatively low in acceptability (assigned a score of 20) and one that is relatively high (assigned a score of 30), and asked to indicate how natural the test sentences sound given these two pre-specified points on the scale.


(5) Jill told the people HE trusts all about SAM. [modulus]
(5’) HANNA saw a photograph of HER.
(5”) HANNA saw a photograph of HERSELF.

In the domain of coreference judgments, Keller (2000) and Keller and Asudeh (2001) were the first to systematically use Magnitude Estimation in an experimental context. Experiment 1 in Keller and Asudeh (2001) tested standard Binding Theory configurations, whereas Experiment 2 turned to an area where the judgments are known to be murkier, namely picture-NPs (e.g. the picture of her/herself). Experiment 1 replicated Gordon and Hendrick’s (1997) findings, thereby confirming that Magnitude Estimation can yield reliable and interpretable data about binding/coreference judgments. Experiment 2, on picture-NPs, showed that Magnitude Estimation can yield fine-grained information about coreference in this construction as well.

Earlier, experimental linguistics researchers felt that one of the key advantages of Magnitude Estimation is that it yields interval data, which allows for more powerful statistical analyses than ordinal data, the kind of data obtained from an n-point scale. However, some subsequent work has challenged this view. For example, Weskott and Fanselow (2011) compare data obtained by means of binary acceptability judgments, a seven-point scale and Magnitude Estimation, and show that Magnitude Estimation data is not more informative than data from the other two methods (see also Sprouse 2011) and in fact can be more susceptible to spurious variance. (See also Fukuda et al. 2012, Bader and Häussler 2010, and Langsford et al. 2018 for related discussion on Magnitude Estimation.) Although these methodological comparisons did not specifically test coreference judgments, it seems reasonable to assume that their conclusions would also extend to the reference resolution domain. In sum, then, it seems that Magnitude Estimation does not have clear advantages over n-point scales, for example.
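The proportional logic of Magnitude Estimation described in this section (a modulus of 30, a rating of 60 for a stimulus judged twice as strong, 15 for one judged half as strong) can be expressed as a ratio to the modulus. The sketch below is illustrative, with hypothetical function names. The log transform is included as an assumption: it is a common analysis choice for ratio-type responses because it treats “twice” and “half” symmetrically, but it is not something the text above prescribes.

```python
import math


def relative_magnitude(rating: float, modulus: float) -> float:
    """Express a rating as a proportion of the value given to the reference stimulus."""
    if rating <= 0 or modulus <= 0:
        raise ValueError("Magnitude Estimation ratings must be positive numbers")
    return rating / modulus


def log_relative_magnitude(rating: float, modulus: float) -> float:
    """Log10 of the ratio; assumed here as one possible normalization."""
    return math.log10(relative_magnitude(rating, modulus))


modulus = 30.0
for rating in (60.0, 30.0, 15.0):
    ratio = relative_magnitude(rating, modulus)
    log_ratio = log_relative_magnitude(rating, modulus)
    print(f"rating={rating:5.1f}  ratio={ratio:.2f}  log10(ratio)={log_ratio:+.3f}")
```

On the log scale, a stimulus rated twice the modulus and one rated half the modulus end up equally far from zero, in opposite directions.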

2.3 Indicating coreference without the use of subscripts


A fundamental consideration when investigating binding/coreference judgments has to do with how to ensure that the relevant coreference relation is being considered by the participants. As mentioned at the start of this chapter, binding/coreference judgments pose a specific challenge in this regard. In contrast to many other cases where acceptability judgments are collected, in the domain of binding/coreference the researcher needs to be aware of the distinction between testing whether (a) a particular sentence is (un)acceptable or whether (b) a particular coindexation configuration is (un)acceptable. As we already saw in the discussion of Gordon and Hendrick’s work, a sentence can be acceptable under one coindexation configuration but not under another. Consider, for example, ‘Joan respects her.’ With no coindexation information given, this sentence would presumably be judged to be completely acceptable, under the reading that ‘Joan’ and ‘her’ are not coindexed. However, as Gordon and Hendrick showed, once we specify that ‘Joan’ and ‘her’ are coindexed, the sentence receives very low acceptability ratings. In theoretical linguistics, this challenge is solved by means of subscripts (*Joanᵢ respects herᵢ or Joanᵢ respects her*ᵢ/ⱼ). However, if the goal is to conduct large-scale experiments with naïve (i.e., not linguistically trained) participants, subscripts are not an ideal approach, as they may be confusing and would require considerable metalinguistic explaining. These issues are especially challenging if the experiment is being conducted remotely (e.g. over the internet) such that the experimenter does not interact directly with participants.

In this section, we consider alternative ways of indicating the intended coreference relation that have successfully been used in existing work on binding/coreference judgments in experimental syntax. As a preview, the three main alternatives we consider are (i) typographic means (e.g. boldface, capitalization, boxed words), (ii) phi-featural means (e.g. gender, number), and (iii) linguistic context.

Generally speaking, different kinds of typographic cues are the most widely used means for indicating the intended coreference relation. Prior work has used means such as (a) putting the two words whose coreference relation is being evaluated in boldface font (e.g. Gordon and Hendrick 1997) or in boldface font combined with underlining (e.g. Kaiser, Nichols, and Wang 2017), (b) writing those words in ALL CAPITALS (e.g. Keller and Asudeh 2001), (c) color-coding the two words whose coreference relation is being evaluated in the same (non-black) font color (e.g. Temme and Verhoeven 2017; Moulton et al. 2018), and (d) putting boxes or circles around the anaphor (e.g. Cunnings and Sturt 2014; 2018; Carden and Dieterich 1981).
The final box/circle option has been used to ask questions about antecedent choice in prior work, but could easily be adapted to fit an acceptability judgment task, by putting a box around the anaphor and another box around the antecedent being tested. These typographical variants are illustrated in example (6) below. (The ‘font color’ option is not properly depicted below, due to printing limitations.)

(6) (a) Boldface: John heard from Peter about the picture of himself.
    (b) Boldface and underlined: John heard from Peter about the picture of himself.
    (c) All caps: John heard from PETER about the picture of HIMSELF.
    (d) Colored font: John heard from Peter about the picture of himself.
    (e) Boxes: John heard from Peter about the picture of himself.
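When stimuli are prepared for a computer-based experiment, typographic marking of this kind is often applied automatically to a list of sentences. The following Python sketch shows one way this could be done; it is an assumption-laden illustration rather than a procedure taken from any of the studies cited. The function name `mark_coreference`, the style names, and the use of HTML tags for boldface and color are all hypothetical choices.

```python
import re


def mark_coreference(sentence: str, words: list[str], style: str = "caps") -> str:
    """Mark every whole-word occurrence of `words` in `sentence`.

    Supported styles (illustrative): "caps", "bold" (HTML), "color" (HTML).
    """
    def render(word: str) -> str:
        if style == "caps":
            return word.upper()
        if style == "bold":
            return f"<b>{word}</b>"
        if style == "color":
            return f'<span style="color: green">{word}</span>'
        raise ValueError(f"unknown style: {style}")

    # \b ensures that only whole words are matched (e.g. "him" inside "himself" is skipped).
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: render(match.group(1)), sentence)


sentence = "John heard from Peter about the picture of himself."
print(mark_coreference(sentence, ["Peter", "himself"], style="caps"))
print(mark_coreference(sentence, ["Peter", "himself"], style="bold"))
```

Because every occurrence of a listed word is marked, sentences that repeat a word (such as Name–Name items) would have both occurrences marked; items where only one occurrence should be marked would need position-based marking instead.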

It is important to acknowledge that, like subscripts, these options are still metalinguistic in the sense that they do not depict coreference in a direct, iconic way—they all require the participant to interpret a certain typographical convention as indicating that two


words refer to the same person. As a consequence, the experimenter still needs to explain (without technical jargon) the basic notion of ‘coreference’ to participants. This is especially important when experiments are conducted remotely via the internet, because in these kinds of remote contexts, participants cannot receive instant answers to potential clarification questions, and experimenters cannot observe and react to potential confusions with practice items. Let us consider some of the wording choices past work has used when explaining the coreference task to participants. Keller and Asudeh (2001)’s instructions were: “Your task is to judge how acceptable each sentence is by assigning a number to it. By acceptability we mean the following: Every sentence will contain two expressions in ALL CAPITALS. A sentence is acceptable if these two expressions can refer to the same person.” Temme and Verhoeven (2017) used the following instructions: “Do you find the sentence acceptable under the condition that the highlighted words relate to the same person?” (This is their translation of the German original: “Finden Sie den Satz akzeptabel unter der Bedingung, dass sich die beiden markierten Wörter auf dieselbe Person beziehen?”) Moulton et al. (2018) asked: “Can the two parts in green refer to the same person?” While these instructions are relatively straightforward, especially if accompanied by example items, it is worth keeping in mind that they may pose challenges for populations unfamiliar with thinking about language in metalinguistic terms. In such contexts, creating a setting where the experimenter can discuss practice items with participants can provide people with the opportunity to ask clarification questions. Crucially, the experimenter should ensure that participants are basing their responses not on whether the sentence itself is acceptable, but rather on whether it is acceptable on the specific coreference interpretation that is being indicated. 
In addition to the use of instructions and practice items, participants’ understanding of the task can also be monitored with “catch trials” during the experiment—trials designed to reveal if participants are not following instructions or not paying attention (see Section 5, below). It is also worth noting that the typographical signals described hinge on the materials being presented in writing to literate participants—the typographical cues to coreference are by definition limited to written language. As a consequence, these kinds of approaches do not allow effects of prosody/intonation to be investigated or controlled (at least not in a straightforward way). In fact, one may wonder whether certain typographical options could be interpreted as signaling specific intonational cues, e.g. prosodic focus, something that we discuss below. In addition, these methods cannot be used with non-literate or pre-literate populations, such as young children. Something else to consider with some of the typographical methods discussed here is the possibility that participants will construe the marked words as being somehow emphasized (e.g. prosodically and semantically focused). Especially as regards the status of the candidate antecedent that is being tested, this could introduce a confound. To see why, let us first note that recent work by Fraundorf et al. (2013) and Maia and Morris (2019) suggests that words presented in all-capitals evoke alternatives, indicating that they are interpreted as being contrastively focused (but note


that these studies did not test anaphora). Thus, a potential antecedent presented in all-capitals may be processed differently than it would otherwise be, and this could influence its availability to serve as an acceptable antecedent for a reflexive or pronoun. In sum, the possibility of affecting the representation of the typographically marked referent is a potential complication associated with formats associated with emphasis (such as all caps, italics, boldface, and underlining). However, these concerns may be less applicable to color-coding and boxes, which are not conventionally used to indicate emphasis/focus. (With color-coding, however, one should choose colors which are maximally clear even to participants with varying degrees of color blindness.)

Having considered the use of typographic means to indicate coreference, let us now turn to the other two options, namely phi-featural means (e.g. gender, number) and linguistic context. In languages like English where reflexives are marked for person, gender, and/or number, researchers can make use of those features to ensure that, even in sentences with more than one human referent, only one of them featurally matches the anaphor being investigated.³ Thus, in these languages, phi-features can be used to signal the intended coreference configuration whose acceptability the researcher wants to test, and the researcher can simply ask participants to rate the acceptability of the sentence. However, before turning to examples of this in English, it is important to acknowledge that in many languages, disambiguation by means of person, number, or gender may not be easily achievable. For example, the Chinese reflexive ziji ‘self’ does not mark gender, number, or person. Furthermore, even in languages like English, gender marking is not always present (e.g. with singular ‘they’) and/or may be something that a researcher prefers not to use.
The examples in (7) illustrate how, in English, gender and person features can be manipulated such that only the non-local matrix subject is a featurally matching “candidate antecedent” (7a, 7c) or that only the local, embedded subject is a featurally matching “candidate” (7b, 7d). In these cases, judgments about the acceptability of the sentence should coincide with judgments about the acceptability of the coreference relation being investigated (assuming there are no other sources of potential ungrammaticality, of course).

(7) a. Bob heard that Cindy embarrassed himself.
    b. Cindy heard that Bob embarrassed himself.
    c. Bob heard that I embarrassed himself.
    d. I heard that Bob embarrassed himself.

³ When considering the use of phi-features as a means to disambiguate the intended coreference relation, one should also be aware of existing psycholinguistic research on the question of how sensitive the language-processing system is to different kinds of phi-features (see e.g. Kaiser, 2021). Although comprehenders appear (at least in some configurations) to temporarily “ignore” structural constraints on anaphor resolution, there does not seem to be strong evidence that comprehenders “ignore” phi-features such as gender, person or number if those features are marked on the relevant anaphors in their language (see e.g. research on the gender mismatch effect, GMME).
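The logic of phi-feature disambiguation in (7) can be checked mechanically when constructing items: for each sentence, exactly one candidate antecedent should match the reflexive in person, number and gender. The sketch below is a simplified illustration under stated assumptions. The feature values assigned to the names (e.g. that ‘Cindy’ is feminine) and all identifiers are hypothetical, and the check covers feature compatibility only, not structural constraints such as locality.

```python
# Assumed phi-features for the words in example (7); None means "unspecified".
FEATURES = {
    "Bob": {"person": 3, "number": "sg", "gender": "m"},
    "Cindy": {"person": 3, "number": "sg", "gender": "f"},
    "I": {"person": 1, "number": "sg", "gender": None},
    "himself": {"person": 3, "number": "sg", "gender": "m"},
}

# Matrix and embedded subjects of the four sentences in (7).
ITEMS = {
    "7a": {"matrix": "Bob", "embedded": "Cindy"},
    "7b": {"matrix": "Cindy", "embedded": "Bob"},
    "7c": {"matrix": "Bob", "embedded": "I"},
    "7d": {"matrix": "I", "embedded": "Bob"},
}


def features_match(anaphor: str, candidate: str) -> bool:
    """True if no specified feature of the anaphor clashes with the candidate."""
    a, c = FEATURES[anaphor], FEATURES[candidate]
    for feature in ("person", "number", "gender"):
        if a[feature] is None or c[feature] is None:
            continue  # an unspecified value is compatible with anything
        if a[feature] != c[feature]:
            return False
    return True


def matching_candidates(item: dict, anaphor: str = "himself") -> list[str]:
    """Return the positions whose NP is featurally compatible with the anaphor."""
    return [position for position, np in item.items() if features_match(anaphor, np)]


for label, item in ITEMS.items():
    print(label, matching_candidates(item))
```

A check of this kind can flag items in which zero or two candidates match, which would undermine the intended disambiguation.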

One advantage of this approach, relative to the typographical convention-based approaches, is that it allows stimuli to be presented in spoken/auditory format, thereby making it possible for researchers to investigate or control effects of prosody/intonation (and to test non-literate participants such as children). However, this phi-feature-based approach is less well-suited for pronouns, because they can refer to entities mentioned outside the sentence. In fact, all four sentences in (7) would be fully acceptable if the reflexive is replaced with a pronoun (which could then be interpreted as referring to another, unmentioned third-person referent).

The examples above use gender-specific names; one could also use roles with definitional gender (e.g. king). In some situations, researchers may want to use roles with stereotypical gender if they do not want to present participants with potentially ungrammatical sentences.⁴ For example, ‘Bob heard that Cindy embarrassed himself’ is ungrammatical if we assume that Cindy is female, but ‘The surgeon heard that the nurse embarrassed himself’ may be hard to process due to the stereotype violation but is not, strictly speaking, ungrammatical (see e.g. Sturt 2003). In addition to gender and person, another feature that could be used to constrain antecedent availability with reflexives is animacy, but creating minimal pairs of the type shown above is hard with inanimate noun phrases. The number feature could also be used, but one should be careful to avoid ambiguous set-member construals which could complicate judgments. (Consider, for example, a sentence like ‘The boy said that the children embarrassed himself’ where the boy could be construed as one of the children.)

In closing, it is worth noting that coreference can also be constrained by linguistic context. For example, Featherston (2002) used linguistic context and an additional paraphrase to clarify the interpretation being tested.
His experiment probed the interpretation of German object-position pronouns and reflexives, and used items such as (8) (we added underlining to highlight the region of interest). Here, both the story and the addition of the paraphrase after “i.e.” (in German d.h. ‘das heisst’) indicate that Martin saw Martin. Participants used Magnitude Estimation to rate the naturalness of each item. This kind of approach could also be adjusted to use visual context (rather than linguistic context) to constrain the intended interpretation, e.g. to show participants an image where Martin sees Martin in the mirror.

(8) Martins neuer Bundeswehrhaarschnitt gibt ihm den Anschein eines Sträflings. Manche finden es jedoch gemein von mir, dass ich Martin sich im Spiegel gezeigt habe. (d.h. Martin sah Martin)
    ‘Martin’s new army haircut makes him look like a convict. But some people thought it was mean of me that I showed Martin himself in the mirror. (i.e., Martin saw Martin).’

⁴ Research using sentences with nouns that have stereotypical gender normally relies on pre-existing norms that provide information about the strength of different nouns’ gender bias. Unfortunately, it is still the case that many nouns referring to professions have strong gender biases.

2.4 Investigating binding and coreference patterns without asking about acceptability


In this section, we discuss tasks that make use of visual context to indicate the intended coreference relation (or to represent two possible alternatives), but do not directly ask participants to assess the acceptability of a pre-specified coreference/binding relation. These visual-context-based tasks can nevertheless provide useful complementary data for research that uses acceptability judgment tasks. They are especially informative in situations where an anaphor could be coreferential with either of two possible antecedents, i.e., both coreference configurations are rated highly acceptable.

The use of visual context to pin down the intended coreference relation is often called “scene verification” (sometimes “picture verification”) and is related to the Truth Value Judgment Task (TVJT) often used with children (e.g. Chien and Wexler 1990 and many others). In some of our prior work (Kaiser et al. 2009), we used the scene verification method to investigate the effects of verbal semantics on binding in “picture” noun phrases (e.g. picture of her/herself). In that study we were interested in how the choice of verb (told vs. heard) affected the antecedent choices for a sentence like (9), given that the verb manipulation changes who is the source and who is the perceiver of the information. This work was motivated by prior observations in the theoretical literature suggesting that reflexives tend to prefer sources-of-information and pronouns tend to prefer perceivers-of-information, and we wanted to test these predictions experimentally to assess their impact on both real-time processing and offline interpretations.

(9) Alison {told/heard from} Mary about a picture of {her/herself} on the wall.

In our scene verification method, participants listened to a sentence like (9), while viewing a scene like the one in Figure 2.1.
fig. 2.1 Scene verification display from the experiment by Kaiser et al. (2009) (character labels: Mary, Alison)

Participants had been instructed to press the “y” (for “yes”) or “n” (for “no”) key to indicate whether they thought the scene matched the sentence they heard. The critical feature of scene verification is that the experimenter can isolate a particular reading or interpretation and present that to participants, asking essentially if that interpretation is available for a particular sentence. This is similar to the approaches discussed above, where the intended reading is indicated with typographical conventions, but without the explicit task of judging coreference. That is, participants are not asked
to interpret things like bold font; they simply have to answer whether they think the sentence they are listening to is consistent with the scene they are viewing, which is admittedly a simpler and less metalinguistic task. Furthermore, in scene verification experiments, participants are not indicating whether a certain coreferential configuration is acceptable, but rather whether or not the sentence and the picture match, i.e., whether they are consistent.

In another experiment in Kaiser et al. (2009), we used a related set-up, a picture-selection task, to test the binding options in sentences like (9). It is useful to compare these two methods directly because there are differences that can be exploited depending on the theoretical question the experimenter is interested in. In the picture-selection task, participants again listened to a sentence like (9), but now saw a display like Figure 2.2, with a framed picture of each character. Thus, in contrast to scene verification, both potential antecedents (two framed pictures, in this case) are shown in the scene. Now, the task for the participant is to click on the relevant picture they believe was mentioned in the sentence they heard (that of Mary or of Alison).

fig. 2.2 Picture selection display from the experiment by Kaiser et al. (2009) (character labels: Mary, Alison)

In essence, the picture-selection task is a forced-choice task, which simply asks participants to select their preferred antecedent. But in the case of pronouns, and sometimes
reflexives, there can be multiple potential antecedents in the context. Referring back to example (9) again, Kaiser et al. (2009) found that in the picture-selection task, when the sentence contained a reflexive, participants selected the subject (Alison) much more often than the object (Mary), suggesting a strong preference for the subject antecedent for the reflexive in the picture-NP. In that study, though, we were interested in the degree to which the verb semantics affected the binding possibilities. Replacing ‘tell’ (in 9) with ‘hear’ did not have a large effect on the picture choice (though reflexives did exhibit a marginal sensitivity to the source-of-information, as predicted). Ultimately, though, the subject bias was strong enough to encourage listeners to mostly select the subject as their “preferred” choice.

However, we were interested in the degree to which the non-subject was a “possible” antecedent. The picture-selection task is not an effective way to answer this question. This is where the scene verification method becomes useful. By isolating the interpretation in which only the non-subject is the antecedent of the reflexive (as depicted in Figure 2.1), we can test the degree to which that reading is indeed available, as a possible, though perhaps dispreferred, option. In syntactic theorizing, it is often these subtler, available-but-not-preferred, interpretations that researchers are interested in. Indeed, we found, using the scene-verification method,
that with sentences like (9) and displays like Figure 2.1 the object is accepted significantly more often with ‘hear’ than with ‘tell’. In other words, if participants are directly asked whether the object is a possible antecedent (yes/no), we are able to detect that objects which are sources-of-information (with ‘hear’) are more likely to be accepted as possible antecedents than objects which are perceivers-of-information (with ‘tell’).

In sum, these two methods (scene verification and picture selection/antecedent selection) are good for answering slightly different questions. Scene verification allows the experimenter to ask if a particular reading for a given sentence is available (i.e., can a particular sentence have a specific meaning, is a particular reading acceptable?). This is because the scene isolates a particular reading visually and participants simply respond whether the sentence they hear can have that reading. If a particular reading is unacceptable, participants should respond with “no” in the scene verification task. The picture selection task allows the experimenter to ask what the preferred reading of a potentially ambiguous sentence is. Participants are asked to click on one of several candidate antecedents, and the assumption is that they will more often than not click on the option corresponding to the preferred (or more prominent) reading. In other words, the picture selection task can reveal that one of the two candidate antecedents is preferred over the other, even in a situation where both are acceptable. Considering only the results of an acceptability judgment task could lead an experimenter to miss this difference. Conversely, using only a picture-selection task could lead an experimenter to observe that one of the candidate antecedents always wins over the other, and to conclude from this that the other candidate antecedent is unavailable and/or unacceptable. That conclusion may be incorrect: the other candidate antecedent may simply be dispreferred (i.e., it “loses out” to its competitor) even if it is fully acceptable.

The scene verification and picture-selection/antecedent-selection tasks are thus best used in tandem, as they complement each other. Which is most suitable depends on the hypothesis being tested, and often using both can yield valuable information.
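The difference between the two measures can be made concrete with a small sketch of how each response type might be summarized. The function names and the toy responses below are invented for illustration only; they are not data or analysis code from Kaiser et al. (2009). The verification trials are assumed to show only the scene in which the object is the antecedent.

```python
from collections import Counter, defaultdict


def acceptance_rates(trials):
    """Scene verification: share of 'yes' responses per condition.

    trials: iterable of (condition, said_yes) pairs.
    """
    yes = Counter()
    total = Counter()
    for condition, said_yes in trials:
        total[condition] += 1
        yes[condition] += int(said_yes)
    return {c: yes[c] / total[c] for c in total}


def selection_shares(trials):
    """Picture selection: share of each chosen antecedent per condition.

    trials: iterable of (condition, chosen_antecedent) pairs.
    """
    counts = defaultdict(Counter)
    for condition, choice in trials:
        counts[condition][choice] += 1
    return {
        c: {a: n / sum(ch.values()) for a, n in ch.items()}
        for c, ch in counts.items()
    }


# Invented toy responses (hypothetical, for illustration only).
verification = [
    ("hear", True), ("hear", True), ("hear", False), ("hear", True),
    ("tell", True), ("tell", False), ("tell", False), ("tell", False),
]
selection = [
    ("hear", "subject"), ("hear", "subject"), ("hear", "subject"), ("hear", "object"),
    ("tell", "subject"), ("tell", "subject"), ("tell", "subject"), ("tell", "subject"),
]

if __name__ == "__main__":
    print(acceptance_rates(verification))
    print(selection_shares(selection))
```

In this made-up pattern the object is almost never selected when it competes with the subject, yet it is accepted on a fair share of verification trials, which is the kind of dissociation between "preferred" and "possible" that the two tasks are meant to separate.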

2.5 General methodological considerations


So far, we have touched upon some of the specific considerations that are relevant when designing experiments on coreference. However, there are also broader “best practices” to keep in mind when conducting linguistic experiments, in order to ensure that the data is as interpretable and meaningful as possible. We do not attempt to give a full discussion of these topics here, but will mention some of
them and point the interested reader to further resources that have more in-depth discussion. One of the basic questions concerns how to word instructions. Often, instructions explain that we are not interested in prescriptive rules, in real-world plausibility/truth, or in how understandable a sentence is—i.e., these considerations should not play a role in participants’ responses. For example, if the task is to rate acceptability, a sentence like ‘Pink elephants are flying overhead’ should be rated highly acceptable, despite its real-world implausibility. A sentence like ‘me eat sandwich Sunday’ should be rated low in acceptability, despite the fact that the intended meaning is clear. A common approach is to instruct participants to consider whether someone saying the sentence would sound like a native speaker of the language in question (e.g. Schütze and Sprouse 2013). Beyond these basic considerations, the extent to which the details of instructions matter presumably depends a lot on the complexity of the task that the participant is being asked to do. For a regular acceptability judgment task—which, as Schütze and Sprouse (2013) note, is “intuitively natural for participants” (p. 37)—the consensus seems to be that instructions matter relatively little (see e.g. Cowart 1997; Schütze and Sprouse 2013). However, it seems plausible that if the intended task is easily confusable with another task, the instructions may matter more—due to the risk of a participant doing a task other than what the experiment intended. For many of the coreference tasks we discuss in this chapter, there is a risk that if participants skip over the instructions, they may end up rating the acceptability of the sentence (a pretty intuitive task) rather than the acceptability of the indicated coreference relation (arguably a less intuitive task). Thus, it seems prudent to strive for clear yet brief instructions, ideally with some example items. 
Crucially, if example items are included, it is important to avoid inadvertently biasing participants’ responses: We do not want to “teach” participants to respond in accordance with our predictions, so any example items should not create biases regarding the experimental manipulations being tested. That said, Gordon and Hendrick (1997) compared “reflective instructions” (encouraging participants to reflect on the sentences before responding) and “immediate instructions” (encouraging participants to respond immediately). This dimension is different from the “task mix-up” concern mentioned above, but it is interesting to note that although Gordon and Hendrick found no main effects of instruction type, a closer look at the different conditions suggests that in some conditions, the reflective instructions elicited stronger effects of c-command than the immediate instructions.

In addition to ensuring that the intended task is clear, experimenters may want to include some “catch trials” (distributed throughout the study) that are not related to the question being investigated but that would allow them to detect if a particular participant has misunderstood the instructions from the very start (e.g. is rating overall
sentence acceptability instead of acceptability of coreference) or perhaps forgets the instructions and shifts to rating overall acceptability at some point in the experiment. More generally, especially in internet studies that are done remotely, it can be helpful to include catch items that function as attention checks (unrelated to the targets) that allow experimenters to detect participants who are, for example, simply responding randomly and not attending to the task. Participants who fail to meet a required performance threshold on the “attention check” items can then be excluded, using a criterion that is independent of their performance on the target items.

A related consideration is the use of filler/distractor items, in other words, items that are not related to the manipulation/issue being tested. Fillers have multiple purposes (see e.g. Schütze and Sprouse 2013). First, they can play an important role in camouflaging the targets and thus address the concern that if participants realize that a particular phenomenon is being investigated, they may start to respond strategically, in ways that diverge from how they would normally process or assess a certain kind of sentence. Because the experimenters’ aim is to tap into how people normally process sentences, we do not want people to adopt particular response strategies that depart from the norm. Fillers can also play a key role in controlling the “landscape” of participants’ responses: in other words, in making sure that participants encounter items that are expected to elicit high, low, and intermediate ratings on a Likert scale, or that, in a yes/no task, roughly equal proportions of yes and no responses are elicited across the course of the experiment. Imagine a study where a participant realizes, after a while, that almost every response they are giving is 5 or above on a 6-point scale.
They may then start to worry about “skewing too high” and may start to compensate by giving low ratings to sentences which, in a different context, they may have rated high. Fillers can play a key role in avoiding potential compensatory effects by eliciting a range of different response types (for related discussion, see e.g. Sprouse 2009 on the “Equalization Strategy”). In addition, interleaving targets with fillers can help minimize potential effects of syntactic satiation/adaptation or syntactic priming, both of which could distort the data.

Finally, let us turn to one of the most basic questions, concerning the number of data points: How many target items are enough? How many participants are enough? In other words, how big does the sample size need to be, in order for us to detect an effect, assuming that one exists? Ultimately, the answers to these questions depend on various factors, such as the magnitude of the effect being investigated (some contrasts are more subtle than others), the method being used (some methods have been shown to be more sensitive than others), and the amount of inter- and intra-participant variability (see e.g. Sprouse and Almeida 2017 regarding acceptability judgements in particular; Brysbaert 2019 regarding experiments more broadly). If no effect is detected (i.e., if the result is a “null result”), this does not conclusively mean that there is no effect: It could simply not have been detectable due to lack of statistical power. Indeed, when trying to make decisions about the number of items and participants, it is important to be aware of the emerging concern among psycholinguists that many
past experiments in the general field of cognitive psychology and psycholinguistics have tended to be underpowered. Without focusing specifically on experimental syntax or binding/coreference, Brysbaert (2019) provides helpful rules of thumb about having sufficient numbers of data points; see also Sprouse and Almeida (2017) for discussion of acceptability judgments in particular. These issues are thus worth considering carefully during the design phase of a study.

To avoid showing the same person multiple versions of the same few sentences, experimenters typically create multiple items that have the same structural properties (e.g. a reflexive inside a picture NP is marked as coreferential with the matrix subject) but use different words (different lexicalizations). In this way the experimenter obtains repeated data points about how each participant rates the structural/referential configuration of interest, without the participant being exposed to (essentially) the same sentence multiple times. When multiple lexicalizations and multiple conditions are investigated, they need to be presented to participants in a principled and systematic way. The most common approach in experimental syntax is the Latin-Square distribution (see e.g. Abbuhl, Gass, and Mackey 2013); but a detailed discussion of these considerations is beyond the scope of this chapter.
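As a rough illustration of what a Latin-square distribution involves, the sketch below builds presentation lists by rotating conditions across items, so that each list contains every item exactly once and each item appears in every condition across lists. The function name, the item count, and the condition labels (a 2x2 crossing of verb and anaphor type, loosely modeled on (9)) are assumptions made for the example, not a description of any particular study's materials.

```python
def latin_square_lists(n_items, conditions):
    """Build one presentation list per rotation of the conditions.

    Each list pairs every item index with exactly one condition; across
    the lists, every item occurs once in every condition.
    """
    k = len(conditions)
    if k == 0 or n_items <= 0 or n_items % k != 0:
        raise ValueError("n_items must be a positive multiple of the number of conditions")
    return [
        [(item, conditions[(item + shift) % k]) for item in range(n_items)]
        for shift in range(k)
    ]


if __name__ == "__main__":
    # Hypothetical condition labels and item count, for illustration only.
    conds = ["told/her", "told/herself", "heard/her", "heard/herself"]
    for number, plist in enumerate(latin_square_lists(8, conds), start=1):
        print(f"List {number}: {plist}")
```

Each participant would then be assigned to one list, with the target items interleaved with fillers and the order randomized; those steps are left out of the sketch.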

2.6 Future outlook and open directions


As we have seen over the course of this chapter, experimental investigations of acceptability judgements about coreference and binding face some specific but by no means insurmountable challenges. Experimenters have developed multiple means of indicating coreference/binding relations to participants who are not linguistically trained, thereby avoiding the use of potentially confusing subscripted indexes while still allowing naïve participants to provide meaningful data. Multiple means of measuring participants’ acceptability judgements are also available, and have been thoroughly tested in prior work. Furthermore, related methods such as scene verification and picture-selection paradigms can provide complementary data that also allows for the use of spoken stimuli and less metalinguistic tasks. However, many questions—both theoretical and methodological—remain open and many phenomena and languages remain under-investigated. Although experimental work has continually and gradually broadened its linguistic coverage, most work still tends to be done on a fairly small set of widely spoken languages. Given the crosslinguistic variation in anaphoric paradigms that exist across languages, both in terms of their morphological complexity and their syntactic behavior, much still remains to be discovered. Further crosslinguistic investigations can inform our understanding of the extent to which certain types of anaphoric expressions have universal properties vs. exhibit language-specific variation.


Furthermore, even in the case of well-studied languages such as English, there are many phenomena related to coreference and binding that have been discussed in the theoretical literature but have received much less experimental attention. This includes semantic phenomena such as donkey anaphora, and syntactic phenomena outside the purview of core Binding Theory such as pronouns and reflexives in locative PPs (e.g. Mary saw a snake near her/herself). Especially in situations where native-speaker judgments are potentially murky, an experimentally oriented approach has the potential to provide valuable complementary data that inform syntactic theorizing.

References

Abbuhl, R., S. M. Gass, and A. Mackey. 2013. Experimental research design. In R. Podesva and D. Sharma (eds), Research methods in linguistics, 116–135. Cambridge: Cambridge University Press.
Bach, E., and B. H. Partee. 1980. Anaphora and semantic structure. In Jody Kreiman and Almerindo E. Ojeda (eds), Papers from the parasession on pronouns and anaphora, 1–28. Chicago: Chicago Linguistic Society.
Bader, M., and J. Häussler. 2010. Toward a model of grammaticality judgments. Journal of Linguistics 46: 273–330.
Bard, E. G., D. Robertson, and A. Sorace. 1996. Magnitude estimation of linguistic acceptability. Language 72: 32–68.
Brysbaert, M. 2019. How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition 2(1), Article 16, 10.5334/joc.72. https://psycnet.apa.org/record/2019-45517-001.
Carden, G., and T. Dieterich. 1981. Introspection, observation, and experiment: an example where experiments pay off. In P. D. Asquith and R. N. Giere (eds), PSA 1980: Proceedings of the 1980 Biennial Meeting of the Philosophy of Science Association, 583–597. Chicago: University of Chicago Press.
Chien, Y-C., and K. Wexler. 1990. Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1: 225–295.
Cowart, W. 1997. Experimental syntax: Applying objective methods to sentence judgments. Thousand Oaks, CA: Sage.
Cunnings, I., and P. Sturt. 2014. Coargumenthood and the processing of reflexives. Journal of Memory and Language 75: 117–139.
Cunnings, I., and P. Sturt. 2018. Coargumenthood and the processing of pronouns. Language, Cognition and Neuroscience 33(10): 1235–1251.
Dąbrowska, E. 2010. Naïve v. expert intuitions: An empirical study of acceptability judgments. Linguistic Review 27: 1–23.
Featherston, S. 2002. Coreferential objects in German: Experimental evidence on reflexivity. Linguistische Berichte 192: 457–484.
Featherston, S. 2008. Thermometer judgments as linguistic evidence. In Claudia Maria Riehl and Astrid Rothe (eds), Was ist linguistische Evidenz?, 69–90. Aachen: Shaker.
Fraundorf, S. H., A. S. Benjamin, and D. G. Watson. 2013. What happened (and what didn’t): Discourse constraints on encoding of plausible alternatives. Journal of Memory and Language 69: 196–227.
Fukuda, S., G. Goodall, D. Michel, and H. Beecher. 2012. Is magnitude estimation worth the trouble? In Jaehoon Choi et al. (eds), Proceedings of the 29th West Coast Conference on Formal Linguistics, 328–336. Somerville, MA: Cascadilla.
Gordon, P. C., and R. Hendrick. 1997. Intuitive knowledge of linguistic co-reference. Cognition 62: 325–370.
Grosz, B. J., A. K. Joshi, and S. Weinstein (eds). 1995. Centering: A framework for modelling the local coherence of discourse. Computational Linguistics 21: 203–226.
Ionin, T., and E. Zyzik. 2014. Judgment and interpretation tasks in second language research. Annual Review of Applied Linguistics 34: 1–28.
Kaiser, E. 2015. Perspective-shifting and free indirect discourse: Experimental investigations. In Sarah D’Antonio, Mary Moroney, and Carol Rose Little (eds), Proceedings of 25th Semantics and Linguistic Theory (SALT 25), 346–372. Linguistic Society of America and Cornell Linguistics Circle. http://journals.linguisticsociety.org/proceedings/index.php/SALT/issue/view/132.
Kaiser, E. 2021. Anaphora: experimental methods for investigating coreference. In G. Goodall (ed), Cambridge Handbook of Experimental Syntax, 278–314. Cambridge: Cambridge University Press.
Kaiser, E., J. Runner, R. Sussman, and M. Tanenhaus. 2009. Structural and semantic constraints on the resolution of pronouns and reflexives. Cognition 112: 55–80.
Kaiser, E., J. Nichols, and C. Wang. 2017. Experimenting with imposters: What modulates choice of person agreement in pronouns? In Uli Sauerland and Stephanie Solt (eds), Proceedings of Sinn und Bedeutung 22, vol. 1, 505–521. Leibniz-Zentrum Allgemeine Sprachwissenschaft.
Keller, F. 2000. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. PhD thesis, University of Edinburgh.
Keller, F., and A. Asudeh. 2001. Constraints on linguistic coreference: Structural vs. pragmatic factors. In J. Moore and K. Stenning (eds), Proceedings of the 23rd annual conference of the Cognitive Science Society, 483–488. Mahwah, NJ: Lawrence Erlbaum.
Lago, S., A. Stutter Garcia, and C. Felser. 2018. The role of native and non-native grammars in the comprehension of possessive pronouns. Second Language Research 35(3): 319–349.
Langsford, S., A. Perfors, A. Hendrickson, L. Kennedy, and D. Navarro. 2018. Quantifying sentence acceptability measures: Reliability, bias, and variability. Glossa 3(1): 1–34.
Maia, J., and R. Morris. 2019. The semantics–pragmatics of typographic emphasis in discourse. Poster presented at the 32nd Annual CUNY Conference on Human Sentence Processing. Available at: https://www.colorado.edu/event/cuny2019/sites/default/files/attachedfiles/a3_maia_morris.pdf.
Montrul, S., R. Dias, and H. Santos. 2011. Clitics and object expression in the L3 acquisition of Brazilian Portuguese: Structural similarity matters for transfer. Second Language Research 27: 21–58.
Moulton, K., Q. Chan, T. Cheng, C.-H. Han, K. Kim, and S. Nickel-Thompson. 2018. Focus on cataphora: Experiments in context. Linguistic Inquiry 49(1): 151–168.
Rebuschat, P. 2013. Measuring implicit and explicit knowledge in second language research. Language Learning 63: 595–626.
Reinhart, T. 1983. Anaphora and semantic interpretation. London: Croom Helm.
Schütze, C. 1996. The empirical base of linguistics. Chicago: University of Chicago Press.
Schütze, C., and J. Sprouse. 2013. Judgment data. In Robert J. Podesva and Devyani Sharma (eds), Research Methods in Linguistics, 27–50. Cambridge: Cambridge University Press.
Sprouse, J. 2008. Magnitude estimation and the non-linearity of acceptability judgments. In N. Abner and J. Bishop (eds), Proceedings of the West Coast Conference on Formal Linguistics (WCCFL), 397–403. Somerville, MA: Cascadilla Proceedings Project.
Sprouse, J. 2009. Revisiting satiation. Linguistic Inquiry 40(2): 329–341.
Sprouse, J. 2011. A test of the cognitive assumptions of magnitude estimation: Commutativity does not hold for acceptability judgments. Language 87(2): 274–288.
Sprouse, J., C. T. Schütze, and D. Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134: 219–248.
Sprouse, J., and D. Almeida. 2017. Design sensitivity and statistical power in acceptability judgment experiments. Glossa: A Journal of General Linguistics 2(1): 14. 1–32, https://doi.org/10.5334/gjgl.236.
Stevens, S. 1975. Psychophysics: Introduction to its perceptual, neural, and social prospects. New York: John Wiley.
Sturt, P. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48: 542–562.
Temme, A., and E. Verhoeven. 2017. Backward binding as a psych effect: A binding illusion? Zeitschrift für Sprachwissenschaft 36(2): 279–308.
Wagers, M. W., E. F. Lau, and C. Phillips. 2009. Agreement attraction in comprehension: Representations and processes. Journal of Memory and Language 61(2): 206–237.
Weskott, T., and G. Fanselow. 2011. On the informativity of different measures of linguistic acceptability. Language 87(2): 249–273.

chapter 3

(quantifier) scope judgments

kriszta eszter szendrői

3.1 Introduction


3.1.1 Quantifier scope interactions: A semantic ambiguity?

Sentences like (1a)–(4a) are arguably ambiguous in English. They contain quantifiers and operators whose interaction gives rise to the ambiguities illustrated in (1b)–(4b) and (1c)–(4c). The (b) examples illustrate what is usually called the overt scope reading, where the surface c-command relations between the quantifiers and operators are reflected in their scopal relations. This reading is also sometimes called “isomorphic,” again, because syntactic structure and semantic scope are matched. The (c) examples give the so-called inverse scope readings, where the surface c-command relations are reversed: here the scopal relations in meaning are the opposite of the c-command relations in surface syntax. The readings that involve sets of individuals for both quantifiers, rather than a single individual for the existential and a set of individuals for the universal, i.e. (1c) and (2b), are also sometimes called distributive readings.

(1) a. A doctor advised every nurse.
    b. a > every: There is a doctor that advised every nurse.
    c. every > a: For every nurse there is a doctor that advised her, but not necessarily the same doctor.

(2) a. Every nurse assisted a doctor.
    b. every > a: Every nurse assisted a doctor, but not necessarily the same doctor.
    c. a > every: There is a (specific) doctor and every nurse assisted him/her.

(3) a. All doors in this car will not open at the next station.1
    b. all > not: All the doors in the carriage will not open at the next station.
    c. not > all: It is not the case that all the doors in this carriage will open at the next station.

(4) a. The detective didn’t find two guys.
    b. not > two: It is not the case that the detective found two guys.
    c. two > not: There are two guys that the detective did not find.
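For readers who find a formal rendering helpful, the two readings of each example can be sketched in first-order notation. The predicate names, the constant d for the detective, and the treatment of two as "at least two" are simplifying assumptions made here, not part of the original examples; the block assumes the amsmath package.

```latex
\begin{align*}
\text{(1b)}\quad & \exists x\,[\mathrm{doctor}(x) \land \forall y\,[\mathrm{nurse}(y) \rightarrow \mathrm{advised}(x,y)]] \\
\text{(1c)}\quad & \forall y\,[\mathrm{nurse}(y) \rightarrow \exists x\,[\mathrm{doctor}(x) \land \mathrm{advised}(x,y)]] \\
\text{(2b)}\quad & \forall x\,[\mathrm{nurse}(x) \rightarrow \exists y\,[\mathrm{doctor}(y) \land \mathrm{assisted}(x,y)]] \\
\text{(2c)}\quad & \exists y\,[\mathrm{doctor}(y) \land \forall x\,[\mathrm{nurse}(x) \rightarrow \mathrm{assisted}(x,y)]] \\
\text{(3b)}\quad & \forall x\,[\mathrm{door}(x) \rightarrow \lnot\,\mathrm{open}(x)] \\
\text{(3c)}\quad & \lnot\,\forall x\,[\mathrm{door}(x) \rightarrow \mathrm{open}(x)] \\
\text{(4b)}\quad & \lnot\,\exists x\,\exists y\,[x \neq y \land \mathrm{guy}(x) \land \mathrm{guy}(y) \land \mathrm{found}(d,x) \land \mathrm{found}(d,y)] \\
\text{(4c)}\quad & \exists x\,\exists y\,[x \neq y \land \mathrm{guy}(x) \land \mathrm{guy}(y) \land \lnot\,\mathrm{found}(d,x) \land \lnot\,\mathrm{found}(d,y)]
\end{align*}
```

In each pair the two formulas contain the same material and differ only in the relative order of the two scope-bearing elements.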

It has long been debated in the linguistic literature how best to represent such ambiguities in syntax and semantics, or in fact whether to represent them at all. Ambiguities are present in other sentences as well. For instance, the sentence in (5) is recognized to be ambiguous between a reading where the PP modifier attaches to the NP and a reading where it modifies the VP.

(5) a. I see the man with the binoculars.
    b. NP-modifier reading: the man has binoculars
    c. VP-modifier reading: I see the man through binoculars

This ambiguity thus arises because of the different syntactic positioning of the PP modifier. There is in fact no doubt that the sentence is ambiguous in this way, as the two readings have distinct truth conditions, i.e. they are true in different situations. But in sentences like (2), the readings are not so clearly distinct. In fact, one reading is logically entailed by the other: In any situation where a specific doctor is being assisted by every nurse, it is also true that every nurse assists a doctor. In other words, in any situation where (2c) is true, (2b) is also always true. This fact has been used to argue against the presence of a real semantic ambiguity in such sentences. In order to see how the argumentation goes let us consider the schematic situations in (6). (6)

1 Announcement on London’s Hammersmith and City underground line before Baker Street station, in the final carriage, which is only partially aligned with the platform at that station. The announcement is presumably intended with the inverse scope reading as it continues with ‘Please use other doors’.

On the face of it, (6a) depicts the inverse scope reading in (2c), while (6b) depicts the overt scope reading in (2b). But let us assume for argument’s sake that (2a) is not actually ambiguous, and that the only reading that is available is the overt scope reading, where surface c-command relations correspond to the semantic scope of the quantifiers, i.e. (2b). This reading is true in both situations depicted in (6). So, the argument goes, there is no distinct inverse scope reading of (2); rather, the situation in (6a) is just a special case of the distributive type of situations, one where the doctors assisted by the nurses happen to be a single individual. This argumentation can thus be used to argue that in semantics, (2a) only has one reading, the overt scope one, (2b).

The same issue arises with respect to (1a). Here, however, the more general situation (i.e. the one involving (potentially) different doctors, i.e. (6b)) corresponds to the reading that represents inverse scope, i.e. (1c). The overt scope reading, (1b), corresponds to the more specific situation, where the doctors who advise the nurses happen to be a single individual. So, if we want to treat these two situations as compatible with a single semantic meaning, as we did with sentences like (2), then we are forced to assign the inverse scope reading, i.e. (1c), as the only meaning of (1a), and thus to abandon the parallelism between surface syntactic c-command relations and scopal order in the semantics. But this is counterintuitive. Why would two quantifiers take obligatorily overt scope in sentences like (2) but inverse scope in sentences like (1)? One could try to posit that the quantifier every always takes wide scope over an existential quantifier and thus both (1c) and (2b) would be derived, with (1b) and (2c) being special cases of them, respectively. This would be a good fix here, but this does not seem to hold generally.
In sentences like (3), native speakers clearly easily access both the overt scope reading in (3b), with the universal taking wide scope, and also the inverse scope reading in (3c), where the negator takes scope over the universal. The same holds for (4). In any case, (3b) and (3c), and (4b) and (4c) are not in entailment relations. The truth conditions of (3b) are orthogonal to the truth conditions of (3c): one can conjure up a situation that makes (3b) true and (3c) false, and vice versa. The same holds for (4b) and (4c). So, (3) and (4) are similar to the PP-modifier ambiguity sentence in (5). It is natural to assume then that just like (5), at least (3) and (4) would involve a proper semantic ambiguity. The only difference is that in (3) and (4) the ambiguity is not due to the syntactic position of a PP-modifier, but rather to the relative scopal relations of a quantifier and negation. But then it seems to lack generality to then not propose the same kind of scopal ambiguity to be present in sentences with two quantifiers like (1) and (2). In addition, it is also the case that speakers have a psychological sense of ambiguity in examples like (1–4). What I mean by that is that the sentences are recognized to be ambiguous by speakers. Once they are encouraged to engage in metalinguistic considerations about the meaning of the sentences, they see them as ambiguous between the overt scope reading and the inverse scope reading. Speakers can offer judgments based on the distinct readings. Very often, like in the case of (1), they have strong preferences.

kriszta eszter szendrői

Anyone who has ever taught the grammar of such sentences to first-year undergraduates can testify that many of them insist that they only allow the overt scope reading, at least until they are confronted with syntactically analogous2 but lexically biased examples like Hirschbühler’s (1982) famous sentence, in (7), where the inverse scope reading “shines through”:

(7) An American flag was hanging in front of every building.

It is also often claimed (see e.g. Jackendoff 1972), although empirically this has proven difficult to substantiate (see Syrett, Simon, and Nisula 2014 for a successful attempt), that appropriate prosody can be used to disambiguate the two scopal readings:

(8) All the men didn’t eat.
    a. falling tune: overt scope
    b. rising tune: inverse scope

In sum, eventually, a consensus was reached in the field that the ambiguities illustrated in examples (1–4) are semantically real and should be represented in our grammar of English.

2 Near-analogous, to be precise, as such examples often have verbs with prepositional complements rather than NP direct object arguments.

3.1.2 Research questions

One of the first questions that then arises is whether the ambiguities should be represented in the syntax and if so, how. Recall the examples with PP modifiers like with binoculars, i.e. (5). These examples represent structural ambiguities, where the syntactic position of the PP determines which reading the sentence actually has. It is not immediately obvious what kind of structural account one could give of the scopal ambiguities in (1–4), or whether the same analysis should be adopted for all four types of sentences. As we will see later, experimental work can be illuminating in this respect.

Another group of questions concerns how people understand such ambiguous sentences in real time. Going back again to the structural ambiguity in sentences like I see the man with the binoculars, it is well known that people prefer (at least temporarily, while they are parsing the sentence) the reading where the PP modifies the whole VP. One may ask if there are similar preferences in the case of scopal ambiguity. Does the hearer commit to one of the readings early on during parsing, say to the overt scope reading? Or do hearers maintain the ambiguity until disambiguating evidence is encountered and only then commit to one of the readings? And what exactly counts as disambiguating evidence?

Another important, and potentially related, issue concerns the role of context. The role of context is especially relevant because it is clear that the two readings associated
with sentences involving an existential and a universal are not equal in terms of conceptual representation. Recall that the distributive scope reading (i.e. 1c and 2b) involves, or at least potentially involves, a set of individuals taking part in a series of events, while the wide scope existential reading necessitates the presence of only one such individual. Crain and Steedman (1985) and Altmann and Steedman (1988) argued that the preferred interpretation adopted by the parser for any kind of ambiguity is the one that carries the fewest unsatisfied presuppositions. In other words, hearing a sentence like (1) out of the blue, with no previous discourse context, the parser would prefer to assign the overt scope interpretation because that requires accommodating the existence of a single doctor in the discourse context, while the inverse scope reading (at least potentially, and according to Fodor (1982) preferentially) requires the accommodation of a set of doctors. But crucially, Crain and Steedman (1985) demonstrate that such preferences are dependent on discourse context. If the sentence is not parsed out of the blue, but rather embedded in a discourse context that already establishes the existence of a salient set of doctors, then the parser should no longer have a preference for the overt scope reading. Taking that line of logic one step further, if the context is designed in such a way that it is heavily biased towards a distributive interpretation, perhaps one would even expect an inverse scope parsing preference for sentences like (1). Based on such considerations, one might ask if speakers’ interpretation is sensitive to the discourse context in which the sentences occur. Do people have less of a preference for, say, the overt scope interpretation in a discourse situation that strongly favors an inverse scope interpretation? Do hearers actually parse such ambiguous sentences differently in contexts that favor one of the readings than in null contexts? 
More generally, one might ask if there is an extra processing cost associated with one of the readings. If so, that would be relevant for our linguistic analysis of these ambiguities. If one of the readings is easier to compute than the other, perhaps that is because that reading is syntactically simpler than the other. Finally, do children parse such sentences the same way as adults? Are they aware of the ambiguities, or can they be made aware of them in the same way as adults? Do they have a preference for one of the readings? Is it an even stronger preference than that of adults? Given that they have a developing grammar as well as fewer processing resources, one might expect children to behave differently from adults under certain scenarios. Again, this kind of information would ultimately be useful for determining the correct linguistic analysis of such ambiguities.

The rest of this chapter will review the psycholinguistic literature concerning scopal ambiguities in search of answers to these questions. We will consider how speakers assign the different readings to these scopally ambiguous sentences. We will see if some sentences are treated differently from others, and we will consider the time course of the parsing process. We will investigate the potential role different discourse contexts have in influencing people’s interpretation of the sentences, and whether children are different from adults in any of these respects.

3.1.3 Roadmap

Recall that a crucial difference presents itself in that the two readings obtained by the different scopal orderings of an existential and a universal quantifier are not distinct, and that this issue does not arise in the case of sentences with a quantifier and negation. Although, as we have established above, both types contain real, semantic scopal ambiguity, it turns out that speakers do not treat these two types of scopal ambiguities in exactly the same way. This directly influences the linguistic analyses we should adopt for different types of scopally ambiguous sentences. The (non-)distinctness of the readings also presents a methodological issue. Since it is logically impossible, for instance, to create a situation where (1b) (There is a doctor that advised every nurse) is true and (1c) (For every nurse there is a doctor that advised that nurse) is false, it is problematic to test such non-distinct readings in any experimental task dependent on the truth conditions of the sentence, such as the Truth-Value Judgment Task (Crain and McKee 1985). This is because the expected response we associate with the existential-wide-scope reading (i.e. 1b) will also apply to the distributive reading (i.e. 1c). This considerably restricts the methodological toolbox that we can apply to such sentences. For these reasons, we will divide our review into two parts, the first part involving sentences with an existential quantifier and a universal quantifier and the second part involving sentences with a quantifier and negation.

In Section 3.2, we will look at sentences with an existential and a universal quantifier, like (1) and (2) above. We will find that forced-choice questionnaire studies reveal a preference for overt scope in examples like (1), and that a series of self-paced reading tasks showed that inverse scope in examples like (1) comes with a processing cost (Anderson 2004).
This can be taken to support a syntactic (or semantic) analysis of such examples where the inverse scope reading is syntactically (or semantically) more complex than the overt scope reading. We will also review a growing body of evidence that children do access inverse scope readings in various types of sentences involving scopal ambiguities, even when the same ambiguity is not present in the adult language, i.e. in so-called rigid-scope languages.

In Section 3.3, we will look at examples with a quantifier and negation, like (3) and (4) above. We will demonstrate that such examples are preferentially interpreted with surface scope in a task called the Incremental Verification Task (Conroy 2008). The same preference was found in a speeded forced-choice task, while the non-speeded forced-choice task and a sentence completion task yielded no preference for either reading. In addition, we will see that adults have a preference for the inverse scope reading with examples like (3) in the truth-value judgment task, and that their surface-scope bias can be successfully alleviated in the incremental verification task by priming (Conroy 2008). We will consider the so-called Parser Hypothesis, which proposes that adults and children have an intrinsic parsing bias for overt scope. We will also consider the Extra-Linguistic Hypothesis, which assumes that the observed parsing biases are not the result of genuine preferences in the parser for the overt scope reading, but are rather emergent results of
the interplay of various extra-linguistic factors, such as how different tasks place different demands on the hearer and how they lead to different verification strategies in different sentences. Throughout we will consider data from adults and children alongside each other. We will review the literature on the development of scope in first language acquisition. We will review evidence in favor of what has been called the Observation of Isomorphism (Musolino, Crain, and Thornton 2000; Lidz and Musolino 2002), where it was found that in examples like (3) and (4), children have a strong preference for the overt scope reading in comprehension. However, the significance of this finding has been questioned. First, Gualmini (2004) and Hulsey et al. (2004) found that the inverse scope reading is available to children if the pragmatic conditions are favorable. Second, Conroy et al. (2009) found that the isomorphism effect is present only in 5-year-olds, not in 4-year-olds, giving rise to a U-shaped developmental curve. Overall, we will conclude in Section 4 that the evidence is mixed but ultimately comes down in favor of the Extra-Linguistic Hypothesis. Thus, the hypothesis that is most consistent with the body of evidence as a whole is that children, like adults, have access to both the overt and the inverse scope reading from the start. Various extra-linguistic factors, and potentially task effects, must then be responsible for the fleeting Observation of Isomorphism in children.

3.2 Universal–existential combinations

3.2.1 Adult psycholinguistic evidence

3.2.1.1 Early work: Evidence for Overt Scope Preference

In this section we will investigate sentences involving an existential and a universal quantifier, such as (1) and (2), repeated here for convenience. Recall that the overt scope reading in both is the one where the scopal order of the quantifiers corresponds to their surface syntax c-command relations, i.e. (1b) and (2b). The so-called inverse scope reading is the one where the semantic scopal order is the opposite of the surface syntax c-command order, i.e. (1c) and (2c). Recall also that the reading where a group of individuals is involved for both quantifiers, i.e. (1c) and (2b), is called the distributive scope reading.

(1) a. A doctor advised every nurse.
    b. a > every: There is a doctor that advised every nurse.
    c. every > a: For every nurse there is a doctor that advised that nurse.

(2) a. Every nurse assisted a doctor.
    b. every > a: Every nurse assisted a doctor, but not necessarily the same doctor.
    c. a > every: There is a (specific) doctor and every nurse assisted him/her.
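The non-distinctness of the two readings of (1) can be checked by brute force. The following is a minimal sketch, not part of any of the studies discussed: it enumerates every possible "advised" relation over two hypothetical doctors and two hypothetical nurses (the names d1, d2, n1, n2 and all function names are illustrative assumptions) and looks for situations that make one reading true and the other false.

```python
from itertools import product


def overt_1b(doctors, nurses, advised):
    # (1b) a > every: there is a single doctor who advised every nurse.
    return any(all((d, n) in advised for n in nurses) for d in doctors)


def inverse_1c(doctors, nurses, advised):
    # (1c) every > a: every nurse was advised by some doctor or other.
    return all(any((d, n) in advised for d in doctors) for n in nurses)


def all_situations(doctors, nurses):
    # Every subset of doctor-nurse pairs counts as one possible situation.
    pairs = [(d, n) for d in doctors for n in nurses]
    for mask in product([False, True], repeat=len(pairs)):
        yield {pair for pair, keep in zip(pairs, mask) if keep}


# Hypothetical toy domain, chosen only for illustration.
doctors = ["d1", "d2"]
nurses = ["n1", "n2"]

only_1b = []  # situations where (1b) is true but (1c) is false
only_1c = []  # situations where (1c) is true but (1b) is false
for situation in all_situations(doctors, nurses):
    b = overt_1b(doctors, nurses, situation)
    c = inverse_1c(doctors, nurses, situation)
    if b and not c:
        only_1b.append(situation)
    if c and not b:
        only_1c.append(situation)

print("(1b) true, (1c) false:", len(only_1b))
print("(1c) true, (1b) false:", len(only_1c))
```

On this toy domain no situation verifies (1b) while falsifying (1c), whereas two situations (each nurse advised by a different doctor) verify (1c) only. This mirrors the point that a truth-conditional task can isolate the distributive reading of (1) but not the wide-scope existential one.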

Early work on such sentences argued for the relevance of linear order (VanLehn 1978; Fodor 1982): the earlier the quantifier appears in the sentence, the wider the scope it has. Ioup (1975) proposed that the relative position of the two quantifiers on the argument hierarchy determines their scopal relation. Reinhart (1983) proposed that only surface c-command relations matter, which are not easily distinguished from linear order in a right-branching language like English. Yet others, such as Kempson and Cormack (1981) and May (1985), argued for extra-syntactic factors, such as the topical nature of the quantifier to play a role, in the sense that topics take wide scope over non-topical quantifiers. Finally, it has also been noted that different quantifiers exhibit differing degrees of likelihood for taking wide scope over other quantifiers (see e.g. Ioup 1975; Kroch 1974). (See Tunstall 1998 for an extensive overview.) Most of these studies involved corpus investigations or relied on paraphrases offered by the participants or on various metalinguistic judgments by the participants. For instance, Catlin and Micham (1975) asked participants to say which noun phrase the sentence was “about” and the members of which noun category would one need to examine in order to determine whether the sentence was true. Some studies involve complex judgment tasks: for instance, Micham et al. (1980) asked participants to determine whether a given sentence was true in a situation depicted by a diagram or table matching two sets of participants corresponding to the two quantified arguments, not unlike our own sketch in (6) above. All of these methodologies raise important questions. One has inadequate control over the experiment if subjects can offer their own paraphrases. Metalinguistic judgments, as involved in Catlin and Micham’s study, and problem-solving tasks, as in Micham et al. (1980), are likely to involve central cognitive processes to a higher degree. 
Our intention is to determine whether native speakers access inverse scope readings when they use language naturally. So, it is not desirable to get them to calculate the readings as a kind of maths problem. Nevertheless, bearing their methodological baggage in mind, it is important to note that all of these studies found some support for a general preference for overt scope. So, people generally preferred to interpret such sentences with semantic scopal relations corresponding to surface c-command relations of the quantifiers. This can be formulated as a broad and intuitive hypothesis, which we will simply state here as follows:

(9) Overt Scope Preference (OSP)
    When a scopal ambiguity arises, people have a preference for assigning the overt scope interpretation over the inverse scope interpretation.

Kurtzmann and MacDonald (1993) were the first to ease the methodological tension between obtaining convincing results about the reading assumed by the participant and avoiding overburdening them (and potentially interfering with the results) with a metalinguistic task. Rather than asking for metalinguistic judgments or presenting participants with a problem-solving task, they provided disambiguation sentences with singular or plural subjects (e.g. the kid… vs. the kids…) which were presented to the participants after they read a doubly-quantified sentence
such as A kid climbed every tree. Their line of thinking was that if participants assign overt scope to the doubly-quantified sentence, they will favor a continuation referring to a singular subject referent, while inverse scope would favor a plural subject in the continuation. They also included control items that had either unambiguous surface scope (e.g. The same kid climbed every tree) or unambiguous inverse scope (e.g. A different kid climbed every tree). Their results indicated the interplay of a variety of factors including syntactic scope between the quantifiers, but also the relevance of thematic roles, verb type (activity vs. stative), and topicality (i.e. subjects are preferred topics and take wide scope). Tunstall (1998) replicated their study and adapted it to three-argument verbs as in Kelly showed a photo to every critic last month. The photo was…/The photos were…. This way she successfully eliminated the topicality and verb-type factors. This is because the subject, which is the default topic in these sentences, is not quantificational. The question of scopal order concerns the direct and the indirect objects. The method used was a self-paced word-by-word stops-making-sense reading task, meaning that participants were asked to indicate if the sentence combination they were reading stopped making sense to them. She hypothesized that participants would have a preference for overt scope, along the lines of our OSP.3 Interestingly, the results diverged. In the a… every condition (i.e. items like (1)), participants had a significantly longer reading time in the critical region (subject and auxiliary) of the continuation sentence if the subject was plural compared to if the subject was singular (Tunstall 1998: 66). So, participants favored the overt scope reading in these sentences. However, no comparable difference was found in the every… a condition (i.e. in items like (2)).
Here, Tunstall did not find longer reading times in the critical region for singular subjects. She proposed to account for this asymmetry by her Vagueness Principle:

(10) Vagueness Principle (Tunstall 1998: 71)
     When the processor gives every wide scope over an indefinite, it can remain vague (underspecified) as to whether the indefinite is multiply instantiated or not. This information can be filled in by further inferencing or by subsequent context.

So, her interpretation is that all the results are consistent with the OSP: in every... a sentences too, every takes wide scope over the existential, reflecting their relative c-command relations, but this does not force people to choose a plural continuation. So, one does not see a cost for continuation sentences starting with a singular subject in this case. In other words, Tunstall distinguished overt vs. inverse scope from distributivity. The latter is not necessarily a direct consequence of the former: a universal can take wide scope over an indefinite without necessarily triggering a distributive interpretation of the indefinite.

3 Tunstall’s (1998: 56) actual Principle of Scope Interpretation is a more technical formulation of the Overt Scope Preference.

Thus, Tunstall could maintain the generality of the OSP despite the
diverging results across the a… every and every… a conditions. Note that an alternative interpretation of the data would be to take the divergence of the results at face value and propose different grammatical analyses for the two quantifier orders. We will come back to this idea below.

3.2.1.2 The role of context: Evidence from self-paced reading studies

Let us now review a series of experiments that addressed the question whether in online processing the parser has a preference for overt scope in a...every or every...a sentences, and whether such a preference can be mitigated or perhaps even reversed by appropriate discourse context. The tentative conclusion we will draw is that there is a robust overt scope preference for a...every sentences, while the evidence of an overt scope preference for every...a sentences is murky at best. We will also see that context does influence the availability of the inverse scope reading, but that in most cases, again for a...every sentences, the parser’s preference for overt scope cannot be totally obliterated. These conclusions will lead us to question Tunstall’s uniform analysis of the two types of sentences.

Anderson (2004) performed a series of experiments to investigate parsing preferences for doubly-quantified sentences. Her first experiment was a questionnaire study using items like (11). It was a variation on Kurtzmann and MacDonald’s (1993) methodology in the sense that it involved two sentences, one with the two quantifiers and a paraphrase that would disambiguate the reading.

(11) Example test item for questionnaire study (Anderson 2004: 32, ex. 47)
     A cashier greeted every customer.
     a. One cashier greeted customers.
     b. Several cashiers greeted customers.

The singular paraphrase was chosen in 81% of the cases, indicating a strong preference for the overt scope reading. In her second experiment, Anderson (2004) investigated whether contextual bias influences people’s parsing preferences. So, she tested the same sentences as in the first experiment embedded in a context that biases for overt scope (12a) and inverse scope (12b), respectively.

(12) Example test items for questionnaire study with context (Anderson 2004: 35, exx. 53 and 54)
     a. Overt-scope-biasing context
        The members of the gourmet club decided to put out a cookbook of their favorite recipes. They wanted the recipes to be easy enough for an inexperienced cook. The president of the club requested that a volunteer test the recipes to make sure that the instructions were correct. After a short discussion, a member of the club tested every recipe.
     b. Inverse-scope-biasing context
        The members of the gourmet club decided to put out a cookbook of their favorite recipes. They wanted the recipes to be easy enough for an inexperienced cook. Members who nominated recipes were required to test the recipes to make sure that the instructions were correct. A member of the club tested every recipe.

Anderson also included two control conditions, which used the same contexts as the ambiguous conditions, but which employed test sentences that unambiguously indicated an overt scope reading or an inverse scope reading. For the former she used a definite NP subject (The helpful member tested every recipe), for the latter she used a distributive modifier (A different member tested every recipe). Each paragraph was followed by a forced-choice comprehension question, as in (13).

(13) How many club members tested recipes?
     One.    Several.

The results revealed an effect of context as the singular response for the comprehension question, indicating an overt scope interpretation, was chosen 81% of the time in the ambiguous overt-scope-biasing context but only 47% of the time in the ambiguous inverse-scope-biasing context (Anderson 2004: 40). (The corresponding findings for the unambiguous conditions were 99% in the overt scope context and 4% in the inverse scope context.) Note that the overt-scope-biasing context did not seem to have a positive effect in the sense that comparable response rates were found in Experiment 1 without contextual embedding. Nevertheless, there was still a strong bias towards an overt scope interpretation, despite the presence of a context with an inverse-scope bias. Anderson also performed the experiment for every...a items, but the results are not comparable, as she used a rating scale instead of a forced-choice response. The results showed an effect of context in that case too.
They also showed that the plural paraphrase was acceptable to people as it had an average rating of 3.3 on a 5-point scale, which is significantly higher than the mid-point at 2.5, while the surface-scope paraphrase had a 4.3-rating (Anderson 2004: 43).

Anderson (2004) also performed a self-paced reading task with sentence pairs adopted from Kurtzmann and MacDonald’s (1993) study:

(14) Example test items from self-paced reading study (Anderson 2004: 45)
     a. An experienced climber scaled every cliff. The climber(s) was/were very skilled.
     b. Every historian examined a document. The document(s) was/were in good condition.

The residual reading times calculated for the continuation sentence as a whole were significantly longer in the a...every condition, if the continuation sentence had a plural subject compared to when it had a singular subject (Anderson 2004: 48). (There was also a marginal difference in the same direction on the final part of the continuation sentence, i.e. the part following the finite verb.) This indicates a processing cost associated with inverse scope, because participants found it more taxing to read a continuation sentence that is compatible with an inverse scope reading, suggesting that they had committed themselves to an overt scope interpretation. Interestingly, no comparable (but opposite) difference was found for the every...a sentences. Here the reading times for the continuation sentence as a whole or at any of its regions were not longer if the subject of the continuation sentence was singular, compared to when it was plural (Anderson 2004: 52). This is compatible with at least two interpretations. First (and this is what Anderson proposes), we can follow Tunstall’s line of thinking: there is an overt scope parsing preference in these cases too, but since that does not commit to a plural subject in the continuation sentence (see Vagueness Principle), the task is inadequate to find this difference. In other words, participants may have chosen the singular response and nevertheless entertained an overt scope interpretation. But the data is also compatible with an alternative explanation, namely that in this case, there is no overt scope parsing preference in the first place.

Table 3.1 Percentage of surface scope response for comprehension question (i.e. One in a...every condition; Several in every...a condition) for the two types of continuation sentences in Anderson’s (2004) study

                                            a...every    every...a
Singular subject disambiguating sentence    87%          18%
Plural subject disambiguating sentence      59%          91%

The second explanation can perhaps be further supported by looking at the answers participants gave to the comprehension questions in (13). As we can see in Table 3.1 (Anderson 2004: 49, table 3; 54, table 5), participants chose the response for the comprehension question that was compatible with an overt scope reading around 90% of the time in both the a...every and the every...a conditions when the disambiguating sentence was compatible with that reading (i.e.
with a singular subject in the a...every condition and a plural subject in the every...a condition). But they behaved differently in the face of a continuation sentence compatible with inverse scope. In the a...every condition they still chose a response to the comprehension question that indicated an overt scope interpretation 59% of the time, giving evidence for a reluctance to entertain inverse scope. In contrast, in the every...a condition they only did so 18% of the time. In my view, this shows that participants were not averse to an inverse scope reading in this case, as they chose a response to the comprehension question indicating that reading 82% of the time. Anderson also tested the sentence pairs in (14) in biasing contexts like the ones in (12). She used a 2×2 design with quantifier order and contextual scope bias as the two
controlled variables. There were two interesting findings. First, in the a...every condition the continuation sentence that disambiguated for the inverse scope reading (i.e. the one with a plural subject) was read more slowly than the continuation sentence that disambiguated for the overt scope reading (Anderson 2004: 61–62). So there was a processing cost involved with the inverse scope interpretation for a...every sentences even if the context was biased for that interpretation. It is interesting to note that the processing cost could only be measured on the continuation sentence, as the reading times for the critical regions of the doubly quantified sentences did not reveal any relevant differences. There was a marginal effect of context, with the inverse-scope-biasing context leading to marginally longer response times irrespective of the type of disambiguating sentence (singular subject or plural subject). Second, in the every...a condition, unlike in the experiment without context, marginally longer reading times were found for the continuation sentence that disambiguates for the inverse scope reading (this time that is the one with a singular subject). There are two possible explanations for the difference between the experiment without biasing context and the one with context. Anderson takes it to stem from the fact that an inverse-scope-biasing context boosted participants’ willingness to entertain an inverse scope reading. In turn, the higher proportion of inverse scope interpretations led to an increased processing cost. But in my opinion, it is not clear that this was in fact the case. This line of thinking would predict that there would be a discrepancy in the proportion of inverse scope responses in the every...a condition depending on whether the context was biasing for overt scope or inverse scope. However, this was not the case: the proportion of responses indicating an inverse scope interpretation (i.e. One.
in the every...a-condition) was 71% in the overt-scope-biassing context and 69% in the inverse-scope-biasing context (Anderson 2004: 66, table 7). So, there is in fact no evidence that contextual bias had an effect on participants’ willingness to entertain inverse scope for every...a sentences. Given that the result was only marginal, it seems more parsimonious to conclude that there is no conclusive evidence that there is a parsing preference for the overt scope interpretation in every...a sentences. The final pair of experiments that Anderson conducted addressed the question whether the increased processing cost measured on the continuation sentences indicating inverse scope in the a...every condition was the result of the parser committing to an overt scope representation early on and the continuation sentence forcing a reanalysis, which leads to processing cost, or whether participants in fact do not commit to a resolution of the scope ambiguity until the continuation sentence, and then the measured processing cost can be attributed to deriving the inverse scope reading. In one experiment (Anderson 2004: 69), she tested a...every sentences with no discourse context and no continuation sentence in a self-paced reading task. The sentences were always followed by a comprehension question (see (13) above). She analyzed the reading times data separately depending on whether participants’ response to the comprehension question indicated an overt scope reading or an inverse scope reading. Calculating residual reading times for the entire sentence, a significant difference was found: Participants who entertained the inverse scope reading read the sentences significantly slower


compared to participants whose response to the comprehension question revealed an overt scope interpretation. Anderson (2004: 73) concludes that it is the assignment of the inverse scope reading to the sentence and not a subsequent reanalysis that presents a processing load. In her final experiment, Anderson (2004: 73–74) used the same a...every sentences, this time embedded in biasing contexts. Again, there were no continuation sentences and the data was divided in two based on the response participants gave to a follow-up comprehension question. She also included two unambiguous conditions, using items like in her second questionnaire study described above. The unambiguous sentences were only embedded in matching discourse contexts. The results revealed that reading times were significantly longer in the inverse-scope-biassing context compared to the overt-scope-biasing context. This was true for ambiguous as well as unambiguous items. She concludes that the favorable context did not mitigate the processing cost associated with entertaining the inverse scope reading. This goes against the Principle of Parsimony (Altmann and Steedman 1988; Crain and Steedman 1985), which states that in case of parsing ambiguities, a reading that fits with the discourse context better is always favored. In contrast, it seems that a processing cost was incurred even in the presence of favorable context. Moreover, the processing cost was not even alleviated when the item itself received unambiguously inverse scope (i.e. items with the modifier different), so essentially, the processing cost is present even if there is no actual ambiguity between the inverse scope reading and the overt scope reading. Overall, regarding the tentatively hypothesized Overt Scope Preference, we can conclude that the reviewed psycholinguistic evidence presents a robust overt scope preference for adult processing of a...every sentences, but not for every...a sentences. 
It turns out that it is possible to mitigate this preference using a discourse context that biases the reader towards an inverse scope interpretation, but the preference does not fully go away. Finally, we have seen that the real-time assignment of inverse scope comes with a processing cost in a...every sentences, but the same was not so clearly the case in every...a sentences.

3.2.1.3 Theoretical considerations regarding the adult psycholinguistic evidence

The question I would like to consider in this subsection is what the theoretical implications of the above adult psycholinguistic findings are. It is the nature of theoretical discussions that they inevitably engage with technical details of sometimes complex proposals. Some readers of this chapter will relish the thought, others less so. I can reassure the latter type of reader that they can fully profit from this chapter by skipping this subsection and jumping straight to Section 3.2.2 to consider the review of the data from language development. Let us briefly consider what kind of theoretical proposals are available in the literature to account for scopal ambiguities. In generative syntax it is generally assumed, following


May (1977), that inverse scope in a...every sentences is obtained by a covert movement operation, quantifier raising, henceforth QR, as illustrated in (16) for a sentence like (7).

(16) [IP every building [IP an American flag was [VP hanging in front of tNP ]]]

It was shown that QR, like other instances of A-bar movement, is island-sensitive and may give rise to semantically distinct readings.4 QR, as its name suggests, can only affect generalized quantifiers. Existentials like the object noun phrase in examples like (2) do not have that option. Rather, they obtain wide scope by different means. Different proposals exist with respect to the nature of existential wide scope. For ease of exposition, let us adopt Reinhart’s (1997) choice function account for existentials. The details of the choice function mechanism need not concern us here. What matters for us here is that such accounts assume that existentials do not take wide scope via a syntactic movement operation.5 Reinhart (1997) showed that such a distinction (i.e. QR vs. choice function) is empirically justified given the divergent grammatical properties of wide scope indefinites and wide scope universals: Wide scope indefinites, for instance, are not island-sensitive, and generally have properties that liken them to wh-in-situ. Wide scope universals are island-sensitive, even clause-bound, and in general their properties are similar to those of moved wh-elements. Other theories do not posit any kind of asymmetry between how the grammar assigns overt versus inverse scope. In Steedman’s (2000) Combinatory Categorial Grammar, for instance, the different scopal possibilities are derived by differential orders of composition between the verb and the noun phrases in question. If the verb combines with the object first and then the subject, the subject will take higher scope. If the verb combines with the subject first, and then the object, the object will take scope over the subject. The former composition gives rise to what we call overt scope and the latter to the inverse scope reading. There is no asymmetry at the level of syntactic structure; in fact, in this system the terms “overt” and “inverse” make little sense.
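The effect of the order of composition on scope can be made concrete with a small sketch. It is not an implementation of Combinatory Categorial Grammar; it only assumes that noun phrases denote generalized quantifiers (functions from properties to truth values) and that the verb denotes a two-place relation. The toy domain, the relation CRITICIZED, and all function names are hypothetical and chosen purely for illustration.

```python
# Toy model (hypothetical): two critics and two authors.
CRITICS = {"ann", "bob"}
AUTHORS = {"x", "y"}
# ann criticized x and bob criticized y, so nobody criticized both authors.
CRITICIZED = {("ann", "x"), ("bob", "y")}


def some(domain):
    """Existential generalized quantifier over `domain`."""
    return lambda prop: any(prop(d) for d in domain)


def every(domain):
    """Universal generalized quantifier over `domain`."""
    return lambda prop: all(prop(d) for d in domain)


def verb(subj, obj):
    """Two-place relation denoted by the verb."""
    return (subj, obj) in CRITICIZED


def object_first(subject_q, object_q):
    """Verb combines with the object first, then the subject:
    the subject quantifier ends up with wide scope."""
    verb_phrase = lambda s: object_q(lambda o: verb(s, o))
    return subject_q(verb_phrase)


def subject_first(subject_q, object_q):
    """Verb combines with the subject first, then the object:
    the object quantifier ends up with wide scope."""
    subject_verb = lambda o: subject_q(lambda s: verb(s, o))
    return object_q(subject_verb)


a_critic = some(CRITICS)
every_author = every(AUTHORS)

# "A critic criticized every author" in the toy model:
print(object_first(a_critic, every_author))   # subject wide scope: False
print(subject_first(a_critic, every_author))  # object wide scope: True
```

The two functions differ only in which argument the verb is fed first, which is the sense in which such a system has no structural asymmetry between the two readings.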
4 In fact, QR seems to be even more strongly local than A-bar movement, as it is generally clause-bound. The interested reader is referred to Reinhart (1997) for a historical overview of different conceptualizations of QR.

5 Note that the same applies arguably to numerals also, as in (4). Numerals can be generalized quantifiers, in which case they can obtain wide scope via QR, or they can be existentials, in which case they take wide scope by other means.

There is also no asymmetry of type, in the sense that indefinites and universals are treated in the same way by the system. Tunstall’s own (1998) proposal sits halfway between the two. She posits an asymmetry in terms of how syntax and semantics map onto each other, but does not distinguish between the different types of quantifiers. In this respect, Tunstall’s proposal is similar to the theoretical proposal put forward by Bobaljik and Wurmbrand (2012). They proposed that there is an interface principle in the grammar that ensures that surface syntax c-command relations are respected at LF, and consequently in the semantics,


by corresponding scopal relations. These two proposals are essentially more technical formulations of our Overt Scope Preference idea. So, we have three different approaches to quantifier scope interactions: One that treats overt scope and inverse scope on a par (e.g. Steedman’s framework), one that treats overt scope as preferable in general (e.g. Tunstall’s approach and Bobaljik and Wurmbrand’s proposal), and one that treats overt scope as preferable in those cases where inverse scope would necessitate QR (Reinhart’s (1997, 2006) economy account). Other formulations of these theoretical possibilities are available in the literature, but for ease of exposition I singled out these proposals to illustrate these three logical possibilities. Considering the predictions of these theories for psycholinguistic data, note that the three proposals are asymmetrically entailed. Steedman predicts no asymmetries, Bobaljik and Wurmbrand predict a general overt scope preference, while Reinhart predicts a preference for overt scope in those cases where inverse scope would be the result of QR. In other words, Bobaljik and Wurmbrand’s theory would not be falsified by data that would support Reinhart’s proposal; it would simply have to be enriched to provide an account for the unexpected difference between the parsing preferences associated with the different types of quantifiers. Similarly, if it turns out that there is a general parsing preference for surface scope, that would not be incompatible with Steedman’s framework. Rather, some additional mechanism would need to be invoked to account for the observed difference. The predictions have a stronger bite in the opposite direction.
Reinhart’s (1997) proposal would be called into question if a general rather than a particular overt scope preference was found to be present in parsing; and Bobaljik and Wurmbrand’s proposal (as well as Reinhart’s) would be questioned by findings that point to a systematic lack of evidence for any kind of parsing preference for overt scope. In this light, we can review the findings enumerated in Section 3.2.1.2. Anderson’s findings are by and large compatible with Reinhart’s (1997) predictions. There was a clear overt scope preference in a...every sentences, which are purported to involve QR by Reinhart (1997), but not in every...a sentences, which Reinhart assumes do not involve QR. Nevertheless, more general tendencies for overt scope were present in all the conditions, and the fact that this overt scope preference could only be partially mitigated by context is also important to note. This, if taken at face value, would provide support for the approaches of Tunstall and of Bobaljik and Wurmbrand. Anderson’s (2004) findings also revealed that in a...every sentences, but not so much in every...a sentences, the parser experiences an extra processing load when considering inverse scope readings, even if that reading is supported by context and, moreover, even if the utterance is unambiguous. One question that arises is why the construction of an inverse scope interpretation would present a processing load. It is not the case that instances of A-bar movement generally have this effect. Reinhart (1997, 2006) argued that the processing load is due to the fact that QR involves global economy considerations, which are costly to the processor. This might go some way to explain the findings, but under this view one would potentially expect the processing cost to be


diminished if the inverse scope reading is supported by the discourse context, and perhaps even eliminated altogether when the sentence in question takes an inverse scope reading unambiguously. The robustness of the processing cost in such sentences supports a general default overt scope interpretation, which the parser seemingly has to abandon if faced with the inverse scope reading. That is not dissimilar to the reassignment of the syntactic position of the PP-modifier in sentences like ‘I saw the man with the binoculars’ from a VP-adjoined position to an NP-adjoined position in the course of the parse. Whether this has to do with the presence of an existential in the subject position, or the default topical nature of the existential subject, or a genuine overt scope preference, should be explored further in future research. Let us now turn to the findings from language development. As we will see, these findings also have interesting theoretical implications, which we will also discuss.

3.2.2 Evidence from child language acquisition: scopal freedom

Based on the adult findings reviewed above, we can expect that children would also have an overt scope preference for sentences involving an existential subject and a universal object. In fact, given that Anderson found that the inverse scope reading incurs an extra processing load in a...every sentences, we might even expect children to have an exaggerated preference for overt scope in these sentences. Whether this is in fact so depends in large part on what actually causes the processing load associated with inverse scope in such utterances. We will come back to this issue at the end of the section, where we will consider the theoretical implications of the language development findings. The literature on quantifier scope interactions rarely considers existential–universal quantifier pairs. This is due to the presence of an entailment between the two readings, which, as we have already noted, makes it methodologically difficult to test such sentences using tasks that rely on the assignment of a truth value. Nevertheless, a pioneering study on quantifier scope interactions involved precisely these quantifiers. Japanese is a so-called rigid-scope language where inverse scope by quantifier raising is severely restricted. A sentence like (17a), for instance, would be assigned overt scope by adult native speakers. In this language, the distributive scope reading (every > some) would be available in utterances with scrambling, like (17b). Here the object c-commands the subject in surface syntax, so the distributive reading can be obtained without resorting to inverse scope.

(17) a. Dareka-ga daremo-o semeta
        someone-NOM everyone-ACC criticized
        ‘Someone criticized everyone.’ (unambiguous)
     b. Daremo-o_i dareka-ga t_i semeta
        everyone-ACC someone-NOM criticized
        Lit. ‘Everyone, someone criticized.’ (ambiguous)
     (Goro 2007: 57–58, ex. 41)


Goro and Akiba (2004), reported in Goro (2007), performed a truth-value judgment task with English and Japanese children and adults using sentences with an existential subject and a universal object, as in (18).

(18) Dareka-ga dono tabemono mo tabeta
     someone-NOM every food ate
     ‘Someone ate every food.’
     (Goro 2007: 47–48, ex. 36)

The story involved an eating contest with twelve groups of animals. Each group consisted of three animals of the same type (e.g. three pigs). Each group was invited to eat three different pieces of food (e.g. a cream puff, a banana, and a pepper). The child was told that there are two important rules of the game. The first rule is that all the food must be eaten. The second rule is that each of the group members has to eat something. So, if each member of a group eats exactly one item, the group wins and gets a gold medal. If one animal is greedy and eats up all three food items, the group gets a black cross, i.e. a symbol of failure. Also, if they all refuse to eat one of the food items, they also get a black cross. The outcome of the story was that four groups performed according to the rules and received gold medals. Four groups had a greedy member who ate up all the food, and thus received a black cross, and four groups shared the food out nicely but the final member didn’t finish eating their food item, so they ended up receiving a black cross too. The critical trials are the groups that received a gold medal, as here the inverse scope reading of the test sentence is true while the overt scope reading is false. In the second batch, with the groups with greedy animals, the overt scope reading is true (and so, by entailment, is the inverse scope reading). The third batch involves groups that attempted to perform what corresponds to the inverse scope reading but failed to do so. Sixteen Japanese children with a mean age of 5;4 (range: 4;10–5;9) accepted the inverse scope reading, i.e. answered “Yes” to the critical test items, 42.2% of the time. A group of 16 Japanese adults never accepted the critical test items. Sixteen English-speaking children with a mean age of 5;4 (range 5;0–5;10) accepted the inverse scope reading 35.9% of the time, while 29 English-speaking adults did so 33.6% of the time (Goro 2007: 53, ex. 38).
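The truth conditions of the two readings in the three story outcomes can be checked mechanically. The following is a minimal sketch, not part of Goro and Akiba's materials: the animal labels a1 to a3 and the outcome names are placeholders, and each outcome simply records which foods each animal in one group finished eating. Note that in the greedy outcome both readings come out true, because the overt scope reading entails the inverse scope reading.

```python
FOODS = {"cream puff", "banana", "pepper"}

# Foods finished by each animal of one group, per story outcome.
# Animal labels and outcome names are placeholders.
OUTCOMES = {
    "gold medal (shared)": {
        "a1": {"cream puff"}, "a2": {"banana"}, "a3": {"pepper"},
    },
    "black cross (greedy)": {
        "a1": set(FOODS), "a2": set(), "a3": set(),
    },
    "black cross (unfinished)": {
        "a1": {"cream puff"}, "a2": {"banana"}, "a3": set(),
    },
}


def overt_scope(eaten):
    """some > every: there is a single animal that ate every food."""
    return any(FOODS <= foods for foods in eaten.values())


def inverse_scope(eaten):
    """every > some: every food was eaten by some animal or other."""
    return all(
        any(food in foods for foods in eaten.values()) for food in FOODS
    )


for name, eaten in OUTCOMES.items():
    print(f"{name}: overt={overt_scope(eaten)}, inverse={inverse_scope(eaten)}")
```

Only the gold-medal outcome separates the two readings (inverse true, overt false), which is why it serves as the critical condition: a “Yes” answer there can only come from the inverse scope reading.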
Goro (2007) also performed a control study with different test items, to ascertain that Japanese children also have access to the overt scope reading of such sentences. Goro (2007) concluded that Japanese adults revealed an unwillingness to assign inverse scope to such sentences. This matches previous findings, in both the theoretical and the empirical literature on Japanese, about Japanese being a rigid-scope language. At the same time, Japanese children’s behavior patterned with that of English-speaking children, and not with that of Japanese adults. Szendrői et al. (2017) found very similar results with German children and adults using an act-out task. One advantage of the act-out task is that it does not require truth-value judgments on the part of the participant, thus making it ideal for testing existential–universal quantifier pairs. At the same time, it is important to note that, methodologically speaking, the act-out task is less than ideal to investigate any kind of ambiguity. This is because the participant displays their preferred reading in this task.


It is possible, therefore, that the act-out task would underrepresent all the readings that the participant would be able to assign to the test sentence. German is also a rigid-scope language, with scrambling. There too, adults showed scope rigidity with utterances like (19), assigning an overt scope reading 98% of the time. In contrast, 20 5-year-old children with a mean age of 5;3 (range: 5;1–5;7) performed an inverse scope action 56% of the time, and 20 6-year-olds (mean: 6;4, range: 6;1–6;11) did so 42% of the time.

(19) Ein Tierpfleger füttert JEDE Giraffe.
     ‘A zookeeper feeds EVERY giraffe.’
     (Szendrői et al. 2017: 7, ex. 11)

This shows that the unexpected scopal freedom that Goro found with Japanese children is not a language-specific effect: The same holds in another rigid-scope language too. (See also Zhou and Crain 2009 on Mandarin discussed below.)6

3.2.3 Interim summary

Recall that in the adult psycholinguistics literature, it was found that overt scope is preferred in a...every sentences, unlike every...a sentences, and that inverse scope in the former type of sentences comes with a processing cost. The evidence from child language paints a different picture. Here we saw that even in languages with rigid scope, 5- and 6-year-old children showed scopal freedom. We cannot offer a satisfactory reconciliation of these facts at this point. Instead, let us turn to scope interactions involving negation. This should be useful in investigating whether the overt scope preference of the parser is something that holds more generally, or whether perhaps it arises as an emergent phenomenon due to the nature of the doubly quantified sentences we explored so far.

6

In terms of a more detailed theoretical perspective, it is possible to think of scope rigidity in the form of cross-derivational, global economy, as suggested by Reinhart (1997). In this line of thought, the reason why German and Japanese SVO sentences lack the inverse scope reading would be precisely that these languages allow for alternative word orders (e.g. scrambling) that have the same distributive scope reading without recourse to inverse scope. These alternative orders effectively block the availability of the inverse scope reading (see also Bobaljik and Wurmbrand 2011 for the same point). If children fail to carry out such global cross-derivational comparisons due to processing limitations, as Reinhart (1999; 2004) suggested, then they would be expected to fail to exclude the inverse scope reading of the SVO utterances. They cannot retrieve the alternative word order variant that obtains the distributive reading under overt scope and compare the two under the intended interpretation, so they have no reason to exclude the inverse scope reading, and thus no blocking takes place. Hence their lack of scope rigidity. This is how Szendrői et al. (2017) proposed to account for their results. Note, however, that Goro (2007) argues against such a blocking account and offers an alternative account based on the conversational implicature of maximality associated with the Japanese particle ga. Persuasive as his account is, however, it would not easily carry over to German. So, we shall have to leave open for future research the issue of why children in rigid-scope languages consistently experience scopal freedom.


3.3 Scope interactions of quantifiers and negation


3.3.1 Setting the scene

Recall that Tunstall (1998) proposed that the parser is intrinsically endowed with a preference for overt scope. Anderson’s (2004) findings endorsed this, at least for a...every sentences. But is it in fact an intrinsic property of the human parser to preferentially assign overt scope? Or is it perhaps the case that the parser considers both overt and inverse scope and extra-linguistic factors influence the final choice, resulting in an outcome that prefers overt scope? On the latter view, the preference is an emergent consequence of the combination of extra-linguistic factors and the properties of the parses, not an intrinsic property of the parser itself. One possible way to explore the generality of the parser’s overt scope preference is to conduct a variety of tasks with the same sentences. If the parser has a general preference, this should show up in all or most tasks, or at least in the tasks that tap into early preference. We will review experimental evidence that has been amassed on various tasks, both off-line and online, with or without context. The findings put together show a mixed picture, with some tasks showing a strong overt scope preference, while others do not. Another consideration that could shed light on this issue is the comparison of data from adults and children. Assume that the parser has an intrinsic preference for overt scope, which can be overridden in favorable contexts in the case of adults, although there is some evidence that even in this case the inverse scope reading incurs a processing cost. In this scenario it would be a natural extension of the state of affairs in adults that children would have an even stronger preference for surface scope. It is well established that children have smaller working memory resources and also that they are generally less able to capitalize on at least certain types of discourse-contextual information (e.g. Noveck 2001).
Both would point in the direction that children’s ability to override the assumed overt-scope preference of the parser should be diminished compared to adults’ ability, resulting in an even more robust overt-scope preference. In contrast, it is also possible that the parser has no intrinsic preference, but rather supplies both overt and inverse scope readings. It could be the result of a combination of grammatical and extra-grammatical factors that ultimately adults show an overt-scope preference in many tasks, especially those without supporting discourse context. In such a scenario, whether children show an overt-scope preference would depend on their knowledge of the relevant grammatical factors and their susceptibility to the relevant extra-grammatical factors. If children are adultlike in both these domains, we would expect them to have the same behavior as adults, so we would expect them to show an overt-scope preference in many tasks.


Alternatively, if children either lack necessary grammatical knowledge or are less susceptible to the relevant extra-linguistic factors, then we would expect a less robust preference for overt scope compared to adults. In the following sections we will review a number of studies. The reader might find it helpful to refer to Table 3.2 for details of each experiment.

3.3.2 The ‘Observation of Isomorphism’: Evidence from truth-value judgment tasks

Musolino et al. (2000) tested 15 children with an average age of 4;7 (range 3;10–5;2), 15 children with an average age of 5;7 (range: 5;2–6;6) and a group of adults in a truth-value judgment task, using sentences like (20).

(20) The detective didn’t find some guys.

The context story involved a situation where different characters hid behind various objects and the detective’s task was to find them. The outcome of the story was designed to satisfy the inverse scope reading (i.e. ‘There was someone the detective didn’t find’) but falsify the overt scope reading (i.e. ‘The detective didn’t find any guys’), since there was at least one guy the detective found. Note also that the overt scope reading is independently ruled out because some N is a positive polarity item in English, so it must be outside the scope of negation. The adults accepted the test sentence 100% of the time, while the older children did so in 65% of cases, and the younger ones 35% of the time (Musolino et al. 2000: 10). All these results were significantly different from each other. Children’s justification for their “No” response was that the detective did find someone, so they revealed an overt scope interpretation, despite some N being a positive polarity item in adult grammar. Musolino et al. (2000) also tested 20 children with an average age of 5;11 (range 4;0–7;3) and a control group of adults on sentences like (21).

(21) Every horse didn’t jump over the fence.

In the story, three horses attempt to jump over a barn, but they realize it is too high for them to jump over, then they decide to jump over a fence. Two horses successfully jump over the fence, but the third one fails to do so. This outcome makes the inverse scope reading (not > every) true, while the overt scope reading (every > not) is false. Children accepted the test sentence in 7.5% of cases, while adults did so 100% of the time.
This indicated a very strong preference for overt scope with these types of sentences too. There is an asymmetric entailment relation between the two different scopal possibilities of a negation and a universal in the sense that the reading where the universal takes wide scope over negation entails the reading where the negation takes wide scope over the universal. In other words, if it is true that none of the horses jumped over the fence (i.e. every > not), then it is also true that not every horse did so (i.e. not > every). For this reason it is impossible to make the every > not reading true in a situation

Table 3.2 Summary of experimental findings of the language acquisition studies reviewed in this paper

Source | Test item | Surface syntax | OS | IS | % YES adults | % YES 4yo | % YES 5–6yo
Musolino et al. 2000 | The detective didn’t find some guys | neg > exist | NO | YES | 100 | 35 (4;7) | 65 (5;7)
Gualmini 2004 | The Troll didn’t deliver some pizzas | neg > exist | NO | YES | … | 90 (4;10) | …
Musolino et al. 2000 | Every horse didn’t jump over the fence | univ > neg | NO | YES | 100 | … | 7.5 (5;11)
Musolino & Lidz 2006 | Every horse didn’t jump over the fence | univ > neg | … | … | … | … | …
Viau et al. 2010 | Every bug didn’t hide behind the tree | univ > neg | … | … | … | … | …
Conroy 2008, Q/A task | Every bunny didn’t eat a purple carrot | univ > neg | … | … | … | … | …
Conroy 2008, sentence completion task | Every dwarf didn’t spray-paint the barn that belongs to the pig/the cow | univ > neg | … | … | … | … | …
Conroy 2008, speeded FC | Every dwarf didn’t spray-paint the barn that belongs to the pig/the cow | univ > neg | … | … | … | … | …
Conroy 2008, IVT | Every dog isn’t wearing a hat | univ > neg | … | … | … | … | …
Conroy 2008, IVT | Every cow doesn’t have a hat | univ > neg | … | … | … | … | …
Conroy et al. 2009 | Every cat didn’t hide behind the sofa | univ > neg | … | … | … | … | …
Musolino & Lidz 2006 | Every horse jumped over the log, but every horse didn’t jump over the fence | univ > neg | … | … | … | … | …
Zhou & Crain 2009 | Every horse jumped over the log, but every horse didn’t jump over the fence (Mandarin) | univ > neg | … | … | … | … | …
Lidz & Musolino 2002 | The detective didn’t find two guys | neg > num | NO | YES | 93 | 33 (4;4) | …
Gualmini et al. 2008 | The Troll didn’t deliver two pizzas | neg > num | NO | YES | … | 75 (4;6) | …
Musolino & Lidz 2003 | The detective didn’t find two guys | neg > num | … | … | … | … | …
Conroy 2008, IVT | John didn’t find two hearts | neg > num | … | … | … | … | …
Conroy 2008, IVT | John didn’t find two hearts | neg > num | … | … | … | … | …
Musolino & Lidz 2003 | Two frogs didn’t jump over the rock | num > neg | … | … | … | … | …
Szendrői et al. 2017, act-out | Ein Tierpfleger füttert JEDE Giraffe | exist > univ | … | … | 2 IS | … | 56 IS (5;3), 42 IS (6;4)
Goro & Akiba 2004, TVJT | Dareka-ga dono tabemono mo tabeta ‘Someone ate every food’ | exist > univ | NO | YES | 0 (Japanese), 33.6 (English) | … | 42.2 (5;4, Japanese), 35.9 (5;4, English)


while making the not > every reading false. This poses a methodological problem for the truth-value judgment task, as this task relies on associating the “Yes” answer with the reading which is by assumption harder to obtain, and the “No” answer with the other reading. This is not a problem in utterances like (21) where the targeted inverse scope reading is the one where negation takes wide scope over the universal. But in sentences like (22), one cannot associate the every > not reading with “Yes,” while making sure that the not > every reading will correspond to a “No” response.

(22) The detective didn’t find every guy.

Lidz and Musolino (2002) got around this problem by testing sentences like (23).

(23) The detective didn’t find two guys.

Given that numerals do have a quantificational meaning (alongside an existential one, which we can put to the side here), such sentences can test whether the utterance in (23) can be interpreted distributively with the numeral taking scope over negation. Crucially, with numerals it is possible to create a situational context where the overt scope reading is false while the inverse scope reading is true. For instance, if the detective tries to find four guys, and manages to find two of them but not the other two, then it will be true that there are two guys such that the detective didn’t find them (i.e. two > neg), but it is false that he found fewer than two, as he did in fact find two people (i.e. neg > two). Like before, Lidz and Musolino (2002) found that 24 English-speaking children with a mean age of 4;4 (range 3;11–4;11) accepted such sentences in the given context 34% of the time, while a group of 24 adults did so 93% of the time (Lidz and Musolino 2002: 131–132). Again, this shows an overt-scope preference by children, albeit a milder one than before, while adults are able to access the inverse scope reading.
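The methodological point can be verified with a small sketch. It assumes, for illustration only, that the numeral two is read as ‘at least two’, and it uses placeholder labels for the horses and the guys; none of this comes from the original studies. The first part enumerates every possible outcome for three horses and confirms that no outcome makes every > not true while making not > every false. The second part evaluates the four-guys scenario used for (23).

```python
from itertools import combinations


def every_not(domain, did):
    """every > not: each individual failed to do it."""
    return all(x not in did for x in domain)


def not_every(domain, did):
    """not > every: it is not the case that every individual did it."""
    return not all(x in did for x in domain)


def two_not(domain, found):
    """two > neg: at least two individuals were not found (assumed reading)."""
    return len(domain - found) >= 2


def not_two(domain, found):
    """neg > two: it is not the case that at least two were found."""
    return not len(domain & found) >= 2


horses = {"h1", "h2", "h3"}
# All logically possible sets of horses that jumped over the fence.
outcomes = [
    set(c)
    for r in range(len(horses) + 1)
    for c in combinations(sorted(horses), r)
]

# Is there any outcome verifying every > not but falsifying not > every?
separable = any(
    every_not(horses, o) and not not_every(horses, o) for o in outcomes
)
print(separable)  # False: every > not entails not > every

# Scenario for (23): four guys hide, the detective finds two of them.
guys = {"g1", "g2", "g3", "g4"}
found = {"g1", "g2"}
print(two_not(guys, found))  # True: inverse scope reading holds
print(not_two(guys, found))  # False: overt scope reading fails
```

With the numeral, then, a context exists in which “Yes” can only reflect the inverse scope reading, which is exactly what the universal in (22) does not allow.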
To sum up, in a series of truth-value judgment tasks involving sentences with various quantifiers and negation, it has been found that children have a preference for the overt scope interpretation. This has been termed the “Observation of Isomorphism” (Musolino et al. 2000: 14). Adults, in contrast, were able to access the inverse scope reading in all these cases.

3.4 Possible research hypotheses


Let us investigate this effect further. There are essentially four possible reasons for its existence. First, it is possible that children have a grammatical deficit. They simply have not yet acquired the grammatical tools that underlie inverse scope (e.g. quantifier raising). Second, it is possible that children and adults have the same grammatical knowledge, but children’s parsers, unlike those of adults, have an intrinsic preference for overt scope. This would mean that children’s and adults’ parsers are qualitatively different, and thus we would need to find the so-called magic moment, when children mysteriously abandon their child parser and turn into adults. This approach goes against the spirit of the


Continuity Hypothesis (Pinker 1984; Crain and Thornton 1998) and thus should only be considered if the other approaches fail to account for the data. Third, it is possible that both children and adults are capable of deriving inverse scope grammatically speaking, but their parser has an intrinsic preference for overt scope. Adults’ parsing preference for overt scope is exaggerated in children due to their limited memory resources. This would mean that there is no qualitative difference between adults and children. Both populations have an intrinsic parsing preference for overt scope, but this is more pronounced in children. Following Conroy’s (2008) proposal, let us call this the Parser Hypothesis.7 Fourth, it is possible that children and adults are capable of deriving inverse scope and they do not have an intrinsic parsing preference for overt scope. Rather, extra-linguistic factors are responsible for the semblance of an overt scope preference. Such factors could have differing effects in different experimental tasks and also interact in interesting ways in different age groups. So, it is possible that children’s appearance of an overt-scope preference plays out differently from adults’ appearance of an overt-scope preference. Conroy (2008) termed this the Extra-Linguistic Hypothesis. Let us review these hypotheses in turn.

3.5 The Grammatical Deficit Hypothesis


It is easy to demonstrate that the first option can be dismissed. Syrett and Lidz (2005) tested 24 4-year-olds (range 4;1–4;10) in a between-subjects design on sentences that involve an ambiguous VP ellipsis site, such as (24a). Such sentences involve Antecedent Contained Deletion. The quantificational object every X involves a VP ellipsis site. VP-ellipsis is normally resolved under the Parallelism Constraint: the elided VP is the same as its antecedent. But in sentences where the VP ellipsis is inside the object that is inside the antecedent VP this leads to infinite regress. The solution is to assume that the quantificational object every X undergoes QR to the position where it c-commands the material that has been elided (e.g. Fiengo and May 1994; Merchant 2000). In a sentence with embedded clauses like (24), its position thus determines the size of the elided VP: if it adjoins to the embedded verb, as in (24b), the antecedent of the elided VP will be interpreted as the embedded VP, and if it adjoins to the matrix verb, as in (24c), then the elided VP will be interpreted as the matrix VP.

(24) a. Miss Piggy_i wanted to PRO_i drive every car that Kermit did.
b. Miss Piggy wanted to [vP [DP every car that Kermit did ]_i [VP drove t_i ]]
c. Miss Piggy [vP [DP every car that Kermit did ]_i [VP wanted to drive t_i ]]

7 Conroy (2008) in fact named this hypothesis the Parsing Hypothesis, but both I and an anonymous reviewer find that name less intuitive given what it means, so I changed the name to Parser Hypothesis.

Syrett and Lidz (2005) found that if the context story was consistent with an embedded reading and falsified a matrix reading, children gave a “No” response indicating a matrix reading 54% of the time while adults did so 32% of the time. In most cases both children’s and adults’ justification of their “No” responses revealed a genuine matrix reading. So, we can conclude that children possess the grammatical knowledge to apply quantifier raising as a syntactic operation, just like adults. A further reason to doubt that the Observation of Isomorphism is due to a deficient grammar is that children appear to be able to access the inverse scope reading in other experimental setups. Gualmini (2004) was the first to notice that information structuring can alleviate children’s reluctance to assign inverse scope in sentences involving negation and an indefinite or numeral object. They argued that children could indeed access inverse scope if the reading with inverse scope provided an appropriate answer to what they called the “question under discussion.” In particular, when the expectation is built up that the Troll should deliver all the pizzas, and he ends up delivering two but loses two, children were no longer unable to access the inverse scope reading of The Troll didn’t deliver some/two pizzas. In the experiment with some, children’s inverse scope responses jumped from 50% in Musolino (1998) to 90% in Gualmini’s (2004) experiment. In the experiment with two, children’s inverse scope responses jumped from 50% in Musolino (1998) and 33% in Lidz and Musolino (2002), to 75% in Gualmini’s (2004) experiment. Gualmini et al. (2008) argued that this substantial improvement occurred because the expectations of the situation make the question ‘Will the Troll deliver all the pizzas?’ highly accessible, and the inverse scope reading (i.e. ‘Two/Some pizzas were not delivered’) is a more appropriate answer to this question than the overt scope reading (i.e. 
The Troll didn’t deliver any/(at least) two pizzas). So, children do appear to consider inverse scope when the information-structure requirements of the story require them to do so. In particular, Gualmini et al.’s view is that children access whichever reading provides a felicitous answer to the so-called Question Under Discussion, which is an abstract construct that maintains information flow in discourse. This points towards a scenario where children have no grammatical deficit. Gualmini et al.’s specific explanation is most consistent with the idea that neither children nor adults have an intrinsic parser preference for overt scope. Rather, extra-linguistic factors sometimes cause adults and more frequently children to favor the overt scope reading in some experimental tasks, i.e. Conroy’s “extra-linguistic hypothesis.” Musolino and Lidz (2006) also provided evidence against a lack of grammatical knowledge in children, when they demonstrated that children who fail to access the inverse scope reading in examples like (21), repeated here for convenience, nevertheless do so in examples like (25). (21) Every horse didn’t jump over the fence. (25) Every horse jumped over the log, but every horse didn’t jump over the fence. They tested 20 English-speaking children (8 boys and 12 girls) between the ages of 5;0 and 5;11 (mean 5;4) and 20 adults on sentences like (21) and (25) in similar contexts

favoring an inverse scope reading. They found a significant difference for acceptance rates for children, 15% for utterances like (21) and 60% for utterances like (25). (Note that this was a bimodal distribution of 6/10 children accepting the test sentence 100% of the time and 4 children rejecting it 100% of the time.) Adults, in contrast, accepted the test sentence 92.5% and 100% of the time, respectively. Their explanation for children’s improvement is reminiscent of Gualmini’s explanation: It is children’s immature pragmatic abilities that stop them from displaying their correct grammatical knowledge in certain scenarios. Once the pragmatic conditions are favorable, as in (25), the children are able to access the inverse scope reading. In fact, as Viau, Lidz, and Musolino (2010) demonstrated, it is not the actual contrast in the test sentence that makes the inverse scope reading shine through, but rather the difference in the events enumerated in the context stories. In the stories that tested (25) the horses first successfully jump over an obstacle (i.e. the log) before they attempt the jump over the fence that only some of them manage. It is the presence of this early success in the story that proved to be the relevant factor, and not its explicit mention in the test sentences.8 One final argument against a grammatical deficit account of the Observation of Isomorphism comes from a priming study. Viau et al. (2010) tested 4-year-old children in a priming task with utterances like (26a) and (26b). One group of children received 6 instances of an utterance like (26a), while the second group of children received three such utterances preceded by three instances of utterances of the type illustrated in (26b). (26)

a. Every bug didn’t hide behind the tree.
b. Not every bug hid behind the tree.

For the first group, the proportion of inverse scope judgments was 22.25% for the first three utterances and 38.8% for the last three. In contrast, the proportion of inverse scope judgments for the first three utterances in the second group was 83.3% and 80.58% for the last three utterances. Viau et al. (2010) interpreted their findings to show that utterances with an unambiguous distributive scope facilitate the distributive inverse scope interpretation in ambiguous utterances. This is a sort of semantic priming effect where certain aspects of the meaning of an utterance prime the same aspect in an utterance which can optionally have that interpretation. Assuming that children are not able to attain readings that are beyond their grammatical competence, we may assume that priming successfully nudged their processor to consider the inverse scope reading that they seemed initially unable to consider.9

8 Interestingly, the children tested by Viau et al. (2010) were 4-year-olds. This age group acted differently in Conroy et al.’s (2009) experiment, which found 81% acceptance in stories with no early success event. We will come back to this point below.

9 The same effect was demonstrated for adults by Conroy (2008). She performed a series of experiments which established that priming occurs for the unexpected interpretation in both children and adults (inverse scope for children and surface scope for adults), but adults’ priming effects can be modulated according to immediately previous exposure. Although these results are interesting in their own right, as they reveal interesting aspects of semantic priming, ultimately, our lack of understanding of the underlying mechanisms of semantic priming makes it difficult to draw any firm conclusions with respect to scope interpretations.

In fact, the priming effect worked in a rather more subtle way too. Viau et al. (2010) also tested utterances with a universal subject and negation in stories with early success and no early success. First, they reconfirmed that if children heard the test sentences with three stories with early success, their performance was boosted compared to those children that heard the same stories without early success (50% vs. 25% inverse scope readings). But then all the children heard three stories with no early success. In these stories, the proportion of inverse scope reading was 80% for the children who had heard early success stories before, and remained 25% for those that heard no early success stories. This means that simple exposure to a story with a discourse setup that favors inverse scope boosted performance not only for the children that actually showed sensitivity to this discourse manipulation in the first three stories, but also for some of those children that did not reveal a sensitivity to the discourse manipulation earlier. This suggests that for some children the sensitivity was there, but they were not fast enough to integrate the discourse information to provide a matching judgment. Overall, based on the above studies we can conclude that children’s grammar is not deficient. They are able to perform the syntactic operation of quantifier raising and they can even use it to obtain inverse scope readings; they are just reluctant to do so in some experimental tasks. But their performance can be boosted by various pragmatic manipulations and by semantic priming.

3.6 The Parser Hypothesis

3.6.1 Evidence from TVJT tasks

Musolino and Lidz (2003) consider the proposition (discussed above) that both children and adults have an intrinsic preference for overt scope. This is what I termed the Parser Hypothesis (see Conroy 2008). Recall that Lidz and Musolino (2002) found that adults accessed the inverse scope reading 93% of the time for sentences like (23), repeated here for convenience, when the sentence was presented in a context that was compatible with the inverse scope reading and falsified the overt scope reading.

(23) The detective didn’t find two guys.

Musolino and Lidz (2003) tested the same sentences in contexts that were compatible with either reading. Their findings revealed that adults’ justification indicated an overt scope interpretation 75% of the time. This shows that although adults, unlike children, can access the inverse scope interpretation of such sentences, they nevertheless have a preference for the overt scope interpretation.

Similarly, in sentences with a numeral subject and negation, like (27), adults no longer showed the overwhelming ability to access the inverse scope reading that they demonstrated with sentences involving a universal subject (i.e. 100% acceptance for items like (21) above from Musolino et al. 2000). Using sentences like (27) in a context that favors the inverse scope reading and falsifies the overt scope reading, they found that adult participants only accepted such sentences 27.5% of the time, indicating a substantial reluctance to access the inverse scope reading (Musolino and Lidz 2003: 9).

(27) Two frogs didn’t jump over the rock.

Musolino and Lidz (2003) concluded that one way to explain these facts is if adults too have an intrinsic preference for overt scope in such sentences, although they are able to revise their initial parse to fit a context that favors the inverse scope reading, while children lack the processing resources to do so. But there are a couple of loose ends. First, adults’ performance on sentences with a numeral subject and negation like (27) differed from their performance on sentences like (23) with a numeral object and negation, while children’s performance was uniform on both. Lidz and Musolino (2002) propose that this may be because the sentences have different underlying grammatical mechanisms: A sentence like (23) involves quantifier raising of the object over negation, while (27) is more likely to involve reconstruction of the subject to a position under negation.10 Potentially, this explains the difference for adults, although, given the underlying assumption that children have the same grammatical capacity and parser preferences as adults, it is not clear how it follows that children’s performance is bad on both. At the same time, it is also important to note that adults perform differently on sentences involving negation and a universal subject, like (21), compared to sentences with negation and a numeral subject, like (27).
If both involve reconstruction, why do adults find the sentences with a numeral subject more difficult? Lidz and Musolino (2002) give a potential explanation. They note that one crucial difference between such sentences is the entailment relations that hold between the overt and inverse scope readings in the universal case but not the numeral one. They argued (see also Musolino and Lidz 2006) that for sentences with universal subjects the every > not reading is more efficiently expressed by an utterance such as None of the horses/no horse jumped over the fence, so the hearer can reason that if the speaker used Every horse didn’t jump over the fence then they are more likely to have meant the not > every reading, because otherwise they would have used the more efficient and unambiguous utterance No horse.../None of the horses.... In this way Lidz and Musolino (2002) attribute adults’ preference for the inverse scope reading to Gricean reasoning, while underlyingly they have access to both readings. Children, they hypothesize, may have limited processing resources that stop them from engaging with such Gricean reasoning (see also Reinhart 1995; 2006 for the same claim), hence the lack of preference for the not > every scope in their case.

10 But compare with Reinhart (2004), who argues that neither sentence has QR.

But there is one issue raised by this account. If adults show an inverse scope preference in sentences with universal subjects and negation as a result of Gricean reasoning, then the question arises as to why the same reasoning cannot be invoked to explain a potentially diametrically opposite preference. There is, of course, also a more efficient and unambiguous way to express the not > every reading, namely by using an utterance such as Not every horse jumped .... It is not clear why adults are sensitive to one potential alternative but not the other.

3.6.2 Evidence from IVT tasks

It is equally possible that this inverse-scope preference for adults is in fact not generalizable, and to some extent it is a special consequence of the truth-value judgment task. To explore this possibility, let us turn to a novel task, the Incremental Verification Task, designed by Conroy (2008). One crucial difference between the IVT and the TVJT used in the experiments described above is that the IVT does not involve a full-fledged discourse context. The IVT task invites participants to judge if a sentence is true in a picture as soon as they feel they have enough information to judge. The picture itself has four subparts which are each hidden under a cup. Participants can reveal a growing proportion of the picture by removing cups from left to right one after the other. An example item with a picture is given in (28).

(28) Every dog isn’t wearing a hat.
[Picture panels (a), (b), and (c)]

In this item the inverse scope reading can be verified after removing the first cup (28a), while all the cups must be removed for the verification of the overt scope reading (28c). Both readings are true in the picture, but one can still distinguish which reading a participant entertained by checking how many cups they removed to reach a decision. Participants entertained the overt scope reading (i.e. persisted to the last

cup) 77.1% of the time, with 14/22 participants doing so 100% of the time (Conroy 2008: 49). One might wonder whether perhaps participants favor the overt scope reading for some task-specific reason, for instance that they persist with removing cups until the reading that requires the largest amount of information can be verified. But this seems unlikely, given that Conroy also tested items like (29). In these items, the cows but not their possessions are all visible from the start. In such items, the overt scope reading can be verified (or more precisely, falsified) after the first cup is removed (see 29a), while two cups must be removed to verify the inverse scope reading (29b).

(29) Every cow doesn’t have a hat.
[Picture panels (a) and (b)]

In such trials, participants entertained the overt scope reading 63.4% of the time, with 10/20 participants entertaining it all the time. But we know that adults adhere to the inverse scope reading in truth-value judgment tasks where the inverse scope reading is associated with the “Yes” answer (e.g. 96.6% (Conroy 2008: 121); Musolino, Crain, and Thornton 2000, reported above around example (21); also Lidz and Musolino 2006). There are at least two important factors to consider when judging how this difference between the TVJT and the IVT occurs. First, the high proportion of “Yes” responses in the TVJT is likely to be boosted by what Crain and Thornton (1998: 212) call the “Principle of Charity,” which states that participants always respond “Yes” in a truth-value judgment task when the reading associated with the “Yes” answer is available to them. The same issue does not arise in the IVT, where there is no compelling reason that would cause participants to settle on a judgment early or late in the task. Recall that for sentences with negation and numeral objects, the truth-value judgment task

that associated the inverse scope reading with “Yes” and the overt scope reading with “No” yielded 93% inverse scope responses (Lidz and Musolino 2002), while the same task yielded 75% overt scope responses once both readings were associated with a “Yes” (Musolino and Lidz 2003). So, the effect attributable to the Principle of Charity is large indeed.11 To sum up, given Conroy’s findings using the IVT, it seems that there is in fact an early parsing bias for the overt scope in sentences with universal subjects and negation (77% vs. 63% in the two experiments). This would be in line with Tunstall’s (1998) proposal that the parser has an intrinsic overt scope preference, and our own Overt Scope Preference. But is this effect reproducible in other tasks, or is it perhaps some emergent effect that is the result of task effects associated with the IVT and other extralinguistic factors?

3.6.3 Evidence from forced-choice tasks

Conroy performed two further tasks to probe this question. She performed a sentence completion task and a speeded forced-choice task using the same stimulus discourse contexts that allow for a felicitous use of both scope readings, see (30). The sentence fragment for the sentence completion task is given in (31). The only difference in the speeded forced-choice task was that participants were instructed to choose one of the two pictures of the pig’s or the cow’s barns, and to do that as soon as possible, while there was no time pressure in the sentence completion task.

(30) Example context and image for sentence completion task and speeded forced-choice task
Here, there is a red, blue, and green dwarf, with their cans of spray-paint. The farmer has pink spray-paint. There is a barn that the cow lives in, and a barn that the pig lives in. It looks like the red and blue dwarves spray-painted the cow’s barn, but not the green dwarf. It doesn’t look like any of the dwarves spray-painted the pig’s barn, so the farmer finished the job.

11 Note also that one important difference between the IVT and the TVJT is that the former does not include a discourse context, while the latter does. In fact, in this light, Conroy’s (2008) IVT findings can also be interpreted as supporting Lidz and Musolino’s (2003; 2006) account based on Gricean reasoning. This is because it would make sense for the Gricean reasoning to apply in a full-fledged discourse but not in what one could describe as a situation of uncertainty, such as in the IVT. Chierchia, Crain, Guasti and Thornton (1998) argued that tasks that require a verificational judgment before the whole discourse situation is known force participants to take decisions in “prediction mode,” which has the effect of cancelling scalar implicatures. For instance, if someone told you that ‘There will be pizza or ice cream at the party’ before the party takes place and in the end there was both pizza and ice cream at the party, you would not say that they pronounced an untrue statement. But if they were to utter the sentence ‘There was pizza or ice cream at the party’ after the party had taken place, then you would think they are giving an imprecise account of what happened. This is because cancellation of the scalar implicature that provides the exclusive reading for or is justified in a situation of uncertainty, i.e. before the party, but not in a situation of full knowledge, i.e. after the party. Arguably, the IVT consists of such a situation of uncertainty, as the participant’s task is to verify a sentence in an unfolding situation (i.e. during the decision process more and more aspects of the situation are revealed). So, I speculate that this task would cancel scalar implicatures of the type that Lidz and Musolino (2003; 2006) hypothesized to boost inverse scope readings in the TVJT.

(31) Every dwarf didn’t spray-paint the barn that belongs to the ...

If participants entertain an overt scope reading of the universal subject and the negation, they will opt for the pig’s barn, as none of the dwarves painted that. If they entertain an inverse scope reading, they will choose the cow’s barn, as all but one of the dwarves painted that. (The stimuli were counterbalanced for physical positioning and temporal mention order effects.) In the non-speeded sentence completion task, 40% of the responses indicated an inverse scope reading (Conroy 2008: 93). But this was based on a bimodal distribution of half the participants never accessing the inverse scope reading and the other half accessing it 86.6% of the time (Conroy 2008: 93). We note that this should result in an overall average of 43.3% inverse scope, not 40%, so there must be a typo in the original text somewhere. In contrast, a significantly different result was obtained in the speeded forced-choice task where participants’ overall rate of inverse scope choice was 18.5% (Conroy 2008: 93). Conroy’s interpretation is that the results of the sentence completion task are in line with her previous findings in the IVT task: there is overall a mild preference for overt scope. But note that the preference was much less pronounced; in fact it could be as low as 56.7%. Let us also note that there was a bimodal distribution of some participants consistently going for overt scope and some consistently entertaining inverse scope. This weakens the conclusion that there is in fact an intrinsic parsing preference for such sentences. At the same time, the speeded task revealed that under pressure, participants are overwhelmingly more likely to settle

for the overt scope reading, which constitutes a fairly strong argument in favor of an overt-scope parsing preference.12 Conroy (2008) also performed a judgment task with unbiased contexts like the one in (32). Participants’ task was to answer a question, as in (33), indicating their scope judgment. This task was slightly different in terms of its discourse information structure, as here, unlike in the sentence completion task, the actual outcome of the story was not revealed in the context. The task was also more similar to a TVJT task in that a “Yes”/“No” answer was required from participants.

(32) There was a party at Farmer Jon’s farm. A bunny from Hillsdale, a bunny from Stonybrook, and a bunny from Camelot came. Farmer Jon offered carrots all around, but they were purple. He also had some cauliflower. Although the bunnies were all hungry, each one thought that purple carrots might not taste too good and considered eating the cauliflower instead. But, there was a lot more of the purple carrots and Farmer Jon kept saying how good they were. He really hoped that they would all try them. But in the end, every bunny didn’t eat a purple carrot. At the end of the day, the bunnies had a cool glass of celery juice to drink.

(33) Did some bunnies eat a carrot?

There were 12 target paragraphs. The results showed a strongly bimodal distribution with 8 out of 20 participants never obtaining an inverse scope interpretation, and 9 doing so all of the time. The obtained results were thus very similar to the non-speeded sentence completion task described above.
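The averaging behind the sentence completion figures discussed above (43.3% inverse scope and, correspondingly, 56.7% overt scope) can be checked with a short calculation. The following is a minimal Python sketch, assuming that the two subgroups of participants are exactly equal in size and contributed the same number of responses each; the function name `pooled_rate` is hypothetical and is not taken from Conroy (2008).

```python
def pooled_rate(groups):
    """Weighted mean of response rates over (weight, rate) pairs."""
    total_weight = sum(weight for weight, _ in groups)
    return sum(weight * rate for weight, rate in groups) / total_weight


# Assumption: half of the participants never chose inverse scope,
# the other half chose it 86.6% of the time (figures from the text).
inverse = pooled_rate([(0.5, 0.0), (0.5, 0.866)])
overt = 1.0 - inverse

print(f"inverse scope: {inverse:.1%}, overt scope: {overt:.1%}")
```

Under these assumptions the pooled inverse scope rate is 43.3% rather than the reported 40%, which leaves 56.7% for overt scope. If the subgroups were not equal in size, the weights would have to be adjusted accordingly.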

3.6.4 Taking stock

Conroy puts forward two different hypotheses to explain her data. Under the Parser Hypothesis, which posits an intrinsic parser preference for surface-scope reasoning, one could posit that adults would have a parsing preference for the surface scope at an early stage in the comprehension process, which is reflected in their results in the IVT task.

12 Conroy also performed the IVT task using numerals and negation (Conroy 2008: 58–63), using sentences like John didn’t find two hearts. Participants entertained the overt scope reading 47% of the time in those trials where the overt scope reading could be verified earlier, and entertained the overt scope reading in 40% of the trials where the inverse scope reading could be verified first. The difference between the two types of trial was not significant, and there were 8/22 participants across trials that only entertained the overt scope reading. Conroy concluded that participants only have an overt scope preference in this task with sentences involving universal subjects and negation, but not with numeral objects and negation. Let us compare this with results reviewed above using the truth-value judgment task. Musolino and Lidz (2003) found an overt scope preference (75%) in a task associating both readings with a “Yes” answer using similar sentences. They do not report individual data. But given Conroy’s report of a bimodal distribution and given that all these studies involve a relatively low number of participants (normally 20), it seems reasonable to conclude that a clear overt-scope bias has not been demonstrated overall for items involving negation and a numeral object. At the same time, one issue that certainly leaves room for thought is the consistently bimodal distribution of scope judgments found in these experiments.

This overt-scope preference is later revised to match the situational and discourse context. This would explain the adult preference for the inverse scope in the TVJT task. In a speeded task there is time pressure on the participant, which could arguably stop them from revising their interpretation, hence the predominantly overt scope response in that condition. Conroy herself notes that this hypothesis does not explain the difference between the bimodal distribution of responses found in the non-speeded sentence completion task and the unimodal inverse-scope pattern found in the TVJT. Both tasks are non-speeded and involve a full-fledged discourse context, although the sentence completion task makes both readings true, while the TVJT only makes the inverse scope reading true. There is, of course, one more difference between the two, namely that the Principle of Charity biases towards an inverse scope interpretation in the TVJT but not in the sentence completion task. In contrast, the Extra-Linguistic Hypothesis would posit that both the overt and the inverse scope readings are available to the parser, which does not have an intrinsic preference for either. Conroy claims that this easily explains the results of the sentence completion task. It does indeed do so, to the extent that both readings are manifested.13 Under the extra-linguistic hypothesis, any results that show an overt-scope bias or an inverse-scope bias need further explanation. We have already provided one for the TVJT results with sentences involving universal subjects and negation. As Lidz and Musolino (2002) explain, the two scopal readings of such sentences are in an asymmetric entailment relation so in a TVJT we can expect that adults (but not children) perform Gricean reasoning, favoring the inverse scope reading, even though they do not have an intrinsic parsing bias for it. But how to explain the overt-scope bias found in sentences with universal subjects and negation with the IVT?

13 I am not sure why this hypothesis would give rise to such a markedly bimodal distribution. The only reason I can think of is self-priming, but the effect seems too strong for that.

3.6.5 Two strong arguments against the Parser Hypothesis

MacDonald, Just, and Carpenter (1992) showed that adults with low word-span recall have difficulty comprehending syntactically complex ambiguous sentences. Carpenter et al. (1994) showed that concurrent load can be used to tax working memory resources even in people with normal or high span. Specifically, Waters, Caplan, and Hildebrandt (1987) found that articulatory suppression affected adults’ ability to process syntactically complex sentences. Listening to irrelevant speech impacts participants’ word span, indicating that it taxes the phonological loop (Colle and Welsh 1976). For this reason, Conroy (2008: 211) performed a reading task using non-biased contexts under a concurrent task taxing working memory resources. Conroy’s hypothesis was that if the Parser Hypothesis is on the right track, the concurrent load should influence parsing of scopally ambiguous sentences in non-biased contexts, leading to a higher proportion of overt scope readings. This is because, by hypothesis, overt scope is accessed first and inverse scope can be obtained as a result of subsequent revision which places a burden on the working memory resources of the parser. In contrast, if it turns out that concurrent load makes no difference to the proportion of overt scope readings obtained, that would question the validity of the Parser Hypothesis. Conroy’s 20 adult participants obtained an overt scope reading 53% of the time, which was not statistically significantly different from the rate of overt scope readings obtained in the baseline condition, without the concurrent task (Conroy 2008: 215). Thus, the findings revealed no effect of concurrent task, even though they pre-tested the task to show that it does indeed impact word span (p. 211). Conroy concluded that the results undermine the Parser Hypothesis. A final argument against the Parser Hypothesis is put forward by Conroy et al. (2009). They noted that the children who were susceptible to the discourse manipulation in Gualmini’s (2004) study were about a year younger than the children tested in Musolino and Lidz (2006). (There were also other differences; for instance, Gualmini tested negation and numeral objects, while Musolino and Lidz tested universal subjects and negation.) Conroy et al. tested 15 4.5-year-olds (4;5–5;2, mean 4;9) and 15 5-year-olds (5;3–5;7, mean 5;4) and 12 adults in a truth-value judgment task using test items like (34) (see Figure 3.1 for the outcome of the story). As shown in the figure from Conroy et al. (2009), the inverse scope reading was true in the story, while the overt scope reading was false.

(34) Every cat didn’t hide behind the sofa.

Fig. 3.1 Outcome of example test story from Conroy et al.’s (2009) TVJT


Adults accepted the inverse scope reading 76% of the time, which is lower than in other similar TVJTs. The authors do not have an explanation for this unexpected finding. The 4.5-year-olds accepted the inverse scope reading 81% of the time. In contrast, the 5-year-olds accepted the inverse scope reading 44% of the time, which was marginally significantly different from the rate of inverse scope readings obtained by adults and significantly different from that of the 4.5-year-olds. The distribution of the 5-year-old participants’ data was bimodal, with 7 out of 15 children never accessing the inverse scope reading. (It follows that the remaining 8 children must have accepted the inverse scope reading 82.5% of the time.) Conroy et al. (2009: 13) offer an account for what they term the “fleeting isomorphism effect” of 5-year-olds: “First, because younger children can obtain the inverse scope interpretation, and presumably younger children do not have more parsing resources than older children, we conclude that the isomorphism effect in five-year olds cannot be due to immaturity of the sentence parser, as claimed in Lidz and Musolino (2002).” They also “conclude that the isomorphism effect does not solely derive from a failure to experimentally meet felicity conditions” (Conroy et al. 2009: 13), contrary to Gualmini et al.’s (2008) conclusions. Rather, they propose (p. 13) that “children adhere to a U-shaped development in the domain of scope ambiguity resolution.” The idea, which is set out in more detail in Conroy’s (2008) work, is that at the early stages (i.e. until age 4.5) children have non-adult-like parsers with an inverse scope preference. The reason is claimed to be that children aim to “mimick the inverse scope interpretations observed in the input” (Conroy 2008: 146).
Later on, at age 5, they acquire an adult-like parser, but at this point, they are still assumed to be “lacking the ability to revise their interpretation according to discourse information” (Conroy 2008: 146), resulting in an inability to revise their initial interpretation. Only when they are able to appropriately integrate discourse information will they reach the end of the U-shaped curve, and behave in an adult-like way.

A couple of things to note with respect to this explanation are the following. First, the evidence that adults entertain an overwhelmingly higher proportion of inverse scope readings with sentences involving a universal subject and negation rests on a small dataset informally collected by Musolino. There are no formal corpus studies based on large spoken or written corpora to back up this assumption. Second, Conroy (2008: 201) herself argues that many examples collected “in the wild” actually cannot be properly classified, as the discourse context in which they appeared did not convincingly disambiguate the two scope readings. One wonders then how children are supposed to perform this task to arrive at a parser strategy that mimics the adult proportion of inverse scope readings.

Nevertheless, Zhou and Crain’s (2009) findings are important to mention here. They tested Mandarin equivalents of sentences with universal subjects and negation in an early success context TVJT. Mandarin is scopally rigid like Japanese. Zhou and Crain’s findings patterned very similarly to those of Conroy et al. (2009). They found that the children whose age range was 3;4–4;3 accepted an inverse scope reading 89% of the time, while the older children, aged 4;5–5;11, did so only 10% of the time, with adults never accepting the inverse scope reading. In the Mandarin data there is, of course, no U-shaped pattern, given that adults disallow inverse scope in such sentences. But it is interesting to note that the drop in inverse scope readings for Mandarin children seems to occur at the same age, around the end of the 4th year of life, as for the English-speaking children. Could it be that (some) English children are briefly experimenting with a rigid-scope parameter at age 5?

So, overall, there seems to be evidence that 5-year-olds display a fleeting isomorphism effect in truth-value judgment tasks, displaying what looks like a U-shaped developmental curve. Future research should establish whether this effect is general in the sense that it is replicable using other tasks, and in the sense that other doubly-quantified constructions also display the effect. It would also be interesting to see data from different languages.14

More generally, the evidence, although mixed, ultimately comes down against the Parser Hypothesis. First, we have seen that an inverse-scope bias was found in many adult TVJTs. Comparing these to forced-choice tasks and IVTs, however, strongly suggests that the over-acceptance of inverse scope is a task effect, due to the Principle of Charity. The results from the forced-choice and question–answer tasks revealed no adult preference for overt scope. This argues against the Parser Hypothesis, as does the fact that the so-called Isomorphism Effect in children turns out to be a fleeting one. Perhaps the strongest argument against the Parser Hypothesis is the lack of sensitivity of scope assignment to increased working memory load.
14 I would like to note two aspects of the Conroy et al. (2009) task that in my view would merit further investigation. First, unlike in many previous tasks which had 4 test stories, here 6 test stories were used. This could have partly boosted the effect due to the fact that self-priming has been demonstrated to play a role with scopal judgments (see Viau et al. 2010, reviewed above). In addition, 2 warm-up stories were administered and 2 filler stories, all of which had the same event structure as the target stories: Three characters first failed to perform a particular task, then two out of three proceeded to succeed in a different task, while the third character failed to do so. This is important for two reasons. First, in contrast to Viau et al.’s (2010) and Lidz and Musolino’s (2006) experiments, child participants did not have the discourse advantage of Early Success in this experiment. Nevertheless even 5-year-olds strongly outperformed participants from those studies in comparable No Early Success stories (i.e. 15% inverse scope rate for 5-year-olds in Lidz and Musolino (2006); 22.5% in Viau et al. 2010). The uniformity of the stories is also relevant from the perspective of verificational strategies. In all of the stories, an existential verification strategy is a fruitful one (i.e. ‘Find a guy who did/didn’t X’). This could well be the reason why performance was boosted compared to the other studies. The inverse scope reading requires an existential verification strategy. But training the children to perform existential rather than universal verification would have allowed the possibility that some children chose to apply an existential falsification strategy to the overt scope reading. Of course, this does not in any way provide an answer to the intriguing question of why about half of the 5-year-olds would have decided to do so while no adult or 4-year-old did.

If we are to abandon the Parser Hypothesis, then the findings that will need to be accounted for are the overt-scope preference found in the IVT, the overwhelming overt-scope preference in the speeded forced-choice task and, last but not least, the Isomorphism Effect found in children. However fleeting it is, it needs an explanation. In the next section we will review some extra-linguistic factors discussed in the literature that may be helpful in this endeavor.
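The subgroup figure given earlier in this section for the 5-year-olds in Conroy et al. (2009) can be recovered with a short calculation. The sketch below assumes, as the source does not state explicitly, that every child contributed the same number of trials; the symbols $n_0$, $n_1$, $p_1$, and $\bar{p}$ are introduced here for illustration only. With $n_0 = 7$ children who never accepted the inverse scope reading, $n_1 = 8$ remaining children accepting at rate $p_1$, and a group mean of $\bar{p} = 0.44$:

```latex
\[
\bar{p} \;=\; \frac{n_0 \cdot 0 + n_1 \cdot p_1}{n_0 + n_1}
\quad\Longrightarrow\quad
p_1 \;=\; \frac{(n_0 + n_1)\,\bar{p}}{n_1}
\;=\; \frac{15 \times 0.44}{8}
\;=\; \frac{6.6}{8}
\;=\; 0.825 .
\]
```

Since the reported 44% is presumably itself a rounded figure, the resulting 82.5% is best read as approximate.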

3.7 The Extra-Linguistic Hypothesis


Given the lack of overwhelming evidence for an intrinsic parsing preference for utterances with a quantifier and negation, let us explore why the appearance of such a preference might nevertheless show up in certain tasks.

3.7.1 Verification and falsification

Conroy offers a possible explanation for the overt scope bias found in the IVT with sentences involving a universal subject and a negation. Assuming that there is no real parsing advantage, she proposes that the overt scope reading is favored in such examples because of the way participants perform the process of verification for the two readings. She explains that in an utterance involving a universal, say ‘Every snail has antennae’, people could choose to verify if the sentence is true, i.e. check whether every snail has antennae, or try falsifying it, i.e. check whether there is a snail without antennae. The first option involves a universal verification procedure (i.e. checking every snail), while the second involves an existential one (i.e. find one snail such that ...). Conroy goes on to argue that both falsification and existential procedures are harder for the human parser than verification and universal procedures. Now take a sentence with a universal subject and a negation such as ‘Every snail doesn’t have antennae’. In such a situation, participants have a choice between two readings. In the overt scope reading, a verification procedure is available which is also a universal one: One needs to check that every snail is without antennae. If so, the sentence is true. The inverse scope reading, however, does not have a verification procedure that is a universal one. One can either use an existential verification procedure, i.e. check if there is a snail without antennae, or one can use a universal falsification procedure, i.e. check every snail as to whether it has antennae. If they all have antennae, then the sentence is falsified. Given that participants did not persist to the last cup in the IVT when their True/False response indicated an inverse scope reading, Conroy concludes that people prefer verification procedures even if they are existential, compared to a falsification procedure, even if that is a universal one. But this means, Conroy argues, that perhaps the reason why adults opt for the overt scope reading in the IVT is that an easy universal verification procedure is available for this reading, while an existential one must be used for the inverse scope reading.15
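The difference between universal and existential procedures can be made concrete with a small sketch. The Python below is an illustration only, not Conroy's own procedure: the `Snail` record, the function names, and the step counting are assumptions introduced here. It evaluates ‘Every snail doesn’t have antennae’ over a list of snails inspected one at a time, and returns for each procedure the truth value reached and the number of snails that had to be inspected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snail:
    """Hypothetical item to be inspected, one at a time."""
    has_antennae: bool


def universal_check(items, pred):
    """Check that pred holds of every item; stop at the first counterexample.
    Returns (result, number of items inspected)."""
    steps = 0
    for item in items:
        steps += 1
        if not pred(item):
            return False, steps
    return True, steps


def existential_check(items, pred):
    """Look for one item of which pred holds; stop at the first witness.
    Returns (result, number of items inspected)."""
    steps = 0
    for item in items:
        steps += 1
        if pred(item):
            return True, steps
    return False, steps


def lacks_antennae(snail):
    return not snail.has_antennae


def has_antennae(snail):
    return snail.has_antennae


def overt_scope_verify(snails):
    """every > not: universal verification (every snail lacks antennae)."""
    return universal_check(snails, lacks_antennae)


def inverse_scope_verify(snails):
    """not > every: existential verification (some snail lacks antennae)."""
    return existential_check(snails, lacks_antennae)


def inverse_scope_falsify(snails):
    """not > every: universal falsification. If every snail has antennae,
    the sentence is false; a single snail without antennae makes it true."""
    all_have, steps = universal_check(snails, has_antennae)
    return (not all_have), steps


mixed = [Snail(True), Snail(False), Snail(True)]
print(overt_scope_verify(mixed))     # truth value and steps under overt scope
print(inverse_scope_verify(mixed))   # truth value and steps under inverse scope
print(inverse_scope_falsify(mixed))  # same outcome as existential verification
```

In this sketch the two inverse-scope procedures always return the same truth value after the same number of inspections; they differ only in which outcome the search is framed as confirming. A true inverse-scope response can be reached without inspecting the last item, which matches the observation that participants did not persist to the last cup.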

15 Conroy (2008) reasons that the verification account would predict that sentences with an existential subject and a universal object such as Anderson’s (2004) ‘A climber scaled every cliff’ should give rise to an inverse scope preference, contrary to fact, as in this case it is the inverse scope reading that has a universal verification process (i.e. check for every cliff whether a climber scaled it), while the overt scope reading needs an existential verification process (i.e. check to see if there is a climber such that they scaled every cliff). Conroy proposes that the overt scope preference found by Anderson (2004) was in fact due to the topicality of the subject interfering by licensing wide scope for the existential subject. But this fails to explain Tunstall’s (1998) findings where topicality as a factor was eliminated. However, I am not convinced that Anderson’s reading tasks actually present a verification problem in the first place. It is possible that the test item is not actually verified (i.e. matched to context to determine its truth or falsity) in the relevant sense. So, no verificational advantage for the overt scope reading is actually relevant here.

3.7.2 The Semantic Subset Principle

Another extra-linguistic factor that is discussed extensively by Goro (2007) and Crain (2012) concerns the effect of the Language Acquisition Device, or more precisely the Semantic Subset Principle, on scopal readings in children. These researchers studied sentences in which a downward-entailing operator, such as negation, combines with disjunction. Due to what is called De Morgan’s Law, in logic, a negated disjunction is equivalent to the conjunction of the negated disjuncts:

(35) ¬(A ∨ B) = ¬A ∧ ¬B

As (36) illustrates, this is in fact true in adult English as well. (36) is true in any situation where John brought neither beer nor wine, and false otherwise.

(36) John didn’t bring beer or wine to the party.

But, interestingly, the same type of utterances in Japanese or Mandarin has different truth conditions. (37) and (38), the latter a direct translation of (36), are true on the ‘not both’ reading.

(37) John-wa supeingo ka furansugo-o hanasa-nai
     John-TOP Spanish or French-ACC speak-NEG
     ‘John doesn’t speak Spanish OR he doesn’t speak French.’ (Goro 2007: 188, ex. 222)

(38) (Wo cai) Yuehan meiyou dai pijiu huozhe hongjiu qu jiuhui.
     (I guess) John not bring beer or wine go party.
     ‘It’s either beer or wine that John did not bring to the party’ (Crain 2012: 149, ex. 107)

Goro (2007) and Crain (2012) show that this is not the result of De Morgan’s Law not holding in the language or the logical connectives having different truth conditions. Rather, the readings arise because in Japanese and Mandarin the disjunction takes scope over the negation, so De Morgan’s Law does not apply. Thus, the LF for the utterances in (37) and (38) is as in (39a and b), respectively.

(39) a. [supeingo ka furansugo-o]_i John-wa t_i hanasa-nai
     b. [pijiu huozhe hongjiu]_i Yuehan meiyou dai t_i qu jiuhui.
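De Morgan’s Law in (35), and the way the two scope orders of negation and disjunction come apart, can be checked mechanically over all four combinations of truth values. The following Python sketch is illustrative only; the function names are hypothetical labels for the readings discussed in the text, with `a` standing for ‘John brought beer’ and `b` for ‘John brought wine’.

```python
from itertools import product


def neg_over_or(a, b):
    """Negation scopes over disjunction: not (A or B), the 'neither' reading."""
    return not (a or b)


def conj_of_negs(a, b):
    """Right-hand side of De Morgan's Law: (not A) and (not B)."""
    return (not a) and (not b)


def or_over_neg(a, b):
    """Disjunction scopes over negation: (not A) or (not B), the 'not both' reading."""
    return (not a) or (not b)


rows = list(product([True, False], repeat=2))

# De Morgan's Law holds if both sides agree on every valuation.
de_morgan_holds = all(neg_over_or(a, b) == conj_of_negs(a, b) for a, b in rows)

# Valuations on which the two scope readings disagree.
differing = [(a, b) for a, b in rows if neg_over_or(a, b) != or_over_neg(a, b)]

print("De Morgan's Law holds:", de_morgan_holds)
print("beer   wine   neither  not-both")
for a, b in rows:
    print(f"{a!s:6} {b!s:6} {neg_over_or(a, b)!s:8} {or_over_neg(a, b)!s}")
print("Readings differ when (beer, wine) =", differing)
```

The two readings disagree exactly in the situations where John brought one of the two drinks but not the other, which is where the English-type and the Japanese- or Mandarin-type judgments diverge.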


Goro and Akiba (2004) tested 30 children aged 3–6 (mean: 5;3) using the prediction mode of the truth-value judgment task. In the story, there were twelve animals. Each animal was asked, in turn, if it was happy to eat two vegetables, a carrot and a green pepper. The child participants were asked to give the animals rewards as follows: if an animal ate both vegetables it was supposed to receive a gold medal; if it ate one of the two vegetables it got a blue medal. If the animal ate neither vegetable, it received a black cross, which is a symbol in Japanese culture for failure with which the children were familiar. After the rewards were given out, the puppet uttered the test sentences as a guess. An example is given in (40).

(40) The pig didn’t eat the pepper or the carrot.

The critical trials were those where the animal in question had a blue medal. In such trials, as expected, adult controls accepted the test sentence 100% of the time, while children rejected it 75% of the time. In fact, four children were adult-like, and once their responses were removed the rejection rate jumped to 87% for the remaining 26 children. Children’s justifications revealed an overt scope interpretation, as they interpreted (40) to mean that the pig ate neither pepper nor carrot. Thus, their judgments matched those of English adults (and children) and not those of Japanese adults. A similar study was reported by Crain (2012) with Mandarin children and adults, with the same findings. Goro (2007) and Crain (2012) argue that the reason children have an overt scope reading in sentences involving a negation and a disjunction in the object is not a general bias towards an overt scope interpretation. Rather, they propose that the Language Acquisition Device helps them avoid a learnability problem. The specific problem is that the two possible readings are in an asymmetric entailment relation.
The situations where the ‘neither’ reading is true are a proper subset of the set of situations where the ‘not both’ reading is true. As a result, if a child were to initially assume the ‘not both’ reading, they would run into a learnability problem, given the lack of negative evidence in child language acquisition. Since adults around them might have a grammar that assigns the ‘neither’ reading (e.g. English) to these operators, such children would never receive positive evidence that would lead them to revise their over-permissive grammar. In contrast, if the LAD ensures that children always start out with the subset grammar, the one with the stronger reading (i.e. the ‘neither’ reading), they will eventually run into positive evidence that would push them to revise their grammar if they happen to acquire a language where adults use such sentences in the ‘not both’ sense, such as Mandarin and Japanese. So, Goro (2007) and Crain (2012) demonstrated that in particular cases of linguistic ambiguity where one reading asymmetrically entails the other, children are expected to entertain the reading that is true in a smaller set of possible situations, i.e. the subset grammar, so as to avoid having to rely on negative evidence to revise their grammar. This is yet another case of an extra-linguistic factor, specifically a factor associated with the Language Acquisition Device, guiding children’s interpretation of scopally ambiguous utterances.16
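The learnability argument behind the Semantic Subset Principle can be sketched as a toy simulation. The code below is a deliberately simplified illustration and not a model proposed by Goro (2007) or Crain (2012): the `learn` function, the two-reading hypothesis space, and the revision rule are all assumptions made here. Situations are pairs (ate_pepper, ate_carrot); the learner only ever observes the test sentence used truthfully by adults (positive evidence) and revises only when an observed use is false under its current hypothesis.

```python
from itertools import product

# All four possible situations: (ate_pepper, ate_carrot).
SITUATIONS = set(product([True, False], repeat=2))

# Truth sets of the two readings of "The pig didn't eat the pepper or the carrot".
READINGS = {
    "neither": {s for s in SITUATIONS if not (s[0] or s[1])},
    "not_both": {s for s in SITUATIONS if (not s[0]) or (not s[1])},
}


def learn(initial, target, readings=READINGS):
    """Toy learner (hypothetical): observes the sentence used truthfully in
    each situation the adult (target) reading allows. It revises only when an
    observed use is false under its current hypothesis, and then adopts the
    first reading compatible with that observation."""
    hypothesis = initial
    for situation in sorted(readings[target]):
        if situation not in readings[hypothesis]:
            hypothesis = next(
                name for name, truth_set in readings.items()
                if situation in truth_set
            )
    return hypothesis


# The 'neither' situations form a proper subset of the 'not both' situations.
print(READINGS["neither"] < READINGS["not_both"])

# Starting from the subset reading, positive evidence forces revision.
print(learn(initial="neither", target="not_both"))

# Starting from the superset reading, no positive evidence ever triggers
# revision, so the learner never reaches the adult 'neither' grammar.
print(learn(initial="not_both", target="neither"))
```

The asymmetry between the last two calls is the point of the argument: only a learner who starts from the stronger ‘neither’ reading can reach either adult grammar from positive evidence alone.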

3.8 Conclusions


Overall, let us try to make some helpful generalizations based on a wealth of data involving both existential–universal combinations and interactions of quantifiers with negation, in both adults and children. The evidence is complex, but it seems to me that scopal freedom is the default unless quantifier raising is involved, which happens only in a very restricted set of cases. The appearance of an overt-scope preference in the other cases (i.e. with sentences involving negation and a quantifier), I would like to suggest, is more likely an illusion. Scopal readings are evidently very sensitive to different task effects: We have witnessed a high acceptance rate for inverse scope in the truth-value judgment task, at least for adults, but also, it turns out, for most children, except 5-year-olds. We have also seen that the Incremental Verification Task gives rise to an overt-scope preference. However, at the moment we can only reach this conclusion for adults, as the IVT has not been performed with children yet. In addition, scopal readings seem to be easily influenced by priming and even self-priming, making it even harder to pin down any intrinsic preferences for one scopal reading or the other. But forced-choice tasks and unbiased context question–answer tasks revealed scopal freedom, at least for adults. On a theoretical level, we may note that there was not necessarily any theoretical reason to expect an overt scope preference in sentences involving negation and a quantifier in the first place. Reinhart (1997; 2006) argues that such sentences only ever involve quantifier raising if a universal quantifier is c-commanded by negation (i.e. in object position). All the other cases, including all the test items reviewed in this section, involve optional reconstruction of the subject under negation, or indefinites or numerals taking wide scope over negation.
Reinhart argued that the mechanism for wide scope for indefinites is different from quantifier raising, as it is not island sensitive. If this is all on the right track, then we can return to the data reviewed in Section 3.2 involving existentials and universals, which, if they occur in this order in surface syntax, and only then, involve quantifier raising. It seems, then, that the psycholinguistic evidence supports the assumption of a distinct mechanism for such sentences, i.e. quantifier raising, as in the case of such sentences, we found evidence of extra processing load and an overt scope preference for adults. Note, however, that the same preference was not present for children in the limited evidence available.

In this chapter, I hope to have provided readers with a helpful overview of a very interesting and growing area of psycholinguistics and language acquisition: interpretive ambiguities arising from scopal interactions. I hope to have demonstrated that this is an area where psycholinguistic evidence and evidence from language development can be directly relevant for theoretical analyses of the phenomena. I have also shown that this area can pose serious methodological challenges. I hope that the readers of this chapter feel well motivated and better equipped to tackle these research questions in future work.

Future research should target sentences with potential quantifier raising more specifically, to find out if the interesting contrast found (adults: processing cost, overt scope preference; children: no preference) holds more generally, or is specific to sentences with existential subjects and universal objects. Cross-linguistic studies should also be encouraged, as they help distinguish cognitive and grammatical factors. Finally, experiments with methodological clarity and ones that test a wide age range promise to be useful to further our understanding, but then again they always are.

16 Interestingly, as we already mentioned above in a different context, Crain and Hamburger (1992) argued that adults will often have the opposite “strategy.” In situations where the discourse context is too poor to guide the resolution of a particular ambiguity, adults often adopt the weaker reading, the one which is true in a larger set of circumstances. This is a cooperative discourse move, as in this case, the speaker who uttered the original ambiguous utterance is assumed to be committed to a weaker statement. In the course of the subsequent discourse, the hearer will have a chance to clarify if the stronger reading was in fact intended. This strategy has precisely the opposite outcome compared to that of the subset principle in children. One example of this effect is the default focal interpretation out of context for utterances with only in a sentence where only occupies a VP-adjoined position such as ‘Peter only gave a book to Sue.’ Here adults have a default interpretation where only associates with the indirect object (i.e. Peter didn’t give a book to anyone else). 4–5-year-old children, in contrast, prefer a VP-focus interpretation (i.e. Peter didn’t do anything else). See Szendrői (2004) for details.

References

Altmann, G., and M. Steedman. 1988. Interaction with context during human sentence processing. Cognition 30: 191–238.
Anderson, Catherine. 2004. The structure and real-time comprehension of quantifier scope ambiguity. Dissertation, Northwestern University.
Bobaljik, Jonathan D., and Susi Wurmbrand. 2012. Word order and scope: Transparent interfaces and the 3/4 signature. Linguistic Inquiry 43(3): 371–421.
Catlin, J., and D. L. Micham. 1975. Semantic representations as procedures for verification. Journal of Psycholinguistic Research 4(3): 209–225.
Chierchia, Gennaro, Stephen Crain, Maria Teresa Guasti, and Rosalind Thornton. 1998. ‘Some’ and ‘or’: A study on the emergence of logical form. In Proceedings of the 22nd Boston University Conference on Language Development, 97–108. Somerville, MA: Cascadilla Press.
Conroy, Anastasia. 2008. The role of verification strategies in semantic ambiguity resolution in children and adults. Dissertation, University of Maryland.
Conroy, Anastasia, Eri Takahashi, Jeffrey Lidz, and Colin Phillips. 2009. Equal treatment for all antecedents: How children succeed with Principle B. Linguistic Inquiry 40(3): 446–486.
Crain, S. 2012. The Emergence of Meaning. Cambridge: Cambridge University Press.
Crain, S., and H. Hamburger. 1992. Semantic knowledge and NP modification. In R. Levine (ed), Formal grammar: Theory and interpretation, vol. 2, 372–401. Vancouver: University of British Columbia Press.
Crain, S., and C. McKee. 1985. The acquisition of structural restrictions on anaphora. In Proceedings of NELS 16. Amherst, MA: GLSA, 94–111.


Crain, S., W. Ni, and L. Conway. 1994. Learning, parsing, and modularity. In C. Clifton, Jr., L. Frazier, and K. Rayner (eds), Perspectives on sentence processing, 443–466. Brighton: Psychology Press.
Crain, S., and M. Steedman. 1985. On not being led up the garden path: The use of context by the psychological syntax processor. In D. R. Dowty, L. Karttunen, and A. M. Zwicky (eds), Natural language parsing: Psychological, computational and theoretical perspectives, 320–358. Cambridge: Cambridge University Press.
Crain, S., and R. Thornton. 1998. Investigations in Universal Grammar. Cambridge, MA: MIT Press.
Fiengo, R., and R. May. 1994. Indices and identity. Cambridge, MA: MIT Press.
Fodor, J. D. 1982. The mental representation of quantifiers. In S. Peters and E. Saarinen (eds), Processes, beliefs and questions, 129–164. Dordrecht: Reidel.
Goro, Takuya. 2007. Language-specific constraints on scope interpretation in first language acquisition. PhD dissertation, University of Maryland.
Goro, T., and S. Akiba. 2004. The acquisition of disjunction and positive polarity in Japanese. In V. Chand, A. Kelleher, A. J. Rodríguez, and B. Schmeiser (eds), WCCFL 23: Proceedings of the 23rd West Coast Conference on Formal Linguistics, 251–264. Somerville, MA: Cascadilla Press.
Gualmini, Andrea. 2004. Some knowledge children don’t lack. Linguistics 42(5): 957–982.
Gualmini, A., Sarah Hulsey, Valentine Hacquard, and Danny Fox. 2008. The question–answer requirement for scope assignment. Natural Language Semantics 16(3): 205–237.
Hirschbühler, P. 1982. VP deletion and across-the-board quantifier scope. In J. Pustejovsky and P. Sells (eds), Proceedings of NELS 12, art. 11. Amherst, MA: GLSA.
Hulsey, S., V. Hacquard, D. Fox, and A. Gualmini. 2004. The Question–Answer requirement and scope assignment. In A. Csirmaz, A. Gualmini and A. Nevins (eds), Plato’s Problem: Problems in Language Acquisition, 71–90. Cambridge, MA: MITWPL.
Ioup, G. 1975. Some universals for quantifier scope. In J. P. Kimball (ed.), Syntax and semantics 4, 37–58. New York: Academic Press.
Kempson, R. M., and A. Cormack. 1981. Ambiguity and quantification. Linguistics and Philosophy 4: 259–310.
Kroch, Anthony S. 1974. The semantics of scope in English. Doctoral Dissertation, Massachusetts Institute of Technology. New York: Garland Publishing, 1979.
Kurtzman, H. S., and M. C. MacDonald. 1993. Resolution of quantifier scope ambiguities. Cognition 48: 243–279.
Lidz, J., and J. Musolino. 2002. Children’s command of quantification. Cognition 84: 113–154.
MacDonald, M. C., M. A. Just, and P. A. Carpenter. 1992. Working memory constraints on the processing of syntactic ambiguity. Cognitive Psychology 24(1): 56–98.
May, Robert C. 1977. The grammar of quantification. Doctoral Dissertation, Massachusetts Institute of Technology. Reproduced by the Indiana University Linguistics Club, 1982.
Merchant, J. 2000. Antecedent-contained deletion in negative polarity items. Syntax 3: 144–150.
Micham, D. L., J. Catlin, N. J. VanDerveer, and K. A. Loveland. 1980. Lexical and structural cues to quantifier scope relations. Journal of Psycholinguistic Research 9(4): 367–377.
Musolino, J. 1998. Universal Grammar and the acquisition of semantic knowledge: An experimental investigation into the acquisition of quantifier-negation interaction in English. PhD dissertation, University of Maryland at College Park.


Musolino, J., S. Crain, and R. Thornton. 2000. Navigating negative quantificational space. Linguistics 38(1): 1–32.
Musolino, J., and J. Lidz. 2003. The scope of isomorphism: turning adults into children. Language Acquisition 11: 277–291.
Musolino, J., and J. Lidz. 2006. Why children aren’t universally successful with quantification. Linguistics 44: 817–852.
Noveck, I. 2001. When children are more logical than adults: Experimental investigations on scalar implicatures. Cognition 78: 165–188.
Papafragou, A., and J. Musolino. 2003. Scalar implicatures at the semantics/pragmatics interface. Cognition 80: 253–282.
Pinker, S. 1984. Language learnability and language development. Cambridge, MA: MIT Press.
Reinhart, T. 1983. Anaphora and semantic interpretation. London: Croom Helm.
Reinhart, T. 1995. Interface strategies. OTS Working papers in Linguistics. Utrecht: OTS.
Reinhart, T. 1997. Quantifier scope: How labor is divided between QR and choice functions. Linguistics and Philosophy 20: 335–397.
Reinhart, T. 1999. The processing cost of reference set computation: Guess patterns in acquisition. OTS Working Papers in Linguistics, 99–001–CL/TL, Utrecht University.
Reinhart, Tanya. 2004. The processing cost of reference set computation: acquisition of stress shift and focus. Language Acquisition 12(2): 109–155.
Reinhart, Tanya. 2006. Interface strategies: Optimal and costly computations. Cambridge, MA: MIT Press.
Steedman, M. 2000. The syntactic process. Cambridge, MA: MIT Press.
Syrett, Kristen, and Jeffrey Lidz. 2005. Children want to access every interpretation adults do. In Leah Bateman and Cherlon Ussery (eds), Proceedings of the 35th annual meeting of the North East Linguistic Society, 591–602. Charleston, SC: Booksurge.
Syrett, Kristen, Georgia Simon, and Kirsten Nisula. 2014. Prosodic disambiguation of scopally ambiguous quantificational sentences in a discourse context. Journal of Linguistics 50: 453–493.
Szendrői, Kriszta. 2004. Acquisition evidence for an interface theory of focus. In Jacqueline van Kampen and Sergio Baauw (eds), Proceedings of Generative Approaches to Language Acquisition 2003, 457–468. Utrecht: LOT.
Szendrői, K., R. Schumacher, T. Fritzsche, and B. Höhle. 2017. Acquisition of quantifier raising of a universal across an existential: Evidence from German. Glossa: a journal of general linguistics 2(1): 46.
Tunstall, S. 1998. The interpretation of quantifiers: Semantics and processing. Doctoral dissertation, University of Massachusetts, Amherst.
Vanlehn, K. A. 1978. Determining the scope of English quantifiers. MIT Artificial Intelligence Lab Report.
Viau, Joshua, Jeffrey Lidz, and Julien Musolino. 2010. Priming of abstract logical representations in 4-year-olds. Language Acquisition 17(1): 26–50.
Waters, G., D. Caplan, and N. Hildebrandt. 1987. Working memory and written sentence comprehension. In M. Coltheart (ed), Attention and Performance XII, 531–555. Hillsdale, NJ: Lawrence Erlbaum.
Zhou, P., and S. Crain. 2009. Scope assignment in child language: Evidence from the acquisition of Chinese. Lingua 119: 973–988.

Chapter 4

Experimental syntax and linguistic fieldwork

maria polinsky

4.1 The fieldworker’s backpack and the experimenter’s lab coat


Linguistic fieldwork is research conducted on a language that the linguist does not speak natively, through the collection of primary language data gathered in interaction with native-speaking consultants (Chelliah and de Reuse 2011: 7; Bochnak and Matthewson 2015: 2; Bowern 2008). Experimental fieldwork is simply experimental work conducted in a natural setting (the location where a given language is spoken), rather than in the researcher’s lab. Although nothing in this definition requires that the work be conducted on a language spoken in a remote or poorly accessible setting, or an endangered language, or an understudied language, one or all of these extra conditions are often implicit in our understanding of linguistic fieldwork—and these assumptions can get in the way of planning potential experimental work. It is common to contrast linguistic fieldwork with lab-based experimental work on language, but the two are less different than they seem. Both lines of inquiry work to understand the mental representation of language by a native speaker; both are guided by testable hypotheses; both are designed to evaluate predictions based on theoretical considerations; both conduct those evaluations by constructing minimal contrasts; and both deal with variation within and across language users. Fieldwork essentially consists of tiny experiments that are fine-tuned in situ based on consultant feedback. In short, there is no irreconcilable difference between the fieldwork culture and the culture of laboratory linguistics—particularly experimental syntax.


However, differences between the two do exist. I will address the main points of divergence: the baseline data used, the nature and role of the language consultant, and the degree of a researcher’s involvement in the language community. Experimental syntax typically relies on already-established data and uses established syntactic analyses as the springboard, whereas fieldwork relies on primary data that is collected and analyzed by the same group of researchers. Since experimental syntax relies on existing analyses, it has mainly dealt with well-described and thoroughly analyzed languages. But it is helpful to remember that deep analyses of such languages also started with introspection and conversations between linguists as to whether structure X was possible, and if not, why. A fieldworker working on a lesser-studied language needs to collect the primary data first, and such collection involves both natural production and targeted elicitation; this collection is essentially an experiment, in the broad sense of that word. Reductively, then, the fundamental difference between lab work and fieldwork is that the fieldworker needs to do more preliminary work before s/he can start an experiment in the narrow sense. While experimental research on languages is viewed as modeling grammars, fieldwork is often associated with descriptive work (within the latter, work on endangered languages often goes under the special rubric of salvage work). Yet descriptive work is just the first step in constructing the model of a new language available to a researcher in the field. This first step is simply taken for granted in experimental work; someone has already done the original data collection, and that work can be relied on without much hesitation. The better-described languages primarily represent educated, rich societies (particularly the English-speaking world), and it is from this population that experimental syntactic work has primarily drawn. 
In psychology, researchers have expressed concern that the oversampling of people from Western, educated, industrialized, rich, and democratic (WEIRD) societies, who represent as much as 80% of study participants but only 12% of the world’s population, may be skewing our understanding of human behavior and culture (Henrich et al. 2010). Likewise, in experimental work, the emphasis has been on monolingual, young, available, and literate speakers (MYALSs), and that may be skewing our perception of native speakerhood. In some ways, the use of MYALSs is a matter of expediency. These days, it takes only an hour to collect online judgment data from English speakers, however narrowly defined, and to build an experimental paradigm based on those data. Fieldwork, on the other hand, often (though not always) involves speakers in more remote areas, who may be less literate or educated, are often bilingual, and may not be as comfortable with test-taking as MYALSs.

The two subfields differ in the types of participants they recruit as well as in potential sample sizes. Experimental syntax is often based on large-scale comparison with many MYALSs (Sprouse and Almeida 2012; 2017), while a fieldworker settled in a small language community on the wane may be dealing with five remaining speakers of a given language. The reality is that, when working with an endangered language, the luxury of large numbers is simply not there. It may therefore be tempting to think that “the
conclusions that can be drawn from [data from endangered languages] will be weaker and more speculative in nature than the conclusions based on quantitative data” (Gibson and Fedorenko 2013: 94) and that “obscure, little-studied languages [are an] … unsatisfactory data source” (Featherston 2007: 278). Yet that’s not a sufficient reason to abandon an experimental approach to these languages. After all, studies of clinical populations, fMRI research, most studies in phonetics, and research on sign language also have tiny subject pools but are proudly counted among experimental approaches. In experimental work, the language speaker is either a participant or a subject; in fieldwork, the speaker is called the informant or language consultant. The consultant’s role is much more active than that of the participant; the consultant is not just at the receiving end of the language experiment but contributes to the data and the flow of work. Both the fieldworker and the consultant are trying to get a glimpse of the consultant’s mind, so the consultant is simultaneously an object of testing and an active participant in that testing. The more trained a consultant becomes, the more eager s/he may be to offer opinions on the relative acceptability of similar examples or even the possible analysis of a particular structure—an analysis that may not involve linguistic terminology but can be insightful and informative. The fieldworker and consultant together perform iterative experiments, asking a similar question over and over again to the point where a semblance of statistical significance may arise (an issue I will revisit below). Both the fieldworker and the consultant engage in learning—not just about the consultant’s language, but about linguistics as well. 
In fact, some successful fieldworker–consultant partnerships have led to the training of native-speaker linguists, a model pioneered by Ken Hale in the USA (Hale 1972a; 1972b; 1992) and by Peter Skorik (Vaxtin and Golovko 2005) and Alexander Kibrik (Kibrik 2005; Vaxtin and Golovko 2005) in Russia.¹

¹ Other names may be added to this list; there is no comprehensive accounting of all of the outside researchers who have overseen or encouraged the training of native-speaker linguists.

Unlike experimental study participants, fieldwork consultants may be older, taking the Y out of MYALS and potentially leading to additional challenges brought on by aging (on the effects of aging on language, see Kemper et al. 2001; Burke and Shafto 2004). Fieldwork consultants may also lack the educational savvy of MYALSs encoded by the L in this abbreviation, as speakers of understudied or endangered languages often lack literacy in the traditional sense. This limitation, too, imposes additional requirements on researchers and their paradigm.

Related to these differences, fieldworkers and experimentalists differ in the degree to which they are involved in the community. Experimentalists and their MYALSs rarely forge long-lasting relationships. MYALSs (and other experimental participants) take part in an experiment, answer a few questions before and after, get compensated for their participation, and leave. They rarely come back, unless the return is for a follow-up experiment, and they are not connected to the researcher in a meaningful way. In linguistic fieldwork, on the other hand, it is crucial to build a strong relationship with the language consultant(s). Since this process takes time, trust, mutual understanding, patience, and strong motivation, fieldwork tends to be a self-selecting discipline; researchers who view the consultant as a mere machine for producing language do not last long, and typically switch to a different line of inquiry.

Given the preceding discussion, it is evident that in order to conduct experimental work in the field, one needs a team. A one-man orchestra will not do, as several strands of expertise are required. An experienced fieldworker with good ties to a community can provide the primary data, help establish contacts, and (in the optimal cases) train local native-speaker linguists in the ongoing and future work. A syntactician is needed to do what they do best: articulate a specific set of hypotheses and propose ways of collecting, norming, and analyzing data to test them. This role can, in theory, be fulfilled by just one person, though a team of syntacticians may often be involved, with some members focused on the theory and others on the experimental aspect of the study.² Additionally, a psycholinguist or neurolinguist is needed to design and conduct the actual experiment, employing the appropriate and necessary methods given the research question at hand (I would like to underscore that I remain agnostic in terms of which methods need to be used; they could include judgments, reaction times, electrophysiology, etc.). Every fieldwork situation is different, but if native-speaker linguists are available, they can provide an important link between the different strands of inquiry and become a major force in bringing experiments to the field. The team I described here is an idealization, reflecting my own experiences as well as the main components that are needed to produce effective experimentation in the field. No matter how many people are involved, the most successful fieldwork experimentation projects are those where all of the team members share some common assumptions and continue educating each other.
In modern times, when it is possible for people to meet without being in the same room, regular meetings for an ongoing project are not only desirable, but doable. Projects are less successful when each person is responsible for a narrow corner of the work and communication amongst the team is limited. One helpful outcome of team interaction in experimental fieldwork is that each type of researcher is forced to get out of their comfort zone. A theoretical syntactician may have to explain what parasitic gaps are, and why they are relevant, to teammates who do not have the concept at their fingertips; a fieldworker may have to clarify the role that different conjugations play in the language under discussion and why ‘boil’ cannot be expressed by just one word; a native-speaker linguist may be aware of gender differences in the use of proper names, and a person who knows eye-tracking may balk at the use of ambiguous referents in the visual stimuli. Although it appears obvious that teammates should be willing to educate each other, such interactions do not always happen. Like the work with native consultants, this type of communication requires trust, mutual respect, appreciation of the others’ expertise, and patience.

² Since the focus of this Handbook is on syntax, I emphasize the need for a syntactician here; but more generally, a good fieldwork team comprises experts in phonetics and phonology, morphologists, syntacticians, lexicographers, etc.


In sum, fieldwork and experimental syntax can and should be combined; they share a number of premises and they stand to enrich one another. But in order for this marriage of minds to be successful, it is important to plan carefully, and it is this planning that I will explore in the rest of this chapter. While my focus is on syntactic work, there are encouraging examples of fieldwork and experimentation converging outside of syntax, especially in the semantic realm (Arunachalam and Kothari 2011; Bohnemeyer 2015; Gil 2008; Bochnak and Matthewson 2015). The two fields can enhance each other by sharing approaches and tools.

4.2 Conceptual issues


Ultimately, since both fieldwork and experimental syntax work on language, many of the approaches used in the two disciplines are parallel: Successful approaches to language in general are also often successful approaches to lesser-studied languages in an experimental setting. In this section, I will discuss some conceptual issues surrounding approaches to fieldwork and experimentation. There are two schools of thought concerning fieldwork. According to American structuralists, fieldwork is all about the process of discovery—you approach the new language as a complete unknown, ignoring any information that may already be available for fear of getting tainted with incorrect ideas. The beauty of this approach is that you learn by trial and error (a lot of it, too), and whatever you learn stays with you forever. The alternative approach is to gather as much information as possible prior to embarking on a project, in the hopes that your fieldwork will allow you to verify what you read (it is always good to question whether the other researcher got it right) and move ahead with interesting discoveries. Understanding the general theoretical landscape of a given phenomenon is important in both approaches; if you know how noun–adjective combinations are built, for example, you can ask better questions when encountering them in your language of study. The latter approach is particularly helpful for experimental work in general, and experimental work in syntax more specifically. Having a working knowledge of the existing work on a given language (if any such work is available) as well as syntactic theory allows for more effective exploration and hypothesis verification. In the age of powerful statistics and great gadgets (many of which are improving faster than our theories!), experimental work is tempting. Experimental studies seduce us with the novelty of fresh endeavors, the allure of quantified results, and the promise of moving the field forward. 
But anyone who has done serious experimental work will tell you that the preparations are long and arduous. Add to this preparatory work the unique difficulties of conducting your experimental work in a remote, non-WEIRD setting, and the challenges become immense. Rather than starting from the assumption that you are ready to begin an experiment, approach experimental work in the field (as probably all experimental work) with the question: “In what ways am I not ready to conduct an experiment?” To put it differently, before embarking on an experiment,
we should all do what typical “armchair linguists” do: ponder unusual facts, anticipate how these facts connect with the rest of the language we know, and assess the role of these results against existing linguistic theory (Fillmore 1992; Phillips 2010). Armchair linguistics is cheap, and it can save time in the long run. Only after we have thought hard about various issues are we ready to run an experiment in any language, foreign or familiar.

Let me present two general situations where an experiment may seem tempting but is not warranted. The situations are real and all-encompassing; the examples I chose to illustrate these situations can of course vary (and my choice of actual examples may strike the reader as flawed). The first scenario arises when there is no clear hypothesis to be explored. Take, for instance, the so-called double-is construction in Modern English (also called the reduplicative/double copula, ISIS, Extris, amalgam, or thing-is construction):³

(1) a. The thing is is that it all depends on the graphic card’s drivers.
    b. I think the answer is is to have Thread B not terminate but rather have the Thread A delegate release the Mutex for Thread B when bytes are received.
    c. The result is is that when the carb gets hot, almost all of the clearance at the shaft is taken up by expansion.
    d. What’s nice is is that it has a sort of other-worldly character …

³ The examples in (1) are from Mark Liberman’s Language Log of June 27, 2004: http://itre.cis.upenn.edu/myl/languagelog/archives/001123.html.

The double-is construction is well attested in contemporary American English, Australian English, and New Zealand English. It first appeared in the second half of the 20th century, and its use increased through the 1960s and 1970s (Curzan 2012; O’Neill 2015). There are no obvious geographic or sociological factors that might characterize its speaker distribution (McConvell 1988). Most English speakers produce this construction yet reject it when asked to judge examples like (1), a common production/comprehension divide that is not limited to this particular phenomenon.

The double-is construction has received quite a bit of attention from theoreticians. Two main analyses have been proposed. According to one, this construction is a reduced pseudo-cleft where what in the headless relative clause constituent has been deleted. On that approach, the example in (1a) can be schematized as follows:

(2) [DP [CP What the thing is]] [PredP [VP is [CP that it all depends on the … drivers]]].
    SUBJECT                     PREDICATE

This is the analysis proposed by Massam (1999); its main advantage is its ability to capture the seemingly biclausal nature of this construction. This analysis, however, leaves unaddressed a curious observation about tense in the two copulas: when the copula in the relative clause (the initial copula) is in the past tense, is and was are equally possible in the second position, (3a), but when the initial copula is in the present tense, was in the second copula seems unacceptable, (3b):

(3) a. The thing was is/was that we had no control over the situation.
    b. The thing is is/*was that we had no control over the situation.

Furthermore, the pseudo-cleft analysis cannot account for examples such as (4), which Coppock and Staum (2004) also consider part of the double-is family:

(4) That can’t be a very welcome outcome, is that rates will now rise.

The alternative analysis, first proposed by Bolinger (1987) and more recently resumed by Coppock and co-authors (Coppock and Staum 2004; Coppock et al. 2006), postulates that the second is functions as a focus marker in a monoclausal construction:

(5) [DP The thing] [PredP [VP is is [CP that it all depends on the … drivers]]].
                                 focus marker

If this approach is on the right track, however, it remains unclear why the new focus marker is limited to double-is constructions and is not spreading further.

The growing body of naturalistic data on the double-is phenomenon is certainly intriguing, and it is easy to imagine how one might test the two hypotheses further, for example by collecting native speaker judgments online and thus expanding the database, which was done by O’Neill (2015). But beyond obtaining more data from a larger variety of speakers, it is hard to imagine an experiment that would distinguish between the two competing hypotheses. As captivating as the double-is construction is, further experiments are not warranted until they can shed more light on the existing theoretical quandary.

Let me emphasize that this cautionary note is not exclusive to this particular example. Unless more data collection is warranted, a successful experiment should be based on a relevant testable hypothesis. Somehow, linguists seem to be more aware of this need for a testable hypothesis in fieldwork elicitations (we do not go in asking a bunch of random sentences) or in the traditional introspection, where an explicit hypothesis is needed in order to come up with useful data. When it comes to experiments, this self-evident truth is often forgotten.

So far, I have tried to make a case for the importance of hypotheses supported by good knowledge of theory. But that does not mean that a theoretical linguist can always come up with a nice hypothesis and rush into testing it. When the theoretician lacks sufficient information about processing, new problems may arise. Consider the phenomenon of intervention, for example: an effect caused by moving an expression across one or more nodes with similar feature specification. Compare the licit subextraction in (6a) with the illicit or marginal (6b) where the prepositional phrase about Mary “intervenes”:

(6) a. [How many people]ᵢ did Kim talk [to tᵢ] about Mary?
    b. ??/*[How many people]ᵢ did Kim talk [about Mary] [to tᵢ]?

Intervention effects are notoriously varied and difficult, and they have raised many theoretical questions: Do intervention effects stem from semantics (as has been argued
primarily for intervention effects in negative islands, cf. Kluender and Gieseman 2013)? Can they be reduced to economy conditions (Lechner 2013)? Do they have more to do with the nature of probing for syntactic features than with the intervener itself (Preminger 2014)? Should they be discounted as effects of linear order (Bruening 2014)? The theoretical hypotheses here are clear, and it would be relatively straightforward to set up an experiment differentiating at least the semantic and the syntactic accounts of intervention. (A theoretical syntactician or fieldwork linguist who is not well versed in principles of processing may be duly intrigued by this phenomenon.)

Knowledge of processing, however, turns the intervention phenomenon on its head. Processing operates incrementally, and one of its basic tenets is that, once a filler is identified in a non-argument position (the wh-position, in this case), it should be linked to a gap as soon as possible, a concept known as the Active Filler Hypothesis (Clifton and Frazier 1989). The opportunity to link a gap with an available filler supersedes the opportunity to identify a lexical phrase of category XP. Thus, with respect to the intervention example in (6), the Active Filler Hypothesis predicts that the parse below, where how many people is associated with the stranded preposition about, will be preferred to any other parses:

(7) How many peopleᵢ did Kim talk about tᵢ …

Once this processing constraint is taken into account, we can see that the introduction of about Mary into (6b) makes this sentence unacceptable or degraded for reasons orthogonal to intervention effects. Thus, any experiment on intervention that uses sentences like (6a,b) will be confounded by Active-Filler effects. Experimentation on this phenomenon is not warranted unless the full range of relevant effects is taken into consideration.
Although both of the examples discussed in this section are from English, it is easy to imagine that similar issues may arise with respect to any language; indeed, these types of effects are likely to be even murkier when encountered in languages less well understood than English. So, these remarks are intended to be a cautionary note about the extent of preparation that is necessary before embarking on an experimental adventure in the field. If one does not have a clear hypothesis to test, or does not strike the right balance between language structure and the use of language in real time, an experiment will have to wait.

Let’s assume that the question “Am I ready to conduct an experiment?” has been resolved, and an experiment is truly warranted. At this stage, it is useful to consider whether a lesser-known language that a fieldworker has access to is the best option for the experiment, or whether a more easily accessible language could be examined first. For instance, a number of languages combine prenominal and postnominal modifiers; examples include Mosetén (Sakel 2002), Yimas (Foley 1991), Tongan (Churchward 1953), and French. If one sought to experimentally investigate differences in the processing and structure of pre- and postnominal modifiers, it would make sense to start with French. Starting the experimental process with easy-to-access speakers in a forgiving laboratory setting allows for many of the kinks to be worked out before the methodologies and findings of the preliminary study are extrapolated to more exotic languages.

4.3 Is it worth the trouble?


Is it reasonable to run experiments in a fieldwork setting? My answer to this question is a cautious yes, and in this section, I will examine several examples where fieldwork experimentation is warranted. The list is by no means exhaustive, and hopefully with time it will grow. For the sake of exposition, I pass over the practical details of experimentation in the examples I discuss in this section; I will turn to the latter in Section 4.4.

4.3.1 Phenomena over languages

As a general rule, when planning experimental work in a fieldwork setting, it is important to start with phenomena rather than languages. Say you wish to investigate the processing of wh-questions derived by A-bar movement in a novel language, language L. If your ultimate goal is to compare the processing of such wh-questions in L versus English (where we are confident that wh-question formation involves A-bar movement), then you would first need to ascertain that wh-questions in L are also formed via A-bar movement. Choosing a language where the wh-word is in the initial position is not informative enough; the wh-word may be in the initial position because it is a predicate of a pseudo-cleft with a silent copula, as shown in (8b). This is expected in head-initial languages in particular (Potsdam and Polinsky 2011), so if the lesser-studied language in question tends towards head-initiality, special care must be taken to tell these two derivations apart.

(8) a. Wh-wordᵢ S tᵢ V X                      A-bar movement
    b. [PredP Wh-word] [DP [CP S V X]]        pseudo-cleft

While finding exotic languages with initial wh-words is not difficult, rushing to experimentation before conducting a syntactic analysis of these candidate languages is premature. This is where the role of the fieldworker becomes indispensable; someone familiar with language L will know the details of its structure and be able to determine if it meets the desiderata for an experiment. Assuming this general approach is adopted, what kinds of phenomena warrant experimental investigation in languages with remote access? The most obvious answer deals with phenomena that are not available in better-studied languages. In what follows, I will review two such phenomena that have already received attention at the intersection of fieldwork and experimental syntax—alignment and word order.


4.3.2 Examples of convergence

4.3.2.1 Alignment

English, German, Dutch, Italian, Spanish, Korean, and Japanese (all languages that have been studied extensively using the experimental-syntax paradigm) have been shown to share a number of processing constraints. Among these is the subject preference advantage (SPA): the observation that subject gaps (for example, in relative clauses) are easier to process than object gaps. Consider the familiar minimal pair in (9), where the relative clause in (9a) includes a gap in subject position, and the one in (9b) has a gap in object position. The latter clause is more difficult to process, as numerous studies have shown (see Kwon et al. 2010; 2013 for overviews):

(9) a. the senatorᵢ [that __ᵢ attacked the reporter] admitted the error
    b. the senatorᵢ [that the reporter attacked __ᵢ] admitted the error

The SPA is quite robust in all of the languages mentioned above, but the reasons for this remain unclear. Problematically, the existing data come from nominative–accusative languages, in which subjects appear in the same case, regardless of transitivity, and the marked form is the object (accusative). This covariance of grammatical function (subject vs. object) and case (nominative vs. accusative) has prevented researchers from determining which of these two factors underlies the SPA. In addition to case, alignment may be expressed via agreement: All subjects, regardless of transitivity, can be cross-referenced on the verb, whereas objects do not determine verbal agreement. A potential workaround for this problem is to investigate the SPA in languages with morphological ergativity. Ergative languages allow for the separation of case and grammatical function, since the subject position is associated with two cases: absolutive (intransitive subjects) and ergative (transitive subjects). Compare in Niuean (Polynesian): (10)

a. Kua  koli   e    ekekafo.                        Niuean
   pfv  dance  abs  doctor
   ‘The doctor danced.’
b. Kua  lagomatai  he   ekekafo  e    faiaoga.
   pfv  help       erg  doctor   abs  teacher
   ‘The doctor helped the teacher.’

If alignment is manifested in verbal agreement only, freestanding noun phrases may remain unmarked, whereas the form of the predicate varies depending on whether it agrees with the intransitive subject or direct object (absolutive agreement) or transitive subject (ergative agreement). Compare in Ch’ol (Mayan), where one set of affixes on the verb indexes the absolutive argument, and the other, the ergative: (11)

a. Tyi  y-il-ä-y=ety.                                Ch’ol
   pfv  3sg.erg-see-trans.verb-epenthesis=2abs
   ‘S/he saw you.’
b. Tyi  ts’äm-i-y=ety.
   pfv  bathe-intrans.verb-epenthesis=2abs
   ‘You bathed.’
   (Coon 2017: 101)

Ergative languages allow researchers to study the processing of case and/or agreement and grammatical function (that is, the syntactic position of an argument in clause structure) as independent phenomena in a way that accusative languages do not. In ergative languages, case marking does not co-vary with the subject/object distinction. If ergative languages are sensitive to differences between subjects and objects (regardless of case marking), this will provide strong and novel evidence that subjects constitute an independent concept in grammar. Some ergative languages are consistently ergative (that is, their ergative alignment is found across all aspectual and tense forms), while others display “split ergativity”: their ergative alignment is limited to certain aspectual or mood features (the perfective or irrealis, for instance) or to particular persons (non-pronominal expressions). See Coon and Preminger (2017) for an overview and discussion.

There is a clear need to test subject preference in ergative languages, and this need has led to experimental fieldwork on Basque (Carreiras et al. 2010; Gutierrez-Mangado 2011; Laka et al. 2012), Avar (Polinsky et al. 2012; Polinsky 2016), Niuean (Longenbaugh and Polinsky 2016; 2017), Georgian (Foley 2020; Lau et al. 2023), Ch’ol, and Q’anjob’al (Clemens et al. 2015).⁴ For each language, the studies tested speaker preferences in the comprehension of subject and object gaps in relative clauses, following research on clauses like the English ones shown in (9). The languages listed were chosen for both conceptual and practical reasons. The conceptual reasons included: (i) the need to use ergative languages where both ergative and absolutive arguments can extract without a gap (i.e. languages without syntactic ergativity); (ii) the need to compare and contrast ergative languages where the relative clause precedes the head noun (head-final languages) and follows the head noun (head-initial languages); and (iii) the need to compare languages where alignment is encoded on the nominal (via dependent-marking, i.e. case-marking) versus the predicate (via head-marking, i.e. agreement) (see Nichols 1986 for the distinction). With respect to the latter point, the distinction between dependent- and head-marking in languages is certainly a simplification and idealization, because quite a few languages combine the two types of marking. For example, of the languages in Table 4.1, Avar, Basque, and Georgian all have case-marking as well as argument cross-referencing on the verb. With that caveat in mind, the comparisons are summarized in Table 4.1.

⁴ Q’anjob’al is partially syntactically ergative, but the study cited here examined the domain where only morphological ergativity holds.

Table 4.1 Experimental paradigm for studying subject preference, morphologically ergative languages

                Dependent-marking (case-marking)    Head-marking (agreement)
Head-initial    Niuean                              Ch’ol, Q’anjob’al
Head-final      Avar, Basque, Georgian              –

Practical considerations for choosing these languages include: (i) the availability of pre-existing analytical work on the data in question, and (ii) at least in the Avar and Mayan studies, partnerships with native-speaker linguists. Native-speaker linguists played a crucial role in norming the stimuli for the experiments, finding participants, and offering explanations to community members about why the study was useful (I will return to this in Section 4.4).

Studies of Georgian, Basque, and Avar were carried out using the self-paced reading paradigm, which has also been used in many MYALS-based studies of relative clauses.⁵ The results generally upheld the SPA, but there were some complications. First, because Basque and Avar speakers primarily use their languages orally, the average reading times for these speakers were about three times longer than in languages with a well-established reading tradition, such as German or Japanese. That led the researchers to look for alternative testing methods, including sentence–picture matching (SPM) (Bishop 2003). In this task, which can be used both offline and online, participants are presented with a series of pictures (usually two or four), listen to one sentence, and then have to decide which picture goes with the sentence. The results of this task were independently compared against the results from a self-paced reading task in a language where test participants were used to reading on a daily basis (Clemens et al. 2015). The results were comparable, which confirmed the utility of the SPM task.

The results of the SPM experiment in multiple ergative languages upheld the SPA, and therefore offered novel evidence in favor of the privileged status of subjects regardless of alignment. The SPA was particularly apparent in head-marking languages, where it was essentially the main result. In dependent-marking languages, there was an additional cueing effect that followed from morphological informativity: As the marked case, the ergative in the relative clause served as the cue that an absolutive argument needed to be projected. This ergative cueing effect was observed in both prenominal (Basque, Avar) and postnominal (Niuean) relative clauses.
As expected, when the relative clause contained only an absolutive argument, no cueing effects were found, since the absolutive can serve either as the subject of an intransitive or the object of a transitive clause. On the nominative–accusative side, cueing effects are observed in the presence of the accusative, which only reinforces the SPA. These fieldwork experiments lead to new predictions: in an ergative language with prenominal relative clauses and head-marking (the missing cell in Table 4.1), the SPA

5 Georgian stands out on this list of languages because it is not endangered, is widely used, and is spoken by a highly literate populace, which allowed us to conduct a number of extensive studies that are currently being analyzed and written up. With respect to Georgian, it was the advent of new tools, in particular, a portable EEG machine (Lau et al. 2023) and online reading studies, that allowed us to add Georgian speakers to the ranks of MYALs.


should be particularly apparent. As far as I am aware, few languages fit this description. Among them is Abkhaz, a language spoken in the Northwest Caucasus (Hewitt 1979: 35–45), which may offer an excellent test case for connecting fieldwork and experimentation in the future. The long-distance dependencies discussed in this section have long been at the center of attention in experimental syntax. However, alignment differences go well beyond the SPA, and future experimental work on ergative languages can include explorations into island constraints, licensing or prediction of case forms, agreement attraction, and other phenomena (see Longenbaugh and Polinsky 2017 for a discussion of several directions in experimental research on ergativity).

4.3.2.2 Word order

Most experimental syntactic research has been based on languages with the basic word orders SVO or SOV, which are by far the most common orders cross-linguistically. Many SVO and SOV languages, including the ones studied experimentally, allow subject-before-object (SO) orders and object-before-subject (OS) orders; for these languages, the main experimental result is that OS orders impose a greater processing load (Bader and Meng 1999; Kaiser and Trueswell 2004; Mazuka et al. 2002; Sekerina 1997; Kwon et al. 2006; Tamaoka et al. 2005). However, in the languages investigated, OS may be derived by scrambling.6 If we assume that scrambling is not base-generation, we can predict that the OS order should be syntactically more complex (as shown in (12)). OS order is often less frequent, as in Korean and Japanese (Kwon et al. 2006). These considerations point to the SO order as the starting point.

(12) a. O_i S t_i V          scrambling, SOV language
     b. O_i S V t_i          scrambling, SVO language
     c. O_i V_k S t_k t_i    scrambling and verb movement, SVO language

In the psycholinguistic literature, two general theoretical explanations for the SO preference have been spelled out (Koizumi et al. 2014). In one view, grammatical factors of individual languages (such as syntactic complexity) are the main driving force behind word order preferences; these preferences are therefore domain-specific. If this account is correct, SO is not a universally preferred order. In the alternative view, word order preferences follow from universal human cognitive features; if that is the case, SO word order should be preferred regardless of the basic word order of any individual language (Bornkessel-Schlesewsky and Schlesewsky 2009; Tanaka et al. 2011). These views both correctly predict that SO word order is preferred in SO languages: SVO, SOV, and VSO. The deciding group are OS languages, of which VOS is the only reliable type attested

6 A number of analyses uphold the base-generation approach to scrambling (Fanselow 2001; Neeleman and van de Koot 2008); on such analyses, it is not obvious that scrambling is syntactically more complex. Clear, incontrovertible evidence for scrambling as movement is surprisingly hard to come by, at least in Germanic languages.


cross-linguistically. To create a test environment, researchers must compare the OS order, which can be assumed to be basic, and the SO order. Within this comparison, the domain-specific approach predicts that OS order should be easier, and SO order should be associated with a higher processing burden. The universal approach predicts the opposite. Koizumi et al. (2014) and Yasunaga et al. (2015) conducted several studies on word order processing preferences for the Mayan language Kaqchikel, in which the basic order is consistently VOS (regardless of dialect). These studies compared and contrasted SVO (SO) and VOS (OS) orders. These two orders differ along two dimensions: structural complexity (SVO is derived from VOS via scrambling (13))7 and frequency (SVO is more common). This conspiracy of factors makes Kaqchikel a promising language for the analysis of the SO vs. OS contrast.8

(13) [S_i [V O gap_i]]

Yasunaga et al. (2015) compared SVO and VOS orders using an SPM task, and recorded electroencephalograms for their participants, all native speakers of Kaqchikel. Each participant saw a picture in the center of a computer screen for three seconds and, after the picture disappeared, a Kaqchikel sentence was aurally presented through a headset. As the authors note, the auditory “rather than visual presentation method was used because the Kaqchikel language is mainly used in daily conversations rather than in written form, and Kaqchikel speakers generally are not accustomed to reading Kaqchikel” (Yasunaga et al. 2015: 19). Each picture used in this experiment depicted a transitive action describable with one of the following six verbs commonly used in Kaqchikel: ‘hit,’ ‘pull,’ ‘push,’ ‘call,’ ‘bless,’ and ‘surprise.’ Either the agent or patient argument consisted of two persons, and the other consisted of just a single person. The agent(s) and patient(s) were painted in different colors: red, blue, white, or black.
The participants heard sentences such as the following (as well as VSO and OSV sentences, which are also possible):9

(14) Kaqchikel
     a. x-∅-k-oyoj                   ri  xar  ri  taq käq.      (VOS)
        asp-3abs.sg-3erg.pl-call     det blue det pl  red
     b. ri  taq käq x-∅-k-oyoj                   ri  xar.      (SVO)
        det pl  red asp-3abs.sg-3erg.pl-call     det blue
     ‘The reds called the blue one.’

7 While the status of the left-hand subject as a topicalized constituent is relatively clear, the nature of this topicalization (scrambling vs. base-generation) and the landing site of topicalization are subject to some debate. The authors do not commit to a particular landing-site category; they denote the base position atheoretically as a gap.

8 Although the authors do not address this in the paper, it bears mentioning for the purposes of this chapter that their neuroimaging work relied on a careful linguistic analysis of Kaqchikel word order, conducted in collaboration with several native-speaker linguists (Koizumi et al. 2014).

9 The glosses and translation are modified from the original.


The brain imaging results showed different areas of difficulty associated with OS and SO word orders. Without going into technical details, the pattern of results corroborated the theoretical analysis according to which SVO is the more complex order in Kaqchikel, with the subject in a preverbal A-bar position, as shown in (13). At the same time, the results showed that even though this complex SVO order is the most frequent in the language, it is harder to process than the structurally basic VOS. The Kaqchikel results therefore argue against the hypothesis that SO order is universal and cognitively preferred. This paper may not be the final word on SO vs. OS. In particular, one of the major confounds has to do with the baseline differences, whereby the critical region that is tested follows a verb in SVO, but a noun in VOS (see Federmeier et al. 2000 on the effect of grammatical category on the distribution of ERP components). Nevertheless, it is a welcome new step in applying neuroimaging to a language outside the familiar pool, relying on extensive fieldwork, and modifying the experimental methodology in an ecologically sound way—in particular, by using auditory presentation. Further successful fieldwork experiments could be built on this model; for example, it would be valuable to directly compare VSO and VOS in Fijian, where they are equally possible (Dixon 1988; Aranovich 2013) or Tagalog, where VSO and VOS are both observed in Agent Voice (Kroeger 1993). Another important word order consideration concerns incrementality in production. It is generally assumed that language users do not plan entire utterances before beginning to speak. Instead, as in parsing, planning unfolds step by step (Levelt 1989; Ferreira and Swets 2002). While incremental planning is itself uncontroversial, it is less clear whether the structure or the lexicon serves as the starting point in production planning. 
If structure is the starting point, the speaker will generate the syntactic skeleton of their utterance and then add lexical content in an incremental manner. If lexical encoding takes precedence, the reverse is true. English production seems to support the lexical model of encoding, but data from languages with more flexible word order support the structural model—or a combination of the two (Hwang and Kaiser 2015; Norcliffe, Jaeger, and Harris 2015). Until recently, all of the work in this sub-area has focused on a small set of subject-initial languages, but lately verb-initial languages have been added to the data pool. These languages “offer an interesting test case for studying the effects of grammar on sentence formulation. In order to select a suitable sentence-initial verb, information about the relational structure of the event presumably must be planned early, possibly earlier than in subject-initial languages” (Norcliffe, Jaeger, and Harris 2015b: 1020). Recent experimental data from two such languages, Tseltal (Norcliffe, Konopka, Brown, and Levinson 2015) and Tagalog (Sauppe et al. 2013),10 suggest that the early position of the verb changes the order of encoding operations: Relational information encoded in the transitive verb receives priority over either character associated with that verb. In both languages, the verbal morphology carries important information concerning

10 In both studies, the experimental work relied on a detailed syntactic analysis of the language in question based on primary data.


the event and its participants, and this may give priority to grammatical structures. If these results are on the right track, they offer additional support for Hwang and Kaiser’s (2015) proposal that production is guided by both structure and lexical access, but that the relationship between these two components varies by language and can only be predicted based on a careful examination of each language’s grammatical system.

4.4 Practical issues


4.4.1 Starting point

Since any experimental work conducted in a fieldwork setting is experimental work on language, it needs to start with a specific hypothesis and a rationale for choosing one language over others—the previous section illustrated some of these rationales. Fishing-expedition experiments do not work well in a lab setting, and the situation in the field is no different. Furthermore, when starting experimental fieldwork on a new language, it is advisable to begin by replicating experimental methodologies already used on more familiar languages. Say your ultimate aim is to use a visual world paradigm to explore the anaphoric use of classifiers in a lesser-studied language. You should begin with a simpler experiment to pave the way. For instance, you might replicate a study on Mandarin conducted by Huettig et al. (2010), in which speakers heard a noun and looked at pictures of objects that shared or did not share that classifier. (In the Mandarin study, the main finding was that classifier distinctions influence eye gaze, but only when classifiers are overtly present in the speech stream.) By replicating an existing study, the researcher can rely on an established experimental paradigm (which can be modified as needed) and minimize the unknowns. Once the replication experiment has been done, a novel study is easier to conduct. (Note that most of the experiments described in Section 4.3 replicated the experimental design of work conducted on better-studied languages.)

4.4.2 Participants

Fieldwork is not always about pitching a tent in a remote location, sharing exotic food with your consultants, and carrying coffers of recording equipment up a steep hill. As Claire Bowern put it, one does not have to “be Indiana Jones in order to be a real linguist or fieldworker” (2008: 14). The main components of fieldwork are: (i) that the language has not been well described before, and therefore, (ii) data collection will need to be undertaken before completing an experiment. It is possible to find languages that fit this bill within the cities that house research universities. If there are enough speakers of a given language in a city, experiments can even be done in a familiar lab.11 The

11 An important consideration in this situation is that speakers in such a setting will most likely be bilingual—something I will return to in Section 4.4.3.


Endangered Language Alliance of New York City is an outstanding example of work on the linguistic diversity of a large urban area. Still, bringing speakers of lesser-known languages into the researcher’s experimental setting is less common and less likely than traveling to those speakers. The trip can take a few hours or it can take several days. No matter how close or far the speakers are, you’ll need to plan carefully. Two main aspects of the interaction with participants deserve mention: justification of the study to the participants and researcher involvement in the community. In experimental work in the lab, the former component barely plays a role. MYALSs rarely ask questions about the experiments they participate in. They are accustomed to tests and test-taking, they typically don’t expect explanation of the reasoning behind an experiment, and they normally show cooperative behavior in dealing with the experimenter. In a fieldwork setting, such cooperative behavior and magnanimous indifference are an exception, not the norm. When embarking on an experiment in the field, be prepared to be greeted with curiosity, suspicion, surprise, criticism for engaging in silly activities, or some other reaction that may be hard to predict—anything but immediate acceptance. Because participants are not likely to cooperate automatically, it is important to be able to explain why you are conducting your study, and to indicate possible benefits to the community. The explanation can be presented at a community meeting, built into the prompt of the experiment, or offered in an initial conversation with participants. It is often helpful to rely on the fieldworker, who may already have ties with the local community, and on native-speaker linguists, if they are available. If not, seek out community members who are in positions of authority. Once these people approve of your project, they will be able to serve as the link between your research team and the local participants. 
Teachers or priests often perform this role. Remember that, in justifying the study to the community, it is important to step outside of your technical frame of mind and couch the study in more general terms. Stating that the study will allow outsiders to understand the language of the community better is a valid justification. Small communities are often pleased by outside interest in their language. However, this is not always the case, and it is never a good idea to push an experiment upon a group that is not willing to accept it. Quite a bit of experimental fieldwork has been done with Mayan languages; I have already mentioned Koizumi et al. (2014), Yasunaga et al. (2015), Clemens et al. (2015), and Norcliffe, Konopka, et al. (2015), and there is also work on Yucatec Mayan (see Butler 2011 on number marking; Skopeteas and Verhoeven 2009 on information structure; Norcliffe and Jaeger 2016 on production). The confluence of research on Mayan is not accidental, and it owes its success to at least two practical considerations. First, there is an abundance of rich primary work on Mayan, pioneered by Judith Aissen, Nora England, Clifton Pye, Barbara Pfeiler, and Roberto Zavala (see Aissen et al. 2017 for a summary volume). Second, and equally important, is the strong pattern of indigenous activism in Mayan communities (Warren 1998; Fischer and Brown 2001). Local activists tend to be interested in collaborating with linguists (and other researchers) in promoting new work on their languages—as long as that work contributes to the


recognition of the local communities and cultures. Thanks to this strong community engagement, it became possible to establish a field research station in the highlands of Guatemala, and it has been my privilege to direct this field station since 2015 (see Polinsky 2019 for details). A similar convergence of factors has favored experimental syntax research into the Austronesian language Chamorro (see Wagers and Chung, Chapter 14 in this volume). While this combination of first-rate primary research and community involvement does not mean that experimental work on Mayan and Chamorro is carried off without a hitch, it eases the path for researchers hoping to work with native speakers in the field. Long-term involvement of the researcher in the language community is another crucial aspect of fieldwork, and another fundamental difference between experiments with MYALSs and experiments in the field. Fly-by-night studies do not work in the fieldwork setting, and there is an expectation that both sides should benefit from the experiment. The benefits to the community may be broad—like validating faith in the community and its language or promoting cultural awareness—or may be more direct. For example, researchers often succeed in bringing informants on board with the argument, “If someone wants to learn your language they will know what its most difficult aspects are.” In the work conducted on Mayan languages, popular presentations (on the value of promoting Mayan languages, Mayan diversity, or Mayan inscriptions) are frequent and greatly appreciated by the local communities. Community feedback and desire for involvement will vary across communities, but a sense of investment in the research should always be expected. Being involved in a community means knowing and respecting the rules of cooperative behavior in that community. Nowhere is this knowledge more important than when considering compensation for participation in an experiment. 
The field linguist and/or native consultant should make recommendations about culturally appropriate ways to approach this issue in a given community, and it is important to follow their advice (even if it may seem counterintuitive to an outsider). It is also imperative that all participants be compensated the same way; in tight-knit communities, where people like to talk, any sign of disparity or favoritism may sink the experiment. The number of participants in an experiment is a serious question that comes up in all types of experiment planning, not just experimental fieldwork. The appropriate number largely depends on the methodology, the task at hand, and the goals set by a researcher. On the other hand, the feasible number may be constrained by the context, and the potential pool of participants may not be as large as it is when working with MYALS in a research center. Unfortunately, the difficulty of recruiting participants often becomes a deterrent for researchers hoping to work on a language in a fieldwork setting. Here, I would like to offer two considerations. First, as noted in Section 4.1, there are experimental fields in linguistics where the number of participants is very low, yet nobody contests the validity of the results. Such fields include aphasiology, brain imaging (which involves expensive fMRI techniques), some sign language research, and most research in phonetics. In these fields, researchers have learned to work around their small sample sizes by modifying their statistical


analyses. A common solution is random sampling with replacement (Groves et al. 2009), but researchers may also choose to simply adjust their tests for sample size, making sure to use nonparametric tests when they do not have enough data to be assured of a normally distributed dataset. A standard workaround is to obtain a large number of data points per participant. This strategy can easily be carried out in fieldwork settings by testing each structure under examination with multiple lexicalizations and on multiple days. The experiment will take longer, but the results will be as useful and usable as those of a phonetic-recording experiment done in the lab with three speakers over the course of one day. Second, not all “fieldwork languages” have a limited number of speakers. For languages with robust speaker populations, it is desirable to recruit more participants than are typically recruited in laboratory settings, since there is a greater likelihood that fieldwork participants will not be used to test-taking and may not completely follow the protocol. Unforeseen factors such as lack of vision correction or bad dentistry may cause noise within your data. It may be hard for the researcher to anticipate such situations, but it is possible to conduct an experiment in a society where enhancers like glasses and dentures are a luxury—and it may be culturally inappropriate to turn away someone with no teeth who wants to participate in a production study, or someone who can barely see but wants to take part in an SPM task. To account for such noise in the data, a good rule of thumb is to increase the number of participants by about 20%, as compared to numbers in a lab setting. On the flip side, in some communities, people who are asked to participate will bring friends and family members along, and you may wind up with more participants than you actually need. Again, if it is culturally inappropriate to turn these participants away, you may have to accommodate them. 
Some of these participants’ data will have to be discarded—a small price to pay for the collection of good data. In a lab, MYALSs show up on schedule, and the idea of arriving unannounced to an experiment is equally strange to researcher and participant. But things are different in the field. “Don’t expect people without clocks and watches to be concerned about very specific times of the day. It’s pointless arranging a meeting for 10:30 when no one has a clock. It’s much better to be flexible in your work hours. Be aware too that in some cultures an agreement to meet isn’t like making an appointment … and doesn’t necessarily obligate the person to turn up” (Bowern 2008: 135). For many participants in the field, the experiment is a social event. As mentioned above, some may bring their friends or neighbors to watch or participate. These social considerations also mean that the “quiet testing room” we tend to take for granted in a lab setting may be completely alien to your fieldwork participants—and you may find yourself, quite literally, with noise in your data. No matter what, always be prepared for chaos and commotion. Finally, those consent forms and questionnaires that our MYALSs fill in and sign without thinking twice may cause significant consternation among people who are not used to taking tests and signing documents on a daily basis. Again, these forms should be designed in consultation with people who have worked in the given cultural setting. In most fieldwork situations, including in the context of experiments, oral consent


is preferable to written consent. Most ethics committees in Western universities are amenable to this option. Make sure the consent form is prepared in the language you are targeting in your experiment. If a fluent speaker is involved in your research team, that speaker can explain the consent form to each participant; if not, the consent form and experimental instructions can be recorded in advance. Questionnaires should include standard biographical information as well as information about the participant’s knowledge of different languages and literacy—an issue I address in the next section.
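The small-sample strategies mentioned earlier in this section (resampling with replacement, and recruiting about 20% more participants than in a lab) can be illustrated with a minimal sketch. It assumes a percentile bootstrap over per-participant accuracy scores, which is only one common form of resampling with replacement; the text does not name a specific procedure. The function names `recruitment_target` and `bootstrap_ci` and the accuracy values are hypothetical and invented purely for illustration.

```python
import random
import statistics


def recruitment_target(lab_n, extra_percent=20):
    """Inflate a lab sample size by a percentage, rounding up.

    The 20% default follows the rule of thumb in the text; integer
    arithmetic avoids floating-point rounding surprises.
    """
    return -(-lab_n * (100 + extra_percent) // 100)  # ceiling division


def bootstrap_ci(values, n_resamples=5000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean.

    Each resample draws len(values) observations WITH replacement.
    This is an assumed procedure, not one prescribed by the chapter.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical proportion-correct scores for eight participants.
accuracy = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]

print("participants to recruit:", recruitment_target(24))
print("95% bootstrap CI for mean accuracy:", bootstrap_ci(accuracy))
```

With a lab design that calls for 24 participants, `recruitment_target(24)` returns 29. The interval returned by `bootstrap_ci` makes no normality assumption, which is the point of resampling when the sample is small; it does not replace collecting many data points per participant.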

4.4.3 Participants and literacy

In the work on subject preference advantage in Mayan (Clemens et al. 2015), we found a sharp contrast between bilingual (Spanish–Ch’ol, Spanish–Q’anjob’al) and monolingual Mayan participants. The trends in the data were the same for both groups, but the SPA and other effects were stronger in the bilingual cohort as compared to the monolingual speakers. The monolingual Mayan speakers were significantly less accurate than the bilingual speakers even on the syntactically unambiguous clauses. In addition, the standard error and level of noise in the data were greater for the monolinguals than the bilinguals in each of our analyses. We interpreted these results as an indication that the Mayan monolinguals, who had no experience with literacy, faced greater challenges in the SPM task because they lacked general skills that belong to the playbook of “cooperative research behaviors”: following instructions with less context than one receives in the “real world,” interacting with technology, interpreting abstract or hypothetical questions, and imagining unlikely situations. These behaviors develop in general educational settings, regardless of language, and may improve as people engage with literacy on a daily basis. Additional support for our hypothesis came from the monolingual participants’ performance on longer sentences. We found that accuracy declined as the auditory stimuli became longer. Although this decline was observed in both bilinguals and monolinguals, it was again greater among the monolinguals. A longer stimulus imposes a greater memory load, and there is independent evidence that educational experience correlates with working memory capacity (Gathercole et al. 2004). This variance would have been negligible in a population more skilled at test-taking, but it played a negative role in our pool, particularly with Mayan-speaking monolinguals.
These observations concerning test-taking skills and the ability to engage in metalinguistic deliberations are not unique to exotic languages. Existing work on gradience in English judgments shows that such gradience is relativized to the participants’ educational levels, again offering support for the correlation between general literacy and cooperative research behavior. Subjects with higher levels of formal education produce cleaner data in an elicitation or experimental setting (cf. Dąbrowska 1997; 2012; Street and Dąbrowska 2010). All told, experimenters in the field need to be prepared for difficulties in test-taking that are not directly related to the participants’ competence.


Lastly, it is worth touching on a perennial question raised by our bilingual/monolingual discrepancy in the Mayan study: all things being equal, which group of speakers should researchers rely on, monolingual or bilingual? Which participants should we test? Although there are opposing views on this topic (see e.g. Vaux and Cooper 1999; Bowern 2008), I suggest that, in an ideal world, it is a good idea to test both groups, while remembering to keep track of their literacy, education, and multilingual experience, as in Clemens et al. (2015) for Mayan.

4.4.4 Materials

Planning an experiment on a well-studied language, with MYALSs as participants, takes a long time, and is usually more labor-intensive than running the experiment itself. With a lesser-known language, that preparation time increases even more. Plan to double your preparation time before a fieldwork experiment; you may find that there are more unknowns, more people who need to be involved, and more confounds to be discovered along the way.

What makes preparation for fieldwork experiments so complex? For one thing, most standard experiments rely on existing dictionary and corpus data to determine the frequency of items or constructions, establish plausibility conditions, and choose between alternative stimuli. But many lesser-known languages lack dictionaries, large annotated and tagged corpora, or even a decent collection of texts. As a result, this stage in preparation for a fieldwork experiment may become an experiment in its own right: the fieldworker will need to collect a corpus, analyze it, and use that new data to construct experimental stimuli. At the very least, a fieldworker could collect narratives in the given language from several consultants—using typical prompts such as the Frog story (Mayer 1969), the Pear story (Chafe 1980), Totem Field Storyboards (which are designed to elicit a body of tokens of specifically targeted constructions in fieldwork),12 or traditional folklore stories—or pre-test some participants on sentence completion tasks in order to build up a pre-normed collection of materials. For example,13 for an experiment on adjunct islands, the fieldworker may get speakers to provide continuations of sentences like “My neighbor was happy when …” and “The plants bloomed later because …”. Collecting data like this gives the researcher the necessary set of adjuncts from which to build a set of experimental stimuli on wh-questions.

Another bottleneck in the creation of materials concerns the norming of the experimental data. With larger, better-known languages, this task is often done online (and quite efficiently), but that option may not be available for lesser-known languages.

12 http://totemfieldstoryboards.org/. The advantage of Frog and Pear stories is that these stories have already been used to collect data from a wide variety of speakers and languages (cf. Berman and Slobin 1994; Chafe 1980). The resulting data are highly comparable as they are based on the same plot. Yet these stories are culture-specific, and they may not work in a new fieldwork setting without significant modification.

13


Again, an extra field trip may be needed to norm the stimuli by testing their naturalness, pragmatic appropriateness, and general grammaticality with native speakers (preferably different from those who are going to be tested in the subsequent experiment). In preparation for the work on Niuean reported in Longenbaugh and Polinsky (2016; 2017), we took two trips to Auckland to work with Niuean speakers on creating stimuli, norming them, and running a pilot; only after that, on our third trip, were we able to conduct the experiment. As we progressed in our work on the Niuean project, we made certain that the syntactic analysis of Niuean relative clauses was clear to us. This experience underscores another critical requirement in combining fieldwork and experimentation: You cannot analyze the language and run an experiment at the same time. The analysis has to come first, no matter how long it takes. Recall Yasunaga et al.’s (2015) work on Kaqchikel, discussed in Section 4.3.2.2; the researchers’ analysis of Kaqchikel SVO presupposed the topicalization of the subject argument.14 Once the materials for the anticipated experiment are assembled, it is desirable to conduct a pilot experiment with two or three language consultants and to collect their opinions and recommendations on the stimuli and time course of the experiment. Sometimes, even though your materials may be perfectly grammatical and well-formed, the pilot participants identify finer points that would not be apparent to any of the outsiders, including the fieldwork linguist. For example, in a subsistence culture, it may be important to specify what kind of fruit, vegetable, or animal is mentioned in the stimuli: just calling it a mango or a goat may not be enough. It may be culturally appropriate to use proper names in some societies but not in others, and it may be necessary to identify if the transfer of an object from one person to another is permanent or temporary.
Sometimes, just an accidental resemblance between a person in the set of visual stimuli and a member of the local community may become a source of confusion, discomfort, or amusement. Participants in the pilot study cannot be the same consultants that helped with the initial stimuli construction, and they cannot participate in the full experiment later on, so it is important to be judicious in choosing the participants for your pilot (of course, a lot will depend on which speakers, and how many, are available).

While it is important to tailor your stimuli to your particular experiment, it is also useful to remember the many existing materials. Stimuli from well-known languages can be combined with fieldwork stimuli (for example, Benjamin Bruening’s Scope Fieldwork Project15 or the MPI Language and Cognition field materials16) and then subjected to further selection.

14 It is fine to have two (or more) alternative hypotheses regarding the structure in question; sometimes primary data may not distinguish the two well enough, and the experiment can come to the rescue. Regardless, the consequences of each analysis must be spelled out.

15 http://udel.edu/~bruening/scopeproject/scopeproject.html.

16 http://fieldmanuals.mpi.nl/volumes/2001/.

experimental syntax and linguistic fieldwork


4.4.5 Methods

There is no need to invent new methodologies in a fieldwork setting; what works well in a lab setting should also work in the field. The two main desiderata are (a) a reliance on an auditory rather than visual presentation (because of the likelihood that lesser-known languages will exist primarily in a spoken medium, with low literacy), and (b) an expectation that the testing environment may be less “clean” and more disrupted than in the lab (in terms of both the environment, which may be too hot, too cold, or too dark, and the risk of non-participants standing around watching the experiment or walking into the testing room talking; for more on this issue, see Section 4.4.2). There may be no single place where the experiment can be conducted; instead, the research team will have to move their equipment from one participant’s home to another. Even an experienced fieldworker may not anticipate the numerous practical issues that come up in testing, and this may create a need for multiple field trips, adding more time to the project.

It is ideal to start testing with simple behavioral data; such tests are known as “paper-and-pencil tasks” in studies of MYALSs, but there should be no paper or pencil in the fieldwork version (see den Dikken et al. 2007 for a discussion). Instead, the data can be recorded, then analyzed. If this stage is productive and there is justification for doing something more elaborate, it is reasonable to follow up with an eye-tracking experiment and then a neuro-imaging experiment. Eye-trackers, and even ERP machines, are becoming ever more portable, but before jumping on a plane or boat with the latest EyeLink or Brain Products amplifier, it is worth asking the questions raised in Section 4.2: Is the experiment warranted? What can language L deliver that cannot be obtained by studying a different language? Can a simpler methodology be used to answer the same questions?

4.4.6 Language endangerment and experimental work

Many lesser-studied languages are also endangered languages. In these cases, often the only speakers left are so-called semi-speakers, also known as passive (recessive) bilinguals or lower-proficiency heritage speakers. The special status of these speakers in fieldwork was first raised in a seminal paper by Hans-Jürgen Sasse. Sasse observed that differentiating native grammars “from the … situation of language decay is essential for the evaluation of data elicited from last generation speakers in a language death situation… How reliable is the speech of the last speakers [of a given community] and how much does it reveal of the original structure?” (Sasse 1992: 76). Semi-speakers present unique challenges in the fieldwork context:

If you are working on a highly endangered language, conversation data might be very difficult to obtain. People might not speak the language on a daily basis, or they might feel uncomfortable about speaking spontaneously while being recorded. (Bowern 2008: 122)


Beyond the sociocultural context that accompanies endangered languages, there may also be issues with production and comprehension. In terms of production, endangered language data is likely to include a higher-than-typical occurrence of (i) long pauses, due to lexical-access problems; (ii) disfluencies or retractions; (iii) multiple redundancies and repetitions; and (iv) short segments, with few, if any, embedded structures. It is also reasonable to expect variation in production across and within speakers due to the speakers’ uncertainty about some forms, a side effect of a small or fragmented speech community with reduced communication in the target language. Heritage morphology is typically rife with overmarking and over-regularization, with few, if any, null pronominals (see Benmamoun et al. 2013 and Polinsky 2018 for details of heritage language production).

In comprehension, the yes-bias, which is the tendency to over-accept questionable data while being reluctant to reject ungrammatical or infelicitous language material, appears to be one of the strongest telltale signs of heritage language status (Orfitelli and Polinsky 2017; Polinsky 2018). In manifestations of the yes-bias, heritage speakers generally give higher ratings to well-formed and felicitous segments, but are loath to reject ill-formed or infelicitous structures because of their own uncertainty. When such speakers do reject a particular linguistic form, this can be taken as a sign of solid, strong judgment. Yet when the much-coveted star on a linguistic example does not materialize, its absence may not provide clear information; the example may actually be correct, or the speaker may be so uncertain that she cannot commit to a decision.

If the language you wish to study has few remaining speakers and those speakers show signs of being semi-speakers or heritage speakers, should you still go ahead with your experiment? The answer depends on the hypothesis and questions underlying your research. Investigations of such speakers can provide useful new data on heritage language structures. If you do choose to approach the lesser-studied language as a heritage language, it may be valuable to compare its behavior to other heritage languages for which a baseline dialect is available, for example Heritage Spanish or Heritage Korean. Existing experimental work on heritage languages in the field includes behavioral studies on Heritage Inuttitut (Sherkina-Lieber 2011; 2015; Sherkina-Lieber et al. 2011).
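The yes-bias described in this section can be made concrete with a small calculation. The sketch below is not a method proposed in this chapter; it borrows the standard signal detection measures of sensitivity (d′) and response criterion (c) as one possible way to separate a speaker's ability to tell well-formed from ill-formed items from their overall tendency to say "yes". The function name, the log-linear correction, and all counts are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def sdt_measures(hits, n_grammatical, false_alarms, n_ungrammatical):
    """Return (d_prime, criterion) from binary accept/reject counts.

    hits:         grammatical items the speaker accepted
    false_alarms: ungrammatical items the speaker accepted
    A log-linear correction (add 0.5 to each count and 1 to each total)
    keeps the rates away from 0 and 1, where the z-transform is undefined.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_grammatical + 1)
    fa_rate = (false_alarms + 0.5) / (n_ungrammatical + 1)
    d_prime = z(hit_rate) - z(fa_rate)            # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # negative = "yes"-leaning
    return d_prime, criterion


# Hypothetical counts out of 20 grammatical and 20 ungrammatical items.
baseline = sdt_measures(hits=18, n_grammatical=20,
                        false_alarms=3, n_ungrammatical=20)
heritage = sdt_measures(hits=19, n_grammatical=20,
                        false_alarms=14, n_ungrammatical=20)

print("baseline d', c:", baseline)
print("heritage d', c:", heritage)
```

With these invented counts, the second speaker accepts most ungrammatical items, which surfaces as a more negative criterion together with lower sensitivity. This is one way of expressing why an accepted item from such a speaker is less informative than a rejected one.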

4.5 Conclusions


This chapter has surveyed the main conceptual and practical aspects of experimental work in the field involving lesser-known languages. Connections between experimental work and fieldwork are bound to grow, given the increased interest in documenting lesser-studied languages, the progress in analytical tools and diagnostics in theoretical syntax, and the rapid development of experimental approaches to sentence structure that are becoming more integrated with syntactic theory. I have argued that there are no insurmountable differences between experimental syntax and fieldwork on the syntax of lesser-studied languages. Both disciplines work with massive amounts of data, both rely on hypothesis testing, and both constantly refine and update their techniques. It is probably more apparent with experimental work than fieldwork, but great strides in both disciplines have been made since the 1980s in terms of both sophistication and depth of analysis. Although the two disciplines may use different tools and different vocabularies, they address the same fundamental issues. Furthermore, while lab-based experimental syntax may seem the more glamorous practice these days (after all, it is a relatively new field, and novelty always attracts attention), applying the experimental approach to lesser-known languages in a fieldwork setting is an equally challenging, and highly rewarding, pursuit. Good experimental work and good fieldwork have to rely on the understanding of language structure and knowledge of language theory; without those, experimental endeavors, no matter how sophisticated, are doomed to fall flat. When done right, experimentation in the lab and experimentation in the field bring meaningful results that can feed back into linguistic theory by raising new questions and raising the bar for analytical adequacy.

One of the crucial takeaways from the discussion in this chapter has to do with the ideal composition of an experimental fieldwork team: I have argued, and I firmly believe, that the most effective way of combining experimentation with fieldwork is by building research teams where different members (the fieldworker, the experimentalist) bring different skillsets and areas of expertise, but share common goals and a willingness to rely on each other’s strengths. Engagement of native speakers as members of the team, or major stakeholders in the ongoing project, is also an important ingredient of successful experimental work in the field.

Acknowledgments


This work was completed during my visit at the Hungarian Academy of Sciences under the Distinguished Guest Scientist Program in 2017. I am grateful to Ryan Bochnak, Marcel den Dikken, David Erschler, Grant Goodall, Polina Pleshak, Jon Sprouse, and two anonymous reviewers for their helpful comments on the earlier draft of this chapter.

References

Aissen, J., N. England, and R. Zavala Maldonado (eds). 2017. Mayan languages. London: Routledge.
Aranovich, R. 2013. Transitivity and polysynthesis in Fijian. Language 89: 465–500.
Arunachalam, S., and A. Kothari. 2011. An experimental study of Hindi and English perfective interpretation. Journal of South Asian Linguistics 4: 27–42.
Bader, M., and M. Meng. 1999. Subject–object ambiguities in German embedded clauses: An across-the-board comparison. Journal of Psycholinguistic Research 28: 121–143.
Benmamoun, E., S. Montrul, and M. Polinsky. 2013. Heritage languages and their speakers: Opportunities and challenges for linguistics. Theoretical Linguistics 39: 129–181.


Berman, R., and D. Slobin (eds). 1994. Relating events in narrative: A crosslinguistic developmental study. Hillsdale, NJ: Lawrence Erlbaum.
Bishop, D. V. M. 2003. Test for reception of grammar, Version 2 (TROG-2). London: Psychological Corporation.
Bochnak, R., and L. Matthewson. 2015. Introduction. In R. Bochnak and L. Matthewson (eds), Methodologies in semantic fieldwork, 1–12. Oxford: Oxford University Press.
Bohnemeyer, J. 2015. A practical epistemology for semantic elicitation in the field and elsewhere. In R. Bochnak and L. Matthewson (eds), Methodologies in semantic fieldwork, 13–46. Oxford: Oxford University Press.
Bolinger, D. 1987. The remarkable double IS. English Today 9: 39–40.
Bornkessel-Schlesewsky, I., and M. Schlesewsky. 2009. Processing syntax and morphology: A neurocognitive perspective. Oxford: Oxford University Press.
Bowern, C. 2008. Linguistic fieldwork: A practical guide. New York: Palgrave Macmillan.
Bruening, B. 2014. Defects of Defective Intervention. Linguistic Inquiry 45: 707–719.
Burke, D. M., and M. A. Shafto. 2004. Aging and language production. Current Directions in Psychological Science 13: 21–24.
Butler, L. K. 2011. The morphosyntax and processing of number marking in Yucatec Maya. PhD dissertation, University of Arizona.
Carreiras, M., J. A. Duñabeitia, M. Vergara, I. de la Cruz-Pavía, and I. Laka. 2010. Subject relative clauses are not universally easier to process: Evidence from Basque. Cognition 115: 79–92.
Chafe, W. (ed). 1980. The pear stories: Cognitive, cultural and linguistic aspects of narrative production. Norwood, NJ: Ablex.
Chelliah, S. L., and W. J. de Reuse. 2011. Definition and goals of descriptive linguistic fieldwork. In Handbook of descriptive linguistic fieldwork, 7–31. Dordrecht: Springer.
Churchward, C. M. 1953. Tongan grammar. Tonga: Vava’u Press.
Clemens, L., J. Coon, P. Mateo Pedro, A. M. Morgan, M. Polinsky, G. Tandet, and M. Wagers. 2015. Ergativity and the complexity of extraction: A view from Mayan. Natural Language and Linguistic Theory 33: 417–467.
Clifton, C., and L. Frazier. 1989. Comprehending sentences with long distance dependencies. In G. N. Carlson and M. K. Tanenhaus (eds), Linguistic structure in language processing, 272–317. Dordrecht: Kluwer.
Coon, J. 2017. Little-v0 agreement and templatic morphology in Ch’ol. Syntax 20: 101–137.
Coon, J., and O. Preminger. 2017. Split ergativity is not about ergativity. In J. Coon, D. Massam, and L. Travis (eds), The Oxford handbook of ergativity, 226–252. Oxford: Oxford University Press.
Coppock, E., and L. Staum. 2004. Origin of the English double-is construction. MS, Stanford University.
Coppock, E., L. Staum, J. Brenier, and L. Michaelis. 2006. ISIS: It’s not a disfluency, but how do we know that? Paper presented at the 32nd Annual Meeting of the Berkeley Linguistics Society.
Curzan, A. 2012. Revisiting the reduplicative copula with corpus-based evidence. In T. Nevalainen and E. C. Traugott (eds), The Oxford handbook of the history of English, 211–221. Oxford: Oxford University Press.
Dąbrowska, E. 1997. The LAD goes to school: A cautionary tale for nativists. Linguistics 35: 735–766.


Dąbrowska, E. 2012. Different speakers, different grammars: Individual differences in native language attainment. Linguistic Approaches to Bilingualism 2: 219–253.
den Dikken, M., J. Bernstein, C. Tortora, and R. Zanuttini. 2007. Data and grammar: Means and individuals. Theoretical Linguistics 33: 335–352.
Dixon, R. M. W. 1988. A grammar of Bouma Fijian. Chicago: University of Chicago Press.
Fanselow, G. 2001. Features, θ-roles, and free constituent order. Linguistic Inquiry 32: 405–437.
Featherston, S. 2007. Data in generative grammar: The stick and the carrot. Theoretical Linguistics 33: 269–318.
Federmeier, K. D., J. B. Segal, T. Lombrozo, and M. Kutas. 2000. Brain responses to nouns, verbs and class-ambiguous words in context. Brain 123: 2552–2566.
Ferreira, F., and B. Swets. 2002. How incremental is language production? Evidence from the production of utterances requiring the computation of arithmetic sums. Journal of Memory and Language 46: 57–84.
Fillmore, C. 1992. “Corpus linguistics” or “Computer-aided armchair linguistics.” In J. Svartvik (ed), Directions in corpus linguistics, 35–60. Berlin: de Gruyter.
Fischer, E. F., and R. M. Brown (eds). 2001. Maya cultural activism in Guatemala. Austin: University of Texas Press.
Foley, S. 2020. Case, agreement, and sentence processing in Georgian. PhD dissertation, University of California at Santa Cruz.
Foley, W. A. 1991. The Yimas language of Papua New Guinea. Stanford, CA: Stanford University Press.
Gathercole, S. E., S. J. Pickering, C. Knight, and Z. Stegmann. 2004. Working memory skills and educational attainment: Evidence from national curriculum assessments at 7 and 14 years of age. Applied Cognitive Psychology 18: 1–16.
Gibson, E., and E. Fedorenko. 2013. The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes 28: 88–124.
Gil, D. 2008. How complex are isolating languages? In M. Miestamo, K. Sinnemäki, and F. Karlsson (eds), Language complexity: Typology, contact, change, 109–132. Amsterdam: John Benjamins.
Groves, R. M., F. J. Fowler, M. Couper, J. Lepkowski, E. Singer, and R. Tourangeau. 2009. Survey methodology. Hoboken, NJ: Wiley.
Gutierrez-Mangado, M. J. 2011. Children’s comprehension of relative clauses in an ergative language: The case of Basque. Language Acquisition 18: 176–201.
Hale, K. 1972a. A new perspective on American Indian linguistics. In A. Ortiz (ed), New perspectives on the pueblos, 87–110. Albuquerque: University of New Mexico Press.
Hale, K. 1972b. Some questions about anthropological linguistics: The role of native knowledge. In D. Hymes (ed), Reinventing anthropology, 382–397. New York: Pantheon.
Hale, K. 1992. Language endangerment and the human value of linguistic diversity. Language 68: 35–42.
Henrich, J., S. J. Heine, and A. Norenzayan. 2010. The weirdest people in the world? Behavioral and Brain Sciences 33: 61–135.
Hewitt, B. G. 1979. Abkhaz. Amsterdam: North-Holland.
Huettig, F., J. Chen, M. Bowerman, and A. Majid. 2010. Do language-specific categories shape conceptual processing? Mandarin classifier distinctions influence eye gaze behavior, but only during linguistic processing. Journal of Cognition and Culture 10: 39–58.
Hwang, H., and E. Kaiser. 2015. Accessibility effects on production vary cross-linguistically: Evidence from English and Korean. Journal of Memory and Language 84: 190–204.


Kaiser, E., and J. C. Trueswell. 2004. The role of discourse context in the processing of a flexible word-order language. Cognition 94: 113–147.
Kemper, S., M. Thompson, and J. Marquis. 2001. Longitudinal change in language production: Effects of aging and dementia on grammatical complexity and propositional content. Psychology and Aging 16: 600–614.
Kibrik, A. E. 2005. Opyt OTiPLa (filfak MGU) v izučenii maloopisannyx jazykov. In A. E. Kibrik (ed), Malye jazyki i tradicii: Suščestvovanie na grani. Vypusk 1. Lingvističeskie problemy soxranenija i dokumentacii malyx jazykov, 53–72. Moscow: Novoe izdatel’stvo.
Kluender, R., and S. Gieselman. 2013. What’s negative about negative islands? A re-evaluation of extraction from weak island contexts. In J. Sprouse and N. Hornstein (eds), Experimental syntax and island effects, 186–207. Cambridge: Cambridge University Press.
Koizumi, M., Y. Yasugi, K. Tamaoka, S. Kiyama, J. Kim, J. E. Ajsivinac Sian, and P. O. García Matzar. 2014. On the (non-)universality of the preference for subject–object word order in sentence comprehension: A sentence processing study in Kaqchikel Maya. Language 90: 722–736.
Kroeger, P. 1993. Phrase structure and grammatical relations in Tagalog. Stanford, CA: CSLI.
Kwon, N., M. Polinsky, and R. Kluender. 2006. Subject preference in Korean. In D. Baumer, D. Montero, and M. Scanlon (eds), Proceedings of the 25th West Coast Conference on Formal Linguistics, 1–14. Somerville, MA: Cascadilla Press.
Kwon, N., Y. Lee, P. C. Gordon, R. Kluender, and M. Polinsky. 2010. Cognitive and linguistic factors affecting subject/object asymmetry: An eye-tracking study of pre-nominal relative clauses in Korean. Language 86: 546–582.
Kwon, N., R. Kluender, M. Kutas, and M. Polinsky. 2013. Subject/Object processing asymmetries in Korean relative clauses: Evidence from ERP data. Language 89: 537–585.
Laka, I., M. Santesteban, K. Erdocia, and A. Zawiszewski. 2012. The Basque language in the minds of native and non-native bilinguals. In P. Salaburu and X. Alberdi (eds), The challenge of a bilingual society in the Basque Country, 157–172. Reno: University of Nevada.
Lau, E., M. Socolof, N. Clarke, R. Asatiani, and M. Polinsky. 2023. A subject relative clause preference in a split-ergative language: ERP evidence from Georgian. Brain and Language 236.
Lechner, W. 2013. Diagnosing covert movement: The Duke of York and reconstruction. In L. Cheng and N. Corver (eds), Diagnosing syntax, 158–189. Oxford: Oxford University Press.
Levelt, W. J. M. 1989. Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Longenbaugh, N., and M. Polinsky. 2016. The processing of long-distance dependencies in Niuean. In H. Hsieh (ed), AFLA 22: The Proceedings of the 22nd Meeting of the Austronesian Formal Linguistics Association, 98–120. Canberra: Australian National University.
Longenbaugh, N., and M. Polinsky. 2017. Experimental approaches to ergative languages. In J. Coon, D. Massam, and L. Travis (eds), The Oxford handbook of ergativity, 709–736. Oxford: Oxford University Press.
Massam, D. 1999. Thing is constructions: The thing is, is what’s the right analysis? English Language and Linguistics 3: 335–352.
Mayer, M. 1969. Frog, where are you? New York: Dial Books.
Mazuka, R., K. Itoh, and T. Kondo. 2002. Costs of scrambling in Japanese sentence processing. In M. Nakayama (ed), Sentence processing in East Asian languages, 131–166. Stanford, CA: CSLI.
McConvell, P. 1988. To be or double be? Current changes in the English copula. Australian Journal of Linguistics 8: 287–305.


Neeleman, A., and H. van de Koot. 2008. Dutch scrambling and the nature of discourse templates. Journal of Comparative Germanic Linguistics 11: 137–189.
Nichols, J. 1986. Head-marking and dependent-marking grammar. Language 62: 56–119.
Norcliffe, E., and F. Jaeger. 2016. Predicting head-marking variability in Yucatec Maya relative clause production. Language and Cognition 8: 167–205.
Norcliffe, E., A. E. Konopka, P. Brown, and S. C. Levinson. 2015a. Word order affects the time course of sentence formulation in Tzeltal. Language, Cognition and Neuroscience 30: 1187–1208.
Norcliffe, E., F. Jaeger, and A. Harris. 2015b. Cross-linguistic psycholinguistics and its critical role in theory development: Early beginnings and recent advances. Language, Cognition and Neuroscience 30: 1009–1032.
O’Neill, T. 2015. The domain of finiteness: Anchoring without tense in copular amalgam sentences. PhD dissertation, CUNY Graduate Center.
Orfitelli, R., and M. Polinsky. 2017. When performance masquerades as comprehension: Grammaticality judgments in non-native speakers. In M. Kopotev, O. Lyashevskaya, and A. Mustajoki (eds), Quantitative approaches to the Russian language, 197–214. London: Routledge.
Phillips, C. 2010. Should we impeach armchair linguists? In S. Iwasaki, H. Hoji, P. Clancy, and S.-O. Sohn (eds), Japanese–Korean Linguistics 17, 49–64. Stanford, CA: CSLI.
Polinsky, M. 2016. Deconstructing ergativity. Oxford: Oxford University Press.
Polinsky, M. 2018. Heritage languages and their speakers. Cambridge: Cambridge University Press.
Polinsky, M. 2019. Field stations for linguistic research: A blueprint of a sustainable model. Language 95: e327–e338.
Polinsky, M., C. Gomez Gallo, P. Graff, and E. Kravtchenko. 2012. Subject preference and ergativity. Lingua 122: 267–277.
Potsdam, E., and M. Polinsky. 2011. Questions and word order in Polynesian. In C. Moyse-Faurie and J. Sabel (eds), Topics in Oceanic morphosyntax, 83–109. Berlin: Mouton de Gruyter.
Preminger, O. 2014. Agreement and its failures. Cambridge, MA: MIT Press.
Sakel, J. 2002. A grammar of Mosetén. Berlin: de Gruyter.
Sasse, H.-J. 1992. Language decay and contact-induced change: Similarities and differences. In M. Brenzinger (ed), Language death: Factual and theoretical explorations with special reference to East Africa, 59–79. Berlin: Mouton de Gruyter.
Sauppe, S., E. Norcliffe, A. E. Konopka, R. D. Van Valin Jr., and S. C. Levinson. 2013. Dependencies first: Eye tracking evidence from sentence production in Tagalog. In M. Knauff, M. Pauen, N. Sebanz, and I. Wachsmuth (eds), Proceedings of the 35th Annual Meeting of the Cognitive Science Society, 1265–1270. Austin, TX: Cognitive Science Society.
Sekerina, I. A. 1997. The syntax and processing of scrambled constructions in Russian. PhD dissertation, City University of New York.
Sherkina-Lieber, M. 2011. Knowledge of Labrador Inuttitut functional morphology by receptive bilinguals. PhD dissertation, University of Toronto.
Sherkina-Lieber, M. 2015. Tense, aspect, and agreement in heritage Labrador Inuttitut: Do receptive bilinguals understand functional morphology? Linguistic Approaches to Bilingualism 5: 30–61.


Sherkina-Lieber, M., A. T. Pérez-Leroux, and A. Johns. 2011. Grammar without speech production: The case of Labrador Inuttitut heritage receptive bilinguals. Bilingualism: Language and Cognition 14: 301–317.
Skopeteas, S., and E. Verhoeven. 2009. Postverbal argument order in Yucatec Maya. STUF - Sprachtypologie und Universalienforschung 58: 347–373.
Sprouse, J., and D. Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s Core Syntax. Journal of Linguistics 48: 609–652.
Sprouse, J., and D. Almeida. 2017. Design sensitivity and statistical power in acceptability judgment experiments. Glossa 2(1): 14. http://doi.org/10.5334/gjgl.236.
Street, J., and E. Dąbrowska. 2010. More individual differences in language attainment: How much do adult native speakers of English know about passives and quantifiers? Lingua 120: 2080–2094.
Tamaoka, K., H. Sakai, J. Kawahara, Y. Miyaoka, H. Lim, and M. Koizumi. 2005. Priority information used for the processing of Japanese sentences: Thematic roles, case particles or grammatical functions? Journal of Psycholinguistic Research 34: 281–332.
Tanaka, M. N., H. P. Branigan, J. F. McLean, and M. J. Pickering. 2011. Conceptual influences on word order and voice in sentence production: Evidence from Japanese. Journal of Memory and Language 65: 318–330.
Vaux, B., and J. Cooper. 1999. Introduction to linguistic field methods. Munich: Lincom Europa.
Vaxtin, N. B., and E. V. Golovko. 2005. Isčezajuščie jazyki i zadači lingvistov-severovedov. In A. E. Kibrik (ed), Malye jazyki i tradicii: Suščestvovanie na grani. Vypusk 1. Lingvističeskie problemy soxranenija i dokumentacii malyx jazykov, 40–52. Moscow: Novoe izdatel’stvo.
Warren, K. 1998. Indigenous movements and their critics: Pan-Maya activism in Guatemala. Princeton, NJ: Princeton University Press.
Yasunaga, D., M. Yano, Y. Yasugi, and M. Koizumi. 2015. Is the subject-before-object preference universal? An event-related potential study in the Kaqchikel Mayan language. Language, Cognition and Neuroscience 30: 1209–1229.

Annotated bibliography for Part I


Contributors in this section have compiled brief annotated bibliographies of resources for readers interested in learning how to use the methods discussed in the chapters. The annotated bibliographies are organized below by chapter. ***

Chapter 1. Acceptability judgments (Jon Sprouse)


Sprouse, Jon. 2021. Online course materials for methods in experimental syntax. https://jonsprouse.com/courses/experimental-syntax/

For readers looking to learn how to design and analyze acceptability judgment experiments, I have created a course that covers design and analysis from start to finish. It includes (i) course slides covering experimental design and statistical analysis, (ii) R scripts for data wrangling, statistical analysis, and generating publication-quality figures, and (iii) the materials and data from a real experiment on island effects that can be used to demonstrate the workflow from start to finish. I will endeavor to keep the most up-to-date version linked on my professional website at all times.

Grolemund, Garrett, and Hadley Wickham. 2019. R for Data Science. https://r4ds.had.co.nz/

Every experimentalist must find a set of tools for processing data and generating publication-quality figures. There are several computing languages that can serve this purpose, such as Python, Matlab, and R. I personally prefer R because it is free, open-source, and specifically designed for data analysis and visualization. The scripts included in the course above are designed to provide a basic introduction to R that is focused on the tasks that arise in acceptability judgment experiments. For readers who wish to go beyond these scripts, there are any number of free resources online for learning R. A good place to start is this free online book, written by the creators of RStudio and the “tidyverse” package.

Field, Andy, Jeremy Miles, and Zoë Field. 2012. Discovering statistics using R. Thousand Oaks, CA: Sage.

Statistics is a living, and ever-evolving, field of study in its own right. Statistical analysis is thus probably one of the more difficult aspects of experimental methods to learn. The course presents an introduction to statistics that is specifically targeted at syntacticians who want to use rating tasks for judgment experiments. For readers interested in going beyond that introduction, this textbook by Field, Miles, and Field is a good first introduction to a wide range of analyses in frequentist statistics. It also uses R for analysis and visualization.

Maxwell, Scott E., and Harold D. Delaney. 2003. Designing experiments and analyzing data: A model comparison perspective. Abingdon: Routledge.

For readers who want to truly understand the theory underlying frequentist statistics, this textbook by Maxwell and Delaney is a terrific next step. It is an advanced text, so I would not necessarily recommend it as a first textbook on statistics. That said, I do recommend that any readers who plan to make frequentist statistics a major component of their analysis pipeline think about adding it to their library.

Kruschke, John. 2014. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. New York: Academic Press.

Bayesian statistics and frequentist statistics differ in their philosophical approach, and therefore provide different types of information. For readers interested in exploring Bayesian statistics in their research, Kruschke’s textbook is probably the best place to start. It provides a comprehensive introduction to Bayesian statistics, as well as code to implement Bayesian analyses for all of the most common experimental designs.

Morey, Richard D. 2019. Using the “BayesFactor” package. https://richarddmorey.github.io/BayesFactor/

Bayes factors provide valuable information in their own right (the odds ratio of the probability of the evidence under the experimental and null hypotheses), and also provide a lower-cost introduction to Bayesian analysis than full-fledged posterior models. For readers interested in Bayes factors, the BayesFactor package in R is a comprehensive solution. The manual for the package includes examples for the most common experimental designs, and links to articles in the primary literature.

***

Chapter 2. Acceptability judgments of binding and coreference: Methodological considerations (Elsi Kaiser and Jeffrey Runner)


Gordon, P. C., and R. Hendrick. 1997. Intuitive knowledge of linguistic co-reference. Cognition 62: 325–370.

annotated bibliography for part i

129

A foundational article in experimental syntax, especially as regards binding and coreference. The experiments presented in this paper showed that naïve speakers could judge the acceptability of particular binding configurations. The study is notable not just for showing that the constraints of the binding theory could be tested experimentally but also for previewing the fact that constraints other than the binding theory interact with judgments on binding. Although some aspects of Gordon and Hendrick’s experimental designs diverge from present-day best practices (e.g. an apparent lack of fillers, relatively small numbers of critical items), they make other important methodological contributions, for example by exploring the importance of instructions among other things that have become critical concerns in experimental syntax. Cowart, W. 1997. Experimental syntax: Applying objective methods to sentence judgments. Thousand Oaks, CA: Sage. A treasure trove of useful information in designing experiments in syntax, examining a variety of methods, explaining the reasoning behind different analytical techniques, and illustrating these methods with real experiments. A chapter on sampling explains the complexities of shifting from the assumption that sentences are either acceptable or not to understanding how variability can be analyzed to support a theoretical conclusion. Schütze, C. 1996. The empirical base of linguistics. Chicago: University of Chicago Press. A classic treatise on the importance of expanding the methods used in linguistics to include experimental approaches. This work highlights the risks of allowing one’s theory to drive one’s interpretation of data, without the guardrails of objectively obtained data. This and Cowart’s book together helped incite the revolution that is experimental syntax. Schütze, C., and J. Sprouse. 2013. Judgment data. In R. Podesva and D. Sharma (eds), Research methods in linguistics, 27–50. Cambridge: Cambridge University Press. 
A much-needed state-of-the-art chapter on the (by then) much more developed field of experimental syntax. This work covers many practical issues that experimental work needs to consider, while also keeping in focus the kinds of questions syntacticians need to answer and how best to do so. Topics range from experimental design and choice of task to data analysis and debates about naïve vs. expert participants. This chapter also offers an in-depth discussion of the terms “grammaticality” and “acceptability” and addresses the debate regarding gradient vs. categorical representations. Kaiser, E., J. Runner, R. Sussman, and M. Tanenhaus. 2009. Structural and semantic constraints on the resolution of pronouns and reflexives. Cognition 112: 55–80.

annotated bibliography for part i

Illustrates some of the points we make in our chapter about how different methods can and should be used to answer different kinds of questions. The studies present auditory stimuli paired with visual scenes. In one study, the scene builds in the intended interpretation and participants are asked to indicate whether the sentence matches the scene (scene verification, building on the truth-value judgment task); this method can detect acceptable but less preferred interpretations. Another study asks participants to click on a picture representing how they interpret the sentence; this “forced choice” method does a good job of revealing which interpretation is preferred. Later studies in the paper use visual-world eye-tracking to examine participants’ eye movements to intended referents as they listen to stimuli containing reflexives and pronouns (as they select a particular antecedent); this method produces a preferred interpretation for the anaphoric expression, and also provides information about the time-course of how syntactic and semantic information guide participants’ moment-by-moment processing that leads to their final interpretation. Kaiser, E. 2003. The quest for a referent: A crosslinguistic look at reference resolution. PhD dissertation, University of Pennsylvania. An early use of scene verification and visual-world eye-tracking (see also Kaiser et al. 2009) to examine both the interpretive preferences of particular pronouns and the time-course of establishing these interpretations. The dissertation includes experiments on Finnish, which provides both a variety of word orders (with different discourse context sensitivities) and a more elaborate set of proforms. The work argues that different types of proforms can exhibit varying levels of sensitivity to different types of information: grammatical function vs. information structure. 
The work illustrates the value in using experimental techniques to examine a language outside of the usual sphere of languages syntacticians focus on, especially as regards what the more elaborate set of pronouns can reveal about pronoun resolution. Keller, F. 2000. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. PhD thesis, University of Edinburgh. Tackles the fact that experimental results do not always align with the categorical nature of syntactic theory. The dissertation not only develops a theoretical framework for adapting syntactic theory to reflect this kind of data, but also contains much useful discussion of experimental design and analysis. Chien, Y.-C., and K. Wexler. 1990. Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1: 225–295. Tests the development of the binding theory in young children. Given the pre-literate nature of young children, Chien and Wexler demonstrate the Truth Value Judgment

Task, which is an ancestor of the “scene verification” task discussed in our chapter (and in Kaiser 2003 and Kaiser et al. 2009). This task provides a visual representation of the desired interpretation the research aims to examine (e.g. a picture of Mama Bear touching Goldilocks) while the child is asked a question (e.g. “Is Mama Bear touching herself?”), eliciting a “yes” or “no” response and revealing how the child understands the anaphoric expression. We argue that this method is a good way to indicate intended binding configurations without having to use written language and techniques like underlining, bold face, or subscripts/indices to indicate intended coreference. The fact that young children are adept at this method suggests that it is robust and can be adapted to other language communities to test non-written language interpretation. ***

Chapter 3. (Quantifier) scope judgments (Kriszta Eszter Szendrői)

..........................................................................................................................

Ioup, G. 1975. Some universals for quantifier scope. In J. P. Kimball (ed.), Syntax and semantics 4, 37–58. New York: Academic Press. Kurtzman, H. S., and M. C. MacDonald. 1993. Resolution of quantifier scope ambiguities. Cognition 48: 243–279. Let us start with these two classic works on the interpretation of doubly quantified sentences. Ioup’s study reports on a crosslinguistic scaled judgment survey of doubly quantified sentences involving combinations of a wide set of quantifiers. She established the Quantifier Hierarchy and the Grammatical Function Hierarchy, claiming that it is the nature of quantifiers and their grammatical function that are the best predictors of their scopal behavior. Kurtzman and MacDonald were the first to obtain meaningful data about scopal interactions without a metalinguistic judgment task. They used continuation sentences to disambiguate the readings: A kid climbed every tree. The kid was… / The kids were …. It is perhaps no coincidence that the five most influential, innovative, and informative studies on the psycholinguistics of scope ambiguities in adults and children are due to five junior scholars from the years of their doctoral work. This area is fraught with a combination of theoretical complexity and methodological obstacles. Owing to these difficulties, it was by and large avoided until the turn of the millennium. But the audacity and fresh thinking of these junior scientists, coupled with the systematic, sustained hard work allowed for by the doctoral years, laid the foundations for this field. Let us take them in turn. Tunstall, S. 1998. The interpretation of quantifiers: semantics and processing. Doctoral dissertation, University of Massachusetts, Amherst.

Tunstall provides an extensive and thorough review of this early literature including the theoretical literature of the time. In addition, she provides a systematic study on the scopal behavior of English each and every including their links to distributivity. Her work also includes an important methodological innovation adapting Kurtzman and MacDonald’s sentence continuation method to dative sentences and other syntactic constructions. Tunstall’s work was one of the first systematic studies on adult processing of scopal ambiguities. Musolino, Julien. 1998. Universal Grammar and the acquisition of semantic knowledge: An experimental investigation into the acquisition of quantifier–negation interaction in English. PhD dissertation, University of Maryland. Musolino performed a series of truth-value judgment tasks with children investigating their interpretation of sentences involving quantifiers in subject and object position and negation. These were perhaps the first successful experiments to be performed in this area. The interpretation of the findings (that children have a non-adult-like preference for overt scope) has since been brought into question, including in later work by the author himself in light of further results. The dissertation remains methodologically important and, in terms of its research questions, foundational in the area of scopal interactions in child language. Anderson, Catherine. 2004. The Structure and Real-Time Comprehension of Quantifier Scope Ambiguity. Evanston: Northwestern University dissertation. Anderson performed a series of offline and online comprehension tasks involving scopal ambiguities using the quantifiers every and a in subject and object position with adults. Her offline methodology avoided using metalinguistic judgments by making use of the fact that the distributive scope reading involves a plurality of referents, following Kurtzman and MacDonald (1993). 
The online studies involved self-paced reading, which allows for the simultaneous study of interpretational preferences and measuring the processing cost associated with the reading in question. She studied scopal ambiguities in null context as well as in contexts biasing towards a distributive or non-distributive reading. Her dissertation also offers a highly informed and interesting theoretical account of her findings. Given the breadth, systematicity and precision of the studies, this dissertation deserves to be better known in the field. Her online behavioral results were among the first to appear in the area of scopal ambiguity resolution in adults. Conroy, Anastasia. 2008. The role of verification strategies in semantic ambiguity resolution in children and adults. PhD dissertation, University of Maryland. This dissertation investigated the role of the parser and that of extra-linguistic factors in the processing of scopal ambiguities by both adults and children, with special attention

to the time course of the scope assignment process. Conroy worked with quantifiers and negation using the truth-value judgment task, a sentence completion task applied with and without time pressure, as well as her own novel design called the Incremental Verification Task. In terms of its breadth of coverage and methodological innovation, the dissertation provides important findings that would be enough for two dissertations. This work, again, deserves to be much better known. Its methodological insights in particular are a highly valuable source of information for future work in this area. Goro, Takuya. 2007. Language-specific constraints on scope interpretation in first language acquisition. PhD dissertation, University of Maryland. Another important work in the area of child language acquisition regarding scopal ambiguities is Goro’s dissertation. Goro performed a series of studies on scopal ambiguities with children using a Rigid Scope language, Japanese. He studied the interactions between a universal and an existential in subject and object position as well as sentences with the logical connector ka ‘or’ under negation. His methodology involves an innovative application of the truth-value judgment task in so-called prediction and descriptive modes. The dissertation also has a very strong theoretical component, interpreting the findings in the context of wider theoretical questions. Again, this work deserves to be much better known than it is, and it certainly constitutes a good starting point for anyone interested in the area of scopal ambiguities.

Part II

Acquisition methods in syntactic theory

...................................................................................................

Chapter 5. Behavioral acquisition methods with infants

...........................................................................................................

Laurel Perkins and Jeffrey Lidz

5.1 Introduction

..........................................................................................................................

The study of children’s syntax was long dominated by studies of the sentences they produced (L. Bloom 1970; Braine 1963; Brown and Bellugi 1964; Hyams 1986; Poeppel and Wexler 1993; Snyder 2001; Stromswold 1990). The assumption behind this kind of research was that children’s productions provided straightforward evidence of their grammars. Much of the early research on children’s syntax could thus be described as a kind of corpus linguistics. However, children’s utterances represent an imperfect subset of their grammatical potential. First, a corpus is just a sampling of utterances and hence is unlikely to fully realize the range of structures compatible with a child’s grammar at any one time. Second, independent factors, such as working memory and executive function, can impact children’s abilities to plan and execute an utterance, hence masking aspects of their grammars (P. Bloom 1990; Phillips 1995; Shipley et al. 1969). Third, to the extent that comprehension precedes production, production measures run the risk of underestimating children’s grammatical abilities. Finally, production measures are limited to studying children’s grammars once they have started talking. But surely children attain some grammatical knowledge prior to being able to express it in their utterances. Indeed, the lower bound imposed by production makes it impossible to see the very earliest stages of syntactic development and the processes that precede children’s first multi-word utterances (Hirsh-Pasek and Golinkoff 1996).

In this chapter, we face this lower bound by describing how developmental linguists have probed the growth of grammar in infancy. Such probes typically involve measures of comprehension and attention, measured by eye movements, looking time, or listening time. Of course, just as production studies are limited by performance factors affecting planning and execution, comprehension measures face related challenges from immature sentence processing mechanisms that can hide adult-like grammatical knowledge. Moreover, as in all behavioral studies, correct performance may derive from erroneous knowledge masquerading as adult-like knowledge. To face these challenges, infant researchers rely on short and simple designs that take into account potential interference from extralinguistic factors. To the extent that such factors can be minimized, simple comprehension measures may give us the tools to investigate the very earliest steps children take in acquiring a syntactic system. We review three areas in which progress in understanding infants’ syntactic development has been made. These areas represent a natural starting place for children’s early syntax because they illustrate the most basic properties of any syntactic system: grammatical categories, hierarchical structure, and grammatical dependencies. First, we explore children’s initial steps in acquiring the syntactic categories of their language. To what extent can infants distinguish lexical and functional categories distributionally, and use these distributional properties to make inferences about the syntactic and semantic properties of novel words? How do infants use their knowledge of grammatical categories to constrain online lexical access and comprehension? A good deal of work has probed infants’ sensitivity to subcategories of verbs and how these subcategories relate to verb meaning. Second, we examine children’s early phrase structure representations, especially in the clausal domain. 
Are children’s earliest syntactic representations hierarchically structured? We further explore when and how children learn the canonical order of subjects, verbs, and objects in their language. Relatedly, we examine whether infants’ early clause representations are complete, and when infants become sensitive to language-specific properties of clauses, such as whether null subjects are licensed. Finally, we turn to infants’ acquisition of grammatical dependencies. We explore when and how infants detect dependencies that hold across non-adjacent morphemes in particular syntactic environments, and we ask how richly they represent these dependencies. We also examine movement dependencies in infancy. We ask whether infants know that only constituents can move and how they go about detecting movement dependencies in the sentences that contain them. Furthermore, we explore infants’ knowledge of binding dependencies, specifically the constraint that a pronoun cannot be referentially dependent on an NP that it c-commands. We examine how online processing can provide insight into the nature of the hierarchical relations that underlie binding dependencies. We hope that this review provides a clear summary of both the prospects and the challenges for examining syntax in infancy. While infant research must face the challenge that infants are limited in their behavioral repertoire, at the same time, studying infant syntax represents the frontier of our knowledge about the emergence of grammar.

Gaining a richer understanding of infants’ sensitivities and their ability to make inferences from distributional observations to syntactic representations may ultimately help us to better understand how the language faculty allows us to acquire whatever language we are exposed to.

5.2 Syntactic categories and subcategories

..........................................................................................................................

Different words occur in different linguistic environments. For example, arrive can be used after an auxiliary verb like will, but arrival cannot.

(1) a. Elliott will arrive
    b. *Elliott will arrival

By the same token, arrival can be used after an article like the, unlike arrive.

(2) a. *The arrive of Elliott surprised Grandma
    b. The arrival of Elliott surprised Grandma

These distributional differences reflect the grammatical categories of the words. Even though arrive and arrival have similar meanings, their different grammatical categories (verb vs. noun) lead to different sentence distributions. Learning which words belong to which grammatical categories is one of the earliest syntactic problems that infants solve. Grammatical categories come in two flavors: lexical and functional. Lexical categories have rich referential content and are open-class, in the sense that new words can be added to those categories freely. Functional categories have less referential content and are closed-class, in the sense that new words in those categories arise only through processes of historical change. Functional categories are generally higher-frequency (numerically) than lexical categories, and as such frequently signal when specific lexical categories are upcoming; for example, determiners are signals for nouns. These signals might provide useful information in helping children categorize novel words. Children’s acquisition of grammatical categories has been a central battleground for debates about the origins of productivity in syntax. Some researchers argue that words acquire their categories by an exemplar-driven process that discovers abstract categories by noticing similarities across words (Meylan et al. 2017; Pine and Lieven 1997; Pine and Martindale 1996; Tomasello 2000; 2003). Others argue that children are biased to find specific categories and that productivity is an automatic consequence of identifying the morphological cues to category membership (Valian 1986; Valian et al. 2009; Yang 2013). The empirical focus of these debates has been about detecting productivity in children’s productions, with different ways of measuring productivity yielding different results. 
In such a situation, evidence from perception may be more enlightening, as it is not dependent on factors outside of the researchers’ control, like the rates at which children happen to use particular words. Instead, by looking at infant perception we are

able to see what kinds of inferences children make about novel words and what kinds of morphological signals count as the evidence that drives these inferences.

5.2.1 Investigating knowledge of functional vs. lexical categories

From early in infancy, children appear sensitive to the differences between function words and content words, which tend to have different acoustic and phonological properties crosslinguistically. Across languages, function words are often unstressed, shorter than content words, have reduced vowels, and appear at prosodic boundaries (e.g. Monaghan, Chater, and Christiansen 2005; Shi, Morgan, and Allopenna 1998). Even newborns demonstrate sensitivity to these differences. In a study by Shi, Werker, and Morgan (1999), newborns heard repetitions of English words selected from an audio recording of natural maternal speech. Infants’ attention to these audio stimuli was tested using a procedure called High-Amplitude Sucking, which measures infants’ sucking strength and rate on a pressure-sensitive pacifier. Infants learn that they can control the presentation of an audio stimulus by sucking harder, and the researchers measure how the rate of these high-amplitude sucks declines over time as infants lose attention. Once this rate declines to a certain threshold, infants are considered to be “habituated” to the stimulus, and a new test stimulus is played. If infants consider this new stimulus different from the previous one, they should recover attention (“dishabituate”) and therefore increase their rate of high-amplitude sucks. Shi, Werker, and Morgan habituated infants either to a list of content words or to a list of function words, and then tested them on new words from the same category or the opposite category. Infants who were habituated to content words recovered attention and increased their sucking rate when they heard function words, and vice versa, but infants did not recover attention when they heard new words from the category they had been habituated to. It therefore appears that newborns are able to discriminate the phonological differences between function and content words. 
This ability may enable infants to begin categorizing words into functional and lexical categories from the earliest stages of language acquisition. Sensitivity to the acoustic differences between function and content words does not tell us how infants use these differences in building syntactic categories, however. To address this issue, we would need to additionally identify the role that such categories play in word segmentation and word learning. Early on, function words can serve as anchors in the speech stream: 8-month-olds can use known function words to segment new content words (Shi and Lepage 2008), suggesting that function words play a special role in word learning (Christophe et al. 2008; Hochmann et al. 2010). Older infants can use function words as a signal for specific lexical categories (Hicks et al. 2007; Höhle et al. 2004; Shi and Melançon 2010). For example, Hicks, Maye, and Lidz (2007) used a Head-Turn Preference procedure to examine infants’ categorization abilities. In this technique, infants hear speech coming from one of two speakers. When the speech occurs, a light connected to the speaker blinks. As long as infants look towards the light, the speech continues. If they look

away, the speech stops and a new trial begins. Hicks et al. (2007) familiarized 14- to 16-month-olds with a nonsense word preceded by a determiner (e.g. my kets). Then, infants heard trials with the same nonsense word paired with a different determiner (her kets) or with an auxiliary (will kets). Infants listened longer to words paired with function words from a different category than from the same category. Similarly, infants also listened longer when a familiarized nonsense word preceded by a modal (will dak) later occurred after a determiner (my dak) than when it occurred after a different modal (can dak) (Hicks et al. 2007). This suggests that children use the determiner and auxiliary functional categories to identify the lexical category of an unknown word: Hearing a determiner tells them that the novel word is a noun and therefore should only occur in places where nouns can occur, and hearing an auxiliary tells them that the novel word is a verb and should only occur in places where verbs can occur. Thus, at ages before children reliably produce multiword combinations, we can see that they understand the categorial status of certain function words and the consequences of occurring next to these function words. Using the Conditioned Head Turn procedure (Kuhl 1985), Cauvet et al. (2014) showed that 18-month-old French-learning children use function words to identify known words during language comprehension. In this task, infants are seated at a table with an experimenter who has some toys to hold their attention. Infants learn that if they hear a particular word from a loudspeaker and orient towards it, away from the experimenter, an electronic toy will light up and make noise. In this case, infants were trained to respond to a target noun preceded by a determiner (e.g. la balle ‘the ball’) or a target verb preceded by a pronoun (je mange ‘I eat’). 
At test, children turned towards the loudspeaker more frequently when the target words were preceded by another word from the correct functional category (une balle ‘a ball’, on mange ‘we eat’) than when they were preceded by a word from the wrong functional category (on balle ‘we ball’, une mange ‘a eat’). This suggests both that infants use function words to parse the speech stream and that they treat function words drawn from the same category in the grammar of the community as a category in their own grammars. Other studies have found that 2-year-olds show better and faster sentence comprehension when singular nouns are preceded by determiners than by ungrammatical or missing function words (Gerken and McIntosh 1993; Kedar et al. 2006; Shipley et al. 1969). Furthermore, children can use functional categories to infer aspects of a content word’s meaning. Although grammatical categories do not correlate perfectly with semantic categories, some imperfect correlations do exist: For example, nouns tend to label object kinds, adjectives tend to label object properties, and verbs tend to label events. Children as young as 1 year old can use these correlations to infer whether a novel word labels an object kind or property (Hall et al. 1993; Mintz and Gleitman 2002; L. B. Smith et al. 1992; Taylor and Gelman 1988; Waxman 1999; Waxman and Booth 2001; Waxman and Markow 1998). 12-month-olds who hear an object labeled as a blicket will select another object of the same kind when asked for another blicket (Waxman and Markow 1998). 13-month-olds who hear a purple horse labeled as a daxish one will prefer to select a novel purple object over a differently colored horse (Waxman 1999).

This behavior suggests that 1-year-old infants can distinguish the distribution of nouns and adjectives based on co-occurring functional categories, and use that knowledge to infer that a novel word in a noun context labels an object kind, whereas a novel word in an adjective context labels an object property. Using a Habituation task, He and Lidz (2017) found that slightly older infants are also able to use the presence of functional verbal morphology to identify that a novel word labels an event rather than an object. This method follows the same logic as the High-Amplitude Sucking procedure, but uses infants’ gaze towards a visual display as a measure of their attention. An experimenter in a separate room live-codes infants’ attention towards the display, and the stimulus stops when infants look away for a specified length of time. Infants reach habituation once their attention declines below a particular threshold, upon which researchers measure whether infants dishabituate to a new test stimulus. He and Lidz (2017) habituated 18-month-olds to a scene of a penguin spinning, labeled either by a novel word in a noun context (e.g. It’s a doke) or in a verb context (It’s praching). At test, children saw a scene of the penguin performing a different action, labeled by the same audio. Children dishabituated when they heard It’s praching label that new scene, but not when they heard It’s a doke. These infants appear to have used the co-occurring functional categories to identify whether the novel word was a noun or verb, and therefore what concept it should label. Infants who heard the novel word after a determiner identified the word as a noun and therefore an object name, and were not surprised to hear this word label the same object performing a different action. 
By contrast, infants who heard the novel word in a verbal context, after the auxiliary is and with the inflectional suffix -ing, identified the word as a verb and therefore an event name, and were surprised to hear this word label a different action. Identifying the signals of a new word’s grammatical category—its distributional context and co-occurring function words—allows children to both categorize and make inferences about the meaning of that word. These experimental results show us that children’s knowledge about grammatical categories in their language goes beyond the distribution of these categories, and includes information about other syntactic or interpretive properties of these categories. Before they productively combine words into phrases, children know that nouns label objects, adjectives label object properties, and verbs label events.
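The habituation logic shared by the High-Amplitude Sucking and visual Habituation procedures described above can be illustrated with a minimal sketch. The text says only that attention must decline "below a particular threshold"; the specific rule used here (mean of a sliding three-trial window falling below half the mean of the first three trials) is an assumption chosen for illustration, and the function names and data are hypothetical rather than taken from any of the cited studies.

```python
def trials_to_habituation(looking_times, window=3, criterion=0.5):
    """Return the 1-based trial at which habituation is reached, or None.

    Assumed rule (not from the chapter): habituation is reached at the end
    of the first block of `window` consecutive trials, not overlapping the
    first `window` trials, whose mean attention falls below `criterion`
    times the mean of the first `window` trials.
    """
    if len(looking_times) < 2 * window:
        return None
    baseline = sum(looking_times[:window]) / window
    for end in range(2 * window, len(looking_times) + 1):
        recent = sum(looking_times[end - window:end]) / window
        if recent < criterion * baseline:
            return end
    return None


def recovery(habituation_times, test_time, window=3):
    """Attention on the test trial minus mean attention over the last
    `window` habituation trials; a positive value is the pattern the
    chapter calls dishabituation."""
    tail = habituation_times[-window:]
    return test_time - sum(tail) / len(tail)


# Hypothetical looking times in seconds for one infant
looks = [20, 18, 16, 12, 9, 7, 6, 5]
reached = trials_to_habituation(looks)
print(reached)                                 # trial on which the criterion is met
print(recovery(looks[:reached], test_time=15)) # positive: attention recovered at test
```

In an actual study the criterion, the window size, and the statistical test for recovery are design decisions; the sketch only makes the decline-then-recover logic concrete.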

5.2.2 Syntactic bootstrapping: Sensitivity to subcategories

We now turn to infants’ knowledge of the properties of subcategories of lexical items, reflected in the argument-taking properties of particular predicates. Under prominent theories of verb learning, infants use the syntactic properties of verbs to infer aspects of their meanings. This is syntactic bootstrapping: if children are aware of the relations between verbs’ syntactic distributions and their meanings, and can observe those syntactic distributions, then they might be able to use those distributions to narrow down the candidate meanings of novel verbs (Gleitman 1990; Landau and

Gleitman 1985; Lasnik 1989). Although initially proposed as a theory of verb learning, this term has also been used to describe other cases in which children infer aspects of a word’s meaning by using information about its syntactic distribution (Brown 1957). How can we tell whether infants are sensitive to the syntactic properties of particular verbs, and whether they can use those properties to draw inferences about verb meanings? Many approaches to this question have considered infants’ sensitivity to transitivity. Because causal events tend to be described by transitive clauses crosslinguistically, these studies have asked whether infants infer that a novel verb in a transitive clause is likely to label a causal event involving both an agent and a patient, whereas a novel verb in an intransitive clause is not (e.g. Arunachalam and Waxman 2010; Brandone, Addy, Pulverman, Golinkoff, and Hirsh-Pasek 2006; Fisher, Gertner, Scott, and Yuan 2010; Naigles 1990; Noble, Rowland, and Pine 2011; Pozzan, Gleitman, and Trueswell 2015; Yuan and Fisher 2009; Yuan, Fisher, and Snedeker 2012). The primary method used in these studies is called Intermodal Preferential Looking (Golinkoff et al. 1987): An auditory linguistic stimulus is played in the context of two visual stimuli, and infants’ eye movements are recorded by a hidden camera. An experimenter then codes these eye movements frame by frame in order to determine the proportion of time infants look towards one visual stimulus versus the other, out of the total time spent looking at either stimulus. These looking preferences are taken as evidence for how infants interpreted the linguistic stimulus, under the assumption that infants will look longer at the image or scene that they perceive as a better match for the audio they are hearing. This assumption was originally established in non-linguistic tests of this method (Spelke 1976). 
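The looking-preference measure described above (time on one visual stimulus out of total time on either stimulus) can be sketched as follows. This is only an illustration under stated assumptions: the frame labels "left", "right", and "away", the function name, and the sample trial are hypothetical and not drawn from the cited studies.

```python
from collections import Counter


def proportion_looking(frames, target="left", distractor="right"):
    """Proportion of on-stimulus frames spent on the target.

    `frames` is a sequence of per-frame gaze codes. Frames coded as
    anything other than `target` or `distractor` (e.g. looks away) are
    excluded, so the denominator is total time looking at either stimulus.
    """
    counts = Counter(frames)
    on_stimulus = counts[target] + counts[distractor]
    if on_stimulus == 0:
        return None  # the infant never looked at either stimulus
    return counts[target] / on_stimulus


# Hypothetical coded trial: 6 frames on the left scene, 2 on the right, 2 away
trial = ["left"] * 6 + ["right"] * 2 + ["away"] * 2
print(proportion_looking(trial))  # 0.75
```

A proportion reliably above 0.5 across infants would be read as a preference for the target scene; excluding off-screen frames from the denominator follows the description in the text.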
A related method, the Looking While Listening paradigm, aims to provide a finer-grained measure of looking preferences by analyzing the time-course of looking to each visual stimulus on a frame-by-frame basis, time-locked to the unfolding audio stimulus (Fernald et al. 2008). In one of the first studies to use a preferential looking method to investigate verb learning, Naigles (1990) presented 25-month-olds with a novel verb in the context of two scenes: a causal scene intended to be viewed with two participants (a duck pushing a bunny over), and a non-causal scene intended to be viewed as two separate one-participant events (a duck and a bunny each wheeling their arms independently). Naigles measured infants’ looking preferences as a function of whether they heard the novel verb in a transitive clause or an intransitive clause. Infants who heard The duck is gorping the bunny looked longer at the pushing scene, and infants who heard The duck and the bunny are gorping looked longer at the arm-wheeling scene. It therefore appears that infants were sensitive to the syntactic frame of the novel verb, inferring that gorp in a transitive frame was more likely to label the causal event, whereas gorp in an intransitive frame was more likely to label the non-causal event. These results supported an influential hypothesis about how infants use the syntactic properties of verbs to draw inferences about meanings. Under this hypothesis, infants take the nouns (or noun phrases) in a clause to be arguments, and expect the number of arguments in a clause to match one-to-one the number of participants in the event the clause describes (Fisher 1996; Gleitman 1990; Naigles 1990). Thus, a transitive clause


laurel perkins and jeffrey lidz

with two arguments should label an event perceived with two participants, whereas an intransitive clause with only one argument should label an event perceived with one participant. This is a potentially powerful learning strategy for infants at early stages of syntactic development because it requires very little syntactic knowledge: In order to narrow down the candidate events a clause refers to, infants only need to be able to identify the number of nouns or noun phrases in the clause, and do not need to identify their thematic roles or hierarchical position in the clause. Extensive tests of this hypothesis have corroborated that infants as young as 22 months are sensitive to transitivity, and will infer that a novel transitive verb labels a causal event (Arunachalam and Waxman 2010; Brandone et al. 2006; Fisher et al. 2010; Noble et al. 2011; Pozzan et al. 2015; Yuan et al. 2012; Yuan and Fisher 2009). It furthermore appears that children are able to draw this inference on the basis of distributional information alone. Yuan and Fisher (2009) familiarized 28-month-olds with short dialogues containing novel transitive or intransitive verbs, without any informative visual context. At test, infants were then asked to identify the referent of the novel verb (e.g. Find blicking!) while viewing two candidate events, one causal and one non-causal. Infants who had heard the transitive dialogues looked longer at the causal event than infants who had heard the intransitive dialogue. This indicates that they had tracked the syntactic properties of the novel transitive verb and used those properties to draw inferences about its possible meanings, even without the support of referential context. However, beyond Naigles’ (1990) seminal study, further work has found inconsistent behavior with intransitive verbs. Infants who hear novel verbs in intransitive frames do not show a reliable preference for events intended to be viewed with one participant as opposed to two (e.g. 
Arunachalam and Waxman 2010; Noble, Rowland, and Pine 2011; Yuan et al. 2012). Because these results are not predicted under the participant-to-argument matching hypothesis, several methodological explanations have been proposed. First, many studies use intransitive sentences with conjoined subjects (e.g. The duck and the bunny are gorping) in order to control the number of nouns across conditions. It is possible that infants may not reliably perceive these sentences as intransitive: if they mistake the conjoined subject for two separate arguments, this might lead them to infer a causal meaning for the verb (Gertner and Fisher 2012; Yuan et al. 2012). Alternatively, it is possible that infants do not reliably perceive the presented scenes under the intended event representation. If infants conceptualize a scene of one actor pushing another as an event of two actors playing, then they might consider the intended “two-participant” scene a good referent for a novel intransitive verb (Arunachalam et al. 2016; Brandone et al. 2006; Pozzan et al. 2015). These concerns highlight the importance of carefully controlling for how children perceive both the linguistic and visual stimuli in a preferential looking task. But it is also possible that infants’ behavior with intransitives is due not to methodological confounds, but instead to an alternative weaker learning strategy. If infants merely expect that each argument of a clause will name an event participant, without necessarily matching participants one-to-one, then either a one-participant or a two-participant event could be a potential referent for an intransitive clause (Williams 2015).
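The difference between the two strategies can be laid out as a small sketch. It makes the simplifying assumption, not stated in the cited work, that a clause and an event can each be reduced to a count (of arguments and of participants), and that distinct arguments name distinct participants. The function names are hypothetical.

```python
def strong_match(n_arguments, n_participants):
    """One-to-one matching: the two counts must be equal."""
    return n_arguments == n_participants


def weak_match(n_arguments, n_participants):
    """Weaker strategy: every argument names a participant, but some
    participants may go unnamed, so participants >= arguments."""
    return n_arguments <= n_participants


clauses = {"transitive": 2, "intransitive": 1}
events = {"two-participant": 2, "one-participant": 1}

# Print which clause/event pairings each strategy accepts.
for clause, n_args in clauses.items():
    for event, n_parts in events.items():
        print(clause, event,
              "strong:", strong_match(n_args, n_parts),
              "weak:", weak_match(n_args, n_parts))
```

Under these assumptions, both strategies pair a transitive clause only with the two-participant event. They differ for intransitives: the one-to-one strategy accepts only the one-participant event, whereas the weaker strategy accepts either event, which would be compatible with a reliable preference for transitives and no reliable preference for intransitives.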


Further work is therefore necessary to determine the specific inferences infants draw on the basis of hearing verbs in transitive vs. intransitive frames, and whether a participant-to-argument matching strategy best characterizes infants’ behavior across different clause types. This question is explored in Perkins (2019). We’ve seen that infants can use transitivity information to draw inferences about verb meanings. Can they use information beyond the number of arguments in the clause, and draw inferences on the basis of which particular arguments are present? Crosslinguistically, subjects of transitive clauses tend to label agents of causal events, and objects tend to label patients (M. C. Baker 1988; Dowty 1991; Fillmore 1968; Jackendoff 1972). If an infant can identify the subject and object in a transitive clause, she may be able to infer that the clause labels not just any causal event, but one in which the referent of the subject is the agent and the referent of the object is the patient. Gertner, Fisher, and Eisengart (2006) tested the ability of 2-year-olds and 21-month-olds to draw this inference using another preferential looking task. Infants heard a transitive sentence (e.g. The duck is gorping the bunny) in the context of two causal scenes: one in which a duck pushed a bunny, and one in which the bunny pulled the duck. Both groups of infants looked preferentially at the scene in which the duck was the agent, indicating that they knew that the subject of a transitive clause labels the agent rather than the patient of a causal event. Furthermore, infants preferred the duck-agent and bunny-patient event even for sentences like He is gorping the bunny: Here, they could only rely on the referent of the object because the subject does not identify a unique referent in the discourse. This indicates that infants knew that the object of a transitive clause labels the patient rather than the agent of a causal event. 
These infants were able to exploit relationships between argument position (subject vs. object) and argument roles (agent vs. patient) in order to constrain the inferences they draw about transitive verb meanings. For intransitive verbs these relationships are more complicated: the subject of an intransitive clause can label either an agent (e.g. John baked) or a patient (e.g. The bread rose). These sub-classes of intransitives also display differences in meaning: Intransitives whose subject is an agent tend to label actions of that agent, whereas intransitives whose subject is a patient tend to label changes undergone by that patient (e.g. Fillmore 1970; Levin and Hovav 2005; Williams 2015). Another line of work has asked whether children can draw these finer-grained inferences about verb meanings on the basis of the thematic role of the intransitive subject (Bunger and Lidz 2008; 2004; Naigles 1996; Scott and Fisher 2009). For example, Scott and Fisher (2009) familiarized 28-month-olds with a dialogue in which a novel verb alternated between transitive and intransitive uses. Infants either heard the intransitive with an animate subject (e.g. Matt dacked the pillow. He dacked) or an inanimate subject (e.g. Matt dacked the pillow. The pillow dacked). At test, infants heard the verb in a transitive frame in the context of two “two participant” scenes: a caused-motion event in which a girl pushes a boy over, or a contact-activity event in which the girl dusts the boy with a feather duster. Infants who were exposed to the animate-subject intransitive dialogue preferred to look at the contact-activity event, whereas infants who were exposed to the inanimate-subject dialogue preferred to look at the caused-motion event. These infants were able to use cues


to the thematic role of the intransitive subject, such as its animacy, to infer whether the novel verb labeled an action of an agent or a change undergone by a patient. Thus, infants between 21 months and 2 years appear sensitive not only to the number of arguments in a clause, but also to the thematic roles of those arguments in drawing inferences about verb meanings. Infants can use cues such as argument position and animacy to infer whether an argument in a clause labels an agent or a patient in an event, constraining the type of events that the clause is likely to label. And syntactic bootstrapping doesn’t end with simple transitive and intransitive clauses: Additional questions remain about how infants map sentences to events with three participants (Wellwood et al. 2015; Perkins 2019), and how children infer particular mental states from the types of complements that attitude verbs embed. The latter question has been investigated extensively in preschoolers (Fisher et al. 1991; Gleitman et al. 2005; Hacquard 2014; Harrigan et al. 2016; White et al. 2018); see Chapter 6 of this volume for more information. These observations invite further investigation into the nature of the inferences infants draw on the basis of the argument-taking properties of new verbs, and the syntactic representations these inferences are drawn from.

5.2.3 Summary

In this section we’ve reviewed behavioral evidence for the development of syntactic category knowledge in infancy. Sensitivity to the co-occurrence patterns of categories like determiners, nouns, and verbs tells us when infants have gained awareness of how categories distribute in their language, and sensitivity to the syntactic and interpretive properties of these categories tells us when infants represent these categories with the same features that adults do. We’ve further examined infants’ sensitivity to lexical subcategories, revealed through the inferences infants draw from the argument-taking properties of verbs to the types of meanings those verbs can have. The studies we’ve reviewed here show that a great deal of syntactic category development takes place before children consistently produce these categories in their own speech, demonstrating the importance of comprehension measures in assessing the full extent of children’s grammatical knowledge. But open questions remain: in particular, how rich are children’s representations of these lexical categories and their combinatorial properties? We’ll now explore this latter question in more detail as we turn to children’s acquisition of clause structure.

5.3 Clause structure


Children’s first multi-word utterances, shortly before their 2nd birthday, are often heralded as the first evidence of syntactic development beyond the word level. The ability to combine subjects with predicates, verbs with objects, indicates that a child has gained


knowledge of not only the syntactic properties of individual words, but also the syntactic properties of phrases and clauses. As we observe this ability emerging, what can we conclude about the nature of children’s early clause structure representations? Do they show sensitivity to the properties that constrain clause structure cross-linguistically, such as hierarchical structure and the role of functional elements like tense? Do they show sensitivity to the properties specific to the child’s target language, such as word order, overt tense marking, and obligatory vs. null subjects? Here, much of the literature has debated the evidence from children’s productions. Children’s earliest combinatorial speech is far from adult-like, frequently omitting elements required in the grammar of the target language (e.g. Brown 1973). From this early production data, it is tempting to conclude that children’s clause structure knowledge is quite incomplete at the age of two years. Yet comprehension studies suggest that this is not the full story. By the age of 18 months, children already demonstrate knowledge of the hierarchical structure of phrases and the order of subjects, verbs, and objects in their language (Hirsh-Pasek and Golinkoff 1996; Lidz et al. 2003). 20-month-olds potentially represent even more complex clausal structures such as wh-questions (Gagliardi et al. 2016), as we’ll discuss in Section 5.4. The debate over children’s early clause structure knowledge thus serves to illustrate the difficulty of drawing conclusions about children’s linguistic representations from their behavior—and in particular, the challenge of separating the contributions of grammatical knowledge and other cognitive and linguistic factors in young children’s early production data.

5.3.1 Comprehending basic clause structure: Subjects and objects

Cross-linguistically, subjects and objects are represented in an asymmetrical hierarchy within a clause: verbs and objects form a constituent to the exclusion of the subject (e.g. Baker 2001). But this underlying hierarchy is realized in different word orders from language to language. English typically displays SVO order: Subjects precede verbs, which precede objects. Because we know this canonical word order, we know that the dog is the subject and not the object in the sentence The dog bit the cat. This word order varies across languages: SOV word order is dominant in Japanese, VSO is common in Irish, and VOS in Malagasy. In order to arrive at the correct representation of clauses in their language, children must identify the relative order of the subject and object. How can we tell when children know the canonical word order of their language? Children’s utterances display the correct word order of their language as early as it can be observed, from the onset of combinatorial speech (L. Bloom 1970; Brown 1973). This knowledge must therefore be acquired before children begin producing sentences, requiring us to look prior to children’s sentence productions at their early sentence comprehension. One approach is to probe children’s sensitivity to the interpretive consequences of being a subject or an object. Subjects of active, transitive clauses tend to label agents of causal events, and objects tend to label patients, patterns that hold


robustly across the world’s languages (M. C. Baker 1988; Dowty 1991; Fillmore 1968; Jackendoff 1972). If children are aware of these tendencies, then identifying the order of subjects and objects in a sentence may allow them to draw inferences about the likely thematic roles of the entities named by those arguments. And by observing the inferences children draw, we as researchers can infer which arguments in the clause children take to be subjects, and which they take to be objects. Hirsh-Pasek and Golinkoff (1996) found that children could identify the order of subjects and objects in English as early as 17 months. They used a preferential looking task, in which children heard a transitive sentence (e.g. Big Bird is washing Cookie Monster) while viewing two scenes: one in which Big Bird was washing Cookie Monster, and one in which Cookie Monster was washing Big Bird. Children looked more at the scene where Big Bird was the agent when they heard Big Bird is washing Cookie Monster, and they looked more at the opposite scene when they heard Cookie Monster is washing Big Bird. This behavior suggests that children had identified the canonical word order of English: they knew that subjects precede objects, and inferred that the individual named by the subject was the agent of the event. In the previous section, we saw that older children could use this knowledge to constrain their hypotheses about the meanings of new verbs: 2-year-olds who heard The duck is gorping the bunny looked longer at an event in which the duck was the agent, rather than the bunny (Gertner et al. 2006). These looking-time studies provide evidence that children can identify the order of subjects and objects in sentences during their second year of life, and can use that order to draw inferences about sentence meanings. How this early understanding develops is still an open question. 
One hypothesis proposes that children might bootstrap into the word order of their language by inferring the thematic relations in sentences they hear—a form of semantic bootstrapping (Grimshaw 1981; Pinker 1984; 1989). Suppose a child represents a scene of a dog biting a cat as an event where the dog is the agent and the cat is the patient, and knows that agents are typically realized as subjects and patients as objects of transitive sentences. If that child hears the sentence The dog bites the cat to describe this scene, and knows that the dog labels the dog and the cat labels the cat, then she can infer that the dog is the subject and the cat is the object. She may then assume that English has SVO word order. This hypothesis rests on several critical assumptions: (1) that children perceive scenes in the world under conceptual structures differentiating thematic roles like “agent” and “patient”; (2) that these conceptual structures align straightforwardly with at least some of the sentence descriptions they hear; and (3) that children are aware of the mapping between conceptual and linguistic structure, specifically how agents and patients tend to be realized in particular argument positions in a clause. Investigating each of these assumptions is necessary in order to demonstrate the viability of the semantic bootstrapping hypothesis. Research with prelinguistic infants indicates that the first assumption is borne out: 6-month-olds represent agents as distinct from patients in events, attributing to them goals and intentions (Csibra et al. 1999; Leslie 1995;


Leslie and Keeble 1987; Luo et al. 2009; Woodward 1998). Further research is needed to confirm the second and third assumptions—particularly how children may handle challenges from so-called “non-basic” clauses like passive sentences, which obscure the mappings between argument positions and argument roles (Lidz and Gleitman 2004; Perkins et al. 2017; Pinker 1984), and from reversible predicates like chase and flee, which describe the same event from two different perspectives (Gleitman 1990). Because not all sentences children hear will provide equally informative data for inferring word order, it is necessary to determine whether children are able to ignore data that is uninformative for drawing these inferences. A further question is how children represent subjects and objects as they learn their relative order in sentences. Do children begin by only representing the linear order of noun phrases (e.g. Fisher 1996), or do they represent these phrases qua subjects and objects, within a hierarchical clause structure? At the heart of this question is whether children’s syntactic representations are constrained to be hierarchically structured from the earliest stages of development (Chomsky 1975). Lidz, Waxman, and Freedman (2003) tested this question at the level of the noun phrase, using anaphora to probe 18-month-olds’ representations of phrases like a yellow bottle. Consider the sentence I’ll give Adam this yellow bottle, and I’ll give you that one. The word one refers not merely to another bottle, but to another yellow bottle. Because one is anaphoric to yellow bottle, adults’ noun phrase representations must be hierarchically structured: yellow bottle must be a nested constituent in the phrase [this [yellow bottle]]. Infants’ interpretations of the word one should thus reveal whether they, too, represent this phrase with nested structure. The researchers used a preferential looking task to investigate these interpretations. 
First, they familiarized infants with a picture of a yellow bottle labeled with a determiner–adjective–noun sequence: Look! A yellow bottle. Then, they showed a display containing both a yellow bottle and a blue bottle, and measured infants’ looking preferences upon hearing a sentence with anaphoric one (Do you see another one?) or a control sentence (What do you see now?). Infants in the anaphoric one condition looked more at the yellow bottle, whereas infants in the control condition looked more at the novel blue bottle. This indicates that infants interpreted one as anaphoric to yellow bottle—and therefore that their noun phrase representation contained yellow bottle as a nested constituent. At least by the age of 18 months, infants represent phrases with internal hierarchical structure. It still remains to be determined whether hierarchical structure extends above the phrasal to the clausal level in children’s early syntactic representations. Suggestive evidence for clause-level hierarchical structure comes from work on children’s knowledge of constraints on pronoun interpretation, in particular Principle C (Lukyanenko et al. 2014; Sutton 2015; Sutton et al. 2012). If children’s early interpretations are constrained by Principle C, a constraint defined over hierarchical structure, this would indicate that their clause representations are hierarchically structured. But testing for knowledge of Principle C in infants is not a trivial task. We’ll return to this question in Section 5.4.


5.3.2 The view from production data: Telegraphic speech

Although open questions remain, the comprehension studies reviewed above support the view that a good deal of clause structure knowledge is in place even before children begin combining words into sentences in their own productions. However, when children do begin producing sentences, those utterances are “telegraphic,” omitting clausal elements that are required by the adult grammar (Brown 1973). On one view, these non-adultlike productions indicate that children’s clause structure knowledge is quite incomplete even through their third year of life (e.g. Brown 1973; Guasti 2002; Guilfoyle and Noonan 1988; Radford 1990; Rizzi 1994), a position inconsistent with the view from comprehension described above. How do we reconcile these two positions? Experimental methods may allow us to gain a fuller understanding of the factors underlying children’s early telegraphic productions. Shipley, Smith, and Gleitman (1969) conducted one of the first experiments to address the following question: Do children’s telegraphic utterances reflect immature grammatical knowledge or an immature performance system? The researchers reasoned that a child who produces telegraphic utterances due to immature grammatical knowledge would show better comprehension of telegraphic commands (e.g. Throw ball! or Ball!) compared to well-formed commands (e.g. Throw me the ball!). That is, if children’s grammatical competence at this age is limited to generating telegraphic sentences, then their comprehension should be similarly limited. The researchers gave both telegraphic and well-formed commands to seven 1- to 2-year-old children whose own utterances were telegraphic, and measured how frequently each child obeyed the commands. Children who produced telegraphic speech did not show improved comprehension of telegraphic commands; in fact, they obeyed the telegraphic commands less frequently than the well-formed commands. 
This result suggests that children’s own telegraphic speech is not primarily the result of immature grammatical competence. Instead, a variety of interacting factors may be at play, including the developing extralinguistic cognitive systems that allow children to deploy their grammatical competence during real-time speech production. This study highlights the difficulty in teasing apart the relative contributions of grammatical competence and performance in children’s productions, and the role that experimental work can play in understanding this complex dynamic. One phenomenon in children’s telegraphic speech that has been hotly debated is the so-called “root infinitive” stage, during which children sometimes use the infinitive form of main clause verbs instead of the tensed form (Bar-Shalom and Snyder 1997; Haegeman 1995; Harris and Wexler 1996; Platzack 1990; Schaeffer and Ben Shalom 2004; Weverink 1989; Wexler 1994). The duration of this stage appears to vary crosslinguistically: It is rare in Italian-speaking children (Guasti 1993) and may extend past the 4th birthday for Dutch- and English-speaking children (Haegeman 1995; Harris and Wexler 1996; Phillips 1995). Many accounts have taken this phenomenon as evidence for immature clause structure knowledge: If children’s early productions lack tense morphology, then perhaps their early clause structure representations lack tense,


or their tense representations are in some way immature (e.g. Guasti 2002; Guilfoyle and Noonan 1988; Radford 1990; Rizzi 1994; Wexler 1994). The large majority of the work on root infinitives has focused on analyzing patterns of verb productions in children’s spontaneous speech to support different theories of grammatical development. German-speaking children’s differentiation between finite and non-finite verbs when producing V2 sentences has been taken as evidence for early (although potentially immature) representations of tense, contra accounts that children’s early clause structure lacks tense altogether (Guasti 2002; Phillips 1995; Poeppel and Wexler 1993; Wexler 1998). Correlations between the length of the root infinitive stage and the rate of unambiguous cues to overt tense-marking in the input have been taken as evidence that children in this stage are learning whether or not their target language marks tense overtly, rather than developing mature tense representations (Legate and Yang 2007). Yet few experimental studies have been conducted to test differing accounts of infants’ grammatical development in the lab. To date, experimental work on root infinitives has been conducted only with older, preschool-aged children (Grinstead et al. 2009; Pratt and Grinstead 2007a; 2007b; Rice et al. 1998; 1999), using indirect grammaticality judgment methods that will be described in Chapter 6 of this volume. Furthermore, as Shipley, Smith, and Gleitman (1969) demonstrated, it is not easy to isolate grammatical development from developing extralinguistic factors that contribute to speech production when studying spontaneous speech corpora. The puzzle of root infinitives thus invites further experimentation, particularly with methods appropriate to younger children, in order to determine the factors responsible for both their appearance in infancy and their disappearance in later childhood. 
The debate over root infinitives illustrates the difficulty of drawing conclusions from a child’s productions about the linguistic knowledge that underlies those productions. Similar themes emerge in the literature on another hotly debated phenomenon in early child speech: so-called early “null subjects,” in which children omit the subjects of main clauses even in languages like English that require them (e.g. Bowerman 1973; Hyams 1986). Like root infinitives, these omitted subjects were initially claimed to reflect immature grammatical knowledge. On one class of accounts, children are in the process of learning whether or not their language requires overt subjects (Hyams 1986; 1992; Hyams and Wexler 1993; Wexler 1998; Yang 2002); on another, null subjects are the result of immature clause structure representations that (perhaps optionally) lack functional projections such as tense that host subjects in adult grammars, a hypothesis motivated in part by the overlap between early null subjects and the root infinitive stage (Guasti 2002; Guilfoyle and Noonan 1992; Radford 1990; Rizzi 1993; 2005; Wexler 1994; 2014). However, other work has found that children acquiring languages that require overt subjects drop them less frequently than children acquiring languages that do not (Kim 2000; Valian 1991; Valian and Eisenberg 1996; Wang et al. 1992), and are sensitive to the discourse contexts in which they are licensed (Allen 2000; Clancy 1993; Serratrice 2005). This raises the possibility that children are sensitive to their target grammar’s requirements for overt subjects, even if their own productions do not always satisfy these requirements.


Experimental investigations into early null subjects have implicated a range of factors beyond developing grammatical knowledge that may contribute to this phenomenon. Gerken (1991; 1994) used a method called Elicited Imitation, in which children are asked to imitate sentences produced by a puppet. Within the test sentences, the researcher systematically manipulates specific variables of interest. Gerken manipulated whether the test sentences had a full lexical subject (e.g. the bear) or a pronominal subject, and found that 2-year-olds were more likely to produce full lexical subjects than pronominal subjects in their own imitations. Because the type of noun phrase is not predicted to affect the rate of subject production under grammatical competency-based accounts, Gerken concluded that performance and prosodic factors were responsible: English-learning children with limited processing resources may prefer to align their productions to a dominant strong–weak syllable pattern, and thus preferentially omit unstressed subjects that are the first rather than the second syllable of a prosodic foot. These prosodic preferences may interact with information structure: Sentence elements that convey less information may be preferentially dropped if processing resources are taxed, yielding more subject than object omissions (subjects tend to convey “given” information more frequently than objects) and more pronominal than lexical subject omissions (Allen 2000; Clancy 1993; Serratrice 2005; Valian and Eisenberg 1996). The same experimental method has been used to probe further effects of these limited processing resources (Nuñez del Prado et al. 1993; Valian et al. 1996; Valian and Aubry 2005). Valian, Hoeffner, and Aubry (1996) tested whether sentence length would affect children’s subject imitations. 
If developing memory or other cognitive resources are responsible for children’s early null subjects, then taxing these resources in the production of longer sentences might lead to higher rates of subject omissions (P. Bloom 1990; Valian 1991). The authors found that 2-year-olds imitated fewer subjects from long sentences than from short ones. However, this effect was only observed for children whose mean length of utterance in spontaneous speech (MLU) was below 3, suggesting that children’s developing extralinguistic cognitive capacities contribute to their ability both to produce subjects and to produce longer sentences in their own spontaneous speech. A follow-up study tested whether a second chance to imitate the target sentence, after having already parsed it once, would lead to increased subject production due to reduced cognitive demands (Valian and Aubry 2005). Children did indeed increase their production of pronominal and expletive subjects when given a second chance to imitate the target sentence, further pointing to the role of extralinguistic cognitive capacities in their early subject omissions. These experimental studies use controlled production tasks to demonstrate that factors unrelated to early clause structure knowledge—such as subject type, sentence length, and the opportunity to repeat a sentence twice—affect the rates at which young children omit subjects in their early speech. Their results indicate that systems outside of core grammatical competency may contribute to children’s early subject omissions, including developing cognitive resources, information structure, and prosodic sensitivities. Yet they do not pinpoint exactly which system or systems are primarily

behavioral acquisition methods with infants

responsible: Whether early null subjects can be traced to a single source or stem from a combination of factors remains an open question. More recently, an attempt has been made to investigate how children comprehend sentences with missing subjects. Orfitelli and Hyams (2012) used a variant of a Truth Value Judgment Task (see description in Chapter 6 of this volume) to test whether 2- to 5-year-olds would interpret subjectless sentences like Play with blocks as declaratives or imperatives. The researchers found that the youngest children failed to treat these sentences as imperatives when they did not include a clear imperative marker (e.g. please), which they took as evidence that young children’s grammars allow subjectless declaratives. However, they also found inconsistent imperative interpretations up to the age of 4, raising the possibility that the pragmatic requirements of this task may have been particularly challenging for this age (Valian 2016). This topic awaits further research, with comprehension methods tailored to children’s developing cognitive abilities. The debate over early subject omissions further illustrates the difficulty of isolating the factors responsible for children’s telegraphic speech: does this non-adultlike behavior reflect immature clause structure knowledge, or interference from other linguistic and extralinguistic systems that support sentence production? As in the case of root infinitives, this question is difficult to answer from spontaneous speech data alone, but experimental methods may provide greater insight into the relative contributions of developing grammatical knowledge and developing performance systems in children’s early productions.

5.3.3 Summary

How complete are children’s earliest clause structure representations? We’ve seen different perspectives from comprehension and production data. On the one hand, much of children’s clause structure knowledge appears to be in place even before the onset of combinatorial speech: Children are sensitive to the order of subjects and objects in their language, and represent phrases with hierarchical structure. On the other hand, children’s early sentences are strikingly incomplete, omitting required grammatical elements such as tense marking and overt subjects. This apparent conflict illustrates the challenges of inferring children’s grammatical knowledge from their behavior, which may not faithfully reflect that grammatical knowledge. For phenomena like root infinitives and early null subjects, it remains an open question how best to isolate the contribution of the grammar from the contribution of other cognitive systems that interact in sentence comprehension and production.

5.4 Syntactic dependencies

Syntactic dependencies are relations between elements in a clause or across clauses, determined by the syntactic properties of those elements and the structures they occur

laurel perkins and jeffrey lidz

in. Here we will consider infants’ knowledge and acquisition of three kinds of dependencies: morphosyntactic dependencies, movement dependencies, and referential dependencies. Morphosyntactic dependencies express an abstract grammatical relation, such as agreement or selection, through morphological means. For example, in (3a), there is a dependency between the auxiliary verb is and the -ing form of the verb, which work together to tell us that the sentence is in the present progressive.

(3) a. Jane is playing the piano.
    b. Jane is softly playing the piano.
    c. Jane is softly and beautifully playing the piano.

Such a dependency represents a head-to-head relation between the auxiliary and the main verb. This type of relation can hold across intervening material, as in (3b,c). A second type of dependency occurs in questions like (4).

(4) a. Which sonata is Jane playing __ tonight?
    b. *Which sonata is Jane playing the piano tonight?

Here there is a dependency between the “wh-phrase” which sonata and the verb play: the verb play requires a direct object, typically to its right, and that requirement is satisfied by the wh-phrase, despite it not occurring to the right of the verb. Indeed, the wh-phrase cannot occur if there is an object in the postverbal position, as shown in (4b). We also find this type of relation in relative clauses (5a), clefts (5b), and topicalization (5c), among other constructions.

(5) a. I love the sonata that Jane is playing __ tonight.
    b. It is my favorite sonata that Jane is playing __ tonight.
    c. That sonata, Jane is playing __ tonight.

These kinds of dependencies can hold across an unbounded degree of intervening material (6a,b), but cannot hold across certain “island” configurations (6c,d) (Ross 1967).

(6) a. Which sonata did Tony think that the program said that Jane was playing __ tonight?
    b. I love the sonata that everyone believed that the critics wanted Jane to play __ tonight.
    c. *Which sonata do you wonder why Jane is playing __ tonight?
    d. *I love the sonata that you wonder why Jane is playing __ tonight.

Because the object of the verb appears to have “moved” to a different position in these sentences, these dependencies are called movement dependencies. They’re also frequently called “filler-gap dependencies” because the moved element is a “filler” that becomes associated with a “gap” later on in the sentence.

An important property of both morphosyntactic and movement dependencies is the fact that they are defined over the hierarchical structure of elements in a sentence (Chomsky 1975). In other words, the relations that elements of a sentence can enter into depend on their structural positions with respect to each other. For instance, the dependency between is and -ing holds only between the auxiliary (be) and its verbal complement. Thus, no verbs or other auxiliaries can intervene between the auxiliary and the verb bearing the -ing (7).

(7) a. *Jane is try to eating her pizza.
    b. *Jane is might playing the piano.
    c. Jane might be playing the piano.

Similarly, movement dependencies are structurally defined. Strings of words that function as syntactic constituents can move (8a), but those that are not constituents cannot (8b).

(8) a. Which sonata is Jane playing __ in the concert?
    b. *Which sonata in is Jane playing __ the concert?

Structure dependence is also illustrated by a third type of dependency, namely referential dependencies that hold between pronouns and their antecedents, as in (9).

(9) a. Jane thinks that you saw her at the concert.¹
    b. Jane saw herself on TV after the concert.

In these dependencies the pronouns get their semantic values from their antecedents. These dependencies are constrained by structure. In the case of reflexive pronouns like herself, the antecedent must c-command the pronoun and the antecedent must be (roughly) in the same clause as the pronoun, as illustrated by the unacceptability of (10a,b).

(10) a. *Jane’s brother saw herself on TV after the concert.
     b. *Jane thought that you saw herself on TV after the concert.

In the case of pronominals like her, there are two relevant constraints. First, a pronominal may not take a locally c-commanding antecedent (11a), though it may take non-c-commanding antecedents (11b), or c-commanding antecedents across clause boundaries (9a). Second, a pronominal may not c-command its antecedent (11c), though it may precede it (11d).

(11) a. *Jane saw her on TV.
     b. Jane’s brother saw her on TV.
     c. *She thought that you saw Jane on TV.
     d. When she was practicing, Jane thought the sonata sounded great.

¹ In the examples that follow, bold typeface is used to indicate intended coreference.

With these basic ideas about syntactic dependencies in mind, we now turn to the question of what infants know about these dependencies and how that knowledge arises.

5.4.1 Morphosyntactic dependencies in infancy

Experimental work with very young children has found that they can track the statistical signature of dependencies like the is-ing relation, but this ability is mediated by their memory resources. For example, Santelmann and Jusczyk (1998) used the head-turn preference procedure to examine the morphosyntactic dependency between is and -ing. They played 18-month-olds sentences with the sequence is Verb-ing, a real English dependency, as well as sentences containing the sequence can Verb-ing, which is not an English dependency. Some children heard sentences like Everybody is baking bread, and other children heard sentences like *Everybody can baking bread. 18-month-olds preferred to listen to sentences with the is Verb-ing sequence over sentences with the can Verb-ing sequence, indicating that they recognized that is and -ing stand in a dependency relation. 15-month-olds, however, showed no such preference. Moreover, the 18-month-olds preferred sentences with is Verb-ing when a 2-syllable adverb came between is and the verb, but not when a longer adverb intervened: they were still able to detect this dependency in sentences like Everybody is often baking bread but not Everybody is effectively baking bread. It appears that these infants’ limited memory resources interfered with their ability to track this morphosyntactic dependency across longer distances. That is, children needed to be able to hold enough linguistic material in memory in order to detect the co-occurrence of is with -ing, and longer intervening adverbs taxed their limited memory resources enough to prevent them from doing so.

Santelmann and Jusczyk’s (1998) results indicate that English-speaking children are aware of the morphosyntactic dependency between is and -ing by the age of 18 months, although their memory resources aren’t always sufficient to detect this dependency in their input.
Höhle, Schmitz, Santelmann, and Weissenborn (2006) extended this finding to German, showing that a similar dependency is also detected by 18-month-old German-learning infants. However, Höhle et al. also found that the German infants could detect the dependency across a longer distance of intervening material. They argued that the difference between English- and German-learning infants was not due to differences in memory, but to whether infants could linguistically analyze the material that intervened between the auxiliary and the verb. In English, what intervened was not part of the VP complement to the auxiliary. However, in German it was, and hence could more easily be integrated into children’s syntactic representations.

What allows children to become aware of this kind of dependency? Results from artificial language learning studies suggest that children can track co-occurrence patterns in their input to learn non-adjacent dependencies, like the one between is and -ing in English (Gómez 2002; Gómez and Maye 2005). Saffran, Aslin, and Newport (1996)

famously showed that infants as young as 8 months old could use statistics to track the probability that certain nonsense syllables would occur next to each other. Gómez and Maye (2005) asked whether children can track the probability that certain strings will occur together across intervening material, and what it takes to learn such dependencies. These authors tested 15-month-olds’ abilities to detect these types of non-adjacent dependencies in an artificial language. These children heard “sentences” like pel-vamey-rud, pel-wadim-rud, and pel-tapsu-rud, in which a dependency between the nonwords pel and rud obtained across a variety of intervening nonwords. After training, these infants were able to recognize this pel-X-rud dependency in new “sentences” that contained it, as long as their training contained enough variety in the nonwords that came between pel and rud. This suggests that children as young as 15 months old are able to detect the statistical signature of non-adjacent dependencies, provided they hear enough variety in the intervening material. Gómez and Maye argue that the greater variety discourages the learner from tracking adjacent dependencies and hence promotes the ability to notice non-adjacent dependencies.

Omaki, Orita, and Lidz (in prep.) combined these findings to ask whether the artificial language paradigm accurately models natural language acquisition. Omaki et al. provided 15-month-olds with experience of high variability in the verb intervening in the is-ing construction and then tested them using Santelmann and Jusczyk’s (1998) method and materials. They found that 15-month-olds were able to learn the dependency given this highly concentrated input. This suggests that the learning procedure children used in the artificial language experiment may be applicable to the acquisition of a natural language.
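The logic of the variability argument can be illustrated with a small calculation. The following is a minimal Python sketch, not the analysis used in any of the studies cited above. It assumes a toy corpus of three-word pel-X-rud strings; the three middle items in the low-variability corpus are the ones quoted in the text, while the 24 placeholder items x0 to x23 (and the choice of 24) are hypothetical and chosen only for illustration.

```python
from collections import Counter

def transitional_probs(sentences):
    """Return (adjacent, nonadjacent) conditional probability tables.

    adjacent[(a, b)]    = P(b is the next word | a)
    nonadjacent[(a, c)] = P(c occurs two positions later | a)
    """
    first, adj = Counter(), Counter()
    skip_first, skip = Counter(), Counter()
    for words in sentences:
        for a, b in zip(words, words[1:]):
            first[a] += 1
            adj[(a, b)] += 1
        for a, c in zip(words, words[2:]):
            skip_first[a] += 1
            skip[(a, c)] += 1
    adjacent = {pair: n / first[pair[0]] for pair, n in adj.items()}
    nonadjacent = {pair: n / skip_first[pair[0]] for pair, n in skip.items()}
    return adjacent, nonadjacent

def make_corpus(middles):
    """Build one pel-X-rud string per middle item (toy assumption)."""
    return [["pel", x, "rud"] for x in middles]

# Low variability: the three middle items mentioned in the text.
low = make_corpus(["vamey", "wadim", "tapsu"])
# High variability: 24 hypothetical placeholder middle items.
high = make_corpus([f"x{i}" for i in range(24)])

adj_low, non_low = transitional_probs(low)
adj_high, non_high = transitional_probs(high)

# Strongest adjacent prediction available from "pel" in each corpus.
best_adj_low = max(p for (a, _), p in adj_low.items() if a == "pel")
best_adj_high = max(p for (a, _), p in adj_high.items() if a == "pel")
```

In this toy setting the non-adjacent probability of rud given pel is 1.0 in both corpora, whereas the best adjacent prediction from pel drops from 1/3 to 1/24 as the set of middle items grows. This is one way to picture why greater variability might make the adjacent statistics less attractive to track, leaving the non-adjacent regularity as the most reliable pattern.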
Because morphosyntactic dependencies like the one between is and -ing in English are defined over hierarchical structures in a sentence rather than over the linear order of words, these relations can hold across certain kinds of intervening material. Children’s ability to detect the statistical signatures of non-adjacent dependencies is therefore crucial for learning these morphosyntactic dependencies in their language. Detecting non-adjacent dependencies requires high variability among the items that intervene between the parts of the dependency, variability that promotes the discovery of the dependency. But these statistical sensitivities interact with extralinguistic cognition: Children need sufficient memory resources and the ability to analyze the intervening material in order to recognize these dependencies over longer distances. Infants may be unable to keep both parts of the dependency in memory if the amount of linguistic material between them grows too large or is not linguistically analyzable. Children’s ability to detect morphosyntactic dependencies in their language thus develops in concert with their maturing memory resources.

It remains open, however, how infants represent these dependencies. Several questions arise here. First, do infants represent these dependencies as between particular morphological forms, or do they recognize that all forms of the auxiliary “be” are equivalent in this relation (Tincoff et al. 2000)? Second, do they represent it as a head-to-head relation between two verbs, as a relation between a head and its complement, or as a movement relation, as in Chomsky’s (1957) affix-hopping analysis? Third, when

infants observe a discontinuous dependency between two morphemes, what is the range of possible relations that they consider for representing it (Fodor 1966)? Do infants distinguish head-to-head relations, head–complement relations, and movement relations on the basis of morphological patterns, or do they require additional syntactic information to identify specific grammatical dependency relations? We leave these questions for future research.

5.4.2 Movement dependencies in infancy

Learning movement dependencies involves both children’s linguistic and extralinguistic capacities. In this section, we first consider what infants know about wh-movement and relativization. We then turn to the question of how infants identify the strings that might contain movement dependencies.

Some studies have found evidence that English-learning children might develop the ability to detect movement dependencies in English sentences between the ages of 15 and 20 months (Gagliardi et al. 2016; Seidl et al. 2003). Seidl, Hollich, and Jusczyk (2003) investigated 13-, 15-, and 20-month-old infants’ understanding of wh-questions using a preferential looking technique. Infants saw an event of e.g. an apple hitting some keys, and then saw still images of the apple and the keys while being asked one of three questions: Where are the keys?, What hit the keys?, and What did the apple hit?. They found that 13-month-olds were unable to respond correctly to any of the questions, that 15-month-olds looked at the correct image for the “where” question and the subject question, but not the object question, and that 20-month-olds looked at the correct image for all three question types.

Gagliardi, Mease, and Lidz (2016) followed up on this research, testing comprehension of wh-questions like Which dog did the cat bump? and relative clauses like Find the dog that/who the cat bumped. These questions were asked after the infants watched a scene in which one dog bumped a cat, and then the cat bumped a second dog, making both the questions and relatives felicitous. Unlike Seidl et al. (2003), these authors did not find a subject–object asymmetry, but they did find an interesting U-shaped learning pattern. 15-month-olds appeared to arrive at the correct interpretation for both wh-questions and relative clauses. In an object question/relative they looked more at the dog that got bumped, rather than the dog that was the agent of bumping.
20-month-olds, on the other hand, only appeared to comprehend wh-questions and relative clauses with who, but not relative clauses with that. These authors argued that 20-month-olds’ surprising failure with certain relative clauses might demonstrate the development of syntactic knowledge: They have learned to represent the full movement dependencies in these sentences, but have difficulty detecting when relative clauses with that contain these dependencies. The word that is ambiguous in English—it occurs in many contexts other than in relative clauses—so words like who or which are much clearer cues to movement dependencies. By this logic, then, 15-month-olds might arrive at the right answer through a heuristic that does not require them to parse the full

movement dependency, thereby avoiding these difficulties with relative clauses. In other words, 15-month-olds’ success with both wh-questions and relative clauses may reflect more about their knowledge of argument structure than about their ability to represent long-distance dependencies. In support of this account, Perkins and Lidz (2020) found that 15-month-olds’ apparent success on this task is modulated by their vocabulary, a correlate of developing verb knowledge.

We noted above that movement dependencies can only hold between structural units in a sentence. Because this structure-dependence is a universal property of human language, it is something that children might take for granted when learning their first language. In other words, it might be an intrinsic constraint imposed by their language learning mechanism (Chomsky 1975). This constraint would provide useful guidance for learning movement dependencies in their language: Once children can identify the hierarchical structure of a sentence, they will know that only units within that structure can move, and therefore will know which strings of words are candidates for movement.

Takahashi and Lidz (2008) and Takahashi (2009) used an artificial language learning paradigm to test children’s knowledge of structure-dependence. Following a method developed by Thompson and Newport (2007), they constructed artificial grammars in which phrasal categories were expressed through the probabilities that certain words and word categories could occur together, with the idea that two adjacent categories from within a phrase would be more likely to co-occur than two adjacent categories from across a phrase boundary. To create these differences in probabilities in a corpus, they included “rules” in the artificial grammar through which some sequences of nonsense word categories could be optional, repeated, or substituted by other categories. These sequences could thus be identified as constituents.
After being exposed to this artificial language for several minutes, adults and 18-month-olds were tested on sentences that contained movement. Adults accepted sentences when one of the optional, repeated, or substituted category sequences was moved: They used the differences in transitional probabilities to group these sequences into units and recognized that those units could move. In a head-turn preference experiment, 18-month-olds likewise distinguished sentences with moved units from those with moved sequences that weren’t units. In other words, these infants knew that only strings of words that formed a unit within a structural hierarchy could take part in movement relations, even though they had never heard movement before in this task. Once they were able to identify the hierarchical structure of these sentences, they were able to identify possible and impossible instances of movement in this artificial language. Their knowledge of structure-dependence allowed these learners to draw conclusions about syntactic relations beyond what they were exposed to in their input.
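The idea that optionality makes within-phrase transitions more predictable than cross-boundary transitions can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the grammar or the materials used in the studies above: the category labels A to F, the three two-category phrases, the use of optionality alone (no repetition or substitution), and the end-of-sentence symbol are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grammar: three phrases, each a fixed pair of categories.
PHRASES = [("A", "B"), ("C", "D"), ("E", "F")]
END = "#"  # assumed end-of-sentence marker, counted as a possible successor

def build_corpus(phrases):
    """Enumerate every sentence obtained by treating each phrase as optional.

    Phrase order is preserved and at least one phrase is kept.
    """
    corpus = []
    for k in range(1, len(phrases) + 1):
        for subset in combinations(phrases, k):
            corpus.append([cat for phrase in subset for cat in phrase])
    return corpus

def forward_tp(corpus):
    """Forward transitional probability P(next = b | current = a)."""
    unigram, bigram = Counter(), Counter()
    for sentence in corpus:
        padded = sentence + [END]
        for a, b in zip(padded, padded[1:]):
            unigram[a] += 1
            bigram[(a, b)] += 1
    return {pair: n / unigram[pair[0]] for pair, n in bigram.items()}

corpus = build_corpus(PHRASES)
tp = forward_tp(corpus)

# Transitions inside a phrase, e.g. A -> B.
within = [tp[phrase] for phrase in PHRASES]
# Transitions across a phrase boundary, e.g. B -> C or B -> E.
across = [tp[(left[1], right[0])] for left, right in combinations(PHRASES, 2)]
```

In this toy corpus of seven sentences, every within-phrase transition has probability 1.0, while no cross-boundary transition exceeds 0.5. A learner grouping categories wherever transitional probability is high would therefore recover the pairs AB, CD, and EF as units, which is the kind of distributional cue to constituency that the paradigm relies on.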

5.4.3 Referential dependencies

Sentence structure is a contributor to many aspects of sentence meaning. For example, the interpretation of pronouns depends on their syntactic context. Pronouns make a

contribution to sentence meaning that is underspecified, requiring the context to fill in some aspects of reference. In the sentence Allison thinks that she will get the job, the pronoun can be interpreted either as referring to Allison or to some other salient female individual in the context. In other cases, the pronoun’s interpretation depends on the syntactic context rather than the discourse context. For example, the pronouns she or her may get their reference from (co-refer with) Belinda in sentences like (12).

(12) a. When she was in the interview, Belinda spilled some water.
     b. Belinda said that my brother interviewed her.

However, the pronouns must all refer to someone other than Belinda in sentences like (13).

(13) a. *She was in the interview when Belinda spilled some water.
     b. *Belinda interviewed her.

Thus, while pronouns can have their reference be determined by other parts of the sentence, the conditions under which such referential dependencies hold are constrained by syntactic hierarchy and syntactic locality.² The role of hierarchy can be seen in the contrast between (12a) and (13a), above. In each of these sentences, the pronoun precedes Belinda in the linear order of words, but in (13a) the pronoun is “higher” in the structural hierarchy. The notion of height in linguistic structures is expressed through a relation called c-command (Reinhart 1981). One expression c-commands another if the smallest unit containing the first also contains the second. In (12a), the pronoun does not c-command Belinda, but in (13a), it does. In addition, one expression binds a second expression if it c-commands the second expression and co-refers with that expression (Chomsky 1981). But we can’t interpret (13a) with the pronoun co-referring with Belinda: It has to refer to someone else. In other words, the pronoun cannot bind Belinda. The relevant constraint on pronoun interpretation, known as Principle C, is thus that a pronoun cannot bind its antecedent (Lasnik 1976), or, stated slightly differently, a referring expression like Belinda cannot be bound (Chomsky 1981).

Principle C has played a very prominent role in arguments concerning the origins of grammatical knowledge (Crain 1991). Because children are exposed only to grammatical sentence–meaning pairs, it is a puzzle how they acquire constraints like Principle C, which block certain sentences from expressing otherwise sensible interpretations. How can one acquire rules about the interpretations that sentences cannot have? Crain and McKee (1985) observed that Principle C constrains children’s interpretations as early as 3 years of age. This observation raises the question of the origin of this constraint. The success of 3-year-olds is often taken as strong evidence for the role of

² In certain discourse contexts, these constraints may be overridden (Bolinger 1979; Evans 1980; Harris and Bates 2002).

c-command in children’s representations, and hence for the role of hierarchical structure in shaping children’s interpretations throughout development. See Kazanina and Phillips (2001) for supporting evidence from Russian. This view may be further bolstered by work demonstrating that 30-month-old infants display knowledge of Principle C. Lukyanenko, Conroy, and Lidz (2014) conducted a preferential looking experiment in which infants saw two videos side by side. In one video, a girl (Katie) was patting herself on the head. In the other video, a second girl patted Katie on the head. Infants were then asked to find the image in which “She is patting Katie,” or the one in which “she is patting herself.” Infants in the former condition looked more at the video in which Katie was getting patted by someone else, whereas those in the latter condition looked more at the video in which Katie was patting herself. To determine whether children’s interpretations were driven by Principle C, as opposed to an alternative non-structural heuristic, Sutton, Fetters, and Lidz (2012) and Sutton (2015) tested children in a preferential looking task like that in Lukyanenko et al. (2014) and also in a task measuring sensitivity to hierarchical structure. Children saw three objects—a big red train, a medium-sized yellow train and a small yellow train. They then were asked to find “the big yellow train.” Correct interpretation requires restricting the adjective big to apply to the phrase yellow train. Sutton et al. measured the speed with which they looked to the correct object and used that to predict the speed with which they arrived at the correct interpretation of the Principle C sentences. They found that these structural processing measures were significantly correlated, though measures of lexical processing speed and vocabulary size were not predictive of Principle C performance. 
Together these findings suggest that the computation of hierarchical structure is a critical component of children’s understanding of sentences, which are subject to Principle C as early as we can measure.
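The c-command relation that underlies Principle C can be made concrete with a short Python sketch. It follows the informal definition given in this section (the smallest unit containing the first expression also contains the second) and adds the standard proviso that the first expression does not itself contain the second. The bracketings below are deliberately simplified and are our own assumptions for illustration, as are the function names; they are not the structures assumed in any of the studies discussed.

```python
def find_path(tree, word, path=()):
    """Return the tuple of child indices leading to the leaf `word`, or None.

    A tree is a tuple (label, child, child, ...); a leaf is a plain string.
    """
    if isinstance(tree, str):
        return path if tree == word else None
    for i, child in enumerate(tree[1:]):
        found = find_path(child, word, path + (i,))
        if found is not None:
            return found
    return None

def c_commands(tree, a, b):
    """True if the smallest unit containing `a` also contains `b`,
    and `a` does not itself contain `b`."""
    path_a, path_b = find_path(tree, a), find_path(tree, b)
    if path_a is None or path_b is None:
        raise ValueError("both expressions must occur in the tree")
    if not path_a:
        return False  # the root is not contained in any larger unit
    parent = path_a[:-1]
    parent_contains_b = path_b[:len(parent)] == parent
    a_contains_b = path_b[:len(path_a)] == path_a
    return parent_contains_b and not a_contains_b

# (12a) When she was in the interview, Belinda spilled some water.
S12A = ("S",
        ("CP", "when", ("S", "she", ("VP", "was", "in", "the", "interview"))),
        "Belinda",
        ("VP", "spilled", "some", "water"))

# (13a) She was in the interview when Belinda spilled some water.
S13A = ("S",
        "she",
        ("VP", "was", "in", "the", "interview",
         ("CP", "when", ("S", "Belinda", ("VP", "spilled", "some", "water")))))

# (11b) Jane's brother saw her on TV (adjunct omitted for brevity).
S11B = ("S", ("NP", "Jane's", "brother"), ("VP", "saw", "her"))

pronoun_commands_12a = c_commands(S12A, "she", "Belinda")
pronoun_commands_13a = c_commands(S13A, "she", "Belinda")
possessor_commands_11b = c_commands(S11B, "Jane's", "her")
```

Under these simplified structures, the pronoun c-commands Belinda in (13a) but not in (12a), which is the contrast that Principle C is sensitive to even though the pronoun precedes Belinda in both sentences. Likewise, the possessor in (11b) does not c-command the pronoun, consistent with coreference being available there.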

5.4.4 Summary

In summary, children have been shown to be sensitive to morphosyntactic, movement, and referential dependencies very early in development. Using behavioral methods based on simple attentional measures, we are able to see the emerging sensitivity to linguistic dependencies in the second year of life as well as children’s use of statistical sensitivities to identify specific dependencies. Moreover, by taking into account constraints from early sentence processing mechanisms, we are able to better diagnose the structural nature of children’s early successes and failures with syntactic dependencies. In all cases, it appears as though children represent syntactic dependencies in hierarchical terms.

5.5 Conclusion

Behavioral research provides a window into how core properties of a grammar are acquired in infancy. In this chapter, we have reviewed evidence for the development of grammatical categories, clause structure, and syntactic dependencies, much of which

precedes infants’ earliest sentence productions. Behavioral methods thus allow developmental linguists to see earlier emergence, and potentially a truer picture, of grammatical competence than is revealed in the sentences children produce. Controlled experimental designs allow researchers to overcome the sampling limitations inherent in studying what children happen to say spontaneously. Furthermore, if designed well, these tasks can allow researchers to control for extralinguistic factors like working memory and executive function which interact with grammatical knowledge in influencing children’s behavior. Infants are of course limited in the behaviors they are able to control in response to a linguistic stimulus, particularly before they begin producing sentences of their own. Therefore, developmental researchers frequently rely on methods that use implicit measures of infants’ linguistic comprehension. The methods we’ve surveyed include measures of attention—from High-Amplitude Sucking for newborns only able to control their sucking rate, to Habituation, Conditioned Head Turn, and Head Turn Preference procedures for older infants able to control their neck muscles and eye gaze. We’ve also seen tasks that measure eye movements at a finer-grained level, such as the Preferential Looking paradigm, which relies on infants’ ability to coordinate their eye saccades in response to an auditory stimulus. And as a less implicit measure, we’ve discussed the Elicited Imitation procedure, which is one of the few production tasks used with children as young as 2 years old. The evidence we’ve reviewed from these tasks reveals a rich and complex picture of infants’ earliest syntactic development. In their acquisition of grammatical categories, infants appear sensitive to the differences between lexical and functional categories from birth (Shi et al. 1999), and to the syntactic and interpretive consequences of many of these categories and subcategories by 18–19 months (e.g. 
Fisher et al. 2010; He and Lidz 2017). In their acquisition of clause structure, infants appear to be aware of the canonical word order of their language by 17 months (Hirsh-Pasek and Golinkoff 1996), and are biased towards hierarchical structural representations (Lidz et al. 2003; Sutton 2015; Sutton et al. 2012). In their acquisition of syntactic dependencies, infants appear able to detect morphosyntactic dependencies and movement dependencies between 15 and 20 months (Gagliardi et al. 2016; Gómez and Maye 2005; Santelmann and Jusczyk 1998; Seidl et al. 2003), and are aware of the structural constraints on movement and referential dependencies between 18 and 30 months (Lukyanenko et al. 2014; Sutton 2015; Sutton et al. 2012; Takahashi 2009; Takahashi and Lidz 2008). Yet this picture is by no means complete, and many open questions remain about the nature and development of infants’ syntactic knowledge. How do we determine whether children are aware of the full syntactic and interpretive consequences of assigning a word to a particular category—e.g. that it is possible to extract out of the clausal complement of a verb but not a noun, or that determiners not only co-occur with nouns but have specific syntactic and semantic properties by virtue of being determiners? What is the nature of the inferences children draw about verb meaning on the basis of clausal arguments, and what can this tell us about how richly children represent those arguments? Are children’s earliest clause structure representations

hierarchically structured and complete at the earliest stages of syntactic development, and if so, how do we explain production phenomena like early root infinitives and null subjects? How can we tell whether children represent syntactic dependencies in an adult-like, structure-dependent manner, and how do we determine whether children are aware of the syntactic consequences of identifying particular dependency types—e.g. that wh-movement is island-sensitive? These questions push beyond the frontier of our knowledge of language acquisition, and answering them will involve increasingly creative, age-appropriate methods for assessing linguistic knowledge in a challenging population. But doing so brings us closer to understanding how such a highly structured cognitive system—a grammar—can be acquired by all humans exposed to similar linguistic experience, and to understanding the nature of the specialized language faculty that we share with even the youngest members of our species.

References

Allen, S. E. 2000. A discourse-pragmatic explanation for argument representation in child Inuktitut. Linguistics 38(3): 483–521.
Arunachalam, S., K. Syrett, and Y. Chen. 2016. Lexical disambiguation in verb learning: Evidence from the conjoined-subject intransitive frame in English and Mandarin Chinese. Frontiers in Psychology 7: 138.
Arunachalam, S., and S. R. Waxman. 2010. Meaning from syntax: Evidence from 2-year-olds. Cognition 114(3): 442–446.
Baker, M. C. 1988. Incorporation: A theory of grammatical function changing. Chicago: University of Chicago Press.
Baker, M. C. 2001. The natures of nonconfigurationality. In R. K. Baltin and C. Collins (eds), The handbook of contemporary syntactic theory, 407–438. Oxford: Blackwell.
Bar-Shalom, E., and W. Snyder. 1997. Root infinitives in child Russian: A comparison with Italian and Polish. In Proceedings of the GALA ’97 Conference on Language Acquisition: Knowledge Representation and Processing.
Bloom, L. 1970. Language development: Form and function in emerging grammars. Cambridge, MA: MIT Press.
Bloom, P. 1990. Subjectless sentences in child language. Linguistic Inquiry 21(4): 491–504.
Bolinger, D. 1979. Pronouns in discourse. In T. Givón (ed.), Syntax and Semantics, Vol. 12: Discourse and Syntax, 287–309. New York: Academic Press.
Bowerman, M. 1973. Early syntactic development: A cross-linguistic study with special reference to Finnish. Cambridge: Cambridge University Press.
Braine, M. D. S. 1963. The ontogeny of English phrase structure: The first phase. Language 39(1): 1–13.
Brandone, A., D. A. Addy, R. Pulverman, R. M. Golinkoff, and K. Hirsh-Pasek. 2006. One-for-one and two-for-two: Anticipating parallel structure between events and language. In Proceedings of the 30th Annual Boston University Conference on Language Development, 36–47.
Brown, R. 1957. Linguistic determinism and the part of speech. Journal of Abnormal and Social Psychology 55(1): 1.

laurel perkins and jeffrey lidz

Brown, R. 1973. A first language: The early stages. Cambridge, MA: Harvard University Press.
Brown, R., and U. Bellugi. 1964. Three processes in the child’s acquisition of syntax. Harvard Educational Review 34(2): 133–151.
Bunger, A., and J. Lidz. 2004. Syntactic bootstrapping and the internal structure of causative events. In Proceedings of the 28th Annual Boston University Conference on Language Development, 74.
Bunger, A., and J. Lidz. 2008. Thematic relations as a cue to verb class: 2-year-olds distinguish unaccusatives from unergatives. University of Pennsylvania Working Papers in Linguistics 14(1): 4.
Cauvet, E., R. Limissuri, S. Millotte, K. Skoruppa, D. Cabrol, and A. Christophe. 2014. Function words constrain on-line recognition of verbs and nouns in French 18-month-olds. Language Learning and Development 10(1): 1–18.
Chomsky, N. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, N. 1975. Reflections on language. New York: Pantheon.
Chomsky, N. 1981. Lectures on government and binding. Dordrecht: Foris.
Christophe, A., S. Millotte, S. Bernal, and J. Lidz. 2008. Bootstrapping lexical and syntactic acquisition. Language and Speech 51(1–2): 61–75.
Clancy, P. 1993. Preferred argument structure in Korean acquisition. In Proceedings of the 25th Annual Child Language Research Forum, 307–314.
Crain, S. 1991. Language acquisition in the absence of experience. Behavioral and Brain Sciences 14(4): 597–612.
Crain, S., and C. McKee. 1985. The acquisition of structural restrictions on anaphora. Proceedings of NELS 15: 94–110.
Csibra, G., G. Gergely, S. Bíró, O. Koos, and M. Brockbank. 1999. Goal attribution without agency cues: The perception of “pure reason” in infancy. Cognition 72(3): 237–267.
Dowty, D. 1991. Thematic proto-roles and argument selection. Language 67(3): 547–619.
Evans, G. 1980. Pronouns. Linguistic Inquiry 11(2): 337–362.
Fernald, A., R. Zangl, A. L. Portillo, and V. A. Marchman. 2008. Looking while listening: Using eye movements to monitor spoken language. In I. A. Sekerina, E. M. Fernandez, and H. Clahsen (eds), Developmental psycholinguistics: On-line methods in children’s language processing, 97–135. Amsterdam: Benjamins.
Fillmore, C. J. 1968. The case for case. In E. Bach and R. Harms (eds), Universals in linguistic theory, 1–88. New York: Holt, Rinehart, & Winston.
Fillmore, C. J. 1970. The grammar of “hitting” and “breaking.” In R. A. Jacobs and P. S. Rosenbaum (eds), Readings in English Transformational Grammar, 120–133. London: Ginn.
Fisher, C. 1996. Structural limits on verb mapping: The role of analogy in children’s interpretations of sentences. Cognitive Psychology 31(1): 41–81.
Fisher, C., Y. Gertner, R. M. Scott, and S. Yuan. 2010. Syntactic bootstrapping. Wiley Interdisciplinary Reviews: Cognitive Science 1(2): 143–149.
Fisher, C., H. Gleitman, and L. R. Gleitman. 1991. On the semantic content of subcategorization frames. Cognitive Psychology 23(3): 331–392.
Fodor, J. A. 1966. How to learn to talk: Some simple ways. In F. Smith and G. Miller (eds), The genesis of language, 105–122. Cambridge, MA: MIT Press.
Gagliardi, A., T. M. Mease, and J. Lidz. 2016. Discontinuous development in the acquisition of filler-gap dependencies: Evidence from 15- and 20-month-olds. Language Acquisition 23(3): 1–27.

Gerken, L. 1991. The metrical basis for children’s subjectless sentences. Journal of Memory and Language 30(4): 431–451.
Gerken, L. 1994. A metrical template account of children’s weak syllable omissions from multisyllabic words. Journal of Child Language 21(3): 565–584.
Gerken, L., and B. J. McIntosh. 1993. Interplay of function morphemes and prosody in early language. Developmental Psychology 29(3): 448.
Gertner, Y., and C. Fisher. 2012. Predicted errors in children’s early sentence comprehension. Cognition 124(1): 85–94.
Gertner, Y., C. Fisher, and J. Eisengart. 2006. Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science 17(8): 684–691.
Gleitman, L. R. 1990. The structural sources of verb meanings. Language Acquisition 1(1): 3–55.
Gleitman, L. R., K. Cassidy, R. Nappa, A. Papafragou, and J. C. Trueswell. 2005. Hard words. Language Learning and Development 1(1): 23–64.
Golinkoff, R. M., K. Hirsh-Pasek, K. M. Cauley, and L. Gordon. 1987. The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language 14(1): 23–45.
Gómez, R. L. 2002. Variability and detection of invariant structure. Psychological Science 13(5): 431–436.
Gómez, R. L., and J. Maye. 2005. The developmental trajectory of nonadjacent dependency learning. Infancy 7(2): 183–206.
Grimshaw, J. 1981. Form, function and the language acquisition device. In C. L. Baker and J. J. McCarthy (eds), The logical problem of language acquisition, 165–182. Cambridge, MA: MIT Press.
Grinstead, J., J. De la Mora, M. Vega-Mendoza, B. Flores, et al. 2009. An elicited production test of the optional infinitive stage in child Spanish. In Proceedings of the 3rd Conference of Generative Approaches to Language Acquisition–North America, 36–45.
Guasti, M. T. 1993. Verb syntax in Italian child grammar: Finite and nonfinite verbs. Language Acquisition 3(1): 1–40.
Guasti, M. T. 2002. Language acquisition: The growth of grammar. Cambridge, MA: MIT Press.
Guilfoyle, E., and M. Noonan. 1988. Functional categories and language acquisition. Paper presented at 13th Annual Boston University Conference on Language Development.
Guilfoyle, E., and M. Noonan. 1992. Functional categories and language acquisition. Canadian Journal of Linguistics/Revue canadienne de linguistique 37(2): 241–272.
Hacquard, V. 2014. Bootstrapping attitudes. Semantics and Linguistic Theory 24: 330–352.
Haegeman, L. 1995. Root infinitives, tense, and truncated structures in Dutch. Language Acquisition 4(3): 205–255.
Hall, D. G., S. R. Waxman, and W. M. Hurwitz. 1993. How two- and four-year-old children interpret adjectives and count nouns. Child Development 64(6): 1651–1664.
Harrigan, K., V. Hacquard, and J. Lidz. 2016. Syntactic bootstrapping in the acquisition of attitude verbs: Think, want and hope. In Proceedings of WCCFL, 33.
Harris, C. L., and E. A. Bates. 2002. Clausal backgrounding and pronominal reference: A functionalist approach to c-command. Language and Cognitive Processes 17(3): 237–269.
Harris, T., and K. Wexler. 1996. The optional-infinitive stage in child English: Evidence from negation. In H. Clahsen (ed.), Language acquisition and language disorders, vol. 14, p. 1. Amsterdam: Benjamins.
He, A. X., and J. Lidz. 2017. Verb learning in 14- and 18-month-old English-learning infants. Language Learning and Development, 1–22. http://dx.doi.org/10.1080/15475441.2017.1285238

Hicks, J., J. Maye, and J. Lidz. 2007. The role of function words in infants’ syntactic categorization of novel words. Paper presented at Linguistic Society of America Annual Meeting, Anaheim, CA.
Hirsh-Pasek, K., and R. M. Golinkoff. 1996. The intermodal preferential looking paradigm: A window onto emerging language comprehension. In D. McDaniel, C. McKee, and H. S. Cairns (eds), Methods for assessing children’s syntax, 105–124. Cambridge, MA: MIT Press.
Hochmann, J.-R., A. D. Endress, and J. Mehler. 2010. Word frequency as a cue for identifying function words in infancy. Cognition 115(3): 444–457.
Höhle, B., M. Schmitz, L. M. Santelmann, and J. Weissenborn. 2006. The recognition of discontinuous verbal dependencies by German 19-month-olds: Evidence for lexical and structural influences on children’s early processing capacities. Language Learning and Development 2(4): 277–300.
Höhle, B., J. Weissenborn, D. Kiefer, A. Schulz, and M. Schmitz. 2004. Functional elements in infants’ speech processing: The role of determiners in the syntactic categorization of lexical elements. Infancy 5(3): 341–353.
Hyams, N. 1986. Language acquisition and the theory of parameters. Dordrecht: Reidel.
Hyams, N. 1992. A reanalysis of null subjects in child language. In J. Weissenborn, H. Goodluck, and T. Roeper (eds), Theoretical issues in language acquisition: Continuity and change in development, 249–267. Brighton: Psychology Press.
Hyams, N., and K. Wexler. 1993. On the grammatical basis of null subjects in child language. Linguistic Inquiry 24(3): 421–459.
Jackendoff, R. 1972. Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.
Kazanina, N., and C. Phillips. 2001. Coreference in child Russian: Distinguishing syntactic and discourse constraints. In Proceedings of the 25th Boston University Conference on Language Development.
Kedar, Y., M. Casasola, and B. Lust. 2006. Getting there faster: 18- and 24-month-old infants’ use of function words to determine reference. Child Development 77(2): 325–338.
Kim, Y.-J. 2000. Subject/object drop in the acquisition of Korean: A cross-linguistic comparison. Journal of East Asian Linguistics 9(4): 325–351.
Kuhl, P. K. 1985. Methods in the study of infant speech perception. In G. Gottlieb and N. A. Krasnegor (eds), Measurement of audition and vision in the first year of postnatal life: A methodological overview, 223–251. Norwood, NJ: Ablex.
Landau, B., and L. R. Gleitman. 1985. Language and experience: Evidence from the blind child. Cambridge, MA: Harvard University Press.
Lasnik, H. 1976. Remarks on coreference. Linguistic Analysis 2: 1–22.
Lasnik, H. 1989. On certain substitutes for negative data. In R. J. Matthews and W. Demopoulos (eds), Learnability and linguistic theory, 89–105. Alphen aan den Rijn: Kluwer.
Legate, J. A., and C. Yang. 2007. Morphosyntactic learning and the development of tense. Language Acquisition 14(3): 315–344.
Leslie, A. M. 1995. A theory of agency. In D. Sperber, D. Premack, and A. J. Premack (eds), Causal cognition: A multidisciplinary debate, 121–141. Oxford: Oxford University Press.
Leslie, A. M., and S. Keeble. 1987. Do six-month-old infants perceive causality? Cognition 25(3): 265–288.
Levin, B., and M. R. Hovav. 2005. Argument realization. Cambridge: Cambridge University Press.
Lidz, J., and L. R. Gleitman. 2004. Yes, we still need universal grammar. Cognition 94(1): 85–93.

Lidz, J., S. Waxman, and J. Freedman. 2003. What infants know about syntax but couldn’t have learned: Experimental evidence for syntactic structure at 18 months. Cognition 89(3): 295–303.
Lukyanenko, C., A. Conroy, and J. Lidz. 2014. Is she patting Katie? Constraints on pronominal reference in 30-month-olds. Language Learning and Development 10(4): 328–344.
Luo, Y., L. Kaufman, and R. Baillargeon. 2009. Young infants’ reasoning about physical events involving inert and self-propelled objects. Cognitive Psychology 58(4): 441–486.
Meylan, S. C., M. C. Frank, B. C. Roy, and R. Levy. 2017. The emergence of an abstract grammatical category in children’s early speech. Psychological Science 28(2): 181–192.
Mintz, T. H., and L. R. Gleitman. 2002. Adjectives really do modify nouns: The incremental and restricted nature of early adjective acquisition. Cognition 84(3): 267–293.
Monaghan, P., N. Chater, and M. H. Christiansen. 2005. The differential role of phonological and distributional cues in grammatical categorisation. Cognition 96(2): 143–182.
Naigles, L. R. 1990. Children use syntax to learn verb meanings. Journal of Child Language 17(2): 357–374.
Naigles, L. R. 1996. The use of multiple frames in verb learning via syntactic bootstrapping. Cognition 58(2): 221–251.
Noble, C. H., C. F. Rowland, and J. M. Pine. 2011. Comprehension of argument structure and semantic roles: Evidence from English-learning children and the forced-choice pointing paradigm. Cognitive Science 35(5): 963–982.
Nuñez del Prado, Z., C. Foley, and B. Lust. 1993. The significance of CP to the pro-drop parameter: An experimental study comparing Spanish and English. In Proceedings of the 25th Annual Child Language Research Forum, 146.
Omaki, A., N. Orita, and J. Lidz. In prep. Beyond artificial language: Statistical learning of an English non-adjacent syntactic dependency.
Orfitelli, R., and N. Hyams. 2012. Children’s grammar of null subjects: Evidence from comprehension. Linguistic Inquiry 43(4): 563–590.
Perkins, L. 2019. How grammars grow: Argument structure and the acquisition of non-basic syntax. Doctoral dissertation, University of Maryland.
Perkins, L., N. H. Feldman, and J. Lidz. 2017. Learning an input filter for argument structure acquisition. In Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics.
Perkins, L., and J. Lidz. 2020. Filler-gap dependency comprehension at 15 months: The role of vocabulary. Language Acquisition 27(1): 98–115.
Phillips, C. 1995. Syntax at age two: Cross-linguistic differences. MIT Working Papers in Linguistics 26: 325–382.
Pine, J. M., and E. V. Lieven. 1997. Slot and frame patterns and the development of the determiner category. Applied Psycholinguistics 18(2): 123–138.
Pine, J. M., and H. Martindale. 1996. Syntactic categories in the speech of young children: The case of the determiner. Journal of Child Language 23(2): 369–395.
Pinker, S. 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S. 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Platzack, C. 1990. A grammar without functional categories: A syntactic study of early Swedish child language. Nordic Journal of Linguistics 13(2): 107–126.

Poeppel, D., and K. Wexler. 1993. The full competence hypothesis of clause structure in early German. Language 69(1): 1–33.
Pozzan, L., L. R. Gleitman, and J. C. Trueswell. 2015. Semantic ambiguity and syntactic bootstrapping: The case of conjoined-subject intransitive sentences. Language Learning and Development 12(1): 14–41.
Pratt, A., and J. Grinstead. 2007a. Receptive measures of the optional infinitive stage in child Spanish. Paper presented at Hispanic Linguistics Symposium, San Antonio, TX.
Pratt, A., and J. Grinstead. 2007b. Optional infinitives in child Spanish. In Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America, 351–362.
Radford, A. 1990. Syntactic theory and the acquisition of English syntax: The nature of early child grammars of English. Oxford: Blackwell.
Reinhart, T. 1981. Definite NP anaphora and c-command domains. Linguistic Inquiry 12(4): 605–635.
Rice, M. L., K. Wexler, and S. Hershberger. 1998. Tense over time: The longitudinal course of tense acquisition in children with specific language impairment. Journal of Speech, Language, and Hearing Research 41(6): 1412–1431.
Rice, M. L., K. Wexler, and S. M. Redmond. 1999. Grammaticality judgments of an extended optional infinitive grammar: Evidence from English-speaking children with specific language impairment. Journal of Speech, Language, and Hearing Research 42(4): 943–961.
Rizzi, L. 1993. Some notes on linguistic theory and language development: The case of root infinitives. Language Acquisition 3(4): 371–393.
Rizzi, L. 1994. Early null subjects and root null subjects. In T. Hoekstra and B. D. Schwartz (eds), Language acquisition studies in generative grammar, 151–176. Amsterdam: Benjamins.
Rizzi, L. 2005. Phase theory and the privilege of the root. In H. Broekhuis, N. Corver, R. Huybregts, U. Kleinherz, and J. Koster (eds), Organizing grammar: Studies in honor of Henk van Riemsdijk, 529–537. Berlin: Mouton de Gruyter.
Ross, J. R. 1967. Constraints on variables in syntax. Doctoral dissertation, Massachusetts Institute of Technology.
Saffran, J. R., R. N. Aslin, and E. L. Newport. 1996. Statistical learning by 8-month-old infants. Science 274(5294): 1926–1928.
Santelmann, L. M., and P. W. Jusczyk. 1998. Sensitivity to discontinuous dependencies in language learners: Evidence for limitations in processing space. Cognition 69(2): 105–134.
Schaeffer, J., and D. Ben Shalom. 2004. On root infinitives in child Hebrew. Language Acquisition 12(1): 83–96.
Scott, R. M., and C. Fisher. 2009. Two-year-olds use distributional cues to interpret transitivity-alternating verbs. Language and Cognitive Processes 24(6): 777–803.
Seidl, A., G. Hollich, and P. W. Jusczyk. 2003. Early understanding of subject and object wh-questions. Infancy 4(3): 423–436.
Serratrice, L. 2005. The role of discourse pragmatics in the acquisition of subjects in Italian. Applied Psycholinguistics 26(3): 437–462.
Shi, R., and M. Lepage. 2008. The effect of functional morphemes on word segmentation in preverbal infants. Developmental Science 11(3): 407–413.
Shi, R., and A. Melançon. 2010. Syntactic categorization in French-learning infants. Infancy 15(5): 517–533.
Shi, R., J. L. Morgan, and P. Allopenna. 1998. Phonological and acoustic bases for earliest grammatical category assignment: A cross-linguistic perspective. Journal of Child Language 25(1): 169–201.

Shi, R., J. F. Werker, and J. L. Morgan. 1999. Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition 72(2): B11–B21.
Shipley, E. F., C. S. Smith, and L. R. Gleitman. 1969. A study in the acquisition of language: Free responses to commands. Language 45: 322–342.
Smith, L. B., S. S. Jones, and B. Landau. 1992. Count nouns, adjectives, and perceptual properties in children’s novel word interpretations. Developmental Psychology 28(2): 273.
Snyder, W. 2001. On the nature of syntactic variation: Evidence from complex predicates and complex word-formation. Language 77(2): 324–342.
Spelke, E. 1976. Infants’ intermodal perception of events. Cognitive Psychology 8(4): 553–560.
Stromswold, K. J. 1990. Learnability and the acquisition of auxiliaries. PhD thesis, Massachusetts Institute of Technology.
Sutton, M. 2015. Competence and performance in the development of Principle C. Doctoral dissertation, University of Maryland.
Sutton, M., M. Fetters, and J. Lidz. 2012. Parsing for Principle C at 30 months. In Proceedings of the 36th Boston University Conference on Language Development, 581–593.
Takahashi, E. 2009. Beyond statistical learning in the acquisition of phrase structure. Dissertation, University of Maryland.
Takahashi, E., and J. Lidz. 2008. Beyond statistical learning in syntax. In Proceedings of Generative Approaches to Language Acquisition.
Taylor, M., and S. A. Gelman. 1988. Adjectives and nouns: Children’s strategies for learning new words. Child Development 59: 411–419.
Thompson, S. P., and E. L. Newport. 2007. Statistical learning of syntax: The role of transitional probability. Language Learning and Development 3(1): 1–42.
Tincoff, R., L. M. Santelmann, and P. W. Jusczyk. 2000. Auxiliary verb learning and 18-month-olds’ acquisition of morphological relationships. In Proceedings of the 24th Annual Boston University Conference on Language Development, vol. 2, 726–737.
Tomasello, M. 2000. Do young children have adult syntactic competence? Cognition 74(3): 209–253.
Tomasello, M. 2003. Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Valian, V. 1986. Syntactic categories in the speech of young children. Developmental Psychology 22(4): 562.
Valian, V. 1991. Syntactic subjects in the early speech of American and Italian children. Cognition 40(1–2): 21–81.
Valian, V. 2016. Null subjects. In J. Lidz, W. Snyder, and J. Pater (eds), Oxford handbook of developmental linguistics, 386–413. Oxford: Oxford University Press.
Valian, V., and S. Aubry. 2005. When opportunity knocks twice: Two-year-olds’ repetition of sentence subjects. Journal of Child Language 32(3): 617–641.
Valian, V., and Z. Eisenberg. 1996. The development of syntactic subjects in Portuguese-speaking children. Journal of Child Language 23(1): 103–128.
Valian, V., J. Hoeffner, and S. Aubry. 1996. Young children’s imitation of sentence subjects: Evidence of processing limitations. Developmental Psychology 32(1): 153.
Valian, V., S. Solt, and J. Stewart. 2009. Abstract categories or limited-scope formulae? The case of children’s determiners. Journal of Child Language 36(4): 743–778.
Wang, Q., D. Lillo-Martin, C. T. Best, and A. Levitt. 1992. Null subject versus null object: Some evidence from the acquisition of Chinese and English. Language Acquisition 2(3): 221–254.

Waxman, S. R. 1999. Specifying the scope of 13-month-olds’ expectations for novel words. Cognition 70(3): B35–B50.
Waxman, S. R., and A. E. Booth. 2001. Seeing pink elephants: Fourteen-month-olds’ interpretations of novel nouns and adjectives. Cognitive Psychology 43(3): 217–242.
Waxman, S. R., and D. B. Markow. 1998. Object properties and object kind: Twenty-one-month-old infants’ extension of novel adjectives. Child Development 69(5): 1313–1329.
Wellwood, A., A. X. He, J. Lidz, and A. Williams. 2015. Participant structure in event perception: Towards the acquisition of implicitly 3-place predicates. University of Pennsylvania Working Papers in Linguistics 21(1): 32.
Weverink, M. 1989. The subject in relation to inflection in child language. Dissertation, University of Utrecht.
Wexler, K. 1994. Optional infinitives, head movement and the economy of derivations. In D. Lightfoot and N. Hornstein (eds), Verb movement, 305–350. Cambridge: Cambridge University Press.
Wexler, K. 1998. Very early parameter setting and the unique checking constraint: A new explanation of the optional infinitive stage. Lingua 106(1): 23–79.
Wexler, K. 2014. A new theory of null-subjects of finite verbs in young children. In M. Becker, J. Grinstead, and J. Rothman (eds), Generative linguistics and acquisition: Studies in honor of Nina M. Hyams, 325–356. Amsterdam: Benjamins.
White, A. S., V. Hacquard, and J. Lidz. 2018. Main clause syntax and the labeling problem in syntactic bootstrapping. In Semantics in acquisition, 198–220. Amsterdam: Benjamins.
Williams, A. 2015. Arguments in syntax and semantics. Cambridge: Cambridge University Press.
Woodward, A. L. 1998. Infants selectively encode the goal object of an actor’s reach. Cognition 69(1): 1–34.
Yang, C. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.
Yang, C. 2013. Ontogeny and phylogeny of language. Proceedings of the National Academy of Sciences 110(16): 6324–6327.
Yuan, S., and C. Fisher. 2009. “Really? She blicked the baby?” Two-year-olds learn combinatorial facts about verbs by listening. Psychological Science 20(5): 619–626.
Yuan, S., C. Fisher, and J. Snedeker. 2012. Counting the nouns: Simple structural cues to verb meaning. Child Development 83(4): 1382–1399.

Chapter 6

Behavioral acquisition methods with preschool-age children

Kristen Syrett

6.1 Introduction

This chapter picks up where the previous chapter, devoted to the acquisition of syntax by infants, left off. In the first two to three years of life, infants demonstrate an impressive capacity to acquire the building blocks of syntax. They distinguish phonological forms that are frequently instantiated by words associated with functional categories from those associated with lexical categories, and appear to recognize certain clusters of functional words and their role as anchors in speech segmentation and the identification of phrases. They recruit the linguistic context in which a novel verb appears in the service of assigning a likely meaning to that verb based on event representation. They link word order and syntactic structure with semantic roles. They target aspects of the hierarchical syntactic structure for anaphoric dependence and coreference. They track non-adjacent morphosyntactic dependences. They appeal to syntactic structure to learn filler–gap dependencies. All of these abilities are that much more impressive when one considers the modest productive repertoire of these little humans. As infants become more skilled at comprehending, processing, and producing language, the range of syntactic constructions that can be investigated becomes considerably more complex, and the range of methodologies that experimenters are able to employ to assess young children’s developing syntactic knowledge broadens. The syntactic knowledge of preschoolers (children age

3–6) and the methodologies that researchers use to tap into this knowledge are the focus of this chapter. As is the case with infants, experiments with preschoolers emphasizing production can return highly misleading results about children’s underlying competence. As a result, many behavioral methods employed with this age group focus on investigating children’s ability to access one or more interpretations associated with a target sentence via their assessment of the acceptability or appropriateness of a sentence in light of a particular context, or their selection or re-creation of a context that aligns with a sentence. However, some methods still creatively capitalize upon children’s productions to gather informative data about children’s capacity to produce certain constructions, and their reluctance or inability to produce others. In this chapter, I provide a broad summary of a number of aspects of young children’s developing syntactic knowledge in the preschool years, highlighting methodological approaches used to uncover this knowledge. In a number of cases, this syntactic knowledge lies more at the interface of syntactic and semantic knowledge, precisely because of the tight way in which these grammatical systems are interwoven and rely upon each other: lexical entries do not only carry implications for individual word meaning, but also impact the ways in which those words compose or interact with other lexical expressions and the requirements for the syntactic structure in which they appear. The syntactic knowledge covered in this chapter ranges from knowledge of individual words and the interpretations they license, to covert and overt movement operations. The topics covered in this chapter thus fall into three main areas. First, I begin by reviewing what children know about the meaning of individual words insofar as this knowledge is related to the syntactic constraints on their distribution and their interaction with other lexical items. 
I focus on a targeted set of lexical items, since their meaning and distribution are tightly linked to elements of the syntactic structure in which they appear: these include pronouns and reflexives and the binding constraints that are implicated by them, universal quantifiers and their scope-taking ability, propositional attitude verbs, raising and control verbs and structures, and ‘tough’ and control adjectives.

Second, I turn to children’s interpretation of certain syntactic constructions and the factors that make it harder or easier for them to be interpreted, along with what the erroneous interpretations that children assign say about the grammar. In this section, I cover relative clauses and passive constructions. These two constructions are in fact linked to the topics covered in the next section; however, I have highlighted them here because of the particular attention that has been devoted to them as constructions per se, given variations within them and contrasts with similar declarative constructions.

Third and finally, I turn to what children know about constraints on overt and covert movement. For overt movement, I focus on yes/no polar questions that involve subject–auxiliary verb inversion and wh-movement. For covert movement, I focus on two grammatical mechanisms: the first, called quantifier raising, which is not, in fact, restricted to quantifiers, and the second, reconstruction. In each case, the questions are

whether or not children have the relevant movement operation in their grammar, and if so, whether they deploy it in an adult-like manner and correctly interpret structures in which it is implicated.

Taken together, the experimental findings illustrate that even at a very early age, children demonstrate sophisticated syntactic competence that is not accurately depicted in their performance, and in many ways is on a par with that of adults in that they perform grammatical operations that target hierarchical structure and abstract representations. At the same time, their interpretations and productions often diverge from those of adults, but are far from haphazard, thereby seeming to reflect an underlying grammatical system that demonstrates characteristics shared across languages. A common theme that arises throughout the studies covered here is the question of whether the behavioral responses observed when children veer from the adult state are a direct reflection of children’s still-developing syntactic knowledge or are the consequence of extragrammatical factors, such as an experimental task artifact, an immature sentence processor, or a growing ability to appeal to features of the discourse context. An important goal for researchers investigating each of these topics (and others) is to find the right kind of methodology, or complementary methodologies, that will help to pin down the answer to this question and paint a clearer picture of young children’s syntactic knowledge.

6.2 Individual words: Syntactic constraints and interaction

6.2.1 Pronouns and reflexives

Pronouns are pervasive in natural language. They can easily pick out the speaker(s) of the utterance (I, we), the addressee(s) (you) or another salient individual perhaps associated with a particular grammatical gender (he, she) or an individual of non-binary or unknown gender (they), or group of individuals (they). Their short phonological form makes them easy to produce, and they can make reference to individuals without further precisifying their identity, so they make for communicative ease. However, the same ability to pick out a variety of individuals can be a sword that cuts both ways, since they cannot be interpreted without contextual support: They require a salient antecedent in the linguistic or extralinguistic discourse context. This topic can be explored further from a pragmatic perspective in terms of the presupposition of existence and accommodation, or from a semantic perspective in terms of indexicality. From a syntactic perspective, there are constraints on which other expressions within a sentence can serve as a potential antecedent. The chapter on the acquisition of syntax by infants touched upon children’s initial ability to appeal to c-command (Reinhart 1981) and Principle C (Chomsky 1981/1993; Reinhart 1983) in order to identify potential antecedents or rule them out. Principle C


kristen syrett

states that a name (also known as an R-expression) must be free. That is, it cannot be c-commanded by a pronoun with which it is to be co-construed. A c-command relation is typically defined in terms of dominance. Given two expressions α and β, α c-commands β if every node that dominates α (or the lowest node that dominates α) also dominates β, and neither α nor β dominates the other. This constraint rules out the possibility of the subject pronoun and name in (1) referring to the same person, since the pronoun in subject position c-commands the R-expression in object position. The pronoun could, however, be linked to a sentence-external antecedent.

(1) She_*i/k is proud of Mary_i.

Researchers initially thought that children prohibit coreference between a pronoun and any subsequent R-expression in a sentence, since in Act-Out Tasks designed to probe their interpretation of sentences such as those in (2), children routinely manipulated props so that the pronoun was associated with an external referent (e.g. another animal) (Solan 1983; Tavakolian 1978). However, it has been argued that these results could have arisen from the pressure to resolve reference as soon as possible and/or from a preference to choose an antecedent not named in the sentence. Moreover, an act-out task invites participants to select one way of realizing an interpretation, even if they might allow another one, and thus ultimately reflects preference. Thus, one cannot conclude from these results what is not allowed. In addition, these constructions allow for an anaphoric relation in which the pronoun could, in principle, be associated with the experiencer that follows later, as in (2), and thus do not reveal anything about Principle C or coreference in cases in which it is not allowed. (See Guasti and Chierchia 1999/2000: 140–142.) (2)

a. For him to kiss the lion would make the duck happy. b. That he kissed the lion made the duck happy.
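The dominance-based definition of c-command given above can be made concrete with a small sketch. The tree for (1), the node labels, and the helper names (`Node`, `dominates`, `c_commands`) are illustrative assumptions rather than anything proposed in the studies cited, and the sketch adopts the "lowest dominating node" variant of the definition, taking a node's mother as that lowest node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by label
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        # Link each daughter back to its mother so we can walk upward.
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b, i.e. a is an ancestor of b."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b if neither dominates the other and the lowest
    node dominating a (here: a's mother) also dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    return a.parent is not None and dominates(a.parent, b)


# Hypothetical, simplified structure for (1): [S She [VP is [AP proud [PP of Mary]]]]
she = Node("NP:She")
mary = Node("NP:Mary")
sentence = Node("S", [
    she,
    Node("VP", [
        Node("V:is"),
        Node("AP", [Node("A:proud"), Node("PP", [Node("P:of"), mary])]),
    ]),
])

print(c_commands(she, mary))  # True: the subject pronoun c-commands the name
print(c_commands(mary, she))  # False: the object does not c-command the subject
```

Because the pronoun c-commands the name but not vice versa, Principle C rules out coreference in (1) while leaving a sentence-external antecedent available.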

Lust et al. (1980) reported that children who were presented with the sentences in (3) in an act-out task were more likely to allow for coreference in the case of backwards anaphora as in (a) than in (b), although they still allowed anaphoric relations in sentences like those in (b) a sizeable percentage of the time. (3)

a. While he was playing with the lion, John was singing. b. He was playing with the lion while John was singing.

However, these findings still came from an act-out task. A better test of whether or not children know about constraints on coreference comes from tasks in which performance pressures are alleviated, coreference is contextually favored, and children are asked to assess whether or not a sentence containing a Principle C violation could in fact express this relation. Such evidence comes to us from the Truth Value Judgment Task (TVJT) (Crain and Thornton 1998; Gordon 1996). In this task, one experimenter tells a story with props (small toys) or images on a screen. Another experimenter plays the role of a hand puppet, who watches the stories alongside the child. The premise is typically that the puppet

behavioral acquisition methods with preschool-age children


is learning, and may make mistakes, and needs the child’s help to learn. At the end of the story, the puppet delivers a statement, and the child is asked to judge its truth value. However, children are not directly asked to do so. Instead, they are asked to choose an appropriate reward for the puppet: something desirable if the puppet is correct, and something not (or less) desirable if the puppet is wrong. In this way, the child sees the task as a game. But there is also a methodological advantage in that the experimenter can carefully control the form of a sentence and its associated interpretation, as well as the context of presentation, thereby determining the truth-conditions assigned to a sentence and how the syntax constrains meaning (or not). In a TVJT designed to probe whether children allow for the pronoun and an R-expression it c-commands to be co-construed, Crain and Thornton (1998) presented sentences such as those in (4) in a game scenario, and found that preschoolers largely allowed for coreference in (a), but not in (b). As a point of comparison, in so-called backwards anaphora constructions, as in (5), children routinely permitted the pronoun to be associated either with a sentence-internal antecedent (an anaphoric relation) or with a salient referent in the discourse (an exophoric relation). (4)

a. The Troll_i said that he_i/k is the best jumper.
b. He_*i/k said that the Troll_i is the best jumper.

(5) While he_i/k was eating a cookie, the Troll_i played tennis.

In the experiment, each target trial designed to probe (4) had a similar structure. The story for the control and test sentences in (a) and (b), respectively, proceeded as follows. The participants are first introduced to a set of characters (Cookie Monster, a Troll, and Grover), who are about to participate in a jumping contest, judged by Robocop. Each competitor is supposed to take a turn jumping over some objects (a log, some barrels, a bench). Cookie Monster goes first and has only moderate success but a positive attitude. The Troll goes next and, full of bravado, jumps over all of the obstacles. Not to be bested, Grover then follows, and matches the Troll’s success. When it is time for Robocop to judge, he judiciously takes his time, deciding between the last two jumpers who will be awarded the coveted prize (colored pasta). Robocop ultimately awards the prize to Grover, declaring him to be the best jumper. Grover graciously accepts his prize. However, the Troll protests, seizing some of the pasta prize for himself, declaring that Robocop is wrong and that he is the best jumper. Thus, it is true that the Troll thinks that he himself (the Troll) is the best jumper (not Grover, the other salient male jumper), but not true that Robocop (the other salient male character assessing the jumping) thinks that the Troll is the best jumper. This distinction is key for rendering a judgment of the sentences in (4), in which the binding constraints are implicated. In subsequent examinations of children’s knowledge of Principle C, Leddon and Lidz (2005) also found that children disallowed coreference between the pronoun and R-expression in constructions such as those in (6). (6)

a. He_*i/k was very proud of Andy_i.
b. She_*i/k put up the red painting of Miss Cruella_i.


Finally, Italian-speaking children who were presented with sentences such as those in (7) in a study conducted by Guasti and Chierchia (1999/2000) allowed coreference in ambiguous sentences such as the one in (a), but rejected coreference for sentences such as the one in (b). (7)

a. Mentre ballava, un pagliaccio suonava la chitarra.
   While (he) was.dancing, a clown was.playing the guitar
   ‘While he was dancing, a clown was playing the guitar.’
b. Andava sul cavallo a dondolo, mentre un musicista suonava la tromba.
   (He) was.riding on.the rocking horse, while a musician was.playing the trumpet
   ‘(He) was riding a rocking horse, while a musician was playing the trumpet.’

In a separate Sentence Repetition task using similar sentences with an anaphoric relation, these same researchers asked children to repeat a frog’s utterance describing a scene to a bear, who was unable to see the events. They found that Italian-speaking children were more likely to revise sentences that were ungrammatical under coreference than sentences that were grammatical (although they revised both at relatively high rates). Thus, it appears that children largely demonstrate mastery of Principle C relatively early in development. However, the knowledge they exhibit is variable and task-dependent, and it is apparent that evidence gathered from tasks asking children to demonstrate an interpretation should be complemented by evidence from tasks in which they assess the availability of a favored interpretation that is either consistent with or violates Principle C. (On this point, see Gor and Syrett 2015.) Principle C is not the only binding constraint. Two other constraints place limitations on the availability of co-construal between a name and an anaphoric expression. Principle A states that an expression like herself must be locally bound, that is, it must be c-commanded by an antecedent in the same clause. Thus, in (8), the reflexive herself can be coindexed with Maryellen in (a), but can only be coindexed with Rebecca in (b). By contrast, Principle B says that a pronoun must be locally free; it cannot be c-commanded by an antecedent in the same clause. Thus, in (9), the pronoun cannot be coindexed with Maryellen in (a), but in (b), it can be coindexed with Maryellen, not Rebecca. (8)

a. Maryellen_i is proud of herself_i.
b. Maryellen_i said that Rebecca_k is proud of herself_*i/k.

(9)

a. Maryellen_i is proud of her_*i/k.
b. Maryellen_i said that Rebecca_k is proud of her_i/*k.
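The complementary behavior of reflexives and pronouns in (8b) and (9b) can be checked mechanically. The following is a minimal sketch, assuming a toy tree in which every clause is labeled "S", locality means sharing the nearest "S", and the lowest node dominating a phrase is its mother. The class and function names are hypothetical and greatly simplify the binding theory.

```python
class Node:
    def __init__(self, label, children=(), kind=None):
        self.label, self.kind, self.parent = label, kind, None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Yield the nodes dominating `node`, from its mother up to the root."""
    while node.parent is not None:
        node = node.parent
        yield node


def c_commands(a, b):
    if a is b or a in ancestors(b) or b in ancestors(a):
        return False
    return a.parent is not None and a.parent in ancestors(b)


def clause_of(node):
    """Nearest dominating clause (simplifying assumption: any node labeled 'S')."""
    return next(n for n in ancestors(node) if n.label == "S")


def coconstrual_allowed(binder, target):
    """May `target` be co-construed with `binder` under Principles A, B, and C?"""
    bound = c_commands(binder, target)
    local = clause_of(binder) is clause_of(target)
    if target.kind == "reflexive":   # Principle A: must be locally bound
        return bound and local
    if target.kind == "pronoun":     # Principle B: must be locally free
        return not (bound and local)
    if target.kind == "name":        # Principle C: must be free
        return not bound
    raise ValueError(f"unknown kind: {target.kind!r}")


def embedded(obj):
    """Build 'Maryellen said that Rebecca is proud of OBJ' as in (8b)/(9b)."""
    maryellen = Node("Maryellen", kind="name")
    rebecca = Node("Rebecca", kind="name")
    inner = Node("S", [rebecca, Node("VP", [Node("is proud of"), obj])])
    Node("S", [maryellen, Node("VP", [Node("said that"), inner])])
    return maryellen, rebecca


herself = Node("herself", kind="reflexive")
m8, r8 = embedded(herself)
her = Node("her", kind="pronoun")
m9, r9 = embedded(her)

print(coconstrual_allowed(m8, herself), coconstrual_allowed(r8, herself))  # False True
print(coconstrual_allowed(m9, her), coconstrual_allowed(r9, her))          # True False
```

The two printed lines mirror the indexing in (8b) and (9b): the reflexive takes only the local subject as its antecedent, while the pronoun excludes exactly that antecedent.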

Chien and Wexler (1990) set out to test whether young children aged 2–5 years had these binding constraints as part of their grammar, and therefore whether they would demonstrate knowledge of them in an experimental task. In one version of an Act-Out Task, they asked children to perform actions with props in response to target sentences


as in (10). In another version, they asked children to follow the directions of a puppet and perform actions themselves. (10)

a. {Kitty/Snoopy} says that {Sarah/Adam} should give {her/him} a popsicle. b. {Kitty/Snoopy} says that {Sarah/Adam} should give {herself/himself} a crayon.

The younger children allowed for an ungrammatical non-local antecedent with Principle A (reflexives), and many of the older children still did not demonstrate consistent knowledge of Principle B. When the experimenters controlled for the gender of the pronoun, children’s performance improved, indicating that this morphosyntactic information helped to resolve reference. Chien and Wexler suggested that what children lack is a pragmatic principle that disallows coreference, raising the question of the source of the non-adult behavior. McDaniel, Smith Cairns, and Hsu (1990a) also found variability in performance among the binding principles. They asked preschoolers to act out sentences such as those in (11), and found that if children demonstrated mastery of the binding principles by performing successfully with any or all of the three types, they were not successful with Principle B prior to Principles A or C. (11)

a. Grover_i is washing himself_i/*k. (Principle A)
b. Grover_i is washing him_*i/k. (Principle B)
c. He_*i/k is washing Grover_i. (Principle C)

In a version of their task using Picture Comprehension, Chien and Wexler showed children a drawing, introduced them to the characters in it, and asked them a question about the relationship between the characters in the drawing, as in (12). (12)

a. Is Mama Bear touching {her/herself/Goldilocks}? b. {Is every bear/Are all of the bears} touching {her/herself/Goldilocks}?

The experimenters compared names and universally quantified subjects (as in (b)), because quantificational phrases, unlike names, cannot be coindexed with another expression, and instead “bind” the other expression via c-command. Here, the youngest children still allowed a reflexive to have a non-local antecedent (in violation of Principle A), and many children also allowed for a pronoun to have a local antecedent (in violation of Principle B); but performance gradually improved with age, and was much better when gender features of the lexical items matched. The presence of a quantifier and a binding relation also improved performance. Evidence that children do recognize that a universal quantifier in subject position binds a reflexive in object position, consistent with Principle A, comes from the results of a TVJT conducted by Leddon and Lidz (2005) (see also Leddon 2007), where participants were presented with sentences such as those in (13). (13)

a. Every hippo_i was very proud of herself_i/*k.
b. Every dancer_i put up the white painting of herself_i/*k.


In the story for the (a) sentence, a cow and three hippos are engaged in a rock-pushing contest. The hippos baulk and say that the rocks all look heavy, and say that they would be very proud of anyone who pushes their rock past the line. Miss Cow swaggers up to her rock and, though she initially struggles, she deftly pushes it across the line. The hippos express their collective pride in Miss Cow. When it is time for the hippos to push their rocks, they try, but only manage to push their rocks a little bit, nowhere near the line. They concede to only being a little proud of themselves. The puppet then delivers the target sentence in (a). If participants interpret ‘herself’ as ‘Miss Cow’, they should accept the sentence, but if they take ‘herself’ to be bound by the quantified phrase, they should reject the sentence. In the story for the (b) sentence, Miss Cruella announces to the dancers in her school that they have been elected the best dancers, and as a result, they get to put a painting of themselves up on the wall. Each girl must choose between a red and a white painting. But they each must also pick a picture of their teacher to put up, and here, too, they are forced to choose between a red and a white painting. Each girl ultimately puts up a white painting of herself and a red painting of Miss Cruella. Thus, when presented with the sentence in (b), participants should reject it if ‘herself’ is taken to mean ‘Miss Cruella’ and accept it if it is taken to mean ‘the dancer’. Like adults, children routinely accessed bound interpretations of the reflexives, although not at the same “ceiling” level as adults. The presence of a quantificational subject appears to be a key feature of evaluating children’s knowledge of the binding principles.
In a variation of the sentences testing Principle B, children largely interpreted the pronoun freely and not bound by the universal quantifier, although surprisingly, even adults frequently allowed the pronoun to be bound (Leddon and Lidz 2005). (14)

a. Every hippo_i was a little proud of her_*i/k.
b. Every dancer_i put up the white painting of her_*i/k.

However, children’s performance with Principle B is highly variable, and depends not only on the type of subject but also on design features of the task. In a TVJT, Thornton and Wexler (1999) found that in response to target sentences such as those in (15), following a story in which the bound reading is true, children accept the first sentence a significant percentage of the time, but routinely reject the second. (15)

a. Bert_i brushed him_*i/k.
b. Every reindeer_i brushed him_*i/k.

The fact that children routinely allow for a pronoun to be associated with an R-expression that c-commands it has been called the delay of Principle B effect. The fact that their performance improves and that they appear to demonstrate knowledge of the Principle when a quantificational phrase binds the pronoun has been called Quantificational Asymmetry (Elbourne 2005). Following up on critiques by Elbourne of previous experiments that seem to have led to a Quantificational Asymmetry, Conroy et al. (2009) incorporated changes into the experimental design, and demonstrated that this Quantificational Asymmetry is a task artifact.


Conroy et al. took care to match the design of the stories in the referential and quantificational conditions, ensuring that the potential antecedents were comparably available in both conditions, and promoted the accessibility of certain interpretations, making them central to the story plot in an equivalent way. As a result, with similar sentences to those investigated before, children and adults routinely rejected the bound (anaphoric) interpretation of the pronoun when there was a Principle B violation in both conditions. In a follow-up experiment using target sentences in which the pronoun appeared in possessive position and therefore allowed for the bound interpretation, as in (16), participants instead accepted the sentences, providing further support that their rejection of it in the previous experiment could be attributed to Principle B, and not to inaccessibility of the intrasentential antecedent and anaphoric reading in general. (16)

a. Grumpy painted his costume. b. Every dwarf painted his costume.

Finally, in further manipulations of the story and a pronoun in object position, they once again elicited an asymmetry between a referential and a quantificational subject in terms of acceptance rates of the bound reading, further demonstrating that Principle B is a part of the child’s grammar, and that task artifacts may mask this knowledge. (See also extensive discussion in Grimshaw and Rosen 1990 concerning task artifacts masking knowledge of the binding principles.)

6.2.2 Quantifiers and scope

Words like every and all are universal quantifiers that pick out the entirety of a set of entities. A long line of research on the acquisition of semantics has investigated whether or not young children know about the “maximal” meaning of these quantifiers, how they differ from the strongly distributive universal quantifier each, whether they quantify over objects or events, and how their meaning interacts with other components of the sentence such as an indefinite object, negation, or disjunction. Some of this semantics research implicates syntax, and one area in which the study of quantification lies at the syntax–semantics interface is with the scope-taking properties of quantifiers. A sentence such as (17) is ambiguous. It could either mean that every student was such that they did not get an A, or that it is not the case that every student got an A (leaving the door open for some students getting an A). The ambiguity arises from the interaction of the quantifier and negation operator at an abstract logical level. Scopal ambiguity is not restricted to universal quantifiers; similar ambiguity arises with indefinites (e.g. some student(s)), including numerical expressions (e.g. two students).

(17) Every student didn’t get an A.

Children typically seem to access the interpretation that corresponds to the surface scope of these logical elements, and have difficulty in accessing the interpretation where the position of the elements in the abstract logical representation is not isomorphic to


the surface syntax (Musolino 1998). This may seem like a semantic problem, in that the different interpretations arise from the interaction of logical elements. However, syntax is not absolved, because the availability of any given interpretation depends on the quantificational phrase taking “scope” relative to the negation operator. Scope is not determined by linear order in a flat string, but rather by the c-command relation in the syntax (as described above for pronouns and the binding principles). In addition to syntax being implicated for scope-taking purposes, quantifiers have to move through the syntactic structure to take wide or narrow scope with respect to negation (a form of covert movement discussed in Section 6.4). These facts can be further illustrated with (18), where the quantificational indefinite phrase is in the object position. On the surface scope, it takes narrow scope with respect to negation, but after covert movement, it takes wide scope, or inverse scope in the logical representation. (18) The professor didn’t give [some/two] (of the) students an A. One might ask, then, whether children are guided by the linear order of the elements with respect to each other in the surface string, or whether they are guided by another relation. Lidz and Musolino (2002) provided convincing evidence that children are not relying upon linear order by comparing responses from English-speaking children and adults to those from participants who speak Kannada, a language that does not share the same word order. In the version of (18) in Kannada, the indefinite phrase would precede negation. Lidz and Musolino ran a Truth Value Judgment Task (described above), presenting children and adults with scenarios that made one of the interpretations true and the other false. Adults in both languages accepted the sentences, reflecting their ability to access both the isomorphic and non-isomorphic scope interpretations. 
Children, however, displayed a different pattern: regardless of their language, they consistently accessed the reading corresponding to surface scope (negation>indefinite) but not the inverse scope reading (indefinite>negation). While Musolino, Crain, and Thornton (2000) initially argued that children’s grammars are to blame for their being unable to generate the requisite representation to access the non-isomorphic interpretation, subsequent years of research have demonstrated that various experimental manipulations are successful in opening the gate to this interpretation. These manipulations include satisfying the felicity conditions of negation and providing a contrast of events, accompanied by contrastive focus (Musolino and Lidz 2006) (a manipulation that complements the findings on relative clauses reviewed in this chapter); inclusion of the partitive x of the y structure (Musolino and Gualmini 2004); increasing the salience of and satisfying the Question Under Discussion (Gualmini et al. 2008); and priming the abstract logical form (Viau, Lidz, and Musolino 2010). Thus, there is by now sufficient reason to think that children’s grammars are not at all impoverished as far as these sentences are concerned, and that their apparently default inclination towards a non-adult-like interpretation is the result of an interaction of grammatical and extragrammatical factors.
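For reference, the two readings of (17) and (18) discussed in this section can be written out in first-order notation. This is only a sketch: the predicate names (student, gotA, gaveA) and the constant p for the professor are labels introduced here for illustration, and the indefinite in (18) is simplified to a plain existential.

```latex
% (17) Every student didn't get an A.
\begin{align*}
\text{surface scope } (\forall > \neg):\quad
  & \forall x\,[\mathrm{student}(x) \rightarrow \neg\,\mathrm{gotA}(x)] \\
\text{inverse scope } (\neg > \forall):\quad
  & \neg\,\forall x\,[\mathrm{student}(x) \rightarrow \mathrm{gotA}(x)]
\end{align*}
% (18) The professor didn't give some students an A.
\begin{align*}
\text{surface scope } (\neg > \exists):\quad
  & \neg\,\exists x\,[\mathrm{student}(x) \wedge \mathrm{gaveA}(p, x)] \\
\text{inverse scope } (\exists > \neg):\quad
  & \exists x\,[\mathrm{student}(x) \wedge \neg\,\mathrm{gaveA}(p, x)]
\end{align*}
```

In a scenario where some but not all students got an A, the surface-scope reading of (17) is false while its inverse-scope reading is true, which is the kind of contrast a Truth Value Judgment Task story can exploit.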


6.2.3 Propositional attitude verbs

The chapter on acquisition methods with infants highlighted the ways in which the syntactic environment(s) in which a word (namely, a verb) appears can be informative about the meaning of the word. This hypothesis is known as syntactic bootstrapping (Gleitman 1990; Landau and Gleitman 1985). For example, the transitive frame in which the novel verb gorp appears in (19a) indicates that this verb takes two arguments (a subject and an object), distinguishing it from the verb in the intransitive frame in (19b), which appears only with a conjoined subject. (Both occurrences of gorp are accompanied by two NPs.) The occurrence of gorp in (19b) indicates that for this verb, an object is not obligatory, allowing it to have a meaning like ‘laugh’ or ‘run’ or ‘collaborate’, and also indicates that it cannot have the causative meaning that the occurrence in (19a) can have, since the transitive, but not the intransitive frame, can portray an event in which an agent acts upon a patient, perhaps even causing a change of state. (19)

a. Amelie is gorping Mary. b. Amelie and Mary are gorping.

Acquisition of verbs continues well into the preschool years, moving from primarily transitive and intransitive verbs that denote perceptible events to also include those that denote more abstract events or properties, such as believe, think, and know. The syntactic environments in which these verbs appear continue to play a role in narrowing the hypothesis space for their meanings, although it becomes crucial for the learner not to leap to conclusions about meaning immediately upon seeing that the verb takes a finite or nonfinite sentential (propositional) complement. As the sentences with the novel verb gorp in (20) illustrate, these syntactic environments support multiple verb meanings, allowing multiple clause-taking verbs to be candidates. For example, believe, think, know, and hope can all occur in the frame in (a), while believe, want, and need can all occur in the frame in (b). (20)

a. Amelie gorps that Mary is traveling to France. b. Amelie gorps Mary to be a safe traveler.

As in earlier acquisition, a cluster of frames and common meaning are central to classifying verbs according to concepts such as “desire” or “belief ” (White, Hacquard, and Lidz in press), as illustrated in the sampling of frames presented in (21). (21)

a. Amelie [believes/thinks/knows/says/*wants/*needs] (that) it is raining.
b. Amelie [*believes/*thinks/?knows/*says/wants/needs] it to rain.
c. Amelie [*believes/*thinks/*knows/says/*wants/*needs] to someone that it is raining.
d. Amelie [*believes/thinks/knows/*says/*wants/*needs] about rain.
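The idea that a cluster of frames, rather than any single frame, classifies a verb can be illustrated with the judgments in (21). The sketch below simply groups verbs that share the same set of acceptable frames. The frame labels and function names are made up for this illustration, and the marginal "?knows" in (21b) is treated as unacceptable, which is an assumption.

```python
from collections import defaultdict

# Acceptable verbs per frame, transcribed from the judgments in (21).
FRAMES = {
    "that-clause (21a)": {"believe", "think", "know", "say"},
    "NP + to-infinitive (21b)": {"want", "need"},  # "?know" left out (assumption)
    "to NP + that-clause (21c)": {"say"},
    "about NP (21d)": {"think", "know"},
}


def frame_signatures(frames):
    """Map each verb to the set of frames in which it is acceptable."""
    signatures = defaultdict(set)
    for frame, verbs in frames.items():
        for verb in verbs:
            signatures[verb].add(frame)
    return {verb: frozenset(fs) for verb, fs in signatures.items()}


def group_by_signature(signatures):
    """Group verbs whose distribution across frames is identical."""
    groups = defaultdict(set)
    for verb, signature in signatures.items():
        groups[signature].add(verb)
    return sorted(sorted(group) for group in groups.values())


print(group_by_signature(frame_signatures(FRAMES)))
# [['believe'], ['know', 'think'], ['need', 'want'], ['say']]
```

On these four frames alone, want and need pattern together, apart from the belief and speech verbs, while know and think remain indistinguishable, so frame distribution by itself does not separate the factive verb from the non-factive one.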

Know is a factive verb, which means that it presupposes the truth of its complement (i.e. that it is true that it is raining). Even when the entire sentence is negated


or questioned (e.g. Amelie does not know that it is raining, Does Amelie know that it is raining?), this presupposition survives. However, to conclude from the co-occurrence of a verb and a tensed sentential complement that the complement must be true would most definitely lead the learner astray, since think and say also take such finite-clause complements, and are not factive. Negating or questioning those sentences certainly does not allow the presupposition to survive. The presence of this complement does not even entail belief on the part of the subject, since while know and think associate the grammatical subject with the belief that the complement is true (at least in the absence of negation), say does not. One can say anything without believing it or thinking it to be so! It is only upon comparing the patterns of occurrences of individual verbs across frames and various verbs within the same frame that the learner can arrive at distinctions among clause-taking verbs and a more fine-grained verb meaning. Dudley et al. (2015) have demonstrated that 3-year-olds do not treat think as presupposing the truth of its complement, but differ in whether they do for know. While many 3-year-olds in their studies do not distinguish between know and think with respect to factivity, others do. For example, when shown a red and a blue box, and told (22), children search for the toy in the red box. But when told (23) either with matrix negation (a) or embedded negation (b), they search for the toy in the blue box, in line with adults. However, when presented with (24), the researchers observed some children treating know as non-factive (as with think), and others behaving in an adult-like fashion (searching for the toy in the red box). (22) Lambchop thinks that it’s in the red box. (23)

a. Lambchop doesn’t think that it’s in the red box. (matrix negation) b. Lambchop thinks that it’s not in the red box. (embedded negation)

(24) Lambchop doesn’t know that it’s in the red box.

What might contribute to children’s difficulty with know? Dudley (2017) and Dudley, Hacquard, and Lidz (submitted) provide corpus data from the Gleason transcripts in CHILDES (MacWhinney 2000) demonstrating a lack of opportunities for children to encounter know in a frame in the input that would unambiguously indicate that the speaker presupposes the truth of the complement clause. Three-year-olds typically have difficulty with false belief tasks that call upon them to deploy their theory of mind and recognize that another speaker does not share the same beliefs about the world that they do (Wimmer and Perner 1983). One hypothesis for this failure to succeed in such tasks is that they do not attend to the entire sentence when assessing its truth value. That is, for a sentence such as the one in (23a) above with think, children might appear to only attend to the complement clause, disregarding what Lambchop thinks is true when assigning a truth value (de Villiers 1995; Diessel and Tomasello 2001). However, if such sentences are only investigated in contexts in which the complement clause is false, then we only have part of the picture. Lewis (2013) and Lewis, Hacquard, and Lidz (2012) showed that when the main clause is false, 3-year-olds


pattern with adults in rejecting the sentence, thus demonstrating that they are attending to the content of both the main and subordinate clause.

6.2.4 Raising and control verbs

While appeal to the surface syntax can be informative even for more abstract verbs, it can only go so far with another class of verbs known as raising verbs, some of which are illustrated in (25). These verbs pose a particular challenge to the learner in that their frame is “opaque.” That is, it is not possible to easily deduce something about the subcategorization pattern or argument structure directly from the syntactic environment. (25)

a. The conductor [seems/appears] to be upset. b. The pianist [is likely/happened] to play a piece by Mozart.

Whereas a learner who hears one of the grammatical sentences in (21) can deduce something about one of these verbs—namely that the subject is said to ‘think’, ‘believe’, ‘know’, etc., and that it takes a subject and a certain type of complement—the same cannot be said of the sentences in (25). The subject of these sentences is not one who engages in an event of ‘seeming’ or ‘being likely’, etc. The subject of these sentences is also not an argument of the verb. These facts are captured in (26), where the subject occurs in the object complement, and the subject position is occupied by an expletive it. (See also Becker 2005.) (26)

a. It [seems/appears] [that the conductor is upset]. b. It [is likely/happened] [that the pianist played a piece by Mozart].

What adds to the challenge of learning these verbs, apart from the opacity of the frame, is that on the surface, their structure resembles that of another construction: the control construction. As shown in (27), the subject of the main clause in a control construction is indeed semantically meaningful; it controls the subject of the non-finite clause. A sentence like the one in (a) cannot be translated to one with an expletive subject, as shown in (28). The only way the sentence in (28) makes sense is if the pronoun it is anaphoric to a salient entity, such as a fundraising or awards committee. (27)

a. The conductor wants to be recognized.
b. The conductor_i wants pro_i to be recognized.

(28) *It wants the conductor to be recognized. Thus, when the learner encounters a new verb in a frame resembling that of (25) or (27), it is not immediately apparent whether the verb is a raising or a control verb. Becker (2006) tested the hypothesis that learners begin by assuming that all verbs appearing in such a frame are control verbs. In a modified Grammaticality Judgment Task, Becker showed preschoolers pictures, and asked them to listen to a puppet’s


description of the picture and decide whether it was “OK” or “silly.” The puppet’s statements included control and raising verbs, as shown in (29). Becker also manipulated whether the predicate was compatible (a) or incompatible (b) with the subject in a fully-crossed design. Only the 5-year-olds responded in an adult-like way, but 3- and 4-year-olds seemed to treat control verbs as raising verbs, responding that sentences such as the one in (a) were “OK.” Evidence that the children were probably not just ignoring the verb came from a follow-up TVJT, in which child participants patterned as adults would. (29)

a. The flower wants to be pink. b. The hay seems to be excited.

How then do children come to know that want is a control verb, and that the structure in (29a) is a control structure? Becker (2006) argued that children must rely upon multiple cues, including the type of predicate and event structure, expletive it subjects, and animacy of the subject. To test the potency of the last cue, Becker and Estigarribia (2013) presented adults with novel verbs in frames similar to the ones in (25) and (27) with either an inanimate or an animate subject, to determine whether animacy could prompt participants to disambiguate between raising and control verbs. Some participants read a definition favoring one type of interpretation, then a set of sentences such as the ones in (30), either (a) or (b). Others read the definition, and read a short story using such sentences. Still other participants were presented with both. (30)

a. The old man joops to be very tired. (animate subject)
b. The book joops to be very long. (inanimate subject)

Participants were then asked to deliver a judgment about which of two sentences containing the novel verb sounded better to them. The expletive there sentence in (31) is compatible with a raising verb analysis, while the pseudocleft construction in (32) does not allow for a raising verb to occur in it; a control verb, however, is permissible.

(31) There joops to be a computer on the desk. (expletive construction; raising OK)
(32) What the fairy joops is to be small. (pseudocleft; raising not OK)

Becker and Estigarribia (2013) found that when participants were presented with a raising-compatible definition but an animate subject, they were only “correct” about half of the time in choosing the expletive there construction over the pseudocleft. However, when the subject was inanimate, the success rate was near ceiling. Thus, adults can use animacy paired with the syntactic frame to categorize the verb and make inferences about occurrences beyond the initial frame, suggesting that children, too, might engage in a similar process.

Complicating the learning of control verbs and control structures is the fact that there are different kinds of control structures, as shown in (33).

(33) a. Ariel told Ernie_i pro_i to buy an ice cream. (object control into complement)
     b. Ariel_i wanted pro_i to push Peter Pan. (subject control into complement)
     c. Ariel_i kissed Ernie before pro_i buying an ice cream. (subject control into adjunct)

Across a variety of tasks, children do not exhibit consistently adult-like performance with these control structures until well after 4 years of age, demonstrating mastery of control into complements before control into adjuncts (Cairns et al. 1994; Hsu, Smith Cairns, and Fiengo 1985; McDaniel, Cairns, and Hsu 1990b). (See Guasti 2002 and McDaniel and Smith Cairns 1990 for further details.) Gerard (2016) provided evidence against a grammatical explanation of children’s non-adult-like interpretation of adjunct control sentences by showing that when the task no longer called upon children to assess grammaticality or make a decision about characters’ utterances, their responses became more adult-like. Gerard administered a Coloring Task using a method pioneered by Zuckerman et al. (2015) in which she presented children with two black and white drawings, indicating two sequential events, and asked participants to listen to a sentence, as in (34), then color one of the drawings accordingly using a touchscreen device.

(34) Dora washed Diego before pro eating the red apple.

In one such trial, children were shown a picture of Dora spraying water on Diego, then a picture of Dora and Diego together, each with an apple. Participants’ interpretation of pro was determined via which apple they chose to color: Dora’s or Diego’s. In this task, children exhibited a much higher success rate than in previous tasks, in particular a TVJT. Thus, as we have seen with other syntactic phenomena discussed in this chapter, the choice of methodology affects the conclusions one is able to draw about children’s grammatical knowledge of control structures.

6.2.5 Tough and control adjectives

While most adjectives can appear in prenominal (white cat) or copular position (The cat is white), some adjectives can appear in a wider range of constructions, including with a non-finite clause following, as in (35). In this sentence, the gymnastic coach is not the one who is “easy” or “tough.” Rather the sentence is comparable to the one in (36), where the subject position is occupied by an expletive it, and the coach is the object of the verb. (This pattern is reminiscent of (25) and (26).)

(35) The gymnastic coach is easy/tough to please.
(36) It is easy/tough to please the gymnastic coach.

Still other adjectives appear with a non-finite complement, but for these, the grammatical subject is indeed the semantic subject, as shown in (37). The (a) sentence cannot

be transformed into either (b) or (c). This is because the construction in (a) is a control structure. The underlying representation is something like what is shown in (38).

(37) a. The gymnastic coach is eager to start practicing.
     b. *It is eager to start practicing the gymnastic coach.
     c. *It is eager for the gymnastic coach to start practicing.

(38) The gymnastic coach_i is eager [pro_i to start practicing].

As with raising verbs, animacy also plays a role in the acquisition of tough adjectives. Becker, Estigarribia, and Gylfadottir (2012) and Becker (2015) showed children brief videos of an experimenter animating and voicing a small toy in a context that was compatible with both a tough and a control interpretation. In each video, the character uttered a novel adjective five times in the frame template in (39). Importantly, the children were assigned to either an inanimate or animate condition based on the animacy of the sentential subject.

(39) The NP is adjective to VP.

(40) a. An apple is very daxy to draw.
     b. Mr. Farmer is always greppy to help.

After each video, children were asked two yes/no questions, as shown in (41), one of which was compatible with a ‘tough’ interpretation, and the other with a control interpretation, as indicated. Each was thus grammatical under one interpretation, and ungrammatical under the other.

(41) a. Is it adjective to VP? (‘tough’ adjective interpretation)
     b. Is the NP adjective? (no ‘tough’ adjective, e.g., control adjective)

Becker et al. found that children who heard novel adjectives appear multiple times with an inanimate subject were more likely to categorize them as ‘tough’ adjectives than children who heard the novel adjectives with an animate subject. They were also faster to answer grammatical questions than ungrammatical ones with novel ‘tough’ adjectives.

6.3 Syntactic constructions


Certain constructions have been known over the years to produce non-adult-like responses from young children, raising questions about the source of these behavioral responses: whether they are due to an immature grammar or to extragrammatical sources, such as felicity conditions of the discourse, parsing strategies, or cognitive overload. Two such constructions are relative clauses and passive constructions. The central question with both of these constructions has been whether children have the correct representation of the construction in order to interpret sentences featuring them correctly. For this reason, these constructions are presented separately from the

phenomena in the next section, although in all cases, there is claimed to be syntactic movement.

6.3.1 Relative clauses

Relative clauses modifying a noun, such as the ones illustrated in (42), can pose a challenge to young children due to the conditions under which it is appropriate to express them, and the processing and structural demands they impose. For example, in (a), the relative clause modifies the subject (the singer), and there is a gap in object position following the verb in the relative clause (greeted _). As a result, the verb from the relative clause is juxtaposed with the main clause verb, inviting a potential “garden path” as the sentence processor tries to make sense of the structure correlated with the words encountered incrementally. The subject encountered in sentence-initial position must be associated with the second verb appearing later in the sentence, which makes parsing this sentence that much more challenging. In (42b), the relative clause modifies the object, and there is a gap in the subject position of the relative clause, to be filled by the object of the main clause (the musician).

(42) a. The singer that the dancer greeted _ thanked the musician.
     b. The singer thanked the musician that _ greeted the dancer.

In a series of Act-Out Tasks, children who were presented with sentences such as (43) and toy props of a dog, a sheep, and a pig consistently acted out an interpretation in which the main subject was the subject of both the main clause and the relative clause (Sheldon 1974; Tavakolian 1981). This pattern led Tavakolian (1981) to propose that children lack the ability to represent relative clauses, and thus when confronted with a sentence with a relative clause, they reach into their grammar for another representation they do have, in this case conjunction. Thus, they arrive at an interpretation of (43) that looks something like (44).

(43) The dog pushed the sheep that jumped over the pig.
(44) [The dog pushed the sheep] and [jumped over the pig]

However, subsequent experiments demonstrated that this hypothesis is untenable, and years of research that followed showed that many children actually do accurately produce relative clause syntax between ages 2 and 4 (Diessel and Tomasello 2000; McKee, McDaniel, and Snedeker 1998), and correctly interpret relative clauses not only as test sentences, but as control sentences for investigations targeting more complex constructions (Syrett and Lidz 2011). Why, then, would children have acted out sentences such as (43) in a non-adult way? Crain and Thornton offer a very convincing explanation based on pragmatics and processing. The relative clause presupposes two things. First, the event of (the sheep) jumping over the pig preceded the event expressed in the main clause assertion, namely that

the dog pushed the sheep. Second, the presence of the relative clause is taken to signal that there is more than one sheep, and the one that is the object/patient in the main clause can be distinguished from the others by having the property that it was the one that jumped over the pig. Thus, if there is only one sheep to choose from among the props, and children are asked to process the sentence incrementally and plan the events to act out, they are, in a way, “lured” into acting out an incorrect interpretation. Once children are given time to process the sentence and plan their response, and more animals are introduced (thereby satisfying the felicity conditions on the use of the relative clause), the number of adult-like responses increases (Crain, McKee, and Emiliani 1990; Hamburger and Crain 1982).

Diessel and Tomasello also argued that processing and pragmatic constraints are relevant to children’s production of relative clauses. They conducted a search of CHILDES transcripts (MacWhinney 2000) to document children’s early production of relative clauses between 1;9 and 5;2 years. Of the 300+ occurrences, the lion’s share of relative clauses produced modified the “predicate nominal of a presentational copular clause” (PN) as in (45), followed next by relative clauses modifying an isolated NP or the object, as in (46).

(45) a. Is this something that turn around? (Adam 3;5; PN)
     b. It’s the one you went to last night. (Peter 2;10; PN)

(46) a. The girl that came with us. (Nina 3;1; NP)
     b. I want to see some ducks that do that too. (Nina 3;2; OBJ)

Diessel and Tomasello accounted for the skew by pointing out that in most of the children’s productions, the relative clause does not express presupposed information; rather, it asserts new information, and thus performs a different pragmatic function than the typical adult relative clause. What’s more, in most attested cases the sentence containing the relative clause expressed a single proposition, such that the entire sentence could be paraphrased as a simple clause, as in (47).

(47) a. Here’s a tiger that’s gonna scare him. (Nina 3;1; PN)
     b. = The tiger’s gonna scare him.

Thus, relative clauses are part of children’s early grammatical repertoire, but are subject to syntactic, pragmatic, and processing constraints, so variable performance in producing them (and perhaps in interpreting them when encountered) is expected.

6.3.2 Passive constructions

An event can be described with either active or passive voice, as illustrated in (48).

(48) a. The hungry gorillas devoured the fruit. (active voice)
     b. The fruit was devoured by the hungry gorillas. (passive voice)

While children have been observed to produce some passives as early as 2.5–3 years of age (Jakubowicz 1989; Snyder and Stromswold 1997; Budwig 2001), they do not typically exhibit consistent and accurate production and comprehension of passives until well after age 4 (Borer and Wexler 1987; Horgan 1978; Messenger, Branigan, and McLean 2012a). And when children are asked to act out passive sentences, they often perform an active interpretation instead (Brooks and Tomasello 1999; Harris and Flora 1982; Horgan 1978; Lempert 1990; Maratsos et al. 1985; Messenger et al. 2012b; Pinker, Lebeaux, and Frost 1987; Sudhalter and Braine 1985). This delay in the spontaneous production and comprehension of passives may suggest that the requisite grammatical knowledge is not yet in place in early childhood. However, there are three reasons to conclude that grammatical immaturity is not the cause, and that children’s knowledge of passive constructions is better than their performance indicates, and may be masked by task demands. (See Baldie 1976; Crain, Thornton, and Murasugi 2009; and Armon-Lotem et al. 2015.) First, there is crosslinguistic variability in the production and comprehension of passives, in that children in some languages consistently produce passives correctly as early as 2 years of age (see e.g. Allen and Crago 1996 for Inuktitut; Demuth 1989 for Sesotho). Assuming that children acquiring any language should exhibit roughly the same course of grammatical development, it should not be the case that children acquiring these languages acquire the passive years before those acquiring languages such as English, French, German, and Hebrew do.
Ironically, children have been observed to “get worse” in their interpretation of passives over the course of development (Bever 1970; Maratsos 1974), which of course cannot be due to a decline in linguistic knowledge, but may instead be linked to increased grammatical knowledge, knowledge of relative frequency of the active and passive constructions in the input, or processing factors.

Second, children as young as 3 years of age can be primed to produce passive constructions. Bencini and Valian (2008) administered a Syntactic Priming Task in which they described drawings of transitive events to children with either active or passive sentences, as in (49), prompting them to repeat the sentence.

(49) a. The wagon is carrying the presents. (active)
     b. The presents are carried by the wagon. (passive)

Children were then encouraged to independently describe similar pictures. Children showed no effect of priming with active sentences, but a significant effect of priming with passive sentences, a result to be expected if the default is already to produce active sentences, while the ability to produce passive sentences is nascent but needs additional support to manifest itself. The results therefore indicate that young children are able to form abstract representations of passive constructions (a finding that complements syntactic priming results from Thothathiri and Snedeker 2008 with double-object and prepositional-object dative sentences). (See also Messenger et al. 2012b and Turner and Rommetveit 1967.)

Third, there are semantic factors that facilitate production and comprehension of passive constructions. Maratsos et al. (1985) administered a Binary Forced-Choice Picture Selection Task in which they compared sentences such as those in (50), where (a) illustrates a ‘physical action passive’ and (b) a ‘mental verb passive’ in which the subject and object are experiencer and ‘stimulus’, respectively. Children were significantly better with passives such as (a) than those like (b).

(50) a. Grover is held by Ernie.
     b. Batman is liked by Superman.

This pattern of children performing better with actional passives than mental passives has been replicated across other studies (de Villiers and de Villiers 1985; Fox and Grodzinsky 1998; Messenger et al. 2009; Pinker, Lebeaux, and Frost 1987; Sudhalter and Braine 1985). (Although see Hirsch and Wexler (2004) for English, and Demuth, Moloi, and Machobane (2010), for counterevidence and no effect of the actional/non-actional distinction.) There is, to be sure, an effect of input, in that passives are less frequent than actives in child-directed speech (and speech in general), and that among passives, non-actional passives are not as frequent, a pattern Gordon and Chafetz (1990) found support for in a search of the three Brown (1973) CHILDES transcripts. Children are, however, better at producing passives with verbs they have heard in the passive construction (Gordon and Chafetz 1990; Pinker, Lebeaux, and Frost 1987). Using a Binary Forced-Choice Pointing Task in which children viewed videos of two events unfolding on a computer screen and were asked to point to one when presented with a target sentence, Dittmar, Lieven, and Tomasello (2014) found that young children age 2–3 performed better with novel verbs than with familiar verbs (which were not necessarily found in the passive form). Moreover, children as young as 3 who are presented with novel verbs in full passive constructions are able to reproduce them (Tomasello, Brooks, and Stern 1998), providing further evidence that not only does the type of passive construction matter, but the verb itself matters. Children also appear to perform better with passives in which the arguments are animate (Lempert 1990).

Finally, passives can take different forms. Apart from the difference between adjectival and verbal passives discussed in the syntactic literature (e.g., the artist is satisfied with the painting vs. the document was forged by a counterfeiter), there are also get passives and be passives, as shown in (51).

(51) a. The culprit [got/was] caught.
     b. The little boy [got/was] tagged (by the older boy).

Although previous researchers have found that children may produce get passives early (Crain, Thornton, and Murasugi 2009) and may also perform better with get passives than with be passives (Harris and Flora 1982; Marchman et al. 1991), this is not always the case. In a Binary Forced-Choice Picture Selection Task comparing comprehension of get and be passives in 3- and 4-year-old children, Gotowski (2018) found no

advantage of get passives. This difference in findings may point to the power of different methodologies and/or the importance of contextual support for interpretation and production.

6.4 Overt movement in questions and covert movement


In natural languages, two types of movement can be distinguished, based on whether we can detect that movement took place in the surface string: overt movement, in which a linguistic object is displaced in the surface structure relative to the position in which it is interpreted, and covert movement, in which movement is said to take place at a level of interpretation in which it is not observed on the surface. The first of these two types of movement can be readily observed in questions: subject/auxiliary verb inversion in yes/no questions and movement of the wh-phrase in wh-questions. The second type, which cannot be observed on the surface, comes in two forms: quantifier raising and reconstruction. In all four of these cases, children have been observed to have the requisite mechanisms as part of their grammar, but diverge from adults in noticeable ways. The similarities with and differences from adults with respect to these types of movement are outlined below.

6.4.1 Yes/no questions

Yes/no (or “polar”) questions in English are typically formed by inverting the subject and the verb heading the matrix VP, as shown in the comparison between the question in (a) and the base declarative form in (b) in (52) below. That the verb that moves is not simply the first verb in the declarative sentence, but rather the head of the main clause, is illustrated in (53), which features a restrictive relative clause modifying the subject. The auxiliary verb in the relative clause does not move; the auxiliary verb from the main clause does. (Moreover, the verb also does not just move to the front of the sentence. It moves to a position immediately before the subject, as can be illustrated by inserting an adverbial clause at the beginning of the sentences.) Thus, the formation of yes/no questions relies upon appealing to the hierarchical syntactic structure, and not linear precedence.

(52) a. Is the girl playing basketball?
     b. The girl is playing basketball.

(53) a. Is the girl who can dance playing basketball?
     b. The girl who can dance is playing basketball.
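
The contrast between a structure-independent and a structure-dependent rule can be made concrete with a small sketch. The Python code below is only an illustration under simplifying assumptions: the `Clause` class, the toy `AUXILIARIES` lexicon, and the function names are hypothetical and come from no study cited here. It applies a linear rule (“front the first auxiliary in the string”) and a structural rule (“front the auxiliary of the main clause”) to the declaratives in (52b) and (53b).

```python
from dataclasses import dataclass
from typing import List

# Toy lexicon of auxiliaries; an assumption made only for this sketch.
AUXILIARIES = {"is", "can"}


@dataclass
class Clause:
    """A declarative main clause, pre-analyzed into its major constituents."""
    subject: List[str]    # subject NP, including any relative clause inside it
    aux: str              # auxiliary heading the main clause
    predicate: List[str]  # remainder of the main-clause VP


def words(clause: Clause) -> List[str]:
    """Flatten the clause into its surface word string."""
    return clause.subject + [clause.aux] + clause.predicate


def linear_rule(clause: Clause) -> List[str]:
    """Structure-independent rule: front the first auxiliary in the string."""
    w = words(clause)
    i = next(k for k, word in enumerate(w) if word in AUXILIARIES)
    return [w[i]] + w[:i] + w[i + 1:]


def structural_rule(clause: Clause) -> List[str]:
    """Structure-dependent rule: front the auxiliary of the main clause."""
    return [clause.aux] + clause.subject + clause.predicate


def show(tokens: List[str]) -> str:
    """Render a token list as a capitalized question."""
    sentence = " ".join(tokens)
    return sentence[0].upper() + sentence[1:] + "?"


simple = Clause(["the", "girl"], "is", ["playing", "basketball"])
complex_ = Clause(["the", "girl", "who", "can", "dance"], "is",
                  ["playing", "basketball"])

print(show(linear_rule(simple)))        # Is the girl playing basketball?
print(show(structural_rule(simple)))    # Is the girl playing basketball?
print(show(linear_rule(complex_)))      # Can the girl who dance is playing basketball?
print(show(structural_rule(complex_)))  # Is the girl who can dance playing basketball?
```

The two rules agree on the simple sentence in (52) and come apart only when the subject contains a relative clause, as in (53). Sentences of this second kind are therefore the ones that distinguish the two hypotheses.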

Crain and Nakayama (1987) employed a Question Elicitation Paradigm to investigate whether young children generate a structure-dependent or structure-independent

hypothesis about the formation of yes/no questions. Children were shown pictures that featured two salient entities, differing in a key property (thereby satisfying the felicity conditions of the use of the relative clause in the subject position, as mentioned earlier). They were instructed to ask a puppet a question about one of the entities. For example, Ask Jabba if the boy who is watching Mickey Mouse is happy or Ask Jabba if the boy who is happy can see Mickey Mouse. Children produced a range of question forms, many of which were indeed ungrammatical, as attested in (54). A number of these questions appeared to involve insertion of a sentence-initial is as in (a) and (b), but others veered off course in other ways, implicating tense, as in (c) and (d).

(54) a. Is the boy can jump?
     b. Is the boy who is watching Mickey Mouse is happy?
     c. Did you did came home?
     d. What did you got?

Importantly, however, children never simply preposed the first auxiliary verb, leaving a gap in the relative clause. Thus, Crain and Nakayama argued that children adopt a hypothesis about yes/no question formation that appeals to the syntactic structure, and not (just) linear order. This argument was later bolstered by Gualmini and Crain (2005) in their investigations of the scope of negation with disjunction. (See also the summary of work on one anaphora in Chapter 5.)

6.4.2 Wh-questions

Questions such as those in (55) involve displacement of the wh-phrase from its base position: for example, from the subject position (a), the object position (b), or an adjunct position (c). Stromswold’s (1995) analysis of 12 CHILDES transcripts revealed that children begin producing their first subject and object questions with who and what some time between the ages of 1;8 and 2;8, with which questions following at least a year after the onset of these questions.

(55) a. Who is drawing a picture?
     b. What is the girl drawing?
     c. Why is the girl drawing a picture?

While children begin producing simple wh-questions early, they also demonstrate an ability to access multiple interpretations of biclausal sentences with wh-adjunct questions, as in (56). These “short” and “long-distance” interpretations suggest that children (like adults) allow for successive-cyclic movement of the wh-phrase from its base position through the structure to the front of the sentence (de Villiers, Roeper, and Vainikka 1990; Weissenborn, Roeper, and de Villiers 1991).

(56) When_i did the boy say t_i he hurt himself t_i?

Such evidence comes from experiments using the Question after Story paradigm.

In this task, children are told a brief story in which two salient events occur at different times and locations. In the corresponding story for (56), for example, the boy hurts himself when falling from a tree earlier in the day, and then recounts the accident in the tub to his parent later in the evening. After the story, the experimenter asks the child the target question, which the child then answers. The child’s answer reveals at least one interpretation that is made available by the grammar, ostensibly reflecting grammatical constraints (or not) in the generation of this response. As an illustration of the constraints on movement, children do not appear to allow multiple answers to questions such as (57), where the medial wh- ‘how’ question induces a barrier to wh-movement. (The relevant position in the syntactic tree is already occupied by a wh-word, thereby blocking movement from further down in the structure. The when wh-phrase must therefore originate higher than the embedded CP.)

(57) When_i did the boy say t_i how_k he hurt himself t_*i/k?

Omaki et al. (2014) extended these results beyond the matrix verb say, testing children (and now adults as well) in English and in Japanese, two languages that differ in their word order, as shown in (58).

(58) a. Where did Lizzie tell someone that she was gonna catch butterflies?
     b. Doko-de Yukiko-chan-wa choucho-o tsukamaeru-to itteta-no?
        where-at Yukiko-DIM-TOP pro butterfly-ACC catch-COMP was.telling-QUES¹
        ‘Where was Yukiko telling someone that she will catch butterflies?’

Omaki et al. (2014) found that English-speaking participants preferred answering the embedded clause question with say, but the matrix clause question with tell someone or say to someone, while Japanese-speaking participants preferred answering the embedded clause question. They attributed this difference to the fact that children actively associate the fronted wh-phrase with the first VP in the sentence, indicating that the sentence parser seeks to complete the filler–gap dependency at the first possible site. The parser encounters the matrix verb (tell) first in English, but the embedded verb (catch) first in Japanese. The results also show that for both children and adults, the parser is also influenced by properties of the verbs, a finding complementing evidence by Snedeker and Trueswell (2004) in children’s incremental processing of syntactically ambiguous garden path sentences.

Not all children’s productions and interpretations are adult-like, though. In fact, children have been observed in elicited production tasks to produce wh-questions such as those in (59) (Thornton 1995), which English-speaking adults do not produce.

(59) a. Which Smurf do you think who has roller skates on?
     b. Which animal do you think what really says ‘woof woof’?

¹ DIM = diminutive form; TOP = topic marker; ACC = accusative case; COMP = complementizer; QUES = question marker.

Grolla and Lidz (2018) have argued, based on the results of an elicited question production task paired with motor and cognitive inhibition tasks, that the production of such medial questions is not due to an impoverished grammar, but is instead due to the influence of an immature production system, since an increased number of medial questions is more likely to be observed in children with a limited inhibition capacity (as demonstrated when participants are asked to press a key on the opposite side of a keyboard relative to where an image appears on a screen). However, accessing medial wh-questions seems not to be exclusive to production. De Villiers et al. (1990) found that children often answer the medial who question in questions such as (60) (i.e. by responding who to help, not how the asking took place).

(60) How_i did he ask t_i who he should help t_*i?

To probe if children were simply inclined to answer the last salient question in the sentence, de Villiers and Roeper (1991; 1995) tested children with questions such as those in (61) which included a relative clause that should induce a barrier to wh-movement.

(61) a. How did the man who hurt his leg get home? (subject relative)
     b. How did the man rescue the cat who broke her leg? (object relative)
     c. How did the boy drink who sneezed? (subject relative, extraposed)

In the story preceding the sentence in (61c), two brothers went to a circus. A clown tickled one boy’s nose with a feather, leading him to accidentally sneeze and blow the clown’s wig right off. Afterwards, the boys drank some milk, and the boy who sneezed drank his milk with a straw, while his older brother drank his straight from the carton. Thus, the answer to (61c) should be ‘with a straw,’ but if children are drawn to answer the last question, they should say, ‘the little boy.’ Children routinely answered the questions containing subject and object relative clauses correctly, demonstrating implicit awareness of barriers to movement. This difference in responses to questions (60) and (61) led de Villiers and Roeper to argue that what children do in the case of the former is licensed by the grammar, and is consistent with partial questions in other languages, such as German. Thus, children must learn to prune away an interpretation that is not licensed in the language they are acquiring, but is part of the cross-linguistic grammatically licensed inventory. Indeed, while children’s productions such as those in (59a) diverge from those of adults in English, they are reminiscent of Irish, and are consistent with an approach to wh-movement that assumes successive cyclic movement with the wh-phrase leaving a trace as it moves through the structure, a trace that may exhibit overt agreement (see also van Valin 1998). An asymmetry between production and comprehension of wh-questions is also observed with why questions. Very young English-speaking children do not always invert subject–verb word order when producing why questions, as illustrated in (62) from Adam (Brown 1973; MacWhinney 2000), even as they invert word order with many other why and with other wh-questions, and perform do support with why questions

(see Labov and Labov 1978; Thornton 2004). However, even when children’s production of these why questions is non-adult-like in the lack of inversion, they are still able to access both interpretations of ambiguous why questions such as the one in (63) (Conroy and Lidz 2007).

(62) a. Why four men are eating it? (Adam file 48, line 298)
     b. Why you live where I live? (Adam file 53, line 2389)

(63) Why did Joe say Monster ate his sandwich?

Questions can also give rise to another kind of ambiguity arising from the interaction of a wh-phrase and a quantificational phrase. In the wh-questions in (64), a universal quantifier appears in either subject (a) or object (b) position.

(64) a. [What/Which toy] did [every/each] child pick? (subject quantifier question)
     b. [Who/Which child] picked [every/each] toy? (object quantifier question)

The difference in syntactic configuration yields differences in the kind of answer licensed for each question type. In response to either question, one could respond with what is known as a “single answer” (i.e. identifying the single token or type of toy that every child picked in (a) or identifying the child in (b)). However, in (64a), there is an additional type of response that is licensed: a pair-list answer. In this case, one could respond by listing for each child what toy they picked. In a Question after Story task, Roeper and de Villiers (1993) demonstrated that children allow pair-list answers for both question types, leading them to conclude that children lacked the structural constraints on movement of the object quantifier questions. However, features of their stories in their Question after Story task and their use of the unmarked who in the two questions may have induced children to respond with pair-list answers. A replication by Yamakoshi (2002) used a similar design and hinted at the subject–object asymmetry predicted by the theoretical literature, but adopted a coding strategy that may have presented an inaccurate picture of the rate of pair-list answers in both question types. Achimova et al. (2017) therefore incorporated the requisite revised design features into a Question after Story task, including use of which N instead of who, and ran the experiment both with children and with adults. The type of universal quantifier was also manipulated, in order to see if the distributivity of each induced pair-list answers. The results demonstrate that children do have a subject–object asymmetry in the rate of pair-list answers licensed by subject and object quantifier wh-questions, but that they are not sensitive to the strong distributivity effects of each, a finding also produced across experiments elsewhere. (See Syrett 2015a for discussion.)


kristen syrett

6.4.3 Quantifier raising

Universal quantifiers such as every have already played a prominent role in topics covered earlier in this chapter, because of what children have to learn about their quantificational meaning and how they interact with other elements in the sentence to yield distinct interpretations. Yet another aspect of their meaning concerns their role in the interpretation of sentences such as (65).

(65) Anna read every book that Nathaniel did.

This sentence means that for every book that Nathaniel read, Anna read that book as well. Thus, the word did is a placeholder for a Verb Phrase similar to the one encountered elsewhere in the sentence. It therefore resembles a sentence such as the one in (66), where one simply looks back to the VP antecedent in the first conjunct, and “inserts” this meaning into the site of the missing VP in the second conjunct. By 4 years of age, children are able to interpret instances of VP ellipsis such as this one, and are sensitive to structural constraints on the interpretation of ellipsis (Foley et al. 2003; Matsuo and Duffield 2001; Syrett and Lidz 2011).

(66) Anna read every book and Nathaniel did, too.

In (65), the situation is a bit more complicated, though, since the site of VP ellipsis is actually contained in the antecedent VP. It is for this reason that the construction is called Antecedent-Contained Deletion (ACD). Consequently, there is no way to fully resolve the interpretation as long as the structure remains as it is. In order for this meaning to be generated, the quantificational phrase (every book that Nathaniel did) must raise to a higher location in the structure, outside of the VP, in a configuration such as the one in (67), thereby allowing the VP to be copied into the site of VP ellipsis (Fiengo and May 1994; Kennedy 1997; Larson and May 1990; May and Keyser 1985).

(67) Anna [every book that Nathaniel read]_i read t_i.

This grammatical mechanism is called quantifier raising.
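The regress that motivates quantifier raising in ACD can be made concrete with a small sketch. The Python fragment below is only a toy model under stated assumptions: the nested-tuple trees, the ELLIPSIS marker, and the helper names contains and resolve are hypothetical illustrations, not part of any analysis cited in this chapter or of any parsing library. It shows that copying the surface VP of (65) into the ellipsis site reintroduces an ellipsis site, whereas the VP left behind after raising, as in (67), can be copied without regress.

```python
# Toy illustration of why Antecedent-Contained Deletion (ACD) needs QR.
# The tree encoding and all names here are hypothetical simplifications.

ELLIPSIS = "<VP-ellipsis>"


def contains(tree, target):
    """Return True if target occurs anywhere inside a nested-tuple tree."""
    if tree == target:
        return True
    if isinstance(tree, tuple):
        return any(contains(child, target) for child in tree)
    return False


def resolve(tree, antecedent):
    """Replace every ellipsis site in tree with a copy of the antecedent VP."""
    if tree == ELLIPSIS:
        return antecedent
    if isinstance(tree, tuple):
        return tuple(resolve(child, antecedent) for child in tree)
    return tree


# Surface structure of (65): the object contains the ellipsis site,
# so the antecedent VP contains the very gap it is supposed to fill.
object_qp = ("every", "book", ("that", "Nathaniel", ELLIPSIS))
surface_vp = ("read", object_qp)

# After QR, as in (67): the quantificational phrase has left the VP,
# which now contains only a trace.
raised_vp = ("read", "t_i")

surface_regress = contains(surface_vp, ELLIPSIS)  # antecedent still has a gap
raised_regress = contains(raised_vp, ELLIPSIS)    # antecedent is gap-free
resolved_qp = resolve(object_qp, raised_vp)
```

On this encoding, resolve(object_qp, surface_vp) still contains an ellipsis site, which is the regress, while resolved_qp corresponds to every book that Nathaniel read t_i.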
Quantifier raising (QR) is claimed to already be necessary for other semantic reasons (in a generative framework), one of which is to allow a quantificational phrase in object position to compose with a transitive verb that takes it as an argument, because there is a semantic type mismatch between the two elements. So apart from generating and interpreting constructions such as (65), children would have to have QR as part of their grammar regardless. Evidence that young children are able to perform the operation of QR comes from a number of TVJTs in which children are presented with sentences that require the quantifier-raising operation in order to generate either a grammatical or the intended interpretation, in contexts that favor another interpretation (Gor and Syrett 2015; Kiguchi and Thornton 2004; Lidz et al. 2004; Syrett and Lidz 2009; 2011; Syrett 2015b; 2015c). Lidz et al. (2004) presented participants with sentences such as those in (68), where the quantificational phrase was either in subject position (a) or object position (b).

behavioral acquisition methods with preschool-age children

(68) a. Every dancer_i kissed Kermit before she_i/k went on stage.
b. Kermit kissed every dancer_i before she_i/k went on stage.

In each supporting context, there were three dancers, and another salient female, a singer. When the quantifier is in subject position, as in (68a), it can bind the variable in the before clause, because the subject is in a higher position than this VP adjunct, or it can leave the variable free to be associated with an external antecedent, thereby making the sentence ambiguous. When the quantifier is in object position, as in (68b), the VP adjunct is in a higher position than the quantificational phrase, and so one might assume that the quantifier cannot bind the variable. Indeed, it cannot from this position, but it can once the quantificational phrase undergoes QR to a position that is higher than the VP adjunct. From this position, binding is therefore possible. Lidz et al. (2004) showed that both children and adults access both interpretations of both sentences, presumably indicating that children have QR as a part of their grammar.

Perhaps more convincing evidence (which does not rely on acceptance of the target statement, and so avoids the confound of a potential Yes response bias) comes from studies investigating ACD constructions, as in (65). Syrett and Lidz (2009) reasoned that if young children do not have the QR operation in their grammar, when faced with an ACD sentence, they will have to arrive at some way of interpreting it. Notice that in (69), there is a relative clause attached to the direct object as part of the quantificational phrase. The previous research on relative clauses indicated that when children cannot interpret a restrictive relative clause (for whatever reason), they opt for a representation that is part of their grammatical repertoire: coordinated conjunction.
Thus, Syrett and Lidz compared participants’ performance with ACD sentences like (69) to coordinated conjunction sentences like (70) in contexts that made the first true and the second false, and contexts that had the opposite truth values (because, for example, there was no overlap in the race cars that were driven). Children showed very different acceptance rates for the two constructions, in a way that aligned with the adult grammar.

(69) Lady Bug drove every race car that Mister Bug did. (ACD)
(70) Lady Bug drove every race car, and Mister Bug did, too. (Conjunction)

Further evidence of the QR operation as part of the child’s grammar comes from their interpretation of ACD constructions that involve pronominal reference and the binding principles. Kiguchi and Thornton (2004) presented participants with sentences like those in (71) in contexts that falsified the interpretation generated by QR and evaluated how the pronoun was construed.

(71) a. Darth Vader found her_*i/k the same kind of treasure that the Mermaid_i did.
b. Dora gave him_i/k the same color paint the Smurf_i’s father did.
c. He_*i/k jumped over every fence that Kermit_i tried to.

Participants were observed to reject coreference in (71a), but that could have been either because of a Principle C violation on the surface (since the pronoun c-commands the R-expression) or a Principle B violation after QR (since then the R-expression would c-command the pronoun). However, participants allowed coconstrual in the (b) sentences, even with the pronoun c-commanding the R-expression (which can be illustrated by substituting in every boy and his in place of the two expressions, allowing for a binding relation), thereby indicating that the phrase the same color paint… must QR to a position higher than the pronoun. And since the R-expression is in the possessive position of the DP, it cannot c-command the pronoun, allowing for the pronoun to be free in its domain. Assuming the quantificational phrase QRs to a higher position, sentences like the one in (c) illustrate that the landing site is still lower than the subject position (e.g. vP), since QR does not salvage coconstrual here (Fox 1995; Merchant 2000).

Syrett and Lidz (2011) provided evidence that children are not restricted to only one possible landing site by demonstrating that they can access multiple readings of the ambiguous subject (a) and object (b) control structures that feature embedded non-finite clauses in (72). In order to access the embedded reading, the quantificational phrase must at least QR out of the innermost VP (drive…, read…), and in order to access the matrix reading, it must QR past the outermost VP (want…, ask…).

(72) a. Miss Piggy wanted to drive every car that Kermit did.
b. Clifford asked Goofy to read every book that Scooby did.

Even more striking is the fact that children can access multiple readings of sentences such as those in (73), which feature a finite embedded clause, providing clear justifications for both the embedded and matrix readings, and accessing each even when it is disfavored by the experimental context (Syrett and Lidz 2011; Syrett 2015c).

(73) a. Fozzie said that Miss Piggy drove every car that Kermit did.
b. Clifford said that Goofy read every book that Scooby did.

The availability of the matrix reading clearly indicates that children are QRing out of a tensed embedded clause. This finding may be surprising in a framework where it is assumed that tense is a barrier to movement. However, experimental manipulations reveal that for some adults, too, the matrix reading is robust under the right discourse conditions and when processing load is alleviated. For example, many adult participants access the matrix reading of sentences such as the one in (74), where the embedded subject is a pronoun and there is no overt complementizer (although this manipulation appears not to be necessary) (Syrett 2015c). What’s more, having QRed out of the matrix VP, some adults even allow the quantificational phrase to QR past the sentential subject to an extra-wide scope position, when this interpretation is supported by the context (Syrett 2015b).

(74) Woody_i said he_i jumped over every frog that Jessie did.
(75) Someone said that he could jump over every frog that Jessie did.

Thus, in the case of ACD sentences, experimental judgment data from children help to shed light on the nature of grammatical mechanisms, even in adults.


6.4.4 Reconstruction

Not only do certain phrases raise covertly in the structure for reasons such as to resolve type mismatch or to generate other readings, but some phrases covertly reconstruct into their original position in order to be interpreted. Such is the case with sentences such as the one in (76). The target sentence is in (76a). This sentence cannot be assigned the reading that every pirate put a gun into his own barrel. The reason is that the Prepositional Phrase must first be reconstructed into its base position, as shown in (76b). Then the quantificational phrase moves covertly, generating the configuration in (76c). In this configuration, the pronoun c-commands the trace of the moved element, the intended antecedent, barring coreference. (Even if one considers that there is no reconstruction, and that the moved PP in (76a) leaves a copy in the base position, there is still a Principle C violation between the pronoun and its intended antecedent.)

(76) a. In the barrel of every pirate, he put a gun.
b. Base position: He put a gun in the barrel of every pirate.
c. Position of quantificational phrase after QR: [every pirate]_i he put a gun in the barrel of t_i.
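The structural reasoning behind (76) can be sketched as a c-command check. The Python fragment below is a toy model under stated assumptions: the labelled bracketings, the node labels, and the function names dominates and c_commands are hypothetical simplifications introduced for illustration, and the surface tree deliberately omits any lower copy of the fronted PP. It uses the simple definition on which a node c-commands everything inside its sisters.

```python
# Toy c-command check for the structures in (76).
# Trees are tuples of the form (label, child, child, ...); leaves are strings.
# The tree shapes and labels are simplifying assumptions for illustration.


def dominates(node, target):
    """Return True if target is node itself or occurs inside node."""
    if node == target:
        return True
    if isinstance(node, tuple):
        return any(dominates(child, target) for child in node[1:])
    return False


def c_commands(tree, a, b):
    """Return True if b is, or is inside, a sister of a somewhere in tree."""
    if not isinstance(tree, tuple):
        return False
    children = tree[1:]
    for i, child in enumerate(children):
        if child == a:
            sisters = [s for j, s in enumerate(children) if j != i]
            return any(dominates(s, b) for s in sisters)
    return any(c_commands(child, a, b) for child in children)


QP = ("QP", "every", "pirate")
PP = ("PP", "in", ("DP", "the", "barrel", ("PP", "of", QP)))

# (76b): the PP in its base position inside the VP.
base_tree = ("S", "he", ("VP", "put", ("DP", "a", "gun"), PP))

# (76a): the PP fronted above the subject (lower copy omitted here).
surface_tree = ("S", PP, ("S", "he", ("VP", "put", ("DP", "a", "gun"))))

base_result = c_commands(base_tree, "he", QP)
surface_result = c_commands(surface_tree, "he", QP)
```

Under these assumptions the pronoun c-commands every pirate only in the base configuration, which is the configuration in which coreference is barred.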

Children in a TVJT conducted by Guasti and Chierchia (1999/2000) consistently rejected coreference in such sentences, presumably indicating that they were able to reconstruct the prepositional phrase. By contrast, they consistently accepted both anaphoric and exophoric readings of ambiguous sentences such as (77). Adults behaved similarly.

(77) The monkeys hid the treasure of each child while he was sleeping.

The operation of reconstruction has been observed to behave differently for different types of configurations. For sentences with moved predicates, such as (78), reconstruction appears to be obligatory, as shown in the different indexing patterns between (78a) and (78b).

(78) a. Bill_i knew how proud of himself_*i/k John_k was __.
b. Bill_i knew how proud of John_k he_i/*k was __.

However, for objects, which are arguments of the verb, reconstruction appears to apply differentially, depending on whether Principle A or Principle C is relevant, as illustrated in the sentences in (79). Reconstruction seems to be optional for Principle A, allowing for either coreference pattern in (79a), but it is obligatory for Principle C, as it is with predicates, allowing for only one coreference pattern in (79b).

(79) a. Bill_i knew which picture of himself_i/k John_k liked __.
b. Bill_i knew which picture of John_k he_i/*k liked __.

In a TVJT that also incorporated a Question after Story element, Leddon and Lidz (2005) and Leddon (2007) tested children on sentences where reconstruction was obligatory, as in (80) and (81). These were variations of the declarative statements in their TVJT described earlier. After some trials, the children were asked a question. Before the question, the experimenter listed the characters in the story, with the syntactically disfavored antecedent listed last in order to boost its salience.

(80) a. Which painting of [Miss Cruella_i/herself] did she_*i/k put up ___?
b. How proud of [Miss Cow_i/herself] was she_*i/k __?

(81) a. Which painting of her_*i/k did every dancer_i put up ___?
b. How proud of her_*i/k was every hippo_i __?

In questions with herself, as in (80), children (and adults) consistently responded in a way that reflected a bound interpretation of the reflexive. However, in questions with the pronoun, as in (81), the results were less crisp. Adults consistently responded in a way that reflected reconstruction, forcing the pronoun to be interpreted exophorically. Children responded in this way with the argument questions (as in (81a)), but not with the predicate questions (as in (81b)). With the latter, they patterned at chance level. (See Chapter 2 of Leddon 2007 for further discussion.) However, in a variation of the task with target sentences featuring a quantified subject and reflexives that should be bound or pronouns that should be interpreted freely (as in (82)), children patterned in a manner more consistent with what would be predicted by reconstruction and Principles A and B, accessing the bound interpretation of the reflexive the vast majority of the time and only accessing the bound interpretation of the pronoun a small percentage of the time. Adults demonstrated the same pattern, only more pronounced.

(82) a. How confident in [himself_i/*k / him_*i/k] was every whale_i __?

Combined, these results underscore the importance of investigating a range of lexical elements within a target construction, the role of experimental methodology, and the variability in grammatical knowledge manifested both within and across tasks, which may be tied to grammatical knowledge or other factors.

6.5 Conclusions


In this chapter, I have reviewed a variety of syntactic phenomena, ranging from individual words and their syntactic constraints, to specific syntactic constructions, to grammatical mechanisms that require movement in the syntactic structure. Throughout the summary of these phenomena, I hope to have illustrated a number of key points. I emphasize three here.

First, the researcher interested in investigating syntactic knowledge and development in young children has a number of tried and true methodologies and paradigms at their fingertips. The choice of methodology or methodologies can have enormous implications for the conclusions one makes about a child’s grammar and the role of extragrammatical factors, so combining complementary methods may yield the most success. The choice of methodology is always conditioned by the phenomenon under investigation, the tools and participants available, and the research question being asked.

Second, syntax does not work alone. Many of the studies reviewed have showcased the tight connection between syntax, semantics, and pragmatics. It is therefore highly desirable to be attentive to, and to incorporate into the experimental design, satisfaction of felicity conditions and features of the discourse, such as common ground between the speaker and hearer and knowledge of what is presupposed. In addition, lessening the cognitive load and facilitating sentence processing may help to bring out certain readings that were thought not to exist. At the same time, the availability of certain readings may be an outcome of the incremental processing of a target sentence.

Third, the young child between 3 and 6 years of age displays an impressive repertoire of syntactic knowledge. Where this competence appears to diverge from that of adults, additional (creative) experimentation has often called into question the contribution of an immature grammar, and revealed the role of factors outside of the syntax. Children’s errors in production and interpretation are not only amusing, but are also especially informative of their developing knowledge. However, they cannot always be taken at face value. Future research should home in on those cases where children consistently access non-adult-like interpretations, and should determine the source of these aberrations. The answers to these questions will reveal the degree of continuity in language development, shed light on the nature of the errors observed in adults and neuroatypical populations, and inform us about the viability of claims emanating from syntactic theory.

References

Achimova, Asya, Kristen Syrett, Julien Musolino, and Viviane Déprez. 2017. Children’s developing knowledge of wh-/quantifier question-answer relations. Language Learning and Development 1: 80–99. Allen, Shanley E. M., and Martha B. Crago. 1996. Early passive acquisition in Inuktitut. Journal of Child Language 23: 129–155. Armon-Lotem, Sharon, Ewa Haman, Kristine Jensen de López, Magdalena Smoczynska, et al. 2016. A large-scale cross-linguistic investigation of the acquisition of passive. Language Acquisition 23: 27–56. Baldie, Brian J. 1976. The acquisition of the passive voice. Journal of Child Language 3: 331–348. Becker, Misha. 2005. Learning verbs without arguments: The problem of raising verbs. Journal of Psycholinguistic Research 34: 173–199. Becker, Misha. 2006. There began to be a learnability puzzle. Linguistic Inquiry 37: 441–456. Becker, Misha. 2015. Animacy and the acquisition of tough adjectives. Language Acquisition 22: 68–103. Becker, Misha, and Bruno Estigarribia. 2013. Harder words: Learning abstract verbs with opaque syntax. Language Learning and Development 9: 211–244.


Becker, Misha, Bruno Estigarribia, and Duna Gylfadottir. 2012. Tough-adjectives are easy to learn. In Supplemental proceedings of the 36th Boston University Conference on Language Development. http://www.bu.edu/bucld/proceedings/supplement/vol36/. Bencini, Giulia, and Virginia Valian. 2008. Abstract sentence representations in 3-year-olds: Evidence from language production and comprehension. Journal of Memory and Language 59: 97–113. Bever, Tom. 1970. The cognitive basis for linguistic structures. In J. R. Hayes (ed.), Cognition and the development of language, 279–362. New York: Wiley. Borer, Hagit, and Kenneth Wexler. 1987. The maturation of syntax. In Thomas Roeper and Edwin Williams (eds), Parameter setting, 123–172. Dordrecht: Reidel. Brooks, Patricia J., and Michael Tomasello. 1999. Young children learn to produce passives with nonce verbs. Developmental Psychology 35: 29–44. Brown, Roger. 1973. A first language: The early stages. Cambridge, MA: Harvard University Press. Budwig, Nancy. 2001. An exploration into children’s use of passives. In Michael Tomasello and Elizabeth Bates (eds), Language development: The essential readings, 1221–1252. Hoboken, NJ: Wiley-Blackwell. Cairns, Helen Smith, Dana McDaniel, Jennifer Ryan Hsu, and Michelle Rapp. 1994. A longitudinal study of principles of control and pronominal reference in child English. Language: 260–288. Chien, Yu-Chin, and Ken Wexler. 1990. Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1: 225–295. Chomsky, Noam. 1981/1993. Lectures on government and binding: The Pisa lectures. Berlin: Mouton de Gruyter. Conroy, Anastasia, and Jeffrey Lidz. 2007. Production/comprehension asymmetry in children’s why questions. In Alyona Belikova, Luisa Meroni, and Mari Umeda (eds), Proceedings of the 2nd Conference on Generative Approaches to Language Acquisition North America, 73–83. Somerville, MA: Cascadilla.
Conroy, Anastasia, Eri Takahashi, Jeffrey Lidz, and Colin Phillips. 2009. Equal treatment for all antecedents: How children succeed with Principle B. Linguistic Inquiry 40: 446–486. Crain, Stephen, and Mineharu Nakayama. 1987. Structure dependence in grammar formation. Language 63: 522–543. Crain, Stephen, and Rosalind Thornton. 1998. Investigations in Universal Grammar: A guide to experiments on the acquisition of syntax and semantics. Cambridge, MA: MIT Press. Crain, Stephen, Cecile McKee, and Maria Emiliani. 1990. Visiting relatives in Italy. In Lyn Frazier and Jill de Villiers (eds), Language processing and language acquisition, 335–356. Dordrecht: Kluwer. Crain, Stephen, Rosalind Thornton, and Keiko Murasugi. 2009. Capturing the evasive passive. Language Acquisition 16: 123–133. de Villiers, Jill. 1995. Questioning minds and answering machines. In Dawn MacLaughlin and Susan McEwen (eds), Proceedings of the 19th Annual Boston University Conference on Language Development, 20–36. Somerville, MA: Cascadilla. de Villiers, Jill, and Peter de Villiers. 1985. The acquisition of English. In Dan Slobin (ed.), The crosslinguistic study of language acquisition, vol. 1: The data, 27–140. Hillsdale, NJ: Erlbaum. de Villiers, Jill, and Thomas Roeper. 1991. Introduction. In Thomas L. Maxfield and Bernadette Plunkett (eds), The acquisition of wh, 1–18. Amherst, MA: GLSA.


de Villiers, Jill, and Thomas Roeper. 1995. Relative clauses are barriers to wh-movement for young children. Journal of Child Language 22: 389–404. de Villiers, Jill, Thomas Roeper, and Anne Vainikka. 1990. The acquisition of long distance rules. In Lyn Frazier and Jill de Villiers (eds), Language processing and acquisition, 257–297. Dordrecht: Kluwer. Demuth, Katherine. 1989. Maturation and the acquisition of the Sesotho passive. Language 65: 56–80. Demuth, Katherine, Francina Moloi, and Malillo Machobane. 2010. 3-Year-olds’ comprehension, production, and generalization of Sesotho passives. Cognition 115: 238–251. Diessel, Holger, and Michael Tomasello. 2000. The development of relative clauses in spontaneous child speech. Cognitive Linguistics 11: 131–151. Diessel, Holger, and Michael Tomasello. 2001. The acquisition of finite complement clauses in English: A corpus-based analysis. Cognitive Linguistics 12: 97–142. Dittmar, Miriam, Kirsten Abbot-Smith, Elena Lieven, and Michael Tomasello. 2014. Familiar verbs are not always easier than novel verbs: How German pre-school children comprehend active and passive sentences. Cognitive Science 38: 128–151. Dudley, Rachel. 2017. The role of input in discovering presupposition triggers: Figuring out what everybody already knew. Doctoral dissertation, University of Maryland, College Park. Dudley, Rachel, Naho Orita, Valentine Hacquard, and Jeffrey Lidz. 2015. Three-year-olds’ understanding of know and think. In Florian Schwarz (ed.), Experimental perspectives on presuppositions, 241–262. Dordrecht: Springer. Dudley, Rachel, Meredith Rowe, Valentine Hacquard, and Jeffrey Lidz. Submitted. Acquiring the factivity of know. MS, University of Maryland, College Park. Elbourne, Paul. 2005. On the acquisition of Principle B. Linguistic Inquiry 36: 333–365. Fiengo, Robert, and Robert May. 1994. Indices and identity. Cambridge, MA: MIT Press. Foley, Claire, Zelmira Nuñez del Prado, Isabella Barbier, and Barbara Lust. 2003. 
Knowledge of variable binding in VP-ellipsis: Language acquisition research and theory convergence. Syntax 6: 52–83. Fox, Danny. 1995. Condition C effects in ACD. In Rob Pensalfini and Hiroyuki Ura (eds), Papers on minimalist syntax, 105–120. Cambridge, MA: MIT Working Papers in Linguistics. Fox, Danny, and Yosef Grodzinsky. 1998. Children’s passive: A view from the by-phrase. Linguistic Inquiry 29: 311–332. Gerard, Juliana. 2016. The acquisition of adjunct control: Grammar and processing. Doctoral dissertation, University of Maryland, College Park. Gleitman, Lila R. 1990. The structural sources of verb meanings. Language Acquisition 1: 3–55. Gor, Vera, and Kristen Syrett. 2015. Picking up after sloppy children: What pronouns reveal about children’s analysis of English comparative constructions. In Elizabeth Grillo and Kyle Jepson (eds), Proceedings of the 39th Annual Boston University Conference on Language Development, 191–203. Somerville, MA: Cascadilla. Gordon, Peter. 1996. The truth value judgment task. In Dana McDaniel, Cecile McKee, and Helen Smith Cairns (eds), Methods for assessing children’s syntax, 211–232. Cambridge, MA: MIT Press. Gordon, Peter, and Jill Chafetz. 1990. Verb-based versus class-based accounts of actionality effects in children’s comprehension of passives. Cognition 36: 227–254. Gotowski, Megan. 2018. The acquisition of the get passive. Language Acquisition. https://doi.org/10.1080/10489223.2017.1391268.


Grimshaw, Jane, and Sara Thomas Rosen. 1990. Knowledge and obedience: The developmental status of the binding theory. Linguistic Inquiry 21: 187–222. Grolla, Elaine, and Jeffrey Lidz. 2018. A performance account for medial wh-questions in child English. In Anne B. Bertolini and Maxwell J. Kaplan (eds), Proceedings of the 42nd Annual Boston University Conference on Language Development, 289–302. Somerville, MA: Cascadilla. Gualmini, Andrea, and Stephen Crain. 2005. Operator conditioning. In Alejna Brugos, Linnea Micciulla, and Christine E. Smith (eds), Proceedings of the 28th Annual Boston University Conference on Language Development, 232–243. Somerville, MA: Cascadilla. Gualmini, Andrea, Sarah Hulsey, Valentine Hacquard, and Danny Fox. 2008. The question-answer requirement for scope assignment. Natural Language Semantics 16: 205–237. Guasti, Maria Teresa, and Gennaro Chierchia. 1999/2000. Backward versus forward anaphora: Reconstruction in child grammar. Language Acquisition 8: 129–170. Guasti, Maria Teresa. 2002. Language acquisition. Cambridge, MA: MIT Press. Hamburger, Henry, and Stephen Crain. 1982. Relative acquisition. In S. Kuczaj (ed.), Language development, vol. 2, 245–274. Hillsdale, NJ: Erlbaum. Harris, Frances N., and June A. Flora. 1982. Children’s use of get passives. Journal of Psycholinguistic Research 11: 297–311. Hirsch, Christopher, and Kenneth Wexler. 2004. Children’s passives and their resulting interpretation. In Kamil Ud Deen, Jun Nomura, Barbara Schulz, and Bonnie D. Schwartz (eds), Proceedings of the Inaugural Conference of the Generative Approaches to Language Acquisition North America, 125–136. Storrs, CT: UConn Linguistics. Horgan, Dianne. 1978. The development of the full passive. Journal of Child Language 5: 65–80. Hsu, Jennifer R., Helen Smith Cairns, and Robert W. Fiengo. 1985. The development of grammars underlying children’s interpretation of complex sentences. Cognition 20: 25–48. Jakubowicz, Celia. 1989. Invariance of Universal Grammar principles in the acquisition of reflexives, anaphors, passive, promis and raising constructions in French. Paper presented at the 14th Annual Boston University Conference on Language Development. Kennedy, Christopher. 1997. Antecedent-contained deletion and the syntax of quantification. Linguistic Inquiry 28: 662–688. Kiguchi, Hirohisa, and Rosalind Thornton. 2004. Binding principles and ACD constructions in child grammars. Syntax 7: 234–271. Labov, William, and Teresa Labov. 1978. Learning the syntax of questions. In Robin N. Campbell and Philip Smith (eds), Recent advances in the psychology of language III, 1–44. New York: Plenum Press. Landau, Barbara, and Lila R. Gleitman. 1985. Language and experience: Evidence from the blind child. Cambridge, MA: Harvard University Press. Larson, Richard, and Robert May. 1990. Antecedent containment or vacuous movement: Reply to Baltin. Linguistic Inquiry 21: 103–122. Leddon, Erin. 2007. Reconstruction effects in child language. Doctoral dissertation, Northwestern University, Evanston, IL. Leddon, Erin, and Jeffrey Lidz. 2005. Reconstruction effects in child language. In David Bamman, Tatiana Magnitskaia, and Colleen Zaller (eds), Proceedings of the 30th Annual Boston University Conference on Language Development, 328–339. Somerville, MA: Cascadilla. Lempert, Henrietta. 1990. Acquisition of passives: The role of patient animacy, salience, and lexical accessibility. Journal of Child Language 17: 677–696. Lewis, Shevaun. 2013. Pragmatic enrichment in language processing and development. Doctoral dissertation, University of Maryland, College Park.


Lewis, Shevaun, Valentine Hacquard, and Jeffrey Lidz. 2012. The semantics and pragmatics of belief reports in preschoolers. In Proceedings of Semantics and Linguistic Theory 22, 247–267. Ithaca, NY: CLC. Lidz, Jeffrey, and Julien Musolino. 2002. Children’s command of quantification. Cognition 84: 113–154. Lidz, Jeffrey, Erin McMahon, Kristen Syrett, Joshua Viau, et al. (eds). 2004. Proceedings of the 28th Annual Boston University Conference on Language Development, 340–349. Somerville, MA: Cascadilla. Lust, Barbara, Kate Loveland, and Renée Kornet. 1980. The development of anaphora in first language: Syntactic and pragmatic constraints. Linguistic Analysis 6: 359–391. MacWhinney, Brian. 2000. The CHILDES project: Tools for analyzing talk, 3rd edn. Mahwah, NJ: Erlbaum. Maratsos, Michael. 1974. Children who get worse at understanding the passive: A replication of Bever. Journal of Psycholinguistic Research 3: 65–74. Maratsos, Michael, Dana E. C. Fox, Judith A. Becker, and Mary Anne Chalkley. 1985. Semantic restrictions on children’s passives. Cognition 19: 167–192. Marchman, Virginia A., Elizabeth Bates, Antoinette Burkardt, and Amy B. Good. 1991. Functional constraints of the acquisition of the passive: Toward a model of the competence to perform. First Language 11: 65–92. Matsuo, Ayumi, and Nigel Duffield. 2001. VP-Ellipsis and anaphora in first language acquisition. Language Acquisition 9: 301–327. May, Robert, and Samuel Jay Keyser. 1985. Logical form: Its structure and derivation. Cambridge, MA: MIT Press. McDaniel, Dana, and Helen Smith Cairns. 1990. The processing and acquisition of control structures by young children. In Lyn Frazier and Jill de Villiers (eds), Language processing and acquisition, 313–325. Dordrecht: Kluwer. McDaniel, Dana, Helen Smith Cairns, and Jennifer Ryan Hsu. 1990a. Binding principles in the grammars of young children. Language Acquisition 1: 121–139. McDaniel, Dana, Helen Smith Cairns, and Jennifer Ryan Hsu. 1990b. 
Control principles in the grammars of young children. Language Acquisition 1: 297–335. McKee, Cecile, Dana McDaniel, and Jesse Snedeker. 1998. Relatives children say. Journal of Psycholinguistic Research 27: 573–596. Merchant, Jason. 2000. Antecedent-contained deletion in negative polarity items. Syntax 3: 144–150. Messenger, Katherine, Holly Branigan, Janet McLean, and Antonella Sorace. 2009. Semantic factors in young children’s comprehension and production of passives. In Jane Chandlee, Michelle Franchini, Sandy Lord, and Gudrun-Marion Rheiner (eds), Proceedings of the 33rd Annual Boston University Conference on Language Development, vol. 2, 355–366. Somerville, MA: Cascadilla. Messenger, Katherine, Holly P. Branigan, and Janet F. McLean. 2012a. Is children’s acquisition of the passive a staged process? Evidence from six- and nine-year-olds’ production of passives. Journal of Child Language 39: 991–1016. Messenger, Katherine, Holly P. Branigan, Janet F. McLean, and Antonella Sorace. 2012b. Is young children’s passive syntax semantically constrained? Evidence from syntactic priming. Journal of Memory and Language 66: 568–587. Musolino, Julien. 1998. Universal Grammar and the acquisition of semantic knowledge: An experimental investigation into the acquisition of quantifier-negation interaction in English. Doctoral dissertation, University of Maryland, College Park.


Musolino, Julien, Stephen Crain, and Rosalind Thornton. 2000. Navigating negative quantificational space. Linguistics 38: 1–32. Musolino, Julien, and Andrea Gualmini. 2004. The role of partitivity in child language. Language Acquisition 12: 97–107. Musolino, Julien, and Jeffrey Lidz. 2006. Why children aren’t universally successful with quantification. Linguistics 44: 817–852. Omaki, Akira, Imogen Davidson White, Takuya Goro, Jeffrey Lidz, and Colin Phillips. 2014. No fear of commitment: Children’s incremental interpretation in English and Japanese wh-questions. Language Learning and Development 10: 206–233. Pinker, Steven, David S. Lebeaux, and Loren A. Frost. 1987. Productivity and constraints in the acquisition of the passive. Cognition 26: 195–267. Reinhart, Tanya. 1981. Definite NP anaphora and c-command domains. Linguistic Inquiry 12: 605–635. Reinhart, Tanya. 1983. Anaphora and semantic interpretation. Chicago, IL: University of Chicago Press. Roeper, Thomas, and Jill de Villiers. 1993. The emergence of bound variable structures. In E. Reuland and W. Abraham (eds), Knowledge and language, vol. 1: From Orwell’s Problem to Plato’s Problem, 105–139. Dordrecht: Kluwer. Sheldon, Amy. 1974. The role of parallel function in the acquisition of relative clauses in English. Journal of Verbal Learning and Verbal Behavior 13: 272–281. Snedeker, Jesse, and John Trueswell. 2004. The developing constraints on parsing decisions: The role of lexical-biases and referential scenes in child and adult sentence processing. Cognitive Psychology 49: 238–299. Snyder, William, and Karin Stromswold. 1997. The structure and acquisition of English dative constructions. Linguistic Inquiry 28: 281–317. Solan, Lawrence. 1983. Pronominal reference: Child language and the theory of grammar. Dordrecht: Reidel. Stromswold, Karin. 1995. The acquisition of subject and object wh-questions. Language Acquisition 4: 5–48. Sudhalter, Vicki, and Martin D. S. Braine. 1985. 
How does comprehension of passives develop? A comparison of actional and experiential verbs. Journal of Child Language 12: 455–470. Syrett, Kristen. 2015a. Events and agents in the acquisition of universal quantification. Theoretical Linguistics 41: 211–222. Syrett, Kristen. 2015b. Experimental support for inverse scope readings of finite-clause embedded Antecedent-Contained Deletion sentences. Linguistic Inquiry 46: 579–592. Syrett, Kristen. 2015c. QR out of a tensed clause: Evidence from Antecedent-Contained Deletion. Ratio Special issue: Investigating Meaning, ed. N. Hansen and E. Borg, 28: 395–421. Syrett, Kristen, and Jeffrey Lidz. 2009. QR in child grammar: Evidence from Antecedent-Contained Deletion. Language Acquisition 16: 67–81. Syrett, Kristen, and Jeffrey Lidz. 2011. Competence, performance and the locality of Quantifier Raising: Evidence from 4-year-old children. Linguistic Inquiry 42: 305–337. Tavakolian, Susan. 1978. Structural principles in the acquisition of complex sentences. Doctoral dissertation, University of Massachusetts, Amherst. Tavakolian, Susan. 1981. The conjoined clause analysis of relative clauses. In S. Tavakolian (ed.), Language acquisition and linguistic theory, 167–187. Cambridge, MA: MIT Press. Thornton, Rosalind. 1995. Referentiality and wh-movement in child English: Juvenile D-Linkuency. Language Acquisition 4: 139–175.


Thornton, Rosalind. 2004. Why continuity. In Alejna Brugos, Linnea Micciulla, and Christine E. Smith (eds), Proceedings of the 28th Annual Boston University Conference on Language Development, 620–632. Somerville, MA: Cascadilla. Thornton, Rosalind, and Kenneth Wexler. 1999. Principle B, VP ellipsis and interpretation in child grammars. Cambridge, MA: MIT Press. Thothathiri, Malathi, and Jesse Snedeker. 2008. Syntactic priming during language comprehension in three- and four-year-old children. Journal of Memory and Language 58: 188–213. Tomasello, Michael, Patricia Brooks, and Elissa Stern. 1998. Learning to produce passive utterances through discourse. First Language 18: 223–237. Turner, Elizabeth Ann, and Ragnar Rommetveit. 1967. Experimental manipulation of the production of active and passive voice in children. Language and Speech 10: 169–180. van Valin Jr., Robert D. 1998. The acquisition of WH-questions and the mechanisms of language acquisition. In Michael Tomasello (ed.), The new psychology of language: Cognitive and functional approaches to language structure, 203–229. New York: Psychology Press. Viau, Joshua, Jeffrey Lidz, and Julien Musolino. 2010. Priming of abstract logical representations in 4-year-olds. Language Acquisition 17: 26–50. Weissenborn, Jürgen, Thomas Roeper, and Jill de Villiers. 1991. The acquisition of wh-movement in French and German. In B. Plunkett and T. Maxfield (eds), The acquisition of wh, 43–78. Amherst, MA: GLSA. White, Aaron, Valentine Hacquard, and Jeffrey Lidz. 2018. The labeling problem in syntactic bootstrapping: Main clause syntax in the acquisition of propositional attitude verbs. In Kristen Syrett and Sudha Arunachalam (eds), Semantics in acquisition, 198–220. Amsterdam: Benjamins. Wimmer, Heinz, and Josef Perner. 1983. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition 13: 103–128. Yamakoshi, K. 2002. 
The acquisition of wh/every interaction in English. In Barbora Scarabela, Sarah Fish, and Anna H.-J. Do (eds), Proceedings of the 26th Annual Boston University Conference on Language Development, 769–780. Somerville, MA: Cascadilla. Zuckerman, Shalom, Manuela Pinto, Elly Koutamanis, and Yoïn van Spijk. 2015. A new method for language comprehension reveals better performance on passive and principle B constructions. In Jennifer Scott and Deb Waughtal (eds), Proceedings of the 40th Annual Boston University Conference on Language Development, 443–456. Somerville, MA: Cascadilla.

Chapter 7

Modeling syntactic acquisition

Lisa S. Pearl

7.1 Why model?


Just like other scientific tools of investigation, modeling is something we use to answer certain kinds of questions. For syntactic acquisition, these questions tend to concern the process of acquisition that yields adult syntactic knowledge—that is, how exactly syntactic acquisition proceeds, using particular learning strategies (Pearl and Sprouse 2015). In essence, an informative model of syntactic acquisition is the embodiment of a specific theory about syntactic acquisition. So, to build an informative syntactic acquisition model, you need to first have a theory about how syntactic acquisition works. Then, the model can be used to (1) make all the components of that acquisition theory explicit, (2) evaluate whether it actually works, and (3) determine precisely what makes it work (or not work).

7.1.1 Making the components explicit

It might seem surprising at first to claim that one benefit of modeling is that it makes theory components explicit, but this truly is important. It often turns out that the acquisition theories that seem explicit to humans don’t actually specify all the details necessary to implement the strategies these theories describe. For example, suppose a proposed learning strategy is that children use triggers in their input to signal certain parametric values, and the triggers are explicitly defined ahead of time for children (perhaps through Universal Grammar (UG)). As a concrete example, let’s say that the trigger for wh-movement is seeing a wh-word in a position different from where it’s understood (e.g. what in the question What did the penguin do [what]?, where the bracketed copy marks the position in which what is understood). So, the learning strategy for acquiring the right wh-movement structure in your language involves identifying these wh-movement triggers.

Are we finished? Not quite. What do children need in order to recognize the appropriate wh-movement trigger in their input? They probably at least need to know that a certain word is one of these special wh-words (in English, this would include who, what, how, and so on). They probably need to be able to reliably segment the words in the utterance and recognize that the wh-word is not appearing where it’s understood (which means they need to understand enough of what the utterance means). They probably need to be able to remember the existence of the fronted wh-word in the utterance long enough and reliably enough so they can update their internal parameter value. They probably need to ignore the utterances in English where the wh-word doesn’t move (e.g. echo questions like The penguin did what?). And then, what about the wh-in-situ option (for languages like Mandarin Chinese and Japanese): is there a trigger for that as well? If so, what is it?

More specifically, suppose we only have a trigger defined for wh-movement. In this case, we’re effectively implying that the lack of that trigger indicates wh-in-situ. This is fine as a supposition, but how exactly does that work in practice? For instance, is wh-in-situ a default setting that’s overridden by the wh-movement trigger? This isn’t unreasonable as a default assumption and might result from a general tendency to disprefer movement unless the language indicates otherwise (e.g. based on something like the Minimal Chain Principle: De Vincenzi 1991; Sakas and Fodor 2012; Fodor and Sakas 2017). Or, if there are no defaults, does a child use indirect negative evidence to decide that her language uses wh-in-situ? In particular, she would observe that wh-movement keeps not happening when she expects it to. If this is the learning strategy, then how long does she have to wait before she decides that her language doesn’t have wh-movement?

These are just some of the aspects of the triggering theory that need to be specified in order to implement it in a model. More generally, by trying to build an implementable model for a particular syntactic acquisition theory, we can see where the gaps are. This is because a computer program can only implement an acquisition theory where every relevant detail is specified (Kol, Nir, and Wintner 2014; Pearl 2014; Pearl and Sprouse 2015). So, even if an acquisition theory has already been developed, a computational model provides a way to flesh out the necessary components of that theory.
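To make the point about unspecified details concrete, here is a minimal sketch of what a program would have to commit to before the triggering strategy for wh-movement can run at all. Everything named here is an illustrative assumption rather than part of any published model: the `WH_WORDS` lexicon, the `Utterance` record (which presupposes that the child can already tell where a wh-word is understood), the wh-in-situ default, and the `triggers_needed` threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical wh-lexicon: the modeled child is assumed to know these already.
WH_WORDS = {"who", "what", "how", "where", "when", "why", "which"}


@dataclass
class Utterance:
    words: List[str]
    wh_index: Optional[int]          # position where the wh-word is pronounced
    understood_index: Optional[int]  # position where the wh-word is understood


def is_wh_movement_trigger(utt: Utterance) -> bool:
    """True if a known wh-word is pronounced away from where it is understood."""
    if utt.wh_index is None or utt.understood_index is None:
        return False
    is_wh = utt.words[utt.wh_index].lower() in WH_WORDS
    return is_wh and utt.wh_index != utt.understood_index


def learn_wh_parameter(
    utterances: Iterable[Utterance],
    default: str = "wh-in-situ",   # assumed default value
    triggers_needed: int = 3,      # assumed evidence threshold
) -> str:
    """Keep the default until enough wh-movement triggers have been seen."""
    seen = 0
    for utt in utterances:
        # Echo questions fail the trigger test, so they are simply ignored.
        if is_wh_movement_trigger(utt):
            seen += 1
            if seen >= triggers_needed:
                return "wh-movement"
    return default


# Toy, hand-annotated input (hypothetical; not corpus data).
moved = Utterance("what did the penguin do".split(), wh_index=0, understood_index=5)
echo = Utterance("the penguin did what".split(), wh_index=3, understood_index=3)

print(learn_wh_parameter([moved, echo, moved, moved]))
print(learn_wh_parameter([echo] * 10))
```

With this toy input, the first call sees three triggers and switches to wh-movement, while the second sees only echo questions and stays at the default. Each hard-coded choice (the lexicon, the annotations, the default, the threshold) corresponds to one of the questions raised above that the verbal theory leaves open.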

7.1.2 Evaluating the theory and explaining what happened

Once an acquisition theory is specified enough to implement in a computational model, we can then evaluate it by comparing the predictions it generates against the empirical data available from children. I should note that we have to be somewhat careful about interpreting model results. There are two basic outcomes: (1) the model predictions match children’s data, or (2) they don’t.

If the predictions match, this is an existence proof that the acquisition theory, as implemented by the computational model, is a way that acquisition could proceed. That is, the computational model demonstrates exactly how successful acquisition could work. So, this model is support for that acquisition theory. Notably, however, this modeling evidence doesn’t rule out alternative acquisition theories. This is why I emphasized could above: Just because the model demonstrates one way acquisition could work doesn’t mean that other ways couldn’t also work. So, modeling evidence is interpreted with respect to the implemented acquisition theory only.

Still, of course, sometimes the model predictions don’t match children’s data. What then? This is then evidence against that acquisition theory, as implemented by the model. That is, we can’t immediately rule out all versions of the acquisition theory unless we explicitly implement them (or rule them out for principled reasons). Remember: A model often specifies components of a theory that the original theory didn’t. So, if this particular theory implementation doesn’t work, maybe it’s a problem with those components, and not the theory more broadly. The only way to know is to test each of the possible implementations, or rule them out in principle somehow. This of course becomes hard to do in practice, since there may be quite a number of implementations for any given acquisition theory. (Just think about all the implementational details for the triggering theory described in the previous section.) The other option, ruling implementations out on principle, depends greatly on the principles available and how agreed upon they are. (Are these principles of human computational capacity, theoretical economy, something else?) This practical consideration about model interpretation is why we may often see published results involving models that succeed (even if the results also include models that fail), rather than only results about models that fail.

Related to that, if you have an implemented model (whether it succeeds or fails), a very useful benefit is that you can look inside it to determine what exactly makes it work or not work. This is something that’s much more difficult to do with children’s minds. That is, we can sift through the components of the implemented acquisition theory to see which ones are important for acquisition success. For example, suppose we have a successful implementation of the triggering theory for learning about wh-movement. We can see if it’s important for English children to ignore wh-echo questions where there’s no wh-movement, or how necessary a Mandarin Chinese default wh-in-situ value is. Suppose we find that these components are vital for filtering children’s input appropriately (e.g. ignoring echo questions) or navigating the hypothesis space based on the input (e.g. a default wh-in-situ value); that is, without them, the model’s predictions don’t match children’s behavior. Then, we can say that these are necessary components of the successful trigger-based acquisition theory that explains how children learn where wh-words appear in their language. Moreover, we can also explain why (e.g. one component filters the input and the other helps children navigate the hypothesis space). This highlights how modeling can be used as a tool for both developing and refining acquisition theories.

Notably, an acquisition theory actually includes two types of theories: theories of the learning process and theories of the representations to be learned. An informative model requires us to be explicit about both. So, when we build a model that incorporates both theory types and see the results, we get feedback about both. To understand why an informative model incorporates theories of the learning process and theories of representation, it’s helpful to consider all the pieces that go into characterizing the language acquisition task.
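The idea of looking inside an implemented model, discussed earlier in this section, can be sketched as an ablation check: switch one component off at a time and see whether the model’s output still matches the target. The learner below is a deliberately simple, hypothetical stand-in (it commits to wh-movement only when every wh-datum it attends to shows movement), and the data labels `"moved"`, `"echo"`, and `"in-situ"` are toy annotations, not corpus counts.

```python
from typing import Callable, Dict, List, Sequence


def toy_trigger_learner(
    data: Sequence[str],
    filter_echo: bool = True,      # component 1: ignore echo questions
    default_in_situ: bool = True,  # component 2: wh-in-situ as the default value
) -> str:
    """Hypothetical learner that needs unanimous evidence to choose wh-movement."""
    attended = [d for d in data if not (filter_echo and d == "echo")]
    if attended and all(d == "moved" for d in attended):
        return "wh-movement"
    return "wh-in-situ" if default_in_situ else "undecided"


def ablation_report(
    learner: Callable[..., str],
    data: Sequence[str],
    target: str,
    components: List[str],
) -> Dict[str, bool]:
    """Record whether the target is reached with each component switched off."""
    report = {"full model": learner(data) == target}
    for name in components:
        report[f"without {name}"] = learner(data, **{name: False}) == target
    return report


components = ["filter_echo", "default_in_situ"]
english_like = ["moved"] * 8 + ["echo"] * 2   # toy input, hypothetical proportions
mandarin_like = ["in-situ"] * 10              # toy input

print(ablation_report(toy_trigger_learner, english_like, "wh-movement", components))
print(ablation_report(toy_trigger_learner, mandarin_like, "wh-in-situ", components))
```

In this toy setup, removing the echo filter breaks learning on the English-like input, and removing the default breaks learning on the Mandarin-like input. That is the kind of result that would license calling a component necessary, though only for this particular implementation.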

7.1.3 Characterizing the acquisition task

There have been several recent discussions of the acquisition process (Pearl and Mis 2011; Pearl and Sprouse 2013a; Pearl 2014; Lidz and Gagliardi 2015; Omaki and Lidz 2015; Pearl and Sprouse 2015; Pearl and Mis 2016), and I find the model articulated by Omaki and Lidz (2015) and more fully by Lidz and Gagliardi (2015) to be especially helpful (an adapted version is shown in Figure 7.1). This model specifies components external and internal to the child during the acquisition process, and is meant to capture the iterative process of acquisition unfolding over time. I should note that this is one particular way of specifying different important components of the acquisition process; it very well may be that future versions alter how the specified components relate to each other or even what the specified components are. For now, however, I think it sets us up quite well to think concretely about components of the acquisition task that are important for computational models of acquisition.

External components are observable. We can observe the input signal available to children during acquisition, and we can also observe children’s behavior at any stage of development, either through naturalistic productions or through clever experimental designs.

The internal components involve several clusters. The first cluster centers around perceptual encoding; that is, it concerns the information in the input signal that the child is able to perceive at this stage of development. The perceptual intake that results from this depends on the child’s developing grammar (i.e. the linguistic knowledge currently available), the parsing procedures that are able to apply that linguistic knowledge in real time to the incoming input signal, and a variety of extralinguistic systems (e.g. memory, pattern recognition) that are also necessary for extracting information from the input signal. This serves to highlight the intrinsic link between developing representations and developing processing abilities during acquisition (Lidz and Gagliardi 2015; Omaki and Lidz 2015; Phillips and Ehrenhofer 2015).

When the child generates observable behavior, she relies on the current representations she’s been able to perceptually encode (the perceptual intake) and her developing grammar. She then applies her production systems to those representations to generate behavior like speaking (which relies on utterance generation processes) or responding non-verbally (e.g. looking at a picture representing a scene described by an utterance she just heard, which relies on extralinguistic systems like motor control, attention, and decision-making).

[Figure 7.1 (diagram not reproduced). External components: Input, Behavior. Internal components: Perceptual encoding (parsing procedures, extralinguistic systems), which yields the Perceptual intake; Production (utterance generation, extralinguistic systems); Inference (constraints & filters, Acquisitional intake, extralinguistic systems); and the Developing grammar.]

fig. 7.1 Model of the acquisition process adapted from Lidz and Gagliardi (2015), highlighting the contributions of several key components. Observable components are external to the child (input signal and the child’s behavior). Internal components include the pieces used to perceptually encode information from the input signal (developing grammar, perceptual encoding), the pieces used to produce the observable behavior (perceptual intake, developing grammar, and production systems), and the pieces used for inference over the perceptually encoded intake (inference). These yield the next stage of the developing grammar, which itself is used in subsequent perceptual encoding and production.

The last cluster involves the inference process, that is, the process of updating internal hypotheses about the developing grammar, given the input data perceived as relevant. The data perceived as relevant has been referred to as the acquisitional intake, and is based on the perceptual intake. Notably, the acquisitional intake is typically not all of the perceptual intake. That is, it’s not everything the child is able to encode. Instead, depending on what the child is trying to learn, what’s relevant is likely some subset of the perceptual intake. This is one place where UG can have an impact: It can filter the perceptual intake down to the relevant pieces by providing both constraints on possible hypotheses and attentional filters. Inference then operates over the acquisitional intake, and typically involves extralinguistic abilities like probabilistic inference, statistical learning, or hypothesis testing.

For example, consider the wh-movement triggers we discussed before. Suppose a child hears What did the penguin do? and is at the stage of development when she can perceptually encode several aspects of the utterance: the individual words, the utterance’s phrase structure, and the dependency between what and the place where it’s understood after do. This is the child’s perceptual intake. If the child is still learning whether her language has wh-movement, UG will highlight for her the salient information in her representation: what has moved, as indicated by the dependency. This is an example attentional filter provided by UG; the fact that the child is only considering +wh-movement and -wh-movement as her hypotheses (rather than various partial wh-movement options that only apply in certain contexts) would be an example hypothesis space constraint provided by UG. The wh-movement part of her perceptual intake is then relevant for learning about wh-movement, and so the wh-movement acquisitional intake consists of that information alone (i.e. what moved); the rest of the information is irrelevant for learning about wh-movement. Inference occurs over this piece of acquisitional intake, presumably strengthening the child’s belief that her language has wh-movement. This inference process thus updates the child’s developing grammar.

Because the developing grammar is involved in the perceptual encoding process and the behavior production process, every update to the developing grammar can impact both what the child is able to perceive from the input signal and what the child is able to produce. In our example of wh-movement, suppose the child has just decided that her language uses wh-movement, based on the acquisitional intake accrued. This may allow her developing parser to extract wh-dependencies more reliably in future utterances. So, for instance, she might now be able to correctly extract the wh-dependency in a longer utterance like What did Jack think that Lily said the penguin did?, where before this information might not have been available to her. Similarly, she may be able to produce longer wh-dependencies in her output and respond correctly to utterances involving longer wh-dependencies.

Inspired by this characterization of the acquisition process, I want to suggest several components for precisely defining the acquisition task that an acquisition theory is meant to solve.

1. Initial state: What knowledge, abilities, and learning biases does the modeled child already have? These include both language-specific and domain-general components. In Figure 7.1, the initial state encompasses the current status of the developing grammar, parsing procedures, utterance generation, constraints and filters, and extralinguistic systems in perceptual encoding, production, and inference. This is the starting point for the modeled child, given the acquisition task being considered.

2. Data intake: What data is the modeled child learning from? This captures the pipeline from the external input signal to perceptual encoding to the acquisitional intake. The external input is what’s available for the child to learn from. Her perceptual encoding filters down that input signal to what she’s capable of perceiving at this stage of development (the perceptual intake). The acquisitional intake is the result of further winnowing down, based on the constraints and filters in the modeled child’s initial state (e.g., provided by UG). So, this is where we can see the immediate impact of the modeled child’s initial state: those initial state components influence what’s in the modeled child’s acquisitional intake.

3. Inference: How are updates to the modeled child’s internal representations made? This is the inference from Figure 7.1, which operates over the acquisitional intake to yield the latest developing grammar representations.

4. Learning period: How long does the modeled child have to learn? This is represented by the entirety of Figure 7.1, and primarily involves the developing grammar. That is, this is when the iterative process of updating the developing grammar occurs.

5. Target state: What does it mean for the modeled child to succeed at learning? We can assess this by matching model output against observable data from children, i.e. their behavior in Figure 7.1. So, it’s useful to have a model that can generate behavioral output (e.g. Perfors, Tenenbaum, and Regier 2011; Pearl and Sprouse 2013b; 2015; Pearl and Mis 2016; Savinelli, Scontras, and Pearl 2017; Bar-Sever, Lee, Scontras, and Pearl 2018; Bates, Pearl, and Braunwald 2018; Savinelli, Scontras, and Pearl 2018; Pearl and Sprouse 2018a). However, if we’re confident about which internal representation a particular observable behavior corresponds to (i.e. a specific developing grammar component, such as +wh-movement), we may match model output directly against that grammatical knowledge (e.g. Yang 2004; Pearl and Sprouse 2018b).

These components correspond to implementable computational model components of an acquisition theory. More specifically, the acquisition task can be characterized by these components, and an acquisition theory consists of specifying each component according to a theory of developing representations and a theory of developing processing abilities.

In the rest of this chapter, I’ll first highlight some modeling framework basics, including considerations of cognitive plausibility, different explanatory levels of modeling, and some commonly used inference mechanisms. I’ll then discuss several modeling case studies in syntactic acquisition, including both parametric and non-parametric approaches to modeling syntactic acquisition. I’ll conclude with some thoughts on exciting new directions for syntactic acquisition modeling.
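The five components listed above (initial state, data intake, inference, learning period, target state) map fairly directly onto the slots of a program. The following is a minimal sketch under stated assumptions, not a model from the literature: the `AcquisitionTask` class, the representation of the developing grammar as a single belief in wh-movement, the linear update rule with rate 0.2, and the 0.9 success threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# (utterance, did the wh-word move?); None means the utterance has no wh-word.
Signal = Tuple[str, Optional[bool]]


@dataclass
class AcquisitionTask:
    initial_state: float                                   # 1. starting belief in wh-movement
    perceive: Callable[[float, Signal], Optional[Signal]]  # 2a. input -> perceptual intake
    select: Callable[[Signal], Optional[bool]]             # 2b. perceptual -> acquisitional intake
    infer: Callable[[float, bool], float]                  # 3. update the developing grammar
    learning_period: int                                   # 4. number of input items available
    reached_target: Callable[[float], bool]                # 5. success criterion


def run(task: AcquisitionTask, input_stream: Iterable[Signal]) -> Tuple[float, bool]:
    """Iterate encode -> filter -> infer until the learning period ends."""
    grammar = task.initial_state
    for position, signal in enumerate(input_stream):
        if position >= task.learning_period:
            break
        percept = task.perceive(grammar, signal)
        if percept is None:      # nothing could be encoded
            continue
        intake = task.select(percept)
        if intake is None:       # encoded, but irrelevant for this parameter
            continue
        grammar = task.infer(grammar, intake)
    return grammar, task.reached_target(grammar)


def linear_update(belief: float, moved: bool, rate: float = 0.2) -> float:
    """Assumed update rule: nudge the belief toward 1 (moved) or 0 (not moved)."""
    goal = 1.0 if moved else 0.0
    return belief + rate * (goal - belief)


def make_task(learning_period: int) -> AcquisitionTask:
    return AcquisitionTask(
        initial_state=0.5,
        perceive=lambda grammar, signal: signal,  # assume everything is encodable
        select=lambda percept: percept[1],        # keep only the wh-movement information
        infer=linear_update,
        learning_period=learning_period,
        reached_target=lambda belief: belief > 0.9,
    )


# Toy input stream (hypothetical; not child-directed speech data).
toy_input = [("what did the penguin do", True), ("the penguin swam", None)] * 10

print(run(make_task(20), toy_input))
print(run(make_task(6), toy_input))
```

Separating the slots this way means a different acquisition theory can be expressed by swapping one function (say, a `perceive` that depends on the current grammar) while holding the others fixed. In this toy run, the longer learning period reaches the assumed target and the shorter one does not, which shows how the learning period interacts with the inference rule.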

7.2 Some modeling framework basics


7.2.1 Designing informative models: Cognitive plausibility

In order for our modeling results to tell us something about how children develop syntactic knowledge (i.e. for a model to be an informative model and not just an interesting programming exercise), we need to believe that the model reasonably approximates aspects of a child’s acquisition process. The way to do this is to make sure the model components are cognitively plausible. That is, we make reasonable assumptions for what’s actually going on during the acquisition process in children when implementing each model component. But how do we know what’s cognitively plausible for any given component? This is where prior theoretical, corpus, and experimental research can help.

Theoretical research can help define parts of the initial state (the developing grammar, the contents of UG), which then impacts the data in the modeled child’s acquisitional intake. The inference process may also be defined by theoretical work (e.g. language-specific inference mechanisms, such as the Structural Triggers Learner of Sakas and Fodor: Sakas and Fodor 2001; 2012; Sakas 2016). The target knowledge state (the developed grammar) is also typically defined by theoretical proposals for knowledge representations.

Corpus analysis can help define aspects of the data intake, in particular the input that children encounter as child-directed speech (which they subsequently perceptually encode). Corpus analysis can also provide quantitative descriptions of children’s linguistic productions, which is one type of behavior that can indicate the underlying grammar (either the developing representations during the learning period or the target representations after the learning period is completed).

Experimental results can help define parts of the initial state (the parsing and extralinguistic abilities available at a particular age), which impact how the input is perceived (perceptual intake). Experimental results can also define how children’s developing production systems generate observable behavior from underlying linguistic representations. Experimental results additionally help define what inference abilities are available for the modeled child, how long the modeled child’s learning period is, and what behavior the modeled child should display both during the learning period and afterwards, when the target knowledge has been acquired.

Practically speaking, it can still be non-trivial to make each model component cognitively plausible, despite all this information. For example, let’s consider the input component, something external to the child that we might try to estimate using corpus analysis. The key question is analysis of what: Do we think the relevant information is contained only in child-directed utterances (which we have large samples of from electronic resources like the CHILDES database (MacWhinney 2000)), or do we also need detailed information about the visual scene, accompanying actions, or other discourse context (some of which is available in CHILDES, but far less often)? If what we believe to be relevant input information is not easily available in existing corpora, corpus analysis won’t help. We’ll have to make an educated guess about what reasonable input looks like for those dimensions that we don’t have precise estimates for.

As another example, consider the learning period. For some acquisition problems, we have empirical data about exactly what children know when; for others, we may only have the typically developing adult knowledge state or an atypically developing learner’s final knowledge state. In these latter cases, it may be difficult to determine what should count as a plausible learning period. Again, because modeling requires us to implement all the model components explicitly, we may simply have to make an educated guess.

The target state can also present cognitive plausibility challenges, since we have to consider exactly what representations the modeled child ought to learn and what behavior that modeled child ought to produce in order to demonstrate that the target representation has been learned. If we have detailed empirical data available about the stages of learning (e.g. through experimental results or corpus analysis of children’s productions), this can be a reasonable comparison for the modeled child’s output: we can try to capture the appropriate learning trajectory (e.g. Alishahi and Stevenson 2008). However, if we don’t, we may need to rely on other measures of what counts as acquisition success (Pearl 2014). Perhaps the modeled child should attain adult-like knowledge. If so, we can use behavior correlated with adult knowledge as the desired target behavior (e.g., Pearl and Sprouse 2013b,a; 2015; 2018b). Perhaps we have a measure of behavior at one particular age (when the child may not yet have the adult knowledge). If so, we can use that behavior as a metric of what the learner should have learned by that age (e.g., Pearl and Mis 2016; Savinelli et al. 2017; Pearl and Sprouse 2018a). Perhaps we know that the target knowledge will be used to bootstrap future acquisition processes (e.g. the developing grammar impacts future perceptual encoding of linguistic representations). If so, we can measure how useful the modeled child’s developing representations are, regardless of whether they match adult representations (e.g. Phillips and Pearl 2014b,a; 2015a; Bar-Sever and Pearl 2016).

The larger point about implementing a model of syntactic acquisition is that we should strive as much as possible to make sure each component is psychologically grounded. If empirical data are available to guide our implementation, then this is straightforward. However, sometimes the data we need aren’t available yet, and so we need to make principled decisions about how to implement a given model component. When we make choices that are not derived from empirical data, we have to be prepared to explain why they’re reasonable choices and what impact they have on the acquisition process. For more detailed discussion of how to computationally model language acquisition more generally, I recommend the acquisition modeling overviews by Alishahi (2010), Pearl (2010), Räsänen (2012), Freudenthal and Alishahi (2014), Pearl and Sprouse (2015), and Pearl and Goldwater (2016). The key is that when we make model implementation decisions in cognitively plausible ways, our resulting computational model is more likely to tell us something informative about the syntactic acquisition process in children.

7.2.2 Levels of explanation

Another important consideration is the level of explanation a model is seeking to provide, as this impacts decisions about the modeling components. I find it useful to think about this in terms of Marr’s levels of explanation (Marr 1982): computational,¹ algorithmic, and implementational.

¹ Note that this is a different use of “computational” than what we mean when we talk about “computational modeling.” This is an unfortunate overloading of the term “computational,” with two distinct meanings. Computational as an explanation level is what I talk about in this section. Computational as a modeling technique means implementing a model concretely with a computer program. I’ll try to distinguish these meanings by using “computational-level models” for the explanation level and “computational modeling” for the technique.


A computational-level explanation is about capturing the goal of the acquisition process: that is, what is the cognitive computation the child is trying to carry out? Another way to think about this is that we’re trying to correctly conceptualize the acquisition task itself. In the terms we used previously, is it possible to reach a specific target state, given a specific initial state and data intake? A computational-level model cares about the form of the underlying representations and biases (initial state), realistic input (which yields the data intake when combined with the initial state), and the checkpoint provided by the target state; all of these components should be implemented in a cognitively plausible way. Notice, however, that we’re abstracting away from the details of the inference mechanism and learning period; we’re really just concerned with seeing if we’ve got the right task description (this is sometimes implemented as an ideal learner model, such as those in Foraker, Regier, Khetarpal, Perfors, and Tenenbaum 2009, Perfors, Tenenbaum, and Wonnacott 2010, Perfors et al. 2011, Pearl 2011, Pearl, Ho, and Detrano 2017, and Pearl and Sprouse 2018a). Why do we do this? If we find that it’s in fact impossible to reach the target state, given the initial state and data intake, this is a signal that we may not be describing the acquisition task correctly. So, if we try to implement specific learning strategies to solve that acquisition task, we’ll probably find that none of them work (see Pearl (2011) for an example of this in metrical stress acquisition). We can save ourselves from being doomed to failure by making sure we first have a reasonable computational-level model of the acquisition task. A successful computational-level model will validate the underlying representations and biases in the initial state that went into it, because this model demonstrates how those initial-state assumptions can lead to the target state using realistic input.
In short, a computational-level model can demonstrate that it’s possible in principle to solve that acquisition task using a specific set of assumptions about the initial state. But what about being possible for humans? This is where an algorithmic-level model helps us (sometimes implemented as a process model, such as those in Regier and Gahl 2004; Yang 2004; Pearl and Lidz 2009; Pearl and Sprouse 2013b; and Pearl and Mis 2016). Algorithmic-level models are concerned with the steps humans would carry out to solve the acquisition task, and often include considerations of incremental processing and limited memory, as well as a realistic learning period for the steps to be carried out in. An algorithmic-level model can demonstrate that the acquisition task can be solved with the cognitive and time limitations children have. That is, it’s possible for children to solve the specified acquisition task using a particular set of assumptions about the initial state. What typically differs between computational-level and algorithmic-level models is the inference process implementation. A computational-level inference process is focused on simply completing the computation accurately—there is no claim that the way inference is being carried out in the model is the way humans carry out inference. As a concrete example, Gibbs sampling is a statistical inference algorithm currently used in some computational-level models (e.g. Perfors et al. 2010; 2011; Phillips and Pearl 2015b; Pearl and Sprouse 2018a), and it has the happy property of being


guaranteed to converge on the best answer if given enough time to search the (potentially infinite) hypothesis space. While this inference algorithm involves a particular (clever) process of iteratively searching the hypothesis space, it’s unlikely that humans perform the same process when they’re doing inference. So, models using this inference process typically aren’t concerned with learning period considerations, which encode how long the child has to perform the steps of the inference process. Put simply, because the steps of computational-level inference aren’t the same as the steps of human inference, paying attention to how long it takes a modeled child to accomplish computational-level inference isn’t really related to how long children have to accomplish their inference. This contrasts with algorithmic-level models, whose inference process is meant to represent the steps children would go through to perform their inference (e.g. Regier and Gahl 2004; Yang 2004; Pearl and Lidz 2009; Pearl and Sprouse 2013b, and Pearl and Mis 2016). This is why algorithmic-level models are sensitive to the cognitive limitations children have when performing inference (e.g. limited memory) and the time limitations imposed by the learning period. In some cases, the algorithmic-level inference is known to be a heuristic approximation of computational-level inference (e.g. Regier and Gahl 2004; Pearl and Lidz 2009; Bonawitz et al. 2011; Pearl and Mis 2016); in other cases, it’s not obviously so (e.g., Yang 2004; Pearl and Sprouse 2013b). The last explanation level is implementational, and this is concerned with how the cognitive computation of the acquisition task is implemented in the brain. This has direct links to the algorithmic-level explanation: If we think we have the right steps to carry out the acquisition computation, how are they actually carried out in the available neural medium? 
This involves a deep understanding of neural architecture, as the specific relevant properties of the brain need to be simulated (e.g. how information is represented in a distributed manner, the way neurons spread information to each other, the structural subdivisions of the brain). This consideration has significant impact on how the initial state assumptions, data intake, inference, and target state are encoded in the model. As an example focusing on the initial state, if we think there’s a bias to have a default wh-movement value of no movement, what does that actually look like neurally? That’s what we would need to encode in an implementational level model. Because the field is currently developing the linking theories between linguistic theory and cognitive neuroscience, this represents an exciting area of future collaborative research for syntacticians, computational modelers, and cognitive neuroscientists.

7.2.3 Inference

As noted above, inference is how the modeled child updates the developing grammar representations. Below I discuss some common inference components and inference implementations used in the syntactic acquisition modeling literature, though this is by no means an exhaustive list.


7.2.3.1 Counting things

A very common component in the modeled inference process is the ability to count things, which is a domain-general ability. In syntactic acquisition models, the more important thing for inference is what’s being counted; this is where the learning theory and representational theory come in handy. These theories, through the initial state and its effect on the data intake, determine what actually makes it to the inference process, i.e., what things are counted. Depending on the theories involved, the inference mechanism could be operating over counts of lexical items (Yang 2005; Freudenthal, Pine, Aguado-Orea, and Gobet 2007; Freudenthal, Pine, and Gobet 2009; 2010; Freudenthal, Pine, Jones, and Gobet 2015; Yang 2016), syntactic category sequences (Perfors et al. 2010; 2011; Pearl and Sprouse 2018a), syntactic signals realized in certain overt structures (Sakas and Fodor 2001; Yang 2004; Legate and Yang 2007; Mitchener and Becker 2010; Pearl and Sprouse 2013b; Becker 2014; Sakas 2016; Pearl and Sprouse 2018b), or something else entirely. Importantly, for our purposes as modelers, the inference mechanism itself doesn’t change. Once we define the units over which inference is operating, inference then simply operates. Counts are typically translated into probabilities (e.g. 700 instances of an element appearing in 1,000 data points translated to ≈0.7),² and this provides a sense of how relatively frequent the element is. Probabilities also have an intuitive interpretation as beliefs about categorical options. For example, let’s consider the two wh-movement options: +wh-movement and -wh-movement. A probability of 0.7 for +wh-movement (and a probability of 0.3 for -wh-movement) might reasonably be interpreted as the modeled child believing +wh-movement is more likely to be true. This can also be thought of as the modeled child picking +wh-movement 70% of the time if asked to generate an utterance involving a wh-word.
In this way, probabilities can make it easier to link categorical representations to observable behavior (e.g. a child using +wh-movement in her utterances about 70% of the time). As a practical note, probabilities derived from counts typically involve smoothing, where things never observed still have a very small amount of probability assigned to them. This is because zero probability is very final: something with zero probability can never, ever occur. In contrast, something with a very small probability may occur only very occasionally, but it’s not impossible. This is useful when we consider the acquisition task: Children get a finite data sample and have to make generalizations from it. It could be that some element they never observe is actually ungrammatical, but it could also be the case that it’s just very rare. So, assigning the element a very small probability gives some wiggle room for generalization purposes. Perhaps later on, evidence will come in that indicates this element is just fine, such as actually hearing someone use it in naturalistic speech. If the element has zero probability, no amount of evidence can change that belief. But, if the element has a very small probability, belief in the element can still be adjusted.

² See the discussion about smoothing below for why the ≈ is used.
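The count-to-probability step with smoothing can be sketched as follows. This is a minimal illustration, assuming simple add-constant smoothing over a known set of options; the function name `smoothed_probabilities` and the constant 0.5 are illustrative choices made here, not values taken from any of the models cited above.

```python
def smoothed_probabilities(counts, options, pseudocount=0.5):
    """Turn raw counts into probabilities, adding a small constant to every option.

    `pseudocount` is an assumed, illustrative value: any positive constant keeps
    never-observed options from receiving zero probability.
    """
    total = sum(counts.get(option, 0) + pseudocount for option in options)
    return {option: (counts.get(option, 0) + pseudocount) / total for option in options}


# 700 of 1,000 wh-utterances show +wh-movement (the example from the text).
wh_counts = {"+wh-movement": 700, "-wh-movement": 300}
wh_probs = smoothed_probabilities(wh_counts, ["+wh-movement", "-wh-movement"])

# An option that was never observed still keeps a little probability mass.
unseen_probs = smoothed_probabilities({"attested": 50}, ["attested", "unattested"])
```

With these inputs, `wh_probs["+wh-movement"]` comes out just under 0.7 rather than exactly 0.7, and `unseen_probs["unattested"]` is small but nonzero, so later evidence could still raise it.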


Smoothing can be intuitively implemented via pseudocounts, where the modeled child imagines she’s seen each element a certain number of times (typically […]

[…] patient), and whichever thematic roles are present are sorted according to this ordering. Then, the highest role available maps to the highest syntactic position (e.g. subject). This gives rUTAH more flexibility than UTAH, for example allowing it to easily handle unaccusative constructions like The ice[patient] breaks. In this case, UTAH would expect the subject to have an agent-ish role, which patient certainly isn’t. In contrast, rUTAH would note there was only one


role available (patient) and expect that role to appear in the highest syntactic position available (subject), which it does. Notably, Pearl and Sprouse (2018a) also investigated whether those linkings (UTAH or rUTAH) were known already by children or were instead learned over time. If the linkings were already known (perhaps because they were innately specified the way UTAH and rUTAH theorists had previously assumed), we would expect modeled 3-year-olds using this linking knowledge to create verb classes that best match observed 3-year-old verb classes. In contrast, if the linkings were learned over time (perhaps because they were derived from the children’s language experience), we might expect that older modeled children using this linking knowledge would best match the relevant observed verb classes from actual older children; however, younger modeled children who lack this linking knowledge would best match the observed verb classes from actual younger children. Pearl and Sprouse (2018a) used a computational-level hierarchical Bayesian model that learned from realistic samples of speech directed at 3-, 4-, or 5-year-old English children. A modeled child’s perceptual intake for each utterance involving a verb was the syntactic phrase structure, the animacy of a verb’s arguments, and the thematic roles of a verb’s arguments. The acquisitional intake involved several pieces of information. First, the modeled child extracted the syntactic frame a verb was used in (e.g. The ice broke would have the frame NP for break). Second, the modeled child heeded the animacy of the verb’s arguments and information specific to the particular linking theory assumptions being used: A modeled child using UTAH would map the thematic roles to their respective thematic categories; a modeled child using rUTAH would map the thematic roles to their order (highest, 2nd-highest, and so on).
A modeled child who didn’t have linking theory knowledge would then simply track where the thematic category appeared syntactically (e.g. patient-ish or highest in subject for The ice broke); a modeled child who had linking theory knowledge would instead note if the thematic categories appeared where they were expected to or seemed to have moved (e.g. patient-ish in subject is unexpected for UTAH, while highest in subject is expected for rUTAH). So, based on the linking-theory assumptions a modeled child was using (UTAH vs. rUTAH, not having linking knowledge vs. having it already), the acquisitional intake could look very different. Using this approach, Pearl and Sprouse (2018a) found support for two compelling ideas. First, both thematic representation options seem to be used by English children at different ages: either UTAH or rUTAH explained 5-year-old verb classes, but only UTAH best explained 4-year-old verb classes, and only rUTAH best explained 3-year-old verb classes. So, children’s thematic system knowledge isn’t static, but rather something that evolves over development. Second, children’s verb classes are best explained by children not having built-in knowledge of linking theories. Instead, only 5-year-old verb classes are best matched by modeled children with existing linking knowledge; 3- and 4-year-old verb classes are best matched by modeled children without this linking knowledge. This suggests that children may derive knowledge of linking patterns over time, rather than knowing it a priori.
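The difference between the two linking theories can be made concrete with a small sketch. It assumes just two thematic categories and two syntactic positions, with agent-ish ranked above patient-ish and subject above object; these orderings, the fixed UTAH table, and the function names are simplifications introduced here for illustration, not the representation used by Pearl and Sprouse (2018a).

```python
# Assumed orderings, highest first (illustrative only).
THEMATIC_ORDER = ["agent-ish", "patient-ish"]
POSITION_ORDER = ["subject", "object"]

# Assumed UTAH-style table: each thematic category has one fixed expected position.
UTAH_LINKS = {"agent-ish": "subject", "patient-ish": "object"}


def utah_expected(roles):
    """Absolute linking: look each role up in the fixed table."""
    return {role: UTAH_LINKS[role] for role in roles}


def rutah_expected(roles):
    """Relative linking: rank the roles present, then fill positions from the top."""
    ranked = sorted(roles, key=THEMATIC_ORDER.index)
    return dict(zip(ranked, POSITION_ORDER))


# "The ice broke": the only role present is patient-ish, and it surfaces as the subject.
observed = {"patient-ish": "subject"}
utah_match = utah_expected(["patient-ish"]) == observed    # fixed table expects object
rutah_match = rutah_expected(["patient-ish"]) == observed  # highest role present -> subject
```

For The ice broke, the fixed table expects the patient-ish role in object position, so its appearance in subject position counts as unexpected, while the relative mapping expects exactly what is observed. With both roles present, the two mappings agree.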


Given this, Pearl and Sprouse (2018b) investigated which linking theory, UTAH or rUTAH, would be easier to derive from English children’s input. Using the same realistic sample of speech directed at 3-, 4-, and 5-year-old children, Pearl and Sprouse (2018b) concretely defined how children might construct the linkings specified by both UTAH and rUTAH. In particular, they modeled children who would examine individual links in the input (e.g., agent-ish ↔ subject vs. patient-ish ↔ subject) in order to derive a complex linking pattern that connected all thematic categories to syntactic positions, and all syntactic positions to thematic categories. Then, that complex linking pattern would be evaluated against the data from the verbs in children’s input. The evaluation process at both stages (i.e. evaluating whether individual links were strongly enough attested and whether complex linking patterns were strongly enough attested) was accomplished by using the Tolerance Principle. Recall from Section 7.2.3.3 that the Tolerance Principle is a computational-level decision criterion for deciding when a generalization has enough support (i.e. few enough exceptions) to support its adoption by the child. So, Pearl and Sprouse (2018b) used the Tolerance Principle to assess which links were strong enough to generalize and which complex linking pattern was strong enough to generalize, based on children’s acquisitional intake. This acquisitional intake was a subset of the acquisitional intake used by the modeled learners in Pearl and Sprouse (2018a): It was just the thematic categories for the syntactic positions associated with a verb (e.g. The ice broke would have patient-ish ↔ subject (UTAH) or highest ↔ subject (rUTAH)). Their results suggested that a child using the Tolerance Principle to learn linking theories from English child-directed speech data has two advantages for learning rUTAH instead of UTAH. 
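The Tolerance Principle check applied at both evaluation stages can be sketched in a few lines. The sketch assumes the threshold as it is usually stated (Yang 2005; 2016): a generalization over N items is adopted when it has at most N / ln N exceptions. The function names and the counts in the example are hypothetical.

```python
import math


def tolerance_threshold(n_items):
    """Maximum number of exceptions tolerated for a generalization over n_items."""
    if n_items < 2:
        raise ValueError("the threshold N / ln N is only used here for N >= 2")
    return n_items / math.log(n_items)


def generalizes(n_items, n_exceptions):
    """True if the exceptions are few enough for the generalization to be adopted."""
    return n_exceptions <= tolerance_threshold(n_items)


# Hypothetical counts: a candidate link evaluated over 100 verbs.
few_exceptions = generalizes(100, 21)   # threshold is 100 / ln(100), about 21.7
many_exceptions = generalizes(100, 22)
```

The same check can be applied to an individual link or to a complex linking pattern; only the items and exceptions being counted change.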
First, a rUTAH-learning child will have a significantly easier time explicitly generating complex linking patterns for solving the linking problem from individual links. Second, rUTAH is the only complex linking pattern that can be successfully generalized from these realistic child-directed data using the Tolerance Principle. So, taken together with Pearl and Sprouse (2018a), these results suggest that English children may solve the linking problem by deriving a relativized linking theory like rUTAH over time, with this complex linking theory in place by 5 years old.

7.3.2.4.5 Which structured representations exactly?

Recall from Section 7.3.2.2 that a child using Bayesian inference may be able in principle to infer that hierarchically structured representations of utterances are preferable to ones that aren’t. This then leaves open the question of exactly which hierarchically structured representations should be selected out of all the ones possible, with the idea that the correct representations will allow the child to successfully parse the utterances of her language; this is also the approach assumed by the Structural Triggers Learner (STLearner) from Section 7.3.1.1 and the hierarchical Bayesian learner from Perfors et al. (2011) of Section 7.3.2.2. However, in contrast to both these prior learning approaches, which only rely on syntactic information, the approach of Abend et al. (2017) incorporates information from both semantics and syntax; leveraging both information sources allows the learner to simultaneously learn the meanings associated with


individual words and the syntactic structures that will yield the correct utterance-level meaning for utterances in the child’s input. This kind of learning, when representations of different kinds are learned simultaneously, is often referred to as joint learning or bootstrapping, with the idea that partial information of one type can help the child learn about representations of another type, and vice versa. For example, syntactic bootstrapping would involve partial syntactic information constraining hypotheses about what individual words mean; semantic bootstrapping would involve partial semantic information constraining what syntactic structures are compatible with the observed utterance. More generally, this bootstrapping approach is an example of syntax viewed within the context of the larger linguistic system; so, learning about syntax involves learning both from non-syntactic (though related) information and learning about non-syntactic (though related) information. For Abend et al. (2017), a correct parse of an utterance doesn’t just include all and only the observable words; it also allows the child to associate the correct utterance-level meaning with the parsed utterance. The key assumption the child must have is that there’s a transparent mapping from the syntactic structure to the logical form (i.e. the meaning) of an utterance. So, if the utterance has the right parse, it will yield the right logical form, and the logical form will match the utterance-level meaning, as shown in (15) below.

(15) A parsed utterance,⁹ corresponding logical form, and intended meaning
a. Parsed utterance: [IP [NP I] [VP [V love] [NP penguins]]]
b. Associated logical form: love(I, penguins)
c. Intended meaning: the speaker has happy feelings towards penguins

⁹ Note that the notation in (15a) uses phrase structure grammar categories for ease of comparison with the other examples in the rest of the chapter, but the actual notation should be that of Combinatory Categorial Grammar (CCG), where, for example, VP might be notated in English as S\NP. See Abend et al. (2017) for an accessible overview of CCG.

With this mapping firmly in place in the initial state, the modeled child of Abend et al. (2017) can make serious progress with only a few other components. First, the modeled child has some ability to perceive the utterance-level meaning from non-syntactic context; that is, based on the sorts of things happening in the child’s environment and what’s salient to the child, the child can infer some kind of utterance-level meaning that’s not too far off from the true meaning. This may seem surprising to assume, but it may be more reasonable in context than we think. For instance, the utterance I love penguins might occur when the child and parent are both looking at a picture of penguins, and the parent is smiling and pointing at the penguins. The parent may have also previously


said things about penguins, such as Look at those penguins! or Those penguins are so cute! It turns out that speech directed at children under the age of 2 is often about the “here and now” (Frank, Tenenbaum, and Fernald 2013a; Clerkin, Hart, Rehg, Yu, and Smith 2016), and so likely to refer to things that are visually salient to the child. Therefore, this kind of scenario isn’t implausible. So, given this context, the child might then be able to infer that the penguins are being talked about and the parent is expressing some kind of positive sentiment towards them; this translates into a meaning whose logical form is something like love(I, penguins). Abend et al. (2017) assume that the modeled child has access to a universal conceptual structure, which allows the child to generate that logical form from the perceived utterance meaning. The next component the modeled child relies on is a pre-defined hypothesis space of syntactic structure rules. Importantly, rather than explicitly defining all possible rules (which would be difficult, as there are infinitely many!), the hypothesis space is implicitly defined via a set of constraints: rules are hierarchical, rules are constrained in how their elements combine, and rule elements are composed from a pre-defined set of building blocks. That is, with these constraints, the latent hypothesis space is defined.¹⁰ When inferring which syntactic structures are best, the modeled child explicitly generates specific hypotheses for consideration from this latent hypothesis space, using the building blocks defined by that latent hypothesis space. The utility of this approach is that very large latent hypothesis spaces can be considered by the child, but still feasibly searched for the best hypothesis. In fact, the idea of large latent hypothesis spaces is something familiar to those who work in the parameters framework.
For instance, n binary parameters implicitly define a latent hypothesis space of 2^n grammars; the variational learning approach used by Yang (2002; 2004) is one way to search this latent hypothesis space by generating explicit hypotheses about grammars to test as the data come in. Abend et al. (2017) harness this same idea for defining and searching a latent hypothesis space of syntactic structure rules. The last two components are biases the modeled child has. The first is to use Bayesian inference when evaluating hypotheses about potential syntactic structures. The second is a predisposition to make use of structural parts that have been used before; that is, when considering a potential syntactic rule, the child is biased in favor of structural components that have been used in other syntactic rules. This is similar in spirit to an STLearner preference for treelets that have helped the STLearner successfully parse utterances before. The idea is that reusing pieces that have been used before is efficient, and therefore preferred in potential syntactic rules. This is represented implicitly in the Bayesian learner as an overhypothesis about structural pieces; so, any time a particular structural piece is used in a syntactic rule, the learner takes note of that. The learner then considers the popularity of the pieces involved in any syntactic rule under consideration. In this way, utterances that seem unrelated on the surface may in fact be

¹⁰ See Perfors (2012) for a helpful discussion of latent vs. explicit hypothesis spaces, and Pearl and Sprouse (2018b) for discussion targeted at a latent hypothesis space of linking theories.


connected by a commonly used structural piece (similar to how linguistic parameters are meant to work). With this initial state set, the modeled child of Abend et al. (2017) is then ready to learn. Its acquisitional intake involves both the words of the utterance in order (e.g. I love penguins), and the logical forms of both that utterance and some number of utterances that occur around it (e.g. something like love(I, penguins), be(those(x, penguins(x)), so cute), and look(at(those(x, penguins(x))))). The fact that multiple (presumably related) logical forms are taken in is meant to represent the child’s uncertainty about the true logical form of the utterance, something that seems cognitively plausible in the contexts children encounter. That is, while the child is able to extract some core conceptual structure (which this set of utterances may have in common because of discourse context and the child’s here-and-now environment), there’s uncertainty in exactly which conceptual structure corresponds to the utterance under consideration. The data themselves come from a subset of the Brown–Eve corpus in the CHILDES database (MacWhinney 2000), and are encountered by the modeled child in the order the actual child encountered them, one at a time. Idealized Bayesian inference is done for each utterance, updating the child’s set of word meanings, syntactic categories, and syntactic rules using those categories. So, this is an example of a computational-level model that nonetheless learns incrementally (one utterance at a time). It’s computational-level because inference is idealized, and so the model investigates what’s possible in principle, given this conceptualization of the child’s acquisition task. However, by learning incrementally, this learner can generate a developmental trajectory, where its knowledge and behavior can be inspected at any point during learning. 
This property is particularly useful for comparing the modeled child’s developmental patterns against known developmental patterns in actual children, which Abend et al. (2017) in fact do. That is, in addition to evaluating the modeled child on the knowledge it achieves (target knowledge), Abend et al. (2017) also evaluate the modeled child on qualitative behavior patterns observed via naturalistic or experimental setups (target behavior). For target knowledge, the modeled child is tested on whether it can generate the correct logical form for an utterance it hasn’t heard before. The only way for it to do this is to use its acquired syntactic representations to parse the utterance, and then map that parsed form to a logical form. This can be thought of as syntactic bootstrapping, because the syntactic structure is what allows the modeled child to generate a meaning for the utterance (and all its parts). The modeled child achieved up to 60% accuracy at inferring the correct logical form after only a few thousand utterances of input. While this is fairly impressive given the modeled child’s limited input and many competing alternative parses, what’s even more striking is the qualitative behavior of the modeled child. As mentioned above, the modeled child is capable of syntactic bootstrapping—just like actual children in many experiments. This in turn allows the model to do something else actual children do in numerous experiments: one-shot learning of words when the words are embedded in a syntactic context (i.e. an utterance that includes other words) and the child has enough syntactic knowledge internalized to use that syntactic


context. Because of this syntactic bootstrapping, the modeled child also shows accelerated learning of individual vocabulary items corresponding to specific syntactic categories (like nouns and transitive verbs). As the child acquires more syntactic knowledge, it becomes faster and faster at acquiring new word meanings. Related to this, it turns out that it’s easier for the modeled child to acquire the syntactic structure associated with nouns, and so nouns are bootstrapped earlier in development. This then leads to the commonly-observed “noun bias” in children’s early vocabularies (Braginsky, Yurovsky, Marchman, and Frank 2015). The modeled child is also capable of sudden “inspiration”—that is, sudden jumps in learning, where many different aspects of knowledge are now present. This is in part due to the overhypotheses about structure—information about the right representations can come from many different sources. So, one piece of information can be the tipping point for a particular popular structure, and every utterance using that popular structure then benefits. This again may remind us of linguistic parameters. The main difference is that the modeled child of Abend et al. (2017) doesn’t hardcode prior knowledge of specific linguistic structures connected together under a linguistic parameter. Instead, the popular structures emerge as explicit hypotheses from the latent hypothesis space because the data themselves show these structures get reused a lot. All the modeled child has built in is the overhypothesis to look for structural pieces in common across utterances and the bias to prefer popular ones. Taken together, the results from the computational-level learner of Abend et al. (2017) suggest that many observed developmental effects for children’s syntactic knowledge may not require syntax-specific mechanisms. 
Children do need structured representations (here: the semantics encoded in a logical form) and constraints on how representations can be built (here: constraints on the form of syntactic structures). With this knowledge, the knowledge that there’s an intimate connection between logical form and syntactic form, and a few domain-general biases, children can in principle bootstrap their way to sophisticated syntactic knowledge and a variety of observed acquisition behaviors.

7.4 Where we are and where we’re headed


7.4.1 Where we are

As language scientists interested in understanding syntactic acquisition, we can use computational modeling to formalize specific proposals about how syntactic acquisition could work and empirically evaluate those proposals. The model allows us to look inside the acquisition process in a very precise way so that we can explain exactly how and why the proposal works (or doesn’t). This gives us a lot of explanatory power, particularly for aspects of syntactic acquisition that are difficult to understand through other methods (for example, what the data intake can be or which learning strategies


can succeed). Using computational modeling, we can understand more about both the developing representations in children during syntactic acquisition and the learning mechanisms that work in tandem with those developing representations. The key to building an informative computational model is characterizing the syntactic acquisition task with as much precision as we can. This involves defining several task aspects, including the modeled learner’s initial state, the data intake used for learning, the inference process that updates the learner’s internal representations, the learning period during which acquisition occurs, and the learner’s target state. To make our models match the human language acquisition process as much as possible, we try to ground all these aspects in empirical data. This includes considerations of cognitive plausibility, so that the modeled learner is doing something that we think children are potentially capable of doing. When we do this, we have the best chance of creating informative computational models of syntactic acquisition. As discussed earlier in the chapter, there are a variety of ways for computational models to be informative. Computational-level models investigate whether a specific learning strategy (and, importantly, the learning assumptions that strategy includes) will work in principle—if inference were optimal, are these assumptions enough to get the job done? Algorithmic-level models investigate if a learning strategy will work in practice for children, who have cognitive limitations—when inference is approximate, are these learning assumptions good enough? Implementational-level models investigate if the strategy will work in practice in children’s brains, which have biological limitations—when the representations and inference are encoded in the wetware of the brain, are these learning assumptions good enough? 
Something important to remember is that a model’s inference process is about how the modeled learner updates the underlying representations, based on the data intake. Typically, this involves more or less fancy ways of counting things, and reasoning about those counts. Where linguistic theory usually comes in—and what often drives a modeled learner to acquisition success or failure—is what specifically in the input is being counted. To put it simply, it’s not that things are being counted, it’s what things are being counted. This is often what varies from acquisition theory to acquisition theory, and what does the explanatory work. In this chapter, I surveyed several syntactic acquisition models that demonstrated exactly this, highlighting the importance of the modeled learner’s assumptions about what data from the input were relevant to learn from, based on what was being learned. These models included both parametric and non-parametric approaches to syntactic acquisition, and all the models evaluated specific proposals about the exact knowledge and abilities necessary to solve certain syntactic acquisition tasks. Common elements of the parametric approaches were that (i) data perceived as unambiguous are particularly impactful, and (ii) the acquisitional intake is generated by parsing the input data with the currently available parametric options. Notably, parametric approaches rely on very precise, language-specific innate knowledge, in the form of linguistic parameters. Non-parametric approaches don’t necessarily have to, although they can incorporate

modeling syntactic acquisition

precise, language-specific prior knowledge. Interestingly, a common element of the non-parametric models surveyed here is that less-specific prior linguistic knowledge can often be effectively used to solve different syntactic acquisition tasks, and this less-specific prior knowledge often involves a bias for structured linguistic representations. Many syntactic acquisition models I discussed also encode very precise ideas about how developing representations and developing processing abilities work together during syntactic acquisition. An exciting direction in the syntactic acquisition modeling community is to develop more articulated models of this kind (Lidz and Gagliardi 2015; Omaki and Lidz 2015), which recognize how syntactic acquisition occurs as part of a larger cognitive system.
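The point that what is counted matters more than the counting itself can be made concrete with a small sketch. The Python below is illustrative only and is not the implementation of any model surveyed in this chapter. It assumes a learner who represents each wh-dependency as the sequence of phrasal nodes between the wh-word and its gap, tallies trigrams over those sequences, and scores a new dependency by the product of smoothed trigram frequencies. The toy intake, the node labels, the smoothing constant `alpha`, and all function names are hypothetical.

```python
from collections import Counter
from math import prod


def ngrams(units, n):
    """Split one padded sequence of units into overlapping n-grams."""
    padded = ["start"] + list(units) + ["end"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def count_units(intake, n):
    """Tally n-grams over every sequence in the learner's intake."""
    counts = Counter()
    for units in intake:
        counts.update(ngrams(units, n))
    return counts


def score(units, counts, vocab_size, n=3, alpha=0.5):
    """Product of add-alpha smoothed n-gram relative frequencies."""
    denom = sum(counts.values()) + alpha * vocab_size ** n
    return prod((counts[g] + alpha) / denom for g in ngrams(units, n))


# Hypothetical intake: each wh-dependency is represented by the phrasal
# nodes on the path from the wh-word to its gap (toy frequencies).
intake = [["IP", "VP"]] * 8 + [["IP", "VP", "CP", "IP", "VP"]] * 2
counts = count_units(intake, n=3)

labels = {"start", "end", "IP", "VP", "CP", "NP"}
attested = score(["IP", "VP", "CP", "IP", "VP"], counts, len(labels))
unattested = score(["IP", "VP", "NP", "CP", "IP", "VP"], counts, len(labels))
print(attested > unattested)
```

The same `count_units` function would tally word bigrams just as readily if it were handed word strings instead of node paths. The counting machinery stays fixed, and the assumed representation of the intake is what determines which dependencies the learner ends up treating as unlikely.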

7.4.2 Where we’re headed

This idea of syntactic acquisition occurring as part of broader linguistic (and cognitive) development underscores other sources of information we’ll want to integrate into our syntactic acquisition theories, including other types of syntactic, linguistic, and non-linguistic information. That is, even for what seems to be an acquisition task that targets a specific piece of syntactic knowledge, children may well be using a variety of data sources to either constrain possible hypotheses or helpfully search through those hypotheses (or both). We saw this in the case studies of pronoun interpretation, where other syntactic data (English anaphoric one: data coming from other pronouns) or other linguistic data (anaphora resolution: discourse information) were successfully leveraged. We also saw this in case studies of the linking problem and learning about syntactic structure more generally, where other types of conceptual information were incorporated: animacy, thematic roles, and logical forms of utterances. Relatedly, it may be useful to consider how syntactic acquisition may be bootstrapped by other representations developing at the same time. This kind of joint learning often allows a child to learn better and more quickly than sequential learning (in sequential learning, first a “foundational” representation is learned completely, and then the representation that builds on that foundational representation is learned). For example, in the realm of phonology, learning about phonetic categories first and then word forms using those categories second is harder than learning about phonetic categories and word forms at the same time (Feldman, Griffiths, Goldwater, and Morgan 2013). It’s also harder to learn phonetic categories first and phonological rules using the phonetic categories second, rather than learning the correct phonological rules at the same time as learning phonetic categories (Dillon, Dunbar, and Idsardi 2013).
In the realm of syntax, it’s typically been assumed by modelers that syntactic categories are known before the syntactic rules that depend on those categories are learned. Yet, Abend et al. (2017) demonstrate that great acquisition strides can be made when both of these representations (categories and the rules that use them) are learned at the same time. This is all

due to the power of bootstrapping, where partial information from one representation is helpful for learning about another representation. Exploring a broader range of data sources for children’s acquisitional intake may also help us better understand how children solve the induction problems that occur in syntactic acquisition (and language acquisition more generally). The key idea is that a wealth of indirect positive evidence may exist for the specific syntactic knowledge children need to acquire, simply because children are learning a linguistic system as a whole and not just isolated pieces of it. More concretely, by thinking about syntactic acquisition data this way, we may develop additional answers for the representations and learning mechanisms necessary for successful syntactic acquisition. These answers may complement or extend existing proposals for solving acquisition tasks that seem to require prior knowledge in children (e.g. see a handy list of examples from Hsu and Chater (2010), along with estimates of which ones are more likely to cause problems for acquisition without additional prior knowledge). Of course, children have to know to look (and more importantly, where to look) for just the right sources of information in the vast input signal around them. This just-right search of the input likely requires prior knowledge about what to look for (e.g. Perfors et al. 2011; Pearl and Sprouse 2013b; Pearl and Mis 2016). Moreover, children have to be able to reliably extract the just-right information from their input. This is where their developing language processing abilities and extralinguistic abilities have a huge impact, as the articulated models of Lidz and Gagliardi (2015) and Omaki and Lidz (2015) show. This is why understanding more about children’s developing parsing and extralinguistic abilities is paramount for building more informative and integrated computational models of syntactic acquisition.
We can also implement more informative models once we have more precise empirical data about what children know when, and what abilities are available when. Another exciting theoretically oriented takeaway relates to linguistic parameters. While traditional linguistic parameters are very intriguing from an acquisition perspective for all the reasons I’ve discussed in this chapter, they’ve run into empirical coverage problems as comparative linguists have surveyed more and more data from the world’s languages.[11] For one thing, it’s difficult to identify a single parameter that doesn’t have exceptions; that is, the structures that are meant to be connected to a parameter are often connected as expected, but sometimes they’re not (e.g. the headedness parameter: German has prepositions before their objects (head-initial) but verbs after their complements (head-final)). This is a problem if linguistic parameters are hardcoded into a child’s prior knowledge because exceptions shouldn’t be allowed at all; however, if linguistic parameters emerge as overhypotheses from structural pieces that keep getting reused as the child tries to understand her language, then it’s not surprising to find exceptions. The “linguistic parameter” for that language just doesn’t happen to involve the exceptional structural piece. Importantly, the exception doesn’t have to be pre-defined, and neither does the linguistic parameter. Linguistic parameters can

[11] Some of these data are helpfully available at http://wals.info, the World Atlas of Language Structures.

have more variation in how they get instantiated from language to language, but what’s in common is how they’re derived from the prior knowledge the child has about structured representations and using those representations to parse and interpret the input. So, less-specific prior knowledge of structured representations coupled with the available language data and the child’s learning mechanisms may allow us to derive what linguistic parameters actually look like, and how much variation there is in what they look like. One current empirical hurdle is the lack of large-scale datasets of structurally annotated child-directed speech from different languages. The CHILDES Treebank[12] currently provides approximately 201K utterances of North American English child-directed speech, annotated with phrase structure as well as some animacy and thematic role information. This has been enough to start evaluating English syntactic acquisition models. Right now, many computational models of syntactic acquisition focus on English because that’s where the empirical data are easily available. But that means we only have an English-focused modeling snapshot of the universal process of syntactic acquisition that all typically-developing children are supposed to go through. To evaluate our syntactic acquisition theories more thoroughly with computational modeling techniques, we need structurally annotated data from other languages. We also need data from other populations that may have quantitatively or qualitatively different input than those whose input data we’ve seen so far. For example, we know that there are both quantitative and qualitative differences in children’s linguistic input across socioeconomic status (SES) (e.g. see Schwab and Lew-Williams (2016) for an overview). However, we currently don’t know if the input needed for syntactic acquisition differs quantitatively, qualitatively, or both across SES.
We also don’t know to what extent any such differences impact the development of the syntactic knowledge itself. This, however, is exactly what a computational model could tell us, if only we had the data to feed into it. With this in mind, my student Alandi Bates recently syntactically annotated just under twenty thousand utterances of lower-SES child-directed speech.[13] We then investigated the quantity and quality of the input with respect to learning about syntactic islands. Our results suggest first that the wh-dependency data in lower-SES children’s data are quantitatively similar to the wh-dependency data in higher-SES children’s data. This is a welcome finding, as it suggests that this kind of complex linguistic input doesn’t differ that much across SES, unlike many other more fundamental input types. Moreover, by applying the same syntactic islands learning model used by Pearl and Sprouse (2013b), we found that lower-SES children’s wh-dependency input doesn’t differ qualitatively either. That input will allow a child to successfully internalize knowledge of the same islands that higher-SES children’s data does, assuming the child is using the same probabilistic learning strategy. This is an even more welcome finding, as it suggests that any linguistic gap across SES doesn’t extend to this complex syntactic level: once

[12] Available at http://www.socsci.uci.edu/~lpearl/CoLaLab/CHILDESTreebank/childestreebank.html.
[13] Available as part of the CHILDES Treebank.

children have the ability to take in the available wh-dependency data in their input and process it, they’ll be able to learn syntactic islands knowledge. There may not be a complex syntax gap across SES, so to speak, but much more investigation of this type remains to be done. And that’s really the big picture about using computational models for syntactic acquisition: they’re informative tools for answering certain kinds of questions, provided we have the right empirical data to base them on. When we understand what knowledge and abilities children have available at each stage of syntactic development, what data they have available, and what exactly they know when, we can better determine what building blocks they use to acquire syntax as well as they do.
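One way to make a claim like “quantitatively similar” operational is to compare how often each wh-dependency type occurs in two input samples using a divergence measure. The sketch below uses the Jensen-Shannon divergence as an assumption for illustration; the text above does not say which measure was used in the study. The dependency labels and counts are invented, and the function names are hypothetical.

```python
from math import log2


def to_distribution(counts, support):
    """Convert raw counts into probabilities over a shared support."""
    total = sum(counts.values())
    return {k: counts.get(k, 0) / total for k in support}


def jensen_shannon(counts_a, counts_b):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    support = set(counts_a) | set(counts_b)
    p = to_distribution(counts_a, support)
    q = to_distribution(counts_b, support)
    m = {k: 0.5 * (p[k] + q[k]) for k in support}

    def kl(a, b):
        # Terms with a[k] == 0 contribute nothing to the sum.
        return sum(a[k] * log2(a[k] / b[k]) for k in support if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical counts of wh-dependency types in two input samples.
sample_a = {"IP-VP": 760, "IP": 130, "IP-VP-CP-IP-VP": 40, "IP-VP-PP": 70}
sample_b = {"IP-VP": 7400, "IP": 1500, "IP-VP-CP-IP-VP": 350, "IP-VP-PP": 750}
print(round(jensen_shannon(sample_a, sample_b), 4))
```

A value near zero says only that the two samples distribute their dependencies in roughly the same proportions. Whether the input is also qualitatively equivalent is a separate question, which the text addresses by running a learning model on each sample and comparing what is learned.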

References

Abend, Omri, Tom Kwiatkowski, Nathaniel J. Smith, Sharon Goldwater, and Mark Steedman. 2017. Bootstrapping language acquisition. Cognition 164: 116–143.
Alishahi, Afra. 2010. Computational modeling of human language acquisition. Synthesis Lectures on Human Language Technologies 3(1): 1–107.
Alishahi, Afra, and Suzanne Stevenson. 2008. A computational model of early argument structure acquisition. Cognitive Science 32(5): 789–834.
Bar-Sever, Galia, and Lisa Pearl. 2016. Syntactic categories derived from frequent frames benefit early language processing in English and ASL. In Proceedings of the 40th Annual Boston University Conference on Child Language Development, 32–46. Somerville, MA: Cascadilla.
Bar-Sever, Galia, Rachael Lee, Gregory Scontras, and Lisa Pearl. 2018. Little lexical learners: Quantitatively assessing the development of adjective ordering preferences. In A. Bertolini and M. Kaplan (eds), Proceedings of the 42nd Annual Boston University Conference on Child Language Development, 58–71. Somerville, MA: Cascadilla.
Bates, Alandi, Lisa Pearl, and Susan Braunwald. 2018. I can believe it: Quantitative evidence for closed-class category knowledge in an English-speaking 20- to 24-month-old child. In K. Garvin, N. Hermalin, M. Lapierre, Y. Melguy, T. Scott, and E. Wilbanks (eds), Proceedings of the 44th Annual Meeting of the Berkeley Linguistics Society, 1–15.
Becker, Misha. 2006. There began to be a learnability puzzle. Linguistic Inquiry 37(3): 441–456.
Becker, Misha. 2007. Animacy, expletives, and the learning of the raising-control distinction. Generative Approaches to Language Acquisition North America 2: 12–20.
Becker, Misha. 2009. The role of NP animacy and expletives in verb learning. Language Acquisition 16(4): 283–296.
Becker, Misha. 2014. Animacy and thematic alignment. Cambridge: Cambridge University Press.
Berwick, Robert, Paul Pietroski, Beraca Yankama, and Noam Chomsky. 2011. Poverty of the stimulus revisited. Cognitive Science 35: 1207–1242.
Boeckx, Cedric, and Evelina Leivada. 2014. On the particulars of Universal Grammar: Implications for acquisition. Language Sciences 46: 189–198.
Bonawitz, Elizabeth, Stephanie Denison, Annie Chen, Alison Gopnik, and Thomas L. Griffiths. 2011. A simple sequential algorithm for approximating Bayesian inference. In L. Carlson, C. Hoelscher, and T. F. Shipley (eds), Proceedings of the 33rd Annual Meeting of the Cognitive Science Society, 2463–2468.

Braginsky, Mika, Daniel Yurovsky, Virginia A. Marchman, and Michael C. Frank. 2015. Developmental changes in the relationship between grammar and the lexicon. In Proceedings of the Cognitive Science Society, 256–261.
Bush, Robert R., and Frederick Mosteller. 1951. A model for stimulus generalization and discrimination. Psychological Review 58(6): 413.
Chemla, Emmanuel, Toben H. Mintz, Savita Bernal, and Anne Christophe. 2009. Categorizing words using “frequent frames”: What cross-linguistic analyses reveal about distributional acquisition strategies. Developmental Science 12(3): 396–406.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1970. Remarks on nominalization. In R. Jacobs and P. Rosenbaum (eds), Readings in English Transformational Grammar, 184–221. Waltham, MA: Ginn.
Chomsky, Noam. 1971. Problems of knowledge and freedom. London: Fontana.
Chomsky, Noam. 1973. Conditions on transformations. In S. Anderson and P. Kiparsky (eds), A Festschrift for Morris Halle, 237–286. New York: Holt, Rinehart, & Winston.
Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Clark, Robin. 1992. The selection of syntactic knowledge. Language Acquisition 2(2): 83–149.
Clerkin, Elizabeth M., Elizabeth Hart, James M. Rehg, Chen Yu, and Linda B. Smith. 2016. How everyday visual experience prepares the way for learning object names. In Development and Learning and Epigenetic Robotics, 126–131. Piscataway, NJ: IEEE.
Crain, Stephen, and Mineharu Nakayama. 1987. Structure dependence in grammar formation. Language 63: 522–543.
De Vincenzi, Marica. 1991. Syntactic parsing strategies in Italian: The minimal chain principle. Dordrecht: Springer.
Dillon, Brian, Ewan Dunbar, and William Idsardi. 2013. A single-stage approach to learning phonological categories: Insights from Inuktitut. Cognitive Science 37: 344–377.
Elman, Jeffrey L. 1993. Learning and development in neural networks: The importance of starting small. Cognition 48(1): 71–99.
Erkelens, Marian. 2009. Learning to categorize verbs and nouns: Studies on Dutch. Amsterdam: Netherlands Graduate School of Linguistics/LOT.
Feldman, Naomi, Thomas Griffiths, Sharon Goldwater, and James Morgan. 2013. A role for the developing lexicon in phonetic category acquisition. Psychological Review 120(4): 751–778.
Fodor, Janet D. 1998a. Parsing to learn. Journal of Psycholinguistic Research 27(3): 339–374.
Fodor, Janet D. 1998b. Unambiguous triggers. Linguistic Inquiry 29: 1–36.
Fodor, Janet Dean. 2017. Ambiguity, parsing, and the evaluation measure. Language Acquisition 24(2): 85–99.
Fodor, Janet Dean, and William Gregory Sakas. 2005. The subset principle in syntax: Costs of compliance. Journal of Linguistics 41(3): 513–569.
Fodor, Janet D., and William G. Sakas. 2017. Learnability. In Ian Roberts (ed.), The Oxford handbook of Universal Grammar, ch. 11. Oxford: Oxford University Press.
Fodor, Janet Dean, William Gregory Sakas, and Arthur Hoskey. 2007. Implementing the subset principle in syntax acquisition: Lattice-based models. In Proceedings of the Second European Cognitive Science Conference, 161–166. Hillsdale, NJ: Erlbaum.
Foraker, Stephani, Terry Regier, Naveen Khetarpal, Amy Perfors, and Joshua Tenenbaum. 2009. Indirect evidence and the poverty of the stimulus: The case of anaphoric one. Cognitive Science 33: 287–300.
Frank, Michael C., Mika Braginsky, Daniel Yurovsky, and Virginia A. Marchman. 2017. Wordbank: An open repository for developmental vocabulary data. Journal of Child Language 44(3): 677–694.

Frank, Michael C., Joshua B. Tenenbaum, and Anne Fernald. 2013. Social and discourse contributions to the determination of reference in cross-situational word learning. Language Learning and Development 9(1): 1–24.
Frank, Robert, Donald Mathis, and William Badecker. 2013. The acquisition of anaphora by simple recurrent networks. Language Acquisition 20(3): 181–227.
Frank, Stella, Sharon Goldwater, and Frank Keller. 2009. Evaluating models of syntactic category acquisition without using a gold standard. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, 2576–2581.
Frank, Stella, Sharon Goldwater, and Frank Keller. 2013. Adding sentence types to a model of syntactic category acquisition. Topics in Cognitive Science 5(3): 495–521.
Freudenthal, Daniel, and Afra Alishahi. 2014. Computational models of language development. In Patricia J. Brooks and Vera Kempe (eds), Encyclopedia of language development. Thousand Oaks, CA: SAGE.
Freudenthal, Daniel, Julian M. Pine, Javier Aguado-Orea, and Fernand Gobet. 2007. Modeling the developmental patterning of finiteness marking in English, Dutch, German, and Spanish using MOSAIC. Cognitive Science 31(2): 311–341.
Freudenthal, Daniel, Julian M. Pine, and Fernand Gobet. 2009. Simulating the referential properties of Dutch, German, and English root infinitives in MOSAIC. Language Learning and Development 5(1): 1–29.
Freudenthal, Daniel, Julian Pine, and Fernand Gobet. 2010. Explaining quantitative variation in the rate of optional infinitive errors across languages: A comparison of MOSAIC and the variational learning model. Journal of Child Language 37(3): 643–669.
Freudenthal, Daniel, Julian M. Pine, Gary Jones, and Fernand Gobet. 2015. Defaulting effects contribute to the simulation of cross-linguistic differences in optional infinitive errors. In Proceedings of the 37th Annual Meeting of the Cognitive Science Society, 746–751.
Gibson, Edward, and Kenneth Wexler. 1994. Triggers. Linguistic Inquiry 25(4): 407–454.
Goodman, Nelson. 1994. Fact, fiction and forecast. Cambridge, MA: Harvard University Press.
Gutman, Ariel, Isabelle Dautriche, Benoît Crabbé, and Anne Christophe. 2015. Bootstrapping the syntactic bootstrapper: Probabilistic labeling of prosodic phrases. Language Acquisition 22(3): 285–309.
Hsu, Anne S., and Nick Chater. 2010. The logical problem of language acquisition: A probabilistic perspective. Cognitive Science 34(6): 972–1016.
Huang, C.-T. James. 1982. Logical relations in Chinese and the theory of grammar. PhD thesis, Massachusetts Institute of Technology.
Hyams, Nina. 1987. The theory of parameters and syntactic development. In Parameter setting, 1–22. Berlin: Springer.
Jackendoff, Ray. 1977. X-bar syntax: A study of phrase structure. Cambridge, MA: MIT Press.
Kam, Xuân Nga Cao, Iglika Stoyneshka, Lidiya Tornyova, Janet D. Fodor, and William G. Sakas. 2008. Bigrams and the richness of the stimulus. Cognitive Science 32(4): 771–787.
Kemp, Charles, and Joshua B. Tenenbaum. 2008. The discovery of structural form. Proceedings of the National Academy of Sciences 105(31): 10687–10692.
Kemp, Charles, Amy Perfors, and Joshua Tenenbaum. 2007. Learning overhypotheses with hierarchical Bayesian models. Developmental Science 10(3): 307–321.
Kol, Sheli, Bracha Nir, and Shuly Wintner. 2014. Computational evaluation of the traceback method. Journal of Child Language 41(1): 176–199.
Lasnik, Howard, and Mamoru Saito. 1984. On the nature of proper government. Linguistic Inquiry 15: 235–289.

Legate, Julie, and Charles Yang. 2002. Empirical re-assessment of stimulus poverty arguments. Linguistic Review 19: 151–162.
Legate, Julie, and Charles Yang. 2007. Morphosyntactic learning and the development of tense. Language Acquisition 14(3): 315–344.
Legate, Julie, and Charles Yang. 2013. Assessing child and adult grammar. In Robert Berwick and Massimo Piattelli-Palmarini (eds), Rich languages from poor inputs, 168–182. Oxford: Oxford University Press.
Lewis, John D., and Jeffrey L. Elman. 2001. Learnability and the statistical structure of language: Poverty of stimulus arguments revisited. In B. Skarabela, S. Fish, and A. H.-J. Do (eds), Proceedings of the 26th Annual Conference on Language Development, 359–370.
Lidz, Jeffrey, and Annie Gagliardi. 2015. How nature meets nurture: Universal Grammar and statistical learning. Annual Review of Linguistics 1(1): 333–352.
Lidz, Jeffrey, Sandra Waxman, and Jennifer Freedman. 2003. What infants know about syntax but couldn’t have learned: Experimental evidence for syntactic structure at 18 months. Cognition 89: B65–B73.
MacWhinney, Brian. 2000. The CHILDES project: Tools for analyzing talk. Mahwah, NJ: Erlbaum.
Marr, David. 1982. Vision. San Francisco, CA: Freeman.
Mintz, Toben. 2003. Frequent frames as a cue for grammatical categories in child directed speech. Cognition 90: 91–117.
Mitchener, William Garrett, and Misha Becker. 2010. Computational models of learning the raising-control distinction. Research on Language and Computation 8(2–3): 169–207.
Niyogi, Partha, and Robert C. Berwick. 1996. A language learning model for finite parameter spaces. Cognition 61: 161–193.
Omaki, Akira, and Jeffrey Lidz. 2015. Linking parser development to acquisition of syntactic knowledge. Language Acquisition 22(2): 158–192.
Orita, Naho, Rebecca McKeown, Naomi H. Feldman, Jeffrey Lidz, and Jordan Boyd-Graber. 2013. Discovering pronoun categories using discourse information. In M. Knauff, M. Pauen, N. Sebanz, and I. Wachsmuth (eds), Proceedings of the 35th Annual Meeting of the Cognitive Science Society, 3193–3198.
Payne, John, Geoffrey Pullum, Barbara Scholz, and Eva Berlage. 2013. Anaphoric one and its implications. Language 90(4): 794–829.
Pearl, Lisa. 2010. Using computational modeling in language acquisition research. In Elma Blom and Sharon Unsworth (eds), Experimental methods in language acquisition research, 163–184. Amsterdam: Benjamins.
Pearl, Lisa. 2011. When unbiased probabilistic learning is not enough: Acquiring a parametric system of metrical phonology. Language Acquisition 18(2): 87–120.
Pearl, Lisa. 2014. Evaluating learning strategy components: Being fair. Language 90(3): e107–e114.
Pearl, Lisa, and Sharon Goldwater. 2016. Statistical learning, inductive bias, and Bayesian inference in language acquisition. In Jeffrey Lidz, William Snyder, and Joe Pater (eds), The Oxford handbook of developmental linguistics, 664–695. Oxford: Oxford University Press.
Pearl, Lisa, and Jeffrey Lidz. 2009. When domain-general learning fails and when it succeeds: Identifying the contribution of domain-specificity. Language Learning and Development 5(4): 235–265.

Pearl, Lisa, and Jeffrey Lidz. 2013. Parameters in language acquisition. In Kleanthes Grohmann and Cedric Boeckx (eds), The Cambridge handbook of biolinguistics, 129–159. Cambridge: Cambridge University Press.
Pearl, Lisa, and Benjamin Mis. 2011. How far can indirect evidence take us? Anaphoric one revisited. In L. Carlson, C. Höschler, and T. Shipley (eds), Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 879–884.
Pearl, Lisa, and Benjamin Mis. 2016. The role of indirect positive evidence in syntactic acquisition: A look at anaphoric one. Language 92(1): 1–30.
Pearl, Lisa, and Jon Sprouse. 2013a. Syntactic islands and learning biases: Combining experimental syntax and computational modeling to investigate the language acquisition problem. Language Acquisition 20: 19–64.
Pearl, Lisa, and Jon Sprouse. 2013b. Computational models of acquisition for islands. In Jon Sprouse and Norbert Hornstein (eds), Experimental syntax and islands effects, 109–131. Cambridge: Cambridge University Press.
Pearl, Lisa, and Jon Sprouse. 2015. Computational modeling for language acquisition: A tutorial with syntactic islands. Journal of Speech, Language, and Hearing Research 58: 740–753.
Pearl, Lisa, and Jon Sprouse. 2018a. Comparing solutions to the linking problem using an integrated quantitative framework of language acquisition. URL https://ling.auf.net/lingbuzz/003913.
Pearl, Lisa, and Jon Sprouse. 2018b. The acquisition of linking theories: A Tolerance Principle approach to learning UTAH and rUTAH. URL https://ling.auf.net/lingbuzz/004088.
Pearl, Lisa, Timothy Ho, and Zephyr Detrano. 2017. An argument from acquisition: Comparing English metrical stress representations by how learnable they are from child-directed speech. Language Acquisition 24: 307–342.
Perfors, Amy. 2012. Bayesian models of cognition: What’s built in after all? Philosophy Compass 7(2): 127–138.
Perfors, Amy, Joshua Tenenbaum, and Elizabeth Wonnacott. 2010. Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language 37(3): 607–642.
Perfors, Amy, Joshua Tenenbaum, and Terry Regier. 2011. The learnability of abstract syntactic principles. Cognition 118: 306–338.
Phillips, Colin, and Lara Ehrenhofer. 2015. The role of language processing in language acquisition. Linguistic Approaches to Bilingualism 5(4): 409–453.
Phillips, Lawrence, and Lisa Pearl. 2014a. Bayesian inference as a viable cross-linguistic word segmentation strategy: It’s all about what’s useful. In Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2775–2780.
Phillips, Lawrence, and Lisa Pearl. 2014b. Bayesian inference as a cross-linguistic word segmentation strategy: Always learning useful things. In A. Lenci, M. Padró, T. Poibeau, and A. Villavicencio (eds), Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning, 9–13.
Phillips, Lawrence, and Lisa Pearl. 2015a. Utility-based evaluation metrics for models of language acquisition: A look at speech segmentation. In Proceedings of CMCL, 68–78.
Phillips, Lawrence, and Lisa Pearl. 2015b. The utility of cognitive plausibility in language acquisition modeling: Evidence from word segmentation. Cognitive Science. doi:10.1111/cogs.12217.
Pullum, Geoffrey, and Barbara Scholz. 2002. Empirical assessment of stimulus poverty arguments. Linguistic Review 19: 9–50.

Räsänen, Okko. 2012. Computational modeling of phonetic and lexical learning in early language acquisition: Existing models and future directions. Speech Communication 54(9): 975–997.
Reali, Florencia, and Morten Christiansen. 2005. Uncovering the richness of the stimulus: Structure dependence and indirect statistical evidence. Cognitive Science 29: 1007–1028.
Regier, Terry, and Susanne Gahl. 2004. Learning the unlearnable: The role of missing evidence. Cognition 93: 147–155.
Ross, John. 1967. Constraints on variables in syntax. PhD thesis, Massachusetts Institute of Technology.
Sakas, William. 2003. A word-order database for testing computational models of language acquisition. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 415–422.
Sakas, William. 2016. Computational approaches to parameter setting in generative linguistics. In Jeffrey Lidz, William Snyder, and Joe Pater (eds), The Oxford handbook of developmental linguistics, 696–724. Oxford: Oxford University Press.
Sakas, William, and Janet Fodor. 2012. Disambiguating syntactic triggers. Language Acquisition 19(2): 83–143.
Sakas, William, and Janet D. Fodor. 2001. The structural triggers learner. In Stefano Bertolo (ed.), Language acquisition and learnability, 172–233. Cambridge: Cambridge University Press.
Sakas, William, and Eiji Nishimoto. 2002. Search, structure or statistics? A comparative study of memoryless heuristics for syntax acquisition. In Proceedings of the Twenty-Fourth Annual Conference of the Cognitive Science Society.
Savinelli, K. J., Gregory Scontras, and Lisa Pearl. 2017. Modeling scope ambiguity resolution as pragmatic inference: Formalizing differences in child and adult behavior. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society.
Savinelli, K. J., Gregory Scontras, and Lisa Pearl. 2018. Exactly two things to learn from modeling scope ambiguity resolution: Developmental continuity and numeral semantics. In Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics.
Schwab, Jessica F., and Casey Lew-Williams. 2016. Language learning, socioeconomic status, and child-directed speech. Wiley Interdisciplinary Reviews: Cognitive Science 7: 264–275.
Sprouse, Jon, Matt Wagers, and Colin Phillips. 2012. A test of the relation between working memory capacity and syntactic island effects. Language 88(1): 82–124.
Wang, Hao, and Toben Mintz. 2008. A dynamic learning model for categorizing words using frames. In Harvey Chan, Heather Jacob, and Enkeleida Kapia (eds), Proceedings of the 32nd Annual Boston University Conference on Language Development, 525–536. Somerville, MA: Cascadilla.
Wang, Hao, Barbara Höhle, N. F. Ketrez, Aylin C. Küntay, Toben H. Mintz, N. Danis, K. Mesh, and H. Sung. 2011. Cross-linguistic distributional analyses with frequent frames: The cases of German and Turkish. In Proceedings of 35th Annual Boston University Conference on Language Development, 628–640. Somerville, MA: Cascadilla.
Weisleder, Adriana, and Sandra R. Waxman. 2010. What’s in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English. Journal of Child Language 37(5): 1089–1108.
Xin Cai, Ling Xiao, and Thomas Lee. 2006. The development of the verb category and verb argument structures in Mandarin-speaking children before two years of age. In Yukio Otsu (ed.), Proceedings of the Seventh Tokyo Conference on Psycholinguistics, 299–322.

Yang, Charles. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press. Yang, Charles. 2004. Universal grammar, statistics or both? Trends in Cognitive Science 8(10): 451–456. Yang, Charles. 2005. On productivity. Yearbook of Language Variation 5: 333–370. Yang, Charles. 2011. A statistical test for grammar. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, 30–38. Yang, Charles. 2012. Computational models of syntactic acquisition. WIREs Cognitive Science 3: 205–213. Yang, Charles. 2015. Negative knowledge from positive evidence. Language 91(4): 938–953. Yang, Charles. 2016. The price of linguistic productivity: How children learn to break the rules of language. Cambridge, MA: MIT Press.

Chapter 8

Artificial Language Learning

Jennifer Culbertson

8.1 Introduction

The earliest uses of artificial language (or grammar) learning in psychology were focused on whether learners could extract implicit rules or rule-like generalizations from structured input (e.g. Braine 1963; Reber 1967). These experiments were extended starting in the 1990s to show that learners could use distributional information in the input—statistical learning—to form representations of word boundaries and phrasal constituents, along with syntax-like rules (Saffran et al. 1996; Mintz et al. 2002; Reeder et al. 2013). More recently, researchers in cognitive psychology and theoretical linguistics have begun adapting artificial language learning methods to study how statistical learning might interact with or be shaped by other properties of the cognitive system. An explicit goal of this research, and the focus of this chapter, is using these methods to provide evidence for cognitive constraints or biases which might explain specific features of language structure. Such constraints form a cornerstone of generative theories of grammar, where they are posited in order to restrict the set of possible human languages. The two main traditional sources of evidence for grammatical constraints are developmental pathways during acquisition, and common features of synchronic and diachronic typology. For example, common errors in the acquisition of phonology, such as the simplification of multisyllabic words or consonant clusters, have been used as evidence for positing universal constraints on syllable structure (e.g. Barlow 2000). These same constraints are supported to the extent that they can generate all and only typologically attested syllable structure patterns (Prince and Smolensky 1993/2004). Similarly, research in syntax has looked for evidence of universal principles, like structure-dependence, both in the

types of errors children make (or fail to make) during early acquisition and in the types of rules found (or not found) in the world’s languages (e.g. Crain and Nakayama 1987).

8.1.1 Empirical and theoretical challenges to universal constraints

The view of language, and language learning, as subject to a set of universal constraints is challenged both by empirical facts and alternative theoretical approaches. Here I will argue that artificial language learning experiments can be used both to test predicted effects of hypothesized constraints, and to refine how the constraints should be formalized in a theory of grammar. First, however, I will briefly lay out some of the challenges linguists face in making a convincing case for particular constraints on language using natural language acquisition and typological data alone.

Perhaps the best-known case for constraints on the linguistic system is the “argument from the Poverty-of-the-Stimulus” (Chomsky 1959). The general sketch of this argument is that what children acquire (abstract, generalizable grammatical knowledge) and how they acquire it (with remarkable speed, without explicit guidance, and with markedly better results than adults) is only possible with a set of universal guiding principles in place. Of course, almost all aspects of the argument have been vigorously challenged (e.g. Morgan et al. 1995; Pullum and Scholz 2002; Ambridge et al. 2008), and at the very least we now have a much clearer picture of how much grammatical knowledge learners can in fact acquire on the basis of the input they get. Computational models of phonotactic learning, word segmentation, and syntactic category learning, for example, have been built to explore this in mathematically rigorous ways (e.g. Hayes and Wilson 2008; Frank et al. 2010; Redington et al. 1998, among many others). That does not necessarily mean that learning is not constrained (Legate and Yang 2002); however, it does mean that evidence of sophisticated, early-acquired knowledge on its own may not indicate the presence of universal guiding principles.
This has important implications for research aimed at showing that certain logically possible structures or rules are not learned, that there are differences in the rate of learning among alternative patterns, or that certain error types are more common than others. In these cases, one must show quantitatively that these asymmetries are not present in, or otherwise driven by particular properties of the input. Research highlighting the power of statistical or distributional learning mechanisms may suggest that a grammar can be acquired on the basis of the input alone, largely obviating the need to posit constraints on learning. A potentially strong counter to this is the existence of typological universals. If certain types of linguistic patterns, rules, or structures are systematically missing from the typological record, then perhaps we don’t want our learning mechanism to be able to acquire them, even if it could. In other words, we may need to posit a constraint or principle which actively rules them out. On the face of it, typological universals therefore provide compelling evidence of constraints on language. These constraints could be active during learning

in particular, or could persist in later language usage, shaping how linguistic systems change over time.

There are two main challenges linguists face in inferring constraints on language directly from the typological record: limitations on the data itself, and challenges involved in interpreting reliable typological differences. Typological universals are formulated on the basis of a sample of languages, in some cases very small (e.g. Greenberg’s sample of 30), but more recently relatively large (e.g. 2,679 in Dryer and Haspelmath 2013). Although larger samples may result in a more reliable picture of what types of linguistic systems are attested, any sample will necessarily omit many hypothetically possible languages which simply don’t exist. This makes it difficult if not impossible to establish to a satisfactory level of certainty that a universal is absolute, or inviolable (Piantadosi and Gibson 2014). Further, samples are confounded by genetic and areal relations among the languages; in other words, individual languages cannot be treated as independent data points (Cysouw 2005). There are computational methods which make it possible to take these factors into account (e.g. Bickel 2007; Dunn et al. 2011); however, they are not without limitations (Jaeger et al. 2011).

More importantly, even if a typological universal does appear to be very reliable, this does not tell us that the best explanation for it is a principle of grammar, or even a more general property of our cognitive system. This exactly parallels acquisition (where evidence for the absence of particular types of linguistic errors in development does not necessarily implicate a principle of grammar). There is a large body of research in typology as well as theoretical linguistics which argues that most if not all universals result from pathways of diachronic change, not constraints active in the cognitive systems of individuals.
For example, evolutionary phonology argues that many phonology universals result from “channel bias”—misperceptions or misarticulations which over time change languages in systematic ways. In other words, these patterns arise from constraints on the physical/sensory system rather than from cognitive or grammatical principles shaping learning (Ohala 1992; Blevins 2004). Similar work argues that grammaticalization pathways in syntax lead to universals relating to morphosyntax and word order (among other things, Aristar 1991; Bybee 2006; Whitman 2008). This is all in addition to the fact that many proposed universals are in fact known to admit exceptions, i.e. they are statistical tendencies rather than absolute universals. This has led some researchers to reject the notion of domain-specific constraints on language (e.g. Evans and Levinson 2009), arguing instead that general cognitive constraints largely determine likely grammatical systems. Indeed, statistical universals are not easily integrated into many generative frameworks, which formalize principles of grammar as inviolable constraints (or in approaches like Optimality Theory, a universal set of violable constraints which strictly limits the generative capacity). This leads to a serious theoretical dilemma: Should theories of natural language structure account only for absolute universals, or should they incorporate a formal notion of preference or bias in order to account for these tendencies? The former approach is assumed in most theories of generative syntax. For example, Newmeyer (2005) argues explicitly that statistical tendencies may be accounted for by grammar-external factors. These might include both historical and culturally driven pressures, as well as “third factor”

constraints on cognition (Chomsky 2005). However, several recent models working within a probabilistic constraint-based grammatical framework take the latter approach to at least some statistical typological tendencies (Pater 2011; Culbertson et al. 2013; White 2017). To summarize then, although typological universals present a potentially strong piece of evidence for constraints on the linguistic system, the data we get from language samples is not necessarily reliable, and may in some cases be misleading. Nor can we say with confidence that a universal which does appear to be reliable should be explained by features of our linguistic or cognitive system. For some researchers, this problem is exacerbated by the fact that many typological tendencies do not conform to the notion of exceptionless linguistic universals argued for by many generative syntacticians.

8.1.2 How can artificial language learning methods help?

The most important contribution of artificial language learning experiments to date is in allowing researchers to test the predicted behavioral effects of hypothesized constraints in a controlled laboratory environment. Observations from linguistic typology or language acquisition can be used to generate hypotheses linking language structure to human cognition. The predictions of these hypotheses can be tested using precisely designed experimental manipulations. While most work in theoretical syntax does not yet incorporate this kind of evidence, the last decade has seen a surge in the use of artificial language learning experiments in research on theoretical phonology. For the most part, these studies have explicitly focused on statistical typological tendencies, attempting to show that typologically common patterns are acquired (or inferred) more readily than rare patterns. In other words, this research is focused on uncovering cognitive biases which might explain a given typological distribution.

Such biases are difficult to test with natural language acquisition data alone, since no two natural languages will differ only in the phenomenon of interest. Further, the researcher cannot control the frequency with which particular learners might receive relevant information in the input. Where research has in fact focused on apparently non-defeasible principles, the problem is obvious: There simply are no natural language acquisition data available. Using artificial language learning experiments makes it possible to perfectly match languages aside from properties of interest, to control input frequency, and to compare learning of attested vs. unattested and common vs. rare linguistic patterns.
In addition to this, these methods allow us to explore whether the same biases are found across development, how they might be amplified or dampened by language experience, and how widely they apply across cognitive domains. In the remainder of this chapter, I will first outline four general artificial language learning methods used for research on syntax. I will then give some specific examples of results obtained using these methods, briefly making connections to similar work in phonology where relevant. Because artificial language learning is relatively new in the field of syntax, I will end by discussing the directions this research must head in the future to fulfill its potential to answer key questions for syntactic theory.

8.2 Key methods and results

8.2.1 A description of four widely used artificial language learning paradigms

Table 8.1 provides a brief summary of the four main paradigms I will discuss in this chapter, along with a selection of key references. These paradigms differ in terms of what the input looks like and what the behavioral measure of interest is. It should be noted that although I will continue to refer to them collectively as artificial language learning paradigms, they do in fact differ in the extent to which “learning” is actively involved.

The first paradigm, which I call “ease of learning,” is perhaps the most basic one. Participants are typically taught one of a set of patterns of interest, and are tested on how accurately (within some set number of trials) they are able to learn it. Accuracy levels are then compared across patterns. For example, Tabullo et al. (2012) teach participants a language with one of the six basic word order patterns, and test which of the six patterns is learned better. A variant of this paradigm traces the learning trajectory over trials, in some cases without an explicit training block, in order to assess how quickly a given pattern is acquired.

The “poverty-of-the-stimulus” (POS) paradigm was first used in Wilson (2003; 2006), to explore biases in learning of phonological alternations. As its name indicates, the paradigm exposes participants to impoverished input, missing evidence that would support one hypothesis over another. For example, the input may be ambiguous between two grammatical rules, with held-out data designed to allow researchers to observe whether participants inferred one rule or the other. To take an example from syntax (discussed in detail below), Culbertson and Adger (2014) train participants on nouns with single modifiers (e.g. adjectives or numerals), leaving ambiguous the relative order of modifiers when more than one is present.
Thus it differs from a basic ease-of-learning design in using held-out data to test how learners trained on a subset of the artificial language extrapolate to structures of interest in the absence of explicit evidence. Note that this is not simply testing generalization to new stimuli of the same kind (something which is commonly done across paradigms), but to patterns or structures which differ from exposure stimuli in some critical way. In more recent work in syntax, this paradigm has instead been called the “extrapolation paradigm” (in a nod to this distinction, and incidentally also disassociating the paradigm from the contentious POS debate).

The “regularization” paradigm involves training learners on a variable or probabilistic distribution of alternative structures in the input, for example two possible words for a given meaning, or two different word orders used to describe a given scene. The input language is thus similar to what a learner might encounter during a period of linguistic innovation, contact, or change. Evidence for a bias comes from observing if and how learners alter the input distribution in their output when tested. Using one of the alternatives more frequently than would be justified by the input is known as “regularization.” Generally speaking, regularization has been used as an alternative measure of ease of learning, or preference; patterns that are regularized are taken to be preferred by learners. For example, Culbertson et al. (2012) teach participants variable systems of noun phrase order (adjectives and numerals come before or after the noun with some probability), and test whether learners shift these variable systems to bring them in line with a proposed preference for word order “harmony” (i.e. consistent order of heads and different modifiers).

The fourth paradigm, called the “silent gesture” paradigm, involves eliciting manual gestures, typically in the context of improvisation (with no input) rather than learning (thus it is essentially an extreme version of the “poverty-of-the-stimulus” paradigm, although see Motamedi et al. 2019). Hearing participants with no previous knowledge of a sign language are given a set of meanings which they must communicate using gesture alone, without any speech. Researchers measure similarities among the gestures they produce, as an indication of constraints or biases which guide behavior when no input or linguistic model is present. This paradigm was first used by Goldin-Meadow et al. (2008), who observed similarities in gestures conveying simple transitive events among speakers of several different languages.

Table 8.1 Summary of key artificial language learning methods

| Paradigm | Brief description | Key references |
| --- | --- | --- |
| Ease of learning | Learners are trained on patterns of interest. Speed or accuracy of learning is compared across patterns. | Morgan et al. (1989), Musso et al. (2003), Culbertson et al. (2017) |
| Poverty-of-the-stimulus (Extrapolation) | Learners are trained on input data that is ambiguous between hypotheses of interest. Extrapolation to disambiguating held-out data is observed. | Wilson (2006), Culbertson and Adger (2014), Martin et al. (2020) |
| Regularization | Learners are trained on variable data. Degree of regularization of input distributions is observed (or compared across patterns). | Hudson Kam and Newport (2009), Culbertson et al. (2012), Fedzechkina et al. (2012), Culbertson et al. (2020a) |
| Silent gesture | Typically, no input is provided. Participants’ improvised gestures or gesture sequences are observed. | Goldin-Meadow et al. (2008), Schouwstra and de Swart (2014), Futrell et al. (2015a), Culbertson et al. (2020b) |

8.2.2 Some methodological notes

Before heading into discussion of specific experiments and findings using artificial language learning, it is worth highlighting a couple of general methodological issues. First,

because this is a relatively new area of inquiry, the methods are still under development to some degree. For example, the size and complexity of the artificial languages used varies widely—from studies using many novel lexical items and exposure over several days (Hudson Kam and Newport 2009; Fedzechkina et al. 2012) to studies using native or pseudo-native lexical items and lasting 10–20 minutes (Smith and Wonnacott 2010; Culbertson and Adger 2014). Whether these methodological differences matter is not well understood. In addition, as I will discuss further below, the populations targeted by these experiments are often limited to adults, who necessarily bring their experience with their native language to the task. The specific language has historically often been English (an issue shared with (psycho)linguistics, psychology, and indeed cognitive science more generally), although more recent work includes evidence from other populations. Setting aside experience with a particular language, that this research mainly targets adult participants is also important to note in the context of broader questions about the hypothesized linking mechanisms between individual-level cognitive biases that play out in the lab and population-level forces which lead to typological distributions. Specifically, there is robust debate in the field as to whether the biases of adults or children, L1 learners/speakers or L2 learners/speakers, primarily shape language typology (e.g. see Lightfoot 1997; Bybee 2009; Lupyan and Dale 2010). This issue is beyond the scope of this chapter, but should be kept in mind by the interested reader. Finally, it is worth mentioning that, like all experimental methodologies, artificial language learning is by design very much simplified compared to the real-world equivalent. This has, in my experience, led to much skepticism that these methods could in fact tap into mechanisms that are at play in natural language (learning). 
As I have outlined above, however, there are important limitations to all sources of data in linguistics, and therefore converging evidence is extremely valuable to the field. To the degree that these sources of evidence align, we can be confident that the results we see are meaningful.

8.2.3 Early and foundational studies

The first artificial language- (or “grammar”-) learning experiments, such as those conducted by Braine (1963) and Reber (1967), were aimed at understanding whether language was acquired via implicit learning, as argued by Chomsky and other early generativists (Chomsky 1957). The “languages” in most early artificial language studies comprised strings of nonsense words generated by a finite-state grammar, with no corresponding semantic content.¹ Later studies began using semantically meaningful strings, following Moeser and Bregman (1972; 1973), who showed that syntactic rules are much more readily acquired under these conditions. The general idea was to observe how learners gradually pick up on the underlying structure of the language over the course of the experiment. Of particular interest was whether participants who accurately learned the system could verbalize their knowledge, or whether it remained implicit. Indeed, these early studies have prompted a huge amount of further work on implicit learning of grammatical rules (e.g. Mathews and Roussel 1997; Williams 2005).

¹ It’s worth noting in relation to this that there is a huge literature on the use of artificial language learning paradigms to explore the ability of humans and other animals to acquire center-embedding, which is sometimes argued to be a key feature of language, and something only humans can acquire. This claim is very contentious, and it turns out center-embedding is very hard to learn, at least in the context of meaningless strings of nonsense words. For reasons of space, I will not discuss these studies further, but point the interested reader to: Fitch and Hauser (2004); Perruchet and Rey (2005); de Vries et al. (2008); Lai and Poletiek (2011), among many others.

Braine and colleagues also went on to use these methods to explore implicit learning of grammatical categories including morphological gender. For example, Braine et al. (1990) and Brooks et al. (1993) showed that adults and children can acquire artificial gender classes when phonological cues are present, but not when classes are completely arbitrary. This was a theoretically significant finding, since early models of morphological acquisition predicted easy learning of arbitrary subclasses (e.g. Anderson 1983; Pinker 1984).

While most of these early studies focused on general features of the learning mechanisms involved in acquiring grammatical knowledge, researchers were already beginning to compare learning of different types of systems. Frigo and McDonald (1998) compared ease of learning of artificial gender systems depending on the position and weight of phonological cues to noun class. A series of studies by Newport and colleagues (Morgan and Newport 1981; Morgan et al. 1987; 1989) showed that learning artificial phrase structure grammars was facilitated by cues to constituent structure (e.g. morphological dependencies and prosody). Smith et al. (1993) and later Musso et al. (2003) compared learning of rules from an existing language (e.g. passive formation in Italian), and hypothetical “impossible” rules (e.g. negation by changing word order).
In a now classic set of studies, Hudson Kam and Newport (2005; 2009) sought to directly observe how learners might change a language in failing to accurately learn it. The idea was that these changes might reveal learners’ preferences or expectations about how natural languages should be structured. Research investigating how deaf children acquire sign language from second-language learner caregivers (Singleton and Newport 2004) suggested that unstructured errors in the input were not reproduced by children. Where adults sometimes produced a correct form, but other times omitted required morphological markers, children appeared to acquire deterministic rules. Regularization of unstructured variation has also been documented in research on creolization, a common claim being that child learners reduced variability during the process of creolization (DeGraff 1999; Singler 2006). Hudson Kam and Newport (2005; 2009) developed the ‘regularization’ paradigm, to investigate this phenomenon in the lab. Adult and child participants were trained on languages in which a determiner was used unpredictably (e.g., one language had the determiner present in most cases, but absent in the remainder). The frequency with which learners used the determiner in subsequent productions was measured. The findings revealed that children tended to produce systems which were more consistent or regular than the input, while adults only did this when the input was particularly complex. This regularization behavior (sometimes also measured as a drop in entropy compared to the input) has now been investigated

extensively to show how children and adults might push linguistic and non-linguistic systems to be more deterministic over generations of learners (Hudson Kam and Chang 2009; Reali and Griffiths 2009; Smith and Wonnacott 2010; Culbertson et al. 2012; Perfors and Burns 2010; Ferdinand et al. 2019; Saldana et al. 2022).

While the regularization paradigm was originally designed to investigate learners’ expectations about linguistic variation, experimental paradigms developed at the same time by theoretical phonologists focused on investigating the factors influencing the content and structure of phonological rules. For example, influential studies by Wilson (2003; 2006), Pycha et al. (2003), and Moreton (2008) used the poverty-of-the-stimulus and ease-of-learning paradigms to test competing hypotheses from two alternative theoretical approaches. Evolutionary phonology argues that typological generalizations reflect pathways of phonologization, common articulatory and/or perceptual errors which accumulate over generations (Ohala 1993; Blevins 2004). By contrast, substantively biased approaches to phonological theory argue that typology reflects biases in the grammatical systems of individuals (Steriade 1997; Hayes 1999). The latter predicts that the effects of individual biases should be apparent in the learning of phonological patterns, straightforwardly testable using artificial language learning.

These two strands of research (from developmental psychology on the one hand, and theoretical phonology on the other) built the foundation for experimental work investigating specific biases hypothesized to shape properties of syntactic systems. In the next sections, I will focus on three main categories of phenomena: simplicity (syntactic patterns reflecting a preference for lower representational complexity), naturalness (patterns reflecting meaning or conceptual knowledge), and communicative efficiency (patterns reflecting a trade-off between simplicity and ambiguity avoidance).
I will end with discussion of some additional studies on processing and perception. As noted above, the majority of these studies target adult learners, but see Culbertson and Schuler (2019) for a review focused on relevant artificial language learning research in children.
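The entropy-based measure of regularization mentioned earlier in this section can be made concrete with a short sketch. It is only an illustration under stated assumptions: the function names, the `"det"`/`"no_det"` labels, and the 60% and 90% proportions are hypothetical and are not taken from any of the studies cited. The idea is simply that a learner whose productions are more predictable than the training input shows a positive drop in Shannon entropy.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution over labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_drop(input_labels, output_labels):
    """Input entropy minus output entropy.

    Positive values mean the learner's output is more regular
    (less variable) than the input it was trained on.
    """
    return shannon_entropy(input_labels) - shannon_entropy(output_labels)


# Hypothetical data (assumption): the determiner appears in 60% of training
# utterances, and the learner produces it in 90% of test utterances.
training_input = ["det"] * 60 + ["no_det"] * 40
learner_output = ["det"] * 90 + ["no_det"] * 10

drop = entropy_drop(training_input, learner_output)
print(f"Entropy drop: {drop:.3f} bits")
```

A drop near zero would indicate probability matching, while a learner who produced only one variant would reduce output entropy to zero and so show the largest possible drop for that input.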

8.2.4 Simplicity

Observationally, linguistic systems which are formally less complex are often more common than more complex alternatives. This has been incorporated into linguistic theories, particularly in phonology (Martinet 1968; Clements 2003; Prince and Smolensky 1993/2004), and has been proposed as a general inductive bias in the broader cognitive science literature (e.g. Chater and Vitányi 2003). In generative syntax, economy principles have been claimed to constrain representations and operations (e.g. Chomsky 1957; Chomsky and Lasnik 1993; Grimshaw 1997). Simplicity is also a key notion in definitions of communicative efficiency (see Section 8.2.6). A number of artificial language learning studies have sought to provide behavioral evidence that simpler structures are easier to learn, or more likely to be implicitly assumed by learners in the absence of evidence in the input. For example, ease of learning of artificial phonological patterns generally correlates with the number of features relevant to the pattern (see Moreton and Pater 2012 for a review of such studies in phonology). In syntax,

artificial language learning has been used to investigate one of the best-known observations concerning word order, alternatively called harmony or consistent head-direction (Greenberg 1963; Travis 1984; Dryer 1992; Baker 2001; Hawkins 2004; Cinque, to appear), which a number of researchers have argued to reflect simplicity.

8.2.4.1 Word order harmony

Harmonic word order patterns are those which preserve a consistent order of syntactic heads relative to modifiers and other dependents, across a range of phrase types. For example, a language which has Object–Verb order in the VP, Noun–Postposition order in the PP, and Adjective–Noun order in the NP, is harmonic (head-final) across these phrases. A number of alternative mechanisms have been proposed to explain harmony, including representational simplicity (e.g. Vennemann 1976; Pater 2011; Culbertson and Newport 2015; Culbertson and Kirby 2016), processing ease (e.g. Hawkins 1994), a head-direction “parameter” (e.g. Travis 1984), common grammaticalization pathways (e.g. Whitman 2008) or accidental, lineage-specific effects (e.g. Dunn et al. 2011). As a general tendency with many exceptions, harmony is thus subject to the issues of interpretation highlighted above (see also Ladd et al. 2014): It is likely to reflect a bias rather than a hard-and-fast constraint, but based on the typology alone we cannot say with certainty that any cognitive (or linguistic) bias for harmony exists.

Culbertson et al. (2012) sought to investigate the cognitive underpinnings of word order patterns in the nominal domain, including harmony among noun phrases with an adjective and numeral modifier. In a relatively large sample of languages (Dryer 2013a,b), harmonic orders (e.g., both modifier types prenominal, or both postnominal) outnumber non-harmonic orders (e.g., one modifier prenominal, and the other postnominal). In addition, among both harmonic and non-harmonic patterns, postnominal adjectives are more common: the harmonic pattern N-Adj with N-Num is relatively more common than Adj-N with Num-N, and among the non-harmonic patterns N-Adj with Num-N is more common than Adj-N with N-Num (“Universal 18,” Greenberg 1963). To investigate whether these two tendencies reflect cognitive biases in learning, Culbertson et al.
(2012) used the regularization paradigm with adult English speakers. Participants were taught a language with simple noun phrases consisting of a noun and a single modifier, either an adjective or a numeral (i.e. (dis)harmony was at the level of the grammar, not the utterance, cf. Hawkins 1994). Participants were trained on an input language featuring one of the four patterns (pre- or postnominal harmonic, N-Adj with Num-N or Adj-N with N-Num) as the dominant one in the language. However, order in each language was variable, and thus in principle any combination of noun with adjective or noun with numeral could appear in either order. The results showed that participants trained on the two dominant harmonic patterns regularized most, and participants trained on the rare Adj-N, N-Num pattern regularized the least. This result was partially replicated with child learners in Culbertson and Newport (2015), who showed that 6–7-year-old children strongly prefer harmonic patterns, and in general shift non-harmonic input distributions dramatically toward harmonic (with no apparent distinction between the two non-harmonic patterns; see
also Culbertson and Newport 2017). Notably, English is a harmonic language, thus a preference for harmony could reflect abstract structural transfer—i.e. the results might reflect prior language experience rather than a universal bias with the potential to shape typology. Culbertson et al. (2020a) therefore used this same design to test French- and Hebrew-speaking children and adults, whose native language pattern is predominantly non-harmonic (N-Adj, Num-N). In all cases, a bias for harmony was replicated (though it also interacted in complex ways with both L1 and L2 experience in these populations). Finally, it is worth noting that the notion of harmony is mainly discussed as cross-category harmony: alignment across different types of phrases, like VP and PP. Recent research has used the extrapolation paradigm to provide evidence for a harmony bias in this context (Wang et al. submitted). The experiments outlined above primarily provide evidence of a preference for harmony—a bias likely related to representational simplicity. Follow-up work has sought to understand this as an emergent bias reflecting ease of generalization in grammatical rule learning (Pater 2011), a prior bias over grammars (Culbertson and Smolensky 2012; Culbertson et al. 2013), or as a more general bias for simplicity operating across cognition (Culbertson and Kirby 2016). While additional evidence is needed to adjudicate between these theoretical approaches, the original motivation for Culbertson et al. (2012) was in fact the potential distinction between non-harmonic patterns mentioned above (Universal 18). In their study, English-speaking adult learners particularly disfavored combining Adj-N with N-Num (see Culbertson et al. 2013 for a replication of this effect). One possible explanation is that this pattern is the worst of the worst; it is not only non-harmonic, but uses prenominal adjectives, which are independently disfavored.
For example, postnominal adjectives may have an advantage over prenominal adjectives because they set up the object to be modified first, so that the adjective can immediately be interpreted in context (Kamp and Partee 1995; Rubio-Fernandez et al. 2020). This leads us to another set of experiments exploring the idea that some linear orders may be preferred over others for semantic or conceptual reasons, rather than simplicity.
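The regularization results reported earlier in this section compare how variable modifier order is in the input with how variable it is in learners' productions. One way to make that comparison concrete is sketched below. It is only an illustration: the function names and the 70% and 90% figures are hypothetical and are not taken from Culbertson et al. (2012), and treating regularization as a drop in Shannon entropy is an assumption made here for exposition, not necessarily the measure used in the studies cited.

```python
import math

def order_entropy(p_dominant):
    """Shannon entropy (in bits) of a two-way choice between a dominant
    order (probability p_dominant) and its alternative."""
    entropy = 0.0
    for p in (p_dominant, 1.0 - p_dominant):
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

def regularization(p_input, p_output):
    """Entropy drop from input to output. Positive values mean the
    learner's productions are less variable than the input."""
    return order_entropy(p_input) - order_entropy(p_output)

# Hypothetical numbers: the dominant order appears 70% of the time in the
# input and 90% of the time in one learner's productions.
print(round(regularization(0.70, 0.90), 3))
```

Note that entropy ignores which order wins: a learner who settled on the minority order would also count as regularizing under this measure, so in practice it would be paired with the proportion of dominant-order productions.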

8.2.5 Naturalness

The idea that some syntactic structures or orders are more “natural” than others is loosely related to the notion of naturalness commonly appealed to in phonology. In phonology, more natural patterns are phonetically motivated (e.g. vowel harmony as opposed to disharmony), and a number of artificial language learning studies have sought to explore whether such rules are preferred by learners (e.g. Wilson 2006; White 2014; Martin and White 2019). What constitutes a natural pattern in syntax is not as straightforward, however: The term has been used to refer to patterns which are motivated by semantic or conceptual considerations. There are two strands of research using artificial language learning to explore naturalness biases in syntax: The first investigates basic word order, and the second again tackles word order in the noun phrase.

jennifer culbertson

8.2.5.1 Basic word order

The typological frequency of basic word order patterns is highly skewed: SOV is the most frequent, followed by SVO, with a steep drop-off for VSO, VOS, OVS, and finally OSV. Most theoretical accounts of the typological differences among these orders focus on a general preference to have subjects (or animate things) first (e.g. Gibson 2000; Jackendoff 2002; Demiral et al. 2008), and a preference for grouping the verb and the object together (Baker 2009; Gibson 2000). The two most frequent orders, SOV and SVO, conform to both of these, while the less frequent orders violate one or both. While some psycholinguistic studies suggest these constraints might explain word order variation within a language (e.g. Demiral et al. 2008), typological frequency data alone does not tell us whether these kinds of constraints shape the overall frequency of basic ordering patterns (see e.g. Maurits and Griffiths 2014). A handful of artificial language learning studies using an ease-of-learning design have attempted to provide evidence for a cognitive bias favoring frequent basic orders over infrequent ones. Tily et al. (2011) taught English-speaking adults a language featuring one of the six basic word order patterns. Participants were tested on comprehension (choosing which picture corresponded to a given sentence in the language) and production (clicking on words in the lexicon to construct a sentence corresponding to a given picture). No differences in comprehension were found, but production accuracy was higher in the SVO and SOV input conditions compared to the remaining four patterns. Similarly, Tabullo et al. (2012) taught Spanish-speaking adults a miniature artificial language with SVO, SOV, OSV, or VSO word order. They found that accuracy rates (as measured by a grammaticality judgment task) for SVO and SOV were higher than for OSV and VSO, roughly mirroring the typological frequency of these patterns.
Neither of these studies revealed a clear difference in learning between SVO and SOV, despite the frequency difference between them (Greenberg 1963; Dryer 1997), and despite the fact that all learners in these studies were speakers of SVO languages. Interestingly, a number of researchers have shown that SOV is more likely than SVO to arise in the early stages of spoken and signed languages, though in some cases SOV later changes to SVO (Givón 1979; Fischer 1975; Sandler et al. 2005; Gell-Mann and Ruhlen 2011). This suggests the possibility that even if there is no learnability difference between these orders, SOV order may be innovated more frequently. An important series of studies explores this possibility using the “silent gesture” paradigm, first introduced by Goldin-Meadow et al. (2008). In this study, non-signing adult participants who were native speakers of English (SVO), Spanish (SVO), Chinese (SVO/SOV), or Turkish (SOV) were shown pictures of basic transitive events, like a scene in which a girl covers a box. They were asked to communicate these events using only their hands. Across all language groups, participants consistently provided gesture sequences in which the agent was first, then the patient, and finally the action. By contrast, when participants were asked to verbally describe the same scenes, they used either SVO or SOV according to the basic word order in their native language. How exactly these gestures relate to or consist of linguistic structure is not possible to say; it seems unlikely
that participants were producing the grammatical categories Subject–Object–Verb, and perhaps they were not even producing a (single) sentence. Goldin-Meadow et al. (2008) suggest that conveying the agent and patient together before the action is a natural way of representing this kind of event—it highlights the entities involved in an action before the relational action itself. By extension, we can say that in natural language this corresponds to SOV. This result generated a number of follow-up studies probing the conditions under which gesturers might switch to SVO (Schouwstra and de Swart 2014; Hall et al. 2014; Marno et al. 2015), and how gesture order might be affected by (or conditioned on) event semantics (Schouwstra and de Swart 2014; Gibson et al. 2013; Hall et al. 2013), or speaker perspective (Kirton et al. 2021). One of the most intriguing findings from these studies is the apparent difference between gesture order for reversible and nonreversible events. In Goldin-Meadow et al. (2008), almost all the transitive events involved a human actor and an inanimate object patient. These are non-reversible events, with no ambiguity concerning which participant is the agent and which is the patient. Gibson et al. (2013) showed that reversible events—where either participant could in principle play either grammatical role—are more likely to trigger SVO gesture order. For example, when participants were shown a scene with a fireman kicking a ball, they were likely to gesture the fireman first, then the ball, and finally the kicking action. By contrast, when participants were shown a scene with a fireman kicking a girl, they were more likely to gesture the fireman, then the kicking action, then the girl. Futrell et al. (2015a) replicated this pattern of SOV for non-reversible and SVO for reversible events in a larger set of language populations. 
This included speakers of two SVO languages (English and Russian), and two VSO languages (Irish and Tagalog), suggesting that the strong subject-initial bias found in previous studies may hold even if gesturers’ native language does not conform to this. Why exactly gesturers shift from SOV to SVO for certain types of events is not yet clear. Gibson et al. (2013) argue that SVO is more robust to noise, preserving information about grammatical relations even if one of the noun phrases is obscured (the “noisy-channel” hypothesis). Hall et al. (2013; 2014) suggest a number of other possibilities including a gesture-specific constraint on producing a human patient gesture between a human agent and the action it takes. The role of modality-specific constraints is in this case particularly important given the findings reported above using spoken artificial language stimuli. However, the differences found between SVO and SOV may stem from constraints on semantic or conceptual naturalness, brought out most clearly in tasks which involve improvisation rather than learning (Schouwstra et al. 2016).
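The two ordering preferences introduced at the start of this section (subject first, and verb adjacent to object) can be checked mechanically against all six basic orders. The sketch below is a simplified illustration: the function names are hypothetical, and "subject first" is read strictly as "the subject occupies the first position", which is one possible reading of the preference.

```python
from itertools import permutations

def subject_first(order):
    """True if the subject is the first element of the order."""
    return order[0] == "S"

def verb_object_adjacent(order):
    """True if verb and object are next to each other."""
    return abs(order.index("V") - order.index("O")) == 1

# All six logically possible orders of S, O and V.
orders = ["".join(p) for p in permutations("SOV")]
conforming = [o for o in orders if subject_first(o) and verb_object_adjacent(o)]

for o in orders:
    print(o, subject_first(o), verb_object_adjacent(o))
```

Under this reading, SOV and SVO are the only orders that pass both checks; VSO and OSV fail both, and VOS and OVS fail only the subject-first check.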

8.2.5.2 Universal 20

The idea that some word orders might more naturally reflect semantic relations is also at play in the noun phrase. Above we discussed the possibility that Greenberg’s Universal 18 might result from a (simplicity) preference for harmony combined with a (naturalness) preference for postnominal adjectives. Here we will focus on naturalness in the context of Greenberg’s (1963) Universal 20, which concerns the relative order of
nominal modifiers including adjective, numeral, and demonstrative (rather than how they are ordered relative to the noun). Universal 20 has generated sustained interest among typologists and syntacticians, and the general restrictions posited to be relevant for this universal have been argued to govern word order patterns in other domains as well (e.g. ordering of verbal elements, Koopman and Szabolcsi 2000). The original statement of the universal highlighted three patterns, picking out Dem-Num-Adj-N as the most common prenominal pattern, and N-Dem-Num-Adj along with N-Adj-Num-Dem as the two most common postnominal patterns. Subsequent work has focused largely on accommodating additional patterns, including many which were unattested in Greenberg’s original sample (e.g. Hawkins 1979; Cinque 2005; Abels and Neeleman 2012; Dryer 2018; Steedman 2020). Most of these accounts assume semantic or structural distinctions among nominal modifiers that can be described in terms of scope, or semantic compositionality. Intuitively, adjectives modify dimensions inherent to noun meaning and are therefore typically claimed to take innermost scope, composing with the noun first. Numerals then compose with the semantic constituent including both the noun and adjective, therefore taking scope over this unit. Demonstratives serve to connect nominal material to the surrounding discourse, and therefore compose last, taking highest scope over the semantic constituents containing the numeral, adjective and noun (see also Partee 1987; Adger 2003; Rijkhoff 2004). Importantly, these structural relations are argued to influence but not fully determine linear order. Some noun phrase word orders can be derived from them directly, simply by choosing an order for each sub-constituent—N relative to Adj; Num relative to the constituent containing N and Adj; Dem relative to the sub-constituent containing the other two constituents.
There are eight such linearizations, labelled “homomorphic” following Martin et al. (2020). For example, [Dem [Num [Adj N ] ] ], [ [ [N Adj] Num] Dem], and [Dem [ [N Adj] Num ] ] are homomorphic (as illustrated by the structural bracketing), but N-Dem-Num-Adj is not. The homomorphic orders are well attested, and indeed they are all among the most frequent. By contrast, non-homomorphic patterns are generally less common (though based on the latest typological data almost all possible orders are attested, Dryer 2018). This can be seen most clearly for Greenberg’s original three patterns: homomorphic Dem-Num-Adj-N and N-Adj-Num-Dem are the two most frequently attested patterns (note they are also harmonic), while non-homomorphic N-Dem-Num-Adj is attested but rare. Despite decades of work implicating naturalness—i.e. the relation between semantic or conceptual structure and linear order—in this domain, there has been scant behavioral evidence for a bias favoring homomorphic orders. Culbertson and Adger (2014) provided the first such evidence using the poverty-of-the-stimulus (extrapolation) paradigm. Notably, as mentioned above, this paradigm shares with the silent-gesture paradigm the focus on improvising or extrapolating beyond any input data. In Culbertson and Adger (2014), English-speaking adults (Dem-Num-Adj-N) were exposed only to phrases with a single postnominal modifier, so they had no evidence about the relative order of modifiers. They found that participants nevertheless implicitly assumed homomorphic order when extrapolating to a
phrase with multiple modifiers—i.e. they chose orders like N-Adj-Dem over N-Dem-Adj. These findings were replicated for Thai-speaking adults (N-Adj-Num-Dem) in Martin et al. (2019). Importantly, these studies used real lexical items (either English or Thai) and visually presented phrases to participants. In order to be sure that these results were not driven by a strategy to “flip” native language orders (thus deriving homomorphic orders without a homomorphism bias), Martin et al. (2020) replicated this study with a fully artificial language, no visual presentation of phrases, and oral production at test. As for harmony, however, the homomorphism bias found in English and Thai speakers could in principle reflect abstract transfer. The best evidence against this interpretation would be to replicate these results in speakers whose native language is non-homomorphic. That presents a challenge, as such languages are rare; Culbertson et al. (2020b) therefore instead used the silent gesture paradigm to investigate whether the bias for homomorphism was present when participants used a modality distinct from that of their native language. Recall that in work on basic word order using this paradigm, participants did not use their native language order, suggesting that this paradigm may be less likely to elicit native language transfer. Culbertson et al. (2020b) found that English-speaking adults using gesture to convey pictures of simple objects with a pattern or size, numeral, and location contrast consistently improvised homomorphic orders. Moreover, they showed a bias for postnominal adjectives, relating back to the typological findings discussed in section 8.2.4. They also sketch an explanation for how the underlying structure in this domain—which groups adjectives closest to the noun, then numerals, with demonstratives furthest away—might be learned by observing objects and their properties in the world.
Briefly, they show that an information-theoretic measure of conceptual closeness predicts just this asymmetry among properties conveyed by the three modifier types.
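The homomorphic linearizations discussed in this section can be enumerated mechanically: starting from the noun, each modifier (Adj, then Num, then Dem) is placed either before or after the constituent built so far, giving 2 × 2 × 2 = 8 orders. The sketch below is a minimal illustration of that procedure; the function name is hypothetical.

```python
from itertools import product

def homomorphic_orders():
    """Linearize [Dem [Num [Adj N]]] by choosing, at each level, whether
    the modifier precedes or follows the constituent it combines with."""
    orders = []
    for adj_first, num_first, dem_first in product((True, False), repeat=3):
        phrase = ["N"]
        choices = (("Adj", adj_first), ("Num", num_first), ("Dem", dem_first))
        for modifier, first in choices:
            # Attach the modifier to the left or right edge of the phrase.
            phrase = [modifier] + phrase if first else phrase + [modifier]
        orders.append("-".join(phrase))
    return orders

for order in homomorphic_orders():
    print(order)
```

The eight orders produced include Dem-Num-Adj-N, N-Adj-Num-Dem, and Dem-N-Adj-Num; N-Dem-Num-Adj is not among them, because Dem cannot end up between N and the other modifiers when it attaches last.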

8.2.5.3 Universal 39

The idea that elements which are conceptually closer to each other are placed closer together is also potentially at play in yet another of Greenberg’s 1963 observations: Universal 39. This states that in languages with distinct number marking and case marking, number is always placed closer to the noun stem than case. As for the noun phrase, it has been hypothesized that this results from scope or semantic composition; for example, morphemes which more directly affect or modify the semantic content of the stem have narrower scope (e.g. Baker 1985; Bybee 1985). However, there are a number of competing hypotheses for why particular patterns of morpheme order might hold, some of which also apply to word order more generally. For example, Hay (2001) argues that morpheme order depends on the degree to which a particular morpheme is easily parseable from the stem (related to how frequently the stem occurs without that morpheme), and relatedly Hahn et al. (2020b,a) argue that word and morpheme order reflect the degree of dependency between the elements in question (related to the information-theoretic notion of surprisal). Both of these accounts are frequency-driven, and do not themselves depend on the meaning of
the elements in question. Saldana et al. (2021) test these alternative hypotheses using the extrapolation paradigm. They trained English- and Japanese-speaking adults on a language in which number and case markers were either both prenominal or both postnominal, but there was no information about their relative order. They controlled for frequency, such that nouns were equally likely to occur with number or case marking. Both populations were highly likely to infer an order in which number came closer to the noun than case. Evidence from these two populations allows Saldana et al. (2021) to conclude that this finding is likely to be independent of prior language experience; English does not have case, and Japanese does not have number marking (at least not of the type used in the study), and yet both populations showed the same preferences. This suggests that naturalness is indeed at play in morpheme order, although it likely interacts with other surface features of the language, like relative frequencies of different morphemes.2

8.2.6 Communicative efficiency

Hahn et al. (2020b,a) appeal to the notion of efficiency in their work on word and morpheme order. While the results of Culbertson and Adger (2014) and Saldana et al. (2021) suggest this may not necessarily be the ultimate explanation at least for Universals 20 and 39, efficiency has been argued to be at play in a wide range of syntactic (and semantic) domains. In this section, we turn to a set of experiments which aims to test the basic hypothesis that communicative efficiency shapes (morpho)syntax. One common informal definition of efficiency is a balance between simplicity and ambiguity avoidance. A number of studies, often based on cross-linguistic corpora, suggest that languages tend to be efficient—as simple as they can be while maintaining communicative utility (e.g. Gibson et al. 2019; Hahn et al. 2020b). Artificial language learning experiments using a fifth paradigm—typically called the “iterated learning paradigm”—have shown that languages evolve to be efficient when participants use them to communicate, and then pass them on to new “generations” of participant-learners (Kirby et al. 2015; Motamedi et al. 2019). However, there is also evidence using the regularization paradigm suggesting that learning on its own may lead to the creation of efficient languages.

2 This is not the only study suggesting that naturalness may play a role in morphology or morphosyntax. Recent work has also used artificial language learning (both extrapolation and ease-of-learning) to show that certain typologically common person systems (i.e. as instantiated in personal pronoun paradigms) are easier to learn than others, after controlling for paradigm simplicity (Maldonado and Culbertson 2020). Maldonado et al. (2020) also use a design similar to Saldana et al. (2021) to explore the relative order of person and number morphemes. In both cases, substantial theoretical work exists, but typological data is sparse, making experimental data of this sort particularly important.

8.2.6.1 Differential case marking

Fedzechkina et al. (2012) explore so-called differential case marking (DCM) systems, a prime example of a phenomenon used to argue for the role of communicative efficiency in language (e.g. Comrie 1978; Croft 1990; Jäger 2007). DCM systems can be characterized as efficient because rather than case marking all event participants (which would minimize ambiguity but maximize complexity), they typically mark only participants that are potentially ambiguous. For example, a typical DCM system might target unusual, or non-prototypical arguments—e.g. inanimate subjects, or animate objects. These types of arguments occur in contexts that are potentially ambiguous: If a sentence has an animate object, this object might be mistaken for the subject, if other reliable cues to grammatical roles are absent. The hypothesis that efficiency drives DCM systems has remained in play in the face of typological and theoretical evidence calling it into question (e.g. Aissen 1999; Haspelmath 2008; Bickel et al. 2015; Levshina in press). Fedzechkina et al. (2012) use the regularization paradigm to test whether English-speaking adults will reorganize a language with random variation in word order and case marking such that a DCM system emerges. Participants were taught a language in which case marking appeared randomly on either objects or subjects regardless of their animacy. In the variable object-marking condition, by the end of training (which lasted four days), participants who produced variable case marking conditioned that marking on animacy, using the marker more often for animate objects. The results were similar, though weaker, for the variable subject case condition. While these results are consistent with an efficiency account of DCM, it is worth noting that participants did not use this language for communication; they simply learned and reproduced it (i.e. described pictures).
Moreover, the input languages in this experiment were set up such that the sentences were not functionally ambiguous. Particular nouns in the language were only ever seen as subjects or objects. Thus this experiment does not necessarily provide clear evidence for a communicative efficiency account of case marking. It could instead reflect, for example, a bias to mark unusual alignments between e.g. animacy and grammatical role (as in Aissen 1999; Haspelmath 2008). Smith and Culbertson (submitted) conducted a replication in which the sentences were actually ambiguous, and found that participants indeed created DCM-like systems, but only when they had to actively use the language they had learned to communicate (i.e. describe pictures to a confederate who interpreted them).
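The key quantity in the object-marking condition described in this section is how often the case marker is produced for animate as opposed to inanimate objects. The sketch below shows one way such conditional marking rates could be computed. The function name and both toy data sets are invented for illustration and are not the data of Fedzechkina et al. (2012).

```python
from collections import Counter

def marking_rate_by_animacy(productions):
    """productions: iterable of (object_is_animate, object_is_case_marked).
    Returns the proportion of case-marked objects for each animacy value."""
    totals, marked = Counter(), Counter()
    for animate, is_marked in productions:
        totals[animate] += 1
        marked[animate] += int(is_marked)
    return {animate: marked[animate] / totals[animate] for animate in totals}

# Hypothetical input: marking is independent of animacy (50% in both cases).
input_language = [(True, True), (True, False), (False, True), (False, False)] * 5

# Hypothetical learner output: marking is now conditioned on animacy.
learner_output = (
    [(True, True)] * 8 + [(True, False)] * 2
    + [(False, True)] * 4 + [(False, False)] * 6
)

print(marking_rate_by_animacy(input_language))
print(marking_rate_by_animacy(learner_output))
```

A DCM-like reorganization shows up as a gap between the two rates in the output that was absent from the input. The sketch says nothing about why the gap arises, which is exactly the point at issue between the efficiency account and the alternatives.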

8.2.7 Experiments on processing and perception of (morpho)syntax

8.2.7.1 Cue position

It is worth noting that there was also an effect of word order in Fedzechkina et al. (2012); case marking was used more often when the case-marked grammatical role was sentence-initial. In the object-marking condition, this meant using the marker when
the sentence order was OSV, while in the subject-marking condition this meant using the marker in SOV sentences (see also Fedzechkina et al. 2017). This is consistent with findings from Pozzan and Trueswell (2015), who show that adult learners are delayed in both comprehension and production of morphology in an artificial language learning task when morphology appears in sentence-final position (Fedzechkina et al. 2015). Pozzan and Trueswell (2015) taught participants one of four languages, which differed along two dimensions. Each language used either head marking or argument marking, and each was either verb-initial or verb-final. In this case, the markers themselves provided information about the type of event (e.g. causative). After two days of learning, on a third day participants were tested on comprehension and production. Performance on both measures was higher for the two verb-initial conditions compared to the two verb-final conditions. The potential role of language processing and comprehension in shaping syntax has been hypothesized by many linguists on the basis of observations from individual languages and typology (e.g. Hawkins 2007; Jäger, 2007; Trueswell et al. 2012). These experimental results provide direct behavioral evidence of individual cognitive processes active during learning that can explain structural properties of syntax—here, common patterns of case marking, and contingencies between word order and morphology.

8.2.7.2 Dependency-length minimization

One of the most well-known hypothesized effects of processing on typology is dependency-length minimization (DLM), the idea that longer dependencies are difficult to process and are generally avoided where possible (e.g., Grodner and Gibson 2005). This can happen in real time in a given language, when variation is conditioned on phrase length. For example, if we assume that a ditransitive verb like took in the sentence Jon took the trash out has three dependents: Jon, out, and trash, then we can shorten the length of the dependencies if we use the alternative Jon took out the trash. Indeed, English exhibits a general short-before-long preference which can be understood in terms of DLM (Arnold et al. 2000). DLM can also happen diachronically, if orders with shorter dependencies become grammaticalized or fixed more often than those with longer dependencies (e.g. Hawkins 1990). In recent work, Futrell et al. (2015b) show that across a large cross-linguistic sample, languages appear to minimize dependency lengths more than expected by chance. Fedzechkina et al. (2018) test whether English-speaking learners will reorganize an artificial language with variable word order in line with DLM in the lab. The input language featured variable order of subject and object (SO or OS), and consistent case marking. In one condition, learners were taught a verb-initial language, in the other they were taught a verb-final language. The languages had adjectives which could modify nouns, and adpositional phrases. In the verb-initial language, the latter were prepositional, while in the verb-final language they were postpositional. This meant that shorter dependency lengths could be achieved in the verb-initial language by ordering the subject first and the object second; for example, SO order as in ‘[punch] S[girl] O[boy on red stool]’ has shorter dependency lengths than OS order as in ‘[punch] O[boy on red stool] S[girl]’, since in the
latter case the longer constituent intervenes between the verb and its other dependent. The ordering required to achieve shorter dependencies is reversed in the verb-final language: Here, OS order results in shorter dependencies, e.g. ‘O[boy on red stool] S[girl] [punch]’, than SO order, as in ‘S[girl] O[boy on red stool] [punch]’. Fedzechkina et al. (2018) find that, indeed, learners take advantage of the variation present in the input language to shorten dependencies in both conditions, using SO order more in the verb-initial condition, and OS order more in the verb-final condition. Importantly, the latter result cannot be explained by the short-before-long preference that is already present in English. This study therefore supports the claim that DLM is a bias active during language learning and/or use that may shape languages both synchronically and diachronically. However, as in previous studies reviewed here (e.g. Culbertson et al. 2012; Culbertson and Adger 2014), there is a possibility of transfer at a more abstract level: English speakers have experience with a language in which dependencies are actively minimized, and therefore they may transfer this expectation to a newly learned language. To tackle this possibility, experiments with different populations are again needed.
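The dependency-length comparison behind this design can be worked through with a small calculation. The sketch below is a simplification made for illustration, not the measure used by Fedzechkina et al. (2018): it counts only the dependencies between the verb and its two arguments, treats each argument as a block of words, and takes the distance to an argument to be 1 plus the number of words in any constituent standing between it and the verb. The function name and the word counts (one word for ‘girl’, four for ‘boy on red stool’) are assumptions of the sketch.

```python
def total_dependency_length(constituents, verb_index):
    """Sum the verb-to-argument distances for one sentence.

    constituents: list of (label, number_of_words) in linear order.
    The distance to an argument is 1 plus the number of words in any
    constituents standing between it and the verb (this ignores where
    the head sits inside each argument).
    """
    total = 0
    for i in range(len(constituents)):
        if i == verb_index:
            continue
        lo, hi = sorted((i, verb_index))
        intervening = sum(n for _, n in constituents[lo + 1:hi])
        total += intervening + 1
    return total

# 'punch', 'girl', 'boy on red stool'
VERB, SUBJ, OBJ = ("V", 1), ("S", 1), ("O", 4)

examples = {
    "verb-initial SO": ([VERB, SUBJ, OBJ], 0),
    "verb-initial OS": ([VERB, OBJ, SUBJ], 0),
    "verb-final SO": ([SUBJ, OBJ, VERB], 2),
    "verb-final OS": ([OBJ, SUBJ, VERB], 2),
}
lengths = {name: total_dependency_length(c, v) for name, (c, v) in examples.items()}
print(lengths)
```

Under these assumptions the total is smaller for SO than OS in the verb-initial language and smaller for OS than SO in the verb-final language, matching the direction of the contrast described above: in both cases the short argument sits next to the verb.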

8.2.7.3 Affix ordering

As discussed in Section 8.2.5.3, there are good reasons to believe that morpheme and word order are shaped by similar pressures: for naturalness, simplicity, and efficiency, among other things. The kinds of processing-related pressures discussed in the previous section have also been argued to shape features at the level both of syntax and of morphosyntax. One well-known example is the so-called “suffixing preference.” It has long been noted that more languages use predominantly or exclusively suffixes compared to prefixes (Greenberg 1963; Dryer 2013c). Hawkins and Cutler (1988) argue that the preference for suffixes arises due to a confluence of pressures on language processing at the word level. Briefly, they argue that the processing system privileges lexical information over grammatical information, thus it follows that stems should be placed at the beginning and affixes at the end. Hupp et al. (2009) propose that placing important information, like stems, at the beginning, and grouping information, like affixes, at the end may reflect a more general perceptual bias. In particular, they show that when English-speaking adults see or hear artificial sequential stimuli (syllables, shapes, musical notes), they rate sequences which differ at their ends as more similar than sequences which differ at their beginnings (see Bruening et al. 2012 for similar findings with English-speaking children). St. Clair et al. (2009) further show that English-speaking adults are better at acquiring two novel categories of artificial lexical items when the categories are marked by an artificial suffix-like element rather than a prefix-like element. This result is also consistent with the idea that humans might have a universal perceptual bias that makes grouping similar words together easier when the similarity is at the end.
However, both these studies (and indeed most previous psycholinguistic evidence looking at affix order) test a participant population whose language is heavily suffixing. By contrast, Martin and Culbertson (2020) replicated Hupp et al. (2009) comparing
English speakers with speakers of the Bantu language Kîîtharaka, which is predominantly prefixing. While they find that English speakers judge sequences as more similar when they differ at the ends, Kîîtharaka speakers show the opposite pattern, judging sequences that differ at the beginning as more similar. This suggests the possibility that what has been called the “suffixing preference” in fact does not reflect a cognitive or perceptual bias, but perhaps is a residue of common historical changes (e.g. Himmelmann 2014).

8.3 The future of artificial language learning in syntax


The experiments discussed in this chapter represent the current state of the art in using artificial language experiments to investigate key features of syntax (and morphosyntax). The major goal of these studies is to provide empirical evidence connecting recurring patterns of syntactic structure with properties of the human cognitive system. The phenomena I have covered here include nominal and basic word order, morpheme order, case marking and its interaction with word order, and dependency length. Along with these phenomena, I have highlighted the high-level cognitive and linguistic mechanisms which have been argued to be at play, including simplicity, naturalness, communicative efficiency, and processing ease. In this last section, I would like to discuss three issues which I believe are critical for the future of artificial language learning as a source of evidence for linguistics in general, and syntax in particular.

The first and perhaps most obvious issue is the range of phenomena studied. While this range is growing year on year, it remains narrow in several respects. First, much of the work targets ordering phenomena, which of course do not fully represent (morpho)syntax. Second, for some syntacticians, these phenomena may not be seen as “core”; there has long been debate about the degree to which, for example, Greenberg’s Universals are something syntax as a field should seek to explain (e.g. Newmeyer 2005; Boeckx 2009). In part this is due to the perception that only exceptionless syntactic generalizations should be explained by syntax. But regardless of whether that view has merit, to have wide value to the field, these methods must be applied to a wider range of topics.

In many of the studies discussed here, the range of participant populations studied also remains narrow.
A full understanding of the extent to which a hypothesized bias is found across speakers of different languages, and across stages of development, is crucial in a number of respects. The strongest evidence for the universality of a bias (i.e. evidence that the bias is at work in all humans, though not in all human languages) comes from showing that it is at work even in speakers of languages which themselves violate it. Several studies discussed above attempt to provide this kind of evidence, for example in studying the harmony bias in speakers of a non-harmonic language (Culbertson et al. 2020a), or the subject-first bias in the gesture orders of speakers of a VSO language (Futrell et al. 2015a), or the suffixing preference with speakers of a prefixing language (Martin and Culbertson 2020). Further, understanding the role that experience with a particular language, and development more generally, plays in shaping biases will help to build more precise theories of the link between individual cognition and language typology. Whether language structure is shaped primarily by first-language learners, by adult second-language learners, or by adult patterns of usage continues to be much debated (Yang 2000; Trudgill 2011; Lupyan and Dale 2010). Artificial language experiments have the potential to advance these debates.

Research using artificial language learning experiments must also continue to feed back into theories of grammar. The question of whether and how to explain violable cognitive or linguistic biases is one of the major issues of contemporary syntax. A number of researchers have attempted to shed light on how syntactic biases can be accounted for in existing theories of grammar (e.g. Culbertson and Smolensky 2012; Culbertson et al. 2013; Pater 2011; Perfors et al. 2011). At the same time, some biases may reflect truly domain-general features of cognition, which interact with grammar, but are not directly encoded in the grammatical system itself (e.g. the so-called regularization bias, Reali and Griffiths 2009; Culbertson et al. 2013; Culbertson and Kirby 2016). While such biases do not necessarily concern syntacticians directly, a complete account of natural-language syntax calls for integrated theories of grammar-internal and external cognitive factors. Indeed, recent computational models have shown that weak biases of this kind are both more likely to have evolved and able to exert relatively strong effects over time (Thompson et al. 2016).
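
The claim that weak biases can have strong effects over time can be illustrated with a toy iterated-learning chain. The sketch below is an assumption-laden illustration, not the model of Thompson et al. (2016): each learner observes n tokens of a two-variant system, takes the posterior mode under a weakly biased Beta(a, b) prior, and then produces the data for the next learner. All parameter values and function names are hypothetical.

```python
import random


def map_estimate(k: int, n: int, a: float, b: float) -> float:
    """Posterior mode of Beta(a + k, b + n - k).

    Assumes a >= 1, b >= 1 and a + b + n > 2, so the mode is well defined.
    """
    return (k + a - 1) / (n + a + b - 2)


def run_chain(generations: int, n: int, a: float, b: float,
              p0: float, rng: random.Random) -> float:
    """Pass a two-variant system down a chain of learners.

    Each learner sees n tokens produced with probability p of the
    favored variant, adopts the posterior mode as its own p, and
    produces n tokens for the next learner. Returns the final p.
    """
    p = p0
    for _ in range(generations):
        k = sum(rng.random() < p for _ in range(n))  # tokens of the favored variant
        p = map_estimate(k, n, a, b)
    return p


rng = random.Random(0)
# Beta(1.5, 1.0) has mean 0.6: a weak preference for the favored variant.
finals = [run_chain(200, 10, 1.5, 1.0, 0.5, rng) for _ in range(100)]
mean_final = sum(finals) / len(finals)
print(mean_final)
```

With these settings the prior mean is only 0.6, yet chains started at 0.5 tend to end up using the favored variant almost categorically. This outcome depends on the assumption that learners adopt the posterior mode; learners who instead sample from the posterior are known to behave differently.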
Theories and models of interacting pressures shaping language will help to address a common criticism of work on universal biases in linguistics in general: Why don’t all languages (eventually) come to conform to a given bias? The obvious answer is that these biases are defeasible, and thus will interact with competing pressures which potentially pull in other directions. However, artificial language learning paradigms can help to concretize this idea. For example, recent work on the cultural transmission of language has explored the concrete effects of competing pressures for simplicity and ambiguity avoidance using traditional artificial language learning and silent-gesture paradigms combined with iterated learning (Kirby et al. 2015; Motamedi et al. 2019).

To summarize, artificial language learning is an important method in the toolbox of linguistics for advancing key hypotheses about why language looks the way it does. These methods have been used extensively in work on language development and theoretical phonology, and have recently been extended to syntax, where they have been used to explore the extent to which properties of human cognition are linked to recurring features of syntactic typology. Important directions for these methods in the future include extension to a wider range of hypothesized biases, more comprehensive cross-linguistic and developmental experimentation, and integration with theories of grammar-internal and external forces shaping syntax.


References

Abels, K., and A. Neeleman. 2012. Linear asymmetries and the LCA. Syntax 15: 25–74.
Adger, D. 2003. Core syntax. Oxford: Oxford University Press.
Aissen, J. 1999. Markedness and subject choice in optimality theory. Natural Language and Linguistic Theory 17: 673–711.
Ambridge, B., C. F. Rowland, and J. M. Pine. 2008. Is structure dependence an innate constraint? New experimental evidence from children’s complex-question production. Cognitive Science 32: 222–255.
Anderson, J. R. 1983. The architecture of cognition. Cambridge, MA: Harvard University Press.
Aristar, A. R. 1991. On diachronic sources and synchronic pattern: An investigation into the origin of linguistic universals. Language 67: 1–33.
Arnold, J., A. Losongco, T. Wasow, and R. Ginstrom. 2000. Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language 76: 28–55.
Baker, M. 1985. The mirror principle and morphosyntactic explanation. Linguistic Inquiry 16: 373–415.
Baker, M. 2001. The atoms of language: The mind’s hidden rules of grammar. New York: Basic Books.
Baker, M. C. 2009. Language universals: Abstract but not mythological. Behavioral and Brain Sciences 32: 448–449.
Biberauer, T., A. Holmberg, and I. Roberts. 2014. A syntactic universal and its consequences. Linguistic Inquiry 45: 169–225.
Bickel, B. 2007. Typology in the 21st century: Major current developments. Linguistic Typology 11: 239–251.
Bickel, B., A. Witzlack-Makarevich, and T. Zakharko. 2015. Typological evidence against universal effects of referential scales on case alignment. In I. Bornkessel-Schlesewsky, A. Malchukov, and M. Richards (eds), Scales and hierarchies: A cross-disciplinary perspective, 7–44. Berlin: de Gruyter Mouton.
Blevins, J. 2004. Evolutionary phonology: The emergence of sound patterns. New York: Cambridge University Press.
Boeckx, C. 2009. Round table—language universals: Yesterday, today and tomorrow. In M. Piattelli-Palmarini, J. Uriagereka, and P. Salaburu (eds), Of minds and language: A dialogue with Noam Chomsky in the Basque Country, 195–199. Oxford: Oxford University Press.
Braine, M. D. 1963. On learning the grammatical order of words. Psychological Review 70: 33.
Braine, M. D. S., R. E. Brody, P. J. Brooks, V. Sudhalter, J. A. Ross, L. Catalano, and S. M. Fisch. 1990. Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language 29: 591–610.
Brooks, P. J., M. D. Braine, L. Catalano, R. E. Brody, and V. Sudhalter. 1993. Acquisition of gender-like noun subclasses in an artificial language: The contribution of phonological markers to learning. Journal of Memory and Language 32: 76–95.
Bruening, P., P. Brooks, L. Alfieri, V. Kempe, and I. Dabašinskienė. 2012. Children’s tolerance of word-form variation. Child Development Research 2012: 401680.
Bybee, J. L. 1985. Morphology: A study of the relation between meaning and form. Philadelphia, PA: Benjamins.
Bybee, J. 2006. From usage to grammar: The mind’s response to repetition. Language 82: 711–733.


Bybee, J. 2009. Language universals and usage-based theory. In M. H. Christiansen, C. Collins, and S. Edelman (eds), Language universals, 17–39. Oxford: Oxford University Press.
Chater, N., and P. Vitanyi. 2003. Simplicity: A unifying principle in cognitive science? Trends in Cognitive Sciences 7: 19–22.
Chomsky, N. 1957. Syntactic structures. Mouton de Gruyter.
Chomsky, N. 1959. A review of B. F. Skinner’s Verbal behavior. Language 35: 26–58.
Chomsky, N. 2005. Three factors in language design. Linguistic Inquiry 36: 1–22.
Chomsky, N., and H. Lasnik. 1993. The theory of principles and parameters. Syntax 1: 506–569.
Cinque, G. 2005. Deriving Greenberg’s Universal 20 and its exceptions. Linguistic Inquiry 36: 315–332.
Cinque, G. to appear. A micro-parametric approach to the head-initial/head-final parameter. Linguistic Analysis 42.
Clements, G. N. 2003. Feature economy in sound systems. Phonology 20: 287–333.
Comrie, B. 1978. Ergativity. In W. Lehmann (ed.), Syntactic typology: Studies in the phenomenology of language, 329–394. Austin: University of Texas Press.
Crain, S., and M. Nakayama. 1987. Structure dependence in grammar formation. Language 522–543.
Croft, W. 1990. Typology and universals. New York: Cambridge University Press.
Culbertson, J., and D. Adger. 2014. Language learners privilege structured meaning over surface frequency. Proceedings of the National Academy of Sciences 111: 5842–5847.
Culbertson, J., and S. Kirby. 2016. Simplicity and specificity in language: Domain-general biases have domain specific effects. Frontiers in Psychology 6.
Culbertson, J., and E. L. Newport. 2015. Harmonic biases in child learners: In support of language universals. Cognition 139: 71–82.
Culbertson, J., and E. L. Newport. 2017. Innovation of word order harmony across development. Open Mind 1: 91–100.
Culbertson, J., and K. Schuler. 2019. Artificial language learning in children. Annual Review of Linguistics 5: 353–373.
Culbertson, J., and P. Smolensky. 2012. A Bayesian model of biases in artificial language learning: The case of a word-order universal. Cognitive Science 36: 1468–1498.
Culbertson, J., A. Gagliardi, and K. Smith. 2017. Competition between phonology and semantics in noun class learning. Journal of Memory and Language 92: 343–358.
Culbertson, J., M. Schouwstra, and S. Kirby. 2020b. From the world to word order: Deriving biases in noun phrase order from statistical properties of the world. Language 96.
Culbertson, J., J. Franck, G. Braquet, M. Barrera Navarro, and I. Arnon. 2020a. A learning bias for word order harmony: Evidence from speakers of non-harmonic languages. Cognition 204.
Culbertson, J., P. Smolensky, and G. Legendre. 2012. Learning biases predict a word order universal. Cognition 122: 306–329.
Culbertson, J., P. Smolensky, and C. Wilson. 2013. Cognitive biases, linguistic universals, and constraint-based grammar learning. Topics in Cognitive Science 5: 392–424.
Cysouw, M. A. 2005. Quantitative methods in typology. In Quantitative linguistics: An international handbook, 554–578. Berlin: Mouton de Gruyter.
DeGraff, M. 1999. Creolization, language change and language acquisition: An epilogue. In M. DeGraff (ed.), Language creation and language change: Creolization, diachrony, and development, 473–543. Cambridge, MA: MIT Press.


Demiral, Ş. B., M. Schlesewsky, and I. Bornkessel-Schlesewsky. 2008. On the universality of language comprehension strategies: Evidence from Turkish. Cognition 106: 484–500.
de Vries, M. H., P. Monaghan, S. Knecht, and P. Zwitserlood. 2008. Syntactic structure and artificial grammar learning: The learnability of embedded hierarchical structures. Cognition 107: 763–774.
Dryer, M. 1992. The Greenbergian word order correlations. Language 68: 81–183.
Dryer, M. S. 1997. On the six-way word order typology. Studies in Language 21: 69–103.
Dryer, M. 2013a. Order of adjective and noun. In M. S. Dryer and M. Haspelmath (eds), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Dryer, M. 2013b. Order of numeral and noun. In M. S. Dryer and M. Haspelmath (eds), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Dryer, M. S. 2013c. Prefixing vs. suffixing in inflectional morphology. In M. S. Dryer and M. Haspelmath (eds), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Dryer, M. 2018. On the order of demonstrative, numeral, adjective and noun. Language 94: 798–833.
Dryer, M. S., and M. Haspelmath (eds). 2013. The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Dunn, M., S. Greenhill, S. Levinson, and R. Gray. 2011. Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473: 79–82.
Evans, N., and S. C. Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32: 429–448.
Fedzechkina, M., T. F. Jaeger, and E. L. Newport. 2012. Language learners restructure their input to facilitate efficient communication. Proceedings of the National Academy of Sciences 109: 17897–17902.
Fedzechkina, M., B. Chu, and T. F. Jaeger. 2018. Human information processing shapes language change. Psychological Science 29: 72–82.
Fedzechkina, M., T. F. Jaeger, and J. C. Trueswell. 2015. Production is biased to provide informative cues early: Evidence from miniature artificial languages. In D. C. Noelle, R. Dale, A. S. Warlaumont, J. Yoshimi, T. Matlock, C. D. Jennings, and P. P. Maglio (eds), Proceedings of the 37th Annual Meeting of the Cognitive Science Society, 674–679. Austin, TX: Cognitive Science Society.
Fedzechkina, M., E. L. Newport, and T. F. Jaeger. 2017. Balancing effort and information transmission during language acquisition: Evidence from word order and case marking. Cognitive Science 41: 416–446.
Ferdinand, V., S. Kirby, and K. Smith. 2019. The cognitive roots of regularization in language. Cognition 184: 53–68.
Fischer, S. D. 1975. Influences on word-order change in American Sign Language. La Jolla, CA: Salk Institute for Biological Studies.
Fitch, T., and M. Hauser. 2004. Computational constraints on syntactic processing in a nonhuman primate. Science 303.
Frank, M. C., S. Goldwater, T. L. Griffiths, and J. B. Tenenbaum. 2010. Modeling human performance in statistical word segmentation. Cognition 117: 107–125.
Frigo, L., and J. McDonald. 1998. Properties of phonological markers that affect the acquisition of gender-like subclasses. Journal of Memory and Language 39: 218–245.


Futrell, R., T. Hickey, A. Lee, E. Lim, E. Luchkina, and E. Gibson. 2015a. Cross-linguistic gestures reflect typological universals: A subject-initial, verb-final bias in speakers of diverse languages. Cognition 136: 215–221.
Futrell, R., K. Mahowald, and E. Gibson. 2015b. Large-scale evidence of dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences 112: 10336–10341.
Gell-Mann, M., and M. Ruhlen. 2011. The origin and evolution of word order. Proceedings of the National Academy of Sciences 108: 17290–17295.
Gibson, E. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In A. Marantz, Y. Miyashita, and W. O’Neil (eds), Image, language, brain, 95–126. Cambridge, MA: MIT Press.
Gibson, E., S. T. Piantadosi, K. Brink, L. Bergen, E. Lim, and R. Saxe. 2013. A noisy-channel account of crosslinguistic word-order variation. Psychological Science 24(7): 1079–1088.
Gibson, E., R. Futrell, S. T. Piantadosi, I. Dautriche, K. Mahowald, L. Bergen, and R. Levy. 2019. How efficiency shapes human language. Trends in Cognitive Sciences 23(12): 1087.
Givón, T. 1979. On understanding grammar. New York: Academic Press.
Goldberg, A. E. 2013. Substantive learning bias or an effect of familiarity? Comment on Culbertson, Smolensky, and Legendre 2012. Cognition 127: 420–426.
Goldin-Meadow, S., W. C. So, A. Özyürek, and C. Mylander. 2008. The natural order of events: How speakers of different languages represent events nonverbally. Proceedings of the National Academy of Sciences 105: 9163–9168.
Greenberg, J. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg (ed.), Universals of language, 73–113. Cambridge, MA: MIT Press.
Grimshaw, J. 1997. Projection, heads, and optimality. Linguistic Inquiry 28: 373–422.
Grodner, D., and E. Gibson. 2005. Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science 29: 261–290.
Hahn, M., J. Degen, and R. Futrell. 2020. Modeling word and morpheme order in natural language as an efficient tradeoff of memory and surprisal. PsyArXiv Preprint.
Hahn, M., D. Jurafsky, and R. Futrell. 2020. Universals of word order reflect optimization of grammars for efficient communication. Proceedings of the National Academy of Sciences 117: 2347–2353.
Hall, M. L., R. I. Mayberry, and V. S. Ferreira. 2013. Cognitive constraints on constituent order: Evidence from elicited pantomime. Cognition 129: 1–17.
Hall, M. L., V. S. Ferreira, and R. I. Mayberry. 2014. Investigating constituent order change with elicited pantomime: A functional account of SVO emergence. Cognitive Science 38: 943–972.
Haspelmath, M. 2008. Creating economical morphosyntactic patterns in language change. In J. Good (ed.), Linguistic universals and language change, 185–214. Oxford: Oxford University Press.
Hawkins, J. A. 1979. Implicational universals as predictors of word order change. Language 55: 618–648.
Hawkins, J. A. 1990. A parsing theory of word order universals. Linguistic Inquiry 21: 223–261.
Hawkins, J. A. 1994. A performance theory of order and constituency. Cambridge: Cambridge University Press.
Hawkins, J. A. 2004. Complexity and efficiency in grammars. Oxford: Oxford University Press.
Hawkins, J. A. 2007. Processing typology and why psychologists need to know about it. New Ideas in Psychology 25: 87–107.


Hawkins, J. A., and A. Cutler. 1988. Psycholinguistic factors in morphological asymmetry. In J. A. Hawkins (ed.), Explaining language universals, 280–317. Oxford: Blackwell.
Hay, J. 2001. Lexical frequency in morphology: Is everything relative? Linguistics 39: 1041–1070.
Hayes, B., and C. Wilson. 2008. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39: 379–440.
Hayes, B. P. 1999. Phonetically driven phonology: The role of optimality theory and inductive grounding. In M. Darnell, E. Moravcsik, F. J. Newmeyer, M. Noonan, and K. Wheatley (eds), Functionalism and formalism in linguistics, vol. 1: General papers. Amsterdam: Benjamins.
Himmelmann, N. P. 2014. Asymmetries in the prosodic phrasing of function words: Another look at the suffixing preference. Language 90: 927–960.
Hudson Kam, C., and A. Chang. 2009. Investigating the cause of language regularization in adults: Memory constraints or learning effects? Journal of Experimental Psychology: Learning, Memory, and Cognition 35: 815.
Hudson Kam, C., and E. Newport. 2005. Regularizing unpredictable variation. Language Learning and Development 1: 151–195.
Hudson Kam, C., and E. Newport. 2009. Getting it right by getting it wrong: When learners change languages. Cognitive Psychology 59: 30–66.
Hupp, J. M., V. M. Sloutsky, and P. W. Culicover. 2009. Evidence for a domain-general mechanism underlying the suffixation preference in language. Language and Cognitive Processes 24: 876–909.
Jackendoff, R. 2002. Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Jaeger, T. F., P. Graff, W. Croft, and D. Pontillo. 2011. Mixed effect models for genetic and areal dependencies in linguistic typology. Linguistic Typology 15: 281–319.
Jäger, G. 2007. Evolutionary game theory and typology: A case study. Language 83: 74–109.
Kamp, H., and B. Partee. 1995. Prototype theory and compositionality. Cognition 57: 129–191.
Kirby, S., M. Tamariz, H. Cornish, and K. Smith. 2015. Compression and communication in the cultural evolution of linguistic structure. Cognition 141: 87–102.
Kirton, F., M. Schouwstra, J. Culbertson, K. Smith, and S. Kirby. 2021. Constituent order in silent gesture reflects the perspective of the producer. Journal of Language Evolution 6(1): 54–76.
Koopman, H. J., and A. Szabolcsi. 2000. Verbal complexes. Cambridge, MA: MIT Press.
Ladd, D. R., S. G. Roberts, and D. Dediu. 2014. Correlational studies in typological and historical linguistics. Annual Review of Linguistics 1.
Lai, J., and F. H. Poletiek. 2011. The impact of adjacent-dependencies and staged-input on the learnability of center-embedded hierarchical structures. Cognition 118: 265–273.
Legate, J. A., and C. D. Yang. 2002. Empirical re-assessment of stimulus poverty arguments. Linguistic Review 18: 151–162.
Levshina, N. in press. Communicative efficiency and differential case marking: A reverse-engineering approach. Linguistics Vanguard.
Lightfoot, D. 1997. Catastrophic change and learning theory. Lingua 100: 171–192.
Lupyan, G., and R. Dale. 2010. Language structure is partly determined by social structure. PloS One 5: e8559.


Maldonado, M., and J. Culbertson. 2020. Person of interest: Experimental investigations into the learnability of person systems. Linguistic Inquiry: 1–42.
Maldonado, M., C. Saldana, and J. Culbertson. 2020. Learning biases in person-number linearization. In M. Asatryan, Y. Song, and A. Whitmal (eds), Proceedings of the Fiftieth Annual Meeting of the North East Linguistic Society, 163–177. Cambridge, MA: MIT Press.
Marno, H., A. Langus, M. Omidbeigi, S. Asaadi, S. Seyed-Allaei, and M. Nespor. 2015. A new perspective on word order preferences: The availability of a lexicon triggers the use of SVO word order. Frontiers in Psychology 6.
Martin, A., and J. Culbertson. 2020. Revisiting the suffixing preference: Native language affixation patterns influence perception of sequences. Psychological Science 31: 1107–1116.
Martin, A., and J. White. 2019. Vowel harmony and disharmony are not equivalent in learning. Linguistic Inquiry: 1–20.
Martin, A., K. Abels, T. Ratitamkul, and J. Culbertson. 2019. Cross-linguistic evidence for cognitive universals in the noun phrase. Linguistics Vanguard 5.
Martin, A., A. Holtz, K. Abels, D. Adger, and J. Culbertson. 2020. Experimental evidence for the influence of structure and meaning on linear order in the noun phrase. Glossa 5: 1–21.
Martinet, A. 1968. La linguistique synchronique. Études et recherches. Paris: Presses universitaires de France.
Mathews, R. C., and L. G. Roussel. 1997. Abstractness of implicit knowledge: A cognitive evolutionary perspective. In D. C. Berry (ed.), How implicit is implicit learning?, 13–47. Oxford: Oxford University Press.
Maurits, L., and T. L. Griffiths. 2014. Tracing the roots of syntax with Bayesian phylogenetics. Proceedings of the National Academy of Sciences.
Mintz, T. H., E. L. Newport, and T. G. Bever. 2002. The distributional structure of grammatical categories in speech to young children. Cognitive Science 26: 393–424.
Moeser, S. D., and A. S. Bregman. 1972. The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior 11: 759–769.
Moeser, S. D., and A. S. Bregman. 1973. Imagery and language acquisition. Journal of Verbal Learning and Verbal Behavior 12: 91–98.
Moreton, E. 2008. Analytic bias as a factor in phonological typology. In C. B. Chang and H. J. Haynie (eds), Proceedings of the 26th West Coast Conference on Formal Linguistics, 393–401. Somerville, MA: Cascadilla.
Moreton, E., and J. Pater. 2012. Structure and substance in artificial-phonology learning, part I: Structure. Language and Linguistics Compass 6: 686–701.
Morgan, J. L., and E. L. Newport. 1981. The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior 20: 67–85.
Morgan, J. L., K. M. Bonamo, and L. L. Travis. 1995. Negative evidence on negative evidence. Developmental Psychology 31: 180–197.
Morgan, J. L., R. P. Meier, and E. L. Newport. 1987. Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology 19: 498–550.
Morgan, J. L., R. P. Meier, and E. L. Newport. 1989. Facilitating the acquisition of syntax with cross-sentential cues to phrase structure. Journal of Memory and Language 28: 360–374.
Motamedi, Y., M. Schouwstra, K. Smith, J. Culbertson, and S. Kirby. 2019. Evolving artificial sign languages in the lab: From improvised gesture to systematic sign. Cognition 192: 103964.


Musso, M., A. Moro, V. Glauche, M. Rijntjes, J. Reichenbach, C. Buchel, and C. Weiller. 2003. Broca’s area and the language instinct. Nature Neuroscience 6: 774–781.
Newmeyer, F. 2005. Possible and probable languages. New York: Oxford University Press.
Ohala, J. J. 1992. What’s cognitive, what’s not, in sound change. In G. Kellermann and M. Morrissey (eds), Diachrony within synchrony: Language history and cognition, 309–355. Frankfurt: Lang.
Ohala, J. J. 1993. The phonetics of sound change. In C. Jones (ed.), Historical linguistics: Problems and perspectives, 237–278. Harlow: Longman.
Partee, B. H. 1987. Noun phrase interpretation and type-shifting principles. In J. Groenendijk, D. de Jongh, and M. Stokhof (eds), Studies in discourse representation theory and the theory of generalized quantifiers, 115–143. Dordrecht: Foris.
Pater, J. 2011. Emergent systemic simplicity (and complexity). McGill Working Papers in Linguistics, 22.
Perfors, A., and N. Burns. 2010. Adult language learners under cognitive load do not overregularize like children. In Proceedings of the 32nd Annual Meeting of the Cognitive Science Society.
Perfors, A., J. Tenenbaum, and T. Regier. 2011. The learnability of abstract syntactic principles. Cognition 118: 306–338.
Perruchet, P., and A. Rey. 2005. Does the mastery of center-embedded linguistic structures distinguish humans from nonhuman primates? Psychonomic Bulletin and Review 12: 307–313.
Piantadosi, S. T., and E. Gibson. 2014. Quantitative standards for absolute linguistic universals. Cognitive Science 38: 736–756.
Pinker, S. 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pozzan, L., and J. C. Trueswell. 2015. Revise and resubmit: How real-time parsing limitations influence grammar acquisition. Cognitive Psychology 80: 73–108.
Prince, A., and P. Smolensky. 1993/2004. Optimality Theory: Constraint interaction in generative grammar. Technical Report, Rutgers University and University of Colorado at Boulder. Revised version published by Blackwell.
Pullum, G., and B. C. Scholz. 2002. Empirical assessment of stimulus poverty arguments. Linguistic Review 19: 9–50.
Pycha, A., P. Nowak, E. Shin, and R. Shosted. 2003. Phonological rule-learning and its implications for a theory of vowel harmony. In M. Tsujimura and G. Garding (eds), Proceedings of the 22nd West Coast Conference on Formal Linguistics, 101–114. Somerville, MA: Cascadilla.
Reali, F., and T. L. Griffiths. 2009. The evolution of frequency distributions: Relating regularization to inductive biases through iterated learning. Cognition 111: 317–328.
Reber, A. S. 1967. Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior 6: 855–863.
Redington, M., N. Chater, and S. Finch. 1998. Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science 22: 425–469.
Reeder, P. A., E. L. Newport, and R. N. Aslin. 2013. From shared contexts to syntactic categories: The role of distributional information in learning linguistic form-classes. Cognitive Psychology 66: 30–54.
Rijkhoff, J. 2004. The noun phrase. Oxford: Oxford University Press.


Rubio-Fernandez, P., F. Mollica, and J. Jara-Ettinger. 2020. Speakers and listeners exploit word order for communicative efficiency: A cross-linguistic investigation. Journal of Experimental Psychology: General 150(3): 583–594.
Saffran, J. R., R. N. Aslin, and E. L. Newport. 1996. Statistical learning by 8-month-old infants. Science 274: 1926–1928.
Saldana, C., Y. Oseki, and J. Culbertson. 2021. Cross-linguistic patterns of morpheme order reflect cognitive biases: An experimental study of case and number morphology. Journal of Memory and Language 118: 104204.
Saldana, C., K. Smith, S. Kirby, and J. Culbertson. 2022. Is regularisation uniform across linguistic levels? Comparing learning and production of unconditioned probabilistic variation in morphology and word order. Language Learning and Development 17(2): 158–188.
Sandler, W., C. Padden, and M. Aronoff. 2005. The emergence of grammar: Systematic structure in a new language. PNAS 102: 2661–2665.
Schouwstra, M., and H. de Swart. 2014. The semantic origins of word order. Cognition 131: 431–436.
Schouwstra, M., K. Smith, and S. Kirby. 2016. From natural order to convention in silent gesture. In S. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, and T. Verhoef (eds), The Evolution of Language.
Singler, J. 2006. Children and creole genesis. Journal of Pidgin and Creole Languages 21: 157–173.
Singleton, J. L., and E. L. Newport. 2004. When learners surpass their models: The acquisition of American Sign Language from inconsistent input. Cognitive Psychology 49: 370–407.
Smith, K., and J. Culbertson. submitted. Communicative pressures shape language during communication (not learning): Evidence from casemarking in artificial languages. Preprint: https://psyarxiv.com/5nwhq/.
Smith, K., and E. Wonnacott. 2010. Eliminating unpredictable variation through iterated learning. Cognition 116: 444–449.
Smith, N. V., I.-M. Tsimpli, and J. Ouhalla. 1993. Learning the impossible: The acquisition of possible and impossible languages by a polyglot savant. Lingua 91: 279–347.
St. Clair, M. C., P. Monaghan, and M. Ramscar. 2009. Relationships between language structure and language learning: The suffixing preference and grammatical categorization. Cognitive Science 33: 1317–1329.
Steedman, M. 2020. A formal universal of natural language grammar. Language 96: 618–660.
Steriade, D. 1997. Phonetics in phonology: The case of laryngeal neutralization. MS. https://linguistics.ucla.edu/people/steriade/papers/PhoneticsInPhonology.pdf.
Tabullo, Á., M. Arismendi, A. Wainselboim, G. Primero, S. Vernis, E. Segura, S. Zanutto, and A. Yorio. 2012. On the learnability of frequent and infrequent word orders: An artificial language learning study. Quarterly Journal of Experimental Psychology 65: 1848–1863.
Thompson, B., S. Kirby, and K. Smith. 2016. Culture shapes the evolution of cognition. Proceedings of the National Academy of Sciences 113: 4530–4535.
Tily, H., M. Frank, and T. Jaeger. 2011. The learnability of constructed languages reflects typological patterns. In L. Carlson, C. Hoelscher, and T. F. Shipley (eds), Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 1364–1369. Austin, TX: Cognitive Science Society.
Travis, L. 1984. Parameters and effects of word order variation. PhD dissertation, Massachusetts Institute of Technology.

Trudgill, P. 2011. Sociolinguistic typology. New York: Oxford University Press.
Trueswell, J. C., D. Kaufman, A. Hafri, and J. Lidz. 2012. Development of parsing abilities interacts with grammar learning: Evidence from Tagalog and Kannada. In Proceedings of the 36th Annual Boston University Conference on Language Development, 620–632. Somerville, MA: Cascadilla.
Vennemann, T. 1976. Categorial grammar and the order of meaningful elements. In A. Juilland (ed.), Linguistic studies offered to Joseph Greenberg on the occasion of his sixtieth birthday, 615–634. Saratoga, CA: Anma Libri.
Wang, F., S. Kirby, and J. Culbertson. Submitted. Typology reflects learning biases in cross-category harmony. Proceedings of the 43rd Annual Meeting of the Cognitive Science Society.
White, J. 2014. Evidence for a learning bias against saltatory phonological alternations. Cognition 130: 96–115.
White, J. 2017. Accounting for the learnability of saltation in phonological theory: A maximum entropy model with a p-map bias. Language 93: 1–36.
Whitman, J. 2008. The classification of constituent order generalizations and diachronic explanation. In J. Good (ed.), Linguistic universals and language change, 233–252. Oxford: Oxford University Press.
Williams, J. N. 2005. Learning without awareness. Studies in Second Language Acquisition 27: 269.
Wilson, C. 2003. Experimental investigation of phonological naturalness. In G. Garding and M. Tsujimura (eds), Proceedings of the 22nd West Coast Conference on Formal Linguistics, 101–114. Somerville, MA: Cascadilla Press.
Wilson, C. 2006. Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science 30: 945–982.
Yang, C. D. 2000. Internal and external forces in language change. Language Variation and Change 12: 231–250.

Annotated bibliography for Part II

Each of the contributors in this section has compiled a brief annotated bibliography of resources for readers interested in learning how to use the methods discussed in the chapters. The annotated bibliographies are organized below by chapter. ***

Chapter 5. Behavioral acquisition methods with infants (Laurel Perkins and Jeffrey Lidz)

Oakes, L. 2010. Using Habituation of Looking Time to assess mental processes in infancy. Journal of Cognition and Development 11: 255–268. Habituation of Looking Time uses the tendency for animals of any kind to decrease their attention towards a stimulus over time and to increase attention when a stimulus requires encoding. This paper provides a set of best practices for using habituation of looking time techniques. It identifies a set of concerns with habituation designs and the best ways to minimize the effects of infant biases and experimental artifacts. Golinkoff, R.M., W. Ma, L. Song, and K. Hirsh-Pasek. 2013. Twenty-five years using the Intermodal Preferential Looking Paradigm to study language acquisition: what have we learned? Perspectives on Psychological Science 8: 316–339. The Intermodal Preferential Looking Paradigm measures children’s attention to one of two visual stimuli as a function of their match to an auditory stimulus. Typically, two images or scenes are presented and infants hear a word or sentence. If the infant perceives a match between one of the scenes and the sentence, then they spend more time looking at that scene. This paper discusses the history of the Intermodal Preferential Looking method, the range of questions that have been asked using it, and variations in how it has been deployed. This task has been used to assess phonological representations, lexical meanings, and syntactic representations. It has also been used as a tool in word-learning experiments, asking how children generalize the meaning of a novel word. The paper also reviews the use of this method in real-time processing measures (Looking While Listening) and in non-linguistic tasks. Kemler-Nelson, D. G., P. W. Jusczyk, D. R. Mandel, J. Myers, A. Turk, and L. Gerken. 1995. The Head-Turn Preference Procedure for testing auditory perception. Infant Behavior and Development 18: 111–116.

The Head-Turn Preference Procedure (HPP) is based on infants’ tendency to orient visually towards a sound source that they are attending. Infants can learn to maintain this visual response in order to continue processing. In the HPP, the infant is presented with an audio stimulus paired with a blinking light attached to the audio speaker. If the infant continues looking at the light, the audio stimulus continues playing. If the infant looks away, the audio stops. Differences in looking time to different audio stimuli reflect differences in the desire to keep listening, which can be caused by differences in depth of processing. This paper provides details about how to use head-turn preferences as a measure of children’s auditory and linguistic representations. It reviews the logic behind the technique, and designs that provide the greatest sensitivity. In addition, it reports tests of reliability and observer bias, supporting the robustness of the method. Werker, J. F., L. Polka, and J. E. Pegg. (1997). The Conditioned Head Turn Procedure as a Method for Testing Infant Speech Perception. Early Development and Parenting 6: 171–178. In the Conditioned Head Turn Procedure, infants are taught to turn their heads in response to a particular sound or word. They are taught that a visual reinforcer will light up when the auditory stimulus changes. Experimenters measure anticipatory looks to the stimulus change before the reinforcer lights up. This article reviews the procedure, discussing its application to auditory and linguistic stimuli, the ideal ages for implementing it, adaptations for older children and its strengths and limitations. Fernald, A., R. Zangl, A. L. Portillo, and V. Marchman. 2008. Looking while listening: Using eye movements to monitor spoken language comprehension by infants and young children. In I. A. Sekerina, E. M. Fernandez, and H. Clahsen (eds), Developmental psycholinguistics: On-line methods in children’s language processing, 97–135. 
Amsterdam: Benjamins. The Looking While Listening method uses real-time measures of children’s patterns of visual attention in response to speech. Unlike the Intermodal Preferential Looking paradigm, it relies on moment-to-moment patterns of gaze direction, providing a picture of how words and sentences are processed as they unfold in time. Infants look at pairs of pictures while listening to speech naming or describing one of the pictures. The article reviews the need for on-line measures of sentence understanding in development and the use of Looking While Listening as a probe for incremental processing of phonological, lexical and syntactic structure. It also describes the nuts and bolts of putting together an experiment, data analysis and the stability and validity of on-line processing measures in development. ***

Chapter 6. Behavioral acquisition methods with preschool-age children (Kristen Syrett)

The following textbooks provide a student-friendly introduction to the methodologies used in language acquisition research in early childhood. McDaniel, Dana, Cecile McKee, and Helen Smith Cairns. 1996. Methods for assessing children’s syntax. Cambridge, MA: MIT Press. This resource provides an overview of many of the main methodologies used in investigating syntax in children age 2 to 6. It is a go-to guide for instructors teaching language acquisition and development, acquisition methodologies, and practicum courses, as well as students engaged in independent language acquisition research. The chapters provide step-by-step instructions, warnings about pitfalls to avoid, and features to incorporate for best practices. These methodologies have also been used to investigate children’s knowledge of pragmatics and semantics, and their knowledge of phenomena at the interface of these sub-areas of language. The contributors are researchers who were at the forefront of research in language acquisition and development, and the research and methods represent the bread and butter of acquisition research through the early 2000s. The methods include the intermodal preferential looking paradigm, analysis of spontaneous speech, act-out tasks, the ‘questions after stories’ paradigm, the picture selection task, elicited production tasks, the truth value judgment task, and the grammaticality judgment task. Crain, Stephen, and Rosalind Thornton. 1998. Investigations in Universal Grammar: A guide to experiments on the acquisition of syntax and semantics (Chapters 26 and 27). Cambridge, MA: MIT Press. This groundbreaking work not only details the key characteristics of the Truth Value Judgment Task (TVJT), pioneered by the coauthors, but also presents a beautiful and compelling array of evidence gathered from administering the task to children over a number of years, revealing young children’s sophisticated syntactic and semantic knowledge. 
While the TVJT is now a tried and true methodology, and an impressive range of studies in the field of language acquisition and adult psycholinguistics have made use of the power of the TVJT, not every study has adhered to the design recommendations outlined in this volume, or incorporated the kind of experimental design elements that either make the study a true TVJT design or that guard against experimental errors (specifically Types I and II). Students and researchers who plan to undertake a study using the TVJT should consult this reference for clear guidelines on what to do, how to do it, and why to do it this way (or why deviations are motivated). In short, this book is a must-have for investigators using the TVJT. Blom, Elma, and Sharon Unsworth. 2010. Experimental methods in language acquisition research. Philadelphia, PA: Benjamins. This volume builds upon previous methodological guides to provide students and researchers with practical information on how to administer experimental tasks, not only for investigations of first-language acquisition in typical populations, but also with second-language learners and atypical populations, and comparison among populations. Methodologies include some of the offline methods discussed in the volumes mentioned above, as well as online methodologies and an introduction to research in acquisition using Event-Related Potentials (ERP). It also moves beyond binary judgment tasks to those using magnitude estimation, and introduces research on computational modeling in acquisition. The volume also includes useful tips about conducting data analysis. ***

Chapter 7. Modeling syntactic acquisition (Lisa S. Pearl)

In my experience, after you have a general idea about how syntactic acquisition modeling works, the best way to learn how to model yourself is to really walk through the nitty-gritty details of specific models. You’ll particularly benefit from poring over ones which (i) are modeling phenomena that are similar to the phenomena you’re interested in or (ii) are using approaches that are similar to the approach you want to use. Because of that, the references below are models of syntactic acquisition that either provide a lot of detail about their implementation, the motivation for that implementation, or both. I’ve broken them down by modeling approach.

Bayesian inference

Abend, Omri, Tom Kwiatkowski, Nathaniel J. Smith, Sharon Goldwater, and Mark Steedman. 2017. Bootstrapping language acquisition. Cognition 164: 116–143.
This demonstrates a computational-level Bayesian model of learning syntactic structure rules, with detail about the motivation for the model and a lot of detail in the appendices about model implementation.

Orita, Naho, Rebecca McKeown, Naomi H. Feldman, Jeffrey Lidz, and Jordan Boyd-Graber. 2013. Discovering pronoun categories using discourse information. Proceedings of the 35th annual conference of the Cognitive Science Society, vol. 35, Berlin.
This demonstrates a computational-level Bayesian model of learning pronoun classes, with good detail about the motivations for the model design and empirical grounding of model variables.

Pearl, Lisa, and Benjamin Mis. 2016. The role of indirect positive evidence in syntactic acquisition: A look at anaphoric one. Language 92(1): 1–30.
This demonstrates an algorithmic-level Bayesian model for learning English anaphoric one, with extensive supplementary material that walks through the modeling implementation.

Pearl, Lisa, and Jon Sprouse. 2019. Comparing solutions to the linking problem using an integrated quantitative framework of language acquisition. Language 95: 583–611.
This demonstrates a computational-level Bayesian model of learning linking theories, with extensive information in the appendices about how the model is implemented and why the specific modeling decisions were made.

Perfors, Amy, Joshua Tenenbaum, and Terry Regier. 2011. The learnability of abstract syntactic principles. Cognition 118: 306–338.
This demonstrates a computational-level Bayesian model for learning to prefer structure-dependent representations, with detailed appendices about the model implementation.

The Tolerance Principle

Pearl, Lisa, and Jon Sprouse. 2021. The acquisition of linking theories: A Tolerance and Sufficiency Principle approach to deriving UTAH and rUTAH. Language Acquisition 28: 294–325.
This demonstrates a computational-level model of deriving linking theories using the Tolerance Principle, with a lot of detail about the motivation for the implementation choices.

Yang, Charles. 2016. The price of linguistic productivity: How children learn to break the rules of language. Cambridge, MA: MIT Press.
Chapters 3–6 provide motivation and an explicit walk-through of how the Tolerance Principle was derived (chapter 3), along with several examples of the application of the Tolerance Principle (chapters 4–6).

Reinforcement learning

Yang, Charles. 2002. Knowledge and Learning in Natural Language. Oxford: Oxford University Press.
Chapter 2 provides the cognitive motivations for variational learning, along with its implementation, and both chapters 2 and 4 offer several examples of its application.

Structural triggers learner

Fodor, Janet Dean. 2017. Ambiguity, parsing, and the evaluation measure. Language Acquisition 24(2): 85–99.
This walks through the nitty-gritty about how parsing with treelets is meant to work, providing the motivation for modeling decisions.

Simple recurrent networks

Frank, Robert, Donald Mathis, and William Badecker. 2013. The acquisition of anaphora by simple recurrent networks. Language Acquisition 20(3): 181–227.
This provides a detailed walk-through of a simple recurrent network for learning how to resolve anaphora, with a special focus on how to tell what the model is doing internally to generate its observable behavior.

Other probabilistic approaches

Freudenthal, Daniel, Julian Pine, and Fernand Gobet. 2010. Explaining quantitative variation in the rate of Optional Infinitive errors across languages: A comparison of MOSAIC and the Variational Learning Model. Journal of Child Language 37(03): 643–669.
This provides a detailed walk-through of the algorithmic-level MOSAIC model, compared against variational learning for the acquisition of optional infinitives.

Pearl, Lisa, and Jon Sprouse. 2013. Computational models of acquisition for islands. In Jon Sprouse and Norbert Hornstein (eds), Experimental syntax and islands effects, 109–131. Cambridge: Cambridge University Press.
This demonstrates an algorithmic-level model for learning syntactic islands, with details about the motivation for model implementation choices and an appendix describing the implementation.

***

Chapter 8. Artificial language learning (Jennifer Culbertson)

Culbertson, J., and D. Adger. 2014. Language learners privilege structured meaning over surface frequency. Proceedings of the National Academy of Sciences 111: 5842–5847. This paper investigates a cognitive bias for homomorphism—a transparent mapping between conceptual structure and linear order—in noun phrase word order. The method used is the poverty-of-the-stimulus (extrapolation) paradigm, which originated in work on artificial language learning and phonological typology. See Martin et al. (2019; 2020) for important replications. Culbertson, J., and E. L. Newport. 2015. Harmonic biases in child learners: In support of language universals. Cognition 139: 71–82. This study uses the regularization paradigm, first introduced by Hudson Kam and Newport (2005, 2009) to compare English-speaking child and adult learners’ treatment of unconditioned variation in the input. Here, this paradigm is used to show that child learners are biased in favour of word order harmony. Specifically, they are more likely to regularize input systems which order different types of modifiers either consistently before or after the noun. See also Culbertson and Newport (2017). Culbertson, J., and K. Schuler. 2019. Artificial language learning in children. Annual Review of Linguistics 5: 353–373. A review article focusing on artificial language learning in children, with specific coverage of studies connecting learning and typology. Culbertson, J., P. Smolensky, and G. Legendre. 2012. Learning biases predict a word order universal. Cognition 122: 306–329. This paper tests whether the biases of English-speaking adult learners reflect Greenberg’s Universal 18: If a language has pre-nominal adjectives it also has pre-nominal numerals. The study uses the regularization paradigm to investigate whether learners are more likely to regularize patterns conforming to this universal. Culbertson, J., J. Franck, G. Braquet, M. Barrera Navarro, and I. Arnon. 2020. 
A learning bias for word order harmony: Evidence from speakers of non-harmonic languages. Cognition 204: 104392. A follow-up study building on Culbertson et al. (2012) and Culbertson and Newport (2015), investigating the learning of nominal word order, particularly harmony, by speakers of non-harmonic languages (French and Hebrew).

Fedzechkina, M., T. F. Jaeger, and E. L. Newport. 2012. Language learners restructure their input to facilitate efficient communication. Proceedings of the National Academy of Sciences 109: 17897–17902. This study uses the regularization paradigm to test whether learners reorganize variable case-marking systems in order to mark unexpected or potentially ambiguous agent and patient types. Fedzechkina, M., B. Chu, and T. F. Jaeger. 2018. Human information processing shapes language change. Psychological Science 29(1): 72–82. A large body of work suggests that languages and language users prefer shorter dependencies, and actively shorten dependencies when possible. This study uses the regularization paradigm to show that learners acquiring a language with variable order of subject and object will use that variation to shorten dependencies. Futrell, R., T. Hickey, A. Lee, E. Lim, E. Luchkina, and E. Gibson. 2015. Cross-linguistic gestures reflect typological universals: A subject-initial, verb-final bias in speakers of diverse languages. Cognition 136: 215–221. This is a cross-linguistic experimental study showing that speakers across a range of L1s exhibit consistent patterns of gesture order when asked to spontaneously improvise a gesture to communicate simple transitive events. When events are non-reversible (i.e. the agent is not likely to be mistaken for the patient and vice versa), SOV is used. When events are reversible, SVO is more likely to be used. Goldin-Meadow, S., W. C. So, A. Özyürek, and C. Mylander. 2008. The natural order of events: How speakers of different languages represent events nonverbally. Proceedings of the National Academy of Sciences 105: 9163–9168. This is the first study to use the silent-gesture paradigm, in which non-signing participants are asked to spontaneously improvise gestures to communicate simple events or pictures.
They found that speakers of both SVO and SOV languages use gesture sequences corresponding to Agent–Patient–Action in response to simple transitive events. Hudson Kam, C., and E. Newport. 2009. Getting it right by getting it wrong: When learners change languages. Cognitive Psychology 59: 30–66. This paper was the first to use the regularization paradigm, and compares regularization of unconditioned variation in morphosyntax by children and adults. Kirby, S., M. Tamariz, H. Cornish, and K. Smith. 2015. Compression and communication in the cultural evolution of linguistic structure. Cognition 141: 87–102.

This paper builds on the original work by Kirby, Cornish, and Smith (2009), using the iterated learning paradigm to investigate the conditions under which compositionality can emerge in a language. See also Motamedi et al. (2019) for a conceptual replication using silent gesture. Maldonado, M., and J. Culbertson. 2020. Person of interest: Experimental investigations into the learnability of person systems. Linguistic Inquiry: 1–42. Using both ease-of-learning and extrapolation paradigms, this paper presents a series of experiments evaluating competing theories of person systems by teaching English-speaking participants novel pronoun systems. Martin, A., and J. Culbertson. 2020. Revisiting the suffixing preference: Native language affixation patterns influence perception of sequences. Psychological Science 31(9): 1107–1116. This paper uses a similarity judgement task to reassess claims of a universal perceptual bias as the driving force behind the so-called suffixing preference. They find that while English speakers judge sequences as more similar when they differ at the ends (replicating Hupp et al. 2009), speakers of a predominantly prefixing Bantu language, Kîîtharaka, have the opposite preference. Moreton, E., and J. Pater. 2012a. Structure and substance in artificial-phonology learning, part I: Structure. Language and Linguistics Compass 6: 686–701. Moreton, E., and J. Pater. 2012b. Structure and substance in artificial-phonology learning, part II: Substance. Language and Linguistics Compass 6: 702–718. These two papers present an excellent review of literature on artificial language learning experiments on phonological typology. Saldana, C., Y. Oseki, and J. Culbertson. 2021. Cross-linguistic patterns of morpheme order reflect cognitive biases: An experimental study of case and number morphology. Journal of Memory and Language 118: 104204. This paper investigates Greenberg’s Universal 39, which concerns the relative order of number and case marking morphology on nouns. Using the extrapolation paradigm, they find across a series of experiments that English- and Japanese-speaking participants infer number closer to the noun than case, mirroring typology. Tabullo, Á., M. Arismendi, A. Wainselboim, G. Primero, S. Vernis, E. Segura, S. Zanutto, and A. Yorio. 2012. On the learnability of frequent and infrequent word orders: An artificial language learning study. Quarterly Journal of Experimental Psychology 65: 1848–1863. This paper uses a straightforward ease-of-learning paradigm to test whether Spanish-speaking adults differentially learn the eight possible patterns of basic word order.

Part III. Psycholinguistic methods in syntactic theory

Chapter 9. Self-paced reading

Masaya Yoshida

9.1 Introduction

The self-paced reading (SPR) method has been widely used in sentence comprehension research. SPR was introduced to provide a methodology that is “as similar as possible to normal reading” (Mitchell and Green 1978: 610) in order to measure the cognitive processes underlying online reading (Aaron and Scarborough 1976; Mitchell and Green 1978; Just, Carpenter, and Woolley 1982). In an SPR experiment, the reading time (RT) for each word or phrase in a sentence or in a series of sentences is measured. Typically, in this paradigm, a word or a phrase is presented on a computer screen, and when the research participant presses a button on a button box or a key on the keyboard, that word or phrase is replaced by the next one. In this way, successive button presses reveal, segment by segment, the sentence or series of sentences presented as an experimental stimulus. The latency of each button press is understood as the time spent reading the corresponding segment (or, more standardly, region). It is generally assumed that the reading time reflects the properties of each word, phrase, or (partial) structure in the sentence, including lexical and structural ambiguity, the frequency of words and phrases, and the complexity of the structure of the sentence. The RT thus reveals the time course of the mental processes that are carried out during online reading.

Traditionally, linguists have relied on native speakers’ “acceptability judgments” of sentences as their source of data. Recent studies have suggested that larger-scale behavioral experiments can be useful in answering questions in theoretical linguistics (Phillips 2006; Phillips and Lasnik 2003; Sprouse 2008; 2011; Sprouse and Almeida 2012; Sprouse, Wagers, and Phillips 2012; Phillips 2013a; Sprouse and Almeida 2013; Sprouse, Schütze, and Almeida 2013).
In addition to “acceptability judgment” methodologies, online sentence-processing experiments, such as SPR experiments, have sometimes been employed to answer questions in formal syntax and semantics (Bever and McElree 1988; McElree and Bever 1989; Pickering and Barry 1991; Koizumi and Tamaoka 2004; Hackl, Koster-Hale, and Varvoutis 2012; Kotek and Hackl 2013; Huang, Stranahan, and Snedeker 2016, among many others). In such studies, online measures such as RT have been used to argue for or against syntactic or semantic analyses. Thus, it has been suggested that offline methodologies (acceptability judgments, acceptability-rating experiments, and sentence fragment-completion experiments) as well as online methodologies (SPR experiments or eye-tracking-while-reading experiments) can provide a rich source of data and insight for formal syntactic theories. In this light, this chapter reviews studies that employ SPR experiments, and asks the following questions: What can online sentence-processing studies which use the SPR method tell us about syntax, and how can online sentence-processing experiments help us make progress in syntactic theories? To this end I will restrict my attention here to sentence-processing studies that can potentially touch on issues of formal syntax, and I will largely ignore recent developments in theories of sentence processing.
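The measurement step of the SPR procedure described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in any of the studies cited: the names `RegionRT` and `region_reading_times`, the assumption that the experiment logs cumulative button-press timestamps in milliseconds, and the timestamps themselves are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RegionRT:
    region: str    # the word or phrase shown on screen
    rt_ms: float   # time the region stayed on screen, in milliseconds


def region_reading_times(
    regions: Sequence[str],
    press_times_ms: Sequence[float],
    onset_ms: float = 0.0,
) -> List[RegionRT]:
    """Convert cumulative button-press timestamps into per-region RTs.

    press_times_ms[i] is the (hypothetical) timestamp of the press that
    dismissed regions[i]; onset_ms is when the first region appeared.
    """
    if len(regions) != len(press_times_ms):
        raise ValueError("need exactly one button press per region")
    result = []
    shown_at = onset_ms
    for region, pressed_at in zip(regions, press_times_ms):
        if pressed_at < shown_at:
            raise ValueError("timestamps must be non-decreasing")
        result.append(RegionRT(region, pressed_at - shown_at))
        shown_at = pressed_at  # the next region appears on this press
    return result


# Made-up timestamps for illustration only; not data from any study.
regions = ["The contestant", "imagined", "the small tropical island",
           "would be", "completely deserted."]
presses = [520, 910, 1650, 2240, 2900]
for item in region_reading_times(regions, presses):
    print(f"{item.region!r}: {item.rt_ms:.0f} ms")
```

The RT of a region is simply the interval between the press that revealed it and the press that dismissed it, which is why the sketch needs only the onset time and one timestamp per region.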

9.2 Sentence processing

9.2.1 Reading time slowdown

One of the central goals of the study of sentence processing is to describe the mechanism of online sentence comprehension (see Fodor, Bever, and Garrett 1974; Frazier and Fodor 1978; Berwick and Weinberg 1984; Frazier 1987; Abney 1989; Frazier 1990; Abney and Johnson 1991; Gibson 1991; Crocker 1996; Phillips 1996; Pickering and Traxler 2000; Boland 2005; Traxler 2012 among many others). As part of the mechanism of sentence comprehension, a theory of syntactic parsing can be developed, which specifies, for example, how the structure of a sentence is built (see Crocker 1999 for a detailed discussion of mechanisms of structure building), what kind of structure is built, and what sort of linguistic and non-linguistic knowledge influences the structure-building (see Traxler 2012: ch.4 for an excellent recent summary). Thus typically, the study of sentence processing involves a theory of structure-building (a structure-building algorithm), a theory of representation (a theory of syntax), and considerations of linguistic and extralinguistic factors that affect the structure-building processes (e.g. syntactic constraints, working memory considerations, the frequency of lexical items). In addition to these considerations, in order to experimentally investigate the mechanism of sentence processing, a linking hypothesis must be established. In other words, we need a hypothesis about what causes RT slowdown/speedup and how it is related to the theory of structure-building and the theory of syntax. A linking hypothesis is a hypothesis that links a linguistic theory to online cognitive processes (Hale 2003; Boland 2005; Kim, Kobele, Runner, and Hale 2011; Lewis and Phillips 2015; Hale 2016) and one that links the behavior of the model of the processing mechanism with observable measures (e.g. the RT) in experiments (Tanenhaus, Magnuson, Dahan, and Chambers 2000; Tanenhaus, Spivey-Knowlton, and Hanna 2000; Hale 2003; Crocker 2005).¹ In other words, we need to establish the relationship between linguistic theories and the model of sentence processing, and between the model of sentence processing and the experimentally measurable data.² Here, I would like to concentrate on linking hypotheses that are concerned with the relation between the mechanism of sentence processing and the experimentally measurable data. Intuitively, the relation between the experimentally measurable data (e.g. the RT) and the mechanism of sentence processing is something like the following: When part of the sentence is difficult to process, the RT is slower. However, it is often not clear what makes part of a sentence difficult to process, and how the structure of that sentence is built during online processing. Let us look at an example from a study of garden-path sentences and explore these points further. Sentences that contain local or temporary ambiguities, such as (1), have been extensively investigated in sentence-processing studies (e.g. Frazier and Rayner 1982; Ferreira and Henderson 1990; Slattery, Sturt, Christianson, Yoshida, and Ferreira 2013).

(1) a. The contestant imagined that the small tropical island would be completely deserted.
    b. The contestant imagined the small tropical island would be completely deserted.

In online reading experiments, the RT of the main verb would be in (1b) is found to be slower than that in (1a) (Frazier and Rayner 1982). What does this RT slowdown tell us? Normally, the RT slowdown is interpreted as the cost (or difficulty) associated with the reanalysis process. When the main verb would be is encountered, the parser is forced to reanalyze the structure of the sentence from (2a), the matrix direct object analysis, to (2b), the embedded subject analysis. (2)

a. The contestant [VP [V′ imagined [NP the small tropical island]]]
b. The contestant [VP [V′ imagined [CP ø [TP [NP the small tropical island] [T′ would [VP be ...]]]

When the string imagined the small tropical island is encountered, the parser builds the direct-object structure in (2a), which is a grammatically possible structure, and there is no element that suggests other analyses. When the parser recognizes the verb would be, it finds that the verb is not compatible with the structure in (2a) and is forced to reanalyze the structure. Recognizing that the second verb is incompatible with the matrix direct-object structure, in combination with the process of reanalysis, leads to the RT slowdown. Thus, the RT slowdown is understood as being caused by the parser’s attempt at reanalysis, in which the parser builds one particular structure, such as (2a), upon encountering an ambiguity, and then, upon encountering a disambiguating word (the main verb would be), revises the current structure and builds a new one, the embedded subject structure in (2b) (or reranks the structures in a parallel processing model).3

This understanding of the RT slowdown in the processing of garden-path sentences like (1) requires certain assumptions about the components of a theory of sentence processing. First, it is assumed that a certain structure is built by the parser.4 In the case of (1), it is assumed that “the correct structure” of the sentence in (1b) is something like (2b). Second, it is assumed that the parser temporarily builds “a wrong structure” such as (2a), where the string imagined the small tropical island is analyzed as the VP of the main clause, because the parser does not find any evidence against it until the verb would be is encountered. Thus, the parser builds the structure online based on the information available locally. In (1b), the Noun Phrase (NP) is locally ambiguous between the matrix direct object and the embedded subject, but the parser prefers the direct-object structure over the embedded subject structure. This means that there are a mechanism and linguistic or extralinguistic factors that govern the structure-building process, such that the parser prefers the matrix object structure, a simpler structure, to the embedded subject structure. Furthermore, the parser does not delay in building a structure of the sentence. If the parser waited to build the structure until an element that comes later in the sentence, such as the verb would be in (1b), the garden-path effect would not be expected, since the parser could build the structure using the information from the embedded verb, which indicates that imagine takes a sentential complement. Finally, it is assumed that when the parser encounters the verb that suggests the current structure is “a wrong structure,” the parser reanalyzes the structure and builds “the correct structure.” In this view, the slower RT of the verb would be is linked to the parser’s reanalysis process. It is important to note that the observation (i.e. the RT slowdown) is not linked directly to the syntactic structure itself. Rather, it is linked to the hypothesized behavior of the mechanism of sentence processing. In other words, the grammar licenses a certain structure (e.g. the embedded subject structure), but the grammar does not predict the RT slowdown. It is the hypothesized behavior of the structure-building mechanism that predicts the RT slowdown.

1 See e.g. Hale (2003) for an explicit illustration of the theory of representation, the theory of structure building, linking hypotheses, and the relations among them.
2 See Lewis and Phillips (2015) for an extensive discussion of linking hypotheses that map linguistic theories to the sentence-processing mechanism.
3 See Sturt and Crocker (1996); Sturt (1997); Sturt, Pickering, and Crocker (1999); Schneider and Phillips (2001); Sturt, Pickering, Scheepers, and Crocker (2001) for extensive discussions on the nature of reanalysis processes.
4 I will keep assuming that there is a mechanism that incrementally builds basic sentence structures, such as the left-corner algorithm, and I will not go into details about mechanisms of online structure building. See Abney and Johnson (1991); Gibson (1991); Crocker (1999); Hale (2014) for detailed discussion of structure-building mechanisms.
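The reanalysis-based linking hypothesis for (1) can be made concrete with a toy model. The sketch below is not a model proposed in the literature cited here: the function name `predicted_costs`, the numeric cost constants, and the keyword-based treatment of the verb, the complementizer, and the disambiguating word are all assumptions made purely for illustration. It assumes a serial parser that commits to an analysis at the first word after imagined and pays an extra cost if that analysis must later be revised.

```python
# Hypothetical cost units; the chapter does not assign numeric costs.
BASE_COST = 1        # cost of integrating any word
REANALYSIS_COST = 3  # extra cost when the current analysis must be revised

SENTENCE_1A = "The contestant imagined that the small tropical island would be completely deserted."
SENTENCE_1B = "The contestant imagined the small tropical island would be completely deserted."


def predicted_costs(sentence, verb="imagined", complementizer="that",
                    disambiguator="would"):
    """Return (word, cost) pairs for a toy serial parser with reanalysis."""
    words = sentence.lower().rstrip(".").split()
    analysis = None  # becomes "direct_object" or "embedded_subject"
    costs = []
    for i, word in enumerate(words):
        cost = BASE_COST
        if i > 0 and words[i - 1] == verb:
            # Commit without delay at the first word after the matrix verb:
            # an overt complementizer signals a clausal complement; otherwise
            # the simpler direct-object analysis is chosen.
            analysis = ("embedded_subject" if word == complementizer
                        else "direct_object")
        elif word == disambiguator and analysis == "direct_object":
            # The verb is incompatible with the direct-object analysis, so
            # the structure is revised and the extra cost is charged here.
            cost += REANALYSIS_COST
            analysis = "embedded_subject"
        costs.append((word, cost))
    return costs


for label, sentence in (("1a", SENTENCE_1A), ("1b", SENTENCE_1B)):
    print(label, predicted_costs(sentence))
```

Under these assumptions, the only word whose predicted cost differs between (1a) and (1b) is would, which mirrors the locus of the slowdown described above. The difference comes from the hypothesized behavior of the parser, not from the grammar, which licenses the embedded subject structure in both sentences.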


9.2.2 Linking hypotheses

In the experimental psycholinguistics literature, various linking hypotheses have been assumed in models of online sentence processing. Some of the linking hypotheses in the literature are summarized in (3) (Crocker 2005 includes a list of linking hypotheses proposed in the literature). Models of sentence processing assume that RT slowdown or speedup is due to:
(3)
a. Structural complexity (Frazier 1985)
b. Backtracking (Abney 1989; Crocker 1996)
c. Non-monotonicity (Sturt and Crocker 1996)
d. Non-determinism (Marcus 1980)
e. Reranking of parallel alternatives (Jurafsky 1996; Crocker and Brants 2000)
f. Storage and integration cost (Gibson 1998)
g. The reduction of uncertainty (Hale 2003)
h. Competition (McRae, Spivey-Knowlton, and Tanenhaus 1998)

I do not go into details about each of the linking hypotheses listed in (3). However, it is important to note that each linking hypothesis in this list is concerned with what the parser does (e.g. reanalysis, reranking of alternatives, or storing elements in memory) and with properties of the structure that the parser builds (e.g. structural complexity). Linking hypotheses are not directly concerned with the well-formedness of the syntactic representation per se. Given that the theory of representation must, most of the time, be independently assumed and justified, the relation between the data from online experiments and the theory of representation (or derivation) must be indirect.5 Therefore, it is normally difficult to test theories of representation/derivation directly using the data from online sentence-processing experiments.

5 The Derivational Theory of Complexity (DTC) is one idea in which syntactic derivation is linked directly to structural complexity. Under the DTC hypothesis, the number of derivational steps (the number of transformations) was taken to predict structural complexity, and structural complexity to predict reaction time. Though the DTC is an important idea in considering the relation between the grammar and the parser, an extensive review of the DTC and its implications is beyond the scope of this chapter, and I will not discuss it any further. For an extensive review and evaluation of the DTC, see e.g. Fillenbaum (1971); Fodor et al. (1974); Levelt (1974); Berwick and Weinberg (1984); Bever (1988); Wanner (1988); Phillips (1996); Townsend and Bever (2001).

9.3 Sentence processing and syntax

Given the view of the study of sentence processing outlined above, what can studies of sentence processing, which involve SPR methodology, tell us about formal syntax and formal linguistic theories? As noted above, formal linguistic theories or theories of syntactic representations/derivations alone do not provide predictions in terms of online sentence processing (see Berwick and Weinberg 1984; Crocker 1996 for an extensive discussion on this point). To interpret the data from online experiments, a theory of structure-building and a linking hypothesis must be supplied. As syntactic theories are not theories of sentence comprehension or sentence production (Chomsky 1965), even if a theory of sentence processing that incorporates a specific theory of syntax fails to predict the results of an online sentence-processing experiment, it does not necessarily mean that the theory of syntax is falsified. It is always possible that the problem lies in the theory of structure-building and/or the linking hypothesis (Berwick and Weinberg 1984; Crocker 1996; Phillips 1996). Results from online sentence processing do not bear directly on the theory of representation/derivation. Therefore, as Boland puts it, it is possible that “most psycholinguistic data is irrelevant to formal linguistic theory” (Boland 2005: 23). Still, I think (as Boland 2005 also notes) that there are cases where sentence-processing studies can tell us something about syntax and formal linguistic theories. I would like to discuss some such cases and think about the relation between online sentence-processing studies and formal linguistic theories. Specifically, I point out that online methodologies like SPR are useful when grammatical approaches and processing-based approaches are competing to account for a linguistic generalization.

9.3.1 Argument/adjunct distinction

It is sometimes difficult to distinguish arguments from adjuncts (e.g. Larson 1988; Schütze and Gibson 1999; Boland 2005). For example, the instrumental PP with a monkey wrench in (4) shows properties of both arguments and adjuncts.6 Like arguments, instrumental PPs cannot be iterated, as in (4a), and they can be extracted easily from a weak island, as in (4b). However, like adjuncts, they can co-occur with the VP pro-form do so, as in (4c).
(4) Kim changed the tire with a monkey wrench.
a. *John cut the meat [PP with a knife] [PP with a sharp end].
b. [PP With which key] do you deny that the butler could have opened the door?
c. John will eat the cake with a fork and Mary will do so [PP with a spoon].

6 These examples and judgments are taken from Boland (2005).

Thus, based on standard syntactic tests, it is not easy to tell whether the instrumental PP is an argument or an adjunct. Boland (2005) argues that online sentence-processing studies can provide a clue to distinguish arguments from adjuncts. It is known that the processing of PPs, that is, integrating a newly encountered PP into the existing structure during online sentence processing, is influenced strongly by “lexical frequency”: when a lexical item can occur in multiple syntactic structures (e.g. a word like duck can be used as a verb or a noun, and some verbs can take a single object or multiple objects), the more frequent syntactic structure that the lexical item specifies is easier and thus faster to process (resulting in a shorter RT) than less frequent ones. For example, both suggest and delegate can take an NP object, but they can also take an additional PP object, as illustrated in (5).
(5)

a. The parents suggested the chore (to their kids).
b. The parents delegated the chore (to their kids).

Boland shows that the dative PP occurs more frequently with delegate than with suggest. Assuming the linking hypothesis above, it is then expected that the processing of the dative PP following delegate is easier and faster than that of the one following suggest. On the other hand, Boland argues that, unlike arguments, adjuncts are not specified in a verb’s lexical information (i.e. adjuncts are not selected by the verb and thus are not predicted from the verb’s lexical information). Therefore, the verb does not specify how likely it is to be followed by adjunct PPs, and the processing of adjunct PPs is not influenced by the choice of the verb. Boland et al. (2004) tested the paradigm in (6).7 The PP in (6a) was read faster than the PP in (6b) in a phrase-by-phrase self-paced reading experiment. In other words, in the case of a dative PP, lexical frequency effects were observed. On the other hand, the processing of the PPs in (6c) and (6d) did not differ: lexical frequency effects were not observed for these PPs.
(6)
a. The chores that the parents delegated [PP to their kids] ...
b. The chores that the parents suggested [PP to their kids] ...
c. The tire that the mechanic changed [PP with a monkey wrench] ...
d. The customer that the salesman noticed [PP with a quick glance] ...

The absence of lexical frequency effects in the processing of PPs in (6c) and (6d), following the linking hypothesis above, suggests that these PPs are not selected by the verb, and thus they are not arguments. In summary, Boland suggests that arguments and adjuncts can be distinguished in terms of whether a PP shows lexical frequency effects, which can be measured as RT slowdown. In Boland’s study, the crucial part is the linking hypothesis. She points out that RT is affected by lexical frequency, and that the verb’s lexical information (whether a verb selects certain PPs or not) influences those lexical frequency effects.

7 The experiment and the explanation outlined here are greatly simplified. For details, see Boland (2005) and Boland et al. (2004).

9.3.2 Parasitic gaps and islands

Online reading experiments can provide us with new insights when the nature of generalizations that linguists have discovered is unclear. It is possible that some linguistic phenomena are compatible with “processing”-based accounts as well as “grammatical” accounts. In such cases online reading experiments can be particularly useful. If the difficulty or unacceptability of certain constructions is due to processing complexity, it is likely that we can observe the processing complexity effect in online sentence processing.

Wh-movement cannot escape certain domains, known as islands (e.g. Ross 1967; Chomsky 1977; Huang 1982; Chomsky 1986; Rizzi 1990; Lasnik and Saito 1992; Uriagereka 1999; Hornstein, Lasnik, and Uriagereka 2007). For example, wh-extraction out of a complex NP subject gives rise to a severe degradation in acceptability. In (7a), the wh-phrase is moved from the matrix object position. In (7b), on the other hand, the wh-phrase is moved from the object position embedded within the subject NP. An acceptability rating experiment shows that sentences like (7b) are judged significantly worse than sentences like (7a) (Phillips 2006).
(7)

a. The environmentalist investigated what [NP the local campaign to preserve the important habitats] had harmed __.
b. *The environmentalist investigated what [NP the local campaign to preserve __] had harmed the annual migration.
(Simplified examples from Phillips 2006: 805)

Traditionally, the unacceptability resulting from movement out of island domains has been explained syntactically; essentially, certain syntactic domains such as subjects, relative clauses, and clausal adjuncts are not compatible with wh-extraction (e.g. Ross 1967; Huang 1982; Chomsky 1986; Lasnik and Saito 1992; Uriagereka 1999; Hornstein et al. 2007). Recently, however, the grammatical explanation of islands has been challenged, and some studies suggest that island effects (the degradation of acceptability associated with wh-movement out of islands) should be understood as a processing phenomenon. Roughly put, sentences that involve wh-movement out of islands are too complex to process (e.g. Kluender and Kutas 1993; Hawkins 1999; Hofmeister and Sag 2010).8

8 Sprouse and Hornstein (2013) offers a collection of papers on islands from different perspectives and approaches. In particular, readers are referred to Phillips (2013b), which provides an excellent overview of the issues surrounding islands.

Let us think about what aspects of the processing of wh-constructions could contribute to processing complexity. We can imagine certain component processes involved in the processing of wh-filler–gap dependencies. The wh-phrase must be linked to a verb or a preposition so that it can be properly interpreted, yet these elements do not always appear immediately after the wh-phrase. Furthermore, the distance between the wh-phrase and the verb or preposition is potentially unbounded, and where in the sentence these elements are located is not signaled. Thus, during the online processing of wh-constructions, the parser must maintain the wh-filler in memory until it finds a verb or a preposition to which the wh-phrase can be linked (Wanner 1974; Gibson 2000; Gibson and Warren 2004; Wagers and Phillips 2009; Wagers and Phillips 2014), which may tax memory resources. At the same time, the parser processes the material intervening between the filler and the gap position, which may independently contribute to processing complexity (Kluender and Kutas 1993; Hofmeister and Sag 2010). Thus, when the wh-gap dependency spans a long distance, the parser needs to process the elements that intervene between the wh-phrase and the gap while maintaining the wh-phrase in memory. It is known that lexical items contribute differently to memory cost, depending on whether the element requires discourse reference (Gibson 1998, 2000; Warren and Gibson 2002; Hofmeister and Sag 2010). NPs or any elements that require discourse reference incur a processing cost. For example, definite NPs are more costly than indefinite NPs. The processing-complexity approaches basically suggest that island effects are caused by processing complexity (because island-violating sentences place greater demands on memory capacity), which is attributed to the filler and the material intervening between the filler and the gap (or the verb or the preposition). In other words, island violations like (7b) are unacceptable because it is too difficult for the parser to posit a gap inside the complex subject NP.

Phillips (2006) contends that if it is too difficult for the parser to posit a gap inside island domains, then the parser should never attempt to posit a gap within an island domain or form a dependency that spans an island boundary. Phillips points out that this prediction is apparently not confirmed. First, when there is another gap in the matrix clause in an example like (7b), the gap inside the complex subject is understood as a parasitic gap (e.g. Engdahl 1983; Chomsky 1986; Culicover 2001), as shown in (8). When a parasitic gap is licensed, the apparent island-violating examples are judged much more acceptable.
(8) The environmentalist investigated what [NP the local campaign to preserve __PG ] had harmed __.
This does not follow straightforwardly from processing-complexity approaches to islands. From the processing-complexity point of view, the fact that the gap can be posited within the subject island means that the subject NP does not incur the processing cost. However, the gap can be posited within the subject island only when there is an additional gap in the matrix clause. That is, it is not too difficult to posit a gap within the complex subject NP only when there is another gap in the matrix clause. It is not clear, however, how an additional gap at a later position in the sentence can reduce the complexity associated with the complex subject NP. Furthermore, Phillips points out that when the complex subject contains a finite clause, having an additional gap, as in (9b), does not improve acceptability. Thus, parasitic gaps are not licensed within a finite clause in the complex subject NP.
(9)

a. *The environmentalist investigated what [NP the local campaign [CP that preserved __]] had harmed the annual migration.
b. *The environmentalist investigated what [NP the local campaign [CP that preserved __]] had harmed __.


The contrast between (8) and (9b) is due to the finiteness of the embedded clause, i.e. it is easier to find parasitic gaps in an infinitival, non-tensed clause than in a finite, tensed clause (Engdahl 1983; Culicover 2001). It has sometimes been suggested that finiteness (whether or not a clause is tensed) contributes to acceptability and processing complexity (Hofmeister and Sag 2010). Therefore, the contrast may suggest that the lower acceptability of (9b) is due to processing complexity, i.e. the embedded clause in (9b) is too hard for the parser to posit a gap within. However, an acceptability rating experiment that Phillips (2006) conducted indicates that when there is no additional gap in (7b), (7b) and (9b) are judged equally unacceptable. If the acceptability contrast between (8) and (9b) is solely due to the processing complexity attributed to the finiteness of the embedded clause, it is expected that (7b) should be more acceptable than (9b). Phillips (2006) further tested these subject island violations using a self-paced moving-window paradigm. Phillips takes advantage of the wh-verb plausibility paradigm (Traxler and Pickering 1996). It has been shown that the parser tries to complete a “filler–gap” dependency as soon as possible, which is sometimes called “active dependency formation” or “active gap filling” (Frazier, Clifton, and Randall 1983; Stowe 1986; Frazier and Flores D’Arcais 1989; Traxler and Pickering 1996; Phillips 2006; Omaki, Lau, White, Dakan, Apple, and Phillips 2015). Traxler and Pickering (1996) showed that when the parser encounters a wh-phrase, it tries to link the wh-phrase to the verb as soon as possible. As a result, when the verb is not semantically compatible with the wh-phrase, a surprise effect arises, which gives rise to the RT slowdown at the verb. Phillips holds the following assumptions: First, parasitic gaps are licensed within the subject island when the subject NP involves non-finite clauses.
Second, the parser builds the sentence structure incrementally. Third, the parser launches active dependency formation upon encountering the wh-phrase, whereby the parser tries to link the wh-phrase to the gap (or the verb) as soon as possible. Fourth, the semantic incompatibility of the wh-phrase and the verb to which the parser attempts to link the wh-phrase gives rise to the RT slowdown at the verb position. Here the linking hypothesis is that the parser’s attempt to link the wh-phrase to the verb causes the RT slowdown associated with the semantic incompatibility between the wh-phrase and the verb. Thus, if the parser does not try to link the wh-phrase to the verb, meaning that it does not try to form the filler–verb dependency, the plausibility effect should not be observed. Taking advantage of this plausibility manipulation paradigm, Phillips (2006) tested the following paradigm:
(10)
a./b. The school superintendent learned which schools/which high school students the proposal to expand drastically and innovatively upon the current curriculum would overburden/motivate __ during the following semester.
c./d. The school superintendent learned which schools/which high school students [NP the proposal [CP that expanded drastically and innovatively upon the current curriculum]] would overburden/motivate __ during the following semester.


In this paradigm, the semantic plausibility and the finiteness of the clause embedded within the complex subject NP were manipulated as independent factors. If the parser attempts to link the wh-phrase to the verb within the subject NP, then the plausibility effect, indicated by the slower RT of the verb in the implausible conditions (10b) and (10d), is predicted. On the other hand, if the parser does not attempt to link the wh-phrase to the verb, then the plausibility effect is not predicted. Another important point in this paradigm is that the complex subject NP involving an infinitival clause can host a parasitic gap if there is another gap in the main clause. Thus, parasitic gaps can be licensed in (10a) and (10b). On the other hand, in (10c) and (10d), where the complex subject NP contains a finite clause, parasitic gaps are not licensed. Thus, if the parser is sensitive to such grammatical constraints on parasitic gap licensing, it is possible that the parser attempts to form a wh-verb dependency across the subject island boundary in (10a) and (10b), but not in (10c) and (10d); in this case, the plausibility effect is expected in (10a) and (10b), but not in (10c) and (10d). The processing-complexity approaches, by contrast, predict that the parser never tries to link the wh-phrase to the embedded verb regardless of finiteness, because, whether the clause is finite or not, a gap in a complex subject NP is not acceptable when there is no additional gap in the matrix clause. Phillips (2006) observed the plausibility effect only in the comparison between (10a) and (10b): the verb in (10b), which is semantically incompatible with the wh-phrase, was read significantly more slowly than that in (10a). No such difference was found in the comparison between (10c) and (10d). This result is predicted if the parser is sensitive to the grammatical licensing condition on parasitic gaps.
In other words, this result suggests that the parser tries to link the wh-phrase to the verb within the complex subject NP only when such a link is grammatically sanctioned, i.e. when a parasitic gap can be licensed. This result, on the other hand, is not straightforwardly predicted from processing-complexity approaches to islands. This is so because extraction from complex subject NPs is unacceptable, and under the complexity approaches, this means that complex subject NPs are too complex a domain for the parser to posit a gap.
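The logic of the design in (10) can be laid out as a small prediction table. The sketch below only encodes the qualitative predictions as they are described in this section; the account labels, the function names, and the boolean encoding (effect predicted or not) are assumptions introduced for illustration, and no reading-time values are modeled.

```python
# Conditions in (10): finiteness of the clause inside the subject NP crossed
# with the plausibility of the wh-phrase as the object of the embedded verb.
CONDITIONS = {
    "10a": ("non-finite", "plausible"),
    "10b": ("non-finite", "implausible"),
    "10c": ("finite", "plausible"),
    "10d": ("finite", "implausible"),
}

# Hypothetical labels for the two families of accounts discussed in the text.
ACCOUNTS = ("grammar-sensitive", "processing-complexity")


def attempts_dependency(account, finiteness):
    """Does the parser try to link the wh-phrase to the embedded verb?"""
    if account == "grammar-sensitive":
        # A gap is posited only where a parasitic gap could be licensed.
        return finiteness == "non-finite"
    if account == "processing-complexity":
        # The complex subject NP is taken to be too costly for gap positing.
        return False
    raise ValueError(f"unknown account: {account}")


def predicted_slow_conditions(account):
    """Conditions with a predicted slowdown at the embedded verb."""
    return [label for label, (finiteness, plausibility) in CONDITIONS.items()
            if plausibility == "implausible"
            and attempts_dependency(account, finiteness)]


# Pattern reported in the text: a plausibility effect for (10a) vs. (10b),
# none for (10c) vs. (10d).
REPORTED_EFFECT = {"non-finite": True, "finite": False}


def consistent_accounts():
    """Accounts whose predictions match the reported pattern."""
    return [account for account in ACCOUNTS
            if all(attempts_dependency(account, finiteness) == seen
                   for finiteness, seen in REPORTED_EFFECT.items())]


print(consistent_accounts())
```

The linking hypothesis does the work here: a slowdown is predicted only where the parser is assumed to attempt the dependency, so the presence or absence of the plausibility effect serves as a diagnostic of whether a gap was posited.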

9.3.3 Backward binding and islands

Kazanina et al. (2007) showed that when the parser encounters a pronoun, it launches an active antecedent search (a process akin to active dependency formation). Here, active antecedent search means that the parser tries to identify the antecedent and link the pronoun to the antecedent as soon as possible. Furthermore, the parser locates the antecedent only in grammatically sanctioned positions, i.e. positions that are not c-commanded by the pronoun. Kazanina et al. argue that the same mechanism of active dependency formation is working behind online cataphoric dependency formation and wh-gap dependency formation.

In the previous literature, it has been noted that when the parser attempts to link a pronoun or a reflexive to its antecedent, and the gender information of the pronoun and the potential antecedent is not congruent (e.g. John1 hates herself1), the word that signals the gender mismatch is read more slowly (Sturt 2003; van Gompel and Liversedge 2003). This effect is called the Gender Mismatch Effect (GMME). The RT slowdown is thus caused by the parser’s attempt to link the pronoun to the potential antecedent. When the parser tries to link the reflexive to the antecedent and the gender specification of the reflexive is not compatible with that of the antecedent, a slower RT at the reflexive is observed. The linking hypothesis, then, is that the RT slowdown associated with the gender mismatch is tied to the parser’s attempt to link the pronoun/reflexive to its antecedent (the dependency formation). If the parser does not attempt to link the pronoun/reflexive to the potential antecedent, the RT slowdown associated with the GMME is not expected. Assuming this linking hypothesis, Kazanina et al. tested the paradigm in (11). All the examples in (11) involve a cataphoric pronoun, i.e. a pronoun that precedes its antecedent. In (11a) and (11b), the pronoun c-commands the potential antecedent young quarterback, which is stereotypically construed as male. On the other hand, in (11c) and (11d), the pronoun is embedded within a larger NP and thus does not c-command the potential antecedent.
(11)
a./b. He/She chatted amiably with some fans while the talented, young quarterback signed autographs for the kids ...
c./d. His/Her managers chatted amiably with some fans while the talented, young quarterback signed autographs for the kids ...
(Kazanina, Lau, Lieberman, Yoshida, and Phillips 2007: experiment 3)
What Kazanina et al. try to test is whether or not the active antecedent search process is restricted by structural constraints.
They contend that if the parser’s antecedent search and dependency formation process is restricted by grammatical constraints, such as Binding Condition C (BCC), then we expect to observe the RT slowdown associated with GMME in the comparison between (11c) and (11d), but not in the comparison between (11a) and (11b). This is so because, in (11a) and (11b), the pronoun c-commands the NP young quarterback, and thus linking the pronoun to young quarterback results in a BCC violation. In (11c) and (11d), by contrast, the pronoun does not c-command young quarterback, and thus linking the pronoun to young quarterback does not result in a BCC violation. If, on the other hand, the antecedent search and dependency-formation process is not sensitive to grammatical structure, and the parser tries to locate the antecedent of the pronoun in the position linearly close to the pronoun, then the parser should try to link the pronoun to the closest potential antecedent whether it is c-commanded by the pronoun or not. In this case, the RT slowdown associated with GMME is expected both in the comparison between (11c) and (11d) and in the comparison between (11a) and (11b). Put differently, if the parser does not build ungrammatical structure, it does not try to locate the antecedent for the pronoun in a position that is c-commanded by the pronoun. If the parser’s search for the antecedent does not respect grammatical structure, then the parser could link the pronoun to the NP that is linearly closest to the pronoun. In this view, the RT slowdown associated with GMME is linked to the parser’s attempt to link the pronoun to an NP which is a potential antecedent for the pronoun.

In a word-by-word self-paced reading experiment, Kazanina et al. indeed found such a reading time contrast between (11a) vs. (11b) on the one hand and (11c) vs. (11d) on the other, i.e. GMME was absent in the (11a) vs. (11b) comparison but was observed in the (11c) vs. (11d) comparison. These results suggest that online structure-building respects grammatical constraints such as BCC, and that the parser builds hierarchical structure that encodes configurational relations such as c-command.

Taking advantage of Kazanina et al.’s findings, Yoshida et al. (2014) investigated online cataphoric dependency formation in the context of islands (Ross 1967; Chomsky 1977; Huang 1982; Chomsky 1986; Rizzi 1990; Lasnik and Saito 1992; Uriagereka 1999; Hornstein et al. 2007). As discussed earlier, some studies argue that the island effect should be understood as a processing phenomenon rather than a grammatical phenomenon. In other words, these studies argue that the wh-gap dependency that spans island boundaries is grammatical, and that the unacceptability caused by island-crossing wh-gap dependencies is due to general cognitive considerations such as processing costs and processing overload (Kluender and Kutas 1993; Hawkins 1999; Hofmeister and Sag 2010). Under the processing-complexity view, island effects are understood as the processing difficulty resulting from the combination of the processing cost associated with maintaining the wh-phrase in memory and the processing cost associated with the intervening elements. Elements that constitute the island boundary are typically the elements that incur a heavy processing cost. Yoshida et al.
(2014) argue that if the island effect is understood as the processing difficulty caused by the processing cost of the element held in memory and the intervening elements, any dependency that spans an island boundary should give rise to an effect similar to island effects. Thus, for example, cataphoric dependency formation, in which the pronoun must be maintained in working memory until it is linked to its antecedent (this is why the active antecedent search is launched), could be blocked by an island boundary because holding the pronoun in memory and processing the costly elements at the island boundary at the same time could be too costly, exactly as in the processing of wh-filler–gap dependency formation.9 On the other hand, if islands are understood as grammatical constraints on filler–gap dependencies but not on pronoun–antecedent dependencies (Ross 1967; Hornstein et al. 2007), then island-like effects are not expected in online cataphoric dependency formation. In other words, if the parser is sensitive to the grammatical and structural distinction between the filler–gap dependency and the pronoun–antecedent dependency, then the parser does not try to locate the gap within an island domain because locating a gap within an island violates grammatical constraints; but it could locate the antecedent of the pronoun within an island domain because forming a pronoun–antecedent dependency across an island boundary does not violate any grammatical constraints.

9 Recent studies by Keshev and Meltzer-Asscher (in press) report that binding dependency formation incurs a certain difficulty, which is reflected in acceptability ratings.

Yoshida et al. (2014) tested these points with the following paradigm in a word-by-word self-paced moving-window experiment.

(12) a./b. His/Her managers revealed that [NP the studio [CP that notified Jeffrey about the new film]] selected a novel for the script ...
c./d. He/She revealed that [NP the studio [CP that notified Jeffrey about the new film]] selected a novel for the script ...

In (12a) and (12b), the pronoun does not c-command the potential antecedent Jeffrey. On the other hand, in (12c) and (12d), the pronoun c-commands the potential antecedent. Furthermore, the potential antecedent Jeffrey is embedded within a complex NP island with a definite NP as the head of the relative clause, which is a costly element. Because the pronoun does not c-command Jeffrey in (12a) and (12b), linking the pronoun to Jeffrey does not violate BCC, contrary to (12c) and (12d). If the cataphoric dependency can be formed in (12a) and (12b), then we expect a slower RT at the Jeffrey region in gender-mismatching conditions such as (12b) than in gender-matching conditions such as (12a). In (12c) and (12d), because the configuration would violate BCC, the parser should not attempt the dependency formation, and thus GMME is not expected. On the other hand, if cataphoric dependency formation is blocked by islands, GMME is not expected in (12a) and (12b) either. Yoshida et al. (2014) observed GMME in (12a) and (12b) but not in (12c) and (12d), suggesting that the parser attempted cataphoric dependency formation across an island boundary. Yoshida et al. contend that this effect is not expected straightforwardly from processing-complexity accounts. However, this result readily follows from the grammatical analysis of islands. Grammatically, the wh-gap dependency and the pronoun–antecedent dependency are different types of dependencies. The former is formed by movement, but the latter is not.
Thus, they have different grammatical structures (representation and derivation) and are subject to different grammatical constraints (islands and binding conditions). Although very similar processing mechanisms (active dependency formation) underlie the online processing of these two dependencies, wh-gap dependency formation is blocked by islands but pronoun–antecedent dependency formation is not. From the processing point of view, cataphoric dependency formation and wh-filler–gap dependency formation employ very similar mechanisms (active dependency formation), and are thus subject to similar processing-complexity considerations. However, from a grammatical point of view, they are different dependencies. If island effects are due to processing complexity, both wh-filler–gap dependency formation and cataphoric dependency formation should be restricted by islands because spanning a dependency over an island boundary should incur processing costs. The observation that online cataphoric dependency formation is not blocked by island boundaries thus supports the grammatical approach to islands, i.e. islands put grammatical constraints on filler–gap dependencies but not on pronoun–antecedent dependencies.
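The logic of the design in (12) can be made concrete with a small numerical sketch. GMME is taken here as the mean reading time at the critical name region in the gender-mismatch condition minus that in the gender-match condition, computed separately for the two pronoun positions. The reading times below are invented for illustration and are not data from Yoshida et al. (2014); the condition labels and the function name `gmme` are likewise hypothetical.

```python
from statistics import mean

# Hypothetical per-trial reading times (ms) at the critical name region.
# These values are invented for illustration only.
rts = {
    ("no_c_command", "match"): [410, 395, 420, 405],     # cf. (12a)
    ("no_c_command", "mismatch"): [455, 440, 470, 450],  # cf. (12b)
    ("c_command", "match"): [400, 415, 405, 410],        # cf. (12c)
    ("c_command", "mismatch"): [405, 410, 400, 415],     # cf. (12d)
}


def gmme(rts, structure):
    """Gender mismatch effect: mean mismatch RT minus mean match RT."""
    return mean(rts[(structure, "mismatch")]) - mean(rts[(structure, "match")])


effect_no_cc = gmme(rts, "no_c_command")
effect_cc = gmme(rts, "c_command")

# The difference between the two effects is the interaction contrast.
interaction = effect_no_cc - effect_cc

print(f"GMME without c-command: {effect_no_cc} ms")
print(f"GMME with c-command:    {effect_cc} ms")
print(f"Interaction contrast:   {interaction} ms")
```

In this toy pattern the slowdown appears only where the pronoun does not c-command the name, which is the shape of result that the grammatical account predicts when the name sits inside an island. An actual analysis would of course test the interaction statistically over participants and items rather than compare raw means.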


9.4 Summary


What can the study of online sentence processing tell us about syntax? The studies reviewed so far suggest roughly two possibilities. The first is demonstrated in Boland’s studies of the distinction between arguments and adjuncts. When traditional methodologies in theoretical syntactic studies cannot clearly determine the grammatical status of some items, as with the argument/adjunct distinction, online experimental methodologies such as self-paced reading can give us a better clue. What Boland’s studies show is that if we have a good understanding of the structure-building and linking hypotheses, we can infer syntactic structures using RT data. It has been independently shown in these studies that lexical frequency effects are reflected in the RT slowdown in SPR experiments. Additionally, she assumes that adjuncts are not lexically specified by the verb. The combination of these two assumptions predicts the difference in the processing of arguments and adjuncts. Another possibility is concerned with the nature of linguistic generalizations. As we have seen, there are two classes of competing explanations of island effects, namely the grammatical explanation and the processing complexity-based explanation. In this case, we can formulate the prediction of each approach in terms of RT measures. As we have seen, we can formulate the prediction of the grammatical approaches if we further formulate the linking hypothesis (e.g. the dependency-formation process leads to the plausibility effects and GMME, which are reflected in the RT slowdown). Processing-complexity approaches also make straightforward predictions (e.g. no plausibility effects or GMME within the island domain). Thus, in a situation where there are grammar-based explanations and processing-based explanations, online sentence-processing experiments are particularly informative. As has been pointed out earlier, syntactic theories are not theories of sentence comprehension or production.
The study of syntax and the study of online sentence processing have different goals. The study of syntax is concerned with the well-formedness of the structure of sentences (i.e. what the possible and impossible structures of sentences are), and the study of online sentence processing (at least part of it) is concerned with the description of the mechanism working behind online reading, which presupposes certain syntactic representations that the mechanism refers to. Assuming certain linking hypotheses, theories of online processing predict RT differences, but theories of syntax do not. In this particular sense, online sentence-processing studies involving SPR (or other online) experiments are not so informative for syntactic theorizing (Boland 2005). In general, I hope to have shown that it is essential to formulate a theory of representation, a theory of structure-building, and a linking hypothesis that maps hypothetical behavior of the structure-building mechanism to experimentally observable measures in order to investigate the mechanism of sentence processing and gain insights into syntactic theories.


References

Aaronson, Doris, and Hollis Shapiro Scarborough. 1976. Performance theories for sentence coding: Some quantitative evidence. Journal of Experimental Psychology: Human Perception and Performance 2: 56–70.
Abney, Steven. 1989. A computational model of human parsing. Journal of Psycholinguistic Research 18: 129–144.
Abney, Steven P., and Mark Johnson. 1991. Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research 20: 233–250.
Berwick, Robert C., and Amy S. Weinberg. 1984. The grammatical basis of linguistic performance: language use and acquisition. Cambridge, MA: MIT Press.
Bever, Thomas. 1988. The psychological reality of grammar: A student’s-eye view of cognitive science. In William Hirst (ed.), The making of cognitive science: Essays in honor of George Miller, 112–142. Cambridge: Cambridge University Press.
Bever, Thomas G., and Brian McElree. 1988. Empty categories access their antecedents during comprehension. Linguistic Analysis 9: 35–43.
Boland, Julie E. 2005. Cognitive mechanisms and syntactic theory. In Anne Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 23–42. Hillsdale, NJ: Erlbaum.
Boland, J. E., Lewis, R., and Blodgett, A. 2004. Distinguishing generation and selection of modifier attachments: Implications for lexicalist parsing and competition models. Unpublished manuscript.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1977. On wh-movement. In Peter Culicover, Thomas Wasow, and Adrian Akmajian (eds), Formal syntax, 71–132. New York: Academic Press.
Chomsky, Noam. 1986. Barriers. Cambridge, MA: MIT Press.
Crocker, Matthew. 2005. Rational models of comprehension: addressing the performance paradox. In Anne Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 363–80. Hillsdale, NJ: Erlbaum.
Crocker, Matthew W. 1996. Computational psycholinguistics: An interdisciplinary approach to the study of language. Dordrecht: Kluwer Academic.
Crocker, Matthew W. 1999. Mechanisms for sentence processing. In Simon Garrod and Martin J. Pickering (eds), Language processing, 191–232. Brighton: Psychology Press.
Crocker, Matthew, and Thorsten Brants. 2000. Wide-coverage probabilistic sentence processing. Journal of Psycholinguistic Research 29: 647–669.
Culicover, Peter W. 2001. Parasitic gaps: A history. In Peter W. Culicover and Paul M. Postal (eds), Parasitic gaps, 3–68. Cambridge, MA: MIT Press.
Engdahl, Elisabet. 1983. Parasitic gaps. Linguistics and Philosophy 6: 5–34.
Ferreira, Fernanda, and John M. Henderson. 1990. Use of verb information in syntactic parsing: evidence from eye movements and word-by-word self-paced reading. Journal of Experimental Psychology: Learning, Memory and Cognition 16: 555–568.
Fillenbaum, Samuel. 1971. Psycholinguistics. Annual Review of Psychology 22: 251–308.
Fodor, Jerry A., Thomas G. Bever, and Merrill F. Garrett. 1974. The psychology of language: an introduction to psycholinguistics and generative grammar. New York: McGraw Hill.
Frazier, Lyn. 1985. Syntactic complexity. In David Dowty, Lauri Karttunen, and Arnold Zwicky (eds), Natural language parsing, 128–189. Cambridge: Cambridge University Press.
Frazier, Lyn. 1987. Sentence processing: A tutorial review. In Max Coltheart (ed.), Attention and performance XII: The psychology of reading, 601–681. Hillsdale, NJ: Erlbaum.


Frazier, Lyn. 1990. Exploring the architecture of the language-processing system. In Gerry Altmann (ed.), Cognitive models of speech processing: Psycholinguistics and computational linguistics, 409–433. Cambridge, MA: MIT Press.
Frazier, Lyn, and Janet Dean Fodor. 1978. The sausage machine: a new two-stage parsing model. Cognition 2: 291–325.
Frazier, Lyn, Charles Clifton, and Janet Randall. 1983. Filling gaps: Decision principles and structure in sentence comprehension. Cognition 13: 187–222.
Frazier, Lyn, and Giovanni B. Flores D’Arcais. 1989. Filler driven parsing: A study of gap filling in Dutch. Journal of Memory and Language 28: 331–334.
Frazier, Lyn, and Keith Rayner. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14: 178–210.
Gibson, Edward. 1991. A computational theory of human linguistic processing: Memory limitations and processing breakdown. Pittsburgh, PA: Carnegie Mellon University.
Gibson, Edward. 1998. Linguistic complexity: locality of syntactic dependencies. Cognition 68: 1–76.
Gibson, Edward. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In Yasushi Miyashita, Alec Marantz, and Wayne O’Neil (eds), Image, language, brain, 95–126. Cambridge, MA: MIT Press.
Gibson, Edward, and Tessa Warren. 2004. Reading time evidence for intermediate linguistic structure in long-distance dependencies. Syntax 7: 55–78.
Hackl, Martin, Jorie Koster-Hale, and Jason Varvoutis. 2012. Quantification and ACD: Evidence from real time sentence processing. Journal of Semantics 29: 145–206.
Hale, John. 2003. The information conveyed by words in sentences. Journal of Psycholinguistic Research 32: 101–124.
Hale, John. 2014. Automaton theories of human sentence comprehension. Chicago: University of Chicago Press.
Hale, John. 2016. Information-theoretical complexity metrics. Language and Linguistics Compass 10(9): 397–412.
Hawkins, John A. 1999. Processing complexity and filler-gap dependencies across grammars. Language 75: 244–285.
Hofmeister, Philip, and Ivan A. Sag. 2010. Cognitive constraints and island effects. Language 86: 366–415.
Hornstein, Norbert, Howard Lasnik, and Juan Uriagereka. 2007. The dynamics of islands: Speculations on the locality of movement. Linguistic Analysis 33: 149–175.
Huang, Cheng-Teh James. 1982. Logical relations in Chinese and the theory of grammar. PhD dissertation, Massachusetts Institute of Technology.
Huang, Yujing, Laine Stranahan, and Jesse Snedeker. 2016. The unaccusative hypothesis revisited. Paper presented at the 22nd Conference on Architectures and Mechanisms for Language Processing.
Jurafsky, Daniel. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science 20: 137–194.
Just, Marcel Adam, Patricia A. Carpenter, and Jacqueline D. Woolley. 1982. Paradigms and processes in reading comprehension. Journal of Experimental Psychology 111: 228–238.
Kazanina, Nina, Ellen Lau, Moti Lieberman, Masaya Yoshida, and Colin Phillips. 2007. The effect of syntactic constraints on the processing of backwards anaphora. Journal of Memory and Language 56: 384–409.


Keshev, Maayan, and Aya Meltzer-Asscher. 2019. A processing-based account of subliminal wh-island effects. Natural Language and Linguistic Theory 37: 621–657.
Kim, Christina, Gregory Kobele, Jeffrey Runner, and John T. Hale. 2011. The acceptability cline in VP ellipsis. Syntax 14: 318–354.
Kluender, Robert, and Marta Kutas. 1993. Subjacency as a processing phenomenon. Language and Cognitive Processes 8: 573–633.
Koizumi, Masatoshi, and Katsuo Tamaoka. 2004. Cognitive processing of Japanese sentences with ditransitive verbs. Gengo Kenkyu 125: 173–190.
Kotek, Hadas, and Martin Hackl. 2013. An experimental investigation of interrogative syntax/semantics. In Maria Aloni, Michael Franke, and Floris Roelofsen (eds), Proceedings of the 2013 Amsterdam Colloquium: http://events.illc.uva.nl/AC/AC2013/uploaded_files/inlineitem/19_Kotek_Hackl.pdf.
Larson, Richard K. 1988. On the double object construction. Linguistic Inquiry 19: 335–391.
Lasnik, Howard, and Mamoru Saito. 1992. Move α: Conditions on its application and output. Cambridge, MA: MIT Press.
Levelt, W. J. M. 1974. Formal grammars in linguistics and psycholinguistics. The Hague: Mouton.
Lewis, Shevaun, and Colin Phillips. 2015. Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research 44: 27–46.
Marcus, Mitchell P. 1980. A theory of syntactic recognition for natural language. Cambridge, MA: MIT Press.
McElree, Brian, and Thomas G. Bever. 1989. The psychological reality of linguistically defined gaps. Journal of Psycholinguistic Research 18: 21–35.
McRae, Ken, Michael J. Spivey-Knowlton, and Michael K. Tanenhaus. 1998. Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language 38: 283–312.
Mitchell, Don C., and David W. Green. 1978. The effects of context and content on immediate processing in reading. Quarterly Journal of Experimental Psychology 30: 609–636.
Omaki, Akira, Ellen Lau, Imogen Davidson White, Myles L. Dakan, Aaron Apple, and Colin Phillips. 2015. Hyper-active gap filling. Frontiers in Psychology 6: article 384.
Phillips, Colin. 1996. Order and structure. Cambridge, MA: Massachusetts Institute of Technology.
Phillips, Colin. 2006. The real-time status of island phenomena. Language 82: 795–823.
Phillips, Colin. 2013a. Some arguments and nonarguments for reductionist accounts of syntactic phenomena. Language and Cognitive Processes 28: 156–187.
Phillips, Colin. 2013b. On the nature of island constraints I: Language processing and reductionist accounts. In Jon Sprouse and Norbert Hornstein (eds), Experimental syntax and island effects, 64–108. Cambridge: Cambridge University Press.
Phillips, Colin, and Howard Lasnik. 2003. Linguistics and empirical evidence. Trends in Cognitive Sciences 7: 61–62.
Pickering, Martin J., and Guy D. Barry. 1991. Sentence processing without empty categories. Language and Cognitive Processes 8: 229–259.
Pickering, Martin J., and Matthew J. Traxler. 2000. Parsing and incremental understanding during reading. In Matthew W. Crocker, Martin Pickering, and Charles Clifton, Jr. (eds), Architectures and mechanisms for language processing, 238–258. Cambridge: Cambridge University Press.
Rizzi, Luigi. 1990. Relativized minimality. Cambridge, MA: MIT Press.
Ross, John Robert. 1967. Constraints on variables in syntax. Cambridge, MA: MIT Press.


Schneider, David, and Colin Phillips. 2001. Grammatical search and reanalysis. Journal of Memory and Language 45: 308–336.
Schütze, Carson T., and Edward Gibson. 1999. Argumenthood and English prepositional phrase attachment. Journal of Memory and Language 40: 409.
Slattery, Timothy J., Patrick Sturt, Kiel Christianson, Masaya Yoshida, and Fernanda Ferreira. 2013. Lingering misinterpretations of garden path sentences arise from competing syntactic representations. Journal of Memory and Language 69: 104–120.
Sprouse, Jon. 2008. The differential sensitivity of acceptability to processing effects. Linguistic Inquiry 39: 686–694.
Sprouse, Jon. 2011. A test of the cognitive assumptions of magnitude estimation: Commutativity does not hold for acceptability judgments. Language 87: 274–288.
Sprouse, Jon, and Diogo Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s core syntax. Journal of Linguistics 48: 609–652.
Sprouse, Jon, and Diogo Almeida. 2013. The empirical status of data in syntax: A reply to Gibson and Fedorenko. Language and Cognitive Processes 28: 222–228.
Sprouse, Jon, and Norbert Hornstein (eds). 2013. Experimental syntax and island effects. Cambridge: Cambridge University Press.
Sprouse, Jon, Matthew Wagers, and Colin Phillips. 2012. Working-memory capacity and island effects: A reminder of the issues and the facts. Language 88: 401–407.
Sprouse, Jon, Carson Schütze, and Diogo Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134: 219–248.
Stowe, Laurie A. 1986. Parsing wh-constructions: Evidence for on-line gap location. Language and Cognitive Processes 3: 227–245.
Sturt, Patrick. 1997. Syntactic reanalysis in human language processing. PhD dissertation, University of Edinburgh.
Sturt, Patrick. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48: 542–562.
Sturt, Patrick, and Matthew W. Crocker. 1996. Monotonic syntactic processing: a crosslinguistic study of attachment and reanalysis. Language and Cognitive Processes 11: 449–494.
Sturt, Patrick, Martin J. Pickering, and Matthew W. Crocker. 1999. Structural change and reanalysis difficulty in language comprehension. Journal of Memory and Language 40(1): 136–158.
Sturt, Patrick, Martin Pickering, Christoph Scheepers, and Matthew W. Crocker. 2001. The preservation of structure in language comprehension: Is reanalysis the last resort? Journal of Memory and Language 45: 283–307.
Tanenhaus, Michael K., James S. Magnuson, Delphine Dahan, and Craig Chambers. 2000. Eye movements and lexical access in spoken-language comprehension: Evaluating linking hypothesis between fixations and linguistic processing. Journal of Psycholinguistic Research 29: 557–580.
Tanenhaus, Michael K., Michael J. Spivey-Knowlton, and Joy E. Hanna. 2000. Modeling thematic and discourse context effects with a multiple constraints approach: Implications for the architecture of the language comprehension system. In Matthew Crocker, Martin Pickering, and Charles Clifton (eds), Architectures and mechanisms for language processing, 90–118. Cambridge: Cambridge University Press.


Townsend, David J., and Thomas G. Bever. 2001. Sentence comprehension: The integration of habits and rules. Cambridge, MA: MIT Press.
Traxler, Matthew. 2012. Introduction to psycholinguistics: Understanding language science. Oxford: Wiley–Blackwell.
Traxler, Matthew J., and Martin J. Pickering. 1996. Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language 35: 454–475.
Uriagereka, Juan. 1999. Multiple spell-out. In Samuel David Epstein and Norbert Hornstein (eds), Working minimalism, 251–282. Cambridge, MA: MIT Press.
van Gompel, Roger P. G., and Simon P. Liversedge. 2003. The influence of morphological information on cataphoric pronoun assignment. Journal of Experimental Psychology: Learning, Memory, and Cognition 29: 128–139.
Wagers, Matthew W., and Colin Phillips. 2009. Multiple dependencies and the role of the grammar in real-time comprehension. Journal of Linguistics 45: 395–433.
Wagers, Matthew W., and Colin Phillips. 2014. Going the distance: Memory and control processes in active dependency construction. Quarterly Journal of Experimental Psychology 67: 1274–1304.
Wanner, Eric. 1974. On remembering, forgetting, and understanding sentences: A study of the deep structure hypothesis. The Hague: Mouton.
Wanner, Eric. 1988. Psychology and linguistics in the sixties. In William Hirst (ed.), The making of cognitive science: Essays in honor of George A. Miller, 143–152. Cambridge: Cambridge University Press.
Warren, Tessa, and Edward Gibson. 2002. The influence of referential processing on sentence complexity. Cognition 85: 79–112.
Yoshida, Masaya, Nina Kazanina, Leticia Pablos, and Patrick Sturt. 2014. On the origin of islands. Language, Cognition, and Neuroscience 29(7): 761–770.

Chapter 10

Eye-tracking and experimental syntax

Dave Kush and Brian Dillon

10.1 Introduction


In this chapter we consider some ways that the study of reading behavior using eye-tracking can contribute to experimental syntax. We additionally aim to provide a high-level introduction to the technique and review some of the key theoretical background and empirical results in the reading literature. We focus on the eye-tracking-while-reading technique, rather than other techniques that make use of eye-tracking such as the visual world (for an excellent introduction to the use of the visual world paradigm, we refer the reader to Ferreira and Henderson 2004; for recent examples of its application to syntactic theory, see Kaiser, Runner, Sussman, and Tanenhaus 2009; Runner and Head 2014). It is important to delimit at the outset what we take “experimental syntax” to mean. We adopt a relatively traditional view of what syntax, or syntactic theory, encompasses: Syntacticians are interested in characterizing the nature of abstract syntactic representations, generally stated at a meta-behavioral level of analysis (e.g. Marr’s 1982 computational level). That is, we agree with Chomsky (e.g. 1965) that syntax is about giving a characterization of human linguistic competence. Traditional syntactic inquiry relies on informal acceptability judgments as the primary method for investigating questions of linguistic competence. Experimental syntax corresponds to the idea that formal experimental design and quantitative analysis of acceptability data can provide an added benefit above and beyond informal methods. Syntacticians may wish to use laboratory techniques as the arbiters of contentious debates that traditional one-off judgment experiments have left unresolved (cf. Phillips 2010), or they may wish to provide quantitative evidence in support of their empirical claims (see discussion in Gibson and Fedorenko 2010; 2013; Sprouse and Almeida 2013;


Phillips 2010). They may also wish to use laboratory methods to provide convergent evidence for an analysis or construct. Recently, grammatically oriented psycholinguistic research has continued to gain prominence, and the empirical basis of experimental syntax has broadened to include a range of other experimental techniques, including reading measures like eye-tracking. These techniques provide a powerful new tool to investigate the nature of grammatical representations, by investigating how comprehenders compute grammatical relationships incrementally during sentence comprehension. For this reason, researchers interested in using eye-tracking data in the service of experimental syntax must engage questions of linguistic performance. In order to draw theoretically meaningful conclusions from eye-tracking data, it is critical to have working hypotheses about (i) how the incremental parser operates and how this impacts reading behavior, and (ii) how to link parser operations to grammatical constructs. Thus while eye-tracking offers, in our view, a technique that promises to shed new light on various issues in syntactic theory, it does impose additional theoretical overhead for the analyst, who must balance commitments both to the nature of the syntactic representations and to the processing mechanisms that assemble those representations in real time (for similar discussion, see Frazier 2008; Lewis and Phillips 2015; Trotzke, Bader, and Frazier 2013). In what follows we first offer a general, although by no means exhaustive, introduction to the study of eye movements during reading, as well as the eye-tracking-while-reading technique as applied to sentence comprehension. We then turn to a discussion of the theoretical issues that arise when trying to link eye-movement behavior to grammatical models. We close by offering in-depth case studies that in our view highlight ways in which eye-tracking data can contribute to syntactic theory.

10.2 Eye-tracking while reading and syntactic processing


10.2.1 Eye-tracking: The basic model

A basic working knowledge of the psychology of reading is crucial for understanding the eye-tracking method. As the name suggests, the primary behavior of interest in an eye-tracking-while-reading experiment is the pattern of eye movements across a span of text during normal reading. In educated and literate adults, reading is an easy and extremely well-practiced behavior. In this respect, eye-tracking offers an ecologically valid counterpart to other psycholinguistic methodologies that rely on visual presentation of linguistic stimuli, such as self-paced reading (Just, Carpenter, and Woolley 1982) or rapid serial visual presentation (Potter 1984). This feature makes it an attractive methodology for studying language comprehension using visually presented


stimuli: The experimental task need not be learned on the spot and should therefore be less susceptible to contamination by “test-taking” strategies or other task-dependent behavior. Despite the apparent simplicity, the processes necessary to recognize and interpret text during this routine behavior are cognitively complex. Readers must engage visual processing to identify word forms, link them to appropriate phonological/orthographic codes, access the lexicon and integrate those words into an evolving representation of the sentence and discourse context, all the while continuously engaging in oculomotor programming routines to keep the eyes moving through a text at an appropriate pace (Huey 1908; Rayner 1998; Rayner and Pollatsek 1989; Schotter and Rayner 2015). Because reading involves coordinating these diverse cognitive processes, the study of reading behavior has engaged a similarly diverse community of cognitive scientists and psychologists (Rayner 1978; 1998; see Pollatsek and Treiman 2015 for a broad overview of research in this area). The upshot is a body of research that provides not only a wealth of empirical data on how reading proceeds for practiced adult readers, but also a firm theoretical model of the reading task itself. This task model in turn provides an important basis for understanding how to link eye movements to higher-order linguistic processing. Having such a relatively well-understood task model sets eye-tracking-while-reading apart from other reading methodologies like self-paced reading. The subjective impression that reading proceeds in a smooth, continuous fashion is misleading. In reality, the eyes move across text by alternating between short periods of relative stability on a single location in the text (fixations) and rapid, ballistic movements between those fixations (saccades). Fixations generally last between 150 ms and 500 ms, averaging around 200–250 ms (Rayner 1978; Schotter and Rayner 2015).
The saccades between fixations are much shorter, averaging 20–35 ms depending on the length of the movement (Schotter and Rayner 2015). During a fixation, the visual field can be subdivided into areas of differing visual acuity. The fovea is the region with greatest visual acuity, and spans approximately 2° in the center of visual attention. This is followed by the parafovea, which has diminished visual acuity and extends to approximately 5° on either side of a fixation; beyond the parafovea is the periphery (Rayner 1998). Normal reading, then, can be thought of as a series of fixations that take up information about the text as a set of more or less static images, each partitioned into distinct regions of visual resolution. An important theoretical question is what, exactly, determines the pattern of fixations a reader will make across a sentence or text. One critical, but sometimes implicit, hypothesis is that the eyes generally fixate material that is currently in attention; this is referred to as the “eye–mind hypothesis” (Just and Carpenter 1980; 1984). At present there is a broad consensus that this is more or less correct, though the link between fixation and attention in reading is not strictly one-to-one. Current understanding of the link between attention and fixations in reading is largely built on the work of the late Keith Rayner and colleagues (Clifton, Ferreira, Henderson, Inhoff, Liversedge, Reichle, and Schotter 2016). This link is expressed perhaps most explicitly in the influential E-Z Reader model of reading, a formal model of reading behavior that is


built on decades’ worth of empirical research (Reichle, Pollatsek, Fischer, and Rayner 1998; Rayner, Ashby, Pollatsek, and Reichle 2004; Reichle, Warren, and McConnell 2009). E-Z Reader posits that reading is a serial, attentionally driven process: A fixation serves the purpose of loading a word in the text into attention, which then feeds lexical access routines as well as higher-order syntactic and semantic integration processes. In addition to supporting lexical access, early processing stages (the familiarity check) serve the additional purpose of allowing the system to program and execute the next saccade. These claims are not universally endorsed; there exist competitor models that differ in the specific claims they make about this process. For example, the SWIFT reading model rejects E-Z Reader’s claim that attention is directed serially and sequentially from word to word in favor of an attention gradient that can span multiple words in the text, allowing readers to process several words in parallel (Engbert, Nuthmann, Richter, and Kliegl 2005). Despite these differences, there is broad consensus that linguistic processing largely determines the duration and pattern of fixations in the text (but cf. McConkie and Yang 2003). Much research has focused on lexical processing in reading. Research has shown, for example, that fixation durations reliably vary as a function of a word’s frequency (Rayner and Duffy 1986), its length (Juhasz, White, Liversedge, and Rayner 2008), and its cloze predictability in context (Ehrlich and Rayner 1981; Staub 2015), with longer fixations on words that are less frequent, longer, or less predictable; these are all variables thought to influence how readily a word can be accessed in the lexicon.
Indeed, the effect of these variables does appear to be due to covert lexical processing, rather than any low-level visual processes; in one striking set of experiments, it was found that frequency effects on fixation duration obtain even when the printed word disappears shortly after a reader fixates it (Liversedge, Rayner, White, Vergilino-Perez, Findlay, and Kentridge 2004; Rayner, Liversedge, White, and Vergilino-Perez 2003). Similarly, readers spend more time fixating ambiguous words than unambiguous words when an ambiguous word has roughly equally available meanings (Duffy, Morris, and Rayner 1988; Rayner and Duffy 1986). This finding suggests that covert cognitive processes (lexical competition, in this instance) are reflected in longer fixation durations during normal reading. Overall, there is good empirical support for the view that ease of lexical access is an important determinant of how long readers fixate a word in text. This in turn supports the widely held view that “ongoing lexical access [is] the ‘engine’ that ‘drives’ the eyes forward” (Reichle et al. 2009: 4). One notable exception is the landing position for a saccade, which has been argued to reflect purely physical features of the text (but cf. Bicknell, Higgins, Levy, and Rayner 2013).

It is not just lexical processing that drives the pattern of fixations across a text. Difficulty in post-lexical processing, such as syntactic integration or integration into a discourse context, is reflected in eye movements as well (see Boland 2004; Clifton, Staub, and Rayner 2007 for helpful reviews). In a seminal study, Frazier and Rayner (1982) showed that reading is significantly disrupted on the underlined regions of the garden-path sentences in (1). First fixation times in the region increased, as did the probability of making a backwards saccade or regression in the text.

(1)
a. Since Jay always jogs a mile and a half seems like a short distance to him.
b. The lawyers think his second wife will claim the entire inheritance belongs to her.

The authors reasoned that the disruption reflected parser failure and the cost of syntactic reanalysis stemming from syntactic ambiguity. An incremental parser that builds syntactic representations left to right and word by word faces a choice when analyzing the NPs a mile and a half and the entire inheritance in (1a) and (1b), respectively. Before the underlined regions are read, two analyses are possible: The NPs could either be the direct object of the preceding verbs jogs and claim (2a), or they could be the subjects of a new clause (2b).

(2)
a. [… [VP jogs [NP a mile]] / [… [VP claim [NP the entire inheritance]]
b. [… [VP jogs]] [CP [NP a mile] / [… [VP claim [CP [NP the entire inheritance] …]

According to Frazier’s (1978) Garden Path theory, the parser is strictly serial, so it must choose a single analysis to commit to. General parsing principles should lead the parser to prefer the direct object analysis in both cases (the principle of Late Closure in (1a) and Minimal Attachment in (1b)). Subsequently reading the underlined material in both examples disambiguates the parse to the dispreferred separate-clause analysis, which necessitates costly revision and reanalysis. These results provide evidence that the parser incrementally makes syntactic commitments that correspond to representational distinctions in our formal theory, and that the process of making and managing these commitments is reflected in the eye-movement record.

Since this seminal work, there have been well over 100 studies that demonstrate that “garden-pathing” of this sort reliably impedes a reader’s progress through the text in eye-tracking-while-reading experiments (Clifton et al. 2007). Moreover, it is not only the resolution of a syntactic ambiguity of this sort that can create an increase in fixation durations and regressive eye movements. The recognition of syntactic and semantic anomalies has a similarly deleterious effect on reading. For example, Pearlmutter, Garnsey, and Bock (1999) and Deutsch and Bentin (2001) provided early demonstrations that fixation durations, as well as the probability of making a regression, increased when readers encountered a number agreement error on the verb in English, or a gender agreement error in Hebrew (respectively). In a similar vein, Rayner, Warren, Juhasz, and Liversedge (2004) showed that the various types of implausibility in (3) can impede reading:

(3)
a. The man used a strainer to drain the thin spaghetti …
b. The man used a blow-dryer to dry the thin spaghetti …
c. The man used a photo to blackmail the thin spaghetti …

(3b) and (3c) present two different types of implausibility: (3b) is implausible in light of our world knowledge, whereas (3c) is impossible and arguably reflects a selectional restriction that blackmail places on its object (i.e. that it must be animate). Interestingly,
these different types of implausibility impacted fixations on the critical word differentially: the anomaly in (3c) was registered from the very first fixation on spaghetti, unlike in (3b), and readers regressed more frequently in (3c) than (3b). To the extent that these two types of implausibility play out differently in the eye movement record, there is a plausible argument to be made that these anomalies have different cognitive statuses. Early versions of E-Z Reader focused primarily on modeling the role of lexical access in reading; but more recent versions have integrated explicit, postlexical processing stages into the model in order to capture effects of syntactic or semantic integration like those described above (Reichle, Warren, and McConnell 2009). Although this model remains fairly agnostic about the nature of the parsing and interpretation events that occur in this postlexical processing stage, one strand of current research seeks to build more explicit links between formal models of parsing and models of eye movements (e.g. Engelmann, Vasishth, Engbert, and Kliegl 2013; Vasishth, Bruessow, Lewis, and Drenhaus 2008; Vasishth, von der Malsburg, and Engelmann 2013). In our view, further research on the link between parsing models and eye movements is of critical importance for the use of eye-movement measures in experimental syntax. As we discuss below, the degree to which eye-tracking methods will prove informative for syntactic theory depends on making explicit the links between the grammatical model and the parsing model, and between the parsing model and the eye-movement model. Despite the widespread recognition that syntactic and semantic processes are reflected in the eye-movement record, we believe that much work remains to be done on linking parsing models and eye-tracking models.

10.2.2 Eye-tracking: The practice

We now turn to practical questions of how an eye-tracking-while-reading experiment proceeds, and how the data it produces are analyzed. In a canonical eye-tracking-while-reading experiment, a participant is presented with a single stimulus sentence on the screen, usually in a monospace font such as Courier or Monaco. Typically, the entire sentence will be presented on screen, and the participant will read at their leisure until they are satisfied that they have understood the sentence. At that point, they will proceed to a comprehension question or the next sentence by pressing a button on a response pad. Variations of this basic setup involve presenting multiple lines of text, offering the participant additional context for the stimulus sentence, or perhaps an additional secondary task above and beyond simple reading for comprehension (Clifton 2013). During this passive reading, an eye-tracker is used to monitor the duration and location of fixations and saccades across the text.

The raw data from an experiment essentially consist of a record of the duration and location of the fixations that a participant makes on a given sentence. Analysis of these data typically involves (i) specifying a region of interest (ROI), or a portion of the sentence within which the fixations will
be analyzed, and (ii) calculating a number of derived dependent measures that summarize the pattern of fixations within a given span of text. Suppose an analyst was interested in measuring the disruption caused by subject–verb agreement failure in the ungrammatical sentence (4):

(4) The lonely hiker on the mountain were getting tired and decided to take a break.

Under the assumption that effects of interest would at least be localized to the region where agreement failure becomes apparent, the analyst might choose the ungrammatical auxiliary were as a region of interest, potentially including the following word as part of the ROI. Since effects are not always immediately localized to a critical region, one might also choose to analyze a spillover region following the critical region (tired in this example).

Having specified a region of analysis (or several), various dependent measures are computed for each ROI. The first fixation duration refers to the duration, typically expressed in milliseconds, of the first fixation that lands in the ROI. First-pass reading time is the sum of all fixations on an ROI before the reader exits it to the left or to the right; this is sometimes called gaze duration when the ROI comprises only a single word. First-pass regressions out is the proportion of trials where readers make a backwards saccade or regression from the ROI after their first pass. Go-past time or regression path is the sum of all fixations that occur from the first fixation in an ROI until the reader exits it to the right; this measure includes fixations made on previous stretches of the text after a regression. Together, these measures comprise so-called “early” measures of reading behavior (though see below for caveats concerning a tidy early/late dichotomy), as they jointly index all processing that occurs up until the reader moves to the right of a given region.
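As a rough illustration, the measure definitions in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure of any particular eye-tracking package: the function name `reading_measures`, the representation of a trial as a list of (region index, duration) pairs, and the region numbering for sentence (4) are all hypothetical. It also computes the two later measures that the text turns to next (rereading time and total time). Trials in which the ROI is initially skipped are not treated specially here.

```python
from typing import Dict, List, Optional, Tuple

# One fixation = (region index, duration in ms); a trial lists them in temporal order.
Fixation = Tuple[int, int]


def reading_measures(fixations: List[Fixation], roi: int) -> Dict[str, Optional[float]]:
    """Derive the reading measures described in the text for one ROI on one trial."""
    regions = [region for region, _ in fixations]
    if roi not in regions:
        # Assumption: a region that was never fixated yields missing values.
        return dict.fromkeys(
            ["first_fixation", "first_pass", "regression_out",
             "go_past", "rereading", "total"]
        )

    start = regions.index(roi)  # first fixation that lands in the ROI

    # First pass: consecutive fixations in the ROI until the reader exits either way.
    fp_end = start
    while fp_end < len(fixations) and regions[fp_end] == roi:
        fp_end += 1

    # Go-past: everything from the first ROI fixation until the reader exits to the right.
    gp_end = start
    while gp_end < len(fixations) and regions[gp_end] <= roi:
        gp_end += 1

    return {
        "first_fixation": fixations[start][1],
        "first_pass": sum(d for _, d in fixations[start:fp_end]),
        # Per-trial indicator; its mean over trials gives the proportion of regressions out.
        "regression_out": fp_end < len(fixations) and regions[fp_end] < roi,
        "go_past": sum(d for _, d in fixations[start:gp_end]),
        # Fixations on the ROI after the reader has first moved past it to the right.
        "rereading": sum(d for r, d in fixations[gp_end:] if r == roi),
        "total": sum(d for r, d in fixations if r == roi),
    }


# Hypothetical trial for sentence (4). Assumed regions: 0 "The lonely hiker",
# 1 "on the mountain", 2 "were" (the ROI), 3 "getting tired", 4 the remainder.
trial = [(0, 210), (1, 190), (2, 250), (2, 180), (0, 220),
         (2, 200), (3, 230), (2, 150), (4, 240)]
print(reading_measures(trial, roi=2))
```

On this invented trial the reader fixates were twice, regresses to the subject, returns, and later revisits the verb once more. The sketch therefore yields a first fixation duration of 250 ms, a first-pass time of 430 ms, a regression out, a go-past time of 850 ms, a rereading time of 150 ms, and a total time of 780 ms. Note that go-past time exceeds first-pass time exactly because it absorbs the fixations made during the regression.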
Later measures include rereading time, which is the sum of all fixations in an ROI after the reader moves past it for the first time, and total time, which is the sum total of all fixations on a given ROI, regardless of when they occur during the course of reading a sentence. These derived dependent measures are some of the most common reported in eye-tracking studies. However, eye-tracking data analysis remains an active area of investigation. Some more recent but less commonly used measures include scanpath analysis (von der Malsburg and Vasishth 2011; 2013), and cumulative progression analysis (Scheepers, Hemforth, Konieczny, and van Gompel n.d., cited in Kreiner, Sturt, and Garrod 2008). These measures have strengths and weaknesses that complement more traditional analyses. Interested readers should consult these references for further information.

As the preceding makes clear, the eye-tracking analyst faces numerous substantive analytic choices. This presents a somewhat perilous situation: In the absence of clear theoretical guidance on the choice of a region of analysis, or the appropriate dependent measure, there are a number of delicate statistical issues. First, the multiplicity of dependent measures typically analyzed in an eye-tracking experiment evidently leads to a potentially serious multiple comparisons problem; eye-tracking researchers typically declare an “effect” after observing a significant effect of a manipulation in a single dependent measure (von der Malsburg and Angele 2017). This can dramatically increase
the probability of a Type I error; a simple heuristic to avoid this is to ensure that any critical effect is found in more than one dependent measure, but other measures to correct for multiple comparisons such as the Dunn-Bonferroni correction may also be applied (von der Malsburg and Angele 2017). Second, we believe there is no clear theoretical guidance (yet) on the optimal choice of analysis regions. This leads to another statistical problem: if researchers choose their regions of analysis after the fact, they may inflate their experiment-wise Type I error rate (i.e. they may unwittingly enter “the Garden of Forking Paths”; Gelman and Loken 2013). In our view, the situation at present is one where choice of analysis region should be viewed as exploratory. There is substantial study-to-study variation in where and when a given parsing effect will arise (Clifton et al. 2008), and so it is not (yet) possible to predict a priori exactly where a given effect will arise. In light of this, it seems unavoidable that researchers will sometimes need to examine multiple distinct regions of analysis for a given experiment. In this case, it is important to present fully and transparently the results of such exploration in published work so that the distinction between confirmatory and exploratory analyses is clear; Pearlmutter et al. (1999) provide an excellent example of this approach.
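To make the multiple comparisons problem concrete, the following minimal sketch shows how the chance of at least one false positive grows with the number of measures tested, and how a Bonferroni-style threshold is applied. The function names and the p-values are hypothetical. The familywise formula assumes independent tests, which eye-tracking measures are not (they are computed from overlapping fixations), so the figure should be read as an illustration rather than an exact error rate.

```python
from typing import Dict


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag a measure as significant only if p <= alpha / number of tests."""
    threshold = alpha / len(p_values)
    return {measure: p <= threshold for measure, p in p_values.items()}


# Invented p-values for one manipulation tested in four dependent measures.
p_values = {"first_fixation": 0.20, "first_pass": 0.03,
            "go_past": 0.008, "total_time": 0.04}

print(round(familywise_error_rate(0.05, len(p_values)), 3))
print(bonferroni(p_values))
```

With four measures and a nominal threshold of .05, the familywise rate under independence is roughly .19. After correction the per-test threshold is .0125, so in this invented example only the go-past effect survives, although three of the four measures fall below the uncorrected .05 level.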

10.2.3 The time-course of grammatical processing

Perhaps one of the most often-cited benefits of the eye-tracking-while-reading methodology when applied to sentence processing is its temporal resolution: The analyst can examine the time course of processing from the first fixation on a critical region all the way up to any rereading of that region. Given the relative ordering, it is customary to refer to first fixation and first pass as “early measures,” and second pass/rereading and total time as “late measures” (Rayner, Sereno, Morris, Schmauder, and Clifton 1989). Go-past and regressions out are sometimes labeled early, sometimes late (Clifton et al. 2008). This rough distinction allows some insight into the processes of interest: To the extent that a given manipulation is reflected in first fixation or first-pass measures, it seems safe to say that it reflects the initial processing of the region of interest. On the other hand, effects that only occur in late measures like rereading are less likely to result from early processing of the target region.

However, it is important to resist reifying the early/late distinction. As mentioned above, there are some measures like go-past times whose classification is up for debate. More importantly, however, there seems to be a many-to-one relationship between parsing processes and eye-tracking measures. This was clearly demonstrated by Clifton, Staub, and Rayner (2007), in a review of 100 peer-reviewed papers that used eye-tracking to investigate “higher-order” processes, such as resolving a syntactic ambiguity, recognizing a syntactic or semantic anomaly, or processing syntactic or referential complexity. The survey indicated that for any given manipulation or type of anomaly, there was substantial variation in where, and on what measure, it impacted eye movements. For instance, the authors observed that the processing of a syntactic anomaly impacted first pass times and regressions out on both the critical region and
on the spillover region, as well as late measures like total time and rereading. The subsequent literature has been similarly varied: For example, the recognition of an agreement error has been shown to impact go-past times at the critical region and the spillover (e.g. Kreiner, Garrod, and Sturt 2013), first pass and total reading times (Dillon, Mishler, Sloggett, and Phillips 2013), and regressions out (Dillon et al. 2013).

Despite this uncertainty, it is clear that grammatical processing does impact eye movements, and that it can do so quite rapidly. One of the most common experimental paradigms for studying grammatical processing in comprehension is the so-called violation paradigm; in other well-known methods for studying comprehension, such as the event-related potentials (ERP) technique or self-paced reading, this paradigm is widely used. It has been less common in eye-tracking-while-reading. In the violation paradigm, the experimenter will introduce a syntactic (or semantic) anomaly at some position in the sentence, and measure when, and to what degree, the recognition of the anomaly impacts reading comprehension; Rayner et al.’s (2004) study on the various types of implausibility offers one such example. Another example comes from Kreiner, Garrod, and Sturt (2013), who looked at examples such as (5):

(5)
a. The family / families definitely and undeniably wish to avoid a court trial.
b. The widow / widows definitely and undeniably wish to avoid a court trial.

Kreiner and colleagues introduced an agreement error on the critical verb wish by manipulating the number marking on the subject head noun; the following two words served as a spillover region. Kreiner et al. wanted to know whether notionally plural collective nouns like family would mitigate any processing difficulty created by a mismatched verb. Interestingly, they observed that both collective nouns like family and non-collective nouns like widow imposed a similar penalty on a mismatching verb in early measures; in particular, they both created a slow-down of approximately 80ms in go-past times at the verb. Thus, in an arguably “early” measure (go-past times), a mismatched verb had an impact on the eye-movement recording, slowing fixation durations and/or triggering additional regressive saccades and rereading. In itself, it is difficult to characterize a mismatch penalty in go-past times as extrinsically “early” in processing for all the reasons detailed above: There is just too much variation in where comparable mismatch effects are observed, study to study, to draw firm conclusions of this sort. However, within an experiment, it is possible to distinguish the relative time-course of the grammatical processing for a given comparison. Indeed, this is exactly what Kreiner and colleagues did. For those same stimuli, the mismatch penalty they observed looked importantly different in second-pass reading times. In particular, there was no longer any reliable mismatch penalty for collective nouns, while the penalty for non-collective nouns persisted (and indeed appeared numerically greater than in go-past times). Thus, within this experiment, there was a reliable difference in the time-course of the mismatch penalty when comparing the different noun types: Non-collective nouns imposed an early and long-lasting mismatch penalty; collective nouns imposed an early penalty that dissipated in later measures. Like Rayner et al. 
(2004), these data point to a distinction between notionally plural
collective nouns and non-collective nouns with respect to how they enter into agreement dependencies with the verb. Of course, the interpretation of such a distinction is up for debate: Kreiner and colleagues suggest that the semantic information is available only at a delay, and so can come online to resolve an agreement mismatch only in later processing stages. What is clear, however, is that eye-tracking affords a comprehensive picture of the comprehension process as it unfolds, which allows the analyst to temporally dissociate plausibly distinct subcomponents of the parsing process. To the extent that parsing processes make direct contact with the representational vocabulary afforded by the grammar, such dissociations have the potential to bear on questions of experimental syntax.

Though the precise ways in which higher-order grammatical processes are reflected in the eye-movement record remain poorly understood, the same is not true of lexical processing. The lexical factors that control eye movements and the measures that they impact are comparatively well understood. Thus, eye-tracking may be particularly useful for experimental syntacticians whose theoretical questions concern the dividing line between the lexical and grammatical processes. Explicit comparisons of how lexical ambiguity and syntactic ambiguity play out in eye-tracking measures reveal some similarities, and some differences; the differences have been used to argue against processing models that rely on lexical storage and retrieval of syntactic frames to assemble syntactic structure in comprehension (van Gompel, Pickering, and Traxler 2000). For example, we noted above that comprehenders spend more time fixating ambiguous words than unambiguous words when the ambiguity is relatively balanced (Duffy, Morris, and Rayner 1988; Rayner and Duffy 1986). The same cannot be said of syntactic ambiguity: processing syntactic ambiguity does not reliably cause reading time slowdowns.
For example, the verb found can take either an NP or CP complement, as shown in (6a, 6b), but there does not seem to be reliable evidence that the availability of two parses slows participants down at the verb (see Hare, McRae, and Elman 2003 for the original experiment, and subsequent discussion in Clifton et al. 2007; and Clifton and Staub 2008).

(6)
a. The psychology students found the book in the bookstore.
b. The psychology students found the book was written poorly.

While evidence for costs of ambiguity is scarce, there is good evidence that syntactic ambiguity often conveys a processing advantage in eye-tracking measures, speeding reading times relative to unambiguous sentences. This effect is known as the ambiguity advantage effect (Traxler, Pickering, and Clifton 1998; van Gompel, Pickering, and Traxler 2000; 2001; van Gompel, Pickering, Pearson, and Liversedge 2005). The observation that lexical ambiguity slows reading times, while syntactic ambiguity speeds reading times, has provided an important challenge for syntactic processing models that rely on lexical storage and retrieval to assemble syntactic structures in incremental sentence processing (Clifton and Staub 2008; Traxler et al. 1998; van Gompel et al. 2000; but see Green and Mitchell 2006; Vosse and Kempen 2009 for counterarguments).

10.3 Aligning grammar and parser

Processing models and results such as those discussed in the previous section are descriptions expressed at the algorithmic level: They concern how readers create, maintain, and update incremental syntactic representations over the course of processing a sentence. By contrast, syntactic theories are generally formulated at a higher level of abstraction (Sprouse and Lau 2013). This gap needs to be bridged in order for eye-tracking data to be put to service for syntactic theorizing. In order for eye-tracking results to have bearing on the form of our syntactic theory, we need linking hypotheses that map (i) grammatical constructs to parsing operations and (ii) parsing operations to dependent measures like the various reading time measures discussed above. The issue of how to link grammatical representations with parsing operations is a very difficult issue: We highlight some of the basic issues we see (we point the reader to Boland 2005 for additional extended discussion, and Marantz 2005 for a somewhat more optimistic take on the enterprise). Being able to draw inferences about the grammar from real-time results presupposes the intuitive conjecture that there exists a (non-trivial) correspondence between operations and constructs of the grammar and those of the parser. Complex grammatical representations, for example, should require the parser to conduct more complex computations all else being equal, while less complex representations should require less computation. More complex computations should require greater effort, which should in turn manifest as an increase in reading times (or some other similar measure). The degree of correspondence between the grammar and the parser is to some extent an open empirical and theoretical issue, and the extent of correspondence that the analyst assumes has consequences for how to interpret parsing data. 
The strongest assumption to make is that there is a maximally transparent or isomorphic relation between grammatical and parsing operations. The now disfavored Derivational Theory of Complexity (Chomsky and Miller 1963) assumed such strong isomorphism (see Fodor, Bever, and Garrett 1974 for early arguments against the DTC; and Phillips 1996 for a different perspective). Weaker variants of the transparency thesis give up on strict isomorphism in favor of some kind of weaker homomorphism, which permits many-to-one mappings between grammatical and parsing operations in either direction. One well-known example of homomorphism is Berwick and Weinberg’s (1984) weak type transparency condition, which holds that grammatical rules and structures are transparently instantiated in the parser, but the parser may make additional distinctions above and beyond those imposed by the rules and representations in the grammar. We suspect that most theorists nowadays endorse homomorphism at least implicitly, but the exact details vary (see Marantz 2005; Lewis and Phillips 2013; 2015).1

1 We should note that not all researchers agree that transparency is warranted: Townsend and Bever (2001), for example, propose that initial parsing is carried out using a suite of “quick and dirty” heuristics that are only indirectly related to full-fledged grammatical representations which are computed at

dave kush and brian dillon

If one has a formal model that clearly delimits both the basic inventory of parsing operations and the primary determinants of processing costs, one can align grammatical constructs to parsing operations (either one-to-one as in the case of isomorphic mappings, or one-to-many in the case of homomorphism). In turn, one can make relatively straightforward predictions about when and where effects of a grammatical manipulation should emerge in the eye-tracking record. It is important to note, however, that such predictions are often only valid with respect to the particular parsing model assumed. Different models of the parser often apportion the costs of complexity differently across various computationally divergent operations and therefore make different predictions about where and when grammatical effects are supposed to emerge. Therefore, it is important for the experimental syntactician to commit to a specific model (or equivalence class of models) from the outset. The problem that the syntactician will encounter here is that there are many models on the market, many of which vary quite substantially in their details and parameters, and consequently the explanations that they support. Consider a few choices that one is faced with when choosing a parsing model: One must decide, whether the parser is serial (i.e. it only pursues a single syntactic analysis for an input sentence at a given time), or parallel (it can consider multiple possible parses simultaneously). One must also be clear about the memory architecture that subserves the parser: How does it encode and store information? What are its capacity limits? How is information maintained and accessed? Further, one has to determine how predictive the parser is: Can it build syntactic structure or compute dependencies before it has unambiguous bottom-up evidence? 
Each of these choices has theoretical and practical consequences for interpreting results and making predictions about incremental processing, since they determine, in part, the inventory of parsing operations that grammatical constructs can be set in correspondence to, and because they each come with their own grammar-independent complications that must controlled for. For example, as pointed out by Phillips and Wagers (2007), essentially all of the psycholinguistic evidence adduced against the existence of traces/gaps (e.g. Pickering and Barry 1991) rests on the assumption that parsing is bottom-up. The findings are equally consistent with a parser that predictively posited the existence of a trace before its linear position (see also Aoshima, Phillips, and Weinberg 2004; Gibson and Hickok 1993; Gorrell 1993; Crocker 1994). Given that interpretation so often depends on model assumptions, we advise aspiring experimental syntacticians to adopt independently established and explicit parsing frameworks whenever possible. This is for two reasons: First, explicit parsing frameworks allow clearer mapping hypotheses between parser and grammar, which can help in refining predictions and testing more precise theoretical hypotheses. Second, results some subsequent point. Other researchers working within the Good Enough Parsing (GEP) framework have gone so far as to argue that the end state of parsing needn’t always be a well-formed syntactic representation (see Ferreira and Patson 2007). It is clear that adopting either of these hypotheses makes direct inference from parsing measures to grammatical conclusions incredibly difficult. For the purposes of this chapter we assume that there is a degree of transparency (see Lewis and Phillips 2015; Phillips 2013 for further discussion).

eye-tracking and experimental syntax

345

are more easily falsifiable and evaluable. As we discuss later, we have couched much of our own work within the general cue-based parsing framework (e.g. Lewis and Vasishth 2005; Lewis, Vasishth, and Van Dyke 2006; McElree 2006) partially for these reasons. The cue-based framework posits a serial left-corner parser with an extremely limited active memory. The parser’s extremely limited focus of attention (a capacity of one or two “chunks” at most) entails that the parser must frequently retrieve previously processed chunks from a separate content-addressable memory store based on their features (associative cues). Parsing costs within this framework are largely tied to these retrieval procedures, and questions of grammatical representation are often formulated in terms of encoding: how is grammatical information encoded in features that can be used as retrieval cues? Theorists working within this framework have leveraged this perspective to ask sophisticated questions about the precise nature of syntactic representations constructed in incremental processing: for example, Arnett and Wagers (2017) use this perspective to investigate what features define a subject phrase in English. Obviously, the biggest drawback to formulating hypotheses relative to highly precise models is that not everyone shares the same assumptions. Results that depend too heavily on a controversial assumption may be dismissed by theorists working within other frameworks. Hopefully, as parsing models are elaborated and consensus is reached on at least some parameters of the parser, this problem should recede. The success of experimental syntax therefore depends in part on finding answers to questions about the parser that might seem orthogonal to grammatical questions in and of themselves.
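The core idea of content-addressable retrieval can be illustrated with a toy sketch. It is emphatically not an implementation of the cue-based models cited above, which rely on activation-based retrieval with additional components not modeled here. The class `Chunk`, the functions `match_score` and `retrieve`, the feature names, and the recency-based tie-breaking rule are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """A previously processed constituent, stored with its features."""
    label: str
    features: Dict[str, str] = field(default_factory=dict)


def match_score(chunk: Chunk, cues: Dict[str, str]) -> int:
    """Count how many retrieval cues the chunk's features satisfy."""
    return sum(1 for name, value in cues.items() if chunk.features.get(name) == value)


def retrieve(memory: List[Chunk], cues: Dict[str, str]) -> Optional[Chunk]:
    """Return the chunk matching the most cues.

    Access is by content, not by position in the sentence. Assumption: ties go
    to the most recently encoded chunk (the later item in the list).
    """
    best: Optional[Chunk] = None
    best_score = -1
    for chunk in memory:
        score = match_score(chunk, cues)
        if score >= best_score:
            best, best_score = chunk, score
    return best


# Hypothetical encoding of the two NPs preceding the verb in sentence (4).
memory = [
    Chunk("hiker", {"category": "NP", "role": "subject", "number": "sg"}),
    Chunk("mountain", {"category": "NP", "role": "oblique", "number": "sg"}),
]

# At the verb, the parser is assumed to look for a subject NP to agree with.
target = retrieve(memory, {"category": "NP", "role": "subject"})
print(target.label if target else None)
```

Here hiker matches both cues while mountain matches only one, so hiker is retrieved. The sketch makes the encoding question tangible: which chunk is retrieved depends entirely on which features are assumed to be stored on each chunk and which are used as cues.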

10.4 Case studies

In the remainder of this chapter we consider two areas where eye-tracking results may inform debates in syntactic theory: (i) the origin of island constraints and (ii) constraints on the interpretation of reflexive pronouns. We spend some time outlining the logic of the studies for two reasons. First, we wish to familiarize readers with some independent and well-established effects commonly used in eye-tracking studies and explain how such effects can be used to probe representational commitments at the level of syntax.2 Second, we also wish to highlight the (sometimes complicated and tenuous) chain of inference required to connect empirical results to theoretical claims.

2 There exists another approach to using eye-tracking measures to probe syntactic knowledge that proposes to draw inferences about syntactic representation by reading effects of (representational/operational) complexity directly off of reading time measures. For example, Boston and colleagues (Boston et al. 2008) showed a significant correlation between reading times and grammatical complexity (as quantified using a surprisal metric; Hale 2001). Such correlations between some experimental measure and complexity can be used to argue for the utility of syntactic representations in models of processing, and they might also be seen as a way to arbitrate between different formalisms. Other authors in this volume (Brennan, Ch. 16) use a variant of this correlational approach to argue for neural correlates of syntactic processing. While we think that such work is an important first step for neurophysiological work, we believe that it rarely permits us to ask the fine-grained questions that most syntacticians are concerned with: Most competing theories agree on the general loci of complexity, but differ in the details (consider the fact that Boston and colleagues (2008) found that both dependency and phrase-structure grammars were good predictors of reading times).

10.4.1 Islands

Certain syntactic domains are islands for long-distance dependency formation (Ross 1967). For example, it is possible in many languages to relate a wh-phrase like what to a gap inside a declarative complement clause (7a), but not inside a relative clause (7b), or a subject (7c).

(7)
a. What did Knut say [CP that the kid read __]?
b. *What did Knut know the kid [RC that read __]?
c. *What did Knut say that [SUBJ the kid reading __] was amused?

Why should some syntactic domains, but not others, block the extraction of wh-phrases? One influential line of thought attributes island effects to innate syntactic constraints (e.g. Ross 1967; Chomsky 1973). Other researchers reject a syntactic account, positing instead that islands are reducible to limitations of extralinguistic cognitive domains like memory (e.g. Deane 1991; Kluender and Kutas 1993). Recent studies have used experimental methods to argue for one position or the other, often using reading measures like self-paced reading (for recent summaries we refer the reader to Phillips 2013 and Wagers 2013). However, relatively few have used eye-tracking to study how islands constrain incremental filler-gap processing. We focus on two such studies.

In order to interpret a displaced phrase like a wh-word (henceforth, a filler) readers must connect the filler to its base position (its gap), which falls somewhere later in the sentence. One challenge that readers face is that the location of the gap is often uncertain when the filler is first encountered. A filler can, in principle, be linked to a number of different plausible positions, as in (8).

(8) Knut asked who …
a. ___ liked brown cheese. [SUBJECT GAP]
b. Marit saw ___. [DO GAP]
c. Torgunn gave the cheese to ___. [IO GAP]

Psycholinguistic research suggests that readers manage this incremental uncertainty by predicting gap positions. According to this active filler strategy, readers pre-emptively posit a gap as soon as possible, even before receiving bottom-up confirmation of the true gap site (e.g. Stowe 1986; Frazier and Clifton 1989). Early eye-tracking evidence for active filling comes from Experiment 2 in Traxler and Pickering (1996), where English-speaking participants read sentences like (9). Sentences contained RCs whose head (book/city) was ultimately linked to an oblique gap (__). Sentences also contained an optionally transitive verb (wrote) intervening between the filler and the true gap. According to the active filling hypothesis, participants should initially interpret the filler as the object of write. Traxler and Pickering used a plausibility mismatch manipulation to determine whether participants did so. They manipulated whether the filler was a plausible object of the intervening verb (one can write a book, but not a city). They reasoned that if readers posit a gap position after write, then they should experience difficulty when the filler is an implausible object, but not when it is a plausible one.

(9) We like the {book / city} that the author wrote unceasingly and with great dedication about ___ …

The researchers observed a plausibility mismatch effect at wrote in (9): Gaze durations were longer on the verb when the filler was city than when it was book. Based on this effect, Traxler and Pickering concluded that comprehenders were active gap-fillers (corroborating earlier findings such as Stowe 1986). The experiment also contained two additional conditions relevant to islands. The conditions featured the same plausibility-mismatch manipulation and the same verb intervening between the filler and gap. The only difference was that the intervening verb was embedded in an island: the embedded subject NP (10). The researchers wanted to know whether readers would blindly associate an unintegrated filler with any intervening verb, or whether readers would only posit gaps in positions that were potentially grammatical.

(10) We like the {book / city} that [SUBJECT the author who wrote unceasingly and with great dedication] saw ___ …

Traxler and Pickering found no plausibility-mismatch effect at wrote in (10), which indicated that participants suspended active filling inside the subject. The authors took this as evidence that islands constrained parsing behavior.
As pointed out by Phillips (2006), Traxler and Pickering’s (1996) results are equally compatible with two different explanations: active filling might be suppressed inside islands because A’-dependencies into islands are grammatically excluded, or because active-filling inside islands over-taxes memory resources. If the first explanation were correct, we would have (indirect) evidence that islands are grammatical in origin. But how might we tease the two possibilities apart? Reductionist accounts predict a blanket ban on active filling in domains coarsely categorized as islands (subjects, RCs, etc.). The ban is expected to hold crosslinguistically, under the assumption that humans possess essentially the same working memory resources, irrespective of their native language. In contrast, the grammatical approach allows for some wiggle room: If there exist languages that allow grammatical dependencies into apparent islands, we should expect active filling in exactly those domains.
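The contrasting predictions just described can be laid out as a small decision table. The sketch below is only an illustration of the argument's logic: the class name, the field names, and the two example domains are hypothetical labels introduced here, not terminology from the studies discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A syntactic domain in some language (all labels are illustrative)."""
    name: str
    coarse_island: bool       # coarsely categorized as an island (subject, RC, ...)
    grammar_allows_gap: bool  # does the language's grammar license a gap inside it?


def predicts_active_filling(account: str, domain: Domain) -> bool:
    """Return True if `account` expects readers to posit a gap inside `domain`."""
    if account == "reductionist":
        # Blanket ban tied to shared memory limits; the grammar plays no role.
        return not domain.coarse_island
    if account == "grammatical":
        # Active filling tracks whatever the grammar of the language permits.
        return domain.grammar_allows_gap
    raise ValueError(f"unknown account: {account!r}")


# Hypothetical cases: an RC in a language that bans extraction from RCs,
# and an RC in a language assumed to allow it.
rc_no_extraction = Domain("RC, no extraction", coarse_island=True, grammar_allows_gap=False)
rc_extraction_ok = Domain("RC, extraction allowed", coarse_island=True, grammar_allows_gap=True)

for domain in (rc_no_extraction, rc_extraction_ok):
    for account in ("reductionist", "grammatical"):
        print(domain.name, "|", account, "|", predicts_active_filling(account, domain))
```

Under these assumptions the two accounts agree on the first domain and diverge only on the second, which is why languages that permit dependencies into apparent islands are the informative test case.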


We consider one experiment that we think provides weak support for the grammatical account. (We also point the reader to Phillips 2006, where a different version of the same argument is made for English.) Mainland Scandinavian languages like Swedish appear to allow some long-distance dependencies into RCs (e.g. Maling and Zaenen 1982; Engdahl 1997). If RCs are not islands in Swedish,3 then, according to the logic above, Swedish readers should pursue active filling inside of RCs, all else being equal. Tutunjian, Heinat, Klingvall, and Wiklund (2017) tested this hypothesis in Swedish using a modified version of Traxler and Pickering’s (1996) design. They tested whether readers temporarily interpreted a topicalized NP filler (såna där möbler/flyttlådor in 11, 12) as the object of an optionally transitive verb embedded inside an RC (renoverade). The verb of interest preceded the filler’s true gap site (the object position of bära ‘to carry’). Tutunjian and colleagues manipulated whether the filler was a plausible object of the RC-internal verb or not: möbler (‘furniture’) can be ‘renovated’ in Swedish, but flyttlådor (‘moving boxes’) cannot. They also manipulated the position of the critical RC. In RC-Object conditions (11), the NP containing the RC was the object of the matrix verb bad4 (‘asked’), while in RC-Subject conditions like (12) the NP was the matrix subject.

(11) RC-Object
Såna där {möbler / flyttlådor} bad jag [NP en kollega [RC som renoverade (_)
Such there furniture/boxes asked I a colleague who renovated
på land-et]] att bära __ efter match-en i söndags.
on land-def to carry after match-def last Sunday
‘Such furniture, I asked a colleague that renovated (_) in the country to carry __ after the match on Sunday.’

(12) RC-Subject
Såna där {möbler / flyttlådor} bad [DP en kollega [RC som renoverade (__)
Such there furniture/boxes asked a colleague who renovated
på land-et]] mig att bära __ efter match-en i söndags.
on land-def me to carry after match-def last Sunday
‘Such furniture, a colleague that renovated (_) in the country asked me to carry __ after the match on Sunday.’

3 The islandhood of RCs in Mainland Scandinavian languages is a point of ongoing debate. Some advocate a position that RCs are, across the board, non-islands (e.g. Allwood 1982; Engdahl 1997), while others have argued that a more fine-grained analysis is required to distinguish between acceptable and unacceptable RC-spanning dependencies (Lindahl 2017). For the purposes of this section we adopt the view that they are non-islands in order to illustrate the point.

4 The matrix verb bad immediately follows the topicalized object because Swedish is a V2 language (Holmberg and Platzack 1995).

The researchers reasoned that if Swedish RCs are not islands, participants should actively interpret the filler as the object of renoverade in (11), yielding a plausibility mismatch effect. They further reasoned that active filling should not occur in (12) because subjects are strong islands in Swedish as in English (see Engdahl 1983). Thus, the RC-Subject conditions were intended to serve as controls where we would expect no mismatch effect. The results of the experiment were somewhat complex: Gaze duration and total reading times were longer on renoverade when the filler was implausible than when it was plausible. Although the mismatch effect was numerically larger in (11) than in (12) for both measures, the plausibility × RC-position interaction was not significant. The authors found one effect that they argued supported a distinction between RC-Object and RC-Subject conditions: a three-way plausibility × RC-position × trial order interaction in gaze duration. This interaction indicated that there was a large plausibility mismatch effect in the RC-Object conditions on early trials, but this effect was extinguished over the course of the experiment. In RC-Subject conditions, there was no early effect of plausibility, but suggestions of one began to emerge near the end of the experiment.5 Overall, the experiment provides suggestive evidence that active gap-filling is not suspended inside Swedish RCs. The results minimally show that active gap-filling inside RCs (and subjects) is not precluded due to resource limitations. The results might also be interpreted to support the non-islandhood of Swedish RCs in general. If RCs are not islands in Swedish, then the results might support the hypothesis that grammatical acceptability controls active gap-filling. Moreover, if island effects do not reflect resource limitations and can vary cross-linguistically, then the results constitute an indirect argument that islands require a grammatical explanation (though perhaps not one tied to inviolable universal constraints).
We point out, however, that these conclusions should be treated with care because aspects of the results—particularly the suggestion of active gap-filling inside subjects—weaken the strength of the findings.
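The critical comparison in a plausibility × RC-position design of this kind is a difference of differences: the mismatch effect in one RC position minus the mismatch effect in the other. The following is a minimal sketch of that arithmetic. The function names are hypothetical, and the cell means are invented placeholders chosen only to make the computation concrete; they are not values reported by Tutunjian and colleagues.

```python
def mismatch_effect(implausible_ms: float, plausible_ms: float) -> float:
    """Plausibility mismatch effect: implausible minus plausible reading time (ms)."""
    return implausible_ms - plausible_ms


def interaction_contrast(cell_means: dict) -> float:
    """Mismatch effect in RC-Object conditions minus that in RC-Subject conditions."""
    object_effect = mismatch_effect(
        cell_means[("RC-Object", "implausible")],
        cell_means[("RC-Object", "plausible")],
    )
    subject_effect = mismatch_effect(
        cell_means[("RC-Subject", "implausible")],
        cell_means[("RC-Subject", "plausible")],
    )
    return object_effect - subject_effect


# Placeholder gaze durations in ms, invented for illustration; NOT data from any study.
example_means = {
    ("RC-Object", "implausible"): 300.0,
    ("RC-Object", "plausible"): 270.0,
    ("RC-Subject", "implausible"): 285.0,
    ("RC-Subject", "plausible"): 275.0,
}

print(interaction_contrast(example_means))  # (300 - 270) - (285 - 275) = 20.0
```

A positive contrast corresponds to a numerically larger mismatch effect in the RC-Object conditions. Whether such a difference is reliable is a separate question that requires an inferential test over participants and items, which this sketch does not attempt.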

5 We suspect that active gap-filling may be possible in Swedish subjects with finite RCs due to the rather permissive parasitic-gapping possibilities in the language (see Engdahl 1983).

10.4.2 Reflexive processing

In an influential study, Sturt (2003) investigated the real-time resolution of English reflexive anaphors such as himself, with the goal of determining at what stage of incremental processing comprehenders apply Binding Principle A. Sturt sketched two general options: Knowledge of Principle A could apply as an initial constraint on the consideration of potential antecedents, restricting the set of possible antecedents to those licensed by the grammar; results from cross-modal priming seemed to support this view (Nicol and Swinney 1989). Alternatively, Principle A could apply as a late filter on antecedent consideration. Under this second option, the parser might initially consider (feature-matching) NPs in grammatically inappropriate positions as potential antecedents, but then rescind this consideration at a later point in processing; some results from self-paced reading seemed to support this view (Badecker and Straub 2002). Sturt proposed that eye-tracking could help differentiate between these two hypotheses. Sturt employed a gender-mismatch paradigm in an eye-tracking-while-reading study. The logic of this paradigm is as follows: If a reader encounters a reflexive that lacks a feature-matching antecedent in the local context, the parser will fail or otherwise experience integration failure (e.g. Reichle et al. 2009; Vasishth et al. 2008). Thus, reflexives without a feature-matched antecedent will be read more slowly than reflexives that have one, owing to the increased likelihood of integration failure (although other linking hypotheses between the parser and reading times in this paradigm are possible: see Jaeger, Engelmann, and Vasishth 2017; Nicenboim, Engelmann, Suckow, and Vasishth n.d.; Patil, Vasishth, and Lewis 2016). Accordingly, we expect longer reading times at the reflexive himself in (7) than we would expect to find on herself in the same position.

(7) The girl hurt herself/himself wrangling the reindeer.

(Gender-)mismatch effects of this type can be used to probe whether certain NPs are “visible” to the parser as potential antecedents or licensors. Sturt’s (2003) Experiment 1 manipulated the gender-match between a reflexive (himself/herself below) and two c-commanding NPs. The subject of the reflexive-containing embedded clause (the surgeon) was accessible because it was local enough to bind the reflexive according to Principle A. The reflexive either matched the stereotypical gender of the accessible NP (himself) or mismatched it (herself).6 The second NP was a pronoun in the matrix subject position of the second sentence. The pronoun was coreferential with a name introduced in the first sentence (Jonathan/Jane), and either matched or mismatched the reflexive in stereotypical or definitional gender.
As this higher NP was not local to the reflexive (not contained within its immediate finite clause), it was ruled out as a potential antecedent by Principle A. Sturt reasoned that if the parser considered grammatically inappropriate antecedents, the reflexive in (8c) should be processed differently than in (8d). Gender-match with the inaccessible NP might facilitate or inhibit processing of the otherwise unlicensed reflexive relative to the condition where the inaccessible NP did not match.

(8)
a. Jonathan was pretty worried at the City Hospital. He remembered that the surgeon had pricked himself with a used syringe needle.
b. Jennifer was pretty worried at the City Hospital. She remembered that the surgeon had pricked himself with a used syringe needle.
c. Jonathan was pretty worried at the City Hospital. He remembered that the surgeon had pricked herself with a used syringe needle.
d. Jennifer was pretty worried at the City Hospital. She remembered that the surgeon had pricked herself with a used syringe needle.

6 Sturt (2003) used stereotypical gender in order to avoid presenting people with fully ungrammatical sentences in an eye-tracking experiment. In eye-tracking paradigms, stereotypical gender violations and definitional gender violations lead to similar gender mismatch effects in early measures (Kreiner, Sturt, and Garrod 2008). In later studies (e.g. Dillon et al. 2013; Parker and Phillips 2017), definitional gender violations or number violations were also used.

Sturt found that first-fixation, first-pass, and regression-path times were longer for a reflexive that mismatched the accessible NP (8c,d) than for one that matched (8a,b). Gender-match between the reflexive and the inaccessible NP did not reliably influence these measures at the reflexive. However, the gender of the inaccessible NP did appear to affect later processing: Second-pass reading times at the reflexive were shorter when both NPs matched the reflexive (8a) and longer when only the inaccessible NP matched (8d). In later regions, the presence of a matching inaccessible NP facilitated second-pass times, suggesting that the non-local, c-commanding NPs may be considered at some stage of processing. In a second experiment Sturt (2003) again used a gender-mismatch design to test the effect of an inaccessible NP in a different structural position. In Experiment 2 the inaccessible NP was embedded inside a relative clause attached to the accessible NP. In this position the NP was not a grammatically acceptable antecedent for the reflexive because it did not c-command the reflexive.

(9) The surgeon who treated Jennifer/Jonathan had pricked himself/herself with a used syringe needle.

As in Experiment 1, participants’ first-fixation and first-pass times were longer in the critical region when the reflexive did not match the stereotypical gender of the accessible NP. Unlike Experiment 1, the gender of the inaccessible NP had no observable effect on any early or late measures. From these findings, Sturt concluded that Principle A was deployed as a defeasible filter: It constrained antecedent selection in early processing, but could be overridden in subsequent processing stages if, for example, countervailing discourse constraints made the inaccessible antecedent particularly tempting.
Since Sturt’s seminal finding, there has been an intense interest in the processing of reflexives, and in particular, intense interest in the question of whether Principle A is applied as a “hard” constraint on antecedent retrieval (Dillon et al. 2013), or if instead, Principle A provides only one constraint among many used to select an antecedent (Badecker and Straub 2002; Jaeger et al. 2017; Patil et al. 2016; Parker and Phillips 2017; Sloggett 2017). At present, this debate between these two views continues; see reviews in Dillon (2014), Sturt (2003), and especially Jaeger et al. (2017). A recent meta-analysis by Jaeger et al. (2017) suggests that there is very little evidence for interference from inaccessible antecedents on reflexive dependencies; at the same time, recent work by Parker and Phillips (2017) and Sloggett (2017) calls this conclusion into question. We return to these results below. While recent evidence calls into question the strong conclusion that Binding Theory is rigidly deployed as a “hard” constraint on antecedent access (Dillon et al. 2013; Dillon 2014), this body of literature does provide evidence that knowledge of Principle A constrains the parser’s earliest attempts to resolve a reflexive dependency. Given this, it is interesting to ask what relevance these findings might have for syntactic theory.


First, these results provide us with some evidence that speaks to the issues of transparency. Rapid or immediate alignment between the grammar and the parser is at the very least consistent with the transparency thesis. In this spirit, subsequent work using the mismatch paradigm in eye-tracking has leveraged this close alignment between online parsing processes and grammatical constraints to investigate long-held distinctions made by formal linguistic theories. One important distinction is between co-argument and non-co-argument anaphors/reflexives. As we will see, this long-held grammatical distinction between co-argument and non-co-argument anaphors does not map cleanly onto online measures. If we adopt a tight link between the grammar and the parser, it stands to reason that these results can inform inquiry into the (grammatical) status of the coargumenthood distinction. Since the influential work of Pollard and Sag (1992), it has been suggested that only reflexives whose binder is a coargument of the same syntactic or semantic predicate are strictly subject to Principle A of the Binding Theory; those with non-coargument antecedents are exempt anaphors that can participate in anaphoric relations that violate Principle A (Pollard and Sag 1992; see also the notion of a “logophor” in Reinhart and Reuland 1993). Cunnings and Sturt (2014) use eye-tracking-while-reading to compare the processing of traditional and exempt reflexives using a gender-mismatch design. They investigated whether resolution of exempt reflexives was susceptible to interference from non-local NPs, with the goal of ascertaining if the structural position of the reflexive had any effect on the degree to which non-local antecedents might be considered. The first experiment looked at reflexive pronouns in direct object position (10). The second and third looked at reflexive pronouns inside picture NPs, without (11) or with (12) a local possessor, respectively.
Picture noun phrase reflexives without a possessor are typically taken as parade-cases of exempt anaphors.

(10) Coargument reflexives
Jonathan/Jennifer was walking through the military barracks.
a. She/He heard that the soldier had positioned himself in the middle of the mess hall.
b. She/He heard that the soldier had positioned herself in the middle of the mess hall.

(11) Picture NP reflexives without possessors
a. Jonathan/Jennifer was walking through the military barracks.
b. She/He heard that the soldier had a picture of himself in the middle of the mess hall.
c. She/He heard that the soldier had a picture of herself in the middle of the mess hall.

(12) Possessed Picture NPs
Jonathan/Jennifer was walking through the military barracks.
a. She/He heard about the soldier’s picture of himself in the middle of the mess hall.
b. She/He heard about the soldier’s picture of herself in the middle of the mess hall.

In end-of-sentence interpretation judgments, Cunnings and Sturt confirmed that picture noun phrase (PNP) reflexives (and especially possessed picture noun phrases) were more likely to take a non-local antecedent, and more so when the local antecedent was a (stereotypical) gender mismatch. This finding was not mirrored in the eye-movement record. In all of their experiments, Cunnings and Sturt observed the expected gender-mismatch effect either at the reflexive pronoun or in the spillover region: Reading times were longer when the local antecedent mismatched the reflexive’s features. In no experiment did they find evidence that the gender of the non-local antecedent influenced early processing of the reflexive. A cumulative progression analysis did reveal, however, that comprehenders were slower to process picture noun phrase reflexives than coargument reflexives or possessed picture noun phrase reflexives (see also Burkhardt 2005). These results suggest that the parser treats co-argument and picture noun phrase reflexives similarly in at least one respect: In early reading, comprehenders only seem to entertain local, Principle A antecedents for all types of reflexives in the configurations that Cunnings and Sturt tested (though see Kaiser, Runner, Sussman, and Tanenhaus 2009; Runner, Sussman, and Tanenhaus 2003; 2006 for contrasting evidence from the visual world paradigm). Recent research suggests, however, that comprehenders may even be willing to entertain non-local antecedents for coargument reflexives under certain conditions. Parker and Phillips (2017) investigated sentences as in (13; from their Experiment 3):

(13) The talented actor/actress mentioned that …
a. the attractive spokesman praised himself for a great job.
b. the attractive spokeswoman praised himself for a great job.
c. the attractive spokeswomen praised himself for a great job.

Parker and Phillips’ studies extended the mismatch paradigm from Sturt (2003) by including both 1-feature accessible mismatch conditions (13b) and 2-feature mismatch conditions (13c). Across three experiments, they consistently observed that reflexives in sentences like (13c) were read more quickly when the non-local antecedent matched the reflexive’s features; importantly, this effect was evident even in some early measures. Parker and Phillips suggested that sensitivity to the non-local antecedent was the consequence of an antecedent retrieval mechanism that jointly considers structural and morphosyntactic information in selecting an antecedent (e.g. Lewis and Vasishth 2005; Lewis, Vasishth, and Van Dyke 2006; McElree 2006). On this view, though structural cues often carry the day (creating the appearance of strong Principle A sensitivity


in many contexts), when the local antecedent is a very poor match to the reflexive’s features, the non-local antecedent may sometimes be retrieved and considered. Parker and Phillips’ results show that sensitivity to non-local antecedents can arise in comprehension due to properties of the memory access mechanisms that mediate the online formation of reflexive-antecedent dependencies. However, Sloggett (2017) showed that the likelihood of accessing a non-local antecedent for reflexives is also conditioned on whether that antecedent is a tempting perspective center for an utterance (see Culy 1994; Kuno 1972; Sells 1987; Sloggett 2017). In particular, Sloggett observed that when the non-local antecedent in examples like (13) is a source of information, comprehenders access it more readily in early processing than when it is a receiver of information (Sloggett 2017; see also Kaiser et al. 2009, for sensitivity to sourcehood for picture noun phrase reflexives). Similarly, Sloggett (2017) observed that access to non-local antecedents was impeded when the local subject is a tempting perspective center (e.g. an indexical pronoun such as I). Sloggett (2017) hypothesizes that sensitivity to non-local antecedents is mediated through an encoding of the logophoric center of an utterance, and that comprehenders can access this encoding more readily when the reflexive’s retrieval cues are a poor match to the local subject. The results of Parker and Phillips, and Sloggett, provide an interesting complement to Cunnings and Sturt. While some theoretical accounts (Pollard and Sag 1992; Reinhart and Reuland 1993; van Valin and La Polla 1997) have focused on the coargument/non-co-argument distinction as a critical grammatical factor in controlling access to non-local antecedents, the on-line eye-tracking data suggest a different picture. Co-argument and non-co-argument reflexives show a similar sensitivity to local antecedents when directly compared (Cunnings and Sturt 2014). 
However, readers do access non-local antecedents for co-argument reflexives in online processing in certain contexts, as a function of feature match to the Principle A antecedents (Parker and Phillips 2017), and as a function of their sensitivity to perspectival elements (Sloggett 2017). The eye-tracking data seem to imply that co-argument and non-coargument reflexives are more similar than different, and instead, processing factors (feature match to the local subject) and other grammatical factors (discourse roles of potential antecedents) have a greater role to play in controlling access to non-local antecedents. If one maintains a strong relationship between the parser and the grammar, then these results seem to call into question theoretical distinctions drawn on the basis of intuitive data. In this respect, they align with other results from visual-world methodologies (Kaiser et al. 2009; see especially discussion in Runner et al. 2003; 2006; Runner and Head 2014). It is of course possible that the divisions among anaphors suggested by the online data do not, in fact, align with grammatically active divisions. However, in an interesting point of convergence, a similar conclusion is reached by Isabelle Charnavel and colleagues on the basis of acceptability judgment data (Charnavel and Sportiche 2016).
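The retrieval-based account that Parker and Phillips (2017) appeal to can be illustrated with a toy weighted cue-matching sketch applied to the conditions in (13). Everything specific below is an assumption made for illustration: the feature encodings, the cue names, and especially the weights (a structural cue weighted at 1.5 against 1.0 for gender and number). It is a deterministic toy and does not implement the activation equations of Lewis and Vasishth (2005) or any published model.

```python
# Assumed cue weights: the structural cue counts for more than one
# morphosyntactic cue, but for less than two of them combined.
CUE_WEIGHTS = {"local_subject": 1.5, "gender": 1.0, "number": 1.0}

# Retrieval cues assumed to be projected by the reflexive "himself".
HIMSELF_CUES = {"local_subject": True, "gender": "masc", "number": "sg"}


def match_score(chunk: dict, cues: dict, weights: dict) -> float:
    """Sum the weights of all retrieval cues that the chunk's features match."""
    return sum(weights[cue] for cue, value in cues.items() if chunk.get(cue) == value)


def retrieve(chunks: list, cues: dict, weights: dict) -> dict:
    """Return the chunk in memory with the highest weighted cue match."""
    return max(chunks, key=lambda chunk: match_score(chunk, cues, weights))


# Non-local (matrix subject) NP: matches gender and number, but not the structural cue.
ACTOR = {"label": "actor", "local_subject": False, "gender": "masc", "number": "sg"}

# Local (embedded subject) NPs for conditions (13a), (13b), and (13c).
LOCAL_NPS = [
    {"label": "spokesman", "local_subject": True, "gender": "masc", "number": "sg"},
    {"label": "spokeswoman", "local_subject": True, "gender": "fem", "number": "sg"},
    {"label": "spokeswomen", "local_subject": True, "gender": "fem", "number": "pl"},
]

for local_np in LOCAL_NPS:
    winner = retrieve([local_np, ACTOR], HIMSELF_CUES, CUE_WEIGHTS)
    print(local_np["label"], "->", winner["label"])
```

With these assumed weights, the local subject is retrieved in the fully matching and 1-feature mismatch conditions, and the non-local NP is retrieved only in the 2-feature mismatch condition. The sketch shows how a single retrieval mechanism could produce apparent Principle A sensitivity in most contexts while still admitting non-local antecedents when the local NP is a very poor match.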


10.5 Conclusions


This chapter was intended to provide aspiring experimental syntacticians with an introduction to, and a rough guide on, how to answer questions about the grammar using eye-tracking-while-reading. We offered basic information on the experimental paradigm and an overview of behavioral models of reading, identified points of divergence between various models of the parser, and discussed challenges in deciding on appropriate mapping hypotheses between grammatical knowledge and parsing performance. We believe that experimental work that lacks a clear understanding of these issues will have only a limited impact on the broader experimental syntax enterprise, but work that succeeds in balancing the methodological and theoretical issues has a good chance of making important contributions to our understanding of human syntactic competence.

References

Allwood, J. 1982. The complex NP constraint in Swedish. In E. Engdahl and E. Ejerhed (eds), Readings on unbounded dependencies in Scandinavian languages, 15–32. Acta Universitatis Umensis: Umeå.
Aoshima, S., C. Phillips, and A. Weinberg. 2004. Processing filler-gap dependencies in a head-final language. Journal of Memory and Language 51: 23–54.
Arnett, N., and M. Wagers. 2017. Subject encodings and retrieval interference. Journal of Memory and Language 93: 22–54.
Badecker, W., and K. Straub. 2002. The processing role of structural constraints on interpretation of pronouns and anaphors. Journal of Experimental Psychology: Learning, Memory, and Cognition 28: 748–769.
Barker, C. Linguistic Inquiry 43: 614–633.
Berwick, R., and A. Weinberg. 1984. The grammatical basis of linguistic performance: Language use and language acquisition. Cambridge, MA: MIT Press.
Bicknell, K., E. Higgins, R. Levy, and K. Rayner. 2013. Evidence for cognitively controlled saccade targeting in reading. In M. Knauff, M. Pauen, N. Sebanz, and I. Wachsmuth (eds), Proceedings of the 35th annual conference of the Cognitive Science Society, 197–202.
Boland, J. E. 2004. Linking eye movements to sentence comprehension in reading and listening. In M. Carreiras and C. Clifton Jr. (eds), The on-line study of sentence comprehension: Eye-tracking, ERP, and beyond, 51–76. Brighton: Psychology Press.
Boland, J. E. 2005. Cognitive mechanisms and syntactic theory: Arguments against adjuncts in the lexicon. In A. E. Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 23–42. Hillsdale, NJ: Erlbaum.
Boston, M. F., J. T. Hale, R. Kliegl, U. Patil, and S. Vasishth. 2008. Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus. Journal of Eye Movement Research 2: 1–12.
Burkhardt, P. 2005. The syntax–discourse interface: Representing and interpreting dependency. Philadelphia, PA: Benjamins.
Charnavel, I., and D. Sportiche. 2016. Anaphor binding: What French inanimate anaphors show. Linguistic Inquiry 47: 35–38.


Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. 1973. Conditions on transformations. In S. R. Anderson and P. Kiparsky (eds), A Festschrift for Morris Halle, 232–286. New York: Holt, Rinehart & Winston.
Clifton Jr, C. 2013. Situational context affects definiteness preferences: Accommodation of presuppositions. Journal of Experimental Psychology: Learning, Memory, and Cognition 39: 487–501.
Clifton, C., and A. Staub. 2008. Parallelism and competition in syntactic ambiguity resolution. Language and Linguistics Compass 2: 234–250.
Clifton, C., F. Ferreira, J. M. Henderson, A. W. Inhoff, S. P. Liversedge, E. D. Reichle, and E. R. Schotter. 2016. Eye movements in reading and information processing: Keith Rayner’s 40 year legacy. Journal of Memory and Language 86: 1–19.
Clifton, C., A. Staub, and K. Rayner. 2007. Eye movements in reading words and sentences. In R. van Gompel, M. Fischer, W. Murray, and R. Hill (eds), Eye movements: A window on mind and brain, 341–372. Amsterdam: Elsevier.
Culy, C. 1994. Aspects of logophoric marking. Linguistics 32: 1055–1094.
Cunnings, I., and P. Sturt. 2014. Coargumenthood and the processing of reflexives. Journal of Memory and Language 75: 117–139.
Deane, P. 1991. Limits to attention: A cognitive theory of island phenomena. Cognitive Linguistics 2(1): 1–63.
Deutsch, A., and S. Bentin. 2001. Syntactic and semantic factors in processing gender agreement in Hebrew: Evidence from ERPs and eye movements. Journal of Memory and Language 45: 200–224.
Dillon, B. 2014. Syntactic memory in the comprehension of reflexive dependencies: an overview. Language and Linguistics Compass 8: 171–187.
Dillon, B., A. Mishler, S. Sloggett, and C. Phillips. 2013. Contrasting intrusion profiles for agreement and anaphora: Experimental and modeling evidence. Journal of Memory and Language 69: 85–103.
Duffy, S. A., R. K. Morris, and K. Rayner. 1988. Lexical ambiguity and fixation times in reading. Journal of Memory and Language 27: 429–446.
Ehrlich, S. F., and K. Rayner. 1981. Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior 20: 641–655.
Engbert, R., A. Nuthmann, E. Richter, and R. Kliegl. 2005. SWIFT: A dynamical model of saccade generation during reading. Psychological Review 112: 777–813.
Engdahl, E. 1983. Parasitic gaps. Linguistics and Philosophy 6: 5–34.
Engdahl, E. 1997. Relative clause extractions in context. Working Papers in Scandinavian Syntax 60: 51–79.
Engelmann, F., S. Vasishth, R. Engbert, and R. Kliegl. 2013. A framework for modeling the interaction of syntactic processing and eye movement control. Topics in Cognitive Science 5: 452–474.
Ferreira, F., and J. Henderson. 2004. The interface of language, vision, and action. Brighton: Psychology Press.
Ferreira, F., and N. D. Patson. 2007. The “good enough” approach to language comprehension. Language and Linguistics Compass 1: 71–83.
Fodor, J. A., T. G. Bever, and M. F. Garrett. 1974. The psychology of language. An introduction to psycholinguistics and generative grammar. McGraw-Hill: New York.

eye-tracking and experimental syntax

357

Frazier, L. 2008. Processing ellipsis: A processing solution to the undergeneration problem. In Proceedings of the 26th West Coast Conference on Formal Linguistics, 21–32. Somerville, MA: Cascadilla. Frazier, L., and C. Clifton Jr. 1989. Successive cyclicity in the grammar and the parser. Language and Cognitive Processes, 4, 93–126. Frazier, L., and K. Rayner. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14(2): 178–210. Kluender, R., and M. Kutas. 1993. Subjacency as a processing phenomenon. Language and Cognitive Processes 8(4): 573–633. Gelman, A., and E. Loken. 2013. The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University. Gibson, E., and E. Fedorenko. 2010. Weak quantitative standards in linguistics research. Trends in Cognitive Sciences 14: 233–234. Gibson, E., and E. Fedorenko. 2013. The need for quantitative methods in syntax and semantics research. Language and Cognitive Processes 28: 88–124. Gibson, E., and G. Hickok. 1993. Sentence processing with empty categories. Language and Cognitive Processes 82: 147–161. Green, M. J., and D. C. Mitchell. 2006. Absence of real evidence against competition during syntactic ambiguity resolution. Journal of Memory and Language 55: 1–17. Gorrell, P. 1993. Evaluating the direct association hypothesis: A reply to Pickering and Barry (1991). Language and Cognitive Processes 8: 129–146. Grant, M., B. Dillon, and S. Sloggett. 2020. Ambiguity resolution in attachment and pronominal reference. Glossa: a journal of general linguistics 5: 77. Hale, J. 2001. A probabilistic Earley parser as a psycholinguistic model. 
In Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies, 1–8. Association for Computational Linguistics. Hare, M., K. McRae, and J. L. Elman. 2003. Sense and structure: meaning as a determinant of verb subcategorization preferences. Journal of Memory and Language 48: 281–303. Holmberg, A., and C. Platzack. 1995. The role of inflection in Scandinavian syntax. New York: Oxford University Press. Huey, E. B. 1908/1968. The psychology and pedagogy of reading. New York: Macmillan; repr. MIT Press. Jaeger, L. A., F. Engelmann, and S. Vasishth. 2017. Similarity-based interference in sentence comprehension: Literature review and Bayesian meta-analysis. Journal of Memory and Language 94: 316–339. Juhasz, B. J., S. J. White, S. P. Liversedge, and K. Rayner. 2008. Eye movements and the use of parafoveal word length information in reading. Journal of Experimental Psychology: Human Perception and Performance 34: 1560–1579. Just, M. A., and P. A. Carpenter. 1980. A theory of reading: From eye fixations to comprehension. Psychological Review 87: 329. Just, M. A., and P. A. Carpenter. 1984. Using eye fixations to study reading comprehension. In D. Kieras and M. Just (eds), New methods in reading comprehension research, 151–182. Hillsdale, NJ: Erlbaum. Just, M. A., P. A. Carpenter, and J. D. Woolley. 1982. Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General 111: 228–238.

358

dave kush and brian dillon

Kaiser, E., J. T. Runner, R. S. Sussman, and M. K. Tanenhaus. 2009. Structural and semantic constraints on the resolution of pronouns and reflexives. Cognition 112: 55–80. Kreiner, H., S. Garrod, and P. Sturt. 2013. Number agreement in sentence comprehension: The relationship between grammatical and conceptual factors. Language and Cognitive Processes 28: 829–874. Kreiner, H., P. Sturt, and S. Garrod. 2008. Processing definitional and stereotypical gender in reference resolution: Evidence from eye-movements. Journal of Memory and Language 58: 239–261. Kuno, S. 1972. Functional sentence perspective: A case study from Japanese and English. Linguistic Inquiry 3: 269–320. Kush, D.2013. Respecting relations: Memory access and antecedent retrieval in incremental sentence processing. PhD dissertation, University of Maryland, College Park. Kush, D., J. Lidz, and C. Phillips. 2015. Relation-sensitive retrieval: Evidence from bound variable pronouns. Journal of Memory and Language 82: 18–40. Lewis, R. L., and S. Vasishth. 2005. An activation‐based model of sentence processing as skilled memory retrieval. Cognitive Science 29(3): 375–419. Lewis, R. L., S. Vasishth, and J. A. Van Dyke. 2006. Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences 10: 447–454. Lewis, S., and C. Phillips. 2015. Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research 44: 27–46. Lindahl, F. (2017). Extraction from relative clauses in Swedish. PhD dissertation. University of Gothenburg. Liversedge, S. P., K. Rayner, S. J. White, D. Vergilino-Perez, J. M. Findlay, and R. W. Kentridge. 2004. Eye movements while reading disappearing text: Is there a gap effect in reading? Vision Research 44: 1013–1024. Maling, J., and A. Zaenen. 1982. A phrase structure account of Scandinavian extraction phenomena. In P. Jacobsob and G. Pullum (eds), The nature of syntactic representation, 229–282. Dordrecht: Springer. Marantz, A. 
2005. Generative linguistics within the cognitive neuroscience of language. Linguistic Review 22(2–4): 429–445. Marr, D. 1982. Vision: A computational investigation into the human representation and processing of visual information. New York: Freeman. McConkie, G. W., and S. N. Yang. 2003. How cognition affects eye movements during reading. In J. Hyona, R. Radach, and H. Deubel (eds), The mind’s eye: Cognitive and applied aspects of eye movement research, 413–427. Amsterdam: North-Holland. McElree, B. 2006. Accessing recent events. In B. Ross (ed), Psychology of learning and motivation, vol. 46, 155–200. San Diego, CA: Elsevier Academic Press. Miller, G. A., and N. Chomsky. 1963. Finitary models of language users. In R. D. Luce, R. R. Bush, and E. Galanter (eds). Handbook of Mathematical Psychology, vol. 2, 419–491. New York: Wiley. Nicenboim, B., F. Engelmann, K. Suckow, and S.Vasishth. n.d. Number interference in German: Evidence for cue-based retrieval. Psyarxiv e-print. Retrieved from: http://www.ling.uni-potsdam.de/∼nicenboim/papers/NicenboimEtAl2016Number.pdf Nicol, J., and D. Swinney. 1989. The role of structure in coreference assignment during sentence comprehension. Journal of Psycholinguistic Research 18: 5–19. Parker, D., and C. Phillips. 2017. Reflexive attraction in comprehension is selective. Journal of Memory and Language 94: 272–290.

eye-tracking and experimental syntax

359

Patil, U., S. Vasishth, and R. L. Lewis. 2016. Retrieval interference in syntactic processing: The case of reflexive binding in English. Frontiers in Psychology 7. Pearlmutter, N. J., S. M. Garnsey, and K. Bock. 1999. Agreement processes in sentence comprehension. Journal of Memory and Language 41: 427–456. Phillips, C. 1996. Order and structure. PhD dissertation, Massachusetts Institute of Technology. Phillips, C. 2006. The real-time status of island phenomena. Language 82: 795–823. Phillips, C. 2010. Should we impeach armchair linguists? Japanese/Korean Linguistics 15: 49–64. Phillips, Colin. 2013. On the nature of island constraints I: Language processing and reductionist accounts. In Jon Sprouse and Norbert Hornstein (eds), Experimental syntax and island effects, 64–108. Cambridge: Cambridge University Press. Phillips, C., and M. Wagers. 2007. Relating structure and time in linguistics and psycholinguistics. In Oxford handbook of psycholinguistics, 739–756. New York, NY: Oxford University Press. Pickering, M., and G. Barry. 1991. Sentence processing without empty categories. Language and Cognitive Processes 6: 229–259. Pollard, C., and I. A. Sag. 1992. Anaphors in English and the scope of binding theory. Linguistic Inquiry 23: 261–303. Pollatsek, A., and R. Treiman (eds) 2015. The Oxford handbook of reading. New York: Oxford University Press. Potter, M. C. 1984. Rapid serial visual presentation (RSVP): A method for studying language processing. New Methods in Reading Comprehension Research 118: 91–118. Rayner, K. 1978. Eye movements in reading and information processing. Psychological Bulletin 85: 618–660. Rayner, K. 1998. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124: 372. Rayner, K., J. Ashby, A. Pollatsek, and E. D. Reichle. 2004. The effects of frequency and predictability on eye fixations in reading: Implications for the E-Z Reader model. 
Journal of Experimental Psychology: Human Perception and Performance 30: 720–732. Rayner, K., and S. A. Duffy. 1986. Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory and Cognition 14: 191–201. Rayner, K., S. P. Liversedge, S. J. White, and D. Vergilino-Perez. 2003. Reading disappearing text: Cognitive control of eye movements. Psychological Science 14: 385–389. Rayner, K., and A. Pollatsek. 1989. The psychology of reading. New York: Erlbaum. Rayner, K., S. C. Sereno, R. K. Morris, A. R. Schmauder, and C. Clifton, Jr. 1989. Eye movements and on-line language comprehension processes. Language and Cognitive Processes 4: SI21–SI49. Rayner, K., T. Warren, B. J. Juhasz, and S. P. Liversedge. 2004. The effect of plausibility on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition 30: 1290–1301. Reichle, E. D., A. Pollatsek, D. L. Fisher, and K. Rayner. 1998. Toward a model of eye movement control in reading. Psychological Review 105: 125–157. Reichle, E. D., T. Warren, and K. McConnell. 2009. Using E-Z Reader to model the effects of higher-level language processing on eye movements during reading. Psychonomic Bulletin and Review 16: 1–21.

360

dave kush and brian dillon

Reinhart, T., and E. Reuland. 1993. Reflexivity. Linguistic Inquiry 24: 657–720. Runner, J. T., and K. D. Head. 2014. What can visual world eye-tracking tell us about the binding theory? Empirical Issues in Syntax and Semantics 10: 269–286. Runner, J. T., R. S. Sussman, and M. K. Tanenhaus. 2003. Assignment of reference to reflexives and pronouns in picture noun phrases: Evidence from eye movements. Cognition 89: B1–B13. Runner, J. T., R. S. Sussman, and M. K. Tanenhaus. 2006. Processing reflexives and pronouns in picture noun phrase. Cognitive Science 30: 193–241. Safir, K.The syntax of anaphora. New York: Oxford University Press. Scheepers, C., B. Hemforth, L. Konieczny, and R. P. G. van Gompel. n.d. Monotonicity in headfinal sentence processing: Top-down prediction of verb valency. MS. Schotter, E. R., and K. Rayner. 2015. The work of the eyes during reading. In A. Pollatsek and R. Treiman (eds), The Oxford handbook of reading, 44–59. New York: Oxford University Press. Sells, P. 1987. Aspects of logophoricity. Linguistic Inquiry 18(3): 445–479. Sloggett, S. 2017. When errors aren’t: How comprehenders selectively violate Binding Theory. PhD dissertation, University of Massachusetts, Amherst. Sprouse, J., and D. Almeida. 2013. The empirical status of data in syntax: A reply to Gibson and Fedorenko. Language and Cognitive Processes 28: 222–228. Sprouse, J., and E. F. Lau. 2013. Syntax and the brain. In M. den Dikken (ed), The Cambridge handbook of generative syntax, 971–1005. New York: Cambridge University Press. Staub, A. 2015. The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Language and Linguistics Compass 9(8): 311–327. Stowe, L. A. 1986. Parsing WH-constructions: Evidence for on-line gap location. Language and Cognitive Processes 1(3): 227–245. Sturt, P. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48: 542–562. 
Townsend, D. J., and T. G. Bever. 2001. Sentence comprehension: The integration of habits and rules. Cambridge, MA: MIT Press. Traxler, M. J., and M. J. Pickering. 1996. Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language 35(3): 454–475. Traxler, M. J., M. J. Pickering, and C. Clifton. 1998. Adjunct attachment is not a form of lexical ambiguity resolution. Journal of Memory and Language 39: 558–592. Trotzke, A., M. Bader, and L. Frazier. 2013. Third factors and the performance interface in language design. Biolinguistics 7: 1–34. Tutunjian, D., F. Heinat, E. Klingvall, and A. L. Wiklund. 2017. Processing relative clause extractions in Swedish. Frontiers in Psychology 8: 2118. van Gompel, R. P., M. J. Pickering, and M. J. Traxler. 2000. Unrestricted race: A new model of syntactic ambiguity resolution. In A. Kennedy, R. Radach, D. Heller, and J. Pynte (eds), Reading as a perceptual process, 621–648. New York: Elsevier. van Gompel, R. P., M. J. Pickering, and M. J. Traxler. 2001. Reanalysis in sentence processing: Evidence against current constraint-based and two-stage models. Journal of Memory and Language 45: 225–258. van Gompel, R. P., M. J. Pickering, J. Pearson, and S. P. Liversedge. 2005. Evidence against competition during syntactic ambiguity resolution. Journal of Memory and Language 52: 284–307.

eye-tracking and experimental syntax

361

Van Valin, R. D., and R. J. LaPolla. 1997. Syntax: Structure, meaning, and function. Cambridge: Cambridge University Press. Vasishth, S., S. Brüssow, R. L. Lewis, and H. Drenhaus. 2008. Processing polarity: How the ungrammatical intrudes on the grammatical. Cognitive Science 32: 685–712. Vasishth, S., T. von der Malsburg, and F. Engelmann. 2013. What eye movements can tell us about sentence comprehension. Cognitive Science 4: 125–134. von der Malsburg, T., and B. Angele. 2017. False positives and other statistical errors in standard analyses of eye movements in reading. Journal of Memory and Language 94: 119–133. von der Malsburg, T., and S. Vasishth. 2011. What is the scanpath signature of syntactic reanalysis? Journal of Memory and Language 65(2): 109–127. von der Malsburg, T., and S. Vasishth. 2013. Scanpaths reveal syntactic underspecification and reanalysis strategies. Language and Cognitive Processes 28: 1545–1578. Vosse, T., and G. Kempen. 2009. In defense of competition during syntactic ambiguity resolution. Journal of Psycholinguistic Research 38: 1. Wagers, M. W. 2013. Memory mechanisms for wh-dependency formation and their implications for islandhood. In Jon Sprouse and Norbert Hornstein (eds), Experimental syntax and island effects, 161–185. Cambridge: Cambridge University Press.

Chapter 11

Speed–accuracy trade-off modeling and its interface with experimental syntax

Stephani Foraker, Ian Cunnings, and Andrea E. Martin

In this chapter, we review key insights gained by using the speed–accuracy trade-off (SAT) technique to address psycholinguistic and linguistic issues. SAT evidence has been instrumental in integrating sophisticated memory models into psycholinguistic theory, and bears on several linguistic issues in experimental syntax. We explain how SAT can provide clear evidence about the time-course of processing that is unconfounded by accuracy or probability of interpretation over trials, and in so doing, can fruitfully inform debates about processing and representation.

Many advances in linguistic theory have been made via acceptability judgments (Sprouse, Schütze, and Almeida 2013). Yet linguistic judgments, like any other kind of judgment, are inherently susceptible to a speed–accuracy trade-off, a phenomenon in which the probability of making a particular judgment can change as information accumulates over time. For example, it is well attested in agreement attraction that ungrammatical sentences, such as The key to the cabinets were rusty, may be perceived as grammatical when a reader is under time pressure, even though native language users may be more likely to reject this type of sentence as ungrammatical when given enough time (e.g. Parker 2015; Phillips, Wagers, and Lau 2011; Wagers, Lau, and Phillips 2009). This indicates that perception of sentence acceptability, or indeed perception of sentence plausibility or the availability of a particular sentence interpretation, interacts with the time taken to make a particular response. The SAT technique provides an explicit way of assessing this inherent trade-off between speed and accuracy in linguistic judgments as they unfold over time.

One well-motivated reason why linguistic judgments are susceptible to a speed–accuracy trade-off is that making such judgments involves accessing information from memory, particularly when dependent elements are separated by intervening material. In cases of agreement attraction, such as The key to the cabinets were rusty, correctly rejecting the sentence as ungrammatical requires accessing the sentence subject (the key) from memory at the verb (were), and assessing the dependency between these elements as ungrammatical, due to the number mismatch. Attraction errors occur when the intervening, linearly closer constituent (the cabinets) is bound¹ with the verb based on the matching number features, instead of querying memory to retrieve the syntactic, singular subject. That is, dependency creation requires accessing the correct item in memory.

11.1 SAT unconfounds quality of information from time-course of processing

Much of our current understanding of language processing comes from studies using measures of processing time to test hypotheses about the nature of the representations and processes deployed in real-time language production and comprehension. Common timing measures include reaction times for judgments about an expression, reading time or eye-movement measures while reading an expression, or, as in the visual world paradigm, the time at which eye-movements are launched to visual objects in a display as a related spoken utterance is interpreted. In many, if not most, applications of timing measures, researchers compare an expression containing a property that is hypothesized to tax comprehension operations with a minimally contrastive control expression, with the prediction that the former will take longer to read than the latter. A positive finding provides support for the researcher’s hypothesis and evidence against any model that is not able to draw a principled distinction between the two types of expressions. Although untimed measures, such as percent accuracy or acceptability ratings, might reveal a comparable effect, timing measures are often preferred, as they are generally considered more sensitive than other behavioral measures, particularly for highly accurate response measures (Sternberg 1966). Note that in these types of applications, timing data are used simply as an ordinal measure, that is, as a means to verify that one type of expression is indeed more taxing to process than another.

¹ We use the terms “binding” and “bound,” as found in the memory literature, to refer to the mechanism by which information in memory is integrated together (e.g. Cohen and Eichenbaum 1993; Hagoort 2003; James 1918), rather than the more specific use of these terms in the linguistics literature to refer to specific cases of anaphoric coreference in the presence of c-command (e.g. Chomsky 1981).

Following seminal work by Sternberg (1969), which reintroduced and extended earlier work by Donders (1868/1969), timing measures have been used to address a host of questions concerning the nature and organization of mental architectures. Crucially, the key assumption in such applications is that differences in timing measures scale to differences in real-time mental operations, functioning as interval or ratio measures. When this assumption holds, it licenses the use of timing measures to investigate components that are essential precursors to developing fully articulated models of language processing, such as when certain processes are operative, how component operations within a complex skill are organized (e.g. in serial or parallel), or whether one source of information bypasses or suppresses the use of another.

To what degree can we be certain that differences in timing measures scale to differences in real-time mental operations? As noted previously, with very few exceptions, timing measures are sensitive to trade-offs between speed and accuracy (Wickelgren 1977). In language comprehension tasks, comprehenders have flexibility over the depth to which they process an expression in a given context, and (if required) when and how to execute an overt response. Consequently, differences in timing measures can reflect differences in subjective criteria rather than just intrinsic differences in the speed of processing. Language scientists often try to control for or rule out criterion shifts by assessing measures of processing accuracy, either direct or indirect, to supplement basic timing measures. However, these approaches are valid only if the accuracy level is measured at the same point in time that the timing measure is collected.

A second, more formidable concern is that common timing measures are not pure measures of underlying processing speed.
Rather, they are sensitive to factors that affect comprehension accuracy, particularly the quality and availability of information required to construct a meaningful interpretation. Furthermore, although language comprehension engages a set of highly overlearned, largely automatic cognitive processes, it is not without error. For example, two types of expressions may differ in reading time or acceptability judgment latency if (a) the quality of the resulting interpretations varies substantially (e.g. their acceptability, plausibility, specificity, etc.), or (b) errors in key operations are more likely in one than the other (e.g. failure to retrieve essential information, misanalyses of grammatical relations). Such factors that affect quality or availability of information lead to differences in interpretation accuracy, which can vary independently of the time taken to compute that interpretation. As a consequence, researchers cannot straightforwardly interpret a difference in most commonly used timing measures purely as an underlying difference in processing speed. The SAT method, by contrast, provides separable measures of speed and accuracy by modeling how comprehension accuracy develops over processing time.

To help illustrate this central problem with common timing measures, consider sentences (1)–(4).

(1) The doctor realized the boy yelled.
(2) This is the boy [who the doctor realized] yelled.
(3) This is the boy [who the doctor [who calmed the mother] realized] yelled.
(4) This is the boy [who the doctor [who ordered a blood test] realized] yelled.

Adding material between the dependent elements of boy and yelled typically introduces differences that affect the accuracy of the resulting interpretations due to differing availability or quality of information necessary to create and resolve the dependency. For example, the interpolated clause(s), shown in square brackets, may alter expectations about upcoming information. They may also increase the likelihood of misparsing the long-distance dependency in the expression, as the interpolated material may introduce alternative attachment sites (possessing some degree of local coherence) that the reader does not recover from on some proportion of trials (e.g. in (3), did the mother yell or the boy yell?). Even if expectations are held constant and the potential for misparsing is minimized, as we will discuss, many studies now leave little doubt that interpolated material can adversely affect sentence comprehension. All of these factors can affect the quality and accuracy of interpretation, without necessarily affecting the time-course of processing.

The predictions that are crucial for evaluating many fundamental hypotheses about language processing, particularly those that address basic architectural issues, often concern the respective speed of processing for different types of expressions. Time-course measures are critical for answering questions such as whether the interpretation of one type of expression increases the complexity of a particular operation, whether it recruits an altogether different type of operation, or whether it requires more operations than another (e.g. Bott, Bailey, and Grodner 2012; Bott, Rees, and Frisson 2016; McElree and Griffith 1995; 1998; McElree and Nordlie 1999; McElree, Pylkkänen, Pickering, and Traxler 2006).
Time-course measures also provide the primary means of investigating general architectural issues, such as whether there are contingencies in the organization of component operations, with some operations having temporal priority over others, or whether the operations are instead organized in an interactive fashion (Bornkessel, McElree, Schlesewsky, and Friederici 2004; Martin and McElree 2018; McElree 1993). Hence, the great value of SAT measures, in contrast to other timing measures, is that they unconfound the quality or availability of information from the time-course of processing that information.

11.2 The SAT methodology

In most situations when timing measures are collected, it is up to the participant to strike a reasonable balance between the competing demands of accuracy and speed. Typically, a speed–accuracy trade-off allows the probability of making a correct decision to increase as information accumulates over time. In the SAT procedure, this trade-off is for the most part controlled by having participants respond at set time points. As quickly as possible after a signal (usually an auditory tone for a reading task), the participant makes a response. Signal time-points are chosen to capture the full span of processing from before the beginning of a critical word or region until after the end, providing a window onto processing as it unfolds over time. The signals can be distributed either across trials with one deadline per trial (single-response SAT; for illustration, see McElree 2000), or they can all occur on each trial (multiple-response SAT; for illustration, see McElree 1993). A critical aspect of both SAT variants, and one that distinguishes the SAT from all other alternative measures of processing speed (see Wickelgren 1977), is that participants do not themselves decide when they want to respond. By constraining participants to respond at both early and late times, researchers control potential speed–accuracy trade-offs and can thereby obtain unbiased measures of accuracy and rate of processing at each time point.

In SAT investigations of language comprehension, researchers have required participants to discriminate acceptable from unacceptable expressions, with sets that include yoked acceptable and unacceptable conditions. For example, to measure the speed and accuracy of resolving the long-distance dependency between yelled and the boy in (2), the participant judges whether the sentence This is the boy who the doctor realized yelled is acceptable or not, and on another trial judges whether an unacceptable counterpart such as This is the boy who the doctor realized tore* is acceptable or not. The final verb, yelled vs. tore, is the locus of the critical binding operation in question here, and hence the probes signaling the participant to make an acceptability judgment need to begin just before that critical point of the sentence. Furthermore, it is of fundamental importance that the source of unacceptability clearly taps the particular issue being investigated, and is not confounded with other sources of unacceptability.
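The two ways of distributing response signals described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lag values, the function names, and the trial dictionaries are hypothetical and are not taken from the chapter or from any particular SAT study.

```python
import itertools
import random

# Hypothetical response-signal lags in seconds, relative to the onset of the
# critical final word (a negative value falls just before onset). These values
# are placeholders for illustration only.
LAGS = (-0.05, 0.1, 0.3, 0.5, 0.8, 1.5, 3.0)


def single_response_schedule(item_ids, lags=LAGS, seed=0):
    """Single-response SAT: each trial carries exactly one response signal."""
    trials = [
        {"item": item, "acceptable": acceptable, "lag_s": lag}
        for item, acceptable, lag in itertools.product(item_ids, (True, False), lags)
    ]
    random.Random(seed).shuffle(trials)  # fixed seed keeps the order reproducible
    return trials


def multiple_response_schedule(item_ids, lags=LAGS, seed=0):
    """Multiple-response SAT: every trial carries the full series of signals."""
    trials = [
        {"item": item, "acceptable": acceptable, "lags_s": tuple(lags)}
        for item, acceptable in itertools.product(item_ids, (True, False))
    ]
    random.Random(seed).shuffle(trials)
    return trials
```

In the single-response sketch, every yoked acceptable and unacceptable version of an item is sampled at every lag across trials, whereas the multiple-response sketch needs only one trial per version because all signals occur within it.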
If the locus of the unacceptability is not carefully chosen to target the dependency under investigation, interpretation of the data will be murky at best. In sentences (2)–(4), the unacceptable counterparts are unacceptable only when comprehenders attempt to bind the clefted NP to the incompatible verb (the boy tore). Requiring participants to discriminate acceptable from unacceptable expressions obliges them to assess the binding of the final verb to the clefted subject in order to perform above chance.

A strength of SAT methodology is that the judgment data are fit individually for each participant, and typically for each item, and then patterns across participants and items are evaluated for consistency. One key consideration is that calculating stable SAT functions requires many observations for each participant and item. Therefore, in most applications of SAT, each participant is presented with all versions of an item. For examples (1)–(4), representing four conditions, participants would encounter eight versions of an item: four acceptable and four unacceptable. The eight versions would be counterbalanced across sessions on different days and shown in a different order to each participant, but every participant would judge all eight versions of each item. This is in contrast to contemporary psycholinguistic experiment designs, where different conditions of an item are counterbalanced across participants. SAT results have been criticized on these grounds, since repeated exposure may introduce different routines or strategies that do not reflect typical processing. However, several investigations have revealed converging evidence from eye-tracking of reading (Foraker and McElree 2007; Martin and McElree 2008; 2011; Van Dyke and McElree 2011), as well as judgment latency (McElree 1993; McElree and Griffith 1995; 1998) and ERP measures (Bornkessel et al. 2004), reducing concerns about the generalizability and construct validity of SAT evidence. Additionally, including astute control conditions and/or additional experiments to rule out alternative explanations for the observed results remains a critical element of effective experimental design (see e.g. Martin and McElree 2008; McElree and Griffith 1998 for discussion).

To calculate SAT functions, responses are typically corrected for response biases by transforming percentage correct into d-prime (d′) for each time point. One reader may be more liberal overall, registering more acceptable judgments for the materials, while another may be more conservative, with fewer acceptable judgments; d′ provides a way to standardize across this variation. For binary yes/no judgments, an equal-variance Gaussian d′ is typically used, which is the z-transform of the hit rate minus the z-transform of the false alarm rate:

d′ = z(P(“yes”|acceptable)) − z(P(“yes”|unacceptable)).²

A hit is when an acceptable sentence is correctly interpreted as acceptable (boy is bound with yelled), while a false alarm is when an unacceptable sentence is incorrectly interpreted as acceptable (boy is bound with tore). Hypothetical data shown in Figure 11.1 illustrate accuracy (black circles) at different response signal time points for one condition. A curve representing the growth of response accuracy is also shown (solid line), fit with an exponential approach to a limit, plotting accuracy (d′) as a function of processing time (t):

d′ = λ(1 − e^{−β(t − δ)}) for t > δ, otherwise 0.

A d′ of zero equals chance performance, with scores approaching 4 reflecting nearly perfect performance. SAT functions show an initial period of chance performance followed by a monotonically increasing function, culminating in the final asymptotic level.
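The two formulas above can be made concrete with a short Python sketch that uses only the standard library. The function names and the example counts are hypothetical. The adjustment that adds 0.5 to each count and 1 to each total is an assumption introduced here to avoid infinite z-scores when a rate is exactly 0 or 1; the chapter does not say how such cases are handled.

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF (the z-transform)


def d_prime(hits, n_acceptable, false_alarms, n_unacceptable):
    """Equal-variance Gaussian d' = z(hit rate) - z(false-alarm rate)."""
    # Assumed correction (not from the chapter): keeps both rates strictly
    # between 0 and 1 so that the z-transform stays finite.
    hit_rate = (hits + 0.5) / (n_acceptable + 1)
    fa_rate = (false_alarms + 0.5) / (n_unacceptable + 1)
    return _z(hit_rate) - _z(fa_rate)


def sat_curve(t, asymptote, rate, intercept):
    """Exponential approach to a limit: lambda * (1 - exp(-beta * (t - delta)))."""
    if t <= intercept:
        return 0.0  # chance performance before the intercept (delta)
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))


# Hypothetical counts at three lags: (lag in s, hits, false alarms), 50 trials each.
observed = [(0.3, 27, 24), (1.0, 40, 12), (3.0, 46, 5)]
for lag, hits, false_alarms in observed:
    print(lag, round(d_prime(hits, 50, false_alarms, 50), 2))
```

Here `asymptote`, `rate`, and `intercept` stand for λ, β, and δ respectively, so `sat_curve` returns 0 up to the intercept and then rises toward the asymptote at a speed set by the rate.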
The three parameters of the exponential, λ, β, and δ, are used to estimate how conditions vary in the three phases of processing illustrated in Figure 11.1. The parameter λ represents the asymptote of the function, and it provides an estimate of the highest level of discrimination reached with maximal processing time. The parameters δ and β provide joint measures of the speed of processing (also referred to as the time course dynamics), indexing how quickly accuracy accrues to its asymptotic level. The parameter δ estimates the intercept of the function, which is when accuracy departs from chance level, and provides an estimate of the point in time at which comprehenders first show sensitivity to the information necessary to discriminate acceptable from unacceptable sentences. The parameter β estimates the slope, or rate, at which accuracy grows from chance to asymptote.

Determining how experimental conditions impact the shape of their corresponding SAT function requires a hierarchical model-testing scheme in which different combinations of λ, β, and δ are competitively applied to the SAT functions. First, in order to obtain a robust estimate of asymptote, averaging the final 2–3 d′ points into a single bin can ensure that a larger sample is the basis of the asymptote estimates, in turn

² Pseudo-d′ can also be computed to examine incorrect responses to the target experimental sentences; see McElree (1998), McElree and Dosher (1989).

speed–accuracy trade-off modeling and its interface

fig. 11.1 SAT function for one condition, illustrating the three phases of processing. [Plot: accuracy (d′ units, 0 to 4) against total processing time (interruption lag plus latency) in seconds, 0 to 4.5; phases labeled Chance, Information Accrual, and Terminal Accuracy; intercept, rate, and asymptote marked.]

allowing for more stable estimates of the dynamics parameters. To find the best-fitting set of parameters for different conditions, the numbers of asymptotes, intercepts, and rates are systematically varied from a null model (one asymptote, rate, and intercept, 1λ–1β–1δ, for all data points from all conditions), through hypothesis-driven combinations such as 4λ–1β–2δ, to a fully saturated model, such as 4λ–4β–4δ for four conditions.

Panel A of Figure 11.2 illustrates a case where two conditions are associated with the same intercept and rate but differ in asymptotic accuracy, with Condition 2 having a lower asymptote. In terms of the underlying distribution of finishing times for the judgment at hand, illustrated in Panel A of Figure 11.3, this asymptote pattern can arise if fewer processes in one trial and/or across trials successfully complete in Condition 2 than in Condition 1. In this illustration, those that do successfully complete do so with a comparable distribution of times, and the overlapping finishing times result in equivalent intercepts and proportional slopes (rate). Panels B in Figures 11.2 and 11.3 illustrate a case where two conditions are associated with the same asymptote and rate but different intercepts, with Condition 2 having a delayed intercept. Panels C in Figures 11.2 and 11.3 illustrate the case where two conditions are associated with the same asymptote and intercept but different rates, with Condition 2 having a disproportionately slower approach to asymptote. These two patterns reflect differences in the speed of processing only. Differences in SAT intercept correspond to differences in the minimum of the finishing time distributions, whereas differences in SAT rate correspond to differences in the variances of the distributions. For example, if one condition consistently requires additional computational operations applied in a serial or cascading fashion, then relative to a condition with fewer

fig. 11.2 Idealized differences in the three phases of the SAT functions for two conditions. [Panels A, B, and C: accuracy in d′ units against processing time (tone lag + response latency) in seconds, with chance level marked; curves for Conditions 1 and 2.]

fig. 11.3 Idealized differences in the finishing time distributions corresponding to the SAT differences shown in Fig. 11.2. [Panels A, B, and C: probability process completed against time process completed; distributions for Conditions 1 and 2.]

operations, the finishing time distribution will be shifted toward longer times, manifesting as different intercepts. On the other hand, if one condition entails a relaunched operation on some subset or proportion of trials, the distribution will be more positively skewed, leading to a disproportionately slower rate. To wit, reanalysis may be attempted on misparsed trials, or additional queries of memory to retrieve lower quality or less frequent information may be needed (see also McElree and Dosher 1989; 1993; Reed 1976). A lower asymptote for the condition with the slower rate is furthermore consistent with this kind of explanation.

The best-fitting model is chosen by a combination of criteria. In most published applications, models have been fit with a least-squares error criterion (Chandler 1969; Reed 1976), with the quality of the fit assessed by goodness-of-fit statistics, such as adjusted-R² (Judd and McClelland 1989). Model fits are performed for the averaged data for expository purposes, but it is essential to model each participant’s data separately (and by items), allowing evaluation of the consistency of parameter estimates across participants (and items), and inferential tests of significance. The most crucial aspects of model fitting are that (a) only differences that exist in observed d′ should be posited in the models, and (b) such patterns should be largely evident across individual participants and/or items. That is, statistical tests on d′ across participants and/or items should support a reliable difference between conditions before any difference is posited in the asymptote parameter (λ). Otherwise, the estimates of the SAT dynamics parameters (β, δ) will be biased to account for a difference that does not exist in the d′ data.

At the core of the hierarchical model testing procedure is allocating a minimum number of parameters to best account for the variance in how d′ accrues over time. As such, the first model fit is a 1λ–1β–1δ model which fits all conditions with the same three parameter values. Even if a difference in d′ by condition already exists, starting with 1λ–1β–1δ establishes the baseline adjusted-R².
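A least-squares fit of the null 1λ–1β–1δ model and its adjusted-R² can be sketched as follows. Several things here are assumptions made for illustration: the adjusted-R² is taken to be 1 − [SSE/(n − k)] / [SST/(n − 1)], with n data points and k free parameters; a coarse grid search stands in for whatever iterative minimization routine a real analysis would use; and the data are generated from hypothetical parameter values rather than taken from any experiment.

```python
import math
from itertools import product


def sat_curve(t, asymptote, rate, intercept):
    """Exponential approach to a limit; 0 at or before the intercept."""
    if t <= intercept:
        return 0.0
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))


def sse(params, times, observed):
    """Sum of squared errors between observed d' and the fitted curve."""
    return sum((d - sat_curve(t, *params)) ** 2 for t, d in zip(times, observed))


def adjusted_r2(observed, predicted, n_params):
    """Assumed form: 1 - [SSE / (n - k)] / [SST / (n - 1)], k free parameters."""
    n = len(observed)
    mean = sum(observed) / n
    residual = sum((d - p) ** 2 for d, p in zip(observed, predicted))
    total = sum((d - mean) ** 2 for d in observed)
    return 1.0 - (residual / (n - n_params)) / (total / (n - 1))


def fit_null_model(times, observed):
    """Coarse grid search for least-squares (asymptote, rate, intercept)."""
    asymptotes = [a / 10 for a in range(5, 41)]   # 0.5 to 4.0 d' units
    rates = [r / 10 for r in range(2, 61)]        # 0.2 to 6.0 per second
    intercepts = [i / 20 for i in range(0, 21)]   # 0.00 to 1.00 seconds
    return min(product(asymptotes, rates, intercepts),
               key=lambda p: sse(p, times, observed))


# Hypothetical, noise-free d' values at six response-signal lags.
times = [0.2, 0.5, 1.0, 1.5, 2.0, 3.0]
observed = [sat_curve(t, 3.0, 2.0, 0.4) for t in times]

best = fit_null_model(times, observed)
predicted = [sat_curve(t, *best) for t in times]
print(best, adjusted_r2(observed, predicted, n_params=3))
```

Fitting a model with more parameters, such as 2λ–1β–1δ, would follow the same pattern, with each condition's data predicted from its own asymptote and the shared rate and intercept, and k raised accordingly.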
Next, if there is an observable difference in d′ between two conditions, the next models to test are 2λ–1β–1δ, 2λ–2β–1δ, 2λ–1β–2δ, and 2λ–2β–2δ. Those four are evaluated for the best fitting model, assessing which SAT parameter estimates differ systematically and reliably across participants (and items) using inferential statistics. The best-fitting model will typically have the highest adjusted-R² across participants and usually for the average data, but we note that, in our experience, adjusted-R² alone is not diagnostic of the best-fitting model: some models yield adjusted-R² values that differ by such small amounts that it is difficult to assign meaning to the difference. It is therefore more crucial that any differences in parameter values, whether in asymptote, rate, or intercept, are reliable across participants (and items). These two points require that empirical d′ and parameter estimates are reported for individual participants. If evidence for a difference in SAT dynamics parameters is found, one way to guard against parameter trade-off (the phenomenon where variance in one parameter is allocated to another parameter) is to perform fixed parameter fits, where, for example, the asymptote parameter (λ) is fixed to the value of d′ (based on averaging over the last 2–3 time points, as noted above) and not allowed to vary. This will force the model to assign any remaining applicable variance to the other parameters, rather than erroneously account for that variance by modulating the asymptote as well. In the case of a veridical difference in processing speed, reliable differences between conditions across


participants should appear even when the asymptotes are not allowed to vary from the d′ values. A similar approach can be taken when trying to evaluate the relationship between rate (β) and intercept (δ). Intercept can be inferred from the first time lag where d′ departs from chance; either fixing intercept to this value, or fitting only a single dynamics parameter at a time, can be employed to check for parameter trade-off between rate (β) and intercept (δ). Finally, the best-fit model should be interpreted in light of the competing explanations, accounts, or theories being tested. Significant differences in asymptote along with null results for intercept or rate might support one explanation, while differences in both asymptote and rate are more consistent with another, and so on. What is of interest is in which parameter(s) the differences emerge and for which conditions, and having principled predictions for the presence or absence of such differences. Additionally, replication across SAT experiments and converging evidence from other measures and methods can constrain and solidify inferences.
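The fixed-parameter check described above can be sketched as follows: the asymptote is set to the mean of the final d′ points and only the rate and intercept are fit. The grid ranges, the function names, and the two hypothetical conditions (generated with the same asymptote and rate but different intercepts) are assumptions for illustration only, and the grid search again stands in for a proper minimization routine.

```python
import math
from itertools import product


def sat_curve(t, asymptote, rate, intercept):
    """Exponential approach to a limit; 0 at or before the intercept."""
    if t <= intercept:
        return 0.0
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))


def terminal_asymptote(observed, n_last=3):
    """Mean of the final n_last d' points, used as a fixed asymptote."""
    tail = observed[-n_last:]
    return sum(tail) / len(tail)


def fit_dynamics_fixed_asymptote(times, observed, n_last=3):
    """Fit rate and intercept only, holding the asymptote at the empirical value."""
    asymptote = terminal_asymptote(observed, n_last)
    rates = [r / 10 for r in range(2, 61)]        # 0.2 to 6.0 per second
    intercepts = [i / 20 for i in range(0, 21)]   # 0.00 to 1.00 seconds

    def sse(rate, intercept):
        return sum((d - sat_curve(t, asymptote, rate, intercept)) ** 2
                   for t, d in zip(times, observed))

    rate, intercept = min(product(rates, intercepts), key=lambda p: sse(*p))
    return asymptote, rate, intercept


# Two hypothetical conditions differing only in intercept (0.3 s vs 0.5 s).
times = [0.1, 0.3, 0.5, 0.8, 1.2, 2.0, 3.0, 4.5]
cond1 = [sat_curve(t, 3.0, 2.0, 0.3) for t in times]
cond2 = [sat_curve(t, 3.0, 2.0, 0.5) for t in times]

fit1 = fit_dynamics_fixed_asymptote(times, cond1)
fit2 = fit_dynamics_fixed_asymptote(times, cond2)
print(fit1)
print(fit2)
```

If the intercept difference is real, it should survive in the fixed-asymptote fits, as it does for these constructed data. Fixing the intercept to the first lag at which d′ departs from chance and fitting the rate alone would follow the same pattern.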

11.3 Memory operations are fundamental to language processing


Memory-based operations, such as encoding, storage, and retrieval, have long been acknowledged as important factors in language production and comprehension, and in constraining linguistic theory. From the early days of psycholinguistics, limits on center-embedding or other aspects of language complexity illustrated that memory constraints can interact with linguistic content and sentence acceptability (e.g. Miller and Chomsky 1963). Attempts to explain why memory limitations may determine the upper bound on our ability to interpret complex sentence structures have typically appealed to working-memory capacity limits, usually focusing on memory storage capacity. Perhaps the best-known theory of this type is Daneman and Carpenter’s (1980) capacity-based theory of sentence comprehension, which explains difficulty in sentence processing in terms of the amount of information that an individual must hold in memory at one time. Consider again examples (2–4) above. Here, successful comprehension requires encoding a representation of the boy when it is first encountered, storing it in working memory whilst the following constituents are processed (and themselves encoded in working memory), and then retrieving it from memory at the verb yelled. Similarly, locality effects have been explained in terms of working-memory load, where progressively greater processing difficulty occurs with additional unresolved dependencies, due to maintaining more items in working memory, as well as increasing the distance between elements being integrated (Gibson 2000; Grodner and Gibson 2005; Warren and Gibson 2002). However, a substantial body of research shows that memory-based restrictions on language are best described not in terms of the amount of information that needs to be


held in limited-capacity working memory at one time, but rather in terms of the retrievability of information in memory, based on its content and quality (see meta-analysis: Jäger, Engelmann, and Vasishth 2017; reviews: Foraker and McElree 2011; Van Dyke and Johns 2012; Parker, Shvartsman, and Van Dyke 2017; also: Gordon, Hendrik, and Johnson 2001; 2004; Gordon, Hendrik, and Levine 2002; Martin 2016; Martin, Nieuwland, and Carreiras 2012; 2014; Van Dyke 2007; Van Dyke and Lewis 2003; Van Dyke and McElree 2006; 2011; Van Dyke, Johns, and Kukona 2014; Vasishth, Brüssow, Lewis, and Drenhaus 2008). Importantly, this approach suggests a fundamentally different way of assessing how memory may influence sentence complexity, one that shifts away from fixed capacity limits to explanations that emphasize the ability to discriminate between which items need to be retrieved from memory during sentence processing. As such, to fully understand the interaction between memory constraints and language, one requires not only well-defined linguistic theory, but also a well-motivated theory of memory operations and architecture.

11.3.1 Candidate memory operations

A variety of cognitive mechanisms and architectures could theoretically play a role in language comprehension (see also Foraker and McElree 2011). Evidence from speed–accuracy trade-off modeling has supported a theory of memory access during sentence processing that involves direct-access retrieval. In this model, memory retrieval involves matching a set of retrieval cues against items in memory. The cues available at retrieval enable access to content-addressable memory representations in one step (hence, direct). Content-addressability means that cues at the retrieval site make contact with memory representations that have overlapping content (McElree and Dosher 1989; 1993; McElree 1996; 1998; 2006; Öztekin and McElree 2007; 2010), and direct-access means retrieval can proceed without recourse to search through extraneous or unrelated memories for the to-be-retrieved item (e.g. Clark and Gronlund 1996; Kohonen 1984). That is, the cues available at the point of retrieval resonate with items in memory according to the amount of (partially) matching content, and the item retrieved is the one with the most overlap or best fit (e.g. Ratcliff 1978). Perhaps the most notable advantage of this type of memory mechanism is that it enables the rapid recovery of past representations, without introducing the distance-dependent processing-time cost found in search operations needed to recover relational information between items (e.g. McElree and Dosher 1993). However, equally notable is the disadvantage that cue-driven direct-access operations are highly susceptible to interference from other constituents in memory that match the cues used for retrieval. Basic memory research indicates that similarity in memory creates retrieval interference through cue-overload, where retrieval cues cannot reliably elicit any single target because they are associated with other items in memory (e.g. Öztekin and McElree 2007; 2010; Nairne 2002a; 2002b; Watkins and Watkins 1975).

Alternative candidate memory architectures include serial search, parallel search, and active maintenance. The key prediction of a search operation is that processing time is


a function of the number of items in the memory set that must be searched prior to a response. Serial search retrieval is a one-by-one, relatively slow search that is necessary for recovering order information, such as the recency of elements in time or across space, and produces intercept differences (Gronlund, Edwards, and Ohrt 1997; McElree 2001; 2006; McElree and Dosher 1993). In parallel search retrieval, possible target items are accessed in memory at the same point in time, but produce functions that differ based on the rate of information accrual for the parallel comparisons (Murdock 1971; Townsend and Ashby 1983). Another fundamental cognitive operation involved in comprehension is active maintenance. Modern conceptions of the memory system include controlled attention, where one’s focus of attention is an extremely limited-capacity state into and out of which information is shunted very quickly. Several lines of evidence derived from a variety of cognitive and perceptual tasks indicate that a very limited amount of information can be maintained in focal attention (3–4 units: Cowan 2001; 2005; 1 unit: McElree 1998; 2001; 2006). McElree’s (2006) conception states that focal attention is just one processing chunk which is quickly replaced by the next chunk of information in mental processing (McElree 1998; Öztekin, Davachi, and McElree 2010). What constitutes a memory chunk in sentence comprehension is currently underspecified, as it could denote a word, phrase, clause, or potentially (though unlikely) larger stretches of text. In their computational implementation of cue-based retrieval, Lewis and Vasishth (2005) assumed maximal projections constituted a single chunk in memory.

11.3.2 The nature of content-addressable cues

Research within the cue-based framework of language comprehension (Jäger et al. 2017; Lewis et al. 2006; Martin 2016; McElree 2000; McElree et al. 2003; Nicenboim and Vasishth 2018) suggests that representations formed during sentence processing are content-addressable. That is, memory retrieval during language comprehension involves matching a set of retrieval cues against items in memory. The item that provides the best match is then retrieved. In sentence processing, retrieval cues can be generated by (at least) phonological, morphosyntactic, lexical, syntactic, semantic, pragmatic, or discourse information. As noted above, however, accessing memory in this way leads to the possibility of similarity-based retrieval interference, when multiple items partially match the cues available at the retrieval site. In such cases, discrimination between an intended retrieval target and competitors becomes more difficult. To illustrate this principle, consider sentences (1)–(4), once again. In (1), the subject and verb are adjacent to one another, providing optimal conditions for incrementally building a representation of the complement clause for the matrix verb realize: having just processed the subject NP the boy, the comprehender can immediately match it with the final verb, yelled. In contrast, (2) is more challenging to process since the doctor realized is now a relative clause intervening between the boy and yelled. Processing the interpolated material will displace


the subject NP from active processing, necessitating a retrieval operation to restore the boy to active processing when yelled is encountered (McElree et al. 2003). Here, the boy and the doctor share syntactic and semantic features, making retrieval more difficult (syntactic subject of its clause, agent role, and animacy), as well as other aspects of representational similarity (number and gender, stereotypically male for doctor), which may impinge on forming the correct dependency and final interpretation. In (3) and (4), an additional relative clause is inserted (who calmed the mother or who ordered a blood test), but notice that in (3), overlap of the animacy feature between the mother and the boy entails another source of interference, while the inanimate blood test in (4) does not. In these ways, similarity-based interference at retrieval is a necessary by-product of the way linguistic memory is hypothesized to be accessed. To reiterate, memory operations affect the availability and quality of information needed for language comprehension. Whether interpreting spoken, written, or signed language, comprehenders must reconstruct linguistic relationships among the sequentially presented elements that encode meaning.
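The matching logic behind similarity-based interference can be illustrated with a toy sketch. It is not the computational model of any cited work: the feature sets assigned to each constituent and the cues assumed at the verb are hypothetical simplifications of the description above, and overlap is reduced to a simple count of matching cues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    label: str
    features: frozenset


def match_scores(cues, items):
    """Count how many retrieval cues each item matches (partial matches count)."""
    return {item.label: len(cues & item.features) for item in items}


def retrieve(cues, items):
    """Return the best-matching label(s); more than one signals interference."""
    scores = match_scores(cues, items)
    best = max(scores.values())
    return [label for label, score in scores.items() if score == best]


# Hypothetical feature sets, loosely based on the description in the text.
boy = MemoryItem("the boy", frozenset({"subject", "animate"}))
doctor = MemoryItem("the doctor", frozenset({"subject", "animate"}))
mother = MemoryItem("the mother", frozenset({"object", "animate"}))
blood_test = MemoryItem("a blood test", frozenset({"object", "inanimate"}))

# Hypothetical cues generated at the verb "yelled".
cues = frozenset({"subject", "animate"})

print(match_scores(cues, [boy, doctor, mother]))      # animate competitor
print(match_scores(cues, [boy, doctor, blood_test]))  # inanimate competitor
print(retrieve(cues, [boy, doctor]))
```

In this sketch the doctor matches the cues as well as the boy does, which is the cue-overload situation, while the mother matches one cue and the blood test none. A fuller account would need cues that distinguish the target from a competing subject, which this toy version deliberately leaves out.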

11.3.3 SAT predictions for memory operations

Recall that asymptote differences in SAT fits reflect the likelihood that an acceptable interpretation is computed or the degree of acceptability of the interpretation. Many factors can contribute to acceptability. A higher asymptote could be due to the higher likelihood of successfully retrieving a representation of a constituent that is sufficient to resolve a non-adjacent dependency (e.g. a subject for a verb, a filler for a gap, an antecedent for an ellipsis or pronoun), or due to the interpretation of the stimulus in a given condition being more plausible or natural than in another condition. Lower asymptotes can be interpreted as a reduction in the quality of retrieved information or as a failure to retrieve the required constituent on a proportion of trials. This covers failed retrieval attempts, including cases where retrieval of an inappropriate item leads to an anomalous interpretation. On some trials, an incorrect first retrieval attempt that results in an inappropriate representation or a problematic one can be followed up with another retrieval attempt that produces an acceptable interpretation; in this way, averaging over several trials contributes to an overall lower asymptote for that condition. One must keep in mind that inferring causes of asymptote differences is constrained asymmetrically: If the likelihood of successful retrieval is low, then empirical d′ and estimates of asymptote accuracy will also be low, but if d′ and asymptote are low, it does not mean that the decrease comes from the retrieval process alone. Interpretation and retrieval cannot be orthogonally dissociated through SAT modeling; only through the careful design of stimuli that differ only in variables hypothesized to affect retrieval but not subsequent interpretation can inferences purely about retrieval be drawn from differences in d′ or asymptote.
However, inferences about retrieval can be made should there be no difference in the speed of processing between conditions, because that is indicative of the content-addressable direct-access retrieval mechanism. Hence, rates and intercepts should not differ despite variation in asymptotic accuracy if the direct-access mechanism is at work. For serial search, the one-by-one, iterative process produces a linear function, and in SAT, the crucial prediction is that intercepts should increase in time as a function of the number of items that must be searched prior to finding a match (e.g. McElree and Dosher 1989; 1993; Neath 1993; Neath and Knoedler 1994; Öztekin, McElree, Staresina, and Davachi 2008; Sternberg 1975). Parallel search, on the other hand, predicts that only the SAT rates would reflect a speed of processing difference. Parallel search is distinguishable from direct-access retrieval because it predicts no difference in finishing times for positive decisions (d′ is based on hits and false alarms, which are both “yes/acceptable” decisions) as a function of set size, resulting in a linear relationship between reaction time and set size (Murdock 1971). For language processing, then, SAT rates should decrease systematically as the number of potential binding comparisons at stake increases.

Inasmuch as the hierarchical structure of a sentence is often encoded by the order of constituents within a string, predominantly so in languages such as English, one could argue that a serial search like that used to retrieve recency information might be required to access the elements involved in non-adjacent dependencies. Several SAT experiments have placed interpolated material between the to-be-retrieved constituent and the site of the dependency to test for a backward search mechanism (Martin and McElree 2008; 2009; 2011; McElree 2000; McElree et al. 2003; Van Dyke and McElree 2011).
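The speed-of-processing predictions just outlined can be summarized as a small lookup. This is a deliberately coarse sketch: the table encodes only the direction of the predictions stated above, the names are hypothetical, and treating the rate as unconstrained under serial search is an assumption of the sketch rather than a claim from the literature.

```python
# Predicted speed differences across conditions for each candidate mechanism.
# True = parameter should differ, False = should not differ,
# None = not constrained in this sketch (an assumption, not a stated prediction).
PREDICTIONS = {
    "direct access": {"rate": False, "intercept": False},
    "serial search": {"rate": None, "intercept": True},
    "parallel search": {"rate": True, "intercept": False},
}


def compatible_mechanisms(rate_differs: bool, intercept_differs: bool) -> list:
    """List the mechanisms whose predictions match the observed speed pattern."""
    observed = {"rate": rate_differs, "intercept": intercept_differs}
    return [
        name
        for name, predicted in PREDICTIONS.items()
        if all(expected is None or expected == observed[param]
               for param, expected in predicted.items())
    ]


# Asymptotes may differ in every case; only the dynamics parameters are checked.
print(compatible_mechanisms(rate_differs=False, intercept_differs=False))
print(compatible_mechanisms(rate_differs=True, intercept_differs=False))
print(compatible_mechanisms(rate_differs=False, intercept_differs=True))
```

Such a table is only a bookkeeping aid. Whether a rate or intercept difference is judged to be present still depends on the model comparison and on its reliability across participants and items.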
A few experiments have also added additional material before the to-be-retrieved constituent to test for a forward search, in which the search starts at the beginning of an expression (Martin and McElree 2009; 2011; Van Dyke and McElree 2006; 2011). McElree (1998; 2001; 2006), following Wickelgren, Corbett, and Dosher (1980), argued that measures of the speed of accessing information provide the most direct and unequivocal evidence for whether an item is represented in an active vs. passive state in memory. Measures of processing speed in several cognitive tasks have shown a sharply dichotomous pattern for information in focal attention vs. memory, with processing speed being exceptionally fast for responses based on information actively maintained in awareness (Dosher 1981; McElree 1996; 1998; 2001; 2006; McElree and Dosher 1989; 1993; McElree et al. 2003; Öztekin and McElree 2007; Wickelgren et al. 1980). In sentence processing, one would expect to see, for example, notably faster processing on the final verb in (1) above as compared to (2)–(4) if the subject NP were actively maintained when encountering the verb. A comparatively slower speed of accessing the subject NP at the final verb for (2)–(4), and that speed being the same for all three conditions, would indicate a direct-access retrieval operation. Decreasing speed as dependency length increases across (2)–(4) would, on the other hand, be indicative of a serial search.


11.4 SAT evidence for interactions between syntax and memory operations


McElree (2000) examined structures where increasing surface distance between dependent elements would predict increasing serial search time. In (5), the direct object noun phrase (the book) of a final verb (admired/*amused) was fronted to the beginning of the sentence in a cleft construction. The acceptability of the direct object as a theme of this final verb was manipulated, to be either acceptable (the book that the editor admired) or unacceptable (the book that the editor *amused). The distance between the NP and verb was increased by adding one (6) or two (7) subject-relative clauses. The retrieval site at which the dependency needs to be resolved is the final verb. Participants in the experiment judged sentences using a single-response SAT procedure in which they made an acceptability judgment response at one of six different response times, from 50 ms to 3000 ms following presentation of the sentence-final verb.

(5) This was the book that the editor admired (*amused).
(6) This was the book that the editor who the receptionist married admired (*amused).
(7) This was the book that the editor who the receptionist who quit married admired (*amused).

The best-fit SAT function indicated that the asymptotes decreased progressively with more interpolated material, indicating a progressively lower probability of computing a correct interpretation of a sentence, consistent with a decreasing likelihood of retrieving the correct argument from memory. However, the speed of comprehension (rate and intercept) was unaffected by the amount of material intervening between the dependent elements, arguing against backward serial search. In a next step, McElree et al. (2003) tested whether hierarchical distance rather than surface distance determines search time.
Embedded complement clauses differ from center-embedded subject relative clauses in that they increase not only the surface distance between the verb and its argument but also the distance along the right edge of a hierarchical structure (see McElree et al. 2003). In (8), the object NP (the scandal) is clefted out of its canonical position adjacent to the verb, while in (9) and (10), additional complement clauses are embedded between the clefted NP and the final verb.

(8) It was the scandal that the celebrity relished (*panicked).
(9) It was the scandal that the model believed that the celebrity relished (*panicked).
(10) It was the scandal that the model believed that the journalist reported that the celebrity relished (*panicked).

Once again, accuracy declined progressively as the distance between the dependent elements increased, while the speed of processing remained constant.


In a second experiment, subject–verb dependencies were examined, contrasting cases where the elements were adjacent to one another (The book __ ripped/*laughed) with cases of intervening material of varying syntactic and semantic overlap: an object relative clause (that the editor admired), a prepositional phrase plus object relative clause (from the prestigious press that the editor admired), an object relative plus subject relative clause (that the editor who quit the journal admired), or two object-relative clauses (that the editor who the receptionist married admired). When the verb was adjacent to its subject, processing speed was exceptionally fast, consistent with basic memory studies (McElree 2006) indicating that the last item processed was still active in focal attention. The increasing amount and complexity of interpolated material decreased accuracy systematically, consistent with a cue-combination, direct-access operation. Again, the speed of processing did not systematically slow with the amount of interpolated material, counter to a serial search operation.

Another way to test for serial search performed over syntactic structure in an iterative manner is through sluicing structures. Martin and McElree (2011) manipulated the number of syntactically available antecedents (one in 11 and 12, two in 13 and 14), as well as the distance between the antecedent, studied, and the sluice site, what (recent in 11 and 13, distant in 12 and 14).

(11) In the morning, Michael studied but he didn’t tell me what.
(12) Michael studied in the morning, but he didn’t tell me what.
(13) Michael slept and studied, but he didn’t tell me what.
(14) Michael studied and slept, but he didn’t tell me what.

In (13) and (14), both the correct antecedent, studied, and the incorrect verb, slept, are syntactically licensed. If search is syntactically constrained, the presence of slept should slow the speed of interpretation compared to (11) and (12).
As well, if syntactically guided search occurs in a forward fashion, then (13) should be slower than (14), and vice versa if it is a backward search. The results, however, revealed no difference in the speed of processing, contra syntactically guided search, either forward or backward. Instead, the asymptotic differences supported straightforward direct-access retrieval. Note that these results do not support the conclusion that syntactic structure is not important during retrieval and interpretation of long-distance dependencies. Rather, that the pattern of results is consistent with the engagement of a direct-access retrieval mechanism simply means that it is unlikely that syntactic structures are being serially scanned in order to access antecedent or extracted or dislocated constituents. Similarly, Martin and McElree (2008) found that the number of words and phrases between an antecedent and its ellipsis decreased the likelihood of successful retrieval and interpretation, but did not affect time-course to retrieve and interpret the antecedent at the ellipsis site. Secondly, increasing the length and complexity of the antecedent had a similar effect such that only asymptotic accuracy was affected. Again, this pattern of results suggests that syntactic relations between antecedent and ellipsis, while clearly important, do not need to be serially scanned or iteratively evaluated during long-distance


dependency resolution. These results by no means suggest that only semantic features are at play during retrieval, nor that syntax is not used during retrieval (see discussion of Van Dyke and McElree 2011, below, for evidence of syntactic cues). Martin and McElree (2008) account for these results by positing a pointer mechanism that can point to extant structures in memory without iteratively evaluating them or recomputing them.

Syntactic complexity can alternatively be increased by the number of constituents being bound and interpreted at a retrieval point. Direct-access cues may not be sufficient when interpretation explicitly depends on the relative ordering of constituents. Memory research indicates that a (serial) search is required when relational information is at issue (McElree 2006). McElree et al. (2003) examined the dependency between a direct object noun (the album) and a verb particle (spread open), with short (15) and long (16) distances between the constituents. Cases such as (17) and (18) examined variants in which the processing of a verb particle with two arguments (mount in) required resolving two non-adjacent dependencies (the album and the stamps) to construct the ditransitive verb phrase (…mount the stamps in the album). The unacceptable versions reversed the order of the arguments, resulting in an anomalous interpretation (e.g. …mount the album in the stamps).

(15) This is the album that the customer found difficult to spread open.
(16) This is the album that the customer who obviously angered the fussy collector found difficult to spread open.
(17) This is the album that the stamps were difficult to mount in.
(18) This is the album that the stamps which obviously angered the fussy collector were difficult to mount in.

Distance served to lower asymptotic accuracy only, for both single- and double-argument sentences.
However, single-argument sentences were processed faster than double-argument sentences (earlier intercept and faster rate), demonstrating that resolving two arguments at the one retrieval site required additional time. One explanation of this effect is that relational order information is needed to resolve a dependency when more than one constituent is being bound and interpreted at the retrieval site.

Syntactic role information can also act as a constraining cue at the retrieval site, and appears to have priority over semantic and pragmatic properties. Van Dyke and McElree (2011) compared the interpretation of sentences with differing syntactic contexts, in addition to semantic cues (in/animacy restrictions of the verb). In one experiment, the interfering material matched the syntactic cues at the verb, appearing as a syntactic subject (motion or witness in 19), while in a second experiment, the interfering material did not match, appearing in syntactic object position (20).

(19) The attorney who the judge realized had declared that the motion/witness was inappropriate compromised.
(20) The attorney who the judge realized had rejected the motion/witness in the case compromised.


stephani foraker, ian cunnings, and andrea e. martin

They found retroactive interference effects on asymptotes from a semantic competitor, consistent with much other research (McElree 2000; McElree et al. 2003; Martin and McElree 2008; 2009), but only when the interpolated competitor and to-be-retrieved target were both syntactic subjects (19). This provides evidence that the syntactic role of a constituent affects retrieval in comprehension, and that syntactic constraints appear to be weighted more heavily than semantic constraints. Additionally, syntactic constraints may limit potential sources of interference from memory constituents that have semantic properties in common with the target constituent. Hence, constraints from syntax can help counteract similarity-based interference, which is a critical weakness of a content-addressable memory system. There is also evidence that morphosyntactic information creates retrieval interference, and thus, by inference, is implicated as a retrieval cue during dependency resolution. For example, using electrophysiology, Martin et al. (2012; 2014) found that grammatical gender agreement between noun phrase ellipsis and its antecedent in Spanish is subject to interference when a noun bearing gender morphology occurs within the ellipsis dependency. Although not SAT evidence, these results highlight the importance of morphosyntax as a retrieval cue during long-distance dependency resolution. More broadly, we would like to note that, to our knowledge, no theory of cue-based retrieval minimizes or discounts the role of syntactic structure in the retrieval and interpretation of non-adjacent dependencies. The fact that interference effects have been found with origins from information inside a relative clause merely implies that the language-processing architecture can access information in a syntactic configuration that may not be licensed in other situations. 
This makes sense if forming long-distance dependencies is not an identical process to computing or generating syntactic structure locally.
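The idea that syntactic cues are weighted more heavily than semantic cues during cue-based retrieval can be illustrated with a small sketch. It is a toy under stated assumptions, not the model used in any of the studies cited: the cue weights, the feature names, and the exponential competition rule are all hypothetical choices made for illustration.

```python
import math

# Hypothetical cue weights: the syntactic-role cue counts more than the
# semantic (animacy) cue. The values are illustrative, not estimated from data.
CUE_WEIGHTS = {"role": 2.0, "animate": 1.0}

# Cues assumed to be available at the verb "compromised" in (19) and (20):
# it needs an animate syntactic subject.
RETRIEVAL_CUES = {"role": "subject", "animate": True}


def match_strength(item, cues=RETRIEVAL_CUES, weights=CUE_WEIGHTS):
    """Sum the weights of all retrieval cues that the item's features match."""
    return sum(weights[cue] for cue, value in cues.items() if item.get(cue) == value)


def target_probability(target, distractor):
    """Probability of retrieving the target when both items are contacted
    in parallel and compete in proportion to exp(match strength)."""
    t = math.exp(match_strength(target))
    d = math.exp(match_strength(distractor))
    return t / (t + d)


attorney = {"role": "subject", "animate": True}  # the to-be-retrieved target

conditions = {
    "subject, animate (witness in 19)": {"role": "subject", "animate": True},
    "subject, inanimate (motion in 19)": {"role": "subject", "animate": False},
    "object, animate (witness in 20)": {"role": "object", "animate": True},
    "object, inanimate (motion in 20)": {"role": "object", "animate": False},
}

for label, distractor in conditions.items():
    print(f"{label}: P(target) = {target_probability(attorney, distractor):.3f}")
```

In this sketch an animate distractor lowers the probability of retrieving the target much more when it is also a subject than when it is an object. The semantic interference in object position is reduced rather than eliminated, so reproducing an all-or-none pattern would require a stronger weighting or a gating assumption that this toy does not include.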

11.5 Relations to other aspects of theoretical linguistics


The SAT paradigm provides insight into other kinds of questions in the theoretical linguistics literature, including types of processing and issues of representation. Against the background of distinguishing between the quality or probability of accurate interpretation on the one hand, and the time-course of processing on the other, SAT experiments can address a range of linguistic concepts. Informing a broad set of sentence processing models, McElree and Griffith (1995) found that across experiments and types of model fits, thematic role violations (Some senators offend elections) produced a later intercept or slower rate than violations of either syntactic category (Some senators repeatedly elections) or subcategorization (Some senators roar elections). These data support models of sentence comprehension where both constituent structure and subcategorization components of syntactic representations are accessible before thematic representations, thus constraining serial, cascade,


and parallel models. In this experiment, asymptotic differences did not emerge, indicating that the three kinds of constructions were approximately equivalent in the quality and availability of information required to detect each kind of violation. Another investigation by McElree and Griffith (1998) focused on the time course of filler–gap processing. They contrasted constructions with subcategorization and thematic role violations, such as It was the evidence that the judge assumed the attorney had loathed/*gone/*astonished, with island violations, such as It was the evidence that the judge rebuked the attorney who loathed or It was the attorney who the judge researched the evidence which astonished. Again, subcategorization violations produced an earlier intercept than thematic role violations. Crucially, island violations consistently showed earlier intercepts than the other sources of information, providing clear evidence that global syntactic configuration information guides a parse very early in processing. Models in which the parser is blocked from predicting gap sites within an island are supported (Stowe 1986), while strong first-resort models in which island constraints are treated as a filter applied after a gap is projected, such as the active-filler strategy (Clifton and Frazier 1989), are not. McElree (1993) examined how the relative frequencies of a verb’s syntactic frames impact parsing. Overall results indicated that when the syntactic preference of the verb matched the sentence structure (e.g. watched in a transitive frame), a higher asymptote emerged, compared to a mismatch (e.g. rushed in a transitive frame), but the match did not affect time-course dynamics. Hence, preferred verb-frame frequencies exerted an influence due to stronger representations in the mental lexicon, and were applied at similar speeds over the incremental parse.
As well, the asymptote differences provided evidence against frame-frequency information being applied serially, where a more frequent frame could temporarily suppress a less frequent structure. Two nuanced time-course differences in McElree (1993) are also of interest. First, a slower rate arose in NP–gap strings with an intransitive-preferring verb in a transitive construction. Second, a slower rate occurred for a syntactic garden-path construction. In both of these cases, slower rates are consistent with reanalysis, following on lower asymptotes due to impoverished retrieval cues. Additional SAT investigations have examined reanalysis and recovery processes in garden-path ambiguous sentences more specifically. Martin and McElree (2018) tested temporarily ambiguous sentences, like The actress sent the jewelry sparkled/arrived/frowned, with an initial (incorrect) matrix verb interpretation of sent, vs. the correct reduced relative clause interpretation. The verb sparkled is a weak cue for the dependency with actress, as it is more strongly related to the local noun jewelry, based on higher latent semantic analysis values, while the verb arrived is neutral, equally related to each noun, and the verb frowned is more strongly related to the subject than the local noun. The 3×2 design included unambiguous relative clause conditions: The actress who was sent the jewelry sparkled/arrived/frowned. Results demonstrated that retrieval cue strength increased interpretation probability (asymptotes) for all sentences, but did not affect the time-course. Ambiguity, on the other hand, uniformly slowed rates compared to unambiguous sentences. The rate difference is consistent with


reanalysis based on additional attempts to retrieve and interpret a subject. Overall, this profile supports accounts that posit representational differences, such as competing lexical or structural representations. It also indicates that ambiguous structures take more time to process, which is due to multiple parsing attempts. Note that the lack of intercept differences argues against separate, additional repair or reanalysis mechanisms at work. Contrastingly, Bornkessel et al. (2004) presented evidence from an SAT experiment in German indicating that case information and phrase structure, which can be pulled apart in German, interacted during reanalysis. Participants judged sentences that were temporarily ambiguous, where a garden-path analysis was nominative-initial but a correct interpretation was dative-initial, and compared them to sentences with a correct nominative-initial interpretation. A later intercept was found for the dative-initial sentences compared to the nominative-initial sentences, supporting a reanalysis operation for syntactic structure. Additionally, within the dative-initial conditions, asymptotic accuracy was higher for an object-experiencer verb than a dative active verb, indicating that the case information associated with the object-experiencer verb provided a stronger cue to guide reanalysis. While interpretation of the time-course results from McElree (1993) and Martin and McElree (2018) may seem to contradict those of Bornkessel et al. (2004), note that the first two showed rate differences, while the latter showed an intercept difference. Recall that rate differences are more compatible with reapplication of a mechanism already at work, such as additional attempts at retrieving needed elements (Martin and McElree 2018; McElree 1993; McElree et al. 2003), more than one gap to be filled (McElree et al. 2003), or building additional semantic structure (McElree et al. 2006). 
Intercept differences, on the other hand, can provide evidence of an additional operation or separate mechanism at work over processing time, as in a serial or cascaded parsing routine in which one kind of information is computed before another, or in a parallel architecture in which one kind of information takes more time to compute (Bornkessel et al. 2004; Bott et al. 2012; McElree and Griffith 1995; 1998). Further examination of different kinds of garden-path ambiguities with SAT methods will help to elucidate sentence processing mechanisms. Anaphora is another area where SAT methods can be applied to translate formal linguistic claims to cognitive mechanisms that make time-course predictions. Foraker and McElree (2007) assessed two accounts of antecedent prominence, comparing a continuum of activation strength to a special cognitive state akin to focal attention. Approaches such as the Focus Memory Framework (Garrod, Freudenthal, and Boyle 1994; Stewart, Pickering, and Sanford 2000) propose that antecedent representations vary along a continuum of activation strength, which is consistent with a higher probability of retrieving a more prominent antecedent, supporting higher accuracy of coreference resolution, but with no time-course distinctions. Alternatively, approaches such as Gundel (1999; Gundel, Hedberg, and Zacharski 1993), and to some extent, Centering Theory (Grosz and Sidner 1986; Grosz, Joshi, and Weinstein 1995), posit that discourse factors which increase antecedent prominence place the most salient item in the


psychological focus of attention. This claim predicts a faster speed of processing for coreference involving a prominent antecedent. Foraker and McElree (2007) compared prominent referents (21, 22) to nonprominent ones (23, 24), as well as a pronoun adjacent to its referent (22, 24) vs. distant from its referent (21, 23).

(21) It was the skillful carpenter who repaired the antique dresser. He hammered (*creaked).
(22) What the skillful carpenter repaired was the antique dresser. It creaked (*hammered).
(23) What the skillful carpenter repaired was the antique dresser. He hammered (*creaked).
(24) It was the skillful carpenter who repaired the antique dresser. It creaked (*hammered).

The syntactic clefting structure in (21, 24) makes the noun phrase skillful carpenter more prominent, while the pseudo-cleft in (22, 23) renders antique dresser more prominent. Unacceptable versions were constructed by switching the last verb, creating an animacy violation during binding with the pronoun. When the pronoun referred back to a prominent referent, asymptotic accuracy was higher than the non-prominent conditions, consistent with facilitated retrieval of the referent representation. However, prominence did not affect the speed of processing, arguing against active maintenance in a specialized state. Instead, speed of processing was faster when the antecedent and pronoun were adjacent (22, 24), compared to not (21, 23). The faster speed supports an active maintenance explanation for adjacent elements only (see also McElree et al. 2003), not for prominent antecedent conditions.

Structural locality influences on reflexive anaphor resolution have also been examined with SAT modeling (Dillon, Chow, Wagers, Guo, Liu, and Phillips 2014). For the Mandarin Chinese reflexive ziji, Dillon et al. found an earlier intercept for retrieval and binding of ziji with an antecedent within a local syntactic domain, compared to long-distance binding.
These results appear to show that retrieval is limited to the local subject position at first (even when that is the dispreferred interpretation overall), suggesting that a subset of features germane to the dependency is used as retrieval cues, rather than all features. Although these results are of interest, we advise additional investigation, as the observed d′ values differed only marginally between conditions yet were fit with different asymptotes, and accuracy was extremely low overall. Other factors may be at play in this case of anaphor processing. Finally, additional SAT investigations have tested models and theories involving metonymy, metaphor, enriched composition, and scalar implicatures. Metonymic expressions were less likely to be computed than literal controls, with no differences in time course (Bott et al. 2016), supporting direct access to metonymic senses, and arguing against an indirect, literal-first type of model. Similarly, figurative interpretations were less likely to be recovered or computed than literal ones, with no differences


in time course (McElree and Nordlie 1999), also arguing against a serial, literal-first approach. Relatedly, enriched composition expressions were less likely to be sensibly computed than non-coerced controls (McElree et al. 2006). However, in this case, enriched expressions also displayed slower rates, consistent with the claim that building additional semantic structure required more time. Finally, pragmatic upper-bound interpretations of some scalar implicatures were less likely to be computed than logical, lower-bound interpretations (Bott et al. 2012). Importantly, pragmatic interpretations, which required the scalar implicature, were also consistently slower in time-course intercept and rate than logical ones. Bott et al. (2012) discuss several causes of costly implicatures, including extra computations, under-informativeness, and aspects of an inferential mechanism itself, such as implementing the epistemic step.
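The contrasts drawn throughout this section between asymptote, rate, and intercept differences can be made concrete with the exponential approach to a limit that is commonly fitted to SAT data, in which d′ is zero up to the intercept and then rises toward the asymptote. The sketch below assumes that functional form; the function names and all parameter values are hypothetical and chosen only to make the three kinds of difference visible.

```python
import math


def sat_dprime(t, asymptote, rate, intercept):
    """Accuracy (d') at processing time t (seconds) under an exponential
    approach to a limit: zero until the intercept, then rising at `rate`
    toward `asymptote`."""
    if t <= intercept:
        return 0.0
    return asymptote * (1.0 - math.exp(-rate * (t - intercept)))


# Hypothetical parameter values, chosen only to make the contrasts visible.
CONDITIONS = {
    "baseline": dict(asymptote=3.0, rate=2.0, intercept=0.4),
    "lower asymptote": dict(asymptote=2.0, rate=2.0, intercept=0.4),
    "slower rate": dict(asymptote=3.0, rate=1.0, intercept=0.4),
    "later intercept": dict(asymptote=3.0, rate=2.0, intercept=0.6),
}


def proportion_of_asymptote(t, asymptote, rate, intercept):
    """Fraction of the condition's own asymptote reached by time t."""
    return sat_dprime(t, asymptote, rate, intercept) / asymptote


for name, params in CONDITIONS.items():
    row = ", ".join(
        f"{proportion_of_asymptote(t, **params):.2f}" for t in (0.5, 1.0, 3.0)
    )
    print(f"{name:16s} proportion of asymptote at 0.5, 1.0, 3.0 s: {row}")
```

A condition that differs only in asymptote reaches the same proportion of its final accuracy at every time point, which is the sense in which such a difference leaves the time course untouched. A slower rate or a later intercept both delay the approach to the asymptote, but only the later intercept shifts the moment at which accuracy first departs from chance.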

11.6 Future applications of SAT


Looking to future SAT applications, a number of linguistic theories have posited that anaphora can be resolved in different ways (e.g. Bosch 1983; Grodzinsky and Reinhart 1993; Reuland 2001; 2011). Although the precise characterizations of these theories differ, many assume that anaphora can be resolved via either a syntactic route, typically referred to as variable binding, or via discourse-mediated coreference assignment (for discussion, see Reuland 2011). Reuland argued that in cases where both routes are available, an economy principle dictates that variable binding should be computed before coreference assignment. Results from eye-movement studies have provided mixed support for this claim, with some researchers claiming evidence in favor of a preference for variable binding (Koornneef 2008), and others not (Cunnings, Patterson, and Felser 2014). The SAT procedure can provide an explicit way of modeling the likelihood and time course of pronoun resolution to help tease apart the hypothesized dissociation between variable binding and coreference resolution.

Another important issue in how linguistic representations are accessed from memory concerns c-command. “C-command” describes the relationship between two constituents in a syntactic tree structure in terms of hierarchical dominance. The standard definition of c-command is that a constituent c-commands its sister constituents, and any constituents that they dominate (Reinhart 1993). C-command is crucial to the linguistic characterization of syntactic constraints on linguistic dependencies. For example, in (25), the traditional linguistic characterization of how himself can be interpreted is that it must be bound by a c-commanding antecedent in the same local domain (Chomsky 1981).

(25) The boy who Kevin spoke to yesterday morning injured himself.

Cue-based content-addressable memory access relies on features to access information in memory.
However, c-command is an inherently relational concept between sentence constituents that cannot be reduced to a feature. We cannot, for example, say


that Kevin lacks a [+C-COMMAND] feature to restrict it from being retrieved upon encountering the reflexive, because while Kevin does not c-command the reflexive, it does c-command other constituents in the sentence (spoke to yesterday morning). Other linguistic dependencies are also typically described as being restricted by c-command. Given the relational nature of c-command and its importance in constraining linguistic dependencies, it might be surprising that evidence from the SAT paradigm suggests that language comprehension involves memory access via feature-based direct-access retrieval rather than, for example, a serial search that could utilize relational information (McElree et al. 2003). One way to address this issue has been to devise feature-based proxies that encode the c-command relation via a set of features on items in memory that can subsequently be utilized during cue-based retrieval (e.g. Cunnings et al. 2014; Kush 2013; Kush, Lidz, and Phillips 2015). However, on a theoretical level, it is important to note that these feature-based proxies are not c-command, as they are inherently non-relational. Existing research on the role of c-command in constraining memory access during processing has typically assumed that while feature-based proxies might be utilized as retrieval cues to guide memory access online, the linguistic characterization of constraints on linguistic dependencies can still be described in terms of a c-command relation. However, a more radical and controversial conclusion might be that, if it is indeed the case that memory access during language comprehension relies on cue-based retrieval, and on the assumption of a tight relationship between the grammar and the parser, constraints on linguistic dependencies should rather be theoretically characterized in terms of content-based features, instead of relational notions such as c-command.
Other possibilities are that the relational order aspect of c-command is not accessed during the retrieval and interpretation of non-adjacent dependencies, or is a much weaker or less reliable constraint that carries lower weight, or that interacts with syntactic role, such as subjecthood (Van Dyke and McElree 2011). A fundamentally different way of conceiving of the relationship between grammar (or grammatical constraints like c-command) and the parser (or computation and its behavioral consequences during processing) is to see grammar as a system of (neural) state spaces that the network can enter into (Martin 2016; 2020). On this view, grammar is implicitly represented (see Rust 2014 for a discussion of how implicit information can become explicit and accessible in neural systems) and no additional parsing mechanism is needed. It is only representation of the information that determines which state the system enters next, such that detection of representational state through sensory signals essentially replaces the role of a separate control-structure-like parser (Martin 2016; 2020). As already noted, evidence from the SAT paradigm indicates that language comprehension is susceptible to retrieval interference, and that this interference is dependent on the similarity between items in memory. Similarity-based retrieval interference is in some ways similar to the concept of relativized minimality in the theoretical linguistics literature (Rizzi 1990; 2011). Relativized minimality states that a linguistic dependency between a displaced constituent and its canonical sentence position can be disrupted when a c-commanding constituent, whose morphosyntactic features match those of the displaced constituent, intervenes. Although the notions of similarity-based


interference and relativized minimality have been formalized to account for different linguistic phenomena, both predict that the success of linguistic dependency resolution is influenced by the similarity between sentence constituents. Cue-based retrieval provides a processing implementation for relativized minimality that is rooted in the principles of human recognition memory, and offers a mechanistic explanation for why intervention effects occur: They fall out naturally as a consequence of the way memory is accessed in cue-based retrieval. Precise characterization of constraints on how memory is accessed during linguistic dependency resolution may help provide a unifying bridge between work in theoretical linguistics on the characterization of linguistic constraints on dependency resolution, and work in psycholinguistics on the time-course of memory access during sentence processing. Future research using the SAT paradigm will help formalize this link between memory access and linguistic representation.
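The relational character of c-command discussed earlier in this section can be illustrated by computing it over a tree. The sketch below applies the definition given above (a constituent c-commands its sisters and whatever they dominate) to a deliberately simplified bracketing of (25). The `Node` class, the node labels, and the tree shape are hypothetical conveniences, not a claim about the correct syntactic analysis.

```python
class Node:
    """A constituent in a syntactic tree, with a link back to its parent."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if `other` is a proper descendant of this node."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(a, b):
    """True if `b` is a sister of `a` or is dominated by a sister of `a`."""
    if a.parent is None:
        return False
    for sister in a.parent.children:
        if sister is a:
            continue
        if sister is b or sister.dominates(b):
            return True
    return False


# Simplified structure for (25):
# [S [NP [NP the boy] [RC who [S' Kevin [VP spoke to ...]]]] [VP injured himself]]
kevin = Node("NP:Kevin")
spoke_to = Node("VP:spoke to yesterday morning")
relative_clause = Node("RC", Node("who"), Node("S'", kevin, spoke_to))
subject = Node("NP", Node("NP:the boy"), relative_clause)
himself = Node("NP:himself")
sentence = Node("S", subject, Node("VP", Node("V:injured"), himself))

print(c_commands(subject, himself))  # the full subject NP and the reflexive
print(c_commands(kevin, himself))    # Kevin and the reflexive
print(c_commands(kevin, spoke_to))   # Kevin and its own verb phrase
```

Whether Kevin counts as a c-commander depends on which second constituent is asked about, so the answer has to be computed from the positions of two nodes. That is why the relation cannot be stored as a single feature on the item for Kevin.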

11.7 Conclusion


While many different experimental paradigms are available to the linguist and psycholinguist interested in investigating linguistic representation and processing, the SAT procedure provides the best means to veridically estimate the trade-off between speed and accuracy during language processing, which provides a comprehensive picture of when and how interpretation develops over processing time. In this chapter, we explained how the SAT paradigm can provide clear evidence about the time course of processing that is unconfounded by accuracy or probability of interpretation. We also described the prominent role that SAT evidence has taken in integrating memory models into psycholinguistic theory, and reviewed how SAT evidence can be used to inform other issues of debate in linguistics and psycholinguistics, which we hope will inspire future use of the SAT paradigm within the experimental syntax literature.

References

Bornkessel, Ina, Brian McElree, Matthias Schlesewsky, and Angela D. Friederici. 2004. Multidimensional contributions to garden path strength: Dissociating phrase structure from case marking. Journal of Memory and Language 51(4): 495–522.
Bosch, Peter. 1983. Agreement and anaphora: A study of the role of pronouns in syntax and discourse. London: Academic Press.
Bott, Lewis, Todd M. Bailey, and Daniel Grodner. 2012. Distinguishing speed from accuracy in scalar implicatures. Journal of Memory and Language 66: 123–142.
Bott, Lewis, Alice Rees, and Steven Frisson. 2016. The time course of familiar metonymy. Journal of Experimental Psychology: Learning, Memory, and Cognition 42: 1160–1170.
Chandler, John P. 1969. Subroutine STEPIT—finds local minimum of a smooth function of several parameters. Behavioral Science 14: 81–82.
Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris.


Clark, Steven E., and Scott D. Gronlund. 1996. Global matching models of recognition memory: How the models match the data. Psychonomic Bulletin and Review 3: 37–60.
Clifton, Charles, and Lyn Frazier. 1989. Comprehending sentences with long-distance dependencies. In Linguistic structure in language processing, 273–317. Dordrecht: Springer.
Cohen, Neal J., and Howard Eichenbaum. 1993. Memory, amnesia, and the hippocampal system. Cambridge, MA: MIT Press.
Cowan, Nelson. 2001. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences 24: 87–185.
Cowan, Nelson. 2005. Working memory capacity. New York: Psychology Press.
Cunnings, Ian, Clare Patterson, and Claudia Felser. 2014. Variable binding and coreference in sentence comprehension: Evidence from eye movements. Journal of Memory and Language 71(1): 39–56.
Daneman, Meredyth, and Patricia A. Carpenter. 1980. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior 19(4): 450–466.
Dillon, Brian, Wing-Yee Chow, Matthew Wagers, Taomei Guo, Fengqin Liu, and Colin Phillips. 2014. The structure-sensitivity of memory access: evidence from Mandarin Chinese. Frontiers in Psychology 5: art. 1025.
Donders, Franciscus Cornelis. 1969. On the speed of mental processes. In Attention and Performance II, Acta Psychologica 30: 412–431, ed. and trans. W. G. Koster. Amsterdam: North-Holland. (Original publication 1868.)
Dosher, Barbara A. 1981. The effect of delay and interference: A speed-accuracy study. Cognitive Psychology 13: 551–582.
Foraker, Stephani, and Brian McElree. 2007. The role of prominence in pronoun resolution: active versus passive representations. Journal of Memory and Language 56(3): 357–383.
Foraker, Stephani, and Brian McElree. 2011. Comprehension of linguistic dependencies: speed-accuracy tradeoff evidence for direct-access retrieval from memory. Language and Linguistics Compass 5(11): 764–783.
Garrod, Simon, Daniel Freudenthal, and Elizabeth Boyle. 1994. The role of different types of anaphor in the on-line resolution of sentences in a discourse. Journal of Memory and Language 33: 39–68.
Gibson, Edward. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In A. Marantz, Y. Miyashita, and W. O’Neil (eds), Image, language, brain: Papers from the first mind articulation project symposium, 94–126. Cambridge, MA: MIT Press.
Gordon, Peter C., Randall Hendrick, and Marcus Johnson. 2001. Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory and Cognition 27: 1411–1423.
Gordon, Peter C., Randall Hendrick, and Marcus Johnson. 2004. Effects of noun phrase type on sentence complexity. Journal of Memory and Language 51: 97–114.
Gordon, Peter C., Randall Hendrick, and William H. Levine. 2002. Memory-load interference in syntactic processing. Psychological Science 13: 425–430.
Gordon, Peter C., Randall Hendrick, Marcus Johnson, and Yoonhyoung Lee. 2006. Similarity-based interference during language comprehension: Evidence from eye tracking during reading. Journal of Experimental Psychology: Learning, Memory and Cognition 32: 1304–1321.
Grodner, Daniel, and Edward Gibson. 2005. Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science 29(2): 261–290.


Grodzinsky, Yosef, and Tanya Reinhart. 1993. The innateness of binding and coreference. Linguistic Inquiry 24(1): 69–101.
Gronlund, Scott D., Mark B. Edwards, and Daryl D. Ohrt. 1997. Comparison of the retrieval of item versus spatial position information. Journal of Experimental Psychology: Learning, Memory, and Cognition 23: 1261–1274.
Grosz, Barbara J., and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12(3): 175–204.
Grosz, Barbara J., Aravind K. Joshi, and Scott Weinstein. 1995. Centering: A framework for modelling the local coherence of discourse. Computational Linguistics 21(2): 203–226.
Gundel, Jeanette K. 1999. On different kinds of focus. In P. Bosch and R. van der Sandt (eds), Focus: Linguistic, cognitive, and computational perspectives, 293–305. Cambridge: Cambridge University Press.
Gundel, Jeanette K., Nancy Hedberg, and Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language 69: 274–307.
Hagoort, Peter. 2003. How the brain solves the binding problem for language: a neurocomputational model of syntactic processing. Neuroimage 20: S18–S29.
Jäger, Lena A., Felix Engelmann, and Shravan Vasishth. 2017. Similarity-based interference in sentence comprehension: Literature review and Bayesian meta-analysis. Journal of Memory and Language 94: 316–339.
James, William. 1918. The Principles of Psychology. New York: Holt. (Original publication 1890.)
Judd, Charles M., and Gary H. McClelland. 1989. Data analysis: A model-comparison approach. San Diego, CA: Harcourt Brace.
Kohonen, Teuvo. 1984. Self-organization and associative memory. New York: Springer.
Koornneef, Arnout Willem. 2008. Eye-catching anaphora. Dissertation, Utrecht University.
Kush, Dave W. 2013. Respecting relations: Memory access and antecedent retrieval in incremental sentence processing. Dissertation, University of Maryland, College Park.
Kush, Dave, Jeffrey Lidz, and Colin Phillips. 2015. Relation-sensitive retrieval: Evidence from bound variable pronouns. Journal of Memory and Language 82: 18–40.
Lewis, Richard L., and Shravan Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science 29: 375–419.
Lewis, Richard L., Shravan Vasishth, and Julie A. Van Dyke. 2006. Computational principles of working memory in sentence comprehension. Trends in Cognitive Science 10: 445–454.
Martin, Andrea E. 2016. Language processing as cue integration: Grounding the psychology of language in perception and neurophysiology. Frontiers in Psychology 7: art. 120.
Martin, Andrea E. 2020. A compositional neural architecture for language. Journal of Cognitive Neuroscience 32(8): 1407–1427.
Martin, Andrea E., and Brian McElree. 2008. A content-addressable pointer mechanism underlies comprehension of verb-phrase ellipsis. Journal of Memory and Language 58: 879–906.
Martin, Andrea E., and Brian McElree. 2009. Memory operations that support language comprehension: evidence from verb-phrase ellipsis. Journal of Experimental Psychology: Learning Memory and Cognition 35: 1231–1239.
Martin, Andrea E., and Brian McElree. 2011. Direct-access retrieval during sentence comprehension: Evidence from sluicing. Journal of Memory and Language 64: 327–343.


Martin, Andrea E., and Brian McElree. 2018. Retrieval cues and syntactic ambiguity resolution: speed-accuracy tradeoff evidence. Language, Cognition, and Neuroscience 33: 769–783.
Martin, Andrea E., Mante S. Nieuwland, and Manuel Carreiras. 2012. Event-related brain potentials index cue-based retrieval interference during sentence comprehension. Neuroimage 59(2): 1859–1869.
Martin, Andrea E., Mante S. Nieuwland, and Manuel Carreiras. 2014. Agreement attraction during comprehension of grammatical sentences: ERP evidence from ellipsis. Brain and Language 135: 42–51.
McElree, Brian. 1993. The locus of lexical preference effects in sentence comprehension: a time-course analysis. Journal of Memory and Language 32: 536–571.
McElree, Brian. 1996. Accessing short-term memory with semantic and phonological information: A time-course analysis. Memory and Cognition 24: 173–187.
McElree, Brian. 1998. Attended and non-attended states in working memory: accessing categorized structures. Journal of Memory and Language 38: 225–252.
McElree, Brian. 2000. Sentence comprehension is mediated by content-addressable memory structures. Journal of Psycholinguistic Research 29: 111–123.
McElree, Brian. 2001. Working memory and focal attention. Journal of Experimental Psychology: Learning, Memory, and Cognition 27: 817–835.
McElree, Brian. 2006. Accessing recent events. In B. H. Ross (ed.), The psychology of learning and motivation: Advances in research theory, 155–200. San Diego, CA: Academic Press.
McElree, Brian, and Barbara A. Dosher. 1989. Serial position and set size in short term memory: the time course of recognition. Journal of Experimental Psychology: General 118: 346–373.
McElree, Brian, and Barbara A. Dosher. 1993. Serial retrieval processes in the recovery of order information. Journal of Experimental Psychology: General 122: 291–315.
McElree, Brian, and Teresa Griffith. 1995. Syntactic and thematic processing in sentence comprehension: evidence for a temporal dissociation. Journal of Experimental Psychology: Learning, Memory, and Cognition 21(1): 134–157.
McElree, Brian, and Teresa Griffith. 1998. Structural and lexical constraints on filling gaps during sentence comprehension: a time-course analysis. Journal of Experimental Psychology: Learning, Memory, and Cognition 24(2): 432–460.
McElree, Brian, and Johanna Nordlie. 1999. Literal and figurative interpretations are computed in equal time. Psychonomic Bulletin and Review 6: 486–494.
McElree, Brian, Stephani Foraker, and Lisbeth Dyer. 2003. Memory structures that subserve sentence comprehension. Journal of Memory and Language 48: 67–91.
McElree, Brian, Liina Pylkkänen, Martin Pickering, and Matthew J. Traxler. 2006. A time course analysis of enriched composition. Psychonomic Bulletin and Review 13: 53–59.
Miller, George, and Noam Chomsky. 1963. Finitary models of language users. In D. R. Luce, R. R. Bush, and E. Galanter (eds), Handbook of mathematical psychology, 2–419. New York: John Wiley.
Murdock, Bennet B. Jr. 1971. A parallel-processing model for scanning. Perception and Psychophysics 10: 289–291.
Nairne, James S. 2002a. The myth of the encoding–retrieval match. Memory 1: 389–395.
Nairne, James S. 2002b. Remembering over the short-term: The case against the standard model. Annual Review of Psychology 53(1): 53–81.


Neath, Ian. 1993. Distinctiveness and serial position effects in recognition. Memory and Cognition 21: 689–698.
Neath, Ian, and Alicia J. Knoedler. 1994. Distinctiveness and serial position effects in recognition and sentence processing. Journal of Memory and Language 33(6): 776–795.
Nicenboim, Bruno, and Shravan Vasishth. 2018. Models of retrieval in sentence comprehension. Journal of Memory and Language 99: 1–34.
Oberauer, Klaus. 2002. Access to information in working memory: exploring the focus of attention. Journal of Experimental Psychology: Learning, Memory, and Cognition 28: 411–421.
Öztekin, Ilke, and Brian McElree. 2007. Proactive interference slows recognition by eliminating fast assessments of familiarity. Journal of Memory and Language 57: 126–149.
Öztekin, Ilke, and Brian McElree. 2010. Relationship between measures of working memory capacity and the time course of short-term memory retrieval and interference resolution. Journal of Experimental Psychology: Learning, Memory, and Cognition 36(2): 383–397.
Öztekin, Ilke, Brian McElree, Bernhard P. Staresina, and Lila Davachi. 2008. Working memory retrieval: contributions of the left prefrontal cortex, the left posterior parietal cortex, and the hippocampus. Journal of Cognitive Neuroscience 21: 581–593.
Öztekin, Ilke, Lila Davachi, and Brian McElree. 2010. Are representations in working memory distinct from representations in long-term memory? Neural evidence in support of a single store. Psychological Science 21(8): 1123–1133.
Parker, Dan. 2015. Two is not always better than one: Modeling evidence for a single structure building system. Poster at the Architectures and Mechanisms for Language Processing (AMLaP) 2015 conference, University of Malta.
Parker, Dan, Michael Shvartsman, and Julie A. Van Dyke. 2017. The cue-based retrieval theory of sentence comprehension: New findings and new challenges. In L. Escobar, V. Torrens, and T. Parodi (eds), Language processing and disorders, 121–144. Newcastle upon Tyne: Cambridge Scholars.
Phillips, Colin, Matthew W. Wagers, and Ellen F. Lau. 2011. Grammatical illusions and selective fallibility in real-time language comprehension. Experiments at the Interfaces 37: 147–180.
Ratcliff, Roger. 1978. A theory of memory retrieval. Psychological Review 85: 59–108.
Reed, Adam V. 1973. Speed–accuracy trade-off in recognition memory. Science 181: 574–576.
Reed, Adam V. 1976. List length and the time course of recognition in immediate memory. Memory and Cognition 4: 16–30.
Reinhart, Tanya. 1993. Coreference and bound anaphora: A restatement of the anaphora questions. Linguistics and Philosophy 6(1): 47–88.
Reuland, Eric. 2001. Primitives of binding. Linguistic Inquiry 32(3): 439–492.
Reuland, Eric J. 2011. Anaphora and language design. Cambridge, MA: MIT Press.
Rizzi, Luigi. 1990. Relativized minimality. Cambridge, MA: MIT Press.
Rizzi, Luigi. 2011. Minimality. In C. Boeckx (ed.), The Oxford handbook of linguistic minimalism, 220–238. Oxford: Oxford University Press.
Rust, Nicole C. 2014. Population-based representations: From implicit to explicit. In M. S. Gazzaniga, G. R. Mangun, and S.-J. Blakemore (eds), The cognitive neurosciences, 337–348. Cambridge, MA: MIT Press.
Sprouse, Jon, Carson T. Schütze, and Diogo Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134: 219–248.


Sternberg, Saul. 1966. High speed scanning in human memory. Science 153: 652–654.
Sternberg, Saul. 1969. The discovery of processing stages: extensions of Donders’ method. Acta Psychologica 30: 276–315.
Sternberg, Saul. 1975. Memory scanning: new findings and current controversies. Quarterly Journal of Experimental Psychology 27: 1–32.
Stewart, Andrew J., Martin J. Pickering, and Anthony J. Sanford. 2000. The time course of the influence of implicit causality information: focusing versus integration accounts. Journal of Memory and Language 42: 423–443.
Stowe, Laurie A. 1986. Parsing wh-constructions: evidence for on-line gap location. Language and Cognitive Processes 1: 227–245.
Townsend, James T., and F. Gregory Ashby. 1983. The stochastic modeling of elementary psychological processes. New York: Cambridge University Press.
Van Dyke, Julie A. 2007. Interference effects from grammatically unavailable constituents during sentence processing. Journal of Experimental Psychology: Learning, Memory and Cognition 33: 407–430.
Van Dyke, Julie A., and Clinton L. Johns. 2012. Memory interference as a determinant of language comprehension. Language and Linguistics Compass 6(4): 193–211.
Van Dyke, Julie A., and Richard L. Lewis. 2003. Distinguishing effects of structure and decay on attachment and repair: A retrieval interference theory of recovery from misanalyzed ambiguities. Journal of Memory and Language 49: 285–316.
Van Dyke, Julie A., and Brian McElree. 2006. Retrieval interference in sentence comprehension. Journal of Memory and Language 55: 157–166.
Van Dyke, Julie A., and Brian McElree. 2011. Cue-dependent interference in comprehension. Journal of Memory and Language 65: 247–263.
Van Dyke, Julie A., Clinton L. Johns, and Anuenue Kukona. 2014. Low working memory capacity is only spuriously related to poor reading comprehension. Cognition 131(3): 373–403.
Vasishth, Shravan, Sven Brüssow, Richard L. Lewis, and Heiner Drenhaus. 2008. Processing polarity: How the ungrammatical intrudes on the grammatical. Cognitive Science 32(4): 685–712.
Wagers, Matthew W., Ellen F. Lau, and Colin Phillips. 2009. Agreement attraction in comprehension: Representations and processes. Journal of Memory and Language 61(2): 206–237.
Warren, Tessa, and Edward Gibson. 2002. The influence of referential processing on sentence complexity. Cognition 85(1): 79–112.
Watkins, Olga C., and Michael J. Watkins. 1975. Build-up of proactive inhibition as a cue overload effect. Journal of Experimental Psychology: Human Learning and Memory 104(4): 442–452.
Wickelgren, Wayne A. 1977. Speed–accuracy tradeoff and information processing dynamics. Acta Psychologica 41: 67–85.
Wickelgren, Wayne A., Albert T. Corbett, and Barbara A. Dosher. 1980. Priming and retrieval from short-term memory: A speed–accuracy tradeoff analysis. Journal of Verbal Learning and Verbal Behavior 19: 387–404.

Chapter 12

Formal methods in experimental syntax

Tim Hunter

The goal of this chapter is to outline the ways in which formal methods can help expand the array of evidence that can be brought to bear on the empirical evaluation of theories of natural language syntax. Since bringing any kind of evidence to bear on a theory naturally requires relevant linking hypotheses, expanding the range of useful evidence amounts to expanding the range of linking hypotheses that can be used to expose syntactic hypotheses to the empirical spotlight. This requires both formulating new linking hypotheses and ensuring that the underlying theory to be tested takes a form that these linking hypotheses can engage with. The topic of this chapter, then, is the use of linking hypotheses that require that the underlying theory takes the form of an explicit and self-contained formal grammar. Put differently, it is an overview of the benefits to be had, in the form of greater empirical testability, by formulating one’s syntactic theories in this manner. And in keeping with the section of the handbook in which this chapter appears, the linking hypotheses considered here will serve to link syntactic theories to sentence-processing observations.1 I will discuss two classes of linking hypotheses, those based on information-theoretic complexity metrics (Section 12.2) and those based on automata-theoretic models of parsing (Section 12.3). Although both require that the underlying grammatical theory being tested takes the form of some formal grammar, they differ in what further details they require that the grammar furnish. The use of information-theoretic complexity metrics (for example, surprisal or entropy reduction) requires some probability distribution to be defined over the expressions generated by the grammar, but little more, and broadly speaking this tends to amount to a lower “price of entry” than the automata-theoretic approaches impose. 
1 See Levy (2013) and Hale (2017) for reviews that provide a broader historical context for many of the topics discussed here, with less focus on the goal of testing hypothesized grammars.

The automata-theoretic approaches work at a lower level of abstraction, and require supplementing the grammar with an explicit parsing algorithm (for example, top-down or bottom-up context-free parsing); the payoff for the extra effort involved is arguably that the candidate explanations provided by this approach have an additional causal, mechanistic character.

I will mostly restrict attention to simple, linguistically inadequate kinds of grammars (e.g. finite state machines and context-free grammars) to illustrate the different kinds of linking hypotheses, and what a grammar must be like in order to mesh with these linking hypotheses. In Section 12.4 I will briefly outline how something like contemporary minimalist syntax can be put into a form that allows these linking hypotheses to engage with it, and provide pointers to relevant research along these lines. In recent years, it is the information-theoretic approaches that have most frequently and fruitfully been combined with sophisticated grammars of the sort that syntacticians would find the most familiar. This is not due to any fundamental differences in compatibility, but rather simply due to the higher price of entry for the automata-theoretic methods and the extent of our current knowledge. As our understanding improves it is likely that automata-theoretic methods will more frequently be combined with more linguistically realistic grammars.

Given the focus on foundational (and hopefully widely applicable) concepts, there will be little or no discussion of either the finer points of contemporary syntax or any empirical details of recent experimental work in psycholinguistics. Of course the broader enterprise that this chapter aims to contribute to will likely get nowhere if these specifics are always left aside as they are here. The “back to basics” approach that I adopt is based on a hunch that these foundational ideas, as opposed to specifics, are the ones that are under-represented and under-utilized in current discussions.2

2 I hope it goes without saying that this hunch could be wrong (but I include this footnote just to be sure).

12.1 Linking hypotheses, complexity metrics, and the general form of grammar-testing arguments

Most frequently, a grammar is empirically tested by considering the sound–meaning pairings that it generates. The usual way in which a grammar “pairs up” sounds with meanings is by generating a collection of more abstract syntactic objects (in neutral terms, structural descriptions, but typically something like trees), each of which has a certain sound/pronunciation and a certain meaning. One way to go about bringing new kinds of observations/evidence to bear on hypothesized grammars is to follow this pattern and seek ways for structural descriptions to be connected to the relevant new kinds of observables, in something like the way they are usually connected to sounds and meanings. For example, a particular structural description can “have” a certain degree of complexity in much the same way that it “has” a sound and a meaning.

To be concrete, let us say that for any structural description t, its sound is PHON(t) and its meaning is SEM(t). Then the usual “linking hypotheses” (although it is unusual to call them that) which we use to test a particular posited mental grammar are something roughly like the following (ignoring many subtleties, such as ambiguities):

(1) A speaker encountering the string PHON(t) will latch onto the meaning SEM(t).

(2) A speaker wishing to express the meaning SEM(t) will produce the string PHON(t).

To expand the range of relevant evidence beyond sound–meaning pairings, then, we can consider adding other “lenses onto” the structural description t beyond the existing PHON and SEM.3 If we introduce some additional function F, such that F(t) is some measure of the complexity of t, and accompany it with a linking hypothesis such as

(3) A speaker encountering the string PHON(t) will experience perceptual difficulty proportional to F(t).

then we have new ways to find evidence for or against the structural descriptions generated by any particular grammar. In particular, it may be that two grammars are indistinguishable when viewed through the lenses of PHON and SEM (via the two initial, conventional linking hypotheses above), but make different predictions about the cases where speakers will experience greater perceptual difficulty (as measured by reading times, or eye-tracking patterns, or whatever).

To take one exceedingly simple example, Miller and Chomsky (1963) suggest that “One rough measure of structural complexity that we might use … is the node-to-terminal node ratio. … This number measures roughly the amount of computation per input symbol that must be performed by the listener.” They illustrate with the two trees in (4).

(4)

a. [A [B a b] [C c d]]

b. [A a b c d]

3 Arguably we already make use of others, such as functions which measure the degree of syntactic well-formedness, or degree of semantic anomaly, etc. And there’s no reason the logic here should be restricted to measures of “complexity”; it’s just been a useful place to start.


Both have four terminal nodes, but (4a) has three non-terminal nodes whereas (4b) has only one. The ratio of nodes to terminal nodes is therefore 7/4 for (4a), but 5/4 for (4b). We can therefore imagine hypothesizing that a grammar that posits (4a) as the structural description underlying the string ‘a b c d’ will predict greater processing load than a grammar that posits (4b) instead.4 This illustrates the general strategy that we can pursue in order to increase the “empirical payload” of a grammatical theory.

A metric such as node ratio, which produces a single number for a complete sentence (actually, for a complete structural description), is a good fit for certain relatively coarse-grained experimental measures of processing load that were used in the early days, such as whole-sentence reading times or accuracy in repeating back a given sentence. But a richer and finer-grained goal would be to identify a metric whose predictions can line up with the observations from more modern experimental paradigms such as self-paced reading, eye-tracking, and electrophysiological techniques, where we obtain measures of difficulty at particular points in a sentence. Concretely, rather than F(t) being a single number, F(t) should be some sequence of numbers, one for each word (or other appropriate region), which are taken as measures of the complexity of the work triggered by encountering that particular word of the sentence. I will call a measure like this an incremental complexity metric.
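To make the whole-sentence character of the node ratio concrete, the calculation for the two trees in (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the nested-tuple encoding of trees and the names `TREE_4A`, `TREE_4B`, `count_nodes`, and `node_ratio` are assumptions introduced here, and (4a) is assumed to have A dominating B and C, with B over ‘a b’ and C over ‘c d’.

```python
from fractions import Fraction

# Assumed encoding: a terminal is a plain string; a non-terminal is a
# (label, children) pair. The tree shapes follow the description of (4).
TREE_4A = ("A", [("B", ["a", "b"]), ("C", ["c", "d"])])
TREE_4B = ("A", ["a", "b", "c", "d"])


def count_nodes(tree):
    """Return (total number of nodes, number of terminal nodes)."""
    if isinstance(tree, str):
        return 1, 1
    _label, children = tree
    total, terminals = 1, 0  # count the non-terminal node itself
    for child in children:
        child_total, child_terminals = count_nodes(child)
        total += child_total
        terminals += child_terminals
    return total, terminals


def node_ratio(tree):
    """Node-to-terminal-node ratio: one number per structural description."""
    total, terminals = count_nodes(tree)
    return Fraction(total, terminals)


print(node_ratio(TREE_4A))  # 7/4
print(node_ratio(TREE_4B))  # 5/4
```

Note that `node_ratio` returns a single value for the whole tree; an incremental metric would instead return one value per word.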

4 To operationalize this, we would need either an additional assumption about the exact relationship between node ratio and some observable measure (e.g. reading time is 100ms times node ratio), or a separate sentence which has a lower node ratio than ‘a b c d’ does according to one grammar but has a higher node ratio than ‘a b c d’ according to the other grammar.

12.2 Information-theoretic complexity metrics

One well-known class of incremental complexity metrics is based on formal concepts from information theory.5 Broadly speaking, these ideas relate to degrees of uncertainty about future events, and to the way such uncertainty changes in the face of new information. The common intuition behind these complexity metrics is that the amount of work involved in “taking on board” some new information will be some function of the effect that this new information has on uncertainty: for example, the effect that coming to know that the third word of some sentence is ‘cat’ has on a comprehender’s uncertainty about what the sentence is.

5 The original work on information theory is Shannon (1948). The very early chapters of the textbooks by Cover and Thomas (2006) and MacKay (2003) cover everything that is relevant here. Manning and Schütze (1999: 60–78) provide a brief introduction in the context of work on language.

As this initial description might suggest, these complexity metrics are very general and abstract. In a sense they don’t say things about what a comprehender does with this information, only how this new information relates to the information the comprehender already had before. In particular, there is nothing said about how information is represented (either the new information or the previously existing information) or how those representations are transformed; but the common assumption that sentences are processed incrementally corresponds roughly to an assumption about which information is represented at which points in time.6

This high level of abstraction has its benefits and its drawbacks. A drawback is the way these metrics work at a distance from the specifics of the representations and algorithms, as just mentioned. This has the consequence that even when we find a particularly good fit between some collection of data and a theory based on these linking hypotheses, this does not constitute evidence for or against any particular set of underlying nuts-and-bolts mechanisms (of the sort that we will turn to in Section 12.3). Having said that, a main aim of the discussion that follows is to demonstrate that it still can constitute evidence for or against particular grammars. The flip side of this drawback is the advantage that, since a theory incorporating one of these complexity metrics only narrows down our live possibilities by a relatively modest degree, the price of entry for submitting a grammar to testing via these metrics (in terms of technical work and theoretical commitments) is accordingly also modest. A second advantage is that there is the possibility that we might find empirical support for such metrics that is genuinely independent of their use as probes into linguistic questions. At least to the extent that we suppose that there are domain-general facts of the matter about how the mind relates new information to old, the plausibility of these metrics can be independently assessed to a degree that more concrete, language-bound linking hypotheses cannot.

The most notable examples of these complexity metrics are surprisal (Hale 2001; Levy 2005; 2008) and entropy reduction (Hale 2003; 2006).7 Other variants are also easily imaginable.
These metrics differ in exactly how they use information-theoretic concepts about uncertainty to formulate a measure of when it is that new information takes a lot of work to take on board; put differently, they differ in exactly what kind of change in uncertainty they take to be indicative of “high workload.” For the purposes of the current chapter, however, those differences are not particularly consequential. The reason for this is that the requirements they impose on what a grammar must look like remain essentially the same: it must be possible to define a probability distribution over the grammar’s possible derivations, and to update this distribution according to new partial information about a sentence being observed. The focus here will be on these requirements. Satisfying these is what allows a grammar to be “plugged into” these complexity metrics. For illustration, I will demonstrate this “plugging in” by taking surprisal as a representative example of an information-theoretic complexity metric. This choice is purely one of convenience: it requires slightly less mathematical groundwork than the alternatives. The focus here will be squarely on the way either one of these metrics can be used to formulate a linking hypothesis that brings a hypothesized mental grammar into empirical contact with incremental comprehension measures.

6 One might say that these metrics refer only to what Marr (1982) called the “computational level” of description, the most abstract of the three.

7 Hale (2016) provides a wide-ranging review that makes the case for preferring entropy reduction over surprisal; for critiques of entropy reduction see Levy (2005: 36–37) and Levy et al. (2013). For empirical support for surprisal see e.g. Boston et al. (2008), Demberg and Keller (2008), Brennan et al. (2016) and Smith and Levy (2013); and for entropy reduction, Yun et al. (2015), and Nelson et al. (2017). Frank (2013) and Linzen and Jaeger (2016) find support for both metrics independently.

12.2.1 Surprisal

One well-known example of an incremental complexity metric is surprisal. Generally speaking, the surprisal at the occurrence of a particular event is large if the event was unexpected (had low probability), and is small if the event was expected (had high probability); in the extreme case where there was no alternative but for that particular event to happen (its probability is one), the surprisal is zero. To adopt surprisal as an incremental complexity metric for sentence comprehension, the idea is to take each of the words encountered in a sentence as a separate event, each of which will have its own surprisal value, and the crucial linking step is to hypothesize that we will see evidence of greater processing difficulty/complexity (e.g. reading slowdowns) when speakers encounter words with high surprisal values. We assume that the probability of a word, or the degree to which that word is expected, may differ depending on what other information about the sentence the comprehender already has; accordingly, the relevant probabilities are conditional upon the linearly/temporally preceding portion of the sentence.8 The crucial probabilities therefore have the general form:

(5) $P(W_i = w_i \mid W_1 = w_1, W_2 = w_2, \ldots, W_{i-1} = w_{i-1})$

where $W_i$ is the random variable corresponding to the event of encountering the $i$th word, and $w_i$ is the particular word encountered in the $i$th position. For example, the probability relevant to calculating surprisal at the word ‘chased’ in ‘the dog chased the cat’ is given in (6).

(6) $P(W_3 = \text{chased} \mid W_1 = \text{the}, W_2 = \text{dog})$

Importantly, note that while these probabilities are probabilities of certain linear relationships between words, we will see in Section 12.2.2 that they may be calculated on the basis of (hierarchical) structural descriptions containing these words.

8 One could easily calculate surprisal simply based on expectations about individual words without reference to context, but this would imply, for example, the same surprisal value at the word ‘dog’ in ‘The man bit the dog’ as in ‘The dog bit the man’, and the same surprisal value at the word ‘fell’ in ‘The horse fell’ as in ‘The horse raced past the barn fell’.

The particular function that is applied to a probability such as (6) in order to convert it to a surprisal value is the negative logarithm:

(7) surprisal at $w_i$ $= -\log P(W_i = w_i \mid W_1 = w_1, W_2 = w_2, \ldots, W_{i-1} = w_{i-1})$

From here we can formulate a linking hypothesis (cf. (3)):

(8) A speaker encountering the string PHON(t) will, at each word $w_i$, experience greater perceptual difficulty the greater the surprisal at $w_i$ is.

For the purposes relevant to this chapter, the important thing to note about (7) is simply that it has the effect of turning higher probabilities into lower surprisal values, and turning lower probabilities into higher surprisal values: if $P(X) > P(Y)$, then $-\log P(X) < -\log P(Y)$. The particular choice of the negative logarithm function ensures that surprisal works in accord with certain intuitions about how a measure of information should behave.9 One example is the extreme case mentioned above, that if $P(X) = 1$, then $-\log P(X) = 0$ and so surprisal is zero, in accord with the intuition that no information has been obtained by observing an event that was certain to happen. A second is that surprisal values can sensibly be added. If we roll a fair four-sided die and a fair eight-sided die, then (taking logarithms to base 2) the surprisal at seeing the four-sided die come up on any particular side is $-\log \frac{1}{4} = 2$ and the surprisal at seeing the eight-sided die come up on any particular side is $-\log \frac{1}{8} = 3$. Summing these two surprisal values ($2 + 3 = 5$) gives the same result as calculating the surprisal from the perspective of the 32 different joint outcomes that were possible upon rolling the two dice together: $-\log \frac{1}{32} = 5$. This accords with our intuition that the amount of information obtained by finding out which side came up on the four-sided die and which side came up on the eight-sided die (or, the degree to which we are “surprised by” this information) should be the same as that obtained by finding out which of the 32 joint events occurred.
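The two properties of the negative logarithm just described (zero surprisal for certain events, and additivity across independent events) can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms as the dice figures imply; the function name `surprisal` is introduced here purely for illustration.

```python
import math


def surprisal(p):
    """Surprisal in bits of an event with probability p, where 0 < p <= 1.

    Computed as log2(1/p), which equals -log2(p).
    """
    return math.log2(1 / p)


# An event that was certain to happen carries no information.
print(surprisal(1.0))  # 0.0

# Fair four-sided and eight-sided dice, and their 32 joint outcomes.
four_sided = surprisal(1 / 4)
eight_sided = surprisal(1 / 8)
joint = surprisal(1 / 32)
print(four_sided, eight_sided, joint)  # 2.0 3.0 5.0

# Additivity: the surprisals of the two independent events sum to the
# surprisal of the joint event.
print(four_sided + eight_sided == joint)  # True

# Higher probability means lower surprisal.
print(surprisal(0.6) < surprisal(0.3))  # True
```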
But since the linking hypothesis in (8) relies on just the qualitative idea that higher surprisal values correlate with greater comprehension difficulty/load, these numerical details are not particularly significant; the important guiding idea is simply that lower (conditional) probabilities correlate with greater comprehension difficulty/load.

To take a concrete and complete example, suppose we are interested in calculating word-by-word surprisal values for the sentence ‘John saw it’. To illustrate the division of labor between the specification of a probability distribution over sentences and the calculation of surprisal values from those probabilities, for this first example I will suppose that the comprehender’s expectations about the sentences he or she might encounter are simply specified by a finite lookup table, shown in (9). In Section 12.2.2 I will turn to considering the case where this specification takes the form of a grammar instead.

(9)
0.4    John ran
0.15   John saw it
0.05   John saw them
0.25   Mary ran
0.1    Mary saw it
0.05   Mary saw them

9 See the references in footnote 5 for much more detailed discussion of the ideas briefly introduced in this paragraph.


At the first word, ‘John’, there is no other information about the sentence to condition upon, so the relevant probability is simply the sum of all the probabilities of sentences that have ‘John’ as their first word.

(10) surprisal at ‘John’ $= -\log P(W_1 = \text{John}) = -\log(0.4 + 0.15 + 0.05) = -\log 0.6 = 0.74$

At the second word, ‘saw’, we need to consider probabilities conditioned upon the fact that the first word of the sentence is already known to be ‘John’; concretely, this means that we restrict attention to the first three lines of the table in (9), and ask how much of the 0.6 probability mass there lies with sentences that furthermore have ‘saw’ as their second word. The resulting probability, 0.33, is lower than the relevant probability at the first word, 0.6; accordingly, the surprisal value is higher here (1.58 vs. 0.74), and greater comprehension difficulty at this word is predicted.

(11)
0.4    John ran
0.15   John saw it
0.05   John saw them

surprisal at ‘saw’ $= -\log P(W_2 = \text{saw} \mid W_1 = \text{John}) = -\log \frac{0.15 + 0.05}{0.4 + 0.15 + 0.05} = -\log 0.33 = 1.58$

Similarly, at the third word we restrict attention to the two lines of the table that are still “in play.” The probability of 0.75 is higher than both of the two previous probabilities, and so the surprisal value here is lower than both of the two previous surprisal values.

(12)
0.15   John saw it
0.05   John saw them

surprisal at ‘it’ $= -\log P(W_3 = \text{it} \mid W_1 = \text{John}, W_2 = \text{saw}) = -\log \frac{0.15}{0.15 + 0.05} = -\log 0.75 = 0.42$

So the adoption of surprisal, and the linking hypothesis in (8), has taken us from the hypothesized probability distribution in (9) to (what I will take to be) testable predictions concerning the sentence ‘John saw it’: comprehension difficulty will be greatest at the word ‘saw’ (surprisal 1.58), lowest at the word ‘it’ (surprisal 0.42), and in between at the word ‘John’ (surprisal 0.74).10 The general idea is illustrated schematically in Figure 12.1. So surprisal amounts to a way to test the fit of a hypothesized probability distribution with some observations or data.
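The calculations in (10)–(12) can be reproduced mechanically from the lookup table in (9). The sketch below is one way to do so, assuming base-2 logarithms (which is what the values 0.74, 1.58, and 0.42 imply); the names `SENTENCE_PROBS`, `prefix_probability`, and `word_surprisals` are hypothetical and introduced only for this illustration.

```python
import math

# The lookup table in (9): each sentence (a tuple of words) and its probability.
SENTENCE_PROBS = {
    ("John", "ran"): 0.4,
    ("John", "saw", "it"): 0.15,
    ("John", "saw", "them"): 0.05,
    ("Mary", "ran"): 0.25,
    ("Mary", "saw", "it"): 0.1,
    ("Mary", "saw", "them"): 0.05,
}


def prefix_probability(prefix, table):
    """Total probability of all sentences that begin with the given words."""
    n = len(prefix)
    return sum(p for sentence, p in table.items() if sentence[:n] == tuple(prefix))


def word_surprisals(sentence, table):
    """Surprisal (in bits) at each word, conditioned on the preceding words."""
    values = []
    for i in range(1, len(sentence) + 1):
        # P(w_i | w_1 ... w_{i-1}) = P(w_1 ... w_i) / P(w_1 ... w_{i-1})
        conditional = (prefix_probability(sentence[:i], table)
                       / prefix_probability(sentence[:i - 1], table))
        values.append(-math.log2(conditional))
    return values


sentence = ("John", "saw", "it")
for word, value in zip(sentence, word_surprisals(sentence, SENTENCE_PROBS)):
    print(f"{word}: {value:.2f}")
# John: 0.74
# saw: 1.58
# it: 0.42
```

Nothing in `word_surprisals` depends on the probabilities coming from a table: any other way of computing prefix probabilities could be substituted for `prefix_probability`.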

10 It is a simplification to call these “testable predictions”: there are of course still decisions to be made about how comprehension difficulty will be measured (e.g. reading times, electrophysiological responses); how surprisal values are assumed to relate to these measures (e.g. does a twice-as-large surprisal value predict, all else being equal, a twice-as-large reading time?); whether these surprisal values are assumed to be the sole contributing factor to these measures or one of a collection of interacting factors; etc. To be more careful we could perhaps use the term predictor for calculated values like surprisal, and reserve the term prediction for something more methodologically fleshed-out. But I will gloss over this distinction for ease of exposition.

[Figure 12.1: Probability distribution over sentences → (surprisal plus linking hypothesis (8)) → Predictions about sentence comprehension difficulty]

Fig. 12.1 We can use surprisal to formulate a linking hypothesis which, taken together with a probability distribution over sentences, produces empirical predictions about sentence comprehension difficulty.

But via familiar logic, if there are unboundedly many relevant sentence-probabilities, then a finite mind cannot directly encode all the probabilities in a table like (9). The probabilities, like the sentences they are associated with, will have to instead be encoded in some finite system of rules. This system of rules would specify some collection of “primitive” probabilities (e.g. that the probability of the first word of a sentence being ‘John’ is 0.1, or that the probability of a verb being ‘run’ is 0.2) and recipes for deriving other “composite” probabilities from those. Independent of the particular numbers entering into the calculation of a sentence’s probability, the structure of such a system of primitives and recipes constrains the range of probability distributions that can be defined. A natural way to take grammars to be part of what something like surprisal is testing is to realize that they amount to a hypothesis about the structure of this system.11

11 This connection is reflected in the terminological overlap between “generative grammar” and “generative probabilistic model.”

12.2.2 The role of grammars

The aim in what follows is to illustrate how a grammar can play a role in defining a probability distribution over sentences; as a result, a grammar will play a role in determining surprisal predictions, which means in turn that empirical tests of surprisal predictions can provide evidence for or against hypothesized grammars. The machinery that we have introduced so far is entirely agnostic about the source of the probabilities that enter into these calculations (as illustrated in Figure 12.1), and this is what gives metrics like surprisal the high degree of versatility mentioned in the introduction: surprisal can be calculated on the basis of probabilities drawn from a finite lookup table as above, or probabilities defined by a relatively simple grammar (e.g. a collection of allowed bigrams, or a finite state grammar, as we will see shortly), or probabilities defined by a more linguistically sophisticated grammar (e.g. a context free grammar or a minimalist grammar). This formalism-neutrality is what makes it possible for information-theoretic metrics like surprisal to act as the playing field on which any two grammars can be pitted against each other, no matter how different the two grammars are in their internals; see Figure 12.2.

[Figure 12.2: two parallel chains, one for each of two hypothesized mental grammars. Each hypothesized mental grammar determines a probability distribution over sentences, and each distribution, via surprisal plus the linking hypothesis (8), yields predictions about sentence comprehension difficulty.]

fig. 12.2 Since surprisal can act as a test of probability distributions and probability distributions can be seen as consequences of hypothesized grammars, surprisal can act as a test of hypothesized grammars.

Note in particular that even though the calculation of surprisal values appeals to the notion of transitioning from one word to the next linearly adjacent word, there is no assumption that the grammatical knowledge that the comprehender brings to the task takes the form of statements about which words can and cannot, or are likely or unlikely to, linearly follow certain other words (in the manner familiar from n-gram models, for example). The information in the table in (9) does not take this form, but still provides everything we need to know in order to calculate surprisal values. Similarly, the information in grammars that work with hierarchical structures rather than linear structures also provides what we need in order to make surprisal calculations. The linear nature of the metric reflects the linear nature of the external sentence-comprehension task, not any assumptions about the internal knowledge being recruited by the comprehender.

In order for surprisal predictions to serve as a test of hypothesized grammars, of course, it must be the case that different grammars have different effects on the probability distributions that go into calculating surprisal. Obviously choosing a particular grammar (in the traditional, non-probabilistic sense) does not pick out a particular probability distribution; but adopting a particular grammar—breaking down the specification of a set of expressions with accompanying probabilities into a particular system of interlocking rules—does constrain the range of probability distributions that might arise. For example, if our grammar expresses the assumption that generating a sentence amounts to generating a subject and generating a predicate independently (roughly, think of something like ‘S → NP VP’), then there will be no way to attach probabilities to this grammar in a way that assigns the six sentences in (9) the particular probabilities that they have there.12 To see this, notice that in order to generate ‘John saw it’ and ‘John saw them’ with the probabilities shown (0.15 and 0.05, respectively), our grammar would need to generate the predicate ‘saw it’ with a probability exactly three times greater than the probability of the predicate ‘saw them’, because these two sentences differ only in the chosen predicate. (We don’t know whether these two probabilities should be, for example, 0.3 and 0.1, in which case the probability of the subject ‘John’ would need to be 0.5; or whether they should be 0.6 and 0.2, with ‘John’ having probability 0.25, etc. But we do know that they need to stand in this 3:1 ratio.) But also, in order to generate ‘Mary saw it’ and ‘Mary saw them’ with the probabilities shown, our grammar would need to generate ‘saw it’ with a probability exactly twice the probability of ‘saw them’. So this grammatical assumption about the structure of the sentences commits us to working within a restricted range of probability distributions over that set of six sentences, which excludes the particular probability distribution shown in (9). All else being equal, then, discovering that the surprisal values computed from the probabilities in (9) make correct predictions would constitute evidence against analyzing the subject and predicate as independent subparts of a sentence.

To begin to look at concrete examples involving grammars, let us first consider the finite-state automaton (FSA) in (13). It’s useful to begin by looking at FSAs, rather than more sophisticated kinds of grammars, for a couple of reasons.
The first reason is that it allows us to see clearly that the crucial property of a grammar is the way it factors out the generative work into a structured system of interlocking pieces, rather than any particular representations that are built; these two notions are often intertwined in grammars that manipulate tree structures. The second reason is that having seen surprisal calculated on the basis of two distinct grammatical systems—first FSAs, and then PCFGs below—it is easier to get a clear grasp on the idea of surprisal itself, as opposed to its incarnation in relation to any specific kind of grammar, and this in turn makes it easier to understand what it will take to operationalize the idea in whatever particular linguistically sophisticated grammatical system one might wish to (e.g. some version of minimalist syntax).

12 By taking this probabilistic independence assumption to be part and parcel of what is meant by the grammatical rule ‘S → NP VP’, I am leaving aside certain more complex alternatives that would assign customized probabilities to certain specialized combinations of NPs and VPs. While such alternatives can certainly be formulated, they amount to proposing a distinct generative mechanism that contains corresponding specialized rules, e.g. ‘S → NP5 VP3’ and ‘S → NP4 VP7’, and therefore differs from a grammar that contains only the rule ‘S → NP VP’. The redundancy created by this move requires its own justification. But to the extent that we would like to leave this option open, when I say that adopting a particular grammar commits us to a restricted range of probability distributions, we should replace this with the claim that it makes that restricted set of distributions more parsimonious hypotheses than the alternatives.

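(13) [FSA diagram; start state 1; states 3, 5, and 6 have no outgoing arcs. Arcs are listed as source state → target state, word, probability.]

1 → 2   John   0.6
1 → 2   Mary   0.4
2 → 3   ran    0.25
2 → 4   saw    0.75
4 → 5   it     0.7
4 → 6   them   0.3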

This FSA encodes the idea that a sentence is generated by independently generating a subject (corresponding to the choice of how to transition out of the start state) and generating a predicate (corresponding to the choice of how to get from the next state to some end state). It is this decomposition of the generative mechanisms into independent sub-parts that we can think of as attributing grammatical structure to the generated expressions; this is what the table in (9) does not do. There is a range of probability distributions that can be defined by putting probabilities on the transitions in (13), and the particular probabilities shown in the diagram pick out one of these. But as mentioned above, the probability distribution in (9) is not in this range, since any probability distribution defined on the basis of the FSA in (13) will have the property that

\[ \frac{P(\text{John saw it})}{P(\text{John saw them})} = \frac{P(\text{Mary saw it})}{P(\text{Mary saw them})} \]

and this is not true of the distribution in (9). So, leaving aside the particular probabilities shown in (13), the example distribution is inconsistent with the hypothesized discrete grammatical structure that is encoded in the “shape” of the FSA in (13), and accordingly observations about sentence comprehension difficulty that match up with surprisal values calculated from this distribution constitute (all else being equal) evidence against that hypothesized grammatical structure. (Similarly, observations that are in accord with surprisal predictions based on a probability distribution defined by (13) will, all else being equal, constitute evidence in favor of it.) The relative inflexibility of the FSA in (13) with regard to definable probability distributions is a direct consequence of the way it assigns more grammatical structure than the table in (9).
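The constraint just described can be checked mechanically. The following is a minimal sketch, assuming the FSA in (13) is encoded as a dictionary from states to outgoing arcs (the names `FSA_13` and `sentences` are illustrative, not part of the chapter). It enumerates the six sentences with their probabilities and confirms that the it/them ratio is the same for both subjects.

```python
# Arcs of the FSA in (13): state -> list of (word, probability, next_state).
# The dictionary layout and the function name are illustrative assumptions.
FSA_13 = {
    1: [("John", 0.6, 2), ("Mary", 0.4, 2)],
    2: [("ran", 0.25, 3), ("saw", 0.75, 4)],
    4: [("it", 0.7, 5), ("them", 0.3, 6)],
}

def sentences(fsa, state=1, words=(), prob=1.0):
    """Yield (sentence, probability) for each path ending in a state with no outgoing arcs."""
    arcs = fsa.get(state, [])
    if not arcs:
        yield " ".join(words), prob
        return
    for word, p, nxt in arcs:
        yield from sentences(fsa, nxt, words + (word,), prob * p)

dist = dict(sentences(FSA_13))
for sentence, p in dist.items():
    print(f"{sentence:15s} {p:.3f}")

# Whatever arc probabilities are chosen, these two ratios coincide,
# because the predicate is generated independently of the subject.
john_ratio = dist["John saw it"] / dist["John saw them"]
mary_ratio = dist["Mary saw it"] / dist["Mary saw them"]
print(john_ratio, mary_ratio)
```

Changing the numbers on the arcs changes the distribution but never breaks the equality of the two ratios, which is why a distribution with a 3:1 ratio for one subject and a 2:1 ratio for the other lies outside the range this FSA can define.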
Another way to appreciate this last point is to note that, in order to choose a particular probability distribution over sentences from the range of distributions made available by the FSA in (13), it suffices to choose three component probabilities: if we choose probabilities for, say, the arcs labeled ‘John’, ‘ran’, and ‘it’ (or any other choice of one arc out of each of the three branching states), then all the other necessary probabilities are determined, since each state’s outgoing arcs must have probabilities that sum to one. In contrast, to choose a particular probability distribution given the table format in (9), one must choose five component probabilities: with five sentence probabilities fixed, the requirement that probabilities sum to one determines the sixth sentence probability. So the structure of the FSA in (13) leaves us with only three parameters, or degrees of freedom, whereas the simple lookup table gives us five.

It is perhaps unusual to invoke any notion of grammatical structure when dealing with FSAs, given their linear nature. Re-expressing the FSA in (13) as a collection of rewrite rules (‘X1 → John X2’, ‘X1 → Mary X2’, ‘X2 → ran X3’, etc.) may help to counter any tendency to construe “linear” and “structured” as in opposition. But the significant point, notation aside, is that this grammar makes the meaningful claim that certain pairs of distinct sentences—such as the pair ‘John saw it’ and ‘Mary saw it’—have their predicate part in common. In the familiar tree-structure notation, this surfaces as the claim that those two sentences’ tree structures have a certain subtree in common; in the FSA setting, it surfaces as the claim that the state sequences used to generate the two sentences have a certain subsequence in common. The table in (9) does not make any such claim about any two sentences, and it is in this sense that the FSA ascribes more grammatical structure (leading to fewer degrees of freedom in specifying probability distributions). The bread-and-butter of grammatical analysis is making claims of this sort about shared structure—typically, of course, in more complex grammatical frameworks.

Although this important general idea about generative structure underlies the way any kind of grammar might be used to define a probability distribution, the way in which surprisal values are determined from (the distribution defined by) a grammar will vary from one kind of grammar to the next. It turns out to be rather simple if the grammar takes the form of an FSA.13 For example, using the FSA in (13) and given the sentence ‘John saw it’, the probability relevant to surprisal at the word ‘it’ can be simply “read off” the corresponding arc in the diagram: 0.7.
But for the purposes of understanding how surprisal predictions can be calculated for other kinds of grammars—and understanding how related complexity metrics other than surprisal, not discussed here, are calculated14—a different way of thinking about this is more useful. (While the linear nature of FSAs does not disqualify them from serving as a useful illustration of the general notion of “grammatical structure” discussed above, it does make the task of extracting surprisal values from them deceptively simple. The goal of starting with FSAs was to make the first issue clear.) Instead of thinking about an ant walking along the arcs in the diagram, such that the surprisal at a particular word is determined simply by the probability labeling the arc that the ant must walk along, a useful more general notion is to think of the grammatical possibilities narrowing down as more information about the sentence being encountered is revealed. In particular, this can take the form of considering a “narrowing down” of the grammar itself: If we take the grammar above and “prune out” parts of the grammar that are not consistent with the first word being ‘John’, for example, we get the FSA in (14a). Then going one step further, further “pruning” the grammar to generate only sentences that are, in addition, consistent with the second word being ‘saw’, produces the FSA in (14b). And similarly for the third word, which leaves us with (14c), an FSA which generates only the sentence being processed.

(14) [FSA diagrams; arcs are listed as source state → target state, word, probability.]

a. 1 → 2 John 0.6; 2 → 3 ran 0.25; 2 → 4 saw 0.75; 4 → 5 it 0.7; 4 → 6 them 0.3

b. 1 → 2 John 0.6; 2 → 4 saw 0.75; 4 → 5 it 0.7; 4 → 6 them 0.3

c. 1 → 2 John 0.6; 2 → 4 saw 0.75; 4 → 5 it 0.7

13 I am restricting attention here to deterministic FSAs, for simplicity. The situation for nondeterministic FSAs is not significantly different, since a nondeterministic FSA can always be converted to an equivalent deterministic one. See e.g. Rabin and Scott (1959: 121), Sipser (1997: 54), Hopcroft and Ullman (1979: 22), Partee et al. (1990: 462).

14 See e.g. the discussion of how entropy reduction is calculated in Hale (2006: 648) and Yun et al. (2015: 125–127).

What this provides is a sequence of grammars, each of which (in a certain relatively externalistic, but nonetheless useful sense) characterizes the comprehender’s knowledge state at a particular point in the sentence: (14a), for example, represents combining the static “background” knowledge encoded in the original grammar in (13) (grammatical knowledge in the familiar sense), with the knowledge that the first word of the sentence currently being processed is ‘John’. More precisely, the set of expressions generated by (14a) is the intersection of (a) the set of expressions generated by the original grammar in (13), and (b) the set of expressions whose pronunciation begins with the word ‘John’. For this reason, we can refer to (14a) as an “intersection grammar.” Similarly, (14b) is the intersection grammar that brings together the background knowledge in (13) and the information that the first two words are ‘John saw’.15

Notice now that if we sum the probabilities assigned to all the sentences generated by (13) the total is 1.0, but if we do the same for the subsequent intersection grammars the total is less than 1—0.6 in (14a) and 0.6 × 0.75 = 0.45 in (14b).16 The fact that, out of the 1.0 probability mass in (13), there is 0.6 remaining in (14a), is precisely the fact that the probability of a sentence generated by (13) beginning with the word ‘John’ is 0.6—which is to say, the fact that the surprisal at the first word is −log 0.6.

\[ \text{surprisal at ‘John’} = -\log\left(\frac{\text{total probability mass in (14a)}}{\text{total probability mass in (13)}}\right) = -\log\left(\frac{0.6}{1.0}\right) = -\log 0.6 = 0.74 \]

And similarly:

\[ \text{surprisal at ‘saw’} = -\log\left(\frac{\text{total probability mass in (14b)}}{\text{total probability mass in (14a)}}\right) = -\log\left(\frac{0.6 \times 0.75}{0.6}\right) = -\log 0.75 = 0.42 \]

\[ \text{surprisal at ‘it’} = -\log\left(\frac{\text{total probability mass in (14c)}}{\text{total probability mass in (14b)}}\right) = -\log\left(\frac{0.6 \times 0.75 \times 0.7}{0.6 \times 0.75}\right) = -\log 0.7 = 0.51 \]

So generally:

(15)
\[ \text{surprisal at word } i = -\log\left(\frac{\text{total probability mass in } G_i}{\text{total probability mass in } G_{i-1}}\right) \]

where $G_i$ is the intersection grammar that combines the comprehender’s “static” mental grammar with the first i observed words of the relevant sentence.

15 Formally, it’s useful to think of the sentence-prefix ‘John saw’ being represented by a very simple FSA that only has three states: The first state is its start state, and it has only one outgoing transition, which emits ‘John’ and leads to the second state; this second state has only one outgoing transition, which emits ‘saw’ and leads to the third, final state; and this third state has a self-loop that can emit any word at all (see e.g. Hale 2006: 648). This FSA generates all word-sequences that begin with ‘John saw’. And given any two FSAs, there exists a simple mechanical procedure for constructing their intersection, i.e. a new FSA that generates precisely what the original two FSAs both generate (via the “cross-product construction”; Rabin and Scott 1959: 119). So it is straightforward to produce the FSAs in (14) using this procedure.

16 So strictly speaking these intersection grammars do not define their own probability distributions over the generated sentences; they merely encode certain subsets of the distribution defined by (13).
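The cross-product construction mentioned in footnote 15 can be sketched as follows. This is a minimal illustration rather than the construction as given in the cited sources: the dictionary encoding of (13), the treatment of states 3, 5, and 6 as end states, and all function names are assumptions of the sketch. Product states are pairs of a state of (13) and a state of the prefix FSA, and an arc survives only if both automata allow it.

```python
# FSA (13) as state -> [(word, probability, next_state)]. This encoding, and
# treating states 3, 5 and 6 as end states, are assumptions of the sketch.
FSA_13 = {
    1: [("John", 0.6, 2), ("Mary", 0.4, 2)],
    2: [("ran", 0.25, 3), ("saw", 0.75, 4)],
    4: [("it", 0.7, 5), ("them", 0.3, 6)],
}
END_STATES = {3, 5, 6}

def prefix_step(prefix, state, word):
    """Transition function of the prefix FSA: states 0..len(prefix),
    with a self-loop on any word at the last state."""
    if state == len(prefix):
        return state
    return state + 1 if word == prefix[state] else None

def intersect_with_prefix(fsa, end_states, prefix, start=1):
    """Cross-product construction; product states are (fsa_state, prefix_state).
    Dead-end states are not trimmed (none arise for the prefixes used here)."""
    arcs, seen, agenda = {}, {(start, 0)}, [(start, 0)]
    while agenda:
        q, r = agenda.pop()
        for word, p, q_next in fsa.get(q, []):
            r_next = prefix_step(prefix, r, word)
            if r_next is None:
                continue  # the prefix FSA has no matching arc, so this arc is pruned
            arcs.setdefault((q, r), []).append((word, p, (q_next, r_next)))
            if (q_next, r_next) not in seen:
                seen.add((q_next, r_next))
                agenda.append((q_next, r_next))
    ends = {(q, r) for q, r in seen if q in end_states and r == len(prefix)}
    return arcs, ends

arcs_14a, ends_14a = intersect_with_prefix(FSA_13, END_STATES, ("John",))
arcs_14b, ends_14b = intersect_with_prefix(FSA_13, END_STATES, ("John", "saw"))
for source, outgoing in sorted(arcs_14b.items()):
    for word, p, target in outgoing:
        print(source, word, p, target)
```

For the prefix ‘John saw’ the surviving arcs are those labeled ‘John’, ‘saw’, ‘it’, and ‘them’, with their original probabilities, which is the pruned automaton described as (14b).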


Connecting back to the setup in Section 12.1, we have now extracted word-by-word surprisal values for the sentence ‘John saw it’ both from the simple table in (9) and from the FSA in (13). The former yielded the three-tuple ⟨0.74, 1.58, 0.42⟩, and the latter yielded ⟨0.74, 0.42, 0.51⟩. These amount to two distinct predictions, extracted by applying a single complexity metric to two different hypotheses about the “underlying nature” of a single sentence (though see footnote 10). (These two predictions came from sources of “different kinds,” namely a lookup table and an FSA, but of course we could begin with two distinct FSAs and extract distinct predictions.) This is analogous to the way the node ratio metric was applied to the two different hypothesized tree structures for the sentence ‘a b c d’ in (4), to extract the distinct values 7/4 and 5/4. The linking hypothesis in (8) connects word-by-word reading-time measures to the individual values in a sequence like ⟨0.74, 0.42, 0.51⟩, and therefore allows reading-time measures to serve as evidence, all else being equal, for or against particular grammatical hypotheses.
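The formulation in (15) can be sketched directly in code. The example below is a minimal illustration under stated assumptions: the FSA in (13) is encoded as a dictionary, the function names are hypothetical, and logarithms are taken to base 2 (the base that reproduces the value 0.74 for ‘John’). Rather than building each intersection grammar explicitly, `prefix_mass` sums the probability of every path consistent with the words observed so far, which is the total probability mass remaining in the corresponding intersection grammar.

```python
import math

# FSA (13) as state -> [(word, probability, next_state)]; the encoding is an assumption.
FSA_13 = {
    1: [("John", 0.6, 2), ("Mary", 0.4, 2)],
    2: [("ran", 0.25, 3), ("saw", 0.75, 4)],
    4: [("it", 0.7, 5), ("them", 0.3, 6)],
}

def prefix_mass(fsa, prefix, start=1):
    """Total probability of all sentences generated by `fsa` that begin with `prefix`."""
    def walk(state, i):
        arcs = fsa.get(state, [])
        if not arcs:  # end state: the sentence is complete
            return 1.0 if i >= len(prefix) else 0.0
        total = 0.0
        for word, p, nxt in arcs:
            if i < len(prefix) and word != prefix[i]:
                continue  # arc pruned by the observed word
            total += p * walk(nxt, i + 1)
        return total
    return walk(start, 0)

def surprisals(fsa, sentence):
    """Word-by-word surprisal as in (15): -log2 of the ratio of successive masses."""
    words = sentence.split()
    masses = [prefix_mass(fsa, words[:i]) for i in range(len(words) + 1)]
    return [-math.log2(masses[i] / masses[i - 1]) for i in range(1, len(words) + 1)]

values = surprisals(FSA_13, "John saw it")
print([round(v, 2) for v in values])
```

Because each surprisal value depends only on a ratio of remaining masses, the same `surprisals` function would work unchanged with any other way of computing `prefix_mass`, which is the sense in which (15) is independent of the grammar formalism.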

12.2.3 Hierarchical structure

The important thing to note about the formulation of surprisal in (15) is that it provides us with a recipe for calculating incremental surprisal values on the basis of a probabilistic grammar which does not have the “which word comes next?” structure that FSAs have. For any type of grammar one might be interested in (i.e. any system a syntactician might have devised for breaking down the generative work that gives rise to a sentence), as long as we are able to

• intersect a grammar with a given initial portion of a sentence (i.e. the first n words), and
• calculate the total probability mass remaining in such a grammar,

then we will be able to calculate incremental surprisal values from that kind of grammar. And of course, the particular structured arrangement of interlocking pieces that a grammar posits will impose constraints on the definable probability distributions that carry on through each of these intersection grammars, and therefore have an impact on the calculated surprisal values.

It turns out that both of these things can be done with many other familiar kinds of grammars that are more powerful than FSAs, such as (probabilistic) context-free grammars (PCFGs). The details for PCFGs are less intuitive, but a couple of illustrative examples may serve to convey the basic idea. To begin, it is not difficult to see how to intersect the context-free grammar in (16) with the prefix ‘John’ to produce the grammar in (17), since these two grammars are equivalent to the FSAs in (13) and (14a) respectively. We simply remove the rule that generates ‘Mary’. (Although note that things would be more complicated in an FSA where it was possible to return to the start state.)

(16)
0.4   X1 → Mary X2
0.6   X1 → John X2
0.25  X2 → ran
0.75  X2 → saw X4
0.7   X4 → it
0.3   X4 → them

(17)
0.6   X1 → John X2
0.25  X2 → ran
0.75  X2 → saw X4
0.7   X4 → it
0.3   X4 → them

What distinguishes a CFG from an FSA, however, is its ability to depart from the strictly “right-branching” kind of structure that (16) generates. Consider for example the more interesting grammar in (18). An example of a structure that it generates is shown in (19).

(18)
1.0   S  → NP VP
0.6   NP → John
0.4   NP → D N
0.8   D  → the
0.2   D  → a
0.7   N  → cat
0.3   N  → dog
0.5   VP → V NP
0.3   VP → V S
0.2   VP → left
1.0   V  → believes

(19) [S [NP [D the] [N cat]] [VP [V believes] [S [NP John] [VP left]]]]

Intersecting this grammar with the one-word prefix ‘the’ is less straightforward than the first example above, because the remaining portion of the sentence is not required to be a single constituent. (In fact, in this grammar, it will never be a single constituent.) Furthermore, although it is clear from the appearance of ‘the’ as the first word that some NP node must be expanded according to the rule ‘NP → D N’, we cannot assume that all NP nodes will be expanded according to this rule and ignore the other NP rules, in the way that we ignored “roads not taken” in the simpler example. A mechanical procedure exists for solving these problems, however (Bar-Hillel et al. 1961).17 The result, for the prefix ‘the’, is shown in (20). It contains, in addition to all of the “original” rules from (18) (shown on the right), three new rules (shown on the left) which describe how the new non-terminals S′, NP′ and D′ are used. Note also that the start symbol of this new grammar is S′ (not S).

(20)
1.0  S′  → NP′ VP      1.0  S  → NP VP
                       0.6  NP → John
0.4  NP′ → D′ N        0.4  NP → D N
0.8  D′  → the         0.8  D  → the
                       0.2  D  → a
                       0.7  N  → cat
                       0.3  N  → dog
                       0.5  VP → V NP
                       0.3  VP → V S
                       0.2  VP → left
                       1.0  V  → believes

17 For other presentations see Grune and Jacobs (2008: ch. 13), Nederhof and Satta (2003: §4), and Nederhof and Satta (2008b: §3). Specifically, the issue here is how to intersect a prefix FSA with a context-free grammar, just as we earlier (footnote 15) needed to intersect a prefix FSA with another FSA.

Each of the three new rules can be thought of as a specialized instance of a rule from the original grammar (and they are shown alongside the corresponding original rules in (20)). Since the start symbol of the new grammar is S′, these new rules amount to instances of the original rules that are forced to apply; for example, the NP′ that is the left daughter of the root S′ can only be expanded as D′ and N (not as ‘John’). The upshot is that any derivation beginning with the new start symbol S′ must produce at least this partial structure:

(21) [S′ [NP′ [D′ the] N] VP]

The “primed” non-terminals represent non-terminals whose expansion is somehow restricted by the input we have seen so far (in this case, the word ‘the’). Notice that since all the primed non-terminals have now been eliminated, and all that remains is to somehow expand an N node and a VP node, the rest of the derivation can proceed with all of the freedom that it would have if we were using the original grammar in (18), i.e. using the rules shown on the right in (20).18

Besides constructing these intersection grammars, recall that the second important prerequisite for deriving surprisal values is being able to calculate the total probability mass that is “left” in an intersection grammar such as the one in (20). The three new rules shown on the left have the same probabilities as the original rules they are based on, so the total probability assigned to all sentences generated by this grammar is less than 1 just as it was for the intersection grammars in (14). In the particular case of (20) it is perhaps not difficult to see that the total probability assigned by this intersection grammar is 1.0 × 0.4 × 0.8, since these are the probabilities of the three rules that we have been forced to use in order to generate ‘the’ as the first word. Methods for computing this probability in general are discussed in detail by Nederhof and Satta (2008a).19 We therefore have everything we need to compute word-by-word surprisal values from a PCFG, following the formulation in (15).

Finally, it is worth stressing again that while I have restricted attention to surprisal here, there are other ways in which one might choose to quantify cognitive workload based on how the range of grammatical possibilities is affected by each new incoming word, i.e. the “difference” or “change” between two intersection grammars. Surprisal is merely the simplest choice for illustration. A notable alternative is entropy reduction, which can be expressed as in (22), analogous to the expression of surprisal in (15) above.
(22)
\[ \text{entropy reduction at word } i = \text{entropy of } G_{i-1} - \text{entropy of } G_i \]

So calculating entropy reduction values from a grammar involves computing the same intersection grammars as we have used above, and differs from calculating surprisal values only in that (i) we calculate the entropy of each $G_i$ rather than its remaining probability mass, and (ii) the “change” is quantified by the result of a subtraction rather than a division.20 Nearly everything I have said about the overall approach outlined in this section is therefore equally applicable to entropy reduction, and to many other imaginable variants of these metrics.

18 The procedure for constructing these intersection grammars has turned out to be surprisingly closely related to certain approaches to parsing, in particular tabular parsing of the sort used in the well-known CKY algorithm. The key idea is that parsing reduces to intersection with an FSA that generates exactly one sentence; see Grune and Jacobs (2008: ch. 13), Lang (1988), Billot and Lang (1989). The CKY algorithm is due to Cocke and Schwartz (1970), Kasami (1965), and Younger (1967); see also Hopcroft and Ullman (1979: 139–141), Aho and Ullman (1972: 314–320), Jurafsky and Martin (2000: 453–455), Grune and Jacobs (2008: §4.2).

19 Other methods of calculating prefix probabilities from a PCFG, presented without reference to intersection grammars, were described by Jelinek and Lafferty (1991) and Stolcke (1995). Goodman (1998: 71–77) gives another solution that more closely resembles one based on intersection grammars. In light of the connection discussed in footnote 18, the distinction between methods that use intersection grammars and those that don’t largely dissolves; intersection grammars are just one way to look at what needs to be done. Nederhof and Satta (2008b) review the explicit construction of intersection grammars in §3, the calculation of remaining probability mass (the “partition function”) in §2, and Jelinek and Lafferty’s distinct method in §7.
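For a small grammar such as (18), the remaining probability mass after a prefix can be computed without constructing an intersection grammar at all. The sketch below is not the Bar-Hillel construction or any of the cited general methods; it is a simple recursive expansion of the leftmost symbol, and it relies on two assumptions that hold for (18) but not for PCFGs in general: the grammar has no left recursion, and every non-terminal eventually rewrites to terminal strings with total probability 1. The encoding and function names are illustrative.

```python
import math

# PCFG (18): non-terminal -> list of (probability, right-hand side).
PCFG_18 = {
    "S":  [(1.0, ("NP", "VP"))],
    "NP": [(0.6, ("John",)), (0.4, ("D", "N"))],
    "D":  [(0.8, ("the",)), (0.2, ("a",))],
    "N":  [(0.7, ("cat",)), (0.3, ("dog",))],
    "VP": [(0.5, ("V", "NP")), (0.3, ("V", "S")), (0.2, ("left",))],
    "V":  [(1.0, ("believes",))],
}

def prefix_prob(grammar, prefix, symbols=("S",)):
    """Probability mass of derivations from `symbols` whose yield begins with `prefix`.

    Assumes no left recursion, and that unexpanded symbols contribute a
    factor of 1 once the whole prefix has been matched."""
    if not prefix:
        return 1.0
    if not symbols:
        return 0.0
    first, rest = symbols[0], symbols[1:]
    if first not in grammar:  # terminal symbol: must match the next prefix word
        return prefix_prob(grammar, prefix[1:], rest) if first == prefix[0] else 0.0
    return sum(p * prefix_prob(grammar, prefix, rhs + rest) for p, rhs in grammar[first])

def surprisals(grammar, words):
    """Word-by-word surprisal as in (15), with base-2 logarithms (an assumption)."""
    masses = [prefix_prob(grammar, tuple(words[:i])) for i in range(len(words) + 1)]
    return [-math.log2(masses[i] / masses[i - 1]) for i in range(1, len(words) + 1)]

mass_the = prefix_prob(PCFG_18, ("the",))
print(mass_the)
print(surprisals(PCFG_18, "the cat believes John left".split()))
```

For the prefix ‘the’ the function multiplies exactly the three forced rule probabilities, 1.0 × 0.4 × 0.8, matching the mass attributed to the intersection grammar in (20).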

12.3 Automata-theoretic parsing models


In this section I will introduce three distinct parsing methods for context-free grammars: bottom-up, top-down, and left-corner. The first two, bottom-up and top-down, are very simplistic and not particularly plausible as cognitive models, but they are useful stepping-stones for understanding the more complicated left-corner method, which better corresponds to (at least certain aspects of) human parsing.21

All three of the methods discussed here will be expressed as transition systems: Given the task of parsing a particular sentence using a particular grammar, each method will specify a particular starting configuration and a particular goal configuration; and in addition, will specify the allowable transitions by which we can progress from one configuration to another. If there is a sequence of transitions that leads from the starting configuration to the goal configuration, then the sentence is grammatical (and the sequence of transitions determines its structural description); otherwise, it is ungrammatical (i.e. has no structural description).

Note that what I am describing here as a parsing method takes both a sentence and a grammar as “inputs.” The parsing methods themselves make no reference to particular words or particular grammatical categories (such as NP or VP); rather, they are able to operate with whatever context-free grammar one provides. This contrasts with the way the term “parser” is sometimes used, namely to refer to a mechanism that takes only a sentence as an input and attempts to find an analysis for that sentence according to some grammar that is implicitly specified within the workings of that mechanism.

What these parsing methods provide is something like a “Marr (1982) level two” description of a mechanism for processing sentences. These algorithmic details provide plenty of hooks on which we can hang various linking hypotheses. The simple one that I will end up appealing to in this section is based on the idea that comprehension difficulty arises when the amount of information that needs to be simultaneously stored in order to grammatically analyze a sentence is large.
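The notion of a transition system can be made concrete with a small sketch. The search function below is generic: it takes a starting configuration, a goal configuration, and a function returning the configurations reachable in one step, and it looks for a connecting sequence. To have something to run, it is instantiated with a shift/reduce transition function over (stack, buffer) pairs in the style of bottom-up parsing; that instantiation, the four-rule toy grammar, and all names are assumptions of the sketch, not the chapter's own definitions.

```python
from collections import deque

def find_transition_sequence(start, goal, transitions):
    """Breadth-first search for a list of configurations leading from `start` to `goal`.
    Returns None if the goal cannot be reached."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in transitions(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy grammar (hypothetical fragment): (left-hand side, right-hand side) pairs.
RULES = [("S", ("NP", "VP")), ("NP", ("John",)), ("VP", ("V",)), ("V", ("fled",))]

def shift_reduce_transitions(config):
    """Assumed bottom-up transitions over (stack, buffer) configurations."""
    stack, buffer = config
    result = []
    for lhs, rhs in RULES:
        # Shift: consume the next word w using a rule X -> w, pushing X.
        if len(rhs) == 1 and buffer and rhs[0] == buffer[0]:
            result.append((stack + (lhs,), buffer[1:]))
        # Reduce: replace a stack suffix matching the right-hand side by X.
        if len(stack) >= len(rhs) and stack[len(stack) - len(rhs):] == rhs:
            result.append((stack[:len(stack) - len(rhs)] + (lhs,), buffer))
    return result

start = ((), ("John", "fled"))
goal = (("S",), ())
path = find_transition_sequence(start, goal, shift_reduce_transitions)
for config in path:
    print(config)
```

Note that the search function never mentions words or categories; everything grammar-specific lives in `RULES`, which mirrors the point that a parsing method takes both a sentence and a grammar as inputs.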

20 Another way of seeing these two metrics as having the same general form comes from the fact that surprisal at the ith word is equivalent to the relative entropy of $G_i$ with respect to $G_{i-1}$, also known as the Kullback-Leibler (KL) divergence of $G_i$ from $G_{i-1}$ (Levy, 2005, 2008). A detail that I am glossing over here is that calculating either entropy or KL divergence requires that an intersection grammar be renormalized; see Nederhof and Satta (2008b: §4).

21 The content of this section is very similar to that of (at least) Abney and Johnson (1991), Resnik (1992), Wolf and Gibson (2006), and Kanazawa (2016: lecture 2). The core underlying ideas go back to Chomsky and Miller (1963), Chomsky (1963), and Miller and Chomsky (1963).


Throughout this section I will use the grammar in (23) for illustration. In presenting the general schemas for the various parsing methods, I will assume that the right-hand side of each grammar rule is either (a) a single terminal symbol, or (b) a sequence of nonterminal symbols.22 I will also assume, as is common, that the start symbol is always S.

(23)
S   → NP VP                        N     → dog, cat, rat, wife, brother
S   → WHILE S S                    NP    → John, Mary
NP  → NP POSS N                    V     → barked, chased, bit, ate, fled
NP  → (D) N (PP) (SRC) (ORC)       D     → the
VP  → V (NP) (PP)                  P     → on, in, with
PP  → P NP                         C     → that
SRC → C VP                         POSS  → ’s
ORC → NP V                         WHILE → while

22 Any context-free grammar can be straightforwardly converted into this form, via the introduction of “singleton” nonterminals like WHILE in (23).

As a motivating running example, and as an aid to understanding the differences between the three parsing methods being discussed, we’ll consider ways to account for the pattern of human comprehension difficulty on left-embedded, right-embedded, and center-embedded structures: while left-embedding and right-embedding structures can increase in depth apparently without bound and still remain easily comprehensible, center-embedding structures quickly become incomprehensible. I’ll use the sentences in (24), (25), and (26) as illustrative examples for this point. (These particular sentences don’t make a well-designed set of controlled experimental stimuli, but will allow for an easy illustration of the key ideas using the simple grammar in (23) above.)

(24) Left-branching structures
a. John fled
b. John ’s dog fled
c. John ’s wife ’s dog fled

(25) Right-branching structures
a. Mary chased the cat
b. Mary chased the cat that bit the rat
c. Mary chased the cat that bit the rat that ate the cheese

(26) Center-embedding structures
a. the rat fled
b. the rat the cat chased fled
c. the rat the cat the dog bit chased fled

Specifically, I will present an illustrative account of the fact that there is an increase in comprehension difficulty in (26c) relative to (26b), but no such increase in (24b)/(24c) or (25b)/(25c). (I leave aside any facts about the ‘a.’ sentences, which are shown just to demonstrate the sense in which these three pair-wise comparisons are analogous.) In concrete terms, the aim will be to find some function F such that

F(tree for (24b)) = F(tree for (24c))
F(tree for (25b)) = F(tree for (25c))
F(tree for (26b)) < F(tree for (26c))

and such that, in combination with the linking hypothesis that

(27) A speaker processing the string PHON(t) will experience greater perceptual difficulty the greater F(t) is.

this simplified23 pattern of facts has an explanation. The form of the explanation is exactly the one illustrated with the node-ratio example from Miller and Chomsky (1963) mentioned in Section 12.1.

Of course since F is a function of the tree structures that a grammar associates with the relevant strings, the predictions we end up making will be sensitive to our choice of grammar. The grammar in (23) generates all the sentences in (24), (25), and (26), with structures like the following. These are of course extremely simplistic, particularly as regards the treatment of relative clauses (SRC stands for “subject relative clause,” ORC for “object relative clause”), but they capture the basic structural configurations well enough for our purposes. Specifically, these trees show structures for the ‘b.’ sentences in (24), (25), and (26); structures for the ‘a.’ sentences would have just one of the highlighted constituents each, and structures for the ‘c.’ sentences would have three of the highlighted constituents each. Nothing in particular hinges on the fact that the highlighted self-embedded constituents are NPs in all three cases; in (29) we could just as well highlight the relationship between the two VP nodes.
The important point is that in (28) there are constituents appearing as the left portion of a larger constituent of the same type, in (29) there are constituents appearing as the right portion, and in (30) there are constituents appearing as medial portions of a larger constituent of the same type. (28)

[S [NP [NP John] [POSS ’s] [N dog]] [VP [V fled]]]

23 In particular, it’s not at all clear that there’s much of a difference between, on the one hand, the (24a)/(24b) and (25a)/(25b) comparisons, and on the other, the (26a)/(26b) comparison. If there’s not, then the distinctive status of (26c) might be better explained as some kind of “overflow” effect based on a function that has similar properties to the hypothetical F introduced above. Note also that I am leaving aside all comparisons between sentences of different embedding-types, so for example the relationship between F(tree for (25b)) and F(tree for (26b)) will not play any role in any predictions.

(29)

[S [NP Mary] [VP [V chased] [NP [D the] [N cat] [SRC [C that] [VP [V bit] [NP [D the] [N rat]]]]]]]

(30)

[S [NP [D the] [N rat] [ORC [NP [D the] [N cat]] [V chased]]] [VP [V fled]]]

12.3.1 Bottom-up parsing

In bottom-up parsing, as in all three of the methods we will discuss here, a configuration has two parts, a buffer and a stack: The buffer is a record of our left-to-right progress through the sentence, and the stack is a record of what we have worked out on the basis of that part of the sentence that we have consumed so far.24 The stack takes the form of a sequence of non-terminal symbols. I will write configurations as an ordered pair, with the stack first and the buffer second. I will represent buffers with a ‘|’ symbol separating the input consumed so far from that which remains.

24 This means that the parsing device is essentially a pushdown automaton (PDA); see Sipser (1997: ch. 2), Hopcroft and Ullman (1979: ch. 5), Partee et al. (1990: ch. 18). A parsing device as presented here, however, differs from the usual presentation of PDAs in that I am omitting any reference to the automaton’s internal state (or, equivalently, assuming that each automaton has only one state, which never changes).

The general idea behind bottom-up parsing is that we read in input from left to right, and when we find two or more elements adjacent to each other that match with the right-hand side of some grammar rule, we replace those elements with the nonterminal symbol that appears on the left-hand side of that rule. For example, if the relevant grammar contains a rule ‘VP → V NP’ and we find occurrences of V and NP adjacent to each other (in that order), then they can be replaced on the stack with VP; this occurrence of VP may itself subsequently be replaced by some other symbol on the basis of a rule which has VP somewhere on its right-hand side. The “goal” is to reach a configuration where the stack contains only the start symbol, S.

The bottom-up parsing method is defined in (31). Since these transitions only manipulate the right edge of the sequence of non-terminal symbols that is the first component of each configuration, I will refer to that edge as the “top” of the stack. A shift transition consumes a word (wi) of input (moving the marker one place to the right in the buffer), and adds that word’s category (X) to the top of the stack; a reduce transition leaves our position in the input unchanged but operates on the stack, replacing the symbols that appear on the right-hand side of some rule (Y1 … Ym) with the symbol that appears on the left-hand side of that rule (X).

(31) Given a sentence w1 … wn to be parsed and a grammar:
• starting configuration: (ε, | w1 … wn)
• goal configuration: (S, w1 … wn | )
• shift transitions: (Σ, w1 … | wi … wn) ⟹ (ΣX, w1 … wi | … wn) if there is a rule X → wi in the grammar
• reduce transitions: (ΣY1 … Ym, w1 … | … wn) ⟹ (ΣX, w1 … | … wn) if there is a rule X → Y1 … Ym in the grammar
(where Σ is a placeholder for sequences of nonterminal symbols)

The workings of these transitions are most easily understood via an example. The actions of a bottom-up parser on the sentence ‘the dog chased the cat’ are shown in Table 12.1. We begin in the starting configuration, with the stack empty and with the full sentence remaining to be read, as indicated by the marker at the beginning of the buffer. The first shift step consumes the first word, ‘the’ (moving the marker one step to the right in the buffer), and puts its category, D, onto the stack. The second step does likewise for the word ‘dog’, and its category N. We then find ourselves with the sequence D N on the stack, which matches the right-hand side of one of the grammar’s rules, specifically the rule NP → D N. We can therefore take a reduce transition at Step 3, replacing the current stack contents with NP and leaving the buffer’s progress marker unchanged in its position after ‘dog’. In the next two steps, two more shift transitions consume ‘chased’ and ‘the’. No reduce transition is possible after Step 5, when the stack contains NP V D,

Table 12.1 A first illustration of bottom-up parsing.

Step  Type of transition  Rule used     Configuration
0     -                   -             (ε, | the dog chased the cat)
1     SHIFT               D → the       (D, the | dog chased the cat)
2     SHIFT               N → dog       (D N, the dog | chased the cat)
3     REDUCE              NP → D N      (NP, the dog | chased the cat)
4     SHIFT               V → chased    (NP V, the dog chased | the cat)
5     SHIFT               D → the       (NP V D, the dog chased the | cat)
6     SHIFT               N → cat       (NP V D N, the dog chased the cat | )
7     REDUCE              NP → D N      (NP V NP, the dog chased the cat | )
8     REDUCE              VP → V NP     (NP VP, the dog chased the cat | )
9     REDUCE              S → NP VP     (S, the dog chased the cat | )

because neither V D nor NP V D is the right hand side of any grammar rule.25 A reduce transition next becomes possible only after the last word of the input (‘cat’) is consumed in Step 6, at which point the D N sequence that is on the top of the stack can be replaced with NP (Step 7). This then feeds the final two reduce steps in Step 8 and Step 9, after which we have reached a configuration where all the input has been consumed (the progress marker is at the far right of the buffer) and the stack contains just the single symbol S, i.e. we have reached a goal configuration.
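The transitions in (31) can be sketched in a few lines of Python. This is only an illustrative sketch, not a definitive implementation: the names `LEXICON`, `RULES`, `shift_step`, `reduce_step`, and `replay` are hypothetical, and the grammar fragment is an assumption pieced together from the rules that appear in Table 12.1 rather than the full grammar in (23). A configuration is represented as a pair: the stack (a tuple of nonterminals with its top at the right) and the number of words consumed so far.

```python
# Sketch of the bottom-up transitions in (31). The grammar fragment below is an
# assumption reconstructed from the rules used in Table 12.1.
LEXICON = {"the": "D", "dog": "N", "cat": "N", "chased": "V"}
NP_RULE = ("NP", ("D", "N"))
VP_RULE = ("VP", ("V", "NP"))
S_RULE = ("S", ("NP", "VP"))
RULES = [NP_RULE, VP_RULE, S_RULE, ("VP", ("V",))]


def shift_step(config, words):
    # Consume the next word and push its category onto the top (right edge) of the stack.
    stack, pos = config
    if pos >= len(words):
        raise ValueError("no input left to shift")
    return (stack + (LEXICON[words[pos]],), pos + 1)


def reduce_step(config, rule):
    # Replace a rule's right-hand side at the top of the stack with its left-hand side.
    stack, pos = config
    lhs, rhs = rule
    if rule not in RULES or len(rhs) > len(stack) or stack[len(stack) - len(rhs):] != rhs:
        raise ValueError(f"cannot reduce with {lhs} -> {' '.join(rhs)}")
    return (stack[:len(stack) - len(rhs)] + (lhs,), pos)


def replay(words, steps):
    # Apply a sequence of steps ("SHIFT", or a rule to reduce with) and
    # return every configuration visited, starting configuration included.
    config = ((), 0)  # empty stack, nothing consumed
    trace = [config]
    for step in steps:
        config = shift_step(config, words) if step == "SHIFT" else reduce_step(config, step)
        trace.append(config)
    return trace


words = "the dog chased the cat".split()
steps = ["SHIFT", "SHIFT", NP_RULE, "SHIFT", "SHIFT", "SHIFT", NP_RULE, VP_RULE, S_RULE]
trace = replay(words, steps)
for number, (stack, pos) in enumerate(trace):
    print(number, " ".join(stack) or "ε", "|", " ".join(words[pos:]))
```

The sequence in `steps` follows the nine transitions of Table 12.1. Because `reduce_step` only inspects the right edge of the stack, a configuration such as D N V leaves the D and N out of reach of the NP rule.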

25 For now I am glossing over the fact that a reduce transition, using the VP → V rule, is also possible after Step 4.

12.3.1.1 Ambiguity resolution

The only transitions shown in Table 12.1 are the “correct” transitions that form a path from the starting configuration to the goal configuration. But of course there are other transitions that we could have taken instead at certain points. For example, instead of applying reduce in Step 3 we could have applied shift, which would have led to the configuration (D N V, the dog chased | the cat); and from that point, we could have carried out the same sequence of steps as appear in Table 12.1 to process the VP constituent, with D N at the bottom of the stack throughout instead of NP. This path hits a dead end, however, after the reduce step that forms the VP constituent, because we end up with D N VP in the stack (instead of NP VP) and there is now no way to “get at” the D and N that should be combined to form an NP: in order to be acted on by reduce a sequence of nonterminal symbols must be at the top of the stack, which means that the D and the N “missed their chance” after Step 2. It turns out that there is exactly one sequence of “correct” transitions for each derivation licensed by the grammar: if a word-sequence has two grammatical derivations (i.e. is structurally ambiguous) then there will be two distinct transition-sequences for that word-sequence that both end at the goal configuration.26

In a simple example such as Table 12.1, it is relatively straightforward to identify which transitions will turn out to be “wrong turns”: choosing not to apply reduce after Step 2 perhaps seems “obviously wrong.” But in general, deciding which transition to take from a particular configuration corresponds to a decision about ambiguity, either local or global. In the case of a globally ambiguous sentence, it will be possible to identify the particular point at which the paths to the two possible structural descriptions diverge. To illustrate a case of local ambiguity, consider the sentence in (32). In keeping with the “late closure” heuristic (Frazier and Rayner 1982; Frazier and Clifton 1996), comprehenders often analyze ‘the dog that barked’ as the object of the embedded verb ‘chased’; this leads to a mild garden-path effect when it becomes apparent, at ‘fled’, that this phrase is in fact the matrix subject.

(32) While John chased the dog that barked fled.

When the bottom-up parsing system is applied to this sentence, a choice point arises between shift and reduce after ‘chased’, corresponding to the decision about whether to analyze this verb as transitive or intransitive. The two tables in (33) show the point where these two paths diverge, namely after the first two lines which are the same in each. The late closure heuristic says exactly that shift transitions, such as the one taken in the third line of the table in (33a), should be preferred over reduce transitions whenever a choice between them arises. (33)

a.
Transition  Rule used      Configuration
···         ···            (WHILE NP, while John | chased the dog that barked fled)
SHIFT       V → chased     (WHILE NP V, while John chased | the dog that barked fled)
SHIFT       D → the        (WHILE NP V D, while John chased the | dog that barked fled)
SHIFT       N → dog        (WHILE NP V D N, while John chased the dog | that barked fled)
···         ···            ···
···         ···            (WHILE NP V S, while John chased the dog that barked fled | )

b.
Transition  Rule used      Configuration
···         ···            (WHILE NP, while John | chased the dog that barked fled)
SHIFT       V → chased     (WHILE NP V, while John chased | the dog that barked fled)
REDUCE      VP → V         (WHILE NP VP, while John chased | the dog that barked fled)
REDUCE      S → NP VP      (WHILE S, while John chased | the dog that barked fled)
SHIFT       D → the        (WHILE S D, while John chased the | dog that barked fled)
···         ···            ···
···         ···            (WHILE S S, while John chased the dog that barked fled | )
REDUCE      S → WHILE S S  (S, while John chased the dog that barked fled | )

26 In particular, the sequence of configurations visited by a bottom-up parser corresponds exactly to a reverse rightmost context-free derivation: moving upwards in Table 12.1 corresponds to rewriting the rightmost non-terminal symbol at each step.

A simple view of the reanalysis that is required in this example would be to imagine a device that continues down the path taken in (33a) until it reaches a dead end after consuming ‘fled’ (there is nowhere to go from the configuration which has ‘WHILE NP V S’ on the stack), and then returns, or “backtracks,” to the divergence point to pursue the alternative path taken in (33b).
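The backtracking device just described can be sketched as a depth-first search that always tries shift before reduce. This is a minimal sketch under stated assumptions: `search`, `LEXICON`, `RULES`, and `visited` are hypothetical names, and the grammar fragment is reconstructed from the rules shown in (33) and in the tables of this section, so it may differ in detail from the grammar in (23).

```python
# Sketch of a backtracking bottom-up parser that tries SHIFT before any REDUCE.
# The grammar fragment is an assumption reconstructed from the rules shown in (33)
# and in the tables of this section.
LEXICON = {"while": "WHILE", "John": "NP", "chased": "V", "the": "D",
           "dog": "N", "that": "C", "barked": "V", "fled": "V"}
RULES = [("S", ("NP", "VP")), ("S", ("WHILE", "S", "S")), ("NP", ("D", "N")),
         ("NP", ("D", "N", "SRC")), ("SRC", ("C", "VP")),
         ("VP", ("V",)), ("VP", ("V", "NP"))]


def search(words, stack=(), pos=0, path=(), visited=None):
    # Depth-first search for a path to the goal configuration (S, all input consumed).
    # Returns the transitions on the successful path, or None at a dead end.
    # Every configuration explored is appended to `visited` if a list is supplied.
    if visited is not None:
        visited.append((stack, pos))
    if pos == len(words) and stack == ("S",):
        return list(path)
    # Preference for SHIFT: try consuming the next word first.
    if pos < len(words):
        cat = LEXICON[words[pos]]
        step = ("SHIFT", f"{cat} -> {words[pos]}")
        found = search(words, stack + (cat,), pos + 1, path + (step,), visited)
        if found is not None:
            return found
    # Backtracking: try each REDUCE whose right-hand side is at the top of the stack.
    for lhs, rhs in RULES:
        if len(rhs) <= len(stack) and stack[len(stack) - len(rhs):] == rhs:
            step = ("REDUCE", f"{lhs} -> {' '.join(rhs)}")
            new_stack = stack[:len(stack) - len(rhs)] + (lhs,)
            found = search(words, new_stack, pos, path + (step,), visited)
            if found is not None:
                return found
    return None


words = "while John chased the dog that barked fled".split()
visited = []
path = search(words, visited=visited)
for kind, rule in path:
    print(kind, rule)
```

With this fragment the search first explores the shift-first path of (33a), records the dead-end configuration with WHILE NP V S on the stack in `visited`, and only then returns to the choice point after ‘chased’ and succeeds along the path in (33b).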

12.3.1.2 Embedding structures

Let us turn to the pattern noted in (24), (25), and (26) above. The workings of the bottom-up parsing system on the two-clause and three-clause center-embedded sentences, (26b) and (26c), are shown in Table 12.2. (Here and in other tables below, instead of writing out the buffer in full I simply write a number indicating the position in the sentence where the ‘|’ symbol would appear.) Notice that in order to analyze these sorts of structures, we must shift all the initial determiners and nouns before any reduce transitions can be taken, since the first chance we get to combine things into a completed constituent is only after the last determiner-noun pair has been consumed (‘the cat’ in (26b), ‘the dog’ in (26c)). A natural idea to consider is that what makes (26c) noticeably more difficult than (26b) is that processing (26c) requires an ability to maintain six symbols at a time in a stack-based memory system (see the configuration after shifting ‘dog’), whereas processing (26b) only requires being able to maintain four (see the configuration after shifting ‘cat’).27 Recall however that it is only center-embedding structures that cause this drastic increase in comprehension difficulty.

27 The point would not be significantly affected if we adopted a structure where ‘the rat’ was a subconstituent of ‘the rat the cat chased’. If the structure were [NP [NBAR the rat] [ORC the cat chased]] then reduce steps forming the NBAR subconstituents could be performed early, but we would still find ourselves accumulating a number of NBARs on the stack that increases with the depth of embedding.

Next consider the workings of a bottom-up parsing system on the two-clause and three-clause left-embedded sentences, (24b) and (24c), shown in Table 12.3. In these scenarios it is possible to perform a reduce transition after consuming a beginning portion matching the pattern NP POSS N, as shown in the fourth step in both parts of Table 12.3. Note, however, that this is possible no matter how deeply the NP constituent thus formed needs to eventually be embedded, i.e. whether the NP constituent thus formed eventually turns out to be the matrix subject, as it is in

Table 12.2 The effect of center-embedding on bottom-up parsing. Memory load increases as embedding depth increases: maximum of 4 symbols for (26b) but 6 symbols for (26c).

(26b): 0 the 1 rat 2 the 3 cat 4 chased 5 fled 6

Transition  Rule used       Configuration
-           -               (ε, 0)
SHIFT       D → the         (D, 1)
SHIFT       N → rat         (D N, 2)
SHIFT       D → the         (D N D, 3)
SHIFT       N → cat         (D N D N, 4)
REDUCE      NP → D N        (D N NP, 4)
SHIFT       V → chased      (D N NP V, 5)
REDUCE      ORC → NP V      (D N ORC, 5)
REDUCE      NP → D N ORC    (NP, 5)
SHIFT       V → fled        (NP V, 6)
REDUCE      VP → V          (NP VP, 6)
REDUCE      S → NP VP       (S, 6)

(26c): 0 the 1 rat 2 the 3 cat 4 the 5 dog 6 bit 7 chased 8 fled 9

Transition  Rule used       Configuration
-           -               (ε, 0)
SHIFT       D → the         (D, 1)
SHIFT       N → rat         (D N, 2)
SHIFT       D → the         (D N D, 3)
SHIFT       N → cat         (D N D N, 4)
SHIFT       D → the         (D N D N D, 5)
SHIFT       N → dog         (D N D N D N, 6)
REDUCE      NP → D N        (D N D N NP, 6)
SHIFT       V → bit         (D N D N NP V, 7)
REDUCE      ORC → NP V      (D N D N ORC, 7)
REDUCE      NP → D N ORC    (D N NP, 7)
SHIFT       V → chased      (D N NP V, 8)
REDUCE      ORC → NP V      (D N ORC, 8)
REDUCE      NP → D N ORC    (NP, 8)
SHIFT       V → fled        (NP V, 9)
REDUCE      VP → V          (NP VP, 9)
REDUCE      S → NP VP       (S, 9)

(24b), or a subconstituent of the matrix subject, as it is in (24c). Either way we arrive at a configuration where one NP symbol on the stack suffices to track our progress through the structure. And, importantly, in the more deeply embedded case of (24c), the resulting NP can serve as the starting point for another iteration of the “loop” indicated by the large braces in Table 12.3. These left-branching sentences differ only in how many times we go around that loop: processing the larger sentence in (24c) just involves revisiting some of the stack arrangements that also appear in the course of processing (24b),

Table 12.3 The effect of left-embedding on bottom-up parsing. No increase in memory load as embedding depth increases: maximum of 3 symbols in both cases.

(24b): 0 John 1 ’s 2 dog 3 fled 4

Transition  Rule used        Configuration
-           -                (ε, 0)
SHIFT       NP → John        (NP, 1)
SHIFT       POSS → ’s        (NP POSS, 2)      ⎫
SHIFT       N → dog          (NP POSS N, 3)    ⎬
REDUCE      NP → NP POSS N   (NP, 3)           ⎭
SHIFT       V → fled         (NP V, 4)
REDUCE      VP → V           (NP VP, 4)
REDUCE      S → NP VP        (S, 4)

(24c): 0 John 1 ’s 2 wife 3 ’s 4 dog 5 fled 6

Transition  Rule used        Configuration
-           -                (ε, 0)
SHIFT       NP → John        (NP, 1)
SHIFT       POSS → ’s        (NP POSS, 2)      ⎫
SHIFT       N → wife         (NP POSS N, 3)    ⎬
REDUCE      NP → NP POSS N   (NP, 3)           ⎭
SHIFT       POSS → ’s        (NP POSS, 4)      ⎫
SHIFT       N → dog          (NP POSS N, 5)    ⎬
REDUCE      NP → NP POSS N   (NP, 5)           ⎭
SHIFT       V → fled         (NP V, 6)
REDUCE      VP → V           (NP VP, 6)
REDUCE      S → NP VP        (S, 6)

and the same would be true if we extended the pattern to create a yet longer sentence.28 This means that the number of symbols a device must be able to maintain simultaneously will not grow as the size of the left-branching structure to be processed grows. Specifically, we can observe that the maximum number of symbols stored at a time is three in both parts of Table 12.3.

28 i.e. ‘John’s brother’s wife’s dog fled’.

This gives us the basis of an explanation for why large center-embedding structures become difficult to comprehend but large left-branching structures do not. Concretely, what we can propose amounts to a new function F(t) defined on structural descriptions of the sort mentioned in Section 12.1. Given a context-free parse tree, there is a uniquely determined sequence of configurations that a bottom-up parser must move through to process it, and therefore also a uniquely determined “maximum stack size.” Let us write MaxStackBU(t) for this maximum stack size (BU for “bottom-up”; we will consider stack sizes for other parsing systems below). Then for the examples considered so far we have the following pattern:

MaxStackBU(tree for (24b)) = MaxStackBU(tree for (24c))    (left-branching)
MaxStackBU(tree for (26b)) < MaxStackBU(tree for (26c))    (center-embedding)

which in combination with the linking hypothesis that

(34) A speaker processing the string PHON(t) will experience greater perceptual difficulty the greater MaxStackBU(t) is.

correctly predicts the relevant facts for these four sentences. For the other pair however, i.e. (25b) and (25c), this theory makes the wrong predictions. The workings of the bottom-up parser on these right-branching sentences are shown in Table 12.4. The completely right-branching structure means that the bottom-up parser must shift the entire sentence before performing any reductions at all; the first complete constituent that it can construct is the NP consisting of the last two words of the sentence. So for a bottom-up parser, right-branching structures share with center-embedding structures the property that the memory load increases with the depth of embedding.

MaxStackBU(tree for (25b)) < MaxStackBU(tree for (25c))    (right-branching)

The linking hypothesis in (34) therefore incorrectly predicts that (25c) should show greater comprehension difficulty than (25b). This incorrect prediction of course also arises from having chosen the grammar in (23). The linking hypothesis serves to expose this grammar to evidence concerning comprehension difficulty. So of course one possible response to finding this incorrect prediction is to maintain (34) and reject the grammar in (23), and seek some other grammar that assigns different tree structures for (25b) and (25c) (e.g. perhaps left-branching structures) so that the correct predictions arise. Let us suppose though that we are more confident in our hypothesis that (23) is the correct mental grammar for the relevant speakers, than we are in the linking hypothesis in (34); then a reasonable next step is to seek some alternative to bottom-up parsing, rejecting (34) but leaving our hypothesized grammar in (23) in place.
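Since the bottom-up transition sequence is uniquely determined by the tree, MaxStackBU can be computed directly from a tree by a post-order traversal. The following is a hedged sketch rather than a definitive implementation: the function `max_stack_bu`, the nested-tuple tree encoding, and the helper functions that build the trees are all hypothetical, and the tree shapes are assumed to follow (28), (29), and (30).

```python
# Sketch: the peak stack size a bottom-up parser needs for a given tree.
# Trees are nested tuples (label, child, ...); a preterminal is (label, word).
# The encoding and the builders below are illustrative assumptions following (28)-(30).

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def max_stack_bu(tree):
    # Replay the shift/reduce sequence determined by `tree`, tracking the stack size.
    stack, peak = [], 0

    def visit(node):
        nonlocal peak
        label, children = node[0], node[1:]
        if is_preterminal(node):
            stack.append(label)                     # SHIFT with rule X -> w
        else:
            for child in children:
                visit(child)
            del stack[len(stack) - len(children):]  # REDUCE: pop Y1 ... Ym
            stack.append(label)                     # and push X
        peak = max(peak, len(stack))

    visit(tree)
    return peak


def left_tree(nouns):
    # (24): left_tree(["John", "dog"]) encodes sentence (24b).
    np = ("NP", nouns[0])
    for noun in nouns[1:]:
        np = ("NP", np, ("POSS", "'s"), ("N", noun))
    return ("S", np, ("VP", ("V", "fled")))


def right_tree(verbs, nouns):
    # (25): right_tree(["chased", "bit"], ["cat", "rat"]) encodes sentence (25b).
    vp = ("VP", ("V", verbs[-1]), ("NP", ("D", "the"), ("N", nouns[-1])))
    for verb, noun in zip(reversed(verbs[:-1]), reversed(nouns[:-1])):
        np = ("NP", ("D", "the"), ("N", noun), ("SRC", ("C", "that"), vp))
        vp = ("VP", ("V", verb), np)
    return ("S", ("NP", "Mary"), vp)


def center_tree(nouns, verbs):
    # (26): center_tree(["rat", "cat"], ["chased"]) encodes sentence (26b).
    np = ("NP", ("D", "the"), ("N", nouns[-1]))
    for noun, verb in zip(reversed(nouns[:-1]), verbs):
        np = ("NP", ("D", "the"), ("N", noun), ("ORC", np, ("V", verb)))
    return ("S", np, ("VP", ("V", "fled")))


examples = {
    "(24b)": left_tree(["John", "dog"]),
    "(24c)": left_tree(["John", "wife", "dog"]),
    "(25b)": right_tree(["chased", "bit"], ["cat", "rat"]),
    "(25c)": right_tree(["chased", "bit", "ate"], ["cat", "rat", "cheese"]),
    "(26b)": center_tree(["rat", "cat"], ["chased"]),
    "(26c)": center_tree(["rat", "cat", "dog"], ["bit", "chased"]),
}
for name, tree in examples.items():
    print(name, max_stack_bu(tree))
```

Under these assumptions the computed values agree with the maxima reported in Tables 12.2, 12.3, and 12.4.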

12.3.2 Top-down parsing

As an alternative to bottom-up parsing, we can consider top-down parsing. As the names suggest, these two parsing methods can be thought of as inverses of each other: Whereas bottom-up parsing works from the words up to the start symbol, top-down parsing works from the start symbol down to the words. The top-down parsing method is defined in (35). Since the top-down parsing transitions only manipulate the left edge of the sequence of non-terminal symbols, I will refer to that edge as the “top” of the stack. The starting configuration has the start symbol on

Table 12.4 The effect of right-embedding on bottom-up parsing. Memory load increases as embedding depth increases: maximum of 8 symbols for (25b) but 12 symbols for (25c).

(25b): 0 Mary 1 chased 2 the 3 cat 4 that 5 bit 6 the 7 rat 8

Transition  Rule used       Configuration
-           -               (ε, 0)
SHIFT       NP → Mary       (NP, 1)
SHIFT       V → chased      (NP V, 2)
SHIFT       D → the         (NP V D, 3)
SHIFT       N → cat         (NP V D N, 4)
SHIFT       C → that        (NP V D N C, 5)
SHIFT       V → bit         (NP V D N C V, 6)
SHIFT       D → the         (NP V D N C V D, 7)
SHIFT       N → rat         (NP V D N C V D N, 8)
REDUCE      NP → D N        (NP V D N C V NP, 8)
REDUCE      VP → V NP       (NP V D N C VP, 8)
REDUCE      SRC → C VP      (NP V D N SRC, 8)
REDUCE      NP → D N SRC    (NP V NP, 8)
REDUCE      VP → V NP       (NP VP, 8)
REDUCE      S → NP VP       (S, 8)

(25c): 0 Mary 1 chased 2 the 3 cat 4 that 5 bit 6 the 7 rat 8 that 9 ate 10 the 11 cheese 12

Transition  Rule used       Configuration
-           -               (ε, 0)
SHIFT       NP → Mary       (NP, 1)
SHIFT       V → chased      (NP V, 2)
SHIFT       D → the         (NP V D, 3)
SHIFT       N → cat         (NP V D N, 4)
SHIFT       C → that        (NP V D N C, 5)
SHIFT       V → bit         (NP V D N C V, 6)
SHIFT       D → the         (NP V D N C V D, 7)
SHIFT       N → rat         (NP V D N C V D N, 8)
SHIFT       C → that        (NP V D N C V D N C, 9)
SHIFT       V → ate         (NP V D N C V D N C V, 10)
SHIFT       D → the         (NP V D N C V D N C V D, 11)
SHIFT       N → cheese      (NP V D N C V D N C V D N, 12)
REDUCE      NP → D N        (NP V D N C V D N C V NP, 12)
REDUCE      VP → V NP       (NP V D N C V D N C VP, 12)
REDUCE      SRC → C VP      (NP V D N C V D N SRC, 12)
REDUCE      NP → D N SRC    (NP V D N C V NP, 12)
REDUCE      VP → V NP       (NP V D N C VP, 12)
REDUCE      SRC → C VP      (NP V D N SRC, 12)
REDUCE      NP → D N SRC    (NP V NP, 12)
REDUCE      VP → V NP       (NP VP, 12)
REDUCE      S → NP VP       (S, 12)

the stack, and the goal configuration has an empty stack. A predict transition eliminates a symbol (X) currently at the top of the stack, by guessing a particular grammar rule that can be used to expand it and replacing it with that rule’s right-hand side (Y1 … Ym). A match transition eliminates a symbol (X) currently at the top of the stack by consuming a word (wi) of the corresponding category.

(35) Given a sentence w1 … wn to be parsed and a grammar:
• starting configuration: (S, | w1 … wn)
• goal configuration: (ε, w1 … wn | )
• predict transitions: (XΣ, w1 … | … wn) ⟹ (Y1 … Ym Σ, w1 … | … wn) if there is a rule X → Y1 … Ym in the grammar
• match transitions: (XΣ, w1 … | wi … wn) ⟹ (Σ, w1 … wi | … wn) if there is a rule X → wi in the grammar
(where Σ is a placeholder for sequences of non-terminal symbols)

Again, the mechanics of these transition rules are most easily understood via an example. The progress of a top-down parser on the sentence ‘the dog chased the cat’ is shown in Table 12.5. We begin with the start symbol S on the stack, and the full sentence remaining to be read. The first two steps expand this occurrence of S according to the rule ‘S → NP VP’, and then expand the introduced occurrence of NP according to the rule ‘NP → D N’. The upshot of these first two steps is that we have replaced the task of fulfilling a prediction that ‘the dog chased the cat’ is a sentence (S) with the task of fulfilling a prediction that ‘the dog chased the cat’ is of the form ‘D N VP’. In the third step we make some progress towards fulfilling this prediction: Since the first word to be consumed is ‘the’ and this matches the category D that is currently on the top of the stack, a match transition “cancels them out.” The fourth step similarly consumes the word ‘dog’ and eliminates the corresponding symbol N. After this step we are left with the task of fulfilling the prediction that ‘chased the cat’ is a VP. The fifth step replaces this VP prediction with the prediction of a V NP sequence. (Of course a “wrong turn” that could also be taken at this point would be to replace it with a prediction of simply a V instead; the particular wrong turns that present themselves

Table 12.5 A first illustration of top-down parsing.

Step  Type of transition  Rule used     Configuration
0     -                   -             (S, | the dog chased the cat)
1     PREDICT             S → NP VP     (NP VP, | the dog chased the cat)
2     PREDICT             NP → D N      (D N VP, | the dog chased the cat)
3     MATCH               D → the       (N VP, the | dog chased the cat)
4     MATCH               N → dog       (VP, the dog | chased the cat)
5     PREDICT             VP → V NP     (V NP, the dog | chased the cat)
6     MATCH               V → chased    (NP, the dog chased | the cat)
7     PREDICT             NP → D N      (D N, the dog chased | the cat)
8     MATCH               D → the       (N, the dog chased the | cat)
9     MATCH               N → cat       (ε, the dog chased the cat | )

differ according to the parsing method adopted.) Next the V is matched (Step 6) and then the NP is expanded (Step 7). Note that expanding the NP before the V is matched is not an option, because the predict transition only manipulates the top of the stack. After match transitions consume the last two words in Step 8 and Step 9, we have reached the end of the sentence with no predictions remaining to be fulfilled (the stack is empty), so the goal configuration has been reached.

A key point in understanding the relationship between bottom-up and top-down parsing (and also the left-corner parsing method, discussed in the next subsection, which can be seen as combining the advantages of both) is the fact that symbols on the stack in bottom-up parsing represent the already-processed part of the sentence, whereas symbols on the stack in top-down parsing represent (predictions about) the yet-to-come part of the sentence. In each bottom-up parsing configuration shown in Table 12.1, the sequence of non-terminal symbols corresponds to the portion of the sentence to the left of the ‘|’ symbol: for example, ‘D N’ after Step 2 is a description of ‘the dog’, and ‘NP V’ after Step 4 is a description of ‘the dog chased’. In the top-down parsing configurations shown in Table 12.5, however, the sequence of non-terminal symbols corresponds to the words to the right of the ‘|’ symbol: for example, ‘N VP’ after Step 3 represents a prediction that can be fulfilled by ‘dog chased the cat’, and ‘D N’ after Step 7 represents a prediction that can be fulfilled by ‘the cat’. A consequence of this difference is that the two methods differ in which points in the sentence allow for a compact representation of the parser’s current internal state. The bottom-up parser’s internal state after consuming the words ‘the dog’ can be represented compactly in one symbol, namely NP, as shown after Step 3 in Table 12.1.
The top-down parser’s state at this point in the sentence can also be represented compactly in one symbol, namely VP, as shown after Step 4 in Table 12.5. But notice that only with the top-down method can a parser’s state after the word ‘chased’ be represented in a single symbol: since what is predicted at this point is a single NP constituent, this single symbol suffices for the top-down parser, as shown after Step 6 in Table 12.5, but the bottom-up parser has no one-symbol representation of its progress at this point since ‘the dog chased’ is not a constituent.

The effects of left-branching, right-branching, and center-embedding structures on the stack requirements with top-down parsing can be clearly understood through these observations about the interpretations of stack symbols. Like bottom-up, the top-down method correctly predicts increasing maximum stack size as center-embedding depth increases, as shown in Table 12.6. But the top-down method shows the reverse pattern from the bottom-up method with regard to left-branching and right-branching structures, i.e. the maximum stack size increases for left-branching but not for right-branching, as shown in Table 12.7 and Table 12.8.

MaxStackTD(tree for (24b)) < MaxStackTD(tree for (24c))    (left-branching)
MaxStackTD(tree for (25b)) = MaxStackTD(tree for (25c))    (right-branching)
MaxStackTD(tree for (26b)) < MaxStackTD(tree for (26c))    (center-embedding)
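The top-down counterpart can be sketched in the same way: the predict and match transitions in (35) determined by a tree correspond to a pre-order traversal. Again this is only a sketch under stated assumptions: `max_stack_td`, the nested-tuple tree encoding, and the tree-building helpers are hypothetical, and the tree shapes are assumed to follow (28), (29), and (30).

```python
# Sketch: the peak stack size a top-down parser needs for a given tree.
# Trees are nested tuples (label, child, ...); a preterminal is (label, word).
# The encoding and the builders below are illustrative assumptions following (28)-(30).

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)


def max_stack_td(tree):
    # Replay the predict/match sequence determined by `tree`; the top of the
    # stack is its left edge, as in (35).
    stack, peak = [tree[0]], 1  # starting configuration: just the start symbol

    def visit(node):
        nonlocal peak
        label, children = node[0], node[1:]
        if stack[0] != label:
            raise ValueError("tree does not match the prediction on top of the stack")
        if is_preterminal(node):
            stack.pop(0)                                   # MATCH with rule X -> w
        else:
            stack[0:1] = [child[0] for child in children]  # PREDICT with X -> Y1 ... Ym
            peak = max(peak, len(stack))
            for child in children:
                visit(child)

    visit(tree)
    if stack:
        raise ValueError("predictions left unfulfilled")
    return peak


def left_tree(nouns):
    # (24): left_tree(["John", "dog"]) encodes sentence (24b).
    np = ("NP", nouns[0])
    for noun in nouns[1:]:
        np = ("NP", np, ("POSS", "'s"), ("N", noun))
    return ("S", np, ("VP", ("V", "fled")))


def right_tree(verbs, nouns):
    # (25): right_tree(["chased", "bit"], ["cat", "rat"]) encodes sentence (25b).
    vp = ("VP", ("V", verbs[-1]), ("NP", ("D", "the"), ("N", nouns[-1])))
    for verb, noun in zip(reversed(verbs[:-1]), reversed(nouns[:-1])):
        np = ("NP", ("D", "the"), ("N", noun), ("SRC", ("C", "that"), vp))
        vp = ("VP", ("V", verb), np)
    return ("S", ("NP", "Mary"), vp)


def center_tree(nouns, verbs):
    # (26): center_tree(["rat", "cat"], ["chased"]) encodes sentence (26b).
    np = ("NP", ("D", "the"), ("N", nouns[-1]))
    for noun, verb in zip(reversed(nouns[:-1]), verbs):
        np = ("NP", ("D", "the"), ("N", noun), ("ORC", np, ("V", verb)))
    return ("S", np, ("VP", ("V", "fled")))


examples = {
    "(24b)": left_tree(["John", "dog"]),
    "(24c)": left_tree(["John", "wife", "dog"]),
    "(25b)": right_tree(["chased", "bit"], ["cat", "rat"]),
    "(25c)": right_tree(["chased", "bit", "ate"], ["cat", "rat", "cheese"]),
    "(26b)": center_tree(["rat", "cat"], ["chased"]),
    "(26c)": center_tree(["rat", "cat", "dog"], ["bit", "chased"]),
}
for name, tree in examples.items():
    print(name, max_stack_td(tree))
```

Under these assumptions the computed values agree with the maxima reported in Tables 12.6, 12.7, and 12.8.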

Table 12.6 The effect of center-embedding on top-down parsing. Memory load increases as embedding depth increases: maximum of 4 symbols for (26b) but 5 symbols for (26c).

(26b): 0 the 1 rat 2 the 3 cat 4 chased 5 fled 6

Transition  Rule used       Configuration
-           -               (S, 0)
PREDICT     S → NP VP       (NP VP, 0)
PREDICT     NP → D N ORC    (D N ORC VP, 0)
MATCH       D → the         (N ORC VP, 1)
MATCH       N → rat         (ORC VP, 2)
PREDICT     ORC → NP V      (NP V VP, 2)
PREDICT     NP → D N        (D N V VP, 2)
MATCH       D → the         (N V VP, 3)
MATCH       N → cat         (V VP, 4)
MATCH       V → chased      (VP, 5)
PREDICT     VP → V          (V, 5)
MATCH       V → fled        (ε, 6)

(26c): 0 the 1 rat 2 the 3 cat 4 the 5 dog 6 bit 7 chased 8 fled 9

Transition  Rule used       Configuration
-           -               (S, 0)
PREDICT     S → NP VP       (NP VP, 0)
PREDICT     NP → D N ORC    (D N ORC VP, 0)
MATCH       D → the         (N ORC VP, 1)
MATCH       N → rat         (ORC VP, 2)
PREDICT     ORC → NP V      (NP V VP, 2)
PREDICT     NP → D N ORC    (D N ORC V VP, 2)
MATCH       D → the         (N ORC V VP, 3)
MATCH       N → cat         (ORC V VP, 4)
PREDICT     ORC → NP V      (NP V V VP, 4)
PREDICT     NP → D N        (D N V V VP, 4)
MATCH       D → the         (N V V VP, 5)
MATCH       N → dog         (V V VP, 6)
MATCH       V → bit         (V VP, 7)
MATCH       V → chased      (VP, 8)
PREDICT     VP → V          (V, 8)
MATCH       V → fled        (ε, 9)

The absence of any increase in maximum stack size in Table 12.8 has the same kind of explanation as we saw in Table 12.3: The difference between (25b) and (25c) is simply the difference between going around a “loop” once or twice, where the loop begins and ends at configurations where the prediction about the remaining portion of the sentence can be compactly represented as a single NP symbol (i.e. after ‘chased’ and ‘bit’ and ‘ate’). So the efficient processing of right-branching structures by the top-down parser stems from the fact that these are structures where the remaining portions of a sentence have

Table 12.7 The effect of left-embedding on top-down parsing. Memory load increases as embedding depth increases: maximum of 4 symbols for (24b) but 6 symbols for (24c).

(24b): 0 John 1 ’s 2 dog 3 fled 4

Transition  Rule used        Configuration
-           -                (S, 0)
PREDICT     S → NP VP        (NP VP, 0)
PREDICT     NP → NP POSS N   (NP POSS N VP, 0)
MATCH       NP → John        (POSS N VP, 1)
MATCH       POSS → ’s        (N VP, 2)
MATCH       N → dog          (VP, 3)
PREDICT     VP → V           (V, 3)
MATCH       V → fled         (ε, 4)

(24c): 0 John 1 ’s 2 wife 3 ’s 4 dog 5 fled 6

Transition  Rule used        Configuration
-           -                (S, 0)
PREDICT     S → NP VP        (NP VP, 0)
PREDICT     NP → NP POSS N   (NP POSS N VP, 0)
PREDICT     NP → NP POSS N   (NP POSS N POSS N VP, 0)
MATCH       NP → John        (POSS N POSS N VP, 1)
MATCH       POSS → ’s        (N POSS N VP, 2)
MATCH       N → wife         (POSS N VP, 3)
MATCH       POSS → ’s        (N VP, 4)
MATCH       N → dog          (VP, 5)
PREDICT     VP → V           (V, 5)
MATCH       V → fled         (ε, 6)

compact descriptions in terms of predicted constituents, just as the efficient processing of left-branching structures when working bottom-up stems from the fact that such structures allow compact representations in terms of already-processed constituents. When the top-down parser is confronted with left-branching structures, the required stack size increases with the depth of embedding (Table 12.7), because the remaining predicted structure takes the form of a large number of distinct constituents (and is at its largest at the beginning of the sentence); right-branching structures pose analogous difficulties for bottom-up parsing, since the already-consumed portion of a sentence can only be described with a large number of symbols (and is at its largest at the end of the sentence). Center-embedding structures impose requirements on the stack that increase with depth of embedding because near the middle of such sentences there is neither a compact representation in terms of already-processed constituents (see Table 12.2) nor one in terms of predicted constituents (see Table 12.6). These facts about the workings of the top-down parser therefore mean that under the new linking hypothesis in (36), we make correct predictions regarding right-branching

Table 12.8 The effect of right-embedding on top-down parsing. No increase in memory load as embedding depth increases: maximum of 3 symbols in both cases.

(25b): 0 Mary 1 chased 2 the 3 cat 4 that 5 bit 6 the 7 rat 8

Transition  Rule used       Configuration
-           -               (S, 0)
PREDICT     S → NP VP       (NP VP, 0)
MATCH       NP → Mary       (VP, 1)
PREDICT     VP → V NP       (V NP, 1)
MATCH       V → chased      (NP, 2)
PREDICT     NP → D N SRC    (D N SRC, 2)     ⎫
MATCH       D → the         (N SRC, 3)       ⎪
MATCH       N → cat         (SRC, 4)         ⎪
PREDICT     SRC → C VP      (C VP, 4)        ⎬
MATCH       C → that        (VP, 5)          ⎪
PREDICT     VP → V NP       (V NP, 5)        ⎪
MATCH       V → bit         (NP, 6)          ⎭
PREDICT     NP → D N        (D N, 6)
MATCH       D → the         (N, 7)
MATCH       N → rat         (ε, 8)

(25c): 0 Mary 1 chased 2 the 3 cat 4 that 5 bit 6 the 7 rat 8 that 9 ate 10 the 11 cheese 12

Transition  Rule used       Configuration
-           -               (S, 0)
PREDICT     S → NP VP       (NP VP, 0)
MATCH       NP → Mary       (VP, 1)
PREDICT     VP → V NP       (V NP, 1)
MATCH       V → chased      (NP, 2)
PREDICT     NP → D N SRC    (D N SRC, 2)     ⎫
MATCH       D → the         (N SRC, 3)       ⎪
MATCH       N → cat         (SRC, 4)         ⎪
PREDICT     SRC → C VP      (C VP, 4)        ⎬
MATCH       C → that        (VP, 5)          ⎪
PREDICT     VP → V NP       (V NP, 5)        ⎪
MATCH       V → bit         (NP, 6)          ⎭
PREDICT     NP → D N SRC    (D N SRC, 6)     ⎫
MATCH       D → the         (N SRC, 7)       ⎪
MATCH       N → rat         (SRC, 8)         ⎪
PREDICT     SRC → C VP      (C VP, 8)        ⎬
MATCH       C → that        (VP, 9)          ⎪
PREDICT     VP → V NP       (V NP, 9)        ⎪
MATCH       V → ate         (NP, 10)         ⎭
PREDICT     NP → D N        (D N, 10)
MATCH       D → the         (N, 11)
MATCH       N → cheese      (ε, 12)


structures and center-embedding structures, but an incorrect prediction regarding left-branching structures.29

(36) A speaker processing the string PHON(t) will experience greater perceptual difficulty the greater MaxStack_TD(t) is.

Again there is of course the option of attributing the blame to the grammar, rather than to this linking hypothesis: given independent strong evidence that the hypothesis in (36) were correct, we might prefer to modify our hypothesized grammar (e.g. perhaps attributing right-branching structures to the sentences in (24)). But in the next subsection we will see that there is a different linking hypothesis we can combine with this existing grammar that correctly accounts for all of the embedding facts we have discussed.
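The quantity MaxStack_TD in (36) can be computed mechanically by replaying a top-down parse. The following is a minimal Python sketch under stated assumptions: the function name `max_stack_top_down` and the encoding of rules as `(lhs, rhs)` pairs are illustrative choices, not part of the chapter, and the rule sequences are supplied by hand in the order the parser uses them, so no search or ambiguity resolution is modeled.

```python
def max_stack_top_down(rule_sequence, words):
    """Replay a top-down parse and return the largest stack size reached.

    rule_sequence lists rules in the order they are used, as (lhs, rhs)
    pairs. A string rhs is a word (a MATCH step); a tuple rhs is a
    sequence of nonterminals (a PREDICT step). This encoding is an
    assumption made for illustration.
    """
    stack = ["S"]      # top of the stack is at index 0
    position = 0       # number of words consumed so far
    largest = len(stack)
    for lhs, rhs in rule_sequence:
        if not stack or stack[0] != lhs:
            raise ValueError(f"{lhs} is not on top of the stack")
        if isinstance(rhs, str):               # MATCH: rule X -> w
            if position >= len(words) or words[position] != rhs:
                raise ValueError(f"cannot match {rhs!r} at position {position}")
            stack = stack[1:]
            position += 1
        else:                                  # PREDICT: rule X -> Y1 ... Ym
            stack = list(rhs) + stack[1:]
        largest = max(largest, len(stack))
    if stack or position != len(words):
        raise ValueError("rule sequence does not reach the goal configuration")
    return largest


# Left-branching: (24b) and (24c)
rules_24b = [("S", ("NP", "VP")), ("NP", ("NP", "POSS", "N")), ("NP", "John"),
             ("POSS", "'s"), ("N", "dog"), ("VP", ("V",)), ("V", "fled")]
rules_24c = [("S", ("NP", "VP")), ("NP", ("NP", "POSS", "N")),
             ("NP", ("NP", "POSS", "N")), ("NP", "John"), ("POSS", "'s"),
             ("N", "wife"), ("POSS", "'s"), ("N", "dog"),
             ("VP", ("V",)), ("V", "fled")]

# Right-branching: (25b) and (25c), following Table 12.8
rules_25b = [("S", ("NP", "VP")), ("NP", "Mary"), ("VP", ("V", "NP")),
             ("V", "chased"), ("NP", ("D", "N", "SRC")), ("D", "the"),
             ("N", "cat"), ("SRC", ("C", "VP")), ("C", "that"),
             ("VP", ("V", "NP")), ("V", "bit"),
             ("NP", ("D", "N")), ("D", "the"), ("N", "rat")]
rules_25c = rules_25b[:11] + [
    ("NP", ("D", "N", "SRC")), ("D", "the"), ("N", "rat"),
    ("SRC", ("C", "VP")), ("C", "that"), ("VP", ("V", "NP")), ("V", "ate"),
    ("NP", ("D", "N")), ("D", "the"), ("N", "cheese")]

left_b = max_stack_top_down(rules_24b, "John 's dog fled".split())
left_c = max_stack_top_down(rules_24c, "John 's wife 's dog fled".split())
right_b = max_stack_top_down(
    rules_25b, "Mary chased the cat that bit the rat".split())
right_c = max_stack_top_down(
    rules_25c, "Mary chased the cat that bit the rat that ate the cheese".split())
```

Under these assumptions `right_b` and `right_c` are both 3, as in the caption of Table 12.8, while `left_c` exceeds `left_b`, which is the pattern that makes (36) mispredict difficulty for left-branching structures.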

12.3.3 Left-corner parsing

As foreshadowed above, the left-corner parsing method can be seen as a “best of both worlds” mixture of bottom-up and top-down processing, incorporating the advantages of both: the efficient handling of left-branching structures via compact memory representations of already-consumed constituents, and the efficient handling of right-branching structures via compact memory representations of predicted constituents. Configurations still take the form of a stack paired with a buffer, but the stack can store two different kinds of symbols: “barred” versions of non-terminal symbols (e.g. N̄P̄, V̄P̄) which, like all symbols stored by a top-down parser, represent constituents predicted in the remaining portion of a sentence, and “plain”/“unbarred” non-terminal symbols (e.g. NP, VP) which, like all symbols stored by a bottom-up parser, represent constituents formed from the already-consumed portion.30 The left-corner parsing method is defined in (37).31 The starting configuration has S̄ on the stack, analogous to the starting configuration for top-down parsing; the goal is to fulfill this prediction. The top of the stack is on the left. The shift and match transitions are essentially identical to the transitions of the same names in the previous systems: note that shift consumes a word and adds a corresponding bottom-up (i.e. “unbarred”) non-terminal symbol to the stack, and match consumes a word and removes a corresponding top-down (“barred”) non-terminal symbol from the stack. These two simple types of transitions deal with rules that have only terminal symbols on the right-hand side (i.e. rules of the form ‘X → w’). The other two types of transitions, lc-predict and lc-connect, deal with rules that have non-terminal symbols on the right-hand side (i.e. rules of the form ‘X → Y1 . . . Ym’); these transitions are

29 This is essentially the proposal in Yngve (1960).
30 This bar notation has nothing to do with X-bar theory.
31 More specifically, this is the arc-eager variant of left-corner parsing; the arc-standard alternative does not have the same advantages as a cognitive model. See Abney and Johnson (1991) and Resnik (1992) for discussion.


Table 12.9 A first illustration of left-corner parsing.

Step  Type of step  Rule used     Configuration
0     —             —             (S̄, | the dog chased the cat)
1     SHIFT         D → the       (D S̄, the | dog chased the cat)
2     LC-PREDICT    NP → D N      (N̄ NP S̄, the | dog chased the cat)
3     MATCH         N → dog       (NP S̄, the dog | chased the cat)
4     LC-CONNECT    S → NP VP     (V̄P̄, the dog | chased the cat)
5     SHIFT         V → chased    (V V̄P̄, the dog chased | the cat)
6     LC-CONNECT    VP → V NP     (N̄P̄, the dog chased | the cat)
7     SHIFT         D → the       (D N̄P̄, the dog chased the | cat)
8     LC-CONNECT    NP → D N      (N̄, the dog chased the | cat)
9     MATCH         N → cat       (ε, the dog chased the cat | )

more complex because they deal with the interplay between constituents recognized bottom-up (plain symbols) and constituents predicted top-down (barred symbols).

(37) Given a sentence w1 . . . wn to be parsed and a grammar:
• starting configuration: (S̄, | w1 . . . wn)
• goal configuration: (ε, w1 . . . wn | )
• shift transitions: (Σ, w1 . . . | wi . . . wn) ⇒ (X Σ, w1 . . . wi | . . . wn) if there is a rule X → wi in the grammar
• match transitions: (X̄ Σ, w1 . . . | wi . . . wn) ⇒ (Σ, w1 . . . wi | . . . wn) if there is a rule X → wi in the grammar
• lc-predict transitions: (Y1 Σ, w1 . . . | . . . wn) ⇒ (Ȳ2 . . . Ȳm X Σ, w1 . . . | . . . wn) if there is a rule X → Y1 . . . Ym in the grammar
• lc-connect transitions: (Y1 X̄ Σ, w1 . . . | . . . wn) ⇒ (Ȳ2 . . . Ȳm Σ, w1 . . . | . . . wn) if there is a rule X → Y1 . . . Ym in the grammar
(where Σ is a placeholder for sequences of nonterminal symbols)

An example to illustrate is shown in Table 12.9. The shift transitions do bottom-up work, and only manipulate “plain” symbols (e.g. D at Step 1, V at Step 5); the match transitions do top-down work, and only manipulate “barred” symbols (e.g. N̄ at Step 3). The lc-predict transitions and lc-connect transitions both “trade in” a bottom-up recognized nonterminal (no bar) for some number of predictions (with bars) corresponding to that nonterminal’s hypothesized sisters, according to some chosen grammar rule; the left-corner of a context-free rule is the first symbol on the right-hand side, and it is the chosen grammar rule’s left-corner that is “traded in” by these transitions. When lc-predict applies in Step 2, for example, the already-recognized D is hypothesized to be the left-corner of an NP constituent expanded according to the rule ‘NP → D N’; it is therefore traded in for a predicted N̄, accompanied by a bottom-up NP symbol that will be available to work


fig. 12.3 Graphical illustration of lc-predict and lc-connect. The general form of each, corresponding to the appropriate parts of the definition in (37), is shown at the top. The bottom shows two instantiations of this general form that appear in Table 12.9.

with if that prediction of an N is fulfilled. When lc-connect applies in Step 4, the recognized NP (left-corner of the rule ‘S → NP VP’) is similarly traded in for a predicted V̄P̄; what makes lc-connect different is that we put this recognized NP towards the satisfaction of an already predicted instance of the parent nonterminal (here, S̄), so we remove this symbol from the stack rather than adding a bottom-up instance of this parent nonterminal. A graphical illustration of the relationship between lc-predict and lc-connect is shown in Figure 12.3. The latter gets its name from the fact that it connects the part of the tree that’s growing upwards from the words with the part that’s growing downwards. The left-corner parsing system thus has the ability to mix together bottom-up representations of constituents in the already-seen part of the sentence and top-down representations of constituents predicted in the unseen part of the sentence. Note that in Table 12.9, there are compact one-symbol representations of the parser’s state after consuming ‘the dog’ (the single V̄P̄ symbol after Step 4) and after consuming ‘the dog chased’ (the single N̄P̄ symbol after Step 6), as was the case for the top-down parser; but in addition, it also has a compact representation of the state after consuming ‘the dog’ in terms of the recognized NP after Step 3, as was the case for the bottom-up parser. (The fact that there is an additional S̄ symbol on the stack at that point is an unfortunate distraction: this is simply a “constant-sized” indicator that we have not connected with


the overall predicted S̄ node yet, and does not grow with the depth of left-embedding; see Table 12.11.32) This ability to use a single symbol both in situations where the seen portion of the sentence forms a constituent and where the unseen portion of the sentence forms a constituent allows the left-corner parser to mimic the bottom-up parser’s efficient treatment of left-branching structures and mimic the top-down parser’s efficient treatment of right-branching structures. Table 12.11 shows how the stack contents are the same after ‘John’ and after ‘John ’s dog’ (for (24b)) and after ‘John’ and ‘John ’s wife’ and ‘John ’s wife ’s dog’ (for (24c)); a single bottom-up NP symbol can do the same work in all of these cases (in addition to the waiting prediction S̄), creating the familiar looping pattern. And Table 12.12 shows a similar looping pattern for right-branching structures: at the points after ‘chased’ and ‘bit’ and ‘ate’, a single top-down N̄P̄ symbol encapsulates all of the parser’s internal state. But just as neither the bottom-up nor top-down method produced this kind of looping pattern on center-embedding structures, maximum stack depth for the left-corner parser increases with the depth of embedding for these sentences, as shown in Table 12.10.33 The stack size requirements for left-corner parsing therefore show the following pattern:

MaxStack_LC(tree for (24b)) = MaxStack_LC(tree for (24c))    (left-branching)
MaxStack_LC(tree for (25b)) = MaxStack_LC(tree for (25c))    (right-branching)
MaxStack_LC(tree for (26b)) < MaxStack_LC(tree for (26c))    (center-embedding)

which in combination with the linking hypothesis that

(38) A speaker processing the string PHON(t) will experience greater perceptual difficulty the greater MaxStack_LC(t) is.

produces the full range of predictions that we set out to achieve.
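The four transition types in (37) and the stack sizes reported in Tables 12.10 and 12.11 can be checked with a short replay. This is a minimal Python sketch, not the chapter's own implementation: the function name `max_stack_left_corner`, the step encoding `(kind, lhs, rhs)`, and the use of a Boolean flag to mark barred (predicted) symbols are all assumptions made for illustration. The transition sequences are given by hand, so no search is modeled.

```python
def max_stack_left_corner(steps, words):
    """Replay an arc-eager left-corner parse; return the largest stack size.

    Stack symbols are (category, barred) pairs, where barred=True marks a
    predicted constituent. Each step is (kind, lhs, rhs): rhs is a word for
    shift/match and a tuple of nonterminals for lc-predict/lc-connect.
    """
    stack = [("S", True)]      # start with the prediction of an S; top is index 0
    position = 0
    largest = len(stack)
    for kind, lhs, rhs in steps:
        if kind in ("shift", "match"):
            if position >= len(words) or words[position] != rhs:
                raise ValueError(f"cannot consume {rhs!r} at position {position}")
            if kind == "shift":                  # push a plain X for rule X -> w
                stack = [(lhs, False)] + stack
            else:                                # pop a barred X for rule X -> w
                if not stack or stack[0] != (lhs, True):
                    raise ValueError(f"no prediction of {lhs} on top")
                stack = stack[1:]
            position += 1
        else:
            if not stack or stack[0] != (rhs[0], False):
                raise ValueError(f"left corner {rhs[0]} is not on top")
            sisters = [(y, True) for y in rhs[1:]]
            if kind == "lc-predict":             # keep a plain parent on the stack
                stack = sisters + [(lhs, False)] + stack[1:]
            elif kind == "lc-connect":           # discharge a predicted parent
                if len(stack) < 2 or stack[1] != (lhs, True):
                    raise ValueError(f"no prediction of {lhs} to connect to")
                stack = sisters + stack[2:]
            else:
                raise ValueError(f"unknown transition {kind!r}")
        largest = max(largest, len(stack))
    if stack or position != len(words):
        raise ValueError("steps do not reach the goal configuration")
    return largest


NP_ORC, NP_DN, ORC = ("D", "N", "ORC"), ("D", "N"), ("NP", "V")
TAIL = [("lc-connect", "S", ("NP", "VP")), ("shift", "V", "fled"),
        ("lc-connect", "VP", ("V",))]

steps_26b = [("shift", "D", "the"), ("lc-predict", "NP", NP_ORC),
             ("match", "N", "rat"), ("shift", "D", "the"),
             ("lc-predict", "NP", NP_DN), ("match", "N", "cat"),
             ("lc-connect", "ORC", ORC), ("match", "V", "chased")] + TAIL
steps_26c = [("shift", "D", "the"), ("lc-predict", "NP", NP_ORC),
             ("match", "N", "rat"), ("shift", "D", "the"),
             ("lc-predict", "NP", NP_ORC), ("match", "N", "cat"),
             ("shift", "D", "the"), ("lc-predict", "NP", NP_DN),
             ("match", "N", "dog"), ("lc-connect", "ORC", ORC),
             ("match", "V", "bit"), ("lc-connect", "ORC", ORC),
             ("match", "V", "chased")] + TAIL

POSS_STEP = [("lc-predict", "NP", ("NP", "POSS", "N")), ("match", "POSS", "'s")]
steps_24b = ([("shift", "NP", "John")] + POSS_STEP
             + [("match", "N", "dog")] + TAIL)
steps_24c = ([("shift", "NP", "John")] + POSS_STEP + [("match", "N", "wife")]
             + POSS_STEP + [("match", "N", "dog")] + TAIL)

center_b = max_stack_left_corner(steps_26b, "the rat the cat chased fled".split())
center_c = max_stack_left_corner(
    steps_26c, "the rat the cat the dog bit chased fled".split())
left_b = max_stack_left_corner(steps_24b, "John 's dog fled".split())
left_c = max_stack_left_corner(steps_24c, "John 's wife 's dog fled".split())
```

With these inputs the center-embedding values differ (5 versus 7, as in Table 12.10) while the left-branching values coincide (4 in both cases, as in Table 12.11). Representing the bar as a flag rather than as a separate symbol inventory is only a convenience; it keeps the lc-predict and lc-connect cases visibly parallel.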

12.3.4 Take-home messages

Having looked at the specifics of these context-free parsing methods in some depth, what have we learnt? There are a few take-home messages.

32 One can flip things around so that there is one extra symbol on the stack after the root S node has been created rather than before, by setting the start configuration to have the empty stack and the goal configuration to have a bottom-up S. The effect is just that the relevant instance of lc-connect “in the middle,” eliminating S̄, is replaced by a corresponding instance of lc-predict, introducing S.
33 The underlying point here on which the overall argument relies is that only center-embedding structures yield the kind of nesting patterns in strings that are beyond the reach of a finite-state machine (e.g. Chomsky 1963: 394–395). The finitely many configurations that are visited in the course of parsing an arbitrarily large left-branching or right-branching structure can be simulated by a finite-state machine.


Table 12.10 The effect of center-embedding on left-corner parsing. Memory load increases as embedding depth increases: maximum of 5 symbols for (26b) but 7 symbols for (26c).

(26b): 0 the 1 rat 2 the 3 cat 4 chased 5 fled 6

Transition  Rule used         Configuration
—           —                 (S̄, 0)
SHIFT       D → the           (D S̄, 1)
LC-PREDICT  NP → D N ORC      (N̄ ŌR̄C̄ NP S̄, 1)
MATCH       N → rat           (ŌR̄C̄ NP S̄, 2)
SHIFT       D → the           (D ŌR̄C̄ NP S̄, 3)
LC-PREDICT  NP → D N          (N̄ NP ŌR̄C̄ NP S̄, 3)
MATCH       N → cat           (NP ŌR̄C̄ NP S̄, 4)
LC-CONNECT  ORC → NP V        (V̄ NP S̄, 4)
MATCH       V → chased        (NP S̄, 5)
LC-CONNECT  S → NP VP         (V̄P̄, 5)
SHIFT       V → fled          (V V̄P̄, 6)
LC-CONNECT  VP → V            (ε, 6)

(26c): 0 the 1 rat 2 the 3 cat 4 the 5 dog 6 bit 7 chased 8 fled 9

Transition  Rule used         Configuration
—           —                 (S̄, 0)
SHIFT       D → the           (D S̄, 1)
LC-PREDICT  NP → D N ORC      (N̄ ŌR̄C̄ NP S̄, 1)
MATCH       N → rat           (ŌR̄C̄ NP S̄, 2)
SHIFT       D → the           (D ŌR̄C̄ NP S̄, 3)
LC-PREDICT  NP → D N ORC      (N̄ ŌR̄C̄ NP ŌR̄C̄ NP S̄, 3)
MATCH       N → cat           (ŌR̄C̄ NP ŌR̄C̄ NP S̄, 4)
SHIFT       D → the           (D ŌR̄C̄ NP ŌR̄C̄ NP S̄, 5)
LC-PREDICT  NP → D N          (N̄ NP ŌR̄C̄ NP ŌR̄C̄ NP S̄, 5)
MATCH       N → dog           (NP ŌR̄C̄ NP ŌR̄C̄ NP S̄, 6)
LC-CONNECT  ORC → NP V        (V̄ NP ŌR̄C̄ NP S̄, 6)
MATCH       V → bit           (NP ŌR̄C̄ NP S̄, 7)
LC-CONNECT  ORC → NP V        (V̄ NP S̄, 7)
MATCH       V → chased        (NP S̄, 8)
LC-CONNECT  S → NP VP         (V̄P̄, 8)
SHIFT       V → fled          (V V̄P̄, 9)
LC-CONNECT  VP → V            (ε, 9)

• The main goal is to convey a sense of what it is that one needs to add to a grammar to produce a theory of sentence comprehension: what needs to be added is what the bottom-up, top-down, and left-corner methods all manage to add to (23). The grammar partially determines what the resulting combined system does: it should be clear that the grammar does not fully determine it, since we have seen three distinct options, each with different empirical pros and cons (while leaving the grammar unchanged). There are of course alternatives to breaking down


Table 12.11 The effect of left-embedding on left-corner parsing. No increase in memory load as embedding depth increases: maximum of 4 symbols in both cases.

(24b): 0 John 1 ’s 2 dog 3 fled 4

Transition  Rule used         Configuration
—           —                 (S̄, 0)
SHIFT       NP → John         (NP S̄, 1)
LC-PREDICT  NP → NP POSS N    (P̄ŌS̄S̄ N̄ NP S̄, 1)
MATCH       POSS → ’s         (N̄ NP S̄, 2)
MATCH       N → dog           (NP S̄, 3)
LC-CONNECT  S → NP VP         (V̄P̄, 3)
SHIFT       V → fled          (V V̄P̄, 4)
LC-CONNECT  VP → V            (ε, 4)

(24c): 0 John 1 ’s 2 wife 3 ’s 4 dog 5 fled 6

Transition  Rule used         Configuration
—           —                 (S̄, 0)
SHIFT       NP → John         (NP S̄, 1)
LC-PREDICT  NP → NP POSS N    (P̄ŌS̄S̄ N̄ NP S̄, 1)
MATCH       POSS → ’s         (N̄ NP S̄, 2)
MATCH       N → wife          (NP S̄, 3)
LC-PREDICT  NP → NP POSS N    (P̄ŌS̄S̄ N̄ NP S̄, 3)
MATCH       POSS → ’s         (N̄ NP S̄, 4)
MATCH       N → dog           (NP S̄, 5)
LC-CONNECT  S → NP VP         (V̄P̄, 5)
SHIFT       V → fled          (V V̄P̄, 6)
LC-CONNECT  VP → V            (ε, 6)

the sentence comprehension system into two components with the shapes that I have outlined—including, for example, the option of not breaking it down into any two components thought of as a “grammar” and a “parser” at all, as proposed by Phillips (1996). But to the extent that one would like to take as a starting point a grammar in the sense that has (for better or for worse) become conventional, things have already been carved up in a certain way; the bottom-up, top-down, and left-corner parsing methods are examples of things that have the shape of what got carved off to leave behind a grammar like (23). There is no category error or irreparable clash of perspectives to be overcome in formulating machinery that bridges the gap between grammars and algorithmic processing mechanisms, but the task is more difficult to approach if we do not have a clear picture of exactly what the relevant grammars look like. • The question of what sequence of steps a parser goes through to arrive at a particular structural description can be separated from the question of ambiguity resolution. The first question is whether, for example, a parser arrives at a structural


Table 12.12 The effect of right-embedding on left-corner parsing. No increase in memory load as embedding depth increases: maximum of 2 symbols in both cases.

(25b): 0 Mary 1 chased 2 the 3 cat 4 that 5 bit 6 the 7 rat 8

Transition  Rule used         Configuration
—           —                 (S̄, 0)
SHIFT       NP → Mary         (NP S̄, 1)
LC-CONNECT  S → NP VP         (V̄P̄, 1)
SHIFT       V → chased        (V V̄P̄, 2)
LC-CONNECT  VP → V NP         (N̄P̄, 2)
SHIFT       D → the           (D N̄P̄, 3)
LC-CONNECT  NP → D N SRC      (N̄ S̄R̄C̄, 3)
MATCH       N → cat           (S̄R̄C̄, 4)
SHIFT       C → that          (C S̄R̄C̄, 5)
LC-CONNECT  SRC → C VP        (V̄P̄, 5)
SHIFT       V → bit           (V V̄P̄, 6)
LC-CONNECT  VP → V NP         (N̄P̄, 6)
SHIFT       D → the           (D N̄P̄, 7)
LC-CONNECT  NP → D N          (N̄, 7)
MATCH       N → rat           (ε, 8)

(25c): 0 Mary 1 chased 2 the 3 cat 4 that 5 bit 6 the 7 rat 8 that 9 ate 10 the 11 cheese 12

Transition  Rule used         Configuration
—           —                 (S̄, 0)
SHIFT       NP → Mary         (NP S̄, 1)
LC-CONNECT  S → NP VP         (V̄P̄, 1)
SHIFT       V → chased        (V V̄P̄, 2)
LC-CONNECT  VP → V NP         (N̄P̄, 2)
SHIFT       D → the           (D N̄P̄, 3)
LC-CONNECT  NP → D N SRC      (N̄ S̄R̄C̄, 3)
MATCH       N → cat           (S̄R̄C̄, 4)
SHIFT       C → that          (C S̄R̄C̄, 5)
LC-CONNECT  SRC → C VP        (V̄P̄, 5)
SHIFT       V → bit           (V V̄P̄, 6)
LC-CONNECT  VP → V NP         (N̄P̄, 6)
SHIFT       D → the           (D N̄P̄, 7)
LC-CONNECT  NP → D N SRC      (N̄ S̄R̄C̄, 7)
MATCH       N → rat           (S̄R̄C̄, 8)
SHIFT       C → that          (C S̄R̄C̄, 9)
LC-CONNECT  SRC → C VP        (V̄P̄, 9)
SHIFT       V → ate           (V V̄P̄, 10)
LC-CONNECT  VP → V NP         (N̄P̄, 10)
SHIFT       D → the           (D N̄P̄, 11)
LC-CONNECT  NP → D N          (N̄, 11)
MATCH       N → cheese        (ε, 12)


description for ‘the dog chased the cat’ via the sequence of steps shown in Table 12.1, Table 12.5, or Table 12.9. The second question concerns whether any “wrong turns” away from these paths through the search space are taken; recall the illustration of the late-closure effect in (33). At least below a certain level of abstraction, the second question presupposes an answer to the first question. As (33) illustrated, in the context of bottom-up parsing, the late closure preference amounts to a preference for shift transitions over reduce transitions; but in the context of top-down parsing it amounts to a relative preference amongst predict transitions, preferring to use rules with longer right-hand sides (e.g. VP → V NP) over those with shorter right-hand sides (e.g. VP → V).
• The case study of center-embedding provides a concrete illustration of what it can look like for a theory to say that a certain sentence violates no grammatical constraint and yet elicits judgements of unacceptability due to a precisely characterized form of processing difficulty. A linking hypothesis such as (38) is a testable “reductionist” account of certain acceptability facts that has the explanatory force to make predictions about sentences other than those that originally motivated it (cf. Sprouse et al. 2013; Phillips 2013: 159–160).
• In comparison with Section 12.2, considering specific parsing methods demonstrates the lengths to which one does not need to go in order for information-theoretic complexity metrics to get off the ground. In at least one reasonable sense, the linking hypotheses in Section 12.2 concern not how parsing happens, but rather what parsing achieves; accordingly, they can be described as pertaining to Marr’s (1982) “computational level,” in contrast to the linking hypotheses in this section which pertain to the “algorithmic level.”
• A comparison between left-corner parsing on the one hand, and bottom-up and top-down parsing on the other, reveals a difference in what can be called the “transparency” of the parser–grammar relation (Berwick and Weinberg 1984: 39–42). In the case of bottom-up and top-down parsing there is a particularly direct relationship between grammatical derivations and the steps involved in parsing, as Kanazawa (2016) emphasizes. Specifically, there is a one-to-one correspondence between grammatical rules and parsing transitions: parsing a sentence whose derivation includes a use of the rule ‘NP → D N’, for example, will necessarily involve a reduce step that replaces ‘D N’ with ‘NP’ if done bottom-up, and will necessarily involve a predict step that replaces ‘NP’ with ‘D N’ if done top-down. The relationship is less direct in the case of left-corner parsing, however. Knowing that a certain sentence’s derivation includes a use of the rule ‘NP → D N’ does not allow us to conclude that any particular transition will necessarily be involved in parsing it with the left-corner system, because this piece of grammatical structure might be established via an lc-predict transition or via an lc-connect transition. So in this case, although any given transition the parser takes is licensed by a particular grammatical rule, the relationship between grammatical rules and parsing operations is one-to-many.


12.4 Connecting to contemporary syntax

..........................................................................................................................

In both of the preceding sections, I have used simple finite-state or context-free grammars for illustrative purposes. Of course modern theories of natural language syntax do not generally take this form, so it is important to ask how the various information-theoretic and automata-theoretic concepts outlined in the previous sections can be connected to more expressive kinds of grammars along the lines of what contemporary syntacticians are working with. A line of work descending from Stabler (1997) has shed light on this question as it relates to modern theories in the minimalist tradition, and this is what I will focus on here. But to a degree that is perhaps surprising, a similar story can be told for other systems such as Tree Adjoining Grammars (Joshi et al. 1975; Joshi 1985; Abeillé and Rambow 2000; Frank 2002) and Combinatory Categorial Grammars (Ades and Steedman 1982; Steedman 1996; 2000); see Stabler (2011) and Joshi et al. (1990) for discussion.

The crucial underlying issue is to identify exactly how minimalist grammars differ from CFGs, or exactly what minimalist grammars add to CFGs. An understanding of this provides a clear picture of what needs to be finessed in order for minimalist grammars to be plugged in to linking hypotheses of the sorts outlined in the previous sections.

Michaelis (2001) showed that the minimalist grammars formulated in Stabler (1997) are in fact very similar to CFGs at a certain significant level of abstraction (see also Kobele et al. 2007). Since the ways in which the ideas from previous sections relate to CFGs are generally well-understood, this paved the way for many of these ideas to be adapted to minimalist grammars. This means that a certain amount (not all) of the work involved in understanding how to formulate interesting linking hypotheses for minimalist grammars just is the work of understanding how to do so for CFGs. The same ideas play important roles.
But understanding exactly how those familiar important ideas fit into the ecosystem of minimalist grammars involves a certain adjustment of perspective, because of the abstract level at which the crucial similarities identified by Michaelis reside. In other words, while there are objects in the minimalist grammar ecosystem that one can reason about using the thought patterns that are familiar from CFGs (that is the good news, which makes the easy part of the task easy), they are not objects that linguists are generally in the habit of writing down (this is the bad news, which makes the hard part of the task hard). In particular, there are objects in the minimalist grammar ecosystem that license the same simple, familiar patterns of reasoning that we naturally apply to CFG trees like the ones in (28), (29), and (30) above (the good news); but the relevant objects are not, despite surface similarities, the trees conventionally used to illustrate transformational derivations like the one in (39) (the bad news).


(39) [CP [DP what] [C′ C [TP [TP [DP [D the] [N girl]] [T′ [T will] [VP [V buy] t]]] [AdvP tomorrow]]]]

The pivotal property that is familiar from CFGs—and present in an obscured underlying sense in minimalist grammars—is a certain kind of interchangeability of subexpressions. Consider the two trees shown in (40), both generated by the grammar used in Section 12.3.

(40) [S [NP [D the] [N dog]] [VP [VP [V chased] [NP Mary]] [PP [P with] [NP [D the] [N cat]]]]]
     [S [NP John] [VP [V bit] [NP [NP [D the] [N cat]] [POSS ’s] [N rat]]]]

Both contain a node labeled VP. A fundamental consequence of this is that the two subtrees dominated by those two VP nodes are interchangeable, in the sense that if we swap one for the other we are guaranteed to get more trees that are well-formed according to the same grammar. Specifically, we can swap the VP subtrees around to get the two trees in (41), which are also both generated by the same grammar.

(41) [S [NP [D the] [N dog]] [VP [V bit] [NP [NP [D the] [N cat]] [POSS ’s] [N rat]]]]
     [S [NP John] [VP [VP [V chased] [NP Mary]] [PP [P with] [NP [D the] [N cat]]]]]
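The swap that turns the trees in (40) into those in (41) can be made concrete with a small Python sketch. Everything here is an illustrative assumption rather than the chapter's formalism: trees are nested tuples `(label, child, ...)` with words as plain strings, the helper names (`leaves`, `local_rules`, `find`, `get`, `put`, `swap`) are hypothetical, and the analysis of ‘chased Mary with the cat’ as VP → VP PP is one reading of the first tree in (40).

```python
def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def local_rules(tree):
    """Yield (parent, children) for every node, i.e. the rules a tree uses."""
    if isinstance(tree, str):
        return
    label, *children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        yield from local_rules(child)


def find(tree, label, path=()):
    """Return the path (child indices) to the first node with this label."""
    if isinstance(tree, str):
        return None
    if tree[0] == label:
        return path
    for index, child in enumerate(tree[1:], start=1):
        found = find(child, label, path + (index,))
        if found is not None:
            return found
    return None


def get(tree, path):
    """Return the subtree at the given path."""
    for index in path:
        tree = tree[index]
    return tree


def put(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    index = path[0]
    return tree[:index] + (put(tree[index], path[1:], new),) + tree[index + 1:]


def swap(tree_a, tree_b, label):
    """Exchange the first subtrees carrying this label in the two trees."""
    path_a, path_b = find(tree_a, label), find(tree_b, label)
    return (put(tree_a, path_a, get(tree_b, path_b)),
            put(tree_b, path_b, get(tree_a, path_a)))


the_cat = ("NP", ("D", "the"), ("N", "cat"))
tree_a = ("S", ("NP", ("D", "the"), ("N", "dog")),
          ("VP", ("VP", ("V", "chased"), ("NP", "Mary")),
           ("PP", ("P", "with"), the_cat)))
tree_b = ("S", ("NP", "John"),
          ("VP", ("V", "bit"), ("NP", the_cat, ("POSS", "'s"), ("N", "rat"))))

new_a, new_b = swap(tree_a, tree_b, "VP")
rules_before = set(local_rules(tree_a)) | set(local_rules(tree_b))
rules_after = set(local_rules(new_a)) | set(local_rules(new_b))
```

The point of comparing `rules_after` with `rules_before` is that the swap cannot introduce a local configuration that was not already licensed: every parent-children pair in the new trees already occurred in one of the old ones, so no grammar needs to be consulted to know the results are well-formed.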

This is a consequence of the way the “goodness as a VP” of a particular subexpression is independent of the environment in which that subexpression might appear. When we describe something of the form ‘VP → V NP’ as a context-free rule, what we are saying is precisely that the right-hand side of the rule is a valid way for a VP to be constituted no matter what context this VP might be appearing in. Since the phrase ‘bit the cat’s rat’ is good enough to fit under the node labeled VP in the second tree in (40), it can’t fail to be good enough to fit under the node labeled VP in the first tree in (41)—it could only fail if there were conditions on “VP-hood” that depended on the environment into which a putative VP is to be put, but by assumption there are none. This kind of modularity of tree structures is exactly what makes CFGs easy to work with, and plays an important role in operationalizing the linking hypotheses discussed above in combination with CFGs. Recall from Section 12.2 the idea of generating a sentence by generating independently chosen subparts, and the way this is the key point of contact between a hypothesized grammatical structure and the corresponding range of probability distributions. The relationship between (40) and (41) could be restated in probabilistic terms as follows (lazily blurring the distinction between trees and strings for a moment): since

P(S →∗ the dog chased Mary with the cat) = P(S → NP VP) × P(NP →∗ the dog) × P(VP →∗ chased Mary with the cat)

and

P(S →∗ John bit the cat’s rat) = P(S → NP VP) × P(NP →∗ John) × P(VP →∗ bit the cat’s rat)

we can conclude from the fact that these two sentences have non-zero probabilities that the six multiplied probabilities on the right-hand sides of these equations are also all


non-zero; and by gluing the pieces together differently we can conclude that

P(S →∗ the dog bit the cat’s rat) = P(S → NP VP) × P(NP →∗ the dog) × P(VP →∗ bit the cat’s rat)

and

P(S →∗ John chased Mary with the cat) = P(S → NP VP) × P(NP →∗ John) × P(VP →∗ chased Mary with the cat)

are both greater than zero, mirroring the conclusion about categorial grammaticality above.34

The role that this interchangeability plays in the automata-theoretic models from Section 12.3 is perhaps slightly less obvious but still significant. The way in which a parsing system can process unboundedly large structures of a certain sort involved the presence of a loop in the contents of the stack. Take, for example, the looping shown in Table 12.12, where we return to configurations where the stack contains just a single NP prediction. The reason that this single stack symbol suffices both after processing only the first two words ‘Mary chased’ of (25b) and after processing the first six words ‘Mary chased the cat that bit’ is exactly that the two corresponding remaining portions—‘the cat that bit the rat’ and ‘the rat’—are interchangeable by virtue of both being NPs. Similarly, the looping shown in Table 12.3 is a consequence of the way the difference between having consumed ‘John’ and having consumed ‘John’s dog’ is irrelevant to what may come next, precisely because ‘John’ and ‘John’s dog’ can go in all the same places. Knowing that irrelevant distinctions like this do not need to be tracked is a key part of what distinguishes a parsing mechanism from a simple device that is equipped with a lookup table of complete sentences and merely searches for matches on the basis of its entire input at once.

For comparison, consider now two trees of the sort usually used to represent minimalist-style derivations.

34 What is actually happening is that we have done the same calculation twice, once in the Boolean semiring and once in the probability semiring; non-zero probability values correspond to the Boolean value true, and zero probability values correspond to the Boolean value false. Notice, for example, that the relevant categorial well-formedness calculation could be stated as follows: S →∗ the dog bit the cat’s rat if S → NP VP and NP →∗ the dog and VP →∗ bit the cat’s rat. The deep connection here has far-reaching unifying consequences; see Goodman (1998; 1999).
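The remark that the same calculation is carried out once in the Boolean semiring and once in the probability semiring can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the chapter's or Goodman's implementation: the function name `tree_value` is hypothetical, the rule probabilities in `PROB` are invented purely for the example, and the code evaluates a single tree rather than summing over all trees for a string (the same tree/string blurring the main text allows itself).

```python
import math
import operator


def tree_value(tree, weight, times, one):
    """Combine the weights of all rules used in a tree with `times`.

    Trees are nested tuples (label, child, ...) with words as strings.
    `weight` maps (parent, children) rules to semiring values, `times` is
    the semiring product, and `one` is its identity element.
    """
    if isinstance(tree, str):
        return one
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    value = weight[(label, rhs)]
    for child in children:
        value = times(value, tree_value(child, weight, times, one))
    return value


# Hypothetical rule probabilities, chosen only for illustration.
PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("D", "N")): 0.5,
    ("NP", ("NP", "POSS", "N")): 0.25,
    ("VP", ("V", "NP")): 0.5,
    ("D", ("the",)): 1.0,
    ("N", ("dog",)): 0.25,
    ("N", ("cat",)): 0.25,
    ("N", ("rat",)): 0.25,
    ("V", ("bit",)): 0.5,
    ("POSS", ("'s",)): 1.0,
}
# The Boolean weights record only whether a rule is available at all.
BOOL = {rule: p > 0 for rule, p in PROB.items()}

np_tree = ("NP", ("D", "the"), ("N", "dog"))
vp_tree = ("VP", ("V", "bit"),
           ("NP", ("NP", ("D", "the"), ("N", "cat")),
            ("POSS", "'s"), ("N", "rat")))
s_tree = ("S", np_tree, vp_tree)

# Probability semiring: multiply real numbers, identity 1.0.
p_np = tree_value(np_tree, PROB, operator.mul, 1.0)
p_vp = tree_value(vp_tree, PROB, operator.mul, 1.0)
p_s = tree_value(s_tree, PROB, operator.mul, 1.0)

# Boolean semiring: conjoin truth values, identity True.
b_s = tree_value(s_tree, BOOL, operator.and_, True)
```

Only the `weight`, `times`, and `one` arguments change between the two calls, which is the sense in which it is one calculation done twice. Here `p_s` factors as P(S → NP VP) × `p_np` × `p_vp`, and `b_s` is true exactly when `p_s` is non-zero.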

(42) [CP [DP what] [C′ C [TP [TP [DP [D the] [N girl]] [T′ [T will] [VP [V buy] t]]] [AdvP tomorrow]]]]
     [CP C [TP [TP [DP [D the] [N girl]] [T′ [T will] [VP [V buy] [DP it]]]] [AdvP tomorrow]]]

Notice that, following the usual assumptions within transformational grammar, both of the trees in (42) contain a node labeled VP, just as the two trees in (40) did. But here, this does not license a conclusion that a certain subpart of one tree can be swapped with the other.35 The simple VP ‘buy it’ in the second tree cannot be substituted for any corresponding constituent in the first tree. And there is no “VP constituent” of the first tree that can be substituted into the corresponding position in the second tree. (It doesn’t matter whether we imagine that the VP constituent in the first tree contains both the moved phrase ‘what’ and its trace, or only the trace: if we take the relevant constituent to include both, then the head of this chain will have no appropriate slot to

35 This is in effect the point that motivates the decision, in GPSG and its descendants (Gazdar 1981; Gazdar et al. 1985), to not give these nodes the same label: In these frameworks, the node labeled VP in the first tree in (42) would instead have the label VP/NP. I return to a comparison between the approach in the main text and that of GPSG below.


fit into since the second tree has a non-interrogative C head; if we take the relevant constituent to include only the trace, then we will have something equivalent to an unbound trace.) While both trees have a node labeled VP, they do not have any corresponding interchangeable subparts in the sense illustrated above for CFGs.

So what it means for a node to be labeled VP (or anything else) in trees of the sort in (42) is simply not the same as what it means for a node to be labeled VP (or anything else) in trees of the sort in (40). The presence of movement arrows does not only distort the surface word order—it also puts tangles into the otherwise modular workings of the grammar, and this modularity was the key to operationalizing the various linking hypotheses discussed above. Although transformational grammars can in a sense be thought of as the result of “adding movement to a CFG,” this does not mean that the parts of the trees in (42) that are not movement arrows can be understood exactly as all of the tree structure in (40) can. This is, as I mentioned above, the bad news; a shift in perspective is required.

The good news is that there does exist a different way of saying what the parts of ‘what the girl will buy tomorrow’ are and giving labels to those parts, such that parts with the same labels can be interchanged just as they could in CFGs.36 The tree that says what those parts are, what labels they have, and how they are put together, is shown in (43).

(43) [CP [C′,-wh C [TP,-wh [TP,-wh [DP [D the] [N girl]] [T′,-wh [T will] [VP,-wh [V buy] [DP[-wh] what]]]] [AdvP tomorrow]]]]
     [CP C [TP [TP [DP [D the] [N girl]] [T′ [T will] [VP [V buy] [DP it]]]] [AdvP tomorrow]]]

The perspective that we are switching to does not require any changes to how we think of movement-free derivations, so the second tree in (43) is the same as the second tree in (42). The fact that the first derivation has no subpart that can be interchanged with the VP ‘buy it’ is now accurately reflected in the fact that no node in the first tree shares this label. Writing ‘DP[-wh]’ rather than ‘DP’ as the label for ‘what’ simply says that it is the kind of thing that undergoes wh-movement. When we write things like ‘VP,-wh’ and ‘TP,-wh’, we are indicating the parts of the tree that, due to the tangles introduced by movement, are not interchangeable with subparts that simply bear the labels ‘VP’ and ‘TP’. By encoding the fact that the tangling extends up as high as the C′,-wh node but not the CP node immediately above it, these annotations also encode the fact that the CP expression was constructed out of the C′,-wh expression in a way that involved resolving the tangle—in other words, by satisfying the requirement that ‘what’ undergoes wh-movement—so there is no need to also draw a line connecting the CP node to ‘what’. Such trees not only bring to the surface the fact that nothing in the first tree can be interchanged with ‘buy it’, they also make transparent the interchangeability relations that the expression constructed out of ‘buy’ and ‘what’ does participate in. The corresponding tree for ‘which book John should read’ is shown in (44).

(44) [CP [C′,-wh C [TP,-wh [DP John] [T′,-wh [T should] [VP,-wh [V read] [DP[-wh] [D[-wh] which] [N book]]]]]]]

36 Stabler (2011: 624) mentions the crucial partitioning of expressions in a minimalist grammar. This is analogous to the way a CFG creates a partition of possible expressions where two expressions belong to the same equivalence class if and only if they are derived from the same non-terminal. In an FSA, the analogous concept is highlighted by the Myhill–Nerode Theorem (Hopcroft and Ullman 1979: 65).


This tree does have a node labeled ‘VP,-wh’, and this now does license the conclusion that certain other expressions will inevitably be grammatical. Specifically, we can swap around this subtree with the one bearing the same label in the first tree in (43), to produce these new grammatical trees: (45)

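[CP [C′,-wh C [TP,-wh [TP,-wh [DP [D the] [N girl]] [T′,-wh [T will] [VP,-wh [V read] [DP[-wh] [D[-wh] which] [N book]]]]] [AdvP tomorrow]]]]

[CP [C′,-wh C [TP,-wh [DP John] [T′,-wh [T should] [VP,-wh [V buy] [DP[-wh] what]]]]]]

(In the original figure, the swapped VP,-wh subtrees are boxed.)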


What is happening here is that we are rearranging the pieces of the two expressions

(46) a. what the girl will buy tomorrow (i.e. first tree in (43))
     b. which book John should read (i.e. tree in (44))

to yield the pair

(47) a. which book the girl will read tomorrow (i.e. first tree in (45))
     b. what John should buy (i.e. second tree in (45))

in just the same way we did for the CFG trees above. The pieces being swapped, indicated by boxes in the trees above, do not correspond to contiguous portions of the eventual linearized strings as they do with CFGs, but to get too distracted by this detail would be to focus unduly on these strings rather than the structure-building machinery of the grammar.

The trees in (43), (44), and (45) can be described as derivation trees. By adopting this representation, we focus attention on the way independently chosen pieces were snapped together to form a larger whole. What (45) highlights is the fact that the object that we get by combining ‘buy’ and ‘what’ can be used in all the same ways as the one we get by combining ‘read’ with the combination of ‘which’ and ‘book’; the relationship between these objects and final surface word order is not straightforward, but what matters is that it is the same for both of them. The trees in (42), in contrast, give priority to surface constituency over “combinability.” In the case of a CFG one need not choose which of these two properties to focus on, because the two are conflated, or perhaps we should say confounded.

The minimalist grammar derivation trees bear a significant resemblance to T-markers in early transformational grammar (e.g. Chomsky 1965: 130), including the way binary-branching nodes represent operations that combine two expressions (i.e. generalized transformations) and unary-branching nodes represent operations that adjust the surface word order of an existing expression (i.e. singulary transformations); see Hunter (2019b) for related discussion. They focus attention on what the grammar generates and how it generates those things (and to what degree the way it generates this thing might overlap with the way it generates that thing) rather than how those things are pronounced.
The perspective they provide is therefore in line with the recent trend in minimalist syntax towards thinking of externalization as a relatively incidental aspect of the human language system. Abandoning the simple and familiar relationship between externalization and structure exhibited by CFG trees is arguably an overdue step that needs to be taken in order to bring our thinking fully into line with the fact that natural language grammars are not CFGs; perhaps adding movement arrows like in (42) was a temporarily useful cheap quick-fix.
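The interchangeability that (45) illustrates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple encoding and the helper names (`find`, `get`, `replace`, `swap`, `words`) are hypothetical and are not part of any minimalist grammar toolkit; only the node labels are taken from (43) and (44).

```python
# Derivation trees as nested tuples: (label, child, child, ...); leaves are strings.
# Labels follow (43) and (44); the encoding itself is an illustrative assumption.
VP_BUY = ("VP,-wh", ("V", "buy"), ("DP[-wh]", "what"))
VP_READ = ("VP,-wh", ("V", "read"),
           ("DP[-wh]", ("D[-wh]", "which"), ("N", "book")))

TREE_43 = ("CP", ("C',-wh", ("C",),
           ("TP,-wh",
            ("TP,-wh", ("DP", ("D", "the"), ("N", "girl")),
                       ("T',-wh", ("T", "will"), VP_BUY)),
            ("AdvP", "tomorrow"))))

TREE_44 = ("CP", ("C',-wh", ("C",),
           ("TP,-wh", ("DP", "John"),
                      ("T',-wh", ("T", "should"), VP_READ))))


def find(tree, label, path=()):
    """Return the path (child indices) to the first node bearing `label`, or None."""
    if isinstance(tree, str):
        return None
    if tree[0] == label:
        return path
    for i, child in enumerate(tree[1:], start=1):
        found = find(child, label, path + (i,))
        if found is not None:
            return found
    return None


def get(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def swap(t1, t2, label):
    """Exchange the first subtrees of t1 and t2 that bear exactly the same label."""
    p1, p2 = find(t1, label), find(t2, label)
    if p1 is None or p2 is None:
        raise ValueError(f"no node labeled {label!r} in one of the trees")
    return replace(t1, p1, get(t2, p2)), replace(t2, p2, get(t1, p1))


def words(tree):
    """Leaves in derivation-tree order (base positions, not surface word order)."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


new_43, new_44 = swap(TREE_43, TREE_44, "VP,-wh")
print(words(new_43))  # ['the', 'girl', 'will', 'read', 'which', 'book', 'tomorrow']
print(words(new_44))  # ['John', 'should', 'buy', 'what']
```

Because labels are compared exactly, asking for a plain `VP` in `TREE_43` finds nothing, which mirrors the point that ‘buy it’ has no interchangeable counterpart in the wh-tree. The leaf order printed here is derivation order; linearization would need separate machinery.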


In short, derivation trees allow us to think about the range of possibilities allowed by a minimalist grammar in exactly the same modular, tractable way that we think about the range of possibilities allowed by a CFG.37 (As an aside: the trees in (43), (44), and (45) bear a significant resemblance to the “slash-passing” trees used in GPSG and its descendants (Gazdar 1981; Gazdar et al. 1985). In both approaches, a “moving” constituent occupies only a single position in the tree structure, with the dependency between that position and the other end of its movement chain encoded on all intervening node labels. A crucial difference, however, is that the derivation trees here place moving constituents in their “base” positions, with their “target” positions encoded only indirectly, whereas slash-passing does the reverse, representing surface positions explicitly and encoding base positions indirectly. A consequence of this difference is that slash-passing cannot express remnant movement patterns (Stabler 2011: 626), and it is exactly the capacity of the minimalist derivation trees to express remnant movement that allows them to account for non-context-free patterns (Kobele 2010) such as crossing dependencies in Swiss German (Shieber 1985).) For work that has combined minimalist grammars with the information-theoretic linking hypotheses from Section 12.2 (e.g. Hale 2003; 2006; Yun et al. 2015), using the underlying connection to CFGs as our guide to think about the range of possible derivations essentially provides an immediate solution to the question of how to formulate probability distributions over derivations: just do to the trees in (43) and (44) what is standardly done to CFG trees; see e.g. Yun et al. (2015: §4). Various more elaborate approaches to defining probability distributions over a CFG can also be imported to the minimalist grammar case (Hunter and Dyer 2013), but the underlying CFG-like structure is what makes all of this possible. 
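As a rough illustration of “doing to derivation trees what is standardly done to CFG trees,” the sketch below treats each local tree (a parent label and its children's labels) as a rule, estimates rule probabilities by relative frequency from a two-tree toy “treebank,” and multiplies them to score a derivation. The tuple encoding, the function names, and the toy treebank are assumptions made for illustration; this is not the estimation procedure of Yun et al. (2015) or Hunter and Dyer (2013).

```python
from collections import Counter
from math import prod

# Derivation trees as nested tuples (label, child, ...); leaves are strings.
VP_BUY = ("VP,-wh", ("V", "buy"), ("DP[-wh]", "what"))
VP_READ = ("VP,-wh", ("V", "read"),
           ("DP[-wh]", ("D[-wh]", "which"), ("N", "book")))
THE_GIRL = ("DP", ("D", "the"), ("N", "girl"))


def wh_clause(subject, aux, vp, adverb=None):
    """Build a tree shaped like (43)/(44) around the given parts (hypothetical helper)."""
    tp = ("TP,-wh", subject, ("T',-wh", ("T", aux), vp))
    if adverb is not None:
        tp = ("TP,-wh", tp, ("AdvP", adverb))
    return ("CP", ("C',-wh", ("C",), tp))


TREE_43 = wh_clause(THE_GIRL, "will", VP_BUY, "tomorrow")
TREE_44 = wh_clause(("DP", "John"), "should", VP_READ)


def rules(tree):
    """Yield (parent_label, child_labels_or_words) for every node in the tree."""
    if isinstance(tree, str):
        return
    label, children = tree[0], tree[1:]
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        yield from rules(child)


def estimate(treebank):
    """Relative-frequency estimate of P(children | parent label)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    parent_counts = Counter()
    for (parent, _), n in rule_counts.items():
        parent_counts[parent] += n
    return {r: n / parent_counts[r[0]] for r, n in rule_counts.items()}


def probability(tree, probs):
    """Product of rule probabilities; unseen rules contribute probability 0."""
    return prod(probs.get(r, 0.0) for r in rules(tree))


probs = estimate([TREE_43, TREE_44])

# First tree in (45): never observed, but built entirely from observed parts.
TREE_45A = wh_clause(THE_GIRL, "will", VP_READ, "tomorrow")
# A plain VP inside a wh-clause uses rules that were never observed.
BAD = wh_clause(("DP", "John"), "should", ("VP", ("V", "buy"), ("DP", "it")))

print(probability(TREE_43, probs))
print(probability(TREE_45A, probs))
print(probability(BAD, probs))
```

The recombined tree receives non-zero probability even though it was never observed, whereas the tree with a plain `VP` under `T',-wh` receives zero, since under these labels it is simply not a way of snapping the observed parts together.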
As regards adapting the parsing methods from Section 12.3 to minimalist grammars, Stabler (2013) presented a “top-down minimalist parser” that in a sense does to the trees in (43) and (44) what the top-down method described in Section 12.3 does to standard CFG trees. This system has been used as the basis for formulating and testing linking hypotheses along the lines of the stack-depth idea (e.g. Kobele et al. 2013; Graf et al. 2015; 2017). Given the complicated relationship between minimalist derivation trees and surface word order, Stabler’s relatively direct application of the top-down method yields a parser that lacks a certain kind of “incrementality” in its treatment of movement dependencies; see Hunter (2019a), Hunter et al. (2019), and Stanojević and Stabler (2018) for discussion and different proposals that adapt the left-corner method to minimalist grammars instead. All of this work frames the parsing question as one of snapping together the modular, interchangeable parts of trees like those in (43) and (44), in a manner analogous to the way the parsing methods in Section 12.3 compose the modular parts of conventional CFG trees.

37 What has been made more complicated by the shift away from trees like (42) is essentially the issue of linearization, and so this is where the finessing and adapting mentioned at the beginning of this section remains to be done. The crucial ingredient for addressing these complications is the equivalence between minimalist grammars and multiple context-free grammars (Seki et al. 1991; Kallmeyer 2010; Clark 2014). See Stabler (2013) and Hunter and Dyer (2013: §2) for explanations of this connection.

Acknowledgements

..........................................................................................................................

Thanks to Jesse Harris, Bruce Hayes, Norbert Hornstein, Ellen Lau, Philip Resnik, Carson Schütze, and Jon Sprouse for comments on earlier drafts of this chapter, and to John Hale for many related discussions over a number of years.

References

Abeillé, A., and O. Rambow (eds). 2000. Tree adjoining grammars. Stanford, CA: CSLI.
Abney, S. P., and M. Johnson. 1991. Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research 20(3): 233–250.
Ades, A. E., and M. Steedman. 1982. On the order of words. Linguistics and Philosophy 4: 517–588.
Aho, A. V., and J. D. Ullman. 1972. The theory of parsing, translation and compiling, vol. 1: Parsing. Englewood Cliffs, NJ: Prentice Hall.
Bar-Hillel, Y., M. Perles, and E. Shamir. 1961. On formal properties of simple phrase-structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 14: 143–172.
Berwick, R. C., and A. S. Weinberg. 1984. The grammatical basis of linguistic performance. Cambridge, MA: MIT Press.
Billot, S., and B. Lang. 1989. The structure of shared forests in ambiguous parsing. In Proceedings of the 1989 Meeting of the Association of Computational Linguistics.
Boston, M. F., J. Hale, R. Kliegl, U. Patil, and S. Vasishth. 2008. Parsing costs as predictors of reading difficulty: An evaluation using the Potsdam Sentence Corpus. Journal of Eye Movement Research 2(1): 1–12.
Brennan, J. R., E. P. Stabler, S. E. V. Wagenen, W.-M. Luh, and J. T. Hale. 2016. Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language 157–158: 81–94.
Chomsky, N. 1963. Formal properties of grammars. In R. D. Luce, R. R. Bush, and E. Galanter (eds), Handbook of mathematical psychology, vol. 2, 323–418. New York: Wiley.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N., and G. Miller. 1963. Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush, and E. Galanter (eds), Handbook of mathematical psychology, vol. 2, 269–321. New York: Wiley.
Clark, A. 2014. An introduction to multiple context-free grammars for linguists. https://alexc17.github.io/papers/mcfgsforlinguists.pdf
Cocke, J., and J. T. Schwartz. 1970. Programming languages and their compilers. Courant Institute of Mathematical Sciences, New York University.
Cover, T. M., and J. A. Thomas. 2006. Elements of information theory, 2nd edn. New York: Wiley.


Demberg, V., and F. Keller. 2008. Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition 109: 193–210.
Frank, R. 2002. Phrase structure composition and syntactic dependencies. Cambridge, MA: MIT Press.
Frank, S. L. 2013. Uncertainty reduction as a measure of cognitive load in sentence comprehension. Topics in Cognitive Science 5: 475–494.
Frazier, L., and C. Clifton. 1996. Construal. Cambridge, MA: MIT Press.
Frazier, L., and K. Rayner. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14: 178–210.
Gazdar, G. 1981. Unbounded dependencies and coordinate structure. Linguistic Inquiry 12(2): 155–184.
Gazdar, G., E. H. Klein, G. K. Pullum, and I. A. Sag. 1985. Generalized phrase structure grammar. Cambridge, MA: Harvard University Press.
Goodman, J. T. 1998. Parsing inside-out. PhD thesis, Harvard University.
Goodman, J. T. 1999. Semiring parsing. Computational Linguistics 25(4): 573–605.
Graf, T., B. Fodor, J. Monette, G. Rachiele, A. Warren, and C. Zhang. 2015. A refined notion of memory usage for minimalist parsing. In Proceedings of the 14th Meeting on the Mathematics of Language, 1–14. Association for Computational Linguistics.
Graf, T., J. Monette, and C. Zhang. 2017. Relative clauses as a benchmark for Minimalist parsing. Journal of Language Modelling 5: 57–106.
Grune, D., and C. J. H. Jacobs. 2008. Parsing techniques: A practical guide, 2nd edn. New York: Springer.
Hale, J. T. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics.
Hale, J. T. 2003. Grammar, uncertainty and sentence processing. PhD thesis, Johns Hopkins University.
Hale, J. T. 2006. Uncertainty about the rest of the sentence. Cognitive Science 30: 643–672.
Hale, J. T. 2016. Information-theoretical complexity metrics. Language and Linguistics Compass 10(9): 397–412.
Hale, J. T. 2017. Models of human sentence comprehension in computational psycholinguistics. In Oxford Research Encyclopedia of Linguistics.
Hopcroft, J. E., and J. D. Ullman. 1979. Introduction to automata theory, languages and computation. Reading, MA: Addison-Wesley.
Hunter, T. 2019a. Left-corner parsing of minimalist grammars. In B. Berwick and E. Stabler (eds), Minimalist parsing. Oxford: Oxford University Press.
Hunter, T. 2019b. What kind of cognitive hypothesis is a derivational grammar? Catalan Journal of Linguistics SI: 89–138.
Hunter, T., and C. Dyer. 2013. Distributions on minimalist grammar derivations. In Proceedings of the 13th Meeting on the Mathematics of Language.
Hunter, T., M. Stanojević, and E. Stabler. 2019. The active-filler strategy in a move-eager left-corner minimalist grammar parser. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 1–10.
Jelinek, F., and J. D. Lafferty. 1991. Computation of the probability of initial substring generation by stochastic context-free grammars. Computational Linguistics 17(3): 315–323.


Joshi, A. 1985. How much context-sensitivity is necessary for characterizing structural descriptions? In D. Dowty, L. Karttunen, and A. Zwicky (eds), Natural language processing: Theoretical, computational and psychological perspectives, 206–250. New York: Cambridge University Press.
Joshi, A. K., L. S. Levy, and M. Takahashi. 1975. Tree adjunct grammars. Journal of Computer and System Sciences 10: 136–163.
Joshi, A. K., K. Vijay-Shanker, and D. Weir. 1990. The convergence of mildly context-sensitive grammar formalisms. University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-90-01.
Jurafsky, D., and J. H. Martin. 2000. Speech and language processing. Upper Saddle River, NJ: Prentice Hall.
Kallmeyer, L. 2010. Parsing beyond context-free grammars. Berlin: Springer.
Kanazawa, M. 2016. Formal grammar: An introduction. Online lecture notes: https://makotokanazawa.ws.hosei.ac.jp/FormalGrammar/index.html
Kasami, T. 1965. An efficient recognition and syntax-analysis algorithm for context-free languages. AFCRL Technical Report 65-758.
Kobele, G. M. 2010. Without remnant movement, MGs are context-free. In C. Ebert, G. Jäger, and J. Michaelis (eds), Proceedings of Mathematics of Language 10/11, 160–173. Berlin: Springer.
Kobele, G. M., S. Gerth, and J. Hale. 2013. Memory resource allocation in top-down minimalist parsing. In G. Morrill and M.-J. Nederhof (eds), Formal grammar 2012/2013, 32–51. Berlin: Springer.
Kobele, G. M., C. Retoré, and S. Salvati. 2007. An automata theoretic approach to minimalism. In J. Rogers and S. Kepser (eds), Proceedings of the workshop: Model-theoretic syntax at 10. Dublin.
Lang, B. 1988. Parsing incomplete sentences. In Proceedings of the 12th International Conference on Computational Linguistics, 365–371.
Levy, R. 2005. Probabilistic models of word order and syntactic discontinuity. PhD thesis, Stanford University.
Levy, R. 2008. Expectation-based syntactic comprehension. Cognition 106(3): 1126–1177.
Levy, R. 2013. Memory and surprisal in human sentence comprehension. In R. P. G. van Gompel (ed.), Sentence processing, 78–114. Brighton: Psychology Press.
Levy, R., E. Fedorenko, and E. Gibson. 2013. The syntactic complexity of Russian relative clauses. Journal of Memory and Language 69: 461–495.
Linzen, T., and T. F. Jaeger. 2016. Uncertainty and expectation in sentence processing: Evidence from subcategorization distributions. Cognitive Science 40: 1382–1411.
MacKay, D. J. C. 2003. Information theory, inference and learning algorithms. Cambridge: Cambridge University Press.
Manning, C. D., and H. Schütze. 1999. Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Marr, D. 1982. Vision: A computational investigation into the human representation and processing of visual information. New York: Freeman.
Michaelis, J. 2001. On formal properties of minimalist grammars. PhD thesis, Universität Potsdam.
Miller, G. A., and N. Chomsky. 1963. Finitary models of language users. In R. D. Luce, R. R. Bush, and E. Galanter (eds), Handbook of mathematical psychology, vol. 2. New York: Wiley.


Nederhof, M. J., and G. Satta. 2003. Probabilistic parsing as intersection. In 8th International Workshop on Parsing Technologies, 137–148. LORIA, Nancy.
Nederhof, M. J., and G. Satta. 2008a. Computing partition functions of PCFGs. Research on Language and Computation 6(2): 139–162.
Nederhof, M. J., and G. Satta. 2008b. Probabilistic parsing. In G. Bel-Enguix, M. D. Jiménez-López, and C. Martín-Vide (eds), New developments in formal languages and applications, 229–258. Berlin: Springer.
Nelson, M. J., S. Dehaene, C. Pallier, and J. T. Hale. 2017. Entropy reduction correlates with temporal lobe activity. In Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics, 1–10.
Partee, B. H., A. ter Meulen, and R. E. Wall. 1990. Mathematical methods in linguistics. Dordrecht: Kluwer.
Phillips, C. 1996. Order and structure. PhD thesis, Massachusetts Institute of Technology.
Phillips, C. 2013. Some arguments and nonarguments for reductionist accounts of syntactic phenomena. Language and Cognitive Processes 28(1–2): 156–187.
Rabin, M. O., and D. Scott. 1959. Finite automata and their decision problems. IBM Journal of Research and Development 3(2): 114–125.
Resnik, P. 1992. Left-corner parsing and psychological plausibility. In Proceedings of the Fourteenth International Conference on Computational Linguistics, 191–197.
Seki, H., T. Matsumura, M. Fujii, and T. Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science 88: 191–229.
Shannon, C. E. 1948. A mathematical theory of communication. Bell System Technical Journal 27(3): 379–423.
Shieber, S. M. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8: 333–343.
Sipser, M. 1997. Introduction to the theory of computation. Boston, MA: PWS.
Smith, N. J., and R. Levy. 2013. The effect of word predictability on reading time is logarithmic. Cognition 128: 302–319.
Sprouse, J., M. Wagers, and C. Phillips. 2013. Deriving competing predictions from grammatical approaches and reductionist approaches to island effects. In J. Sprouse and N. Hornstein (eds), Experimental syntax and island effects, 21–41. Cambridge: Cambridge University Press.
Stabler, E. P. 1997. Derivational minimalism. In C. Retoré (ed.), Logical aspects of computational linguistics, 68–95. Berlin: Springer.
Stabler, E. P. 2011. Computational perspectives on minimalism. In C. Boeckx (ed.), The Oxford handbook of linguistic minimalism. Oxford: Oxford University Press.
Stabler, E. P. 2013. Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science 5(3): 611–633.
Stanojević, M., and E. Stabler. 2018. A sound and complete left-corner parser for minimalist grammars. In Proceedings of the Eighth Workshop on Cognitive Aspects of Computational Language Learning and Processing, 65–74.
Steedman, M. 1996. Surface structure and interpretation. Cambridge, MA: MIT Press.
Steedman, M. 2000. The syntactic process. Cambridge, MA: MIT Press.
Stolcke, A. 1995. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics 21(2): 167–201.
Wolf, F., and E. Gibson. 2006. Parsing: Overview. In Encyclopedia of Cognitive Science. New York: Wiley.


Yngve, V. H. 1960. A model and an hypothesis for language structure. In Proceedings of the American Philosophical Society, vol. 104, 444–466.
Younger, D. H. 1967. Recognition and parsing of context-free languages in time n³. Information and Control 10(2): 189–208.
Yun, J., Z. Chen, T. Hunter, J. Whitman, and J. Hale. 2015. Uncertainty in the processing of relative clauses in East Asian languages. Journal of East Asian Linguistics 24(2): 113–148.

Chapter 13

Investigating syntactic structure and processing in the auditory modality

Mara Breen and Katy Carlson

The study of how people process and produce written language has taught us a great deal about syntax. But anyone familiar with linguistics knows that spoken language (or signed language) taps the core human language ability, reflecting people’s unedited and untaught capacity for language in a way that written language does not. The field of experimental syntax thus needs, in our opinion, to include experiments in auditory processing and speech production more often than it does now.

Language in the auditory domain involves, in addition to the words and syntactic structures present in written language, prosody, or the tune and rhythm of speech. The primary units of prosody are accents and prosodic phrases. Accents usually involve local changes in pitch (fundamental frequency or F0), duration, and intensity that render words perceptually prominent. Although accents usually mark words as new or contrastive in the discourse (Selkirk 1984; 1996; Gussenhoven 1983; 1999), they are also produced for reasons of phonological well-formedness (e.g. Schwarzschild 1999). For example, prosodic theory requires at least one accent per phrase even if no information within is new or contrastive. Prosodic phrases in English are marked by prosodic boundaries, which are signaled by changes in duration and pitch on phrase-final words as well as silent pauses. The size and makeup of prosodic phrases depend in part on syntactic structure, but also (as will be described below) on other factors, including the phonological length of the words in the phrases.

Once researchers add the auditory modality to their study of syntax, through speech production and perception experiments, they face new theoretical, practical, and methodological questions. The goal of this chapter is to inform syntax researchers of the challenges inherent in doing their work using the auditory modality, as well as provide some guidance on how to practically address these challenges. In Section 13.1, we describe the theoretical questions that have been raised by the adoption of auditory studies for syntax. In Section 13.2, we discuss practical and methodological challenges for auditory research. In Section 13.3, we describe how, despite the challenges, studies using auditory stimuli have informed our understanding of sentence interpretation, which can have consequences for the study of syntactic structures. In Section 13.4, we review recent work on implicit prosody, which we will argue suggests that many of the challenges encountered in auditory paradigms are also present in silent reading.

13.1 Theoretical challenges for prosody research

..........................................................................................................................

The theoretical study of prosody followed extensive study of theoretical syntax, and so early attempts at modeling prosodic structure were based on similar models of syntax, and assumed that the constraints on prosodic structure were the same as constraints on syntactic structure (see Elfner 2018 and Wagner 2015 for reviews). But more recent work has elucidated ways in which prosodic structure is independent of syntactic structure, and suggests that prosodic structure is not simply a property of grammar, but also arises as a result of constraints on speech production (e.g. Ferreira 2007). Early models generated a sentence’s prosodic structure directly from its morphosyntactic structure (Chomsky and Halle 1968; Kiparsky 1982; Nespor and Vogel 1986). Under this view, prosodic structures are isomorphic with syntactic ones, meaning that the syntactic structure of a sentence is fully recoverable from the prosodic structure. In addition, these models postulated a one-way relationship between phrase structures such that syntax can affect prosody, but prosody cannot affect syntax. However, more recent studies have demonstrated that prosodic structure is also partially predicted by non-syntactic factors, and that prosodic structure influences syntactic choices in production. Moreover, prosodic structure appears not to obey the same strict layering constraints as syntactic structure. An isomorphic relationship between syntactic structure and prosodic structure was originally supported by studies of prominence (Chomsky and Halle 1968; Kiparsky 1982) and phrasing (McCawley 1968; Selkirk 1984). Moreover, numerous studies have demonstrated that syntactic structure, specifically the ends of syntactic phrases, is reflected in prosodic phrasing in both ambiguous and unambiguous constructions (e.g. (1) from Selkirk 1978; 1981; Snedeker and Trueswell 2003; Schafer et al. 2000; Watson and Gibson 2004; Lehiste, Olive, and Streeter 1976; Lehiste 1973). 
In fact, the evidence suggests not only that syntactic boundaries correspond to prosodic ones, but that the size of the prosodic boundary correlates with the height of the syntactic constituent that is ending (Price et al. 1991; Watson and Gibson 2004; Breen, Watson, and Gibson 2011).

(1) In Pakistan, Tuesday, which is a weekday, is, Jane said, a holiday.

These consistencies have reinforced the claim that prosodic structure is largely dependent upon, and predicted by, syntactic structure. But there are many examples of mismatch, where prosodic structure diverges from syntactic structure. For example, Shattuck-Hufnagel and Turk (1996) argue that syntax and prosody are not isomorphic because some syntactic ambiguities are not routinely reflected in the prosody, and some prosodic groupings are in direct conflict with syntactic structure. Thus, although the syntactic parse of (2) groups ‘by’ with the following noun phrase, speakers can group ‘by’ with the preceding direct object ‘you’ in production. In addition, non-syntactic factors like speech rate, weight, and tonal properties can often predict prosodic phrasing better than syntactic structure (Nespor and Vogel 1986; Fodor 1998; 2002; Gee and Grosjean 1983).

(2) Sesame Street is brought to you by // the Children’s Television Network

The claim that prosodic considerations can influence syntactic choices in production has been supported by a variety of experimental results and corpus analyses (Féry 2015; Poschmann and Wagner 2016; Wasow, Levy, Melnick, Zhu, and Juzek 2015; Anttila, Adams, and Speriosu 2010; Speyer 2010; Schlüter 2005; Augurzky 2006). For example, Féry (2015) argues that speakers extrapose relative clauses in cases where an in situ embedded clause creates a prosodic structure that mismatches syntactic structure. Moreover, Poschmann and Wagner (2016) demonstrate that extraposed relative clauses, which are rated worse than non-extraposed relative clauses in silent reading, are often judged as better than non-extraposed relative clauses when produced with supportive prosody. Wasow et al. (2015) explored prosodic constraints on productions of the do-be construction (as in (3)). Crucially, the inclusion of ‘to’ in these constructions is optional.

(3) The least we should do is (to) make it as much fun as possible.

Analysis of the characteristics of instances of the do-be construction with and without ‘to’ demonstrates that its inclusion is determined in part by a tendency to alternate strong and weak syllables in production (Hayes 1995): speakers are more likely to include ‘to’ in cases where the absence of an (unstressed) ‘to’ would lead to a stress clash between the post-copula verb (‘make’ in the example) and the preceding form of ‘be’ (‘is’ in the example). A similar demonstration of prosodic constraint on syntactic choices in production is provided by Anttila et al. (2010), who find that dative alternation (as in (4)) is conditioned on prosodic phrasing.

(4) a. Double-object: Gave the girl the book
    b. Prepositional: Gave the book to the girl
    c. Heavy NP shift: Gave to the girl the book about the American Revolution


Specifically, in a corpus of written and spoken productions, they observed more double-object constructions when the goal is unstressed (‘I gave him the book’) than when the goal is stressed (‘I gave Jim the book’) due, they argue, to an avoidance of stress clash between the verb and the goal. They also observed more Heavy NP shifts for NPs with 2 (or more) phonological phrases (4c) and more prepositional constructions for verbs with 2 or more feet.

The preceding examples demonstrate that early claims of a one-way relationship between syntactic structure and prosodic structure are oversimplified. Recent proposals attempt to account for the observed mismatches between syntactic and prosodic structure, and for the influence of prosodic considerations on syntactic choices. For example, Fodor (1998) argues that syntactic structure and prosodic structure, though related, are computed independently in production, and that in cases where syntactic structure is ambiguous, prosodic structure will determine the parse. She further specifies that the prosodic parser is motivated to divide sentences into equal-sized prosodic packages (“same-sized sisters”), a claim which has received experimental support (e.g. Augurzky 2006). Fodor’s position is similar to the claim that “performance structures,” the sentence structures revealed by prosodic phrasing patterns in production studies, reveal a tendency for speakers to balance the size of adjacent phrases (Gee and Grosjean 1983; Cooper and Paccia-Cooper 1980; but cf. Breen, Watson, and Gibson 2011). Another set of recent proposals uses Optimality Theory to model the relationship between syntactic and prosodic structure as a set of violable constraints.
These proposals include Selkirk’s Alignment theory (1996; 2000), which maintains that the left or right edges of syntactic constituents should align with phonological phrases; Truckenbrodt’s Wrap theory (1995; 1999), which holds that syntactic constituents should be contained within phonological phrases; and Selkirk’s Match theory (2009), which requires that both edges of a syntactic constituent align with the edges of a prosodic constituent. But in all three cases, the alignment constraints can be outranked by markedness constraints on prosodic constituents, which can result in surface structures with syntax–prosody mismatches (see Elfner 2018 for discussion). A related proposal is the “variation + filter theory” (Anttila 2016), which maintains that syntactic processes generate candidate productions which are then adjudicated by constraints on phonological well-formedness. Using another approach, Steedman (1991) redefines syntax under a Categorial Grammar, so that syntactic structures align with surface prosodic structures (see Wagner 2010 and Hirsch and Wagner 2015, for related proposals). Finally, some argue that there is no principled relationship between syntax and prosody, and that prosodic structures are independently defined (Jun 1993). A final theoretical challenge regarding the relationship between syntax and prosody regards the underlying prosodic structure. Largely inspired by models of syntax, prosodic structure has generally been assumed to be hierarchical (Selkirk 1978; Hayes 1989; Nespor and Vogel 2007; Beckman and Pierrehumbert 1986), such that structures on one level are fully embedded within higher levels. For example, Selkirk (1978) proposes, from bottom up, these constituents: syllable, foot, prosodic word, clitic group, phonological phrase, and intonational phrase (see also Selkirk 1984; 2002). Although

investigating auditory syntactic processing


these levels are meant to be universal, there is disagreement about what levels exist across languages, with suggestions that languages like French, Japanese, and Korean require the addition of an accentual phrase to the hierarchy, while others require fewer levels (see Jun 2014 [2006], and references therein). Furthermore, there is theoretical disagreement about how these levels interact; some authors claim the prosodic structures are arranged in a strict non-recursive hierarchy with no level-skipping (Selkirk 1981; 1995; Nespor and Vogel 2007; Pierrehumbert and Beckman 1988; Hayes 1989), while others argue for recursion in prosodic structure (Wagner 2005; 2010; Ladd 1986; Selkirk 1996; 2011; Féry and Truckenbrodt 2005; Gussenhoven 2005; Truckenbrodt 1999).

As the preceding section demonstrates, our understanding of the relationship between prosody and syntax has changed in many ways over the past thirty years. Whereas early models of prosodic structure were largely isomorphic with syntactic structure, more recent proposals assume important differences between syntactic structure and prosodic structure. It is possible that prosodic structure is computed in parallel with, or separately from, syntactic structure, that prosodic structure and syntactic structure each influence the other, and that prosodic structure, unlike syntactic structure, is not strictly layered but may, like syntactic structure, allow for recursion. Research continues across languages to further specify the relationship between syntactic and prosodic structure.

13.2 Methodological challenges for prosody research


Like studies of syntax, studies of the intersection of prosody and syntax have explored questions using both production and comprehension data. Both of these approaches present methodological challenges for researchers when carried out in the auditory domain. We will first lay out the challenges that arise for production studies, and then those for comprehension studies (see Shattuck-Hufnagel and Turk 1996 for an earlier review).

13.2.1 Challenges to production research

There are several challenges to collecting and analyzing spoken productions of sentences. The first is to collect natural-sounding productions of sufficiently homogeneous materials. The second, which is particular to studies in the auditory domain, is that of qualitatively or quantitatively analyzing prosody in productions. We’ll address these challenges in turn.


mara breen and katy carlson

13.2.1.1 Material selection and collection

The first challenge, which is true of all research on human language, is to analyze materials that are both ecologically valid and constrained enough that they can be analyzed together. To maximize ecological validity, some prosody researchers gather productions from spoken corpora (e.g. Wasow et al. 2015; Anttila et al. 2010). One challenge to this method is the variability in the form of the productions. In addition, there are challenges specific to using spoken corpora. First, spoken corpora are not always annotated with text, meaning that, unlike written corpora, they are not as easily searchable. Second, acoustic analysis of corpora is challenging if the corpora were not collected with such analyses in mind; the signal-to-noise ratio may not be high enough to perform detailed acoustic analysis. Yet another complication for spoken language research is the fact that, in spontaneous speech contexts, people do not speak in complete and well-formed sentences at all times, especially when interacting with others rather than delivering monologues (e.g. Wennerstrom 2001). Conversation is full of overlaps, errors and repairs, unfinished statements, interjections, and a range of disfluencies, including repetitions, hesitations, and restarts (Fox Tree and Clark 1997).

On the other end of the spectrum, where homogeneity of materials is privileged over ecological validity, the challenge is to generate materials that are sufficiently representative of “natural” (i.e. spontaneously produced) prosody. In studies of this kind, researchers present participants with written target sentences preceded by semantic contexts designed to privilege one prosodic reading over another. These contexts can be as simple as asking a participant to produce a target sentence (6) as an answer to a constraining question (5) (Eady and Cooper 1986).

(5) a. Q. What is happening?
    b. Q. What is departing from France on Sunday?
    c. Q. On what day is the ship departing from France?

(6) The ship is departing from France on Tuesday.

The downside of this type of experiment is that participants may not be motivated (or even able) to produce the target materials with the predicted prosody (Allbritton, McKoon, and Ratcliff 1996). Indeed, there is a long history of debate about the circumstances under which speakers produce the predicted contours (see Section 13.3.1.2). This question is most salient for ambiguous sentences, for which different prosodic patterns signal different meanings.

As such, many recent production experiments have adopted a hybrid approach, designed to balance ecological validity and constraints on content. One version of this type of experiment is to provide participants with the syntactic frame of a production, which they populate with item-specific constituents. For example, Breen, Fedorenko, Wagner, and Gibson (2010) presented participants with pictures of target items (as in Figure 13.1, top) which participants used to generate semi-spontaneous target utterances within a specific semantic context (Figure 13.1, bottom).


fig. 13.1 Example item from Breen et al. (2010) designed to elicit naturalistic productions

In more interactive designs, naïve participants engage in games or cooperative tasks with other participants or confederates, where the rules are designed to elicit a specific set of pseudo-spontaneous productions with specific prosodic patterns. Schafer et al. (2000), for example, investigated the relationship between syntactic attachment and phrasing using a cooperative game task with two participants, a Driver and a Slider, working together to earn points by moving pieces toward cookies and away from goats (7).

(7) a. DRIVER: I want to change the position of the square with the triangle.
    b. SLIDER: Which triangle do you want to change the position of the square?
    c. DRIVER: The red one. When that moves the square it should land in a good spot.
    d. SLIDER: Good choice. When that moves the square will encounter a cookie.

Following Schafer et al. (2000), cooperative game tasks have been widely employed to investigate specific patterns of accents (Ito and Speer 2008; Watson, Tanenhaus, and Gunlogson 2008; Watson, Arnold, and Tanenhaus 2008) and phrasing (Snedeker and Trueswell 2003; Kraljic and Brennan 2005).

13.2.1.2 Analyzing production data

Both practical and theoretical challenges arise in the process of analyzing spoken productions. We know from a long history of studies in experimental phonetics and phonology that listeners don’t perceive the continuous acoustic signal faithfully; rather, they map continuous acoustic variation onto discrete categories (e.g. Ladd and Morton 1997; Barnes, Veilleux, Brugos, and Shattuck-Hufnagel 2012; Dilley 2005). As such, there is value in abstracting away from the signal; the question is how much we should abstract away. In what follows, we’ll describe two main approaches to measuring prosody. Under the instrumental approach, acoustic measurements are assessed directly (Cooper, Eady, and Mueller 1985; Eady and Cooper 1986; Fry 1955; Lieberman 1960; Pell 2001; Xu and Xu 2005). Under the intonational phonology framework (Féry 2016; Pierrehumbert 1980; Beckman and Pierrehumbert 1986; Ladd 2008; Gussenhoven 1983), prosodic structures are categorical grammatical objects abstracted from the acoustic signal. In addition to disagreement about whether prosodic features are produced and perceived categorically, there is debate about what those categories should be. As we describe below, prosodic annotation systems differ in terms of what prosodic categories they instantiate, as well as how those categories are defined.

13.2.1.2.1 Direct measures of prosody

Before the development of standard annotation schemes, prosody researchers relied on direct acoustic measures of speech to assess differences in meaning via prosody. Early investigations of prominence demonstrated that focused words (which are new to the discourse) were produced with greater duration and intensity than non-focused words (Fry 1955; Lieberman 1960; Eady and Cooper 1986; Cooper, Eady, and Mueller 1985).
For example, the word ship in the answer in (6) is produced with a longer duration and greater intensity following (5b) than (5c), because in the former case it is the new information requested in the question. Similarly, early investigations of phrasing showed that speakers signaled syntactic discontinuity with durational lengthening (Klatt 1975; Lehiste, Olive, and Streeter 1976; Scott 1982). In any of these experiments, there is a considerable amount of work involved in analyzing the productions. First, the direct measurement of acoustic features requires high-quality recordings. Second, the acoustic features of interest need to be identified,


either by hand-identifying them in a waveform, or, using modern modeling methods, force-aligning the words to the speech (Gorman, Howell, and Wagner 2011). The main challenge in both of these cases is in deciding what to measure. As described elsewhere, acoustic cues of duration, intensity, and pitch are all features that convey prosodic importance. But there are multiple ways of measuring each of these features. For example, durational lengthening affects segments of pre-boundary words differently than non-boundary-adjacent words (Wightman et al. 1992). Pitch is arguably more complex. Whereas some studies of pitch are concerned only with gross measures of average or maximum pitch, it’s clear that the shape of the pitch contour conveys specific meaning: pitch interpretation can be affected by the timing of specific turning points in the pitch contour (Bruce 1977; Arvaniti, Ladd, and Mennen 1998), the scaling of the pitch with reference to pitch targets in the local domain or the speaker’s range (Liberman and Pierrehumbert 1984), or a combination of both timing and scaling (Barnes et al. 2012; Niebuhr 2007). In addition to the challenges of measuring any single acoustic feature, an additional complication comes from the fact that these features interact in important ways, and that individual speakers vary in what specific features they employ to signal disjuncture (Cole 2015).

One additional issue that arises when we consider speech production and comprehension rather than writing is the possibility of disfluencies. Writing is usually done without the same time pressure as speech and so does not contain pauses, disfluencies, and errors and repairs at the same rate as speech. Disfluencies present challenges to doing syntactic work in the auditory domain, as their presence contaminates the production around them; researchers tend to exclude disfluent speech from analysis.
The key is to be aware that disfluencies will happen, and decide ahead of time how to deal with them (see Section 13.3.2.4 for a discussion of how research on disfluencies informs syntactic theory).

Given the complexity of identifying just what aspects of acoustic features are important in determining meaning differences, many researchers choose to investigate prosody using annotation schemes, where trained listeners perceptually categorize prosodic events according to a set of conventions.

13.2.1.2.2 Indirect measures of prosody

The value of prosodic annotation systems is that they can help prosody researchers describe prosodic phenomena and share data across labs, as well as more effectively quantify differences across speakers and contexts. One challenge with using an annotation scheme to assess prosody is that there is still considerable disagreement about what specific intonational categories speakers are producing and listeners are perceiving. In what follows, we will describe two influential systems of prosodic annotation, which differ in terms of the intonational categories they define.

The Tones and Break Indices (ToBI) annotation scheme (Beckman and Ayers Elam 1997; Beckman, Hirschberg, and Shattuck-Hufnagel 2005) is based on the autosegmental-metrical theory of prosody (Pierrehumbert 1980; Pierrehumbert and Beckman 1988; Beckman and Pierrehumbert 1986). It was developed as the result of a


collaborative effort between linguists, computer scientists, and psychologists who recognized the value of using a common annotation system in order to share prosodic data across labs and disciplines (see Beckman, Hirschberg, and Shattuck-Hufnagel 2005). The ToBI system recognizes two categories of prosodic features: pitch accents and prosodic boundaries (Pierrehumbert 1980; Ladd 2008). It also contains multiple categories of pitch accent (Pierrehumbert 1980; Pierrehumbert and Hirschberg 1990; Ladd 2008) using a simple formalism: the letters L or H indicate a low or high pitch (F0) target; * indicates the part of the accent that aligns with the main stressed syllable of the associated word; + indicates that an accent has multiple pitch targets. The most common pitch accent is H* (Dainora 2001), which is generally used to mark new information (Pierrehumbert and Hirschberg 1990). The L+H* accent, with a leading low tone and a higher, steeper rise in pitch than the H*, is used for contrastive information (Ito et al. 2004), though there are varying views on whether these two accents are distinct or form a continuum from less to more prominent H accents (e.g. Ladd and Morton 1997; Ladd and Schepman 2003; Bartels and Kingston 1994; Rump and Collier 1996). Other possible ToBI pitch accents include L*, L*+H, and H+!H* (a high pitch accent preceded by a higher tone), but these accents are far less frequently observed (Breen, Dilley, Kraemer, and Gibson 2012). Under the ToBI system, prosodic boundaries are annotated using a system of break indices ranging from 0 to 4; the number indicates the amount of perceived disjuncture between words. Practically speaking, ToBI instantiates three levels of boundary: no boundary (break index = 0,1), intermediate boundary (break index = 3) or intonational phrase boundary (break index = 4). Break index 2 is reserved for rare instances of mismatch between boundary cues. 
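The labeling conventions just described can be made concrete with a short sketch. The Python fragment below is purely illustrative and is not part of any ToBI tool: the function names `parse_pitch_accent` and `boundary_level`, the tuple representation, and the handling of "!" as a downstep marker are assumptions introduced here. It decomposes an accent label into its L/H targets, records which target carries the star, and maps break indices onto the boundary levels summarized above.

```python
import re

# Boundary levels as summarized in the text: break indices 0 and 1 mark no
# phrase boundary, 3 an intermediate boundary, and 4 an intonational phrase
# boundary; 2 is reserved for mismatches between boundary cues.
BOUNDARY_LEVEL = {
    0: "none",
    1: "none",
    2: "mismatch",
    3: "intermediate",
    4: "intonational",
}

# One tonal target: optional downstep "!", then L or H, then optional "*".
_TARGET = re.compile(r"(!?)([LH])(\*?)")


def parse_pitch_accent(label):
    """Split an accent label such as 'L+H*' into its pitch targets.

    Returns a list of (tone, starred, downstepped) tuples.
    Hypothetical helper written for illustration only.
    """
    targets = []
    for part in label.split("+"):  # "+" joins multiple pitch targets
        match = _TARGET.fullmatch(part)
        if match is None:
            raise ValueError(f"not a recognized tonal target: {part!r}")
        downstep, tone, star = match.groups()
        targets.append((tone, star == "*", downstep == "!"))
    # Exactly one target aligns with the main stressed syllable.
    if sum(starred for _, starred, _ in targets) != 1:
        raise ValueError(f"expected exactly one starred tone in {label!r}")
    return targets


def boundary_level(break_index):
    """Map a break index (0-4) to the boundary level it signals."""
    return BOUNDARY_LEVEL[break_index]
```

Under these assumptions, `parse_pitch_accent("L+H*")` yields a low leading target followed by a starred high target, and `boundary_level(4)` yields an intonational phrase boundary.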
Importantly, within the ToBI system, there are dependencies between pitch events and break indices. For example, a sequence of speech bounded by break indices of 3 or 4 must contain at least one pitch accent, and a break index of 4 must always be annotated when there is bi-tonal movement at the end of a phrase (Beckman and Ayers Elam 1997). The strengths of ToBI are that it has been widely adopted by speech researchers, resulting in a large corpus of annotated speech, and versions of ToBI have been developed for a prosodically diverse set of languages (Jun 2006). However, research performed subsequent to the development of ToBI suggests that the system includes some categories that are not reliably perceived by listeners (e.g. Dilley 2007), and that it lacks other important categories that are reliably perceived by listeners (e.g. Dilley 2005). Moreover, agreement studies demonstrate inconsistency in the ways that trained coders annotate speech with ToBI (Yoon et al. 2004; Breen, et al. 2012). In part as a response to these challenges, the Rhythm and Pitch (RaP) system was developed as an alternative annotation scheme (Dilley and Brown 2005; Breen et al. 2012). Like ToBI, RaP adopts the conventions of autosegmental-metrical intonation theory: RaP allows annotators to mark categories of prominence (accents) and disjuncture (boundaries). Unlike ToBI, where accents are labeled in a binary fashion (accented vs. unaccented), RaP allows for three levels of prominence. Syllables are first labeled


as beats or non-beats, depending on metrical prominence (but not on pitch information). In a second step, syllables aligned with beats are labeled as accents if they also feature a pitch excursion. In this way, syllables can be non-prominent, prominent but not pitch-accented, or prominent and pitch-accented, a three-way distinction that is supported by recent work demonstrating systematic meaning-based three-way distinctions between prominence categories (Beaver, Clark, Flemming, Jaeger, and Wolters 2007; Greenberg, Carvey, and Hitchcock 2002). Moreover, the annotation of beats and non-beats allows for the coding of rhythmic structure in speech, which is important for language acquisition (Nazzi and Ramus 2003), as well as adult perception (Cutler and Norris 1988). Like ToBI, RaP instantiates three levels of phrasal disjuncture (no boundary, intermediate boundary, and intonational boundary), but, unlike ToBI, where disjuncture distinctions are determined in part by grammatical constraints between break indices and tonal labels, boundaries in RaP are determined only by perceived disjuncture. By not instantiating grammatical constraints between labels, RaP annotations are designed to be closer to the acoustic signal. Moreover, RaP allows for the annotation of a wider variety of intonational events than ToBI, reflecting more recent work in intonational phonology (Dilley and Brown 2005). In this way, RaP is argued not only to be easier to learn but also to allow for better agreement between coders (Breen et al. 2012).

The development of prosodic annotation schemes like ToBI and RaP has led to great advances in understanding prosodic phenomena and has facilitated collaboration and cooperation among labs. But these schemes are difficult to learn, and even the most experienced annotators require 45–60 minutes to label a minute of speech.
Moreover, although annotation schemes have been developed for multiple languages, they still cover only a small set of the world’s languages. In response to these challenges, Cole and colleagues have developed rapid prosody transcription (RPT; Mo, Cole, and Lee 2008; Cole, Mahrt, and Roy 2017), in which naïve coders label prosodic events in real time, either in the lab or using internet crowd-sourcing platforms. Results from RPT studies demonstrate acceptable agreement among coders, and offer further insight into the specific acoustic cues that coders use to define accents and boundaries. As these tools become more automated, they have the potential to greatly expand the pool of prosodically annotated data for use across multiple labs.

As the above discussion demonstrates, there can be considerable value in using a prosodic annotation scheme to identify accent and phrasing categories. But without universal agreement about what categories an annotation scheme should recognize, or agreement on what counts as a member of the category, important aspects of the signal may be missed. In this way, there are positive and negative aspects to both categorical (i.e. annotation) and continuous (i.e. acoustic) approaches to explaining prosodic variation. As we will describe in Section 13.3, both methods have allowed for the identification of important aspects of the relationship between suprasegmental features and meaning. One answer to this challenge is to supplement prosodic annotation with measurements of specific acoustic features.
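As a rough illustration of how categorical annotation and acoustic measurement might be combined, the Python sketch below first applies the two-step RaP prominence labeling described earlier in this section and then averages a duration measure within each resulting category. It is a minimal sketch under stated assumptions, not the RaP authors' procedure: the function names, the category strings, and the input format are hypothetical, and the beat and pitch-excursion judgments are taken as given (they would come from a human annotator, not from the code).

```python
from collections import defaultdict
from statistics import mean


def rap_prominence(is_beat, has_pitch_excursion):
    """Three-way prominence following the two-step procedure in the text.

    Step 1: a syllable is a beat or a non-beat (metrical prominence only).
    Step 2: a beat that also carries a pitch excursion counts as an accent.
    Hypothetical helper; both inputs are annotator judgments.
    """
    if not is_beat:
        return "non-prominent"
    if has_pitch_excursion:
        return "prominent, pitch-accented"
    return "prominent, not pitch-accented"


def mean_duration_by_category(syllables):
    """Average syllable duration (ms) within each prominence category.

    `syllables` is a list of (is_beat, has_pitch_excursion, duration_ms)
    tuples; this input format is an assumption made for the example.
    """
    durations = defaultdict(list)
    for is_beat, has_excursion, duration_ms in syllables:
        category = rap_prominence(is_beat, has_excursion)
        durations[category].append(duration_ms)
    return {category: mean(values) for category, values in durations.items()}
```

Keeping the labeling step separate from the measurement step means the same acoustic summary could be recomputed under a different annotation scheme by swapping out only the first function.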


13.2.2 Challenges to comprehension studies

Not only do we want to understand how speakers produce syntactic structures in speech; we also want to understand how they perceive them. The challenges to conducting comprehension experiments are similar to those of production studies: creating the materials, collecting the data, and maximizing the validity and generalizability of the results.

The first challenge to comprehension studies is how to create the stimuli. As with production studies, the goal is to assess listeners’ understanding of a specific prosodic feature or contour, and doing so requires a collection of instances of that contour. A challenge in creating these productions is to match the productions on every prosodic dimension save for the feature or contour of interest. Doing this may require soliciting productions from trained speakers, and often extensive splicing of experimental materials, which can affect their naturalness. Moreover, it may not be possible to match stimuli on every other dimension. For example, in order to investigate how the presence of an accent on a sentence constituent will affect the sentence’s interpretation, the researcher cannot simply remove the accent, because sentences without accents don’t occur in natural speech. Relatedly, removing a phrase boundary from a sentence might also render that sentence unnatural if the resulting string of speech is too long to be reasonably produced without a boundary, while overly frequent prosodic boundaries after short constituents can also be distracting. In short, the syntactic and phonological constraints of sentences may interfere with the desire for absolute uniformity and control of all variables.

Yet another challenge is how to assess listeners’ comprehension of prosody.
One approach is to elicit acceptability judgments about the prosodic features in question, asking listeners to judge the difference between the pitch contours of two exemplars (Dilley 2007), or to state explicitly where they perceive the location of accent or a boundary to be (Streeter 1978; Cole, Mo, and Baek 2010; Buxó-Lugo and Watson 2016). For example, Streeter (1978) investigated the acoustic features of boundary perception by playing listeners manipulated versions of ambiguous equations like “A plus E times O” and asked listeners to decide whether the speaker intended a grouping of “(A plus E) times O” or “A plus (E times O).” A second approach is to collect listeners’ ratings of the naturalness of specific productions of word strings, such that a higher naturalness rating indicates better fit of the prosodic contour with the words (e.g. Welby 2003; Birch and Clifton 1995). A more ecologically valid approach (i.e. one that better approximates how prosody is processed in normal comprehension) is to assess prosodic comprehension indirectly by asking how the manipulation of some prosodic feature influences a listener’s interpretation. For example, listeners can be presented with a specific prosodification of an ambiguous sentence and asked their interpretation of the ambiguity. This approach has been used to assess how listeners interpret the location of prosodic prominence (Breen et al. 2010) as well as phrasing (Wagner 2010; Carlson, Clifton, and Frazier 2001).


Other indirect measures of prosodic interpretation include assessing memory for accented/unaccented constituents (e.g. Braun and Tagliapietra 2010; Fraundorf, Watson, and Benjamin 2010; Fraundorf, Benjamin, and Watson 2013) or effects of priming by accented constituents on lexical decision times (e.g. Husband and Ferreira 2016). Finally, researchers can assess prosodic comprehension in ways that require no explicit judgment or interpretation from the listener using the Visual World paradigm and event-related potential (ERP) studies. For example, Visual World studies have demonstrated that listeners interpret different shapes of accents in different ways (Arnold 2008; Watson et al. 2008; Ito and Speer 2008), that disfluencies are interpreted as cues to upcoming information (Arnold et al. 2004; Arnold et al. 2007), and that phrase boundaries cue syntactic attachments in real time (Snedeker and Trueswell 2003; Kraljic and Brennan 2005). Using ERPs, researchers have identified a characteristic brain response to phrase boundaries (Steinhauer, Alter, and Friederici 1999) which has not only been observed for both listening and silent reading (Steinhauer 2003; Steinhauer and Friederici 2001), but has subsequently been used to explore interactions between prosody and syntax (Hwang and Steinhauer 2011; Liu, Wang, and Jin 2010). In addition, ERPs have been used to investigate how listeners process revision of focus structure (Stolterfoht, Friederici, Alter, and Steube 2007), and as evidence for levels of the prosodic hierarchy (Domahs, Weise, Bornkessel-Schlesewsky, and Schlesewsky 2008; Li and Yang 2009).

13.3 What can we learn about syntax from the auditory domain?


Despite the methodological challenges that they raise, studies in the auditory modality have provided significant insights into syntactic structure and processing. In this section, we will review recent experimental work which demonstrates how speakers use prosodic features of disjuncture and prominence (prosodic boundaries and accents, respectively), to signal syntactic structure and, moreover, how listeners infer structure from the perception of these features.

13.3.1 Boundaries

As described in Section 13.1, utterances are subdivided into smaller prosodic constituents in a way that reflects, in part, the syntactic dependency structure of the utterance. These prosodic groupings are signaled by the presence of prosodic boundaries. Over the past thirty years, researchers have investigated how speakers signal the edges of prosodic constituents with a combination of specific acoustic and phonological cues, and how syntactic structure influences speakers’ and listeners’ perception of


prosodic boundaries (see reviews by Cutler, Dahan, and van Donselaar 1997; Wagner and Watson 2010; Cole 2015). Results from a diverse set of methods including both production and perception demonstrate that listeners’ interpretation of syntactically ambiguous sentences is affected by the presence or absence of prosodic boundaries (Beach 1991; Price et al. 1991; Snedeker and Trueswell 2003; Kjelgaard and Speer 1999) and naïve speakers produce ambiguous structures differently depending on the intended interpretation (Snedeker and Trueswell 2003; Kraljic and Brennan 2005). Although there is some contention about when these cues are or are not provided, it is clear that both speakers and listeners have knowledge about the relationship between syntactic structure and prosodic phrasing. In the following section, we’ll review work exploring the production and perception of boundaries, including how syntactic (and other) factors have been shown to influence the placement of prosodic boundaries, and how prosodic boundaries are interpreted in context.

13.3.1.1 Top-down and bottom-up cues to boundaries

The perception of prosodic boundaries depends on multiple top-down and bottom-up cues. On the one hand, there are strong disjuncture cues in the bottom-up acoustic signal: Words preceding boundaries are lengthened relative to words at non-boundary positions (Klatt 1975; Price et al. 1991; Wightman et al. 1992); silence between words is more likely (and longer) at boundary locations (Klatt 1975; Lehiste 1973; Cooper and Paccia-Cooper 1980); speakers tend to either raise or lower their pitch at boundary locations (Pierrehumbert 1980; Streeter 1978); speakers may signal the onset of a new prosodic constituent through greater intensity on the phrase-initial syllable (Cho 2002; Fougeron and Keating 1997; Jun 1993; Keating, Cho, Fougeron, and Hsu 2003); finally, phonological processes that apply across word boundaries (e.g. French liaison or flapping in English) are more likely within phrases than across phrase boundaries (Wagner 2015; Nespor and Vogel 1986).

As described in Section 13.1, prosodic structure is believed to be hierarchical, including, minimally, prosodic words, phonological phrases, and intonational phrases (Selkirk 1980; Beckman and Pierrehumbert 1986; Nespor and Vogel 1986; Wightman et al. 1992). Given this organizing structure, psycholinguistic researchers have investigated to what extent speakers realize multiple categorical levels of disjuncture. This work has focused primarily on the difference between phonological phrase boundaries and intonational phrase boundaries, which can be thought of as break indices 3 and 4 in the ToBI system or intermediate vs. intonational boundaries in the RaP annotation system. Production studies have provided evidence of a hierarchical relationship between the amount of lengthening and the level of the boundary, with more lengthening at locations where listeners perceive larger boundaries (Ladd and Campbell 1991; Wightman et al.
1992; Kim, Yoon, Cole, and Hasegawa-Johnson 2006), and with stronger articulatory events at the onsets of larger phrases (Jun 1993; Fougeron and Keating 1997).


In addition to bottom-up acoustic cues, boundary perception is determined in part by expectations, and listeners perceive boundaries in part based on where they should occur (Martin 1970). For example, syntactic structure accounts for listeners’ perceptions of boundaries over and above the contribution of actual acoustic cues such that listeners are more likely to report hearing a prosodic boundary at the location of a syntactic boundary (Cole, Mo, and Baek 2010; Buxó-Lugo and Watson 2016).

Finally, while annotation systems like ToBI treat boundaries as absolute categories, more recent evidence suggests that boundaries are determined in relation to the context in which they occur. Production studies demonstrate that speakers scale later boundaries relative to boundaries that they produced earlier in sentences (Wagner 2005; 2010). Similarly, in perception studies, boundaries later in a sentence are interpreted with reference to previous boundaries (Schafer 1997; Carlson, Clifton, and Frazier 2001; Clifton, Carlson, and Frazier 2002). For example, a phrase boundary at position (b) in the globally ambiguous (8) led to fewer high attachment interpretations of after John visited if it was preceded by a similar-sized or larger boundary at position (a).

(8) Susie learned (a) that Bill telephoned (b) after John visited.
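The relative interpretation of boundaries in (8) can be stated as a simple comparison. The following Python sketch is a deliberately coarse, qualitative rendering of the pattern just described and not a model proposed in the cited studies: the function name `late_boundary_support`, the use of numeric boundary sizes (for instance, ToBI break indices), and the two output labels are all assumptions made for illustration.

```python
def late_boundary_support(size_a, size_b):
    """Qualitative support for high attachment from the boundary at (b).

    size_a, size_b: perceived boundary sizes at positions (a) and (b) in
    example (8), e.g. break indices. Hypothetical helper: it encodes only
    the direction of the reported effect, not its magnitude.
    """
    # A boundary at (b) is a weaker cue to high attachment when an earlier
    # boundary at (a) is the same size or larger.
    if size_a >= size_b:
        return "weaker"
    return "stronger"
```

On this sketch, an intonational phrase boundary at (b) preceded by no boundary at (a) counts as stronger support for high attachment than the same boundary preceded by an equally large one.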

13.3.1.2 Boundaries and syntactic structure

One enduring question in the literature is to what extent speakers disambiguate syntactic structure with prosody. Early work suggested that while some constructions are routinely disambiguated by prosodic phrasing (Price et al. 1991), not all syntactic ambiguities can be disambiguated by prosody. For example, Lehiste (1973) argued that some syntactic ambiguities, like (9), could not be disambiguated by prosody.

(9) Visiting relatives can be a nuisance.

Further work has explored in more detail the specific types of syntactic structures that speakers consistently disambiguate with prosody, but results have been mixed. For example, some have argued that speakers routinely provide cues that disambiguate the Object/Clause ambiguity, as in (10) (Nagel, Shapiro, Tuller, and Nawy 1996; Beach 1991).

(10) Jay believed the gossip…
     a. right away. (Object)
     b. wasn’t true. (Clause)

On the other hand, Anderson and Carlson (2010) demonstrated that speakers routinely provide cues that disambiguate Late vs. Early closure sentences like (11), but rarely provided disambiguating cues to Object/Clause ambiguities.

(11) As Janet baked the bread…
     a. the brownies cooled. (Late)
     b. cooled on a rack. (Early)


These and other equivocal results raise the question of the extent to which speakers who are not aware of potential ambiguity will produce disambiguating prosody. Some studies have demonstrated that prosodic cues to syntactic disambiguation are provided only by speakers who are aware of the ambiguity (Allbritton, McKoon, and Ratcliff 1996; Snedeker and Trueswell 2003; Fox Tree and Meijer 2000). Snedeker and Trueswell (2003) elicited productions of globally ambiguous sentences like (12). Only speakers who viewed a visual scene that made the ambiguity explicit (by inclusion of a frog, a flower, and a frog holding a flower) disambiguated their productions, where an early boundary (12a) signaled a modifier reading, and a late boundary (12b) signaled an instrument reading. On the other hand, in globally ambiguous sentences like (13), speakers disambiguated with prosody regardless of whether they were aware of the ambiguity (Kraljic and Brennan 2005).

(12) Tap the frog with the flower.
     a. Tap / the frog with the flower.
     b. Tap the frog / with the flower.

(13) Put the dog in the basket on the star.
     a. Put the dog / in the basket on the star
     b. Put the dog in the basket / on the star

One explanation for the difference between these studies has to do with the length of the target sentences. As speech is planned incrementally, many researchers make the claim that the unit of planning is constrained to be both a syntactic and prosodic constituent as in Selkirk’s (1978) sense unit condition and Watson and Gibson’s (2004) LRB hypothesis. If the sentence is short enough that it can be produced without a boundary, as in (12), the speaker will not disambiguate unless they are explicitly aware of the ambiguity. However, if the sentence is long enough that it requires more than one intonational phrase, as in (13), the speaker will insert the boundary in the location consistent with the intended meaning, respecting the syntactic structure.

13.3.2 Accents and sentence processing

A good deal of research on the role of accents in sentence processing has been done, but whether the research is seen as relevant to experimental syntax depends in part on whether the interface of syntax with semantics and information structure is of interest. In general, as laid out below, researchers have found that accented words are more memorable and quicker to process than unaccented ones; that as markers of information structure (newness, givenness, contrastiveness), accents can cause listeners to look more at new or contrastive items in a visual display and to consider contrastive alternatives; that they can affect the resolution of pronoun reference and attachment ambiguities; and that they can affect the interpretation of ambiguous focus-sensitive sentences such as several types of ellipsis. Most if not all of these results have come
about because accents can be auditory indicators of the position and type of focus (Rooth 1992), and as such, these results are analogous to results of studies on other focus indicators such as focus particles (e.g. only) and syntactic clefting. Several early studies of accents in processing were carried out by Cutler and colleagues in the late 1970s. Using a phoneme-monitoring technique (with listeners responding when they heard a word starting with a particular sound), Cutler and Fodor (1979) found that words focused by a preceding question were responded to faster than unfocused words. Cutler and Foss (1977) found that accented content and function words were both responded to faster than unaccented ones. In a particularly elegant experiment, Cutler (1976) found that listeners predicted the position of accent on specific words from the surrounding prosodic contour, such that they responded quickly (as if a word was accented) when actually Cutler had spliced in an unaccented rendition from another recording. This set of results demonstrated invariant facilitation coming from focus on a word, whether or not the word was acoustically accented. This result rules out any theory under which accented words are processed faster simply because they are phonetically prominent. Turning to the match between particular focus structures and the presence and position of H* accents, Birch and Clifton (1995; 2002) showed that auditory sentences were rated as more natural/acceptable when contours accented new elements and given elements were unaccented. This harmonizes with earlier work by Bock and Mazzella (1983) and Terken and Nooteboom (1987), who found faster comprehension of sentences with new elements bearing accents and given elements without. Nooteboom and Kruyt (1987) added the finding that extra accents on given materials were less dispreferred than contours that failed to accent new information. 
Birch and Clifton also found evidence for the theory of focus projection (Selkirk 1984; 1996), a theory which takes explicit account of the syntactic structure of a sentence in calculating what syntactic unit a specific accent will focus. Birch and Clifton (1995) found that accenting the object of a verb was sufficient to mark the VP as focused almost as well as accenting the verb. Conversely, Birch and Clifton (2002) showed that adjunct phrases were not able to project focus to a whole VP, so that accents on them only facilitated interpreting the adjuncts themselves as new, not the whole VP. Both results support Selkirk’s theory.

On the production side, Breen, Fedorenko, Wagner, and Gibson (2010) carried out several production studies of simple sentences in a range of focus conditions to explore how likely normal speakers are to mark information structure with accents. They used preceding questions to make subjects, verbs, or objects either given, by being mentioned in the question; focused, by being the answer to the wh-question; or contrastive, by contradicting part of the information in a yes/no question (14) (see Figure 13.1).

(14) subject contrastive focus condition:
a. Did Harry fry an omelet this morning?
b. Damon fried an omelet this morning.

Across multiple experiments, speakers did use multiple acoustic prosodic cues to indicate focus location and produced different prosodic patterns for wide focus on a whole
sentence vs. narrow focus on only the object. In experiments involving communication with listeners, they also produced distinguishable patterns for contrastive focus vs. noncontrastive focus, but this difference did not emerge clearly in a non-communicative task. This paper is unusual in not annotating the acoustic patterns with prosodic notation of accent presence and type; instead, Breen et al. (2010) collected a wide range of acoustic measurements and performed discriminant function analyses on them to establish differences in productions. Overall, the most useful features for indicating focus were duration, F0 (fundamental frequency) height, and intensity (loudness), which is consistent with the set of features usually used by annotators to decide that an accent is present.
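To make the idea of classifying productions from continuous acoustic measurements concrete, the following is a minimal sketch of a two-class Fisher linear discriminant, one simple member of the family of discriminant function analyses. It is not a reconstruction of the procedure in Breen et al. (2010): the token values, the choice of three features (duration, F0, intensity), and the labels "focused" and "given" are all invented for illustration.

```python
# Minimal two-class Fisher linear discriminant in pure Python.
# All measurements are invented; each token is
# (duration in ms, F0 in Hz, intensity in dB).

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows, mu):
    # Sum of outer products of deviations from the class mean.
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - mu[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s

def solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def project(w, token):
    return sum(wi * xi for wi, xi in zip(w, token))

def fit_fisher(class_a, class_b):
    # w is proportional to S_w^{-1} (mu_a - mu_b); the threshold is the
    # midpoint between the two projected class means.
    mu_a, mu_b = mean_vector(class_a), mean_vector(class_b)
    s_a, s_b = scatter_matrix(class_a, mu_a), scatter_matrix(class_b, mu_b)
    d = len(mu_a)
    s_w = [[s_a[i][j] + s_b[i][j] for j in range(d)] for i in range(d)]
    w = solve(s_w, [mu_a[j] - mu_b[j] for j in range(d)])
    threshold = (project(w, mu_a) + project(w, mu_b)) / 2
    return w, threshold

def classify(w, threshold, token):
    return "focused" if project(w, token) > threshold else "given"

# Hypothetical tokens of the same word in two discourse conditions.
focused = [[310, 220, 72], [295, 235, 70], [330, 228, 74],
           [305, 240, 71], [320, 215, 73]]
given = [[240, 185, 65], [255, 190, 66], [230, 180, 64],
         [250, 195, 67], [245, 175, 63]]

weights, threshold = fit_fisher(focused, given)
print(classify(weights, threshold, [300, 225, 71]))
```

The appeal of this kind of analysis is that it asks directly whether the conditions are separable in the acoustic signal, without first committing to a categorical annotation of accent presence or type.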

13.3.2.1 Accents and eye movements in the Visual World paradigm

A recent line of research on accents has concentrated on their effects on eye movements in Visual World paradigms. The general question is how early people show eye movements directed by prosodic (and segmental) information, and whether accents can be used to predict what visual objects will be referred to even before the object names are spoken (or completed). Some researchers have also studied whether different accent types (H*, L+H*) affect looking behaviors differently, either in time-course or in referent choice. Dahan, Tanenhaus, and Chambers (2002) developed the following method to explore the interpretation of accents on given and new nouns in the Visual World paradigm: Participants’ eye movements were tracked as they listened to and followed simple pairs of instructions about the placement of objects in a grid (15).

(15) Put the candle above the square. Now put the X below the Y.

Critically, the objects included pairs of words with overlapping initial syllables, such as candle/candy. When the first instruction had already mentioned one of these objects (e.g. the candy), a second instruction with an accented CAND- syllable led to more early looks to the unmentioned object (candle), compared to a deaccented version of that syllable. This result suggests an expectation that an accented word will refer to a new item, and that an unaccented word will refer to a given item. A second experiment varied whether the accented target had been in the same syntactic position across sentences. When the accented item was not only given but in the same focused theme position, looks to a competitor item dominated at first; when the accented item was given but had been in a different unfocused position, looks to that item started right away. 
This work shows that the use of accents in predictive looking is sensitive not just to whether items have been mentioned at all in a discourse, but also to their focus status and syntactic position. Using the same paradigm, Watson, Tanenhaus, and Gunlogson (2008) specifically studied the difference between H* and L+H* accents in looking behavior, and found that while L+H* accents did favor contrastive referents (the camel, when the camel and the dog had been introduced as contrasting with each other), H* accents did not clearly
favor new referents or contrastive referents. They suggest that rather than being categorically distinct, the two accents have overlapping functions, with L+H* accents more specialized to pick out contrasts and H* accents being used in a variety of situations. Arnold (2008) replicated Dahan et al. (2002)’s findings for adults and also extended the research to 4- to 5-year-old children with similar results, concluding that they must have already acquired knowledge of the way that accenting and deaccenting relate to the discourse status of words. Another line of investigation has studied the effect of accents on adjectives in adjective–noun phrases like the red ball. Weber, Braun, and Crocker (2006) found effects of adjective accents in German, using displays containing a pair of objects contrasting in color (red scissors, purple scissors), another red item, and an unrelated distractor. An initial instruction with noun accent referred to one of the contrastive items (purple SCISSORS), and the second instruction accented the adjective or the noun and picked out the contrastive or non-contrastive item. Accents on the adjective sped up looks to the contrastive item (scissors) but slowed down looks to the same-colored non-contrastive item (vase). The use of the adjective red also led to early looks in all cases to the contrastive red item (scissors). These results contrast with earlier studies in English by Sedivy et al. (1999), which found effects of adjective presence but not the accent. Weber et al. trace the difference in part to the more limited time participants had to look at their displays (vs. longer times in Sedivy et al.’s study, which could have led to display contrasts being identified). An extension of this type of research by Ito and Speer (2008) used the more complex task of hanging ornaments of a variety of colors and shapes on a mini Christmas tree. 
In this display, there were many potential contrasts, so the contrastive structure of the discourse was not given away by the items displayed and had to be computed based on the auditory instructions. They found that L+H* contrastive adjective accents used appropriately on an item of the same type but different color (i.e. blue ball…GREEN ball) facilitated early looks to the right item, while infelicitous contrastive accenting on non-contrastive nouns (blue ball…green BALL) was not helpful. Similarly, inappropriate use of accents on new adjectives when the noun also changed (i.e. blue angel…GREEN ball) was confusing, as people first looked to green items with the same noun until the noun’s segmental information conflicted with that choice. Together with the other studies of contrastive accents in the visual world, this result indicates that people utilize the presence and position of these accents very quickly to guide their looking behavior, and are sensitive to the representation of contrast.

In a clever study of larger prosodic contours, Kurumada et al. (2014) studied the influence of two different accent and boundary tone combinations on the sentence It looks like a zebra. A H* accent on the final noun (zebra) with a falling L-L% boundary is most consistent with the conclusion that the pictured item actually is a zebra, while a contrastive L+H* accent on the verb looks and a rising L-H% boundary imply that it is not a zebra. The latter interpretation, they note, comes about through an implicit contrast of the verb looks like with the verb is. The prosodic patterns interacted with the display conditions: in displays with only one possible contrast, the contrastive prosody resulted
in a preference to look at the unusual non-zebra animal (an okapi) which started before the onset of the noun, and the other prosody led to looks at the zebra. In displays with two possible contrasting sets of nouns, the effect of the contrastive prosody started only after the noun was heard, and there was a general late preference to look at the okapi even in the non-contrastive prosodic condition. Overall, Kurumada et al. interpret the results as showing that the contrastive function of the L+H* is used early in processing, well before the boundary tone could contribute to the meaning, to reverse the literal meaning of the verbal phrase looks like. A slightly different pattern of results was found by Dennison and Schafer (2010), who were studying H* vs. L+H* accents crossed with two boundary tones, the falling L-L% and the rising L-H%. Dennison and Schafer had complex, multi-item displays (boxes representing Lisa’s vs. Bart’s rooms from The Simpsons, each containing around 10 objects), and used the critical sentence Lisa had the bell. The sentence could state simple possession or the reversed implicature of once having had but no longer possessing (i.e. Lisa had the bell at some earlier point but now Bart does). When the bell was not in Lisa’s possession, mouse-clicks on the bell were faster after the combination of a L+H* contrastive accent and a L-H% boundary than the same accent and a L-L% boundary; with the bell in Lisa’s possession, the L+H* accent and L-L% boundary was fast, as was a contour with simple H* accents. This work suggests that contrast leading to a reverse implicature is signaled best by the combination of a pitch accent and boundary tones, at least with a complex display and this time-based implicature. 
Overall, this general line of research has shown success in linking accents (or full prosodic patterns) to different predictive looking behaviors, and supported some of the general beliefs about accent use: that new or contrastive items are more likely to be accented than given ones, and that contrastive L+H* accents are more diagnostic of contrasts than H* accents. But the research has also shown that there is a complex interplay between the presence and amount of contrast in a visual display and the use of accents.
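The "early looks" reported in these studies are typically summarized as the proportion of gaze samples falling on each displayed object within a time window aligned to a word onset. The sketch below shows one way such a proportion could be computed. The sample format, the region labels, the 50 ms sampling step, and the window bounds are all assumptions made for illustration and are not taken from any of the studies cited above.

```python
from collections import Counter

def look_proportions(samples, window_start_ms, window_end_ms):
    """Proportion of gaze samples on each region within a time window.

    samples: list of (time_ms, region) pairs, with time measured from
    the onset of the critical word (a hypothetical format).
    The window includes its start and excludes its end.
    """
    in_window = [region for t, region in samples
                 if window_start_ms <= t < window_end_ms]
    if not in_window:
        return {}
    counts = Counter(in_window)
    return {region: n / len(in_window) for region, n in counts.items()}

# One invented trial: "candy" was mentioned earlier, "candle" was not.
trial = [(0, "other"), (50, "other"), (100, "candy"), (150, "candle"),
         (200, "candle"), (250, "candle"), (300, "candle"), (350, "candle")]

print(look_proportions(trial, 100, 300))
```

Comparing these proportions across accented and deaccented versions of the same instruction, window by window, is what allows claims about how early prosodic information begins to guide reference.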

13.3.2.2 Accents and contrastive alternatives

Contrastive focus on an element should lead to the calculation and consideration of alternatives (Rooth 1992). Braun and Tagliapietra (2010) and Husband and Ferreira (2016) studied this process using cross-modal priming, a technique which probes the activation of words during the sentence. As listeners heard a sentence, a related word would be presented visually and listeners indicated whether it was a word or non-word. Braun and Tagliapietra presented the visual primes immediately after the end of sentences, and found that sentences with contrastive accents on target words (in Dutch) led to faster responses to contrastive alternative words than with more neutral intonation. Responses to non-contrastive words associated with the target were not affected by contrastive intonation. Husband and Ferreira used priming earlier in the sentences, finding that initially both contrastive and non-contrastive associate words were primed, but that people hearing sentences with contrastive intonation continued to activate the contrastive alternatives whereas other associates faded. Both sets of results support
the semantic theory that contrastive focus involves consideration of alternatives to the accented word. Fraundorf, Watson, and Benjamin (2010) also studied the effect of accents on contrastive alternatives, but concentrated on post-sentence memory tests. Their discourses established a contrast set (e.g. British scientists vs. French scientists) and then continued to provide information about one of the members of the set. When the continuation used a L+H* contrastive accent on the contrasted item, later memory tests (following the presentation of all stimuli) showed better memory for the contrasted item and better ability to reject false statements about the other member of the contrast. A H* accent did not have the same enhancing effect, and neither accent impaired memory for other elements of the discourse. Spalek, Gotzner, and Wartenburger (2014) found similar results for contrastive alternatives to a word which was focused with one of the focus particles only or even plus an accent, vs. just an accent. This work, like the cross-modal priming studies, suggests that contrastive accenting both highlights a particular piece of information and also leads to consideration of its contrastive alternatives.
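The facilitation reported in the cross-modal priming studies can be expressed as a difference in mean lexical decision time between intonation conditions, computed separately for each type of visually presented word. The following sketch illustrates that arithmetic only. The response times, the condition labels, and the data layout are invented, and the cited studies used their own designs and statistical analyses.

```python
from statistics import mean

def facilitation(rts, probe_type):
    """Neutral-minus-contrastive mean response time (ms) for one probe type.

    rts maps (intonation, probe_type) to a list of response times.
    A positive value means faster responses after contrastive intonation.
    """
    neutral = mean(rts[("neutral", probe_type)])
    contrastive = mean(rts[("contrastive", probe_type)])
    return neutral - contrastive

# Invented lexical decision times in milliseconds.
rts = {
    ("neutral", "alternative"): [620, 640, 630],
    ("contrastive", "alternative"): [590, 600, 595],
    ("neutral", "associate"): [610, 620, 615],
    ("contrastive", "associate"): [612, 618, 615],
}

for probe in ("alternative", "associate"):
    print(probe, facilitation(rts, probe))
```

In this invented data set, only the contrastive alternatives show facilitation, which is the qualitative pattern described for probes presented at sentence end.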

13.3.2.3 Accents and syntactic structures

One area where accents have been shown to interact with syntactic processing is in the resolution of attachment ambiguities. Schafer et al. (1996) studied the attachment of relative clauses (RCs) to an earlier (N1) or later noun (N2) in phrases like the propeller of the plane that the …. The relative clause could modify N1, propeller, or N2, plane, and H* or contrastive accents on one or the other noun increased attachment of the RC to that noun. Schafer et al. proposed that relative-clause modifiers were drawn to modify an accented noun because of its focused status. Lee and Watson (2011) followed up on this research, and added the finding that longer relative clauses were more likely to shift to N1 attachment in the presence of N1 accent than shorter RCs were. An additional study found that participants preferred to choose answers including the contrastively accented noun, even when that was incorrect. They therefore trace the effects of accent on attachment to the simple salience of the accented nouns, and claim that the final study suggests a question-answering strategy may be behind all effects of accent on attachment. Recent research on a range of additional attachment structures (e.g. Paula phoned a friend # from Alabama, or Kathie claimed that Bill had called # on Friday) has demonstrated that L+H* accents on verbs or nouns can draw the attachment of ambiguously attached prepositional phrase (PP) modifiers (Carlson and Tyler 2017) with effects similar in size to the effects of prosodic boundaries on attachment. Further, a prior wh-question which focused either attachment site also drew the attachment of a modifier, even with no contrastive accents in the sentence (Carlson and Potter 2020). These results are consistent with Schafer et al.’s theory that focus attracts syntactic attachment. 
Ellipsis sentences are another area of sentence processing in which accents have the potential to suggest different syntactic structures. These sentences are often said to be focus-sensitive, with required focus on contrasting elements within the antecedent and
ellipsis remnant (Merchant 2001; Sag 1980). For example, in a bare argument ellipsis sentence like Diane thought Patrick was entertaining, not Louise, the final remnant (Louise) could contrast with the higher subject (Diane) or the lower one (Patrick). The choice of interpretation changes what material from the complete first sentence must be re-accessed or copied so as to understand the elided part. In Carlson, Frazier, and Clifton (2009), L+H* accents on the higher or lower subject of such sentences led to a 20% change in interpretations, with almost a third of responses choosing the higher subject contrast with the higher accent placement. Similar effects of accent placement on remnant interpretation have been shown for a range of ellipsis types, as in (16). (16)

a. sluicing: Some tourist suspected that the hotelkeeper was hiding someone. Guess who? (Frazier and Clifton 1998)
b. gapping: Dan amazed the judges with his talent and James with his musicality. (Carlson 2001)
c. VP ellipsis: John said Fred went to Europe and Mary did too. (Frazier, Clifton, and Carlson 2007)
d. let-alone ellipsis: Danielle couldn’t drive a car, let alone race one/a motorcycle. (Harris and Carlson 2016)

In German, Stolterfoht et al. (2007) found evidence from event-related potentials (ERPs) that people expected bare argument ellipsis sentences to have object-contrasting remnants, especially when nur ‘only’ marked the first-clause object. This is consistent with ellipsis processing in English, too, as most structures in (16) also showed an object (or last argument) bias in addition to effects of accent or focus particle placement. In the domain of pronoun reference, we can consider the effects of accenting possible antecedents for a pronoun as well as the effect of accent on a pronoun itself. Both have been shown to influence the chosen referent. Balogh (2003) found that accents on one possible antecedent of a personal pronoun slightly increased references to that antecedent, but that the effect could be swamped by variation in the salience of antecedents due to syntactic position and thematic role. She then showed that contrastively accenting an object pronoun caused a large shift in interpretation, from 85% references to the previous object with an unstressed pronoun down to only 20% of such references with a contrastively accented pronoun. Judging by this comparison, accents on pronouns seem to be more influential in referent choice. It appears that the antecedents had H* accents whereas L+H* accents were placed on pronouns, so the different results could also reflect differences between the accent types in their interaction with reference. Brown-Schmidt et al. (2005) studied interpretation preferences for the personal pronoun it and the demonstrative pronoun that, as well as their reactions to accenting. It was more likely to refer to a theme object, the focus in a previous sentence, while that referred more to composites of multiple objects or non-focused objects. Accenting it reduced its preference to refer to the focused item somewhat, but accents on that did not materially change its reference.

As described above, certain types of ambiguities are known to be effectively influenced by accent placement, including focus-sensitive ellipsis structures and reference resolution processes. Attachment, though, is a more general process, so the finding that accents affect it suggests that accents may have a wider domain of application than previously suspected. For example, work on attachment ambiguities will need to consider and document the accents used as well as the prosodic boundaries, and prosodic studies of a range of ambiguities could explore whether accents also influence interpretation or processing.

13.3.2.4 Disfluencies

Although disfluencies in speech were initially seen as simple errors which should be discarded, recent work has demonstrated important relationships between syntax and prosody through the study of the production and interpretation of disfluencies. In production, speakers are more likely to produce the determiner the pronounced as thee, rather than in its reduced form, preceding a longer pause, partly as a signal to listeners of upcoming problems (Fox Tree and Clark 1997). Moreover, speakers vary the production of the disfluency markers uh and um, and produce um with longer following delays (Clark and Fox Tree 2002). Finally, disfluencies are more common at the edges of prosodic phrases than elsewhere. The variation in disfluency markers means that a certain amount of planning has to be involved in their choice, and they must be signals to the hearer as well as symptoms of difficulty in speech. Disfluencies are also interpreted in real time by listeners, as demonstrated using the Visual World paradigm. Arnold, Tanenhaus, Altmann, and Fagnano (2004) found that the extended determiner thee N led to more early looks to a new, unmentioned item than fluent the N pronunciations. In addition, non-fluent thee also led to earlier looks to unfamiliar, hard-to-describe items (Arnold, Hudson Kam, and Tanenhaus 2007). Disfluencies have been shown to affect the online processing of sentences with temporarily ambiguous structures (Bailey and Ferreira 2003): closure ambiguities (While the man hunted the deer ran into the woods) and sentence/NP coordination ambiguities (Sandra bumped into the busboy and the waiter told her to be careful). A disfluency before or after a critical phrase (uh uh the deer vs. the deer uh uh) had different effects. 
The pre-phrase disfluency functioned like a prosodic boundary, favoring the interpretation where the deer is the start of the next clause, while the post-phrase disfluency favored the alternative interpretation, keeping the deer with the previous verb. Two opposing factors are involved here, as the late disfluency prolonged the time that hearers spent committed to the incorrect structure where the deer was an object, while the early disfluency helpfully implied a clause boundary at its position. Disfluencies involving errors and repairs were studied by Ferreira, Lau, and Bailey (2004) and Ferreira and Bailey (2004), as in Mary will put (pause) throw the ball, where the second verb is intended to replace the first. They show that the initial verb activates its argument structure, so the fact that put requires a PP but throw does not makes this sentence harder and slower to process than the alternative with no correction; but if throw is the initial verb replaced by put, the sentence is easier to process without a
PP because the earliest verb processed did not require it. They suggest a mechanism of lingering activation or overlay of trees to explain these effects of repaired items on eventual interpretation or rating of sentences. These works suggest that disfluencies are not simply filtered out of the speech stream by listeners, but can influence their predictions for reference and structure during processing. The amount of work on syntactic processing of disfluencies is small at this point and so there is much still to learn in this area. Some interesting questions include how much hearers implicitly repair disfluent utterances by speakers, and how that works compared to cases where the speakers themselves overtly repair their speech; how much weight listeners give to different possible reasons for disfluency, including phonological reasons, distraction, speaker qualities, referent retrieval difficulty, and so on; and what disfluencies in production can tell us about planning processes and planning units for speech.

13.4 Implicit prosody

As demonstrated above, research on the auditory modality has revealed important evidence for the role of prosodic structure on both the production and perception of syntactic structure. But parallel investigations in silent reading suggest that implicit prosodic factors also influence processing and comprehension. The implicit prosody hypothesis (Fodor 2002) maintains that, even during silent reading, readers are projecting a prosodic representation onto the text which can influence syntactic processing (see also Bader 1998). This hypothesis has been explored for a variety of prosodic phenomena, and results suggest that boundaries, accents, and rhythmic patterns are realized in silent reading (Breen 2014; 2015). There is considerable evidence of implicit prosodic phrasing as demonstrated using multiple methodologies and across multiple languages. Much of this work is based on the finding that speakers are more likely to place prosodic boundaries after longer syntactic constituents than after shorter ones (Watson and Gibson 2004; Watson, Breen, and Gibson 2006; Breen et al. 2011; Ferreira 1993; Cooper and Paccia-Cooper 1980; Gee and Grosjean 1983). Evidence for the realization of implicit prosodic phrasing during silent reading comes from rating studies, interpretation studies, and ERP studies. Across multiple languages, there is evidence that the length of an ambiguously attached clause influences how it is interpreted, such that a longer relative clause is less likely to be interpreted as modifying a preceding noun phrase than a short relative clause. 
This is presumably because readers are more likely to insert an implicit phrase boundary before a long relative clause, which blocks syntactic attachment to the preceding clause: see evidence in Japanese (Kitagawa and Fodor 2006), English (Quinn, Abdelghany, and Fodor 2000; Swets, Desmet, Hambrick, and Ferreira 2007), German (Augurzky 2006), Croatian (Lovrić 2003), Hindi (Vasishth, Agnihotri, Fernández, and Bhatt 2005), Dutch (Wijnen 2004), French (Pynte and Colonna 2000; Hemforth, Colonna, Petrone, and D’Imperio 2013), and Korean (Hwang and Schafer 2009), among others (Jun 2003). Relatedly, readers prefer syntactic interpretations consistent with the predicted overt prosodic phrasing (Harris, Jun, and Royer 2016). In addition, ERP studies demonstrate a similar waveform for prosodic boundaries as for commas in written language (the Closure Positive Shift; Steinhauer 2003), which varies in silent reading experiments in a similar way to how it varies in overt prosodic contexts (Hwang and Steinhauer 2011; Liu, Wang, and Jin 2010), suggesting a similar underlying psychological process. Evidence for the role of implicit accent representation comes first from the demonstration that words with four syllables and two accents are read more slowly than words with four syllables and one accent (Ashby and Clifton 2005). It is also supported by demonstrations that mismatches between expected and “perceived” accents during silent reading disrupt processing. For example, a syntactic reanalysis that forces a concurrent reanalysis of accent structure causes readers to slow down (see Breen and Clifton 2011; 2013 for English; Bader 1998; Stolterfoht et al. 2007; Kentner 2012; Kentner and Vasishth 2016 for German). In addition, words in font emphasis (italics or CAPS) are better remembered than non-emphasized words, similar to effects in overt accentuation (Fraundorf et al. 2013). What these studies suggest is that even studies in the reading domain must take into account possible influences of prosodic factors on participant sentence-reading behavior, rating, and interpretation.

13.4.1 Implicit prosody and acceptability judgment experiments

One notable trend in experimental syntax is the move toward acceptability judgment studies of sentences with particular syntactic structures, as a more sensitive measure than the traditional intuitive grammaticality judgments used by syntacticians. For example, Sprouse and colleagues have conducted a number of studies on island violations, comparing them to sentences with similar non-island long-distance dependencies as well as sentences with islands but no movement out of them (e.g. Sprouse 2007; Sprouse, Wagers, and Phillips 2012; Kush, Lohndal, and Sprouse 2018). This empirical work to clarify what sentences should count as ungrammatical, and thus what syntactic theories should rule out, is very useful to the field. However, as the previous section suggests, syntax researchers also need to consider the possible role of implicit prosody in these judgments, as patterns of accents or phrasing, even when realized implicitly, can affect interpretation and felicity of both ambiguous and unambiguous sentences. For example, one notoriously hard-to-process structure, sentences with multiple center-embedded relative clauses, has been found to be greatly improved by a particular combination of phrasing and constituent weight (Fodor, Nickels, and Schott 2018). Syntax researchers can account for potential implicit prosodic effects by including auditory acceptability judgment studies where the intended prosodification is made explicit to the participant. Another approach is to collect participant productions of experimental sentences in addition to silent ratings, to ensure that the contours they produce are consistent with experimenter predictions.

13.5 Conclusion

The field of experimental syntax has much to gain from the inclusion of studies in the auditory modality. However, studies of auditory production and perception introduce new theoretical questions and methodological challenges for researchers, as described in Sections 13.1 and 13.2. The theoretical questions revolve around how similar prosodic structures are to syntactic ones, and to what extent prosodic structures are encoded in the grammar as opposed to arising from the interaction between grammatical structures and constraints on production. The methodological issues arise from two tensions: between creating experimental materials that are natural and creating materials that are consistent enough for analysis, and between quantifying prosodic information categorically, using a phonological annotation scheme, and quantifying it with continuous acoustic measurements. We have summarized these challenges and provided suggestions for addressing them. In Section 13.3, we summarized the major findings of research on the impact of prosodic boundaries and pitch accents on sentence processing. It has long been clear that prosodic phrasing and prosodic boundaries influence syntactic phrasing and thus syntactic structure, with correspondence between prosodic and syntactic boundaries facilitating comprehension. It was less clear at the outset of prosody research how accents might relate to syntax, and so accent research took longer to get under way. Because accents relate to the overall information structure of a discourse, studies of their influence tend to look beyond single sentences to effects of context, whether narrowly or broadly defined. This is all necessary work, since a theory of prosody’s impact on syntax that included only prosodic boundaries or covered only single sentences would be seriously incomplete. There is much yet to do in studying how entire prosodic contours, including accents and boundaries, interact with syntactic constraints and decisions in processing.
In Section 13.4, we discussed how research on implicit prosody suggests that prosodic processes are engaged even during silent reading, meaning that the challenges discussed above should also be considered in studies using written materials. Indeed, rather than adding a completely new factor to syntactic research, explicit auditory research allows for better control of the prosody that is already present in any linguistic experimentation. Overall, we hope that this summary will challenge experimental syntax researchers to consider the prosodic domain and to begin studying its impact on the syntactic structures of interest.

investigating auditory syntactic processing

mara breen and katy carlson
Language and Speech 49(3): 367–392. https://doi.org/10.1177/00238309060490030301 Welby, Pauline. 2003. Effects of pitch accent position, type, and status on focus projection. Language and Speech 46(1): 53–81. https://doi.org/10.1177/00238309030460010401 Wennerstrom, Ann. 2001. The music of everyday speech: Prosody and discourse analysis. Oxford: Oxford University Press. Wightman, Colin W., Stefanie Shattuck‐Hufnagel, Mari Ostendorf, and Patti J. Price. 1992. Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America 91(3): 1707–1717. https://doi.org/10.1121/1.402450 Wijnen, Frank. 2004. The implicit prosody of Jabberwocky and the relative clause attachment riddle. Retrieved 30 June 2017, from http://dspace.library.uu.nl/handle/1874/295949 Yoon, Tae-Jin, Sandra Chavarria, Jennifer Cole, and Mark Hasegawa-Johnson. 2004. Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. Paper presented at the Interspeech 2004 Conference, Jeju Island, Korea. https://www. researchgate.net/publication/221482950_Intertranscriber_reliability_of_prosodic_labeling_on_telephone_conversation_using_toBI

Chapter 14

Language-processing experiments in the field

Matthew Wagers and Sandra Chung

14.1 Challenges and opportunities in field-based experiments

In principle, there are no differences between an experiment in the lab and an experiment in the field: Both ought to rely on theoretically driven, precise hypotheses and predictions; good design, adequate sample sizes, and appropriate analysis; and ethical treatment of human subjects. In practice, there are many more obstacles in the field than in the lab to meeting the standard of a valuable experiment. There are, at the same time, many more opportunities and room for discovery. Our goal in this chapter is to respond to these practicalities, and signpost both the obstacles and the opportunities, based on our own experience conducting psycholinguistic studies on Chamorro over six years in the U.S. Commonwealth of the Northern Mariana Islands (CNMI). Our focus will be on what are sometimes called “small language” communities: language communities which have relatively few speakers and which typically lack socio-economic and political power. The majority of the languages of the world are small languages.

Experimental linguistics is typically carried out in the context of a university laboratory or, increasingly, online via social media or crowdsourcing platforms. Researchers need to compile datasets of considerable size to allow them to draw reliable conclusions, with several scores of items and participants usually the norm.¹ In the physical lab, sophisticated devices are often used, such as eye-tracking cameras or brain scanners. The availability of participants and the security and infrastructure required for costly instrumentation are two considerations that make it practical to conduct experimental research in a university laboratory or online. An obvious challenge in field-based experimental research is simply that it provides a different set of practical circumstances to which researchers must adapt. Increasingly, researchers are overcoming these challenges. For example, technological improvements have given rise to more portable equipment which can be brought to the field; see Bennett et al. (2018) for an example of an ultrasound imaging investigation of Irish consonants, and Norcliffe et al. (2015) for an example of an eye-tracking experiment on Tzeltal sentence planning. See Polinsky, Chapter 4 this volume, for a comparison of fieldwork and experimental linguistics that also provides a broader summary of recent, relevant research.

¹ Even in laboratory-based experiments, common sample sizes are regrettably not the optimum for many designs. See e.g. Jäger, Engelmann, and Vasishth (2017: appendix B).

A less-often-recognized challenge, but in our view often the most serious challenge, is that the experimental method we have inherited from decades of doing research in university laboratories is itself heavily culturally circumscribed. For a researcher planning a field-based experimental linguistics study, this may be the source of more surprises and obstacles than the practical difficulties that must be overcome. Anand, Chung, and Wagers (2011) describe some of the “cultural felicity” conditions which can typically be met in the lab, but are not guaranteed in the field: the priority attached to test-taking; an individual’s willingness to maintain exclusive focus on an unnatural, usually solitary task; the expectation of accommodation to out-of-context language material (usually presented by a machine). There is a kind of social contract with the experimenter that only makes sense enmeshed in certain cultural standards of authority and evaluation (Rosenthal and Rosnow 2009).
Participants at Western-style universities volunteer to participate, are acculturated to test-taking environments, and often appear to be motivated to comply with the experimenter’s expectations and wishes. In a linguistic community in the field, none of this can be taken for granted, nor should the university context necessarily be prized as providing better operating conditions (Henrich, Heine, and Norenzayan 2010).

The fieldwork tradition in linguistics, inherited from anthropology, provides a basis for developing a more culturally sensitive ethos in experimental linguistics. However, it is also labor-intensive, and typically centered around the partnership between the linguist and a single individual, or a small number of individuals. Thus it typically provides few answers, and occasionally opposing presuppositions, to some of the bread-and-butter issues of the experimentalist, such as achieving adequate statistical power or disguising the intention of the experimental design.

In our experience in conducting language-processing experiments on Chamorro, the most effective way forward was to pursue a team-based approach, one which combined the expertise of an experimentalist (Wagers), a fieldworker (Chung), and, crucially, a member of the language community under investigation (Borja). By devolving some responsibilities and recombining others, this arrangement gave us a way to find realistic, site-specific solutions to the practical and cultural “scalability” issues introduced above.

But there is ultimately no “one size fits all” approach to those issues because there is no “one size fits all” small language community. Small language communities are extraordinarily diverse—in population size, wealth, political structure, level of education and industrialization, and cultural and societal norms. We believe that this diversity makes it unproductive, at this point in time, to try to generalize about successful strategies for conducting language-processing experiments in the field. The line of research is still too new for anyone to be in a position to enumerate in detail either its challenges or the best practices that would respond to them. So we will limit ourselves to talking about, and talking through, our own experience with Chamorro, an Austronesian language of the Mariana Islands (in Micronesia). From 2011 to 2016 we conducted seven experiments on the processing of Chamorro in the three inhabited islands of the U.S. Commonwealth of the Northern Mariana Islands (henceforth CNMI): six comprehension experiments, each of which involved 80–120 participants, and one production experiment, which involved 43 participants. Our aim was to achieve experimental results that provided meaningful information about the time-course of language processing, including reaction times on a par with those typically seen in experiments conducted in Western-style universities. To do this, we had to negotiate the many issues that arose working with a broad community of speakers in a different social-cultural context. Our expectations were violated on numerous occasions; sometimes we managed to find a workable solution, other times we did not. We describe here what worked, what didn’t work, and our diagnosis of why. 
If we have general advice to offer, it is this: The experimental linguist in the field must adopt an outlook that is at once holistic and minimalist—holistic in recognizing the interdependency between experimental and social constraints, and minimalist in understanding that complicated procedures or designs can be more easily up-ended on the ground.

14.2 Materials

Improved experimental design, better digital resources, more accurate measurement devices, more sophisticated analytic techniques: These are just some of the ways in which researchers are constantly innovating and, in doing so, sharpening the questions they can ask. The linguist in the laboratory inherits these cumulative efforts and typically makes small, incremental changes from project to project. Linguists in the field, who must port these cumulative efforts to a different social context, will often find themselves innovating on several fronts at once. The challenges that spur these innovations stem from how a particular experimental design should be moored into the social context of the community whose language is being investigated. We will discuss two ways in which these challenges manifest themselves, in terms of resources and function:

• Do the resources exist to implement the experimental design in a way that is culturally relevant or legible to the community of interest? If not, what has to be created?
• Are there any aspects of the task and its function which are not consistent with community values, or which otherwise conflict with community presuppositions? If so, how can the task be adapted?

For the sake of concreteness, let us consider one specific experiment and explore the ways in which it could be adapted to the field. The experiment we will model is Sussman and Sedivy (2003), a visual-world eye-tracking experiment that traced the time-course of filler–gap dependency formation. The goal of this experiment was to test whether speakers entertained incremental interpretations of English wh-questions (e.g. ‘What did Jody squash the spider with?’) before the linguistic input signaled any information about the location of the gap. Participants first heard a story and then had to answer a comprehension question. During this time, participants looked at a visual display and their gaze was monitored with a head-mounted eye-tracker. Figure 14.1 illustrates a sample item set, which consists of combinations. The display depicts items related to the story. Crucially, it contains the core arguments of one critical proposition; here, that Jody (agent) squashed the spider (theme) with a shoe (instrument).

Fig. 14.1 A sample item set from Sussman and Sedivy (2003).

As expected, Sussman and Sedivy found that when these arguments were mentioned, participants tended to look at their depictions. But they also found that when the verb was mentioned, the participants’ looks anticipated the upcoming theme argument. This anticipation was amplified in wh-questions compared to polar questions (e.g. ‘Did Jody squash the spider with her shoe?’). This finding, that participants were selectively “eager” to look at the depiction of a particular unmentioned argument while processing a filler–gap dependency, converged with evidence from reading-time studies that filler–gap processing is especially predictive.

Several features of this study make it a compelling design for use in the field. Language-processing behavior was measured in response to connected text as opposed to isolated sentences; the auditory modality was used; and the crucial measurement, probability of fixating a particular image, could be made before participants had to give the response that was ostensibly desired (an answer to the question).

What, then, are the challenges? Firstly, there are resource challenges around compiling the physical and digital materials required to carry out a particular design. In the case of a visual-world study like Sussman and Sedivy (2003), the resources required are the stories themselves, the recordings of the stories, and the pictures comprising the visual world.

14.2.1 Visual materials

Many methods for investigating language processing involve pictures, such as the visual-world paradigm, naming, picture matching, etc. The laboratory best practice would be to use pictures that are normed in a task-appropriate way. For example, it is usually important that the pictures in a study elicit consistent labels across participants, and a great deal of work has been invested in collecting measurements that an experimenter might use either to identify a homogeneous set of candidate images or to model the variability among images (e.g. as part of a regression analysis). For one recent example, see Moreno-Martínez, Javier, and Montoro (2012), who created a collection of 360 color images rated by 36 Spanish speakers along such dimensions as familiarity, typicality, and manipulability. There is now a plethora of high-quality sources of normed images and related resources (e.g. https://www.cogsci.nl/stimulus-sets; last accessed September 30, 2017).

There were two problems with using images from the existing picture databases for our experimental research on Chamorro. Firstly and most obviously, the images were normed for languages other than the language we were investigating. Although Chamorro translations could be found for many of the specific names in these databases, there was no guarantee that the Chamorro translation would be the best name for the picture given. In principle, this problem is easy to fix. We could have taken illustrations from the databases, which are often controlled along non-linguistic dimensions as well, and elicited their names and other sorts of judgments from an appropriately sized sample of Chamorro speakers. Practically, we did not want to do this. Chamorro is a small language, with a total of some 35,000–40,000 speakers in the CNMI and Guam combined.
We believed that locating participants to norm the illustrations would effectively sap, for the period of time we were in the field, the limited pool of Chamorro speakers who would be willing to participate in the main study.
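To make concrete what "consistent labels across participants" means in a norming study, here is a minimal sketch of two summaries often reported for picture norms: the share of participants giving the most common name, and an entropy-style H statistic (0 when everyone agrees). The function name and the naming responses are invented for illustration; no such norming data were collected in the project described here.

```python
import math
from collections import Counter


def name_agreement(labels):
    """Return (modal label, share of modal label, H statistic in bits)."""
    counts = Counter(labels)
    total = sum(counts.values())
    modal_label, modal_count = counts.most_common(1)[0]
    # H = sum over distinct labels of p * log2(1/p); 0 means perfect agreement.
    h = sum((c / total) * math.log2(total / c) for c in counts.values())
    return modal_label, modal_count / total, h


# Hypothetical naming responses for two pictures (invented, not real data).
responses = {
    "frog": ["kaheru’"] * 4 + ["kairu’"] * 4,
    "kingfisher": ["sihik"] * 8,
}

for picture, labels in responses.items():
    label, share, h = name_agreement(labels)
    print(f"{picture}: modal={label} share={share:.2f} H={h:.2f}")
```

A picture whose labels split evenly between two names would receive a high H and would be a candidate for exclusion or for modeling as a covariate.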

Secondly, the existing images were generally not culturally relevant. They did not depict people with the appearance or clothing typically seen in the CNMI; they did not show the kinds of flora or fauna found there; and they did not show common culturally specific situations and events in Chamorro life. Naturally, many Chamorros have been exposed to mainland U.S. culture through the internet, television, and life abroad, but we wanted our experiments on the Chamorro language to be engaging in Chamorro terms. So we elicited our linguistic stimuli in Chamorro first, in order to decide what needed to be depicted. For example, we wanted to use the verb ngingi’, which refers to sniffing or kissing the back of the hand—a traditional sign of respect when one encounters a Chamorro elder in a social situation. We found a few photos on the internet of the ngingi’, and some illustrations in older printed matter on Saipan, but nothing that would suffice for our experiment. Likewise, we needed a drawing of a sihik, a species of Micronesian kingfisher (Todiramphus cinnamominus), as it appeared on the islands of Saipan and Rota. We could not find many images on the internet that looked just like the local birds; most depicted a differently colored variety of kingfisher found in Guam. So, working with our Chamorro team member, we created a detailed ‘Request for Proposals’ to circulate to potential illustrators, in which we described how a ngingi’ was performed, what the specific features were of kingfishers that our Chamorro team member observed flying around his home, etc. In some other cases we did find good internet resources we could offer as a guide, such as the Wikipedia entry for “Saipan Jungle Fowl” or the excellent Guampedia (http://www.guampedia.com/). Figure 14.2 shows some of the resulting illustrations, which were commissioned for different studies in our project. 
As we learned from debriefings, many participants were pleasantly surprised to encounter drawings that were locally and culturally specific. At the same time, a few illustrations were problematic, and there were several instances in which drawing conventions familiar to us were not interpreted as we intended. For example, in the image of a rooster pecking the sihik (Figure 14.2, top right panel), small lines were used to indicate the impact of the pecking. These were not widely understood by our participants, particularly older Chamorros. It was instructive to learn how many of our presumptions could be frustrated, in ways we could not expect. For example, the very same illustration of the sihik elicited a surprising response from several farmers, who claimed that roosters would never attack a kingfisher, although hens might. Most of the concerns about the adequacy of particular drawings were minor, but in a few cases they were serious enough to cause us to set the item aside.

Fig. 14.2 Some culturally specific illustrations created for the Chamorro Psycholinguistics na Project. Clockwise from top-left: A doctor sniffs (ngingi’) the hand of an elder, a traditional Chamorro sign of respect. A rooster pecks a Micronesian kingfisher (sihik). The Liberation Day queen (raraina) is photographed holding a trumpet shell (kulu’). A coconut grater (kåmyu) rests against a large water bottle; both are very common household items. These illustrations, which were created by California-based artist Nicole Goux, are available for anyone to download and freely use from our project website (http://chamorro.sites.ucsc.edu).

14.2.2 Audio materials

In addition to pictures, an experiment in the field will often need audio recordings. This was a necessity in our own research; most Chamorros are not skilled readers of Chamorro, in part because there are several standard and nonstandard orthographies in use (Chung and Rechebei 2014). Given the number of individual tokens required in many experimental designs (such as the fully crossed, within-subjects designs we used), recording the stimuli is one of the most arduous aspects of preparing for the experiment. We had limited time in the CNMI and thought it would greatly prolong the study to work with another speaker, so we once again made a trade-off and decided to use only our Chamorro team member’s voice. There are definitely perils involved in making this choice. Because the experimenter knows the design of the experiment, s/he might read the materials in a way that is unintentionally informative about a desired mode of task performance. However, the choice also granted us some important flexibility, because it made it easier to re-record our stimuli on the fly. We took care in inspecting and editing our recordings. We measured several relevant acoustic cues and compared them across conditions, to ensure that our designs were varying what we intended; that there wasn’t too much unintended variation across an item set; and that any unintended variation was random and not correlated unintentionally with condition. In many instances we re-recorded specific stimuli. And unlike an experiment in the lab, where a researcher may have access to a sound booth or a reliably quiet room, we had to contend with background noise, although it was often from the natural world.

Whether our Chamorro team member used a substantially nonstandard pronunciation was a concern, since that would be difficult for us to detect. We knew that there were two major dialects in Chamorro: one spoken in Rota (and the southern part of Guam), and the other, the majority dialect, spoken elsewhere in the Mariana Islands. These dialects are mutually intelligible, and the differences between them are mostly phonological. For example, the majority dialect distinguishes between geminate and non-geminate consonants, whereas Rotanese Chamorro has no geminates. However, the majority dialect is recognized as the standard, and there was no problem in using stimuli recorded in this dialect to collect data from speakers on Rota.

As it turned out, lexical variation, tied not only to island but also to age, was the greatest barrier to comprehension for our participants. We were less prepared for this kind of variation because it had not been described in any detail by other linguists. For example, the word for ‘frog’ is kaheru’ on Rota but kairu’ on Saipan (both forms are evidently borrowings from Japanese). Possibly because this sort of word would not typically occur on a news broadcast, say, it was a point of variation of which almost all speakers were unaware. Relatedly, we found that younger speakers’ lexical knowledge of names of animals was extremely limited.
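The acoustic screening described earlier in this section, in which cue measurements are compared across conditions to confirm that unintended variation is not correlated with condition, could be sketched as follows. This is only an illustration: the chapter does not say which statistical check was used, the sign-flip permutation test is a swapped-in technique, and the durations below are invented.

```python
import random
from statistics import mean


def paired_permutation_p(cond_a, cond_b, n_perm=10_000, seed=1):
    """Two-sided sign-flip permutation test on within-item differences.

    cond_a[i] and cond_b[i] are the same acoustic cue measured on the two
    condition versions of item i.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null, the sign of each item's difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical durations (ms) of a pre-critical word, one value per item.
plausible = [412.0, 398.0, 430.0, 405.0, 421.0, 417.0, 399.0, 408.0]
implausible = [409.0, 402.0, 426.0, 411.0, 418.0, 420.0, 395.0, 410.0]

p_value = paired_permutation_p(plausible, implausible)
print(f"mean difference = {mean(plausible) - mean(implausible):.1f} ms")
print(f"permutation p = {p_value:.3f}")
```

A small p-value for a cue that was not supposed to differ would flag the item set for re-recording or editing.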
A final resource challenge we had to solve involved composing the stimuli themselves. This was, in many ways, purely a fieldwork task. That is to say, we did not attempt to translate “targets” from English to Chamorro ourselves, but instead generated the stimuli through elicitation and translation directly with our Chamorro team member. This was a crucial part of the design phase of the experiment.

There is a mode of constructing stimuli for experiments that one might call the “Mad Libs approach,” according to which item templates are designed and the experimenter fills in slots in the templates with lexical material as if they were pigeon-holes. While this characterization is somewhat cartoonish, it is not an entirely inaccurate rendition of how many researchers design experiments in their own language when those experiments require large numbers of items. This was not feasible for our Chamorro experiments, because we often did not know a priori whether a particular factorial design we envisioned could be implemented generally. It would have been easy to be misled by the idiosyncrasies of a few lexical items. Every pair needed to be elicited singly to make sure it was acceptable in Chamorro. Not only that: we discovered frequently, but unsurprisingly, that lexical items had unintended connotations in some constructions.

There is a potential reward in attempting to generate large numbers of sentences from scratch with a native speaker. The act of searching for lexical items and trying out new combinations of them often led to unexpected ungrammaticality or unpredictable complexity. An exigency of experimental research, i.e. large numbers of items, can thus become an asset in the fieldwork context. In our case, we discovered a number of novel grammatical generalizations, including a complex constraint on wh-dependencies formed on the possessor (Wagers, Borja, and Chung 2015) and the optionality of wh-agreement in certain relative clauses (described in Wagers, Borja, and Chung 2018). Our Chamorro team member had strong but difficult-to-pinpoint intuitions about the infelicity of certain passives in prenominal relative clauses, intuitions that would be supported and amplified by high error rates in a comprehension task (Wagers, Borja, and Chung 2018).

Finally, we did occasionally present written materials to experimental participants. We conducted a word familiarity study as a pencil-and-paper survey with a small subset of participants in one of our first experiments (Wagers, Borja, and Chung 2015), as well as some word order preference surveys. Because of the variation in reading skill we alluded to above, the use of written materials required careful administration and instruction. Often, we simply ended up reading the survey aloud and, in many interactions, the administration of the survey effectively became an elicitation session. The data we gleaned was valuable but acquired at a relatively steep cost.

14.3 Methods

14.3.1 Self-paced listening

The goal of our first two studies was to learn something about the incremental processing of wh-agreement, the special agreement found in Chamorro filler–gap dependencies that signals the grammatical relation of the gap. We were interested, in particular, in whether the information that wh-agreement provides about the gap is used by comprehenders to interpret wh-questions in advance of unambiguous bottom-up evidence for the gap site. Although we were inspired by Sussman and Sedivy’s (2003) visual-world paradigm experiment, we did not think we could marshal the required resources to mount that design. Instead we used an anomaly design, of the sort used by other researchers who have investigated wh-dependencies, such as Boland et al. (1995) in their research on argument structure. We compared sentences in a design that crossed the plausibility of the filler as a direct object of the verb (here, prensa ‘iron’) with the presence or absence of wh-agreement morphology, illustrated in (1) and (2) for overt wh-agreement only.

(1) Plausible
Kuåntu na chinina prinensåm-mu nigap gi talu’åni?
how.many? shirts wh[obj].iron-agr yesterday in afternoon
‘How many shirts did you iron __ yesterday afternoon?’

(2) Implausible
Kuåntu na patgun låhi prinensåm-mu nigap gi talu’åni?
how.many? child male wh[obj].iron-agr yesterday in afternoon
‘How many boys did you iron __ yesterday afternoon?’
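The crossing of plausibility with wh-agreement yields four conditions per item. One common way to present such a fully crossed, within-subjects design is Latin-square counterbalancing, sketched below. The chapter does not state how items were assigned to lists, so the Latin square, the function name, and the condition labels are all assumptions made for illustration.

```python
from itertools import product

PLAUSIBILITY = ("plausible", "implausible")
WH_AGREEMENT = ("agreement", "no_agreement")  # hypothetical labels
CONDITIONS = list(product(PLAUSIBILITY, WH_AGREEMENT))  # the 4 design cells


def latin_square_lists(n_items, conditions=CONDITIONS):
    """Build one presentation list per condition.

    In list k, item i appears in condition (i + k) mod n_conditions, so each
    list contains every item exactly once and, when n_items is a multiple of
    the number of conditions, every condition equally often.
    """
    n = len(conditions)
    return [
        [(item, conditions[(item + k) % n]) for item in range(n_items)]
        for k in range(n)
    ]


lists = latin_square_lists(n_items=8)
for k, lst in enumerate(lists):
    print(f"list {k}: first two trials = {lst[:2]}")
```

Each participant would be assigned one list, so that no one hears the same item in more than one condition while every item is still tested in all four cells across participants.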

In a reading time version of this design (Traxler and Pickering 1996; Wagers and Phillips 2014), the point at which enough information has been amassed to form a dependency between the filler and the gap is indexed by increased reading times for the implausible object conditions. We needed to adapt this to auditory presentation because of the high variability in Chamorro reading skill. One straightforward way to port a reading-time task into auditory presentation is the auditory moving window technique, also called self-paced listening (SPL), described first in detail by Ferreira et al. (1996) (but cf. Pynte 1978). Participants “listen” to a sentence by pressing a button to iteratively advance through a series of segments which were spliced from whole sentence recordings. Compared to reading-time studies, there are many fewer SPL studies in the adult psycholinguistics literature, and so fewer established findings to guide experiment design. SPL has been used more commonly to investigate populations where literacy is an issue, such as children (e.g. Kidd and Bavin 2007) or second-language learners (Papadopoulou, Tsimpli, and Amvrazis 2014). Probably the most common concern for SPL is simply that it is an awkward way of listening to language. By segmenting a sentence and relying on participants’ button presses, it introduces timing discontinuities in the acoustic signal that could distort or corrupt prosodic cues to lexical or syntactic processing. Indeed, Ferreira, Anes, and Horine (1996) showed that, in a task in which participants must leverage prosodic cues to disambiguate an otherwise globally ambiguous sentence, SPL is detrimental to performance compared to the presentation of unsegmented recordings (though only somewhat). 
So, naturally, attention must be paid to the prosodic features of the phenomenon under investigation, and a judgment must be made about how likely it is that injection of noise into that process would lead to undesirable or misleading consequences.

How did participants react to this technique? The first time we used it, 7 of the 40 participants reported during the debriefing that they had had substantial difficulties with words being ‘cut off’ (ha u’utut) or the sound ‘dying’ (måtai). Another 6 reported problems with a small subset of words, but generally found that the listening technique became easier as the experiment progressed (gi tutuhun kulan makkat, lao klumåklaru ‘at the beginning it was a little tricky, but it started to get clearer’). Finally, 27 reported little to no difficulty understanding.

It is hard to directly interpret these numbers. Language-processing experiments and debriefing sessions were novel experiences for virtually all of our participants. Even an ostensible perceptual report, like the words sounding “cut off,” could be the conflation of a number of factors, deriving not only from acoustic quality but also language experience and expectations about how they should respond in the debriefing.

Our experience suggests that, when all appropriate care is taken, SPL is a valuable technique for the experiment in the field. Yet we ended up only using it for two experiments, each with only 30–40 participants (the first experiment is reported in Wagers, Borja, and Chung 2015). The reason for this was, essentially, a hunch that—for some speakers—it would either be too taxing, too uninteresting, or too unfamiliar. In our first two studies on wh-agreement, we worked with about 200 unique speakers, spanning ages from 19 to 81 (median age: 43). We only administered SPL to those speakers who seemed familiar with computers, were younger, or worked in an office.

14.3.2 Preferential looking

For the remaining participants, we needed a technique that would seem less onerous. The technique that we developed is a variant of inter-modal preferential looking (Golinkoff et al. 1987) in which two different response categories were displayed onscreen while participants heard a sentence play over speakers (Wagers, Borja, and Chung 2015). The sentences followed the same anomaly design as in SPL, and the two response categories were simply ‘Good’ (Måolik) and ‘Bad’ (Ti måolik). We reasoned that participants would preferentially look at one of the response categories as evidence accumulated in its favor, which in our case would coincide with dependency formation and interpretation. We were aware that other researchers had used relatively simple technology (a hidden camera) to record and then code point of gaze to a manipulable, physical display using frame-by-frame annotation (Snedeker and Trueswell 2004). So we decided to simply pair our response collection software with a laptop-embedded webcam, and later have annotators align and code the webcam recordings with the simultaneously recorded audio. This was appealing, since we were wary that using an actual eye-tracking camera would be overly intrusive.2 We aimed to be able to set up, and tear down, quickly and not take more than 15 minutes of anyone’s time. We obtained explicit verbal consent to make the recordings. A handful of participants declined to be recorded (fewer than 5%), and we simply covered the camera with a sticker for those sessions. Our data ultimately showed that our idea, that participants would selectively look at response categories as the sentence wore on, was only weakly supported. A better indicator was participants’ looks away from the screen and down toward the keypad as they prepared to make a response. And while that measurement did end up being interpretable and consistent with the SPL data (Wagers, Borja, and Chung 2015: fig. 6), it came at a high cost.
We had effectively traded labor on the data collection end for labor on the analysis end. We trained several undergraduate RAs to do frame-by-frame annotation. Each video was multiply coded, and it was possible, even with this simple data, to achieve high inter-annotator agreement (comparable to Snedeker and Trueswell 2004: appendix D). Unfortunately only 45 of 72 original videos (62.5%) were codeable. There were two main reasons for this. Perhaps as a consequence of our deliberately impromptu interactions, participants often felt free to look away for extended periods of time, chat with someone across the table, or generally not pay attention to the screen. In addition, there were many instances when we conducted the experiment at a participant’s home or some other venue they had selected, and it was impossible to exercise adequate control over the illumination of the face. Generally, our experience with the wh-agreement study was mixed. The Chamorro instructions delivered at the beginning of the experiment emphasized relaxed, brief, and non-judgmental interactions. In doing so, it seems probable that we simultaneously limited the scientific value of some of our participants’ data (as evidenced by the attrition rate in our codeable videos), while also unexpectedly placing greater burdens on the “cleaner” self-paced listening data drawn from the more demographically biased sample. If those imperfect datasets had not pointed to the same conclusions, it is not clear we would have had anything to show for our efforts. On the other hand, our open recruitment standards brought in 112 participants, a number greater than we had imagined possible. 112 participants is a more-than-healthy sample for a lab-based psycholinguistics experiment, but, to put it in the perspective of the small language community, it represents at least 0.3% of the entire Chamorro-speaking population in the Mariana Islands (and nearly 3%, on the island of Rota)! The benefits of a large sample were not only statistical, but also social. In future experiments the percentage of participants who had taken part in one of our previous experiments was usually a minority and ranged from 25% to 60%. But many new participants “had heard” about the experiment from others and were interested in joining in.

2 We stress that those desiderata reflect trade-offs we chose to commit to in the Chamorro milieu, which was grounded in our hope that we could keep coming around for future projects and keep recruiting participants. For other kinds of research questions, or in other kinds of communities, the use of an actual portable eye-tracker may very well make better sense (as in e.g. Norcliffe et al. 2015).
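Agreement between two annotators who code the same video frames can be quantified in several ways. The sketch below computes Cohen's kappa, one widely used chance-corrected statistic. The text does not say which agreement measure was used for the webcam recordings, so the choice of kappa, the label set ("good", "bad", "away"), and the example codes are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two coders over the same frames."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of frames")
    n = len(coder_a)
    # Proportion of frames on which the two coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented frame codes: looks to 'good', to 'bad', or away from the screen.
coder_1 = ["good", "good", "bad", "away", "away", "bad", "good", "away"]
coder_2 = ["good", "good", "bad", "away", "bad", "bad", "good", "away"]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.814
```

Here the coders agree on 7 of 8 frames (0.875), chance agreement is 21/64, and kappa is therefore 35/43, or about 0.81.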

14.3.3 Tablets

After alternating between the SPL and modified preferential looking task for two experiments, we switched to a simpler, and ultimately more engaging, task: sentence-picture matching on a tablet computer. In a series of experiments on the comprehension of relative clauses (Borja, Chung, and Wagers 2016; Wagers, Borja, and Chung 2018), we asked participants to select one of two pictures that could depict an individual denoted by a relative clause. An example of a stimulus, translated into English, is ‘Push the star over to the kingfisher that the rooster is pecking.’ We would then depict two eventualities: one of a rooster pecking a kingfisher, and the other of a kingfisher pecking a rooster (see Figure 14.2). Previous researchers had used picture matching as an effective technique for studying relative clause parsing in populations for whom literacy could not be presupposed; see e.g. Caplan, Waters, and Hildebrandt (1997) for a study with adults who are aphasic, Clemens et al. (2015) for speakers of Ch’ol and
Q’anjob’al, and Grüter (2005) for children who are second-language learners of French or who have specific language impairment. The use of the tablet computers opened the way to an innovation: We could collect data not only about which picture participants selected but also how they selected it. We were inspired here by research using mouse-tracking (Freeman and Ambady 2010) as a stand-in for eye-tracking in the visual-world paradigm. In mouse-tracking, the inflection of the trajectory (how much it bends toward an alternative picture) has been used to gauge degree of competition between two response alternatives (Freeman, Dale, and Farmer 2011). In our design,3 we asked participants to move around a small icon called the puck. The puck was initially situated near the bottom of the screen, and needed to be moved to one of two pictures situated equidistant from it near the top of the screen. We were able to analyze not only when participants selected a picture, but also when they first touched the puck; moreover, we could visualize and quantify the trajectory from initial to final position. It had already been noted that swiping on a touchscreen is much more “ballistic” (Freeman and Ambady 2010) and thus there is less variability in the trajectories. Our research basically confirmed this observation, although we were able to find some clear competitive effects (reported primarily in Borja, Chung, and Wagers 2015). We also found that the point at which our participants initially touched the puck correlated strongly with their final selection time. The usefulness of this finding is that sentence-picture matching times are often quite long and variable, and this can severely limit the conclusions that can be drawn about incremental processing. For example, Clemens et al. (2015) report button-pressing times from sentence-picture matching experiments that range from an average of 6,000ms in an experiment with Russian speakers, to 3,100ms in an experiment with Ch’ol speakers, and 1,200ms in an experiment with Q’anjob’al speakers. Our Chamorro experiments, which use initial touch times instead of final selection times, routinely deliver results at the lower end of this spectrum, with correct answers ranging from 800 to 1,600ms (medians across conditions). The distribution of these reaction times is potentially more plausibly linked to comprehension processes at the final word in the sentence than would be the case for higher RTs. More research is required to substantiate this claim in greater detail. Setting aside the promise of collecting relatively short RTs, participants nearly uniformly found the tablets easy and intuitive to use and the task relatively pleasant to complete. Use of the tablets also opened up the opportunity for more substantive debriefings. In contrast to SPL or preferential looking, our participants could see potential applications for the tablet computers in Chamorro language teaching in the schools. This further dimension of engagement meant that our participants talked longer, and more concretely, about the task and the materials.
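How far a trajectory bends away from the direct path can be quantified in more than one way. The sketch below computes the maximum perpendicular deviation from the straight line joining the first and last sampled positions, a measure familiar from the mouse-tracking literature. The coordinate convention, the function name, and the sample points are assumptions made for illustration; the sketch does not reproduce the custom software used in the Chamorro experiments.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def max_deviation(points: List[Point]) -> float:
    """Signed perpendicular deviation of largest magnitude from the
    straight line that joins the first and last points of a trajectory.

    The sign tells which side of the direct path the trajectory bent
    toward; its interpretation depends on the coordinate system
    (on most touchscreens the y axis points downward).
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end positions coincide")
    # The 2D cross product gives the signed distance of each sample to the line.
    deviations = [(dx * (y - y0) - dy * (x - x0)) / length for x, y in points]
    return max(deviations, key=abs)


# Invented samples: the puck starts at (0, 0) and ends on a picture at
# (-3, 4), drifting first toward the other picture on the opposite side.
trajectory = [(0.0, 0.0), (0.5, 1.5), (0.0, 3.0), (-3.0, 4.0)]
print(round(max_deviation(trajectory), 3))  # -1.8
```

A perfectly straight swipe yields a deviation of 0, so larger magnitudes indicate more bending; given the "ballistic" character of swipes noted above, such values may show little variability on a touchscreen.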

3 We wrote custom software for this project in OpenSesame (Mathôt, Schreij, and Theeuwes 2012), an open-source experiment-building environment based in Python. It has an easy-to-use Android run-time module and we were able to collect all of our participants’ interactions with the tablet.

14.3.4 Debriefing

The debriefings turned out to be central to our experimental protocol. Participants were debriefed individually or, if several had finished the experimental task at the same time, in small groups, usually by our Chamorro team member together with one of the two other team members. The conversation began in Chamorro and usually continued in Chamorro, although some participants switched to English or alternated between the two languages. Some debriefing questions provided more information about the participant’s fluency in Chamorro (e.g. “Were there any words you didn’t recognize?”) or about which stimuli had worked or not worked (e.g. “Were there any pictures that didn’t make sense?”). Information about lexical variation is an example of something that routinely emerged in debriefings, and something that we could incorporate into our analysis or future design. We found that when speakers could be encouraged to talk on concrete topics, especially whether they recognized particular words, they would often segue into more subtle observations about word order, say, or ambiguity. Other debriefing questions invited participants to reflect on the experience of completing the task, what they thought its purpose was, and how it might be made more enjoyable (e.g. “What did you think of the experiment?”, “Did you like it?”, “Would you take another experiment like this?”). Still other questions were simply invitations to talk (e.g. “Which picture did you like the most?”). Although some debriefings were perfunctory, others evolved into extended conversations about the state of the Chamorro language, the need to preserve it, the purpose of our research, and how some of our materials could be used in the schools. These conversations strengthened our personal relationships with community members and encouraged some of them to return to participate in our later studies.

14.4 In the field

..........................................................................................................................

Many of the same sorts of issues that arise in the design phase can also arise in the field, when the experiment is actually being conducted. In addition to finding the best way, for a given time and place, to recruit participants, the team must be able to resolve cultural issues that are uncovered only as the experiment is being conducted, interact with participants in ways that strike all parties as ethical and respectful, and encourage participants’ interest in continuing to be involved in future research. One memorable illustration of this point comes from our first study, in which we had to reprogram and re-record parts of the experiment on the fly, when we learned that an instruction to “look at the cross in the center of the screen” could only be translated with the Chamorro word kilu’us. We quickly learned that the most prominent sense of this word, and its most immediate translation, was ‘crucifix,’ which elicited a strong reaction in the experimental context. In the end we simply rotated the graphic 45° and replaced the relevant word in the instruction with ekkis (the letter ‘x’).

14.4.1 Recruitment strategies

In our work in the CNMI we found that a “one size fits all” recruitment strategy would not work: There had to be multiple recruitment strategies that emphasized personal connections and were tailored for the cultural setting. Given that the CNMI is a multilingual, multicultural society in which fluent Chamorro speakers form a minority of the population, it would not have worked to simply post a sign-up sheet at the local public library. Instead, our Chamorro team member used his extensive network of personal connections to make contact with potential participants on his home island. On the other two islands, he found a local Chamorro who agreed to contact potential participants in the same way. Over and above this, whenever we arrived on an island, we paid visits to local officials (members of congress, mayors, administrators, school principals) to talk about our project and ask them to encourage their Chamorro-speaking staff to participate. We were fortunate enough to be interviewed from time to time on local radio and television programs, and were able to use those interviews to announce (in Chamorro) our interest in recruiting Chamorro-speaking participants. Finally, we did not hesitate to turn random social encounters into on-the-spot invitations to participate in the experiment (“Oh, would you like to take the experiment? We could do it right now...”). Similarly, we found that there had to be multiple types of venues where the experiment could be conducted. It occasionally worked for us to conduct the experiment at a participant’s home, but more often the venue was our Chamorro team member’s home, or a workplace, public library, government office, restaurant, or other more neutral setting. The inclusive character of Chamorro culture made it almost impossible to turn away potential participants, even those who were not native speakers of Chamorro.
So we minimally screened participants by asking them three or four questions in Chamorro (e.g. “How old are you?” “How old were you when you began speaking Chamorro?”). Everyone who could answer the questions in Chamorro was invited to serve as a participant, and all data files were initially included in the analysis. We set aside data files only on the basis of automatic criteria, such as average reaction times that were extremely long, or high error rates in answers to comprehension questions. Across several experimental studies we excluded, on average, 10% of participant data files.
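Automatic exclusion criteria of this kind can be expressed as a small screening function. The Python sketch below flags a participant whose mean reaction time or error rate exceeds a cutoff. The text does not report the thresholds that were used, so the cutoff values, the data layout, and the participant identifiers are hypothetical and serve only to illustrate the procedure.

```python
from statistics import mean
from typing import Dict, List, Tuple

Trial = Tuple[float, bool]  # (reaction time in ms, answered correctly?)


def flag_exclusions(participants: Dict[str, List[Trial]],
                    max_mean_rt_ms: float = 5000.0,   # hypothetical cutoff
                    max_error_rate: float = 0.4       # hypothetical cutoff
                    ) -> Dict[str, List[str]]:
    """Return the participants that meet an automatic exclusion criterion,
    together with the reason(s) for flagging them."""
    flagged: Dict[str, List[str]] = {}
    for pid, trials in participants.items():
        reasons = []
        if mean([rt for rt, _ in trials]) > max_mean_rt_ms:
            reasons.append("mean RT too long")
        error_rate = sum(not correct for _, correct in trials) / len(trials)
        if error_rate > max_error_rate:
            reasons.append("error rate too high")
        if reasons:
            flagged[pid] = reasons
    return flagged


# Invented data for three participants.
data = {
    "p01": [(900.0, True), (1200.0, True), (1500.0, False), (1000.0, True)],
    "p02": [(7000.0, True), (6500.0, True), (8000.0, True), (7200.0, False)],
    "p03": [(1000.0, False), (1100.0, False), (900.0, True), (1300.0, False)],
}
print(flag_exclusions(data))
# {'p02': ['mean RT too long'], 'p03': ['error rate too high']}
```

Fixing the criteria before looking at the condition-level results keeps the exclusion decision independent of the hypotheses being tested.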

14.4.2 Issues in delivering the experiment

The most sustained issue we confronted while conducting our experimental studies was that participants wanted to give their reactions as a group. Often when several participants were taking the experiment at the same time, they would want to consult with one another or compare their responses. In the debriefings, many participants
said they enjoyed the experiment but would prefer a task they could collaborate on. We never managed to design a collaborative experimental task, although we devoted much thought to the issue. It may be that we were hampered by our specific research questions, which dealt with the comprehension of syntax and morphology, and that other designs could more fruitfully take advantage of the desire for group responses (for example, if the experimental study involved production; see e.g. Brown-Schmidt and Konopka 2011). In several instances, we were able to debrief multiple participants together. That generated more concrete feedback and appeared to be gratifying for the participants. A different issue concerned the trade-off between informed consent and disseminating information about the experiment’s purpose. Before our research began, few if any Chamorros in the CNMI had participated in an experimental study. The IRB at our university agreed to waive the requirement for written informed consent on the grounds that the need to sign a consent form might frighten potential participants or discourage them from participating. We did, however, obtain positive oral consent, and participants were informed of their right to discontinue participation at any time. Participants nonetheless evinced some anxiety, and many expressed the belief that they were somehow being tested on their knowledge of Chamorro. Despite explicit statements to the contrary in the instructions, which were delivered in Chamorro (e.g. “This is not a test. There are no right or wrong answers”), this proved to be a difficult presupposition to defeat, and participants often asked for their score immediately afterwards. We suspect that several factors may have contributed to this presupposition. 
First and foremost, participants were often surprised to learn that a small language like Chamorro could be worthy of scientific study, and relatedly, that someone who had not studied their first language in school could be viewed as a competent speaker. This did not come as a surprise: One often heard Chamorro spoken of as a creole or a kind of corrupted version of Spanish (due to its rich lexical stratum of borrowings from the Spanish colonial era) or a language without a grammar. In the third year of our experimental studies in the CNMI, it came to our attention that although participants did not want to sign a consent form, they wanted to be informed of the purpose of our research and to be assured that their participation was anonymous. The issue of anonymity arose partly because the version of preferential looking we had used involved videotaping not just the eyes but the entire face. Because of these privacy concerns and the labor-intensive character of the initial stages of the data analysis, we did not use this version of preferential looking in our later experiments. We addressed the more general concerns about our research by developing an information sheet for each experiment which we distributed to participants as part of the debriefing. The information sheet, which was written in Chamorro and English, gave a brief description in lay terms of the purpose of our research and the particular experiment, stated that participation was anonymous, and provided contact information for the three researchers.

14.4.3 Sustaining the pool of participants

In experimental studies in Western-style universities, the pool of participants is typically regulated by a system that requires undergraduates to participate in order to complete particular courses or certain fields of study, or induces participation by offering extra course credit or sufficient financial compensation. In the field, no such system is in place. This means that an important task for a research team in the field is to figure out what causes community members to participate in an experiment and what would encourage them to continue to do so in the future. This is particularly important when the language has a small population of speakers and so the number of potential participants is intrinsically limited. It is made more challenging by the fact that cultures, societies, and communities clearly differ along this dimension. We were aware before we began our research that the financial compensation we could provide for our experiments would induce very few Chamorros to serve as participants. The money economy of the CNMI, together with the high cost of most goods, which are imported, made it impossible for us to consider paying community members at a rate that would justify their participation time. However, we also knew that flash drives were expensive, hard to obtain in the CNMI, prized by younger Chamorros, and much in demand. In our initial study we offered each participant a flash drive or else $10 as compensation. Flash drives, which were chosen by almost all participants, proved to be a great incentive, both because of their high storage capacity and because they were imprinted with the word “Chamorro.” Phone cards were also successful. But overall, flash drives were our most effective method of compensation, and we have returned to them again and again. Over and above this, people agreed to participate for intangible reasons.
The most important of these was their respect for our Chamorro team member, an educator and author who is known throughout the CNMI as a highly skilled, unusually generous community member who is committed to advancing indigenous languages and cultures. Many Chamorros who participated in more than one experiment were members of his extended family, co-workers, or former students, or had collaborated with him in community endeavors. The novelty of participating in a psycholinguistics experiment was another draw. Some people participated because they wanted to help us, because they believed that our work would advance the study of the Chamorro language, or because they were curious to be part of an event that was conducted in Chamorro and involved outsiders. Finally, it was helpful that two members of our team are involved in a long-term, community-based effort to revise the Chamorro–English dictionary. This meant that dictionary group members were particularly willing to serve as participants and to help recruit other participants by spreading the word about our work. It is harder to identify factors that discouraged people from serving as participants in our experiments. Length of time was clearly a potentially discouraging factor. During the instruction phase, participants were told the length of time that the task would probably take (10–20 minutes, depending on the experiment). While almost no one was
deterred by this, some people commented in the debriefing that the task seemed long, or observed that the task took longer than it had in previous experiments. Our local contacts advised us not to tell participants during the instruction phase how many stimuli would be presented, on the grounds that this number (e.g. 40) would be viewed as a disincentive. In fact, very few individuals opted out at the instruction phase, or began the experimental task but left before completing it. It was more common for individuals to show up at a testing site expecting to take the experiment, but leave when they learned there would be a 10–20-minute wait. Unsurprisingly, individuals were more willing to participate when they had been contacted by our Chamorro team member in advance and we could conduct the experiment in a setting they had chosen. The number, and engagement, of participants was more variable when the experiment was delivered in a more anonymous setting, such as a library or government office.

14.4.4 Community engagement

From the beginning we had planned to inform the community about the results of our research, and encourage their involvement, by giving public presentations on each of the three islands every few years. The idea was that these presentations would introduce the audience to the scientific study of language through Chamorro data that the community itself had provided. The first set of presentations, on community-based research on Chamorro, focused on the results of our first comprehension study and, separately, on the online parser and search engine that had been developed by Boris Harizanov for the revised Chamorro–English dictionary (Chung and Rechebei 2014). The discussion period touched briefly on psycholinguistics but then turned into a wide-ranging discussion of the need to preserve and maintain the Chamorro language. This audience response caused us to frame the second set of presentations to highlight what our research revealed about the changing nature of the Chamorro language. One unintended consequence of the inclusive approach to recruiting experimental participants was that our studies collected data from many different types of Chamorro speakers. Some interesting variation was revealed when the data were sorted by the participant’s age or home island. For instance, wh-questions in which the gap was a possessor were comprehended far more accurately by older generations than by younger generations; relative clauses in which the gap could be construed as a subject or an object were interpreted differently across islands. More surprising, to us, was the lack of age-related variation in the comprehension of certain types of relative clauses involving complex verb morphology (i.e. wh-agreement). Our second set of presentations described these findings and used them to point out that younger generations of Chamorro speakers know more about the language’s complex verb morphology than they are usually given credit for.
This set of presentations was well attended and highly successful on one island and minimally attended on the others. On all three islands, the community’s later interactions with us revealed that they found these presentations important even if they themselves had not been in the audience.

14.5 Conclusion

..........................................................................................................................

We purposely resist drawing too many conclusions from our particular experience working with the Chamorro community in Saipan, Tinian, and Rota. However, if there is one lesson we think will have broad applicability, it is that the involvement of community members at different levels is indispensable. Language-processing experiments are fundamentally unusual activities. In the context of a small language community, finding the right settings of cultural parameters will be a process of iterative discovery and adaptation. Porting an existing study directly into the language of interest without altering it will rarely succeed. The need to involve native speakers as stakeholders as well as experimental participants flows from the fact that experiments have substantial practical requirements that cannot be met responsibly by a researcher who does not fully control the language. For example, experiments require the generation and fine-tuning of large sets of high-quality materials which have the design features they are intended to have. In our case, it was even better that a native speaker had ownership over the project. Likewise, the need to recruit substantial numbers of participants meant developing a network of community members who had a positive, informed disposition toward what we hoped to accomplish. We had to become comfortable explaining and re-explaining the goals and outcomes of our project to sustain our presence in the community. In the end, this helped us not only to achieve greater focus on the scientific issues at stake but also to find new ways to view the language.

Acknowledgments

..........................................................................................................................

We are indebted to Manuel Flores Borja, the third member of our research team, for his many insights and his collaborative spirit. This work was supported in part by NSF Project BCS-1251429 at the University of California, Santa Cruz.

References

Anand, Pranav, Sandra Chung, and Matthew Wagers. 2011. Widening the net: Challenges for gathering linguistic data in the digital age. In NSF SBE 2020: Future research in the social, behavioral, and economic sciences. http://www.nsf.gov/sbe/sbe_2020/submission_detail.cfm?upld_id=121.
Bennett, Ryan, Máire Ní Chiosáin, Jaye Padgett, and Grant McGuire. 2018. An ultrasound study of Connemara Irish palatalization and velarization. Journal of the International Phonetic Association 48(3): 261–304. doi:10.1017/S0025100317000494.
Boland, Julie E., Michael K. Tanenhaus, Susan M. Garnsey, and Greg N. Carlson. 1995. Verb argument structure in parsing and interpretation: Evidence from wh-questions. Journal of Memory and Language 34: 774–806.
Borja, Manuel F., Sandra Chung, and Matthew Wagers. 2015. Filler-gap order and online licensing of grammatical relations: Evidence from Chamorro. Paper presented at the Eighty-Ninth Annual Meeting of the Linguistic Society of America, Portland, OR.
Borja, Manuel F., Sandra Chung, and Matthew Wagers. 2016. Constituent order and parser control processes in Chamorro. In Amber Camp, Yuko Otsuka, Claire Stabile, and Nozomi Tanaka (eds), AFLA 21: The Proceedings of the 21st Meeting of the Austronesian Formal Linguistics Association, 15–32. Canberra: Asia-Pacific Linguistics.
Brown-Schmidt, Sarah, and Agnieszka E. Konopka. 2011. Experimental approaches to referential domains and the on-line processing of referring expressions in unscripted conversations. Information 2: 302–326.
Caplan, David, Gloria S. Waters, and Nancy Hildebrandt. 1997. Determinants of sentence comprehension in aphasic patients in sentence-picture matching tasks. Journal of Speech, Language, and Hearing Research 40: 542–555.
Chung, Sandra, and Elizabeth D. Rechebei. 2014. Community engagement in the Revised Chamorro–English Dictionary. Dictionaries: Journal of the Dictionary Society of North America 35: 308–317.
Clemens, Lauren Eby, Jessica Coon, Pedro Mateo Pedro, Adam Milton Morgan, Maria Polinsky, Gabrielle Tandet, and Matthew Wagers. 2015. Ergativity and the complexity of extraction: A view from Mayan. Natural Language and Linguistic Theory 33: 417–467.
Ferreira, Fernanda, Michael D. Anes, and Matthew D. Horine. 1996. Exploring the use of prosody during language comprehension using the auditory moving window technique. Journal of Psycholinguistic Research 25: 273–290.
Ferreira, Fernanda, John M. Henderson, Michael D. Anes, Phillip A. Weeks, Jr., and David K. McFarlane. 1996. Effects of lexical frequency and syntactic complexity in spoken-language comprehension: Evidence from the auditory moving-window technique. Journal of Experimental Psychology: Learning, Memory, and Cognition 22: 324–355.
Freeman, Jonathan B., and Nalini Ambady. 2010. MouseTracker: Software for studying real-time mental processing using a computer mouse-tracking method. Behavior Research Methods 42: 226–241.
Freeman, Jonathan B., Rick Dale, and Thomas A. Farmer. 2011. Hand in motion reveals mind in motion. Frontiers in Psychology 2. doi:10.3389/fpsyg.2011.00059.
Golinkoff, Roberta M., Kathryn Hirsh-Pasek, Kathleen M. Cauley, and Laura Gordon. 1987. The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language 14: 23–45.
Grüter, Theres. 2005. Comprehension and production of French object clitics by child second language learners and children with specific language impairment. Applied Psycholinguistics 26: 363–391.
Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. Most people are not WEIRD. Nature 466: 29.
Jäger, Lena A., Felix Engelmann, and Shravan Vasishth. 2017. Similarity-based interference in sentence comprehension: Literature review and Bayesian meta-analysis. Journal of Memory and Language 94: 316–339.
Kidd, Evan, and Edith L. Bavin. 2007. Lexical and referential influences on on-line spoken language comprehension: A comparison of adults and primary-school-age children. First Language 27: 29–52.
Mathôt, Sebastiaan, Daniel Schreij, and Jan Theeuwes. 2012. OpenSesame: An open-source, graphical experiment builder for the social sciences. Behavior Research Methods 44: 314–324. Moreno-Martínez, Francisco Javier, and Pedro R. Montoro. 2012. An ecological alternative to Snodgrass and Vanderwart: 360 high quality colour images with norms for seven psycholinguistic variables. PloS One 7: e37527. Norcliffe, Elisabeth, Agnieszka E. Konopka, Penelope Brown, and Stephen C. Levinson. 2015. Word order affects the time course of sentence formulation in Tzeltal. Language, Cognition and Neuroscience 30: 1187–1208. Papadopoulou, Despina, Ianthi Tsimpli, and Nikos Amvrazis. 2014. Self-paced listening. In Jill Jergerski and Bill Van Patten (eds), Research methods in second language psycholinguistics, 50–68. New York: Routledge. Pynte, Joel. 1978. The intra-clausal syntactic processing of ambiguous sentences. In Willem J.M. Levelt and Giovanni B. Flores d’Arcais (eds), Studies in the perception of language, 109– 127. New York: John Wiley. Rosenthal, Robert, and Ralph L. Rosnow. 2009. Artifacts in behavioral research. Oxford: Oxford University Press. Snedeker, Jesse, and John C. Trueswell. 2004. The developing constraints on parsing decisions: The role of lexical-biases and referential scenes in child and adult sentence processing. Cognitive Psychology 49: 238–299. Sussman, Rachel S., and Julie Sedivy. 2003. The time-course of processing syntactic dependencies: Evidence from eye movements. Language and Cognitive Processes 18: 143–163. Traxler, Matthew J., and Martin J. Pickering. 1996. Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language 35: 454–475. Wagers, Matthew, Manuel F. Borja, and Sandra Chung. 2015. The real-time comprehension of WH-dependencies in a WH-agreement language. Language 91: 109–144. Wagers, Matthew, Manuel F. Borja, and Sandra Chung. 2018. 
Grammatical licensing and relative clause parsing in a flexible word-order language. Cognition 178: 207–221. Wagers, Matthew W., and Colin Phillips. 2014. Going the distance: Memory and control processes in active dependency construction. Quarterly Journal of Experimental Psychology 67: 1274–1304.

Annotated bibliography for Part III

..........................................................................................................................

The contributors in this section have compiled brief annotated bibliographies of resources for readers interested in learning how to use the methods discussed in the chapters. The annotated bibliographies are organized below by chapter. ***

Chapter 9. Self-paced reading (Masaya Yoshida)

..........................................................................................................................

Here I list and discuss studies which employ the Self-Paced Reading (SPR) paradigm and touch on issues in formal syntactic studies. Aoshima, Sachiko, Colin Phillips, and Amy Weinberg. 2004. Processing filler–gap dependencies in a head-final language. Journal of Memory and Language 51: 23–54. This study investigates the processing of filler–gap dependencies in Japanese. It reveals that the filler–gap dependencies in a head-final language like Japanese are processed similarly to those in a head-initial language like English. This study reports the “filled-gap effect” in Japanese and suggests a way to design an experiment incorporating the “filled-gap” paradigm in a head-final language. Badecker, William, and Kathleen Straub. 2002. The processing role of structural constraints on the interpretation of pronouns and anaphors. Journal of Experimental Psychology: Learning, Memory, and Cognition 28: 748–769. This is one of the early studies that tested the role of the structural constraints on anaphora resolution (i.e. Binding Conditions). They show that multiple constraints, such as structural and morphological constraints, interact during the online comprehension of sentences involving pronouns and anaphora. This study shows that the parser is sensitive to the congruency of the gender information of the antecedent and the pronoun, and that Gender Mismatch Effects can probe for the online anaphora resolution process. Blodgett, Allison, and Julie E. Boland. 2004. Differences in the timing of implausibility detection for recipient and instrument prepositional phrases. Journal of Psycholinguistic Research 33: 1–24. A Stop-Making-Sense experiment and an SPR experiment on argument and adjunct prepositional phrases (PPs) show that the parser is sensitive to subtle grammatical differences between arguments and adjuncts.

Boland, Julie E. 2005. Cognitive mechanisms and syntactic theory. In Anne Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 23–42. Hillsdale, NJ: Erlbaum. This work outlines the goals of the study of syntax and the study of sentence processing, and how the study of sentence processing can contribute to the study of syntax. Using the Lexical-Frequency effect as the probe, Boland demonstrates what the RT data can tell us about the argument/adjunct distinction. Crain, Stephen, and Janet Dean Fodor. 1985. How can grammars help parsers? In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky (eds), Natural language parsing: Psychological, computational, and theoretical perspectives, 94–128. Cambridge: Cambridge University Press. This is one of the earliest studies to utilize the word-by-word SPR paradigm to investigate the processing of filler–gap dependencies in English. They show that during the online processing of filler–gap dependencies, the parser utilizes grammatical information. Gibson, Edward, and Tessa Warren. 2004. Reading time evidence for intermediate linguistic structure in long-distance dependencies. Syntax 7: 55–78. This study provides reading time evidence for intermediate steps in the derivation of wh-filler–gap dependency constructions. It has been suggested in the syntax literature that in the course of wh-movement, the wh-element moves through intermediate landing sites. Employing storage cost effects as the probe, they provide RT evidence for such intermediate steps in the derivation of wh-filler–gap dependencies. Just, Marcel Adam, Patricia A. Carpenter, and Jacqueline D. Woolley. 1982. Paradigms and processes in reading comprehension. Journal of Experimental Psychology 111: 228–238. This study compares three different ways of presenting the stimuli in SPR paradigms. 
The three are cumulative presentation, where successive words are presented from left to right; moving-window presentation, where the previously read word disappears from the computer screen; and center presentation, where each word appears at the center of the screen. The authors discuss the advantages and limitations of each. Kazanina, Nina, Ellen Lau, Moti Lieberman, Masaya Yoshida, and Colin Phillips. 2007. The effect of syntactic constraints on the processing of backwards anaphora. Journal of Memory and Language 56: 384–409. Employing the Gender Mismatch paradigm, this study demonstrates that a grammatical condition, Binding Condition C, constrains the parser’s antecedent search process during the processing of backward anaphora constructions. They reveal that the active search mechanism that is employed in the processing of filler–gap dependencies is also employed in the processing of backward anaphora, which suggests a general mechanism of active dependency formation in the processing of long-distance dependency constructions. Lee, Ming-Wei. 2004. Another look at the role of empty categories in sentence processing (and grammar). Journal of Psycholinguistic Research 33: 51–73. Lee demonstrates that if an adjunct is inserted between the filler and the subject position, the Filled-Gap effect can be observed in the subject position, a point that was controversial in the previous literature. Phillips, Colin. 2006. The real-time status of island phenomena. Language 82: 795–823. Through the SPR paradigm, this study shows that the parser posits a gap inside a subject island in a wh-filler–gap dependency construction only when that gap is grammatically sanctioned, i.e. when it is licensed as a parasitic gap. Phillips discusses the implication of this finding and presents a major argument against processing-based accounts of island effects. Stowe, Laurie A. 1986. Parsing WH-constructions: evidence for on-line gap location. Language and Cognitive Processes 3: 227–245. This is the first study that reported the effect known as the Filled-Gap Effect (FGE). Stowe found that during the online processing of wh-filler–gap dependencies, a noun in the position where a gap is expected causes an RT slowdown, called the Filled-Gap Effect because the expected gap is filled by a noun. Furthermore, she shows that the FGE is not observed inside syntactic islands, suggesting that the parser is sensitive to syntactic constraints like islands. ***

Chapter 10. Eye-tracking and experimental syntax (Dave Kush and Brian Dillon)

..........................................................................................................................

Boland, J. E. 2005. Cognitive mechanisms and syntactic theory: Arguments against adjuncts in the lexicon. In A. E. Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 23–42. Hillsdale, NJ: Erlbaum UK. Boland presents important discussion concerning how to align cognitive mechanisms and syntactic theory, highlighting areas where experimental data may be used to bear on representational questions in syntactic theory.

Clifton, C., F. Ferreira, J. M. Henderson, A. W. Inhoff, S. P. Liversedge, E. D. Reichle, and E. R. Schotter. 2016. Eye movements in reading and information processing: Keith Rayner’s 40 year legacy. Journal of Memory and Language 86: 1–19. Much of what we now know about eye movements during reading can be traced to the work of Keith Rayner and colleagues. This is a survey of Rayner’s legacy, and how his research developed over his 40-year career. Clifton, C., A. Staub, and K. Rayner. 2007. Eye movements in reading words and sentences. In R. van Gompel, M. Fischer, W. Murray, and R. Hill (eds), Eye movements: A window on mind and brain, 341–372. Amsterdam: Elsevier. Clifton, Staub, and Rayner provide an extensive survey of the literature using eye-tracking to investigate linguistic processing, with special attention to how syntactic and semantic processing is reflected in eye movements. Duffy, S. A., R. K. Morris, and K. Rayner. 1988. Lexical ambiguity and fixation times in reading. Journal of Memory and Language 27: 429–446. A landmark early study on how lexical ambiguity is reflected in early fixation times on potentially ambiguous words. Ehrlich, S. F., and K. Rayner. 1981. Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior 20: 641–655. An important early study on how cloze probability affects early fixation behavior on words in context. Frazier, L., and K. Rayner. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14(2): 178–210. This seminal study is among the earliest evidence for incremental syntactic analysis in natural reading, and a critical early work on Garden-Path Theory. von der Malsburg, T., and B. Angele. 2017. False positives and other statistical errors in standard analyses of eye movements in reading. Journal of Memory and Language 94: 119–133. 
This study addresses important statistical issues concerning the analysis of eye-tracking-while-reading data, and offers practical suggestions for how to address these issues.

Phillips, C., and M. Wagers. 2007. Relating structure and time in linguistics and psycholinguistics. In M. G. Gaskell (ed.), Oxford handbook of psycholinguistics, 739–756. New York: Oxford University Press. Phillips and Wagers discuss research at the interface of linguistics and psychology, focusing on how and when experimental data can be used to address representational questions from syntactic theory. Rayner, K. 1998. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124: 372. An important summary and theoretical synthesis of the first two decades’ worth of research on eye movements during reading. Rayner, K., and S. A. Duffy. 1986. Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory and Cognition 14: 191–201. A formative early study on how various lexical factors, including word frequency, impact early reading measures. Rayner, K., S. C. Sereno, R. K. Morris, A. R. Schmauder, and C. Clifton, Jr. 1989. Eye movements and on-line language comprehension processes. Language and Cognitive Processes 4: SI21–SI49. A summary of early eye-tracking studies on syntactic and semantic processing, and discussion of how eye movements can be used to measure how these processes unfold incrementally during natural reading. Rayner, K., T. Warren, B. J. Juhasz, and S. P. Liversedge. 2004. The effect of plausibility on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition 30: 1290–1301. Along with Sturt (2003) and Pickering and Traxler (1996), an important seminal work investigating how anomalous linguistic input impacts reading behavior during normal reading. Reichle, E. D., A. Pollatsek, D. L. Fisher, and K. Rayner. 1998. Toward a model of eye movement control in reading. Psychological Review 105: 125–157. 
A critical theoretical synthesis article that formed the basis of the influential E-Z reader model of eye movements during reading.

Reichle, E. D., T. Warren, and K. McConnell. 2009. Using E-Z Reader to model the effects of higher-level language processing on eye movements during reading. Psychonomic Bulletin and Review 16: 1–21. An extension of the E-Z Reader model that captures how higher-order syntactic and semantic processes impact eye-movement behavior. Schotter, E. R., and K. Rayner. 2015. The work of the eyes during reading. In A. Pollatsek and R. Treiman (eds), The Oxford handbook of reading, 44–59. New York: Oxford University Press. A comprehensive summary and overview of the basic psychology of reading. Sturt, P. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48: 542–562. Sturt’s pivotal study investigated when binding constraints were applied in reflexive comprehension. By looking at how different reading-time measures were influenced by grammatically illicit distractors, Sturt was able to chart the time course of the application of Principle A of the Binding Theory. Traxler, M. J., and M. J. Pickering. 1996. Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language 35(3): 454–475. One of the most widely cited early studies using eye-tracking-while-reading to investigate the processing of filler–gap dependencies. Traxler and Pickering found that the “active filler” strategy applies in normal reading conditions, and that in those same conditions, island constraints are immediately applied to restrict the search for a gap. ***

Chapter 11. Speed–accuracy trade-off modeling and its interface with experimental syntax (Stephani Foraker, Ian Cunnings, and Andrea E. Martin)

..........................................................................................................................

Foraker, Stephani, and Brian McElree. 2007. The role of prominence in pronoun resolution: Active versus passive representations. Journal of Memory and Language 56(3): 357–383.
Demonstrates that content-addressable retrieval extends to coreference processing; indicates that linguistic focus does not lead to a cognitive focal attention state. Foraker, Stephani, and Brian McElree. 2011. Comprehension of linguistic dependencies: speed-accuracy tradeoff evidence for direct-access retrieval from memory. Language and Linguistics Compass 5(11): 764–783. A review of SAT evidence evaluating candidate memory operations involved in linguistic dependencies; shows support for content-addressable direct-access retrieval. Lewis, Richard L., and Shravan Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science 29: 375–419. A computational implementation of cue-based memory retrieval during language comprehension in the ACT-R framework. Martin, Andrea E. 2016. Language processing as cue integration: Grounding the psychology of language in perception and neurophysiology. Frontiers in Psychology 7: art. 120. A theoretical process model that tries to deconstruct principles like cue-based retrieval into neurophysiological concepts like cue integration and normalization. Martin, Andrea E., and Brian McElree. 2008. A content-addressable pointer mechanism underlies comprehension of verb-phrase ellipsis. Journal of Memory and Language 58: 879–906. Evidence that distance and complexity in sentences affects the likelihood of successful retrieval and interpretation, but not processing speed. Argues for a role of computational structures like pointers that obviate re-computation during dependency resolution. Martin, Andrea E., and Brian McElree. 2009. Memory operations that support language comprehension: Evidence from verb-phrase ellipsis. Journal of Experimental Psychology: Learning, Memory and Cognition 35: 1231–1239. Evidence for retroactive interference and against forward and backward serial search during ellipsis processing. Martin, Andrea E., Mante S. Nieuwland, and Manuel Carreiras. 2012. 
Event-related brain potentials index cue-based retrieval interference during sentence comprehension. Neuroimage 59(2): 1859–1869.
Electrophysiological evidence for cue-based retrieval interference in morphosyntactic agreement computation during noun-phrase ellipsis. Martin, Andrea E., Mante S. Nieuwland, and Manuel Carreiras. 2014. Agreement attraction during comprehension of grammatical sentences: ERP evidence from ellipsis. Brain and Language 135: 42–51. A conceptual replication of Martin et al. (2012). McElree, Brian. 1993. The locus of lexical preference effects in sentence comprehension: A time-course analysis. Journal of Memory and Language 32: 536–571. The first application of speed–accuracy trade-off methodology to sentence-processing phenomena. Employs multiple-response SAT. McElree, Brian. 2006. Accessing recent events. In B. H. Ross (ed.), The psychology of learning and motivation, 155–200. San Diego, CA: Academic Press. A comprehensive chapter on the SAT procedure and its conceptual origins in theories of recognition memory. McElree, Brian, and Barbara A. Dosher. 1989. Serial position and set size in short term memory: the time course of recognition. Journal of Experimental Psychology: General 118: 346–373. The benchmark, definitive work demonstrating that set size effects have their origin in representational factors and are not retrieval speed effects, contra the engagement of a serial scanning retrieval operation. McElree, Brian, and Barbara A. Dosher. 1993. Serial retrieval processes in the recovery of order information. Journal of Experimental Psychology: General 122: 291–315. Evidence that relational-order information, as needed to judge the relative recency between two items, engages a serial scanning retrieval operation, a search of memory. McElree, Brian, Stephani Foraker, and Lisbeth Dyer. 2003. Memory structures that subserve sentence comprehension. Journal of Memory and Language 48: 67–91. 
A comprehensive investigation of filler–gap dependencies that laid the groundwork of cue-based retrieval theory in sentence processing; comprehensive example of developing well-controlled stimulus conditions. McElree, Brian, and Teresa Griffith. 1998. Structural and lexical constraints on filling gaps during sentence comprehension: A time-course analysis. Journal of Experimental Psychology: Learning, Memory, and Cognition 24(2): 432–460.
One of the few demonstrations of clear intercept differences in sentence processing. Nairne, James S. 2002. Remembering over the short-term: The case against the standard model. Annual Review of Psychology 53(1): 53–81. A valuable introduction to the idea and power of cue diagnosticity in recognition memory. Öztekin, Ilke, and Brian McElree. 2007. Proactive interference slows recognition by eliminating fast assessments of familiarity. Journal of Memory and Language 57: 126–149. A benchmark investigation of how interference leads to forgetting. One of the few applications of dual-process SAT model fits in the literature. Öztekin, Ilke, and Brian McElree. 2010. Relationship between measures of working memory capacity and the time course of short-term memory retrieval and interference resolution. Journal of Experimental Psychology: Learning, Memory, and Cognition 36(2): 383. Offers a mechanistic explanation for the relationship between cue-based retrieval theory and forgetting, and suggests that what working-memory batteries might measure is susceptibility to overweighting familiarity information. Öztekin, Ilke, Lila Davachi, and Brian McElree. 2010. Are representations in working memory distinct from representations in long-term memory? Neural evidence in support of a single store. Psychological Science 21(8): 1123–1133. Evidence for a unitary cue-driven human memory system, rather than working memory and long-term memory being qualitatively different components. Reed, Adam V. 1973. Speed–accuracy trade-off in recognition memory. Science 181: 574–576. The first application of the speed–accuracy trade-off procedure. Van Dyke, Julie A., and Brian McElree. 2011. Cue-dependent interference in comprehension. Journal of Memory and Language 65: 247–263. Provides evidence that different retrieval cues may be weighted differently during language comprehension, and specifically that syntactic retrieval cues may be weighted more heavily than semantic cues.

Wickelgren, Wayne A. 1977. Speed–accuracy tradeoff and information processing dynamics. Acta Psychologica 41: 67–85. An early classic application of SAT modeling. ***

Chapter 12. Formal methods in experimental syntax (Tim Hunter)

..........................................................................................................................

Automata theory and parsing

Bar-Hillel, Y., M. Perles, and E. Shamir. 1961. On formal properties of simple phrase-structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 14: 143–172. Rabin, M. O., and D. Scott. 1959. Finite automata and their decision problems. IBM Journal of Research and Development 3(2): 114–125. These classic works on finite-state automata and context-free grammars established the foundations for many subsequent developments. Hopcroft, J. E., and J. D. Ullman. 1979. Introduction to automata theory, languages and computation. Reading, MA: Addison Wesley. Sipser, M. 1997. Introduction to the theory of computation. Boston, MA: PWS. These include chapters with textbook presentations of the key formal properties of finite-state automata and context-free grammars, from a general computer science perspective. Partee, B. H., A. ter Meulen, and R. E. Wall. 1990. Mathematical methods in linguistics. Dordrecht: Kluwer. Part E of this textbook covers much of the same ground as the textbooks by Hopcroft and Ullman and Sipser, but from a more specifically linguistic perspective. Grune, D., and C. J. H. Jacobs. 2008. Parsing techniques: A practical guide, 2nd edn. New York: Springer. A comprehensive textbook on a wide range of subjects in parsing. The parts most relevant to this handbook chapter are chapters 1–7, and chapter 13, which is a particularly clear presentation of parsing as intersection.

Information theory, surprisal, and entropy reduction

Cover, T. M., and J. A. Thomas. 2006. Elements of information theory, 2nd edn. Hoboken, NJ: Wiley. MacKay, D. J. C. 2003. Information theory, inference and learning algorithms. Cambridge: Cambridge University Press. Comprehensive textbooks on information theory. The early chapters more than cover the concepts that have been applied to human language processing. Hale, J. T. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics. ACL. Levy, R. 2005. Probabilistic models of word order and syntactic discontinuity. PhD thesis, Stanford University. Levy, R. 2008. Expectation-based syntactic comprehension. Cognition 106(3): 1126–1177. Foundational works on surprisal. Levy provides a slightly different perspective from Hale’s by relating surprisal to KL-divergence. Hale, J. T. 2003. Grammar, uncertainty and sentence processing. PhD thesis, Johns Hopkins University. Hale, J. T. 2006. Uncertainty about the rest of the sentence. Cognitive Science 30: 643–672. Foundational works on entropy reduction. These also introduce the application of parsing-as-intersection to human sentence comprehension. Hale, J. T. 2016. Information-theoretical complexity metrics. Language and Linguistics Compass 10(9): 397–412. This article reviews the conceptual underpinnings of both surprisal and entropy reduction, and the empirical support that has been gathered for each.

Prefixes, intersection grammars, and parsing-as-intersection

Lang, B. 1988. Parsing incomplete sentences. In Dénes Vargha (ed), Proceedings of the 12th International Conference on Computational Linguistics, 365–371. Stroudsburg, PA: Association for Computational Linguistics. Billot, S., and B. Lang. 1989. The structure of shared forests in ambiguous parsing. In Julia Hirschberg (ed), Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 143–151. Stroudsburg, PA: Association for Computational Linguistics.
These papers introduced the conception of parsing as intersection, and of intermediate parser states as (intersection) grammars. Jelinek, F., and J. D. Lafferty. 1991. Computation of the probability of initial substring generation by stochastic context-free grammars. Computational Linguistics 17(3): 315–323. Stolcke, A. 1995. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics 21(2): 167–201. These papers present methods for calculating the “prefix probabilities” on which surprisal depends, but without explicitly connecting this to the idea of intersection grammars. Goodman, J. T. 1998. Parsing inside-out. PhD thesis, Harvard University. Goodman, J. T. 1999. Semiring parsing. Computational Linguistics 25(4): 573–605. Goodman’s thesis (pp. 71–77) includes a method for calculating prefix probabilities that, via the deep connections brought out by the mathematical concept of a semiring (which are the focus of the shorter paper), generalizes to other questions one might wish to ask about the relationship between a prefix and a grammar. Nederhof, M. J., and G. Satta. 2003. Probabilistic parsing as intersection. In 8th International Workshop on Parsing Technologies, 137–148. Nancy: LORIA. This paper extends (quite straightforwardly) the conception of parsing as intersection to probabilistic grammars. Nederhof, M. J., and G. Satta. 2008a. Computing partition functions of PCFGs. Research on Language and Computation 6(2): 139–162. This paper presents techniques for efficiently calculating the “partition function” of a probabilistic CFG; this determines the total probability (or more correctly, total weight) assigned by the grammar across all derivations. Applied to the result of intersecting a grammar with a prefix FSA, this provides the “prefix probabilities” that Jelinek and Lafferty and Stolcke calculated by other means. Nederhof, M. J., and G. Satta. 2008b. Probabilistic parsing. In G. Bel-Enguix, M. D. 
Jiménez-López, and C. Martín-Vide (eds), New developments in formal languages and applications, 229–258. New York: Springer. This tutorial-style paper provides a succinct presentation of a number of key techniques from the other papers listed here: intersection grammars (section 3), calculation of the partition function (section 2), and Jelinek and Lafferty’s method for calculating prefix probabilities (section 7). Normalization of an intersection grammar, which is not mentioned in this handbook chapter but is a prerequisite for entropy calculations, is described in section 4.

Parsing models and human memory requirements

Abney, S. P., and M. Johnson. 1991. Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research 20(3): 233–250. Resnik, P. 1992. Left-corner parsing and psychological plausibility. In Alexander Gelbukh (ed), Proceedings of the Fourteenth International Conference on Computational Linguistics, 191–197. Berlin: Springer-Verlag. These two papers established left-corner parsing as an explanation for the generalization that humans have difficulty with center-embedding structures, but not with left-embedding or right-embedding structures. Kanazawa, M. 2016. Formal grammar: An introduction. https://makotokanazawa.ws.hosei.ac.jp/FormalGrammar/index.html. Chapter 2 of these online lecture notes presents top-down, bottom-up, and left-corner parsing in a manner that closely relates them to the way derivations unfold in context-free grammars. Wolf, F., and E. Gibson. 2006. Parsing: Overview. In Lynn Nadel (ed), Encyclopedia of cognitive science. Oxford: Wiley. This review article also presents top-down, bottom-up, and left-corner parsing, but in a more procedural, algorithmic style than Kanazawa’s presentation or my own.

Formalizing minimalist grammars

Stabler, E. P. 1997. Derivational minimalism. In C. Retoré (ed), Logical aspects of computational linguistics, 68–95. Berlin: Springer. Stabler, E. P. 2011. Computational perspectives on minimalism. In C. Boeckx (ed), The Oxford handbook of linguistic minimalism, 616–641. Oxford: Oxford University Press. These papers introduce a formalization of minimalist syntax, core properties of which have allowed many techniques originally developed for context-free grammars to be applied to it. Clark, A. 2014. An introduction to multiple context-free grammars for linguists. https://alexc17.github.io/static/pdfs/mcfgsforlinguists.pdf. These notes provide an accessible introduction to multiple context-free grammars (MCFGs), a formalism that provides an important stepping-stone for understanding the abstract properties that minimalist grammars share with context-free grammars.

Hunter, T., and C. Dyer. 2013. Distributions on Minimalist Grammar derivations. In András Kornai and Marco Kuhlmann (eds), Proceedings of the 13th Meeting on the Mathematics of Language, 1–11. Stroudsburg, PA: Association for Computational Linguistics. This paper also details the connection between minimalist grammars and multiple context-free grammars (MCFGs), and in particular how this connection allows probabilities to be added to minimalist grammars in a manner that’s familiar from CFGs. Stabler, E. P. 2013. Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science 5(3): 611–633. Hunter, T. 2019. Left-corner parsing of minimalist grammars. In B. Berwick and E. Stabler (eds), Minimalist parsing, 125–158. Oxford: Oxford University Press. Stanojevic, M., and E. Stabler. 2018. A sound and complete left-corner parser for Minimalist Grammars. In Marco Idiart, Alessandro Lenci, Thierry Poibeau, and Aline Villavicencio (eds), Proceedings of the Eighth Workshop on Cognitive Aspects of Computational Language Learning and Processing, 65–74. Stroudsburg, PA: Association for Computational Linguistics. Hunter, T., M. Stanojevic, and E. Stabler. 2019. The active-filler strategy in a move-eager left-corner Minimalist Grammar parser. In Emmanuele Chersoni, Cassandra Jacobs, Alessandro Lenci, Tal Linzen, Laurent Prévot, and Enrico Santus (eds), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 1–10. Stroudsburg, PA: Association for Computational Linguistics. These papers adapt automata-theoretic parsing models to minimalist grammars. Stabler’s first paper adapts the transparent top-down method. The others describe more left-corner-based alternatives, motivated by psycholinguistic findings. Joshi, A. 1985. How much context-sensitivity is necessary for characterizing structural descriptions? In D. Dowty, L. Karttunen, and A. 
Zwicky (eds), Natural language processing: Theoretical, computational and psychological perspectives, 206–250. New York: Cambridge University Press. Joshi, A. K., Shanker, K. V., and D. Weir. 1990. The convergence of mildly contextsensitive grammar formalisms. University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-90-01. These papers provide early discussion of “mild context-sensitivity,” a property that Stabler’s formalized minimalist grammars share with other kinds of grammars such as combinatory categorial grammars, tree-adjoining grammars, and multiple context-free grammars (see above). This convergence is a surprising result, and is related to the way techniques from context-free grammars can be adapted to these formalisms. See also Stabler’s 2011 paper for this.

***

Chapter 13. Investigating syntactic structure and processing in the auditory modality (Mara Breen and Katy Carlson)

..........................................................................................................................

Arnold, Jennifer E. 2008. THE BACON not the bacon: How children and adults understand accented and unaccented noun phrases. Cognition 108(1): 69–99. A set of accent and visual-world studies including both adults and children that shows a now-common method of studying prosody in comprehension and illustrates the effects on reference of accented vs. unaccented nouns. Has good control of prosody and extends the usual findings to acquisition. Breen, Mara. 2014. Empirical investigations of the role of implicit prosody in sentence processing. Language and Linguistics Compass 8: 37–50. Describes evidence from the past 15 years that prosodic structures are generated during silent reading and can influence syntactic ambiguity resolution. Cole, Jennifer. 2015. Prosody in context: A review. Language, Cognition and Neuroscience, 30(1–2): 1–31. Review of recent empirical work in prosody with a particular focus on the role of context, including syntactic context as well as discourse and situational contexts. Elfner, Emily. 2018. The syntax–prosody interface: Current theoretical approaches and outstanding questions. Linguistics Vanguard 4(1): 1–14. This paper provides an overview of theoretical advances in research on the syntax–prosody interface. Current theoretical work is situated historically, and is framed in light of the central research questions in the field, including (a) to what extent prosodic structure can be used as a diagnostic for syntactic constituent structure, (b) the significance of recursion in prosodic theory, and (c) how mismatches between syntactic and prosodic constituent structure are modeled in different approaches to the syntax–prosody interface. Ferreira, Fernanda. 2007. Prosody and performance in language production. Language and Cognitive Processes 22(8): 1151–1177. 
A review of recent algorithmic approaches to predicting phrasing and a proposal of how both prosodic structure (based on syntactic structure) and processing constraints influence word durations and silence.

Ito, Kiwako, and Shari R. Speer. 2008. Anticipatory effects of intonation: Eye movements during instructed visual search. Journal of Memory and Language 58: 541–573. Contains a set of studies using an expanded visual world compared to most similar research, with multiple contrasts in adjectives and nouns that could be referred to. A good example of an ecologically valid task that allows interesting questions about accents and accent type to be asked. Jun, Sun-Ah (ed.). 2006. Prosodic typology: The phonology of intonation and phrasing. Oxford: Oxford University Press on Demand. A useful book surveying prosodic systems of a variety of languages. Great background for those not working in English or interested in the range of possible prosodic systems. Kjelgaard, Margaret M., and Shari R. Speer. 1999. Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity. Journal of Memory and Language 40(2): 153–194. An important work which demonstrated that there are multiple prosodic contours available for a particular syntactic structure, as well as ones that mismatch in the location of prosodic and syntactic boundaries. Explores both facilitative and interference effects from prosody, and shows helpful, neutral, and conflicting prosodic contours. Ladd, D. Robert. 2008. Intonational phonology, 2nd edn. Cambridge: Cambridge University Press. A thorough textbook on prosody, its phonetics, and its use. Reviews different prosodic theories and prosodic issues and provides ample references to different relevant literatures. Schafer, Amy J., Shari R. Speer, Paul Warren, and S. David White. 2000. Intonational disambiguation in sentence production and comprehension. Journal of Psycholinguistic Research 29(2): 169–182. Exemplifies an interactive production experiment design in which naïve participants play a cooperative game in pairs. The rules are designed to elicit a set of pseudospontaneous productions with specific temporary ambiguities. 
Results demonstrate that speakers use prosodic phrasing to disambiguate syntactic structure in closure sentences. Selkirk, Elisabeth. 1995. Sentence prosody: Intonation, stress, and phrasing. In J. Goldsmith (ed.), The handbook of phonological theory, 231–261. Oxford: Blackwell.
A quick overview of Selkirk’s prosodic theories, including focus projection, which have been central to the analysis and understanding of English prosody. Snedeker, Jesse, and John Trueswell. 2003. Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language 48(1): 103–130. This paper reports three empirical studies exploring the production and perception of prosodic phrasing, exemplifying the visual-world paradigm. Results demonstrate that speakers use prosody to disambiguate syntactic structure only when they are aware of potential ambiguity in the sentence, and that listeners can use such cues to resolve ambiguity online. Wagner, Michael. 2015. Phonological evidence in syntax. In Tibor Kiss and Artemis Alexiadou (eds), Syntax: Theory and analysis. An International handbook, 1154–1198. Berlin: Mouton de Gruyter. An excellent discussion of a variety of hypotheses about how components of prosody (prosodic phrasing, prosodic prominence, and intonational tunes) relate to syntactic structure. Wagner, Michael, and Duane G. Watson. 2010. Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes 25: 905–945. A recent review of prosody research in phrasing and prominence (comparable to the Cutler, Dahan, and van Donselaar 1997 review). Reviews the acoustic features that signal the presence of prosodic boundaries and accents and how these cues are interpreted by listeners. Also surveys modeling work on syntax and prosody and discusses debate about whether prominence and phrasing are categorical. Watson, Duane, and Edward Gibson. 2004. The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes 19(6): 713–755. Describes three speech production experiments designed to assess four algorithmic models of the relationship between syntactic structure and prosodic phrasing. 
Results demonstrate that phrase boundaries are determined by the size of preceding and following syntactic constituents. ***

Chapter 14. Language-processing experiments in the field (Matthew Wagers and Sandra Chung)

..........................................................................................................................

Christianson, Kiel, and Fernanda Ferreira. 2005. Conceptual accessibility and sentence production in a free word order language (Odawa). Cognition 98: 105–135. This study describes some of the earliest field psycholinguistics experiments we are aware of on an indigenous language of North America. The same kinds of issues we discussed in our chapter were examined here and in the first author’s 2002 Michigan State dissertation. It is instructive to read about the particular resource and cultural challenges that arose in the Odawa context, and the variety of ways in which these researchers’ perceptions and responses to those challenges both aligned with and diverged from our own experience. Rosenthal, Robert, and Ralph L. Rosnow (eds) 2009. Artifacts in behavioral research: Robert Rosenthal and Ralph L. Rosnow’s classic books. Oxford: Oxford University Press. This single volume is a recent reissue of three classic social science books, either edited or written by Robert Rosenthal and Ralph L. Rosnow. The topics range broadly across the theme of artifacts in behavioral research. The specific sense of “artifact” intended here is as Rosenthal and Rosnow defined it: “the portion of the complexity of human behavior which can be attributed to the social nature of behavioral research” (from the Preface to Artifacts in behavioral research, p. 4). It is a highly relevant read for all linguists who do experimental research. The various chapters cover, in terms both precise and holistic, such issues as test apprehension, the volunteer subject effect, experimenter expectations, and demand characteristics (i.e. what the design and execution of our experiment telegraphs to the participants about how they should perform—a concept due to Martin Orne, and explicated in his chapter in Book One: Artifact in Behavioral Research).

Part IV

NEUROLINGUISTIC METHODS IN SYNTACTIC THEORY

Chapter 15

Electrophysiological methods

Jon Sprouse and Diogo Almeida

15.1 Introduction

..........................................................................................................................

The potential value of electrophysiological measures like electroencephalography (EEG) and magnetoencephalography (MEG) for experimental syntax is easy to see: If one believes that cognition is mediated by electrical activity in the cortex, and if one believes that syntactic theories are ultimately theories of cognition, then any method that yields information about electrical activity in the cortex potentially provides information about syntactic theories. That said, we believe it is fair to say that, to date, EEG and MEG have played much larger roles in the construction and evaluation of theories of language processing (including sentence processing) than they have in the construction and evaluation of theories of grammar. One reason for this is that it is relatively difficult to construct a linking theory between syntactic theories and sentence-processing theories that yields predictions that are dispositive of syntactic theories. This is because the relationship between syntactic theories and sentence-processing theories is many-to-many. A second reason for this is specific to extracranial electrophysiological measures like scalp EEG and MEG: These methods only detect a subset of cortical activity. It is eminently possible that the subset of cortical activity that these methods can detect is not the subset of cortical activity that is relevant to syntactic theories. Despite these challenges, we believe that experimental syntacticians (who are so interested) should nonetheless consider exploring electrophysiology as a potential data source for syntactic theories. To be clear, when it comes to syntax, electrophysiology is a high-risk/high-reward method. But as long as it is undertaken after careful consideration, the risk may be worth it. To that end, we would like to use this chapter to provide a foundation for thinking about ways to incorporate EEG and MEG into the experimental syntax toolkit. 
In this chapter we will focus (nearly) exclusively on extracranial electrophysiological techniques, namely scalp EEG, which measures electrical potentials on the scalp that
are generated in the cortex, and MEG, which measures magnetic fields that are generated by electrical activity in the cortex. This choice is purely pragmatic: extracranial methods are far more likely to be available to experimental syntacticians than intracranial methods, which can only be used as part of a medical procedure performed by a licensed neurosurgeon (though we do discuss one intracranial study of syntax in Section 15.6). In Section 15.2, we begin by introducing the three content areas necessary to understand both the potential of EEG as a method in cognitive science and its specific challenges: the basics of electricity, the neurobiology underlying scalp potentials, and the math necessary to extract useful information from the EEG signal (with citations here and in the annotated bibliography for more information). In Sections 15.3 and 15.4, we briefly review some of the ERP and time-frequency results (respectively) in the sentence-processing literature, in order to provide a foundation for thinking about ways to leverage EEG results in service of syntactic theories. In Section 15.5, we introduce MEG, and review its similarities to and differences from EEG. MEG is substantially more costly than EEG, so it is likely to be less available to experimental syntacticians; nonetheless, as we will see in Section 15.6, some important recent results were collected using MEG, so it is worth reviewing the foundation of MEG to better understand those results. In Section 15.6, we briefly discuss several recent studies that have attempted to directly link syntactic theories with sentence processing and electrophysiological measures. Our hope is that these brief discussions will provide a starting point for experimental syntacticians who are interested in pursuing their own studies. Section 15.7 concludes.

15.2 The three core content areas for understanding EEG

..........................................................................................................................

Three core content areas are necessary for understanding the potential of EEG and MEG as tools for experimental syntax: electricity, neurobiology, and wave mathematics. In this section we will provide a brief introduction to the core concepts of each, with pointers to more information for interested readers (see also the annotated bibliography, and of course, the two most prominent textbooks for EEG in cognitive science: Luck 2014 and Cohen 2014). We have three goals for this section: (i) to introduce readers to the concepts necessary to begin to work with EEG and MEG, (ii) to illustrate the promise that EEG and MEG hold for studying syntax as a component of cognition, and (iii) to explain the basis of the challenges that are unique to EEG and MEG in the study of syntax.

15.2.1 Electricity

EEG measures changes over time in electrical potential on the scalp. A typical EEG system consists of a set of electrodes embedded in a nylon cap or net (typically a power
of two: 16, 32, 64, 128, or 256), and a specialized amplifier for recording the very small electrical potentials that the electrodes detect on the scalp. Electrical potential is the potential for electrical current (electrons) to flow between two locations. It is thus important to remember that the value that is recorded by the system at any given scalp location is ultimately the potential for electrons to flow between that location and a reference location. In an electrical circuit that reference location is the ground, but EEG amplifiers allow researchers to specify any reference location that they like by calculating the difference of the potential between one electrode and the ground and the potential between the chosen reference electrode and the ground. If the symbol A is the target electrode, G is ground, and R is the chosen reference electrode, then the potential recorded at A is given by the equation (A − G) − (R − G), which reduces to (A − R), thus underscoring the point that the potential recorded at electrode A is actually the potential for current to flow between A and R (and in the process this equation also removes electrical noise that may have been in the ground circuit). The unit of measure for electrical potential is the volt, with scalp EEG potentials typically in the microvolt range (i.e., one millionth of a volt, or 10⁻⁶ V). In order to measure changes over time, EEG amplifiers must take discrete measurements, called samples, many times per second. Modern EEG amplifiers are able to take anywhere from 250 to 100,000 measurements per second (for a sampling rate of 250Hz to 100,000Hz), though the typical sampling rate for language experiments is 250Hz to 1000Hz. In human-made electrical systems, there are two types of electrical current: direct current (DC), wherein the electrons flow in one direction at all times, and alternating current (AC), wherein the electron flow alternates between two directions periodically. 
The biological electrical current of the brain cannot be as easily categorized as human-made systems; however, because cortical activity is oscillatory, it is often easiest to think of EEG as an AC signal. The fact that EEG is ultimately the measurement of an (AC-like) electrical signal has a number of practical implications for designing EEG experiments, such as the choice of reference location, the choice of sampling rate, and the effect of resistance/impedance on the measurements. These practical considerations are far beyond the scope of this chapter (see Luck 2014 for discussion and advice). The upshot is that it is critical for EEG researchers to invest some time in understanding the fundamentals of electrical signals and electrical circuits as these fundamentals do have consequences for EEG experimental design and analysis (see the annotated bibliography for resources).
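The re-referencing arithmetic and the notion of a sampling rate described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rereference` and `sample_interval_ms` and all of the microvolt values are made up for the example and do not come from any particular amplifier or software package.

```python
def rereference(target_vs_ground, reference_vs_ground):
    """Return (A - G) - (R - G), which equals A - R, sample by sample."""
    return [a - r for a, r in zip(target_vs_ground, reference_vs_ground)]


def sample_interval_ms(sampling_rate_hz):
    """Time between consecutive samples, in milliseconds."""
    return 1000.0 / sampling_rate_hz


# Hypothetical potentials (in microvolts) at the target electrode A, the
# reference electrode R, and the ground G, for three consecutive samples.
a = [12.0, 15.0, 9.0]
r = [2.0, 3.0, 1.0]
g = [0.5, -0.75, 0.25]  # fluctuations in the ground circuit (illustrative)

# What would be recorded against the ground: A - G and R - G.
a_vs_g = [ai - gi for ai, gi in zip(a, g)]
r_vs_g = [ri - gi for ri, gi in zip(r, g)]

# Re-referencing removes the ground term and leaves A - R.
a_vs_r = rereference(a_vs_g, r_vs_g)

# At 250Hz there is one sample every 4 ms; at 1000Hz, one every 1 ms.
interval_250 = sample_interval_ms(250)
interval_1000 = sample_interval_ms(1000)
```

Because the ground term appears in both recorded potentials, it drops out of the difference, which is the sense in which the re-referenced value reflects only the relationship between A and R.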

15.2.2 Biology

The neurobiological source of the electrical activity that (scalp) EEG measures is relatively straightforward to state: It is the summed activity of the synchronized postsynaptic potentials of large populations of spatially aligned pyramidal cells in the cerebral cortex. In this paragraph we will attempt to unpack this statement, albeit to a very cursory level of description. First, the cells that are generating the activity detected by scalp EEG are exclusively within the cerebral cortex. The cerebral cortex is the outer layer of
the cerebrum, covering the gyri and sulci, typically 2mm to 3mm thick, and often called “gray matter” because of its coloring. The cortex is generally considered to be a critical component of cognition, but crucially, it is not the only component. Second, the cells that are generating this activity are pyramidal cells. These are a relatively common type of neuron in the cortex, but again crucially, they are not the only type of neuron in the cortex. Third, (scalp) EEG can only detect electrical activity generated by large populations of pyramidal cells. This is because the relatively small potentials generated by these neurons dissipate quickly as they travel through the biological matter that intervenes between the cortex and the EEG electrode on the scalp. Fourth, these large populations of pyramidal cells must be spatially aligned. This is because each pyramidal cell is an electrical dipole: One end of the cell is positively charged and the other is negatively charged. If the cells are aligned, the electrical charges will sum, creating a larger electrical current that can reach the electrode on the scalp. If they are not aligned, the charges will not sum (as diametrically opposed dipoles will cancel each other), and the current will not be large enough to reach the electrode on the scalp. Fifth, the potentials that are measured by (scalp) EEG are postsynaptic potentials. Synapses are the junctions between two (or more) neurons, such that there is a presynaptic neuron and a postsynaptic neuron. Crucially, we can distinguish between the potential that is generated within the presynaptic neuron, called an action potential, and the potential that is (chemically) transmitted to the postsynaptic neuron, called the postsynaptic potential. Postsynaptic potentials can either be excitatory, which brings the postsynaptic neuron closer to generating an action potential, or inhibitory, which brings the postsynaptic neuron farther from generating an action potential. 
The structure and function of neurons and synapses is far beyond the scope of this chapter (see the annotated bibliography to this chapter). The critical point here is that (scalp) EEG measures postsynaptic potentials, not action potentials. Sixth, these postsynaptic potentials must be synchronized, that is, correlated in time. This synchrony can be phase-locked or not, as we will discuss in more detail below. (Synchrony can also be local or long-range; here, though, we focus on local synchrony as it is a necessary requirement to detect a signal at the scalp.) Finally, the pattern of activity that (scalp) EEG measures is the summation of all of the detectable activity emanating from the cortex. This is because the biological matter of the brain, skull, and scalp is conductive. Any given generator of EEG activity inside the cortex can potentially send some amount of signal to all of the electrodes on the scalp. If there are multiple such generators, which is likely given the complex nature of cognition, these signals will sum.

Though the preceding paragraph is only a relatively shallow review of the biology underlying EEG, even at this level of detail, it is a bit easier to see the source of both the potential benefits and challenges for the use of EEG in experimental syntax. The benefit of EEG is that it is a relatively direct measure of a signal that we think is relevant for cognition: Postsynaptic potentials are the messages that neurons send to each other. The challenges facing EEG come from the fact that those potentials are being recorded from the scalp. The summation of electrical signals through biological matter means that EEG is not an ideal tool for localizing the source of the EEG activity.

The fact that only large populations of spatially aligned pyramidal cells can create the kind of activity that is detectable at the scalp means that EEG can only detect a subset of cortical activity. This makes EEG a potentially risky method for cognitive theories, as it is logically possible that the activity that is most relevant will not be detected. It also means that scalp EEG is likely a poor tool for low-level neuroscience (a point made forcefully in Luck 2014), as it simply cannot detect any of the activities of neurons other than postsynaptic potentials (action potentials, ion gates, etc). Despite these challenges, there have been quite a number of results obtained using scalp EEG that appear relevant for cognitive theories of language processing (see Section 15.3), and therefore we believe it is valuable to at least try to leverage EEG in the domain of experimental syntax. That said, we do recommend that potential EEG researchers invest some time in understanding the neurobiology underlying EEG recordings in order to better appreciate the relationship of the signal to theories of the neurobiology of cognition.
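The spatial alignment requirement discussed in this section can be made concrete with a toy calculation. The sketch below is a deliberate simplification, not a biophysical model: it assumes that each cell is a unit-strength dipole lying in a two-dimensional plane, and the cell count, the random seed, and the function name `net_dipole_magnitude` are all hypothetical choices made for illustration.

```python
import math
import random


def net_dipole_magnitude(angles):
    """Magnitude of the vector sum of unit dipoles with the given orientations (radians)."""
    x = sum(math.cos(theta) for theta in angles)
    y = sum(math.sin(theta) for theta in angles)
    return math.hypot(x, y)


rng = random.Random(0)  # fixed seed so the toy example is repeatable
n_cells = 10_000        # illustrative population size

# Every dipole points the same way: the contributions add up.
aligned = [0.0] * n_cells

# Dipoles point in random directions: the contributions largely cancel.
scattered = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_cells)]

# Half the dipoles point one way and half the opposite way: full cancellation.
opposed = [0.0, math.pi] * (n_cells // 2)

aligned_sum = net_dipole_magnitude(aligned)      # equal to n_cells
scattered_sum = net_dipole_magnitude(scattered)  # a small fraction of n_cells
opposed_sum = net_dipole_magnitude(opposed)      # effectively zero
```

Under these assumptions, only the aligned population produces a summed signal that grows in proportion to the number of cells, which is the intuition behind the claim that scalp EEG detects only a subset of cortical activity.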

15.2.3 Wave mathematics

There are two reasons that wave mathematics is fundamental to EEG analysis. The first is purely mathematical: EEG is a time-varying signal. As Fourier first demonstrated, any time-varying signal can be represented as the sum of some number of constituent sine waves of different frequencies. In other words, there are two equivalent representations for EEG signals: one in the time–amplitude domain, expressing the change in electrical potential over time, and one in the frequency domain, expressing the frequency and amplitude of the sine waves required to compose that signal. This equivalence makes available a number of advanced signal processing techniques from wave mathematics (the Fourier transform, the convolution theorem, etc). The second reason that wave mathematics is fundamental to EEG analysis is biological: The electrical activity generated by the cortex appears to be oscillatory in nature. This suggests that EEG signals may in fact be fundamentally composed of the sum of some number of neuronal oscillations, which in turn can be mathematically represented as sine waves, and analyzed using the tools of wave mathematics. Wave mathematics is a broad field, spanning trigonometry, calculus, and linear algebra. We provide some references to begin learning the most relevant concepts for EEG analysis in the annotated bibliography.

There are two fundamental analysis techniques in common use in the EEG literature: the event-related potential technique (ERP) and the time-frequency decomposition technique (TF). Here we will outline each technique, and then provide a brief discussion of their similarities and differences.

We begin with the ERP technique, as it is by far the most common analysis technique for EEG experiments in the sentence-processing literature. The first step of the ERP technique is to define each trial in the experiment as a time window of EEG activity around a critical stimulus (such as a word). 
These time windows are called epochs. The critical stimulus is designated time point 0 for convenience, with epochs typically ranging from 100ms or 200ms before the stimulus (–100ms or –200ms) to 1000ms or more after the stimulus. The second step of
the ERP technique is to cut out these epochs from the continuous EEG recording, and then organize them according to experimental condition. The third step is to align all of the epochs within each condition by time point 0. The final step is to average (using the arithmetic mean) across all of the aligned epochs within each condition. The resulting averaged wave is called an ERP. The math underlying the ERP technique is simple (time-aligning and averaging), but powerful. The averaging procedure ensures that only features of the EEG signal that are time-locked to the stimulus (arising at the same time in each epoch) and phase-locked to the stimulus (peaks align with peaks, troughs with troughs, at each frequency) will survive. Any activity that is either not-time-locked or not-phase-locked or both will be diminished in the averaging. If this activity is randomly distributed in time and phase-locking, it will approach zero as the number of epochs increases. This leads to the fundamental idea of the ERP technique: The signal that the technique returns is the time-locked and phase-locked activity; the noise that the technique discards is the not-time-locked and not-phase-locked activity. Time-locked and phase-locked activity is sometimes called evoked activity. The ERP technique will be appropriate for any theories that make predictions about evoked activity. (We have purposely left out the processing steps that are necessary to eliminate other sources of noise from the EEG data, such as filtering, artifact detection, and baseline correction, so that we can focus on the underlying logic. See Luck 2014 for a comprehensive introduction to these steps.)

The TF technique is less common in the sentence-processing literature than the ERP technique, but has been growing in popularity over the past 20 years or so. The first step of the TF technique is also to define epochs around a critical stimulus. 
The only difference is that the epochs in the TF technique often extend further back in time from the critical stimulus (typically –400ms or –500ms) and sometimes extend further forward in time as well. This is because there is a direct relationship between the size of the epoch and the frequencies that can be reliably detected in the epoch (with lower frequencies requiring longer epochs to be reliably detected). The second step is also identical to the ERP technique: Cut these epochs out of the continuous EEG recording. The third step is where the two methods diverge: perform time-frequency decomposition on each epoch independently. There are a number of techniques for time-frequency decomposition, such as Morlet wavelets, multitapers, and the short-time fast Fourier transform. All of these are beyond the scope of this chapter (but see Cohen 2014 for a comprehensive introduction). The critical idea is that each of these methods attempts to decompose the EEG in the epoch into a combination of sine waves of different frequencies, each with an amplitude (how much of the sine wave is present, sometimes reported as power, which is amplitude squared) and phase (the location in the cycle, reported as radians, as in angles in the unit circle) that varies over time. The fourth step is to align all of the epochs within each condition by time point 0. The final step is to average (using the arithmetic mean) across all of the aligned epochs, respecting the distinction between frequencies. This averaging procedure means that time-locked features of the epochs will be maintained, and not-time-locked features will be diminished. However, there is no phase-locking effect in this averaging, because power and phase have been separated
into distinct quantities by the TF decomposition technique. Furthermore, because both amplitude and phase measures are always greater than or equal to 0, there is no way for these measures to cancel themselves out in an averaging procedure. This leads to the fundamental idea of the TF technique: The signal that the technique returns is time-locked activity at each frequency (with no commitment to phase-locking); the noise that the technique discards is the not-time-locked activity. Time-locked but not phase-locked activity is sometimes called induced activity. The TF technique by default returns the sum of induced and evoked activity, but can be modified in various ways to subtract out the evoked activity, leaving only the induced activity behind.

There are a number of ways of thinking about the relationship between the ERP and TF techniques. One important similarity is that they both leverage time-locking to distinguish cognitive processes that are likely related to our experiment from all of the other processes that the brain might be deploying at any given moment. The primary difference between the two techniques centers on the role of phase-locking. This difference is not simply mathematical. The physiological events that give rise to phase-locked (evoked) activity and the physiological events that give rise to not-phase-locked (induced) activity are likely distinct. For example, one possible source of evoked activity is a phase-reset in the firing of a population of neurons; and one possible source of induced activity is a sustained oscillation of a population of neurons. We say “one possible source” because the physiological source(s) of evoked activity is an open area of research (see Mazaheri and Jensen 2010 for a discussion of competing theories). 
The physiological source of results in the TF technique can never be stated with certainty from the EEG signal alone, as the mathematical decomposition methods will always return a representation that is composed of a series of sine waves for any time-varying signal, regardless of the source of the time-varying signal. For these reasons, the two techniques are complementary, and both should probably be in the EEG researcher’s toolkit.
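The contrast between the two averaging procedures can be illustrated with a small simulation. What follows is a minimal sketch under strong simplifying assumptions rather than a real analysis pipeline: the epochs are noise-free sine waves, the 10Hz frequency, the 250Hz sampling rate, the one-second epoch, and the number of epochs are arbitrary, and the decomposition is a single Fourier projection over the whole epoch (so it has no time resolution, unlike wavelets or multitapers). All function names are hypothetical.

```python
import cmath
import math
import random

SAMPLING_RATE_HZ = 250  # illustrative sampling rate
N_SAMPLES = 250         # one-second epochs at that rate
FREQ_HZ = 10.0          # arbitrary oscillation frequency


def make_epoch(phase):
    """One noise-free epoch: a unit-amplitude sine wave with the given phase."""
    return [math.sin(2 * math.pi * FREQ_HZ * n / SAMPLING_RATE_HZ + phase)
            for n in range(N_SAMPLES)]


def erp(epochs):
    """ERP-style average: arithmetic mean across aligned epochs, sample by sample."""
    return [sum(column) / len(epochs) for column in zip(*epochs)]


def amplitude_at(epoch, freq_hz):
    """Amplitude of one frequency in one epoch, from a single Fourier projection."""
    coeff = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLING_RATE_HZ)
                for n, x in enumerate(epoch))
    return 2 * abs(coeff) / len(epoch)


def mean_power(epochs, freq_hz):
    """TF-style average: decompose each epoch first, then average power (amplitude squared)."""
    return sum(amplitude_at(e, freq_hz) ** 2 for e in epochs) / len(epochs)


def remove_evoked(epochs):
    """Subtract the ERP from every epoch, leaving only not-phase-locked activity."""
    average = erp(epochs)
    return [[x - m for x, m in zip(e, average)] for e in epochs]


def peak(wave):
    """Largest absolute value in a waveform."""
    return max(abs(x) for x in wave)


rng = random.Random(1)  # fixed seed so the example is repeatable
n_epochs = 200

# Phase-locked ("evoked") activity: same phase in every epoch.
evoked = [make_epoch(0.0) for _ in range(n_epochs)]
# Not-phase-locked ("induced") activity: random phase in every epoch.
induced = [make_epoch(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_epochs)]

erp_evoked_peak = peak(erp(evoked))      # oscillation survives averaging
erp_induced_peak = peak(erp(induced))    # oscillation is largely averaged away
power_evoked = mean_power(evoked, FREQ_HZ)    # present in the power average
power_induced = mean_power(induced, FREQ_HZ)  # equally present in the power average
power_evoked_residual = mean_power(remove_evoked(evoked), FREQ_HZ)  # near zero
```

In this toy setting the averaged waveform retains the oscillation only when its phase is the same in every epoch, whereas average power is the same in both conditions; subtracting the ERP before decomposition is one simple way to set aside the evoked part.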

15.3 A brief review of common ERP effects during sentence processing

In this section we will review some of the well-established ERP effects in the linguistics and psycholinguistics literature. Our goal is twofold. First, any experimental syntactician interested in EEG must first become familiar with the work that has come before, in order to build on, and ultimately extend, that work. Therefore we wish to provide a starting point for building that knowledge. Second, one way to leverage EEG in service of syntactic theories is to use existing EEG effects, either ERP (this section) or TF (next section), to draw inferences about syntactic theories. This is not a simple task, as it requires first linking the EEG effects to underlying cognitive operations, and then linking those cognitive operations, via a sentence-processing theory, to syntactic theories. We

jon sprouse and diogo almeida

will discuss this challenge in more detail in Section 15.6. In this section and the next, we wish to provide a foundation for the first step of this process, linking EEG effects to cognitive operations, by reviewing the literature on the functional interpretation of existing EEG effects. In this section we briefly review five ERPs that experimental syntacticians are likely to encounter in the sentence processing literature (for a broader review of ERP components, see Kappenman and Luck 2012, The Oxford Handbook of Event-Related Potential Components). For each we provide a brief review of the eliciting conditions of the ERP and the functional interpretation of the ERP.

15.3.1 The Early Left Anterior Negativity (ELAN)

The ELAN is a negative-going deflection that peaks in a relatively early processing window (between 100ms and 250ms post-stimulus onset), and is maximal over left anterior electrodes. The ELAN was first reported by Neville et al. (1991) in response to the transposition of a noun and a preposition in sentences like those in (1). Here and throughout, the critical words for the analysis will be in bold.

(1) grammatical control: The boys heard Joe’s stories about Africa.
    transposition:       *The boys heard Joe’s about stories Africa.

A similar effect was reported by Friederici et al. (1993) in response to German sentences like the one in (2):

(2) a. *Das Baby wurde im gefüttert.
    b. The baby was in-the fed.

The ELAN appears to be elicited by phrase structure violations, as in both of these cases, the critical word (in boldface) cannot appear in that position. The ELAN has been elicited in a number of languages beyond English and German, including Mandarin Chinese (e.g. Ye et al. 2006), Dutch (e.g. Hagoort et al. 2003), French (e.g. Isel et al. 2007), Japanese (e.g. Mueller et al. 2005), and Spanish (e.g. Hinojosa et al. 2003). The ELAN is not affected by task (Hahne and Friederici 2002), by the probability of the violation in the experiment (Hahne and Friederici 1999), or by the frequency of a disambiguated structure (Ainsworth-Darnell, Shulman, and Boland 1998; Friederici et al. 1996). Taken as a whole, these results suggest that the ELAN is a very specific response to phrase structure violations, and not simply a response to difficult or unlikely structures. Recent research on the ELAN has focused on the extremely early latency of the response. The 100–250ms post-stimulus window is remarkably early for syntactic analysis (and error diagnosis) given that estimates of lexical access often center around 200ms post-stimulus (Allopenna, Magnuson, and Tanenhaus 1998; van Petten, Coulson, Rubin, Plante, and Parks 1999). Four approaches have been offered to explain the early latency of the ELAN. Friederici (1995) adopts a parsing model in which the earliest stage considers only word category information (e.g. Frazier 1978), thus limiting the

electrophysiological methods

number of processes that need to be performed in the earliest time window. Lau et al. (2006) suggest that the early latency can be explained if the parser has predicted the properties of the critical word prior to encountering it, such that many of the syntactic features are in some sense “pre-parsed”. Dikker et al. (2009) propose the “sensory ELAN hypothesis,” in which the ELAN indexes a processing stage prior to lexical access that occurs in the sensory cortices (visual or auditory cortex). This prelexical processing is based purely on the form typicality of the words—i.e. the sensory cortices use the probability of certain phonetic forms to determine if the incoming string is most likely a noun, verb, etc. Finally, Steinhauer and Drury (2012) argue that at least some of the ELAN effects reported in the literature may be artifacts that arise when comparing two conditions that do not match in the word preceding the critical word. Though there is no consensus on the source of the ELAN, what is clear from this debate is that any adequate functional interpretation must take (i) the earliness of the response and (ii) the specificity of the response into consideration.

15.3.2 The Left Anterior Negativity (LAN)

While the LAN and the ELAN share many properties (i.e. they are both negative-going deflections that occur primarily over left anterior electrode sites), they differ along two critical dimensions. First, the LAN occurs in a slightly later time window, usually 300–500ms post-stimulus onset, which eliminates many of the complex timing questions associated with the ELAN. Second, the LAN has been elicited by a broad array of (morpho)syntactic violations, such as agreement violations (Coulson et al. 1998; Gunter et al. 1997; Münte et al. 1997; Kaan 2002; Osterhout and Mobley 1995), case violations (Münte and Heinze 1994), phrase structure violations (Friederici, Hahne, and Mecklinger 1996; Hagoort, Wassenaar, and Brown 2003), island constraint violations (Kluender and Kutas 1993b), and even garden-path sentences (Kaan and Swaab 2003). The LAN has also been elicited during the processing of long-distance dependencies such as wh-movement, at both the displaced wh-word and the unambiguous cue for the gap location (Kluender and Kutas 1993a; Phillips, Kazanina, and Abada 2005), and during a memory period after processing grammatical semantically reversible sentences (Meltzer and Braun 2013). One common functional interpretation of the LAN is as an index of morphosyntactic agreement violations (e.g. Molinaro et al. 2011). However, two empirical concerns about the LAN have led to (at least partially) competing interpretations. The first concern is that the LAN shows quite a bit of variability across experiments, in some cases not appearing at all for violations that are unambiguously morphosyntactic in nature. Tanner and van Hell (2014) argue that it is possible that the LAN is an illusion that could arise if the participants in a sample are really from two distinct populations: one that shows an N400 to the violation and one that shows a P600 to the violation. 
As we will see when we review the N400 and P600 below, the timing and scalp distributions of N400s and P600s could potentially give rise to an illusory response with the timing

and scalp distribution of a LAN if averaged together. The second concern is that the LAN also arises for conditions that do not obviously involve increased morphosyntactic processing, but instead likely involve increased working memory processing, like garden-path sentences, grammatical wh-dependencies, and semantically reversible sentences. This suggests that the morphosyntactic processing interpretation of the LAN cannot be the whole story (see also Martín-Loeches et al. 2005 for some evidence that morphosyntactic LANs and working memory LANs may have different scalp topographies).

15.3.3 The N400

The N400 is a negative-going deflection that is generally largest over centro-parietal electrode sites, and tends to occur 300–500ms post-stimulus onset (with a peak amplitude occurring at 400ms). The N400 was first found by Kutas and Hillyard (1980) when they presented participants with sentences that ended with unexpected words. They compared a baseline sentence with semantically congruent endings (3a) to sentences with semantically incongruent endings (3b) and sentences with endings that were incongruent due to the physical properties of the stimulus such as words written in all-capital letters (3c):

(3) a. semantically congruent:   I spread the warm bread with butter.
    b. semantically incongruent: I spread the warm bread with socks.
    c. physically incongruent:   I spread the warm bread with BUTTER.

Kutas and Hillyard (1980) observed a larger N400 for (3b) compared to (3a), and a larger P300 (also known as a P3b) to (3c) compared to (3a). This qualitative difference in the responses to (3b) versus (3c) suggests that the N400 is specifically related to semantic processes rather than general error detection. In the decades since its discovery, the N400 has been elicited by a broad array of linguistic and non-linguistic stimuli, with the common pattern being that they are all meaningful in some way: spoken words, written words, signed words, pseudowords, acronyms, environmental sounds, faces, and gestures (see Kutas and Federmeier 2011 for a comprehensive review, and Lau et al. 2008 for a review of the brain networks underlying the N400). There are (at least) two dominant functional interpretations of the N400, though neither appears to capture all of the N400 results in the literature. The first is that the N400 indexes the difficulty of semantic integration (Hagoort 2008; Osterhout and Holcomb 1992; Brown and Hagoort 1993). Under this view, increases in N400 amplitude reflect the increased difficulty of integrating incongruent, unexpected, or semantically unrelated words into the preceding context. The second view is that the N400 indexes processes related to the activation of semantic features in the mental lexicon. Under this view, decreases in N400 amplitude reflect the ease of activation (or pre-activation) for congruent, predicted, and semantically related words (Federmeier and Kutas 1999; Kutas and Federmeier 2000; Lau et al. 2009). The N400 is by far the most studied ERP

effect in the language-processing literature, and the pattern of results is both subtle and complex. Though it is tempting to set the N400 aside as a “semantic” effect, and therefore irrelevant to theories of syntax, the fact that the N400 touches upon issues like memory, predictability, and the mental lexicon means that it is a potentially valuable tool for probing theories of sentence processing.

15.3.4 The P600

The P600 (alternatively the “syntactic positive shift”) is a positive-going deflection that is generally largest over centro-parietal electrode sites and tends to occur 500–800ms post-stimulus onset (although there is a good deal of variability in the latency in the ERP literature). Like the LAN, the P600 has been reported for a broad array of syntactic violations, in many cases co-occurring with a preceding LAN. P600s have been elicited to phrase structure violations (Hagoort, Brown, and Groothusen 1993; Friederici et al. 1993; Hahne and Friederici 1999; Friederici and Frisch 2000; Osterhout and Holcomb 1992), agreement violations (Hagoort, Brown, and Groothusen 1993; Kaan 2002), syntactic garden-paths (Friederici et al. 1996; Kaan and Swaab 2003; Osterhout, Holcomb, and Swinney 1994), and island violations (McKinnon 1996). P600s have also been elicited by the processing of grammatical sentences with particularly complex syntactic properties, such as ambiguous structures (Frisch, Schlesewsky, Saddy, and Alpermann 2002), wh-movement (Fiebach, Schlesewsky, and Friederici 2002; Kaan, Harris, Gibson, and Holcomb 2000; Phillips, Kazanina, and Abada 2005), and unexpected theta-role assignments (Kim and Osterhout 2005; Kuperberg, Sitnikova, Caplan, and Holcomb 2003; van Herten, Kolk, and Chwilla 2005; Kuperberg 2007; Bornkessel-Schlesewsky and Schlesewsky 2008; Stroud and Phillips 2012). There are two central questions about the functional interpretation of the P600 in the literature. The first is whether there is a single functional interpretation that can cover the full range of P600 effects. Syntactic violations and ambiguous grammatical sentences could potentially be unified under an interpretation of the P600 as syntactic reanalysis (though questions remain as to how many distinct reanalysis operations there are). 
However, the fact that P600s arise in wh-dependencies (at the verb or preposition that selects the filler) is hard to capture under syntactic reanalysis, suggesting that perhaps the P600 is a family of responses with potentially distinct functional interpretations (see Gouvea et al. 2010 for a comparison of several types of P600s in a single experiment). The second question is whether the P600s that arise to ungrammatical sentences are specific to language or are a domain-general response to unexpected stimuli. One possibility is that these P600s are a temporally delayed version of the P300 (or P3b), which is a well-known domain-general response to unexpected stimuli (Coulson et al. 1998; Osterhout and Hagoort 1999; Sassenhagen et al. 2014).

15.3.5 Sustained Anterior Negativity (SAN)

Sustained anterior negativities are negative-going deflections that tend to appear over anterior electrode sites (though not exclusively), and tend to last for several words during the processing of wh-dependencies and relative-clause dependencies (King and Kutas 1995; Fiebach et al. 2002; Phillips et al. 2005). SANs have typically been interpreted as an index of working-memory usage because (i) they appear during dependency processing, which almost certainly involves working memory, and (ii) similar anterior negativities have been reported for working-memory tasks outside of sentence processing (e.g. Ruchkin et al. 1990). SANs have been less studied relative to some of the other ERPs that arise during sentence processing, partly because they appear to be related to a system outside of the grammar (working memory), and partly because the relationship between SANs and working-memory theories is currently unclear. The original functional interpretation of SANs was as an index of working-memory load due to maintaining the filler in working memory (King and Kutas 1995); but more recent models of working memory in sentence processing have eliminated maintenance costs from the theory in favor of retrieval and interference costs (e.g. McElree et al. 2003; Lewis and Vasishth 2005). Nonetheless, SANs potentially provide an index for working-memory effects (of some sort) during sentence processing, and therefore may be a useful tool for experimental syntacticians interested in dependencies.

15.4 A brief review of time-frequency effects during sentence processing

Similar to the previous section, our goal in this section is to provide a brief review of the time-frequency literature that experimental syntacticians interested in EEG can use as a starting point for new research. Time-frequency results are typically analyzed in frequency bands. These bands group together frequencies that tend to covary in various domains of cognition. The bands are named after Greek letters. The precise boundaries of the bands can vary by one or two Hertz from study to study, so here we simply provide an example of range boundaries, rather than a hard and fast definition: delta (1Hz–3Hz), theta (4Hz–7Hz), alpha (8Hz–12Hz), lower beta (13Hz–20Hz), upper beta (21Hz–30Hz), and gamma (>30Hz). Because the TF technique yields both power and phase information at each frequency or frequency band, at each electrode site, through time, there are a number of measures that can be derived, such as local changes in power (at one or more electrode sites), correlated fluctuations in power or phase in one frequency band across spatially distinct electrode sites (called coherence), and correlated fluctuations in power or phase across distinct frequency bands (called cross-frequency coupling). Compared to the ERP technique, time-frequency decomposition is a relatively new, and rapidly growing, segment of the sentence processing literature. There

are far fewer established debates about the functional interpretation of TF results than there are about the functional interpretation of ERP results. Therefore in this section we review a targeted selection of results. This is not intended as a comprehensive review (for a more comprehensive review, see Bastiaansen et al. 2013). To that end, we have chosen to organize the results according to the types of linguistic manipulations in these studies, subdivided by the types of ERPs that they typically elicit: syntactic violations that lead to P600s, syntactic violations that lead to ELANs, semantic violations that lead to N400s, and dependencies that lead to SANs. Our hope is that this will allow readers to explore their own hypotheses about the functional interpretation of the various TF results (and maybe spur ideas for future research).
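As a small illustration of how the example band boundaries given above might be used in practice, the following sketch maps frequencies to band names and averages a power spectrum within each band. It uses only the Python standard library. The function names are hypothetical, and the choice to snap fractional frequencies to the nearest whole hertz is an assumption made here to handle values that fall between the integer boundaries; individual studies draw these boundaries differently.

```python
import math

# Example boundaries from the text (Hz, inclusive); studies differ by a hertz or two.
# Gamma (>30Hz) is encoded as 31 and up, after snapping to whole hertz.
BANDS = [
    ("delta", 1, 3),
    ("theta", 4, 7),
    ("alpha", 8, 12),
    ("lower beta", 13, 20),
    ("upper beta", 21, 30),
    ("gamma", 31, math.inf),
]

def band_of(freq_hz):
    """Return the band name for a frequency, or None if it falls below delta."""
    nearest = math.floor(freq_hz + 0.5)  # assumption: snap to the nearest whole hertz
    for name, low, high in BANDS:
        if low <= nearest <= high:
            return name
    return None

def mean_power_by_band(freqs, power):
    """Average a power spectrum within each band (bands with no samples are omitted)."""
    sums, counts = {}, {}
    for f, p in zip(freqs, power):
        name = band_of(f)
        if name is not None:
            sums[name] = sums.get(name, 0.0) + p
            counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}
```

For example, `band_of(10)` returns `"alpha"`, and a spectrum sampled at 9, 10, 11, and 40 Hz is summarized as one alpha value and one gamma value.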

15.4.1 Syntactic violations that lead to P600s: increase in power in theta, decrease in alpha and beta

Bastiaansen, van Berkum, and Hagoort (2002) performed time-frequency decomposition on the EEG response to two types of syntactic violations in Dutch (relative to a grammatical control sentence): a gender agreement violation between an adjective and noun, and a number agreement violation between an adjective and noun. Examples are given in (4), where com means common gender, and neu means neuter gender.

(4) a. grammatical control:
       Ik zag een donkere  wolk      aan de  horizon
       I  saw a   dark.com cloud.com on  the horizon
    b. gender violation:
       Ik zag een donker   wolk      aan de  horizon
       I  saw a   dark.neu cloud.com on  the horizon
    c. number violation:
       Ik zag enkele  donkere wolk     aan de  horizon
       I  saw several dark    cloud.sg on  the horizon

Both violations lead to a P600 response in the ERP domain (with relatively similar latency and scalp distribution). Both violations lead to an increase in power in the theta band 300–500ms post violation, with the gender violation showing a right-anterior scalp distribution, and the number violation showing a left-anterior scalp distribution. These results are potentially interesting in two ways. First, the latency of the time-frequency response (300–500ms) differs from the latency of the ERP response (500–800ms). Second, the scalp distributions of the time-frequency responses vary by violation, whereas the scalp distributions of the ERP responses do not. This result was one of the first demonstrating that time-frequency analysis can yield different information than ERP analysis (this is obviously true in principle, but Bastiaansen et al. demonstrated that it was also true in practice). Davidson and Indefrey (2007) investigated the ERP and time-frequency responses to both number and phrase structure violations in English. Examples are given in (5) below.

(5) a. number violation:           The children walks to school
    b. phrase structure violation: Max’s proof the of theorem

They found P600 responses to both violations in the ERP domain, as expected given the previous literature, and a decrease in power in both the alpha and beta frequency bands in the time-frequency domain. Crucially, they found a relationship between the ERP and time-frequency responses: Participants who showed a larger P600 effect also showed a larger decrease in alpha and beta power. Davidson and Indefrey characterize this as an inverse relationship: an increase in the ERP correlates with a decrease in time-frequency power.

15.4.2 Syntactic violations that lead to ELANs: disruption in beta, decreases in power in alpha and gamma

Bastiaansen, Magyari, and Hagoort (2010) investigated the time-frequency response to word category violations compared to grammatical sentences and random reorderings of the words in the sentence. Examples are given in (6).

(6) a. grammatical control:
       Janneke kreeg de  zegen    bij de  rivier.
       Janneke got   the blessing at  the river
    b. word category violation:
       Janneke kreeg de  zegenen  bij de  rivier.
       Janneke got   the to-bless at  the river
    c. random order:
       De  de  Janneke zegen    kreeg rivier bij
       The the Janneke blessing got   river  at

Bastiaansen et al. used MEG for this particular study, so there is no ERP effect to report; that said, word category violations typically yield an ELAN effect in EEG studies. In the time-frequency domain, Bastiaansen et al. found a linear increase in power in the lower beta frequency band to grammatical sentences (i.e. the power in this band increased with each successive word in the sentence). They found that the word category violation disrupted this linear increase in beta (creating what could look like a decrease in beta, similar to the Davidson and Indefrey 2007 result), in addition to creating a decrease in power in the alpha band (again, similar to Davidson and Indefrey) and in the gamma band. There was no linear increase in beta in response to the random order condition. They also report a linear increase in the theta band to all three conditions that does not appear to be disrupted by either the syntactic violation or the random ordering of words.

15.4.3 Semantic violations that lead to an N400: increase in power in theta and gamma

Hagoort, Hald, Bastiaansen, and Petersson (2004) report the time-frequency response to two types of meaning-related violations in Dutch: violations of semantic

congruency (e.g. trains cannot be sour), and violations of arbitrary facts about the world (e.g. trains in the Netherlands are not white). Though the stimuli were in Dutch, Hagoort et al. only report the English translations:

(7) a. grammatical control: The Dutch trains are yellow and very crowded.
    b. world violation:     The Dutch trains are white and very crowded.
    c. semantic violation:  The Dutch trains are sour and very crowded.

Both of these violations yield N400s in the ERP domain, with a slightly larger N400 for the semantic violation than the world knowledge violation. In the time-frequency domain, both violations yield an increase in power in the theta band, with a slightly larger increase to the semantic violation. The world knowledge violation also yielded an increase in power in the gamma band. Though this result suggests a direct relationship between the size of the N400 effect and the size of the increase in power in the theta band, it appears as though Hagoort et al. report total power, that is, a power analysis that includes both phase-locked (i.e. ERP) and non-phase-locked activity. Thus the larger power increase for semantic violations could simply reflect the larger ERP effect. The Davidson and Indefrey (2007) study mentioned previously computed induced power only, that is, non-phase-locked power. Davidson and Indefrey also investigated violations of semantic congruency, and found an inverse relationship between the size of the N400 effect in the ERP domain and the size of the (induced only) theta band increase in the time-frequency domain: Larger N400 effects lead to smaller (induced only) theta band increases. Wang, Zhu, and Bastiaansen (2012) further elaborated the investigation of semantic congruency violations by partially crossing congruency and predictability, leading to three conditions: congruent and predictable, congruent and unpredictable, and incongruent (which is also unpredictable).

(8) a. congruent+predictable:   In the concert hall an orchestra played the second symphony of Beethoven.
    b. congruent+unpredictable: In the concert hall an expert played the second symphony of Beethoven.
    c. incongruent:             In the concert hall a finding played the second symphony of Beethoven.

In the ERP domain, Wang et al. found the expected cline in N400 deflections: the incongruent condition leads to the largest N400 deflection, the congruent and unpredictable condition leads to a smaller N400 deflection, and the congruent and predictable condition leads to the smallest N400 deflection. In the time-frequency domain, Wang et al. found two effects. First, the congruent and predictable condition showed an increase in power in the gamma band relative to the two other conditions (which showed no difference relative to each other). Second, the incongruent (and unpredictable) condition showed an increase in power in the theta band relative to the two other conditions (which showed no difference relative to each other). Wang et al. take this to suggest that

gamma activity may be related to predictability, since it appears to divide the conditions by predictability, whereas theta activity may be related to (semantic congruency) error detection, as it appears to divide the conditions by (semantic congruency) error.

15.4.4 Dependencies that lead to SANs: increased coherence in theta, beta, and gamma

Weiss, Mueller, Schack, King, Kutas, and Rappelsberger (2005) investigated the time-frequency response to subject and object relative clauses, as in (9):

(9) a. subject RC: The fireman who __ speedily rescued the cop sued the city …
    b. object RC:  The fireman who the cop speedily rescued __ sued the city …

As mentioned in Section 15.3.5, in the ERP domain, object relative clauses elicit a sustained anterior negativity relative to subject relative clauses (King and Kutas 1995). In the time-frequency domain, Weiss et al. found that object relative clauses showed increased coherence between anterior and posterior electrode sites in the gamma band during the relative clause (compared to subject relative clauses). They also found that object relative clauses showed increased coherence between anterior and posterior electrode sites in the theta and beta bands for several words after the gap location of the relative clause (compared to subject relative clauses).
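To give a concrete sense of what a coherence measure between two electrode sites involves, here is a minimal sketch that assumes NumPy and uses the textbook magnitude-squared coherence, estimated across trials with a plain FFT and no tapering. It is not a reconstruction of the analysis in Weiss et al.; the function names, the simulated "anterior" and "posterior" signals, and all parameter values are hypothetical.

```python
import numpy as np

def trial_coherence(x, y, sfreq):
    """Magnitude-squared coherence between two channels, estimated across trials.

    x, y: arrays of shape (n_trials, n_samples).
    Returns (freqs, coherence), with coherence between 0 and 1 at each frequency.
    """
    X = np.fft.rfft(x, axis=1)
    Y = np.fft.rfft(y, axis=1)
    # The averaged cross-spectrum stays large only if the phase lag is stable over trials.
    cross = np.mean(X * np.conj(Y), axis=0)
    pxx = np.mean(np.abs(X) ** 2, axis=0)
    pyy = np.mean(np.abs(Y) ** 2, axis=0)
    freqs = np.fft.rfftfreq(x.shape[1], d=1 / sfreq)
    return freqs, np.abs(cross) ** 2 / (pxx * pyy)

def simulate_pair(n_trials=100, sfreq=250, shared_hz=6.0, lag=np.pi / 4, seed=0):
    """Two hypothetical sites sharing a theta rhythm at a fixed lag, plus independent noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(sfreq) / sfreq                     # 1-second epochs
    phases = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
    shape = (n_trials, t.size)
    anterior = np.cos(2 * np.pi * shared_hz * t + phases) + rng.normal(0, 1, shape)
    posterior = np.cos(2 * np.pi * shared_hz * t + phases + lag) + rng.normal(0, 1, shape)
    return anterior, posterior
```

In the simulated data, coherence is close to 1 at the shared 6 Hz rhythm and close to 0 at frequencies where the two sites carry only independent noise, even though the absolute phase of the rhythm differs from trial to trial.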

15.5 Magnetoencephalography

Magnetoencephalography (MEG) shares many of the basic characteristics of EEG and, therefore, many of its potential points of interest for the experimental syntactician. MEG, like EEG, directly records the summed post-synaptic potentials of large populations of spatially aligned and synchronously firing pyramidal cells in the brain (for review, see Cohen and Halgren 2009; Ahlfors and Mody 2019). The primary difference between the two techniques is that while EEG captures electrical activity via electrodes directly attached to the scalp at standardized locations, MEG captures magnetic activity via specialized sensors called Superconducting Quantum Interference Devices (SQUIDs) that are housed in a helmet-shaped dewar filled with liquid helium. Because the signals that are recorded by MEG are orders of magnitude smaller than ambient magnetic noise, the whole system needs to be insulated in a magnetically shielded room, and advanced signal processing techniques are often used to further denoise the data (Cohen and Halgren 2009). These characteristics make MEG a much more expensive technology compared to EEG, which has been a major barrier to its more widespread adoption. Moreover, in addition to cost, MEG imposes other constraints on the experimenter that EEG does not. First, there are different types of MEG sensors (SQUIDs), such as

magnetometers and gradiometers—with the latter also having different orientations (planar vs. axial)—and they capture different aspects of the magnetic fields generated by the brain. However, manufacturers of MEG systems have not standardized their sensor arrays, resulting in characterizations of magnetic fields that may superficially look different depending on the system.1 Second, because the sensors in MEG are attached to the device and not to the participant’s scalp, as in EEG, the initial head positioning and its maintenance during a recording session is a crucial consideration in MEG studies, creating special challenges for long (e.g. over 1 hour) single-session recordings or multi-session recordings. In addition, the extreme sensitivity of MEG to magnetic noise precludes participants with embedded magnetic material in their body (e.g. some types of dental work or metal pins or plates) from participating in studies. Finally, the sources of signals recorded by MEG are more restricted than those of EEG. Namely, sources whose electric current flows radially to the scalp, like those in the gyri, generate little to no MEG signal (Cohen and Cuffin 1983; Ahlfors, Han, Belliveau, and Hamalainen 2010), and deep (subcortical) sources are severely attenuated or even absent in MEG, due to their physical distance from the MEG sensors (Cohen and Halgren 2009; Ahlfors and Mody 2019). Thus, given the extra challenges and increased costs associated with MEG compared to EEG, what justification could there be for using the former instead of the latter? The answer is twofold. First, the fact that MEG is sensitive primarily to sources oriented tangentially to the scalp is not necessarily a weakness, as EEG’s sensitivity to tangentially and radially oriented sources is not homogenous and may in fact be dominated by the latter due to radially oriented sources being more often found in gyri, in closer proximity to the electrodes (Cohen and Halgren 2009). 
Thus, the two types of signals can be thought of as complementing one another, rather than being in a strict subset–superset relationship. Second, magnetic signals are not distorted by matter like electric signals are, and thus source information in MEG is not as blurred as in EEG. As a result, sensor-level information in MEG carries more spatial detail about source activity than in EEG. The N400 component provides a clear case: in EEG, it presents with a centroparietal maximum with extensive bilateral slopes, but in MEG it presents with a clear left-lateralized maximum (e.g. Lau, Almeida, Hines and Poeppel 2009; Wang, Jensen, Van den Brink, Weder, Schoffelen, Magyari, Hagoort, and Bastiaansen 2012). In addition, because blurring due to volume conduction in MEG is negligible, head models can be greatly simplified compared to EEG, reducing the potential of errors creeping into the calculations of electromagnetic source localization (Cohen and Halgren 2009). Furthermore, EEG and MEG can be combined into multimodal recording sessions and their joint analysis leads to significant improvements in source localization accuracy,

1 A similar problem may occur in EEG when it comes to how the data is referenced, leading to seemingly different field maps for the same exact data. This potential problem is largely mitigated by explicit disclosure of the reference decisions made by the experimenter, and the conversion between different reference choices is generally trivial when the data is available.

above and beyond what can be obtained on the basis of the independent consideration of each individual modality (Mosher, Spencer, Leahy, and Lewis 1993; Sharon, Hamalainen, Tootell, Halgren, and Belliveau 2007). Thus, MEG, and especially the combination of EEG and MEG, can not only exploit the excellent temporal resolution of electrophysiological signals but also begin to approximate hemodynamic technologies like fMRI and PET in terms of their spatial resolution, resulting in a clearer picture about the nature of the brain dynamics underlying language, albeit at a higher technical and monetary cost.

15.6 Linking EEG/MEG and syntax

Like any measure related to sentence processing, linking syntactic theories and EEG/MEG responses requires specifying a linking hypothesis between syntactic theories and sentence-processing theories. This is a recurring theme in this Handbook. With that link in mind, it is in principle possible to look for differential predictions made by the combined syntactic and sentence-processing theories in the EEG/MEG domain (ERPs, time-frequency responses, etc.). This is no small order. In this section, we briefly discuss a few of the studies that have explored different approaches to linking syntax and EEG/MEG. Our hope is that these examples will provide a starting point for experimental syntacticians who are beginning to think about their own experiments.

Bemis and Pylkkänen (2011) attempted to isolate syntactic and semantic combinatorial processes by recording MEG while presenting participants with two-word sequences that form a syntactically and semantically well-formed phrase, such as red boat (presented one word at a time, visually), and comparing the activation to two-item sequences that do not form a syntactic or semantic phrase, such as noun lists (cup, boat), and sequences that include an unpronounceable consonant string (xkq boat). The logic of this design is that these two-word phrases likely involve the fundamental processes of syntactic and semantic composition, while avoiding many of the other processes that arise during the processing of complete sentences.
They found two potentially interesting patterns of activity: an increase in activity for the two-word condition 200–250ms after the onset of the second word that localizes to an area of the cortex that has been linked to syntactic processing in the past (left anterior temporal lobe), and an increase in activity for the two-word condition 300–500ms after the onset of the second word that localizes to an area of the cortex that has been linked to semantic processing in the past (ventromedial prefrontal cortex). These results suggest that minimal designs like this could be used to isolate fundamental syntactic and semantic processes, while sidestepping certain potential confounds (a research program that the Pylkkänen lab has been exploring in the domain of semantic processing). There are two familiar challenges to expanding on this approach: (i) the theoretical challenge of identifying fundamental syntactic processes beyond basic phrasal composition, and (ii) the

electrophysiological methods

methodological challenge of isolating those processes in concrete stimuli. Nonetheless, these results are an encouraging proof of concept for minimal designs.

Brennan and Pylkkänen (2016) recorded MEG while participants read a 1,279-word story, one word at a time. They constructed a context-free grammar for prepositional phrases that could be used to syntactically analyze 224 words in the story (i.e. the prepositions, nouns, determiners, and adjectives inside of prepositional phrases). They combined that grammar with two parsers: a left-corner parser, which is a psycholinguistically plausible model for how humans process sentences, and a type of bottom-up parser, which is not generally considered a psycholinguistically plausible model for how humans process sentences. They then calculated the number of parser operations triggered by each of the 224 words according to each parser, and looked for correlations between the parser operation counts at each word and source-localized MEG activity. The idea behind this analysis is that the number of parser operations can serve as a type of incremental complexity metric for each parser; if an area of cortex shows a pattern of activation that correlates with this complexity metric, it suggests that the area of cortex is involved in the syntactic processing predicted by the parser. They found no cortical areas that showed activity that significantly correlated with the bottom-up parser. For the left-corner parser, they found a significant correlation between the number of parse steps and activity in the left anterior temporal lobe 300–500ms after word onset. These results suggest that incremental complexity metrics derived from psycholinguistically plausible grammar-plus-parser combinations can be correlated with electrophysiological activity in a way that complexity metrics from sophisticated-yet-implausible parsers cannot.
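As a rough illustration of what an incremental complexity metric looks like, the sketch below counts, for each word of a toy prepositional phrase, how many tree nodes a bottom-up enumeration and a left-corner enumeration would posit at that word. This is a simplified node-count approximation, not the parser operation counts computed by Brennan and Pylkkänen; the tree, its category labels, and the function name node_counts are assumptions made for the example.

```python
def node_counts(tree):
    """Per-word node counts under two simplified enumeration strategies.

    A tree is a nested tuple (label, child, ...); a leaf is a plain string.
    Bottom-up: a node is counted at the last word it dominates.
    Left-corner: a node is counted at the word that completes its first child.
    """
    bottom_up, left_corner = [], []

    def visit(node):
        # Returns the index of the last word dominated by `node`.
        if isinstance(node, str):
            bottom_up.append(0)
            left_corner.append(0)
            return len(bottom_up) - 1
        _label, *children = node
        ends = [visit(child) for child in children]
        left_corner[ends[0]] += 1   # projected once its left corner is complete
        bottom_up[ends[-1]] += 1    # built only after all children are complete
        return ends[-1]

    visit(tree)
    return bottom_up, left_corner


# Hypothetical analysis of "in the red boat" (labels are illustrative only).
toy_pp = ("PP", ("P", "in"),
                ("NP", ("D", "the"), ("A", "red"), ("N", "boat")))

bottom_up, left_corner = node_counts(toy_pp)
for word, bu, lc in zip(["in", "the", "red", "boat"], bottom_up, left_corner):
    print(f"{word:>5}  bottom-up={bu}  left-corner={lc}")
```

The two enumerations posit the same total number of nodes but distribute them differently across words, which is what allows metrics of this kind to make distinct word-by-word predictions.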
To expand on this approach, one could imagine (i) scaling up the grammatical analysis to cover more complex syntactic phenomena, and (ii) comparing the predictions made by two distinct grammars combined with the same parser.

Nelson et al. (2017) recorded intracranial EEG from 12 participants (who were undergoing a separate medical procedure requiring intracranial EEG and several days of waiting) while they read sentences between 3 and 10 words long, one word at a time. Crucially, these sentences contained a subject noun phrase that varied in length, such as Ten students and Ten sad students of Bill Gates. Nelson et al. measured broadband high gamma activity (typically 60Hz to 200Hz), which can be interpreted as an index of the intensity of the activity of the neurons that are local to a given intracranial electrode. They report a number of findings, including correlations between incremental complexity metrics for top-down and left-corner parsers (combined with a context-free grammar that covers the sentences), and the spatial distributions of the various results. Their primary finding is that there is an increase in high gamma activity as the constituent length of the subject noun phrase increases, with a decrease in activity at both potential and actual constituent boundaries. They interpret this pattern as a potential correlate of phrase-structure building (e.g. minimalist merge). These results are particularly encouraging for future intracranial EEG studies; unfortunately, they may be less encouraging for extracranial EEG studies, given that high gamma is typically not detectable on the scalp (because higher frequencies tend to have lower amplitude, and

therefore are more likely to be attenuated by the biological tissue between the cortex and the scalp). Nonetheless, these results suggest that there are (potential) electrophysiological correlates of syntactic structure-building that are detectable with some form of current EEG technology, which in turn suggests that it may be worthwhile for experimental syntacticians to look for similar correlates in frequency ranges that are detectable on the scalp.

Ding et al. (2016) used a technique known as a steady-state response to demonstrate that speakers construct units of syntactic structure that are larger than syllables and words, such as phrases and sentences. The idea of a steady-state response is to present stimuli at a constant rate to induce a response in the brain at that presentation rate. Ding et al. auditorily synthesized monosyllabic words in Mandarin, and presented them to native speakers at a rate of 4Hz (250ms per word) while recording MEG. Crucially, the words were arranged into four-word sentences, each consisting of two two-word phrases (e.g. New plans gave hope), with no breaks between the sentences. Ding et al. looked at the frequency response induced by this design. They found statistically significant increases in power at 1Hz, 2Hz, and 4Hz. The 4Hz activity is not surprising: the physical stimuli were presented at 4Hz. The 1Hz and 2Hz activity is a different story. The 1Hz activity appears to reflect the construction of complete sentences (4 words presented at 250ms per word yields one complete sentence per second). The 2Hz activity appears to reflect the construction of the two phrases that constitute each sentence. Ding et al. provide causal support for this interpretation by showing that unstructured word lists generate activity only at 4Hz, and that the Mandarin sentences only generate 4Hz activity in English speakers who do not speak Mandarin.
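The logic of this frequency-tagging analysis can be sketched with synthetic data. The example below is not Ding et al.'s analysis pipeline: the sampling rate, recording length, amplitudes, noise level, and the helper names peak_ratio and tracking are all assumptions, and the simulated "responses" are simply sinusoids at the word, phrase, and sentence rates embedded in noise. It shows only how a spectral peak at 1Hz or 2Hz could be detected relative to neighboring frequency bins.

```python
import numpy as np

def peak_ratio(signal, fs, freq, n_neighbors=3):
    """Power at `freq` divided by the mean power of the adjacent bins."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    k = int(np.argmin(np.abs(freqs - freq)))
    neighbors = np.concatenate(
        [power[k - n_neighbors:k], power[k + 1:k + 1 + n_neighbors]]
    )
    return power[k] / neighbors.mean()

fs = 100                      # assumed sampling rate in Hz
n_samples = fs * 40           # 40 s of simulated signal
t = np.arange(n_samples) / fs
rng = np.random.default_rng(0)

def tracking(rate_hz, amplitude):
    """Idealized response that follows a linguistic unit at `rate_hz`."""
    return amplitude * np.cos(2 * np.pi * rate_hz * t)

# Structured speech: word rate (4Hz) plus phrase (2Hz) and sentence (1Hz) rates.
sentences = (tracking(4, 1.0) + tracking(2, 0.5) + tracking(1, 0.5)
             + rng.normal(size=n_samples))
# Unstructured word list: only the word rate is present.
word_list = tracking(4, 1.0) + rng.normal(size=n_samples)

for label, sig in [("sentences", sentences), ("word list", word_list)]:
    ratios = {f: round(float(peak_ratio(sig, fs, f)), 1) for f in (1, 2, 4)}
    print(label, ratios)
```

In this simulation the structured signal shows large peak ratios at all three rates, while the word-list signal shows one only at 4Hz, mirroring the qualitative contrast reported above.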
Ding et al. also show that the frequency responses can be modulated by manipulating the size of the phrases in the sentences (i.e. a three-word verb phrase or three-word noun phrase). These results suggest that the 1Hz and 2Hz activity increases were not driven by the 4Hz presentation rate (i.e. they were not harmonics of the presentation rate), but rather were driven by syntactic processing of the stimuli. This in turn suggests that steady-state designs may be another potential tool for experimental syntacticians to explore. The primary challenge to expanding the use of steady-state designs is to figure out how to induce more complex or subtle syntactic phenomena at a constant rate.

Hale, Dyer, Kuncoro, and Brennan (2018) demonstrate a slightly different path forward for linking syntactic analyses and electrophysiological responses. Hale et al. introduce a number of new approaches to linking EEG and sentence processing that are beyond the scope of this chapter, from leveraging recurrent neural network grammars (e.g. Dyer et al. 2016) to leveraging beam search as a sort of parsing algorithm (e.g. Stern et al. 2017). For our purposes, we want to focus on their use of the information-theoretic measure surprisal as an incremental measure of complexity during sentence processing (i.e. an incremental complexity metric). Surprisal is a measure of the unexpectedness of a word given the previous words in the sentence. Surprisal is defined in such a way that it is larger for unexpected words, and smaller for expected words. (Surprisal is mathematically defined as the logarithm of the reciprocal of the probability

of the appearance of that word as the next word in the string, which can be equivalently calculated as the negative log of the transitional probability of the word; see Hale 2016 for a detailed review.) Surprisal is calculated directly from a probabilistic grammar, without any explicit reference to a mechanistic parsing theory like top-down or left-corner parsers. We can thus make a distinction between information-theoretical metrics like surprisal and automata-theoretic (or memory-theoretic) metrics like stack size or number of parsing operations. It is an open question to what extent the two types of metrics provide different windows into the properties of human sentence processing.

To explore the use of information-theoretic metrics, Hale et al. recorded EEG while participants passively listened to a spoken presentation of the first chapter of Alice in Wonderland. They calculated various information-theoretic metrics for each word in the presentation, including surprisal, and used those metrics to search for activity in the (time-amplitude domain of the) EEG that correlated with those metrics. They identified two significant effects: a positivity over frontal electrode sites around 250ms after word onset, and a positivity over central electrode sites around 600ms after word onset. It is interesting to note that these two effects appear similar to two well-known ERPs in the literature: the P2 (not discussed here, but plausibly linked to predictability in the lexical-access literature; e.g. Almeida and Poeppel 2013) and the P600 (as discussed in Section 15.3.4). These results suggest that information-theoretical complexity measures provide another potential tool for connecting explicit syntactic analyses to electrophysiological responses during sentence processing.
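To make the definition of surprisal concrete, the following sketch computes it from a toy bigram model. This is a deliberate simplification: Hale et al. derive word probabilities from much richer grammars, whereas here the probability of a word is conditioned only on the immediately preceding word. The three-sentence corpus is invented, the function name bigram_surprisal is hypothetical, and no smoothing is applied, so the code assumes every bigram in the test sentence is attested in the corpus.

```python
import math
from collections import Counter

def bigram_surprisal(corpus, sentence):
    """Return (word, surprisal in bits) pairs under an unsmoothed bigram model."""
    context_counts, bigram_counts = Counter(), Counter()
    for line in corpus:
        tokens = ["<s>"] + line.split()
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    tokens = ["<s>"] + sentence.split()
    result = []
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = bigram_counts[(prev, word)] / context_counts[prev]
        # Surprisal is the log of the reciprocal of the conditional probability.
        result.append((word, math.log2(1.0 / p)))
    return result

# Invented toy corpus, for illustration only.
corpus = ["the dog barked", "the dog slept", "the cat slept"]

values = bigram_surprisal(corpus, "the cat slept")
for word, bits in values:
    print(f"{word:>6}  {bits:.3f} bits")
```

In this toy corpus, cat is less expected after the than dog is, so it receives a higher surprisal value; a word that is fully predictable from its context receives a surprisal of zero.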

15.7 Conclusion

It is our hope that this chapter provides a relatively useful introduction to both the prospects and challenges of using electrophysiological measures like EEG and MEG to study syntax. We believe that there is quite a bit of potential for new work in this area, both in principle, because syntactic theories are ultimately intended to be theories of cognition, and in practice, because new methods and linking theories are constantly being developed to link syntax, sentence processing, and electrophysiology. That said, this line of research is not without risk, both because of the biological facts of EEG and MEG (i.e. they only detect a subset of cortical activity) and because of the size of the methodological challenge (linking syntactic and sentence-processing theories is no small task). There are also a number of practical challenges, such as the time requirements for data collection (often several months for one experiment), and the time requirements to learn and deploy the complex data analysis techniques discussed in previous sections (again, often several months). As such, we recommend that experimental syntacticians adopt electrophysiological methods only after careful consideration of all of these factors. It is a classic instance of a high-risk/high-reward method, at least in the context of syntactic theory. That said, for those who decide that they are willing to assume those risks, we believe that it is a potentially exciting and valuable tool for experimental syntax.

References

Ahlfors, Seppo P., Jooman Han, John W. Belliveau, and Matti S. Hamalainen. 2010. Sensitivity of MEG and EEG to source orientation. Brain Topography 23(3): 227–232.
Ahlfors, Seppo P., and Maria Mody. 2019. Overview of MEG. Organizational Research Methods 22(1): 95–115.
Ainsworth-Darnell, Kim, Harvey G. Shulman, and Julie E. Boland. 1998. Dissociating brain responses to syntactic and semantic anomalies: Evidence from event-related potentials. Journal of Memory and Language 38: 112–130.
Allopenna, Paul, James S. Magnuson, and Michael K. Tanenhaus. 1998. Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language 38: 419–439.
Almeida, Diogo, and David Poeppel. 2013. Word-specific repetition effects revealed by MEG and the implications for lexical access. Brain and Language 127(3): 497–509.
Bastiaansen, Marcel C. M., Jos J. A. van Berkum, and Peter Hagoort. 2002. Syntactic processing modulates the theta rhythm of the human EEG. NeuroImage 17: 1479–1492.
Bastiaansen, Marcel C. M., Lilla Magyari, and Peter Hagoort. 2010. Syntactic unification operations are reflected in oscillatory dynamics during on-line sentence comprehension. Journal of Cognitive Neuroscience 22: 1333–1347.
Bastiaansen, Marcel C. M., Ali Mazaheri, and Ole Jensen. 2013. Beyond ERPs: Oscillatory neuronal dynamics. In Steven J. Luck and Emily S. Kappenman (eds), The Oxford handbook of event-related potential components, 31–50. Oxford: Oxford University Press.
Bemis, Douglas, and Liina Pylkkänen. 2011. Simple composition: A magnetoencephalography investigation into the comprehension of minimal linguistic phrases. Journal of Neuroscience 31: 2801–2814.
Bornkessel-Schlesewsky, Ina, and Matthias Schlesewsky. 2008. An alternative perspective on “semantic P600” effects in language comprehension. Brain Research Reviews 59: 55–73.
Brennan, Jonathan R., and Liina Pylkkänen. 2016. MEG evidence for incremental sentence composition in the anterior temporal lobe. Cognitive Science 41: 1515–1531.
Brown, Colin, and Peter Hagoort. 1993. The processing nature of the N400: Evidence from masked priming. Journal of Cognitive Neuroscience 5: 34–44.
Cohen, David, and B. Neil Cuffin. 1983. Demonstration of useful differences between magnetoencephalogram and electroencephalogram. Electroencephalography and Clinical Neurophysiology 56(1): 38–51.
Cohen, David, and Eric Halgren. 2009. Magnetoencephalography. In L. R. Squire (ed.), Encyclopedia of neuroscience, vol. 5, 615–622. Orlando, FL: Academic Press.
Cohen, Mike X. 2014. Analyzing neural time series data. Cambridge, MA: MIT Press.
Coulson, Seana, Jonathan King, and Marta Kutas. 1998. Expect the unexpected: Event-related brain response to morphosyntactic violations. Language and Cognitive Processes 13: 21–58.
Davidson, Doug J., and Peter Indefrey. 2007. An inverse relation between event-related and time-frequency violation responses in sentence processing. Brain Research 1158: 81–92.
Dikker, Suzanne, Hugh Rabagliati, and Liina Pylkkänen. 2009. Sensitivity to syntax in the visual cortex. Cognition 110: 293–321.
Ding, Nai, Lucia Melloni, Hang Zhang, Xing Tian, and David Poeppel. 2016. Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience 19: 158–164.
Dyer, Chris, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recurrent neural network grammars. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 199–209. San Diego, CA.
Federmeier, Karen D., and Marta Kutas. 1999. A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language 41: 469–495.
Fiebach, Christian J., Matthias Schlesewsky, and Angela D. Friederici. 2002. Separating syntactic memory costs and syntactic integration costs during parsing: The processing of German wh-questions. Journal of Memory and Language 47: 250–272.
Frazier, Lyn. 1978. On comprehending sentences: Syntactic parsing strategies. PhD dissertation, University of Connecticut.
Friederici, Angela D. 1995. The time course of syntactic activation during language processing: A model based on neuropsychological and neurophysiological data. Brain and Language 50: 259–281.
Friederici, Angela D., and Stefan Frisch. 2000. Verb argument structure processing: The role of verb-specific and argument-specific information. Journal of Memory and Language 43: 476–507.
Friederici, Angela D., Anja Hahne, and Axel Mecklinger. 1996. Temporal structure of syntactic parsing: Early and late event-related brain potential effects. Journal of Experimental Psychology: Learning, Memory, and Cognition 22: 1219–1248.
Friederici, Angela D., Erdmut Pfeifer, and Anja Hahne. 1993. Event-related brain potentials during natural speech processing: Effects of semantic, morphological and syntactic violations. Cognitive Brain Research 1: 183–192.
Frisch, Stefan, Matthias Schlesewsky, Douglas Saddy, and Annegret Alpermann. 2002. The P600 as an indicator of syntactic ambiguity. Cognition 85: B83–B92.
Gouvea, Ana, Colin Phillips, Nina Kazanina, and David Poeppel. 2010. The linguistic processes underlying the P600. Language and Cognitive Processes 25: 149–188.
Gunter, Thomas G., Laurie A. Stowe, and Gusbertus Mulder. 1997. When syntax meets semantics. Psychophysiology 34: 660–676.
Hagoort, Peter. 2008. The fractionation of spoken language understanding by measuring electrical and magnetic brain signals. Philosophical Transactions of the Royal Society B: Biological Sciences 363: 1055–1069.
Hagoort, Peter, Colin Brown, and Jolanda Groothusen. 1993. The syntactic positive shift (SPS) as an ERP measure of syntactic processing. Language and Cognitive Processes 8: 439–483.
Hagoort, Peter, Marlies Wassenaar, and Colin M. Brown. 2003. Syntax-related ERP-effects in Dutch. Cognitive Brain Research 16: 38–50.
Hagoort, Peter, Lea Hald, Marcel C. M. Bastiaansen, and Karl Magnus Petersson. 2004. Integration of word meaning and world knowledge in language comprehension. Science 304: 438–441.
Hahne, Anja, and Angela D. Friederici. 1999. Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuroscience 11: 194–205.
Hahne, Anja, and Angela D. Friederici. 2002. Differential task effects on semantic and syntactic processes as revealed by ERPs. Cognitive Brain Research 13: 339–356.
Hale, John. 2016. Information-theoretical complexity metrics. Language and Linguistics Compass 10: 397–412.
Hale, John, Chris Dyer, Adhiguna Kuncoro, and Jonathan R. Brennan. 2018. Finding syntax in human encephalography with beam search. arXiv:1806.04127.

Hinojosa, José A., Manuel Martin-Loeches, Pilar Casado, Francisco Muñoz, and Francisco J. Rubia. 2003. Similarities and differences between phrase structure and morphosyntactic violations in Spanish: An event-related potentials study. Language and Cognitive Processes 18: 113–142.
Isel, Frédéric, Anja Hahne, Burkhard Maess, and Angela D. Friederici. 2007. Neurodynamics of sentence interpretation: ERP evidence from French. Biological Psychology 74: 337–346.
Kaan, Edith. 2002. Investigating the effects of distance and number interference in processing subject–verb dependencies: An ERP study. Journal of Psycholinguistic Research 31: 165–193.
Kaan, Edith, Anthony Harris, Edward Gibson, and Phillip J. Holcomb. 2000. The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes 15: 159–201.
Kaan, Edith, and Tamara Y. Swaab. 2003. Electrophysiological evidence for serial sentence processing: A comparison between non-preferred and ungrammatical continuations. Cognitive Brain Research 17: 621–635.
Kappenman, Emily S., and Steven J. Luck. 2012. ERP components: The ups and downs of brainwave recordings. In Emily S. Kappenman and Steven J. Luck (eds), The Oxford handbook of event-related potential components, 3–30. Oxford University Press.
Kim, Albert, and Lee Osterhout. 2005. The independence of combinatory semantic processing: Evidence from event-related potentials. Journal of Memory and Language 52: 205–225.
King, Jonathan, and Marta Kutas. 1995. Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience 7: 376–395.
Kluender, Robert, and Marta Kutas. 1993a. Bridging the gap: Evidence from ERPs on the processing of unbounded dependencies. Journal of Cognitive Neuroscience 5: 196–214.
Kluender, Robert, and Marta Kutas. 1993b. Subjacency as a processing phenomenon. Language and Cognitive Processes 8: 573–633.
Kuperberg, Gina R. 2007. Neural mechanisms of language comprehension: Challenges to syntax. Brain Research 1146: 23–49.
Kuperberg, Gina R., Tatiana Sitnikova, David Caplan, and Phillip J. Holcomb. 2003. Electrophysiological distinctions in processing conceptual relationships within simple sentences. Cognitive Brain Research 17: 117–129.
Kutas, Marta, and Karen D. Federmeier. 2000. Electrophysiology reveals semantic memory use in language comprehension. Trends in Cognitive Sciences 4: 463–470.
Kutas, Marta, and Karen D. Federmeier. 2011. Thirty years and counting: Finding meaning in the N400 component of the event-related brain potential (ERP). Annual Review of Psychology 62: 621–647.
Kutas, Marta, and Steven Hillyard. 1980. Event-related brain potentials to semantically inappropriate and surprisingly large words. Biological Psychology 11: 99–116.
Lau, Ellen, Colin Phillips, and David Poeppel. 2008. A cortical network for semantics: (De)constructing the N400. Nature Reviews Neuroscience 9(12): 920–933.
Lau, Ellen, Clare Stroud, Silke Plesch, and Colin Phillips. 2006. The role of structural prediction in rapid syntactic analysis. Brain and Language 98: 74–88.
Lau, Ellen, Diogo Almeida, Paul C. Hines, and David Poeppel. 2009. A lexical basis for N400 context effects: Evidence from MEG. Brain and Language 111: 161–172.
Lewis, Richard L., and Shravan Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science 29: 375–419.
Luck, Steven J. 2014. An introduction to the event-related potential technique. Cambridge, MA: MIT Press.

Martín-Loeches, Manuel, Francisco Muñoz, Pilar Casado, A. Melcón, and Carlos Fernández-Frías. 2005. Are the anterior negativities to grammatical violations indexing working memory? Psychophysiology 42: 508–519.
Mazaheri, Ali, and Ole Jensen. 2010. Rhythmic pulsing: Linking ongoing brain activity with evoked responses. Frontiers in Human Neuroscience 4: 177.
McElree, Brian, Stephani Foraker, and Lisbeth Dyer. 2003. Memory structures that subserve sentence comprehension. Journal of Memory and Language 48: 67–91.
McKinnon, Richard. 1996. Constraints on movement phenomena in sentence processing: Evidence from event-related brain potentials. Language and Cognitive Processes 11: 495–524.
Meltzer, Jed A., and Allen R. Braun. 2013. P600-like positivity and left anterior negativity responses are elicited by semantic reversibility in nonanomalous sentences. Journal of Neurolinguistics 26: 129–148.
Molinaro, Nicola, Horacio A. Barber, and Manuel Carreiras. 2011. Grammatical agreement processing in reading: ERP findings and future directions. Cortex 47: 908–930.
Mosher, John C., Michael E. Spencer, Richard M. Leahy, and Paul S. Lewis. 1993. Error bounds for EEG and MEG dipole source localization. Electroencephalography and Clinical Neurophysiology 86(5): 303–321.
Mueller, Jutta L., Anja Hahne, Yugo Fujii, and Angela D. Friederici. 2005. Native and nonnative speakers’ processing of a miniature version of Japanese as revealed by ERPs. Journal of Cognitive Neuroscience 17: 1229–1244.
Münte, Thomas F., and Hans-Jochen Heinze. 1994. ERP negativities during syntactic processing of written words. In Hans-Jochen Heinze, Thomas F. Münte, and George R. Mangun (eds), Cognitive electrophysiology, 211–238. Boston, MA: Birkhäuser.
Münte, Thomas F., Mike Matzke, and Sönke Johannes. 1997. Brain activity associated with syntactic incongruencies in words and pseudo-words. Journal of Cognitive Neuroscience 9: 318–329.
Nelson, Matthew J., Imen El Karoui, Kristof Giber, Xiaofang Yang, Laurent Cohen, Hilda Koopman, Sydney S. Cash, Lionel Naccache, John T. Hale, Christoph Pallier, and Stanislas Dehaene. 2017. Neurophysiological dynamics of phrase-structure building during sentence processing. Proceedings of the National Academy of Sciences 114: E3669–E3678.
Neville, Helen, Janet L. Nicol, Andrew Barss, Kenneth I. Forster, and Merrill F. Garrett. 1991. Syntactically based sentence processing classes: Evidence from event-related brain potentials. Journal of Cognitive Neuroscience 3: 151–165.
Osterhout, Lee, and Peter Hagoort. 1999. A superficial resemblance does not necessarily mean you are part of the family: Counterarguments to Coulson, King and Kutas (1998) in the P600/SPS-P300 debate. Language and Cognitive Processes 14: 1–14.
Osterhout, Lee, and Phillip J. Holcomb. 1992. Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language 31: 785–806.
Osterhout, Lee, and Linda A. Mobley. 1995. Event-related brain potentials elicited by failure to agree. Journal of Memory and Language 34: 739–773.
Osterhout, Lee, Phillip J. Holcomb, and Daniel Swinney. 1994. Brain potentials elicited by garden-path sentences: Evidence of the application of verb information during parsing. Journal of Experimental Psychology: Learning, Memory, and Cognition 20: 786–803.
Phillips, Colin, Nina Kazanina, and Shani H. Abada. 2005. ERP effects of the processing of syntactic long-distance dependencies. Cognitive Brain Research 22: 407–428.

Ruchkin, Daniel S., Ray Johnson, Howard Canoune, and Walter Ritter. 1990. Short-term memory storage and retention: An event-related brain potential study. Electroencephalography and Clinical Neurophysiology 76: 419–439.
Sassenhagen, Jona, Matthias Schlesewsky, and Ina Bornkessel-Schlesewsky. 2014. The P600-as-P3 hypothesis revisited: Single-trial analyses reveal that the late EEG positivity following linguistically deviant material is reaction time aligned. Brain and Language 137: 29–39.
Sharon, Dahlia, Matti S. Hamalainen, Roger B. H. Tootell, Eric Halgren, and John W. Belliveau. 2007. The advantage of combining MEG and EEG: Comparison to fMRI in focally stimulated visual cortex. NeuroImage 36(4): 1225–1235.
Steinhauer, Karsten, and John E. Drury. 2012. On the early left-anterior negativity (ELAN) in syntax studies. Brain and Language 120: 135–162.
Stern, Mitchell, Daniel Fried, and Dan Klein. 2017. Effective inference for generative neural parsing. In Martha Palmer, Rebecca Hwa, and Sebastian Riedel (eds), Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 1695–1700. Stroudsburg, PA: Association for Computational Linguistics.
Stroud, Clare, and Colin Phillips. 2012. Examining the evidence for an independent semantic analyzer: An ERP study in Spanish. Brain and Language 120: 108–126.
Tanner, Darren, and Janet G. Van Hell. 2014. ERPs reveal individual differences in morphosyntactic processing. Neuropsychologia 56: 289–301.
van Herten, Marieke, Herman H. J. Kolk, and Dorothee J. Chwilla. 2005. An ERP study of P600 effects elicited by semantic anomalies. Cognitive Brain Research 22: 241–255.
Van Petten, Cyma, Seana Coulson, Susan Rubin, Elena Plante, and Marjorie Parks. 1999. Time course of word identification and semantic integration in spoken language. Journal of Experimental Psychology: Learning, Memory, and Cognition 25: 394–417.
Wang, Lin, Ole Jensen, Danielle Van den Brink, Nienke Weder, Jan-Mathijs Schoffelen, Lilla Magyari, Peter Hagoort, and Marcel Bastiaansen. 2012. Beta oscillations relate to the N400m during language comprehension. Human Brain Mapping 33(12): 2898–2912.
Wang, Lin, Zude Zhu, and Marcel C. M. Bastiaansen. 2012. Integration or predictability? A further specification of the functional role of gamma oscillations in language comprehension. Frontiers in Psychology 3: 1–12.
Weiss, Sabine, Horst M. Mueller, Baerbel Schack, Jonathan W. King, Marta Kutas, and Peter Rappelsberger. 2005. Increased neuronal communication accompanying sentence comprehension. International Journal of Psychophysiology 57: 129–141.
Ye, Zheng, Yue-Jia Luo, Angela D. Friederici, and Xiaolin Zhou. 2006. Semantic and syntactic processing in Chinese sentence comprehension: Evidence from event-related potentials. Brain Research 1071: 186–196.

Chapter 16

Hemodynamic methods

Jonathan R. Brennan

16.1 Introduction

Hemodynamic methods in brain imaging are techniques that measure properties of blood flow to draw inferences about how neural activity supports cognition. These methods were first applied to cognitive questions beginning in the late 1980s, and the first studies of sentence-processing and syntax emerged in the seminal work of Mazoyer et al. (1993), Just et al. (1996), and Stromswold et al. (1996) in the 1990s. These studies showed for the first time that distinct aspects of sentence processing lead to systematic changes in blood flow in specific cortical regions that can be detected in vivo, while individuals are using language. This work builds on a 100-year tradition of localizing specific aspects of sentence-processing to distinct regions of the brain based on deficit-lesion and electrophysiological methods that are discussed elsewhere in this Handbook. The questions most appropriate for these tools concern where syntactic representations are stored and operated on in the service of using language. Answering these questions is a necessary prerequisite to addressing how syntactic knowledge is learned, encoded, and used by the brain.

This chapter discusses the application of these techniques to the study of syntactic representations and computations. Such applications must be understood in the context of two basic premises:

Syntactic theories describe mental knowledge states that constrain the mapping between the form and meaning of a linguistic utterance.

Neuroimaging methods measure brain signals that reflect the application of this knowledge and other cognitive resources while using language.

These premises transparently adapt the competence/performance distinction to neuroimaging (e.g. Chomsky 1965). To apply hemodynamic and other neuroimaging methods to syntactic questions, then, one must articulate linking hypotheses, or functions that connect the grammatical principles described by syntactic theories to brain signals:

    input −→ mental/brain state −→ brain signal
                     ⋮
            conforms to grammar

Following the perspective developed by Marr (1982), the mental states involved in parsing linguistic input conform to syntactic representations belonging to the grammar. The corresponding brain states determine the properties of measured brain signals (Embick and Poeppel 2015; Poeppel and Embick 2005; Marantz 2005). The relationship between a grammar and the measured brain signal is thus not direct; it is mediated by several linking functions. These are the arrows in the diagram above. A central challenge in applying hemodynamic methods to study syntax is in specifying these links.

A common but often implicit view in the literature is that hemodynamic signals, and perhaps brain signals more generally, are of limited use in addressing syntactic questions. This chapter argues that such a view has merit only when linking functions are vague or under-specified. While current accounts do not yet conclusively relate hemodynamic signals to the sorts of representational questions typically at issue in theoretical syntax, a range of work now points the way towards how these linking functions might be formulated. These efforts provide the bases for a more integrative cognitive neuroscience of syntax. To develop this argument, this chapter briefly introduces methods for recording hemodynamic signals associated with language (Section 16.2), reviews several themes in state-of-the-art efforts to link these signals with syntactic representations and computations (Sections 16.3–16.7), and formulates a framework for how different linking hypotheses fit together to guide forward progress in applying hemodynamic data to syntactic questions (Section 16.8).

16.2 Measuring hemodynamic signals


Three methods are commonly used to measure hemodynamic brain activity. Each has trade-offs in the kinds of questions it may best answer. To preview: The fMRI technique has precise spatial resolution but limited temporal resolution; fNIRS is better suited for imaging children, but has limited spatial resolution; PET imaging is a third technique that is less common in recent research due to very poor temporal resolution. Functional magnetic resonance imaging (fMRI) is the most common methodology for recording hemodynamic signals. In magnetic resonance imaging (MRI), very strong

hemodynamic methods


magnets are used to align, and then perturb, hydrogen nuclei. This sequence generates a signal that can be precisely located in three dimensions. The spatial unit of analysis in these images is the voxel, or “volumetric pixel”: Brain images are composed of these basic elements, which are typically 1–3 mm³. A voxel of 1 mm³ contains about 50,000 neurons. The distribution of hydrogen across bodily tissues allows this technique to distinguish, for example, cortical grey matter which houses neuronal cell bodies, from white matter which is made up of the axons that convey electrical signals from one neuron to another. The distribution of hydrogen also varies with the level of oxygen in blood: Oxygenated hemoglobin generates a stronger magnetic resonance signal than de-oxygenated hemoglobin. Because blood oxygenation varies systematically with neuronal activity, this blood oxygenation level-dependent (BOLD) signal allows fMRI to measure brain function. This relationship is indirect as there are complexities concerning the specific aspects of neuronal activity that drive the BOLD signal, as well as the spatial alignment of activated neurons and associated vasculature (e.g. Logothetis and Wandell 2004). For example, an increase in the BOLD signal reflects both inhibitory and excitatory neural activity. Despite these limitations, fMRI has very good spatial resolution: BOLD signal changes can be resolved within a few millimeters (Fig. 16.1A). The BOLD signal changes relatively slowly compared to the speed of neuronal firing and the speed of language. Neurons cycle anywhere from 1 to over 200 times per second and a typical speech rate is 2–4 words every second. However, the hemodynamic response function (HRF) typically peaks 6–8 seconds after an associated neuronal event and does not return to a baseline level for 12–16 seconds. An idealization of the HRF is shown in the middle of Fig. 16.1B.
Thus, fMRI has poor temporal resolution: The BOLD signal is orders of magnitude slower than both the underlying neuronal activity and cognitive operations related to incremental language comprehension. Compounding the temporal limitations of fMRI, the time it takes to record an image of the whole brain (the sampling interval, or TR) is typically about 2 seconds (though faster rates are possible; Feinberg et al. 2010). On the other hand, the BOLD signal varies linearly with underlying neuronal activity (at least to a first approximation; Boynton et al. 1996). This logic underlies the most common approach to analyzing BOLD signals: the “subtraction method,” or its broader formulation under the general linear model (GLM). Here, stimulus events are transformed to take into account the hemodynamic lag and a statistical test is conducted to identify clusters of voxels whose dynamics monotonically differ between experimental conditions. Poldrack et al. (2011) offer a comprehensive introduction to the fMRI technique. This statistical procedure is schematized in Figure 16.1B for two different experimental setups. The top row shows an experiment in which stimuli are presented together in blocks that are organized by condition. In this way, changes in the BOLD signal for each condition can be easily distinguished (Fig. 16.1B, top-right panel). But there are obvious limitations to this block design in terms of stimulus repetition, attention effects, etc. Alternatively, stimuli can be interwoven in an event-related design, shown on the bottom. Statistical techniques help to optimally space the stimuli so as to separate out the BOLD response for each item (Fig. 16.1B, bottom-right panel).
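The convolution-and-regression logic just described can be illustrated with a minimal Python sketch. It is not the procedure of any particular analysis package: the double-gamma shape parameters, the block timings, the simulated effect sizes, and the noise level are all illustrative assumptions, and the function names `hrf` and `expected_bold` are hypothetical. The sketch builds an expected BOLD time course for each of two conditions by convolving stimulus timing with an idealized HRF, then fits a general linear model to a simulated single-voxel signal.

```python
import math
import numpy as np

def hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Idealized double-gamma HRF at times t (seconds); parameters are assumed."""
    t = np.asarray(t, dtype=float)
    peak = t ** (peak_shape - 1) * np.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * np.exp(-t) / math.gamma(under_shape)
    h = peak - under_ratio * under
    return h / h.max()  # scale so the peak equals 1

def expected_bold(onsets, duration, n_scans, tr, dt=0.1):
    """Convolve a boxcar of stimulus events with the HRF, sampled once per TR."""
    n_fine = int(round(n_scans * tr / dt))
    boxcar = np.zeros(n_fine)
    for onset in onsets:
        start = int(round(onset / dt))
        stop = int(round((onset + duration) / dt))
        boxcar[start:stop] = 1.0
    kernel = hrf(np.arange(0, 32, dt))
    signal = np.convolve(boxcar, kernel)[:n_fine] * dt
    step = int(round(tr / dt))
    return signal[::step][:n_scans]

# Block design over two minutes with TR = 2 s (timings are made up).
tr, n_scans = 2.0, 60
reg_a = expected_bold(onsets=[0, 60], duration=15, n_scans=n_scans, tr=tr)
reg_b = expected_bold(onsets=[30, 90], duration=15, n_scans=n_scans, tr=tr)

# Simulated voxel: responds more strongly to condition A than to condition B.
rng = np.random.default_rng(0)
measured = 2.0 * reg_a + 0.5 * reg_b + 10.0 + rng.normal(0.0, 0.3, n_scans)

# GLM: one regressor per condition plus an intercept, fit by least squares.
X = np.column_stack([reg_a, reg_b, np.ones(n_scans)])
betas, *_ = np.linalg.lstsq(X, measured, rcond=None)
contrast = betas[0] - betas[1]  # the "subtraction": condition A minus condition B
```

An event-related design can be expressed with the same function by passing many interleaved onsets with a short `duration`. In an actual study the contrast would be tested statistically at every voxel, with correction for multiple comparisons.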

[Figure 16.1. Panel (a): measured BOLD signal (0.0–2.0 min). Panel (b): sequence of stimuli from two conditions (Condition A, Condition B) in a block design and in an event-related design (0.0–2.0 min each); convolve with hemodynamic response function (0–20 sec); expected BOLD signal (0.0–2.0 min).]

fig. 16.1 (A) The BOLD signal changes slowly over time. (B) A sequence of stimuli divided into conditions in either a block design (top left) or event-related design (bottom left) can be combined with the hemodynamic response function (HRF; middle) to estimate BOLD-signal changes separately for each condition (right). Experimental effects are tested by evaluating the statistical fit between an expected signal and the measured signals.

Nonetheless, the sluggishness of the BOLD response limits its use in detecting transient neuronal signals that reflect rapid aspects of syntactic processing. This limitation should be kept in mind when considering what sorts of syntactic questions are best for hemodynamic techniques.

The manipulation of magnets in MRI is very loud. This presents a difficulty when using fMRI with auditory stimuli like spoken sentences. However, the sluggishness of the BOLD signal offers a clever solution called sparse sampling. By tuning the MRI scanner to record only at the peak of the hemodynamic response, which is several seconds after a stimulus is presented, the stimuli can be presented during the (relatively) quiet intervals when the scanner is not recording.

In addition to magnetic resonance, blood oxygenation also alters the way light is absorbed as it passes through vasculature. This fact underlies an alternative way to measure hemodynamics: functional near-infrared spectroscopy (fNIRS). In this technique, optical emitters and receivers are placed around the head. A near-infrared optical signal passing through the head will be absorbed to different degrees depending on blood oxygenation. Spectral shifts reflecting the BOLD signal can be correlated with brain locations based on the spatial arrangement of emitters and receivers. While fNIRS does not offer the same high spatial resolution as fMRI (Cui et al. 2011), it offers several other advantages, including the capacity to distinguish dynamics of deoxygenated and oxygenated hemoglobin. Furthermore, the recording caps and associated equipment are relatively unobtrusive and, importantly, quiet. This makes fNIRS especially suitable for studies with children. See Rossi et al. (2012) for a language-centered introduction to this technique.

Positron Emission Tomography (PET), while largely supplanted by fMRI, features in many earlier studies. In PET, a radioactive isotope is combined with glucose and injected as a tracer into research participants. This glucose is taken up in the bloodstream and the metabolic response to neuronal activity leads to increased local glucose consumption. As the isotope decays, it emits photons that can be used to detect the location of the increased glucose consumption. This signal is called regional cerebral blood flow (rCBF). While this technique is less widely used, it does have some advantages in measuring functional signals from cranial regions that are subject to magnetic distortion, including language-relevant regions in the anterior poles of the temporal and frontal lobes (e.g. Devlin et al. 2000). The spatial resolution of PET is similar to that of fMRI, but the temporal resolution is even more limited: The regimen of injecting and tracking the radioactive tracer takes about 30 seconds. Thus PET imaging can only be used with block-design experiments, mentioned above.

16.3 Three themes in the brain bases of syntactic processing


There is a wealth of studies that apply these tools to sentence processing and many reviews that aim to synthesize these findings into coherent accounts (e.g. Kaan and Swaab 2002; Hagoort and Indefrey 2014; Friederici and Gierhan 2013; Bornkessel-Schlesewsky et al. 2015; Meyer and Friederici 2016; Bornkessel-Schlesewsky and Schlesewsky 2013). These qualitative accounts are grounded in contrasts between stimulus items. For example, a typical approach in this domain is to compare stimulus items that differ in terms of one or another kind of “syntactic complexity.” While this key notion lacks a rigorous definition, it generally applies along two dimensions: hierarchical complexity, or number/distance of long-distance dependencies. These are representational descriptions, but current models do not, on the whole, commit to specific linking hypotheses that specify how processing mediates between such representations and brain signals that differ between stimulus items. As a high-level guide, Table 16.1 describes some connections between terminology used to describe syntactic representations, and terminology used to describe syntax-related sentence processes that may be implemented in specific brain regions.


Table 16.1 Syntactic representations stand in a many-to-many relationship with sentence-processing operations.

Representations | Processes
Phrase structure, selection | Structure-building, thematic analysis, semantic composition (integration), recall, prediction, reanalysis
Movement and agreement dependencies | Memory encoding, maintenance (interference), cued-recall

Neurolinguistic studies of sentence-processing typically use sentence stimuli to manipulate properties on the right-hand side of Table 16.1, while syntactic theories are generally concerned with the representations (and associated computations) on the left-hand side. The mapping is non-trivial. For example, whereas a basic division between processes that derive hierarchy and those that reconcile dependencies is common across the literature, these operations are unified under some syntactic theories (e.g. Starke 2001), or are based on a common foundation (e.g. feature-checking); even when representationally distinct, some parsing models account for both processes under a common set of operations for storing and retrieving objects in memory (Lewis and Vasishth 2005). This example highlights a major issue with applying hemodynamic methods to syntactic questions: Uncertainty about both neural implementation and processing algorithms severely limits the inferences that can be drawn about syntactic representations. To better understand this challenge, and to see how progress can be made, the next sections highlight three themes in research into the hemodynamics of sentence processing.¹

Is syntax special? The left inferior frontal gyrus (LIFG) of the brain may carry out potentially syntax-specific functions. Questions: Do these functions reflect hierarchical computations, long-distance dependencies, or both? Could patterns of activity reflect domain-general memory processing? If so, what is the balance of memory encoding, maintenance, and retrieval across different brain regions implicated in dependency-processing?

Computing phrase structure. Hierarchical complexity modulates a range of regions including the LIFG but also the left anterior and posterior temporal lobes (LATL, LPTL). Questions: What is the functional balance between these (and other) regions in terms of syntactic structure-building, semantic composition, and related operations like syntactic reanalysis? How do abstract phrase-structure principles relate to online structure-building?

¹ I am setting aside debates that concern neurobiological but not obviously syntactic issues such as questions about the functional subdivisions of some of the brain regions that are discussed below.


Table 16.2 Summary of brain regions related to syntax

Region | Abbreviation | Process
Left Inferior Frontal Gyrus | LIFG | Hierarchical composition, dependency formation and resolution
Left Anterior Temporal Lobe | LATL | Hierarchical and semantic composition
Left Posterior Temporal Lobe, Temporal-Parietal Junction | LPTL, TPJ | Hierarchical composition, structural prediction, thematic role assignment

Distinguishing syntactic from semantic composition. The left temporal lobe is implicated in mapping between hierarchical structure and semantic representations, including thematic roles. Questions: What is the relationship between phrase-structure operations and semantic composition? How do syntactic and semantic representations interact in argument structure and thematic processing?

The next three sections engage with recent contributions to each of these themes. As an overview, Fig. 16.2 illustrates the principal brain regions discussed here, and Table 16.2 summarizes, very roughly, their syntax-related functions that are introduced below. The outline of themes and brain regions presented here has value for the syntactician in that it points to the kinds of brain signals that may bear on syntactic questions. But progress has been slow in part because processing and representational assumptions are rarely specified in sufficient detail for accounts to be compared. One particular through-line in this critique concerns the role of prediction at multiple levels of sentence-processing. These observations point towards productive ways to address these limits by more rigorously specifying processing assumptions, representational assumptions, and their interactions.

16.4 Syntax-specific computations in the LIFG?


One of the most famous brain regions associated with syntactic processing is the LIFG, or “Broca’s area,” after the initial discovery of a close correlation between language difficulties and damage to this region documented by Pierre Paul Broca in a series of papers in 1861.² Numerous studies using hemodynamic and other methods support a close connection between parts of this brain region and sentence-processing, though there remains fierce debate as to the precise function(s) carried out within this region. Rogalsky and Hickok (2010) offer a measured take on this debate.

fig. 16.2 Schematic representation of syntax-related brain regions of the left hemisphere and the connections between them. Selected cortical landmarks are labelled in gray.

One body of literature links this region to the processing of dependencies. This connection was first suggested by patterns of language deficits following lesions (Caramazza and Zurif 1976; Grodzinsky 2000) where patients showed difficulty matching sentences like (1a) to associated pictures, but not sentences like (1b) (see Chapter 17 for detailed discussion of the deficit-lesion method and syntax).³ These sentences differ in whether they are semantically reversible: The (a) sentence can only be correctly understood by analyzing the boy as the syntactic direct object of chasing, despite being displaced sentence-initially.

(1) a. [The boy]₁ that the girl is chasing t₁ is tall.
    b. [The apple]₁ that the boy is eating t₁ is red.

² Dronkers et al. (2007) offer a fascinating discussion and reanalysis of these historical cases.

³ I am sidestepping the nuanced relationship between clinical symptoms of Broca’s aphasia and the difficulty with dependencies mentioned here, as well as variability between LIFG lesion sites and clinical outcomes; see e.g. Grodzinsky et al. (1999); Berndt and Caramazza (1999); Zurif and Piñango (1999).


Hemodynamic methods have supported such a link between dependency-processing and the LIFG (e.g. Caplan et al. 2008; Stowe et al. 1998; Just et al. 1996; Stromswold et al. 1996; Piñango et al. 2016). For example, Stromswold et al. found that center-embedded sentences like (2a) lead to more activation in this area than right-branching sentences with shorter dependencies, like (2b).

(2) a. The juice₁ [CP that the child spilled t₁ ] stained the rug
    b. The child spilled the juice₁ [CP that t₁ stained the rug]
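One simple way to make the difference in dependency length between (2a) and (2b) concrete is to count the words that intervene between the displaced phrase and its trace. The Python sketch below does this for the two sentences, with the trace written as the token `t`. The helper `linear_distance` is a hypothetical name, and the measure captures only linear distance; it is one possible metric among several and says nothing about hierarchical distance.

```python
def linear_distance(sentence, filler, gap="t"):
    """Count the words between a displaced filler and its gap site."""
    tokens = sentence.split()
    start = tokens.index(filler)
    end = tokens.index(gap, start + 1)  # first gap after the filler
    return end - start - 1

# Sentences (2a) and (2b), with brackets and indices removed.
center_embedded = "The juice that the child spilled t stained the rug"
right_branching = "The child spilled the juice that t stained the rug"

dist_a = linear_distance(center_embedded, "juice")  # that, the, child, spilled
dist_b = linear_distance(right_branching, "juice")  # that
```

On this count the dependency in (2a) spans four words and the one in (2b) spans one.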

Many specific functions could be implicated by these processing asymmetries. Such activation may reflect domain-general processing such as working memory used to store and retrieve the displaced element (Rogalsky et al. 2008; Rogalsky and Hickok 2010), executive control resources used to modulate and control competing activation (Novick et al. 2010; Bornkessel-Schlesewsky and Schlesewsky 2013), sequence-processing (Snijders et al. 2009; Petersson et al. 2010; Hagoort 2013), domain-specific memory processing (Fiebach et al. 2005), or other language-specific processing such as the representation of a trace (Ben-Shachar et al. 2004; 2003; Santi and Grodzinsky 2007; 2010; Makuuchi et al. 2013) or hierarchical embedding (Makuuchi et al. 2009; Friederici et al. 2006; Bahlmann et al. 2008; Zaccarella and Friederici 2015). This debate is made more complex by the fact that these kinds of sentences may differ in syntactic, semantic, and processing dimensions. For example, the sentences in (2) have dependencies that differ in terms of hierarchy (they span different numbers of nodes),⁴ and they also differ in pragmatic structure: The sentences take different terms to be given by the discourse (the juice is given in (2a); the child is given in (2b)). It is also the case that sentences like (2b) are more common than sentences like (2a). Setting aside causality, frequency effects alone modulate processing difficulty and can, among other things, lead the comprehender to predict some structures more often than others (e.g. Fine et al. 2013).

Recent efforts aim to address these challenges. Matchin et al. (2014) seek to tease apart accounts of LIFG activation that are specific to syntactic movement from accounts that are based on the maintenance in memory of a linguistic or domain-general dependency (cf. Santi and Grodzinsky 2007; Rogalsky et al. 2008). They do so by comparing object wh-questions of different dependency-lengths (3a,b) with backwards anaphora (3c,d).

(3) a. [Which song]₁ did [the band] play t₁ at the concert that ended early?
    b. [Which song]₁ did [the band that won the contest] play t₁ at the concert?
    c. Because he₁ extinguished [the flames], [the fireman]₁ saved the resident that arrived later.
    d. Because he₁ extinguished [the flames that burned all night long], [the fireman]₁ saved the resident.

⁴ The examples in (2) also differ in terms of linear distance. Whether working-memory load in these examples reflects linear or hierarchical distance has not been resolved.


Matchin et al. report that longer wh-dependencies and backwards anaphora both lead to increased activation in the LIFG, specifically the pars triangularis sub-region (blue in Figure 16.2). They interpret this finding in terms of memory demands, recognizing that both the wh-word and the fronted anaphora provide a predictive cue that a dependency will be formed. Psycholinguistic evidence indicates that such dependencies are formed actively (e.g. Stowe 1986; Phillips 2006). It is this active prediction that Matchin et al. attribute to the pars triangularis. Their conclusion is consistent with results from Piñango et al. (2016) showing that LIFG activation is modulated by the presence of a syntactic island which blocks such active dependency formation (Phillips 2006).

One takeaway from this study is that distinct representational elements may elicit similar processing responses, as with active-dependency formation here. Progress is being made in mapping these effects to neural signals in domains, like dependency formation, where psycholinguistic theories are relatively sophisticated. When processing steps are not taken into account, however, the role of theoretically important representational differences on brain signals becomes muddied. For example, Shetreet and Friedmann (2014) aim to distinguish A and A′ movement by comparing the fMRI response to topicalization in Hebrew, as in (4), to that of verb-movement, as in (5).

(4) a. Et ha-xayelet ha-nirgeshet ha-safta texabek maxar
       ACC the-soldier.FEM the-excited.FEM the-grandma will-hug tomorrow
    b. Ha-safta texabek et ha-xayelet ha-nirgeshet maxar
       The-grandma will-hug ACC the-soldier.FEM the-excited.FEM tomorrow

(5) a. Maxar texabek ha-safta et ha-xayelet ha-nirgeshet
       Tomorrow will-hug the-grandma ACC the-soldier.FEM the-excited.FEM
    b. Maxar ha-safta texabek et ha-xayelet ha-nirgeshet
       Tomorrow the-grandma will-hug ACC the-soldier.FEM the-excited.FEM

Results for the comparison in (4) indicate that topicalization yields increased activation in LIFG and in the left temporal lobe. This is consistent with the LIFG results reported above where the activation reflected an active dependency that was cued by a fronted element. There were no such equivalent effects for verb-movement. Rather, verb-movement modulated a region in the occipital lobe that has been linked to reading and, indirectly, to phonological processing. The authors connect this latter finding to the possibility that head-movement of the verb is a phonological operation. This representational inference may be premature: the verb-movement stimuli in (5) crucially differ from the topicalization examples in (4) in that they do not have a word that serves to cue an active dependency. In other words, rather than reflecting a difference in the type of dependency present in these stimuli, the absence of effects in the LIFG plausibly reflects the different processing strategies that are elicited by these types of stimuli.

The studies discussed thus far indicate that sub-parts of the LIFG reflect predictive processing associated with certain long-distance dependencies. It remains unclear, however, how domain-specific these processes might be. For example, while Matchin et al. conclude that their findings indicate a general mechanism that spans movement and non-movement dependencies, some syntactic theories connect pronominal binding to the same syntactic mechanisms that account for movement (Kayne 2002; see also Hornstein 1999). So, inferences licensed by neural results are contingent on commitments both at the processing level, in terms of predictiveness, and at the representational level, in terms of an ontological division between binding and movement. Furthermore, the principal effect of dependency distance (example (3) b,d > a,c) is based on a hierarchical manipulation: The dependency spans a relative clause or it does not. To balance this, a different modifier is tacked on to the sentence-final noun in the control sentences. The comparison of relative clauses introduces another variable that is difficult to understand without an explicit linking hypothesis (for example, does it matter that the embedded predicate in (3c) is unergative, while the control embedded predicate in (3d) is unaccusative?).

Makuuchi et al. (2009) aim to disentangle hierarchical processing from dependency length by using fMRI to compare brain activity when dependency length is manipulated separately from hierarchical embedding (see also Santi and Grodzinsky 2010; Makuuchi et al. 2013).⁵ They used German sentences like those in (6a,b), which exhibit center-embedded relative clauses and differ in terms of the linear distances between the matrix subject and verb, as compared with (6c,d) which do not include relative clauses (clause boundaries are indicated and matrix verb–subject pairs are co-indexed).

(6) a. [CP Maria₁, [CP die Hans, [CP der gut aussah], liebte], Johann geküsst hatte₁]
       (Maria who loved Hans who was good looking kissed Johan)
    b. [CP Maria₁, [CP die weinte], Johann geküsst hatte₁] und [CP zwar gestern abend]
       (Maria who cried kissed Johan and that was yesterday night)
    c. [CP Achim₁ den großen Mann gestern am späten Abend gesehen hatte₁]
       (Achim saw the tall man yesterday late at night)
    d. [CP Achim₁ den großen Mann gesehen hatte₁] und [CP zwar am abend]
       (Achim saw the tall man at night and that was late)

⁵ There is some confusion surrounding the term “hierarchical” in the neurolinguistic literature. In some contexts, this term is used to describe nested phrase structure regardless of the formal complexity of the underlying grammar (Pallier et al. 2011; Brennan et al. 2016). In others, “hierarchical” is reserved specifically for self-embedding phrase-structures, such as center-embedded relative clauses (Bahlmann et al. 2008).

Makuuchi et al.’s results indicate that distinct areas within the LIFG are sensitive to these factors. Center-embedding depth ((6) a,b > c,d) most strongly modulates a subpart of the LIFG called the pars opercularis, while subject-verb distance ((6) a,c > b,d) most strongly modulates BOLD signal in a region adjacent to the LIFG, the inferior frontal sulcus. Zaccarella and Friederici (2015) present further evidence that a small sub-area within the LIFG is involved specifically in processing even simple hierarchical structure. Comparing minimal German phrases like jede Apfel ‘each apple’ with a


corresponding non-phrasal word-list (apfel flirk). Phrases, but not lists, activated a subpart of the pars opercularis.⁶ This result is consistent with work by Pallier et al. (2011), discussed in more detail below, which indicates that the LIFG may be recruited for phrasal processing that does not necessarily require center-embedding or long-distance dependencies.⁷

Two notes of caution are in order when considering the implication of these results. First, contra Zaccarella and Friederici (2015), such results do not speak to the neural bases of theoretical functions that serve to define well-formed syntactic representations, like merge. The experiments above tease out brain activity serving something else, like recognizing a particular syntactic representation given some input or interpreting some incremental input by means of intermediary syntactic structures. The operations of an incremental interpreter, or parser, conform to properties of an underlying grammar, but its operations need not be transparent to that grammar (Stabler 1991). We return to the mapping between grammar and parser in Section 16.8. Further, the manipulation of embedding depth in (6) above relies on comparing less-frequently-occurring multiply-embedded relative clauses with common unembedded clauses; thus activation could reflect the difference in frequency or predictability across these stimuli. Also, and in contrast to the dependency-related results reported by Matchin et al. (2014), the effects here did not occur in the pars triangularis, but rather in an adjacent region. Numerous factors could contribute to such a discrepancy (cf. Caplan et al. 2008): the studies looked at different kinds of dependencies (anaphora and movement vs. agreement) and used different tasks (answer occasional comprehension questions, or register a ‘sensicality’ judgment for each stimulus). These studies do not furnish theories that indicate how experimental choices, such as task, and representational differences might affect the outcome.

In sum, a range of studies test whether LIFG processing reflects language-specific computations connected to dependency formation and to hierarchical phrase-structure. These studies reveal subtle differences in processing between parts of this larger brain area. For example, they indicate that the pars triangularis may perform memory functions required to actively maintain a syntactic dependency. However, a clear picture relating these functions to properties of syntactic representations has yet to emerge. A principal reason is that existing studies do not typically make explicit their auxiliary assumptions about syntactic representations and parser operations. Accounts that make commitments for these assumptions have the potential not only to productively guide future work, but to draw together in a more compelling way a range of existing findings. For example, Shetreet and colleagues have conducted numerous studies using subtle syntactic comparisons that bear on representational issues like selection (Shetreet et al. 2007) and argument structure (Shetreet et al. 2009; Shetreet and Friedmann 2012). A broad view from these studies is that linguistically motivated distinctions, including differences with minimal surface effects like unaccusative vs. unergative verbs, lead to clear neural differences with fMRI. It is exciting that such cognitive differences are detectable by current neuroimaging techniques. But current theories limit the advantages we can take of such a non-trivial sensitivity when they do not indicate how representational properties affect incremental processing. The next section turns to studies that have tried to articulate such assumptions specifically as regards building phrase-structure.

⁶ Note that this result contrasts with magnetoencephalographic results from Bemis and Pylkkänen (2011) which show no effects in the LIFG for similar contrasts.

⁷ Petersson et al. (2010) make a similar point using artificial grammar learning: They find LIFG activation in participants who learned an artificial finite-state grammar. However, they draw an overly strong generalization that neural activation which accords with a finite-state grammar precludes explanations of LIFG function in terms of more complex natural language formalisms.

16.5 Processing and representing phrase-structure


Hierarchical structure-building has been connected with a range of brain regions, primarily in the left hemisphere. Early literature highlighted the left anterior temporal lobe (LATL; red area in Figure 16.2) in studies that compared “simple” sentences (i.e. sentences without embedded clauses) to lists of words that did not form phrases or sentences (e.g. Mazoyer et al. 1993; Stowe et al. 1998). In addition to varying in some kind of representational complexity, such stimuli also notably vary in predictability: Phrases contain sequences of words that are more predictable than random word lists.8 Several studies have tried to isolate just activity associated with hierarchical structure by using nonsense “jabberwocky” stimuli which retain sufficient functional morphology to identify aspects of sentence structure (Friederici et al. 2000; Humphries et al. 2006; Pallier et al. 2011; Matchin et al. 2016). The basic idea behind this approach is that such stimuli engage hierarchical syntactic processing, but should not engage conceptual semantics. Further, such stimuli are uniformly unpredictable, a point that is picked up below. For example, Pallier et al. (2011) present stimuli such as those in Table 16.3 which differ in the number of words that make up each maximal phrase. Pallier et al. identify a range of regions whose activation varies in proportion to (the logarithm of) the number of words per phrase. The LATL shows a proportional increase in activation for sensical stimuli only; activation did not change for sentences with “jabberwocky” content (but cf. Friederici et al. 2000; Humphries et al. 2006). In contrast, a region in the

8 Phrasal and word-list stimuli also differ in prosodic structure. Humphries et al. (2005) suggest that prosodic information may be especially relevant for right anterior temporal activation, but not for the left hemisphere.


jonathan r. brennan

Table 16.3 Examples of phrases and sentences made of real words or nonsense pseudo-words, from Pallier et al. (2011) (adapted from French)

Words per phrase | Sense | Example
12 | y | [I believe that you should accept the proposal of your new associate]
12 | n | [I tosieve that you should begept the tropufal of your tew viroate]
6 | y | [the mouse that eats our cheese] [two clients examine this nice couch]
6 | n | [the couse that rits our treeve] [fow plients afomine this kice bloch]
4 | y | [mayor of the city] [he hates this color] [they read their names]
4 | n | [tuyor of the roty] [he futes this dator] [they gead their wames]
2 | y | [looking ahead] [important task] [who dies] [his dog] [few holes] [they write]
2 | n | [troking ahead] [omirpant fran] [who mies] [his gog] [few biles] [they grite]

LIFG9 and more posterior areas of the left temporal lobe did show increases in activation with phrase size even for jabberwocky stimuli (blue and green areas, respectively, in Figure 16.2). Pallier et al. present an explicit processing model to link the phrase-structure differences in their stimuli with their hemodynamic data. Their “accumulator” model operates in accordance with a bottom-up parser such that hierarchical dependencies remain open, thereby monotonically contributing to neural activity, until the phrase-final word is encountered. However, the syntactic assumptions that underlie their model are relatively simplistic and they underdetermine how the crucial jabberwocky stimuli could be analyzed by participants. Consider, for example, the string fran who mies his gog in the eighth row of Table 16.3. This string is analyzed as spanning three unconnected phrases, but it seems equally plausible as a single phrasal fragment by analogy to man who spies his dog. The reasoning behind the chosen syntactic analysis is not discussed in this study. Seeking in part to address this limitation, Brennan et al. (2016) combine models of incremental parsing with more explicit syntactic assumptions (see also Brennan et al. 2012). They parse the text of a storybook according to either a context-free grammar from the Penn Treebank (Marcus et al. 1993) or a textbook-based minimalist grammar (Sportiche et al. 2013). They report that the LATL and also more posterior temporal areas are independently sensitive to hierarchical structure from both syntactic analyses: The more abstract minimalist analysis contributes to characterizing the hemodynamic data after statistically controlling for the simpler context-free grammar

9 Note, though, that Pallier et al. report activation in the pars triangularis and pars orbitalis subregions of the LIFG. These regions are anterior to the pars opercularis indicated for hierarchy-specific processing by Zaccarella and Friederici (2015) and Makuuchi et al. (2009), which are discussed above.

hemodynamic methods


and other control variables. This result highlights how differences between grammatical theories can, in principle, be teased apart with hemodynamic methods. Drawing out such distinctions in practice is the focus of Section 16.8. While the independent contributions of two grammatical formalisms were separable, Brennan et al. report that models which differed on a processing dimension, specifically whether the syntactic structure was constructed predictively or not, could not be distinguished in their data. This may reflect the temporal limitations of fMRI that were highlighted in Section 16.2: these parsing algorithms differ in how they distribute the workload of identifying phrases across a linear sequence, but such temporal differences were lost in the sluggish hemodynamic data. Note that the question of parsing predictiveness has featured prominently in psycholinguistic and neurolinguistic models (e.g. Hagoort and Indefrey 2014), while grammatical details have been much less prominent. These results indicate that the choice of grammatical analysis may matter at least as much as some aspects of parser predictiveness. They also highlight the role of the temporal lobe in parsing phrase structure. Both Pallier et al. (2011) and Brennan et al. (2016) present computationally explicit models for how a particular syntactic analysis should affect hemodynamic signals. While the syntactic analyses adopted in these studies are open to revision, by stating their claims in terms of the quantitative fit of measured data to their models, the proposals make clear the grounds on which they could be tested. This level of computational detail provides a possible avenue towards a more interactive relationship between syntactic theories and neural data, a point that I return to in Section 16.8.
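The way parsing strategies distribute work across a word sequence can be made concrete with a word-by-word node count. The following is a minimal sketch under stated assumptions, not the automata used by Brennan et al. (2016): the bracketed tree is a hand-written example, `node_counts` is a hypothetical helper, a top-down strategy is approximated by crediting each word with the nodes opened just before it, and a bottom-up strategy by crediting each word with the nodes closed just after it.

```python
import re

def node_counts(bracketed):
    """Return (words, top_down, bottom_up) for a labeled bracketed tree.

    top_down[i]  = number of nodes opened immediately before word i
    bottom_up[i] = number of nodes closed immediately after word i
    (a simple bracket-counting approximation, assumed for illustration)
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    words, top_down, bottom_up = [], [], []
    opened = 0            # nodes opened since the previous word
    expect_label = False  # the token after "(" is a category label
    for tok in tokens:
        if tok == "(":
            opened += 1
            expect_label = True
        elif tok == ")":
            bottom_up[-1] += 1  # credit the closed node to the latest word
        elif expect_label:
            expect_label = False  # skip the category label
        else:
            words.append(tok)
            top_down.append(opened)
            bottom_up.append(0)
            opened = 0
    return words, top_down, bottom_up

# Hypothetical example tree, not taken from the studies discussed.
tree = "(S (NP (D the) (N dog)) (VP (V chased) (NP (D a) (N cat))))"
words, top_down, bottom_up = node_counts(tree)
for w, td, bu in zip(words, top_down, bottom_up):
    print(f"{w:8s} top-down={td} bottom-up={bu}")
```

Both regressors account for the same nine nodes, but the top-down count loads the first word of the sentence while the bottom-up count loads the last. Differences at this sub-second timescale are exactly the kind that a slow hemodynamic response may blur.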

16.6 Mapping between syntax and semantics


The studies just discussed are crucially limited in that the neuroimaging results could equally reflect syntactic structure-building, or aspects of semantic composition and interpretation (Stowe et al. 2005). The latter interpretation in terms of semantic composition is, in fact, supported by data from magnetoencephalography (MEG) (Zhang and Pylkkänen 2015; Westerlund et al. 2015) and patient studies (Wilson et al. 2014) that link the LATL with the composition of complex conceptual representations (see Pylkkänen 2016). While the close-knit relationship between structure and interpretation is the key factor underlying linguistic productivity, compositionality poses a “problem” for distinguishing between neural mechanisms involved in identifying phrase structure from those engaged in semantic operations. One approach to isolate differences between conceptual semantic and syntactic processing in the LATL and other areas involves manipulating whether sentences are well-formed or not according to putative syntactic or semantic criteria (e.g. whether they contain a thematic violation or an agreement violation; Rogalsky and Hickok 2009; Vandenberghe et al. 2002; Carreiras et al. 2015; Raettig et al. 2010; Kuperberg et al. 2000;


Friederici et al. 2003). Much like the “jabberwocky” stimuli discussed above, these efforts are difficult to integrate into a broader model of syntactic processing without an explicit theory of how irregular stimuli are recognized and (re)analyzed. At present, the literature has not yet furnished such an account (Pylkkänen et al. 2011; Pylkkänen and Brennan 2018). An alternative approach leverages constructions where semantic interpretation is possible, but there is some mismatch with the surface structure of a sentence. Such mismatches provide opportunities to isolate neural signals that are either specific to semantic representations, or signals that reflect processing steps recruited to resolve apparent mismatches. Husband et al. (2011) offer one example. They use the phenomenon of complement coercion, following Pylkkänen and McElree (2007), to identify fMRI signals that reflect additional semantic processing in examples like (7a), compared to (7b).

(7) a. the novelist began the book before break.
    b. the novelist wrote the book before break.

The verb began in (7a) takes an event-denoting complement, but the noun-phrase complement, the book, denotes an individual. The phrase becomes interpretable when the noun phrase is coerced into an event-denoting term with a meaning resembling “a salient event involving the book” (Pustejovsky 1995; Pylkkänen 2008). Husband et al. report that sentences requiring this manner of coercion, like (7a), lead to increased activation exclusively in the LIFG, specifically the pars triangularis, compared to sentences like (7b). Interestingly, the activation for such coercion was statistically indistinguishable from activation for sentences with implausible meanings that assign psychological states to inanimate objects, such as “The novelist annoyed the book before break.” The authors interpret this activation pattern in terms of a semantic coercion operation that updates the denotation of the noun-phrase. But, the book is more predictable following a verb like wrote compared either to began or to the implausible verb annoyed. These data, thus, are consistent with an explanation of LIFG activation in terms of predictability.10 Such an account connects with the evidence that was discussed above linking LIFG activation with maintaining predictions when processing dependencies. Similar interpretive challenges arise in related work examining brain responses to typical and atypical argument structures. Bornkessel et al. (2005) use the word-order flexibility afforded by scrambling in German along with the selectional properties of different verbs to vary the match between the argument structure of a sentence and its linear order, as in (8). (8)

a. Gestern wurde erzählt dass Peter Lehrerinnen hilft.
   Yesterday was told that Peter_sg teachers_pl helps_sg
b. Gestern wurde erzählt dass Peter Lehrerinnen helfen.
   Yesterday was told that Peter_sg teachers_pl help_pl
c. Gestern wurde erzählt dass Peter Lehrerinnen auffällt.
   Yesterday was told that Peter_sg teachers_pl notices_sg
d. Gestern wurde erzählt dass Peter Lehrerinnen auffallen.
   Yesterday was told that Peter_sg teachers_pl notice_pl

10 There is other evidence that distinguishes coercion from predictability using methodologies with a more fine-grained temporal resolution than fMRI, including MEG (Pylkkänen and McElree 2007) and eye-tracking (Delogu et al. 2017).

These sentences use verbal agreement to indicate whether the first or second noun phrase is the grammatical subject, and different psychological verbs are used to manipulate whether the grammatical subject or object receives the experiencer thematic role. Several brain areas, including the LIFG and a posterior area along the superior temporal sulcus of the LPTL, were sensitive to the interaction between these factors. This result implicates these regions in processes required to extract argument structure information from linear strings. The functional interpretation of these activations depends heavily on one’s assumptions about how argument structure is processed. Bornkessel and colleagues commit to an account in which rapidly extracted heuristic information and hierarchical syntactic information are processed along different neural pathways or “streams” (see Bornkessel-Schlesewsky and Schlesewsky 2013, building from Hickok and Poeppel 2007’s influential model of speech perception). For Bornkessel-Schlesewsky and Schlesewsky, (2013), a ventral stream that passes along the anterior aspect of the temporal lobe to the LIFG (green → red → blue areas in Figure 16.2) processes lexical and compositional semantic information, while a dorsal stream that passes along the posterior temporal lobe through the parietal cortex to the frontal lobe (green → orange → blue areas in Figure 16.2) processes the mapping between linear input and hierarchical structure. The LIFG, where these different information streams converge, is engaged in reconciling mismatches and resolving ambiguity (see also Novick et al. 2010). But, once again, stimulus predictability may play a role: Argument structure configurations that are more common lead to the least activation in the LIFG. On this alternative account, a single pathway (perhaps posterior to anterior along the temporal lobe, and then to the LIFG) serves to parse and interpret hierarchical structure. 
Predictions derived from this sequence then modulate activation in the LIFG based on whether they are matched or not by subsequent stimuli. Note that predictability-based effects are consistent with a broad range of possible neural functions, as predictions have a cascading effect on multiple linguistic levels (semantic, syntactic, lexical, and sub-lexical).11

11 In general, theories that associate the LIFG with a role in resolving mismatches and controlling ambiguous activation have been difficult to tease apart from theories that associate the LIFG with predictive processing: Mismatches and ambiguities arise in contexts where predictions have failed. This debate has long dominated discussion of the electrophysiological N400 response (Kutas and Federmeier 2000; Lau et al. 2008), and progress has been made using tools with high temporal resolution to identify prediction-specific activity that arises prior to the appearance of any mismatches (e.g. Wicha et al. 2004, but see Nieuwland et al. 2017). However, such temporal sensitivity is not available in hemodynamic studies.


Frankland and Greene (2015) pursue an alternative approach that sidesteps some of the difficulties posed by under-specified processing assumptions. They leverage a method for analyzing fMRI data called multi-voxel pattern analysis (MVPA) to tease out brain activity specific to certain kinds of thematic representations. MVPA turns the standard logic of neuroimaging studies on its head: Rather than asking which brain signals show stronger activation in one condition than in another, MVPA quantifies how well a set of voxels classifies whether the brain is processing one kind of stimulus or another (Haxby et al. 2014). An advantage of this tool is that it requires no assumptions about the direction of the hemodynamic effect: Signals may increase or decrease between conditions, and different voxels may register different dynamics. By relaxing assumptions about directionality, MVPA does not require the researcher to identify how a particular representational element will affect neural processing load. Rather, the researcher need only specify that different representations will engage at least some different neural resources. In this way, MVPA can indicate what kind of information is encoded by a particular set of voxels with fewer linking hypotheses about how such information is being processed. Frankland and Greene use this tool to identify brain regions that encode thematic information like agent and patient independently of lexical properties, linear order and, to an important but limited extent, hierarchical structure. Participants read sentences such as those in (9), which appeared in the active and passive voice.

(9) a. The truck hit the ball.
    b. The ball was hit by the truck.
    c. The ball hit the truck.
    d. The truck was hit by the ball.

Their analysis identified clusters of voxels in the middle of the left superior temporal gyrus that differed systematically between sentences that assign different thematic roles to their arguments, like (9a,c) and (9b,d), as compared to sentences that assign the same thematic roles, like (9a,b) and (9c,d) (approximately the green area in Figure 16.2). This brain response is further linked with semantic interpretation by probing for increased emotional responses to sentences like the grandfather kicked the child as compared to its thematically reversed counterpart. Finally, a follow-up experiment identified adjacent sub-parts of this superior temporal region whose responses separately encoded whether a particular noun appeared as an agent (medial and anterior on the superior temporal sulcus), or as a patient (lateral and posterior along the superior temporal gyrus). The result is consistent with a neural model in which distinct groups of neurons along the superior temporal gyrus implement registers that store the value for argument role variables. Of course, the representations at issue are quite simple. Note also that the active–passive pairs (9a,b) and (9c,d) differ in important structural ways. While the authors take the results to tease out brain activity related to thematic information specifically, this depends on adopting a theory in which thematic role assignment is not fully determined by syntactic position (contra e.g. Baker 1997).
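The logic of MVPA described above can be illustrated with a toy classifier. This is only a sketch under stated assumptions: the two-voxel "patterns" are invented numbers, the condition labels are hypothetical, and a leave-one-out nearest-centroid classifier stands in for whatever classifier a real study would use; it is not the analysis pipeline of Frankland and Greene (2015).

```python
def centroid(patterns):
    """Mean pattern across trials, voxel by voxel."""
    n = len(patterns)
    return [sum(p[i] for p in patterns) / n for i in range(len(patterns[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two voxel patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(trials):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    trials is a list of (label, pattern) pairs.
    """
    correct = 0
    for i, (label, pattern) in enumerate(trials):
        train = trials[:i] + trials[i + 1:]
        labels = sorted({lab for lab, _ in train})
        cents = {lab: centroid([p for l2, p in train if l2 == lab])
                 for lab in labels}
        guess = min(labels, key=lambda lab: sq_dist(pattern, cents[lab]))
        correct += guess == label
    return correct / len(trials)

def region_mean(trials, label):
    """Average signal over all voxels and trials of one condition."""
    values = [v for lab, p in trials if lab == label for v in p]
    return sum(values) / len(values)

# Invented data: voxel 1 goes up and voxel 2 goes down in one condition,
# and the reverse in the other, so the regional average barely differs.
trials = [
    ("truck_agent", [1.2, -0.9]), ("truck_agent", [0.8, -1.1]),
    ("truck_agent", [1.1, -1.0]),
    ("truck_patient", [-1.0, 1.1]), ("truck_patient", [-0.9, 0.8]),
    ("truck_patient", [-1.2, 1.0]),
]
print(region_mean(trials, "truck_agent"), region_mean(trials, "truck_patient"))
print(loo_accuracy(trials))
```

In this toy case the region-wide average is near zero in both conditions, so a conventional contrast would find little, yet the pattern across voxels separates the conditions. That is the sense in which the method needs no assumption about the direction of the effect.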


In sum, while the results from Frankland and Greene (2015) alone shed little light on issues of linguistic representation, the study is methodologically revealing in how such connections could be made. For example, classification accuracy within the regions delineated here might serve as a testing ground for the thematic structure of more complex domains, such as psychological predicates (Pesetsky 1995), and for questions at the interface of grammatical representations and gradient conceptual representations (Dowty 1991). The next two sections dig deeper into the limitations that have been raised so far, and aim to carve a path for future research that connects more closely with syntactic questions. But, before moving on it is worth emphasizing the progress that has been made in formulating a preliminary map for the brain bases of syntax. This understanding was previewed in Table 16.2, which summarizes the kinds of functions that may be carried out within four key hubs in the sentence processing network. The value of this foundation should not be understated; it radically narrows down the space of possible brain signals that the researcher must sift through to just those that are most likely to be relevant for syntax. For example, if a researcher is considering a question about constituency, then current evidence points to the LATL and LPTL (red and green colors in Figure 16.2, respectively) as the regions where theoretical differences might lead to detectable effects. The LPTL would also be a promising region to focus on when studying thematic roles or argument structure, and the LIFG (blue in Figure 16.2) for studying memory operations used to resolve long-distance dependencies. Keeping this foundation in mind, we consider next how to go about linking hemodynamic signals with theoretical constructs like dependency resolution or constituency.

16.7 Predictability and syntactic representations


Sensitivity to expectations, or more generally prediction, has emerged repeatedly as a challenge for linking results from subtle linguistic manipulations to conclusions about syntactic representations. This was evident, for example, in the discussion of how long-distance dependencies are resolved, and in the potential processing consequences of using thematically unusual stimuli to distinguish syntactic and semantic processing. Prediction permeates language processing at multiple levels of representation: predictable syntactic or semantic frames facilitate lexical, sub-lexical, and perceptual processes (Dikker and Pylkkänen 2012; Molinaro et al. 2013; see also Traxler 2014 for an overview). Willems et al. (2015) show how predictability modulates a wide range of language-related brain regions (see also Lopopolo et al. 2017). They use fMRI to test for brain responses that reflect two independent aspects of predictability: entropy captures the


number of possible continuations of a particular sentence fragment and reflects predictive strength, while surprisal captures whether a particular term (word, part of speech, etc.) is unexpected given its left context. These are just two of many possible complexity metrics that can be combined with a processing model to yield estimates of processing difficulty; Hale (2016) offers a comprehensive introduction. Modeling context solely in terms of linear word sequences (no hierarchical structure), Willems et al. find distinct fMRI responses for these two aspects of predictability. Entropy modulates brain activity in frontal and parietal regions (e.g. blue and orange areas in Figure 16.2), while surprisal modulates activity in the temporal lobe (green area) bilaterally and elsewhere.12 Such results are consistent with the meta-analysis by Hagoort and Indefrey (2014) and the broader picture that predictions modulate multiple stages of language processing. Henderson et al. (2016) extend these observations to show the specific role that hierarchical structure plays in guiding expectations during a story-reading task. They estimate word-by-word surprisal where the context comes from a phrase-structure grammar. Unexpected words modulate activity in the LATL and LIFG. This is consistent with the sensitivity these regions show to hierarchical structure discussed in Section 16.5 and it grounds those findings in a familiar mechanism: word sequences that form phrases are more predictable than word sequences that do not. This result is consistent with that of Matchin et al. (2016), who find that activation in the LIFG, LATL, and also LPTL is sensitive to hierarchical structure only in more predictable real-word stimuli, and not in less predictable “jabberwocky” stimuli (see also Bonhage et al. 2015). Sensitivity to expectations need not be a confound in exploring how the brain navigates syntactic representations.
Rather, it can be leveraged to study aspects of syntactic representations. Computational accounts of prediction such as those based on the surprisal complexity metric offer plausible linking hypotheses for how syntactic representations bear on neural processes. The study by Brennan et al. (2016), discussed above, illustrates how this can work. Extending work by Henderson et al. (2016) and Willems et al. (2015), Brennan et al. develop a set of models for word-by-word surprisal. The models differ in terms of the syntactic context used to estimate surprisal. They find that surprisal estimates based on a hierarchical context-free grammar correlate better with LATL and LPTL brain activity than when surprisal is estimated solely with linear sequence information. The important point here is the recipe used to test a (simple) representational question about hierarchical structure vs. linear sequences using neural signals: A psycholinguistic model grounded in prediction was cashed out in explicit computational detail using a particular set of grammars, and a surprisal-linking hypothesis derived quantitative predictions about hemodynamic signals. Because the models’ predictions were quantitative, alternative models sharing the same complexity metric could be compared.

12 These results for surprisal are similar to those of Schuster et al. (2016) and Lowder et al. (2018), who model predictability via a behavioral Cloze task, rather than a computational model.
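The two complexity metrics discussed in this section, surprisal and entropy, can be computed from any model that assigns probabilities to continuations. The sketch below assumes the simplest possible case, a bigram model over a four-sentence toy corpus with no smoothing; the corpus and the function names are invented for illustration and are not the models used in the studies cited.

```python
import math
from collections import Counter, defaultdict

def bigram_model(sentences):
    """Count how often each word follows each one-word context."""
    counts = defaultdict(Counter)
    for sent in sentences:
        words = ["<s>"] + sent.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def surprisal(counts, prev, word):
    """-log2 P(word | prev): how unexpected this word is (in bits).

    No smoothing: an unseen continuation raises an error.
    """
    total = sum(counts[prev].values())
    return -math.log2(counts[prev][word] / total)

def entropy(counts, prev):
    """Uncertainty (in bits) about the next word given the context."""
    total = sum(counts[prev].values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts[prev].values())

# Toy corpus, assumed purely for illustration.
corpus = ["the dog barked", "the dog slept", "the cat slept", "the dog barked"]
model = bigram_model(corpus)

print(surprisal(model, "the", "dog"))  # frequent continuation, low surprisal
print(surprisal(model, "the", "cat"))  # rare continuation, higher surprisal
print(entropy(model, "the"))           # uncertainty before the noun is seen
```

Entropy is a property of the context before the next word arrives, whereas surprisal is a property of the word that actually arrives. Swapping the bigram counts for probabilities derived from a phrase-structure grammar changes the estimates but not the metrics, which is what allows models with different syntactic contexts to be compared.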


Importantly, the ubiquitous influence of predictability can cut in multiple directions. Stabler (1999) observes how predictability might mask what appear to be important representational differences. For example, highly articulated syntactic structures, such as those associated with remnant movement analyses (Kayne 1994; 1998), need not pose an extra processing burden so long as such components predictably co-occur, as in, for example, cartographic approaches to syntax (Cinque 1999). This observation means that prediction-based linking hypotheses, like surprisal, may only distinguish representational theories which license different kinds of predictions. The basic intuition is that prediction-based linking hypotheses are affected by the number of choices at each parse step, but not (necessarily) by the contents of each choice. To illustrate the intuition, consider the following analysis fragments for the sentence She spoke only to Bill. (10) illustrates a kind of remnant movement adapted from Kayne (1998) which captures certain scope properties of only, while (11) treats only as an adverbial adjunct. The analysis in (10) appears to be “more complex” as it has more branching nodes and more movement than the analysis illustrated in (11).

The apparent difference in complexity belies a similarity in the number of parsing choices made available by the two analyses. According to the remnant movement analysis in (10), once the parser encounters only, the onlyP projection is fully


Table 16.4 Hypothetical counts for utterances with intransitive verbs

Utterance | Count
Mary called | 200
The bank called | 100
The truck arrived | 100
Mary arrived | 50
The student blossomed | 1
The flowers blossomed | 1

predictable, as are the relative positions of the landing site of the PP in its specifier, and the base-generated VP in its complement. These positions are fixed by the grammar. Accordingly, the parser has an equivalent number of choices as it navigates the representation in (10) as it does in the seemingly simpler analysis in (11). The take-away is that, all else being equal, a remnant-movement analysis may lead to the same kinds of processing complexity, cashed out in terms of structural expectations, as a more conventional analysis (Stabler 1999).

Representational commitments can lead to diverging predictions for linking hypotheses like surprisal under other circumstances. One example comes from the unaccusativity hypothesis for intransitive verbs (Perlmutter 1978): adopting a distinction between unaccusatives and unergatives means that experience with one type of verb does not have a direct bearing on the other. To illustrate this, consider the counts in Table 16.4, which show the number of times a hypothetical language-user may have encountered utterances with various intransitive verbs. In this simplistic example, the verb called has been encountered twice as often with animate subjects compared to inanimate subjects. The reverse is true for arrive, which occurs twice as often with inanimate subjects. What will the parser do when it encounters an uncommon verb, like blossom? By strict count, blossom occurs equally often with animate and inanimate subjects. But this could simply reflect its rare usage; experiencing just one more utterance with this verb would generate a 2:1 ratio one way or the other. One feature of parts of speech is that they permit efficient generalization: Experience with high-frequency verbs guides the learning of lower-frequency items (Gleitman 1990). Consequently, if all intransitive verbs are treated the same, then blossom should be parsed faster with animate subjects, as this is the more common pattern overall (251 animate vs. 201 inanimate subjects in Table 16.4). On the other hand, if intransitive verbs are separated into two categories, unergative (call) and unaccusative (arrive, blossom), then the opposite pattern will hold: blossom will be parsed faster with an inanimate subject, matching prior experience with the verb arrive (101 inanimate vs. 51 animate subjects among the unaccusatives). The core idea is that theories which differ in how prior experiences are represented (i.e. whether different verbs count towards the same kind of experience or not) can lead to different hypotheses about predictive parsing. The next section turns to a framework for specifying the linking hypotheses systematically.
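The reasoning about Table 16.4 can be checked with a few lines of arithmetic. The sketch below takes the counts from the table and assumes, purely for illustration, that generalization works by pooling counts over every verb in the same category; the function name and the pooling rule are assumptions of this sketch rather than a claim about any particular parsing model.

```python
import math

# Counts from Table 16.4, keyed by (verb, subject animacy).
counts = {
    ("call", "animate"): 200, ("call", "inanimate"): 100,
    ("arrive", "animate"): 50, ("arrive", "inanimate"): 100,
    ("blossom", "animate"): 1, ("blossom", "inanimate"): 1,
}
# Category assignment under the unaccusativity hypothesis.
verb_class = {"call": "unergative",
              "arrive": "unaccusative",
              "blossom": "unaccusative"}

def subject_surprisal(verb, animacy, split_classes):
    """Surprisal (bits) of a subject's animacy given the verb's category.

    split_classes=False pools all intransitive verbs together;
    split_classes=True pools only verbs in the same class as `verb`.
    """
    same_category = [v for v in verb_class
                     if not split_classes or verb_class[v] == verb_class[verb]]
    match = sum(counts[(v, animacy)] for v in same_category)
    total = sum(counts[(v, a)] for v in same_category
                for a in ("animate", "inanimate"))
    return -math.log2(match / total)

for split in (False, True):
    for animacy in ("animate", "inanimate"):
        s = subject_surprisal("blossom", animacy, split)
        print(f"split={split!s:5s} {animacy:9s} surprisal={s:.3f} bits")
```

With a single intransitive category, animate subjects of blossom come out as less surprising; with the unergative and unaccusative categories kept apart, the preference reverses. The same input thus yields opposite predictions depending only on the representational assumption.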


16.8 Rigorous linking hypotheses to connect hemodynamic signals with syntactic theory


The sections above ranged over several themes that have long been central to hemodynamic studies of sentence processing: whether neural circuits are syntax-specific, the neural bases for hierarchical structure, and the division of labor between syntactic and semantic compositional processes. Figure 16.2 and Table 16.2 provide a rough guide for the reader to indicate that progress has been made using hemodynamic methods to identify a broad set of brain areas that are compellingly connected to syntax. The literature has not pinned down the granular functions of these brain regions. The focus of this chapter has been to think about why. The short answer is that drawing conclusions from any particular set of syntactically interesting stimuli requires a number of commitments on complex representational and processing issues. Despite the familiarity of this challenge, much work does not adequately specify the relevant assumptions. Several efforts suggest a positive direction to move forward (Matchin et al. 2014; Pallier et al. 2011; Brennan et al. 2016). The common thread in these efforts is a careful specification of the linking hypotheses that mediate between syntactic representations and hemodynamic signals (Embick and Poeppel 2015). Frankland and Greene (2015) model another path that probes how representational distinctions might be neurally encoded when some linking hypotheses are not specified.

What goes into building rigorous theoretical connections between syntactic representations and hemodynamic signals? The schematic in Figure 16.3 seeks to identify a set of distinct linking functions that a comprehensive model would account for. There are two things going on in this diagram. First is the set of links between a linguistic stimulus and some measured brain signal; these are the black boxes.13 These links describe the mental states that the mind enters while processing the stimuli, which is captured by the parsing function P; the relation I between those mental states and equivalent co-descriptions in terms of brain states; and, lastly, the function R that maps from some comprehensive brain state to a measurable brain signal. For hemodynamic methods, R is typically the hemodynamic response function, which accounts for the lag between a neuronal event and corresponding BOLD changes (see Figure 16.1B). The MVPA technique, discussed above in the context of the study by Frankland and Greene (2015), may be applied even when I and R are under-specified. This is because MVPA is sensitive to neural differences regardless of their directionality and regardless of whether adjacent voxels show similar dynamics.
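The role of the response function R can be sketched in code. The example below is an assumption-laden illustration, not the procedure of any study discussed here: it uses a double-gamma response shape with commonly used default parameters (gamma shapes 6 and 16, undershoot ratio 1/6), treats each word as an instantaneous event, and scales each event by an invented word-by-word complexity value standing in for the output of P and I.

```python
import math

def double_gamma_hrf(t, shape_peak=6.0, shape_undershoot=16.0, ratio=1 / 6):
    """Double-gamma response at t seconds after an event.

    Parameter values are common defaults, assumed here for illustration.
    """
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit scale.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, shape_peak) - ratio * gamma_pdf(t, shape_undershoot)

def predicted_bold(onsets, values, tr, n_scans):
    """Predicted signal at each scan: events weighted by their values and
    convolved with the response function (the role of R in the text)."""
    return [sum(v * double_gamma_hrf(k * tr - onset)
                for onset, v in zip(onsets, values))
            for k in range(n_scans)]

# A single unit event shows the lag: the response peaks seconds later
# and then dips below baseline.
impulse = predicted_bold([0.0], [1.0], tr=1.0, n_scans=30)

# Hypothetical word onsets (seconds) and complexity values (e.g. surprisal).
word_onsets = [0.0, 0.5, 1.0, 1.5]
complexity = [1.0, 3.2, 0.4, 2.1]
bold = predicted_bold(word_onsets, complexity, tr=2.0, n_scans=10)
print([round(x, 3) for x in bold])
```

Because words arrive every few hundred milliseconds while the response unfolds over many seconds, the contributions of neighboring words overlap heavily in the predicted signal. This is one way to see why models that differ only in the fine timing of parsing steps can be hard to tell apart with fMRI.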

13 The diagram in Fig. 16.3 is comprehension-centric, but the underlying principles are not. Extending this diagram to cover sentence production amounts to generalizing the function P across modalities, perhaps via the framework of forward models (Pickering and Garrod 2013).

[Figure 16.3 schematic: Lexical input (w1, w2, w3 ...) → P → Mental states (m1, m2, m3 ...) → I → Brain states (b1, b2, b3 ...) → R → Brain signals (s1, s2, s3 ...). Memory detail for P: short-term contents and long-term contents (lexicon, chunks); E conforms to the Grammar G.]

Fig. 16.3 Linking hypotheses connect properties of the grammar with neural signals. Given some lexical input w1, w2, w3 ..., the parser P is a function that incrementally updates mental states m1, m2, m3 ... based on this input and other parameters. These mental states are co-descriptions, by the relation I connecting mind to body, of corresponding brain states b1, b2, b3 .... These brain states affect neural signals, such as the BOLD signal that is measured with fMRI, via a response function R. The parser has various parameters, including the memory structures that it operates over. These memory structures include short-term contents, such as the current parse tree(s), and long-term contents. Contents may change as a consequence of experience with linguistic input. This is captured in the function E whose output conforms to a Grammar G. Possible outputs may be syntactic rules consistent with G and the grouping or “chunking” of syntactic rules and lexical entries that co-occur in usage. On this view, the Grammar is an abstraction over possible states that the parser may enter. This framework does not preclude proposed homomorphisms between properties of G and P (Phillips 2003; Sprouse and Hornstein 2016), but demands that a comprehensive proposal specify the necessary commitments for each linking function.

The Grammar bears on this process indirectly through its influence on the function P mapping from linguistic input to mental states. The Grammar’s effect can be understood via a second component of this diagram, the expansion of the function P that focuses on the role of memory (grey boxes). Parsing is, at its core, a system for structuring memory operations (Lewis and Vasishth 2005). The nature of these memory operations varies across sentence-processing accounts, from cue-based retrieval (Lewis and Vasishth 2005) to last-in-first-out stacks, as in the automata of Hale (2014). What this diagram aims to capture is the consensus that the mental states resulting from function P are conditioned by the contents of memory, and sentence-related memory representations conform to the rules of the Grammar. This follows Chomsky (1965: 9):

a reasonable model of language use will incorporate, as a basic component, the generative grammar that expresses the speaker-hearer’s knowledge of the language

hemodynamic methods


Note that the structure of these memory representations need not be isomorphic to the Grammar. For example, commonly co-occurring grammatical primitives might be grouped together or “chunked” to facilitate processing (e.g. Hale 2014; O’Donnell 2015). This is analogous to compiling in computer science, where algorithms are efficiently reorganized for execution in some particular environment. The contents of memory, including these chunks, are conditioned by the language-user’s experience, which is captured by the function E whose outputs conform to grammatical rules specified in G.

To sum up, grammatical details affect hemodynamic signals indirectly by virtue of how they (i) constrain possible memory structures which (ii) structure the states a parser can enter which in turn (iii) relate systematically to brain states and (iv) modulate the measured brain signals that correspond to some particular stimulus. These linking functions need not be simple. The cognitive neuroscience of language stands to benefit from progress in computational psycholinguistics and computational neuroscience, which offer a range of reasonable proposals for how syntactic structures might be parsed, and how parsing-related mental states connect to measurable brain signals.

Several examples discussed above can be restated in the terms introduced here. For example, Matchin et al. (2014) specify a P with active-dependency formation and cash out I in terms of the time-steps required to maintain this dependency. Brennan et al. (2016) specify G in terms of either context-free or minimalist grammars, P in terms of top-down or bottom-up automata, and I in terms of the number of syntactic nodes traversed by P.

Beyond these examples, a range of existing proposals are compatible with this framework. For example, Sprouse and Hornstein (2016) argue that minimalist syntactic theories are particularly appropriate for modeling the neural bases of syntax. This view is based in part on the radical derivationalism of minimalism, grounded in the application of merge for basic phrase structure and displacement. Such a proposal amounts, in present terms, to a statement of identity between G and P. This is a version of the parser-is-grammar proposal of Phillips (2003). One takeaway from the current discussion, however, is that such isomorphism is not the only way, or even the most plausible way, for syntactic theories to bear on neural signals. Stabler (1991) describes how a simple top-down parser can asynchronously navigate syntactic and semantic rules that define well-formedness from the bottom up (i.e. P and G are not isomorphic). Stabler (2013b) applies a similar framework for incrementally parsing minimalist syntactic structures. Indeed, fMRI results from Brennan et al. (2016), alongside electrophysiological studies such as Brennan and Pylkkänen (2017) and Nelson et al. (2017), are consistent with a parser operating along the lines of the one discussed by Stabler. The current perspective implies that any suitable theory of grammar¹⁴ that also affords incremental parsing may, at least in principle, be neurally plausible.
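The chain of linking functions can be made concrete with a small sketch. The Python below is illustrative only and is not the analysis pipeline of any study cited here. It assumes a toy bracketed parse, uses the number of nodes visited at each word as a stand-in for P together with I (in the spirit of the node-count measure mentioned above), and uses a simplified double-gamma curve as a stand-in for the response function R. The function names `node_counts`, `hrf`, and `predicted_signal`, the example tree, and the word onsets are all hypothetical.

```python
import math

def node_counts(bracketed, strategy="bottom-up"):
    """Count syntactic nodes visited at each word of a bracketed parse.

    top-down:  nodes opened since the previous word
    bottom-up: nodes closed after the word and before the next word
    """
    if strategy not in ("top-down", "bottom-up"):
        raise ValueError("strategy must be 'top-down' or 'bottom-up'")
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    result = []      # [word, count] pairs, in sentence order
    opened = 0       # open brackets seen since the previous word
    previous = None
    for tok in tokens:
        if tok == "(":
            opened += 1
        elif tok == ")":
            if strategy == "bottom-up" and result:
                result[-1][1] += 1
        elif previous == "(":
            pass     # a category label such as NP, not a word
        else:
            result.append([tok, opened if strategy == "top-down" else 0])
            opened = 0
        previous = tok
    return [(word, count) for word, count in result]

def hrf(t):
    """Simplified double-gamma response curve; an assumed shape, t in seconds."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def predicted_signal(onsets, loads, sample_times):
    """Sum one scaled response curve per word (a linear stand-in for R)."""
    return [sum(load * hrf(t - onset) for onset, load in zip(onsets, loads))
            for t in sample_times]

tree = "(S (NP (D the) (N dog)) (VP (V barked)))"   # toy parse, hypothetical
onsets = [0.0, 0.4, 0.8]                            # hypothetical word onsets (s)
sample_times = [2.0 * i for i in range(10)]         # one sample every 2 s

for strategy in ("top-down", "bottom-up"):
    loads = [count for _, count in node_counts(tree, strategy)]
    signal = predicted_signal(onsets, loads, sample_times)
    print(strategy, loads, [round(x, 3) for x in signal])
```

In this toy case the two strategies visit the same total number of nodes but assign them to different words, so they yield different predicted time courses. That difference is what would allow alternative specifications of P to be compared against a measured signal.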

¹⁴ That is, one which matches the “hidden consensus” that human language is mildly context-sensitive (Stabler 2013a).


jonathan r. brennan

At the other extreme, theories like that developed by Townsend and Bever (2001) that impose a strong division between online processing heuristics and grammatical structures can also be formulated in the present framework. These amount to defining P to operate over memory contents that are separate from those which are constrained by G. But being formulable does not entail plausibility: Phillips (2012) points out that such accounts require the input to be “parsed twice” (i.e. a separate memory-internal operation must check the alignment between the output of P and G), and Hale et al. (2018) present electrophysiological evidence against such a “two-stage” account.

There are two broader points. First, specifying the linking functions described in Figure 16.3, at least in part, is necessary to compellingly demonstrate how aspects of syntactic representations affect hemodynamic signals. Second, when these links are specified quantitatively (e.g. with a computational parsing model), then alternative accounts can be compared by virtue of which combination of linking hypotheses yields predictions that best match some measured hemodynamic signal. Clearly, disagreements about any single link in the chain can lead to different predictions, and it is not always clear which aspect of a model should be “blamed” when it fails to match some data. But this challenge is familiar to scientific inference generally. The point advocated here is that arguments connecting neural data to syntax cannot be adequately evaluated until such assumptions are stated explicitly.

A key question, perhaps the central question, is what sorts of grammatical theories might be distinguishable under this framework. This chapter does not offer a general answer to this question, nor does the literature furnish a ready list of candidates to be tested. Rather, answers are forthcoming only in the context of models that make specific commitments to each of these links. Still, I believe there are some promising avenues. One focal point is the relation I from Figure 16.3: this relation describes the link from mental states to brain differences. Complexity metrics, introduced in Section 16.7, cash out parser states in terms of processing load and are one possible way to approximate I. If such metrics are grounded in predictability, then theories whose underlying components redistribute probability mass may be distinguishable. The example above concerning unaccusativity is one such case. Alternative theories of relative clause structure are another, as detailed by Hale (2006) and Yun et al. (2015). Hunter and Dyer (2013) offer an example of how probabilistic approaches could be applied to more granular questions by demonstrating how to distribute corpus statistics onto the representational primitives of a minimalist grammar. This work presents an exciting yet largely untapped opportunity, as hemodynamic signals are acutely sensitive to predictability across brain regions engaged in different aspects of sentence-processing.
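To illustrate how a predictability-based metric could separate grammatical theories, the following Python sketch computes surprisal, the negative log probability of a word given its context, under two hypothetical grammars and asks which set of values tracks a signal more closely. Every number here is invented for illustration: the conditional probabilities do not come from any real grammar or corpus, and the "observed" values are synthetic rather than measured data. The names `surprisal`, `pearson`, `probs_a`, `probs_b`, and `observed` are likewise assumptions made for this example.

```python
import math

def surprisal(p):
    """Surprisal in bits of a word whose conditional probability is p."""
    return -math.log2(p)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical word-by-word probabilities under two grammars that distribute
# probability mass differently over the same four-word sentence.
probs_a = [0.5, 0.25, 0.125, 0.5]
probs_b = [0.25, 0.5, 0.5, 0.125]

# Synthetic per-word response values, invented for this sketch only.
observed = [1.1, 1.9, 3.2, 0.8]

fit_a = pearson([surprisal(p) for p in probs_a], observed)
fit_b = pearson([surprisal(p) for p in probs_b], observed)
best = "A" if fit_a > fit_b else "B"
print("grammar A fit:", round(fit_a, 3))
print("grammar B fit:", round(fit_b, 3))
print("better-fitting grammar:", best)
```

A raw correlation is used here only to keep the sketch short. A realistic comparison would first pass each predictor through a response function and would typically use regression with control variables.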

16.9 Conclusions


Hemodynamic methods have outlined in broad strokes the brain regions implicated in processing syntax (Figure 16.2). To develop models of these brain signals that are sufficiently granular to connect with theoretical syntax debates, linking hypotheses that capture how syntactic representations are processed, and how such processes bear on brain signals, must be explicitly specified (Figure 16.3). This chapter has noted some examples that show the sorts of problems that arise when such links are under-specified, and has also highlighted the progress that is possible when explicit linking hypotheses are provided. Whereas the existing neuroimaging literature indicates where in the brain one ought to look for relevant signals, and the psycholinguistics literature furnishes numerous possible linking functions, the burden is now on researchers to draw from syntax, psycholinguistics, and neuroscience to identify and evaluate the range of possible sets of linking hypotheses. Where successful, these hypotheses serve as partial answers to how syntactic knowledge is used by the brain.

References

Bahlmann, J., R. I. Schubotz, and A. D. Friederici. 2008. Hierarchical artificial grammar processing engages Broca’s area. NeuroImage 42(2): 525–534.
Baker, M. C. 1997. Thematic roles and syntactic structure. In L. Haegeman (ed.), Elements of grammar. Dordrecht: Kluwer.
Bemis, D. K., and L. Pylkkänen. 2011. Simple composition: A magnetoencephalography investigation into the comprehension of minimal linguistic phrases. Journal of Neuroscience 31(8): 2801–2814.
Ben-Shachar, M., T. Hendler, I. Kahn, D. Ben-Bashat, and Y. Grodzinsky. 2003. The neural reality of syntactic transformations: Evidence from functional magnetic resonance imaging. Psychological Science 14(5): 433–440.
Ben-Shachar, M., D. Palti, and Y. Grodzinsky. 2004. Neural correlates of syntactic movement: Converging evidence from two fMRI experiments. NeuroImage 21(4): 1320–1336.
Berndt, R. S., and A. Caramazza. 1999. How “regular” is sentence comprehension in Broca’s aphasia? It depends on how you select the patients. Brain and Language 67(3): 242–247.
Bonhage, C. E., J. L. Mueller, A. D. Friederici, and C. J. Fiebach. 2015. Combined eye tracking and fMRI reveals neural basis of linguistic predictions during sentence comprehension. Cortex 68: 33–47.
Bornkessel-Schlesewsky, I., and M. Schlesewsky. 2013. Reconciling time, space and function: A new dorsal-ventral stream model of sentence comprehension. Brain and Language 125(1): 60–76.
Bornkessel, I., S. Zysset, A. D. Friederici, D. Y. von Cramon, and M. Schlesewsky. 2005. Who did what to whom? The neural basis of argument hierarchies during language comprehension. NeuroImage 26(1): 221–233.
Bornkessel-Schlesewsky, I., M. Schlesewsky, S. L. Small, and J. P. Rauschecker. 2015. Neurobiological roots of language in primate audition: Common computational properties. Trends in Cognitive Sciences 19(3): 142–150.
Boynton, G. M., S. A. Engel, G. H. Glover, and D. J. Heeger. 1996. Linear systems analysis of functional magnetic resonance imaging in human V1. Journal of Neuroscience 16(13): 4207–4221.


Brennan, J., Y. Nir, U. Hasson, R. Malach, D. J. Heeger, and L. Pylkkänen. 2012. Syntactic structure building in the anterior temporal lobe during natural story listening. Brain and Language 120: 163–173.
Brennan, J. R., and L. Pylkkänen. 2017. MEG evidence for incremental sentence composition in the anterior temporal lobe. Cognitive Science 41(S6): 1515–1531.
Brennan, J. R., E. P. Stabler, S. E. Van Wagenen, W.-M. Luh, and J. T. Hale. 2016. Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain and Language 157–158: 81–94.
Caplan, D., E. Chen, and G. Waters. 2008. Task-dependent and task-independent neurovascular responses to syntactic processing. Cortex 44(3): 257–275.
Caramazza, A., and E. B. Zurif. 1976. Dissociation of algorithmic and heuristic processes in language comprehension: Evidence from aphasia. Brain and Language 3(4): 572–582.
Carreiras, M., I. Quiñones, S. Mancini, J. A. Hernandez-Cabrera, and H. Barber. 2015. Verbal and nominal agreement: An fMRI study. NeuroImage 120: 88–103.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Cinque, G. 1999. Adverbs and functional heads. Oxford: Oxford University Press.
Cui, X., S. Bray, D. M. Bryant, G. H. Glover, and A. L. Reiss. 2011. A quantitative comparison of NIRS and fMRI across multiple cognitive tasks. NeuroImage 54(4): 2808–2821.
Delogu, F., M. W. Crocker, and H. Drenhaus. 2017. Teasing apart coercion and surprisal: Evidence from eye-movements and ERPs. Cognition 161: 46–59.
Devlin, J. T., R. P. Russell, M. H. Davis, C. J. Price, J. Wilson, H. E. Moss, P. M. Matthews, and L. K. Tyler. 2000. Susceptibility-induced loss of signal: Comparing PET and fMRI on a semantic task. NeuroImage 11(6): 589–600.
Dikker, S., and L. Pylkkänen. 2012. Predicting language: MEG evidence for lexical preactivation. Brain and Language 127.
Dowty, D. 1991. Thematic proto-roles and argument selection. Language 67(3): 547–619.
Dronkers, N. F., O. Plaisant, M. T. Iba-Zizen, and E. A. Cabanis. 2007. Paul Broca’s historic cases: High resolution MR imaging of the brains of Leborgne and Lelong. Brain 130(5): 1432–1441.
Embick, D., and D. Poeppel. 2015. Towards a computational(ist) neurobiology of language: Correlational, integrated, and explanatory neurolinguistics. Language, Cognition and Neuroscience 30(4): 357–366.
Feinberg, D. A., S. Moeller, S. M. Smith, E. Auerbach, S. Ramanna, M. Gunther, M. F. Glasser, K. L. Miller, K. Ugurbil, and E. Yacoub. 2010. Multiplexed echo planar imaging for subsecond whole brain fMRI and fast diffusion imaging. PLoS One 5(12): e15710.
Fiebach, C., M. Schlesewsky, G. Lohmann, D. von Cramon, and A. Friederici. 2005. Revisiting the role of Broca’s area in sentence processing: Syntactic integration versus syntactic working memory. Human Brain Mapping 24(2): 79–91.
Fine, A. B., T. F. Jaeger, T. A. Farmer, and T. Qian. 2013. Rapid expectation adaptation during syntactic comprehension. PLoS One 8(10): e77661.
Frankland, S. M., and J. D. Greene. 2015. An architecture for encoding sentence meaning in left mid-superior temporal cortex. Proceedings of the National Academy of Sciences USA 112(37): 11732–11737.
Friederici, A. D., and S. M. E. Gierhan. 2013. The language network. Current Opinion in Neurobiology 23(2): 250–254.
Friederici, A. D., M. Meyer, and D. Y. von Cramon. 2000. Auditory language comprehension: An event-related fMRI study on the processing of syntactic and lexical information. Brain and Language 74(2): 289–300.


Friederici, A. D., S.-A. Rüschmeyer, A. Hahne, and C. J. Fiebach. 2003. The role of left inferior frontal and superior temporal cortex in sentence comprehension: Localizing syntactic and semantic processes. Cerebral Cortex 13(2): 170–177.
Friederici, A. D., C. J. Fiebach, M. Schlesewsky, I. D. Bornkessel, and D. Y. von Cramon. 2006. Processing linguistic complexity and grammaticality in the left frontal cortex. Cerebral Cortex 16(12): 1709–1717.
Gleitman, L. 1990. The structural sources of verb meanings. Language Acquisition 1(1): 3–55.
Grodzinsky, Y. 2000. The neurology of syntax: Language use without Broca’s area. Behavioral and Brain Sciences 23(1): 1–21.
Grodzinsky, Y., M. M. Piñango, E. Zurif, and D. Drai. 1999. The critical role of group studies in neuropsychology: Comprehension regularities in Broca’s aphasia. Brain and Language 67(2): 134–147.
Hagoort, P. 2013. MUC (memory, unification, control) and beyond. Frontiers in Psychology 4: 416.
Hagoort, P., and P. Indefrey. 2014. The neurobiology of language beyond single words. Annual Review of Neuroscience 37: 347–362.
Hale, J. 2006. Uncertainty about the rest of the sentence. Cognitive Science 30(4): 643–672.
Hale, J. T. 2014. Automaton theories of human sentence comprehension. Stanford, CA: CSLI.
Hale, J. T. 2016. Information-theoretical complexity metrics. Language and Linguistics Compass 10(9): 397–412.
Hale, J. T., C. Dyer, A. Kuncoro, and J. R. Brennan. 2018. Finding syntax in human encephalography with beam search. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, vol. 1, 2727–2736.
Haxby, J. V., A. C. Connolly, and J. S. Guntupalli. 2014. Decoding neural representational spaces using multivariate pattern analysis. Annual Review of Neuroscience 37: 435–456.
Henderson, J. M., W. Choi, M. W. Lowder, and F. Ferreira. 2016. Language structure in the brain: A fixation-related fMRI study of syntactic surprisal in reading. NeuroImage 132: 293–300.
Hickok, G., and D. Poeppel. 2007. The cortical organization of speech processing. Nature Reviews Neuroscience 8(5): 393–402.
Hornstein, N. 1999. Movement and control. Linguistic Inquiry 30(1): 69–96.
Humphries, C., J. R. Binder, D. A. Medler, and E. Liebenthal. 2006. Syntactic and semantic modulation of neural activity during auditory sentence comprehension. Journal of Cognitive Neuroscience 18(4): 665–679.
Humphries, C., T. Love, D. Swinney, and G. Hickok. 2005. Response of anterior temporal cortex to syntactic and prosodic manipulations during sentence processing. Human Brain Mapping 26(2): 128–138.
Hunter, T., and C. Dyer. 2013. Distributions on minimalist grammar derivations. In Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), 1–11.
Husband, E. M., L. A. Kelly, and D. C. Zhu. 2011. Using complement coercion to understand the neural basis of semantic composition: Evidence from an fMRI study. Journal of Cognitive Neuroscience 23(11): 3254–3266.
Just, M., P. Carpenter, T. Keller, W. Eddy, and K. Thulborn. 1996. Brain activation modulated by sentence comprehension. Science 274(5284): 114–116.
Kaan, E., and T. Y. Swaab. 2002. The brain circuitry of syntactic comprehension. Trends in Cognitive Sciences 6(8): 350–356.
Kayne, R. S. 1994. The antisymmetry of syntax. Cambridge, MA: MIT Press.


Kayne, R. S. 1998. Overt vs. covert movement. Syntax 1(2): 128–191.
Kayne, R. S. 2002. Pronouns and their antecedents. In S. D. Epstein and T. D. Seely (eds), Derivation and explanation in the Minimalist Program, 133–166. Malden, MA: Blackwell.
Kuperberg, G. R., P. K. McGuire, E. T. Bullmore, M. J. Brammer, S. Rabe-Hesketh, I. C. Wright, D. J. Lythgoe, S. C. R. Williams, and A. S. David. 2000. Common and distinct neural substrates for pragmatic, semantic, and syntactic processing of spoken sentences: An fMRI study. Journal of Cognitive Neuroscience 12(2): 321–341.
Kutas, M., and K. D. Federmeier. 2000. Electrophysiology reveals semantic memory use in language comprehension. Trends in Cognitive Sciences 4: 463–469.
Lau, E. F., C. Phillips, and D. Poeppel. 2008. A cortical network for semantics: (De)constructing the N400. Nature Reviews Neuroscience 9(12): 920–933.
Lewis, R., and S. Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science 29(3): 375–419.
Logothetis, N. K., and B. A. Wandell. 2004. Interpreting the BOLD signal. Annual Review of Physiology 66: 735–769.
Lopopolo, A., S. L. Frank, A. van den Bosch, and R. M. Willems. 2017. Using stochastic language models (SLM) to map lexical, syntactic, and phonological information processing in the brain. PLoS One 12(5): e0177794.
Lowder, M., W. Choi, F. Ferreira, and J. Henderson. 2018. Lexical predictability during natural reading: Effects of surprisal and entropy reduction. Cognitive Science 42: 1166–1183. https://doi.org/10/gdqzz9
Makuuchi, M., J. Bahlmann, A. Anwander, and A. D. Friederici. 2009. Segregating the core computational faculty of human language from working memory. Proceedings of the National Academy of Sciences USA 106(20): 8362–8367.
Makuuchi, M., Y. Grodzinsky, K. Amunts, A. Santi, and A. D. Friederici. 2013. Processing noncanonical sentences in Broca’s region: Reflections of movement distance and type. Cerebral Cortex 23(3): 694–702.
Marantz, A. 2005. Generative linguistics within the cognitive neuroscience of language. Linguistic Review 22(2–4): 429–445.
Marcus, M., M. Marcinkiewicz, and B. Santorini. 1993. Building a large annotated corpus of English: The Penn treebank. Computational Linguistics 19(2): 313–330.
Marr, D. 1982. Vision: A computational investigation into the human representation and processing of visual information. New York: Freeman.
Matchin, W., C. Hammerly, and E. Lau. 2016. The role of the IFG and PSTS in syntactic prediction: Evidence from a parametric study of hierarchical structure in fMRI. Cortex 88: 106–123.
Matchin, W., J. Sprouse, and G. Hickok. 2014. A structural distance effect for backward anaphora in Broca’s area: An fMRI study. Brain and Language 138: 1–11.
Mazoyer, B. M., N. Tzourio, V. Frak, A. Syrota, N. Murayama, O. Levrier, G. Salamon, S. Dehaene, L. Cohen, and J. Mehler. 1993. The cortical representation of speech. Journal of Cognitive Neuroscience 5(4): 467–479.
Meyer, L., and A. D. Friederici. 2016. Neural systems underlying the processing of complex sentences. In G. S. Hickok and S. L. Small (eds), Neurobiology of Language, 597–606. New York: Elsevier.
Molinaro, N., P. Barraza, and M. Carreiras. 2013. Long-range neural synchronization supports fast and efficient reading: EEG correlates of processing expected words in sentences. NeuroImage 72: 120–132.


Nelson, M. J., I. El Karoui, K. Giber, X. Yang, L. Cohen, H. Koopman, S. S. Cash, L. Naccache, J. T. Hale, C. Pallier, and S. Dehaene. 2017. Neurophysiological dynamics of phrase-structure building during sentence processing. Proceedings of the National Academy of Sciences USA 114(18): E3669–E3678.
Nieuwland, M. S., S. Politzer-Ahles, E. Heyselaar, K. Segaert, E. Darley, N. Kazanina, S. Von Grebmer Zu Wolfsthurn, F. Bartolozzi, V. Kogan, A. Ito, D. Mézière, D. J. Barr, G. Rousselet, H. J. Ferguson, S. Busch-Moreno, X. Fu, J. Tuomainen, E. Kulakova, E. M. Husband, and F. Huettig. 2018. Large-scale replication study reveals a limit on probabilistic prediction in language comprehension. eLife 7: e33468. https://doi.org/10.7554/eLife.33468
Novick, J. M., J. C. Trueswell, and S. L. Thompson-Schill. 2010. Broca’s area and language processing: Evidence for the cognitive control connection. Language and Linguistics Compass 4(10): 906–924.
O’Donnell, T. J. 2015. Productivity and reuse in language: A theory of linguistic computation and storage. Cambridge, MA: MIT Press.
Pallier, C., A.-D. Devauchelle, and S. Dehaene. 2011. Cortical representation of the constituent structure of sentences. Proceedings of the National Academy of Sciences USA 108(6): 2522–2527.
Perlmutter, D. 1978. Impersonal passives and the unaccusative hypothesis. In Proceedings of the Fourth Annual Meeting of the Berkeley Linguistics Society, vol. 4, 157–189.
Pesetsky, D. 1995. Zero syntax: Experiencers and cascades. Cambridge, MA: MIT Press.
Petersson, K. M., V. Folia, and P. Hagoort. 2010. What artificial grammar learning reveals about the neurobiology of syntax. Brain and Language 120(2): 83–95.
Phillips, C. 2003. Linear order and constituency. Linguistic Inquiry 34(1): 37–90.
Phillips, C. 2006. The real-time status of island phenomena. Language 82(4): 795–823.
Phillips, C. 2012. We don’t understand everything twice. In M. Sanz, I. Laka, and M. Tanenhaus (eds), Language down the garden path: The cognitive and biological basis for linguistic structure: Papers in honor of Thomas G. Bever. Oxford: Oxford University Press.
Pickering, M. J., and S. Garrod. 2013. An integrated theory of language production and comprehension. Behavioral and Brain Sciences 36(4): 329–347.
Piñango, M. M., E. Finn, C. Lacadie, and R. T. Constable. 2016. The localization of long-distance dependency components: Integrating the focal-lesion and neuroimaging record. Frontiers in Psychology 7: 1434.
Poeppel, D., and D. Embick. 2005. Defining the relation between linguistics and neuroscience. In A. Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, ch. 6. Mahwah, NJ: Erlbaum.
Poldrack, R. A., J. A. Mumford, and T. E. Nichols. 2011. Handbook of functional MRI data analysis. Cambridge: Cambridge University Press.
Pustejovsky, J. 1995. The generative lexicon. Cambridge, MA: MIT Press.
Pylkkänen, L. 2008. Mismatching meanings in brain and behavior. Language and Linguistics Compass 2(4): 712–738.
Pylkkänen, L. 2016. Composition of complex meaning: Interdisciplinary perspectives on the left anterior temporal lobe. In G. Hickok and S. Small (eds), Neurobiology of language. New York: Academic Press.
Pylkkänen, L., and J. R. Brennan. 2018. Composition. In M. Gazzaniga, G. Mangun, and D. Poeppel (eds), The cognitive neurosciences, 6th edn. Cambridge, MA: MIT Press.
Pylkkänen, L., and B. McElree. 2007. An MEG study of silent meaning. Journal of Cognitive Neuroscience 19(11): 1905–1921.


Pylkkänen, L., J. Brennan, and D. K. Bemis. 2011. Grounding the cognitive neuroscience of semantics in linguistic theory. Language and Cognitive Processes 26(9): 1317–1337.
Raettig, T., S. Frisch, A. D. Friederici, and S. A. Kotz. 2010. Neural correlates of morphosyntactic and verb-argument structure processing: An EfMRI study. Cortex 46(5): 613–620.
Rogalsky, C., and G. Hickok. 2009. Selective attention to semantic and syntactic features modulates sentence processing networks in anterior temporal cortex. Cerebral Cortex 19(4): 786–796.
Rogalsky, C., and G. Hickok. 2010. The role of Broca’s area in sentence comprehension. Journal of Cognitive Neuroscience 23(7): 1–17.
Rogalsky, C., W. Matchin, and G. Hickok. 2008. Broca’s area, sentence comprehension, and working memory: An fMRI study. Frontiers in Human Neuroscience 2: 14.
Rossi, S., S. Telkemeyer, I. Wartenburger, and H. Obrig. 2012. Shedding light on words and sentences: Near-infrared spectroscopy in language research. Brain and Language 121(2): 152–163.
Santi, A., and Y. Grodzinsky. 2007. Taxing working memory with syntax: Bihemispheric modulations. Human Brain Mapping 28: 1089–1097.
Santi, A., and Y. Grodzinsky. 2010. fMRI adaptation dissociates syntactic complexity dimensions. NeuroImage 51(4): 1285–1293.
Schuster, S., S. Hawelka, F. Hutzler, M. Kronbichler, and F. Richlan. 2016. Words in context: The effects of length, frequency, and predictability on brain responses during natural reading. Cerebral Cortex 26(10): 3889–3904.
Shetreet, E., and N. Friedmann. 2012. Stretched, jumped, and fell: An fMRI investigation of reflexive verbs and other intransitives. NeuroImage 60(3): 1800–1806.
Shetreet, E., and N. Friedmann. 2014. The processing of different syntactic structures: fMRI investigation of the linguistic distinction between wh-movement and verb movement. Journal of Neurolinguistics 27(1): 1–17.
Shetreet, E., D. Palti, N. Friedmann, and U. Hadar. 2007. Cortical representation of verb processing in sentence comprehension: Number of complements, subcategorization, and thematic frames. Cerebral Cortex 17(8): 1958–1969.
Shetreet, E., N. Friedmann, and U. Hadar. 2009. The neural correlates of linguistic distinctions: Unaccusative and unergative verbs. Journal of Cognitive Neuroscience 22(10): 2306–2315.
Snijders, T. M., T. Vosse, G. Kempen, J. J. A. Van Berkum, K. M. Petersson, and P. Hagoort. 2009. Retrieval and unification of syntactic structure in sentence comprehension: An fMRI study using word-category ambiguity. Cerebral Cortex 19(7): 1493–1503.
Sportiche, D., H. Koopman, and E. Stabler. 2013. An introduction to syntactic analysis and theory. Chichester: Wiley.
Sprouse, J., and N. Hornstein. 2016. Syntax and the cognitive neuroscience of syntactic structure building. In G. Hickok and S. L. Small (eds), Neurobiology of Language, 165–174. New York: Academic Press.
Stabler, E. 1991. Avoid the pedestrian’s paradox. In R. C. Berwick, S. P. Abney, and C. Tenny (eds), Principle-based parsing: Computation and psycholinguistics, 199–237. Alphen aan den Rijn: Kluwer Academic.
Stabler, E. 1999. Remnant movement and complexity. In G. Bouma, E. W. Hinrichs, G.-J. M. Kruijff, and R. Oehrle (eds), Constraints and resources in natural language syntax and semantics, 299–326. Stanford, CA: CSLI.


Stabler, E. P. 2013a. The epicenter of linguistic behavior. In M. Sanz, I. Laka, and M. K. Tanenhaus (eds), Language down the garden path: The cognitive and biological basis of linguistic structures, 316–323. Oxford: Oxford University Press.
Stabler, E. P. 2013b. Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science 5(3): 611–633.
Starke, M. 2001. Move reduces to Merge: A theory of locality. PhD thesis, University of Tromsø.
Stowe, L. A. 1986. Parsing wh-constructions: Evidence for on-line gap location. Language and Cognitive Processes 1(3): 227–245.
Stowe, L. A., C. A. Broere, A. M. Paans, A. A. Wijers, G. Mulder, W. Vaalburg, and F. Zwarts. 1998. Localizing components of a complex task: Sentence processing and working memory. Neuroreport 9(13): 2995–2999.
Stowe, L. A., M. Haverkort, and F. Zwarts. 2005. Rethinking the neurological basis of language. Lingua 115: 997–1042.
Stromswold, K., D. Caplan, N. Alpert, and S. Rauch. 1996. Localization of syntactic comprehension by positron emission tomography. Brain and Language 52: 452–473.
Townsend, D. J., and T. G. Bever. 2001. Sentence comprehension: The integration of habits and rules, 11–44. Cambridge, MA: MIT Press.
Traxler, M. J. 2014. Trends in syntactic parsing: Anticipation, Bayesian estimation, and good-enough parsing. Trends in Cognitive Sciences 18(11): 605–611.
Vandenberghe, R., A. C. Nobre, and C. J. Price. 2002. The response of left temporal cortex to sentences. Journal of Cognitive Neuroscience 14(4): 550–560.
Westerlund, M., I. Kastner, M. Al Kaabi, and L. Pylkkänen. 2015. The LATL as locus of composition: MEG evidence from English and Arabic. Brain and Language 141: 124–134.
Wicha, N. Y. Y., E. M. Moreno, and M. Kutas. 2004. Anticipating words and their gender: An event-related brain potential study of semantic integration, gender expectancy, and gender agreement in Spanish sentence reading. Journal of Cognitive Neuroscience 16(7): 1272–1288.
Willems, R. M., S. L. Frank, A. D. Nijhof, P. Hagoort, and A. van den Bosch. 2015. Prediction during natural language comprehension. Cerebral Cortex 26(6): 2506–2516.
Wilson, S. M., A. T. DeMarco, M. L. Henry, B. Gesierich, et al. 2014. What role does the anterior temporal lobe play in sentence-level processing? Neural correlates of syntactic processing in semantic variant primary progressive aphasia. Journal of Cognitive Neuroscience 26(5): 970–985.
Yun, J., Z. Chen, T. Hunter, J. Whitman, and J. Hale. 2015. Uncertainty in processing relative clauses across East Asian languages. Journal of East Asian Linguistics 24: 113–148.
Zaccarella, E., and A. D. Friederici. 2015. Merge in the human brain: A sub-region-based functional investigation in the left pars opercularis. Frontiers in Psychology 6: 1818.
Zhang, L., and L. Pylkkänen. 2015. The interplay of composition and concept specificity in the left anterior temporal lobe: An MEG study. NeuroImage 111: 228–240.
Zurif, E. B., and M. M. Piñango. 1999. The existence of comprehension patterns in Broca’s aphasia. Brain and Language 70(1): 133–138.

Chapter 17

Aphasia and Syntax

William Matchin and Corianne Rogalsky

17.1 Introduction


The representations, operations, and principles of syntactic theories are generally held to be informative about how language is represented in the human brain (Chomsky 1965; Sprouse and Hornstein 2015). For this reason, there is powerful potential for research on the nature of linguistic deficits due to brain damage, or aphasia, to inform syntactic theory. This is particularly so given that there exist disorders that appear to impair core aspects of language, such as agrammatism. Likewise, researchers and clinicians that seek to characterize the deficits in patients with aphasia and to develop assessment and treatment protocols can in principle greatly benefit from the insights into the nature of language provided by syntactic theory. However, there is currently little interaction between theoretical syntax and aphasiology. This is likely due to several reasons, including sociological ones such as the lack of researchers proficient in both fields and ineffective communication among researchers from these very different traditions. However, we suspect that there are deeper reasons for this disconnect. In particular, we suggest two fundamental obstacles: (i) a lack of insight into how grammatical operations apply to real-time sentence processing, and (ii) a focus by syntactic theories on grammatical operations, principles and modules that do not line up well with the currency of functional neuroimaging and neuropsychology: the cortical area. In addition, the assumption that agrammatism is a syndrome caused by a single underlying cognitive source potentially related to a syntactic module is likely false, as is the assumption that damage to Broca’s area is necessary and sufficient to cause agrammatism and/or Broca’s aphasia. These are related to issues that have been raised by previous authors (Mohr et al. 1978; Badecker and Caramazza 1985; Poeppel and Embick 2005; Embick and Poeppel 2015; Fridriksson et al. 2015), and we reinforce them here.

william matchin and corianne rogalsky

In this chapter we will first outline the methods of research in aphasia and how they have been applied to syntax. Following this, we will review the history of the interaction of these two fields, particularly with respect to the putative syndrome of “agrammatism” that is most relevant to syntactic theory. Please note that all findings and examples are in English, unless otherwise noted. We will make key observations about the successes and failures of this research. In light of these failures, we propose splitting agrammatism into at least two separate syndromes: one that is tied to deficits in phonological working memory (WM) resources, and another that is tied to deficits in morphosyntactic WM (Caplan and Waters 1999; 2013; Fiebach et al. 2005; Lewis and Vasishth 2005; Rogalsky et al. 2015; Matchin 2017). This distinction allows us to capture aspects of agrammatism that appear to be domain-general as well as those that appear to be specific to language. We then suggest steps to reconnect syntactic theory to the study of aphasia.

17.2 Aphasia

..........................................................................................................................

Aphasia is a language impairment due to brain injury. Impairments range in severity, and can affect auditory speech perception, speech production, reading, and/or writing. Most aphasia research has historically focused on individuals who have experienced a stroke (disruption of blood flow in the brain) resulting in aphasia, but aphasia can result from almost any type of brain injury, including traumatic brain injury, tumor, surgical removal of brain tissue, or infection. Aphasia can also result from neurodegenerative diseases such as frontotemporal dementia, particularly one subtype often termed “primary progressive aphasia” (Gorno-Tempini et al. 2011; Mesulam et al. 2014). While there are numerous ways to classify the subtypes of aphasia, the classifications most relevant to this chapter are discussed below.

17.2.1 Clinical aphasia assessments and classifications

Typical aphasia assessment measures range from five-minute bedside assessments for patients with acute brain damage (i.e. typically within 24 hours of brain injury) to much more extensive test batteries, typically administered by speech-language pathologists in an outpatient setting to chronic patients in order to develop a long-term treatment plan. The details of these assessments can be found elsewhere (e.g. Patterson 2008), but here we will summarize the basic principles of aphasia assessments that are critical when interpreting the existing aphasia literature relevant to syntactic theory and the development of new lines of research. Perhaps the two most common aphasia batteries referenced in the aphasia research literature are the Western Aphasia Battery (WAB; Kertesz 2007) and the Boston Diagnostic Aphasia Examination (BDAE; Goodglass and Kaplan 1983). Both of these
English-language batteries are designed to assess individuals with brain damage on several dimensions of language, including multiple aspects of auditory comprehension (word, sentence, and discourse), spontaneous speech production, speech repetition, naming, reading, and writing. Both the WAB and BDAE also contain nonverbal measures, including visual-spatial processing, manual gestures, and mathematical calculations to better understand the specificity of any language deficits present. The WAB’s scoring procedure provides an aphasia classification for each patient, with the possible aphasia classifications of: global, Broca’s, transcortical motor, Wernicke’s, transcortical sensory, mixed transcortical, conduction, and anomic. The BDAE does not provide criteria for aphasia classifications, but rather an approximate percentile ranking of performance in each language domain tested, with several subcategories of possible error types within each domain. These percentiles can then be used to compute expressive and comprehension competency indices. Regarding overall severity, the WAB provides an aphasia quotient, which is essentially a composite score indicating the overall severity of speech production and comprehension deficits regardless of the type of aphasia, and the BDAE includes a subjective severity rating between 0 and 5 for combined speech production and comprehension abilities. Thus, patients designated as having, for example, “severe Broca’s aphasia” may vary regarding the exact characteristics of their deficits. In addition to overall performance in these domains, error types are also tabulated to gain a more precise description of a patient’s deficits. For example, in tests of speech production, there are two main types of paraphasias (or word generation errors): phonemic (also known as literal) and verbal. Phonemic paraphasias are typically defined as words in which phoneme substitution errors are present (e.g. blupt for blunt or tup for top). 
Verbal paraphasias are word production errors in which an entire real word is substituted for the target word. If the produced word and the target word are highly semantically related (e.g. mother for wife), this type of verbal paraphasia is often described as a semantic paraphasia. An error can also be considered a mixed paraphasia if more than one type of error is made within the same word. Two other types of speech production errors that are examined in aphasia assessments are agrammatic and paragrammatic speech. Agrammatic and paragrammatic speech are discussed in detail below, in relation to Broca’s aphasia and Wernicke’s aphasia. Briefly, agrammatic speech refers to a general lack of grammatical structure and closed class items (e.g. determiners, inflection), whereas paragrammatic speech refers to the presence (even abundance) of grammatical information, but a misuse of such information (e.g. agreement errors) and an overall lack of coherent sentence structure. It is important to note there is no standard clinical definition of “agrammatic” aphasia and the WAB and BDAE do not include a cut-off for performance to be considered agrammatic or paragrammatic. Thus, studies of syntactic processing in aphasia that examine “agrammatic” patients may potentially have very different participant inclusion criteria.
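The paraphasia categories described above can be pictured as a small decision procedure. The following is only a toy sketch, not a clinical scoring method: letters stand in for phonemes, and the lexicon, the list of semantically related pairs, and the `max_edits` threshold are all hypothetical choices made for illustration. Mixed paraphasias are not handled.

```python
# Toy illustration only: real scoring is done by trained clinicians on
# phonemic transcriptions. LEXICON and RELATED are hypothetical stand-ins.
LEXICON = {"blunt", "top", "wife", "mother", "table", "dog"}
RELATED = {frozenset(("wife", "mother"))}  # hand-listed semantic neighbours


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (letters approximate phonemes)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def classify_error(produced: str, target: str, max_edits: int = 2) -> str:
    """Assign a produced word to one of the error types described in the text."""
    if produced == target:
        return "correct"
    if produced in LEXICON:
        # A real word replaced the target: verbal paraphasia,
        # called semantic when the two words are closely related.
        if frozenset((produced, target)) in RELATED:
            return "semantic paraphasia"
        return "verbal paraphasia"
    if edit_distance(produced, target) <= max_edits:
        # A non-word that differs from the target by a few segments.
        return "phonemic paraphasia"
    return "unclassified"
```

Under these assumptions, `classify_error("blupt", "blunt")` falls in the phonemic category and `classify_error("mother", "wife")` in the semantic one, mirroring the examples given above.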

17.2.2 Broca’s aphasia

Broca’s aphasia is perhaps the best-studied of the aphasias, and is most often the focus of testing predictions of syntactic theory in individuals with aphasia. Broca’s aphasia is characterized by effortful, error-filled speech (both spontaneous production and repetition), and relatively intact comprehension (Goodglass and Kaplan 1983; Damasio 1992) (crucial exceptions to this “intact” comprehension will be discussed in detail). Both phonemic and verbal paraphasias are often present in the speech production of patients with Broca’s aphasia, and patients are typically aware of their deficits and make attempts at error correction (Goodglass and Kaplan 1983). Another attribute of Broca’s aphasia that is of particular interest here is agrammatic production,¹ i.e. the absence of closed-class items, bound morphemes, and grammatical structure, resulting in single-word or short-phrase utterances (Jakobson 1956; Goodglass 1968; Gleason et al. 1975; Goodglass 1976; Kean 1977). Here is an example of agrammatic speech production in an individual with Broca’s aphasia retelling the tale of Cinderella from Love and Brumm (2011: 207) (prompted by a picture-only book of the story to reduce memory load and facilitate comprehension of the task):

...Happy. B- all- ballerina. I can’t say it. Uh, name. ... Sisters two. Mother evil. Moping. Dress, bird, and, uh, mouse. One, two, three. Uh, angels? Fairy! Crying and uh, uh, mother uh, mother lock -ed it. … Yeah! Mommy, mommy, mommy! And uh, horse and dog. Wands. Uh, uh, muck lock lop moppins [muffins]. And uh, mouse and birds or? ... Oh well. Uh, bored. Curled. Pretty. And, uh, twelve. Shoe. Uh, running. Yeah. And uh, sisters. Um, shoe? One. Shoe? Right there? Bigger. Uh, and uh, thats right. … That’s right (motions putting on a shoe). ...Yeah. And affer [ever] and ever. ...

Broca’s aphasia has traditionally been linked to damage in Broca’s area, which is typically defined as the posterior two-thirds of the left inferior frontal gyrus (Brodmann 1909; Anwander et al. 2007) (Figure 17.2, pars opercularis and pars triangularis combined). Both the aphasia type and brain region are named after Paul Broca, a French scientist and physician in the mid 1800s who was among the first to relate speech production impairments with left frontal lobe damage (in a French-speaking adult). However, in the context of examining syntactic theory in aphasia patients, it is critical to not conflate the impairments of Broca’s aphasia, particularly agrammatic production, with damage to Broca’s area. Damage to Broca’s area is not necessary for the presence of Broca’s aphasia (e.g. damage to the arcuate fasciculus and adjacent subcortical structures can be sufficient; Fridriksson, Bonilha, and Rorden 2007), and damage circumscribed to Broca’s area is typically not sufficient to elicit Broca’s aphasia (Mohr 1976; Mohr et al. 1978; Fridriksson et al. 2015). Patients with Broca’s aphasia often have large left-hemisphere lesions that span portions of the frontal, temporal, and parietal lobes (Mohr et al. 1978; Naeser and Hayward 1978). A recent large-scale study found that the most common pattern of left-hemisphere brain damage associated with Broca’s aphasia includes the posterior portion of Broca’s area (pars opercularis) and the posterior superior temporal gyrus (Fridriksson et al. 2015). The only large-scale lesion-symptom mapping study of agrammatic production that we are aware of (Wilson et al. 2010) has linked syntactic production deficits with damage to the anterior portion of Broca’s area (the pars triangularis), the supplementary motor area, and white matter underlying these structures. Large-scale studies of agrammatic comprehension strongly highlight damage to posterior temporal-parietal cortex, with lesser association to frontal damage (Thothathiri, Kimberg, and Schwartz 2012; Magnusdottir et al. 2013 (Icelandic); Rogalsky et al. 2018). These results indicate that Broca’s area is clearly implicated in Broca’s aphasia and agrammatic production, but that dysfunction in Broca’s area is not solely driving the linguistic deficits of these syndromes.

This distinction between Broca’s aphasia and Broca’s area is critical to keep in mind when interpreting much of the existing literature examining various elements of syntactic theory in patients. Many studies have been aimed at better understanding the role of Broca’s area in syntactic processing, and thus select subjects based on the presence or absence of damage to Broca’s area (but the Broca’s area patients almost always also have damage in surrounding regions) (e.g. Linebarger, Schwartz, and Saffran 1983; Grodzinsky 2000). Other studies include subjects based on the diagnosis of Broca’s aphasia or presence of agrammatic production (e.g. Gleason et al. 1975; Caramazza and Zurif 1976). Across the subjects in these studies, the areas of brain damage, and therefore the affected mechanisms, may vary widely. These differences are important to consider when comparing findings across aphasia studies.

¹ In the literature agrammatism has been used to describe agrammatic production, agrammatic comprehension, or both. In this chapter, for consistency and specificity, we will use the terms agrammatic production and agrammatic comprehension, and use agrammatism only with regard to overarching theories or deficits related to both domains. Similarly, agrammatic aphasia is not defined in a consistent manner; thus we will specify the relevant characteristics in each case as it arises.

17.2.3 Wernicke’s aphasia

As a group, fluent aphasias are characterized by the relative ease of producing connected speech, although the speech produced is often error-filled (Gordon 1988). Wernicke’s aphasia is perhaps the most well-known fluent aphasia, and is typically characterized by fluent speech (i.e. roughly the opposite of the effortful speech seen in Broca’s aphasia) but impaired speech comprehension, including single-word comprehension. The speech of individuals with Wernicke’s aphasia typically contains both verbal and phonemic paraphasias, and is often described as paragrammatic, i.e. their speech often contains grammatical information, but grammatical, fully-formed sentences are rare (Goodglass and Kaplan 1983). Individuals with Wernicke’s aphasia are generally unaware of their deficits, and thus attempts at error correction are minimal. Love and Brumm (2011: 210) also provide a nice example of paragrammatic speech production of an individual with Wernicke’s aphasia retelling the Cinderella story:

First I started with a s- little, small it was the lady’s little which wa- was thing that I wanted before I could remember, but I can’t do it now. This uh- I look carefully about what he he looked around but he couldnt really try it about there. At the same time, all these things, at least one, two, three people. Which were clever to the people. This, this and she supposed to do that. … I clevered what how much that little thing she went right here. Which is fine. I did as much as I could. At the same time, at the beginning, she started to look at the um, girl who is looking for all this stuff that was going through while he was there and I watched and watched that stuff that was going and through I looked at the mice doing that. …

Similar to the relationship between Broca’s aphasia and Broca’s area, the relationship between Wernicke’s aphasia and Wernicke’s area is tenuous, at best. In fact, there is not even a consensus amongst neuroscientists studying language as to the exact location of Wernicke’s area (Mesulam et al. 2015; Tremblay and Dick 2016; Binder 2017). Nonetheless, individuals diagnosed with Wernicke’s aphasia often have large left temporal-parietal lobe damage, centered on the posterior temporal lobe (Damasio 1992; Ogar et al. 2011). Patients with Wernicke’s aphasia have served as a valuable control group in many studies of agrammatic comprehension in Broca’s aphasia (e.g. Caramazza and Zurif 1976; Zurif et al. 1993; Grodzinsky and Finkel 1998) because comprehension deficits in Wernicke’s aphasia can typically be attributed to lexical-semantic or phonological deficits, and thus serve as a control for these types of impairments in sentence-level tasks often used to study agrammatic comprehension. While we do not discuss paragrammatism any more in this chapter, it is an understudied phenomenon that could prove fruitful for understanding the nature of syntax (see Matchin and Hickok (2020) for discussion of paragrammatism and its impact on the cortical organization of syntax).

17.2.4 Conduction aphasia

Conduction aphasia is also considered a fluent aphasia, characterized by largely intact auditory speech comprehension, repetition deficits, and phonemic paraphasias (often phoneme substitution errors) in speech production (Goodglass 1992; Bartha and Benke 2003; Baldo, Klostermann, and Dronkers 2008). Speech production in conduction aphasia is otherwise near normal; these patients do not exhibit the agrammatic production deficits characteristic of Broca’s aphasia (Gleason et al. 1975; Goodglass 1992). With respect to agrammatism, conduction aphasia is useful to compare with Broca’s aphasia, because individuals with conduction aphasia present with some of the same agrammatic comprehension patterns as seen in Broca’s aphasia (Caramazza and Zurif 1976). Thus conduction aphasia provides an avenue to examine performance dissociations across sentence-comprehension tasks and identify potential unique mechanisms underlying agrammatic production and comprehension. Conduction aphasia has traditionally been framed as a “disconnection syndrome,” thought to be due to damage to the arcuate fasciculus, a large white-matter pathway
that connects the posterior superior temporal lobe and Broca’s area (Geschwind 1965). The predominant theory was that a disconnection between auditory speech representations in the temporal lobe and motor speech representations in Broca’s area resulted in the rather selective repetition deficits and phonemic speech errors in conduction aphasia (Wernicke 1874; Lichtheim 1885). However, there now is strong evidence that conduction aphasia results from cortical damage, not white-matter damage. The damage is most frequently in the vicinity of area Spt, a left-hemisphere posterior superior temporal region near the end of the Sylvian fissure where the temporal and inferior parietal lobes meet (see Damasio and Damasio 1980: fig. 2; Buchsbaum et al. 2011). There certainly remains debate regarding the cortical and white-matter damage contributions to conduction aphasia (Fridriksson et al. 2010, Icelandic), in large part because the arcuate fasciculus feeds into area Spt, and thus the two are quite frequently both damaged by the same brain injury. However, the functional properties of area Spt are now fairly well characterized, and tightly align with the deficits present in conduction aphasia: Spt is frequently implicated in phonological WM and speech repetition (i.e. auditory–motor integration for speech) (Hickok et al. 2003; Buchsbaum et al. 2011; Isenberg et al. 2012; Rogalsky et al. 2015), and Spt has been shown to be more activated as a function of greater phonological load (Okada et al. 2003; Fegen, Buchsbaum, and D’Esposito 2015). It is generally agreed that the primary characteristics of conduction aphasia (impaired repetition, phonemic paraphasias) result from impairments in phonological processing, which are particularly evident in phonological WM tasks (Baldo, Klostermann, and Dronkers 2008).
Patients with conduction aphasia may perform relatively well on single-word or simple-phrase repetition tasks, but exhibit significant declines in performance on non-word repetition tasks, particularly for multi-syllabic non-words, due to increased phonological processing demands (Goodglass 1992). Similarly, repetition of sentences with abstract content is typically more impaired than repetition of more concrete items (Butterworth, Campbell, and Howard 1986). It also has been frequently noted that when patients with conduction aphasia make repetition errors, they often are still able to reproduce the main gist or idea, although it is not a verbatim reproduction (Baldo, Klostermann, and Dronkers 2008). The general consensus from these findings and others is that syntactic and semantic processing are largely intact in conduction aphasia, but that deficits arise in tasks where one must rely upon phonological information to be stored and/or retrieved (Baldo, Klostermann, and Dronkers 2008; Gvion and Friedmann 2012).

17.3 Research methods

..........................................................................................................................

One potential roadblock between aphasia research and syntactic theory is difficulty generating testable hypotheses, particularly regarding selecting the right method and aphasia population(s) to investigate. It also is critical to understand the limitations of current aphasia methods, and potential difficulties that arise due to the substantial individual variability present in aphasia populations (Jarso et al. 2013). These methodological topics are summarized in this section, to provide the linguist with an overview of potential avenues for collaboration.

17.3.1 Neuropsychology

Many neuropsychological studies report single dissociations, i.e. a group of patients with the same diagnosis or symptoms all exhibit a deficit on task X, but not task Y, thereby suggesting that these patients have a selective deficit in the functions required by task X. For example, individuals with Broca’s aphasia can perform at or near ceiling for sentence comprehension tasks with high semantic-processing demands, but often exhibit impairments in the comprehension of sentences with complex syntactic structures (example adapted from Van Orden, Pennington, and Stone 2001). This finding would suggest a selective impairment for syntactic processing. However, the “holy grail” of neuropsychological studies has traditionally been the double dissociation (Teuber 1955; Van Orden, Pennington, and Stone 2001), where two patient groups are tested and one group is found to be impaired on task X but not task Y, while the other group is impaired on task Y but not task X. A finding of a single dissociation does not rule out the possibility that all individuals with any type of aphasia may perform similarly on semantic vs. syntactic comprehension tasks, not just those with Broca’s aphasia (perhaps due to overall task difficulty or attentional fatigue). However, a double dissociation between Broca’s and Wernicke’s aphasia patients on the semantic and syntactic tasks would indicate a selective syntactic deficit in Broca’s aphasia and a selective semantic deficit in Wernicke’s aphasia. Single and double dissociations found within subsets of individuals with agrammatism (with subsets defined based on aphasia severity, location of brain damage, and/or symptomatology) on sentence comprehension tasks have proved particularly insightful regarding the neural mechanisms of syntactic processing, and are discussed throughout this chapter.

There are several common experimental designs that are used to investigate sentence comprehension abilities in aphasia.
“Off-line” tasks such as sentence–picture matching, acceptability judgments, enactment (e.g. manipulating an object as described in the sentence), and memory probe tasks are often employed. Studies also have frequently incorporated “on-line” measures such as reading whole sentences presented all at once, self-paced reading or listening, and error detection tasks (Caplan et al. 2016). It is important to choose the type of task carefully: task-specific cognitive and linguistic demands can affect sentence comprehension performance in individuals with aphasia, particularly related to agrammatic comprehension patterns (Cupples and Inglis 1993; Caplan, DeDe, and Michaud 2006; Caplan, Michaud, and Hufford 2013). For example, only two out of 42 aphasia patients reported on by Caplan, DeDe, and Michaud (2006) exhibited consistent performance for one sentence structure type (passives, object-relative clauses, or reflexive pronouns) across three sentence tasks: picture matching, object manipulation, and grammaticality judgments. The brain regions implicated in agrammatic comprehension also can significantly vary depending on the task and types of sentence structures used (Gutman et al. 2010; Tyler et al. 2011; Caplan et al. 2016). In fact Caplan et al. (2016) found no lesion pattern to be consistently associated with comprehension of any one of the four syntactic elements they investigated across four tasks. Lastly, the modality of presentation may also affect sentence comprehension performance and the brain areas implicated: across studies of control subjects, Broca’s area is more frequently implicated in visual (reading) tasks than auditory comprehension tasks (see Rogalsky et al. 2018: table 4). The more frequent involvement of frontal regions in reading vs. auditory tasks may be related to greater involvement of subvocal articulation during reading compared to auditory tasks (Baddeley, Thomson, and Buchanan 1975; Slowiaczek and Clifton 1980; Daneman and Newson 1992). In summary, neuropsychological approaches to studying sentence processing in aphasia have provided, and continue to provide, valuable insights into the nature of sentence-processing deficits that can result from brain damage (discussed in detail in Section 17.4). Nonetheless, task differences are important to consider when interpreting aphasia research findings, as the relative cognitive and linguistic demands may reduce or exacerbate sentence comprehension deficits in individuals with aphasia.
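The inferential logic of single and double dissociations discussed in this section can be made concrete with a small sketch. It is a simplification under stated assumptions: each group is summarized by a mean accuracy per task, "impaired" is defined by a hypothetical cutoff of 0.75, and the accuracy values in the usage note are invented for illustration. Actual studies would rely on statistical comparison with control data rather than a fixed cutoff.

```python
def selective_deficit(scores, cutoff):
    """Return the single task a group is impaired on, or None.

    `scores` maps task name -> mean accuracy (0 to 1). A group counts as
    selectively impaired only if exactly one task falls below `cutoff`
    (a hypothetical threshold, not a clinical standard).
    """
    low = [task for task, accuracy in scores.items() if accuracy < cutoff]
    return low[0] if len(low) == 1 else None


def dissociation_pattern(group_a, group_b, cutoff=0.75):
    """Label the pattern shown by two patient groups tested on the same tasks."""
    a = selective_deficit(group_a, cutoff)
    b = selective_deficit(group_b, cutoff)
    if a and b:
        # Opposite selective deficits form a double dissociation. The same
        # deficit in both groups may instead reflect task difficulty.
        return "double dissociation" if a != b else "shared pattern"
    if a or b:
        return "single dissociation"
    return "no dissociation"
```

With invented group means such as `{"semantic": 0.95, "syntactic": 0.55}` for one group and `{"semantic": 0.50, "syntactic": 0.90}` for the other, the function returns "double dissociation"; if both groups show the first profile, it returns "shared pattern", which is the situation a single dissociation alone cannot rule out.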

17.3.2 Neuroimaging and lesion–symptom mapping

For researchers interested in investigating the function(s) of a particular area of the brain, the advent of accessible neuroimaging techniques in the past 30 years has provided a wealth of resources to aphasia researchers to better describe and quantify the location of brain damage. However, not all structural MRI scans are created equal; there are several types of structural MRIs used in aphasia research, each providing different insights into the spatial extent, degree, and nature of the brain damage (for a review related to aphasia research, see Fisher, Prichard, and Warach 1995 and Shahid et al. 2017). “Lesion–symptom mapping” is a term used to describe a group of prominent methods used by cognitive neuroscientists to identify what anatomical locations in the brain are critical for a given behavioral task. Lesion–symptom mapping is essentially the marriage of neuropsychological and neuroimaging techniques. Like neuropsychological studies described in Section 17.3.1, the gold standard for lesion–symptom mapping studies is also the double dissociation (Teuber 1955). While there are valid concerns regarding the assumption of double dissociations that the brain is organized in a localized fashion, there is a general consensus that a double dissociation reflects that the two tasks require distinct, although perhaps not independent, neural resources (Shallice 1979; Van Orden, Pennington, and Stone 2001). The first lesion–symptom mapping studies aimed at localizing specific linguistic functions to specific brain regions were based on regions of interest (ROI). One approach was to identify participants according to the presence of a set of behavioral
symptoms of interest, and to locate regions of brain damage that had high degrees of overlap in those participants. A second approach is to identify participants based on the presence of damage in a particular ROI, e.g. Broca’s area, and compare their performance on a behavioral task (or tasks) to the performance of participants without damage to that ROI (either control subjects or patients with brain damage elsewhere). ROI studies continue to provide valuable information regarding the anatomical localization of language functions, much of which we discuss in subsequent sections of this chapter. However, it is important to note that while ROI studies can implicate a particular brain region in a particular function, they typically cannot determine if the function is specific to the ROI, or if the function is supported by regions adjacent to the ROI that are also typically damaged simply due to the vascular structure of the brain, including underlying white-matter pathways of the ROI that connect distant brain regions.

Voxel-based lesion–symptom mapping (VLSM; Bates et al. 2003) is currently a popular technique to determine if patients with and without damage within each voxel of the brain perform significantly differently on a particular behavioral task. A voxel is the three-dimensional unit of data generated by an MRI scan; typical resolution of structural MRI scans used in VLSM studies is a voxel size of 1 mm³ (Huettel, Song, and McCarthy 2014: 13). There are several variations to VLSM, but the overall approach is to calculate a t-test or other appropriate statistic for each voxel in the brain, to determine if patients with damage in that voxel perform significantly differently on a behavioral task than patients without damage in that voxel (see Rorden, Karnath, and Bonilha 2007 for discussion regarding the appropriate statistics to use in VLSM). Analyses of covariance (Bates et al. 2003) and multivariate analyses (Caplan et al. 2007; Yourganov et al. 2016) also can be used in VLSM to identify critical regions for a task when several regions are implicated (Wilson 2017). These post-hoc approaches allow researchers to better understand connectivity within language networks of the brain, which is critical given that there is mounting evidence from aphasia and typical language processing that single brain regions do not support a language task in isolation, but rather are part of complex, dynamic functional brain networks (Sebastian et al. 2016; Wilson 2017). Similar techniques to VLSM also are used with known anatomical or functional regions of interest as the unit of measurement instead of voxels, such as sub-regions of Broca’s area, primary auditory cortex, and the superior temporal gyrus (e.g. Caplan et al. 2016). While this ROI lesion–symptom mapping approach may have reduced spatial resolution, it also is one way to reduce multiple comparison problems inherent to voxel-based approaches, may improve statistical power, and potentially better accommodates individual variability of the functional organization within an anatomical region.

Voxel-based morphometry (VBM; Ashburner and Friston 2000) is another technique to investigate brain–behavior relationships. VBM has the same goal as VLSM: to identify voxels or brain regions in which abnormalities are associated with decreased performance on a particular task. An advantage to VBM is that it also can be used in patient populations that often have graded atrophy or more subtle brain damage than stroke patients, such as individuals with primary progressive aphasia or other neurodegenerative diseases. This is possible because VBM uses a continuous measure,
tissue density in each voxel, instead of the binary “damaged”/“not damaged” distinctions in VLSM. This difference allows for greater sensitivity to different degrees of tissue atrophy, and eliminates some of the arbitrary decisions regarding the threshold for considering a voxel to be “damaged.” Gray-matter and/or white-matter density can be examined using VBM. Notably, the only large-scale lesion–symptom mapping study of agrammatic production (discussed later in the chapter; Wilson et al. 2010) was conducted using VBM in individuals with primary progressive aphasia. For a detailed description and comparison of VLSM and VBM, see Wilson (2017). Finally, diffusion tensor imaging (DTI) is a neuroimaging technique that can provide quantitative information regarding not only white-matter density but also the direction and integrity of white-matter tracts. These measurements provide information regarding what brain regions are physically connected to one another by white-matter pathways, and the strength of these connections. DTI can be acquired using the same MRI scanners as structural MRIs used in the VLSM and VBM analyses described above. DTI structural connectivity analyses can also be combined with other neuroimaging and lesion–symptom mapping methodologies to provide a more detailed picture of the local and network-level disruptions to brain function due to brain injury; this combination of techniques can account for aphasia symptoms better than a single method alone (Del Gaizo et al. 2017). DTI studies themselves also have provided improved resolution regarding the white-matter pathways that support language (e.g. Rilling et al. 2008; Hayashi et al. 2012).
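The mass-univariate logic of VLSM described in this section (one test per voxel, comparing patients with and without damage there) can be sketched in a few lines. This is a minimal illustration under several assumptions that are not from the studies cited: the statistic is a Welch t value, the multiple-comparison problem is handled with a permutation-based threshold on the maximum statistic (one possible choice among several), and the four "voxels", twelve patients, and scores are invented toy data. Real analyses operate on registered lesion maps with dedicated software.

```python
import random
from statistics import mean, variance


def welch_t(intact, lesioned):
    """Welch t statistic; positive when the lesioned group scores lower."""
    if len(intact) < 2 or len(lesioned) < 2:
        return 0.0  # too few patients on one side to test this voxel
    se2 = variance(intact) / len(intact) + variance(lesioned) / len(lesioned)
    if se2 == 0:
        return 0.0
    return (mean(intact) - mean(lesioned)) / se2 ** 0.5


def voxel_t_values(lesions, scores):
    """lesions[p][v] is 1 if patient p has damage in voxel v, else 0."""
    t_values = []
    for v in range(len(lesions[0])):
        intact = [s for row, s in zip(lesions, scores) if row[v] == 0]
        lesioned = [s for row, s in zip(lesions, scores) if row[v] == 1]
        t_values.append(welch_t(intact, lesioned))
    return t_values


def vlsm_sketch(lesions, scores, n_perm=500, alpha=0.05, seed=0):
    """Return observed t values and a permutation threshold on the maximum t."""
    rng = random.Random(seed)
    observed = voxel_t_values(lesions, scores)
    shuffled = list(scores)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any true lesion-behavior link
        max_null.append(max(voxel_t_values(lesions, shuffled)))
    max_null.sort()
    threshold = max_null[min(n_perm - 1, int((1 - alpha) * n_perm))]
    return observed, threshold


# Invented toy data: 12 patients, 4 voxels; only voxel 2 tracks behavior.
columns = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1],
]
lesions = [list(row) for row in zip(*columns)]
scores = [45, 50, 40, 55, 48, 52, 90, 85, 95, 88, 92, 86]
observed, threshold = vlsm_sketch(lesions, scores)
critical = [v for v, t in enumerate(observed) if t > threshold]
```

A VBM-style analysis would differ mainly in replacing the binary damaged/intact split with a continuous tissue-density value per voxel, for example by correlating density with the behavioral score.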

17.3.3 Individual variability, compensation, functional reorganization

Even with recent advances, no neuropsychology or neuroimaging method can lead one to predict with even close to 100% accuracy how any one patient will perform on any given language task. Individual variability is quite high amongst patients with the same aphasia diagnosis, as well as amongst patients with highly similar areas of brain damage (Pedersen, Vinter, and Olsen 2004 (Danish); Lazar and Antoniello 2008; Lazar et al. 2008). For example, initial severity of deficits and lesion size explain only ∼30–40% of variability in language abilities of chronic aphasia patients (Lazar et al. 2008). Jarso et al. (2013: 454) identify four distinct neurobiological mechanisms that affect language abilities post-stroke other than the location and size of the stroke: “(a) reperfusion [return of blood flow into an area initially blocked by the stroke]; (b) recovery from diaschisis [i.e. disruption of function in healthy regions due to damage in functionally or structurally connected regions]; (c) recovery from structural disconnection; and (d) ‘reorganization’ of language.” “Reorganization” in this case refers to the process of intact brain structures supporting functions previously supported by damaged regions. One way to partially circumvent these largely unpredictable neurobiological factors is to examine only acute aphasia patients, i.e. test individuals within approximately 24–48 hours of experiencing a stroke so that reperfusion, recovery, and reorganization
are minimized (Hillis and Heidler 2002; Jarso et al. 2013). There are of course many challenges to testing aphasia patients at their bedside immediately after a stroke, including difficulties with recruitment, having sufficient time available for testing, and the physical and mental fatigue that are often pronounced in the acute stages of stroke (Wilson 2017). Nonetheless, some prominent aphasia researchers have successfully collected language data on acute aphasia patients, thereby providing valuable insights into a variety of language functions, including sentence processing, which are discussed in this chapter; the work of Argye Hillis and colleagues has been particularly pioneering in this area. Despite the potential benefits of testing acute aphasia patients, additional factors can also affect sentence comprehension performance in even acute aphasia data, including motivation, overall health, and pre-stroke cognitive (e.g. WM, attention), metalinguistic and verbal abilities. For lesion–symptom mapping studies, an additional source of variability is individual differences in pre-stroke laterality of language networks. In right-handed individuals, the vast majority of strokes that cause language deficits are in the left hemisphere. But, this does not mean that the left hemisphere alone supports language function, especially speech comprehension. Numerous functional MRI studies identify bilateral temporal regions to be activated by auditory speech stimuli (Binder et al. 1994; Hickok and Poeppel 2007), and epilepsy patients with their left hemisphere anesthetized during a presurgical procedure still perform significantly above chance on a speech comprehension task using only their right hemisphere (Hickok et al. 2008). 
Deficits for complex sentences or sentences with non-canonical word order in patients with right-hemisphere damage have also been shown (Caplan, Hildebrandt, and Makris 1996), illustrating some contribution of the right hemisphere to higher aspects of language. Thus, it is not surprising that many reported “comprehension deficits” associated with left-hemisphere damage are genuine declines in performance relative to a control group, yet the patients often do not perform at chance or at floor level, because the right hemisphere contributes to the task. Together, these sources of potential individual variability in aphasia studies require careful consideration when designing and interpreting small case studies, and provide strong evidence of the need for large neuropsychological and lesion–symptom mapping studies with sufficient statistical power that include covariates for the sources of variability that can be quantified (see Shahid et al. 2017 for an in-depth discussion of power analyses for lesion–symptom mapping studies).

17.3.4 Cognitive deficits in aphasia

Although one may tend to think of an individual with aphasia as having specific deficits restricted to the language domain (speech production, comprehension, etc.), this is rarely the case. Domain-general cognitive functions such as WM, attention, general alertness, cognitive control, and inhibition have been found to be impaired in many individuals with aphasia; the nature and severity of these cognitive deficits vary widely both within and across types of aphasia (Erickson, Goldinger, and LaPointe 1996; Caspari et al. 1998; Murray, Holland, and Beeson 1998; Brownsett et al. 2014; Pettigrew and Hillis 2014; Enke 2016). Critically for the present chapter, verbal WM deficits are frequently present in individuals with Broca’s aphasia as well as in individuals with conduction aphasia (Caspari et al. 1998), which may explain some of the sentence comprehension patterns present in both of these aphasia populations. The cognitive psychology literature encompasses several different definitions and dimensions of working memory. For the general purposes of this chapter, we will use Baddeley’s (2010) definition: “the system or systems that are assumed necessary in order to keep things in mind while performing tasks such as reasoning, comprehension, and learning.”

The reason for the overlap of language and cognitive deficits in aphasia is twofold: (i) the pattern of damage resulting from a stroke does not adhere to fine-grained functional boundaries in the brain, but rather is determined by anatomical and vascular boundaries which can often group together distinct cognitive and linguistic processes (Fedorenko, Duncan, and Kanwisher 2012); thus brain tissues supporting multiple domains are often affected by the same brain injury; and (ii) domain-general cognitive resources support language processing (e.g. Vaden et al. 2013), and thus when they are damaged, language impairments can arise (Geranmayeh, Brownsett, and Wise 2014). As discussed earlier, common behavioral tasks used to measure sentence comprehension have been shown to tax linguistic and cognitive resources in different ways (Caplan et al. 2016). Therefore it is critical to use tasks in aphasia research that do not introduce unintended cognitive demands that may be unrelated to the study’s hypotheses.
The different cognitive demands present across tasks also can be used to a researcher’s advantage to examine the role of different cognitive functions in syntactic processing or other linguistic functions. For example, performance differences on sentence comprehension tasks with varying levels of WM demands can be compared to determine how performance may change when WM resources are taxed and thus not available to support the syntactic processing demands. The remainder of the chapter examines the previous interactions between aphasia research and syntactic theory, and proposes new directions for advancing our knowledge of the neural computations of sentence processing through collaboration between the two fields.

17.4 Aphasia and syntactic theory: a history


The history of aphasiology and syntax² can be characterized by three stages: a long period of study of grammatical abilities of aphasia patients without interaction of the fields; a period of intense interaction starting in the 1970s and lasting until the early to mid-2000s; and finally an overall rejection of the use of detailed theoretical syntactic constructions to study aphasia among most researchers.³ This history is quite informative about potential roadblocks that researchers may encounter in attempting to revive a close association between the fields. The most widely investigated syntactic theory in aphasia is the theory of Government and Binding (GB; Chomsky 1981). Therefore, our review will chiefly concern GB and its successor within mainstream generative grammar, the Minimalist Program (MP; Chomsky 1995; Adger 2003; Hornstein 2009). However, the conclusions we draw here should be relevant to researchers in other frameworks, such as tree-adjoining grammar (TAG; Frank 2002), head-driven phrase structure grammar (HPSG; Pollard and Sag 1994), lexical-functional grammar (LFG; Bresnan 2001), and construction grammar (CxG; Goldberg 1995).

2 See Caplan, Hildebrandt, and Marshall (1988) and Avrutin (2001) for similar reviews of agrammatism that discuss aphasia in terms of syntactic theory.

3 Our review is a rough characterization. There still remains interest in connecting aphasiology and syntax among some researchers, but in our view, this group is waning.

17.4.1 Agrammatism and the syntacto-topic conjecture

The field of syntax focuses on the development of a competence model of grammatical knowledge: deep abstract systems that underlie both production and comprehension of sentences (Chomsky 1965). The close association between syntax and aphasiology began with the discovery that patients with Broca’s aphasia not only had a disorder of speech production, but also problems with sentence comprehension (Caramazza and Zurif 1976; Heilman and Scholes 1976; Schwartz, Saffran, and Marin 1980), implying a central deficit that could be tied effectively to syntactic theory. Here we focus on the study by Caramazza and Zurif (1976), which is widely cited as initiating this line of research. The key observation was that patients with Broca’s aphasia performed at chance in understanding object-relative constructions when they were semantically reversible (2), but not when they were non-reversible (1). That is, these patients had difficulty when the thematic relation among the arguments of the sentence could not be reconstructed based only on the identity of the arguments themselves without knowing their syntactic configurations. Importantly, the deficit is generally less severe for sentences with canonical word order, leading to the pattern of better performance for e.g. reversible active sentences (3) compared to reversible passive (4) sentences (e.g. Schwartz, Saffran, and Marin 1980; for reviews, see Grodzinsky, Zurif, and Drai 1999 and Grodzinsky 2000⁴). This pattern of comprehension breakdown, impaired performance for sentences with non-canonical word order and reversible thematic mapping, is called agrammatic comprehension.

(1) The apple that the boy is eating is red. (better)
(2) The boy that the girl is chasing is tall. (worse)
(3) The girl pushed the boy. (better)
(4) The boy was pushed by the girl. (worse)

The original hypothesis stemming from this research was that patients with agrammatic comprehension lacked the ability to generate structural representations of sentences. The idea was that they could guess the correct meaning of non-reversible sentences given the arguments and plausible thematic relations between them, but they could not rely on this strategy to correctly identify the meaning of semantically reversible sentences, which provided no such clues. Combined with a strategy that assigned an agent role to the first argument of reversible sentences (Grodzinsky 1986), this proposal successfully accounted for the generally better comprehension of sentences with canonical as opposed to non-canonical word order. The hypothesis that a central syntactic deficit underlies both agrammatic production and comprehension was called the overarching agrammatism hypothesis, with the associated claim that Broca’s area is the locus of syntactic operations (Caramazza and Zurif 1976; Berndt and Caramazza 1980; Schwartz, Saffran, and Marin 1980; Zurif 1980). This proposal sought to unify agrammatic production and comprehension as resulting from damage to a central linguistic system, namely syntax, sparking an intensive effort to characterize behavioral deficits as impairments to grammatical competence. These investigations were often performed in relation to Government and Binding theory (Chomsky 1981), as well as computational and psycholinguistic approaches to sentence processing, making for strong integration of disciplines in the study of aphasia (see e.g. Caplan, Hildebrandt, and Marshall 1988). As we discuss later, this degree of integration among the disciplines is astonishing in light of the current state of affairs, in which there is little such integration.

4 Grodzinsky (2000) also reviews related findings in Japanese and Chinese.

The biggest development regarding the use of syntactic theory in the study of agrammatism in Broca’s aphasia was the trace-deletion hypothesis (TDH; Grodzinsky 1986; 2000). The TDH states that the comprehension deficits of Broca’s aphasia patients are restricted to a subcomponent of the grammar; in particular, the structural representations of sentences in patients with agrammatism lack traces of movement. Grodzinsky proposed this more restricted deficit given that agrammatic patients had largely good comprehension overall, for a variety of constructions, which implied the ability to correctly assign theta roles via intact syntactic configurations. Additional support for this proposal came from neuroimaging studies: increased activity in Broca’s area related to the presence (Ben-Shachar et al. 2003 (Hebrew)) and distance (Santi and Grodzinsky 2007a) of Movement transformations. The TDH identified a specific module of GB theory, the syntactic operation Move α (the transformational component of the grammar), as the functional role of Broca’s area in language. Thus it represented a major point of integration between the successful advances of syntactic theories in generative grammar and the clinical study of aphasia. More broadly, the explicit formulation of this hypothesis led the way to a more general goal of integration among these fields called the syntacto-topic conjecture (STC; Grodzinsky and Friederici 2006; Grodzinsky 2006):

1. Major syntactic operations are neurologically individuated.
2. The organization of these operations in brain space is linguistically significant.

Claim 1 states that there is a transparent mapping between syntactic operations or modules and the functions of brain areas; in other words, there are spots of brain that “do” particular grammatical operations. This was a central claim of the TDH: that the Movement module was localized to Broca’s area. As we discuss in this chapter, the success (and eventual failure) of the TDH rested on patterns of deficits in patients with assumed damage to Broca’s area. Neither the TDH nor any other example of the STC attempted to address claim 2, the hypothesis that the locations of grammatical modules in the brain reflect some important principle of cognitive organization. It is unclear exactly why Movement or any other syntactic module should be localized to Broca’s area in particular; in our view, this is an underappreciated general flaw of theories aligning grammatical modules with pieces of brain tissue. The failure of the STC raises critical questions regarding the relation between syntactic theory and the brain (echoing concerns expressed by Chomsky 1965, Marr 1982, and Poeppel and Embick 2005 regarding linking linguistic theory to neuroscience), and potentially informs the proper formulation of syntactic theory for the purposes of alignment to aphasiology.

Other patterns of data in Broca’s aphasia deepened the connections between aphasiology and syntactic theory. Hickok and colleagues highlighted several comprehension patterns that were problematic for the TDH: impaired comprehension of the matrix clause in subject-relative sentences (Caramazza and Zurif 1976; Hickok, Zurif, and Canseco-Gonzalez 1993) and impaired comprehension of which-N wh-questions with intact comprehension of bare who wh-questions (Hickok and Avrutin 1996).
This led Hickok (1992) to develop the Revised Trace Deletion Hypothesis (RTDH), incorporating recent advances in syntactic theory concerning the VP-internal subject hypothesis (Kitagawa 1986; Burton and Grimshaw 1992), D-linking (Pesetsky 1987), and the hypothesized distinction between Government and Binding chains (Cinque 1990). Additionally, patients with Broca’s aphasia were argued to have difficulties comprehending sentences with phrasal movement but not head movement (Grodzinsky and Finkel 1998; cf. Wilson and Saygin 2004). These data were used by Chomsky (2001) to support the notion that head movement should be treated as a PF phenomenon rather than as part of the core syntax. With respect to agrammatic production, Friedmann and Grodzinsky (1997 (Hebrew); 2000 (Palestinian Arabic, Hebrew)) reviewed studies of production deficits in Broca’s aphasia that indicated largely preserved agreement morphology with deficits in tense, which they explained through a grammatical deficit that pruned the upper nodes of the tree structure, leaving lower nodes intact. This converged with the split-INFL hypothesis that proposed separate structural positions for tense and agreement features (Pollock 1989; Chomsky 1991). More recently, there has been some work on explaining comprehension deficits in agrammatic Broca’s aphasia through the lens of Relativized Minimality (RM; Rizzi 1990), a locality constraint on syntactic operations. Grillo (2008; 2009 (Italian)) and
Garraffa and Grillo (2008 (Italian)) have attempted to characterize agrammatic comprehension and production as resulting from impoverished syntactic representations that end up causing RM violations. This account expanded upon earlier hypotheses of reduced syntactic processing resources in agrammatism (Zurif et al. 1993; Kolk 1995), particularly to syntactic representations at phase edges (Chomsky 2001), namely nominal (DP), verbal (vP), and clausal (CP) projections. Thus, developments in syntactic theory and aphasiology of agrammatism mutually reinforced each other, appearing to make good on the mentalistic commitments of syntactic theory.

17.4.2 The failure of the syntacto-topic conjecture

While the STC and the broader goal of analyzing aphasia through the lens of syntactic theory have been useful in driving research, it is clear that the essence of these proposals is incorrect. The main reason is that Broca’s aphasia patients with agrammatic comprehension have behavioral patterns (reviewed below) that are contradictory and/or mysterious from the perspective of syntactic theory. Additionally, these proposals suffer from problems with explanatory adequacy: why should brain damage selectively target these syntactic representations and/or operations, and not others? These problems leave the field of aphasiology without a clear understanding of how syntactic theory relates to language deficits due to brain damage.

An important set of data regarding agrammatism and its relation to syntactic theory comes from careful studies of acceptability judgments in aphasia. With respect to the TDH, Broca’s aphasia patients have shown dissociations between their “agrammatic” comprehension patterns and their intact ability to make subtle acceptability judgments about a wide range of grammatical structures, including those with phrasal movement (Linebarger, Schwartz, and Saffran 1983; Wilson and Saygin 2004). We will discuss some of these findings in detail to illustrate the problem that these studies raise for the STC, and to help define the set of data to be explained by any successful hypothesis about agrammatic comprehension and production.

Linebarger, Schwartz, and Saffran (1983) tested four patients with Broca’s aphasia on an acceptability judgment test. All four patients had agrammatic comprehension, e.g. worse performance on reversible passive than reversible active sentences.⁵ The task consisted of 10 conditions designed to test a variety of facets of grammatical knowledge, listed below along with grammatical and ungrammatical examples from each condition.

(1) Strict subcategorization
    a. *He came my house at six o’clock.
    b. He came to my house at six o’clock.
    c. *I hope you to go to the store now.
    d. I want you to go to the store now.
(2) Particle movement
    a. *She went the stairs up in a hurry.
    b. She went up the stairs in a hurry.
    c. She rolled the carpet up in a hurry.
(3) Subject-aux inversion
    a. *Is the boy is having a good time?
    b. Is the boy having a good time?
(4) Empty elements
    a. *This job was expected Frank to get.
    b. Which job did you expect Alfred to get?
    c. Frank was expected to get the job.
    d. *The workmen were expected would finish by noon.
(5) Tag question: subject copying
    a. *The little boy fell down, didn’t it?
    b. The little boy fell down, didn’t he?
(6) Left branch condition
    a. *How many did you see birds in the park?
    b. How many birds did you see in the park?
(7) Gapless relative clauses
    a. *Mary ate the bread that I baked a cake.
    b. Mary ate the bread that I baked.
(8) Phrase structure rules
    a. *The gift my mother is very nice.
    b. The gift my mother got is very nice.
    c. The gift for my mother is very nice.
(9) Reflexives
    a. *I helped themselves to the birthday cake.
    b. I helped myself to the birthday cake.
    c. *The famous man itself attended the ceremony.
    d. The famous man himself attended the ceremony.
(10) Tag questions: aux copying
    a. *John is very tall, doesn’t he?
    b. John is very tall, isn’t he?

5 All four patients had some deficits on reversible active sentences, an important point to which we return later.

[Figure 17.1: bar chart. Y-axis: Performance (A’), from 0.5 to 1. X-axis: experimental condition (Strict subcategorization, Particle movement, Subject-aux inversion, Empty elements, Subject-tag copying, Left branch condition, Gapless relative clause, Phrase structure rules, Reflexives, Aux copying). Key: patients EB, AT, LS, VS.]

fig. 17.1 Acceptability judgment data reproduced from Linebarger et al. (1983). Y-axis reflects the A’ value for each experimental condition. White bars indicate the average across subjects, and individual characters mark each subject’s performance. An A’ value of .5 indicates chance performance on the task. X-axis indicates the experimental condition. The key on the right indicates the patient that corresponds to each icon. Two null A’ values reported in Linebarger have been included here as .5, as these two values resulted from either zero “yes” or zero “no” responses across the grammatical or ungrammatical examples, respectively, from that condition.
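The A’ measure plotted in Figure 17.1 can be illustrated with a short sketch. The code below assumes the widely used non-parametric computing formula for A’ (Grier 1971), with the mirrored form for below-chance performance; the source does not state which variant Linebarger et al. (1983) used, so this is an illustration rather than a reconstruction of their analysis. The function names, the `fallback` parameter, and the example counts are hypothetical. "Hits" are "yes" responses to grammatical sentences and "false alarms" are "yes" responses to ungrammatical sentences.

```python
from typing import Optional


def a_prime(hit_rate: float, fa_rate: float) -> Optional[float]:
    """Non-parametric sensitivity index A' (assumed formula: Grier 1971).

    Returns None when the index is undefined, which happens for degenerate
    response patterns such as answering "yes" (or "no") on every trial.
    """
    h, f = hit_rate, fa_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    if h >= f:
        denom = 4.0 * h * (1.0 - f)
        if denom == 0.0:  # only reachable when h == f == 0 or h == f == 1
            return None
        return 0.5 + (h - f) * (1.0 + h - f) / denom
    # Below-chance performance: mirror image of the formula above.
    denom = 4.0 * f * (1.0 - h)  # strictly positive here, since f > h
    return 0.5 - (f - h) * (1.0 + f - h) / denom


def a_prime_from_counts(yes_grammatical: int, n_grammatical: int,
                        yes_ungrammatical: int, n_ungrammatical: int,
                        fallback: float = 0.5) -> float:
    """A' from raw "yes" counts; undefined values are plotted as `fallback`
    (hypothetical parameter mirroring the .5 substitution in the caption)."""
    value = a_prime(yes_grammatical / n_grammatical,
                    yes_ungrammatical / n_ungrammatical)
    return fallback if value is None else value


# Hypothetical patient: accepts 18 of 20 grammatical and 2 of 20 ungrammatical items.
print(round(a_prime_from_counts(18, 20, 2, 20), 3))
```

Under these assumptions, the hypothetical patient above obtains an A’ of about .94, while a patient who answers "yes" on every trial yields an undefined value that is reported as .5, i.e. chance.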

To prevent patients from being able to determine well-formedness based on local information alone, both ungrammatical and grammatical sentences contained linear word sequences that could appear in grammatical sentences. Approximately 40 trials from each condition were presented to each patient (20 grammatical, 20 ungrammatical). The results (Figure 17.1) indicate that these patients performed remarkably well: most conditions were well above chance for all four subjects, with notable decrements of performance for the three conditions that involve agreement (subject and auxiliary tag questions, and reflexives), suggesting a working memory problem (we return to this issue below). These results indicate that these agrammatic patients in fact have largely intact grammatical knowledge, and that sentence comprehension deficits of reversible active and passive sentences likely originate from non-grammatical deficits.

Wilson and Saygin (2004) performed a similar study to identify the patterns of brain damage associated with deficits of grammatical knowledge, with a focus on testing the TDH by comparing constructions requiring intact phrasal movement chains to a variety of other constructions not involving phrasal movement (Table 17.1). They divided their stimuli into those that they intuitively found more or less difficult within each condition. Age-matched control subjects performed well, although not at ceiling. Patients with damage including Broca’s area showed similar deficits for sentences with and without phrasal movement, and patients without damage to Broca’s area showed the same pattern. Patients with damage to the posterior temporal lobe showed the most severe deficits, and whole-brain VLSM analyses showed little evidence of IFG involvement in deficits on constructions with or without phrasal movement. These results are consistent with similar large-scale analyses of sentence comprehension (Caplan, Hildebrandt, and Makris 1996; Dick et al. 2001; Thothathiri, Kimberg, and Schwartz 2012; Magnusdottir et al. 2013; Pillay et al. 2017; Rogalsky et al. 2018) suggesting no particular association of Broca’s area with grammatical knowledge.⁶ The results of these studies demonstrate that patients with agrammatic comprehension and/or damage to Broca’s area have a surprisingly good ability to judge syntactic well-formedness.

Table 17.1 Examples of stimuli from each condition (from Wilson and Saygin 2004).

Condition     Grammatical                                        Ungrammatical
Trace/Hard    David seems likely to win.                         *John seems that it is likely to win.
              Which woman did David think saw Pete?              *Which woman did John think that saw Tony?
Trace/Easy    The dog which bit me was black.                    *Me the dog which bit was black.
              What did Bill buy besides apples?                  *What did Bill buy oranges and?
Other/Hard    Could they have left without me?                   *Could have they left without us?
              He donated the books to the library.               *She donated the library the books.
Other/Easy    The children threw the football over the fence.    *The children sang the football over the fence.
              Could they have left town?                         *Have they could left the city?

6 Some lesion studies have shown involvement of Broca’s area in sentence comprehension, particularly for sentences with non-canonical word order (e.g. Mesulam et al. 2015), but the general pattern across studies, particularly with respect to basic sentence comprehension, suggests limited involvement of this region in grammatical knowledge.

Another issue is that even the observed sentence comprehension deficits in these patients do not follow straightforwardly from STC theories. For instance, comprehension of pronouns and reflexives (Grodzinsky et al. 1993; Hickok and Avrutin 1995; Santi and Grodzinsky 2007a) and of reversible simple active and locative constructions (Schwartz, Saffran, and Marin 1980; Grodzinsky, Zurif, and Drai 1999) is impaired in agrammatic patients to varying extents. None of these constructions depends on intact traces, which poses a severe problem for the TDH. These problematic patterns of behavioral data are coupled with neuroimaging data on backward anaphora, illustrating that activity in Broca’s area is not tied to the presence of syntactic operations such as Movement but rather to the processing of syntactic long-distance dependencies that place demands
on memory resources (Matchin, Sprouse, and Hickok 2014), regardless of the particular syntactic operation involved. Across the whole profile of sentence comprehension, production, and acceptability judgment studies, a coherent theory of the underlying deficits in agrammatism derived from syntactic theory appears impossible. The RM approach to agrammatism fares better than the TDH account in this regard (Garraffa and Grillo 2008; Grillo 2008; 2009), as it predicts a more general syntactic problem cutting across both production and comprehension. However, the agrammatic production deficits of Broca’s aphasia cannot be well described as a categorical loss of certain morphological categories or syntactic features, as deficits in production of functional morphology are dependent on the position of the item within a sentence or vary across testing sessions. For example, Gleason et al. (1975) showed that the omission of determiners and pronouns in eight agrammatic Broca’s aphasic patients occurred largely in sentence-initial position, with significantly improved performance in sentence medial position. Additionally, Dutch-speaking patients with agrammatic Broca’s aphasia do not produce inflected verb forms in incorrect syntactic positions (Bastiaanse and Van Zonneveld 1998 (Dutch)), suggesting difficulties producing functional words/morphemes but not a lack of grammatical knowledge regarding their distribution. Such data are difficult to explain with a theory that makes categorical cuts across syntactic representations, whether specific to Movement, phase edge features, or otherwise. A key point in understanding Broca’s aphasia and agrammatism is that such patients often have non-syntactic cognitive deficits. Chief among these is WM (attentional control and decision-making impairments also are identified in individuals with aphasia, albeit with greater individual differences, e.g. Glosser and Goodglass 1990; Erickson, Goldinger, and LaPointe 1996; Fridriksson et al. 
2006). Although clinical assessments generally do not include a WM deficit as a diagnostic criterion for aphasia, performance on standard batteries for aphasia correlates with measures of WM capacity (Caspari et al. 1998), and damage to Broca’s area is correlated with impaired WM capacity (Pettigrew and Hillis 2014). In fact, the seminal study that spawned the association of Broca’s aphasia with a syntactic deficit, Caramazza and Zurif (1976), found the same agrammatic sentence comprehension pattern in patients with conduction aphasia. Patients with conduction aphasia in general do not exhibit agrammatic production, and do not have damage to Broca’s area or adjacent frontal lobe areas (Damasio and Damasio 1980; Buchsbaum et al. 2011). These patients do have impaired phonological WM capacity that appears tied to their sentence comprehension deficits (Friedmann and Gvion 2003; Gvion and Friedmann 2012 (Hebrew)). This suggests that the underlying cause of agrammatic sentence comprehension in both Broca’s and conduction aphasia lies in disruption of the WM resources required to process sentences rather than disruption of grammatical knowledge. In support of this idea, studies that attempt to limit the WM capacities of healthy adults using attention-demanding concurrent secondary tasks or degradation of intelligibility of stimulus materials have shown “agrammatic” comprehension patterns similar to those of some patients with aphasia (Dick et al. 2001; Rogalsky, Matchin, and Hickok
2008). Similarly, the behavior of children mirrors that of patients with agrammatism. Nakayama (1987) found that children made more errors on non-canonical as compared to canonical word order in their elicited sentence production, and Crain and Nakayama (1987) found that children’s errors were related to increased length of dependencies. This similarity is plausibly due to the fact that children have limited WM capacities (Cowan et al. 2006). Additionally, Broca’s aphasia patients do not process linguistic material as quickly as healthy adults (Prather et al. 1992), and when the rate of sentence presentation is slowed, their comprehension of sentences with non-canonical word order significantly improves (Love et al. 2008). These data are difficult to explain via categorical deficits in grammatical operations or representations, and are much more compatible with processing resource limitations such as WM deficits. Interestingly, Grodzinsky’s (1986: 156) original position regarding the TDH was that a disrupted processor was the root of the comprehension disorder:

    It is very likely that some kind of memory (either dedicated to language processing or not), or perhaps some sort of temporary store, which relates positions in sentences during comprehension (i.e., is essential for the execution of the coindexing algorithm necessary for chain formation), is disrupted, and the result is the comprehension deficit in agrammatism … Also, it is possible that the temporary store is crucial for other tasks during sentence comprehension, namely, not only for relating positions, but also for a different type of linking, namely, agreement …

This earlier position is quite similar to the WM hypothesis we provide in the next section. Altogether, these developments force us to acknowledge that while syntactic theory provides useful descriptions for the capacities of normal and agrammatic sentence processing, agrammatism (like aphasic syndromes more generally) is not a singular behavioral profile (Badecker and Caramazza 1985), and is not explainable via deficits in a particular grammatical module or syntactic operation. These observations have broader implications for the relation between syntactic theory and our understanding of linguistic deficits due to brain damage, as there is no single successful case of an impaired syntactic operation or grammatical module explaining deficits in aphasia syndromes. As a consequence, the integration of research on aphasia with syntactic theory is now quite limited. By contrast, there has been substantial progress in identifying selective deficits in domains that have not been clearly tied to grammatical modules, such as word comprehension, sentence comprehension, speech production, and speech perception (Dronkers 1996; Dronkers et al. 2004; Poeppel and Hickok 2004; Thothathiri, Kimberg, and Schwartz 2012; Mesulam et al. 2015; Rogalsky et al. 2018), illustrating that there is not a general failure of the localizationist approach to aphasia. Thus a reevaluation of the approach to the study of aphasia from the perspective of syntactic theory is needed.

The failure of the STC is reminiscent of the (purported) failure of the derivational theory of complexity (DTC). The DTC posited a transparent relation between grammatical operations and online processing measures (Miller and Chomsky 1963). There
were many experiments testing the DTC that represented a golden age of close interaction between psycholinguistics and syntactic theory (for reviews, see Miller 1962; Fodor et al. 1974; Phillips 1996). However, there were clearly cases where the grammatical theory predicted increased processing complexity that were not borne out in experimental data, and oftentimes the details of syntactic theory appeared to be difficult to reconcile with a direct implementation in online sentence processing (Fodor, Bever, and Garrett 1974). Following this, the period of close interaction between syntactic theory and psycholinguistics ended, and psycholinguists and syntacticians began to pursue separate interests. However, as with the DTC, the failure of the STC critically depends on the model of grammar (Fodor, Bever, and Garrett 1974). If the model changes, then the data must be re-evaluated. In particular, we think that grammatical frameworks such as the Minimalist Program (Chomsky 1995; Adger 2003; Hornstein 2009) and tree-adjoining grammar (TAG; Frank 2002), which aim to reduce language-specific cognitive machinery and principles to a minimum, (Sprouse and Hornstein 2015), are the right avenues of approach for linking syntactic theory with aphasia, and more specifically agrammatic comprehension and production. This is because these frameworks allow researchers to more fully develop and focus on the performance systems that implement grammatical knowledge, including crucially WM, and move away from attempting to localize complex grammatical modules to distinct brain regions. Then, the goal for understanding the link between syntactic theory and agrammatism would focus on how syntactic theory relates to performance, a much broader goal (Fodor, Bever, and Garrett 1974; Phillips 1996; Lewis and Phillips 2014; Brennan 2016).

17.5 Morphosyntactic WM, agrammatism, and syntactic theory

The main challenge for any theory of agrammatic production and comprehension deficits is to capture the disparate and often seemingly conflicting behavioral profile of Broca’s aphasia patients. In order to clarify this issue, we have compiled a list of grammatical deficits reported in patients defined as having Broca’s aphasia and/or agrammatic comprehension:

(1) Sentence comprehension deficits:
    a. Classic agrammatic comprehension pattern: deficits on comprehension of sentences with non-canonical word order (Caramazza and Zurif 1976; Schwartz, Saffran, and Marin 1980).
    b. Better comprehension of object-extracted wh-phrases with bare who as opposed to which-N phrases (Hickok and Avrutin 1996; Sheppard et al. 2015; cf. Thompson et al. 1999).
    c. Deficits on comprehension of the main clause of subject-relatives (Hickok, Zurif, and Canseco-Gonzalez 1993; Hickok and Avrutin 1995).
    d. Comprehension deficits for locatives (Schwartz, Saffran, and Marin 1980).
    e. Comprehension deficits for reversible active sentences (Schwartz, Saffran, and Marin 1980).
    f. Deficits on comprehension of inflection, including case and agreement (Luria 1975).

(2) Sentence production deficits:
    a. Deficits in generating properly formed phrases/sentences given sets of words (Zurif, Caramazza, and Myerson 1972; Caramazza et al. 1981).
    b. Deficits in production of closed-class words and inflectional morphology (Goodglass et al. 1972; Gleason et al. 1975).
    c. Increased difficulty for functional words in sentence onset position relative to sentence medial position (Gleason et al. 1975).
    d. Potential dissociation between production of tense (impaired) and agreement (intact) (Friedmann and Grodzinsky 1997; 2000 (Palestinian Arabic, Hebrew)).
    e. Deficits in production of verbs with complex argument structure, as well as the arguments associated with these verbs (Thompson et al. 1997).

(3) Acceptability judgment deficits:
    a. Acceptability judgment deficits of anaphora: reflexives, pronouns, auxiliary copying (Blumstein et al. 1983; Linebarger, Schwartz, and Saffran 1983; Wulfeck 1988; Grodzinsky et al. 1993; Hickok, Zurif, and Canseco-Gonzalez 1993).
    b. Acceptability judgment deficits of number agreement (Wulfeck and Bates 1991).
    c. Acceptability judgment deficits of movement (Grodzinsky and Finkel 1998; Santi and Grodzinsky 2007a).
    d. Miscellaneous “difficult” acceptability judgment deficits (Wilson and Saygin 2004).

(4) Processing differences:
    a. Slowed lexical processing (Prather et al. 1992).
    b. Improvement of sentence comprehension with slowed presentation rate (Love et al. 2008).
    c. Slowed prediction of syntactic dependencies (Zurif et al. 1993; Jakuszeit, Kotz, and Hasting 2013).
    d. WM deficits (Caspari et al. 1998; Pettigrew and Hillis 2014).

This list is certainly not exhaustive; however, it covers a large set of data that clearly illustrate the problems with extant theories of agrammatic comprehension and production. We will shortly suggest that there are in fact two syndromes underlying “agrammatism” that account for a large portion of these data. As reviewed in the section above, many of these deficits likely derive from phonological WM issues, but the whole set cannot follow from phonological WM deficits alone. To underscore this point, there are patients with impaired phonological WM but not the same production deficits as Broca’s aphasia patients—namely, patients with conduction aphasia. While patients with conduction aphasia show many of the same “agrammatic” sentence comprehension patterns as patients with Broca’s aphasia (Caramazza and Zurif 1976; Goodglass 1992), and correspondingly similar phonological WM deficits, conduction aphasia patients do not have agrammatic production. Their spontaneous production is grammatically near normal (Gleason et al. 1975), and they do not share many of the grammatical comprehension deficits of agrammatic Broca’s aphasia, such as impairments in using function words and/or morphology in sentence comprehension and production (Caramazza et al. 1981; Blumstein et al. 1983). In other words, the striking character of agrammatic production, rather than agrammatic comprehension, calls for a syntactic explanation. It is also possible that some of the comprehension and acceptability judgment deficits listed above, such as problems with anaphora, agreement, and interpretation of long-distance subject–verb agreement, may not follow from a phonological WM deficit and could be attributable to a syntactic deficit as well. 
However, it must be noted that the comprehension and acceptability judgment abilities of conduction aphasia have not been investigated nearly as thoroughly as Broca’s aphasia; we consider more detailed testing of patients with conduction aphasia to be an important research goal in order to clarify similarities and differences between the two groups.

17.5.1 Phonological vs. morphosyntactic working-memory contributions to agrammatism

We propose that there are actually two syndromes that tend to co-occur in patients with Broca’s aphasia. The first is a deficit in phonological WM, and the second is a deficit in morphosyntactic WM. By contrast, patients with conduction aphasia only have the phonological WM deficit. The shared phonological deficit accounts for similarities between these two patient groups with respect to agrammatic sentence comprehension, while damage to the morphosyntactic WM system in Broca’s aphasia accounts for agrammatic production deficits (and potentially other sentence comprehension deficits, such as agreement and anaphora) that are not present in conduction aphasia. The phonological and morphosyntactic WM systems are shown in Figure 17.2. As we review below, the two systems are neuroanatomically adjacent and are both supplied by the superior system of the middle cerebral artery. In addition, they both share some of the same major white-matter pathways, including the arcuate fasciculus (Yagmurlu et al. 2016).7 This means that both systems could be affected by the same stroke or degenerative disorder, resulting in deficits to both phonological WM and morphosyntactic WM in Broca’s aphasia.

[Figure 17.2: left hemisphere, with labels: IFG, pars orbitalis; IFG, pars triangularis; IFG, pars opercularis; superior temporal gyrus; middle temporal gyrus.]

Fig. 17.2 Functional neuroanatomy of language and working memory (WM) as relevant to our proposal. Broca’s area consists of the combination of the inferior frontal gyrus, pars triangularis (IFGtri, orange) and the pars opercularis (IFGoper, yellow). The phonological WM circuit consists of the posterior superior temporal gyrus and Sylvian parietal-temporal area (pSTG, Spt, cyan) and the IFGoper. By contrast, the posterior middle temporal gyrus (pMTG, blue) and the IFGtri together constitute the morphosyntactic working memory circuit.

17.5.2 Agrammatic comprehension and phonological WM

A common view is that phonological WM consists of a phonological loop: interconnected acoustic and motor speech representations, such that the motor speech system drives activation in and “refreshes” acoustic speech representations during articulatory rehearsal (Baddeley and Hitch 1974). Based on neuroimaging and patient data, Buchsbaum, Hickok, and colleagues (Buchsbaum, Hickok, and Humphries 2001; Hickok et al. 2003; Buchsbaum and D’Esposito 2008; Hickok, Houde, and Rong 2011) have suggested that the phonological loop is localized to the posterior superior temporal gyrus (pSTG) for acoustic representations, the posterior third of the inferior frontal gyrus (pars opercularis; IFGoper) for higher-level motor representations, and a brain region posterior and nearly adjacent to the pSTG at the end of the Sylvian fissure (area Spt) for sensory-motor transformation, with the arcuate fasciculus connecting the regions together (Figure 17.2).

Broca’s aphasia is associated with large left-hemisphere damage to the frontal, parietal, and temporal lobes, including IFGoper, pSTG, Spt, and the arcuate fasciculus (Fridriksson, Bonilha, and Rorden 2007; Fridriksson et al. 2015). Patients with conduction aphasia tend to have damage to a specific subset of these areas, namely Spt (Damasio and Damasio 1980; Buchsbaum et al. 2011). Both patient groups therefore typically have damage to the phonological WM circuit, which straightforwardly accounts for their shared phonological WM deficits. While both groups have damage to this circuit, Broca’s aphasia typically results from larger lesions than conduction aphasia, and usually includes frontal damage, possibly impinging on IFGtri and/or white-matter pathways connecting IFGtri to the temporal lobe.

Non-canonical sentences induce substantial processing costs relative to canonical sentences in a variety of tasks and settings (Just and Carpenter 1992; Lewis, Vasishth, and Van Dyke 2006; Gibson, Tily, and Fedorenko 2013). These costs are likely due to increased demand on phonological WM resources. For instance, distracting secondary tasks that require phonological WM resources disrupt comprehension of non-canonical sentences more than that of canonical ones (Fedorenko, Gibson, and Rohde 2007; Rogalsky, Matchin, and Hickok 2008). This predicts that we should see a tight correlation between performance on phonological WM tasks and comprehension of sentences with non-canonical word order or garden-paths, which is the observed pattern across studies of patients with Broca’s and conduction aphasia (Caramazza and Zurif 1976; Friedmann and Gvion 2003 (Hebrew); Pettigrew and Hillis 2014).

Why do non-canonical sentences require greater WM resources than canonical ones? This issue has been heavily discussed; a prominent hypothesis is that this is due to the increased distance between filler and gap (Gibson 1998; 2000). We suggest in addition that sentences with non-canonical word order may also require structural revision. Central to this analysis is the hypothesis that individuals both with and without aphasia predict the structural and/or thematic roles of initial NPs during sentence comprehension (Lewis and Vasishth 2005; Demberg and Keller 2008). If so, then non-canonical sentences such as object-relatives will routinely require revision, like garden-path sentences, as the initial NP is predicted to be the agent of the embedded clause. It is in cases of both revision and distance that we assume phonological WM resources are required.

7 Broca’s area and the posterior temporal lobe are connected via the arcuate fasciculus/superior longitudinal fasciculus (AF/SLF). The classic view is that the portion of the SLF/AF relevant to language is a single pathway. However, neuroimaging techniques have shown that the AF/SLF contains at least a few functionally distinct sub-tracts

17.5.3 Deficits in canonical sentence comprehension, acceptability judgment deficits, and agrammatic production: Impaired morphosyntactic WM

Section 17.5.2 describes the link between agrammatic comprehension and phonological WM. We now turn to agrammatic production. We suggest that agrammatic production, as well as certain sentence comprehension difficulties, stems from a deficit in morphosyntactic WM. Several previous authors have suggested the existence of a specialized WM circuit for sentence processing based on patterns of neuroimaging and neuropsychological data, including dissociations between sentence comprehension abilities and phonological WM capacity (Caplan and Waters 1999; 2013; Fiebach et al. 2005). We suggest that this circuit includes the sub-region of Broca’s area anterior to the IFGoper (the IFGtri) and the posterior middle temporal gyrus (pMTG), connected by the arcuate fasciculus (Figure 17.2). Consistent with this, fMRI studies by Fedorenko, Duncan, and Kanwisher (2012) and Rogalsky et al. (2015) indicate that there are sub-regions of Broca’s area (mostly in IFGtri) that are specific to sentences (and do not activate during phonological tasks) adjacent to sub-regions (mostly in IFGoper) that activate to both sentences and phonological tasks. Matchin and Hickok (2020) review the substantial evidence in favor of a syntactic function of the pMTG, crucially including the fact that basic sentence comprehension deficits are most strongly associated with damage to this region (Dronkers et al. 2004; Magnusdottir et al. 2013 (Icelandic); Pillay et al. 2017; Fridriksson et al. 2018; Rogalsky et al. 2018). Wilson et al. (2010; 2011) showed that in primary progressive aphasia, damage both to gray matter in the IFG and to dorsal white-matter pathways connecting the IFG to the posterior temporal lobe contributed to both agrammatic production and comprehension. Therefore, we posit that impairments to morphosyntactic WM can result from direct damage to the pars triangularis or disconnection of this region from the pMTG (see Fridriksson, Bonilha, and Rorden 2007 for a case study of Broca’s aphasia resulting from disconnection).

According to our proposal, the language-specific WM system consists of a higher-order articulatory loop, with hierarchical syntactic representations in pMTG and morphosyntactic sequence representations in IFGtri (Rogalsky et al. 2015; Matchin and Hickok 2020).
The hierarchical syntactic representations consist of phrasal maximal projections comprising bundles of features; features include structural relations (e.g. head, complement, specifier), agreement, case, and strings of phonological features for particular words (Lewis and Vasishth 2005; Lewis, Vasishth, and Van Dyke 2006). We hypothesize that the morphosyntactic resources in the IFGtri support morphological sequence representations: syntactic features arranged into linear relationships, e.g. the fact that in English determiners precede noun phrases, tense suffixation, etc. The morphosyntactic system of the IFGtri (and associated white-matter connections) could therefore act as a conduit between hierarchical syntactic representations in pMTG and phonological representations in IFGoper and premotor cortex to guide sentence production. Previous authors have argued for a similar function of the IFG in linearizing hierarchical syntactic relationships (Boeckx, Martinez-Alvarez, and Leivada 2014), taking inspiration from recent hypotheses in linguistics that linear aspects of syntax and morphology derive principally from demands of the articulatory system to produce speech in serial order (Idsardi and Raimy 2013; Berwick and Chomsky 2016). Deficits in this conduit for converting hierarchical syntactic representations into phonological forms may underlie the classic agrammatic production deficit (discussed in greater detail below).

The hierarchical (pMTG) and linear (IFGtri) systems together form the loop that underlies morphosyntactic WM. Just as activation in the phonological articulatory system serves to refresh acoustic representations, activation in the morphosyntactic articulatory system serves to refresh or reactivate the lexical-syntactic chunks in the pMTG. These representations may be reactivated in contexts that require resolving long-distance dependencies, such as subject–verb agreement, gap-filling, or anaphoric resolution (McElree, Foraker, and Dyer 2003; Lewis and Vasishth 2005; Lewis, Vasishth, and Van Dyke 2006; Matchin 2017). In support of this, Kush, Johns, and Van Dyke (2015) performed an experiment examining the interfering effects of phonological similarity on sentence comprehension. They found that phonological similarity only affected initial encoding of words, and did not interfere with later memory retrieval during long-distance dependency resolution, suggesting that phonological information is only used in situations of reanalysis or repair, such as non-canonical or garden-path sentences, while morphosyntactic WM may underlie long-distance dependency resolution itself.

The deficits in patients with Broca’s aphasia on canonical sentence comprehension, particularly for anaphora, including pronouns, reflexives, and auxiliary copying (Linebarger, Schwartz, and Saffran 1983; Grodzinsky et al. 1993), may be understood as difficulty in reactivating syntactic information via the use of these morphosyntactic articulatory mechanisms, either through cortical damage to IFGtri or through damage resulting in disconnection to this area. This explanation extends to general comprehension or acceptability judgment deficits linked to inflection, including tense, case, and agreement (Luria 1975; Linebarger, Schwartz, and Saffran 1983; Grodzinsky et al. 1993), as the previously encountered syntactic form must be reactivated to assess the appropriateness of the presented form.
To the extent that conduction aphasia patients do not have some of the comprehension difficulties listed above, we predict that this is due to the intact status of the morphosyntactic working memory system in conduction aphasia.

Deficits in this system can also potentially explain processing speed differences between patients with Broca’s aphasia and healthy subjects on sentence comprehension tasks. If we assume that the morphosyntactic system can predictively drive activation of lexical-syntactic representations in the pMTG before corresponding speech input, presumably processing is sped up. Damage to the IFGtri (and/or arcuate fasciculus) thus predicts slower processing of syntactic information (Zurif et al. 1993; Jakuszeit, Kotz, and Hasting 2013 (German)), and slowing the presentation of linguistic material should facilitate comprehension when predictive resources are not available (Love et al. 2008).

Most importantly, our proposal directly applies to the classic symptoms of agrammatic sentence production: critically, the asymmetry between function words/morphology (e.g. the, past tense -ed) and content words (e.g. cat, chase). Sentence production involves the initial generation of a conceptual-semantic message and corresponding syntactic representation, followed by procedures for lexical encoding, phonological encoding, and overt production (Roelofs and Ferreira, in press; Levelt 1989). We assume that there are essentially two routes for the conversion of a syntactically structured conceptual-semantic message to overt articulatory forms: (i) a dorsal route, the morphosyntactic WM system we have outlined above (Figure 17.3, green), and (ii) a ventral route, including portions of the anterior temporal lobe (ATL) that support the activation of conceptual features and a conceptual–phonological interface in the most anterior portion of the IFG (pars orbitalis; IFGorb), connected via ventral white-matter pathways including the uncinate fasciculus (Figure 17.3, red). We assume that during healthy sentence production, both pathways are active and interact with each other to coordinate the production of sentences (Figure 17.3, A). While both content and function words have syntactic and conceptual features, the conceptual features of content words are much richer and more elaborate than those of function words. Thus, for function words, the path to articulation may rely crucially on the dorsal morphosyntactic route. Content words, on the other hand, even in the face of dorsal stream disruption, can still be produced by the intact ventral route via conceptual features. This accounts for the asymmetry between function and content words in agrammatic Broca’s aphasia patients who have damage primarily to the dorsal stream (Fridriksson et al. 2015; Figure 17.3, B).

Individuals with Broca’s aphasia (i.e. those with agrammatic production) often have large left peri-Sylvian lesions that can encompass both of these proposed dorsal and ventral routes to healthy sentence production. Their selective morphosyntactic impairments but intact conceptual-semantic processing might also be explained by functional reorganization: right-hemisphere ATL and IFG regions may be able to support conceptual-semantic and conceptual-articulatory processing adequately, so that patients present with agrammatic production. Studies in acute stroke patients (i.e. within 24 hours after a stroke, thus allowing little time for compensation or functional reorganization to occur) are needed to test these possible distinct contributions of the dorsal and ventral routes to sentence production.

While both Broca’s aphasia and conduction aphasia involve damage to the phonological WM system, resulting in agrammatic comprehension deficits, damage is often restricted to the posterior STG/Spt portion in conduction aphasia (Damasio and Damasio 1980; Buchsbaum et al. 2011). Thus the stark differences between these syndromes with respect to agrammatic production can be accounted for straightforwardly by the intact morphosyntactic system in conduction aphasia.

We have not attempted to exhaustively describe how damage to morphosyntactic WM applies to the list of deficits attributed to Broca’s aphasia above. However, at least some of the salient deficits in agrammatism (particularly agrammatic production) not clearly explained by phonological WM problems can be coherently addressed through this framework, whereas previous hypotheses cannot address them.

[Figure 17.3: two-panel schematic. (A) Healthy sentence production: chased the cat. (B) Agrammatic sentence production: chase, cat. Each panel shows the planned message (a syntactic tree with VP, V, NP, D, and N nodes carrying the features PAST, SG., and DEF, plus the conceptual features CHASE and CAT); a dorsal pathway (arcuate fasciculus) from lexical-syntactic representations in pSTS/MTG to morphosyntactic sequences in IFGtri, shown in (A) as [V]-[əd] [ðə]-[N]; a ventral pathway (uncinate fasciculus) from conceptual-semantic features in ATL to the conceptual-articulatory interface in IFGorb, shown as [tʃeɪs] [kæt]; and the coordinated output.]

Fig. 17.3 A schematic of healthy and agrammatic sentence/phrase production with respect to the dorsal and ventral pathways to articulation. The syntactic tree in green and the conceptual features in red illustrate the planned message, syntactically encoded (for simplicity, we omit the thematic relations among arguments). In healthy sentence production (A), the syntactic and conceptual features are transmitted via dorsal and ventral routes (respectively) to frontal lobe structures for coordinated speech articulation. In agrammatic production (B), the dorsal stream is damaged, which either results in a loss of morphosyntactic sequence representations (due to cortical damage) and/or impairs/prevents the use of syntactic features to guide selection of morphosyntactic sequences (due to white matter damage). The intact ventral route allows for the use of conceptual features to guide production, allowing for the production of isolated content words. SG = singular, DEF = definite, ATL = anterior temporal lobe, pSTS/MTG = posterior superior temporal sulcus/middle temporal gyrus, IFGorb = inferior frontal gyrus, pars orbitalis, IFGtri = inferior frontal gyrus, pars triangularis.

17.6 Conclusions: Integrating syntactic theory and aphasiology

Assuming that the linguistic deficits of agrammatism not accounted for by verbal WM are successfully explained by deficits in the proposed morphosyntactic system, where does that leave the connection between syntactic theory and aphasia? As a starting point, our review highlights the general failure of the STC, regardless of the success of our proposal. This suggests that syntactic theories that propose elaborate grammatical machinery (e.g. Government and Binding theory) have not been successful in explaining patterns of language deficits in aphasia. Theories which aim to reduce the degree of domain-specific operations and principles to a minimum, such as Chomsky’s Minimalist Program (1995) and Tree-Adjoining Grammar (Frank 2002), are preferable in this respect. At present, however, it is unclear to us how exactly syntactic theories of this sort might relate to our proposed WM architecture. It is beyond the scope of our review to present and explore hypotheses of this relation. However, we would like to drive home that the critical challenge is to implement the postulates of syntactic theory in a processing architecture rather than connecting syntactic theory directly to behavior (including aphasia). This is part of a broader question of how syntactic theory connects to real-time sentence comprehension (Fodor, Bever, and Garrett 1974; Berwick and Weinberg 1986; Phillips 1996; Lewis and Phillips 2014), a question that we believe must be addressed for there to be effective interaction between syntactic theory and aphasia research.

References

Adger, David. 2003. Core syntax: A minimalist approach. Oxford: Oxford University Press.
Anwander, A. et al. 2007. Connectivity-based parcellation of Broca’s area. Cerebral Cortex 17(4): 816–825.
Ashburner, John, and Karl J. Friston. 2000. Voxel-based morphometry: The methods. NeuroImage 11(6 I): 805–821.
Avrutin, Sergey. 2001. Linguistics and agrammatism. Glot International 5(3): 87–97.
Baddeley, Alan. 2010. Working memory. Current Biology 20(4): R136–R140.
Baddeley, Alan D., and Graham Hitch. 1974. Working memory. Psychology of Learning and Motivation: Advances in Research and Theory 8(C): 47–89.
Baddeley, Alan D., Neil Thomson, and Mary Buchanan. 1975. Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior 14(6): 575–589.
Badecker, William, and Alfonso Caramazza. 1985. On considerations of method and theory governing the use of clinical categories in neurolinguistics and cognitive neuropsychology: The case against agrammatism. Cognition 20: 97–125.
Badre, David, and Anthony D. Wagner. 2007. Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia 45(13): 2883–2901.
Baldo, Juliana V., Ellen C. Klostermann, and Nina F. Dronkers. 2008. It’s either a cook or a baker: Patients with conduction aphasia get the gist but lose the trace. Brain and Language 105(2): 134–140.
Bartha, Lisa, and Thomas Benke. 2003. Acute conduction aphasia: An analysis of 20 cases. Brain and Language 85(1): 93–108.
Bastiaanse, Roelien, and Ron Van Zonneveld. 1998. On the relation between verb inflection and verb position in Dutch agrammatic aphasics. Brain and Language 64(2): 165–181.
Bates, Elizabeth et al. 2003. Voxel-based lesion-symptom mapping. Nature Neuroscience 6(5): 448–450.
Ben-Shachar, Michal et al. 2003. The neural reality of syntactic transformations: Evidence from functional magnetic resonance imaging. Psychological Science 14(5): 433–440.
Berndt, Rita S., and Alfonso Caramazza. 1980. A redefinition of the syndrome of Broca’s aphasia: Implications for a neurological model of language. Applied Psycholinguistics 1(3): 225–278.
Berwick, Robert C., and Noam Chomsky. 2016. Why only us? Language and evolution. Cambridge, MA: MIT Press.
Berwick, Robert C., and Amy S. Weinberg. 1986. The grammatical basis of linguistic performance: Language use and acquisition. Cambridge, MA: MIT Press.
Binder, Jeffrey R. 2017. Current controversies on Wernicke’s area and its role in language. Current Neurology and Neuroscience Reports 17(58): 1–10.

Binder, J. R. et al. 1994. Functional magnetic resonance imaging of human auditory cortex. Annals of Neurology 35(6): 662–672.
Binder, Jeffrey R. et al. 2009. Where is the semantic system? A critical review and meta-analysis of 120 functional neuroimaging studies. Cerebral Cortex 19(12): 2767–2796.
Blumstein, S. E. et al. 1983. Comprehension strategies determining reference in aphasia: A study of reflexivization. Brain and Language 18(1): 115–127.
Boeckx, Cedric, Anna Martinez-Alvarez, and Evelina Leivada. 2014. The functional neuroanatomy of serial order in language. Journal of Neurolinguistics 32: 1–15.
Brennan, Jonathan. 2016. Naturalistic sentence comprehension in the brain. Language and Linguistics Compass 10(7): 299–313.
Bresnan, Joan. 2001. Lexical-functional syntax. Oxford: Wiley–Blackwell.
Brodmann, Korbinian. 1909. Vergleichende Lokalisationslehre der Großhirnrinde in ihren Prinzipien dargestellt aufgrund des Zellenbaues. Leipzig: Johann Ambrosius Barth.
Brownsett, Sonia L. E. et al. 2014. Cognitive control and its impact on recovery from aphasic stroke. Brain 137(1): 242–254.
Buchsbaum, Bradley R., and Mark D’Esposito. 2008. The search for the phonological store: From loop to convolution. Journal of Cognitive Neuroscience 20(5): 762–778.
Buchsbaum, Bradley R., Gregory Hickok, and Colin Humphries. 2001. Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science 25(5): 663–678.
Buchsbaum, Bradley R. et al. 2011. Conduction aphasia, sensory-motor integration, and phonological short-term memory: An aggregate analysis of lesion and fMRI data. Brain and Language 119(3): 119–128.
Burton, Strang, and Jane Grimshaw. 1992. Coordination and VP-internal subjects. Linguistic Inquiry 23(2): 305–313.
Butterworth, Brian, Ruth Campbell, and David Howard. 1986. The uses of short-term memory: A case study. Quarterly Journal of Experimental Psychology Section A 38(4): 705–737.
Caplan, David, and Gloria S. Waters. 1999. Verbal working memory and sentence comprehension. Behavioral and Brain Sciences 22(1): 77–126.
Caplan, David, and Gloria Waters. 2013. Memory mechanisms supporting syntactic comprehension. Psychonomic Bulletin and Review 20(2): 243–268.
Caplan, David, Nancy Hildebrandt, and John C. Marshall. 1988. Disorders of syntactic comprehension. Cambridge, MA: MIT Press.
Caplan, David, Nancy Hildebrandt, and Nikos Makris. 1996. Location of lesions in stroke patients with deficits in syntactic processing in sentence comprehension. Brain 119(3): 933–949.
Caplan, David, Gayle DeDe, and Jennifer Michaud. 2006. Task-independent and task-specific syntactic deficits in aphasic comprehension. Aphasiology 20(9–11): 893–920.
Caplan, David et al. 2007. A study of syntactic processing in aphasia I: Behavioral (psycholinguistic) aspects. Brain and Language 101(2): 103–150.
Caplan, David, Jennifer Michaud, and Rebecca Hufford. 2013. Dissociations and associations of performance in syntactic comprehension in aphasia and their implications for the nature of aphasic deficits. Brain and Language 127(1): 21–33.
Caplan, David et al. 2016. Deficit-lesion correlations in syntactic comprehension in aphasia. Brain and Language 152: 14–27.

626

william matchin and corianne rogalsky

Caramazza, Alfonso, and Edgar B. Zurif. 1976. Dissociation of algorithmic and heuristic processes in language comprehension: Evidence from aphasia, Brain and Language 3(4): 572–582. Caramazza, Alfonso et al. 1981. An investigation of repetition and language processing in a case of conduction aphasia, Brain and Language 14(2): 235–271. Caspari, Isabelle et al. 1998. Working memory and aphasia, Brain and Cognition 37(2): 205– 223. Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press. Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris. Chomsky, Noam. 1991. Some notes on economy of derivation and representation. In R. Freidin (ed.) Principles and parameters in comparative grammar, 417–454. Cambridge, MA: MIT Press. Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press. Chomsky, Noam. 2001. Derivation by phase. In M. Kenstowicz (ed.), Ken Hale: A life in language, 1–52. Cambridge, MA: MIT Press. Cinque, Guglielmo. 1990. Types of A-dependencies. Cambridge, MA: MIT Press. Cowan, Nelson et al. 2006. Scope of attention, control of attention, and intelligence in children and adults. Memory and Cognition 34(8): 1754–1768. Crain, Stephen, and Mineharu Nakayama. 1987. Structure dependence in grammar formation. Language 63(3): 522. Cupples, L., and A. L. Inglis. 1993. When task demands induce “asyntactic” comprehension: A study of sentence interpretation in aphasia. Cognitive Neuropsychology 10(3): 201–234. Damasio, Antonio R. 1992. Aphasia. New England Journal of Medicine 326(8): 531–539. Damasio, Hanna, and Antonio R. Damasio. 1980. The anatomical basis of conduction aphasia. Brain 103(2): 337–350. Daneman, Meredyth, and Margaret Newson. 1992. Assessing the importance of subvocalization during normal silent reading. Reading and Writing 4(1): 55–77. Del Gaizo, John et al. 2017. Mapping language networks using the structural and dynamic brain connectomes. Eneuro 4(5): 1–31. Demberg, Vera, and Frank Keller. 2008. 
Data from eye-tracking corpora as evidence for theories of syntactic processing complexity. Cognition 109(2): 193–210. Dick, Frederic et al. 2001. Language deficits, licalization, and grammar: Evidence for a distributive model of language breakdown in aphasics and normals. Psychological Review 108(4): 759–788. Dronkers, N. F. 1996. A new brain region for coordinating speech articulation. Nature 384(6605): 159–161. Dronkers, Nina F. et al. 2004. Lesion analysis of the brain areas involved in language comprehension. Cognition 92(1–2): 145–177. Embick, David, and David Poeppel. 2015. Towards a computational(ist) neurobiology of language: Correlational, integrated and explanatory neurolinguistics. Language, Cognition and Neuroscience 30(4): 357–366. Enke, A. Finn. 2016. One-eyed dog. Transgender Studies Quarterly 3(1–2): 81–83. Erickson, Robert J., Stephen D. Goldinger, and Leonard L. LaPointe. 1996. Auditory vigilance in aphasic individuals: Detecting nonlinguistic stimuli with full or divided attention. Brain and Cognition 30(2): 244–253. Fedorenko, Evelina, Edward Gibson, and Douglas Rohde. 2007. The nature of working memory in linguistic, arithmetic and spatial integration processes. Journal of Memory and Language 56(2): 246–269.

aphasia and syntax

627

Fedorenko, Evelina, John Duncan, and Nancy Kanwisher. 2012. Language-selective and domain-general regions lie side by side within Broca’s area. Current Biology 22(21): 2059– 2062. Fegen, David, Buchsbaum, Bradley R. and DEsposito, Mark. 2015. The effect of rehearsal rate and memory load on verbal working memory, NeuroImage, 105, 120–131. Fiebach, Christian J. et al. 2005. Revisiting the role of Broca’s area in sentence processing: Syntactic integration versus syntactic working memory, Human Brain Mapping, 24(2), 79–91. Fisher, Marc, Prichard, James W., and Warach, Steven. 1995. New Magnetic Resonance Techniques for Acute Ischemic Stroke, JAMA: The Journal of the American Medical Association, 274(11), 908–911. Fodor, Jerry, Bever, Thomas and Garrett, Merrill. 1974. The psychology of language. New York: McGraw-Hill. Frank, Robert. 2002. Phrase Structure Composition and Syntactic Dependencies. Cambridge, MA: MIT Press. Fridriksson, Julius et al. 2006. Functional communication and executive function in aphasia, Clinical Linguistics and Phonetics, 20(6), 401–410. Fridriksson, Julius, Bonilha, Leonardo, and Rorden, Chris. 2007. Severe Broca’s aphasia without Broca’s area damage, Behavioural Neurology, 18(4), 237–238. Fridriksson, J. et al. 2010. Impaired Speech Repetition and Left Parietal Lobe Damage, Journal of Neuroscience, 30(33), 11057–11061. Fridriksson, Julius et al. 2015. Chronic Broca’s aphasia is caused by damage to Broca’s and Wernicke’s areas, Cerebral Cortex, 25(12), 4689–4696. Fridriksson, Julius et al. 2018. Anatomy of aphasia revisited, Brain, 141(3), 848–862. Friedmann, NaAma, and Grodzinsky, Yosef. 1997. Tense and agreement in agrammatic production: Pruning the syntactic tree, Brain and Language, 56(3), 397–425. Friedmann, Naama, and Grodzinsky, Yosef. 2000. Split inflection in neurolinguistics, The Acquisition of Syntax: Studies in Comparative Developmental Linguistics, 84–104. Friedmann, Naama, and Gvion, Aviah. 2003. 
Sentence comprehension and working memory limitation in aphasia: A dissociation between semantic-syntactic and phonological reactivation, Brain and Language, 86(1), 23–39. Garraffa, Maria, and Grillo, Nino. 2008. Canonicity effects as grammatical phenomena, Journal of Neurolinguistics, 21(2), 177–197. Geranmayeh, Fatemeh, Brownsett, Sonia L. E., and Wise, Richard J. S. 2014. Task-induced brain activity in aphasic stroke patients: What is driving recovery?, Brain, 137(10), 2632– 2648. Geschwind, Norman. 1965. Disconnexion syndromes in animals and man, Brain, 88(3), 585– 585. Gibson, Edward. 1998. Linguistic complexity: Locality of syntactic dependencies, Cognition, 68(1), 1–76. Gibson, Edward. 2000. The dependency locality theory: A distance-based theory of linguistic complexity, in Marantz, A., Miyashita, Y., and ONeil, W. (eds) Image, language, brain. Cambridge, MA: MIT Press, 95–126. Gibson, Edward, Tily, Hal, and Fedorenko, Evelina. 2013. The processing complexity of English relative clauses, in Language Down the Garden Path: The Cognitive and Biological Basis for Linguistic Structure. Oxford: Oxford University Press, 149–173. Gleason, Jean Berko et al. 1975. The retrieval of syntax in Broca’s aphasia, Brain and Language, 2, 451–471.

628

william matchin and corianne rogalsky

Glosser, G., and Goodglass, H. 1990. Disorders in executive control functions among aphasic and other brain-damaged patients, Journal of Clinical and Experimental Neuropsychology, 12(4), 485–501. Goldberg, Adele E. 1995. Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press. Goodglass, Harold. 1968. Studies in the grammar of aphasics, in Rosenberg, S. and Koplin, J. (eds) Developments in applied psycholinguistic research. New York: MacMillan. Goodglass, Harold. 1976. Agrammatism, in Whitaker, H. and Whitaker, H. A. (eds) Studies in neurolinguistics (Vol. 1). New York: Academic Press. Goodglass, Harold. 1992. Diagnosis of conduction aphasia, in Kohn, S. E. (ed.) Conduction Aphasia. Hillsdale, N.J.: Lawrence Erlbaum Associates, 39–49. Goodglass, Harold, and Kaplan, Edith. 1983. The Assessment of Aphasia and related Disorders. 2nd edn. Philadelphia: Lea & Febiger. Goodglass, Harold et al. 1972. Some Linguistic Structures in the Speech of a Broca’s Aphasic, Cortex, 8(2), 191–212. Gordon, Barry. 1988. Preserved learning of novel information in amnesia: Evidence for multiple memory systems, Brain and Cognition, 7(3), 257–282. Gorno-Tempini, M. L. et al. 2011. Classification of primary progressive aphasia and its variants, Neurology, 76(11), 1006–1014. Grillo, Nino. 2008. Generalized minimality: Syntactic underspecification in Broca’s aphasia. University of Utrecht. Grillo, Nino. 2009. Generalized Minimality: Feature impoverishment and comprehension deficits in agrammatism, Lingua, 119(10), 1426–1443. Grodzinsky, Yosef. 1986. Language deficits and the theory of syntax, Brain and Language, 27(1), 135–159. Grodzinsky, Yosef. 2000. The neurology of syntax: Language use without Broca’s area, Behavioral and Brain Sciences, 23(1), 1–71. Grodzinsky, Yosef. 2006. A Blueprint for a Brain Map of Syntax, in Grodzinsky, Y. and Amunts, K. (eds) Broca’s Region. New York: Oxford University Press, 1–29. 
Grodzinsky, Yosef and Finkel, Lisa. 1998. The neurology of empty categories: Aphasics failure to detect ungrammaticality, Journal of Cognitive Neuroscience, 10(2), 281–292. Grodzinsky, Yosef, and Friederici, Angela D. 2006. Neuroimaging of syntax and syntactic processing, Current Opinion in Neurobiology, 16(2), 240–246. Grodzinsky, Yosef et al. 1993. The breakdown of binding relations, Brain and Language, 45(3), 396–422. Grodzinsky, Yosef, Zurif, Edgar, and Drai, Dan. 1999. The Critical Role of Group Studies in Neuropsychology: Comprehension Regularities in Broca’s Aphasia, Brain and Language, 67(2), 134–147. Gutman, Roee et al. 2010. Rasch models of aphasic performance on syntactic comprehension tests, Cognitive Neuropsychology, 27(3), 230–244. Gvion, Aviah and Friedmann, Naama. 2012. Does phonological working memory impairment affect sentence comprehension? A study of conduction aphasia, Aphasiology, 26(3–4), 494– 535. Hayashi, Yutaka et al. 2012. Correlation between language function and the left arcuate fasciculus detected by diffusion tensor imaging tractography after brain tumor surgery, Journal of Neurosurgery, 117(5), 839–843. Heilman, Kenneth M., and Robert J. Scholes. 1976. The nature of comprehension errors in Broca’s, Conduction and Wernicke’s Aphasics. Cortex 12(3): 258–265.

aphasia and syntax

629

Hickok, Gregory. 1992. Agrammatic comprehension and the trace-deletion hypothesis. Cambridge, MA: MIT Center for Cognitive Science. Hickok, Gregory, and Sergey Avrutin. 1995. Representation, referentiality, and processing in agrammatic comprehension: Two case studies. Brain and Language 50(1): 10–26. Hickok, Gregory, and Sergey Avrutin. 1996. Comprehension of wh-Questions in two Broca’s aphasics. Brain and Language 52(52): 314–327. Hickok, Gregory, and David Poeppel. 2007. The cortical organization of speech processing. Nature Reviews Neuroscience 8(5): 393–402. Hickok, Gregory, Edgar Zurif, and Enriqueta Canseco-Gonzalez. 1993. Structural description of agrammatic comprehension. Brain and Language 45(3): 371–395. Hickok, Gregory et al. 2003. Auditory-motor interaction revealed by fMRI: Speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience 15(5): 673–682. Hickok, Gregory et al. 2008. Bilateral capacity for speech sound processing in auditory comprehension: Evidence from Wada procedures. Brain and Language 107(3): 179–184. Hickok, Gregory, John Houde, and Feng Rong. 2011. Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron 69(3): 407–422. Hillis, Argye E., and Jennifer Heidler. 2002. Mechanisms of early aphasia recovery. Aphasiology 16(9): 885–895. Hornstein, Norbert. 2009. A theory of syntax. New York: Cambridge University Press. Huettel, S. A., A. W. Song, and G. McCarthy. 2014. Advanced fMRI methods. In Functional Magnetic Resonance Imaging, 3rd edn. Oxford: Oxford University Press. Idsardi, William, and Eric Raimy. 2013. Three types of linearization and the temporal aspects of speech. In Challenges to linearization, 31–56. Berlin: de Gruyter. Isenberg, A. Lisette et al. 2012. Functionally distinct regions for spatial processing and sensory motor integration in the planum temporale, Human Brain Mapping 33(10): 2453–2463. Jakobson, Roman. 1956. 
Two aspects of language and two types of aphasic disturbances. In R. Jakobson and M. Halle (eds), Fundamentals of language, 115–133. The Hague: Mouton. Jakuszeit, Maria, Sonja A. Kotz, and Anna S. Hasting. 2013. Generating predictions: Lesion evidence on the role of left inferior frontal cortex in rapid syntactic analysis. Cortex 49(10): 2861–2874. Jarso, Samson et al. 2013. Distinct mechanisms and timing of language recovery after stroke. Cognitive Neuropsychology 30(7–8): 454–475. Just, Marcel Adam, and Patricia A. Carpenter. 1992. A capacity theory of comprehension: Individual differences in working memory. Psychological Review 99(1): 122–149. Kean, Mary Louise. 1977. The linguistic interpretation of aphasic syndromes: Agrammatism in Broca’s aphasia, an example. Cognition 5(1): 9–46. Kertesz, Andrew. 2007. The Western Aphasia Battery, revised. New York: Grune & Stratton. Kitagawa, Yoshihisa. 1986. Subjects in Japanese and English. Thesis, University of Massachusetts, Amherst. Kolk, H. 1995. A time-based approach to agrammatic production. Brain and Language 50(3): 282–303. Kush, Dave, Clinton L. Johns, and Julie A. Van Dyke. 2015. Identifying the role of phonology in sentence-level reading. Journal of Memory and Language 79–80: 18–29. Lazar, Ronald M., and Daniel Antoniello. 2008. Variability in recovery from aphasia. Current Neurology and Neuroscience Reports 8(6): 497–502. Lazar, Ronald M. et al. 2008. Variability in language recovery after first-time stroke, Journal of Neurology, Neurosurgery and Psychiatry 79(5): 530–534.

630

william matchin and corianne rogalsky

Levelt, Willem J. M. 1989. Speaking: From intention to articulation. Cambridge, MA: MIT Press. Lewis, Richard L., and Shravan Vasishth. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science 29(3): 375–419. Lewis, Richard L., Shravan Vasishth, and Julie A. Van Dyke. 2006. Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences 10(10): 447–454. Lewis, Shevaun, and Colin Phillips. 2014. Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research 44(1): 27–46. Lichtheim, Ludwig. 1885. On aphasia. Brain 7: 433–484. Linebarger, Marcia C., Schwartz, Myrna F. and Saffran, Eleanor M. (1983) Sensitivity to grammatical structure in so-called agrammatic aphasics. Cognition 13(3): 361–392. Love, Tracy, and Kathleen Brumm. 2011. Language processing disorders. In L. P. Shapiro and R. Peach (eds), Cognition and acquired language disorders, 202–226. New York: Elsevier. Love, Tracy et al. 2008. How left inferior frontal cortex participates in syntactic processing: Evidence from aphasia. Brain and Language 107(3): 203–219. Luria, Alexander R. 1975. Two kinds of disorders in the comprehension of grammatical constructions. Linguistics 13(154–155): 47–56. Magnusdottir, S. et al. 2013. Damage to left anterior temporal cortex predicts impairment of complex syntactic processing: A lesion–symptom mapping study. Human Brain Mapping 34(10): 2715–2723. Marr, David. 1982. Vision: A computational investigation into the human representation and processing of visual information. Cambridge, MA: MIT Press. Matchin, William G. 2017. A neuronal retuning hypothesis of sentence-specificity in Broca’s area. Psychonomic Bulletin and Review 25(5): 1–13. Matchin, W., & Hickok, G. 2020. The cortical organization of syntax. Cerebral Cortex 30(3): 1481–1498. Matchin, William, Jon Sprouse, and Gregory Hickok. 2014. 
A structural distance effect for backward anaphora in Broca’s area: An fMRI study. Brain and Language 138: 1–11. McElree, Brian, Stephani Foraker, and Lisbeth Dyer. 2003. Memory structures that subserve sentence comprehension, Journal of Memory and Language 48(1): 67–91. Mesulam, M. Marsel et al. 2014. Primary progressive aphasia and the evolving neurology of the language network. Nature Reviews Neurology 10(10): 554–569. Mesulam, M. Marsel et al. 2015. The Wernicke conundrum and the anatomy of language comprehension in primary progressive aphasia. Brain 138(8): 2423–2437. Miller, George A. 1962. Some psychological studies of grammar. American Psychologist 17(11): 748–762. Miller, George A., and Noam Chomsky. 1963. Finitary models of language users. In Duncan Iuce, Robert R. Bush, and Eugene Galanter (eds), Handbook of Mathematical Psychology, vol. 2, 419–491. New York: Wiley. Mohr, J. P. 1976. Broca’s area and Broca’s aphasia. In H. Whitaker and H. Whitaker (eds), Studies in neurolinguistics, vol. 1, 201–235. New York: Elsevier. Mohr, J. P. et al. 1978. Broca aphasia: Pathologic and clinical. Neurology 28(4): 311–324. Murray, Laura L. Holland, Audrey L. Beeson, and M. Pelagie. 1998. Spoken language of individuals with mild fluent aphasia under focused and divided-attention conditions. Journal of Speech, Language and Hearing Research 41: 213–227. Naeser, M. A., and R. W. Hayward. 1978. Lesion localization in aphasia with cranial computed tomography and the Boston Diagnostic Aphasia Exam. Neurology 28(6): 545–551.

aphasia and syntax

631

Nakayama, Mineharu. 1987. Performance factors in subject–auxiliary inversion by children. Journal of Child Language 14(1): 113. Ogar, J. M. et al. 2011. Semantic dementia and persisting Wernicke’s aphasia: Linguistic and anatomical profiles, Brain and Language 117(1): 28–33. Okada, Kayoko et al. 2003. Word length modulates neural activity in auditory cortex during covert object naming. Neuroreport 14(18): 2323–2326. Patterson, J. P. 2008. Assessment of language disorders in adults. In R. Chapey (ed.), Language intervention strategies in aphasia and related neurogenic communication disorders, 64–160. Baltimore, MD: Wolters Kluwer. Patterson, Karalyn, Peter J. Nestor, and Timothy T. Rogers. 2007. Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience 8(12): 976–987. Pedersen, Palle Møller, Kirsten Vinter, and Tom Skyhøj Olsen. 2004. Aphasia after stroke: Type, severity and prognosis. The Copenhagen aphasia study. Cerebrovascular Diseases 17(1): 35– 43. Pesetsky, David. 1987. WH-in-situ: Movement and unselective binding. In E. J. Reuland and A. ter Meulen (eds), The representation of (in)definiteness, 98–129. Cambridge, MA: MIT Press. Pettigrew, Corinne, and Argye E. Hillis. 2014. Role for memory capacity in sentence comprehension: Evidence from acute stroke. Aphasiology 28(10): 1258–1280. Phillips, Colin. 1996. Order and structure. Cambridge, MA: MIT Press. Pillay, Sara B. et al. 2017. Lesion localization of speech comprehension deficits in chronic aphasia. Neurology 88(10): 970–975. Poeppel, David, and David Embick. 2005. Defining the relation between linguistics and neuroscience. In A. Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 103–118. Hillsdale, NJ: Erlbaum. Poeppel, David, and Gregory Hickok. 2004. Towards a new functional anatomy of language. Cognition 92(1–2): 1–12. Pollard, Carl, and Ivan A. Sag. 1994. Head-driven phrase structure grammar. 
Chicago: University of Chicago Press. Pollock, J. Y. 1989. Verb movement, Universal Grammar, and the structure of IP. Linguistic inquiry, 20(3), 365–424. Prather, Penny et al. 1992. Slowed lexical access in nonfluent aphasia: A case study, Brain and Language 43(2): 336–348. Rilling, James K. et al. 2008. The evolution of the arcuate fasciculus revealed with comparative DTI. Nature Neuroscience 11(4): 426–428. Rizzi, Luigi. 1990. Relativized minimality. Cambridge, MA: MIT Press. Roelofs, Ardi, and Victor S. Ferreira. n.d. The architecture of speaking. https://www. ardiroelofsscience.nl/Roelofs_Ferreira_Chptr_2019.pdf Rogalsky, Corianne, William Matchin, and Gregory Hickok. 2008. Broca’s area, sentence comprehension, and working memory: An fMRI study. Frontiers in Human Neuroscience 2(14): 1–13. Rogalsky, Corianne et al. 2015. Sentence processing selectivity in Broca’s area: Evident for structure but not syntactic movement. Language, Cognition and Neuroscience 30(10): 1326–1338. Rogalsky, Corianne et al. 2018. The neurobiology of agrammatic sentence comprehension: A lesion study. Journal of Cognitive Neuroscience 30(2): 234–255.

632

william matchin and corianne rogalsky

Rorden, Chris, Hans Otto Karnath, and Leonardo Bonilha. 2007. Improving lesion–symptom mapping. Journal of Cognitive Neuroscience 19(7): 1081–1088. Santi, Andrea, and Yosef Grodzinsky. 2007a. Taxing working memory with syntax: Bihemispheric modulations. Human Brain Mapping 28(11): 1089–1097. Santi, Andrea, and Yosef Grodzinsky. 2007b. Working memory and syntax interact in Broca’s area. NeuroImage 37(1): 8–17. Schwartz, Myrna F., Eleanor M. Saffran, and Oscar S. M. Marin. 1980. The word order problem in agrammatism, I. Comprehension. Topics in Catalysis 10(2): 249–262. Sebastian, Rajani et al. 2016. Imaging network level language recovery after left PCA stroke. Restorative Neurology and Neuroscience 34(4): 473–489. Shahid, Hinna et al. 2017. Important considerations in lesion–symptom mapping: Illustrations from studies of word comprehension. Human Brain Mapping 38: 2990–3000. Shallice, Tim. 1979. Case study approach in meuropsychological research. Journal of Clinical Neuropsychology 1(3): 183–211. Sheppard, Shannon M. et al. 2015. The auditory comprehension of wh-questions in aphasia: Support for the intervener hypothesis. Journal of Speech Language and Hearing Research 58(3): 781. Slowiaczek, Maria L., and Charles Clifton, Jr. 1980. Subvocalization and reading for meaning. Journal of Verbal Learning and Verbal Behavior 19(5): 573–582. Sprouse, Jon, and Norbert Hornstein. 2015. Syntax and the cognitive neuroscience of syntactic structure building. In G. Hickok and S. L. Small (eds), Neurobiology of language, 165–174. New York: Elsevier. Stefanatos, Gerry A., and Ida Sue Baron. 2011. The ontogenesis of language impairment in autism: A neuropsychological perspective. Neuropsychology Review 21(3): 252–270. Teuber, H. L. 1955. Physiological psychology. Annual Review of Psychology 6(1): 267–296. Thompson, C. K. et al. 1997. Agrammatic and non-brain-damaged subjects verb and verb argument structure production. Aphasiology 11(4–5): 473–490. Thompson, Cynthia K. et al. 
1999. Agrammatic aphasic subjects comprehension of subject and object extracted Wh questions. Brain and Language 67(3): 169–187. Thothathiri, M., D. Y. Kimberg, and M. F. Schwartz. 2012. The neural basis of reversible sentence comprehension: Evidence from voxel based lesion-symptom mapping in aphasia, Journal of Cognitive Neuroscience 24(1): 212–222. Tremblay, Pascale, and Anthony Steven Dick. 2016. Broca and Wernicke are dead, or moving past the classic model of language neurobiology. Brain and Language 162: 60–71. Tyler, Lorraine K. et al. 2011. Left inferior frontal cortex and syntax: Function, structure and behaviour in patients with left hemisphere damage. Brain 134(2): 415–431. Vaden, K. I. et al. 2013. The cingulo-opercular metwork provides word-recognition benefit. Journal of Neuroscience 33(48): 18979–18986. Van Orden, Guy C., Bruce F. Pennington, and Gregory O. Stone. 2001. What do double dissociations prove? Cognitive Science 25(1): 111–172. Wernicke, Carl. 1874. Der aphasische Symptomencomplex. Eine psychologische Studie auf anatomischer Basis. Breslau: Cohn & Weigert. Wilson, Stephen M. 2017. Lesion–symptom mapping in the study of spoken language understanding. Language, Cognition and Neuroscience 32(7): 891–899. Wilson, Stephen M., and Ayşe Pinar Saygin. 2004. Grammaticality judgment in aphasia: Deficits are not specific to syntactic structures, aphasic syndromes, or lesion sites. Journal of Cognitive Neuroscience 16(2): 238–252.

aphasia and syntax

633

Wilson, Stephen M. et al. 2010. Connected speech production in three variants of primary progressive aphasia. Brain 133(7): 2069–2088. Wilson, Stephen M. et al. 2011. Syntactic processing depends on dorsal language tracts. Neuron 72(2): 397–403. Wulfeck, B. B. 1988. Grammaticality judgments and sentence comprehension in agrammatic aphasia, Journal of Speech and Hearing Research 31(1): 72–81. Wulfeck, B., and E. Bates. 1991. Differential sensitivity to errors of agreement and word order in Broca’s aphasia. Journal of Cognitive Neuroscience 3(3): 258–272. Yagmurlu, Kaan et al. 2016. Fiber tracts of the dorsal language stream in the human brain. Journal of Neurosurgery 124(5): 1396–1405. Yourganov, Grigori et al. 2016. Multivariate connectome-based symptom apping in poststroke patients: Networks supporting language and speech. Journal of Neuroscience 36(25): 6668–6679. Zurif, E. B. (1980). Language Mechanisms: A Neuropsychological Perspective: The effects of focal brain damage on the processing of syntactic elements may provide an important clue to the manner in which language is organized in the brain. American Scientist 68(3): 305– 311. Zurif, E. B., A. Caramazza, and R. Myerson. 1972. Grammatical judgments of agrammatic aphasics. Neuropsychologia 10(4): 405–417. Zurif, E. et al. 1993. An on-line analysis of syntactic processing in Broca’s and Wernicke’s aphasia. Brain and Language 45: 448–464.

Annotated bibliography for Part IV


Each of the contributors in this section has compiled a brief annotated bibliography of resources for readers interested in learning how to use the methods discussed in the chapters. The annotated bibliographies are organized below by chapter.

***

Chapter 15. Electrophysiological methods (Jon Sprouse and Diogo Almeida)


Luck, Steven J. 2014. An introduction to the event-related potential technique. Cambridge, MA: MIT Press.

Steve Luck’s textbook is the place to start for anyone interested in learning EEG. It provides a complete introduction to all of the fundamentals of EEG and to the most common analysis technique, the event-related potential (ERP) technique. The newest edition also includes a number of online chapters that delve into more advanced topics and analysis techniques.

Cohen, Mike X. 2014. Analyzing neural time series data. Cambridge, MA: MIT Press.

Mike X. Cohen’s textbook is the one to read for anyone interested in learning time-frequency decomposition. It can, in principle, be read on its own, but it will likely be most useful to readers who are already familiar with the basics of EEG and the ERP technique.

Delorme, Arnaud, and Scott Makeig. 2004. EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics. Journal of Neuroscience Methods 134: 9–21.
Download: https://sccn.ucsd.edu/eeglab/index.php

EEG data analysis requires familiarity with one or more programming languages. At present, there are far more resources for EEG data analysis written in Matlab than in any other language. EEGLAB is a free toolbox for Matlab that provides a complete solution to EEG analysis. It has three particular strengths: (i) users can write their own toolboxes for EEGLAB to extend its functionality, (ii) it has both a scripting option and a graphical user interface, and (iii) it has the most well-developed set of ICA (independent components analysis) tools.

Lopez-Calderon, Javier, and Steven J. Luck. 2014. ERPLAB: An open-source toolbox for the analysis of event-related potentials. Frontiers in Human Neuroscience 8: 213.
Download: https://erpinfo.org/erplab/


ERPLAB is a free toolbox for EEGLAB developed by the Luck lab. It provides a complete solution to using the ERP technique (and also implements a number of methodological recommendations found in Luck 2014). As a toolbox for EEGLAB, it provides both a scripting option and a graphical user interface.

Oostenveld, Robert, Pascal Fries, Eric Maris, and Jan-Mathijs Schoffelen. 2011. FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience 2011: 1.
Download: http://www.fieldtriptoolbox.org/

FieldTrip is a free toolbox for Matlab that provides a complete solution to EEG analysis. Though it can be used for standard ERP analysis, FieldTrip’s strength lies in advanced analysis techniques, like time-frequency analysis. We typically use ERPLAB for ERP analysis and FieldTrip for time-frequency analysis.

Cohen, Mike X. 2017. Matlab for brain and cognitive scientists. Cambridge, MA: MIT Press.

For readers who wish to learn more about Matlab programming, Mike X. Cohen has written a Matlab textbook that is specifically tailored to EEG data analysis.

Groppe, David M., Thomas P. Urbach, and Marta Kutas. 2011. Mass univariate analysis of event-related brain potentials/fields I: A critical tutorial review. Psychophysiology 48: 1711–1725.
Download: https://openwetware.org/wiki/Mass_Univariate_ERP_Toolbox

Fields, Eric C., and Gina R. Kuperberg. 2018. Having your cake and eating it too: Flexibility and power with mass univariate statistics for ERP data. PsyArXiv. doi:10.31234/osf.io/qfkgc.
Download: https://github.com/ericcfields/FMUT/wiki

The statistical analysis of EEG data is a particularly complex topic given the extreme multiple comparisons problem posed by multiple electrodes and high sampling rates. Mass univariate permutation tests provide a good solution to this problem. The FieldTrip toolbox implements its own mass univariate permutation tests. For EEGLAB, there are two plugins. The Mass Univariate Toolbox implements permutation tests for one-condition and two-condition experimental designs (t-tests). The Factorial Mass Univariate Toolbox implements permutation tests for factorial designs.

A list of open-source textbooks: https://aimath.org/textbooks/approved-textbooks/

EEG data analysis requires relatively complex mathematics. In most cases, the software solutions discussed above will perform the math without requiring user intervention. For readers interested in a deeper understanding of the math, here is a list of free, open-source math textbooks. The important concepts will likely be found in trigonometry and linear algebra (sine/cosine, dot product, Fourier transform, convolution, complex numbers, etc.).

***
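As a rough illustration of how several of the concepts listed for Chapter 15 fit together, the sketch below computes a single Fourier coefficient as the dot product of a signal with a complex sinusoid. It is written in plain Python rather than Matlab, and it is not taken from EEGLAB, ERPLAB, or FieldTrip: the function names (`dot`, `complex_sinusoid`, `fourier_coefficient`, `amplitude_spectrum`) and the simulated signal are hypothetical and chosen only for this example.

```python
import cmath
import math


def dot(a, b):
    """Dot product of two equal-length sequences, conjugating the second."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def complex_sinusoid(freq_hz, srate_hz, n_samples):
    """Complex sine wave: cosine in the real part, sine in the imaginary part."""
    return [cmath.exp(2j * math.pi * freq_hz * k / srate_hz) for k in range(n_samples)]


def fourier_coefficient(signal, freq_hz, srate_hz):
    """One Fourier coefficient: the dot product of the signal with a complex sinusoid."""
    kernel = complex_sinusoid(freq_hz, srate_hz, len(signal))
    return dot(signal, kernel)


def amplitude_spectrum(signal, srate_hz, freqs_hz):
    """Amplitude at each requested frequency, scaled so a pure sine of amplitude A gives A."""
    n = len(signal)
    return {f: 2 * abs(fourier_coefficient(signal, f, srate_hz)) / n for f in freqs_hz}


# Hypothetical test signal (illustration only): 1 s sampled at 100 Hz,
# a 10 Hz sine of amplitude 3 plus a 25 Hz sine of amplitude 1.
SRATE = 100
signal = [
    3 * math.sin(2 * math.pi * 10 * k / SRATE) + math.sin(2 * math.pi * 25 * k / SRATE)
    for k in range(SRATE)
]

spectrum = amplitude_spectrum(signal, SRATE, [5, 10, 25])
for freq, amp in spectrum.items():
    print(f"{freq:>2} Hz: amplitude {amp:.3f}")
```

Because the simulated signal contains a whole number of cycles at 10 Hz and 25 Hz, the dot products recover amplitudes of about 3 and 1 at those frequencies and about 0 at 5 Hz. Sliding a short, tapered version of the same complex sinusoid along the signal (a convolution) is, roughly, the step that turns this into a time-frequency analysis; the toolboxes above handle that step, along with windowing and edge effects, far more carefully than this sketch does.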

Chapter 16. Hemodynamic methods (Jonathan R. Brennan)


Embick, D., and Poeppel, D. 2015. Towards a computational(ist) neurobiology of language: Correlational, integrated, and explanatory neurolinguistics. Language, Cognition and Neuroscience, 30(4): 357–366.

This paper presents a deep dive into the complexities of connecting theories of competence, like those proposed in syntax, with performance accounts that bear on neural mechanisms.

Hagoort, P., and Indefrey, P. 2014. The neurobiology of language beyond single words. Annual Review of Neuroscience, 37: 347–62.

This is a comprehensive review and meta-analysis of brain regions implicated in processing sentences, excluding regions implicated in speech or word processing. The analysis is couched within a theoretical framework that highlights the importance of predictive processing.

Hickok, G., and Poeppel, D. 2007. The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5): 393–402.

This influential model of the neural bases of speech perception posits “two streams” of information processing. The one involved in comprehension passes from the auditory cortex to a “combinatory hub” on the anterior aspect of the left temporal lobe.

Rogalsky, C., and Hickok, G. 2010. The role of Broca’s area in sentence comprehension. Journal of Cognitive Neuroscience, 23(7): 1–17.

A measured take on the status of the famous “Broca’s Area” as it relates to sentence processing. The main take-away is that there is “no compelling evidence that there are sentence-specific processing regions within Broca’s area.”

Just, M., Carpenter, P., Keller, T., Eddy, W., and Thulborn, K. 1996. Brain activation modulated by sentence comprehension. Science, 274(5284): 114–116.

One of the earliest papers to apply hemodynamic methods to studying syntax. The results implicate the left inferior frontal gyrus in processing complex relative clauses.

annotated bibliography for part iv

Pallier, C., Devauchelle, A.-D., and Dehaene, S. 2011. Cortical representation of the constituent structure of sentences. Proceedings of the National Academy of Sciences, 108(6): 2522–2527. One of the first papers to apply a computational model of sentence comprehension to fMRI signals. Their accumulator model posits a relatively direct link between the length of a phrase and the amount of sentence-processing effort required, equivalent to a bottom-up parser. The results implicate a range of inferior frontal and temporal brain regions whose activation increases for longer phrases.

Brennan, J. R., Stabler, E. P., Van Wagenen, S. E., Luh, W.-M., and Hale, J. T. 2016. Abstract linguistic structure correlates with temporal activity during naturalistic comprehension. Brain & Language, 157–158: 81–94. This study quantitatively compares models that differ in terms of the grammar they use: one is based on a simpler context-free grammar, and the other is based on a more abstract minimalist grammar. A statistical analysis tests for the linear relationship between hemodynamic signals and the number of syntactic nodes posited by an incremental parsing algorithm. Brain activity from the left anterior and posterior temporal lobe is better fit by the more abstract minimalist grammar than by the simpler context-free grammar.

Matchin, W., Sprouse, J., and Hickok, G. 2014. A structural distance effect for backward anaphora in Broca’s area: an fMRI study. Brain & Language, 138: 1–11. This fMRI study disentangles syntactic movement from predictability by comparing English wh-questions with backwards anaphora. The results show activation for Broca’s Area in both cases. This is inconsistent with accounts that link Broca’s Area with syntactic movement, specifically, and instead implicates a predictive role for this region in sentence processing.

Frankland, S. M., and Greene, J. D. 2015. An architecture for encoding sentence meaning in left mid-superior temporal cortex. Proceedings of the National Academy of Sciences, 112(37): 11732–11737. This study applies “multi-voxel pattern analysis” to identify hemodynamic signals that respond to abstract thematic roles. The analysis shows a region in the posterior of the left temporal lobe whose activity patterns are consistent with neural registers that are tuned to different thematic roles.

Zaccarella, E., and Friederici, A. D. 2015. Merge in the human brain: A sub-region based functional investigation in the left pars opercularis. Frontiers in Psychology, 6: 1818. Building on electrophysiological studies by Pylkkänen and colleagues, this fMRI study compares simple phrases with word lists and implicates just a sub-part of the left inferior frontal gyrus in phrase-structure processing. While the authors connect this with the linguistic construct “Merge”, see Embick and Poeppel, above, on the complexities of mapping directly between neural signals and linguistic constructs.

***

Chapter 17. Aphasia and syntax (William Matchin and Corianne Rogalsky)

..........................................................................................................................

Bates, Elizabeth et al. 2003. ‘Voxel-based lesion-symptom mapping’, Nature Neuroscience, 6(5), pp. 448–450. This paper is the first to describe the method of voxel-based lesion symptom mapping, which provides researchers with the ability to relate behavioral findings or deficits to specific areas of brain damage as depicted on structural MRIs with high spatial resolution across a sample of patients.

Caplan, David et al. 2016. ‘Deficit-lesion correlations in syntactic comprehension in aphasia’, Brain and Language, 152, pp. 14–27. A lesion-symptom mapping study in individuals with aphasia that examined four types of syntactic structures and elements known to be difficult in Broca’s aphasia, three different types of sentence comprehension tasks, and two measures of comprehension (off-line and on-line). This design allowed Caplan et al. to demonstrate that the neural resources critical for sentence comprehension are not only dependent upon the type of syntactic information, but also the task-related cognitive operations required.

Caplan, David, Hildebrandt, Nancy, and Marshall, John C. 1988. Disorders of syntactic comprehension. Cambridge: MIT Press. A systematic review of agrammatism in light of syntactic theory, psycholinguistics, and computational parsing theory, a combination that is quite rare today.

Caplan, David, and Waters, Gloria S. 1999. ‘Verbal working memory and sentence comprehension’, Behavioral and Brain Sciences, 22(1), pp. 77–126. This study presented a clear case for two separate pools of working memory resources, domain-general and language-specific, and argued that these two pools can dissociate.

Caramazza, Alfonso, and Zurif, Edgar B. 1976. ‘Dissociation of algorithmic and heuristic processes in language comprehension: Evidence from aphasia’, Brain and Language, 3(4), pp. 572–582. This study was the first to clearly demonstrate sentence comprehension deficits in Broca’s aphasia, namely the contrast between semantically reversible (worse performance) and semantically non-reversible (better performance) object-relative sentences. This paper is marked as the origin of the notion that a central syntactic deficit underlies agrammatism. In addition, a lesser-known observation of this study was that the conduction aphasia patients (without classic production agrammatism) showed the same comprehension profile as the Broca’s aphasia patients, a fact that we think has not received nearly enough attention in the aphasia literature.

Fedorenko, Evelina, Duncan, John, and Kanwisher, Nancy 2012. ‘Language-selective and domain-general regions lie side by side within Broca’s area’, Current Biology, 22(21), pp. 2059–2062. This study clearly demonstrated that there are distinct subregions within Broca’s area: one region that responds to both sentences and a variety of cognitively demanding tasks (including verbal working memory), and one region that responds to sentences but not to these demanding tasks. These distinct subregions may map onto the distinct pools of resources hypothesized by Caplan and Waters (1999). It is also notable that these distinct subregions of Broca’s area are often missed in group-averaged neuroimaging studies, but can be identified in individual subject analyses.

Fridriksson, Julius et al. 2015. ‘Chronic Broca’s aphasia is caused by damage to Broca’s and Wernicke’s areas’, Cerebral Cortex, 25(12), pp. 4689–4696. The first large-scale investigation of the brain regions associated with Broca’s aphasia using voxel-based lesion symptom mapping. Notably, damage to both Broca’s area and Wernicke’s area correctly classified patients with Broca’s aphasia 95% of the time. This finding reflects the complexity of brain damage patterns in individuals with Broca’s aphasia.

Grodzinsky, Yosef 1986. ‘Language deficits and the theory of syntax’, Brain and Language, 27(1), pp. 135–159. This paper was the first to lay out the logic for and evidence behind the influential Trace Deletion Hypothesis (TDH), the idea that problems with syntactic movement underlie agrammatic comprehension in Broca’s aphasia, and the association between Broca’s area and syntactic movement.

Hickok, Gregory, and Poeppel, David 2007. ‘The cortical organization of speech processing’, Nature Reviews Neuroscience, 8(5), pp. 393–402. This paper presents an overview of (arguably) the most prominent functional anatomical model of language, the dual-stream model of speech processing. For the present discussion, it is notable that the dual-stream model (1) overlaps with the phonological working memory system, and (2) identifies both frontal and temporal-parietal brain regions to be engaged in sentence comprehension.

Lewis, Shevaun, and Phillips, Colin 2014. ‘Aligning Grammatical Theories and Language Processing Models’, Journal of Psycholinguistic Research, 44(1), pp. 27–46. This paper discusses the relation between the grammar (from the perspective of syntactic theory) and real-time sentence comprehension and production, reviewing experimental evidence bearing on this issue and arguing in favor of a one-system view in which the grammar is directly implemented in online processing.

Lewis, Richard L., and Vasishth, Shravan 2005. ‘An activation-based model of sentence processing as skilled memory retrieval’, Cognitive Science, 29(3), pp. 375–419. This paper introduces the content-addressable memory model of parsing in computational detail, as well as reviewing conceptual and empirical evidence for this model of memory. It marks a departure in psycholinguistics from the dominance of a verbal working memory storage model to a model of memory based on limited storage and re-activation of items in long-term memory. The model makes explicit that the memory representations and retrieval cues are syntactic features, which opens the path for a connection to language-specific deficits in aphasia and agrammatism more specifically.

Linebarger, Marcia C., Schwartz, Myrna F., and Saffran, Eleanor M. 1983. ‘Sensitivity to grammatical structure in so-called agrammatic aphasics’, Cognition, 13(3), pp. 361–392. This paper clearly demonstrated the asymmetry between agrammatic sentence comprehension in Broca’s aphasia and the capacity for these patients to make subtle acceptability judgments. This paper is widely cited, yet is not cited enough, regarding claims about the function of Broca’s area, Broca’s aphasia, and syntactic processing. It remains one of the most important findings for any theory to address.

Mohr, J. P. et al. 1978. ‘Broca aphasia: Pathologic and clinical’, Neurology, 28(4), pp. 311–324. This paper examined several cases of Broca’s aphasia and demonstrated that damage to Broca’s area is neither necessary nor sufficient to cause Broca’s aphasia. This paper is important because it means that we cannot assume that a patient with agrammatic Broca’s aphasia has a lesion to Broca’s area, and many papers on agrammatism have made that assumption.

Pettigrew, Corinne, and Hillis, Argye E. 2014. ‘Role for memory capacity in sentence comprehension: Evidence from acute stroke’, Aphasiology, 28(10), pp. 1258–1280. This study of a large sample of acute stroke patients strongly links short-term memory impairments and damage to the left-lateralized short-term memory/working memory system to comprehension deficits for semantically reversible sentences.

Poeppel, David, and Embick, David 2005. ‘Defining the relation between linguistics and neuroscience’, in Cutler, A. (ed.) Twenty-first century psycholinguistics: four cornerstones, pp. 103–118. New York: Routledge. This paper clearly laid out the serious challenges of integrating linguistics with cognitive neuroscience, challenges that we think have been under-appreciated by linguists looking to bridge these gaps.

Schwartz, Myrna F., Saffran, Eleanor M., and Marin, Oscar S. M. 1980. ‘The word order problem in agrammatism: I. Comprehension’, Brain and Language, 10(2), pp. 249–262. This study found that patients with agrammatic Broca’s aphasia often struggle with sentences that do not involve Movement: reversible active and locative sentences. In our view, there has never been a thorough attempt to account for these data.

Shahid, Hinna et al. 2017. ‘Important considerations in lesion-symptom mapping: Illustrations from studies of word comprehension’, Human Brain Mapping, 38(6), pp. 2990–3000. This paper, written by leading language neuroimaging researchers, is in many ways a thorough follow-up to Bates et al. 2003, providing state-of-the-art statistical approaches and current recommendations and caveats for using lesion-symptom mapping to investigate brain regions associated with specific language deficits.

Stromswold, K., Caplan, D., Alpert, N., and Rauch, S. 1996. ‘Localization of syntactic comprehension by positron emission tomography’, Brain and Language, 52(3), pp. 452–473. This paper was one of the first and clearest papers to demonstrate that sentences with noncanonical word order activate the posterior portion of Broca’s area (pars opercularis) more than sentences with canonical word order.

Wilson, Stephen M., and Saygin, Ayşe Pinar 2004. ‘Grammaticality Judgment in Aphasia: Deficits Are Not Specific to Syntactic Structures, Aphasic Syndromes, or Lesion Sites’, Journal of Cognitive Neuroscience, 16(2), pp. 238–252. This study represents a careful and systematic assessment of the Trace Deletion Hypothesis and the role of Broca’s area in syntax more generally. It clearly showed that while Broca’s area damage appears somewhat related to sentence processing, it is not more or less involved in sentences with Movement than in sentences without Movement, and there are other brain areas that are much more strongly related to syntactic knowledge, namely posterior temporal areas. This study has been very under-cited, in our view, in discussions of syntactic theory and aphasia.

Chapter 18. The future of experimental syntax

..........................................................................................................................

The contributors for each of the chapters were asked to write a mini-essay on what they see as the future of experimental syntax. The mini-essays are presented here, organized alphabetically by the last name of the first author.

Diogo Almeida

..........................................................................................................................

The ability to collect, quantify and analyze acceptability data in more precise and controlled ways was one of the strong selling points early on in the development of experimental syntax. Indeed, there are two strands of important results that have emerged thus far capitalizing on this property. The first involves the vindication of the traditional informal methodology used by syntacticians against the claims that it yields largely unreliable data (e.g. Sprouse and Almeida 2012; Sprouse, Schütze, and Almeida 2013; Chen, Xu and Xie 2020). The second involves the demonstration that some puzzling putative cross-linguistic differences that have been discussed at length in the theoretical literature may have been at least partially artefactual: For instance, the purported absence of superiority, that-trace, and syntactic island effects in some languages has been called into question when simplifying assumptions, such as sidestepping gradience in favor of binary judgments, are revisited and re-evaluated (e.g. Featherston 2005a; 2005b; 2005c; Almeida 2014; Kush, Lohndal, and Sprouse 2019).

These results present syntacticians with interesting opportunities, as they suggest the possibility of significant theoretical simplification. The price-tag for such a development, however, is to interrogate and explicitly evaluate any and all unstated simplifying assumptions and to take the quantification of acceptability judgments, their gradient nature, and their relationship with theoretical constructs more seriously. Thus, in my opinion, the field is ripe for the development of more structured thinking about the relationship between acceptability judgments as behavioral measures, and the development of substantive theories of syntax. A more precise understanding of this relationship would not only benefit syntactic theory, but would also allow for more detailed quantitative predictions that can be confronted with data. For instance, the way syntacticians conceive of syntactic rules and constraints bears a strong resemblance to the notion of latent variable in multivariate modelling in statistics, but this connection has thus far been left largely unexplored (cf. Langsford, Stephens, Dunn, and Lewis 2019 for a recent attempt). Better integrated substantive theories of syntax and acceptability judgments would benefit from the development of statistical models that can serve as a lens through which data can be evaluated (in the sense of Kass 2011).

In addition, if consideration of the nature of the connection between syntactic theory and a behavioral measure such as acceptability judgment becomes part of the routine work of syntacticians, it might provide a blueprint for how other behavioral measures, for instance from parsing and sentence processing, may be jointly considered with theory, strengthening the links between syntacticians and psycho- and neurolinguists. Such a development would presumably foster a clearer and more fruitful exploration of how the objects of study of each group actually relate to one another.

Mara Breen and Katy Carlson

..........................................................................................................................

Many interesting topics remain in the area of prosody and sentence processing. There are structures whose prosodic contours have not been studied yet; there are questions about how prosodic phrasing and accents separably or jointly influence sentence processing; there are languages in which interactions between prosody and syntax have not yet been studied; and so on.

Moving forward, we see several trends within the field of psycholinguistics which will likely affect future empirical prosodic work. One is a rise in the popularity of probabilistic computational models of sentence processing, such as Levy’s surprisal framework (2008; related to Hale 2001) and Gibson’s noisy-channel theory (Gibson et al. 2013). Theories of this general type focus on word-by-word probability and predictability as a primary determinant of processing difficulty, relying on speakers and listeners tracking the frequency of strings of words (and less on syntactic structures that are built and interpreted). Current approaches to modeling prosody consider it part of a maximally efficient linguistic system constrained by speakers’ motivation to spread information evenly throughout an utterance (Bergen and Goodman 2015). This idea, known as Uniform Information Density (Jaeger 2006; Levy and Jaeger 2007; Frank and Jaeger 2008) or the Smooth Signal Redundancy Hypothesis (Aylett 2000; Aylett and Turk 2004; Turk 2010), maintains that speakers use prosody to distribute redundancy throughout an utterance, with a tradeoff between linguistic redundancy and acoustic redundancy. Words that are unpredictable (not redundant) are produced with greater acoustic prominence than words that are predictable (redundant). Applying this line of reasoning to prosodic processing could lead to experiments testing a very local view of prosodic influences, such as: Given the prior word/phrase and a prosodic boundary after it, how likely is the next word to be a particular category or begin a particular phrase type? Similarly, if the prior word/phrase is accented, what does this predict for the next word? These are interesting questions, but should not be studied at the expense of larger issues of prosodic phrasing, the interplay between prosodic and syntactic structures, and the role of focus in information structure.

A positive trend in prosodic sentence-processing work is the study of a more diverse set of languages. Theories of prosody’s influence on syntactic processing must not be based only on English, and work on many other languages is now being done. Xu and colleagues, for example, address how languages with lexical tone such as Mandarin use larger-scale prosody, what phonetic changes result, and how easily those are perceived given that lexical tone already employs pitch changes in its realization (e.g. Xu and Wang 2009; Xu 2011). Similar trends include the study of prosody use by second-language speakers and the separability or interaction of musical perception and training with prosodic perception (e.g. Patel 2008). All of these investigations broaden our understanding of what prosody can do, how it interacts with syntax in widely different languages, as well as how very different prosodic systems work, how prosody is processed in the brain, and how speakers with different levels of language facility use it.
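As a rough illustration of the word-by-word predictability measure that the surprisal framework mentioned above relies on, the sketch below computes the surprisal of each word, that is, the negative log probability of the word given its context. The bigram context, the add-one smoothing, the toy corpus, and the names `TOY_CORPUS` and `bigram_surprisal` are all assumptions made for this example; they are not taken from Levy (2008) or Hale (2001), whose models condition on much richer contexts.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented purely for illustration.
TOY_CORPUS = ["the dog chased the cat", "the dog barked", "the cat slept"]


def bigram_surprisal(corpus, sentence):
    """Return (word, surprisal in bits) pairs under an add-one-smoothed bigram model."""
    context_counts = Counter()  # how often each word occurs as a left context
    bigram_counts = Counter()   # how often each (previous, current) pair occurs
    vocab = set()
    for line in corpus:
        tokens = ["<s>"] + line.split()
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    tokens = ["<s>"] + sentence.split()
    vocab_size = len(vocab | set(tokens))
    result = []
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from having zero probability.
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        result.append((word, -math.log2(p)))
    return result


if __name__ == "__main__":
    for word, bits in bigram_surprisal(TOY_CORPUS, "the dog slept"):
        print(f"{word}: {bits:.2f} bits")
```

In this toy setting the unseen continuation "slept" after "dog" receives a higher surprisal than the frequent continuation "dog" after "the". On the Uniform Information Density view described above, such a word would be the one expected to receive greater acoustic prominence.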

Jonathan R. Brennan

..........................................................................................................................

The current state of neurolinguistics may seem rather fractured. I’d say one factor is that different researchers hold different conceptualizations of the competence/performance distinction. Some approaches map more or less directly between syntactic primitives and neural processes, while others posit little-to-no systematic relationship between these domains. (For the former, see Friederici’s efforts to localize Merge, while Frank’s recent work on heuristic processing is but one example of the latter.) Debates between these camps tend to turn on whether it has come time to revisit or relax the classical distinction between competence and performance. I don’t think this sort of debate is going to move the field forward because the sides start with incompatible assumptions. Instead, I would suggest focusing on productive ways to test what the reasonable mappings between neural constructs and linguistic constructs are. When framed this way, I think the field starts to look pretty exciting. That’s because there is a growing number of plausible models about these linking hypotheses. We have a reasonable sketch of the relevant brain systems, and also the beginnings of a set of candidate algorithms that may be implemented in those systems. While the hypothesis space allows for direct-relationship and no-relationship accounts, it also allows for theories for which these two constructs are indirectly but systematically related. This third path has been probed in psycholinguistics (e.g. Chapter 2 of Berwick and Weinberg’s 1984 book) and stands ready to be explored in neurolinguistics.

To compare alternative theories we need to adopt shared assumptions. Importantly, before we can adopt shared assumptions, we need to be explicit about what those assumptions have been up to now (that’s the main pitch of my contribution to this volume). Doing so, we can better assess the common ground between alternative proposals and begin to compare them. These next steps won’t be easy. As with other debates in syntax, one major stumbling block will be different conceptions of complexity. Recent language acquisition work by Chater and colleagues comes to mind as an example that confronts and attempts to quantify what it means for one model to be simpler than another. Similar thinking will be needed to evaluate neurolinguistic linking hypotheses. Theory comparison is facilitated, of course, by quantitatively precise models. In addition to “wearing their assumptions on their sleeves,” such models also match the precision offered by hemodynamic (and other experimental) methods.

Going out on a limb, I’m inclined towards theories that posit a relatively indirect relationship between syntactic primitives and neural operations: Syntactic features are likely chunked into complex memory representations based on usage, and structure-building operations are likely predictive, contingent on usage, and may be intertwined with semantic/conceptual composition. They are likely not isomorphic to derivational primitives like merge, but they yield outputs that conform to the mapping defined by such primitives. Over these next years, I’m looking forward to arguments about the following questions: What is an adequate formalization of these intuitions? What alternatives are also plausible? And what kind of neural data can tease apart these hypotheses?

Jennifer Culbertson

..........................................................................................................................

As a field, syntax has been characterized by deep divisions: between nativism and empiricism, between generative and usage-based theory, between rules and statistics, and so on. In my view, some of the most persistent divisions reflect different intuitions rather than clear scientific reasoning. Put another way, whether a given linguist takes as the null hypothesis that language relies on domain-general or specialized mechanisms is more a matter of taste than a confluence of evidence. As with any other source of data, experiments in principle inherit all the biases of the experimenter. How experiments are designed, and how the resulting data are interpreted, depend (more than we want to admit) on what we see as likely outcomes. However, experiments also push us toward generating and testing precise predictions of our hypotheses, and hopefully, toward revising our theories when those predictions aren’t met. In some cases, they make it possible to test predictions when no other source of data is available, which is where we are most likely to fall back on our intuitions.

Moreover, behavioral data force linguists to grapple with, and therefore potentially better understand, how language behavior interacts with and is influenced by wider cognitive processes. A linguist who runs acceptability judgment experiments cannot fail to see that different people generate different judgments, that longer sentences are judged as having lower acceptability, that order of presentation can influence how a single sentence is rated. These are very simple examples, but even still they can lead us to see things in a different light. By abstracting away from issues we perceive as not immediately of interest to us as researchers, we may limit ourselves in a way that hampers progress.

In my area, syntactic typology, all these issues hold in spades; there is massive disagreement about what the likely explanation for any given typological asymmetry is, even though typological data alone cannot tell us (just see Evans and Levinson 2009 and the responses it generated). There are clear limits on the predictions we can test, since these data, by their very nature, are limited (we only have the languages we have!). These roadblocks can be moved by adding experimental methods, like artificial language learning experiments, to our toolbox. Once we start using these, we must grapple with broader questions: How might prior experience affect how a novel system is learned? Could order of presentation, or priming, or specific aspects of the stimuli drive a result? Are there more general cognitive mechanisms we already know about that could explain some aspect of learners’ behavior? In asking these questions, we are not just being good experimenters, we are actually opening the door to new types of answers. Experimental Syntax is thus both a way to provide converging evidence in a field which sorely needs it, and a way to build better, more inclusive theories of language.

Stephani Foraker, Ian Cunnings, and Andrea E. Martin

..........................................................................................................................

We believe that further integration with other subfields of linguistics, and utilization of various different methods, will be crucial to the future of experimental syntax. In particular, we believe increased understanding of the memory architecture that underlies linguistic representation and processing can help provide a common ground and bridge between work in theoretical linguistics, psycholinguistics, and neurolinguistics. This is obviously not a new proposal. Indeed, it has long been known that memory plays a role in sentence acceptability (e.g. Miller and Chomsky 1963), but the extent to which grammatical restrictions can be reduced to memory-based constraints has been debated. Recently, this has perhaps best been exemplified in debate surrounding “reductionist” accounts of island phenomena (e.g. Hofmeister and Sag 2010; Hofmeister, Casasanto, and Sag 2012a; 2012b; Sprouse, Wagers, and Phillips 2012a; 2012b). However, this debate has to date largely been discussed in terms of a capacity-based approach to working memory (e.g. Daneman and Carpenter 1980). Work in psycholinguistics that suggests memory access during language comprehension is content-addressable would suggest that framing such arguments in terms of memory “capacity” is not the best way to examine how memory restrictions interact with linguistic representation, and instead that emphasizing similarity-based effects of memory on linguistic representation and processing may be more fruitful. This is not to say that all island phenomena may be reducible to memory constraints, but we believe that increased understanding of how such phenomena may be characterized, when one assumes a well-specified theory of memory encoding, storage, and retrieval, as in the case of content-addressable cue-based retrieval, will be key to delimiting the role of grammatical constraints and memory restrictions in explaining island constraints and other linguistic phenomena.

It may be no coincidence that proposals from theoretical linguistics, such as relativized minimality (Rizzi 1990; 2011), and work in psycholinguistics on similarity-based interference (e.g. Van Dyke and Johns 2012), both emphasize the importance of the similarity in content between sentence constituents in influencing linguistic acceptability and processing. In a similar vein, it may be no coincidence that different notions of “locality” have played an important role in both psycholinguistics and theoretical linguistics. For example, in psycholinguistics, Gibson’s (2000) dependency locality theory posits an important role of dependency minimization in processing linguistic dependencies, while in theoretical linguistics the minimal-link condition (Chomsky 1995) attempts to explain movement constraints in terms of the locality of movement operations. Again, although these theories operationalize locality in different ways and have been proposed to account for different phenomena, both assume locality plays an important role in constraining linguistic representation and processing. However, recent work utilizing the speed-accuracy tradeoff (SAT) procedure and other paradigms, which indicates that linguistic memory is content-addressable, would suggest that the defining factor in such cases may not be locality per se, but rather the content of other items in memory. We believe that future research that integrates a well-defined theory of memory operations and architecture with work in theoretical linguistics will be key to the future of experimental syntax.

Tim Hunter

..........................................................................................................................

One issue that we may soon seek a sharper understanding of is the relationship between the gradient data that we collect from acceptability judgement experiments, and grammars that are formulated in discrete terms and define a discrete notion of grammaticality. We know that grammaticality is only one of many factors that contribute to acceptability, but how exactly is the labour divided? Specifically, is the gradience we observe in acceptability judgements all attributable to non-grammatical factors, such as plausibility and sentence length, that interact with a binary notion of grammaticality? Or is the grammaticality of a sentence something that itself has more than two possible values? The most obvious versions of the second option might involve positing continuous gradient values for grammaticality, but there are other less extreme alternatives. One would be to posit (more than two) discretely ordered degrees of grammaticality, for example determined by the number of violations present in the relevant structure. But an even less radical alternative, which seems in line with the logic underlying standard syntactic practice, is to take degrees of grammaticality to be only partially ordered: some pairs of sentences have equal degrees of grammaticality, some pairs consist
of one sentence that is more grammatical than the other, and some are incomparable. A natural way to extract such an ordering from familiar kinds of grammars is to say that X is more grammatical than Y if and only if X violates a proper subset of the constraints that Y violates. The classical analysis of argument/adjunct extraction asymmetries in terms of subjacency and ECP violations takes this form: Extracting an argument from a wh-island violates only subjacency whereas extracting an adjunct violates both subjacency and the ECP, and we can take this proper-subset relationship to be saying that the former is more grammatical than the latter. A sentence that violates no constraints will be more grammatical than any sentence that violates one or more constraints; this maintains the one distinction made by the binary view of grammaticality, without collapsing all distinctions between “imperfect” sentences. Notice that a sentence that violates only subjacency would not be comparable with a sentence that violates only, say, Condition B, because neither relevant set is a proper subset of the other. Similarly, a sentence that violates both subjacency and the ECP would be incomparable with one that violates only Condition B, despite the difference in the number of violations. This incomparability jibes with the standard practice of testing theories via minimal pairs: It would be odd to try to conclude anything from judgements of such poorly matched stimuli. Constructing minimal pairs typically amounts to constructing a pair of sentences such that either both will violate exactly the same constraints or they differ in exactly one constraint; if an acceptability difference is detected, this is evidence for the latter. This approach is distinct from the assumptions underlying the factorial analysis of island effects, in that distances between acceptability levels are not taken to be significant, only their relative orderings. 
But the two approaches are compatible: The factorial logic arises from incorporating additional assumptions about how the partially ordered objects are “flattened” into one-dimensional ratings space.
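
The proper-subset ordering described above can be made concrete with a small sketch. The Python fragment below is only an illustration: the function name `compare_grammaticality` and the violation labels are hypothetical, and each sentence is represented simply as the set of constraints it is assumed to violate.

```python
def compare_grammaticality(x_violations, y_violations):
    """Compare sentence X with sentence Y by the constraints each violates.

    X counts as more grammatical than Y only if X's violations form a
    proper subset of Y's; otherwise the pair is equal, reversed, or
    incomparable. (Hypothetical helper, for illustration only.)
    """
    x, y = frozenset(x_violations), frozenset(y_violations)
    if x == y:
        return "equal"
    if x < y:  # proper subset: X violates strictly fewer of Y's constraints
        return "more grammatical"
    if x > y:  # proper superset: Y violates strictly fewer of X's constraints
        return "less grammatical"
    return "incomparable"


# Violation sets taken from the examples in the text.
no_violation = set()
argument_extraction = {"subjacency"}
adjunct_extraction = {"subjacency", "ECP"}
condition_b_violation = {"Condition B"}

print(compare_grammaticality(argument_extraction, adjunct_extraction))    # more grammatical
print(compare_grammaticality(adjunct_extraction, condition_b_violation))  # incomparable
print(compare_grammaticality(no_violation, condition_b_violation))        # more grammatical
```

Note that the comparison never counts violations: the two-violation adjunct case and the one-violation Condition B case come out as incomparable, which is the point of a partial rather than a total order.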

Elsi Kaiser and Jeffrey Runner

..........................................................................................................................

The past 35 years have seen an explosion of experimental work on questions relevant to syntactic theory. As this Handbook demonstrates, the field has come far in adapting experimental methods to syntactic questions. An important next step is to keep expanding the domain of languages we investigate experimentally. This is especially critical in the area of binding and coreference, because there are many types of pronouns and reflexives cross-linguistically that do not map straightforwardly onto the better-known ones in English and the other languages our Binding Theories were developed to account for. Given the challenges laid out in our chapter on how to assess the acceptability of the intended coreferential/binding configurations, this expansion, especially to languages for which speakers do not use a writing system, will have to be done with care. Our chapter includes discussion of methods involving auditory and visual (but not written) stimuli that we hope can be expanded and developed to be used in these language communities.

Binding and coreference are inherently “interface” issues: The relevant information includes syntactic structure as well as semantic and discourse-level properties. Thus, experiments on binding/coreference engage multiple levels of representation. Though it may sometimes be convenient to put aside non-syntactic considerations, it is critical to understand and (at minimum) control for them when developing experiments. Indeed, our hope is that an expansion of experimental work on binding/coreference from a syntactic perspective will lead to more creative manipulations of other kinds of features. This will be particularly important as we expand our domain of languages and proform types cross-linguistically, as each form may provide keys to understanding how syntactic, semantic, and discourse-level information can interact. We think that two methodological items will continue to be important as more researchers adopt experimental techniques. First, it is important to understand how language users react to a particular structure, regardless of its lexicalization (the specific words used in a particular instantiation of that structure), to ensure that judgments are not localized to a particular sentence, but rather a sentence type. The second issue is sometimes called syntactic satiation, adaptation or priming: There is evidence that participants may adapt their judgments to certain stimuli over the course of an experiment, as a function of increased exposure. This highlights the importance of filler/distractor items, and further suggests that researchers should design their studies so that they can recognize and mitigate any such effects. In our view, experimental syntax does not replace traditional methods for examining language structure. However, in some cases it can provide more reliable information on some aspect of the issue. 
For example, we have argued that cases where judgments on isolated examples are unclear or controversial—or involve confounding issues that can cloud interpretations—are ripe for experimental investigation. The challenge is to design experiments that can tease apart (and/or control) the various factors that might lead to unclear judgments. In these situations, experimental methods—though they require care—can offer a way to get at something that traditional methods cannot.
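
One common way to keep judgments from being tied to a particular lexicalization is a Latin square design, in which each lexicalization appears in every condition across lists but only once per participant. The sketch below assumes such a design; it is not described in the text above, and the function names, condition labels, and the simple one-filler-per-target interleaving are all hypothetical choices made for illustration.

```python
def latin_square_lists(n_items, conditions):
    """Build one presentation list per condition (assumed Latin square).

    In list number `lst`, item `i` is shown in condition (i + lst) mod k,
    so every item appears in every condition across the k lists, but each
    participant (who sees one list) meets each lexicalization only once.
    """
    k = len(conditions)
    return [
        [(item, conditions[(item + lst) % k]) for item in range(n_items)]
        for lst in range(k)
    ]


def interleave_fillers(targets, fillers):
    """Place one filler after each target item, for as long as fillers last."""
    sequence = []
    for i, target in enumerate(targets):
        sequence.append(target)
        if i < len(fillers):
            sequence.append(fillers[i])
    return sequence


# Hypothetical 2-condition design with 4 lexicalizations (item sets).
lists = latin_square_lists(4, ["island", "non-island"])
for number, presentation_list in enumerate(lists):
    print(number, presentation_list)
```

In practice the order of each list would also be randomized, and presentation order could be included as a predictor so that any satiation or adaptation over the course of the session can be detected.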

Dave Kush and Brian Dillon

..........................................................................................................................

Over the last 5–10 years, there has been an explosion in research that investigates the link between formal theories of linguistic representation and the algorithmic workings of the human sentence processing mechanism at an unprecedented level of detail (Felser, Wagers, and Phillips 2017). This line of work blurs the distinction between psycholinguistics and experimental syntax, puts contemporary models of working memory and attention in contact with grammatical theory, and builds formal—often quantitatively precise—models of how linguistic knowledge is used during routine sentence comprehension.

The field is rapidly narrowing the disciplinary boundaries between experimental syntax and psycholinguistics. We expect the subfield to mature and to attract broader interest from the constituent communities of psychologists and linguists. Already, we see recent research as bringing the important issues and questions at the intersection of experimental syntax and psycholinguistics into focus: What are the parameters of the cognitive architecture in which the parser is implemented? How are fundamental linguistic relations (e.g. recursion and hierarchy) used to structure working memory storage and access? What causes systematic deviations between the behavior of the parser and the constraints of the grammar? What are the points of cross-linguistic variation in processing? We see consensus about the important questions and issues emerging, more sophistication in the theoretical models that support research in this area, and increasing interest from researchers in neighboring areas. Signs point to exciting growth, but the work is difficult because it requires the analyst to balance theoretical commitments in a number of different, plausibly independent domains. For example, experimental syntacticians may consider eye-tracking as a useful tool for their investigations because it is relatively cheap (compared to neuroimaging or electrophysiological methods) and because there is a well-worked-out task model. However, making progress requires researchers to understand both the practical side of the technique and the task model before higher-order questions can be addressed. After that, experimental syntacticians must have clear hypotheses about how to link grammatical constructs to reading behavior. This requires linking hypotheses between parsing operations and grammatical constructs, and between reading behavior and parsing operations. 
The ability to articulate these linking hypotheses presupposes a basic understanding of the reading task and how aspects of the task are reflected in eye-tracking measures. To this end, we provide a rudimentary overview of the psychology of reading and the basics of the eye-tracking-while-reading method. We review a few findings that motivate the assumption that effects of grammatical processing—or at least parsing operations—are reflected in the eye-tracking record. We offer remarks on practical aspects of constructing an eye-tracking experiment, and interpreting its results. Our remarks include cautionary guidelines regarding statistical pitfalls and the tendency to assume too direct a mapping from specific dependent measures to exact parsing operations, which we offer in the service of helping researchers in this exciting subfield further develop work in this area.

William Matchin and Corianne Rogalsky

..........................................................................................................................

In order for aphasia research and syntactic theory to mutually benefit each other, we suggest addressing two specific issues.

1. Connecting syntactic theory with real-time sentence processing

Language assessment tasks in aphasia research require patients to process sentences using their grammatical knowledge, non-grammatical processing resources like working memory, and cognitive systems unrelated to language (e.g. attention) that are required to make task responses. Thus, observed differences between patients and healthy subjects may lie at any of these levels. In order to evaluate the relevance of syntactic theory to aphasia, it is important to have clear ideas about how grammatical operations and principles relate to real-time processing and the tasks patients are asked to perform, so that the effects of these various components can be clearly separated. This is particularly important as syntactic operations are often formulated in ways that do not translate transparently to real-time sentence processing, such as bottom-up syntactic derivations.

2. Aligning the granularity of linguistics and aphasiology at the level of the cortical area

The currency of aphasia is the brain area—some piece of tissue that can be damaged due to stroke, neurodegenerative disease, or other brain injury and can be quantified and compared across subjects. This means that aligning syntactic theory with aphasiology requires us to identify a level of linguistic granularity that aligns with relatively large-scale neural organization (see Poeppel and Embick 2005 and Embick and Poeppel 2015 for discussion of this general point with respect to linguistics and neuroscience). It may be that grammatical operations do not correspond to that level of granularity—for instance, they might correspond to micro-level neural circuitry or even sub-cellular chemical properties (Gallistel and King 2010). We believe it is helpful to identify what properties of language do match up well with the cortical area or a network of areas. There are successful cases of functional localization in the visual domain that can inform this search, such as the fusiform face area (Kanwisher 2010) and the visual word form area (Dehaene and Cohen 2011). If these successful cases of functional localization are to be a model for syntax, then we should identify generalizations stemming from syntactic theory or psycholinguistics that can be examined in a similar fashion. This is not to say that we should forgo investigating how the brain instantiates basic grammatical operations, but rather that we should make progress in understanding what we can about the brain and syntax given the methods currently available to us. Our hope is that this will form a useful precursor to future investigations, in which hypotheses about the neural implementation of grammatical operations can be formulated and tested using finer-grained methods, as is starting to be the case (Ding et al. 2016; Nelson et al. 2017).

Lisa S. Pearl

..........................................................................................................................

It turns out that I had a lot more thoughts than I realized about the future of syntactic acquisition modeling—this is why my chapter has a subsection devoted to “where we’re headed” that spans several pages. Here, I’ll provide a more condensed version of those thoughts. From my perspective, there are four main takeaways from the current state of syntactic acquisition modeling. First, children may be able to accomplish quite a lot with general-purpose prior knowledge about syntax. For instance, a reappearing element in successful syntactic acquisition models is the ability to generate structured representations of certain kinds—not to prefer these representations, but simply the ability to generate them at all. Coupled with domain-general learning mechanisms (e.g. allowing for overhypotheses, preferring structural pieces that get reused), this ability can be used to converge on a variety of syntactic representations. Perhaps most interestingly, recent work has tantalizingly suggested how this kind of general-purpose linguistic prior knowledge could be used to generate syntactic knowledge that looks a lot like traditional linguistic parameters. It remains for us to work out exactly how close the derived knowledge is to traditional parameters, which will likely require much more cross-linguistic work. Second, viewing syntax as part of a larger linguistic (and cognitive) system of knowledge makes other sources of information relevant for syntactic acquisition. That is, even for what seems to be an acquisition task that targets a specific piece of syntactic knowledge, children may well be using a variety of data sources to either constrain possible hypotheses or helpfully search through those hypotheses (or both). I’ve reviewed examples that use other syntactic, linguistic, and non-linguistic information, but many more relevant information sources may be available than we currently realize. 
Relatedly, it may be useful to consider how syntactic acquisition may be bootstrapped by other representations developing at the same time. That is, how can partial information about other representations help children learn about syntactic representations, and how can partial information about syntactic representations help children learn about those other representations? The key idea is that a wealth of indirect positive evidence may exist for the specific syntactic knowledge children need to acquire, simply because children are learning a linguistic system as a whole and not just isolated pieces of it. Related to this, the third takeaway is the need to develop more articulated syntactic acquisition models that recognize the impact of developing language processing and extralinguistic abilities on the information available to children as they acquire their syntactic representations. Fourth, a current empirical hurdle is the lack of large-scale datasets of structurally annotated child-directed speech from different languages and different socioeconomic statuses (SESes). Right now, many computational models of syntactic acquisition focus on high-SES English because that’s where the empirical data are easily available. But that
means we only have a high-SES, English-focused modeling snapshot of the universal process of syntactic acquisition that all typically developing children are supposed to go through. To evaluate our syntactic acquisition theories more thoroughly with computational modeling techniques, we need the structurally annotated developmental data to do it.

Laurel Perkins and Jeffrey Lidz

..........................................................................................................................

How do grammars arise in the human mind? Surprisingly quickly, as revealed by past behavioral research on the syntactic representations and processing of young children (Chapters 5 and 6 in this volume). Much of children’s core syntactic knowledge appears to be in place even before they begin to produce sentences of their own. In order to see how that knowledge arises, we must therefore examine syntactic development earlier in infancy, during the first and second years of life. Doing so will require both methodological and theoretical innovation. Designing experimental tasks to illuminate the linguistic representations of a 1-year-old is no small feat, but asking not only when but how grammars are acquired means we must grapple with new theoretical challenges. Here, we briefly lay out these challenges and point towards ways for future work to meet them. Linguistic theories since Chomsky (1965) have typically abstracted away from development, asking whether a learner’s corpus of linguistic input as a whole supports grammar selection (e.g. Wexler and Culicover 1980; Yang 2002). But a puzzle arises when we consider how learning proceeds incrementally in a child’s development. Language acquisition, like other forms of learning, involves building on prior knowledge: Just as a child who can’t count cannot learn arithmetic, a child who can’t segment words cannot identify properties of verbs in her language. The way that learners perceive their input changes as a function of their developing linguistic, conceptual, and cognitive abilities, and they use those perceptions to draw inferences about their target grammar. Learning cannot wait until children can completely and accurately parse every sentence they hear, or there would be nothing further to learn (Fodor 1998; Valian 1990). But if children’s representations of their input are incomplete or inaccurate, how do they avoid faulty inferences, or even learn from the input at all? 
Answering this question is necessary to build a model of grammar learning in development. Our theories must account for what portion of the input is useful to an individual child at any single point in development, how the child represents that portion of the input given her current abilities, and what learning mechanisms enable her to draw the right grammatical generalizations, even if her input representations are noisy and incomplete. We believe that advancing these theories will require an interdisciplinary approach—using both computational methods (see Chapter 7 of this volume) to model explicitly how learning might proceed from both prior knowledge and current experience, and ever more sophisticated behavioral methods to test the predictions of
these models in infancy. We also advocate patience in studying this population. Tasks must be tailored to the limited repertoire of behaviors young infants can control, and large samples are needed to provide sufficient power for reliable effect estimates (Oakes 2017). But the payoff is large: Characterizing the earliest steps of grammar acquisition in development has implications not only for our theories of language as a cognitive faculty, but also for our theories of learning in general, by deepening our understanding of how data impacts the acquisition of knowledge in any domain.

Maria Polinsky

..........................................................................................................................

The advent of experimental work in syntax has been simultaneously exciting and menacing. The work is stimulating because it opens new avenues of research and gives linguists new tools. However, the work is also alarming because of its potential to lead linguists to carry out experiments solely for the sake of experimentation. This trend—of running experiments for experiment’s sake—is self-perpetuating: Once you run a big, expensive experiment, you feel compelled to publish the data, even though the results may be null, and the questions may be formulated in such a way that an experiment is not warranted. In such cases (and we can all think of many examples), it seems as though the field is moving toward experimental syntax in name only, as more and more of the work done in this paradigm is not really of interest to syntacticians. Of all researchers, psychologists and people studying language processing are the most likely to be interested in what we now call experimental syntax. It would be desirable to bring more syntax into experimental syntax, all the while maintaining the high experimental standards achieved in this subfield. Designing experiments that answer interesting syntactic questions is hard, as it typically requires syntactic sophistication as well as some understanding of language processing. But most students do not get training in both syntax and psycholinguistics. They can do theoretical syntax, or they can do psycholinguistics, but they cannot do both—even if they (and we, their professors) pay lip-service to such highly desirable multi-training and multi-tasking. I worry about this multi-training being absent. I worry that we have not made significant enough progress in our graduate programs so as to produce PhDs who are bilingual in theoretical syntax and language processing. 
Given the exigencies of our graduate programs—such as student funding, time needed to complete a dissertation, composition of faculty committees, and limits of students’ own interests—it is unlikely that the much-needed ambidexterity of fields will ensue any time soon. But experiments remain seductive, and new generations of students are drawn to them. How can we make their work more relevant to syntacticians? How can we make their work more theoretically solid? The answer may lie in teamwork. We may not be able to create specialists who are as well versed in theoretical syntax as they are in language science experiments, but we can bring people with these disparate
sets of interests to the same room and have them ask questions of each other. Such teamwork, especially if it starts at the earliest stages of graduate training, will allow each side to see how the other half thinks, teaching them to appreciate each other’s arguments and helping them to work together in the future.

Jon Sprouse

..........................................................................................................................

My view of the future of experimental syntax is captured in the structure of this Handbook: using a wide range of experimental methods to push the boundaries of syntactic theories, developing linking theories between syntactic theory, sentence-processing theories, language acquisition theories, and neurolinguistic theories, and ultimately building a comprehensive theory of the cognitive neuroscience of language. Accomplishing this is going to take a willingness to expand what it means to be a syntactician. I don’t think syntacticians need to be experts in other theories (sentence processing, acquisition, neurolinguistics), nor do I think that syntacticians need to be experts in all possible experimental methods. But I do think the next generation of syntacticians will need to be conversant enough in these other theories to collaborate with psycholinguists, acquisitionists, and neurolinguists to develop the necessary linking hypotheses (and, conversely, that language scientists working on these other theories need to be conversant enough in syntactic theories to work with syntacticians). I also think that syntacticians will need to develop expertise in one or two experimental methods in order to participate in the construction of critical data collection studies, and in order to build on the experimental work done by other researchers. All of this is already happening. The work discussed in this Handbook demonstrates this conclusively. My hope is that graduate programs in linguistics continue to encourage this kind of cross-area training and collaboration.

Kristen Syrett

..........................................................................................................................

In the chapter devoted to behavioral methods for preschoolers, I outlined a variety of ways in which researchers employ diverse methodologies to probe the nature of the grammar and the developing linguistic competence of young children. While the overall picture seems to indicate that even at a very young age, children share many of the grammatical characteristics with adults—therefore suggesting that a number of these may be innate, especially considering the impoverished input children receive—it is also clear that children occasionally arrive at interpretations that are not licensed in the adult grammar. The question that arises is why this happens. In that chapter, I provided evidence from studies conducted over many years that the reason could stem from experimenters’ methodological choices, their failure to satisfy felicity conditions
on the use of a word or phrase, or children’s immature sentence processor, leaving open the possibility of the grammar itself still being fine-tuned in some cases. Future research should work to pin down the source of non-adult-like behavior for a range of linguistic phenomena, such as binding relations and reference, quantification, questions, comparatives, raising and control constructions, reconstruction, relative clauses, and complementation patterns. While the vast majority of studies in language acquisition have focused on English, the more that researchers branch out to investigate other languages (especially those outside of the Indo-European family), driven by their linguistic theoretic knowledge of and familiarity with other languages, the better understanding we will have of the universality and crosslinguistic diversity that exists among the world’s languages, and the process of acquisition. Already this kind of research is happening to some extent, but it will take dedicated fieldwork, reliance on language preservation, documentation, and revitalization, and collaboration among researchers in first- and second-language acquisition, bilingualism, and theory. The more research probes areas that have already been covered to arrive at a more precise understanding of the acquisition process (and hopefully also at a better analysis of the phenomena) and the more research extends to other languages, the more we will also need to examine the ways in which the developing grammar interacts with other subsystems of language and with aspects of the discourse context. One area in which such a synergy is already in place is research on argument omission, since certain discourse conditions and speaker–hearer relations license the absence of linguistic material—although not the same way in every language. (See e.g. Allen 2000, Hulk and Müller 2000, Müller and Hulk 2001, and Serratrice, Sorace, and Paoli 2004.) 
Finally, as we continue to explore new horizons in the topics covered in our acquisition research, so too should we explore new methodological horizons. Online methods shed light on the incremental processing of linguistic material and the variable weighting of linguistic and contextual cues to interpretation. Dynamic methods that move away from static truth-conditional judgments of isolated sentences to assessments and observations of language usage highlight the child as a budding conversationalist, both as speaker and hearer. In the same vein, departure from traditional binary judgments and gravitation towards non-binary gradient judgments, not just of propositions expressed by sentences but of the knowledge of the speaker who delivered the utterance, allow for the possibility of nuances in interpretation and truth-value gaps, and indicate the extent to which judgments may vary from speaker to speaker and context to context, and how syntax interacts with other aspects of language and cognition. (For more on such methods, see Goro and Akiba 2004, Katsos and Bishop 2011, Syrett and Aravind 2017; 2022, and Simon-Pearson and Syrett 2018.) In addition, prosodic manipulations of speech presented to and interpreted by children can reveal which aspects of the sound–meaning relationship they appreciate (and when), and how these systems work together to generate information structure-dependent interpretations in a discourse context.

Kriszta Eszter Szendrői

..........................................................................................................................

Truth is fundamental to natural language. In formal semantics, this is represented by the idea that the semantic meaning of a proposition is truth or falsity (Tarski 1944). Many of our experimental methods rely on the notion of truth. Take for instance the picture verification task, or the Truth-Value Judgment Task. As their names suggest, both rely on participants’ actual computation of the truth or falsity of a given proposition with respect to the context provided in a picture or an acted-out story. Yet experimental work rarely targets the actual process of verification, or falsification, by which hearers actually arrive at their truth or falsity judgment. The truth-value associated with a proposition is implicitly assumed to be independent of the actual process by which the truth or falsity judgment is reached in actual processing. But this assumption is, ironically, false. Take for instance non-referential noun phrases such as the King of France. We have known since Strawson (1964) that some sentences involving such noun phrases as their subject arguably lack a truth-value, e.g. The King of France is bald. But Lasersohn (1993) pointed out that in other similar sentences, a definitive truth-value is assigned. For instance, The King of France is sitting in this chair is false, when uttered in the context of an empty chair. This is because here one may verify the truth or falsity of the utterance in the context without confronting the non-referential nature of the subject noun phrase. So, the verification process itself plays a role in determining the truth-value of such sentences. (See Abrusan and Szendrői 2013 for experimental proof of Lasersohn’s proposal.) We also know that verification versus falsification strategies play out differently because verifying an utterance and falsifying it require different amounts of pragmatic commitment on the part of the speaker (e.g. Stalnaker 1974). 
This is generally recognized and adhered to in experimental methodology; see for instance the design requirements of the truth-value judgment task. But Conroy (2008) also showed that there is a more general human cognitive bias towards verification procedures over falsification procedures (see also Mulders and Szendrői 2013 for the same observation in their visual-world eye-tracking task). Generally, as Lidz et al. (2011) argue extensively, verification (or falsification) of truth is a process that lies at the interface of natural-language competence and human cognition. We have already seen that verification may play a role in determining truth-values. In addition, we also know that verification strategies interact with linguistic notions, like topic-hood (Reinhart 1981; Erteschik-Shir 2007; Abrusan and Szendrői 2013). Verification strategies also interact with various human cognitive processes. In addition to the preference for verification over falsification, Pietroski et al. (2009) have demonstrated that adults’ verification of quantified sentences reveals their reliance on the so-called Approximate Number System, a piece of “cognitive machinery shared throughout the animal kingdom” (Lidz et al. 2011: 235), and that

the future of experimental syntax

659

in fact natural-language verification strategies must conform to the limitations of this cognitive apparatus. Lidz et al. hypothesize that verification strategies that are less context-dependent are preferred over more context-dependent ones, and that “the verification procedures […] are biased towards algorithms that directly compute the relations and operations expressed by the semantic representation of that sentence”— their Interface Transparency Thesis. Different methods have been emerging in the literature to investigate not just static truth and falsity but verification and falsification procedures themselves, e.g. Conroy’s (2008) Incremental Verification Task and Hackl’s (2009) Self-Paced Counting method. In sum, I believe that future work will and should target verification and falsification procedures, and thus increase our understanding of the interface between human language competence and cognitive processes.

Matthew Wagers and Sandra Chung

..........................................................................................................................

As we contemplate the future of experimental syntax and how it will develop, we want to emphasize the importance of furthering the engagement between laboratory- and field-based methods. Experimental syntax is now some 25 years old, and much ground has been covered to cross-validate or quantify the acceptability judgments that have underwritten core syntactic theory. But quite apart from retrospectively guaranteeing the robustness of previous research, experimentally gathered data now appears more routinely in new research, particularly research done by younger investigators. In our own department, as well as other departments we are aware of, graduate students have come to regard it as an indispensable part of their intellectual training—akin to mastering the Binding Theory or the predicate calculus. Future research in syntactic theory will draw increasingly upon data sources that are varied and multifarious—which is all to the good. But it is important that we guard against a siloization of methodologies, in which some students are trained to be fieldworkers and others are trained to be experimentalists. Siloization may be the path of least resistance in some ways, but it would be deleterious in many others. In our view, joint training in field methods and experimental methods should become the standard. Firstly, there is much to be learned from training in the two traditions and synthesizing them. Experimentalists can find value in a more human engagement with the individuals who participate in their studies. For example, every debriefing offers an opportunity to do fieldwork, and we hope to see more considered engagement with that aspect of doing an experiment. Fieldworkers must prepare assiduously for how and what they will ask one-on-one with their consultant, but our own experience has taught us that experimentalists often view the debriefing as an afterthought or pro forma exercise. 
Correspondingly, fieldworkers can find value in the practical strategies provided by the experimental tradition for exploring a hypothesis space and collecting data with reduced bias. Concepts such as crossed designs, counterbalancing, and randomization are simple but powerful tools that all linguists should know how to deploy. Secondly, we are still tracing the borderlands between the content of our grammatical knowledge (“competence”) and how it is deployed in a cognitive architecture (“performance”). While much has been learned in recent years about what acceptability is, there is more progress to be made. It is important that the full range of linguistic diversity contribute to this endeavor, so that we avoid falling into the traps posed by incomplete typology or convenience. Smaller languages can have a chance to participate only if they are included in the dataset, and they can be included in the dataset only if questions of “performance” become priorities for research in the field.
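
For readers less familiar with crossed designs, counterbalancing, and randomization, a minimal sketch in Python may make the three notions concrete. It builds a fully crossed 2 × 2 design, distributes items over presentation lists with a Latin square, and shuffles each list. The factor names, the number of items, and all function names are hypothetical and chosen only for illustration; they are not taken from any particular study.

```python
import itertools
import random

def crossed_conditions(factors):
    """Return every combination of factor levels (a fully crossed design)."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in itertools.product(*(factors[n] for n in names))]

def latin_square_lists(n_items, conditions):
    """Counterbalance: list k shows item i in condition (i + k) mod n_conditions."""
    n = len(conditions)
    return [[(item, conditions[(item + k) % n]) for item in range(n_items)]
            for k in range(n)]

def randomized(trials, seed):
    """Return a shuffled copy of one list, reproducible for a given seed."""
    rng = random.Random(seed)
    shuffled = list(trials)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical 2 x 2 design; the factor names are illustrative assumptions.
factors = {"dependency": ["short", "long"],
           "structure": ["non-island", "island"]}
conditions = crossed_conditions(factors)            # four crossed conditions
lists = latin_square_lists(n_items=8, conditions=conditions)
presentation = [randomized(trials, seed=k) for k, trials in enumerate(lists)]
```

Under these assumptions, each participant or consultant would be assigned one of the four lists. Nobody then encounters the same item in more than one condition, while across lists every item contributes to every condition, and the shuffled order guards against effects of presentation sequence.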

Masaya Yoshida

..........................................................................................................................

I believe that it is important to determine the objectives of experimental syntax in general. The object of study in the tradition of transformational generative grammar has primarily been each speaker’s internalized knowledge of language, or I-language (e.g. Chomsky 1986; 2000; Den Dikken, Bernstein, Tortora, and Zanuttini 2007; Isac and Reiss 2013). The behavioral data that have been collected through experiments, whether online experiments, offline experiments, or acceptability judgments, are supposed to be a useful tool for studying I-language. I believe the data themselves are not the object of study. If the object of study in the field of syntax is not I-language, we have to figure out what exactly we are studying. Either way, I believe it is important to ask what we are studying and why conducting large-scale experiments is useful, in terms of our research objectives. As is often emphasized, from the I-language perspective, individuals’ grammars may differ in some fundamental ways, and there is no reason a priori to believe that individual speakers who belong to the “same” speech community have the same individual grammar (Den Dikken et al. 2007; Han, Lidz, and Musolino 2007; Dąbrowska 2012; Han, Musolino, and Lidz 2016). From this perspective, it is not clear what large-scale experiments can tell us about I-language (a similar point is discussed in detail in Den Dikken et al. 2007). Thus, I believe that it is important to make clear what the object of study in experimental syntax is, and to evaluate the methodologies that we employ from the perspective of this object of study.

References

Abrusán, M., and K. Szendrői. 2013. Experimenting with the King of France: Topics, verifiability and definite descriptions. Semantics and Pragmatics 6(10): 1–43. doi:10.3765/sp.6.10
Allen, Shanley E. 2000. A discourse-pragmatic explanation for argument representation in Inuktitut. Linguistics 38: 483–521.
Almeida, Diogo. 2014. Subliminal wh-islands in Brazilian Portuguese and the consequences for syntactic theory. Revista da ABRALIN 13: 55–93.
Aravind, Athulya, and Kristen Syrett. 2017. Investigating context sensitivity and vagueness in nominals in child and adult language. http://iceland2017.nelsconference.org/wpcontent/uploads/2017/08/Aravind-Syrett.pdf
Aylett, Matthew P. 2000. Stochastic suprasegmentals: Relationships between redundancy, prosodic structure and care of articulation in spontaneous speech. In Proceedings of ICSLP-2000, Beijing. http://www.isca-speech.org/archive/icslp_2000/i00_3646.html
Aylett, Matthew, and Alice Turk. 2004. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47(1): 31–56. https://doi.org/10.1177/00238309040470010201
Bergen, Leon. 2016. Joint inference in pragmatic reasoning. Thesis, Massachusetts Institute of Technology. Retrieved from http://dspace.mit.edu/handle/1721.1/106430
Bergen, Leon, and Noah D. Goodman. 2015. The strategic use of noise in pragmatic reasoning. Topics in Cognitive Science 7: 336–350.
Berwick, Robert C., and Amy Weinberg. 1984. The grammatical basis of linguistic performance. Cambridge, MA: MIT Press.
Chen, Zhong, Yuhang Xu, and Zhiguo Xie. 2020. Assessing introspective linguistic judgments quantitatively: The case of The syntax of Chinese. Journal of East Asian Linguistics 29: 311–336.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1986. Knowledge of language: Its nature, origin, and use. New York: Praeger.
Chomsky, Noam. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, Noam. 2000. New horizons in the study of language and mind. Cambridge: Cambridge University Press.
Conroy, A. 2008. The role of verification strategies in semantic ambiguity resolution in children and adults. PhD dissertation, University of Maryland, College Park.
Dąbrowska, Ewa. 2012. Different speakers, different grammars: Individual differences in native language attainment. Linguistic Approaches to Bilingualism 2: 219–253.
Daneman, Meredyth, and Patricia A. Carpenter. 1980. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior 19(4): 450–466.
Dehaene, Stanislas, and Laurent Cohen. 2011. The unique role for the visual word form area in reading. Trends in Cognitive Sciences 15(6): 254–262.
Den Dikken, Marcel, Judy B. Bernstein, Christina Tortora, and Raffaella Zanuttini. 2007. Data and Grammar: Means and Individuals. Theoretical Linguistics 33: 335–352.
Ding, Nai, Lucia Melloni, Hang Zhang, Xing Tian, and David Poeppel. 2016. Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience 19: 158–164.
Embick, David, and David Poeppel. 2015. Towards a computational(ist) neurobiology of language: correlational, integrated, and explanatory neurolinguistics. Language, Cognition, and Neuroscience 30(4): 357–366.
Erteschik-Shir, Nomi. 2007. Information structure. Oxford: Oxford University Press.
Featherston, Sam. 2005a. That-trace in German. Lingua 115: 1277–1302.
Featherston, Sam. 2005b. Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115: 1525–1550.
Featherston, Sam. 2005c. The Decathlon model of empirical syntax. In M. Reis and S. Kepser (eds), Linguistic evidence: Empirical, theoretical, and computational perspectives, 187–208. Berlin: Mouton de Gruyter.

Felser, Claudia, Matt Wagers, and Colin Phillips. 2017. Encoding and navigating linguistic representations in memory. Frontiers in Psychology 8: 164.
Frank, Austin F., and T. Florian Jaeger. 2008. Speaking rationally: Uniform information density as an optimal strategy for language production. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society 30: 939–944. Retrieved from https://escholarship.org/uc/item/7d08h6j4
Fodor, J. D. 1998. Parsing to learn. Journal of Psycholinguistic Research 27(3): 339–374.
Gallistel, C. R., and Adam Philip King. 2010. Memory and the computational brain: Why cognitive science will transform neuroscience. Malden, MA: Wiley-Blackwell.
Gibson, Edward. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In A. Marantz, Y. Miyashita, and W. O’Neil (eds), Image, language, brain: Papers from the first mind articulation project symposium, 95–126. Cambridge, MA: MIT Press.
Gibson, Edward, Steven T. Piantadosi, Kimberly Brink, Leon Bergen, Eunice Lim, and Rebecca Saxe. 2013. A noisy-channel account of crosslinguistic word-order variation. Psychological Science 24(7): 1079–1088. https://doi.org/10.1177/0956797612463705
Goro, Takuya, and Sachie Akiba. 2004. The acquisition of disjunction and positive polarity in Japanese. In V. Chand, A. Kelleher, A. J. Rodríguez, and B. Schmeiser (eds), Proceedings of the 23rd West Coast Conference on Formal Linguistics (WCCFL), 251–264. Somerville, MA: Cascadilla.
Hackl, M. 2009. On the grammar and processing of proportional quantifiers: most versus more than half. Natural Language Semantics 17: 63–98.
Hale, John. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, 1–8. Stroudsburg, PA: Association for Computational Linguistics. https://doi.org/10.3115/1073336.1073357
Han, Chung-hye, Jeffrey Lidz, and Julien Musolino. 2007. V-raising and grammar competition in Korean: Evidence from negation and quantifier scope. Linguistic Inquiry 38: 1–47.
Han, Chung-hye, Julien Musolino, and Jeffrey Lidz. 2016. Endogenous sources of variation in language acquisition. PNAS 113: 942–947.
Hofmeister, Philip, and Ivan A. Sag. 2010. Cognitive constraints and island effects. Language 86(2): 366–415.
Hofmeister, Philip, Laura Staum Casasanto, and Ivan A. Sag. 2012a. How do individual cognitive differences relate to acceptability judgments? A reply to Sprouse, Wagers, and Phillips. Language 88(2): 390–400.
Hofmeister, Philip, Laura Staum Casasanto, and Ivan A. Sag. 2012b. Misapplying working-memory tests: A reductio ad absurdum. Language 88(2): 408–409.
Hulk, Aafke, and Natascha Müller. 2000. Bilingual first language acquisition at the interface between syntax and pragmatics. Bilingualism: Language and Cognition 3: 227–244.
Isac, Daniela, and Charles Reiss. 2013. I-language: An introduction to linguistics and cognitive science. Oxford: Oxford University Press.
Jaeger, T. Florian. 2006. Redundancy and syntactic reduction in spontaneous speech. Doctoral dissertation, Stanford University.
Jaeger, T. Florian. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology 61(1): 23–62. https://doi.org/10.1016/j.cogpsych.2010.02.002
Kanwisher, Nancy. 2010. Functional specificity in the human brain: A window into the functional architecture of the mind. Proceedings of the National Academy of Sciences 107: 11163–11170.

Kass, Robert E. 2011. Statistical Inference: The Big Picture. Statistical Science 26(1): 1–9.
Katsos, Napoleon, and Dorothy Bishop. 2011. Pragmatic tolerance: Implications for the acquisition of informativeness and implicature. Cognition 120: 67–81.
Kush, Dave, Terje Lohndal, and Jon Sprouse. 2019. On the Island Sensitivity of Topicalization in Norwegian: An experimental investigation. Language 95: 393–420.
Langsford, Steven, Rachel G. Stephens, John C. Dunn, and Richard L. Lewis. 2019. In search of the factors behind naive sentence judgments: A state trace analysis of grammaticality and acceptability ratings. Frontiers in Psychology 10: 2886.
Lasersohn, Peter. 1993. Existence presuppositions and background knowledge. Journal of Semantics 2: 113–122. http://dx.doi.org/10.1093/jos/10.
Levy, Roger. 2008. Expectation-based syntactic comprehension. Cognition 106(3): 1126–1177. https://doi.org/10.1016/j.cognition.2007.05.006
Levy, Roger, and T. Florian Jaeger. 2007. Speakers optimize information density through syntactic reduction. In Advances in Neural Information Processing Systems, 849–856. https://papers.nips.cc/paper/2006/file/c6a01432c8138d46ba39957a8250e027-Paper.pdf
Lidz, Jeffrey, Justin Halberda, Paul Pietroski, and Tim Hunter. 2011. Interface transparency thesis and the psychosemantics of most. Natural Language Semantics 19(3): 227–256.
Miller, George, and Noam Chomsky. 1963. Finitary models of language users. In D. R. Luce, R. R. Bush, and E. Galanter (eds), Handbook of mathematical psychology, vol. 2. New York: John Wiley.
Mulders, I., and K. Szendrői. 2016. Early association of prosodic focus with alleen ‘only’: Evidence from eye movements in the visual-world paradigm. Frontiers in Psychology 7: 150. doi:10.3389/fpsyg.2016.00150
Müller, Natascha, and Aafke Hulk. 2001. Crosslinguistic influence in bilingual language acquisition: Italian and French as recipient languages. Bilingualism: Language and Cognition 4: 1–21.
Nelson, Matthew J., Imen El Karoui, Kristof Giber, Xiaofang Yang, Laurent Cohen, Hilda Koopman, Sydney S. Cash, Lionel Naccache, John T. Hale, Christoph Pallier, and Stanislas Dehaene. 2017. Neurophysiological dynamics of phrase-structure building during sentence processing. Proceedings of the National Academy of Sciences 114: E3669–E3678.
Oakes, L. M. 2017. Sample size, statistical power, and false conclusions in infant looking-time research. Infancy 22(4): 436–469.
Patel, Aniruddh D. 2008. Music, language, and the brain. New York: Oxford University Press.
Pietroski, P., J. Lidz, T. Hunter, and J. Halberda. 2009. The meaning of most: Semantics, numerosity and psychology. Mind and Language 24(5): 554–585.
Poeppel, David, and David Embick. 2005. The relation between linguistics and neuroscience. In A. Cutler (ed.), Twenty-first century psycholinguistics: Four cornerstones, 103–120. Mahwah, NJ: Lawrence Erlbaum Associates.
Reinhart, Tanya. 1981. An analysis of sentence topics. Philosophica 1: 53–94.
Rizzi, Luigi. 1990. Relativized minimality. Cambridge, MA: MIT Press.
Rizzi, Luigi. 2011. Minimality. In C. Boeckx (ed.), The Oxford handbook of linguistic minimalism, 220–238. Oxford: Oxford University Press.
Serratrice, Ludovica, Antonella Sorace, and Sandra Paoli. 2004. Crosslinguistic influence at the syntax-pragmatics interface: Subjects and objects in English–Italian bilingual and monolingual acquisition. Bilingualism: Language and Cognition 7: 183–205.
Simon-Pearson, Laura, and Kristen Syrett. 2018. Assessing truth and speaker knowledge when utterances are not maximally true. In Anne B. Bertolini and Maxwell J. Kaplan (eds), Proceedings of the 42nd Annual Boston University Conference on Language Development, 708–721. Somerville, MA: Cascadilla.
Sprouse, Jon, and Diogo Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s Core Syntax. Journal of Linguistics 48: 609–652.
Sprouse, Jon, Matt Wagers, and Colin Phillips. 2012a. A test of the relation between working-memory capacity and syntactic island effects. Language 88(1): 82–123.
Sprouse, Jon, Matt Wagers, and Colin Phillips. 2012b. Working-memory capacity and island effects: a reminder of the issues and the facts. Language 88(2): 401–407.
Sprouse, Jon, Carson T. Schütze, and Diogo Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134: 219–248.
Stalnaker, Robert C. 1974. Pragmatic presuppositions. In Milton Munitz and Peter Unger (eds), Semantics and philosophy: Essays. New York University Press.
Strawson, Peter F. 1964. Identifying reference and truth-values. Theoria 30(2): 96–118. http://dx.doi.org/10.1111/j.1755-2567.1964.tb00404.x.
Tarski, A. 1944. The semantical concept of truth and the foundations of semantics. Philosophy and Phenomenological Research 4: 341–375.
Turk, Alice. 2010. Does prosodic constituency signal relative predictability? A smooth signal redundancy hypothesis. Laboratory Phonology 1(2): 227–262. https://doi.org/10.1515/LABPHON.2010.012
Valian, V. 1990. Logical and psychological constraints on the acquisition of syntax. In L. Frazier and J. G. De Villiers (eds), Language processing and language acquisition, 119–145. Dordrecht: Kluwer.
Van Dyke, Julie A., and Clinton L. Johns. 2012. Memory interference as a determinant of language comprehension. Language and Linguistics Compass 6(4): 193–211.
Wexler, K., and P. Culicover. 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press.
Xu, Yi. 2011. Speech prosody: A methodological review. Journal of Speech Sciences 1(1): 85–115.
Xu, Yi, and Maolin Wang. 2009. Organizing syllables into groups: Evidence from F0 and duration patterns in Mandarin. Journal of Phonetics 37(4): 502–520. https://doi.org/10.1016/j.wocn.2009.08.003
Yang, C. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.

Index ...................

Abend, O. et al. 255–9, 261 accents 453 and contrastive alternatives 472–3 eye movements, Visual World paradigm 470–473 and sentence processing 468–70 and speech disfluencies 475–6 and syntactic structures 473–5 acceptability judgments and agrammatism 609–11 biased vs. random sampling 7–9 categorization tasks 11 and comprehension of prosody 464 continuous, and the architecture of grammar 21–4 establishing reliability and validity 6, 9–10 informally vs. formally collected 5, 24 island effects 15–20 magnitude estimation task 12–13 psychometric theories 5–6 rating tasks 11 research, on the future of 643–4, 648–9 selection tasks 11 and SPR experiments 313 statistical analysis 127–8 testing convergence and divergence 9–10 Thurstone method 13–14 acceptability judgments, binding/coreference 29–30, 129 alternatives to subscripts 37–42 binary questions 34 Binding Theory 31–3, 649–50 data points issue 47–8 instruction task 46–7 magnitude estimation 36–7, 41 scale-based questions 34–6 use of fillers 47 verification strategies 42–5

Achimova, A. et al. 195 acquisition and acceptability judgments 22 grammatical deficit hypothesis 76–9 Language Acquisition Device (LAD) 91–3 universal-existential quantifier scope judgments 69–71 see also artificial language learning; behavioral acquisition acquisition, syntactic modeling 209 algorithmic 218–19, 233, 238, 260, 306 cognitive plausibility 215–17, 260 computational 217–18, 240, 254, 255, 260, 264 evaluation of theory 210–212 future developments on 653–4 implementational 219, 260 inference process 213, 214, 218–27 Bayesian 224–7, 240, 254, 257, 258, 304, 305 counting things 220–221 reinforcement learning 221–3, 306 Tolerance Principle 223–4, 255, 305 non-parametric approaches 237 incorporating meaning 251–9 optional infinitives (OIs), MOSAIC 237–8 pronoun interpretation 244–51, 261 structure dependence 238–41 syntactic islands 241–4 parametric approaches 228–37, 262 structural triggers learner (STLearner) 229–34, 306 variational learning (VarLearn) 234–7 perceptual intake 212–14 task components 214–15 triggering theory 209–10 Active Filler Hypothesis 104


Adger, D. 275, 284, 286 adjectives and behavioral infant acquisition 141, 142, 149, 161 effect of accents on 471 electrophysiology and syntactic violations 545 tough and control, preschooler acquisition 185–6 word order harmony and naturalness 280, 281, 283–5 agrammatism 595 acceptability judgment test 609–11 Broca’s aphasia 596–7 relativized minimality (RM) approach 608, 613 revised trace-deletion hypothesis (RTDH) 608 and sentence comprehension 606–15 trace-deletion hypothesis (TDH) 607–8, 609, 614 and working memory (WM) 594, 613–14, 615–23 Altmann, G. 57 Anand, P. et al. 492 anaphora backwards 19, 567–8, 612 in behavioral acquisition 149, 171, 174–6, 179, 183, 199 in binding/coreference constraints 29, 32–3, 40, 42, 48–9, 130, 131 comprehension, and anagrammatism 612, 616 and eye-tracking 349–52 in hemodynamic investigations 567–8, 570 in modeling syntactic acquisition 244–51, 261 and speed-accuracy trade-off (SAT) 382–4 Anderson, C. 62–6, 68, 69, 72, 132, 467 aphasia 593–4, 639–42 assessment measures 594–5 Broca’s 596–7, 600 classifications 595 and cognitive deficits 604–5 conduction 598–9 individual variability 603–4

and neuroimaging and lesion-symptom mapping 601–3 and neuropsychology 600–601 research, future developments of 651–2 and syntactic theory 606–15, 623–4 Wernicke’s 597–8, 600 see also agrammatism Arnett, N. 345 Arnold, J. 471, 475 artificial language learning 271, 290–291 and behavioral acquisition 157, 159 challenges to universal constraints 272–4 “ease of learning” paradigm 275, 282 efficiency 286 differential case marking (DCM) 287 on the future of 646–7 “iterated learning” paradigm 286 morphology procession and perception 287–8 affix ordering 289–90 dependency-length minimization (DML) 288–9 naturalness 281 basic word order 282–3 noun phrase 283–6 “poverty-of-the-stimulus” (POS) 272, 275–6, 279, 284, 286 “regularization” paradigm 275–6, 286, 287, 308 “silent gesture” paradigm 276, 282, 285, 308 simplicity, word order harmony 279–81, 307 studies 277–9 Asudeh, A. 39 automata-theoretic parsing models 393–4, 412–15, 432–6 bottom-up 415–21 left-corner 429–32 top-down 421–9, 446 Baddeley, A. 605 Balogh, J.E. 474 Bantu 290 Bard, E. G. et al. 12, 13 Bastiaansen, M.C. et al. 545, 546 Becker, M. 183–4, 186, 252

behavioral acquisition (infants) 137–9, 161–3 morphosyntactic dependencies 154, 156–8 movement dependencies 154–5, 158–9 referential dependencies 155, 159–61 scholarly debate on clause structure 147 sensitivity to subcategories 142–6 syntactical categories 139–40 telegraphic speech 150–153 word order 147–9 behavioral acquisition (preschoolers) 171–3, 200–201 passive constructions 188–91 pronouns and reflexives 173–9 propositional attitude verbs 181–3 quantifier raising 196–8 quantifiers and scope 179–80 raising and control verbs 183–5 reconstruction 199–200 relative clauses 187–8 research, future developments of 656–7 tough and control adjectives 185–6 wh-questions 192–5 yes/no questions 191–2 Bemis, D. 550 Bencini, G. 189 Bever, T.G. 584 Binding Theory, Government and Binding 29, 31–3, 243, 351–2, 518, 606–8, 649, 659 see also acceptability judgments Birch, S. 469 Bobaljik, J.D. 67–8 Boland, J.E. 318–19, 327 bootstrapping semantic 148 syntactic 142–6, 181, 256, 258–9, 261 Bornkessel, I. et al. 382, 574 Bott, L. et al. 384 Braine, M.D. 277, 278 Breen, M. et al. 458, 469–70 Bregman, A.S. 278 Brennan, J.R. 551, 572–3, 578, 583 Broca, P.P., Broca’s area 565, 593 see also aphasia Brooks, P.J. et al. 278

Brown-Schmidt, S. et al. 474 Brumm, K. 597–8 Caplan, D. et al. 600–601 Caramazza, A. 606, 613 Carlson, K. 467, 474 Catlin, J. 60 Cauvet, E. et al. 141 c-command in child acquisition 155, 160–161, 173–8, 180, 197–8, 199 in eye tracking and gender mismatch 350, 351 in judgment of binding and coreference 31–3, 46 in modeling syntactic acquisition 247 in quantifier scope judgments 53, 55, 59–60, 67, 69, 76, 93 in SAT modeling 384–5 in self-paced reading 323–6 Chamorro, fieldwork experiments 114, 491, 492–3, 509 community engagement 508 consent of participants 505–6 debriefing questions 504 investigation methods 499–503 participant motivations 507–8 use of audio materials 496–9 use of visual materials 495–6 Chien, Y.-C. 130–131, 176–7 Chierchia, G. 176, 199 CHILDES 216, 250, 258, 263 Chomsky, N. 7–8, 277, 333, 395, 581–2 Christiansen, M. 239–40 Clemens, L. et al. 502–3 Clifton, C. 340, 469 collective and non-collective nouns 341–2 Commonwealth of the Northern Mariana Islands (CNMI) 491–7, 505–7 Conditioned Head Turn procedure 141, 302 Conroy, A. 59, 76–88, 90, 132–3, 178–9 context-free grammars (CFGs) 408, 525–6 vs. finite-state automaton 409 and magnetoencephalography (MEG) 551 vs. minimalist grammars 437, 445 tree structure 438–47 Continuity Hypothesis 76

Cowart, W. 12, 14, 35, 129 Crain, S. 57, 88–9, 91, 160, 175, 187, 191–2, 239 Culbertson, J. 275–6, 280–281, 284, 285, 286, 287 Cunnings, I. 352–4 Cutler, A. 289, 469 Dahan, D. et al. 470 Dennison, H.Y. 472 De Villiers, J. et al. 194, 195 Diessel, H. 188 Dikker, S. et al. 541 Ding, N. et al. 552 double-is construction 102–3 Drury, J.E. 541 Dudley, R. et al. 182 Dutch agrammatism in 613 modeling acquisition 236, 238 prosody 472 syntactic/semantic violations, and electrophysiology 545–7 electroencephalography (EEG) 533–4, 635–7 and biology 535–7 compared and combined to MEG 548–50 ELAN deflection 540–541 and electricity 534–5 ERP effects 539–44 LAN deflection 541–2, 546 linked to syntax 551, 552–3 N400 deflection 542–3, 546–8 P600 deflection 543, 545–6 SAN deflection 544, 548 time-frequency (TF) effects 538–9, 544–8 and wave mathematics 537–9 ellipsis sentences and accents 473–4 and behavioral acquisition 196 and SAT modeling 378, 380 endangered languages 98–9, 119–120 ergativity ergative languages, fieldwork 106–9 in hemodynamic methods 571, 580 Extra-Linguistic Hypothesis 58, 59, 86, 90–93

eye-tracking 333–4, 515–18 cognitive processes 335 data analysis and measures 338–40 fieldwork experiments 494 fixation and attention models 335–6 garden-path impediment 336–7 grammatical processing 340–342 implausibility impediment 337–8 and island constraints 346–9 lexical processing 336, 342 linking grammar to parser 343–5 and reflexive processing 349–54 research, future developments of 651 Fanselow, G. 13, 37 Featherston, S. 41 Fedorenko, E. et al. 620 Fedzechkina, M. et al. 287, 288, 289 Ferreira, F. 472, 475, 500 Féry, C. 455 fieldwork experiments 120–121, 509, 530 alignment, ergative languages 106–9 approaches 101–2 audio materials 496–9 constructing stimuli 498 data collection 98 dealing with phenomena 105 and debriefing 504 future developments of 655–6 importance of hypotheses 102–3 intervention effects 103–4 vs. lab-based research 97–101, 113, 114, 115, 117, 491–2, 659–60 and language endangerment 98, 119–20 materials 117–18, 493–5 methods 119 participants, consent 115–16, 505–6 participants, involvement 112–14, 508 participants, literacy 116–17 participants, motivations 507–8 participants, number 114–15 participants, pilot 118 participants, recruitment 505 participants, type 98–9 preferential looking 501–2 replication experiment 112 self-paced listening (SPL) 499–501, 502

teamwork 100, 492–3 use of tablets 502–3 visual materials 495–6 word order 109–12 Fillmore, C.J. 8 Fisher, C. 144 Fodor, J.D. 57, 230, 232, 456, 469, 476 Foraker, S. 382, 383 Foss, D.J. 469 Frank, M. et al. 247–9 Frankland, S.M. 576–7 Fraundorf, S. et al. 473 Frazier, L. 336, 337 Freudenthal, D. et al. 238–9 Friederici, A.D. 540, 569–70 Friedmann, N. 568, 608 Frigo, L. 278 Futrell, R. et al. 283, 288 Gagliardi, A. et al. 158 garden-path sentences 7, 316 ambiguities, and SAT 381–2 behavioral acquisition 187, 193 and eye-tracking while reading 336–7 and self-paced reading 315, 316 Garraffa, M. 609 Gender Mismatch Effect (GMME) and eye-tracking while reading 350–351 and self-paced reading 324–6, 327 Gerard, J. 185 Gerken, L. 152 German accents and ellipsis 474 EEG, deflection effects 540 hemodynamic investigations 569–70, 574–5 magnitude estimation task 12, 41–2 quantifier scope judgment 69–70 SAT and garden-path analysis 382 syntactic dependencies, infant acquisition 156 Gertner, Y. et al. 145 Gibson, E. et al. 283 Gleason, J.B. et al. 613 Goldin-Meadow, S. et al. 282–3 Golinkoff, M. 148 Gómez, R.L. 157

Gordon, P.C. 31–5, 38, 46, 128 Goro, T. 70–71, 91, 92, 133 grammatical categories behavioral acquisition methods 140–142 lexical and functional 139–40 Grammatical Deficit Hypothesis 76–9 Greenberg, J. 283–5 Greene, J.D. 576–7 Grillo, N. 608–9 Grodzinsky, Y. 607 Grolla, E. 194 Grüter, T. 503 Gualmini, A. 77–8, 87, 191 Guasti, M.T. 176, 199 Hagoort, P. et al. 546–7 Hahn, M. et al. 285, 286 Hale, J. et al. 552, 553 Hall, M.L. et al. 283 Hawkins, J.A. 289 Hay, J. 285 He, A.X. 142 Head-Turn Preference procedure 140, 156, 159, 302 Hebrew hemodynamic investigations 568 judgment studies 8 hemodynamics 559–60, 584, 585, 637–9 computing and the left anterior temporal lobe (LATL) 564, 571–3, 577, 578 computing and the left inferior frontal gyrus (LIFG) 564, 565–71, 574, 575, 577, 578 computing and the left posterior temporal lobe (LPTL) 564, 565, 575, 577, 578 on the future of neurolinguistics 645–6 magnetic resonance imaging (MRI) 560–562, 568, 571, 574, 576, 577–8, 583 multi-voxel pattern analysis (MVPA) 576, 581 near-infrared spectroscopy (NIRS) 562–3 positron emission tomography (PET) 563 and predictability 575, 577–80 and sentence processing 563–5 and syntactic theory 581–4

syntactic vs. semantic composition 564, 573–7 Henderson, J.M. et al. 578 Hendrick, R. 31–5, 38, 46, 128 Hickok, G. et al. 608, 620 Hicks, J. et al. 141 Hillis, A.E. et al. 604 Hillyard, S. 542 Hirschbühler, P. 56 Hirsh-Pasek, K. 148 Höhle, B. et al. 156 Hudson Kam, C. 278 Huettig, F. et al. 112 Hupp, J.M. et al. 289–90 Husband, E.M. 472, 574 Hyams, N. 153

ICS cognitive architecture 23 I-language 660 information theoretic complexity metrics 393–4, 523 and electroencephalography (EEG) 398–401, 552–3 entropy 411–12, 523, 578 hierarchical structure 408–12, 525 probability distribution and FSA 401–8, 523 surprisal 398–401, 552–3, 578 Intermodal Preferential Looking Paradigm 143, 301 Ioup, G. 60, 131 islands, island effect and acceptability judgments 15–20 and backward binding 323–6 in electrophysiological methods 541, 543 in eye-tracking experiments 346–9 and memory constraints 647–8 and parasitic gaps 319–23 and quantifier scope judgments 67 in SAT modeling 381 wh-dependency 241–4, 263–4, 320–322, 325, 346 Italian child acquisition 150, 176 modeling acquisition 236 Ito, K. 471

Jaeger, L.A. et al. 351 Japanese artificial language learning 286 filler-gap dependencies studies 513 judgment studies 8, 10 scope judgments 69–70, 71, 91, 92, 133 wh-questions, acquisition 193 Jusczyk, E.L. 156 Kaiser, E. et al. 43–4, 129–30 Kam, X. et al. 240 Kayne, R.S. 579 Kazanina, N. et al. 323–5 Keller, F. 39, 130 Kluender, R. 15–16, 19 Koizumi, M. et al. 110 Korean 9, 109, 477 Kreiner, H. et al. 341–2 Kruyt, J.G. 469 Kurtzman, H.S. 60, 62, 63, 131 Kurumada, C. 471–2 Kush, D. et al. 621 Kutas, M. 15–16, 19, 542 Lago, S. 34 Langsford, S. et al. 13–14 Lau, E. et al. 541 Lau, Jey H. et al. 23 Leddon, E. 177–8, 199 Lee, E.-K. 473 Legate, J. 235–7 Lehiste, I. 467 Lewis, S. et al. 182–3 Lidz, J. 75, 76–7, 79, 80, 142, 149, 177–8, 180, 194, 196, 198, 199, 246 Linebarger, M. et al. 609 Love, T. 597–8 Lukyanenko, C. et al. 161 Lust, B. et al. 174 MacDonald, M.C. 60, 62, 63, 86 magnetoencephalography (MEG) 533–4 combined and compared to EEG 548–50 linked to syntax 550–551, 552 sensitivity 549 Superconducting Quantum Interference Devices (SQUIDs) 548

Magnitude Estimation 12–13 binding/coreference 36–7, 41 Makuuchi, M. et al. 569 Mandarin electrophysiological methods 552 judgment studies 9 modeling syntactic acquisition 210, 211, 235 SAT experiments 383 scope judgments 88–9, 91, 92 Maratsos, M. et al. 190 Marr, D. 217, 560 Martin, A. 284, 285, 289, 380 Marty, P. et al. 14 Matchin, W. et al. 567–8, 570, 578, 583, 620 Mayan languages Ch’ol, alignment 106–7 fieldwork participants 113–14, 116–17 Kaqchikel, word order 110–111 Maye, J. 157 McDaniel, D. et al. 177 McDonald, J. 278 McElree, B. 374–85 McKee, C. 160 Michaelis, J. 437 Micham, D.L. 60 Miller, G. 395 Moeser, S.D. 278 Moreton, E. 278 Moulton, K. et al. 39 Musolino, J. 73, 75, 77, 78, 79, 80, 86, 87, 88, 132, 180 Musso, M. et al. 278 Myers, J. 14 Naigles, L.R. 143 Nakayama, M. 191–2, 239, 614 Nelson, M.J. et al. 551, 583 Neville, H. et al. 540 Newmeyer, F. 273 Newport, L. 278 Nooteboom, S.G. 469 noun, noun phrases and behavioral child acquisition 139, 141–4, 149, 152 in binding/coreference judgments 31, 41–2

in electrophysiological methods 540, 545, 550, 551, 552 in ergative languages 106–7 and eye tracking 341, 353, 354 in hemodynamic methods 569, 574, 575, 576 in modeling syntactic acquisition 240, 245–6, 248, 259 and the naturalness of syntax 283–6 prosodic phrasing 455, 471–3, 476 and SAT model 380–383 in scope judgments 60, 67 and self-paced reading 316 in a Truth-Value Judgment Task 658 and word order harmony 276, 280–281 null subjects, child language 151–3, 236 Omaki, A. et al. 157, 193 Orfitelli, R. 153 Orita, N. et al. 249–51 Pallier, C. et al. 571–2, 573 paragrammatic speech 595 Wernicke’s aphasia 597–8 Parker, D. 353–4 Parser Hypothesis 58, 76, 79–90 passives in agrammatism 606, 609, 611 child acquisition 188–91 in hemodynamic methods 576 Pearlmutter, N.J. et al. 337, 340 Perfors, A. 240 Phillips, C. 18, 321–3, 344, 347, 353–4 Pickering, M.J. 322, 346–7 Poldrack, R.A. et al. 561 Polynesian languages 106 Poschmann, C. 455 Pozzan, L. 288 preferential looking method 501–2 Principle C 149, 174 infant behavioral acquisition 149, 160–161 judgment tests 32 preschooler behavioral acquisition 173–7, 197–8, 199 pronouns and accents 474 and backward binding 323–6

comprehension, and agrammatism 612 infant acquisition 155, 159–61 modeling acquisition 244–51, 261 reflexives, eye-tracking while reading 349–54 reflexives, preschooler acquisition 173–9 prosody 453, 478, 527–9 annotation systems 461–3 boundaries 465–8 comprehension studies 464–5 direct measures 460–461 event-related potential (ERP) studies 465, 477 implicit 476–7 production, materials 458–60 relation to syntactic structure 454–7 research, on the future of 644–5 Visual World paradigm 465, 470–473 see also accents Pycha, A. et al. 278 Pylkkänen, L. 550, 551 quantifiers pronouns and reflexives, preschooler acquisition 177–8 quantifier raising (QR) 67, 69, 77, 80, 93, 94, 196–8 and scope, preschooler acquisition 179–80 Question after Story Paradigm 192–3 Question Elicitation Paradigm 191–2 Rayner, K. 336, 337, 341 Reali, F. 239–40 Reber, A.S. 277 Reinhart, T. 60, 67, 68, 93 relative clauses and accents 473 and electrophysiology 548 and eye-tracking 348–9 fieldwork experiments 502 preschooler acquisition in 187–8 and prosody 455, 476, 477 Reuland, E. 384 R-expressions 31–3, 174, 198 Rhythm and Pitch (RaP) system 462–3 Roeper, T. 195 Rogalsky, C. et al. 620

Sakas, W.G. 230, 232 Saldana, C. et al. 286 Santelmann, L.M. 156 Sasse, H.-J. 119 Saygin, A.P. 611–12 Scandinavian languages 348–9 Schafer, A.J. et al. 459–60, 473 Schütze, C. 129 scope judgments ambiguities 53–8, 66 Extra-Linguistic Hypothesis 86, 90–93 forced-choice tasks 83–5 Grammatical Deficit Hypothesis 76–9 Incremental Verification Task (IVT) 81–3, 90, 93 and negation 72–5 Overt Scope Preference (OSP) 59–62, 66 Parser Hypothesis 79–90 studies 131–3 Truth-Value Judgment Task (TVJT) 55, 58, 70, 72–5, 79–81, 82, 86, 92, 93, 132–3 universal-existential, adult parsing preferences 59–66 universal-existential, child acquisition 69–71 Vagueness Principle 61 verification strategies 58–9, 89–93, 658–9 Scott, R.M. 145 Sedivy, J. et al. 471 Seidl, A. et al. 158 self-paced listening (SPL) 499–501 self-paced reading (SPR) 18, 34, 327, 513–15 and acceptability judgments 313 argument/adjunct distinction 318–19 backward binding and islands 323–6 and fieldwork 108 and online sentence processing 314–17 parasitic gaps and islands 319–23 reading time (RT) slowdown 313, 314–16, 317, 319, 322, 324, 327, 515 and scope judgments 58, 61, 62–6, 132 Selkirk, E. 456–7, 468 Shattuck-Hufnagel, S. 455 Shetreet, E. 568, 571 Shi, R. et al. 140 sign language 276, 278

Sloggett, S. 354 Smith, N.V. et al. 278 Smolensky, P. 23 Snedeker, J. 468 Snyder, W. 20 Spalek, K. et al. 473 Spanish basic word order, artificial language learning 282 child acquisition of optional infinitives (OIs) 235, 237 speech clause structure, and acquisition 147 function words, and acquisition 140–141 relevance in fieldwork research 119–20 telegraphic, and acquisition 150–153 see also accents; aphasia; prosody speed-accuracy trade-off (SAT) 363–4, 386, 518–22 acceptability of expressions 367 and anaphora 382–3 and enriched composition 384 and filler-gap processing 381 functions 368–71 and garden-path ambiguities 381–2 interaction between syntax and memory 377–80 memory operations 372–6, 385–6 and metaphor 383–4 and metonymy 383 model fitting 371–2 research, on the future of 384–6, 647–8 and scalar implicatures 384 signal time-points 366–7 timing measures 364–6 Speer, R. 471 Sprouse, J. 12–13 Stabler, E.P. 437, 446, 583 Steedman, M. 57, 67, 68, 456 Steinhauer, K. 541 Sternberg, S. 365 Stevens, S.S. 12 Stolterfoht, B. et al. 474 Stowe, L.A. 18 Stromswold, K. 192, 567 Sturt, P. 349–54 Sussman, R.S. 494–5, 499

Sutton, M. et al. 161 syntactic dependencies (infant acquisition) morphosyntactic 154, 156–8 movement 154–5, 158–9 referential 155, 159–61 Syrett, K. 76–7, 197, 198 Szendrői, K.E. et al. 70 Tabullo, A. et al. 275, 282 Takahashi, E. 159 Tanner, D. 541 Tavakolian, S. 187 Temme, A. 39 Thornton, R. 175, 187, 197 Tily, H. et al. 282 Tomasello, M. 188 Tones and Break Index (ToBI) system 461–2 Townsend, D.J. 584 Traxler, M. 322, 346–7 Trueswell, J.C. 288, 468 Truth-Value Judgment Task (TVJT) binding-coreference judgments 42 null subjects, infant acquisition 153 preschooler behavioral acquisition 174–5, 178, 180, 199–200 scope judgments (quantifier) 55, 58, 70, 72–5, 79–81, 82, 86, 92, 93, 132–3 Tunstall, S. 61, 67, 131–2 Turk, A.E. 455 Tutunjian, D. et al. 348 UTAH (Uniformity of Theta Assignment Hypothesis) 253–5 Valian, V. 152, 189 van Hell, J.G. 541 verbs clustering, modeling acquisition 252–3 do-be construction, and prosody 455–6 in hemodynamic investigations 568–9, 579–80 -ing form, syntactic dependencies 154, 155, 156 passive constructions, preschooler acquisition 188–91 propositional attitude, preschooler acquisition 181–3

raising and control, preschooler acquisition 183–5 reading time, sentence processing 315–16 root (or optional) infinitives, acquisition 151, 235–6, 237–8 and speech disfluencies 475–6 transitivity, and infant acquisition 143–6 Verhoeven, E. 39 verification strategies 658–9 binding and coreference patterns 42–5, 48 scope judgements 58–9, 81–5, 90–3 Viau, J. et al. 78–9 visual context acceptability judgments, binding/coreference 42–5, 130 materials, in fieldwork experiments 495–6 methods, in fieldwork experiments 501–3 prosody, Visual World paradigm 465, 470–473

Wagers, M. 344 Wagner, M. 455 Wang, L. et al. 547 Wasow, T. et al. 455 Watson, D. et al. 470–471, 473 Weber, A. et al. 471 Weiss, S. et al. 548 Weskott, T. 13, 37 Wexler, K. 176–7 wh-agreement and acceptability judgment 8 and eye-tracking 346–7, 494 fieldwork experiments 105, 494, 499, 502 and hemodynamics 567–8 island effects and online processing 320–321, 325, 326

modeling acquisition 209–10, 211, 213–14, 220, 221–3, 241–4 movement dependencies, infant acquisition 154, 155, 158 preschooler acquisition 192–5 SES children data 263–4 whether-sentences 15–18 Willems, R.M. et al. 577–8 Wilson, M. 611–12, 620 word order and agrammatism 606–7, 619 basic, artificial language learning naturalness 276, 279–81 behavioral acquisition 147–9 and dependency-length minimization 288–9 fieldwork investigations 109–12 harmony, artificial language learning simplicity 276, 279–81 working memory (WM), phonological and morphological 599 and agrammatism 616–23 cognitive deficits 605, 613–15 Wurmbrand, S. 67–8 Yamakoshi, K. 195 Yang, C. 223–4, 235 Yasunaga, D. et al. 110, 118 yes/no questions preschooler behavioral acquisition 191–2 structure dependence 238–41 Yoshida, M. et al. 19, 325 Yuan, S. 144 Zaccarella, E. 569–70 Zhou, P. 88–9 Zurif, E.B. 606, 613

OXFORD HANDBOOKS IN LINGUISTICS Recently published

THE OXFORD HANDBOOK OF AFRICAN AMERICAN LANGUAGE Edited by Sonja Lanehart

THE OXFORD HANDBOOK OF AFRICAN LANGUAGES Edited by Rainer Vossen and Gerrit J. Dimmendaal

THE OXFORD HANDBOOK OF APPLIED LINGUISTICS Second edition Edited by Robert B. Kaplan

THE OXFORD HANDBOOK OF ARABIC LINGUISTICS Edited by Jonathan Owens

THE OXFORD HANDBOOK OF CASE Edited by Andrej Malchukov and Andrew Spencer

THE OXFORD HANDBOOK OF CHINESE LINGUISTICS Edited by William S-Y Wang and Chaofen Sun

THE OXFORD HANDBOOK OF COGNITIVE LINGUISTICS Edited by Dirk Geeraerts and Hubert Cuyckens

THE OXFORD HANDBOOK OF COMPARATIVE SYNTAX Edited by Guglielmo Cinque and Richard S. Kayne

THE OXFORD HANDBOOK OF COMPOSITIONALITY Edited by Markus Werning, Wolfram Hinzen, and Edouard Machery

THE OXFORD HANDBOOK OF COMPOUNDING Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF COMPUTATIONAL LINGUISTICS Second edition Edited by Ruslan Mitkov

THE OXFORD HANDBOOK OF CONSTRUCTION GRAMMAR Edited by Thomas Hoffmann and Graeme Trousdale

THE OXFORD HANDBOOK OF CORPUS PHONOLOGY Edited by Jacques Durand, Ulrike Gut, and Gjert Kristoffersen

THE OXFORD HANDBOOK OF DERIVATIONAL MORPHOLOGY Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF DEVELOPMENTAL LINGUISTICS Edited by Jeffrey Lidz, William Snyder, and Joe Pater

THE OXFORD HANDBOOK OF ELLIPSIS Edited by Jeroen van Craenenbroeck and Tanja Temmerman

THE OXFORD HANDBOOK OF ENDANGERED LANGUAGES Edited by Kenneth L. Rehg and Lyle Campbell

THE OXFORD HANDBOOK OF ENGLISH GRAMMAR Edited by Bas Aarts, Jill Bowie, and Gergana Popova

THE OXFORD HANDBOOK OF ETHIOPIAN LANGUAGES Edited by Ronny Meyer, Bedilu Wakjira, and Zelealem Leyew

THE OXFORD HANDBOOK OF ERGATIVITY Edited by Jessica Coon, Diane Massam, and Lisa deMena Travis

THE OXFORD HANDBOOK OF EVENT STRUCTURE Edited by Robert Truswell

THE OXFORD HANDBOOK OF EVIDENTIALITY Edited by Alexandra Y. Aikhenvald

THE OXFORD HANDBOOK OF EXPERIMENTAL SEMANTICS AND PRAGMATICS Edited by Chris Cummins and Napoleon Katsos

THE OXFORD HANDBOOK OF EXPERIMENTAL SYNTAX Edited by Jon Sprouse

THE OXFORD HANDBOOK OF GRAMMATICAL NUMBER Edited by Patricia Cabredo Hofherr and Jenny Doetjes

THE OXFORD HANDBOOK OF GRAMMATICALIZATION Edited by Heiko Narrog and Bernd Heine

THE OXFORD HANDBOOK OF HISTORICAL PHONOLOGY Edited by Patrick Honeybone and Joseph Salmons

THE OXFORD HANDBOOK OF THE HISTORY OF ENGLISH Edited by Terttu Nevalainen and Elizabeth Closs Traugott

THE OXFORD HANDBOOK OF THE HISTORY OF LINGUISTICS Edited by Keith Allan

THE OXFORD HANDBOOK OF INFLECTION Edited by Matthew Baerman

THE OXFORD HANDBOOK OF INFORMATION STRUCTURE Edited by Caroline Féry and Shinichiro Ishihara

THE OXFORD HANDBOOK OF JAPANESE LINGUISTICS Edited by Shigeru Miyagawa and Mamoru Saito

THE OXFORD HANDBOOK OF LABORATORY PHONOLOGY Edited by Abigail C. Cohn, Cécile Fougeron, and Marie K. Huffman

THE OXFORD HANDBOOK OF LANGUAGE AND LAW Edited by Peter Tiersma and Lawrence M. Solan

THE OXFORD HANDBOOK OF LANGUAGE AND RACE Edited by H. Samy Alim, Angela Reyes, and Paul V. Kroskrity

THE OXFORD HANDBOOK OF LANGUAGE AND SOCIETY Edited by Ofelia García, Nelson Flores, and Massimiliano Spotti

THE OXFORD HANDBOOK OF LANGUAGE ATTRITION Edited by Monika S. Schmid and Barbara Köpke

THE OXFORD HANDBOOK OF LANGUAGE CONTACT Edited by Anthony P. Grant

THE OXFORD HANDBOOK OF LANGUAGE EVOLUTION Edited by Maggie Tallerman and Kathleen Gibson

THE OXFORD HANDBOOK OF LANGUAGE POLICY AND PLANNING Edited by James W. Tollefson and Miguel Pérez-Milans

THE OXFORD HANDBOOK OF LANGUAGE PROSODY Edited by Carlos Gussenhoven and Aoju Chen

THE OXFORD HANDBOOK OF LANGUAGES OF THE CAUCASUS Edited by Maria Polinsky

THE OXFORD HANDBOOK OF LEXICOGRAPHY Edited by Philip Durkin

THE OXFORD HANDBOOK OF LINGUISTIC ANALYSIS Second edition Edited by Bernd Heine and Heiko Narrog

THE OXFORD HANDBOOK OF LINGUISTIC FIELDWORK Edited by Nicholas Thieberger

THE OXFORD HANDBOOK OF LINGUISTIC INTERFACES Edited by Gillian Ramchand and Charles Reiss

THE OXFORD HANDBOOK OF LINGUISTIC MINIMALISM Edited by Cedric Boeckx

THE OXFORD HANDBOOK OF LINGUISTIC TYPOLOGY Edited by Jae Jung Song

THE OXFORD HANDBOOK OF LYING Edited by Jörg Meibauer

THE OXFORD HANDBOOK OF THE MENTAL LEXICON Edited by Anna Papafragou, John C. Trueswell, and Lila R. Gleitman

THE OXFORD HANDBOOK OF MODALITY AND MOOD Edited by Jan Nuyts and Johan van der Auwera

THE OXFORD HANDBOOK OF MORPHOLOGICAL THEORY Edited by Jenny Audring and Francesca Masini

THE OXFORD HANDBOOK OF NAMES AND NAMING Edited by Carole Hough

THE OXFORD HANDBOOK OF NEGATION Edited by Viviane Déprez and M. Teresa Espinal

THE OXFORD HANDBOOK OF NEUROLINGUISTICS Edited by Greig I. de Zubicaray and Niels O. Schiller

THE OXFORD HANDBOOK OF PERSIAN LINGUISTICS Edited by Anousha Sedighi and Pouneh Shabani-Jadidi

THE OXFORD HANDBOOK OF POLYSYNTHESIS Edited by Michael Fortescue, Marianne Mithun, and Nicholas Evans

THE OXFORD HANDBOOK OF PRAGMATICS Edited by Yan Huang

THE OXFORD HANDBOOK OF REFERENCE Edited by Jeanette Gundel and Barbara Abbott

THE OXFORD HANDBOOK OF SOCIOLINGUISTICS Second edition Edited by Robert Bayley, Richard Cameron, and Ceil Lucas

THE OXFORD HANDBOOK OF TABOO WORDS AND LANGUAGE Edited by Keith Allan

THE OXFORD HANDBOOK OF TENSE AND ASPECT Edited by Robert I. Binnick

THE OXFORD HANDBOOK OF THE WORD Edited by John R. Taylor

THE OXFORD HANDBOOK OF TRANSLATION STUDIES Edited by Kirsten Malmkjaer and Kevin Windle

THE OXFORD HANDBOOK OF UNIVERSAL GRAMMAR Edited by Ian Roberts

THE OXFORD HANDBOOK OF WORLD ENGLISHES Edited by Markku Filppula, Juhani Klemola, and Devyani Sharma