Syntactic Processing: An Overview [1 ed.] 9781003405634, 9781032522258, 9781032522227


English Pages 248 Year 2023


Table of contents :
Acknowledgements
Chapter 1 Introduction
1.1 Questions
1.2 The functional architecture of the linguistic mind
1.3 Minimal attachment
1.4 Ambiguity resolution and beyond: the oracle component of parsing
Notes
Chapter 2 Modifier adjunction with special reference to relative clauses inside complex NPs
2.1 Introduction: local adjunction
2.2 Relative clause adjunction
2.3 The adjunction of subject relative clauses to complex noun phrases
2.3.1 Computationally cheap trees: The Garden Path model
2.3.2 The role of frequency: Tuning
2.3.3 A refined two-step account: Construal
2.3.4 The role of segmentation, prosody and silent reading: the Implicit Prosody Hypothesis
2.3.5 Going for the meaning directly: the role of lexical semantics, coherence and reference
2.3.6 All at once in one stage: the Constraint Satisfaction Approach
2.3.7 The Unrestricted Race Model (URM) and the ambiguity advantage effect (AAE)
2.3.8 Grillo and Costa (2014): the Pseudo Relative confound
2.4 Conclusions
Notes
Chapter 3 Agreement
3.1 Introduction
3.2 Agreement attraction
3.2.1 Psycholinguistic theories of agreement
3.2.1.1 Maximal Input
3.2.1.2 Marking and Morphing
3.2.1.3 The cue-based Working Memory Model: retrieval in production
3.2.1.4 The scope of planning + semantic integration account
3.2.1.5 On morphology
3.2.1.6 Attraction in comprehension
3.3 More on comprehension: agreement in brain waves
3.4 Summing up
Notes
Chapter 4 Gap filling
4.1 Introduction
4.2 Working memory
4.3 Recent fillers: controlled PRO and gap-driven parsing
4.4 Filler-driven parsing: the Active Filler Strategy
4.5 Scrambling the word order predictability: the Minimal Chain Principle
4.6 Summary and conclusions
4.7 Epilogue, or when gaps are too radical and reference must be explicit: anaphor resolution
Notes
Chapter 5 On parsers and grammars
5.1 Introduction: on psychological adequacy
5.2 The Separate Grammar Hypothesis: heuristics and good-enough, goal-directed, predictive processing
5.3 The Grammatical Parser Hypothesis: the parser is the grammar
5.4 Grammatical illusions
5.5 On flexibility and opportunism
Notes
References
Index


SYNTACTIC PROCESSING

This book provides an overview of the structures, topics and main theories of syntactic processing. It covers the last 40 years of sentence-level psycholinguistic research and debates and makes it accessible to both theoretical linguists and experimental psychologists. Tying linguistically relevant issues to psycholinguistic theory, this book:

• Covers the processing of the grammatical phenomena of adjunction, agreement and gap filling and discusses the relationship between grammars and parsers
• Discusses experimental work and theories, demonstrating how psychologists have made real strides in understanding language and how studying the processing of syntactic structure is the same as studying the nature of language
• Explores the key theories of psycholinguistics, including recent developments
• Explains the different methodologies of sentence processing, such as eye-tracking and electroencephalography

Bridging the gap between psycholinguistic research and the study of language, this book is essential reading for advanced students and scholars of linguistics and experimental psycholinguistics as well as cognitive science and psychology.

Carlos Acuña-Fariña is Full Professor of English Language and Linguistics at the University of Santiago de Compostela, Spain. He is the author of top-tier research, both theoretical (in linguistics) and experimental (in psycholinguistics and neurolinguistics). The experimental work has used the methodologies of self-paced reading, eye-tracking and electroencephalography and has mostly covered the topics discussed in this book: relative clause adjunction ambiguities, agreement, gap filling and the relationship between grammars and parsers.

SYNTACTIC PROCESSING An Overview

Carlos Acuña-Fariña

First published 2024 by Routledge 4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN and by Routledge 605 Third Avenue, New York, NY 10158 Routledge is an imprint of the Taylor & Francis Group, an informa business © 2024 Carlos Acuña-Fariña The right of Carlos Acuña-Fariña to be identified as author of this work has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library ISBN: 978-1-032-52225-8 (hbk) ISBN: 978-1-032-52222-7 (pbk) ISBN: 978-1-003-40563-4 (ebk) DOI: 10.4324/9781003405634 Typeset in Bembo by Deanta Global Publishing Services, Chennai, India

Because every cloud has a silver lining, this book is very much a product of the pandemic, a time that I spent, adorably (I am embarrassed to add), at home with Mela, Aitana and Diego. I dedicate it to them for making me feel whole during that time, and for making me laugh every day.


ACKNOWLEDGEMENTS

This research was funded by Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia (ED481B-2022-041), by Axencia Galega de Innovación and Consellería de Economía, Industria e Innovación, Xunta de Galicia (ED431B 2019/2020; ED431B 2022/19) and also by the Spanish Ministerio de Ciencia e Innovación (PID2019-110583GB-I00). I would like to thank my colleagues Javier Pérez Guerra and Isabel Fraga Carou for reading a draft and providing feedback. I would also like to thank the reviewers for the crucial help they provided.

1 INTRODUCTION

1.1 Questions

In my experience as a linguist, most linguists do their demanding jobs and pursue their complex scientific agendas by studying the intricate system we call language from a large number of complementary angles without much regard for the actual psychological basis of whatever they describe and/or try to explain. When they mention ‘psycholinguistic evidence’, they do so mostly in passing, with the ‘evidence’ in question taking up an insignificant part of their argumentation. Not many linguists actually know about late closure or minimal attachment, garden paths and Garden Path Theory, construal, cue-based processing, the Filled Gap Effect, attraction, grammatical illusions, marking and morphing, maximal input, the Direct Association Hypothesis, surprisal theory, inhibitory interference, lossy context or the Minimal Chain Principle. When they drop a reference to psycholinguistic experiments, they are, in general, not likely to know much about cumulative self-paced reading, first-pass and regression-path times, LANs and P600s. This book is intended to show how cognitive scientists interested in language may benefit from a deeper knowledge of psycholinguistic research. It will try to exemplify how a linguist who doubles as a psycholinguist does his/her work in four specific areas of research. The first one is adjunction, the operation that places a constituent A next to another constituent B and produces a superordinate constituent A’. This will be done especially by focussing on research on the adjunction of relative clauses to complex noun phrases. The second area will be agreement, with its vast cross-linguistic variability. The third will be the processing of gaps or empty categories. The fourth will use the piecemeal evidence accrued from the first three areas to focus upon the relationship between a grammar and a parser, with grammatical illusions and so-called islands in mind (in order to make that discussion more tangible).
The first three areas have been chosen because they are central to actual language use and cannot fail to relate to linguistic theory (that is, no theory of grammar can ignore them). Indeed, possessing a language means that one must have a way of merging/adjoining constituents in the syntagmatic chain of speech. Additionally, given that over 80% of the world’s languages have some sort of agreement (Mallinson & Blake 1981), and that languages vary massively in the size of the morphological components that make agreement possible, no linguistic theory can absent itself from an account of agreement co-indexations either. Likewise, no model of language can afford to remain silent on one of the most fundamental properties of the language faculty: the need to systematically drop given/recoverable information, as language systems cannot and need not spell out all of conceptual structure.1

Fortunately, psycholinguistic research on all these areas is now abundant. What seems to be missing is a bridge. Whether you are a linguist with an interest in the idea of psychological adequacy, a psychologist who is not well versed in syntactic structure or a budding psycholinguist, you may have asked yourself at times: how does the mind do these things? What things? Well, take agreement for instance. The string the little, dark pens are all broken presents seven words and two morphological marks on two of them (pens and are, both plural). The corresponding Spanish sentence (‘los pequeños bolígrafos oscuros están todos rotos’) also has seven words, but now the morphological marks go up to thirteen. Assuming a speed of some fifteen to seventeen phonemes per second plus lexical recovery of every piece, phonological encoding, morphological encoding, phrasal packaging, clausal wrap-up and semantic decomposition, etc., all at the same time, it is easy to wonder why one should bother with the computation of so many cues all marking masculine + plural.
After all, English proves that the alliterative excess is totally unnecessary (Taylor 2002: 332 f.). And what is masculine about a pen anyway? Why replicate all those -os if they mean nothing? In sum, why make costly computational room, again and again, for something apparently so useless? Some languages add case to that complexity. Others put gender on the verb. Linguists have long marvelled at these cross-linguistic mysteries, and some cannot refrain from reflecting upon the kind of mind that does that online.

Agreement involves a syntactic domain (the phrase, the clause, the discourse) and short-term memory (to keep the morphological features of a controlling head in mind in order to copy them, or expect them, later on a target or several targets). It is surely useful to investigate whether these two aspects of it influence each other or, alternatively, whether the grammar side is totally independent from the implementation side. Additionally, agreement features often originate in semantics (plurality, animacy, etc.), but once they become routinized in myriad alliterative schemas of the Spanish -os, -os, -os type (above), does every occurrence of -o (meaning masculine) and -s (meaning plural) tap conceptual structure? If you believe in the Symbolic Thesis of Cognitive Grammar (Langacker 1991a, 1991b), then you may be inclined to answer in the affirmative, and that leads to an easy hypothesis to entertain: we should be able to find more semantic interference in agreement operations in Spanish than in English. Do we? Conversely, if you believe that agreement is an operation involving only feature co-variance running on formal rails and reined in by linguistic cycles (something like the Phase Impenetrability Condition of Generative Grammar; Chomsky 2001: 14; see Abels 2012), then another interesting prediction comes to mind: you should not see semantic interference inside a hypothesized formal phase.

Reflexives provide an interesting test ground on a number of fronts. If you encounter the string *Mary couldn’t bring himself to believe in that, you will no doubt experience a garden path, a sort of momentary breakdown after the onset of himself (whose masculine feature violates agreement with the head Mary). Reflexives are usually described as tightly constrained by syntax, and c-command is usually cited as a configurational requirement on them. Is that syntactic configuration psychologically strong? Can it be fooled by semantics (Parker & Phillips 2017)? We can play with the configurations where reflexives show up and see what happens, and we can compare reflexive agreement with other kinds of agreement, to see how encapsulated agreement co-indexations really are. We can then relate our findings to the Symbolic Thesis or the Phase Impenetrability Condition.

Past the bridge there are warehouses with people running experiments on all this all the time. In most of them the subject matter is language, and the likes of c-command and agreement ad sensum are being tested. The continuous undercurrent is a latent evaluation of the competence vs performance fork. The problem is that running experiments is very time-consuming, so the people actually doing it do not have much time to find out about c-command or, say, long-distance agreement by visiting the linguistics building. They could really use some help.
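The mark counts given above for the English and Spanish sentences, and the feature clash in the reflexive example, can be made concrete with a small Python sketch. It assumes a deliberately simplified segmentation in which every Spanish word ending in -os carries one gender mark and one number mark and están carries a single number mark. The names SPANISH, ENGLISH, count_marks and agrees are hypothetical, and treating agreement as a subset check on feature sets is an illustrative assumption, not a model from the psycholinguistic literature.

```python
# Toy feature annotations (an assumed segmentation, for illustration only):
# each word is paired with the agreement marks it overtly carries.
SPANISH = [
    ("los", {"masc", "pl"}),
    ("pequeños", {"masc", "pl"}),
    ("bolígrafos", {"masc", "pl"}),  # controlling head noun
    ("oscuros", {"masc", "pl"}),
    ("están", {"pl"}),               # verb: number only
    ("todos", {"masc", "pl"}),
    ("rotos", {"masc", "pl"}),
]
ENGLISH = [
    ("the", set()), ("little", set()), ("dark", set()),
    ("pens", {"pl"}), ("are", {"pl"}), ("all", set()), ("broken", set()),
]


def count_marks(sentence):
    """Total number of overt agreement marks across all words."""
    return sum(len(marks) for _, marks in sentence)


def agrees(controller, target):
    """A target agrees if every feature it marks is also on its controller."""
    return target <= controller


head = SPANISH[2][1]
print(count_marks(ENGLISH), count_marks(SPANISH))        # 2 13
print(all(agrees(head, marks) for _, marks in SPANISH))   # True

# *Mary couldn't bring himself ...: the reflexive's [masc] clashes with Mary.
mary = {"fem", "sg"}
print(agrees(mary, {"masc", "sg"}))  # False (himself)
print(agrees(mary, {"fem", "sg"}))   # True  (herself)
```

The subset check lets a target mark fewer features than its controller (están marks number but not gender), which is why the verb still counts as agreeing in this toy.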
Take another example, the (psycholinguistically) famous somebody shot the servant of the actress who was on the balcony. It turns out that, statistically, British speakers are more likely to understand that the person on the balcony is the actress, not the servant. Spanish and French speakers, instead, are rather more inclined to adjoin the relative clause to the first noun (servant) (Cuetos & Mitchell 1988; Cuetos et al. 1996). Imagine that the context provides two servants and only one actress; that one of the nouns is animate and the other not (the suitcase of the servant); that one of the nouns is ten times more frequent than the other; that one is inflectionally plural, that one is modified, that one has no determiner, that the preposition joining the two nouns is ‘theta-marking’ (say, contentful like possessive of) … imagine that changing any one of those ‘cues’ changes the pattern of adjunction of the relative clause to the preceding complex NP. Why would it? And if so, what would that tell us about the human language faculty? Would we be entitled to talk about a grammar of constraints, instead of one with an ‘elegant’ design? It is hard to see why all that is less relevant to a linguist than, say, the fact that non-restrictive relatives disfavour that and prefer which or who, that the relativizer can be dropped in certain syntactic environments, or that we cannot extract and move certain constituents from certain domains. For one thing, adjunction is geometry, so no one would expect languages to differ on that and, especially, to keep differing given the slightest micro modification to the same macro template. If we were to find cross-linguistic differences plus the working of very many constraints, we would surely be challenged, as linguists, to come up with an explanation. We would also need to consider a grain-size problem, namely whether the macro template (complex NP followed by RC) is more relevant than the micro parameters (lexical frequency, animacy, attachee size, etc.), or the other way around. All of that is ‘language faculty’, and it impacts the way we see the competence versus performance distinction.

Finally, consider so-called empty categories briefly. The generative grammar of the 1980s provided a convenient way of conceptualising elided material. The notions of PRO, pro, NP-trace and Wh-trace originated at that time and will be used here for descriptive purposes only (Haegeman 1994). Examples (1)–(4) illustrate them:

(1) Pete_i tried PRO_i to be attentive all the time.
(2) Vine con María_i. pro_i Es tan maja …
    ‘I came with Mary_i. She (= pro_i) is so nice(FEM) …’
(3) Ronnie_i seemed t(NP)_i to be too worried about our performance.
(4) What_i was Diego worried about t(WH)_i?

Of the four categories, PRO and pro are ‘empty’ in the most literal sense, in that there really is nothing at the position where they are posited to exist syntactically. Notice that, despite that, we easily understand that the subject of the infinitive to be in (1) is Pete and the subject of the tensed verb es in (2) is María. Wh-trace is also quite experientially salient in that, to take the example in (4), it involves leaving a preposition ‘stranded’ without its complement (what, presumably ‘moved’ to initial position). This is an area where psycholinguistic questions leap to the eye. The first one is: are PRO and pro syntactic or semantic? We know that grammar theories have traditionally differed on that. Another question involves just pro.
Consider (5) now, from de Vincenzi (1991):

(5) Ha chiamato Gianni
    a. pro has called Gianni ‘he/she/it has called Gianni’
    b. t_i has called Gianni_i ‘Gianni has called’

In (5) there is ambiguity as to who did the calling. In interpretation (5a) the subject is a context-linked full pro, like any explicit pronoun (he, she, etc.; say, Mary_i was really impatient; pro_i (she) has finally called Gianni; so Mary did the calling). In interpretation (5b) a sort of trace takes the position of the subject Gianni, which, being new information, has presumably moved to a postverbal position in the overall structure (so Gianni himself made the call).2 Even if we disregard the particular agenda of the Government & Binding grammar of the time, if we just assume that Italian is descriptively SVO enough, we may ask ourselves: is movement costly? That is, is the interpretation in (5b) harder to process than that in (5a) simply because in (5a) nothing is displaced, whereas in (5b) Gianni has moved from its ‘typical’ preverbal position to a postverbal one, leaving something like a trace behind? Long ago Chomsky (1995) voiced an opinion like that. So did de Vincenzi herself. If we were to find an effect of movement (meaning that movement is costly), we could further ask ourselves whether that cost could be ameliorated in a language with case. German has case. Are NOM-marked + ACC-marked strings processed more easily than ACC-marked + NOM-marked ones in that language? Additionally, in many languages syncretism often makes phrases morphologically ambiguous between the nominative and the accusative cases. This often results in syntactic ambiguity, since sentences containing such segments can be interpreted in relation to either a canonical word order or a ‘scrambled’ one. It is surely interesting to see what the minds of the speakers of those languages do in those cases. Various current models of sentence processing appeal to preferred language-specific (and even construction-specific) word orders (Levy 2008a; Vasishth et al. 2010).

Consider, finally, raising versus control, as exemplified in (6) and (7) below:

(6) John seems PRO to be happy
(7) John wants PRO to be happy3

In a raising structure of the kind in (6) the surface subject makes less sense beside the matrix verb because it is not really an argument of that verb but of the lower one. In (6) John ‘doesn’t do the seeming’; rather, it seems that John is happy. In a control structure like (7) that is not the case, as John really wants something (so it is an argument of that verb, as well as of the lower one).
That is why an expletive subject is impossible in that position: *it wants that John is happy. This is another way of saying that John in (6) might somehow originate in the lower clause before appearing as subject of the matrix one. Not so in (7), where John is in its lawful position as thematic subject of the matrix verb. Surely an interesting research agenda is to study whether the mind treats raising structures (involving something like real or ‘metaphorical’ movement) differently from control ones. Generative linguists differ on that, with some believing that raising and control are really the same thing and others insisting that they are different because of their ‘derivational history’ (Jackendoff & Culicover 2003). Finding (or failing to find) some kind of psychological evidence of derivational history is surely worthwhile.

Over the past forty years or so, some linguists and many psychologists who take linguistic theory seriously have made real strides in understanding language processing. These results can be reflected back to inform us about how processing obscures our view of the formal envelope of the grammar, that is, how the demands that memory, attention and other cognitive systems place on real-time processing distort the natural shape of the grammar as we observe it in use. A working linguist can take these lessons on board and come away with a sharper and more nuanced understanding of the grammar’s relationship with the wider cognitive system it finds itself in.

This should in no way be taken to mean that the really ‘cool stuff’ lies in the experiments. Far from it: one of the major strengths of linguistics is that much of it relies on relatively cheap, clear-cut and robust evidence in the form of native speaker judgements of acceptability, which allows rapid cycles of hypothesis and test and provides the mental space to think theoretically. The psycholinguist with an interest in real-time processing does not have that luxury. Typically, the data of psycholinguistics are more expensive, time-consuming and noisy, require statistical analysis even to discern when a data pattern is real, and offer far fewer opportunities for replication. This leads psycholinguists to spend an enormous amount of their time thinking about experimental design and statistics and much less on theoretical issues. So linguists can, and probably should, learn from psycholinguistics, but that need not change the way they do their job. Virtue surely lies in knowing what the neighbour is doing and using that to improve oneself, not in launching or suffering a hostile takeover. Mutual enlightenment is the language spoken here.

A major aim of this book is to cover the main debates and achievements of sentence-level psycholinguistics in the past 40 years.
It reviews a very large number of experiments on the processing of morphosyntactic structure (more than sixty in Chapter 2 alone), and in so doing it aims to reveal how this kind of work is done, how it relates to classic notions of linguistics, (importantly) what kinds of new questions it illuminates, and how it enriches us as scientists who love language and are continuously mystified by its ‘existence in the mind’. It is directed both to linguists with little knowledge of psycholinguistics and to experimental psychologists with a limited knowledge of grammar. Importantly, it is also directed to a growing body of young researchers who are now being trained in linguistics and the psychology of language from very early on in their academic careers (psycholinguists). Since psycho-grammar is discussed here, there may be times when particular topics are of greater interest to one group of readers (the psychologists) or the other (the grammarians) and when, accordingly, a particular group may struggle more or less given its academic background. Care has been taken to explain basic notions of grammar and of experimental research for readers uninitiated in either of these two broad domains. Additionally, the beginning of each chapter will try to signpost the main points for those groups so they know exactly what to expect. Whatever the case, though, this book is simply a psycholinguistic approach to the mystery of grammar, and that is likely to interest a large portion of the population of cognitive scientists, regardless of their particular theoretical persuasions.


1.2 The functional architecture of the linguistic mind

The ultimate goal of a psycholinguistic agenda that looks into the processing of syntactic structure is to understand the cognitive processes that take place during language use and to shed light on the functional architecture of the linguistic mind. Most studies involve syntactic comprehension rather than production because experimental control is much easier in the former than in the latter, and it can be much more precise. This is because in production we are almost inevitably condemned to see only the end result of message generation, missing the all-important stage (or stages) taking place in the ideational domain before actual linguistic output.

With the functional architecture of the linguistic mind in focus, we seek to comprehend how mental representations are formed, changed and stored. We also seek to establish what kinds of information are used and, specifically, whether listeners/readers compute syntactic structure during language comprehension. We may very well assume they do, but, on the one hand, there are theoreticians who believe that you do not really need a map to get from one place to another if you can count on good informative signposts along the way (meaning, if your lexicon is really informationally rich); on the other hand, even if we want to proceed from the assumption that there really is syntax in the mind, we surely want to know how we put it to use. Most importantly, we want to know when we do so.

One side of this syntax-in-the-mind agenda is the venerable question of syntactic autonomy: is syntax cognitively autonomous? Another, equally venerable, is the question of syntactic supremacy or syntactocentrism: is syntax cognitively privileged? This involves pitting presumably syntactic operations against a-syntactic forces and seeing which one prevails. In psycholinguistics, prevailing means observing either of two things: (a) which one comes first in the time record, or (b) which one interferes with the other. If syntactic effects are registered first in the millisecond-by-millisecond time course of processing, then autonomy presents itself as a viable theoretical option. If, on the contrary, they are not, or if they are affected by a conceptual structure manipulation (say, animacy or context fit), a lexical dimension (say, lexical frequency) or a purely statistical bias, then autonomy becomes much less viable.

The previous comments boil down to whether the functional architecture of the linguistic mind is modular or interactive, serial or parallel (or both), bottom-up or top-down (or both). One clearly identifiable research tradition defends the classic Fodorian view that linguistic computations are automatic, domain-specific, informationally encapsulated and cognitively impenetrable. As is well known, this view equates linguistic computation with syntactic computation. Everything else that many other cognitive scientists assume is also language (notably, lexical properties and semantics) need not really belong to an impregnable module. Another equally clearly identifiable research framework maintains that the language faculty is essentially interactive, a complex, goal-directed, predictive, dynamical system consisting of a fairly large number of interacting constraints and at least a few parallel generative engines. This framework rejects the idea that the initial spark to get the whole system going must necessarily be syntax. These have been burning issues in psycholinguistics for at least forty years now, but with the development of sophisticated online experimental methods (such as eye-tracking and various brain scanning tools) and recent developments in probabilistic predictive models, the classic Fodorian view has become harder still to defend (but see Ferreira & Nye 2018). The philosophies, the arguments and the debates in play are well known because they are the same in linguistics, and so are the players: syntactic structure (say, c-command, islands, binding requirements, bounding, etc.), lexical knowledge (say, a verb’s meaning, a verb’s preferred argument structure, a word’s frequency, etc.), conceptual structure (say, animacy, countability, plausibility, topicality, context fit, etc.), and statistical biases. An example from the early days of psycholinguistic research will suffice to illustrate how all these issues are addressed in the psycholinguistic literature (mostly by non-linguists).

1.3 Minimal attachment

Modern psycholinguistic research on language comprehension started after the foundational work by Chomsky and Miller in the 1960s and the famed Derivational Theory of Complexity ‘episode’ of the late 1960s and early 1970s, and right after the work of Tom Bever on parsing principles (Fodor et al. 1974: 320–328; Bever 1970; see Phillips 2013 for a modern reevaluation, and Chapter 5). It may well be said to have started with Lyn Frazier’s 1978 doctoral thesis, On Comprehending Sentences: Syntactic Parsing Strategies, and the work directly spawned by it (Frazier 1978; Frazier & Fodor 1978). Frazier proposed two overarching principles to account for language processing in all languages of the world: minimal attachment and late closure. These were principles of a formal nature, by which was meant that semantics and/or the lexicon were not supposed to play a part in the initial stages of syntactic computation (remember: the mind ‘sees’ syntax first, then all the rest, including ongoing context). Let us briefly focus on the former principle here.

Minimal attachment (henceforth MA) meant that the parser prefers to postulate (expect and/or generate) syntactic structures that contain fewer nodes in the tree over competing structures whose generation involves more nodes. In the classic example in (8) below, the unfolding structure is compatible with two kinds of continuations, (9) and (10):

(8) Amanda believed the senator …
(9) Amanda believed the senator steadfastly
(10) Amanda believed the senator was guilty

The continuation in (10), however, involves an extra node in the tree to make room for the complement clause the senator was guilty (the CC or Complement Clause node in Figure 1.1b). It is therefore hypothesized to be cognitively dispreferred.

FIGURE 1.1 Minimal (a) versus non-minimal attachment (b). [Tree diagrams not reproduced; they contain VP, V and NP nodes, with an additional CC node in (b).]
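Minimal attachment can be stated as a simple comparison of node counts. The Python sketch below is a minimal illustration under stated assumptions: the two partial analyses of (8), at the point where ‘the senator’ has just been read, are hand-built and simplified (nested tuples, flat NPs) and only borrow the node labels of Figure 1.1 (VP, V, NP, CC); they are not the book’s exact diagrams, and `count_nodes` and `minimal_attachment` are hypothetical helper names.

```python
from typing import Dict, Tuple, Union

# A tree is either a word (str) or a tuple: (label, child1, child2, ...).
Tree = Union[str, Tuple]


def count_nodes(tree: Tree) -> int:
    """Count labelled (nonterminal) nodes; words themselves are not counted."""
    if isinstance(tree, str):
        return 0
    _label, *children = tree
    return 1 + sum(count_nodes(child) for child in children)


def minimal_attachment(candidates: Dict[str, Tree]) -> str:
    """Return the name of the candidate analysis with the fewest nodes.

    Ties go to the first candidate listed (min() keeps the first minimum).
    """
    return min(candidates, key=lambda name: count_nodes(candidates[name]))


# Hypothetical, simplified partial analyses of "Amanda believed the senator ..."
CANDIDATES = {
    # (9)-type continuation: "the senator" is the direct object of "believed".
    "direct object": ("VP", ("V", "believed"), ("NP", "the", "senator")),
    # (10)-type continuation: "the senator" is the subject of a complement
    # clause, which needs an extra CC node.
    "complement clause subject": (
        "VP",
        ("V", "believed"),
        ("CC", ("NP", "the", "senator")),
    ),
}

if __name__ == "__main__":
    for name, tree in CANDIDATES.items():
        print(name, count_nodes(tree))
    print("MA prefers:", minimal_attachment(CANDIDATES))
```

Under these assumptions the direct-object analysis has three nodes and the complement-clause analysis four, so the sketch picks the former, as MA predicts. A real parser would have to generate the candidate trees itself; here they are simply given.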

At this stage we need to understand two things: first, how we come to know whether either interpretation is preferred; second, why we need to care so much about ambiguous strings. Starting with the former point, it is time to introduce the technological tool of eye-tracking. An eye-tracker is a device that measures eye gaze relative to each and every character of a text, millisecond by millisecond. Usually, only the recordings of the right eye are analyzed. An eye-tracker can give us amazing precision on the reading process, but the most important measures are the following:

● First-pass time or gaze duration: the sum of all the fixations in an area before the eyes move to another area to the left or to the right.
● Go-past time or regression path duration: the time spent from the moment an area is first fixated until the eyes move to another area to its right. This also includes the time spent re-reading earlier areas of the sentence.
● Total time: the sum of all fixations in an area, including the fixations made on it during first-pass and go-past time and any later re-fixations.
● First-pass regressions out: regressions (leftward movements) made out of an area into earlier regions before the eyes move to its right.
● Regressions in: regressions into a particular area during the whole trial.
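The measures listed above can be made precise as functions over a chronological record of fixations. The Python sketch below follows the definitions given in the text; it assumes each fixation has already been assigned to a numbered region (0 = leftmost), and the `Fixation` record, the function names and the example data are all hypothetical. Conventions differ across labs (for instance, whether a region skipped on first pass counts as zero or as missing data, or whether regressions out are reported as a proportion of trials); this sketch returns `None` for skipped regions and a per-trial boolean for regressions out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    region: int    # index of the region fixated (0 = leftmost region)
    duration: int  # fixation duration in milliseconds


def _first_pass_start(fixations: List[Fixation], region: int) -> Optional[int]:
    """Index of the first fixation on `region`, or None if the region was
    skipped on first pass (a region to its right was fixated first)."""
    for i, fix in enumerate(fixations):
        if fix.region == region:
            return i
        if fix.region > region:
            return None
    return None


def first_pass_time(fixations, region):
    """Sum of fixations on `region` from first entry until the eyes leave it."""
    start = _first_pass_start(fixations, region)
    if start is None:
        return None
    total = 0
    for fix in fixations[start:]:
        if fix.region != region:
            break
        total += fix.duration
    return total


def go_past_time(fixations, region):
    """Sum of all fixations from first entry into `region` until the eyes move
    to a region on its right, including re-reading of earlier regions."""
    start = _first_pass_start(fixations, region)
    if start is None:
        return None
    total = 0
    for fix in fixations[start:]:
        if fix.region > region:
            break
        total += fix.duration
    return total


def total_time(fixations, region):
    """Sum of every fixation on `region` during the whole trial."""
    return sum(fix.duration for fix in fixations if fix.region == region)


def first_pass_regression_out(fixations, region):
    """True if the eyes first leave `region` leftwards (a regression)."""
    start = _first_pass_start(fixations, region)
    if start is None:
        return None
    for fix in fixations[start:]:
        if fix.region != region:
            return fix.region < region
    return False  # the trial ended while the eyes were still in the region


def regressions_in(fixations, region):
    """Number of leftward saccades that land in `region` during the trial."""
    return sum(
        1
        for prev, cur in zip(fixations, fixations[1:])
        if cur.region == region and prev.region > region
    )


# Hypothetical fixation record (invented for illustration) for a sentence split
# into regions 0-3, with a regression from region 2 back to region 1.
EXAMPLE = [
    Fixation(0, 200), Fixation(1, 250), Fixation(2, 300),
    Fixation(1, 180), Fixation(2, 220), Fixation(3, 260),
]
```

On the invented record `EXAMPLE`, region 2 gets a first-pass time of 300 ms but a go-past time of 700 ms, because the eyes regress to region 1 and re-read region 2 before moving on to region 3; its total time is 520 ms, and region 1 receives one regression in. A gap of this kind between first-pass and go-past time is the sort of signature associated with reanalysis in the studies discussed in this section.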

These measures reflect the fact that, when reading, the eyes do not travel uniformly along a straight line but make little jumps instead (called saccades). According to Carreiras and Clifton (2004: 4),

the eyes fixate on a word for something like a quarter of a second to identify it. About 90% of reading time is spent in fixations, including some regressions to an earlier misperceived word. The typical reader makes about three to four saccadic movements per second. Each movement lasts between 20 to 40 ms, and the eyes remain fixated for about 200 to 400 ms.

Following foundational work by Just and Carpenter (1980; see Pickering et al. 2004), it is assumed that if the eyes move quickly past a region of analysis, that region presents no particular computational challenge compared with a similarly sized region that elicits longer reading times and, especially, more regressive saccades. Ease of reading is equated with preferred structure. If the eyes need not return to a previous region, it is assumed that the first-pass journey was sufficient to understand the syntactic structure. If, by contrast, the eyes stop and go back to a previous segment (we care about which segment in particular, for it will surely be a disambiguating one), we assume that the parser is now pursuing a repair strategy, having abandoned the initially preferred analysis. Let us go back to (9) and (10) and add an unambiguous control:

(9) Amanda believed the senator steadfastly (MA)
(10) Amanda believed the senator was guilty (non-MA)
(11) Amanda believed that the senator was guilty (unambiguous control)

Frazier and Rayner (1982) recorded people’s eye movements while they read sentences like these and found that the average reading time (henceforth RT) per character was longer for the likes of (10) than for the likes of (9). There were also more regressions in (10), and these were associated with the disambiguating region (was guilty in (10)) as compared with the same region in the unambiguous (11). The authors showed that the MA preference exerted itself surprisingly quickly, even in the first fixation duration of the disambiguating region.
MA can be so powerful as a heuristic that rather severe garden paths (i.e., moments of partial or total breakdown in the comprehension process) often appear to result from its blind application, as in (13) versus (12) below, taken from Frazier and Clifton (1996: 11). That is presumably because, on the intended reading of (13), an additional node is needed to accommodate the relative clause before the object complement node shows up.4 The effects were visible not only in RTs but also in offline grammaticality judgements:

(12) The teacher told the children the ghost story that she knew would frighten them.
(13) The teacher told the children the ghost story had frightened that it wasn’t true.

Now, if your research agenda is to prove a two-stage model of language comprehension, resting on the view that syntactic and semantic processing involve
different sets of processes (autonomy), and that the brain prioritizes a syntactic cycle of analysis (syntactocentrism), the previous results come in handy. However, these results only spur scientists with a different research bias to come up with an alternative explanation. In order to visualize that explanation, consider (14) first:

(14) The horse raced past the barn fell.

(14), due to Bever (1970: 316), is surely the most famous sentence in the history of psycholinguistics (on a par with Colorless green ideas sleep furiously in the world of linguistics).5 This is because it induces a major garden path in everyone who reads it. The sentence starts with an NP designating an animal capable of motion, followed by a verb of motion and a PP complement with ideal motion-oriented possibilities. When a second verb comes (fell), we all experience the difficulty of having to find a place for it in a structure that we had felt to be already complete. It does not take much imagination to predict an eye-tracking crisis (with the eyes frantically going backwards in an attempt to salvage an interpretation) when reading such strings. What strings in particular? Well, this is the so-called main verb/reduced relative ambiguity. Indeed, (14) can mean ‘the horse that was raced past the barn fell’, and, on this reduced relative interpretation, everything makes sense (compare the horse ridden past the barn fell, the horse that was raced past the barn fell or the horse racing past the barn fell, all suggested in Bever 2009). The main verb versus reduced relative ambiguity was soon invoked by MA proponents in support of the principle. Indeed, the main verb interpretation (involving fewer nodes) is the one we unerringly prefer for (14), and it actually takes mental effort to undo it. Based on the previous considerations, we would surely agree that in (15):

(15) The witness examined by the lawyer was useless.
we should experience a (minor) garden path, following our bias to understand the unfolding structure up to the by phrase as a ‘minimal’ active structure (as in, say, the witness examined the photographs of the scene). Ferreira and Clifton (1986) reasoned that contextual or plausibility manipulations should have no effect on a first pass of processing when dealing with these structures. This means that in (16), which is biased towards a passive reading, we should also experience the same kind of minor garden path, despite the bias:

(16) The evidence examined by the lawyer was useless.

That is precisely what they found: the same reading difficulty for both sentences, a fact that they used to argue that plausibility/contextual information could be used only during reanalysis, after the workings of MA. After a series of experiments that seemed to confirm this, Trueswell et al. (1994) came up with counterevidence. This took the form of what would later become an
habitual pattern: Trueswell et al. reexamined the Ferreira and Clifton structures closely and found that they contained confounds that contaminated the experimental comparisons at issue. For instance, many of the examples that had been used in the ‘helpful context’ condition (e.g., 16) admitted other interpretations, thus invalidating the presumed contextual benefit. When they improved the materials and data analyses, they could show that plausible context/world knowledge did eliminate most of the difficulty associated with the passive, reduced relative interpretation. MacDonald et al. (1994) reexamined previous results further and managed to show that contextual plausibility was not the only a-syntactic factor intervening in syntactic ambiguity resolution. They studied the lexical frequency of the verbs actually used in past experiments and found an interesting correlation: the verbs that tended to appear in sentences preferentially interpreted as matrix sentences appeared more often in corpora as main verbs in simple transitive structures, whereas the verbs that appeared in sentences preferentially construed as reduced passive relatives appeared in corpora more often in the passive form (for instance, a sentence that starts with the sofa scratched … is likely to induce a minor garden path because the verb scratch is rarely used in the passive, and sofas do not usually scratch anything but are scratched themselves; Trueswell 1996). This underscored the role of probabilistic information in parsing (Hale 2001; Levy 2008a; Gennari & MacDonald 2009). In this connection, MacDonald et al. also made the point that the preferred argument structure of a verb should be taken into account.
Finally, they were bold enough to include a short section at the end of their paper entitled ‘Parsing without a Parser’, a clear pointer to the idea that most (probably not all, they conceded) apparent syntactic ambiguity resolution is, in fact, the consequence of lexical ambiguity. So, in the lexicalist model they proposed, far from being privileged, syntax would be a secondary player at most, in humble combination with many other constraints. Thus an initial hypothesis of a purely elegant, formal nature (MA), which looked all too promising and perhaps all too good, soon had to face the test of semantic/pragmatic/lexical interfacing. In the course of this battle of views, a richer knowledge of the psychological reality of language was nicely revealed.
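MacDonald et al.’s frequency argument, and the probabilistic approaches it is linked to (Hale 2001; Levy 2008a), can be illustrated with surprisal, -log2 P, the quantity at the core of those proposals. The Python sketch below is only a toy: it assumes, as a deliberate simplification, that the probability of the reduced relative reading can be estimated from how often a verb form is used as a passive participle rather than as a main verb, ignoring context, animacy and plausibility. The verb names and counts are hypothetical, invented for illustration, and are not corpus data.

```python
import math

# HYPOTHETICAL counts, invented purely for illustration (not corpus data):
# how often each verb form occurs as a simple-past main verb versus as a
# passive participle (the form a reduced relative clause needs).
HYPOTHETICAL_COUNTS = {
    "verb_a": {"main_verb": 90, "passive_participle": 10},
    "verb_b": {"main_verb": 40, "passive_participle": 60},
}


def participle_bias(counts):
    """Relative frequency of the passive-participle use of a verb form."""
    total = counts["main_verb"] + counts["passive_participle"]
    return counts["passive_participle"] / total


def surprisal_bits(p):
    """Surprisal of an outcome with probability p, in bits: -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    for verb, counts in HYPOTHETICAL_COUNTS.items():
        p = participle_bias(counts)
        print(f"{verb}: P(reduced relative) = {p:.2f}, "
              f"surprisal = {surprisal_bits(p):.2f} bits")
```

With these invented counts, the reduced relative reading of verb_a carries about 3.32 bits of surprisal and that of verb_b about 0.74 bits, mirroring the correlation MacDonald et al. described: verbs rarely used in the passive make the reduced relative harder to reach.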

1.4 Ambiguity resolution and beyond: the oracle component of parsing

As can be seen, the investigation of processing biases rests principally on a timing issue and on an interference agenda. If you postulate that something (syntax) takes precedence, then finding that a conceptual structure manipulation is effective at the very earliest stage of processing is a problem. If you postulate informational encapsulation, then a conceptual or lexical cue interfering with a presumably syntactic choice is a problem too. It is useful to compare this research agenda with a doctor administering a dose of a medicine to a body and seeing how the body reacts. Say you have an ambiguous structure relatable to a
phrase structure tree defined in reference to MA, and you inject a dose of plausibility into it (thus forcing a reading of the string contrary to the MA tree). If there is a reaction in the body, then MA is not all there is to its ordinary functioning. You then need to keep injecting all kinds of other doses (of either a formal or a functional/semantic nature) to see what makes the body react (admittedly, this is a rather strange metaphor). As can also be seen, much (though by no means all) of this research agenda also rests upon the existence of structural ambiguities that experimental researchers can manipulate and measure. The manipulation is typically a disambiguating one. The basic procedure is to compare the RT patterns of disambiguated versions of an ambiguous string. The disambiguation that takes less time (= less cognitive load) is assumed to reflect the mind’s processing bias. Thus, in the ha chiamato Gianni structure of (5) above, for instance, we need to disambiguate towards a reading where Gianni is the caller (the displaced subject version) and towards another in which Gianni is the callee (the in-situ version, with Gianni as object). If, for instance, the second one takes less time, then we conclude that movement of the subject across the verb is indeed dispreferred: that movement is costly. An important prerequisite to this agenda is the existence of ambiguities that we can disambiguate. So how ambiguous is language? Very, it turns out. In fact, this is quite often exploited in humorous contexts, as when Groucho Marx used a well-known modifier attachment ambiguity: “I shot an elephant in my pajamas. How he got into my pajamas I don’t know” (from the movie Animal Crackers [1930]). Careless headlines also often contain ambiguities that later become a kind of joke: Prostitutes appeal to pope; Soviet virgin short of goal again. In 2006 a CNN headline went “Leahy Wants FBI to Help Corrupt Iraqi Police Force”.
The following is a long list of ambiguous structures of various kinds that populate the Internet, many of which became popular after they first appeared in Steven Pinker’s famous book The Language Instinct. I predict you are going to experience numerous garden paths right now, some of them of colossal proportions, but bear in mind that all these strings are perfectly grammatical once you manage to pull up the right syntactic tree in your mind:

(17a) The wolf was shot by the angry farmer with a scar.
(17b) The wolf was shot by the angry farmer with a rifle.
(18a) They told us they were going to do it tomorrow!
(18b) They told us they were going to do it yesterday!
(19) The old man the boat.
(20) The lady who sews dresses beautifully.
(21) I convinced them pets are dirty.
(22) Fat people eat accumulates.
(23) The prime number few.
(24) While mom was dressing the baby wouldn’t stop crying!
(25) A friend that I had really loved musicals.
(26) Until the police arrest the drug dealers control the street.
(27) They painted the house with cracks.
(28) Tonight we will discuss sex with Dick Cavett.
(29) The cotton clothing is usually made of grows in Mississippi.
(30) My mum gave the lady a dog bit an antiseptic.
(31) If you always jog two miles and a half is easy.
(32) Vegetarians don’t know how good meat tastes.
(33) I’ll tell you when they arrive.
(34) She wants to marry a Norwegian who is rich.

Many of these structures are permanently ambiguous, like other famous instances such as cleaning ladies can be delightful and visiting relatives can be boring (eating pizza with a friend versus eating pizza with a fork, etc.; see Altmann 1998). Others, however, are only temporarily so, such as the one we have already discussed in examples (15)–(16) above: the witness examined …. Take another one, first discussed by Bever (1970) and then in The Language Instinct, nicely explained by Pinker himself (1994: 208):

(32) The plastic pencil marks …

(…), the parser has to keep several options open: it can be a four-word noun phrase, as in The plastic pencil marks were ugly, or a three-word noun phrase plus a verb, as in The plastic pencil marks easily. In fact, even the first two words, The plastic, are temporarily ambiguous: compare The plastic rose fell with The plastic rose and fell.

Note that even the ambiguities that have a lexical origin (e.g. the prime number few, where number is used as a verb) often have cascading structural consequences, since they enforce a different syntactic structure. Structural ambiguities are not just convenient tools for us to manipulate; they embody a fundamental lesson in our quest to reveal the true nature of the human language faculty. This is because they differ from lexical ambiguity per se in a very important way. In 1979, David Swinney made a great discovery when he managed to show that ambiguous lexical items activate all of their disparate meanings in parallel, however counterintuitive that may sound. Swinney (1979) had participants in an experiment listen to sentences like (33), containing the ambiguous word bug (equibiased, out of context, towards the meaning of either insects, on the one hand, or surveillance device, on the other):

(33) Rumor had it that, for years, the government building had been plagued with problems. The man was not surprised when he found several spiders, roaches and other bugs in the corner of his room.

Contexts were either neutral or biased towards one of the interpretations of the ambiguous noun, as in (33) above, which is clearly bent towards the meaning of ‘insects’. Just when the word bugs was heard, either ant, spy, an unrelated word such as sew, or a non-word was flashed on a screen (this technique is called cross-modal priming, as it mixes two modalities of processing: the auditory and the visual). Participants were simply asked to decide whether the string of letters on the screen was a word or not (a so-called lexical decision task). Results indicated that both ant and spy were primed by bugs, that is, they were processed faster than in a baseline condition out of context, but crucially there was no priming effect for the unrelated word sew. So participants were unconsciously activating the two meanings of the word in parallel. Three syllables downstream from that word, the priming effect vanished for the meaning that was not contextually supported (so in (33) above only ant would receive the benefits of a priming effect, not spy). Lexical access has evolved to become so amazingly fast that people have been shown to activate bathing suit when hearing something like the new distance, as the cross-word segment new-dist presumably activates nudist (see Altmann 1997 for an entertaining account of lexical access). This “promiscuous” (Jackendoff 2007a: 12), trigger-happy behavior of the lexical processor is mirrored by the syntactic processor, but with a crucial difference: when processing syntactic ambiguities, the parser does not compute all of the alternative possibilities in parallel, at least not so exhaustively, but proceeds incrementally instead.
That is, syntactic representations are temporarily built on the fly, as it were, on a word-by-word or phrase-by-phrase basis, without waiting for the completion of the entire sentence structure or for a thorough search of all competing structural alternatives (Marslen-Wilson 1973).6 We know that because, as the long list of examples above shows, we constantly fall down the garden path. It was this discovery, namely that we bet on the horse past the barn all the time, that launched a research agenda that looks into how we do the betting. The agenda clearly surpasses the confines of ambiguity resolution and consists in understanding the oracle, that is, the set of principles or heuristics that the mind deploys in order to deal with a syntactic structure, something we do all the time in ordinary comprehension. In fact, since we have very strong evidence that ordinary language comprehension is an eminently predictive process (Hale 2001; Levy 2008a; Traxler 2014; Futrell et al. 2020), all structures (permanently ambiguous or not) may be said to be at least temporarily ambiguous (Altmann 2013). The oracle can be understood in two different ways: either as limiting the search space or as searching all paths but ranking them differently. MA is an early proposal of the former kind, of a typically syntactic flavour. Lexical frequency or contextual fit were soon offered as alternatives consistent with an all-paths view, with a clearly different orientation. It is important to realize that sometimes ambiguity is exactly the point at which the grammar leaves the system to choose and plays no further role. However, very often, while the grammar does not specify how ambiguity is to be resolved, it does place rather specific constraints on dependency formation, which can be exploited to investigate real-time structure
building and the size and scope of the hypotheses that the parser deploys. This will be most evident, for instance, when we examine gap-filling operations and various binding phenomena (Chapters 4 and 5). When and how does the grammar step aside to let other cognitive systems take over? Or is it grammar all the way down? As can be seen, this is a different sort of attack on the competence versus performance problem, but the battleground where models compete to provide the best fit with psychological adequacy is eminently linguistic: either form-driven biases or functional/lexical/conceptual/statistical ones are continuously invoked. What differs from linguistics is the judge, which is not usually a pattern of acceptability judgements but an RT signature. The more sophisticated the technology used to capture that signal, the more information we have to reach theoretical conclusions. The use of reaction time measures to make claims about cognitive processes (mental chronometry) goes back to Donders (1968) and Sternberg (1966, 1969), and it has held a dominant position in psycholinguistics ever since. In the remainder of this book, we will have plenty of occasions to see this battle unfold. It is hoped that both linguists with little knowledge of psycholinguistic research and experimental psychologists with no training or experience in syntactic processing (as well as young psycholinguists) will be surprised to see how ingenious researchers in the field are when it comes to designing linguistic manipulations that reveal complex aspects of the structure of the language faculty. Just as I did above with the methodological innovation known as eye-tracking, I will explain the different methodologies as we move along, as they show up naturally in the discussion of the different theoretical issues. I will very often assume a chronological order in the presentation of those issues.
This will be done in an attempt to provide a feel for the evolution of the field. It may be useful to think of the many different theories presented here as out-of-the-box attempts by linguists to understand crucial aspects of the nature of language. But, of course, many of them were not proposed by linguists themselves … This is where it becomes necessary to have an open mind and, if we make it to the end, it would also be a good idea to ask ourselves whether the contents, the issues, the structures and, especially, the agendas reviewed here are any less about the essence of language than the contents, the issues, the structures and the agendas in the linguistics journals.
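The two ways of understanding the oracle discussed in this section, limiting the search space versus searching all paths but ranking them, can be contrasted in a few lines of Python. This is a hypothetical sketch, not a model proposed in the literature: `serial_oracle` keeps only the structurally simplest analysis (in the spirit of MA), `ranked_oracle` keeps every analysis ordered by a non-structural score such as frequency or contextual fit, and the node counts and scores assigned to the two readings of The evidence examined … are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Analysis:
    name: str
    nodes: int       # structural cost (e.g. number of tree nodes)
    support: float   # non-structural support (e.g. frequency or plausibility)


def serial_oracle(candidates: List[Analysis]) -> List[Analysis]:
    """Limit the search space: keep only the structurally simplest analysis."""
    return [min(candidates, key=lambda a: a.nodes)]


def ranked_oracle(candidates: List[Analysis]) -> List[Analysis]:
    """Search all paths: keep every analysis, best-supported first."""
    return sorted(candidates, key=lambda a: a.support, reverse=True)


def disambiguate(active: List[Analysis], compatible: Iterable[str]) -> List[Analysis]:
    """Drop analyses ruled out by the disambiguating input. An empty result
    means no pursued analysis survives: a garden path requiring reanalysis."""
    allowed = set(compatible)
    return [a for a in active if a.name in allowed]


# HYPOTHETICAL values for "The evidence examined ...": the main-verb reading
# has fewer nodes, but the reduced relative is far more plausible.
CANDIDATES = [
    Analysis("main verb", nodes=3, support=0.1),
    Analysis("reduced relative", nodes=5, support=0.9),
]

if __name__ == "__main__":
    # "... by the lawyer" is treated here as compatible only with the reduced relative.
    after_by_phrase = {"reduced relative"}
    print([a.name for a in disambiguate(serial_oracle(CANDIDATES), after_by_phrase)])
    print([a.name for a in disambiguate(ranked_oracle(CANDIDATES), after_by_phrase)])
```

When the by phrase arrives, the serial oracle is left with no surviving analysis (a garden path that calls for reanalysis), whereas the ranked oracle still holds the reduced relative. Real parsers of either kind are incremental and far richer than this; the sketch only isolates the difference between pruning and ranking.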

Notes

1 ‘Conceptual structure’ is used in the informal sense of ‘thought’ here and in what follows.
2 In languages with liberal word order, new information is usually placed towards the end of the sentence. This is known as ‘end-focus’ (Lambrecht 1994). For instance, in Spanish, if you are asked ‘who gave you that necklace, it was Tom, wasn’t it?’ (¿Quién te dio ese collar; fue Toño, no?), you may reply ‘No, me lo dio Juan’ (‘No, Juan gave it to me’), where the only piece of information that is not old is the subject Juan, which gets moved to the end. In English, a tyrannical syntax of the Subject-before-Verb-before-Object type (SVO) prohibits this, so English has recourse to so-called marked focus: No, JOHN did. Marked focus is expressed via elevated pitch prominence on the relevant constituent.
3 ‘PRO’ is a convention which means that in that position we understand that the subject of the infinitive ‘to be’ ought to be there but it is not. We recover that information by looking into the main clause: the expressed subject of the main clause (John) is also the understood subject of the subordinate infinitival clause, so it is dropped from this second clause for economy. ‘Little pro’ is the missing subject of a tensed clause, as in the example mentioned above, Mary_i was really impatient; pro_i (she) has finally called Gianni. See Chapter 4.
4 That is, in (13) we need to invoke a relative ‘marker’ such as that or whom (the children whom the ghost story had frightened) before we can comprehend the object of told, namely that it wasn’t true. So we need to suppress the reading the teacher told the children the ghost story.
5 At the time of writing, a Google search of the sentence returned 528,000 hits.
6 Traditionally, less incrementality is assumed in production, where whole sentoids reflecting the basic who-did-what-to-whom dynamics are often contemplated as viable encoding units (Bock 1982). See Ferreira and Engelhardt (2006) for a qualification of that view that rests on the notion of lexical retrieval.

2 MODIFIER ADJUNCTION WITH SPECIAL REFERENCE TO RELATIVE CLAUSES INSIDE COMPLEX NPS

DOI: 10.4324/9781003405634-2

2.1 Introduction: local adjunction

One of the most basic properties of language is its open-endedness or creativity, that is, the fact that we humans can compose new messages of seemingly infinite length and complexity. The following sentence illustrates this:

(1) The shocked individual who was contemplating it all with a strange-looking expression on her face that made us all quiver finally said nothing at all right at the time when something would have been so great.

(1) is just one sentence and, at the top level, from a structural point of view, it is in fact easy to understand, as it revolves around a simple ‘saying’ predication. By virtue of their lexical meaning, ‘saying’ predications involve only: (a) an argument doing the saying, typically in subject position; (b) the predicate; and (c) an argument denoting the theme (whatever is being said), typically placed after the predicate in English. This is exactly what we have in (1). Yet (1) feels very different from (2), in complexity if nothing else:

(2) Tom said nothing.

The reason is straightforward: while in (2) each of the three necessary constituents is maximally simple (being instantiated by just one word each), in (1) two of the three constituents (the subject and the object) are far from simple. Keeping only to the subject phrase, this has 21 words entering various interrelationships among themselves. The word ‘shocked’ impacts ‘individual who was contemplating it all with a strange-looking expression on her face that made us all quiver’. The stretch ‘who was contemplating it all with a strange-looking
expression on her face that made us all quiver’ impacts ‘individual’. The stretch ‘with a strange-looking expression on her face that made us all quiver’ impacts ‘contemplating’; the compound segment ‘strange-looking’ impacts ‘expression’, and so do ‘on her face’ and ‘that made us all quiver’. All that impacting is adjunction, the structure-building operation that makes language and/or thought so richly expressive. In formal grammars, adjunction is seen as the basic mechanism that attaches a constituent, B, to another constituent, A, and produces yet another constituent, A’, that is, one that does not alter the valence of the host. This makes node reiteration possible. Thus, the merging of a verb and its direct object creates a VP, but the adjunction of an adverbial to that VP simply produces a reiteration of the VP node; and so does the adjunction of yet another adverbial, and of yet another one, etc. So merging a predicate and its obligatory arguments involves structures like (2), but we all know that ordinary language use is much more than that, as (1) shows. The basic intuitive insight is that adjunct expressions are somewhat loosely connected to their heads, which explains why they are typically optional and cling to maximal projections.1 In the formal literature this operation is often called Chomsky-adjunction (to distinguish it from Joshi-adjunction in Tree-Adjoining Grammar; Joshi et al. 1975). Even if adjunction originates in conceptual structure, there can be little doubt that once it is expressed linguistically, it must be subject to some kind of formal requirement(s), as linguistic messages are necessarily linearized. That is, we cannot output all of the adjuncts in the subject phrase in (1) above at once, in a kind of abstract formless mass that hovers around the noun ‘individual’.
In Latin, we could – somewhat metaphorically – do something like that, as each new adjunct would come with a tag (case) pointing to where the adjunct in question is to do the impacting. In the English message in (1), however, what we observe is that ‘shocked’ adjoins to a local string ‘individual who was … ’; that the string ‘who was contemplating …’ adjoins to local ‘individual’ as well; that ‘strange-looking’ adjoins to local ‘expression’; and that ‘on her face’ does the same. Notice too that even though locality seems to be the preferred mode of structure-building in (1), it cannot be all. The string ‘with a strange-looking expression on her face that made us all quiver’ adjoins to ‘contemplating’ but is not adjacent to it. Likewise, ‘that made us all quiver’ is right next to ‘face’ but it impacts ‘expression’ instead (what makes us quiver is the expression on the face, not the face itself), at least on the most likely interpretation. Locality has usually been invoked by so-called principle-grounded theoreticians as a major constitutive player in grammar. The term itself figures in well-known places in the generative tradition, with locality effects and laws frequently populating the discussions in the field. Interestingly, it does not figure at all in rival theories, such as the Cognitive paradigm or Construction Grammar. Even if only via a loose kind of associationism, this might lead one to view it as the classic feature that belongs in an encapsulated conception of grammar (and it is often viewed in that way). In the world of formal grammars, locality occupies
a privileged position alongside the notion of adjunction. Intuitively, it is easy to see why. Take the message in (3):

(3) The lawyers will read the documents your accountant and his team sent them tomorrow.

Readers typically experience difficulty when reading sentences like this because they cannot refrain from adjoining the adverbial expression ‘tomorrow’ locally to ‘sent’, instead of non-locally to ‘will read’, relative to which it makes much more sense. And this is the issue: since ‘sense’ is fooled in the interpretation of the likes of (3), and the fooling is being done by a locality bias, it is tempting to conclude that that locality bias is not ‘sense’. If it is not ‘sense’, then it can only be ‘encapsulated grammar’. I find this association/appropriation strange since, after all, a predator looking at two possible prey, A and B, would surely settle on attacking the more proximal one, other things being equal (there is of course nothing modularly linguistic in that operation; Acuña-Fariña 2016). Be that as it may, locality-driven garden paths are quite common in normal language understanding. In the introduction, we mentioned the famous I shot an elephant in my pajamas. How he got into my pajamas I don’t know, whose humorous effect arises only because linking ‘in my pajamas’ locally to ‘an elephant’ is tempting. The ambiguities in (4)–(6) were already mentioned in the introduction as well; (7) is a version of a classic in the literature on sentence processing. They all rest on the working of an apparent reflex to assemble phrasal packages locally:

(4a) The wolf was shot by the angry farmer with a scar.
(4b) The wolf was shot by the angry farmer with a rifle.
(5) They painted the house with cracks.
(6) Tonight we will discuss sex with Dick Cavett.
(7) I saw the man with the binoculars.

The inescapable prevalence of a locality bias in language was soon captured by the earliest researchers interested in language processing.
Essentially, this is Kimball’s (1973) Principle 2, known as the principle of Right Association (“Terminal symbols optimally associate to the lowest nonterminal node”), invoked “to explain the frequently observed fact that sentences of natural language organize themselves generally into right-branching structures” (p. 24) rather than into more complex left-branching and centre-embedding ones. Among the examples mentioned by the American scholar was (8), where ‘out’, hanging loose at the end of the structure, is forced to relate inappropriately either to ‘take’ or to the even more local ‘New York’ phrase, instead of to distant ‘figured’ (as in figure out):

(8) Joe figured that Susan wanted to take the train to New York out.

Modifier adjunction with special reference


Writing in the wake of Tom Bever’s (1970) ground-breaking research on processing heuristics (Chapter 5), Kimball also treated Right Association as a kind of syntactically based heuristic principle, one that could interact with other factors such as semantic information or the lexical specification of argument structure (which he conceived of as “outside information influencing parsing”). Right Association was meant to enter the scene if “no outside effects are relevant” (pp. 29–29).

2.2 Relative clause adjunction

Prepositional phrases (PPs) and relative clauses (RCs) are the two classic types of adjuncts with the greatest tree-growth (and, therefore, complexity) potential. Unlike adjective phrases or adverbial phrases, whose growth potential is limited, both PPs and RCs greatly increase expressive variety because they easily and naturally accommodate NPs inside, and NPs have great expansion potential of their own. RCs actually accommodate not just NPs but full predications, which makes them the ideal recursive device. (9) contains three PPs; (10) three RCs; (11) a combination of both:

(9) I shot the sheriff of the town with all those ugly-looking houses of strange functionality.
(10) I used to live in the town which had received all the foreigners who had escaped from the riots that had plagued the entire region.
(11) I shot the sheriff of the town by the river which had received all those foreigners with strange looks who had escaped from the riots that had plagued the entire region during all the long years of the crisis.

The overwhelming consensus is that whereas PP adjuncts show Right Association, adjoining to the nearest host, RCs need not do so, at least cross-linguistically (Traxler et al. 1998; Phillips & Gibson 1997; Hemforth et al. 2000):

(12) They envied the servant of the actress on the balcony [the actress is on the balcony].
(13) They envied the servant of the actress who was on the balcony [either the servant or the actress may be on the balcony, but not both].

This does not mean that RCs are inherently highly ambiguous because they are insensitive to the benefits of locality. Long ago Kimball (1973) observed that (14) and (15) are unlikely to mean the same, whereas (16) and (17) could, precisely because in the former pair the RC associates with the closest host in each case:

(14) The woman that was attractive took the job.
(15) The woman took the job that was attractive.
(16) The woman that was attractive fell down.
(17) The woman fell down that was attractive.

The remainder of this chapter will be devoted to understanding how RC adjunction works, especially with structures like (13) in mind, but before that it is important to zero in further on the kind of RC we need to focus on. To date, the vast majority of work on RC adjunction has used subject relatives, as in (18), rather than object relatives, as in (19), or other kinds:

(18) The woman who kissed the Pope was thrilled.
(19) The woman who the Pope kissed was thrilled.

It is a well-known fact that subject relatives (SRCs) are easier both to produce and to understand than object relatives (ORCs; Wanner & Maratsos 1978; Zukowski 2009; Gibson & Wu 2013; Lin 2014, among many others; see Chapter 3). Various proposals have been put forward to account for this, ranging from an appeal to Keenan and Comrie’s (1977) Accessibility Theory to an appeal to the blind application of Bever’s probabilistic ‘NVN’ heuristic, that is, the bias to preferentially expect and produce NP V NP sequences as an Agent performing some kind of Action on a Patient (see Chapter 5). A relevant difference between the two kinds of RC can be captured in the following renditions of (18) and (19):

(18’) The woman who_i [GAP_i] kissed the Pope was thrilled.
(19’) The woman who_i the Pope kissed [GAP_i] was thrilled.

As the glosses suggest, RCs involve not only adjunction but also gap-filling, in the sense that relativizers like which/who/that act as instructions to locate a gap downstream from where the relativizers themselves appear in the chain of speech. This is known as filler-driven parsing (Chapter 4). That is, the moment we hear something like ‘I saw the girl who …’, ‘who’ lets us know that we need to keep the referent ‘the girl’ in mind until we find a place for it in the unfolding structure. There is no predicting where that place is going to be (Pinker 1994: 218 f.):

(20) The woman who_i [GAP_i] saw the Pope was thrilled.
(21) The woman who_i the Pope saw [GAP_i] was thrilled.
(22) The woman who_i the Pope came with [GAP_i] was thrilled.
(23) The woman who_i the Pope came here to see John betrothed with [GAP_i] was thrilled.
(24) The woman who_i the Pope wanted everyone here to understand he would not have trouble at all in asking the Vatican to give money to [GAP_i] was thrilled.

As the glosses also suggest, ORCs involve a longer filler-to-gap distance than SRCs, a fact that brings short-term memory into play. In fact, the longer distance (a cognitively costly anti-locality set-up) brings in a further set of potential complexity factors, such as an increased number of intervening discourse referents (Gibson 2000), or the degree of similarity between these other referents and the true filler (Grillo 2009; Van Dyke & McElree 2006). All this places ORCs squarely in the realm of long-distance dependencies (Wagers & Phillips 2014). This, in turn, brings notions such as similarity-based interference, cue-based retrieval and active gap-filling to the fore. We shall look into this kind of research in Chapters 4 and 5. For the remainder of this chapter, we focus on SRCs only.
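To make the locality contrast between (20) and (21)–(22) concrete, the sketch below counts the words that intervene between the relativizer and the gap in glossed strings. It is only an illustration under simplifying assumptions: the `[GAP]` token, the function name `filler_gap_distance` and the use of raw word counts are hypothetical choices made here. Word count is a crude proxy; Gibson (2000), for instance, counts intervening discourse referents rather than words.

```python
# Hypothetical sketch: filler-to-gap distance measured in intervening words.
# Assumes the gap is marked with the token "[GAP]" and that the first
# relativizer in the string is the filler; neither convention comes from the source.
RELATIVIZERS = {"who", "which", "that"}


def filler_gap_distance(sentence: str) -> int:
    """Return the number of words between the first relativizer and the gap."""
    tokens = sentence.split()
    filler = next(
        (i for i, tok in enumerate(tokens) if tok.lower() in RELATIVIZERS), None
    )
    if filler is None or "[GAP]" not in tokens:
        raise ValueError("sentence needs a relativizer and a [GAP] marker")
    gap = tokens.index("[GAP]")
    return gap - filler - 1


examples = {
    "(20) SRC": "The woman who [GAP] saw the Pope was thrilled.",
    "(21) ORC": "The woman who the Pope saw [GAP] was thrilled.",
    "(22) ORC": "The woman who the Pope came with [GAP] was thrilled.",
}
for label, sentence in examples.items():
    print(label, filler_gap_distance(sentence))  # 0, 3 and 4 words respectively
```

On this crude measure the subject relative keeps filler and gap adjacent, while each object relative forces the filler to be held across several words.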

2.3 The adjunction of subject relative clauses to complex noun phrases

2.3.1 Computationally cheap trees: The Garden Path model

We mentioned in the introduction that modern psycholinguistic research may be said to have started with the linguist Lyn Frazier’s PhD dissertation in 1978, On Comprehending Sentences: Syntactic Parsing Strategies. In that work, and also in The Sausage Machine: A New Two-Stage Parsing Model, which Frazier published with her PhD supervisor, Janet Fodor, also in 1978, it was proposed that processing systems obey two maximally simple strategies when parsing the incoming flow of speech. The first one is Minimal Attachment (MA) (loosely: preferentially expect trees with fewer nodes over trees with more nodes; see Introduction) and, when MA fails to yield a preference, Late Closure (henceforth LC):

Late Closure: If grammatically permissible, attach new items into the clause or phrase currently being processed (i.e., the phrase or clause postulated most recently).2 (Frazier 1987: 562)

LC is tantamount to Right Association, the formalization of a locality bias. These principles made up what came to be known as the Garden Path (henceforth GP) model of language comprehension (Frazier & Rayner 1982). The essence of this model is its universality and its syntactocentrism. Thus, on the one hand, the two parsing principles are supposed to work for all languages of the world, and, on the other, they are supposed to operate as a syntactic module before any other form of higher-level information is consulted by the parser. Notable candidates for the second stage of parsing are lexical and statistical biases and meaning of whatever kind (semantics, pragmatics, etc.). The appeal to a universal parser was, in fact, not new, as both Bever (1970) and Kimball (1973), among others, had already made similar claims.
Frazier (1987: 565) expressed it with unabashed bluntness: “we should be able to remove the grammar of English from our theory of sentence processing, plug in the grammar of some other language, and obtain the correct theory of processing of that language”. The GP model gained considerable traction in the 1980s. Part of the reason for this was no doubt that psychologists and linguists were eager to cooperate again after the Derivational Theory of Complexity (DTC) phase, which at the time was seen as a failure of linguistics to come up with a realistic account of language that could be tested in the laboratory (see Chapter 5). The model marshalled the evidence on syntactic ambiguity resolution accrued in the late 1960s and throughout the 1970s (basically, work on the notion of processing heuristics) to propose a reasonable idea: given that we humans often misparse syntactic structure (e.g., the horse raced past the barn fell), but even more often parse it correctly, we must have a way of eliminating unwanted analyses whose overall rate of success (that of ‘the way’, not of the ‘analyses’) is sufficiently functional. Additionally, given that we process some fifteen to seventeen phonemes per second, while simultaneously building phonological structure, syllabic structure, morphological structure, phrasal structure, semantic decomposition and referential tracking at the very least, our parsing system must actually behave like an oracle with access to the right and most trustworthy kind of information, for it clearly has no time to waste. According to Fodor and Inoue (2000), the human parser should lean towards the ideal of minimal processing and typically do the minimum amount of analysis necessary to get the parsing job done. In sum, GP theoreticians invoked computational economy and the serial, automatized modularity of syntax as the solution to the puzzle of language comprehension.3 The basic idea is that when faced with a structural ambiguity the parser treats it as if there were no ambiguity at all; for an experienced parser (and/or an animal born into this world with a genetically determined head-start in grammar) this will yield successful parsing more often than not. That is, MA and LC are applied automatically, and when they lead to an analysis that turns out to be incorrect, reprocessing becomes necessary (Frazier & Rayner 1982; Ferreira & Henderson 1991).
This explains our experience with garden path sentences: both the fact that we do experience garden paths sometimes and the fact that, even more often, we do not, because we usually bet on the right horse past the barn (period). Syntax is the gateway into the whole system, and meaning is supposed to wait until syntax has been deployed. This translates into a rather tangible set of predictions concerning processing time: interpretations consistent with Late Closure and Minimal Attachment are hypothesized to take less time than those consistent with Early Closure (EC) and Non-Minimal Attachment. Additionally, when eye-tracking techniques are employed, reprocessing efforts should manifest themselves in a pattern of regressive saccades. When it comes to LC in particular, the GP model rightly explains why in (25) ‘last night’ should stay low in the tree, attached to ‘crashed’ rather than to ‘is saying’ (Frazier 1987):

(25) The reporter is saying the plane crashed last night.
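Late Closure can be read as a simple search procedure: among the phrases still open, try the most recently postulated one first. The sketch below is a hypothetical rendering under stated assumptions; the `OpenPhrase` class, the `licenses` callback standing in for the grammar and the hand-built lists of open phrases are illustrative inventions, not part of Frazier’s formulation. It reproduces the low attachment of ‘last night’ in (25) and the LC prediction of low attachment for the RC in (13).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OpenPhrase:
    label: str  # syntactic category of the open phrase, e.g. "VP" or "NP"
    head: str   # its lexical head


def late_closure_attach(
    open_phrases: List[OpenPhrase],
    item: str,
    licenses: Callable[[OpenPhrase, str], bool],
) -> Optional[OpenPhrase]:
    """Attach `item` to the most recently postulated open phrase that can host it.

    `open_phrases` is ordered from least to most recently postulated;
    `licenses` is a stand-in for the grammar ("if grammatically permissible").
    """
    for phrase in reversed(open_phrases):
        if licenses(phrase, item):
            return phrase
    return None  # no grammatical host among the open phrases


def toy_grammar(phrase: OpenPhrase, item: str) -> bool:
    # Illustrative assumption: adverbials modify VPs, relative clauses modify NPs.
    return (item == "AdvP" and phrase.label == "VP") or (
        item == "RC" and phrase.label == "NP"
    )


# (25) The reporter is saying the plane crashed last night.
vps = [OpenPhrase("VP", "saying"), OpenPhrase("VP", "crashed")]
print(late_closure_attach(vps, "AdvP", toy_grammar).head)  # prints: crashed

# (13) They envied the servant of the actress who was on the balcony.
nps = [OpenPhrase("NP", "servant"), OpenPhrase("NP", "actress")]
print(late_closure_attach(nps, "RC", toy_grammar).head)  # prints: actress (low attachment)
```

Because the search stops at the first licensed host, the high-attachment reading is never considered at this stage; on the GP model it becomes available only through reprocessing.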

Frazier and Rayner (1982) examined the reading times for sentences like (26)–(27):

(26) Since Jay always jogs a mile and a half this seems a short distance to him.
(27) Since Jay always jogs a mile and a half seems a short distance to him.

They found that reading times (RTs) were higher for (27), presumably because the phrase ‘a mile and a half’ is initially adjoined to ‘jogs’ as its object (so the phrase containing ‘jogs’ is not closed, or is closed late); when, a little later, the verb ‘seems’ shows up, reprocessing is necessary to reinterpret ‘a mile and a half’ as the subject of ‘seems’ instead of the object of ‘jogs’. RTs thus revealed the parser’s bet, and this was consistent with LC. Although the Complex NP + RC template did not figure much in the earliest discussions by proponents of GP, it soon became clear that the model predicted that when the processor is faced with a structure like (13) above, it should automatically apply LC, which means blindly opting for proximal adjunction to ‘actress’ (also known as Low Adjunction/Attachment, or LA), instead of adjunction to the distant head of the CNP, ‘servant’ (also known as High Adjunction/Attachment, or HA; see Figure 2.1).4 Recall that we need some kind of methodological manipulation to detect the adjunction preference. In languages with rich morphology this is very easy indeed. Imagine we put masculine gender on N1 and feminine gender on N2, and an adjective in the RC that must agree with one noun or the other. We make it agree with N1 in one experimental condition and with N2 in another; if the adjective is read significantly faster in one of the conditions, then that is the preferred adjunction pattern (e.g., Spanish: el criado de la actriz que fue asesinado/a, ‘the servant of the actress who was killed’). In languages such as English, with little in the way of morphology, we need to resort to other manipulations (say, with reflexive pronouns, which do mark gender), often of a semantic or pragmatic kind (say, a phrase like who was pregnant).
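The logic of the morphological manipulation can be sketched as a small function: given the genders of N1 and N2 and the gender marked on the participle inside the RC, it returns the attachment site that agreement forces. The toy lexicon (covering only the Spanish example above) and the function name are hypothetical; in an actual study, reading times across the two conditions would then be compared to find the preferred site.

```python
# Hypothetical toy lexicon for the Spanish example above.
GENDER = {
    "criado": "masc",     # N1 'servant'
    "actriz": "fem",      # N2 'actress'
    "asesinado": "masc",  # participle agreeing with a masculine noun
    "asesinada": "fem",   # participle agreeing with a feminine noun
}


def forced_attachment(n1: str, n2: str, participle: str) -> str:
    """Return the attachment site that gender agreement forces on the RC."""
    g = GENDER[participle]
    matches_n1 = GENDER[n1] == g
    matches_n2 = GENDER[n2] == g
    if matches_n1 and not matches_n2:
        return "high (N1)"
    if matches_n2 and not matches_n1:
        return "low (N2)"
    return "ambiguous"  # both or neither noun agree: the item is uninformative


# el criado de la actriz que fue asesinado / asesinada
print(forced_attachment("criado", "actriz", "asesinado"))  # high (N1)
print(forced_attachment("criado", "actriz", "asesinada"))  # low (N2)
```

The design requires the two nouns to differ in gender; if they match, agreement cannot tell the two sites apart. Under the GP model’s LC prediction, the condition that forces ‘low (N2)’ should be the one read faster.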
[Figure 2.1: tree diagram with nodes NP, Det, NOM, N1, N2 and RC for ‘(They envied) the servant of the actress who was on the balcony’]

Before we leave the GP model, it is worth adding a refinement which its proponents saw fit to undertake in view of the first evidence that came up against it (see section 2.3.2, immediately below). From its inception, the model was criticized for leaving its second stage unspecified: in a two-stage process in which syntax comes first and then all the rest, the ‘all the rest’ part was left blank. Initially, Rayner et al. (1983) simply proposed a “thematic processor” as a mechanism that evaluated information about thematic relations activated by the words in the syntactic templates. A vague role for discourse and contextual fit was also suggested early on (Clifton & Ferreira 1989). Finally, Frazier (1990) and De Vincenzi and Job (1993) proposed the principle of relativized relevance, understood as follows (Frazier 1990: 311):

Relativized Relevance: Other things being equal (e.g., all interpretations are grammatical, informative, and appropriate to discourse), preferentially construe a phrase as being relevant to the main assertion of the current sentence.

Even at the second stage, after syntax, relativized relevance was to act under strict formal guidance. This was made possible by an appeal to the Chomskyan notion of a theta-domain. Thus, ambiguity resolution was confined to the domain initiated by the last theta-assigner, that is, the last word capable of assigning a theta-role (see more on this below). Assuming that the preposition ‘of’ in (13) is just a case assigner and not a theta-assigner (it does a linking job similar to that of morphological case in languages that have it, but it carries no meaning of its own), the last theta-assigner in (13) is the verb ‘envied’, which means that the ambiguity region covers the entire complex noun phrase (CNP) plus the RC.5 Inside that domain, discourse fit and lexical biases of whatever kind can play a role during the second stage.
All this means that the RC is initially adjoined low, to ‘actress’, by LC, but that it may later be re-adjoined to ‘servant’, since ‘servant’ is, after all, the head of the complex noun phrase and therefore highly relevant (it is, in fact, what the main “assertion” of the sentence is about). Relativized relevance was also meant to apply universally to all the languages of the world. This intelligent solution, couched in generative principles popular at the time, did not, however, thrive. On the one hand, evidence of the shift from low to high attachment was hard to find (see Mitchell & Cuetos 1991 for a detailed discussion). On the other hand, it was also hard to find evidence that pragmatic and semantic factors affected the choice of adjunction pattern precisely at a second stage (e.g., Carreiras 1992; Zagar et al. 1997). For instance, Carreiras (1992) used sentences that were semantically biased towards an LC interpretation (e.g., “Alguien disparó contra el criado de la actriz que estaba en el escenario”/Someone shot the servant of the actress who was on the stage) but failed to reverse the Spanish reflex to attach high, to ‘criado’/servant (see section 2.3.2).
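The way the theta-domain restricts the second stage can be sketched as a leftward scan: find the last theta-assigner before the ambiguous item and treat everything after it as the region where relativized relevance may operate. The token list and the set of theta-assigners are hand-coded assumptions for illustration, and the function name `theta_domain` is hypothetical. As the text assumes, ‘of’ is excluded; the second call shows a purely hypothetical variant in which ‘of’ counted as a theta-assigner.

```python
from typing import List, Set


def theta_domain(tokens: List[str], theta_assigners: Set[str], ambiguous_index: int) -> List[str]:
    """Return the tokens after the last theta-assigner preceding the ambiguous item.

    Raises ValueError if no theta-assigner precedes the ambiguous item.
    """
    assigners = [i for i in range(ambiguous_index) if tokens[i] in theta_assigners]
    if not assigners:
        raise ValueError("no theta-assigner before the ambiguous item")
    return tokens[assigners[-1] + 1:]


sentence = "They envied the servant of the actress who was on the balcony".split()
rc_start = sentence.index("who")

# 'of' treated as a mere case assigner: the domain spans the whole CNP plus the RC.
print(" ".join(theta_domain(sentence, {"envied"}, rc_start)))
# the servant of the actress who was on the balcony

# Hypothetical variant in which 'of' assigns a theta-role: only N2 and the RC remain.
print(" ".join(theta_domain(sentence, {"envied", "of"}, rc_start)))
# the actress who was on the balcony
```

Only under the first setting does ‘servant’ fall inside the domain, which is what allows relativized relevance to re-adjoin the RC high.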

2.3.2 The role of frequency: Tuning

The first blow to the universalist theory of language processing in general, and to the GP model and the LC strategy in particular, was dealt by a study conducted by Cuetos and Mitchell in 1988 and published in Cognition. In two questionnaires and three online experiments with 130 native English-speaking and Spanish-speaking undergraduates, the Anglo-Spanish team showed that only the native English speakers obeyed LC. In the questionnaires, sentences like (13) were followed by direct questions like “Who was on the balcony?” The Spanish group registered an Early Closure choice in 62% of the trials. The self-paced reading studies used pragmatic disambiguation in the form of an extra phrase like with her husband, which is supposed to link almost automatically with a female referent (Figure 2.2). Early Closure, or High Attachment, was soon confirmed in numerous questionnaire and online studies using similar materials in Spanish (e.g., Carreiras 1992; Carreiras & Clifton 1993, 1999; Mitchell & Cuetos 1991; Mitchell, Cuetos & Zagar 1990) and other languages. For instance, in French, German and Dutch, both questionnaire and online data soon confirmed the anti-LC preference as well (Zagar & Pynte 1992; Hemforth et al. 2000; Mitchell & Brysbaert 1998). The English data have generally been less consistent. For instance, in two questionnaire studies, Mitchell and Cuetos (1991) obtained evidence consistent with an LC bias, but a few other studies have shown no online inclination in either direction (e.g., Carreiras & Clifton 1993). Early Closure has generally been seen to prevail cross-linguistically, but English is not the only exception. Some evidence of an English-type bias has been found in Arabic, Romanian, Swedish, Norwegian and European Portuguese, but this evidence (often made public only in conference papers and posters) has generally been weaker (Abdelgany & Fodor 1999; Ehrlich et al. 1999; Maia et al. 2006).

[Figure 2.2: tree diagram with nodes NP, NOM, Det, N1, N2 and RC for ‘(They envied) the servant of the actress who was on the balcony’]
The cross-linguistic differences came as a shock to a community of researchers who had simply assumed (reasonably, it may be said) that even if the notion of universal grammar turned out to be questionable, there would be no serious grounds for doubting universal parsing. As for the obvious computational merits of a locality bias (LC), these had been taken for granted too. A new model that embraced language variation head-on suddenly seemed necessary, one that could explain the cross-linguistic disparities. The Tuning Hypothesis (Mitchell & Cuetos 1991; Mitchell et al. 1995) emerged as such a model, and the essence of its explanation is a classic player in cognitive studies: the role of experience in shaping cognitive biases.6 The basic idea was that parsing is driven not by general principles of a structural kind but by the experience that language users have accumulated in previous encounters with similar strings. That is, instead of assuming (like Bever, Kimball and Frazier) that the human cognitive system is severely limited in its resources (forcing it to minimize storage and to ensure that new material is immediately adjoined to the constituent currently being built, so as to ease the load on memory), Tuning made room for the idea that the ranking of parsing strategies may simply vary considerably across languages, LC being just one important strategy among others. For the structure that occupies our attention here, this means that readers should simply be expected to opt for the kind of disambiguation they have found to be statistically more felicitous in the past (be it low or high adjunction). Notice that, though not computationally grounded, this model is still a universalist one, in that experience is seen as the factor at play in all linguistic events in the world. In line with well-known ideas in linguistics at the time, the model simply adopted a strong form of parameter-setting in order to salvage the idea of a universal parser. In sum, parsing continues to be universal but parametrized, with parameters being set by the actual experience of individuals with their particular language.
This new view is also generally compatible with exposure-based accounts in both linguistics and psycholinguistics (Slobin & Bever 1982; MacWhinney & Bates 1989; Bever 1970; MacDonald et al. 1994; Trueswell et al. 1993; etc.), including the memory-based models and surprisal models that are currently fashionable (Vasishth et al. 2010; Levy 2008a, 2008b, 2011). Needless to say, a general appeal to an exposure-driven mechanism does not by itself explain why speakers of different languages should exhibit different processing biases. In the initial investigations, an obvious fact leaped to the eye, namely that, at first at least, most of the languages that departed from LC were Romance languages. This prompted an ingenious idea: in these languages, readers deploy a ‘jumping over the modifier’ strategy to adjoin the RC to the head of the CNP (‘servant’ in (13)). Why would they do that? The rationale was that modifier straddling made sense in languages that, unlike English, use more post-modification than pre-modification (compare the kitchen table with la mesa de la cocina in English and Spanish, respectively, with kitchen/cocina the modifying element). For the speakers of these languages, modifier straddling would act as a special-purpose strategy competing with LC, and this strategy would have to be ‘switched on’ as a parameter by experience with the language in question. Soon, however, Dutch, German and other non-Romance languages made the specific appeal to modifier straddling hard to maintain (Brysbaert & Mitchell 1996; Hemforth et al. 2000).

GP theoreticians who saw their model leaking heavily on the LC front noticed other potential pitfalls of the Tuning account as well. Clifton (1988) reasoned that the final segment that provided the pragmatic disambiguation (‘with her husband’, biased towards LC) might have taken longer to read not because readers were obliged at that point to revise a previous Early Closure commitment, but simply because more time is logically necessary to adjoin a constituent that is ambiguous and can, therefore, be attached in more than one place. Notice that this counter-argument does not mesh well with the GP idea that ambiguous strings are treated by the parser as if they were not ambiguous at all. That is, if the Spanish participants had been rigidly obeying LC, the temporary ambiguity should not have been noticed. If the problem (reflected in RTs) is the ambiguity, then they were not treating the region in question as unambiguous. A second objection to the idea that there is an anti-locality bias in Spanish had to do with methodological issues, in particular with the segmentation used in the online studies of Cuetos and Mitchell (1988). Remember that in a self-paced reading study, segments are made available by the participant pressing a computer key. According to Gilboy and Sopena (1996), in the following presentation mode:

First key press: Somebody saw
Second key press: the servant of the actress
Third key press: who was on the balcony …

the segmentation introduced a bias against the LC reading of the sentence because, of the three displays used, one contained the entire CNP, whose head is the first noun, thereby undermining the locality reflex of joining the second noun to the proximal RC (which was no longer so proximal, as it came in another display).
Similar observations were made to account for the surprising finding by De Vincenzi and Job (1993) that Italian behaved unlike Spanish and French, and more like English, in showing an LC preference (Carreiras & Clifton 1993). However, both Mitchell and Cuetos (1991) and Carreiras (1992) obtained results similar to those of Cuetos and Mitchell (1988) in studies whose segmentation did not separate the second noun from the RC. Overall, these objections did not survive empirical scrutiny, which means that cross-linguistic differences in the parsing of adjunction phenomena (particularly RCs) still need to be accounted for. It also means that an account that invokes frequency of use, rather than structural properties of the language faculty and of the human memory system, still needs to be seriously contemplated. The exposure-based account entails a close match between production and comprehension and, more particularly, between corpus measures and online ones. Thus, if language users deploy a particular structure-building (and ambiguity resolution) strategy based on their previous experience with similar structures, then that accumulated language-specific experience should be observable in corpora. This is precisely what Mitchell et al. (1992) found. Using what by present standards can only be considered a small corpus of English and an equally small corpus of Spanish (circa 450,000 words), they showed statistically that ambiguous RCs were more often disambiguated towards the local noun in English and towards the higher one (the head of the CNP) in Spanish (permanently ambiguous clauses had to be discarded, of course). This seemed very promising for the Tuning account, and it opened the way for a research methodology that brought together psycholinguistics (where corpus analyses had until then not been seriously contemplated) and linguistics (where they have typically been a frequent resource). Cuetos et al. (1996: 175) insightfully spell out the ramifications of an account premised solely on frequency of exposure:

The tuning hypothesis allows one to make two different kinds of prediction. The basic proposal is that ambiguities are initially resolved in line with the statistical prevalence of the alternative readings in the language as a whole. So, on the one hand it is possible to estimate these statistical values (using corpus studies) and hence predict the patterns of preference to be expected with individual linguistic structures. However, in addition to these static, well-established biases the model can also be used to predict more local or temporary biases. A central feature of any full account of tuning would be a description of the mechanism used to keep tally of the occurrence of events in different categories. In order to respond to cumulative frequencies such a mechanism would have to be subject to incremental change, and if there was a shift in the input distribution this would have to be reflected in some way in the device’s internal settings. If sufficient weight is given to recent samplings, the tuning hypothesis will predict that preferences will show short-term changes on the basis of exposure patterns over the preceding minutes, days or weeks.
Thus, the model predicts that parsing preferences will change if, during some period prior to testing, the reader or listener has been exposed to an unusual preponderance of one ambiguity resolution rather than another. (emphasis added)

Notice that the role that Tuning theory allocates to individual experience in this quote is very much in line with current thinking in various corners of linguistics, such as Cognitive Grammar or Construction Grammar, and of psycholinguistics, such as memory-based models and surprisal theory (see Chapter 5). At the time Cuetos et al. were writing (mid-1990s), the shaping role of experience was being tested on other structures as well. For instance, as the authors themselves point out, Tabossi et al. (1994) compared the corpus and latency data for sentences containing verbs ending in -ed. We noted in the Introduction that such forms are ambiguous between a main verb and a reduced relative interpretation (the evidence examined versus the witness examined …). Tabossi et al. provided evidence that, in general, the main clause interpretation prevails in corpora, and this is exactly what happens in online experiments, where (other things being equal) such interpretations are read more quickly. To add to the far-reaching role of experience, remember that MacDonald et al. (1994) showed that specific verbs have their own specific disambiguation profiles; thus, those verbs that favour a main clause reading in online experiments do tend to appear in corpora as main verbs more often (and the other way around; Trueswell 1996; on the use of probabilistic information in parsing, see Levy 2008a; Gennari & MacDonald 2009). Corley et al. (1993) compared the corpus and online profiles of the contracted auxiliaries had and would and showed that ’d is more likely to be interpreted as had than as would, in line with corpus counts. As for the role of manipulated, concentrated rounds of experience in shaping individual processing biases, Cuetos et al. (1996) report an “intervention study” in which they exposed two groups of Spanish children to either Early Closure or Late Closure sentences over a period of two weeks (the children were made to read stories containing such biased materials). A week after that, they were given a questionnaire to fill in, and the results indicated that the exposure regime had altered their adjunction choices relative to an initial questionnaire they had taken right before the experiment began (on experience influencing parsing decisions, a sort of priming in comprehension, see also Kaschak & Glenberg 2004; Wells et al. 2009; Vasishth et al. 2010; Pickering & Garrod 2013). However, the ‘actuarial’ approach that Tuning entailed is problematic on two grounds. First, the theory does not specify what precisely an experience-based parser targets as a segment worth tabulating and storing (a word, a phrase, a ‘simple’ phrase, a clause, etc.). It is thus too open-ended.
Second, the theory does not really provide an explanation for the attested cross-linguistic variability but rather pushes the explanation one step back. That is, the theory descriptively correlates an online processing bias with large-scale frequency tallies, but the question remains: why do, say, English and Spanish exhibit different tallies in the first place? The first issue is known as the grain size problem, and it has ramifications to the present day. We address it now.

Tuning’s grain size problem. In a paper devoted entirely to a discussion of the grain size problem, Mitchell et al. (1995: 470) defined the problem cogently:

Any algorithm that makes decisions on the basis of past experience depends in part on procedures for recording and storing relevant features of that experience. In addition, to implement decisions, there must be a process for recovering “appropriate” records and using this information to execute the resulting action. The success of this process depends upon establishing a useful link between aspects of the current material and corresponding features of the established records. This is essentially a category-selection or pattern-matching problem. In the case of an exposure-based parsing mechanism this problem is complicated by the fact that almost any kind of linguistic pattern could potentially be exploited in procedures of this kind. (emphasis added)

Applied to the problem at hand (the CNP + RC template), an actuarial device might keep records of (at least):

(a) the entire CNP + RC complex;
(b) the entire CNP + RC complex for CNPs with more than two attachment sites (the lamp near the tables with the tops which …);
(c) the entire CNP + RC complex when the preposition is the ‘empty’ of versus when it is a different one, with content (say, the steak with the sauce that …);
(d) when the noun in the first NP is already modified (the amazing servant of the actress that …);
(e) when the noun in the second NP is the one that is modified (the servant of the amazing actress that …);
(f) when either of the two nouns (if there are only two nouns) has more than one premodifying adjective (the servant of the quietly-sitting, tall, blonde and amazing actress that …);
(g) when the first noun is human or animate and the second is not (the owner of the suitcase that …);
(h) when the first noun is neither human nor animate and the second one is (the suitcase of the boy that …);
(i) when the first NP contains a noun of high lexical frequency;
(j) when the first NP is definite; etc.

Linguists will surely recognize issues having to do with topicality and referentiality in the preceding list, and how these may relate to the functional job of an RC, which is to narrow down the reference of a previous nominal head.7 The important thing is that, as the list suggests, the range of possible grains is almost infinite (and, it may be anticipated now, many of these possibilities actually materialized in later experiments; see below). This means that, since the records may be too many and too large, the problem of quickly discarding unwanted alternatives may not be solvable.
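A minimal sketch of the kind of ‘actuarial’ device under discussion, written purely for illustration, may help make the trade-off tangible. It keeps recency-weighted tallies of high versus low resolutions, and a key function decides the grain of its records. The class name `TuningTally`, the `decay` parameter and the two key functions are hypothetical; this does not reproduce a mechanism proposed by Mitchell, Cuetos and colleagues, who describe only the requirements such a mechanism would have to meet. Here each new observation of a record discounts the earlier observations of that same record, one simple way of giving extra weight to recent samplings.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Optional


class TuningTally:
    """Hypothetical recency-weighted tally of RC attachment resolutions."""

    def __init__(self, key_fn: Callable[[dict], Hashable], decay: float = 0.99):
        self.key_fn = key_fn  # determines the grain size of the stored records
        self.decay = decay    # values below 1 favour recent observations
        self.records: Dict[Hashable, Dict[str, float]] = defaultdict(
            lambda: {"high": 0.0, "low": 0.0}
        )

    def observe(self, item: dict, resolution: str) -> None:
        """Record one disambiguated RC ("high" or "low") under the item's key."""
        record = self.records[self.key_fn(item)]
        for site in record:
            record[site] *= self.decay  # older observations of this record fade
        record[resolution] += 1.0

    def predict(self, item: dict) -> Optional[str]:
        """Return the better-attested resolution for the item's record, if any."""
        record = self.records.get(self.key_fn(item))
        if record is None or record["high"] == record["low"]:
            return None
        return "high" if record["high"] > record["low"] else "low"


# Two grains: one record for every CNP + RC, or one record per preposition (cf. (c)).
coarse = TuningTally(key_fn=lambda item: "CNP+RC")
fine = TuningTally(key_fn=lambda item: ("CNP+RC", item["prep"]))

# Toy exposure history (invented for illustration, not corpus data).
history = [({"prep": "of"}, "high")] * 3 + [({"prep": "with"}, "low")] * 2
for item, resolution in history:
    coarse.observe(item, resolution)
    fine.observe(item, resolution)

new_item = {"prep": "with"}  # e.g. the steak with the sauce that ...
print(coarse.predict(new_item), fine.predict(new_item))  # high low
```

The same exposure history yields opposite predictions at the two grains, which is the crux of the problem: the finer key is more precise for ‘with’ items but multiplies the records that must be stored and consulted.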
Yet we do need an oracle that does just that: one that sees ahead, and does so quickly and effectively; and, judging by our seemingly highly efficient linguistic exchanges, we do seem to have such a mechanism in place. It appears that the oracle must choose between a fine and a coarse grain:

The more detailed the records, the higher the chance that the stored information can be used to predict the correct analysis in any new sentence. However, highly detailed records would presumably be costly to maintain in computational terms (…). Coarse records would be easier to maintain, but the pooling of information would inevitably lead to loss of precision. (Mitchell et al. 1995: 472)

Most research done along lines compatible with a fine-grain approach to the role of experience in parsing quickly sided with connectionist models of language and the mind (Tabossi et al. 1994; Trueswell et al. 1994; Spivey-Knowlton & Sedivy 1995). However, these studies did not target RCs in complex NPs in particular, or indeed the adjunction of modifiers in general, being more concerned instead with nuclear argument-structure relations, which are, in turn, more amenable to lexically driven effects.8 Those who did study RCs were the proponents of the Tuning model (Cuetos, Mitchell, Brysbaert, Corley, etc.), and
they did not see how lexical factors could explain the cross-linguistic disparities in such a seemingly geometrical operation as adjunction. They thus opted for a coarse-grain approach to syntactic ambiguity resolution, at least for the adjunction of modifiers to their heads. A substantial part of this debate is summed up in the opposition between lexicalists and syntacticians, much as that same fork structures much of the world of theoretical linguistics. For the former, the lexicon is such a powerful container of information of all kinds (especially semantic information) that parsers are unlikely to ignore it in a first round of processing, or indeed at any stage during processing. For the latter, the lexicon is often seen as a formless mass, too large and diffuse to help guide initial decisions: only a few privileged syntactic structures and/or principles stand any chance. In the end, Tuning renounced structural principles like LC and yet remained a serial, syntactocentric and universalist model of language understanding (like GP), in that its proponents maintained that what parsers do is keep records of syntactic structures, not of the words that make up those structures, or of anything beyond syntax. They aligned with the syntacticians (on syntactic ‘schemas’, see for instance Jackendoff 2017; Goldberg & Bencini 2005; or Ziegler et al. 2019). But they were wrong, and it was research done within their own team that made that clear. Mitchell and Brysbaert (1998) focused on Dutch to show that corpus data and online data did not match in that language. Specifically, Dutch revealed (again) an online preference for high adjunction (anti-LC) but a low adjunction preference in corpora (which gave impetus to the idea that comprehension and production might not be identical). The interesting development came when Desmet et al. (2002) re-evaluated the mismatch.
A close look at the corpus data revealed that the structure that had been put to the test in the overwhelming majority of previous research was one that involved two nouns designating humans (like (13), a classic in the literature). That setup was probably due to the fact that the original research was done with English materials, and in English the that/who distinction has a disambiguating effect, which is why, to preserve ambiguity in experiments, two animate NPs that could both be linked to who were used. When materials were translated or adapted into other languages to which the that/who distinction does not apply, the animacy manipulation was smuggled in. What Desmet et al. saw was that the [Animate NP1 + Animate NP2] template was statistically marginal in corpora (a mere 3% of all CNP + RC structures). This was crucial, as all conclusions based on such an infrequent configuration had been applied to the entire CNP + RC template. When animacy (a typically lexical, fine-grain notion) was taken into account, they uncovered much more convergence between corpus frequencies and online data. There was, in particular, a very solid tendency to adjoin the RC high to the first noun if this was animate and the second one inanimate (the opposite setup, Inanimate NP1 + Animate NP2, did not converge so well, as no low bias was revealed). They used these results to reinstate the role of frequency in moulding processing biases, but this reinstatement came at the cost of having to renounce
syntactocentrism: the victor was the fine grain, not the coarse one. Indeed, if the geometry of adjunction (a presumably ‘formal’ affair) can be impacted by such quintessentially non-syntactic forces as the meaning of the lexical items that go into the syntactic structures, then syntactic structure per se is not at the helm, or at least not at the helm alone. Using eye-tracking, Desmet et al. (2006) soon provided further evidence for the animacy manipulation and felicitously explored a new one of a similar kind, the concrete/abstract distinction: when the first NP codes a referent that is both animate and concrete (father, mother) as opposed to animate and abstract (government, police), high adjunction is chosen in both online and offline measurements. Additionally, Acuña-Fariña et al. (2009) offered evidence of even more alignment of online and offline data for Spanish. They conducted a corpus study (1.5 million words) and two self-paced reading experiments. In the corpus, the scarcity of the Animate NP1 + Animate NP2 condition was confirmed (3.2%). Overall, 59% of all sentences showed a high attachment preference (consistent with previous research; Carreiras 1992; Carreiras & Clifton 1993, 1999; Cuetos et al. 1996; see Carreiras et al. 2004 for ERP evidence). Revealingly, that tendency disappeared when NP2 was animate. When NP1 was animate and NP2 inanimate, high adjunction was strongest, as if the general preference for high adjunction (the coarse grain) and the specific preference for an animate host (a fine grain) formed a coalition of forces. The online data largely converged. However, as Fraga et al. (2012: 4) point out:

neither in the Dutch series of experiments nor in the Spanish one could the match between the corpus and the online data be shown to be perfect. For instance, in Dutch, the inanimate–abstract NP1 + animate NP2 should in all logic show the most robust of the NP2 adjunction preferences, but Desmet et al.
(2006) could only register a nonstatistical trend in that direction in their eye-tracking data. This is surprising given the fact that the inanimate–animate condition is also the predominant configuration in the corpus analysis (not in Spanish: inanimate–inanimate). In the second place, the animate–animate type draws adjunction to the first NP despite overall corpus preference for NP2. For that category specifically, it is true that a match was reported between the corpus and the reading data, but we still need to know how the general attachment bias and the specific one for each category interact. If animacy/concreteness is the key, one would expect no robust biases there and, given the overall NP2 corpus preference of Dutch, maybe even a slight NP2 bias. The fact that animacy seems to be strong on NP1s only is therefore not entirely clear. In that respect, the preference for NP1 in Spanish animate + animate phrases may indeed be attributed to coarse-grain-level frequency effects, as indeed, overall, NP1 is preferred in Spanish. (emphasis added)
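To make the question of how a general and a category-specific bias might interact more concrete, the Python sketch below keeps attachment records at two grains: one tally over all CNP + RC tokens (the coarse grain that Tuning opted for) and one keyed by the animacy of NP1 and NP2 (a fine grain of the kind Desmet et al. explored). It is a toy illustration under loudly stated assumptions, not the Tuning model's actual mechanism: the corpus counts are invented, and the rule of backing off to the coarse grain whenever a fine-grain cell holds fewer than `min_fine_records` tokens is one hypothetical way of combining the two grains.

```python
from collections import Counter

# Hypothetical toy "corpus" of resolved CNP + RC tokens: (NP1 animacy,
# NP2 animacy, attachment site). The counts are invented for illustration
# and are NOT real corpus data.
TOY_CORPUS = (
    [("animate", "inanimate", "high")] * 8
    + [("animate", "inanimate", "low")] * 2
    + [("inanimate", "animate", "high")] * 3
    + [("inanimate", "animate", "low")] * 4
    + [("inanimate", "inanimate", "high")] * 5
    + [("inanimate", "inanimate", "low")] * 6
    + [("animate", "animate", "high")] * 1
)


class TuningTally:
    """Attachment records at a coarse grain (all CNP + RC tokens) and at a
    fine grain (tokens keyed by NP1/NP2 animacy)."""

    def __init__(self, min_fine_records=5):
        self.coarse = Counter()
        self.fine = {}
        # Hypothetical parameter: below this many fine-grain records,
        # fall back on the coarse grain.
        self.min_fine_records = min_fine_records

    def record(self, np1, np2, site):
        self.coarse[site] += 1
        self.fine.setdefault((np1, np2), Counter())[site] += 1

    def predict(self, np1, np2):
        """Return (preferred site, grain used); ties go to 'high'."""
        fine = self.fine.get((np1, np2), Counter())
        if sum(fine.values()) >= self.min_fine_records:
            counts, grain = fine, "fine"
        else:
            counts, grain = self.coarse, "coarse"
        site = "high" if counts["high"] >= counts["low"] else "low"
        return site, grain


tally = TuningTally()
for np1, np2, site in TOY_CORPUS:
    tally.record(np1, np2, site)
```

With these invented counts, `tally.predict("animate", "inanimate")` returns `("high", "fine")`, whereas the sparse animate + animate cell returns `("high", "coarse")`, i.e. it inherits the overall bias rather than a category-specific one. Where exactly such a threshold should sit is precisely the kind of detail the grain size problem leaves open.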

To many linguists, the role of animacy (= the fine grain) in shaping grammatical biases (and, by extension, processing biases) may not be too surprising, given what we know about its similar role in, for instance, the grammar of agreement (Corbett 2006) and topicality (Lambrecht 1994; Goldberg 2006). More surprising, surely, is the fact that the fine grain may extend even to such ‘airy’ dimensions as the emotional connotations evoked by the lexical pieces that compose the CNP. In three completion experiments (production, therefore), Fraga et al. (2012) manipulated exactly that. The rather radical hypothesis they were pursuing was whether nouns such as rape or holocaust interfere with the general coarse-grain preference of Spanish to adjoin the RC high. In a nutshell, given that general NP1 bias, can that preference be reversed if an emotionally charged noun like genocide occupies the NP2 position? Such a ‘charge’ is a purely lexical one. To that end, they carefully manipulated the valence (pleasant versus unpleasant) and arousal (high, as in ‘rape’, versus low, as in ‘table’) of the nouns in the CNP. Their results indicate that both pleasant and unpleasant words (vs neutral ones) do affect the general NP1 preference. In fact, the effects induced by words of high arousal, in particular, were quite robust, in that they always drew the RC towards the NP where they happened to be placed, be it high or low. As with animacy, when NP1 was emotionally loaded, high adjunction was strongest (see more of this below, section 2.3.5).9

2.3.3 A refined two-step account: Construal

By the end of the 1990s, it had become clear that there was something strange about the processing of seemingly mundane subject relatives. Most languages studied using online measures (basically, self-paced reading and eye-tracking) revealed an anti-locality bias, but in no way was this the case for all languages studied (e.g., English). This is a puzzle, at least for the many theoreticians who steadfastly believe in the existence of universal parsing strategies or principles.10 Notice that even though in the psycholinguistic world a failure to enforce LC was (and usually still is) interpreted as a failure to enforce a syntactically guided way out of a processing dilemma, there is in fact a priori nothing asyntactic about Early Closure. Indeed, the very opposite may be true: when parsers opt for a distant host for an RC they do not just pick one randomly: they pick the head of the CNP.11 Linguists know that the problem lies in determining whether the head is a head based on formal or conceptual properties, but this finer issue generally escaped the research agenda. The fact remains that if the notion of head has any formal essence to it, its syntactic nature (as an abstract pivot of an enlarged, hierarchically structured syntactic object) is actually of a much higher theoretical calibre than a simple locality bias. After all, determining the head of the structure in a phrase like the absolutely adorable, quaint, marble candlesticks on the top of the sideboard in Jane's living room requires a much more sophisticated mental analysis than that needed to determine which element is more proximal to any other one. This fact is rather surprisingly overlooked in the literature to date.
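The difference between finding the most proximal noun and finding the head can be illustrated with a small Python sketch. It hand-builds a simplified tree for a shortened version of the candlesticks phrase and compares a purely linear search for the most recent noun with a walk down the chain of head daughters. The category tags, the head indices and the treatment of living room as a single noun are all annotated by hand and are assumptions of this toy encoding; the point is only that the first search needs nothing but the word string, whereas the second needs the hierarchical structure itself.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    word: str
    tag: str  # simplified category label: "D", "A", "N", "P", "Poss"


@dataclass
class Phrase:
    label: str
    children: List[Union["Leaf", "Phrase"]]
    head: int  # index of the head daughter, annotated by hand


def words(node):
    """Return the leaves of a tree in linear (left-to-right) order."""
    if isinstance(node, Leaf):
        return [node]
    result = []
    for child in node.children:
        result.extend(words(child))
    return result


def most_recent_noun(node):
    """Locality: pick the last noun in the string; no structure needed."""
    nouns = [leaf.word for leaf in words(node) if leaf.tag == "N"]
    return nouns[-1]


def structural_head(node):
    """Headedness: follow head daughters down to a single word."""
    while isinstance(node, Phrase):
        node = node.children[node.head]
    return node.word


def make_np(children, head):
    return Phrase("NP", children, head)


def make_pp(preposition, obj):
    return Phrase("PP", [Leaf(preposition, "P"), obj], 0)


# the marble candlesticks on the sideboard in Jane's living room
phrase = make_np(
    [Leaf("the", "D"), Leaf("marble", "A"), Leaf("candlesticks", "N"),
     make_pp("on", make_np(
         [Leaf("the", "D"), Leaf("sideboard", "N"),
          make_pp("in", make_np(
              [Leaf("Jane's", "Poss"), Leaf("living room", "N")], 1))],
         1))],
    2)

print(most_recent_noun(phrase))  # living room
print(structural_head(phrase))   # candlesticks
```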

Descriptive sophistication coming from the realms of linguistics is evident in the way GP theoreticians reacted to the problems posed by RCs failing to comply with LC universally. Frazier and Clifton (1996) proposed a new model of language comprehension, Construal, according to which a distinction needs to be made between primary and non-primary constituents of the sentence (see also Gilboy et al. 1995; Frazier & Clifton 1997; Frisson & Pickering 2001; for a review, see Mitchell & Brysbaert 1998). As a first step, for primary constituents, LC and MA remain mandatory in all languages and for all speakers. Non-primary constituents, however, are merely loosely associated with their hosts, not tightly attached or adjoined to them, in a second step. Associations are resolved using all kinds of information, including not only semantics but also pragmatics in the form of Gricean principles like ‘be clear’, ‘be informative’, etc. (Frazier & Clifton 1996: 31–32). Final interpretations of an attachment decision are thus reached on the basis of a Construal Principle, which was nevertheless reined in formally, along the lines already designed for the notion of relativized prominence, now endorsed more forcefully in the new model:

The Construal Principle: associate a phrase XP (which cannot be analysed as instantiating a primary relation) into the current thematic processing domain; interpret XP within the domain using structural (grammatical) and nonstructural (extragrammatical) interpretive principles.
Current thematic processing domain: the extended maximal projection of the last theta-assigner. (Gilboy et al. 1995: 134)

Basically, primary constituents include the subject and the main predicate of a clause and their obligatory constituents (i.e., complements), that is, the phrasal projections of argument structure.12 Non-primary constituents include, among others, modifiers of all kinds (e.g., RCs) and phrases linked via conjunction.
In sum, under Construal, an RC is not definitively linked to a host but simply “floats” in a processing domain until disambiguating information becomes available (Cokal & Ferreira 2018). This lack of commitment on the part of the parser, which entails that actual dominance and sisterhood relations may not be determined for a while (Frisson & Pickering 2001; Sanford & Graesser 2006; Sturt et al. 2004), is in keeping with later developments in the literature and with popular views today that emphasize notions such as underspecified or good-enough representations, late assignment of syntax, and noisy channels (Townsend & Bever 2001; Sanford & Sturt 2002; Ferreira & Patson 2007; Swets et al. 2008; Gibson et al. 2013; Slattery et al. 2013; Karimi & Ferreira 2016). Under all these approaches, quick ‘dirty’ interpretations are formed by the application of various heuristics, and these need not blindly follow a strict formal grammar (see Chapter 5). Construal is flexible enough to accommodate a large variety of possible findings, like the cross-linguistic differences in the processing of RCs that the model
was born to resolve. How does it do that in particular? Quite elegantly, in principle. Here is the explanation: when a Spanish speaker hears or reads an ambiguous CNP + RC complex, s/he is supposed to remain agnostic as to whether the RC is to be associated high or low in the unfolding tree, the basic, skeletal argument structure of which would already have been determined by MA and LC. Since the syntax offers two alternatives, s/he will apply Relativized Prominence (‘attach to the main assertion of the sentence’; Frazier 1990: 321). This will point to the head of the CNP (‘servant’ in our habitual example, (13) above). Note that Relativized Prominence is designed to capture the idea that attachees are more inclined to seek referential hosts (so nouns with determiners over nouns without determiners, for instance, other things being equal).13 This directly explains the results found for Spanish and so many other languages. Why, then, would English not behave in the same way and associate the RC high to the more prominent host too? Enter the Gricean principles. In English, the prepositional structure NP1 PREP NP2 (the servant of the actress) coexists with the Saxon Genitive (the actress’ servant), but the two variants are not completely equivalent. If one chooses to use the latter, there is in fact no ambiguity when an RC is associated with it: in the actress’ servant that … the RC can only be associated with servant, not with actress, which is why the only reason for choosing the rival prepositional structure would be if the intended association is with actress. This ingenious explanation invokes principles like ‘avoid obscurity’, ‘be clear’, ‘be brief’, etc., of an eminently discoursal nature. Two pieces of evidence are, in principle, consonant with the model’s theoretical guidelines.
First, readers often process sentences with ambiguous relative clauses (say, the niece of the actor who was in the hall) faster than unambiguous ones (say, the nieces of the actor who was in the hall), except when such sentences are followed by a direct question that obliges them to commit to a specific interpretation. When that is the case, they often take longer to read the ambiguous structure, presumably because resolving its ambiguity finally takes a processing toll after the initial ‘hovering’ period (Swets et al. 2008).14 Second, the theory's linguistic sophistication forced researchers to come to terms with the typical (and very real) complexity of linguistic structure. This has to do with the invocation of a current processing domain couched in the GB terms still customary at the turn of the century. It turns out that once a processing domain is defined in reference to the last theta-assigner, theta-assigners come into the picture with their own complexities, and these change the way we see the grain of analysis.15 This is because in a CNP of the [NP1 PREP NP2] type, the preposition itself may be the last theta-assigner, if it has predicative properties (loosely speaking, if it has some sort of semantic content and is therefore not just a case-assigner like ‘of’ in a book of poems). Whenever that is the case (say, in the steak with the sauce that …, or the book of the student that …, where ‘of’ assigns genitive), the processing domain does not extend to the first NP, which means that the RC is to be preferentially construed in reference to the second one only (yielding LC). However, when the preposition is thematically empty, as in the design of the picture that …,
the last theta-assigner is the last verb, which means that the entire CNP is inside the processing domain.16 It is in this case that the association with the RC can be resolved only by consulting all kinds of sources of information, including notably the referential properties of the two nominal hosts and their relativized prominence, which usually points to the head. The importance of referentiality is obvious from the fact that structures like the table of wood hardly ever lead to an NP2 association, as the second noun has no determiner and therefore lacks referential power. Gilboy et al. (1995) manipulated the preposition in the CNP in both English and Spanish in a variety of structures and demonstrated that the cross-linguistic differences practically disappeared when the type of preposition was controlled for. In fact, most of the differences between the two languages were confined to the contrast between the ‘alienable possessive’ type (the computer of the student that …) and the ‘kinship relationship’ type (the cousin of the girl that …). That is, they were circumscribed to precisely the types for which the Saxon Genitive prevails in English. In a large number of other cases both languages basically aligned their biases. Soon after, when Baccino et al. (2000) made the point that Italian is an NP2 language, Frenck-Mestre and Pynte (2000a, 2000b) were quick to argue convincingly that those results were biased because the Italian study had not controlled for the different prepositions used in its questionnaires. The two-step solution was a clever way of facing the cross-linguistic disparities, and it rests on a distinction that very few linguists would be willing to ignore: the difference between complements and modifiers (Ford et al. 1982; Abney 1989). This is no doubt an asset of Construal.
Likewise, in principle, the attempt to anchor processing predictions to sophisticated machinery in linguistics is both brave and welcome, as it fosters a kind of isomorphism between linguistics and psycholinguistics that eschews the easy isolability of one's linguistic theories when such theories fail to converge with results coming from the labs (see Chapter 5). However, the theory has problems. The first problem is of a general kind and has to do with the fact that, where modifiers are concerned, the theory can accommodate any finding. This is because the theory basically says that, within a theta domain, anything can happen. This is, ironically, precisely what GP theoreticians have always accused rival, semantically oriented theories of: their inability to yield specific, falsifiable predictions (see section 2.3.6). Positing a suspension of action and a late intervention of all kinds of factors is of little use unless one is able to spell out the relative ranking of at least a few of those factors, as these are potentially too many (recall our discussion of the grain size problem above; Deevy 2000; Traxler et al. 1998). Other problems are more specific. Soon after the theory was launched, Zagar et al. (1997) found in an eye-tracking experiment that the French NP1 bias was too solid to be the result of ‘hovering’ while looking for (late) interpretive clues. Early measures of processing revealed no contextual effect in any direction, suggesting that the Early Closure/High Attachment preference in that language was more structural than ‘construable’. This ties in with the fact that when faced with structures like (3) above, repeated here:

(28) The lawyers will read the documents your accountant and his team sent them tomorrow

the parser does not actually wait for ‘tomorrow’ to be associated where it must be (up in the tree) but clearly experiences a garden path, because it automatically links/adjoins it to ‘sent’, locally. This casts doubt on the seemingly logical nature of the primary vs secondary distinction (complements versus modifiers), in a way that is not easy to resolve. The problem seems to affect SRCs in particular, rather than other types of modifiers, and it is surely intriguing that it is the other types that follow LC so blindly. In this light, ironically, the Construal explanation may be said to have thrown the baby out with the bath water, for by renouncing mandatory and automatic LC, it may now be able to explain SRCs, but at the cost of no longer being able to explain much of what it could explain before.17 Additionally, Corley (1996) and Brysbaert and Mitchell (1996) observed that Construal has no easy way of accommodating either individual differences in attachment or habituation patterns in a precise, falsifiable way. Ultimately, the most problematic evidence against Construal comes from studies that address the most concrete proposal of the model: the Gricean explanation that is supposed to account for the cross-linguistic differences between English and many other languages in the attachment/association of SRCs to CNPs. This was soon pointed out by Tuning advocates (Mitchell & Brysbaert 1998), and it has to do with the Saxon Genitive of Dutch. Dutch constructs genitive structures in three different ways: (a) via a preposition, as in Romance; (b) via the Saxon form, as in English; (c) via a possessive pronoun following the possessor noun (the examples are taken from Brysbaert & Mitchell 1996):

(29) a. de hoed van vader ‘the hat of father’
b. vaders hoed ‘father’s hat’
c. vader zijn hoed ‘father his hat’
Although the Saxon genitive is becoming outdated in Dutch, being used mainly with kinship terms and proper names, it is still an option of the grammar. From that fact it follows that:

According to the Construal hypothesis, the frequency of occurrence of the Saxon genitive should make no difference with respect to listeners’ use of Gricean reasoning to determine interpretation of relative clauses in NP of NP structures (in Dutch, NP van NP sequences). In fact, the mere licensing of the Saxon genitive by the grammar of Dutch should prompt listeners to prefer low attachments in ambiguous structures, as it presumably does in English, by setting the Gricean reasoning into motion. However, it turns
out that Dutch speakers prefer high attachments, both off- and on-line (Brysbaert & Mitchell 1996). (Fernández 1998: 208)

Mitchell and Brysbaert (1998) observed that Construal could be exonerated from the facts of Dutch if associations were made in reference to the general preference of the language at large. On this reasoning, if RCs associated with genitive nominals in English were far more frequent than those associated with prepositional CNPs, then the Gricean solution would make functional sense in that language. Since in Dutch the proportion of Saxon genitives in the same configuration would be much lower, the parser in this language would seek another solution. The problem with this explanation is that, as Mitchell and Brysbaert (1998) noted, it is practically indistinguishable from an exposure-based account like Tuning, particularly one set to the coarse grain. Afrikaans, Greek and Croatian also have alternative genitive forms, and results from those languages do not align with the Construal explanation either (Papadopoulou & Clahsen 2003; Mitchell et al. 2000; Lovric 2003).
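The two-step logic Construal applies to an RC following a CNP can be summarised in a short Python sketch. Everything in it is a simplifying assumption for illustration, not the model's own formalisation: whether the preposition assigns a theta role is supplied by hand (a lookup by preposition would not work, since ‘of’ can go either way), referentiality is approximated by the mere presence of a determiner, and the fixed ranking of referentiality before Relativized Prominence is hypothetical, since, as noted above, the model does not itself spell out how such factors are ranked. The Gricean step meant to push English towards NP2 would be a further, language-specific factor and is not modelled.

```python
from dataclasses import dataclass


@dataclass
class NounPhrase:
    noun: str
    has_determiner: bool  # crude proxy for referential power (assumption)


@dataclass
class ComplexNP:
    np1: NounPhrase  # head of the CNP
    preposition: str  # kept for readability; not consulted by the logic
    np2: NounPhrase
    preposition_assigns_theta: bool  # supplied by hand in this sketch


def processing_domain(cnp):
    """Candidate hosts inside the current thematic processing domain."""
    if cnp.preposition_assigns_theta:
        # The preposition is the last theta-assigner, so the domain is its
        # maximal projection: only NP2 is available.
        return [cnp.np2]
    # Thematically empty preposition: the last theta-assigner is the verb,
    # so the whole CNP lies inside the domain.
    return [cnp.np1, cnp.np2]


def associate_rc(cnp):
    """Return the noun an ambiguous RC is associated with (toy ranking)."""
    domain = processing_domain(cnp)
    if len(domain) == 1:
        return domain[0].noun  # low association, LC-like
    referential = [host for host in domain if host.has_determiner]
    if len(referential) == 1:
        return referential[0].noun  # only one host can refer
    # Otherwise Relativized Prominence favours the head of the CNP.
    return cnp.np1.noun


steak = ComplexNP(NounPhrase("steak", True), "with", NounPhrase("sauce", True), True)
design = ComplexNP(NounPhrase("design", True), "of", NounPhrase("picture", True), False)
table = ComplexNP(NounPhrase("table", True), "of", NounPhrase("wood", False), False)

print(associate_rc(steak))   # sauce
print(associate_rc(design))  # design
print(associate_rc(table))   # table
```

Hard-coding one ranking like this is exactly what makes the sketch falsifiable, and also what Construal itself avoids committing to.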

2.3.4 The role of segmentation, prosody and silent reading: the Implicit Prosody Hypothesis

As briefly noted, as early as Gilboy and Sopena (1996) it was observed that a factor that had not been taken into account, and that in hindsight seems obviously relevant, might offer some explanatory hope: the different segmentations used in the by then already quite long list of experiments. A major fork among these is between studies that had opted for a large segmentation (the entire CNP + the RC) and those that had opted for a small one (NP1 + NP2 + RC). Segmentation may well affect different languages in different ways. Take English, for instance. Being a beat-based language (instead of a strongly syllabic one), English does not rest so much on a stable, rigid intonational contour but relies instead on a larger menu of phrasal pitch prominence instantiations. This has a lot to do with the mechanics of information structure and the notion of marked focus (Lambrecht 1994). Thus, in English there is a very vivid difference between Tom DID (end focus) and TOM did (marked focus), whereas in languages like Spanish or Italian focus is expressed via word order manipulations, usually pushing focal subjects to the end (aka default end focus: Toño lo hizo (‘Toño did it’, neutral) versus lo hizo Toño (end focus = the subject is new information)). In these languages phrasal accent plays less of a role, which may make differences in segmentation less important in English than in Spanish. In the latter, large segmentation may foster an NP1 bias (Early Closure) because the two NPs are presented at the same time but only one of them typically has clear relativized prominence (being the main assertion of the sentence). Effects seemingly originating in segmentation manipulations have been observed in various other domains (Pynte & Prieur 1996; Carlson et al. 2001). Importantly, it has often been assumed that segmentation relates to pronunciation via the notion of subvocalization.

It has long been known that prosody (pitch prominence, intonation, rhythm, etc.) impacts auditory sentence processing (Cutler et al. 1997; Hirotani et al. 2006; Bornkessel-Schlesewsky & Schlesewsky 2009: ch. 13; White et al. 2014) and, more particularly, syntactic ambiguity resolution (Pynte & Prieur 1996; Schafer et al. 1996). Thus, when two different trees compete and each of them is clearly associated with a particular intonational contour, subtle variations in pitch prominence and phrasal accent help resolve the ambiguity. It is not so clear, though, whether that effect is immediate or delayed (entering the scene only after a garden path has already occurred), although recent electrophysiological evidence seems to converge on the idea that effects can indeed be immediate, both in inducing and in averting garden paths (Bögels et al. 2013; Roncaglia-Denissen et al. 2013). Steinhauer et al. (1999) are credited with the discovery of a seemingly isolable ERP component that subserves the processing of prosodic boundaries: the so-called Closure Positive Shift (CPS; we defer an explanation of this electrophysiological type of research until section 3.3). Fodor (1998) proposed that cross-linguistic differences in the processing of CNP + RC templates may well be due to prosody-related effects stemming from the size of both the RC and the CNP (of note is her famous dictum, actually the title of a 2002 paper of hers: “Psycholinguistics cannot escape prosody”). She claimed, in fact, that the differences between this adjunction type and others (which typically obey LC) have more to do with the size of the modifying clause.
Bearing in mind that RCs may be rather large structures relative to typical modifying PPs or Adjective Phrases (compare the short, white man to the man who came to see John the other day after realizing that his sister had been with him in the party which … etc., etc.), she proposed a ‘Same-size-sister’ principle according to which a constituent “likes to have a sister of its own size” (p. 285). Fodor viewed this principle as a sort of “anti-gravity law” in that, in the structure under analysis, small RCs should optimally attach low (to the second NP), whereas large, heavy ones should attach high by opting for a phonological phrase comprising the entire CNP, and since the head of this is the first NP, they should end up being preferentially linked to it. In this way, a principle of balance should prevent a short RC from attaching to the entire CNP, since the latter is long. The ‘Same-size-sister’ principle rests on the notion of implicit prosody, which Fodor (2002) formulated as follows:

The Implicit-Prosody Hypothesis (IPH): In silent reading, a default prosodic contour is projected onto the stimulus, and it may influence syntactic ambiguity resolution. Other things being equal, the parser favors the syntactic analysis associated with the most natural (default) prosodic contour for the construction.

Fodor's view of the broad picture of syntactic ambiguity resolution and parsing in general is that universal parsing strategies need not be abandoned if we make room in our theories for a prosodic processor working in
parallel with the syntactic processor and, therefore, potentially affecting attachment decisions. Since languages differ in their prosodic templates, the whole burden of accounting for cross-linguistic variation is in fact placed on the prosodic processor. This idea is reminiscent of Frazier and Fodor’s (1978) Sausage Machine model. More recently, Fodor (2013) has argued that the difficulty in processing center-embedded structures that became evident in research in the 1960s and 1970s derives from a misalignment of syntax and prosody.18 The evidence for the IPH seems to be mixed. Initially, very soon after Fodor’s proposal, the following languages were successfully tested: Brazilian Portuguese, Croatian, Dutch, English, French, German, Hindi, Japanese, Korean and Russian (according to Augurzky 2006, who offers a useful review). For instance, Jun and Kim (2004) (and, later, Jun 2007) showed that speakers of Japanese and Korean generally opt for high attachment and that the most common phrasing of the [RC NP2 NP1] template is indeed RC + NP2 NP1, in line with the IPH. In a study with Jabberwocky sentences, Wijnen (2004) manipulated the length of the CNPs too (as well as that of the RCs) and found “modest” effects in the expected direction. Jun (2003) analyzed the “default phrasing of a sentence (explicit prosody), defined phonologically”, in seven languages differing in such defaults (Farsi, Japanese, Korean, English, Greek, Spanish and French) and found a correlation between language-specific prosodic phrasing and the resolution of RC attachment, which the author interpreted as “strong” support for the IPH. Hemforth et al. (2015) compared German, English, Spanish and French and found generally more high attachments for long relative clauses.
They interpreted these findings as evidence for the view that a combination of general processing principles (common to all languages) and “independently motivated” language-specific differences is responsible for the pattern of results. Finally, Fromont et al. (2017) also managed to effect changes in attachment in Spanish by manipulating prosodic breaks, and, in an eye-tracking experiment, Nakamura et al. (2012) report an early role of prosody in relative clause disambiguation in Japanese, but only when prosody is aligned with an appropriate visual context. However, Bergmann et al. (2008) compared the default phrasing of the NP1 NP2 RC construction in English and Spanish, with results pointing in the opposite direction. This study introduced interesting methodological innovations: native speakers of each language read eight ambiguous target sentences aloud without any prior practice or preview of the materials. After reading, speakers answered questions on attachment. The study confirmed the typical finding of low attachment in English and high attachment in Spanish; yet the most common default prosodic pattern in the two languages was the same: CNP + RC. Interestingly, the choice of high versus low adjunction did not vary much as a function of the default phrasing patterns, which suggests that the prosodic boundaries that participants produced did not affect their attachment choices. A later study by Jun (2010) replicated this pattern of findings and introduced important considerations on the notion of default prosody and “out-of-the-blue reading” (a critical methodological condition, as it turned out). Fodor and
her colleagues had surmised that the prosody imposed during silent reading is the same as overt prosody, and used reading-aloud prosody to test the IPH (e.g., Fodor 2002; for an interesting electrophysiological (ERP) study on silent reading, see Drury et al. 2016). In fact, we seem to be dealing with three (not just two) different kinds of ‘prosody’: that of (a) natural spontaneous speech; (b) reading aloud; (c) silent reading. And the second type (reading aloud) may actually change depending on whether reading is “out of the blue” or previous skimming is allowed. As Jun (2010: 1221) observes, contrary to spontaneous natural prosody, in the case of reading aloud out-of-the-blue, the text is already given and the speaker is not trying to emphasise or disambiguate any meaning. That is, reading aloud out-of-the-blue would generate “surface” prosody following only the roughest and simplest prosody-syntax mapping constraints, including phonological weight, i.e., rhythm and length (e.g., a break when incoming material is long …) and structural constraints (e.g., a break before a major syntactic constituent …). On the other hand, silent reading during a processing experiment would not generate surface prosody. In general, in a processing experiment where RC attachment is tested off-line, subjects read the questionnaire silently but reading time is not limited (though they are encouraged to answer as quickly as possible). The subjects have the opportunity to read the text multiple times, and as they are trying to answer questions regarding RC attachment, they are presumably reading at a deeper level. When reading aloud out-of-the-blue, especially when reading without first skimming the sentence in detail and when the speakers are aware that their reading is recorded, the subjects’ focus would be on “fluent reading”, trying to make the fewest possible mistakes.
In this case, a “safe” way to read, or good “performance” of reading aloud, would be to produce each content word prominently, thus putting pitch accent on every content word, and to put a prosodic break more frequently, following phonological and structural constraints (…). Furthermore, in no-skim reading aloud, the speakers would not be able to prosodify the entire sentence at the beginning. Instead, they would prosodify a few words, as they read along, within their reading span. It has long been known that silent reading is faster than reading aloud (Taylor 1965). Note too that, as Jun observes (p. 1222), two NPs together carry enough phonological weight to compose a phonological phrase of their own, and that the appearance of a relativizer (that, which, who, etc.) announces the appearance of a clause. This suggests that, in out-of-the-blue reading, the most common default prosodic chunking of the [NP1 NP2 RC] construction cross-linguistically ought to be [NP1 + NP2] + [RC], that is, one that separates the CNP from the modifying clause. Cumulatively, all these considerations do seem to cast some doubt on
the merits of the Implicit Prosody Hypothesis, as well as on the associated practice of gauging implicit prosody via overt prosody that is not produced out of the blue. The question remains unresolved. A final issue regarding silent prosody merits attention here. In two studies, Swets et al. (2007) analysed the effects of domain-general and domain-specific (verbal) working memory on attachment decisions in offline questionnaires in both English and Dutch. After carefully measuring the participants’ memory spans, they found in their first study that low working memory readers were more likely to use high attachment strategies than high span ones.19 They interpreted this finding as being inconsistent with the predictions of locality-based theories (e.g., LC). Further analyses revealed that both domain-specific (verbal) and domain-general working memory were involved in the effect. In their second study, they examined the role that segmentation strategies play in accounting for the “counterintuitive” finding of experiment 1 (readers with poorer memories opting for anti-recency co-indexations). Specifically, they found that when the sentences were displayed in full, in both Dutch and English, the lower the readers’ working memory, the more likely they were to opt for high attachment. They interpreted this finding to mean that readers with low spans have more of a need to partition large segments of text, which leads to high attachment. That is, these readers tended to put a prosodic break between the CNP and the RC, whereas high span readers seemed to treat the entire CNP + RC segment as a unit. When the presentation of the materials was split into three lines (CNP + RC + main clause VP), high attachment increased substantially for both English and Dutch readers independently of their working memory profiles (presumably, memory was less taxed in this experimental setup). Remember that these are all offline tasks.
Traxler (2007, 2009) provides online eye-tracking data that complicate the picture. For instance, in Traxler (2007), when the test sentences were displayed in full, it was the subjects with higher working memory who preferred high attachment. This is the opposite of what Swets et al. found offline. However, when the sentences were split in two (two lines: CNP + RC and matrix VP), readers opted for high attachment independently of their working memory span, as in Swets et al.’s study. It is difficult to make sense of findings that mix methodologies, materials and languages, but a conservative conclusion is that segmentation changes do elicit changes in silent prosody and that these affect attachment strategies, as first suggested by Gilboy and Sopena (1996). As Fromont et al. (2017) observe, there is general agreement that prosodic breaks act as boundaries (Wagner & Watson 2010) and that, clearly, a break after NP1 favors LC. But accounts differ as to whether intonation breaks act as grouping cues (Clifton et al. 2002) or as separation cues (Watson & Gibson 2005). It may even be the case that some breaks have a stronger effect than others, and that all this interacts with other constraints, making their effectiveness more or less variable. It remains to be seen how precisely implicit prosody, overt prosody and out-of-the-blue overt
prosody work both intra-linguistically and cross-linguistically. But at least we now know that all three are players in the game.

2.3.5 Going for the meaning directly: the role of lexical semantics, coherence and reference

Until now, all accounts of the RC adjunction ambiguity have appealed to principles of various kinds. It is time to consider research and accounts that place the emphasis of language comprehension directly on the comprehension of what is meant, without geometrical, actuarial, grammatical or prosodic filters. There are, of course, many different ways in which conceptual structure can potentially guide interpretation. This is because conceptual structure is a very broad notion. It may be taken to do all of the work, with syntax playing little or no part (cf. the old adage: you don’t need a roadmap if signposts are clear), or a large part of the work in competition with other factors. Given that the functional job of a restrictive relative clause is to narrow down the range of referents of a preceding noun (in the children who passed the exam will leave before all others, not all children are supposed to have passed the exam), at a minimum we need to assess the role of context, since it is context that often provides the background needed to activate referential competition. Before that, it is as well to briefly recall experiments aimed at examining the role of lexical semantics. In section 2.3.2, when we discussed Tuning and the coarse versus fine grain problem, we saw that the problem had a final outcome: the victory of the fine grain. This meant that the ‘notions’ that an actuarial processor uses to grapple with the RC attachment ambiguity are not syntactic or geometrical in nature (or at least not only), but lexical too. One prominent lexical feature, of an eminently semantic flavor, was animacy. Recall that Desmet et al.
(2002) found that when the first NP in the CNP coded an animate referent, adjunction to that site was very strong, particularly if the second NP coded an inanimate one.20 In Dutch, it appears that the animacy manipulation could go only so far (since animate NP2s did not drive adjunction low), but in Spanish its effect has been found to be somewhat stronger. In a self-paced reading study with the four relevant conditions (Animate + Animate, Animate + Inanimate, Inanimate + Animate and Inanimate + Inanimate), Acuña-Fariña et al. (2009) found that the general NP1 preference widely attested in Spanish disappeared precisely in the Inanimate + Animate condition. The authors actually registered a numerical trend towards the NP2 attachment site in that condition, which indicated that robust large-scale (coarse-grain-type) biases (Spanish being ‘generally’ an NP1 language) can be modulated by semantic information residing in the lexical items. When animacy occurred on the already preferred site (NP1), adjunction to that site was at its strongest. More recently, Kwon et al. (2019) studied Chinese and found that even though LC predominated over EC overall, animacy also plays a role in that language: when it is involved, RCs tend to seek animate NPs as hosts.

Interestingly, this tendency interacted with syntactic function. In their second experiment, the researchers manipulated both subject relatives (which, as noted, tend to prefer animate subjects in corpora) and object relatives (the opposite) and found that animacy effects extended only to subject relatives, not to object ones (see also Gennari & MacDonald 2008 for animacy effects in object relatives in English). In sum, none of Desmet et al. (2002, 2006), Acuña-Fariña et al. (2009) or Kwon et al. (2019), inter alia, showed RC adjunction to be insensitive to the meaning of the nouns that form the syntactic structure. Nor did they show that meaning was all there is to the resolution process. Recall too that Desmet et al. (2006) also obtained effects by manipulating the abstract versus concrete dimension, another classic feature of lexical semantics. Finally, in section 2.3.2 we recounted the rather surprising fact that nouns that conjure up strong emotionality (strong pleasure or displeasure, strong activation or lack of it) manage to draw the RC towards them in Spanish (Fraga et al. 2012). Although the results are not directly comparable, since that study used a completion task, this still underscores the potency of lexical effects on a presumably encapsulated process of constituent adjunction. Let us turn to context now. Very soon after the initial claims by GP theoreticians that Gricean factors had no role to play in initial decisions,21 work by Altmann and Steedman especially made it clear that this conclusion may have been premature. The pioneering work is Crain and Steedman (1985), who focused on the role that definite NPs play in syntactic ambiguity resolution. These researchers showed that when referential context was manipulated, even garden paths of the horse raced past the barn fell type could be avoided (say, if the previous discourse had introduced more than one horse into the scene). Ni et al.
(1996) and Sedivy (2002) also studied the main verb versus reduced relative ambiguity and were able to corroborate that underlying referential competition does actually affect its resolution. Altmann and Steedman (1988) showed that in sentences like the burglar blew open the safe with the new lock, which contain PP modifiers (with the new lock) that may attach to a VP (blew open) or to an NP (the safe; see Introduction), Minimal Attachment could actually be abandoned if the anti-MA reading is referentially supported. Thus, for instance, the sentence in question is read faster than a similar one like the burglar blew open the safe with the dynamite if more than one safe has first been introduced in the discourse context. Work by Van Berkum et al. (1999) using ERPs showed that structures such as Jane told the woman that …, where ‘that’ is ambiguous between a (preferred) complementizer and a (dispreferred) relativizer, were more likely to be construed as relatives in ambiguous contexts containing more than one woman than in unambiguous contexts containing only one. Interestingly, these authors found that within less than 300 ms after the onset of critical nouns, different brain signatures emerged in conditions involving one versus two referents, which seems to indicate that referential information is recruited very early on. It soon emerged that the processing of referential chains is affected by the local visual context as well, in such a way that a scene containing several objects (say two apples but just
one orange) activates expectations of what forms these referents are going to take linguistically, a fact that affects not only syntactic ambiguity resolution but also the processing of unambiguous discourse (Altmann & Kamide 1999).22 Although it is not our direct focus of interest here, it may be useful to mention that another pragmatic variable can also affect parsing choices: topicality. Hoeks et al. (2002) showed that the topic structure of a text influenced temporary ambiguities involving coordinated structures such as The model embraced the designer and the photographer at the party versus The model embraced the designer and the photographer laughed. It is well known that when reading such sentences in isolation, readers opt for a coordination of ‘the photographer’ and ‘the designer’, as in the first sentence. This leads to processing difficulty at the verb in the second one. Hoeks et al. showed that when the sentence was preceded by a context that contained two topical protagonists (When they met the fashion designer after the show, the model and the photographer were very enthusiastic), that processing difficulty disappeared. All these considerations helped to shape a view of parsing known as the Referential Theory, with two major, interacting principles, the Principle of Referential Support and the Principle of Parsimony:

Principle of Referential Support: An NP analysis that is referentially supported will be favoured over one that is not (Altmann & Steedman 1988: 4).

Principle of Parsimony: If there is a reading that carries fewer unsatisfied but consistent presuppositions than any other, then that reading will be adopted and the presuppositions in question will be incorporated in the perceiver’s mental model.
(Crain & Steedman 1985: 333)

The idea behind these principles is that certain discourse interpretation constraints (most notably those relating to the creation, establishment and maintenance of referential coherence) may start acting before the computation of a determinate syntactic structure is actually finalized. Thus, it is maintained that a definite NP that is restrictively postmodified, such as the servant (who is) on the balcony, not only presupposes the existence of a servant and a balcony but also triggers an extrasentential conversational implicature to the effect that another servant or servants are not on the balcony. This approach rests on pragmatic inferences according to which, should there be only one servant active in the scene, less coding material would have been chosen to refer to him/her, such as a pronoun or an unmodified definite NP (Givón 1993/I: 213 f.). Referential support and parsimony are to be applied in a manner consistent with a weakly interactive parallel parsing model that makes decisions (in parallel with a tree-forming device) at small grain intervals: a representation launched by a tree that is being activated may be stopped if referential information suggests a different interpretative path. A well-known slogan often used to refer to this view is: ‘Syntax
proposes, semantics disposes’ (see Chapter 5). This is meant to capture the idea that the parser incrementally generates an interpretation based on the continuous assessment of how the first-generated tree relates to the ongoing semantic/pragmatic analysis. Thus, language users quickly compute presuppositions and implicatures of partial input and use this extra-syntactic information to guide syntactic parsing decisions. Experimental validation of this theory has been the subject of much debate concerning issues of methodological design and statistical analysis. The crux of the debate is whether the referential information starts acting immediately, in parallel with the ‘pure’ parsing device, before that device, or a little after it. Grodner et al. (2005) showed that restrictive RCs are processed faster than non-restrictive ones in a supportive context, but more slowly in a null context (see also Fedorenko et al. 2012). We can safely conclude that reference matters in RC disambiguation. Cumulatively, the range of methodologies used (speeded grammaticality judgements (e.g., Crain & Steedman 1985), self-paced reading (Altmann & Steedman 1988), eye-tracking in reading (Garnham et al. 1997), eye-tracking in the visual world paradigm (Tanenhaus et al. 1995) and ERPs (Van Berkum et al. 1999)) makes it hard to conclude otherwise. Whether it matters in the more circumscribed case of the structure under analysis here is much less clear. Zagar et al. (1997) did not find evidence for referential interfacing in French, and Desmet et al. (2002) failed to find it in Dutch. Pan et al. (2014) studied both native English speakers and German and Chinese second-language learners of English in both an offline questionnaire comprehension task and a self-paced reading study. They found effects for both groups in the offline task, but in the online one the native speakers were unaffected by the manipulation of referential context information.
It is difficult to make sense of these failures. Indeed, it is certainly strange that reference tracking affects RCs when they are compared with complement structures, and restrictive RCs when they are compared with non-restrictive ones, but does not affect the intra-construction comparison (CNP + RCs with versus without supportive context), which would seem the ideal testing ground for the predicted effects. Admittedly, there is a scarcity of studies in this very particular area, so we cannot really be sure what to conclude. Let us end our brief summary of meaning-based effects in the processing of SRCs with a cursory look at a research line that examines mid-sentence coherence-driven biases, particularly how expectations about upcoming discourse coherence relations affect the resolution of a structural ambiguity. Rohde et al. (2011) were interested in how comprehenders draw the inferences that allow them to make sense of a situation. For instance, upon hearing statements such as those in (30):

(30) John detests his coworkers. They are arrogant and rude.

we typically do not treat the two statements as independent; instead, we tend to infer that John detests his coworkers because of their arrogance and rudeness.

This may be a problem for artificial intelligence systems, but seeing the implicit causality of the mini two-sentence discourse is mundane for a human processor. Consider how implicit causality affects the RC attachment ambiguity with reference to the sentence fragments in (31) and (32):

(31) John babysits the children of the musician who …
(32) John detests the children of the musician who …

We know that in English low attachment generally predominates. Given that, we can ask what we would expect to happen if comprehenders are able to utilize (…) pragmatic knowledge when making a syntactic attachment decision. If IC [implicit causality] verbs like detest generate a greater-than-usual expectation for an ensuing explanation (as compared to non-IC verbs like babysit in [31], for example), and comprehenders are implicitly aware that RCs can describe such an explanation, and this explanation is likely to be about the direct object, then we might expect a greater bias for the RC to attach to the direct object in [32] than in [31], which, crucially, is the high attachment point for the RC. (Rohde et al. 2011: 4)

To test this, Rohde et al. conducted both an offline completion study and a self-paced reading study. In the completion experiment, the habitual LC preference for English emerged clearly in the non-implicit causality condition. However, more high attachment choices were registered in the implicit causality condition in cases where “(i) the verb’s causally implicated referent occupies the high-attachment position and (ii) the relative clause provides an explanation for the event described by the matrix clause”. The self-paced reading experiment yielded a similar pattern of results: more high attachment in the implicit causality context (note: before the appearance of any kind of linguistic evidence that the RC may in fact support an explanation), and LC in all other cases.
The authors point out that their experiments are “the first demonstration that expectations about ensuing discourse coherence relationships can elicit full reversals in syntactic attachment preferences” (p. 33). In fact, they maintain that their results show that such pragmatic inferences can affect online disambiguation processes as fast as lexical and morphosyntactic constraints (on expectations, see Levy 2008a and Chapter 5). Overall, it seems that both lexical meaning and Gricean principles of various kinds are swiftly at work in the interpretation processes that are necessary to link restrictive SRCs to their hosts, although how and when the latter enter the scene in the particular case of CNP + RCs is still an open question. This is not, in any case, a major point of contention among models at this stage, especially since GP morphed into Construal and allowed such principles to affect (a subset of) non-primary constituents. Models apart, it is surely interesting to find out that
a geometrical problem of constituent adjunction should be resolved strategically rather than by reference to strictly geometrical properties of the trees.

2.3.6 All at once in one stage: the Constraint Satisfaction Approach

Starting in the late 1980s, as a reaction to the syntactocentrism of so-called principle-grounded models like GP and coarse-grain Tuning, a series of approaches to the puzzle of human language comprehension came to share a number of important views. They stressed that linguistic structure is not just syntax, and that the lexicon is in fact a much better source of information for dealing with both unproblematic language comprehension and syntactic ambiguity resolution. Regarding the latter, they made the point that syntactic ambiguity very often results from ambiguity at the lexical level, and that resolving it involves a process that addresses multiple levels of representation. These researchers maintained that all those levels are immediately consulted and used in parallel by the parser, in a single stage, whenever useful. The idea is that all possible constraints are activated and ranked according to the strength of their relative activations. This contrasts sharply with the two-stage views already examined here, according to which a savvy, generally lucky processor manages at a first stage to use a syntactic sieve to filter out from the search everything that is not word category information and a preferred tree, and then considers everything else during a more or less undifferentiated second stage. Different constraint-satisfaction (henceforth also CS) models may differ in the nitty-gritty, but they all share the idea that a sizable number of important cues are simply too useful for a parser to put on hold while a determinate syntactic tree is being deployed in full (for a review of the main models, see McRae & Matsuki 2013). As noted, the lexicon takes centre stage, and it does so in various ways. Take lexically specific syntactic biases. For instance, it appears that a verb such as search is used rather infrequently as a passive participle, whereas select occurs in that guise much more often.
Work by Trueswell et al. (1993), MacDonald et al. (1994) and Trueswell (1996), among others, showed that such relative frequencies of specific verbs affect the resolution of the main clause/reduced relative ambiguity (e.g., the witness examined … versus the evidence examined; see also the Introduction). Subcategorization frames are of course optimal candidates to be studied in this context too. By subcategorization frame is meant the kind of complement structure a predicate may take. For instance, the verb tell can appear in an NP V NP frame (John told the news), an NP V NP INF frame (John told Jane to go), an NP V NP THAT frame (John told Jane that she was great), etc. In two online experiments (eye-tracking and self-paced reading), Garnsey et al. (1997) found evidence of their biasing role. They focused on the temporary ambiguity generated when an NP following a verb may be either its direct object or the subject of a subsequent embedded clause (Amanda believed the senator steadfastly/was guilty …). The results indicated that verb bias was swiftly used to resolve the
ambiguity (they also showed that verb bias and plausibility interacted throughout the process). Take selectional restrictions. Altmann and Kamide (1999) examined participants’ eye movements as they looked at a scene showing a boy, a cake and various distractor objects (none of them edible). While viewing the scene, they heard sentences such as ‘the boy will move the cake’ or ‘the boy will eat the cake’. The onset of eye movements to the target object (the cake) was measured. It turned out that saccades to the target were made after the onset of the spoken word cake in the move condition, but before its onset in the eat condition. These results suggest that “information at the verb can be used to restrict the domain within the context to which subsequent reference will be made by the (as yet unencountered) post-verbal grammatical object”. The authors underscored the view that “sentence processing is driven by the predictive relationships between verbs, their syntactic arguments, and the real-world contexts in which they occur”. Finally, transitivity (whether a verb takes an NP complement versus a clausal complement) and verb sense also yielded positive results in various other experiments (Garnsey et al. 1997; Staub 2007; Hare et al. 2004). Naturally, all these different lexical biases are crucially tied to frequency tallies (which brings corpus studies to the fore, as with Tuning). Constraint satisfaction works through interactions, and although these could potentially be very numerous, lexical biases and semantic plausibility (e.g., verb sense, verb subcategorization preferences, selectional restrictions, etc.) narrow down their number without any need to wait for syntax to materialize in full. Consider how McRae and Matsuki (2013: 2) explain the way an interactive processor of this kind might deal with horses racing past barns in the comparison between (33) and (34) below:

(33) The horse raced past the barn fell.
(34) The landmine buried in the sand exploded.

Sentence [33] causes a great deal of difficulty because all of the initial cues point toward the incorrect main clause interpretation. Consider the moment when someone has read or heard up to raced. People’s real-world knowledge about horses includes the fact that they often race, and thus horse is a great agent of raced. Although raced is ambiguous between a past tense and passive participle reading, it is usually used as a past tense verb. Therefore, comprehenders are likely to interpret the initial portion of [33] as if it will continue as a main clause, although it does not. In addition, the main clause reading carries smoothly through the prepositional phrase (PP, past the barn) because raced can be used intransitively (without a direct object, DO), and a horse racing past a barn is a plausible event. Furthermore, there is no context that contains multiple horses that might pragmatically be distinguished using a reduced relative (i.e., picking out the one that was raced past the barn). Therefore, even after barn is read or heard, it is very difficult to reject the main clause reading and to correctly interpret the temporarily
ambiguous reduced relative. The main verb fell syntactically disambiguates [33] as having contained a reduced relative, but the sentence remains difficult to comprehend even at this point due to the strong constraints that all work together to cue the incorrect interpretation. On the other hand, sentence [34] is quite easy to understand because the constraints point to a reduced relative interpretation. Because landmines do not bury things, landmine is a terrible agent for buried. Also, landmines are typically buried, and thus are a great fit as a patient. Furthermore, although buried is ambiguous, it is used more frequently as a passive participle than as a past tense verb. Thus, all of these cues support the reduced relative reading, even at the initial verb.

As can be seen, CS leaves room for anticipation or expectation processes to actively foster interpretations as well (Levy 2008a). In sum, apart from classic Bever-ian syntactic heuristics (themselves subject to frequency-based activation), a CS approach may include probabilistic counts of “lexically-specific syntactic information, word meaning, selectional restrictions of verbs, knowledge of common events, contextual pragmatic biases, intonation and prosody of speech, and other types of information gleaned from intra-sentential and extra-sentential context, including both linguistic and visual contexts” (McRae & Matsuki 2013: 3). The model subsumes the whole of the Referential Theory discussed in the previous section, since reference (a meaning factor) is taken to be a major constraint. Real processing difficulty only arises when two or more constraints have approximately equal activation, resulting in tough competition (MacDonald 1994) or even “competitive gridlock” (Levy 2008a). CS has been shown to combine well with various computationally implemented Parallel Distributed Processing architectures of the mind (Spivey-Knowlton 1994; Spivey & Tanenhaus 1998; McRae et al.
1998), which are, in fact, an attempt to grapple with the model’s most widely cited weakness, namely their open-endedness. Indeed, from the moment of their inception, all forms of CS research have been dubbed vague, underspecifed and unfalsifable (Frazier 1995). Although, beyond passing attention, the structure particularly under analysis here appeared not to have been seriously studied (with most work on relatives devoted instead to comparing biases involving SRCs versus other, diferent structures, such as complement that-clauses), it is not too difcult to visualize the CS approach to it. The prediction is twofold: frst, there should be more than one factor at play, instead of either LC or course grain frequency, or preposition type, or referential support, etc.; second, the diferent factors are expected to interact. In truth, seen from the present times, it is hard not to concede the frst point straight of. The literature we have reviewed so far has ofered at least partial support for the following factors/constraints (note that, as we have seen, some of these may actually yield further subconstraints, which we ignore now):


1. Locality: Late Closure
2. Relativized Reference: Early Closure
3. Coarse-grain frequency: the frequency of entire syntactic templates
4. Fine-grain frequency: the frequency of lexical properties, such as animacy or concreteness
5. The kind of preposition: thematic versus case-assigner
6. Silent prosodic chunking
7. Short-term memory
8. Referential anchoring

To these we may now add:

9. Anaphor resolution. Hemforth et al. (2000) and Delle Luche et al. (2006) made the point that, in parallel with adjunction proper, [CNP + RC]s contain relative pronouns that trigger a process of anaphor resolution (deciding what a pronoun refers to), and that the anaphoric co-indexation of the pronoun is thus open to manipulations of both focus and visibility (in English, for instance, relative pronouns can often be dropped). In their view, this explains the difference between RC and PP attachment.

10. The lexical frequency of the nouns involved. Pynte and Colonna (2000) have shown that, in French, when the first NP contains a noun of lower frequency than the noun in NP2, the RC is more likely to be attached high, to NP1; and vice versa: LC is favoured when it is the second noun that is less frequent.

11. The restrictive/non-restrictive nature of the RC. Baccino et al. (2000) point out that the sturdy EC preference found in Frenck-Mestre and Pynte's (2000a, 2000b) French and Italian sentences is unduly affected by the authors' failure to control the restrictive/non-restrictive dimension of the RCs in them. Since the French team had used names in the NP2 slot, an NP1 bias was created (as restrictive RCs cannot easily modify proper nouns: */?did you see the John that was there?).

12. Grammatical number. Following up on research on agreement errors (see chapter 3), Deevy (2000) studied the way a plural NP in the second slot interfered with the adjunction process. She compared structures like the niece of the actors who was … to others like the nieces of the actor who was …, and found interference (longer RTs) only in the former. Since plural NPs are often considered to be marked, Deevy accounts for her finding as follows: assuming Construal postulates, since the RC is not initially attached to a host, it remains in a kind of hovering phase.
This prevents the verb in the RC from having its plural feature checked swiftly until the RC is finally adjoined. When the RC verb is plural, no interference is expected, as the nearby singular feature in the NP2 need not be checked. Independently of this elaborate explanation, de Baecke et al. (2000) found in a corpus study of Dutch that plural NPs attract adjunction of the RC.


13. Predicate Proximity. According to Gibson et al. (1996), who studied three-site CNPs like the changes of the orbits of the planets that …, attachments are easier the closer they are structurally to the head of a predicate phrase (normally a VP). This principle is meant to interact with LC cross-linguistically in motivated ways. For instance, it is supposed to be weak in English but strong in Spanish, a difference derivable from the distance between arguments and the predicate in each language: a larger distance (e.g., in Spanish) results in the need for stronger predicate activation (predicate proximity). Lacking rich agreement, English is supposed to opt for local bindings (recency) all the way through.

This is not an exhaustive list, and not all the parameters seem to carry the same weight in the system, but this is precisely the point: they are not meant to. Instead, they are meant to affect syntactic attachments when enough of them militate against a major parameter such as LC (they "gang up on" it; Phillips & Gibson 1997). Their interactive nature is also quite apparent, at least on the surface. For instance, we have seen some threads of evidence that prosodic chunking interacts with memory, and that silent reading interacts with the task. We have just referred to Gibson et al.'s (1996) theory that the choice of predicate proximity versus LC may interact with general language type. In a visual world eye-tracking experiment, Nakamura et al. (2012) investigated the influence of contrastive intonation and visual context on the processing of temporarily ambiguous relative clauses in Japanese. They found evidence that participants used the prosodic cue to make structural predictions before hearing disambiguating information, but only when the visual scene provided an appropriate context for the cue.
Zahn and Scheepers (2015) crossed prosody and plausibility in an English study and showed that the two types of cue "interact in a complex way, suggesting that (a) the amount of surprisal associated with cueing a generally dispreferred structure and (b) the type of revision necessary to resolve the ambiguity both play a major role in determining relative clause attachments". It seems reasonable to see prosody as a weak player made strong in favourable circumstances. Finally, the skewed effects of the animacy manipulations in Desmet et al. (2002, 2006) and Acuña-Fariña et al. (2009) indicated that animacy interacted more with general language type (NP1 or NP2 preference at large) in Spanish than in Dutch, and Kwon et al.'s (2019) animacy manipulation clearly interacted with syntactic function.23
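To make the idea of constraints competing until one interpretation wins a little more concrete, here is a minimal sketch in the spirit of the normalized-recurrence competition used in some implemented CS models (e.g., McRae et al. 1998). It is not a reproduction of any published model: the constraint names, weights, support values and the fixed stopping criterion are hypothetical choices for illustration. The number of cycles needed to settle is treated as a proxy for processing difficulty.

```python
from typing import Dict, List, Tuple


def normalized_recurrence(support: Dict[str, List[float]],
                          weights: Dict[str, float],
                          criterion: float = 0.9,
                          max_cycles: int = 1000) -> Tuple[int, List[float]]:
    """Let constraints compete over alternative interpretations.

    support[c][a]: raw support that constraint c lends to interpretation a.
    weights[c]:    importance of constraint c (weights should sum to 1).
    Returns the number of cycles needed for one interpretation to reach
    `criterion` (a difficulty proxy) and the final interpretation activations.
    """
    s = {c: list(vals) for c, vals in support.items()}  # copy the caller's data
    n_alt = len(next(iter(s.values())))
    interp = [0.0] * n_alt
    for cycle in range(1, max_cycles + 1):
        # 1. Normalize each constraint's support so it sums to 1 across alternatives.
        for c in s:
            total = sum(s[c])
            s[c] = [v / total for v in s[c]]
        # 2. Integration: each interpretation collects weighted support.
        interp = [sum(weights[c] * s[c][a] for c in s) for a in range(n_alt)]
        if max(interp) >= criterion:
            return cycle, interp
        # 3. Feedback: interpretations strengthen the constraints that back them.
        for c in s:
            s[c] = [s[c][a] * (1 + weights[c] * interp[a]) for a in range(n_alt)]
    return max_cycles, interp


# Hypothetical support for [NP1 attachment, NP2 attachment].
weights = {"locality": 0.5, "frequency": 0.3, "prosody": 0.2}
converging = {"locality": [0.2, 0.8], "frequency": [0.3, 0.7], "prosody": [0.4, 0.6]}
conflicting = {"locality": [0.2, 0.8], "frequency": [0.7, 0.3], "prosody": [0.6, 0.4]}

fast, _ = normalized_recurrence(converging, weights)
slow, _ = normalized_recurrence(conflicting, weights)
print(fast, slow)  # the conflicting set needs more cycles to settle
```

When every constraint splits its support exactly evenly, the loop never settles and simply runs out of cycles, a toy analogue of the "competitive gridlock" mentioned above.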

2.3.7 The Unrestricted Race Model (URM) and the ambiguity advantage effect (AAE)

Work by Traxler et al. (1998), Van Gompel et al. (2001) and Van Gompel et al. (2005) uncovered an interesting phenomenon affecting disambiguation, including the disambiguation of RCs: the so-called ambiguity advantage effect (AAE). The effect is surprising a priori: globally ambiguous structures turn out to be easier to process than unambiguous ones. The pioneering work was Traxler, Pickering and Clifton's (1998) eye-tracking study, which contained materials like those in (35)–(37):

(35) The son of the driver that had the moustache was pretty cool. (ambiguous)
(36) The car of the driver that had the moustache was pretty cool. (NP2 attachment)
(37) The driver of the car that had the moustache was pretty cool. (NP1 attachment)

In the first experiment of that study, total times on the critical word (moustache) were shorter in the ambiguous condition than in both disambiguated conditions, which did not differ from each other. In two further eye-tracking experiments with, among others, structures like (38)–(40) below, they obtained clearer data in the form of earlier measures: first-pass regressions out of the region following the critical word (on the balcony), suggesting that difficulty emerged soon after the disambiguating word was read. More regressions from this region occurred when the RC had to attach to NP1 (40) than when the sentence was globally ambiguous (38). A similar result was obtained in the analysis of total times on the critical region himself/herself.

(38) The brother of the colonel who shot himself on the balcony had been very depressed. (ambiguous)
(39) The daughter of the colonel who shot himself on the balcony had been very depressed. (NP2 attachment)
(40) The daughter of the colonel who shot herself on the balcony had been very depressed. (NP1 attachment)24

These data led to the formulation of a new model, a sort of hybrid approach that aimed to bridge the gap between serial, two-stage models and massively interactive parallel models of the constraint satisfaction kind. This is the so-called Unrestricted Race Model (URM):
As in constraint-based theories, there is no restriction on the sources of information that can provide support for the different analyses of an ambiguous structure; hence, it is unrestricted.
In the model, the alternative structures of a syntactic ambiguity are engaged in a race, with the structure that is constructed fastest being adopted. The more sources of information support a syntactic analysis and the stronger this support is, the more likely this analysis will be constructed frst. The model claims that when the various sources of information strongly favour one analysis over its alternative (a biased ambiguity), this analysis will nearly always be adopted. In contrast, when two analyses are about equally preferred (a balanced ambiguity), each analysis will be adopted about half the time. A weak bias, however, might lead to one analysis being adopted most, but not all of the time. This is one way the unrestricted race model can account
for gradations in garden-path effects (…). In contrast to constraint-based theories, only one analysis is constructed at a time. Because only a single analysis is available at any time, reanalysis may sometimes be necessary if information following the initial analysis is inconsistent with it. Thus, the unrestricted race model is a two-stage reanalysis model. (Van Gompel et al. 2001: 226)

Although the URM is a stochastic, serial parsing model which posits single parses that are selected probabilistically, it claims that the use of non-syntactic information (like plausibility) depends on when this information actually becomes available. If it is available at or after the point where the ambiguity presents itself, it will not drive the initial parse. But if it is present before the disambiguating point, then it can, and surely will, be used immediately, which means that the resulting predictions would not differ from those of constraint-based theories. The URM thus confronts the so-called now-or-never bottleneck (Christiansen & Chater 2016) by positing that at each timestep the processor immediately extracts as much information from the linguistic signal as it can (see Futrell et al. 2020). Its proponents claim that their data directly refute the idea of competition arising from the parallel juggling of competing alternatives.

Later work has qualified this deterministic interpretation. For instance, Levy (2008a, 2008b) uses surprisal theory to account for the same range of facts. The idea is that when there are two compatible attachment sites, upcoming words are more easily predicted from context. Thus, for the contrasts in (35)–(37), the critical word moustache in (35) has a higher conditional probability in a globally ambiguous structure, and hence a lower surprisal value, which translates into processing ease. Conversely, the same word is less predictable in both unambiguous conditions because it is compatible with only one attachment route.
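The surprisal reasoning can be illustrated with a toy calculation. Surprisal is the negative log probability of a word given the preceding context; the sketch below, assuming that this probability is obtained by summing over the two possible attachment hosts, shows why a word compatible with both hosts is more predictable. All probabilities are invented for illustration and are not estimates from Levy's work or from any corpus.

```python
import math
from typing import Sequence


def surprisal_bits(p: float) -> float:
    """Surprisal of an event with probability p, in bits: -log2 p."""
    return -math.log2(p)


def word_probability(attach_probs: Sequence[float],
                     word_given_site: Sequence[float]) -> float:
    """P(word | prefix), marginalizing over the possible attachment sites."""
    return sum(p * q for p, q in zip(attach_probs, word_given_site))


# Hypothetical numbers for the critical word 'moustache' in (35)-(37).
attach = [0.5, 0.5]                  # P(NP1 host), P(NP2 host) before the word
plausible, implausible = 0.02, 1e-6  # P(word | host): a person vs a car

p_ambiguous = word_probability(attach, [plausible, plausible])   # (35) son / driver
p_np2_only = word_probability(attach, [implausible, plausible])  # (36) car / driver
p_np1_only = word_probability(attach, [plausible, implausible])  # (37) driver / car

for label, p in [("(35)", p_ambiguous), ("(36)", p_np2_only), ("(37)", p_np1_only)]:
    print(label, round(surprisal_bits(p), 2))
# (35) 5.64, (36) 6.64, (37) 6.64
```

With these made-up values, the ambiguous condition is about one bit less surprising than either disambiguated condition, which is the direction of the AAE.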
Swets et al. (2008) provide an explanation in terms of strategic underspecification, in the spirit of good-enough processing models (Ferreira 2003; Karimi & Ferreira 2016) and also broadly compatible with surprisal. The idea is that comprehenders build underspecified representations if the task does not require ambiguity resolution. Since participants in the Traxler et al. experiments did not have to answer questions about RC attachment, ambiguity resolution did not actually take place. This line of reasoning predicts that the AAE should vanish when participants expect to be asked about attachment, and this is what Swets et al. found in their experiments: no AAE (see Logačev & Vasishth 2016 for qualifications to this view). On balance, the least one can say about the URM is that it is genuinely realistic: it does not maintain the counter-intuitive idea that a lot of information is put on hold without need (like classic two-stage accounts), and it upholds the view that massive competition probably does not occur either (contra one-stage accounts). It does this by envisaging a parser that incrementally uses whatever it has at hand to construct a processing path and is simply used to doing mild reprocessing quite often.
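The race-plus-reanalysis mechanism in the Van Gompel et al. (2001) quotation can be sketched as a small simulation. Assumptions to flag: construction times are drawn from exponential distributions (the model itself does not commit to a distribution), and the mean times are hypothetical numbers standing in for how strongly the available information supports each analysis. The sketch shows one way the URM can derive the ambiguity advantage: in a globally ambiguous sentence whichever analysis wins is compatible with what follows, so reanalysis is never needed.

```python
import random
from typing import Sequence, Set


def race_winner(rng: random.Random, mean_times: Sequence[float]) -> int:
    """One trial: each analysis gets a random construction time; the fastest wins."""
    times = [rng.expovariate(1.0 / m) for m in mean_times]
    return min(range(len(times)), key=times.__getitem__)


def reanalysis_rate(mean_times: Sequence[float], compatible: Set[int],
                    n_trials: int = 20000, seed: int = 1) -> float:
    """Share of trials whose first-built analysis clashes with later material.

    compatible holds the analyses consistent with the disambiguating region;
    for a globally ambiguous sentence it contains every analysis.
    """
    rng = random.Random(seed)
    clashes = sum(race_winner(rng, mean_times) not in compatible
                  for _ in range(n_trials))
    return clashes / n_trials


balanced = [100.0, 100.0]  # hypothetical mean times (ms) for [NP1, NP2] analyses
biased = [50.0, 150.0]     # NP1 analysis better supported, so built faster on average

print(reanalysis_rate(balanced, {0, 1}))  # globally ambiguous, as in (35)/(38): 0.0
print(reanalysis_rate(balanced, {0}))     # disambiguated towards NP1: roughly 0.5
print(reanalysis_rate(biased, {0}))       # biased race, NP1 forced: roughly 0.25
```

Reading time at the disambiguating region could then be modelled, under a further assumption, as a baseline plus a fixed reanalysis cost multiplied by this rate; the balanced case also reproduces the quotation's claim that each analysis is adopted about half the time.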


2.3.8 Grillo and Costa (2014): the Pseudo Relative confound

The last major innovation to be examined here is the idea, first introduced in Grillo (2012) and fully developed in Grillo and Costa (2014) in an important Cognition paper, that much previous research on the construction under analysis reached incorrect conclusions because it did not control for an important confound: the coexistence in many (but not all) languages of a structure that is "string-identical" with RCs, the so-called pseudo-relative (henceforth PR). Thus, according to Grillo and Costa, in Spanish, for instance, a sentence such as (41) is actually ambiguous between a PR reading and two RC readings:

(41) a. Vi al hijo del médico que corría. 'I saw the SON of the doctor who ran'. The son ran.
b. Vi al hijo del médico que corría. 'I saw the son of the DOCTOR who ran'. The doctor ran.
c. Vi al hijo del médico que corría. 'I saw the son of the doctor running'. The son ran.

As the glosses suggest, PRs as in (41c) are structurally equivalent to English small clauses (SCs) of the eventive kind (Cinque 1992). The simplified trees in Figures 2.3 and 2.4 capture the difference.

[Figure 2.3: simplified tree for the RC reading, I saw the son of the doctor who ran (node labels: VP, V, NP, CNP, N1, N2, RC)]

[Figure 2.4: simplified tree for the PR/SC reading, I saw the son of the doctor running (node labels: VP, V, NP, SC, CNP, VP)]

As can be seen, in Figure 2.3 the verb see projects an NP object argument, which is in fact a CNP containing a head noun and a modifying one. The RC that follows may modify either; hence the ambiguity. Thus, "at the interpretive level this maps onto the perception of an entity/individual having certain additional restrictions specified in the RC" (I did not see any son or any doctor, only the one who ran). In Figure 2.4, the verb see takes the entire small clause as its complement; inside that SC, the complex NP the son of the doctor is the subject. Again, "at the interpretive level this maps onto the perception of an event" (I saw something happening, namely somebody running; p. 162).25 According to Grillo and Costa, the confound has important consequences for the way we see the entire field of studies on the parsing of natural languages. Researchers had naturally assumed that RCs, being conceptually easy to grasp, were identical across languages, so when the first evidence of cross-linguistic variation came out, many placed the burden of accounting for it on the parser, undermining the idea of a universal parsing mechanism. However, the "assumption of identity is wrong" for "English that and Spanish (or Italian/French/Dutch) que/che/qui/die are not syntactically identical" (p. 161). This seems relatively straightforward, as subordinators and/or complementizers have rather idiosyncratic historical origins across languages, and it is probably these historical differences that resulted in the gradual development of subtly different behavioral profiles (Grillo and Costa briefly mention the generative notions of subjacency effects and that-trace effects, where cross-linguistic differences have long been attested and described). PRs and RCs differ in a number of respects, with verb type and tense being conspicuous among them. Thus, for a start, PRs are strongly associated with a particular class of predicates, namely perception verbs, and while these can project both events and entities, most other predicates only introduce entities.
As for tense, this is "anaphoric" in PRs, as the act of perception in the matrix clause and the event coded in the SC happen at the same time. RCs are not constrained at all in that respect (e.g., I used to love the daughter of the general who will succeed my father from tomorrow on). Another notable difference between PRs and RCs is that the former
can attach after names (e.g., in Italian Ho visto Gianni che correva 'I saw Gianni running') but the latter cannot, unless non-restrictive modification is involved (I saw John, who by the way was running at the time; *I saw John who was running). Grillo and Costa claim that PRs are structurally simpler than RCs, which is why a parsing device shaped to adhere blindly to computational economy will always prefer them to RCs as a first choice, other things being equal. As evidence of that simplicity they maintain that: (a) both the syntax and the semantics of PRs are "impoverished" relative to those of RCs; (b) tense is dependent in PRs but "referential" in RCs; (c) "PRs stand in a sisterhood relation with the head NP, while RCs are embedded within the same NP, making the RC an arguably more complex configuration"; (d) PRs code information that is more relevant for the main assertion of the clause, usually as arguments of the main clause predicate, whereas RCs are always dispensable adjuncts; (e) PRs require fewer presuppositions than RCs, as they do not involve any kind of context-sensitive referential competition. On this logic, they propose the PR-first Hypothesis, which rests on the following two generalizations:

A. Low Attachment preference is observed, across languages and structures, with genuine restrictive RCs, i.e., when PRs are not available.
B. High Attachment preference is observed in languages and structures which allow for a PR/SC reading (in contexts in which PRs are allowed by the grammar of each particular language).

The HA preference in languages with a PR/RC menu arises because PRs are chosen on grounds of simplicity/minimality, and PRs force HA. In this way, they claim, locality is preserved as a universal parsing mechanism.
They express this cautiously, though, in that locality is by no means meant to be the sole factor in play: Locality is a natural principle of economy of computation, whose universality and appeal are so strong that when apparent counter-examples to this universal principle are found, as in the RC-attachment literature at hand, a massive amount of work is rightly dedicated to explain their origins. We should underline that the universality of a principle does not imply that principle will always win over other factors such as e.g. referentiality. (…) several factors ultimately contribute to attachment selection and many of them can apparently override locality (see e.g. Altmann et al. (1998) on the effects of context on Late Closure). The factors external to syntax that potentially affect attachment are compatible with locality applying universally within syntax. The biggest concern arising from the residual cross-linguistic variation in the RC attachment literature is that it questioned the universality of locality, not that it showed that other factors could take priority over it. (p. 166; emphasis added)


In other parts of the paper, alongside referentiality, they also emphasize the role of prosody and the related dynamics of anaphor resolution. As evidence for their theory, they point out that a kind of meta-study they conducted on previous research shows almost perfect alignment between PR availability and attachment biases across languages. By way of new experimental evidence, besides the two studies on Italian in their influential paper, they mention, among others, evidence from Greek (Grillo & Spathas 2014), European Portuguese (Grillo et al. 2013) and Spanish (Aguilar & Grillo 2019), much of it reported at recent conferences. In Grillo et al. (2015) they showed that English can opt for Early Closure if sentences are globally ambiguous between an SC and a reduced RC interpretation. Pozniak et al. (2019) offer a recent account with new French and English data in much the same spirit.

Before closing, it is fair to point out that, despite its important overall contribution, the PR-first Hypothesis is not free of problems. As it stands, it rests on a blind association between language type (languages with a string-identical PR/RC confound versus those without one) and the type of preferred adjunction over some twenty years of testing (with vastly different amounts of evidence across the languages studied, as for many of them we have only a single study). However, without a detailed analysis of the actual materials used in all the experiments conducted up to Grillo and Costa (2014), it is hard to give the hypothesis all the credit it claims. We really do not know whether perception predicates were significantly present in many of them. Moreover, casting Spanish in the same net as Italian, French, etc. is questionable, as it is not completely clear that in saying (42),

(42) Vi al hijo de la profesora que estaba en la esquina 'I saw the son of the teacher who was in the corner'

native speakers of Spanish (a HA language) can really recover a PR reading as easily as native speakers of Italian can. Indeed, in Spanish only the construction with names seems to accommodate a PR reading naturally, and only in a distinctly conversational mode:

(43) Vi a Juan que venía 'I saw John coming'

The presumed relative simplicity of PRs over RCs is also suspect, and without an adequate measure of it the whole theory rests on unsafe ground. Grillo and Costa (2014: 167) invoke a new, peculiar form of Minimal Attachment to sustain their hypothesis: Whether the best way to capture Minimal Attachment is in terms of number of nodes, relative accessibility of the contextual representation associated with each alternative, or as a function of frequency/predictability of each parse, or even as a combination of these factors, is beyond the
scope of this work, and in many ways irrelevant to the point we [are] arguing for, especially since different approaches would probably converge on this prediction. What is relevant to the present point is that some principle akin to Minimal Attachment is at stake here. We argue that when a simpler option is available, restrictive relatives are not the preferred parse in the absence of a context supporting the relevant presupposition. (emphasis added)

But, independently of that, are PRs really so minimal compared to RCs? How precisely are their syntax and semantics "impoverished"? If the difference between I saw John coming and I saw that John came rests on direct perception and aspect, it might make sense to assert that the latter is simpler than the former, but the claim that SC-type structures are simpler than good old relatives is far from self-evident. If their tense is "anaphoric" and "dependent", then PRs need greater structural cohesion to be coded (since in order to code Tense 2 we need to go back and consider Tense 1). Statistically, an NP followed by an RC is run-of-the-mill, but an NP that is the subject of a clause whose predicate phrase starts with que/qui/che and actually has no verb (hence the label 'small clause') is objectively rare. Also, as the authors point out (footnote 16), most perception predicates subcategorize for various complement patterns. We really do not know whether juggling all the alternatives is cost-free, or how frequency affects the selective activation of competitors. These are the five argument structures for the predicate see that the authors recognize: (a) John saw Fred leave early, bare infinitive, direct perception; (b) John saw Fred leaving early, gerundive, direct perception; (c) John saw Fred owning a house, gerundive, imaginative; (d) John saw Fred to be a party-pooper, infinitive, belief; (e) John saw that Fred left early, finite clause, factive.
Additionally, Grillo and Costa insist that PRs code information that is more relevant for the main assertion of the clause, usually as arguments of the main clause, but they themselves are careful to point out that there are, in fact, at least three types of PR: complement SC, VP adjunct and NP adjunct. They also insist that the inner and outer aspect of the embedded verb are relevant to their constitution. A final point worth remembering concerns prosody. Even though Grillo and Costa admit that PRs "are also associated with different prosodic representations" and "require specific intonational phrasing" (p. 173), this is something of an understatement, for the very availability of PRs rests crucially on a very particular intonational contour. As the authors rightly point out for Italian, "PRs are compatible with the presence of a prosodic boundary placed in between NP2 and the che-clause (…); and incompatible with a boundary following NP1". In Spanish at least, without that contour the construction is practically impossible to invoke. In reading experiments, which make up the vast majority of the studies reported here, it remains to be seen how far the PR interpretation is actually activated without that special prosody. In a nutshell: prosody is more essential here than elsewhere. All in all, as Aguilar and Grillo
(2019) point out, in fact: “Availability of PRs is heavily restricted: PRs require perceivable eventive predicates, tense-match between matrix/embedded verb and imperfectivity. RCs, on the other hand, do not impose any restrictions on any of these variables”.
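Generalizations A and B, together with the availability conditions just quoted, amount to a simple decision rule, sketched below. The field names are hypothetical, and treating the perception-verb association and the three restrictions cited from Aguilar and Grillo (2019) as hard, all-or-nothing conditions is a simplification of ours; the sketch deliberately leaves out prosody, which, as argued above, may be decisive in practice.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical coding of one [NP1 of NP2 + clause] test sentence."""
    grammar_allows_prs: bool  # e.g. Italian che, Spanish que; not English that-relatives
    perception_verb: bool     # matrix predicate such as 'see'
    eventive: bool            # embedded predicate is a perceivable event
    tense_match: bool         # matrix and embedded tense coincide
    imperfective: bool        # embedded verb is imperfective


def pr_available(item: Item) -> bool:
    """A PR parse is licensed only if every condition holds at once."""
    return (item.grammar_allows_prs and item.perception_verb and item.eventive
            and item.tense_match and item.imperfective)


def pr_first_prediction(item: Item) -> str:
    """Generalization B if a PR parse is available, otherwise generalization A."""
    return "high (NP1)" if pr_available(item) else "low (NP2)"


# (41) Vi al hijo del médico que corría: perception verb, past/past, imperfective.
spanish_41 = Item(True, True, True, True, True)
# An English that-relative after the same verb: no string-identical PR.
english_rc = Item(False, True, True, True, True)
# A Spanish sentence with a non-perception matrix verb (hypothetical item).
spanish_no_perception = Item(True, False, True, True, True)

for item in (spanish_41, english_rc, spanish_no_perception):
    print(pr_first_prediction(item))
```

Coding actual experimental materials this way would be one route to the item-by-item check of perception predicates that, as noted above, the hypothesis still lacks.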

2.4 Conclusions

We can actually use Grillo and Costa's PR-first Hypothesis to draw some general conclusions about the range of studies examined in this chapter and their historical development. At a minimum, the PR-first Hypothesis adds yet another potential language-dependent constraint at work in the resolution of RC attachment ambiguities. The authors themselves emphasize other aspects (constraints, as understood here), such as referentiality and prosody, but they insist that locality is "syntax". This, however, is a moot point: first, because, as has already been pointed out, locality is a domain-general operation and not particularly grammatically sophisticated; second, because its modularity does not follow directly from the PR-first Hypothesis but seems, instead, a matter of theoretical aesthetics. It seems safe to conclude that locality matters, but in combination with other factors. And those other factors are many, as we have seen: locality and an at least superficially defined notion of structural minimality (section 2.3.1), frequency of both large syntactic templates (constructions for some) and of the lexical items that form these templates (section 2.3.2), a seemingly well-grounded (and yet seemingly fallible, performance-wise) distinction between complements and modifiers (section 2.3.3), prosody and implicit prosody (section 2.3.4), lexical semantics and the coherence relationships and predictions (such as implicit causality) that it entails, and referential tracking and the coherence relationships that it entails too (section 2.3.5), subcategorization biases, selectional restrictions, anaphor resolution, restrictiveness versus non-restrictiveness, and even general (agreement-rich versus agreement-poor) language type vis-à-vis distance to the predicate (section 2.3.6), strategic underspecification (section 2.3.7), and cross-linguistically variable pseudo-relative competition (section 2.3.8) … it all suggests that no magic bullet is in sight to resolve the puzzle of RC attachment ambiguities in all languages in one stroke. More than thirty years after Lyn Frazier's PhD thesis, one thing is clear: RC attachment is definitely much more than basic geometrical determinism. One may leave the contemplation of such a complex scenario with the feeling that language comprehension research on RCs has amounted to little more than collecting a bunch of facts showing that X can affect Y on at least some measure or other, and that anything is possible, as anything can be a 'constraint' that can 'interact' at any point in the decision-making process. Alternatively, one may decide to focus on the positive and cherish the amount of raw knowledge we have gained by augmenting our discovery procedures with experiments. In fact, the positives are almost too numerous to count. They simply compose a picture that is far more complex than cognitive scientists of magic bullet inclinations are willing to contemplate. In hindsight, the expectation of finding a magic
bullet in any domain that concerns language structure may now well be seen as naïve. So adjoining a modifying clause to a segment that contains two (or more) potential nominal hosts turns out to be complex? Realistically, how can that be surprising? We are after all talking about the two most complicated objects in syntactic description: NPs and clauses (not adjective phrases or adverbial phrases or prepositional phrases) combined. There is geometry to consider, naturally, but there is also rich lexical semantics, rich predicate-argument structure, rich prosodic interfacing and rich reference tracking, including anaphor resolution and issues pertaining to semantic restrictiveness and online contextual linking, coupled with general language type and sheer ambiguity resolution. In fact, the really odd thing to expect is for all those things NOT to count and not to interact! In this sense, again ‘at a minimum’, the RC attachment debate is a wonderful window into the nature of the language faculty. A really interesting aspect of the debate is the sheer ingenuity of the researchers involved in it, and how that ongoing ingenuity historically composes a fascinating narrative of the complexity of language.

Notes

1 In Cognitive Grammar this operation is overshadowed by the idea that the distinction between complements and modifiers is a matter of degree, reflecting relative salience (Langacker 1987: 435 f.).

2 First is MA, which explains why in structures like:
a. Joe bought the book for Susan to the party
b. Alice saw the singing frog in the garden in the bathroom
c. Henry told the intruder that he met to leave
speakers prefer VP-attachment over NP adjunction in each case (see Phillips & Gibson 1997, from which the above examples have been taken).

3 See Ferreira and Nye (2018) for more modern conceptions of modularity.

4 From here on the terms Late Closure (LC) and Low Adjunction (LA), on the one hand, and Early Closure (EC) and High Adjunction (HA), on the other, will be used as synonyms respectively, in line with common parlance in the field.

5 A CNP (complex noun phrase) is an NP that contains one or more other NPs.

6 For example, at the lexical level, we have long known that more common words are recognized faster than uncommon ones and that words starting with very common syllables are recognized more slowly (both effects being of about 70–80 milliseconds). See Balota and Spieler (1999).

7 In a sentence like all the boys who passed the exam can now leave, the relative clause who passed the exam is used to restrict the reference set of the boys, in that not all the boys passed the exam. That is the job of so-called restrictive relatives. Non-restrictives (which require commas in writing and a special intonational contour in speech) cannot accomplish that narrowing-down function: in all the boys, who (as you know/by the way) passed the exam, can now leave, all the boys did pass the exam.

8 Argument structure relations hold between a predicate (say, the verb eat) and its arguments (for eating, a constituent denoting the eater and another constituent denoting the thing being eaten). The core/nucleus of any sentential message involves these two notions. Adjuncts are not nuclear.

9 The whole issue of the grain problem seems amenable to a Construction Grammar approach that would need to carefully gauge the way the whole symbolic construction

64 Modifer adjunction with special reference

10

11

12

13 14 15

16 17

18

19

interacts with its component parts, particularly with powerful semantic dimensions like animacy. I am not aware of any research done in this direction. Fodor (1998: abstract): “A strong claim about human sentence comprehension is that the processing mechanism is fully innate and applies diferently to diferent languages only to the extent that their grammars difer. If so, there is hope for an explanatory project which attributes all parsing ‘strategies’ to fundamental design characteristics of the parsing device. However, the whole explanatory program is in peril because of the discovery (Cuetos & Mitchell 1988) that Late Closure is not universal”. I assume a rather traditional conception of the structure of the NP according to which the head is the noun, not the determiner. In any case, even if we adhere to the Abneyian view that the head is the determiner and talk about Determiner Phrases (DPs), it would still be the determiner of the frst NP, not of the second one. By ‘projection’ in linguistics is meant the enlargement of a basic category, such as noun or adjective, into phrase structure form. For instance, the noun car may be expanded into red car, and red car into the red car. This latter expression is the maximal expansion/projection of the noun car, a full NP with referential power (the red car may refer to an extralinguistic object in the world in a particular context of discourse; car and red car cannot: *car is great; *red car is great, but the red car is great). Reference is achieved by NPs, not by nouns alone. As already noted, a string like *car is great is not possible because car is not a noun phrase, just a noun. For recent work on the efects of task demand on processing decisions, see Schlueter et al. (2019). The generative grammar of the 1980s coined the expression thematic or theta assigner in connection with the theta-criterion (Chomsky 1981: 35). 
This stipulated that every argument in a predication (say, John and window in John broke the window) must have a thematic role. In John broke the window, John is ‘agent’ (the one performing the action) and the window is ‘patient’ (the one being afected by the action). Both thematic roles are given by the lexical semantics of the predicate break. Verbs and some prepositions assign thematic roles. For instance, for may assign benefciary, to may assign goal and with may assign accompaniment. See Grillo and Costa (2014: 171) for diferent types of the preposition with and how this may afect attachments. Traxler et al. (1996) consider that PP adjuncts are not actually treated as adjuncts initially because many PPs are in fact arguments (Put the key on the table). This is not the case for relative clauses. Note, however, that the relativizer used in most experiments in English is that, and that can also introduce arguments when it is used as a complementizer. ‘Centre embedding’ in  linguistics refers to  the process of embedding a phrase  in the middle of another phrase of the same kind. More often than not, this leads to almost impossible parsing, which may be difcult to explain on grammatical grounds alone. Relative clauses are usually mentioned in this context. For instance, we can say the rapidity that the motion has is remarkable (where the RC that the motion has has been embedded in the NP containing the rapidity) and then also the motion that the wing has is remarkable. But if we attempt to further embed the motion that the wing has phrase inside the rapidity that the motion has phrase we reach the rapidity that the motion that the wing has has is remarkable … Hardly palatable. 
In most Indo-European languages right-branching is all right (the rapidity of the motion of the wing of the …) and left-branching (common in head-fnal languages like Japanese) much less so but still possible: the bird’s wing´s motion’s rapidity … It is branching to the centre that constitutes a problem and this fact illuminated fairly specifc limitations of the human short memory system. See Pinker (1994: 201 f.), from whom the previous examples have been taken, and section 2.4.2 on working memory. On the reading span task, see the pioneering work of Daneman and Carpenter (1980). In this task subjects read sentences and are instructed to try and remember specifc

Modifer adjunction with special reference

20

21

22 23 24 25

65

words in them. An individual’s reading span is then the number of sentences for which the targeted word in them is remembered. See section 2.4.2. Kwon et al. (2019) observe that animate NPs are more likely to occur as heads of subject relative clauses but are dispreferred as heads of object relatives, where inanimates predominate (see also Mak & Schriefers 2002 on Dutch and German). This refects a common scene where humans act on objects. Note the well-known assertion of Clifton and Ferreira (1989): “To make a conversational implicature, a listener must have already parsed the sentence, assigned it its literal interpretation, realised that additional inferences must be added to make it conform to the Gricean maxim, and determined what these inferences are. Such activity could not reasonably afect the initial steps of parsing”. In the Visual World Paradigm, researchers study people’s eye movements while following verbal instructions about a range of objects in sight (for a review, see Huettig et al. 2011). Note that, outside the CNP + RC construction, we noted above that Garnsey et al. (1997) provided evidence that verb bias and plausibility interacted throughout the process of resolution of the direct object versus subject NP ambiguity. Note that the assumption that colonel refers to a man is not really warranted, at least not today. On Pseudo Relatives see, among many others: Radford (1975), Auwera (1985), Guasti (1988, 1993), Cinque (1992), Koopman and Sportiche (2010) and Casalicchio (2013).

3 AGREEMENT

DOI: 10.4324/9781003405634-3

3.1 Introduction

The facts of agreement typically surprise the uninitiated. Take an English sentence such as the small squared pens may all be broken. The little morphology left in English is evident only in the plural of pens and the -en participle of broken, and it hardly imposes any computing need of the co-constraining kind that is so habitual in grammars. That is, no part of this English sentence seems to agree with anything else. Spanish provides an interesting contrast. The Spanish version of the same sentence uses fourteen morphological cues, and they are all of the co-constraining kind:
(1) The small squared pens may all be broken
(2) Los pequeños bolígrafos cuadrados pueden estar todos rotos
The traditional way of approaching such exuberance is to say that the noun bolígrafos has the gender feature +masculine by lexical specification (mesa ‘table’ is feminine instead, in typical arbitrary fashion) and the number feature +plural by conceptual analysis (the apprehension of a numerosity in the world described extralinguistically). Whatever satellite expressions accompany the noun must replicate these features of gender and number redundantly and alliteratively (hence the multiple -os in the sequence, all meaning masc + pl). Immediately, the noun is accompanied by the determiner los and the adjectives pequeños and cuadrados, forming a noun phrase (NP). Non-immediately, the plural feature of the noun is repeated at the verb pueden (as opposed to singular puede) and the participle rotos, which is, in turn, accompanied by the quantifier todos (the last three words thus forming a different phrase, the verb phrase). Additionally, the verb also codes a +person feature (third, plural) that links the event coded in the sentence to a speech act situation. Third person roughly means that neither the speaker nor the hearer is involved in the event described. Some languages, like German or Russian, add a case feature to number, gender and person. Case indicates whether the phrase carrying it performs a subject role or an object role in the sentence, for instance. Some others, like Polish, put gender on the verb too. Understanding systems such as the ones just described is obviously challenging, yet the agreement systems of these well-known Indo-European languages pale into insignificance when we consider other, less familiar ones. For instance, according to Ferguson (1964), Bengali codes twelve different categories of person, number and respect, and verbs have syncretized affixes for person, mode, tense and aspect. For each tense/mode/aspect set of forms there is independent marking of five person/number/respect classes. Thus, -i is the first person ending in the present but the second person inferior in the future, while -o is the second person ordinary in the present and the first person in the future. The combinatorial possibilities are explosive, and yet they are a mundane matter for the native speakers of the language, happy to ignore the complexity they set in motion every second of their linguistic existence. Linguists have provided incredibly rich descriptions of the agreement systems of many of the world’s languages (see Corbett 2006 for an excellent cross-linguistic account and Acuña-Fariña 2009 for many of the points discussed here). Such descriptions compose an area of grammar that is well known for its extreme complexity. And yet, this is surprising given that, in principle, agreement seems easy to grasp, as in Steele’s (1978: 610) classic definition: “agreement refers to some systematic covariance between a semantic or formal property of one element and a formal property of another”.
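As a rough illustration only, the co-constraining pattern in (2) can be pictured as a check that every target repeats the feature values of its controller. The sketch below is a toy that assumes hand-coded feature bundles. The names `CONTROLLER`, `TARGETS` and `mismatches` are hypothetical and belong to no theory discussed in this chapter. It also takes no side on whether agreement is copying or unification.

```python
# Toy illustration: agreement as feature matching between a controller and its targets.
# ASSUMPTION: features are hand-coded per word; nothing here parses Spanish.

CONTROLLER = ("bolígrafos", {"gender": "masc", "number": "pl"})

# Each target lists the controller features it is expected to replicate.
# The finite verb shares number only (its person feature is left out for simplicity).
TARGETS = [
    ("los", {"gender": "masc", "number": "pl"}),
    ("pequeños", {"gender": "masc", "number": "pl"}),
    ("cuadrados", {"gender": "masc", "number": "pl"}),
    ("pueden", {"number": "pl"}),
    ("todos", {"gender": "masc", "number": "pl"}),
    ("rotos", {"gender": "masc", "number": "pl"}),
]


def mismatches(controller, targets):
    """Return (target, feature) pairs whose value differs from the controller's."""
    _, source_features = controller
    found = []
    for word, features in targets:
        for feature, value in features.items():
            if source_features.get(feature) != value:
                found.append((word, feature))
    return found


# Swapping in singular "puede" creates a number mismatch at the verb.
faulty = [("puede", {"number": "sg"}) if w == "pueden" else (w, f) for w, f in TARGETS]

print(mismatches(CONTROLLER, TARGETS))  # []
print(mismatches(CONTROLLER, faulty))   # [('puede', 'number')]
```

The toy flags every form-level mismatch as an error. That is exactly what mismatch cases such as the committee are gathered show to be too strict for real grammars.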
In Ferguson and Barlow’s (1988: 1) words, agreement happens simply when: “A grammatical element X matches a grammatical element Y in some property Z within some grammatical configuration”. Thus, agreement co-indexations capitalize on the existence of a morphological component containing features (typically, gender, number and person) that originate somewhere (say, a noun acting as controller or goal, etc.) and are replicated in another place or places (say, on determiners or adjectives acting as targets or probes, etc.) within a structurally defined domain (the phrase, the clause, the sentence or the discourse). So, for instance, in (2) above, the features masc and pl originate in the noun bolígrafos and are replicated on a determiner (los), three adjectives (pequeños, cuadrados and rotos) and a quantifier (todos); additionally, the number feature pl of bolígrafos is replicated on the verb too (pueden). The domains involved are the (subject) NP and the clause (the NP-VP complex), since the features of the head noun in the subject NP cross over to a different phrase, the VP. It does not really sound so complicated … And yet puzzles abound, the main one probably being its utility. For instance, agreement-wise, given the English paucity that is evident in (1) above, it is hard to make sense of the costly computational room given to the exuberance that is so evident in Spanish in (2). In systems that must code so much conceptual structure via operations that take place in milliseconds, why bother to code all those objectively unnecessary redundancies? This puzzle has often prompted grammarians to express their surprise with unabashed Anglocentric shock. Additionally, since agreement often capitalizes on gender and this is often completely arbitrary (as in todas esas sillas altas y rojas ‘all those tall red chairs’, where the feminine feature of silla is replicated in four satellites of the noun, seemingly unnecessarily), it is instructive to take stock of the depth of the shock1:
Gender and declension are grammatical features peculiar to spoken and written names. These features do not add to the significative power of language. (William of Ockham, c. 1285–c. 1349, Summa 1.3)
It is almost as though at some period in the past the unconscious mind of the race had made a hasty inventory of experience, committed itself to a premature classification that allowed of no revision, and saddled the inheritors of its language with a science that they no longer quite believed in nor had the strength to overthrow. Dogma, rigidly prescribed by tradition, stiffens into formalism. (Sapir 1921: 99–100)
[on German gender]: the classification is arbitrary. No underlying rationale can be guessed at … The presence of such systems in a human cognitive system constitutes by itself excellent testimony to the occasional nonsensibleness of the species. Not only was this system devised by humans, but generation after generation of children peaceably relearns it. (Maratsos 1979: 232)
Grammatical gender marking in languages such as European languages which have only two or three genders seems to be almost totally nonfunctional (Trudgill 1999: 148); “It is … not unlikely that languages with large numbers of afunctional grammatical devices will become less numerous, and indeed it is not entirely impossible that linguistic gender, except perhaps for natural gender in the third person, will one day disappear from the languages of the world, never to return”.
(Trudgill 1999: 149)
As can be seen, for many, gender classes and agreement constitute a parade example of the non-iconic nature of language: “a clear case of the victory of the indexical aspect of language over its iconic aspect” (Haiman 1985: 162). Jespersen (1922: 352 f.) viewed agreement as “superfluous” and “cumbersome”, and Corbett (2006: 274) notes that it “often appears to involve a lot of effort for a questionable payoff”. Perhaps the best instance of a perplexed reaction to the mysteries of agreement comes from within Cognitive Grammar (Langacker 1991b: 289 f.). Remember that in this framework every form is a symbol, and symbols come with a meaning side. In principle, however, as Langacker (1991b) observes:
Indeed, the apparent arbitrariness of gender assignment over most of the lexicon in European languages is generally the first embarrassing fact to be thrown in the face of anybody with the audacity to suggest that grammar might be semantically based. (p. 304)
Agreement markings are perhaps the archetypal example of sentence “trappings” employed for purely grammatical purposes, and are supposedly inconsistent with any claim that grammar might have a semantic basis. (p. 307)
Against this, Langacker views redundant agreement markings as predications in their own right and, “since it would be counter to both the letter and the spirit of cognitive grammar to describe this situation by a rule that ‘copies’ x from A onto B, or in terms of features ‘percolating’ up, down, across, around or through”, both x and x’ are analyzed as “meaningful symbolic units” whose function is to signal grammatical relationships (say, “that B modifies A, or that A is an argument of B”). And: “I would only reiterate in this regard that serving a specifiable grammatical function is perfectly consistent with being meaningful” (p. 308). Taylor (2002: 332 f.) does not seem to see even a minimum of functionality in the inflection classes that agreement rests upon, and ends up expressing a peculiar view:
the very existence of inflection classes is somewhat puzzling, since they contribute little to the symbolization of conceptual structure. English, for example, which lacks noun gender, is thereby not one whit less efficient as a symbolic system than languages that do have gender systems. On the contrary, elaborate inflection class systems might seem to be dysfunctional, in that they place a heavy burden on a speaker’s memory.
But while inflection classes certainly present the second language learner with severe problems, they are a fact of many languages, and speakers of these languages show no signs of wanting to give them up. It would even seem that speakers take delight in the formal complexity of their language. For my part, therefore, I am inclined to see the complexities of inflection classes, (…), as manifestations of humans’ delight in what I called (…) form-focused activities. (emphasis added)
The functional mystery is compounded by the fact that most languages do have some form of agreement, including English (Mallinson & Blake 1981). In fact, even in English, speakers implement number agreement operations at least once every 16 words or so, that is, about once every four to five seconds (Eberhard et al. 2005: 532–533; think, by comparison, of how often English speakers actually use complex clefts, island-related structures with interesting ‘derivational’ histories, or large bouts of embedding inside a nominal or determiner phrase, that is, the kinds of structures that populate grammars). It is also worth noting that English-speaking four-year-olds mark number agreement correctly over 94% of the time (Keeney & Wolfe 1972). As Bock et al. (1999: 331) note, “this makes it all the more plausible to view agreement, in its typical manifestations, as one of the automatic mechanisms of normal language production rather than a nicety of carefully prepared speech”. Back to gender, Dahl (2004: 112) regards gender systems as among the “mature elements of language,” since they normally arise through long evolutionary processes, which is probably the reason why such systems are generally absent in pidgin and creole languages (McWhorter 2001: 163; see also Audring 2014 and Dye et al. 2018). Corbett (1991: 4) reminds us: “the determining criterion of gender is agreement”. Ouhalla (2005: 667, 683) argues with good cause that agreement is about making ‘roots’ visible to the computational system. Finally, Eberhard et al. (2005: 26) note that agreement is the “transient linguistic glue that grammars and speakers can use to hold important pieces of utterances together”. Another mystery is the issue of mismatches. Ideally, given a definition of agreement as formal feature covariance, both ends of an agreement relation should bear the same morphological specifications (e.g., Chomsky 1995: 309: “mismatch of features cancels the derivation”; see Corbett 2006: 143 f.). This, however, is very often not the case, and English is now the parade example of that (Pollard & Sag 1994; Kathol 1999).
Acuña-Fariña (2018a) provides the following constructed paragraph, noting that, based on the above definition (agreement as feature co-occurrence), no single sentence in it displays agreement at all:
The hash browns at table nine is getting angry because a number of dishes he ordered lately were wrong. It seems that the committee are finally taking that case and others into consideration, especially because ninety dollars seems an obscene amount to pay for what they are receiving. They say what is needed are employees who can run this place like wolves, not like sheep: sheep only graze and are basically uninteresting. Then, eggs and bacon is good on occasion, sure, but clearly shouldn’t be the norm. Our current rules are a nuisance. After a customer replies with a curt ‘fine’ to a ‘did you like your meal?’ question, we can no longer wriggle out of the embarrassment with something as lame as ‘We seem to be a bit displeased with ourself today, Mr. Jones’. Heaps of time and effort is being wasted – in the wrong direction. That three days our restaurant was in the Paris papers has actually proved detrimental.
Note that mismatches like that three days was great or the committee are taking are crucial to the way we contemplate a grammar of agreement (Acuña-Fariña 2018a; see the beginning of section 3.2 below too):
The hash browns is a classic in the agreement literature (Pollard & Sag 1994). The ‘nursely we’ construction is supposed to be unique (Joseph 1979). The nature of fused relatives (what is needed are …) is such that every agreement option is left in the domain of the tentative. Measure terms (sixty dollars is a rip-off) are just natural, that is, they normally take a singular. Collectives are also naturally dual: easy to construe either as a forest or as the trees in it. When contemplating the contrast in [3] and [4] below:
[3] A variety of fresh vegetables are available.
[4] A variety of fresh vegetables is good for you.
Wechsler and Hahm (2011) raise the question of why the target verb in [3] fails to ‘find’ the singular number feature on the noun ‘variety’ (…), and note that the particular solution to this problem is not crucial since any analysis that has the effect of blocking the projection of the syntactic number feature up to the NP node should “suffice” (see also Wechsler & Zlatić 2003: 121 f.). I argue that any analysis might suffice if these number-transparent nouns (Huddleston & Pullum 2002: 504) were an isolated phenomenon. As the opening paragraph shows, they are not. The same can be said of the central idea in Minimalism that agreement features need match, for otherwise derivations will crash (Chomsky 1995: 309; Chomsky 2000). Nothing crashes in the paragraph above, at least in the minds of English speakers. (Acuña-Fariña 2018a: 450–451)
It seems therefore that mismatches are too many and too obvious to ignore or explain away as a small bunch of exceptions. The widespread semantic interference they exhibit seems obvious and, in light of the size of the problem, somewhat systemic (contra Baker 2008: 21; for classic arguments against a pan-semantic approach, see Pollard & Sag 1994: 71 f. and Acuña-Fariña 2018c). Corbett (1979, 2006) made a great contribution to the understanding of this area when he proposed his Agreement Hierarchy (AH).
The AH posits that semantic interfacing increases monotonically with the distance between controllers and targets. Thus, for instance, it is quite natural to violate feature co-occurrence in subject/VP coindexations (the committee (sg) are (pl) gathered), especially for British speakers as opposed to American ones, but this is much harder inside a subject NP (*these committee). In sum, the utility of agreement systems, their extreme variation across the languages of the world and the complex interplay between form harmony and semantic interference make agreement an almost perfect illustration of the complex system we call grammar. Naturally, explanations of why language in general behaves this way, that is, with particular languages differing wildly in the type, size and number of the agreement operations they enforce, enter the domain of the theory-driven in linguistics. Psycholinguistically, the number of questions that such complex agreement systems pose is large. To name but a few:
● Is agreement the same for both arbitrary gender (mesa alta ‘tall table’) and biologically grounded gender (chico alto ‘tall boy’ as opposed to chica alta ‘tall girl’)?
● Is it the same for gender (of whatever kind) and (always conceptually grounded) number?
● Is morphology processed in the same way in controllers (say, the noun where features are supposed to originate) and targets (say, adjectives or verbs, for which gender is an extraneous addition)?
● Is it the same when controllers and targets are close by (as in these-pl cars-pl in English, as opposed to this-sg car-sg) or separated (as in this car over there is nice versus these cars over there are nice), or even very separated (as in these boys told the person running the whole event that they didn’t like it at all, where these boys in the first clause is co-referential with they in the second, under the most natural interpretation)?
● Is agreement directional, a sort of copying operation, or static, a sort of discontinuous morpheme expressed via a unification mechanism (Ferguson & Barlow 1988: 13)?
● Lastly: is agreement an essentially syntactic phenomenon (an encapsulated cycle, perhaps; Gazdar et al. 1985; van Riemsdijk & Williams 1986: 302; Chomsky 1995: Chapter 2; Bock et al. 2001; den Dikken 2003) or, alternatively, a semantic phenomenon (Dowty & Jacobson 1989; Pollard & Sag 1988; Wechsler & Zlatić 2003; Haskell & MacDonald 2003)?

Note that, in principle, the general dynamics of the formal encapsulation versus porosity (interference, interactivity, etc.) agenda lends itself easily to experimental evaluation. Take a conception of agreement that rests on notions such as Phase Impenetrability (Chomsky 2000, 2001). On this view, agreement must be local to minimize search (Chomsky 2001: 13) and must be resolved as an early cyclic step in a serially defined set of operations (‘Agree’; Chomsky 1995: 349 f.; Guasti & Rizzi 2001; Pfau 2003), so that semantics is not supposed to play a part in it (= syntax is ‘impenetrable’). The philosophy behind this kind of idea is well known to linguists of all persuasions and I am just using broad strokes here as illustration. In this context, the finding of a semantic effect (animacy, topicality, underlying numerosity, etc.) at an early stage of processing may be taken as evidence against the theory, and vice versa: a failure to find the effect might be taken as evidence to reinforce it, if we assume the psychological adequacy of our linguistic theories (Kaplan & Bresnan 1982; Van Valin & LaPolla 1997; Marantz 2005; Franck et al. 2006; Jackendoff 2007b; see chapter 5). In the remainder of this chapter we will have plenty of occasions to see how this theoretical debate is handled psycholinguistically. We will move between models in language production and those in language comprehension. It is best to consider them separately even if one is of the opinion that production/comprehension asymmetries are about the temporal flow through the processing system and not really distinct processes. Thus, first, in section 3.2, we will examine the way speakers make agreement mistakes, mainly in production tasks (and a little also in comprehension tasks, about which we know less). A very rich body of experimental work has now accrued in the area known as attraction. Then, in section 3.3 we will evaluate work done in the realm of comprehension studies, most of it carried out with the sophisticated electrophysiological precision of the ERP methodology. The emphasis throughout will be to show how an appreciation of psycholinguistic theory and a grasp of the basic facts of the experimental agenda vastly augment our understanding of agreement in general, not just from the implementation side (performance) but also from the representational side (competence).

3.2 Agreement attraction

The study of agreement mistakes in production constitutes a rich research agenda that originated with the pioneering work of Kathryn Bock in the early 1990s (Bock & Miller 1991; Bock & Cutting 1992; Bock & Eberhard 1993). The mistakes studied involved now classic examples like the label on the bottles are green, where the complex Noun Phrase (henceforth also CNP) the label on the bottles has a sg specification at the top of the phrase (which means that the whole phrase is sg) but the verb incorrectly uses a pl feature instead, presumably because the nearby (usually called local) noun bottles appears in the plural. In classic parlance in the field, it is said that the plural in bottles ‘attracts’ the verb illegally; hence, the whole domain of studies in these intervention configurations (Franck et al. 2015) is known as attraction (the opposite configuration, Pl + Sg, rarely results in mistakes; Bock & Miller 1991). These attraction mistakes very often (but not always) correspond to what traditional grammar regards as proximity concord (Jespersen 1922; Francis 1986; Payne & Huddleston 2002: 500 f.). In psycholinguistics, research of this kind involves a simple completion methodology: participants are given preambles like the label on the bottles in various experimental conditions (e.g., the label on the bottle, the label on the bottles, the labels on the bottle, the labels on the bottles) and they are simply instructed to repeat them and then go on to complete them any way they like. As we will see, the close examination of these completions provides a surprising wealth of information on the workings of agreement. Often explicit claims are made that connect ‘the grammar of mistakes’ to fairly specific tenets of various grammars (e.g., Pfau 2003 maintains that agreement mistakes in German provide evidence for a view of Morphology that is compatible with the theory of Late Insertion; see also Acuña-Fariña 2012).
The literature on attraction has become too large to tackle in a single contribution, so the first thing that is needed is a delimitation of the scope of this section. Although a brief account of attraction in comprehension will be offered in section 3.2.1.6, the main focus here will be on production and, assuming that errors are informative about the nature of the structure where they occur, the specific research question examined is: what causes attraction errors? Basically, there are two types of possibilities: form factors and conceptual factors. These are the same major factors discussed by grammarians in their independent research. For instance, as already noted, in all of (5)–(11) below (taken from Acuña-Fariña 2018a; see above), semantic control, or agreement ad sensum, explains the mismatch between the morphological specification in the subject phrase and that of the various targets:
(5) The hash browns at table nine is getting angry.
(6) We seem a bit displeased with ourself today, Mr. Joseph.
(7) The committee are looking into that.
(8) A number of ideas were/*was proposed.
(9) Twenty dollars seems a ridiculous amount to pay to go to the movies.
(10) The only thing we need now is some new curtains.
(11) Eggs and bacon is my favourite breakfast / are particularly expensive.
Thus, in the classic example in (5) the hash browns is to be construed as a single client asking for hash browns in a restaurant, hence the agreement in the singular. In (6) a nurse speaks to a patient out of solidarity: ‘our’ in ourself reflects that two people are involved, while ‘self’ indicates that only one is actually sick. In (7), another linguistics classic, committee is a collective noun in the singular that actually entails a numerosity in its meaning, and agreement resolves the conflict by opting for the semantic type of information (plural now), disregarding the form. In (8) the ‘true’ head of the CNP seems to be ideas, not number, which appears to have been bleached of its lexical meaning in the process of becoming a quantifying expression. The examples in (9)–(11) show various cases of mismatch between pre- and post-copular NPs. For instance, in (9) twenty dollars (plural) is seen as an/one amount (singular). The semantic interfacing shown in these mismatches plagues the discussions in grammarians’ accounts.
Attraction provides a convenient scenario to test such conceptual interference because it allows researchers to manipulate materials seemingly endlessly. Unlike in grammar, however, in attraction such interference, when it occurs, provokes malfunction in the form of an agreement error, not a legal string. But, crucially, the impetus seems to be the same: semantics derailing a presumably formal operation of feature copying. Before turning to the main theories and the main ideas in attraction research, I add a few real examples of it below. The first three appear in an Internet guide designed to help avoid agreement mistakes (https://writerswrite.co.za/30-examples-to-help-you-master-concord/). (15) and (16), quoted in Bock, Eberhard, Cutting et al. (2001), were made by well-known personalities. (17) and (18) were made by well-known experimental psychologists in edited texts, namely in psycholinguistic papers where they themselves were studying attraction! Finally, (19) is due to a well-known linguist. According to Bock et al. (2001: 117), in the United States the Aptitude Test for Standard English presents attraction as one of the most insidious mistakes to detect. A cursory look at (12)–(19) surely confirms that statement:
(12) The message between the lines ARE that we need to finish before Monday.
(13) The case of champagne bottles ARE for the year-end party.
(14) The sentiment in our offices ARE that our bonuses were measly this year.
(15) The sheer weight of all these figures MAKE them harder to understand (Ronald Reagan, 10/13/1982; quoted in Francis 1986).
(16) … the illiteracy level of our children ARE appalling (George Bush, Washington, 23 January 2004).
(17) Our work is based on the assumption that some key notions of formal syntax, such as intermediate traces, IS directly reflected in processing/memory constraints at play in on-line language production. (Franck et al. 2010: 3).
(18) … the nature of the processes that underlie this task ARE complex. (Gillespie & Pearlmutter 2011: 377).
(19) … the relation of more oblique arguments to the predicate ARE less obvious than those of the central … arguments. (Croft 1988: 169)

3.2.1 Psycholinguistic theories of agreement

The purpose of this section is to provide a close examination of psycholinguistic theories of agreement production. The emphasis will be on what they accomplish well and what they do not. In the main, there are four different families of theories, and they contemplate agreement from seemingly every conceivable angle. They all have underpinnings in linguistic theory, touching on such issues as the directionality of agreement operations and the copying of features or, alternatively, their unification. Additionally, the role of morphology in agreement production needs to be carefully evaluated, and recent research is showing something that, with hindsight, now seems obvious: namely, that if agreement capitalizes on the existence of a morphological component, then the size and shape of that component surely matter. The theories or families of theories discussed here are presented largely in the order in which they appeared in the historical record, even if aspects of them are now less appealing than they used to be in light of recent findings. I adhere to the history of findings in an attempt to find a narrative that reveals how agreement can be approached in conceptually different ways. It will be seen that the facts uncovered by the last twenty-five years of research are too many and too diverse to expect any of the present theories to account for them all. In this sense, this section will not change the fact that agreement continues to be a challenge for both linguistic and psycholinguistic theory because it "is not only syntactic, not only semantic, and not only pragmatic, but all of these things at once" (Eberhard et al. 2005: 531). However, it will also be seen that the main building blocks of a psychologically sound theory can nevertheless be discerned, beginning with what we already know.2


The methodology used in all these studies is very much the same: participants are given preambles to complete. These preambles reflect the experimenters' manipulations: they may manipulate the first noun or the second or both in terms of their meaning (e.g., animate versus inanimate), form (sg versus pl; feminine versus masculine; phrases versus clauses) or both. Match conditions offer CNPs with the same feature specification, so no error can actually occur (e.g., sg + sg; pl + pl). Mismatch conditions are the relevant part: e.g., sg + pl or pl + sg. For instance, in the first, pioneering study of Bock and Miller (1991: 57), which compared PP modifiers (the key to the cabinets) and RC modifiers (the boy that liked the snakes), the authors used eight 88-item lists; every list contained 32 experimental preambles, one from each of the 32 sets, and 56 fillers. The lists were recorded on audio tape by a female speaker.

3.2.1.1 Maximal Input

The first theory is Maximal Input (Vigliocco et al. 1996a, 1996b; Vigliocco & Franck 1999; Vigliocco & Hartsuiker 2002, among others), and it rests on the notion of unification in grammar. Unification refers to the idea that agreement is a sort of 'long component' or a 'discontinuous morpheme' (Ferguson & Barlow 1988: 13). On this view, all morphological features originate where they actually are (so, strictly speaking, there is no controller versus target distinction) and become unified later in a checking operation that verifies that they are consistent, that is, that they match (Pollard & Sag 1994; Copestake 2002; Wechsler 2008). Thus, the features are embedded in the lexical items and do not get copied or moved anywhere (Langacker 1991b: 307). In sum, this view rests on the idea that different sentence constituents can encode information about a single formal object if that information is consistent across the carrying elements. In grammar, unification is a great tool for dealing with absent controllers. For instance, in the Spanish sentence estoy contento/a '(I) am happy', the gender of the adjective depends on the sex of the speaker, which does not appear anywhere in the sentence, mainly due to pro-drop (the tendency in Romance languages to drop subjects, as these tend to be reduplicated at the verb position via a rich inflection). Unification is, in principle, a useful tool for an incremental parser too, because it does not necessarily involve whole sentences or even phrases. Thus, to use Spanish again, in me gustan mucho los lápices azules pero prefiero los rojos ('I like the blue pencils a lot but I prefer the red ones') the head noun lápices 'pencils' is simply omitted in the second coordinated clause, as it is given/recoverable information. This, however, poses no informational hurdle, since both the determiner los and the adjective rojos mark the gender and number features (masculine + plural) that easily point to the information that has been omitted. The idea behind Maximal Input is that, even though agreement is resolved primarily in the domain of morphosyntactic encoding, after conceptual structure (Garrett 1976), domains are porous, so conceptual structure may interfere with agreement if that is advantageous (Berg 1998; Acuña-Fariña 2009, 2012).3
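To make the notion of unification concrete, here is a minimal sketch under simplifying assumptions: each word contributes a flat bundle of features, and bundles combine only if no shared feature clashes. Real unification-based grammars (e.g., HPSG) use typed, nested feature structures, and the function names unify and unify_all are hypothetical, chosen for illustration only. The example mirrors los rojos above, where the determiner and the adjective jointly recover masculine + plural for the omitted head noun.

```python
from typing import Dict, List, Optional

Features = Dict[str, str]

def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat feature bundles; return None if any shared feature clashes."""
    merged = dict(a)
    for feature, value in b.items():
        if feature in merged and merged[feature] != value:
            return None  # inconsistent cues: unification fails
        merged[feature] = value
    return merged

def unify_all(bundles: List[Features]) -> Optional[Features]:
    """Unify the cues carried by several words into one description of a referent."""
    result: Optional[Features] = {}
    for bundle in bundles:
        result = unify(result, bundle)
        if result is None:
            return None
    return result

# "los rojos" with the head noun (lápices) omitted: the determiner and the
# adjective each carry masculine + plural, which suffices to recover the referent.
los = {"gender": "masc", "number": "pl"}
rojos = {"gender": "masc", "number": "pl"}
print(unify_all([los, rojos]))   # {'gender': 'masc', 'number': 'pl'}

# A clashing cue (feminine plural "rojas") makes unification fail.
rojas = {"gender": "fem", "number": "pl"}
print(unify_all([los, rojas]))   # None
```

Note that nothing is copied from a controller here: each cue carries its own features, and the checking step only verifies that they are compatible.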

In a unification configuration each cue is a self-standing symbol, which means that it can connect to conceptual structure directly, on its own, without needing to receive its semantic specification from a more or less distant controller. This provides more opportunities for semantic interference. Evidence for this view was twofold. First, distributivity effects were initially observed in languages with a rich morphology, like Italian, French and Spanish, but not in English, which is notorious for its poor morphological component. The distributivity agenda will be important here, so let us explain it right away. Take a phrase like the flag on the windows, as opposed to one like the pool for the swimmers. Even though both are formally identical (they both have a singular head noun followed by a modifying plural one), the former makes it easy to conjure up an underlyingly plural interpretation involving one flag on every window (so a plurality of flags), something not available, in principle, for the latter (there is only one pool shared by everyone). As a result of their differing semantic entailments (the underlying distributivity of the former but not of the latter), agreement errors in which the verb appears in the plural were much more common in distributive than in non-distributive preambles in the Romance languages (so more errors like *the flag on the windows are amazing than like *the pool for the swimmers are amazing). This suggested that the constant, redundant cues of Spanish, French and Italian, each being tied to a conceptual representation, provided constant access to that conceptual representation (Vigliocco et al. 1996a, 1996b), as illustrated in Figure 3.1 for the Spanish sentence Las chicas altas y rubias vienen solas ('The tall blonde girls come alone'), where every -as affix codes fem + pl. The second factor that promoted the same interpretation was the rate of errors involving biological gender agreement (e.g., chica alta 'tall young girl' versus chico alto 'tall young boy' in Spanish) versus arbitrary gender agreement (e.g., casa grande 'big house' versus abrigo grande 'big coat' in the same language). Thus, for instance, Vigliocco and Franck (1999) capitalized on the fact that French and Italian have nominal gender systems of this kind, which include a distinction between nouns reflecting the sex of the referent (conceptual gender) and nouns which do not (grammatical gender), and manipulated gender attraction in subject-predicative adjective ties. They gave their subjects adjectives first and, after that, preambles of the form 'lo sposo-masc in chiesa-fem' ('the husband in church') versus 'il cero-masc in chiesa-fem' ('the candle in church') in Italian, for instance.

FIGURE 3.1 Constant access to conceptual representations via constant use of cues (each cue in Las chicas altas y rubias vienen solas is linked directly to CONCEPTUAL STRUCTURE)

They were asked to complete the preambles using the adjective. So, for instance, imagine you are given the adjective nice and the preamble the husband in church and, asked to make a sentence, you come up with the husband in church was nice. They found more gender agreement errors between the subject and the predicate adjective when the subject head noun did not have any conceptual correlates (cero 'candle' above). This is extremely interesting because it is precisely those nouns with arbitrary gender that do not even require an initial gender computation involving a choice between masculine and feminine, since their gender is lexically specified, fixed and non-contrastive. Also with Italian and French materials, Vigliocco and Franck (2001) used epicene nouns in both languages in the same kind of experimental setup. An epicene noun (like vittima 'victim' or personaggio 'character' in Italian) has a fixed grammatical gender but can refer to either a female or a male referent. For these words, agreement between a subject noun phrase and a predicative adjective after the verb is with the grammatical gender of the head noun, regardless of whether it refers to a male or a female participant. They found fewer attraction errors in the preambles in which the gender of the epicene matched the sex of the referent (so, for instance, fewer errors were produced when vittima = María or personaggio = Antonio than when vittima = Antonio or personaggio = María). This allowed them to state that conceptual information interfered with agreement operations. They argued that conceptual information helps syntactic accuracy when congruent with syntactic information and hinders it when it is incongruent.
Despite these insightful discoveries, the evidence against Maximal Input (i.e., the idea that every form connects to conceptual structure, so that the more form cues a language has, the stronger the semantic effects that should be obtained in that language) is actually too strong: after initial failures in the early 1990s (having to do with the inadequacy of the materials used in early experiments), semantic effects in poorly morphologized English agreement operations are now incontestable (Eberhard 1997; Humphreys & Bock 2005; Foote & Bock 2012; Bock et al. 2012; inter alia). In fact, today, with accumulated evidence, the very opposite idea is gaining strength: that a poor morphology cannot hold semantic interference in check, for English often shows greater distributivity effects than the Romance languages (see below). This is a problem for usage-based views of language (Langacker 1991a, 1991b; Croft 2001; Goldberg 2006; Bybee 2010), since in these it is axiomatic that every form (every -o, every -a, or -os, -as, etc.) is a symbol, and symbols come with a meaning side, which means that semantic control should really be more easily observable in languages with rich and numerous cues (but it is not).4 As for differential gender effects (biological versus arbitrary gender), these have been reported when manipulating the head noun only (Vigliocco & Franck 1999, Experiment 2), which makes them explainable on other grounds (head pre-eminence; see below). Over and above the specific experimental counterevidence, there is something of a more general kind that seriously limits the psychological applicability of unification-based theories. Being of a declarative (instead of a procedural) kind, these theories present a static model of agreement. This means that operations start only once each cue is reached. This may be attractive for grammar formalisms but, given what we know about processing at large, it is unrealistic. Recent models of encoding interference (Villata et al. 2018 and references therein) have shown that much computing activity (launched by cues of all kinds) takes place before a resolution is finally made on a target, even if the activity in question does not directly resolve the target (see also Franck & Wagers 2020). For instance, Barker et al. (2001) found that plural completions were more likely in (20) than in (21) below:

(20) The canoe near the sailboats …
(21) The canoe near the cabins …

but while the effect became evident at the verb, there is nothing at the verb itself that can explain the result (which must therefore originate earlier; see below). Franck (2011) notes that constituents encoded early must remain active for further syntactic operations. For instance, in French the subject is necessary for verb agreement, and a moved preverbal object must be re-invoked for past participle agreement. In this language, Franck et al. (2010) found object interference effects with agreement even in configurations that do not code participle agreement. This is evidence of active, predictive processing. That is, participants anticipated a kind of object agreement operation that finally did not materialize, but that anticipation caused the subject-verb agreement operation that did materialize to derail. As is well known, structural priming shows that preferred structures depend on previously activated choices (see Pickering & Ferreira 2008 for a review). Perhaps the most solid lesson that the last three decades of psycholinguistic research have taught us is that the mind never rests idle but rather does massive prediction and priming-related activation by default (Kutas & Hillyard 1984; DeLong et al. 2005; see also Jackendoff 2007b: 383–385).
Most prediction and priming are done based on already encountered cues, and agreement, especially in alliterative languages, provides no shortage of cues. It therefore makes little sense to expect the system not to be predisposed to project them in anticipation of their actual phonetic existence, that is, not to posit targets in the presence of potential prior controllers (Lewis & Vasishth 2005; Badecker & Kuminiak 2007; on comprehension, see Molinaro et al. 2011).5

3.2.1.2 Marking and Morphing

The second theory of agreement in production, Marking and Morphing (M&M), is probably the best known and most fully developed, and, as its name indicates, it has two components (Bock et al. 2001, 2006; Eberhard et al. 2005; see also Bock & Middleton 2011). Marking occurs first, during functional assembly, and is responsible for assigning a number feature to referent NPs in toto. The processing raison d'être for positing it is the inescapable fact of distributivity effects in every language examined: once invoked as a referent at the earliest stages of conceptualization, the flag on the windows is somehow plural 'from the top', or plural ad sensum. Eberhard et al. (2005: 8) point out grammatical reasons for Marking as well: it is needed to deal with abstract phrase number (e.g., the number of phrases like what, who and which, which can be either singular or plural), "vague quantification" (a number of issues ARE …), conjunctions (e.g., ham and eggs, Karin and Scott, which can be either singular or plural without any of the conjuncts marking the relevant value), as well as differences in the number behaviour of pronouns and verbs.6 Morphing implements agreement; it occurs later and is encapsulated. It involves (p. 9): "a set of interrelated operations that (a) bind morphological information to structural positions, (b) reconcile number-relevant features from the syntax (number marking) and the lexicon (number specifications), and (c) transmit number features to structurally controlled morphemes (e.g., to verbs)". It operates during structural integration (that is, after functional assembly). When reconciliation between Marking (phrasal) number values and specific (morphological) lexical values is needed, the latter prevail, as the system, based on the principle of spreading activation, gives more weight to these (Badecker & Kuminiak 2007). This may happen with collectives conceptualized as a plurality, since they may be lexically singular but 'marked' plural: this committee are …. Head position and plural nouns are also given higher activation values by default (it is assumed that sg is a default). The last stage in Morphing occurs when the reconciled number specification is copied onto the verb (which, in English at least, does not code number directly, unlike in unification; see den Dikken 2003 from the linguistics standpoint).
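The division of labour between the two stages can be illustrated with a toy calculation. The sketch below is not the model's published formalization. It assumes that Marking supplies a notional plurality value between 0 and 1, that Morphing adds weighted contributions from the lexical number of the head and local nouns (singular counting as the unmarked default, 0), and that the head weighs more than the local noun. The weights HEAD_WEIGHT and LOCAL_WEIGHT, and all input values, are hypothetical. The result is read only as a rough propensity toward a plural verb.

```python
from dataclasses import dataclass

# Hypothetical weights: head morphology counts for more than local-noun morphology.
HEAD_WEIGHT = 1.0
LOCAL_WEIGHT = 0.2

@dataclass
class Preamble:
    text: str
    notional_plurality: float  # Marking input: how plural the referent is conceived (0..1)
    head_plural: bool          # lexical number of the head noun (False = singular)
    local_plural: bool         # lexical number of the local noun (False = singular)

def marking(p: Preamble) -> float:
    """Stage 1: assign a phrase-level number value from the message."""
    return p.notional_plurality

def morphing(p: Preamble) -> float:
    """Stage 2: add weighted lexical number; singular is the unmarked default (0)."""
    lexical = HEAD_WEIGHT * float(p.head_plural) + LOCAL_WEIGHT * float(p.local_plural)
    return min(1.0, marking(p) + lexical)  # reconciled value, then copied onto the verb

preambles = [
    Preamble("the flag on the windows", 0.3, head_plural=False, local_plural=True),
    Preamble("the pool for the swimmers", 0.0, head_plural=False, local_plural=True),
    Preamble("the members of the committee", 1.0, head_plural=True, local_plural=False),
]

for p in preambles:
    print(f"{p.text}: plural propensity = {morphing(p):.2f}")
# the flag on the windows: plural propensity = 0.50
# the pool for the swimmers: plural propensity = 0.20
# the members of the committee: plural propensity = 1.00
```

With these made-up numbers, the distributive preamble ends up more plural than the non-distributive one only because Marking feeds it a higher notional value. The singular collective committee contributes nothing in local position, so it exerts no pull toward a singular verb.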
This two-stage system explains semantic effects (the hallmark of agreement in both grammar and experiments) as a result of Marking. Morphing, on the other hand, accounts for the fact that collectives like committee or army do not attract when they occur as local nouns. For instance, a phrase like the members of the committee hardly ever causes mistakes like *the members of the committee is angry. This is because committee is inserted late, after functional assembly, during structural integration, once the initial conceptualization stage (Marking) is left behind. Collectives do attract in head position, consistent with head pre-eminence and the ordering of operations (first Marking, then the rest; but see below).7 The model also explains the strength of inflectional effects at the local noun. Since, as local nouns, soldiers-pl may attract but army-sg does not (Bock & Eberhard 1993), attraction is explained by the fact that these mistakes occur after Marking, during constituent assembly, the stage when form computations like inflection matter the most. Another positive aspect is that the model suggests a natural interpretation for the widely attested supremacy of number attraction over gender attraction: number is usually contrastive (car/cars) but gender is much less so (table 'mesa' is obligatorily feminine in Spanish), so number is more liable to inflectional miscalculation during Morphing. Thus gender behaves similarly to non-contrastive number, as in the case of pluralia tantum nouns like suds or scissors (which attract less). Finally, the model also accounts for the distribution of notional effects in gender agreement that we referred to above. Gender attraction was attested first in French and Italian (Vigliocco & Franck 1999, 2001), but its strength does not seem to depend on the local noun's notional properties (Vigliocco & Franck 1999, Experiment 2). The clear effects found occurred when the grammatical gender of the head noun was aligned with the sex of the referent. Eberhard et al. (2005: 24) note that these findings are analogous to those for natural and grammatical number on head nouns. In short, meaning is allowed to launch biases if it comes in first, during conceptualization, and affects the head noun; but once the entire phrase number (or gender) is set during this stage, the next stage (when the local noun must be positioned in the tree) is impenetrable, running uniquely on formal rails. This means that semantic manipulations of the entire phrase or of the first noun may produce effects, but semantic manipulations of the local noun should not. Unlike static unification, Marking and Morphing contains a feed-forward component that makes it psychologically viable. This is the copying part of Morphing. Since the model also contains form-free conceptual access via Marking, it seems that every possibility (every actual instantiation of agreement, including wrong agreement) can be predicted. However, this is not so. There are in fact a number of problems with this theory (see also Franck 2011 and Acuña-Fariña 2012). The first is the contradiction the theory falls into when outlining the timing of operations. This stems from the fact that the model is strictly serial. It assumes "that morphological information typically enters the production process later than notional information, at a point that is closer to speaking when the notional contribution has weakened" (Lorimor et al. 2008: 773).
This is the reason why inflectional effects are strongest at the second noun, and also why notional plurality is weak at that stage (anything notional counts less at that post-Marking stage). Agreement is formally implemented when dealing with form, after conceptualization is complete. This view (and its corollary, that there is no looking back) seems to be inspired by something analogous to the phase impenetrability condition of minimalism (Chomsky 2000, 2001; see Pfau 2003 for the idea that agreement mistakes in German provide evidence for a view of Morphology that is compatible with the theory of Late Insertion). So, with miscalculation (number of errors) in mind, coming in second, after Marking, carries more weight in the system, as the second stage is closer to the final, third stage: phonological output (Bock & Middleton 2011; but see also Franck 2011). But, when discussing the supremacy of number attraction over gender attraction, they add (Lorimor et al. 2008: 792): From the standpoint of Marking and Morphing, a major difference between number and grammatical gender is that phrase number features can arise very early in production, on the basis of notional information, and precede the lexical and morphological access processes that explicitly bear grammatical number and gender information. That is, number is present during marking. Because grammatical gender on most nouns remains indeterminate prior to lexical access, gender is injected fairly late into the production process, during Morphing. So coming in first in the serial chain is now regarded as more relevant for miscalculations (on problems of the M&M timing account for attraction in comprehension, see Tanner et al. 2014: 204; see also Smith et al. 2018 for an attempt at refining M&M by eliminating its strong seriality in favour of a dynamical systems approach with continuously changing features). Additionally, by manipulating how plausible verbs are vis-à-vis the two nouns in the preceding complex NP, researchers have reported plausibility effects too (Barker et al. 2001; Thornton & MacDonald 2003). For instance, Thornton and MacDonald (2003; see also Cummings & Sturt 2018) used preambles like the album by the classical composers and offered verbs like praise or play to complete them. They found more agreement errors when both nouns were plausible subjects (both albums and composers can be praised) than when only the head noun was (only albums can be played). Note that the plausibility effects uncovered by Thornton and MacDonald occurred when the verb, and not the NP, was manipulated. However, in morphing/copying terms, the verb should merely have a passive role in agreement computations in English, which occur only in the (previous) NP part of the structure-building process. Proponents of the Marking and Morphing model point out that the plausibility effects uncovered by Thornton and MacDonald (2003) are not due to attraction proper but to 'predicate confusion' (Bock & Miller 1991, Experiment 3) or 'subject inaccessibility' (Foote & Bock 2012). They maintain that encoders often lose track of the subject referent/phrase and then produce a verb form that is inappropriate for it but appropriate for a distracting fake subject. Work by Staub (2009, 2010) is often cited to back up this view.
Though appealing, this explanation seems to ignore the fact that the structures analysed by Staub (downward percolation, relative clauses such as *The cabinets that the key open are on the second floor; see below) are different from the structures manipulated by Thornton and MacDonald (2003), which are PP modifiers like those in all the other studies of attraction. This makes attraction and predicate confusion very difficult to tease apart in practice. It is also important to remember that in the Thornton and MacDonald study, no completion produced complex noun phrases with actual inverted order (that is, with the wrong subject as the head of the phrase), something that would be expected if participants had truly taken the second noun for the head of the overall structure. Also, as Franck (2011) notes, the model lumps together under Morphing a large number of operations whose relative timing profile is not really specified. A conspicuous lack of clarification concerns what seems to be a basic difference between lexical retrieval processes (e.g., the difference between retrieving cat or cats) and the copying of the retrieved information onto the verb stem target (see, for instance, the Selection and Copy model of Franck et al. 2010). Finally, even though the model makes use of the notion of spreading activation, feature percolation during the copying part of Morphing is supposed to be implemented in discrete syntactic trees.8 It is easy to see how, within subject phrases, an errant (lower) feature may be inadvertently passed too high up the NP, but it becomes more difficult to envisage the path for a feature to be copied onto the verb stem when the feature in question comes from another referential phrase, as with clitic or object NPs in French: *Il les promènent *'He-sg them-pl walk-pl' (see Franck et al. 2006, 2010, 2015, for an explanation in terms of intermediate derivations). Additionally, feature percolation in discrete trees is also hard to reconcile with two other facts: 1. that pronouns are also subject to attraction (Bock et al. 1999);9 and 2. that attraction occurs in so-called downward percolation, as in *the books that the government want are sold out (see Staub 2010; also Wagers et al. 2009; Lago et al. 2015 on comprehension; section 3.2.1.6), since percolation involves going up trees, not down (here the feature of the matrix clause subject phrase, books, travels down to infiltrate the subordinate clause verb, want; see Villata et al. 2018: fns 8 and 10). All in all, despite its sophistication and relative flexibility, Marking and Morphing is not malleable enough to provide an explanation for various experimental findings, as its strict seriality seems to be at odds with some of the facts. Moreover, the timing of operations that this seriality entails makes for contradictory predictions (Smith et al. 2018; also Franck 2011).

3.2.1.3 The cue-based Working Memory Model: retrieval in production

If unification-based theories of agreement are static and Marking and Morphing makes room for a dynamic feed-forward component, the third model of agreement production, the Working Memory Retrieval Model, or WMRM for short (Lewis & Vasishth 2005; Badecker & Kuminiak 2007; see Vasishth et al. 2019 for a review), is essentially a feed-back account. Based in part on the Optimality Theory notion of violable constraints (Bresnan 2001; McCarthy 2002; Prince & Smolensky 2004), and on computationally oriented linguistic frameworks such as Head-Driven Phrase Structure Grammar (Pollard & Sag 1994), with their emphasis on morphosyntactically tagged lexical representations, the model's main tenet is its reliance on working memory. As Badecker and Kuminiak (2007: 68–69) point out, the incrementality of sentence production causes some lexical items to fulfil grammatical roles before others. When choosing the form of new elements of a sentence structure, the syntactic formulator and/or parser may need to consult co-constraining relations that specific words or phrases enter into as regards their grammatical role and their position in the syntactic structure. In this sense, retrieving information about details of earlier constituents involves working memory. They note that attentional interference effects also support the idea that agreement production involves constant looking back (Hartsuiker & Barkhuysen 2006). It is assumed that subjects stay in the focus of attention as long as they are close to their verbs but may be shunted from that focus when further material intervenes (Franck & Wagers 2020). As in working memory–based accounts of parsing (Gordon et al. 2001; Lewis & Vasishth 2005; Wagers et al. 2009), WMRM maintains that as long as the inflected form of a target depends on the morphosyntactic features of a previous trigger, that trigger must be inspected and isolated from other constituents in the ongoing representation. In ordinary S-V agreement, for instance, only the actual subject "will resonate to these retrieval cues", in principle (Badecker & Kuminiak 2007: 69). Relying also on the notion of similarity-based interference (Lewis et al. 2006), the model proposes that the more subject-like a local noun is in terms of linear or structural position, case marking, etc., the more likely it is to 'resonate' to the cue-based retrieval mechanism. Attraction effects are thus interpreted as failures of the cue-based retrieval process. Evidence compatible with the model comes from research showing that form ambiguity can influence agreement choices and that attraction is harder to attest when head and attractor have clearly different formal case marking (Badecker & Kuminiak 2007; Lorimor et al. 2008). For instance, Vigliocco et al. (1995) showed that morphologically transparent plurals in Italian produce fewer errors than ambivalent ones. Vigliocco and Zilli (1999) also found a facilitatory effect of transparent versus ambiguous gender marking in predicative adjectives. In German et al. (2003) found that when local noun phrases occurred in the accusative case (as in Die Stellungnahme gegen die Demonstrationen 'the position against the demonstrations'), they attracted number agreement more than when they were dative (as in Die Stellungnahme zu den Demonstrationen 'the position on the demonstrations').
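A toy version of cue-based retrieval may help to show why case ambiguity matters. The sketch below assumes that, at the verb, memory is probed with two cues (nominative case and subject position), that each stored noun phrase scores one point per cue it is compatible with, and that retrieval probabilities follow a softmax over those scores. This is a deliberate simplification: ACT-R-style models such as Lewis and Vasishth (2005) compute activation with decay and noise. Every name and value here (Chunk, match_score, retrieval_probabilities) is hypothetical. Retrieving the plural distractor instead of the singular head stands in for an attraction error.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    """A noun phrase held in memory (hypothetical, minimal feature set)."""
    label: str
    number: str                # "sg" or "pl"
    cases: frozenset           # cases the surface form is compatible with
    in_subject_position: bool

def match_score(chunk: Chunk, cues: dict) -> int:
    """One point for each retrieval cue the chunk is compatible with."""
    score = 0
    if cues["case"] in chunk.cases:
        score += 1
    if cues["subject_position"] and chunk.in_subject_position:
        score += 1
    return score

def retrieval_probabilities(chunks: list, cues: dict) -> dict:
    """Softmax over match scores: better-matching chunks are retrieved more often."""
    weights = [math.exp(match_score(c, cues)) for c in chunks]
    total = sum(weights)
    return {c.label: w / total for c, w in zip(chunks, weights)}

# Cues the verb uses to find its controller.
cues_at_verb = {"case": "nom", "subject_position": True}

head = Chunk("die Stellungnahme", "sg", frozenset({"nom", "acc"}), True)
acc_local = Chunk("die Demonstrationen", "pl", frozenset({"nom", "acc"}), False)
dat_local = Chunk("den Demonstrationen", "pl", frozenset({"dat"}), False)

for distractor in (acc_local, dat_local):
    probs = retrieval_probabilities([head, distractor], cues_at_verb)
    print(f"{distractor.label}: P(plural distractor retrieved) = {probs[distractor.label]:.2f}")
# die Demonstrationen: P(plural distractor retrieved) = 0.27
# den Demonstrationen: P(plural distractor retrieved) = 0.12
```

Under these assumptions, the accusative distractor partially matches the nominative cue because die is case-ambiguous. It is therefore retrieved more often than the unambiguously dative one.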
This is presumably because the determiner in the plural, accusative local noun phrase ‘die Demonstrationen’ is case-ambiguous (as die is used for both nominative and accusative plural NPs), thus causing a cue-based system eager to fnd a nominative subject to incorrectly select the local noun instead. Lorimor et al. (2015) have recently re-examined the data in Foote and Bock (2012), which focuses on the role of morphology in attraction cross-linguistically and cross-dialectally, and they have found out that attraction errors in Spanish become less numerous when the two competing nouns in the typical attraction confguration difer in grammatical gender. It is newsworthy that grammatical gender does not play any part in Spanish subject-verb agreement relationships, but it can apparently serve as a cue if it matches the feature held in content-addressable memory and may thus serve to speed up access by simply diferentiating the nouns. Finally, the model makes a case for its ability to deal with hierarchical and structural depth efects (Franck et al. 2002 studied three-NP confgurations, and found that the more deeply embedded N2 caused more interference than N3) in terms of memory decay, and it may explain why attraction is possible from constituents (like objects) that are not inside the subject NP by stressing the fact that a content addressable mechanism simply requires the existence in memory of a constituent bearing some similarity to the subject NP (Franck & Wagers 2020). On this general cognitive view, there is no need for a specifcally linguistic mechanism that specializes in feature-tracking through specifc syntactic trees

Agreement

85

(as in feature-passing or feature-percolation accounts; e.g., Franck et al. 2006, 2010) and no serial processing. In a content-addressable architecture, the role of schematic syntactic confgurations is apparently not seriously contemplated as a cue (see Franck et al. 2015; but also Franck & Wagers 2020 for the interplay of memory and structure; see also Qian & Jaeger 2012 and Futrell et al. 2020 for compatible views on decay of cue efectiveness and information locality efects/ progressive noise, respectively). There is no question that short-term memory plays a crucial role in syntactic processing in general. In fact, various experimental psychologists believe the human mind contains a short-term memory component that is exclusively syntactic in nature (e.g., Fiebach et al. 2002). Additionally, research showing similarity-based interference is quite solid. However, fnding evidence for memory efects, similarity or even violable constraints does not mean that that is all there is in agreement computations, at least in production (see section 3.2.1.6 on comprehension). There are in fact both conceptual and practical obstacles to the view that that is all – or most – there is. Conceptually, the same objections raised against unifcation-based views of agreement apply here. The fact is that from a psycholinguistic perspective it makes as little sense to deny the existence of powerful looking-back biases as to deny the existence of powerful lookingforward ones: priming efects are clear, solid evidence of the latter and though the ACT-R architecture (e.g., Cowan 2001) that the model relies on contains a predictive component, this is much less conspicuous in actual work on the model. De facto, most of the explanations of results that proponents of the model invoke involve looking back only. Also on conceptual grounds, WMRM assumes that the encoding of nominal number is accurate but that the later process of accessing a number feature is error prone (see also Wagers et al. 
2009). One may ask: why can the nominal part of the structure-building process not go wrong? On theoretical grounds, it is hard to see what principle would protect NP-assembling processes from failure (see below). Additionally, there is a legitimate concern that a substantial part of what WMRM takes to be retrieval interference may turn out to be encoding interference instead. Villata et al. (2018) claim that only two studies to date provide unequivocal evidence of retrieval interference (Van Dyke & McElree 2006; Belletti et al. 2012). They also claim that various studies used to argue for a cue-based account can be explained in terms of encoding dynamics. The best evidence for this interpretation comes from data showing similarity effects between a target and a distractor for features that play no part at the point of resolution. We referred above to Lorimor et al.’s (2015) re-examination of Foote and Bock’s (2012) production study on Dominican Spanish, Mexican Spanish and English. They found that attraction errors in Spanish were more likely when the two competing nouns in the preamble shared the same grammatical gender. Errors are registered at the verb in Spanish, but the verb codes no gender distinction in that language. In French, as Franck (2011) observes (and as noted above), displaced preverbal objects interfere with agreement even in structures

86 Agreement

that do not code participle agreement. Although we focus on production here, it may be added that Villata et al. (2018) found a facilitatory attraction effect at the verb in a reading study. The experiment manipulated the gender of the attractor and the head noun in Italian object RCs (e.g., ‘The ballerina-fem [that the waiter-masc / the waitress-fem has surprised]’). The participle was read faster when the attractor and the head noun had different genders than when their genders matched. Again, past participles in Italian object relative clauses do not code gender agreement, so this feature could not possibly have been used as a retrieval cue at the participle itself. Encoding interference emerges independently of retrieval, during the initial encoding of structure. In attraction studies, it should thus be observable before the verb appears, or in setups in which the manipulated feature is not a retrieval cue. In the face of these data, it is hard to maintain that retrieval (an indisputable factor in any case) takes the lion’s share of the interference agenda. Another issue the model needs to resolve is what counts as a retrieval cue. It has been argued that its lack of fine-grained semantic features makes it inadequate to account for various findings (Villata et al. 2018; Smith et al. 2018). Villata et al. give an example. Barker et al. (2001) found that plural completions were more likely in (20) than in (21) above, repeated below: (20) The canoe near the sailboats … (21) The canoe near the cabins … This finding might be explained simply by the fact that the two nouns in (20) share a large part of their semantics. Finally, the model assumes basically the same mechanisms for production and comprehension. Yet the encoding dynamics of the two are distinct enough to make room for at least some differences (Gillespie & Pearlmutter 2011: 50; Acuña-Fariña 2012; Acuña-Fariña et al. 2014; Tanner et al.
2014; Villata & Franck 2016). Production starts in conceptual structure, whereas comprehension starts in form (and, in alliterative languages, very often in redundant, exuberant, conspicuous form of the -o, -o, -o vs -a, -a, -a type). This makes priming processes much more likely to occur in the former and form biases much more likely in the latter. Such a bias is the ‘instinct’ to expect a morphological cue, say ‘masc’, on all morphology-carrying constituents once a value for a feature has been set. An example is the Spanish phrase todos esos chicos altos ‘all those tall boys’, which contains eight identical gender and number cues (Berg 1998; Acuña-Fariña 2009, 2012). From a speaker’s perspective, the need to reactivate a subject cue at the verb seems less stringent, given that the speaker already knows what s/he intends to communicate. In comprehension, on the other hand, the listener must work out the basic thematic structure of a predication (who did what to whom). The only way to know who the, say, agent-subject participant is finally going to be is by paying attention to the form cues that assemble phrasal packages and that
separate that participant from all others in the ongoing scene. This cue-driven process thus becomes essential. Attention to form cues is especially important in languages where word order is not a strong predictor of structure (e.g., Romance languages). One crucial piece of evidence is relevant in this respect. Vigliocco et al. (1996b) found one of the earliest distributivity effects on record in production. In an eye-tracking experiment using the same materials, Acuña-Fariña et al. (2014) found none in comprehension (see their table 7, p. 119). This suggests that semantic interfacing is much more likely in production and that, conversely, something else is more important in comprehension: attention to form cues. Overall, the model seems much better framed with comprehension in mind; its ability to capture the facts of production is less evident (see section below). Beyond these theoretical concerns, the practical concerns relate to findings that are hard to reconcile with the premises of the model. Strong attraction effects caused by object clitics in French (Fayol et al. 1994; Franck et al. 2006, 2010, 2015) are particularly problematic. The clitics manipulated do not exhibit any of the confounding features that a cue-based retrieval mechanism might mistakenly associate with subjecthood. They are not inside the subject phrase but outside it, as an independent constituent. They are unambiguously marked for non-nominative case (either accusative or dative). And they are not NPs (see Hartsuiker et al. 2001 for object interference in Dutch). (22) below is from Franck et al. (2010): (22) Le sénateur les REÇOIT ‘The senator them RECEIVES (The senator RECEIVES them)’ Franck et al. (2010, experiment 1) have also shown that the syntactic configuration itself may affect error rates.
For instance, they registered attraction in the displacement environment of (23), where the object phrase (aux patientes ‘to the patients’) appears to the left of the relative clause verb (guérit ‘cures’). They registered none in the ‘movement-less’ environment of (24), where a similar phrase is ‘generated in situ’10: (23) Jean parle aux patientes que le médicament GUERIT ‘John speaks to the patients that the medicine CURES’ (24) Jean dit aux patientes que le médicament GUERIT ‘John says to the patients that the medicine CURES’ In addition, both linear and, especially, structural depth effects have been reported when CNPs contain a head and two modifying nouns instead of only one (e.g., the helicopter for the flight over the canyons). These effects are difficult for the model to explain, since its reliance on the notion of memory decay is merely hinted at rather than elaborated. Concerning linear effects in particular, recency of activation predicts that NP3 should be more likely to be erroneously retrieved than NP2 (because NP3 has been produced more recently), but this is the opposite of
the pattern found in, for instance, Gillespie and Pearlmutter (2011, experiment 2). Similarity-based interference does not appear to differentiate between NP2 and NP3 either. This is especially true of flat structures such as the highway to the western suburbs with the steel guardrails, where both nouns modify the head. They contrast with the cascading type, as in the backpack with the plastic buckle on the leather strap, where NP3 modifies NP2 and NP2 in turn modifies NP1 (see Gillespie & Pearlmutter 2011).11 In sum, accounting for looking-back processes in the production of agreement is necessary but not sufficient. Moreover, the model does not sufficiently differentiate retrieval from encoding interference. It does not seem to have much to say about two factors that play an important role in agreement computations (more on this below). One is notionality, as in the solid distributivity effects captured by Marking in the M&M model, or the fine-grained semantic features that have been shown to affect agreement decisions. The other is morphological complexity. Both have now been abundantly attested. Finally, much (but not all) work on the model has suggested that the syntactic configuration in which cues occur is irrelevant. That claim is also hard to square with the available evidence (on the last two issues, see below; also Bock & Middleton 2011).
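
The recency prediction discussed above can be made concrete with the base-level activation equation of ACT-R, the architecture on which the model relies: B = ln(sum of t_j^(-d)). Here t_j is the time elapsed since each use of an item, and d is a decay rate conventionally set to 0.5. The following is a sketch under stated assumptions, not the model’s own implementation. The timings are invented, each noun is assumed to have been encoded exactly once, and no cue matching is modelled. Under decay alone, the more recently encoded NP3 ends up more active than NP2. That is why a decay-based account predicts more NP3 intrusions, the reverse of the pattern reported by Gillespie and Pearlmutter (2011).

```python
import math

def base_level_activation(seconds_since_each_use, decay=0.5):
    """ACT-R base-level learning: B = ln(sum over past uses of t ** -decay)."""
    return math.log(sum(t ** -decay for t in seconds_since_each_use))

# Hypothetical timings for "the helicopter for the flight over the canyons",
# counted back from the moment the verb is produced (assumed, not measured).
np2 = base_level_activation([1.2])   # flight: encoded earlier
np3 = base_level_activation([0.6])   # canyons: encoded more recently

print(f"NP2 = {np2:.3f}, NP3 = {np3:.3f}")   # NP2 = -0.091, NP3 = 0.255
```

With a single use, the equation reduces to B = -d ln t. Any item encoded more recently is therefore more active, whatever its structural position.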

3.2.1.4 The scope of planning + semantic integration account

Another view of attraction (not really a full-fledged theory or model) combines a memory-based account similar to WMRM with the notion of semantic integration (Solomon & Pearlmutter 2004; Gillespie & Pearlmutter 2011; see also Gillespie & Pearlmutter 2013). With incrementality in mind, this ‘scope of planning’ model (SoP for short) assumes that not everything is available at once. What is available depends not only on how long ago it was activated but also on how semantically integrated it is. For instance, the two nouns in a phrase such as the drawing of the flowers are more integrated than those in a formally similar phrase such as the drawing with the flowers. In the former, the flowers is an argument of drawing, so drawing the flowers forms an integrated predicational scene. In the latter, the drawing and the flowers are not tied together in the same predicational scene. The former should therefore, in principle, create a more propitious environment for attraction mistakes. Memory is measured as a function of distance to the head noun. In essence, then, the scope of planning account relies on the degree to which elements are planned in overlapping fashion (“in the pipeline at the same time”, as it were; Brehm & Bock 2013: 151). This overlap creates competition to relate to the upcoming predicate in the establishment of a referential and a topical chain (is the message going to be about the drawing or about the flowers?). Parallel retrieval is hypothesized to be weaker for less-integrated chunks. Unlike WMRM (whose connection with classic work on language comprehension is more obvious), SoP links the findings on agreement error production to the tradition of research on other kinds of errors in both spontaneous
and experimentally elicited speech. Indeed, exchange errors provide solid evidence that multiple elements of an utterance are active simultaneously during production (Garrett 1975, 1980). This indicates that speakers plan fragments of their messages well in advance of actual articulation. Under a scope of planning account, then, a modifying noun can create interference only if it is within the scope of planning of the head noun when the number specification of the entire subject NP is determined. In this sense, the timing of planning of elements within a phrase is determined by the order in which these elements must be output, but semantic integration (as in the drawing of the flowers) may shift that order. This is in line with constraint satisfaction accounts of processing, which make room for multiple constraints interacting and competing for activation. These accounts rely on architectures that keep syntax in check, so to speak, in that language production and processing are seen as involving much more than just trees (e.g., MacDonald 1994). Solomon and Pearlmutter (2004) consider the structural depth effects of Franck et al.’s (2002) classic study, where, it will be recalled, N2 caused more interference than N3 in a three-NP configuration. They reason that these effects can be explained without any appeal to syntactic, hierarchical structure. They obtained post hoc ratings of semantic integration for Franck et al.’s stimuli and showed that semantic integration was confounded with syntactic distance: “N1 and N2 were significantly more integrated than N1 and N3, predicting correctly that the N2 mismatch effect should be larger than the N3 mismatch effect, because N1 and N2 would be more likely to be planned simultaneously than N1 and N3” (Gillespie & Pearlmutter 2011: 8). Semantic integration is calculated via prior norming studies.
For instance, the norming study in Gillespie and Pearlmutter (2011), which we discuss below, is described as follows: The first norming survey, completed by 117 participants, was used to ensure that the preambles controlled semantic integration as desired. The 12 different versions of each of the 63 candidate stimulus items (4 number conditions × 3 possible rating pairs (N1-N2, N1-N3, N2-N3)), along with the 24 fillers, were rated for integration following the procedure described in Solomon and Pearlmutter (2004b). Participants rated integration of the two underlined nouns in each preamble, using a 1 (loosely linked) to 7 (tightly linked) scale. The instructions included example phrases (the ketchup or the mustard and the bracelet made of silver) and indicated that although ketchup and mustard are similar in meaning, they are not closely related in the particular example phrase, in contrast to bracelet and silver, which are closely related in the example phrase. The 12 versions of each candidate item for rating were counterbalanced across 12 rating lists such that exactly one version of each stimulus item appeared in each list. The 87 preambles in each list were presented over 5 printed pages, and the pages of each list were randomized separately for each
participant. Each participant rated the stimuli in one list, and 9–10 ratings were thus obtained for all but one version of one stimulus item (which had only 8). (p. 13; PDF) Gillespie and Pearlmutter (2011) conducted two studies that elicited subject-verb agreement errors. These studies tested the hierarchical feature-passing account (i.e., M&M) against three timing-based alternatives: linear distance to the head noun, semantic integration, and a combined effect of both (a scope of planning account). In their first experiment, the stimuli consisted of a head NP followed by two PPs. The first PP modified the first NP, and the second PP modified one of the two preceding NPs (the descending or cascading condition: e.g., the backpack(s) with the plastic buckle(s) on the leather strap(s)). Semantic integration between the head noun and the local noun within each PP was held constant across structures. Errors indicated an effect of linear distance to the head noun and no influence of hierarchical distance. In their second experiment, both PPs modified the head noun (the flat condition: e.g., the book(s) with the torn page(s) by the red pen(s)). Both the order of the two PPs and the degree of semantic integration of local and head nouns were varied. The analysis of errors revealed a combination of semantic integration and linear distance to the head noun. The authors concluded that agreement processes are strongly constrained by “grammatical-level scope of planning operations”. They also concluded that local nouns planned closer to the head have a greater chance of interfering with agreement computations. Thus, the model maintains that there is no feature passing over tree structures and nothing special about the way agreement is processed. Memory and semantic integration relate to general cognitive abilities. Hierarchical factors (syntax) are considered “irrelevant” (Gillespie & Pearlmutter 2011: 39). One may presume that the Barker et al.
(2001) effects referred to above (the canoe near the sailboats vs the canoe near the cabins) might also be interpreted in terms of semantic integration. Even recent accounts like SOSP (Smith et al. 2018), which make use of objectively measured semantic features, might offer a refinement of it. Some of the criticisms raised against WMRM apply to the scope of planning account as well. The model makes little room for (anticipatory) feed-forward processes. Nor does it allow for the possibility that assembling a syntactic structure of the nominal kind (an NP) may go wrong. NPs may contain determiners (articles, quantifiers, deictics, etc.), heads (simple, derived, compound, etc.), modifiers (adjectival, nominal, prepositional, etc.) and complements. They may also contain quasi-complements, which may in turn be adjectives, nouns, prepositions, clauses, etc. And they may involve a great deal of referential competition. The heads themselves may be predicates with an argument structure (news, feeling, observation, etc.) or not (table, car, weather, etc.). The same applies to any of the nouns inside a modifying or complementing layer of the overall structure. There is simply no reason to believe that putting all of that in place must be error-free.
In the second place, maintaining that hierarchical factors are irrelevant is at odds with the fact that attraction is domain sensitive. It is generally significantly more pronounced with phrases than with clauses, even when constituent size is controlled for (Bock & Cutting 1992; but see Gillespie & Pearlmutter 2013). It is also at odds with the typical asymmetric pattern of attraction: sg + pl combinations attract but pl + sg ones do not. This markedness effect affects only form. The same objection arises from the fact that head pre-eminence effects have been widely attested (see Franck 2011). For instance, collectivity (e.g., committee), a semantic dimension, modifies agreement when it is carried by the head noun but not when it is carried by the modifier noun (Bock & Eberhard 1993). We have already mentioned that semantically based (biological) gender is less prone to miscalculation than arbitrary gender when it is carried by the head noun, but not when it is carried by the modifier noun (Vigliocco & Franck 1999). Evidence from partial stranding also points to the head. In such cases, the inflectional features of either the first or the second noun get stranded, rather than both. It is the inflection of the first noun, the head, that is stranded, not that of the second one (Igoa et al. 1999).12 Indeed, grammar and processing may not be completely isomorphic (Phillips et al. 2011). Even so, if there is such a thing as a head in grammar, it would be strange to find its psychological reality ‘completely irrelevant’. We have already mentioned the Franck et al. (2010) research showing that similar intervening material causes attraction in object relatives (involving movement) but not in complement clauses (without movement). This fact places syntactic structure squarely in the focus of attention. Additionally, the many inflectional properties of the local noun that have been shown to have effects also indicate that inflectional processing is relevant.
For instance, attraction strength has also been shown to be modulated by contrastiveness. Invariant plurals (such as scissors or suds) attract less than ordinary plurals, which have a singular counterpart (e.g., door/doors; Eberhard et al. 2005). This speaks to the role of inflectional calculations per se. In such experiments the local noun occupies the same position relative to the head. However, its form has an effect on some occasions (in the face of contrastiveness, for instance) but not on others. Form can hardly be immaterial, then. Finally, form is essential in that languages with stronger or weaker morphology are differentially sensitive to semantic interference. That interference is robust, indeed cross-linguistically indisputable, and traditionally the single most disturbing factor in the grammar of agreement (see below).13 Last but not least, the entire logic of semantic integration in attraction has been called into question on seemingly solid grounds (Brehm & Bock 2013: 151): Attraction is spurious plural agreement. But integration, as we see it, creates representational unity, and unity is singular. Singular agreement is thus a plausible repercussion of integration, and is what would be predicted from a notional number effect. In explicit terms, stronger integration should yield more singular construals and more singular agreement, and weaker integration should yield more aggregate construals and more
plural agreement. With pronoun agreement, this is the result that has been found for Complex Reference Objects in language comprehension (e.g. Eschenbach et al. 1989; Patson & Warren 2011) and language production (Bock et al. 2004), where analogs of weak integration promote plural number.
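
The norming study by Gillespie and Pearlmutter (2011) quoted above distributed 12 versions of each of 63 items over 12 lists, with exactly one version of each item per list. Each list also contained 24 fillers, and 117 raters took part. A Latin-square rotation is one standard way to satisfy those constraints. The quotation does not say how the lists were built or how participants were assigned to them, so both the rotation and the round-robin assignment below are assumptions. The sketch only checks that such a scheme yields the 87 preambles per list and the 9 to 10 raters per version mentioned in the quotation. The one version with only 8 ratings is not modelled.

```python
from collections import Counter

N_VERSIONS = 12       # 4 number conditions x 3 rating pairs (N1-N2, N1-N3, N2-N3)
N_ITEMS = 63          # candidate stimulus items
N_FILLERS = 24
N_PARTICIPANTS = 117

def latin_square_lists(n_items, n_versions):
    """Rotate versions across lists: list l receives version (item + l) % n_versions."""
    return [[(item, (item + l) % n_versions) for item in range(n_items)]
            for l in range(n_versions)]

lists = latin_square_lists(N_ITEMS, N_VERSIONS)

# Every list contains exactly one version of each item ...
assert all(sorted(item for item, _ in lst) == list(range(N_ITEMS)) for lst in lists)
# ... and every (item, version) pair occurs in exactly one list.
assert len({pair for lst in lists for pair in lst}) == N_ITEMS * N_VERSIONS

print("preambles per list:", N_ITEMS + N_FILLERS)            # 87

# Hypothetical round-robin assignment: participant p rates list p % 12.
per_list = Counter(p % N_VERSIONS for p in range(N_PARTICIPANTS))
print("raters per list:", sorted(set(per_list.values())))    # [9, 10]
```

Each version appears in exactly one list, so the number of ratings per version equals the number of raters assigned to that list. Nine lists receive 10 raters and three receive 9.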

3.2.1.5 On morphology

Kathryn Bock’s team developed the M&M model of attraction in the early years of the present century and produced much work before and after that. Starting in 2008, the team has put forward the Morphological Filtering Hypothesis, or MFH for short. The MFH is not, in principle, a model in itself; in fact, its proponents view it simply as a component of M&M (see Acuña-Fariña 2012 for a qualification of that idea). It captures the simple notion that a language’s morphological structure is likely to condition its agreement operations, including the aberrant operation of attraction. Popular ideas in the 1990s held otherwise (Vigliocco et al. 1996a, b; see above). Against them, the MFH proposes that the stronger the morphological component of a language, the better insulated that language is likely to be from semantic interference. Conversely, if a language has a comparatively weak morphosyntax, semantic interference is not filtered out so easily, which results in agreement mistakes.14 To my knowledge, this idea originated in a small number-completion study published by the linguist Thomas Berg in Linguistics in 1998. Using a fill-in-the-blanks questionnaire, Berg compared German and American English responses to a series of well-known agreement uncertainties. He found that the two languages patterned differently. Consider so-called number transparent nouns (Huddleston & Pullum 2002: 501f.) like bunch, series, number, or lot. They tended to establish semantic agreement in American English (e.g., a series of reports were issued). In German, they tended to establish formal singular agreement with the singular head (eine Reihe von Berichten wurde erstellt ‘a series of reports was issued’).
Berg suggested that a rich morphology can hold back a prior conceptualization, whereas a poor one cannot (the phrase a series of reports ‘is about’ reports, not about a series). His reason was that a rich morphology must be put in place repeatedly in the actual performance of its speakers and that, essentially, practice makes perfect. That is, frequency moulded the agreement operations of these rich-inflection languages, effectively making them less, rather than more, error-prone (Haskell et al. 2010). Riveiro-Outeiral and Acuña-Fariña (2012) used the same methodology to compare (Germanic) English and (non-Germanic) Spanish. Interestingly, they found that Spanish patterned like German and unlike English. This result is consistent with the morphological richness theory but not with language family affiliation. Notice that the Bergian data and the Spanish follow-up involve grammaticalized interference, that is, cases where a given language sanctions a mismatch as legal (as in, for instance, the committee-sg are-pl satisfied with the decision in English).
What the Bock team found starting in 2008 was that the same thing seems to happen in attraction, where the mismatch is illegal. Lorimor et al. (2008) first compared English (poor morphology) and Russian (very rich morphology, including six cases in the nominal paradigm) and found two things. First, the sheer rates of malfunction were statistically higher in English. Second, and more importantly, errors driven by the distributivity of the preambles were even more numerous in English, an evident sign of semantic infiltration. Such preambles code a formally singular but underlyingly plural conceptualization, as in the flag on the buildings. Then, in a meta-study, Lorimor et al. compared those data with the evidence on attraction accumulated up to that time. They uncovered a cline that neatly corresponded with the morphological spectrum. The language most likely to exhibit errors and a greater distributivity effect was the one with the poorest morphology (English). It was followed by Dutch (a little more morphology), then the Romance languages French, Italian and Spanish (much more morphology), and finally Russian (the strongest morphological component). Foote and Bock (2012) offered the first investigation of cross-dialectal differences in attraction. They compared two varieties of Spanish that differ in morphological strength. Mexican Spanish has an intact morphological component. Dominican Spanish has an eroded morphology that often no longer codes many word endings (Toribio 2000; Lunn 2002). They found that Dominican behaved more like English than like Mexican, in that distributivity had a greater effect in Dominican than in Mexican.15 Acuña-Fariña (2018b) has recently explored the MFH, which he calls the Leaking and Blocking Theory of Agreement. He ran four completion studies with two other dialectal varieties of Spanish and two of Portuguese.
In each language, the two varieties differ in morphological strength. The Spanish spoken in the south of Spain (in Andalucía) shows severe morphological erosion, whereas the Spanish spoken in the north (in Galicia) preserves its morphology fully. The difference is even more conspicuous between European and Brazilian Portuguese. The extensive morphological erosion of Brazilian Portuguese is well known and is, in fact, greater than that of Dominican (Rodrigues 2002; Holmberg et al. 2009). The tests showed robust effects of morphology in the predicted direction. In particular, both region (Portugal versus Brazil, Andalucía versus Galicia) and preamble type (distributive versus non-distributive) had a significant impact on the probability of agreement derailing. In the Andalucía versus Galicia comparison, the probability of making an error was four times higher in completions of distributive preambles in the former. As for the probability of making an error by region, in Galicia it was half that in Andalucía. In the Portugal versus Brazil comparison, the probability of malfunction was twice as large in sentences with distributive preambles. That probability was also four times larger in Brazil than in Portugal. It must be noted that Bock et al. (2012), centred on a comparison of Spanish and English, found only a small, non-significant difference between
the two languages in the distributive preambles: 10.5% versus 11.3%, respectively (in both cases much higher than in the non-distributive ones). Yet, all in all, the accumulated evidence points, cautiously, to the view that morphology does play a relevant role in agreement computations. Far from granting superior (qua continual) access to conceptual structure, that role may be to create agreement schemas (e.g., [-Xo/-X’o/-X’’o]). These schemas would behave as neural avalanches (MacWhinney 2001: 459; Acuña-Fariña 2009) in serving to assemble phrasal packages automatically. As is well known, in English that role is taken on by more determinate tree geometries (Acuña-Fariña 2009, 2018a, 2018b, 2018c; see section 3.3 on comprehension). Such assembly would surely be impeded by a constant and unnecessary tapping of a conceptual substrate, for two reasons. First, that tapping would potentially be of massive proportions. The Spanish phrase todos estos libros viejos y rotos ‘all these old broken books’ contains ten number and gender features, five of each, and is pronounced in a little over a second. Second, for arbitrary gender in particular (present in most nouns in Spanish), there is no conceptual substrate to tap in the first place. Libro ‘book’ is masculine in Spanish for no conceptual reason at all, synchronically at least. If this view is correct, then the rich morphological components of so many languages in the world would appear not as a ‘dysfunctional’ waste (Taylor 2002: 332) but as a preferred, syntactically oriented means of clause construction.
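
The redundancy of this ‘alliterative’ pattern can be illustrated by counting overt endings, as in the eight gender and number cues of todos esos chicos altos mentioned earlier in this chapter. The function below is a hypothetical, deliberately naive helper, not a morphological analyser. It assumes that a regular -o/-a ending marks gender and that a following -s marks plural. It looks at endings rather than actual feature values. It also misses words such as ese, which is masculine but lacks -o.

```python
def agreement_cues(phrase):
    """Count overt gender and number endings in a Spanish phrase (naive sketch).

    A word ending in -os/-as contributes one gender cue and one number cue;
    a word ending in -o/-a contributes a gender cue only. Only endings are
    inspected, so irregular nouns (e.g. mano, dia) are not handled properly.
    """
    counts = {"gender": 0, "number": 0}
    for word in phrase.lower().split():
        if word.endswith(("os", "as")):
            counts["gender"] += 1
            counts["number"] += 1
        elif word.endswith(("o", "a")):
            counts["gender"] += 1
    return counts

print(agreement_cues("todos esos chicos altos"))   # {'gender': 4, 'number': 4}
print(agreement_cues("todo ese chico alto"))       # {'gender': 3, 'number': 0}
```

The singular example shows the helper’s limits. Ese agrees in gender but carries no -o, and singular number is unmarked, so the naive count under-reports the features actually present.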

3.2.1.6 Attraction in comprehension

Work on attraction in comprehension is more recent and scarcer. It originally sought to determine whether the setup that produced attraction mistakes in production also revealed a pattern of difficulty in reading. If so, a slowdown is predicted around the verb region for phrases like the label on the bottles is red. Crucially, no such slowdown is predicted for phrases with no plural attractor intervening, like the labels on the bottle are red or the label on the bottle is red. Likewise, ungrammatical strings like the label on the bottles are red were predicted to generate an illusion of grammaticality (Dillon et al. 2013; see also Tanner et al. 2014). The reason is that the system would accept the momentary linking of the plural noun bottles and the plural verb are on a statistically significant number of trials. Early work by Nicol et al. (1997) and Pearlmutter et al. (1999) supported these predictions. For instance, Pearlmutter et al. ran two self-paced reading studies and one eye-tracking study. They showed that a mismatch between the head noun and the local noun did interfere with verb agreement processes in normal reading. It created difficulty in grammatical sentences and reduced difficulty in ungrammatical ones. In the eye-tracking data, these effects were clear at the region following the verb. In the ungrammatical condition, some ungrammatical verbs were preceded by plural attractors and others by singular attractors. With plural attractors, total reading times were shorter and regressions were fewer (i.e., there was facilitation, or a grammatical illusion). In the grammatical condition, however, the opposite result emerged: longer total
reading times and more regressions with plural attractors (that is, processing difficulty). The authors concluded that something analogous to what Bock and colleagues had envisaged for production was also at work in comprehension. This was an overwriting operation in which the feature of the local noun overwrites that of the head noun, causing the representation of the entire NP to be plural. In this subfield, some accounts trace the malfunction observed at the verb to a continuous and equivocal representation of the subject NP built before it. These are known as representational (or encoding-based) accounts (e.g., Eberhard et al. 2005; Staub 2009; Hammerly et al. 2019). A key issue in attraction studies today is whether attraction occurs in both production and comprehension. In production it is indeed very strong; in comprehension, despite the previous comments, it is still disputed. Wagers et al. (2009) were the first to reason that comprehension studies risk attributing to attraction something that may have very little to do with it. That something is the fact that plural nouns are more difficult to process than singular ones in their own right. Plurals are morphologically and, arguably, conceptually more complex than singulars. There is solid evidence that this complexity is manifested in increased processing times, independently of the syntactic mould they appear in (Lau et al. 2007). In the classic structure that has created the most malfunction, the [sg head + pl attractor] schema, effects are expected at the verb (or very close to it). But the verb comes directly after the plural attractor, whose very plurality may take a toll. This may result in spurious attraction. Wagers et al. did two things to solve this methodological problem. First, they put some distance between the plural NP and the verb by introducing an adverbial element in between (e.g., “The key to the cabinets unsurprisingly was/were…”).
Second, they used a different structure: object relative clauses (RCs), the type referred to above as involving downward percolation. Note that in these the attractor does not linearly intervene between the RC subject and the verb (e.g., “The cabinets that the key opens/open…”, where the attractor is cabinets). The authors reported no attraction effects in grammatical sentences and used these results to argue that attraction occurs only in ungrammatical ones, where it manifests itself as an illusion of grammaticality, as noted above. This is now known as the Asymmetry Effect (that is, the fact that attraction may be detected in ungrammatical but not in grammatical strings). As Lago et al. (2021) observe, in ungrammatical sentences the methodological confound is eliminated because there attraction causes faster reading times at the verb (i.e., facilitation), which cannot logically be attributed to the (independently more difficult) plurality of the attractor. The results of Wagers et al. (2009) have been used by the authors themselves and others to argue for a cue-based model of processing that relies on a content-addressable working memory architecture (see section 3.2.1.3). On this view, interference does not arise because the representation of the subject’s number feature is graded or faulty, but because retrieval of the right agreement controller (the head of the entire phrase) is subject to similarity-based interference from other items in working memory (Gordon et al. 2001;
Lewis & Vasishth 2005; Caplan & Waters 2013; Franck & Wagers 2020). Thus, when the subject NP contains two or more nouns, these compete to control agreement with the verb and, if they mismatch in number, the competition can result in the selection of a syntactically unlicensed agreement controller. That is, retrieval interference arises when multiple items in memory match the retrieval cues (e.g., when “featurally similar information in non-target positions intrudes on retrieval of the target”; Parker & An 2018). The literature offers two versions of the cue-based dynamics (Lago et al. 2015; Patson & Husband 2015). On one view, retrieval is purely error-driven: readers use it as a repair strategy only after they detect a subject-verb number agreement violation. This account predicts that awareness of grammaticality violations should precede attraction effects. The rival view posits that retrieval is always engaged when a verb is reached, which means that readers can compute agreement only after retrieval has taken place. This alternative account therefore predicts simultaneous effects of grammaticality violations and attraction. Despite attempts at presenting the Asymmetry Hypothesis as an Asymmetry Fact (see Schlueter et al. 2018), the binarism it entails is not warranted. For instance, Acuña-Fariña et al. (2014) did find strong attraction effects in grammatical sentences in Spanish, whereas Lago et al. (2015) did not find similar effects in the same language (see below). Hammerly et al. (2019) provide a review of attraction studies in comprehension. Even a cursory look at it reveals a complex picture with gaps that future work needs to address. It turns out that only twenty-seven of the forty-five experiments that directly tested the hypothesis (in some nineteen different publications) found a significant interaction between grammaticality and attractor number. 
Crucially, fifteen of the forty studies that reported effects of attractor number for grammatical and ungrammatical sentences found attraction in grammatical ones (against thirty-two of thirty-five with ungrammatical materials). In their own study, in which they manipulated response bias (through filler item composition and instructions to participants), Hammerly et al. also obtained attraction in grammatical sentences. They used these data to argue for a view of attraction involving a continuous and unstable representation of number, rather than retrieval interference at the verb. Their model accounts for the variability in the attestation of attraction in grammatical sentences through shifts in the decisional starting point. Note that continuous valuation models predict symmetrical effects of mismatch because the number marking of the complex subject phrase is not determined by whether the verb is ultimately singular or plural. This means that unstable number marking with plural attractors should occur in grammatical and ungrammatical sentences alike. Hammerly et al. argue that their model is more parsimonious because it applies equally well to production and comprehension (they call this ‘representational identity’), unlike recent proposals (Acuña-Fariña 2012; Acuña-Fariña et al. 2014; Tanner et al. 2014; Schlueter et al. 2019) according to which the two modalities of processing differ enough for attraction to be dealt with differently in each.
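
The symmetry prediction of continuous valuation models can be illustrated with a deliberately simplified signal-detection sketch. It is not Hammerly et al.'s actual model (whose account of bias is framed in terms of a decisional starting point); the assumptions, all hypothetical, are that the subject's number is encoded as a noisy continuous value, that a plural attractor shifts that value towards 'plural' by an amount `delta`, and that a single `criterion` decides whether the subject counts as plural. Because the encoding is fixed before the verb is read, the attractor changes the acceptance of singular (grammatical) and plural (ungrammatical) verbs by exactly the same amount.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_accept(verb_number: str, plural_attractor: bool,
             delta: float = 0.8, criterion: float = 1.5) -> float:
    """Probability that the verb is accepted as agreeing with a singular-head subject.

    Hypothetical encoding: x ~ Normal(mu, 1), with mu = 0 without a plural
    attractor and mu = delta with one; the subject counts as plural if x > criterion.
    """
    mu = delta if plural_attractor else 0.0
    p_plural = 1.0 - normal_cdf(criterion - mu)  # P(x > criterion)
    return p_plural if verb_number == "pl" else 1.0 - p_plural


def attraction_effects(delta: float = 0.8, criterion: float = 1.5) -> tuple:
    """Attraction effect in grammatical (sg verb) and ungrammatical (pl verb) sentences."""
    # Grammatical: a plural attractor lowers acceptance of the correct singular verb.
    grammatical = (p_accept("sg", False, delta, criterion)
                   - p_accept("sg", True, delta, criterion))
    # Ungrammatical: a plural attractor raises acceptance of the wrong plural verb.
    ungrammatical = (p_accept("pl", True, delta, criterion)
                     - p_accept("pl", False, delta, criterion))
    return grammatical, ungrammatical


g, u = attraction_effects()
print(f"grammatical: {g:.3f}  ungrammatical: {u:.3f}")
```

Moving `criterion` (a crude stand-in for response bias) rescales both effects together but never separates them, which fits the point made above: on accounts of this kind, differences between grammatical and ungrammatical sentences have to be explained by decisional factors rather than by the number representation itself.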

In fact, the Hammerly et al. review reveals more problems than a lack of control of response bias in the literature. The binary distinction entailed by the Asymmetry Hypothesis (there either is or is not attraction in grammatical sentences in comprehension, in all the languages tested and to be tested) cannot seriously be entertained unless much tighter control of experimental conditions across studies can be offered. For instance, the review shows that only six languages have been tested, with English providing a disproportionate amount of the evidence. Crucially, the structures analysed range from the classic PP modifiers (the label on the bottles) to Object Relatives (the notes that the girl writes are …), Subject Relatives (the new executive who oversaw the middle managers was …) and Possessive Relatives (in German and Turkish). Notice that, from the standpoint of a feature percolation account (Eberhard et al. 2005; Abney 2009), PP modifiers involve classic upward percolation (‘plural’ is at the bottom of the structure, in the local noun, and travels up to the top of the phrase, where the head’s features are projected), whereas in object relative clauses percolation is downward (with the attractor appearing before the noun that controls agreement). These are linguistically very different processes (on the role of syntactic depth in attraction, see Franck et al. 2002; Staub 2009; Franck & Wagers 2020 and, for an alternative view, Gillespie & Pearlmutter 2011). Note too that work by Van Dyke and McElree (2011) and Parker and An (2018) suggests that oblique (i.e., prepositional) and core (NP) arguments may also differ in their ability to resist attraction. As for the methodologies, these range from self-paced reading and binary acceptability judgements (the vast majority) to scaled acceptability judgements, ERPs and eye-tracking (Schlueter et al. 2019 combine self-paced reading with a speeded forced-choice task). 
Because a major issue in this research agenda concerns timing, it is crucial to know (a) whether effects are obtained already at the local noun, at the verb, or at the verb + 1 region; (b) whether they are immediate or the result of revisions initiated later; and (c) whether the detection of ungrammaticality must necessarily precede attraction or not. It is therefore unfortunate that only three of the reviewed publications involve eye-tracking (Pearlmutter et al. 1999; Dillon et al. 2013; and Parker & Phillips 2017), a methodology well known to be ideal for examining temporal profiles. All three used English materials. Acuña-Fariña et al. (2014) is not in the review (as it only used grammatical sentences), but it also used eye-tracking and it focused on a language other than English (Spanish). Eye-tracking also allows one to bypass the reservations expressed by Hammerly et al. (2019) about the influence of decisional windows and response bias. Recent work in Spanish by Lago et al. (2021) may help clarify matters, if only because it directly addressed the comparability of materials and methodologies and because it used a large sample of 160 native speakers, which conferred robust statistical power. This work had two direct antecedents: Acuña-Fariña et al. (2014) and Lago et al. (2015). The former used eye-tracking and grammatical sentences only, and it reported robust and fast attraction effects
for PP modifiers in that language, faster in fact than those reported in Pearlmutter et al. (1999) for English (cumulative reading times and first-pass regressions showed reliable differences at the verb region, while first fixation durations showed differences at both the verb region and the following region, reaching significance only at the following region). The latter used self-paced reading with both grammatical and ungrammatical conditions and reported a lack of attraction effects in grammatical sentences with object relatives in the same language (albeit slightly inconsistently across its three experiments, as in one of them weak attraction effects were also observed in the grammatical condition). These are clearly opposite results. Lago et al. (2021) sought to equalize conditions (same language, same structure, same grammaticality setup and same methodology) by combining the type of structure and materials of Lago et al. (2015) (containing object relatives) with the methodology of Acuña-Fariña et al. (2014): eye-tracking. In short, they studied the attraction profile (or lack thereof) in both grammatical and ungrammatical conditions in Spanish noun phrases containing object relative clauses by examining readers’ eye-tracking records. This narrower, more controlled approach provides comparable data that may further illuminate our knowledge of agreement attraction cross-linguistically. It enables a better evaluation of the Asymmetry Hypothesis (as both grammatical and ungrammatical materials were used) and also allows us to better gauge the merits of the two main rival hypotheses: representational versus cue-based accounts. If the former are on the right track, rapid effects before the verb or right at the verb should be observed in grammatical conditions, as well as a non-differential pattern of regressions. By contrast, in cue-based terms, regressions should be nonexistent in grammatical conditions and very evident in ungrammatical ones. 
Since object relatives allow researchers to separate the plural attractor from the verb (Wagers et al. 2009), any observed effect cannot be attributed to the cost of lexical plurality. The following are the experimental conditions used (slashes mark regions of analysis):

(24) GRAMMATICAL, SINGULAR ATTRACTOR
La nota que / la chica / escribió / en la clase / alegró a su amiga.
(25) GRAMMATICAL, PLURAL ATTRACTOR
Las notas que / la chica / escribió / en la clase / alegraron a su amiga.
(26) UNGRAMMATICAL, SINGULAR ATTRACTOR
*La nota que / la chica / escribieron / en la clase / alegró a su amiga.
(27) UNGRAMMATICAL, PLURAL ATTRACTOR
*Las notas que / la chica / escribieron / en la clase / alegraron a su amiga.
Translation: The note(s) that the girl wrote.SG / *wrote.PL in the class cheered up her friend.

The authors reported a clear pattern of intrusion, that is, strong attraction effects. More relevantly, was there attraction in the grammatical materials? The general answer to that is also affirmative, but the hedge ‘general answer’
now suggests that this conclusion needs qualification. An initial conclusion to draw from these new eye-tracking data is that when reading time is measured with greater precision, attraction emerges (in Spanish at least) even in grammatical sentences: the same structures that failed to produce it in Lago et al. (2015) did produce it now. Note that the Hammerly et al. (2019) review had already suggested that the conclusion reached by many researchers, that attraction only shows up in ungrammatical material, was too binary and, therefore, unfounded. It is important to remember that both Wagers et al. (2009) and Lago et al. (2015) did find small attraction effects in grammatical sentences in one of their experiments each. In the context of their other experiments (seven in Wagers et al. and three in Lago et al.), each group attributed these effects to a Type I error. In the larger context provided by Acuña-Fariña et al. (2014), the Hammerly et al. (2019) review and Lago et al. (2021), it now seems evident that attraction is not incompatible with grammatical materials. It seems equally evident that it is much weaker in them. Understanding why is essential to form a realistic, coherent account of the way agreement is processed in reading. In Lago et al. (2021), grammaticality effects were observed from early on across all reading measures, in that ungrammatical verbs elicited more regressions and longer reading times than grammatical verbs in both early (e.g., first-fixation, first-pass) and late measures (e.g., re-reading and total times). Importantly, by contrast, attraction effects mainly affected participants’ regressive eye-movements and later processing measures only. These effects consisted of fewer regressions and faster regression-path and total reading times when the sentences contained a plural attractor. In contrast with Lago et al. 
(2015), attraction effects affected both grammatical and ungrammatical sentences, as evidenced by the lack of an interaction between attractor number and grammaticality. What is really important to understand is that the magnitude of the attraction effects was much smaller in the grammatical than in the ungrammatical sentences and that the measures involved were also different: in the grammatical sentences, attraction became evident in the re-reading of the head noun and the attractor, whereas in the ungrammatical ones it additionally became evident at the verb and the verb + 1 region. In sum, in the grammatical sentences the only evidence consistent with attraction was seen in the re-reading of the nominal (preverbal) material. The explanation for the difference between these results and those of Lago et al. (2015), obtained with the same materials, must therefore lie in the different methodology used. On this point, it is important to realize that a more precise reckoning of reading activity simply provided more power to detect what in Lago et al. (2015) must have been very weak effects (those attributed to a Type I error), but it cannot motivate the view that attraction is the same in grammatical and ungrammatical sentences, for it is extremely easy to see in ungrammatical sentences but much more difficult to see in grammatical ones. A reasonable (non-binary) conclusion is that attraction does exist in grammatical sentences, at least in Spanish NPs containing object relatives, but that past failures to detect it must have been due to the inherent weakness of the process and the lack of methodological power to capture it.
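
The 2 × 2 logic of the design in (24)–(27), grammaticality crossed with attractor number, can be summarized as a handful of contrasts over condition means: the attraction effect within each grammaticality level, their average (the main effect of attractor number) and their difference (the interaction). The Python sketch below computes them for any reading-time measure. The cell means in the usage line are hypothetical placeholders, not data from Lago et al. (2021), and a real analysis would estimate these contrasts in a mixed-effects model over trials rather than from four cell means. A positive attraction value means faster reading with a plural attractor (facilitation).

```python
from dataclasses import dataclass


@dataclass
class CellMeans:
    """Mean reading times (ms) for the four cells of a 2 x 2 attraction design."""
    gram_sg_attractor: float
    gram_pl_attractor: float
    ungram_sg_attractor: float
    ungram_pl_attractor: float


def attraction_contrasts(m: CellMeans) -> dict:
    # Facilitatory attraction: how much faster reading is with a plural attractor.
    gram_effect = m.gram_sg_attractor - m.gram_pl_attractor
    ungram_effect = m.ungram_sg_attractor - m.ungram_pl_attractor
    # Grammaticality cost averaged over attractor number.
    grammaticality = ((m.ungram_sg_attractor + m.ungram_pl_attractor)
                      - (m.gram_sg_attractor + m.gram_pl_attractor)) / 2
    return {
        "attraction_grammatical": gram_effect,
        "attraction_ungrammatical": ungram_effect,
        "attraction_main_effect": (gram_effect + ungram_effect) / 2,
        "interaction": ungram_effect - gram_effect,
        "grammaticality_effect": grammaticality,
    }


# Hypothetical cell means, for illustration only.
print(attraction_contrasts(CellMeans(420, 410, 520, 470)))
```

On this reading, the Asymmetry Hypothesis predicts a zero grammatical effect alongside a non-zero ungrammatical one (and hence an interaction), whereas the pattern reported above for Lago et al. (2021) is a non-zero attraction main effect without a reliable interaction.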

The fact that we seem to be dealing with non-binary contrasts here (more or less, instead of yes or no) is also evident when we recall that attraction was indeed quite strong in the grammatical sentences of Acuña-Fariña et al. (2014). Now, the reason for the difference between that strength and the weakness observed in Lago et al. (2021) cannot lie in the methodology (the same in both), so the only possible explanation is the kind of structure used: PP versus RC modifiers. At this stage, we cannot determine whether the structural difference lies more in the nature of the syntactic categories themselves (phrases versus clauses) or in the upward versus downward feature percolation dynamics they enforce. Be that as it may, as early as Bock and Cutting (1992), production studies already showed that relative clauses produced fewer attraction mistakes than phrasal (PP) modifiers, a finding that motivated the view that clauses (but not phrases) open up new processing cycles. Note that in the Acuña-Fariña et al. (2014) study with PPs, not only number but gender too produced robust attraction effects. In sum, on the weakness issue, a possible explanation is twofold: RCs produce weaker attraction effects than PPs, and eye-tracking is better than self-paced reading at capturing whatever effects are real and, therefore, almost essential to capture effects with RCs. These recent Spanish results also allow us to address the debate over whether attraction is or is not due to the instability caused by the constant valuation of a number feature while the nominal component of the sentence is being processed (before the verb becomes visible). Recall that the prediction of continuous valuation (representational) models is that effects should be visible in that region, and, indeed, they were. However, they were visible only after the verb had already been processed, in regressive and re-reading measures of the head noun and the attractor (when this was plural), so the Lago et al.
(2021) results do not particularly support the predictions of these models. No early measure indicated trouble suggestive of continuous unstable valuation before the verb region. Schlueter et al. (2019) suggest a way of conceiving (salvaging?) a representational account that may be applicable here: instead of abandoning the idea that the problem with attraction lies in the representation of the subject phrase (singular? plural?) and blaming retrieval, it may still be possible to maintain a representational account if we assume that readers really do revise the subject’s number feature after the verb is encountered. This would entail a kind of mixed model according to which the initial representation of the sentence is changed based on the subsequent retrieval output. As Schlueter et al. note (p. 13), this would be fundamentally different from other misrepresentation accounts (Franck et al. 2002) and from the Marking and Morphing model (Bock et al. 2001; Eberhard et al. 2005) in that: “if the parser changes the subject’s number feature based on the output of retrieval in agreement processing, misrepresenting the subject’s number information would be a consequence of agreement attraction, rather than the cause of it”. I find it difficult to determine how much such a framework differs from a pure cue-based model.16 As for how the data reviewed above fit cue-based accounts in the attraction literature, we cannot ignore that the consensus seems to be that attraction is
not possible in grammatical sentences simply because in grammatical sentences there is nothing to revise (Wagers et al. 2009; Dillon et al. 2013). For instance, in a sentence that starts like the stamp on the envelopes is …, a plural local noun does not match any number feature at the verb and is therefore unlikely to cause interference. In ungrammatical sentences like the stamp on the envelopes are …, things are different, as the attractor does indeed match the (wrong) plural form of the verb. As Tanner et al. (2014) suggest, it is also likely that retrieval procedures operate in tandem with predictive ones (Lewis & Vasishth 2005). On that view, retrieval would be initiated only when encountered cues conflict with prior predictions. This would lend even more credence to the Asymmetry Hypothesis, as retrievals would not really be necessary in grammatical strings (a sg head noun predicts a sg verb, and this is what actually becomes visible later on). These considerations do not entirely mesh with the results reported in Lago et al. (2021), since they found evidence that attraction is indeed (weakly) visible in grammatical conditions. Indeed, the eye-tracking record made it clear that even when no revision is necessary, revision takes place anyway, in the form of differential reading of the two nouns in late measures and regressions, with the need to revise both the head noun and the non-intervening attractor being smaller when the attractor is plural. Finally, a word is in order regarding the different profiles of attraction in production and comprehension studies, an issue we referred to earlier (sections 3.2.1.3 and 3.2.1.4). Representational accounts wield the notion of representational identity (Hammerly et al. 2019) to argue for the superior parsimony of models that use the same explanatory tools for both processing modalities. 
Yet it seems evident that attraction is cross-linguistically very robust in production, even with RCs, but not so in comprehension. The question remains, therefore, why it is much weaker in comprehension. In point of fact, as already noted, there is every reason to believe that access to information in the two modalities differs enough in timing to warrant different performance in each. In production, speakers start from a certain conceptual structure that they need to express via linguistic form, so the concept clearly precedes the form. In the domain of attraction, in particular, evidence of ‘agreement ad sensum’ is cross-linguistically very strong. To my knowledge, every language tested for the distributivity effect has produced positive results (Lorimor et al. 2008; Foote & Bock 2012; Acuña-Fariña 2018). Recall that there is, in fact, an interesting contrast in the literature: the comparison between Vigliocco et al. (1996) and Acuña-Fariña et al. (2014). These two studies used the very same sentences, yet the former was a production study whereas the latter was a comprehension one, and the distributivity effect (evidence of semantic interference) emerged only in the former. We referred above to work in linguistics by Berg (1998) and Acuña-Fariña (2009, 2012) pointing to the idea that in production semantics often ‘leaks’ into form, especially so in poor-inflection languages with ‘poor form’ (see also Foote & Bock 2012). Unlike in production, in comprehension (especially in the typical decontextualized scenarios elicited in experiments) what reaches
the processing system first is form, and in rich-inflection languages like Spanish that form is massively alliterative and, more often than not, completely arbitrary (which makes the use of semantics futile). Thus, the noun mesa (‘table’) is feminine in Spanish for no good reason, but a phrase that starts with a fem-sg determiner followed by a fem-sg head noun, like l-a mes-a, surely predisposes parsers to expect more fem-sg suffixes, as adjectives, quantifiers and participles must bear the same redundant cue inside the NP: e.g., la mesa blanca alta que está rota (‘the tall white table that is broken’). There is no conceptual guidance in such phrasal packaging operations, just the formal need to replicate the morphological cue imposed by the head noun on all its satellites. It makes sense that Spanish language users are exquisitely sensitive to such morphological avalanches (see below on ERP studies). In short, unlike in production (where a pre-existing conceptual structure may be active and ready to interfere), in comprehension whatever is pre-existing and ready to interfere is form. Tanner et al. (2014) suggest that the mechanisms responsible for attraction in comprehension are only a subset of those at work in production and that, in comprehension, there is not enough time to access enough of the conceptual structure in the very initial phases of syntactic processing (see Acuña-Fariña 2012 and Acuña-Fariña et al. 2014 for a similar view). This seems a sound conclusion.
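
The cue-based argument for the asymmetry discussed above (a plural attractor matches the retrieval cues of a plural verb but not those of a singular one) can be illustrated with a toy retrieval model. This is a simplified sketch under stated assumptions, not the ACT-R model of Lewis and Vasishth (2005): each noun's match score is the number of verb cues it satisfies ([+subject] and the verb's number), and retrieval probabilities follow a softmax (Luce choice) over those scores with a hypothetical `weight`. The function returns how often the retrieved noun happens to agree with the verb, i.e. how often the verb ends up licensed.

```python
import math


def retrieval_probabilities(items: dict, cues: dict, weight: float = 2.0) -> dict:
    """Softmax over cue-match counts (a stand-in for activation-based retrieval)."""
    scores = {name: sum(features.get(cue) == value for cue, value in cues.items())
              for name, features in items.items()}
    total = sum(math.exp(weight * s) for s in scores.values())
    return {name: math.exp(weight * s) / total for name, s in scores.items()}


def p_verb_licensed(attractor_number: str, verb_number: str,
                    head_number: str = "sg", weight: float = 2.0) -> float:
    """Probability that the retrieved noun matches the verb in number."""
    items = {
        "head": {"subject": True, "number": head_number},
        "attractor": {"subject": False, "number": attractor_number},
    }
    cues = {"subject": True, "number": verb_number}
    probs = retrieval_probabilities(items, cues, weight)
    return sum(p for name, p in probs.items()
               if items[name]["number"] == verb_number)


for verb in ("sg", "pl"):  # sg verb = grammatical, pl verb = ungrammatical
    for attractor in ("sg", "pl"):
        print(f"verb={verb} attractor={attractor}: "
              f"{p_verb_licensed(attractor, verb):.3f}")
```

With an ungrammatical plural verb, a plural attractor ties with the head on cue matches and is often retrieved, producing an illusion of grammaticality; with a grammatical singular verb, the head matches both cues and the attractor's number changes the outcome only marginally. The toy therefore reproduces the asymmetry in licensing, but it says nothing about reading-time profiles, which is precisely where Lago et al. (2021) found weak attraction in grammatical sentences.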

3.3 More on comprehension: agreement in brain waves

Many of the previous considerations concerning the timing of access to the relevant kinds of information during the different stages of syntactic parsing are particularly amenable to fine-grained electrophysiological analysis. This is where research using the ERP (Event-Related Potential) methodology enters the scene. ERPs “represent the synchronized electrophysiological activity produced by large populations of cortical pyramid cells, time-locked to an external or internal event” (Molinaro et al. 2011: 909). Informally, ERPs are brain responses to specific stimuli, such as a specific syntactic structure, a word, a prosodic contour, etc. This methodology offers exquisite temporal resolution, so it allows researchers to tap into the online processes that unfold during language comprehension with a precision that behavioural measures, such as self-paced reading or eye-tracking, cannot provide. This is because the latter register the final product of neurocognitive processes, and the same final behavioural result may often be the consequence of either comparatively early or comparatively late processes that only the finest control of time can reveal. Most ERP studies involve reading tasks, serial word-by-word visual presentation and a ‘violation paradigm’. This means that researchers present participants with both grammatical (control) and ungrammatical strings and then see where the brain responses diverge. Compared to Magnetic Resonance Imaging, ERPs provide poor topographical precision, but this is usually still good enough to ascertain whether broad areas of the brain (e.g., Broca’s area, the frontal lobe, one hemisphere or both, etc.) are implicated in an effect. It is assumed that the onset of wave differences between the conditions tested
reflects processing stages. It is also assumed that amplitude differences in the electrical signal mirror the amount of activity the brain needs to resolve a computation. Three waveforms are standardly used. There is much debate in the specialized literature about what each of them really taps into, but the following rough summary reflects the broadest consensus (for reviews, see Molinaro et al. 2011; Beres 2017; Courteau et al. 2020):

N400: a centro-parietal negative-going deflection peaking at around 400 ms after the onset of a violation, often associated with lexico-semantic expectancy violations in particular, such as violations of selectional restrictions. Informally, an N400 registers how much the brain is surprised by the actual input, given what it expected to find (e.g., the footballer kicked the happiness).
LAN: a left anterior negativity that also emerges at around 400 ms and is usually taken to reflect morphosyntactic anomalies, including, for instance, subject-verb agreement violations (there is also evidence of an early LAN, or ‘eLAN’, for word category violations occurring before 200 ms).
P600: a late parietal positive-going component that begins at around 500 ms and returns to baseline at around 1000 ms. This brain wave is usually interpreted as reflecting reanalysis and repair processes, as well as sheer syntactic complexity. Researchers often distinguish early and late P600s, typically understanding the early type (with a broader scalp distribution) as showing difficulty in integrating a constituent into the ongoing sentence fragment and the late type (peaking in more posterior brain regions) as reflecting repair activity.

To my knowledge, the earliest ERP study of agreement is Kutas and Hillyard (1983), who found that errors in S–V ties in English produced negativities between 200 and 500 ms in anterior areas of the brain. 
Early studies by Münte and Heinze (1994) and Osterhout and Mobley (1995) provided converging data. For instance, in the seminal study of Osterhout and Mobley (1995), in English, the authors employed agreement violations between personal pronouns and their antecedents, reflexive pronouns and their antecedents, and subjects and verbs, as in (28)–(31):

(28) *The elected officials hopes to succeed.
(29) *The hungry guests helped himself to the food.
(30) *The successful woman congratulated himself on the promotion.
(31) ???The aunt heard that [she/he] had won the lottery.

Revealingly, they obtained effects in the P600 time window for all three types and a LAN for S–V agreement. These early studies allowed researchers to begin sketching a functional interpretation of the stages that agreement
computations generate in the brain. The finding that S–V inflectional morphology ties were resolved in line with other seemingly automatic formal processes (such as word category violations, all generally producing LANs) established agreement as a toy object for many experiments to come aimed at studying ‘syntax’ in the mind. ERP research on agreement is now extensive (see Molinaro et al. 2011 for a review of 29 studies in nine different languages up to that time), and several issues are particularly relevant to both linguists and psycholinguists. Here, three will be briefly surveyed. The first is the issue of features, especially number and gender, but also person. Early behavioural work by Nicol (1998) suggested that not all features may be created equal, and that number, in particular (as opposed to gender), has a more solid standing in syntactic computations. This led Carminati (2005) to posit a Feature Hierarchy Hypothesis that reflects ‘cognitive strength’. For instance, in a self-paced reading study, Carminati used contrasts like those in (32)–(35) below in Italian to show that the processing penalty for forcing pro (the null subject of finite clauses in pro-drop languages such as Italian) to select an object NP, against its well-established preference for subject antecedents, is significantly reduced when number rather than gender disambiguates the pronoun:

(32) Quando Maria cerca Roberto, [pro] diventa ansiosa.
‘When Maria looks for Roberto, she becomes anxious’.
(33) Quando Maria cerca Roberto, [pro] diventa ansioso.
‘When Maria looks for Roberto, he becomes anxious’.
(34) Quando Maria lo cerca, [pro] diventa ansioso.
‘When Maria looks for him, he becomes anxious’.
(35) Quando i Rossi lo cercano, [pro] diventa ansioso.
‘When the Rossis look for him, he becomes anxious’.

In particular, she found that reading times for (34), with gender disambiguation, were significantly higher (indicating more difficulty) than for (35), which uses number disambiguation instead, owing, she contended, to the superior cognitive strength of the latter. This kind of work rests on knowledge of features in linguistics. Indeed, the typological literature (e.g., Greenberg 1963) has long recognized that features vary quite drastically in their presence in the world’s languages, and it has posited an implicational hierarchy, as in (36):

(36) Feature Hierarchy: Person > Number > Gender

The hierarchy means that person is more prevalent than number, and number more than gender (so the implication is that if a language has gender, then it must also have number and person, and if it has number, then it must have person, but not necessarily gender). Cognitive differences between number and gender, in
particular, are quite apparent. For a start, there is very little latitude for number construals in the world’s languages simply because number reflects cardinality, and this tends to be objective. There is, however, nothing necessarily objective in a gender system. As Acuña-Fariña (2009: 400) observes, one may very well find a language which uses a gender for women, fire and dangerous things (Lakoff 1987), or for living creatures, including plants but excluding, say, pigs and monkeys, but it would indeed make front-page news to find a language which uses, say, a morphological form for 7 plus or minus 2, and another for 400. Corbett (2000: 1) makes it clear that the most elaborate number systems in the world’s languages do not exceed five number values: singular, dual, trial, paucal and plural. This stands in stark contrast with, say, the twenty genders of Fula (Corbett 1991: 148 f.; see also Harley & Ritter 2002: 514). In sum, when it comes to gender, and to the categorizations that gender systems house, almost anything seems possible cross-linguistically (see also Corbett 2013a, 2013b). Despite these ‘representational’ differences, however, processing differences, such as the ones suggested by the Feature Hierarchy Hypothesis, are hard to find. For a start, the early behavioural studies mentioned above cannot provide solid evidence, as they were done using techniques that were ill-suited to the task and/or were riddled with methodological problems. For instance, in the self-paced reading study of Carminati (2005), one bar press introduced the entire subordinate clause, and the next introduced the entire main clause. These regions of analysis are too long, leaving room for all kinds of effects to occur without being properly registered. Additionally, in those materials, number information appears earlier, at the verb, while gender information can be found only later, at the adjective, which means that reanalysis may well start sooner for the former than for the latter. 
Ultimately, the question of whether features are processed as monolithic bundles (as might be expected given linguistic theory, since grammar models do not usually differentiate trees based on features, despite their obvious representational differences) or as cartographies of differentiated features (Mancini et al. 2017) can be addressed by carefully examining brain waves. In the early study of Osterhout and Mobley (1995), gender and number were manipulated in the pronoun conditions, but no differences emerged. A more in-depth investigation is Barber and Carreiras (2005), in Spanish. This study contained two ERP experiments, with word pairs and with whole sentences, and violations in gender, number and gender + number agreement. The word pairs of their first experiment were noun + adjective and article + noun pairs. For instance, (37)–(38) show the kind of violations in the article + noun condition for phrases like the table:
(37) *El mesa [el-masc-sg + mesa-fem-sg]
(38) *Las mesa [las-fem-pl + mesa-fem-sg]


Agreement

The authors reported an N400 effect for both gender and number violations. Probably because article + noun sequences involve the generation of full phrases, an additional LAN effect was registered in this condition, but again for both gender and number alike. In their second experiment, they used the same materials but inserted them in complete sentences, either in initial position (for article + noun pairs) or after the verb (for noun + adjective pairs). Ungrammatical strings yielded a pattern of LAN–P600 responses in both conditions. Here something interesting happened: again, no differences between gender and number emerged in the LAN component or in the initial portion of the P600. Differences did emerge, however, in the later segment of the latter, the 700–900 ms time window, with larger amplitudes for gender than for number. Notice that this is not unlike what the behavioural studies had found: differences emerging around a second after the onset of violations. This is clearly the temporal domain of reanalysis. The authors connected their findings to Bradley and Forster’s (1987) model of lexical access. In this model, word retrieval unfolds in three phases: access, recognition and integration. The first stage deals with lexical identification. The second grants access to lexical content (meaning, word category and morphological specifications). The third stage integrates the lexical entry into the syntactic context, and it accommodates agreement operations. Assuming that in systems of arbitrary gender, such as Spanish, gender is housed in the lexical stem but number is added as a computation, gender agreement violations would force one to go back to stage 1 in order to ensure that the right lexical item has been chosen, so reanalysis would be expected to be more difficult than for number errors. This is what the increased late P600 component for gender versus number violations in their study appears to indicate.
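The argument linking the three-stage model of lexical access to the late P600 difference can be made concrete with a toy sketch. It assumes, for illustration only, that repair cost equals the number of stages that must be re-run, that a gender error reopens lexical access (stage 1) because gender is stored in the stem, and that a number error can be repaired within integration because number is computed. The stage names come from the text; `RESTART_STAGE` and `repair_cost` are hypothetical and are not part of the original model.

```python
# Toy sketch of the reanalysis argument; not a formalization from the source.
STAGES = ("access", "recognition", "integration")

# Hypothetical assumption: the stage at which repair must restart.
# Gender is stored in the lexical stem (arbitrary-gender languages such as
# Spanish), so a gender error sends the parser back to lexical access.
# Number is computed on top of the stem, so it is repaired during integration.
RESTART_STAGE = {"gender": "access", "number": "integration"}


def repair_cost(violation: str) -> int:
    """Count the stages that must be re-run when repair restarts."""
    start = STAGES.index(RESTART_STAGE[violation])
    return len(STAGES) - start


for feature in ("gender", "number"):
    print(feature, repair_cost(feature))
```

Run as is, it prints `gender 3` and `number 1`. The sketch only reproduces the ordinal prediction (gender repair is costlier than number repair); it says nothing about the timing or topography of the P600.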
This set of data suggests that the features of gender and number are processed as undifferentiated bundles. The Molinaro et al. (2011) review further buttresses this thesis. For instance, about three quarters of the studies that looked into S–V number agreement violations (thirteen out of seventeen; see the authors for a methodological explanation of the non-complying cases; also see below) reported a pattern of LAN + P600 brain waves. Determiner + noun and noun + adjective violations behaved alike. Regarding gender, this was examined in nine different studies containing determiner + noun or noun + adjective mismatches. A total of 80% of them produced the same LAN + P600 response, with no relevant differences across languages (German, Spanish, Dutch and Italian). A newsworthy fact concerning the handling of the gender and number features is that in double violations the system behaves in exactly the same way as in individual violations of either gender or number: it produces a quick LAN response followed by the repair-driven P600 stage. Another relevant issue is that when arbitrary gender violations and biological gender violations are compared, no early differences seem to emerge. For instance, Barber et al. (2004) examined contrasts like the following:
(39) *El faro(masc) es luminosa(fem) ‘The lighthouse is bright’

(40) *El abuelo(masc) estaba delgada(fem) ‘The grandfather was slim’
They registered the same LAN + P600 chain of effects for both types. Slight differences emerged only late, with positivities more pronounced in anterior areas for the biological gender disagreement condition. In sum, it seems that the quick response picked up by the early negativities reflects the detection of a ‘feature crash’ (a break in the formal feature co-variation pattern), a “failure to bind” (Hagoort 2003), in an undifferentiated manner, despite representational differences. Recent work on the processing of the person feature confirms the monolithic scenario but adds interesting nuances to our knowledge of how agreement is dealt with by the mind. Person has quite specific “anchoring” requirements in that it codes information pertaining to the speech act itself (1st person is the speaker, 2nd person the addressee, and 3rd person is neither the speaker nor the addressee; Mancini, Molinaro et al. 2011a, 2011b). Biondo et al. (2018) explain how these anchoring requirements differ from those of the number feature and how some forms of generative grammar treat such differences:
It has been suggested that the assignment of a speech participant role and the interpretation of the speech time expressed by person and tense requires linking the morpho-syntactic representation of these features to a clause-peripheral position in the complementiser (CP) zone, providing specifications connected to the discourse representation of the sentence (…), a structural layer where the speech participants and the speech time are encoded. By contrast, the cardinality of number requires a clause-internal anchoring, as this property is expressed by the subject DP itself, located in a specifier position in the functional structure of the clause, the IP.
This has interesting consequences that only a sophisticated analysis of the brain waves can illuminate.
Over the past few years, work by Mancini and collaborators on Spanish (Mancini et al. 2011a, 2011b; see also Zawiszewski & Friederici 2009, Zawiszewski et al. 2016, and Mancini et al. 2019 on Basque) has shown that person violations behave in much the same way as those of gender and number in the very early stages of processing (therefore yielding frontal negativities) but that they differ from both number and gender violations in the amplitude of the P600 component, that is, precisely at the time when repair is needed. The late differences reflect greater difficulty (higher amplitudes) in recovering from a person mismatch (see also Nevins, Dillon & Phillips 2007 for enlarged P600s in Hindi person violations). However, even the early negativities are subtly distinct, as they conform more to the N400 waveform than to the LAN (they peak in centro-parietal areas, bilaterally, with a slight right-hemisphere orientation). This may very well reflect the fact that the parser has trouble mapping morphosyntactic cues onto the pragmatic context that defines the speech act

itself and, if so, it indicates surprisingly swift access to discourse information (see Mancini et al. 2019). Finally, an interesting fact concerning person violations (which are in any case vastly less frequent than number mismatches) is that in rich-inflection languages like Spanish they are sometimes “amnestied” (Mancini et al. 2011a, 2011b) by the grammar, in much the same way that number mismatches are so often amnestied by the grammar of English (e.g., the committee are gathered). This is what happens in so-called unagreement, as in (41) below:
(41) Unagreement: Los lingüistas(3.pl) escribimos(1.pl) un artículo muy interesante ‘We linguists wrote a very interesting article’
(42) Standard agreement: Los lingüistas(3.pl) escribieron(3.pl) un artículo muy interesante ‘The linguists wrote a very interesting article’
This presents us with a chance to see whether the brain ‘notices’ the mismatch, that is, whether the waves generated by the reading of such segments reflect a smooth ride through the formal anomaly or, alternatively, a pattern of ‘alarm’. In fact, the brain seems to be doing something in between: it quickly notices the anomaly, yielding left frontal negativities in the LAN spectrum, but it also fairly quickly suppresses any need for reanalysis, as no P600 signal follows (Mancini et al. 2011a, 2011b). This again underscores how fast and automatically morphosyntactic feature error detection happens, and also how fast (but less fast …) higher-order processing routines put language-specific agreement information to use.
The second issue to tackle here is whether, analogously to what seemed to be the case in the attraction literature (production; see section 3.2), language-specific morphological richness has any role to play in the agreement computations that ERP research reveals. As noted, in the Molinaro et al. (2011) review, thirteen of the seventeen studies on S–V number agreement yielded the same LAN + P600 brain wave complex.
The ones failing to conform to the pattern were done on Dutch (Hagoort et al. 1993) and English (Osterhout et al. 1996), both languages with comparatively meagre morphological components. (Hindi (Nevins et al. 2007) was also different, but this is a non-alphabetic language, which, as Molinaro et al. observe (p. 918), makes it hard to evaluate the complexity of the visual stimuli.) Although Molinaro et al. adduce other methodological reasons that may account for the difference, they make it clear that strong LANs correlate with morphological salience. Spanish and Italian have that kind of salience, and they typically register the LAN + P600 complex. However, even in these languages, this is often not the case when agreement is not conveyed by clear morphosyntactic feature marking. For instance, when plural is not inflectionally marked but is the result of coordination (the boys are versus the boy and the girl are), Molinaro et al. (2011) failed to report a LAN signature in Italian; also,

when the number mismatch occurs across clauses, Münte et al. (1997) could not find it in German either, another relatively morphologically rich language:
(43) *Der Opa hat zwei Maikaefer gefunden, die beim Fliegen laut brummt. ‘The grandfather has found two June bugs, which *hums loudly when flying’.
Additionally, the role of clear morphological marking is underscored by studies reporting effects due to cue-to-ending consistency, or lack thereof. For instance, Caffarra et al. (2015) compared transparent and irregular nouns in determiner–noun combinations in Italian and reported differential effects for the two kinds of gender marking (with greater centro-anterior negativity for transparent nouns than for opaque nouns between 200 ms and 500 ms, suggesting that such anomalies are detected faster or better). Recall that the pioneering study by Osterhout and Mobley (1995) did report a LAN for English S–V ties. So did Coulson et al. (1998), but not Tanner and Bulkes (2015). So it seems that once again we are not dealing with a yes-or-no situation but with graded phenomena instead. This extends to types of S–V ties. For instance, verb finiteness violations with modal auxiliaries in English (e.g., the girl won’t *walking/walk to the station) have been found to elicit only a P600 (Schneider & Maguire 2018). Finally, the idea that LAN consistency correlates with morphological strength is suggested by recent work aimed precisely at calling the LAN effect into question. Tanner and Van Hell (2014) and Tanner (2019) have reasoned that the LAN effect is, in fact, an artefact of averaging procedures across subjects and that there is substantial individual variation in response to S–V agreement anomalies. The variation would manifest itself along the LAN–N400 continuum in an undifferentiated manner. Revealingly, the range of facts examined by the authors pertains to English experiments. Caffarra et al.
(2019) examined this potential component overlap by revisiting experiments done in Spanish and concluded that the LAN effect was consistently found at the individual level as well, independently of averaging procedures.
In sum, there seems to be rather solid evidence for the view that the LAN + P600 complex is the standard agreement signature for morphologically rich languages and that LAN in particular reflects automatic feature error detection. Variations in the LAN–N400 spectrum and amplitude differences are sensitive to the size of morphological components, which is well known to vary quite dramatically cross-linguistically. But, interestingly, given so much actual variation in form, the way the brain deals with that form seems fairly narrowly established: LAN and/or N400 followed by positivities starting at around 500–600 ms.
The third issue we will discuss here lacks the standing in the literature, and the connection with linguistic theory, of the previous two, but it is presented here briefly to show how agreement spans the boundaries of both grammar and processing, which surely says something in itself about its complex nature and about how to go about studying it. This is the issue

of gender stereotypes and how they are processed in syntactic computations. It is well known that words are often strongly associated with gender stereotypes that affect entire social groups (doctors versus nurses, male versus female footballers, male versus female presidents, etc.; Kreiner et al. 2008) and, more specifically, that stereotypical knowledge is swiftly activated for lexical items that refer to such groups. In fact, there is solid evidence that such knowledge is summoned almost immediately, without much time for deploying any kind of higher-order inferential routines (Oakhill et al. 2005). Thus, for instance, Osterhout et al. (1997) used sentences containing a reflexive pronoun that referred to a definitionally (e.g., a mother being female) or stereotypically male or female antecedent noun and found that both ‘violations’ elicited similar ERP positivities in the P600 range (see also Canal et al. 2015). Usually, such effects have been obtained by time-locking the ERPs to a pronoun that refers to the stereotypically gendered noun, rather than to the noun itself (say, mother-himself and nurse-himself). Molinaro et al. (2016) instead used determiner + noun sequences in a morphologically rich language, Spanish. They manipulated the role noun ending (congruent: miner-os, male miners; incongruent: miner-as, female miners) and the morphosyntactically expressed gender agreement between the determiner and the noun (los mineros, the[+M] male miners vs *las mineros, the[+F] male miners). That is, they crossed grammatical and stereotypical violations, as in the following conditions (which included a double violation):
(44) Control: stereotypically congruent/syntactic agreement: Ayer, los mineros fueron a una cena para celebrar el fin de la asamblea. ‘Yesterday, the[+M] (male) miners went to a dinner for the celebration of the end of an assembly’.
(45) Stereotypically incongruent and syntactic agreement: Ayer, las mineras fueron a una cena para celebrar el fin de la asamblea. ‘Yesterday, the[+F] (female) miners went…’
(46) Syntactic violation and stereotypically congruent: *Ayer, las mineros fueron a una cena para celebrar el fin de la asamblea. *‘Yesterday, the[+F] (male) miners went…’
(47) Double anomaly: stereotypically incongruent/syntactic violation: *Ayer, los mineras fueron a una cena para celebrar el fin de la asamblea. *‘Yesterday, the[+M] (female) miners went…’
The authors reported two main findings: a long-lasting negativity in the N400 range extending to right anterior parts of the scalp for stereotypicality violations (typical N400 lexical effects are usually posterior); and no P600 effect for morphosyntactic violations. This latter finding is, in principle, extremely surprising, as it seems to indicate that, perhaps for the first time on record, the brains of speakers of a rich-inflection language do not care about a salient, morphologically expressed mismatch (analogous to *these car in English). The authors reasoned that in Spanish, when any cue (on the determiner, the noun, or both) conflicts with the

lexically-stored stereotype, “the neurocognitive system reacts to the anomaly by anchoring on the stereotype” … in such a way that “the strong attracting force of the role noun’s gender stereotype overshadows the typical ERP correlates for morphosyntactic violations”, thus explaining the striking absence of the typical agreement-error detection + repair signals in their experiment. This finding underscores the fact that habitual assumptions in the linguistics literature regarding ‘semantics’ (in toto, that is, assuming that stereotypicality is semantics) or the semantics/pragmatics distinction can be fine-tuned using knowledge that stems from the brain waves and, more importantly, that we can relate very specific linguistic distinctions to very specific brain reactions. By way of comparison and contrast with stereotypicality violations, in six different ERP experiments containing both morphosyntactic agreement violations and emotionally charged lexical items (thus pitting syntax against the lexicon), Díaz-Lago et al. (2015), Fraga et al. (2017) and Padrón et al. (2020) never managed to ‘derail’ morphosyntactic agreement processing by fooling the system with strong lexical distractors.17 In all experiments a pattern of LAN + P600 emerged, revealing a language-specific sensitivity to agreement violations that the authors defined as “encapsulated”. Notice that in the unagreement construction (a legal grammatical mismatch) that we referred to above, the brain reacted by producing the typical fast response and suppressing the typical repair processes later (P600). The Molinaro et al. (2016) stereotypicality study shows yet another picture of brain reaction specificity, namely, how such a classic P600-type response can also be suppressed if previous N400-type responses time-locked to stereotypicality violations occur, as these result in lingering negativities instead.
The topographic information regarding negative effects starting circa 400 ms seems important for qualifying the whole picture of the functional architecture of agreement processing.18

3.4 Summing up
We started this chapter by noting that agreement is deceptive in that it looks simple (a matter of sheer form co-variance) but is actually very complicated. That complexity has long been recognized in grammars. Now we have seen how essentially the same multidimensional nature that is evident in grammatical descriptions emerges when we look into the way agreement co-indexations are dealt with by the mind.19 In fact, the overall impression that an examination of the main ideas and models reviewed in this chapter creates is that we have been here before. Consider the case of the relative clause (RC) adjunction ambiguity that we studied in the previous chapter (the classic somebody shot the servant of the actress who was on the balcony). If you recall, in the early days of psycholinguistic research, this ambiguity was hypothesized to be resolved by locality alone on seemingly commonsensical grounds (the RC is closer to the second noun actress, so that should be the preferred linking pattern), so when Cuetos and Mitchell (1988) showed that this logic did not explain the Spanish results, researchers were surprised. Our review of this research over the last three decades showed

a large number of factors at play: locality, prosody, preposition type, attachee size, etc. Today, the idea that a single mechanism may account for all that seems far-fetched. Attraction in both production and comprehension and the electrophysiology of agreement (comprehension) have now been studied for a similarly long time and the overall impression is the same: nothing like mere formal feature harmony in a cyclic, syntactically defined tree geometry comes close to explaining agreement (however important that may be).20 This surely says something in itself about the structure of agreement in the grammar and in the mind. We do have many relevant fragmentary pieces, but not a simple coherent mechanism (e.g., feature match in a specified domain). This is very probably because there is no such elegant simple whole, but, instead, an active, dynamic multidimensional network that is responsive to the simultaneous working of a number of different constraints. At present, these provide a sufficiently rich picture anyway. The picture must start from the realization that agreement is feature-driven and that access to features is time-sensitive, which means that comprehension and production need not be aligned. We need to keep this contrast in mind and still be able to see a coherent, unified whole. Take, in the first place, the idea that when features are massively present meaning is massively accessed, a derivation of basic tenets of usage-based accounts of grammar. We have seen that this idea is hard to maintain, starting with work in production studies. The distributivity agenda in the attraction literature has been extremely useful in revealing that, if anything, languages with rich morphological components do not show increased semantic interfacing, but rather the opposite.
We may speculate that alliterative features are a preferred means of phrase and clause construction in these languages (e.g., Spanish), and, when they are such a preferred linking pattern, their most immediate job is to link similarly marked constituents [-Xo/-X’o/-X’’o] in a first round of processing. At that stage, accessing something like the same numerosity computation ten times per second is surely not cognitively functional. In rich-inflection languages, the importance of features is mirrored by strong cue reliability effects. The Lorimor et al. (2015) data showing that gender is used in subject–verb computations in Spanish (even though gender plays no part in the grammar of Spanish subject–verb agreement) and the Villata et al. (2018) data showing something similar in Italian in a different agreement configuration (object relatives) suggest that features are consulted preferentially to discriminate among possible lexical candidates, in a manner consistent with content-addressable architectures. But perhaps the most interesting side of this is the angle provided by comprehension studies using the ERP methodology. The Molinaro et al. (2011) review made it clear that users of rich-inflection languages use their rich morphologies to navigate their way into the establishment of phrasal packages at the earliest possible moment. Thus, when gender and number agreement violations of the [-Xo/-X’o/-X’’a] type occur, ERPs swiftly cast a LAN signal that basically means that the expected shallow morphological

harmony has been broken. Molinaro et al. do not use the word ‘shallow’ but express the same idea more prudently:
it appears that syntactic analysis (correlating with the LAN modulation) is sensitive to cues that are expressed (marked) in the functional morphology of both agreeing constituents (…). Following some hypotheses about active predictions as the possible underlying processes of these early components, the identification of the morphologically expressed feature may well trigger an active expectation for a following constituent showing the same value. For instance, a determiner triggers an expectation for a noun, while a noun triggers an expectation for a verb. This expectation is syntactic in nature, to create syntactically well-structured sentence representations. (…) a feature expressed by the functional morphology of a trigger would initiate a search for a target constituent with a matching feature. If the features are expressed formally, as functional morphemes attached to lexical stems, the cognitive system could rely just on those cues to satisfy the agreement expectation, and establish the syntactic relation (without accessing non-functional information). When the value expressed on the expected constituent does not match, a LAN is triggered. (Molinaro et al. 2011: 925; emphasis added)
Speakers of different languages may develop subtly different neural reflexes as a result of the place that such computations occupy in their lives. This is why LANs are harder to find in English or Dutch than in Spanish or Italian (for gender and number violations, the P600 is generally a constant).
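The idea that features are consulted to discriminate among candidates in a content-addressable way can be illustrated with a toy cue-matching retrieval function. Everything in it is an illustrative assumption: the feature names, the memory contents for the attraction fragment The key to the cabinets, and the scoring rule (one point per matching cue, with all candidates scored in parallel rather than searched one by one). It is meant in the spirit of cue-based accounts, not as any published model.

```python
from typing import Dict, List, Tuple

Features = Dict[str, str]


def retrieve(cues: Features,
             candidates: Dict[str, Features]) -> Tuple[List[str], Dict[str, int]]:
    """Score every candidate in parallel by how many retrieval cues it matches.

    Returns the best-matching candidate(s) and the full score table. More than
    one winner means the cues fail to discriminate (similarity-based interference).
    """
    scores = {
        name: sum(feats.get(key) == value for key, value in cues.items())
        for name, feats in candidates.items()
    }
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners, scores


# Hypothetical memory contents for "The key to the cabinets ...".
memory = {
    "key": {"role": "subject", "number": "sg"},
    "cabinets": {"role": "oblique", "number": "pl"},
}

# A plural verb ("are") cues a plural subject: each noun matches one cue.
print(retrieve({"role": "subject", "number": "pl"}, memory))
# A singular verb ("is") cues a singular subject: only "key" matches both.
print(retrieve({"role": "subject", "number": "sg"}, memory))
```

The first call returns a tie, which is how cue-based accounts typically model attraction in comprehension: a distractor that matches some cues competes with the grammatical subject. On this view, candidates that differ on more marked features give the cues more to work with.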
A related finding that illuminates the nature of agreement is that systems well oiled in the use of massive morphologies do not at first care about representational differences among features; instead, they use the features’ ‘shallow’ formal appearance to check the fluidity of phrasal packaging and react to their differences only a little later (in the P600 time window) and in different areas of the brain. For reading studies, Acuña-Fariña (2018c) has likened this to a process of Match and Check where, if Match is successful, Check does not ensue. Match is automatic and superficial. Check is much less so. It is quite fascinating that we can map the journey from feature crash detection to repair activity with such precision for each feature involved. For instance, when examining person mismatches of the unagreement kind in Spanish (that is, legal person mismatches), Mancini et al. (2011a, 2011b) actually claim that the brain needs some 100 ms to go from blind feature mismatch detection (at circa 400 ms) to activating the right discourse anchor that provides an alternative reading of the subject (at circa 500 ms), suppressing any need for reanalysis. Additionally, it seems clear that the mind/brain does not need much to deploy its agreement signature: double violations do not differ from single violations in any significant way at early stages.
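The Molinaro et al. proposal and the Match and Check idea can be combined into a toy two-stage function. This is a sketch under stated assumptions, not either author's implementation: the `Word` class, the feature labels and especially `person_only` (a hypothetical stand-in for the discourse anchoring that licenses unagreement) are illustrative. The first returned value stands in for early detection (the fast negativity) and the second for repair (the P600).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Word:
    form: str
    # Only features overtly expressed in the word's functional morphology.
    marked: Dict[str, str] = field(default_factory=dict)


def mismatched(trigger: Word, target: Word) -> List[str]:
    """Features marked on both words whose values disagree."""
    shared = trigger.marked.keys() & target.marked.keys()
    return sorted(f for f in shared if trigger.marked[f] != target.marked[f])


def never_licensed(trigger: Word, target: Word) -> bool:
    return False


def person_only(trigger: Word, target: Word) -> bool:
    """Hypothetical licence: a mismatch in person alone is tolerated."""
    return mismatched(trigger, target) == ["person"]


def match_and_check(trigger: Word, target: Word,
                    licensed: Callable[[Word, Word], bool] = never_licensed
                    ) -> Tuple[bool, bool]:
    """Return (early mismatch detected, repair needed)."""
    # Match: fast and shallow; it yields a single pass/fail value, so a
    # double violation looks exactly like a single one at this stage.
    if not mismatched(trigger, target):
        return (False, False)          # Match succeeds, so Check never runs
    # Check: slower; repair is needed unless the mismatch is licensed.
    return (True, not licensed(trigger, target))


la = Word("la", {"gender": "fem", "number": "sg"})
las = Word("las", {"gender": "fem", "number": "pl"})
los = Word("los", {"gender": "masc", "number": "pl"})
mesa = Word("mesa", {"gender": "fem", "number": "sg"})
linguistas = Word("lingüistas", {"person": "3", "number": "pl"})
escribimos = Word("escribimos", {"person": "1", "number": "pl"})

print(match_and_check(la, mesa))    # (False, False): la mesa
print(match_and_check(las, mesa))   # (True, True): *las mesa, number
print(match_and_check(los, mesa))   # (True, True): *los mesa, gender + number
print(match_and_check(linguistas, escribimos, person_only))  # (True, False)
```

The number-only and the double violation return the same pair, mirroring the observation that double violations do not differ from single ones early on; capturing the later amplitude differences between features would require graded values rather than booleans.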

The previous comments suggest blind feature-driven automaticity. However, another relevant aspect of the psycholinguistic work reviewed here is that semantic interfacing is a constant, as is evident from the fact that distributivity effects have been registered in every language examined to date in production studies. The plausibility effects uncovered in English cannot be easily explained away either. Again: in production studies … This can hardly be random (both the effects themselves and the fact that they are routine in production studies), and it speaks of semantic penetration, thus revealing a nice convergence with the ad sensum agreement patterns long recognized in linguistics. In production, it is instructive to conceive of a conceptual structure (a particular numerosity) being strongly activated at the stage of functional assembly, ready to ‘contaminate’ or ‘leak’ into form during constituent assembly if such form is weak. This is probably why number mismatches are so frequent in the grammar of English: the grammar has succumbed to the conceptual pressure. It is also why distributivity effects are strong in that language, since such effects (an index of semantic penetration) inversely correlate with morphological strength. Interestingly, the pressure from a previous conceptualization stage may well dissipate if we change the direction of encoding: in comprehension, Acuña-Fariña et al. (2014) did not register distributivity effects with the same materials with which Vigliocco et al. (1996), and then many others, did register them in Spanish. There is a surprising lack of data on the role of distributivity in comprehension, but with the data we do have it seems that we can afford informed predictions of agreement patterns: other things being equal, semantic interfacing will be stronger the weaker the morphology, and it will be stronger in production than in comprehension, given the change in information flow.
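The two ‘other things being equal’ predictions just stated can be written as an ordinal comparison between experimental conditions. This is a hedged sketch, not a model from the literature: the `Condition` type, the integer codes for morphological richness (English coded as poorer than Spanish, following the text’s characterization) and the modality ranking are illustrative assumptions, and only their ordering matters. The function deliberately returns `None` when the two factors pull in opposite directions, because the ceteris paribus prediction makes no claim there.

```python
from typing import NamedTuple, Optional

# Illustrative ordinal coding (assumption): production is taken to favour
# semantic interfacing more than comprehension.
MODALITY_RANK = {"comprehension": 0, "production": 1}


def _sign(x: int) -> int:
    return (x > 0) - (x < 0)


class Condition(NamedTuple):
    language: str
    morphology: int   # hypothetical ordinal richness: larger = richer
    modality: str


def stronger_interfacing(a: Condition, b: Condition) -> Optional[bool]:
    """Predict whether `a` shows more semantic interfacing than `b`.

    Returns True or False when the two factors point the same way (or one is
    tied), and None when they conflict or are both tied.
    """
    morph = _sign(b.morphology - a.morphology)   # +1 if `a` has poorer morphology
    mod = _sign(MODALITY_RANK[a.modality] - MODALITY_RANK[b.modality])
    if morph >= 0 and mod >= 0 and (morph or mod):
        return True
    if morph <= 0 and mod <= 0 and (morph or mod):
        return False
    return None


english_prod = Condition("English", 1, "production")
spanish_prod = Condition("Spanish", 2, "production")
spanish_comp = Condition("Spanish", 2, "comprehension")
english_comp = Condition("English", 1, "comprehension")

print(stronger_interfacing(english_prod, spanish_prod))   # True
print(stronger_interfacing(spanish_prod, spanish_comp))   # True
print(stronger_interfacing(spanish_comp, english_prod))   # False
print(stronger_interfacing(spanish_prod, english_comp))   # None
```

The last call, rich morphology in production against poor morphology in comprehension, is exactly the kind of case where interacting constraints have to be weighed rather than read off either factor alone.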
Together with Corbett’s Agreement Hierarchy, the juggling (interaction) of these constraints can probably account for a large slice of the notorious variability of agreement patterns in the world’s languages. Thus, there is no reason to believe that semantic penetration is rampant or capricious. Importantly, there is no reason to believe either that it is tightly reined in, that is, only possible during an encapsulated, initial conceptualization stage (a sort of distinct Marking cycle), with no looking back. Such a strict serial chain may be a habitual routine but not the only routine. As far as we can tell, domains are certainly more porous than that and they interact at least with the direction of information flow and with morphological strength. The strong semantic effects do not in any way entail that ‘a-syntactic’ architectures of agreement computations are attractive. We know that they are questionable from the standpoint of grammar (Pollard & Sag 1994: 71 f. and Acuña-Fariña 2018a). Here we have seen that structural depth effects, feature markedness and head pre-eminence effects also count, and these are not easily accommodated inside syntax-free architectures. It is equally important not to lose sight of the fact that over 90% of agreement operations do actually respect head status, a fact that is easy to ignore when a whole research agenda focuses on the malfunctioning 10% left. Crucially, semantic penetration (broadly, semantics) interacts with headedness (broadly, form) in that it is greater when the semantic variable

causing it resides in the head. This is typical of agreement in grammar: form and meaning acting seemingly in parallel. Somewhat frustratingly, heads and depth effects are suggestive of trees and nodes, but we know that trees and nodes are not a sort of canalized obligation: clitics, pronouns and downward percolation attraction show that whatever spreading activation account we may envisage need not rest on the obligatory passing of features through tightly sealed trees, at least of the conventional kind. I see an apparent contradiction and/or complication here but cannot offer a way out of it. In conclusion, there is no need to assume that agreement works in exactly the same way in production and comprehension. Whatever is activated first (a specific conceptualization in production, a specific form in comprehension) changes with modality, and this crucial timing difference surely has repercussions. Crucially, there is no reason to believe either that extremely conspicuous differences in the size of the morphological repertoires of the various languages examined in the attraction literature (mark the size of this NP!) are immaterial. Recent research on the role of morphology at least suggests that semantic interference is more likely when that morphology is poor or, conversely, that a frequent, redundant, alliterative phrase-building schema staves off semantic penetration. In this delicate equilibrium, morphology (especially meaning-free gender cues) acts as an interface mechanism, showing itself in unique proportions in the world’s languages, thus rendering all agreement systems correspondingly unique, but in a way that is motivated by the recognition of the acting constraints.

Notes
1 See Dye et al. (2018) on the usefulness of gender, contra the idea that gender is a ‘language design problem’.
2 One approach that will not be examined directly here is that of a series of publications led by Julie Franck (e.g., Franck et al. 2006, 2008, 2010, 2015), where attraction mistakes are seen as a window into the different stages of hypothesized underlying derivations, understood as in generative grammar (see Bock & Middleton 2011 and a response to that in Franck 2011). Even though I refer to this research quite often, I focus on the descriptive facts (some of which are crucial in the history of the field), and not on the particular interpretations of those facts in the context of a generative grammar.
3 In this sense, Maximal Input fits into a moderate version of constraint satisfaction approaches to language.
4 This might also be slightly problematic for recent accounts in comprehension, like SOSP (self-organised sentence processing; Villata et al. 2018; Smith et al. 2018), which posit feature-based treelets where “meaning is always present when form is present” (Smith et al. 2018: 26), depending on whether that tenet affects sublexical units or not.
5 Evidence from reading studies (again, not our direct concern here) that predictive processing is very strong is of course well known. For instance, the Filled Gap Effect shows that fillers (such as which ship in Which ship did you say Andy was meant to board?) do not wait for the right gaps (after board) but rather proactively and avidly postulate them all along the way (after say; Stowe 1986). The Feature Mismatch Effect shows that when dealing with cataphora in English (e.g., When he_i was at the party, the boy_i cruelly teased the girl during the party), processors project subjects in a specified position before the subject in question actually shows up (so there is a penalty if in the position of the boy one finds the girl, even though the girl is a legal subject; see van Gompel & Liversedge 2003). On all this, see Chapter 4.
6 For a recent account of how the abstract idea of Marking can be made more precise by using a hierarchy of semantic features (that entail different levels of notional plurality), see Smith et al. (2018).
7 In fact, strictly speaking, collectives do not ‘attract’; rather, they impose plural conceptualizations during Marking. In the model, attraction is a purely inflectional operation bound to the moment when the second noun is inserted into a syntactic tree. In a nutshell, *the label on the bottles are all wrong is the result of Marking; *the pool for the swimmers are all dirty is the result of true attraction.
8 See Badecker and Kuminiak (2007) for a discussion of problems in incorporating case into a spreading activation account.
9 Attraction in pronouns can be found in configurations involving either tag or reflexive pronouns, such as the actor in the soap operas rehearsed, didn’t he/*they? and the actor in the soap operas rehearsed himself/*themselves, respectively. Bock et al. (1992) and Bock et al. (1999) have studied this.
10 The authors maintain that the results of their experiments suggest that interference is caused by object movement, particularly by the intervention of the intermediate trace of the moved object. Generative grammar specificities aside, the difference between (22) and (23) seems intuitively evident: in (22) the patients is the object argument of cures but appears displaced relative to the typical post-verbal position of object phrases in declarative sentences, after the verb (SVO). Not so in (23), where it appears in the ‘lawful’ object position after its subcategorizing verb ‘say’. See Chapter 4 on the cost of these displacement operations.
11 Franck and Wagers (2020) have recently expressed the view that, since “the strength of attraction coincides with the level of accessibility of the attractor [which is sensitive to structural factors] not with its retrieval speed”, the role of hierarchical structure in attraction is compatible with “a content-addressable mechanism that relies on cues, rather than on search”. In their first experiment, they used jabberwocky sentences to reduce semantic guidance in performing the task and obtained attraction effects similar to those obtained for natural sentences.
12 Igoa et al. (1999) make the following observation: “If our understanding of agreement models is correct, a copying model should expect partial strandings in the first noun to outnumber partial strandings in the second noun, provided that the inflection assignment operation proceeds from the controller constituent to the target constituent through the phrase structure tree. In contrast, a unification model would predict no difference between both kinds of partial stranding, insofar as inflectional features are assigned independently”.
13 For problems in the statistical interpretation of Gillespie and Pearlmutter’s (2011) findings, see Brehm and Bock (2013: 151).
14 The way in which MFH fits into M&M is via the reconciliation process of the latter, where morphology is believed to act as a counterweight to conceptual pressures: the more morphological computations there are in a language, the smaller the effect of marking (the contribution from notional number) from the top down. Notice that this means that morphology acts late.
15 For instance, in Dominican Spanish syllable-final -s (as for the plural feature) has become weak or sometimes entirely gone, a fact that has helped blur or eliminate distinctions in practically the entire verbal paradigm.
16
Franck and Wagers (2020) have recently expressed the view that, since “the strength of attraction coincides with the level of accessibility of the attractor [which is sensitive to structural factors] not with its retrieval speed”, the role of hierarchical structure in attraction is compatible with “a content-addressable mechanism that relies on cues, rather than on search”. In their frst experiment, they used jabberwocky sentences to reduce semantic guidance in performing the task and obtained attraction efects similar to those obtained for natural sentences. Igoa et al.’s (1999) make the following observation: “If our understanding of agreement models is correct, a copying model should expect partial strandings in the frst noun to outnumber parcial strandings in the second noun, provided that the infection assignment operation proceeds from the controller constituent to the target constituent through the phrase structure tree. In contrast, a unifcation model would predict no diference between both kinds of parcial stranding, insofar as infectional features are assigned independently”. For problems in the statistical interpretation of Gillespie & Pearlmutter’s (2011) fndings, see Brehm and Bock (2013: 151). The way in which MFH fts into M&M is via the reconciliation process of the latter, where morphology is believed to act as a counterweight to conceptual pressures: the more morphological computations there are in a language, the smaller the efect of marking (the contribution from notional number) from the top down. Notice that this means that morphology acts late. For instance, in Dominican syllable-fnal –s (as for the plural feature) has become weak or sometimes entirely gone, a fact that has helped blur or eliminate distinctions in practically the entire verbal paradigm. 
In the same fashion, syllable-fnal –n (which is needed to mark plural in the verb, as in viene versus vienen ‘s/he comes’ versus ‘they come’) has been similarly eroded, so that the distinction between the third person singular and the plural forms is also compromised. Determiners and adjectives are equally compromised. Schlueter et al. (2019) were interested in examining whether attraction leads to misinterpretation (that is, whether in *the label on the bottles are green the mind assigns to

Agreement

17

18

19

20

117

bottles the primacy of the conceptualization by turning it into the thematic subject, instead of label, a fact that would point to structural reanalysis) and answered this question in the negative. Again, with nuances: interpretive errors occurred only on a small subset of trials. This allowed them to argue for a view of attraction as a “lowlevel rechecking process”. These studies manipulated the valence and/or arousal of words, in a manner almost identical to the studies on RC disambiguation that we briefy discussed in chapter 2 (sections 3.2 and 3.5). So, for instance, one condition (pleasant, mismatch) included a violation like el chico pintó un cuadro-[masc] hermosa-[ fem] (‘the young man painted a beautiful-fem painting-masc’), where a pleasant adjective mismatched the gender of the noun, and in another (unpleasant + mismatch), they would use Tania tiró el pescado-[masc] podrida-[ fem] (‘Tania threw away the rotten-fem fsh-masc’), with an unpleasant adjective mismatching the gender of the head noun (Padrón et al. 2020). At this point, it may be relevant to point out that there also exists work on attraction done with the ERP methodology, but it was not included in either the previous section or this one because it is still nascent and it uses diferent structures, making comparisons very difcult. Tanner et al. (2014) did not fnd attraction efects in grammatical sentences with PP modifers in English, but Lee and Garnsey (2015) did. Martin et al. (2014) found such efects but in an entirely diferent construction: noun-phrase ellipsis. Recall Eberhard et al. (2005: 531) making the point that agreement may be “disarmingly simple in appearance” but also a “morass for linguistic and psycholinguistic theories” because, in fact, it “is not only syntactic, not only semantic, and not only pragmatic, but all of these things at the same time”. 
Recall that formal linguistic models (Chomsky 1981, 1995, 2000; Pollard & Sag 1994) view agreement as an encapsulated formal process occurring during the syntactic construction of the sentence, fully independent from anything non-syntactic.

4 GAP FILLING

4.1 Introduction

In the psycholinguistics literature, gap-filling refers to the set of operations conducted by the parser in order to deal with elided material (gaps) in the chain of speech. Understood informally, gaps of various kinds are unavoidable and therefore extremely frequent in ordinary language use. There are two major reasons for this. In the first place, very often a constituent is silenced (and its ‘content’ felt latently anyway) simply because it codes old or easily recoverable information whose omission results in no information loss. That is, such gaps are the result of the assumed ongoing activation of a certain referent (Lambrecht 1994). If already activated, and still deemed to be in focus, the referent in question need not be re-coded all the time. For instance, even in English, which is a non-pro-drop language (i.e., one whose subject cannot usually be omitted because the verb contains little information about it), subjects of tensed verbs are often dropped in topical chains such as that in (1), centred on the constituent Tom:

(1) Tom said he doesn’t want to get involved with that; … (gap) prefers to stay out of it at first, at least.

The underlying reason for this kind of silencing operation is, of course, economy. Not even at a speed of some fifteen to eighteen phonemes per second can we hope to encode all of our ongoing mentalese, so leaving parts of it unuttered is extremely functional, as it allows us to focus on the other parts of the message that our addressee may not be able to predict or think about (the assumed new information). This also accounts for the gaps that typically arise in gerundives and infinitives, which are heavily grammaticalized (that is, heavily regulated by grammatical ‘rules’; see below), as in (2)–(3):

DOI: 10.4324/9781003405634-4

(2) Tom came here after (gap) visiting his son.
(3) Tom wanted to (gap) visit his son.

In these, the silenced information in the two non-finite clauses can easily be recovered by looking into the main clauses, which, moreover, usually come first. In (2) and (3) the matrix clause subject Tom ‘fills’ the two downstream gaps, quite unproblematically for speakers of English (though not so unproblematically for the scientists trying to understand the grammatical laws governing such elisions). The second major reason for having to deal with a gap is sometimes less directly observable and therefore more open to different interpretations from different linguistic frameworks, namely, movement. When a constituent moves, the position where we find it natural to place it or expect it is now a gap. The difference from the first type described above is that the filler for the gap may be, but need not be, in the same clause. In these specific cases, sensu stricto, there is therefore no gap but a scrambling of the order of the constituents and yet, as we will see, the mind does treat such positions as gaps that have to be filled straight away. In any case, movement very often results in a constituent appearing far, or even very far, from the clause where it belongs. This is the realm of relativization and wh-question formation, as illustrated in (4) and (5):

(4) Tom came with the man who the local police are trying to locate (gap) now.

(5) Which man are the local police trying to convince the Feds to locate (gap)?

It is also, as is well known, the realm of a very large portion of (especially generative) grammar, which has spent decades trying to establish the conditions and rules that regulate and constrain displacement operations cross-linguistically (for some early work, see Chomsky 1981; Gazdar 1981; Bach 1982). In fact, languages vary considerably in this regard. For instance, as O’Grady (2010) observes (drawing on Hawkins 2004: 192 f.), Russian allows gap-creating extraction only from infinitival embedded clauses (6); English also allows it out of tensed ones (7); and Swedish permits it even from a tensed clause that is embedded inside a noun phrase (8):

(6) The cucumber which I promised [INF to bring (gap)]
(7) The cucumbers which I promised [S that I would bring (gap)]

(8) A bone which I see [NP a dog [S which is gnawing on (gap)]].

Understood as a deviation from a language’s preferred word order, movement/displacement is often invoked by linguists as a fundamental property of natural languages (Hauser et al. 2002). From very early on in the generative tradition, it has been assumed that it is precisely in the order in which constituents appear in ‘untransformed’ sentences that such constituents receive their theta roles (agent, patient, etc.), so displaced constituents posed a problem, as they could not be related to the basic skeleton of the ongoing predication. The notion of trace was quickly summoned to provide a mechanism that could preserve a tidy structure: if, when a constituent moves, a record of its original position is kept by the computational system (the trace), then all movement transformations are structure-preserving. As Hestvik et al. (2007) note, this has the added advantage of obviating the need to specify the order of transformations, which makes models of language using this technology better equipped to deal with both comprehension and production processes.¹ It is evident that dealing with potentially long-distance dependencies of the kind described above poses a problem for parsers with limited working memory resources (see section 4.2 below). The amount of material intervening between filler and gap is arbitrary, and that material may itself be as complex as the filler-gap co-indexation operation that spans it (Ross 1967). Additionally, the resolution of such ties involves juggling knowledge that is far from trivial (Phillips 2006). This was immediately apparent to the first researchers who decided to look at gap-filling from its implementation side:

Filler-gap sentences also provide a rich field for studying constraints on the use of distinct types of information.
In addition to the information needed to establish the constituent structure of ‘untransformed’ sentences, the sentence comprehension mechanism needs information about the possible movement rules of the language (or possible binding relations; cf., Chomsky 1980; or slashed categories; cf., Gazdar 1981), special constraints on the relation between the fillers and gaps (e.g., conditions like subjacency, Chomsky 1973, 1980), and information about the control properties of verbs (Chomsky 1980). Frazier et al. (1983: 192)

A substantial part of the early experimental work involved comparing subject and object relative clauses and trying to understand why the former appeared to be easier to process than the latter (Cook 1975). For instance, Wanner and Maratsos (1978) had participants perform a working memory task while reading relative clauses and found that the processing effort (measured both by the number of sentence comprehension errors and by success at the simultaneous memory task) was indeed larger for object relatives. The increased reconstruction effort needed for object relatives seems evident enough in that the
“memory-stretching” region (Pinker 1984: 222), that is, the distance over which the filler must be kept in the memory buffer, is indeed conspicuously longer in them. Perhaps a more wide-ranging issue (one that provides the backbone of much work to the present day) is the representational reality of the postulated gaps (Bever & McElree 1988; McElree & Bever 1989; Nicol & Swinney 1989; Featherston et al. 2000; Fiebach et al. 2001). The early experimental studies sparkled with enthusiasm at the possibility that gaps were not only psychologically very real but even a new, sophisticated yardstick for measuring the validity of one’s grammar (which may or may not contain traces or other so-called empty categories). For instance, Swinney et al. (1988) used a cross-modal lexical priming technique (CMLP) to probe for the activation of wh-traces in relative clauses such as (9):

(9) The policemen saw the boy thatᵢ the crowd at the party * accused tᵢ * of the * crime.

In the CMLP paradigm, subjects hear a sentence and, at some point during it, have to perform lexical decisions on words shown on a screen. So, if a word like truck figures in the (auditory) sentence and another word, road, is later presented on a screen for lexical decision (‘push yes if you know this word’), then road will be recognized faster than in contexts that did not contain a related word like truck heard previously. That is the priming effect that truck produces on the recognition of words that appear after it: it activates them prior to their actual appearance. In (9), that announces the imminent presence of a gap, which materializes after accused (hence the trace t there). At the asterisked positions, Swinney et al. tested for priming for words related to boy (the ultimate antecedent of the trace) and found it only after accused, right where the trace was postulated. They argued that the trace was therefore ‘real’.
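The logic of the CMLP measure can be made concrete with a small sketch. Assuming hypothetical lexical-decision times (the numbers below are invented for illustration and are not Swinney et al.’s data), the priming effect at each probe position is the mean reaction time to unrelated control words minus the mean reaction time to words related to the candidate antecedent; a positive difference means the related word was recognized faster. The fixed 20 ms cut-off in `primed_positions` is an assumption standing in for the statistical tests a real study would run.

```python
from statistics import mean

# Hypothetical lexical-decision reaction times in ms, invented for illustration.
# Keys are the three probe positions marked with * in example (9).
RTS = {
    "before accused": {"related": [652, 640, 661], "unrelated": [655, 648, 657]},
    "after accused": {"related": [601, 594, 610], "unrelated": [648, 655, 650]},
    "before crime": {"related": [644, 650, 639], "unrelated": [646, 653, 641]},
}


def priming_effect(related, unrelated):
    """Mean RT to unrelated control probes minus mean RT to related probes.

    A positive value means words related to the antecedent were recognized
    faster, i.e. the antecedent is (re)activated at that point in the sentence.
    """
    return mean(unrelated) - mean(related)


def primed_positions(data, threshold_ms=20.0):
    """Probe positions whose priming effect exceeds an assumed threshold (ms)."""
    return [
        position
        for position, rts in data.items()
        if priming_effect(rts["related"], rts["unrelated"]) > threshold_ms
    ]


primed = primed_positions(RTS)
print(primed)
```

With these made-up numbers only the post-verbal probe shows a sizeable effect, which is the qualitative pattern Swinney et al. reported; the sketch says nothing about effect sizes in the actual study.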
Nicol and Swinney (1989) subsequently provided evidence that only grammatically legitimate antecedents are reactivated at trace positions. For instance, in (9) no priming after accused was obtained for words related to crowd, even though that word occurred closer to the gap position. They used this finding to argue for “the view that reactivation of potential antecedents is restricted by grammatical constraints when they are available. When structural information cannot serve to constrain antecedent selection, then pragmatic information may play a role, but only at a later point in processing” (Nicol & Swinney 1989: abstract). Remember the philosophy: first syntax, then all the rest … At around the same time, using a probe recognition task,² Bever and McElree (1989) studied the NP-movement gaps postulated by the Government and Binding generative grammar of the time, as in (10) and (11):

(10) Johnᵢ was hit [ ]ᵢ by Mary.
(11) Johnᵢ was certain [ ]ᵢ to leave.
and found evidence that their antecedents were primed, a fact that they used to argue that such gaps effectively access a representation of those antecedents (like explicit pronouns; Cloitre & Bever 1988). In fact, they argued for more than that (p. 34; see also Bever 2013: 388):

On its own, this simple demonstration lends credence to the notion that the representational assumptions of GB are relevant to the psychological processes operative in sentence comprehension. This notion is further supported by the demonstration that NP-movement gaps appear to activate their antecedent to a greater degree than PRO gaps. GB formally distinguishes between these types of gaps, and we have followed this analysis rather directly in suggesting that greater activation results from NP-movement gaps as a consequence of greater necessity of structure-building operations. In short, we assume that PRO gaps can be filled by access and coindexing to an independent structural representation of the antecedent, whereas NP-movement gaps provide the first opportunity to incorporate the antecedent into an argument structure. (…) To our knowledge, competing grammatical theories, such as GPSG and LFG, at present do not distinguish between PRO and NP-movement gaps in a manner that would a priori predict the pattern of results we found.

As we will see, later research would inevitably qualify these bold early claims, but the fact remains that if we assume a more or less transparent relationship with the grammar (Berwick & Weinberg 1982, 1983; Hale 2011; Lewis & Phillips 2015), it is surely worth our time to look for experimental evidence of ‘content’ at assumed gap positions and to study “how the human parsing system decides where to posit gap sites in amongst the pronounced elements as it works through a sentence incrementally” (emphasis added; Hunter et al.
2019: 1).³

Here, as anticipated in our Introduction, we will use the descriptive labels created by the generative grammar of the 1980s and 1990s (as we have just done) to refer to the main types of gaps studied in the psycholinguistics literature. So we will use notions such as PRO, pro or Wh-trace for convenience, as they now refer to well-known linguistic phenomena that most linguists easily recognize. Examples (12)–(15), from the Introduction, are repeated below:

(12) Peteᵢ tried PROᵢ to be attentive all the time.
(13) Vine con Maríaᵢ. proᵢ Es tan maja …
‘I came with Maryᵢ. She (= proᵢ) is so nice-FEM …’
(14) Ronnieᵢ seemed t(NP)ᵢ to be too worried about our performance.
(15) Whatᵢ was Diego worried about t(WH)ᵢ?

Section 4.2 will first focus on the general role that working memory plays in gap-filling operations. Section 4.3 discusses the role of locality by focusing on
the Most Recent Filler Strategy of Frazier et al. (1983). This will take us to an examination of PRO gaps and gap-driven parsing. In section 4.4, the Active Filler Strategy and filler-driven processing are analysed in the context of relative clauses and wh-questions, structures containing transparent fillers and presumed traces. This section will also introduce the Direct Association Hypothesis of Pickering and Barry (1991). Section 4.5 focuses on scrambling, word-order predictability and the Minimal Chain Principle (de Vincenzi 1991). Section 4.6 presents a brief summary and conclusions. Section 4.7 is a sort of addendum or epilogue that focuses on the mirror image of reference resolution via gaps. We will briefly consider the linguistic constraints on the interpretation of pronominal elements, that is, on referential tracking when gaps are not allowed or are strongly dispreferred. This happens when the referent is not maximally given information and must therefore be expressed overtly (through pronouns).

4.2 Working memory

As noted above, gap-filling typically taxes the memory system. This is intuitively evident in various ways. One is the geometry of the tree over which a memory-stretching unit unfolds. For instance, as early as 1964, Miller and Isard reasoned that an example like (16), containing several centre-embedding loops, was almost impossible to parse (see Introduction, and Pinker 1984: chapter 7 on these challenging geometries):

(16) The man who said that a cat that a dog that a boy owns chased killed the rat is a liar.⁴

More typically, however, at least in the specialized literature, memory is treated as a purely quantitative problem having to do with the sheer amount of tree one needs to keep in a buffer until a place for a dangling filler can be found in the unfolding structure. For instance, Gibson and Hickok (1993) made the point that (17a) was considerably easier to parse than (17b) due to the difference in the length of the region over which the processor needs to keep the which-phrase in store before it can be tied to its legitimate subcategorizing predicate (the verb put in (17a); the preposition in in (17b)). As a comparison of the two stretches suggests (from box to put in (17a), from box to the sentence-final in of (17b)), that difference is perceptually very salient and easily accounts for the way we perceive these segments of speech:

(17a) In which box did you put the very large and beautifully decorated wedding cake bought from the expensive bakery?
(17b) Which box did you put the very large and beautifully decorated wedding cake bought from the expensive bakery in?

It has long been noted that, in addition to well-known information-structure needs, grammars offer more than one way of expressing the same basic truth-conditional meaning, in part also as a result of the need to reduce memory
burdens opportunistically. Pinker (1984: 221) provides the following obvious example contrasting an active and a passive sentence. Again, compare the stretch between that and the trace in each sentence:

(18a) Reverse the clamp that the stainless steel hex-head bolt extending upward from the seatpost yoke holds (trace) in place.
(18b) Reverse the clamp that (trace) is held in place by the stainless steel hex-head bolt extending upward from the seatpost yoke.

The limitations of the human working memory system have been on the minds of psycholinguists from the very beginning. Bever (1970) discussed example (19) below in this connection (notice the distance between thought and its particle over):

(19) I thought the request of the astronomer who was trying at the same time to count the constellations on his toes without taking his shoes off or looking at me over

Kimball (1973), who discussed the same example, imbued several of his seven parsing principles with memory considerations, noting that the parser’s capacity does not exceed two sentences. Early work by Blaubergs and Braine (1974) had indeed suggested that comprehension is severely compromised in sentences with three or more centre embeddings. We have referred above to work by Wanner and Maratsos (1978) that capitalized on our fragile short-term memory by having participants perform a working memory task while reading subject versus object relative clauses. They proposed the HOLD Hypothesis to account for the fact that the head of a relative clause must be held in the ‘HOLD list’ until a gap for it can be found (with that processing time being longer for object relatives).
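The HOLD Hypothesis can be pictured as a small data structure. The sketch below is a toy illustration, not Wanner and Maratsos’s implementation: it assumes pre-annotated input in which filler positions and gaps are already marked (a real parser would have to detect them) and a last-in-first-out retrieval order, stores the head of a relative clause when it is encountered, releases it at the gap, and counts how many overt words it had to be held across. The names `run_hold_list` and `HoldRecord` are hypothetical.

```python
from dataclasses import dataclass

GAP = "(gap)"


@dataclass
class HoldRecord:
    filler: str
    pushed_at: int   # token index where the filler entered the HOLD list
    popped_at: int   # token index of the gap that retrieved it
    words_held: int  # overt words processed while the filler was held


def run_hold_list(tokens, filler_positions):
    """Toy HOLD list: store each filler, retrieve the most recent one at each gap.

    The LIFO retrieval order is a simplifying assumption of this sketch.
    """
    hold = []
    records = []
    for i, token in enumerate(tokens):
        if i in filler_positions:
            hold.append((token, i))
        elif token == GAP:
            if not hold:
                raise ValueError(f"gap at position {i} has no filler to retrieve")
            filler, start = hold.pop()
            held = sum(1 for t in tokens[start + 1:i] if t != GAP)
            records.append(HoldRecord(filler, start, i, held))
    if hold:
        raise ValueError(f"fillers never assigned: {[f for f, _ in hold]}")
    return records


# Subject relative: the boy that (gap) loves me
subject_rc = run_hold_list(["the", "boy", "that", GAP, "loves", "me"], {1})
# Object relative: the boy that I love (gap)
object_rc = run_hold_list(["the", "boy", "that", "I", "love", GAP], {1})
print(subject_rc[0].words_held, object_rc[0].words_held)
```

The head boy is held across one word in the subject relative and three in the object relative, which is the qualitative asymmetry the HOLD Hypothesis appeals to; word counts are only a crude proxy for the processing time the hypothesis is about.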
The main theories of gap-filling today, such as the Active Filler Strategy (Frazier & Flores d’Arcais 1989), the Minimal Chain Principle (De Vincenzi 1991) or the Direct Association Hypothesis (Pickering 1993, 1994; Pickering & Barry 1991; see the sections below on all this), are all memory-load models, in that a major part of their essence is the postulation of heuristics or principles aimed at explaining how humans succeed at circumventing memory limitations in parsing (Fiebach 2001; on resource-limitation theories, see Levy 2008a). These specific theories of gap-filling share this simple idea with more general models like the Garden Path/Construal theory (Frazier 1978; Frazier & Rayner 1982; Frazier & Clifton 1996) or the Syntactic Prediction Locality Theory of Gibson (1998). Already in relatively early electrophysiological research using the ERP methodology, a sustained anterior negativity, or SAN, between the filler and the gap was reported relative to control strings without gaps (Fiebach et al. 2001, 2002; King & Kutas 1995; Phillips et al. 2005). This effect is known as ‘long negativity’ and is taken to index the processing toll that comes with keeping a filler unassigned over time (say, the phrase which car in Which car do you think Mary is more likely to ask John to paint (gap) first?; see section 4.4 below). Indeed, as Leiken et al. (2016) note,
“One of the most replicated findings in the neurolinguistic literature on syntax is the increase of hemodynamic activity in the left inferior frontal gyrus (LIFG) in response to object relative (OR) clauses compared to subject relative clauses” (e.g., the boy that I love (gap) versus the boy that (gap) loves me). Overall, there is no question that memory is the most important bottleneck in the human sentence processing system (Stabler 2013). In fact, memory can be shown to exhibit stress symptoms even without displaced fillers announcing the presence of an upcoming gap (section 4.4). In Japanese, a head-final language in which wh-words stay in the same canonical SOV position as their non-wh counterparts, RANs (Right Anterior Negativities) have been found instead of LANs (Left Anterior Negativities) (Ueno & Kluender 2009). Japanese wh-words always require a question (Q) particle, ka or no, at the end of the clause. This Q-particle determines the interrogative scope of the wh-element (that is, the domain of the sentence that is being questioned), in that it is the position of the particle rather than the position of the wh-element that indicates scope (in English, scope is determined by the differential placement of what in, for instance, Did John ask [what Susan brought (gap)]? versus What did John say [Susan brought (gap)]?). Another relevant property of Japanese is that the wh-word itself IS informative about syntactic function and semantic role. This means that a Japanese wh-word is typically unambiguous and that what is ambiguous is its interrogative scope. In their ERP study, Ueno and Kluender (2009) examined these in situ wh-structures by comparing them with equivalent yes/no questions (the control condition) and detected anterior negativities at various positions between the wh-words and the corresponding Q-particles in several comparisons.
The effects were right-lateralized in most cases, although sometimes bilaterally distributed; they began at the wh-phrase (the onset of the dependency) and persisted right up to the corresponding Q-particle (its end), which allowed the authors to suggest that they are correlated with the wh-Q dependency itself. Interestingly, there were no local LAN/RAN or P600 effects at either the embedded verb-Q or main verb-Q positions. The authors speculated that this is due to the absence of overt displacement of wh-fillers in Japanese, suggesting that the P600 signature that typically shows up in wh-movement languages indexes just that: the repositioning of the filler once the gap is reached in overt movement scenarios (section 4.5). So what the RAN effect indicated in Ueno and Kluender’s experiments was the taxing of the memory system while the dependency remained unresolved, movement aside (explanations in terms of ‘covert movement’ are not pursued here, though see Lo & Brennan 2021). And yet memory is not the holistic thing that all the previous considerations may have led one to believe. In fact, as Fiebach (2001: 22) observes, “working memory is assumed to reflect the ability to maintain information available within the cognitive system and at the same time manipulate this information”, and this contrasts with the initial accounts that suggested a rather passive storage unit (Miller 1956; see Caplan & Waters 1999). For one thing, when processing a sentence, there is a potentially large number
of intermediate computations (beyond gaps) that need to be kept active before they can be linked both backwards and forwards (Jackendoff 2007a). Then, it is well known that there are also isolable components of the working memory system that specialize in dealing with phonological and semantic information, both aspects being needed to interface with more syntactically oriented computations that recruit specific memory operations of their own (Martin et al. 1994; for a review of the cognitive neuroscience of memory, see, for instance, Ricker et al. 2010). Moreover, over and above the purely geometric and quantitative aspects (the shape and amount of tree over which a memory unit must be kept), there is also evidence that elaboration and semantic distinctiveness of the items to be kept in store enhance retrieval, and that complex antecedents in long-distance relationships result in longer encoding times but faster retrieval times due to their higher levels of activation (Hofmeister 2011; see López Sancio 2020 for a review). Finally, memory computations are clearly affected by competition between the items that are incrementally reached. The cue-based model of retrieval has mapped various forms of interference in retrieval caused by partial cue-matching and has become one of the pillars of current work on any kind of syntactic computation involving long-distance dependencies (Lewis et al. 2006; Vasishth & Lewis 2006; Parker et al. 2017). Ever since Daneman and Carpenter (1980) introduced the reading span task, researchers interested in various aspects of the reading comprehension process have had a tool for measuring how individual differences in working memory capacity (WMC) enter the scene. In this task, participants read sentences and are instructed to try to remember specific words in them.
An individual’s reading span is then the number of sentences for which the targeted word is remembered. The spans of the college students in Daneman and Carpenter’s study ranged from two to five. When examining the subject versus object relative clause problem, King and Just (1991) found that the complexity effect that made object relatives harder was largely due to the difficulty that low-span readers had with these memory-challenging structures. No differences emerged between high-span and low-span readers for subject relatives. Daneman and Carpenter’s methodology has been adapted in different ways over the years. One common variation is the operation span task (Conway et al. 2005), in which the distractor activity consists in judging whether a simple equation (say, 3 + 6 = 9) is right or wrong (today this task is usually implemented using the Py-Span-Task software of von-der-Malsburg 2015). These more recent variations arose out of a concern that reading span scores may also be sensitive to reading experience, rather than to working memory capacity per se (von-der-Malsburg & Vasishth 2013; Nicenboim et al. 2016). In this sense, the operation span task has the merit of being much less language-oriented, and it is thus thought to provide a more accurate reckoning of modality-independent memory capacity. Based on the number of correctly recalled items, researchers calculate a numeric score of individual WMC for
each participant. Following Conway et al. (2005), they typically use partial-credit unit (PCU) scores. This scoring method calculates the proportion of correctly recalled items within each individual sequence. For example, correctly recalling two items from a four-item sequence and three items from a six-item sequence both correspond to a PCU score of .50. Thus, all sequences count the same, and the final PCU score, ranging from 0 to 1, is the average of the PCU scores from the 25 trials in the experiment. This measure retains more information than traditional absolute span scores, in which individuals must recall all the items from a given sequence for it to count toward the final score, a requirement that limits the sensitivity of the task.
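The difference between the two scoring methods is easy to see in code. This is a minimal sketch, assuming each trial is recorded as a pair (items correctly recalled, set size); the function names and the four-trial data are hypothetical, and `absolute_span_score` implements one common all-or-nothing variant (summing the sizes of perfectly recalled sets) rather than the output of any particular software package.

```python
from statistics import mean


def pcu_score(trials):
    """Partial-credit unit score: mean proportion of items recalled per trial.

    `trials` is a list of (recalled, set_size) pairs. Every trial counts the
    same, whatever its length, so 2/4 and 3/6 both contribute .50.
    """
    if not trials:
        raise ValueError("need at least one trial")
    for recalled, size in trials:
        if size <= 0 or not 0 <= recalled <= size:
            raise ValueError(f"invalid trial: {recalled}/{size}")
    return mean(recalled / size for recalled, size in trials)


def absolute_span_score(trials):
    """All-or-nothing scoring: add up the set sizes of perfectly recalled trials.

    Partially recalled trials contribute nothing, which is why this score
    discards information that the PCU score keeps.
    """
    return sum(size for recalled, size in trials if recalled == size)


# Hypothetical participant with four trials (the text describes 25 per session).
demo = [(2, 4), (3, 6), (5, 5), (1, 3)]
print(round(pcu_score(demo), 3), absolute_span_score(demo))
```

Here the PCU score credits the two half-recalled sets and the one-third-recalled set, whereas the absolute score counts only the single perfectly recalled five-item set.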

4.3 Recent fillers: controlled PRO and gap-driven parsing

An early proposal on how gap-filling operations are resolved is the Most Recent Filler Strategy (MRFS) of Frazier, Clifton and Randall (1983). As its name indicates, the proposal is quite simple: whenever a gap is detected, the parser looks for the nearest potential filler available to fill it. The authors pointed out (p. 195) that intriguing developmental work by C. Chomsky (1969) showed that young children systematically misinterpret sentences like (20a) below. They assume that Bill, the NP closest to the gap, is the subject of the infinitive, instead of John. In so doing they seem to implement a minimal distance principle strategy, which is indeed correct for (20b):

(20a) John promised Bill (gap) to grab the jewels.
(20b) John told Bill (gap) to grab the jewels.

In the philosophy of the principle-grounded models nascent at the time (like the GP theory), this was seen as a better hypothesis than the rival view that, somewhat miraculously, “the comprehension mechanism uses all types of information in a parallel, ‘interactive’, unstructured fashion” (p. 190). The MRFS was explicitly defined as follows:

Most Recent Filler Strategy: During language comprehension a detected gap is initially and quickly taken to be co-indexed with the most recent potential filler. (Frazier et al. 1983)

Frazier et al. were careful to point out that the MRFS was just a heuristic mechanism, part of a more general strategy of opting for the most ‘salient’ filler (p. 196):

We propose [an] alternative, which explains the recent filler strategy in terms of the very general tendency for the human sentence processor to pursue the first analysis of a sentence available to it (…). We assume that a
variety of factors, including its recency, interact to determine how quickly a noun phrase can be retrieved from memory to be assigned as a filler to a gap. These factors collectively determine the ‘salience’ of a filler. We propose that the processor initially assigns the most salient filler to an identified gap. The empirical content of this ‘salience hypothesis’ consists in its claim that all factors which affect the availability in memory of a filler will contribute to the likelihood of initially assigning it to a gap. Such factors would, we presume, include the degree of sentential stress with which the potential filler was spoken, the presence of a special grammatical marker (e.g., a relative pronoun) which signals that a phrase is an obligatory filler, and, potentially, discourse factors including the communicative importance of a phrase.

So they assumed a variety of potential players, but their research had the specific “initial” goal of evaluating locality, that is, whether the MRFS in particular is used online or not. To that end, they devised a sentence comprehension experiment to test whether sentences consistent with the MRFS, like (21), were easier to understand than sentences that were not, like (22). Importantly, they also aimed to ascertain whether any such difference might be reduced or eliminated by the control information provided by the verbs. The verbs were try and force; notice that you *cannot be tried to do something but you can indeed be forced to do something, so the two verbs differ lexically (see below):

(21) The mayor is the crook who the police chief tried (gap) to leave town with (gap).
(22) The mayor is the crook who the police chief forced (gap) to leave town.

In the experiment, participants read each sentence and, at its end, judged whether they had understood it; the experimenters measured their decision times.
It turned out that participants responded more rapidly to MRFS than to non-MRFS sentences (1071 versus 1165 ms). They were also more successful at comprehending the former than the latter (78% versus 66%). The authors concluded that “readers use control information only after they have applied the Most Recent Filler strategy” (p. 205). As the reader may have noticed, the structures tested by Frazier et al. mixed different kinds of elisions. To show this, below I translate (21) into the inventory of gaps employed in this book:

(21’) The mayor is the crook whoᵢ the police chiefⱼ tried PROⱼ to leave town with (tᵢ).

This means that Frazier et al. could not, in fact, tackle the issue of verb control adequately (Fodor 1988: 141, fn 17). This problem comes over and above the limitations of a methodology that assessed only end-of-sentence ‘makes sense’ judgements.5 No modern theory
has any idea of how gap filling is done when multiple gaps of different kinds need to be resolved simultaneously. Research has rather concentrated on studying particular types of gaps in isolation (understandably, given the complexity of the issue). This probably explains why the MRFS did not enjoy great success (to judge from its scant presence in academic papers). The other reason is that Frazier and colleagues soon proposed another strategy, the Active Filler Hypothesis (see section 4.4), which focuses on displacement operations that result in traces. This has taken the lion’s share of studies on gap filling. Note that there is an important fork here: traces involve filler-driven parsing, but PRO is gap-driven. No displaced filler is visible in structures like (23) and (24), which illustrate subject-control and object-control, respectively (Rosenbaum 1967):

(23) The mayorᵢ promised the police chiefⱼ to PROᵢ leave town soon.
(24) The mayorᵢ encouraged the police chiefⱼ to PROⱼ leave town soon.

PRO gaps therefore cannot be ‘actively’ predicted by the prior detection of their fillers. If any prediction is possible for them, it seems that only lexical information can be consulted (Bresnan 1982; Chierchia 1988; Jackendoff 1974; etc.). In (23), only the lexical specification of the predicate promise guarantees knowledge of which participant in the scene will do the leaving (the subject the mayor). Likewise, in (24), only knowledge of the lexical requirements imposed by the verb encourage tells us that the object, the police chief, is to leave. The grammar of control has occupied a privileged position in linguistics since Rosenbaum (1967). In generative grammar, it has long been standard to differentiate the PRO gaps left by control phenomena, as in (25), from similar-looking elisions that arise from the application of movement (raising), resulting in so-called NP-trace, as in (26):

(25) Tomᵢ tries to PROᵢ like Mary.
(26) Tomᵢ seems to (tᵢ) like Mary.

The assumption is that PRO is base-generated, a silent pronominal copy left after deleting the original constituent under identity with the matrix clause subject Tom. In the technical jargon that popularized these types of structure in linguistics, the Theta Criterion (Chomsky 1981: 36) stipulates that an argument bears only one theta-role (e.g., agent, patient, etc.) per verb. The null subject PRO of the infinitival clause in (25) receives its theta-role by ordinary structural principles from that lower verb, as it is a true argument of that verb. Since PRO does not involve movement, it is conceived as a covert anaphoric element whose content is controlled by an antecedent NP (the matrix clause subject Tom in (25)). So antecedent and PRO mean the same, since they are co-referential, but they are not the same element. Likewise, in Tom suggested that he would not go, Tom and he may be co-referential but occupy two different places in the tree. Technicalities of generative grammar aside, it seems sufficiently clear that the thematic properties
of control and raising predicates do justify two different types of gaps, and the linguistics literature has long recognized their distinct behavioural profiles (Chomsky 1981; Chomsky & Lasnik 1993; see Polinsky 2013 for a clear and more recent account). Tom is an argument of both predicates in (25) above (Tom tries, Tom likes), but it can only be an argument of the lower one in (26): Tom does not seem; rather, it seems that Tom does something, namely to like Mary. The availability of it-extraposition (it seems that …) follows from this. Expletive it shows that the preverbal position is actually devoid of content in (26), but not in (25). Hence the same option is unavailable for (25): *it tries that Tom likes Mary. We focus on controlled PRO gaps here.6 As (23)–(24) show, classic obligatory control can be divided into subject-control and object-control. The vast majority of control verbs establish their control orientation rigidly: it is lexically determined and not subject to construal or contextual fit. That is, there is nothing we can do to manipulate context and make encourage a subject-control verb.7

We referred above to Bever and McElree’s (1988) early probe recognition study. They argued there that (lexically controlled) PRO gaps activate their antecedents later than syntactically controlled NP gaps. They attributed this to the fact that “PRO gaps can be filled by access and coindexing to an independent structural representation of the antecedent” (p. 34). Using a cross-modal priming technique, Nicol (1988) could not find any evidence of faster activation of the most recent filler in structures like (27) below, containing PRO:

(27) That’s the actressᵢ that the dentistⱼ from the new medical centre in town
a. had invited tᵢ PROᵢ to go to the party
b. had hesitated PROⱼ to go to the party with tᵢ

Primes related to PRO’s most recent antecedent were presented immediately after the word to, but they were not recognized faster than unrelated control words.
To account for this, Nicol, like Bever and McElree, underscored the semantic nature of the relationship between fillers and gaps. She suggested that co-referential relationships enter the scene only after syntax has done its job. However, as (27) shows, Nicol, like Frazier et al., also mixed gaps of different kinds. She therefore failed to provide the contrast that a proper evaluation of the MRFS most naturally requires: subject-control versus object-control alone. That contrast is the one in (23)–(24) above, repeated below with the control relations marked by indices:

(23) The mayorᵢ promised the police chiefⱼ to PROᵢ leave town soon.
(24) The mayorᵢ encouraged the police chiefⱼ to PROⱼ leave town soon.

In an eye-tracking study free of the methodological limitations of the earlier accounts, Betancort et al. (2006) did just that. In their first experiment, they contrasted subject-control and object-control verbs using Spanish materials like (28)–(29):

(28) Maríaᵢ prometió a Pedroⱼ [PROᵢ] ser bastante cauta con los comentarios.
‘Maryᵢ promised Peterⱼ [PROᵢ] to be quite cautious (fem) with her comments.’
(29) Maríaᵢ exigió a Pedroⱼ [PROⱼ] ser bastante cauto con los comentarios.
‘Maryᵢ demanded from Peterⱼ [PROⱼ] to be quite cautious (masc) with his comments.’
Mary demanded from Peter that he be quite cautious (masc) with his comments.

They had two main objectives. First, they sought to find out whether the parser resolved object-control gaps (as in (29)) more rapidly than subject-control ones. Such an advantage would follow from the blind application of the MRFS (or something like it). Notice that in (29) the filler Pedro is right beside the postulated gap, but not so in (28). The alternative, favoured by constraint-satisfaction and lexicalist models, is that language users make rapid use of the control information in the lexical heads. On this view, they easily anticipate the direction of control, ignoring distance, or at least not being much affected by it. So we have two options. If recency is used as a quick default and control (i.e., lexical) information is delayed, then reprocessing becomes mandatory in the subject-control group only. Eye-tracking should easily pick this up in the reading data, particularly in the regression-oriented measures and total reading times. This is because readers would blindly opt for the most recent filler. This choice is wrong for the subject-control group, so they would have to undo it, at an extra cost in processing time. However, if parsers are guided by their knowledge of lexical semantics, no differences between the two groups of verbs should be found, and no reanalysis should be necessary.
Betancort and colleagues conceded that control information could be used simultaneously with recency. In this (third) scenario, the region of the PRO gap could be read faster in the object-control group, since the filler is right beside the gap. The key difference between the two alternatives, then, is that the MRFS predicts some form of clear reanalysis for the subject-control group, whereas constraint-based models do not. Second, the authors aimed to arbitrate between theories of grammar with different views on PRO, including the Movement Theory of Control, for which control is pure syntax. Relying on Culicover and Jackendoff (2001), the authors set the scene for their experiments by reminding us of the seemingly inescapable lexical nature of PRO gaps. They used examples (30)–(34) below, from Culicover and Jackendoff (Betancort et al. 2006: 219–220):

(30) An American attempt to PRO invade Vietnam
(31) Johnᵢ begged Maryⱼ to PROⱼ leave
(32) Johnᵢ promised Maryⱼ to PROᵢ leave
(33) John’s promise/vow/offer/obligation/pledge/oath/commitment to Susan to PRO take care of himself/*herself
(34) John’s order/instruction/encouragement/reminder/invitation to Susan to PRO take care of herself/*himself

Note that in (30) the controller is the adjective American. This precludes upward movement from a subject position in the lower infinitival clause, since adjectives cannot normally be subjects. (31) and (32), like (23)–(24) above, show that control may be exercised by the higher clause object or the higher clause subject. Subject-control verbs like promise are much less frequent than object-control ones like encourage. In view of this, Rosenbaum (1967) proposed a Minimal Distance Principle (MDP), according to which control is assigned by default to the closest NP. However, once nominals are included, as (33) demonstrates, subject-control is no longer rare. In all of (33) the controller is John; in all of (34), however, it is Susan. As Culicover and Jackendoff point out, “the most plausible basis for the difference in controller between [33] and [34] is thematic structure” (p. 505). This is evident from the nominals in (33): none of them takes its controller from a syntactically determined subject position. Instead, they all take it from the thematic role of Source (the giver of the promise), wherever that may be located syntactically:

(35) The promise to Susan from John to take care of himself/*herself
(36) John gave Susan some sort of promise to take care of himself/*herself
(37) Susan got from John some sort of promise to take care of himself/*herself

If the controller is the lexically entrenched semantic role of Source, and not the configurationally defined function of subject, “the facts emerge elegantly”. However, the price is that the controller cannot be identified in terms of syntactic position (Culicover & Jackendoff 2001: 506). What Betancort et al. found was that the second NP (Pedro) was read faster in the object-control group than in the subject-control one. Notably, this effect emerged in the earliest measure used (first-pass reading times) and persisted in later measures (go-past and total time).8 However, no trace of reanalysis could be seen in any region for the subject-control group, which means that readers did not have to undo any initial bet. The authors concluded that both “verb control information and recency (that is, proximity of antecedent and PRO) are taken into account during the process of antecedent selection” (Betancort et al. 2006: 235). Kwon and Sturt (2016) obtained a similar (but not identical) pattern of results in an English study in which they manipulated nominal control, as
in (38) and (39). English lacks gender morphology on predicative adjectives (cf. cauto/cauta in (28) and (29) above). The authors therefore used reflexive pronouns instead to assess comprehenders’ sensitivity to grammatical violations. These violations serve to ensure that readers process the control information adequately:

(38) a. Naturally Lukeᵢ’s promise to Sophiaⱼ PROᵢ/*ⱼ to photograph himself in the barn amused everyone.
b. *Naturally Lukeᵢ’s promise to Sophiaⱼ PROᵢ/*ⱼ to photograph herself in the barn amused everyone.
(39) a. Naturally Lukeᵢ’s plea to Sophiaⱼ PRO*ᵢ/ⱼ to photograph herself in the barn amused everyone.
b. *Naturally Lukeᵢ’s plea to Sophiaⱼ PRO*ᵢ/ⱼ to photograph himself in the barn amused everyone.

Kwon and Sturt (2016: 5) also reported a recency effect. They made a very interesting observation about this effect in both their own study and that of Betancort et al. (2006): it can actually be used as a diagnostic of successful access to control information. In their words, “by definition, the processing facilitation for a recent controller of PRO (relative to a non-recent controller) could only be observed if the parser has already accessed the information that allows the relevant control properties to be identified”. Their own recency effect was visible in first-pass regressions out of the infinitive, with more regressions initiated in the subject-control condition. Facilitation in the object-control condition also appeared in later measures than in Betancort et al.’s study of Spanish verb control. The authors reflected on how to interpret their results. They pointed out that the object-control advantage (as seen in the fewer regressions out of the infinitive) is compatible with an explanation based on the MRFS.
On this view, an initial dependency is first established between PRO and the most recent filler. This dependency is later checked against control information, resulting in reanalysis for the subject-control ties (more regressions). But they granted that such a reanalysis would still have to take place very fast indeed. This rationale rests on regression measures, and it does not carry over to the Spanish data of Betancort et al. in any case, since there were no differential regressive movements across conditions there. The authors therefore proposed an alternative explanation based on integration cost (as in the cue-based retrieval model of Lewis & Vasishth 2005). On this account, the (relative) object-control advantage could be due to higher activation costs for distant subject-controlled antecedents. Concurrently, control information would be accessed and used from early on, as the detection of violations in both studies shows. The Spanish and English studies differ in their structures and slightly in their eye-tracking sensitivities. This makes it hard to fit their results into a unified account of whether the recency effect in both studies is evidence for the MRFS or not. The Spanish results, in particular, show
that access to control information is extremely expeditious. Another English study by Kwon and Sturt (2014) supports this idea. It examined temporarily ambiguous garden path sentences containing control nominals. The garden paths involved classic contrasts like Before Andrew’s refusal/order to wash the kids came over to the house, where the attachment of the kids is temporarily ambiguous. Kwon and Sturt found that garden pathing was less acute “when the lexical control information highlighted the globally correct analysis” (2014: 59). This suggests that control information is quickly used during structure-building operations. Finally, a recent study by De Dios (2021) provides the most in-depth examination of whether PRO gaps are resolved primarily by a recency default or by consulting lexical semantics. De Dios studied classic, obligatory verbal control in Spanish and used materials like those in Betancort et al. (2006). However, she corrected a number of important shortcomings in that original study. She also employed a much better-powered design and statistical analysis. Betancort et al. had concluded that object-control gaps were resolved more quickly than subject-control ones, as early as first-pass reading times of the second noun (i.e., the object NP). De Dios observed that this conclusion was compromised by non-comparable materials. In particular, many of the NP2s used in the subject-control group were not, in fact, arguments of the subject-control verbs. They were adjuncts or quasi-adjuncts instead (e.g., … aceptó ante su amigo, ‘… accepted before his friend’). By contrast, all NP2s in the object-control condition were clear objects.
De Dios ran a series of six different experiments. These comprised a completion task and two offline acceptability rating experiments (aimed at evaluating control preferences and refining her materials), one self-paced reading study and two eye-tracking studies. Together they amassed important data on the resolution of PRO gaps. In a nutshell, the most relevant result from the whole series is that there was no recency effect. No eye-tracking measure or region showed that the PRO gaps in the object-control conditions were read any faster than those in the subject-control ones. Crucially, none of the measures showed any evidence of reanalysis in the subject-control dependencies either. In short, nothing in De Dios’s long series of studies provided any evidence for the MRFS. Instead, the picture that emerged is that the recency effect in Betancort et al.’s study was artifactual. Control information is swiftly used by the parser to resolve PRO gaps in the same way in subject-control and object-control dependencies. Also crucially, therefore, the evidence collected by the author contradicts the idea of a delayed strategy in the processing of the antecedents of PRO. On this point, the data in De Dios’s study fully align with the findings of Betancort et al. (2006), who observe that there seems to be a contradiction here (p. 251):

In essence, the contradiction revolves around whether control is an inherently semantic issue or not (Fodor 1988). If it is assumed that syntactic representations are computed more rapidly than semantic representations
(Fodor 1988), then, under further habitual assumptions in the processing literature, one might expect the binding of antecedent and co-referential PRO to be somewhat delayed (…). But since both electrophysiological evidence (Demestre et al. 1999; Featherston et al. 2000) and our own eye-tracking experiments have revealed a rapid resolution of that binding relationship, we are, in principle, forced to conclude that semantic representations need not be any slower than syntactic ones. The theoretical implications of this conclusion can hardly be ignored. Notice that, unlike the case of wh-trace and NP-trace (about which formal theories of grammar disagree, with GB/MP claiming that the two are syntactically governed and, for instance, GPSG insisting that only the former is; see Osterhout & Swinney 1993 on interesting processing implications of this), all formal models of grammar do agree that control cannot be captured structurally alone (through the MDP, for instance), so there would appear to be no escape from the conclusion that the rapid resolution of the binding relationship is a rapid resolution of a semantic affair (Fodor 1988).9

In fact, Betancort et al.’s point is only a contradiction if one assumes the views of the predecessors of their own study (namely, that control is an inherently semantic issue). But, as they themselves note, this is not so clear, for it is at least equally sound to view control as lexically specified grammatical form. Verbs do indeed lexically determine the range of their complements. We expect complements to be processed faster than adjuncts. In the same way, there would appear to be no problem in expecting tightly controlled antecedent-PRO ties to be processed faster than other, non-lexically regulated co-referential relationships. The same holds for arbitrary PRO, as in My impression is that PRO studying in Berlin must be amazing.
On this view, control relationships are processed fast “because they are launched from the same lexical platform that launches all fast connections, be they between a verb and a complement NP, a complement that-clause or a complement infinitive” (p. 251). Note that in Spanish the matrix verb also determines whether the verb in a subordinate that-clause must appear in the indicative or the subjunctive. Likewise, the verb determines whether a subordinate infinitival clause is subject-controlled or object-controlled. It makes sense to assume that all these types of useful information, which are automatically launched, are also automatically used. Finally, Kwon and Sturt (2016) did show a recency effect (in regression measures…). To explain this difference from the Spanish results, De Dios speculated that verbs might simply launch their argument structure predictions more strongly than nominals. Verbs are indeed the most valuable sources of information in parsing (e.g., MacDonald et al. 1994; Snedeker & Trueswell 2004). As she observed, Betancort et al.’s (2006) second experiment seems to back up this idea. That experiment manipulated control induced by prepositions in adjunct clauses (e.g., Toño le dijo eso a María para PRO meterse con ella, ‘Tom said that to Mary to PRO tease her’), and there a recency effect was indeed found. When comparing the superior use of lexical
information in verbs versus prepositions, the authors concluded that language users may well use verb control faithfully because “verb control is sufficiently powerful for a parser to feel it will never lead it down a garden path” (2006: 250). This reliability is absent for prepositions, since with these, contextual manipulations may override the direction of control. For instance, in order to/para typically induces subject control. Yet in Tom brought Mary here to PRO do that job, it is the object Mary that controls PRO (Haegeman 1991: 262).10 I conclude from all of the above that parsers prefer lexically coded control information over all other sources when it is coded seamlessly, as is generally the case for verb control. When, in contrast, it is coded less seamlessly, parsers may fall back on a very wide-ranging and well-oiled general strategy: opting for the most local connection first, as a blind default.
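
The two-route conclusion just reached can be made concrete with a toy resolver. This is a minimal sketch under stated assumptions, not a model proposed in any of the studies discussed. The lexicon, its entries and the function names are hypothetical, and candidate antecedents are reduced to a matrix subject and a second NP. ‘Reliable’ lexical control is simply whatever the lexicon lists. When the head encodes control orientation, as verbs do, that information decides. Otherwise the resolver falls back on an MRFS-style default and takes the closer NP.

```python
# Toy illustration only: a hypothetical lexicon of heads whose control
# orientation is lexically fixed (cf. (23), (24), (31), (32) in the text).
LEXICAL_CONTROL = {
    "promised": "subject",
    "encouraged": "object",
    "begged": "object",
}


def most_recent_filler(candidates):
    """MRFS-style default: the candidate encountered last is closest to the gap."""
    return candidates[-1]


def resolve_pro(head, subject, second_np):
    """Choose PRO's antecedent in 'subject head second_np ... PRO ...'.

    Lexically coded control wins when available; otherwise recency is used
    as a blind default, which may later need revising.
    """
    orientation = LEXICAL_CONTROL.get(head)
    if orientation == "subject":
        return subject
    if orientation == "object":
        return second_np
    return most_recent_filler([subject, second_np])


print(resolve_pro("promised", "the mayor", "the police chief"))    # the mayor
print(resolve_pro("encouraged", "the mayor", "the police chief"))  # the police chief
print(resolve_pro("para", "Toño", "María"))                        # María
```

The hypothetical lexicon leaves a preposition like para unspecified, so the sketch returns the closer NP, María, as a first guess. In the Spanish example above the intended controller is in fact Toño, since it is María who gets teased. A parser relying on such a default would therefore have to revise its first choice.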

4.4 Filler-driven parsing: the Active Filler Strategy

As noted above, PRO gaps are typically unannounced. Traces are very often precisely the opposite: the sudden encounter of a filler that announces them makes them very easy to predict. We referred already to early work by Swinney et al. (1988) using the cross-modal priming methodology. The authors examined structures like (9), repeated below, which contained relative clause markers like that:

(9) The policemen saw the boy thatᵢ the crowd at the party * accused tᵢ * of the * crime

At the position of the postulated gap (after accused), lexical decisions to semantically related targets (e.g., girl) were significantly faster than those to unrelated ones. This seemed to indicate that the antecedent of the trace was reactivated precisely in its canonical position (prior to hypothetical movement). Since other NPs in the sentence did not show any priming in that position, reactivation was thought to have been prompted by the postulated gap. At around the same time, in a similar quest for the reality of traces, Tanenhaus et al. (1989) employed a stop-making-sense judgement task. Participants were given sentences like (40)–(41):

(40) The businessman knew which articleᵢ the secretary called tᵢ at home.
(41) The businessman knew which customerᵢ the secretary called tᵢ at home.

Participants showed evidence of spotting the semantic anomaly in sentences like (40) by the time they read the verb (called in (40); examples like (41) served as a baseline for comparison). These initial findings, obtained with less-than-ideal methodologies, paved the way for the postulation of the Active Filler Strategy, or AFS (Frazier & Clifton 1989;
Frazier & Flores D’Arcais 1989). Essentially, this hypothesis concerns an unassigned filler in a non-argument position (deriving from the application of movement). When the parser sees such a filler, it immediately posits a gap at the first available place in the structure. This is a kind of default heuristic that is insensitive to everything else, that is, to any other potentially relevant source of information. The reason, of course, is that keeping fillers in store consumes memory space. Since this space is precious, the parser attempts to free it as soon as possible. Memory is also taxed because in many languages fillers are normally ambiguous in their grammatical function and semantic role (agent, patient, etc.). Only at the gap position can the parser usually recover that kind of information (Ueno & Kluender 2009). Hence the urgency of establishing the long-distance dependency:

Active Filler Strategy
When a filler has been identified, rank the option of assigning it to a gap above all other options. (Frazier & Clifton 1989: 95)
Assign an identified filler as soon as possible; i.e., rank the option of a gap above the option of a lexical noun phrase within the domain of the identified filler. (Frazier & Flores D’Arcais 1989: 332)
When a filler of category XP has been identified in a non-argument position such as COMP, rank the option of assigning its corresponding gap to the sentence over the option of identifying a lexical phrase of category XP. (Clifton & Frazier 1989: 292)

Interestingly, the AFS predates well-known generative tenets like the Minimal Link Condition (or MLC; Chomsky 1995), with which it clearly converges in the spirit of computational economy. The MLC posits that grammars prefer movement over short distances between moved phrases and their traces to competing alternatives involving longer distances (see also Hawkins 1994, 2004 and Levy 2008a: 1126).
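
The AFS lends itself to a left-to-right sketch. The toy parser below illustrates it under simplifying assumptions of our own; it is not Frazier and Clifton’s formulation. Input arrives pre-tagged, and the tag set is hypothetical. Every verb or preposition is assumed to offer an object gap immediately after it. A lexical NP in that slot disconfirms the posited gap while the filler stays in memory. Run on examples (42) and (43), discussed next, it first posits a gap after bring and meets us, a filled gap; it only discharges who after to. Without a filler, as in (43), no gap is ever posited.

```python
def active_filler_parse(tokens):
    """Toy Active Filler Strategy over (word, tag) pairs.

    Hypothetical tags: 'WH' = filler, 'V'/'P' = head assumed to license an
    object gap right after it, 'NP' = lexical noun phrase, anything else = other.
    Returns events: ('filled gap', head, np) or ('gap', head, filler).
    """
    filler = None    # unassigned filler held in working memory
    pending = None   # head after which a gap has just been posited
    events = []
    for word, tag in tokens:
        if pending is not None:
            if tag == "NP":
                # The posited gap is filled by a lexical NP:
                # revise, but keep the filler active.
                events.append(("filled gap", pending, word))
            else:
                # Nothing fills the slot: the gap is confirmed and memory freed.
                events.append(("gap", pending, filler))
                filler = None
            pending = None
        if tag == "WH":
            filler = word
        elif filler is not None and tag in ("V", "P"):
            # AFS: rank the gap option above waiting for a lexical NP.
            pending = word
    if pending is not None:
        events.append(("gap", pending, filler))
    return events


# (42) ... who Ruth will bring us home to __ at Christmas
ex42 = [("who", "WH"), ("Ruth", "NP"), ("will", "AUX"), ("bring", "V"),
        ("us", "NP"), ("home", "ADV"), ("to", "P"), ("at", "P"),
        ("Christmas", "NP")]
# (43) ... if Ruth will bring us home to Mom at Christmas
ex43 = [("if", "C"), ("Ruth", "NP"), ("will", "AUX"), ("bring", "V"),
        ("us", "NP"), ("home", "ADV"), ("to", "P"), ("Mom", "NP"),
        ("at", "P"), ("Christmas", "NP")]

print(active_filler_parse(ex42))
# [('filled gap', 'bring', 'us'), ('gap', 'to', 'who')]
print(active_filler_parse(ex43))
# []
```

The design mirrors the ‘hasty’ character attributed to the AFS. The gap is posited as soon as the head is seen, before the next word is checked, so us in (42) can only register as a revision. A realistic parser would also need subcategorization information to decide which heads actually license gaps.
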
In the AFS, distance is operationalized as the segment separating the ‘unthematic’ filler from its ‘thematic’ gap. The AFS can thus easily account for the greater ease of subject relatives over object relatives. An interesting corollary of its eager, active reflexes is the Filled Gap Effect (or FGE; Crain & Fodor 1985; Stowe 1986). Consider (42)–(43):

(42) My brother wanted to know whoᵢ Ruth will bring us home to tᵢ at Christmas.
(43) My brother wanted to know if Ruth will bring us home to Mom at Christmas.

Notice that (42) contains a trace after to, signalling movement, but (43), an embedded polar interrogative, does not. Stowe registered elevated RTs at us in the former relative to the same word in the latter. This is usually interpreted to mean that the appearance of who prompts the automatic postulation of a gap to be filled at the first available position. In (42) that position is right after the verb bring (My brother wanted to know who Ruth will bring (period)), but it is ‘filled’ by us. The parser thus needs to overcome its surprise (higher RTs) and keep looking until it hits the free position after to. Notice that the parser does not wait to see whether the direct object slot is occupied by a ‘lexical NP’ such as us (the lexical route). Instead, it launches a long-distance linking operation that is doomed to failure. Hence the ‘active’ component of the AFS and the adjectives used to describe its modus operandi in the literature: hasty, zealous, eager, etc. So the FGE is a consequence of the AFS. Its intuitive foundations are clear. Like relatives, wh-questions contain traces that must be linked back to a previous filler. Early work by Fiebach (2001), using the ERP methodology, compared subject and object questions in German, like those in (44)–(45):

(44) Karl fragt sich, werᵢ _ᵢ den Doktor verständigt hat.
Karl asks himself, who-NOM the doctor-ACC called has
‘Karl asks himself who has called the doctor.’ (subject-initial)
(45) Karl fragt sich, wenᵢ der Doktor _ᵢ verständigt hat.
Karl asks himself, who-ACC the doctor-NOM called has
‘Karl asks himself who the doctor has called.’ (object-initial)

As was the case for the relatives discussed above, (44) and (45) crucially differ in the distance separating the fronted wh-phrase and the position of the gap. Notice that, as the glosses show, the grammatical function of the filler phrase is here clearly marked by case (NOM versus ACC).
This ensured that whatever effects could arise could not be due to strategies for dealing with ambiguity resolution: there is nothing ambiguous to resolve. Fiebach reported almost showroom results for the AFS (p. 130):
sustained effects [that] were more negative going for object wh-questions and distributed over (left-)anterior electrode sites, suggesting that they were caused by a greater load in working memory. This ERP effect, which was observed in both ERP studies, was not found over the whole sentence. It was restricted to the region between the dislocated element (i.e., the object wh-filler) and the position of its gap. Furthermore, it was only observed when the filler-gap distance was long, and it was stronger for individuals with a low working memory capacity than for individuals with a high working memory capacity. Taken together, these results strongly suggest that the obtained slow-wave effects were related to working memory processes. (emphasis added)
And he added (p. 130):
This interpretation is in line with the assumption of the Active Filler Hypothesis (…) that the filler is maintained actively in working memory until the gap is located. However, the present data suggest that AFH, which is a model of ambiguity resolution, can be extended to a more general account of memory costs in the processing of sentences with underlying transformational movement operations. It can be concluded that temporary maintenance of syntactic information in working memory is necessary when complex sentences are processed that contain verbal arguments in (non-canonical) positions that do not allow a direct establishment of the filler-gap dependency (cf. also Fanselow et al. 1999). When case-ambiguous wh-fillers are encountered, the same working memory mechanism is activated. In addition, the parser has to utilize strategic processes, like suggested in the AFH, to identify the correct gap site.11
This view of the way relatives and wh-questions are dealt with by the parser seems essentially right. In fact, this work nicely highlights the interface between structure-building and the constraints imposed by the human short-term memory system. The ERP research, in particular, has been extremely useful in detailing the specifics of the sustained anterior negativity or SAN (increased negative voltage) associated with keeping the filler in the memory buffer until the gap (any gap) becomes visible (Fiebach et al. 2001, 2002; Phillips et al. 2005; Gouvea et al. 2010). Intuitively, recognized fillers like wh-words entail a need to reconstruct word order deviations from a preferred, default pattern (see next section), especially if such fillers are not very informative about syntactic function and semantic role, so it makes sense that modern technology can pick up traces of the online reconstruction process. In fact, Phillips et al. (2005), among others, registered SANs in both short and long wh-dependencies but with greater amplitudes in the long ones, a fact that underscores the memory problem.12 Also, as Hunter et al. (2019: 2) observe, the AFS is akin to the Late Closure Principle, whose applicability in sentence processing is indeed phenomenal. Thus, when reading When Fido scratched the vet (and his new assistant) removed the muzzle, “humans’ first guess” is not to wait but instead to immediately attach the vet as object of scratch. Similarly, in What did John buy books about yesterday? their first “guess” is to connect what and buy at once. Both moves share the same proactive dynamics.
Yet researchers soon came up with an alternative explanation of the active behaviour of parsers and the FGE, one that seemed equally intuitively attractive and, to some, even more so. This is the Direct Association Hypothesis (DAH), initially proposed by Pickering and Barry (1991). The DAH is premised on the view that fillers are related directly to the subcategorizing heads, obviating any need for the postulation of traces or gaps (Pickering 1993; Sag & Fodor 1995; Traxler & Pickering 1996). On this view, fillers simply fill an unsaturated position in a verb’s argument structure directly, and reactivation effects occur when reading the verb, not a gap after it. So: “gaps are not used in sentence processing” (Pickering & Barry 1991: 257). This gapless theory rests on the well-known place that verbs occupy in both grammar and sentence processing. Verbs indeed amass a wealth of thematic and subcategorization information that is essential for projecting and predicting syntactic structure (MacDonald et al. 1994; etc.). Fodor (1993) and Sag and Fodor (1995) noted that the cross-modal priming experiments were only evidence of a semantic process whereby a displaced argument is linked to its subcategorizer, not evidence of a syntactic operation. Since the experimental findings reviewed above all involved hypothetical gap positions immediately following verbs, it is at least equally possible that the effects came about because of the tangible verb and not because of the intangible, postulated empty category. Pickering and Barry (1991) used sentences like (46)–(47) to buttress their point. Notice that (46) sounds fine but (47) is considerably harder to process. This is because the short object NP a prize occurs after a very long Indirect Object phrase. The length of that phrase no doubt dilutes the strong link between the distant verb gave and its DO argument a prize, making it hard to re-establish. They reasoned that in (46) the object (which, i.e., the prize) occupies, via the trace, the exact same position as a prize in (47). If the gap is really there, the same processing difficulty should arise. Since it does not, there may be no gap there at all.
(46) That’s the prize whichᵢ we gave [every student capable of answering every single tricky question on the details of the new and extremely complicated theory about the causes of political instability in small nations with a history of military rulers] tᵢ.
(47) We gave [every student capable of answering every single tricky question on the details of the new and extremely complicated theory about the causes of political instability in small nations with a history of military rulers] a prize.
Wh-questions can be explained in much the same fashion. Consider (48)–(49) now:
(48) [In which box]ᵢ did you put the very large and beautifully decorated wedding cake bought from the expensive bakery tᵢ?
(49) [Which box]ᵢ did you put the very large and beautifully decorated wedding cake bought from the expensive bakery in tᵢ?
Again, intuitively (48) sounds much easier than (49). Pickering and Barry point out that this is because, in the latter, the processor has to hold the relevant wh-phrase in memory for much longer before it can finally link it to its subcategorizer. That is, in (48) the displaced phrase in which box can be related to its subcategorizer put almost immediately; in (49), however, the phrase which box is subcategorized by the preposition in, and this comes last in the structure. Kawasaki and Ishikawa (2003: 458) nicely paraphrase the point that Pickering and Barry were trying to make:
(A) If an empty category is assumed, the filler receives its thematic role only via the empty category, which receives its thematic role from the selecting head (put in [48] and in in [49]).
(B) This means that the parser would have to retain the filler (wh-phrase) in working memory until encountering the empty category.
(C) Given that the object NP the very large and beautifully decorated wedding cake bought from the expensive bakery is quite complex and should put some burden on working memory, [48]–[49] should both cause parsing problems.
(D) However, this is not the case.
(E) Hence, an empty category should not be assumed; the filler is directly assigned its thematic role by the selecting head.
This line of reasoning was soon questioned by Gibson and Hickok (1993), Gorrell (1993), Crocker (1994), and Sag and Fodor (1995), who pointed out that nothing precludes the AFS from predicting a gap on encountering the verb, before the gap actually becomes visible. As Lee (2004: 54) observes, this leaves us with a problem:
While it is possible to argue on grounds of ontological simplicity that gaps/traces play no role in sentence processing and grammar (cf. Pickering 1993), it is widely acknowledged that the gap and gapless accounts “are empirically indistinguishable with respect to the type of data under consideration” (Gibson & Hickok 1993: 160). To reaffirm the role of gaps in sentence processing, it is necessary to demonstrate an experimental effect at an assumed gap location which cannot be explained in terms of a direct association between a filler and a verb/subcategorizer.
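Pickering and Barry’s argument turns on how long the filler has to be held before it is linked. The sketch below makes that comparison concrete for (48)–(49) under one loudly stated assumption of our own: memory load is approximated by the number of words intervening between the end of the filler and the point of linking. The function names (words_held, compare_accounts) are hypothetical illustrations, not part of either theory.

```python
# Toy comparison of filler "hold distance" under a gap-based account and
# under the Direct Association Hypothesis (DAH), for examples (48)-(49).
# ASSUMPTION (not from the source): memory load is approximated by the number
# of words between the end of the filler and the word at which it is linked.

EX48 = ("In which box did you put the very large and beautifully decorated "
        "wedding cake bought from the expensive bakery").split()
EX49 = ("Which box did you put the very large and beautifully decorated "
        "wedding cake bought from the expensive bakery in").split()


def words_held(filler_len, link_index):
    """Words intervening between the filler's last word and the link point.

    The filler occupies indices 0 .. filler_len - 1, so the intervening
    words are those at indices filler_len .. link_index - 1.
    """
    return link_index - filler_len


def compare_accounts(words, filler_len, subcategorizer):
    # Gap account: the filler waits for its trace, which is sentence-final
    # in both (48) and (49).
    gap_index = len(words)
    # DAH: the filler is linked as soon as its subcategorizing head is read.
    head_index = words.index(subcategorizer)
    return {
        "gap_account": words_held(filler_len, gap_index),
        "direct_association": words_held(filler_len, head_index),
    }


if __name__ == "__main__":
    print("(48)", compare_accounts(EX48, 3, "put"))  # filler: In which box
    print("(49)", compare_accounts(EX49, 2, "in"))   # filler: Which box
```

Running it yields 16 versus 2 intervening words for (48) and 17 versus 16 for (49): the gapless count reproduces the intuitive asymmetry, while the trace count treats both as equally long. As the objection of Gibson and Hickok and others makes clear, the toy deliberately ignores an active parser that predicts the gap on reaching put, which would shrink the gap account’s figure for (48) as well.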
Fortunately, the relativity of the ontological arguments could soon be complemented with experimental evidence, which initially came mostly from German. It soon became clear that antecedent reactivation can occur at trace positions that are not directly adjacent to verbs (Bader & Lasser 1994; Crocker 1994; Clahsen & Featherston 1999). German provided interesting data points because it allows short scrambling, that is, word order variations within the clause. This makes it possible to test for effects before the verb comes. Most of this evidence, however, used the same cross-modal priming task as previous studies. In English, Lee (2004) came up with an ingenious response to the DAH too, one that could overcome the limitations of the cross-modal priming methodology (he used a word-by-word, self-paced, noncumulative moving-window reading task; Just et al. 1980; see also Tollan & Palaz 2021). Lee sought to revisit the data uncovered by Stowe (1986) on the FGE. I repeat examples (42) and (43) below:
(42) My brother wanted to know whoᵢ Ruth will bring us home to tᵢ at Christmas.
(43) My brother wanted to know if Ruth will bring us home to Mom at Christmas.
However, instead of the FGE at us, which caught everyone’s attention, Lee wanted to see why there was no FGE at the subject position, before the verb (at Ruth). Stowe (1986) had reported a statistically non-significant effect of between 19 and 28 milliseconds on the RTs for Ruth in (42) versus (43), a fact that some researchers interpreted to mean that the DAH was right. Note that the DAH can account for effects at us just as well as the AFS, because these come after the verb. But, crucially, before the verb becomes visible, after the wh-word, there is a subject (Ruth) and, on the AFS logic, this should have produced a significant FGE too. On the DAH logic, however, the absence of an FGE at Ruth follows from the fact that when hitting who “the parser cannot form any direct association involving who because the subcategorizer bring has not yet been reached”. From this it follows that “the parser should not be disrupted by the appearance of Ruth because at that point there is no erroneous direct association to revise” (Lee 2004: 56). In sum: the “null” finding in Stowe’s experiment is predicted by the DAH but not by the trace co-indexation account.
It seems fair to say that a non-significant effect in the direction of the AFS at a position that comes right after the wh-word is not actually so insignificant, for there really is very little time for the parser to react between who and Ruth in (42) above. Be that as it may, Lee bypassed that peculiarity of the Stowe materials by presenting others like (50)–(51) below. These involve relative clauses of two varieties, namely, preposition stranding (50) and pied piping (51):
(50) That is the laboratory [which]ᵢ Irene used a courier to deliver the samples to tᵢ.
(51) That is the laboratory [to which]ᵢ Irene used a courier to deliver the samples tᵢ.
Notice that the FGE at stake now is at the subject Irene. To increase reanalysis difficulty, the author also included modified versions of (50) and (51) with a length manipulation involving adjunct phrases, as in (50’) and (51’):13
(50’) That is the laboratory [which]ᵢ, on two different occasions, Irene used a courier to deliver the samples to tᵢ.
(51’) That is the laboratory [to which]ᵢ, on two different occasions, Irene used a courier to deliver the samples tᵢ.
What are the predictions of the two competing accounts? On the AFS logic, the parser should posit a subject gap at which in (50) and be surprised to see that position filled by the lexical subject Irene, triggering reanalysis. On encountering to which in (51), the parser also needs to find a gap, and there are several possibilities, but none involves a trace for to which in the subject position, since a PP cannot be a subject. On this logic, then, the appearance of Irene in the subject slot should be unproblematic and no reanalysis is expected. On the DAH logic, before the verb becomes visible, no differences should be expected in the way Irene is read in the two types of sentences. To summarize: “a subject filled-gap effect [i.e., longer processing time for Irene in [50] than in [51]] is predicted only by the gap processing view but not by the gapless view” (p. 59). What Lee found was clear evidence of a subject FGE in the shape of increased reading times for Irene in the two preposition stranding constructions versus their pied piping versions, with that effect being more pronounced in the longer versions containing intervening adjunct phrases. Together with the German data, this helped strengthen the view of a very proactive gap-filling process and the idea that something like a trace is really in the postulated gap position, as the FGE arises even in preverbal segments. Later research on head-final languages like Japanese (Nakano et al. 2002; Aoshima et al. 2004) made that view even more forceful.
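The contrasting predictions just summarized can be tabulated mechanically. The sketch below is a deliberately crude illustration, not Lee’s model or anyone’s published parser; its simplifying assumptions are ours: an NP filler is tried at every NP slot from the subject onward until the real gap is found (only NPs that precede the real gap are probed), a PP filler such as to which is never linked to an NP slot, and timing and plausibility are ignored. The names Probe, afs_predicts_fge and dah_predicts_fge are hypothetical.

```python
from dataclasses import dataclass

# Toy predictions of a filled-gap effect (FGE) under the Active Filler
# Strategy (AFS) and the Direct Association Hypothesis (DAH).
# ASSUMPTIONS (not from the source): an NP filler is tried at each NP slot
# from the subject onward until the real gap is found; a PP filler
# ("to which") can never be linked to an NP slot; only overt NPs that
# precede the real gap are probed; timing and plausibility are ignored.


@dataclass(frozen=True)
class Probe:
    label: str
    filler_is_pp: bool  # pied-piped PP filler vs bare wh-NP filler
    position: str       # "subject" (preverbal) or "object" (postverbal)


def afs_predicts_fge(p: Probe) -> bool:
    # AFS: an NP filler is assigned to each NP slot it meets before the real
    # gap, so any overt NP in such a slot counts as a filled gap.
    return not p.filler_is_pp


def dah_predicts_fge(p: Probe) -> bool:
    # DAH: nothing is linked until the subcategorizing verb has been read,
    # so only a postverbal NP can clash with an (erroneous) association.
    return not p.filler_is_pp and p.position == "object"


PROBES = [
    Probe("(42) at Ruth", False, "subject"),
    Probe("(42) at us", False, "object"),
    Probe("(50) at Irene", False, "subject"),
    Probe("(51) at Irene", True, "subject"),
]

if __name__ == "__main__":
    for p in PROBES:
        print(p.label, "AFS:", afs_predicts_fge(p), "DAH:", dah_predicts_fge(p))
```

The two accounts agree at us and at Irene in (51); they part company only at preverbal subjects, which is exactly where Stowe’s non-significant effect (Ruth) and Lee’s subject FGE (Irene in (50) versus (51)) were measured.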
In fact, calling the AFS ‘active’ may even undersell it: when dealing with filler-gap environments, it turns out that the human sentence comprehension mechanism may be not only active but indeed “hyper-active”. Work by Omaki et al. (2015) certainly suggests that. The authors wanted to see whether the preverbal structure-building operations that had been uncovered in research on head-final languages like Japanese are custom-made, and therefore specific to the speakers of those languages, or rather a general parsing strategy that simply happens to be more easily observable in such languages.14 To that end, they conducted three online reading experiments (self-paced reading and eye-tracking) that manipulated verb transitivity in filler-gap environments in a verb-medial language like English (see also Staub 2007). They used structures like (52)–(55) below:
(52) Transitive, non-island: The city that the author wrote regularly about was named for an explorer.
(53) Transitive, island: The city that the author who wrote regularly saw was named for an explorer.
(54) Intransitive, non-island: The city that the author chatted regularly about was named for an explorer.
(55) Intransitive, island: The city that the author who chatted regularly saw was named for an explorer.
The islands were used as a baseline condition aimed at providing reading times for the critical transitive versus intransitive contrast between (52) and (54), which was the real motivation for the study.15 This is because previous research has shown that the parser avoids actively building filler-gap dependencies across island boundaries (that is, in positions banned by the grammar; Stowe 1986; Phillips 2006; Omaki & Schulz 2011; also Kush et al. 2017 and section 5.3).16 Note that in (52) the city cannot be written, but write is still a transitive verb subcategorizing an NP object (as in the story that the author wrote). By contrast, in (54) the verb chat disallows any kind of object (*the author chatted the story). Under a hyper-active gap-filling hypothesis, the authors predicted that readers would posit gaps irrespective of whether the verbs ultimately license direct object positions or not (so they would posit a gap after chat in (54) as well). In that scenario, evidence of reading disruption should be observed in the intransitive verb conditions. This is precisely what they found. Thus, there was a slowdown in the region following the verb in non-island conditions relative to the corresponding island conditions. First fixation duration and first pass times were significantly longer for the intransitive verbs in the condition that allowed a gap (non-island) than in the island condition. Crucially, this effect disappeared in the transitive verb conditions.
The authors speculated that what happened in their experiments was that the presence of the subject NP prompted the parser to predict an upcoming VP and that, since VPs can contain objects, the parser immediately projected a VP with an object NP slot and assigned the filler to this object position “before confirming whether the upcoming verb is a transitive verb or not” (p. 3; on the crucial role of prediction, see Levy 2008a and Chapter 5). They did not speculate much on the nature of the information used to launch this blind co-indexation process; they merely wanted to stress the idea that, wherever that reflex originated, it originated without the verb being responsible. Prior to that, van Gompel and Pickering (2001) and Pickering and Traxler (2003) had shown that filler-to-gap linking happens for optionally transitive verbs that are more frequently used as intransitives (though see Staub 2007 for qualifications).17 We may conclude from all this that in the presence of an easily recognizable filler, the parser does not wait at all, not even for the appearance of the most important element in the overall structure: the subcategorizing verb.
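The logic of the Omaki et al. design can be sketched as a 2 × 2 prediction table. The sketch below is only an illustration under assumptions of our own: a two-verb toy lexicon, a binary ‘disruption after the verb’ outcome, and no role for plausibility (so the implausibility of writing a city is not modelled). The “verb_driven” strategy is a hypothetical contrast in which the parser waits for the verb before positing an object gap; it is not a model proposed in the source.

```python
# Toy 2 x 2 predictions for the Omaki et al. (2015) design, (52)-(55).
# ASSUMPTIONS (not from the source): a two-verb toy lexicon, a binary
# "disruption after the verb" outcome, and no role for plausibility.
# "verb_driven" is a hypothetical contrast strategy, not a published model.

TRANSITIVE = {"wrote": True, "chatted": False}  # hypothetical toy lexicon

CONDITIONS = {
    "(52)": ("wrote", False),    # (verb, is the gap site inside an island?)
    "(53)": ("wrote", True),
    "(54)": ("chatted", False),
    "(55)": ("chatted", True),
}


def disruption_after_verb(verb, in_island, strategy):
    if in_island:
        # Filler-gap dependencies are not actively built inside islands.
        return False
    if strategy == "hyperactive":
        # On reading the subject, the parser projects a VP with an object
        # slot and assigns the filler to it before the verb is seen; an
        # intransitive verb then contradicts that early commitment.
        return not TRANSITIVE[verb]
    if strategy == "verb_driven":
        # An object gap is posited only once the verb licenses one, so
        # transitivity never clashes with an earlier commitment.
        return False
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for label, (verb, island) in CONDITIONS.items():
        print(label, verb, "island" if island else "non-island",
              "hyperactive:", disruption_after_verb(verb, island, "hyperactive"),
              "verb_driven:", disruption_after_verb(verb, island, "verb_driven"))
```

Only the hyper-active strategy singles out (54), the intransitive non-island condition, which is the cell where the reported slowdown appeared; a parser that waits for the verb predicts no transitivity effect at all.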
4.5 Scrambling the word order predictability: the Minimal Chain Principle
Both in various corners of linguistics and more broadly in psycholinguistics, it has often been assumed that ‘scrambled sentences’ are more complex than sentences presented in a ‘canonical’ word order. Initially, the term scrambling was applied to non-canonical word order in so-called (more or less) free word order languages, such as German, Japanese or Hindi. In these languages, certain constituents can occur in different positions in the sentence structure without evident changes in the propositional (truth-conditional) semantics. The changes rather relate to layers of thematic meaning pertaining to the domain of pragmatics. For instance, as has already been noted (section 2.3.4), in Spanish, subjects that code new information are normally placed after the verb, so the difference between Va a venir Juan and Juan va a venir (‘JOHN is coming’ versus ‘John is coming’) reflects the assumed activation status of the subject NP in the addressee’s mind. Today, scrambling often simply means non-canonical word order, even in such relatively fixed word order languages as English (Saito 1985; Miyagawa 1997; Bošković & Takahashi 1998; Baltin 2003, 2006). In the early days of generative grammar, Ross (1967) proposed a scrambling transformation that applied in the stylistic component of the grammar. Later, the GB model (Chomsky 1981, 1986) started refining this area of description and analysis by distinguishing two major classes of movement, A- and A’-movement, exemplified by NP-Movement and Wh-Movement, respectively. We focus here on phenomena ‘akin’ to the former type, that is, on structures where movement is hypothesized to occur without fillers like wh-words announcing it (discounting PRO, which we have already seen).
I stress ‘akin’ because we are not going to be concerned with the fine details of all the movement phenomena postulated in various forms of generative grammar, but rather with more tangible changes to canonical word order (on ‘covert movement’, see Lo and Brennan 2021 and references therein). On the ‘medium tangibility of movement’ part of the spectrum, we have already referred to early experimental work by Bever and McElree (1988), who studied NP-movement gaps like those in (10; passives) and (11; raising) above, repeated below:
(10) Johnᵢ was hit [tᵢ] by Mary.
(11) Johnᵢ was certain [tᵢ] to leave.18
Featherston et al. (2000) conducted an ERP study of NP trace and PRO in German and provided electrophysiological evidence that the NP trace takes a toll (vs PRO). Using raising constructions such as the sheriffᵢ seemed, as the widow suddenly came into the room, tᵢ to be able to sentence the offender at last (where the sheriff is supposed to have moved to the subject position of the main clause from the same position in the subordinate one) and control constructions like the sheriffᵢ seemed, as the widow suddenly came into the room, PROᵢ to be able to sentence the offender at last (where there is no movement and therefore no trace), they found that at the position of the empty subject the raising construction elicited a significantly stronger P600 effect than its control counterpart.19 Assuming that the P600 reflects the cost of syntactic processing (Osterhout 1994), they concluded that the extra cost of the (presumably) scrambled structure is due to the fact that it requires “an extra computational operation which is not required in control constructions”: undoing the movement of the NP subject from the lower to the higher clause (p. 153). I reiterate that even though the thematic differences between NP trace and PRO are clear and have well-known grammatical behavioural reflexes that can be approached in a theory-neutral way, this kind of research does not target surface movement of the Juan va a venir versus va a venir Juan kind, so it involves word order reconstruction of a different, subtler and more theory-dependent kind (it is worth observing, too, that in the Featherston et al. work, the frequency of control versus raising verbs was not taken into account). ‘Tangible’ movement has been studied in work that aims at measuring the cost of violating general language-type (SVO, SOV, etc.) word order preferences. In this context, when a constituent moves it leaves a gap behind too. A lot of work has been done on German verb-final sentences, where an advantage of the SO over the OS order has been documented (Frisch et al. 2002; see Lamers 2001 on Dutch; and Sekerina 2003 on Russian). Thus, in German, even with clear case marking, the sequence [nominative NP + accusative NP + V] is usually read faster than its mirror image, [accusative NP + nominative NP + V]. Something similar has been observed even in Chinese, despite the fact that the status of grammatical relations in that language is far from obvious. For instance, in two ERP studies, Wang et al. (2009) compared the ERP signatures at the verb and the second NP in OVS and SVO structures and found that Mandarin Chinese speakers exhibit a subject-first preference, “thus providing further converging support for the notion that the subject-preference might constitute a universal processing strategy” (abstract).
Principle-grounded models of parsing have, in the Minimal Chain Principle (MCP) of Marica de Vincenzi (1991), a mechanism that complements their AFS and that is meant to capture the way that movement is dealt with by the mind. Under a GB framework, de Vincenzi proposed her hypothesis as follows:
Minimal Chain Principle: Avoid postulating unnecessary chain members at S-structure, but do not delay required chain members. (de Vincenzi 1991)
By chain (Rizzi 1990), de Vincenzi understood a “set of elements nondistinct in indices, bearing one thematic role and one grammatical case”. That is, a chain is the anaphoric connection between two or more positions in a tree. Another way of thinking about this is via the notion of a discontinuous (or long) constituent with a single thematic role (agent, patient, etc.) and a single specifiable function (subject, object, etc.). It is not clear whether de Vincenzi intended the MCP to apply to anaphoric chains without overt movement (as in (10)–(11) above), to obvious cases of overt movement, or to both. Here we examine the MCP in connection with the latter type of phenomena, as in the Italian ambiguous construction below:20
(56) Ha chiamato Giovanni
     ‘Has called Giovanni’
  a. pro has called Giovanni. He/she/it called Giovanni. (SVO)
  b. tᵢ has called Giovanniᵢ. Giovanni called (someone). (VS)
As the glosses show, (56) is ambiguous between one interpretation where Giovanni did the calling and another one where he was called. Under the former reading (56b), assuming that Italian is an SVO language, Giovanni is the displaced subject that leaves a gap in the preverbal position after moving to final position.21 The GB convention tᵢ shows that the gap position (t) and Giovanni form a chain after movement (hence the shared index). In this notational system, t is an expletive subject, a kind of placeholder for the subject position when, for some reason, this has been vacated by a displacement operation. Conversely, the interpretation glossed in (56a) has a ‘contentful pro’ (like any other pronoun) in the subject position, as is typical of pro-drop languages, whose subjects are very often dropped when they are activated. This pro is contentful because it designates a specific individual whose identity is clear in context (say, María in María was here this morning, (pro) spent some time in her room, (pro) called Giovanni and then left). That is, this pro is D-linked, as is often said. In a nutshell, the gap in (56a) means: subject missing because you know who I am talking about; that in (56b) means: subject missing because it is a new discourse referent, so wait for it in the focal position after the verb (Meseguer et al. 2009).
Importantly, understood in the somewhat theory-neutral way we are trying to adopt here, in (56a) Giovanni takes the postverbal position because that is the typical DO position in Italian. This means that, unlike in (56b), it has not reached that position after movement from any other place in the overall structure. At the risk of sounding contradictory, we may say that Giovanni in (56a) is ‘base-generated’. As Meseguer et al. (2009: 768) note:
Here we are concerned with the simple notion that displacement of an entity [like Giovanni] leaves a gap in the vacated position and thus creates the need for the gap and the displaced entity to be co-indexed somehow (Clifton & Frazier 1989). At this level of description, when an element moves it forms a complex chain with its gap [as in (56b)]. If it does not move OVERTLY, the deep-to-surface mapping is direct, and therefore not complex.
Given the above comments, it follows from the MCP that the preferred reading of (56) is (56a), since in it Giovanni is in a typical object position and is indeed an object. In (56b) the object reading should be pursued too (since positing movement is a last resort), but as later disambiguating information becomes available, reanalysis should prove inevitable, because in (56b) the postverbal NP is not an object but a moved subject. If reanalysis takes time, this version should prove harder to read. This is what de Vincenzi found in her reading experiments: reading times for the disambiguating region were longer when the inverted-subject reading was enforced (1047 versus 950 ms). The single chain preference actually “amounts to saying that the parser prefers to analyze an element as being in its deep-structure position, that is, in the position where it directly receives a thematic role” (de Vincenzi 1991: 339), as Giovanni is in (56a) above. That is, the parser has “a general preference to posit unmoved elements over moved ones”. In the serial thinking that pervades much work done under the aegis of formal models of parsing, the motivation for such a strategy is that the parser has to be able to see first what kind of thematic role a constituent has in the unfolding predication before it can assemble phrasal packages to be sent to the ‘semantic processor’ (after syntax). Holding pieces of a long discontinuous constituent unstructured is deemed uneconomical. In sum, in complex chains the parser has to link all the members of the chain before phrasal packaging can commence. In (56b) it has to reconstruct the original position of the displaced subject, undoing movement, to see a tidy SVO template before it starts making sense. The reader may have noticed that the MCP is very much in line with much work done in linguistics. Thus, for instance, Chomsky (1995) regards movement as a dispreferred option in grammatical derivations too, and the STAY principle of Optimality Theory captures approximately the same idea (see Fanselow et al. 1999). This is the same philosophy that inspired the work and the conclusions of the Featherston et al. (2000) study mentioned above. Note that the MCP serves both as a heuristic device aimed at finding orderly phrase structure in the foggy environment that movement often creates and, in view of the German-style NP NP V data, as a general parsing strategy that starts acting in the absence of verb information (Levy 2008a).
A good place to see the MCP at work in a tangible movement scenario is the case of se structures in Spanish. Like si in Italian, se in Spanish can be used in various ways to code rather different constructions (see Cinque 1988, 1995, for Italian, and Sánchez López 2002 for Spanish), but Meseguer et al. (2009) were interested in two specific contrasts, two-argument reflexive versus impersonal and three-argument reflexive versus passive, as in (57)–(60) below:
(57) Two-argument reflexive: Se vendó apresuradamente el corredor
     Reflex-DO V ADJUNCT SUBJECT
     ‘Himself bandaged hurriedly the runner’.
     The runner bandaged himself hurriedly.
(58) Three-argument reflexive: Se vendó el tobillo el corredor
     Reflex-IO V DO SUBJECT
     ‘To himself bandaged the ankle the runner’.
     The runner bandaged his ankle.
(59) Impersonal: Se vendó apresuradamente al corredor
     Clitic V ADJUNCT DO
     ‘Bandaged hurriedly to-the runner’.
     The runner was bandaged hurriedly.
(60) Passive: Se vendó el tobillo al corredor
     Clitic V SUBJECT IO
     ‘Was bandaged the ankle to-the runner’.
     The runner’s ankle was bandaged.
Note that the four structures are declarative sentences that start without any subject NP before the verb (as se is a direct or indirect object in the reflexive structures and a verbal clitic in the passive and the impersonal). Since nothing announces the fact that the subject position is empty, what the parser needs to keep in store is a gap, not a filler. This makes the MCP a better candidate than the AFS for dealing with this kind of material.22 Meseguer et al. conducted a corpus study of 2,395 se structures in Spanish to make sure that whatever effect obtained in their online experiment could not be seen as the result of a strong statistical bias. It turned out that 64.8% of the items found were in the ‘other’ category, that is, they were neither reflexive nor impersonal nor passive, but mostly aspectual se-s that surface with verbs like ir (‘go’) and that simply disappear in English translations (Juan se fue = ‘John left’). Take the first contrast, that between the two-argument reflexive in (57) and the impersonal in (59). Note that in the reflexive structure, the subject NP el corredor appears in the final position of the structure, clearly after the verb. In contrast, the phrase al corredor in (59), which takes the very same position, cannot be a subject NP since it is introduced by the preposition a (which is contracted with the article: a + el = al, meaning ‘to the …’).
From this it follows that el corredor in (57), like Giovanni in (56b), may be seen as the result of movement from the left of the verb to its right (SOV: el corredor se vendó versus OVS: se vendó el corredor), whereas al corredor in (59) is ‘base generated’ as the typical postverbal object NP. In MCP terms, the former involves a complex chain that needs to be undone; the latter does not. Meseguer et al. used eye-tracking to measure the putative cost of moving the subject phrase across the verb in (57). The disambiguating region was the contrast el corredor versus al corredor. They found that al sentences were read more rapidly than el sentences (in various measures: total time, regression path time, regression path time to the left and second pass time). They interpreted this as the cost of displacement in (57) versus the greater ease of processing elements in situ in (59). The authors found (literally) more of the same when they examined the contrast between (58) and (60), involving three-argument reflexives and the very peculiar se-passive of Spanish, respectively. That is, they registered even greater difficulty for the processor in undoing subject movement when more referential competition stood along the path of the moved phrase (notice that in both conditions an extra NP, el tobillo ‘the ankle’, precedes the final disambiguating segment). The increased cost of referential competition is also well accounted for by the SPLT of Gibson (1998) but may be a problem for surprisal theories (Levy 2008a; Futrell et al. 2020). Meseguer et al. pointed out that their results might seem surprising in that they involve contrasts between very complicated constructions (at least for linguists) and simple (active, transitive) reflexive structures that merely had their subjects postposed (but still present).23 That the displaced reflexives should prove harder to process underscores the basic soundness of the MCP idea, understood, at least in general terms, as an expected penalty for violating general language-type word order preferences. This is all the more conspicuous in a language like Spanish, which, unlike English, grants speakers relatively large freedom to choose among different word orders. Yet it also aligns with the data from German and Chinese briefly mentioned above. Overall, movement – announced or not – does seem to take a toll in processing. The general conclusions we can derive from this fact align well with the postulates of more modern views on processing difficulty.
For instance, dependency locality theory makes room for the idea that speakers prefer to use orders that are easy to produce and comprehend (see Gibson et al. 2019; also Futrell et al. 2020; Hawkins 1994). This theory has been successful in explaining various typological universals of word order as well as corpus data (for a recent review, see Temperley & Gildea 2018). Vasishth et al. (2010) also emphasize habituation to general language-type word orders as an important predictor of processing load.
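The locality idea can be made concrete with a simple metric. What follows is a minimal sketch, assuming the word-count measure of dependency length often used in corpus work on dependency length minimization: for each word, take the linear distance to its syntactic head and sum over the sentence. This is only a proxy; it is not Gibson's (1998) own metric, which counts intervening new discourse referents rather than words. The function name `total_dependency_length`, the head-index encoding and the simplified dependency analysis of the two English orders (a textbook particle-placement example, not material from the studies cited above) are assumptions made for illustration.

```python
from typing import Sequence


def total_dependency_length(heads: Sequence[int]) -> int:
    """Sum the linear distances between each word and its syntactic head.

    heads[i] holds the 1-based position of the head of word i + 1;
    0 marks the root (the main verb), which has no head and adds nothing.
    """
    total = 0
    for position, head in enumerate(heads, start=1):
        if head != 0:
            total += abs(position - head)
    return total


# Simplified analysis assumed for illustration: the subject, the particle
# and the object noun depend on the verb; determiner and adjective on the noun.
# "John threw out the old trash": particle before the object NP.
particle_first = [2, 0, 2, 6, 6, 2]
# "John threw the old trash out": particle after the object NP.
particle_last = [2, 0, 5, 5, 2, 2]

print(total_dependency_length(particle_first))  # 9
print(total_dependency_length(particle_last))   # 11
```

Both orders are grammatical; the locality-based expectation is simply that, other things being equal, the order with the shorter total (here, the short particle placed before the longer object NP) should be easier to process and more likely to be produced.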

4.6 Summary and conclusions
We started this chapter with the observation that gaps in the chain of speech are practically unavoidable, and that there are two main reasons for this: economy (‘if possible, do not recode old information’) and movement deriving from thematic needs (i.e., the need to rearrange basic, general, language-type word order patterns by exploiting especially the first and the last positions in the sentence structure for reasons of focus and topicality). This makes gaps a very frequent reality of ordinary language processing, and one worth looking into. More often than not, gaps put human short-term memory limitations to the test in that they add significant stress to the already challenging job of keeping the outline of a phrase structure tree active while either a filler or a gap itself is maintained in the memory buffer. If that is not evident enough, consider how long you had to wait for the word ‘active’ in the last sentence after reading the verb ‘keeping’, and the size of the NP the already challenging job of keeping the outline of a phrase structure tree active while either a filler or a gap itself is maintained in the memory buffer. Imagine keeping a filler in mind throughout that whole segment. To linguists interested in the psychological reality of actual language use, the study of such ‘interfaces’ (between phrase structure building and memory; between ‘modules’ like subjacency, control, etc. and memory) provides a fascinating domain of enquiry. The specific objective is to reveal how “the human parsing system decides where to posit gap sites in amongst the pronounced elements as it works through a sentence incrementally” (Hunter et al. 2019: 1).24 Whether it is a filler or a gap that must be kept in store depends on the type of empty category we are dealing with. With PRO gaps no obvious displaced fillers are in sight, so the parser has to embark on a gap-driven search for a filler. We examined two major options: locality/recency (‘find the most recent filler’) and lexical information (‘use the control information of the main predicate to postulate the right kind of gap, either subject-filled or object-filled’). The main conclusion we reached is that the parser uses lexical information if that information is reliable and readily available. In the case of verb control, both conditions are met. Notice that this solution reveals important properties of the parsing system that recur in this book: instead of waiting for solid information, the parser uses proactiveness and prediction to anticipate needs, so that when these arise, a resolution is already at hand.
It is crucial to realize that this way of resolving a processing problem is, in fact, also a matter of recency: in a segment like John promised Mary to …, the first source of information that can be used to launch a prediction is the verb promise, so the parser does not, in fact, wait for the gap to become a reality. In this sense, this kind of gap-driven parsing is not so much gap driven as lexically driven. The lexical platform that launches such proactive behaviour is the same platform that launches the basic skeletal structure of the clause: predicate-argument structure. This is another way of saying that, when it comes to PRO, the parser and the grammar are fully aligned, since the grammar of control is indeed lexically controlled. With obvious displaced fillers in sight, as in the case of wh-questions and relatives, the parser cannot predict, since the point at which a gap containing something like a trace will show up in the ongoing stream of words is purely arbitrary. This has interesting repercussions. In fact, when unable to predict, the parser becomes wildly proactive and postulates trace gaps not only in the first available position but also in the first minimally available position, such as the direct object position after a verb that is typically intransitive. And, if work on ‘hyperactive gap-filling’ is confirmed, it may even do so in direct object positions after purely intransitive verbs. We know all this because of the interesting phenomenon known as the filled-gap effect. The FGE reveals the weight of the memory problem for the human parser: when a filler is recognized, nothing seems more urgent than to get rid of it. Again, such proactiveness is recency driven: the first available signal to launch any kind of strategy is the sudden appearance of a wh-filler. This, therefore, is true filler-driven parsing. Notice that both the problems with the Direct Association Hypothesis and the finding that parsers postulate object gaps even after marginally transitive (or purely intransitive) verbs now show that verbs may be crucial in general, but a little less so in the specific case of wh-gap filling. It is important not to lose sight of the fact that in this type of gap-filling the filler precedes the verb, often by quite a long stretch. The parser is simply too proactive to wait, even for the head of the overall structure. When we discussed memory, we showed how in Japanese, a head-final language, proactiveness in the shape of sustained negativities launched at the appearance of pre-verbal content was also visible in questions without wh-movement. Something similar happens when scrambled word orders present themselves. It seems clear that the processing system uses the SVO, SOV, etc. general language-type template as a map to navigate dependencies, and that it penalizes deviations from it. Movement – unannounced or not – does take a toll in processing, in a nice convergence with various general postulates of linguistic theory. It is interesting to see that toll being taken even when constructions containing displaced constituents with clear case marking are pitted against other constructions with preferred word orders: the former are harder to process, despite the clear marking. Equally interesting is that, when scrambled orders are pitted against inherently complex constructions (impersonals, se-passives), the scrambled orders still prove more problematic.
In this sense, it is reassuring to see that current sophisticated research seems to confirm the existence of something like the old notion of preferred, basic ‘sentoids’ (Fodor et al. 1974; see Chapter 5).

4.7 Epilogue, or when gaps are too radical and reference must be explicit: anaphor resolution
As noted at the beginning of this chapter, gaps typically arise when the information that their position in the structure elicits is uncontroversial, that is, maximally recoverable. For instance, in (60) and (61) below:
(60) John tried that day [PRO] to stay in the area as much as he could
(61) John tried that day to stay in the area as much as he could, [pro] said he was worried that help should be needed,
there is no trouble understanding that the agent of both staying in (60) and saying in (61) is John. This situation corresponds to the maximal degree of recoverability on, for instance, the classic Accessibility Hierarchy of Ariel (1990). This notion of accessibility refers to memory storage. Under the assumption that the mental representations of referential expressions are accessible to comprehenders in varying degrees, the idea is that speakers choose between different forms of referring expressions (full NPs, stressed pronouns, unstressed pronouns, gaps, etc.) so as to ease their recoverability for the addressee’s convenience. Definite descriptions mark low accessibility; demonstratives and pronouns mark relatively higher degrees of it; gaps score the highest. This section is meant to provide some brief information on what is usually known as anaphor resolution. The reason I talk about pronouns now is that, apart from reflecting the same accessibility logic as gaps, studies on anaphor resolution constitute a well-established area of research in psycholinguistics. The reason I present it only as a brief epilogue is that the area in question deals more with cross-clausal phenomena, which are not covered in this book, than with within-sentence processes. Pronouns do, in fact, offer a continuum that blurs the distinction between cross-sentence and within-sentence elements, depending on the type of pronominal form we are talking about. For instance, reflexives normally require their antecedents to be in the same clause (hence the ungrammaticality of (63)), and the notion of c-command is usually invoked as an accompanying structural constraint; as (64)–(65) show, other pronouns behave quite differently in that they are very strongly associated with antecedents in other clauses (out of the purview of c-command, therefore):
(62) John and Mary must really love themselves
(63) *John and Mary told Jean to love themselves
(64) *[John and Mary]ᵢ must really love themᵢ
(65) [John and Mary]ᵢ must really love themⱼ
Both the grammatical behaviour and the processing of these pronominal elements indicate that they are more morphosyntactically sensitive inside the sentence structure and more semantically (and even pragmatically) sensitive when they span cross-clausal relations. Again, reflexive pronouns are a good example.
In Chapter 3, when we dealt with the phenomenon of attraction (agreement errors), we noted that attraction errors caused by semantic interference are a cross-linguistically solid finding. However, such errors with reflexives are extremely rare (more on this in Chapter 5). In an eye-tracking study, Dillon et al. (2013) used closely matched materials, such as (66)–(67) below, and found that the plural verb were in (66) showed typical attraction effects from the plural distractor (managers), but that the plural reflexive themselves in (67) did not:
(66) The new executive who oversaw the middle manager(s) apparently was | *were dishonest about the company’s profits.
(67) The new executive who oversaw the middle manager(s) apparently doubted himself | *themselves on most major decisions.
This is a priori surprising, given that the retrieval process at work at was and at himself targets the same structural position (the subject) and the same feature (number). In an ERP study with similar manipulations, Xiang et al. (2009) observed that the reflexive condition elicited strong P600 effects whose modulation was unaffected by the presence of distractors (meaning that the processing system, clearly ‘alerted’ by the breach in the expected co-variation of form, could not be confused or fooled; Traxler 2014; see section 5.4 for a qualification of these findings in light of more recent results from Parker & Phillips 2017). Note that, among other things, reflexives are not easily anticipated on the basis of previous predictions (Levy 2008a), so a sort of blind reliance on structural form would appear to be functional. That is not the case with other kinds of pronouns. A classic example is that of so-called conceptual anaphors. In principle, formal cues, such as gender and number markings, generally facilitate antecedent-pronoun coindexation (Garnham & Oakhill 1985 and much research since). Thus, for instance, when English pronouns are clearly marked (e.g., him for masculine referents, her for feminine ones), comprehenders typically identify their antecedents faster and read sentences containing them more quickly than otherwise. However, Gernsbacher (1991) demonstrated that conceptual anaphors do not behave like this: even in the presence of clear formal cues, the semantically motivated choice of form (e.g., reflecting underlying numerosity or an underlying sex distinction) is often preferred (Acuña-Fariña 2009). Thus, although English pronouns must agree with their antecedents in number, pronouns sometimes quite naturally violate this constraint, as in ‘I think I’ll order a frozen margarita. I just love them’. Gernsbacher used three types of antecedents: collective expressions, such as a football team; generic types, as in a book in general (as opposed to a particular book specifically mentioned in a certain context); and multiple item/event nouns that refer to items that everybody is supposed to have many instances of, such as a plate or a fork.
For instance, in
(68) After college, my sister went to work for IBM. They/it made her a very good offer,
IBM is a singular noun, but continuations with they were preferred to (read faster than) continuations with it (see Carreiras & Gernsbacher 1992 for the same findings in Spanish). Here are some other (real) examples that Gernsbacher herself mentioned in her pioneering paper:
(69) (talking about the results of an exam)
a. My roommate was so excited. She actually made an A.
b. She doesn’t make them very often.
(70) a. I can’t believe you drive a Fiat.
b. Why is that?
a. They’re so temperamental.
(71) a. I need to call the garage.
b. They said they’d have it ready by five o’clock, but I bet they won’t.


Another classic area where the form of pronouns does not follow the typical co-variation pattern, as a result of conceptual interfacing, is the realm of gender stereotypes. In a self-paced reading study, Carreiras et al. (1996) showed that comprehenders used pragmatic knowledge to resolve pronoun-antecedent ties of this kind (preferentially associating nouns like footballer with masculine pronouns and nouns like nurse with feminine ones). In fact, an early ERP study by Osterhout et al. (1997) showed that violations of stereotypical gender agreement yielded the same error detection signal as grammatical gender violations: the P600 effect normally attributed to syntactic reprocessing (Acuña-Fariña 2009; see also Molinaro et al. 2016). So we see that, when structural distance intervenes between antecedents and pronominal forms (that is, when these appear in different clauses), anaphor resolution resorts to meaning, not to form, or not only to form (unlike reflexive resolution). Anaphors are interesting because we humans do not seem too bothered (at the conscious level at least) by the ambiguity they very often induce. For instance, in Jane told Mary that she needs to have more space, she can refer to either Jane or Mary (this is, of course, a much bigger problem for machines, that is, for artificial intelligence). Research on anaphor resolution has established that the key issue here is salience, in particular the salience or prominence of the antecedent referent. The more salient that referent is, the more reduced the anaphoric form is likely to be (Wolna et al. 2022). So the main goal of theories of anaphor resolution has been to pin down what salience really is. For some, salience is to be defined in terms of accessibility in memory, directly, as in Accessibility Theory (Ariel 1990). On this view, null pronouns will refer to more accessible antecedents than overt pronouns because they are less informative, less rigid, and more attenuated.
For others, it is more a matter of the syntactic position that the antecedent occupies in its sentence (Carminati 2005). On this view, null pronouns will point back to subject antecedents, whereas phonetically realized pronouns will choose object NPs instead. Other views emphasize the overall coherence of the situations being linguistically described and the probabilistic expectations that these afford (Kehler & Rohde 2013). This research is typically conducted via a violation paradigm that analyses the consequences of contradicting the preferred pronoun-antecedent match (that is, having a pronoun corefer with an antecedent other than the one it typically prefers). There are other theories still, but it is worth keeping in mind that this is an area where a great number of possibly interacting factors (the accessibility of the antecedent, its syntactic function, its topical durability, the form of the pronominal expression, etc.) may be at play to override typical biases. For instance, even though we said above that typical reflexive-antecedent ties are clause-bound, formally regulated and rather impervious to semantic interfacing, in three eye-tracking experiments Parker and Phillips (2017) found that even reflexives display intrusion effects (partly from semantics) when the reflexive and the subject mismatch in multiple cues. The authors compared conditions where the subject and the reflexive mismatched in one feature (gender or number) or in two features (animacy + gender, animacy + number, and gender + number; see section 5.4); in the latter condition they did register attraction effects. The fact that an animacy manipulation (together with the grammatical mismatch) worked indicates that, given the right coalition of forces, even reflexives are susceptible to semantic effects. Not only can phenomena that are ordinarily regulated morphosyntactically (e.g., reflexives) be affected by a semantic dimension; pronoun-antecedent ties that are ordinarily regulated semantically (as in ordinary cross-clausal pronominal reference) can also be determined morphosyntactically. Take pronoun-antecedent ties involving epicenes such as vittima (‘victim’) and personaggio (‘character’) in Italian (section 3.2.1.1). Notice that these nouns are clearly gender-marked but, despite that, they can refer to both male and female referents. Thus, la vittima, for instance, is grammatically feminine but may be used to refer to either a female or a male referent. Cacciari et al. (2011) devised materials like those in (72), manipulating the contextual information preceding the antecedent:
(72) La vittima del incidente stradale sbatte’ violentamente la testa contro il finestrino. Lei (lui), percio’, perse molto sangue e svenne.
‘The victim of the car accident violently slammed the head against the window. She (he), therefore, lost a lot of blood and fainted’.
They wanted to find out whether discourse-based gender information prevailed in the earlier phases of pronoun assignment or whether, alternatively, grammatical cues did. So they paired vittima with either a matching pronoun (lei, the feminine form) or a mismatching one (lui, the masculine one).
If the early coindexing of antecedent and pronoun is done using grammatical information, the prediction is that reading times in the pronoun region should be faster for the grammatically matching condition (la vittima … lei; il personaggio … lui), irrespective of contextual biases. That is what they found. Interestingly, in the control condition, which contained non-gendered nouns like erede (‘heir’), the pragmatic manipulation did work, and readers read the pronoun matching the contextually established gender more rapidly (so if the erede was a boy, lui was preferred, and if it was a girl, lei was read faster instead; see Acuña-Fariña 2009 for an interpretation of these findings in terms of agreement and general language type; see also the Bonding and Resolution model of Garrod and Terras (2000), where an initial, form-driven, superficial bonding stage puts pronouns and possible antecedents in touch, and a subsequent resolution stage establishes the final link against the discourse scene). Finally, long-distance ties of the kind we are focusing on here can be handled not only through form biases (feature co-variation) but sometimes also through construction-specific resolution strategies. In English, for instance, backwards anaphora seems to be geometrically – not morphosyntactically – determined (see section 5.3). In an eye-tracking experiment using materials such as (73)–(75) below, Van Gompel and Liversedge (2003) registered a gender mismatch effect at the region immediately following the main clause subject NP (cruelly in the examples below): reading was slowed down there in (74) compared to (73):
(73) Gender match: When he was at the party, the boy cruelly teased the girl during the party games.
(74) Gender mismatch: When he was at the party, the girl cruelly teased the boy during the party games.
(75) Control: When I was at the party, the boy cruelly teased the girl during the party games.
Notice that the form of the pronouns is clear, not ambiguous, and that readers were nevertheless inclined to ignore it and to expect (Levy 2008a) co-indexation to occur at a particular site (the subject position of the main clause). It is in view of the extremely intertwined and varied nature of pronoun-antecedent ties that models capable of flexibility in accounting for disparate findings (and forms of salience) seem so attractive, at least in principle. One such currently fashionable model is the form-specific multiple-constraints framework (Kaiser & Trueswell 2008; Kaiser et al. 2009). In this model, the multiple constraints that guide reference resolution are weighted differently for different referential forms. This approach was originally formulated to accommodate data from Finnish showing that pronouns and demonstratives differ in their sensitivity to the antecedent’s syntactic role and linear position. These differential sensitivities strongly suggest that not all anaphoric forms are equally tuned to the same kinds of information (see Wolna et al. 2022).
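The architectural core of a form-specific weighting scheme can be illustrated with a toy scorer. This is a minimal sketch under loud assumptions: only two cues (syntactic role and recency of mention) are modelled; the class `Candidate`, the `WEIGHTS` table and the functions `score` and `preferred_antecedent` are hypothetical names; and the numbers are invented for illustration, not estimated from Kaiser and Trueswell (2008) or any other study. All the sketch is meant to show is the point made in the text: the same candidates, described by the same cues, can yield different preferred antecedents once each referential form weights those cues differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A potential antecedent described by two of the cues named in the text."""
    name: str
    is_subject: bool      # syntactic role
    mentioned_last: bool  # linear position: the more recent mention


# Hypothetical weights, one vector per referential form. The numbers are
# invented to show the mechanism; they are not estimates from any study.
WEIGHTS = {
    "pronoun": {"is_subject": 2.0, "mentioned_last": 0.5},
    "demonstrative": {"is_subject": -1.0, "mentioned_last": 2.0},
}


def score(form: str, candidate: Candidate) -> float:
    """Weighted sum of the candidate's cues, using the weights for this form."""
    weights = WEIGHTS[form]
    return (weights["is_subject"] * candidate.is_subject
            + weights["mentioned_last"] * candidate.mentioned_last)


def preferred_antecedent(form: str, candidates: list[Candidate]) -> str:
    """Return the name of the highest-scoring candidate for this form."""
    return max(candidates, key=lambda c: score(form, c)).name


# A context modelled loosely on an SVO clause: the subject is mentioned
# first, the object last.
candidates = [
    Candidate("subject referent", is_subject=True, mentioned_last=False),
    Candidate("object referent", is_subject=False, mentioned_last=True),
]
print(preferred_antecedent("pronoun", candidates))        # subject referent
print(preferred_antecedent("demonstrative", candidates))  # object referent
```

A realistic model would estimate its weights from data and include further cues (topicality, the form of the antecedent, and so on). The design choice that matters here is keeping a separate weight vector per referential form, rather than ranking antecedents on a single salience scale shared by all forms.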

Notes
1 As Hestvik et al. (2007: 12–13) observe, the “reason for the increased importance of trace in the Minimalist Program is the multiplication of movements for checking purposes. All finite verbs and all case-bearing NPs will move at some stage in a derivation in order to check their features. This necessitates some means for them to engage in their local grammatical relationships at their base-generated positions”. In fact, in this model, “almost every argument will exist not only overtly but also as a trace, with a chain linking them”. As will soon become clear, we will not be concerned here with this much more sophisticated notion of non-obvious movement; instead, we will focus on those instances where movement is more tangible and more easily acknowledged by linguists of different persuasions.
2 In a probe recognition task, subjects read a sentence self-paced and, at some point after the region where a gap is postulated, are shown a probe word from the antecedent NP. They simply have to respond quickly whether they recognize the probe (i.e., whether they remember seeing it before).


3 Jackendoff (2002) and Culicover and Jackendoff (2005) provide helpful discussion of whether empty elements are a grammatical necessity. See Chomsky and Lasnik (1977) and references therein for an early defence of the view that gaps are not syntactically empty.
4 Pinker (1994: 204): “At first one might wonder whether the sentences are even grammatical. Perhaps we got the rules wrong, and the real rules do not even provide a way for these words to fit together (…). No way; the sentences check out perfectly. A noun phrase can contain a modifying clause; if you can say the rat, you can say the rat that S, where S is a sentence missing an object that modifies the rat. And a sentence like the cat killed X can contain a noun phrase, such as its subject, the cat. So when you say The rat that the cat killed, you have modified a noun phrase with something that in turn contains a noun phrase. With just these two abilities, onion sentences become possible: just modify the noun phrase inside a clause with a modifying clause of its own. The only way to prevent onion sentences would be to claim that the mental grammar defines two different kinds of noun phrase, a kind that can be modified and a kind that can go inside a modifier”.
5 Fodor (1988: 135, 145, and 146) dwells on the reason why interacting gaps considerably complicate the picture by activating issues that relate to several aspects of the submodules of Government, Case theory and Theta theory. She points out that such interactions result in a “flurry of filling activity [that] creates a sort of cloud of dust around the object NP position” (p. 148).
6 Starting with Hornstein (1999), some generative grammarians have tried to conflate the two types of gaps in (25) and (26) by subsuming control phenomena under a raising account (Boeckx et al. 2010). This is the so-called Movement Theory of Control. Evidence against such an account has been provided by, among many others, Landau (2000, 2007) and Culicover and Jackendoff (2001).
7 The predicate beg may be an exception, as at least some speakers may find it possible to understand that in Tom begged me to sing, the subject Tom is in charge of the singing (Boland et al. 1990; Lyngfelt 2009).
8 Notice that the second NP region comes even before the gap, which means that no differences should have been detected there. Betancort et al. argued that this was the result of parafoveal processing. Much research has shown that, while fixating a word with their foveas, readers are often able to ‘see’ the beginning of the next word in their parafoveas (with much less acuity, of course). The parafoveal span may extend to about eight or nine letters to the right of the fixation position (see Rayner 1998 for a review).
9 In the ERP study that the authors referred to in the text (Featherston et al. 2000), Featherston et al. compared NP trace (raising) and PRO, and observed that the raising constructions elicited a significantly stronger P600 effect than their control counterparts. See section 4.4 below.
10 On adjunct control in English structures such as The mother said that the little girl fell asleep after PRO playing in the yard, see, for instance, Parker et al. (2015). On implicit control, as in The ship was sunk PRO to collect the insurance, see McCourt et al. (2016).
11 The author further noted that his results were also compatible with Gibson’s Syntactic Prediction Locality Theory (SPLT; see Gibson 1998), and that the difference between the AFS and the SPLT is that, while the SPLT postulates that a syntactic prediction is maintained in working memory, in AFS terms what is kept in store is the dislocated constituent itself. Recently, a debate has ensued over whether the parser maintains only syntactic category information or also semantic content (Ness & Meltzer-Asscher 2017).
12 Interestingly, the positivities they obtained (P600), associated with the integration of the filler and the gap at the gap site, did not differ between short and long conditions, suggesting that the amplitude of this component has to do with the (different) specific processes of thematic role assignment and/or semantic integration between the verb and the filler. See also Lo and Brennan (2021) on in-situ fillers in Mandarin Chinese for an account, of generative grammar inspiration, that posits covert movement of the filler to a position similar to that of English wh-fillers in questions. This account capitalized on the scope-taking properties of wh-elements associated with particular kinds of verbs to seek evidence of the establishment of “covert wh-dependencies” in processing. If the parser uses a question-selecting verb as a cue for launching an upcoming covert dependency, a sustained negativity should be observable between that verb and the embedded wh-element (a prediction that was disconfirmed in their study, where no P600 emerged either). We have already referred, in section 4.2, to work by Ueno and Kluender (2009) on in-situ wh-questions in Japanese and to the fact that no P600 effect was found in their experiments either (in both cases, with in-situ wh-elements).
13 On studying gap-filling via length manipulations involving increasing separation between fillers and gaps, see Wagers and Phillips (2014).
14 For instance, Aoshima et al. (2004) studied ‘scrambled sentences’ containing left-dislocated dative NPs, and reported a filled-gap effect at a pre-verbal dative object position for the first VP that became available.
15 In section 5.3 we will deal with islands. For now, it may suffice to know that in Generative Grammar an island is a kind of structure from which no extraction of a wh-word is possible. For instance, in Who did you hope that the candidate said that he admired (gap)? the gap shows that we can extract the object phrase after admired and move it to the front (Who). However, in *Who did the candidate read The Times’ article about (gap)? or in *Who did the fact that the candidate supported (gap) upset voters in Florida? we cannot, so the relevant structural domains are islands.
Keshev and Meltzer-Asscher (2017: 550) express this succinctly: “Studies of this process of active dependency formation and its island sensitivity have mainly been conducted in English and have generally assumed that if the parser enters an island environment while holding an open fller-gap dependency, then it has only two hypothetical options: (i) predicting a gap inside the island, which would result in ungrammaticality, or (ii) putting the active search for a gap ‘on hold’ and predicting a gap only when outside the island”. In the context of the relative disregard for the fne aspects of transitivity shown by the parser in Omaki et al.’s (2015) work, it is interesting to observe that in two recent ERP studies Chow et al. (2016) and Chow et al. (2018) report evidence that comprehenders’ initial verb predictions are sensitive to the arguments’ lexical meaning but not to their structural roles, and that predictions based on those roles are real but limp a little behind. By “medium tangibility” I simply mean that, in reference to a canonical SVO schema, it is easy to regard a transitive verb such as hit as requiring an NP object after it and that, when the passive construction is used, that DO slot is simply not flled. Likewise, it seems fairly intuitively apparent that, since (11) does not actually mean that John feels certain of anything, John is, in fact, only the agent argument of leaving, but it is not in the position that naturally accommodates that argument (the lower clause subject), which means that that position is equally unflled. Using the authors’ own paraphrase, in the German materials the word order was ‘the sherif hoped/seemed […] the ofender at last sentence to can’, and the ERPs were time-locked to the onset of the article in the critical NP the ofender. De Vincenzi did apply the MCP to the distinction between unaccusatives like È arrivato Gio (‘Gio arrived’) and unergatives like Ha corso Gio (Gio ran). 
After Belletti (1988), it is customary in the generative tradition to maintain that the subject originates postverbally in the former but preverbally in the latter. De Vincenzi reported faster RTs and higher comprehension accuracy for unaccusatives than for unergatives. See also de Santo (2019) on a minimalist interpretation of these findings. Readers well versed in generative grammar know that this view has not necessarily reflected the state of the art in that framework since at least Koopman and Sportiche (1991), as it is now widely assumed that all subjects are base-generated inside the VP. Even so, for modern generative linguists the two structures still differ in terms of
the empty category in the preverbal position: a discourse-linked referential empty pro or an expletive (dummy) element that signals that that position is thematically empty. See Fanselow, Schlesewsky et al. (1999). See also Rizzi (1996) on empty pronominals.

22 Note that of the two clauses of the MCP (1: Avoid postulating unnecessary chain members at S-structure; 2: but do not delay required chain members), the second one is actually shared by the two principles.

23 The authors note (p. 783) that, over and above the assumed greater complexity of passives versus actives, the Spanish se-passive is particularly “complex because among other reasons, (a) it has postverbal subjects that look like active objects [note that in fact el tobillo IS an object in (58)]; (b) the subject position remains therefore empty in S-structure; (c) subjects of se passives display the typical properties of unaccusatives (including subject-postposition and freedom to dispense with determiners), which testify to their “deep” object status (Burzio 1981; Levin & Rappaport-Hovav 1995)”. As for impersonals, “they embody the strange grammaticalization of an a priori ungrammatical string (Blevins 2003): one with no subject, full or dummy” (Otero 1986).

24 Obviously, the account of gap-filling operations that we have provided here reflects a heavy research bias towards comprehension dynamics and practically ignores the needs of production dynamics, about which much less is known. It is conceivable that a certain filler-gap operation may be ‘penalized’ in comprehension and yet still result from some sort of gain on the production side.

5 ON PARSERS AND GRAMMARS

DOI: 10.4324/9781003405634-5

5.1 Introduction: on psychological adequacy

Ever since Chomsky proposed the idea of syntactic autonomy (of colorless green ideas fame), linguists have struggled to isolate what it is precisely that grammar is vis-à-vis semantics/conceptual structure. As any practising linguist knows, some sixty years on and several linguistic wars fought, the quest is still on. A seemingly similar need to narrow down the scope of grammar affects the distinction between grammar and processor (or competence and performance): do sleeping colorless green ideas and horses raced past barnyards really talk to each other? Less metaphorically: do grammatical derivations and processing dynamics reflect different snapshots of the same process (‘the language process’), or do they really tap into two separate cognitive systems, the purely linguistic versus the implementational? A prerequisite to answering that question is an answer to another question, namely, whether linguists really need to care, that is, whether they need to do their job within the confines set by the notion of psychological adequacy. Jackendoff (2003: 652) sees “methodological convenience” behind what Poeppel and Embick (2005) have aptly dubbed “interdisciplinary cross-sterilization”, that is, the failure of linguistics and psycholinguistics to communicate because each deals with competence or performance separately (see also Ferreira 2005: 368–370; Lewis & Phillips 2015; Acuña-Fariña 2016):

More controversial has been an important distinction made in Aspects between the study of competence – a speaker’s f-knowledge of language – and performance, the actual processes (viewed computationally or neurally) taking place in the mind/brain that put this f-knowledge to use in speaking and understanding sentences. I think the original impulse behind the distinction was methodological convenience. A competence theory permits
linguists to do what they have always done, namely, study phenomena like Polish case marking and Turkish vowel harmony, without worrying too much about how the brain actually processes them. Unfortunately, in response to criticism from many different quarters (especially in response to the collapse of the derivational theory of complexity as detailed in Fodor et al. 1974), linguists tended to harden the distinction into a firewall: competence theories were taken to be immune to evidence from performance. So began a gulf between linguistics and the rest of cognitive science that has persisted until the present.

One suspects that the DTC episode has had too much influence in the history of THE field (by which is meant cognitive science in general, thus encompassing both linguistics and psycholinguistics). And there is reason to believe that the idea of a total ‘fiasco’ has been overplayed. In fact, the DTC was simply a “rather optimistic linking hypothesis” that “relates the length of transformational derivations to the perceptual complexity of a sentence” (Phillips 2013: 296). It most probably originates in the following observation of Miller and Chomsky (1963):

The psychological plausibility of a transformational model of the language user would be strengthened, of course, if it could be shown that our performance on tasks requiring an appreciation of the structure of transformed sentences is some function of the nature, number, and complexity of the grammatical transformations involved. (Miller & Chomsky 1963: 481)

As Phillips notes, the DTC results only showed, narrowly, that one particular transformational grammar was psychologically wrong. However, the deeper claim was rather that mental computations should take a time that is proportional to the difficulty of the structure that is computed, and that this should be observable in tasks that implement those computations. This has remained a standard assumption of psycholinguistic research to the present day.
In fact, we know it to be basically true: remember the difficulty of object relatives versus subject relatives mentioned in chapter 4, for instance, and its connection with speakers’ individual working memory profiles. Beyond this broad fact and the even broader fact that, as far as we can tell, language IS a product of the human mind/brain, there are now good reasons why studying mental linguistic representations online bears fruit. The first reason is rather practical and it takes us back to the idea of narrowing down the scope of grammar. We know that this narrowing down has been taking place inexorably in the last forty years: a myriad of transformations have been discarded, the lexical component greatly augmented, ‘minimalist’ derivations proposed … A constant leitmotif throughout this process has been to eliminate phenomena from the grammar and relegate them to the ‘interfaces’. There are two reasons why stopping just there is not well advised. The first is that invoking
interfaces is of no use unless we bother to look into them in some detail; otherwise, they become another convenient waste bag (more methodological convenience …) or a classic episode of throwing the baby out with the bathwater. The second is that competence theories are theories of well-formedness, and well-formedness involves grammaticality judgements that clearly interact with constraints that derive from limitations on language processing (Phillips 2006; Acuña-Fariña 2022). In fact, at least some of the mechanisms that have been used to argue for various complex syntactic theories may well originate in other cognitive domains. This exempts linguists from having to account for them, which in turn is likely to make whatever they still need to account for a more realistic job (Phillips 2013). This applies straightforwardly to a type of sentence that language users often deem unacceptable because it strains the language processing system in some way, usually because of the type and/or the amount of memory needed to process it. In section 5.3 we will discuss the classic case of island phenomena. Another frequently voiced idea is that there is an inherent disconnect between online incremental structure-building and time-unconstrained grammatical structure-building, because the former lacks grammatical precision (Townsend & Bever 2001; Phillips 2006). This is because actual performance takes place in less than ideal conditions, with all kinds of possible interference (ambient noise, distractions, etc.) and sheer time pressure, often leading to mistakes. This argument is only partly valid (Acuña-Fariña 2022). In the first place, we really do not know how much better language-for-thought is compared to observable language, so we run the risk of idealising the unseen.
But more importantly, we do know today that there is a very large degree of isomorphism between the results of classic grammaticality judgements and the mind/brain’s reaction to the linguistic structures involved in those judgements. We can measure that with exquisite precision, in fact. In the last thirty years or so, ERP research, in particular, has provided a wealth of information tracing the onset of ungrammaticality to a few well-known brain signals. With standard English as a baseline, an incorrect sentence such as the children is coming tomorrow will surely produce a Left Anterior Negativity (LAN) some 400 ms after the onset of the word is. Another one like Tom was eating the pencils and … will most likely result in an N400, also some 400 ms after the onset of pencils. A garden path induced by a structure like the cotton clothing is made of is too coarse … will surely produce a P600 brainwave after the second is, starting at around 550–600 ms and extending up to almost 1000 ms. All these processes may be the focus of much debate and specialized research, but the basic core fact (or at least a basic core fact) is often ignored, namely that the brain ordinarily detects breaches of grammaticality with amazing precision (Hagoort et al. 2003; Friederici & Weissenborn 2007; Molinaro et al. 2011, among many others). Not only that, but it very often actually refuses to build illegal representations even in the face of challenging processing demands that would make those violations functional (Stowe 1986; Nicol & Swinney 1989; Wagers & Phillips 2009; see section 5.3).

Finally, there seems to be an inherent contradiction between dissociating oneself completely from processing considerations and expressing, and even shaping, competence theories that seem strangely couched in processing terms. Consider, for instance, the widely supported idea in formal grammar circles that probe–goal computations (e.g., controllers and targets in agreement) must be local “in order to minimise search” (Chomsky 2001: 13). Why worry about the cost of search if there are no time constraints? Consider also the assertion that in order to alleviate the “computational burden” derivations should take place in cyclic phases (Chomsky 2000: 9), and that “phases must be as small as possible, to minimise memory” (Chomsky 2001: 14; on ‘real’ phases, see Tanaka 2019). It is hard to see why “memory” and “computations” are mentioned at all in these passages. The Fodorian logic is clear, however (Acuña-Fariña 2012): only after syntactic processes are finalized can the phonological and semantic components start doing their work, the main reason being one of computational cost: avoiding long searches in memory and avoiding mixing computations that originate in hypothetically distinct and encapsulated domains. The opposite of this orderly serial and modular determinism is an interactive system with massive parallelism. These are precisely the kinds of claims one can pursue in a lab. Thus, for instance, if a probe-goal computation is insensitive to a semantic manipulation at an early stage of processing, then the serial, modular architecture gains support. And vice versa … A sceptic might point out that the previous comments confuse real-time implemented memory systems (e.g., the fact that human memory is content-addressable whereas that of modern computers is location-addressable) with the abstract algorithmic/computational memory system of the theory of computation.
Phrases like “computational burden”, “minimize search” and “minimize memory” might thus be taken in the context of computational complexity, e.g., the amount of resources (time/memory) needed to compute, say, whether a derivation converges (the output) given any numeration (the input). These computational resources might, however, be implementation-independent. I shall only add that I am not entirely convinced by this argument. In the remainder of this chapter, we will examine the relationship between linguistics and psycholinguistics by asking whether theories of grammar and language processing models describe separate cognitive systems or just one system seen from different (time?) perspectives (see Acuña-Fariña 2022 for many of the points tackled here). I shall refer to these two opposite views as the Separate Grammar Hypothesis (SGH) and the Grammatical Parser Hypothesis (GPH). First, in section 5.2 the SGH will be evaluated by bringing into consideration the general idea that a grammar is a static body of knowledge, whereas the language processing system is a set of heuristic procedures optimized for dealing with the particular demands of comprehension and production. These demands often involve having to cope with such extralinguistic constraints as memory architectures and memory limitations, statistical biases of various kinds or potentially very constraining conceptual attractors. Then in section 5.3 the GPH will be considered, paying special attention to the question of islands and various
binding phenomena (the relationship between proforms and their antecedents). Section 5.4 will focus on illusions, that is, cases where the parser seems to be fooled by certain properties of the encountered stimuli and reacts by producing apparently ungrammatical representations. As Lewis and Phillips (2015) note, the most important challenge for the SGH is to explain how the language processing system and the grammar system manage to interact so efficiently. Conversely, the most important challenge for the GPH is to account for situations in which representations are formed that are not licensed by the grammar (such as illusions). Notice that a time issue permeates this discussion in two ways: first, because it is habitually assumed that grammar systems become visible in offline data, whereas processing systems relate to online records; second, because even models that subscribe to a two-system view may differ on whether grammar is used first, before heuristics, or the other way around.1

5.2 The Separate Grammar Hypothesis: heuristics and good-enough, goal-directed, predictive processing

By the early 1970s the idea that transformations were used to recover a deep structure online had been abandoned, but the different and more realistic idea that a determinate surface syntactic structure guided online sentence processing remained in place. The question was how to specify the way in which the internal structure of sentences was actually derived from their external form. Fodor and Garrett (1966) and Bever (1970) came up with the idea of perceptual strategies. Bever’s lengthy publication, in particular, became a turning point in the world of cognitive science, and many forms of research up to the present day are directly based on it. In addition to the notion of strategies, as Tanenhaus (2013: 410) notes, a whole community of researchers today takes as axiomatic “Bever’s claim that how language is produced, comprehended, and learned shapes language structure, and thus must lie at the heart and not the periphery of understanding linguistic structure”. In this section we develop the idea that the relation between grammar and processing is abstract, not direct, and that grammatical rules are a back-up system that enters the scene only after some kind of perceptual strategies/surface schemata/rough-and-ready templates/quick-and-nasty pseudo-grammars/analyses-by-synthesis/independent semantic compositions/good-enough parses and lossy channels have done their job. We now flesh out the basic philosophy behind the invocation of perceptual strategies. The key idea here is that the link between internal constituent structure and external input sentences is an analysis by synthesis. This is initially conceived as taking place in two stages.
During the first stage, automatic perceptual strategies developed through experience with the language are applied to provide a rough who-did-what-to-whom type of parse that is particularly sensitive to statistical regularities and strong semantic associations (Levy 2008a). Thus, in a message involving a hunter, a deer and the verb shoot, an unavoidable thematic structure involving a hunter shooting at a deer will deploy itself with only
minimal bottom-up activation, as it were, even though a colorless-green-ideas type of scenario where the deer shoots at the hunter is expressible derivationally. This first stage is thus what the literature has variously termed a dirty analysis, a primitive syntax or a quick-and-nasty pseudo-grammar. Bever (2009: 293) refers to it as a gestalt-based operation. The foggy initial parse is then synthesized by a second parsing phase that brings in fine grammatical knowledge and checks that both types of parse are compatible. In Bever’s early proposal, in particular, this second stage was ‘derivational’. Today we understand it simply to mean grammar (of whatever formal kind). The second stage is needed to ensure that the initial interpretation is correct, and, in any case, having a kind of symbolic syntax means that we are not consigned to dealing only with frequent, highly predictable constructions and very constraining local contexts. Derivational grammars are truly creative, after all.2 Townsend and Bever (2001: 184) note that psychological evidence of this second stage is hard to find, but suggest that it can be glimpsed in cases like the famous illusion we all experience when reading the sentence More people have been to Russia than I have. Its semblance of well-formedness, despite its ungrammaticality and its ultimate utter unintelligibility, is, they suggest, the irrepressible result of applying superficial ready-made templates (see section 5.4 below). The idea of the two stages soon gave way to the motto we understand everything twice, which is usually how this broad model of parsing is informally known in the specialized literature nowadays. After Townsend and Bever (2001), this view is also captured in the label Late Assignment of Syntax Theory (LAST); hence the talk of LAST models that share this basic philosophy too. A third concurrent motto is semantics proposes and syntax disposes.
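The two-stage logic can be made concrete with a deliberately toy sketch in Python. It is not Bever's or Townsend and Bever's implementation, and no such implementation is described here. The lexicon, the tag set, the function names and the single passive rule that stands in for "fine grammatical knowledge" are all hypothetical simplifications. Stage 1 applies a Noun-Verb-Noun template in the spirit of Bever's Strategy D. Stage 2 overrides that template only when it recognises a full passive. The two outputs are then checked for compatibility.

```python
# Toy sketch of "we understand everything twice" (hypothetical, illustrative only).
# Hand-tagged lexicon (an assumption): N = noun, V = verb, others are function words.
LEXICON = {
    "the": "DET", "was": "AUX", "by": "P",
    "dog": "N", "man": "N", "hunter": "N", "deer": "N",
    "bit": "V", "bitten": "V", "shot": "V",
}


def tag(sentence):
    """Attach a toy part-of-speech tag to each lower-cased word ('?' if unknown)."""
    return [(w, LEXICON.get(w, "?")) for w in sentence.lower().split()]


def first_pass_nvn(tagged):
    """Stage 1 (heuristic gist): map the first Noun-Verb-Noun sequence onto
    actor-action-object, skipping function words, roughly as in Strategy D."""
    content = [(w, t) for w, t in tagged if t in ("N", "V")]
    for i in range(len(content) - 2):
        (n1, t1), (v, t2), (n2, t3) = content[i:i + 3]
        if (t1, t2, t3) == ("N", "V", "N"):
            return {"actor": n1, "action": v, "object": n2}
    return None  # no NVN template found


def second_pass_syntax(tagged):
    """Stage 2 (stand-in for grammar): only one rule is modelled, the full
    passive 'N was V by N', in which the by-phrase noun is the actor.
    Anything else is treated as an active clause that confirms stage 1."""
    words = [w for w, _ in tagged]
    nouns = [w for w, t in tagged if t == "N"]
    verbs = [w for w, t in tagged if t == "V"]
    if "was" in words and "by" in words and len(nouns) == 2 and len(verbs) == 1:
        return {"actor": nouns[1], "action": verbs[0], "object": nouns[0]}
    return first_pass_nvn(tagged)


def understand_twice(sentence):
    """Return the heuristic gist, the syntactic parse and whether they agree."""
    tagged = tag(sentence)
    gist = first_pass_nvn(tagged)
    parse = second_pass_syntax(tagged)
    return gist, parse, gist == parse


if __name__ == "__main__":
    for s in ["The hunter shot the deer", "The dog was bitten by the man"]:
        gist, parse, agree = understand_twice(s)
        print(s, "| gist:", gist, "| parse:", parse, "| compatible:", agree)
```

For the active sentence, both passes agree. For the passive, the NVN gist makes the dog the actor, and only the second pass recovers that the man did the biting. Within this toy, a Good-Enough reading of Ferreira's (2003) misinterpretations would correspond to the second pass's output being lost or ignored.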
Note that, as Franck (2016) observes, the two-stage idea is a hallmark of the modular approach to language processing too. Interestingly, however, on the SGH account what comes first, at least in early proposals, is not an inexcusable module deploying finely articulated, symbolic grammar, but a set of non-modular, experientially based, superficial skeletal templates or heuristics, that is, precisely the other way around. The view that general cognitive constraints lead the parsing mechanism came as a shock to a community of researchers that had come to take strongly modular and syntactocentric generative assumptions for granted. Bever thus forced a debate in a field that was battling its own generative war. He was so bold as to ask what linguistics was a science of. After all, linguists wield grammaticality judgements to construct their competence theories, but grammaticality judgements are themselves the output of performance processes. As the DTC ‘fiasco’ was unfolding, many linguists saw this observation as the last straw and resolutely gave up on the idea that linguistics and psycholinguistics could ever be seen as sharing a common interest (namely, the study of the nature of language):

The question that arises is: What is the Science of Linguistics a Science of? Linguistic intuitions do not necessarily directly reflect the structure of a language, yet such intuitions are the basic data the linguist uses to verify
his grammar. This fact could raise serious doubts as to whether linguistic science is about anything at all, since the nature of the source of its data is so obscure. However, this obscurity is characteristic of every exploration of human behavior. Rather than rejecting linguistic study, we should pursue the course typical of most psychological sciences; give up the belief in an “absolute” intuition about sentences and study the laws of the intuitional process itself. (Bever 1970: 346; emphasis added)

In fact, Bever did not renounce grammar. He simply opted for a model in which grammar and processing are not the same thing, and in which processing is not a slave to (in any case real) syntactic representations, nor the other way around. In a way, by relying on the first experimental investigations of language processing, he managed to bring home the idea that the early generative conception of grammar was blown out of proportion and that some delimitation was needed. This was one of the earliest calls in that direction. Insofar as it served to refine what grammar must be left to deal with, it proved that going into the labs was actually quite beneficial from the very beginning. The important issue is that there was a conundrum to solve (Bever 2009; 2013: 388). The natural computational domain of a grammatical derivation is the sentence, since derivations are internally hierarchical or “vertical”, starting at the C node. Its immediate processing, by contrast, is done incrementally, word by word or at most phrase by phrase (“horizontally”, that is, left to right; Phillips 2003). The idea behind we process everything twice, a sort of hybrid model, is thus a move of desperation, “inelegant” or even “brutal”, in that:

it solves the conundrum (…) by fiat – sentence processing is both fast and complex because it is simultaneously handled by two systems, one fast and sometimes wrong, one slower but ultimately correct.
This is an inelegant solution to the conundrum, but shows that humans may solve it, albeit inelegantly. (Bever 2009: 288)

Bever did strongly argue that language acquisition and performance shape grammar. However, he has always suggested that an independent level of grammatical representation is nonetheless necessary, also during language acquisition, in order to mediate comprehension and production processes.3 With acquisition in mind, both initially and more recently (e.g., Bever 2009, 2013; see also Townsend & Bever 2001), he has made the point that sophisticated grammars of the kind we are used to seeing in generative circles are simply unlearnable. This has solidified into a series of theoretical ramifications that converge on the widely felt need to propose a simpler “surfacey” syntax and more psychologically friendly approaches to the conundrum (Jackendoff 2002, 2015; Culicover & Jackendoff 2005; in Phillips 2003 ‘simplicity’ was achieved by envisaging a kind of flexible
constituency that is continuously affected and changed as the incremental generation of structure occurs in real time). An interesting observation in the context of acquisition via actual performance and of more realistic grammars is the idea of “rule conspiracies” (Bever 2009: 289). This refers to sentences that share the same surface form despite the heterogeneity of their underlying thematic structures. For instance, tough movement as in (1) and control as in (2) are surface-identical but have obvious differences in ‘underlying structure’ (note that the boy is not the ‘logical’ subject of was easy in (1) but the ‘logical’ object of push instead; that is, the boy was not easy, what was easy was to push him: [pushing the boy] was easy): (1) The boy was easy to push. (2) The boy was eager to push. However, both (1) and (2) share a very canonical form involving a subject preceding an inflected verb. In (3), arguably, both control and raising are fused into yet a third type of structure that shares yet again the same surface form (as, loosely speaking, The Atlantic is an argument of being cold and also of swimming in): (3) The Atlantic is too cold to swim in (gap). The form of passive structures like the city was attacked, which arguably contains an NP-trace (or its equivalent) after the theme argument of the verb is topicalized and turned into a matrix clause subject, is similarly analogized to simpler templates like the ‘NP BE PREDICATE ADJECTIVE’ schema (as in the city was sophisticated). Thus, the notion of a rule conspiracy reflects constraints on derivations such that they have the same general surface form regardless of differences in logical form or semantic relations. This is despite the fact that each underlying form could be reflected in a unique surface sequence or signalled by a specific marker.
On our interpretation, such computationally possible languages would be allowed by generative architectures, but are not learnable: they would make it hard for the language-learning child to develop a statistically based pattern that it can internalize and use for further stages of acquisition. The canonical form (…) facilitates the discovery of a surface template based on statistical dominance of the pattern. (Bever 2009: 290; emphasis added)

Perceptual strategies may thus differ from language to language depending on the statistical regularities that dominate in each (Slobin & Bever 1982; Vasishth et al. 2010; Futrell et al. 2020). Note that under this kind of approach, in the development of one’s linguistic abilities, acquisition and adult processing follow
a similar ordering of phases: first comes the formation of quick and nasty gestalts, then that of sophisticated hierarchical structure. But the crucial aspect is that the latter is shaped by the constraints affecting the former, instead of coming from outside (from ‘the unknown’). An interesting parallel provided by Bever early on (Bever 1970) was between language and mathematical calculations, particularly judgements of relative numerosity. It turns out that we humans have a very intuitive apprehension of numerosity that comes to the fore when we are asked to judge which row has more circles in arrays like (4) below. We may pause to count the circles or judge from the generalization that a larger row contains more units:

(4) o o o o o o
oooooo

It appears that children juggle these two modes of analysis very early on, even before they know anything about explicit counting. The intuitive picture-based numerosity calculation is basically the same form of Gestalt processing as is suggested for dealing with language during the first stage of we-process-everything-twice. Although they originate in very different theoretical philosophies, perceptual strategies look a little like the streamlined syntactic templates of Construction Grammar, at least in that whole syntactic templates are given representational reality, instead of being seen as epiphenomenal, that is, the result of the application of rules (Goldberg 1995, 2006, 2019). As in Construction Grammar, they come in the guise of form/meaning packages (symbols) and have the relative merit of being immediately intuitively attractive.4 Here are a few (from the original 1970 paper by Bever):
Strategy A: Segment together any sequence X… Y, in which the members could be related by primary internal structural relations, “actor action object … modifier.”
Strategy B: The first N… V… (N)… clause (isolated by Strategy A) is the main clause, unless the verb is marked as subordinate.
Strategy C: Constituents are functionally related internally according to semantic constraints. For example, the three lexical items ‘man,’ ‘eats,’ and ‘cookie’ are internally related, as in ‘The man eats the cookie.’
Strategy D: Any Noun–Verb–Noun (NVN) sequence within a potential internal unit in the surface structure corresponds to “actor–action–object.”

Early work was in line with these superficial heuristics. For instance, Schlesinger (1966; cited in Bever 1970) showed that sentences like (5) were easier to process than those like (6). This was presumably because the semantic relations holding between the set of potential subjects and predicates were more constrained in the former. Notice the stark difference in animacy and volitional control between the referential phrases in the two, which makes associative information (Bever 2013) more easily deployable in the first member of the pair: (5) The question the girl the lion bit answered was complex. (6) The lion the dog the monkey chased bit died. Clark and Clark (1968) showed that isomorphism between the surface order of a complex sentence and the actual order of the events described in it makes such sentences easier to remember. For instance, (7) is easier than (8): (7) He spoke before he left. (8) He left after he spoke. Initial investigations also showed that sometimes semantic constraints are so powerful as to cause syntactic factors to be completely ignored (Slobin 1966). Various forms of modern work from different corners of the experimental world are at least compatible with the basic philosophy of the SGH. At a lexical level, for instance, evidence for underspecified representations seems to indicate that the processing system delays the interpretation of words with multiple senses (say, football as a game or as an object for playing the game). It does so by initially activating a vague conceptualization and then specifying it once context provides further detail (see Frisson 2009). At the other extreme of processing units, work by Otero and Kintsch (1992) and Albrecht and O’Brien (1993; see also Sanford et al. 2006), for instance, has shown that readers are sometimes incapable of detecting contradictions in texts.
In fact, if the initial conceptual world that has been conjured up is sufficiently constraining semantically, they may even be incapable of updating or revising initial interpretations when later information becomes available. With sentence processing in mind, a strand of work led by Fernanda Ferreira is now widely cited as showing that sometimes readers do not fully recover from very obvious syntactic garden paths (Christianson et al. 2001; Ferreira & Patson 2007; Slattery et al. 2013; Karimi & Ferreira 2016; see also Sanford & Sturt 2002) because they are completely ‘fooled’ by a heuristic shortcut. Typical research of this kind involves presenting subjects with garden-path-inducing
sentences such as While the woman bathed the baby played in the crib and then asking them Did the woman bathe the baby? To the experimenters’ (initial) surprise, readers very often answer Yes. These findings underscore how powerful semantically compelling interpretations are in driving a syntactic parse (or even in bypassing syntax altogether sometimes, according to some; remember: semantics proposes and syntax disposes … if it can).5 They have motivated the Good-Enough Model of language comprehension, a clear offshoot or version of the SGH approach that has had a crucial impact on research in the last twenty years or so. The basic idea behind good-enough mentalities is “that syntactic computations and algorithmic procedures are sometimes not only effortful but even completely unnecessary and that, by contrast, heuristics are fast, frugal, and cheap” (Acuña-Fariña 2022). Seen in this light, linguistic representations are quite often shallow, incomplete or even inaccurate, and they may lack detail but still be good enough to get the job done. This satisficing (Simon 1956) may actually result in the development of parsing systems that simply ignore or do not give careful consideration to the actual input. Such systems may even project representations that are at odds with the input if those representations align well with the plausibility of events in the real world (Traxler 2014). By way of analogy, consider the level of analysis that we need to walk up a cobbled street. You surely do not need a lot of ‘granularity’ for that (that is, a very precise apprehension of the shape of each cobblestone you step on); unless the street is in very bad shape, a rough, sloppy and largely automatic analysis of its basic outline IS usually good enough. Ferreira (2003) has shown that passive sentences are often misinterpreted when they encode implausible scenarios involving reversed semantic roles, as in the dog was bitten by the man.
On the Good-Enough processing view, this happens because the separate derivational path is, at the very least, fragile and susceptible to interference. This is a far cry from the assumption that that path was either all there is or the most important thing there is in sentence comprehension.6 Dwivedi (2013) studied sentences containing quantifier scope ambiguities such as every girl climbed a tree. On reflection, such sentences have two possible interpretations: one in which there are several trees (one for each girl, the ‘normal’ reading) and one in which there is only one tree that all the girls climbed. Dwivedi ran a moving-window experiment showing that participants formed a foggy, underspecified interpretation of the scope possibilities. Question-response accuracy rates then suggested that they were less accurate for sentences with the non-preferred reading (one tree for all the girls), which appeared to indicate that the fine-grained aspects of quantifier scope processing took place after the first, sloppy parse. Love and McKoon (2011) have also shown that comprehenders may leave pronouns unresolved, especially when working memory demands are high or when WM span is small. Early cue-based models, which relied almost exclusively on structure-independent, content-addressable architectures, are also broadly compatible with these satisficing views (Lewis et al. 2006; Van Dyke & McElree 2006), and Bever (2009: 280) has argued that even one of the most long-standing tenets
of generative grammar, the Extended Projection Principle or EPP (i.e., the idea that, one way or another, clauses must obligatorily contain subject NPs; Chomsky 1981), is actually based on the Canonical Form Constraint (CFC). More precisely, he has argued that the phenomena that have motivated the EPP are expressions of the CFC. On this logic, children learn the CFC better than its theoretical competitors, with learning meaning “discover[ing] derivations for statistically frequent meaning/form pairs, using the available repertoire of structural devices”. This means that, in individual languages, children first access and then learn derivational operations and representations that align descriptively with the EPP. The EPP itself, however, is merely a subsequent descriptive generalization resulting from acquisition constraints. Seen in this light, the EPP ensures that “(almost) every sentence construction maintain a basic configurational property of its language” (p. 285). The CFC for English looks something like (9):
(9) NP V (agreeing with NP) (optional NP) → Agent/Experiencer Predicate (object/adjunct)
In other words, the CFC/EPP schema is a huge “rule conspiracy”. Another classic area of processing that is amenable to reinterpretation along the lines of the SGH and the good-enough approach is the well-known difficulty of object relatives compared with subject relatives (see Chapter 4). Lin Chien-Jer (2013) analysed garden-path effects and the general processing difficulty of object relatives and concluded that both result from the initial, incorrect application of Bever’s probabilistic ‘NVN’ schema, a template in which agents precede patients. The author reviews evidence from both head-initial and head-final languages and suggests that comprehending relatives in which the thematic order is in line with the schema is less demanding than comprehending those in which it is not. Keeping to relative clauses, work by Traxler et al. (1998) and Swets et al.
(2008), among others, suggests that ambiguous relative clause attachment (see section 2.3.7) is actually faster than unambiguous attachment, especially when participants are led to expect only superficial comprehension questions. Traxler et al. (1998) used sentences such as (10)–(12):
(10) The driver of the car that had the moustache was pretty cool. (high attachment)
(11) The car of the driver that had the moustache was pretty cool. (low attachment)
(12) The son of the driver that had the moustache was pretty cool. (ambiguous)
When participants know that the ambiguity itself is going to be queried, the ambiguity advantage is reduced, but even then it does not disappear entirely. Such findings are similar to those reported for the quantifier scope ambiguities we briefly touched upon above and are routinely cited today as compatible with a good-enough analysis (and with surprisal theories; see below).
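
Bever’s ‘NVN’ template, invoked above for object relatives, can be made concrete as a deliberately crude role-assignment routine. The Python sketch below is a toy under stated assumptions: the input is a hand-tagged list of (word, tag) pairs, and both the tag set ('N', 'V', 'O') and the function name nvn_roles are hypothetical, not taken from Bever (2009) or Lin (2013). Because it ignores voice and relativization entirely, it assigns the Agent role to the dog in Ferreira’s reversed passive.

```python
from typing import Optional

# Hypothetical tag set: "N" = head noun of an NP, "V" = lexical verb, "O" = other.
Tagged = list[tuple[str, str]]


def nvn_roles(tagged: Tagged) -> dict[str, Optional[str]]:
    """Toy NVN heuristic: the first noun is the Agent, the first verb after it
    is the Action, and the first noun after that verb is the Patient.
    Voice and relativization are deliberately ignored (a shallow parse)."""
    roles: dict[str, Optional[str]] = {"agent": None, "action": None, "patient": None}
    for word, tag in tagged:
        if tag == "N" and roles["agent"] is None:
            roles["agent"] = word
        elif tag == "V" and roles["agent"] is not None and roles["action"] is None:
            roles["action"] = word
        elif tag == "N" and roles["action"] is not None and roles["patient"] is None:
            roles["patient"] = word
    return roles


# Canonical active order: the template gets it right.
active = [("the", "O"), ("man", "N"), ("bit", "V"), ("the", "O"), ("dog", "N")]
print(nvn_roles(active))   # {'agent': 'man', 'action': 'bit', 'patient': 'dog'}

# Reversed-role passive (Ferreira 2003): the template makes the dog the biter.
passive = [("the", "O"), ("dog", "N"), ("was", "O"), ("bitten", "V"),
           ("by", "O"), ("the", "O"), ("man", "N")]
print(nvn_roles(passive))  # {'agent': 'dog', 'action': 'bitten', 'patient': 'man'}
```

The heuristic is cheap and correct for canonical NVN strings; only non-canonical orders expose the shortcut, which in a two-stream picture an algorithmic parse would have to override.
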

Consider also the work by Pickering et al. (2006) on aspectual coercion. These authors have shown that telicity may also be approached in an initially ‘undifferentiated’ fashion. A sentence such as the insect hopped describes a telic event; however, when such sentences are followed by until clauses, they must be reinterpreted as denoting an ongoing (atelic) activity. Initial work by Piñango et al. (1999) and Todorova et al. (2000) using a stop-making-sense task had suggested that comprehenders experienced difficulty with these continuations, which was taken to indicate full commitment to the telic construal followed by reanalysis. In four different online studies using more precise methods (self-paced reading and eye-tracking), however, Pickering et al. (2006) provided evidence that readers do not actually experience any difficulty with these mismatches. They argue that during normal reading, language users do not immediately commit fully to the telicity of events and that a more determinate commitment may happen only when the experimental task favours immediate decisions. Finally, in two eye-tracking experiments, Grant et al. (2020) compared ambiguous PP attachment (e.g., the brother of the waiter with a beard) and ambiguities involving pronominal reference (e.g., we met the brother of the waiter when he visited the restaurant; see Chapter 2) in English and found that, somewhat surprisingly (at least in theory), ambiguity sped up, rather than slowed down, reading in both cases. Today, after much research aimed at carefully calibrating the role of sloppy, good-enough parsing, there is usually no implication that underspecified representations are necessarily the norm or all that matters. The point is rather that they are frequent enough to form a pattern suggesting that something other than perfect, finely articulated, symbolic representations is at work.
Most of these approaches do not deny that the mind also handles (having previously created) the ‘perfect’ representations of a “psychogrammar” (Bever 2013). The core idea that emerges from research of this kind is that heuristics and syntactic algorithms form a complex interplay of two streams, rather than two stages. That said, while the idea of a two-stream system emphasizes parallelism, it also stresses that if a shortcut is available and gets the job done faster, the sloppy heuristic is expected to be used first (Acuña-Fariña 2022). Naturally, the two paths will very often converge.

All of the above has provided the conceptual apparatus that, over the last few years, has made it possible to keep exploring non-algorithmic and non-serial approaches to incremental sentence processing from new, complementary angles. These new views stem from a general conception of language comprehension as a kind of dynamical, goal-directed behaviour, and they all share a number of important concepts. One prominent idea is that, to meet the demands of incremental processing, comprehenders anticipate or predict upcoming input rather than waiting passively for the signal to end. That is, they process what they encounter using more than what they encounter, in a top-down manner. It is claimed that comprehenders may even anticipate specific structural properties
of sentences before full bottom-up cues become available. This idea goes back at least to Trueswell et al. (1993, 1994) and their efforts to show that listeners and readers are sensitive to the conditional probability of a structural analysis in a given context. More recently, for instance, Fedorenko et al. (2012) conducted a lexical decision experiment and a self-paced reading experiment focusing on ambiguous words that can be either nouns or verbs (e.g., bank, spin). They found that noun-biased words took longer to recognize in a context that favoured the verb meaning (‘to bank’), and that the opposite held for verb-biased ones (‘a spin’). This underscores the role of lexical statistics. The implication is that the lexical and syntactic levels of processing mutually constrain each other, not that one level dominates the other (Traxler 2014). The emphasis on prediction is a key issue. The idea is that comprehenders usually cannot count on a veridical representation of the input (Levy 2008a, 2008b) and that they cope with this problem by using all the available information to evaluate the likelihood of an interpretation given the ongoing bottom-up cues (Hale 2001; Gibson et al. 2013; Jaeger & Snider 2013). That is, Bayesian estimates circumvent the problems caused by noisy channels. As Traxler (2014) notes: In many cases, the interpretation derived from a ‘low-fidelity’ representation of the input will match the speaker’s intended meaning, but in other cases comprehenders’ prior knowledge will lead to systematic distortions in interpretation. These distortions may occur because interpretation does not depend on the signal alone. Interpretation also depends on the comprehender’s knowledge about what is likely and what is not likely before the signal arrives. A Bayesian mechanism has to take base-rate information into account when deriving probability estimates.
Comprehenders integrate base-rate information (how likely is a given interpretation in the absence of any evidence) with information available in the stimulus to rank interpretations from more likely to less likely. If the signal is noisy or is conveyed over a noisy channel, interpretation will be systematically biased toward higher-frequency interpretations. (emphasis added) Crucially, noise need not be only external. Internal noise may arise from distraction or fatigue, and also from the message itself being more or less challenging in terms of complexity and/or memory demands. It is within this framework that independently motivated notions, such as similarity-based interference and memory-based interference, come to the fore (Gordon et al. 2001; Gordon & Lowder 2012). In memory-based accounts, mental representations often interfere with one another, and that interference increases when comprehenders must deal with very similar ones, causing lossiness and noise. Additionally, structures that impose greater memory demands (e.g., a left-displaced filler phrase that must be integrated much later; see Chapter 4) may add to the lossiness and the noise,
urging the processor to take a stand that involves ignoring syntax and trusting lexically derived interpretations.

Building on Hale’s (2001) idea of surprisal, on Constraint Satisfaction models (MacDonald et al. 1985, etc.) and on probability theory (Jurafsky 2003), Levy (2008a) became an unmissable landmark in studies of this kind. The American author started from the four prerequisites that an adequate comprehension system must meet, namely robustness to imperfectly formed input, accurate ambiguity resolution, inference on the basis of incomplete input, and differential, localized processing difficulty, and proposed a “resource-allocation theory of processing difficulty grounded in parallel probabilistic ambiguity resolution (…) unifying the idea of the work done on incremental probabilistic disambiguation with expectations about upcoming events in a sentence”. This fully parallel, incremental probabilistic processor must be capable of online inference (“that is, inference before input is complete”) and of updating its collection of ranked partial parses after every new input token. The surprisal component is to be seen as the difficulty incurred in replacing the old distribution with the new one. Hale (2001) had previously operationalized the lexical predictability of a word $w_t$ as its surprisal, that is, the negative log of the conditional probability of the word given its preceding context, $-\log P(w_t \mid w_1 \ldots w_{t-1})$. Higher surprisal values thus correspond to smaller conditional probabilities: less predictable words are more surprising to the comprehender and therefore harder to process. Levy notes that surprisal goes to zero when a word must appear in a given context (i.e., when $P(w_t \mid w_1 \ldots w_{t-1}, \mathrm{CONTEXT}) = 1$) and approaches infinity as a word becomes less and less likely.
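
Hale’s definition can be computed directly once some estimate of $P(w_t \mid w_1 \ldots w_{t-1})$ is available. The sketch below is only illustrative: it assumes a bigram approximation (conditioning on the previous word alone) estimated by relative frequency from a tiny invented corpus built around a mailed-letter sentence, and it reports surprisal in bits (base-2 logarithm). The corpus and the helper names train_bigrams and surprisal are hypothetical; the language models used in the literature condition on far more context.

```python
import math
from collections import Counter


def train_bigrams(sentences: list[list[str]]) -> tuple[Counter, Counter]:
    """Count bigrams and their left contexts; "<s>" marks the sentence start."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def surprisal(word: str, prev: str, bigrams: Counter, contexts: Counter) -> float:
    """Surprisal in bits, -log2 P(word | prev), with P estimated by relative
    frequency. Returns infinity for unseen continuations (P = 0)."""
    if bigrams[(prev, word)] == 0:
        return math.inf
    p = bigrams[(prev, word)] / contexts[prev]
    return math.log2(1.0 / p)  # same as -log2(p), but gives 0.0 (not -0.0) when p == 1


# Hypothetical toy corpus: "a" is usually followed by "stamp".
corpus = [
    "he mailed the letter without a stamp".split(),
    "she mailed the card without a stamp".split(),
    "he sent the letter without a stamp".split(),
    "he mailed the letter without a note".split(),
]
bigrams, contexts = train_bigrams(corpus)
print(surprisal("stamp", "a", bigrams, contexts))  # about 0.415 bits (P = 3/4)
print(surprisal("note", "a", bigrams, contexts))   # 2.0 bits (P = 1/4)
```

With this corpus, a after without (which always occurs there) has surprisal 0, and a word never seen after a returns infinity, matching the limiting cases Levy describes.
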
In his influential paper, Levy observes that predictability effects have been obtained both in eye-tracking studies, as reduced reading times and increased skipping probability (e.g., Ehrlich & Rayner 1981), and in ERP studies, as a reduction in the N400 effect (e.g., Kutas & Hillyard 1984). Predictability is usually quantified via Cloze completion studies (Taylor 1953), in which it is measured as the probability with which speakers complete a given sentence fragment with the word in question. For instance, Levy notes that in a sentence such as He mailed the letter without a stamp, stamp is much more predictable than car in There was nothing wrong with the car. When comparing the merits of an expectation-based account such as his with the locality-based Dependency Locality Theory (DLT; Gibson 2000), he pointed to their different predictions regarding German word order in an interesting way (pp. 1139–1140): Head-final local syntactic dependencies turn out to be a rich source of divergence between predictions of the DLT and surprisal. There are a
variety of syntactic circumstances in which a comprehender knows that a final governing category has to appear, but does not know exactly when it will appear, or what it will be. This situation is common in languages with obligatorily verb-final clauses, such as in German, Japanese, or Hindi. As Konieczny (2000) points out, the DLT predicts in these cases that a larger number of left dependents will cause greater processing difficulty at the final governor, because all the left dependents must be integrated with it at the same time [a locality effect]. But the surprisal theory makes the opposite prediction in this case. The more dependents we have seen, the more information we have about their governor, and in general the more information we have, the more accurately we should be able to predict that governor’s location and identity. Levy (p. 1134) claims that experiments testing these different predictions “are informally consistent with surprisal’s predictions” (Konieczny 2000; Vasishth & Lewis 2006, among others). Under surprisal, the finding that reading-time patterns reflect these distributional patterns can be taken as support for the hypothesis that native speakers of German, in the process of online sentence comprehension, construct and are sensitive to statistical information. He applied the same reasoning to work on subject-modifying relative clauses in English (compare The player [that the coach met at 8 o’clock] bought the house, The player [that the coach met by the river at 8 o’clock] bought the house and The player [that the coach met near the gym by the river at 8 o’clock] bought the house; Jaeger et al. 2005) and on complement and relative clauses in Hindi (both head-final; Vasishth & Lewis 2006): in line with surprisal-based predictive accounts, reading times at the final embedded verbs in these structures are lowest when more (rather than less) preverbal material appears within the clause.
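
The contrast between locality and expectation in verb-final clauses can be illustrated with a toy estimate of how surprising the final verb is after different numbers of preverbal dependents. Everything here is an assumption made for illustration: the clause counts are invented, the dependents are abstract labels (SUBJ, DAT, ACC) rather than real German or Hindi data, and next_token_surprisal is a hypothetical helper. Because the estimate conditions on the whole prefix, it reflects both when the verb appears and which verb it is, the two things Levy says extra dependents help predict.

```python
import math
from collections import Counter

# Hypothetical counts of verb-final clauses: preverbal dependents, then the verb.
CLAUSES: Counter = Counter({
    ("SUBJ", "slept"): 2,
    ("SUBJ", "ACC", "saw"): 5,
    ("SUBJ", "ACC", "gave"): 1,
    ("SUBJ", "DAT", "ACC", "gave"): 3,
    ("SUBJ", "DAT", "ACC", "showed"): 1,
})


def next_token_surprisal(prefix: tuple[str, ...], token: str) -> float:
    """Surprisal (bits) of `token` appearing right after `prefix`, estimated
    from the clauses that begin with `prefix`. This conditions jointly on
    *when* the verb appears and *which* verb it is."""
    n = len(prefix)
    total = 0  # clauses that continue past the prefix
    hits = 0   # ...and continue with `token`
    for clause, count in CLAUSES.items():
        if len(clause) > n and clause[:n] == prefix:
            total += count
            if clause[n] == token:
                hits += count
    if hits == 0:
        return math.inf
    return math.log2(total / hits)


# The same final verb, reached after two vs. three preverbal dependents.
print(round(next_token_surprisal(("SUBJ", "ACC"), "gave"), 3))         # 2.585
print(round(next_token_surprisal(("SUBJ", "DAT", "ACC"), "gave"), 3))  # 0.415
```

Under these invented counts, the extra dependent lowers the cost at the verb, whereas a DLT-style count of the dependents integrated at the verb (two versus three) would rank the two cases the other way round.
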
The latest instantiation of this research framework is the so-called lossy-context surprisal theory (Futrell et al. 2020).7 According to this theory, the processing difficulty of a word is determined by how predictable it is given a lossy representation of the preceding context. As the authors point out, this theory is “rooted in the old idea that observed processing difficulty reflects Bayesian updating of an incremental representation given new information provided by a word or symbol, as in surprisal theory (Hale 2001; Levy 2008a). It differs from surprisal theory in that the incremental representation is allowed to be lossy”. They tie this general idea to (p. 4): theories of perception and brain function that are organized around the idea of prediction and predictive coding (…), in which an internal model of the world is used to generate top-down predictions about the future stimuli, then these predictions are compared with the actual stimuli, and action is taken as a result of the difference between predictions and perception.
Such predictive mechanisms are well-documented in other cognitive domains, such as visual perception (…), auditory and music perception (…), and motor planning. In other words, since expectation-based theories are forward-looking (they relate processing difficulty to the prediction of future material) and memory-based theories are backward-looking, the fundamental motivation for lossy-context surprisal theory is to capture frequently attested memory effects within an expectation-based framework (“Lossy-context surprisal theory augments surprisal theory with a model of memory”; Futrell et al. 2020). A good-enough philosophy clearly underlies almost every aspect of these models, which advocate a kind of ‘shallow’ processing mode in which comprehenders have perfect information about the current word but only a foggy representation of the previous ones. Prediction (of both content and specific form) and Bayesian calculations are always at work, especially in cases of structural forgetting, that is, cases in which comprehenders actually forget or misremember the beginning of a sentence by the time they reach its end.
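
Lossy-context surprisal can be sketched by averaging surprisal over degraded memory traces of the context. This is a minimal illustration under explicit assumptions, not Futrell et al.’s (2020) implementation: memory is modelled as simple erasure noise (each preceding word is independently forgotten with probability erase_p), next-word probabilities come from invented counts, and all names are hypothetical. It also assumes that the true context and word occur in the counts, so no probability is zero.

```python
import math
from itertools import product
from typing import Optional

Context = tuple[str, ...]
Memory = tuple[Optional[str], ...]  # None marks an erased (forgotten) word

# Hypothetical joint counts of (two-word context, next word).
COUNTS: dict[tuple[Context, str], int] = {
    (("dog", "loudly"), "barked"): 9,
    (("dog", "loudly"), "sang"): 1,
    (("choir", "loudly"), "sang"): 9,
    (("choir", "loudly"), "barked"): 1,
}


def consistent(context: Context, memory: Memory) -> bool:
    """A stored context is compatible with a memory trace if it matches
    every word that was not erased."""
    return all(m is None or m == c for c, m in zip(context, memory))


def p_next_given_memory(word: str, memory: Memory) -> float:
    """p(w | r): pool the counts of every context compatible with the trace."""
    joint = sum(n for (c, w), n in COUNTS.items() if w == word and consistent(c, memory))
    total = sum(n for (c, _), n in COUNTS.items() if consistent(c, memory))
    return joint / total


def lossy_context_surprisal(word: str, context: Context, erase_p: float) -> float:
    """Expected surprisal (bits) of `word` over memory traces in which each
    context word is independently erased with probability `erase_p`."""
    expected = 0.0
    for mask in product([False, True], repeat=len(context)):  # True = erased
        prob = math.prod(erase_p if erased else 1.0 - erase_p for erased in mask)
        if prob == 0.0:
            continue
        memory = tuple(None if erased else w for w, erased in zip(context, mask))
        expected += prob * math.log2(1.0 / p_next_given_memory(word, memory))
    return expected


context = ("dog", "loudly")
for erase_p in (0.0, 0.5, 1.0):
    print(erase_p, round(lossy_context_surprisal("barked", context, erase_p), 3))
# 0.0 0.152
# 0.5 0.576
# 1.0 1.0
```

With erase_p = 0 the measure reduces to ordinary surprisal; as erase_p rises, the informative word dog is lost more often and the cost climbs toward the 1 bit expected when the context is forgotten entirely.
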

5.3 The Grammatical Parser Hypothesis: the parser is the grammar

In the previous section we discussed how and why a parsing device and a grammar may diverge. We now consider the opposite idea, expressed bluntly in the motto “the parser is the grammar” (Phillips 1996). A ‘grammatical parser’ would be a good thing to have, because it would solve a number of problems that predictive and good-enough approaches seem to have. Traxler (2014) mentions a few. The first is that even though Bayesian calculations and noisy-channel heuristics may account for various empirical phenomena, it is not clear that they can account for all processing, given “the combinatorial explosion that occurs for any system that must compute interpretations, both for the signal and for near neighbors across different dimensions (phonological, lexical, syntactic) and different grain sizes”. Indeed, “the number of computed interpretations would be astronomical”. It does seem true that so far only very restricted processing environments have been tested. Another difficulty for underspecified or satisficing accounts is that they are less adept at explaining the very high degree of isomorphism between offline grammaticality judgements and online sensitivity to grammatical violations that we mentioned in section 5.1 above (especially as seen through ERP research). We noted that it takes less than 400 ms to detect the ungrammaticality of the verb in incorrect sentences such as *the girls is having trouble with that, and that it may take even less than that to detect the grammaticality breach in Max’s of proof (instead of Max’s proof of …). When we studied agreement mismatches, we saw that Mancini et al. (2011a, 2011b) reported how Spanish parsers ‘amnestied’ a person mismatch by suppressing the P600
repair phase, having previously detected the mismatch at the LAN stage. As Acuña-Fariña (2022) observes, readers were able to detect the surface mismatch (LAN) of a string that was in any case legal (by 400 ms) and were also able to call off a reaction to it when it became clear that the grammar of Spanish sanctions the mismatch in question. Since the P600 starts peaking at around 600 ms post-anomaly and the LAN effect enters the scene at circa 400 ms, this leaves a very narrow time window to operate in. But notice that the window was already narrow at 400 ms, when the legal mismatch was first detected. Finally, even though production and comprehension need not be resolved in exactly the same way, it is nevertheless true that syntactic underspecification and lossy channels may be less viable in production, for, after all, when we speak we must select morphosyntactic and phonological forms, and these typically obey grammatical rules or constraints.

The last fifteen years or so have seen increased experimental interest in phenomena that may illustrate just how much harmony exists between grammar and processing. Island phenomena, a classic generative notion, have become a paradigm case of that kind of alignment. Needless to say, work of this kind, mostly led by Colin Phillips at Maryland, also invokes a concurrent alignment between linguistics and psycholinguistics (see Keshev & Meltzer-Asscher 2017 for a recent overview, and Tanaka 2019; see also Tollan & Palaz 2021 on similar law-abiding behaviour of subject gaps in complement clauses and complementizer-trace effects). Unlike (13), (14)–(18) are islands (the examples are taken from a Language paper by Phillips 2006):
(13) Who did you hope that the candidate said that he admired?
(14) *Who did the candidate read a book that praised?
(15) *Who did the candidate read The Times’ article about?
(16) *Who did the candidate wonder whether the press would denounce?
(17) *Who did the fact that the candidate supported upset voters in Florida?
(18) *Who did the candidate raise two million dollars by talking to?
As can be seen, these examples involve long-distance extraction or movement phenomena arising from the need to relativize, question or topicalize certain constituents of the clause. As (13)–(18) show for English, for reasons that have prompted often heated debate since at least Ross (1967), languages have developed various constraints on such extraction operations. When a particular grammatical environment prohibits them, the generative term ‘island’ is applied to suggest that nothing can escape from it. For instance, in (13) the object wh-phrase is fronted to the initial position of the overall structure while still keeping its relationship with the relevant subcategorising head (the verb
admired) intact across an arbitrarily large portion of linguistic structure – the long-distance operation spanning, in fact, a stretch of three different clauses:
(13’) Who did you hope that the candidate said that he admired

However, as (14)–(18) show, relative clauses, certain complex NPs, interrogative clauses, subject clauses and adjunct clauses (in that order in the list of examples above) are islands, because they do not seem to allow the same extraction operation. Starting with Chomsky (1973), the idea of a Subjacency Constraint has often been invoked to unite various kinds of islands under the scope of a single explanatory principle. Later work (Chomsky 1981, 1986) helped redefine this notion. Subjacency means that a constituent is not allowed to move over more than one bounding category (typically NP or S) at a time. The theoretical literature has mapped a very complex descriptive picture of island phenomena in light of the cross-linguistic variation that soon emerged. A striking fact about islands is that, despite their quintessentially syntactic façade, they are clearly affected by semantic and pragmatic properties. For instance, according to Truswell’s (2011: 157) Single-Event-Grouping Condition, some extractions out of adjunct clauses are allowed if the event described by the matrix clause and the event described by the adjunct are construed as jointly forming a single event grouping (the matrix event and the adjunct event must share the same participant, namely the matrix subject, and the adjunct event must modify the event described in the matrix clause; see also Tanaka 2019). The pragmatic interfacing is perhaps even more striking. Long ago, Pesetsky (1987) noticed that island domains are less impregnable when the extracted phrase is ‘discourse-linked’ or ‘D-linked’. By ‘less impregnable’ is meant that such extractions are usually judged to be less ungrammatical, or not ungrammatical at all. For instance, (19) is fine because the syntactically more prominent subject is fronted, but (20) seems unquestionably wrong (therefore an island), because object phrases do not enjoy the same privilege.
However, as (21)–(22) show, no such clear-cut ungrammaticality seems to be perceived when the wh-phrases involved are D-linked (which man, which car). In the relevant literature, this is known as the Superiority Effect (the examples are from Goodall 2016):
(19) I wonder who bought what.
(20) *I wonder what who bought.
(21) I wonder which man bought which car.
(22) (??) I wonder which car which man bought.
To the best of my knowledge, it was the pioneering work of Stowe (1986), which led to the discovery of the Filled Gap Effect (see section 4.4), that launched an interest
in the experimental perspective on island phenomena. As you may recall, an FGE arises when a fronted wh-phrase filler is avidly looking for a gap to fill and finds that gap already filled by another phrase. The classic examples (23)–(24) were already used in Chapter 4:
(23) My brother wanted to know whoᵢ Ruth will bring us home to tᵢ at Christmas.
(24) My brother wanted to know if Ruth will bring us home to Mom at Christmas.
In (23) the FGE arises when the parser hits us. Pairwise comparisons with the same word in (24), where no gap-filling takes place, show the strain via elevated RTs in that area. Interestingly, according to Stowe, gap-filling is suspended in islands, as no FGE is registered inside them. This means that the parser recognizes the island as an extraction-illegal domain and stops looking for a gap until it exits it (the structures used in the relevant experiment, Experiment 2, were not identical to (23), but that is not relevant now). Given how conspicuously active parsers are during gap-filling, this is remarkable evidence of ‘cool’ grammatical sensitivity to quite a sophisticated grammatical constraint, and it seems to go against the idea of structural forgetting that noisy-channel models often invoke (Levy 2008a; Futrell et al. 2020). At issue here is whether islands reflect grammatical constraints only, processing constraints only, or both. “Both” means coming to terms with a rather realistic scenario, namely that there may very well be grammatical constraints that are grounded in previously existing and independently motivated processing dynamics. Recent work by the Phillips team and others has helped refine our knowledge of this area of grammar and processing (mostly for English; see Pañeda et al. 2020 for evidence from Spanish). For instance, Sprouse et al. (2012) squarely tackled the question of whether island effects are, in fact, emergent epiphenomena arising from limited processing resources.
If that were the case, one should expect clear connections between acceptability patterns and the participants’ individual working memory profiles. Using two different acceptability rating tasks (a seven-point scale and magnitude estimation) and two different measures of working memory capacity (serial recall and n-back),8 the authors tested over three hundred native speakers of English on different types of island effect. They concluded that there is no evidence linking the studied effects to working memory, and that the effects in question are therefore more likely to reflect either grammatical constraints or grounded, grammaticized constraints, rather than the workings of feeble and variable processing resources. A largely compatible conclusion was reached by Pañeda et al. (2020) for Spanish and by Goodall (2016), also for English. Goodall studied the D-linking effect on extraction from islands. To examine whether a D-linked filler is easier to retrieve in an island thanks to a working memory advantage (on the logic that ease of retrieval leads to an overall perception of amelioration), the author compared reactions to the (un)acceptability of both
islands and non-islands. Wh-questions with both D-linked and bare wh-phrases in both island and non-island clauses were used in a seven-point-scale acceptability rating experiment, as in the following examples:
(25) *What/?Which of the cars do you believe the claim that he might buy?
(26) *What/?Which of the cars do you wonder who might buy?
(27) What/Which of the cars do you believe that he might buy?
Note that (25) and (26) involve violations of the Complex Noun Phrase Constraint and the Wh-Island Constraint, respectively. By contrast, the gap in (27) is inside a that-clause, a non-island environment. Results indicated that D-linking significantly increased the acceptability of both island and non-island domains, which points to a complex role for working memory. However, the acceptability boost was uniform across both types of clauses, which means that the island effect per se is not the result of working memory interference.

Online sensitivity to the fine aspects of complex syntactic structure extends beyond island phenomena. Take the processing of backwards anaphora, or cataphora, as in (28)–(30):
(28) Gender match: When he was at the party, the boy cruelly teased the girl during the party games.
(29) Gender mismatch: When he was at the party, the girl cruelly teased the boy during the party games.
(30) Control: When I was at the party, the boy cruelly teased the girl during the party games.
In an eye-tracking experiment, Van Gompel and Liversedge (2003) registered an early mismatch effect at the region immediately following the main clause subject NP (cruelly in the examples above): reading times in that region were longer in (29) than in (28).
The authors argued that this so-called Gender Mismatch Effect indexes the automatic (indeed, geometrically determined) establishment of a referential dependency between the pronoun and the matrix clause subject phrase before relevant bottom-up information about likely antecedents has had time to enter the scene:
(31) When he …, the boy …

This activeness is not unlike the behaviour of the processor during gap-filling (see Chapter 4). With this finding as background, Kazanina et al. (2007) further examined whether syntactic constraints such as Principle C of the Binding Theory (Chomsky 1981) are at work during the processing of this
long-distance pronominal dependency. In the relevant generative terminology, Principle C refers to the fact that a cataphoric relationship is not possible in configurations where a pronoun c-commands its antecedent, as in (32)–(34):
(32) *Heᵢ likes Johnᵢ.
(33) *Heᵢ said that Johnᵢ likes wine.
(34) *Heᵢ drank beer while Johnᵢ watched a soccer game.
Generative specifics aside, something like Principle C is quite widely attested across languages. In their three self-paced reading studies, Kazanina et al. (2007) used materials like (35)–(39), which, given their complexity and length, one might expect to cause some kind of structural forgetting effect:
(35) Principle C/match: Because last semester sheᵢ was taking classes full-time while Kathryn was working two jobs to pay the bills, Ericaᵢ felt guilty.
(36) Principle C/mismatch: Because last semester sheᵢ was taking classes full-time while Russell was working two jobs to pay the bills, Ericaᵢ felt guilty.
(37) No constraint/match: Because last semester while sheᵢ was taking classes full-time Kathrynᵢ was working two jobs to pay the bills, Russell never got to see her.
(38) No constraint/mismatch: Because last semester while sheᵢ was taking classes full-time Russell was working two jobs to pay the bills, Ericaᵢ promised to work part-time in the future.
(39) No constraint/name: Because last semester while Ericaᵢ was taking classes full-time Russell was working two jobs to pay the bills, sheᵢ promised to work part-time in the future.
However, according to the authors, the results of their experiments indicated that gender mismatch effects were visible at grammatically legal antecedent positions, but not at grammatically illegal ones. So the parser treated the critical subject NP as a potential antecedent only when that position was not proscribed by a grammatical constraint on coreference.
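
Principle C is stated in terms of c-command, a purely configurational relation, which is why it can be evaluated before any bottom-up features of the antecedent arrive. A minimal sketch, assuming the standard textbook definition (a c-commands b if neither dominates the other and the first branching node above a dominates b) and heavily simplified, hand-built trees for (32) and (28); the Node class and the function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison: two nodes with the same label stay distinct
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:  # wire up parent links as the tree is built
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b (b lies somewhere below a)."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b iff neither dominates the other and the first
    branching node above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and dominates(node, b)


def principle_c_ok(pronoun: Node, name: Node) -> bool:
    """Coreference is blocked if the pronoun c-commands the name."""
    return not c_commands(pronoun, name)


# (32) *He likes John: [S [he] [VP likes [John]]]
he, john = Node("he"), Node("John")
tree32 = Node("S", [he, Node("VP", [Node("likes"), john])])
print(principle_c_ok(he, john))  # False: coreference ruled out

# (28) [S [AdvCl when [S [he] [VP was at the party]]] [the boy] [VP ...]]
he2, boy = Node("he"), Node("the boy")
tree28 = Node("S", [Node("AdvCl", [Node("when"), Node("S", [he2, Node("VP")])]),
                    boy, Node("VP")])
print(principle_c_ok(he2, boy))  # True: the boy is a licit antecedent
```

In (32) the pronoun’s first branching node is the clause itself, which contains John, so coreference is blocked; in (28) the pronoun sits inside the adverbial clause, so the matrix subject remains a licit antecedent. This purely geometric check is the kind of computation that, on the view of Kazanina et al. (2007), can run before the antecedent’s features are known.
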
They further suggest that the parser behaves opportunistically in this respect:

We suggest that structural information has priority in this domain because it is often the only type of information about a potential antecedent that is reliably derivable before bottom-up information about the noun phrase is encountered. If the parser encounters structural evidence for an upcoming clause, then it can immediately and reliably predict that the clause will have a subject noun phrase, and it could also immediately evaluate whether that subject noun phrase is in a position that makes it a structurally licit or illicit antecedent for a previously encountered pronoun. This information could all be computed before any bottom-up information about the subject noun phrase is encountered. In contrast, other information about the subject noun phrase such as its morphological features and its semantic match to a previously encountered pronoun cannot be evaluated except via bottom-up information, because these properties are not structurally predictable. Thus, under this view syntactic information plays a crucial role because it enables the parser to make predictions about upcoming material earlier than any other type of information, and hence there is no need to impose architectural constraints that force certain information types to have priority. Note that this line of reasoning might apply differently in languages that display richer morphological agreement than English, such that it may be possible to reliably predict morphological properties of an upcoming noun in advance of the noun itself. (Kazanina et al. 2007: 406)

Notice that the reflex to automatically implement something like (31) is well grounded in English grammar when viewed more generally. For instance, the misrelated participles produced by second-language learners of English often bring out the pronounced geometric determinism of English and its entrenched SVO preference, a fact that most probably arose historically as a consequence of the gradual loss of its morphology. The dialogue in response to examples (40)–(41) below is taken from an Internet search of the phrase ‘misrelated participle’.
The oddness of (40) would disappear in, for instance, a literal Spanish translation, as there really are no misrelated participles in Spanish (there is simply a mild preference for the understood subject of the initial adverbial clause to coincide with that of the upcoming matrix clause, not a strong constraint):

(40) ***While watching the parade, my wallet was stolen.
(41) While watching the parade, I got my wallet stolen.

“Strange, it is only 2# that grammatically is incorrect. What do you learned teachers think of it, please? Thank you very much”.
“Hmm. I don't find that strange. The first implies that my wallet was watching the parade, which is pretty silly”.

Finally, the same online sensitivity to grammatical constraints is tested in work by Kush et al. (2017), who examined the parser’s ability to use fine-grained syntactic information to prevent the retrieval of distractors in both Strong and Weak Crossover environments (Postal 1971; Wasow 1972). When pressed to resolve a referential co-indexation, does the parser consider illegal retrievals?


Strong Crossover occurs in structures like (42), where which girl cannot grammatically bind the pronoun she despite c-commanding it and matching it in morphological features:

(42) *Bob asked which girlᵢ it seemed that sheᵢ thought Bill made fun of ___.

The term crossover refers to the idea that the path linking the wh-filler phrase to its gap (which appears last in (42)) ‘crosses over’ the pronoun. These constructions are a good way of gauging the reach, or the limits, of grammar-based retrieval precisely because their surface form cannot in principle rule out binding between the distractor and the pronoun, so more fine-grained grammatical knowledge is needed. Notice that, unlike a speaker making an offline grammaticality judgement, an incremental parser has no access to the full sentence by the time it reaches the pronoun, so it has no evidence yet of the upcoming gap. In this sense, the Kush et al. study tested forward-looking sensitivity to Principle C.9 In their first experiment the authors found that the parser did access the displaced filler as an antecedent for the pronoun when no grammatical constraint prohibited it. However, it ignored the same filler in the Strong Crossover condition. They considered two possible explanations, both of which involve refined grammatical knowledge. First, the parser might have used Principle C to exclude the wh-filler on the basis of the c-command relation potentially holding between pronoun and gap. Alternatively, the retrieval mechanism might simply have ignored any element not occupying an argument position. This is why their second experiment used Weak Crossover sentences such as (43):

(43) *Bob asked which girlᵢ it seemed that herᵢ friend thought Bill made fun of ___.

Co-indexation in (43) is not ruled out by Principle C (as the pronoun her does not c-command the gap).
Since it is nonetheless illegal, its illegality is often attributed to the fact that the filler sits in a non-argument position (in the specialized literature, the specifier of CP). Their second experiment showed that pronoun antecedent retrieval did access matching (distracting) wh-phrases: a Gender Mismatch Effect was indeed registered. The authors concluded that the parser accesses and makes rapid use of Principle C and c-command information to constrain retrieval in Strong Crossover. The work reviewed so far suggests a close alignment between the operations conducted by the parser and knowledge of quite sophisticated grammatical constraints, to the point that it seems fair to conclude that the parser simply uses that knowledge directly to do its job. As a final comment on the significance of this type of evidence, consider the observations of Phillips (2006: 819–820) on the relationship between theories of grammar and theories of processing (see also Steedman 2000; O’Grady 2005; Lewis & Phillips 2015):


the more we find that real-time processes capture fine-grained grammatical distinctions, the less need there is for an additional system that recapitulates such distinctions in a time-independent fashion. A widespread assumption in generative grammar has been that grammars do not aim to capture how speakers construct sentence structures in real time (e.g. Chomsky 1965), and this assumption has sometimes been justified by claims that real-time structure building lacks grammatical precision (e.g. Chomsky & Lasnik 1993; Townsend & Bever 2001). However, if we find evidence that real-time incremental structure building is grammatically precise, and have little clear evidence for grammatical processes that operate on any other time scale or in any other order, then there is reason to doubt standard assumptions about the time independence of generative grammars. (emphasis added)

5.4 Grammatical illusions

We now turn to grammatical illusions, that is, cases where the parser seems to ignore the grammar and go on to construct grammatically illegal representations because they ‘look legal’. The term ‘illusion’ is borrowed from the literature on the deceptive or misleading visual representations that the human brain sometimes produces. The brain does this because it is trying to make sense of what it sees and is sometimes tricked into seeing something that is not actually there. Long ago, Penrose and Penrose (1958: 32) referred to the now classic illusion in Figure 5.1, noting that although each individual part is acceptable as a separate representation, “false connections” between the parts lead to the overall impression of an impossible structure:

FIGURE 5.1 An impossible structure

These visual illusions typically illustrate misalignments between distal and proximal stimuli by conjuring up familiar and perfect local cues whose subsequent overall integration turns out to be impossible. They have been highly instrumental in helping define “the grammar of vision” (Jackendoff 1993: 165 f.; Seckel 2006). Recently, the so-called dress illusion (Brainard & Hurlbert 2015) has brought to popular attention individual differences in the way different people’s brains construct visual representations of the same stimuli. According to some, the same phenomenon applies to language.10

Notice that this topic is more naturally a subdomain of the SGH, which we examined in section 5.2 above. I have given it a separate section because experimental studies on illusions have evolved into a subfield of their own, and also because it is not always clear that illusions actually reflect the processing of a preferred ungrammatical heuristic (see Acuña-Fariña 2022 for much of what is said here).

We referred above to the comparative illusion created by the sentence More people have been to Russia than I have. This is known as a kind of Escher Sentence, in reference to M.C. Escher’s (1960) lithograph Ascending and Descending, where the artist used conflicting proportions to create an impossible perception of depth. Since Townsend and Bever (2001), this structure has become something of a classic and has attracted quite a lot of interest. Townsend and Bever explain the illusion as the result of the automatic application of two blended syntactic templates, namely More people have been to Russia than I and People have been to Russia more than I have. The two templates share a good deal of lexical material and the appearance of a comparison, hence both the blend and the illusion.
Note that the ungrammaticality and ultimate unprocessability seem to be caused by the matrix clause subject invoking a comparison between two sets of individuals and the than clause failing to provide the second set (compare More athletes have been to Russia than businessmen in the last year). The interesting aspect of this illusion is that it actually lingers for quite some time, to the point that it takes significant attention and effort to realize that the sentence is, in fact, meaningless. It therefore seems to be an extreme case on a gradient at whose other end lie structures that only very fleetingly create the representation of ungrammatical strings.

The best-known illusions involve negative polarity items (NPIs), such as any or ever. In principle, the grammar of negative polarity, in English at least, involves a constraint amounting to what generative grammarians have identified as a c-command relationship between a negative element and the NPI. In (44) no c-commands ever; in neither (45) nor (46) does that configuration obtain, hence the deviance (these examples and the next three come from Phillips et al. 2011):

(44) No professor will ever say that.
(45) *A professor will ever say that.
(46) *A professor that no student likes will ever say that.


However, as Phillips et al. (2011) observe, the class of negative licensors is far from formally defined. An overtly negative lexical item like no, few, or rarely is certainly not required. Very often, all that is needed is a ‘negative context’, indeed one that can be very loosely defined as such (but see Ladusaw 1979):

(47) If John ever shows up, he will learn that he is fired.
(48) Who has ever been able to answer a question like that?
(49) Everybody who John has ever met ends up finding him fascinating.

This underscores the pragmatic underpinnings of the construction, its interactions with the lexical component, and the limits of the configurational requirement (Fauconnier 1975; Chierchia 2006).11 Early experimental work in this area was done by Drenhaus et al. (2005) on German NPIs. In a speeded grammaticality judgement test, they found that readers do judge sentences without licensors infelicitous but, interestingly, their ratings showed comparatively greater acceptance of structures containing a negative quantified NP that does not c-command the NPI. They used structures like (50)–(52):

(50) Accessible NPI licensor:
Kein Pirat, [der einen Braten gegessen hatte,] war jemals sparsam
No pirate who a roast eaten had was ever thrifty
“No pirate who had eaten roast (meat) was ever thrifty”.

(51) Inaccessible NPI licensor:
Ein Pirat, [der keinen Braten gegessen hatte,] war jemals sparsam
A pirate who no roast eaten had was ever thrifty
“A pirate who had eaten no roast (meat) was ever thrifty”.

(52) No NPI licensor:
Ein Pirat, [der einen Braten gegessen hatte,] war jemals sparsam
A pirate who a roast eaten had was ever thrifty
“A pirate who had eaten roast (meat) was ever thrifty”.

The key word is jemals (‘ever’).
In (50), the licensor Kein Pirat c-commands it; in (51), the licensor keinen Braten is inside the relative clause modifying the subject NP and is therefore unable to c-command the NPI; and in (52) jemals appears again but there is no licensor at all. So (51) and (52) are both ungrammatical, but the surface form of (51) might keep the parser from noticing this. Participants saw the main-clause NP, the embedded-clause NP, and each of the remaining words for 300 ms each. Then, 500 ms after the last word of the sentence had been shown on the screen, they had to judge the sentence’s acceptability. The authors noticed that in a significant number of trials a linearly preceding but structurally inaccessible licensor resulted in an illusion of grammaticality: participants detected the ungrammaticality of (51) less reliably than that of (52). They referred to this as the intrusion effect. It looked as if the illegal dependency was being resolved through a sort of blind item-to-item retrieval check (something analogous to a ‘negation first, NPI second’ schema) that momentarily and superficially amnestied the construction in (51). Since (52) contained no intruders, its ungrammaticality was easier to detect. Table 5.1 summarizes the accuracy and reaction times of the Drenhaus et al. (2005) study.

TABLE 5.1 Negative polarity illusions in German. Accuracy and reaction times (adapted from Drenhaus et al. 2005)

Condition | Accuracy (%) | Reaction time (ms)
(50) Accessible licensor | 85 | 540
(51) Non-accessible licensor | 70 | 712
(52) No licensor | 83 | 554

Vasishth et al. (2008) set out to test whether similarity-based interference caused by partial cue matching could explain the intrusion effect. They replicated the effect in an eye-tracking study and used this evidence to argue for the cue-based retrieval model of sentence processing and a content-addressable architecture in which memory chunks encoding earlier parts of the sentence are queried in parallel and the chunk matching the most cues is finally retrieved (Lewis & Vasishth 2005; Lewis et al. 2006; Vasishth & Lewis 2006). In this model, partial cue matching leads to similarity-based interference, which in turn leads to processing delays and grammaticality judgement errors: in sum, it causes illusions of grammaticality for structures that are clearly judged deviant offline.

Similar work on English has helped confirm this general view of the illusion effect. For instance, in a self-paced reading study, Xiang et al. (2006) used sentences like (53)–(55), involving a contrast between a grammatical structure and two ungrammatical ones:

(53) No diplomats have ever supported a drone strike.
(54) *The diplomats have ever supported a drone strike.
(55) *The diplomats that no congressman could trust have ever supported a drone strike.

Again, though ungrammatical, (54) offers little opportunity for retrieval error since there is no licensor in sight, whereas in (55) the presence of no in the embedded relative clause provides a partial match.
Results indicated that both (54) and (55) were reliably judged ungrammatical when participants had a little time to think about them. However, fine-grained measures of online processing showed that sentences like (55) were often momentarily treated as if they were acceptable, confirming the illusion. Xiang et al. (2009) further analysed the same range of constructions in an ERP study and found electrophysiological evidence of the intrusion effect in the form of a reduced P600 relative to reflexive binding violations. The latter resisted the illusion, producing robust P600 effects whenever they were ungrammatical (see below).

Recent work by De Dios (2019) has refined our knowledge of this area. De Dios studied the online and offline perception of sentences involving different types of negation, a notoriously complex area, as in (56)–(58):

(56) The authors [that the critics recommended] have never received acknowledgement…
(57) The authors [that no critics recommended] have never received acknowledgement…
(58) No authors [that the critics recommended] have never received acknowledgement…
…for a best-selling novel.

Notice that the three sentences are grammatical but that (56), with single negation, appears to be the easiest to deal with. (57) is an instance of multiple negation, containing the negative markers no and never in different clauses. (58) illustrates double negation, with both no and never appearing in the main clause. De Dios conducted three experiments: a speeded acceptability task (Experiment 1), a self-paced reading task (Experiment 2) and an untimed acceptability judgement task (Experiment 3). The first two were designed to tap into online and/or fast processing of these structures, while the last targeted readers’ offline and/or slow perception of acceptability. Across the three experiments, the double negation condition, (58), exhibited the most degraded perception of grammaticality and the slowest recovery from disruption. The multiple negation condition, (57), displayed higher acceptability ratings and faster recovery from disruption. Finally, the single negation condition, (56), was, as predicted, the easiest and fastest to process and judge. Note that these results evince illusions of ungrammaticality, the mirror image of the cases considered so far.
Indeed, whereas the Xiang et al. studies reviewed above showed that intrusive no improves the perception of ever in ungrammatical sentences, De Dios’s experiments show that it seems to degrade the perception of never in grammatical ones. This result goes against standard cue-based views of processing, which contemplate only illusions that rescue ungrammatical strings. Note too that perceived ungrammaticality increased when no c-commanded never, as in (58). Crucially, the acceptability contrast between single negation and multiple and double negation sentences cannot be attributed to constraints imposed by the grammar of English, as all the sentences examined were grammatical. De Dios considers several possibilities, the most obvious being that readers were caught in an illusion-creating environment caused by the immediate application of heuristics triggered by the surface presence of two negative forms. Along the cline that all these structures seem naturally to form, the double negation condition stands out for approaching the comparative illusion examined above: it is quite resistant to fading, even in untimed grammaticality judgement tasks. Together with the data on comparatives, this result in particular challenges the received black-and-white view that equates illusions with online experiments and their disappearance with untimed acceptability judgements. Effects similar to those obtained by De Dios have been reported by Yanilmaz and Drury (2018) and Xiang et al. (2006). De Dios (2021) also found the same type of lasting effect in her analysis of distractors in verb control in Spanish. In her study, ungrammatical sentences with a matching distractor, such as María aconsejó a Francisco PRO ser mucho más ordenada (‘Mary advised Frank-masc PRO to be much more orderly-fem’), where the object control verb aconsejar/advise requires PRO to be controlled by the object phrase Francisco/Frank, received higher grammaticality ratings than ungrammatical sentences with a mismatching distractor, such as Antonio aconsejó a Francisco PRO ser mucho más ordenada (‘Anthony advised Frank-masc PRO to be much more orderly-fem’). Since the only difference between online and offline judgements is the time available for responding, the time dimension may well be crucial here and needs to be carefully analysed and interpreted. In fact, current research seems to be zeroing in on the time issue.12 This is implicit in the activation decay component of cue-based models (Lewis et al. 2006; Vasishth & Lewis 2006; Parker et al. 2017), but Parker and Phillips (2016) have specifically shown that illusions can be reliably “switched on and off” by manipulating the size of the constituents (that is, the time) intervening between the appearance of the potential licensor and the appearance of the NPI.
This fact underscores basic aspects of the incremental structure-building process, and it brings back a recurring theme of this book: how frantically the processor deploys automatic responses in (proximal) small-grain segments, and how quickly it recovers from situations where a mini stretch of local parsing yields representations that violate the (distal) larger, ongoing sentence representation. Parker and Phillips (2016: 335) view this as evidence against a purely cue-driven process (it is also somewhat at odds with the idea, inherent in predictive, noisy-channel models, that the more material intervenes in a long-distance relationship, the less room there is for surprisal at the final moment of resolution or co-indexation; see section 5.2):

we argue that the encoding is not fixed, as previously assumed, but rather, changes over time. At one moment, irrelevant items inside the licensing context are transparently accessible as candidates for causing illusions. Then, at a later point in time, those same irrelevant items become opaque as candidates for causing illusions. If the encoding changes with the passage of time, we might expect different behaviors at different points in time depending on when the encoding is probed, as observed in our experiments. (emphasis added)
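The contrast Parker and Phillips draw between items that are transparently accessible early on and opaque later can be pictured with a deliberately crude, deterministic toy of cue-based retrieval. In the Python sketch below, every name, feature and weight is a hypothetical illustration, and there is no activation, decay or noise, unlike in ACT-R-style models such as Lewis and Vasishth (2005). Each earlier NP in (50)–(52) is a memory chunk, and the NPI retrieves the chunk that best matches its cues. With holistic=False any matched cue counts, so the negative noun inside the relative clause of (51) can win on a partial match; with holistic=True only full matches count, standing in for the later stage at which such items have become opaque.

```python
from dataclasses import dataclass

# Hypothetical retrieval cues for the NPI jemals/ever: the licensor should
# be negative AND c-command the NPI.
CUES = ("negative", "c_commands_npi")


@dataclass(frozen=True)
class Chunk:
    """A memory chunk standing for an earlier NP in the sentence."""
    word: str
    features: frozenset


def match_score(chunk, weights, holistic=False):
    """Sum of the weights of the cues this chunk matches.

    holistic=False: features are transparently accessible, so partial
    matches count (early stage).
    holistic=True: a chunk scores only if it matches every cue
    (no partial matching; later stage).
    """
    matched = [cue for cue in CUES if cue in chunk.features]
    if holistic and len(matched) < len(CUES):
        return 0.0
    return sum(weights[cue] for cue in matched)


def npi_feels_licensed(memory, weights, holistic=False):
    """Retrieve the best-matching chunk; the NPI 'feels' licensed if that
    chunk carries a negative feature, whatever its structural position."""
    best = max(memory, key=lambda ch: match_score(ch, weights, holistic))
    if match_score(best, weights, holistic) == 0:
        return False  # nothing retrieved at all
    return "negative" in best.features


# Hypothetical weights: the negative feature slightly outweighs c-command.
WEIGHTS = {"negative": 1.0, "c_commands_npi": 0.8}

# Simplified memory contents for (50)-(52).
S50 = [Chunk("Kein Pirat", frozenset({"negative", "c_commands_npi"})),
       Chunk("einen Braten", frozenset())]
S51 = [Chunk("Ein Pirat", frozenset({"c_commands_npi"})),
       Chunk("keinen Braten", frozenset({"negative"}))]
S52 = [Chunk("Ein Pirat", frozenset({"c_commands_npi"})),
       Chunk("einen Braten", frozenset())]

for name, memory in [("(50)", S50), ("(51)", S51), ("(52)", S52)]:
    early = npi_feels_licensed(memory, WEIGHTS)
    late = npi_feels_licensed(memory, WEIGHTS, holistic=True)
    print(name, "early:", early, "late:", late)
# (50) early: True late: True
# (51) early: True late: False
# (52) early: False late: False
```

In this toy, raising the weight of the structural cue above that of the negative feature (say, 1.5 versus 1.0) also removes the illusion in (51) in the early mode, which is one way of picturing the kind of weighted cue-combinatorics scheme that Parker and Phillips (2017) propose for reflexives.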


The authors tentatively suggest that the distinction between localist and distributed representations might explain the on/off activation of the NPI illusion if two stages are posited. Notice that these stages occur in exactly the opposite order to the two stages proposed by researchers endorsing we-process-everything-twice views:

The parser begins by constructing a localist representation of the sentence in which component feature values are made explicit and transparently accessible, as assumed in the Lewis and Vasishth model (Lewis & Vasishth 2005). In this stage, individual features, such as +negation, can be independently evaluated, creating the opportunity for partial match interference. In the second stage, those same features are bound together to form a distributed representation that interfaces with the interpretive system and pragmatic inferencing. In this stage, the individual features are no longer independently evaluable and are opaque for causing illusions, since bound representations can only be recovered holistically, i.e., without partial matching. (Parker & Phillips 2016: 336; emphasis added)

In fact, it may well be that we are dealing with three, rather than one or two, stages: the first might construct very local associations; the second might use these to activate stored templates (heuristics), while further local connections continue to be built; and the third would harmonize the two by invoking a larger, distributed representation. Notice that the fastest ERP response to presumably syntactic operations is usually the LAN effect, which starts at around 400 ms after the anomaly. Between that point and the onset of the P600 effect, which indexes a reaction to the violation (already under some form of strategic control, therefore), there are some 200 ms, with the P600 itself spanning roughly 600 to 900 ms.
This means that there is probably a small window of approximately half a second post stimulus in which to accommodate a changing representation. Fully conscious grammaticality judgements may be taken to start after 1000 ms. If, as some contend, there is also an ELAN (Early Left Anterior Negativity; Friederici & Weissenborn 2007) for obvious phrase structure violations (*Max’s of proof, instead of Max’s proof of …) at around 200 ms, the window for accommodating these three putative stages would be less tight. An alternative to a system like this, one that most of these studies tend to ignore (mentioning it only in passing; cf. Parker & Phillips 2016), is as simple to state as it is hard to prove experimentally: massive parallel processing. As we have seen (section 5.2), predictive and good-enough models do make room for that. Recent research has also questioned the sharp distinction between the LAN and the N400, as both occur in the same time window but affect different areas of the scalp (Tanner & Hell 2014; Tanner 2019; Caffarra et al. 2019; see also section 3.3), a fact that meshes well with the idea of parallelism. Given the current state of knowledge, this important puzzle cannot yet be resolved.

Finally, adding to the changing dynamics afforded by time, another recent finding has revealed the flexible nature of the dependency-formation process tapped by work on illusions: even the structures that have traditionally proved most resistant to illusions can, in fact, be shown to elicit intrusion effects if the right number of cues forms the right kind of coalition. The paradigm case of resistance to illusions is agreement in reflexives, and the usual point of contrast is agreement attraction, where intrusion is easy to achieve. Let us briefly illustrate the contrast. In the literature, agreement attraction has generally been regarded as a clear case of a grammatical illusion. This is because, even though the subject-verb agreement rule is a straightforward constraint to implement (involving mere feature co-variance, with very little in terms of feature complexity in English …), work in both production and comprehension has shown that in practice it is quite susceptible to interference effects (see Chapter 3 and references therein). For instance, in the psycholinguistic classic (59), cabinets intrudes on the agreement dependency in production in a very conspicuous manner, derailing it:

(59) *The key to the cabinets are on the table.

Attraction thus provides a striking contrast with islands, in that the grammatical knowledge needed to deal with subject-verb agreement is but a fraction of the knowledge needed to tackle island phenomena. This makes the fallibility of the former and the grammatical faithfulness of the latter quite a mysterious affair. In comprehension, the intrusion effect emerges as a weakened reaction to the ungrammaticality: reading times often reflect little or no difficulty at all in the verb region (Pearlmutter et al. 1999; Wagers et al. 2009; Acuña-Fariña et al. 2014; Lago et al. 2021) and accuracy in grammaticality detection drops. In short, the literature interprets this ‘pardoning effect’ as the behavioural reflex of retrieving an illicit item from memory (cabinets), that is, as an illusion of grammaticality (Phillips et al. 2011; Parker & Phillips 2017). Interestingly, the same effect emerges with intruders that are not adjacent to the agreement target (the verb), as in the downward percolation structures we examined in Chapter 3 (section 3.2.1.2), illustrated in (60):

(60) *The cabinets that the key open are on the second floor.

Agreement in reflexives is much harder to fool. This is surprising a priori, given that the retrieval process targets the same structural position (the subject) and the same feature (number, in most experiments). Direct evidence of this was provided in an eye-tracking study by Dillon et al. (2013). The authors used closely matched materials, such as (61)–(62), and found that the plural verb were in (61) showed attraction effects from the plural distractor (managers), but that the plural reflexive themselves in (62) did not:

(61) The new executive who oversaw the middle manager(s) apparently was | *were dishonest about the company’s profits.
(62) The new executive who oversaw the middle manager(s) apparently doubted himself | *themselves on most major decisions.

Similar null effects for reflexives have been reported by Sturt (2003), using gender agreement, and by Cunnings and Sturt (2014). Additionally, in the ERP study by Xiang et al. (2009) mentioned above, the reflexive condition elicited strong P600 effects whose modulation was unaffected by the presence of distractors. Given the different intrusion profiles, Dillon and colleagues reasoned that reflexives and S-V agreement might use different retrieval cues, with morphological information being more valuable in S-V dependencies and structural information (the c-command relationship yet again) more useful in reflexives. Notice that this entails that not all linguistic dependencies are resolved alike by a cue-based retrieval mechanism. Reflexives would thus constitute a paradigm example of sturdy grammatical obedience that is hard to explain from cue-based or good-enough perspectives (Traxler 2014: “These results suggest that comprehenders do not underspecify syntactic form, but that plausible semantic interpretations derived early in a sentence are not always displaced based on later processing events”). However, in a recent study, Parker and Phillips (2017) conducted three eye-tracking experiments and showed that even reflexives are susceptible to intrusion effects when the reflexive and the subject mismatch in multiple cues. Specifically, they compared conditions where the subject and the reflexive mismatched in one feature (gender or number) or in two features (animacy + gender, animacy + number, and gender + number).
For instance, these are the conditions used in their first experiment, involving gender and animacy:

TARGET MATCH CONDITION
Distractor match: (63) The strict librarian said that/ the studious schoolgirl reminded/ herself/ about the/ overdue book.
Distractor mismatch: (64) The strict father said that/ the studious schoolgirl reminded/ herself/ about the/ overdue book.
1-FEATURE MISMATCH
Distractor match: (65) The strict librarian said that/ the studious schoolboy reminded/ herself/ about the/ overdue book.
Distractor mismatch: (66) The strict father said that/ the studious schoolboy reminded/ herself/ about the/ overdue book.
2-FEATURE MISMATCH
Distractor match: (67) The strict librarian said that/ the brief memo reminded/ herself/ about the/ overdue book.
Distractor mismatch: (68) The strict father said that/ the brief memo reminded/ herself/ about the/ overdue book.

They concluded (p. 287) that “the selectivity of reflexive attraction may reflect a weighted cue-combinatorics scheme in which structural cues are weighted more strongly than morphological cues” (unlike in ordinary S-V agreement). The different resolution profile might reflect the difference between retrieval driven by error correction (S-V agreement) and retrieval driven by normal dependency resolution. The latter type would more naturally apply to reflexives because, unlike S-V agreement, reflexives cannot easily be anticipated and are therefore less likely to contradict previous predictions.

It is hard to form a unified picture of such a varied range of experiments, but one impression that arises from a bird’s-eye view of the literature on illusions is that the structures analysed are too different to compare from just one viewpoint, namely their intrusion profile (see Acuña-Fariña 2022). S-V agreement shares nothing of the conceptual complexity of comparatives and, especially, of negation. To make the complexity issue more pronounced, the strongest attraction effects have always been reported for S-V agreement inside a simple clause (with the attractor inside a PP), whereas the comparatives and negative structures tested inevitably span at least two clauses. If negatives are hard to process on their own, a combination of several negative elements surely stretches the processing system in a way that a string like the key to the cabinets VERB cannot.
Most conclusions about grammar/parser relations target the simple clause, for the very good reason that a combination of clauses brings memory limitations, both qualitative and quantitative, into play. With a stretch of linguistic structure that spans at least two clauses there is surely time for global comprehension processes to start acting, and this may obscure our understanding of the incremental, small-grain build-up of structure. Although attraction effects have been reported for relative clauses, the combination of processing unit size (one factor, parameter or constraint) and inherent complexity (another) would make comparisons and negatives a much more challenging processing domain.

This is relevant because it is, in fact, not clear that attraction mistakes are illusions. When a speaker says *The key to the cabinets are too big, or when a reader’s eyes momentarily ignore the deviance, this may well be simply because, locally, the cabinets are is not ungrammatical. It is essential to understand that, given the dynamics of changing representations and the issue of time tackled above, a stretch of proximal processing may momentarily validate the cabinets are as a legal unit and then abandon that local parse when more time makes it clear that, of all the grammatical options that are becoming incrementally available, the cabinets are in particular is ultimately incorrect non-locally. This explains why, of all putative illusions, attraction does not last long as such and is easily ‘corrected’ in grammaticality judgements. Again, this whole area has become too complex to be properly understood through one or two magic strokes, but it surely makes sense to recognize that the short-lived effect of the putative illusion of grammaticality in *The key to the cabinets are too big and the lasting effect of the illusion of ungrammaticality in No authors [that the critics recommended] have never received acknowledgment for a best-selling novel stem, at least in part, from the sheer overall complexity of the latter. In fact, I suggest that one way for the parser to resolve such complexity may be to ‘template’ it: that is, to rely on easy structure-recognition procedures such as ‘No followed by Ever is fine’, for the very good reason that, more often than not, when no is followed by ever the sentence in question will be fine. This may help explain the illusion in *The diplomats that no congressman could trust have ever supported a drone strike.
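The contrast between ‘templating’ and full structural licensing can be illustrated with a deliberately crude Python sketch. Both functions are hypothetical. The template simply checks that no occurs somewhere before ever. The ‘structural’ check approximates c-command by assuming that embedded clauses are marked with square brackets and that a licensor inside them cannot license ever outside them. Neither is a proposal from the literature. The point is only that the cheap check accepts the illusion sentence while the structural one rejects it.

```python
import re


def words_of(text: str) -> list[str]:
    """Lower-cased word tokens, ignoring punctuation and brackets."""
    return re.findall(r"[a-z]+", text.lower())


def template_accepts(sentence: str) -> bool:
    """Shallow 'template': accept if 'no' occurs anywhere before 'ever'."""
    words = words_of(sentence)
    if "no" not in words or "ever" not in words:
        return False
    return words.index("no") < words.index("ever")


def structural_accepts(sentence: str) -> bool:
    """Crude c-command proxy: material inside [brackets] (an embedded clause)
    cannot license 'ever' outside it, so drop it before applying the check."""
    main_clause = re.sub(r"\[[^\]]*\]", " ", sentence)
    return template_accepts(main_clause)


if __name__ == "__main__":
    illusion = "The diplomats [that no congressman could trust] have ever supported a drone strike."
    print(template_accepts(illusion), structural_accepts(illusion))  # True False
```

On this toy view, a parser that trusts the template would be fooled exactly where the licensor is buried in a relative clause. The bracket-stripping shortcut stands in for a real structural computation only for sentences annotated this way.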
It would also make relevant all the evidence for underspecified representations that we referred to in section 5.2: comparisons and negative structures would be easy to ‘template’, and this would foster initial underspecified representations of their deep, complicated structures. Understood this way, ‘templatability’ would be another sort of rule conspiracy, and constructions may well differ in how easily they can be turned into templates. This is important because evidence for heuristics may be easy to find precisely in those cases, whereas evidence for stubborn compliance with complex grammatical constraints may be easy to find in much less ‘templatable’ domains, such as islands. If these loose considerations are on the right track, the debate over whether sophisticated symbolic grammar precedes heuristics or the other way around may hinge upon which specific construction we are dealing with. In any case, if:
a. we can redefine some phenomena (e.g., attraction) as mere local versus distal discrepancy, instead of illusory processing;
b. we can redefine Bever-style and good-enough heuristics as the online use of a simpler syntax, instead of the use of non-syntax; and
c. most of the evidence for the SGH comes to rest on the few illusions that remain (comparisons, negative structures and little more, which may besides be explained on other grounds),
then Phillips’s (2006: 819–820) observation that real-time structure-building does not lack grammatical precision becomes much more persuasive. And so does his mistrust of models of grammar that advocate such independence.

5.5 On flexibility and opportunism
A conspicuous feature of the many forms of research that we have been examining in this book is the way the parser adopts opportunistic solutions to processing dilemmas. In my opinion, this fact is usually, and wrongly, ignored. If we can see that the same kind of opportunistic behaviour is also present in the grammar, then the alignment view that the GPH entails becomes more attractive. Consider the facts we examined in Chapter 2 regarding the attachment of ambiguous relative clauses to complex NPs, as in somebody shot the servant of the actress who was on the balcony. We noticed that a lively theoretical debate emerged the moment Cuetos and Mitchell (1988) suggested that English and Spanish followed different attachment procedures. As subsequent research confirmed, English opts for a recency strategy (which involves attaching the RC low, to actress in the example above), whereas Spanish prefers to skip the modifying noun and establish non-local high attachment to the head noun instead (servant). In English one cannot say *I love very much potatoes, because the language’s rigid reliance on SVO order demands that arguments occupy the S and O positions in transitive structures, with adjuncts appearing after the O (I love potatoes very much). No such restriction occurs in Spanish (me gustan mucho las patatas and me gustan las patatas mucho are both correct). It seems that, when facing the ambiguity problem in RC attachment, English speakers simply deploy the strategy their language type generally favours: opting for a local parse. I take that to be an opportunistic decision because it is easy to imagine a more parsimonious strategy of adopting the same attachment bias in both languages. What we see, instead, is that the English solution makes sense given its general language type.
The same kind of ‘geometrical determinism’ of English is evident in the processing of backwards anaphora that we examined in section 5.3 above: the Gender Mismatch Effect (When he was at the party, the girl cruelly teased the boy during the party games) reflects the fact that English speakers expect to find a pronoun’s coindexation site in a very determinate position: that of the subject of the main clause that follows. We alluded to the fact that the grammar of English has a Misrelated Participle Constraint (*/? While watching the parade, my wallet was stolen), to the effect that the subject of an initial adverbial subordinate clause is to be found in “a very determinate position: that of the subject of the main clause that follows”. Since English cannot avail itself of the phrasal-packaging mechanism provided by a rich morphology, it must use other connectivity devices. Determinate constituent positions usually solve that problem, and specific parsing biases seem to rely on them.13 Take Basque: the preferred and computationally less demanding word order in this language is SOV (Erdocia et al. 2009). Since, compared with the SVO type, this order might involve a greater working memory load (both arguments must be held until the verb arrives), Díaz et al. (2011) suggest that Basque speakers rely more on case morphology information online (on habituation with specific constructions, see Levy 2008a; Vasishth et al. 2010 and Futrell et al. 2020, among many others; section 5.2). This may highlight the fact that different processing strategies are implemented depending on the salience of different grammatical phenomena, such as case-marking, verb agreement or word order, a hypothesis that fits in well with Greenberg’s (1966) generalization that languages with basic SOV order typically have case morphology. Díaz et al. (p. 361) refer to the Extended Argument Dependency Model of Bornkessel and Schlesewsky (2006) to account for these facts. This model “claims that incremental argument interpretation across language types is best explained with reference to a cross-linguistically motivated ‘prominence scale’ containing information such as case marking and word order, among others, where different grammars yield different prominence scales”; that is, each grammar weights these cues opportunistically.
Consider the facts of agreement now. In Chapter 3 we examined attraction mistakes (*the label on the bottles are green) and the theory known as Maximal Input. Recall that work on Italian showed that errors are less frequent with biological gender agreement than with arbitrary gender agreement (Vigliocco & Franck 1999). This suggests that semantics feeds into the agreement computation somehow, making it more accurate. Again, a more parsimonious system would rely on a formal identification of the masc versus fem cue only, for this applies, after all, to both arbitrary and conceptually based agreement. Yet what the system appears to do is trust semantics when semantics is a reliable cue, and in the binary gender system of Italian it is a truly reliable cue: -o designates ‘male’ and -a ‘female’ with outstanding consistency. Vigliocco and Franck (2001) used Italian and French epicene nouns and found the same pattern of results.14 They found fewer attraction errors in the preambles in which the gender of the epicene noun matched the sex of the referent. The authors argued that conceptual information helps syntactic accuracy when congruent with syntactic information and interferes with it when the two conflict.
Opportunism in agreement mistakes is even clearer in work by Franck et al. (2008). The authors manipulated the morphophonological transparency of both the articles and the nouns in Spanish, Italian and French NPs, as these differ in their cue reliability. Thus, for instance, Italian nouns are highly reliable in that the -o/-a distinction that codes masculine versus feminine affects around 80% of the nouns in that language, but articles are less dependable because they are often contracted with the nouns. By contrast, in Spanish the -o/-a distinction applies to only some 68% of the nouns, but the articles are extremely dependable. By examining the mistakes the participants made, Franck et al. noticed that speakers attended to the nouns alone in Italian, to both nouns and articles in Spanish, and were comparatively more inclined to be driven by the form of the articles in French, since noun endings have eroded in French and are thus less informative. That is, the speakers of each language group adapted their encoding biases to the particularities of their respective grammars with almost pristine precision (Levy 2008a; Vasishth et al. 2010). Chapter 4 focused on empty categories and, in this connection, we became acquainted with the work of Betancort et al. (2006) on the processing of PRO gaps in Spanish. The authors carried out two eye-tracking experiments aimed at measuring whether lexically established information about the control properties of verbs and prepositions was used when dealing with infinitival gaps, or whether, alternatively, the parser merely opted for a blind connection with the most recent filler (a locality bias). They used structures like (69)–(72):
(69) Subject control
Maríaᵢ/ prometió/ a Pedroⱼ/ [PROᵢ] ser/ bastante cauta/ con los comentarios
‘Maryᵢ/ promised/ Peterⱼ/ [PROᵢ] to be/ quite cautious (f)/ with her comments’
(70) Object control
Maríaᵢ/ exigió/ a Pedroⱼ/ [PROⱼ] ser/ bastante cauto/ con los comentarios
‘Maryᵢ/ demanded/ from Peterⱼ/ [PROⱼ] to be/ quite cautious (m)/ with his comments’
Mary demanded from Peter that he be quite cautious (m) with his comments
(71) Subject control bias (preposition “para”)
Yolandaᵢ/ se casó/ con Jorgeⱼ/ para [PRO1]ᵢ tener/ dinero/ y [PRO2]ᵢ ser/ heredera/ de una fortuna
‘Yolandaᵢ (fem) married Georgeⱼ (masc) in order to [PRO1]ᵢ have money and [PRO2]ᵢ be the heir (fem) to a fortune’
Yolanda married George in order to have money and inherit a fortune
(72) Object control bias (preposition “por”)
Yolandaᵢ/ se casó/ con Jorgeⱼ/ por [PRO1]ⱼ tener/ dinero/ y [PRO2]ⱼ ser/ heredero/ de una fortuna
‘Yolandaᵢ (fem) married Georgeⱼ (masc) for [PRO1]ⱼ having money and [PRO2]ⱼ for being the heir (masc) to a fortune’
Yolanda married George because he has/had money and he is/was the heir to a fortune
The results indicated that Spanish speakers were guided by the control properties of verbs but not by those of prepositions. It seems that their processing system was malleable enough to adapt to the markedly different reliability of the two sources of relevant information (for resolving a subject versus object control dependency): it trusted verbs because verbs code control almost invariably; by contrast, preposition control is much less rigidly established and can, in fact, be reversed quite easily by manipulating the context.
For instance, even though (in order) to/para normally induces subject control, in Phil brought Jane here precisely to PRO do that job it is the object, Jane, that controls the gap. Once again, the parser made an opportunistic choice, but one that was patently anchored in the specifics of the grammar of control. It is clear that the cue reliability of specific items in specific constructions may vary greatly, and it makes sense for parsers to be sensitive to that. For instance, Parker et al. (2015) showed that null subjects in time adverbial clauses in English (The little girl talked to her mother after GAP playing in the yard) behave differently from reflexives in that interference from illegal antecedents is possible in the former but not in the latter. Remember, from section 5.4 above, that the processing of reflexives seems to be quite strongly structurally guided (via something analogous to c-command). It may be argued that reflexives are like the control verbs in the Betancort et al. experiments (so the parser trusts something like c-command almost blindly) and that adverbial control is like preposition control (so the parser is used to considering more options and hesitating more). The previous cases illustrate different forms of research that we reviewed in this book, but the reader should be aware that the same opportunism evidenced in them can be found in many other areas that we have not touched on.15 Take, for instance, determiner selection. Caramazza et al. (2001) and Schriefers et al. (2002), among others, set out to examine the way in which open- and closed-class word forms interact in NP production. Open-class words (verbs, adjectives, nouns …) are selected through their meaning. However, as (73)–(74) show, in languages like Italian the selection of determiners very often depends extrinsically on properties of the head nouns. Thus, in (73) feminine la depends on the prior presence in the processing domain of rosa (fem, ‘rose’). The same applies to il and tulipano (masc) in (74):
(73) a. La(fem) rosa è sul tavolo
b. Le(fem) rose sono sul tavolo
‘the rose is on the table / the roses are on the table’
(74) a. Il(masc) tulipano è nel vaso
b. I(masc) tulipani sono nel vaso
‘the tulip is in the vase / the tulips are in the vase’
Since languages differ widely in their morphosyntactic properties, it is interesting to see whether this cross-linguistic variability is mirrored by cross-linguistically tuned processing biases that opportunistically exploit language-specific constraints. Research of this kind uses the picture-word interference task, a variant of the classical Stroop task (Klein 1964).16 Two interesting findings have emerged using this methodology.
First, picture naming is slower if the distractor word is semantically related to the object in the picture (e.g., the word chair superimposed on the picture of a table). This is the so-called Semantic Interference Effect. Second, picture naming is faster when the distractor and the picture noun are phonologically related (e.g., the word card superimposed on the picture of a car). This is the Phonological Facilitation Effect. The typical interpretation of this pattern of results is that it reflects a serial process of competitive lexical node selection followed by phonological priming of the chosen node. The interesting contrast is between work done on Dutch and work done on Italian, Spanish and French. In Dutch, determiners are marked for gender: de is used for common-gender nouns (de tafel, ‘the table’), and het for neuter nouns (het boek, ‘the book’). Schriefers et al. (2002) asked speakers of this language to produce noun phrases such as the red table in response to pictures. Speakers produced either a [de + Adj + N] phrase or a [het + Adj + N] one. A Gender Congruency Effect emerged: naming times were longer when targets and distractors had different genders. The authors interpreted this effect as arising at the level of gender feature selection: when a distractor word activates its gender feature, this interferes with the selection of the gender feature of the target word. When Caramazza et al. (2001) tried this in Italian, no such effect emerged. What can explain the discrepancy? Consider the following Italian NPs:
(75) Il treno / i treni [the train / the trains]
(76) Lo sgabello / gli sgabelli [the stool / the stools]
(77) La forchetta / le forchette [the fork / the forks]
(78) Il piccolo treno [the small train]
(79) Il piccolo sgabello [the small stool]
(80) La piccola forchetta [the small fork]
(81) Il treno piccolo [‘the train small’]
(82) Lo sgabello piccolo [‘the stool small’]
(83) La forchetta piccola [‘the fork small’]
As can be seen, the masculine determiners lo/gli are selected if the next word starts with a vowel (where lo is elided to l’), with s followed by a consonant (as in sgabello), with an affricate such as z, or with a few other clusters such as gn- and ps-; il/i are selected in all other cases. Additionally, Italian adjectives can occur both before and after their head nouns. All this means that determiner selection can occur only after all the constituents of the phrase have been ordered, and after the head has been selected and its morphological features have become available. Notice that those features must be available when the determiner is chosen, even though the noun carrying them is output after the determiner. In short, Italian is a late-selection language, and what seems to underlie the null results of Caramazza et al.’s experiments is that determiners are selected so late in the chained production pipeline that the activated competing information has simply dissipated by the time the target determiner is selected. Notice that the cross-linguistically different production biases reveal a sensitivity to the specific timing of access to the relevant information in each language.
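The late-selection logic can be sketched in Python for the masculine forms in (75)–(83). The sketch makes several simplifying assumptions. It uses a reduced version of the onset rule (vowel, s + consonant, z, gn-, ps-), ignores feminine forms and rarer onsets, and uses hypothetical function names. Its only purpose is to show that the article cannot be computed until the phrase has been linearized, because it depends on whichever word ends up immediately after it.

```python
VOWELS = set("aeiou")
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")


def takes_lo(word: str) -> bool:
    """Simplified check for the onsets that select lo/gli rather than il/i."""
    w = word.lower()
    if w[0] in VOWELS:
        return True
    if w[0] == "s" and len(w) > 1 and w[1] in CONSONANTS:
        return True  # 's + consonant', as in sgabello
    return w.startswith(("z", "gn", "ps"))  # affricate z, plus gn- and ps-


def masculine_article(next_word: str, plural: bool = False) -> str:
    """Pick the masculine definite article from the word that FOLLOWS it."""
    if plural:
        return "gli" if takes_lo(next_word) else "i"
    if next_word[0].lower() in VOWELS:
        return "l'"  # lo is elided before a vowel
    return "lo" if takes_lo(next_word) else "il"


def build_np(words: list[str], plural: bool = False) -> str:
    """Linearize the phrase first; only then can the article be chosen."""
    article = masculine_article(words[0], plural)
    joiner = "" if article.endswith("'") else " "
    return article + joiner + " ".join(words)


if __name__ == "__main__":
    print(build_np(["treno"]))                  # il treno, cf. (75)
    print(build_np(["sgabelli"], plural=True))  # gli sgabelli, cf. (76)
    print(build_np(["piccolo", "sgabello"]))    # il piccolo sgabello, cf. (79)
    print(build_np(["sgabello", "piccolo"]))    # lo sgabello piccolo, cf. (82)
```

Il piccolo sgabello and lo sgabello piccolo contain the same noun, so a production system cannot fix the article at the moment the noun is selected. On this simplified view, it must wait until word order has been settled.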
The different cross-linguistic behaviour is thus appropriately explained with reference to the specifics of the respective languages, and it would, in fact, be difficult to make sense of it otherwise (Acuña-Fariña 2016: 41; Levy 2008a; Vasishth et al. 2010). Take, briefly and finally, the processing of pronouns. In an eye-tracking study, Lago et al. (2017) compared pronoun antecedent retrieval in English and German. Previous research showed that speakers of German reactivate the syntactic gender of the antecedent of a pronoun (presumably in order to implement gender agreement operations). Since syntactic gender information is stored in the lexicon, researchers assumed that pronouns in such languages reactivate the full lexical entry of their antecedent nouns. Conversely, in languages without arbitrary gender, such as English, lexical retrieval may not be necessary. Lago et al. found early sensitivity to the semantic features of the pronoun’s antecedent in German but not in English, and concluded (p. 795) that “antecedent retrieval varies cross-linguistically depending on the type of information relevant to the grammar of each language”. This is custom-made opportunism, yet again. In fact, as is well known in several corners of linguistics, the same kind of opportunistic behaviour is evident in the grammars of the world’s languages. As Acuña-Fariña (2016) observes (see also Acuña-Fariña 2009 and Corbett 1991: 136 f.), there are basically three ways of building syntactic structure in language: a consistent word order, agreement and the use of classifiers. Most Indo-European languages use the first two simultaneously, but in unique proportions. To my knowledge, no one really knows what makes a language opt more for one kind or the other, but we do know that once the preferred mode of construction has been chosen, a number of motivated cascading phenomena are to be expected. For instance, as Hawkins (2004: 250) observes, SVO languages are less likely to have rich agreement, presumably because in this word order type the verb clearly separates the two main arguments in a predication, making constituent identifiability unproblematic. When morphology is lost, the loss affects gender more than number, as was historically the case for English. Is this chance? Not likely. Number is almost always conceptually grounded, but gender very often conspicuously is not (the word for computer is masculine in Spanish; the word for motorbike is feminine …). So the conceptual laxity of gender cues allows them to be used for structure building in a way that is hard to imagine for number. In effect, a gender cue X on a series of elements in a string simply means: unify all the X-bearing words into a phrase (say, la casa alta y blanca in Spanish, ‘the tall, white house’, with all the constituents of the NP redundantly co-marking fem.sg). All this means that, when morphology is gradually being lost and increased word-order predictability supplants it, it is gender that stands to lose, since its contribution is largely structural, whereas number also carries meaning.
Gender did remain in the English pronominal system, but this too was opportunistic: there is no great word-order predictability in pronoun-antecedent ties, which makes gender a useful device to keep. In German NPs, the lack of a determiner results in the adjectives showing strong inflection, which ensures the persistence of functional information (via case), thus guaranteeing the construction of the maximal projection of the NP (Hawkins 1994: 364). These “logical solutions to evolutionary conundrums” (Acuña-Fariña 2016: 36) are everywhere. According to Bybee (1985), gender is present in only 16% of the world’s verbal systems, whereas number and person clearly surpass the 50% figure. Given that many of the world’s languages are of the SVO type and that SOV makes syntactic gender less necessary for constituent identification, the figures make sense. Siewierska (1998) examined the word order and agreement patterns of a sample of 171 languages. She noticed that a fixed word order does not necessarily result in the loss of agreement, but free word order does justify the prediction that the language in question has agreement. Consider case. It turns out that when agreement in case is marked on noun phrase constituents that are adjacent, it almost necessarily exists for those that are non-adjacent (but not the other way around; see Moravcsik 1995: 471; Hawkins 2004: 160). Case marking also tends to correlate with flexible word order (Siewierska 1998): most Indo-European languages that have lost case marking have developed more rigid word order. Consider, finally, the repercussions of word-order rigidity versus its absence that Acuña-Fariña (2016) observes in a comparison of English and Spanish (p. 37):
the prima facie evidence that, say, Spanish and English differ in the degree of mobility of their constituents (…) follows the same opportunistic trend but goes well beyond the need to compose constituent structure. It clearly affects information structure too. It is well-known that English marked focus allows the system of grammar to code information structure when end-focus is not possible. In English, end-focus is very often not possible because its rigid SVO bias puts objects and subjects in specific locations. What happens when the informational focus falls on the subject? In Romance, encoders simply put the subject last. Unable to move constituents around so easily, English instead places elevated pitch prominence on that initial constituent to express the informational dimension (bypassing an obvious dent to communicative dynamism but preserving SVO; Lambrecht 1994). A related phenomenon occurs in the same language when the object, which must be placed in a focal position after the verb as a default, is in fact very anti-focal, that is, completely old/given information: the result is weak forms of all the pronouns, such as [ jə, əm, im, ə], in an attempt to minimise the presence of such low-communicativity in such prominent positions (…). In all these cases, and a myriad more, nothing like a universal veneer can be glimpsed which produces homogeneous or at least similar results in all languages. That could happen, in principle, but it did not. Instead, since Spanish has rich agreement, it uses it to move elements around when it comes to coding information structure. English has a beat-based prosody, so it uses it to mark focus. These are radically-different ways of encoding the same information, but what is important is that both ways draw from the pallette already provided by each language opportunistically. (emphasis added)
The previous considerations highlight aspects of the self-structuring processes of grammar systems (Kloos & Van Orden 2009), their resort to exaptation (Lass 1990) and other forms of ‘adaptation’ (Haspelmath 1999) that may strike one as endlessly idiosyncratic but which are always relatable to what specific languages have in their pre-existing menu of grammatical choices.17 This recognition of opportunism does not solve the problem we have as scientists of explaining why one specific solution, instead of potentially many others, is opted for. The notion of prioritized constraints or the idea of a UG fitted with parameter-setting often comes to mind in this context. However, the UG solution faces the further difficulty of explaining why in so many cases we do not actually see the same solution to the same problem but, instead, tailor-made ones (Acuña-Fariña 2016: 37).

In sum, when contemplating systems where (say) antecedents tend to precede anaphors, topics precede comments, agents precede patients and fillers precede gaps, and where word order and agreement behave ‘intelligently’ in unique doses across the world’s languages, we see that opportunistic solutions dominate the background, and that these appear to affect both the very self-structuring of grammar systems and the way speakers and hearers face the processing and encoding of those systems. All this surely does not compose an ‘elegant’, ‘optimal’ or ‘perfect’ landscape, but it surely serves the purpose of making humans something unique in the galaxies we can visualize. One can only conjecture how much more unique we would be if our communication system were indeed optimal, elegant and perfect.

Notes
1 A standard reference stressing the role of ‘time’ in the connection between grammatical derivations and left-to-right structure-building is Phillips (2003). This is a theoretical contribution whose main goal is to account for apparent mismatches among different constituency tests (say, coordination sanctioning a string as legal and movement showing the reverse). An adequate evaluation of this contribution rests on very particular tenets of generative grammar that go beyond the scope of this book. Superficially, though, it stands out for the simple idea that different diagnostic tests may reflect changes in constituency that occur during left-to-right structure-building (that is, the idea that real-time parsing affects constituency). On this view, contradictions between constituency tests only arise because those tests apply at different stages in the incremental derivation of a sentence. This may result in constituents that are initially built and then destroyed by the need to integrate newly arriving material. These ‘time-sensitive’ issues are particularly salient when moved or elided material needs to be reconstructed.
2 In a way, the system contemplated here is like the dual-route system habitually assumed for reading. Expert readers usually read whole words directly (top-down) but resort to a more methodical phonological (bottom-up) route for unfamiliar words or words they may have to revise (see Altmann 1997: ch. 11).
3 Hence, Momma and Phillips (2018: 235) point out that the debate is not really one of one versus two systems but of one versus three: “two is not an option”.
4 Adele Goldberg doubles as a psycholinguist and has produced experimental work aimed at proving the psychological status of syntactic constructions (understood as form/meaning pairs). See, for instance, Goldberg and Bencini (2005), or Ziegler et al. (2019).
5 Note that the garden path here is not actually caused by a move to bypass syntax: the woman bathed the baby is, of course, perfectly grammatical. It seems rather more a problem of ‘lossy memory’ (see below). See Phillips (2003) or Steedman (2000) on changing, flexible constituency.
6 Recent ERP work by Chow et al. (2018) on the use of argument roles in verb prediction suggests that semantic role information is but one type of semantic information used in constraining initial parses. The authors took advantage of the word order properties of Mandarin Chinese to manipulate the order of pre-verbal noun phrase arguments while holding lexical information constant. They targeted the N400 component, as this is well known to be sensitive to surprisal. They found that argument role information is slightly delayed, in that N400 effects for argument role reversals emerged only when some distance separated the arguments from the verb. The authors themselves suggest that their findings provide empirical support for Laszlo and Federmeier’s (2009: 32) proposal that “quantitative shifts in the timing of processing can potentially lead to qualitative differences in what particular facets of semantics come to be linked up with a given input” (emphasis added).
7 The term ‘lossy’ comes from ‘lossy compression’, where data are transformed into a compressed form such that the original cannot be reconstructed with precision (Nelson & Gailly 1996).
8 In the n-back task (Kirchner 1958; Kane et al. 2007), participants are shown a series of letters on a computer screen, one at a time, and are required to press a button whenever the current letter matches the one presented n items before.
9 The authors note that such sensitivity would compromise retrieval models that assume that retrieval is only driven by features of the prior context (e.g., Lewis et al. 2006).
10 Montalbetti (1984) is usually credited with being the first author to notice these sentences.
11 The notion of a ‘negative context’ is not completely mysterious. Linguists know that the antecedents of conditionals (not their consequents), questions, and the restriction of universal quantifiers (not their scope) all share the property of being downward entailing or, more broadly, non-veridical. See Ladusaw (1979).
12 As already noted, on derivational time and constituency, see Phillips (2003).
13 Such determinism explains why expletive subjects like it abound in English: they are necessary when for some reason (e.g., a heavy subject) something is not ‘in the right place’: it is essential that she does not spend more time doing all that during the present crisis.
14 Remember that an epicene noun (like vittima ‘victim’ or personaggio ‘character’ in Italian) has a fixed grammatical gender but can refer to either a female or a male referent. This means that, for these words, agreement between a subject NP and a postverbal predicative adjective is with the grammatical gender of the subject head noun, regardless of whether it refers to a male or a female participant.
15 See Acuña-Fariña (2016), on which many of the observations made in the remainder of this section are based.
16 Typically, subjects are asked to name a picture (say, that of a cat) that appears with a word superimposed on it (say, the word tiger (related) or truck (unrelated)).
17 See Fay et al. (2008) and Galantucci (2009) for work in experimental semiotics that tries to show how structures might emerge under multiple soft constraints. For computational approaches focusing on a simulation of the trajectories of learning, see Kirby (2013).

REFERENCES

Abdelghany, H. & Fodor, J. 1999. Low attachment of relative clauses in Arabic. Poster presented at the 5th annual conference on architectures and mechanisms for language processing (AMLaP). Edinburgh, September 23–25.
Abels, K. 2012. Phases: An essay on cyclicity in syntax. Berlin: De Gruyter.
Abney, S.P. 1989. A computational model of human parsing. Journal of Psycholinguistic Research 18: 129–144.
Acuña-Fariña, C. 2009. The psycholinguistics of agreement in English and Spanish: A tutorial overview. Lingua 119: 389–424.
Acuña-Fariña, C. 2012. Agreement, attraction and architectural opportunism. Journal of Linguistics 48 (2): 257–296.
Acuña-Fariña, C. 2016. Opportunistic processing of language. Language Sciences 57: 34–48.
Acuña-Fariña, C. 2018a. Aspects of a psychologically-informed theory of agreement. Folia Linguistica 52 (2): 449–481.
Acuña-Fariña, C. 2018b. The role of morphology in setting production biases in agreement: A cross-linguistic completion study. Language Sciences 66: 28–41.
Acuña-Fariña, C. 2018c. Aspects of the constructional nature of agreement. Constructions 11: 1–24.
Acuña-Fariña, C. 2022. Parsers and grammars: A tutorial overview from the linguistics building. Brain Sciences 12 (12): 1659. https://doi.org/10.3390/brainsci12121659.
Acuña-Fariña, C., Fraga, I., García-Orza, J. & Piñeiro, A. 2009. Animacy in the adjunction of Spanish RCs to complex NPs. The European Journal of Cognitive Psychology 21: 1137–1165.
Acuña-Fariña, C., Carreiras, M. & Meseguer, E. 2014. Gender and number agreement in comprehension in Spanish. Lingua 143: 108–128.
Aguilar, M. & Grillo, N. 2019. Prediction and generation of fine-grained grammatical structure aligns with parsing preferences: The case of relative clauses. Psycholinguistics in Iceland. University of Iceland, Reykjavík, Iceland. https://eprints.whiterose.ac.uk/148037/1/PIPP_Prediction_and_generation_of_fine_grained_grammatical_structure_aligns_with_parsing_preferences_2.pdf.

Albrecht, J.E. & O'Brien, E.J. 1993. Updating a mental model: Maintaining both local and global coherence. Journal of Experimental Psychology: Learning, Memory, and Cognition 19: 1061–1069.
Altmann, G. 1997. The ascent of Babel: An exploration of language, mind, and understanding. Oxford: Oxford University Press.
Altmann, G. 1998. Ambiguity in sentence processing. Trends in Cognitive Science 2 (4): 1–7.
Altmann, G. 2013. Anticipating the garden path: The horse raced past the barn ate the cake. In Sanz, M., Laka, I. & Tanenhaus, M. (eds.), Language down the garden path: The cognitive and biological basis for linguistic structure, 111–130. Oxford: Oxford University Press.
Altmann, G.T.M. & Steedman, M. 1988. Interaction with context during human sentence processing. Cognition 30: 191–281.
Altmann, G.T.M. & Kamide, Y. 1999. Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition 73: 247–264.
Aoshima, S., Phillips, C. & Weinberg, A. 2004. Processing filler-gap dependencies in a head-final language. Journal of Memory and Language 51: 23–54.
Ariel, M. 1990. Accessing NP antecedents. London: Routledge, Croom Helm.
Audring, J. 2014. Gender as a complex feature. Language Sciences. Special issue: Exploring grammatical gender 43: 5–17.
Augurzky, P. 2006. Attaching relative clauses in German: The role of implicit and explicit prosody in sentence processing. Ph.D. thesis, Leipzig: MPI Series in Human Cognitive and Brain Sciences.
Auwera, J. 1985. The predicate relatives of French perception verbs. In Bolkstein, A., de Groot, C. & Mackenzie, J. (eds.), Predicates and terms in functional grammar, 219–234. Dordrecht: Foris.
Baccino, T., De Vincenzi, M. & Job, R. 2000. Cross-linguistic studies of the late closure strategy: French and Italian. In Lombardo, V. & De Vincenzi, M. (eds.), Cross-linguistic perspectives on language processing, 89–118. Dordrecht: Kluwer.
Bach, E. 1982. Purpose clause and control. In Jacobson, P. & Pullum, G. (eds.), The nature of syntactic representation. New York: Academic Press.
Badecker, W. & Kuminiak, F. 2007. Morphology, agreement and working memory retrieval in sentence production: Evidence from gender and case in Slovak. Journal of Memory and Language 56: 65–85.
Bader, M. & Lasser, I. 1994. German verb-final clauses and sentence processing: Evidence for immediate attachment. In Clifton, C., Frazier, L. & Rayner, K. (eds.), Perspectives on sentence processing, 225–242. Hillsdale, NJ: Lawrence Erlbaum.
Baker, M. 2008. The syntax of agreement and concord. Cambridge: Cambridge University Press.
Balota, D.A. & Spieler, D.H. 1999. Word-frequency, repetition, and lexicality effects in word recognition tasks: Beyond measures of central tendency. Journal of Experimental Psychology: General 128: 32–55.
Baltin, M. 2003. The interaction of ellipsis and binding: Implications for the sequencing of principle A. Natural Language and Linguistic Theory 21: 215–246.
Baltin, M. 2006. The non-unity of VP preposing. Language 82: 734–766.
Barber, H. & Carreiras, M. 2005. Grammatical gender and number agreement in Spanish: An ERP comparison. Journal of Cognitive Neuroscience 17 (1): 137–153.
Barber, H., Salillas, E. & Carreiras, M. 2004. Gender or genders agreement? In Carreiras, M. & Clifton, C. (eds.), On-line study of sentence comprehension: Eye-tracking, ERP and beyond, 309–327. Brighton: Psychology Press.

Barker, J., Nicol, J. & Garrett, M. 2001. Semantic factors in the production of number agreement. Journal of Psycholinguistic Research 30: 91–114.
Belletti, A. 1988. The case of unaccusatives. Linguistic Inquiry 19 (1): 1–34.
Belletti, A., Friedmann, N., Brunato, D. & Rizzi, L. 2012. Does gender make a difference? Comparing the effect of gender on children's comprehension of relative clauses in Hebrew and Italian. Lingua 122: 1053–1069.
Beres, A. 2017. Time is of the essence: A review of Electroencephalography (EEG) and Event-Related Brain Potentials (ERPs) in language research. Applied Psychophysiology and Biofeedback 42: 247–255.
Berg, T. 1998. The resolution of number agreement conflicts in English and German agreement patterns. Linguistics 36: 41–70.
Bergmann, A., Armstrong, M. & Maday, K. 2008. Relative clause attachment in English and Spanish: A production study. Proceedings of speech prosody, Campinas, Brazil.
Berwick, R. & Weinberg, A. 1982. Parsing efficiency, computational complexity, and the evaluation of grammatical theories. Linguistic Inquiry 13: 165–291.
Berwick, R. & Weinberg, A. 1983. The role of grammar in models of language use. Cognition 13: 1–61.
Betancort, M., Carreiras, M. & Acuña-Fariña, C. 2006. Processing controlled PROs in Spanish. Cognition 100 (2): 217–282.
Bever, T.G. 1970. The cognitive basis for linguistic structure. In Hayes, R. (ed.), Cognition and the development of language. New York: John Wiley.
Bever, T. 2009. Remarks on the individual basis for linguistic structures. In Piattelli-Palmarini, M., Uriagereka, J. & Salaburu, P. (eds.), Of minds and language: The Basque country encounter with Noam Chomsky, 279–295. Oxford: Oxford University Press.
Bever, T. 2013. The biolinguistics of language universals: The next years. In Sanz, M., Laka, I. & Tanenhaus, M.K. (eds.), Language down the garden path: The cognitive and biological basis of linguistic structures, 235–405. Oxford: Oxford University Press.
Bever, T.G. & McElree, B. 1988. Empty categories access their antecedents during comprehension. Linguistic Inquiry 19: 35–43.
Biondo, N., Vespignani, F., Rizzi, L. & Mancini, S. 2018. Widening agreement processing: A matter of time, features and distance. Language, Cognition and Neuroscience 33 (7): 890–911.
Blaubergs, M.S. & Braine, M.D. 1974. Short-term memory limitations on decoding self-embedded sentences. Journal of Experimental Psychology 102 (4): 745–748.
Blevins, J.P. 2003. Passives and impersonals. Journal of Linguistics 39: 473–520.
Bock, K. 1982. Towards a cognitive psychology of syntax: Information processing contributions to sentence formulation. Psychological Review 79: 1–47.
Bock, K. & Miller, C.A. 1991. Broken agreement. Cognitive Psychology 23: 45–93.
Bock, K. & Cutting, J.C. 1992. Regulating mental energy: Performance units in language production. Journal of Memory and Language 31: 99–127.
Bock, K. & Eberhard, K. 1993. Meaning, sound, and syntax in English number agreement. Language and Cognitive Processes 8: 57–99.
Bock, K., Nicol, J. & Cutting, J.C. 1999. The ties that bind: Creating number agreement in speech. Journal of Memory and Language 40: 330–346.
Bock, K., Eberhard, K., Cutting, J.C., Meyer, A. & Schriefers, H. 2001. Some attractions of verb agreement. Cognitive Psychology 43: 83–128.
Bock, K., Eberhard, K.M. & Cutting, J.C. 2004. Producing number agreement: How pronouns equal verbs. Journal of Memory and Language 51: 251–278.

Bock, K., Cutler, A., Eberhard, K., Butterfield, S., Cutting, J.C. & Humphreys, K. 2006. Number agreement in British and American English. Language 82 (1): 64–113.
Bock, K. & Middleton, E.L. 2011. Reaching agreement. Natural Language & Linguistic Theory 29: 1033–1069.
Bock, K., Carreiras, M. & Meseguer, E. 2012. Number meaning and number grammar. Language and Cognitive Processes 26 (4/5/6): 509–529.
Boeckx, C., Hornstein, N. & Nunes, J. 2010. Control as movement. Cambridge: Cambridge University Press.
Bögels, S., Schriefers, H., Vonk, W., Chwilla, D.J. & Kerkhofs, R. 2013. Processing consequences of superfluous and missing prosodic breaks in auditory sentence comprehension. Neuropsychologia 51: 2715–2728.
Boland, J.E., Tanenhaus, M.K. & Garnsey, S.M. 1990. Evidence for the immediate use of verb control information in sentence processing. Journal of Memory and Language 29: 413–432.
Bornkessel, I. & Schlesewsky, M. 2006. The extended argument dependency model: A neurocognitive approach to sentence comprehension across languages. Psychological Review 113: 787–821.
Bošković, Ž. & Takahashi, D. 1998. Scrambling and last resort. Linguistic Inquiry 29: 347–366.
Bornkessel-Schlesewsky, I. & Schlesewsky, M. 2009. Processing syntax and morphology: A neurocognitive perspective. Oxford: Oxford University Press.
Bradley, D.C. & Foster, K.L. 1987. A reader's view of listening. In Frauenfelder, U.H. & Tyler, L.K. (eds.), Spoken word recognition, 103–133. Cambridge: Cambridge University Press.
Bradley, M.M. & Lang, P.J. 1994. Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavioral Therapy & Experimental Psychiatry 25: 49–59.
Bradley, M.M. & Lang, P.J. 1999. Affective norms for English words (ANEW): Instruction manual and affective ratings. Gainesville, FL: Center for Research in Psychophysiology, University of Florida.
Brainard, D.H. & Hurlbert, A.C. 2015. Colour vision: Understanding the dress. Current Biology 25 (13): R551–R554.
Branigan, H.P., Pickering, M.J., Liversedge, S.P., Stewart, A.J. & Urbach, T.P. 1995. Syntactic priming: Investigating the mental representation of language. Journal of Psycholinguistic Research 24 (6): 489–506.
Brehm, L. & Bock, K. 2013. What counts in grammatical number agreement? Cognition 128: 149–169.
Bresnan, J. 2001. Explaining morphosyntactic competition. In Baltin, M. & Collins, C. (eds.), Handbook of contemporary syntactic theory, 11–44. Oxford: Blackwell.
Bresnan, J. 1982. The theory of complementation in English syntax. Doctoral dissertation. MIT Press.
Brysbaert, M. & Mitchell, D. 1996. Modifier attachment in sentence parsing: Evidence from Dutch. Quarterly Journal of Experimental Psychology 49A: 664–695.
Bybee, J. 1985. Morphology: A study of the relation between meaning and form (Typological studies in language 99). Amsterdam: John Benjamins.
Bybee, J. 2010. Language, usage and cognition. Cambridge: Cambridge University Press.
Burzio, L. 1981. Intransitive verbs and Italian auxiliaries. Doctoral dissertation. Cambridge: MIT Press.
Cacciari, C., Corradini, P., Padovani, R. & Carreiras, M. 2011. Pronoun resolution in Italian: The role of grammatical gender and context. Journal of Cognitive Psychology 23 (4): 416–434.

Caffarra, S., Siyanova-Chanturia, A., Pesciarelli, F. & Cacciari, C. 2015. Is the noun ending a cue to grammatical gender processing? An ERP study on sentences in Italian. Psychophysiology 52: 1019–1030.
Caffarra, S., Mendoza, M. & Davidson, D. 2019. Is the LAN effect in morphosyntactic processing an ERP artifact? Brain & Language 191: 9–16. https://doi.org/10.1016/j.bandl.2019.01.003. Epub 2019 Feb 4.
Canal, P., Garnham, A. & Oakhill, J. 2015. Beyond gender stereotypes in language comprehension: Self sex-role descriptions affect the brain's potentials associated with agreement processing. Frontiers in Psychology 6: 1953.
Caplan, D. & Waters, G. 1999. Verbal working memory and sentence comprehension. Behavioral and Brain Sciences 22: 77–126.
Caplan, D. & Waters, G. 2013. Memory mechanisms supporting syntactic comprehension. Psychonomic Bulletin & Review 20: 243–268.
Caramazza, A., Miozzo, M., Costa, A., Schiller, N. & Alario, F.-X. 2001. A cross-linguistic investigation of determiner production. In Dupoux, E. (ed.), Language, brain and cognitive development: Essays in honor of Jacques Mehler, 208–226. Cambridge, MA: MIT Press.
Carlson, K., Clifton, C. & Frazier, L. 2001. Prosodic boundaries in adjunct attachment. Journal of Memory & Language 45 (1): 58–81.
Carminati, M.N. 2005. Processing reflexes of the feature hierarchy and implications for linguistic theory. Lingua 115: 259–285.
Carreiras, M. 1992. Estrategias de análisis sintáctico en el procesamiento de frases: Cierre temprano versus cierre último. Cognitiva 4 (1): 3–27.
Carreiras, M. & Gernsbacher, M.A. 1992. Comprehending conceptual anaphors in Spanish. Language and Cognitive Processes 7 (3–4): 281–299.
Carreiras, M. & Clifton, C. 1993. Relative clause interpretation preferences in Spanish and English. Language and Speech 36: 353–372.
Carreiras, M., Garnham, A., Oakhill, J.V. & Cain, K. 1996. The use of stereotypical gender information in constructing a mental model: Evidence from English and Spanish. Quarterly Journal of Experimental Psychology 49A: 639–663.
Carreiras, M. & Clifton, C. 1999. Another word on parsing relative clauses: Eye-tracking evidence from Spanish and English. Memory and Cognition 27: 826–833.
Carreiras, M. & Clifton, C. Jr. 2004. On the on-line study of language comprehension. In Carreiras, M. & Clifton, C. Jr. (eds.), The on-line study of sentence comprehension: Eye-tracking, ERP, and beyond. Brighton: Psychology Press.
Carreiras, M., Salillas, E. & Barber, H. 2004. Event-related potentials elicited during parsing of ambiguous relative clauses in Spanish. Cognitive Brain Research 20 (1): 98–105.
Casalicchio, J. 2013. Pseudorelative, gerundi e infiniti nelle varietà romanze: Affinità solo superficiali e corrispondenze strutturali. Doctoral dissertation. Università degli Studi di Padova.
Chierchia, G. 1988. Structured meanings, thematic roles, and control. In Chierchia, G., Partee, B. & Turner, R. (eds.), Properties, types, and meaning (Vol. 2), 131–136. Dordrecht: Kluwer.
Chierchia, G. 2006. Broaden your views: Implicatures of domain widening and the 'logicality' of language. Linguistic Inquiry 37: 535–590.
Chomsky, C. 1969. The acquisition of syntax in children from 5 to 10. Cambridge, MA: MIT Press.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. 1973. Conditions on transformations. In Anderson, S.R. & Kiparsky, P. (eds.), A festschrift for Morris Halle. New York: Holt, Rinehart and Winston.

Chomsky, N. 1980. On binding. Linguistic Inquiry 11: 1–46.
Chomsky, N. 1981. Lectures on government and binding: The Pisa lectures. Dordrecht: Foris.
Chomsky, N. 1986. Knowledge of language. New York: Praeger.
Chomsky, N. 1995. The minimalist program. Cambridge, MA: MIT Press.
Chomsky, N. 2000. Minimalist enquiries: The framework. In Martin, R., Michaels, D. & Uriagereka, J. (eds.), Step by step: Essays on minimalist syntax: In honour of Howard Lasnik. Cambridge, MA: MIT Press.
Chomsky, N. 2000. New horizons in the study of language and mind. Cambridge, MA: Cambridge University Press.
Chomsky, N. 2001. Beyond explanatory adequacy. Unpublished manuscript. MIT Press.
Chomsky, N. & Lasnik, H. 1977. Filters and control. Linguistic Inquiry 8 (3): 425–504.
Chomsky, N. & Lasnik, H. 1993. The theory of principles and parameters. In Jacobs, J., von Stechow, A., Sternefeld, W. & Vennemann, T. (eds.), Syntax: An international handbook of contemporary research, 506–569. Berlin: De Gruyter.
Chow, W.Y., Smith, C., Lau, E. & Phillips, C. 2016. A "bag-of-arguments" mechanism for initial verb predictions. Language, Cognition and Neuroscience 31 (5): 577–596.
Chow, W.Y., Lau, E., Wang, S. & Phillips, C. 2018. Wait a second! Delayed impact of argument roles on on-line verb prediction. Language, Cognition and Neuroscience 33 (7): 1–26.
Christianson, K., Hollingworth, A., Halliwell, J.F. & Ferreira, F. 2001. Thematic roles assigned along the garden path linger. Cognitive Psychology 42: 368–407.
Christiansen, M.H. & Chater, N. 2016. The now-or-never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences 39: 1–19.
Cinque, G. 1988. On SI constructions and the theory of ARB. Linguistic Inquiry 19: 521–581.
Cinque, G. 1992. The pseudo-relative and Acc-ing constructions after verbs of perception. University of Venice working papers in linguistics. Università di Venezia.
Cinque, G. 1995. Italian syntax and universal grammar. Cambridge: Cambridge University Press.
Clahsen, H. & Featherston, S. 1999. Antecedent priming at trace positions: Evidence from German scrambling. Journal of Psycholinguistic Research 28: 415–437.
Clark, H.H. & Clark, E.V. 1968. Semantic distinctions and memory for complex sentences. Quarterly Journal of Experimental Psychology 20: 129–138.
Clifton, C. 1988. Restrictions on late closure: Appearance and reality. Paper presented at the 6th Australian language and speech conference, August, Sydney.
Clifton, C. & Frazier, L. 1989. Comprehending sentences with long-distance dependencies. In Carlson, G.N. & Tanenhaus, M.K. (eds.), Linguistic structure in language processing, 273–317. Dordrecht: Kluwer.
Clifton, C. & Ferreira, F. 1989. Parsing in context. Language and Cognitive Processes 4: SI 77–103.
Clifton, C., Carlson, K. & Frazier, L. 2002. Informative prosodic boundaries. Language and Speech 45: 87–114.
Cloitre, M. & Bever, T.G. 1988. Linguistic anaphors, levels of representation, and discourse. Language and Cognitive Processes 3: 293–322.
Cokal, D. & Ferreira, F. 2018. Sentence processing. In Hickok, G. & Small, S. (eds.), Neurobiology of Language, 265–274. Oxford: Elsevier.
Conway, A.R.A., Kane, M.J., Bunting, M.F., Hambrick, D.Z., Wilhelm, O. & Engle, R.W. 2005. Working memory span tasks: A review and a user's guide. Psychonomic Bulletin and Review 12: 769–786.

Cook, V.J. 1975. Strategies in the comprehension of relative clauses. Language and Speech 18: 204–212.
Coulson, S., King, J. & Kutas, M. 1998. Expect the unexpected: Event-related brain response to morphosyntactic violations. Language and Cognitive Processes 13 (1): 21–58.
Copestake, A. 2002. Implementing typed feature structure grammars. Stanford, CA: CSLI Publications.
Corbett, G.S. 1979. The agreement hierarchy. Journal of Linguistics 15: 203–224.
Corbett, G.S. 1991. Gender. Cambridge: Cambridge University Press.
Corbett, G.S. 2000. Number. Cambridge: Cambridge University Press.
Corbett, G. 2006. Agreement. Cambridge: Cambridge University Press.
Corbett, G. 2013a. Number of genders. In Dryer, M.S. & Haspelmath, M. (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Corbett, G. 2013b. Sex-based and non-sex-based gender systems. In Dryer, M.S. & Haspelmath, M. (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Corley, M.M.B., Mitchell, D.C. & Cuetos, F. 1993. Parsing biases for non-structural differences: Evidence for the use of statistics in parsing. Paper presented at the Conference on Psychology of Language and Communication, Glasgow, 31 August–3 September.
Corley, M.M.B. 1996. The role of statistics in human sentence processing. Doctoral dissertation. University of Exeter.
Courteau, É., Martignetti, L., Royle, P. & Steinhauer, K. 2020. Corrigendum: Eliciting ERP components for morphosyntactic agreement mismatches in perfectly grammatical sentences. Frontiers in Psychology 11: 860. https://doi.org/10.3389/fpsyg.2020.00860.
Cowan, N. 2001. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences 24 (1): 87–114.
Crain, S. & Fodor, J.D. 1985. How can grammars help parsers? In Dowty, D.R., Karttunen, L. & Zwicky, A.M. (eds.), Natural language parsing. New York: Cambridge University Press.
Crain, S. & Steedman, M. 1985. On not being led up the garden path: The use of context by the psychological parser. In Dowty, D.R., Karttunen, L. & Zwicky, A.M. (eds.), Natural language parsing. New York: Cambridge University Press.
Crocker, M.W. 1994. On the nature of the principle-based sentence processor. In Clifton, C. Jr., Frazier, L. & Rayner, K. (eds.), Perspectives on sentence processing, 245–266. Hillsdale, NJ: Erlbaum.
Croft, W. 1988. Agreement vs. case marking and direct objects. In Barlow, M. & Ferguson, C.A. (eds.), Agreement in natural language: Approaches, theories, descriptions, 159–180. Stanford: Center for the Study of Language and Information.
Croft, W. 2001. Radical construction grammar. New York: Oxford University Press.
Cuetos, F. & Mitchell, D.C. 1988. Cross-linguistic differences in parsing: Restrictions on the use of the late closure strategy in Spanish. Cognition 30: 73–105.
Cuetos, F., Mitchell, D.C. & Corley, M.M.B. 1996. Parsing in different languages. In Carreiras, M., García-Albea, J.E. & Sebastián-Gallés, N. (eds.), Language processing in Spanish, 145–187. Hillsdale, NJ: Lawrence Erlbaum Associates Inc.
Culicover, P.W. & Jackendoff, R. 2001. Control is not movement. Linguistic Inquiry 32 (3): 493–511.

Culicover, P.W. & Jackendoff, R. 2005. Simpler syntax. Oxford: Oxford University Press.
Cunnings, I. & Sturt, P. 2014. Coargumenthood and the processing of reflexives. Journal of Memory and Language 75: 117–139.
Cunnings, I. & Sturt, P. 2018. Retrieval interference and semantic interpretation. Journal of Memory and Language 102: 16–27.
Cutler, A., Dahan, D. & van Donselaar, W. 1997. Prosody in the comprehension of spoken language: A literature review. Language and Speech 40 (2): 141–201.
Dahl, Ö. 2004. The growth and maintenance of linguistic complexity. Amsterdam: John Benjamins.
Daneman, M. & Carpenter, P. 1980. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior 19: 450–466.
De Baecke, C., Brysbaert, M. & Desmet, T. 2000. The importance of structural and nonstructural variables in modifier attachment: A corpus study in Dutch. Poster presented at the AMLaP-2000, Leiden, The Netherlands.
De Dios-Flores, I. 2019. Processing sentences with multiple negations: Grammatical structures that are perceived as unacceptable. Frontiers in Psychology 10: 2346. https://doi.org/10.3389/fpsyg.2019.02346.
De Dios, I. 2021. Processing long-distance dependencies: An experimental investigation of grammatical illusions in English and Spanish. Doctoral Dissertation. University of Santiago de Compostela.
De Santo, A. 2019. Testing a minimalist grammar parser on Italian relative clause asymmetries. Proceedings of the workshop on cognitive modeling and computational linguistics, 93–104. Minneapolis.
Deevy, P. 2000. Agreement checking in comprehension. Evidence from relative clauses. Journal of Psycholinguistic Research 29: 67–79.
Delle Luche, C., van Gompel, R.P.G., Gayraud, F. & Martinie, B. 2006. Effect of relative pronoun type on relative clause attachment. In Artstein, R. & Poesio, M. (eds.), Ambiguity in anaphora workshop proceedings, 23–30. ESSLLI, Málaga, Spain, August 7–11.
DeLong, K.A., Urbach, T.P. & Kutas, M. 2005. Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nature Neuroscience 8: 1117–1145.
Demestre, J., Meltzer, S., García-Albea, J.E. & Vigil, A. 1999. Identifying the null subject: Evidence from event-related brain potential. Journal of Psycholinguistic Research 28: 293–312.
Den Dikken, M. 2003. Agreement. In Nadel, L. (ed.), Encyclopedia of cognitive science. London: Macmillan.
Desmet, T., Brysbaert, M. & De Baecke, K. 2002. The correspondence between sentence production and corpus frequencies in modifier attachment. The Quarterly Journal of Experimental Psychology 55A: 879–896.
Desmet, T., De Baecke, C., Drieghe, D., Brysbaert, M. & Vonk, W. 2006. Relative clause attachment in Dutch: On-line comprehension corresponds to corpus frequencies when lexical variables are taken into account. Language and Cognitive Processes 21: 453–485.
De Vincenzi, M. 1991. Syntactic parsing strategies in Italian: The minimal chain principle (Vol. 12). Dordrecht: Springer Science & Business Media.
De Vincenzi, M. & Job, R. 1993. Some observations on the universality of the late closure strategy. Journal of Psycholinguistic Research 22: 189–206.
Dye, M., Milin, P., Futrell, R. & Ramscar, M. 2018. Alternative solutions to a language design problem: The role of adjectives and gender marking in efficient communication. Topics in Cognitive Science 10: 209–224.

Díaz, B., Sebastián-Gallés, N., Erdocia, K., Mueller, J. & Laka, I. 2011. On the cross-linguistic validity of electrophysiological correlates of morphosyntactic processing: A study of case and agreement violations in Basque. Journal of Neurolinguistics 24: 357–373.
Díaz-Lago, M., Fraga, I. & Acuña-Fariña, C. 2015. Time course of gender agreement violations containing emotional words. Journal of Neurolinguistics 36: 79–93.
Dillon, B., Mishler, A., Sloggett, S. & Phillips, C. 2013. Contrasting intrusion profiles for agreement and anaphora: Experimental and modeling evidence. Journal of Memory and Language 69: 85–103.
Donders, F.C. 1868. Die Schnelligkeit psychischer Processe. Archiv für Anatomie, Physiologie und wissenschaftliche Medizin 6: 657–681. English translation (1969). On the speed of mental processes (W.G. Koster, Trans.). Acta Psychologica 30: 412–431.
Dowty, D. & Jacobson, P. 1989. Agreement as a semantic phenomenon. ESCOL '88, 95–108. Columbus, OH: Ohio State University.
Drenhaus, H., Saddy, D. & Frisch, S. 2005. Processing negative polarity items: When negation comes through the backdoor. In Kepser, S. & Reis, M. (eds.), Linguistic evidence: Empirical, theoretical, and computational perspectives, 145–165. Berlin: Mouton de Gruyter.
Drury, J.E., Baum, S.R., Valeriote, H. & Steinhauer, K. 2016. Punctuation and implicit prosody in silent reading: An ERP study investigating English garden-path sentences. Frontiers in Psychology 7: 1375.
Dwivedi, V.D. 2013. Interpreting quantifier scope ambiguity: Evidence of heuristic first, algorithmic second processing. PLoS One 8: 1–20.
Eberhard, K. 1997. The marked effect of number on subject-verb agreement. Journal of Memory and Language 36: 147–164.
Eberhard, K., Cutting, J.C. & Bock, K. 2005. Making syntax of sense: Number agreement in sentence production. Psychological Review 112 (3): 531–559.
Ehrlich, S.F. & Rayner, K. 1981. Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior 20: 641–655.
Ehrlich, K., Fernández, E., Fodor, J., Stenshoel, E. & Vinereanu, M. 1999. Low attachment of relative clauses: New data from Swedish, Norwegian and Romanian. Poster presented at the 12th Annual CUNY conference on human sentence processing. New York, March 18–20.
Erdocia, K., Laka, I., Mestres-Missé, A. & Rodriguez-Fornells, A. 2009. Syntactic complexity and ambiguity resolution in a free word order language: Behavioral and electrophysiological evidence from Basque. Brain and Language 109: 1–17.
Eschenbach, E., Habel, C., Herweg, M. & Rehkämper, K. 1989. Remarks on plural anaphora. Proceedings of the fourth conference of the European chapter of the association for computational linguistics, 161–167. Manchester.
Fanselow, G., Kliegl, R. & Schlesewsky, M. 1999. Processing difficulty and principles of grammar. In Kemper, S. & Kliegl, R. (eds.), Constraints on language: Aging, grammar, and memory, 171–201. Boston: Kluwer.
Fanselow, G., Schlesewsky, M., Cavar, D. & Kliegl, R. 1999. Optimal parsing, syntactic parsing preferences, and optimality theory. Rutgers Optimality Archive, ROA-367-1299.
Fauconnier, G. 1975. Polarity and the scale principle. Proceedings of CLS 11: 188–199.
Fay, N., Garrod, S. & Roberts, L. 2008. The fitness and functionality of culturally evolved communication systems. Philosophical Transactions of the Royal Society of Biological Sciences 363 (1509): 3553–3561.
Fayol, M., Largy, P. & Lemaire, P. 1994. Cognitive overload and orthographic errors: When cognitive overload enhances subject–verb agreement errors: A study in French written language. The Quarterly Journal of Experimental Psychology 47A: 437–464.

Featherston, S., Gross, M., Munte, T.F. & Clahsen, H. 2000. Brain potentials in the processing of complex sentences: An ERP study of control and raising constructions. Journal of Psycholinguistic Research 29: 141–154. Fedorenko, E., Piantadosi, S. & Gibson, E. 2012. Processing relative clauses in supportive contexts. Cognitive Science 36 (3): 471–497. Ferguson, C.A. 1964. Baby talk in six languages. Language 66: 103–114. Ferguson, C. & Barlow, M. 1988. Introduction. In Barlow, M. & Ferguson, C. (eds.), Agreement in natural language, 1–22. Stanford, CA: CSLI Publications. Fernandez, E. 1998. Language dependency in parsing: Evidence from monolingual and bilingual processing. Psychologica Belgica 38–3 (4): 197–230. Ferreira, F. 2003. The misinterpretation of noncanonical sentences. Cognitive Psychology 47: 164–203. Ferreira, F. 2005. Psycholinguistics, formal grammars, and cognitive science. Linguistic Review 22: 365–380. Ferreira, F. & Clifton, C. 1986. The independence of syntactic processing. Journal of Memory and Language 25: 348–368. Ferreira, F. & Henderson, J. 1991. Recovery from misanalyses of garden-path sentences. Journal of Memory and Language 6: 725–745. Ferreira, F. & Engelhardt, P. 2006. Syntax and production. In Gernsbacher, M.A. & Traxler, M.J. (eds.), Handbook of Psycholinguistics, 61–91. Oxford: Elsevier. Ferreira, F. & Patson, N. 2007. The ‘good enough’ approach to language comprehension. Language and Linguistics Compass 1 (1–2): 71–83. Ferreira, F. & Nye, J. 2018. The modularity of sentence processing reconsidered. In de Almeida, R.G. & Gleitman, L.R. (eds.), On concepts, modules, and language: Cognitive science at its core, 63–86. Oxford: Oxford University Press. Fiebach, C.J. 2001. Working memory and syntax during sentence processing. Doctoral dissertation. University of Leipzig. Fiebach, C.J., Schlesewsky, M. & Friederici, A. 2001. Syntactic working memory and the establishment of fller-gap dependencies: Insights of ERPs and fMRI. 
Journal of Psycholinguistic Research 30 (3): 321–338. Fiebach, C.J., Schlesewsky, M. & Friederici, A.D. 2002. Separating syntactic memory costs and syntactic integration costs during parsing: The processing of German WH-questions. Journal of Memory and Language 47 (2): 250–272. Fodor, J.D. 1988. On modularity in syntactic processing. Journal of Psycholinguistic Research 17 (2): 123–168. Fodor, J.D. 1993. Processing empty categories: A question of visibility. In Altmann, G. & Shillcock, R. (eds.), Cognitive models of speech processing: The second Sperlonga meeting, 351–400. Hove: Erlbaum. Fodor, J.D. 1998. Learning to parse. Journal of Psycholinguistic Research 27 (2): 285–319. Fodor, J.D. 2002. Prosodic disambiguation in silent reading. Proceedings of the North East Linguistic Society 32: 113–132. Fodor, J.D. 2013. Pronouncing and comprehending center-embedded sentences. In Sanz, M., Laka, I. & Tanenhaus, M.K. (eds.), Language down the garden path: The cognitive and biological basis of linguistic structures, 206–228. Oxford: Oxford University Press. Fodor, J.A. & Garrett, M.F. 1966. Some reflections on competence and performance. In Lyons, J. & Wales, R.J. (eds.), Psycholinguistic papers, 135–179. Edinburgh: University of Edinburgh Press. Fodor, J.A., Bever, T. & Garrett, M. 1974. The psychology of language. New York: McGraw-Hill.

Fodor, J.D. & Inoue, A. 2000. Garden path reanalysis: Attach (anyway) and revision as last resort. In De Vincenzi, M. & Lombardo, V. (eds.), Cross-linguistic perspectives on language processing, 21–61. Dordrecht, The Netherlands: Kluwer. Foote, R. & Bock, K. 2012. The role of morphology in subject-verb number agreement: A comparison of Mexican and Dominican Spanish. Language and Cognitive Processes 27 (3): 429–461. Ford, M., Bresnan, J. & Kaplan, R.M. 1982. A competence-based theory of syntactic closure. In Bresnan, J. (ed.), The mental representation of grammatical relations, 727–796. Cambridge, MA: MIT Press. Fraga, I., Piñeiro, A., Acuña-Fariña, C., Redondo, J. & García-Orza, J. 2012. Emotional nouns affect attachment decisions in sentence completion tasks. The Quarterly Journal of Experimental Psychology 65: 1740–1759. Fraga, I., Padrón, I., Acuña-Fariña, C. & Díaz-Lago, M. 2017. Processing gender agreement and word emotionality: New electrophysiological and behavioural evidence. Journal of Neurolinguistics 44: 203–222. Francis, W.N. 1986. Proximity concord in English. Journal of English Linguistics 19: 309–317. Franck, J. 2011. Reaching agreement as a core syntactic process: Commentary of Bock and Middleton ‘reaching agreement’. Natural Language and Linguistic Theory 29 (4): 1071–1086. Franck, J. 2016. Review of language down the garden path: The cognitive and biological basis for linguistic structures. In Sanz, M., Laka, I. & Tanenhaus, M.K. (eds.), Language down the garden path: The cognitive and biological basis of linguistic structures, 222–226. Oxford: Oxford University Press. Franck, J., Vigliocco, G. & Nicol, J. 2002. Subject-verb agreement errors in French and English: The role of syntactic hierarchy. Language and Cognitive Processes 17 (4): 371–404. Franck, J., Lassi, G., Frauenfelder, U. & Rizzi, L. 2006. Agreement and movement: A syntactic analysis of attraction. Cognition 101: 173–216. Franck, J., Vigliocco, G., Antón-Méndez, I., Collina, S.
& Frauenfelder, U.H. 2008. The interplay of syntax and form in language production: A cross-linguistic study of form effects on agreement. Language and Cognitive Processes 23: 329–374. Franck, J., Soare, G., Frauenfelder, U. & Rizzi, L. 2010. Object interference in subject-verb agreement: The role of intermediate traces of movement. Journal of Memory and Language 62: 166–182. Franck, J., Colonna, S. & Rizzi, L. 2015. Task-dependency and structure-dependency in number interference effects in sentence comprehension. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2015.00349. Franck, J. & Wagers, M. 2020. Hierarchical structure and memory mechanisms in agreement attraction. PLoS One 15 (5): e0232163. https://doi.org/10.1371/journal.pone.0232163. Frazier, L. 1978. On comprehending sentences: Syntactic parsing strategies. Unpublished doctoral dissertation, University of Connecticut, Storrs, CT. Distributed by the Indiana University Linguistics Club, Bloomington, IN. Frazier, L. 1987. Syntactic processing: Evidence from Dutch. Natural Language and Linguistic Theory 5: 519–559. Frazier, L. 1990. Parsing modifiers: Special purpose routines in the HSPM? In Balota, D., Flores D’Arcais, G.B. & Rayner, K. (eds.), Comprehension processes in reading. Hillsdale, NJ: Erlbaum.

Frazier, L. 1995. Constraint satisfaction as a theory of sentence processing. Journal of Psycholinguistic Research 24: 437–468. Frazier, L. & Fodor, J.D. 1978. The sausage machine: A new two-stage parsing model. Cognition 6: 291–325. Frazier, L. & Rayner, K. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14: 178–210. Frazier, L. & Clifton, C. Jr. 1989. Successive cyclicity in the grammar and the parser. Language and Cognitive Processes 4: 93–126. Frazier, L., Clifton, C. & Randall, J. 1983. Filling gaps: Decision principles and structure in sentence comprehension. Cognition 13 (2): 187–222. Frazier, L. & Flores D’Arcais, G. 1989. Filler driven parsing: A study of gap filling in Dutch. Journal of Memory and Language 28: 331–344. Frazier, L. & Clifton, C. 1996. Construal. Cambridge, MA: MIT Press. Frazier, L. & Clifton, C. 1997. Construal: Overview, motivation and new evidence. Journal of Psycholinguistic Research 26: 277–296. Frenck-Mestre, C. & Pynte, J. 2000a. Resolving syntactic ambiguities: Crosslinguistic differences? In De Vincenzi, M. & Lombardo, V. (eds.), Cross-linguistic perspectives on language processing, 119–148. Dordrecht: Kluwer Academic Publishers. Frenck-Mestre, C. & Pynte, J. 2000b. Romancing syntactic ambiguity: Why the French and the Italians don’t see eye to eye. In Kennedy, A., Radach, R., Heller, D. & Pynte, J. (eds.), Reading as a perceptual process, 549–564. Oxford: Elsevier. Friederici, A.D. & Weissenborn, J. 2007. Mapping sentence form onto meaning: The syntax-semantic interface. Brain Research 1146: 50–58. Frisch, S., Schlesewsky, M., Saddy, D. & Alpermann, A. 2002. The P600 as an indicator of syntactic ambiguity. Cognition 85: B83–B92. Frisson, S. 2009. Semantic underspecification in language processing. Language and Linguistics Compass 3 (1): 111–127. Frisson, S. & Pickering, M. 2001.
Obtaining a figurative interpretation of a word: Support for underspecification. Metaphor and Symbol 16 (3): 149–171. Fromont, L., Soto-Faraco, S. & Biau, E. 2017. Searching high and low: Prosodic breaks disambiguate relative clauses. Frontiers in Psychology 8: 96. Futrell, R., Gibson, E. & Levy, R.P. 2020. Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing. Cognitive Science 44: e12814. Galantucci, B. 2009. Experimental semiotics: A new approach for studying communication as a form of joint action. Topics in Cognitive Science 1 (2): 393–410. Garnham, A. & Oakhill, J.V. 1985. On-line resolution of anaphoric pronouns: Effects of inference making and verb semantics. British Journal of Psychology 76: 385–393. Garnham, A., Oakhill, J. & Cain, K. 1997. The interpretation of anaphoric noun phrases: Time course and effects of overspecificity. Quarterly Journal of Experimental Psychology Section A: Human Experimental Psychology 50 (1): 149–162. Garnsey, S.M., Pearlmutter, N., Meyers, E. & Lotocky, M.A. 1997. The contribution of verb-bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language 37: 58–93. Garrett, M.F. 1975. The analysis of sentence production. In Bower, G.H. (ed.), The psychology of learning and motivation, 133–171. New York: Academic Press. Garrett, M.F. 1980. Levels of processing in sentence production. In Butterworth, B. (ed.), Language production, 177–220. London: Academic Press.

Garrod, S. & Terras, M. 2000. The contribution of lexical and situational knowledge to resolving discourse roles: Bonding and resolution. Journal of Memory and Language 42: 526–544. Gazdar, G. 1981. Unbounded dependencies and coordinate structure. Linguistic Inquiry 12: 155–184. Gazdar, G., Klein, E., Pullum, G. & Sag, I.A. 1985. Generalized phrase structure grammar. Oxford: Blackwell. Gennari, S.P. & MacDonald, M.C. 2008. Semantic indeterminacy in object relative clauses. Journal of Memory and Language 58: 161–187. Gennari, S.P. & MacDonald, M.C. 2009. Linking production and comprehension processes: The case of relative clauses. Cognition 111: 1–23. Gernsbacher, M.A. 1991. Comprehending conceptual anaphors. Language and Cognitive Processes 6: 81–105. Gibson, E. 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition 68: 1–76. Gibson, E. 2000. Dependency locality theory: A distance-based theory of linguistic complexity. In Marantz, A., Miyashita, Y. & O’Neil, W. (eds.), Image, language, brain: Papers from the first mind articulation project symposium, 95–126. Cambridge, MA: MIT Press. Gibson, E. & Hickok, G. 1993. Sentence processing with empty categories. Language and Cognitive Processes 8: 147–161. Gibson, E., Pearlmutter, N., Canseco-Gonzalez, E. & Hickok, G. 1996. Recency preference in the human sentence processing mechanism. Cognition 59 (1): 23–59. Gibson, E. & Wu, K.-H.I. 2013. Processing Chinese relative clauses in context. Language and Cognitive Processes 28: 125–155. Gibson, E., Tily, H. & Fedorenko, E. 2013. The processing complexity of English relative clauses. In Sanz, M., Laka, I. & Tanenhaus, M.K. (eds.), Language down the garden path: The cognitive and biological basis of linguistic structures, 149–173. Oxford: Oxford University Press. Gibson, E., Bergen, L. & Piantadosi, S.T. 2013. Rational integration of noisy evidence and prior semantic expectations in sentence interpretation.
Proceedings of the National Academy of Sciences 110 (20): 8051–8056. Gibson, E., Futrell, R., Piantadosi, S.T., Dautriche, I., Mahowald, K., Bergen, L. & Levy, R. 2019. How efficiency shapes human language. Trends in Cognitive Sciences 23 (5): 389–407. Gilboy, E., Sopena, J.M., Clifton, C. & Frazier, L. 1995. Argument structure and association preferences in Spanish and English compound NPs. Cognition 54: 131–167. Gilboy, E. & Sopena, J.M. 1996. Segmentation effects in the processing of complex noun phrases with relative clauses. In Carreiras, M., García-Albea, J.E. & Sebastián-Gallés, N. (eds.), Language processing in Spanish, 191–206. Mahwah, NJ: Lawrence Erlbaum. Gillespie, M. & Pearlmutter, N.J. 2011. Hierarchy and scope of planning in subject-verb agreement production. Cognition 118: 377–397. Gillespie, M. & Pearlmutter, N.J. 2013. Against structural constraints in subject-verb agreement production. Journal of Experimental Psychology: Learning, Memory, and Cognition 39 (2): 515. Givón, T. 1993. English grammar: A function-based introduction. Amsterdam: John Benjamins.

Goldberg, A. 1995. Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press. Goldberg, A. 2006. Constructions at work: The nature of generalization in language. New York: Oxford University Press. Goldberg, A. 2019. Explain me this: Creativity, competition and the partial productivity of constructions. Princeton: Princeton University Press. Goldberg, A. & Bencini, G. 2005. Support from language processing for a constructional approach to grammar. In Tyler, A. (ed.), Language in use: Cognitive and discourse perspectives on language and language learning, 3–18. Washington, DC: Georgetown University Press. Goodall, G. 2016. The D-linking effect on extraction from islands and non-islands. Frontiers in Psychology 5: 1493. Gordon, P., Hendrick, R. & Johnson, M. 2001. Memory interference during language processing. Journal of Experimental Psychology: Learning, Memory and Cognition 27: 1411–1423. Gordon, P. & Lowder, M.W. 2012. Complex sentence processing: A review of theoretical perspectives on the comprehension of relative clauses. Language and Linguistics Compass 6: 403–415. Gorrell, P. 1993. Evaluating the direct association hypothesis: A reply to Pickering and Barry (1991). Language and Cognitive Processes 8 (2): 129–146. Gouvea, A., Phillips, C., Kazanina, N. & Poeppel, D. 2010. The linguistic processes underlying the P600. Language and Cognitive Processes 25 (2): 149–188. Grant, M., Sloggett, S. & Dillon, B. 2020. Processing ambiguities in attachment and pronominal reference. Glossa: A Journal of General Linguistics 5 (1): 77. https://doi.org/10.5334/gjgl.852. Greenberg, J. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In Greenberg, J.H. (ed.), Universals of language, 73–113. Cambridge, MA: MIT Press. Greenberg, J. 1966. Language universals with special reference to feature hierarchies. The Hague: Mouton. Grillo, N. 2009.
Generalized minimality: Feature impoverishment and comprehension deficits in agrammatism. Lingua 119: 1426–1443. Grillo, N. 2012. Local and universal. In Bianchi, V. & Chesi, C. (eds.), Enjoy linguistics! Papers offered to Luigi Rizzi on the occasion of his 60th birthday, 234–245. Siena: CISCL Press. Grillo, N., Tomaz, M., Lourenço Gomes, M. & Santi, A. 2013. Pseudo relatives vs. relative clauses: Greater preference, lower costs. AMLaP (Architectures and mechanisms for language processing). Marseille, France. Grillo, N. & Costa, J. 2014. A novel argument for the universality of parsing principles. Cognition 133 (1): 156–187. Grillo, N. & Spathas, G. 2014. Tense and aspect modulate RC attachment: Testing the PR hypothesis in Greek. 36. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft (DGfS). Grillo, N., Costa, J., Fernandes, B. & Santi, A. 2015. Highs and lows in English attachment. Cognition 144: 116–122. Grillo, N., Hemforth, B., Pozniak, C. & Santi, A. 2015. Pseudo relatives are easier than relative clauses: Eyetracking evidence from tense. AMLaP (Architectures and mechanisms for language processing). Los Angeles.

Guasti, M. 1988. La pseudorelative et les phénomènes d’accord. Rivista di Grammatica Generativa 13: 35–57. Guasti, M. 1993. Causatives and perception verbs: A comparative study. Torino: Rosenberg and Sellier. Guasti, T. & Rizzi, L. 2001. Agreement and tense as distinct syntactic positions: Evidence from acquisition. In Cinque, G. (ed.), The structure of DP and IP: The cartography of syntactic structures (Vol. 1), 167–194. New York: Oxford University Press. Grodner, D., Gibson, E. & Watson, D. 2005. The influence of contextual contrast on syntactic processing: Evidence for strong-interaction in sentence comprehension. Cognition 95: 275–296. Haegeman, L. 1991. Introduction to government and binding theory. Oxford: Blackwell. Haegeman, L. 1994. Introduction to government and binding theory (2nd ed.). Oxford: Blackwell. Hagoort, P., Brown, C.M. & Groothusen, J. 1993. The syntactic positive shift (SPS) as an ERP measure of syntactic processing. Language and Cognitive Processes 8 (4): 439–483. Hagoort, P. 2003. How the brain solves the binding problem for language: A neurocomputational model of syntactic processing. NeuroImage 20: 18–29. Hagoort, P., Wassenaar, M.E.D. & Brown, C.M. 2003. Syntax-related ERP effects in Dutch. Cognitive Brain Research 16: 38–50. Haiman, J. 1985. Natural syntax. Cambridge: Cambridge University Press. Hale, J.T. 2011. What a rational parser would do. Cognitive Science 35 (3): 399–443. Hammerly, C.M., Staub, A. & Dillon, B. 2019. The grammaticality asymmetry in agreement attraction reflects response bias: Experimental and modeling evidence. Amherst, MA: University of Massachusetts Amherst. Hare, M., McRae, K. & Elman, J.L. 2004. Admitting that admitting verb sense into corpus analyses makes sense. Language and Cognitive Processes 19: 181–224. Harley, H. & Ritter, E. 2002. Person and number in pronouns: Motivating a feature-geometric analysis. Language 78: 482–526. Hartsuiker, R., Antón-Méndez, I. & Zee, M. 2001.
Object attraction in subject–verb agreement construction. Journal of Memory and Language 45: 546–572. Hartsuiker, R., Schriefers, H., Bock, K. & Kikstra, G. 2003. Morphophonological influences on the construction of subject-verb agreement. Memory and Cognition 31: 1316–1326. Hartsuiker, R. & Barkhuysen, P. 2006. Language production and working memory: The case of subject–verb agreement. Language and Cognitive Processes 21: 181–204. Haskell, T.R. & MacDonald, M.C. 2003. Conflicting cues and competition in subject-verb agreement. Journal of Memory and Language 48: 760–778. Haskell, T., Thornton, R. & MacDonald, M. 2010. Experience and grammatical agreement: Statistical learning shapes number agreement production. Cognition 114: 151–164. Haspelmath, M. 1999. Optimality and diachronic adaptation. Zeitschrift für Sprachwissenschaft 18 (2): 180–205. Hauser, M., Chomsky, N. & Fitch, T. 2002. The faculty of language: What is it, who has it, and how did it evolve? Science 298: 1569–1579. Hawkins, J. 1994. A performance theory of order and constituency. Cambridge: Cambridge University Press. Hawkins, J. 2004. A performance theory of word order and constituency. Cambridge: Cambridge University Press.

Hemforth, B., Konieczny, L. & Scheepers, C. 2000. Syntactic attachment and anaphor resolution: Two sides of relative clause attachment. In Crocker, M., Pickering, M. & Clifton, C. Jr. (eds.), Architectures and mechanisms for language processing, 259–281. Cambridge: Cambridge University Press. Hemforth, B., Fernández, E., Clifton, C.F. & Frazier, L. 2015. Relative clause attachment in German, English, Spanish and French: Effects of position and length. Lingua 166: 43–64. Hestvik, A., Maxfield, N., Schwartz, R.G. & Shafer, V. 2007. Brain responses to filled gaps. Brain and Language 100: 301–316. Hirotani, M., Frazier, L. & Rayner, K. 2006. Punctuation and intonation effects on clause and sentence wrap-up: Evidence from eye movements. Journal of Memory and Language 54: 425–443. Hoeks, J.C., Vonk, W. & Schriefers, H. 2002. Processing coordinated structures in context: The effect of topic-structure on ambiguity resolution. Journal of Memory and Language 46: 99–119. Hofmeister, P. 2011. Representational complexity and memory retrieval in language comprehension. Language and Cognitive Processes 26 (3): 376–405. Holmberg, A., Nayudu, A. & Sheehan, M. 2009. Three partial null-subject languages: A comparison of Brazilian Portuguese, Finnish, and Marathi. Studia Linguistica 63 (1): 59–97. Hornstein, N. 1999. Movement and control. Linguistic Inquiry 30: 69–96. Huddleston, R. & Pullum, G. 2002. The Cambridge grammar of the English language. Cambridge: Cambridge University Press. Huettig, F., Rommers, J. & Meyer, A.S. 2011. Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica 137: 151–171. Humphreys, K. & Bock, K. 2005. Notional number agreement in English. Psychonomic Bulletin and Review 12: 689–695. Hunter, T., Stanojevic, M. & Stabler, E. 2019. The active-filler strategy in a move-eager left-corner minimalist grammar parser. Proceedings of the workshop on cognitive modeling and computational linguistics, 1–10.
Igoa, J.M., García-Albea, J.E. & Sánchez-Casas, R. 1999. Gender-number dissociations in sentence production in Spanish. Rivista di Linguistica 11: 165–198. Jackendoff, R. 1974. A deep structure projection rule. Linguistic Inquiry 5: 481–506. Jackendoff, R. 1993. Patterns in the mind: Language and human nature. New York: Basic Books. Jackendoff, R. 2002. Foundations of language. Oxford: Oxford University Press. Jackendoff, R. 2003. Précis of foundations of language: Brain, meaning, grammar, evolution. Behavioral and Brain Sciences 26: 651–707. Jackendoff, R. 2007a. A parallel architecture perspective on language processing. Brain Research 1146: 2–22. Jackendoff, R. 2007b. Linguistics and cognitive science: The state of the art. The Linguistic Review 24: 347–401. Jackendoff, R. 2015/2017. In defense of theory. Cognitive Science 41: 185–212. Jackendoff, R. 2017. In defence of theory. Cognitive Science 41: 185–212. Jackendoff, R. & Culicover, P.W. 2003. The semantic basis of control. Language 79: 517–556. Jaeger, F., Fedorenko, E. & Gibson, E. 2005. Dissociation between production and comprehension complexity. Poster presentation at the 18th CUNY sentence processing conference, University of Arizona.

Jaeger, F.T. & Snider, N.E. 2013. Alignment as a consequence of expectation adaptation: Syntactic priming is affected by the prime’s prediction error given both prior and recent experience. Cognition 127: 57–83. Jespersen, O. 1922. Language, its nature, development and origin. New York: Henry Holt and Co. Joseph, B.D. 1979. On the agreement of reflexive forms in English. Linguistics 17: 519–523. Joshi, A.K., Levy, L.S. & Takahashi, M. 1975. Tree adjunct grammars. Journal of Computer and System Sciences 10 (1): 136–163. Jun, S.-A. 2003. Prosodic phrasing and attachment preferences. Journal of Psycholinguistic Research 32 (2): 219–249. Jun, S.-A. 2007. The intermediate phrase in Korean intonation: Evidence from sentence processing. In Gussenhoven, C. & Riad, T. (eds.), Tones and tunes: Studies in word and sentence prosody, 143–167. Berlin: Mouton de Gruyter. Jun, S.-A. 2010. The implicit prosody hypothesis and overt prosody in English. Language and Cognitive Processes 25 (7): 1201–1233. Jun, S.-A. & Kim, S. 2004. Default phrasing and attachment preferences in Korean. Proceedings of interspeech-ICSLP. Jeju, Korea. Jurafsky, D. 2003. Probabilistic modeling in psycholinguistics: Linguistic comprehension and production. In Bod, R., Hay, J. & Jannedy, S. (eds.), Probabilistic linguistics. Cambridge, MA: MIT Press. Just, M.A. & Carpenter, P.A. 1980. A theory of reading: From eye fixations to comprehension. Psychological Review 87: 329–354. Kaiser, E. & Trueswell, J.C. 2008. Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference resolution. Language and Cognitive Processes 23 (5): 709–748. Kaiser, E., Runner, J.T., Sussman, R.S. & Tanenhaus, M.K. 2009. Structural and semantic constraints on the resolution of pronouns and reflexives. Cognition 112 (1): 55–80. Kane, M.J., Conway, A.R.A., Miura, T.K. & Colflesh, G.J.H. 2007. Working memory, attention control, and the N-back task: A question of construct validity.
Journal of Experimental Psychology: Learning, Memory, and Cognition 33: 615–622. Kaplan, R. & Bresnan, J. 1982. Lexical-functional grammar: A formal system for grammatical representation. In Bresnan, J. (ed.), The mental representation of grammatical relations, 173–281. Cambridge, MA: MIT Press. Karimi, H. & Ferreira, F. 2016. Good-enough linguistic representations and online cognitive equilibrium in language processing. Quarterly Journal of Experimental Psychology 69: 1013–1040. Kaschak, M.P. & Glenberg, A.M. 2004. This construction needs learned. Journal of Experimental Psychology: General 133: 450–467. Kathol, A. 1999. Agreement and the syntax-morphology interface in HPSG. In Levine, R. & Green, G. (eds.), Studies in contemporary phrase structure grammar, 223–274. Cambridge: Cambridge University Press. Kawasaki, T. & Ishikawa, K. 2003. Empty category and the effect of teaching in sentence processing. Proceedings of the 17th Pacific Asia conference on language, information and computation, 456–461. Sentosa, Singapore. Kazanina, N., Lau, E., Lieberman, M., Yoshida, M. & Phillips, C. 2007. The effect of syntactic constraints on the processing of backwards anaphora. Journal of Memory and Language 56: 384–409.

Keenan, E.L. & Comrie, B. 1977. Noun phrase accessibility and universal grammar. Linguistic Inquiry 8: 63–99. Keeney, T. & Wolfe, J. 1972. The acquisition of agreement in English. Journal of Verbal Learning and Verbal Behavior 11: 698–705. Kehler, A. & Rohde, H. 2013. Probabilistic reconciliation of coherence-driven and centering-driven theories of pronoun interpretation. Theoretical Linguistics 39 (1–2): 1–37. Keshev, M. & Meltzer-Asscher, A. 2017. Active gap filling in islands: How grammatical resumption affects online sentence processing. Language 93: 249–268. Kimball, J.P. 1973. Seven principles of surface structure parsing in natural language. Cognition 2: 15–47. King, J. & Just, M. 1991. Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language 30: 580–602. King, J. & Kutas, M. 1995. Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience 7 (3): 376–395. Kirby, S. 2013. Language, culture, and computation: An adaptive systems approach to biolinguistics. In Boeckx, C. & Grohmann, K. (eds.), The Cambridge handbook of biolinguistics, 460–477. Cambridge: Cambridge University Press. Kirchner, W.K. 1958. Age differences in short-term retention of rapidly changing information. Journal of Experimental Psychology 55: 352–358. Klein, G.S. 1964. Semantic power measured through the interference of words with colour-naming. American Journal of Psychology 77: 576–588. Kloss, H. & Van Orden, G. 2009. Soft-assembled mechanisms for the grand theory. In Spencer, J.P., Thomas, M. & McClelland, J. (eds.), Toward a new grand theory of development! Connectionism and dynamic systems theory reconsidered, 253–267. Oxford: Oxford University Press. Konieczny, L. 2000.
Locality and parsing complexity. Journal of Psycholinguistic Research 29 (6): 627–645. Koopman, H. & Sportiche, D. 1991. The position of subjects. Lingua 85: 211–258. Koopman, H. & Sportiche, D. 2010. The que/qui alternation: New analytical directions. Ms. UCLA. Kreiner, H., Sturt, P. & Garrod, S. 2008. Processing definitional and stereotypical gender in reference resolution: Evidence from eye-movements. Journal of Memory and Language 58 (2): 239–261. Kush, D., Lidz, J. & Phillips, C. 2017. Looking forwards and backwards: The real-time processing of strong and weak crossover. Glossa: A Journal of General Linguistics 2 (1): 70. 1–29. Kutas, M. & Hillyard, S.A. 1983. Event-related brain potentials to grammatical errors and semantic anomalies. Memory and Cognition 11 (5): 539–550. Kutas, M. & Hillyard, S.A. 1984. Brain potentials during reading reflect word expectancy and semantic association. Nature 307: 161–163. Kwon, N. & Sturt, P. 2014. The use of control information in dependency formation: An eye-tracking study. Journal of Memory and Language 73 (1): 59–80. Kwon, N. & Sturt, P. 2016. Processing control information in a nominal control construction: An eye-tracking study. Journal of Psycholinguistic Research 45 (4): 779–793.

Kwon, N., Ong, D., Chen, H. & Zhang, A. 2019. The role of animacy and structural information in relative clause attachment: Evidence from Chinese. Frontiers in Psychology. 17 July 2019. https://doi.org/10.3389/fpsyg.2019.01576. Sanz, M., Laka, I. & Tanenhaus, M.K. (eds.). 2013. Language down the garden path: The cognitive and biological bases for linguistic structure. Oxford: Oxford University Press. Ladusaw, W. 1979. Polarity sensitivity as inherent scope relations. Doctoral dissertation. University of Texas at Austin. Lago, S., Shalom, D.E., Sigman, M., Lau, E.F. & Phillips, C. 2015. Agreement attraction in Spanish comprehension. Journal of Memory and Language 82: 133–149. Lago, S., Sloggett, S., Schlueter, Z., Chow, W.Y., Williams, A., Lau, E. & Phillips, C. 2017. Coreference and antecedent representation across languages. Journal of Experimental Psychology: Learning, Memory, and Cognition 43 (5): 795–817. Lago, S., Acuña-Fariña, C. & Meseguer, E. 2021. The reading signatures of agreement attraction. Open Mind: 1–22. https://doi.org/10.1162/opmi_a_00047. Lakoff, G. 1987. Women, fire, and dangerous things: What categories reveal about the mind. Chicago: The University of Chicago Press. Lambrecht, K. 1994. Information structure and sentence form: Topic, focus, and the mental representations of discourse referents. Cambridge: Cambridge University Press. Lamers, M.A.L. 2001. Sentence processing: Using syntactic, semantic and thematic information. Doctoral dissertation. University of Groningen, Groningen. Landau, I. 2000. Elements of control: Structure and meaning in infinitival constructions. Dordrecht: Kluwer. Landau, I. 2007. Movement-resistant aspects of control. In Davis, W.D. & Dubinsky, S. (eds.), New horizons in the analysis of control and raising, 293–325. Dordrecht: Springer. Langacker, R. 1987. Foundations of cognitive grammar. Stanford: Stanford University Press. Langacker, R. 1991a. Foundations of cognitive grammar (Vol. 2). Descriptive application.
Stanford: Stanford University Press. Langacker, R. 1991b. Concept, image and symbol: The cognitive basis of grammar. Berlin: Mouton. Lass, R. 1990. How to do things with junk: Exaptation in language evolution. Journal of Linguistics 26: 79–102. Laszlo, S. & Federmeier, K. 2009. A beautiful day in the neighborhood: An event-related potential study of lexical relationships and prediction in context. Journal of Memory and Language 61 (3): 326–338. Lau, E.F., Rozanova, K. & Phillips, C. 2007. Syntactic prediction and lexical surface frequency effects in sentence processing. University of Maryland Working Papers in Linguistics 16: 163–200. Lee, W.L. 2004. Another look at the role of empty categories in sentence processing (and grammar). Journal of Psycholinguistic Research 33 (1): 51–73. Lee, E.K. & Garnsey, S.M. 2015. An ERP study of plural attraction in attachment ambiguity resolution: Evidence for retrieval interference. Journal of Neurolinguistics 36: 1–16. Leiken, K., McElree, B. & Pylkkänen, L. 2016. Filling predictable and unpredictable gaps, with and without similarity-based interference: Evidence for LIFG effects of dependency processing. Frontiers in Psychology 6: 1739. Levin, B. & Rappaport-Hovav, M. 1995. Unaccusativity at the syntax-lexical semantic interface. Cambridge, MA: MIT Press. Levy, R. 2008a. Expectation-based syntactic comprehension. Cognition 106: 1126–1177.

Levy, R. 2008b. A noisy-channel model of rational human sentence comprehension under uncertain input. Proceedings of the 2008 conference on empirical methods in natural language processing, 234–243. Honolulu: Association for Computational Linguistics. Levy, R. 2011. Integrating surprisal and uncertain-input models in online sentence comprehension: Formal techniques and empirical results. Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, 1055–1065. Portland, OR: Association for Computational Linguistics. Lewis, R. & Vasishth, S. 2005. An activation-based model of sentence processing as skilled memory retrieval. Cognitive Science 29: 375–419. Lewis, R., Vasishth, S. & Van Dyke, J.A. 2006. Computational principles of working memory in sentence comprehension. Trends in Cognitive Sciences 10: 447–454. Lewis, S. & Phillips, C. 2015. Aligning grammatical theories and language processing models. Journal of Psycholinguistic Research 44 (1): 27–46. Lin, C.-J.C. 2013. Thematic templates and the comprehension of relative clauses. In Sanz, M., Laka, I. & Tanenhaus, M. (eds.), Language down the garden path: The cognitive basis for linguistic structure, 294–315. Oxford: Oxford University Press. Lo, C.-W. & Brennan, J.R. 2021. EEG correlates of long-distance dependency formation in Mandarin Wh-questions. Frontiers in Human Neuroscience. https://doi.org/10.3389/fnhum.2021.591613. Logačev, P. & Vasishth, S. 2016. Understanding underspecification: A comparison of two computational implementations. The Quarterly Journal of Experimental Psychology 69: 996–1012. López Sancio, S. 2020. Understanding dependencies in real time: A crosslinguistic investigation of antecedent complexity and dependency length. Doctoral dissertation. University of the Basque Country. Lorimor, H., Bock, K., Zalkind, E., Sheyman, A. & Beard, R. 2008. Agreement and attraction in Russian. Language and Cognitive Processes 23: 769–799.
Lorimor, H., Jackson, C.N. & Foote, R. 2015. How gender affects number: Cue-based retrieval in agreement production. Language, Cognition and Neuroscience 30: 947–954.
Love, J. & McKoon, G. 2011. Rules of engagement: Incomplete and complete pronoun resolution. Journal of Experimental Psychology: Learning, Memory and Cognition 37: 874–887.
Lovric, N. 2003. Implicit prosody in silent reading: Relative clause attachment in Croatian. Doctoral dissertation. City University of New York.
Lunn, P. 2002. Tout se tient in Dominican Spanish. In Lee, J.F., Geeslin, K.L. & Clements, J.C. (eds.), Structure, meaning, and acquisition in Spanish: The 4th Hispanic linguistics symposium, 65–72. Somerville, MA: Cascadilla Press.
Lyngfelt, B. 2009. Control phenomena. In Brisard, F., Östman, J.-O. & Verschueren, J. (eds.), Grammar, meaning and pragmatics: Handbook of pragmatics highlights, Vol. 5, 33–49. Amsterdam: John Benjamins.
MacDonald, M.C. 1994. Probabilistic constraints and syntactic ambiguity resolution. Language and Cognitive Processes 9: 157–201.
MacDonald, M.C., Pearlmutter, N.J. & Seidenberg, M.S. 1994. The lexical nature of syntactic ambiguity resolution. Psychological Review 101: 676–703.
MacWhinney, B. 2001. Emergentist approaches to language. In Bybee, J. & Hopper, P. (eds.), Frequency and the emergence of linguistic structure, 449–470. Amsterdam: John Benjamins.
MacWhinney, B. & Bates, E. 1989. The crosslinguistic study of sentence processing. New York: Cambridge University Press.
McCarthy, J. 2002. A thematic guide to optimality theory. Cambridge: Cambridge University Press.
McCourt, M., Green, J., Lau, E. & Williams, A. 2016. Processing implicit control: Evidence from reading times. Frontiers in Psychology 6: 1629.
Maia, M., Costa, A., Fernández, E. & Lourenço-Gomes, M. 2006. Early and late preferences in relative clause attachment in Portuguese and Spanish. Journal of Portuguese Linguistics 5: 203–226.
Mallinson, G. & Blake, B.J. 1981. Language typology: Cross-linguistic studies in syntax. Amsterdam: North Holland.
Mancini, S., Molinaro, N., Rizzi, L. & Carreiras, M. 2011a. A person is not a number: Discourse involvement in subject-verb agreement computation. Brain Research 1410: 64–76.
Mancini, S., Molinaro, N., Rizzi, L. & Carreiras, M. 2011b. When persons disagree: An ERP study of unagreement in Spanish. Psychophysiology 48 (10): 1–11.
Mancini, S., Quiñones, I., Molinaro, N., Hernandez-Cabrera, J. & Carreiras, M. 2017. Disentangling meaning in the brain: Left temporal involvement in agreement processing. Cortex 86: 140–155.
Mancini, S., Massol, S., Duñabeitia, J.A., Carreiras, M. & Molinaro, N. 2019. Agreement and illusion of disagreement: An ERP study on Basque. Cortex 116: 154–167.
McRae, K. & Matsuki, K. 2013. Constraint-based models of sentence processing. In van Gompel, R.P.G. (ed.), Current issues in the psychology of language: Sentence processing, 51–77. New York: Psychology Press.
Marantz, A. 2005. Generative linguistics within the cognitive neuroscience of language. The Linguistic Review 22: 429–446.
Maratsos, M.P. 1979. How to get from words to sentences. In Aaronson, D. & Rieber, R. (eds.), Perspectives in psycholinguistics. Hillsdale, NJ: Erlbaum.
Marslen-Wilson, W. 1973. Linguistic structure and speech shadowing at very short latencies. Nature 244: 522–523.
Martin, R.C., Shelton, J.R. & Yaffee, L.S. 1994. Language processing and working memory: Neuropsychological evidence for separate phonological and semantic capacities. Journal of Memory and Language 33: 83–111.
Martin, A.E., Nieuwland, M.S. & Carreiras, M. 2014. Agreement attraction during comprehension of grammatical sentences: ERP evidence from ellipsis. Brain and Language 135: 42–51.
McElree, B. & Bever, T. 1989. The psychological reality of linguistically defined gaps. Journal of Psycholinguistic Research 18 (1): 21–35.
McWhorter, J.H. 2001. The world's simplest grammars are creole grammars. Linguistic Typology 5: 125–166.
Meseguer, E., Acuña-Fariña, C. & Carreiras, M. 2009. Processing ambiguous Spanish se in a minimal chain. The Quarterly Journal of Experimental Psychology 62 (4): 766–788.
Miller, G.A. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63: 81–97.
Miller, G. & Chomsky, N. 1963. Finitary models of language users. In Luce, R.D., Bush, R.R. & Galanter, E. (eds.), Handbook of mathematical psychology, Vol. II, 419–492. New York: John Wiley.
Miller, G.A. & Isard, S. 1964. Free recall of self-embedded English sentences. Information and Control 7 (3): 292–303.
Mitchell, D., Cuetos, F. & Zagar, D. 1990. Reading in different languages: Is there a universal mechanism for parsing sentences? In d'Arcais, G.F. & Balota, D. (eds.), Comprehension processes in reading, 285–302. Hillsdale, NJ: Lawrence Erlbaum.
Mitchell, D.C. & Cuetos, F. 1991. The origin of parsing strategies. In Smith, C. (ed.), Current issues in natural language processing, 1–12. Austin: Center for Cognitive Science.
Mitchell, D.C., Cuetos, F. & Corley, M.M.B. 1992. Statistical versus linguistic determinants of parsing bias: Cross-linguistic evidence. Paper presented at the fifth annual CUNY conference on human sentence processing, New York.
Mitchell, D.C., Cuetos, F., Corley, M.M.B. & Brysbaert, M. 1995. Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research 24: 469–488.
Mitchell, D.C. & Brysbaert, M. 1998. Challenges to recent theories of crosslinguistic variation in parsing: Evidence from Dutch. In Hillert, D. (ed.), Sentence processing: A crosslinguistic perspective, 313–335. San Diego, CA: Academic Press.
Mitchell, D.C., Brysbaert, M., Grondelaers, S. & Swanepoel, P. 2000. Modifier attachment in Dutch: Testing aspects of construal theory. In Kennedy, A., Radach, R., Heller, D. & Pynte, J. (eds.), Reading as a perceptual process, 493–516. Oxford: Elsevier.
Miyagawa, S. 1997. Against optional scrambling. Linguistic Inquiry 28: 1–25.
Molinaro, N., Barber, H. & Carreiras, M. 2011. Grammatical agreement processing in reading: ERP findings and future directions. Cortex 47: 908–930.
Molinaro, N., Vespignani, F., Zamparelli, R. & Job, R. 2011. Why brother and sister are not just siblings: Repair processes in agreement computation. Journal of Memory and Language 64 (3): 211–232.
Molinaro, N., Su, J.L. & Carreiras, M. 2016. Stereotypes override grammar: Social knowledge in sentence comprehension. Brain and Language 155–156: 36–43.
Momma, S. & Phillips, C. 2018. The relationship between parsing and generation. Annual Review of Linguistics 4: 233–254.
Montalbetti, M.M. 1984. After binding: On the interpretation of pronouns. Doctoral dissertation. Massachusetts Institute of Technology.
Moravcsik, E. 1995. Summing up Suffixaufnahme. In Plank, F. (ed.), Double case: Agreement by Suffixaufnahme, 451–484. Oxford: Oxford University Press.
Munte, T.F. & Heinze, H.J. 1994. ERP negativities during syntactic processing of written words. In Heinze, H.J., Munte, T.F. & Mangun, G.R. (eds.), Cognitive electrophysiology, 211–238. Boston: Birkhäuser.
Munte, T.F., Szentkuti, A., Wieringa, B., Matzke, M. & Johannes, S. 1997. Human brain potentials to reading syntactic errors in sentences of different complexity. Neuroscience Letters 235 (3): 105–108.
Nakamura, C., Arai, M. & Mazuka, R. 2012. Immediate use of prosody and context in predicting a syntactic structure. Cognition 125 (2): 317–323.
Nakano, Y., Felser, C. & Clahsen, H. 2002. Antecedent priming at trace positions in Japanese long-distance scrambling. Journal of Psycholinguistic Research 31: 531–571.
Nelson, M. & Gailly, J.-L. 1996. The data compression book. New York: M & T Books.
Ness, T. & Meltzer-Asscher, A. 2017. Working memory in the processing of long-distance dependencies: Interference and filler maintenance. Journal of Psycholinguistic Research 46 (6): 1353–1365.
Nevins, A., Dillon, B., Malhotra, S. & Phillips, C. 2007. The role of feature number and feature-type in processing Hindi verb agreement violations. Brain Research 1164: 81–94.
Ni, W., Crain, S. & Shankweiler, D. 1996. Sidestepping garden paths: Assessing the contributions of syntax, semantics and plausibility in resolving ambiguities. Language and Cognitive Processes 11 (3): 283–334.
Nicenboim, B., Vasishth, S., Gattei, C., Sigman, M. & Kliegl, R. 2016. Working memory differences in long-distance dependency resolution. Frontiers in Psychology 1: 312.
Nicol, J. 1988. Coreference processing during sentence comprehension. Unpublished doctoral dissertation. MIT.
Nicol, J. & Swinney, D. 1989. The role of structure in coreference assignment during sentence processing. Journal of Psycholinguistic Research 18: 5–24.
Nicol, J.L., Forster, K.I. & Veres, C. 1997. Subject-verb agreement processes in comprehension. Journal of Memory and Language 36: 569–587.
Oakhill, J., Garnham, A. & Reynolds, D. 2005. Immediate activation of stereotypical gender information. Memory & Cognition 33: 972–983.
O'Grady, W. 2005. Syntactic carpentry: An emergentist approach to syntax. Mahwah, NJ: Erlbaum.
O'Grady, W. 2010. An emergentist approach to syntax. In Narrog, H. & Heine, B. (eds.), The Oxford handbook of linguistic analysis, 257–283. Oxford: Oxford University Press.
Omaki, A. & Schulz, B. 2011. Filler-gap dependencies and island constraints in second-language sentence processing. Studies in Second Language Acquisition 33 (4): 563–588.
Omaki, A., Lau, E.F., Davidson White, I., Dakan, M., Apple, A. & Phillips, C. 2015. Hyper-active gap filling. Frontiers in Psychology 6: 384. https://doi.org/10.3389/fpsyg.2015.00384.
Osterhout, L. 1994. Event-related brain potentials as tools for comprehending language comprehension. In Clifton, C. Jr., Frazier, L. & Rayner, K. (eds.), Perspectives on sentence processing, 15–44. Hillsdale, NJ: Erlbaum.
Osterhout, L. & Swinney, D.A. 1993. On the temporal course of gap-filling during comprehension of verbal passives. Journal of Psycholinguistic Research 22: 273–286.
Osterhout, L. & Mobley, L.A. 1995. Event-related brain potentials elicited by failure to agree. Journal of Memory and Language 34 (6): 739–773.
Osterhout, L., McKinnon, R., Bersick, M. & Corey, V. 1996. On the language-specificity of the brain response to syntactic anomalies: Is the syntactic positive shift a member of the P300 family? Journal of Cognitive Neuroscience 8 (6): 507–526.
Osterhout, L., Bersick, M. & McLaughlin, J. 1997. Brain potentials reflect violations of gender stereotypes. Memory & Cognition 25: 273–285.
Otero, C.P. 1986. Arbitrary subjects in finite clauses. In Bordelois, I., Contreras, H. & Zagona, K. (eds.), Generative studies in Spanish syntax, 82–109. Dordrecht: Foris.
Otero, J. & Kintsch, W. 1992. Failures to detect contradictions in a text: What readers believe versus what they read. Psychological Science 3: 229–235.
Ouhalla, J. 2005. Agreement features, agreement and antiagreement. Natural Language & Linguistic Theory 23 (3): 655–686.
Padrón, I., Fraga, I. & Acuña-Fariña, C. 2020. Processing gender agreement errors in pleasant and unpleasant words: An ERP study at the sentence level. Neuroscience Letters 174: 134538. https://doi.org/10.1016/j.neulet.2019.134538.
Pan, H.-Y., Schimke, S. & Felser, C. 2014. Referential context effects in non-native relative clause ambiguity resolution. International Journal of Bilingualism 19: 298–313.
Pañeda, C., Lago, S., Vares, E. & Felser, C. 2020. Island effects in Spanish comprehension. Glossa: A Journal of General Linguistics 5 (1): 21. https://doi.org/10.5334/GJGL.1058.
Papadopoulou, D. & Clahsen, H. 2003. Parsing strategies in L1 and L2 sentence processing: A study of relative clause attachment in Greek. Studies in Second Language Acquisition 25: 501–528.
Parker, D., Lago, S. & Phillips, C. 2015. Interference in the processing of adjunct control. Frontiers in Psychology 6: 1346.
Parker, D. & Phillips, C. 2016. Negative polarity illusions and the format of hierarchical encodings in memory. Cognition 157: 321–339.
Parker, D. & Phillips, C. 2017. Reflexive attraction in comprehension is selective. Journal of Memory and Language 94: 272–290.
Parker, D., Shvartsman, M. & Van Dyke, J.A. 2017. The cue-based retrieval theory of sentence comprehension: New findings and new challenges. In Escobar, L., Torrens, V. & Parodi, T. (eds.), Language processing and disorders, 121–144. Newcastle: Cambridge Scholars Publishing.
Parker, D. & An, A. 2018. Not all phrases are equally attractive: Experimental evidence for selective agreement attraction effects. Frontiers in Psychology, 28 August 2018. https://doi.org/10.3389/fpsyg.2018.01566.
Patson, N.D. & Warren, T. 2011. Building complex reference objects from dual sets. Journal of Memory and Language 64: 443–459.
Patson, N. & Husband, E.M. 2015. Misinterpretation in agreement and agreement attraction. Quarterly Journal of Experimental Psychology 69 (5): 1–22.
Payne, J. & Huddleston, R. 2002. Nouns and noun phrases. In Huddleston, R. & Pullum, G.K. (eds.), The Cambridge grammar of the English language, 323–523. Cambridge: Cambridge University Press.
Pearlmutter, N.J., Garnsey, S.M. & Bock, K. 1999. Agreement processes in sentence comprehension. Journal of Memory and Language 41: 427–456.
Penrose, L.S. & Penrose, R. 1958. Impossible objects: A special kind of visual illusion. The British Journal of Psychology 49 (1): 31–33.
Pesetsky, D. 1987. Wh-in-situ: Movement and unselective binding. In Reuland, E. & ter Meulen, A. (eds.), The representation of (in)definiteness, 98–129. Cambridge, MA: MIT Press.
Pfau, R. 2003. Defective feature copy and anti-agreement in language production. In Griffin, W.E. (ed.), The role of agreement in natural language, 53. TLS 5 Proceedings. Austin: Texas Linguistics Forum.
Phillips, C. 1996. Order and structure. Doctoral dissertation. MIT.
Phillips, C. 2003. Linear order and constituency. Linguistic Inquiry 34: 37–90.
Phillips, C. 2006. The real-time status of island phenomena. Language 82 (4): 795–823.
Phillips, C. 2013. Parser-grammar relations: We don't understand everything twice. In Sanz, M., Laka, I. & Tanenhaus, M. (eds.), Language down the garden path: The cognitive basis for linguistic structure, 294–315. Oxford: Oxford University Press.
Phillips, C. & Gibson, E. 1997. The strength of the local attachment preference. Journal of Psycholinguistic Research 26: 323–346.
Phillips, C., Kazanina, N. & Abada, S.H. 2005. ERP effects of the processing of syntactic long-distance dependencies. Cognitive Brain Research 22 (3): 407–428.
Phillips, C., Wagers, M. & Lau, E. 2011. Grammatical illusions and selective fallibility in real-time language comprehension. In Runner, J. (ed.), Experiments at the interfaces: Syntax & semantics (vol. 37, 147–180). Bingley: Emerald Group Publishing Limited.
Pickering, M.J. 1993. Direct association and sentence processing: A reply to Gorrell and to Gibson and Hickok. Language and Cognitive Processes 8: 163–196.
Pickering, M.J. 1994. Processing local and unbounded dependencies: A unified account. Journal of Psycholinguistic Research 23: 323–352.
Pickering, M. & Barry, G. 1991. Sentence processing without empty categories. Language and Cognitive Processes 6 (3): 229–259.
Pickering, M.J. & Traxler, M.J. 2003. Evidence against the use of subcategorisation frequencies in the processing of unbounded dependencies. Language and Cognitive Processes 18: 469–503.
Pickering, M.J., McElree, B., Frisson, S., Chen, L. & Traxler, M.J. 2006. Underspecification and aspectual coercion. Discourse Processes 42 (2): 131–155.
Pickering, M. & Ferreira, F. 2008. Structural priming: A critical review. Psychological Bulletin 134 (3): 427–459.
Pickering, M.J. & Garrod, S. 2013. An integrated theory of language production and comprehension. Behavioral and Brain Sciences 36: 329–347.
Pinker, S. 1994/2007. The language instinct. New York: Harper Perennial Modern Classics.
Piñango, M.M., Zurif, E. & Jackendoff, R. 1999. Real-time processing implications of aspectual coercion at the syntax-semantics interface. Journal of Psycholinguistic Research 28 (4): 395–414.
Poeppel, D. & Embick, D. 2005. Defining the relation between linguistics and neuroscience. In Cutler, A. (ed.), Twenty-first century psycholinguistics: Four cornerstones. Hillsdale, NJ: Erlbaum.
Polinsky, M. 2013. Raising and control. In den Dikken, M. (ed.), The Cambridge handbook of generative syntax. Cambridge: Cambridge University Press.
Pollard, C. & Sag, I. 1988. An information-based theory of agreement. In Brentari, D., Larson, G. & McLeod, L. (eds.), The 24th annual regional meeting of the Chicago Linguistics Society (CLS 24), 236–257. Chicago, IL: Chicago Linguistics Society.
Pollard, C. & Sag, I. 1994. Head-driven phrase structure grammar. Chicago: University of Chicago Press and Stanford: CSLI Publications.
Postal, P. 1971. Cross-over phenomena. New York: Holt, Rinehart and Winston.
Pozniak, C., Hemforth, B., Haendler, Y., Santi, A. & Grillo, N. 2019. Seeing events vs. entities: The processing advantage of pseudo relatives over relative clauses. Journal of Memory and Language 107: 128–151.
Prince, A. & Smolensky, P. 2004. Optimality theory: Constraint interaction in generative grammar. Oxford: Blackwell.
Pynte, J. & Prieur, B. 1996. Prosodic breaks and attachment decisions in sentence parsing. Language and Cognitive Processes 11: 165–192.
Pynte, J. & Colonna, S. 2000. Decoupling syntactic parsing from visual inspection: The case of relative clause attachment in French. In Kennedy, A., Radach, R., Heller, D. & Pynte, J. (eds.), Reading as a perceptual process, 529–547. North-Holland: Elsevier Science Publishers.
Pynte, J. & Colonna, S. 2000. Competition between primary and non-primary relations during sentence comprehension. Journal of Psycholinguistic Research 30: 569–599.
Qian, T. & Jaeger, T.F. 2012. Cue effectiveness in communicatively efficient discourse production. Cognitive Science 36: 1312–1336.
Radford, A. 1975. Pseudo-relatives and the unity of subject-raising. Archivum Linguisticum 6: 32–64.
Rayner, K. 1998. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124 (3): 372–422.
Rayner, K., Carlson, M. & Frazier, L. 1983. The interaction of syntax and semantics during sentence processing: Eye movements in the analysis of semantically biased sentences. Journal of Verbal Learning and Verbal Behaviour 22: 358–374.
Ricker, T.J., AuBuchon, A.M. & Cowan, N. 2010. Working memory. Wiley Interdisciplinary Reviews in Cognitive Science 1: 573–585.
Riveiro-Outeiral, S. & Acuña-Fariña, C. 2012. Agreement processes in English and Spanish: A completion study. Functions of Language 19 (1): 53–83.
Rizzi, L. 1990. Relativised minimality. Cambridge, MA: MIT Press.
Rizzi, L. 1996. Residual verb second and the Wh-Criterion. In Belletti, A. & Rizzi, L. (eds.), Parameters and functional heads, Vol. 2, 63–90. New York: Oxford University Press.
Rodrigues, C. 2002. Morphology and null subjects in Brazilian Portuguese: Syntactic effects of morphological change. In Lightfoot, D. (ed.), Syntactic effects of morphological change, 160–178. Oxford/New York: Oxford University Press.
Rohde, H., Levy, R. & Kehler, A. 2011. Anticipating explanations in relative clause processing. Cognition 118 (3): 339–358.
Roncaglia-Denissen, M.P., Schmidt-Kassow, M. & Kotz, S.A. 2013. Speech rhythm facilitates syntactic ambiguity resolution: ERP evidence. PLoS One 8: e56000.
Rosenbaum, P. 1967. The grammar of English predicate complement constructions. Cambridge: MIT Press.
Ross, J.R. 1967. Constraints on variables in syntax. Doctoral dissertation. MIT.
Sag, I.A. & Fodor, J.D. 1995. Extraction without traces. In Aranovich, R., Byrne, W., Preuss, S. & Senturia, M. (eds.), WCCFL 13: The proceedings of the 13th West Coast conference on formal linguistics, 365–384. Stanford, CA: CSLI Publications.
Saito, M. 1985. Some asymmetries in Japanese and their theoretical implications. Doctoral dissertation. Stanford University.
Sánchez López, C. 2002. Las construcciones con se. Madrid: Visor Libros.
Sanford, A.J. & Sturt, P. 2002. Depth of processing in language comprehension: Not noticing the evidence. Trends in Cognitive Sciences 6 (9): 382–386.
Sanford, A.J. & Graesser, A.C. 2006. Shallow processing and underspecification. Discourse Processes 42 (2): 99–108.
Sanford, A.J.S., Sanford, A.J., Molle, J. & Emmott, C. 2006. Shallow processing and attention capture in written and spoken discourse. Discourse Processes 42: 109–130.
Sapir, E. 1921. Language. New York: Harcourt, Brace.
Schafer, A.J., Carter, J., Clifton, C. & Frazier, L. 1996. Focus in relative clause construal. Language and Cognitive Processes 11: 135–163.
Schlesinger, I. 1966. The influence of sentence structure on the reading process. U.S. Office of Naval Research Tech. Rept. 24.
Schlueter, Z., Parker, D. & Lau, E. 2019. Error-driven retrieval in agreement attraction rarely leads to misinterpretation. Frontiers in Psychology, 07 May 2019. https://doi.org/10.3389/fpsyg.2019.01002.
Schneider, J. & Maguire, M.J. 2018. Developmental differences in the neural correlate supporting semantics and syntax during sentence processing. Developmental Science 22 (4). https://doi.org/10.1111/desc.12782.
Schriefers, H., Jescheniak, J.D. & Hantsch, A. 2002. Determiner selection in noun phrase production. Journal of Experimental Psychology: Learning, Memory and Cognition 28: 941–950.
Seckel, A. 2006. Optical illusions: The science of visual perception. Buffalo/Richmond Hill: Firefly Books.
Sedivy, J.C. 2002. Invoking discourse-based contrast sets and resolving syntactic ambiguities. Journal of Memory and Language 46 (2): 341–370.
Sekerina, I. 2003. Scrambling and processing: Complexity, dependencies, and constraints. In Karimi, S. (ed.), Word order and scrambling, 301–324. Oxford: Blackwell.
Siewierska, A. 1998. Variation in major constituent order: A global and a European perspective. In Siewierska, A. (ed.), Constituent order in the languages of Europe, 475–551. Berlin: Mouton de Gruyter.
Simon, H.A. 1956. Rational choice and the structure of the environment. Psychological Review 63 (2): 129–138.
Slattery, T.J., Sturt, P., Christianson, K., Yoshida, M. & Ferreira, F. 2013. Lingering misinterpretations of garden path sentences arise from competing syntactic representations. Journal of Memory and Language 69 (2): 104–120.
Slobin, D. 1966. Grammatical transformations and sentence comprehension in childhood and adulthood. Journal of Verbal Learning & Verbal Behavior 5 (3): 219–227.
Slobin, D. & Bever, T. 1982. Children use canonical sentence schemas: A crosslinguistic study of word order and inflections. Cognition 12 (3): 229–265.
Smith, G., Franck, J. & Tabor, W. 2018. A self-organizing approach to subject-verb number agreement. Cognitive Science 42 Suppl 4 (1). https://doi.org/10.1111/cogs.12591.
Snedeker, J. & Trueswell, J. 2004. The developing constraints on parsing decisions: The role of lexical biases and referential scenes in child and adult sentence processing. Cognitive Psychology 49: 238–299.
Solomon, E. & Pearlmutter, N. 2004. Semantic integration and syntactic planning in language production. Cognitive Psychology 49: 1–46.
Spivey-Knowlton, M. 1994. Quantitative predictions from a constraint-based theory of syntactic ambiguity resolution. In Mozer, M.C., Touretzky, D.S. & Smolensky, P. (eds.), Proceedings of the 1993 connectionist models summer school, 130–137. Hillsdale, NJ: Lawrence Erlbaum.
Spivey-Knowlton, M. & Sedivy, J. 1995. Resolving attachment ambiguities with multiple constraints. Cognition 55: 227–267.
Spivey, M.J. & Tanenhaus, M.K. 1998. Syntactic ambiguity resolution in discourse: Modeling the effects of referential context and lexical frequency. Journal of Experimental Psychology: Learning, Memory and Cognition 24: 1521–1543.
Sprouse, J., Wagers, M. & Phillips, C. 2012. A test of the relation between working memory capacity and syntactic island effects. Language 88: 82–123.
Stabler, E.P. 2013. Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science 5 (3): 611–633.
Staub, A. 2007. The parser doesn't ignore intransitivity, after all. Journal of Experimental Psychology: Learning, Memory, and Cognition 33: 550–569.
Staub, A. 2009. On the interpretation of the number attraction effect: Response time evidence. Journal of Memory and Language 60: 308–327.
Staub, A. 2010. Response time distributional evidence for distinct varieties of number attraction. Cognition 114 (3): 447–454.
Steedman, M. 2000. The syntactic process. Cambridge, MA: MIT Press.
Steele, S. 1978. Word order variation: A typological study. In Greenberg, J.H., Ferguson, C.A. & Moravcsik, E. (eds.), Universals of human language, Vol. 4, 585–623. Stanford: Stanford University Press.
Steinhauer, K., Alter, K. & Friederici, A.D. 1999. Brain responses indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience 2: 191–196.
Sternberg, S. 1966. High speed scanning in human memory. Science 153: 652–654.
Sternberg, S. 1969. Memory-scanning: Mental processes revealed by reaction-time experiments. The American Scientist 57: 421–457.
Stowe, L. 1986. Models of gap location in the human language processor. Bloomington, IN: Indiana University Linguistics Club.
Sturt, P. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48: 542–562.
Sturt, P., Sanford, A.J., Stewart, A. & Dawydiak, E. 2004. Linguistic focus and good-enough representations: An application of the change-detection paradigm. Psychonomic Bulletin & Review 11 (5): 882–888.
Swets, B., Desmet, T., Hambrick, D.Z. & Ferreira, F. 2007. The role of working memory in syntactic ambiguity resolution: A psychometric approach. Journal of Experimental Psychology: General 136: 64–81.
Swets, B., Desmet, T., Clifton, C. & Ferreira, F. 2008. Underspecification of syntactic ambiguities: Evidence from self-paced reading. Memory & Cognition 36 (1): 201–216.
Swinney, D. 1979. Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior 18: 645–659.
Swinney, D.A., Ford, M., Frauenfelder, U. & Bresnan, J. 1988. On the temporal course of gap-filling and antecedent-assignment during sentence comprehension. In Grosz, B., Kaplan, R., Macken, M. & Sag, I. (eds.), Language structure and processing. Stanford, CA: CSLI.
Tabossi, P., Spivey-Knowlton, M.J., McRae, K. & Tanenhaus, M.K. 1994. Semantic effects on syntactic ambiguity resolution: Evidence for a constraint-based resolution process. In Umiltà, C. & Moscovitch, M. (eds.), Attention and performance series. Attention and performance 15: Conscious and nonconscious information processing, 589–615. Boston: MIT Press.
Tanaka, M. 2019. Similarities and differences between quantifier raising and Wh-movement out of adjuncts. Syntax. https://doi.org/10.1111/synt.12189.
Tanenhaus, M. 2013. The impact of the cognitive basis for linguistic structures. In Sanz, M., Laka, I. & Tanenhaus, M. (eds.), Language down the garden path: The cognitive basis for linguistic structure, 235–405. Oxford: Oxford University Press.
Tanenhaus, M.K., Boland, J., Garnsey, S. & Carlson, G. 1989. Lexical structure in parsing long-distance dependencies. Journal of Psycholinguistic Research 18: 37–50.
Tanner, D. 2019. Robust neurocognitive individual differences in grammatical agreement processing: A latent variable approach. Cortex 111: 210–237.
Tanner, D., Nicol, J. & Brehm, L. 2014. The time-course of feature interference in agreement comprehension: Multiple mechanisms and asymmetrical attraction. Journal of Memory and Language 76: 195–215.
Tanner, D. & Van Hell, J.G. 2014. ERPs reveal individual differences in morphosyntactic processing. Neuropsychologia 56: 289–301.
Tanner, D. & Bulkes, N. 2015. Cues, quantification, and agreement in language comprehension. Psychonomic Bulletin & Review 22: 1753–1763.
Taylor, W.L. 1953. A new tool for measuring readability. Journalism Quarterly 30: 415.
Taylor, S.E. 1965. Eye movements in reading: Facts and fallacies. American Educational Research Journal 2 (4): 187–202.
Taylor, J. 2002. Cognitive grammar. Oxford: Oxford University Press.
Temperley, D. & Gildea, D. 2018. Minimizing syntactic dependency lengths: Typological/cognitive universal? Annual Review of Linguistics 4: 1–15.
Thornton, R. & MacDonald, M. 2003. Plausibility and grammatical agreement. Journal of Memory and Language 48: 740–759.
Tollan, R. & Palaz, B. 2021. Subject gaps revisited: Complement clauses and complementizer-trace effects. Frontiers in Psychology 12: 658364. https://doi.org/10.3389/fpsyg.2021.658364.
Todorova, M., Straub, K., Badecker, W. & Frank, R. 2000. Aspectual coercion and the online computation of sentential aspect. Proceedings of the twenty-second annual conference of the cognitive science society, 3–8. Mahwah, NJ: Lawrence Erlbaum Associates.
Toribio, A.J. 2000. Setting parametric limits on dialectal variation in Spanish. Lingua 10: 315–341.
Townsend, D.J. & Bever, T.G. 2001. Sentence comprehension: The integration of habits and rules. Cambridge, MA: MIT Press.
Traxler, M.J. & Pickering, M.J. 1996. Plausibility and the processing of unbounded dependencies: An eye-tracking study. Journal of Memory and Language 35: 454–475.
Traxler, M., Pickering, M. & Clifton, C. 1998. Adjunct attachment is not a form of lexical ambiguity resolution. Journal of Memory and Language 39: 558–592.
Traxler, M.J. 2007. Working memory contributions to relative clause attachment processing: A hierarchical linear modeling analysis. Memory and Cognition 35: 1107–1121.
Traxler, M.J. 2009. A hierarchical linear modeling analysis of working memory and implicit prosody in the resolution of adjunct attachment ambiguity. Journal of Psycholinguistic Research 38 (5): 491–509.
Traxler, M.J. 2014. Trends in syntactic parsing: Anticipation, Bayesian estimation, and good-enough parsing. Trends in Cognitive Sciences 18 (11): 605–611.
Trudgill, P. 1999. Language contact and the function of linguistic gender. Poznań Studies in Contemporary Linguistics 35: 133–152.
Truswell, R. 2011. Events, phrases, and questions. Oxford: Oxford University Press.
Trueswell, J.C. 1996. The role of lexical frequency in syntactic ambiguity resolution. Journal of Memory and Language 35: 566–585.
Trueswell, J.C., Tanenhaus, M.K. & Kello, C. 1993. Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory, and Cognition 19 (3): 528–553.
Trueswell, J.C., Tanenhaus, M.K. & Garnsey, S.M. 1994. Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language 33: 285–318.
Ueno, M. & Kluender, R. 2009. On the processing of Japanese Wh-questions: An ERP study. Brain Research 1290: 63–90.
Van Berkum, J.J.A., Hagoort, P. & Brown, C.M. 1999. Semantic integration in sentences and discourse: Evidence from the N400. Journal of Cognitive Neuroscience 11 (6): 657–671.
Van Dyke, J.A. & McElree, B. 2006. Retrieval interference in sentence comprehension. Journal of Memory and Language 55: 157–166.
Van Dyke, J.A. & McElree, B. 2011. Cue-dependent interference in comprehension. Journal of Memory and Language 65: 247–263.
Van Gompel, R.P.G. & Liversedge, S. 2003. The influence of morphological information on cataphoric pronoun assignment. Journal of Experimental Psychology: Learning, Memory and Cognition 29 (1): 128–139.
Van Valin, R. & LaPolla, R. 1997. Syntax. Cambridge: Cambridge University Press.
Van Riemsdijk, H. & Williams, E. 1986. Introduction to the theory of grammar. Cambridge, MA: MIT Press.
Vasishth, S. & Lewis, R.L. 2006. Argument-head distance and processing complexity: Explaining both locality and anti-locality effects. Language 82: 767–794.
Vasishth, S., Brüssow, S., Lewis, R.L. & Drenhaus, H. 2008. Processing polarity: How the ungrammatical intrudes on the grammatical. Cognitive Science 32 (4): 685–712.
Vasishth, S., Suckow, K., Lewis, R.L. & Kern, S. 2010. Short-term forgetting in sentence comprehension: Crosslinguistic evidence from verb-final structures. Language and Cognitive Processes 25 (4): 533–567.
Vasishth, S., Nicenboim, B., Engelmann, F. & Burchert, F. 2019. Computational models of retrieval processes in sentence processing. Trends in Cognitive Sciences 23 (11): 968–982. https://doi.org/10.1016/j.tics.2019.09.003.
Vigliocco, G., Butterworth, B. & Semenza, C. 1995. Constructing subject-verb agreement in speech: The role of semantic and morphological factors. Journal of Memory and Language 34: 186–215.
Vigliocco, G., Hartsuiker, R.J., Jarema, G. & Kolk, H.H.J. 1996a. One or more labels on the bottles? Notional concord in Dutch and French. Language and Cognitive Processes 11: 407–442.
Vigliocco, G., Butterworth, B. & Garrett, M.F. 1996b. Subject-verb agreement in Spanish and English: The role of conceptual factors. Cognition 51: 261–298.
Vigliocco, G. & Zilli, T. 1999. Syntactic accuracy in sentence production: The case of gender disagreement in Italian language-impaired and unimpaired speakers. Journal of Psycholinguistic Research 28: 623–648.
Vigliocco, G. & Franck, J. 1999. When sex and syntax go hand in hand: Gender agreement in language production. Journal of Memory and Language 40: 455–478.
Vigliocco, G. & Franck, J. 2001. When sex affects syntax: Contextual influences in sentence production. Journal of Memory and Language 45: 368–390.
Vigliocco, G. & Hartsuiker, R.J. 2002. The interplay of meaning, sound and syntax in language production. Psychological Bulletin 128: 442–472.
Villata, S., Tabor, W. & Franck, J. 2018. Encoding and retrieval interference in sentence comprehension: Evidence from agreement. Frontiers in Psychology 9 (2). https://doi.org/10.3389/fpsyg.2018.00002.
Von der Malsburg, T. 2015. Py-Span-Task: A software for testing working memory span. https://doi.org/10.5281/zenodo.18238.
Von der Malsburg, T. & Vasishth, S. 2013. Scanpaths reveal syntactic underspecification and reanalysis strategies. Language and Cognitive Processes 28 (10): 1545–1578.
Wagers, M., Lau, E. & Phillips, C. 2009. Agreement attraction in comprehension: Representations and processes. Journal of Memory and Language 61: 206–237.
Wagers, M.W. & Phillips, C. 2014. Going the distance: Memory and control processes in active dependency construction. The Quarterly Journal of Experimental Psychology 67 (7): 1274–1304.
Wagner, M. & Watson, D.G. 2010. Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes 25: 905–945.
Wanner, E. & Maratsos, M. 1978. An ATN approach to comprehension. In Halle, M., Bresnan, J. & Miller, G.A. (eds.), Linguistic theory and psychological reality, 119–161. Cambridge, MA: MIT Press.
Wasow, T. 1972. Anaphoric relations in English. Doctoral dissertation, MIT.
Watson, D. & Gibson, E. 2005. Intonational phrasing and constituency in language production and comprehension. Studia Linguistica 59: 279–300.
Wechsler, S. 2008. Agreement features. Language and Linguistics Compass 3 (1): 384–405.
Wechsler, S. & Zlatić, L. 2003. The many faces of agreement. Stanford and Chicago: CSLI Publications and University of Chicago Press.
Wechsler, S. & Hahm, H.-L. 2011. Polite plurals and adjective agreement. Morphology 21: 247–281.
Wells, J.B., Christiansen, M.H., Race, D.S., Acheson, D.J. & MacDonald, M.C. 2009. Experience and sentence processing: Statistical learning and relative clause comprehension. Cognitive Psychology 58: 250–271.
White, M., Rajkumar, R., Ito, K. & Speer, S.R. 2014. Eye tracking for the online evaluation of prosody in speech synthesis. In Stent, A. & Bangalore, S. (eds.), Natural language generation in interactive systems, 281–301. Cambridge: Cambridge University Press.
Wijnen, F. 2004. The implicit prosody of Jabberwocky and the relative clause attachment riddle. In Quené, H. & van Heuven, V.J. (eds.), On speech and language: Studies for Sieb G. Nooteboom, 169–178. Utrecht: Netherlands Graduate School of Linguistics.
Wolna, A., Durlik, J. & Wodniecka, Z. 2022. Pronominal anaphora resolution in Polish: Investigating online sentence interpretation using eye-tracking. PLoS One 17 (1): e0262459. https://doi.org/10.1371/journal.pone.0262459.
Xiang, M., Dillon, B.W. & Phillips, C. 2006. Testing the strength of the spurious licensing effect for negative polarity items. Talk presented at the 19th annual meeting of the CUNY conference on human sentence processing, March. New York.
Xiang, M., Dillon, B. & Phillips, C. 2009. Illusory licensing effects across dependency types: ERP evidence. Brain & Language 108: 40–55.
Yanilmaz, A. & Drury, J.E. 2018. Prospective NPI licensing and intrusion in Turkish. Language, Cognition and Neuroscience 33 (1): 111–138.
Zagar, D. & Pynte, J. 1992. The role of semantic information and of attention in processing syntactic ambiguity: Eye-movement study. Paper presented at the 5th conference of the European society for cognitive psychology, Paris.
Zagar, D., Pynte, J. & Rativeau, S. 1997. Evidence for early-closure attachment on first-pass reading times in French. Quarterly Journal of Experimental Psychology 50A: 421–438.
Zahn, D. & Scheepers, C. 2015. Overt prosody and plausibility as cues to relative-clause attachment in English spoken sentences. PeerJ PrePrints 3: e1210v1. https://doi.org/10.7287/peerj.preprints.1210v1.
Zawiszewski, A. & Friederici, A.D. 2009. Processing canonical and non-canonical sentences in Basque: The case of object-verb agreement as revealed by event-related brain potentials. Brain Research 1284: 161–179.
Zawiszewski, A., Santesteban, M. & Laka, I. 2016. Phi-features reloaded: An event-related potential study on person and number agreement processing. Applied Psycholinguistics 37: 601–626.
Ziegler, J., Bencini, G., Goldberg, A. & Snedeker, J. 2019. How abstract is syntax? Evidence from structural priming. Cognition 193: 104045.
Zukowski, A. 2009. Elicited production of relative clauses in children with Williams syndrome. Language and Cognitive Processes 24: 1–43.

INDEX

abstract vs. concrete 34, 46
Accessibility Hierarchy 22, 152–153, 155
Active Filler Strategy 136–144
adjunction 18–23
Agreement Hierarchy 71
agreement mismatch 70–71, 74, 76, 89, 92–94, 96, 106–111, 113–114, 155–157, 173, 177–178, 181–184, 190, 193–194, 196, 197
ambiguity advantage effect 54–56
anaphora 129, 146–147, 152–157, 181, 196, 203
animacy 33–35, 45–46, 54, 156, 170, 193
Asymmetry Effect 95
Canonical Form Constraint 172
causality 49
centre embedding 20, 123–124
clitics 83, 115, 149
coherence 45–49, 62
Complex Noun Phrase Constraint 181
constraint satisfaction 50, 175
Construal Principle 36
continuous valuation 96, 100
contrastiveness 91
cross-modal priming 15, 121, 130, 136, 140, 142
crossover 183–184
cue-based parsing 95–100
Derivational Theory of Complexity or DTC 8, 23, 162
determiner selection 199–201
Direct Association Hypothesis 123, 124, 139–143, 152
dirty parse 166, 171
distributivity 77–79, 87, 88, 93, 101, 112, 114
downward percolation 82, 95, 97, 100, 115, 192
early/late closure 8, 23–29, 35, 38, 40, 53, 59, 60, 139
emotionality 46
epicene 78, 156, 197
Event-Related Brain Potentials (ERPs) 48, 97, 102–104, 112
eye-tracking technique 9–11
Feature Hierarchy Hypothesis 104–105
filled-gap effect 137–138
frequency 12, 15, 26–35, 51–53, 60–62, 92, 146, 174
garden path 3, 10–15, 20, 23, 24, 39, 41, 46, 56, 134, 136, 163, 170, 172
gender mismatch effect 157
geometrical biases 94, 112, 123, 126, 156, 181, 183, 196
grain size problem 4, 31–35
Gricean principles 36–40, 46, 49
high attachment see early/late closure
hyper gap-filling 143–144
information structure 40, 202
interactions 51–52
intrusion effect 98, 155, 188–189, 192–194
islands 144, 178–181, 192, 195
late assignment 166
late-selection-language 200
Left Anterior Negativity (LAN) 103
locality 18–21
lossy context 165, 174–177
low attachment see early/late closure
memory 123–128, 137–141, 150–155, 164, 174, 177, 180–181, 188, 192, 196
memory-based interference 95–96, 101, 126, 174, 181
minimal distance principle 127, 132
Minimal Link Condition 137
misrelated participle 183, 196
modifier straddling 28
morphological richness 92–94, 102, 108–110, 112, 183, 196, 201–202
Most Recent Filler Hypothesis/Strategy 127–136
movement 5, 119–122, 125, 129, 131–132, 137–139, 145–157
Movement Theory of Control 131–132
N400 103
negative polarity 186–190
noisy channel 174, 177, 180, 190
now-or-never bottleneck 56
number-transparent 71
opportunism 124, 182, 196–203
oracle 12, 15, 24, 32
P600 103
parallel 7, 115, 127, 164, 173, 175, 188, 191–192
parsimony 47, 96, 101, 196–197
person feature 104, 107–108, 113
phase impenetrability 3, 106, 164, 166, 169, 178
Phonological Facilitation Effect 199
predicate proximity 55
prediction, predictive 113, 124, 140, 144, 151, 154, 174–177, 183, 191, 194
primary and non-primary constituents 36
Principle C 181–182, 184
pro 104, 147
PRO 122–123, 127–136, 145–146, 151, 190, 197–198
prosody 40–45, 202
RAN 125
reading span 44, 126–127, 171
reference 152–157, 173
referential support 46–53, 59, 62, 88
reflexives 3, 103, 110, 133, 148–150, 153–156, 189, 192–194, 198–199
relativized relevance/prominence 26, 36–38, 40
retrieval 26, 96, 100–101, 106, 126, 133, 153, 183–184, 188, 192–194, 200
rule conspiracy 168, 172, 195
saccadic movements 9–10, 51
salience 108, 128, 155, 157, 197
same-size sister 41
satisficing 171, 177
sausage machine 23, 42
semantic interference/interfacing 92, 93, 101, 112, 114–115
stereotypical gender 110–111
strategies 165–177
subject relatives vs. object relatives 97–99, 120, 124–125, 137, 162, 172
surprisal 150, 172, 175–177
sustained anterior negativity 124, 138–139
Syntactic Prediction Locality Theory 124, 158n11, 175
syntax proposes, semantics disposes 47–48, 166, 171, 178
theta 3, 26, 36–39, 120, 129
trace 4, 120–124, 129, 135–151, 168
unagreement 108, 113
underspecification 36, 52, 56
unification 72, 75–78, 80–81, 83, 85, 116n12
visual world paradigm 48