Empirical Approaches to Linguistic Theory: Studies in Meaning and Structure 9781614510888, 9781614510895

The mental representation of language cannot be directly observed but must be inferred and modelled from its effects at


English Pages 357 [360] Year 2012



Empirical Approaches to Linguistic Theory

Studies in Generative Grammar 111

Editors

Henk van Riemsdijk, Harry van der Hulst, Jan Koster

De Gruyter Mouton

Empirical Approaches to Linguistic Theory: Studies in Meaning and Structure

Edited by

Britta Stolterfoht and Sam Featherston

De Gruyter Mouton

The series Studies in Generative Grammar was formerly published by Foris Publications Holland.

ISBN 978-1-61451-089-5
e-ISBN 978-1-61451-088-8
ISSN 0167-4331

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.dnb.de.

© 2012 Walter de Gruyter GmbH & Co. KG, Berlin/Boston
Typesetting: PTP-Berlin Protago TEX-Production GmbH, Berlin
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
Printed on acid-free paper
Printed in Germany
www.degruyter.com

Contents

Foreword
Britta Stolterfoht & Sam Featherston

Part 1: Methods and analysis

Incremental truth value judgments
Oliver Bott & Fabian Schlotterbeck

Measuring Syntactic Priming in Dialogue Corpora
Christian Pietsch, Armin Buch, Stefan Kopp & Jan de Ruiter

How structure-sensitive is the parser? Evidence from Mandarin Chinese
Zhong Chen, Lena Jäger & Shravan Vasishth

The annotation of preposition senses in German
Antje Müller, Claudia Roch, Tobias Stadtfeld & Tibor Kiss

Part 2: Applications to linguistic theory

Evidence about evidentials: Where fieldwork meets theory
Lisa Matthewson

Crosslinguistic variation in comparison: evidence from child language acquisition
Sonja Tiemann, Vera Hohaus & Sigrid Beck

Restricting quantifier scope in Dutch: Evidence from child language comprehension and production
Petra Hendriks, Ruth Koops van ’t Jagt & John Hoeks

McGee’s counterexample to Modus Ponens in context
Janneke Huitink

Interpreting adjectival passives: Evidence for the activation of contrasting states
Berry Claus & Olga Kriukova

Focus projection between theory and evidence
Kordula De Kuthy & Detmar Meurers

Locative Inversion in English: Implications of a Rating Study
Sara Holler & Jutta M. Hartmann

Part 3: Cognitive and neurological basis of language

Word- vs. sentence-based simulation effects in language comprehension
Barbara Kaup, Jana Lüdtke & Ilona Steiner

Language skills in patients with reorganized language (RL)
Eleonore Schwilling, Karen Lidzba, Andreas Konietzko, Susanne Winkler & Ingeborg Krägeloh-Mann

Predicting speech imitation ability biometrically
Susanne Reiterer, Nandini C. Singh & Susanne Winkler

Index

Foreword

Britta Stolterfoht & Sam Featherston

The topics of how people learn and store grammatical structures, how these structures might be mentally represented, and how they are combined with lexical information in the process of creating and interpreting utterances are central to the field of linguistic theory. Since these phenomena cannot be directly observed, the exploration and development of sources of evidence are of permanent interest to researchers in language. There have been two responses to this situation in grammar research over the last fifty years. The first has been what we might term the ‘clear cases’ approach (in the sense of Chomsky 1965: 19): this involves working on the fairly low-tech but nevertheless rich evidence of language behaviour and language perceptions. Many studies in syntax and semantics are precisely of this kind. The result has been a great deal of exciting work with abundant explanatory hypotheses and fertile debate. On reflexion, however, it is apparent that not all claims made or theories advanced were fully empirically motivated, and some were even speculative. The alternative approach has been applied in those areas of linguistics where more precise quantifiable data has been readily accessible – one might think of phonetics and sociolinguistics as examples. Here the more descriptive ‘measurement’ approach to linguistics has dominated. Claims tend to be well supported by observed data, and theories are developed on the basis of statistical analysis. This approach may tend to limit the scope of explanation to just those phenomena for which sufficient data is available. More recently, however, there has been a trend for the empirical methods of the second school to be applied to the issues traditionally associated with the first.
Linguists are spending more time addressing the evidential base of their work, developing tools to gather, process, and exploit linguistic data, and testing assumptions and analyses experimentally or through observation of natural language use. The aim is to combine the theoretical insights of the first group with the ability of the second group to deliver more firmly based answers. This approach has rapidly gained popularity and is widely seen as an important part of the future of research in syntactic and semantic theory: it is becoming increasingly common for papers to contain some empirical confirmation of their claims, whether from experiments, corpus data, or other data types, such as fieldwork with minority languages.


While the collection and analysis of linguistic data naturally takes time and effort, the rewards in terms of additional insights into structures and the interactions of factors have proved well worth the investment. The contribution to theory is also positive: very frequently the more controlled data confirms what traditional data sources, such as an individual’s judgements, have previously told us. This too is a useful finding, since it provides additional support for an analysis and places theories which are thus confirmed on a firmer footing. But such data also has benefits in providing the basis for the falsification of analyses: if the predictions of a theory are shown not to hold, then it unquestionably needs to be reconsidered. The ability to disconfirm hypotheses permits us to reject certain accounts or approaches and concentrate upon the more adequate ones. Empirical testing thus makes for faster theory turnover and hence more rapid theory development. In fact, the most frequent finding is that the analysis based on fairly simple data is broadly correct, but that there is more to be said. Most typically, it is revealed that previous assumptions need to be revised, whether about the extent of a phenomenon, the range of structures that it applies to, its effects upon acceptability, or its internal constituency. An example would be a case in which what had been taken to be a single constraint on structure turned out, on closer inspection, to consist of two logically independent constraints which nevertheless apply in similar circumstances. This new approach, laying greater emphasis on empirical confirmation, is thus generally seen as a welcome change in grammatical research and has won many adherents. It is seen, first, as driving an acceleration in the research cycle of data analysis, theory construction, hypothesis testing, and theory adoption or rejection.
It is also regarded as the basis for a sharpening of linguistics as a research field and for a new self-consciousness among linguists as members of the wider field of cognitive science. This is the approach employed by the authors of the papers in this volume. We may distinguish three groups of papers, each of which addresses one of the major issues in this empirical approach to linguistic theory. The first group focuses on the collection and analysis of linguistic data. These papers present innovations in linguistic methodology or show how new approaches to the analysis of data can render it more informative. Bott & Schlotterbeck propose a new method to track possible meanings of ambiguous sentences during online sentence comprehension and demonstrate the advantages of this incremental truth value judgement task (ITVJ) over the usual method of gathering truth value judgements (TVJ). This paper provides us with a promising new methodology for empirical semantics and new insights into the processing of quantifier scope.


The paper by Pietsch, Buch, Kopp & de Ruiter investigates syntactic priming, a phenomenon well studied in controlled psycholinguistic experiments. The authors build upon previous work by examining priming in more natural linguistic material, using treebanks of dialogue corpora. This adds a very interesting new data type to the controversial discussion of syntactic priming. Furthermore, the results of the two studies challenge influential explanations of syntactic priming (e.g. the Interactive Alignment Model of dialogue; Pickering & Garrod 2004). The question addressed by Chen, Jäger & Vasishth is whether the parser makes use only of grammatical constraints or also of non-structural cues such as gender and number when resolving grammatical dependencies. Chen and colleagues make the important point that previous experimental studies have yielded only null results, which cannot positively establish the absence of an effect. Their article provides a useful corrective which should remind us that negative results are never a final answer and that even quantitative data requires interpretation and analysis. Müller, Roch, Stadtfeld & Kiss use a very different data type in their work on distinguishing preposition senses, an essential step towards being able to investigate the semantic factors which influence the behaviour and occurrence of prepositions. Their approach is to produce an inventory of preposition senses consisting of 27 top-level senses, some of which open out into sub-senses. The paper illustrates the value of the approach by focusing on the use of ohne ‘without’ in preposition-noun combinations, that is, without a determiner. Their reference corpus and annotation scheme provide an essential tool for the investigation of prepositional structures. The second part of this volume contains papers which explore the insights for linguistic explanation and linguistic theory that can be gained from work taking the empirical stance.
These articles all address important issues in linguistic structure and interpretation and show how the collection of more detailed evidence can help answer questions which would otherwise remain obscure. This section therefore deals with the application of the more data-informed approach to linguistics. For example, Matthewson investigates cross-linguistic variation in semantics and pragmatics and its relevance for Universal Grammar. The paper reports fieldwork on St’at’imcets (Lillooet Salish, a highly endangered language with fewer than 50 speakers, all over the age of 70) and addresses whether evidentials and epistemic modals constitute two different conceptual classes. The data support the claim that the two classes are identical. This result is an important contribution to the recent controversial discussion in semantic theory (see e.g. Aikhenvald 2004, Kratzer 2010).


The thematically related paper by Tiemann, Hohaus & Beck brings together formal semantics and empirical methods in investigating comparison constructions. The first part of the paper deals with a wide range of crosslinguistic data. The second part presents corpus data on the time course of the acquisition of comparison constructions in English and German. Both types of data support a compositional semantic analysis of cross-linguistic variation proposed by Beck et al. (2009). Another paper making use of acquisition data is that of Hendriks, Koops van ’t Jagt & Hoeks, who investigate quantifier scope in Dutch in examples like A bear tickles every turtle. The Dutch examples appear to be subject to different syntactic constraints from the equivalent English structures. Dutch children, however, behave like adult speakers of English. Their comprehension and production data provide a much finer picture of the contrast between the adults’ and the children’s grammars and interpretation strategies. The data enables the authors to propose an elegant account within Optimality Theory: an alternative ranking of constraints on specificity and familiarity in children’s grammars. Questionnaire studies can illuminate all sorts of semantic issues. Huitink discusses the effect of context on logical inference in language. McGee (1985) presented a supposed counterexample to Modus Ponens involving right-nested conditionals. Huitink analyses the contextual factors which lead informants to set aside the normal rules of inference and draw conclusions that are perhaps intuitively comprehensible but ultimately not strictly logical. Claus & Kriukova report an experimental study on the comprehension of adjectival passives in German. Using a picture verification task, they provide experimental evidence for recent semantic accounts (e.g. Maienborn 2009; Gehrke, to appear) which assume that the interpretation of adjectival passives involves a contrasting state.
The results indicate that contrasting states are mentally more available after reading an adjectival passive than after reading a control sentence with a genuine adjective. De Kuthy & Meurers examine the widely discussed concept of focus projection and present interesting new data with which to model the syntax–information structure interface. The authors evaluate both standard syntactically driven theories of focus projection (e.g. Selkirk 1995) and more recent pragmatic approaches (Roberts 2006, Kadmon 2006) in the light of two types of evidence: a synopsis of the existing experimental data and their own new corpus-based speech data. Although the results reveal evidence for focus projection, non-syntactic factors must also be considered. The study thus opens up new perspectives in investigating focus structure.


In a thematically related paper, Holler & Hartmann show how the experimental collection of speaker judgements can throw light on locative inversion in English, in two ways. First, the extent of the phenomenon is explored and shown to differ from what is often assumed in the literature. This is an important advantage of the empirical approach, for we cannot hope to find the optimal theory if we are not yet clear exactly what phenomena it must account for. Second, they are able to put theoretical accounts from the literature to the test. While this paper does not yet provide a complete account of locative inversion, it clearly moves the discussion on and succeeds in reformulating the problem. The papers in Part 3 concern the cognitive and neurological background to language. Perhaps the single most important feature of the Chomskian approach to language structure is that it regards grammar as a phenomenon of cognitive psychology. This seminal step means that linguistic theory can be directly related to the architecture and functioning of the human mind. The papers in this last group throw light upon the cognitive systems in which the language modules are embedded. The first paper in this third part concerns the links between linguistic and non-linguistic cognitive processes. Research has shown that there are links between the mental traces left by real-world experiences and the linguistic representations of them. This overlap between language and experience has been connected to priming between actions and the content of linguistic expressions. Kaup, Lüdtke & Steiner investigate whether these links relate to representations already at the word level or only appear as a response to composed meanings at the sentence level.
This distinction is not easy to draw, but their results suggest that such effects are at least in part apparent at the lexical level, a finding which has important implications for our understanding of these simulation phenomena. The field of neurolinguistics can also make use of evidence from neuropathology. Schwilling, Lidzba, Konietzko, Winkler & Krägeloh-Mann report their work analyzing the language abilities of patients with left-hemispheric brain damage, who have therefore reorganized a large part of their language processing into the right hemisphere. The novel combination of functional neuroimaging and linguistically informed tasks shows that reorganized language is significantly different from the norm, even though patients with reorganized language can function appropriately in everyday situations of language use. Reiterer, Singh & Winkler use acoustic and fMRI data to show that, contrary to the general assumption in the generative literature, language skills vary considerably across individuals even in their native language. Furthermore, language ability in a second language is closely related to language skill in the first. Whether these results can be generalized from the particular parameter of language ability investigated, namely pronunciation, to grammatical competence is of course yet to be answered, but these findings cast a fascinating light on the skills of the native speaker, which are of immediate relevance to the study of grammatical theory.

We should like to express our heartfelt thanks to the many people who have contributed to the appearance of this collection of essays. First of all, we should like to thank the staff at de Gruyter who have accompanied us along the way. In particular we should like to mention Ursula Kleinhenz, who was our valued linguistics contact at de Gruyter for so many years and whom we very much miss. Second, we wish to acknowledge our debt to the reviewers, who have read and evaluated abstracts and papers in their own time and for our benefit rather than their own. The process of reviewing is too often a thankless task; we would wish it to be otherwise here.

The reviewers:

Cornelia Ebert, Kordula de Kuthy, Oliver Bott, Fritz Hamm, Irene Rapp, Thomas Weskott, Christian Ebert, Helga Gese, Christian Pietsch, Walt Detmar Meurers, Joana Cholin, Claudia Friedrich, Pia Knoeferle, Lisa Matthewson, Fabian Schlotterbeck, Janneke Huitink, Markus Bader, Berry Claus, Sigrid Beck, Barbara Schlücker, Caroline Féry, Robin Hörnig, Gerhard Jäger, Heike Zinsmeister, Antje Müller, Petra Augurzky, Rob Zwitserlood, Jesse Harris, Mingya Liu, Philip Hofmeister


We further acknowledge the important support of the Deutsche Forschungsgemeinschaft for large parts of the work reported here and for the academic infrastructure which supports linguistics research. Thanks, last of all, to our colleagues here: Sophie von Wietersheim, who proofread the manuscripts with an eagle eye, and Beate Starke, whose contribution to linguistics in Tübingen is ubiquitous and irreplaceable.

Britta Stolterfoht & Sam Featherston, Tübingen

Part 1: Methods and analysis

Incremental truth value judgments

Oliver Bott & Fabian Schlotterbeck*

1. Introduction

Semantic and pragmatic theories are concerned with the possible meanings of a given sentence: they predict which readings are available and whether one is preferred to another. To test a semantic theory it is thus important to collect valid empirical data concerning the status of the proposed interpretations. From a psycholinguistic perspective, a question of particular interest is how the final interpretation comes about. During processing there may be intermediate steps at which the interpretation is rather different from the final one, as for instance in sentences with a semantic garden path. To study semantic processing, we would therefore like to probe all potential interpretations of the ambiguous input while it is being processed. In the present paper we propose a new method which allows us to track interpretation preferences as the sentence unfolds. With existing online methods it is often not easy to tell when a particular reading, especially a dispreferred one, becomes available. Consider a hypothetical experiment intended to establish that one of two potential readings is initially preferred, but that the processor can readily shift to the other interpretation. We could, for instance, follow the common practice in research on garden-path sentences and measure reading times at a disambiguation towards the dispreferred reading. We would then expect to find processing difficulty when the disambiguation is encountered. In addition, the sentence should be judged fully acceptable in its final interpretation. This could be tested in a reading time study with end-of-sentence acceptability judgments. At first sight, difficulty at the disambiguating region together with the ultimate availability of the dispreferred reading would look like the desired result. On closer inspection, however, such a result cannot tell us which of the following alternatives is correct. The dispreferred reading could be computed immediately at the disambiguation.
Alternatively, it could be arrived at only later, that is, after the sentence has been read, during the judgment phase. When the dispreferred interpretation arises thus remains unclear.

In general, we may distinguish three stages at which a reading may be generated: (i) during the processing of the ambiguous region, e.g. as one of several differently ranked alternatives, (ii) at the disambiguation, e.g. during reanalysis, or (iii) offline or post-interpretively, when providing a judgment. Following Caplan & Waters (1999), we distinguish between the extraction of meaning from a linguistic signal, which they called “interpretive processing”, and the use of that meaning to accomplish other tasks, such as reasoning, planning actions, and other functions, which they called “post-interpretive processing”. These two types of processing are closely related to Fodor’s (1983) distinction between modular and central systems.

The new method proposed here enables us to decide whether an interpretation becomes available already during reading or is constructed only post-interpretively. It combines two well-established experimental paradigms: the truth value judgment task and incremental stops-making-sense judgments. We will demonstrate the method in a case study on quantifier scope, a perennial issue in semantics where intuitions are often far from clear. In particular, we were interested in whether and when inverse scope readings become available during the processing of doubly quantified German sentences. We will present evidence that in object-before-subject sentences the inverse reading becomes available already during reading, while for subject-initial sentences it is only available post-interpretively.

The paper is structured as follows. Section 2 introduces and motivates the new method. Section 3 presents the design of the case study and outlines predictions. Section 4 reports offline evidence (Experiment 1) that inverse scope is generally dispreferred but possible in both subject-initial and object-initial German sentences. This experiment is contrasted with the new method (Experiment 2), which shows that during online processing inverse scope is available only in object-initial sentences. Section 5 demonstrates (Experiment 3) that the new method is particularly sensitive to typicality biases. We conclude in Section 6 that the new method yields perfectly unbiased results, provided that we pay attention to this limitation.

* This research was funded by the German Research Foundation (DFG) in Project B1 of SFB 833. The authors would like to thank Wolfgang Steinefeld, Rolf Ulrich, Janina Radó, Britta Stolterfoht, the audience of Linguistic Evidence 2010 and the anonymous reviewers for their valuable comments on earlier drafts of this paper.

2. Introducing the new method

2.1. The Truth-Value Judgment (TVJ) task

In order to measure interpretation preferences as they evolve during the incremental processing of semantic ambiguities, we took the truth value judgment task (henceforth TVJ task; see Crain & Thornton 1998, p. 209) and turned it into an online method. The TVJ task has been successfully applied to a wide range of semantic and pragmatic phenomena. In this task, participants read or hear a sentence and have to verify or falsify it with respect to a given context. The context is often supplied in the form of a picture, but can also be purely linguistic. Thus, the experimenter has control over two variables: the linguistic stimulus and the context. The TVJ task is better suited to gathering information about dispreferred interpretations than its alternatives, which we may call selection and production tasks. By “selection tasks” we refer to tasks where the experimenter controls only the sentence and provides a whole set of disambiguating contexts from which the participant can choose. “Production tasks” work the opposite way: a disambiguating context is presented and the participant has to choose the linguistic expression that fits it best. The problem with both of these types of task is the possibility of a winner-take-all effect. The preferred reading or expression may attract the participant’s attention, causing a dispreferred option to be overlooked and therefore to appear, wrongly, to be impossible. A winner-take-all effect is unlikely to occur in a TVJ task because, on each trial, the sentence appears with only one context. A dispreferred reading can therefore be evaluated without competition from the preferred alternative (see Crain & Thornton 1998 for discussion). Note that these methodological considerations regarding selection and production tasks are not limited to offline methods but also extend to online paradigms. For instance, the Visual World Paradigm (Cooper 1974; Tanenhaus et al. 1995) must be considered an online selection task, since participants look at a set of referents associated with the different readings.
Again, not looking at a referent associated with a certain interpretation can mean either (i) that a certain reading is not available or (ii) that the reading is in principle available but the presence of a preferred alternative prevented participants from looking at the dispreferred alternative. A second advantage of the TVJ task is that it allows the experimenter to decide whether a reading with a very low acceptance rate is available or not. To disentangle highly dispreferred from unavailable readings, we simply need to compare acceptance of dispreferred readings to a baseline error rate, for instance, acceptance in a disambiguated construction. We will elaborate on this point when we discuss our experiments. A potential problem with the TVJ task is that a given context may induce a bias to interpret the sentence in a way that fits it. For instance, 80% acceptance of a sentence in a particular context does not imply that the corresponding meaning would be predominant without supportive context. High acceptance rates just indicate that an interpretation can be generated with relative ease. The difference in acceptance rates between two conditions does, however, represent the relative preferences. It is thus crucial to test a sentence with all its potential meanings in order to interpret the relative differences properly.

No matter what kind of task is used, we have to be aware of a potential problem which is very rarely considered in psycholinguistic research on semantics. Often, meaning is assessed only indirectly, by measuring the fit with a particular disambiguating context. Meaning is standardly characterized by the whole set of situations in which a sentence is true, and a given context is clearly only one instance of these. Situations may of course differ in how typical a representation they are of a particular meaning. If we test atypical or unnecessarily complex situations, a meaning might wrongly be taken to be unavailable. Ideally, then, we should always choose the most typical situation as the disambiguating context. We will come back to these issues when we discuss Experiment 3.

However, the classical TVJ task can only inform us about the final interpretation. To find out whether a reading becomes available during the ongoing processing of a sentence, we need an online method. For this purpose, we created an online version of the TVJ task, the incremental truth value judgment (ITVJ) task, which collects truth value judgments incrementally for each new piece of the incoming sentence.

2.2. The Incremental Truth Value Judgment (ITVJ) task

The ITVJ task is procedurally very similar to the stops-making-sense judgment task (Tanenhaus et al. 1989). Participants have to make successive TVJs while reading a sentence. In contrast to the stops-making-sense judgment task, participants in the ITVJ task first inspect a disambiguating context (e.g. a picture). Then the context disappears and a sentence is displayed self-paced in an incremental fashion (e.g. via moving-window presentation). At each segment participants have to decide whether, up to this point, the sentence still fits the context. By pressing the “yes, it fits, go on” button they proceed to the next segment. If they press the “no, can’t be true” button, the presentation of the sentence is aborted. As in the stops-making-sense judgment task, two dependent variables are analyzed for each segment: cumulative rejections, i.e. how often the trial has been aborted up to this point, and reaction times of “yes, go on” judgments. To illustrate how cumulative rejections are computed, consider an example data set with three sentence segments and ten participants. At the first segment all participants choose “yes”. At the second segment five participants choose “yes”, while the other five abort the trial. At the last segment three of the five remaining participants choose “yes”. This would yield the following cumulative rejection rates: region 1: 0%, region 2: 50%, region 3: 70%.
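The cumulative-rejection measure can be reproduced with a short Python sketch. It is not the authors’ analysis code; it assumes, hypothetically, that each trial is stored as the 1-based index of the segment at which the participant pressed “no”, or None if the trial was never aborted. The example data encode the ten participants described in the preceding paragraph.

```python
from typing import Optional, Sequence


def cumulative_rejections(abort_at: Sequence[Optional[int]], n_segments: int) -> list[float]:
    """Return the cumulative rejection rate (in %) for each segment.

    abort_at[i] is the 1-based segment at which trial i was aborted,
    or None if the participant answered "yes" at every segment.
    (This data layout is an illustrative assumption.)
    """
    n_trials = len(abort_at)
    rates = []
    for segment in range(1, n_segments + 1):
        # A trial counts as rejected at this segment if it was aborted here or earlier.
        rejected = sum(1 for s in abort_at if s is not None and s <= segment)
        rates.append(100 * rejected / n_trials)
    return rates


# Ten participants: nobody aborts at segment 1, five abort at segment 2,
# two of the remaining five abort at segment 3, and three never abort.
example = [2] * 5 + [3] * 2 + [None] * 3
print(cumulative_rejections(example, n_segments=3))  # [0.0, 50.0, 70.0]
```

Because an aborted trial is never resumed, the rate can only stay level or rise from one segment to the next; reaction times of “yes” judgments would be analysed separately.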

Incremental truth value judgments


The rationale behind the task is the following. If the interpretation conveyed by the disambiguating context is inconsistent with the incoming sentence, more rejections are expected than when it is consistent. Furthermore, if one reading is preferred to another, fewer rejections and shorter reaction times for positive judgments are expected for the preferred than for the dispreferred reading. As in the original TVJ task, the experimenter controls both the linguistic construction and the disambiguating context. The two tasks differ, however, with respect to the information they provide. On the one hand, the ITVJ task provides more information than the original TVJ task, because it allows us to track the availability of a reading during incremental interpretation. On the other hand, some information may be lost or skewed in the ITVJ task (compared to the TVJ task). Take, for instance, the sentence ‘All kids are at school or they are at home’ in a context where some of the kids are at home. Participants might abort the trial when processing ‘school’ because locally, up to this point, the sentence is inconsistent with the context. It only turns out to be consistent once the second disjunct has been encountered. Therefore, researchers using the ITVJ task have to bear in mind that non-monotonic updates of the semantic representation may yield unexpected effects. Moreover, because in the ITVJ task participants have to keep a context in mind and compare it to the linguistic input, language-external factors like the complexity of the contexts or working memory limitations may influence the judgments. This clearly limits the applicability of the task to a smaller range of phenomena than the original TVJ task. In contrast to other online methods, readers are forced to fully interpret every incoming piece, where under normal circumstances they might have engaged in shallow processing (cf., for instance, Sanford & Sturt 2002).
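The non-monotonicity problem can be illustrated with a toy model. The sketch below assumes a hypothetical context (the names and sets are invented for illustration) and the distributive reading ‘each kid is at school or at home’, under which, as described above, the complete sentence fits the context although its prefix does not.

```python
# Hypothetical context: three kids, two at school and one at home.
KIDS = {"Anna", "Ben", "Clara"}
AT_SCHOOL = {"Anna", "Ben"}
AT_HOME = {"Clara"}


def prefix_true(kids, at_school):
    """Local meaning after 'All kids are at school': every kid is at school."""
    return all(kid in at_school for kid in kids)


def sentence_true(kids, at_school, at_home):
    """Full sentence, distributive reading assumed: each kid is at school or at home."""
    return all(kid in at_school or kid in at_home for kid in kids)


print(prefix_true(KIDS, AT_SCHOOL))             # False: a participant may abort at 'school'
print(sentence_true(KIDS, AT_SCHOOL, AT_HOME))  # True: the complete sentence fits
```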
The literature on aspectual coercion provides a striking example of methodological influences on the depth of processing. While Todorova et al. (2000) observed coercion effects in a stops-making-sense task, no such effects were found in ordinary self-paced reading and eye-tracking during reading, even though the same set of materials was tested (Pickering et al. 2006). It is therefore important to stress that ITVJ task experiments are not intended as a substitute for other online methods, but to provide additional information not available otherwise. Conroy (2008) introduced a superficially similar method, the incremental verification task, and investigated interpretation preferences for scopally ambiguous sentences in children and adults. In her experiments participants first heard a potentially scope-ambiguous sentence and then uncovered a complex picture one piece at a time. They had three options: (i) uncover more of the picture, (ii) end the trial by pressing “true”, or (iii) end it by pressing “false”. Conroy used the method to investigate whether multiple scope readings are activated in parallel or whether only one reading is available at any given time during the processing of scope-ambiguous sentences. By contrast, the ITVJ task is intended to reveal which readings are generated online when the processor is driven to compute them. Clearly, the methods complement each other and should be applied to the same phenomena.

Oliver Bott & Fabian Schlotterbeck

3.

Introducing the case study

3.1.

The phenomenon

Scope ambiguity has been at the core of semantic theory at least since Montague (1973). The literature is so vast that we cannot go into details here (for a recent overview see Szabolcsi 2010). Instead, we will arbitrarily pick three theories that make fundamentally different predictions. In the course of the paper we will demonstrate that the disagreement about the empirical facts may have to do with how “deeply” doubly quantified sentences are processed when judging whether a reading is available or not. First we will look at the constructions tested in our case study. (1) is a doubly quantified German sentence with the two determiners jeden dieser (‘each of these’, ∀) and genau einer (‘exactly one’, ∃!). Note that in this sentence the object ‘each of these students’ appears in topicalized position and precedes the subject ‘exactly one teacher’. The standard assumption is that sentences like (1) have the two readings illustrated by the logical formulae in (2) and (3).¹

(1) Jeden dieser Schüler lobte genau ein Lehrer.
    each these students.ACC praised exactly one teacher.NOM
    ‘Each of these students was praised by exactly one teacher.’

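(2) $\forall x\,[\mathit{Student}(x) \rightarrow \exists! y\,[\mathit{Teacher}(y) \wedge \mathit{Praise}(y,x)]]$

(3) $\exists! y\,[\mathit{Teacher}(y) \wedge \forall x\,[\mathit{Student}(x) \rightarrow \mathit{Praise}(y,x)]]$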
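To make the truth conditions of (2) and (3) concrete, both formulae can be evaluated against small finite models in which praising is a set of (teacher, student) pairs. This is a minimal sketch: the function names and both models are hypothetical and are not the diagrams used in the experiments. The second model shows how a single extra praising relation makes (2) false while (3) remains true.

```python
from typing import Iterable, Set, Tuple

Praise = Set[Tuple[str, str]]  # (teacher, student): the teacher praised the student


def holds_2(students: Iterable[str], teachers: Iterable[str], praise: Praise) -> bool:
    """(2): every student was praised by exactly one teacher."""
    teacher_list = list(teachers)
    return all(sum((t, s) in praise for t in teacher_list) == 1 for s in students)


def holds_3(students: Iterable[str], teachers: Iterable[str], praise: Praise) -> bool:
    """(3): exactly one teacher praised every student."""
    student_list = list(students)
    return sum(all((t, s) in praise for s in student_list) for t in teachers) == 1


students = ["s1", "s2", "s3"]
teachers = ["t1", "t2", "t3"]

# Hypothetical model A: each student praised by a different teacher.
model_a = {("t1", "s1"), ("t2", "s2"), ("t3", "s3")}
# Hypothetical model B: t1 praised everyone, and t2 additionally praised s1.
model_b = {("t1", "s1"), ("t1", "s2"), ("t1", "s3"), ("t2", "s1")}

print(holds_2(students, teachers, model_a), holds_3(students, teachers, model_a))  # True False
print(holds_2(students, teachers, model_b), holds_3(students, teachers, model_b))  # False True
```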

We refer to the reading in which the linearly first quantifier scopes over the second as the surface scope interpretation; it is illustrated in (2). The inverse reading is an interpretation in which the second quantifier takes scope over the first, as in (3).

1. We ignore tense and the internal semantics of the determiners here and, to keep matters simple, abbreviate them as ∃! and ∀.

(4) Genau ein Lehrer lobte jeden dieser Schüler.
    exactly one teacher.NOM praised each these students.ACC
    ‘Exactly one teacher praised each of these students.’

Example (4) can be used to illustrate that sentence structure may influence scope preferences: intuitively, an inverse reading² seems to be more difficult to get in (4) than in (1). Semantic frameworks often fail to account for this intuitive difference and assume that multiply quantified sentences uniformly display all combinatorially possible scopal relations (e.g. May 1977 concerning scope in English). Unconstrained theories thus predict a sentence with n quantifiers to allow n! readings, and sentences (1) and (4) are predicted to be equally ambiguous. More constrained theories predict different scopal relations depending on syntactic structure (e.g. Reinhart 1983, Frey 1993). Based on c-command relations, Frey (1993), for instance, would predict (1) to be ambiguous, but (4) to allow only surface scope. The above-mentioned theories do not account for lexical and discourse factors, which may also influence quantifier scope. This is different in, for instance, Pafel’s (2005) theory, which takes quantifier scope preferences to be influenced by multiple syntactic as well as nonsyntactic factors. His theory predicts that (1) should have only a surface scope interpretation. This is because the object quantifier not only linearly precedes the subject quantifier, but is also distributive and, as an effect of the partitive construction, discourse-bound. By contrast, (4) is predicted to be ambiguous with a preference for the surface scope interpretation, since the scope factors apply symmetrically to both quantifiers. To sum up, the theories make fundamentally different predictions regarding the constructions to be tested. Psycholinguistic studies of quantifier scope (for an overview see, for example, Pylkkänen & McElree 2006) provide evidence that a number of factors influence preferences. What is important for our study is that (i) the linear order of the quantifying expressions and (ii) their grammatical functions have been shown to influence scope.
As for word order, surface scope is preferred to inverse scope (e.g. Johnson-Laird 1969). Concerning grammatical functions, subjects tend to scope over objects (e.g. Ioup 1975, Kurtzman & McDonald 1993). Applied to sentences (1) and (4), these findings are in line with Frey’s (1993) semantic theory. In (1), linear precedence and the grammatical function hierarchy are in conflict, whereas in (4) both factors favor surface scope. In cases like (4), however, the existing studies do not allow us to determine whether the inverse reading is highly dispreferred or completely unavailable. A recent study by Musolino (2009) illustrates this point. In a TVJ experiment he reported 7.8% acceptance for the inverse scope reading of ‘Three boys are holding two balloons’. The low acceptance suggests that this reading isn’t available, but, as Musolino himself points out, “this conclusion would be premature as the very low acceptance rate in this case might simply reflect the fact that [inverse scope] is massively dispreferred” (p. 36). This brings us to the main research questions of the present case study. Do comprehenders have access to the inverse interpretation while reading a doubly quantified sentence? If so, is the inverse reading only possible in some constructions but not in others? The strongest claim concerning these questions is stated in the Unlimited Scope Ambiguity Hypothesis in (5).³

2. In this case, the inverse reading corresponds to the logical form in (3).

(5) Unlimited Scope Ambiguity: During the processing of a multiply quantified sentence, all combinatorially possible orders of quantifiers constitute possible readings (i.e. a sentence with n quantifiers should allow n! readings).
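Under (5), the candidate readings of a sentence are simply the permutations of its quantifiers. The following sketch (the function name is hypothetical) enumerates them and deliberately ignores all the constraints set aside in footnote 3.

```python
from itertools import permutations
from math import factorial


def scope_orders(quantifiers):
    """All combinatorially possible scope orders, widest scope first."""
    return [" > ".join(order) for order in permutations(quantifiers)]


print(scope_orders(["∀", "∃!"]))  # ['∀ > ∃!', '∃! > ∀']
print(len(scope_orders(["Q1", "Q2", "Q3"])), factorial(3))  # 6 6
```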

3.2.

The tested constructions

To test this hypothesis we constructed four types of German doubly quantified sentences. A sample item is provided in (6)–(9), with vertical bars indicating segmentation in the ITVJ task. The subject of each sentence always contained the determiner genau ein (‘exactly one’) and the object included jeden dieser (‘each of these’). Sentence conditions (6) and (7) had SVO word order, whereas conditions (8) and (9) were object-topicalized OVS sentences. For each word order, we compared potentially ambiguous sentences (6)/(8) with clearly unambiguous cases (7)/(9) that only have surface scope. The latter were disambiguated by making the quantifiers clause-bounded. These unambiguous controls served as a baseline for the potentially ambiguous conditions. The ambiguous conditions and the unambiguous controls were kept identical from the verb region until the end of the sentence, so that they could be compared with respect to both rejection rates and reading times in the ITVJ task.

3. This is of course a considerable oversimplification. We are completely ignoring issues of absolute scope (e.g. quantifiers that are clause-bounded) and other constraints such as variable binding.


(6) Ambiguous, SVO
    Genau ein Lehrer| lobte| jeden dieser Schüler| …
    exactly one teacher.NOM| praised| each these students.ACC| …
    ‘Exactly one teacher praised each of these students…’

(7) Unambiguous, SVO
    Für genau einen Lehrer gilt:| er| lobte| jeden dieser Schüler| …
    for exactly one teacher holds:| he.NOM| praised| each these students.ACC| …
    ‘For exactly one teacher it holds: he praised each of these students…’

(8) Ambiguous, OVS
    Jeden dieser Schüler| lobte| genau ein Lehrer| …
    each these students.ACC| praised| exactly one teacher.NOM| …
    ‘Each of these students was praised by exactly one teacher…’

(9) Unambiguous, OVS
    Für jeden dieser Schüler gilt:| ihn| lobte| genau ein Lehrer| …
    for each these students holds:| him| praised| exactly one teacher.NOM| …
    ‘For each of these students it holds: he was praised by exactly one teacher…’

(10) …voller| Wohlwollen.
     …full of| goodwill
     ‘…full of goodwill.’

One may ask whether the unambiguous controls really exclude the inverse interpretations. A possible counterargument, pointed out to us by an anonymous reviewer, is that (9) can be continued with an elaboration of a specific referent, as in …nämlich Lehrer Lutz (‘namely teacher Lutz’). We take this to be a matter that is orthogonal to scope. The linear interpretation of (9), i.e. (3), is compatible with a situation in which the same teacher praised all the students. The scopal facts only become clear when we test the sentences in contexts that clearly disambiguate towards either scope reading.

3.3.

Disambiguating contexts

We paired the sentence materials with two types of set diagrams. Sample diagrams are shown in Figure 1. The first type of diagram, ∀∃!-diagrams (Figure 1a), depicted a model satisfying the ∀∃!-reading but inconsistent with the ∃!∀-reading. The second type, ∃!∀-diagrams (Figure 1b), depicted a model satisfying only the ∃!∀-reading but not the ∀∃!-reading. Both sets and the verbal predicate were labeled in the diagrams. In a cross-methodological comparison, Bott & Radó (2007) showed that disambiguation via set diagrams of this sort yields highly reliable and valid results compared to two other methods of disambiguation. In order to keep participants from developing strategies, we varied the depicted situations in both types of diagrams. The ∀∃!-diagrams included some situations with one-to-one mappings between the two sets, but the majority were situations like the one in Figure 1a, which are intuitively less typical models of a ∀∃!-reading. The ∃!∀-diagrams always had additional lines to rule out a ∀∃!-interpretation, but we varied the exact number of additional lines between items. We will come back to the processing consequences of typical vs. atypical models in Experiment 3.

Figure 1. Disambiguating set diagrams used in Experiments 1 and 2: (a) ∀∃! diagram; (b) ∃!∀ diagram (labels translated from German into English).

Crossing the two picture types with the four construction types yielded eight conditions in a 2 × 2 × 2 (word order × ambiguity × reading) within-subjects design. We constructed 32 experimental items, each in eight conditions (sentence-diagram pairs), and distributed them over eight lists in a Latin square design. The same 73 fillers (31 true and 42 false sentence-diagram pairs) were added to each list.

3.4.

Predictions

The Unlimited Scope Ambiguity Hypothesis makes the following predictions. The unambiguous controls should only be compatible with diagrams disambiguating towards surface scope. The ambiguous constructions should be equally compatible with contexts disambiguating towards surface and inverse scope. This pattern should be the same in the OVS and in the SVO conditions. Frey (1993) and Pafel (2005) are more restrictive. Frey predicts the ambiguous OVS condition in (8) to be compatible with both surface scope and inverse scope diagrams, whereas the ambiguous SVO condition in (6) should only be compatible with linear scope. In diagrams disambiguating towards inverse scope, the purportedly ambiguous SVO construction (6) should thus be rejected as often as the unambiguous controls. Pafel, by contrast, predicts exactly the opposite pattern: while the SVO construction in (6) should be ambiguous, the OVS construction in (8) should only allow surface scope. The semantic theories do not allow us to derive any predictions about the processing of these constructions. We hypothesized, though, that the available readings might be more limited in the ITVJ task than in an ordinary TVJ task. In the TVJ task, participants have access to the complete sentence and may thus transform it to fit the context. They may, for instance, change the sentence form to match the picture, e.g. by putting it into passive voice, or apply some semantic transformation not available during online processing. This should be impossible in the ITVJ task, where readers have to decide whether the incoming, yet incomplete sentence fits the context. More concretely, we expected only those readings to be available in the ITVJ task that can be arrived at by means of grammatical operations independently required for parsing. It has been claimed that OVS constructions involve more structure than SVO constructions in that they require a trace and movement of the topicalized object (e.g. Gorrell 2000). The presence of the trace, however, opens up the possibility of reconstructing the topicalized object quantifier in its base position, leading to inverse scope. The reconstruction process can be either syntactically guided (e.g. von Stechow 1991) or purely semantic (Sternefeld 2001). By contrast, reconstruction is impossible in the SVO construction, and inverse scope should thus be impossible online, even though it may be possible offline.
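The competing predictions can be summarized as a small lookup table. In the sketch below, which is our own illustrative encoding rather than part of any of the cited theories, each account is mapped to the set of word orders whose ambiguous condition should accept inverse-scope diagrams above the level of its unambiguous control; the outcomes passed in at the end are hypothetical, not results.

```python
# Word orders whose ambiguous condition each account predicts to allow inverse
# scope; the unambiguous controls (7)/(9) allow surface scope only.
PREDICTED_INVERSE = {
    "Unlimited Scope Ambiguity": {"SVO", "OVS"},
    "Frey (1993)": {"OVS"},
    "Pafel (2005)": {"SVO"},
}


def matching_accounts(observed_inverse):
    """Accounts whose predicted set equals the observed set of word orders."""
    observed = set(observed_inverse)
    return [name for name, predicted in PREDICTED_INVERSE.items()
            if predicted == observed]


# Hypothetical outcomes, for illustration only:
print(matching_accounts({"OVS"}))         # ['Frey (1993)']
print(matching_accounts({"SVO", "OVS"}))  # ['Unlimited Scope Ambiguity']
```

Under the processing hypothesis stated above, the online (ITVJ) pattern is expected to be {"OVS"} even if the offline (TVJ) pattern turns out to be {"SVO", "OVS"}.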

4.

Comparing TVJ and ITVJ

4.1.

Experiment 1 – offline data (TVJs)

The first experiment was an ordinary TVJ task. Its findings served as a point of comparison for evaluating the new method. As outlined above, ambiguity can arise at different stages during interpretation. While ordinary TVJs reflect all stages, ITVJs are much closer to online processing and, in particular, should not reflect readings that come about post-interpretively, that is, only during reasoning about the possible meanings. If the results of an ITVJ task experiment contrast with the results of the present experiment, this will allow us to draw conclusions about whether both the OVS and SVO sentences are ambiguous and will shed light on whether scope ambiguity can already be attested during the online processing of doubly quantified sentences.

4.1.1.

Method

The experiment was a TVJ task experiment in which participants judged how well the sentence fit the diagram without any time pressure. The experiment was administered over the internet using WebExp 2 (Mayo et al. 2006). Participants were tested in a quiet computer lab. They first received written instructions and completed a practice session with ten trials. Then the experiment followed in a single block. The sentence-diagram pairs were presented in random order. Acceptability judgments had to be provided on a seven-point scale. This was done to give participants the opportunity to indicate imperfect matches by intermediate values. The experiment took about 20 minutes. 48 German native speakers from the faculty of modern languages (mean age: 24.7; range: 20–33; 32 female) participated in the study for a payment of €5. We normalized the data by computing z-scores for each participant.

Table 1. Mean judgments (standard errors of the mean in parentheses) in Experiment 1.

                 Surface scope ratings           Inverse scope ratings
                 untransformed   normalized      untransformed   normalized
Ambig. SVO       6.09 (0.15)     0.42 (0.05)     2.89 (0.22)     −0.79 (0.08)
Unambig. SVO     6.68 (0.09)     0.65 (0.03)     1.67 (0.14)     −1.26 (0.06)
Ambig. OVS       6.13 (0.15)     0.44 (0.06)     3.41 (0.20)     −0.58 (0.07)
Unambig. OVS     6.46 (0.12)     0.57 (0.04)     2.78 (0.19)     −0.82 (0.07)
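The per-participant z-score normalization described under Method, and the surface-minus-inverse differences that the discussion of Experiment 1 relies on, can be sketched as follows. This is a minimal illustration under stated assumptions: the raw-rating data layout and the function name are hypothetical, and we assume the sample standard deviation, since the text does not say which estimator was used. The normalized means are copied from Table 1.

```python
from statistics import mean, stdev


def zscore_by_participant(ratings):
    """Z-transform each participant's raw ratings separately.

    ratings: dict mapping a (hypothetical) participant ID to that participant's
    raw 1-7 ratings. Uses the sample standard deviation (an assumption).
    """
    normalized = {}
    for pid, values in ratings.items():
        m, sd = mean(values), stdev(values)
        if sd == 0:
            # A participant who gave the same rating throughout has no spread.
            normalized[pid] = [0.0 for _ in values]
        else:
            normalized[pid] = [(v - m) / sd for v in values]
    return normalized


# Normalized condition means (surface, inverse) from Table 1.
TABLE1_Z = {
    ("ambig", "SVO"): (0.42, -0.79),
    ("unambig", "SVO"): (0.65, -1.26),
    ("ambig", "OVS"): (0.44, -0.58),
    ("unambig", "OVS"): (0.57, -0.82),
}

# Surface-scope preference: surface minus inverse mean, per condition.
preference = {cond: round(s - i, 2) for cond, (s, i) in TABLE1_Z.items()}

# Mean inverse-scope z-score per word order (the order x reading interaction).
inverse_by_order = {
    order: round(mean(TABLE1_Z[(amb, order)][1] for amb in ("ambig", "unambig")), 3)
    for order in ("SVO", "OVS")
}

print(zscore_by_participant({"p01": [1, 4, 7]}))  # {'p01': [-1.0, 0.0, 1.0]}
print(preference)
print(inverse_by_order)
```

The preference values order the conditions as the Results describe (unambiguous SVO > unambiguous OVS > ambiguous SVO > ambiguous OVS), and the word-order means reproduce the inverse-scope z-scores of −0.70 (OVS) and −1.03 (SVO) up to rounding.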

4.1.2.

Results

Table 1 presents the mean judgments in the experimental conditions. Surface scope was generally preferred to inverse scope. This preference was modulated by sentence type: it was strongest in the unambiguous SVO sentences, followed by the unambiguous OVS construction. In the ambiguous conditions the surface scope preference was weaker, but again it was more pronounced in the SVO than in the OVS sentences. A repeated measures ANOVA revealed a reliable main effect of reading (F1(1,47) = 183.70, p < 0.01; F2(1,35) = 1144.25, p < 0.01), reflecting the surface scope preference. There was also a significant main effect of order (F1(1,47) = 8.38, p < 0.01; F2(1,35) = 23.45, p < 0.01) and a marginal main effect of ambiguity (F1(1,47) = 4.58, p < 0.05; F2(1,35) = 3.74, p = 0.06). The former was due to OVS sentences being rated better than SVO sentences, and the latter to higher ratings for ambiguous than for unambiguous sentences. There was also an interaction of order and reading (F1(1,47) = 6.88, p < 0.05; F2(1,35) = 18.26, p < 0.01), which was due to higher ratings in the inverse OVS conditions (mean z-score: −0.70) than in the inverse SVO conditions (mean z-score: −1.03). Most importantly, the analysis revealed a significant interaction of reading and ambiguity (F1(1,47) = 55.28, p < 0.01; F2(1,35) = 43.51, p < 0.01) as well as a marginal three-way interaction of order, reading and ambiguity (F1(1,47) = 3.75, p = 0.059; F2(1,35) = 2.90, p = 0.099). The two-way interaction reflects stronger surface scope preferences in the unambiguous than in the ambiguous conditions. The three-way interaction was due to the fact that this pattern was more clear-cut in the SVO than in the OVS sentences. In separate analyses of the SVO conditions only, we found a reliable main effect of reading (F1(1,47) = 474.45, p < 0.01; F2(1,35) = 956.23, p < 0.01), reflecting the general surface scope preference. Furthermore, the main effect of ambiguity (F1(1,47) = 4.14, p < 0.05; F2(1,35) = 3.90, p = 0.06) was reliable by participants, but marginal by items. It was due to higher ratings in the ambiguous than in the unambiguous conditions. Crucially, the interaction of reading and ambiguity was also significant (F1(1,47) = 44.03, p < 0.01; F2(1,35) = 33.16, p < 0.01), because the surface scope preference was stronger in the unambiguous than in the ambiguous conditions. Pairwise comparisons revealed significant differences both in the inverse (t1(47) = −11.10, p < 0.01; t2(35) = −14.78, p < 0.01) and in the surface scope conditions (t1(47) = 5.47, p < 0.01; t2(35) = 6.85, p < 0.01). ANOVAs analyzing only the OVS conditions revealed a reliable main effect of reading.
This effect was due to the fact that surface scope diagrams were rated better than inverse scope diagrams (F1(1,47) = 165.57, p < 0.01; F2(1,35) = 345.19, p < 0.01). We also found that the surface scope preference was reliably stronger in the unambiguous than in the ambiguous conditions, leading to a significant interaction of reading and ambiguity (F1(1,47) = 10.24, p < 0.01; F2(1,35) = 8.15, p < 0.01). The interaction was somewhat weaker than in the SVO conditions. In pairwise comparisons, both the inverse (t1(47) = −5.29, p < 0.01; t2(35) = −4.65, p < 0.01) and the surface scope conditions (t1(47) = 3.41, p < 0.01; t2(35) = 3.43, p < 0.01) differed significantly from one another.

4.1.3.

Discussion

In the present experiment, both the ambiguous OVS and the ambiguous SVO conditions allowed an inverse interpretation: with inverse scope diagrams, they were rated above their unambiguous baseline controls in both construction types. At first sight, this provides evidence in favor of the Unlimited Scope Ambiguity Hypothesis. What remains unexplained, however, is the preference for surface scope. Unlimited scope ambiguity would lead us to expect both readings to be equally available. What is clearly needed is a gradient notion of scope that can also account for graded preferences. In contrast to Frey’s (1993) predictions about doubly quantified SVO sentences, the ambiguous SVO construction was compatible with inverse scope. The SVO sentences even seemed to be judged “more ambiguous” than the OVS sentences. Our findings are not compatible with Pafel’s (2005) theory either. Contrary to the predictions derived from his model, the OVS construction was ambiguous. With respect to the ambiguous SVO construction, though, the findings nicely fit the graded preferences predicted by this model.

4.2.

Experiment 2 – online data (ITVJs)

The findings of the first experiment show that readers are able to compute inverse interpretations when the sentence and the diagram are both present and there is no time limit. We cannot decide, however, whether readers were able to do so during reading or only at a later stage. To gain access to the intermediate stages, we conducted an ITVJ version of the experiment. This might prove insightful in two respects. On the methodological side, diverging results between the ordinary TVJ and the ITVJ task would demonstrate that the two tasks in fact tap into different stages of sentence interpretation. On the semantic side, different results in the two tasks are relevant for the debate about quantifier scope ambiguities. To anticipate the findings, we will suggest that the fundamentally divergent intuitions reported in the literature can be reconciled once we take into account the cognitive effort that theoreticians may have put into their judgments.

4.2.1.

Participants and procedure

40 native German speakers (mean age 25.6, range 20–67, 32 female) from Tübingen University participated in the study. They were naïve to the purpose of the study. Each participant received €8 compensation. The ITVJ task experiment began with a practice session of ten trials. Feedback was provided only during the practice session. Three experimental blocks followed. Presentation order was randomized between and within blocks. Participants were tested individually. An experimental session took approximately 30 minutes.

4.2.2.

Statistical analysis

Cumulative rejections on the verb region and the region containing the second quantifier were analyzed using logit mixed effects models with fixed effects of word order, ambiguity, reading and their interactions, as well as random intercepts for items and participants (cf. Jäger 2008). RTs of “yes, go on” judgments on the second quantifier region were analyzed in the conditions with diagrams disambiguating towards surface scope. We limited this analysis to these conditions because the other conditions received “no” responses on the majority of trials. We trimmed RTs longer than 3500 ms or shorter than 100 ms. RTs were analyzed in a linear mixed effects model with fixed effects of word order and ambiguity, their interaction, and random intercepts for items and participants (cf. Baayen, Davidson & Bates 2008). We limited the statistical analysis to regions up to the point at which one of the experimental conditions was massively aborted.⁴

4.2.3.

Results

Mean cumulative rejection rates are presented segment by segment for SVO sentences in Figure 2 and for OVS sentences in Figure 3. Prior to the region containing the second quantifier, all rejection rates were below 8%. From the second quantifier onwards there were clear differences in cumulative rejection rates, with an overwhelming preference for surface scope diagrams. At the second quantifier, the unambiguous SVO conditions had been rejected more often after inverse diagrams than after surface scope diagrams (64.4% vs. 5.0%). The pattern was the same but numerically even more pronounced in the ambiguous SVO conditions, with 72.5% cumulative rejections after inverse diagrams and 4.4% after surface scope diagrams. The preference for surface scope readings was less clear-cut in the OVS conditions. At the second quantifier, the unambiguous OVS construction had been rejected 64.4% of the time after an inverse and 30.6% after a surface scope diagram. The surface scope preference was even less pronounced in the ambiguous OVS condition. Here the second quantifier region received 56.9% cumulative rejections after an inverse and 36.3% after a surface scope diagram.
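The size of the surface scope preference in each condition can be read off the descriptive rates as a simple difference. The sketch below is purely descriptive: the dictionary layout and function name are hypothetical, the rates are the cumulative rejections at the second quantifier reported above, and it does not reproduce the logit mixed effects analysis.

```python
# Cumulative rejection rates (%) at the second-quantifier region, as reported
# for Experiment 2: (word order, ambiguity) -> rates after each diagram type.
REJECTIONS = {
    ("SVO", "unambig"): {"inverse": 64.4, "surface": 5.0},
    ("SVO", "ambig"): {"inverse": 72.5, "surface": 4.4},
    ("OVS", "unambig"): {"inverse": 64.4, "surface": 30.6},
    ("OVS", "ambig"): {"inverse": 56.9, "surface": 36.3},
}


def surface_advantage(rates):
    """Inverse minus surface rejections, in percentage points.

    Larger values mean a stronger preference for the surface scope reading.
    """
    return round(rates["inverse"] - rates["surface"], 1)


advantage = {cond: surface_advantage(r) for cond, r in REJECTIONS.items()}
for (order, amb), value in advantage.items():
    print(f"{order} {amb:>7}: {value:5.1f} points")
```

The differences shrink from SVO to OVS, and within OVS from the unambiguous control to the ambiguous condition, which is the descriptive counterpart of the interactions reported below.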
4. Statistical analyses on later regions are difficult to interpret because the number of still possible rejections may vary between conditions. This becomes immediately clear in the extreme case. Suppose that early in the sentence one condition is aborted 100% of the time, but in another condition participants keep on reading. If the latter condition is later aborted, say, 80% of the time, we cannot make anything of this because there would be no data to sensibly compare it to. In the following, we will therefore analyze rejections and judgment times using inferential statistics only up to the point where participants massively aborted the trials. For the rest of the sentence we will report only the descriptive statistics.

Figure 2. Cumulative rejection rates (SVO sentences) in Experiment 2.

Figure 3. Cumulative rejection rates (OVS sentences) in Experiment 2.

The fact that OVS sentences were rejected more often than SVO sentences at the second quantifier led to a reliable effect of word order (estimate = −2.81; z = −6.3; p < 0.01) in the logit mixed effects model analysis. In addition, inverse diagrams were rejected more often than surface scope diagrams. This resulted in a reliable effect of reading (estimate = 1.04; z = −4.1; p < 0.01). The interactions of word order and reading (estimate = 3.68; z = 7.1; p < 0.01) and of ambiguity and reading (estimate = −0.7; z = −1.9; p = 0.05) were also significant. We will come back to these effects when we discuss the models analyzing the two word orders separately. The three-way interaction of word order, reading and ambiguity was marginally significant (estimate = 1.32; z = 1.8; p = 0.07), suggesting that the amount of ambiguity differed between SVO and OVS constructions. To further break down the three-way interaction, we computed separate logit mixed effects analyses for the SVO and the OVS conditions. The analysis of the SVO conditions revealed a significant effect of reading (estimate = 4.80; z = 9.96; p < 0.01), reflecting the preference for surface scope. Neither the effect of ambiguity nor the interaction of reading and ambiguity was reliable (|z| < 1). Numerically, the difference between surface scope and inverse scope conditions was even bigger in the purportedly ambiguous cases than in the disambiguated controls. In the OVS analysis, the effect of reading was again significant (estimate = −1.07; z = −4.21; p < 0.01), showing an across-the-board preference for surface scope. The interaction of reading and ambiguity was also significant (estimate = −0.72; z = −1.97; p < 0.05). It was due to smaller differences between surface scope and inverse diagrams in the ambiguous than in the unambiguous conditions. A one-sided pairwise comparison revealed that the difference between the ambiguous inverse scope condition and its corresponding control was significant (estimate = −0.52; z = −1.8; p < 0.05) and the difference between the two surface scope conditions was marginal (estimate = −0.44; z = −1.53; p = 0.06). As predicted, readers were thus able to compute inverse scope in the ambiguous OVS constructions while reading the sentence. More evidence that scope interaction only took place in the OVS sentences comes from the analyses of RTs (depicted in Figure 4). Up to the verb region the surface scope conditions did not differ from each other (|t| < 0.5). On the second quantifier region there were clear differences in RTs.
While the unambiguous OVS condition had a mean RT of 1615 ms, the ambiguous OVS condition had a mean RT of 2420 ms. Unambiguous and ambiguous SVO conditions had mean RTs of 1167 ms and 1290 ms, respectively. The statistical analysis revealed reliable effects of word order (estimate = −1709; t = −5.01; p < 0.01) and of ambiguity (estimate = −1324; t = −3.67; p < 0.01). These main effects reflect the fact that on the second quantifier region it took participants longer to respond “yes” in the OVS than in the SVO conditions and in ambiguous than in unambiguous sentences, respectively. Crucially, the interaction of word order and ambiguity was also reliable (estimate = 615; t = 2.91; p < 0.01). The interaction is due to the fact that in the OVS conditions the second quantifier had longer RTs in the ambiguous than in the unambiguous condition, whereas RTs were indistinguishable between the SVO conditions. A pairwise comparison analyzing only the OVS conditions revealed a reliable effect of ambiguity (estimate = −639; t = −3.05; p < 0.01). Another comparison analyzing only the SVO conditions did not reveal a reliable effect (t = −1.01; p = 0.32). The pattern was the same at the subsequent segments.

Figure 4. Judgment RTs of “yes” judgments in Experiment 2 (linear scope conditions only).

4.2.4.

Discussion

There were interesting differences between TVJs and ITVJs. For SVO sentences, the new method did not yield any indication of ambiguity. The purportedly ambiguous SVO construction was indistinguishable from the unambiguous controls both in rejection rates and in judgment times. The findings thus provide evidence against the Unlimited Scope Ambiguity Hypothesis. They instead support Frey’s (1993) account, which predicted that doubly quantified German SVO sentences would only have surface scope. Nevertheless, we were able to detect ambiguity in the OVS constructions. In this respect, the data pattern fits the results of the previous experiment: the surface scope preference was not as strong as in the unambiguous control. Finding this interaction in the ITVJ task, too, indicates that the inverse interpretation was already accessible during reading. Further evidence for this claim comes from RTs. We found longer RTs for “yes, go on” button presses in the ambiguous OVS construction than in the disambiguated control. This may be due to resolving a scope conflict in the ambiguous construction. Again, our findings are compatible with Frey’s (1993) scope theory. However, the strong preference for surface scope remains unexplained under his account. The similarities and differences between the two methods fit our processing assumptions. In Experiment 2 the inverse reading was only available in the object-topicalized construction, where reconstruction is independently assumed, but not in SVO constructions, which lack this possibility. This result has implications in two directions. First, by comparing TVJs to ITVJs it is possible to disentangle readings that are only available post-interpretively from those accessible during online interpretation. Second, semanticists should pay attention to processing, because intuitions may differ completely depending on how deeply a given sentence is processed. If we distinguish readings available during ‘ordinary’ comprehension from readings that only become available post-interpretively, it becomes possible to explain the variation of introspective scope judgments in the semantic literature. A possible objection to these conclusions could be that, due to enhanced task complexity, ITVJs are noisier than TVJs, or that dispreferred readings are generally less available in the ITVJ task than under normal circumstances. Neither of these alternatives can explain our data in all their particulars, because in the ITVJ task the readings that were possible in the first experiment were filtered selectively. In the TVJ task the relative difference between the preferred and the dispreferred reading was even smaller in the ambiguous SVO sentences than in the OVS sentences. In the present ITVJ experiment, however, the dispreferred reading was filtered only in the former construction. This is completely unexpected under the assumption of a general shift in preference towards the preferred interpretation. At first glance, however, two aspects of the data seem problematic. The first is that at the end of the sentence the ambiguous OVS construction under the inverse reading was rejected nearly as often as the corresponding control. Should we really assume that the OVS sentences were ambiguous, then?
We replicated the OVS part⁵ of the present experiment to see whether the interaction of reading and ambiguity in the OVS conditions was stable across experiments. Again, this interaction turned out to be reliable, this time even up to the last segment of the sentence. A second concern is that the unambiguous OVS construction was falsely rejected in more than 40% of the cases. In Experiment 3, we investigated whether these errors might be due to the atypicality of the disambiguating diagrams.

5. In this replication we tested ∀∃! diagrams that contained no extra objects or branching lines (cf. Experiment 3). Participants made practically no errors (< 5%) with these diagrams, lending further support to the findings of Experiment 3.
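The word order × ambiguity interaction reported for Experiment 2 can be made concrete as a difference of differences over the four mean “yes” RTs at the second quantifier. The Python sketch below is purely descriptive arithmetic on those reported cell means. The mixed model estimates (e.g. 615 for the interaction) were computed on trial-level data with random effects and a contrast coding not stated in this section, so they are not expected to equal these raw differences.

```python
# Mean RTs (ms) of "yes" judgments at the second quantifier, Experiment 2
MEAN_RT = {
    ("OVS", "unambiguous"): 1615,
    ("OVS", "ambiguous"): 2420,
    ("SVO", "unambiguous"): 1167,
    ("SVO", "ambiguous"): 1290,
}

def ambiguity_effect(order):
    """Simple effect of ambiguity within one word order (ambiguous minus unambiguous)."""
    return MEAN_RT[(order, "ambiguous")] - MEAN_RT[(order, "unambiguous")]

ovs = ambiguity_effect("OVS")    # 2420 - 1615
svo = ambiguity_effect("SVO")    # 1290 - 1167
interaction = ovs - svo          # difference of differences
print(f"OVS: {ovs} ms, SVO: {svo} ms, interaction: {interaction} ms")
# prints: OVS: 805 ms, SVO: 123 ms, interaction: 682 ms
```

A large ambiguity cost in OVS (805 ms) next to a small one in SVO (123 ms) is the pattern that the reliable interaction captures.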

5. Experiment 3: Typicality

Why did participants falsely reject the unambiguous OVS condition almost 40% of the time following a picture compatible with its only possible reading? Should we nevertheless rely on a task that gives rise to so many errors? In the present experiment we investigated different kinds of disambiguating situations to find out what might have gone wrong. We will see that results become perfectly unbiased, even for the unambiguous OVS surface scope condition, if we keep the disambiguations as simple as possible. Up to now, we have been equating the meaning of an expression with the particular situation used for disambiguation. This may not be entirely correct, since a meaning corresponds to a whole set of situations, and some situations or models may be easier to evaluate with respect to a certain meaning than others. Thus, presenting a model that is far from the most typical scenario may lead to more errors than presenting a more typical one. Consider the ∀∃!-diagrams in Figure 1a again. Reinspecting them reveals two features we did not pay attention to when we constructed the materials, and which might have caused the errors. Both features are illustrated in Figure 1a. First, the restrictor set contains an additional object (an ‘extra’ teacher), so there is no one-to-one correspondence between the two sets. Second, the model includes a branching configuration (i.e. the same teacher praising two students). In the present experiment we manipulated these two features in a factorial design while keeping the sentences and their meaning constant. Interestingly, one of these features has already been studied extensively in the acquisition literature on universal quantification. Since the pioneering work of Inhelder & Piaget (1959) it has been established that children give certain non-adult responses when interpreting sentences containing a universal quantifier.
One finding is that children tend to judge sentences like “every boy is riding an elephant” to be false in a context in which each boy is riding an elephant and there is an extra elephant without a rider, but behave like adults when no extra elephant is present in the scene. Thus, children seem not only to restrict the quantificational domain to the set of boys but also, wrongly, to take the whole set of elephants into account. In comparison to an ordinary TVJ task, the incremental version arguably puts more cognitive load on the participants: while keeping the context in memory, they simultaneously have to read the incoming sentence and relate it to the picture. On closer scrutiny, the ITVJ task is a multiple-task setting, and it thus seems very likely to give rise to errors that would not occur under ‘normal’ circumstances. From a methodological perspective, it is therefore important to know where the limits of the new method lie. As a side effect of establishing these limits, we will demonstrate that the method can be used to gain insights into the default interpretation associated with a particular meaning.

5.1. Method

We tested the 32 experimental sentences from the previous experiment in the unambiguous OVS conditions. For each item, we drew four new ∀∃!-diagrams; Figure 5 depicts sample diagrams. We manipulated the presence of extra (i.e. unconnected) objects in the restrictor set and the presence of branching lines (i.e. an element in the set on the left that is connected to two elements on the right) in a 2×2 factorial design (extra object (±eo) × branching line (±bl)). In the simplest condition, [−eo, −bl], the diagrams had neither an extra object nor a branching line. In the [+eo, −bl] condition the diagrams included an extra object. The [−eo, +bl] diagrams had a branching line but no extra object. The [+eo, +bl] diagrams had both features. We added the fillers from the first experiments and created counterbalanced lists according to a Latin square. The procedure was the same as in the previous experiment. 40 native German speakers from Tübingen University (mean age 23.9, range 19–35, 31 female) took part in the study for a payment of 8 €.
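Latin square counterbalancing ensures that each participant sees every item only once, in one condition, while each list contains every condition equally often. The Python sketch below shows one common rotation scheme for 32 items and the four conditions of this experiment. The rotation rule (item i in list k gets condition (i + k) mod 4) and the function name are our illustrative assumptions, not necessarily the exact assignment the authors used.

```python
from collections import Counter

CONDITIONS = ["[-eo,-bl]", "[+eo,-bl]", "[-eo,+bl]", "[+eo,+bl]"]

def latin_square_lists(n_items, conditions):
    """Return one list per condition; in list k, item i gets condition (i + k) mod n."""
    n = len(conditions)
    return [
        [(item, conditions[(item + k) % n]) for item in range(n_items)]
        for k in range(n)
    ]

lists = latin_square_lists(32, CONDITIONS)
for k, lst in enumerate(lists):
    counts = Counter(cond for _, cond in lst)
    print(f"list {k + 1}:", dict(counts))   # each condition appears 8 times
```

Across the four lists every item occurs once in each condition, so item and condition are not confounded.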

Figure 5. ∀∃! set diagrams in Exp. 3 (eo = extra object; bl = branching line). Panels: a) [−eo, −bl]; b) [+eo, −bl]; c) [−eo, +bl]; d) [+eo, +bl].
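Because rejection is a binary outcome, the analysis in Section 5.2 is run on the logit scale rather than on percentages (cf. Jaeger 2008). As a rough, purely descriptive illustration, the Python sketch below converts the four second-quantifier rejection rates reported in 5.2 into empirical logits and computes the simple effects and the interaction contrast with [−eo, −bl] as the baseline. It pools over participants and items and ignores the random intercepts, so its values should only be in the same range as the model estimates, not reproduce them.

```python
import math

# Cumulative rejection rates at the second quantifier region (Section 5.2)
RATES = {
    ("-eo", "-bl"): 0.006,
    ("+eo", "-bl"): 0.075,
    ("-eo", "+bl"): 0.363,
    ("+eo", "+bl"): 0.318,
}

def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

L = {cell: logit(p) for cell, p in RATES.items()}

# Simple effects relative to the [-eo, -bl] baseline (treatment coding)
branching = L[("-eo", "+bl")] - L[("-eo", "-bl")]
extra = L[("+eo", "-bl")] - L[("-eo", "-bl")]
# Interaction: how much the extra-object effect changes when a branching line is present
interaction = (L[("+eo", "+bl")] - L[("-eo", "+bl")]) - extra

print(round(branching, 2), round(extra, 2), round(interaction, 2))
```

The empirical contrasts come out at about 4.55, 2.6 and −2.8. This shows the same qualitative pattern as the model estimates (5.00, 2.68, −2.93): a large branching-line effect, a smaller extra-object effect, and a negative interaction.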

5.2. Results

Error rates clearly differed between conditions. At the second quantifier region, [−eo, −bl] was rejected only 0.6% of the time, [+eo, −bl] 7.5% of the time, [−eo, +bl] 36.3% and [+eo, +bl] 31.8% of the time. Thus, participants made basically no mistakes in the [−eo, −bl] condition; errors only occurred when extra objects or branching lines were present. The end-of-sentence rejection rates followed the same pattern: 1.25%, 11.25%, 44.38% and 45.00%, respectively. We computed a logit mixed effects model analyzing cumulative rejection rates at the second quantifier region, with branching line, extra object and their interaction as fixed effects and with random intercepts for participants and items. The statistical analysis revealed significant main effects of branching line (estimate = 5.00; z = 4.55; p < 0.01) and of extra object (estimate = 2.68; z = 2.38; p < 0.05) and a significant interaction between the two factors (estimate = −2.93; z = −2.53; p < 0.05). To further analyze the effect of branching line we computed pairwise comparisons. Comparing the [−eo, −bl] with the [−eo, +bl] condition revealed a significant effect (estimate = 5.7; z = 4.13; p < 0.01). A second model comparing only the [+eo, −bl] with the [+eo, +bl] condition also yielded a significant effect (estimate = 2.03; z = 5.35; p < 0.01). Thus, pictures with a branching line led to an increase in rejections irrespective of the presence or absence of an extra object. Two more pairwise comparisons investigated the influence of extra objects. The first compared the [−eo, −bl] with the [+eo, −bl] condition. It revealed that the presence of extra objects likewise resulted in a significant increase in rejection rate (estimate = 5.19; z = 2.49; p < 0.05). Another model comparing the [−eo, +bl] with the [+eo, +bl] condition revealed no reliable difference between the two conditions (effect of extra object: estimate = −0.23; z = 0.25; p > 0.05). Thus, although there was a reliable extra object effect when no branching line was present, there was no such effect when the picture had a branching line.

5.3. Discussion

This experiment demonstrates the limits of ITVJs: we have to be careful when constructing the materials to be tested in the task. Choosing the simplest disambiguating context, i.e. the [−eo, −bl] diagrams, yielded perfectly unbiased results, but more complex models led to a large number of errors. For constructing materials to be tested in an ITVJ experiment, we therefore suggest first considering the most typical model associated with the meaning under investigation. The disambiguating context should then be constructed so as to adhere to this default model as closely as possible. Presumably, Experiment 2 would have yielded perfectly unbiased results if we had used ∀∃!-diagrams with a one-to-one mapping between the two sets.

6. Conclusions

We have proposed a new method, the ITVJ task, which is intended to track the available interpretations and their relative preferences during the processing of semantic ambiguity. Our case study revealed that the method can be applied to phenomena as difficult and subtle as quantifier scope, for which judgments are often far from clear. With respect to the computation of a dispreferred reading, we distinguished between readings that become available online during reading and readings that only become accessible later, that is, post-interpretively. The comparison of ITVJs and TVJs in the first two experiments suggests that the inverse readings of SVO and OVS doubly quantified sentences are generated at different processing stages. While the inverse reading was available during the reading of OVS sentences, it was completely absent in the online processing of SVO sentences. We do not see how we could have arrived at this result using any other method, since in both OVS and SVO constructions the inverse reading was massively dispreferred. Our findings have important implications for semantic theories of quantifier scope that go well beyond the three theories selected for this case study. In the first two experiments we observed a strong preference for surface scope. Only very few of the existing theories of quantifier scope can account for the graded nature of these preferences; in our fairly limited sample, only Pafel’s (2005) model could. Moreover, when we considered the obtained data in their entirety, none of the three approaches was fully successful. Extending the discussion to theories of quantifier interaction in general, there is, at least to our knowledge, no model to date that can fully account for the observed pattern of scope interpretations. The lack of a descriptively adequate model is even more acute if we want to understand at which processing stages dispreferred scope readings become available.
As a first step, we proposed that grammatical mechanisms such as reconstruction can make a dispreferred interpretation accessible during online comprehension. Finally, we demonstrated that it is crucial to carefully consider the disambiguating contexts to be tested in ITVJ experiments. We showed that readers are not able to compensate for atypical models. Of course, this opens up another possible application of the task, namely to investigate what the most typical model of a given reading might be.

References

Baayen, R. Harald, Douglas J. Davidson & Douglas M. Bates 2008 Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language 59: 390–412.
Bott, Oliver & Janina Radó 2007 Quantifying quantifier scope: A cross-methodological comparison. In: Roots: Linguistics in Search of its Evidential Basis. Sam Featherston & Wolfgang Sternefeld (eds.), 53–74. Berlin/New York: de Gruyter.
Caplan, David & Gloria S. Waters 1999 Verbal working memory and sentence comprehension. Behavioral and Brain Sciences 22 (1): 77–94.
Conroy, Anastasia M. 2008 The role of verification strategies in semantic ambiguity resolution in children and adults. Dissertation, University of Maryland.
Cooper, Roger M. 1974 The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory and language processing. Cognitive Psychology 6: 84–107.
Crain, Stephen & Rosalind Thornton 1998 Investigations in universal grammar: A guide to experiments on the acquisition of syntax and semantics. Cambridge, MA: MIT Press.
Fodor, Jerry A. 1983 The modularity of mind. Cambridge, MA: MIT Press.
Frey, Werner 1993 Syntaktische Bedingungen für die semantische Interpretation: Über Bindung, implizite Argumente und Skopus. Berlin: Akademie Verlag.
Gorrell, Paul 2000 The subject-before-object preference in German clauses. In: German sentence processing. Barbara Hemforth & Lars Konieczny (eds.), 25–64. Dordrecht: Kluwer.
Inhelder, Bärbel & Jean Piaget 1959 The early growth of logic in the child. London: Routledge.
Ioup, Georgette 1975 Some universals for quantifier scope. In: Syntax and Semantics (Vol. 4). J. Kimball (ed.), 37–58. New York: Academic Press.
Jaeger, T. Florian 2008 Categorical data analysis: Away from ANOVAs (transformation or not) and towards Logit Mixed Models. Journal of Memory and Language 59: 434–446.
Johnson-Laird, Philip 1969 On understanding logically complex sentences. Quarterly Journal of Experimental Psychology 21: 1–13.
Kurtzman, Howard S. & Maryellen C. MacDonald 1993 Resolution of quantifier scope ambiguities. Cognition 48: 243–279.
May, Robert 1977 The grammar of quantification. Ph.D. dissertation, MIT.
Mayo, Neil, Martin Corley & Frank Keller 2006 WebExp 2 – experimenter’s manual. University of Edinburgh.
Montague, Richard 1973 The proper treatment of quantification in English. In: Approaches to Natural Language. Jaakko Hintikka, Julius Moravcsik & Patrick Suppes (eds.), 221–242. Dordrecht: Reidel.
Musolino, Julien 2009 The logical syntax of number words: Theory, acquisition and processing. Cognition 111: 24–45.
Pafel, Jürgen 2005 Quantifier scope in German. Amsterdam/Philadelphia: John Benjamins.
Pickering, Martin J., Brian McElree, Steven Frisson, Lillian Chen & Matthew J. Traxler 2006 Underspecification and aspectual coercion. Discourse Processes 42 (2): 131–155.
Pylkkänen, Liina & Brian McElree 2006 The syntax-semantic interface: Online composition of sentence meaning. In: Handbook of Psycholinguistics, 2nd ed. Matthew J. Traxler & Morton Ann Gernsbacher (eds.), 539–580. Amsterdam: Elsevier.
Reinhart, Tanya 1983 Anaphora and semantic interpretation. London: Croom Helm.
Sanford, Anthony J. & Patrick Sturt 2002 Depth of processing in language comprehension: Not noticing the evidence. Trends in Cognitive Sciences 6 (9): 382–386.
Sternefeld, Wolfgang 2001 Semantic vs. syntactic reconstruction. In: Linguistic form and its computation. Hans Kamp, Antje Rossdeutscher & Christian Rohrer (eds.), 145–182. Stanford: CSLI.
Szabolcsi, Anna 2010 Quantification. Research Surveys in Linguistics. Cambridge: Cambridge University Press.
Tanenhaus, Michael K., Michael J. Spivey-Knowlton, Kathleen M. Eberhard & Julie E. Sedivy 1995 Integration of visual and linguistic information in spoken language comprehension. Science 268: 1632–1634.
Tanenhaus, Michael K., Julie E. Boland, Susan M. Garnsey & Greg N. Carlson 1989 Lexical structure in parsing long-distance dependencies. Journal of Psycholinguistic Research 18 (1): 37–50.
Todorova, Marina, Kathy Straub, William Badecker & Robert Frank 2000 Aspectual coercion and the online computation of sentential aspect. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society, Chicago, 3–8.
von Stechow, Arnim 1991 Syntax und Semantik. In: Semantik – Ein internationales Handbuch zeitgenössischer Forschung. Arnim von Stechow & Dieter Wunderlich (eds.), 90–148. Berlin/New York: de Gruyter.

Measuring Syntactic Priming in Dialogue Corpora

Christian Pietsch, Armin Buch, Stefan Kopp & Jan de Ruiter

1. Introduction

1.1. What is syntactic priming, and why do we care whether it exists?

1.1.1. Priming

Priming, as studied by psychologists, can be defined as the effect that one occurrence of a stimulus (the prime) influences the processing of a subsequent stimulus (the target) (paraphrased after Tulving & Schacter 1990, p. 301). Stimuli can have an external (other-priming) or internal (self-priming) origin. The influence on processing can be inhibitory or facilitative. Inhibitory effects have seldom been reported, and we will not discuss them here. The word “subsequent” implies a temporal distance between prime and target, which can be measured in different ways, e.g. in time or in intervening items. In this study, we measure distance in seconds.

1.1.2. Linguistic priming

Priming effects have been assumed (and observed) for several traditional levels of linguistic description, including lexis and syntax. Potentially, these effects offer a way to investigate linguistic units (and their status as such): the building blocks of a linguistic theory that claims cognitive adequacy can be expected to exhibit priming. We concentrate on lexical priming and syntactic priming¹. Lexical priming can be observed more directly, and is therefore far better attested than syntactic priming (see section 1.1.3). An apparent syntactic similarity might be explained solely by the sharing of similar lexical items. This would then not be a genuine case of syntactic priming, but of lexical priming (an insight which might be more obvious to those working within a lexicalist framework, where each lexical head determines the constructions it occurs in).

1. Terminological note: We choose to use the relatively established term “syntactic priming”, although some (Bock 1986; Szmrecsanyi 2005) argue that “syntactic persistence” would be more adequate. Another term, “structural priming”, is sometimes used interchangeably, but could also be used outside syntax and even outside linguistics for entirely different phenomena (Pickering & Ferreira 2008, pp. 427–428).

1.1.3. Syntactic priming

Syntactic priming has been defined succinctly and informally as “the tendency to reuse syntactic constructions” (Gries 2005), or more generally “as the proposal that processing a particular syntactic structure within a sentence affects the processing of the same (or a related) syntactic structure within a subsequently presented sentence” (Branigan et al. 1995). In a landmark experiment, Bock (1986) first established syntactic production–production priming (a form of syntactic self-priming). The result was immediately interpreted as evidence for the autonomy of syntax and against the functionalist view (held e.g. by Bates & MacWhinney 1982), which considers syntactic knowledge a mere derivative of semantic or superficial properties of utterances. Since then, many psycholinguistic theories of speech production have attempted to take syntactic (self-)priming into account (Pickering & Ferreira 2008). More recently, syntactic priming results have also been used to judge the comparative merits of different grammar formalisms (Reitter et al. 2006a). In contrast, evidence for syntactic other-priming has emerged only since 1998 (Potter & Lombardi 1998; Branigan et al. 2000), and mainly from lab experiments. This evidence suggests that comprehension and production processes operate on the same (or closely connected) representations. It has also been used to support the assumption that priming drives dialogue (Pickering & Garrod 2004). However, the corpus studies on syntactic priming that we are aware of paint an inconsistent picture, especially with respect to comprehension–production priming. We mention some of them in section 1.2 below. As for the cause of priming effects, there are two prominent explanations.
Many believe that priming effects are caused by transient activation spreading in the neural networks of the brain (Branigan et al. 2000; Pickering & Garrod 2004; Jaeger & Snider 2008). Others see the cause of priming effects in implicit learning (Bock & Griffin 2000; Chang et al. 2006). The spreading-activation account corresponds to short-lived priming effects, whereas effects of implicit learning are longer-lived. Recent experimental evidence seems to suggest that lexical priming is short-lived while syntactic priming is persistent (Hartsuiker et al. 2008). Alternative explanations for syntactic priming include social mimicry (Balcetis & Dale 2005) and the tendency to reduce processing costs (discussed by Smith & Wheeldon 2000, p. 127).


In the dialogue corpora we use, the longest time span over which priming effects could be observed is the length of a single dialogue. We therefore have no access to long-term effects and investigate only the short-term effects of syntactic priming.

1.1.4. Motivation

Our own interest in syntactic priming is driven by the following long-term research questions:

(1) Can syntactic priming be used to identify the cognitively adequate units of syntactic structure?

These units, known as syntactic exemplars (Bybee 2006)²,³, are conjectured to span one or more words. As outlined above, it is desirable for a theory to use empirically verified building blocks. Results could also be useful for computer-implemented natural language processing (Dubey et al. 2006).

(2) Is there syntactic priming in natural dialogue?

Below we will present some reasons for being skeptical. In this connection, we are primarily interested in other-priming, especially comprehension–production priming, as this is widely believed to cause syntactic alignment, a key ingredient of the influential Interactive Alignment Model (IAM) of dialogue (Pickering & Garrod 2004). In a nutshell, the IAM assumes automatic alignment between the representations of both participants in a dialogue, at several levels of representation, including the lexical and the syntactic one. This is assumed to lead to situation-model alignment and hence to successful dialogue. The IAM further postulates that at each level, alignment is caused by a “primitive and resource-free” priming mechanism, and that alignment at one level “percolates” up to the next level. For obvious reasons, self-priming cannot be what drives dialogue. The study presented here can only address question (1), as it conflates self-priming and other-priming. We will address question (2) in further research.

2. The term exemplar refers to the actual (and repeated) occurrences of a type.
3. We would like to list some terms we consider related, because a theory of syntax with an adequate notion of “basic unit” could provide a unified treatment for them: “multi-word expressions” (MWE), “multi-word unit” (MWU), “prefab”, “word co-occurrence”, “colligation”, “collocation”, “collostruction”, “collexeme”, “construction”, “idiom”, “phrasal verb”, “formulaic sequences”, “chunks”, “phraseological expression”, and more. Accounting for these phenomena is especially important in theories which try to model performance phenomena as well as “pure” linguistic competence.

1.2. Why is syntactic priming controversial?

The existence of syntactic priming is uncontroversial in lab experiments (Pickering & Ferreira 2008). However, lab experiments cannot provide reliable information about natural dialogue, because they are conducted in tightly controlled environments. Experimenters strive to keep as many factors as possible under control so that they can claim a causal relationship between the factor they vary and the outcome. In language experiments, this often leads to situations where subjects’ behaviour is restricted by confederates or by instructions, because this makes the evaluation easier. The price is lower ecological validity, a problem that corpus studies avoid. In the context of this research, an additional advantage of corpus studies lies in the possibility of investigating not only a few carefully selected constructions but all constructions/exemplars/rules that can be extracted (Reitter et al. 2006b; Reitter 2008; Healey et al. 2010a,b). Nevertheless, most corpus studies have still limited themselves to specific constructions (e.g. Gries 2005; Szmrecsanyi 2005; Dubey et al. 2005; Jaeger & Snider 2008; Howes et al. 2010). The results of corpus studies on syntactic priming have been inconsistent, especially those which investigate all constructions in a given corpus. Reitter et al. (2006b; 2008) reported significant effects for self-priming as well as other-priming in the MapTask⁴ corpus. In the Switchboard⁵ corpus, they found self-priming but no other-priming when using an utterance-based distance measure; with a time-based distance measure, they did find other-priming. Overall, David Reitter and his colleagues consider their findings compatible with the IAM (which, as outlined in section 1.1.4, hinges on the existence of other-priming). This is in stark contrast with the results of Patrick Healey and his colleagues (Healey et al. 2010a,b), who found evidence for structural divergence: structural repetition across adjacent turns in natural dialogue (as documented in a different set of corpora: the DCPSE⁶ and the BNC⁷) was below chance level.

4. http://groups.inf.ed.ac.uk/maptask/
5. http://www.ldc.upenn.edu/Catalog/readme_files/switchboard.readme.html

2. Methods and data

2.1. Hypothesis

Different linguistic theories propose different structures in the description of natural language. If these structures correspond to mental representations, then they can be primed (and, given the general and subconscious nature of priming, probably should be). Finding priming effects for such a structure therefore offers support to the linguistic theory that proposed it. In other words: “[R]epeatable structures are evidence for the units of linguistic cognition” (Reitter 2008, sec. 1.2).

2.2. Experiments

2.2.1. Preliminaries

Classical priming experiments such as Bock & Griffin (2000) study a single, theory-neutral alternation in controlled experiments. In contrast, we study the distribution of each category in large syntactically annotated corpora (treebanks). Every sub-structure of an annotation is a possible category. We follow Reitter et al. (2006a,b) in studying Combinatory Categorial Grammar (CCG) categories and context-free production rules. In future work, it would be instructive to extend the latter annotation to include subtrees of arbitrary depth, as in data-oriented parsing (Bod 1998). CCG assumes that there are many equivalent derivations for a given sentence analysis: the same lexical categories, but different modes of combination. Among these, the normal-form derivation is the one that follows the constituent bracketing, which is mostly right-branching for languages like English. The incremental derivation is as left-branching as possible; see Reitter et al. (2006a) for details. We use the same data as Reitter et al. (2006a,b): the Switchboard corpus (Godfrey & Holliman 1997) augmented with timing information (Calhoun et al. 2010) and annotated either with context-free rule expansions (sw-CFG) (Marcus et al. 1999) or with CCG categories (sw-CCG-I and sw-CCG-N for incremental and normal-form derivations, respectively; Hockenmaier & Steedman 2007); and the MapTask corpus (Anderson et al. 1991) with CFG annotation (mp-CFG). We also look at lexical priming (sw-words). Switchboard is a corpus of telephone conversations on loosely defined topics, whereas MapTask (as the name suggests) contains dialogues in which an instructor has to communicate a path on a map to a follower in a cooperative task.

6. DCPSE: Diachronic Corpus of Present-Day Spoken English, a treebank combining the London-Lund Corpus of Spoken English (LLC) (Svartvik 1990) and the British Component of the International Corpus of English (ICE-GB) (Nelson et al. 2002). Available from http://www.ucl.ac.uk/english-usage/projects/dcpse/.
7. BNC: British National Corpus (Burnard 2007).

2.2.2. A simple measure of priming

Newman et al. (2009) directly measure syntactic priming as a reduced reaction time in a brain region related to syntactic processing, facilitated by lexical priming of verbs. This effect is not specific to single syntactic categories or words, so priming as the mental activation of representations cannot (yet) be measured directly. In corpus studies, we instead observe the distribution of a category. The null hypothesis is a random distribution, described as a Poisson process. Under this hypothesis, the (temporal) distances between adjacent occurrences are exponentially distributed, $p(x) = \lambda_0 e^{-\lambda_0 x}$, where $\lambda_0$ equals the frequency of the category within the corpus. $p(x)$ is the expected density of seeing the next instance of the category at exactly distance $x$. At distance 0 this is the frequency of the category itself, and integrated over all distances (ad infinitum) it is 1: in an infinite corpus, the category will surely be instantiated at some point. We compare this expected distribution of distances to the actual distribution. The actual distribution also looks exponential (see Figure 1), yet with more short distances than expected from a random distribution; correspondingly, the gaps without any occurrence of the category are longer. We fit an exponential curve with decay parameter $\lambda$ to the actual distribution.
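For reference, the properties of the null model stated above follow from the exponential density in a few lines. The last line is our own addition and makes precise what a steeper fitted curve means: with $r > 1$, the fitted density exceeds the null density below a crossover distance and falls below it beyond that point, which matches the observed excess of short distances together with longer gaps. As a worked number, $\lambda_0 = 0.34$ occurrences per second (the value given for S → NP VP in 2.2.3) implies an expected gap of $1/0.34 \approx 2.9$ s under the null.

```latex
\begin{aligned}
p(x) &= \lambda_0 e^{-\lambda_0 x}, \qquad p(0) = \lambda_0,\\
\int_0^{\infty} \lambda_0 e^{-\lambda_0 x}\,dx &= \left[-e^{-\lambda_0 x}\right]_0^{\infty} = 0 - (-1) = 1,\\
\mathbb{E}[x] &= \int_0^{\infty} x\,\lambda_0 e^{-\lambda_0 x}\,dx = \frac{1}{\lambda_0},\\
r = \frac{\lambda}{\lambda_0} > 1 &\;\Longrightarrow\; \lambda e^{-\lambda x} > \lambda_0 e^{-\lambda_0 x}
\;\text{ exactly for }\; 0 \le x < \frac{\ln r}{\lambda - \lambda_0}.
\end{aligned}
```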
We interpret the ratio $r = \lambda / \lambda_0$ as the strength of priming: the more the fitted parameter deviates from the expected one, the more skewed the distribution. This parameter can be obtained for every single category, or as an aggregate over all categories.

2.2.3. Single categories

Figure 1 shows the estimated density function, a random distribution (dotted line), and the fitted, much steeper exponential (dashed line) for the expansion VP → VB S. Across all corpora and annotations, the estimated parameters $\lambda$ are always larger than $\lambda_0$. Rare categories show more priming, with r up to 2.3, whereas r is close to 1 for the very common expansion S → NP VP (0.34 occurrences per second). Exponential decay fits well, with a standard deviation of around 0.005.
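The chapter does not spell out how the exponential curve is fitted. The Python sketch below is one possible reading under our own assumptions: occurrence times are given in seconds, the empirical density is an equal-width histogram of adjacent distances, and $\lambda$ is found by an unweighted least-squares grid search over $\lambda e^{-\lambda x}$. All function names are hypothetical. The usage part contrasts a simulated Poisson (unprimed) category with a simulated bursty one.

```python
import math
import random

def adjacent_gaps(times):
    """Distances (e.g. in seconds) between consecutive occurrences of one category."""
    ts = sorted(times)
    return [b - a for a, b in zip(ts, ts[1:])]

def histogram_density(gaps, bins=50):
    """Equal-width histogram normalised to integrate to 1: (bin centres, densities)."""
    width = max(gaps) / bins
    counts = [0] * bins
    for g in gaps:
        counts[min(int(g / width), bins - 1)] += 1
    centres = [(i + 0.5) * width for i in range(bins)]
    densities = [c / (len(gaps) * width) for c in counts]
    return centres, densities

def fit_decay(gaps, lam0, bins=50, steps=2000):
    """Grid-search least-squares fit of lam * exp(-lam * x) to the histogram density."""
    centres, dens = histogram_density(gaps, bins)
    best_lam, best_sse = None, float("inf")
    for k in range(1, steps + 1):
        lam = 10 * lam0 * k / steps          # candidates in (0, 10 * lam0]
        sse = sum((d - lam * math.exp(-lam * x)) ** 2 for x, d in zip(centres, dens))
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam

def priming_ratio(times, duration):
    """r = fitted lambda / lambda0, with lambda0 = occurrences per unit time."""
    lam0 = len(times) / duration
    return fit_decay(adjacent_gaps(times), lam0) / lam0

# Usage: simulated Poisson (unprimed) category versus a bursty (primed-looking) one
rng = random.Random(1)

def simulate(gap_sampler, n):
    """Accumulate n sampled gaps into occurrence times; return times and total duration."""
    t, times = 0.0, []
    for _ in range(n):
        t += gap_sampler()
        times.append(t)
    return times, t

poisson_times, T1 = simulate(lambda: rng.expovariate(0.5), 20000)
bursty_times, T2 = simulate(
    lambda: rng.expovariate(5.0) if rng.random() < 0.5 else rng.expovariate(0.2), 20000)
r_poisson = priming_ratio(poisson_times, T1)
r_bursty = priming_ratio(bursty_times, T2)
print(f"Poisson r = {r_poisson:.2f}, bursty r = {r_bursty:.2f}")
```

The Poisson category comes out with r close to 1 and the bursty one well above 1. A note on the design choice: a maximum-likelihood fit of an exponential to the same distances would return the inverse mean distance, which is approximately $n/T = \lambda_0$ by construction and so could not reveal priming. It is fitting the shape of the curve, dominated by the excess of short distances, that lets $\lambda$ differ from $\lambda_0$.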

Figure 1. Distribution of pairwise distances of VP → VB S (x-axis: distance, 0–300; y-axis: density, 0–0.020).

The exponential decay supports the suggestion that priming is an effect of (short-term) memory. While frequent categories generally have less room for skewed distributions, there is still something more to be explained about the effect of frequency.

2.2.4. Corpus averages

Measuring the overall priming in a corpus allows us to compare several settings: different linguistic frameworks (CCG vs. CFG), spoken vs. written language, and conversational (Switchboard) vs. task-oriented (MapTask) dialogue. We normalize all categories for frequency (so that $\lambda_0 = 1$) and take the average.

Corpus      decay parameter λ   standard error
sw-CFG      1.1589              0.0044
sw-CCG-I    1.0523              0.0054
sw-CCG-N    1.0364              0.0051
mp-CFG      1.4666              0.0049
sw-words    1.2521              0.0113

Standard errors are low, so we have a good estimate of the actual distribution of distances. Yet Figure 2 suggests an even more extreme distribution. This might be a result of cumulative activation: short distances trigger more short distances. Lexical priming is strong ($\lambda$ = 1.25 for sw-words). Task-oriented dialogue shows more priming than conversational dialogue (Reitter et al. 2006b; Pickering & Garrod 2004). CCG annotation shows comparatively little priming. Reitter et al. (2006a) reported that it is significant, but that the difference is not.

Christian Pietsch, Armin Buch, Stefan Kopp & Jan de Ruiter

Figure 2. Average over all categories (normalized by frequency) in MapTask (x-axis: Distance; y-axis: Density).

2.3.

Results

We have devised a notably simple priming measure.8 A single parameter $\lambda$ per category (or per corpus) suffices to model the distribution of distances. Experiments show it to be larger than its expected value, the category’s frequency $\lambda_0$. The effect appears to be larger for rare categories. Interpreting the fitted $\lambda$ as a frequency is somewhat paradoxical: primed categories seem more frequent than they actually are.

8. Our measure is simple in comparison to Reitter’s approach, which relies on sophisticated statistical modelling using generalized linear mixed models (GLMM) for a logistic regression with random effects.


So far we have treated categories as mutually exclusive. This does not take into account priming of similar categories. Adding pairwise similarities to the model could improve it. A simple lexical example is stemming or lemmatization: a word also primes all its inflected forms. In the syntactic domain, one might consider a measure of similarity between constructions. The skewness in the distribution of a category (or rather, in the distribution of the pairwise distances of its instances) may be attributed to priming. If the skewness parameter $r$ systematically deviates from 1 across different corpora, this may be taken as evidence for a mental representation that corresponds to the category and is subject to priming. Thus the categories proposed by linguistic theories can be evaluated for their psycholinguistic validity. This is a way of informing linguistic theory (about linguistic competence) with performance data (Reitter et al. 2006a).

3.

Conclusion

In this corpus study, we have presented a method for measuring syntactic priming based on the decay of repetition probability in a given window of prime–target distance. While our simple priming measure can easily be used to compare corpora, corpus annotation schemes, and grammar formalisms, it cannot distinguish between self-priming and other-priming, and we have not yet been able to define an absolute baseline for this model. Before our measure is used for evaluating theories of grammar, a word of caution is in order. In this study, we have interpreted repetitions of syntactic structure as evidence for priming. However, there are certainly other reasons for repeating linguistic constructions, such as the limited range of expressions available for talking about certain states of affairs. Coherent texts and dialogues tend to concentrate on certain topics for a while. This is why task-oriented dialogue exhibits more repetition than conversational dialogue. A reliable priming measure should provide a means of factoring out semantic and pragmatic aspects, i.e. an empirical baseline or control condition.

4.

Outlook

The most important next steps for us are (a) to provide methods to measure other-priming separately from self-priming, and (b) to compare these results against an empirical baseline. This would allow us to test theories of dialogue such as the Interactive Alignment Model (Pickering & Garrod 2004). Reitter et al. (2006a, 2006b) and Reitter (2008) provided a solution for (a) but not for (b).

5.

Acknowledgements

This research has been supported by the Center of Excellence 277 “Cognitive Interaction Technology” (CITEC) of the DFG at Bielefeld University. The impetus for this study came from Prof. Gerhard Jäger (who has since moved from Bielefeld University to Tübingen University). We would like to thank David Reitter and Julia Hockenmaier for kindly providing us with their corpus data, and the anonymous reviewers for reading and commenting on a text that was far from finished.

References

Anderson, Anne H., Miles Bader, Ellen Gurman Bard, Elizabeth Boyle, Gwyneth Doherty, Simon Garrod, Stephen Isard, Jacqueline Kowtko, Jan McAllister, Jim Miller, Catherine Sotillo, Henry S. Thompson & Regina Weinert 1991 The HCRC Map Task Corpus. Language and Speech 34 (4): 351–366.

Balcetis, Emily E. & Rick Dale 2005 An exploration of social modulation of syntactic priming. In Proceedings of the 27th Annual Meeting of the Cognitive Science Society, 184–189.

Bates, E. & B. MacWhinney 1982 Functionalist approaches to grammar. In E. Wanner & L. Gleitman (eds.), Language acquisition: The state of the art, 173–218. Cambridge University Press, New York.

Bock, Kathryn 1986 Syntactic persistence in language production. Cognitive Psychology 18: 355–387.

Bock, Kathryn & Zenzi M. Griffin 2000 The persistence of structural priming: Transient activation or implicit learning? Journal of Experimental Psychology 129 (2): 177–192.

Bod, Rens 1998 Beyond Grammar: An Experience-Based Theory of Language. CSLI Lecture Notes. CSLI Publications, Stanford. ISBN 157586150X.

Branigan, Holly, Martin Pickering & Alexandra Cleland 2000 Syntactic co-ordination in dialogue. Cognition 75: 13–25.


Branigan, Holly, Martin Pickering, Simon Liversedge, Andrew Stewart & Thomas Urbach 1995 Syntactic priming: Investigating the mental representation of language. Journal of Psycholinguistic Research 24 (6): 489–506.

Burnard, Lou 2007 Reference Guide for the British National Corpus (XML Edition). http://www.natcorp.ox.ac.uk/XMLedition/URG/.

Bybee, Joan 2006 From usage to grammar: The mind’s response to repetition. Language 82 (4): 711–733.

Calhoun, Sasha, Jean Carletta, Jason Brenier, Neil Mayo, Dan Jurafsky, Mark Steedman & David Beaver 2010 The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue. Language Resources and Evaluation 44: 387–419. ISSN 1574-020X.

Chang, Franklin, Gary S. Dell & Kathryn Bock 2006 Becoming syntactic. Psychological Review 113 (2): 234–272. ISSN 0033-295X.

Dubey, Amit, Frank Keller & Patrick Sturt 2006 Integrating syntactic priming into an incremental probabilistic parser, with an application to psycholinguistic modeling. In ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, 417–424. ACL, Morristown, NJ, USA.

Dubey, Amit, Patrick Sturt & Frank Keller 2005 Parallelism in coordination as an instance of syntactic priming: evidence from corpus-based modeling. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP), 827–834. ACL, Morristown, NJ, USA.

Godfrey, John J. & Edward Holliman 1997 Switchboard-1 Release 2. Linguistic Data Consortium, Philadelphia.

Gries, Stefan Th. 2005 Syntactic priming: A corpus-based approach. Journal of Psycholinguistic Research 34: 365–399.

Hartsuiker, R.J., S. Bernolet, S. Schoonbaert, S. Speybroeck & D. Vanderelst 2008 Syntactic priming persists while the lexical boost decays: Evidence from written and spoken dialogue. Journal of Memory and Language 58 (2): 214–238. ISSN 0749-596X.

Healey, Patrick G. T., Matthew Purver & Christine Howes 2010a Structural divergence in dialogue. In Proceedings of the 20th Annual Meeting of the Society for Text and Discourse. Chicago, IL.

Healey, Patrick G. T., Matthew Purver & Christine Howes 2010b Structural divergence in dialogue. In Architectures and Mechanisms for Language Processing. York, UK.

Hockenmaier, Julia & Mark Steedman 2007 CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics 33 (3): 355–396. ISSN 0891-2017.

Howes, Christine, Patrick G. T. Healey & Matthew Purver 2010 Tracking lexical and syntactic alignment in conversation. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society (CogSci), 2004–2009. Portland, OR.

Jaeger, T.F. & N. Snider 2008 Implicit learning and syntactic persistence: Surprisal and Cumulativity. In The 30th Annual Meeting of the Cognitive Science Society (CogSci), 1061–1066. Washington, D.C.

Marcus, Mitchell P., Beatrice Santorini, Mary Ann Marcinkiewicz & Ann Taylor 1999 Treebank-3. Linguistic Data Consortium, Philadelphia.

Nelson, Gerald, Sean Wallis & Bas Aarts 2002 Exploring natural language: working with the British component of the International Corpus of English. Varieties of English around the world, G29. John Benjamins. ISBN 9781588112712.

Newman, Sharlene D., Kristen Ratliff, Tara Muratore & Thomas Burns Jr. 2009 The effect of lexical priming on sentence comprehension: An fMRI study. Brain Research 1285: 99–108. ISSN 0006-8993.

Pickering, Martin & Victor Ferreira 2008 Structural priming: A critical review. Psychological Bulletin 134 (3): 427–459.

Pickering, Martin & Simon Garrod 2004 Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences 27 (2): 169–190.

Potter, Mary C. & Linda Lombardi 1998 Syntactic priming in immediate recall of sentences. Journal of Memory and Language 38: 265–282.

Reitter, David 2008 Context Effects in Language Production: Models of Syntactic Priming in Dialogue Corpora. Ph.D. thesis, University of Edinburgh.

Reitter, David, Julia Hockenmaier & Frank Keller 2006a Priming effects in Combinatory Categorial Grammar. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 308–316. ACL, Sydney, Australia.

Reitter, David, Johanna D. Moore & Frank Keller 2006b Priming of syntactic rules in task-oriented dialogue and spontaneous conversation. In Proceedings of the 28th Annual Conference of the Cognitive Science Society (CogSci), 685–690. Cognitive Science Society, Vancouver, BC, Canada.

Smith, Mark & Linda Wheeldon 2000 Syntactic priming in spoken sentence production: an online study. Cognition 78 (2): 123–164. ISSN 0010-0277.

Svartvik, Jan (ed.) 1990 The London-Lund corpus of spoken English: description and research. Lund studies in English 82. Lund Univ. Press, Lund.

Szmrecsanyi, Benedikt 2005 Language users as creatures of habit: A corpus-based analysis of persistence in spoken English. Corpus Linguistics and Linguistic Theory 1: 113–149.

Tulving, E. & D. L. Schacter 1990 Priming and human memory systems. Science 247 (4940): 301–306.

How structure-sensitive is the parser? Evidence from Mandarin Chinese

Zhong Chen, Lena Jäger & Shravan Vasishth

1.

Introduction

A series of recent articles has revived the discussion about the role that grammatical constraints play in online sentence comprehension processes. Phillips, Wagers & Lau (2011) provide a comprehensive summary of the evidence available in the experimental literature. The conclusion from this body of work is that the human sentence comprehension mechanism (hereafter, the parser) uses fairly fine-grained grammatical constraints in real time. A striking example of the parser’s use of grammatical information is the finding that the parser does not posit gaps within syntactic islands. This suggests that the parser can use grammatical knowledge about island constraints online when it decides what structure to build. It is fair to say that the case for such a role of grammar is well motivated (although it does not go uncontested, cf. Kluender 1998, Kluender & Kutas 1993, Sag, Hofmeister & Snider 2007, Hofmeister & Sag 2010).

A related issue that Phillips et al. (2011) raise in their review is the search strategy pursued by the parser as it completes dependencies between reflexives and their antecedents. Given the existing evidence that the parser uses grammatical constraints when making parsing decisions, it is reasonable to assume that this dependency resolution process is informed by grammatical constraints. Here, the relevant grammatical constraint is Principle A of the binding theory. This principle states, approximately, that the antecedent of a reflexive must c-command the reflexive and be in the same clause as the reflexive. If the parser uses Principle A to find the antecedent of a reflexive, then the search strategy it pursues for completing the dependency between the antecedent and the reflexive will be relatively ‘intelligent’. The parser only needs to find the relevant noun (usually a unique noun, at least in English) that is in the correct syntactic configuration with respect to the reflexive. Specifically, when searching the string preceding the reflexive, the parser does not need to consider any candidate other than the correct one.


This is a very attractive idea theoretically: the parser carries out an informed search and efficiently finds the antecedent. Indeed, there is considerable evidence consistent with this intelligent search hypothesis. Support for this claim comes from Sturt (2003), who ran two eyetracking experiments comparing the effect of gender match of a structurally accessible and a structurally inaccessible noun on the binding of the English reflexives himself/herself. He used a 2×2 factorial design. In the sentences, the gender of the reflexive either matched or mismatched the stereotypical gender of the grammatical antecedent, and it either matched or mismatched the gender of another, structurally inaccessible, proper noun. Sturt reports a main effect of gender match/mismatch of the grammatical antecedent in early measures (first fixation duration, gaze duration, regression path duration) as well as in second-pass reading. The gender match/mismatch of the inaccessible antecedent showed no effect in early measures. In second-pass reading time, which is arguably a late measure, an interaction between the two factors was observed. Sturt concludes that the initial interpretation of the reflexive is driven by Binding Principle A and is thus not affected by the presence of structurally inaccessible nouns. He suggests that the late interference effect of an inaccessible antecedent might reflect recovery strategies and wrap-up effects (which in some cases can lead to an incorrect final interpretation of the sentence).

Another important piece of evidence comes from Xiang, Dillon & Phillips (2009), who investigated reflexives in settings like (1) using event-related potentials. They carried out an interesting and important comparison between the processing of reflexives and polarity items, but the present study is only concerned with the processing of reflexives, so we do not discuss the polarity data here. (1)

a.

Congruent The tough soldier that Fred treated in the military hospital introduced himself to all the nurses.

b.

Intrusive The tough soldier that Katie treated in the military hospital introduced herself to all the nurses.

c.

Incongruent The tough soldier that Fred treated in the military hospital introduced herself to all the nurses.

In example (1a), the syntactically licit antecedent for himself is soldier. Fred occurs inside a relative clause modifying soldier and therefore cannot be a legitimate antecedent. In (1b) and (1c) the antecedent of the reflexive herself is also soldier. At the reflexive, however, the reader has to revise the default assumption that soldier is male, since a soldier can also be female. The key difference between (1b) and (1c) is that in (1b) a female-referring noun, Katie, is present that is not a legal licensor of herself but nevertheless matches herself in gender. In their ERP study, Xiang and colleagues found (inter alia) a non-significant positivity in the 800–1000 ms window for the Intrusive (1b) versus the Incongruent (1c) condition (F(1,27) = 2.4, p = 0.13). Had this effect been statistically significant, it would have been an interference effect. One important point to note here is that, to the extent that it can be interpreted as a non-null result (we return to this issue later), the increased positivity in the intrusive case suggests greater difficulty. Xiang et al. interpret this finding as a late interference effect. They also found a marginal centro-anterior negativity in the 250–350 ms interval, but reject it as evidence of an early interference effect for two reasons. First, the effect did not emerge in the basic comparison of the standard intervals but only in a post hoc analysis driven by visual inspection. Second, no previous ERP study has found such an effect in connection with reflexives (Xiang et al. 2009:50). In sum, they argue that if any interference effect does exist in the case of reflexives, it is potentially a late effect, and that in the initial parsing stages the parser relies on structural cues to search for the antecedent. Xiang et al. also point out that their account (absence of interference effects) may apply only to reflexives, and may not extend to pronouns or logophoric anaphors (2009:52). In other words, structure-sensitive retrieval would apply to only one specific case of antecedent resolution. Based on evidence such as that presented by Sturt (2003) and Xiang et al. (2009), Phillips et al. (2011) propose:

[…] we tentatively suggest that argument reflexives are immune to interference from structurally inaccessible antecedents because antecedents are retrieved using only structural cues.

The above quote applies to configurations such as (1). As discussed above, in (1b,c) the only legal antecedent of the reflexive herself is soldier. The fact that soldiers are stereotypically male is irrelevant here, because the same amount of difficulty should be experienced in reanalyzing soldier as female in both the interference and the no-interference condition. The claim is that the parser should never consider the structurally inaccessible noun Katie as an antecedent of the reflexive, even though Katie has feminine gender, just like herself. To quote Phillips et al. (2011):

[…] we are suggesting that the person, gender, and number features of reflexives like himself, herself, and themselves play no role in the search for antecedents […].
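The disagreement is whether the reflexive's gender features act as retrieval cues, and it can be made concrete with a deliberately simple toy model. This is our illustration, not the retrieval model of Phillips et al. or of the present study. Each noun is scored by how many retrieval cues it matches, and a structurally inaccessible noun that matches any cue counts as a potential intruder. The feature names and the binary scoring are assumptions. Soldier is coded as masculine to reflect the stereotype discussed for example (1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    name: str
    c_commands: bool   # in a structurally accessible antecedent position?
    gender: str        # "masc" or "fem" (stereotypical gender for 'soldier')


def match_count(noun, cues):
    """Number of retrieval cues (feature -> required value) the noun matches."""
    return sum(getattr(noun, feature) == value for feature, value in cues.items())


def intruders(nouns, cues):
    """Structurally inaccessible nouns that still match at least one cue."""
    return [n.name for n in nouns if not n.c_commands and match_count(n, cues) > 0]


soldier = Noun("soldier", c_commands=True, gender="masc")
katie = Noun("Katie", c_commands=False, gender="fem")
fred = Noun("Fred", c_commands=False, gender="masc")

structural_only = {"c_commands": True}                        # structural cues only
structural_and_gender = {"c_commands": True, "gender": "fem"}  # cues of 'herself'

for label, nouns in [("(1b) intrusive", [soldier, katie]),
                     ("(1c) incongruent", [soldier, fred])]:
    print(label, intruders(nouns, structural_only), intruders(nouns, structural_and_gender))
```

With structural cues only, neither condition has an intruder. Once gender is a cue, Katie in (1b) becomes one while Fred in (1c) does not, which is the contrast an interference effect would reflect.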

The present paper provides initial evidence inconsistent with the above claim. Our study was motivated by clear evidence in the literature that dependency resolution in parsing is driven by an associative, cue-based retrieval process. As an example, consider the work of Van Dyke (2007). In an eyetracking study, Van Dyke compared reading times at a verb (was complaining) that was preceded by a string containing one or two grammatical subjects, as in (2). (2)

a.

Non-subject, inanimate-referring intervening noun: The worker was surprised that the resident who was living near the dangerous warehouse was complaining about the investigation.

b.

Non-subject, human-referring intervening noun: The worker was surprised that the resident who was living near the dangerous neighbor was complaining about the investigation.

c.

Subject, inanimate-referring intervening noun: The worker was surprised that the resident who said that the warehouse was dangerous was complaining about the investigation.

d.

Subject, human-referring intervening noun: The worker was surprised that the resident who said that the neighbor was dangerous was complaining about the investigation.

Here, suppose that at the moment of dependency completion at the verb the parser is momentarily confused by the presence of two grammatical subjects. Then greater processing difficulty should be seen at the verb in (2c,d) than in (2a,b). If this is an early process, the difficulty should appear in early measures. This is in fact what Van Dyke found. In first-pass reading time, an early measure, fixation durations were longer in (2c,d) (413 and 418 ms, respectively) than in (2a,b) (376 and 382 ms, respectively), all p’s < 0.05. This result is rather puzzling if we assume, following Phillips et al. (2011), that the parser is sensitive only to structural cues and not to non-structural cues. After all, the parser should never have considered the distractor subject inside the relative clause if it uses only structural information to find the subject of the verb. Thus, the structure-sensitive account of parsing would need to have a very limited scope: it would not apply to any configuration other than the reflexive. As mentioned earlier, Phillips et al. (2011) further limit the structure sensitivity of reflexives to those in argument position. Pronouns, logophors and all kinds of argument–verb dependencies would not engage in structure-sensitive search; only reflexives would. A simpler theoretical alternative would assume early interference effects in reflexives, just as in other head–dependent resolution processes.

We agree with Phillips and colleagues that the parser may be using grammatical constraints to decide on parser actions. However, it would be very surprising if the parser did not use every available piece of information in trying to find the antecedent. In English, one such piece of information is gender marking. When the reflexive is himself, the search process should be informed by the fact that the antecedent not only has to c-command the reflexive but must also be masculine. Why would the parser ignore the gender cue? Nevertheless, Phillips and colleagues may be right that the parser ignores gender information in the case of reflexives, since c-command provides sufficient information for finding the antecedent. This is what Phillips and colleagues mean when they suggest that gender and other features of reflexives play no role in the search for antecedents. On this view, the gender information is redundant because the syntactic constraint already provides all the information needed to find the antecedent. We suspect that use of the gender cue may nonetheless be unavoidable, because ignoring it would require the parser to actively avoid using information that is available. We see no a priori reason for the parser to give priority to syntactic cues over others. Moreover, if syntactic constraints were given absolute priority, interference effects such as Van Dyke’s (2007) would never be seen. One potential issue with all previous demonstrations of the absence of an interference effect is that they necessarily depend on null results.
Arguing from null results is reasonable when statistical power is relatively high (note: we are not referring to ‘observed’ power; see Hoenig & Heisey 2001). However, when statistical power is relatively low, it is difficult to argue on the basis of null results. Low power means that the probability of finding a significant effect, given that the effect is really present, is low. This issue has been extensively discussed in statistical theory (Cohen 1988), but has not received the attention it deserves in psycholinguistics and related areas. Consider an interference effect that causes a 30 ms delay in processing, with a standard deviation of 110 ms, a target power of 0.80 and a type I error probability of 0.05. The number of participants needed to obtain a significant difference (if there really is one) in a two-sided paired t-test is 108. In psycholinguistic studies a sample size of 20 is quite common, which yields a power of only about 0.20. In other words, such an experiment has only a 20% chance of finding an effect that is actually present in nature. As Cohen (1988) has pointed out, one might well ask why we put so much effort into a study that has a relatively low chance of finding anything from the outset. An effect size of 30 ms is by no means unusually small in reading studies. If one finds a significant effect in spite of low statistical power, there are real grounds for drawing a conclusion based on the data; but if one fails to find an effect, it is difficult to conclude anything. It is possible that low statistical power is masking interference effects in the types of configurations discussed above. An obvious way to test this is to carry out an experiment (preferably multiple experiments) on reflexives with a sample size large enough to give a reasonable chance of finding an effect if there is one.

Based on the experimental results described below, we suggest that Phillips et al.’s proposal that the parser uses only structural cues to find an antecedent may need to be qualified. Although the parser may well use structural cues to find an antecedent, it seems to use other cues as well. If correct, our findings imply that the parser can be fooled by (i.e. suffer interference effects from) nouns in structurally inaccessible positions that match the reflexive on some search cue other than the structural cue. For example, in the case of (1), the parser may indeed experience interference from the intruding noun Katie. In other words, the parser’s search process may be fallible. In the remainder of this paper, we present the evidence for the fallibility of the parser’s search process, and then discuss the implications for theories of sentence comprehension.
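The power figures for the reading-time example (30 ms effect, 110 ms standard deviation, α = 0.05) can be checked with a short calculation. By those figures, 108 participants give 80% power, and 20 participants give about 0.20. The sketch below is a standard-library-only approximation, not the procedure the authors used, which the text does not state. It assumes that 110 ms is the standard deviation of the within-participant differences. It approximates the two-sided critical t value with a Cornish-Fisher expansion and the noncentral t distribution with a normal-based formula. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def t_critical(alpha, df):
    """Approximate two-sided critical t value (Cornish-Fisher expansion around z)."""
    z = Z.inv_cdf(1 - alpha / 2)
    return (z
            + (z ** 3 + z) / (4 * df)
            + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * df ** 2))


def paired_t_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided paired t-test with n participants.

    `sd` is the standard deviation of the paired differences.
    """
    df = n - 1
    delta = effect / sd * sqrt(n)                     # noncentrality parameter
    t_c = t_critical(alpha, df)
    shrink = 1 - 1 / (4 * df)
    scale = sqrt(1 + t_c ** 2 / (2 * df))
    upper = Z.cdf((delta - t_c * shrink) / scale)     # P(T > t_c)
    lower = Z.cdf((-t_c * shrink - delta) / scale)    # P(T < -t_c)
    return upper + lower


def required_n(effect, sd, power=0.80, alpha=0.05):
    """Smallest number of participants whose approximate power reaches the target."""
    n = 2
    while paired_t_power(effect, sd, n, alpha) < power:
        n += 1
    return n


if __name__ == "__main__":
    print(required_n(30, 110))                    # participants needed for 80% power
    print(round(paired_t_power(30, 110, 20), 2))  # power with 20 participants
```

Run as a script, this prints 108 and 0.21, in line with the figures in the text. An exact noncentral-t computation in a statistics package would be the more reliable check.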

2.

Experiment

In this experiment we investigate the Mandarin Chinese reflexive ziji, which has several properties that make it useful for the study of cue-based retrieval processes. We introduce the syntax of ziji before presenting the details of the experiment.

2.1.

The Mandarin Chinese reflexive ziji

Ziji is unusual among reflexives in that it can be a long-distance reflexive anaphor that is not restricted to the local clause (Huang 1984). Furthermore, it is subject to structural as well as pragmatic and semantic constraints (Huang & Liu 2001). When no antecedent is available in the sentential context, ziji generally refers to the speaker (Dillon, Chow, Wagers, Guo, Liu & Phillips submitted). When ziji is locally bound, it follows Principle A of Binding Theory (Chomsky 1981), which means that its antecedent is the subject of the clause it is contained in, as shown in (3a,b). However, in apparent violation of Binding Principle A, ziji can also form long-distance dependencies, as in (3c). (3)

a.

Nanhaii hai-le zijii. boyi harm-ASP selfi. ‘The boy harmed himself.’

b.

Zhe-pian wenzhangi shuo nanhaij hai-le ziji*i/ j. this-CL articlei say boyj harm-ASP self*i/ j. ‘This article says that the boy harmed himself.’

c.

Nanhaii shuo zhe-pian wenzhangj hai-le zijii/*j. boyi say this-CL articlej harm-ASP selfi/*j. ‘The boy says that this article harmed him.’

The ability of ziji to form long-distance dependencies poses a challenge for syntactic theory. In response, syntacticians have proposed several explanations that aim to defend Binding Principle A. Cole & Sung (1994) explain the existence of the long-distance bound reflexive by assuming that ziji crosses clause boundaries via cyclic LF head movement. Huang (1984) proposes that long-distance reflexives should be interpreted as a special type of anaphoric pronoun that refers to the subject of the matrix clause. He assumes an underlying representation in which the subject of the matrix clause is the ‘speaker’ and the embedded clause that contains ziji is a direct quote. The long-distance ziji, he concludes, should not be considered a reflexive anaphor as defined by Binding Theory. In a later account, Huang claims that the locally bound ziji is an ordinary reflexive anaphor following Binding Principle A, whereas the long-distance bound ziji is to be explained as a logophor driven by pragmatic constraints. According to Huang, this logophoric ziji has to be contained in a self-description of its antecedent’s referent. This self-description can describe a property that the referent explicitly self-ascribes, a property the referent is implicitly disposed to self-ascribe, or one that the referent self-ascribes via the speaker’s perspective (Huang & Liu 2001; cf. also Pan 1997).

As mentioned above, there are several constraints on the antecedent of ziji that hold for both local and long-distance dependency formations. First, the antecedent has to c-command ziji (Huang & Liu 2001), as illustrated in (3d). (3)

d.

Nanhaii juede muqin bu zai jia de-shihou boyi think mother not at home time jiejiej yinggai zhaogu hao zijii/ j. elder-sisterj should take.care.of well selfi/j. ‘The boy thinks that his elder sister should take good care of him / herself when their mother is not at home.’

Since muqin ‘mother’ does not c-command ziji, it is not a grammatical antecedent of ziji. Nanhai ‘boy’ and jiejie ‘elder sister’, however, c-command ziji and are thus both potential antecedents. As Huang & Liu (2001) point out, there is one exception to the c-command constraint in the case of long-distance bound ziji. If ziji is part of an adjunct clause that precedes the matrix clause, it can be cataphorically bound to the subject of the matrix clause even though that subject does not c-command it: (3)

e.

Yinwei laoshi ma-le zijii nanhaii henshengqi. because teacher scold-ASP selfi boyi very angry. ‘Because the teacher scolded him, the boy was very angry.’

The second structural constraint on potential antecedents of ziji is that ziji can only refer to a subject NP (Huang & Liu 2001). (3)

f.

Nanhaii song-le nühaij yi-zhang zijii/*j hua de tuhua. boyi give-ASP girlj one-CL selfi/*j draw POSS painting. ‘The boy gave the girl a painting that he drew by himself.’

In spite of c-commanding ziji, nühai ‘girl’ is not a grammatical antecedent because it is in the object position of the matrix clause, leaving nanhai as the only possible antecedent. There is one important exception that goes against both the subject constraint and the c-command constraint: ziji can refer to an antecedent that is part of the subject NP (Huang & Liu 2001). (3)

g.

Nanhaii de jingyan jiu-le zijii. boyi POSS experience save-ASP selfi. ‘The boy’s experience saved him.’


In (3g), nanhai is the antecedent of ziji, although it neither c-commands ziji nor is a subject. Instead, it is a sub-commanding NP modifying the subject NP jingyan ‘experience’, which in turn is the head noun of an NP that c-commands ziji. In addition to these structural constraints, ziji is also subject to a semantic constraint on its antecedent: only animate and sentient referents can build a dependency with ziji (Dillon et al. submitted). (3)

h.

Nanhaii piping-le zijii. boyi criticize-ASP selfi. ‘The boy criticized himself.’

i.

Nanhaii shuo zhe-pian wenzhangj piping-le zijii/*j. boyi say this-CL articlej criticize-ASP selfi/*j. ‘The boy says that this article criticized him.’

j.

Zhe-pian wenzhangi shuo nanhaij piping-le ziji*i/j. this-CL articlei say boyj criticize-ASP self*i/j. ‘This article says that the boy criticized himself.’

k.

Nanhaii shuo jiejiej piping-le zijii/j. boyi say elder-sisterj criticize-ASP selfi/j. ‘The boy says (his) elder sister criticized him / herself.’

Examples (3h-j) each have only one animate subject and are therefore unambiguous. (3k), however, has two animate subjects (nanhai and jiejie), both of which c-command ziji and are thus candidate antecedents; the sentence is globally ambiguous. As for long-distance binding of ziji, it is important to note that in certain cases an intervening NP that cannot itself be an antecedent of ziji nevertheless blocks the long-distance dependency. For example, the personal pronouns wo ‘I’ and ni ‘you’, although they clearly occupy a non-antecedent position, block a long-distance dependency with a third-person NP (3l); a third-person NP in the same position, however, shows no blocking effect (3m) (Huang & Liu 2001).

(3l) Nanhai_i dui wo shuo jiejie_j hen ai ziji_*i/j.
boy_i to I say elder-sister_j very love self_*i/j
‘The boy says to me that (his) elder sister loves herself very much.’


Zhong Chen, Lena Jäger & Shravan Vasishth

(3m) Nanhai_i dui muqin shuo jiejie_j hen ai ziji_i/j.
boy_i to mother say elder-sister_j very love self_i/j
‘The boy says to (his) mother that (his) elder sister loves him / herself very much.’

Furthermore, Huang & Liu (2001) point out that an interfering local singular NP blocks dependency formation with a long-distance plural antecedent (3n), whereas in the reverse case there is no such blocking effect (3o).

(3n) Zhe-xie ren_i tingshuo nanhai_j hen ziji_*i/j.
these-CL people_i hear boy_j hate self_*i/j
‘These people heard that the boy hated himself.’

(3o) Nanhai_i tingshuo zhe-xie ren_j hen ziji_i/j.
boy_i hear these-CL people_j hate self_i/j
‘The boy heard that these people hated him / themselves.’
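The constraints on ziji reviewed in (3f-o) can be summarized as a filter over candidate NPs. The sketch below is a deliberately simplified, hypothetical encoding: instead of computing c-command from a tree, each NP carries hand-set flags (`in_subject`, `commands`, `local`, `person`, `plural`), and the two blocking patterns of (3l/m) and (3n/o) are hard-coded. The class `NP` and function `ziji_antecedents` are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class NP:
    form: str
    position: int        # linear position of the NP in the sentence
    in_subject: bool     # the subject itself, or an NP inside the subject NP (3g)
    animate: bool
    commands: bool       # c-commands or sub-commands ziji (hand-annotated)
    local: bool          # belongs to the clause that contains ziji
    person: int = 3
    plural: bool = False


def ziji_antecedents(nps, ziji_position):
    """Return the forms of NPs that survive the simplified constraints on ziji."""
    local_subjects = [np for np in nps if np.local and np.in_subject]
    licit = []
    for cand in nps:
        # Structural and semantic requirements: subject, (sub-)command, animacy.
        if not (cand.in_subject and cand.commands and cand.animate):
            continue
        if cand.position >= ziji_position:
            continue
        if not cand.local:
            between = [np for np in nps if cand.position < np.position < ziji_position]
            # (3l) vs. (3m): an intervening 1st/2nd-person NP blocks a 3rd-person antecedent.
            if cand.person == 3 and any(np.person in (1, 2) for np in between):
                continue
            # (3n) vs. (3o): a singular local subject blocks a plural long-distance antecedent.
            if cand.plural and any(not s.plural for s in local_subjects):
                continue
        licit.append(cand.form)
    return licit


# (3l): Nanhai dui wo shuo jiejie hen ai ziji (ziji at position 7)
ex_3l = [
    NP("nanhai", 0, True, True, True, local=False),
    NP("wo", 2, False, True, False, local=False, person=1),
    NP("jiejie", 4, True, True, True, local=True),
]
# (3m): the same frame with muqin 'mother' instead of wo 'I'
ex_3m = [
    NP("nanhai", 0, True, True, True, local=False),
    NP("muqin", 2, False, True, False, local=False),
    NP("jiejie", 4, True, True, True, local=True),
]
print(ziji_antecedents(ex_3l, 7))  # ['jiejie']
print(ziji_antecedents(ex_3m, 7))  # ['nanhai', 'jiejie']
```

Hand-setting the flags keeps the sketch short; a real implementation would derive subjecthood and c-command from a parse tree.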

Dillon et al. (submitted) point out that ziji is not only a structurally constrained reflexive that can be bound locally as well as long-distance, but is also entirely retrospective: no cues signal the existence of a dependency before ziji itself is encountered. They argue that precisely these properties make ziji a very useful tool for testing memory access. The properties of ziji that we exploit in the experiment below are (i) the possibility of both long- and short-distance antecedents, and (ii) the requirement, common to both, that the antecedent be an animate subject (modulo the exception discussed above).

2.2.

Participants

120 participants from Dalian and Nanjing took part in this experiment for payment (2 euros per participant).

2.3.

Method

We used the self-paced reading method (Just, Carpenter & Woolley 1982). Each session started with practice trials to familiarize participants with the task. Participants were instructed to read at a natural pace. Each trial began with a screen presenting a sentence in which the words were masked by dashes, and participants pressed the space bar to reveal the next word. After each sentence, a yes-no comprehension question appeared on the screen, and participants pressed one key for ‘yes’ and another for ‘no’.

2.4.

Design and Predictions

In Chinese structures like (4), the antecedent ‘the opposition leader’ is the only legal antecedent for the reflexive ziji ‘self’ (which requires an animate antecedent, as discussed above). The non-local antecedent case (4a,b) is interesting because it lets us test a further prediction of the cue-based retrieval model: owing to decay and/or interference from other chunks in memory, the long-distance antecedent should be harder to retrieve at the reflexive, resulting in a stronger preference for a more locally available, partially matching candidate. In other words, the cue-based retrieval account predicts an interaction between locality and interference, which the structural-cue-based access account does not (since the interfering noun would never be considered as a candidate in either the short- or the long-distance antecedent case). Concretely, under the exclusively structure-sensitive search view, the parser should never consider an intervening noun like kangyizhe ‘protester’ in (4) as an antecedent, because it is inside an adverbial phrase and cannot c-command the reflexive ziji; this view predicts no reading time difference between the conditions in which the intervening noun is ‘protest’ (4a,c) and those in which it is ‘protester’ (4b,d). By contrast, the cue-based retrieval account predicts an interference effect (longer reading times in (4b,d) than in (4a,c)) and an interaction between locality and interference, such that the non-local conditions should show a stronger interference effect.

(4a) Long-distance dependency; inanimate interposed NP
Fanduipai-lingxiu biaoshi [zhe-ge shengming [zai kangyi shikong de-shihou]AdvP gaojie-le ziji de dangyuan]CP.
opposition leader said this-CL announcement at protest out.of.control time warn-ASP ziji POSS party.member
‘The opposition leader said that this announcement warned his party members when the protest was out of control.’


(4b) Long-distance dependency; animate interposed NP
Fanduipai-lingxiu biaoshi [zhe-ge shengming [zai kangyizhe shikong de-shihou]AdvP gaojie-le ziji de dangyuan]CP.
opposition leader said this-CL announcement at protester out.of.control time warn-ASP ziji POSS party.member
‘The opposition leader said that this announcement warned his party members when protesters were out of control.’

(4c) Local dependency; inanimate interposed NP
Zhe-ge shengming biaoshi [fanduipai-lingxiu [zai kangyi shikong de-shihou]AdvP gaojie-le ziji de dangyuan]CP.
this-CL announcement said opposition leader at protest out.of.control time warn-ASP ziji POSS party.member
‘This announcement said that the opposition leader warned his party members when the protest was out of control.’

(4d) Local dependency; animate interposed NP
Zhe-ge shengming biaoshi [fanduipai-lingxiu [zai kangyizhe shikong de-shihou]AdvP gaojie-le ziji de dangyuan]CP.
this-CL announcement said opposition leader at protester out.of.control time warn-ASP ziji POSS party.member
‘This announcement said that the opposition leader warned his party members when protesters were out of control.’
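The cue-based retrieval predictions for (4a-d) can be illustrated with a simplified ACT-R-style activation equation: base-level activation decays with the time since the antecedent was encoded, and each retrieval cue spreads activation that is divided among all chunks matching it (the fan effect). This is a sketch under stated assumptions, not the model behind the predictions above: the cue set {subject, animate}, the encoding ages, the parameter values, and the softmax choice rule are all illustrative. It shows how an animate distractor both slows retrieval of the antecedent (more so when the antecedent has decayed) and is itself retrieved more often in the non-local case.

```python
import math

# Parameter values are illustrative assumptions, not estimates from this study.
DECAY = 0.5        # base-level decay rate d
MAX_ASSOC = 1.5    # maximum associative strength S
TOTAL_W = 1.0      # source activation W, split evenly over the retrieval cues
LATENCY_F = 0.14   # latency factor F (seconds)
NOISE = 0.3        # temperature of the (simplified) choice rule


def activation(age_s, matched_cues, fan, n_cues=2):
    """Base-level decay plus spreading activation reduced by each cue's fan."""
    base = -DECAY * math.log(age_s)  # B = ln(age ** -d)
    w = TOTAL_W / n_cues
    spread = sum(w * (MAX_ASSOC - math.log(fan[c])) for c in matched_cues)
    return base + spread


def fans(interfering):
    # Both clause subjects match the {subject} cue; the animate distractor
    # (kangyizhe) additionally matches {animate} in conditions (4b,d).
    return {"subject": 2, "animate": 2 if interfering else 1}


def target_latency_ms(antecedent_age_s, interfering):
    """Retrieval latency F * exp(-A) for the correct antecedent."""
    a = activation(antecedent_age_s, ["subject", "animate"], fans(interfering))
    return 1000 * LATENCY_F * math.exp(-a)


def misretrieval_prob(antecedent_age_s, distractor_age_s):
    """Probability that the animate distractor wins (softmax over activations)."""
    f = fans(interfering=True)
    a_target = activation(antecedent_age_s, ["subject", "animate"], f)
    a_distr = activation(distractor_age_s, ["animate"], f)
    z_t = math.exp(a_target / NOISE)
    z_d = math.exp(a_distr / NOISE)
    return z_d / (z_t + z_d)


# Hypothetical encoding ages at ziji: local antecedent 1.0 s, non-local 2.5 s,
# distractor (inside the adverbial phrase) 0.6 s.
for label, age in [("local", 1.0), ("non-local", 2.5)]:
    cost = target_latency_ms(age, True) - target_latency_ms(age, False)
    print(f"{label}: interference cost {cost:.1f} ms, "
          f"misretrieval p = {misretrieval_prob(age, 0.6):.2f}")
```

Because latency is an exponential function of activation, the same fan-induced drop in activation costs more milliseconds when the antecedent's base-level activation is already low, which is one way the predicted locality-by-interference interaction can arise.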

We used twenty-four item sets like (4); these (along with the data and the R code used) are available online from the Potsdam Mind Research Repository (http://www.psych.uni-potsdam.de/pmr2). In addition to the stimulus items, seventy fillers with varying syntactic structures were randomly interspersed among the items, with the constraint that at least one filler intervened between any two items. Both items and fillers were presented in simplified Chinese characters. Each target item was followed by a yes/no question whose answer required the reader to resolve the antecedent-reflexive relationship correctly. For the examples in (4), the corresponding questions were as follows:


Qa: Did this announcement warn members of the opposition party? Y
Qb: Did this announcement warn members of the opposition party? Y
Qc: Did the opposition leader warn members of the ruling party? N
Qd: Did the opposition leader warn members of the ruling party? N

The correct answer was always Y in the non-local conditions and always N in the local conditions. We return to this point in the Results and Discussion sections.

2.5.

Results

2.5.1. Question-response accuracies and latencies

We first present the analyses of question-response latency and accuracy, and then the reading time results. The question-response latencies and accuracies for the four conditions are presented in Tables 1 and 2.

Table 1. Means and standard errors of response times (ms) in the four conditions.

            Non-interfering   Interfering
Non-local   (a) 2787 (69)     (b) 2944 (71)
Local       (c) 2520 (63)     (d) 2665 (88)

Table 2. Means and standard errors of response accuracy (percentages) in the four conditions.

            Non-interfering   Interfering
Non-local   (a) 85.62 (0.02)  (b) 79.79 (0.02)
Local       (c) 87.08 (0.02)  (d) 88.54 (0.01)
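The pattern in Tables 1 and 2 can be read off the cell means directly: the interference effect within each locality level, and the difference between those two effects (the interaction). The sketch below only does this raw arithmetic on the published means; it is not a substitute for the mixed-model analysis reported below, which works on a different scale and includes random effects. The dictionary names and the `effects` helper are illustrative.

```python
# Cell means from Tables 1 and 2 (conditions a-d of example (4)).
LATENCY_MS = {"a": 2787, "b": 2944, "c": 2520, "d": 2665}
ACCURACY_PCT = {"a": 85.62, "b": 79.79, "c": 87.08, "d": 88.54}


def effects(m):
    """Raw cell-mean contrasts.

    a = non-local/non-interfering, b = non-local/interfering,
    c = local/non-interfering,     d = local/interfering.
    Interference effect = interfering minus non-interfering cell.
    """
    int_nonlocal = m["b"] - m["a"]
    int_local = m["d"] - m["c"]
    return {
        "interference_nonlocal": int_nonlocal,
        "interference_local": int_local,
        "interaction": int_nonlocal - int_local,
        "locality_nonlocal_minus_local": (m["a"] + m["b"]) / 2 - (m["c"] + m["d"]) / 2,
    }


print(effects(LATENCY_MS))
print(effects(ACCURACY_PCT))
```

For accuracy, the interference effect is about -5.8 percentage points in the non-local conditions but about +1.5 in the local ones, which is the asymmetry behind the locality-by-interference interaction in Table 4.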

For the analysis of question-response latencies, a linear mixed model (Bates & Sarkar 2007) was fit with participants and items as crossed random factors, and with locality, interference, and their interaction as orthogonally coded factors. For question-response accuracies, a generalized linear model with a logistic link function was fit (a correct response was coded 1, an incorrect one 0). As shown in Tables 3 and 4, we found a main effect of locality in both latencies and accuracies: the non-local conditions had longer response latencies and lower accuracies. Responses in the interference conditions were significantly slower, but no significant effect of interference was seen in response accuracy. Finally, a significant interaction between locality and interference was found in response accuracies: the interference effect was larger in the non-local than in the local conditions, as predicted by the cue-based retrieval model. In question-response latencies this interaction was not significant. The latency results reported here exclude 26 extreme values (> 10,000 ms), which constituted 1.35% of the data points. If these data points are included, only the locality effect remains significant. No data were removed for the question-response accuracies (these are 0/1 scores).

Table 3. Statistical analysis of the question-response latencies. Latencies greater than 10,000 ms were removed (1.35% of the data); see text for discussion.

Contrast       Coefficient   SE     t-score   p-value
Locality       -0.11         0.02   -6.10     < 0.05
Interference    0.03         0.02    2.14     < 0.05
Loc x Int      -0.03         0.02   -1.75     n.s.

Table 4. Statistical analysis of the question-response accuracies.

Contrast       Coefficient   SE     z-score   p-value
Locality        0.49         0.14    3.44     < 0.01
Interference   -0.18         0.14   -1.23     n.s.
Loc x Int       0.32         0.14    2.23     < 0.05
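The orthogonal coding of the 2 x 2 design used in these analyses can be sketched as follows. The sign convention here (local = +, interfering = +) is consistent with the signs in Tables 3 and 4 (local conditions faster and more accurate; interfering conditions slower), but the +/-0.5 scaling is our assumption, since the exact values are not stated. The sketch builds the predictor row for each condition and checks that the three predictors are centred and mutually orthogonal; the model itself was fit with lme4 in R and is not reproduced here.

```python
# Conditions (4a-d) as (locality, interference) levels.
CONDITIONS = {
    "a": ("non-local", "non-interfering"),
    "b": ("non-local", "interfering"),
    "c": ("local", "non-interfering"),
    "d": ("local", "interfering"),
}


def contrast_row(condition):
    """Return (locality, interference, interaction) predictors, assuming +/-0.5 coding."""
    locality, interference = CONDITIONS[condition]
    loc = 0.5 if locality == "local" else -0.5
    intf = 0.5 if interference == "interfering" else -0.5
    return (loc, intf, loc * intf)


def is_orthogonal(rows, tol=1e-12):
    """Check that each column sums to zero and every pair of columns has zero dot product."""
    cols = list(zip(*rows))
    centred = all(abs(sum(c)) < tol for c in cols)
    uncorrelated = all(
        abs(sum(x * y for x, y in zip(cols[i], cols[j]))) < tol
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )
    return centred and uncorrelated


design = [contrast_row(c) for c in "abcd"]
print(design)                 # [(-0.5, -0.5, 0.25), (-0.5, 0.5, -0.25), (0.5, -0.5, -0.25), (0.5, 0.5, 0.25)]
print(is_orthogonal(design))  # True
```

With this coding, a negative interaction coefficient on latencies (Table 3) or a positive one on accuracies (Table 4) corresponds to a larger interference effect in the non-local conditions.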

2.5.2. Reading times at the critical and post-critical regions

We next present the statistical analyses for the critical region (ziji) and the post-critical region (the word de immediately following it). The post-critical region is of interest because processing costs arising from difficulty at the critical region are often expressed in the following region, so-called spillover (Mitchell 1984). We removed all reading times longer than 2000 ms (1.1% of the data) because these extreme values skew the residuals of the linear mixed model. The analysis was carried out on log-transformed reading times in order to respect the additivity assumption of linear mixed models (Gelman & Hill 2007). As summarized in Tables 5 and 6, the results show a marginally significant locality effect at the critical region (the reflexive ziji) and a significant interference effect. A marginal interaction is also found. If all the data are retained (i.e., if the extreme values greater than 2000 ms are included), a significant locality effect is seen (t = -2.02), and the interference effect becomes marginal (t = 1.69).
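The preprocessing step described above (dropping reading times longer than 2000 ms, reporting the share removed, and log-transforming the rest) can be sketched as below. The function name `trim_and_log` and the example reading times are hypothetical; the analysis itself was done in R.

```python
import math


def trim_and_log(rts_ms, cutoff_ms=2000):
    """Drop reading times longer than cutoff_ms and log-transform the rest.

    Returns the log reading times and the percentage of data points removed.
    """
    kept = [rt for rt in rts_ms if rt <= cutoff_ms]
    removed_pct = 100 * (len(rts_ms) - len(kept)) / len(rts_ms)
    return [math.log(rt) for rt in kept], removed_pct


# Hypothetical reading times (ms) at ziji.
log_rts, pct = trim_and_log([388, 412, 2350, 455, 401])
print(f"removed {pct:.1f}% of data points; {len(log_rts)} log RTs remain")
```

Taking logs makes a given proportional slowdown contribute equally regardless of a reader's baseline speed, which is the additivity motivation cited above.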


In the spillover region (the word de following the reflexive), statistically significant effects of locality and interference were seen (here, removing reading times greater than 2000 ms affected 0.3% of the data; the results do not change if these extreme values are retained). No interaction was found.

Table 5. Means and standard errors of reading times (ms) in the four conditions, at the critical region and at the following (spillover) region. These means and standard errors are computed after removing 1.1% of the data points (reading times greater than 2000 ms).

Condition                  ziji       de
Long-distance, inanimate   410 (8)    371 (7)
Long-distance, animate     448 (12)   387 (7)
Local, inanimate           410 (8)    363 (5)
Local, animate             415 (9)    372 (7)

Table 6. Statistical data analysis of the critical and spillover regions.

Region   Contrast       Coefficient   SE     t-value
ziji     Locality       -0.03         0.01   -1.92
         Interference    0.03         0.01    2.03*
         Loc x Int      -0.03         0.01   -1.97
de       Locality       -0.02         0.01   -2.26*
         Interference    0.02         0.01    2.25*
         Loc x Int      -0.01         0.01   -0.94

2.6.

Discussion

To summarize the results, question-response accuracy was higher and question-response latency was shorter in the local conditions, suggesting that the local-antecedent case was easier to process (this is consistent with the findings of Dillon et al. submitted; Li & Zhou 2010). In question-response accuracy (though not in latency) we found an interaction between the locality and interference manipulations. Regarding reading times, we found: (i) a marginal effect of locality at the critical region and a significant one in the spillover region, with the local-antecedent conditions read faster; (ii) an effect of interference at the reflexive and in the spillover region, with the conditions containing the interfering noun read more slowly; and (iii) a marginal interaction at the reflexive. The presence of interference effects in both the question-response data and the reading time data is consistent with the possibility that structural cues are not the only cues used in searching for antecedents. The interaction in the question-response data and the marginal interaction (t = -1.97) in the reading time data are consistent with the prediction that interference effects should be stronger where the correct antecedent has decayed more (i.e., the non-local case should show a stronger interference effect). Note, however, that the evidence for this interaction in the online data is quite weak.

The evidence for interference effects in the question-response data is interesting because it suggests that, in the interference conditions, the incorrect antecedent was retrieved and became permanently associated with the reflexive. One might object that self-paced reading data index only later processes; Xiang et al. (2009) have argued that illicit antecedents are considered only in later stages of processing. This objection can be addressed by using methods that can distinguish early from late processes, such as EEG and eyetracking. We intend to pursue this issue in future work. One potential concern in interpreting the question-response accuracies is that the non-local conditions always required “yes” as the correct response, whereas the local conditions always required “no”. This could have led participants to adopt a strategy that biased our results. It is difficult to determine whether this is a true confound in the design, but we intend to revisit this experiment with counterbalanced question answers.

3.

General Discussion

The presence of the interference effect is consistent with the view that retrieval processes use non-structural retrieval cues such as animacy. The alternative view, that only structural cues are used in resolving the antecedents of reflexives, has in our opinion one major flaw: the evidence for it necessarily rests on null results (the absence of interference effects, or the failure to observe them). One possibility worth exploring is that these null results arose in experiments with relatively low statistical power. This is a testable claim: if the experiments supporting the structural-cue-based explanation are replicated with a larger sample size (the required sample size can be estimated from previous results), interference effects may well be observed. It remains to be seen whether this will turn out to be correct.
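How large a replication would need to be can be approximated with a standard power calculation (cf. Cohen 1988). The sketch below uses the normal approximation for a two-sided within-participant comparison with standardized effect size d; the effect sizes are hypothetical placeholders, not estimates from the studies discussed. The normal approximation slightly understates the n required by a t test, and for crossed participant and item random effects a simulation-based power analysis would be more appropriate.

```python
from math import ceil
from statistics import NormalDist


def required_n(d, alpha=0.05, power=0.80):
    """Approximate number of participants for a two-sided within-participant
    comparison with standardized effect size d (normal approximation)."""
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power for n participants (ignores the negligible opposite tail)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * n ** 0.5 - z_alpha)


# Hypothetical effect sizes: a moderate (0.5) and a small (0.2) interference effect.
for d in (0.5, 0.2):
    print(f"d = {d}: about {required_n(d)} participants for 80% power")
print(f"power with 30 participants at d = 0.2: {approx_power(0.2, 30):.2f}")
```

The last line illustrates the concern raised above: with a small true effect and a modest sample, a null result is the most likely outcome even if the effect exists.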


Our findings should not be taken to imply that the parser does not employ structural cues to complete dependencies. Our claim is merely that the parser may not ignore relevant cues such as animacy in searching for an antecedent. We turn now to potential objections to our findings. A legitimate concern is that we have only one experimental result supporting the interference account (but see Badecker & Straub 2002 for evidence from English consistent with ours), whereas the alternative view is backed by several experiments. This is a valid criticism, and there is only one way to respond to it: we are in the process of trying to replicate this effect not just for Chinese but also for English and other languages. A further objection could be that the interference effect seems to occur only in the long-distance reflexive condition; as mentioned above, long-distance anaphora in Chinese is believed to be an instance of logophoric anaphora, and Xiang et al. (2009) mention that their constraint does not apply to logophoric anaphora. It is possible that our result is restricted to logophors. An easy way to address this concern is to carry out an experiment with English that exactly replicates the Xiang et al. configuration. If an interference effect is seen in such configurations as well, this would suggest that interference effects are not peculiar to logophors. We hope to answer this question definitively before long, but at least one eyetracking replication of the Xiang et al. (2009) materials (Patil, Vasishth & Lewis 2011) shows interference effects in first-pass regression probability, arguably an early effect. To conclude, we suggest in this paper that the case for structure-sensitive search in the resolution of reflexives may not be well supported: it rests on null results that may have been a consequence of low statistical power, and in cases such as Xiang et al. (2009), the patterns are in fact consistent with the presence of an early interference effect. Although those authors interpret the early, statistically non-significant effects as null results, we contend that if the experiment were given a chance to show an effect, by increasing statistical power sufficiently, these effects would be observed in the early time window. A further weakness of the structure-sensitive search account for reflexives is its limited scope: a large number of configurations must be exempted from the structure-sensitive search constraint (pronouns, logophors, and argument-head dependencies), leaving only one narrow category to engage in structure-sensitive search. A simpler account would capture the uniform presence of interference effects in all these structures. Of course, the evidence for interference effects needs to be strengthened by replication across a variety of structures, languages, and methods. For now, the debate remains open.


Acknowledgements

Our greatest debt is to Brian Dillon, Colin Phillips, and Ming Xiang for patiently helping us understand their positions on the issue. This work has benefitted from discussions with John Hale, Richard Lewis, Umesh Patil, Titus von der Malsburg and Masaya Yoshida. Thanks go to Professor Baojia Li, Nanjing Normal University, for allowing Zhong Chen to conduct half of the study there, and to Professor Huili Wang, Dalian University of Technology, for allowing Lena Jäger to visit and carry out the second half of the data collection there. For questions about this paper, please contact Shravan Vasishth ([email protected]).

References

Badecker, W. & K. Straub 2002 The processing role of structural constraints on the interpretation of pronouns and anaphors. Journal of Experimental Psychology: Learning, Memory, and Cognition 28: 748–769.
Bates, D. & D. Sarkar 2007 lme4: Linear mixed effects model using S4 classes.
Chomsky, N. 1981 Lectures on Government and Binding. Dordrecht: Foris.
Cohen, J. 1988 Statistical power analysis for the behavioural sciences. Hillsdale, NJ: Lawrence Erlbaum.
Cole, P. & L.-M. Sung 1994 Head movement and long-distance reflexives. Linguistic Inquiry 25: 355–406.
Dillon, B., W.Y. Chow, M. Wagers, T. Guo, F. Liu & C. Phillips submitted The structure-sensitivity of memory access: evidence from Mandarin Chinese.
Gelman, A. & J. Hill 2007 Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Hoenig, J.M. & D.M. Heisey 2001 The abuse of power. The American Statistician 55 (1): 19–24.
Hofmeister, P. & I.A. Sag 2010 Cognitive Constraints on Syntactic Islands. Language 86: 366–415.
Huang, C.-T.J. 1984 On the distribution and reference of empty pronouns. Linguistic Inquiry 15: 531–574.
Huang, C.-T. J. & C.-S. L. Liu 2001 Logophoricity, attitudes, and ziji at the interface. In: P. Cole, G. Hermon, and C.-T. J. Huang (eds.), Long-Distance Reflexives. New York: Academic Press, 141–195.
Just, M.A., P.A. Carpenter & J.D. Woolley 1982 Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General 111 (2): 222–238.
Kluender, R. 1998 On the distinction between weak and strong islands: A processing perspective. Syntax and Semantics 29: 241–280.
Kluender, R. & M. Kutas 1993b Subjacency as a processing phenomenon. Language and Cognitive Processes 8: 573–633.
Li, X. & X. Zhou 2010 Who is ziji? ERP responses to the Chinese reflexive pronoun during sentence comprehension. Brain Research 1331: 96–104.
Mitchell, D. C. 1984 An evaluation of subject-paced reading tasks and other methods of investigating immediate processes in reading. In: D. E. Kieras and M.A. Just (eds.), New Methods in Reading Comprehension Research. Hillsdale, NJ: Lawrence Erlbaum.
Pan, H. 1997 Constraints on reflexivization in Mandarin Chinese. New York: Garland Publishing Inc.
Patil, U., S. Vasishth & R.L. Lewis 2011 Early retrieval interference in syntax-guided antecedent-search? Talk presented at the 24th CUNY Sentence Processing Conference, Stanford, CA.
Phillips, C., M. Wagers & E. Lau 2011 Grammatical illusions and selective fallibility in real-time language comprehension. In: J. Runner (ed.), Experiments at the Interfaces, Syntax and Semantics 37, 153–186. Bingley, UK: Emerald Publications.
Sag, I., P. Hofmeister & N. Snider 2007 Processing complexity in subjacency violations: The complex noun phrase constraint. Proceedings from the Annual Meeting of the Chicago Linguistic Society 43 (1): 215–229.
Sturt, P. 2003 The time course of the application of binding constraints. Journal of Memory and Language 48: 542–562.
Van Dyke, J. A. 2007 Interference effects from grammatically unavailable constituents during sentence processing. Journal of Experimental Psychology: Learning, Memory, and Cognition 33 (2): 407–430.
Xiang, M., B. Dillon & C. Phillips 2009 Illusory licensing effects across dependency types: ERP evidence. Brain and Language 108 (1): 40–55.

The annotation of preposition senses in German

Antje Müller, Claudia Roch, Tobias Stadtfeld & Tibor Kiss

1.

Introduction

1.1.

Prepositions and their polysemy

While there is general agreement that prepositions (especially simple prepositions) are highly polysemous, there are two competing views on how this polysemy arises. The derivational view assumes that simple prepositions derive from adverbs and retain their basic meanings, such as temporal, spatial or modal relations. In line with this, it has been assumed that preposition senses can be derived from primordial protosenses, typically spatial ones (cf. Tyler & Evans 2001). Alternative proposals, however, suggest that the different senses of a word merely share its phonological form and emerged arbitrarily and independently at some point in time (cf. Croft 1998). We favor the second approach and assume a relational view of prepositions.1 Accordingly, prepositions are not specified as bearing a prototypical meaning. Instead, we assume a set of (sometimes arbitrary) relational preposition senses that are associated with preposition lexemes. Consequently, we avoid speaking of spatial prepositions and the like, and prefer to speak of preposition senses and their mappings to preposition lexemes. Of course, it cannot be denied that most simple prepositions in German have spatial interpretations. But neither can it be denied that almost as many prepositions show temporal or conditional interpretations, while other senses are distributed more sparsely among prepositions. For instance, the interpretation PRESENCE is expressed only with mit (‘with’) and ohne (‘without’) in German.

(1) Kommentarband mit einem Beitrag von Albert Knoepfli
Commentary volume with an article by Albert Knoepfli
‘Commentary volume including an article by Albert Knoepfli’

1. The work reported herein was supported by the DFG (Deutsche Forschungsgemeinschaft) under grant KI-759/5. We would like to thank three anonymous reviewers for their comments.
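The relational view described above treats senses and lexemes as two sides of a many-to-many mapping rather than assigning each preposition a core meaning. A minimal sketch of such a mapping, and of its inverse (which lexemes express a given sense), is shown below. Only the PRESENCE entries for mit and ohne come from the text; the other sense labels and lexeme entries are illustrative placeholders, not the project's sense inventory.

```python
from collections import defaultdict

# Illustrative fragment of a lexeme -> senses mapping. Only PRESENCE for mit and
# ohne is taken from the text; the remaining labels are placeholders.
LEXEME_SENSES = {
    "mit": {"PRESENCE", "INSTRUMENT"},
    "ohne": {"PRESENCE"},
    "in": {"SPATIAL", "TEMPORAL"},
    "unter": {"SPATIAL", "CONDITIONAL"},
}


def invert(lexeme_senses):
    """Build the sense -> lexemes side of the many-to-many mapping."""
    sense_lexemes = defaultdict(set)
    for lexeme, senses in lexeme_senses.items():
        for sense in senses:
            sense_lexemes[sense].add(lexeme)
    return dict(sense_lexemes)


SENSE_LEXEMES = invert(LEXEME_SENSES)
print(sorted(SENSE_LEXEMES["PRESENCE"]))  # ['mit', 'ohne']
```

Storing both directions makes it cheap to ask how widely a sense is distributed across prepositions, which is the kind of distributional question the annotation is meant to answer.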


Senses like PRESENCE have typically received less attention. In general, we consider it implausible to restrict investigations of preposition senses to a few, possibly prototypical cases, since a complete description of the possible preposition senses is a prerequisite for an analysis of prepositional polysemy. This demand becomes even more important when we try to determine the distribution of preposition senses in large corpora. The definition of an annotation scheme is a prerequisite for an analysis of the distributional patterns of preposition senses, and for their automatic classification.

1.2.

Existing literature on preposition senses

Preposition senses seem to be well explored. On closer inspection, however, it turns out that we still lack a structured understanding of them. As far as we know, no account provides every piece of information required for an annotation scheme of preposition senses. In dictionaries, the description is quite often limited to simple, often handcrafted, examples and some more or less informative sense labels (Duden 2002; Kempcke 2000). Criteria for distinguishing senses are not provided, and the interpretation of sense labels varies across the prepositions described. One might attribute this to the lack of space in print dictionaries, that is, to a tension between page limitations and comprehensiveness. But electronic dictionaries, like the DWDS2, do not provide a more detailed analysis either. Occasionally we find dictionaries or collections that focus on prepositions and attempt to systematize the spectrum of preposition senses. Schröder’s Lexikon deutscher Präpositionen (Schröder 1986) stands out in including a fine-grained feature-based analysis of preposition senses in German. Making use of over 200 binary features, however, it is too complex to be feasible for corpus annotation. Reviewing the existing grammars and dictionaries, one notices that the examples used in reference works are repeated over and over again, rarely being revised or replaced. If possible senses are characterized without recourse to large data sets, senses that do not come to mind immediately tend to be ignored. Some works select a subgroup of prepositions with a common subsense (e.g. Retz-Schmidt 1988; Wiese 2004). These publications are mostly concerned with spatial or temporal prepositions, describing how space and time can be realized in language, and seldom touch on the other subsenses of the prepositions.

2. Digitales Wörterbuch der deutschen Sprache, Berlin-Brandenburgische Akademie der Wissenschaften, D-10117 Berlin. http://www.dwds.de

2.

Preposition senses and preposition-noun combinations

In a preposition-noun combination (PNC), a preposition governs a determinerless nominal projection whose head must be a singular count noun. PNCs should be ‘convertible’ into PPs by adding a determiner. Some German examples are provided in (2):

(2) auf Anfrage (‘on being asked’), unter Androhung (‘under threat’), mit Vorbehalt (‘with reservation’)

PNCs can be prenominally modified (3), and postnominal complementation is licit as well (4).

(3) auf parlamentarische Anfrage (‘after being asked in parliament’), mit beladenem Rucksack (‘with loaded backpack’)

(4) Er wehrt sich gegen die Forderung nach Stilllegung einer Verbrennungsanlage.
He defies REFL against the demand for closedown an incineration plant
‘He defies the demand for closing an incineration plant.’
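The definition of a PNC given above (a preposition, optionally followed by prenominal adjectives, then a singular count noun with no determiner) can be operationalized as a simple pattern over POS-tagged tokens. The sketch below assumes STTS tags (APPR for a simple preposition, ADJA for an attributive adjective, ART for an article, NN for a common noun) and a separate singular flag for each noun; the count-noun lexicon is a hypothetical stand-in built from the nouns in (2) and (3), and `pnc_candidates` is an illustrative name, not part of the project's tooling.

```python
# Hypothetical count-noun lexicon built from the examples in (2) and (3).
COUNT_NOUNS = {"Anfrage", "Androhung", "Vorbehalt", "Rucksack"}


def pnc_candidates(tokens):
    """Find preposition + (adjectives) + singular count noun with no determiner.

    tokens: list of (word, stts_tag, is_singular) triples. Only APPR starts a
    match, so fused preposition-article forms (APPRART, e.g. 'zum') are never
    matched, and an article (ART) after the preposition blocks the match.
    """
    hits = []
    for i, (_, tag, _) in enumerate(tokens):
        if tag != "APPR":
            continue
        j = i + 1
        while j < len(tokens) and tokens[j][1] == "ADJA":
            j += 1  # skip prenominal modifiers, as in (3)
        if j < len(tokens):
            noun, noun_tag, singular = tokens[j]
            if noun_tag == "NN" and singular and noun in COUNT_NOUNS:
                hits.append(" ".join(word for word, _, _ in tokens[i : j + 1]))
    return hits


# A hypothetical token sequence: one PNC and one full PP.
sequence = [
    ("auf", "APPR", None), ("parlamentarische", "ADJA", None),
    ("Anfrage", "NN", True), ("mit", "APPR", None),
    ("dem", "ART", None), ("Rucksack", "NN", True),
]
print(pnc_candidates(sequence))  # ['auf parlamentarische Anfrage']
```

A pattern like this only proposes candidates; whether a match is a genuine PNC (rather than, say, part of an idiom or a truncated headline) is exactly what the manual annotation has to decide.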

Yet PNCs present a crosslinguistically attested anomaly (cf. Himmelmann 1998): they violate the rule that singular count nouns must appear with a determiner, as stipulated in rule 442 of the Duden grammar for German: “Substantive mit Merkmalkombination ‘zählbar’ plus Singular haben [...] grundsätzlich immer ein Artikelwort bei sich, und wenn es als letzte Möglichkeit der indefinite Artikel ist.” [a determiner must accompany singular count nouns, even if this is the indefinite article as a last resort] (Duden 2005). For some time, PNCs have been treated as exceptions, as the Duden grammar (Duden 2005) illustrates: as an amendment to rule 442, it offers rule 395 to cover PNCs. But this rule is a mere list of exceptions that ascribes PNCs to special registers. Addressing such claims, recent research has shown that PNCs are in fact productive (Stvan 1998 and Baldwin et al. 2006 for English; Dömges et al. 2007 for German).


Most researchers assume that the semantics of the preposition plays a major role in determining the grammaticality of PNCs. Baldwin et al. (2006) conclude that preposition senses in English are more restricted in PNCs than in PPs in general. Initial investigations have shown that certain senses seem to be blocked in German PNCs, too. The preposition unter (‘under’), for example, does not allow spatial interpretations in PNCs unless it appears in newspaper headlines. We propose an analysis of PNCs based on corpus annotation and logistic regression modeling (Harrell 2001). Annotation mining (cf. Chiarcos et al. 2008) is a recent extension of text mining in which linguistically relevant generalizations and correlations are derived bottom-up from a corpus suitably annotated with multiple layers of information. In the case at hand, we include several annotation layers that enrich our data with part-of-speech tags, shallow and deep syntactic analysis, sense annotation, and complex conceptual annotations derived from resources such as HaGenLex (Hartrumpf, Helbig & Osswald 2003) and GermaNet (Kunze & Lemnitzer 2002). Logistic regression modeling is a suitable tool for characterizing determiner omission, as logistic regression identifies the features (i.e. annotations) that are relevant to a binary distinction, here the realization or omission of the determiner. The annotations thus serve to induce licensing conditions for the omission of a determiner in PPs. But not every preposition can appear in a PNC. We therefore limit our analysis to (simple) prepositions that are allowed in PNCs and typically take NP complements in the singular when appearing in PPs.
For instance, we do not consider bis (‘until’), which allows NP complements only if the NP refers to a temporal entity and typically selects adverbial phrases as complements; zwischen (‘between’), which requires a coordination of two NPs or a plural NP as its complement; or per (‘per’), which always combines with a bare nominal in German. We have also excluded secondary prepositions, as they never occur in PNCs, and prepositions that do not govern a case. Starting from the description of prepositions in traditional grammars (cf. Helbig & Buscha 2001), we arrived at the following set of prepositions for investigation: an, auf, bei, dank, durch, für, gegen, gemäß, hinter, in, mit, mittels, nach, neben, ohne, seit, über, um, unter, vor, während, and wegen. An annotation scheme for preposition senses should be useful beyond the task at hand: it will also allow the definition of a gold standard for automated preposition sense annotation. The resulting resource will be considerably larger than the available fully annotated corpora, which are too small for our needs. Previous investigations have shown that between 2,500 and 5,000 PNCs/PPs per preposition are necessary to provide reliable regression models, and we assume that a sample of this order of magnitude is also needed for automated preposition sense annotation.

3. Building an annotation scheme for preposition senses

3.1. The reference corpus

The Swiss German newspaper “Neue Zürcher Zeitung” (vols. 1993–1999), comprising approx. 230 million words, forms the basis for the annotation. The annotation employs an XML stand-off format.3 The annotation tool MMAX2 (Müller & Strube 2006) is used for manual annotation. For each preposition, we consider three datasets: PNCs (approx. 91,000 instances), corresponding PPs with the same count noun (approx. 320,000 instances), and PPs containing count nouns that do not appear inside PNCs (approx. 122,000 instances). The identification of count nouns is based on the combination of two classifiers, a decision tree classifier based on the C4.5 algorithm (Quinlan 1986) and a Naïve Bayes classifier (Witten & Frank 2005). The analysis of the classifiers resulted in 4,431 fully countable nouns that are considered in the three datasets. The following annotations are provided for each dataset:

– Lexical level: part of speech, inflectional morphology, derivational morphology of nouns, interpretation of nouns, interpretation of prepositions, noun compounding.
– Syntactic level: mode of embedding of the phrase (adjunct or complement), syntactic dependents of the noun, modification of the noun.
– Global level: Is the phrase contained in a headline, title, or quotation? Is the phrase idiom-like?

Headlines, titles, and quotations are particularly prone to text truncation. Similarly, PNCs and PPs in idioms might follow combination rules that differ from general modes of combination. For automatic annotation, the following tools are applied: the Regression Forest Tagger (Schmid & Laws 2008) for POS tagging and morphological analysis (the tagger contains the SMOR component for morphological analysis, cf. Schmid, Fitschen & Heid 2004), the TreeTagger (Schmid 1995) for chunk parsing, and the MaltParser (Nivre 2006) for syntactic dependencies.

3. In general, one would have preferred to work with corpora from more than one genre. Hence, the present corpus can be considered representative, yet not balanced.


Antje Müller, Claudia Roch, Tobias Stadtfeld & Tibor Kiss

To determine noun meanings we make use of two resources. The first resource is GermaNet (Kunze & Lemnitzer 2002), the German version of WordNet. We employ 23 top-level categories,4 and each noun is annotated with every top-level category it belongs to.5 Secondly, we use the computer lexicon HaGenLex (Hartrumpf, Helbig & Osswald 2003), which offers specific sortal information derived from a formal ontology for each noun. As the current set-up is concerned with the creation of a reference corpus, annotations on the global level and preposition sense annotation are carried out manually.

3.2. An inventory of preposition senses

There is as yet no standardized inventory of preposition senses for German. Ideally, one would employ a universally applicable scheme, but such a scheme would require language-specific mappings from senses to prepositions and vice versa. The Preposition Project (cf. Litkowski & Hargraves 2005) and PrepNet (Saint-Dizier 2005) offer categories for preposition senses in English and French, respectively. We did not make use of these resources, however, as mappings from senses to lexemes are inherently language-specific. The specificity of the mapping can be illustrated with the prepositions über (‘above’, ‘over’) and nach (‘to’, ‘after’), as discussed in section 3.3.1. If we started with an English inventory here, a mapping to two different lexemes would have to be taken into account, as well as the polysemy of the two English lexemes, which is only partially present in the German lexeme. We envisage a comparison of the present scheme with the existing resources to identify cross-linguistic mappings. We started the development of the present scheme by consulting the German grammar by Helbig & Buscha (2001), the dictionary Duden Deutsch als Fremdsprache (Duden 2002), and Schröder’s dictionary of German prepositions.

4. The following list gives the categories used: Motiv, natGegenstand, Gruppe, Nahrung, natPhaenomen, Form, Tier, Zeit, Gefuehl, Kommunikation, Substanz, Ort, Relation, Tops, Koerper, Pflanze, Besitz, Menge, Attribut, Geschehen, Kognition, Artefakt, Mensch.

5. Nouns that are assigned to more than one top-level category are presumably homonymous or polysemous. We do not disambiguate the nouns but represent this ambiguity by assigning a top-level probability to the noun. If a top-level category value is given a probability smaller than 1, this indicates that more than one reading of the noun is possible, and that only one of these readings will be assigned the appropriate top-level category. We can thus be sure that a significant semantic feature (i.e. membership in a top-level category) will be reflected in the classification.
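Footnote 5 does not state how the top-level probability is computed. The sketch below assumes, purely for illustration, that the probability mass is split evenly over all top-level categories a noun belongs to; the category set is the one in footnote 4, while the example noun (Zug as ‘train’ or ‘procession’) and its category assignment are hypothetical, not actual GermaNet output.

```python
from fractions import Fraction

# The 23 GermaNet top-level categories listed in footnote 4.
TOP_LEVEL = {
    "Motiv", "natGegenstand", "Gruppe", "Nahrung", "natPhaenomen", "Form",
    "Tier", "Zeit", "Gefuehl", "Kommunikation", "Substanz", "Ort",
    "Relation", "Tops", "Koerper", "Pflanze", "Besitz", "Menge",
    "Attribut", "Geschehen", "Kognition", "Artefakt", "Mensch",
}


def top_level_probabilities(categories):
    """Assumed scheme: split probability 1 evenly over a noun's top-level categories."""
    cats = set(categories)
    unknown = cats - TOP_LEVEL
    if unknown:
        raise ValueError(f"not a top-level category: {sorted(unknown)}")
    if not cats:
        return {}
    share = Fraction(1, len(cats))
    return {c: share for c in sorted(cats)}


# Hypothetical example: 'Zug' read as a train (Artefakt) or a procession (Gruppe).
print(top_level_probabilities(["Artefakt", "Gruppe"]))
```

Under this assumption, any value below 1 signals, as footnote 5 describes, that only one of several readings carries the category.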


Dictionaries quite often do not seek complete coverage of preposition senses. We therefore did not rely solely on the reference works and the examples collected there, but defined the scheme iteratively, testing it against corpus data and filling in missing senses where required. The scheme has three characteristic features. First, it is hierarchically organized, allowing for the inclusion of taxonomies of subsenses, or the representation of decision trees to arrive at specific subsenses for individual prepositions. Secondly, it allows multiple sense assignments if further disambiguation proves to be infeasible. Thirdly, certain general properties of senses are extracted as cross-classifying features, allowing a compact representation of decision trees and taxonomies, e.g. for the ubiquitous distinction between local and directional senses. As already mentioned, the scheme is based on a hierarchical tree-like structure. Beginning with a root node, types of preposition meanings branch either to subtrees for different classes (e.g. spatial, temporal or conditional) with differing depths or to individual, unary branches. We have identified 27 different top-level senses for our restricted inventory of 22 prepositions (cf. (5) below).6 The senses SPATIAL and TEMPORAL are mapped to decision trees. The senses CONDITIONAL and MODAL are mapped to subhierarchies of senses. This holds for the sense PRESENCE as well, which is only employed by the prepositions ohne and mit (cf. section 5) and takes two subcategories, viz. ANALYTIC and SYNTHETIC. For the latter hierarchies, the scheme is agnostic as to the ontological status of senses. Assignment of a supersense instead of a most specific subsense may thus imply that a more specific interpretation is conceivable but cannot be derived from the criteria at hand.

While the scheme includes an automatic mapping from subsenses to supersenses, it is naturally desirable that the most specific annotation be provided whenever possible. As spatial and temporal senses are represented as decision trees, their annotation always yields a most specific subsense (cf. next section). The majority of senses have no subsenses and are typically instantiated by only a few prepositions. A complete list of the top-level senses is given in (5).
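The automatic mapping from subsenses to supersenses, together with multiple sense assignment, can be sketched as a parent table plus a closure function. This is a hypothetical illustration, not the authors' implementation: the 27 top-level senses come from (5) and the PRESENCE subsenses from the text, while the CAUSAL subsense under CONDITIONAL is an assumed entry suggested only by the label in (6).

```python
# The 27 top-level senses listed in (5).
TOP_LEVEL_SENSES = {
    "conditional", "spatial", "temporal", "modal", "adversative",
    "affiliation", "agent", "centre of reference", "communality",
    "comparison", "copulative", "correlation", "distributive", "exchange",
    "extension", "hierarchy", "inclusive", "order", "participation",
    "presence", "realization", "recipient", "state", "statement",
    "substitute", "theme", "transgression",
}

# Subsense -> supersense links. PRESENCE -> ANALYTIC/SYNTHETIC is from the text;
# 'causal' under 'conditional' is a hypothetical entry.
PARENT = {
    "analytic": "presence",
    "synthetic": "presence",
    "causal": "conditional",
}


def supersenses(sense):
    """Return the chain from a sense up to its top-level sense."""
    chain = [sense]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    if chain[-1] not in TOP_LEVEL_SENSES:
        raise ValueError(f"{sense!r} does not lead to a top-level sense")
    return chain


def expand(annotation):
    """Close a (possibly multiple) sense annotation under the subsense -> supersense map."""
    closed = set()
    for sense in annotation:
        closed.update(supersenses(sense))
    return closed


# An instance like (6) may carry two senses at once.
print(sorted(expand({"temporal", "causal"})))
```

Storing only the most specific annotation and deriving supersenses on demand keeps the manual annotation minimal, which matches the preference stated above for the most specific subsense.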

6. In addition to semantic features, the scheme contains the feature governed for prepositions governed by a lexical head. Governed prepositions are often considered to carry only light semantics, if any. But the assignment of the feature governed does not preclude the assignment of additional semantic features if it turns out that the preposition shows a discernible meaning despite being governed.

(5) conditional, spatial, temporal, modal, adversative, affiliation, agent, centre of reference, communality, comparison, copulative, correlation, distributive, exchange, extension, hierarchy, inclusive, order, participation, presence, realization, recipient, state, statement, substitute, theme, transgression.

The senses in (5) were derived from the description of preposition senses in the three resources mentioned above. We compared descriptive and illustrative interpretations and developed abstractions. This led to the unification of several senses, as well as to a further differentiation of other senses. In some cases, a clear distinction between two senses cannot be drawn:

(6) Feuer nach [temporal/conditional-causal] Blitzschlag
    fire after/because-of lightning strike

It is for cases like (6) that the scheme allows the assignment of multiple senses to individual instances.7

3.3. Properties of spatial and temporal decision trees

It seems obvious that sense labels like TEMPORAL or SPATIAL are not particularly revealing. Yet examples in dictionaries often provide just this kind of characterization, leaving open, e.g., why two prepositions with spatial senses and identical case requirements cannot be substituted salva veritate. Instead, we assume decision trees for SPATIAL and TEMPORAL interpretations, which encode fine-grained distinctions between individual spatial and temporal senses, respectively. Hence the assignment of specific subsenses is guided by the application of specific criteria, which also facilitate the annotation process.

7. With regard to example (6), one could argue that temporal interpretations are always present with causal interpretations, as factual causal relations are anchored temporally. But on closer scrutiny, one finds causal interpretations that are predominant to such an extent that a parallel temporal interpretation fades. Further analysis might lead to the eventual conclusion that we are not actually observing an ambiguity in example (6), but an inference. Currently, however, we consider such a conclusion premature.


3.3.1. Spatial interpretations

We do not classify spatial prepositions as such. Instead, the present scheme classifies spatial senses that are associated with the respective prepositions. Criteria such as the distinction between topological and projective prepositions (e.g. Wunderlich 1986) were used to distinguish different senses. Topological prepositions typically locate one object (the localized object, LO) in a neighbouring region of another object (the reference object, RO). They can be distinguished from projective prepositions, whose semantics must include a directional vector or reference axis. As Wunderlich proposed, combination with spatial measure phrases like zwei Meter (‘two metres’) is only possible with projective, not with topological prepositions. In addition to the topological interpretations of in (‘in’), an (‘at’, ‘by’), auf (‘on’, ‘at’) and bei (‘at’, ‘by’, ‘near’), we have added topological senses for unter (‘under’/‘below’) and über (‘over’/‘above’). A topological sense is illustrated for über in (7), where a vertical axis plays no role in the interpretation.8

(7) Das Bild hängt über dem Loch.
    The picture hangs above the hole
    ‘The picture hides the hole.’

In the same vein, we do not assume that auf is a purely topological preposition; rather, it is ambiguous between topological and projective readings. Topological and projective preposition senses can be distinguished from shape-related um (‘around’), an interpretation of nach (‘after’/‘behind’), and some directed senses. Topological preposition senses can be further divided into those locating the LO in a region inside the RO, those locating the LO outside the RO, and those used for a traversal of the RO by the LO. As can be seen in Figure 2,9 the criteria for identifying the pertinent senses for localizing an LO within an RO are identical to those for localizing a traversal, except that they map to different prepositions. This captures the consistent behaviour of the path prepositions über (‘across’) and durch (‘through’) and the topological prepositions auf and in with respect to dimensionality: in and durch require their internal argument, the RO, to have at least the number of dimensions of the LO (cf. Kaufmann 1993).

8. The interpretation of über may change depending on the presence of a measure phrase: in (7) the hole is (at least partially) hidden by the picture, but in the following example the picture is located two metres above the hole. This is a projective use of the preposition.
   Das Bild hängt zwei Meter über dem Loch.
   ‘The picture hangs two metres above the hole.’

9. The prepositions in Figures 1 and 2 are not part of the decision tree, but are listed for illustrative purposes only.

(8) Schweizer Truppen auf deutschem Gebiet
    ‘Swiss troops on German territory’

(9) Schweizer Truppen gehen über deutsches Gebiet.
    ‘Swiss troops walk across German territory.’

(10) Er liegt im Wald.
     ‘He lies in the woods.’

(11) Er geht durch den Wald.
     ‘He walks through the woods.’

Directionality does not affect the basic criteria for the identification of spatial senses with regard to regions and axes. Therefore the scheme in Figure 2 does not distinguish between directional and local interpretations. Instead, the feature [±DIR] is added to a sense after the classification in Figure 2 has been traversed; it is not listed as a separate feature in Figure 1 and Figure 2 below.10 Directional preposition senses must be kept apart from preposition senses that are directed. Directed preposition senses express an inclination or alignment. They split into the target-orientated interpretations of nach (‘to’, but ‘after’ with other senses) and gegen (‘to’) (12), and the interpretation ‘in line with’ of mit (‘with’) and its counterpart gegen (‘against’) (13).

(12) Das Pendel schlug nach der Seite/gegen die Seite aus.
     ‘The pendulum swung to the side.’

(13) Ernst fotografiert mit dem Licht/gegen das Licht.
     ‘Ernst takes a picture with the light/against the light.’

We distinguish the aforementioned localizations (Figure 2) from localizations that impose selectional restrictions on the syntactic object of the preposition (Figure 1). The latter are typically excluded from systematic classifications of spatial prepositions, and cannot be compositionally derived from the more ordinary spatial senses. The PP in (14) does not denote the property of being localized in the proximal region of Herbert’s parents, but implies that Herbert is visiting his parents’ home, regardless of whether his parents are there at all. Such interpretation shifts are only possible if the object of the preposition meets the relevant restrictions.11

(14) Herbert ist bei seinen Eltern.
     Herbert is at his parents
     ‘Herbert is at his parents’ home.’

10. Note that the distinction between local and directional senses does not apply to every spatial interpretation.

[Figure 1. Selectional Restrictions on RO: a decision tree over the type of RO (borderline; crowd, collective; institution; meeting; travelling; (home of a) person; working area; medium), with an, auf, bei, in, unter, and vor listed at its leaves.]

11. Without a context, example (14) is ambiguous between an interpretation in which the LO is localized in the proximal region of the RO and the interpretation adhered to above, and would thus lend itself to multiple annotations.

[Figure 2. Spatial Relations: decision tree for spatial senses. Recoverable branches: directed (target-orientated; in line with/against), shape-related, passage of local reference points, and the relation of LO to RO, which is either projective (reference axes: vertical, horizontal 1, horizontal 2) or topological (LO is contained in RO, traverses RO, or is exterior to RO, in the proximal area or in the boundary area). The contained and traverses branches are split by whether Dim(LO) is greater than, or less than or equal to, Dim(RO). Further node labels: area; pro-form for LO (Stelle, Platz, Ort).]
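The dimensionality criterion of the topological subtree, together with Wunderlich's measure-phrase test, can be sketched as two small functions. This is a hypothetical illustration only: per footnote 9 the prepositions are not part of the decision tree, so returning a lexeme here is illustrative, and the integer dimensions assigned to LO and RO in the usage lines are assumed annotations, not values from the corpus.

```python
def reading_type(allows_measure_phrase):
    """Wunderlich's test: only projective senses combine with measure phrases such as zwei Meter."""
    return "projective" if allows_measure_phrase else "topological"


def topological_preposition(relation, dim_lo, dim_ro):
    """Illustrative lexeme for the 'contained in RO' and 'traverses RO' branches.

    in and durch require Dim(LO) <= Dim(RO); if Dim(LO) is greater, auf or über is used.
    """
    if relation == "contained":
        return "in" if dim_lo <= dim_ro else "auf"
    if relation == "traverses":
        return "durch" if dim_lo <= dim_ro else "über"
    raise ValueError(f"unsupported relation: {relation!r}")


# Hypothetical dimension annotations: troops 3, territory 2, a person 3, woods 3.
print(topological_preposition("contained", 3, 2))  # as in (8), auf deutschem Gebiet
print(topological_preposition("traverses", 3, 3))  # as in (11), durch den Wald
```

The same comparison serves both branches, which is the parallel between in/auf and durch/über that the text attributes to the tree.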

3.3.2. Temporal interpretations

Here, we have been able to build on a decision tree for temporal interpretations of German prepositions developed in Durrell & Brée (1993). The decisions in the tree are based on the distinction between a matrix and a subordinate eventuality, the characteristics of these eventualities, and the identification of the temporal relationship between them. Whereas Durrell and Brée wanted to offer guided choices from senses to lexemes, we wanted to obtain a useful feature space for annotating the temporal interpretations of prepositions as they are used in PNCs or PPs. As there was no need for the included features to be distinct for all prepositions, we could prune Durrell and Brée’s tree at certain points. But we also had to amend the tree because certain interpretations were missing.

The tree itself is divided into two parts: one for prepositions that build a time measure phrase with their complements, and one for prepositions that do not. A time measure phrase can either frame the duration of an eventuality (the matrix eventuality), as in (15), or relate the time at which the matrix eventuality takes place to a reference time, as in (16), where the reference time equals the utterance time.

(15) Sie wohnt seit drei Jahren in Hamburg.
     She lives for three years in Hamburg
     ‘She has been living in Hamburg for three years now.’

(16) Vor einer Woche haben die Kurse angefangen.
     Ago one week have the courses started
     ‘The courses started one week ago.’

Non-time-measure readings are found in the upper half of the tree in Figure 3. They relate the matrix eventuality to a so-called subeventuality. The tree determines whether the two occur at the same time or in sequence, and whether they label points in time or periods. In example (17), the matrix eventuality is gehen wir schlafen (‘we go and sleep’) and the subeventuality is dem Essen (‘the meal’). They do not occur at the same time; rather, the subeventuality takes place before the matrix eventuality.

(17) Nach dem Essen gehen wir schlafen.
     ‘After the meal we go and sleep.’
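The first decisions described above can be sketched as a small classifier over annotated features. This is a deliberately reduced, hypothetical sketch, not Durrell & Brée's tree or the authors' amended version: it ignores the point/period distinction and the eventuality types, and the field names and reading labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemporalUse:
    time_measure: bool                   # does the PP build a time measure phrase?
    measure_role: Optional[str] = None   # "duration" or "relative_to_reference"
    order: Optional[str] = None          # "simultaneous", "S_before_M", "M_before_S"


def temporal_reading(use):
    """First decisions of a temporal tree (hypothetical labels; M = matrix, S = subeventuality)."""
    if use.time_measure:
        if use.measure_role == "duration":
            return "duration of M"
        if use.measure_role == "relative_to_reference":
            return "M located relative to reference time"
        raise ValueError("a time measure reading needs a measure_role")
    if use.order == "simultaneous":
        return "M simultaneous with S"
    if use.order == "S_before_M":
        return "S precedes M"
    if use.order == "M_before_S":
        return "M precedes S"
    raise ValueError("a non-time-measure reading needs an order")


print(temporal_reading(TemporalUse(True, measure_role="duration")))  # seit, as in (15)
print(temporal_reading(TemporalUse(False, order="S_before_M")))      # nach, as in (17)
```

Because the annotation only needs a feature space, not a unique lexeme per leaf, several prepositions may share a leaf, which is why the tree can be pruned as described above.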

[Figure 3. Temporal decision tree. Recoverable node labels: ±time measure; is duration of M; relates M directly to TOD; M = S?; sub event is; activity / state; accomplishment (M = matrix eventuality, S = subeventuality).]

[…] MAX(D)   (type <<d,t>,<<d,t>,t>>)

(6) MAX(P) = ιd: P(d) = 1 & ∀d' [P(d') = 1 → d' ≤ d]


Sonja Tiemann, Vera Hohaus & Sigrid Beck

In the example, the difference degree d is five inches. The than-clause provides the set D of degrees to which the moat is wide, and the main clause provides the set D' of degrees to which the drawbridge is long. Relating the maximal elements of these two sets as described derives the meaning captured by the paraphrase. A difficulty in the analysis of comparatives lies in deriving the appropriate sets of degrees from the syntactic structure. The example is associated with the underlying structure in (7), the surface structure in (8) and the Logical Form in (9). The Logical Form allows us to derive the intuitively appropriate interpretation for the example on the basis of standard mechanisms of interpretation (given in (10) and (11) in the style of Heim & Kratzer (1998)).

(7) The drawbridge is [AP [DegP 5'' -er [than the moat is how wide]] long]

(8) The drawbridge is [AP [DegP 5''_ ] long-er] [than how1 the moat is t1 wide]

(9) [DegP 5'' -er [than how1 the moat is t1 wide]] [1 [the drawbridge is [AP t1 long]]]

(10) [[ [1 [the bridge is [AP t1 long]]] ]]^g = λd. the bridge is d-long
     [[ how1 the moat is t1 wide ]]^g = λd. the moat is d-wide
     [[ [DegP 5'' -er than how1 the moat is t1 wide] ]]^g = λD'. MAX(D') ≥ 5'' + MAX(λd. the moat is d-wide)   (type <<d,t>,t>)

(11) [[ [DegP 5'' -er than how1 the moat is t1 wide] [1 [the bridge is [AP t1 long]]] ]]^g = 1 iff MAX(λd. the bridge is d-long) ≥ 5'' + MAX(λd. the moat is d-wide)

Let us consider the crucial aspects of this analysis of comparatives. Example (2), first of all, motivates a degree semantics. We measure the drawbridge and the moat along different dimensions and relate the resulting degrees on the scale of physical extent. Moreover, we relate them rather precisely via the differential measure phrase five inches, using a sum operation on the differential and the than-clause degree. In this type of example, the comparative could not simply relate two objects (the drawbridge and the moat): it relates degrees measured along two different dimensions. Degrees are introduced into the semantics through gradable predicates. The basic contribution of a gradable adjective is a relation between degrees and individuals, as in (12).

(12) [[long]] = λd. λx. x is d-long = λd. λx. LENGTH(x) ≥ d   (type <d,<e,t>>)
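The truth conditions in (11) can be checked mechanically once degrees are finite numbers. The sketch below is an illustration under stated assumptions: MAX is computed over a finite integer scale (the semantics itself assumes a dense scale), and the drawbridge and moat measurements are invented. It implements (12) for long, the MAX definition in (6), and the differential comparative in (10).

```python
# Hypothetical finite degree scale in inches; the semantics assumes a dense scale.
DEGREES = range(0, 201)

# Hypothetical measurements (inches).
LENGTH = {"drawbridge": 65}
WIDTH = {"moat": 60}


def MAX(P):
    """MAX(P) = ιd: P(d) & ∀d' [P(d') → d' ≤ d], over the finite scale above."""
    candidates = [d for d in DEGREES if P(d)]
    if not candidates:
        raise ValueError("MAX undefined: empty set of degrees")
    return max(candidates)


def long_(d):
    """(12): [[long]] = λd. λx. LENGTH(x) ≥ d"""
    return lambda x: LENGTH[x] >= d


def wide(d):
    return lambda x: WIDTH[x] >= d


def er(differential, than_degrees):
    """[[differential -er than D]] = λD'. MAX(D') ≥ differential + MAX(D)"""
    return lambda main_degrees: MAX(main_degrees) >= differential + MAX(than_degrees)


def main_clause(d):
    return long_(d)("drawbridge")  # λd. the drawbridge is d-long


def than_clause(d):
    return wide(d)("moat")  # λd. the moat is d-wide


# (11): true iff 65 ≥ 5 + 60 on these invented measurements.
print(er(5, than_clause)(main_clause))
```

Note that the comparative never compares the two objects directly: it only compares the maxima of two degree sets, which is why the length and width dimensions can differ.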

Crosslinguistic variation in comparison: evidence from child language acquisition


Finally, comparison operators like the comparative quantify over degrees. Such degree operators behave at the level of Logical Form in a manner that is quite parallel to nominal quantifiers. In particular, they create properties of degrees via movement and predicate abstraction. This can be seen in the Logical Form in (9), where the comparative DegP [5'' -er than the moat is wide] has undergone quantifier raising (QR). Its semantic type <<d,t>,t> is parallel to the type of a quantifier over individuals, <<e,t>,t> (Heim 2001). This analysis is supported by a range of data that follow from it as straightforward generalizations. We give a selection of relevant examples in (13) to (17) below, with the sentence in (a), a paraphrase in (b), and the Logical Form in (c).

(13) a. Mary is exactly 1.70m tall.      Overt Direct Measure Phrase (MP)
     b. The maximal height degree that Mary reaches is 1.70m.
     c. [ [DegP<<d,t>,t> exactly 1.70m] [<d,t> 1 [Mary is t1 tall]] ]

(14) a. How tall is Mary?      Degree Question (DegQ)
     b. For which degree d: Mary is d-tall?
     c. [Q [<d,t> how1 [Mary is t1 tall]]]

(15) a. Mary is taller than 1.70m.      Comparison to a Degree (CompDeg)
     b. The maximal height degree that Mary reaches exceeds 1.70m.
     c. [ [DegP<<d,t>,t> -er than 1.70m] [<d,t> 1 [Mary is t1 tall]] ]

(16) a. Mary is the tallest.      Superlative (Sup.)
     b. The maximal height degree that Mary reaches exceeds the maximal height degree that any other relevant person reaches.
     c. [Mary [[DegP -est C] [<d,<e,t>> 1 [2 [t2 is t1 tall] ]]]]

Note finally that the analysis of a gradable adjective in the simple, positive form requires the introduction of an abstract Positive operator POS in order to derive the right meaning, a context-dependent property of individuals, from the lexical semantics of the adjective (a relation between individuals and degrees). We provide a simple version of POS in (17c).

(17) a. Mary is tall.      Positive
     b. Mary counts as tall in the context of evaluation = Mary’s height reaches s (where s is the threshold for tallness in the context of evaluation)
     c. [[tall]] = λd. λx. x is d-tall   (type <d,<e,t>>)
        [[POS_s]] = λAdj. λx. MAX(λd. Adj(d)(x)) ≥ s
        [[ [AP POS_s tall] ]] = λx. MAX(λd. x is d-tall) ≥ s = λx. x’s height reaches s   (type <e,t>)

2.2. Crosslinguistic variation in comparison

The above-mentioned properties of the grammar of comparison are not shared by all languages, and accordingly the expression of comparison varies widely crosslinguistically. Beck et al. (2009), referred to in the following as B17 after the joint project funded by the German Research Foundation (DFG), have conducted a systematic investigation into crosslinguistic variation in comparative constructions, theoretically guided by the theory introduced above. Table 1 summarizes their main results.2

Table 1. Crosslinguistic variation in comparison constructions

Lang. ex.   Motu   Japanese, Chinese   Guaraní, Russian   English, German
CompDeg     No     Yes                 Yes                Yes
DiffC       No     Yes                 Yes                Yes
NegIs       n/a    No                  Yes                Yes
Scope       n/a    No                  Yes                Yes
SubC        n/a    No                  No                 Yes
MP          No     No                  No                 Yes
DegQ        No     No                  No                 Yes

Notice that the properties in the table always occur in clusters: either a language has both CompDeg and DiffC, or it has neither, and similarly for the other clusters (NegIs, Scope) and (SubC, MP, DegQ). According to B17, these clusters identify three dependent parameters of crosslinguistic variation, which we explain briefly in the following.

2. Explanation: DiffC stands for differential comparative, exemplified by (2) or the simpler Mary is two inches taller than John is; NegIs stands for an English-like negative island effect, as witnessed by the unacceptability of *Mary bought a more expensive book than nobody did; Scope stands for scope interaction, i.e. scope ambiguities between the comparative operator and other quantifiers, as exemplified by the ambiguity of Heim’s (2001) example The paper is allowed to be exactly five pages longer than that. N/a means that the relevant data cannot be constructed; e.g. Scope, a judgment on wide-scope degree operators, makes no sense in a language without degrees.

2.2.1. Degree semantics

The basis of the grammar of comparison in English is the degree ontology used in the semantics. Adjectives, or more precisely gradable predicates, have an argument position for degrees. Those argument positions must be saturated in the syntax. Degree operators do so indirectly, by quantifying over degrees. In order to determine whether the language under investigation is like English in this respect, B17 evaluated the comparison data from that language with respect to:
(i) whether the language has a family of expressions that plausibly manipulate degree arguments: comparative, superlative, and equative morphemes; items parallel to too and enough;
(ii) whether the language has expressions that plausibly refer to degrees and combine with degree operators: CompDeg, DiffC.

Motu, B17’s representative of a conjunctive language, gives a clear negative answer to both of these questions. Comparison in Motu is expressed as in (1), repeated from above.

(1) Mary na lata to Frank na kwadoǦgi.
    Mary is tall but Frank is short.
    ‘Mary is taller than Frank.’   (Beck et al. 2009: 18, ex. (59))

Other types of data that would be indicative of a degree semantics, like measure phrases or degree questions, are unavailable as well (cf. Beck et al. 2009: 47–49 for the respective data). Thus we see no evidence for an underlying degree semantics, and B17 accordingly suggest the following parameter of language variation:

(18) Degree Semantics Parameter (DSP): A language {does/does not} have gradable predicates (type <d,<e,t>> and related), i.e. lexical items that introduce degree arguments. (Beck et al. 2009: 19)

The DSP is a point of systematic variation in the lexicon (similar in spirit to proposals in Chierchia (1998) for nominal semantics). Motu would, of course, have the negative setting -DSP. This leaves us with the task of finding a semantic analysis for Motu adjectives. They occur in only one form, which seems similar to the English positive form in its context dependency. B17’s suggestion is that Motu adjectives have a context-dependent semantics without involving <d,<e,t>> adjectives or POS (cf. the negative DSP setting just hypothesized). The Motu example in (1) is analyzed in (19).

(19) a. [[tall_Motu]] = λx. x counts as tall in c
     b. [[short_Motu]] = λx. x counts as short in c
     c. [[Mary na lata to Frank na kwadoǦgi]] = 1 iff Mary counts as tall in c and Frank counts as short in c
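The contrast between the degree-based English positive in (17c) and the degree-free Motu analysis in (19) can be sketched side by side. In this illustration the heights, the threshold s = 170, and the modeling of the context c as explicit sets of individuals who count as tall or short are all assumptions made for the example; B17 do not formalize contexts this way.

```python
DEGREES = range(0, 251)
HEIGHT = {"Mary": 180, "Frank": 160}  # hypothetical heights in cm


def MAX(P):
    return max(d for d in DEGREES if P(d))


# English (+DSP): a relational adjective plus POS, as in (17c).
def tall(d):
    return lambda x: HEIGHT[x] >= d


def POS(s):
    return lambda adj: lambda x: MAX(lambda d: adj(d)(x)) >= s


# Motu (-DSP): adjectives are plain context-dependent properties, as in (19).
def motu_tall(c):
    return lambda x: x in c["tall"]


def motu_short(c):
    return lambda x: x in c["short"]


def motu_comparison(c, a, b):
    """(19c): true iff a counts as tall in c and b counts as short in c."""
    return motu_tall(c)(a) and motu_short(c)(b)


c = {"tall": {"Mary"}, "short": {"Frank"}}  # a context ranking Mary above Frank
print(motu_comparison(c, "Mary", "Frank"), POS(170)(tall)("Mary"))
```

Both routes classify Mary as tall here, but only the English one passes through degrees and MAX; the Motu function never sees a degree, which is the point of the -DSP setting.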

The sentence is predicted to be true in a context as long as the context can be construed as ranking Mary and Frank with respect to their height, with Mary on the tall side and Frank on the short side. The meaning that this semantics derives for Mary na lata is indistinguishable from the meaning of the English Mary is POS tall, but it is derived differently. The point is that Motu has no relational <d,<e,t>> adjective meanings and no degree operators; comparisons are made in an indirect manner.

2.2.2. Degree operators

A more subtle difference between English and Japanese was already observed by Beck, Oda & Sugisaki (2004). While Japanese (20) looks superficially similar to English (21a), several important empirical differences between the two languages lead Beck, Oda & Sugisaki to propose a different semantics, closer to that of English (21b).

(20) Sally-wa Joe-yori kasikoi.      Japanese
     Sally-TOP Joe-yori smart

(21) a. Sally is smarter than Joe.
     b. Compared to Joe, Sally is smarter.

In contrast to English, Japanese does not permit MPs, SubC, or DegQs. Beck, Oda & Sugisaki also note that, in contrast to English, there is no scope interaction with modal verbs in Japanese comparison constructions; thus Japanese does not seem to have a comparative operator that behaves like a quantifier at the level of Logical Form. Negative island effects are also not English-like. The acceptability of a differential comparative, however, indicates that the semantics underlying the yori-construction is a degree semantics. These basic facts, clustered as B17 would cluster them, are summarized in (22):


(22) Japanese: *SubC, *MP, *DegQ; NegIs and Scope not like English; DiffC okay!

Thus B17 take Japanese to have the positive setting of the DSP. Some other parameter must be responsible for the differences from English. B17 follow Beck, Oda & Sugisaki’s suggestion that Japanese does not permit quantification over degrees. The following parameter expresses this:

(23) Degree Abstraction Parameter (DAP): A language {does/does not} have binding of degree variables in the syntax. (Beck, Oda & Sugisaki 2004: 336)

If there is no binding of degree variables, a language cannot have degree operators like the English comparative. This explains the lack of scope interaction and the properties *DegQ (which needs binding of degree variables, as seen in Section 2.1), *SubC (comparing two sets of degrees requires degree variable binding, cf. the analysis of example (2)) and *MP (since overt direct measure constructions involve quantification over degrees, see once more Section 2.1). But of course we face the question of what the semantics of the normal comparison construction then is. Beck, Oda & Sugisaki (2004) consider English compared to and Japanese yori to be context setters that are not compositionally integrated with the main clause. They provide an individual (type <e>) that is used to infer the intended comparison indirectly. Thus in (20) we would be dealing with a comparative adjective without a syntactic item of comparison, similar to English (21b). We present Beck, Oda & Sugisaki’s (2004) semantics for Japanese kasikoi ‘smart’ in (24). The analysis implies that Japanese adjectives directly combine with a context-dependent comparative operator.

(24) a. [[kasikoi COMP_Japanese c]]^g = λx. MAX(λd. x is d-smart) > g(c)
        [[COMP_Japanese]] = λAdj. λd'. λx. MAX(λd. Adj(d)(x)) > d'
     b. [[Sally wa kasikoi]]^g = 1 iff MAX(λd. Sally is d-smart) > g(c)
     c. c := the standard of intelligence made salient by comparison to Joe = Joe’s degree of intelligence
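The analysis in (24) can be sketched so that the absence of degree binding is visible in the code: COMP takes its standard from the assignment g, and Joe-yori only determines what g(c) is. The intelligence degrees and the finite scale are invented for this illustration, and the function names are hypothetical.

```python
DEGREES = range(0, 11)
SMART = {"Sally": 8, "Joe": 5}  # hypothetical intelligence degrees


def MAX(P):
    return max(d for d in DEGREES if P(d))


def smart(d):
    """Hypothetical [[kasikoi]] = λd. λx. x is d-smart."""
    return lambda x: SMART[x] >= d


def COMP_japanese(adj):
    """(24a): λAdj. λd'. λx. MAX(λd. Adj(d)(x)) > d'"""
    return lambda d_prime: lambda x: MAX(lambda d: adj(d)(x)) > d_prime


def sally_wa_kasikoi(g):
    """(24b): true iff MAX(λd. Sally is d-smart) > g(c)."""
    return COMP_japanese(smart)(g["c"])("Sally")


# (24c): yori merely makes Joe's degree salient; it is stored in the assignment g,
# not bound by a degree operator in the syntax.
g = {"c": MAX(lambda d: smart(d)("Joe"))}
print(sally_wa_kasikoi(g))
```

Since the standard is just a free variable resolved by context, nothing in this construction requires degree abstraction, which is what the -DAP setting predicts.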

Sonja Tiemann, Vera Hohaus & Sigrid Beck

Crosslinguistic variation in comparison: evidence from child language acquisition

Thus even when there is evidence that the language under investigation employs a degree semantics, it may still lack English-type quantifiers over degrees. For a given language and comparison construction, we need to ask whether the constituent seemingly corresponding to the English than-constituent is really a compositional item of comparison denoting degrees, and whether there is a genuine comparison operator. B17 suggest that the parameter setting +DSP,-DAP is also exemplified by Mandarin Chinese, Samoan, and the exceed-type languages that they investigated, Mooré and Yorùbá.

2.2.3.

Degree Phrase arguments

Another group of languages appears to be closer to English than Japanese is, but is still not completely parallel to English. Russian, Turkish and Guaraní belong to this group and show the behavior summarized in (25).

(25) Russian, Turkish, Guaraní: *SubC, *MP, *DegQ; but DiffC, CompDeg okay, English-like NegIs and Scope.

B17 argue that Guaraní, Russian and Turkish have an English-like degree semantics for main clauses and subordinate clauses, i.e. that they have the parameter setting +DSP,+DAP. But we must ask how the differences from English degree constructions arise. B17 propose that the following parameter creates the cluster SubC, MP, DegQ:

(26) Degree Phrase Parameter (DEGPP): The degree argument position of a gradable predicate {may/may not} be overtly filled. (Beck et al. 2009: 24)

The degree argument position (SpecAP in this paper) is filled at the surface by the MP in measure constructions, and by overt or silent how in DegQ and SubC. The difference between SubC and ordinary comparatives can be tied to ellipsis, in that comparatives with ellipsis only have a filled SpecAP at the level of LF. Thus the languages with *DegQ, *SubC, *MP are identified by the parameter setting -DEGPP, while at the same time being +DSP and +DAP. A language like English would, according to B17's analysis, have the parameter setting +DSP,+DAP,+DEGPP. Besides English, the properties identified by these settings are documented in Bulgarian, German, Hindi, Hungarian, and Thai.

2.2.4.

Subsection summary

Table 2 below summarizes the predictions that B17's three dependent parameters are designed to make. The table lists all possibilities opened by the parameters: if a language is -DSP, it must be -DAP as well, because there can be no abstraction over degree variables without degree semantics. Similarly, if a language is -DAP, B17 infer that it is also -DEGPP, because the DegPs that B17 investigate are all operators over degree arguments and can only be interpreted with the help of binding of the degree argument slot.

Table 2. Parameter settings and predictions

           -DSP    +DSP,-DAP    +DSP,+DAP,-DegPP    +DSP,+DAP,+DegPP
CompDeg    No.     Yes.         Yes.                Yes.
DiffC      No.     Yes.         Yes.                Yes.
NegIs      n/a     No.          Yes.                Yes.
Scope      n/a     No.          Yes.                Yes.
SubC       n/a     No.          No.                 Yes.
MP         No.     No.          No.                 Yes.
DegQ       No.     No.          No.                 Yes.
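As an illustration only, Table 2 can be read as a function from parameter settings to the predicted availability of each construction. The Python sketch below is our own encoding, not B17's: the function name `predictions` and the use of `None` for "n/a" are assumptions made for the example. It also enforces the dependencies described above (+DAP presupposes +DSP, +DEGPP presupposes +DAP).

```python
from typing import Dict, Optional

# None encodes "n/a" in Table 2; True/False encode "Yes."/"No."
Prediction = Optional[bool]


def predictions(dsp: bool, dap: bool, degpp: bool) -> Dict[str, Prediction]:
    """Return the Table 2 predictions for one combination of B17's parameters.

    The dependencies stated in the text are enforced: +DAP presupposes +DSP,
    and +DEGPP presupposes +DAP.
    """
    if dap and not dsp:
        raise ValueError("+DAP requires +DSP")
    if degpp and not dap:
        raise ValueError("+DEGPP requires +DAP")

    if not dsp:
        # Without degree semantics, NegIs, Scope and SubC are not applicable.
        return {"CompDeg": False, "DiffC": False, "NegIs": None,
                "Scope": None, "SubC": None, "MP": False, "DegQ": False}
    return {
        "CompDeg": True,   # available as soon as there is degree semantics
        "DiffC": True,
        "NegIs": dap,      # depend on abstraction over degree variables
        "Scope": dap,
        "SubC": degpp,     # depend on an overtly filled degree position
        "MP": degpp,
        "DegQ": degpp,
    }


# Russian-, Turkish- and Guarani-type setting: +DSP,+DAP,-DEGPP
print(predictions(True, True, False))
```

Encoding the dependencies as exceptions makes the "impossible" settings (e.g. -DSP,+DAP) explicit rather than silently mapping them to some column of the table.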

The interest in such parameters lies in the fact that they make predictions about a range of phenomena. Each parameter is responsible for a set of effects, a cluster of empirical properties. Taken together, the settings of the proposed parameters group together languages that share a number of key properties in the realm of comparison constructions.

2.3.

Time course of acquisition

The particular interest in parameters for present purposes comes from their connection to child language acquisition. Snyder (2007: 7) postulates that a “… theory of (syntactic) variation is simultaneously a theory of the child’s hypothesis space during language acquisition.” That is, on the basis of the available evidence, the child has to determine, from the range of possibilities, the parameter settings of the language that she is learning. Snyder (2007: 7) goes on to propose two acquisition predictions for any proposed parameter:

(i) If the grammatical knowledge (including parameter setting and lexical information) required for construction A, in a given language, is identical to the knowledge required for construction B, then any child learning the language is predicted to acquire A and B at the same time.

(ii) If the grammatical knowledge (including parameter setting and lexical information) required for construction A, in a given language, is a proper subset of the knowledge required for construction B, then the age of acquisition for A should always be less than or equal to the age of acquisition for B. (No child should acquire B significantly earlier than A.)
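Snyder's two predictions can be made concrete as checks on ages of acquisition. The sketch below is a hypothetical operationalization, not Snyder's own: CHILDES ages in the Y;M format are converted to months, and "significantly" is read as a tolerance window whose default of two months is our assumption. `None` stands for a construction that was not acquired within the corpus.

```python
from typing import Optional


def age_in_months(age: str) -> int:
    """Convert a CHILDES-style age 'Y;M' (e.g. '3;4') into months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def same_time(age_a: str, age_b: str, tolerance: int = 2) -> bool:
    """Prediction (i): A and B are acquired at (roughly) the same time."""
    return abs(age_in_months(age_a) - age_in_months(age_b)) <= tolerance


def violates_subset(age_a: Optional[str], age_b: Optional[str],
                    tolerance: int = 2) -> bool:
    """Prediction (ii): if A's knowledge is a proper subset of B's, B must not
    be acquired more than `tolerance` months before A.

    None means the construction was not acquired within the corpus.
    """
    if age_b is None:
        return False  # B not attested: nothing contradicts (ii)
    if age_a is None:
        return True   # B acquired although A never was
    return age_in_months(age_b) < age_in_months(age_a) - tolerance


print(violates_subset("3;1", "6;4"))  # False: A is acquired well before B
```

Treating an unattested A as a violation reflects the reasoning in (ii): if B is acquired but A never appears, A's age of acquisition must lie beyond B's.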

Application of (i) and (ii) to children’s spontaneous speech relies on the assumption that the child is conservative, in the sense that she won’t use a construction unless she is certain of its analysis in the target language. As Snyder (2007: 166) describes grammatical conservatism: “When children do not yet know how to construct a given sentence-type, it appears that they actually refrain from producing the sentence-type, rather than risking an error of commission.” This means that parametric variation can not only be tested in crosslinguistic studies but should also be detectable during the acquisition process. As Snyder (2007) points out, the study of acquisition even has theoretical advantages over a crosslinguistic analysis. First, we can focus on a single, well-studied language. Second, testing the acquisitional predictions for each child is comparable to testing parametric predictions for a new language: every new language in a typological study comes with the possibility for two associated grammatical characteristics to diverge, and each new child comes with the possibility for the two grammatical characteristics to be acquired at different times.

2.4.

Predictions

Let us now examine the concrete predictions about the time course of child language acquisition that the theory of comparison from B17 makes in conjunction with Snyder’s understanding of the process of language acquisition. To begin with, we expect an initial stage with negative settings of all three parameters involved in the grammar of comparison. Positive settings can then be reached on the basis of positive evidence (for example, upon hearing Mary is six months older than Joe, the child could deduce a positive setting of the DSP). Generally, we expect that the properties that identify a particular parameter should be acquired at roughly the same time, unless a construction in the cluster also requires some independent knowledge that is acquired later. A +DSP setting is a prerequisite for a +DAP setting. Thus, the grammatical knowledge required for the constructions indicative of the +DSP setting is a proper subset of the knowledge required for the constructions indicative of the +DAP setting. Consequently, the phenomena for the +DAP setting are expected to be acquired no earlier than the phenomena for the +DSP setting. Similarly, since the knowledge required for the constructions of the +DAP setting is a proper subset of the grammatical knowledge required for the constructions of the +DEGPP setting, the latter should be acquired no earlier than the former. How do these general expectations translate into predictions regarding the occurrence of particular comparison constructions in the child’s spontaneous speech, as witnessed in the CHILDES corpora? We concentrate on the acquisition of languages with the settings +DSP,+DAP,+DEGPP, which include in particular English and German.
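The subset reasoning above can also be run mechanically. In this sketch, which is our own illustration, each construction is mapped to the parameter settings it presupposes (than-clauses need +DAP; MPs and DegQs need +DEGPP). Ordering constraints in the sense of Snyder's (ii) then follow from proper-subset relations, and same-time predictions in the sense of (i) from identical requirements. The mapping is a simplification that ignores lexical knowledge.

```python
from itertools import combinations, permutations

# Hypothetical encoding of the parameter settings each construction needs.
# Lexical knowledge is deliberately left out of this simplification.
REQUIRED = {
    "degree morphology": frozenset({"+DSP"}),
    "CompDeg": frozenset({"+DSP"}),
    "than-clause": frozenset({"+DSP", "+DAP"}),
    "MP": frozenset({"+DSP", "+DAP", "+DEGPP"}),
    "DegQ": frozenset({"+DSP", "+DAP", "+DEGPP"}),
}


def no_earlier_pairs(required):
    """(A, B) pairs where A's knowledge is a proper subset of B's:
    B should not be acquired significantly before A (Snyder's (ii))."""
    return sorted((a, b) for a, b in permutations(required, 2)
                  if required[a] < required[b])


def same_time_pairs(required):
    """(A, B) pairs with identical requirements (Snyder's (i))."""
    return [(a, b) for a, b in combinations(required, 2)
            if required[a] == required[b]]


for a, b in no_earlier_pairs(REQUIRED):
    print(f"{b} should not be acquired significantly before {a}")
```

Using `frozenset` lets the built-in `<` operator express "proper subset" directly.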


Some of the constructions used as indicators in B17’s crosslinguistic work will not be useful for the analysis of the spontaneous speech of English- or German-learning children. Reliable evidence regarding the NegIs property cannot be gained from corpora, as it depends on negative evidence. Other constructions, in particular the subcomparative and scope facts, are very likely too rare to show up with any reliability in corpora. Thus we face the obvious difficulty that some crosslinguistic indicators of parameter settings are not available in a corpus study of language acquisition. There are some further considerations to be made. With respect to child language, the question is what other factors could slow down the acquisition of a certain construction. We follow Syrett (2007), who observes that by the age of three children share with adults an abstract representation of the positive form of gradable adjectives, one that both incorporates a standard of comparison and allows for variation in how the standard is set. So this is not a problem. Potentially problematic, however, is the knowledge of units of measurement such as meters and years, and of how they apply to degrees. In addition to MPs, we will therefore consider expressions that refer directly to a degree, like the one given in (27).

(27) I am that tall. (Pronominal Measure Construction, PMP)

Here, that stands in for a degree and fills the degree argument slot of the gradable predicate. We will refer to such data as pronominal measure constructions and have included them in our investigation, in addition to MPs, in the hope of being able to circumvent the problem of units of measurement. They were not considered by B17. Notice that the concern about units applies not only to MPs but also to comparison with a degree and to differential comparatives. We should therefore specifically consider data like (28) and (29) in the acquisition study.

(28) I am taller than that. (CompDeg with degree pronoun)
(29) I am taller. (Contextual Comparative)

Besides the unavailability in acquisition of certain data points that were available in the crosslinguistic study, there is fortunately also the reverse situation: certain constructions are available as evidence in acquisition that are not observationally available in a crosslinguistic study. Since the analysis of than-clauses in English involves predicate abstraction (cf. Section 2.1), it requires a +DAP setting. Therefore the child’s conservatism in conjunction with the B17 analysis leads us to expect that constructions indicative of a +DSP setting only (degree morphology; CompDeg and DiffC, pace the concerns voiced above) should be acquired no later than than-clauses. (Note that here we have an advantage over a crosslinguistic investigation, because it is precisely a question for the crosslinguistic study how an apparent counterpart of an English than-clause is actually to be analyzed in the language under investigation. Cf. Japanese, as discussed above.) A comparative with a than-clause needs to be distinguished from a contextual comparative, e.g. (29) above, which does not provide evidence for a +DAP setting. It also needs to be distinguished from a than-phrase, for which it is unclear whether the same analysis must apply (e.g. Hankamer 1973; Lechner 2004; Hofstetter 2009; Bhatt & Takahashi to appear). Some authors argue that the than-phrase in examples like (30a) is really a reduced clause (30b), while others prefer to take it at face value, which necessitates another semantic entry for the comparative operator (30c). This issue surfaces in our acquisition results. We use the term than-constituent when we want to remain neutral as to its status. The term than-phrase is used to refer to than followed by what seems to be one phrasal constituent (mostly a DP referring to an individual), without prejudging its analysis as either reduced or non-reduced (like (30a)). The term than-clause refers to than followed by what is unequivocally a clausal structure (like example (2) once more).

(30) a. Mary is taller than John.
b. [-er than [John is tall]] [Mary is tall]
c. Mary $[[\text{-er}_{\langle e,\langle\langle d,\langle e,t\rangle\rangle,\langle e,t\rangle\rangle\rangle}\ \text{than John}]\ [\lambda d.\,\lambda x.\,x \text{ is } d\text{-tall}]]$
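The type annotation in (30c) can be checked using Function Application alone. The following toy type checker is a hypothetical helper written for illustration: types are the basic strings 'e', 'd', 't' or pairs for functional types. It verifies that the phrasal -er of (30c) combines successively with John, with the relational adjective, and with Mary to yield a truth value.

```python
from typing import Tuple, Union

# A semantic type is a basic type ('e', 'd', 't') or a pair (a, b) for <a,b>.
SemType = Union[str, Tuple["SemType", "SemType"]]


def fn(a: SemType, b: SemType) -> SemType:
    """Build the functional type <a,b>."""
    return (a, b)


def show(t: SemType) -> str:
    """Render a type in the angle-bracket notation used in the text."""
    return t if isinstance(t, str) else f"<{show(t[0])},{show(t[1])}>"


def apply(f: SemType, x: SemType) -> SemType:
    """Function Application: a function of type <a,b> applied to an a gives b."""
    if isinstance(f, str) or f[0] != x:
        raise TypeError(f"cannot apply {show(f)} to {show(x)}")
    return f[1]


E, D, T = "e", "d", "t"
ET = fn(E, T)
DET = fn(D, ET)                        # relational adjective: λd.λx. x is d-tall
ER_PHRASAL = fn(E, fn(DET, ET))        # -er as annotated in (30c)

er_than_john = apply(ER_PHRASAL, E)    # [-er than John]
vp = apply(er_than_john, DET)          # ... combined with λd.λx. x is d-tall
sentence = apply(vp, E)                # ... combined with Mary
print(show(ER_PHRASAL), show(er_than_john), show(vp), sentence)
```

A mismatch, such as applying the phrasal -er directly to the adjective, raises a `TypeError`, which mirrors the point that this entry requires an individual-denoting than-phrase as its first argument.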

Given these considerations, here is a list of comparison constructions for which CHILDES corpora may be a good source:

– degree morphology: comparative and superlative vs. unmarked forms of adjectives
– pronominal measure constructions
– overt measure phrases occurring (i) in CompDeg and DiffC and (ii) as MP
– degree questions
– than-constituents occurring (i) phrasal and (ii) clausal

We make at least the following predictions about the time course of the acquisition of comparison:


(31) +DSP before +DAP:
a. No child should acquire than-clauses significantly before degree morphology.
b. No child should acquire than-clauses significantly before CompDeg.

(32) +DAP before +DEGPP: No child should acquire DegQs and MPs significantly before comparatives (including all of degree morphology, CompDeg, and than-clauses).

We would like to emphasize that these predictions rule out scenarios that would otherwise be logically possible and intuitively reasonable. For example, one could imagine a child first acquiring (33a) and then (33b). But this is incompatible with the above predictions.

(33) a. How old is Molly?
b. Molly has an older bike than Sue does.

Let us now look at what we found in the CHILDES corpora.

3.

The corpus study

3.1.

Methodology

In order to test the acquisitional predictions above, we selected transcripts of the spontaneous speech of three American and three German children from CHILDES (MacWhinney 2000). Corpora were selected on the basis of their density and sampling length. We expected the majority of comparison constructions to be acquired late, and the selected corpora reflect this expectation in that the children were recorded up to seven years of age for German, and up to at least the age of 5;1 for English. The transcripts analyzed in our study are listed in the tables below.


Table 3. Corpora analyzed for (Mainstream American) English

Child    Collected by                Downloaded    Ages       # of Child Utterances
Adam     Roger Brown (Brown 1973)    08/20/2008    2;3–5;2    90,852
Sarah    Roger Brown (Brown 1973)    08/20/2008    2;3–5;1    31,369
Ross³    Brian MacWhinney            10/22/2008    2;6–7;5    30,912

Table 4. Corpora analyzed for German

Child        Collected by       Downloaded    Ages       # of Child Utterances
Cosima       Rosemarie Rigol    10/16/2008    0;0–7;2    76,888
Pauline      Rosemarie Rigol    10/16/2008    0;0–7;7    83,572
Sebastian    Rosemarie Rigol    10/16/2008    0;0–7;0    79,451

The programs provided by CLAN were used to identify potentially relevant child utterances. The results were then searched by hand for the relevant constructions and checked against the original transcripts to exclude imitations, repetitions, and formulaic routines. Following suggestions in Snyder (2007), we excluded (i) material transcribed as mumbled, unclear, or overlapping with another person’s utterance; (ii) as repetition and imitation, material occurring in the same order in an earlier utterance within the same transcript and containing exactly the same words (including inflectional morphology); and (iii) memorized routines. In order to be characterized as novel, an utterance had to contain a new word, a change in word order, or a change in morphology. The results were then analyzed for very first use and age of acquisition, as well as for types of errors and their frequency. Following Stromswold (1990) and Snyder (2007), the age at which a child produced her or his first clear example of a construction, followed soon after by regular use with a variety of lexical items, was considered to be the age of acquisition for this construction (First of Repeated Uses, FRU). “Soon after” was here understood as within the next two months. The frequency of grammatical and ungrammatical constructions was determined per 1,000 utterances for each month (cf. Hohaus and Tiemann (2010: 92) for data on types of errors and their frequency). Irregular and periphrastic forms were not taken into consideration when determining the age of acquisition for the comparative and the superlative.

3. This corpus does not contain natural production data in the strict sense but has elements characteristic of a diary study, i.e. recordings have only been partially transcribed. What was selected for transcription was what had been deemed interesting or remarkable. We believe we can still gain valuable insight regarding the age of very first use, though very little regarding the further course of acquisition. Ross is included because his is the only English corpus to extend past the age of seven.

3.2.

Results

The results of our search are summarized below. Table 5 presents the number of gradable adjectives that the children have at their disposal by the age of 2;3 to 2;6⁴, Table 6 the results for the age of very first use, and Table 7 the results for the age of acquisition.

Table 5. Repertoire of gradable adjectives

         English                                   German
         Adam (2;3)   Sarah (2;3)   Ross (2;6)     Cosima (2;5)   Pauline (2;5)   Sebastian (2;5)
Types    18           11            28             16             20              17

Table 6. Age of very first use⁵ First use Comp Sup. PMP CompDeg DiffC Than-phrases Than-clauses MP DegQ

English Adam Sarah Ross (2;3–5;2) (2;3–5;1) (2;6–7;5) 2;6 2;10 2;6 4;2 3;7 3;5 3;1 3;0 2;10 5;4 4;11 3;5 3;11 3;5 3;8 3;5 4;5 4;5 3;0 4;0 3;1 3;3

German Cosima Pauline Sebastian (0;0–7;2) (0;0–7;7) (0;0–7;0) 2;7 1;1 3;11 3;7 3;5 3;9 3;3 2;4 2;7

6;6

5;9

4;8

3;11

6;5 3;5

4;3

4. For the American children, the age is the age at which transcripts started.
5. Empty fields indicate that there were no occurrences in the transcripts.


Table 7. Age of acquisition English FRU

Adam (2;3–5;2)

Comp 3;4 Sup. 4;2 PMP 4;0 4;2 Than-phrases Than-clauses Indefinable. MP Indefinable. DegQ Indefinable.

German

Sarah (2;3–5;1)

Ross (2;6–7;5)

Cosima (0;0–7;2)

Pauline (0;0–7;7)

Sebastian (0;0–7;0)

3;7 4;2 4;0 Indefinable.

2;6 4;8 4;1 3;5 Indefinable. Indefinable. Indefinable.

2;9 3;7 3;7 Indefinable.

2;8 4;5 2;10 6;6

3;11 4;3 Indefinable.⁶ 6;3

Indefinable.

Indefinable. 6;10

Indefinable.

4;5 Indefinable.

Let us summarize the above results. None of the children acquires all constructions before the end of the respective corpus, although Ross makes use of all constructions at least once by the age of 7;5. For both English and German, the mean age of acquisition of regular comparative morphology was 3;1, with a range of 2;6 to 3;7 for English and of 2;8 to 3;11 for German. For English, the mean age of acquisition of than-phrases was 3;9, with a range of 3;5 to 4;2. For the German children Pauline and Sebastian, however, the mean age of acquisition of than-constituents was 6;4, with a range of 6;3 to 6;6. For the superlative in English, the mean age of acquisition was 4;4, with a range of 4;2 to 4;8. Gaps in the recordings at 3;8 as well as at 4;0 and 4;7 probably account for the late age of acquisition of the superlative by Ross. In German, the mean age of acquisition of the superlative was 4;1, with a range of 3;7 to 4;5. For English PMPs, the mean age of acquisition was 4;0, with a range of 4;0 to 4;1. The mean age of acquisition of PMPs for the German children Cosima and Pauline was 3;2, with a range of 2;10 to 3;7. MPs were acquired only by Sarah, at 4;5, and by none of the German-learning children. Degree questions were acquired only by Pauline by the end of the corpora, at 6;10, and by none of the English-learning children. Figures 1 and 2 give an overview of the course of acquisition of PMPs in English and German, since these are the focus of this paper.
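The means reported above can be reproduced from the individual FRU values. The sketch below uses the values as we read them from Table 7 (for the comparative: Adam 3;4, Sarah 3;7, Ross 2;6; Cosima 2;9, Pauline 2;8, Sebastian 3;11). Truncating the mean to whole months is our assumption about the rounding convention; it reproduces the figures given in the text, whereas rounding to the nearest month would give 3;2 for the English comparative.

```python
def to_months(age: str) -> int:
    """Convert a CHILDES-style age 'Y;M' into months."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def to_age(months: int) -> str:
    """Convert a number of months back into 'Y;M'."""
    return f"{months // 12};{months % 12}"


def mean_age(ages):
    """Mean age of acquisition, truncated to whole months (our assumption)."""
    total = sum(to_months(a) for a in ages)
    return to_age(total // len(ages))


FRU_COMPARATIVE = {
    "English": ["3;4", "3;7", "2;6"],   # Adam, Sarah, Ross
    "German": ["2;9", "2;8", "3;11"],   # Cosima, Pauline, Sebastian
}

for lang, ages in FRU_COMPARATIVE.items():
    youngest = min(ages, key=to_months)
    oldest = max(ages, key=to_months)
    print(lang, "mean", mean_age(ages), "range", youngest, "to", oldest)
```

Converting to months before averaging matters: averaging the Y;M strings as if they were decimals would give wrong results, since a year has twelve months, not ten.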

6. Age of acquisition could not be determined due to low number of occurrences.


Figure 1. Acquisition of the pronominal measure construction in English

Figure 2. Acquisition of the pronominal measure construction in German


For German, comparative morphology was acquired earlier than the superlative and PMPs, and earlier than MPs and DegQs. MPs and DegQs seem to be acquired last. So far, the time course of acquisition in German, too, is in line with the predictions. However, in German, than-constituents are not acquired at roughly the same time as the superlative, as they are in English, but considerably later. (Unequivocally clausal comparatives were used too infrequently in both languages to gain definitive insight into their acquisition.) The sequencing indicated in Figure 3 summarizes what we find. Evidence for where to locate the acquisition of than-clauses in English is only suggestive, as indicated; but see Hohaus, Tiemann & Beck (in prep.) for a more conclusive discussion.

Figure 3. Course of acquisition for English and German (panels: English, German)

Two findings were not predicted: the early acquisition of than-phrases in English compared to German, and the early acquisition of PMPs in both languages compared to the acquisition of MPs. The next section provides a detailed discussion of our findings.


4.

Discussion

4.1.

Confirmation of crosslinguistic picture

Let us first note that the acquisition study has confirmed the fundamental setup of the theory of parametric variation developed in B17. Recall that this theory necessarily makes the following predictions about acquisition.

(34) +DSP before +DAP:
a. No child should acquire than-clauses significantly before degree morphology.
b. No child should acquire than-clauses significantly before comparison to a degree.

(35) +DAP before +DEGPP: No child should acquire DegQs and MPs significantly before comparatives (including all of degree morphology, CompDeg and than-clauses).

In English and German, these expectations are confirmed. Let us take a closer look at the data we found. Children learning the +DSP,+DAP,+DEGPP languages English and German seem to go through the following stages in their acquisition of the semantics of comparison:

0. gradable adjectives in unmarked form with Positive-like semantics
1. gradable adjectives with comparative morphology (in addition to the unmarked form)
2. PMPs, superlative, than-phrase with predicative adjective (English)
3. all other than-constituents (English); all than-constituents (German)
4. DegQs and MPs

All six children investigated show this sequencing in their production of comparison constructions. (Admittedly, stages 3 and 4 are not identified by the available data as clearly as one would wish. But see Hohaus, Tiemann & Beck (in prep.), who provide evidence for stage 3 from attributive and adverbial comparatives with than-constituents and who argue that the sparsity of MPs and DegQs in the corpora shows that they have not been acquired.) What can semantic analysis say about these stages? We suggest that the child begins with the adult semantics of a positive gradable adjective at stage 0, though without arriving at this meaning as a combination of the Positive operator and a relational adjective meaning. We suggest she uses (36) as a simple, uncomposed meaning (like Motu).

(36) $[\![\text{tall}]\!] = \lambda x.\,x \text{ counts as tall in } c$ (type $\langle e,t\rangle$)

(37) $[\![\text{taller}]\!] = \lambda x.\,\text{HEIGHT}(x) > d_c$ (type $\langle e,t\rangle$)

Once the child has acquired the comparative form of the adjective together with its correct meaning, we suppose that she uses (37). This semantics is a first step towards a degree semantics and a +DSP setting, since a degree argument occurs in this interpretation. Once more, though, we propose that the child has not necessarily learned yet that this meaning arises from the combination of a relational lexical entry for the adjective with a comparative operator. This step must be taken next, at stage 2, when the child acquires PMPs and the superlative. A PMP only makes sense on the basis of a relational adjective meaning. PMPs are discussed in detail in Section 4.2. Superlative morphology similarly indicates that a basic relational adjective meaning combines with degree morphology to yield comparative and superlative meanings. Hence we suggest that the child now has the following semantic knowledge:

(38) a. $[\![\text{tall}]\!] = \lambda d.\,\lambda x.\,\text{HEIGHT}(x) \geq d$ (type $\langle d,\langle e,t\rangle\rangle$)
b. $[\![\text{POS}_s]\!] = \lambda Adj.\,\lambda x.\,\text{MAX}(\lambda d.\,Adj(d)(x)) \geq s$
c. PMP: $[\![\text{that tall}]\!] = [\![\text{tall}]\!]([\![\text{that}]\!]) = \lambda x.\,\text{HEIGHT}(x) \geq [\![\text{that}]\!]$
d. Contextual Comparative: $[\![\text{-er}_1]\!] = \lambda Adj_{\langle d,\langle e,t\rangle\rangle}.\,\lambda d'.\,\lambda x.\,\text{MAX}(\lambda d.\,Adj(d)(x)) > d'$
   $[\![\text{taller } d_c]\!] = [\![\text{-er}_1]\!]([\![\text{tall}]\!])([\![d_c]\!]) = \lambda x.\,\text{HEIGHT}(x) > d_c$
e. Superlative: $[\![\text{-est } C]\!] = \lambda Adj_{\langle d,\langle e,t\rangle\rangle}.\,\lambda x.\,\text{MAX}(\lambda d.\,Adj(d)(x)) > \text{MAX}(\lambda d.\,\exists y[C(y) \wedge Adj(d)(y)])$
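To see how the entries in (38) compose, here is a small extensional model in Python. The individuals, their heights, and the finite degree scale are hypothetical, and MAX is computed over that scale. For the superlative we assume a comparison class C that excludes the individual being evaluated; if C contained that individual, the strict inequality in (38e) could never be satisfied.

```python
SCALE = range(0, 251)                              # assumption: whole cm
HEIGHT = {"mary": 172, "john": 165, "sue": 158}    # hypothetical model


def MAX(pred):
    """Largest degree on SCALE that satisfies pred (assumes one exists)."""
    return max(d for d in SCALE if pred(d))


def tall(d):                 # (38a): λd. λx. HEIGHT(x) >= d
    return lambda x: HEIGHT[x] >= d


def pos(s):                  # (38b): POS_s, with contextual standard s
    return lambda adj: lambda x: MAX(lambda d: adj(d)(x)) >= s


def er1(adj):                # (38d): contextual comparative -er_1
    return lambda d_prime: lambda x: MAX(lambda d: adj(d)(x)) > d_prime


def est(c):                  # (38e): superlative -est with comparison class C
    return lambda adj: lambda x: MAX(lambda d: adj(d)(x)) > MAX(
        lambda d: any(c(y) and adj(d)(y) for y in HEIGHT))


d_c = 170                                   # hypothetical contextual degree
that_tall = tall(d_c)                       # (38c): PMP, [[tall]]([[that]])
taller_dc = er1(tall)(d_c)                  # [[taller d_c]]
tallest = est(lambda y: y != "mary")(tall)  # C excludes Mary (assumption)
print(that_tall("mary"), taller_dc("john"), tallest("mary"), pos(160)(tall)("sue"))
```

Note that with the relational entry (38a), MAX(λd. tall(d)(x)) is simply x's height, which is why (38d) applied to tall reduces to the uncomposed meaning in (37).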

Now the child has gradable predicates in the same sense as the adult grammar, of type $\langle d,\langle e,t\rangle\rangle$ and related types (+DSP setting). Note that our sketch of the process of acquisition is compatible with conservatism, since the child does not revise her assumptions; she merely refines them. She realizes that the meaning in (36) is derived by composing (38a) and (38b), and similarly that (37) is derived by composing (38a) and (38d). Regarding the comparative, instances without a than-constituent are analyzed as contextual comparisons (cf. Section 2 on Japanese) and thereby as CompDeg. At this stage, the English-learning children also master than-phrases with predicative adjectives; this is a very surprising finding of our study. It is discussed in detail in Hohaus, Tiemann & Beck (in prep.). The acquisition of than-clauses shows, according to the reasoning laid out in Section 2, that a +DAP setting has been arrived at. Hence we propose that once a child produces than-constituents that are derived from a clausal source (stage 3), she has the following semantic knowledge:

(39) $[\![\text{than how}_1 \text{ the drawbridge is } t_1 \text{ long}]\!] = \lambda d.\,\text{the drawbridge is } d\text{-long}$ (type $\langle d,t\rangle$)

(5') $[\![\text{-er}_{simple}]\!] = \lambda D.\,\lambda D'.\,\text{MAX}(D') > \text{MAX}(D)$ (type $\langle\langle d,t\rangle,\langle\langle d,t\rangle,t\rangle\rangle$)
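The contribution of degree abstraction in (39) and (5') can be mimicked by treating a than-clause as a set of degrees, i.e. a function from degrees to truth values. In this sketch the lengths, the scale, and the main clause about a moat are hypothetical, and x is taken to be d-long iff LENGTH(x) ≥ d, parallel to (38a).

```python
SCALE = range(0, 101)                        # assumption: whole metres
LENGTH = {"drawbridge": 40, "moat": 25}      # hypothetical model


def MAX(D):
    """Largest degree on SCALE in the degree set D (assumes one exists)."""
    return max(d for d in SCALE if D(d))


def long_(d):                                # λd. λx. LENGTH(x) >= d
    return lambda x: LENGTH[x] >= d


def er_simple(D):                            # (5'): λD. λD'. MAX(D') > MAX(D)
    return lambda D_prime: MAX(D_prime) > MAX(D)


# (39): abstracting over the degree argument turns the than-clause into a set
# of degrees, λd. the drawbridge is d-long (type <d,t>).
than_clause = lambda d: long_(d)("drawbridge")
# Hypothetical main clause, abstracted in the same way: λd. the moat is d-long.
moat_clause = lambda d: long_(d)("moat")

print(er_simple(than_clause)(moat_clause))   # False in this model: 25 > 40 fails
```

The point of the sketch is that `er_simple` never sees an individual; it only compares the maxima of two degree sets, which exist only once the degree argument has been abstracted over (+DAP).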

And finally, as expected given the B17 theory, degree questions and overt direct MPs, (12) and (13) above, which are indicative of the +DEGPP setting, appear to come last in both languages and for all children we considered. This characterizes stage 4. The preceding discussion has shown that the CHILDES corpora do not contain all the data that would be of interest in the study of the acquisition of comparison. In particular, more corpora extending to later ages would have been helpful: longer corpora would permit determination of the age of acquisition for MPs and DegQs. In sum, we have shown that not only the general outcome, but also the details of the steps revealed in the acquisition process, support the theory of comparison from B17.

4.2.

An unexpected result: pronominal measure phrases

In this subsection we take a closer look at PMPs. The difference that we found in the acquisition data between PMPs and MPs indicates that semantic theory needs to differentiate between the two kinds of measure phrases. Since PMPs are acquired very early in English and German, at the stage at which the +DSP setting solidifies, we propose that PMPs have the referential type $d$ (as in the analysis proposed above, repeated in (40)). This enables them to combine directly with the gradable predicate via Function Application. The construction should be available as soon as the adjective has the lexical entry of type $\langle d,\langle e,t\rangle\rangle$.

(40) a. $[\![\text{tall}]\!] = \lambda d.\,\lambda x.\,\text{HEIGHT}(x) \geq d$ (type $\langle d,\langle e,t\rangle\rangle$)
b. PMP: $[\![\text{that tall}]\!] = [\![\text{tall}]\!]([\![\text{that}]\!]) = \lambda x.\,\text{HEIGHT}(x) \geq [\![\text{that}]\!]$
c. $[\![\text{that}]\!] = d_c$ (where $d_c$ is the contextually relevant degree) (type $d$)
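The type contrast between PMPs and MPs can be illustrated in a few lines. In the sketch below, `that` is a plain degree fed to the adjective as in (40), while the MP is modelled, as one simple option among several, as a function from degree sets to truth values (type ⟨⟨d,t⟩,t⟩). The MP can only combine after the degree argument has been abstracted over, which is the step that presupposes a +DAP setting. The model, the heights, and the name `one_meter_seventy` are hypothetical.

```python
HEIGHT = {"john": 170, "sue": 158}        # hypothetical model, in cm


def tall(d):                              # (40a): type <d,<e,t>>
    return lambda x: HEIGHT[x] >= d


# PMP, (40b,c): 'that' is just a degree (type d) and is passed to the
# adjective by Function Application; no degree variable is bound.
that = 170                                # hypothetical contextual degree d_c
pmp_john = tall(that)("john")


# MP as a degree quantifier (type <<d,t>,t>): it needs a <d,t> argument,
# which only exists once the degree slot is abstracted over (the lambda below).
def one_meter_seventy(D):
    return D(170)


mp_john = one_meter_seventy(lambda d: tall(d)("john"))
mp_sue = one_meter_seventy(lambda d: tall(d)("sue"))
print(pmp_john, mp_john, mp_sue)
```

Both routes yield the same truth conditions here; what differs is the compositional machinery, and that difference is what the acquisition data bear on.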

The PMP thus differs semantically from MPs, which are of the quantificational type $\langle\langle d,t\rangle,t\rangle$. This difference is important in terms of the parameters proposed by B17 and supported in this paper. In their crosslinguistic survey, B17 considered only MPs. Being degree quantifiers, MPs depend for their availability on the availability of degree quantification, i.e. a +DAP setting. In fact, all constructions indicative of the setting of the DEGPP depended on a +DAP setting (MP, DegQ, SubC). B17 thus investigated the DEGPP as dependent on the DAP. The theoretical relevance of PMPs lies in the fact that they do not require a +DAP setting, but nonetheless overtly fill the degree argument position of a gradable predicate. Our acquisitional findings and their analysis therefore indicate (i) that the availability of PMPs must be determined independently of the availability of MPs, and (ii) that the (in)dependence of the DEGPP on the DAP must be reinvestigated. We first discuss the theoretical picture and then report the results of a crosslinguistic study on PMPs. The discussion to follow presupposes that PMPs are of type $d$. While it is conceivable that this is not universally the case, our acquisition results are incompatible with PMPs of type $\langle\langle d,t\rangle,t\rangle$; therefore, at least in English and German, they are of type $d$. Adding type-$d$ PMPs to the picture could affect the relation between DAP and DEGPP in one of two ways. The first possibility is that B17 were right in the way they formulated the DEGPP, but wrong in assuming that the DEGPP depended on a +DAP setting. The consequence is that we need to ask about the DEGPP setting in both +DAP and -DAP languages. Specifically, we need to ask about the acceptability of PMPs in both types of languages.
We expect (i) that PMPs should pattern with MPs (and DegQ, SubC) in +DAP languages, so that they could either all be okay or all be bad, and (ii) that among the -DAP languages, there may be some that allow PMPs and some that do not, reflecting +/-DEGPP; in the -DAP languages, overt MPs, DegQs and SubCs are out because they would require degree variable binding. The second possibility is that B17 were wrong in the way they formulated the DEGPP, but right in proposing that the DEGPP depended on the +DAP setting. In that case, they should have formulated the DEGPP as follows to bring this out:


(26') Degree Phrase Parameter' (DEGPP'): The degree argument position of a gradable predicate {may/may not} overtly host an operator.

The consequence here is that we need to ask about PMPs in all languages separately from MPs, DegQs and SubCs, because they are not theoretically related. In any given language, be it +DAP or -DAP, it is possible to have a -DEGPP setting but accept PMPs. We expect that in +DSP,+DAP,-DEGPP languages and in +DSP,-DAP,-DEGPP languages, PMPs should be acceptable (as opposed to MPs, DegQs, and SubCs). We have conducted a small survey among B17’s languages to establish the acceptability of PMPs. The results are summarized in Table 8 below.⁷

Table 8. Availability of measure constructions in B17’s languages

Language      Overt    Pronominal
Motu          No.      –⁸
Mandarin      No.      Yes.
Japanese      No.      No.
Mooré         No.      –
Samoan        No.      No.
Yorùbá        No.      No.
Russian       No.      No.
Turkish       No.      No.
Romanian      No.      No.
Spanish       No.      No.
Guaraní       No.      –
English       Yes.     Yes.
German        Yes.     Yes.
Bulgarian     Yes.     Yes.
Hindi-Urdu    Yes.     Yes.
Hungarian     Yes.     Yes.
Thai          Yes.     No.
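Table 8 can be cross-tabulated against the parameter settings reported in this section. The sketch below is our own bookkeeping, not part of the survey: it groups the languages with PMP judgments by B17 setting (Motu is omitted because no PMP data were collected and its setting is not restated here) and lists the languages in which overt and pronominal measure constructions diverge.

```python
from collections import defaultdict

# Table 8: (overt MP, pronominal MP); None = no data (no native speakers).
TABLE_8 = {
    "Motu": (False, None), "Mandarin": (False, True), "Japanese": (False, False),
    "Mooré": (False, None), "Samoan": (False, False), "Yorùbá": (False, False),
    "Russian": (False, False), "Turkish": (False, False),
    "Romanian": (False, False), "Spanish": (False, False),
    "Guaraní": (False, None), "English": (True, True), "German": (True, True),
    "Bulgarian": (True, True), "Hindi-Urdu": (True, True),
    "Hungarian": (True, True), "Thai": (True, False),
}

# B17 settings as described in this chapter (Motu left out).
SETTING = {
    **dict.fromkeys(["Japanese", "Mandarin", "Samoan", "Mooré", "Yorùbá"],
                    "+DSP,-DAP"),
    **dict.fromkeys(["Russian", "Turkish", "Guaraní", "Romanian", "Spanish"],
                    "+DSP,+DAP,-DEGPP"),
    **dict.fromkeys(["English", "German", "Bulgarian", "Hindi-Urdu",
                     "Hungarian", "Thai"], "+DSP,+DAP,+DEGPP"),
}


def pmp_by_setting():
    """Group the languages that have PMP judgments by parameter setting."""
    groups = defaultdict(dict)
    for lang, (_, pmp) in TABLE_8.items():
        if pmp is not None and lang in SETTING:
            groups[SETTING[lang]][lang] = pmp
    return dict(groups)


mismatches = [lang for lang, (mp, pmp) in TABLE_8.items()
              if pmp is not None and mp != pmp]
print(pmp_by_setting())
print(mismatches)
```

The two divergent languages that this tabulation surfaces, Mandarin and Thai, are exactly the ones singled out in the discussion of PMPs.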

7. When concluding that a language does not have this particular construction or does not allow for it, we merely wish to say that the language does not allow a structure parallel to the English degree construction. We do not mean, however, that the language does not have some way of expressing a similar content. We will always provide one alternative structure as well.
8. B17 collected their data on Motu, Mooré, and Guaraní in New York City, Burkina Faso, and Paraguay, respectively. Unfortunately, we had no access to native speakers.


Just as in B17, judgments were elicited following the techniques presented in Matthewson (2004). Informants were presented with five contexts, including the ones in (41) and (42). In (42), informants were additionally presented with a photograph of a young boy demonstrating the size of the fish with a gesture. Informants were then asked to judge the acceptability of sentences in the target language in which a demonstrative (among them the respective translations of English so, this and that) directly combined with a gradable predicate.

(41) Context 1: Mary is 5'8'' tall. John is this tall, too.

(42) Context 2: Last night, John, our neighbor, went for a walk with his father along the river. At dinner, he tells his mother about a fish which he saw. His mother asks him what size the fish was. John shows her and replies: “The fish was that big.”

With Mandarin Chinese, we indeed find a +DSP,-DAP language that allows PMPs but not MPs, cf. (46) and (47). In all other languages in the cluster, PMPs do not seem to be available, cf. (43) to (45). For these languages, one of the ungrammatical examples and the alternative construction are shown below.

(43) Sakana-wa sono *(kurai) ooki.          Japanese
     fish-TOP  that   degree  big
     ‘The fish was that big.’

(44) a. *E    umi   lenei  foi   Ioane.     Samoan
         TAP  tall  this   also  John
     b.  E    umi   faapea    foi   Ioane.
         TAP  tall  likewise  also  John
         ‘John is this tall, too.’

(45) a. *Isaac  naa   ga    bee.            Yorùbá
         Isaac  also  tall  so
     b.  Isaac  naa   ga    to     bee.
         Isaac  also  tall  reach  so
         ‘Isaac is this tall, too.’


In Mandarin Chinese, an MP cannot combine with, e.g., gao ‘tall’, as (46) shows. The PMP construction in (47) with name or zheme, however, is acceptable, though many speakers prefer (48).⁹

(46) ??/*Yuehan shi  yi   mi     qi     gao.     Mandarin Chinese
         John   be   one  meter  seven  tall
         ‘John is 1.70m tall.’

(47) Yuehan ye    shi  name/zheme         gao.
     John   also  be   that-me/this-me    tall
     ‘John is that tall, too.’

The construction in (48) employs you ‘have’ in place of the copula shi ‘be’, and an MP is acceptable with you ‘have’ as well, e.g. (49). We follow Krasikova (2008) in assuming that neither of them can be analyzed as a measure phrase construction: she suggests that we are dealing with a secondary predication structure (Krasikova 2008: 278). See also Xie (2011) for a similar proposal. Our classification of Mandarin hence relies on (46) and (47).

(48) Yuehan ye    you   name/zheme         gao.
     John   also  have  that-me/this-me    tall
     ‘John is this tall, too.’

(49) Yuehan you   yi   mi     qi     gao.
     John   have  one  meter  seven  tall
     ‘John is 1.70m tall.’

All +DSP,+DAP,-DEGPP languages investigated lack PMPs. The Russian example in (50) and the Spanish example in (55) are grammatical, but both can only be used as exclamatives and cannot refer to a degree. Spanish and Romanian seem to employ the degree particle de as a rescue strategy, as has already been observed by Beck et al. (2009: 25) for MPs and is discussed for Romanian in Gergel (2009).

9. The morpheme -me is obligatory in (47) and (48). Its semantic contribution is for further research to explore. If -me were an indicator of internal compositional complexity (for example, a wh-element similar in effect to how), our conclusions about Mandarin and the DEGPP would be called into question, because (47) could not be analysed as a simple PMP. If, on the other hand, -me is semantically relatively harmless (say, a classifier), the interpretation of the data is as presented in the text. Thanks to Lisa Cheng for discussion of this point.

(50) #Рыба была такой большой.   Russian
     fish be.PAST such big
     ‘The fish was so very big!’

(51) Рыба была такого размера.
     fish be.PAST such.GEN size.GEN
     ‘The fish was of such a size.’

(52) *Gör.düğü.m balık bu büyük.tü.   Turkish
     see-PAST.PART-1SG fish this big.PAST
     ‘The fish seen by me was this big.’

(53) Gör.düğü.m balık bu kadar büyük.tü.
     see-PAST.PART-1SG fish this like big.PAST
     ‘The fish seen by me was this big.’

(54) Uite, aşa a fost *(de) mare.   Romanian
     look so be.PAST de big
     ‘Look, it was that big.’

(55) #El pez era tan grande.   Spanish
     the fish be.PAST so big
     ‘The fish was so very big!’

(56) El pez era así *(de) grande.
     the fish be.PAST so de big
     ‘The fish was that big.’

We would expect languages that have MPs at their disposal also to have PMPs, and in general this is the case. It seems possible, however, for a language to lack PMPs despite having MPs, because it does not have a degree pronoun. Thai seems to be an example of such a language. It has a positive setting of all three parameters, and accordingly MPs can be directly combined with degree predicates, as in (57). The structure in (58) is ungrammatical, however, and the alternative structure in (59) is used instead. Thai is the only language we have found that disallows PMPs for reasons orthogonal to its parameter setting.


(57) Maria soong 172cm.   Thai (Beck et al. 2009: 58)
     Mary tall 172cm
     ‘Mary is 1.72m tall.’

(58) *Bplah dtoo-uh yai nee.
     fish body big this
     ‘The fish is this big.’

(59) Bplah dtoo-uh yai tao-nee.
     fish body big equal-this
     ‘The fish is big like this.’

The data collected clearly support the first possibility: a -DEGPP setting affects PMPs along with the other expressions that may fill a gradable predicate’s degree argument slot (MP, DegQ, SubC). The DEGPP is, for present purposes, stated correctly. However, it can no longer be seen as semantically dependent on +DAP. Among the languages investigated, Mandarin Chinese provides evidence for a +DSP,-DAP,+DEGPP parameter setting.

5.

Summary and Conclusions

We have brought together research in child language acquisition and formal semantic theory to show how both can benefit when this connection is made. A theory of systematic crosslinguistic variation in semantics makes interesting predictions about the time course of language acquisition. Conversely, acquisition data may support or falsify claims about parametric variation in semantics. While acquisition data are a rich source of evidence for theories of grammar (be they syntactic or semantic), we have also shown that compositional semantics offers new insights into the acquisition process. The particular aspect of the grammar we have investigated is the grammar of comparison. The wide variation between languages in this area, as well as the existence of a parametric analysis of this variation, made it a promising field of study for acquisition. In all essential respects, the acquisition data we have collected have confirmed the view on variation that B17 take. The type of data (corpora of children’s spontaneous speech instead of semantic fieldwork) has made a difference, bringing to light the importance of some previously neglected constructions like PMPs. The acquisition data also contain new evidence on familiar constructions, differentiating unexpectedly between English and German than-constituents.


Our study highlights directions for future work. Regarding acquisition, the need for longer corpus studies has emerged. Another question is how to gain insight into the acquisition of infrequent constructions. Both follow as desiderata from the sparsity of data that affects some aspects of our study. It would be very interesting to conduct studies parallel to ours on other languages, e.g. with a -DEGPP or a -DAP setting. The investigation of child language acquisition has led to further interesting questions about crosslinguistic semantics and will no doubt continue to do so.

Acknowledgments

We thank Nadine Bade, Rajesh Bhatt, Lisa Cheng, Noah Constant, Remus Gergel, Chenjie Gu, Irene Heim, Stefan Hofstetter, Svetlana Krasikova, Tobias Pfaff, Britta Stolterfoht, Guillaume P. Thomas, John Vanderelst; the audiences at the University of British Columbia in Vancouver, at Linguistic Evidence 2010 in Tübingen, and at the 2011 Leiden Workshop Degrees under Discussion; as well as two anonymous reviewers for providing comments and collecting data. This work would not have been possible without our informants: many thanks to Nan Li, Chenjie Gu, Tingchun Chen, Bitian Zhang, and Zhiguo Xie for Chinese; to Toshiko Oda for Japanese; to Puaina Pfeiffer, Alofa Tjus, Temukisa Grundhöfer, Leutu Jaschke and Tony T. Faleafaga for Samoan; to Bunmi Aina for Yoruba; to Polina Berezovskaya and Svetlana Krasikova for Russian; to the Cebeci Family for Turkish; to Remus Gergel for Romanian; to Álvaro Octavio de Toledo y Huerta for Spanish; to Janina Rádo for Hungarian; to Ventsislav Zhechev for Bulgarian; to Ashutosh Singh for Hindi-Urdu; and to Jitraphan Hajjavanija, Duangkamon Poosaksrikit, and Nutaporn Vititviriyakul for Thai.

References

Beck, Sigrid, Daniel Fleischer, Remus Gergel, Stefan Hofstetter, Sveta Krasikova, Christiane Savelsberg, John Vanderelst & Elisabeth Villalta 2009 Crosslinguistic variation in comparison constructions. Linguistic Variation Yearbook 9: 1–66.
Beck, Sigrid 2011 Comparison constructions. In: Claudia Maienborn, Klaus von Heusinger and Paul Portner (eds.), Semantics: An International Handbook of Natural Language Meaning, 1341–1389. Berlin: De Gruyter.


Beck, Sigrid, Toshiko Oda & Koji Sugisaki 2004 Parametric variation in the semantics of comparison: Japanese vs. English. Journal of East Asian Linguistics 13: 289–344.
Bhatt, Rajesh & Shoichi Takahashi to appear Reduced and unreduced phrasal comparatives. Natural Language and Linguistic Theory.
Bresnan, Joan 1973 Syntax of the comparative clause in English. Linguistic Inquiry 4: 275–343.
Brown, Roger 1973 A First Language: The Early Stages. Cambridge, MA: Harvard University Press.
Büring, Daniel 2007 Cross-polar nomalies. Proceedings of Semantics and Linguistic Theory 17, 37–52.
Chierchia, Gennaro 1998 Reference to kinds across languages. Natural Language Semantics 6: 339–405.
Gergel, Remus 2009 The little de of degree constructions. In: Ronald P. Leow, Héctor Campos and Donna Lardiere (eds.), Little Words: Their History, Phonology, Syntax, Semantics, Pragmatics and Acquisition, 75–86. Washington, D.C.: Georgetown University Press.
Hankamer, Jorge 1973 Why there are two than’s in English. Proceedings of the 9th Annual Meeting of the Chicago Linguistics Society, 179–191.
Heim, Irene 2001 Degree operators and scope. In: Caroline Féry and Wolfgang Sternefeld (eds.), Audiatur Vox Sapientiae. A Festschrift for Arnim von Stechow, 214–239.
Heim, Irene & Angelika Kratzer 1998 Semantics in Generative Grammar. Oxford: Blackwell.
Hofstetter, Stefan 2009 Comparison in Turkish: A rediscovery of the phrasal comparative. Proceedings of Sinn und Bedeutung 13, 187–203.
Hohaus, Vera & Sonja Tiemann 2010 “…this much is how much I’m bigger than Joey…” A corpus study in the acquisition of comparison constructions. Proceedings of AquisiLyon’09, 90–93.
Hohaus, Vera, Sonja Tiemann & Sigrid Beck in prep. Acquisition of comparison. Ms., Eberhard Karls Universität Tübingen.
Kennedy, Christopher 1997 Projecting the adjective: The syntax and semantics of gradability and comparison. Ph.D. diss., University of California, Santa Cruz.


Kennedy, Christopher 2009 Modes of comparison. Proceedings from the Annual Meeting of the Chicago Linguistic Society 43, 141–156.
Klein, Ewan 1991 Comparatives. In: Arnim von Stechow and Dieter Wunderlich (eds.), Semantics: An International Handbook on Contemporary Research, 673–691. Berlin: de Gruyter.
Krasikova, Svetlana 2008 Comparison in Chinese. In: Olivier Bonami and Patricia Cabredo Hofherr (eds.), Empirical Issues in Syntax and Semantics 7, 263–281.
Lechner, Winfried 2004 Ellipsis in Comparatives. (Studies in Generative Grammar 72.) Berlin: De Gruyter.
MacWhinney, Brian 2000 The CHILDES Project: Tools for Analyzing Talk. 3rd ed. Hillsdale: Lawrence Erlbaum.
Matthewson, Lisa 2004 On the methodology of semantic fieldwork. International Journal of American Linguistics 70: 369–415.
Snyder, William 2007 Child Language: The Parametric Approach. Oxford: Oxford University Press.
Snyder, William 2008 Relating language variation to language acquisition. Handout from CSAAL Workshop on Locating Variability, University of Massachusetts, Amherst.
Stassen, Leon 1985 Comparison and Universal Grammar. Oxford: Blackwell.
Stechow, Arnim von 1984 Comparing semantic theories of comparison. Journal of Semantics 1 (3): 1–77.
Stromswold, Karin 1990 Learnability and the acquisition of auxiliaries. Ph.D. diss., Massachusetts Institute of Technology, Cambridge.
Syrett, Kristen 2007 Learning about the structure of scales: Adverbial modification and the acquisition of the semantics of gradable adjectives. Ph.D. diss., Northwestern University, Evanston.
Xie, Zhiguo 2011 Degree possession is a subset relation (as well). Proceedings of Sinn und Bedeutung 15, 661–676.

Restricting quantifier scope in Dutch: Evidence from child language comprehension and production

Petra Hendriks, Ruth Koops van ’t Jagt & John Hoeks

1.

Introduction1

In English, transitive sentences with an indefinite subject and a universally quantified direct object such as (1) are generally ambiguous and can refer to a situation in which there is a specific bear who tickles all the turtles, as well as to a situation in which each turtle is tickled by a different bear. (1)

A bear tickles every turtle.

These two interpretational possibilities are usually distinguished semantically in terms of quantifier scope: in the former case, the indefinite has wide scope over the universal quantifier, and in the latter case, the indefinite has narrow scope with respect to the universal quantifier. Similar sentences in Dutch, however, strongly resist the narrow scope reading for the indefinite subject. Surprisingly, in contrast to Dutch adults, Dutch children seem to behave like English adults and allow the narrow scope reading until quite a late age (Philip 2005).

The difference between Dutch adults and Dutch children can be explained in a number of ways. First, it is conceivable that the dispreference for the narrow scope reading in adult Dutch is caused by some syntactic property of Dutch that children have not yet mastered. If this explanation is correct, we expect the same Dutch children to make corresponding errors in their production. Second, the dispreference for a narrow scope reading for indefinite subjects in Dutch may be the result of a pragmatic restriction on the interpretation of indefinite subjects in Dutch. This explanation allows for the possibility that Dutch children’s production of quantifier scope is perfectly adult-like. A third possibility is that the preference for a wide scope reading is caused by a semantic property of indefinites such as specificity, which could influence adults’ and children’s production of indefinites differently.

The aim of this paper is to shed more light on the lack of quantifier scope ambiguities with indefinite subjects and universally quantified objects in adult Dutch, and their presence in child language, by studying comprehension as well as production of such sentences.

This paper is organised as follows. Section 2 discusses the three explanations for the dispreference for the narrow scope reading in adult Dutch mentioned above in more detail. These explanations differ in the predictions they make about children’s production. Section 3 describes our study, which tests comprehension as well as production of universally quantified sentences in children and adult controls. The results of the comprehension task are presented and discussed in section 4, and the results of the production task are presented and discussed in section 5. As we will show, considering comprehension as well as production of the same linguistic form can help to decide among competing linguistic explanations.

1. Petra Hendriks gratefully acknowledges NWO (grant no. 277-70-005) for financial support. The authors thank Henk Leo Deuzeman, Douwe Schelvis, Jacolien van Rij and Charlotte Koster for their help in carrying out the experiment, and Tom Roeper, the Acquisition Lab Groningen, the audience of Linguistic Evidence 2010 and three anonymous reviewers for valuable comments.

2.

Quantifier scope in Dutch

Why are Dutch indefinites restricted in their scopal possibilities? In section 2.1, we discuss three possible explanations: a syntactic explanation, a pragmatic explanation and a semantic explanation. Section 2.2 shows how these explanations can account for children’s non-adult pattern of interpretation, and section 2.3 discusses their predictions with respect to production.

2.1.

Restricting quantifier scope

A first explanation for the restricted scopal possibilities of indefinites in Dutch is suggested by Beghelli & Stowell (1997) in their article on English every and each. In this article, they propose a syntactic account of quantifier scope, according to which scopal ambiguity arises because various landing sites are available to indefinites for covert movement. Depending on the landing site, the indefinite takes wide or narrow scope with respect to a quantifier or negation. Discussing cross-linguistic evidence for their syntactic account of quantifier scope, Beghelli and Stowell point out that in various Germanic languages, such as Dutch, specific readings of scrambled indefinite objects are necessarily associated with overt leftward movement out of the VP. With respect to the interpretation of scrambled indefinite objects, Dutch children were found to differ from Dutch adults: they do not distinguish between indefinite objects in unscrambled position and indefinite objects that have scrambled out of the VP to a position to the left of a sentential adverbial like negation, and they prefer to assign a non-specific interpretation to indefinite objects in both positions (Krämer 1998; Unsworth 2007). Although Beghelli and Stowell do not discuss indefinite subjects in Germanic languages, it is conceivable that the specific reading of indefinite subjects in Dutch is also associated with a particular scope position to which the indefinite subject moves. Consequently, children’s non-adult comprehension of indefinite objects in scrambled position and of indefinite subjects in sentence-initial position may receive a similar explanation in terms of difficulty with leftward movement or the absence of an appropriate landing site. This syntactic account would thus predict that children’s non-adult pattern in comprehension may be accompanied by word order errors in production. That is, children who assign a non-specific interpretation to scrambled indefinite objects are predicted to have problems with overt leftward movement of indefinite objects. Similarly, children who assign a non-specific interpretation to indefinite subjects in sentence-initial position may experience problems with leftward movement of indefinite subjects. This should then result in errors with verb second in Dutch, which not only requires the finite verb in main clauses to move from sentence-final position to the second position but, in canonical sentences, also requires the subject to move to a position in front of the verb. Children’s errors with verb second typically consist in leaving the verb (usually in the infinitive form) in sentence-final position or in using verb-subject order (Wijnen & Verrips 1998). Thus, if children’s acceptance of the narrow scope reading of indefinite subjects is caused by their inability to move the indefinite subject to an appropriate landing site to the left of the verb, we expect these children to produce ungrammatical sentences in which the verb is left in final position or which have verb-subject order.

An alternative explanation for the dispreference for the narrow scope reading in adult Dutch is Philip’s (2005) pragmatic account. Philip argues that the interpretation of indefinite subjects in Dutch is restricted by a construction-specific and language-specific pragmatic rule. For Dutch transitive sentences with a sentence-initial subject and a universally quantified direct object, this rule requires listeners to select the strongest possible meaning consistent with the context of use, in accordance with the Strongest Meaning Hypothesis (Dalrymple et al. 1994). Because the wide scope interpretation implies the narrow scope interpretation but not vice versa (if a specific bear tickles all the turtles, then it is also true that each turtle is tickled by some bear, but not the other way around), the wide scope interpretation is the stronger of the two. If both interpretations are possible, Philip argues, Dutch adults will choose the wide scope interpretation.
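The entailment asymmetry behind Philip’s appeal to the Strongest Meaning Hypothesis can be checked mechanically on small models. The sketch below is an illustration under stated assumptions, not Philip’s formalization: it uses a hypothetical domain of two bears and two turtles, encodes the two readings as Python predicates over a ‘tickles’ relation, and leaves out the hypothesis’s condition that the chosen meaning be consistent with the context of use.

```python
from itertools import product

# Hypothetical toy domain (an assumption for illustration): two bears, two turtles.
BEARS = ("b1", "b2")
TURTLES = ("t1", "t2")
PAIRS = [(b, t) for b in BEARS for t in TURTLES]


def wide_scope(tickles):
    """'a bear' > 'every turtle': some single bear tickles every turtle."""
    return any(all((b, t) in tickles for t in TURTLES) for b in BEARS)


def narrow_scope(tickles):
    """'every turtle' > 'a bear': each turtle is tickled by some bear or other."""
    return all(any((b, t) in tickles for b in BEARS) for t in TURTLES)


def all_models():
    """Yield every possible tickling relation over the domain (2**4 = 16 models)."""
    for bits in product((False, True), repeat=len(PAIRS)):
        yield frozenset(pair for pair, keep in zip(PAIRS, bits) if keep)


def entails(p, q, models):
    """p entails q if q is true in every model in which p is true."""
    return all(q(m) for m in models if p(m))


def strongest(readings, models):
    """Names of the readings that entail every available reading (SMH, context ignored)."""
    return [name for name, p in readings.items()
            if all(entails(p, q, models) for q in readings.values())]


MODELS = list(all_models())
READINGS = {"wide": wide_scope, "narrow": narrow_scope}

print(entails(wide_scope, narrow_scope, MODELS))   # True
print(entails(narrow_scope, wide_scope, MODELS))   # False
print(strongest(READINGS, MODELS))                 # ['wide']
```

A model witnessing the failure of the converse is {(b1, t1), (b2, t2)}: each turtle has its own bear, so the narrow scope reading is true while no single bear tickles both turtles.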


The pragmatic nature of the rule would explain why Dutch children do not command the rule from an early age: as the rule is not innately specified, it must be learned from the language input, which is expected to take time. Its pragmatic nature would also explain why there is considerable variation across adult native speakers of Dutch. According to Philip, for many speakers of Dutch “it is always in principle possible for a high indefinite subject to be nonspecific as long as the verb is transitive” (2005: 274). Because Philip’s rule is an ‘interpretive rule’ that selects a particular interpretation under specific structural conditions, it only applies in interpretation. Therefore, if no additional assumptions are made, Philip’s construction-specific interpretive rule predicts a delay in Dutch children’s comprehension of indefinite subjects but adult-like production of these forms.

A third explanation for the lack of the narrow scope reading in adult Dutch is that indefinite subjects in canonical position in Dutch must receive a specific interpretation for independent reasons. Because a specific interpretation is incompatible with a narrow scope reading of the indefinite, the narrow scope reading is blocked. De Hoop & Krämer (2005/6) offer such a semantic explanation in terms of Optimality Theory (OT; Prince & Smolensky 1993/2004). According to their OT account of indefinites in Dutch, indefinite subjects in canonical sentence-initial position must receive a specific (or, in their terminology, referential) interpretation, and indefinite objects in canonical non-scrambled position must receive a non-specific interpretation, as a result of the interaction between two constraints of the grammar:

Constraints on specificity (adapted from de Hoop & Krämer 2005/6):
M1: Subjects get a specific interpretation; objects get a non-specific interpretation.
M2: Indefinite noun phrases get a non-specific interpretation.

These two constraints are in conflict when interpreting an indefinite subject: satisfying M1 requires that the indefinite subject receive a specific interpretation, whereas satisfying M2 requires that it receive a non-specific interpretation. However, if M1 is stronger than M2 (as de Hoop & Krämer argue to be the case for Dutch adults), satisfaction of M1 is more important than satisfaction of M2, and hence indefinite subjects of Dutch transitive sentences receive a specific interpretation. For indefinite objects, the two constraints are not in conflict and both promote a non-specific interpretation.


2.2.


Acquiring quantifier scope

As mentioned above, the syntactic explanation suggests that Dutch children interpret indefinite subjects non-specifically because they fail to move indefinite subjects into the relevant scope position. According to the pragmatic explanation, Dutch children do not yet know the language-specific pragmatic rule and hence allow an indefinite subject to receive a weak, non-specific interpretation. So how does the semantic explanation account for Dutch children’s interpretation of indefinite subjects in universally quantified sentences? De Hoop & Krämer (2005/6) show how the constraints M1 and M2 account for children’s non-adult-like interpretations of indefinites in special, marked positions, such as the sentence-internal position of the subject in existential there-constructions and the scrambled position of the object to the left of a sentential adverbial. Because children have a general preference for subjects to be specific and objects to be non-specific, which is reflected by M1, they incorrectly interpret indefinite subjects in existential constructions as specific, too. Similarly, they incorrectly interpret indefinite objects in scrambled position as non-specific. Adults distinguish between subjects and objects in canonical position and in marked position, de Hoop & Krämer argue, because they take into account the perspective of the speaker. For reasons of economy, speakers prefer subjects and objects in their canonical position over subjects or objects in marked position. If listeners take the speaker’s choices into account and reason that the speaker would have used an unmarked form to express an unmarked interpretation, then marked forms such as indefinite subjects in existential constructions and indefinite objects in scrambled position receive a marked interpretation. That is, indefinite subjects of there-sentences receive a non-specific interpretation, and scrambled indefinite objects receive a specific interpretation.
According to de Hoop & Krämer, children up to the age of 7 or perhaps even later still lack the ability to take the speaker’s perspective into account when interpreting these marked forms. Hence, they assign an unmarked interpretation to unmarked as well as marked forms. If children fail to take the speaker’s perspective into account but already know that M1 must be given more weight than M2, as de Hoop & Krämer claim, they are predicted to interpret indefinite subjects in their canonical sentence-initial position as specific, just like adults do. To support this claim, de Hoop & Krämer refer to an empirical study by Bergsma-Klein (1996), who concludes that 4- to 8-year-old Dutch children correctly assign a specific reading to indefinite subjects of intransitive sentences. However, this is not what Philip (2005) found in his study with 142 Dutch-speaking 6- to 12-year-olds. Philip’s 6-year-olds assigned a non-specific interpretation to the indefinite subject een vogel ‘a bird’ in the transitive test sentence Een vogel heeft elke bosbes opgegeten (‘a bird has eaten each blueberry’) in 33% of the cases. Surprisingly, the percentage of non-specific interpretations by children did not decrease with age and was the same in the 12-year-olds tested. Children’s relative lack of specific interpretations remains unexplained if children know that M1 is stronger than M2. In Optimality Theory, however, children’s non-adult pattern can straightforwardly be explained by their lack of knowledge of the relative weights of the two constraints. In OT, a grammar consists of a set of universal constraints and their ranking. Hence, language variation as well as language acquisition are captured in terms of different rankings of these constraints (Prince & Smolensky 1993/2004, 1997; Tesar & Smolensky 1998). A different ranking of the same constraints could explain the difference between Dutch and English with respect to the scope of indefinite subjects. In addition, it could provide an explanation for the acquisition of indefinite subjects in Dutch. Children may start with a non-adult ranking of the relevant constraints and learn the adult ranking by reranking the constraints on the basis of the language input they receive. The inverse ranking of M1 and M2, with M2 ranked above M1, gives rise to the interpretational pattern shown by Dutch children as well as English adults. That is, if M2, which says that indefinite noun phrases get a non-specific interpretation, is the stronger of the two constraints, the OT grammar predicts that all indefinites, even indefinite subjects, can receive a non-specific interpretation. As a result, for Dutch children as well as English adults, indefinite subjects can be non-specific and hence allow a narrow scope interpretation.
So the difference between Dutch adults, on the one hand, and Dutch children and English adults, on the other, could be explained by a different ranking of M1 and M2 in the grammar: Dutch adults rank M1 above M2, and Dutch children and English adults rank M2 above M1.

2.3.

Producing sentences with indefinites

In OT, comprehension and production are assumed to be mediated by the same grammar, with the same constraints under the same ranking. The effects of these constraints may differ, however, because the nature of the input and output of optimization differs: in comprehension, the output is an interpretation, whereas in production, the output is a form. Particular constraints in OT pertain to the output only and hence apply either to interpretations or to forms. Using this property of OT constraints, Hendriks and Spenader (2005/6), for example, explain the remarkable phenomenon that English and Dutch children make errors in their interpretation of object pronouns until the age of 7 (the so-called Delay of Principle B Effect) but show adult-like performance in production from an early age on.

To determine the predictions of an OT account of quantifier scope, we have to consider the effects of M1 and M2 as applied to potential output forms for a particular input meaning. Under either ranking of the constraints M1 and M2, a grammar consisting of only these two constraints predicts that if the input is a specific referent, the best form to express that meaning is a definite: a definite satisfies M2, whereas an indefinite would violate this constraint. This is illustrated by the following OT tableau:

Input: +Spec | M1: *S/-Spec; *O/+Spec | M2: *Indef/+Spec
Indef        |                        | *!
☞ Def        |                        |

Figure 1. OT tableau of adult speakers’ choice of form for a specific referent.

In an OT tableau, the constraints are given in the first row, ordered according to strength from left to right. Here, the notation *Indef/+Spec (M2) must be read as: avoid indefinites that are specific. In the adult grammar, M1 is stronger than M2. The input meaning is given in the top left-hand cell. The relevant candidates for expressing the input meaning are listed in the first column below this input. The optimal form is the form that satisfies the constraints of the grammar best because it incurs the least severe constraint violations. Constraint violations incurred by a candidate output are shown in the row of that candidate and are indicated by an asterisk in the corresponding cell. Because the violation of M2 makes the indefinite less preferred than the definite, this violation is fatal (marked by *!), and hence a definite is the optimal form (indicated by ☞). Note that the violation of M2 by the indefinite is not fatal in an absolute sense but only in comparison to the constraint violations incurred by the definite: if another, stronger constraint were relevant that would be violated by a definite but not by an indefinite, this violation would be fatal for the definite, and consequently the indefinite would be the optimal form, despite its violation of M2. With respect to the production of non-specific referents, the constraints are indecisive: M1 does not distinguish between indefinite and definite subjects and objects, and indefinites and definites both satisfy M2:

Input: -Spec | M1: *S/-Spec; *O/+Spec | M2: *Indef/+Spec
☞ Indef      |                        |
☞ Def        |                        |

Figure 2. OT tableau of adult speakers’ choice of form for a non-specific referent.


Because the constraint profiles of indefinites and definites are identical for a non-specific input meaning, speakers may select a definite or an indefinite at random, or perhaps base their choice on other factors that we have not considered here, such as givenness. Furthermore, as there is no situation in production where the two constraints are in conflict, both constraint rankings yield the same output. So if the same two constraints M1 and M2 that explain the dispreference for the narrow scope reading in adult Dutch are applied to production, it is predicted that adults and children will show the same pattern. Both groups are predicted to avoid producing the Dutch equivalent of the sentence A bear tickles every turtle when reference is intended to a specific bear; instead of an indefinite subject, they will use a definite subject. If, on the other hand, reference is intended to a non-specific bear, they are predicted to use either an indefinite or a definite subject. Similar predictions are made for indefinite objects.

Summarizing, the three possible explanations for the restricted scopal possibilities of Dutch discussed in this section make different predictions about children’s production of sentences with indefinite subjects. For the semantic explanation, these predictions are independent of the occurrence of a universal quantifier and hold for indefinite subjects in general. The syntactic explanation predicts that children who show a non-adult pattern in comprehension also make word order errors in production. The pragmatic explanation predicts that these children’s production will be completely adult-like; however, it makes no predictions about the exact pattern of production. The semantic explanation, on the other hand, does make predictions about the exact pattern of production for specific and non-specific referents. First of all, it predicts that children as well as adults will generally avoid using indefinites to refer to specific referents. Furthermore, because children and adults employ a different constraint ranking, it allows for the possibility that in particular situations (governed by additional constraints, which may interact with M2) children produce non-adult-like forms for specific referents.
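The bidirectional logic of sections 2.1 to 2.3 can be sketched as a tiny OT evaluator. This is a minimal sketch, not de Hoop & Krämer’s own formalization: the `Form` type, the 0/1 violation functions for M1 and M2, and the helper names `optimal`, `interpret` and `produce` are hypothetical choices for illustration. Strict domination is modelled by comparing violation tuples lexicographically, highest-ranked constraint first.

```python
from typing import Callable, List, NamedTuple, Sequence, TypeVar

C = TypeVar("C")


class Form(NamedTuple):
    role: str        # "subject" or "object"
    definite: bool   # False = indefinite NP, True = definite NP


# Hypothetical encoding: a constraint maps (form, is_specific) to a violation count.
Constraint = Callable[[Form, bool], int]


def m1(form: Form, specific: bool) -> int:
    """M1 (*S/-Spec; *O/+Spec): subjects specific, objects non-specific."""
    if form.role == "subject":
        return 0 if specific else 1
    return 1 if specific else 0


def m2(form: Form, specific: bool) -> int:
    """M2 (*Indef/+Spec): indefinite NPs get a non-specific reading."""
    return 1 if (not form.definite and specific) else 0


def optimal(candidates: Sequence[C], ranking: Sequence[Constraint],
            violations: Callable[[C, Constraint], int]) -> List[C]:
    """Keep every candidate whose violation profile is lexicographically minimal.

    Tuples are ordered by rank, so one violation of a higher-ranked constraint
    outweighs any number of violations of lower-ranked ones (strict domination).
    """
    profiles = [tuple(violations(c, con) for con in ranking) for c in candidates]
    best = min(profiles)
    return [c for c, p in zip(candidates, profiles) if p == best]


def interpret(form: Form, ranking: Sequence[Constraint]) -> List[bool]:
    """Comprehension: the form is fixed; readings compete (True = specific)."""
    return optimal([True, False], ranking, lambda spec, con: con(form, spec))


def produce(role: str, specific: bool, ranking: Sequence[Constraint]) -> List[Form]:
    """Production: the meaning is fixed; indefinite and definite forms compete."""
    forms = [Form(role, definite=False), Form(role, definite=True)]
    return optimal(forms, ranking, lambda f, con: con(f, specific))


ADULT = [m1, m2]   # Dutch adults: M1 >> M2
CHILD = [m2, m1]   # Dutch children (as hypothesized) and English adults: M2 >> M1

indef_subject = Form("subject", definite=False)
print(interpret(indef_subject, ADULT))   # [True]: specific reading only
print(interpret(indef_subject, CHILD))   # [False]: non-specific reading

for name, ranking in (("M1>>M2", ADULT), ("M2>>M1", CHILD)):
    plus_spec = [f.definite for f in produce("subject", True, ranking)]
    minus_spec = [f.definite for f in produce("subject", False, ranking)]
    print(name, plus_spec, minus_spec)   # [True] and [False, True] under both rankings
```

Run under both rankings, the sketch reproduces the asymmetry argued for in the text: the rankings diverge in the comprehension of an indefinite subject, but in production both select a definite for a specific referent and leave indefinite and definite tied for a non-specific one.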

3.

Experiment

To test these predictions and to determine the role of specificity in the comprehension and production of quantifier scope, a group of children and a group of adult participants were tested on both a comprehension task (picture verification) and a production task (elicited production).


3.1.


Participants

Participants in the study were 17 Dutch-speaking children (age 4;6–6;8, mean age 5;11) and 20 Dutch-speaking adults (age 18;0–46;6, mean age 23;7). In total, 31 children were tested. However, 14 of these children made more than one error on the 6 control items (see Section 3.2 on Materials and design) or did not complete the comprehension task, and were therefore excluded from the analysis.

3.2.

Materials and design

Two types of sentences were used in the comprehension task: sentences with an indefinite in subject position and a universal quantifier in object position (IS-QO), and sentences with a universal quantifier in subject position and an indefinite in object position (QS-IO):

(2) Een beer kietelt elke schildpad. (IS-QO surface order)
    ‘a bear tickles every turtle’

(3) Elke beer kietelt een schildpad. (QS-IO surface order)
    ‘every bear tickles a turtle’

These two sentence types were combined with picture sequences consisting of three pictures each, as shown in Figure 3. Three types of picture sequences were used: with 1 actor and 3 undergoers (sequence 1-3), with 3 actors and 3 undergoers (sequence 3-3), and with 3 actors and 1 undergoer (sequence 3-1). Sentences with an indefinite subject (IS-QO sentences) were presented with picture sequences showing one (1-3) or three (3-3) actors, corresponding to a specific or non-specific interpretation of the indefinite subject, respectively. Similarly, sentences with an indefinite object (QS-IO sentences) were presented with picture sequences showing one (3-1) or three (3-3) undergoers, corresponding to a specific or non-specific interpretation of the indefinite object, respectively. This yields four conditions in comprehension. The comprehension task consisted of 16 transitive test items (4 per condition), preceded by 2 practice items. In addition, the comprehension task included 6 intransitive control items (3 target ‘yes’ and 3 target ‘no’ items, also with sequences of three pictures) with indefinite or universally quantified subjects, such as Elke schildpad slaapt ‘every turtle is asleep’. After testing, we excluded from the analysis participants who made more than one error on these control items, since such participants did not yet comprehend indefinites or quantifiers correctly, did not properly understand the task, or displayed a yes bias in their responses. Two lists of the test and control items were created, each with a different item order.

Petra Hendriks, Ruth Koops van ’t Jagt & John Hoeks
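The comprehension design can be summarised as a small data structure. The Python sketch below builds the four comprehension conditions with 4 items each plus the 6 controls, and produces two presentation lists containing the same items in different orders. The seeded shuffle, the item labels, and the function `build_list` are hypothetical, since the chapter states only that the two lists differed in item order; practice items are left out.

```python
import random
from collections import Counter
from typing import List, Tuple

# The four comprehension conditions: (sentence type, picture sequence).
CONDITIONS = [
    ("IS-QO", "1-3"),  # indefinite subject, one actor: specific reading
    ("IS-QO", "3-3"),  # indefinite subject, three actors: non-specific reading
    ("QS-IO", "3-3"),  # indefinite object, three undergoers: non-specific reading
    ("QS-IO", "3-1"),  # indefinite object, one undergoer: specific reading
]
ITEMS_PER_CONDITION = 4
CONTROLS = [("control", "yes")] * 3 + [("control", "no")] * 3  # intransitive control items

Item = Tuple[str, str, int]


def build_list(seed: int) -> List[Item]:
    """One presentation list: 16 test items plus 6 controls in a seeded random order.

    The seeded shuffle is an assumption; the chapter only says the two lists differ in order.
    """
    items = [(s, p, i) for s, p in CONDITIONS for i in range(ITEMS_PER_CONDITION)]
    items += [(kind, target, i) for i, (kind, target) in enumerate(CONTROLS)]
    random.Random(seed).shuffle(items)
    return items


list_a, list_b = build_list(seed=1), build_list(seed=2)
print(len(list_a), Counter((s, p) for s, p, _ in list_a if s != "control"))
```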

Figure 3. In the comprehension task, two sentence types were combined with two types of picture sequences each. In the production task, the same three types of picture sequences were used as in the comprehension task.

Based on the findings of Philip (2005), we expect children to accept the sentences in all four conditions. Although Philip did not consider the interpretation of indefinite objects, the literature on the acquisition of indefinite objects in Dutch leads us to expect their interpretation to be unproblematic for Dutch children, as indefinite objects in canonical position are preferably interpreted as non-specific by adults and children alike (see de Hoop & Krämer 2005/6 for discussion). In contrast to children, we expect adults to reject IS-QO sentences such as (2) for picture sequence 3-3 in the majority of cases, as for these sentences a specific interpretation of the indefinite subject is preferred. In the other three conditions, adults are expected to accept the sentences as correct descriptions of the picture sequences.

In the production task, the same three types of picture sequences were used as in the comprehension task, but with new pictures. The production task consisted of 12 test items (4 per picture sequence type), preceded by 2 practice items. Again, two lists of the test items were created. Depending on the adopted explanation for children’s non-adult pattern in comprehension, children who assign a non-specific interpretation to indefinite subjects are expected (i) to make word order errors in production (syntactic explanation), (ii) to show a completely adult-like pattern in production (pragmatic explanation), or (iii) to show a largely adult-like pattern in production but prefer definites over indefinites when referring to a specific referent (semantic explanation).


3.3.


Procedure

Participants were tested individually by two experimenters. All participants completed a comprehension task and a production task, in that order. In the comprehension task, the three pictures of each picture sequence were presented one at a time on a laptop screen. All sentences were pre-recorded, and care was taken to keep their intonation as neutral as possible. The pre-recorded sentence was played while the third picture was visible on the screen. Participants were asked to press the ‘yes’ button if the sentence matched the picture sequence, or the ‘no’ button in case of a mismatch. In the production task, the same three types of picture sequences were presented as in the comprehension task, again one picture at a time. Participants were asked to give a one-sentence description of the third picture. No specific instructions were given regarding the form of the sentence. For the children only, both tasks involved a puppet, to encourage them to respond: the puppet was said to have tampered with the sentences in the comprehension task and to be unable to see the pictures in the production task.

4.

Comprehension

4.1.

Results

If indefinite subjects are assigned a specific interpretation, which corresponds to a wide scope reading for the indefinite, participants are expected to reject sentences with an indefinite subject (IS-QO sentences) as an adequate description of a situation with multiple actors (3-3 picture sequences). In contrast, the same sentences should be accepted for a situation with a unique actor (1-3 picture sequences). If indefinite objects do not require a specific interpretation, sentences with an indefinite object (QS-IO sentences) should be accepted as a description of 3-3 as well as 3-1 picture sequences.

The results of the comprehension task are shown in Figure 4. Proportions correct per participant were arcsine transformed and subjected to a Repeated Measures ANOVA with Group (children vs. adults) as a between-participants factor and Condition (with levels “IS-QO 1-3”, “IS-QO 3-3”, “QS-IO 3-3” and “QS-IO 3-1”) as a within-participants factor. To guard against possible violations of the sphericity assumption, the Huynh-Feldt correction was applied whenever factors with more than two levels were involved (Stevens 1992). We report the actual degrees of freedom used in the statistical test, rounded to the nearest integer. The results showed main effects of both Group (F(1,35) = 4.1; p = 0.05) and Condition (F(3,101) = 28.9; p < 0.001). These main effects were qualified by a significant interaction between Group and Condition (F(3,101) = 36.8; p < 0.001). Follow-up analyses by means of independent-groups t-tests for each condition showed a significant difference in ‘yes’ responses between children and adults in the IS-QO 3-3 condition (t(35) = 7.9; p < 0.001), where adults gave substantially fewer ‘yes’ responses than children (0.21 (SE = 0.06) vs. 0.89 (SE = 0.06)), and a marginally significant difference in the QS-IO 3-1 condition (t(26) = −1.8; p = 0.081),² where adults tended to give more ‘yes’ responses than children (0.95 (SE = 0.04) vs. 0.84 (SE = 0.05)). No significant differences emerged in the other two conditions (all p-values > 0.40).

Figure 4. Proportion of ‘yes’ responses by adults and children in the comprehension task (based on means by participants). IS-QO = sentence with indefinite subject and quantified object; QS-IO = sentence with quantified subject and indefinite object; 1-3 = picture with one actor and three undergoers; 3-3 = picture with three actors and three undergoers; 3-1 = picture with three actors and one undergoer. Error bars represent one SE. [Bar chart; y-axis: proportion ‘yes’ responses, 0 to 1.00; x-axis: condition.]

4.2.

Discussion

The results of the comprehension task reveal that 4- to 6-year-old Dutch children allow indefinite subjects to be interpreted non-specifically. They accepted sentences with an indefinite subject for situations with a non-specific referent in 89% of the cases, whereas adults accepted such sentences in only 21% of the cases. The adult responses indicate that adults have a strong preference to interpret an indefinite subject in sentence-initial position as referring to a specific individual.³ By contrast, children as well as adults allow a non-specific interpretation for indefinite objects.

Our comprehension results for adults (21% acceptance) are comparable to Philip’s (2005) results (16% acceptance). However, the children in our study (mean age 5;11) allowed a non-specific reading for indefinite subjects much more often, namely in 89% of the cases, than the youngest age group in Philip’s study (mean age 6;5), who allowed a non-specific reading in only 33% of the cases. This difference might be due to the fact that we tested slightly younger children than Philip did, which would suggest a steep learning curve around the age of 6. Alternatively, the quantitative difference between the children’s responses in the two studies may have been caused by differences between the experimental tasks. Philip used a truth-value judgment task with an elaborate story to introduce the single test sentence Een vogel heeft elke bosbes opgegeten (‘a bird has eaten each blueberry’). At the beginning of this story, three birds were introduced: a fat bird, a thin bird and a small bird. Throughout the story, these birds were referred to with definite noun phrases. If children know that indefinites generally introduce new referents into the discourse and do not refer to referents that are already given, they may have rejected the test sentence for the wrong reason. However, it is doubtful whether children are actually able to use givenness information appropriately in an experimental setting.⁴ We will return to the issue of givenness below, as it may also be relevant for our interpretation of the production results. Suffice it to say that in our comprehension task, we did not provide any introductory linguistic discourse but presented the pre-recorded test sentence out of the blue as a description of the picture sequence on the computer screen. Consequently, the indefinite subjects and indefinite objects in our test sentences are felicitous as far as their givenness status is concerned.

2. Corrected degrees of freedom were used because the equal variance assumption was violated.

3. One of the reviewers wonders whether the low proportion of ‘yes’ responses for adults in the IS-QO 3-3 condition could have been due to the adult participants’ expectation of receiving roughly equal numbers of ‘yes’ and ‘no’ items. However, if this were true, we would have seen comparable yes/no patterns across conditions. The fact that we did not indicates that the adults made a linguistically based distinction between the four conditions.

4. In spontaneous speech, children as young as 3 seem to use referring forms in line with their givenness status (Gundel, Ntelitheos & Kowalsky 2007). In experimental tasks, however, children may be less sensitive to the givenness status of the referents, perhaps because givenness competes with other factors such as specificity. For example, van Hout, Harrigan & de Villiers (2010) found that English preschoolers use definite noun phrases to refer to new discourse entities and choose a given referent in more than half of the cases when hearing an indefinite noun phrase.
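Footnote 2 reports corrected degrees of freedom for the QS-IO 3-1 comparison: t(26) rather than the nominal 17 + 20 − 2 = 35. A common way to obtain such a correction is Welch's t-test with the Welch-Satterthwaite approximation; the chapter does not name the procedure, so this is an assumption. The minimal Python sketch below uses only the standard library, and the function name `welch_t` is illustrative.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent groups."""
    se2_a = variance(a) / len(a)  # squared standard error of the mean, group a (sample variance)
    se2_b = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# With equal variances and equal group sizes, df reduces to n1 + n2 - 2.
t, df = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(t, 3), round(df, 3))  # -1.225 4.0
```

When the two groups' variances differ, the computed df falls below n1 + n2 − 2, which is how a value such as 26 can arise with 37 participants.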

5.

Production

5.1.

Results

In the production task, participants were asked to give a one-sentence description of the third picture of the sequence, without being given instructions about the specific form of the sentence. As a consequence, many different forms were produced. We were interested in the participants’ production of transitive IS-QO and QS-IO sentences in the active voice, which were the test items in the comprehension task, as well as in their production of other forms expressing a specific or non-specific meaning of the subject or object. These latter forms include sentences with definite subjects and objects. We coded as indefinite noun phrases (IS and IO) all singular forms with the indefinite article een, plural forms without an article, and singular and plural forms with bare numerals (één ‘one’, drie ‘three’). We coded as universally quantified expressions (QS and QO) all expressions with one of the quantifiers elke ‘every’, alle ‘all’ or iedere ‘each’ (iedere was used only a few times by children and adults, whereas elke and alle occurred frequently). Definite noun phrases (see below) included all noun phrases with a definite article (de or het) or a universal quantifier (elke, alle or iedere). The participants’ production of IS-QO and QS-IO sentences is shown in Figure 5.⁵

5. We counted only target utterances in the active voice, although some adult participants also produced passives. In coding the utterances, we ignored disambiguating expressions such as verschillende ‘different’, om de beurt ‘in turn’, allemaal ‘all’ and samen ‘together’, which were produced several times by children (in 4/204 utterances) and adults (in 13/240 utterances).
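The coding scheme described above, together with the rule in this footnote, can be expressed as a small classifier. This is a hypothetical Python helper, not the authors' coding procedure: it assumes each noun phrase has already been reduced to its determiner (None for article-less forms) plus a plural flag, and it mirrors the rule that universally quantified noun phrases are grouped with definites in the indefinite/definite/other analysis of subjects.

```python
from typing import Optional

# Determiner classes as listed in the coding scheme. Numerals other than één and drie
# would have to be added if they occurred (an assumption: only these two are named).
INDEFINITE = {"een", "één", "drie"}
UNIVERSAL = {"elke", "alle", "iedere"}
DEFINITE_ARTICLES = {"de", "het"}


def code_np(determiner: Optional[str], plural: bool) -> str:
    """Hypothetical helper: classify a noun phrase by its determiner.

    `determiner` is None for article-less noun phrases. Disambiguating words such as
    'samen' or 'om de beurt' are simply never passed in, i.e. they are ignored.
    """
    det = determiner.lower() if determiner else None
    if det in UNIVERSAL:
        return "universal"
    if det in DEFINITE_ARTICLES:
        return "definite"
    if det in INDEFINITE or (det is None and plural):
        return "indefinite"
    return "other"


def subject_type(determiner: Optional[str], plural: bool) -> str:
    """Subject category for the indef/def/other analysis; universals count as definite."""
    category = code_np(determiner, plural)
    if category in ("definite", "universal"):
        return "def"
    return "indef" if category == "indefinite" else "other"


def sentence_type(subject: str, obj: str) -> Optional[str]:
    """Label an active transitive utterance as IS-QO or QS-IO from its two NP codes."""
    pairs = {("indefinite", "universal"): "IS-QO", ("universal", "indefinite"): "QS-IO"}
    return pairs.get((subject, obj))


# 'Een beer kietelt elke schildpad'
print(sentence_type(code_np("Een", False), code_np("elke", False)))  # IS-QO
```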

Figure 5. Production of IS-QO and QS-IO utterances in the active voice for the 1-3, 3-3, and 3-1 picture sequences by adults and children. [Bar chart titled ‘Production’; y-axis: proportion of total responses, 0 to .70; x-axis: IS-QO and QS-IO for each picture sequence.]

Proportions of IS-QO and QS-IO responses were calculated per participant and then arcsine transformed. Neither children nor adults produced any IS-QO utterances for the 3-3 and 3-1 pictures. Therefore, instead of an ANOVA, we ran an independent-groups t-test on the proportions of IS-QO responses for the 1-3 pictures, which revealed a significant difference between children and adults (t(30) = −2.7; p < 0.01). The transformed proportions of QS-IO responses were then entered into a Repeated Measures ANOVA with Group (children vs. adults) as a between-participants factor and Picture (1-3, 3-3, or 3-1) as a within-participants factor. The Huynh-Feldt correction was applied where appropriate; actual degrees of freedom are reported, rounded to the nearest integer. There were no significant differences between the groups as far as the QS-IO responses were concerned: there was no main effect of Group (F < 1), nor a significant interaction between Group and Picture (p > 0.24). The main effect of Picture was significant (F(2,59) = 35.0; p < 0.001): most QS-IO responses were made to 3-3 pictures (mean = 0.47; SE = 0.07), followed by 3-1 pictures (mean = 0.24; SE = 0.05), and fewest to 1-3 pictures (mean = 0.04; SE = 0.02); all pairwise differences between pictures were significant (p < 0.001).
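The arcsine transformation applied before both the t-test and the ANOVA stabilises the variance of proportions, which is otherwise compressed near 0 and 1. The chapter does not specify which variant was used; the Python sketch below assumes the common arcsin(√p) form in radians (some studies use 2·arcsin(√p) instead). The helper names and the counts are hypothetical. Each participant saw 4 production items per picture sequence type, so each proportion is a count out of 4.

```python
import math
from statistics import mean
from typing import List, Sequence


def arcsine_transform(p: float) -> float:
    """Arcsine-square-root transform of a proportion, in radians (maps [0, 1] onto [0, pi/2])."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie in [0, 1]")
    return math.asin(math.sqrt(p))


def transformed_proportions(counts: Sequence[int], n_items: int = 4) -> List[float]:
    """Per-participant proportions (count / n_items), each arcsine transformed."""
    return [arcsine_transform(k / n_items) for k in counts]


# Hypothetical counts of IS-QO descriptions for the four 1-3 items, one value per participant.
group = transformed_proportions([0, 1, 2, 0, 4])
print(round(mean(group), 4))
```

The transformed values, one per participant, are what would then be compared across groups.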

Figure 6. Production of indefinite versus definite subjects of transitive sentences by adults and children. The category ‘other’ includes unscorable utterances as well as subjects of intransitive sentences and passive sentences. [Bar chart titled ‘Production’; y-axis: proportion of total number of responses, 0 to 0.80; x-axis: indef, def, other.]

We were also interested in finding out which form participants would choose for the grammatical subjects of the sentences they produced: an indefinite or a definite noun phrase. The category of indefinite noun phrases includes noun phrases with an indefinite article (children: 57% of indefinite subjects; adults: 52% of indefinite subjects) as well as noun phrases with a bare numeral (children: 33%; adults: 48%) and plural forms without any article (children: 10%; adults: 0%). We calculated proportions of indefinite subjects, definite subjects and other productions for each participant. These proportions were arcsine transformed and entered into a Repeated Measures ANOVA with Type of Response (indefinite, definite, other) and Picture (1-3, 3-3, 3-1) as within-participants factors, and Group (children vs. adults) as a between-participants factor. Figure 6 shows the mean proportions by participants averaged over pictures. The ANOVA showed an interaction between Group and Type of Response (F(1,46) = 3.7; p = 0.050), which was due to children producing fewer indefinite subjects (p < 0.05) but more definite subjects (p = 0.083) than adults; the difference in ‘other’ responses was not significant (p > 0.40). No other effects involving Group were significant (p > 0.19).

5.2.

Discussion

In the production task, we aimed to elicit utterances similar to those tested in comprehension: utterances with an indefinite subject and a universally quantified object (IS-QO), and utterances with a universally quantified subject and an indefinite object (QS-IO). Our production results show that adults as well as children produced such utterances on the basis of the picture sequences. They produced IS-QO utterances for unique actors (1-3 picture sequences), and QS-IO utterances for non-unique undergoers (3-3 picture sequences) and unique undergoers (3-1 picture sequences). Importantly, they did not produce IS-QO utterances for non-unique actors (3-3 picture sequences). This is not particularly surprising for adults, as adults did not allow this form-meaning combination in comprehension either. So the adult pattern in production corresponds to their pattern in comprehension: adults do not allow indefinite subjects to be interpreted non-specifically, and correspondingly do not produce indefinite subjects with a non-specific meaning. This suggests that specificity is not merely an interpretive phenomenon, as would follow from Philip’s pragmatic treatment of quantifier scope in Dutch. Recall that, according to Philip’s account, specificity results from an interpretive rule that requires listeners to select the strongest possible meaning (see Section 2.1). Consequently, that account gives no reason to expect corresponding specificity effects in production.

Surprisingly, just like the adults, the children in our study did not produce IS-QO utterances for non-unique actors (3-3 picture sequences) either. This contrasts with the comprehension task, where children allowed indefinite subjects to be interpreted non-specifically. So children’s pattern with non-specific indefinite subjects in comprehension differs from their pattern in production. In general, children hardly produced any IS-QO utterances. However, their avoidance of these constructions does not seem to indicate a lack of syntactic knowledge, as would be expected under a syntactic explanation of the quantifier scope restrictions in Dutch.
The children did produce indefinite subjects in other transitive constructions, as can be seen from Figure 6. Also, they did not make any word order errors, such as leaving the verb in final position or producing verb-subject order. In general, almost all utterances they produced were grammatical transitive sentences with SVO word order. Children were not completely adult-like in their productions, however. They differed from adults in their production of specific indefinite subjects, producing significantly fewer IS-QO utterances than adults in situations with a unique actor (1-3 picture sequences). They even produced a few QS-IO utterances in these situations, thus violating the anti-uniqueness presupposition of universal quantifiers, according to which the domain of quantification must contain at least two elements. As children still make comprehension errors with this presupposition as late as age 6 (Yatsushiro 2008), such violations can be expected to occur in production, too. These violations contribute to the general pattern that children, unlike adults, avoid producing specific indefinite subjects.


So why do children, in contrast to adults, hardly produce any IS-QO utterances in situations with a unique actor? The semantic account predicts that both adults and children will avoid using an indefinite to refer to a specific referent, and will use a definite instead. From this perspective, the crucial question is not why children avoid using indefinite subjects for specific referents, but rather why adults use an indefinite subject in these circumstances in as many as a third of the cases. To shed more light on this issue, we looked at the full set of utterances produced. Together with the two target forms IS-QO and QS-IO, transitive utterances with one or two definites (such as elke beer kietelt de schildpadden ‘every bear tickles the turtles’) or two indefinites (e.g., drie beren kietelen een schildpad ‘three bears tickle a turtle’) account for almost all of the production data. Focusing on the production of definite and indefinite subjects, we find that, in general, children produced significantly more definite subjects and fewer indefinite subjects than adults, independently of the specificity of the referent. Additional analyses showed that children also produced slightly more definite objects than adults, but this difference was not significant. So perhaps the adults produced more IS-QO utterances than the children because in general they produced more indefinite subjects than the children.

At least two explanations are conceivable for our finding that adults produce indefinite subjects for specific referents. First, as the production task was preceded by the comprehension task, adults may have been primed more strongly than children (perhaps because of their larger working memory capacity) to produce forms similar to the ones presented in the comprehension task. As the comprehension task involved sentences with indefinite subjects, the adult participants may have been primed to use such forms in production, too.
A second explanation for our finding is that givenness, or familiarity of the referent, may have had a different effect on children and adults. Adding a third constraint, M3 (*Def/−Fam: Avoid non-familiar definites, cf. Farkas & de Swart 2008), to the OT grammar under consideration may account for this selective influence of the (absence of) linguistic discourse. The OT tableaux below show the interaction among these constraints in the adult grammar (Figure 7) and the child grammar (Figure 8):

Input: +Spec, −Fam | M1: *S/−Spec; *O/+Spec | M3: *Def/−Fam | M2: *Indef/+Spec
☞ Indef            |                        |              | *
  Def              |                        | *!           |

Figure 7. OT tableau of adult speakers’ choice of form for a specific but non-familiar referent, with M1 outranking M2.


In the adult grammar, M1 is stronger than M2 (see Section 2.1). If the input to optimization is a referent that is specific but not familiar, a definite violates the familiarity constraint M3, whereas an indefinite violates the specificity constraint M2. If M3 is ranked higher than M2 in the adult grammar, the violation of M3 is fatal, and hence an adult speaker will select an indefinite to express the input meaning.

Input: +Spec, −Fam | M2: *Indef/+Spec | M3: *Def/−Fam | M1: *S/−Spec; *O/+Spec
  Indef            | *!               |              |
☞ Def              |                  | *            |

Figure 8. OT tableau of child speakers’ choice of form for a specific but non-familiar referent, with M2 outranking M1.

A child entertaining a non-adult constraint ranking with M2 ranked highest (Figure 8) will, for the same input meaning, prefer a definite, as an indefinite would violate the highest-ranked constraint M2. According to this constraint-based explanation, the different patterns of adults and children may arise because the specificity constraint M2 and the familiarity constraint M3 conflict when the input meaning is specific but at the same time non-familiar. In all other situations, these constraints yield the same output form for children and adults. These predictions about the interaction between specificity and familiarity are partly confirmed by our production data. Indeed, we found that children produced significantly more definite subjects than adults. However, they did so for subjects in general rather than for specific subjects only.
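The two tableaux can be reproduced, and extended to the other three combinations of specificity and familiarity, with a minimal Python sketch of strict-domination evaluation. The encoding is hypothetical: it models subjects only (so only the *S/−Spec half of M1 is used), and because the meaning is fixed by the input in production, M1 assigns the same marks to both forms. The names `optimal` and `subject_tableau` are illustrative.

```python
from typing import Dict, List

Tableau = Dict[str, Dict[str, int]]  # candidate form -> {constraint: violation count}


def optimal(tableau: Tableau, ranking: List[str]) -> List[str]:
    """Winners under strict domination (lexicographic comparison of violation profiles)."""
    def profile(candidate: str) -> tuple:
        return tuple(tableau[candidate].get(c, 0) for c in ranking)

    best = min(profile(c) for c in tableau)
    return sorted(c for c in tableau if profile(c) == best)


def subject_tableau(specific: bool, familiar: bool) -> Tableau:
    m1 = 0 if specific else 1  # M1 (*S/-Spec): fixed by the input meaning, same for both forms
    return {
        "indefinite": {"M1": m1, "M2": int(specific), "M3": 0},    # M2 = *Indef/+Spec
        "definite": {"M1": m1, "M2": 0, "M3": int(not familiar)},  # M3 = *Def/-Fam
    }


ADULT = ["M1", "M3", "M2"]  # column order of the adult tableau (Figure 7)
CHILD = ["M2", "M3", "M1"]  # column order of the child tableau (Figure 8)

for specific in (True, False):
    for familiar in (True, False):
        t = subject_tableau(specific, familiar)
        print(f"+Spec={specific}, +Fam={familiar}: "
              f"adult {optimal(t, ADULT)}, child {optimal(t, CHILD)}")
```

Only the +Spec, −Fam input separates the two grammars (adult: indefinite; child: definite). In the other three cases both rankings return the same winner or the same tie, in line with the claim that M2 and M3 conflict only for specific but non-familiar referents.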

6.

Conclusion

The question this study aimed to answer is why Dutch strongly prefers an indefinite subject in sentence-initial position to take wide scope with respect to a universally quantified direct object. We wanted to determine whether the preference for the wide scope reading in adult Dutch, and the lack of this preference in Dutch child language, is related to the preference to interpret an indefinite subject in sentence-initial position specifically. We therefore tested adults and 4- to 6-year-old children on their comprehension and production of universally quantified sentences in situations featuring specific versus non-specific referents. The results of our comprehension task confirm the findings of Philip (2005) that Dutch children, in contrast to Dutch adults, prefer a narrow scope interpretation according to which indefinite subjects in canonical position receive a non-specific interpretation. The results of our parallel production task indicate that children’s non-adult interpretations do not have a purely syntactic or pragmatic cause. Rather, children’s acceptance of non-specific readings for indefinite subjects in comprehension and their preference for definite subjects in production suggest that children’s deviation from the scopal restrictions on indefinites in Dutch may be due to a non-adult ranking of constraints on specificity and familiarity.

References

Beghelli, F. & T. Stowell 1997. Distributivity and negation: The syntax of each and every. In: A. Szabolcsi (Ed.), Ways of scope taking, 71–107. Dordrecht: Kluwer.
Bergsma-Klein, W. 1996. Specificity in child Dutch: An experimental study. MA thesis, Utrecht University.
Dalrymple, M., M. Kanazawa, S. Mchombo & S. Peters 1994. What do reciprocals mean? In: M. Harvey & L. Santelmann (Eds.), Proceedings of SALT IV, 61–78. Ithaca: Cornell University.
de Hoop, H. & I. Krämer 2005/6. Children’s optimal interpretations of indefinite subjects and objects. Language Acquisition 13: 103–123.
Farkas, D. & H. de Swart 2008. Article choice in plural generics. Lingua 117: 1657–1676.
Gundel, J. K., D. Ntelitheos & M. Kowalsky 2007. Children’s use of referring expressions: Some implications for Theory of Mind. In: D. Bittner & N. Gagarina (Eds.), ZAS Papers in Linguistics 48: 1–21.
Hendriks, P. & J. Spenader 2005/6. When production precedes comprehension: An optimization approach to the acquisition of pronouns. Language Acquisition 13: 319–348.
Krämer, I. 1998. Children’s interpretations of indefinite object noun phrases: Evidence from the scope of negation. In: R. van Bezooijen & R. Kager (Eds.), Linguistics in the Netherlands 1998. John Benjamins.
Philip, W. 2005. Pragmatic control of specificity and scope: Evidence from Dutch L1A. In: E. Maier, C. Bary & J. Huitink (Eds.), Proceedings of Sinn und Bedeutung 9, Nijmegen, 271–285.
Prince, A. & P. Smolensky 1993/2004. Optimality Theory: Constraint interaction in generative grammar. Technical Report, Rutgers University and University of Colorado at Boulder, 1993. Revised version published by Blackwell, 2004.
Prince, A. & P. Smolensky 1997. Optimality: From neural networks to Universal Grammar. Science 275: 1604–1610.
Stevens, J. 1992. Applied multivariate statistics for the social sciences. Second edition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Tesar, B. & P. Smolensky 1998. Learnability in Optimality Theory. Linguistic Inquiry 29: 229–268.
Unsworth, S. 2007. L1 and L2 acquisition between sentence and discourse: Comparing production and comprehension in child Dutch. Lingua 117: 1930–1958.
van Hout, A., K. Harrigan & J. de Villiers 2010. Asymmetries in the acquisition of definite and indefinite noun phrases. Lingua 120: 1973–1990.
Wijnen, F. & M. Verrips 1998. The acquisition of Dutch syntax. In: S. Gillis & A. De Houwer (Eds.), The acquisition of Dutch. Amsterdam: Benjamins.
Yatsushiro, K. 2008. Quantifier acquisition: Presuppositions of “every”. In: A. Grønn (Ed.), Proceedings of Sinn und Bedeutung 12, 663–676. Oslo: ILOS.

McGee’s counterexample to Modus Ponens in context

Janneke Huitink

1.

Background: McGee’s famous counterexample

1.1.

Modus Ponens and McGee

‘If’ is one of the most important words in our language. We use ‘if’ to express hypotheses, which in turn form the basis of the plans and decisions we make in our everyday life. For example, I may be pretty certain about (1), and, wishing to pass the exam, decide to study hard.

(1) If I study hard, (then) I will pass the exam.

Sentences of the form ‘If P, (then) Q’ are termed conditionals. Conditionals help us to think about what might be (or might have been) the case if some condition holds. The part of the sentence describing the condition, in (1) if I study hard, is called the antecedent of the conditional, and the part that describes its consequences, in (1) I will pass the exam, is called the consequent. The present paper is concerned with the inferences that people are willing to draw on the basis of conditionals. In particular, we study the inference form Modus Ponens, which is valid in classical logic. It takes the following form:

(2) If P, then Q. P. Therefore, Q.
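On the classical reading of ‘if’ as the material conditional, the validity of (2) can be verified by brute force over truth-value assignments. The Python sketch below does this; the helper names `implies` and `valid` are illustrative. It also checks the right-nested pattern If P, then (if Q, then R); P; therefore, if Q, then R, which is the form at issue in McGee’s counterexample discussed below. That pattern, too, comes out valid under the material reading, so a counterexample to it requires a different interpretation of ‘if’.

```python
from itertools import product
from typing import Callable, Sequence


def implies(p: bool, q: bool) -> bool:
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def valid(premises: Sequence[Callable[..., bool]],
          conclusion: Callable[..., bool], n_vars: int) -> bool:
    """Classically valid iff every valuation making all premises true makes the conclusion true."""
    return all(
        conclusion(*v)
        for v in product([True, False], repeat=n_vars)
        if all(premise(*v) for premise in premises)
    )


# (2): If P, then Q. P. Therefore, Q.
modus_ponens = valid([lambda p, q: implies(p, q), lambda p, q: p], lambda p, q: q, 2)

# Right-nested form: If P, then (if Q, then R). P. Therefore, if Q, then R.
nested = valid(
    [lambda p, q, r: implies(p, implies(q, r)), lambda p, q, r: p],
    lambda p, q, r: implies(q, r),
    3,
)
print(modus_ponens, nested)  # True True
```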

It is hard to imagine anything more self-evident than Modus Ponens. Suppose that you learn that:

(3) If Mary has an essay to write, then she visits the library.

(4) Mary does in fact have an essay to write.


What follows? The answer is clear: that Mary visits the library. In experiments, about 90% of participants answer in accordance with Modus Ponens (see Stenning and Van Lambalgen 2008: 180, who arrive at this number by averaging the results of multiple experiments cited in their book, and go on to use it as a baseline for the so-called suppression task). In the literature, Modus Ponens is often treated as an almost sacred principle. For instance, Gillies (2009, p. 342) writes that admitting that conditionals are not ripe for Modus Ponens “would be an embarrassment than which none greater can be imagined”. However, not everyone is so embarrassed. In 1985, McGee presented a famous counterexample to Modus Ponens. The background of his example is an opinion poll taken just before the 1980 United States presidential election. The poll results show that Reagan, one of the Republican candidates, is decisively ahead of the Democrat Carter, while the other Republican candidate in the race, Anderson, is a distant third. According to McGee, the poll results give us good reason to believe:

(5) If a Republican wins the election, then if it’s not Reagan who wins it will be Anderson.

(6) A Republican will win the election.

Yet we have little reason, McGee argues, to believe:

(7) If it’s not Reagan who wins, it will be Anderson.

Clearly, Modus Ponens would warrant moving from (5) and (6) to (7). McGee concluded that Modus Ponens is invalid. It is important to stress that this conclusion pertains only to a special kind of conditionals: so-called right-nested conditionals like (5), whose consequent itself contains a further conditional. McGee thinks that Modus Ponens is allright for simple conditionals like ‘If Mary has an essay to write, she visits the library’. He thus does not argue that it is wrong to conclude from (3) and (4) that Mary visits the library. However, if Modus Ponens does not hold for right-nested conditionals, it does not hold generally. McGee gives an interpretation for conditionals that does not correspond to the material conditional familiar from classical logic, in which Modus Ponens is invalid. But not everyone interprets his counterexample in this way. In fact, many people have argued that McGee fails to show anything important with respect to the validity of Modus Ponens. See Piller (1996) for a good overview of the literature. Unsurprisingly, there is also a lot of disagreement about

McGee’s counterexample to Modus Ponens in context


the truth conditions of conditionals. See Bennett (2003) and Edgington (1995) for a comparison of different theories. It is not the purpose of this paper to settle these disputes. Rather than arguing for or against the validity of Modus Ponens for ordinary conditionals (though I will say a few words about this at the end of the paper), the present paper seeks to uncover experimentally the conditions under which people are willing to conclude ‘If Q, then R’ from ‘If P, then (if Q, then R)’ and ‘P’. We will see that this depends on the context of utterance. In particular, it depends on how the categorical premise ‘P’ is justified in this context. The point of departure will be a theory of conditionals that, in my opinion, nicely explains our intuitions in McGee’s scenario.

1.2.

The Ramsey Test and McGee’s counterexample

It is widely held, especially among linguists, that ordinary conditionals are not truth-functional or extensional, but that they have an intensional semantics (Stalnaker 1968; Lewis 1973; Kratzer 1979, 1991; Veltman 1985; McGee 1985; von Fintel to appear). Stalnaker (1968) developed an influential account in terms of possible worlds. He says that a conditional ‘If P, then Q’ is true if and only if ‘Q’ is true in the closest ‘P’-world: the world in which ‘P’ is true that differs minimally from the actual world. Stalnaker’s truth conditions are based on the so-called Ramsey Test, after Frank Ramsey (1931):

First, add the antecedent (hypothetically) to your stock of beliefs; second, make whatever adjustments are required to maintain consistency (without modifying the hypothetical belief in the antecedent); finally, consider whether or not the consequent is then true (Stalnaker 1968, 44; emphasis mine)
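Stalnaker’s truth conditions can be made concrete with a toy model. In the Python sketch below, the four worlds, the “distance on a line” similarity metric and the tie-breaking rule are invented for illustration; they are not Stalnaker’s. The selection function picks the closest antecedent-world, a nested conditional is evaluated by selecting again from that world, and a brute-force search over all propositions checks whether right-nested Modus Ponens has a countermodel.

```python
from itertools import product

WORLDS = range(4)  # four toy worlds on a line (an invented similarity space)


def closest(prop, w):
    """Selection function: the prop-world nearest to w, ties broken by lower index.
    Returns None if prop holds in no world."""
    candidates = [v for v in WORLDS if prop(v)]
    if not candidates:
        return None
    return min(candidates, key=lambda v: (abs(v - w), v))


def conditional(p, q):
    """'If p, then q' holds at w iff q holds at the closest p-world to w
    (vacuously true if there is no p-world). If q is itself a conditional,
    it is evaluated from the selected world, as in the nested case."""
    def holds(w):
        v = closest(p, w)
        return v is None or q(v)
    return holds


def all_propositions():
    """Every proposition over WORLDS, represented as a membership test."""
    for bits in product([False, True], repeat=len(WORLDS)):
        yield lambda w, bits=bits: bits[w]


# Right-nested Modus Ponens: from 'if p, then (if q, then r)' and p, infer 'if q, then r'.
props = list(all_propositions())
ok = all(
    conditional(q, r)(w)
    for p, q, r in product(props, repeat=3)
    for w in WORLDS
    if p(w) and conditional(p, conditional(q, r))(w)
)
print(ok)  # True: each world is the closest world to itself (centering)
```

No countermodel turns up because, whenever ‘P’ holds at a world, the closest ‘P’-world is that world itself, so the outer conditional reduces to its consequent there. On a closest-world semantics of this kind, then, (5) and (6) cannot both be true at a world where (7) is false.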

So in a Ramsey Test, people decide how much confidence they have in a conditional ‘If P, then Q’ by temporarily accepting ‘P’ and making minimal adjustments to preserve coherence in their beliefs. They then judge the acceptability of ‘Q’. This comes down to judging the conditional probability of ‘Q’, given ‘P’. In recent years, psychologists have gathered a lot of experimental data demonstrating that people indeed predominantly proceed according to the Ramsey Test when they evaluate ordinary conditionals (the relevant work is summarized in Evans & Over 2004). This paper may be regarded as a further demonstration of the psychological reality of the Ramsey Test. Our focus will be one of the simplest argument forms of all: Modus Ponens. I will argue that this inference is found compelling only if the context is such that the Ramsey Test can coherently be performed on the conclusion.
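Read as conditional probability, the Ramsey Test can be run directly on McGee’s three sentences. The Python sketch below uses invented credences (80% Reagan, 18% Carter, 2% Anderson), not actual poll figures, and the function names are ours. Scoring the right-nested premise (5) as P(Anderson | Republican and not Reagan) assumes the import-export principle, which treats ‘If P, then (if Q, then R)’ as ‘If P and Q, then R’; this is an assumption made here for illustration.

```python
from fractions import Fraction

# Hypothetical poll-based credences over the three possible winners.
WINNERS = {"Reagan": Fraction(80, 100), "Carter": Fraction(18, 100),
           "Anderson": Fraction(2, 100)}


def prob(event):
    """Total credence of the winners for which `event` holds."""
    return sum((p for w, p in WINNERS.items() if event(w)), Fraction(0))


def ramsey(consequent, *antecedents):
    """Ramsey Test read as conditional probability: suppose all antecedents,
    keep only the outcomes compatible with them, and renormalize."""
    def supposed(w):
        return all(a(w) for a in antecedents)
    return prob(lambda w: supposed(w) and consequent(w)) / prob(supposed)


def republican(w):
    return w in {"Reagan", "Anderson"}


def not_reagan(w):
    return w != "Reagan"


def anderson(w):
    return w == "Anderson"


p5 = ramsey(anderson, republican, not_reagan)  # (5), scored via import-export
p6 = prob(republican)                          # (6)
p7 = ramsey(anderson, not_reagan)              # (7)
print(p5, p6, p7)  # 1 41/50 1/10
```

Both premises come out highly credible while the conclusion drops to 1/10: supposing that Reagan does not win removes the outcome on which almost all of the credence in (6) rested.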


Janneke Huitink

Here is how the Ramsey Test provides a diagnosis for McGee’s counterexample. The problem with McGee’s scenario is that performing the Ramsey Test on the conclusion leads to inconsistency. After all, the Ramsey Test requires that we add the conclusion’s antecedent ‘it’s not Reagan who wins’ to our stock of beliefs (or, equivalently, to the context), which already contains the second premise ‘A Republican will win’. But the outcome of this is incoherent, the argument runs, because the sole reason to accept this premise is that Reagan is likely to win. Once we suppose that Reagan doesn’t win, we no longer have reason to believe that a Republican will win, because this belief was entirely based on the poll results according to which Reagan is the likely winner. The natural way to react to this inconsistency, according to the Ramsey Test, is to make the adjustments to our stock of beliefs that are needed to restore coherence and to throw out the conflicting information in the second premise. The result is that ‘If it isn’t Reagan who wins, it will be Anderson’ is evaluated against a context where all that is believed is that there are three candidates: two Republicans, Reagan and Anderson, and the Democrat Carter. Obviously, the conditional is not acceptable in this context, as Carter might just as well win in case Reagan doesn’t. To sum up, the Ramsey Test neatly explains our intuitions in the McGee scenario. What is more, the theory suggests that these intuitions depend on the context. In contexts where we need not lose sight of the premises to evaluate the conclusion, the Ramsey Test results in high confidence in the conclusion. This allows us to formulate some testable predictions about the strength of Modus Ponens for right-nested conditionals.

1.3.

Predictions

We saw that in McGee’s scenario there is a conflict between our grounds for accepting the second premise (a Republican will win, because Reagan is likely to win) on the one hand and the conclusion (which asks us to imagine that Reagan will not win) on the other. Consequently, in order to perform the Ramsey Test on the conclusion, people will have to give up their endorsement of the second premise. There are, however, two ways to keep the information in this premise from slipping away. First, one may rephrase the conclusion by adding some strategically placed anaphors. Compare the original McGee argument to this reworded argument:

(8) If a Republican wins, then if he is not Reagan he will be Anderson.

(9) A Republican will win.

(10) If he is not Reagan, he will be Anderson.

Whether we believe (10) depends on what the referent for ‘he’ is supposed to be. The most salient candidate is introduced in the second premise: ‘the Republican winner’. If we substitute this referent for ‘he’, the result is:

(11) If the Republican winner is not Reagan, the Republican winner is Anderson.

This conclusion can be assessed without giving up our belief in the second premise. What is more, the Ramsey Test predicts that people will now judge the conclusion acceptable (Over 1987; Gillies 2004). Since Reagan and Anderson are the only Republican candidates, the probability that Anderson will win on the supposition that Reagan isn’t the Republican winner is high. That is, Modus Ponens will go through in this case. Instead of rephrasing the conclusion, we might also change the grounds for believing the second premise. In the terminology of Dummett (1978), the original McGee scenario gives constructive evidence: the belief in ‘A Republican will win’ is constructed from below, that is, founded in the belief that a particular Republican (i.e. Reagan) will win. The result is that the Ramsey Test cannot be coherently performed on ‘If it isn’t Reagan who wins, it will be Anderson’. To proceed with the Test, coherence has to be restored by setting aside the belief that a Republican will win. In contrast, McGee might have given us non-constructive evidence in the background story, so that the belief that a Republican will win is no longer parasitic on a belief about which particular Republican it will be. For instance, one might tell participants that the election will be interfered with such that a Republican will win, but that it is still uncertain which one will come out the winner. Relative to such a context, the minimal changes required by the Ramsey Test will not affect the belief that a Republican will win.
Then, if the antecedent of McGee’s conclusion is added (that Reagan will not win), participants will infer that Anderson will win. They will thus accept the conclusion, i.e. endorse Modus Ponens (Fulda 2010). We can now formulate three hypotheses regarding the confidence that people will have in Modus Ponens with right-nested conditionals:


– participants will have little confidence in Modus Ponens in McGee-type scenarios
– participants will have strong confidence in Modus Ponens in the same scenarios if the conclusion is augmented with anaphors that pick up the second premise
– participants will endorse Modus Ponens in case the context offers non-constructive evidence for the second premise

The next section describes an experiment that tests these predictions.
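The three predictions can be illustrated with a conditional-probability reading of the Ramsey Test. In the Python sketch below, both sets of credences are invented for illustration: in one, the belief that a Republican will win rests on Reagan’s poll lead (constructive); in the other, a Republican win is made highly likely by outside interference while the identity of the winner stays open (non-constructive). The helper names are ours, and the values illustrate only the predicted ordering, not the experimental responses.

```python
from fractions import Fraction


def cond(dist, consequent, antecedent):
    """Ramsey-Test credence: P(consequent | antecedent) over a finite distribution."""
    ante = sum((p for w, p in dist.items() if antecedent(w)), Fraction(0))
    both = sum((p for w, p in dist.items()
                if antecedent(w) and consequent(w)), Fraction(0))
    return both / ante


def republican(w):
    return w in {"Reagan", "Anderson"}


def not_reagan(w):
    return w != "Reagan"


def anderson(w):
    return w == "Anderson"


# Hypothetical credences; both make (6) 'A Republican will win' likely.
constructive = {"Reagan": Fraction(80, 100), "Carter": Fraction(18, 100),
                "Anderson": Fraction(2, 100)}       # rests on Reagan's lead
non_constructive = {"Reagan": Fraction(50, 100), "Carter": Fraction(5, 100),
                    "Anderson": Fraction(45, 100)}  # Republican likely, not which one

predicted = {
    # (7) against constructive evidence: supposing 'not Reagan' drains (6).
    "original format": cond(constructive, anderson, not_reagan),
    # (11): the anaphor keeps 'Republican winner' inside the supposition.
    "anaphoric conclusion": cond(constructive, anderson,
                                 lambda w: republican(w) and not_reagan(w)),
    # (7) against non-constructive evidence: (6) survives the supposition.
    "non-constructive evidence": cond(non_constructive, anderson, not_reagan),
}
for condition, value in predicted.items():
    print(f"{condition}: {value}")
```

The original format scores 1/10, whereas both manipulations push the credence in the conclusion close to 1, which is the pattern the three hypotheses describe.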

2.

Experiment: manipulating the context for Modus Ponens

2.4.

Method

Although there is a lot of experimental research on conditionals, there is none yet on hypothetical thinking with nested conditionals like McGee’s first premise. It is telling that Oaksford & Chater (2010) contains nothing on reasoning with iterated hypotheses, even though the volume has contributions by practically every psychologist working on conditionals. One reason may be that people can be expected to find iterated conditionals hard to understand. Following Stalnaker’s semantics, to evaluate a right-nested conditional ‘If P, then (if Q, then R)’ one first has to find the closest ‘P’-world, and then find the ‘Q’-world that is closest to that ‘P’-world (and check whether ‘R’ holds there). This involves judging similarity for worlds other than the actual world. It would not be surprising if people found this difficult. In order not to burden the participants too much, the number of items to judge was kept as small as possible. We therefore chose a between-subjects design. Three groups of participants were compared. The first group (n = 9) saw the original McGee form, the second group (n = 9) saw these arguments with an anaphoric conclusion, and the final group (n = 9) received non-constructive justification for the second premise. So a total of 27 people participated in the experiment, all of them employees of the university library in Nijmegen, the Netherlands. Six cover stories were created, in which the premises were embedded. Each story had a ‘McGee-type architecture’. For example:

Sample item 1 (original McGee-format; translated from Dutch)
James is at the bar ordering a drink. He usually drinks champagne, but he sometimes enjoys matcha (Japanese green tea) as well. He hardly ever drinks sake. But if he orders an alcoholic drink, then if it is Japanese, it is sake. Since he usually drinks champagne, he will probably order an alcoholic drink. Do you think that he will order sake if he orders a drink that is Japanese? He probably will/he probably won’t/neither.

The information about James’ drinking habits makes the first and second premises likely, but the conclusion unlikely. Evaluating the conclusion in a coherent fashion amounts to setting aside the information that James is likely to order an alcoholic drink. This makes the conclusion unacceptable, because James will order matcha rather than sake on the supposition that he orders a Japanese drink. The prediction is thus that participants will answer ‘He probably won’t’. This answer corresponds to resisting Modus Ponens. Note that participants had to judge whether the conclusion was likely, unlikely or neither, not whether it was (un)likely to be true. This was done because asking for the probability that ‘If P, then Q’ is true might incite some people to give the probability of ‘P and Q’ (cf. Edgington 2003). This could potentially interfere in the third condition (I will come back to this below). The original McGee format was then subjected to two manipulations of the context: (i) adding anaphors and (ii) adding non-constructive evidence. In the first manipulation, however, we could not add pronouns, as Over (1987) and Gillies (2004) recommend: ‘If he is not Reagan, …’ (recall examples (8)–(10) above). The reason for this is that the items were presented in Dutch, and in this language, using pronouns in identifying contexts is odd. Rather, in such contexts it-clefts are preferred:

(12) Als het Reagan niet is, …
     if   it  Reagan not  is
     ‘If it isn’t Reagan, …’

Unlike the pronoun in (10), the it-cleft in (12) is open to various interpretations. Though it might be understood as picking up ‘the Republican winner’, it might just as well mean ‘the winner’. On this second interpretation, however, information slippage would not be blocked. To overcome this problem, we used discourse particles like ook ‘as well’ instead of it-clefts or pronouns in all items in the second condition. Such particles have anaphoric properties too (see for instance Geurts & van der Sandt 2004), so they are expected to have the same acceptability-enhancing effect on Modus Ponens. Here is an example (the underlining highlights the change with respect to sample item 1; this was not used in the final questionnaire):


Sample item 2 (anaphoric conclusion; translated from Dutch)
James is at the bar ordering a drink. He usually drinks champagne, but he sometimes enjoys matcha (Japanese green tea) as well. He hardly ever drinks sake. But if he orders an alcoholic drink, then if it is Japanese as well, it will be sake. Since he usually drinks champagne, he will probably order an alcoholic drink. Do you think that he will order sake if he orders a drink that is Japanese as well? He probably will/he probably won’t/neither.

The particle ‘as well’ indicates a parallel relation between the property of being a Japanese drink and some other property. The most salient property is given in the second premise: being an alcoholic drink. Thus, the idea is that the conclusion will be interpreted as ‘If he orders a drink that is both alcoholic and Japanese, it will be sake’. As explained above, the hypothesis is that we will find significantly more ‘He probably will’ answers for this conclusion compared to the non-anaphoric one. That is, Modus Ponens will be endorsed more often. The second manipulation involved the evidence given for the second premise. In the original story about James, the belief that James will order an alcoholic drink is clearly parasitic upon the belief that he will order a specific alcoholic drink, champagne. But what would happen if this belief came about by non-parasitic, non-constructive evidence instead? To test this, we used items that looked like this (again, the underlining just highlights the change with respect to sample item 1 but was not presented to the participants):

Sample item 3 (non-constructive evidence; translated from Dutch)
James is at the bar ordering a drink. He usually drinks champagne, but he sometimes enjoys matcha (Japanese green tea) as well. He hardly ever drinks sake. But if he orders an alcoholic drink, then if it is Japanese, it will be sake. Since he doesn’t have to drive, James will probably order an alcoholic drink. Do you think that he will order sake if he orders a drink that is Japanese? He probably will/he probably won’t/neither.

Again, the hypothesis is that we will find significantly more ‘He probably will’ answers than in the original case. Here it is important that people actually judge the conditional probability of James ordering sake given that he will order a drink that is both alcoholic and Japanese. This probability is high. Evans & Over (2004) report that in experiments people tend to answer in terms of conjunctive probabilities. Here this would mean that they would assess the probability of James ordering a drink that is alcoholic, Japanese and sake. This probability is low, simply because he hardly ever drinks sake. However, in their experiments, participants were asked to assess the probability that the target sentence was true. Edgington (2003) convincingly argues that this may have led the participants to adopt a non-conditional interpretation. She suggests one should simply ask for judgments of probability rather than for judgments of probability of being true. This is what we did. To sum up, each questionnaire contained 3 target items and 3 filler items, so participants judged 6 story-conclusion pairs altogether. Group 1 judged only original McGee cases, group 2 saw these stories but with discourse particles added, and group 3 saw these stories but was given non-constructive evidence for the second premise. So there were three versions of the questionnaire. The items were presented in random order. The filler items were used to check whether the participants actually did use conditional probabilities to answer the question. An example is:

Filler item
Three women compete in a TV show for the love of a millionaire. Daisy and Amanda have blonde hair, Rebecca is a brunette. So if a blonde woman wins, it will be Amanda if it isn’t Daisy. Because the millionaire is only attracted to blonde women, it is likely that a blonde woman will win. Do you think that Rebecca will win if a brunette wins? She probably will/she probably won’t/neither.

Here the intended answer is of course that she probably will. However, if participants were judging the probability of the conjunction of ‘A brunette will win’ and ‘Rebecca will win’ instead of the conditional probability, they would answer ‘She probably won’t’. The answers to the fillers were not analyzed, but participants were excluded from the experiment if they gave the “wrong” answer for 2 or more filler items. This was necessary only once. This participant was replaced with another person (so, strictly speaking, 28 rather than 27 people participated in the experiment).

2.5.

Results

Table 1 presents the percentages of Modus Ponens inferences made; each based on the responses of 9 participants to 3 items:


Table 1. Percentages of Modus Ponens inferences

Condition                    Modus Ponens inferences
Original format              30%
Anaphoric conclusion         56%
Non-constructive evidence    81%

A one-way independent ANOVA revealed a significant effect of context type on the endorsement of Modus Ponens, F(2, 26) = 6.84, p < 0.01. Prior to the ANOVA, Levene’s test for equality of variances was performed, p = 0.953, so there was no indication that the assumption of equal variances was violated, even though the sample size was quite small. Participants drew significantly more Modus Ponens inferences with an anaphoric conclusion than in the original case, t(24) = −1.84, p < 0.05, as revealed by a one-tailed planned contrast. However, as the percentage of accepted inferences (56%) is around chance level, we arguably cannot conclude that Modus Ponens is truly endorsed in this case. Thus, although the addition of strategically placed anaphors does influence the acceptability of the conclusion, the effect is not as big as predicted. Participants also made more Modus Ponens inferences (81%) with background stories that provided non-constructive evidence compared to the original case, t(24) = −3.70, p < 0.001 (one-tailed again). This strongly supports the claim that we cannot simply ask whether ‘If Q, then R’ follows from ‘If P, then (if Q, then R)’ and ‘P’ without asking how ‘P’ itself is inferred, constructively or non-constructively. In the non-constructive case, the conclusion follows.

3.

Discussion: the importance of context

The experiment’s results give rise to three questions that merit further discussion. First, why should there be such a difference between the second (anaphoric) condition and the third (non-constructive evidence) condition? Second, why is there still a 30% endorsement rate of Modus Ponens in the original case? And finally, what do these results mean with respect to the Ramsey Test? As for the first question, although the addition of discourse particles led to an increase in the endorsement of Modus Ponens compared to the original McGee format, the endorsement rates were not as high as one would expect. Why were higher rates not achieved? We submit that this is an artefact of the use of discourse particles instead of pronouns. Discourse particles are admittedly anaphoric, just as pronouns are, but there are also differences between the two. If we encounter a sentence with a pronoun, we have to look into the context to see whom it refers to in order to make sense of the sentence. That is, without knowing the identity of ‘he’, we have no idea what proposition is expressed by ‘He is the winner’. Not so with discourse particles. According to many theories (e.g. Zeevat 2006), discourse particles do not contribute to the truth-conditional content of the sentence in which they occur. Take ‘John lives in New York as well’. This sentence suggests that some salient person who is not John lives in New York. However, even without knowing the identity of this person, the statement ‘John lives in New York as well’ still manages to communicate something, namely that John lives in New York. One can thus ignore the discourse particle. Admittedly, this makes the sentence pragmatically odd: one wonders why ‘as well’ was included in the first place, but it doesn’t render the sentence uninterpretable. We suspect that people who did not endorse Modus Ponens with anaphoric conclusions ignored the discourse markers in the material. Oral feedback that the participants gave after the experiment suggests that something like this was indeed going on. Participants said that they were unsure how to deal with words like ‘as well’. They felt that these words were probably not included without reason, but the items were already so difficult that they decided not to make much of an effort to figure out that reason. As an anonymous reviewer points out, the addition of a discourse particle makes the conclusion ambiguous. For instance, adding ‘as well’ in sample item 2 does suggest that the drink should be both alcoholic and Japanese, but this is just one possible interpretation. Therefore, even if participants did not ignore the particles, this doesn’t guarantee that they gave them the intended Modus-Ponens-enhancing interpretation. It is thus only to be expected that the endorsement rate in the anaphoric conclusion condition is not as high as in the non-constructive evidence condition.
Second, Modus Ponens was endorsed in 30% of the responses even in the McGee scenario. One may wonder whether this percentage shouldn’t be lower if the participants were using the Ramsey Test to assess their confidence in the conclusion. It turns out that the endorsement of Modus Ponens in the original case was not evenly distributed among the items: the ‘He probably will’ answers were concentrated in one of the target items. This was the following item, which was adapted from one of the examples in McGee (1985):

Sample item 4 (original McGee-format; translated from Dutch)
Because he heard that gold and silver were once mined in his region, Otto has dug a mine in his backyard. Except for gold and silver, the soil doesn’t contain anything of value, so if Otto doesn’t find gold, then he will find silver if he finds something of value. Unfortunately for Otto, there is hardly any gold and silver left in his region, so he probably won’t find any gold. Do you think that Otto will find silver, if he finds something of value? He probably will/he probably won’t/neither.

What is it about this item that makes people endorse Modus Ponens? The example is somewhat different from the more famous election case, and also from the James item presented above. In this example, the probability of Otto finding gold seems equal to the probability of him finding silver. In the election case, on the other hand, Reagan was far more likely to win than Anderson. Finally, let us consider how these results confirm that people employ a Ramsey-Test-like semantics for conditionals. The findings point to the conclusion that the willingness to apply Modus Ponens depends on the context. If the context is such that the Ramsey Test can coherently be performed in it, people readily apply Modus Ponens, even with right-nested conditionals. If this is not the case, they resist drawing the conclusion. Anaphors may restrict the context, such that ‘P’ is retained in it. We saw that in this case, people have more confidence in Modus Ponens. Without such anaphors, they do not. We also saw that if ‘P’ is justified non-constructively, the coherence criterion is satisfied. If, however, ‘P’ is justified constructively, it is not. The experiment above confirms the prediction that Modus Ponens is judged acceptable in the first case, but far less acceptable in the second. This suggests that people actually do set aside a previous belief when they make a supposition to evaluate a conditional, i.e. people do seem to perform something like the Ramsey Test when they handle conditionals. It is hard to see how alternative theories could explain these data. The most prominent psychological alternative is the mental model theory of Johnson-Laird & Byrne (2002). According to this theory, we draw conclusions by building mental models that reflect our understanding of the premises. These models represent possibilities. The possibilities that we take into account are influenced not only by formal information but also by world knowledge and tacit assumptions.
For this reason, the conclusions that people draw on the basis of mental models may deviate from what classical logic prescribes. About McGee’s counterexample, Johnson-Laird & Byrne (2002: 665–666) state that “from a formal standpoint, the inference [Modus Ponens] is valid”, yet they admit that “many people judge that the conclusion to the Reagan inference is false” (p. 665). Their explanation for this is that people only construct two models on the basis of the poll results, one in which Reagan wins and one in which Carter wins, because participants “know that either Reagan or his Democratic opponent, Carter will win” (p. 665). They continue:


The premise that a Republican will win is combined with these models, and the result is that Reagan will win, and so the conditional conclusion in the example does not follow from the premises. (Johnson-Laird & Byrne 2002, p. 665)

This may well be the case, but it doesn’t explain why adding anaphors to the conclusion should make it more acceptable. Because the background information did not change in comparison to the original cases, the constructed mental models must be the same in each case, and it is thus expected that participants will answer that the conclusion does not follow. Our experiment shows, however, that this expectation is not borne out.

4.

Philosophical discussion: Modus Ponens and Or-to-If-inference

To conclude, I would like to address the more philosophical question whether Modus Ponens is valid or not. Obviously, experiments cannot settle this question. All that we have demonstrated is that Modus Ponens is sometimes applied with great confidence, and sometimes not, but this still leaves us with two theoretical options. On the one hand, we might give a semantics for conditionals according to which Modus Ponens is valid, and try to explain why a valid pattern is sometimes resisted. Or, we could give a theory of conditionals that makes Modus Ponens invalid, and then we have to explain why an invalid pattern may nevertheless sometimes seem compelling. Ultimately, the choice between these two options depends on which one gives rise to the best overall theory. In the case of Modus Ponens, however, the second option is usually not seriously considered. I think that this is a mistake. Rather than arguing directly for the palatability of the idea that Modus Ponens is invalid, I would like to compare it to another inference pattern called Or-to-If-inference. This type of inference takes the following form:

(13) Either P or Q. Therefore, if not P, then Q.

Suppose that you learn that:

(14) Either Reagan or Anderson will win the election.

What follows? Indeed:


(15) If Reagan doesn’t win, Anderson will.

Upon closer reflection, however, whether or not this conclusion can be drawn depends on the way that the disjunctive premise (14) is justified. The justification of this premise may be constructive, founded in the belief that a particular person will win. For example, one might justify that either Reagan or Anderson will win by saying that the poll shows that Reagan will win. In that case, however, the Ramsey Test cannot be coherently performed on (15). Again, adding the antecedent to the stock of beliefs containing the premise results in inconsistency, because the only reason to believe that either Reagan or Anderson will win is that Reagan will win. People will then set their belief in (14) aside in evaluating (15), making the conclusion unacceptable. Alternatively, one might have non-constructive evidence for the disjunction. For example, one might accept that either Reagan or Anderson will win because one accepts that a Republican will win, without it being settled which one. Only in this case does the disjunction express genuine uncertainty. The Ramsey Test predicts that people will now be willing to conclude ‘If it isn’t Reagan who wins, it will be Anderson’. In the absence of background information, people will naturally presume that it is not known which of the two will win, for otherwise the speaker would not be cooperative in the sense of Grice (1975). That is why you probably readily drew the conclusion in (15) from (14) when you first saw this premise. The above remarks are corroborated by the experiments of Over, Evans & Elqayam (2010). Participants were asked to rate their confidence in the conclusion on a scale of 1 to 5. With constructive evidence, confidence was significantly lower than with non-constructive evidence. Thus, the justification people have for the premises they accept affects which conclusions they will draw in conditional reasoning. This is also the main contribution of the present paper.
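On a conditional-probability reading of the Ramsey Test, the constructive/non-constructive contrast for Or-to-If-inference can be made explicit. In the Python sketch below, two invented evidence states give the disjunction (14) exactly the same credence, 23/25; they differ only in whether that credence rests on Reagan alone. The numbers and function names are illustrative assumptions, not data from Over, Evans & Elqayam (2010).

```python
from fractions import Fraction


def credence(dist, event):
    """Total credence of the outcomes for which `event` holds."""
    return sum((p for w, p in dist.items() if event(w)), Fraction(0))


def conditional(dist, consequent, antecedent):
    """Ramsey-Test credence in 'if antecedent, consequent' = P(consequent | antecedent)."""
    joint = credence(dist, lambda w: antecedent(w) and consequent(w))
    return joint / credence(dist, antecedent)


def reagan_or_anderson(w):  # premise (14)
    return w in {"Reagan", "Anderson"}


def not_reagan(w):
    return w != "Reagan"


def anderson(w):
    return w == "Anderson"


# Hypothetical evidence states giving (14) the same credence.
constructive = {"Reagan": Fraction(90, 100), "Anderson": Fraction(2, 100),
                "Carter": Fraction(8, 100)}      # (14) believed because Reagan leads
non_constructive = {"Reagan": Fraction(50, 100), "Anderson": Fraction(42, 100),
                    "Carter": Fraction(8, 100)}  # a Republican expected, not which one

for name, dist in [("constructive", constructive),
                   ("non-constructive", non_constructive)]:
    premise = credence(dist, reagan_or_anderson)
    conclusion = conditional(dist, anderson, not_reagan)  # (15)
    print(name, premise, conclusion)
```

The premise is equally credible in both states, yet the credence in (15) is 1/5 in the constructive state and 21/25 in the non-constructive one, so the strength of the inference cannot be read off the probability of the premise alone.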
Over, Evans & Elqayam show that these justifications affect Or-to-If-inference, and the experiment in this paper demonstrates the effect for Modus Ponens. It is not a coincidence that justification turns out to be important both in Modus Ponens and in Or-to-If-inference. McGee’s counterexample might just as well have been used to argue against the validity of Or-to-If-inference (cf. Gillies 2004). Note that given the poll results, we have good reason to believe:

(16) If either Reagan or Anderson wins, then if it isn’t Reagan who wins, it will be Anderson.

(17) Either Reagan or Anderson will win.


Yet Gillies argues that we have little reason to believe:

(18) If it isn’t Reagan who wins, it will be Anderson.

But Or-to-If-inference would get us from (17) to (18). Therefore, the inference pattern is not generally compelling. So both Modus Ponens and Or-to-If-inference are compelling in some cases, but unconvincing in others. As noted above, we have two theoretical options in this case: (i) claim that the patterns are valid and explain why they are sometimes resisted, or (ii) claim that the patterns are invalid and explain why they are nonetheless sometimes followed. Surprisingly, Or-to-If-inference and Modus Ponens are usually not treated alike when it comes to choosing between these two options. Since Stalnaker (1975), the second option has predominantly been chosen for Or-to-If-inference. In his semantics, the pattern is invalid, but he argues that it is nevertheless a reasonable inference under the right pragmatic conditions. This theory is by now widely accepted. For instance, it is quite telling that Over, Evans & Elqayam (2010, 18 ff.) criticize Johnson-Laird & Byrne (2002) for claiming that Or-to-If-inference is valid. With Modus Ponens, on the other hand, matters are quite different. In fact, most scholars would be embarrassed to defend a theory that invalidates the principle, so they go out of their way to maintain the validity of Modus Ponens (e.g. Gillies 2009). However, the only underlying reason for this appears to be the mere desire to have Modus Ponens valid, which is arguably not the best of reasons. I do not see why we should be more conservative about Modus Ponens than about Or-to-If-inference. To be sure, I am not saying that Modus Ponens is invalid, nor that my experiment demonstrates its invalidity. The aim of this paper is much more modest: to make a first stab at uncovering the conditions under which people are willing to draw conclusions by Modus Ponens.
It remains to be seen what kind of theory best explains these conditions: one that claims Modus Ponens to be valid or one that claims it to be invalid. I merely wanted to argue that we should be open to both options. In the meantime, I strongly recommend that Modus Ponens be studied alongside other inference patterns, Or-to-If-inference being a case in point.


Janneke Huitink

References

Bennett, Jonathan Francis 2003 A Philosophical Guide to Conditionals. Oxford: Oxford University Press.
Dummett, Michael 1978 Truth and Other Enigmas. London: Duckworth.
Edgington, Dorothy 2003 What if? Questions about conditionals. Mind and Language 18 (4): 380–401.
Edgington, Dorothy 1995 On conditionals. Mind 104 (414): 235–329.
Evans, Jonathan St. B. T. & David Over 2004 If: Supposition, Pragmatics and Dual Processes. Oxford: Oxford University Press.
Fintel, Kai von to appear Conditionals. In Klaus von Heusinger, Claudia Maienborn and Paul Portner (eds.), Semantics: An international handbook of meaning. Preprint: http://mit.edu/fintel/fintel-2009-hsk-conditionals.pdf.
Fulda, Joseph 2010 Vann McGee’s counterexample to Modus Ponens: An enthymeme. Journal of Pragmatics 42 (1): 271–273.
Geurts, Bart & Rob van der Sandt 2004 Interpreting focus again. Theoretical Linguistics 30: 149–161.
Gillies, Anthony 2009 On truth-conditions for if (but not quite only if). The Philosophical Review 118 (3): 325–349.
Gillies, Anthony 2004 Epistemic conditionals and conditional epistemics. Noûs 38 (4): 585–616.
Grice, Paul 1975 Logic and conversation. In Peter Cole and Jerry Morgan (eds.), Speech Acts (= Syntax and Semantics 3), 41–58. New York: Academic Press.
Johnson-Laird, Philip & Ruth Byrne 2002 Conditionals: A theory of meaning, pragmatics and inference. The Psychological Review 109 (4): 646–678.
Kratzer, Angelika 1991 Modality/Conditionals. In Arnim von Stechow and Dieter Wunderlich (eds.), Semantik: ein internationales Handbuch der zeitgenössischen Forschung/Semantics: An international handbook of contemporary research, 639–656. Berlin: De Gruyter.

McGee’s counterexample to Modus Ponens in context


Kratzer, Angelika 1979 Conditional necessity and possibility. In Rainer Bäuerle, Urs Egli and Arnim von Stechow (eds.), Semantics from different points of view, 117–147. Berlin: Springer-Verlag.
Lewis, David K. 1973 Counterfactuals. Cambridge, MA: Harvard University Press.
McGee, Vann 1985 A counterexample to modus ponens. The Journal of Philosophy 82 (9): 462–471.
Oaksford, Mike & Nick Chater 2010 Cognition and Conditionals: Probability and Logic in Human Thinking. New York: Oxford University Press.
Over, David 1987 Assumptions and the supposed counterexamples to Modus Ponens. Analysis 47 (3): 142–146.
Over, David, Jonathan St. B. T. Evans & Shira Elqayam 2010 Conditionals and non-constructive reasoning. In Mike Oaksford and Nick Chater (eds.), Cognition and Conditionals: Probability and Logic in Human Thinking. New York: Oxford University Press.
Piller, Christian 1996 Vann McGee’s counterexample to Modus Ponens. Philosophical Studies 82: 27–54.
Ramsey, Frank 1931 The Foundations of Mathematics. London: Routledge and Kegan Paul.
Stalnaker, Robert 1975 Indicative conditionals. Philosophia 5 (3): 269–286.
Stalnaker, Robert 1968 A theory of conditionals. In Nicholas Rescher (ed.), Studies in Logical Theory. Oxford: Blackwell. Reprinted in William Harper, Robert Stalnaker and Glenn Pearce (eds.), Ifs: Conditionals, Beliefs, Decision, Chance and Time, 41–55. Dordrecht: D. Reidel, 1980.
Stenning, Keith & Michiel van Lambalgen 2008 Human Reasoning and Cognitive Science. Cambridge, MA: MIT Press.
Veltman, Frank 1985 Logics for conditionals. Ph.D. thesis, University of Amsterdam.
Zeevat, Henk 2006 A dynamic approach to discourse particles. In Kerstin Fischer (ed.), Approaches to Discourse Particles 1: 133–148. Amsterdam: Elsevier.

Interpreting adjectival passives: Evidence for the activation of contrasting states

Berry Claus & Olga Kriukova

1. Introduction

Upon entering a room and noticing that the window is still in the state of being open, one might say something like (1a) or (1b):

(1) a. The window is still open.
    b. The window is still opened.

Although both sentences describe the window’s current state, they do so in two different ways: (1a) uses an adjective and (1b) an adjectival passive. Adjectives and adjectival passives appear to be alternative means of conveying seemingly similar states. Yet the information they carry is not exactly the same, and sentences (1a) and (1b) are not equivalent. In the present paper, we address the question of whether comprehenders are sensitive to the differences between adjectives and adjectival passives at the level of the content of the mental representations constructed during comprehension.

Adjectival passives, also referred to as “be-passives” (Gehrke to appear), “adjectival resultatives” (Gese, Stolterfoht & Maienborn 2009), or “state/stative passives” (Kratzer 2000 and Schlücker 2009, respectively), occupy a special position in language, as they share properties with both verbal passives and copula-adjective constructions. In English, a structure consisting of a form of “to be” and a participle II, as in “to be opened”, is ambiguous between a verbal-passive reading and an adjectival-passive reading. Context often helps to discriminate between the two interpretations, as illustrated in (2).

(2) a. The window was opened three hours ago.
    b. The window was opened for three hours.

The predicate in (2a) is a verbal passive: it clearly refers to a specific event, the time of which is marked by the temporal adverbial. Conversely, the predicate in (2b) is an adjectival passive: it describes a state that is the result of some event. In this case the event itself is only implicit, and the temporal adverbial does not refer to the event time (von Stechow 1998) but rather to the temporal extension of the state.

The same analysis also holds for German verbal and adjectival passives, as in examples (3a) and (3b). Verbal passives contain a clear event reference and allow modification by temporal-frame adverbials, while adjectival passives describe a state and only imply some kind of event. Unlike in English, the two structures are grammatically distinct in German: verbal passives are formed with the auxiliary werden (‘become’) and a participle, while adjectival passives are formed with the copula sein (‘be’) and a participle. The absence of structural ambiguity makes German well suited to research on adjectival passives.

(3) a. Das Fenster wurde vor drei Stunden geöffnet.
       The window became opened three hours ago.
       ‘The window was opened three hours ago.’
    b. Das Fenster war drei Stunden lang geöffnet.
       The window was opened for three hours.
       ‘The window was opened for three hours.’

How exactly adjectival passives are formed is still a matter of debate. Some researchers take a syntactic approach to the phenomenon (e.g. Embick 2004). Some take a lexical route, assuming adjectivization of the verbal participle by a zero affix (e.g. Kratzer 2000). Others take pragmatic aspects into account (e.g. Maienborn 2009) or appeal to the semantics of the underlying verb from which the participle was formed (Gehrke to appear). Yet it is widely accepted that adjectival passives are copula-adjective constructions similar to copular constructions with genuine adjectives (e.g. Gehrke to appear; Kratzer 2000; Levin & Rappaport 1986; Maienborn 2009, 2011; Rapp 1996; Schlücker 2009; von Stechow 1998; Welke 2007).

An adjectival analysis of the adjectival passive is supported by several diagnostics (for a review see Embick 2004 and Schlücker 2009; for evidence from corpora see Gese et al. 2009). One example is the occurrence of the negative prefix un- in German adjectival passives, as in (4). As un-prefixation is licensed with adjectives but not with verbs, the combination of the prefix with participles indicates the adjectival nature of the participle. Moreover, the participle can also appear in comparative or superlative forms, indicating adjectival gradation (see (5)). A further diagnostic is based on the observation that adjectival passives can be conjoined with genuine adjectives in coordinated copular constructions, as in (6). Since coordination requires category identity (cf. Lang 1984), this also supports an adjectival analysis.

(4) Das Hemd ist ungebügelt.
    ‘The shirt is unironed.’

(5) a. In the entire world, it is the hawksbill that is most threatened by extinction.
    b. He is more humiliated than her at times.

(6) The aircraft is cleaned and ready for boarding.

In addition to these diagnostics, the adjectivization account is further bolstered by psycholinguistic evidence from a self-paced reading study by Stolterfoht, Gese & Maienborn (2010). They found prolonged reading times for participles in adjectival passive constructions compared to participles in verbal passive constructions. This finding is in line with the assumption that the participle of an adjectival passive requires an adjectival conversion, which results in extra processing costs.

With regard to the interpretation of adjectival passives, the view that they are copula-adjective constructions implies that they describe states, just like other copula-adjective constructions, by assigning an object a certain property denoted by the predicate. However, the property ascribed is not the same. An adjective’s meaning is coded in the lexicon, and the property ascribed by an adjective occupies a “fixed place in the subject referent’s property space” (Maienborn 2009: 40). For instance, the adjective leer (‘empty’) in (7a) simply assigns a lexically determined property to the rubbish bin and does not require the comprehender to derive any further implications (see also Welke 2007). Unlike adjectives, adjectival passives attribute to a referent a property that is not lexically coded but must be inferred from the event denoted by the verbal base of the participle (Maienborn 2009). A property assigned by an adjectival passive is linked to the event and is the result or outcome of this event (Embick 2004; Gese et al. 2009; Maienborn 2009; Schlücker 2009; see also Rapp & von Stechow 1999), hence the alternative name “adjectival resultatives”. The adjectival passive in (7b) does not merely ascribe a state to the rubbish bin but conveys that this state emerged as the result of some action, i.e. emptying.

(7) a. Der Mülleimer ist leer.
       ‘The rubbish bin is empty.’
    b. Der Mülleimer ist geleert.
       ‘The rubbish bin is emptied.’

A recent experimental finding by Kaup, Lüdtke & Maienborn (2010) suggests that comprehenders are sensitive to the difference between adjectives and adjectival passives. Employing the action-sentence compatibility paradigm (Glenberg & Kaschak 2002), they tapped into the motor actions activated when reading state descriptions with either of the two forms. For adjectival passives, they found an activation of the movement involved in performing the action underlying the current state (e.g. moving the hand toward the body for opening in (8)). This suggests that comprehenders simulated the action corresponding to the root verb of the adjectival passive.¹

(8) Die Schublade ist noch geöffnet.
    ‘The drawer is still opened.’

1. More specifically, response times to sentences with adjectival passives such as (8) were faster when responding required a hand movement related to the action that induced the current state than when it required a hand movement related to the action that would change the current state (e.g. described state: (still) opened drawer; inducing action: opening, matching hand movement: toward the body; changing action: closing, matching hand movement: away from the body). In contrast, response times to corresponding sentences with adjectives (e.g. The drawer is still open) showed the opposite pattern: responses were faster for hand movements that matched the (anticipatable) changing action than for hand movements that matched the inducing action. For further details concerning method, results, and interpretation, we refer to the original paper by Kaup et al. (2010). The crucial point for our considerations in the current context is the finding of a mental activation of the inducing action of a described state with adjectival passives but not with adjectives.

The finding by Kaup et al. (2010) is consistent with the notion that the stative property conveyed by an adjectival passive has to be recovered on the basis of its root verb. According to Maienborn (2007, 2009, 2011), an adjectival passive assigns a semantically underspecified, event-based ad hoc property. Instantiating the ad hoc property may require considerable pragmatic effort in the case of non-resultative verbs (e.g. The cat is petted, which is only acceptable with a “job is done” interpretation (Kratzer 2000: 4)), and little or virtually no pragmatic effort with resultative verbs. However, Maienborn argues that the basic mechanism of interpreting adjectival passives as ad hoc property formation is always the same. It generally requires the availability of a salient contrasting state, which crucially determines the specification of the ad hoc property. Maienborn proposes two possible contrast dimensions, the temporal and the qualitative. Which one applies is determined by contextual information, and the two result in two different readings² of adjectival passives.

(9) Der Schatz ist vergraben … ‘The treasure is buried …’
    a. … wir können jetzt fliehen. ‘… now we can make our escape.’   (post state reading)
    b. … und nicht versenkt. ‘… and not sunk.’   (target state reading)
    Example from Gese (2010)

2. Kratzer (2000; see also Brandt 1982) also distinguishes two different readings of adjectival passives. The first is a resultant state reading, which applies when the state is permanent and holds forever after the event denoted by the root verb. The second is a target state reading with a narrower definition than Maienborn’s target state, i.e. one restricted to states that are reversible.

In (9a), the contextual information provides a contrasting state s’ which differs from the linguistically conveyed state s along the temporal dimension: s’ precedes s, and the ad hoc property expressed by the adjectival passive does not hold in s’. On this post state reading, the treasure is classified as being in the post state of a burying event (Maienborn 2009: 46). In contrast, on the target state reading exemplified in (9b), the treasure belongs to the class of buried treasures rather than sunk (or sealed or burned) ones. Here the contrast dimension is qualitative: the contrasting state s’ pertains to some property that is distinct from the ad hoc property conveyed by the adjectival passive and is salient within the given context (Maienborn 2009: 46).

The crucial point of Maienborn’s account for the present paper is that the interpretation of an adjectival passive necessarily involves a contrasting alternative state. A very similar assumption is made in a recent proposal by Gehrke (to appear), who argues that adjectival-passive states are evaluated with respect to an opposite state. Unlike Maienborn, however, she attributes the involvement of a contrasting alternative to semantics rather than pragmatics. We will come back to this difference in the Conclusion.

In what follows, we report a psycholinguistic experiment that aimed to test the psychological plausibility of Maienborn’s (2009) and Gehrke’s (to appear) proposals that the interpretation of adjectival passives involves a contrasting state. Consider sentence (10a). Here the adjectival passive stems from a change-of-state verb (to open) and is embedded in the description of a desired situation. We reasoned that this configuration highlights the changeability of the predicated entity from time t to time t+1. In turn, this may favour a construal of the contrasting state along the temporal dimension, resulting in a post state reading of the adjectival passive. Hence, with a sentence such as (10a), we expect the adjectival passive to be evaluated against a temporally preceding contrasting state. As the root verb is a change-of-state verb, the contrasting state is given by the initial state of the respective event structure, i.e. a closed window. If interpreting adjectival passives does indeed necessitate a contrasting state, this should be reflected in the mental availability of that state during comprehension. For instance, processing sentence (10a) should involve an activated representation of a closed window.

(10) a. Ralf wäre es lieber, wenn das Fenster geöffnet wäre.
        ‘Ralf would like it more if the window were opened.’
     b. Ralf wäre es lieber, wenn das Fenster offen wäre.
        ‘Ralf would like it more if the window were open.’

To investigate this issue, we juxtaposed state descriptions with adjectival passives and with adjectives (see (10a) and (10b)) and compared the mental availability of the contrasting state. If the contrasting state is found to be more available with adjectival passives than with adjectives, this would provide empirical support for the proposal that a contrasting state is involved in the interpretation of adjectival passives.

The experimental material consisted of descriptions of desired states, such as (10a) or (10b). All adjectival passives were derived from change-of-state verbs. As outlined above, we expected this combination to bring forward a post state reading, which should make the contrasting state as unequivocal as possible. That is, for our material, the contrasting state can be assumed to be the initial state of the root-verb event. Using descriptions of desired states as experimental material has a side effect. Even the adjective versions (see (10b)) convey information not only about the linguistically conveyed (desired) state but also, implicitly, about the factual state, since the latter is easily inferable. The factual state always corresponds to the contrasting state (e.g. closed window), so the materials in principle allow for an activation of the contrasting state with the adjective versions of the sentences as well. Our experiment can therefore be considered a conservative test of the prediction that the contrasting state should be more available with adjectival passives than with adjectives. Still, inferring the contrasting (factual) state is not a prerequisite for understanding the adjective versions, whereas the interpretation of the adjectival passive versions is proposed to necessitate the activation of the contrasting state.

2. Experiment

To test the mental availability of the contrasting state, we employed a variant of the sentence-picture verification paradigm (Clark & Chase 1972) suggested to us by B. Kaup (p.c.). After reading a sentence such as (10a) or (10b), participants were presented with a picture that depicted either the linguistically conveyed state, i.e. the desired state, or the contrasting state, i.e. the implied factual state (e.g. an open or a closed window, respectively, as depicted in Figure 1).

Figure 1. Sample pictures used in the experiment, corresponding to the sentence Ralf would like it more if the window were opened/open. Left: picture of the linguistically conveyed state. Right: picture of the contrasting state.

Participants’ task was to press a key as soon as they had identified the depicted object. We measured the picture-identification latencies, taking them to reflect the mental availability of the depicted state.³ If the interpretation of adjectival passives necessarily involves a contrasting state, participants should be faster to identify a picture of the contrasting state after reading a sentence with an adjectival passive than after reading a sentence with an adjective. The pictures of the linguistically conveyed state (e.g. open window) were included as a control condition. They were intended to indicate whether an obtained difference between adjectival passives and adjectives would be specific to the mental availability of the contrasting state.

To gain some insight into the time course of processing adjectival passives, we tapped participants’ mental representations at two different stages. The picture of the linguistically conveyed or contrasting state was presented either immediately after sentence reading or after a delay of 1,500 milliseconds. Assuming that the interpretation of adjectival passives does indeed involve a contrasting state, there are several possibilities with regard to the time course. The activation of the contrasting state may decay rapidly, resulting in different patterns of results at the two times of testing. Alternatively, the activation of the contrasting state may be sustained, in which case the pattern of results should not differ across the two times of testing. However, the main focus of the present study was to test experimentally the general prediction that processing adjectival passives involves the activation of a contrasting state.

3. The picture-identification task asked participants to identify the visually presented object independently of the depicted state of the object. Still, the task can be expected to be facilitated when the depicted state corresponds to a state that is activated in the course of processing the preceding sentence. Research on visual perception indicates that object recognition is facilitated by top-down processes based on semantic knowledge or contextual information (e.g. Abdel Rahman & Sommer 2008; Bar 2004). Moreover, empirical findings suggest that linguistic information (e.g. a duck in the lake vs. a duck in the air) shapes early visual processing in subsequent object recognition tasks (Hirschfeld & Zwitserlood 2011; Hirschfeld, Zwitserlood & Dobel 2011; see also Potter & Faulconer 1979).

2.1. Method

Sixty-four participants from Saarland University took part in the experiment and received monetary compensation for their participation. All participants were native speakers of German. The materials comprised sets of sentences, pictures, words, and questions. There were 32 experimental sentences, all of the type X wäre es lieber, wenn Y Z wäre (‘X would like it more if Y were Z’). X was a proper name referring to an individual, Y denoted an object, and Z expressed a property. There were two versions of each experimental sentence: in one version, Z was an (adjectival-passive) participle; in the other, Z was an adjective (as illustrated in (10a) and (10b)). The root verbs of all adjectival passives were change-of-state verbs. In each of the adjective versions, the adjective denoted the result state of the corresponding adjectival passive’s root verb (e.g. open–opened, empty–emptied). In approximately half of the experimental sentences (n = 17), the root verb of the adjectival passive was a deadjectival verb (e.g. spitzen ‘to sharpen’, derived from spitz ‘sharp’). In the remaining experimental sentences (n = 15), the root verb was not derived from an adjective (e.g. bügeln ‘to iron’).


The filler items served to conceal the purpose of the experiment. We constructed 50 filler sentences that resembled the experimental sentences in length, form, and complexity, and that likewise ascribed a propositional attitude. However, the propositional attitude expressed by the filler sentences differed from the desiderative attitude expressed by the experimental sentences. There were three types of filler sentences. One type (n = 16) expressed a positive attitude about some state, such as X findet es gut, dass Y Z ist (‘X finds it good that Y is Z’). The other two types (n = 34) expressed a negative attitude about some state, either in the form X findet es schlecht, dass Y Z ist (‘X finds it bad that Y is Z’) or X wäre es unangenehm, wenn Y Z wäre (‘X would be uncomfortable if Y were Z’). Half of the filler sentences contained an adjectival-passive construction; the other half contained a copula-adjective construction.

There were 114 black-and-white pictures. Sixty-four pictures served as pictures for the experimental trials. They comprised 32 pairs of pictures: each pair depicted the object named in the respective experimental sentence, but in opposite states (see sample pictures in Figure 1). That is, the depicted state corresponded either to the linguistically conveyed state or to the contrasting state. Fifty pictures were used for the filler sentences. Eight filler pictures showed the object named in the corresponding filler sentence, and the remaining 42 filler pictures depicted an object that was not named in the corresponding filler sentence.

To motivate the picture-identification task, it was followed by a picture-word verification task. Participants had to decide whether a given word named the depicted object. There were 82 words, 32 for the experimental pictures and 50 for the filler pictures.
For half of the experimental trials and half of the filler trials, the word named the object depicted in the picture. For the other half of the experimental and filler trials, the word named an object that was not depicted. For twenty-six filler sentences, “Yes/No” questions were constructed to encourage participants to read for comprehension. The questions were related to the content of the corresponding sentence. Half of the comprehension questions required a positive answer and the other half a negative one.

The experiment employed a 2 (Delay: zero vs. 1500 ms) × 2 (Form: adjectival passive vs. adjective) × 2 (Depicted State: linguistically conveyed vs. contrasting) design, with Delay as the only between-subjects factor. Half of the participants were randomly assigned to the zero-delay condition and the other half to the 1500 ms-delay condition. The assignment of conditions (Form × Depicted State) to experimental sentences/pictures and participants was counterbalanced. Each participant saw eight different items in each of the four conditions (combinations of sentence versions and picture versions). Experimental and filler trials were presented in various mixed random orders. Figure 2 provides an illustration of the trial procedure.

Figure 2. Illustration of the trial procedure for experimental trials.
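The text states only that the assignment of Form × Depicted State conditions to items and participants was counterbalanced. A common way to achieve this is a Latin-square rotation over four presentation lists. The Python sketch below assumes that scheme, which the source does not confirm. The names `CONDITIONS`, `build_lists` and `list_for_participant` are hypothetical. The sketch checks the property reported above: with 32 items, each list contains eight items per condition, and across the four lists every item appears once in every condition.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the two within-subject factors.
FORMS = ("adjectival_passive", "adjective")
STATES = ("conveyed", "contrasting")
CONDITIONS = list(product(FORMS, STATES))  # the four Form x Depicted State cells


def build_lists(n_items=32):
    """Assumed Latin square: in list k, item i receives condition (i + k) mod 4."""
    n = len(CONDITIONS)
    return [
        {item: CONDITIONS[(item + k) % n] for item in range(n_items)}
        for k in range(n)
    ]


def list_for_participant(participant_id, lists):
    """Cycle participants through the lists so each list is used equally often."""
    return lists[participant_id % len(lists)]


lists = build_lists()
for lst in lists:
    # Every item appears once per list, eight items in each condition.
    print(sorted(Counter(lst.values()).values()))  # [8, 8, 8, 8]
```

Under this assumed scheme, cycling the 32 participants of each delay group through the four lists would give eight participants per list in each group. The random interleaving with filler trials is not modelled here.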

Each trial started with a sentence presented on a computer screen for self-paced reading. After reading and understanding the sentence, participants pressed a key, which caused the sentence to disappear. Either immediately or 1500 ms later (depending on the delay condition), a picture was presented. Participants’ task was to press a key as soon as they had identified the object depicted in the picture. The picture was then replaced by a word, and participants indicated whether or not it named the depicted object by pressing one of two keys. On approximately 30% of the trials (all of them filler trials), this was followed by a comprehension question requiring a “yes” or “no” answer. Participants were instructed to respond as quickly and accurately as possible to all tasks during a trial and were allowed to take breaks between trials.

2.2. Results and Discussion

Analyses were carried out on the picture-identification latencies of the experimental trials with correct responses to the word-verification task (accuracy for word verification: 99.3%). We excluded trials with a sentence reading time shorter than 300 ms or longer than 5000 ms, or with a picture-identification latency shorter than 300 ms or longer than 3000 ms (3.6% of the data). Subsequently, picture-identification latencies that deviated by more than 2 SDs from an item’s mean in the respective Form × Depicted State condition were classified as outliers and eliminated (4.6% of the data). In addition, the data from one participant had to be excluded because too few data points (n = 2) remained in one condition after the outlier-elimination procedure.
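The trimming steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors’ analysis script. The `Trial` record and the `within_bounds` and `trim` helpers are hypothetical. The use of the sample standard deviation (`statistics.stdev`) is an assumption, as the text does not say which SD was used. Cells with fewer than two trials are kept because no SD can be computed for them, which is also an assumption. The participant-level exclusion is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Trial:
    participant: int
    item: int
    form: str            # e.g. "adjectival_passive" or "adjective"
    depicted_state: str  # e.g. "conveyed" or "contrasting"
    reading_ms: float    # sentence reading time
    picture_ms: float    # picture-identification latency
    word_correct: bool   # accuracy on the picture-word verification task


def within_bounds(t: Trial) -> bool:
    """Absolute cut-offs from the text: reading 300-5000 ms, picture 300-3000 ms."""
    return 300 <= t.reading_ms <= 5000 and 300 <= t.picture_ms <= 3000


def trim(trials, sd_criterion=2.0):
    # Steps 1-2: keep correct word verifications that pass the absolute cut-offs.
    kept = [t for t in trials if t.word_correct and within_bounds(t)]

    # Step 3: per-item mean and SD within each Form x Depicted State cell.
    cells = defaultdict(list)
    for t in kept:
        cells[(t.item, t.form, t.depicted_state)].append(t.picture_ms)
    stats = {k: (mean(v), stdev(v)) for k, v in cells.items() if len(v) >= 2}

    result = []
    for t in kept:
        key = (t.item, t.form, t.depicted_state)
        if key not in stats:
            result.append(t)  # assumption: singleton cells cannot be SD-trimmed
            continue
        m, sd = stats[key]
        if abs(t.picture_ms - m) <= sd_criterion * sd:
            result.append(t)
    return result
```

The absolute cut-offs are applied before the SD criterion is computed, matching the order described in the text, so implausibly fast or slow trials do not inflate the per-item SDs.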


Table 1 shows the mean picture-identification latencies as a function of Form and Depicted State, separately for the zero-delay and the 1500 ms-delay conditions. An alpha level of 0.05 was used for all statistical tests. Data were first submitted to a 2 (Delay) × 2 (Form) × 2 (Depicted State) mixed analysis of variance. There was a significant main effect of Delay, F(1,61) = 5.56, MSE = 1,619,006, p < 0.05: picture-identification latencies were shorter in the 1500 ms-delay condition than in the zero-delay condition. This effect is of little interest, however. It could be attributed to a higher processing load in the zero-delay condition, or it may simply be due to the between-subjects manipulation of delay. Most relevant to the present issue, there was a highly significant interaction of Form and Depicted State, F(1,61) = 6.91, MSE = 14,258, p = 0.01. No other effects or interactions were significant, all Fs < 1.

Table 1. Mean picture-identification latencies (in ms) as a function of Form and Depicted State, separately for the two delay conditions.

Depicted State            zero-delay                                 1500 ms-delay
                          adjective   adjectival passive   95% CI    adjective   adjectival passive   95% CI
linguistically conveyed   1080        1123                 ±42       919         955                  ±51
contrasting               1118        1085                 ±47       967         923                  ±34

Note. The 95% CIs are within-subject confidence intervals (cf. Masson and Loftus 2003) associated with the contrast between adjectives and adjectival passives in the respective Depicted State condition.
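To make the Form × Depicted State interaction in Table 1 concrete, the Python sketch below averages the table’s cell means over the two delay groups and computes the adjectival-passive minus adjective difference for each Depicted State. This is descriptive arithmetic only. The average is unweighted, which is an assumption: after the exclusion the two delay groups had 32 and 31 participants, and the source does not say which group lost the participant. The sketch does not reproduce the reported t-tests, which were computed on participant-level data. `TABLE_1`, `collapsed_mean` and `form_effect` are hypothetical names.

```python
# Cell means (ms) transcribed from Table 1: (delay, depicted state, form) -> latency.
TABLE_1 = {
    ("zero", "conveyed", "adjective"): 1080,
    ("zero", "conveyed", "adjectival_passive"): 1123,
    ("zero", "contrasting", "adjective"): 1118,
    ("zero", "contrasting", "adjectival_passive"): 1085,
    ("1500", "conveyed", "adjective"): 919,
    ("1500", "conveyed", "adjectival_passive"): 955,
    ("1500", "contrasting", "adjective"): 967,
    ("1500", "contrasting", "adjectival_passive"): 923,
}


def collapsed_mean(state, form):
    """Unweighted mean over the two delay groups (assumes equal weighting)."""
    vals = [v for (_, s, f), v in TABLE_1.items() if s == state and f == form]
    return sum(vals) / len(vals)


def form_effect(state):
    """Adjectival passive minus adjective; negative means faster after adjectival passives."""
    return collapsed_mean(state, "adjectival_passive") - collapsed_mean(state, "adjective")


for state in ("conveyed", "contrasting"):
    print(state, form_effect(state))
# conveyed 39.5
# contrasting -38.5

interaction = form_effect("conveyed") - form_effect("contrasting")
print(interaction)  # 78.0
```

The opposite signs of the two differences are what the significant Form × Depicted State interaction reflects.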

Since there were no significant interactions involving the factor Delay, the data were collapsed across the two delay conditions, and pairwise comparisons were conducted separately for the two levels of the factor Depicted State. For pictures showing the linguistically conveyed state (e.g. open window), there was a trend toward shorter picture-identification latencies after sentences with adjectives than after sentences with adjectival passives, t(62) = 1.76, p = 0.08. In contrast, for pictures showing the contrasting state (e.g. closed window), picture-identification latencies were significantly shorter after sentences with adjectival passives than after sentences with adjectives, t(62) = 1.99, p = 0.05.

The pattern of results clearly indicates that comprehenders are sensitive to the differences between adjectives and adjectival passives. This is in line with previous studies (Kaup et al. 2010; Stolterfoht et al. 2010). The specific contribution of the present results is that they provide empirical evidence for the assumption that the interpretation of adjectival passives involves a contrasting state (Gehrke to appear; Maienborn 2009). Picture-identification latencies for contrasting states (e.g. closed window) were shorter after processing a sentence with an adjectival passive than after processing a sentence with an adjective. This suggests a difference in mental availability, consistent with the proposed activation of a contrasting state when interpreting an adjectival passive. The interaction between Form and Depicted State indicates that this difference cannot be attributed to a general effect but seems to be specific to the contrasting state. For pictures of the linguistically conveyed state (e.g. open/opened window), identification latencies were not shorter after adjectival passives than after adjectives; rather, there was a trend toward the opposite pattern. We will come back to this point in the Conclusion.

Notably, Delay did not interact with the other factors, indicating that the pattern of results did not differ reliably between the two delay conditions. More specifically, the picture-identification latencies indicate that the contrasting state was more available after an adjectival passive than after an adjective both immediately after reading the sentence and 1,500 ms later. With regard to the time course, this suggests that the activation of the contrasting state is sustained rather than rapidly decaying.

3. Conclusion

Our study addressed the interpretation of adjectival passives. Unlike an adjective, which ascribes a lexically coded property, an adjectival passive ascribes a property that is generally not lexicalized but needs to be inferred on the basis of the root verb’s event (e.g. Embick 2004; Maienborn 2009; Schlücker 2009), possibly incorporating contextual information and background knowledge. Typically, the property assigned by an adjectival passive corresponds to the result or outcome of the underlying event. However, as pointed out by Maienborn (2009; 2011), interpreting an adjectival passive calls for more than deriving a result state. On the one hand, it is not always clear what constitutes the result state (consider, for example, cited; cf. Maienborn 2011). On the other hand, adjectival passives characteristically do more than just express a result state; for example, they may imply information concerning the quality of the predicated object (consider The manuscript is submitted, which may imply ‘better than being prepared but not as good as being published’; cf. Maienborn 2009).

Interpreting adjectival passives: Evidence for the activation of contrasting states

Maienborn proposes that an adjectival passive is a means to convey an ad hoc property that is derived on the basis of a contrasting state. That the interpretation of an adjectival passive involves an alternative state is also put forward in a recent account by Gehrke (to appear), who assumes that the state expressed by an adjectival passive is evaluated with regard to an opposite state. The results of our study are consistent with these proposals. Identification latencies for pictures showing the contrasting state were shorter after reading a sentence with an adjectival passive than after reading a sentence with an adjective (e.g. Ralf would like it more if the window were opened vs. open). This finding indicates a difference in mental availability, which corresponds to the prediction that the contrasting state is activated when an adjectival passive is interpreted. As elucidated in the Introduction, our experiment constituted a conservative test of the prediction. In both versions, adjectival passive and adjective, the sentences described desired states and could have invited the inference that the contrasting state holds in the factual situation. Thus, an activation of the contrasting state was conceivable for both versions. However, the crucial difference is that understanding the adjective versions does not require inferring and activating the contrasting state, whereas the activation of the contrasting state is proposed to be essential for interpreting the adjectival passive versions. As a control, our study also included a test of the mental availability of the linguistically conveyed desired state (e.g. a picture showing an open window). Here, we did not find any indication of enhanced availability following adjectival passives. This suggests that the effect is specific to the availability of the contrasting state. Identification latencies for the pictures depicting the linguistically conveyed state were shorter after adjectives than after adjectival passives.
We want to point out that this difference was not significant. Still, the direction of the descriptive difference may be taken as an indication that the linguistically conveyed desired state was more available after adjectives than after adjectival passives. At first sight this may be surprising, considering that one would expect a representation of the linguistically conveyed state for both forms – possibly delayed in the case of the adjectival passive. Yet, even a significant difference in the picture-identification latencies would not necessarily imply that the linguistically conveyed state was not available after reading the adjectival passive. Prolonged picture-identification latencies for the linguistically conveyed desired state with adjectival passives could also be accounted for by assuming interference between two activated state representations: a representation of the contrasting state and a representation of the linguistically conveyed state.


As to the time course of the activation of the contrasting state, the pattern of picture-identification latencies did not differ reliably between the two times of testing. More specifically, the results indicated an enhanced availability of the contrasting state in the adjectival passive conditions both immediately after reading and after a delay of 1,500 ms. This suggests that the activation of the contrasting state does not decay rapidly but is rather long-lasting. It is tempting to speculate that this sustained activation indicates that the contrasting state is not only involved in the process of deriving the property expressed by the adjectival passive but also becomes part of the meaning representation. However, the present results do not allow any conclusions in this regard; investigating this issue is an interesting task for future research. It could be objected that the main finding of our study, i.e. enhanced availability of the contrasting state, could also be accounted for without assuming that the contrasting state is involved in deriving the property implied by the adjectival passive. In our study, the root verbs of the employed adjectival passives were all change-of-state verbs, and the contrasting state corresponded to the initial state of the root verb’s event. If processing an adjectival passive generally activates the root verb’s lexical entry, including the event structure of a change-of-state verb (cf. Moens & Steedman 1988), this may at first sight provide a more parsimonious account of our findings: the enhanced availability of the contrasting state, i.e. the initial state, could be attributed to its activation as part of the event structure. Yet, there is reason to doubt this alternative account.
Psycholinguistic findings suggest that the activation of the different parts of an event structure is modulated by whether the event is described as ongoing or completed (Ferretti, Kutas & McRae 2007; Ferretti, Rohde, Kehler & Crutchley 2009; Morrow 1985), and that the initial state is not always highly activated but that its level of activation may depend on verb aspect. For example, findings from a study by Ferretti et al. (2009) on transfer-of-possession verbs indicate a higher activation of the Goal than of the Source, with the difference being more pronounced with perfective verb aspect. Hence, processing adjectival passives may trigger the activation of the event denoted by the participle’s root verb, but this need not involve an enhanced and persistent mental availability of the event’s initial state. However, on the basis of the present findings, the alternative account cannot be definitely discarded. An interesting and revealing test case would be the mental availability of the contrasting state in cases of a target state reading rather than a post state reading (see Introduction). On a target state reading, the contrasting state does not correspond to the underlying event’s initial state but depends on the respective qualitative dimension. If one could still find enhanced availability of the contrasting state in such cases, then this could not be accounted for by verb-based, lexically driven activation.

If one accepts that our finding of an enhanced mental availability of the initial state can be explained by the activation of a contrasting state in the process of interpreting an adjectival passive, then this finding is in line with both Maienborn’s (2009) and Gehrke’s (to appear) proposals. As mentioned in the Introduction, the two accounts differ in whether the involvement of a contrasting state is attributed to pragmatics or to semantics. According to Maienborn (2009), an adjectival passive expresses an ad hoc property, which is semantically underspecified. Deriving a suitable instantiation of the property is assumed to require pragmatic inferencing on the basis of a contrasting state, either on the temporal dimension (post state reading) or on the qualitative dimension (target state reading). In Gehrke’s (to appear) account, on the other hand, the state expressed by an adjectival passive is semantically specified as the instantiation of a consequent-state kind. The state is evaluated against an opposite state on a scalar dimension. In cases where the event structure of the root verb includes a BECOME component (cf. Dowty 1979), as with change-of-state verbs, the dimension is temporal and the opposite state is given by the respective initial state. According to Gehrke, the temporal scale is the basic one; cases where the opposite state cannot be evaluated on the temporal dimension require pragmatic licensing and call for converting the temporal interpretation of BECOME to a qualitative dimension. In our study, each of the employed adjectival passives was based on a change-of-state verb, and the intended interpretation was a post state reading.
Hence, both Maienborn’s and Gehrke’s accounts yield the same prediction: the contrasting state or opposite state, respectively, should correspond to the initial state of the root verb’s event and should be involved in the interpretation of the adjectival passive, which is consistent with our finding of an enhanced mental availability of the initial state with adjectival passives. So, how could one empirically decide between the two accounts with regard to their psychological plausibility? One possibility is to test for potential context effects. A study that is germane to this issue is the one by Kaup and colleagues (2010), which was already mentioned in the Introduction. As outlined there, their results suggest that when processing an adjectival passive, comprehenders simulate the action denoted by the root verb. However, evidence for such mental simulations was only found when the sentential context contained the temporal particle noch (‘still’, as in The drawer is still opened), but not for otherwise identical sentences without the temporal particle. A sentential context that contains the temporal particle noch (‘still’) highlights the changeability of an entity’s state and in this regard corresponds to the desiderative-modality context of our experimental sentences (e.g. Ralf would like it more if the window were opened), which likewise may draw attention to the alterability of states. Thus, it would be interesting to see whether modifying our material such that the context is “neutral” (e.g. The window is opened) would yield different results from the present experiment. Obtaining different results depending on sentential context could also be revealing with regard to the difference between the adjectival-passive accounts of Maienborn (2009) and Gehrke (to appear). If there were no evidence of an enhanced availability of the initial state with simple assertions, this could be accounted for within Maienborn’s framework by the possibility of a target state reading with a contrasting state on a qualitative dimension, which simply does not correspond to the probed state. In contrast, Gehrke’s account does not leave such a loophole. Within her framework, an adjectival passive that is based on a change-of-state verb is evaluated with respect to an opposite state on the temporal dimension, i.e. the underlying event’s initial state, regardless of whether the adjectival passive occurs in a desiderative-modality sentence or in a simple assertion. Hence, different results for the two sentential contexts would pose a problem for Gehrke’s account. We would like to stress that the findings of the study by Kaup et al. (2010) do not provide a suitable basis for predicting the effect of manipulating the sentential context of the material in our study, not least because the two studies differ with regard to the investigated processes and the employed paradigms. It should also be added that decisive experimental tests of an activation of a contrasting state on a qualitative dimension are hardly possible, as clear-cut predictions are lacking and are difficult to make.
In conclusion, our study provided first empirical evidence for the mental activation of alternative states when comprehending state descriptions with adjectival passives. This finding provides preliminary support for recent accounts that propose the involvement of a contrasting state (Maienborn 2009) or an opposite state (Gehrke to appear) in the interpretation of the state expressed by an adjectival passive. Obviously, our study is just a small step and leaves many questions open. Yet, our results are promising with regard to conducting more precise tests of the implications of these accounts.


Acknowledgments

We are grateful to Ulrike Karg and Kalina Petrova for their assistance in collecting the data, to Helga Gese and Claudia Maienborn for valuable discussions, and to Berit Gehrke for drawing our attention to the opposite state. We also want to thank Britta Stolterfoht and two anonymous reviewers for their comments and suggestions on an earlier version of this paper.

References

Abdel Rahman, Rasha & Werner Sommer 2008. Seeing what we know and understand: How knowledge shapes perception. Psychonomic Bulletin & Review 15: 1055–1063.
Bar, Moshe 2004. Visual objects in context. Nature Reviews Neuroscience 5: 617–629.
Brandt, Margareta 1982. Das Zustandspassiv aus kontrastiver Sicht. Deutsch als Fremdsprache 19: 28–34.
Clark, Herbert H. & William G. Chase 1972. On the process of comparing sentences against pictures. Cognitive Psychology 3: 472–517.
Dowty, David 1979. Word Meaning and Montague Grammar: The Semantics of Verbs and Times in Generative Semantics and in Montague’s PTQ. Dordrecht: Reidel.
Embick, David 2004. On the structure of resultative participles in English. Linguistic Inquiry 35: 355–392.
Ferretti, Todd R., Marta Kutas & Ken McRae 2007. Verb aspect and the activation of event knowledge. Journal of Experimental Psychology: Learning, Memory and Cognition 33: 182–196.
Ferretti, Todd R., Hannah Rohde, Andrew Kehler & Melanie Crutchley 2009. Verb aspect, event structure and coreferential processing. Journal of Memory and Language 61: 191–205.
Gehrke, Berit to appear. Passive states. In: Violeta Demonte & Louise McNally (eds.), Telicity, Change and State: A Cross-Categorial View of Event Structure. Oxford: Oxford University Press.
Gese, Helga 2010. Implicit events and their participants: Experimental studies on adjectival passives. Paper presented at RALFe 2010, 1st Fall Meeting on Formal Linguistics: Language(s) and Cognition, Paris.
Gese, Helga, Britta Stolterfoht & Claudia Maienborn 2009. Context effects in the formation of adjectival resultatives. In: Susanne Winkler & Sam Featherston (eds.), The Fruits of Empirical Linguistics, Volume 2: Product, 125–155. Berlin: de Gruyter.
Glenberg, Arthur M. & Michael P. Kaschak 2002. Grounding language in action. Psychonomic Bulletin & Review 9: 558–565.
Hirschfeld, Gerrit & Pienie Zwitserlood 2011. How vision is shaped by language comprehension – Top-down feedback based on low-spatial frequencies. Brain Research 1377: 78–83.
Hirschfeld, Gerrit, Pienie Zwitserlood & Christian Dobel 2011. Effects of language comprehension on visual processing – MEG dissociates early perceptual and late N400 effects. Brain and Language 116: 91–96.
Kaup, Barbara, Jana Lüdtke & Claudia Maienborn 2010. “The drawer is still closed”: Simulating past and future actions when processing sentences that describe a state. Brain and Language 112: 159–166.
Kratzer, Angelika 2000. Building statives. In: Lisa Conathan, Jeff Good, Darya Kavitskaya, Alyssa B. Wulf & Alan C. L. Yu (eds.), Proceedings of the Twenty-Sixth Annual Meeting of the Berkeley Linguistics Society, 385–399. Berkeley: University of California.
Lang, Ewald 1984. The Semantics of Coordination. Amsterdam: Benjamins.
Levin, Beth & Malka Rappaport 1986. The formation of adjectival passives. Linguistic Inquiry 17: 623–661.
Maienborn, Claudia 2007. Das Zustandspassiv: Grammatische Einordnung – Bildungsbeschränkungen – Interpretationsspielraum. Zeitschrift für Germanistische Linguistik 35: 83–114.
Maienborn, Claudia 2009. Building event-based ad hoc properties: On the interpretation of adjectival passives. In: Arndt Riester & Torgrim Solstad (eds.), Proceedings of Sinn und Bedeutung 13: 35–49. University of Stuttgart.
Maienborn, Claudia 2011. Strukturausbau am Rande der Wörter: Adverbiale Modifikatoren beim Zustandspassiv. In: Stefan Engelberg, Anke Holler & Kristel Proost (eds.), Sprachliches Wissen zwischen Lexikon und Grammatik, 317–343. Institut für Deutsche Sprache, Jahrbuch 2010. Berlin: de Gruyter.
Masson, Michael E. J. & Geoffrey R. Loftus 2003. Using confidence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology 57: 203–220.
Moens, Marc & Mark Steedman 1988. Temporal ontology and temporal reference. Computational Linguistics 14: 15–28.
Morrow, Daniel G. 1985. Prepositions and verb aspect in narrative understanding. Journal of Memory and Language 24: 390–404.
Potter, Mary C. & Barbara A. Faulconer 1979. Understanding noun phrases. Journal of Verbal Learning and Verbal Behavior 18: 509–521.
Rapp, Irene 1996. Zustand? Passiv? – Überlegungen zum sogenannten “Zustandspassiv”. Zeitschrift für Sprachwissenschaft 15: 231–265.
Rapp, Irene & Arnim von Stechow 1999. Fast ‘almost’ and the visibility parameter for functional adverbs. Journal of Semantics 16: 149–204.
Schlücker, Barbara 2009. Passive in German and Dutch: The sein / zijn + past participle construction. Groninger Arbeiten zur Germanistischen Linguistik 49: 96–124.
Stechow, Arnim von 1998. German participles II in distributed morphology. Ms., University of Tübingen.
Stolterfoht, Britta, Helga Gese & Claudia Maienborn 2010. Word category conversion causes processing costs: Evidence from adjectival passives. Psychonomic Bulletin & Review 17: 651–656.
Welke, Klaus 2007. Das Zustandspassiv: Pragmatische Beschränkungen und Regelkonflikte. Zeitschrift für germanistische Linguistik 35: 115–145.

Focus projection between theory and evidence

Kordula De Kuthy & Walt Detmar Meurers

1. Introduction

Research over the past decade has established that the way a sentence is integrated into the discourse can explain constraints previously stipulated in syntax. For example, Cook (2001) explores information-structural conditions on syntactic coherence in German. De Kuthy (2002) and Fanselow & Ćavar (2002) relate the occurrence of discontinuous NPs in German to specific information-structural contexts. De Kuthy & Meurers (2003) show that the realization of subjects as part of a fronted non-finite constituent, and the constraints on it, can be accounted for on the basis of independent information-structure conditions, and Bildhauer & Cook (2010) show that sentences in which multiple elements have been fronted are directly linked to specific information structures. To further explore and refine this line of research, it is essential to be able to refer to an explicit model of the interaction of syntax, information structure, and intonation as part of a formal linguistic architecture. Research investigating this interaction has traditionally been theoretically driven, with the syntactic F-marking approach of Selkirk (1995) serving as one prominent foundation. At the same time, recent work driven mostly by pragmatic and semantic considerations has questioned the very foundation of such an approach. This includes the claim that focus projection, as the fundamental syntactic means of connecting the focus exponent (the word carrying the nuclear pitch accent) and the semantically interpreted focus element, is not needed at all (Roberts 2006; Kadmon 2006, 2009; Beaver & Velleman 2011), or that it is not subject to syntactic constraints (Büring 2006; Fanselow 2008; Fanselow & Lenertová 2011). Importantly, the new approaches do not just differ in terms of their perspective and theoretical interpretation; they make claims about a fundamentally different empirical landscape.
To replace focus projection with a general pragmatic condition building on retrievable1 (Roberts 2006) or expectable2 (Kadmon 2006), the authors must assume that there are significantly more pitch accents than previous approaches have assumed: they assume pitch accents on all elements which are part of the focus and are not retrievable/expectable.3 And the proposal of Büring (2006) that focus projection is in principle always possible negates the empirical subcases delineated by focus projection constraints, which have been the hallmark of the long research tradition building on Selkirk’s F-marking approach. The field thus finds itself in a situation where drastically different perspectives and theoretical interpretations of the syntax-information structure interface rest on wildly different empirical assumptions. In this paper, we bring together the predictions of traditional focus projection on the one hand and of the more recent pragmatics-only approaches (Roberts 2006; Kadmon 2006) on the other, and compare them with two sources of empirical evidence: experimental and corpus-based. In essence, the paper is an empirical exploration of the evidence for focus projection, working out the empirical facts for which a pragmatics-only approach needs to find an alternative explanation. The first source of evidence is experimental research: we review the published experimental results relating to focus projection in English (Gussenhoven 1983; Birch & Clifton 1995; Welby 2003) and in German (Féry 1993; Féry & Herbst 2004; Baumann, Grice & Steindamm 2006; Féry & Kügler 2008). Corpus data, the second source of evidence discussed in this paper, are less prominent in the literature, possibly because linguistically annotated corpora of spoken language are not as widely available as written-language corpora. We explore where annotated corpora can provide empirical evidence for or against the existence of focus projection.

1. Referred to as salient in Roberts (2008).
2. Renamed to recoverable in Kadmon (2009).
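The pragmatics-only prediction stated above, that every non-retrievable element carries an accent, can be made concrete with a toy checker. The sketch rests on several assumptions that are not part of the source: retrievability is supplied by hand as a set of lower-case word forms (Roberts defines it over denotations and the discourse context; see section 3), `FUNCTION_WORDS` is a hypothetical stand-in for the distinction between substantive and function words, and accents are given as booleans. The function name `accent_mismatches` is likewise illustrative. It flags unaccented substantive words that are not retrievable, and accented words that are retrievable; the latter is only a soft clash, since novelty is an implicature, and a focus made up entirely of given material, as in example (3b) in section 2.2, must still carry an accent somewhere.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for the function words excluded from "substantive" elements.
FUNCTION_WORDS = {"a", "an", "the", "with", "of"}

Word = Tuple[str, bool]  # (word form, carries a pitch accent?)


def accent_mismatches(words: List[Word], retrievable: Set[str]) -> List[str]:
    """Toy check of a retrievability-based accent condition on one utterance.

    `retrievable` is a hand-supplied set of lower-case word forms treated as
    retrievable from the context (an assumption: the real notion concerns
    denotations, not strings).
    """
    mismatches = []
    for form, accented in words:
        key = form.lower()
        if key in FUNCTION_WORDS:
            continue  # only substantive words are checked
        if not accented and key not in retrievable:
            # would violate the retrievability presupposition
            mismatches.append(f"unaccented, not retrievable: {form}")
        elif accented and key in retrievable:
            # clashes only with a defeasible novelty implicature
            mismatches.append(f"accented, retrievable: {form}")
    return mismatches


# (2b) He rented a GREEN convertible; 'he', 'rented', 'convertible' treated as retrievable.
print(accent_mismatches(
    [("He", False), ("rented", False), ("a", False), ("GREEN", True), ("convertible", False)],
    {"he", "rented", "convertible"},
))  # → []

# (4c) [John rented a BICYCLE]F answering "What happened yesterday?", nothing retrievable.
print(accent_mismatches(
    [("John", False), ("rented", False), ("a", False), ("BICYCLE", True)],
    set(),
))  # → ['unaccented, not retrievable: John', 'unaccented, not retrievable: rented']
```

The second call shows the crux of the debate in miniature: the single-accent realization that traditional focus projection licenses for wide focus is exactly the configuration a retrievability-only condition rejects unless John and rented are retrievable.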
The paper is organized as follows: In section 2 we introduce the background, including the core prosodic and pragmatic concepts, the relation between prosody and focus, and focus projection as a proposal for relating pitch accent placement and focus interpretation. Section 3 then sketches the recent pragmatics-only proposals, which eliminate focus projection. On this basis, section 4 turns to the published experimental evidence that addresses the empirical reality of focus projection. While the bulk of the experiments deal with perception (section 4.1), section 4.2 also reports on some production studies. In section 5, we complement the survey of experimental results from the literature with our own exploration of two spoken language corpora, the IMS Radionewscorpus (section 5.1) and a section of the Verbmobil corpus (section 5.3). As with the experimental evidence, the goal is to look for data relevant to choosing between a focus projection and a pragmatics-only approach. The empirical landscape emerging from this exploration is significantly more complex than predicted under either approach. In the final section 6 we thus argue for extending the empirical research and suggest that an empirically adequate approach will need to combine both perspectives in an architecture supporting both pragmatic and syntactic constraints on focus projection.

3. Notions such as retrievable can be seen to stand in the tradition of givenness (Schwarzschild 1999; Wagner, to appear) and earlier related notions such as construability from the context (c-construable, Culicover & Rochemont 1983) – though note that Schwarzschild’s approach in addition includes syntactic F-marking.

2. Background

Languages differ with respect to how the information structure of an utterance is encoded. Linguistic means of marking information structure include word order, morphology, and prosody. English and German are so-called intonation languages, in which information structure is signaled by the intonation of an utterance.

2.1. Intonation

We here follow the autosegmental-metrical theory of phonology (Liberman 1975; Pierrehumbert 1980) in which a pitch accent is defined “as a local feature of a pitch contour – usually a pitch change, and often involving a local maximum or minimum – which signals that the syllable with which it is associated is prominent in the utterance.” (Ladd 2008, p. 48) The presence and nature of a pitch accent is argued to be an indicator of the discourse function of a particular part of a sentence (cf., e.g. Beckman & Pierrehumbert 1986; Grice, Baumann & Benzmüller 2005). For the issue targeted by this paper, clarifying the empirical reality of focus projection, it turns out that a basic prosodic analysis in terms of the presence or absence of pitch accents is sufficient. Naturally, a more elaborate prosodic analysis distinguishing different types of phonological domains, pitch accents, and prenuclear accents will be important when going beyond the fundamental architectural issue targeted here.


2.2. Information structure

The most widely discussed discourse function is the focus, which has been characterized in a variety of ways as the “most important” or “new” information of an utterance (cf. Krifka 2008). The focus can be defined as the part of an answer that corresponds to the wh-part of a question.4 This question-answer congruence is not always explicitly expressed in discourse. Instead, a number of theories assume that a coherent discourse is structured by implicit Questions Under Discussion (QUD) (cf., e.g. Roberts 1996; Büring 2003). As a simple example with an explicit question, consider the question in (1a), which asks for the object that John is renting.

(1)  a. What did John rent?
     b. He rented [a BICYCLE]F.

The answer in (1b) provides the element asked for, the focus of the utterance: Out of the various alternative things John could have rented, he picked a bicycle. The word bicycle is shown in capitals to indicate that it contains a syllable bearing a nuclear pitch accent. In this most basic case, the focused material is thus marked by a pitch accent and consists of information that is new in the discourse. The interesting questions arise when one considers situations in which the relation between focus, pitch accent, and new information is less direct. Let us first consider the dissociation of focus and new information. To explore it, we add the context in (2), which introduces some conference participants, Bill, the rental of vehicles, and red and blue convertibles into the discourse. Based on this context, essentially following Schwarzschild (1999, p. 146), we again consider question (2a), which asks for the object that John is renting.

(2)  The conference participants are renting all kinds of vehicles. Yesterday, Bill came to the conference driving a red convertible and today he’s arrived with a blue one.
     a. What did John rent?
     b. He rented [a GREEN convertible]F.

4. We only use the term focus in this formal pragmatic sense to avoid confusion with the prosodic notion, which we only refer to as focus exponent or pitch accent.


One can now answer this question with sentence (2b), where a green convertible is the focus: Out of all the things John could have rented, he picked a green convertible. In this focus, only green is new to the discourse, whereas convertibles were already given in the context. That the focus is indeed the full expression a GREEN convertible can also be confirmed by adding the focus-sensitive expression only in front of the verb in (2b): in the context of (2a), only in the sentence He only rented a green convertible is clearly interpreted as taking scope over the entire NP meaning. Pushing the dissociation of focus and new information to the extreme, it is possible for the focus to consist entirely of material already given in the context, as illustrated by (3b).

(3)  In the rental lot, there were two bicycles and a motorcycle.
     a. What did John rent?
     b. He rented [a BICYCLE]F.

While focus and new information can thus be clearly dissociated, the distribution of new information within the focus has a direct impact on the realization of the prosodic indicators of focus, which we turn to next.

2.3. Relation between prosody and focus

To examine this relation, we need to take a closer look at the prosodic indicators of information structure. More specifically, we need to determine how focus is related to the occurrence of pitch accents. In the simplest case, which we saw in (1), every substantive5 element of the focus contains a pitch accent, i.e., there is a one-to-one correspondence. Yet, this is not generally the case. The same prosodic realization of a sentence, with a single pitch accent on the object bicycle, is traditionally also assumed to be appropriate in a context with a broader focus. This is illustrated by (4), where three different questions are paired with the prosodically identical answer.

(4)  a. What did John rent?
        John rented [a BICYCLE]F.      (narrow, NP focus)
     b. What did John do?
        John [rented a BICYCLE]F.      (wide, VP focus)
     c. What happened yesterday?
        [John rented a BICYCLE]F.      (wide, S focus)

5. Here and in the following, substantive elements is used to refer to the non-function words contributing lexical content, e.g., nouns, verbs, adjectives.

In (4a), we see the original question focusing on the object a bicycle. The question in (4b) requires an answer in which the VP rented a bicycle is the focus: Out of the alternative actions John could have performed, renting a bicycle is what he did. And the question in (4c) puts the entire sentence John rented a bicycle into focus: Out of everything that could have happened yesterday, it asserts that John renting a bicycle is what happened. Crucially, the exact same realization of the answer, with a single pitch accent on bicycle, is traditionally assumed to be appropriate for all three focus readings. This flexible relation between pitch accent placement and focus interpretation is referred to as focus projection when the relation is assumed to be mediated by syntax, and a number of lexical and syntactic conditions have been formulated in the literature to define when focus can project in this way (e.g. Gussenhoven 1983; Selkirk 1995; von Stechow & Uhmann 1986; Uhmann 1991; Jacobs 1988, 1993), including the role of word order (e.g. Höhle 1982). Whether the material that is projected over must have particular formal pragmatic properties in the discourse (e.g. being given or new) has traditionally not been discussed in this context. Specifically, it is generally not ruled out that focus projects over unaccented material that is new in the discourse, whereas, as we will see in the next section, the pragmatic approaches impose requirements on any unaccented material included in the focus that rule out unaccented, new material as part of the focus. In addition to such cases of focus projection, where the focus includes substantive material that does not bear a pitch accent, we should also revisit example (2b) from this perspective. It shows that the focus can also include unaccented substantive material when that material is already given in the discourse, so-called deaccenting of given material.
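The syntactic side of focus projection can be illustrated with a toy implementation. The sketch below uses a simplified rendering of the F-marking rules commonly attributed to Selkirk (1995): an accented word is F-marked; an F-marked head licenses F-marking of its phrase; an F-marked internal argument licenses F-marking of its head. The tree for John rented a bicycle, the treatment of VP as the head of S, and the names `Node`, `candidate_foci` and `john_rented_a_bicycle` are assumptions made for illustration; givenness conditions on unaccented F-marked material and word-order effects are ignored. The function returns the phrases that could serve as the focus for a given accent placement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A constituent in a toy phrase-structure tree (hypothetical representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    head: Optional[int] = None  # index of the head daughter (phrases only)
    internal_args: List[int] = field(default_factory=list)  # daughters that are internal arguments
    word: Optional[str] = None  # set on terminals only
    accented: bool = False      # does this word carry a pitch accent?


def _mark(node: Node, marked: List[Node]) -> bool:
    """Post-order pass: return True if node can be F-marked, and record it."""
    if node.word is not None:
        f_marked = node.accented  # an accented word is F-marked
    else:
        child_f = [_mark(child, marked) for child in node.children]
        # An F-marked internal argument licenses F-marking of the head.
        if node.head is not None and not child_f[node.head]:
            if any(child_f[i] for i in node.internal_args):
                child_f[node.head] = True
                marked.append(node.children[node.head])
        # An F-marked head licenses F-marking of the phrase.
        f_marked = node.head is not None and child_f[node.head]
    if f_marked:
        marked.append(node)
    return f_marked


def candidate_foci(tree: Node) -> List[str]:
    """Labels of phrases that may be F-marked, from smallest to largest."""
    marked: List[Node] = []
    _mark(tree, marked)
    return [n.label for n in marked if n.word is None]


def john_rented_a_bicycle(accent: str) -> Node:
    """Toy tree for 'John rented a bicycle' with one pitch accent on `accent`."""
    def leaf(category: str, form: str) -> Node:
        return Node(f"{category}({form})", word=form, accented=(form == accent))

    np_subj = Node("NP_subj", [leaf("N", "John")], head=0)
    np_obj = Node("NP_obj", [leaf("Det", "a"), leaf("N", "bicycle")], head=1)
    vp = Node("VP", [leaf("V", "rented"), np_obj], head=0, internal_args=[1])
    return Node("S", [np_subj, vp], head=1)  # assumption: VP is treated as the head of S


print(candidate_foci(john_rented_a_bicycle("bicycle")))  # → ['NP_obj', 'VP', 'S']
print(candidate_foci(john_rented_a_bicycle("John")))     # → ['NP_subj']

# Questions (4a-c) and the focus each requires (hand-annotated for illustration).
qud = {"What did John rent?": "NP_obj", "What did John do?": "VP", "What happened yesterday?": "S"}
foci = candidate_foci(john_rented_a_bicycle("bicycle"))
for question, required in qud.items():
    print(question, required in foci)
```

With the accent on bicycle, all three foci of (4a–c) are licensed, whereas an accent on the subject does not project beyond the subject NP, because under these simplified rules a subject is neither a head nor an internal argument. Note that the sketch says nothing about whether the unaccented material is given or new; that is precisely the dimension the pragmatic approaches add.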
Since every focus must contain a pitch accent, in such cases of deaccenting the pitch accent must be realized on another, new word in the focus. For sentences in a context where the focus contains no new information, as in example (3b) we saw above, the pitch accent must exceptionally be realized on a given element. Departing briefly from the general background discussion here, it seems clear to us that the information structural nature of the material projected over needs to be taken into account. Consider, for example, (5a) and (5b) in the out-of-the-blue context given.

Focus projection between theory and evidence

(5) John, what’s going on? Why are you so pale?
    a. [I just saw a man with an AXE!]F
    b. [I just saw a chicken with an AXE!]F

In such a wide focus context, sentence (5a) seems more appropriate than (5b). The intuitive explanation seems to be that a chicken is so unexpected here that it needs its own accent, whereas axes are typically carried by men, as in (5a). It remains to be explored whether this kind of non-accenting of material projected over is the same as the deaccenting of given material (as in (3b) in the background section) or whether the notion of givenness involved there is stricter. Last but not least, this intuitive observation needs to be tested experimentally. As we aim to show in this paper, the link between theoretical claims and empirical evidence needs to be strengthened further to resolve the current conflicting assumptions about focus projection, to which we now return.

3. Pragmatic proposals eliminating focus projection

Roberts (2006) proposes to eliminate focus projection entirely and instead presents a general approach deriving the relation between focus and prosody using a notion of retrievability. She defines the following notions:

Accentuation: Freely align pitch accents with words in independently generated prosodic and syntactic structures.
Retrievability presupposition: If a contentful constituent bears no accent, then its denotation is conventionally implicated to be retrievable.
Novelty implicature of pitch accents6: If a constituent bears an accent, then its denotation is not retrievable.

The central notion of retrievability is defined by Roberts as follows:

Retrievability: An expression α as part of an utterance U is retrievable iff

1. α is not the focus7 in a direct answer to the QUD at U (so α by itself cannot serve as a constituent answer), and
2. α has a salient antecedent A and, modulo ∃-type-shifting, A entails the Existential Accent-Closure of α.

6. Roberts (2006) uses the term ‘focus’ in place of ‘pitch accents’ here since she uses focus for both the pragmatic and the prosodic notion. In this paper, we use focus only for the pragmatic notion.
7. Roberts (2006) refers to ‘rheme’ here; for the issues discussed in this paper, Roberts’ notions of rheme and focus seem to be equivalent.

Kordula De Kuthy & Walt Detmar Meurers

Existential Accent-Closure: Replace any maximal constituent such that all of its content words are accented with a variable, and existentially bind all such variables.

Under this approach, any unaccented element must be retrievable. This encompasses two cases which traditionally were dealt with separately: i) unaccented elements which are part of the focus, such as John and rented in (4c), in what is called focus projection in the syntax-based approaches, and ii) unaccented elements such as convertible in (2b), which have been discussed as deaccenting of given material.

Pursuing a related perspective, Kadmon (2006) also proposes to eliminate focus projection and replace it with a formal pragmatic account, but for her the fundamental concept relating accent placement and focus interpretation is the notion of expectability. The core components of her approach are:

Interpretation of pitch accent placement: A word is interpreted as expectable iff it is unaccented.
Expectability: An expression B is expectable in an utterance U iff the following holds: presented with the result of replacing B in U with a variable, it would be possible for the hearer to infer on the basis of prior context that the variable in the actual utterance is occupied by B.

Under Kadmon’s approach, focused elements without nuclear pitch accents, which traditionally were analyzed as part of a projected focus, thus must be expectable, or they must turn out to be accented after all. While there are some interesting differences between the two approaches, the claim that the theories of Roberts (2006) and Kadmon (2006) have in common is that there is no focus projection, i.e., there is no single realization of a sentence with an accent placement that can have ambiguous interpretations with respect to the information structuring of the utterance.
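To make the Existential Accent-Closure step concrete, the following Python sketch applies it to a toy constituent tree for (4c). It is a minimal illustration under our own assumptions: the `Node` representation, the content/function word flags, and the choice that a constituent without any content word never qualifies are hypothetical simplifications. The entailment check against a salient antecedent A (including ∃-type-shifting) is not modelled at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A constituent: leaves carry a word, inner nodes carry children (toy model)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # only set on leaves
    content: bool = False        # leaf is a content word
    accented: bool = False       # leaf bears a pitch accent

    def leaves(self) -> List["Node"]:
        if self.word is not None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def leaf(label: str, word: str, content: bool = True, accented: bool = False) -> Node:
    return Node(label, word=word, content=content, accented=accented)


def all_content_accented(node: Node) -> bool:
    # Assumption: a constituent without any content word does not qualify.
    content_words = [w for w in node.leaves() if w.content]
    return bool(content_words) and all(w.accented for w in content_words)


def existential_accent_closure(root: Node) -> str:
    variables: List[str] = []

    def walk(node: Node) -> str:
        # Top-down: the first qualifying node on a path is maximal, so its
        # daughters are never replaced separately.
        if all_content_accented(node):
            variables.append(f"x{len(variables) + 1}")
            return variables[-1]
        if node.word is not None:
            return node.word
        return " ".join(walk(child) for child in node.children)

    body = walk(root)
    binders = "".join(f"∃{v}" for v in variables)
    return f"{binders} [{body}]" if variables else body


# (4c): John rented a BICYCLE, with a single accent on "bicycle".
sentence_4c = Node("S", [
    Node("NP", [leaf("N", "John")]),
    Node("VP", [
        leaf("V", "rented"),
        Node("NP", [leaf("D", "a", content=False),
                    leaf("N", "bicycle", accented=True)]),
    ]),
])

closure_4c = existential_accent_closure(sentence_4c)  # '∃x1 [John rented x1]'
```

For (4c), only the object NP a bicycle is replaced, leaving John and rented unaccented; under Roberts’ retrievability presupposition both must therefore be retrievable.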
Based on Roberts’ and Kadmon’s approaches, one expects every possible information structuring of a sentence to correspond to a different accent pattern. Take, for example, (6), where the question requires an answer with VP focus.

(6) What did you do?
    a. I invited BILL.
    b. I INVITED BILL.

The utterance in (6a), with a pitch accent on Bill, is traditionally assumed to be a felicitous answer to the VP focus question. In contrast, Kadmon and Roberts claim that this accent placement is not a possible answer to the question in a genuine out-of-the-blue context. It is only an option in a context such as a party, where the unaccented verb invited is expectable/retrievable. According to their approaches, the only possible accent pattern in a genuine out-of-the-blue context is the one in (6b), where both the verb and the NP argument are accented.8 In sum, the empirical predictions of the reconceptualized interface between prosody and information structure as presented by Roberts and Kadmon (henceforth referred to as RK) differ in significant ways from those of the traditional focus projection approaches, yet so far the different predictions have not been empirically explored or tested. In the next section, we therefore compile potentially relevant experimental results from the published literature and discuss them from this perspective. In section 5, we explore potentially relevant data from annotated corpora to investigate the empirical validity of the new approaches and to determine where more empirical evidence is needed to distinguish the competing theoretical proposals.

4. Exploring the experimental evidence

We start the discussion of experimental evidence with research studying the perception of spoken language and then turn to experiments investigating language production in controlled experimental settings. Experiments for English and for German are included, as both of these are intonation languages for which the relation between pitch accents and pragmatic focus has traditionally been assumed to involve focus projection. We discuss here only those studies presenting data that is relevant for the fundamental distinction between focus projection and a pragmatics-only approach, the topic of this paper. Some other experimental and corpus studies address orthogonal aspects of the prosody–information structure interface, such as Stolterfoht & Bader (2004), investigating the effect of word order variation on the processing of focus structures, Baumann & Riester (2012), investigating the prosodic realization of given elements, and Féry & Ishihara (2009), investigating the prosodic realization of second occurrence focus.

8. As pointed out by a reviewer, RK’s explanation does not carry over to German, where in the context of the wide focus question of (6), the verb clearly is unaccented in a sentence such as (31).
(31) Ich habe BILL eingeladen.
     I have Bill invited
     ‘I invited Bill.’

4.1. Perception experiments

4.1.1. Gussenhoven (1983)

Gussenhoven (1983) contains one of the earliest sets of experiments in which the relationship between accent placement and focus is studied. He investigates the hypothesis that a single accent on an argument is sufficient for so-called merged predicate-argument combinations to be focused, whereas this is not possible for other predicate-argument combinations or when a predicate combines with an adjunct. The experiment thus directly addresses the empirical grounding of a particular subcase of focus projection: whether and when focus projection over an unaccented verbal head is possible. For the purposes of this paper, the crucial question is whether Gussenhoven (1983) finds any evidence for focus projection at all. The nature of the constraints on such focus projection, e.g., whether argument vs. adjunct and merging vs. non-merging predicates is the right distinction to make, is an important question to tackle in future work once the fundamental question about the existence of focus projection has been settled. The perception experiment conducted by Gussenhoven to test his hypothesis is a context-retrievability experiment: participants judge whether a question and an answer are from the same dialogue or whether the answer was given in response to another question. The experiment included two sets of data differing in the type of predicates occurring in the VP.

4.1.1.1. Experiment 1

The first set of data only contained so-called merging predicate-argument combinations, which according to Gussenhoven are combinations involving regular, lexically filled argument NPs (in contrast to pronouns and quantifiers). The experiment included two types of questions and two types of answers, as illustrated in (7) and (8):

(7) a. What does he do? (wide, VP focus)
    b. What does he teach? (narrow, argument NP focus)
(8) a. He TEACHES LINGUISTICS. (accents on verb and argument NP)
    b. He teaches LINGUISTICS. (accent on argument NP only)

Gussenhoven hypothesizes that in a sentence with a merging predicate-argument combination and an accent on the argument, such as (8b), the entire VP can be the focus, just as in (8a), where both words in the VP bear an accent. For the experiment, he thus predicts that listeners should not be able to tell any difference between the answers (8a) and (8b) to question (7a). This prediction was confirmed by the results of the experiment: listeners performed no better than chance in judging whether questions and answers were matched. This finding supports the existence of focus projection: to focus the VP, it is sufficient to accent the object NP. The finding is not expected under RK’s theory, where the accents on teach and linguistics in (8a) indicate that neither is retrievable, whereas the sole accent on linguistics in (8b) requires teach to be retrievable. To save the approach without postulating exceptions, one apparently has to argue that when participants hear (8b) following the question (7a), they accommodate a context with different retrievability relations than when they hear (8a) following the same question. Even then it remains unclear, though, how teach in (8a) can fail to be retrievable in a context following the question (7b).

4.1.1.2. Experiment 2

The second set of data investigated by Gussenhoven consisted of sentences with predicate-adjunct combinations and non-merging predicate-argument combinations (involving pronouns and quantifiers). For such answers, Gussenhoven hypothesizes that in the VP focus condition both the predicate and the adjunct or argument should receive an accent. In contrast to the first experiment, listeners should thus be able to match narrow and wide focus questions with the corresponding one- or two-accent answers.
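Gussenhoven’s matching task is a binary decision (same dialogue or not), so performance “no better than chance” or “more frequently than expected by chance” is measured against a guessing baseline. The sketch below is a standard one-sided exact binomial test. It assumes independent judgements and a guessing probability of 1/2; the text does not say which statistic Gussenhoven (1983) used, and the counts in the example are hypothetical.

```python
from fractions import Fraction
from math import comb


def p_at_least(correct: int, trials: int, chance: Fraction = Fraction(1, 2)) -> Fraction:
    """One-sided exact binomial test.

    Probability of getting `correct` or more right answers out of `trials`
    if every judgement were a guess with success probability `chance`.
    """
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    return sum(
        (comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
         for k in range(correct, trials + 1)),
        Fraction(0),
    )


# Hypothetical counts for illustration only (not taken from any study).
p = p_at_least(29, 40)
better_than_chance = p < Fraction(5, 100)
```

Exact fractions avoid rounding issues for small samples; a result like Experiment 1 corresponds to a large p-value, i.e. no evidence that listeners could tell the answers apart.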
For the adjunct case, Gussenhoven’s experiment includes questions such as those in (9), where the wide focus question (9a) is identical to the one used for the merging predicate-argument cases above (7a), but the narrow focus question (9b) focuses on the adjunct PP instead of the object. The answers are shown in (10), with (10a) including one accent on the verb and a second one on the adjunct, whereas (10b) only includes an accent on the adjunct.

(9) a. What does he do? (wide, VP focus)
    b. Where does he teach? (narrow, adjunct PP focus)
(10) a. He TEACHES in GHANA. (accents on verb and PP adjunct)
     b. He teaches in GHANA. (accent on PP adjunct only)

Examples (11) and (12) illustrate the questions and answers used in the experiment for non-merging predicate-argument combinations, here involving a negative existential quantifier as argument.

(11) a. Please tell me what happened that night. (wide, VP focus)
     b. What do you remember from the last lesson? (narrow, NP focus)
(12) a. I REMEMBER NOTHING. (accent on V and NP)
     b. I remember NOTHING. (accent on NP only)

The results of the perception experiment for this second data set, containing both predicate-adjunct and non-merging predicate-argument combinations, confirm Gussenhoven’s hypothesis: listeners matched narrow focus questions (11b) with answers accenting only the NP (12b), and wide focus questions (11a) with answers accenting both the verb and the NP (12a), more frequently than expected by chance. In sum, Gussenhoven’s experiments show that merging predicate-argument combinations allow focus projection from an accented argument over the unaccented verb, whereas non-merging predicates and head-adjunct combinations are not among the syntactic patterns which allow focus projection. For these results to be compatible with RK’s approach, any such difference would have to be shown to arise systematically from differences in retrievability/expectability.

4.1.2. Birch & Clifton (1995)

Birch & Clifton (1995) revisit the issues of Gussenhoven (1983) using two experimental tasks: a make-sense judgement task, in which subjects judge the appropriateness of a dialogue and the time to make a yes/no judgement is measured, and a linguistic judgement task, in which subjects rate prosodic appropriateness on a Likert scale (1–5). Using these two tasks, they investigate focus projection for merging and non-merging predicate-argument combinations, resulting in a total of four experiments.

4.1.2.1. Experiments 1 and 2

The examples in (13) and (14) show the relevant question and answers used in the dialogues of the first two experiments, covering the merging predicate-argument combinations.

(13) Isn’t Kerry pretty smart? (wide, VP focus)
(14) a. Yes, she TEACHES MATH. (accents on V and NP)
     b. Yes, she teaches MATH. (accent on NP only)

For the linguistic judgement task in experiment 1, subjects showed a small but significant preference, after the broad VP focus question (13), for answers with accents on both V and NP (14a) over answers with an accent only on the NP (14b). This result contrasts with Gussenhoven’s finding that subjects in a wide focus context were unable to distinguish between the two options, a sentence with two pitch accents and a sentence where focus projects from a single pitch accent on an argument. For the make-sense judgement task in experiment 2, Birch & Clifton report that for the VP focus question (13) they observed the same reaction times for answers with accents on both V and NP (14a) as for answers in which only the NP is accented (14b). This supports the hypothesis that focus can project from a pitch-accented argument. In sum, Birch & Clifton (1995) interpret the results of these two experiments as indicating that for merging predicates, accenting the verb of a focused VP is optional.

4.1.2.2. Experiments 3 and 4

The second set of experiments used questions supporting VP focus and answers with non-merging predicate-argument combinations, which Birch & Clifton refer to as “non-lexical” argument NPs. This is illustrated with negative quantifiers in (16), answering question (15).

(15) What can you tell me about the math program at Cornell this year?
(16) a. They ACCEPTED NO ONE.
     b. They accepted NO ONE.

The results of these two experiments are, surprisingly, the opposite of those obtained in the first two experiments. In the linguistic judgement task in experiment 3, subjects showed no preference for answers with accents on V and NP (16a) over answers with an accent only on the NP (16b). In the make-sense judgement task in experiment 4, Birch & Clifton obtained faster response times for answers with accents on both V and NP (16a) than for answers with an accent on the NP only (16b).


The overall conclusion that Birch & Clifton draw from these mixed results is that accented lexically filled argument NPs project focus, while non-lexically filled ones do not, which is supported by the results of experiment 2 (make-sense judgement task, lexically filled) and experiment 4 (make-sense judgement task, non-lexically filled). This does not explain, however, why in experiment 1 (linguistic judgement task, lexically filled) subjects preferred sentences with two accents in a wide VP focus context. And it leaves unexplained why in experiment 3 (linguistic judgement task, non-lexically filled) no distinction between the single- and the double-accented answers was observed.

4.1.3. Welby (2003)

Welby (2003) investigates the influence of prosodic phrasing on focus projection. Gussenhoven (1983) and Birch & Clifton (1995) probed the existence of focus projection by checking whether an accent on the verb is required to obtain a broad VP focus. Welby (2003) distinguishes two prosodic patterns with respect to the accented verb: one where the verb has a prenuclear accent and occurs in the same prosodic phrase as the accented NP argument (a hat pattern), and one where the accented verb carries a nuclear accent and the verb and the NP argument each form their own prosodic phrase (a two-peak accent pattern). Similar to Gussenhoven (1983), Welby (2003) uses two types of questions in her experimental setup, one for VP focus, as illustrated in (17a), and one for object NP focus, illustrated in (17b).

(17) a. What’s that terrible smell coming from the neighbors’ yard? (VP focus)
     b. There’s a terrible smell coming from the neighbors’ yard. What are they burning? (object NP focus)

There are four possible answer types, which are illustrated in (18). To make the relevant answer intonation patterns explicit, Welby (2003) uses the ToBI system (Beckman & Pierrehumbert 1986), which, based on the autosegmental-metrical approach to intonation, describes the perceived intonation contour in terms of high (H) and low (L) targets in the local pitch range. For English, seven accents are distinguished, with * marking the tone on the accented syllable: the monotonal H* and L*, and the bitonal H*+L, H+L*, L*+H, L+H*, and H*+H. Intonational boundaries are marked with a strength of 0–4, with the tones of intermediate boundaries (0–3) notated as H- or L- and those of full boundaries (4) as L% or H%.

(18) a. They’re BURNING their garbage. (verb)
        H* L-L%
     b. They’re burning their GARBAGE. (obj-NP)
        H* L-L%
     c. They’re BURNING their GARBAGE. (“hat”)
        H* H* L-L%
     d. They’re BURNING their GARBAGE. (two peak)
        H* LH* L-L%
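The tone tiers in (18) use the English ToBI labels described above. As a small illustration of that notation, the Python sketch below classifies single labels and extracts the starred tone of a pitch accent. It assumes exactly the seven accents listed in the text and treats combined labels such as L-L% as a phrase accent followed by a boundary tone; GToBI labels such as H*L, used in the German corpus discussed in section 5, are deliberately not covered.

```python
# Inventory as listed in the text (English ToBI); an illustrative assumption,
# not a complete ToBI validator.
PITCH_ACCENTS = {"H*", "L*", "H*+L", "H+L*", "L*+H", "L+H*", "H*+H"}
PHRASE_ACCENTS = {"H-", "L-"}      # intermediate boundaries
BOUNDARY_TONES = {"H%", "L%"}      # full boundaries (break index 4)


def classify(label: str) -> str:
    """Classify a single ToBI tone label."""
    if label in PITCH_ACCENTS:
        return "pitch accent"
    if label in PHRASE_ACCENTS:
        return "phrase accent"
    if label in BOUNDARY_TONES:
        return "boundary tone"
    # Combined labels such as "L-L%": phrase accent followed by boundary tone.
    if label[:2] in PHRASE_ACCENTS and label[2:] in BOUNDARY_TONES:
        return "phrase accent + boundary tone"
    return "unknown"


def starred_tone(accent: str) -> str:
    """Return the tone aligned with the accented syllable (the one marked *)."""
    if accent not in PITCH_ACCENTS:
        raise ValueError(f"not a pitch accent: {accent}")
    return accent[accent.index("*") - 1]


# Tone tier of answer (18c), the "hat" pattern.
tones_18c = ["H*", "H*", "L-L%"]
labels_18c = [classify(t) for t in tones_18c]
```

Keeping the inventory in sets makes the categories explicit; extending the sketch to GToBI would mean adding that system’s own label set rather than reusing the English one.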

Each of the two question types was matched with each of the four answer types and, parallel to the linguistic judgement task of Birch & Clifton (1995), listeners were asked to rate the acceptability of the question-answer pairs on a Likert scale. The results showed no significant difference between the two question types, the wide VP focus question (17a) and the narrow object NP focus question (17b). Answers with the “hat” pattern (18c) and the object-NP-only accent pattern (18b) were rated as equally appropriate for both question types. The two-peak pattern (18d) was rated as less acceptable. Answers with a single accent on the verb (18a) were rated the worst. Welby (2003) interprets this as showing that a prenuclear pitch accent does not affect focus structure interpretation. Interestingly, this even holds for the narrow NP focus case, where a prenuclear accent on the verb (which is not part of the focus) did not affect the acceptability rating. This is not expected under Roberts’ approach, where the novelty implicature of pitch accents requires accented material to be irretrievable. For the issue discussed in this paper, the fact that broad and narrow focus contexts resulted in the same judgements provides clear support for focus projection. It is unclear how RK’s approach could explain these identical judgements, given that the unaccented material is retrievable/expectable in one question context but not in the other.

4.1.4. Féry (1993)

In one of the few perception experiments for German, Féry (1993) tested the hypothesis that the same early nuclear pitch accent can signal narrow focus or broad focus. Féry conducted the following context-retrievability experiment: minimal pairs of sentences with a pitch accent on the subject were recorded, once as the answer to a question inducing narrow focus, as in (19a), and once as the answer to a question inducing broad focus, as illustrated in (20a). The experiment thus seems to be parallel to the one conducted by Gussenhoven (1983) for English, but without his second answer type bearing two pitch accents.

(19) a. Wer ist verhaftet worden? (narrow, NP focus)
        who has arrested been
        ‘Who has been arrested?’
     b. GORBATSCHOW ist verhaftet worden.
        Gorbachev has arrested been
        ‘Gorbachev has been arrested.’
(20) a. Hast Du heute die Nachrichten gehört? (wide focus)
        have you today the news heard
        ‘Have you heard the news today?’
     b. GORBATSCHOW ist verhaftet worden.
        Gorbachev has arrested been

The two recorded questions were then randomly paired with the realizations of the answers to obtain the four pairs (19a) – (19b), (19a) – (20b), (20a) – (19b), and (20a) – (20b), and the participants in the experiment had to judge whether a question and an answer came from the same or from a different dialogue. Féry (1993) reports that listeners decided at random whether the realization of the answer, (19b) or (20b), was an answer to the question inducing narrow focus (19a) or to the one inducing broad focus (20a). She thus concludes that there is no difference in tonal realization between a narrow and a wide focus answer, i.e., the same pitch accent on the subject signals broad or narrow focus. The result of the experiment is unexpected under RK’s approach, where a single accent on Gorbachev should not be acceptable in an all-new utterance answering (20a). Even if one were to assume that (20b) in answer to (20a) does differ from the realization (19b) recorded in answer to (19a) by also including an accent on the verb, Roberts (2006) would predict the sentence to be acceptable only as an answer to (20a) and not also to (19a). In either case, the experimental result that subjects associated answers with questions at chance level is not expected.

4.2. Production experiments

Complementing the perception experiments discussed above, there are two recent production experiments that study the different prosodic means used to signal different focus structures, including broad, narrow and contrastive focus.


4.2.1. Baumann, Grice & Steindamm (2006)

Baumann et al. (2006) report on a production experiment testing the prosodic means that speakers use in German utterances with focus domains of various sizes. When going from broad to narrow focus, and on to contrastive focus, Baumann et al. (2006, sec. 3.2) observe that speakers make use of one or more of four strategies: i) increased duration of the focus exponent, ii) a higher peak on the nuclear accent (marking the focus exponent), iii) greater pitch excursion to the peak of the nuclear accent, and iv) delay of the nuclear accent peak. However, there is significant variation in the use of these strategies. Some speakers make a categorical distinction between downstepping and non-downstepping contours, using downstepping for broad focus and non-downstepping for narrow and contrastive focus. Other speakers do not use downstepping contours at all, i.e., all nuclear accents are of the same type regardless of whether the focus domain is narrow or broad.

4.2.2. Féry & Kügler (2008)

Féry & Kügler (2008) report on a related production experiment studying the prosodic means employed in focus domains of various sizes. They observe strategies similar to those of Baumann et al. (2006) in structures with narrow focus versus structures with broad focus. For example, the nuclear pitch accent tends to be higher in narrow focus structures. They also observe significant variation in the use of strategies, for example with downstepping in broad focus structures: not a single speaker uses a downstep pattern in every broad focus, but all speakers use it at least once. The interesting question arising from both production studies (Baumann et al. 2006; Féry & Kügler 2008) is what these gradient production tendencies mean for the perception of the contours. Does the use of these strategies lead to categorical perception differences?
Are there sentences which are realized in such a way that they can only be interpreted with a particular information structuring (i.e., narrow or broad focus)? Which strategies are sufficient to make a realization compatible with a particular information structure? On the other hand, it is important to ask whether the use of one or more of these strategies is required to permit a sentence to occur in one focus context or another. Are there acceptable sentences which are considered possible with a particular information structuring, yet do not include any of these strategies? Which realizations are necessary for indicating a specific information structure? Steindamm (2005) conducted perception experiments with examples generated using the production strategies identified by Baumann et al. (2006). She reports that the strategies identified in the productions do not transparently impact the judgements made by the listeners in a linguistic judgement task, highlighting the importance of further investigating the sufficient and necessary prosodic indicators of focus. In sum, the observed variation raises the question of determining sufficient and required prosodic indicators for focus and how these can be captured theoretically. For a pragmatics-only theory, with its strict linkage between prosodic realization and pragmatic effect without intervening relations such as focus projection, it seems particularly difficult to license such significant variation.

5. Exploring corpus evidence

The review of the experimental evidence, in particular that arising from the perception experiments, supports the existence of focus projection. Yet, we also discussed some contradictory results and a number of aspects which the experiments conducted so far have either not distinguished or not investigated. And we discussed the significant range of prosodic strategies, and the variation in their use, made explicit in recent production studies. To properly evaluate the claims made by the different theoretical approaches relating pitch accents and their information structure effects, more evidence is thus crucially needed. This is even more pressing when going beyond the fundamental architectural question about the existence of focus projection towards an answer to the more general question: In which constructions in which context can (or must) which kind of elements be accented with which type of accents to support focusing of which part of the sentence? While beyond the scope of this paper, the role of different types of syntactic constructions and different types of accents clearly is an important topic for future research. Complementing experimental evidence testing concrete and specific experimental hypotheses, in this section we investigate a second source of empirical data: linguistically annotated corpora. Before diving into the specifics of the corpus used, let us be clear that corpus data needs to be interpreted with care. The fact that a particular type of example was found in a corpus does not necessarily mean that it is a systematic instance which needs to be licensed by linguistic theories. Similarly, the absence of a particular type of example in a corpus does not mean that it should not be licensed, given that, following Zipf’s law (Zipf 1936), most things occur only rarely and corpora are limited in size. Nevertheless, corpus data can provide important empirical insights for theoretical linguistic analysis (cf. Meurers 2005; Meurers & Müller 2009).
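The point about Zipf’s law can be illustrated numerically. The sketch below assumes a pure Zipf distribution with exponent s = 1 over a hypothetical inventory of types (e.g. words or construction types) and computes the share of types whose expected frequency in a sample of a given size is below one, i.e. types one should expect to be missing from the sample. The sizes in the example are invented and say nothing about the IMS Radionews Corpus.

```python
def share_of_types_expected_below_once(n_types: int, n_tokens: int, s: float = 1.0) -> float:
    """Share of types whose expected frequency is below 1 under Zipf's law.

    Under Zipf's law, the type at rank r has probability (1 / r**s) / H,
    where H is the normalising sum over all ranks.
    """
    weights = [1 / r**s for r in range(1, n_types + 1)]
    total = sum(weights)
    rare = sum(1 for w in weights if n_tokens * w / total < 1)
    return rare / n_types


# Hypothetical sizes: 10,000 types, a sample of 10,000 tokens.
share = share_of_types_expected_below_once(10_000, 10_000)
```

With these invented sizes, close to 90% of the types are expected to occur less than once, so the absence of a given type from a sample of that size carries little information.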

5.1. The IMS Radionews Corpus

We base our first corpus exploration on the IMS Radionews Corpus (Rapp 1998), one of the few intonationally annotated corpora of German. It includes recordings of radio broadcasts on the Deutschlandfunk with a total length of 1 hour and 26 minutes, amounting to 514 sentences. The corpus prepared by Rapp (1998) includes manual segmentation into news stories, orthographic transliteration, automatic word alignment, phonetic transcription, and manual prosodic labeling with the Stuttgart version of GToBI. Whether such ToBI annotation represents all of the relevant aspects of prosody, and does so in sufficient detail, is an interesting question (cf., e.g., Breen, Dilley, Kraemer & Gibson in press). While this points to a relevant avenue for future research, for the research issue discussed in this paper, the exploration of evidence for focus projection in comparison to a pragmatics-only approach, the GToBI annotation provided with the corpus is sufficient to access and interpret relevant sets of data. Searching for the relevant focus projection patterns in a corpus is made significantly easier if one can refer to constituents, yet the IMS Radionews Corpus is not syntactically annotated. We therefore parsed the corpus with the Berkeley parser (Petrov & Klein 2007). While the resulting syntactic annotation is far from perfect, we found that it is of high enough quality to search for the relevant patterns with sufficient precision and recall. Following syntactic annotation, we converted the corpus into TIGER-XML format, so that it can be browsed and searched with the TIGERSearch tool (Lezius 2002). The converted corpus includes the orthographic transcription, the phonetic transcription, the ToBI annotation, and the syntactic analysis.

5.2. Exploring focus projection in the IMS Radionews Corpus

To identify potential instances of focus projection, we used TIGERSearch to search the corpus for examples containing complex NPs or VPs with H* or H*L accents. These accents can signal focus in German (Féry 1993; Grice et al. 2005), so sentences in which such an accented syllable is included in a complex NP or VP structure are potential candidates for focus projection. Qualitatively evaluating the results thus obtained, the first observation is that one finds sentences which exemplify the traditional focus projection pattern. For example, in (21)9, the H*L accent falls on the last element (Bosnia) of the PP, but the entire PP constituent is focused in this all-new utterance beginning the news item.

(21) Bundesinnenminister Kanther hat sich gegen die Aufnahme
     federal minister Kanther has self against the acceptance
     L*H L*H L*H
     weiterer Flüchtlinge aus Bosnien ausgesprochen.
     further refugees from Bosnia spoken
     H*L L%
     ‘Federal Interior Minister Kanther has spoken out against accepting further refugees from Bosnia.’

Figure 1 shows the PP structure as it appears in the annotated corpus. Each token is annotated with its phonetic transcription, its part of speech, and its GToBI break index and tone.

9. Audio available at http://purl.org/dm/audio/np-ein-akzent.aiff

Tree (flat parser output): [PP gegen [NP die Aufnahme weiterer Flüchtlinge [PP aus Bosnien]]]

Token         Phonetic transcription   POS    Break   Tone
gegen         ge:1 g@n0                APPR   1       L*H
die           di:1                     ART    1
Aufnahme      aUf1 na:0 m@0            NN     1
weiterer      vaI1 t@0 R@R0            ADJA   1
Flüchtlinge   flYCt1 lIN0 N@0          NN     1
aus           aUs1                     APPR   1
Bosnien       bOs1 ni:0 @n0            NE     2       H*L

Figure 1. PP exemplifying focus projection
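To make the annotation layers of Figure 1 concrete, the following minimal sketch stores the seven tokens with their phonetic, part-of-speech, break and tone annotation in a simplified TIGER-XML-style sentence graph, using Python's standard xml.etree.ElementTree. It is an illustration under stated assumptions, not the conversion script used for the corpus: the attribute names phon, break and tone, the node ids, and the use of "--" for unaccented tokens are our own choices, and a complete TIGER-XML file would additionally declare these features in its header.

```python
import xml.etree.ElementTree as ET

# (word, phonetic transcription, POS, GToBI break index, tone or None), from Figure 1
TOKENS = [
    ("gegen", "ge:1 g@n0", "APPR", "1", "L*H"),
    ("die", "di:1", "ART", "1", None),
    ("Aufnahme", "aUf1 na:0 m@0", "NN", "1", None),
    ("weiterer", "vaI1 t@0 R@R0", "ADJA", "1", None),
    ("Flüchtlinge", "flYCt1 lIN0 N@0", "NN", "1", None),
    ("aus", "aUs1", "APPR", "1", None),
    ("Bosnien", "bOs1 ni:0 @n0", "NE", "2", "H*L"),
]

def build_sentence(sid: str) -> ET.Element:
    """Return a simplified TIGER-XML-style <s> element for the Figure 1 PP."""
    s = ET.Element("s", id=sid)
    graph = ET.SubElement(s, "graph", root=f"{sid}_500")
    terminals = ET.SubElement(graph, "terminals")
    for i, (word, phon, pos, brk, tone) in enumerate(TOKENS, start=1):
        # Hypothetical attribute names; "--" marks an unaccented token.
        ET.SubElement(terminals, "t", {
            "id": f"{sid}_{i}", "word": word, "pos": pos,
            "phon": phon, "break": brk, "tone": tone or "--",
        })
    nonterminals = ET.SubElement(graph, "nonterminals")
    # Flat parser output: [PP gegen [NP die Aufnahme weiterer Flüchtlinge [PP aus Bosnien]]]
    nodes = [  # (node id, category, child ids), innermost node first
        (f"{sid}_502", "PP", [f"{sid}_6", f"{sid}_7"]),
        (f"{sid}_501", "NP", [f"{sid}_{i}" for i in range(2, 6)] + [f"{sid}_502"]),
        (f"{sid}_500", "PP", [f"{sid}_1", f"{sid}_501"]),
    ]
    for node_id, cat, children in nodes:
        nt = ET.SubElement(nonterminals, "nt", id=node_id, cat=cat)
        for child in children:
            ET.SubElement(nt, "edge", label="--", idref=child)
    return s

sentence = build_sentence("s1")
# Tokens carrying any GToBI tone label
accented = [t.get("word") for t in sentence.iter("t") if t.get("tone") != "--"]
```

Keeping all layers as attributes of the same terminal is what allows a single query to combine syntactic and prosodic conditions.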

The automatically obtained syntactic annotation shows flat PP and NP structures, which are sufficient for searching for potentially relevant patterns that we can then analyze qualitatively. Here we were looking for a PP containing a single H*L pitch accent on the rightmost element. The fact that one does find corpus examples such as (21), in which a single nuclear pitch accent seems to be sufficient to support focus on a much larger unit, lends support to the existence of some form of focus projection. Viewed another way, such corpus examples provide concrete cases against which alternative explanations (such as missing accentuation due to retrievability/expectability) would have to hold up.
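The search criterion just described (a phrase whose only H*L accent falls on its rightmost token, with prenuclear accents such as L*H allowed) can be sketched in Python over a toy tree encoding of Figure 1. This is a hypothetical re-implementation for illustration only; the study itself used TIGERSearch, and the tuple-based tree format and the function names are our assumptions.

```python
from typing import Iterator, List, Optional, Tuple, Union

Leaf = Tuple[str, Optional[str]]  # (word, GToBI tone label or None)
Node = Tuple[str, list]           # (category, list of subtrees)
Tree = Union[Leaf, Node]

def is_leaf(t: Tree) -> bool:
    return not isinstance(t[1], list)

def leaves(t: Tree) -> List[Leaf]:
    if is_leaf(t):
        return [t]
    return [leaf for child in t[1] for leaf in leaves(child)]

def phrases(t: Tree) -> Iterator[Node]:
    """Yield every phrasal node, top-down."""
    if not is_leaf(t):
        yield t
        for child in t[1]:
            yield from phrases(child)

def single_rightmost_nucleus(phrase: Node, nuclear: str = "H*L") -> bool:
    """True if the phrase has several tokens and its only `nuclear` accent is on
    the last token; other (prenuclear) accents such as L*H are allowed."""
    tones = [tone for _, tone in leaves(phrase)]
    hits = [i for i, tone in enumerate(tones) if tone == nuclear]
    return len(tones) > 1 and hits == [len(tones) - 1]

def candidates(t: Tree, cats=("PP", "NP", "VP")) -> List[str]:
    return [" ".join(w for w, _ in leaves(p)) for p in phrases(t)
            if p[0] in cats and single_rightmost_nucleus(p)]

# The PP of Figure 1
FIG1 = ("PP", [("gegen", "L*H"),
               ("NP", [("die", None), ("Aufnahme", None), ("weiterer", None),
                       ("Flüchtlinge", None),
                       ("PP", [("aus", None), ("Bosnien", "H*L")])])])
```

Applied to Figure 1, the check succeeds for the outer PP, the NP and the inner PP aus Bosnien alike, which is one reason why such hits still have to be inspected qualitatively.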

Focus projection between theory and evidence


In exploring the corpus, we also found many examples with significantly more accents than are predicted by syntactic theories built on focus projection, with some examples carrying pitch accents on almost all of the words. Example (22)10 shows an NP containing multiple pitch accents, and (23)11 shows an example with an accent on every part of an NP.

(22) Der nordrhein-westfälische
   the North Rhine Westphalian
   H*L
   Ministerpräsident Rau hat den
   prime minister Rau has the
   !H*L L*H L%
   Führungsstreit bei den
   leadership dispute among the
   H*L-
   Sozialdemokraten kritisiert.
   social democrats criticized
   H*L *? L%

(23) Außenminister de Charette versicherte in dem heute von der Zeitung Sydney Morning Herald veröffentlichten Schreiben,
   Foreign minister de Charette assured in a letter published by the newspaper Sydney Morning Herald today
   von den Versuchen auf
   of the testing on
   L*H? L*H- L*H
   dem Mururoa-Atoll werde
   the Mururoa atoll will
   L*H
   keinerlei Gefährdung der Umwelt
   no harm the environment
   H*L H*L L*HL
   ausgehen.
   emanate
   H%
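How far such examples depart from a single nuclear accent can be made explicit by counting accent labels. The sketch below does this for the tone rows of (22), under a simplifying assumption of our own (in line with general ToBI conventions): any label containing * is counted once as a pitch accent, even if it carries a trailing - as in H*L-, while labels without * that end in % or - are counted as boundary tones. The uncertain label *? is counted as an accent. The function names are hypothetical.

```python
from typing import Dict, List

def classify(label: str) -> str:
    """Coarse label class: 'accent', 'boundary' or 'other'."""
    if "*" in label:
        return "accent"
    if label.endswith("%") or label.endswith("-"):
        return "boundary"
    return "other"

def tally(labels: List[str]) -> Dict[str, int]:
    counts = {"accent": 0, "boundary": 0, "other": 0}
    for label in labels:
        counts[classify(label)] += 1
    return counts

# Tone rows of example (22), in order
EX22 = ["H*L", "!H*L", "L*H", "L%", "H*L-", "H*L", "*?", "L%"]
```

For (22) this yields six accents and two boundary tones, far more accents than the single one on a focus exponent that a minimal focus projection analysis would require.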

We also found numerous examples such as (24)12, with accents in positions that are unexpected for current theories. In this all-focus sentence, focus projection approaches would seem to predict a pitch accent on Menschen (people), yet we find the H*L pitch accent further to the left, on Verunsicherung (uncertainty). For RK’s approach to work, it would be interesting to work out why Menschen would be analyzed as retrievable/expectable here.

10. Audio available at http://purl.org/audio/4-multiple-np-accents.aiff
11. Audio available at http://purl.org/dm/audio/3-multiple-np-accents.aiff
12. Audio available at http://purl.org/dm/audio/verunsicherung.aiff



(24) Der deutsche Sparkassen- und Giroverband hat davor gewarnt, die psychologischen und praktischen Probleme bei der Einführung einer gemeinsamen europäischen Währung zu unterschätzen. Die Konvergenzkriterien müßten unbedingt eingehalten werden, betonte Köhler in einem Interview. Bloße Tendenzen reichten dabei nicht aus,
   The German banks warned that the psychological and practical problems with introducing the joint currency should not be underestimated. The convergence criteria must definitely be observed, said Köhler in an interview. Bare tendencies are not sufficient,
   es dürfe nicht zu einer Verunsicherung der Menschen kommen.
   it needs not to a uncertainty of the people come
   *? H*L L%

Finally, there seems to be significant variation in the prosodic realization. This can be exemplified by comparing the realizations of a news item that was repeated in several news broadcasts included in the corpus. In (25)13 we see such a repeated news item, with the two prosodic annotations showing the different ways in which the same sentence was realized.

(25) Der Verband südostasiatischer Staaten, ASEAN, hat heute auf
   the organization southeast Asian nations ASEAN has today on
   L*H? L*!H- L*H H% H*L? L*H H*L L*HL*H% L*H
   seiner Jahrestagung im
   its annual meeting in the
   L*!HL*H
   Sultanat Brunei Vietnam
   sultanate Brunei Vietnam
   L*H- H*L L*H% H*L
   aufgenommen.
   affiliated
   L% %

13. Audio available for the first realization at http://purl.org/dm/audio/vietnam-1.aiff and for the second realization at http://purl.org/dm/audio/vietnam-2.aiff.



Focus projection theories typically make no predictions about prenuclear accents, but arguably would need to be extended to do so. At the same time, it is unclear how a theory such as RK’s, which requires every accented word to be irretrievable or unexpectable and every unaccented word in the focus to be retrievable or expectable, can account for the observed prosodic variation. While the examples discussed above shed some light on the nature and variability of the intonation found in real-life sentences spoken in context, some of their properties will also be related to the nature of the data collected in the IMS Radionews Corpus. Read news speech is a very specific genre, in which text is read to a heterogeneous audience for which some background knowledge is assumed. While a theory explaining the relation between pitch accent and pragmatic focus arguably also has to be able to explain this context of use, a broad empirical basis and sound generalizations require complementing the examples from the IMS Radionews Corpus with other corpora of authentic language arising in real-life tasks. We therefore turn next to the Verbmobil corpus as another potential source of corpus evidence.

5.3 The German Verbmobil Corpus

Our second exploration of the focus projection issue is based on the German Verbmobil Corpus, which consists of spontaneous speech recorded in a dialogue task in the domain of appointment scheduling.14 The German corpus of the first phase (VM1) consists of 13,910 utterances (dialogue turns) with 317,142 words. A small portion of the corpus from the first phase of the project was annotated prosodically with GToBI. Selecting all dialogues that were GToBI-labeled (all from the CDs VM1.1, VM2.1, VM3.1, VM4.1, and VM5.1), we obtained a subcorpus of 917 dialogues, consisting of 1,841 sentences (dialogue turns). The GToBI annotation (Grice, Reyelt, Benzmüller, Mayer & Batliner 1996; Reyelt, Grice, Benzmüller, Mayer & Batliner 1996; Reyelt 1996) used in the VM corpus15 distinguishes the pitch accents L*, H*, H*?, and the bitonal L+H*, L*+H, H+L*, H+!H*, two intermediate phrase boundary tones H- and L-, and four intonation phrase (IP) boundary tones L-L%, L-H%, H-L%, H-H%. The German Verbmobil Treebank (Stegmann et al. 2000) contains dialogue turns arbitrarily extracted from all data collected during both phases of the Verbmobil project, so we cannot use it for the syntactic analyses of the prosodically annotated dialogues. We therefore used the same procedure as for the IMS Radionews Corpus above and parsed the 1,841 GToBI-annotated turns with the Berkeley parser (Petrov & Klein 2007) to be able to search more efficiently for potentially relevant syntactic and prosodic patterns.

14. A reviewer pointed out that the authenticity of the data may be viewed as limited by the fact that these are recordings of people who were told to schedule appointments, i.e., they were acting instead of satisfying genuine real-life needs. At the same time, it is unclear whether pursuing the extremes of authenticity would provide better evidence for the research questions discussed here. Even for lab speech, Xu (2010) provides convincing arguments as to its validity and importance for studying the nature of human speech.
15. Cf. http://www.bas.uni-muenchen.de/forschung/Bas/BasProsodie.html.

5.4 Exploring focus projection in the Verbmobil corpus

We used TIGERSearch to search the Verbmobil corpus for examples containing H* or H*L accents to identify potential instances of focus projection, and evaluated the results manually. The utterances in the Verbmobil Corpus display a variability of accent patterns similar to that found in the IMS Radionews Corpus. We found many examples with significantly more accents than are traditionally assumed by syntactic theories of focus projection, with some examples carrying pitch accents on almost all of the words. In the dialogue in (26), utterance (26b) illustrates a prosodic pattern in which all content words in a broad focus structure carry an H* accent (or a variant of it, such as L+H* or !H*).

(26) a. Wenn Sie mir noch kurz erklären, wie ich zu Ihnen komme.
   Could you briefly explain how I can find you.
   b. Sie finden mich im zweiten Stock in Zimmer zweihundert
   you find me on the second floor in room two hundred
   H* L+H* L+H* H* !H*
   drei
   three
   !H*

The example also shows a pattern discussed in the production study of Baumann et al. (2006) for broad focus: the downstepping of pitch accents towards the end of the focus domain. There is variation in the use of the downstepping pattern, as illustrated in (27).
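The downstepping pattern can be made operational by counting how many accents at the end of an utterance carry the downstep diacritic !. The minimal sketch below (the function name is hypothetical) uses the tone rows of (26b) and of (27b) and (27c); it deliberately ignores phrasing and pitch measurements and treats each label as one accent.

```python
from typing import List

def downstepped_tail(accents: List[str]) -> int:
    """Number of consecutive accents at the end marked with '!' (e.g. !H*, L+!H*)."""
    count = 0
    for label in reversed(accents):
        if "!" not in label:
            break
        count += 1
    return count

# Accent sequences as annotated in the examples
EX26B = ["H*", "L+H*", "L+H*", "H*", "!H*", "!H*"]
EX27B = ["L+H*", "H*", "L+H*"]
EX27C = ["H*", "!H*", "!H*"]
```

The count is 2 for (26b) and (27c) but 0 for (27b), reflecting that only the second of the two answers in (27) is realized with downstep.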


(27) a. Was kann ich für Sie tun?
   What can I do for you?
   b. In unserem Projekt ist unerwartet ein Problem aufgetaucht.
   in our project is unexpectedly a problem surfaced
   L+H* H* L+H*
   c. Wir müssen möglichst schnell eine Besprechung ansetzen.
   we must preferably quick a meeting arrange
   H* !H* !H*

In this dialogue, the utterances (27b) and (27c), which answer the question in (27a), both show H* accents on most of the content words in the broad focus, but only in the second utterance does the speaker use the downstepping pattern. We also found examples with fewer accents in a broad focus pattern. The example in (28) shows a VP focus with L+H* accents only on the arguments. Such unaccented verbal heads and unaccented adjuncts are commonly found in the corpus.

(28) a. Wie sieht das bei Ihnen am Donnerstag aus?
   What does your Thursday look like?
   b. Da muß ich leider zu einem Treffen nach Köln.
   there must I unfortunately to a meeting in Cologne
   H* L+H* L+!H*

And finally, the examples in (29) and (30) illustrate the typical focus projection pattern.

(29) Ich wollt’ Dir gerade ’ne Mail schicken.
   I wanted you just a mail send
   L*+H
(30) Ja, Frau Petz, dann lassen Sie uns doch einen Termin ausmachen.
   yes Mrs Petz then let you us still a date schedule
   H* H* H*

In (29), an all-new utterance at the beginning of a dialogue, the noun Mail carries the single pitch accent in this all-focus sentence. In (30), which is also the opening sentence of a dialogue, there is only one H* accent, on the noun Termin, in the focused clause introduced by dann (then).

In sum, the exploration of the Verbmobil corpus confirms the patterns found for the IMS Radionews Corpus, with some examples illustrating apparent focus projection patterns, others showing substantial additional accentuation, and a significant amount of variation in the realizations. There is an interesting similarity between the additional accentuation and variation in realization we found and the results of Baumann & Riester (2012), who report on a corpus study with two types of data, read speech and spontaneous speech. They investigate the accentuation of different types of given material and report significant additional accentuation running counter to their original expectations. In particular, they did not confirm the hypothesis that given noun phrases are generally deaccented; instead, in many cases given NPs carried a nuclear pitch accent. They also found a clear difference between spontaneous and read speech: while in read speech there was a general tendency to deaccent given NPs, this was not observed in the spontaneous data.

6. Conclusion

In researching the interaction between intonation, information structure and syntax, the question of whether focus projection exists and, if so, how it is constrained plays a central role. Over three decades of research, the different aspects that need to be taken into account at this interface, and the linguistic modeling they require, have become more elaborate. Yet recent theories proposing to eliminate focus projection show that the theoretical proposals deserve to be revisited and reconnected to sound empirical insights. In this paper, we linked the discussion in theoretical linguistics around eliminating focus projection altogether (Roberts 2006; Kadmon 2006) to the empirical evidence for focus projection provided so far by experimental studies. We found that the experimental evidence, in particular that arising from perception experiments, supports the existence of focus projection. Yet we also discussed some contradictory results and a number of aspects that the experiments conducted so far have either not distinguished or not investigated, such as the role of additional accents in broad focus structures. And we reported on the considerable range of prosodic strategies identified by the experiments, such as increased duration of the focus exponent or higher peaks on the nuclear accents, and on the variation in their use made explicit by recent production studies.



We complemented the experimental results with an exploration of evidence in annotated spoken language corpora, illustrating the space of apparently acceptable realizations. In addition to confirming the existence of focus projection patterns, we also found intonation patterns with accentuation that is additional or unexpected from a focus projection perspective. In terms of theoretical interpretation, such significant variation argues against requiring particular intonational patterns for particular information structure uses. We thus conclude that it is relevant and important to further investigate the nature of the sentences and contexts in which variation in the realization is possible: Are there syntactic, semantic, or information structure restrictions on when such variability arises? The corpus-based investigation essentially adds to the experimental evidence on production, but we only see what the speakers realized in a given linguistic context. In corpora, we generally have limited information on the context and the questions under discussion, and we often have no evidence on how the sentences are interpreted by the hearers. One way to push the boundaries of what can be inferred from corpus data is to collect task-based corpora that make concrete what the speaker/writer wanted to do and what information was available to them. Following this line of thinking, we are collecting a corpus of answers to reading comprehension questions (Meurers, Ziai, Ott & Kopp 2011). This written corpus provides access to the text that the questions are about as well as to the actual question being answered, providing a more explicit basis on which to interpret the collected answers and investigate their information structure, though spoken answers would be needed to also investigate prosody with such a corpus.
Task-based corpora bear some similarity to experimental research, and one may want to view corpora and experiments as two sides of a continuum: on the experimental side, from fully controlled, uncontextualized lab experiments at one end of the continuum to more ecologically valid natural experimental tasks and non-interfering online measurements (e.g., the visual world paradigm) in the middle; on the corpus side, from corpora as collections of whatever happens to exist (traditionally news corpora) at the other end of the continuum to corpora resulting from elicitation in controlled tasks (e.g., answering reading comprehension questions asking for information explicitly given in the reading texts) in the middle. The notion of a continuum may also be relevant to the overall topic of this paper. As we discussed, the literature on the prosody-pragmatics interface has been driven by two extreme perspectives, the syntax-driven and the pragmatics-driven approach. These essentially are two distinct but not entirely incompatible perspectives. The first question is whether syntax plays a role in mediating between pitch accent placement and what is interpreted as focus in pragmatics. The second question is whether formal pragmatic aspects (retrievable, expectable) play a role in determining which elements can be part of the focus despite not bearing an accent. The traditional F-marking approach answers the first question with yes and makes concrete how the F-marking of syntactic structure proceeds; no particular pragmatic status of the material projected over is assumed, answering the second question with no. The more recent pragmatics-only approaches answer the first question with no, denying the existence of focus projection and of syntactic constraints on it. They answer the second question affirmatively, with the claim that all parts of the focus that do not bear an accent must be retrievable/expectable. While these two proposals mark the extremes, one can also take a position answering both questions with yes, including syntactic projection (with the possibility that lexical and/or syntactic constraints exist, i.e., have a direct impact on the mediation) together with some pragmatic factors constraining which material can be part of the focus without being accented (i.e., what can be projected over or be deaccented). Any sustainable pragmatic account will need to revisit the lexical, word order, and other syntactic conditions that have been identified in the literature to capture when focus can project. If one wants to limit or rule out syntactic mediation between intonation and information structure, one needs to identify other, pragmatic conditions providing alternative explanations for the constraints traditionally derived from syntax. At the same time, it seems equally clear that for a syntactic focus projection approach to be sustained, an investigation of the formal pragmatic status of the material that can be projected over is needed to fill this important gap. As things stand, neither the syntactic nor the pragmatic perspective alone is sufficient to account for the complex empirical landscape.

Acknowledgements
We would like to thank David Beaver, Daniel Büring, Arndt Riester, Craige Roberts, Philip Schulz, and the audiences at Linguistic Evidence 2010 for helpful pointers and stimulating discussions around the topic of this paper. We are particularly grateful to Britta Stolterfoht and the two anonymous reviewers for their insightful and detailed reviews.



References
Baumann, Stefan, Martine Grice & Susanne Steindamm 2006 Prosodic Marking of Focus Domains – Categorical or Gradient? In Proceedings SpeechProsody 2006, pp. 301–304. Dresden, Germany. http://www.uni-koeln.de/phil-fak/phonetik/Institut/Forschungsschwerpunkte/baumann-grice-steindamm-revised.pdf.
Baumann, Stefan & Arndt Riester to appear Referential and Lexical Givenness: Semantic, Prosodic and Cognitive Aspects. In Gorka Elordieta and Pilar Prieto (eds.), Prosody and Meaning, volume 25 of Interface Explorations. Mouton de Gruyter, Berlin. http://www.uni-koeln.de/phil-fak/phonetik/Institut/Mitarbeiter/sbauman1/sbaum/Baumann-Riester-draft.pdf.
Beaver, David & Dan Velleman 2011 The Communicative Significance of Primary and Secondary Accents. Lingua, 121(11): 1671–1692.
Beckman, Mary & Janet Pierrehumbert 1986 Intonational Structure in Japanese and English. Phonology Yearbook, 3.
Bildhauer, Felix & Philippa Cook 2010 German Multiple Fronting and Expected Topic-Hood. In Stefan Müller (ed.), Proceedings of the 17th International Conference on Head-Driven Phrase Structure Grammar, Paris University Paris Diderot, France, pp. 68–79. CSLI Publications, Stanford. http://cslipublications.stanford.edu/HPSG/2010/bildhauer-cook.pdf.
Birch, Stacy & Charles Clifton Jr. 1995 Focus, Accent and Argument Structure: Effects on Language Comprehension. Language and Speech, 38(4): 365–391.
Breen, Mara, Laura C. Dilley, John Kraemer & Edward Gibson in press Inter-transcriber Reliability for Two Systems of Prosodic Annotation: ToBI (Tones and Break Indices) and RaP (Rhythm and Pitch). Corpus Linguistics and Linguistic Theory. http://people.umass.edu/mbreen/pubs/CLLT-Breenetal.pdf.
Büring, Daniel 2003 On D-Trees, Beans and B-Accents. Linguistics & Philosophy, 26(5): 511–545.
Büring, Daniel 2006 Focus Projection and Default Prominence. In Valéria Molnár and Susanne Winkler (eds.), The Architecture of Focus, volume 82 of Studies in Generative Grammar, pp. 321–346. Mouton De Gruyter, Berlin. http://semanticsarchive.net/Archive/DVmMzI4M/buring.focus.projection.03.pdf.



Cook, Philippa 2001 Coherence in German: An Information-Structural Account. Ph.D. thesis, University of Manchester.
Culicover, Peter & Michael Rochemont 1983 Stress and Focus in English. Language, 59: 122–165.
De Kuthy, Kordula 2002 Discontinuous NPs in German – A Case Study of the Interaction of Syntax, Semantics and Pragmatics. CSLI Publications, Stanford, CA.
De Kuthy, Kordula & Walt Detmar Meurers 2003 The Secret Life of Focus Exponents and What It Tells Us about Fronted Verbal Projections. In Stefan Müller (ed.), Proceedings of the Tenth Int. Conference on HPSG, pp. 97–110. CSLI Publications, Stanford, CA. http://purl.org/dm/papers/dekuthy-meurers-hpsg03.html.
Fanselow, Gisbert 2008 In Need of Mediation: The Relation Between Syntax and Information Structure. Acta Linguistica Hungarica, 55(3–4): 1–17.
Fanselow, Gisbert & Damir Ćavar 2002 Distributed Deletion. In Artemis Alexiadou (ed.), Theoretical Approaches to Universals, pp. 65–107. Benjamins, Amsterdam.
Fanselow, Gisbert & Denisa Lenertová 2011 Mismatches between Syntax and Information Structure. NLLT, 29(1): 169–209. http://www.ling.uni-potsdam.de/~fanselow/files/LeftPeripheralFocus2.pdf.

Féry, Caroline 1993 German Intonational Patterns. Number 285 in Linguistische Arbeiten. Max Niemeyer Verlag, Tübingen.
Féry, Caroline & Laura Herbst 2004 German Sentence Accent Revisited. In S. Ishihara, M. Schmitz and A. Schwarz (eds.), Interdisciplinary Studies on Information Structure, volume 1, pp. 43–75. Universität Potsdam.
Féry, Caroline & Shinichiro Ishihara 2009 The Phonology of Second Occurrence Focus. Journal of Linguistics, 45(2): 285–313.
Féry, Caroline & Frank Kügler 2008 Pitch Accent Scaling on Given, New and Focused Constituents in German. Journal of Phonetics, 36(4): 680–703.
Grice, Martine, Stefan Baumann & Ralf Benzmüller 2005 German Intonation within the Framework of Autosegmental-Metrical Phonology. In Sun-Ah Jun (ed.), Prosodic Typology and Transcription: A Unified Approach, pp. 55–83. Oxford University Press, Oxford.
Grice, Martine, Matthias Reyelt, Ralf Benzmüller, Jörg Mayer & Anton Batliner 1996 Consistency in Transcription and Labelling of German Intonation with GToBI. Verbmobil Report 153, Universität Braunschweig. http://www.ims.uni-stuttgart.de/projekte/verbmobil/vm-reports/report-153-96.ps.gz.



Gussenhoven, Carlos 1983 Testing the Reality of Focus Domains. Language and Speech, 26: 61–80.
Höhle, Tilman N. 1982 Explikationen für ‘normale Betonung’ und ‘normale Wortstellung’. In Werner Abraham (ed.), Satzglieder im Deutschen, pp. 75–153. Gunter Narr Verlag, Tübingen.
Jacobs, Joachim 1988 Fokus-Hintergrund-Gliederung und Grammatik. In H. Altmann (ed.), Intonationsforschungen, pp. 89–134. Max Niemeyer Verlag, Tübingen.
Jacobs, Joachim 1993 Integration. In Linguistische Arbeiten, volume 306, pp. 63–116. De Gruyter.
Kadmon, Nirit 2006 Some Theories of the Interpretation of Accent Placement. Handout for Presentation in the Department of Linguistics Colloquium, OSU, October 19, 2006. (Based on a talk given at the Colloque de Syntaxe et Sémantique à Paris, 2000.)
Kadmon, Nirit 2009 Some Theories of the Interpretation of Accent Placement. Ms., Tel Aviv University, 64pp. http://semanticsarchive.net/Archive/TI3Njg1M/Kadmon-ms-2009-interp-of-pitch-accent-placement.pdf.
Krifka, Manfred 2008 Basic Notions of Information Structure. Acta Linguistica Hungarica, 55: 243–276.
Ladd, D. Robert 2008 Intonational Phonology, volume 119 of Cambridge Studies in Linguistics. Cambridge University Press, Cambridge.
Lezius, Wolfgang 2002 Ein Suchwerkzeug für syntaktisch annotierte Textkorpora. Ph.D. thesis, IMS, University of Stuttgart. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS), vol 8, nr 4.
Liberman, Mark 1975 The Intonational System of English. Ph.D. thesis, MIT.
Meurers, Walt Detmar, Ramon Ziai, Niels Ott & Janina Kopp 2011 Evaluating Answers to Reading Comprehension Questions in Context: Results for German and the Role of Information Structure. In Proceedings of the TextInfer 2011 Workshop on Textual Entailment, pp. 1–9. Association for Computational Linguistics, Edinburgh, Scotland, UK. http://aclweb.org/anthology/W11-2401.
Meurers, Walt Detmar 2005 On the Use of Electronic Corpora for Theoretical Linguistics. Case Studies from the Syntax of German. Lingua, 115(11): 1619–1639. http://purl.org/dm/papers/meurers-03.html.



Meurers, Walt Detmar & Stefan Müller 2009 Corpora and Syntax. In Anke Lüdeling and Merja Kytö (eds.), Corpus Linguistics, volume 2 of Handbooks of Linguistics and Communication Science, pp. 920–933. Mouton de Gruyter, Berlin. http://purl.org/dm/papers/meurers-mueller-09.html.
Petrov, Slav & Dan Klein 2007 Improved Inference for Unlexicalized Parsing. In HLT/ACL 2007, pp. 404–411. http://aclweb.org/anthology/N07-1051.
Pierrehumbert, Janet 1980 The Phonology and Phonetics of English Intonation. Ph.D. thesis, MIT. Distributed by Indiana University Linguistics Club, Bloomington.
Rapp, Stefan 1998 Automatisierte Erstellung von Korpora für die Prosodieforschung. Ph.D. thesis, Universität Stuttgart.
Reyelt, Matthias 1996 Consistency of Prosodic Transcriptions. Labelling Experiments with Trained and Untrained Transcribers. Verbmobil Report 155, TU Braunschweig. http://www.ims.uni-stuttgart.de/projekte/verbmobil/vm-reports/report-155-96.ps.gz.
Reyelt, Matthias, Martine Grice, Ralf Benzmüller, Jörg Mayer & Anton Batliner 1996 Prosodische Etikettierung des Deutschen mit ToBI. Verbmobil Report 154, Universität Braunschweig. http://www.ims.uni-stuttgart.de/projekte/verbmobil/vm-reports/report-154-96.ps.gz.
Roberts, Craige 1996 Information Structure: Towards an Integrated Formal Theory of Pragmatics. In Jae Hak Yoon and Andreas Kathol (eds.), Papers in Semantics, volume 49 of OSUWPL. The Ohio State University Department of Linguistics. http://semanticsarchive.net/Archive/WYzOTRkO/InfoStructure.pdf.
Roberts, Craige 2006 Resolving Focus. Conference abstract of Sinn und Bedeutung 11. Barcelona, Spain. http://purl.org/dm/handouts/roberts-sinn-und-bedeutung-2006.pdf.
Roberts, Craige 2008 Resolving Focus. Handout for the presentation given at the 34th Annual Meeting of the Berkeley Linguistics Society (BLS), February 2008. http://ling.osu.edu/~croberts/resolvingfocus.BLS.hd.pdf.
Schwarzschild, Roger 1999 GIVENness, Avoid F and Other Constraints on the Placement of Focus. Natural Language Semantics, 7(2): 141–177.



Selkirk, Elisabeth 1995 Sentence Prosody: Intonation, Stress and Phrasing. In John A. Goldsmith (ed.), The Handbook of Phonological Theory, chapter 16, pp. 550–569. Basil Blackwell, Oxford.
Stegmann, Rosmary, Heike Telljohann & Erhard W. Hinrichs 2000 Stylebook for the German Treebank in VERBMOBIL. Verbmobil-Report 239, Universität Tübingen, Tübingen, Germany.
Steindamm, Susanne 2005 Fokusprojektion und Akzenttypen im Deutschen – Produktion und Perzeption. Master’s thesis, Universität zu Köln. Institut für Linguistik – Abteilung Phonetik. http://www.uni-koeln.de/phil-fak/phonetik/Lehre/MA-Arbeiten/Susanne-MA-KOMPLETT.pdf.
Stolterfoht, Britta & Markus Bader 2004 Focus Structure and the Processing of Word Order Variations in German. In Anita Steube (ed.), Information Structure: Theoretical and Empirical Aspects, pp. 259–275. Mouton de Gruyter, Berlin.
Uhmann, Susanne 1991 Fokusphonologie. Number 252 in Linguistische Arbeiten. Max Niemeyer Verlag, Tübingen.
von Stechow, Arnim & Susanne Uhmann 1986 Some Remarks on Focus Projection. In Werner Abraham and Sjaak de Meij (eds.), Topic, Focus and Configurationality, number 4 in Linguistik Aktuell, pp. 295–320. Benjamins, Amsterdam.
Wagner, Michael to appear Focus and Givenness: A Unified Approach. In Ivona Kučerová and Ad Neeleman (eds.), Information Structure: Contrasts and Positions. Cambridge University Press, Cambridge. http://semanticsarchive.net/Archive/GNmMjJlN/wagner10focus.pdf.
Welby, Pauline 2003 Effects of Pitch Accent Position, Type and Status on Focus Projection. Language and Speech, 46(1): 53–81.
Xu, Yi 2010 In Defense of Lab Speech. Journal of Phonetics, 38(3): 329–336.
Zipf, George Kingsley 1936 The Psycho-Biology of Language. Routledge, London.

Locative Inversion in English: Implications of a Rating Study*
Sara Holler & Jutta M. Hartmann

1. Introduction

Locative inversion (LI) in English, as in (1), exhibits a number of interesting properties that are specific to inversion structures.

(1) Into the room walked John. (Rochemont & Culicover 1990, 70)

First of all, the subject appears post-verbally. Second, a prepositional or adverbial phrase appears in initial position. Third, LI has the discourse function of presentational focus: the inverted locative sets a scene onto which the subject is (re-)introduced (Bolinger 1971, 1977; Rochemont 1986; Bresnan 1994). This presentational focus differs from other inversion structures such as comparative inversion, where the subject receives contrastive focus (cf. Culicover & Winkler 2008). Fourth, LI is available only for a restricted class of verbs; however, the exact classification is disputed. This paper concentrates on this aspect of LI. We investigate with a rating study whether LI is restricted to unaccusative verbs, as proposed by Bresnan (1994) (see also L. Levin 1986), cf. (2), or whether unergative verbs can also appear in LI structures, as e.g. Levin & Rappaport Hovav (1995) argue on the basis of a corpus study, cf. (3).

(2) a. Among the guests was sitting my friend Rose.
   b. *Among the guests was knitting my friend Rose. (Bresnan 1994, 78)

* We thank S. Winkler, P. Culicover, M. Salzmann, A. Konietzko and the audiences at Linguistic Evidence 2010 and the (Post)Doktorandenkolloquium 2011 for comments. Special thanks go to Janina Radó for help with the statistics and for comments. This research was partially funded by the DFG (SFB 833, A7).



(3) On the third floor worked two young women called Maryanne Thomson and Ava Brent, who ran the audio library and the print room. (B. Levin & Rappaport Hovav 1995, 224, citing from: L. Colwin, Goodbye without Leaving)

Furthermore, Culicover & Levine (2001) (henceforth C&L) claim that LI is possible with unergative verbs as long as the post-verbal subject is heavy (in terms of length, complexity and stress) and shifted to the right. On the basis of two rating studies, the current paper investigates the following questions: First, is LI judged acceptable with both verb types? Second, does the heaviness of the subject play a crucial role for the acceptability of LI? Third, do unergative verbs require a right-adjacent subject? We show that LI is equally possible with unaccusative and unergative verbs, independently of the heaviness of the subject. LI with unergative verbs does not require the subject to be right-adjacent. This implies that various previous syntactic analyses of LI are not adequate, since the subject of unergative verbs cannot be base-generated post-verbally. We will outline possible alternatives. Nevertheless, it is clear that not all verbs are acceptable in LI; the restriction, however, seems to be pragmatic (cf. Birner 1995) and related to the specific information structure of LI.

The paper is structured as follows. In section 2, we present two rating studies. The first one looked at unaccusative verbs, the second one at unergative verbs. In both studies, we investigated the effects of word order and of the heaviness of the subject. In section 3, we present the results. Section 4 discusses the implications of the findings for the syntactic analysis of LI and the possible information structural restrictions on the verb type. Section 5 concludes the paper.

2. Experiment

2.1 Hypotheses

C&L distinguish Light and Heavy Inversion as two different syntactic phenomena. In Light Inversion, the subject is base-generated post-verbally and remains in situ (see also Hoekstra & Mulder 1990; Bresnan 1994). As only unaccusative verbs, but not unergative verbs, allow this base-generated word order, Light Inversion is restricted to unaccusative verbs (compare claim (4)). Light Inversion with an unaccusative versus an unergative verb is illustrated in (5).

(4) Claim 1: Light Inversion only occurs with unaccusative verbs.

Locative Inversion in English: Implications of a Rating Study

(5) a.   Into the room walked Robin. (unaccusative) (C&L 2001, 292)
    b. * In the room slept Robin. (unergative) (C&L 2001, 293)

Furthermore, the base-generated word order predicts that manner adverbs should not be allowed before the subject, as shown in (6) (see C&L 2001, 292).

(6) a.   Into the room walked Robin carefully.
    b. * Into the room walked carefully Robin.

Heavy Inversion, in which the subject moves to the right via the syntactic subject position, is possible with both unaccusative and unergative verbs, as stated in (7). Thus, according to C&L, LI with unergatives is only possible with a heavy subject, cf. (8a) vs. (8b), and this subject needs to be shifted to the right, cf. (8b) vs. (8c).

(7) Claim 2: Heavy Inversion occurs with both unaccusative and unergative verbs. For Heavy Inversion, the subject needs to be heavy and Heavy NP shifted.

(8) a. * In the room slept Robin.
    b.   In the room slept fitfully the students in the class who had heard about the […] experiment that we were about to perpetrate.
    c. * In the room slept the students in the class who had heard about the […] experiment that we were about to perpetrate fitfully. (C&L 2001, 293)

C&L’s observation that LI conflates two different syntactic phenomena is in line with Levin & Rappaport Hovav’s (1995) corpus-based study, which also reports unergative verbs in LI. This observation was the starting point of our experimental investigations. The question, however, is whether heaviness plays a crucial role in allowing unergative verbs in LI. A pilot study suggested that unaccusative and unergative verbs occur with both heavy and light subjects. This pilot study led us to the following two hypotheses.

(9) Verb Class Hypothesis: LI is possible with both unergative and unaccusative verbs.


(10) Extraposition Hypothesis: Extraposition (in LI) applies whenever the subject is heavy.1

(9) claims that LI can apply with unergatives regardless of the heaviness of the subject. We expect both verb types to be equally acceptable in (11).2

(11) a. In the dormitory arrived twenty students quite happily. (unacc.)
     b. In the dormitory slept twenty students quite happily. (unerg.)

The second hypothesis assumes extraposition of heavy subjects independently of the verb class: heavy subjects have to be extraposed with both verb types. Accordingly, (12a) and (13a) are expected to be unacceptable, whereas (12b) and (13b) are expected to be acceptable. Thus, unergative and unaccusative verbs should behave alike when combined with a heavy subject.

(12) a. Inside appeared various colourful fish which my uncle had recently bought from a fish breeder very slowly.
     b. Inside appeared very slowly various colourful fish which my uncle had recently bought from a fish breeder.

(13) a. Inside swam various colourful fish which my uncle had recently bought from a fish breeder very slowly.
     b. Inside swam very slowly various colourful fish which my uncle had recently bought from a fish breeder.

We conducted two rating studies. The first one investigated LI with unaccusative verbs, the second one LI with unergative verbs. Both studies manipulated the factors heaviness and extraposition, leading to the four conditions given in (14).

(14) a. Light-intraposed:  PP – verb – lightNP – AdvP
     b. Light-extraposed:  PP – verb – AdvP – lightNP
     c. Heavy-intraposed:  PP – verb – heavyNP – AdvP
     d. Heavy-extraposed:  PP – verb – AdvP – heavyNP

1. By “extraposition”, we refer to the word order with the subject appearing at the very right edge of the sentence, independent of any specific syntactic analysis.
2. According to the pseudo-passive test, arrived is an unaccusative verb whereas slept is unergative (Perlmutter & Postal 1984, 101): (i) * The airport was arrived at by our uncle. (ii) A bed was slept in by a girl.

2.2 Materials

In order to create a stark contrast between light and heavy subjects, light subjects were kept as short and simple as possible. Following Arnold et al.’s (2000) definition of heaviness, light subjects consisted of two words and four syllables.3 We chose numerals to precede them. By contrast, heavy subjects were made up of 13 to 16 words and 18 to 25 syllables. The modifier added to the heavy subjects included not only a numeral but also an adjective. Moreover, to increase syntactic complexity, heavy subjects contained a relative clause. Two examples of heavy subjects are given below:

(15) twenty lazy students who had heard about the researchers’ important social psych experiment
(16) numerous hideous trolls which looked rather inhuman with their oversized heads and noses

Manner adverbs (AdvP), which mark the right edge of the verb phrase, served to distinguish extraposed word order (PP V AdvP NP) from intraposed word order (PP V NP AdvP). Manner adverbs were combined with adverbs of degree (examples are very slowly or quite cheerfully). As the light NPs and the AdvPs were equal in length, it was left entirely up to the grammar to determine when to shift the subject.

Concerning the selection of unaccusative versus unergative verbs, we followed the basic classification from Perlmutter (1978). According to him, subjects of unaccusatives are base-generated below the verb as its complement, while subjects of unergatives are base-generated higher than the verb, in the verb’s subject position. This means that unaccusative verbs have a direct internal argument, which functions as the theme or patient, whereas unergatives have an external argument, which receives an agent role. The verbs used in our experiments were all tested with the so-called pseudo-passive test (cf. Perlmutter & Postal 1984), which is the most reliable test for distinguishing the two verb types in English. Since unaccusatives cannot occur in pseudo-passives, see (17), only those verbs that passed the test were classified as unergatives.

(17) a. * The bridge was existed under by the trolls.
     b. * The dome was collapsed under by the model. (Perlmutter & Postal 1984, 100f)

In (18), pseudo-passivization with unergative verbs is illustrated.

(18) a. The bed was slept in by the shah.
     b. The bed was jumped on by the children. (Perlmutter & Postal 1984, 100f)

3. Peter Culicover (p.c.) pointed out that intonation can make a noun phrase heavy and thus improve the extraposed word order with light subjects. However, the interesting cases are light subjects with intraposed word order. Thus, the lack of intonation in our study is not a problem.
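The length criteria for light and heavy subjects given at the start of this subsection can be written down as a simple check, which may help when constructing new items. This is only a sketch under stated assumptions: syllable counts are supplied by hand (automatic English syllabification is unreliable), and the relative clause is detected crudely by looking for a relative pronoun; the function name `subject_weight` is hypothetical, not part of the study's materials.

```python
# Hypothetical encoding of the subject-length criteria of section 2.2.
# Syllable counts are passed in by hand (an assumption of this sketch),
# because automatic syllabification of English is unreliable.
RELATIVE_MARKERS = {"who", "which", "that"}  # crude relative-clause heuristic


def subject_weight(np: str, syllables: int) -> str:
    """Return 'light', 'heavy' or 'unclassified' for a candidate subject NP."""
    words = np.split()
    # Light subjects: exactly two words and four syllables.
    if len(words) == 2 and syllables == 4:
        return "light"
    # Heavy subjects: 13-16 words, 18-25 syllables and a relative clause,
    # approximated here by a relative pronoun after the first word.
    has_relative = any(w.lower() in RELATIVE_MARKERS for w in words[1:])
    if 13 <= len(words) <= 16 and 18 <= syllables <= 25 and has_relative:
        return "heavy"
    return "unclassified"


print(subject_weight("twenty students", 4))  # prints: light
print(subject_weight("numerous hideous trolls which looked rather inhuman "
                     "with their oversized heads and noses", 23))  # prints: heavy
```

The syllable figure of 23 for (16) is our own hand count and is itself an assumption; any value between 18 and 25 yields the same classification.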

The first study examined LI with unaccusatives and the second one LI with unergatives. The unaccusative and unergative verbs used in these experiments were based on a list by Perlmutter (1978) as given in Kuno & Takami (2004). The verbs in the test sentences were all used in the simple past form. Twelve lexical variants were written, four of which were based on examples given in the literature. The experimental items were distributed across four lists following a Latin square design. Each lexical variant was presented once per list in one of the four conditions. Each condition was tested three times per list. A sample item with both verb types (unaccusative: appeared, unergative: swam) is given in (19).4

(19) a. Light-intraposed: Inside appeared/swam [various fish] [very slowly].5
     b. Light-extraposed: Inside appeared/swam [very slowly] [various fish].
     c. Heavy-intraposed: Inside appeared/swam [various colourful fish which my uncle had recently bought from a fish breeder] [very slowly].
     d. Heavy-extraposed: Inside appeared/swam [very slowly] [various colourful fish which my uncle had recently bought from a fish breeder].

4. This example is based on B. Levin & Rappaport Hovav 1995, 257, citing from: J. Olshan, The Waterline, 177.
5. P. Culicover pointed out that the VP adverbials in our items could also be parenthetical/afterthoughts. However, if participants had interpreted adverbials as parenthetical, we would expect the same effect for heavy and light subjects, and this effect did not turn up in our experiments.
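The 2 × 2 design in (14), instantiated in (19), together with the Latin square distribution of twelve items over four lists, can be sketched as follows. The rotation rule (item + list) mod 4 is an assumed, standard way to build such a square; the paper does not spell out its exact scheme, and the helper names `realize` and `latin_square` are hypothetical.

```python
# Sketch of the 2 x 2 design in (14) and a Latin-square distribution of
# 12 items over 4 lists. The rotation rule (item + list) % 4 is an assumed,
# standard construction; the paper does not give its exact scheme.
CONDITIONS = ["light-intraposed", "light-extraposed",
              "heavy-intraposed", "heavy-extraposed"]


def realize(pp, verb, light_np, heavy_np, adv, condition):
    """Linearize one item in one condition, following the orders in (14)."""
    weight, order = condition.split("-")
    np = light_np if weight == "light" else heavy_np
    tail = [np, adv] if order == "intraposed" else [adv, np]
    return " ".join([pp, verb, *tail]) + "."


def latin_square(n_items=12):
    """Return one list per rotation (4 in total). Item i appears once on every
    list, in condition (i + list) mod 4, so each condition occurs
    n_items / 4 times per list when n_items is a multiple of 4."""
    k = len(CONDITIONS)
    return [[(item, CONDITIONS[(item + lst) % k]) for item in range(n_items)]
            for lst in range(k)]


FISH = ("Inside", "swam", "various fish",
        "various colourful fish which my uncle had recently bought from a fish breeder",
        "very slowly")
for cond in CONDITIONS:
    print(realize(*FISH, cond))
```

With twelve items, each list contains every item exactly once and each condition three times, matching the description above; across the four lists, every item occurs in all four conditions.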

In addition to the experimental items, 36 filler sentences were added, roughly one third of which were rather unacceptable. The fillers included four sentences with LI and transitive verbs, which are generally assumed not to be able to undergo LI (cf. Bresnan 1994 among others; for an exception see C&L). The filler sentences in (20) were thus expected to receive very low ratings ((20c,d) are based on Bresnan 1994, 77).

(20) Fillers: LI with transitives
     a. Among the guests ate roast beef several guys.
     b. In the office saw a note two employers.
     c. In the rainforest found the reclusive bird thirteen lucky hikers who actually just wanted to have adventurous and exciting holidays.
     d. On the corner drank beer numerous teenaged boys who were ready for a weekend full of fun and parties.

2.3 Participants and Procedure

Twenty-seven native speakers of English took part in the first study, twenty-four native speakers in the second one. They were randomly assigned to the lists, and each questionnaire was completed six or seven times. Speakers of both British English and American English participated. All participants were naïve to the purpose of the study. They received the questionnaires via e-mail and filled them in within 15 days. The participants’ task was to read the sentences carefully and to rate them on a scale from one (= unnatural and hard to understand) to seven (= natural and highly acceptable). Participants were asked to rely on their intuitions of what sounds good. They were also told to make use of the whole scale and not to go back to single sentences to change their ratings. The questionnaires started with written instructions and an example. At least five filler sentences appeared on each list before the first test sentence.
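The paper states only that participants were randomly assigned to the four lists. One way to randomize while keeping the lists balanced is to shuffle the participants and then deal them onto the lists round-robin; the sketch below assumes this procedure (it is not necessarily the one used) and shows that it yields lists completed six or seven times for 27 participants and exactly six times for 24. The function name `assign_to_lists` is hypothetical.

```python
import random

# Hypothetical sketch: shuffle, then deal participants onto lists round-robin,
# so that list sizes differ by at most one. This is an assumed procedure,
# not necessarily the one used in the study.


def assign_to_lists(participants, n_lists=4, seed=None):
    """Randomly assign participants to n_lists lists of (near-)equal size."""
    rng = random.Random(seed)
    order = list(participants)
    rng.shuffle(order)
    lists = {k: [] for k in range(n_lists)}
    for i, person in enumerate(order):
        lists[i % n_lists].append(person)
    return lists


for n in (27, 24):  # participants in experiments 1 and 2
    sizes = sorted(len(v) for v in assign_to_lists(range(n), seed=1).values())
    print(n, sizes)  # 27 -> [6, 7, 7, 7]; 24 -> [6, 6, 6, 6]
```

Because only the order of participants is random, the list sizes themselves do not depend on the seed.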

3. Results

3.1 Results Experiment 1: Unaccusatives

Mean ratings per condition are given in table 1 and figure 1. Ratings were analysed in two repeated measures ANOVAs, with subjects and with items as random effects. The effect of heaviness was significant by subjects but only marginally significant by items (F1(1,26) = 4.483, p = 0.044; F2(1,11) = 4.483, p = 0.058). The effect of extraposition only approached significance in the subjects analysis (F1(1,26) = 3.121, p = 0.089; F2(1,11) = 1.056, p = 0.326). There was a significant interaction between extraposition and heaviness (F1(1,26) = 19.901, p = 0.000; F2(1,11) = 16.402, p = 0.002), indicating that extraposition affects heavy and light subjects differently. We used planned contrasts to test the specific predictions. LI with heavy subjects received significantly higher ratings in the extraposed word order than in the intraposed order (t1(26) = -3.969, p = 0.001; t2(11) = -3.969, p = 0.010). Intraposed heavy subjects were rated rather low.

Table 1. Mean ratings per condition with unaccusative verbs

     Condition          Rating   Example6
1    light-intraposed   3.59     Under the stairs existed numerous trolls quite cheerfully.
2    light-extraposed   3.05     Under the stairs existed quite cheerfully numerous trolls.
3    heavy-intraposed   2.36     Under the stairs existed numerous hideous trolls which looked rather inhuman with their oversized ears and noses quite cheerfully.
4    heavy-extraposed   3.50     Under the stairs existed quite cheerfully numerous hideous trolls which looked rather inhuman with their oversized ears and noses.

6. Note that the examples in table 1 and table 2 are for illustration purposes only. The ratings are mean values for all experimental items in this condition.
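The planned contrasts reported in this section compare two conditions within the same participants (t1) or within the same items (t2). Since each participant saw three items per condition, t1 is computed over each participant's condition means, whereas t2 averages each item's ratings over the participants who saw it in that condition. The sketch below shows this aggregation and a paired t statistic. The ratings are made up for illustration and are not the study's data; the row format and function names are hypothetical, and every unit is assumed to have data in both conditions.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def cell_means(rows, unit):
    """Average ratings per (unit, condition); unit is "participant" (t1) or "item" (t2)."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[unit], row["condition"])].append(row["rating"])
    return {key: mean(values) for key, values in cells.items()}


def paired_t(rows, unit, cond_a, cond_b):
    """Paired t-test of cond_a minus cond_b across units; returns (t, df)."""
    means = cell_means(rows, unit)
    units = sorted({u for u, _ in means})
    diffs = [means[(u, cond_a)] - means[(u, cond_b)] for u in units]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Made-up ratings (NOT the study's data): 3 participants x 2 items.
ROWS = [
    {"participant": 1, "item": 1, "condition": "heavy-intraposed", "rating": 2},
    {"participant": 1, "item": 2, "condition": "heavy-extraposed", "rating": 4},
    {"participant": 2, "item": 2, "condition": "heavy-intraposed", "rating": 3},
    {"participant": 2, "item": 1, "condition": "heavy-extraposed", "rating": 4},
    {"participant": 3, "item": 1, "condition": "heavy-intraposed", "rating": 1},
    {"participant": 3, "item": 2, "condition": "heavy-extraposed", "rating": 4},
]
print(paired_t(ROWS, "participant", "heavy-intraposed", "heavy-extraposed"))
print(paired_t(ROWS, "item", "heavy-intraposed", "heavy-extraposed"))
```

The sign of t simply reflects the order of subtraction: subtracting extraposed from intraposed ratings, as here, gives negative values whenever extraposition is preferred (t ≈ -3.46 with 2 df by participants for these toy data). p-values would additionally require the t distribution, for example from a statistics package.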


Figure 1. Mean ratings per condition for experiment 1

The pattern was reversed with light subjects: the intraposed word order was rated higher than the extraposed word order (t1(26) = 2.418, p = 0.023; t2(11) = 1.562, p = 0.146), though the difference was smaller than with heavy subjects and was only significant by subjects. Sentences with light subjects and extraposed order were rated worse, but not as badly as LI with heavy subjects in intraposed word order.

3.2 Results Experiment 2: Unergatives

Mean ratings per condition are given in table 2 and figure 2.

Table 2. Mean ratings per condition with unergative verbs

     Condition          Rating   Example
1    light-intraposed   3.86     Under the stairs danced numerous trolls quite cheerfully.
2    light-extraposed   3.56     Under the stairs danced quite cheerfully numerous trolls.
3    heavy-intraposed   2.70     Under the stairs danced numerous hideous trolls which looked rather inhuman with their oversized ears and noses quite cheerfully.
4    heavy-extraposed   3.97     Under the stairs danced quite cheerfully numerous hideous trolls which looked rather inhuman with their oversized ears and noses.

Figure 2. Mean ratings per condition for experiment 2

Ratings were analysed in two repeated measures ANOVAs, with subjects and with items as random effects. The effect of extraposition was significant (F1(1,23) = 4.632, p = 0.042; F2(1,11) = 10.703, p = 0.007), whereas the effect of heaviness was marginal both by subjects and by items (F1(1,23) = 3.232, p = 0.085; F2(1,11) = 3.550, p = 0.086). There was a significant interaction between extraposition and heaviness (F1(1,23) = 33.455, p = 0.000; F2(1,11) = 10.170, p = 0.009). This shows that heavy and light subjects are affected differently by extraposition. Planned contrasts were used to test the specific predictions. The ratings for LI with heavy subjects were significantly higher with extraposed than with intraposed word order (t1(23) = -4.247, p = 0.000; t2(11) = -3.685, p = 0.010). With light subjects, there was no significant difference between extraposed and intraposed word order (t1(23) = 1.309, p = 0.203; t2(11) = 1.367, p = 0.199). LI with light, extraposed subjects was still rated higher than LI with heavy, intraposed subjects.

3.3 Summary of the Results

The most important findings are: (i) LI is equally possible with both unaccusative and unergative verbs: first, the pattern was the same in both experiments; second, the participants of the two experiments appeared to have used the same scales, as indicated by very similar means for individual sentences as well as for the general means of the fillers. (ii) Not only unaccusative verbs but also unergative verbs allow the intraposed word order with light subjects, i.e. the subject in LI with unergative verbs does not need to be extraposed (as long as the adverbial is equal in length). (iii) Heavy subjects have to be extraposed. Heaviness plays a crucial role for word order, but not for the acceptability of the different verb classes.

Our results do not support C&L’s first claim, namely that Light Inversion only occurs with unaccusative verbs: intraposed subjects are also possible with unergative verbs. C&L’s second claim, that Heavy Inversion occurs with both unaccusative and unergative verbs, is supported by our results.

Note that the highest average rating for experimental sentences was only 3.97 on a seven-point scale, even though the participants used the whole range of the scale: some of the fillers were designed to mark the top, the bottom and the middle of the scale. There are probably two related reasons for the low ratings of the experimental items. Firstly, LI is limited to certain information-structurally defined contexts. As the sentences were presented without context, the information structural requirements of LI were not satisfied. Secondly, the sentences contained a post-verbal adverbial, which makes it even more difficult to construe an appropriate context. Nevertheless, the contrasts between the individual conditions are significant and the pattern is the same in both experiments; as such, we take our results to be reliable.
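The crossover pattern summarized above can be made explicit by computing, from the cell means in tables 1 and 2, the simple effect of extraposition for light and for heavy subjects and the difference between the two (the interaction contrast). The sketch uses only the reported means; it describes the pattern and is not a significance test, which is provided by the ANOVAs and planned contrasts reported above. The names `MEANS` and `extraposition_effects` are hypothetical.

```python
# Cell means as reported in table 1 (experiment 1) and table 2 (experiment 2).
MEANS = {
    "unaccusative (exp. 1)": {"light-intraposed": 3.59, "light-extraposed": 3.05,
                              "heavy-intraposed": 2.36, "heavy-extraposed": 3.50},
    "unergative (exp. 2)": {"light-intraposed": 3.86, "light-extraposed": 3.56,
                            "heavy-intraposed": 2.70, "heavy-extraposed": 3.97},
}


def extraposition_effects(cells):
    """Return (light effect, heavy effect, interaction contrast), rounded to 2 places.
    Each effect is the extraposed mean minus the intraposed mean."""
    light = cells["light-extraposed"] - cells["light-intraposed"]
    heavy = cells["heavy-extraposed"] - cells["heavy-intraposed"]
    return round(light, 2), round(heavy, 2), round(heavy - light, 2)


for verb_class, cells in MEANS.items():
    light, heavy, interaction = extraposition_effects(cells)
    print(f"{verb_class}: light {light:+.2f}, heavy {heavy:+.2f}, "
          f"interaction {interaction:+.2f}")
```

In both experiments, extraposition lowers the mean for light subjects (-0.54 and -0.30) and raises it for heavy subjects (+1.14 and +1.27), with interaction contrasts of 1.68 and 1.57. This is the "same pattern" referred to in finding (i).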

4. Implications of the Results

In this section, we discuss the implications of the findings of our study both for the syntactic analysis of LI and for the restrictions on verb classes in LI.

4.1 Syntax of LI

The syntactic analyses of LI in the generative grammar framework can be divided into two major approaches: (i) low subject accounts and (ii) subject extraposition analyses. As the names suggest, the main distinguishing feature in this classification is the position of the thematic subject. The low subject accounts have to be further subdivided into PP movement accounts and silent pro-form accounts. We will see below that our results are problematic for both low subject approaches (for a recent summary of their advantages and problems in general, see Salzmann 2009). The second class of approaches fares slightly better, but the subject extraposition analyses are available only for a subclass of LI cases. In search of a better analysis, we look at the verb movement account proposed in Salzmann (2009). This approach can handle the data in our experiment. However, the predictions of the proposal need to be tested empirically.


Finally, we present a proposal for a possible alternative account in terms of PF movement. Further data are needed to determine whether this approach is feasible.

4.1.1 Low subject accounts

The class of low subject accounts can be further divided into PP movement and silent pro-form analyses. In the PP movement approaches (see Hoekstra & Mulder 1990; Bresnan 1994; Collins 1997; den Dikken 2006; Broekhuis 2008; Hartmann 2008; Light Inversion in C&L), a post-verbal PP moves to the subject position (and possibly beyond) to satisfy the EPP. In the silent pro-form analyses (see Postal 1977, 2004; Coopmans 1989), the subject position is filled by a silent counterpart of the pro-form there in English. Both approaches crucially rely on the thematic subject being base-generated as a complement of the verb, cf. the tree structures in (21) and (22). This is arguably true for unaccusative verbs.

(21) PP movement

(22) Silent pro-form analysis

However, both structures are incompatible with unergative verbs, whose thematic subject is base-generated higher than the verb (Perlmutter 1978; Bowers 2001). As a result, the intraposed word order PP – verb – subject – adverbial cannot be a base-generated word order with unergative verbs.


One attempt to account for the possibility of unergative verbs in LI structures within a PP movement account is the proposal by Hoekstra & Mulder (1990). They argue that unergative verbs can be made unaccusative by the presence of a directional/result PP. However, the argumentation is circular for English LI: they claim that the construction only allows for unaccusative verbs and that the verbs occurring in the structure therefore have to be unaccusative. The examples in the literature and those used in the experiments reported here lack a directional/result PP. Thus, the crucial ingredient for unaccusativization is missing. So far, we do not see any other argument for this approach, especially because other tests for unaccusativity are not applicable in English (e.g. auxiliary selection, impersonal passives) or cannot be combined with LI (e.g. past participle as nominal modifier, pseudo-passive test).

4.1.2 Subject extraposition

The major exponent of the subject extraposition account is C&L’s proposal for Heavy Inversion. According to their analysis, a heavy subject can extrapose to the right from the Spec,IP position, cf. (23). The trace in the subject position is licensed by a prepositional/adverbial phrase adjoined to IP. This analysis is certainly possible for the cases of heavy extraposed subjects, and we follow it for these cases (for arguments that LI with heavy subjects is a separate phenomenon, see C&L). However, the analysis is not available for the intraposed word order with unergative verbs and light subjects: there, an adverbial can appear to the right of the subject, which is unexpected in the extraposition analysis.

4.1.3 Verb movement account

Based on a previous analysis by Rochemont & Culicover (1990), Salzmann (2009) proposes that the word order PP V NP (ADV) with unergative verbs is derived by verb movement: the verb moves across the subject to a head above vP, which he takes to be an aspectual projection. This verb movement is only available in LI because it serves to satisfy the requirement that the subject be focused.


(23) Heavy Inversion (cf. C&L 2001, 294)

(24) Salzmann (2009)

Salzmann (2009) provides evidence for this verb movement from the position of adverbials. In LI, VP adverbials can follow the verb, which is otherwise not possible; see (25) (Salzmann 2009, citing books.google.de/books?isbn=1579788335 John Oyer).

(25) Behind Luther’s Word stood always the concept of a historical revelation which had been recorded in the Scriptures.

However, such examples are taken from the internet, and it is not entirely clear whether this is an acceptable pattern that occurs only in LI. An additional rating study should clarify the issue. From a theoretical point of view, the question arises why this movement occurs only in LI and how it can be triggered. In Salzmann’s analysis, the movement of the verb is driven by two forces that add up in LI: (i) feature-checking of aspectual properties of the verb and (ii) repair-driven movement (in the sense of Heck & Müller 2007) of the verb to allow the subject to be right-aligned and thus occupy the default focus position in the sentence. This is implemented with different rankings in optimality-theoretic (henceforth OT) terms. Technically, this means that the constraint ALIGNFOCUS is ranked higher than the constraint NOLEXMVT, which usually bans verb movement.

4.1.4 PF Movement

Göbbel (2010, to appear) argues for PF movement of phrases in relative clause extraposition, cf. (26), PP extraposition, cf. (27), Heavy-NP Shift, cf. (28), and CP shift, cf. (29).

(26) a. Last night, a man we’d never seen before arrived. >
     b. Last night, a man arrived who we’d never seen before. (EX-Rel)
(27) a. I read a magazine about Turner on Monday. >
     b. I read a magazine on Monday about Turner. (EX-PP)
(28) a. Bill explained Newton’s law of gravitation to Mary. >
     b. Bill explained to Mary Newton’s law of gravitation. (HNPS)
(29) a. Bill explained why he was late for work to Mary. >
     b. Bill explained to Mary why he was late for work. (CP shift)

In an OT-based analysis of various phonological constraints, Göbbel argues that the word orders in the b-examples in (26)–(29) are optimal candidates at PF, although they are not faithful to the syntactic representation. In a nutshell, by reordering the syntactic constituents, the phonological representation gains in balance. The constraint BinIP given in (30) favours the b-sentences over the a-sentences in HNPS and CP shift, as illustrated in (31) and (32).

(30) BinIP: An IP contains two prosodic phrases.

(31) Excited about Greece and its cultural heritage,
     a. (he donated a vase) (that shows Zeus and Apollo fighting) (to a museum)
     b. (he donated to a museum) (a vase that shows Zeus and Apollo fighting)

(32) What did you say about Mary?
     a. (She mentioned) (that her jeans were dirty) (to Bill)
     b. (She mentioned to Bill) (that her jeans were dirty)
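OT evaluation under strict domination amounts to comparing candidates' violation profiles lexicographically in ranking order. The sketch below applies this to (31). It rests on assumptions of our own: BinIP is scored 0/1, and LINEAR_FAITH is a hypothetical faithfulness constraint (one violation if the pronounced order differs from the syntactic order) standing in for whatever constraint penalizes unfaithful PF orders in Göbbel's analysis. Neither the constraint set nor the function names are taken from Göbbel (2010).

```python
import re

# Illustrative OT evaluation in the spirit of (30)-(32). Assumptions: BinIP is
# scored 0/1, and LINEAR_FAITH is a hypothetical faithfulness constraint
# (one violation if the pronounced order differs from the syntactic order).


def prosodic_phrases(candidate: str) -> list[str]:
    """Split a bracketed candidate like '(a b) (c d)' into its phrases."""
    return re.findall(r"\(([^)]*)\)", candidate)


def bin_ip(candidate: str) -> int:
    """BinIP (30): no violation iff the IP contains exactly two prosodic phrases."""
    return 0 if len(prosodic_phrases(candidate)) == 2 else 1


def make_linear_faith(syntactic_order: str):
    """Build the hypothetical LINEAR_FAITH constraint for one syntactic input."""
    target = syntactic_order.split()

    def linear_faith(candidate: str) -> int:
        spoken = " ".join(prosodic_phrases(candidate)).split()
        return 0 if spoken == target else 1

    return linear_faith


def evaluate(candidates, ranking):
    """Strict domination: the winner has the lexicographically smallest profile."""
    return min(candidates, key=lambda c: tuple(con(c) for con in ranking))


SYNTAX_31 = "he donated a vase that shows Zeus and Apollo fighting to a museum"
CAND_31A = "(he donated a vase) (that shows Zeus and Apollo fighting) (to a museum)"
CAND_31B = "(he donated to a museum) (a vase that shows Zeus and Apollo fighting)"

faith = make_linear_faith(SYNTAX_31)
print(evaluate([CAND_31A, CAND_31B], [bin_ip, faith]))  # BinIP >> LINEAR_FAITH: (31b)
print(evaluate([CAND_31A, CAND_31B], [faith, bin_ip]))  # LINEAR_FAITH >> BinIP: (31a)
```

The reordered candidate (31b) wins only when BinIP outranks faithfulness. Salzmann's ranking ALIGNFOCUS >> NOLEXMVT could be modelled with the same evaluator by listing those constraints in that order; we do not attempt to define them here.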

However, there is a crucial difference between the structures that Göbbel (2010) investigates and the LI cases here: while Göbbel investigates the deaccentuation of old material at the right edge, the constituents at the right edge in LI are typically new information (cf. Birner & Ward 1998) and are assigned presentational focus (see below). Yet it is in principle possible to transfer the core ideas of such reordering at PF to the LI cases with unergative and unaccusative verbs. Note, however, that phonological reordering in English is usually highly restricted: in most cases, the word order is strictly SVO. It is therefore crucial to exclude overgeneration. This is feasible on the following assumptions: (i) In the syntax, the subject remains low in the vP with both unergative and unaccusative verbs. (ii) The PP is base-generated in or moved to the initial position. (iii) The violation of the EPP is possible because the subject needs to remain low to satisfy information structural restrictions in the syntax.

This analysis implies that focus and the realization of intonation are independent to some degree. The subject remains low to satisfy information structural restrictions, but it is pronounced and stressed at the end of the phrase for phonological reasons. In the phonological phrasing of the vP, the word order of the verb and the subject is rearranged to satisfy the constraints on heaviness and newness. An open question is whether this PF account could in principle work for transitive verbs as well. On the one hand, some transitive verbs seem possible in LI as long as they are not “semantically transitive”. On the other hand, the syntax of transitive verbs differs from the syntax of intransitive verbs, which in turn might affect the possible reordering at PF. Further data are necessary to determine the adequacy of this approach.

4.2 Verb Classes and Information Structure

The results of our studies show that the distinction between unergative and unaccusative verbs is not relevant for the licensing of LI in English. Nevertheless, the literature on verbs in LI makes clear that not all verbs are possible in LI. In this section, we first look at the restriction on transitive verbs. Based on the data of C&L, we tentatively conclude that the restriction on transitive verbs is not a syntactic restriction. Instead, we suggest, in line with Birner (1995), that whether a verb can occur in LI depends on the information structure (IS) of LI. Even though we follow the main intuition underlying Birner’s proposal, the exact nature of the IS restriction on both LI and the verb classes is more difficult to grasp and needs to be based on a broader set of experimental data.

4.2.1 Transitive Verbs in Locative Inversion

It has been reported repeatedly that transitive verbs are impossible in LI structures (cf. Rochemont 1978, Bresnan 1994 among others). Our study supports this finding: the filler sentences with LI and transitive verbs given in (33) were rated very low (mean rating experiment 1: 1.98; mean rating experiment 2: 1.81).

(33) a. Among the guests ate roast beef several guys.
     b. In the office saw a note two employers.
     c. In the rainforest found the reclusive bird thirteen lucky hikers who actually just wanted to have adventurous and exciting holidays. (based on Bresnan 1994, 77)
     d. On the corner drank beer numerous teenaged boys who were ready for a weekend full of fun and parties. (based on Bresnan 1994, 78)

By contrast, C&L (2001) present examples of Heavy Inversion with transitive verbs that they rate acceptable, see (34) and (35).

(34) [In the backyard] were sunning themselves [a group of the largest iguanas that had ever [been] seen in Ohio]. (C&L 2001, 308)
(35) The economist predicted that [at that precise moment] would turn the corner [the economics of half a dozen South American nations]. (C&L 2001, 308)

Syntactically, these verbs should still be analysed as transitive verbs. Thus, the restriction on transitive verbs cannot be a purely syntactic one. The crucial difference between these examples and our filler sentences is that in C&L’s examples, the direct object does not introduce a further event participant. If this is indeed the relevant difference, it supports an approach in which the information structural restrictions on LI are decisive: a further event participant following the verb cannot be accommodated within the presentational function (in the sense of Bolinger 1977) of the structure. We therefore suggest seeking an explanation for the restriction on verb classes in LI in the information structure of LI.


4.2.2 Information Structural Requirements of LI

According to the literature, three information structural restrictions on LI can be formulated (see Bolinger 1977; Rochemont 1986; Birner 1992, 1994; Levin & Rappaport Hovav 1995; among others): (i) the preposed PP functions as a scene setter, while (ii) the subject NP is (re-)introduced onto this scene, receiving presentational focus.7 (iii) Only verbs that can support the function of presentational focus occur in LI. The verb serves to support the presentational function of the construction. Birner (1995) suggests that verbs that can occur in LI need to be ‘inherently light’. By this she means that the verb can be predicted from the first constituent and does not contribute any new information. This is clear for the example in (36): the verb preach is predictable from the pulpit. But this is less clear for the examples in (37): why should the verb melt be more predictable from sticky hands than from the streets of Chicago?

(36) From this pulpit preached no less a person than Cotton Mather. (Birner 1995, 253)
(37) a. * On the streets of Chicago melted a lot of snow.
     b.   In Maria’s sticky hand melted a chocolate-chip ice cream cone. (Birner 1995, 253)

We argue that the unacceptability of the example in (37a) does not (only) depend on the predictability of the verb, but is rather caused by an inappropriate choice of subject. The subject a lot of snow cannot receive presentational focus. When snow melts, it usually disappears, and it is difficult to accommodate the presentation of snow in such a situation.8 The example can be improved in the following ways:9

(38) a. On the streets of Chicago melted an iceberg.
     b. Out in the Chicago streets melted the very handful of snow containing the diamonds that they were looking for.

7. We adopt Rochemont’s (1986: 52) definition of presentational focus here. An expression P is a presentational focus in a discourse if P is not c-construable (i.e. it does not have a semantic antecedent in the discourse). Applying this definition to LI constructions means that the subject NP should not be accessible or given in the context.

From our point of view, the crucial change achieved in these examples is that something is presented on the scene: an iceberg can be imagined as something that is melting without disappearing at the same time. In (38b), with the melting of the snow, the diamonds appear on the scene instead. These examples show that the problem with (37a) is not the predictability of the verb from the prepositional phrase, but the presentability of the post-verbal noun phrase. (37a) therefore cannot fulfill the information structural requirements of locative inversion. In order to investigate the information structural requirements on the verb, it is necessary to make sure that the requirements on the PP and the post-verbal subject are fulfilled in the first place. We consider these requirements first and then come back to the question of the verb types in LI. The information structural requirements of LI can be described as follows: the preposed PP enables LI to have the discourse function of presentational focus, since it sets a scene onto which the subject referent is (re)introduced (Bolinger 1971, 1977; Rochemont 1986; Bresnan 1994). Both requirements, the scene-setting function of the PP and the presentation of the subject, have to be fulfilled to make LI felicitous. Consider (39).

8. The examples in (i) from Bolinger (1977: 96, 99) can be interpreted similarly.
(i) a. Slowly dissolving was a mass of ectoplasm.
b. Away sailed an enormous ship.
A mass of ectoplasm that is slowly dissolving is still presentable. Likewise, an enormous ship is surprising enough to be presented even when it is heading towards the horizon, where it may disappear from view. In the example On the streets of Chicago finally melted all that dirty and ugly grey snow (provided by an anonymous reviewer), too, the introduced snow adds to the overall picture, even though it may disappear from view.
9. We thank K. Griffin for providing example (37a), and an anonymous reviewer for providing (38b) and for suggesting that the problem with (37a) is the lack of presentation of the post-verbal NP.


Sara Holler & Jutta M. Hartmann

(39) A: I'm looking for my friend Rose.
B: #Among the guests of honor was sitting Rose.
C: Rose was sitting among the guests of honor. (Bresnan 1994, 85)

Speaker B's utterance is odd. First, since Rose has just been mentioned by speaker A, it is unnatural for speaker B to reintroduce her onto a scene. Moreover, the scene among the guests of honor has not been set in the preceding discourse but is newly introduced. Birner (1994) explains this effect with the information-packaging function of inversion: following Horn (1984) and Prince (1992), she argues that in inversion structures the preposed element must not be newer in the discourse than the postposed element. Thus, the relationship between the PP and the subject is crucial for the felicity of an LI structure. The following examples from the British National Corpus and Doris Lessing's The Grass is Singing illustrate this claim. In both examples, the PP has been given in the sentence preceding the LI structure, but the subject is discourse-new.10

(40) The Primitive Methodist chapel was built in 1837 and then rebuilt on the same site in 1887. Near the chapel stands the church institute, a Gothic-style building built in 1844. (BNC text = “C93” n = “1527”)

(41) When he reached the house at last, he saw, as he approached through the bush, six glittering bicycles leaning against the wall. And in front of the house, under the trees, stood six native policemen, and among them the native Moses, his hands linked in front of them. (Doris Lessing, 2007, 15)

We therefore claim that the discourse status of the preposed locative and that of the subject NP both play a crucial role for the felicity of an LI construction.

Coming back to the information-structural requirements on the verb, we propose that the verb in LI has to function as an adequate link between the PP and the noun phrase: its core meaning must link the PP and the NP in such a way that the PP can introduce the scene on which the noun phrase is presented. This is straightforward for the verb be, verbs of appearance, and locative verbs like stand or sit. The more additional meaning a verb has (e.g. manner of appearance, manner of location, change of state), the less likely it is to occur in LI.11 In sum, Birner's requirement that the verb in LI be light seems to be a step in the right direction, but more data are needed to define the notion of "lightness" and to test our hypothesis.

10. It is not entirely clear whether the crucial factor is the discourse status (new, old) of the PP and the subject, or rather whether the PP needs to establish a link to the description in the preceding discourse in order to anchor the ground on which the post-verbal noun phrase is presented (see Cheng 2003 for discussion).
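Birner's (1994) ordering condition, that the preposed element must not be newer in the discourse than the postposed one, can be stated as a comparison of discourse-status ranks. The sketch below is only a toy illustration under loud assumptions: it uses a two-way old/new distinction, treats mere prior mention of a head noun as a crude proxy for discourse-old status, and its helper names (`is_discourse_old`, `status_rank`, `satisfies_birner`) are hypothetical. It encodes only this relative condition, not the separate requirement that the subject receive presentational focus, and it takes no stand on the open question raised in footnote 10.

```python
import re


def is_discourse_old(head: str, context: str) -> bool:
    """Crude, hypothetical proxy: a head noun is discourse-old if it occurs in the prior context."""
    return head.lower() in re.findall(r"[a-z]+", context.lower())


def status_rank(head: str, context: str) -> int:
    """Two-way discourse-status scale: old = 0, new = 1."""
    return 0 if is_discourse_old(head, context) else 1


def satisfies_birner(preposed_head: str, postposed_head: str, context: str) -> bool:
    """True unless the preposed constituent is newer than the postposed one."""
    return status_rank(preposed_head, context) <= status_rank(postposed_head, context)


# Prior contexts taken from (39) and (40); the heads are the nouns of the PP and the subject.
CONTEXT_39 = "I'm looking for my friend Rose."
CONTEXT_40 = ("The Primitive Methodist chapel was built in 1837 and then rebuilt "
              "on the same site in 1887.")

if __name__ == "__main__":
    print(satisfies_birner("guests", "Rose", CONTEXT_39))        # False: new PP, old subject
    print(satisfies_birner("chapel", "institute", CONTEXT_40))   # True: old PP, new subject
```

Equal statuses (both old or both new) pass the check, since the condition only rules out a preposed element that is newer than the postposed one.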

5. Conclusion

In this paper, we have presented results from two rating studies on the possibility of unergative and unaccusative verbs in LI in English. Our results show that both classes are equally possible, independent of whether the post-verbal subject is heavy or light. If the subject is heavy, it has to be extraposed with both types of verbs. If the subject is light, it can occur in an intraposed position, preceding a manner adverbial. Current syntactic analyses of LI cannot account for the possibility of intraposed word order with unergative verbs. We have presented two alternative accounts of the observed data, Salzmann's (2009) analysis and our own proposal involving PF movement; more work is needed to distinguish between them. Our results show that the distinction between unergative and unaccusative verbs is not relevant for the restrictions on verb classes in LI, so a different approach is needed to account for the restrictions reported in the literature. We hypothesize that such a restriction on verbs in LI is closely linked to the information-structural requirements of LI.

11. For a discussion of change-of-state verbs in LI, see Landau (2010: 121–123).

References

Arnold, Jennifer E., Thomas Wasow, Anthony Losongco & Ryan Ginstrom 2000 Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language 76: 28–55.
Birner, Betty 1992 The discourse function of inversion in English. Doctoral dissertation, Northwestern University, Evanston, IL.
Birner, Betty 1994 Information status and word order: An analysis of English inversion. Language 70: 733–759.
Birner, Betty 1995 Pragmatic constraints on the verb in English inversion. Lingua 97: 233–256.
Birner, Betty & Gregory Ward 1998 Information Status and Noncanonical Word Order in English. Amsterdam/Philadelphia: John Benjamins.
Bolinger, Dwight 1971 A further note on the nominal in the progressive. Linguistic Inquiry 2: 584–586.
Bolinger, Dwight 1977 Meaning and Form. London: Longman.
Bowers, John 2001 Predication. In: Mark Baltin and Chris Collins (eds.), The Handbook of Contemporary Syntactic Theory, 299–333. Malden, MA/Oxford: Blackwell.
Bresnan, Joan W. 1994 Locative inversion and the architecture of Universal Grammar. Language 70 (1): 72–131.
British National Corpus (BNC) 2001 Version 2 (World Edition). Oxford University Computing Services on behalf of the BNC Consortium.
Broekhuis, Hans 2008 Derivations and Evaluations: Object Shift in the Germanic Languages. Berlin/New York: Mouton de Gruyter.
Cheng, Rong 2003 English Inversion: A Ground-before-Figure Construction. Berlin/New York: Mouton de Gruyter.
Collins, Chris 1997 Local Economy. Cambridge, MA: MIT Press.
Coopmans, Peter 1989 Where stylistic and syntactic processes meet: Locative inversion in English. Language 65 (4): 738–751.
Culicover, Peter W. & Robert D. Levine 2001 Stylistic inversion in English: A reconsideration. Natural Language and Linguistic Theory 19: 283–310.
Culicover, Peter W. & Susanne Winkler 2008 English focus inversion. Journal of Linguistics 44: 625–658.
Den Dikken, Marcel 2006 Relators and Linkers: The Syntax of Predication, Predicate Inversion, and Copulas. Cambridge, MA: MIT Press.
Göbbel, Edward 2010 Prosodically-conditioned rightward movement: The case of relative clauses. Talk presented at “Focus, Contrast and Givenness in Interaction with Extraction and Deletion.”
Göbbel, Edward to appear Extraposition of defocused and light PPs in English. In: Manfred Sailer, Heike Walker, and Gert Webelhuth (eds.), Rightward Movement from a Cross-Linguistic Perspective. Amsterdam: John Benjamins.
Hartmann, Jutta M. 2008 Expletives in Existentials: English there and German da. Utrecht: LOT.
Heck, Fabian & Gereon Müller 2007 Derivational optimization of wh-movement. Linguistic Analysis 33 (1–2): 97–148.
Hoekstra, Teun & René Mulder 1990 Unergatives as copular verbs: Locational and existential predication. The Linguistic Review 7: 1–79.
Horn, Laurence R. 1984 Toward a new taxonomy for pragmatic inference: Q-based and R-based implicature. In: Deborah Schiffrin (ed.), Meaning, Form, and Use in Context: Linguistic Applications, 11–42. Washington, DC: Georgetown University Press.
Kuno, Susumu & Ken-Ichi Takami 2004 Functional Constraints in Grammar: On the Unergative–Unaccusative Distinction. Amsterdam/Philadelphia: John Benjamins.
Landau, Idan 2010 The Locative Syntax of Experiencers. Cambridge, MA: MIT Press.
Lessing, Doris 2007 The Grass is Singing. London/New York/Toronto/Sydney/New Delhi: Harper Perennial.
Levin, Beth & Malka Rappaport Hovav 1995 Unaccusativity at the Syntax–Lexical Semantics Interface. Cambridge, MA: MIT Press.
Levin, Lorraine 1986 Operations on lexical forms: Unaccusative rules in Germanic languages. Doctoral dissertation, MIT.
Perlmutter, David M. 1978 Impersonal passives and the unaccusative hypothesis. Proceedings of the Fourth Annual Meeting of the Berkeley Linguistics Society, 157–189. Berkeley: Berkeley Linguistics Society, University of California, Berkeley.
Perlmutter, David M. & Paul Postal 1984 The 1-advancement exclusiveness law. In: David M. Perlmutter and Carol G. Rosen (eds.), Relational Grammar 2, 81–125. Chicago: University of Chicago Press.
Postal, Paul 1977 About a non-argument for raising. Linguistic Inquiry 8: 141–154.
Postal, Paul 2004 Skeptical Linguistic Essays. Oxford: Oxford University Press.
Prince, Ellen 1992 The ZPG letter: Subjects, definiteness, and information-status. In: Sandra Thompson and William Mann (eds.), Discourse Description: Diverse Analyses of a Fundraising Text, 295–325. Amsterdam: John Benjamins.
Rochemont, Michael S. 1978 A theory of stylistic rules in English. Doctoral dissertation, University of Massachusetts, Amherst.
Rochemont, Michael S. 1986 Focus in Generative Grammar. Amsterdam/Philadelphia: John Benjamins.
Rochemont, Michael S. & Peter W. Culicover 1990 English Focus Constructions and the Theory of Grammar. Cambridge/New York: Cambridge University Press.
Salzmann, Martin 2009 Repair-driven verb movement in English locative inversion. In: Patrick Brandt and Eric Fuss (eds.), Repairs. Berlin: Mouton.

Part 3: Cognitive and neurological basis of language

Word- vs. sentence-based simulation effects in language comprehension
Barbara Kaup, Jana Lüdtke & Ilona Steiner

1. Introduction1

In the literature on language comprehension, many authors nowadays assume that comprehenders understand language by mentally simulating the described objects, events and situations. These simulations are assumed to be experiential in nature, as they are grounded in perception and action (Barsalou, 2008; Glenberg & Kaschak, 2002; Zwaan, 2004). More specifically, according to this simulation view of language comprehension, each interaction with the world leaves experiential traces in the brain. These traces are partially re-activated when people read or hear words referring to the respective entities. If words appear in larger phrases or sentences, the activated traces are presumably combined to yield simulations consistent with the meaning of the larger phrase or sentence (Zwaan & Madden, 2005). There is a steadily growing body of evidence for this view. On the one hand, there are neuroscience studies indicating a considerable overlap between the mental subsystems utilized in representing linguistically specified states of affairs and those utilized in direct experience. For instance, studies using brain imaging techniques have shown that the processing of linguistic materials referring to actions that are typically performed with certain effectors (e.g. to lick, to kick, to grasp) activates those sections of the premotor and motor cortex that are specific for actions with the respective effector (Hauk, Johnsrude & Pulvermüller 2004; Tettamanti et al. 2005). Similarly, studies using transcranial magnetic stimulation have found that motor evoked potentials recorded from hand and foot muscles are specifically modulated by listening to hand-action-related vs. foot-action-related sentences, respectively (e.g. He threw/kicked the ball; Buccino et al. 2005; Glenberg et al. 2008).

1. We thank the students of the course “Experimental Methods in Linguistics”, especially I. Andris, H. Bischoff, B. Blankenhorn, J. Boegl, M. Fan, C. Hitzigrath, M. Joachim, S. Maile, A. Plätzer, K. Winkler and L. Riester, for their help in material construction and data collection. We also thank Monica de Filippis and three anonymous reviewers for their very helpful comments on an earlier version of this manuscript. The work reported in this chapter was supported by a grant from the German Research Foundation awarded to the first author (SFB 833; Project B4).

On the other hand, numerous behavioural studies have provided evidence that linguistic and non-linguistic cognition interact. A particularly elegant paradigm was introduced by Glenberg & Kaschak (2002). In a sentence-sensibility judgment task, participants were presented with sentences that described an action involving a movement either towards or away from the body (e.g. You opened/closed the drawer). For half of the participants the correct response involved a movement towards their body, for the other half a movement away from their body. Thus, the movement implied by the sentence either matched or mismatched the required response movement. In line with the idea that comprehenders mentally simulate the described actions when understanding the sentence, reading times were significantly faster in the match than in the mismatch conditions. Similar effects have been found in studies presenting isolated words. For instance, processing words like up vs. down or towards vs. away is facilitated if the correct response requires a matching rather than a mismatching movement (e.g. Lindsay 2007). Also, for words referring to entities typically encountered in the upper vs. lower part of the visual field (e.g. hair vs. shoe), processing is facilitated when responding correctly requires an up vs. down response, respectively (e.g. Borghi, Glenberg & Kaschak 2004; Lachmair, Dudschig, de Filippis, de la Vega & Kaup, in press; see also Estes, Verges & Barsalou, 2008). In addition to these studies, which provide evidence for the simulation view of language comprehension with respect to motor aspects, there are many behavioural studies providing evidence with respect to perceptual aspects of described states of affairs. For instance, in a study by Stanfield & Zwaan (2001), participants read sentences referring to a particular target entity.
The sentences implied either a horizontal or a vertical orientation of the target entity (e.g. (1) and (2), respectively). Responding to a subsequently presented picture of the target entity was facilitated if the picture matched the orientation implied by the sentence. Similar results were obtained for the shape of the entities mentioned in a sentence. Zwaan, Stanfield & Yaxley (2002) presented sentences such as (3) and (4), which, depending on the last word in the sentence, implied different shapes of the target entity. Picture-recognition and picture-naming latencies were significantly faster if the depicted shape matched the implied shape (i.e. an eagle with wings outstretched for (3), drawn in for (4)) than when it mismatched. The results of these latter studies fit nicely with the idea that readers mentally simulate the described state of affairs when comprehending the sentences: matching pictures are primed by the simulations activated during sentence reading, which may well be why latencies are faster in the match than in the mismatch condition.

(1) He hammered the nail into the wall.
(2) He hammered the nail into the ceiling.
(3) The ranger saw the eagle in the sky.
(4) The ranger saw the eagle in the nest.

As mentioned above, the simulation view of language comprehension assumes that words activate experiential traces stemming from encounters with the entities they refer to. When the words are part of a larger phrase, sentence or text, these traces are presumably combined to yield simulations of the described state of affairs. So far, however, little attention has been devoted to this composition process. The mechanisms by which experiential traces are combined to yield simulations of more complex states of affairs are still unclear; we do not even know whether these mental simulations are created by a compositional process at all. The evidence currently available in the literature does not provide a good basis for reasoning about mental simulations and compositionality. One reason is that for many empirical results it is not entirely clear which level of comprehension they reflect. To illustrate, let us return to the finding that listening to hand-action-related vs. foot-action-related sentences modulates motor evoked potentials recorded from hand and foot muscles, respectively. In principle, this effect may be a word-based effect that is solely due to the verb in the sentence (e.g. throw, kick). Alternatively, it may be a sentence-based effect reflecting the fact that throwing a ball is performed with the arm and kicking a ball with the foot (as opposed to, for instance, throwing a tantrum or kicking the bucket). A similar argument can be made for many of the studies. For instance, the match/mismatch effects observed in the orientation and shape studies mentioned above are usually interpreted as sentence-based: a picture of a horizontally oriented nail is easier to process after reading (1) than after reading (2) because (1) but not (2) describes a situation in which the nail is horizontally oriented.
Admittedly, as the orientation is not mentioned explicitly in the sentences, an account that attributes the effect to an individual word of the sentences is not possible. However, the effect may still be word-based rather than sentence-based: in principle, the word nail in combination with the word wall might activate an experiential trace of a horizontally oriented nail simply because this combination of words has occurred more often in situations in which the nail was horizontally oriented. The same may be true for the eagle.


In combination with the word nest, the word eagle may activate a trace of an eagle with its wings drawn in, whereas in combination with sky it may activate a trace of an eagle with its wings outstretched. If so, the effect would not be based on a simulation of the described situation (sentence-based effect) but on traces activated by the combination of words in the sentence (word-based effect). In what follows, we use the term sentence-based effect when the relevant variable is the meaning of the sentence as a whole, in other words, the state of affairs described by the sentence. We use the term word-based effect when the relevant variable is the bag of words that make up the sentence, with the syntactic relations between the words being irrelevant. Thus, for a word-based effect the difference between sentences (5) and (6) should be irrelevant, because the same words are mentioned. For a sentence-based effect, in contrast, the difference should be relevant, as the two sentences describe rather different situations. Moreover, a word-based effect should even arise with a stimulus such as (7), which simply presents a list of words that do not make up a grammatical sentence.

(5) At the dance Sarah wore a red dress and black shoes.
(6) At the dance Sarah wore a black dress and red shoes.
(7) At a wore shoes the black and dance Sarah dress red.
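The distinction between the two kinds of effect can be made concrete by comparing what each account "sees" in (5) to (7). The sketch below is illustrative only: a bag of words (an order-free multiset of lower-cased tokens) is identical for all three stimuli, whereas even a very crude stand-in for structure, pairing each colour word with the token that follows it, separates (5) from (6). The adjacency heuristic and the helper names (`tokens`, `bag_of_words`, `colour_noun_pairs`) are assumptions made for this illustration, not a model of sentence meaning.

```python
from __future__ import annotations

import re
from collections import Counter

S5 = "At the dance Sarah wore a red dress and black shoes."
S6 = "At the dance Sarah wore a black dress and red shoes."
S7 = "At a wore shoes the black and dance Sarah dress red."

COLOURS = {"red", "black"}


def tokens(text: str) -> list[str]:
    """Lower-cased word tokens in their original order."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Order-free multiset of tokens: all that a purely word-based account has access to."""
    return Counter(tokens(text))


def colour_noun_pairs(text: str) -> set[tuple[str, str]]:
    """Crude, hypothetical proxy for structure: each colour word paired with the next token."""
    toks = tokens(text)
    return {(a, b) for a, b in zip(toks, toks[1:]) if a in COLOURS}


if __name__ == "__main__":
    print(bag_of_words(S5) == bag_of_words(S6) == bag_of_words(S7))  # True
    print(sorted(colour_noun_pairs(S5)))  # [('black', 'shoes'), ('red', 'dress')]
    print(sorted(colour_noun_pairs(S6)))  # [('black', 'dress'), ('red', 'shoes')]
```

A word-based account predicts the same behaviour for all three stimuli because their bags are equal; a sentence-based account predicts a difference between (5) and (6), and no coherent situation at all for (7).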

Whether the effects are word- or sentence-based has theoretical implications. To explain sentence-based effects, one needs to assume that comprehenders compose the meaning of the stimulus sentence and mentally simulate the described state of affairs. No such assumption is required to explain word-based effects: it suffices to assume that combinations of words activate situation-specific experiential traces of their referents, independent of sentence meaning. Sentence meaning in this case may be composed by a propositional mechanism that is linguistic in nature and independent of the modal systems, and experiential simulations may constitute only an optional by-product of comprehension rather than a functional component. We will come back to this issue in the general discussion. The aim of the experiments reported in this chapter was to find out whether the effects observed in the studies by Stanfield & Zwaan (2001) and Zwaan et al. (2002) are word- or sentence-based. The logic underlying the experiments was as follows. If the effects are sentence-based, then match/mismatch effects should depend on the orientation or shape of the target entity as implied by the sentence as a whole. In contrast, if the effects are word-based, match/mismatch effects should depend only on the particular words mentioned in the sentences, and should be independent of the sentence meaning or the particular state of affairs described by the sentence. In addition, match/mismatch effects should then be observed even in an experimental paradigm that involves no sentences at all but instead requires participants to process a list of relevant content words.
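The logic just described yields different predictions for three kinds of stimuli. The toy sketch below encodes them, assuming a hypothetical lexicon in which the context nouns Wasserbecher ('water mug') and Malkasten ('paint box') from the paint-brush item are associated with a vertical and a horizontal brush, respectively. The word-based prediction looks only at an unordered set of content words and expects an effect when those words cue exactly one orientation; the sentence-based prediction expects an effect whenever a sentence meaning implies an orientation. The one-critical-word sentence and the word list are hypothetical versions of the item, and all names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical co-occurrence lexicon: orientation of "Pinsel" (paint brush)
# that each context noun is assumed to be associated with.
TRACE_ORIENTATION = {"Wasserbecher": "vertical", "Malkasten": "horizontal"}


@dataclass(frozen=True)
class Stimulus:
    content_words: frozenset[str]  # a set, so word order is invisible here
    is_sentence: bool              # grammatical sentence vs. scrambled word list
    implied: str | None            # orientation implied by sentence meaning, if any


def word_based_effect(s: Stimulus) -> bool:
    """Match/mismatch effect expected only if the context words cue one orientation."""
    cued = {TRACE_ORIENTATION[w] for w in s.content_words if w in TRACE_ORIENTATION}
    return len(cued) == 1


def sentence_based_effect(s: Stimulus) -> bool:
    """Effect expected only if a sentence meaning implies an orientation."""
    return s.is_sentence and s.implied is not None


CONDITIONS = {
    # Zwaan-style sentence naming only the decisive noun (hypothetical version)
    "one_critical_word": Stimulus(frozenset({"Maria", "Pinsel", "Wasserbecher"}), True, "vertical"),
    # like sentence (8): both critical nouns present, vertical orientation implied
    "both_critical_words": Stimulus(
        frozenset({"Maria", "Pinsel", "Wasserbecher", "Malkasten"}), True, "vertical"),
    # content words without sentence structure (hypothetical word list)
    "word_list": Stimulus(frozenset({"Maria", "Pinsel", "Wasserbecher"}), False, None),
}

if __name__ == "__main__":
    for name, stim in CONDITIONS.items():
        print(f"{name}: word-based={word_based_effect(stim)}, "
              f"sentence-based={sentence_based_effect(stim)}")
```

The two views agree only for the one-critical-word sentence, which is why conditions with both critical words and word lists are the informative ones.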

2. Experiment 1

In Experiment 1, we presented participants with sentences that mentioned a particular target entity and implied that this entity was in a certain orientation or shape (e.g. (8)–(11)). In the Zwaan studies, the differences in implied shape or orientation were achieved by using different nouns in the two sentence versions (e.g. sky vs. nest in (3) and (4) above). According to the word-based hypothesis, these nouns are critical for the occurrence of the match/mismatch effect; in the following, we therefore call them critical nouns. In contrast to the Zwaan studies, the present experiment used the same critical words in both sentence versions, and differences in the implied shape or orientation resulted from differences in word order and syntax. If the match/mismatch effect for orientation and shape is sentence-based, we should be able to replicate it in the present experiment (sentence-based view), since the sentences clearly implied a different shape or orientation depending on sentence version. In contrast, if the effect is word-based and due to the co-occurrence of the critical words (word-based view), it should not replicate in the present study: both critical words are present in both versions of the sentences. The two sentence versions thus do not differ in the relevant content words, so a word-based effect cannot arise once the sentence has been read.

(8) Maria entdeckt den Pinsel im Wasserbecher neben dem Malkasten.
‘Mary finds the paint brush in the water mug next to the paint box.’
(9) Maria entdeckt den Pinsel im Malkasten neben dem Wasserbecher.
‘Mary finds the paint brush in the paint box next to the water mug.’
(10) Der Wanderer fotografiert den Adler im roten Abendhimmel über dem Nest.
‘The hiker takes a picture of the eagle in the red evening sky above the nest.’
(11) Der Wanderer fotografiert den Adler im Nest vor dem roten Abendhimmel.
‘The hiker takes a picture of the eagle in the nest in front of the red evening sky.’

2.1. Method

2.1.1. Participants
Fifty-two people participated in the study, all with normal or corrected-to-normal vision.

2.1.2. Materials
A total of 32 experimental sentence pairs were constructed. Each pair mentioned a particular target entity (e.g. paint brush), and the two versions of each pair differed in the implied shape or orientation of the target entity. For instance, (8) clearly implies a vertical orientation of the paint brush, whereas in (9) the implied orientation is clearly horizontal. Importantly, the two versions of each pair mentioned the same nouns and verbs; the differences in implied shape or orientation were achieved by syntactic manipulations, i.e. by changing the word order or by exchanging spatial prepositions. We deliberately used a variety of sentence structures to prevent readers from applying strategies when processing the sentences. Thus, some sentences had the structure exemplified in (8)–(11), whereas others, for instance, employed infinitival purpose clauses, as in (12)–(13).

(12) Der Pfadfinder benutzt die Leiter, um über die Grube neben dem Apfelbaum zu kommen.
‘The boy scout uses the ladder to get across the pit beside the apple tree.’
(13) Der Pfadfinder benutzt die Leiter, um auf den Apfelbaum neben der Grube zu kommen.
‘The boy scout uses the ladder to get onto the apple tree beside the pit.’

Also, the two critical nouns were mentioned in different orders across the experimental sentence pairs. In some sentences, the noun that was decisive for the orientation or shape of the target entity (i.e. water mug in (8), paint box in (9)) was mentioned before the other noun (i.e. paint box in (8), water mug in (9)); in others, the order was reversed, as in (14)–(15).


(14) Der Mann schlägt den Nagel dicht unter der Holzdecke in die Wand.
‘The man pounds the nail close to the wooden ceiling into the wall.’
(15) Der Mann schlägt den Nagel dicht an der Wand in die Holzdecke.
‘The man pounds the nail close to the wall into the wooden ceiling.’

For each experimental sentence pair, two black-and-white images were also constructed, depicting the target object in the two implied shapes or orientations.2 This yielded two sentences and two pictures for each of the 32 target objects. Each experimental sentence could be paired with a picture that matched or mismatched the implied shape or orientation of the target object, yielding four possible sentence–picture combinations (see Figure 1). Each participant saw only one of these four combinations for each target object (see below). In addition, 32 filler sentences were constructed; these were followed by pictures of objects not named in any of the sentences. All pictures were scaled to occupy a 3-inch square on the screen. Finally, 16 comprehension questions were constructed, 8 of which were presented following experimental items and 8 following filler items. Half of the questions required a “yes” response and the other half a “no” response. In summary, each participant saw 32 experimental sentences paired with pictures that required a “yes” response, and 32 filler sentences paired with pictures that required a “no” response. In half of the experimental trials, the picture matched the shape or orientation of the target object as implied by the sentence; in the other half, it mismatched with respect to shape or orientation.
To make sure that the two sentences in each experimental pair indeed differed with respect to the picture they matched, we gave four participants who did not take part in the experiment proper a list of all 64 experimental sentences together with a list of all 64 pictures. We underlined the word for the target entity in each sentence and asked participants to select the picture that best matched the target entity as described in the sentence. The sentence–picture mappings were as intended: deviations occurred in fewer than 3% of the cases (7 out of 256 answers).

2. Many of the pictures employed in the present studies were used in the original studies by Stanfield & Zwaan (2001) and Zwaan et al. (2002). We are grateful to the authors of these studies for giving us access to those pictures.


2.1.3. Design and Procedure
We created four lists that counterbalanced items and conditions. Each list included a different one of the four possible versions (2 sentences × 2 pictures) of each item, and each participant saw one of these lists. In two of the four versions, the picture matched the shape or orientation of the target object described in the sentence; in the other two, the picture mismatched in orientation or shape (see Figure 1). For the statistical analyses, we collapsed the two match versions and the two mismatch versions, resulting in a design with a single two-level factor (match vs. mismatch).
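One common way to build such lists is a Latin-square rotation, in which list k presents item i in version (i + k) mod 4. The text does not say how the four lists were generated, so the sketch below is only a plausible reconstruction under that assumption; it also assumes that picture version "A" depicts the shape or orientation implied by sentence version "A", and the names `VERSIONS`, `is_match` and `build_lists` are hypothetical.

```python
from collections import Counter
from itertools import product

# The four versions of an item as (sentence version, picture version).
# Assumption: picture "A" depicts the shape/orientation implied by sentence "A".
VERSIONS = list(product("AB", "AB"))  # [('A','A'), ('A','B'), ('B','A'), ('B','B')]


def is_match(version):
    """A version is a match condition when sentence and picture versions agree."""
    sentence, picture = version
    return sentence == picture


def build_lists(n_items=32):
    """Latin-square rotation: list k shows item i in version (i + k) mod 4."""
    n_versions = len(VERSIONS)
    return [
        [(item, VERSIONS[(item + k) % n_versions]) for item in range(n_items)]
        for k in range(n_versions)
    ]


if __name__ == "__main__":
    for k, lst in enumerate(build_lists()):
        n_match = sum(is_match(version) for _, version in lst)
        print(f"list {k}: {n_match} match, {len(lst) - n_match} mismatch")
```

With 32 items, every list contains each version eight times (hence 16 match and 16 mismatch trials), and across the four lists each item appears once in every version.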

Figure 1. Sample materials employed in Experiment 1.

Participants were instructed to read each sentence and then to decide whether or not the pictured object that followed had been mentioned in the preceding sentence. They were informed that reaction times and accuracy were being measured and that it was important to make their decisions about the pictures as quickly and accurately as possible. On each trial, participants first saw a sentence, left-justified on the screen, which either did or did not mention the object they would later see. They pressed the space bar when they had understood the sentence; a fixation point then appeared at the centre of the screen for 1500 ms, followed by a picture. Participants then indicated whether the pictured object had been mentioned in the preceding sentence by pressing the appropriate key: the “.” key, marked “J” for Ja (‘yes’), or the “x” key, marked “N” for Nein (‘no’). In trials with a comprehension question, the question was presented next, and participants responded by pressing the “y” or “n” key. Participants received no feedback on their responses. The comprehension questions were included to make sure that participants read the sentences for comprehension rather than attending only to the words. The experiment took approximately 20 minutes to complete.

2.2. Results and discussion

Response latencies of experimental trials were submitted to two paired-samples t-tests, one treating participants as the random factor (t1) and one treating items as the random factor (t2). The latency analysis was performed on correct responses only, and responses longer than 6000 ms were omitted. In identifying outliers among the remaining latencies, we took into account differences not only among participants but also among items, using a two-step procedure. First, the valid latencies of each participant were converted to z-scores. Then, latencies whose z-score deviated by more than 2 standard deviations from the mean z-score of the respective item in the respective condition were discarded. This eliminated 6% of the data. The mean latencies and accuracy scores in the match and mismatch conditions are displayed in Table 1, together with the 95% confidence interval for within-participant designs (Masson & Loftus 2003). Overall, participants responded in the picture-recognition task with a mean latency of 1058 ms and a mean accuracy of 97%. The comprehension questions were answered with a mean accuracy of 84%. Contrary to the predictions of the sentence-based view, latencies in the picture-recognition task did not depend on whether the picture matched or mismatched the orientation or shape implied by the sentence (t1(51) = −0.32; t2(31) = −0.43; both ps > 0.60). This also held when we included only those participants who answered at least two thirds of the comprehension questions correctly (t1(48) = −0.95; t2(31) = −0.98; both ps > 0.17, one-tailed). When we analysed only those 28 participants who made two or fewer errors on the comprehension questions, picture-recognition times were nearly 30 ms faster in the match than in the mismatch condition (1076 vs. 1105 ms), but this difference was still not significant (t1(27) = −1.013; t2(31) = −0.73; both ps > 0.16, one-tailed).
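The two-step trimming procedure can be sketched as follows. This is one reading of the description above, not the authors' script: we assume that "standard deviations" in the second step refers to the standard deviation of the z-scores within each item × condition cell, and we retain every trial of a cell (or participant) whose standard deviation is zero or undefined. The function name and the trial-dictionary keys are hypothetical.

```python
import statistics
from collections import defaultdict


def trim_latencies(trials, max_rt=6000.0, criterion=2.0):
    """Two-step outlier trimming (one reading of the procedure described above).

    trials: iterable of dicts with keys 'participant', 'item', 'condition',
            'rt' (ms) and 'correct' (bool). Returns the retained trials as new
            dicts with an added 'z' key (participant-wise z-score).
    """
    # Step 0: correct responses only; drop latencies above the absolute cutoff.
    valid = [dict(t) for t in trials if t["correct"] and t["rt"] <= max_rt]

    # Step 1: z-standardize latencies within each participant.
    rts_by_participant = defaultdict(list)
    for t in valid:
        rts_by_participant[t["participant"]].append(t["rt"])
    participant_stats = {
        p: (statistics.mean(rts), statistics.stdev(rts) if len(rts) > 1 else 0.0)
        for p, rts in rts_by_participant.items()
    }
    for t in valid:
        mean_rt, sd_rt = participant_stats[t["participant"]]
        t["z"] = (t["rt"] - mean_rt) / sd_rt if sd_rt > 0 else 0.0

    # Step 2: within each item x condition cell, discard z-scores lying more
    # than `criterion` SDs away from the cell's mean z-score.
    zs_by_cell = defaultdict(list)
    for t in valid:
        zs_by_cell[(t["item"], t["condition"])].append(t["z"])
    cell_stats = {
        cell: (statistics.mean(zs), statistics.stdev(zs) if len(zs) > 1 else 0.0)
        for cell, zs in zs_by_cell.items()
    }
    kept = []
    for t in valid:
        mean_z, sd_z = cell_stats[(t["item"], t["condition"])]
        if sd_z == 0 or abs(t["z"] - mean_z) <= criterion * sd_z:
            kept.append(t)
    return kept
```

Standardizing within participants first removes overall speed differences between people, so the second step can flag responses that are unusual for a given item and condition rather than merely slow or fast participants.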
The different sentence versions clearly implied different shapes or orientations of the target object. Thus, if readers mentally simulated the sentence content and this simulation facilitated or hindered processing of a matching or mismatching picture (sentence-based view), we should have observed a match/mismatch effect in this experiment. Instead, the results are in line with the word-based hypothesis, according to which the match/mismatch effect arises because reading the word for the target object (e.g. paint brush) activates different experiential traces depending on the other words in the context. If mug is present in the context, a trace with a vertical brush is activated. If box is present, a trace with a horizontal brush is activated. In this experiment, both of the critical words were present in the sentence, and thus both of the


Barbara Kaup, Jana Lüdtke & Ilona Steiner

traces should be equally active according to the word-based hypothesis, leading to the observed null effect.

Table 1. Mean latencies (M, in ms) and accuracy scores (Acc) in the match and mismatch conditions of Experiments 1–3. The size of the confidence interval (CI95%) was determined according to Masson & Loftus (2003).

Experiment | Match M | Match Acc | Mismatch M | Mismatch Acc | CI95%
Exp. 1: both critical words | 1054 | 0.98 | 1059 | 0.97 | ±21.6
Exp. 2: both critical words | 1239 | 0.98 | 1237 | 0.97 | ±42.4
Exp. 2: one critical word | 1115 | 0.97 | 1191 | 0.95 | ±39.4
Exp. 3: word lists | 947 | 0.83 | 1021 | 0.80 | ±17.25
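The within-participants confidence intervals in Table 1 follow Masson & Loftus (2003). For a design with a single within-participants factor, the half-width is t_crit × √(MSE / n), where MSE is the participant × condition interaction mean square from the repeated-measures ANOVA, n is the number of participants, and t_crit is the two-tailed critical t for df = (n − 1)(J − 1) with J conditions. The sketch below assumes a matrix of per-participant condition means; the function name `within_ci_halfwidth` is hypothetical, and the critical t is passed in (for example from a t table, or `scipy.stats.t.ppf(0.975, df)` if SciPy is available) to keep the example dependency-free.

```python
from math import sqrt


def within_ci_halfwidth(data, t_crit):
    """Masson & Loftus (2003)-style CI half-width for a one-factor
    within-participants design (sketch).

    data:   list of rows, one per participant, each holding that
            participant's mean latency in each condition.
    t_crit: two-tailed critical t for df = (n - 1) * (J - 1).
    Returns (mse, df, half_width).
    """
    n = len(data)      # participants
    j = len(data[0])   # conditions
    grand = sum(sum(row) for row in data) / (n * j)
    row_means = [sum(row) / j for row in data]
    col_means = [sum(row[c] for row in data) / n for c in range(j)]

    # Participant x condition interaction sum of squares.
    ss_inter = sum(
        (data[i][c] - row_means[i] - col_means[c] + grand) ** 2
        for i in range(n) for c in range(j)
    )
    df = (n - 1) * (j - 1)
    mse = ss_inter / df
    return mse, df, t_crit * sqrt(mse / n)
```

For two conditions, as in the match/mismatch comparison, this MSE equals half the sample variance of the participants' mismatch − match difference scores, which is a convenient cross-check.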

Of course, it is always difficult to draw conclusions from a null effect. In principle, our experimental design may not have been powerful enough to detect the effect, or our stimuli may not have been adequate for obtaining match/mismatch effects. We therefore conducted Experiment 2, in which we directly compared the conditions of the present experiment with conditions for which both views predict a match/mismatch effect.

3.

Experiment 2

In Experiment 2, we manipulated within one experiment whether, in addition to the name of the target entity, only the relevant critical word was mentioned (as in the Zwaan experiments; e.g., (1)–(4)) or whether the other critical word was mentioned as well (as in Experiment 1; e.g., (8)–(11)). If the word-based hypothesis is correct, the match/mismatch effect should interact with the number of critical words mentioned in the sentence. If only the critical word that is relevant to the target entity’s orientation or shape is mentioned, pictures in the match conditions should yield faster latencies than those in the mismatch conditions. If both critical words are mentioned, no latency difference should be observed.

3.1.

Method

3.1.1. Participants

Thirty-two people participated in the study, all with normal or corrected-to-normal vision.


3.1.2. Materials

A total of 32 experimental items were constructed. Each experimental item was available in four versions. Two of the four versions were identical with respect to the nouns and verbs mentioned in the sentences but differed in sentential content. These versions mentioned both critical words; differences in the implied shape or orientation of the target entity resulted from differences in word order or spatial prepositions. These “both-critical-words” versions corresponded to the versions employed in Experiment 1 (cf. (8)–(11) above). The other two versions differed both in sentential content and in the words used in the sentences. These versions mentioned only one of the two critical words; differences in the implied shape or orientation of the target entity resulted from the use of different words. These versions corresponded to the versions used in the original studies by Stanfield & Zwaan (2001) and Zwaan et al. (2002), and are exemplified by (16)–(19).

(16) Maria entdeckt den Pinsel im Atelier im Wasserbecher.
‘Mary finds the paint brush in the studio in the water mug.’
(17) Maria entdeckt den Pinsel im Atelier im Malkasten.
‘Mary finds the paint brush in the studio in the paint box.’
(18) Der Wanderer fotografiert den Adler im roten Abendhimmel im Park.
‘The hiker takes a picture of the eagle in the red evening sky in the park.’
(19) Der Wanderer fotografiert den Adler im Nest im Park.
‘The hiker takes a picture of the eagle in the nest in the park.’

As in Experiment 1, two black-and-white images depicting the target object in the two implied shapes or orientations were available for each experimental item. This yielded four sentences and two pictures for each of the 32 target objects. Thus, each experimental sentence could be paired with a picture that matched or mismatched the implied shape or orientation of the target object, yielding eight possible sentence–picture combinations.
The four combinations involving the “both-critical-words” versions were presented to one group of participants, and the four combinations involving the “one-critical-word” versions to the other group. For each target object, participants saw only one of these four combinations. As in Experiment 1, participants also saw 32 filler items with pictures depicting an object not mentioned in any of the sentences. Again, there were comprehension questions for 16 of the 64 trials in the experiment.


3.1.3. Design and Procedure

The design was the same as in Experiment 1, except that there were two groups of participants: one received the sentence–picture combinations with both critical words in the sentence, the other the combinations with only one critical word. For each of the two groups, four lists were created that counterbalanced items and conditions. Each list included a different one of the group’s four versions (2 sentences × 2 pictures) for each object. Each participant saw one of these lists. For two of the four combinations, the picture matched the shape and orientation of the target object as described in the sentence; for the other two, the picture mismatched the target object’s orientation or shape (see Figure 2). For the statistical analyses we again combined the two match and the two mismatch conditions, resulting in a 2 (group: both-critical-words vs. one-critical-word) × 2 (match/mismatch) design.
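One common way to build such counterbalanced lists is a Latin-square rotation, sketched below under stated assumptions: the text does not say how items were assigned to lists, so the rotation rule (item k receives version (k + l) mod 4 on list l), the version labels in `VERSIONS`, and the function names are all hypothetical. The rotation guarantees that each list contains every version equally often and that, across the four lists, every item appears once in each version.

```python
# Hypothetical labels for one group's four versions (2 sentences x 2 pictures).
# Sentence "A" implies the orientation/shape shown in picture "A", and so on.
VERSIONS = [("A", "A"), ("A", "B"), ("B", "B"), ("B", "A")]


def build_lists(n_items, n_lists=4):
    """Latin-square rotation: lists[l][k] is the version index shown for item k on list l."""
    return [[(k + l) % len(VERSIONS) for k in range(n_items)]
            for l in range(n_lists)]


def is_match(version_index):
    """A version is a 'match' when the picture shows the implied shape/orientation."""
    sentence, picture = VERSIONS[version_index]
    return sentence == picture


lists = build_lists(32)
for number, items in enumerate(lists, start=1):
    n_match = sum(is_match(v) for v in items)
    print(f"list {number}: {n_match} match, {len(items) - n_match} mismatch trials")
```

With 32 items, each list then has 16 match and 16 mismatch trials, which is what allows the two match and the two mismatch versions to be collapsed in the analysis.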

Figure 2. Sample materials employed in Experiment 2.

3.2.

Results and discussion

Response latencies of experimental trials in the picture-recognition task were submitted to two analyses of variance (ANOVAs), one treating participants as the random factor and one treating items as the random factor. To reduce error variance, we included the counterbalancing factor ‘list’ in the analyses, resulting in 2 (group: both-critical-words vs. one-critical-word) × 2 (match/mismatch) × 4 (list) analyses with repeated measures on match/mismatch in both the by-participant and the by-item analysis. The latency analysis was performed on correct responses only. Outliers were eliminated according to the same procedure as in Experiment 1, which reduced the data set by less than 4%. One participant was excluded from further analyses because of too many errors in the picture-recognition task (> 50%). The means of the remaining latencies and the accuracy scores in the match and mismatch conditions are displayed in Table 1, together with the 95% confidence interval for within-participants designs (Masson & Loftus 2003). Participants responded to the picture-recognition task with a mean latency of 1197 ms and a mean accuracy of 97%. The comprehension questions were answered with a mean accuracy of 80%. As in Experiment 1, participants’ latencies in the match and mismatch conditions were nearly identical when the sentence mentioned both critical words. In contrast, when the sentence mentioned only one critical word, latencies in the picture-recognition task were 76 ms faster when the picture matched the implied orientation or shape than when it mismatched it. These differences were reflected in the statistical analyses. In the overall analyses, there was a main effect of group, which, however, was significant only in the by-item analysis, F1(1,23) < 1, F2(1,28) = 7.0, p < 0.05. There was no main effect of match/mismatch, F1(1,23) = 2.8, p = 0.11, F2(1,28) = 2.8, p = 0.11. The interaction between match/mismatch and group was only marginally significant in the by-participant analysis, F1(1,23) = 3.1, p = 0.09, F2(1,28) = 1.2, p = 0.28. In line with our hypotheses, we nevertheless conducted separate analyses for the two groups.
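The latency differences cited in the text can be recomputed from the cell means in Table 1 as mismatch minus match, so that positive values mean faster responses to matching pictures; for example, the 76 ms advantage in the one-critical-word condition is 1191 − 1115. The means below are copied from Table 1; the dictionary layout and the helper name `match_advantage` are only illustrative.

```python
# Mean latencies (ms) from Table 1, as (match, mismatch).
TABLE1_MEANS = {
    "Exp. 1: both critical words": (1054, 1059),
    "Exp. 2: both critical words": (1239, 1237),
    "Exp. 2: one critical word": (1115, 1191),
    "Exp. 3: word lists": (947, 1021),
}


def match_advantage(match_ms, mismatch_ms):
    """Mismatch minus match; positive values mean faster responses to matching pictures."""
    return mismatch_ms - match_ms


for condition, (match_ms, mismatch_ms) in TABLE1_MEANS.items():
    print(f"{condition}: {match_advantage(match_ms, mismatch_ms):+d} ms")
```

This prints +5, −2, +76 and +74 ms, making the contrast between the both-critical-words conditions and the remaining conditions easy to see at a glance.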
For the conditions with both critical words, there was no match/mismatch effect, t1(15) = 0.05, p = 0.96, t2(31) = −0.01, p = 0.98. In the conditions with only one critical word, however, the match/mismatch effect was significant in the by-participant analysis and marginally significant in the by-item analysis, t1(16) = −2.4, p < 0.05, t2(31) = −1.4, p = 0.08 (one-tailed). At first sight, the results correspond nicely to the word-based hypothesis. For conditions in which the sentence mentioned both critical words, there was no match/mismatch effect. This is presumably because the two critical words, in combination with the word referring to the target entity, activate conflicting orientations or shapes, thereby eliminating the match/mismatch effect. Put slightly differently, in these conditions the sentences were made up of the same nouns and verbs and should thus be equivalent according to the word-based hypothesis. The null effect in these conditions replicates the null effect observed in Experiment 1. The situation is different for the two additional conditions employed in the current experiment. In these conditions, the sentence mentions only one of the two critical words. In combination with the word referring to the target entity, this critical word presumably activates an experiential trace of the target entity in a particular orientation or shape. This orientation or shape matches the one shown in the picture presented in the match conditions and mismatches the one shown in the mismatch conditions. The latencies in the picture-recognition task are therefore shorter in the match than in the mismatch conditions.

Could the match/mismatch effect have been more pronounced in the one-critical-word condition because participants in this group put more effort into reading the sentences than the both-critical-words group? This hypothesis can be ruled out: when we analysed the comprehension questions that were the same for the two groups (i.e. the questions asked in filler trials), no difference in accuracy between the two groups was observed (t(29) = 0.33, p > 0.70). Differences were observed, however, in the accuracy scores for experimental trials. Here the one-critical-word group significantly outperformed the both-critical-words group (t(29) = −3.33, p < 0.01). This probably reflects the fact that the experimental sentences presented to the both-critical-words group were longer and more complex than those presented to the one-critical-word group. Of course, this difference in complexity may also be a reason why the match/mismatch effect was more pronounced in the one-critical-word condition. Further studies that control for this confound would be required before definite conclusions can be drawn.
Moreover, when interpreting the results in favor of the word-based hypothesis, it should be kept in mind that the main task of the present experiment was word-based: in the picture-recognition task, participants decided whether the depicted entity had been mentioned in the sentence or not. In principle, participants may not have put much effort into comprehending the sentences, and if so, it would be less surprising that sentence-based effects were not observed. Indeed, when we analysed only those participants who correctly answered at least two thirds of the comprehension questions, the pattern of results looks quite different. For these participants we observe no match × group interaction (both Fs < 1) but a significant match effect in the overall analysis (F1(1,21) = 5.47, p < 0.05; F2(1,28) = 6.55, p < 0.05). As shown in Figure 3, both groups show a clear numerical advantage in the match compared to the mismatch condition, but a separate analysis for the both-critical-words group did not reveal a significant match effect (t1(12) = −1.09, p = 0.15; t2(31) = −1.428, p = 0.09). These post-hoc results are difficult to interpret. They could be taken to suggest that sentence-based effects may be observed provided that participants read carefully for comprehension. However, when interpreting the results in this way, it has to be kept in mind that no match/mismatch effect was observed in Experiment 1, not even for participants with high accuracy in the question-answering task.

Figure 3. Match/mismatch effect observed in Experiment 2 for participants with at least 66% accuracy in the comprehension-question task. Error bars represent the 95% confidence interval for within-subject designs (Masson & Loftus 2003).

Taken together, the results of this experiment and those of Experiment 1 do not allow definite conclusions with respect to the word- and sentence-based explanations of the match/mismatch effect. At the very least, however, they suggest that the usual explanation in terms of purely sentence-based processes may fall short. Of course, the two explanations in terms of sentence- and word-based processes are not necessarily contradictory. In principle, the match/mismatch effect may reflect a mixture of word- and sentence-based processes, with sentence-based processes possibly being particularly pronounced when participants read carefully for comprehension and when the experimental task focuses on sentence-based rather than word-based processes.


In Experiments 1 and 2, the experimental task involved sentence reading, and the goal was to find evidence for a word- or sentence-based explanation of the match/mismatch effect. In Experiment 3, we focus exclusively on word-based processes in a task that does not involve sentence reading. The logic is as follows: if word-based processes contribute to the match/mismatch effect, a match/mismatch effect should also be observed as long as the relevant words are processed, even if they do not make up a larger phrase or sentence.

4.

Experiment 3

Participants were presented with lists of words and non-words in a lexical decision task. After each sequence of six items, a picture was presented, and participants decided as quickly as possible whether the depicted object had been mentioned in the sequence or not. In experimental trials, the sequence contained the relevant words of the ‘one-critical-word’ conditions of Experiment 2, intermixed with non-words (cf. (20) and (21)). The picture presented in experimental trials depicted the target entity in an orientation or shape that either matched or mismatched the words in the sequence (see Figure 4).

(20) Pinsel / lorfen / entdecken / Tempe / Wasserbecher / Karumpe.
‘paint brush / lorfing / finding / tempe / water mug / karumpe’
(21) Pinsel / lorfen / entdecken / Tempe / Malkasten / Karumpe.
‘paint brush / lorfing / finding / tempe / paint box / karumpe’

If word-based processes contribute to the match/mismatch effect, such an effect should be observed in this experiment, because the combination of words in experimental sequences presumably activates a particular orientation or shape. If the presented picture matches the activated traces, picture-recognition latencies should be faster than when it mismatches them. In contrast, if the match/mismatch effect were solely due to sentence-based processes, no effect should be obtained in this experiment, in which participants are presented with lists of words, not with sentences.


Figure 4. Sample materials employed in Experiment 3.

4.1.

Method

4.1.1. Participants

Thirty people participated in the study, all with normal or corrected-to-normal vision.

4.1.2. Materials

Eighty-five sequences of six letter strings were constructed. Twenty-eight of these sequences were experimental sequences. Each of these contained three words and three non-words and was available in two versions. The three words in the two versions were taken from the experimental sentence pairs employed in the ‘one-critical-word’ conditions of Experiment 2. In each version, the three words were the word referring to the target entity, the verb of the sentence (in infinitive form), and the critical word of the respective sentence. Two of the three non-words resembled nouns, and the third resembled a verb. The 28 experimental sequences were paired with the 28 corresponding experimental picture pairs from Experiment 2. Thus, each sequence was followed by one of two pictures of the target entity, one with a matching and one with a mismatching orientation or shape. Half of the 56 filler sequences consisted of four words and two non-words, the other half of two words and four non-words. In each filler sequence, one of the words was a verb and one of the non-words resembled a verb. Fourteen of the filler sequences were paired with a picture of an entity mentioned in the sequence (i.e. requiring a ‘yes’ response in the picture-recognition task). The remaining 42 filler sequences were paired with a picture of an entity not mentioned anywhere in the list. Thus, overall, half of the trials required a ‘yes’ and half a ‘no’ response. Also, overall, one third of the trials consisted of three words and three non-words, one third of two words and four non-words, and one third of four words and two non-words.

4.1.3. Design and Procedure

The design was the same as in Experiment 1. We created four lists that counterbalanced items and conditions. Each list included a different one of the four possible versions (2 sequence versions × 2 pictures) for each object. Each participant saw one of these lists. For two of the four versions, the picture matched the shape and orientation of the target object presumably activated by the words in the sequence; for the other two, the picture mismatched the orientation or shape (see Figure 4). For the statistical analyses we combined the two match and the two mismatch conditions, resulting in a 2 (match/mismatch) design.

Participants were told that they would see sequences of letter strings intermixed with pictures. For each string, they were instructed to decide as quickly as possible whether or not it was a word of the German language, and for each picture, they were to decide whether the depicted object had been mentioned in the preceding sequence or not. They were informed that reaction times and accuracy were being measured and that it was important to make the decisions about the pictures as quickly and accurately as possible. In each trial, participants first saw a left-justified letter string on the screen (black font). They decided as quickly as possible whether the string constituted a word by pressing the appropriate key (the “.” key, marked “J” for Ja, ‘yes’, or the “x” key, marked “N” for Nein, ‘no’). A fixation cross (black font) was then displayed for 400 ms, after which the next letter string appeared and participants again decided whether it was a word. Four more letter strings followed in the same manner. After participants had responded to the sixth letter string, a red fixation cross appeared in the center of the screen for 400 ms. Then the picture appeared.
Participants decided whether the depicted object had been mentioned in the preceding sequence of letter strings by pressing the appropriate key (again the “.” key, marked “J” for Ja, ‘yes’, or the “x” key, marked “N” for Nein, ‘no’). Participants were not given feedback on their responses in the experiment proper. The experiment took approximately 20 minutes to complete.

4.2.

Results and discussion

Outlier elimination was performed as in Experiment 1, with two exceptions. First, as the latencies in the picture-recognition task of this experiment (following word lists) were shorter than those in Experiment 1 (following sentences), we omitted responses longer than 4000 ms (rather than 6000 ms as in Experiment 1). Second, as error rates were higher in this experiment, and the number of observations per cell thus varied more strongly, we did not use a fixed cutoff of ±2 standard deviations for determining outliers. Rather, we used different values depending on the number of observations per cell, as suggested by Van Selst & Jolicoeur (1994). This eliminated less than 3% of the data. The data of one item were discarded because more than 45% of the participants responded erroneously to it. The means of the remaining latencies and the accuracy scores in the match and mismatch conditions are displayed in Table 1, together with the 95% confidence interval for within-participants designs (Masson & Loftus 2003). Participants responded to the picture-recognition task in this experiment with a mean latency of 984 ms and a mean accuracy of 83%. Responses to the picture were 74 ms faster when the picture matched the orientation and shape of the target entity suggested by the combination of words in the sequence than when it mismatched this orientation or shape (see Figure 5). This latency difference was significant in the by-participant analysis but just missed the conventional significance level in the by-item analysis (t1(29) = −2.6, p < 0.01; t2(26) = 1.6, p = 0.065; one-tailed).
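The idea behind a size-dependent ("moving") criterion can be sketched as follows: instead of one fixed ±2 SD cutoff, the cutoff applied to a cell depends on how many observations that cell contains. This is only a structural sketch of a simplified, non-recursive variant combined with the 4000 ms ceiling from the text. The function `criterion_for` is a hypothetical placeholder whose numbers are invented for illustration and are not the values published by Van Selst & Jolicoeur (1994); those should be taken from the original article.

```python
from statistics import mean, stdev


def criterion_for(n):
    """HYPOTHETICAL placeholder for a size-dependent SD cutoff.

    The numbers are invented for illustration only; they are not the
    values published by Van Selst & Jolicoeur (1994).
    """
    if n < 4:
        return float("inf")  # too few observations: trim nothing
    return min(2.0 + 0.05 * (n - 4), 2.5)


def moving_criterion_trim(rts, max_rt=4000.0, criterion=criterion_for):
    """Non-recursive trim of one cell's latencies (ms).

    1. Drop latencies above the absolute ceiling (4000 ms in Experiment 3).
    2. Drop latencies more than criterion(n) SDs from the cell mean,
       where n is the number of observations remaining in the cell.
    """
    rts = [rt for rt in rts if rt <= max_rt]
    if len(rts) < 2:
        return rts
    m, s = mean(rts), stdev(rts)
    if s == 0:
        return rts  # all values identical: nothing to trim
    cutoff = criterion(len(rts))
    return [rt for rt in rts if abs(rt - m) <= cutoff * s]
```

Passing the criterion in as a function keeps the trimming logic separate from the lookup table, so the published values can be plugged in without changing the procedure.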

Figure 5. Match/mismatch effect observed in Experiment 3. Error bars represent the 95% confidence interval for within-subject designs (Masson & Loftus 2003).


This result is in line with the hypothesis that word-based processes contribute to the match/mismatch effect and speaks against the view that the match/mismatch effect solely reflects sentence-based processes. An advocate of a purely sentence-based view could argue that participants might have mentally constructed a sentence from the words in the sequence. In this case, the match/mismatch effect observed in this experiment would reflect sentence-based rather than word-based processes, despite the fact that the experimental task did not involve sentences. The present experiment was not designed to rule out this possibility. However, considering that the words in experimental trials were intermixed with the same number of non-words, we do not consider this alternative explanation very likely. In any case, future research that explicitly addresses this possibility is needed. It would also be interesting to see whether the order in which the words appear in experimental trials makes a difference. In this experiment, the order was based on the order of appearance in the sentences employed in Experiment 2. In principle, different results might be obtained if a different order were used.

5.

General Discussion

In three experiments, we investigated whether the match/mismatch effects observed in the studies by Stanfield & Zwaan (2001) and Zwaan et al. (2002) reflect word- or sentence-based processes. In two experiments, participants in each experimental trial read a sentence referring to a particular target entity and subsequently responded to a picture of this entity. In both experiments, the depicted entity either matched or mismatched the described state of affairs with respect to the orientation and shape of the target entity. A clear match/mismatch effect was observed in conditions in which manipulating the implied orientation or shape went along with mentioning different content words in the sentence (e.g. eagle and nest vs. eagle and sky). In conditions in which the implied orientation was manipulated without changing the content words, the match/mismatch effect was less clear. In Experiment 1, no match/mismatch effect was observed under these conditions. In Experiment 2, the effect emerged for those participants who carefully read the sentences for comprehension, as indicated by their accuracy scores in the question-answering task. Furthermore, in Experiment 3 a match/mismatch effect was observed in a paradigm that involves processing lists of individual words rather than reading sentences. These results suggest that the match/mismatch effect may be at least partly due to processes at the lexical level: words activate experiential traces of the entities they refer to, and combinations of words activate a particular context-appropriate subset of these traces. Thus, in isolation, the word eagle activates traces of all sorts of eagles, some with their wings outstretched and some with their wings drawn in. In the context of a word like nest, however, mainly those traces are activated in which the eagle has its wings drawn in. This may be part of the reason why a sentence such as (3) activates an eagle with outstretched wings whereas (4) activates an eagle with its wings drawn in. If the sentence mentions the target entity together with both critical words (as in (10) and (11)), the situation is similar to the isolated-word case: traces with both orientations or shapes are activated, and accordingly a match/mismatch effect cannot occur unless sentence-based processes play a role as well. Whether sentence-based processes play a role cannot be unambiguously answered on the basis of the experiments reported in this chapter; they provide neither clear evidence for nor clear evidence against the involvement of sentence-based processes in the match/mismatch effect.

The results reported in this chapter indicate that one needs to be careful when interpreting simulation effects observed in sentence comprehension tasks. Under certain conditions, these simulation effects may reflect word-based rather than sentence-based processes, even if the task involves sentence reading. As mentioned in the introduction, whether simulation effects are word- or sentence-based has important theoretical implications. To explain word-based effects, simple associative mechanisms suffice: through co-occurrence in experience, certain words or combinations of words become associated with certain experiential traces. These traces are then re-activated whenever the words are encountered (cf. Zwaan & Madden 2005).
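The associative, word-based account can be made concrete with a toy model. Everything in it is hypothetical and serves only as an illustration, not as a model proposed in this chapter: each stored trace pairs an entity with one shape or orientation and the context words it was experienced with; a set of words activates every trace of any entity it names, and each co-experienced context word that is also present adds one unit of activation. Word order plays no role, which is exactly why such a mechanism cannot distinguish sentences built from the same content words.

```python
from collections import Counter

# Hypothetical experiential traces: (entity, shape/orientation, co-experienced words).
TRACES = [
    ("eagle", "wings outstretched", {"sky"}),
    ("eagle", "wings drawn in", {"nest"}),
    ("paint brush", "vertical", {"mug"}),
    ("paint brush", "horizontal", {"box"}),
]


def shape_activation(words):
    """Sum activation per shape over all traces of entities named in `words`."""
    words = set(words)
    activation = Counter()
    for entity, shape, context in TRACES:
        if entity in words:
            # Base activation for naming the entity, plus one unit per
            # co-experienced context word that is also present.
            activation[shape] += 1 + len(context & words)
    return activation


def predicted_match_advantage(words, pictured, other):
    """Positive: the pictured shape is more active than the alternative one."""
    activation = shape_activation(words)
    return activation[pictured] - activation[other]


print(predicted_match_advantage({"eagle", "nest"}, "wings drawn in", "wings outstretched"))         # → 1
print(predicted_match_advantage({"eagle", "nest", "sky"}, "wings drawn in", "wings outstretched"))  # → 0
print(predicted_match_advantage({"eagle"}, "wings drawn in", "wings outstretched"))                 # → 0
```

In this sketch, one critical word favours one shape, whereas both critical words (like the isolated entity word) leave the two shapes equally active, mirroring the predicted null effect for the both-critical-words conditions.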
In accounting for word-based effects, we do not need to assume a composition process that operates on these individual traces. Rather, we may stick to the assumption that the process responsible for composing sentence meaning operates on, and results in, meaning representations in a linguistic format (i.e. propositional representations; e.g. Kintsch 1988; McKoon & Ratcliff 1992; see also Chomsky 1980; Fodor 2000; Pinker 1994). In contrast, to explain sentence-based effects, we do need to assume sentence-based simulation processes.

In our view, there are two rather different potential accounts of sentence-based simulation effects. According to the first account, experiential traces are the only kind of meaning representation utilized in language comprehension. Sentence meaning is presumably composed on the basis of the activated traces and is assumed to result in an experiential simulation of the state of affairs described in the sentence. Obviously, a radical account such as this one requires much theoretical work. Research would need to focus on the question of how lexically activated experiential traces can be combined to yield simulations consistent with the meaning of the larger phrase or sentence. The second account is less radical. According to this account, experiential traces and simulations constitute only one of two kinds of meaning representations utilized in language comprehension. Words and word combinations would therefore activate experiential traces (see above). However, the composition process itself does not operate on these traces. Rather, it operates on meaning components that are represented in a linguistic format, and its result is a meaning representation in a linguistic format. Once the composition process has determined the meaning of a sentence, the comprehender may simulate the corresponding state of affairs. As with the first account, more theoretical work would be needed for this account to be convincing. Research would need to determine how a propositional representation can be “translated” into an experiential simulation. Furthermore, it would be important to investigate whether the experiential simulations of the described states of affairs created after the composition process has taken place are functional for comprehension or merely an optional by-product of it.

Currently, it is not obvious how empirical studies in the field of language comprehension could distinguish between the two alternative accounts of sentence-based effects. Possibly, the temporal dynamics of simulation effects observed during language comprehension could be informative. In any case, future research first needs to find out which of the observed simulation effects are word-based and which are sentence-based. Only if there is clear evidence for sentence-based effects does research need to focus on investigating their theoretical basis.

References Barsalou, L. W. 2008 Grounded cognition. Annual Review of Psychology 59. 617–645. Borghi, A.M., A.M. Glenberg & M.P. Kaschak 2004 Putting words in perspective. Memory & Cognition 32. 863–873. Buccino, G., L. Riggio, G. Melli, F. Binkofski, V. Gailee & G. Rizzolatti 2005 Listening to action-related sentences modulates the activity of the morotr system: A combined tms and behavioural study. Cognitive Brain Research 24: 355–363. Chomsky, N. 1980 Rules and representations. New York: Columbia University Press. Estes, Z., M. Verges & L.W. Barsalou 2008 Head up, foot down. Psychological Science 19: 93–97.

Word- vs. sentence-based simulation effects in language comprehension

289

Fodor, J. 2000 The mind doesn’t work that way. Cambridge, MA: MIT Press. Glenberg, A. M. & M. P. Kaschak 2002 Grounding Language in Action. Psychonomic Bulletin & Reivew 9: 558– 565. Glenberg, A. M., M. Sato, L. Cattaneo, L. Riggio, D. Palumbo & G. Buccino 2008 Language modulates motor system activity. The Quarterly Journal of Experimental Psychology 61: 905–919. Hauk, O., I. Johnsrude & F. Pulvermüller 2004 Somatotopic representation of action words in human motor and premotor cortex. Neuron 41: 303–307. Kintsch, W. 1988 The role of knowledge in discourse comprehension: A construction-integration model. Psychological Review 95: 163–182. Lachmair, M., M. De Fillipis, C. Dudschig, I. De la Vega & B. Kaup in press. Root versus roof: Automatic activation of location information during word processing. Psychonomic Bulletin & Review. Lindsay, S. 2007 The word action compatibility effect. Paper presented at ESP07. Saarland University. Masson, M. E. J. & G. R. Loftus 2003 Using con¿dence intervals for graphically based data interpretation. Canadian Journal of Experimental Psychology 57: 203–220. McKoon, G. & R. Ratcliff 1992 Inference during reading. Psychological Review 99: 440–466. Pinker, S. 1994 The language instinct. New York: Harper Collins. Stan¿eld, R. A. & R. A. Zwaan 2001 The effect of implied orientation derived from verbal context on picture recognition. Psychological Science 12: 153–156. Tettamanti, M., G. Buccino, M.C. Saccuman, V. Gallese, M. Danna & P. Scifo 2005 Listening to action-related sentences actiates fronto-parietal motor circuits. Journal of Cognitive Neuroscience 17: 273–281. Van Selst, M. V. & P. Jolicoeur 1994 A solution to the effect of sample size on outlier elimination. The Quarterly Journal of Experimental Psychology 47 A: 631–650. Zwaan, R. A. 2004 The immersed experiencer. Toward an embodied theory of language comprehension. In B. H. Ross (Ed.), The Psychology of Learning and Motivation 44: 35–62. New York: Academic Press.


Barbara Kaup, Jana Lüdtke & Ilona Steiner

Zwaan, R. A. & C. J. Madden 2005 Embodied sentence comprehension. In D. Pecher and R. A. Zwaan (Eds.), Grounding Cognition, 224–245. Cambridge: Cambridge University Press.
Zwaan, R. A., R. A. Stanfield & R. H. Yaxley 2002 Language comprehenders mentally represent the shapes of objects. Psychological Science 13: 168–171.

Language skills in patients with reorganized language (RL)

Eleonore Schwilling, Karen Lidzba, Andreas Konietzko, Susanne Winkler & Ingeborg Krägeloh-Mann

1. Introduction

Generally, language functions are represented in the left cerebral hemisphere (Price 2010). This left lateralization is evident, especially for expressive language abilities, in most right-handers and even in most left-handers (Cabeza & Nyberg 2000; Whitehouse & Bishop 2009). Studies on very young children provide evidence for a genetic predisposition of the left hemisphere for language (Dehaene-Lambertz et al. 2002): electroencephalogram (EEG) amplitudes for syllable discrimination are larger over the left than over the right temporal lobe in 3-month-old infants (Dehaene-Lambertz & Dehaene 1994), and when presented with more complex stimuli (stories), sleeping infants recruit typical left-hemispheric temporal regions in fMRI (Dehaene-Lambertz, Dehaene & Hertz-Pannier 2002).

It is well documented that left-hemispheric strokes in adults often result in aphasia. In many cases, patients with aphasia show persistent language deficits in both production and perception. After early (pre- or perinatally acquired) left-hemispheric brain damage (LHD), however, language functions can be salvaged by reorganization into the right hemisphere (Staudt et al. 2002; Bishop 1983; Rasmussen & Milner 1977; Lenneberg 1967). Owing to the plasticity of the young brain, the right hemisphere can take over some language functions even when the injury occurs as late as 6 years of age, as shown by Hertz-Pannier et al. (2002).

Regarding the effects of early left-hemispheric brain lesions there are two competing hypotheses. The equipotentiality hypothesis, based on Lenneberg (1967), states that the two hemispheres are equipotential during a period in infancy: “at the beginning of language development both hemispheres seem to be equally involved” (p. 151). On this view, both hemispheres are equally able to control and generate language; if this is correct, left- and right-hemispheric language should not differ. The equipotentiality hypothesis is corroborated by the finding that adult patients with early acquired left-hemispheric damage and resulting reorganized right-hemispheric language show normal cognitive language function as measured by the verbal IQ of the Wechsler intelligence test (Staudt et al. 2002). The genetic predisposition hypothesis claims that the left hemisphere is genetically predisposed for the control and generation of language; if this is correct, left- and right-hemispheric language should differ. The genetic predisposition hypothesis is corroborated by evidence for delayed language acquisition in children with early left-hemispheric brain lesions (Eisele & Aram 1995; Bates 1999) on the one hand, and by evidence for early anatomical asymmetries in language areas on the other (Chi, Dooling & Gilles 1977).

In earlier work by our group, children, adolescents and young adults with left-hemispheric lesions acquired before or at birth were examined using neuropsychological and neuroimaging methods. These studies showed that the language network can be reorganized in homologous areas of the right hemisphere after early left-hemispheric lesions (Staudt et al. 2001; Lidzba et al. 2008). This reorganization was not accompanied by gross language deficits in early adulthood (Staudt et al. 2002). However, language reorganization in the right hemisphere proved to be correlated with visuospatial deficits (Lidzba et al. 2006).

So far, the evidence in the literature on language deficits in patients with early LHD after completed language acquisition is inconsistent. Several researchers have observed that children with early LHD reach functionally adequate proficiency in language by five (Stiles et al. 1998) or ten years of age (Reilly, Bates & Marchman 1998). Most children with LHD, however, show lexical and grammatical delay in the early stages of language acquisition (Chilosi et al. 2005).
In particular, a delay in syntactic processing up to the age of eight years has been reported for children with LHD, in contrast to children with right-hemispheric damage (RHD) (Aram, Ekelman & Whitaker 1986). Eisele & Aram (1994) found an “early and continuous left hemisphere specialization for expressive syntax” (p. 212): in their study, a group of patients with early LHD performed worse than controls in an expressive language task (sentence imitation). For the purpose of the present study, patients with LHD underwent fMRI to determine their language lateralization, and only subjects with right-hemispheric language lateralization were included in the patient group. We refer to this right-hemispheric language lateralization in patients with LHD as reorganized language (RL). This is the first experimental study with German patients with LHD in which right-hemispheric language lateralization was confirmed by fMRI.


Given the cited observations of normal language skills in LHD patients after completed language acquisition, no differences should be found between patients with RL and healthy controls. We refer to this claim as the null hypothesis. The alternative hypothesis to be tested is the following: (I) Language skills in patients with RL are affected in both comprehension and production. To test hypothesis (I), we conducted an exploratory study which examined in more detail the language skills of patients with early LHD and right-hemispheric language representation. We designed a set of productive and receptive language tasks. Because the null hypothesis predicts no differences between patients and healthy controls, it was necessary to construct tasks that are difficult and contain grammatical structures acquired late in typical language acquisition.

2. Subjects and methods

In our first approach to investigating language skills in LHD patients with RL, we compared children, adolescents and young adults with LHD and RL with healthy, right-handed controls. The study was approved by the Ethics Committee of the Tübingen Medical Faculty and therefore complied with the Code of Ethics of the World Medical Association (Declaration of Helsinki) with respect to ethical, scientific and legal aspects. All participants had German as their first language.

The degree of lateralization in all experimental subjects was determined in a preceding fMRI experiment, in which they completed a vowel identification task (as described in Wilke et al. 2006). In the activation condition of this task, subjects are shown the picture of an object and have to decide whether the name of this object contains the phoneme /i/. Thus, the subjects first internally generate a word (matching the object) and then analyse this word phonologically in search of the /i/ sound. Based on the statistical analysis of the fMRI data, we calculated a lateralization index (LI) in the frontal cortex for every patient according to the following formula:

$$\mathrm{LI} = \frac{\mathrm{activation}_{\mathrm{left}} - \mathrm{activation}_{\mathrm{right}}}{\mathrm{activation}_{\mathrm{left}} + \mathrm{activation}_{\mathrm{right}}}$$

Because a bootstrap approach was used, the resulting LIs are independent of differences in individual activation strength (for more details on this method see Wilke & Lidzba 2007). The values of the lateralization index range from −1 (exclusively right) to +1 (exclusively left); LIs < −0.2 are categorised as ‘right’, LIs between −0.2 and +0.2 as ‘bilateral’, and LIs > +0.2 as ‘left’. All patients for whom an LI could be calculated had an LI < −0.2 (cf. Table 1).
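The index and the category thresholds described above can be sketched in a few lines of Python. This is a minimal illustration assuming one summed activation value per hemisphere in a frontal region of interest; it does not reproduce the bootstrap procedure of Wilke & Lidzba (2007), the function names are hypothetical, and assigning the boundary values ±0.2 to ‘bilateral’ is an assumption, since the text does not say how they are treated.

```python
def lateralization_index(activation_left: float, activation_right: float) -> float:
    """LI = (left - right) / (left + right): -1 = exclusively right, +1 = exclusively left."""
    total = activation_left + activation_right
    if total <= 0:
        raise ValueError("summed activation must be positive")
    return (activation_left - activation_right) / total


def classify_li(li: float) -> str:
    """Categorize an LI with the +/-0.2 thresholds used in the study."""
    if li < -0.2:
        return "right"
    if li > 0.2:
        return "left"
    return "bilateral"  # -0.2 <= LI <= +0.2 (boundary handling is an assumption)


# Hypothetical activation values, only to illustrate the arithmetic.
li = lateralization_index(activation_left=20.0, activation_right=60.0)
print(li, classify_li(li))  # -0.5 right

# LIs reported in Table 1 (patient #17 has no LI); all fall in the 'right' category.
table1_lis = [-0.45, -0.5, -0.44, -0.68, -0.48, -0.84, -0.79]
print({classify_li(v) for v in table1_lis})  # {'right'}
```

Keeping the computation and the categorization in separate functions mirrors the two steps in the text: the index itself is continuous, and the three labels are a convention applied afterwards.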


We collected behavioural data in linguistic experiments on specific sentence structures such as object topicalization, passive and relative clauses, and tested the participants’ performance on different morphological and syntactic structures. The participants were given both comprehension and production tasks. The whole test battery took about one and a half hours to complete. Testing took place individually in a therapy room of the university children’s hospital in Tübingen. Language production was recorded, then transcribed and analysed. Responses in the receptive tasks were recorded in writing and analysed afterwards.

2.1 Participants

Eight patients with early acquired (i.e. before or around birth) left-hemispheric brain damage (average age 15.0 years, range 9–25 years, 3 male, 5 female) and 9 healthy controls (matched for age and verbal IQ, 5 male, 4 female) participated in the study. Verbal IQ (VIQ) and forward digit span were measured in adults with the HAWIE-R (Tewes, Schallberger & Rossmann 1991) and in children with the HAWIK-III (Tewes, Schallberger & Rossmann 1999). Demographic, clinical and fMRI data of the patients are given in Table 1. All participants or (in the case of minors) their parents gave written informed consent prior to participating in the study. Participants were paid according to their time commitment. All patients suffered from right-sided hemiparesis, particularly in the upper limb. In an oral interview, they rated their own language abilities as normal. All of them attended mainstream schools except for three patients, who attended a school for physically handicapped children.

Table 1. Descriptive data of the patient group.

Code  Age  Sex  Lesion characterisation     Verbal IQ  Digit span  LI
05    12   m    periventricular             92         5           −0.45
07    12   f    periventricular             91         5           −0.5
09     9   m    schizencephaly              111        4           −0.44
11    13   f    cortico-subcortical (MCA)   106        6           −0.68
12    25   m    schizencephaly              109        7           −0.48
14    21   f    cortico-subcortical (MCA)   91         5           −0.84
19    19   f    schizencephaly              75         3           −0.79
17     9   f    polymicrogyria              84         4           *

MCA = infarction of the middle cerebral artery. * Patient #17 did not cooperate well enough in the scanner for an LI to be calculated, but lesion size and localization (documented in anatomical MRI) make a positive LI improbable.

2.2 Tasks

We constructed a test battery of five tasks for testing complex language skills: four elicitation tasks and one task eliciting semispontaneous language. In the elicitation part, tasks 2.2.1 and 2.2.2 tested language production, and tasks 2.2.3 and 2.2.4 tested language comprehension. All elicitation tasks operated at the sentence level.

2.2.1 Sentence imitation task

The subjects were asked to repeat 22 sentences presented with normal prosody and intonation. Each sentence contained nine or ten words, except for the coordinated sentences, which contained up to 13 words. The stimulus sentences were presented in pseudo-randomized order. The subjects were instructed to repeat each sentence as precisely as possible. A given sentence was presented only once, unless the subject asked for a repetition. This kind of task is common in language acquisition tests and is useful for testing specific grammatical structures (Vinther 2002). It is based on the assumption that only structures that can be parsed can be actively produced and thus reproduced: the subject has to reconstruct the sentence. Because the sentences were unrelated and their lexical content did not bias towards a specific reading, it was necessary to decode the morphosyntactic structure accurately. The sentences displayed noncanonical word order in different constructions. These grammatical structures are less frequent in everyday German and are therefore perceived as more complex. They are also acquired relatively late, some of them only at primary school age. For analysis, we compared each reproduction to the stimulus sentence. All sentences were transcribed and then analysed quantitatively: we counted grammatical errors and correctly reproduced words. Only morphologically identical reproductions were counted as correct.
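Beyond the requirement of morphological identity, the chapter does not specify how reproduced words were aligned with the stimulus. One possible operationalization, sketched here purely as an assumption, counts the stimulus words that reappear in exactly the same form and in the same relative order, using the SequenceMatcher class from Python’s standard difflib module; the helper name is hypothetical, and grammatical errors would still have to be coded by hand.

```python
from difflib import SequenceMatcher


def count_correct_words(stimulus: str, reproduction: str) -> int:
    """Count stimulus words reproduced in identical form and in the same relative order.

    Hypothetical scoring helper: a word only counts if its form matches exactly,
    so an inflected variant (e.g. 'braune' for 'braunen') is not credited.
    """
    # Strip sentence-final punctuation, then split into word tokens.
    stim_words = stimulus.rstrip(".!?").split()
    repro_words = reproduction.rstrip(".!?").split()
    # autojunk=False switches off difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(a=stim_words, b=repro_words, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


stimulus = "Den braunen Hund jagt das kleine Kind auf der Wiese."
# Hypothetical reproduction in which only the first NP's case endings change.
reproduction = "Der braune Hund jagt das kleine Kind auf der Wiese."
print(count_correct_words(stimulus, reproduction))  # 8
```

Under this scheme, an agent-first reinterpretation loses credit only for the altered word forms, which is why the word count and the separate count of grammatical mistakes capture different aspects of performance.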
When a topicalized sentence was reproduced in canonical word order, yielding a grammatically correct but semantically incorrect sentence, we counted this production as a ‘grammatical mistake’. We tested object topicalization as shown in (1), passive (2), relative clauses (3) and, as a control condition, coordinated sentences with canonical word order (4):


(1)

Den braunen Hund jagt das kleine Kind auf der Wiese.
the brown dog.ACC chases the little child.NOM on the meadow.DAT
‘The little child chases the brown dog in the meadow.’

(2)

Das weinende Mädchen wird von einem schnellen Schwein gejagt.
the crying girl.NOM is by a fast pig.DAT chased
‘The crying girl is being chased by a fast pig.’

(3)

Der lange Stift ist auf dem Buch, das gelb ist.
the long pencil.NOM is on the book.DAT which yellow is
‘The long pencil is on the book which is yellow.’

(4)

Das Mädchen schaut das Pferd an und der Junge steht auf der Mauer.
the girl.NOM looks the horse.ACC at and the boy.NOM stands on the wall.DAT
‘The girl is looking at the horse and the boy is standing on the wall.’

Group differences were tested for statistical significance with a Mann-Whitney U-test, with the number of correctly produced words and the number of grammatical errors as dependent variables. Differences between conditions (1) to (4) were tested for significance with the non-parametric Friedman test.

2.2.2 Declension of adjectives

In this elicited production task we tested the subjects’ ability to use the inflection paradigm in the nominal domain. German prenominal adjectives are marked for gender, number and case, and their inflectional suffixes depend on the syntactic context. Most grammars distinguish three types of inflection (Eisenberg 2000). If no determiner is present, the so-called strong paradigm of adjectival inflection has to be used, for example das Abendessen – gutes Abendessen (the dinner – good dinner). After definite determiners the weak paradigm is used, for example das gute Abendessen (the good dinner). After indefinite determiners the choice of paradigm depends on whether the determiner is fully marked: if the determiner is case-marked, the weak paradigm follows; otherwise the strong paradigm is used. The result is a so-called mixed pattern after indefinite determiners. In the literature on language acquisition and language impairment, inflection tasks are often used to test the level of acquisition or loss of rule-based knowledge of regular morphological patterns. In such studies nonsense words are used to exclude effects of frequency and of lexical knowledge. The subjects were asked to complete 15 phrases, inflecting a total of 21 real adjectives and 15 nonsense adjectives in strong or weak declension contexts. The task was presented both auditorily and in written form to minimize memory problems, especially for the nonsense words. A phrase was given as the stimulus. Then a sentence beginning was given, and the subject had to complete this sentence with the stimulus phrase. For a grammatical response it is necessary to adjust the endings of the adjectives. See (5) for real words and (6) for nonsense words:

(5)

Stimulus: ein schöner, roter, warmer Schal
a nice red warm scarf

(5’) Sentence beginning: “Ich möchte keinen …”
I would.like no …

(5’’) Expected response: Ich möchte keinen schönen, roten, warmen Schal.
I would.like no nice red warm scarf
‘I would not like a nice, red, warm scarf.’

(6)

Stimulus: das lahne, röne, kunzige Ment
the (nonsense adj.) (nonsense adj.) (nonsense adj.) (nonsense noun)

(6’) Sentence beginning: “Ich will ein …”
I want a …

(6’’) Expected response: Ich will ein lahnes, rönes, kunziges Ment.
I want a (nonsense adj.) (nonsense adj.) (nonsense adj.) (nonsense noun)

In (5) the stimulus contains a noun phrase with an indefinite article followed by strongly inflected adjectives. The subjects were given the beginning of a sentence, which they had to complete. The sentence beginning (5’) contains the negative indefinite keinen, which is case-marked and must therefore be followed by weak forms of the adjectives (5’’). The inverse case is shown in (6), where the stimulus contains the definite article followed by weak adjectival forms. The sentence beginning (6’) contains the uninflected indefinite article ein, hence strong forms of the nonsense adjectives are expected to follow (6’’).
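The distribution rule that items (5) and (6) probe (strong endings when there is no determiner or the determiner carries no inflectional ending, weak endings after an inflected determiner) can be sketched for exactly the cells used in these items. This is a deliberately reduced illustration, not a model of German adjective inflection as a whole: it covers only masculine and neuter singular in the nominative and accusative, and the set of uninflected determiners is an assumed, minimal list.

```python
from typing import List, Optional

# Endings for masculine and neuter singular, nominative and accusative only.
STRONG = {("masc", "nom"): "er", ("masc", "acc"): "en",
          ("neut", "nom"): "es", ("neut", "acc"): "es"}
WEAK = {("masc", "nom"): "e", ("masc", "acc"): "en",
        ("neut", "nom"): "e", ("neut", "acc"): "e"}

# Determiner forms without an inflectional ending (assumed, deliberately short list).
UNINFLECTED_DETERMINERS = {"ein", "kein"}


def inflect(stem: str, determiner: Optional[str], gender: str, case: str) -> str:
    """Attach a strong or weak ending to an adjective stem.

    Strong ending: no determiner, or an uninflected one (ein, kein).
    Weak ending: an inflected determiner such as das, den or keinen.
    """
    if determiner is None or determiner.lower() in UNINFLECTED_DETERMINERS:
        endings = STRONG
    else:
        endings = WEAK
    return stem + endings[(gender, case)]


def complete_phrase(determiner: str, stems: List[str], noun: str,
                    gender: str, case: str) -> str:
    """Build a noun phrase in which every adjective gets the same paradigm."""
    adjectives = ", ".join(inflect(s, determiner, gender, case) for s in stems)
    return f"{determiner} {adjectives} {noun}"


# (5''): keinen is case-marked (masculine accusative), so weak endings follow.
print(complete_phrase("keinen", ["schön", "rot", "warm"], "Schal", "masc", "acc"))
# keinen schönen, roten, warmen Schal
# (6''): ein is uninflected (neuter accusative), so strong endings follow.
print(complete_phrase("ein", ["lahn", "rön", "kunzig"], "Ment", "neut", "acc"))
# ein lahnes, rönes, kunziges Ment
```

Because the choice of paradigm depends only on the determiner, the same rule applies to real and nonsense stems alike, which is what makes nonsense adjectives a test of rule-based rather than lexically stored knowledge.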


Group differences were tested for statistical significance in both conditions (real adjectives vs. nonsense adjectives) with the Mann-Whitney U-test, with the number of incorrectly inflected adjectives as the dependent variable. Error types were analysed qualitatively: missing vs. wrong endings; consistent vs. inconsistent errors across adjectives within the same phrase.

2.2.3 Grammaticality judgments

Sixteen sentences were presented auditorily. Eight of them were grammatically well formed, eight ill formed. The subjects had to judge whether each sentence was ‘good’ or ‘not good’. When they detected an ungrammatical sentence, they were asked to provide a correction. Because the stimulus sentence could be repeated on request, the influence of memory was minimized. In this task we used the same syntactic constructions as in the sentence imitation task, that is, passive, relative clauses, object topicalization and combinations of these, as shown in the ungrammatical example in (7):

(7) * Den Computer, der besonders schnell sein soll, wird überall empfohlen.
the computer.ACC which very fast be should is everywhere recommended

The correct form of this sentence would have been:

(7’) Der Computer, der besonders schnell sein soll, wird überall empfohlen.
the computer.NOM which very fast be should is everywhere recommended
‘The computer that is said to be particularly fast is recommended everywhere.’

Group differences were tested for statistical significance in both conditions (grammatical vs. ungrammatical sentences) using the Mann-Whitney U-test, with the number of correct judgments as the dependent variable. The corrections were classified qualitatively as correct or incorrect, both syntactically and semantically.

2.2.4 TROG-D

The TROG-D (Fox 2006) is the German version of the standardized Test for Reception of Grammar (Bishop 1989). It is a hierarchically structured sentence-to-picture matching test. The subjects heard a sentence and were asked to choose, from a set of four pictures, the one that matched the sentence. Because the stimulus sentence could be repeated on request, the influence of memory was again minimized. The test is organized into blocks of four items for each grammatical phenomenon; these include local prepositions, passive, relative clauses, coordination, subordination and object topicalization. In the quantitative analysis, a single incorrect item out of the four renders the whole block failed (zero points); only if all four items are correct is the block scored with 1 point. The test consists of 21 blocks with 84 items in total, so the maximum raw score is 21. Group differences were tested for statistical significance with the Mann-Whitney U-test, with the raw score and the number of mistakes as dependent variables. For a more detailed profile, the distribution of errors across the target structures (topicalization, relative clauses, passive, coordination) was analysed descriptively. For an illustration of an item testing object topicalization see (8):

(8)

Den braunen Hund jagt das Pferd.
the brown dog.ACC chases the horse.NOM
‘The horse chases the brown dog.’

Figure 1. Item Q1 from the TROG-D. Reprinted with permission of editor Anette Fox and the Schulz-Kirchner-Verlag.
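The block-based scoring rule of the TROG-D described in Section 2.2.4 can be expressed as a short function. The sketch assumes that a response sheet is available as 21 blocks of four booleans (True = correct picture chosen); this data layout and the function name are hypothetical and not part of the published test materials, and counting ‘mistakes’ as incorrectly answered items is likewise our assumption.

```python
from typing import List, Tuple

N_BLOCKS = 21         # 21 blocks ...
ITEMS_PER_BLOCK = 4   # ... of 4 items each = 84 items


def score_trog(blocks: List[List[bool]]) -> Tuple[int, int]:
    """Return (raw score, number of incorrect items) for a TROG-D style response sheet.

    A block earns 1 point only if all four items are correct; a single
    incorrect item makes the whole block count as failed (0 points).
    """
    if len(blocks) != N_BLOCKS or any(len(b) != ITEMS_PER_BLOCK for b in blocks):
        raise ValueError("expected 21 blocks of 4 items each")
    raw_score = sum(1 for block in blocks if all(block))
    item_errors = sum(1 for block in blocks for correct in block if not correct)
    return raw_score, item_errors


# Hypothetical response sheet: all correct except three errors in two blocks.
responses = [[True] * ITEMS_PER_BLOCK for _ in range(N_BLOCKS)]
responses[0][2] = False  # one error fails block 1
responses[5][0] = False  # two errors in block 6 ...
responses[5][3] = False  # ... still cost only that one block
print(score_trog(responses))  # (19, 3)
```

The example shows why the raw score and the number of mistakes can diverge: errors concentrated in one block reduce the raw score by only one point.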


2.2.5 Telling a picture story

In this task the subjects were asked to tell a story called Papa Moll und der Hamster (‘Papa Moll and the hamster’) on the basis of eight pictures, which were presented simultaneously and in the correct order. To standardize the situation, every subject received some preparation time for orientation and the same instruction: to tell a complete story using all pictures. The production was recorded, then transcribed and analysed. The content of the story was assessed by standardized expert ratings. Group differences were tested for statistical significance with the Mann-Whitney U-test, with speech rate (syllables per minute), number of words, number of grammatical errors per 100 words and sentence complexity as dependent variables. The syntactic analysis of the stories was based on word order, filling of the prefield, order of constituents in the middle field, and semantic content.

3. Results

3.1 Sentence imitation task

In the sentence imitation task the two groups differed significantly in both the number of grammatical mistakes (mean patients = 5.75 (SD 5.97), mean controls = 0.33 (SD 0.71); p < 0.005) and the number of correctly reproduced words (mean patients = 176.43 (SD 30.07), mean controls = 212.11 (SD 7.75); p < 0.05) (see Table 4 in section 3.6). The Friedman test revealed that the patient group made the most mistakes in topicalized sentences and the fewest in coordinated sentences (χ² = 11.89, df = 3, p = 0.008; see Table 2), whereas for the controls no differences were found between the four grammatical constructions.

Table 2. Mistakes in the different sentence structures in the patient group.

Sentence structure   Mean   Standard deviation   Mean rank
topicalization       2.75   2.375                3.44 a
passive              0.88   1.458                2.19
relative clause      1.63   2.066                2.69
coordination         0.38   0.744                1.69 a

a significant difference (p < 0.01)
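As a rough consistency check, the Friedman statistic can be recomputed from the mean ranks in Table 2: each condition’s rank sum is n times its mean rank (n = 8 patients, k = 4 sentence types), and because average ranks for ties are multiples of 0.5, the products can be rounded to the nearest half rank. The sketch below uses the textbook formula without a correction for tied ranks and gives about 8.0, lower than the reported χ² = 11.89. A tie correction, which matters when many participants make zero errors in several conditions, would raise the statistic, but the chapter does not state which variant was computed, so the comparison is only indicative.

```python
from typing import Sequence


def friedman_from_rank_sums(rank_sums: Sequence[float], n: int) -> float:
    """Friedman chi-square (no tie correction) from the rank sums of k conditions."""
    k = len(rank_sums)
    # Each of the n rows distributes ranks 1..k, so the sums must total n*k*(k+1)/2.
    if abs(sum(rank_sums) - n * k * (k + 1) / 2) > 1e-9:
        raise ValueError("rank sums inconsistent with n and k")
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


n_patients = 8
# Table 2 mean ranks: topicalization, passive, relative clause, coordination.
mean_ranks = [3.44, 2.19, 2.69, 1.69]
# Rank sums are multiples of 0.5, so round n * mean rank to the nearest half rank.
rank_sums = [round(n_patients * m * 2) / 2 for m in mean_ranks]
print(rank_sums)  # [27.5, 17.5, 21.5, 13.5]
print(round(friedman_from_rank_sums(rank_sums, n_patients), 3))  # 8.025
```

The sanity check inside the function is useful when working from rounded published values: the recovered rank sums add up to 80 = 8 · 4 · 5 / 2, as they must.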

(9’) is an example of a patient’s response to object topicalization; stimulus (1) is repeated here as (9):


(9)


[Den braunen Hund]ACC jagt [das kleine Kind]NOM (…)
the brown dog.ACC chases the little child.NOM
‘The little child chases the brown dog.’

(9’) [Der braune Hund]NOM jagt [das kleine Kind]ACC (…)
the brown dog.NOM chases the little child.NOM/ACC
‘The brown dog chases the little child.’

The sentence with the topicalized accusative object (Den braunen Hund ‘the.ACC brown dog’) was reinterpreted as an agent-first sentence, so that the patient argument was turned into the agent (Der braune Hund ‘the.NOM brown dog’), yielding ‘The brown dog chases the little child.’ The reproduced sentence is grammatically correct but semantically incorrect.

3.2 Adjective declension task

In the morphological task (adjective declension), patients differed significantly from controls in both conditions (real adjectives and nonsense adjectives) by making more mistakes (real adjectives: mean patients = 3.13 (SD 4.49), mean controls = 0.00 (SD 0); p < 0.05; nonsense adjectives: mean patients = 10.25 (SD 5.60), mean controls = 2.25 (SD 1.79); p < 0.01; Figure 2).

[Figure 2: bar chart “Average total number of mistakes by category”; y-axis: number of mistakes (0–25); bars for nonsense adjectives and real adjectives, patients vs. controls.]

Figure 2. Mean and SD of total number of mistakes in real and nonsense words: Controls made no mistakes in the category ‘real adjectives’.


The qualitative analysis of the patients’ performance revealed that in most cases their strategy for coping with the inflection of nonsense adjectives was not to adjust the suffixes at all, as shown in (10*) (stimulus (6) is repeated here as (10)).

(10) Stimulus: das lahne, röne, kunzige Ment
the (nonsense adj.) (nonsense adj.) (nonsense adj.) (nonsense noun)

(10’) Sentence beginning: “Ich will ein …”
I want a …

(10’’) Expected response: Ich will ein lahnes, rönes, kunziges Ment.
I want a (nonsense adj.) (nonsense adj.) (nonsense adj.) (nonsense noun)

(10*) Response: Ich will ein lahne_, röne_, kunzige_ Ment.
I want a (nonsense adj.) (nonsense adj.) (nonsense adj.) (nonsense noun)

Only in some cases did patients use wrong or inconsistent suffixes (e.g. adjusting two endings but not the third in a series of adjectives). For illustration see (10**), produced by another patient:

(10**) Response: Ich will ein lahnes, rönes, kunzige_ Ment.
I want a (nonsense adj.) (nonsense adj.) (nonsense adj.) (nonsense noun)

In this example the first two nonsense adjectives bear the correct suffix; only the third one is in the incorrect weak form.

3.3 Grammaticality judgment task

In the grammaticality judgment task, patients differed significantly from controls for both the grammatical and the ungrammatical sentences, making more incorrect judgments (grammatical sentences judged as ungrammatical: patients mean = 0.63 (SD 0.74), controls mean = 0.00 (SD 0), p < 0.05; ungrammatical sentences judged as grammatical: patients mean = 4.00 (SD 2.27), controls mean = 1.11 (SD 1.69), p < 0.05; Figure 3).

[Figure 3: bar chart “Mistakes in grammaticality judgment”; y-axis: number of wrong judgments (0–8); bars for grammatical sentences judged as ungrammatical and ungrammatical sentences judged as grammatical, patients vs. controls.]

Figure 3. Mean and SD of mistakes in the grammaticality judgment task: Patients even judged grammatically correct sentences as incorrect.

All controls classified all grammatical sentences as grammatical (‘a good sentence’), while some patients classified grammatical sentences as ungrammatical (‘not a good sentence’). An example stimulus sentence is shown in (11). The ‘corrected’ sentence of one patient is shown in (11’) and his explanation in (11’’). (Stress is marked with capital letters.)

(11) Der Taucher, den der Hai angegriffen hat, wird gerettet.
the diver whom the shark had attacked is saved
‘The diver whom the shark attacked is saved.’

(11’) Der Hai, der den Taucher angegriffen hat – DER wird gerettet.
the shark which the diver had attacked – HE is saved
‘The shark which attacked the diver – HE is saved.’

(11’’) Der HAI wird doch nicht gerettet! Menschen sind viel WICHtiger!
the shark is not saved people are much more important
‘But the SHARK is not saved! People are much more IMPORTANT!’


The patient judged sentence (11) as ‘not a good sentence’, although its structure is grammatical. When the examiner asked him to propose ‘a better sentence’, he rephrased it (11’) with a subject relative clause instead of an object relative clause. In addition, he used the stressed demonstrative pronoun DER, which, like English that, refers in the unmarked case to the subject antecedent, here der Hai (the shark). The patient’s explanation (11’’), however, shows that the stressed pronoun was intended to refer to the diver (der Taucher), which is embedded in the relative clause.

Ungrammatical sentences were judged to be grammatical significantly more often by patients than by controls; an example is (12):

(12) * Der Einbrecher wird man heute verhaften.
the burglar.NOM will one today arrest
‘The burglar will be arrested today.’

The ungrammatical stimulus sentence in (12) starts with a nominative NP, but because this NP is a topicalized object (the subject is man ‘one’), it has to be marked accusative (Den Einbrecher). Patients often judged the sentence as grammatical, in line with an agent-first strategy.

3.4 TROG-D

In the TROG-D, patients obtained a significantly lower raw score (mean patients = 17.29 (SD 1.98), mean controls = 20.11 (SD 1.05), p < 0.005, Figure 5) and made significantly more mistakes in the individual blocks (mean patients = 6.14 (SD 3.80), mean controls = 1.22 (SD 1.64); p < 0.05; see Table 4 in section 3.6). The quantitative analysis revealed that patients made the most mistakes in object relative clauses and object topicalization (for details see Table 3).

Table 3. Mistakes in the different sentence structures in the TROG-D in the patient group.

Sentence structure   Mean   Standard deviation   Mean rank
topicalization       1.43   1.902                2.64
relative clause      2.14   1.215                3.36
passive              0.29   0.488                2.00
coordination         0.43   0.535                2.00
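The group comparisons reported here, like most in this chapter, rely on the Mann-Whitney U-test. As a minimal illustration of how the U statistic is obtained (ranking both groups together, with average ranks for ties), the sketch below computes U for two small groups; the scores are invented for illustration and are not data from this study. Obtaining a p-value (exact or via the normal approximation) is left to a statistics package, for example SciPy’s scipy.stats.mannwhitneyu.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y); the test statistic is usually reported as min(U_x, U_y)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Invented error counts for two small groups (illustration only).
group_a = [5, 3, 8, 0, 6]
group_b = [0, 1, 0, 2]
print(mann_whitney_u(group_a, group_b))  # (17.0, 3.0)
```

U_x counts, over all cross-group pairs, how often a value in the first group exceeds one in the second (ties count one half), which is why a rank-based test suits small samples with many zero-error scores such as those reported here.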


The detailed analysis revealed that most mistakes were made in topicalization (example (8), repeated as (13)) and in relative clauses (object relative clauses) (14).

(13) Den braunen Hund jagt das Pferd.
the brown dog.ACC chases the horse.NOM
‘The horse chases the brown dog.’

Figure 4 (Figure 1 repeated). Item Q1 from the TROG-D. Reprinted with permission of editor Anette Fox and the Schulz-Kirchner-Verlag.

The correct picture-to-sentence mapping for (13) is the bottom right picture (No. 4). A frequent patient response, however, was the picture for ‘The brown dog chases the horse’ (top left, No. 1): the accusative-marked object was ignored and taken as the subject of the sentence (switch of agent and patient). An example of patients’ responses to a relative clause is given in (14):

(14) Der Hund, den die Kuh jagt, ist braun.
the dog.NOM [which.ACC the cow.NOM/ACC chases] is brown
‘The dog that is chased by the cow is brown.’


Figure 5. Item M2 from the TROG-D. Reprinted with permission of editor Anette Fox and the Schulz-Kirchner-Verlag.

The correct picture-to-sentence mapping for (14) is the top left picture (No. 1). A frequent patient response, however, was the picture for ‘The dog chases the brown cow’ (top right, No. 2). The relative clause of the stimulus sentence modifies the NP Der Hund (the dog) with an accusative relative pronoun functioning as an object. The patients’ choice ignores the accusative marking and, as a simplification, analyses the pronoun as nominative.

To test for the influence of verbal short-term memory on the patients’ linguistic performance, we conducted a post-hoc correlation analysis of forward digit span with performance in both tasks. For the sentence repetition task, we found a positive correlation at trend level between digit span and the number of correctly reproduced words (r = 0.593, p = 0.08, Spearman rank correlation). Digit span, however, did not correlate with grammatical errors in the sentence repetition task (r = −0.411, p = 0.180, Spearman rank correlation), nor with errors in the TROG-D (r = −0.149, p = 0.389, Spearman rank correlation).

3.5 Picture story

In the semispontaneous language production (picture story), patients and controls used similar sentence patterns. Patients constructed marginally less complex sentences than controls; however, the difference failed to reach statistical significance. The groups did not differ in the number of words produced, and the qualitative analysis revealed no differences between the groups. Topologically, in both groups the prefield was occupied by a subject, an adverbial, a subordinate clause, an interrogative pronoun or an accusative object; most sentences had the subject in the prefield. However, we found statistically significant differences between patients and controls in the number of grammatical mistakes and in the speed of language production: patients produced fewer syllables per minute than controls (mean patients = 133.17 (SD 39.76), mean controls = 171.00 (SD 47.87), p < 0.05) and more grammatical mistakes per 100 words (mean patients = 0.602 (SD 0.875), mean controls = 0.078 (SD 0.221), p < 0.05). We counted as grammatical errors those mistakes that were not corrected by the speakers themselves. Although such errors could be regarded as slips of the tongue that the speaker did not notice, they were significantly more frequent in the patient group. An example is given in (15):

(15) der Papa Moll, der hat (den klein) [Abbruch] die kleine Schwester ’n Hamster gekauft
the papa Moll, he has (the.ACC small) [discontinuation] the little sister.NOM a hamster bought
‘Papa Moll has bought the little sister a hamster.’

In example (15) the patient noticed his error and corrected himself, but incorrectly. He started with the accusative form ‘den klein’ (the small), which would be the accusative marking for a masculine noun, whereas Schwester (sister) is feminine. He then restarted with the nominative form ‘die kleine Schwester’ (the.NOM little sister) instead of the dative ‘der kleinen Schwester’ (the.DAT little sister). According to standardized expert ratings, all participants essentially conveyed the content of the story, but some patients narrated very briefly and sometimes without pronominal reference. Cohesive devices were often missing, and sentences frequently began with ‘und dann’ (and then), as illustrated in (16):

(16) Dann heult das Mädchen.
     then cries the girl
     ‘Then the girl cries.’

     Und dann – springt der Hamster, als der Vater den Stiefel anziehen will, ja raus.
     and then – jumps the hamster when the father the boot put.on wants yes out
     ‘And then, when the father wants to put on his boots, the hamster jumps out.’

Eleonore Schwilling, Karen Lidzba, Andreas Konietzko et al.

3.6

Summary of the results

The patient group performed significantly worse than controls on all elicitation tasks and on two measures of semispontaneous production. Table 4 shows the means and SDs of both groups on all tasks. Although response times were not specifically measured, we observed that patients required more time to complete the tasks: their production was slower and the pauses between phrases were longer.

Table 4. Means and SDs of participants’ performance in all tasks.

Task                                                   Patients            Controls            Statistics
Sentence imitation: correctly reproduced words         176.43 (SD 30.07)   212.11 (SD 7.75)    p < 0.05
Sentence imitation: mistakes                           5.75 (SD 5.95)      0.33 (SD 0.71)      p < 0.005
Adjective declension: mistakes in real adjectives      3.13 (SD 4.49)      0.00                p < 0.05
Adjective declension: mistakes in nonsense adjectives  10.25 (SD 5.60)     2.22 (SD 1.79)      p < 0.01
Grammaticality judgment: incorrect judged as correct   4.0 (SD 2.27)       1.11 (SD 1.69)      p < 0.05
Grammaticality judgment: correct judged as incorrect   0.63 (SD 0.74)      0.00                p < 0.05
TROG-D: raw score                                      17.29 (SD 1.98)     20.11 (SD 1.05)     p < 0.05
TROG-D: mistakes                                       6.14 (SD 3.80)      1.22 (SD 1.64)      p < 0.005
Picture story: syllables per minute                    133.17 (SD 39.76)   171.00 (SD 47.87)   p < 0.05
Picture story: grammatical mistakes per 100 words      0.602 (SD 0.875)    0.078 (SD 0.221)    p < 0.05
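Both picture-story measures in Table 4 are rates, normalized so that narratives of different length can be compared. The following is a minimal sketch of the two normalizations; the function names and example counts are hypothetical, and whether pause time is included in the time base is an assumption here, since the chapter does not specify it.

```python
def syllables_per_minute(n_syllables: int, narration_time_s: float) -> float:
    """Speech rate: syllables per minute of narration time (pauses included here, by assumption)."""
    if narration_time_s <= 0:
        raise ValueError("narration time must be positive")
    return n_syllables / (narration_time_s / 60.0)


def errors_per_100_words(n_errors: int, n_words: int) -> float:
    """Grammatical errors not self-corrected by the speaker, per 100 words."""
    if n_words <= 0:
        raise ValueError("word count must be positive")
    return 100.0 * n_errors / n_words


# Hypothetical narrative: 399 syllables in 3 minutes, 3 errors in 500 words.
print(syllables_per_minute(399, 180.0))  # 133.0
print(errors_per_100_words(3, 500))      # 0.6
```

Normalizing by words rather than by sentences keeps the error rate comparable even when, as here, patients and controls differ little in sentence complexity but a lot in speed.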

We transformed the number of mistakes in all elicitation tasks to z-scores to obtain a profile of the participants’ performance across tasks. By this procedure, each variable has a mean of 0 and a standard deviation of 1 within the whole sample (patients and controls combined). Negative values indicate fewer mistakes than the sample average, positive values more mistakes. Figure 6 shows the average z-scores of each group, which demonstrate large and consistent differences between the groups.

Figure 6. Average z-scores of patients versus controls in the four elicitation tasks (sentence imitation, adjective declension, TROG-D, grammaticality judgment); vertical axis: z-score, from −1 to 1.
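The z-score profile can be sketched as follows, assuming one error count per participant and task, standardized over the pooled sample of patients and controls. This sketch uses the population standard deviation (`statistics.pstdev`), so each standardized variable has an SD of exactly 1 in the pooled sample; whether the authors used the population or the sample SD is not stated. The error counts are invented for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values against the mean and population SD of the pooled sample."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - m) / sd for v in values]


def group_profile(errors_by_task, groups):
    """Mean z-score per (group, task).

    errors_by_task: {task: [error count per participant]}, same participant order
    groups: [group label per participant]
    """
    profile = {}
    for task, errors in errors_by_task.items():
        z = zscores(errors)  # standardize over patients and controls together
        for label in sorted(set(groups)):
            profile[(label, task)] = mean(zi for zi, g in zip(z, groups) if g == label)
    return profile


# Invented example: three patients, three controls, one task.
groups = ["patient"] * 3 + ["control"] * 3
errors = {"sentence imitation": [6, 8, 4, 0, 1, 1]}
profile = group_profile(errors, groups)
print({key: round(value, 2) for key, value in profile.items()})
# roughly -0.91 for controls and 0.91 for patients
```

Because the standardization is done over the pooled sample, a positive group mean directly reads as "more mistakes than the sample average", which is the reading used for Figure 6.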

4.

Discussion

This is the first study to test specific linguistic structures in German-speaking patients with early left-hemispheric brain lesions and right-hemispheric language representation detected using fMRI. Previous studies have either tested the language outcome after early left-hemispheric brain damage irrespective of the type of reorganization (e.g. Thal et al. 1991), or they interpreted their results as problems of the right hemisphere in mediating language, assuming that an early LHD automatically leads to language reorganization (e.g. Thal et al. 1991; Eisele & Aram 1994). There is, by now, ample evidence that only large or anatomically strategic lesions lead to language reorganization, even when the lesion occurs prenatally (Staudt et al. 2002). Without an individual assessment of language lateralization, for example with functional MRI, the results of linguistic studies in patients with early LHD cannot address the question of whether RL is comparable to left-hemispheric language. The novel combination of functional neuroimaging and specific linguistic tasks in our study revealed significant differences between patients with reorganized language and healthy controls in both language comprehension and production. These differences were stable across a wide range of grammatical constructions, all of which tend to be acquired during the later stages of language development. Our findings indicate that the null hypothesis cannot be maintained. After an early left-hemispheric brain lesion, the right hemisphere can accommodate language to such a degree that everyday language functions are adequately preserved; the processing of more complex structures, however, causes problems. While prepositions, coordination, and subordination were mastered relatively well by the patients, the declension of nonsense adjectives and both the production and comprehension of sentences with non-canonical word order posed serious problems. When forced to deal with these structures, the patients seemed less confident in solving the tasks. In contrast to previous studies, we found deficits in both production and perception. Eisele & Aram (1994, p. 212) described in their patient group ‘significantly impaired imitation coupled with relative preserved comprehension’. However, their patient group was rather heterogeneous (e.g. in age at lesion onset), and language reorganization was not investigated. Interestingly, when the patients in the present study were confronted with a semispontaneous language task (telling a picture story), the differences from the control group were much less obvious. Again, however, the patients made more grammatical errors than controls, and they took longer to tell their stories because of longer pauses and lower speech rates. This latter point hints at possible compensatory strategies of patients with reorganized language: the syllables-per-minute index of the picture stories is consistent with the behavioural observation during the testing sessions that the patients were slower than the controls to solve the tasks.
Furthermore, during the grammaticality judgment task, the patients seemed less confident in their judgments than the controls. Thus, we probably observed a cognitive compensation strategy involving heightened awareness and effort when solving specific language tasks in our patient group. While the purely quantitative analysis of the two groups’ performance clearly demonstrates deficits in the processing of complex linguistic structures, the qualitative error analysis hints at linguistic compensation strategies. One of these seems to be an agent-first strategy that simplifies sentences with non-canonical word order. This strategy was evident in the perceptive task, where patients often chose the picture matching the canonical word order, and in the sentence imitation task, where patients inaccurately reconstructed the heard sentences, again in canonical word order. Another observation is that patients simplify even case marking, following the order found in typical and impaired language acquisition in German: nominative before accusative before dative. In the grammaticality judgment task we observed an unexpected reaction in the patient group: some patients judged correct sentences as incorrect, which no control did. In the morphological task, the controls made no mistakes with real adjectives, whereas the patients made a few.

A serious confound for this interpretation, however, could lie in the cognitive domain of verbal short-term memory. It has been shown that patients with early brain lesions may have impaired verbal memory capacity (Aram et al. 1985). Although we tried to minimize the influence of working memory in our tasks, it is striking that patients made many errors in the tasks with a potentially high working memory load (i.e. the sentence repetition task and the picture-matching task, TROG-D). However, since the digit-span scores of all patients were available from the IQ assessment, we were able to conduct a post-hoc analysis on this issue. For the sentence repetition task we found a trend-level positive correlation of digit span with the number of correctly reproduced words (r = 0.593, p = 0.08, Spearman rank correlation). Thus, this difference between patients and controls could be explained by working memory differences. Grammar errors, however, did not correlate with digit span (r = −0.411, p = 0.180, Spearman rank correlation), so working memory cannot serve as an explanation for the patients’ genuine linguistic deficits. The observation that patients reproduced sentences with simpler structures such as coordinated clauses better than sentences with complex structures such as object topicalization also speaks against a mere memory effect.

Our findings do not unequivocally corroborate either of the two competing hypotheses (equipotentiality versus genetic predisposition).
On the one hand, we observed considerable linguistic competence of the right hemisphere; on the other hand, we also detected clear deficits in complex linguistic structures in patients with reorganized language. In summary, the patients need more time, produce fewer complex structures, and make more grammatical errors, predominantly in very complex structures. The problems RL speakers have with inflecting nonsense words can be explained if we assume that RL speakers rely more strongly on semantic content in processing, because it helps them to identify the lexical category needed for constituent formation. One way of interpreting these results is that, as far as language functions are concerned, the right hemisphere resembles an imperfect mirror image of the left hemisphere. It should be noted, however, that the small patient group, its wide age range, and the small set of tasks limit the interpretation of our data. Our results provide first insights into the nature of RL, but more detailed experiments will be necessary.
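The post-hoc digit-span analysis reported above relies on Spearman rank correlations. A minimal sketch of the coefficient itself: it is computed as the Pearson correlation of the two rank vectors, with tied values (common for integer digit spans) receiving their average rank. The function names are illustrative; the p-values reported in the study would normally come from a statistics package (for instance `scipy.stats.spearmanr`) and are not reproduced here.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("a constant variable has no rank correlation")
    return cov / (sx * sy)


# Textbook check without ties: rho = 1 - 6 * 4 / (5 * (25 - 1)) = 0.8
print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # approximately 0.8
```

Rank-based correlation is a sensible choice for small clinical samples like this one, because it does not assume normally distributed digit spans or error counts and is robust to single extreme values.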

5.

Conclusions and outlook

Our results suggest that language processing differs between subjects with reorganized language and healthy controls. Our findings support the alternative hypothesis (I) and raise the question of to what extent the right hemisphere is able to take over language functions from the left hemisphere. While semispontaneous production is not conspicuous at first glance, the slower speech rate may indicate that the right-hemispheric language network is not as efficient as the left-hemispheric network. However, it remains open whether the brain damage per se or an inefficient right hemisphere is responsible for the slower language processing. Our results suggest that the right hemisphere can compensate for early left-hemispheric brain damage with respect to language functions superficially, but not when complex grammatical structures are examined more closely. Based on these preliminary results, further experiments are in progress that aim to describe RL in more detail. In these studies, the alternative hypothesis will be broken down into specific hypotheses about RL in order to answer the question: what exactly is the nature of the impairment? We also plan to include a group of patients with right-hemispheric lesions and a group with left-hemispheric lesions and left-hemispheric language, to rule out simple lesion effects as the cause of the deficits.

Acknowledgements

This study was funded by the Beitlich-Foundation (Tübingen, Germany), the Tübingen Medical Faculty (fortüne programme 1704-0-0 and AKF programme 235-0-0), and the German Research Foundation (Collaborative Research Center 833).


References

Aram, D.M., B.L. Ekelman, D.F. Rose & H.A. Whitaker 1985 Verbal and Cognitive Sequelae Following Unilateral Lesions Acquired in Early Childhood. Journal of Clinical and Experimental Neuropsychology 7 (1): 55–78.
Aram, D.M., B.L. Ekelman & H.A. Whitaker 1986 Spoken Syntax in Children with Acquired Unilateral Hemisphere Lesions. Brain and Language 27: 75–101.
Bates, E.A. 1999 Plasticity, localization and language development. In S. Broman and J.M. Fletcher (Eds.), The changing nervous system: Neurobehavioral consequences of early brain disorders, 214–253. New York: Oxford University Press.
Bishop, D.V.M. 1983 Linguistic impairment after left hemidecortication for infantile hemiplegia? A reappraisal. Quarterly Journal of Experimental Psychology 35A: 199–207.
Bishop, D.V.M. 1989 Test for Reception of Grammar: TROG. 2nd edition. Manchester: D.V.M. Bishop, University of Manchester.
Cabeza, R. & L. Nyberg 2000 Imaging cognition II: an empirical review of 275 PET and fMRI studies. Journal of Cognitive Neuroscience 12: 1–47.
Chi, J.G., E.C. Dooling & F.H. Gilles 1977 Left-right Asymmetries of the Temporal Speech Areas of the Human Fetus. Archives of Neurology 34: 346–348.
Chilosi, A.M., C. Pecini, P. Cipriani, P. Brovedani, D. Brizzolara, G. Ferretti & L. Pfanner 2005 Atypical language lateralization and early linguistic development in children with focal brain lesions. Developmental Medicine & Child Neurology 47: 725–730.
Dehaene-Lambertz, G., S. Dehaene & L. Hertz-Pannier 2002 Functional Neuroimaging of Speech Perception in Infants. Science 298: 2013–2015.
Dehaene-Lambertz, G. & S. Dehaene 1994 Speed and cerebral correlates of syllable discrimination in infants. Nature 370: 292–295.
Eisele, J.A. & D.M. Aram 1994 Comprehension and Imitation of Syntax Following Early Hemisphere Damage. Brain and Language 46: 212–231.
Eisele, J.A. & D.M. Aram 1995 Lexical and Grammatical Development in Children with Early Left Hemisphere Damage: a Cross-Sectional View from Birth to Adolescence. In P. Fletcher and B. MacWhinney (Eds.), Handbook of Child Language, 664–689. Oxford: Basil Blackwell.
Eisenberg, P. 2000 Grundriss der deutschen Grammatik. Band 1: Das Wort. Stuttgart: Metzler.
Fox, A. 2006 TROG-D: Test zur Überprüfung des Grammatikverständnisses. Idstein: Schulz-Kirchner.
Hertz-Pannier, L., C. Chiron, I. Jambaqué, V. Renaux-Kieffer, P.F. Van de Moortele, O. Delalande, M. Fohlen, F. Brunelle & D. Le Bihan 2002 Late plasticity for language in a child’s non-dominant hemisphere: a pre- and post-surgery fMRI study. Brain 125: 361–372.
Lenneberg, E. 1967 Biological Foundations of Language. New York: John Wiley.
Lidzba, K., M. Staudt, M. Wilke, W. Grodd & I. Krägeloh-Mann 2006 Lesion-Induced Right-Hemispheric Language and Organization of Nonverbal Functions. NeuroReport 17: 929–933.
Lidzba, K., M. Wilke, M. Staudt, I. Krägeloh-Mann & W. Grodd 2008 Reorganization of the Cerebro-cerebellar Network of Language Production in Patients with Congenital Left-hemispheric Brain Lesions. Brain and Language 106: 204–210.
Price, C. 2010 The Anatomy of Language: A Review of 100 fMRI studies published in 2009. Annals of the New York Academy of Sciences 1191: 62–88.
Rasmussen, T. & B. Milner 1977 The role of early left-brain injury in determining lateralization of cerebral speech functions. Annals of the New York Academy of Sciences 299: 355–369.
Reilly, J.S., E.A. Bates & V.A. Marchman 1998 Narrative discourse in children with early focal brain injury. Brain and Language 61: 335–375.
Staudt, M., W. Grodd, G. Niemann, D. Wildgruber, M. Erb & I. Krägeloh-Mann 2001 Early Left Periventricular Brain Lesions Induce Right Hemispheric Organization of Speech. Neurology 57: 122–125.
Staudt, M., K. Lidzba, W. Grodd, D. Wildgruber, M. Erb & I. Krägeloh-Mann 2002 Right-hemispheric Organization of Language Following Early Left-sided Brain Lesions: Functional MRI Topography. NeuroImage 16: 954–967.
Stiles, J., E.A. Bates, D. Thal, D. Trauner & J. Reilly 1998 Linguistic, Cognitive and Affective Development in Children with Pre- and Perinatal Focal Brain Injury: A Ten-year Overview from the San Diego Longitudinal Project. In C. Rovee-Collier, L. Lipsitt and H. Hayne (Eds.), Advances in Infancy Research 12: 131–163.
Tewes, U., P. Schallberger & U. Rossmann 1991 Hamburg-Wechsler-Intelligenztest für Erwachsene, Revision (HAWIE-R). Bern: Huber.
Tewes, U., P. Schallberger & U. Rossmann 1999 Hamburg-Wechsler-Intelligenztest für Kinder III (HAWIK III). Bern: Huber.
Thal, D., V. Marchman, J. Stiles, D. Aram, D. Trauner, R. Nass & E. Bates 1991 Early lexical development in children with focal brain injury. Brain and Language 40: 491–527.
Vinther, T. 2002 Elicited Imitation: a brief overview. International Journal of Applied Linguistics 12 (1): 54–73.
Whitehouse, A.J.O. & D.V.M. Bishop 2009 Hemispheric division of function is the result of independent probabilistic biases. Neuropsychologia 47 (8–9): 1938–1943.
Wilke, M., K. Lidzba, M. Staudt, K. Buchenau, W. Grodd & I. Krägeloh-Mann 2006 An fMRI task battery for assessing hemispheric language dominance in children. NeuroImage 32 (1): 400–410.
Wilke, M. & K. Lidzba 2007 LI-tool: A new toolbox to assess lateralization in functional MR-data. Journal of Neuroscience Methods 163 (1): 128–136.

Predicting speech imitation ability biometrically

Susanne Reiterer, Nandini C. Singh & Susanne Winkler

1.

Introduction

Speech sound imitation is a pivotal learning mechanism for humans. Vocal imitation provides a basis for both the developmental acquisition and the cultural evolution of languages and musical systems (Fitch 2010). Just as cultures and language systems are diverse, so are individuals in their linguistic knowledge and speech acquisition skills. Individuals differ greatly in their aptitude, ability and success in sound imitation learning (Dogil & Reiterer 2009; Golestani et al. 2002). This is especially evident in the acquisition of a second language sound system, where diverse shades of foreign accent remain as audible traces for the listener. Some people, on the other hand, are so good at vocal imitation that they can even make a living from mimicking other people’s dialects, speech characteristics or foreign accents. Although the origins of these individual differences in imitation ability (whether biological, genetic, psycho-cognitive, social or environmental) are still unknown, recent neuroscientific research has shed new light on this issue, pointing to structural and functional brain differences as a partial explanation (e.g. Golestani et al. 2007; Golestani & Pallier 2007; Reiterer et al. in press). Anatomical predispositions in the brain have been proposed to explain why some individuals have better speech sound imitation or auditory discrimination skills (Harris et al. 2009; Wong et al. 2008; Golestani, Price & Scott 2011; Golestani & Pallier 2007). Neurofunctionally, a repeatedly observed pattern is that poorly skilled processing is accompanied by greater brain activation, whereas highly skilled performance, whether due to practice, aptitude or a combination of both, is reflected in less activation. These phenomena are known as “cortical efficiency/effort” effects (Reiterer et al. 2005a, b; Prat, Keller & Just 2007).
In an earlier brain imaging experiment using functional magnetic resonance imaging (fMRI), we showed that this neurofunctional skill effect also holds for individual differences in speech sound imitation aptitude. Speakers with higher speech imitation ability showed less brain activation in a specific brain network related to pronunciation, phonemic awareness, articulation, phonological processing, sound imitation and auditory working memory (Reiterer et al. in press). Since these differences are neurobiological, it should be possible to find quantitative biomarkers that measure differences in speech imitation ability biometrically, just as it is possible to recognize mental states and identify speakers acoustically. An interesting combination of acoustic and neurophysiological methods has been used to predict speech discrimination ability in animals: from spectrograms together with neural activity (spike timing and firing rate) in the primary auditory cortex of rats, Engineer et al. (2008) could predict the animals’ success in consonant discrimination. Similarly, forensic phonetics traditionally uses acoustic biometrics to identify individual speakers. Usually, derivatives of time-frequency spectral analysis are used to identify speakers by phonetic, phonation or voice characteristics, such as the relative amplitude of formants or subglottal resonances (Disner, Fulop & Hsieh 2010; Arsikere et al. 2010). These measures have been found to capture speaker-related differences and to be relatively stable across cultures and languages, independent even of the speaker’s native language (L1). Identifying not only inter-individual but also intra-individual differences, Singh and colleagues differentiated typical from atypical vocal development and characterized inter-speaker variability in articulation ability at the segmental and suprasegmental levels (Singh & Theunissen 2003; Singh & Singh 2008). In this paper we applied neuroimaging (fMRI) and modulation spectrum analysis to identify individual differences in speech imitation/pronunciation capacity.
Our hypothesis was that lower-ability speech imitators would engage more widespread, diffuse neural activation around articulation-relevant areas of the brain and show different segmental and suprasegmental articulatory patterns than high-ability, versatile pronouncers. To test this, we used an elicitation task that reliably separates talented from untalented speech imitators. Flege & Hammond (1982) introduced the technique of delayed mimicry into phonology and second language acquisition research. They used this paradigm (production from memory) to assess speakers’ awareness of non-distinctive phonetic differences that partly characterize language varieties. They investigated American English native speakers who had to imitate Spanish-accented English, an accent they were to some degree familiar with. The researchers primarily wanted to show that second language speakers can detect even phonetic differences that are non-distinctive in their own language (L1) and are in principle able to produce them. They did indeed find evidence for this, because the American speakers were able to mimic these non-distinctive differences. This undermines theories of “phonological filtering”, in which the L1 acts as an insuperable filter for L2 speech production. Recent evidence (Reiterer et al. in press; Abrahamsson & Hyltenstam 2009) suggests that some high-aptitude speakers can attain native-like pronunciation even if they learn the second language late (after puberty). We observed, however, that the proportion of people, even among those with higher education, who were able to pronounce the foreign language at such high levels was fairly low, around 8–10%. In order to predict and characterize speech imitation aptitude biometrically, we used a delayed mimicry task that required the volunteers to fake a familiar foreign accent from memory. Participants read aloud sentences like ‘Der Professor präsentiert das Resultat’ (‘The professor presents the result’) with a typical English accent while we scanned them with magnetic resonance imaging and recorded their speech output for further modulation spectrum analyses.

2.

Methods

2.1.

Method 1: fMRI experiment (functional magnetic resonance imaging)

2.1.1. Subjects

We screened 200 volunteers with different levels of foreign language speech imitation ability. Those who matched our experimental control criteria (age, handedness, gender, educational status, mother tongue, age of onset of second language acquisition (AOA), linguistic experience, neurological status, verbal and nonverbal IQ) underwent in-depth linguistic (138 participants) and psychological testing (113 participants). Ages ranged from 20 to 40 years (mean: 25.94 years; SD ±5.2). Most participants (73%) were right-handed according to the Edinburgh Laterality Quotient (Oldfield 1971). The sample comprised 53 males and 85 females. All had university-level education. We did not restrict participants by field of study but checked that the sample was balanced for language-related study backgrounds. Approximately half of the participants came from a language study background, but this “linguistic experience” variable turned out to have no predictive effect on speech imitation or articulation ability, nor did grammar proficiency in L2 English as measured by a TOEFL subtest. The sample consisted of German native speakers who were late bilinguals/L2 learners of English as their first L2 (AOA around 10 years). No early bilinguals were included. All knew at least one foreign language: 24% knew only English, 30% two L2s, 22% three, 17.5% four, 3.5% five, 2% six and 1% nine foreign languages, to various degrees of proficiency. Their mean exposure to formal school instruction in L2 English was 9.8 years. Their verbal (Lehrl, Triebig & Fischer 1995) and nonverbal IQ (Raven, Raven & Court 1998) scores were in the normal to advanced range (110–140), corresponding to their educational background. Of the behavioural group (N = 138), 72 strongly right-handed, MR-compatible participants (i.e. without exclusion criteria such as claustrophobia or a metal implant) were scanned with the fMRI protocol, and 64 of these were chosen for the further fMRI investigation. The 18 scanned participants with the highest and lowest speech imitation scores (roughly the upper and lower 15%) formed the extreme groups of 9 high-ability (6/3 male/female) versus 9 low-ability (4/5 male/female) subjects for statistical group comparisons. The two extreme groups differed significantly in their ability to imitate speech samples in an unknown foreign language, namely Hindi (mean score high = 6.6; low = 3.2; p < 0.001).

2.1.2. Behavioural experiments

Behavioural phonetic testing (“Hindi imitation score”): We recorded the participants in a sound-proof room at a phonetics laboratory while they performed different speech imitation, pronunciation or reading tasks in German (L1), English (L2) and Hindi (L0, the unknown language). For details of the different task types and elicitation techniques see Jilka (2009). To avoid the influence of language experience and to elicit as pure an imitation capacity as possible, the participants were exposed to speech material in Hindi, a language they had never heard before. They had to repeat sentences spoken by a model Hindi speaker.
The imitations were based on 4 Hindi sentences of different length and phonetic complexity (7, 7, 9 and 11 syllables), which had to be repeated immediately after presentation. Each stimulus sentence was played three times before imitation to ensure sufficient exposure for reproduction. Example sentence:

(a) [‘mE: pra‘dA:n@m@n,tri: ‘kabbanUN,gi:]
    ‘When will I become Prime Minister?’
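Assuming each speaker’s Hindi imitation score is the mean of the raters’ judgments on the 0–10 native-likeness bar described below, the extreme groups (in the study, the 9 highest- and 9 lowest-scoring scanned participants) can be selected as in this minimal sketch. The speaker IDs and ratings are hypothetical, and tie handling at the group boundaries is not addressed.

```python
from statistics import mean


def imitation_scores(ratings):
    """Mean native-likeness rating (0-10 bar) per speaker.

    ratings: {speaker_id: [one rating per rater]}
    """
    return {speaker: mean(values) for speaker, values in ratings.items()}


def extreme_groups(scores, k):
    """Return (high, low): the k highest- and the k lowest-scoring speakers."""
    if k < 1 or 2 * k > len(scores):
        raise ValueError("k must be >= 1 and the two groups must not overlap")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


# Hypothetical ratings from three raters for six speakers.
ratings = {
    "s1": [7, 6, 7], "s2": [3, 4, 3], "s3": [5, 5, 6],
    "s4": [8, 7, 8], "s5": [2, 3, 2], "s6": [5, 4, 5],
}
high, low = extreme_groups(imitation_scores(ratings), k=2)
print(high, low)  # ['s4', 's1'] ['s2', 's5']
```

Selecting only the extremes maximizes the contrast for the group-versus-group comparison, at the cost of discarding the middle of the ability distribution.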

Evaluation of the quality of speech imitation was performed in India through online blind ratings of the participants’ speech productions by native speakers. The raters had no phonetic training and did not know whom they were rating. They were instructed to judge whether the sample they were listening to could have been spoken by a native speaker of Hindi or not. To check the quality of the evaluation procedure, we randomly inserted recordings of Indian native speakers (N = 18) into the database. The speech samples were presented in random order. For the intuitive rating scale (Jilka 2009) we used a rating bar ranging from 10 to 0 (highest to lowest native-speaker-likeness). Thirty gender-balanced Indian native speakers rated all the samples online using earphones. As a control, the same pronunciation ability score was created for L2 English, following the same internet-based rating procedure but based on read speech samples (reading aloud “The North Wind and the Sun”, the standard passage used by the International Phonetic Association).

Further behavioural tests (psycholinguistic testing): Additional behavioural questionnaires on foreign language learning experience were administered. Furthermore, a nonverbal IQ test (Raven, Raven & Court 1998), a verbal IQ test (Lehrl, Triebig & Fischer 1995), a TOEFL subtest on English grammar to control for grammatical proficiency, an auditory working memory test (digit span; Tewes 1991) and a German nonword repetition test (Benner 2005) were applied. We also assessed foreign language aptitude with the short form of the validated Modern Language Aptitude Test (MLAT; Carroll & Sapon 1959).

2.1.3. fMRI task, protocol and statistics

fMRI task: We scanned 64 well-controlled participants (see 2.1.1.) during overt, microphone-monitored speech production. Using an event-related, sparse-sampling fMRI paradigm, we administered an overt sentence reading task (30 min) with three sub-conditions: A) reading aloud German sentences (L1), B) reading aloud English sentences (L2) and C) reading German sentences with a fake English accent (any variety).
This task required phonemic awareness of foreign-accented speech characteristics. Mean sentence production duration was 3 s. The sentences were presented in the centre of the screen, and condition C) was signalled by a British flag symbol above the sentence. All 75 stimuli (25 per condition) were 11 syllables long and matched for semantic content. Participants were instructed to start reading as soon as the sentence appeared. For acquisition, a sparse sampling paradigm was used (TR = 12 s, TA = 3 s, pause = 9 s), with sentences presented and read during the scanner pauses. Stimuli were randomized for onset time and order of presentation. Baseline trials, during which participants fixated a white cross on a black screen, were inserted in alternation on every second TR. The produced speech was recorded with a commercially available optical MR microphone¹. Before the start of the fMRI scanning session, subjects were familiarized with sample stimuli. Examples of each condition:

(A) ‘Die gute Mensa begeistert uns alle’ (‘The good cafeteria delights us all’) (L1 German)
(B) ‘The mechanic repaired cars in the garage’ (L2 English)
(C) ‘Der Professor präsentiert das Resultat’ (‘The professor presents the result’) (L1 with fake English accent)
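The sparse-sampling timing can be checked with a short calculation. Assuming that sentence TRs and baseline TRs strictly alternate (one baseline after every sentence) and ignoring onset jitter and the three discarded dummy volumes, 75 stimuli yield 150 TRs of 12 s, i.e. 1800 s, which matches the stated task duration of 30 min. The helper below is a hypothetical sketch, not the study’s presentation script.

```python
TR_S = 12.0              # repetition time (s), from the protocol
TA_S = 3.0               # acquisition time per TR (s)
PAUSE_S = TR_S - TA_S    # silent gap in which sentences are read (s)
MEAN_SENTENCE_S = 3.0    # mean sentence production duration (s)
N_STIMULI = 75           # 25 sentences x 3 conditions


def build_schedule(n_stimuli, tr=TR_S):
    """Return (trial onset in s, trial type) pairs.

    Assumes strict alternation: every sentence TR is followed by one
    baseline (fixation cross) TR. Onset jitter and dummy volumes are ignored.
    """
    schedule = []
    t = 0.0
    for _ in range(n_stimuli):
        schedule.append((t, "sentence"))
        t += tr
        schedule.append((t, "baseline"))
        t += tr
    return schedule


assert MEAN_SENTENCE_S <= PAUSE_S, "a sentence must fit into the silent gap"
schedule = build_schedule(N_STIMULI)
total_min = len(schedule) * TR_S / 60.0
print(PAUSE_S, len(schedule), total_min)  # 9.0 150 30.0
```

The point of the sparse design is visible in the numbers: the 3 s sentences fit comfortably into the 9 s silent gap, so speech is produced and recorded without scanner noise, and the delayed acquisition still samples the hemodynamic response.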

MR image acquisition: For fMRI acquisition see endnote 2. For anatomical MR image acquisition see endnote 3.

fMRI statistical analysis: fMRI images were analyzed using the free software package SPM5. Conventional data pre-processing was performed⁴. At the first level, design matrices of individual general linear models incorporated three regressors of language type (German, English, German with fake accent) for the sentence reading session⁵. At the second level, group analysis was performed using analysis of variance (ANOVA), with one between-subjects factor “ability group” (high vs. low ability group) and one within-subjects factor “language type” (L1, L2, LAcc)⁶. Main effects of group and language type and the group-by-language-type interaction were calculated separately for each session. A statistical threshold of p < 0.05 (whole-brain cluster-level correction for multiple comparisons) was applied. Results were overlaid on the mean anatomical image of the group (N = 64).

1. Manufacturer: www.optoacoustics.com.
2. A Siemens Vision 1.5 T scanner was used. For functional imaging (fMRI) of the BOLD (blood oxygen level dependent) signal, we used an EPI (echo planar imaging) gradient echo sequence with sparse sampling and the following parameters: TR = 12 s, TA = 3 s, delay in TR (pause) = 9 s, TE = 48 ms, 36 transversal slices, flip angle = 90°, slice thickness = 3 mm + 1 mm gap, voxel size = 3 × 3 × 4 mm³, FoV = 192 × 192 × 143 mm³, matrix = 64 × 64. The first 3 EPI volumes of each session were discarded prior to analysis to allow for T1 saturation effects.
3. As the structural-anatomical MR sequence, a T1-weighted MDEFT sequence (Modified Driven-Equilibrium Fourier Transform) was used: scan time = 12 min, repetition time (TR) = 7.92 ms, echo time (TE) = 2.48 ms, inversion time (TI) = 910 ms, flip angle (FA) = 16°, voxel size = 1 × 1 × 1 mm³, field of view (FoV) = 176 × 256 × 256 mm³, 176 sagittal slices per slab, matrix = 256 × 256. An 8-channel head coil was used.
4. Data pre-processing: each fMRI data set underwent spatial realignment by aligning the first scan from each session with the first scan of the first session and aligning the images within sessions with the first image of that session. The realigned data were spatially normalized to the standard Montreal Neurological Institute (MNI) T1 template, with the coregistered individual T1 image as a reference. Volumes were resliced to a voxel size of 3 × 3 × 3 mm³, motion corrected, spatially smoothed using a 10-mm full-width at half-maximum Gaussian kernel, and prepared for later random effects analyses.

2.2.

Method 2: acoustic experiment (modulation spectrum analysis)

2.2.4. Subjects and behavioral experiment

The same subjects as described under 2.1.1. (the behavioural group, N = 138) were used, and their Hindi speech samples (sentence imitations, described in 2.1.2.) were subjected to modulation spectrum analysis.

2.2.5. Introduction to modulation spectrum analysis

The speech signal is a series of pressure changes that is non-stationary: successive disturbances are neither equally spaced in time nor of constant shape. We therefore describe it in terms of a joint time-frequency representation, the spectrogram, which shows how the spectral content of the speech signal evolves over time. The spectrogram is computed by first partitioning the one-dimensional time-domain signal into short, overlapping frames of equal duration and then computing a Fourier transform for each frame; the resulting sequence of frame-wise spectra is the Short-Time Fourier Transform (STFT). The spectrogram is thus effectively a three-dimensional display of how the energy contained in different frequencies changes over time (Fig. 1): the horizontal dimension represents time, the vertical dimension represents frequency, and colour encodes energy, with red representing the highest energy, followed in decreasing order by orange and yellow, and blue areas representing energies below a threshold decibel level.
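The STFT procedure described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' pipeline: the chapter does not state the frame length, overlap, window or decibel scaling used, so the 25 ms frames, 10 ms hop, Hann window and the hypothetical `stft_spectrogram` helper below are our assumptions.

```python
import numpy as np

def stft_spectrogram(signal, fs, frame_ms=25.0, hop_ms=10.0):
    """Return (times_s, freqs_hz, power_db) for a 1-D signal sampled at fs Hz.

    The signal is cut into overlapping frames of equal length, each frame is
    Hann-windowed and Fourier-transformed (the STFT), and the squared
    magnitudes are converted to decibels. power_db has one row per frequency
    and one column per frame (time).
    """
    frame = int(round(fs * frame_ms / 1000.0))     # samples per frame
    hop = int(round(fs * hop_ms / 1000.0))         # samples between frame starts
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)         # one spectrum per frame
    power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)  # floor avoids log(0)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame / 2.0) / fs    # frame centres
    return times, freqs, power_db.T


# Usage: a 1 s, 1 kHz test tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
times, freqs, spec = stft_spectrogram(np.sin(2 * np.pi * 1000.0 * t), fs)
print(spec.shape)                          # (frequency bins, frames)
print(freqs[np.argmax(spec[:, 0])])        # frequency of the strongest bin
```

Shorter frames give finer time resolution at the cost of coarser frequency resolution; the choice matters later, because the frame rate of the spectrogram limits which temporal modulations can be represented.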

5. Six additional regressors of movement parameters were added for each session as well. Regressors were defined with onsets at the time of appearance of the corresponding event and convolved with the canonical hemodynamic response function.
6. A third factor, “subject,” was added to the design matrix in order to remove variability resulting from differences in the participants’ average responses.
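Footnote 5 describes the usual way task regressors are built: stick functions at event onsets convolved with a canonical hemodynamic response function (HRF). The Python sketch below illustrates the idea with a double-gamma HRF whose parameters (response shape 6, undershoot shape 16, undershoot ratio 1/6, 32 s kernel) mirror the commonly cited defaults of SPM's canonical HRF. It is not SPM5 code; the helper names, the 0.1 s time grid and the example onset are hypothetical.

```python
import math

def _gamma_pdf(t, shape):
    """Gamma probability density with unit scale, evaluated at t > 0."""
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)

def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF at time t (s): a positive response minus an undershoot."""
    if t <= 0:
        return 0.0
    return _gamma_pdf(t, peak_shape) - ratio * _gamma_pdf(t, undershoot_shape)

def event_regressor(onsets_s, duration_s, dt=0.1, kernel_s=32.0):
    """Convolve unit sticks at the event onsets with the sampled HRF."""
    n = int(round(duration_s / dt))
    kernel = [canonical_hrf(k * dt) for k in range(int(round(kernel_s / dt)))]
    sticks = [0.0] * n
    for onset in onsets_s:
        sticks[int(round(onset / dt))] = 1.0
    return [sum(sticks[i - k] * kernel[k] for k in range(min(i + 1, len(kernel))))
            for i in range(n)]


# Usage: one event at 10 s in a 60 s run; the modelled response peaks about 5 s later.
reg = event_regressor([10.0], 60.0)
peak_index = max(range(len(reg)), key=reg.__getitem__)
print(f"peak at {peak_index * 0.1:.1f} s")
```

With the sparse-sampling protocol of endnote 2 (TR = 12 s), such a high-resolution regressor would then be sampled only at the acquisition times.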

[Figure 1 image: spectrogram panel labelled “Vinkum”; x-axis Time (sec), 0.1–0.8; y-axis Frequency, 0–10000 Hz]

Figure 1. Speech spectrogram, characterizing speech by fluctuations of energy in different frequencies over time. X-axis: time (seconds); y-axis: frequency (Hz). Colours encode energy, with red representing the highest energy, followed by orange, yellow and blue (lowest, below a threshold decibel level).

As is evident from Fig. 1, vocal utterances can be characterized by fluctuations in time and frequency. We label energy fluctuations across the frequency spectrum at a particular time as spectral modulations ($\omega_f$), whereas temporal modulations ($\omega_t$) are energy fluctuations at a particular frequency over time. Spectral modulations thus provide information about the set of frequencies present at any particular time, whereas temporal modulations indicate how this frequency structure changes over time. Researchers (Rosen 1992; Stevens 1980) have proposed that temporal modulations on different timescales encode different articulatory/phonological features. For example, amplitude envelope modulations that unfold over hundreds of milliseconds (continuous stretches of energy lasting hundreds of milliseconds) are believed to encode syllabicity and prosodic phenomena. In contrast, changes on the order of 3–5 milliseconds are related to fine-structure information such as vowel quality and pitch (the fine vertical striations in Fig. 1).
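Because a modulation rate is the reciprocal of the period of the fluctuation, rate (Hz) = 1000 / period (ms), the timescales mentioned above map directly onto positions on the temporal-modulation axis of a modulation spectrum. A small Python sketch (the helper names are ours):

```python
def period_ms_to_rate_hz(period_ms):
    """Temporal modulation rate (Hz) of a fluctuation repeating every period_ms."""
    return 1000.0 / period_ms

def rate_hz_to_period_ms(rate_hz):
    """Period (ms) of a temporal modulation at rate_hz."""
    return 1000.0 / rate_hz


# A ~200 ms syllable-sized fluctuation versus 3-5 ms fine-structure changes.
for period in (200.0, 5.0, 3.0):
    print(f"{period:g} ms -> {period_ms_to_rate_hz(period):.1f} Hz")
```

On this reading, fine-structure changes of 3–5 ms correspond to modulations of roughly 200–333 Hz, the range associated with voiced pitch in the modulation spectrum, whereas syllable-length fluctuations of about 200 ms sit near 5 Hz.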


It is thus evident that spectro-temporal fluctuations spanning different timescales encode spoken-language features. Here we present a method to capture these different spectro-temporal timescales in terms of the speech modulation spectrum.

2.2.6. Modulation spectrum analysis

Speech Modulation Spectrum (SMS): To capture amplitude fluctuations that are both temporal and spectral in nature, we use a 2D Fourier decomposition of the spectrogram. Since this decomposition captures fluctuations (modulations) in amplitude in the time-frequency domain, it is called the modulation spectrum. A 2-dimensional Fourier decomposition of the spectrogram therefore provides the distribution of the different spectro-temporal modulations that make up the speech signal. We call this the Speech Modulation Spectrum (SMS). Figure 2 shows a typical SMS for an adult German female speaker. Interestingly, the distribution of energy is not uniform but is localized at specific timescales.

Figure 2. Speech Modulation Spectrum (SMS) obtained by a 2-dimensional Fourier decomposition of the spectrogram (Fig. 1). It provides the distribution of the different spectro-temporal modulations that make up the speech signal. X-axis: temporal modulations (here 1–100 Hz); y-axis: spectral modulations (kHz). The centre (black rectangle containing red) captures syllabicity (~200 ms), the black ellipses (left and right) capture formant transitions, and the outermost black squares capture place of articulation.

As seen in Figure 2, energy is distributed in two regions, namely temporal modulations between 0 and 100 Hz and at around 150–200 Hz. Each of these regions encodes different features: the region from 0 to 100 Hz encodes syllabic rhythm, formant transitions and place of articulation, whereas the second region, with modulations of ~200 Hz, encodes voiced pitch. Figure 2 shows the details of the first region (1–100 Hz). The region between 2 and 10 Hz carries suprasegmental information and encodes information about phonological features between 1000 milliseconds and 10 milliseconds (for example, word lengths and language rhythm). The side lobes localized between 10 and 100 Hz encode segmental information and can be further divided into two regions: one between 25 and 50 Hz, which encodes slower formant transitions, and a second between 50 and 100 Hz, which captures fluctuations arising from place of articulation in stops and fricatives. As we go from 1 to 100 Hz, we approach sounds whose amplitude fluctuations become faster, going from syllabic to vowel-like to plosive-like segments. Thus, based on the temporal scale, the SMS can be classified into three regions:

1. Speech rhythm (2–10 Hz)
2. Formant transitions (25–40 Hz)
3. Place of articulation in stops/fricatives (50–100 Hz)

To obtain speech modulation spectra for individual subjects, we subjected the speech samples of 18 participants (the extreme groups of high versus low imitation ability, as described under 2.1.1.) to modulation spectrum analysis on the basis of their Hindi word and sentence imitations.
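The 2D Fourier decomposition and the three temporal-modulation regions can be illustrated with the following NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: we assume the input is a log-amplitude spectrogram (frequency × time) with a known frame rate and frequency spacing, remove its mean before the transform, pool positive and negative temporal modulation rates, and take the band limits from the three-region classification (25–40 Hz for formant transitions, although the running text also mentions 25–50 Hz). The function names are hypothetical.

```python
import numpy as np

def modulation_spectrum(log_spec, frame_rate_hz, freq_step_khz):
    """2-D Fourier decomposition of a (frequency x time) log spectrogram.

    Returns (temporal_mod_hz, spectral_mod_cyc_per_khz, power); both
    modulation axes are shifted so that zero modulation sits in the centre.
    """
    centred = log_spec - log_spec.mean()            # remove the overall level (DC)
    power = np.abs(np.fft.fftshift(np.fft.fft2(centred))) ** 2
    spectral_mod = np.fft.fftshift(np.fft.fftfreq(log_spec.shape[0], d=freq_step_khz))
    temporal_mod = np.fft.fftshift(np.fft.fftfreq(log_spec.shape[1], d=1.0 / frame_rate_hz))
    return temporal_mod, spectral_mod, power

# Temporal-modulation bands (Hz) from the three-region classification.
BANDS = {
    "speech rhythm": (2.0, 10.0),
    "formant transitions": (25.0, 40.0),
    "place of articulation": (50.0, 100.0),
}

def band_power_fractions(temporal_mod, power, bands=BANDS):
    """Share of total modulation power falling in each temporal-modulation band.

    Power is summed over all spectral modulations, and positive and negative
    temporal rates are pooled by taking their absolute value.
    """
    per_rate = power.sum(axis=0)                    # collapse the spectral axis
    rates = np.abs(temporal_mod)
    total = per_rate.sum()
    return {name: float(per_rate[(rates >= lo) & (rates <= hi)].sum() / total)
            for name, (lo, hi) in bands.items()}


# Usage with a synthetic spectrogram: 64 frequency channels whose log
# amplitude rises and falls together at 4 Hz (a syllable-like rhythm),
# sampled at 500 frames per second for 2 s.
frame_rate = 500.0
t = np.arange(1000) / frame_rate
log_spec = np.tile(np.sin(2 * np.pi * 4.0 * t), (64, 1))
tm, sm, P = modulation_spectrum(log_spec, frame_rate, freq_step_khz=0.125)
print(band_power_fractions(tm, P))
```

By the Nyquist criterion, resolving temporal modulations up to 100 Hz requires a spectrogram frame rate above 200 frames per second, i.e. a hop of at most 5 ms; the 10 ms hop often used for plain spectrograms would cap the modulation axis at 50 Hz.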

3.

Results

3.3.

Behavioral results

Based on the native speakers’ ratings of the Hindi imitations, we found that the Hindi imitation scores of our 138 German participants (see also 2.1.1.) followed a Gaussian (normal) distribution, with 70% of subjects falling within one SD below and above the mean (average ability). The subjects’ mean score was 4.62 (SD ± 0.99), ranging from a lowest score of 2.42 to a highest score of 7.74 on a scale from 0 (min) to 10 (max). The 18 Hindi native speakers who had been interspersed into the database to confuse the native raters were ranked in the first 18 places of the evaluation, scoring between 8.07 and 9.9 (SD ± 0.6). This means that they had been perfectly identified as Hindi native speakers by the Indian raters, a fact which serves as a quality check on the online rating procedure.

Further linguistic and psychological control variables yielded the following results. For a sample of N = 113 we obtained additional test scores on the following variables: auditory working memory, non-verbal and verbal IQ, number of foreign languages spoken, general experience with languages (linguistic expertise), foreign language aptitude, English grammar (TOEFL) and English pronunciation as rated by native English speakers. Working memory (nonword repetition and digit span) correlated most significantly with the Hindi score (nonword repetition: r = 0.37, p < 0.001**; digit span: r = 0.36, p < 0.001**), closely followed by the MLAT (total; r = 0.33, p < 0.001**), English imitation skills as rated by native English judges (r = 0.3, p = 0.001**), and the English grammar (TOEFL) subtest (r = 0.27, p = 0.004**). No significant correlations were obtained between the Hindi score and non-verbal IQ (r = 0.1, p = 0.29), general linguistic experience (r = 0.01, p = 0.99), number of foreign languages spoken (r = 0.16, p = 0.09) or verbal IQ (r = 0.17, p = 0.07). Thus, the two imitation ability scores (Hindi and English) correlated significantly with one another (r = 0.3, p = 0.001**), and working memory as well as foreign language aptitude also correlated highly significantly with our primary indicator of imitation ability, the “Hindi imitation score”.

3.4.

Brain Imaging (fMRI) results

To visualize individual differences in speech imitation ability, the fMRI analyses were based on the extreme groups (9 high/9 low ability) according to the Hindi imitation score (2.1.1.) and comprised main effects of the reading task for each group per se (Fig. 3) as well as group-versus-group comparisons (Fig. 4). For the main effects of each group (pure effect of group) in all three subconditions of the reading task (A. reading aloud German (L1) sentences, B. English (L2) sentences and C. German sentences with a fake English accent (L1Acc)), we found that both groups showed hemodynamic (brain) activation of widely distributed bilateral language networks around the classical perisylvian language zones, plus additional parietal, temporal and occipital areas (Fig. 3). Most prominently involved were, frontally, parts of the inferior frontal, middle frontal and precentral gyri (motor and premotor areas including Broca’s area); parietally, the postcentral gyrus, supramarginal gyrus and inferior and superior parietal lobes; temporally, the superior and middle temporal gyri; and, in addition, the left insula, the basal ganglia and the cerebellum. Although most activations were bilateral, a preponderance of left-hemisphere activation is clearly visible, more so in the low ability group. Visual inspection of Figure 3 suggests that subjects with low pronunciation skills displayed stronger and more widespread activation than the high ability group. Testing for differential effects revealed no areas used significantly more by the high ability group than by the low ability group. However, there were a number of areas where the low ability group showed significantly greater fMRI activation than the high ability group (Fig. 4). These group differences corroborate the main effects (Fig. 3). In the statistical group comparison, the low ability group showed significant increases in activation most intensely along the Rolandic fissure (central sulcus), which separates the motor (precentral gyrus) and somatosensory (postcentral gyrus) areas, together also called the sensorimotor strip. This strip is involved in the planning, control and execution of voluntary motor functions and in the processing of sensory input such as touch and proprioception, all functions vital for articulating speech. In detail, the network more strongly activated by the low ability group comprised 1. the pre- and postcentral gyri, 2. the inferior and superior parietal lobes, 3. the basal ganglia (caudate and putamen), 4. the left insula, 5. parts of the middle frontal and middle temporal gyri and, last but not least, 6. the cerebellum. More activation was generally observed in the left hemisphere. As predicted, the areas of this speech production network were engaged to a higher degree, in both intensity and extent of activation, by the low ability group, reflecting more effortful processing.

[Figure 3 image: left panel “High ability group”, right panel “Low ability group”]

Figure 3. Group activation for the high imitation ability group (left side) and the low imitation ability group (right side) during the task of reading sentences in A) German, B) English and C) German with fake foreign accent. More extended clusters of activation around speech- and language-related areas were found for the low ability group. Main effects, ANOVA, flexible factorial design. p < 0.05, corrected for multiple comparisons.

[Figure 4 image: panels “Low – High (A) L1 Germ”, “Low – High (B) L2 Engl”, “Low – High (C) L1 Acc”]

Figure 4. Group-versus-group comparisons: significant differential activations (low ability > high ability group) for the low ability group during sentence reading in A) German (L1), B) English (L2) and C) German with fake English accent (L1Acc). p < 0.05, corrected for multiple comparisons. Significant differences in activation were already found for A) reading in the mother tongue, with activation in speech-related areas increasing as the reading task becomes more complex (reading English sentences and German sentences with fake English accent). The activation peak was observed in the left inferior parietal cortex (IPL), supramarginal gyrus.
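The extreme-group design behind these comparisons, the 9 highest and 9 lowest scorers on the Hindi imitation score, amounts to a simple selection step. The Python helper below is hypothetical: the chapter does not describe how ties or any additional matching criteria were handled, so it simply ranks participants by score and breaks ties by participant ID. The toy scores in the usage example are invented for illustration and are not study data.

```python
def extreme_groups(scores, k):
    """Return (high_group, low_group): the k highest and k lowest scorers.

    scores maps a participant ID to a score. Ties are broken by ID so that
    the selection is reproducible; the two groups never overlap.
    """
    if 2 * k > len(scores):
        raise ValueError("not enough participants for two disjoint groups")
    ranked = sorted(scores, key=lambda pid: (scores[pid], pid))   # low -> high
    return ranked[::-1][:k], ranked[:k]


# Toy example with six participants and groups of two.
toy_scores = {"p1": 5.0, "p2": 8.0, "p3": 2.0, "p4": 6.0, "p5": 3.0, "p6": 7.0}
high, low = extreme_groups(toy_scores, 2)
print("high:", high, "low:", low)
```

Contrasting the tails of the distribution maximizes the expected group difference with few scanned participants, at the price of saying little about the average-ability middle of the distribution.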
As can be seen from Figure 4, the gradual increase in difficulty of the reading task is reflected in an increase in activated brain areas as the task becomes more complex, from German as L1 (A), through English as L2 (B), to German with a foreign accent (C). Especially when faking a foreign accent (Fig. 4, C), the low ability group seems to be particularly stressed, in terms of overactivation of speech-motor areas. The differences between the three subconditions (A vs. B vs. C) consisted mostly in the intensity and extent of activation rather than in the recruitment of different areas for each condition. For example, in conditions A and B (reading normal German and English sentences) significant activation of only the right basal ganglia (caudate and putamen) was observed, whereas in condition C (faking the accent) the activation increased to a significant bilateral involvement of the right and left basal ganglia. The overall activation increases of the low ability group in all reading tasks occurred in both hemispheres but were more pronounced in the left hemisphere. Here, the activation peak (yellow colour) lies in left inferior parietal and postcentral areas, close to the supramarginal gyrus. To compare the intensity of the fMRI signal change between the groups, we took the most highly activated (peak) voxel of the left inferior parietal area, which discriminated most significantly between the groups, and found striking group differences, especially for condition C (see Fig. 5). Within the high ability group, the differences between the conditions are not significant. Significantly higher signal intensity is observable for the low ability group in all three conditions, but most strikingly in the condition where they fake the foreign accent. Thus, in the latter group, the big leap happens between normal sentence reading and imitating foreign-accented speech. However striking this difference, it should not be forgotten that a significant group difference was already observed in simple mother-tongue sentence reading.

[Figure 5 image: left panel “High ability group”, right panel “Low ability group”]

Figure 5. Activation intensity as given by percent BOLD (blood oxygen level dependent) signal change (y-axis) in the peak activated area (left inferior parietal cortex; voxel coordinates, MNI: [-54, -21, 45]). Left side: high ability group (“talents”) for the task of reading sentences in 1. German (L1), 2. English (L2), 3. German with fake Engl. accent (L1Acc). Right side: low ability group (“low tal”) for the same task, reading in 4. German (L1), 5. English (L2), 6. German with fake Engl. accent (L1Acc). The most significant group difference was found for reading German with a foreign accent (compare 3. versus 6.).
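Percent BOLD signal change, as plotted in Figure 5, expresses a voxel's signal during a condition relative to a baseline level. The chapter does not specify the baseline or the tool used to extract the values, so the Python sketch below assumes the textbook definition, 100 × (task mean − baseline mean) / baseline mean, computed from a single voxel time series; the function name, the index-based selection of task and baseline volumes and the example numbers are hypothetical.

```python
from statistics import fmean

def percent_signal_change(timeseries, task_idx, baseline_idx):
    """Percent change of the mean task signal relative to the mean baseline signal."""
    baseline = fmean(timeseries[i] for i in baseline_idx)
    if baseline == 0:
        raise ValueError("baseline signal must be non-zero")
    task = fmean(timeseries[i] for i in task_idx)
    return 100.0 * (task - baseline) / baseline


# Toy voxel time series (arbitrary scanner units): volumes 2 and 3 are task scans.
voxel = [1000.0, 1002.0, 1014.0, 1010.0, 998.0]
print(f"{percent_signal_change(voxel, task_idx=[2, 3], baseline_idx=[0, 1, 4]):.2f} %")
```

Normalizing by the baseline makes values comparable across voxels and participants whose raw scanner intensities differ.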

3.5.

Acoustic (Modulation Spectrum Analysis) results

This section describes our preliminary acoustic findings from applying Speech Modulation Spectrum (SMS) analysis to the recorded speech materials of a subset of our tested sample (subjects of the high and low ability groups, as in the fMRI experiment, 3.2.). When analyzing the Hindi imitations of our extreme groups, we found first characteristic traces of individual differences between the speakers reflected in their spectrograms. Fig. 6 shows spectrograms of a sample Hindi native speaker (left graph), a sample German speaker of high imitation ability (middle graph) and a sample German speaker of low imitation ability (right graph) uttering/imitating Hindi speech.

[Figures 6 and 7 images: panels labelled “Vinkum”, “Good Speaker” and “Poor Speaker”; spectrogram axes Time (sec) and Frequency, 0–10000; modulation-spectrum panels annotated Temporal Modulations, Spectral Modulations and Spectro-temporal Modulations]

Figure 6. Representative speech spectrograms for the three groups, from left to right: native speaker, high ability imitator, low ability imitator. X-axis: time (seconds); y-axis: frequency (Hz).

Figure 7. Contour plots and speech modulation spectra (cf. Fig. 2) for the three groups, from left to right: native speaker, high ability imitator, low ability imitator. The contour (black line) encloses 99% of the total spectro-temporal modulation power of the respective SMS.


Figure 8. Left-hand plot: the contours alone, extracted from the contour plots in Fig. 7 (blue line: native speaker; black: high ability speaker; red: low ability speaker). Right-hand plot: bar graph in which each bar represents an estimate of the contour area (number of pixels); blue: native speaker, black: high ability speaker (largest area), red: low ability speaker (smaller area).

Visual inspection of the spectrograms revealed that the native speaker model and the high ability speech imitator were more similar to each other in terms of all three parameters: temporal modulations, spectral modulations and spectro-temporal modulations. High ability speakers showed a bigger articulation space (Figs. 7, 8). The speakers with low imitation ability, on the other hand, behaved unlike the other two groups and showed a smaller articulation space (Figs. 7, 8). Fig. 6 shows representative speech spectrograms, while Fig. 7 shows the contour plots for the three groups. The contour plots are regions enclosing 99% of the total spectro-temporal modulation power of the respective speech modulation spectra. A visual comparison of Figs. 6 and 7 suggests that the high ability speakers have a larger articulation space than the low ability speakers. We estimated the contour area (Fig. 8) by counting the number of pixels with energy greater than or equal to the energy specified by the contour. As shown in Fig. 8, the contour area of the high ability group is larger than that of the low ability group.
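The area estimate described above, an iso-power contour enclosing 99% of the total modulation power followed by a count of the pixels at or above that contour level, can be sketched as follows. This is an illustrative NumPy reconstruction under our own assumptions (the SMS is given as a non-negative power array, and pixels exactly at the threshold count as inside the contour); `contour_area` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def contour_area(power, fraction=0.99):
    """Return (pixel_count, threshold) for the contour enclosing `fraction` of power.

    Pixels are ranked from strongest to weakest; the threshold is the power of
    the weakest pixel needed for the cumulative sum to reach `fraction` of the
    total, and the area counts all pixels at or above that threshold.
    """
    values = np.sort(power, axis=None)[::-1]        # flattened, strongest first
    cumulative = np.cumsum(values)
    n_needed = int(np.searchsorted(cumulative, fraction * cumulative[-1])) + 1
    threshold = values[n_needed - 1]
    return int(np.count_nonzero(power >= threshold)), float(threshold)


# Toy 2 x 2 "modulation spectrum": 50 + 30 + 19 = 99% of the total power of 100.
toy = np.array([[50.0, 30.0], [19.0, 1.0]])
print(contour_area(toy))
```

Applied to each speaker's SMS, a larger pixel count means that the bulk of the modulation power is spread over a wider range of spectro-temporal modulations, which is what the chapter interprets as a larger articulation space.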

4.

Discussion and Conclusion

We found significant differences between the groups of high and low ability speech imitators, as differentiated by the task of imitating foreign-accented speech, in terms of both brain imaging and biometric acoustic analyses. This shows, first of all, that it is possible to predict from purely biometric data who has higher and lower articulation abilities. It also shows that the task of faking accents from memory, which requires phonological awareness at both segmental and suprasegmental levels, was a very effective discriminator of individual differences in pronunciation aptitude, as we hypothesized. Our brain imaging results confirmed our initial hypothesis that lower ability speakers would activate speech production areas more extensively and intensively than high ability speakers (compare Figs. 3–5). This reflects a mechanism that has been replicated many times, namely neural or cortical efficiency (compare Reiterer et al. 2005a,b). However, we are the first to connect this neural mechanism to speech imitation ability and to show that it also detects individual differences in ability levels of speech production, not only mastery of skills after practice (compare also Reiterer et al. in press). Turning to the location of distinctive brain activation, we found significant differential activation in areas well known to be relevant to speech production, which some researchers have described as a “minimal speech production network” (Bohland & Guenther 2006). Biased more towards the left hemisphere, we found significant differential activation, most notably within the pre- and postcentral gyri (sensorimotor cortex), the left inferior and superior parietal cortex (which included the area of maximum difference), the left supramarginal gyrus, the basal ganglia system (mostly caudate and putamen bilaterally), the cerebellum, parts of the middle frontal and middle temporal gyri, the inferior frontal gyrus (Broca’s area, pars opercularis), and the left insula. The left insula has received renewed attention in the brain imaging literature as a region important for coordinating speech articulation as well as respiration and voluntary control of breathing during speech production (Ackermann & Riecker 2004, 2010). Recently, a critical role in both motor and cognitive aspects of speech production has been attributed to the cerebellar hemispheres (Ackermann 2008).
We consistently found higher bilateral basal ganglia activation as a marker of lower ability or increased difficulty in accent imitation. This finding is not surprising, because the basal ganglia are frequently found to play an important role in speech production and articulation (Bohland & Guenther 2006), in both L1 and L2 pronunciation. The most significant group difference (see Fig. 5) concerned the left inferior parietal lobe (IPL), at the left supramarginal gyrus. The left IPL has been claimed to integrate aspects of speech perception and production, phonological representations, working memory storage, multilingual language learning and reading, amongst others. The left supramarginal gyrus/left IPL has long been associated with the phonological loop component of verbal working memory and implicated in the phonological underpinnings of reading in both the native (Graves et al. 2010) and a second language (Das et al. 2010). Interestingly, in a recent fMRI study on verbal working memory (Kirschen, Chen & Desmond 2010), a location of brain activation similar to the one found in our experiment was described as a cerebral correlate of auditory and modality-independent verbal working memory⁷. Part of the explanation for why the low ability group showed significantly more activation, particularly in the left IPL, could lie in their generally weaker auditory working memory functions. Our behavioural data strongly support this line of thinking (see 3.1). Indeed, auditory working memory correlated most significantly with the Hindi imitation ability score, i.e. the better participants’ imitation of Hindi sentences, the better their recall of digits (digit span) and of non-word syllables (nonword repetition). The role of working memory as a basis for aptitude in second language acquisition has long been established (Baddeley, Gathercole & Papagno 1998). It is well documented in the literature that (auditory) working memory plays a crucial role in all aspects of language perception and production, especially during sentence production. A recent study investigating 95 school children (Andersson 2010) reaffirmed that L2 learning success and native language processing can be predicted by working memory capacity, and that there is a strong association between L1 and L2 processing capacity, which suggests that working memory and general language aptitude are mediating mechanisms or common sources. Similarly, it has recently been shown that native monolinguals (English), who are usually thought to behave linguistically uniformly, could be differentiated into high and low L1 performers by their event-related potential (EEG/ERP) responses to simple phrase structure violations (Pakulak & Neville 2010). In our present study, we found similar results for individual differences in brain activation even in mother-tongue reading (Fig. 4), pointing to differences in general language ability.
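The correlation statistics behind this argument can be sanity-checked with a short calculation. The Python sketch below uses the Fisher z-transformation, a standard large-sample approximation that we substitute for whatever exact test the authors' statistics package applied, and it assumes that all N = 113 participants with control-variable scores entered each correlation. (A value printed as p = 0.000 by statistics software indicates p < 0.001.)

```python
import math

def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r over n pairs.

    Fisher z-transformation: atanh(r) * sqrt(n - 3) is approximately standard
    normal when the true correlation is zero.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Correlations with the Hindi imitation score reported in the behavioural results.
for label, r in [("nonword repetition", 0.37), ("digit span", 0.36),
                 ("number of foreign languages", 0.16), ("non-verbal IQ", 0.10)]:
    print(f"{label}: r = {r:.2f}, approx. p = {correlation_p_value(r, 113):.4f}")
```

Under these assumptions the approximations land close to the reported values: well below 0.001 for r = 0.37 and r = 0.36, about 0.09 for r = 0.16 and about 0.29 for r = 0.10.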
Our preliminary modulation spectrum analysis results suggest that skilled accent imitators have a larger articulation space than poor imitators. Since this is the first time this method has been applied to individual differences in speech imitation ability, we cannot directly compare these results with those of earlier studies. We were able to confirm our preliminary hypothesis that we would find differences between the high and low imitation ability groups as measured by acoustic spectrograms. Our working hypothesis is that this extension of articulation space in the high ability group might provide access to a larger repertoire of sounds, which in turn could give skilled imitators greater flexibility in pronunciation. This might confirm our hypothesis that even late but highly talented L2 speakers who are good at accent imitation in general keep their phonetic categories more flexible and open to new sounds, without confining their articulatory repertoire to the speech sound processing schemes of their mother tongue. Neurocognitive flexibility as a determinant of language talent was proposed two decades ago by researchers examining exceptionally gifted language learners (Schneidermann & Desmarais 1988). Recent evidence from studies of infants’ phonetic learning confirms these earlier postulates of neurocognitive flexibility. In a series of ERP and behavioural experiments on phonetic discrimination ability in infants, Patricia Kuhl (Kuhl et al. 2008) showed that individual differences in phonetic discrimination at an early age could predict later language capacities in L1 development. Children who at 7 months showed better discrimination of native-language speech contrasts also advanced faster in their L1 as measured at 30 months; these children were thus better tuned to their L1. In contrast, children who displayed better discrimination of non-native contrasts at 7 months were slower L1 developers. The authors concluded that the brains of the slower L1 development group had remained closer to the initial, more immature state, which makes them more open to new foreign sounds but less neurally committed to native-language speech patterns. Our behavioural and computational acoustic data point in the same direction: there is higher neurocognitive flexibility, reflected in higher articulatory flexibility, in the group of more talented speech imitators. Since the Hindi imitation task did not involve any pre-existing experience with the language, we assume that some individuals must still possess this openness to build new phonetic categories on an ad hoc basis, rather than relying on previously experienced, entrenched categories. If transfer/interference from L1 had taken place, their speech output would not have been rated as close to the native speaker range of Hindi speakers.

7. For a recent and critical review of the cerebral basis of phonological working memory (‘from loop to convolution’), see Buchsbaum & D’Esposito 2008.
Conclusion: Our data provide evidence that speaker-related individual differences in speech imitation ability/aptitude play a decisive role in speech production, are similarly expressed in L1 and L2, and can be visualized and quantified by various biometric methods of signal analysis. Regarding speech imitation ability on a neurometric scale, we confirmed the theory of cortical processing effort by showing that both the intensity and the extension of activation in cortico-subcortical speech-relevant areas vary as a function of speech imitation ability, even in the case of the mother tongue. Poorer skills are always associated with greater consumption of neural workspace. With regard to acoustic measures of speech output, we found a larger articulation space to be a possible marker of high ability in L2 speech imitation. Further research is needed to refine the acoustic parameters for discriminating differences in proficiency and ability of speech imitation skills. We further showed that faking or imitating foreign accents is a valuable task for differentiating speakers with different levels of speech sound imitation and pronunciation ability. It remains to be clarified exactly why faking accents presents such a considerable obstacle for the low ability group and exactly what role is played by auditory working memory. What we can conclude from our present experiment is that working memory is definitely amongst the stronger predictors of speech imitation capacities, but it is surely not the only one. Future research needs to clarify the role of working memory in relation to other factors more precisely. However, it now becomes clear that these individual differences, visible in mother tongue as well as foreign speech processing, do exist and are detectable, and even predictable to a certain degree of specificity, by neurometric and audiometric means of signal analysis.

Acknowledgements

This project was supported by the German Research Foundation, SFB 833 “The Construction of Meaning”, the project no. AC 55/7-1, and the Excellence Centre for Integrative Neuroscience (CIN-Tübingen).

References

Abrahamsson, Niclas & Kenneth Hyltenstam 2008 The robustness of aptitude effects in near-native second language acquisition. Studies in Second Language Acquisition 30: 481–509.
Ackermann, Hermann 2008 Cerebellar contributions to speech production and perception: psycholinguistic and neurobiological perspectives. Trends in Neurosciences 31: 265–272.
Ackermann, Hermann & Axel Riecker 2004 The contribution of the insula to motor aspects of speech production: a review and a hypothesis. Brain and Language 89: 320–328.
Ackermann, Hermann & Axel Riecker 2010 The contribution(s) of the insula to speech production: a review of the clinical and functional imaging literature. Brain Structure and Function 214: 419–433.
Andersson, Ulf 2010 The contribution of working memory capacity to foreign language comprehension in children. Memory 18: 458–472.


Arsikere, H., Y.H. Lee, S. Lulich, J. Morton, M. Sommers & A. Alwan 2010 Relations among subglottal resonances, vowel formants and speaker height, gender and native language. Journal of the Acoustical Society of America 128(4): 2288.
Baddeley, Alan, Susan Gathercole & Costanza Papagno 1998 The Phonological Loop as a Language Learning Device. Psychological Review 105: 158–173.
Benner, Uta 2005 Syllables in Speech Production: A Study of the Mental Syllabary. Master’s thesis, Department of Natural Language Processing, University of Stuttgart.
Bohland, J. & Frank Guenther 2006 An fMRI investigation of syllable sequence production. NeuroImage 32: 821–841.
Buchsbaum, B. & Marc D’Esposito 2008 The search for the phonological store: from loop to convolution. Journal of Cognitive Neuroscience 20: 762–778.
Carroll, John & Stanley Sapon 1959 Modern Language Aptitude Test (MLAT): Manual. New York: The Psychological Corporation.
Das, Tanusree, P. Padakannaya, K.R. Pugh & Nandini C. Singh 2010 Neuroimaging reveals dual routes to reading in simultaneous proficient readers of two orthographies. NeuroImage: in press.
Disner, S., S. Fulop & F. Hsieh 2010 The fine structure of phonation as a biometric. Journal of the Acoustical Society of America 128(4): 2394.
Dogil, Grzegorz & Susanne Reiterer (eds.) 2009 Language Talent and Brain Activity. Trends in Applied Linguistics 1. Berlin/New York: Mouton de Gruyter.
Engineer, C., C. Perez, YeTing Chen, R. Carraway, A. Reed, J. Shetake, V. Jakkamsetti, K. Chang & M. Kilgard 2008 Cortical activity patterns predict speech discrimination ability. Nature Neuroscience 11: 603–608.
Fitch, William Tecumseh 2010 The Evolution of Language. Cambridge: Cambridge University Press.
Flege, James Emil & Robert Hammond 1982 Mimicry of non-distinctive phonetic differences between language varieties. Studies in Second Language Acquisition 5: 1–17.
Graves, W., R. Desai, C. Humphries, M. Seidenberg & J.R. Binder 2010 Neural systems for reading aloud: a multiparametric approach. Cerebral Cortex 20: 1799–1815.


Susanne Reiterer, Nandini C. Singh & Susanne Winkler

Golestani, Narly, Cathy Price & Sophie K. Scott 2011 Born with an ear for dialects? Structural plasticity in the expert phonetician brain. Journal of Neuroscience 31: 4213–4220.
Golestani, Narly, Nicolas Molko, Stanislas Dehaene, Denis LeBihan & Christophe Pallier 2007 Brain structure predicts the learning of foreign speech sounds. Cerebral Cortex 17: 575–582.
Golestani, Narly & Christophe Pallier 2007 Anatomical correlates of foreign speech sound production. Cerebral Cortex 17: 929–934.
Golestani, Narly, Tomas Paus & Robert Zatorre 2002 Anatomical correlates of learning novel speech sounds. Neuron 35: 997–1010.
Harris, Kelly C., Judy Dubno, Noam Keren, Jayne Ahlstrom & Marc A. Eckert 2009 Speech recognition in younger and older adults: a dependency on low-level auditory cortex. Journal of Neuroscience 29: 6078–6087.
Jilka, Matthias 2009 Assessment of phonetic ability. In: Grzegorz Dogil & Susanne M. Reiterer (eds.), Language Talent and Brain Activity, 17–66. (Trends in Applied Linguistics 1) Berlin/New York: Mouton de Gruyter.
Kirschen, Matthew, Annabel Chen & John Desmond 2010 Modality specific cerebro-cerebellar activations in verbal working memory: an fMRI study. Behavioural Neurology 23: 51–63.
Kuhl, P.K., B. Conboy, S. Coffey-Corina, D. Padden, M. Rivera-Gaxiola & T. Nelson 2008 Phonetic learning as a pathway to language: new data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B 363: 979–1000.
Lehrl, S., G. Triebig & B. Fischer 1995 Multiple choice vocabulary test MWT as a valid and short test to estimate premorbid intelligence. Acta Neurologica Scandinavica 91: 335–345.
Oldfield, R. 1971 The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9: 97–113.
Pakulak, Eric & Helen Neville 2010 Proficiency differences in syntactic processing of monolingual native speakers indexed by event-related potentials. Journal of Cognitive Neuroscience 22: 2728–2744.
Prat, Chantel S., Timothy Keller & Marcel A. Just 2007 Individual differences in sentence comprehension: a functional magnetic resonance imaging investigation of syntactic and lexical processing demands. Journal of Cognitive Neuroscience 19: 1950–1963.
Raven, J., J.C. Raven & J.H. Court 1998 Manual for Raven’s Advanced Progressive Matrices. Oxford, UK: Psychologists Press.


Reiterer, S., X. Hu, M. Erb, G. Rota, D. Nardo, W. Grodd, S. Winkler & H. Ackermann subm. Individual differences in audiovocal speech imitation aptitude in late bilinguals: functional neuro-imaging and brain morphology. Manuscript, submitted.
Reiterer, S., M.L. Berger, C. Hemmelmann & P. Rappelsberger 2005a Decreased EEG coherence between prefrontal electrodes: a correlate of high language proficiency? Experimental Brain Research 163: 109–113.
Reiterer, S., C. Hemmelmann, P. Rappelsberger & M.L. Berger 2005b Characteristic functional networks in high- versus low-proficiency second language speakers detected also during native language processing: an explorative EEG coherence study in 6 frequency bands. Cognitive Brain Research 25: 566–578.
Rosen, S. 1992 Temporal information in speech: acoustic, auditory and linguistic aspects. Philosophical Transactions of the Royal Society of London B: Biological Sciences 336: 367–373.
Schneiderman, E. & C. Desmarais 1988 The talented language learner: some preliminary findings. Second Language Research 4: 91–109.
Singh, Latika & Nandini Chatterjee Singh 2008 The development of articulatory signatures in children. Developmental Science 11: 467–473.
Singh, Nandini Chatterjee & F.E. Theunissen 2003 Modulation spectra of natural sounds and ethological theories of auditory processing. Journal of the Acoustical Society of America 114: 3394–3411.
Stevens, K.N. 1980 Acoustic correlates of some phonetic categories. Journal of the Acoustical Society of America 68: 836–842.
Tewes, U. 1991 Hamburg-Wechsler-Intelligenztest für Erwachsene, HAWIE-R. Bern/Stuttgart/Toronto: Huber.
Wong, P., C. Warrier, V. Penhune, A. Roy, A. Sadehh, T. Parrish & R. Zatorre 2008 Volume of left Heschl’s gyrus and linguistic pitch learning. Cerebral Cortex 18: 828–836.

Index
acceptability judgments 3, 14, 88, 139, 140, 218–220, 221, 224, 241–261; see also Ramsey Test
activation 30, 34, 36, 187–202
adjectival declension 296–298, 301–302, 308, 310, 311
adverbial
– manner 243, 245, 261
– sentential 149, 151
– temporal 187–188
agent-first strategy 301, 304, 305, 310
annotation 33–34, 36, 37, 66–80, 226
– prosodic 225, 228, 229
antecedent
– discourse 213, 258
– of conditional 100–101, 105, 169, 171–172, 173, 182
– of reflexive 43–59
– of relative pronoun 304
at-issue content 89, 99, 101, 102–103, 105, 108
automatic classification 64, 67–69, 71, 72–73
behavioural data 268, 294, 310, 320–321, 323, 326–327, 334, 335
binding theory 43–44, 49, 153; see also ziji
British National Corpus 33, 260
Bulgarian 124, 139
c-command 9, 43, 47, 50–51, 53
challengeability test 99, 102–104, 105–107

Cheyenne 89–90, 94, 105, 106, 107, 109
CHILDES corpus 115, 116, 126, 128, 129–130, 137
Combinatory Categorial Grammar 33–36
compositional interpretation 115, 123, 143, 269
comprehension task 148, 154–156, 157, 158–159, 163, 164, 165; see also eyetracking, picture verification, reading time study, visual world paradigm
context-free grammar (CFG) 33–35
context-retrievability experiment 216, 221
coordination 188–189
– clausal 295–296, 299, 300, 304, 310, 311
corpus study 30, 32, 34, 35–37, 116, 117, 127, 129–134, 144, 208, 215, 224–232, 233, 241, 243, 260
cross-linguistic variation 68, 86, 87, 115–144, 148
cue-based retrieval 45–48, 52–53, 56, 58–59
deaccenting 212–213, 214, 232, 234, 256
decay 34, 35, 37, 53, 58, 194, 200
decision tree classifier 67, 69, 70, 71, 75, 76
dependency resolution 43, 46, 47, 49, 51–52
Diachronic Corpus of Present-Day Spoken English 33


digit span 294, 306, 311, 321, 327, 334
discourse particle 175, 177, 178–179
Dutch 77, 147–166, 174, 175, 176, 179
elicitation task 233, 295, 308–309, 318, 320
English 9, 33, 43, 44, 47, 59, 65, 66, 68, 77, 86, 90, 91, 94, 96, 98, 106, 108, 109, 115–144, 147, 148, 152, 159, 187, 188, 208, 209, 215, 220, 222, 241–261, 318–330
event-related potentials 44–45, 58, 267, 269, 291, 321, 334, 335
experiential trace 267, 269, 270, 275, 280, 286, 287–288
extraposition 244–251, 253, 255, 261
eyetracking 7, 44, 46, 58, 59
fMRI 291, 292, 293, 294, 309, 317, 318, 319–320, 321–322, 327, 329–330, 333
focus 210–211, 253, 255
– and downstepping 223, 230–231
– contrastive 223, 241
– narrow 212, 216, 217–218, 220–221, 221–222, 223
– pragmatics-only accounts 208–209, 215, 224, 225, 233–234
– presentational 241, 256, 258, 259
see also pitch accent
foreign accent 317, 319, 329, 330, 332, 335–336

French 68, 77
functional neuroimaging 267, 292, 309, 317, 318, 327, 332–333
German 4, 8–13, 20, 63–80, 98, 115–116, 120, 124, 126, 127, 129–135, 137, 138, 139, 143, 188, 207, 208, 209, 215, 221, 223, 229–230, 284, 295, 296, 299, 309, 311, 320, 321, 322, 323, 327, 329
GermaNet 66, 68
Gitksan 88, 89, 107
grammaticality judgment 298, 302–303, 308, 310, 311
Grice 95–96, 182
GToBI 225–226, 229–230
Guarani 120, 124, 139
heaviness 242–246, 248–250, 251, 253–254, 255, 256, 257, 261
Hindi 124, 139, 320–321, 323, 326–327, 334, 335
Hungarian 124, 139
IMS Radionewscorpus 209, 225, 229, 230, 232
information structure 207–234, 242, 251, 256–259, 261
interference effect 44–48, 53–58, 59
Japanese 101, 103, 120, 122–123, 128, 137, 139, 140
language acquisition 7, 22, 116, 117, 125–138, 143–144, 147–166, 292, 293, 295, 297, 311, 317


– second language 318, 319, 334
language impairment 291, 297, 310–311
lateralization 292, 293–294, 309
linear mixed effects model 16, 17, 18, 19, 24, 36, 55, 56
make-sense judgment 218, 219, 220
Mandarin Chinese 48–59, 120, 124, 140, 141, 143; see also ziji
MapTask corpus 32, 34, 35, 36
measure phrase 71, 75
– pronominal 127, 128, 131–134, 135, 136, 137–143
methodology 5, 7, 11, 16, 22, 47–48, 58, 59, 87–88, 110, 116
modality 63, 69, 70, 77, 122
– desiderative 192, 193, 195, 199, 202
– epistemic 83–110
modulation spectrum analysis 318, 319, 323–326, 331–332, 334
Mooré 124, 139
Motu 115, 120, 121–122, 136, 139
naive Bayes classifier 67
non-constructive evidence 173, 174–178, 179–180, 182
Optimality Theory 150, 152–154, 164–165, 166, 255
Or-to-If inference 181–183
passive 13, 160, 162, 294, 296, 298, 299, 300, 304
– adjectival 187–202
– pseudo-passive 244, 245–246, 253
– verbal 187, 188, 189


picture verification 7, 154–156, 193–198, 199–200, 268, 275, 278–280, 282, 283, 284–285; see also truth value judgment task
pitch accent 207, 208, 209, 210, 211–212, 214, 220–221, 223, 225–228, 229–232
post-verbal subject 241, 242, 259, 260, 261
pragmatics 3, 5, 37, 48, 49, 88, 147, 148, 149–150, 151, 154, 163, 166, 179, 183, 188, 190, 191, 201, 242; see also focus
PrepNet 68
preposition-noun combination 65–66, 67, 75, 77, 80
production task 5, 154, 156, 157, 160–162, 166, 222–224, 294, 295, 296, 300, 306–308, 310
quantifier scope 4, 8–13, 14–22, 25, 147–166
Quechua 86, 89–90, 94, 98, 102, 105–106, 108–109, 110
Question Under Discussion 210, 213
Ramsey Test 171–173, 178, 179, 180, 182
reading time study 3, 52, 58, 189, 196; see also eyetracking
relative clause 245, 255, 272, 294, 296, 298, 299, 300, 304–306
Romanian 139, 141–142
Russian 120, 124, 139, 141–142
Samoan 124, 139, 140
semantic embedding 99–101, 105
sentence imitation task 292, 295, 298, 300, 306, 308, 310, 311


sentence-sensibility judgment task 268
simulation, mental 190, 201, 267–288
Spanish 139, 141–142, 318
speaker’s perspective 49, 151
spoken language 208, 215, 233, 325
St’át’imcets 83–110
stops-making-sense judgment task 4, 6
subcomparative 116, 117, 127
superlative 119, 121, 131, 132, 134, 136, 188
Thai 124, 139, 142–143
Thompson Salish 107
ToBI 220, 225
Topicalization 8, 10, 13, 21, 115, 294, 295–296, 298, 299, 300–301, 304–305, 311
TROG-D 299, 304–306, 308, 309, 311
truth value judgment task 4–6, 7, 10, 13, 14, 20, 21, 22, 87–88, 159
– incremental 6–26
Turkish 124, 139, 142
typicality 4, 21, 22, 24

verb type
– change-of-state 192, 194, 200, 201, 202, 261fn
– transitive 151, 152, 155, 160, 162, 247, 256, 257
– unaccusative vs. unergative 241–261
Verbmobil corpus 209, 229–232
visual world paradigm 5, 233
vowel identification task 293
WebExp2 14
word order 300, 307
– canonical 149, 150, 151, 156, 166, 296
– non-canonical 295, 310
– scrambling 148–149, 151
see also extraposition, post-verbal subject, Topicalization
WordNet 68
working memory 7, 22, 164, 311, 317, 321, 327, 333–334, 336
Yorùbá 124, 139, 140

ziji 48–57