Diachronic and Comparative Syntax 1138233048, 9781138233041


English Pages xiv+558 [573] Year 2019



Diachronic and Comparative Syntax

This book brings together for the first time a series of previously published papers featuring Ian Roberts’ pioneering work on diachronic and comparative syntax over the last thirty years in one comprehensive volume. Divided into two parts, the volume engages in recent key topics in empirical studies of syntactic theory, with the eight papers on diachronic syntax addressing major changes in the history of English as well as broader aspects of syntactic change, including the introduction to the formal approach to grammaticalisation, and the eight papers on comparative syntax exploring head-movement, the nature and distribution of clitics, and the nature of parametric variation and change. This comprehensive collection of the author’s body of research on diachronic and comparative syntax is an essential resource for scholars and researchers in theoretical, comparative, and historical linguistics. Ian Roberts is Professor of Linguistics in the Department of Linguistics at the University of Cambridge. His most recent publications include The Final-Over-Final Condition, with Theresa Biberauer, Anders Holmberg and Michelle Sheehan (2017) and The Wonders of Language (2017).

Routledge Leading Linguists Edited by Carlos P. Otero

University of California, Los Angeles, USA

On Shell Structure
Richard K. Larson

Primitive Elements of Grammatical Theory: Papers by Jean-Roger Vergnaud and his Collaborators
Edited by Katherine McKinney-Bock and Maria Luisa Zubizarreta

Pronouns, Presuppositions, and Hierarchies: The Work of Eloise Jelinek in Context
Edited by Andrew Carnie and Heidi Harley

Explorations in Maximizing Syntactic Minimization
Samuel D. Epstein, Hisatsugu Kitahara, and T. Daniel Seely

Merge in the Mind-Brain: Essays on Theoretical Linguistics and the Neuroscience of Language
Naoki Fukui

Formal Grammar: Theory and Variation
Terje Lohndal

Aspects of Grammatical Architecture
Alain Rouveret

Biolinguistic Investigations and the Formal Language Hierarchy
Juan Uriagereka

Diachronic and Comparative Syntax
Ian Roberts

For more information about this series, please visit: www.routledge.com

Diachronic and Comparative Syntax

Ian Roberts

First published 2019 by Routledge 711 Third Avenue, New York, NY 10017 and by Routledge 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN Routledge is an imprint of the Taylor & Francis Group, an informa business © 2019 Taylor & Francis The right of Ian Roberts to be identified as author of this work has been asserted by him/her in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data A catalog record for this book has been requested ISBN: 978-1-138-23304-1 (hbk) ISBN: 978-1-315-31057-2 (ebk) Typeset in Sabon by Apex CoVantage, LLC

Contents

Preface
Acknowledgments
List of Contributors

PART I
Diachronic Syntax

1 Agreement Parameters and the Development of English Modal Auxiliaries
IAN ROBERTS

2 A Computational Model of Language Learnability and Language Change
ROBIN CLARK AND IAN ROBERTS

3 Object Movement and Verb Movement in Early Modern English
IAN ROBERTS

4 Directionality and Word Order Change in the History of English
IAN ROBERTS

5 Verb Movement and Markedness
IAN ROBERTS

6 Theoretical Consequences
ANNA ROUSSOU AND IAN ROBERTS

7 Cascading Parameter Changes: Internally-Driven Change in Middle and Early Modern English
THERESA BIBERAUER AND IAN ROBERTS

PART II
Comparative Syntax

8 Passive Arguments Raised
MARK BAKER, KYLE JOHNSON AND IAN ROBERTS

9 Complex Inversion in French
LUIGI RIZZI AND IAN ROBERTS

10 Excorporation and Minimality
IAN ROBERTS

11 Two Types of Head Movement in Romance
IAN ROBERTS

12 Clause Structure and X-Second
ANNA CARDINALETTI AND IAN ROBERTS

13 The Analysis of VSO Clauses
IAN ROBERTS

14 Introduction: Parameters in Minimalist Theory
ANDERS HOLMBERG AND IAN ROBERTS

15 Macroparameters and Minimalism: A Programme for Comparative Research
IAN ROBERTS

Index

Preface

This book brings together in one place, for the first time, a series of papers on diachronic and comparative syntax that I have published over a period of more than three decades. The papers deal with central empirical topics in recent syntactic theory (verb-movement, word order, null subjects, the nature and distribution of clitics) and also several major theoretical topics (head-movement, argument structure, the nature of parametric change and variation). Together, they form a coherent body of work reflecting how syntactic theory, as it has developed since the 1980s, can shed new light on questions of variation and change.

Part I includes papers on diachronic syntax. Several of these papers deal with well-known changes in the history of English. Chapter 1 was a pioneering paper in that it attempted to recast Lightfoot’s classic (1979) analysis of the development of English modal auxiliaries in terms of principles and parameters governing verb-movement. It was the first paper to observe that English lost V-to-I movement of main verbs in the 16th century and to connect this to the breakdown of verbal inflection. In important respects, it anticipated Pollock’s (1989) seminal work on V-to-I movement in English and French, as well as the large literature on verb-movement that grew out of Pollock’s work in the 1990s (see, for example, the papers in Hornstein and Lightfoot 1994, Vikner 1995, 1997, Rohrbacher 1999, Bobaljik 2002, Bobaljik and Thráinsson 1998, Bentzen 2007, Wiklund et al. 2007, Koeneman and Zeijlstra 2014 and Tvica 2017).

Chapter 3 shows how Holmberg’s (1986) generalization (object shift can only apply when the verb moves) holds throughout the history of English (albeit vacuously in Modern English); a corollary of this is that there is no need to posit any change in the nature of English pronouns in order to account for their changed distribution since the Early Modern period, as this follows directly from the change in verb syntax discussed in Chapter 1 combined with Holmberg’s generalisation.

Chapter 4 was the first attempt to account for the change from OV to VO word order in Middle English from the point of view of the antisymmetric theory of syntax of Kayne (1994), closely basing the analysis of Old English word order on the proposals for Dutch in Zwart (1993). This approach was largely superseded by the one developed in Biberauer & Roberts (2005, 2008), the latter of which is republished here as Chapter 7. This latter paper presents an overview of a series of changes in English, from the Old English period through to the 17th century, showing how these changes form a “cascade”, with each change creating the conditions for the next.

The other chapters of Part I deal with more general aspects of syntactic change. Chapter 6 is excerpted from Roberts & Roussou (2003) and summarises the overall approach to grammaticalisation adopted there, considering its implications both for the theory of parameters and for the theory of functional categories. Chapter 2 is an ambitious attempt to develop an account of parameter setting using genetic algorithms, and applying the idea to an account of how syntactic change originates in language acquisition. An important aspect of this approach is a “least-effort” principle in acquisition, which forms the basis for the theory of markedness developed and applied to a range of data in Chapter 5.

The chapters making up Part II, dealing with comparative syntax, treat head-movement, the nature of clitics and/or the nature of parametric variation. Chapter 8 proposes an influential analysis of passive constructions, whose central idea is that the passive morpheme is an argumental clitic. Chapter 10 points out that there is nothing in the systems of head-movement put forward in Chomsky (1986) and Baker (1988) that, without stipulation, prevents “excorporation”, i.e. successive-cyclic head-movement without pied-piping. It is suggested that this may be an empirical advantage. Chapter 11 proposes, on the basis of Romance data, that there are two distinct kinds of head-movement, “A-head-movement” and “A’-head-movement,” subject, in terms of Relativised Minimality (as formulated in Rizzi 1990), to different locality constraints. It is argued that this can account for certain apparent violations of the Head Movement Constraint. Both Chapter 9 and Chapter 12 deal with clitic-placement and its interactions with verb-movement, the former in relation to a particular construction in French, the latter in relation to a range of “second-position” effects in a range of languages. In some respects, the latter paper anticipates Rizzi’s (1997) proposals regarding the expanded left periphery. Chapter 13 analyses verb-movement in VSO clauses in Welsh, relating the situation in this language to verb-movement and clause structure in Germanic and Romance. Finally, Chapters 14 and 15 develop a new approach to parametric variation, the former on the basis of a thorough overview of work on null subjects, the latter in more general terms.

Taken together, these papers form a coherent body of work applying the theory of principles and parameters, at different stages of its development, to a range of diachronic and comparative phenomena.

Ian Roberts
Los Angeles, October 2017

References

Baker, M. 1988. Incorporation: A Theory of Grammatical Function Changing. Chicago.

Bentzen, K. 2007. Order and Structure in Embedded Clauses in Northern Norwegian. PhD dissertation, CASTL, University of Tromsø, Norway.

Biberauer, T. & I. Roberts. 2005. Changing EPP-parameters in the history of English: accounting for variation and change. English Language and Linguistics 9, 1: 5–46.

Biberauer, T. & I. Roberts. 2008. Cascading Parameter Changes: Internally-driven Change in Middle and Early Modern English. In T. Eythórsson (ed.) Grammatical Change and Linguistic Theory: The Rosendal Papers. Amsterdam: Benjamins, pp. 79–114 [this volume, Chapter 7].

Bobaljik, J. 2002. Realizing Germanic Inflection: Why Morphology Does Not Drive Syntax. Journal of Comparative Germanic Linguistics 6: 129–167.

Bobaljik, J. & H. Thráinsson. 1998. Two heads aren’t always better than one. Syntax 1: 37–71.

Chomsky, N. 1986. Barriers. Cambridge, MA: MIT Press.

Holmberg, A. 1986. Word Order and Syntactic Features in Scandinavian Languages and English. PhD dissertation, University of Stockholm.

Hornstein, N. & D. Lightfoot (eds). 1994. Verb Movement. Cambridge: Cambridge University Press.

Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge, MA: MIT Press.

Koeneman, O. & H. Zeijlstra. 2014. The Rich Agreement Hypothesis Rehabilitated. Linguistic Inquiry 45: 571–615.

Lightfoot, D. 1979. Principles of Diachronic Syntax. Cambridge: Cambridge University Press.

Pollock, J-Y. 1989. Verb movement, Universal Grammar and the structure of IP. Linguistic Inquiry 20: 365–424.

Rizzi, L. 1990. Relativized Minimality. Cambridge, MA: MIT Press.

Rizzi, L. 1997. On the fine structure of the left periphery. In L. Haegeman (ed.) Elements of Grammar. Dordrecht: Kluwer, pp. 281–337.

Roberts, I. & A. Roussou. 2003. Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press.

Rohrbacher, B. 1999. Morphology-driven Syntax. Amsterdam: Benjamins.

Tvica, S. 2017. Agreement and verb movement. PhD dissertation, University of Amsterdam, LOT.

Vikner, S. 1995. Verb Movement and Expletive Subjects in the Germanic Languages. Oxford: Oxford University Press.

Vikner, S. 1997. V-to-I movement and inflection for person in all tenses. In L. Haegeman (ed.) The New Comparative Syntax. London: Longman, pp. 187–213.

Wiklund, A.L., G. Hrafnbjargarson, K. Bentzen & T. Hróarsdóttir. 2007. Rethinking Scandinavian verb movement. Journal of Comparative Germanic Linguistics 10: 203–233.

Zwart, J.-W. 1993. Dutch Syntax. PhD dissertation, University of Groningen.

Acknowledgments

Chapter 1 was first published as Roberts, I. 1985, “Agreement Parameters and the Development of English Modal Auxiliaries,” Natural Language and Linguistic Theory 3: 21–58.

Chapter 2 was first published as Clark, R. & I. Roberts 1993, “A Computational Model of Language Learnability and Language Change,” Linguistic Inquiry 24: 299–345. Reprinted by kind permission of MIT Press.

Chapter 3 was first published as Roberts, I. 1995, “Object Movement and Verb Movement in Early Modern English,” in H. Haider, S. Olsen & S. Vikner (eds) Studies in Comparative Germanic Syntax. Dordrecht: Kluwer, pp. 269–284.

Chapter 4 was first published as Roberts, I. 1997, “Directionality and Word Order Change in the History of English,” in A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 397–426. Reprinted with permission.

Chapter 5 was first published as Roberts, I. 1999, “Verb Movement and Markedness,” in Michel deGraff (ed) Language Creation and Language Change. Cambridge, Mass.: MIT Press, pp. 287–328. Reprinted by kind permission of MIT Press.

Chapter 6 was first published as Chapter 5 of Roberts, I. & A. Roussou 2003, Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press. Reprinted with permission.

Chapter 7 was first published as Biberauer, T. & I. Roberts 2008, “Cascading Parameter Changes: Internally-driven Change in Middle and Early Modern English,” in T. Eythórsson (ed) Grammatical Change and Linguistic Theory: The Rosendal Papers. Amsterdam: Benjamins, pp. 79–114. Reprinted with kind permission of John Benjamins.

Chapter 8 was first published as Baker, M., K. Johnson & I. Roberts 1989, “Passive Arguments Raised,” Linguistic Inquiry 20: 219–251. Reprinted with kind permission of MIT Press.

Chapter 9 was first published as Rizzi, L. & I. Roberts 1989, “Complex Inversion in French,” Probus 1: 1–30, and reprinted in A. Belletti & L. Rizzi (eds) Parameters and Functional Heads. Oxford/New York: Oxford University Press, 1996. Reprinted by permission of Oxford University Press.

Chapter 10 was first published as Roberts, I. 1991, “Excorporation and Minimality,” Linguistic Inquiry 22: 209–218. Reprinted by kind permission of MIT Press.

Chapter 11 was first published as Roberts, I. 1994, “Two Types of Head Movement in Romance,” in N. Hornstein & D. Lightfoot (eds) Verb Movement. Cambridge: Cambridge University Press, pp. 207–242. Reprinted by permission of Cambridge University Press.

Chapter 12 was first published as Cardinaletti, A. & I. Roberts 2002, “Clause Structure and X-Second,” in Guglielmo Cinque (ed) The Functional Structure of DP and IP. Cambridge: Cambridge University Press, pp. 123–167. Reprinted by permission of Oxford University Press.

Chapter 13 was first published as Chapter 1 of Roberts, I. 2005, Principles and Parameters in a VSO Language: A Case Study in Welsh. Oxford/New York: Oxford University Press. Reprinted by permission of Oxford University Press.

Chapter 14 was first published as the Introduction to Biberauer, T., A. Holmberg, I. Roberts & M. Sheehan 2010, Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press. Reprinted by permission of Cambridge University Press.

Chapter 15 was first published as Roberts, I. 2012, “Macroparameters and Minimalism: A Programme for Comparative Research,” in C. Galves, S. Cyrino, R. Lopes, F. Sândalo and J. Avelar (eds) Parameter Theory and Linguistic Change. Oxford: Oxford University Press, pp. 320–335. Reprinted by permission of Oxford University Press.

Contributors

I am grateful to the co-authors of several of the chapters herein for our collaboration, and for their agreement in republishing the material. Their contact details are as follows:

Professor Mark Baker
Department of Linguistics
Rutgers University
18 Seminary Place
New Brunswick, NJ 08901-1184
USA
[email protected]

Dr Theresa Biberauer
St John’s College
Cambridge CB2 1CP
UK
[email protected]

Professor Anna Cardinaletti
Università Ca’ Foscari Venezia
Dipartimento di Studi Linguistici e Culturali Comparati
30123 Venice
Italy
[email protected]

Professor Robin Clark
Department of Linguistics
University of Pennsylvania
Philadelphia, PA 19104
USA
[email protected]

Professor Anders Holmberg
Department of Linguistics
Newcastle University
Newcastle NE1 7RU
UK
[email protected]

Professor Kyle Johnson
Department of Linguistics
University of Massachusetts
N408 Integrative Learning Center
650 North Pleasant Street, Amherst, MA 01002
USA
[email protected]

Professor Luigi Rizzi
Centro interdipartimentale di studi cognitivi sul linguaggio
University of Siena
Complesso S. Niccolò
Via Roma, 56
Siena
Italy
[email protected]

Professor Anna Roussou
Division of Linguistics
University of Patras
Patra
Greece
[email protected]

Part I

Diachronic Syntax

1

Agreement Parameters and the Development of English Modal Auxiliaries*

Ian Roberts

1. Introduction

1.1 General Background

In Modern English there is a syntactically and morphologically definable subclass of verbs, the modal auxiliaries. These verbs¹ differ from other verbs (main verbs) with respect to the following criteria among others (cf. Jespersen, 1909–49; Palmer, 1974; Pullum and Wilson, 1977):

(1) a. Inversion:
       Must they leave?
       *Leave they?
    b. Negation:
       They cannot walk.
       *They walk not.
    c. Agreement:
       *He mays, musts, wills, cans, etc.
    d. Non-finite forms:
       *He has (?)might (etc.) to do it.
       *They are canning to do it.
       *They might could do it.²

This list of properties, while not exhaustive, suffices to establish the distinction between the two classes of verbs. This distinction did not exist at an earlier stage of the language. On this matter we quote Visser (1963–73):

Originally, . . . they [the modals – IGR] were not function-words, but full or independent notional verbs that did not differ syntactically in any way from other full verbs. Thus they could regularly be construed with direct objects: “ic can eow” (= “I know you”), “ic sculde tyn þusend punda” (= “I had to pay ten thousand pounds”), “eall þæt he ahte” (= “all that he possessed”). Since infinitives were nouns, the relation between them and the verbs shall, can, etc., to which they were joined must originally have been the same as that between a direct object and a full verb, so that there was structurally no difference in this respect between ‘he can manigfealdan spræce’ [= “he knows many languages” – IGR] and ‘he can sprecan’ [= “he can speak” – IGR]. (Visser, 1963–73, p. 548)

The sameness in syntactic distribution of these two classes in Middle English is shown in the examples in (2), where we can see that negation and various processes of V-fronting³ affected both modals and main verbs in the same fashion:⁴

(2) a. Inversion:

(i) Al her cariage was stole be the Frenshmen, so mote they nedes go home on fote
All their conveyance was stolen by the Frenchmen; so they had to go home on foot.
(c. 1464 Capgrave Chronicle of England: V 1694)

(ii) Wilt thow ony thinge with hym?
Do you want him for anything?
(1470–85 Malory Morte d’Arthure III, iii, 102, V 559)

(iii) Than longen folk to goon on pilgrimages
Then people want to go on pilgrimages.
(c. 1386 Chaucer General Prologue Canterbury Tales, 12)

b. Negation:

(i) ʒif ʒe wollnot to haue mercy of God
If you don’t want God’s mercy.
(c. 1450 Mirk’s Festial 285: V 1177)

(ii) Thy godfadirs wyff thow shalt not take
You shall not take your godfather’s wife.
(c. 1450 Idley Instructions 2 a. 1757: V 1489)

(iii) A blynde man kan nat juggen wel in hewis
A blind man cannot judge colours well.
(c. 1387 Chaucer Troilus 2, 21: V 1624)

(iv) He ne held it noght
He did not hold it.
(Mossé, 1952, p. 112)

(v) My wyfe rose nott
My wife did not get up.
(Mossé, 1952, p. 112)

Example (3) shows that Middle English (ME) (and some Early Modern English (ENE)) modals had non-finite forms:

(3) (i) I shall not konne answere
I will not be able to answer.
(c. 1386 Chaucer Canterbury Tales B 2902: V 1649)

(ii) Cunnyng no recour in so streit a neede
Knowing no recourse in so desperate a need.
(c. 1439 Lydgate Fall of Princes 7, 1346: V 1650)

(iii) if we had mought conuenient come together
If we had been able to meet conveniently.
(c. 1528 St. Thomas More Works 107, 86: V 1687)

(iv) if he had wolde
If he had wanted to.
(1525 Ld. Berners, Froiss. II, 402: V 1687)

We will see later that the ME modals were in fact distinct with respect to their agreement properties. This distinction will be crucial in what follows. The change that involved the development of the subclass of modals and its consequences will be the object of study in this article.

The paper is organized as follows: the remainder of this introduction will be devoted to giving theoretical background to our account. In particular, we will present those aspects of Government Binding theory that are important here: government and the theory of thematic roles. We also propose a condition on the distribution of verbs. This condition leads to the postulation of two kinds of agreement system: one syntactic and one morphological. We take these two agreement systems to represent parameters of Universal Grammar (UG), and consider the development of English auxiliaries to be an instance of a shift in the value of this parameter from a morphological system to a syntactic system. In section 2, after discussing the syntax and morphology of ME modals (section 2.2), we go on to consider the causes of the parametric shift: the loss of subjunctive inflection (section 2.3) and the general loss of inflections marking verb agreement (section 2.4). In section 2.5, we describe the parametric shift in more detail and point out some of its effects. Finally, in section 2.6 we briefly consider how root modals in ENE and present-day English fit into our account. In conclusion, we compare our account with those of Lightfoot (1974, 1979) and Steele et al. (1981), criticizing Lightfoot’s Transparency Principle and suggesting a way of viewing syntactic change in terms of a parameter-setting model of acquisition.

Thus the paper has three distinct but related goals. First, as a contribution to Government Binding theory, we propose and motivate the condition on verbs. Second, the paper is meant to contribute to our knowledge of the history of English by bringing together well-known facts in a novel way. Third, the paper exemplifies an approach to diachronic syntax adumbrated in Lightfoot (1979), where syntactic change is explicitly related to aspects of acquisition.

1.2 Theoretical Background

We assume the framework of Government Binding theory (henceforth GB theory), as in Chomsky (1981, 1982). The expansion of S is as in (4):

(4) S → NP INFL VP

GB theory consists of a small number of autonomous subsystems of principles. Of these subsystems, the most important in this paper is the theory of thematic relations (θ-theory). We briefly sketch the main points of θ-theory below. Before doing so, however, we introduce and define the notion of government, as this will be central in what follows.

1.2.1 Government

The definition of Government is as in (5):

(5) α governs γ in a configuration like [β . . . γ . . . α . . . γ . . .] where:
    (i) α = X⁰ (a lexical element)
    (ii) where φ is a maximal projection, if φ dominates γ, then either φ dominates α, or φ is the maximal projection of γ
    (iii) α c-commands γ.
    (from Belletti and Rizzi, 1981, p. 12)

Definition (5) means that a head α governs a node γ if and only if α c-commands γ and α c-commands no maximal projection which dominates γ except possibly the maximal projection of γ.⁵ C-command is defined in (6):

(6) α c-commands β iff the minimal maximal projection dominating α also dominates β.

Definitions (5) and (6) together define an upper limit to government. It is impossible for a head (α) to govern a node (γ) which is outside the minimal maximal projection dominating α. So α cannot govern γ in (7):

(7) [ γ [φ α ] ], where φ = α^max

On the other hand, if γ is located within the minimal maximal projection (φ) dominating α, three situations are possible:

(i) φ is the minimal maximal projection dominating γ, as in (8):

(8) [φ α γ ]

(ii) φ is the minimal maximal projection dominating the maximal projection of γ:

(9) [φ α [γ^max γ ] ]

(iii) γ is more deeply embedded inside φ than in either (8) or (9):

(10) [φ α [δ^max γ ] ], where δ^max ≠ γ^max

In all of (8), (9) and (10), α c-commands γ. In (8) and in (9) α also governs γ. In (10), however, γ is too deeply embedded within φ to be governed by α. We can see from this discussion that government is a more restrictive version of c-command. More concretely, if in (7–10) α = V, φ = VP and γ = NP or N⁰, we have the following configurations:

(7ʹ) [ NP [VP V ] ]

(8ʹ) [VP V NP ]

(9ʹ) [VP V [NP N⁰ ] ]

(10ʹ) [VP V [X NP ] ]
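The way definitions (5) and (6) sort configurations (7) to (10) can be checked mechanically. The following Python sketch is an illustration only and is not part of the original paper: the `Node` class, its attribute names, and the `config` helper are assumptions introduced here, and domination is taken to be proper domination.

```python
class Node:
    """A tree node; all attribute names here are illustrative assumptions."""

    def __init__(self, label, *children, maximal=False, head=None, lexical=False):
        self.label = label
        self.maximal = maximal    # True if the node is a maximal projection
        self.head = head          # for a maximal projection: the node it projects from
        self.lexical = lexical    # True if the node is an X0 element
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Yield the nodes properly dominating `node`, closest first."""
    node = node.parent
    while node is not None:
        yield node
        node = node.parent


def dominates(a, b):
    return any(x is a for x in ancestors(b))


def c_commands(a, b):
    """(6): the minimal maximal projection dominating a also dominates b."""
    for x in ancestors(a):
        if x.maximal:
            return dominates(x, b)
    return False


def governs(a, g):
    """(5): a is a head, a c-commands g, and every maximal projection
    dominating g either dominates a or is the maximal projection of g."""
    if not a.lexical or a is g or not c_commands(a, g):
        return False
    return all(dominates(phi, a) or phi.head is g
               for phi in ancestors(g) if phi.maximal)


def config(number):
    """Build configuration (7), (8), (9) or (10) and return (alpha, gamma)."""
    alpha = Node("alpha", lexical=True)
    gamma = Node("gamma")
    if number == 7:      # gamma lies outside phi, the maximal projection of alpha
        phi = Node("phi", alpha, maximal=True, head=alpha)
        Node("root", gamma, phi)
    elif number == 8:    # alpha and gamma are sisters inside phi
        Node("phi", alpha, gamma, maximal=True, head=alpha)
    elif number == 9:    # gamma heads gamma_max, the sister of alpha
        gamma_max = Node("gamma_max", gamma, maximal=True, head=gamma)
        Node("phi", alpha, gamma_max, maximal=True, head=alpha)
    elif number == 10:   # a distinct maximal projection delta_max intervenes
        delta_max = Node("delta_max", gamma, maximal=True)
        Node("phi", alpha, delta_max, maximal=True, head=alpha)
    else:
        raise ValueError(number)
    return alpha, gamma


if __name__ == "__main__":
    for n in (7, 8, 9, 10):
        alpha, gamma = config(n)
        print(n, "c-command:", c_commands(alpha, gamma),
              "government:", governs(alpha, gamma))
```

Under these assumptions the sketch reproduces the classification given in the text: no c-command and no government in (7), both relations in (8) and (9), and c-command without government in (10).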

In (7ʹ), NP is the subject of V. In (8ʹ, 9ʹ), NP is the object of V. In (10ʹ), NP is not a complement of V. So we see that government is intimately bound up with complementation. In fact, Chomsky (1981, p. 51) proposes that only positions governed by α can be subcategorized by α.

Now consider certain proposals in the recent literature on morphology (cf. Lieber, 1980; Selkirk, 1982; Williams, 1981):

(11) In a word of the form [W Stem + Af]:
     a. Af subcategorizes for the stem;
     b. Af heads W.
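The two clauses of (11) can be read as a small procedure: check the category the affix selects, then let the affix determine the category of the whole word. The Python sketch below is a hedged illustration using the affix -al, which the text discusses next; the class names `Stem` and `Affix` and the function `attach` are assumptions made for this example, and the output form simply joins stem and affix with "+" without modelling spelling changes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stem:
    form: str
    category: str   # "N", "V", "A" or "P"


@dataclass(frozen=True)
class Affix:
    form: str
    selects: str    # category the affix subcategorizes for, clause (11a)
    category: str   # category of the affix, which heads the word, clause (11b)


def attach(stem: Stem, affix: Affix) -> Optional[Tuple[str, str]]:
    """Return (form, category) of [Stem + Af], or None if (11a) is not met."""
    if stem.category != affix.selects:
        return None                                   # subcategorization fails
    return (f"{stem.form}+{affix.form}", affix.category)  # the affix heads the word


# -al subcategorizes for a Noun and yields an Adjective.
AL = Affix("al", selects="N", category="A")

if __name__ == "__main__":
    for stem in (Stem("nation", "N"), Stem("carry", "V"), Stem("out", "P")):
        print(stem.form, "->", attach(stem, AL))
```

Representing the affix's own category separately from the category it selects keeps (11a) and (11b) as two independent checks, which is the point of the head-complement analogy drawn in the text.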

The Affix -al, for example, makes Nouns into Adjectives. This is captured by saying that -al subcategorizes for a Noun. Thus -al cannot attach to Stems which are not Ns:

(12) *out_P + al_A
     *carry_V + al_A
     *fortunate_A + al_A

In addition, (11b) says that when -al attaches to a stem, the resulting word will be of category A. This is illustrated in (13):

(13) [transformation + al]_A
     [industry + al]_A
     [nation + al]_A

So the Affix-Stem relation is a kind of head-complement relation. Then we might suggest that in (11), Af governs Stem; (11) would then be a case of (8), where α = Af, and γ = Stem. The proposal that Affixes govern Stems is immediately supported by (i) the fact that Affixes do not govern outside the Word (this is the analog of (7), where φ = Word); and (ii) the fact that Affixes cannot ‘see into’ the structure of Stems they attach to (this is the analog of (10)). (The proposal also predicts that Affixes can govern the head of the stem they attach to (cf. (9)). It is not clear that this prediction has any consequences.) We conclude that there exists both syntactic government and morphological government. These are instances of the same relation.

1.2.2 θ-theory

An important aspect of complementation is that complement NPs are semantic arguments of the verb. Complement NPs are in a thematic relation with the verb, e.g., “source of action”, “goal of action”, etc. (cf. Gruber, 1965; Jackendoff, 1972). In other words, complement NPs bear thematic roles (θ-roles), which are assigned to them by the verb. It is clear that θ-role assignment is closely linked to subcategorization. However, it is also clear that θ-roles can be assigned to non-subcategorized material, in particular subjects. Associated with each lexical entry for a verb is a specification of the lexical item’s argument structure in the form of a thematic grid. The thematic grid is an unordered listing of θ-roles. For illustration the θ-grids of some verbs are given in (14):

(14) a. give [θ₁, θ₂, θ₃]
     b. hit [θ₁, θ₂]
     c. smile [θ₁]

The θ-roles are assigned to the verb’s arguments, in accordance with the θ-criterion:

(15) Each argument bears one and only one θ-role, and each θ-role is assigned to one and only one argument. (Chomsky, 1981, p. 36)

Williams (1980) proposed that the arguments in a verb’s θ-grid be divided into two types: external arguments and internal arguments. Internal arguments are assigned their θ-roles from V by government, while the external argument is assigned its θ-role in subject position, i.e., externally to VP and so in a position that is not governed by V. (Chomsky (1981) proposes that VP compositionally assigns the external argument to the subject; cf. Marantz (1981) for evidence in favor of this idea.)

θ-role assignment is in principle optional. However, θ-roles must in fact always be assigned because of the Projection Principle, which we state as follows:

(16) The thematic properties of lexical items must be preserved at all syntactic levels.
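The θ-criterion in (15) amounts to requiring a one-to-one pairing between the arguments present and the roles in the verb's θ-grid. The following Python sketch is an illustration under assumptions and is not drawn from the paper: the grids follow (14), while the role names `theta1` to `theta3`, the argument names, and the function `satisfies_theta_criterion` are hypothetical.

```python
from collections import Counter

# Theta-grids as in (14); the string names for the roles are an assumption.
THETA_GRIDS = {
    "give": ["theta1", "theta2", "theta3"],
    "hit": ["theta1", "theta2"],
    "smile": ["theta1"],
}


def satisfies_theta_criterion(verb, arguments, assignment):
    """Check (15) for `verb`.

    `arguments` lists the arguments present in the structure and
    `assignment` is a list of (argument, role) pairs.
    """
    grid = THETA_GRIDS[verb]
    roles_per_argument = Counter(arg for arg, _ in assignment)
    arguments_per_role = Counter(role for _, role in assignment)
    # Every pair must relate a real argument to a role in the grid.
    well_formed = all(arg in arguments and role in grid
                      for arg, role in assignment)
    # Each argument bears one and only one role.
    one_role_each = all(roles_per_argument[arg] == 1 for arg in arguments)
    # Each role is assigned to one and only one argument.
    one_argument_each = all(arguments_per_role[role] == 1 for role in grid)
    return well_formed and one_role_each and one_argument_each


if __name__ == "__main__":
    print(satisfies_theta_criterion(
        "hit", ["Mary", "John"], [("Mary", "theta1"), ("John", "theta2")]))
    print(satisfies_theta_criterion(
        "hit", ["Mary"], [("Mary", "theta1")]))
    print(satisfies_theta_criterion(
        "smile", ["Mary", "John"], [("Mary", "theta1")]))
```

The second call fails because a role in the grid of hit is left unassigned, and the third because an argument is present without a role; these are the two ways a structure can fall short of the pairing that (15) demands.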

Together, (16) and (15) ensure that a verb like give will have exactly two internal positions at all syntactic levels; hit will have exactly one, and smile none at all.

1.3 V-Visibility⁶

The presentation of θ-theory and the Projection Principle so far has been standard. At this point, we introduce a further condition on θ-role assignment which is central to what follows:

(17) V assigns θ-roles iff V is governed.

Condition (17) holds at S-Structure. We will refer to (17) as the ‘V-VISIBILITY CONDITION’. “Governed” positions in (17) include both syntactically and morphologically governed positions. So (17), combined with the θ-criterion, forces a verb which assigns θ-roles to appear in one of the positions in (18) (where α is a head):

(18) a. [ α [VP V ] ]
     b. [ α V ]
     c. [W V = Stem, α = Af]

If a verb fails to appear in one of the environments in (18), it will not be ‘visible’ for θ-role assignment. However, the Projection Principle requires the arguments of the Verb to be present at all levels. As a result, these arguments will be present, but not θ-marked. And this violates the θ-criterion, (15). On the other hand, the θ-criterion is not violated if the arguments are not present, but then the Projection Principle is. Also, if a verb has no θ-roles to assign, then (17) will force such a verb to appear in an ungoverned position. If it appeared in a governed position it would have to assign θ-roles, but the Projection Principle prevents this. So a verb with no θ-roles to assign will have a radically different distribution compared to other verbs. We propose that there does exist a class of verbs with no θ-roles to assign: the modal auxiliaries. The examples of (1) show that these verbs have a different distribution to other verbs. Condition (17) forces modals to appear in an ungoverned position.
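The reasoning in this paragraph can be summarized as a two-way check: a verb is licit in a position just in case it has θ-roles to assign exactly when that position is governed. The sketch below is only an illustration of that logic under condition (17); the function name check_position and the wording of the diagnostic strings are hypothetical, and "governed" is taken as a given Boolean rather than computed from a structure.

```python
def check_position(theta_roles, governed):
    """Apply the V-Visibility Condition (17) to one verb in one position.

    theta_roles -- the θ-roles the verb has to assign (possibly empty)
    governed    -- True if the position is syntactically or morphologically governed
    """
    has_roles = len(theta_roles) > 0
    if has_roles and not governed:
        # V is not visible, so arguments required by the Projection
        # Principle are present but not θ-marked.
        return "out: θ-criterion violated, V is not visible"
    if not has_roles and governed:
        # A governed V must assign θ-roles, but this verb has none.
        return "out: governed V has no θ-roles to assign"
    return "ok"
```

A main verb such as hit is therefore acceptable only in a governed position, and a verb with an empty grid, as we propose for the modals, only in an ungoverned one.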

We make the following assumption about negation and inversion:

(19) Negation and inversion are processes affecting INFL.

Given (19), (1a,b) are evidence that only modals can appear in INFL. INFL is an ungoverned position, and so modals must appear there because of (17). Main verbs cannot appear in INFL because of (17).

Let us return now to the configurations in (18) and consider possible values for α. In (18a), V is syntactically governed by α. Here the obvious candidates for α are INFL and V. In (18c), V is morphologically governed by α. Here α is some kind of verbal inflection, e.g., an agreement affix or some kind of participial affix. Configuration (18b) also involves syntactic government, but here it is not so clear what α could be. It follows from X-bar theory that V is the only head in VP, so either V or α must have moved from its base position. We will leave (18b) aside for the remainder of the paper.

If V is governed by another V in (18a), what is the status of the governing V with respect to (17)? The governing V may itself be governed by yet another V or by INFL. If the governing verb is an auxiliary, it must be in INFL, as we saw above. If the governing verb has θ-roles to assign, then it must subcategorize for VP: this possibility is exemplified by the class of causative and perception verbs make, let, see, hear, etc. (Cf. Manzini (1983) for a treatment of these verbs as taking small-clause VP complements.) A third possibility is that α = INFL with no lexical auxiliary. In this situation, the abstract agreement features (AGR) in INFL govern V, if the clause is finite. If the clause is nonfinite, to governs V.7

Since (17) prevents auxiliaries from appearing in governed positions, we predict auxiliaries to be incompatible with agreement. This prediction is correct for modals, but incorrect for the aspectuals have and be.
In fact, we have now isolated three properties which distinguish auxiliaries from main verbs in ways predicted by the claim that auxiliaries have no θ-roles. Main verbs are correctly distinguished from modals, but aspectuals seem to straddle the division, as (20) shows:

(20)
                θ-roles    agreement    appearance in INFL
    Main verbs     +           +                −
    Modals         −           −                +
    Aspectuals     −           +                +

Agreement is the exceptional property here. For the purposes of this paper, we make the simplifying assumption that aspectuals show inherent agreement, and so in fact are not governed by AGR.8 In (18c), the Affix governing V may be a participial affix like passive -en or progressive -ing. Also, it may be an agreement affix. In languages with a variety of affixes marking agreement, we frequently find Verb-movement to INFL. This movement rule has been proposed for French (by Emonds,

1978), for German (by Safir, 1982) and for Welsh (by Sproat, 1983). Koopmann (1983) refers to this as “the NP-type of V-movement”. She notes that this kind of V-movement has the following properties:

(21) a. Movement does not take place when an auxiliary is present.
     b. Movement only takes place in finite clauses.
     c. Movement is always clause bound.

We propose that the NP-type of V-movement is motivated by (17): it has to occur whenever the verb’s D-Structure position is not a visible position, i.e., is not governed by AGR or an auxiliary. Schematically, V-movement takes place as shown in (22):

(22) a. D-Structure: NP [INFL [V [V __] Af]] [VP V . . .]
     b. S-Structure: NP [INFL [V Vi + Af]] [VP [V ei] . . .]

We propose that languages with ‘rich’ agreement systems in fact lack AGR. Affixes are generated in INFL, with an empty subcategorized verb-stem attached. Affixes are unable to govern out of the word they head, as we saw. So, in order to satisfy (17), the verb-stem moves from its D-Structure position into the empty verb-stem position in INFL. In this way, the verb-stem appears in environment (18c) at S-Structure and so satisfies (17).9

We can derive Koopmann’s generalizations about V-movement given in (21) from (17). First, the fact that movement does not take place when an auxiliary is present follows from the fact that the auxiliary already occupies the verb-stem position in INFL. Also, (17) is satisfied by the auxiliary syntactically governing V. Second, the lack of movement in nonfinite clauses is a consequence of the non-appearance of the agreement affixes in such clauses: (17) is satisfied in situ by infinitival affixes. Third, if we assume that movement always takes place to the nearest visible position, then the clause-boundedness follows, as V will always move to the nearest INFL.

The discussion of (18a) and (18c) amounts to a description of two systems of agreement. In the first, AGR governs V in the configuration of (18a).
We suggest that this type of agreement is typical of languages with little or no agreement morphology. We will refer to this system as a ‘syntactic agreement system’, since (17) is satisfied by syntactic government of V. The second system of agreement involves the base-generation of agreement affixes in INFL, and movement of V into the empty V position which these affixes subcategorize for. V then satisfies (17) in the configuration (18c). We refer to this second system as a ‘morphological agreement system’, since morphological government satisfies (17). Morphological agreement systems are typical of languages which are rich in agreement morphology.10 We now make explicit what was implicit in the above paragraph: the choice of a morphological or a syntactic agreement system is a parameter of UG. Our central proposal in this paper is that English has historically

developed from having a morphological agreement system to having a syntactic agreement system. We propose that this parametric change took place during the sixteenth century. The development of a class of modal auxiliaries was part of this change.

The parametric change involved a change in the structures in which modals appeared. Formerly, modals assigned θ-roles and appeared in the configuration in (23), and met condition (17) by moving into INFL in the syntax. Modals were just like all other verbs in this respect:

(23) [S NP [INFL Af] [VP [V modal] [VP V . . .]]]

In (23), the modal syntactically governs the lower V. When modals were reanalyzed as auxiliaries, they no longer assigned θ-roles (cf. section 2.6 for a slight restatement of this point). So (17) forced them to appear in an ungoverned position. They thus were analyzed as appearing in a configuration like (24):

(24) [S NP [INFL modal] [VP V . . .]]

Here no agreement morphology or abstract AGR can be present. The modal syntactically governs V. The factors which led to the parametric change were: (25) (i) The use of modals as functional substitutes for the moribund system of subjunctive inflections. (ii) The morphological irregularity of the modals. (iii) The phonologically motivated obsolescence of agreement inflection. The first factor, (25i), meant that modals were interpreted as clausal operators specifying the mood of the clause, exactly like subjunctive inflections. Clausal operators do not assign θ-roles, and so modals could be construed as

not assigning θ-roles. The second factor, (25ii), made it appear that modals lacked agreement, and (17) now forced them to be construed as clausal operators with no arguments. The increasing frequency of periphrastic constructions like (24), where V is syntactically governed and no agreement is present, combined with the general loss of agreement morphology, due quite independently to phonological change, the third factor, led to the resetting of the agreement parameter. The result of the change in the agreement parameter is that present-day English verbs can be divided into the two classes illustrated in (1). Modals can only appear in INFL, because of their lack of θ-roles and the resulting effects of (17). Thus modals are affected by negation and inversion. At the same time, (17) also prevents any kind of verbal affix from attaching to modals. Finally, present-day English lacks Verb-movement to INFL as a consequence of the parametric change. The absence of this rule has a range of consequences as we shall see in section 2.5. We now consider in more detail what happened.

2. The Changes

2.1 Introduction

In this section we discuss in detail the various changes that took place in the auxiliary and agreement systems. We will argue that the syntactic differences between present-day English and Middle English verb systems, as outlined in the Introduction, are the result of one major parametric change and an associated lexical change.

First we describe the syntax and morphology of ME modals, i.e., we look at their properties prior to their being reanalyzed as auxiliaries. We point out certain morphological and syntactic peculiarities which set these verbs off as a somewhat marked class even in ME.

Sections 2.3 and 2.4 are both concerned with the loss of verbal inflection, in different ways. Section 2.3 focusses on the subjunctive mood, which as a morphological paradigm was moribund in late Middle English. Periphrastic constructions with modals replaced the subjunctive, adding to the incidence of syntactic government of V. Section 2.4 deals with the general loss of verbal inflection during late Middle English. We also consider the rise of the periphrastic construction with do at this point. This construction is important because its frequency greatly decreased the amount of evidence for a morphological agreement system available to learners of the language. This is so because do appears in INFL, syntactically governing V. Also, do was most common in the sixteenth century in precisely those structures where it is obligatory today, namely, in questions and negatives. These constructions provided evidence of Verb-movement to INFL, and therefore, given our assumptions, of a morphological agreement system.

Section 2.5 considers the parametric change and its consequences. The major consequence of the rise of a syntactic agreement system replacing the morphological system was the loss of Verb-movement to INFL. The loss of this rule has the immediate consequence of rendering do-support obligatory where it had formerly been optional. This, combined with the reanalysis of the modals as auxiliaries, explains the contrast in (1a,b). Other consequences of the loss of Verb-movement to INFL are the restriction of both quantifier-floating and adverb-placement to preverbal positions. We suggest that neither of these operations has changed historically but the loss of Verb-movement to INFL has given rise to this surface constraint. We also discuss the loss of other ‘verb-like’ properties of modals in this section: the loss of direct objects, and the loss of participial and other non-finite forms. Both of these developments follow from the absence of θ-roles, if (17) is correct.

Finally, in section 2.6 we discuss the status of root modals since the reanalysis. We suggest that root modals have retained adjunct θ-roles. Adjunct θ-roles, however, are neither subject to the θ-criterion nor to (17). Hence, there is a sense in which ability can, for example, appears to take arguments. However, this kind of argument-taking is independent of (17), and so has no consequences for our proposals here.

2.2 Modals in Middle English

2.2.1 Syntax

In the examples we have given of ME modals, such as (2), the modals appear preceding a verb in the infinitive. We take it then that ME modals subcategorized for VP. Thus they appeared in a position governing VP, in a structure like the following:

(26) [S NP INFL [VP1 [V Modal] VP2]]

The modal must move to INFL in order to be morphologically governed and so meet (17). The modals have somewhat unusual θ-marking properties in (26). First, they subcategorize for VP, and so θ-mark VP. This means that VP in (26) is an argument. Arguments are generally interpreted as referential, but such an

interpretation is not available for VPs. So we say that modals had nonreferential arguments. This is a marked property.

More interestingly, it is clear from the examples in (2) that the subject of the modal is also the subject of the head of VP2. This can be most clearly seen in (2b,ii) and (2b,iii), which are repeated here for convenience:

(2) b. (ii) Thy godfadirs wyff thow shalt not take
            You shall not take your godfather’s wife.
            (c. 1450 Idley Instructions 2a. 1757: V 1489)
       (iii) A blynde man kan nat juggen wel in hewis
            A blind man cannot judge colours well.
            (c. 1387 Chaucer Troilus 2, 21: V 1624)

In these examples, it is clear that thow and a blynde man are the respective subjects of take and juggen ‘judge’. This fact, combined with the general ‘epistemic’ meanings of the modals, leads to the suggestion that the modals were raising verbs in Middle English. Further plausibility is added to this idea by the fact that the equivalents of modals in a number of languages are raising verbs (e.g., most Germanic and Romance languages). If modals were raising verbs in Middle English, (26) would be replaced by (27):

(27) [S NP INFL [VP [V Modal] [S̄ COMP [S NP INFL VP]]]]

The subject of the lower clause becomes the subject of the matrix clause by means of Raising, one instantiation of Move-α. The subject of a raising verb is a nonthematic position. Because it is a nonthematic position, this position is a potential ‘landing site’ for NP-movement. All the diagnostics for raising are essentially tests to see if the subject position of a given verb is thematic or not. These tests include finding out whether expletives can appear in subject position, and whether idiom chunks can appear in subject position. If these possibilities exist, the subject is nonthematic.

Applying the second test first, it is very difficult to tell whether something is an idiom or not, in the absence of native-speaker intuitions on the matter. Hence this test is not useful for us. For the first test, it seems that ME modals could in some circumstances have expletive subjects. These were sometimes phonologically null, as in (28a) and (28c). However, all the examples have an oblique Case-marked NP associated with the subject of the complement clause. If we assume that oblique Case is inherent and that raising is motivated by the Case Filter, the presence of oblique Case-marked NPs associated with the subject of the complement clause indicates that these sentences do not involve raising. (Presumably the oblique NP controls the PRO subject of the complement.)

(28) a. Mee moste nedys been dampned for this
        I will have to be damned for this.
        (1455 Speculum Misercordie, 251: V 1715)
     b. It deuit me no langare for to ly11
        I must no longer lie.
        (c. 1490 Lancelot of the Lake, 18: V 57)
     c. Vs muste make lies
        We must tell lies.
        (c. 1440, York Myst. 164, 321: V 33)

Thus the sentences in (28) do not provide evidence that ME modals were raising verbs. There are further difficulties with the idea that ME modals were raising verbs: it is difficult to show that the complement to modals was sentential. The complement is never tensed. On the assumption that to appears in INFL and that INFL is the head of S, the presence of to in infinitivals could be taken as evidence of a sentential complement. However, to rarely appears in such complements (but cf. (2b, i)).

A final point on this matter: if we could show that ME modals were raising verbs, then the diachronic change could be schematized as in (29):

(29) e Modal [S NPi VP] ⇒ NPi Modal VP

This change closely parallels the synchronic rule of restructuring proposed for Italian by Rizzi (1982).
The surface evidence we have is consistent with the idea that ME modals were restructuring verbs, but we have no clear positive evidence for this hypothesis. If the ME modals were restructuring verbs, then the change in their syntax would involve a reanalysis of the restructured structure as the D-Structure, with the corresponding alteration of the thematic properties of the modals. We conclude for now that ME modals appeared in the S-Structure configuration given in (26), with the

added possibility that this configuration was the result of the application of restructuring.

Some of the ME modals also had direct objects, as shown in (30):

(30) a. for all the power thai mocht
        for all the power at their command.
        (1470 Henry, Wallace iii 396: Lightfoot (1979: 101))
     b. Ich hit wulle heortlicher
        I want it very much.
        (c. 1225 Ancrene Wisse 199, 23 (ed. Tolkien))
     c. God grante I mot wel achieve
        God grant that I’ll be able to achieve it.
        (c. 1390 Gower Conf. Am. I, 6 i: V 1689)

We can clearly see that modals had exactly the status of main verbs in these examples. They assigned θ-roles to their subcategorized objects, and so were subject to (17). Middle English had a morphological agreement system, so in this usage the modals, like any main verb, moved into INFL in the derivation from D-Structure to S-Structure.

To sum up, modals appeared preceding NP and VP at S-Structure. In the latter case, the possibility exists that the S-Structure was the result of restructuring, although I have been unable to find evidence to confirm this. There is also the possibility that ME modals were raising verbs; on this point, too, crucial evidence is lacking.

2.2.2 Morphology

ME modals were a morphologically definable subclass of verbs. In fact, they had quite irregular conjugations. In the present tense, they had the regular second person singular agreement, but lacked third person singular agreement. They also had irregularly formed preterits, although the preterits showed the regular plural agreement. Both the lack of third person singular agreement in the present tense and the irregular preterits were the consequence of the modals’ membership of the Proto-Germanic class of preterit-present verbs. Preterit-present verbs were verbs whose preterit had taken over the functions of the present, and so a new preterit had been formed by analogy.
This class had about a dozen members in Old English, but just the modals were left by late Middle English, in the standard dialect (cf. Lightfoot, 1979, pp. 101–103, for documentation). Plural agreement was lost in the early sixteenth century (cf. section 2.4). After the plural endings had disappeared, the only agreement distinctions remaining were the second person singular and the preterit/present distinction. However, as Lightfoot points out, the preterit/present morphological

distinction did not correspond to the usual semantic opposition, and so to some extent pairs like shall and should came to be felt to be separate lexical items, rather than different tenses of the same verb. Lightfoot comments:

    the preterits seem to have been unstable from early times, perhaps as a result of competition from the subjunctive . . . The breakdown of the productivity of preterit/present relationship appears to have started quite early and the preterit and present tense forms developed uses independently of each other and the tense relationship between them was steadily eroded. (Lightfoot, 1979, p. 104)

So we can see that ME modals were morphologically marked, semantically anomalous in at least one respect, and syntactically marked in taking VP as an argument (or possibly in being restructuring verbs). These factors by themselves may not have been sufficient for reanalysis; in fact they were present in the language for centuries before the modals were reanalyzed. However, combined with the loss of verbal inflections, and in particular with the loss of the subjunctive paradigm, the marked properties of ME modals we have seen in this section allowed the reanalysis to take place.

2.3 Loss of Subjunctive Inflection

Here we consider the role played in the reanalysis of the modals by the loss of subjunctive inflections in Middle English. These inflections were not entirely lost; in fact the subjunctive still exists in some dialects and registers of present-day English.12 However, the ME period saw a considerable rise in the frequency of periphrastic constructions, consisting of a modal with an infinitive. These periphrases presumably grew in frequency due to the loss of distinctions between the indicative and the subjunctive caused by phonological changes.
Visser commented on this: In the earliest periods of the English language the modally marked form [the subjunctive—IGR] was extensively used in all sorts of writings. That this did not remain so was in the first place due to the phonological changes the language underwent in the course of time. (Visser, 1963–73, p. 789) It seems that the subjunctive could be replaced by a modal in every major ME use of the subjunctive.13 With regard to the development of periphrastic subjunctives, I add nothing to the traditional account: the verbal inflections which manifested the subjunctive/indicative distinction no longer existed due to phonological change, hence a new means of expressing modality arose. This development was important for the parametric change because it meant that by late Middle English the modals commonly appeared as “semantic substitutes” for verbal inflection. This meant that modals were

being construed as clausal operators, like subjunctive inflection. As clausal operators, modals assign no θ-roles. In a subjunctive clause, the head of VP and not the subjunctive inflection assigns θ-roles to the NPs in the clause; likewise the modal assigns no θ-roles, the head of its complement VP does. If modals have no θ-roles to assign, then (17) forces them to appear in ungoverned positions. In other words, if modals are semantic substitutes for the subjunctive, and so have the same θ-properties as the subjunctive (i.e., none), then they must not be governed or show agreement. This in turn means that modals in these uses could appear in INFL, as in (24), governing V but not themselves governed.

We suggest that it was possible for modals (rather than, say, adverbs) to functionally substitute for the subjunctive because they already expressed generally ‘modal’ notions. In other words, it is clear that the core lexical meaning of modals facilitated this part of the change. Also, we saw in section 2.2.2 that modals as a class appeared to lack agreement morphology. Thus, modals both appeared on morphological grounds not to meet (17), and were semantically compatible with an interpretation on which they were not construed as θ-role assigners.

The decline in the subjunctive led to a rise in the number of constructions in which modals had to be construed as clausal operators. So, to be compatible with (17), modals were reanalyzed as lacking θ-roles. As a result of this reanalysis of the θ-properties of modals, modals were forced to appear only in ungoverned positions. This in turn increased the number of periphrastic constructions showing syntactic government of V still more.
2.4 Loss of Agreement Inflections As we mentioned in section 2.2.2, agreement inflections had almost disappeared by the mid-sixteenth century.14 The prime cause for the loss of these inflections was phonological, as was the case with the loss of other inflections in English (e.g., case endings on nouns, and markings of adjectival concord). The loss of verbal inflections took place over a long period; certain OE distinctions had already been lost by the beginning of the ME period. Also, it is obviously true that not all agreement was lost: -s for third person singular (3Sg) survives to the present day, and second person singular (2Sg) -st lasted as long as the pronoun thou, a considerable time after the sixteenth century. The crucial fact seems to have been the loss of plural agreement, and this happened in the sixteenth century. Loss of plural agreement meant that preterits (except for be) showed no person agreement, and that present tenses agreed only in 2Sg and 3Sg. Thus language learners at this time were faced with a highly impoverished morphological agreement system. We have already discussed, in section 2.3, the increase in the use of the periphrastic subjunctive in Middle English. Other periphrastic constructions

also arose during this period. Many authors (Jespersen, 1938; Traugott, 1969, for example) point out that the otherwise independent development of the progressive, perfect and passive during Middle English added to the number of periphrastic constructions. The most important such construction, however, is that with do. Do in these constructions was a semantically empty tense carrier, and so, like Tense, assigned no θ-roles. So (17) forces do to appear in INFL. In this respect do parallels the modals. Where modals were periphrastic substitutes for the subjunctive, do was a periphrastic substitute for tense.

Periphrastic constructions with do were most common in the sixteenth century. One reason for this could be the loss of verbal inflection, leading to the use of a periphrastic construction to signal tense more clearly. We can put this intuitive statement into our terms and say that periphrastic do indicates a marked increase in the number of constructions with syntactic verb-government and V in situ as opposed to morphological V-government and V-movement to INFL. We now sketch briefly the rise of this construction.

Do was used with a following infinitive throughout Middle English. It could either be a semantically empty tense carrier, as in Modern English, or a causativizer (cf. Ellegård, 1953, p. 208, on the relation between these). The following are examples of causatives with do:

(31) a. that they kepyn and do kepyn . . . accorde and pes
        that they keep and make (others) keep accord and peace.
        (c. 1475 Gregory’s Chronicle p. 138: V 1212)
     b. they shall putt or done putt in any certaine place
        they shall put or have put (i.e., do + infinitive).
        (c. 1475 Gregory’s Chronicle p. 145: V 1212)

We can see the causative meaning of these examples from the fact that the finite form of the main verb is given and then repeated with do.
Visser says of this construction that ‘Soon after 1500 this do + infinitive pattern became obsolete’. The last example given is: (32)

Every such person . . . shall doe make a seale Every such person shall have a seal made. (MMED: 1541 Act 33 Henry VIII: V 1212).

So the causative do disappeared in the sixteenth century. The sixteenth century is noted for the frequent occurrence of semantically empty do. Jespersen comments: At first it [auxiliary do—IGR] was used indiscriminately without any definite grammatical purpose. In some poets such as Lydgate, in the beginning of the fifteenth century it served chiefly to fill up the line

and to make it possible to place the infinitive at the end as a convenient rime-word. Sometimes it served to make the tense clear in verbs that are alike in present and preterite . . . “the holy spyryte dyd and dothe remayne and shall remayne” (Fisher c. 1535). The culmination was reached in the sixteenth century. . . But then a reaction set in and gradually restricted the use of do to those cases that are well known from grammars of Present English. (Jespersen, 1938, p. 195)

These remarks are supported by the following statistical evidence, presented in graph form, from Barber (1976):

(33) [Graph not reproduced. Vertical axis: percentage of do forms, 10% to 100%. Horizontal axis: date, 1500 to 1700.]

Figure 1.1 Auxiliary do. Percentage of do forms in different types of sentence, 1500–1700. Upper broken line: negative questions. Upper solid line: affirmative questions. Lower broken line: negative declarative sentences. Lower solid line: affirmative declarative sentences. (Adapted from Alvar Ellegård, The Auxiliary Do, University of Gothenburg 1953, from Barber (1976).)

Notice that do is particularly frequent in both questions and negatives. These are exactly the environments that provide evidence of a morphological agreement system by providing evidence of V-movement to INFL. In the late sixteenth century, up to almost 90% of negative questions, almost 60% of affirmative questions and almost 40% of negative sentences had a periphrasis with do. So in these cases no evidence for a morphological agreement system was available. The direct evidence for a morphological agreement system was rather slight after the loss of plural agreement inflections.

The frequent occurrence of periphrastic constructions involving modals and do, combined with the impoverishment of agreement inflection, led to a change in the agreement system in the sixteenth century. The change was from a morphological agreement system to a syntactic system. In other words, V no longer moved into INFL in tensed clauses in order to be morphologically governed by an agreement affix, and thereby meet condition (17). Instead, V met (17) by being syntactically governed in its base position by some element in INFL, an auxiliary or abstract agreement feature (AGR). No agreement affixes appeared in INFL any more; instead, abstract AGR appeared. AGR is not an affix, and can govern out of INFL. Thus, where an auxiliary was not present, AGR syntactically governed V in its base position. Certain feature combinations were spelled out on the verb at PF: 2Sg as -st, 3Sg as -s or -(e)th. This situation obtains in present-day English, except that 2Sg has disappeared, and 3Sg is only spelled as -s. For convenience, we illustrate the change with the following diagrams:

(34) a. Middle English:
        [S NP [INFL [W Af]] [VP V . . .]]
        Af morphologically governs V.

     b. Modern English:
        [S NP [INFL {AGR / Aux}] [VP V . . .]]
        AGR or Aux syntactically governs V.
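The two settings of the agreement parameter in (34), together with the generalizations in (21), can be restated as a small decision procedure. The sketch below is a simplified illustration rather than a formalization of the analysis: the names AgreementSystem and license_verb are hypothetical, configuration (18b) is left aside as in the text, and the special status of the aspectuals is ignored.

```python
from enum import Enum


class AgreementSystem(Enum):
    MORPHOLOGICAL = "morphological"  # affixes generated in INFL, as in (34a)
    SYNTACTIC = "syntactic"          # abstract AGR in INFL, as in (34b)


def license_verb(system, finite=True, auxiliary=None):
    """Return (surface position of the main verb, how it satisfies (17))."""
    if auxiliary is not None:
        # (21a): the auxiliary occupies INFL and governs V in situ.
        return ("VP", f"syntactic government by {auxiliary} in INFL")
    if not finite:
        # (21b): no agreement affixes appear, so there is no movement.
        if system is AgreementSystem.SYNTACTIC:
            return ("VP", "syntactic government by to in INFL")
        return ("VP", "morphological government by an infinitival affix")
    if system is AgreementSystem.MORPHOLOGICAL:
        # The verb-stem moves into the empty stem position in INFL, as in (22).
        return ("INFL", "morphological government by an agreement affix")
    # Syntactic system, finite clause, no auxiliary: AGR governs V in situ.
    return ("VP", "syntactic government by AGR in INFL")
```

Only the finite, auxiliary-less clause in a morphological system places the verb in INFL; every other combination leaves it in VP, which is the state of affairs we attribute to English after the sixteenth century.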

2.5 Consequences of the Change

The principal consequence of the change from a morphological to a syntactic agreement system was the impossibility of V-movement to INFL. Verbs no longer needed to move to INFL, as (17) was satisfied by syntactic government of V. Also, verbs no longer could move to INFL because there is no possible landing site. Assuming INFL, like any other node, contains only one position, that position is always occupied after the change by AGR or to or some auxiliary.

The contrasting behavior of modals and main verbs in questions and negatives in present-day English, illustrated in (1), is also a consequence of the change in the agreement system:

(1) a. Inversion: Must they leave? *Leave they?
    b. Negation: They cannot walk. *They walk not.

The contrast illustrated here must hold in Modern English; modals must appear in INFL because they do not assign θ-roles, and so cannot appear in V, which is always governed by AGR, an Auxiliary or to. We are assuming that negation and inversion are operations on INFL, hence the behavior of modals in (1).

Related to the above point was the development of obligatory do-support. As we saw in section 2.4, do-support had been optional in late Middle English and into the sixteenth century. Do-support is forced in the first place because V-movement to INFL was impossible. If we assume that not must attach to a lexical element in INFL, and that movement of INFL for inversion must move a lexical item, then we see why do is necessary in questions and negatives in Modern English.

(17) predicts the absence of agreement on modals and the impossibility of modals appearing in nonfinite forms. In such cases, a modal would be morphologically governed by some participial affix, or syntactically governed by to in INFL. Thus the contrast in (1c,d) is explained.

Another consequence was the disappearance of direct objects of modals. Lightfoot (1979, p.
101) gives the following as the last attestations of modals with direct objects (but see section 2.6): (35) a. the leeste ferthyng þat y men shal the least farthing that I owe to people. (c. 1425 Hoccleve, Min. Poems xxiii 695). b. Yet can I Musick too Yet I know music too. (1649 Lovelace, Poems (1659) 120). c. for all the power thai mocht for all the power at their command. (1470 Henry, Wallace iii 396 (cf. (30a)).

Modals were no longer able to assign θ-roles for the reasons we saw in sections 2.2 and 2.3. It follows from the θ-criterion that they could no longer have direct objects. We return to this issue, in the light of further data, in section 2.6.

We have seen that the change in thematic properties of the modals, the change in the agreement system, and condition (17) together derive the facts of present-day English illustrated in (1) from the ME situation shown in (2).

Some further consequences follow from the absence of V-movement to INFL. In Modern English, quantifiers semantically associated with the subject can ‘float’ rightward. However, no quantifier can float past a main verb, although they can appear to the right of auxiliaries:

(36) a. They must have all left.
     b. *They must have left all.

At an earlier stage of English, (36b) was allowed. In fact, (36b) died out in the sixteenth century (cf. Lightfoot, 1979, pp. 168–196). This change can be related to the loss of Verb-movement to INFL, without any change being posited in the rule of quantifier-floating itself. Assume that floated quantifiers have always appeared in the X-position in (37):

(37) [S NP INFL [VP X V ... ]]

As long as English had a rule raising verbs to INFL, as illustrated in (37), floated quantifiers could appear after the main verb at S-Structure. Once the V-movement to INFL rule was lost after the parametric change, however, quantifiers could no longer appear following the main verb. Since auxiliaries appear in INFL, floated quantifiers can still appear after auxiliaries. So we explain the Modern English contrast in (36) and its absence in Middle English in terms of the parametric change in the agreement system. Similar reasoning may well explain why adverbs stopped appearing between a tensed verb and its object in the sixteenth century. Lightfoot comments: “In ME light Adverbs . . . regularly occurred here [in immediate postverbal position—IGR], but from ENE it ceased to be a possible position: *he wrote well the poem, *he touched lightly her shoulder.” If we take the X-position in (37) as the position of adverbs in such cases, then the sentences where the adverb intervenes between verb and object are just like those with floated quantifiers. The disappearance of V-movement to INFL as a consequence of the change in the agreement parameter entailed the disappearance of sentences like those cited by Lightfoot above.15
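The reasoning about (36) and the adverb cases can be made concrete with a toy linearization. The Python sketch below is only an illustration under simplifying assumptions: it takes X to sit between INFL and V, as the text describes for (37), it places every auxiliary before X (glossing over the internal structure of auxiliary sequences), and it treats V-movement to INFL as a single switch. The function name `linearize` and all of its parameters are hypothetical and are not part of the analysis itself.

```python
def linearize(subject, verb, rest=(), aux=(), x=None, v_to_infl=False):
    """Return a surface word order for a clause built on structure (37).

    subject   -- the subject NP
    verb      -- the main verb, base-generated in V
    rest      -- material following V inside VP (e.g. the object)
    aux       -- auxiliaries, assumed here to precede the X-position
    x         -- the element in the X-position (floated quantifier or adverb)
    v_to_infl -- True models a grammar with V-movement to INFL (Middle English);
                 False models a grammar without it (Modern English)
    """
    if v_to_infl and aux:
        # Toy restriction (an assumption of this sketch): INFL has one
        # position, so a raised main verb and an auxiliary cannot co-occur.
        raise ValueError("toy model: INFL cannot host both Aux and a raised V")

    infl = [verb] if v_to_infl else list(aux)   # what ends up in INFL
    x_slot = [x] if x else []                   # floated quantifier / adverb
    v_slot = [] if v_to_infl else [verb]        # V stays in VP unless raised
    return " ".join([subject, *infl, *x_slot, *v_slot, *rest])


# Modern English (no V-movement): the quantifier follows the auxiliaries, as in (36a).
print(linearize("they", "left", aux=("must", "have"), x="all"))

# With V-movement the main verb ends up to the left of X.
print(linearize("they", "left", x="all", v_to_infl=True))
print(linearize("he", "touched", rest=("her", "shoulder"), x="lightly", v_to_infl=True))
```

With `v_to_infl=False` the sketch can never place X after the main verb, which is the point of the argument: once the raising rule is lost, the postverbal quantifier and adverb orders disappear without any change to the X-position itself.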

The above shows that the parametric change in the agreement system had a number of consequences for English syntax. Most of the consequences follow from the loss of V-movement to INFL.

2.6 Root Modals

We have said that modals were reanalyzed as verbs with no θ-roles. One consequence of this reanalysis is that modals were unable to take direct objects. However, we do find late examples with direct objects, particularly with will and can. Sentence (38) is a seventeenth-century example of will with a direct object:

(38) Where we would no pardon they laboured to punish us
     Where we wanted no pardon they laboured to punish us.
     (1643, OED will B22).

See also (35b) for a late example with can. Warner (1983) points out that there is evidence that will and can, and to a lesser extent may, have main-verb properties later than the first half of the seventeenth century. The main-verb properties in question are infinitival and participial forms:

(39) a. In evill, the best condicioun is not to wille, the second not to can
        In evil, the best way to be is not to want to (do it) and the second best not to be able to.
        (1607: Bacon G. PI. Ess. Arb. 242, in OED Can A. 5).
     b. If he had woulde, he might easily have . . . occupied the Monarchy
        If he had wanted to, he could easily have occupied . . .
        (1633: Donne Hist. Septuagint 226, OED Will B. 49).

All of these examples involve root modals, as can clearly be seen from the glosses. (Note also that the last example of shall with a direct object is root, as it means ‘to owe’.) The above examples show that root readings, assigning a θ-role and appearing in nonfinite forms are correlated. On the other hand, epistemic readings, absence of nonfinite forms and having the status of a clausal operator correlate. This was the situation for a time in Early Modern English, it seems. These correlations are consistent with (17).
ENE root modals assigned θ-roles, and so could be syntactically governed by to, as in (39a), or morphologically governed by participial morphology, as in (39b). Epistemic modals could not appear in these environments because of (17).

We address two questions in this section: (i) What was the status of the modals in (38) and (39) with respect to the reanalysis of the modals as auxiliaries? (ii) What is the nature of the root/epistemic distinction in present-day English?

To answer the second question first: we follow Zubizarreta (1982) in treating root modals essentially as modifiers. Root modals are analogous to a class of sentential adverbs: Jackendoff’s (1972) agent-oriented adverbs. The evidence for the parallelism comes from contrasts like the following:

(40) a. John can (ability) read Arabic.
     b. John deliberately read the forbidden text.

(41) a. *Arabic can (ability) read easily.
     b. *The forbidden text deliberately read easily.

Here we can see that both root modals and agent-oriented adverbs require an agent. Middle verbs, like read in (41), are formed from transitive verbs by a lexical process of deletion of the agent θ-role. The root modal and agent-oriented adverb are grammatical with transitive read in (40), but ungrammatical with middle read in (41). The contrast between these examples is captured by saying root modals and agent-oriented adverbs require the presence of an agent argument in the clause they modify. Consider next (42):

(42) Klamath can be/was deliberately heard in the wilds of Oregon.

We saw that root modals and agent-oriented adverbs require an agent argument in the clause they modify. Example (42) shows us what happens to this requirement in passives where the derived subject cannot be construed agentively. This example is grammatical because an agent argument is considered to be ‘implicitly’ present (see Zubizarreta (1983) on implicit arguments). Because of the implicit argument, the root modal and agent-oriented adverb are allowed here. Some adverbs and some root modals are oriented to subject position rather than to an agent argument: in these cases, the selection requirement for an agent still holds, so (42) results as ungrammatical (cf. Jackendoff, 1972; Zubizarreta, 1982). For us, distinctions among adverbs are less important than the fact that modals and adverbs pattern alike as modifiers.

We have now seen evidence that both root modals and agent-oriented adverbs have a semantic argument. However, this argument is always the argument of some other predicate. So, given the θ-criterion, we are led to suppose that root modals and agent-oriented adverbs do not assign θ-roles to their arguments. However, there is a modification relation between the root modal or adverb and the agent argument in the examples above. Zubizarreta captures this by proposing a different class of thematic relations: adjunct θ-roles. Adjunct θ-roles differ from ‘main’ θ-roles, i.e., all instances of θ-roles we have considered so far, in that they are not subject to the θ-criterion. So adjunct θ-roles can be assigned to some argument already bearing a θ-role. Also, adjunct θ-role assignment is optional. We adopt the notion of adjunct θ-roles here. We also exempt adjunct θ-role assigners from (17). So root modals appear in ungoverned positions in present-day English and assign adjunct θ-roles to the agent argument in the clause in which they appear.

To return now to the first question, we have seen that root modals, as adjunct θ-role assigners, pose no problem for our account, as long as we assume that adjunct θ-role assigners are exempt from (17). Also, root modals appearing as main verbs pose no problem, as they were syntactically governed, and so able to assign main θ-roles. The question now is: how did the constructions illustrated in (38) and (39) die out, and root senses of auxiliary modals develop?

In fact, given Zubizarreta’s theory of adjunct θ-roles, nothing in our account of the development of modal auxiliaries prevents some of that class from being adjunct θ-role assigners. Adjunct θ-roles seem to form quite a limited semantic class, having primarily to do with notions of volition and intention. The main θ-roles assigned by ME modals had mainly to do with such notions, so these θ-roles were reanalyzed as adjunct θ-roles, when (17) and the other factors discussed in sections 2.2 and 2.3 led to the reanalysis of the modals. The crucial aspect of the change is that modals were not construed as assigning main θ-roles.

We can conclude that alongside the ‘main-verb’ ENE root modals in (38) and (39) there were auxiliary root modals. The main-verb root modals were then redundant. We commented earlier on the marked features of ME main-verb modals. After the reanalysis of modals as auxiliaries, the surviving main-verb root modals would have been even more marked. A further factor, noted by Lightfoot (1979, pp. 112–3), is the rise of ‘semi-auxiliaries’ like have to, be able to and be going to. These paraphrases took over many functions of the modals in nonfinite clauses. It is unclear whether they existed before the reanalysis, but it is clear that they became more common during the sixteenth century. The availability of these paraphrases must have speeded the decline of the highly marked main-verb modals. As a result, main-verb modals are not attested later than the seventeenth century. So main-verb root modals were not exactly ‘replaced’ by auxiliary root modals: auxiliary root modals could have existed from the time of the reanalysis, given certain plausible assumptions about adjunct θ-roles, and main-verb root modals were both marked and redundant, and eventually replaced in finite clauses by auxiliaries, and in nonfinite environments by semi-auxiliaries.

2.7 Conclusion

We have presented in this section a parametric change from the history of English. The change was from a morphological system of agreement to a syntactic system. An important aspect of this change was the development of a class of verbs which did not assign main θ-roles: the modals. The causes of the change were various morphological irregularities of modals in Middle English, and the loss of verbal inflections, in particular inflections marking plural person agreement and the subjunctive/indicative mood distinction. The effects of the change were quite wide-ranging, but the principal effect was the loss of Verb-movement to INFL.

3. Conclusion

This conclusion has three sections. In the first two we briefly discuss earlier accounts of the facts considered in section 2; section 3.1 deals with Lightfoot (1974, 1979) and section 3.2 covers Steele et al. (1981). In section 3.3 we suggest a way of relating language change and language acquisition that incorporates the insights of a parameter-setting model of acquisition. We propose that the Transparency Principle of Lightfoot (1979) can be eliminated as its major results are guaranteed by a parameter-setting approach.

3.1 Lightfoot (1974, 1979)

Here we will focus primarily on the 1979 account. Lightfoot discusses the changes we have analyzed in the context of arguing for the possibility of radical restructuring of grammars. The crucial concept is the Transparency Principle. Lightfoot’s Transparency Principle forces changes to take place when the relationship between the adult grammar underlying the input data to acquisition, and the actual input data the acquirer receives is “too opaque”. This opacity typically arises from the accumulation of irregularities. So, for Lightfoot, the change from the ME grammar (no separate class of auxiliaries) to the Modern English grammar (a distinct class of modals; do-support) took the form of a radical restructuring of the PS-Rules of the base in Early Modern English. This restructuring involved in particular the introduction of a new category: Aux. The restructuring was forced by the Transparency Principle, and took place abruptly in the early sixteenth century. The Transparency Principle came into operation because modals no longer had enough verb-like properties to be analyzed by language acquirers as verbs, so the reanalysis was forced. Lightfoot (1979, pp. 101–104) gives five independent causes for the opacity of the categorial membership of the ME modals.
Lightfoot goes on to claim that these five factors led to a situation in Early Modern English where language learners ceased analyzing the modals as members of the category V, and instead posited a separate category Aux. This in turn led to the change in the PS-Rules in the base, and the reassignment of some lexical items, the modals, to the category Aux. The consequences of this change include the reformulation of the rules of negation and inversion so as to affect the new Aux node, instead of affecting V, as had formerly been the case. Aux includes do, so the change in these rules captures the development of do-support for negation and inversion. The other major consequence of the change in the PS-Rules and the reassignment of modals to Aux was the disappearance of infinitival and participial forms of modals. All these surface changes result from the restructuring of the base rules and the introduction of a new category. Aux is inherently tensed, so no nonfinite or participial forms can appear. Also, Aux does not iterate, so double-modal sequences do not appear.

Our account is quite close to Lightfoot’s. We consider many of the same factors to be involved (in particular the morphological and semantic irregularity of ME modals). The major difference is that our account is framed within a principles-and-parameters theory, while Lightfoot’s is cast in terms of a rule-based theory. For us, the entire complex of changes is essentially driven by (17). This one principle underlies the different kinds of agreement system: the two systems represent the two ways of satisfying this principle. Also, because of (17), all the properties of Modern English modals (appearance in INFL, lack of affixes) follow from the fact that modals lack main θ-roles. Thus our account is conceptually superior to Lightfoot’s. Our account also covers a wider range of data than Lightfoot’s as we can cover the changes in quantifier floating and adverb placement as consequences of the loss of V-movement to INFL (cf. section 2.5). Nevertheless, it should be clear that our account owes a lot to Lightfoot’s.

3.2 Steele et al. (1981)

Steele et al.’s account is primarily a reworking of Lightfoot’s data in terms of a theory of a universal Aux node. Their main point is that Lightfoot’s account ‘is on the right track, but incomplete’ (p. 283). There are two respects in which they alter Lightfoot’s account: (i) they regard the loss of the subjunctive/indicative distinction as central, (ii) Steele et al. claim that inversion and negation always affected Aux. We can see from the second point that Steele et al. consider Aux to have been present all along. In Old and Middle English, there was a rule of Aux-attachment to V. Tense was contained in Aux. So this is why inversion and negation only affected tensed verbs in Middle English.
They note that this means that negation and inversion rules were unchanged by the reanalysis of the modals:

We attribute the changes [in the rules of negation and inversion – IGR] not directly to the reanalysis of the modals, but rather to the loss of the obligatory attachment of Aux. (Steele et al., 1981, p. 282)

The Aux-attachment rule became optional in Middle English. This optionality is related by Steele et al. to the general rise in periphrastic constructions in this period; in particular the appearance of periphrastic do, which they attribute, as in classical analyses of do-support (e.g., Chomsky, 1957), to stranding of Tense when the attachment rule does not apply.

We concur with the account given by Steele et al. for negation and inversion. This account captures the fact that the change in these rules did not involve any change in the modals, or in the form of the rules in question, but rather in all the other verbs of English: the majority of verbs changed their behavior with respect to these processes while the modals did not. However, on our account, this change in the majority of verbs is captured by the loss of the V-movement to INFL rule, not the Aux-attachment rule. We have given a principled explanation in terms of condition (17) and the theory of government for why Verb-movement to INFL must exist. Also, we showed how the loss of this rule was a consequence of a parametric shift in the system of agreement, caused by the impoverishment of agreement paradigms and the rise of periphrastic constructions. Our account therefore has a more principled basis and wider implications.

3.3 The Transparency Principle and Parameters

The main theoretical defect of Lightfoot’s approach concerns the Transparency Principle. This principle is put forward as an inductive generalization about the theory of grammar. The idea is that only a certain amount of opacity can be tolerated in grammars before they will be necessarily abductively restructured through acquisition. However, as Warner (1983) points out, it is not clear what ‘opacity’ really is. We will suggest here that, given the notion of parameter-setting through acquisition, as described in Chomsky (1981), the Transparency Principle has the force of making one parametric setting ‘too opaque’ and favoring the selection of another. As choosing among parameters is the main task of acquisition, the Transparency Principle is ‘built-in’ to a parameter-setting model. Thus we can dispense with the Transparency Principle as a separate principle.

One thing that follows from our abandonment of the Transparency Principle as a motivating force behind syntactic change is that we are no longer compelled to regard changes as resulting from the accumulation of exceptional properties, but rather as resulting from the interaction of possibly quite independent factors.
It is also possible for some factor in a change to be a feature of the grammar for a long time before becoming a factor which leads to a change in conjunction with some other, otherwise independent factor. For example, modals were irregular for centuries before the sixteenth century (see section 2.2.2); however, in conjunction with the loss of subjunctive and plural verbal inflections the irregularity of modals became a factor leading to a parametric change.

We now briefly propose an alternative to Lightfoot’s earlier proposals for the relation of language change to language acquisition, one which does not make use of the Transparency Principle.16 The crucial notion is that of a ‘parameter of Universal Grammar’. This notion is outlined in the following way by Chomsky:

In a highly idealized picture of language acquisition, UG [Universal Grammar – IGR] is taken to be a characterization of the child’s prelinguistic initial state. Experience – in part, a construct based on the internal state given or already attained – serves to fix the parameters of UG, providing a core grammar, guided perhaps by a structure of preferences and implicational relations among the parameters of core theory. If so, then considerations of markedness enter into the theory of core grammar. (Chomsky, 1981, p. 7)

If we take this view of acquisition, and continue to regard acquisition as the driving force behind language change, we are led to the parameter-changing view of language change that we have adopted in this article. We will give an illustration as follows: we define a syntactic change as a difference in the value of at least one parameter of UG over a period of time. Now imagine a parameter P with the potential values [+F] and [−F]. For concreteness, take P to be agreement systems and [+F] to be morphological agreement, with [−F] therefore syntactic agreement. From the point of view of a learner of English in the early sixteenth century, [+F] is the value of P in the core grammar which underlies the language behavior of the surrounding speech community. Owing to the large number of periphrastic constructions with modals and do and the lack of agreement morphology on verbs, the acquirer initially sets P at [−F], i.e., the acquirer assumes that English has a syntactic agreement system. How can P now be ‘reset’ to [+F], the value corresponding to that of the surrounding speech community? The only possible way for this to happen would be on the basis of strong positive evidence, disconfirming the original hypothesis ([−F] = syntactic agreement) and causing the child to arrive at [+F] (= morphological agreement). However, such positive evidence is not always available in the trigger experience.
Moreover, the same evidence that led the acquirer to posit a parametric difference with respect to the adult grammar may often lead to further reanalysis—a case in point being the reanalysis of -s, -st agreement affixes as spell-outs of agreement features rather than base-generated affixes. We can see from this illustration how an acquirer may never be led to reset a parameter. In this way a syntactic change is initiated in the speech community,17 since the acquirer’s grammar contains one parametric setting which differs from that in the grammar underlying the input data he or she received. The above account is only intended as an outline. In the absence of a developed markedness theory and learnability theory we are not in a position to say how much irregularity or indeterminacy in the input data is enough to cause an acquirer to set a parameter in a way that does not correspond to the setting underlying the input data. However, theoretically motivated work in diachronic syntax can lay some groundwork for markedness and learnability theory by uncovering examples of parametric changes. We can then begin to approach the problems of markedness and learnability inductively.
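The illustration with parameter P can be restated as a small simulation. The Python sketch below is a deliberately crude rendering under stated assumptions: the trigger experience is reduced to a list of string labels, the marker `"disconfirms -F"` stands in for the 'strong positive evidence' mentioned in the text, and a single such datum is taken to suffice for resetting. None of these names or thresholds come from the account itself, which explicitly leaves open how much evidence is needed.

```python
# Hypothetical labels: "+F" = morphological agreement, "-F" = syntactic agreement.
COMMUNITY_VALUE = "+F"  # value of P in the grammar underlying the input data


def acquire(trigger_experience, initial_value="-F"):
    """Return the acquirer's final value for P.

    The acquirer starts from the initial hypothesis and resets it only if the
    trigger experience contains evidence that disconfirms that hypothesis.
    """
    value = initial_value
    for datum in trigger_experience:
        if datum == "disconfirms -F":  # assumed marker for strong positive evidence
            value = "+F"
    return value


def change_initiated(trigger_experience):
    """A change is initiated when the acquirer's value differs from the community's."""
    return acquire(trigger_experience) != COMMUNITY_VALUE


# Input with periphrastic constructions only: P is never reset.
print(change_initiated(["modal + stem", "periphrastic do"]))

# Input that happens to include disconfirming evidence: P is reset to +F.
print(change_initiated(["periphrastic do", "disconfirms -F"]))
```

The sketch only models the instigation of a change in one acquirer; it says nothing about how the new setting spreads through the speech community.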

Notes

* The material in this paper has been presented before audiences at Salzburg Comparative Syntax Festival, USC, MIT and UC Berkeley. Those audiences have all contributed helpful comments. I am also indebted to Joseph Aoun, Mürvet Enç, Osvaldo Jaeggli, George Lakoff, David Lightfoot, David Pesetsky and two anonymous NLLT reviewers for useful discussion of the ideas in this paper. The biggest thanks are due to Nigel Fabb, with whom the central notion of this paper was formulated. All mistakes, of course, are inalienably mine.

1. We take modals to be ordinary verbs. The motive for proposing that modals are members of a separate category of auxiliaries, or are verbs marked [+Aux] is precisely the exceptional properties of modals compared to main verbs illustrated in (1). In this paper we show that these properties derive from two properties: (i) the fact that modals assign no (main) θ-roles, and (ii) a condition requiring verbs with θ-roles to appear in governed positions and verbs with no θ-roles to appear in ungoverned positions (Condition (17)). These two properties together allow us to continue to regard modals as verbs. Since modals clearly were verbs in Middle English, as (2) shows, we do not need to view the historical change as a category change from V to Aux, or as the addition of the feature [+Aux] to the modals.

2. In some dialects of Modern English, sequences of modals are grammatical. I will have nothing to say about those dialects here. Traugott (1972) gives double-modal sequences as one example of a property of Black English taken over from Early Modern English without change.

3. Old and Middle English exhibited the ‘verb-second’ (V2) phenomenon, also found in Modern German and Dutch. In a tensed root clause, a tensed verb must appear in second position. (2a, i) and (2a, iii) exhibit this in Middle English. We assume, following current proposals (den Besten, 1977; Evers, 1981; Haider, 1984; Koopmann, 1983; Koster, 1975; Thiersch, 1978), that a rule of INFL-fronting is involved in the derivation of clauses with V2 order. This rule is fed by a rule moving V into INFL in tensed clauses, cf. section 1.3. Examples (2a, i) and (2a, iii) show that this rule applied indifferently to modals and to main verbs. This situation no longer holds for the only Modern English INFL-fronting process (Subject-Aux inversion, or SAI). SAI can only affect auxiliaries, as (1a) shows. We assume that Middle and Modern English both have the same INFL-fronting rule, and we will see in the course of the paper how the range of application of this rule became restricted. The first restriction was on the range of environments in which the rule applied; the general development of SVO order which replaced the earlier SOV order in Middle English eliminated the V2 phenomenon. This left the type of inversion seen in (2a, ii) intact, but eliminated that seen in (2a, i) and (2a, iii). Section 2 accounts for the second restriction: the loss of the possibility to front main verbs, leaving only Subject-Aux Inversion. This change followed the loss of V-movement to INFL. Cf. section 2.5.

4. Middle English examples are mostly taken from Visser (1963–73). In such examples, the citation is followed by the reference given by Visser, using his abbreviatory conventions, followed by V and the number of the paragraph of Visser the example was taken from. Other examples of Middle and Early Modern English are as cited.

5. I am grateful to an anonymous reviewer for this perspicuous reformulation of (5).

6. The ideas in this section were developed in close collaboration with Nigel Fabb.

7. So nonfinite AGR governs VP but not the subject.

8. If we do not make the simplifying assumption of the text, (20) suggests that the presence or absence of agreement relates to some property other than θ-role assignment. So condition (17) would relate to some condition other than θ-role assignment. In forthcoming work (Roberts (1985)), I suggest that (17) should be stated in terms of selection rather than θ-role assignment. Lack of θ-role assignment is the property that allows aspectuals to appear in INFL. However, (17) is stated in terms of selection, and aspectuals arguably select for properties of VP. This approach involves reformulating the Projection Principle and lexical theory in such a way that selection does not entail θ-role assignment. However, the details of the reformulation and all its implications go beyond this paper’s aims. For this reason, we leave aspectuals aside, in rather unsatisfactory limbo.

9. Note that this is an instance of movement to a ‘complement’ position. The Projection Principle and the θ-criterion generally rule out such movements (e.g., Raising to Object), as they force complements to be θ-marked, and movement to a θ-marked position always violates the θ-criterion. Here the Projection Principle is not violated, however, because affixes have no thematic relation with the stems which are their ‘complements’. Also, it is only possible for a Stem position to be filled at D-Structure by an auxiliary; if a main verb, i.e., a verb with θ-roles, appears there at D-Structure, it will be unable to govern its complements and so the θ-criterion will be violated. In fact, this is the reason why only θ-role-less verbs (i.e., auxiliaries) can appear in INFL at D-Structure.

10. Another property has been correlated with rich agreement morphology in recent work: the prodrop parameter (cf. Chomsky, 1981, Chapter 4; Rizzi, 1982, Chapter 4).
Chomsky suggests that Rule R, which lowers INFL onto V, takes place in the syntax in prodrop languages, and this is what leads to the properties associated with prodrop. This would be consistent with the theory of Verb-movement proposed in the text. We could say that some languages had affix-movement to V instead of V-movement to affix, the latter being what is outlined in the text. Another view has been adopted more recently, by Osvaldo Jaeggli in class lectures. Jaeggli holds that agreement morphology is able to form a kind of clitic-chain with an empty category, pro, in subject position. The agreement morphology has the effect of identifying pro, identification being the main requirement for pro. This proposal makes interesting predictions in conjunction with our theory of Verb-movement to INFL. We can derive the following implications: (i) if there is rich agreement, there will be Verb-movement to INFL, (ii) if there is prodrop, there will be Verb-movement to INFL, (iii) if there is little or no agreement, there will be no Verb-movement. Modern English is consistent with these implications, having little agreement, no Verb-movement to INFL, and no prodrop. Italian and Spanish are also consistent, having rich agreement, Verb-movement to INFL and prodrop. Middle English and Modern German have Verb-movement, rich agreement and no prodrop. Another interesting point in this connection is assignment of Nominative Case. We are assuming that languages with rich agreement lack AGR, so we can ask how Nominative Case is assigned in these languages. For prodrop languages we can say one of two things: either the empty category in subject position of a tensed clause does not need Case (certainly the Case Filter does not require it), or Case is transmitted via the chain formed with the agreement affix in INFL. For languages with no prodrop, i.e., Middle English and Modern German, the question is more acute. We may conjecture that the INFL-fronting process that underlies verb-second may be relevant here (cf. fn. 3 and Koopmann, 1983).

11. ‘deuit’ here is the 3Sg. present form of dowen (OE dugan). This verb was a ME modal, and former preterit-present verb that meant roughly ‘to be fitting’. It died out of the Standard language in late Middle English but survived in certain dialects until the nineteenth century (cf. Lightfoot, 1979, pp. 102–3).

12. According to Visser, who cites other commentators on English, the Modern English subjunctive as seen in examples like (i) is unique to the twentieth century, and most common in American English:

(i) I require that he be there at 8.

In fact, this construction can give further support to our claims about Verb-movement in Modern English. We propose that the complement in (i) contains an empty modal. The empty modal appears in INFL and is selected by the matrix verb. The fact that the ‘subjunctive’ verb always appears in a stem form (with the exception of the fossilized if I/he were) follows automatically, as only stem forms can follow modals. Moreover, overt modals are impossible in subjunctive complements like (i). If the subjunctive is an empty modal, this fact is simply a case of the general prohibition against double-modal sequences, which is itself a consequence of (17). Likewise, the possibility of aspectual have and be in subjunctive complements is explained. The empty modal syntactically governs the verb in (i). We can see that the verb does not move into INFL from the position of clausal negation, which precedes the verb:

(ii) I suggest that he not be there by 8.

In (ii), not is in its normal position, between INFL and VP. Aspectual auxiliaries are able to move into INFL in present-day English (cf. Emonds, 1976; Akmajian et al., 1979). However, they are unable to appear before not in subjunctive complements:

(iii) *I require that he be not there by 8.
(iv) *I require that he have not left before I arrive.

If there is a phonologically null modal in INFL, the impossibility of have/be raising in subjunctive complements is explained. In Middle and Early Modern English, however, the situation was different. At these periods, we find the verb and not in the reverse order in subjunctives: (v) Beware thou that thou bring not my son thither again. (1611, Bible, Gen 24, 6: V 869).

With the verb—not order, assuming that not is always between INFL and V, Verb-movement to INFL must have taken place. In this case, the verb was morphologically governed. The present-day order, on the other hand, shows no evidence of Verb-movement. Instead, the verb is syntactically governed by the empty modal in INFL. This change is a function of the parametric shift from morphological to syntactic government. 13. The subjunctive appeared mainly in the following environments: subject clause (whoever hate his brother . . .), relatives (the properties that a king have), conditionals (thou art dead if thou speak one word), temporal clauses (if and when the need of work allow not such leisures to be taken), purpose clauses (the properties that are required to an argument, that it be full and formal), result


Ian Roberts

clauses (God keep him, that he come not to such a pass), complements to verbs of saying (Ask his father where he be), complements to verbs of fearing (I dread that he become my bane), complements to verbs of wishing (Christ wants that his glory last). For each of these uses of the ME subjunctive, it is possible to find parallel instances of periphrastic constructions with modals. 14. To quote an (almost) contemporary source: In former times, till about the reigne of King Henry the eighth, they [plural forms of Verbs—IGR] were wont to be formed by adding -en thus loven, sayen, complainen. But now (whatsoever is the cause) it has growne quite out of use, and that other so generally prevailed, that I dare not presume to set it a-foote againe. Albeit (to tell you my opinion) I am perswaded, that the lack hereof well considered will be found a great blemish to our tongue. (Ben Jonson, 1637) Henry VIII reigned from 1509 to 1547. So the final loss of plural agreement coincides very closely with the date for the parametric change, which we could put at the mid-to-late sixteenth century. 15. The Adjacency Condition on Case Assignment prevents the adverb from appearing inside VP, intervening between the verb and the NP it Case-marks. Notice how our theory of Verb-movement deals with apparent counterexamples to this condition in Middle English. 16. I should stress at this point that this view of language change has much in common with that given in Lightfoot (1979). The main innovations are due to the incorporation of advances in linguistic theory. In fact, this kind of view was proposed recently by Lightfoot (class lectures, LSA Institute, UCLA, 1983). 17. What we have said only covers the instigation of a change. The spread of a syntactic change through a speech community is presumably governed by constraints like those observed for phonological change by Labov (1972). Note that Labov’s account presupposes the existence of a change.

References

Akmajian, A., S. Steele and T. Wasow: 1979, ‘The Category AUX in Universal Grammar’, Linguistic Inquiry 10(1), 1–64.
Barber, C.: 1976, Early Modern English, Andre Deutsch, London.
Belletti, A. and L. Rizzi: 1981, ‘The Syntax of ‘ne’: Some Theoretical Implications’, The Linguistic Review 2(1), 117–155.
den Besten, H.: 1983, ‘On the Interaction of Root Transformations and Lexical Deletive Rules’, in W. Abraham (ed.), On the Formal Syntax of the Westgermania, John Benjamins, Amsterdam, pp. 47–132.
Chomsky, N.: 1957, Syntactic Structures (Janua Linguarum, 4), Mouton, The Hague.
Chomsky, N.: 1981, Lectures on Government and Binding, Foris Publications, Dordrecht.
Chomsky, N.: 1982, Some Concepts and Consequences in the Theory of Government and Binding, MIT Press, Cambridge, MA.
Culicover, P.: 1976, Syntax, Academic Press, New York.
Ellegård, A.: 1953, The Auxiliary ‘do’: the Establishment and Regulation of its Use in English, Almqvist and Wiksell, Stockholm.
Emonds, J.: 1976, A Transformational Approach to English Syntax: Root, Structure-Preserving and Local Transformations, Academic Press, New York.
Emonds, J.: 1978, ‘The complex V–Vʹ in French’, Linguistic Inquiry 9, 151–175.
Evers, A.: 1981, ‘Verb-Second Movement Rules’, in Wiener Linguistische Gazette 26.
Fabb, N. and I. Roberts: in preparation, The English Auxiliary System.
Gruber, J. S.: 1965, Studies in Lexical Relations. MIT Ph.D. dissertation, distributed by Indiana University Linguistics Club.
Haider, H.: 1984, ‘Topic, Focus, and V-Second’, in GAGL 25.
Jackendoff, R.: 1972, Semantic Interpretation in Generative Grammar, MIT Press, Cambridge, MA.
Jespersen, O.: 1909–49, A Modern English Grammar on Historical Principles, Vols. I–III, Allen and Unwin, London.
Jespersen, O.: 1938, Growth and Structure of the English Language, Allen and Unwin, London.
Jonson, Ben: 1637, English Grammar (edited by A.V. Waite, 1909).
Koopmann, H.: 1983, The Syntax of Verbs: From Verb-Movement Rules in the Kru Languages to Universal Grammar. Unpublished Ph.D. dissertation, McGill University.
Koster, J.: 1975, ‘Dutch as an SOV language’, Linguistic Analysis 1, 111–136.
Labov, W.: 1972, Sociolinguistic Patterns, University of Pennsylvania Press, Philadelphia, PA.
Lieber, R.: 1980, On the Organization of the Lexicon, Unpublished Ph.D. dissertation, MIT.
Lightfoot, D. W.: 1974, ‘The Diachronic Analysis of English Modals’, in Anderson, J. and C. Jones (eds.), Historical Linguistics, Proceedings of the First International Conference on Historical Linguistics, Amsterdam, North Holland.
Lightfoot, D. W.: 1979, Principles of Diachronic Syntax, Cambridge University Press, Cambridge, England.
Manzini, M. R.: 1983, Restructuring and Reanalysis, Unpublished Ph.D. dissertation, MIT.
Marantz, A.: 1981, On the Nature of Grammatical Relations, Unpublished Ph.D. dissertation, MIT.
Palmer, F. R.: 1974, The English Verb, Longmans, London.
Pullum, G. and D. Wilson: 1977, ‘Autonomous Syntax and the Analysis of Auxiliaries’, Language 53(4), 741–789.
Rizzi, L.: 1982, Issues in Italian Syntax, Foris Publications, Dordrecht.
Roberts, I. G.: 1985. The Representation of Implicit and Dethematized Subjects. USC Ph.D. dissertation.
Safir, K.: 1982, ‘Inflection-Government and Inversion’, The Linguistic Review 1, 417–467.
Selkirk, E.: 1982, The Syntax of Words, MIT Press, Cambridge, MA.
Sproat, R.: 1983, ‘VSO Languages and Welsh Configurationality’, in MIT Working Papers in Linguistics, Vol V.
Steele, S. et al.: 1981, An Encyclopedia of AUX. A Study of Cross-Linguistic Equivalence, MIT Press, Cambridge, MA.
Thiersch, C.: 1978, Topics in German Syntax, Unpublished Ph.D. dissertation.
Traugott, E.: 1969, ‘Diachronic Syntax and Generative Grammar’, in R. Lass (ed.), Approaches to English Historical Linguistics, Holt, Rinehart and Winston, New York.
Traugott, E.: 1972, A History of English Syntax: A Transformational Approach to the History of English Sentence Structure, Holt, Rinehart and Winston, New York.
Travis, L.: 1984. Parameters and Effects of Word Order Variation. Unpublished Ph.D. dissertation, MIT.
Visser, Th.: 1963–73, An Historical Syntax of the English Language, Vols. I–IIIb, E. J. Brill, Leiden, Holland.
Warner, A.: 1983, ‘Review of D. W. Lightfoot: Principles of Diachronic Syntax’, Journal of Linguistics 19, 187–209.
Williams, E.: 1980, ‘Predication’, Linguistic Inquiry 11(1), 203–238.
Williams, E.: 1981, ‘On the Notions ‘Lexically Related’ and ‘Head of a Word’,’ Linguistic Inquiry 12(2), 245–274.
Zubizarreta, M. L.: 1982, On the Relation of the Lexicon to Syntax, Unpublished Ph.D. dissertation, MIT.
Zubizarreta, M. L.: 1983, ‘The Relation of Morpho-Syntax to Morpho-phonology: The case of Romance Causatives’, Mimeographed, MIT.

2

A Computational Model of Language Learnability and Language Change

Robin Clark and Ian Roberts

1. Introduction

Darwin’s (1859) theory of natural selection had an important influence on the Neogrammarians. Like Darwin, they believed that diachronic change was the result of selective pressures on organisms from the environment operating on random variation within a population (see Haldane 1990 for a classic exposition of natural selection as the motive force underlying evolution). Darwin proposed that natural selection was accounted for by the greater reproduction rates of fitter organisms; in the linguistic realm, Paul (1920) proposed that language change is driven by restructuring of the target grammar that may take place during language acquisition. If the input to language acquisition is taken to be the environment and if language acquisition is taken to be the linguistic correlate of biological reproduction, a clear parallelism between Darwin’s view of natural selection and Paul’s view of the selection of grammars emerges. Despite the appeal of this notion, no successful evolutionary theory of the relationship between language acquisition and language change has been developed in the 130 years since Darwin’s On the Origin of Species. The purpose of this article is to relate natural selection, language acquisition, and language change in light of current computational models of learning. The basic problem for the hypothesis that language change is driven by acquisition concerns the relationship between the adult input, which is generated by one grammar, and the learner’s hypotheses, which may differ at certain points from the adult grammar. We have grown accustomed to thinking of acquisition as a relation between linguistic experience and a target grammar; the learner must converge to a single target grammar in order for learning to be considered successful (see Gold 1967, Osherson, Stob, and Weinstein 1986).
Although this idealization has proven useful in the study of the logical problem of language acquisition, it renders opaque the relationship between language acquisition and language change. If each generation converges successfully to the adult grammar, how can languages ever change? One would expect them to remain forever fixed since change entails that there must be at least one generation whose grammar differs from its parents’ grammar; yet, by definition, this generation would have misconverged. We can easily state the problem in terms of parameter setting. Acquisition is a process of accurately fixing parametric values. That is, the learner sets parameter pn to the value vi in response to some property, ci, of the input text; the usual idealization states that the learner has successfully converged to the value vi for the parameter pn if the target grammar has pn set to vi. Language change, on the other hand, presupposes that a population must converge on a value vi for at least one parameter, p, where the adult grammar has p(υj) and υi ≠ υj. Strictly speaking, the learner has failed to learn. More puzzling still, the property ci of the input text that allowed adults to induce pn(υi) when they were learning the language should be present in the speech that they, in turn, address to children. How is it that, for one generation, property ci causes learners to hypothesize pn(υi) whereas in a succeeding generation it loses its causal force? We will argue that the question of how parametric change can take place given reasonable constraints on learnability is fundamental both for understanding language acquisition and for understanding language change. Indeed, the logical problem of language change cannot be separated from the logical problem of language acquisition; one of the claims of this article is that the former problem is a subcase of the latter (see Lightfoot and Hornstein 1981) in that the answer reduces to the relation between property ci, the structure of the learner, and pi (the same point has been made by Lightfoot (1991)). We will formalize this problem in light of current thinking on language learnability; doing this elucidates both the processes that underlie diachronic change and those that drive learning.
The result is of importance for an understanding both of language acquisition and of diachronic change.1 A central problem for acquisition theory is that of characterizing how the learner formulates and retracts hypotheses in light of its linguistic environment. Equally, one of the central problems for language change concerns how a population of learners can converge on a grammar that is systematically different from the adult grammar in the sense defined above. In both cases, hypothesis formation and retraction by learners appear to be the crucial mechanisms.

We will adopt the genetic algorithm approach to learnability developed in Clark 1990, 1992.2 This approach treats learning as a special case of natural selection. In what follows, we will show how to encode the learner’s hypotheses about the target sequence of parameter settings as “bit strings” (that is, strings of 0s and 1s) that serve to enumerate not only hypotheses but also, by extension, grammars and parsing devices. These bit strings, then, can be treated like genetic material that specifies grammatical “phenotypes” that may be expressed by parsing devices. These parsing devices are then run against an input text, and their relative fitness is measured by a simple metric. Those hypotheses that are judged most fit are then combined via a special mating operation; in other words, we will literally allow hypotheses to mate and thereby produce “offspring” hypotheses that share genetic material (subsequences of bit strings) of both parents. Since the mating operation prefers the most fit hypotheses, this technique allows the learner to search the hypothesis space efficiently while optimizing the learner’s computational resources.

The genetic algorithm technique presupposes that the input text expresses each parameter with sufficient frequency that the learner’s hypotheses are placed under pressure to bear that parameter setting. Hypotheses that carry a parameter value corresponding to a parameter setting frequently expressed in the input text will be strongly selected for by the fitness metric. As a result, hypotheses containing “favorable” parameter settings will tend to reproduce more frequently, whereas the “unfavorable” setting will disappear from the population, where favorable simply means ‘better able to parse the input’. If, on the other hand, a parameter is not expressed frequently in the input text, the learner will be under less pressure to set that parameter in accordance with the target setting. In this case, the fitness metric will not be decisive in driving the learner toward the target setting, so that either the correct setting or the incorrect setting can survive in the linguistic environment. The fitness metric, which we will describe in detail below, plays a crucial role in mediating between the learner and the input text. Implicit in this discussion is the notion that relative fitness determines convergence; the learner converges to the most fit hypothesis relative to the input text even if this grammar differs from the adult state for the values of some parameters.

We will propose that parametric change occurs when the target of acquisition contains parameter values that cannot be uniquely determined on the basis of the linguistic environment.
This can occur when the evidence presented to the learner is formally compatible with a number of different, and conflicting, parameter settings. In these cases the learner must evaluate its hypotheses using criteria that are not purely a response to the external environment; in particular, the learner must consider factors like the Subset Condition (Berwick 1985) and elegance of derivations (the least effort strategy; Chomsky 1991). Thus, the consideration of language change from a learnability perspective gives us access to how learners evaluate the relative merit of their hypotheses. Our goal here will be to characterize, in a precise manner, the conditions under which a learner arrives at a grammar distinct from the target, thus fueling diachronic change. Moreover, this approach reduces the logical problem of language change to the logical problem of language acquisition by relating both to the question of how learners set parameters to particular values. Intuitively, our argument will be that, because of various factors, the input data do not put pressure on the learner to set certain parameters to a definite value; several alternative grammars can adequately account for the input stream; the appropriate choice of grammar is underdetermined by the linguistic environment, even given the learner’s rich internal structure. Since external pressures do not force the learner to select a particular grammar, it will turn in on itself, abandoning external pressure, and rely on its own internal structure to select from the alternatives at hand. If this is correct, then diachronic change can provide crucial information on those factors that learners rely on to select hypotheses. Since the external environment is not decisive in these cases, diachronic change reflects pure learnability considerations. Thus, diachronic change reflects what is, in a sense, “pathological” learning, and so a careful study of its properties can reveal a great deal about how learning transpires in nonpathological cases (a similar idea is developed for phonological change by Kiparsky (1982)).

We will argue that parametric change can involve a variety of factors. Change in one component (for example, the phonology) can obscure syntactic parameter expression. The resulting text will not uniquely drive the learner toward the target. At this point the learner appeals to the fitness metric to select an appropriate parameter setting, and factors such as the Subset Condition or general economy of representations come into play rather than pure selective pressure from the input text. This type of change is exemplified by the introduction of subject clitics in 15th-century French. A second important factor is instability due to independent parametric changes within a component; change in one parameter setting can trigger a number of changes to other parameter settings. As we will show, parametric change in 16th-century French provides a case study on how parametric change can cascade through a system (see Roberts 1993). During this period, French ceased to be both a null subject language and a verb-second (V2) language. We will show that, because of innovations in the 15th century, the system became unstable, and deep parametric change was forced on the learner via the fitness metric.
Fundamental to this analysis is the formalization of the notion of stability relative to a particular parameter setting: a parameter setting is stable to the degree that its expression in the input data is unambiguous. Following Clark (1990, 1992), we will say that a parameter value, p(υj), is expressed by an input sentence, si, just in case a grammar must have p set to value υj in order to assign a well-formed representation to si (see section 2.4). We should note that this does not mean that the parameter is set by raw data; rather, parameter expression defines a class of representations that are compatible with the current input sentence and the parameter values that those representations entail. An unstable parameter setting, then, is one whose expression is ambiguous. We will show that, through a variety of independent changes, 16th-century French became highly unstable, resulting in the loss of null subjects and V2 phenomena. The article is organized as follows. In section 2 we discuss the formal and conceptual underpinnings of the learning theory. In section 3 we apply the learning theory to a particular case of change. Finally, in section 4 we discuss some of the consequences of the current approach for the theory of learning and change.


2. Genetic Algorithms and Language Learnability

The basic problem faced by a language learner is to discover a target grammar based on a plausible input text.3 A principles-and-parameters (see Chomsky 1981) approach to grammar provides a powerful way of limiting the problem of discovering the appropriate target grammar given the impoverished nature of the input data. Parameters can be viewed as finite vectors along which natural languages may vary; the learner is faced with the problem of searching a finite space of possible grammars rather than the more difficult problem of inducing a set of rules that lies at an undetermined point in an infinite hypothesis space. Learning theory must provide an account of how the learner’s search through the set of possible combinations of parameter values takes place, and of how certain values are chosen over others. We believe, with Lightfoot (1979), that such an account should give a solution to the logical problem of language change. In this section we will describe in detail our account of how the learner searches through the available parameters and fixes their values. The approach is based on the notion of a genetic algorithm (Holland 1975, Goldberg 1989, Clark 1990, 1992). Genetic algorithms model the basic process of natural selection in the biological world: how certain patterns of genetic material are more adapted to their environment (i.e., fitter) than others, and hence tend to reproduce at the expense of the others. Our account of language learning is analogous: the input text is the analogue of the environment, and so “fitness” means consistency with this; parameter settings correspond to the genetic material of the biological world (and so a whole grammar would be a genome). Successful combinations of parameter settings “reproduce” (i.e., contribute to the formation of new hypotheses about the target grammar) at the expense of others.
In this way, the learning mechanism gradually eliminates “unfit” hypotheses (those that are not consistent with the input text) and arrives at a single fittest grammar. Since nothing in the approach requires this grammar to be consistent with the one that underlies the input text, learners may arrive at final-state systems that differ from those of their parents; this, in essence, is our solution to the logical problem of language change.

2.1 The Nature of the Learning Problem

It is possible to see the learner as a relation between input data and a sequence of parameter values (see Clark 1990, 1992). More precisely, we can view the learner as a function from input texts to parameter values, as in (1).

(1) φ(σi) = x1, x2, …, xn

Here the learner is the function φ that applies to an arbitrarily selected text, σi, and gives a sequence of n parameter values, x1, x2, …, xn. Given a sequence of parameter values, we can imagine that a special compiling function, ϕn, maps the sequences of parameter values onto a grammar, Gi, for the input text σi. We can further define a function γ, which, given a grammar Gi, returns a parsing device Pm for the grammar Gi. Thus, we can view learning as a relation between inputs and parsing devices. This is important, since the notion of fitness with respect to input texts is most naturally defined in terms of the number of failed or successful parses of those texts. We will discuss how this is done below. Putting the above together, the learning situation is as described in (2).

(2) γ[ϕn(φ(σi))] = Pm

In considering the learning problem, it is important to recall that the learner is computationally bounded. In other words, the learner has finite resources in terms of time and memory. It cannot take indefinite periods of time before converging to the target grammar, nor does it have a perfect memory for past sequences in the input text or past (unsuccessful) hypotheses. Furthermore, the learner is given little information about the proper analysis to be accorded to the input data. It has only limited information about the proper structural analysis for any given datum, and little to no access to input that is ill formed with respect to the target. The claim that the hypothesis space, under a principles-and-parameters approach, is finite is not, in itself, sufficient to guarantee that the learner can converge in a reasonable amount of time. Finite problems can be sufficiently large that their solution might take an impractical amount of time to compute. Suppose, for example, that the hypothesis space is determined by 30 binary parameters. In this case there are 2^30, or 1,073,741,824, possible grammars.
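As a rough check on the size of such a space, the following Python sketch computes the number of grammars defined by n binary parameters and the worst-case duration of an exhaustive search. The function names, the Julian-year constant, and the default rate of one grammar per second are illustrative assumptions, not part of the model itself.

```python
# Sketch only: names and the year length are assumptions made for illustration.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year


def grammar_count(n_binary_parameters: int) -> int:
    """Number of distinct grammars defined by n binary parameters."""
    return 2 ** n_binary_parameters


def worst_case_years(n_binary_parameters: int, grammars_per_second: float = 1.0) -> float:
    """Years needed to test every grammar once at the given rate."""
    seconds = grammar_count(n_binary_parameters) / grammars_per_second
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(grammar_count(30))               # size of the hypothesis space
    print(round(worst_case_years(30), 2))  # exhaustive search time in years
```

Each additional binary parameter doubles both figures, which is why exhaustive enumeration quickly becomes impractical.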
If the learner could test each of these grammars at the rate of one per second, it might in the worst case take the learner over 34 years to converge on the target. Clearly, the learner must be capable of searching the hypothesis space in a more efficient manner. Beyond efficiency considerations, it is clear that the learner cannot use a brute-force search technique to converge on the target since certain parameters may fall into subset relations; allowing 0 to stand for the negative value of a parameter and 1 to stand for the positive value, we can indicate as in (3) that the language that results when a certain parameter, px, is set to 0 is a proper subset of the language that results when px is set to 1. All the sentences that are grammatical in the subset language will also be grammatical in the superset language. If the learner guesses the superset language, then no further evidence will contradict its hypothesis. Thus, the learner will never have grounds to retract this (incorrect) hypothesis. Consequently, the learner must guess the minimal language compatible with the input sequence σi. Given that the learner has no reliable access to negative evidence, it appears that the learner must guess the smallest possible language compatible with the input at each step of the learning procedure. This is, in essence, the Subset Condition proposed by Berwick (1985), which is intended to circumvent the sort of trap posed by subset parameters.

(3) L[p1, …, px−1, px(0), px+1, …, pz] ⊂ L[p1, …, px−1, px(1), px+1, …, pz]
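The Subset Condition can be illustrated with a minimal Python sketch, assuming that languages are small finite sets of sentence types. The toy languages, the grammar labels, and the function name `minimal_compatible` are hypothetical; they only show how a learner without negative evidence would pick the smallest language compatible with its input.

```python
# Toy illustration: the languages and labels below are invented for this sketch.
LANGUAGES = {
    "px=0": frozenset({"S V O"}),            # subset language
    "px=1": frozenset({"S V O", "O S V"}),   # superset language
}


def minimal_compatible(languages, observed):
    """Return the grammars whose language contains every observed sentence
    and has no compatible proper subset (the Subset Condition, simplified)."""
    compatible = {g: lang for g, lang in languages.items() if observed <= lang}
    return [
        g for g, lang in compatible.items()
        if not any(other < lang for other in compatible.values())
    ]


if __name__ == "__main__":
    # Input consistent with both grammars: the subset grammar is chosen.
    print(minimal_compatible(LANGUAGES, {"S V O"}))
    # Input that only the superset grammar can generate.
    print(minimal_compatible(LANGUAGES, {"S V O", "O S V"}))
```

Choosing "px=1" on the first input would never be contradicted by later positive evidence, which is the trap the condition is meant to avoid.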

A further possibility arises if we consider that sets of parameters might interact in such a way as to generate superset languages. That is, when considered individually, the parameters in question may not necessarily generate superset languages, but when they act in a group, they do generate a superset language. This is the shifting relation observed by Clark (1990):4

(4) Shifting
Two parameters, xi and xj, cause a shift at values xi(1) and xj(1) just in case:
a. L[ϕn(x1, …, xi(1), …, xj(0), …, xn)] ⊄ L[ϕn(x1, …, xi(0), …, xj(1), …, xn)]
b. L[ϕn(x1, …, xi(0), …, xj(1), …, xn)] ⊄ L[ϕn(x1, …, xi(1), …, xj(0), …, xn)]
c. L[ϕn(x1, …, xi(1), …, xj(0), …, xn)] ⊂ L[ϕn(x1, …, xi(1), …, xj(1), …, xn)]
d. L[ϕn(x1, …, xi(0), …, xj(1), …, xn)] ⊂ L[ϕn(x1, …, xi(1), …, xj(1), …, xn)]

In other words, a shift occurs given two parameters that generate superset languages when they are both set to some particular value. Notice, crucially, that if the language generated by setting xi to 0 is a subset of the language generated by setting xi to 1, this relationship is preserved in the shifted language. In brief, a learner could obey the Subset Condition on the microscopic level (with respect to a single parameter) while violating it on the macroscopic level (due to shifting interactions between parameters). In order for the learner to avoid these higher-level violations of the Subset Condition, it would have to calculate interactions between parameter settings. But this would become increasingly difficult as the number of parameters that could “conspire” to generate a shifted language increased; given n parameters, the learner may have to consider n! possible interactions. The graph in (5) illustrates a case of shifting that involves superset parameters. In this example we have two parameters p1 and p2 that interact to generate a shifted language, L[p1(1), p2(1)].
In (5) dominance indicates the subset/superset relation.

(5) [Graph: L[p1(1), p2(1)] dominates both L[p1(0), p2(1)] and L[p1(1), p2(0)], each of which dominates L[p1(0), p2(0)].]

In this case both p1 and p2 are superset parameters; any language with p1 set to 0 is a subset of a language with p1 set to 1, and any language with p2 set to 0 is a subset of a language with p2 set to 1. Note that L[p1(1), p2(0)] and L[p1(0), p2(1)] are not in the superset relation with each other. The language L[p1(1), p2(1)], however, properly contains the other three possible options. As we will show, the learner will be reluctant to posit the language L[p1(1), p2(1)] and will only do so if faced with a significant amount of empirical prodding in the form of failed parses. A more difficult case is illustrated in (6). (6)

[Graph nodes: L[p1(1), p2(1)]; L[p1(1), p2(0)]; L[p1(0), p2(1)]; L[p1(0), p2(0)]]

In this case only one of the parameters, p1, is a superset parameter. One might imagine that p1 regulates the option of having left-dislocation of a constituent. The parameter p2 does not generate languages in the superset relation. For example, one might take p2 to be a parameter that regulates V2 phenomena in matrix clauses. Suppose that p1 and p2 interact in such a way that, when both are set to 1, the language allows left-dislocation of a constituent over the V2 structure of the root clause; the resulting language has all of the normal V2 orders plus clauses with an additional constituent left-dislocated before the normal V2 order. Such a language would be a shifted language. Take the case where the target language is V2 without left-dislocation. Suppose that the learner, during an early phase of the learning cycle, erroneously sets p1 to 1, allowing left-dislocation of an NP (or DP) in response to the presence of nonsubject NPs/DPs in clause-initial position. This hypothesis, however, is inadequate to account for all the root V2 orders that the learner encounters (for example, those with initial adverbials and also possibly those with initial NPs/DPs without a resumptive pronoun). In response, the learner sets p2 to 1, allowing for the possibility of V2, but does not reset p1 to 0. In this case the learner has now entered a shifted language; because of the interaction between p1 and p2, all the target orders will be consistent with the learner’s hypothesis, which, nevertheless, overgenerates. We will show that such a hypothesis will be selected against in such a way that the learner can retract its overgeneral hypothesis without access to direct negative evidence. Such a shifted language, although a possibility empirically, will tend to be unstable diachronically, with one of the two superset possibilities, V2 or left-dislocation, being quickly lost. Notice that a learner will have two analyses available for “V3” structures (structures with two constituents before the tensed verb); either such a structure involves left-dislocation with a standard V2 structure, as in (7a), or it involves simple left-dislocation, as in (7b).

(7) a. [CP DP [CP DP [C′ V [IP . . .]]]]
b. [CP DP [IP DP V . . .]]

We will argue that (7b) involves a simpler syntactic analysis, with shorter chains, than (7a). Thus, the learner will tend to prefer the hypothesis that allows (7b) over one that requires the analysis in (7a). We will discuss further how this factor of “elegance” influences parameter setting below. In our discussion of the data from French in section 3, we will present some cases of this type.
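The shifting relation in (4) can be checked mechanically on toy data. The Python sketch below assumes that each language is a finite set of word-order patterns; the patterns, the variable names, and the reading of ⊄ as "is not a subset of" are illustrative assumptions rather than part of the definition.

```python
# Toy word-order inventories, invented for illustration only.
L10 = frozenset({"S V O", "O S V O"})                  # p1 = 1 (left-dislocation), p2 = 0
L01 = frozenset({"S V O", "Adv V S O", "O V S"})       # p1 = 0, p2 = 1 (V2)
L11 = L10 | L01 | frozenset({"O Adv V S", "O O V S"})  # both set to 1: adds "V3" orders


def is_shift(l10, l01, l11):
    """Conditions (4a-d): neither single-parameter language contains the
    other, and both are proper subsets of the jointly set language."""
    return (
        not (l10 <= l01)   # (4a)
        and not (l01 <= l10)   # (4b)
        and l10 < l11          # (4c)
        and l01 < l11          # (4d)
    )


def overgenerated(l10, l01, l11):
    """Patterns licensed only when both parameters are set to 1."""
    return l11 - (l10 | l01)


if __name__ == "__main__":
    print(is_shift(L10, L01, L11))
    print(sorted(overgenerated(L10, L01, L11)))
```

In this toy setting, a learner whose target is `L01` but who has hypothesized `L11` can parse every target sentence, so only the extra patterns returned by `overgenerated` distinguish the two hypotheses.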
2.2 Genetic Algorithms Clark (1990, 1992) proposes that genetic algorithms provide a computational model of learning for a principles-and-parameters theory that circumvents the problems discussed in section 2.1 while accounting for the relationship between input evidence and parameter setting. Genetic algorithms mimic natural selection by representing hypotheses about a problem in a way that is similar to the way in which genetic material is represented. Hypotheses are then tested against the problem space, with the most fit hypotheses contributing to the formation of new hypotheses via reproduction (the combination of preexisting hypotheses to form new hypotheses in a way that is similar to the biological recombination of DNA present in mating). By “breeding” the most fit hypotheses, testing them against the problem space, and pruning the least fit, a genetic algorithm can efficiently search large spaces and find optimal solutions.5 More precisely, a genetic algorithm defines a number of automatic mechanisms for combining hypotheses that

48 Robin Clark and Ian Roberts are, in some sense to be defined below, “fit.” These mechanisms, which simulate breeding or reproduction, produce new hypotheses that are likely to replicate the advantageous properties of existing hypotheses while eliminating those properties that are ill adapted to the environment (in our case, the sequence of input sentences that the learner encounters). By repeating this process over successive “generations” of hypotheses, the learner is able to approximate the target sequence of parameter settings.6 A genetic algorithm consists of the following components: • A representation of hypotheses in terms of strings, similar in structure to genetic material. In our case we will encode sequences of parameter values as strings of binary numbers. • A set of reproduction operators that combine or alter existing “parent” hypotheses in order to produce new “offspring” hypotheses. Reproduction will be based on the performance of the hypotheses relative to the input stream; those that perform best will reproduce most prolifically. Furthermore, since reproduction is based on existing hypotheses, the search of the hypothesis space is highly constrained and not random (see Holland 1975 and Goldberg 1989 for careful discussions of these points): 1. A crossover mechanism. This mechanism combines two hypotheses and produces a new hypothesis by combining parts of each of the parents’ genetic material. 2. A mutation operator. This mechanism randomly alters an offspring’s genotype to produce a new hypothesis close to, but not identical with, the parents’ genetic endowment. • A measure of fitness of hypotheses in terms of their performance in an environment. The fitness metric defines how well adapted hypotheses are to their environment. In our case the fitness metric mainly measures success in parsing the input text (although it does contain other factors, as we will show). 
Most crucial for our purposes are the representation of hypotheses in terms of strings and the notion of a fitness metric. Let us first turn to a more careful consideration of the representation of hypotheses. It is common to think of parameters as variables in Universal Grammar that range over a limited set of values. The bounding nodes for classical Subjacency (Chomsky 1977) provide a good example of such a parameter. Here Subjacency is taken as an invariant property of natural languages whereas the bounding nodes may be contingently selected from a restricted set: (8) Subjacency No rule may involve X and Y in the configuration: . . .X. . . [α . . .[β . . .Y. . . β]. . .α] . . .

(order irrelevant) where α and β are bounding nodes; α, β ∈ {NP, IP, CP}. Parameters can equally be viewed as variant properties of natural languages; in other words, a parameter can be thought of as a descriptive statement that may be either true or false of a given grammatical system. From this perspective, we could rewrite the parameter for the bounding nodes as a series of three statements: (9) a. IP is a bounding node for Subjacency. b. CP is a bounding node for Subjacency. c. NP is a bounding node for Subjacency. The learner’s task would be to scan the input data and attempt to assign truth-values, 1 for true and 0 for false, to each of the above propositions. The learner’s hypotheses could then be taken as strings of 0s and 1s corresponding to the truth-value associated with each parameter. For example, the string 100 could correspond to the hypothesis that IP is a bounding node for Subjacency, but neither CP nor NP is. Thus, it is relatively natural to represent parameter settings in terms of strings. Notice that this binary representation of sequences of parameter values serves both to encode grammars as binary numbers and to enumerate the set of possible natural languages (see Clark 1992). Crucially, given the above method of encoding parameter sequences, we must be capable of recovering the grammars and parsing devices that these encodings represent. This is crucial because fitness will be measured in terms of the performance of parsers relative to a stream of input data; the actual algorithm, however, will operate on the string representation of the hypotheses. We must, then, have a translation function that relates our hypotheses (strings) to the parsers that they represent, as shown in (10): (10)

[Diagram with the following components: Input stream; Genetic algorithm; Population of hypotheses; Translation function; Population of parsing devices]
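The string encoding introduced with (9) can be illustrated with a minimal Python sketch. It assumes the parameter order IP, CP, NP given in (9); the names `PARAMETERS`, `decode`, and `all_hypotheses` are hypothetical and serve only this illustration.

```python
from itertools import product

# Parameter order follows (9): IP, CP, NP as bounding nodes for Subjacency.
# All names here are illustrative assumptions, not part of the original model.
PARAMETERS = ("IP", "CP", "NP")


def decode(hypothesis: str) -> dict:
    """Map a bit string such as '100' onto truth-values for the statements in (9)."""
    if len(hypothesis) != len(PARAMETERS) or set(hypothesis) - {"0", "1"}:
        raise ValueError(f"expected {len(PARAMETERS)} binary digits, got {hypothesis!r}")
    return {name: bit == "1" for name, bit in zip(PARAMETERS, hypothesis)}


def all_hypotheses(n_params: int = len(PARAMETERS)) -> list:
    """Enumerate every possible parameter string (2**n_params of them)."""
    return ["".join(bits) for bits in product("01", repeat=n_params)]


print(decode("100"))          # IP is a bounding node; CP and NP are not
print(len(all_hypotheses()))  # size of this three-parameter hypothesis space
```

The enumeration makes concrete the remark that the binary representation both encodes grammars and lists the possible languages: with three binary parameters the space contains eight strings.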

In fact, we have already defined all the machinery needed to accomplish the above. We conceive of the learner, φ, as operating on strings of parameter settings; thus, φ is the set of reproduction operators in the genetic algorithm. The translation function in (10) then maps the learner’s hypothesis strings onto parsing devices; in other words, the translation function is comparable to the

functions ϕn, which maps sequences of parameter settings onto grammars, and γ, which maps grammars onto parsers. In a sense, the hypothesis strings represent genotypes for parsing devices whereas the translation function (ϕn and γ) maps genotypes onto phenotypes. Overlying all of this is the fitness metric, which guides the learner’s application of the reproduction operators. The crossover operator combines two hypothesis strings to create new hypotheses. For example, suppose that the two hypotheses in (11) have been selected for reproduction. (11) a. 000111 b. 101000 Now suppose we “cut” both strings after the third position in the bit string: (12) a. 000―111 b. 101―000 The first part of string (12a) is then recombined with the second part of string (12b), and the first part of string (12b) is recombined with the second part of string (12a): (13) a. 000―000 b. 101―111 And thus two new “offspring” hypotheses that have inherited genetic material (hypotheses about settings of particular parameters) from each parent are created. It should be noted that fitness interacts in a crucial way with the crossover operation. Highly fit hypotheses are more likely to be selected to take part in crossover and therefore are more likely to pass the parameter settings that made them fit on to new generations of hypotheses. The mutation operator similarly creates new hypotheses on the basis of existing ones. In essence, it must slightly alter a hypothesis string in order to create a new, but “nearby,” hypothesis. We can do this by flipping a randomly selected bit position in a hypothesis string by the following rules: (14) a. 0→1 b. 1→0 Thus, selecting the second position of the following hypothesis for mutation would yield a “mutant” that is nearly identical to its parent structure: (15) 000111→010111 The mutation operation can be viewed as a means of searching the immediate hypothesis space surrounding a parameter string. Thus, the learner can,

in a sense, experiment with near-optimal hypotheses that approximate, but do not correspond to, the target. In terms of an actual parsing framework, there would be a fixed central algorithm, corresponding to UG. Within this algorithm would be various flags, indicating points where code must be inserted for the parser to function. The 0s and 1s in the hypothesis strings could be interpreted as pointers to the parameterized code. Upon receiving a hypothesis string, the machine would look up the various pieces of code indicated by the 0s and 1s and systematically substitute the code it finds for the flags in the main algorithm. The result would be a special parsing device designed to analyze the language enumerated by the hypothesis string. Thus, a “self-constructing” parser would be the ensemble of the core algorithm, the parameterized code, and a learning device that would select the appropriate hypothesis string in response to the input text. We then have a straightforward model of the translation function required by the genetic algorithm to relate hypothesis strings to parsing devices. Recall that this translation function, itself, corresponded to the functions γ and ϕn in the formalization of the learning problem, above. 2.3 Fitness Having shown how hypotheses can be represented in terms of strings and how these can be combined systematically to form new hypotheses, we still face the problem of defining the relative fitness of a hypothesis with respect to a linguistic environment. Ultimately, we want the learner to become better able to represent the input data. In other words, the learner should change its hypothesis on the basis of evidence from the external environment, and its new hypothesis must be better able to account for this evidence. In some sense, new hypotheses must be an improvement over the old hypotheses.
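The crossover and mutation operators of section 2.2, illustrated in (11)–(15), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: hypotheses are plain strings of 0s and 1s, positions are zero-based in the code (so the "second position" of the text is index 1), and the function names are hypothetical.

```python
import random


def crossover(parent_a: str, parent_b: str, cut: int) -> tuple:
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(hypothesis: str, position: int) -> str:
    """Flip the bit at a zero-based position, following the rules in (14)."""
    flipped = "1" if hypothesis[position] == "0" else "0"
    return hypothesis[:position] + flipped + hypothesis[position + 1:]


def random_mutation(hypothesis: str, rng: random.Random) -> str:
    """Flip one randomly selected bit position, as the text describes."""
    return mutate(hypothesis, rng.randrange(len(hypothesis)))


print(crossover("000111", "101000", cut=3))  # the offspring in (13)
print(mutate("000111", position=1))          # the mutant in (15)
```

Keeping `mutate` deterministic and putting the random choice in a separate wrapper is a design choice of this sketch; it makes the example in (15) reproducible.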
Clark (1990) provides a crude definition of improvement based on the ability to parse input sentences in terms of failed parses. We will modify his treatment by supposing that the crucial property of a failed parse is that it violates at least one principle of core grammar.7 In particular, we will suppose that a parser consists of a number of modules (Case, binding, X-bar theory, and so on) that operate in tandem to produce a full syntactic representation. When a principle in one of these modules is violated, when the current grammar cannot assign a well-formed representation to some input, the offended component will signal a violation. With this in mind, we adopt the following notion of improvement of one hypothesis with respect to another: (16) A hypothesis A is an improvement over a hypothesis B if, given an input datum, si, A signals m violations of core grammatical principles while B signals n violations and m < n.

Intuitively, a parser that signals 3 violations on a parse is rather better than one that signals 4 violations, and a parser that signals 2 violations is superior to one that signals 3. Crucially, parsers need not perform perfectly in order for the performance to be distinguished. We will suppose, then, that the various modules of the parser are connected to a summation function, Σ, as shown in (17). (17)

[Diagram: the modules X-bar, θ, Case, Binding, Bounding, and ECP, each connected to the summation function ∑]

Each module can signal a violation to the function Σ, which then sums up the number of violations and passes the number on to the learner. Notice that the learner has no access to which grammatical principles have been violated; it only receives a number representing the sum of the violations for each parse. As noted above, the learner must be able to distinguish between hypotheses that generate a superset language and those that do not. If a superset hypothesis and a subset hypothesis can both account for an input datum, then, all things being equal, the learner should prefer the latter to the former. Thus, any fitness metric should be such that it generally rates a subset hypothesis more highly than a superset hypothesis just so long as the subset hypothesis is empirically adequate (does not fail to parse the input data). Finally, we will assume that the learner can take into account the overall “elegance” of its hypotheses. That is, the learner will, all else being equal, prefer hypotheses that lead to more compact representations. Compactness, here, can be defined in terms of such factors as the number of nodes required to cover the input string, the length of the chains associated with arguments and operators, or both. For the moment we will assume that the measure of elegance is a raw node count from each parse. With these factors in mind, we suggest the following as a fitness metric, defined over a population of parsing devices relative to an input sentence (see Clark 1990 for an earlier version of this metric). It should be noted that hypotheses are judged indirectly by means of the parsing devices that they determine, just as a genotype is judged through its expression as a

phenotype. In particular, the learner has no information about why certain hypotheses perform better than others, only that certain hypotheses do, in fact, perform better. In assessing the performance of hypotheses, the fitness metric will consider a number of different factors. Above all, it will consider raw success or failure to parse; other factors, like subset relations and elegance of representation, are also taken into account, although their contribution is weighted so that they influence the learner slightly less than actual success or failure to parse. Let the number of parsing devices be n. We then need a way to count up the number of violations incurred by a given parser P and, since we are defining relative fitness, to relate this to the number of violations signaled by all the parsing devices together. We indicate the total number of violations of all parsing devices by $\sum_{j=1}^{n} \upsilon_j$; this operation simply sums the number of violations in the entire population of parsing devices. We indicate the number of violations incurred by P as υi. To relate υi to the number of violations incurred by other parsers, we follow a standard statistical technique and subtract the number of violations incurred by P from the total and divide that figure by the total:

(18)

\[
\frac{\sum_{j=1}^{n} \upsilon_j - \upsilon_i}{\sum_{j=1}^{n} \upsilon_j}
\]

Thus, if the total number of violations is 1,000 and Pi produces 10 violations, the metric will give $\frac{1000 - 10}{1000} = 0.99$. Where Pj produces 100 violations, the metric gives $\frac{1000 - 100}{1000} = 0.9$. Pi is thus more highly valued than Pj.

For complete precision, we must prevent the parser in question from being compared with itself, so we exclude it from the population as follows: (19)

\[
\frac{\sum_{j=1}^{n} \upsilon_j - \upsilon_i}{(n - 1)\sum_{j=1}^{n} \upsilon_j}
\]
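One consequence of the normalization in (19) is not spelled out in the text and is offered here only as an observation. Writing T for the population total (a symbol introduced for this note), the values that (19) assigns to the n parsing devices sum to 1, so they can be read directly as proportions of the population:

\[
T = \sum_{j=1}^{n} \upsilon_j, \qquad
\sum_{i=1}^{n} \frac{T - \upsilon_i}{(n - 1)\,T}
= \frac{nT - \sum_{i=1}^{n} \upsilon_i}{(n - 1)\,T}
= \frac{nT - T}{(n - 1)\,T}
= 1 \qquad (T > 0,\ n > 1).
\]

The same argument carries over unchanged when further weighted terms are added to the total and to the individual count, provided the two are extended in parallel.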

As noted earlier, we also want to evaluate whether a given hypothesis gives rise to a superset grammar. We can do this by proceeding in the same way as above: If we allow sm to represent the number of superset settings in the hypothesis hm, then $\sum_{j=1}^{n} s_j$ is the number of parameters set to superset values in the population.8 We now introduce a “superset penalty”, the constant b < 1, and multiply the count of superset settings by b. In this way, Subset Condition violations are scaled so that they will have less weight in the overall metric than a simple failure to parse a sentence. The product of b and the superset count for a single parsing device is evaluated relative to

the population of parsers in the same way as above. Combining the superset factor with the parsing factor, then, produces the metric in (20).

(20)

\[
\frac{\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j\right) - (\upsilon_i + b s_i)}{(n - 1)\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j\right)}
\]

Finally, we need to weigh in the relative elegance of parses as a factor. Again we proceed in the same fashion: $\sum_{j=1}^{n} e_j$ is the measure of the general elegance of the analyses in the entire population of parsers (which we continue to take to be a simple tally of the number of nodes) and ei is the measure for parser Pi. Analogous to the superset penalty, we introduce the constant c, which is a scaling factor for the elegance of the representation. Here again, this means that elegance is a less important factor than failure to parse. If we include the elegance factor in the equation, we arrive at the fitness metric: (21) The fitness metric

\[
\frac{\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j\right) - (\upsilon_i + b s_i + c e_i)}{(n - 1)\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j\right)}
\]

We will leave the question of the exact values of the constants b and c open, assuming only that 1 > b, c > 0 (preliminary calculations suggest that both of these constants are in fact very small, in the region of 0.00002; see Clark 1990). It is worth emphasizing that the fitness metric takes these factors into consideration, but that they are weighted so that they always count less than straightforward failure to parse. Notice, though, that they become crucial in distinguishing between successful parses. This will play an important role in our discussion of language change in section 3. Finally, the fitness metric in (21) blurs the reasons for success or failure of a hypothesis relative to a population; the learner has no way of knowing why a given hypothesis succeeds or fails. It is perhaps useful to consider the contribution of each of the above factors, using some hypothetical examples. Let us turn first to the way in which the fitness metric treats grammatical violations. For the population, this is the term $\sum_{j=1}^{n} \upsilon_j$ in the fitness metric; for the individual parsing device, it is the term υi. Suppose we have the three parsing devices p1, p2, and p3. Running these on an input sentence yields the following results: (22) a. p1 returns 1 violation, covering the input with 15 nodes. b. p2 returns 2 violations, covering the input with 15 nodes. c. p3 returns 3 violations, covering the input with 15 nodes.

Running the above results through the fitness metric gives the following results, with b = 0.02 and c = 0.05 (we ignore, here, the contribution of the subset factor by assuming that none of the hypotheses underlying the parser contain superset settings): (23) a. p1 receives a fitness rating of 0.393939. b. p2 receives a fitness rating of 0.333333. c. p3 receives a fitness rating of 0.272727. Thus, parser p1 is judged the most fit, p2 the next most fit, and p3 the least fit. Notice that the learner does not receive information about which grammatical principles are violated. It has no need of such information in order to distinguish between the hypotheses at hand. Instead, it need only observe the performance of its hypotheses in an external manner, without information about their inner workings. The learner will base its new hypotheses on those old ones that are relatively more fit, thus passing on the parameter settings that made those hypotheses fit to future generations. Those parameter settings that avoid grammatical violations relative to the input text will be preserved, and those that tend to generate violations will gradually disappear. Let us turn, now, to the contribution of the superset penalty, the term $\sum_{j=1}^{n} s_j$ for the entire population and the term si for a single parsing device. Suppose that p1 and p2 both signal no violations of any grammatical principles and both cover the input in 20 nodes. Suppose further that p2 contains a superset setting for one parameter and p1 contains no superset settings. The fitness metric will then return the following results: (24) a. p1 receives a fitness rating of 0.50495. b. p2 receives a fitness rating of 0.49505. Notice that the “smallest hypothesis,” in this case the one underlying p1, is judged more fit than the one that violates the Subset Condition.
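The hypothetical examples in (22)–(24) can be reproduced with a short Python sketch of the metric in (21). It uses the values b = 0.02 and c = 0.05 from the text; the representation of each parsing device as a (violations, superset settings, nodes) triple and the function name `fitness` are assumptions made for this illustration.

```python
def fitness(population, i, b=0.02, c=0.05):
    """Relative fitness of parsing device i under the metric in (21).

    Each member of `population` is a (violations, superset_settings, nodes)
    triple; this tuple layout is an assumption of the sketch.
    """
    n = len(population)
    if n < 2:
        raise ValueError("the metric compares a device with at least one other")
    # Weighted count of undesirable properties for each device.
    cost = [v + b * s + c * e for v, s, e in population]
    total = sum(cost)
    if total == 0:
        raise ValueError("metric undefined when the population total is zero")
    return (total - cost[i]) / ((n - 1) * total)


example_22 = [(1, 0, 15), (2, 0, 15), (3, 0, 15)]  # violations differ
example_24 = [(0, 0, 20), (0, 1, 20)]               # p2 has one superset setting

for label, pop in (("(23)", example_22), ("(24)", example_24)):
    print(label, [round(fitness(pop, i), 6) for i in range(len(pop))])
```

Rounded to six places, the sketch returns the ratings reported in (23) and (24). Like the learner it models, the function sees only the summed counts, never which principle was violated.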
Thus, the fitness metric can distinguish both between hypotheses that are unequal in their parsing powers and between hypotheses that are equal in parsing power but differ with respect to the Subset Condition. We turn, finally, to the contribution of the “elegance” factor; this is the term $\sum_{j=1}^{n} e_j$ for the entire population and ei for individual parsing devices. Consider two hypotheses, p1 and p2, which both return no violations and contain no superset settings but cover the input with trees of different elegance. Suppose that p1 is able to cover the input with 17 nodes whereas p2 covers the input with 18 nodes. The results of the fitness metric are then as follows: (25) a. p1 receives a fitness rating of 0.514286. b. p2 receives a fitness rating of 0.485714.

The first hypothesis is preferred by the fitness metric since it is able to span the input in a more elegant way than the second hypothesis. In order to see the importance of this factor, consider the case where the target is SVO. Suppose that hypothesis h1 treats the subject as being in the Spec of IP at S-Structure whereas hypothesis h2 treats the subject as having moved to the Spec of CP, attracting the main verb with it. For a simple clause, h1 and h2 will return the following structures: (26) a. h1: [IP DP [Iʹ I VP]] b. h2: [CP DPi [Cʹ Vj [IP ti [Iʹ tj VP]]]] By assumption, both h1 and h2 can account for the input stream. Notice, however, that h2 involves systematically longer chains than h1 since the former always involves movement of the subject to the Spec of CP, with subsequent attraction of the verb to C0, whereas the latter does not. The representations returned by h1 are simpler than those returned by h2. Since the learner, via the fitness metric, can take into account the general elegance of representations, it can successfully distinguish between h1 and h2. Notice, however, that elegance is defined quite simply as a count of the nodes in the tree covering an input item plus the lengths of the chains in the representation. The fitness metric can be considered to work as follows. The population of parsing devices specified by the learner’s hypothesis strings is run against each input item. The term $\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j$ yields the total number of violations, the total number of superset settings, and the total elegance of representations of the entire population, with the various factors weighted appropriately by the constants b and c. Dividing this term by n, the size of the population, would give the average number of undesirable properties for the entire population. Next consider the term υi + bsi + cei. This yields the number of unhealthy properties each individual parsing device carries.
As this term grows in relation to the population average, the relative fitness of the parsing device decreases. If this term decreases with respect to the population average, then the parsing device is judged relatively more fit.9 The opportunity to reproduce (that is, be selected for the crossover operation and mutation) is a direct function of relative fitness. The simulation developed in Clark 1990 assumes that the fitness associated with a hypothesis corresponds transparently to its proportion of the general population. In an environment with random mating, then, those hypotheses with a high proportion in the population are more likely to meet and reproduce. The fitness ratings are used to simulate a weighted roulette wheel, the results of which undergo the crossover and mutation operations. In other words, successful hypotheses will receive a high fitness rating. The fitness rating corresponds to the probability that the hypothesis will get to reproduce. Thus, the fittest hypotheses will reproduce more frequently

and pass on their parameter settings to new hypotheses. Cumulatively, then, the population will tend toward the optimal set of parameter settings for the target. Crucially, the most fit hypotheses are the most likely to contribute to the formation of new hypotheses. These hypotheses have the greatest opportunity to pass on the parameter settings that made them fit to new hypotheses. Because weak hypotheses are pruned at random intervals, these are ultimately prevented from contributing their inferior parameter settings to the general pool. Thus, fit parameter settings tend to take over while unfit parameter settings are purged. By iterating the process of parsing, judging fitness, reproduction, and “death,” the learner is able to incrementally approach the target grammar. 2.4 P-Encodings Before we turn to the diachronic data, two other definitions are required. Consider a simple example like (27). (27) John loves Mary. Notice that certain parameters must be set in a particular way if the sentence is to be parsed. Both John and Mary must receive θ-roles and Case, the verb love must be capable of picking up its inflectional affix, and so on. Any parsing device that can successfully account for these features of the sentence in (27) will return a well-formed representation. Other parameters (e.g., bounding nodes and those that regulate conditions on anaphora) are irrelevant to the representation of this sentence. It will not matter what values for these parameters the parsing device presupposes. This suggests that any given input sentence expresses certain parameters and that a set of distinct parsing devices can account for (27): (28) Parameter expression A sentence σ expresses a parameter pi just in case a grammar must have pi set to some definite value in order to assign a well-formed representation to σ.
When a given datum expresses some parameter value, the learner will be under pressure to set that parameter to the value expressed by the datum. This is because the fitness metric will prefer hypotheses with the correct setting to those without it. This provides a simple definition of the intuitive notion of triggering datum: (29) Trigger A sentence σ is a trigger for a parameter pj if σ expresses pj.

Given the above interpretation of the input data, we can imagine a method of encoding the data in string form. Suppose we have a function ψ that maps a sentence onto the set of sequences of parameter settings that are compatible with that sentence. For example, a given input sentence, sm, can be accounted for by grammars with the second and third parameters set to 0 and the fifth parameter set to 1. Applying ψ to sm would give the following set of parameter strings: (30) ψ(sm) = {00001, 10001, 00011, 10011} Using “*” as a variable to range over 0 and 1, we could replace the above set of strings with a cover term: (31) {00001, 10001, 00011, 10011} = [* 0 0 * 1] We will refer to the sequence [* 0 0 * 1] as the p-encoding for sm; the p-encoding of a sentence may be thought of as a “pure” representation of the parameters expressed by the sentence.10 Notice that, in principle, one could replace the sentences in an input text with their p-encodings and, so, study the frequency of expression for various parameters and the overall structure of the text relative to parameter expression. There is an important relationship between parameter expression and the fitness metric. Ultimately, the fitness associated with a hypothesis governs its probability of being selected for reproduction. The more fit a hypothesis is, the more likely it is to pass on those parameter settings that made it fit. Now consider parameter expression. When a parameter is expressed, those hypotheses that have the correct value for that parameter will be judged more fit than those that lack the proper value. If a parameter is expressed robustly by several different construction types (and, hence, has a higher probability of occurring in the input text), then those hypotheses bearing the correct value will have more opportunity to be selected for reproduction and the appropriate parameter setting will tend to dominate in the population.
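The p-encoding in (30)–(31) lends itself to a small Python sketch. It assumes that a p-encoding is written as a string over "0", "1", and "*" without the spaces and brackets used in (31); the function names are hypothetical.

```python
from itertools import product


def expand(p_encoding: str) -> set:
    """Expand a p-encoding such as '*00*1' into the set of strings it covers."""
    options = [("0", "1") if symbol == "*" else (symbol,) for symbol in p_encoding]
    return {"".join(bits) for bits in product(*options)}


def is_compatible(hypothesis: str, p_encoding: str) -> bool:
    """True if the hypothesis agrees with every parameter the sentence expresses."""
    return len(hypothesis) == len(p_encoding) and all(
        e == "*" or e == h for h, e in zip(hypothesis, p_encoding)
    )


def expressed(p_encoding: str) -> list:
    """One-based positions of the parameters expressed by the sentence."""
    return [k for k, symbol in enumerate(p_encoding, start=1) if symbol != "*"]


print(sorted(expand("*00*1")))   # the four strings in (30)
print(is_compatible("10011", "*00*1"), is_compatible("11011", "*00*1"))
print(expressed("*00*1"))        # second, third, and fifth parameters
```

Counting how often each position is non-"*" across the p-encodings of a text would give a rough measure of how frequently each parameter is expressed.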
Furthermore, those hypotheses bearing the incorrect value will have a lower fitness rating and will tend to reproduce less so that the parameter values that made them unfit are washed from the population. Thus, parameter settings that are expressed robustly will tend to be set quickly and efficiently by the learner. Parameters that are not expressed robustly, however, will tend not to affect the fitness of a hypothesis in the same way. The learner will have correspondingly less stake in setting the parameter correctly and will not converge so readily to the parameter value. Now consider the case where parameters are ambiguously expressed. In our terms, there might be several contradictory p-encodings associated with a class of data, for example. Here the learner has several possible solutions available that can account for the input without generating grammatical violations. In this case frequency of parameter expression will not aid the learner in distinguishing between its hypotheses. Instead, the learner will

have to rely on the structure of the hypotheses themselves, and not their empirical coverage, in order to select a winning hypothesis. These internal factors are the overall elegance of representations and the number of superset settings in each hypothesis, both of which are factors in the fitness metric. We argue here that it is this sort of case that provides the fuel for core diachronic change in a parameter setting. In the next section we will turn to a case where learners were faced with just such an ambiguity.

3. A Case Study in Diachronic Change We believe that applying genetic algorithms in the form outlined above to the acquisition of natural languages is not only possible but desirable. It is desirable in part because it avoids the problems discussed in the previous section: it allows convergence over a finite but large hypothesis space, and it can be defined such that superset traps can be avoided (which the version of the fitness metric given in (21) in fact does). Our main contention here, however, is that the genetic algorithm approach provides a solution to the logical problem of language change. We will now turn to an application of the genetic algorithm approach to learning and show how it can model diachronic change as well. 3.1 The History of French Roberts (1993) analyzes three major syntactic changes in the history of French as reflexes of a single underlying parametric change. The three changes are as follows (here and elsewhere, unless otherwise noted, OF and MidF data are from Roberts 1993): (32) Loss of “simple inversion” in interrogatives

a. *A Jean pris le livre? (ModF)
   has Jean taken the book
b. Comment fu ceste lettre faitte? (OF)
   how was this letter made

(33) Loss of null subjects

a. *Ainsi s’amusaient bien cette nuit. (ModF)
   thus (they) had fun that night
b. Si firent pro grant joie la nuit. (OF)
   thus (they) made great joy the night

(34) Loss of V2

a. *Puis entendirent-ils un coup de tonnerre. (ModF)
   then heard they a clap of thunder
b. Lors oïrent ils venir un escoiz de tonoire. (OF)
   then heard they come a clap of thunder

As Roberts shows in some detail, each of these constructions was lost in the early 16th century. Roberts argues that these changes reflect an underlying change in the value of the parameter determining nominative Case assignment proposed by Koopman and Sportiche (1991): nominative Case may be assigned (by I) under government, or under agreement, or under both. The central idea of this account of the history of French is that OF allowed nominative Case assignment under government, whereas ModF does not. More precisely, all of the OF constructions depend on the possibility of the inflected verb, V + I, assigning nominative Case to the subject in Spec of IP from C, as shown in (35). (35)

[CP [C′ [C0 V + I0] [IP NP I′]]]

This situation was allowed in the grammar of OF (and is still allowed in, for example, the contemporary Germanic languages). In a grammar where this configuration of Case assignment is not allowed, no lexical NP can survive in subject position in inversion contexts; this is the situation in ModF, where (32) thus violates the Case Filter. Following Kayne (1983) and Rizzi and Roberts (1989 [this volume, Chapter 9]), we assume that clitics can survive in subject position in this context since they are able to pass the Case Filter in other ways (also see Baker 1988, Everett 1986). Adopting Rizzi’s (1986a) proposal that the necessary condition on formal licensing of pro is that it occupy a Case-marked position, Roberts accounts for the change illustrated in (33) by extending the nominative Case parameter to the pro module; it is well known that OF null subjects were licensed only in contexts of inversion (see Thurneysen 1892, Price 1971, Einhorn 1974, Foulet 1982, Vanelli, Renzi, and Benincà 1986, Adams 1987a,b), and so a natural interpretation of this is that null subjects could only be licensed where nominative Case was assigned under government, that is, in the configuration (35). This in turn accounts for why null subjects were lost when nominative Case assignment under government was lost.11 Regarding (34), V2 also depends on the capacity of I to assign nominative Case to the subject under government after being raised to C with the verb. Note that nominative-Case-under-government is a necessary, not a sufficient, condition for V2. Hence, a system without this possibility cannot have V2. However, a system with this possibility need not have V2 (Modern English is probably such a system). In fact, as we will illustrate, obligatory V2 was already eroding in MidF—this was a crucial factor in the instability that led to the change in the nominative Case assignment parameter.

Computational Model of Language Learnability 61 The principal trigger for the change in the possibilities of nominative Case assignment was the introduction of new word orders that did not strictly conform to V2, notably XSVO (where “X” could be a topic or an adverb). This innovation was probably caused by the development of a series of subject clitics in MidF (see below). The cumulative effect of the new word orders was to destabilize the system in such a way that setting the parameter for nominative Case assignment under government positively became impossible by about 1500, and learners converged on a grammar lacking this property. The result was the elimination of the structures in (32)–(34) in 16th-century texts—a major change in the grammar of French. Note that we do not consider the null subject parameter or the V2 parameter as in any sense subsumed by the nominative Case parameter; however, the particular circumstances of French at the time the change took place were such that the loss of nominative Case assignment under government entailed the loss of null subjects and the elimination of V2. Our proposal is that the initial weakening of V2 combined with the development of a series of subject clitics created a system that ultimately eliminated V2, and in doing so eliminated null subjects and simple inversion. In particular, the weakening of V2 had the effect that hypotheses that allowed an input datum to be analyzed as a V2 structure became more costly relative to the fitness metric; thus, the learner was under pressure from fitness to eliminate the V2 hypothesis. 
Although we concentrate exclusively on French here, there is also evidence (see in particular Vanelli, Renzi, and Benincà 1986) that many of the Northern Italian dialects have undergone the same parametric change, since in their recorded history, simple inversion, V2, and, arguably, null subjects have been lost (although the contemporary dialects in fact have a kind of “disguised” null subject system that probably represents an independent diachronic innovation; see Poletto 1990, Renzi and Vanelli 1983, Rizzi 1986b). Moreover, Renzi (1983) argues that Modern Standard Italian has undergone the same changes as French regarding inversion while retaining null subjects. In all, five parameters are relevant to our account of the historical development of French. These are given in (36).
(36) a. Nominative Case is assigned (by I) under agreement. {1,0}
b. Nominative Case is assigned (by I) under government. {1,0}
c. Clitic nominative pronouns. {1,0}
d. Null subjects licensed canonically (Case-dependently). {1,0}
e. Obligatory V-movement to C in matrix declaratives (V2). {1,0}

Note that we split Koopman and Sportiche’s parameter for nominative Case assignment into two separate parameters in order to preserve a basically binary vocabulary for parameters (see the discussion of Subjacency and bounding nodes in section 2). We take it that (36a) has been constant at 1 throughout the entire period (but see section 4). As just mentioned, (36b,d,e) changed together in the 16th century. The shift in (36d) and (36e) was forced by the change in the value of (36b). This is presumably quite a standard situation with parametric change: changes in parameter values interact. Moreover, parameter values can be affected by nonsyntactic factors, notably phonological changes. This is the case with (36c); properties connected to the stress system may cause a class of pronouns to cliticize and thereby trigger a shift in the value of this parameter. We now review the relevant data from the different periods of French and show how the data trigger parameter settings. To illustrate the general technique, we will first consider Modern French. Then we will consider Old French and finally the period of greatest “structural instability” (and, hence, of greatest interest), Middle French.

3.2 Learning Modern French

Before we consider the earlier periods of French, let us first look at the situation in the contemporary language. What are the parameter values for ModF? It is clear that nominative Case is assigned by I to its Spec position; hence, the first position in the string must be set to 1. On the other hand, Rizzi and Roberts (1989 [this volume, Chapter 9]) argue that ModF does not allow nominative Case assignment under government; this is what leads to the restriction to clitics in contexts where the inflected verb, a complex head that contains I, moves to C (e.g., in interrogatives or conditionals; cf. also (32)):
(37) a. Ont-ils/*les enfants vu ce film?
have they/the children seen this film
b. Aurait-elle/*Marie fait cela . . .
had she/Marie done this

Once moved to C, I must Case-mark the subject position under government; the ungrammaticality of (37a–b) with a nonclitic subject shows this to be impossible. In terms of this analysis, I does not assign nominative Case under government in ModF, and we therefore set the second position in the string to 0.12 It is well known that ModF has a class of clitic nominative pronouns (see Kayne 1975, Rizzi 1986b); the contrasts in (37) in fact illustrate that these elements interact with Case theory in a manner distinct from nonclitics. Rizzi and Roberts (1989 [this volume, Chapter 9]) propose that clitics can satisfy Case theory by incorporating with the verb in C (see Baker 1988, Everett 1986, 1989). Thus, we take it that in ModF parameter (36c) is set to 1. Both parameters (36d) and (36e) are set to 0: ModF is neither a null subject nor a V2 language, as comparisons with contemporary Italian and German show, respectively.13

These remarks on the grammar of ModF (which we of course cannot fully substantiate here; see the references cited for further arguments) lead to the following conclusion regarding the representation of the parameters in (36) as a string of binary units:
(38) The “target string” for ModF is 10100. Nominative Case is assigned under agreement and subject clitics are allowed.
Let us now consider how the parameter values for ModF are expressed in the input text. Recall that a sentence S expresses a parameter Pi iff a grammar must have Pi set to a particular value in order to assign a well-formed representation to S. In such a situation, S is a trigger for Pi. The following examples illustrate a significant part of the trigger for the parameter values of ModF:
(39) a. Jean aime Marie.
Jean loves Marie
b. Hier Jean est parti.
yesterday Jean left
c. Où est-il allé?
where did he go

Recall that the conditions of acquisition are such that starred examples like the (a) cases of (32)–(34), which can be used by the linguist to justify a particular analysis, are not available. Moreover, many sentences are amenable to differing structural analyses that can affect their status as triggers. This last point is crucial to understanding how change takes place, as we will show. Consider first (39a), a simple declarative sentence with canonical SVO order. In terms of the usual analysis of ModF, the relevant parts of this sentence are as follows: (40) [IP Jean [I′ aime . . .]] Parsed in this way, (39a) triggers nominative Case assignment under agreement and indicates that V-movement to C is not required in matrix declaratives—in other words, that ModF is not V2. Thus, (40) is associated with the following p-encoding: (41) [1 * * * 0] Nominative Case is assigned under agreement, and V-movement to C is not allowed in matrix declaratives. (41) indicates that (39a) tells the learner that nominative Case is assigned under agreement, and that French is not V2; but it does not say anything

about whether nominative Case is assigned under government, whether subject clitics are allowed, or whether null subjects are allowed. However, strings exactly equivalent to (39a) are grammatical in the Germanic V2 languages. In these languages the relevant parts of the structure are as follows:
(42) [CP Jean [C′ aime [IP t [I′ t . . .]]]]
Call this the “V2 parse” of an SVO sentence. Here I assigns nominative Case to the Spec of I′ (i.e., the position occupied by the trace of the subject) under government; we will refine this analysis in section 4. Hence, the p-encoding for this parse is as follows:
(43) [* 1 * * 1] The parser must have nominative Case assignment under government, and V-movement to C is obligatory in matrix declaratives.
As (43) shows, (39a) remains silent regarding subject clitics and null subjects. To sum up, SVO declaratives in ModF have the following p-encodings:
(44) SVO declaratives p-encode
a. [1 * * * 0] Nominative Case is assigned under agreement, and V remains in I in matrix declaratives.
b. [* 1 * * 1] Nominative Case is assigned under government, and V moves to C in matrix declaratives.
SVO sentences are thus associated with different p-encodings depending on the parse they are given. We can characterize this situation in terms of the following notion of p-ambiguity:
(45) A sentence S is p-ambiguous with respect to some parameter Pi just in case S has the set of well-formed representations (R1 . . . Rn) and Pi must be set to some definite value υ1 in order to assign Ri to S (i.e., Ri triggers a Pi (υ1)), whereas Pi does not need to be set to υ1 in order to assign Rj ≠ Ri to S.
ModF SVO sentences are p-ambiguous, as (44) shows. As will be discussed in section 3.3, however, the representation where V is in C is disfavored since it involves a more complex structure than the representation where V is in I. Now consider (39b). 
In V2 languages generally, orders of this type are impossible (see Schwartz and Vikner 1996). This can be interpreted in terms of a ban on adjunction to CP. Supposing that this is so, this example must

be parsed with the adverb attached to IP, V in I, and the subject in Spec of I′. In other words, the relevant parts of the structure are like the parse of (39a) given in (40), and the triggering properties of the sentence are the same. More generally, we can conclude the following:
(46) XSV p-encodes [1 * * * 0] Nominative Case is assigned under agreement, and movement of V to C is not allowed in matrix declaratives.
Now consider the interrogative in (39c). (39c) provides evidence for the subject clitic (this evidence is probably morphological, given the existence of a separate paradigm of clitic pronouns) and therefore, given that clitic pronouns do not obey the Case Filter in the same way as nonclitic NPs, provides no evidence for either Case assignment parameter. We take it that interrogative sentences by their nature provide no evidence regarding V2 in declaratives, and the null subject parameter is not determined either. We therefore arrive at the p-encoding in (47) (where s indicates a subject clitic in the schematic word order).
(47) whVsO p-encodes [* * 1 * *] Subject clitics are possible.
If the subject clitic is not recognized as such, but treated as a full NP, this sentence would p-encode (48).
(48) [* 1 * * *] Nominative Case is assigned under government.
We assume, however, that phonological and morphological evidence disfavors this possibility. When we put the p-encodings in (43)–(47) together (and disregard the one in (48)), the following picture emerges:
(49) a. SVO, XSV: [1 * * * 0] Nominative Case is assigned under agreement; no V2 is possible in declaratives.
b. SVO: [* 1 * * 1] Nominative Case is assigned under government; V2 is possible in declaratives.
c. whVsO: [* * 1 * *] Subject clitics are possible.
The two parameters that are not positively set are nominative Case assignment under government and null subjects. These are both set to 0 in the optimal case. Let us consider why.
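The summary in (49) can be made concrete with a small sketch. It assumes, purely for illustration, that p-encodings are written as five-character strings over 1, 0, and * in the order of (36a–e); the function names and the dictionary are hypothetical and not part of the model as presented. The sketch keeps only the p-encodings compatible with the ModF target string in (38) and then lists the parameter positions that none of them expresses.

```python
# Parameter order follows (36): a. agreement, b. government,
# c. subject clitics, d. null subjects, e. V2.
MODF_TARGET = "10100"  # target string from (38)

# p-encodings from (49); the labels are illustrative only.
MODF_ENCODINGS = {
    "SVO/XSV, non-V2 parse": "1***0",
    "SVO, V2 parse": "*1**1",
    "whVsO": "**1**",
}

def compatible(p_encoding, grammar):
    """True if every non-'*' position of the p-encoding matches the grammar."""
    return all(p in ("*", g) for p, g in zip(p_encoding, grammar))

def unset_positions(encodings, width=5):
    """Indices (0-based) that no encoding in the list expresses."""
    return [i for i in range(width) if all(e[i] == "*" for e in encodings)]

# Discard the disfavored V2 parse, as the text does, by keeping only
# the encodings compatible with the target string.
kept = [e for e in MODF_ENCODINGS.values() if compatible(e, MODF_TARGET)]
silent = unset_positions(kept)
print(kept, silent)
```

Under these assumptions the two unexpressed positions are indices 1 and 3, that is, (36b) and (36d), the two parameters that the text says must be set by other considerations.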

The two parameters determining nominative Case assignment by I, (36a–b), are in a shifting relation. Although neither parameter directly determines a superset relation (a grammar that allows nominative Case assignment under agreement generates a language that intersects with one that does not; similarly for nominative Case assignment under government), if both parameters are set to 1, they together generate a language that is the superset of the one that results from setting either parameter to 0. This is a classic case of shifting (of the type seen in section 2). Now, as we have shown, (36a) is unambiguously expressed in the input for ModF and thus is set to 1. In order to avoid shifting, a positive value for (36b) is strongly disfavored. Since there is no unambiguous evidence for nominative Case assignment under government, the pressure against shifting is decisive and the parameter is set to 0 in the optimal grammar. It should be noted that the only evidence for nominative Case assignment under government consists of sentences with the order SVO, with a V2 parse, which can also be analyzed more compactly under the assumption that nominative Case is assigned under agreement. In particular, the V2 parse for the SVO order must involve movement of the subject to the Spec of CP and thus entails a longer chain than would occur under the competing analysis. Thus, the non-V2 parse is again favored and the V2 parse is disfavored by the fitness metric. This provides the learner with further evidence in favor of setting the V2 parameter to 0, as well as disfavoring nominative Case assignment under government. For the null subject parameter, we could follow Berwick’s (1985) reasoning and invoke the Subset Condition. If null subject languages are a superset of non-null-subject languages, the lack of a trigger for a positive value of the null subject parameter will guarantee that (36d) is set to 0. 
Alternatively, we could appeal to morphological conditions and say that, although the syntactic evidence does not determine a value for (36d), the “poverty” of French verbal inflection determines a negative value. We will leave this question open here. The above paragraphs demonstrate how the various factors we are concerned with work. On the basis of simple, plausible, positive evidence, the learner can converge on the correct parameter settings for Modern French. In what follows we will show how these same factors led to a major parametric change in French, circa 1500. 3.3 Old French As mentioned earlier, OF allowed nominative Case assignment under government (see (34a–b)). We assume that nominative Case could also be assigned under agreement, although we will return to this point in section 4. (34b) shows that OF allowed null subjects, although it is well known that these were possible only in contexts of inversion. Another well-known and much-discussed difference between OF and ModF is that the OF nominative

pronouns je, tu, il, etc., were potentially tonic elements, unlike their ModF counterparts (see Kayne 1975 on ModF; Adams 1987a,b, Roberts 1993:sec. 2.2, and below on OF). These facts about OF syntax lead to the following parameter settings, in terms of (36):
(50) The target string for OF is 11011. Nominative Case assignment was possible both under agreement and under government; null subjects were possible; V2 was obligatory in matrix declaratives.
As in the previous section, we now show how this string could be determined on the basis of simple, positive evidence.14 The following kinds of sentence were available as evidence, where (S) indicates a null subject:
(51) a. XVS
(Et) lors demande Galaad ses armes.
(and) then asks Galahad (for) his arms
b. SVO
Aucassins ala par le forest.
Aucassin went through the forest
c. XV(S)O
Si firent grant joie la nuit.
so (they) made great joy the night
(52) a. whVSO
(Mais) ou fu cele espee prise . . . ?
(but) where was that sword taken
b. whVSO
Ne nos connoissez vos mie?
NEG us know you not
(51a) is a V2 declarative (as in modern Germanic languages, conjunctions like ‘and’ and ‘but’ do not count in the computation of V2; these elements can be external to CP when they conjoin CPs). The relevant parts of the structure of this sentence are as follows:
(53) [CP Lors [C′ demande [IP Galaad . . .]]]
Here the inflected verb in C assigns nominative Case to the subject NP, Galaad, under government. Of the five parameters in (36), this example then positively triggers nominative Case assignment under government and V2. More generally, this word order has the following p-encoding:
(54) XVS p-encodes [ * 1 * * 1 ] Nominative Case is assigned under government, and V2 is obligatory in matrix declaratives.

OF also allowed SVO sentences like (51b). As in the case of the ModF SVO order, this kind of sentence is p-ambiguous in the following way:
(55) SVO p-encodes either
[ * 1 * * 1 ] Nominative Case under government, V2
or [ 1 * * * 0 ] Nominative Case under agreement, no V2
We will return to this point below. As noted earlier and illustrated in (51c), OF allowed null subjects in V2 contexts. Such examples are also p-ambiguous from the point of view of the learner: if V is in C, then the null subject is licensed under government in Spec of I; if V is in I, then the null subject is licensed under agreement in Spec of I. In the former situation, nominative Case under government and V2 are triggered; in the latter, nominative Case under agreement is triggered along with a negative value for V2. In both cases, the null subject parameter is positively triggered. The following p-ambiguity arises:
(56) XV(S)O p-encodes either
[ * 1 * 1 1 ] As above, with null subject
or [ 1 * * 1 0 ] As above, with null subject
Now consider the interrogatives in (52). (52a) has the same trigger properties as a V2 declarative, except that by assumption interrogatives cannot trigger the V2 parameter. On the assumption that the nominative pronouns were tonic,15 (52b) involves nominative Case assignment under government to the clitic, just as with any other NP subject. These examples, then, have the following p-encoding:
(57) whVSO p-encodes [ * 1 * * * ] Nominative under government
Putting the above p-encodings together, we arrive at (58).
(58) a. [ * 1 * * 1 ] Nominative under government and V2
b. [ 1 * * * 0 ] Nominative under agreement and no V2
c. [ * 1 * 1 1 ] Nominative under government, null subject, and V2
d. [ 1 * * 1 0 ] Nominative under agreement, null subject, and no V2
e. [ * 1 * * * ] Nominative under government
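Definition (45) can be applied mechanically to the OF p-encodings in (54)–(57). The following is a minimal sketch under the assumption that each word order is stored with one five-character p-encoding per possible parse; the dictionary and function names are hypothetical, and the ambiguity test is a simplified reading of (45), not the authors' formal statement.

```python
# Each word order maps to the p-encodings of its parses, from (54)-(57).
# Positions follow (36a-e); index 1 is government, index 4 is V2.
OF_PARSES = {
    "XVS": ["*1**1"],
    "SVO": ["*1**1", "1***0"],
    "XV(S)O": ["*1*11", "1**10"],
    "whVSO": ["*1***"],
}

def is_p_ambiguous(parses, i):
    """Simplified (45): some parse requires a value at position i
    that another parse of the same sentence does not require."""
    return len({p[i] for p in parses}) > 1

def unambiguous_triggers(corpus, i, value):
    """Word orders whose every parse requires `value` at position i."""
    return [order for order, parses in corpus.items()
            if all(p[i] == value for p in parses)]

GOVERNMENT, V2 = 1, 4
v2_triggers = unambiguous_triggers(OF_PARSES, V2, "1")
gov_triggers = unambiguous_triggers(OF_PARSES, GOVERNMENT, "1")
print(v2_triggers, gov_triggers)
```

On this encoding, SVO and XV(S)O come out as p-ambiguous for V2, while XVS is the only order whose every parse requires V2.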

Both nominative Case parameters are triggered positively. (Notice that the positive evidence overrides the fact that these two parameters are in a shifting relationship; we return to this in section 4.) The null subject parameter is also positively triggered. V2 is also triggered if we take it that the positive evidence for the more complex trigger weighs more heavily than the pressure in favor of the simpler structure in the p-ambiguous cases; this is a matter that can be captured by the fitness metric. Finally, as mentioned in footnote 11, there is no morphological evidence in favor of subject clitics, in that there was only one series of subject pronouns at this time. Phonological evidence presumably militates against treating the nominative pronouns as obligatory clitics; for example, these pronouns could be stressed in OF, as their occurrence in topicalized position indicates:
(59) Je, que sai?
me what do I know
Moreover, subject pronouns, unlike object pronouns, could appear first in V2 declaratives. This indicates that they “counted” just like other XPs for the determination of V2; object pronouns did not “count,” however:
(60) a. Tu es or riche et ge sui po proisié.
you are now rich and I am little valued
b. Toutes ces choses te presta Nostre Sires.
all these things to you lent our Lord

On the basis of evidence of this kind, the subject clitic parameter was set to 0. Thus, we have demonstrated how simple, positive data could trigger the parameter settings for OF. Indeed, this discussion of the OF data brings out one important point: clear, positive evidence overrides all other considerations. We showed this in two cases. First, OF had a shifted system with respect to the nominative Case parameter, but learners nevertheless converged on this system since there was clear, positive evidence for it. Second, the p-ambiguities of SVO and V2/null subject examples are resolved by the unambiguous V2 cases, and moreover this resolution is in the direction of the more complex structure. In other words, clear, positive evidence can override both subset/shifting considerations and the pressure toward the simplest possible structure. In terms of our assumptions and definitions, “clear, positive evidence” means non-p-ambiguous evidence. Since the only non-p-ambiguous evidence for V2 is the XVS order, this type of sentence clearly played a crucial role. This order was very frequent in OF matrix declaratives. Roberts (1993:sec. 2.3.1) gives the following percentages for (X)VS and SV(X) order (based on the first 100 matrix declaratives with overt subjects in six representative texts): (61) (X) VS = 58% SV (X) = 34%

Although a more sophisticated and exhaustive quantitative analysis is needed in order to fully demonstrate the point, we can conclude that (X) VS orders were sufficiently frequent to trigger a positive setting of the V2 parameter. This in turn means that SVO sentences could be analyzed as V2, unlike in ModF. Thus, a shifted system is allowed because there is clear evidence for it; the situation is quite different in ModF, where the only evidence for the shifted system is p-ambiguous and is therefore disregarded. In section 1 we introduced the notion of stability of parameter setting, proposing that a parameter setting is stable to the degree that its expression in the input data is unambiguous. Was the V2 parameter stable in OF? The only non-p-ambiguous trigger for V2 is provided by XVS orders. The frequency of these orders positively sets the nominative-Case-under-government parameter and thereby makes the V2 parse available for the p-ambiguous SVO and null subject structures. The potential instability created by the “non-V2 parses” of these examples is eliminated in the optimal grammar of OF. Nevertheless, it is likely that the non-V2 parse for SVO and null subject sentences was a close rival for the V2 parse, even in (later) OF, especially since elegance considerations always favor a non-V2 parse over a V2 parse where there is a choice. More explicitly, in terms of the fitness metric, the existence and frequency of an unambiguous trigger for V2 was sufficient to establish a positive setting for the V2 parameter. Recall that the relative elegance of a parse plays a less crucial role in judging fitness than real grammatical violations. This is because the elegance factor is scaled down by the constant c of the fitness metric, whereas violations are not scaled down. 
Thus, a hypothesis that leads to slightly more inelegant representations without generating grammatical violations will ultimately drive a hypothesis that generates elegant violations out of the population. In the next section we will show how the MidF situation contrasts with what we have just described for OF. In particular, we will show that, in part because of the introduction of new word orders and in part because of the diminishing frequency of XVS, XVS orders were no longer able to trigger a positive value for the V2 parameter. As a result, the V2 parameter became maximally unstable. The instability was resolved by a parametric change that led to the loss of the constructions in (32)–(34). 3.4 Middle French In MidF, XSV was introduced, and SVO and V1 became more frequent. These facts are standardly described in histories of French (see Harris 1978, Marchello-Nizia 1979, Vance 1989, and, for a detailed treatment in terms of the parameters under discussion here, Roberts 1993). Together, they meant that the V2 constraint was less rigorously respected than it had been in OF (although V2 orders were still possible throughout this period, unlike ModF). Also, a separate series of nominative clitics emerged. For now, we will take the introduction of the new word orders as given, although we

discuss possible causes for this change in section 4 (also see Adams 1987a,b, Roberts 1993). We treat the cliticization of nominative pronouns as a phonologically driven change. Otherwise, MidF was like OF and different from ModF, in particular with respect to nominative Case assignment under government and null subjects. We do not present a target string for MidF, however, since we precisely wish to show how indeterminacy in one parameter (V2) created indeterminacy elsewhere (nominative Case assignment under government and the possibility of null subjects). Let us consider the types of evidence available in MidF. As in OF, the following kinds of declaratives were found:
(62) a. XVS
Or avoit nostre curé priez des aultres prebtres.
now had our priest asked the other priests
b. SVO
Les Anglais veulent un roi guerrier.
the English want a warrior king
c. XV(S)O
Or ai eu plusseurs fois grant imagination.
now have (I) had several times great imagination
Also as in OF, these constructions have the following p-encodings, corresponding to (62a), (62b), and (62c), respectively:
(63) a. XVS: [ * 1 * * 1 ] Nominative under government and obligatory V2 in matrix declaratives
b. SVO: [ * 1 * * 1 ] As above
or [ 1 * * * 0 ] Nominative under agreement and no V2
c. XV(S)O: [ * 1 * 1 1 ] As in (a) with null subjects
or [ 1 * * 1 0 ] As in (b) with null subjects
The changes that took place in MidF created further possibilities, however. Consider the following examples (where s indicates a subject clitic):
(64) a. XVs
Or ai je proposé ensi que . . .
now have I proposed thus that
b. XsV
Et ce conseil nous vous donnons.
and this advice we to you give


Taking these examples to positively trigger the subject clitic parameter, we propose that they have the p-encodings in (65a) and (65b), respectively.
(65) a. XVs: [ * * 1 * 1 ] Subject clitics and V2
b. XsV: [ * * 1 * 0 ] Subject clitics and no V2
Since clitics can receive Case in ways unavailable to other nominal elements, sentences containing subject clitics provide no information about either nominative Case assignment parameter. The order verb-clitic in (64a) triggers a positive setting for the V2 parameter. On the other hand, since French subject clitics (then as now) do not attach to a verb and move with it (unlike object clitics), the order clitic-verb in (64b) triggers a negative value for the same parameter (but see below for further discussion of this kind of case). As mentioned earlier, MidF allowed, with growing frequency, other word orders that were not found in OF:
(66) a. XSV
Lors la royne fist Saintré appeller.
then the queen had Saintré called
b. (S)VY
Se appensa de faire ung amy.
(he) to himself thought to make a friend
(66a), combined with the greater frequency of SVO orders in MidF as compared to OF, shows that V2 began to “erode” at this period. Sentences like (66b) illustrate another phenomenon, noticed and analyzed by Vance (1989): the fact that null subjects increase their distribution in this period, no longer being licensed only in inversion contexts. Roberts (1993:sec. 2.3.5) analyzes this situation in terms of the idea that null subjects could be licensed under agreement as well as under government in MidF, whereas in OF they were licensed only under government. So MidF allowed a null subject in the following configuration:
(67) [IP [NP pro ] [I′ V + I ]]
The p-encodings for these orders are as follows:
(68) a. XSV: [ 1 * * * 0 ] Nominative under agreement and no V2
b. (S)VY: [ 1 * * 1 0 ] As in (a) with null subject
or [ * 1 * 1 1 ] Nominative under government, null subject, and V2

In interrogatives the same general situation holds as in declaratives. On the one hand, the same kinds of examples are found as in OF:
(69) a. whVSO
Que voelt ceste parolle dire?
what wants this word to say
‘What does this word mean?’
b. whVsO
A qui estes vous?
whose are you
(69a) has the same p-encodings as its OF counterpart:
(70) whVSO: [ * 1 * * * ] Nominative under government
(69b), on the other hand, no longer encodes nominative Case under government, since the subject has cliticized:
(71) whVsO: [ * * 1 * * ] Subject clitics
Let us now put together the MidF p-encodings:
(72) a. [ * 1 * * 1 ] Nominative under government and V2
b. [ 1 * * * 0 ] Nominative under agreement and no V2
c. [ * 1 * 1 1 ] Nominative under government, null subject, and V2
d. [ 1 * * 1 0 ] Nominative under agreement, null subject, and no V2
e. [ * * 1 * 1 ] Subject clitics and V2
f. [ * * 1 * 0 ] Subject clitics and no V2
g. [ * 1 * * * ] Nominative under government
h. [ * * 1 * * ] Subject clitics
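One rough way to compare the inventories in (58) and (72) is to count how many p-encodings require 1, require 0, or are silent at the V2 position. The sketch below does only that; it is an illustrative tally, not part of the authors' fitness metric, and it ignores the frequency of each word order. The list and function names are hypothetical.

```python
# p-encodings as five-character strings in the order of (36a-e).
OF_ENCODINGS = ["*1**1", "1***0", "*1*11", "1**10", "*1***"]   # (58a-e)
MIDF_ENCODINGS = ["*1**1", "1***0", "*1*11", "1**10",
                  "**1*1", "**1*0", "*1***", "**1**"]           # (72a-h)

def tally(encodings, i):
    """Count encodings that require 1, require 0, or say nothing at position i."""
    counts = {"1": 0, "0": 0, "*": 0}
    for e in encodings:
        counts[e[i]] += 1
    return counts

V2 = 4  # 0-based index of parameter (36e)
of_v2 = tally(OF_ENCODINGS, V2)
midf_v2 = tally(MIDF_ENCODINGS, V2)
print(of_v2, midf_v2)
```

In both periods the encodings for and against V2 are evenly split in this tally, so any difference in stability between OF and MidF has to come from something other than the inventory of p-encodings.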

In terms of p-encodings alone, the V2 parameter appears to be no more or less unstable than it was in OF. However, two factors distinguish the MidF situation from the OF one. First, the unambiguous trigger for V2, XVS order, was much less frequent in MidF than in OF. According to Marchello-Nizia (1979), the mean orders for three texts from the late 15th century are as follows:
(73) (X) VS = 10% SV (X) = 60%

This is a significant difference in frequency as compared to OF (see (61)). The second factor concerns the status of SVO clauses. As shown earlier, the “V2 parse” for these clauses is disregarded in ModF, yet it was favored in OF. In MidF there is total indeterminacy on this point: there is (infrequent) evidence for V2 in the form of XVS order, and there is evidence against V2 in the form of XSV. Any parsing device with a positive setting for V2 would engender a violation on this word order and would be disfavored by the fitness metric. Another factor that adds to the instability of V2 at this point is the development of left-dislocation with a resumptive pronoun (Priestley 1955, Kroch 1989). This is illustrated by the following example from Priestley: (74) Les autres arts et sciences, Alexandre les honoroit bien. the other arts and sciences Alexandre them-honored well The development of this type of construction led to shifting of the type described in (5) and (6). That is, the interaction between left-dislocation and V2 further obscured the latter due to surface “V3” orders. Kroch (1989: 215) shows that there is a real correlation between the rise of the construction in (74) and the loss of V2. The correlation results from the action of the fitness metric, which will judge a system of this type as relatively unfit. Late MidF V2 provides an instance of the situation described in section 1: learners are unable to converge on a single value for a parameter. In other words, the V2 parameter is maximally unstable. This case therefore exemplifies the “pathological” situation for acquisition. Since the available data cannot decide between two parametric values, other aspects of the fitness metric come into play: the Subset Condition and the elegance criterion. As noted earlier, a language with both V2 and left-dislocation will be disfavored by the Subset Condition, since it is a case of shifting. 
Another factor that can decide between competing parses, and therefore competing p-encodings and triggers, is the criterion of elegance. It is reasonable to suppose that learners follow a least effort strategy in that they try to assign the simplest possible parse to the input string.16 This idea can be instantiated in terms of counting nodes, traces, or chain positions. We will not attempt to

choose between those possibilities (Roberts (1993) opts for chain positions; for a formal statement of this, see his chapter 2, note 26); what is important here is that any parse that represents the inflected verb as being moved to C is more costly in terms of the least effort strategy than one that represents the verb as being moved only to I (by any of the above criteria). Suppose, then, that the least effort strategy plays a crucial role in resolving the instability in the data, by penalizing all p-encodings that depend on V-movement to C where there is a choice between this and V-movement to I. More technically, suppose that hypothesis h1 is identical to hypothesis h2 except that h2 allows for V2 in matrix declaratives whereas h1 does not. That is, h1 and h2 admit the same sentences and contain the same number of superset settings to parameters, differing only in the value for the V2 parameter. Hypothesis h2, then, systematically includes more structure in its representation than h1 since h2 will represent the verb as having moved to C (as well as movement of the subject in SVO). In other words, if h1 returns k nodes on a structure, h2 will return k + n nodes. Letting m represent the number of superset settings in each hypothesis, running each of the above through the fitness metric will yield the following ratings:
(75) a. h1: $1 - \frac{m + ck}{2m + c(2k + n)}$
b. h2: $1 - \frac{m + c(k + n)}{2m + c(2k + n)}$

m + c (k + n) m + ck is greater than 1 − , the fitness 2m + c (2k + n) 2m + c (2k + n) metric prefers h1 over h2 and the learner is under pressure to select h1. This, then, effectively sets the V2 parameter to 0. Like OF, MidF had one order where the V2 parameter was unambiguously p-encoded as 1: namely, XVS orders, which unambiguously p-encode [* 1 * * 1]. In the situation of instability that reigned in MidF, the fitness metric, formulated to take account of the way in which the least effort criterion resolves p-ambiguities, will lead to convergence on a grammar where such experience is simply disregarded (i.e., not parsed where no alternative analysis can be found).17 Thus, this case shows how an unambiguous trigger for a given property can be disregarded when the system is maximally unstable, even if the instability is located in another area of the grammar. If the hypotheses where the V2 parameter has a positive value are penalized, the only remaining triggers for nominative Case assignment under government are whVSO orders. This order, too, is only weakly triggered in 15th-century French. The difference between MidF and OF in this regard was that several new constructions were available in MidF, notably complex inversion (as in Où Jean est-il allé? ‘(lit.) where Jean is-he gone’) and Since 1 −

76 Robin Clark and Ian Roberts (qu’)est-ce que ‘(what) is-it that’.18 Whereas nominative Case assignment under agreement received strong support from the input data, nominative Case assignment under government received very little. Since the two parameters are in a shifting relationship, there was some pressure (built into the fitness metric, as shown in section 2) not to set them both to 1. In this situation, the fact that nominative Case assignment under government was only weakly triggered led to a change in the value of this parameter. The change to a system with nominative Case assignment under agreement entailed a change in the null subject parameter (already only weakly triggered, as (74) shows) for theory-internal reasons. Under the assumption that null subjects can only be licensed in positions where Case is assigned (Rizzi 1986a), once nominative Case could no longer be assigned under government, null subjects could no longer be licensed under government. In this way, French lost null subjects with no significant change in the verbal inflectional morphology. There is a complication here, however—namely, that MidF, unlike OF, also allowed null subjects to be licensed in configurations of agreement. Why were these null subjects lost along with those licensed in government configurations? Roberts (1993) answers this question in terms of a postulate concerning the identification of null subjects that we can phrase as follows: (76) Where null subjects are licensed only in configurations of agreement, they require a “pronominal” Agr for identification. A “pronominal” Agr is an Agr that morphologically distinguishes at least five persons—that is, an Agr of the kind found in languages like Spanish and Italian. French Agr is not pronominal in this sense, and indeed has not been since early in the OF period. 
The intuition behind (76) is that a system where null subjects are licensed under government requires less inflectional morphology to recover the content of those null subjects than one where the only licensing configuration is agreement, since government is a closer syntactic relation than agreement. A system that licenses null subjects both under government and under agreement, like MidF, tolerates a relatively poorer agreement morphology. Therefore, once null subjects could no longer be licensed under government in French, the relative “poverty” of the verbal morphology became crucial, and null subjects were also lost in contexts where they had been licensed under agreement. As Roberts shows, the parallel development of Northern Italian dialects, in particular Veneto, supports the postulation of (76). Thus, at the beginning of the MidF period (ca. 1300) the relevant parameter settings were those in (77a); by the end of this period (ca. 1500) they had become those in (77b). (77) a. 11011 (= OF) b. 10100(= ModF)

It is clear that the crucial element of instability was created by the gradual erosion of V2 as a rigid constraint on word order in matrix declaratives. In particular, the introduction and spread of XSV orders brought about a situation that eliminated a crucial trigger for nominative Case assignment under government: XVS order. The previous discussion shows how the genetic algorithm approach to learnability, and in particular the fitness metric, can shed light on this. What seems to have happened is that V2 was mildly unstable in, say, 1300 (recall the discussion at the end of section 3.3) in the sense that non-V2 parses for certain types of sentence (e.g., SVO) were close competitors for V2. These competitors generated “mutant” word orders, notably XSV, which were highly successful. The critical point was reached in the late 15th century, when V2 was eliminated. For completely contingent reasons (which concern the overall organization of the MidF grammatical system), the loss of V2 led to the loss of nominative Case assignment under government. And for reasons having to do with the organization of UG, this entailed the loss of null subjects. Moreover, Roberts (1994 [this volume, Chapter 11]) argues that this in turn led to the loss of clitic climbing (also see Kayne 1989). This account of syntactic changes in the history of French illustrates how syntactic change can be internally driven: change in one parameter can destabilize another. We will provide another example of this in section 4. However, we now find ourselves up against the problem posed by innovations: how were XSV orders introduced into a V2 system? Since these orders are ungrammatical in modern V2 Germanic languages, their introduction into a V2 system requires some comment. If we say that the weakening of V2 was a condition for this development, we risk falling into an unproductive regress.
It was in part for this reason that we avoided the issue earlier and simply took this innovation as given. However, there are good reasons to think that the introduction of XSV order is related to the cliticization of subject pronouns. Adams (1987b) points out that the overwhelming majority of early cases of XSV involved a pronominal subject. As Adams suggests, it is possible that XSV originates from cases of V2 where the clitic subject pronoun is not counted in determining V2.19 If Adams’s idea is correct, then the initial stimulus for the erosion of V2 comes from a morphophonological change in the subject pronouns. As is frequently the case, syntactic change can be traced back to extrasyntactic factors, although the relationship between the extrasyntactic factors and the syntactic changes they cause can be extremely indirect. This is because instability, once introduced, can propagate through a grammatical system.

4. Some Concluding Remarks

Here we wish to address some of the wider issues raised by our case study of language change. These concern the shifting relationship between the nominative Case parameters in section 3.1 with respect to the OF data and what our approach has to say about the classic questions for diachronic linguistics concerning the nature of innovation and loss. How is it that a massively unstable system of parameter settings, like the one in MidF, can come into being in the first place? Of course, factors external to the syntax can destabilize a syntactic system, but we believe that instability can propagate within a syntactic system and that exactly this has happened in the history of French. Consider again the p-encodings for the OF data:

(78) a. [*1**1]
     b. [1***0]
     c. [*1*11]
     d. [1**10]
     e. [*1***]
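As a minimal sketch of how p-encodings such as those in (78) relate to full parameter settings, the following treats each p-encoding as a pattern over five binary parameters in which `*` leaves a parameter unconstrained, and checks each pattern against the OF setting 11011 from (77a). The names `compatible`, `discarded`, `P_ENCODINGS`, and `OF_SETTING` are illustrative assumptions and are not part of the original model.

```python
# Illustrative sketch only: a p-encoding is read as a pattern over binary
# parameters, where "*" means the string places no constraint on that
# parameter. All names below are hypothetical.

def compatible(p_encoding: str, setting: str) -> bool:
    """Return True if a full parameter setting matches a p-encoding."""
    if len(p_encoding) != len(setting):
        raise ValueError("p-encoding and setting must have the same length")
    return all(p in ("*", s) for p, s in zip(p_encoding, setting))

# The p-encodings in (78) and the OF setting given in (77a).
P_ENCODINGS = {
    "78a": "*1**1",
    "78b": "1***0",
    "78c": "*1*11",
    "78d": "1**10",
    "78e": "*1***",
}
OF_SETTING = "11011"

def discarded(setting: str) -> list[str]:
    """Labels of the p-encodings that the given setting cannot match."""
    return [label for label, enc in P_ENCODINGS.items()
            if not compatible(enc, setting)]

if __name__ == "__main__":
    # The two p-encodings whose final (V2) position is 0 fail to match.
    print(discarded(OF_SETTING))
```

On this reading, a grammar with the final parameter set to 1 has no use for the two p-encodings that require it to be 0, which is one way of picturing what it means for a parse to be discarded by a given grammar.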

Bearing in mind that the correct grammar for OF did not contain nonV2 parses (i.e., the p-encodings in (78b) and (78d) are discarded in the correct grammar), it seems that nominative Case assignment under agreement had a quite precarious status in OF. There was another trigger for nominative Case assignment under agreement, however: the fact that subordinate clauses regularly had SVO order (assuming, contra Lightfoot (1989, 1991), that subordinate word order can trigger parameter settings). Thus, it is the fact that OF had a root/embedded asymmetry with respect to V2 order that is crucial for triggering nominative Case assignment under agreement. Now, there is evidence that early OF (prior to ca. 1200) allowed embedded V2 (Cardinaletti and Roberts, 2002 [this volume, Chapter 12], Dupuis 1989, Hirschbuhler 1990). This means that nominative Case assignment under agreement was an OF innovation, emerging in subordinate clauses as V2 became a uniquely root phenomenon. This innovation started the chain of changes leading to the MidF innovations that were crucial to our account in section 3.4 (and hence to the later changes discussed there). Assume that an archaic stage of OF did not allow nominative Case assignment under agreement. How can Case assignment under agreement arise? Notice that when such assignment comes into the grammar, a shifted system is introduced on the basis of a nonshifted one. Following an idea originally due to Cardinaletti (1990), let us suppose that expletive elements can never topicalize. In a V2 system, however, Spec of C’ is a topic position: it is an Ā-position and a position that does not receive Case. Cardinaletti proposes that when an expletive occupies this position, as frequently happens in the V2 Germanic languages, the position is able to count as an A-position in that (nominative) Case can be assigned there. 
Thus, we can attribute the introduction of nominative Case assignment under agreement to the introduction of a lexical expletive capable of occupying Spec of CP in matrix declaratives. OF had a lexical expletive il that appeared in Spec of C’ in examples like (79) (from Einhorn 1974: 123).

(79) Il ne me chaut.
     it not to me matters
     ‘It doesn’t matter to me.’

Supposing that this construction emerged in archaic OF, we can then say that nominative Case assignment under agreement was triggered by this kind of example. Finally, let us briefly consider what implications our proposals may have for traditional preoccupations of diachronic syntax: the nature of innovation and the nature of loss. Of course, it should be immediately clear that the conception of how grammatical systems differ from one another that lies at the heart of the principles-and-parameters approach means that parameters themselves never change.20 What changes over time are parametric values. Nevertheless, at the level of constructions (e.g., available word order types) it is clear that possibilities are both innovated and lost. In our terms, innovation may arise from one of two sources: either internally, when a parametric change makes new constructions available, or externally, when phonological or morphological change weakens evidence for certain hypotheses. The second type of innovation is likely to lead to instability at the level of parameter settings, as in the case of the introduction of XSV orders triggered by the cliticization of subject pronouns in MidF.21 Concerning loss, it seems that only parametric change can truly eliminate a construction in the sense that construction C is accepted by native speakers of language L at time T and rejected at T′ (T > T′). This has been the fate of simple inversion, V2, and null subjects in French. In terms of the standard view of language acquisition, this situation seems problematic. Put very simplistically, why is one generation’s trigger experience the next generation’s fossil? This is the logical problem of language acquisition again. Various solutions have been proposed, but we believe we have discovered a new and interesting one.
An approach to learnability based on a genetic algorithm including a version of the fitness metric makes it possible to see how a data point can be disregarded in a situation of instability (where instability can be formalized); this was what happened in the case of XVS orders in 15th-century French. Although relatively infrequent and often parsable as some other construction, XVS was certainly found in 1500, and so, given the standard assumption that parameters can be set on the basis of quite impoverished experience, an account of loss based on frequency considerations alone will not answer the fundamental question. The fitness metric, properly formulated so that frequency and other considerations are taken into account, seems able to resolve this tension between standard views of acquisition and the fact that structures are lost in the course of language change, since it can be seen why one class of input strings may be rendered unparsable. This can happen even where, as in the case of XVS orders, the input in question is intrinsically simple and structurally “transparent”; here we see a major difference between our account and the approach to language change based exclusively on something like Lightfoot’s (1979) Transparency Principle, although we believe our approach retains the basic insight behind the Transparency Principle in the elegance part of the fitness metric.22 Another important consideration that emerges from our discussion is that exactly the same string Si can successfully trigger a parameter setting P(υ1) in one grammatical system Gi, but fail to trigger P(υ1) in system Gj ≠ Gi. French XVS order is a case of exactly this sort, where Gi is the grammar of OF and Gj that of late MidF. In terms of the genetic algorithm, Si can trigger a successful hypothesis or an unsuccessful one. As in the biological world, successful propagation depends as much on the external environment as on internal properties, so that little can be predicted purely on the basis of internal structural criteria. It is this aspect of the genetic algorithm that makes possible a deeper understanding of language change and demonstrates how successive generations may treat the “same” trigger experience differently. Note also that in these terms, language change refers not only to the “limit cases” of innovation and loss, but also to the varying success of strings in encoding viable parameter settings. Our approach also has implications for the theory of markedness. It is part of the classical concept of markedness that marked properties are both diachronically unstable and “difficult” in terms of acquisition. A shifted system of parameter settings can be thought of as a marked system. It is clear from our discussion that a shifted system is diachronically unstable. Consider again the shifted system discussed in section 2, which featured both V2 and left-dislocation.
Neither V2 nor left-dislocation is marked on its own (note the stability of Germanic V2 and the fact that all periods of French feature left-dislocation of one kind or another); however, their combined presence in a system leads to markedness—witness the instability of MidF.23 So we suggest that in general markedness, rather than being an inherent property of certain parameter values, is a property that derives from the interaction of parameters in a given grammatical system, relative to the fitness metric. This in turn implies that a given parameter value can be marked in one grammatical system (or at one period) and unmarked in another system (e.g., at another period). Diachronic studies of the type discussed here also have important implications for the study of language learnability and language acquisition. As discussed briefly above, diachronic change represents a type of “pathological” learning, where learners systematically arrive at the wrong grammar for the target language. Strictly speaking, these are cases where learners fail. We would argue that learners fail for reasons that reveal something important about their internal structure. Parametric change is the result of an input text that places indifferent pressure on the learner’s hypotheses; several different grammars can provide an acceptable account for the input text. We have shown that other factors, always related to the learner’s internal fitness metric, come into play to distinguish between the competing hypotheses.

These factors involve the Subset Condition and a measure of elegance. Let us return to the fitness metric, repeated here as (80).

(80) The fitness metric

$$\frac{\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j\right) - (\upsilon_i + b s_i + c e_i)}{(n - 1)\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j\right)}$$
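The metric in (80) can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the authors' implementation: the names `Hypothesis`, `cost`, and `fitness` are hypothetical, the numeric values are made up, and the two-hypothesis comparison assumes equal violations (taken as zero) and b = 1, which is how the ratings in (75) appear to be obtained from (80).

```python
# Illustrative sketch of the fitness metric in (80).
# Names and numeric values are hypothetical, not from the original.
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    violations: float  # v_i: violations on the input text
    supersets: float   # s_i: number of superset settings
    elegance: float    # e_i: elegance cost, e.g. a node count

def cost(h: Hypothesis, b: float, c: float) -> float:
    """Weighted penalty v_i + b*s_i + c*e_i for one hypothesis."""
    return h.violations + b * h.supersets + c * h.elegance

def fitness(population: list[Hypothesis], i: int, b: float, c: float) -> float:
    """Fitness of population[i] relative to the whole population, as in (80)."""
    n = len(population)
    if n < 2:
        raise ValueError("need at least two hypotheses")
    total = sum(cost(h, b, c) for h in population)
    if total == 0:
        raise ValueError("total penalty is zero; fitness is undefined")
    return (total - cost(population[i], b, c)) / ((n - 1) * total)

if __name__ == "__main__":
    # Two hypotheses as in (75): the same number m of superset settings,
    # with h2 needing n_extra more nodes than the k nodes of h1.
    m, k, n_extra = 2, 10, 3
    b, c = 1.0, 0.5
    h1 = Hypothesis(0, m, k)
    h2 = Hypothesis(0, m, k + n_extra)
    f1 = fitness([h1, h2], 0, b, c)
    f2 = fitness([h1, h2], 1, b, c)
    print(f1 > f2)  # the hypothesis with fewer nodes is rated fitter
```

With only two hypotheses the metric reduces to one minus the hypothesis's share of the total penalty, so the hypothesis with the smaller weighted penalty always receives the higher rating.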

Our study of diachronic change reveals certain facts about the scaling constants b and c. We assume that empirical coverage of the input text is the learner’s central interest; thus, violations (calculated by $\sum_{j=1}^{n} \upsilon_j$ for the population and by $\upsilon_i$ for the individual) are the single most important factor in the equation. Both superset settings and elegance are scaled down by the constants b and c, respectively. Let us now consider what the relative magnitudes of b and c are. At a certain point, French was a V2 language that allowed for left-dislocation (the latter associated with atonic pronouns), and it was a shifted language that would be selected against by the fitness metric. Furthermore, the relative frequency of structures that would have required both V2 and left-dislocation was relatively low, placing little pressure on the learner in terms of violations. All else being equal, learners could have preferred either a language with matrix V2 and no left-dislocation or a language with left-dislocation and no matrix V2. Notice that left-dislocation is a superset parameter; a language that allows left-dislocation in addition to its basic word order is a superset of a language that allows only the basic word order. We argued, on the other hand, that matrix V2 led to more complex representations, relative to the input text, than a grammar without matrix V2. Now, the changes we have illustrated in French involve the abandonment of matrix V2, a nonsuperset parameter, and the persistence of left-dislocation, a superset parameter. Given our premises, then, the fitness metric must have preferred a grammar that generated an elegant set of representations and a superset language over a grammar that generated inelegant representations and a subset language.
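The comparison just described can be made explicit as a worked inequality. This is a sketch under simplifying assumptions that the text does not state: the two candidate grammars incur the same violations $\upsilon$, the left-dislocation grammar (LD) has exactly one more superset setting than the V2 grammar, and the V2 grammar yields $n$ more nodes than the $k$ nodes of its competitor (here $n$ is the node difference, as in the earlier h1/h2 comparison, not the population size). Writing $t_i = \upsilon_i + b s_i + c e_i$ for the weighted penalty of hypothesis $i$, the fitness metric rates the hypothesis with the smaller $t_i$ as fitter.

```latex
\begin{align*}
t_{\mathrm{V2}} &= \upsilon + b\,s + c\,(k + n) \\
t_{\mathrm{LD}} &= \upsilon + b\,(s + 1) + c\,k \\
t_{\mathrm{LD}} < t_{\mathrm{V2}} &\iff b < c\,n
\end{align*}
```

Under these assumptions the left-dislocation grammar is preferred whenever $b < cn$; since $n \geq 1$, it is sufficient that elegance be weighted more heavily than superset settings.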
Thus, learners appear to consider elegance a more important factor than superset settings when evaluating hypotheses:

(81) c > b

Thus, our study of diachronic change has enabled us to make a concrete hypothesis about how learners evaluate parameter settings. We can now test this hypothesis against actual child grammars, perhaps by attempting to characterize successive developmental stages in child language. In general, we should see children avoiding grammars that create inelegant representations. More to the point, we should find children resisting grammars that force longer chains to the point of, temporarily at least, preferring grammars with superset settings if these grammars can approximate the target.

We have shown how a theory of language learning based on a genetic algorithm affords a novel and insightful account of language change, taking as our case study the development of word order and null subjects in French. We believe that our account sheds light both on the mechanisms of language change and on those of language acquisition, and goes some way toward building a bridge between these two domains; in this respect, our work is conceptually very close to work by Lightfoot (1989, 1991). Moreover, we have shown that it is possible to characterize the markedness of systems and to clearly see the role played by such factors as elegance and frequency of input, and the interactions between these factors. We know of no other approach to language learnability and language change that achieves these results.

Notes

The first author received support from grant 11–25362.88 from the Fonds national suisse pour la recherche scientifique and from a grant from the Fondation Ernst et Lucie Schmidheiny. This article has greatly benefited from comments made by two anonymous reviewers for Linguistic Inquiry.

1. The first person to formulate this problem in terms of generative syntax was Lightfoot 1979. 2. Genetic algorithms were developed by Holland; see particularly Holland 1975. Goldberg 1989 provides a comprehensive overview of the technique; see also Booker, Goldberg, and Holland 1990. Clark 1990 develops a model of parameter setting in terms of genetic algorithms as an approach to demonstrating the learnability property. See also Clark 1992 for a comprehensive theoretical treatment. 3. We will assume, with many researchers in developmental psycholinguistics, that an input text consists of short, simple, grammatical sentences. Little in the present discussion hinges on the precise nature of the text, so long as the basic constructions of the language are adequately exemplified. For further discussion (and debate) on the nature of the input evidence, see Wexler and Culicover 1980, Lightfoot 1989, and the discussion of the latter work. For a formal characterization of the input evidence and its relation to learning, see Osherson, Stob, and Weinstein 1986. 4. As we will show, shifting is more than a logical possibility and serves to force parametric change over time. 5. Space prevents a comprehensive discussion of this class of algorithms; see Goldberg 1989 for a general introduction and Clark 1990, 1992, for an application to the learnability problem for natural languages. 6. Genetic algorithms are part of a class of algorithms that approximate some desired optimum but are not absolutely guaranteed to return the optimum. Other such algorithms include the “simulated annealing” found in some applications using neural networks, as pointed out by an anonymous reviewer.
The property of returning a result that is “probably approximately correct” (PAC) is important for our purposes since such approximations are the fuel for language change (see the discussion of PAC learning in Natarajan 1991). We have selected genetic algorithms from the class of PAC algorithms because genetic algorithms incorporate a notion of relative fitness and for the formal clarity of the resulting model of parameter setting. We will argue below that this notion of fitness provides some insight into the nature of the learner and how properties of the learner govern diachronic change. 7. See Clark 1992 for an extensive formal discussion of fitness and reproduction and of their influence on convergence. Here, we will mainly be concerned with the intuitions that underlie the formalism. 8. For simplicity, we assume that the learner has access to a table that tells it which settings are superset settings; this is much simpler than forcing the learner to calculate whether or not a given parameter value generates a superset language. Note that shifting relations will not be included on the table. These will be selected against by the fitness metric in an indirect way. 9. The results discussed here receive a more formal discussion in Clark 1992, where proofs of certain theorems entailed by the fitness metric are given. For present purposes, the important point is that, relative to an input text, the fitness metric drives the learner toward a hypothesis that minimizes the number of violations and the number of superset settings and that generates the most elegant syntactic representations possible, given that grammatical violations are avoided. 10. The notion of p-encoding defined here is essentially isomorphic to that of “schemata,” which has been widely discussed in the genetic algorithm literature (see, in particular, Goldberg 1989 and the references cited there). There is one important difference, however; schemata are usually taken as ranging over empirical generalizations whereas p-encodings represent the ambiguities inherent in the input stream. The two are similar in that p-encodings represent the set of grammars that can, in principle, assign a well-formed representation to a given string. 11.
Vance (1989) in fact shows that 15th-century MidF null subjects could be licensed under agreement as well as government. Nevertheless, both null subjects licensed under government and null subjects licensed under agreement are lost with simple inversion in the 16th century. Roberts (1993:sec. 2.4.3) proposes that the loss of null subjects where they were licensed under government also entailed their loss throughout the system on the basis of the idea that, for null subjects to be licensed only under agreement, a very rich “pronominal” morphology is required. This type of morphology is found in Italian and Spanish but not in MidF or ModF. Hence, the “poverty” of French agreement, combined with the change in the nominative Case parameter, led to the loss of null subjects everywhere. We will discuss Vance’s data further below. 12. In our presentation, we abstract away from the “split Infl” hypothesis of Pollock (1989), restricting ourselves to projections of I. To fully account for the facts of ModF inversion, however, it is necessary to split I into at least Agr and T (and their projections). In terms of the “Agr over T” system proposed by Belletti (1990), our nominative Case parameter refers to Agr. To account for stylistic inversion, we probably need to say that T can assign nominative Case to a postverbal subject under government (see Rizzi 1990). (Also see note 13.) 13. Literary ModF allows strings that appear to be V2—for example, Dans cette maison vécut Racine ‘In this house lived Racine’. However, such examples should be treated as instances of stylistic inversion. Stylistic inversion differs from V2 and subject-clitic inversion in that the subject appears in a position following the entire verbal complex in a compound tense and is not sensitive to the root/embedded distinction, unlike true V2. (See Kayne and Pollock 1978 and Pollock 1986.) In fact, Pollock (1986) suggests that stylistic inversion may involve a nonreferential null subject in Spec of Iʹ. 
If so, ModF allows at least some highly restricted occurrences of null subjects and (36d) should therefore be reformulated to refer to referential null subjects. 14. In the case of OF, as in the case of all languages now without native speakers, negative evidence in the form of grammaticality judgments is unavailable. Linguists working on such languages are in a situation almost analogous to that of children acquiring their native language, although in fact the linguists’ situation is worse since they have no access to UG and their data are seriously degenerate owing to dialect mixture, scribal error, and so on. Unlike children, however, linguists have no access to a regular input text. Children are surrounded by native speakers producing grammatical utterances. Linguists obviously are not, since all the native speakers are dead. 15. In fact, there are reasons to think that in the position immediately following the inflected verb, as in (52b), these pronouns did cliticize in OF (see Dupuis 1989:119f., Roberts 1993:sec. 2.2.2, and Vance 1989: 70ff.). However, Roberts argues that the crucial step in the development of the system of subject pronouns in French was the emergence of complementary distribution between the je-series and the moi-series. This happened because the cliticization of the je-pronouns became obligatory in MidF. What the OF evidence shows is that these pronouns were optionally clitics in that they cliticized only in certain contexts. In other contexts, such as those in (59) and (60), these pronouns were clearly tonic. It may be, then, that the correct formulation of the parameter in (36c) should refer to obligatory cliticization of nominative pronouns, or, more likely, to the existence of a special series of clitic pronouns. Note that in the latter case the trigger for the parameter is morphological: the learner must recognize two paradigms of subject pronouns. 16.
This idea is discussed at length in the context of syntactic change by Roberts (1993), who notes the close resemblance between this idea and the Transparency Principle proposed by Lightfoot (1979). Also see de Vincenzi’s (1989) proposal that something of this kind is a general parsing strategy, not limited to language learners. Note that the least effort strategy as conceived here is not a principle of grammar; in this we differ from Chomsky (1991). 17. An alternative analysis is often available. Roberts (1993:sec. 2.4.1) shows that many cases of V2 could be treated as “free inversion.” 18. For a synchronic analysis of the former construction, see Rizzi and Roberts 1989 [this volume, Chapter 9], and for a discussion of its diachronic development, Roberts 1993:sec. 2.3.4. Concerning the development of the latter construction as a nonemphatic interrogative, see Foulet 1921. 19. We do not want to propose that preverbal subject pronouns in MidF or ModF are syntactic clitics; rather, following Kayne (1983), we believe that these pronouns cliticize only in PF. However, the ultimately unsuccessful hypothesis that these pronouns were indeed syntactic clitics could nevertheless have given rise to XSV orders at the time when the subject-pronoun system was undergoing change. See Roberts 1993 for a more elaborated approach. 20. Except perhaps at the higher diachronic level of phylogenetic change; it is a reasonable assumption that the set of parameters available to modern Homo sapiens is not the same as the set that was available to the first hominids with a language faculty. Of course, we are concerned in the text with changes in the recorded history of languages that by assumption fall within the set of human languages, so this question does not arise. 21. There is at least a metaphorical sense in which cases like XSV are successful rogue hypotheses, where success is determined by the least effort criterion. 
This is mutation at the level of constructions, not at the level of parameters, so the mutation operator of section 2 is presumably not relevant.

22. More recently, Lightfoot (1991) has proposed a new approach to change based on “Degree-0 learnability.” A detailed comparison of that approach with the one developed here is beyond the scope of this article (though see Clark, in preparation). 23. Modern German also has left-dislocation, but with a tonic resumptive pronoun. On the other hand, MidF left-dislocation featured atonic resumptive pronouns. This was yet another way in which the clitic nature of pronouns in MidF created instability.

References

Adams, Marianne. 1987a. From Old French to the theory of pro-drop. Natural Language and Linguistic Theory 5:1–32. Adams, Marianne. 1987b. Old French, null subjects and verb second phenomena. Doctoral dissertation, UCLA, Los Angeles, Calif. Baker, Mark. 1988. Incorporation: A theory of grammatical function changing. Chicago: University of Chicago Press. Belletti, Adriana. 1990. Generalized verb movement: Aspects of verb syntax. Turin: Rosenberg and Sellier. Berwick, Robert. 1985. The acquisition of syntactic knowledge. Cambridge, Mass.: MIT Press. Booker, L. B., David E. Goldberg, and John H. Holland. 1990. Classifier systems and genetic algorithms. In Machine learning: Paradigms and methods, ed. Jaime Carbonell, 235–282. Cambridge, Mass.: MIT Press. Cardinaletti, Anna. 1990. Pronomi nulli e pleonastici nelle lingue germaniche e romanze: Saggio di sintassi comparata. Dottorato di ricerca in linguistica, Università di Padova. Cardinaletti, Anna, and Ian Roberts. 2002. Clause structure and X-second. In Functional structure in DP and IP: The cartography of syntactic structure, volume one, ed. G. Cinque, 123–166. New York/Oxford: Oxford University Press. [This volume, Chapter 12.] Chomsky, Noam. 1977. On wh-movement. In Formal syntax, ed. Peter Culicover, Thomas Wasow, and Adrian Akmajian, 71–132. New York: Academic Press. Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris. Chomsky, Noam. 1991. Some notes on economy of derivation and representation. In Principles and parameters in comparative grammar, ed. Robert Freidin, 417–454. Cambridge, Mass.: MIT Press. Clark, Robin. 1990. Papers on learnability and natural selection. Technical Reports in Formal and Computational Linguistics, No. 1. Université de Genève. Clark, Robin. 1991. A computational model of parameter setting. Paper presented at the American Association for Artificial Intelligence Spring Symposium on Machine Learning, Natural Language, and Ontology. Stanford, Calif.
Clark, Robin. 1992. The selection of syntactic knowledge. Language Acquisition 2:85–149. Clark, Robin. In preparation. Finitude, boundedness, and approximate learning of natural languages. Ms., Université de Genève. Darwin, Charles. 1859. On the origin of species. London: John Murray. Dupuis, Fernande. 1989. L’expression du sujet dans les subordonnées en ancien français. Thèse de Ph.D., Université de Montréal, Montréal, Québec. Einhorn, Einar. 1974. Old French: A concise handbook. Cambridge: Cambridge University Press.

86 Robin Clark and Ian Roberts Everett, Daniel. 1986. Pirahã clitic doubling and the parametrization of nominal clitics. In MIT working papers in linguistics 8, 85–127. Department of Linguistics and Philosophy, MIT, Cambridge, Mass. Everett, Daniel. 1989. Clitic doubling, reflexives and word order alternations in Yagua. Language 65:339–372. Foulet, Lucien. 1921. Comment ont évolué les formes de l’interrogation? Romania 47:243–348. Foulet, Lucien. 1982. Petite syntaxe de l’ancien français. 3d ed. Paris: Editions Champion. Gold, E. M. 1967. Language identification in the limit. Information and Control 16:447–474. Goldberg, David. 1989. Genetic algorithms in search, optimization, and machine learning. Reading, Mass.: Addison-Wesley. Haldane, J. B. S. 1990. The causes of evolution. Princeton, N.J.: Princeton University Press. Harris, Martin B. 1978. The development of French syntax: A comparative approach. London: Longmans. Hirschbuhler, Paul. 1990. La légitimation de la construction V1 à sujet nul dans la prose et le vers en ancien français. Revue québécoise de linguistique 19:32–55. Holland, John. 1975. Adaptation in natural and artificial systems. Ann Arbor, Mich.: University of Michigan Press. Kayne, Richard. 1975. French syntax. Cambridge, Mass.: MIT Press. Kayne, Richard. 1983. Chains, categories external to S, and French complex inversion. Natural Language and Linguistic Theory 1:109–137. Kayne, Richard. 1989. Null subjects and clitic climbing. In The null subject parameter, ed. Osvaldo Jaeggli and Ken Safir, 239–261. Dordrecht: Kluwer. Kayne, Richard, and Jean-Yves Pollock. 1978. Stylistic inversion, successive cyclicity, and Move NP in French. Linguistic Inquiry 9:595–621. Kiparsky, Paul. 1982. Explanation in phonology. Dordrecht: Foris. Koopman, Hilda, and Dominique Sportiche. 1991. The position of subjects. In The syntax of verb-initial languages, ed. James McCloskey, 211–258. Amsterdam: Elsevier. [Special issue of Lingua 85.] Kroch, Anthony. 1989. 
Reflexes of grammar in patterns of language change. Language Variation and Change 1:199–244. Lightfoot, David. 1979. Principles of diachronic syntax. Cambridge: Cambridge University Press. Lightfoot, David. 1989. The child’s trigger experience: Degree-0 learnability. Behavioral and Brain Sciences 12:321–334; commentary 334–375. Lightfoot, David. 1991. How to set parameters. Cambridge, Mass.: MIT Press. Lightfoot, David, and Norbert Hornstein. 1981. Explanation in linguistics. London: Longmans. Marchello-Nizia, C. 1979. Histoire de la langue française aux XIVe et XVe siècles. Paris: Bordas. Natarajan, Balas. 1991. Machine learning: A theoretical approach. Palo Alto, Calif.: Morgan Kaufmann. Osherson, Daniel, Michael Stob, and Scott Weinstein. 1986. Systems that learn: An introduction to learning theory for cognitive and computer scientists. Cambridge, Mass.: MIT Press. Paul, Hermann. 1920. Prinzipien der Sprachgeschichte. 5th ed. Halle: Niemeyer. Poletto, Cecilia. 1990. Diachronic development of subject clitics. Talk given at the Crucial Languages Workshop, Université de Genève. Pollock, Jean-Yves. 1986. Sur la syntaxe de en et le paramètre du sujet nul. In La grammaire modulaire, ed. Mitsou Ronat and Daniel Couquaux, 211–246. Paris: Editions de Minuit.

Pollock, Jean-Yves. 1989. Verb movement, Universal Grammar, and the structure of IP. Linguistic Inquiry 20:365–424. Price, Glanville. 1971. The French language: Present and past. London: Edward Arnold. Priestley, Lawrence. 1955. Reprise constructions in French. Archivum Linguisticum 7:1–28. Renzi, Lorenzo. 1983. Fiorentino e italiano: Storia dei pronomi personali soggetto. In Italia linguistica: Idee, storia, struttura, ed. F. Albano Leoni et al., 223–239. Bologna. Renzi, Lorenzo, and Laura Vanelli. 1983. I pronomi soggetto in alcune varietà romanze. In Scritti linguistici in onore di Giovan Battista Pellegrini, 121–145. Pisa. Rizzi, Luigi. 1986a. Null objects in Italian and the theory of pro. Linguistic Inquiry 17:501–557. Rizzi, Luigi. 1986b. On the status of subject clitics in Romance. In Studies in Romance syntax, ed. Osvaldo Jaeggli and Carmen Silva-Corvalàn, 391–419. Dordrecht: Foris. Rizzi, Luigi. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press. Rizzi, Luigi, and Ian Roberts. 1989. Complex inversion in French. Probus 1:1–30 [this volume, Chapter 9]. Roberts, Ian. 1993. Verbs and diachronic syntax. Dordrecht: Kluwer. Roberts, Ian. 1994. Two types of head-movement in Romance. In D. Lightfoot & N. Hornstein (eds) Verb Movement. Cambridge: Cambridge University Press, pp. 207–242 [this volume, Chapter 11]. Schwartz, Bonnie, and Sten Vikner. 1996. The verb always leaves IP in V2 clauses. In A. Belletti & L. Rizzi (eds) Parameters and Functional Heads: Essays in Comparative Syntax. New York/Oxford: Oxford University Press, pp. 11–62. Thurneysen, Robert. 1892. Die Stellung des Verbums im Altfranzösischen. Zeitschrift für Romanische Philologie 16:289–371. Vance, Barbara. 1989. Null subjects and syntactic change in medieval French. Doctoral dissertation, Cornell University, Ithaca, N.Y. Vanelli, Laura, Lorenzo Renzi, and Paola Benincà. 1986. Typologie des pronoms sujets dans les langues romanes.
In Actes du XIIe Congrès de Linguistique et Philologie Romanes. Aix-en-Provence. de Vincenzi, Maria. 1989. Syntactic parsing strategies in a null subject language. Doctoral dissertation, University of Massachusetts, Amherst. Wexler, Kenneth, and Peter Culicover. 1980. Formal principles of language acquisition. Cambridge, Mass.: MIT Press.

3

Object Movement and Verb Movement in Early Modern English Ian Roberts

0. Introduction

This paper provides evidence that an earlier stage of English had a rule of ‘object shift’, similar to that found in the Mainland Scandinavian (MSc) languages. The evidence of object shift in English sheds light on the nature of object shift in general and provides a new perspective on the well-known loss of overt verb-movement in the history of English. We begin by illustrating the phenomenon of object shift from Swedish and Danish, drawing on the important work by Holmberg (1986, 1991) and Vikner (1989, 1994). In our discussion of MSc, we underline the central fact about object shift: the object moves just when the verb moves. This is section 1. Having illustrated object shift in MSc, we turn in section 2 to the English data. What we show is that Early Modern English (ENE) of the 16th century had object shift of a type very similar to that found in MSc, in particular in that the connection between object movement and verb-movement is attested. The ENE facts are thus amenable to an analysis parallel to that of MSc. Similarly, the loss of object shift since ENE can be naturally connected to the loss of overt verb-movement, and we can thus explain the absence of shifted objects in Modern English (NE) in terms of the absence of overt verb-movement. Section 3 elaborates on the analysis, showing how a small extension of Chomsky’s (1993) system of feature-checking, head-movement and locality can provide a straightforward account of object shift in MSc and ENE, and of the diachronic development of English. The analysis also extends, at least in part, to Icelandic and Faroese.

We are led to two main conclusions on the basis of the observation that object shift is attested in English for as long as verb-movement is. First, we see that the English pronoun system is essentially parallel to that of the MSc languages. In particular, English pronouns are not cross-linguistically unusual in any sense. Their cross-linguistically unusual syntax derives from the fact that, in the absence of overt verb-movement, they never (or almost never) occupy a ‘special’ syntactic position. Similarly, the English pronouns have not changed since ENE; what has changed in English is AgrS, in that overt verb-movement is no longer possible (for main verbs).

1. Object Shift in Mainland Scandinavian

Holmberg (1986, 1989, 1991) and Vikner (1989, 1994) discuss the phenomenon of object shift in MSc. In these languages, unstressed pronominal objects are obligatorily moved leftward out of VP if the main verb moves out of VP (here and throughout, object pronouns are assumed to be unstressed). Taking the negative adverb ikke to be at the left margin of VP (whatever its precise position may be), the following Danish examples illustrate:

(1) a. Hvorfor læste studenterne ikke [t artiklen]?
       Why read the-students not the-article?
    b. *Hvorfor læste studenterne artiklen ikke [t t]?
       Why read the-students the-article not?
    ‘Why didn’t the students read the article?’
(2) a. *Hvorfor læste studenterne ikke [t den]?
       Why read the-students not it?
    b. Hvorfor læste studenterne den ikke [t t]?
       Why read the-students it not?
    ‘Why didn’t the students read it?’

In all these examples the inflected verb has moved to C, as is usual in both declarative and interrogative main clauses in MSc since these are V2 languages (see Vikner (to appear, ch. 2)). In (1), the non-pronominal direct object DP artiklen cannot be moved out of VP, as the ungrammaticality of (1b) shows. In (2), we observe the converse behaviour of the pronominal object: where the verb leaves VP, so must the object pronoun. (2a) is ungrammatical because the object has remained in VP while the verb has moved out of VP. In (2b), the object pronoun, although it has left VP, has not ‘followed’ the verb to C. This is evident from the relative positions of the object pronoun and the subject DP here. There is no reason to say that the subject DP is anywhere other than in its usual Spec-AgrS position, and the object pronoun cannot precede this DP.1 Hence the object pronoun does not move to C, but moves to a position somewhere in IP but outside VP.
The data in (1) and (2) provide us with the essentials of object shift, and illustrate the two basic generalizations about the phenomenon: a) The object pronoun leaves the VP when the verb does; b) The object pronoun does not ‘follow’ the verb to C but instead remains in I.
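Generalization (a) can be restated as a small predicate over hand-coded clause descriptions. The following Python sketch is illustrative only: the `Clause` record and its field names are hypothetical, generalization (b) (the landing site relative to C) is not modelled, and the treatment of clauses in which the verb stays inside VP (no shift predicted) is an assumption that goes beyond the data in (1) and (2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical description of a clause; all field names are illustrative."""
    verb_left_vp: bool       # has the main verb moved out of VP?
    object_is_pronoun: bool  # is the object an unstressed pronoun?
    object_left_vp: bool     # does the object surface to the left of ikke?


def predicted_grammatical(clause: Clause) -> bool:
    # Generalization (a): an unstressed pronoun leaves VP when the verb does.
    # Assumption: in every other configuration the object stays inside VP.
    shift_expected = clause.verb_left_vp and clause.object_is_pronoun
    return clause.object_left_vp == shift_expected


# The four Danish examples, coded by hand from (1) and (2).
EXAMPLES = {
    "1a": Clause(verb_left_vp=True, object_is_pronoun=False, object_left_vp=False),
    "1b": Clause(verb_left_vp=True, object_is_pronoun=False, object_left_vp=True),
    "2a": Clause(verb_left_vp=True, object_is_pronoun=True, object_left_vp=False),
    "2b": Clause(verb_left_vp=True, object_is_pronoun=True, object_left_vp=True),
}

for label, clause in EXAMPLES.items():
    # Print the example label with "ok" or the usual asterisk.
    print(label, "ok" if predicted_grammatical(clause) else "*")
```

Under these assumptions the predicate reproduces the judgements reported for (1) and (2): (1a) and (2b) are accepted, (1b) and (2a) are starred.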

Vikner (1989, 1994) gives a range of further data which illustrate these properties of object shift. (3) shows that object shift does not take place in embedded clauses where the verb does not move:

(3) a. Det var godt at han ikke [købte den].
       It was good that he not bought it.
    b. *Det var godt at han den ikke [købte t].
       It was good that he it not bought.
    ‘It was good that he didn’t buy it.’

Still assuming ikke to be at the left edge of VP, the fact that ikke precedes the verb here shows that it has not left VP. In (3a) the object pronoun also remains in VP and the sentence is grammatical. In (3b) object shift takes place and, in the absence of verb-movement, the result is ungrammatical. This shows that generalization (a) above should be tightened up to say that object shift takes place only when verb-movement does. In (4), we give examples with a modal followed by an infinitive:

(4) a. Hvorfor skal studenterne ikke [læse den]?
       why shall the-students not read it?
    b. *Hvorfor skal studenterne den ikke [læse t]?
       why shall the-students it not read?
    ‘Why don’t the students have to read it?’

It is likely that MSc modals are main verbs with clausal infinitive complements, like their counterparts in many languages (cf. Vikner 1988). The object cannot move either to a position between the negative and the infinitive (giving ikke den læse) or, as shown in (4b), to a position preceding the negative. So, whatever the position of ikke, object shift is impossible. There is no reason to suppose that the infinitive moves (that is, we do not find the order infinitive–ikke with the negative in the lower clause). Nor, as (4b) shows, can the object move. So (4) is consistent with generalization (a). (5) shows what happens in periphrastic tenses:

(5) a. Hvorfor har studenterne ikke [læst den]?
       Why have the-students not read it?
    b. *Hvorfor har studenterne den ikke [læst t]?
       Why have the-students it not read?

Again, there is no reason to say that past participles move in Danish. Hence, the ungrammaticality of (5b) further confirms generalisation (a). Holmberg (1986) and Vikner (1989, 1994) have convincingly shown that object shift is distinct from scrambling of the type found in Continental West Germanic languages. Moreover, they argue that it is a variety of A-movement. Consequently they propose that, like other A-movements, object shift is driven by Case theory. The essential idea is that Case theory applies to unstressed pronouns in a more stringent way than to full DPs and hence these elements are required to undergo some ‘extra’ movement in order to satisfy this further requirement. Holmberg (1986) relates this further requirement to morphological case, observing that pronouns show morphological case in MSc while full DPs do not. Holmberg’s analysis carries over to Icelandic, where full DPs show morphological case and undergo object shift.2 Vikner’s (1989) account of Danish essentially endorses this view. An apparent problem for the idea that object shift is A-movement has to do with its interaction with A-movement of the subject (as noted by Vikner 1994). Movement of the object crosses the base position of the subject, and subsequent movement of the subject crosses the landing site of object shift. Suppose, for concreteness, that the landing site of object shift is Spec-AgrOP (this position is in fact the only plausible candidate; see section 3). The relevant parts of the derived structure of a clause with object shift, e.g. (2b), must then be as in (6):

(6) [AgrSP subj_i . . . [AgrOP obj_j [AgrOʹ [VP* t_i [VP . . . t_j]]]]]

Spec-AgrOP is an A-position, and c-commands the trace of the subject without c-commanding the derived position of the subject. Formally, both the movement of the object and the movement of the subject violate the Shortest Link requirement on movement (Chomsky 1993), i.e. Relativised Minimality. The structure in (6) has the abstract character of a superraising example like (7):

(7) *John_i seems that it is likely [t_i to win].

Why is (6) not a violation of Shortest Link on a par with (7)? Chomsky (1993: 21–26) proposes an answer to this question. He posits an operation of V-raising to AgrO (either at S-structure or LF), which creates a chain C = (V, t).
The minimal domain of this chain consists of Spec-AgrOP, the base position of the subject and the base position of the object.3 Chomsky then introduces the following notion of equidistance:

(8) If α and β are in the same minimal domain, they are equidistant from Γ.

Equidistant positions are those which cannot act as interveners for each other, since movement to either of them has the same status with respect to the Shortest Link requirement on chains. In (6), Spec-AgrOP and the base position of the subject are equidistant from the base position (or launching site) of the object. Hence the object can move directly to Spec-AgrOP, ‘skipping’ the subject position, since the subject is not an intervener for the resulting chain. Movement of AgrO to a higher position creates a further chain C′ whose minimal domain will contain both Spec-AgrOP and the specifier of the target head, allowing the subject to ‘skip’ Spec-AgrOP but preventing the object from skipping the subject. Thus, according to Chomsky, overt V-movement facilitates overt object movement, i.e. object shift. Two questions remain open, however. First, V-movement only allows object shift; it does not require it. Yet we have seen that in MSc object shift is obligatory for unstressed pronouns. Second, we need an account of the restriction of object shift to pronouns in MSc. We will return to both of these points in section 3. For now, the important conclusion is that treating object shift as A-movement (to Spec-AgrOP) can at least explain why overt V-movement is a precondition for overt object shift. In this section we have reviewed the basic data concerning object shift and we have seen that it is plausible to regard this operation as A-movement, probably to Spec-AgrOP. We now consider the evidence that object shift existed in ENE.
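The equidistance reasoning reviewed in this section can be made concrete with a toy check. The Python sketch below is a deliberate simplification and not Chomsky's formal definition: positions are plain strings, minimal domains are hand-listed sets, every A-position crossed by a movement counts as a potential intervener, and the ‘higher position’ targeted by AgrO is assumed, purely for illustration, to be AgrS. All names (`HIERARCHY`, `move_is_licit`, and so on) are hypothetical.

```python
from typing import FrozenSet, List

# A-positions ordered from highest to lowest, loosely following (6).
HIERARCHY: List[str] = ["Spec-AgrSP", "Spec-AgrOP", "Spec-VP", "Compl-V"]


def equidistant(a: str, b: str, minimal_domains: List[FrozenSet[str]]) -> bool:
    # (8): two positions in the same minimal domain are equidistant.
    return any(a in domain and b in domain for domain in minimal_domains)


def move_is_licit(source: str, target: str,
                  minimal_domains: List[FrozenSet[str]]) -> bool:
    hi = HIERARCHY.index(target)
    lo = HIERARCHY.index(source)
    if hi >= lo:
        raise ValueError("target must be higher than source")
    # Every skipped position must be equidistant with the target,
    # otherwise it intervenes and the Shortest Link requirement is violated.
    skipped = HIERARCHY[hi + 1:lo]
    return all(equidistant(p, target, minimal_domains) for p in skipped)


# Minimal domains, listed by hand (an assumption of this sketch).
V_IN_SITU = [frozenset({"Spec-VP", "Compl-V"})]
V_TO_AGRO = [frozenset({"Spec-AgrOP", "Spec-VP", "Compl-V"})]
AGRO_RAISED = V_TO_AGRO + [frozenset({"Spec-AgrSP", "Spec-AgrOP"})]

print(move_is_licit("Compl-V", "Spec-AgrOP", V_IN_SITU))    # object shift, V in situ
print(move_is_licit("Compl-V", "Spec-AgrOP", V_TO_AGRO))    # object shift after V-raising
print(move_is_licit("Spec-VP", "Spec-AgrSP", AGRO_RAISED))  # subject skips Spec-AgrOP
print(move_is_licit("Compl-V", "Spec-AgrSP", AGRO_RAISED))  # object skips the subject
```

On these assumptions, object movement to Spec-AgrOP is blocked while V stays in situ and allowed once V has raised to AgrO; the subject can then skip Spec-AgrOP, but the object cannot skip the subject position.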

2. Object Shift in Early Modern English

In this section we will show that the correlation between V-movement and object shift holds diachronically in English. As long as English had overt V-movement, it had object shift. This observation is interesting for two reasons. First, it supports the approach to object shift outlined in the previous section. Second, it implies that, although Modern English appears to lack ‘special’ clitic pronouns (in Zwicky and Pullum’s 1983 sense), this fact can be attributed to the well-known lack of overt V-movement rather than to any peculiar feature of the pronoun system. Since we need to say that overt V-movement is lacking in Modern English in any case, this conclusion regarding the pronoun system is advantageous. We now review that diachronic evidence. We restrict our attention largely to the 16th century for two reasons. First, as we will show directly, overt V-movement is not always found at this period, but the correlation between overt V-movement and object shift is systematic. Second, at this period object shift is, as in MSc, restricted to pronouns. Earlier periods of the language show more general object shift; see note 5. We follow Emonds (1978) and Pollock (1989) in taking clausal negation as a diagnostic for movement from V to AgrS. In fact, we make the simplifying assumption that not, when it has clausal scope, is at the left edge of VP in ENE.4 ENE had three possibilities for clausal negation, which are illustrated by the following examples:

(9) a. it serveth not (1513, Anon)
    b. it not belongs to you (1600, Shakespeare)
    c. whose sore task Does not divide the Sunday from the week (1605, Shakespeare)

The order V–not in (9a) involves V-to-AgrS movement. This was the only form of clausal negation prior to the 16th century, and it died out soon after 1600 (see Kroch 1989, Jespersen 1909–49, Lightfoot 1979, Roberts 1985 [this volume, Chapter 1], 1993). For these reasons, we refer to this as the ‘conservative’ system. The order not–V of (9b) is a 16th-century innovation which died out in the 17th century; we will say more about this order below. Owing to the similarity with non-V2 clauses in MSc, we will designate this the ‘Scandinavian’ order. The construction in (9c) is of course the only one to survive in contemporary English. This was also a 16th-century innovation (see Ellegård 1953, Denison 1985, 1993, Kroch 1989, Roberts 1993). We refer to this as the ‘modern’ construction. Let us now consider the behaviour of pronominal objects in each of these constructions in turn.5 With the ‘conservative’ form of negation, which involves V-to-AgrS movement, we consistently find object shift. The following are representative examples:

(10) a. if you knew them not (1580, John Lyly)
     b. they tell vs not the worde of God (1565, Thomas Stapleton)
     c. yf thou smyte it not of (1534, Thomas More)
     d. Why bring you him not up? (1614, Jonson)

(10a) is a straightforward case of the order V–pronoun–not. (10b) features a double-object verb with one pronominal object and one non-pronominal object. The pronominal object precedes not while the non-pronominal follows. This is what we expect, given what we have seen regarding object shift, and is exactly parallel with what is found in MSc (see Vikner 1994). (10c) involves the verb-particle combination smite . . . off. There is a variety of approaches to the analysis of this construction (e.g. Emonds 1976, Kayne 1985, den Dikken 1990, Johnson 1991), but they all have in common the idea that the particle is in VP at S-structure in the sequence V–DP–Prt. In that case, the order pronoun–not–Prt in (10c) provides clear evidence of object shift. Finally, (10d) is a further case of a verb-particle construction. Since it is an interrogative, the inflected verb is in C in this example. As in MSc, the object pronoun does not ‘follow’ the verb to C. The situation is substantially the same as it is in MSc regarding object shift. Of course, there are independent differences between ENE and MSc which to some degree obscure the similarities. There are two relevant differences, both concerned with verb-movement. In conservative ENE, V consistently moves to AgrS. This is not the case in MSc, as we saw in section 1. On the other hand, in MSc V consistently moves to C in all main-clause declaratives, while ENE was no longer a V2 language (cf. van Kemenade 1987, and, for a different view, Lightfoot 1994). In ENE the inflected verb moves to C only in interrogatives and a restricted range of declarative main clauses. If we abstract from these differences, we see that ENE and MSc are exactly alike regarding object shift.

Consider next the ‘Scandinavian’ construction in (9b). With this kind of negation, we only find the order not–V–pronoun. This is illustrated in (11):

(11) a. She not denies it (1599, Shakespeare)
     b. I not bid thee that (1611, Jonson)
     c. I not see them (1633, Jonson)

Here, too, the pattern parallels MSc, modulo differences in verb-movement. Given our assumption about the position of not, these examples have no overt verb-movement at all. So this construction is exactly like what we saw for MSc embedded clauses in section 1. These examples are thus consistent with generalisation (a) about object shift: the object moves only when the verb moves. Since the verb does not move here, neither does the object. The pattern of negation in (11) is of course ungrammatical in contemporary English. As we mentioned earlier, it is only found from roughly 1550 to 1650. Roberts (1993), drawing on ideas in Kroch (1989), proposes an account of this which is consistent with most theories of negation, do-insertion and verb-movement in contemporary English (see, among others, Pollock 1989, Rizzi 1990, Chomsky 1991). Suppose, following Chomsky (1993), V raises to T and AgrS at LF in Modern English in order to check its features, i.e. the V-features of T and AgrS are ‘weak’. This raising is blocked by the presence of Neg (inter alia), and so in negative clauses V cannot be inserted bearing an affix, as that affix will fail to be checked at LF. Do is inserted to check features of T and AgrS (the insertion takes place prior to LF, despite the fact that the features in question are weak and hence do not need to be checked before LF, since post-S-structure lexical insertion is impossible).6 In these terms, the ‘Scandinavian’ negation pattern of (11) can be analysed exactly, in fact, as MSc negation would be analysed: not does not block LF-raising since it is not, at this period, a head, but rather an adverbial (in an analysis which features NegP, the most plausible assumption is that not occupies Spec-NegP). In the mid-17th century, not becomes a head and so do-insertion becomes obligatory with negation because not now blocks LF verb-raising. 
Independent evidence that not becomes a head comes from the appearance at this time of the reduced form n’t in texts from the 1660s, cf. Jespersen (1909–49, V: 429). With do-insertion, the pronominal object always occupies the modern position. Thus sentences which are negated in the contemporary manner and which contain pronominal objects appear to be exactly like their counterparts in contemporary English. This is illustrated by the following examples:

(12) a. ye do not remembre me (1463, Anon)
     b. this sorrow does not leave me (c1480, Anon)
     c. they dyde not assaile it (1523–25, Berners)

In the 16th century, we do not find cases where the object appears adjacent to the inflected verb do. That is, the order do pronoun not V is unattested; we do not find examples like (13):

(13) I did him not see.

The best way to make this evidence compatible with everything else we have seen about object shift in ENE is to assume that sentences involving do-insertion are parallel to compound tenses. In that case, they are comparable with MSc examples of the type in (4) and (5). In this section, we have seen that ENE object shift was exactly parallel to MSc object shift once the differences in verb-movement that distinguish ENE from MSc are taken into account.
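The three ENE negation systems and the pronoun orders attested with each can be summarised in a short sketch. The Python below is an illustration under stated assumptions, not part of the analysis: the dictionary keys, the `verb_raises` and `do_support` flags and the function name are hypothetical labels, and clauses are reduced to the relative order of do, not, the main verb and an unstressed object pronoun.

```python
from typing import Dict, List

# The three clausal-negation systems of (9), with hypothetical labels.
# verb_raises: does the main verb move overtly to AgrS (and so past not)?
SYSTEMS: Dict[str, Dict[str, bool]] = {
    "conservative": {"verb_raises": True, "do_support": False},   # (9a)
    "scandinavian": {"verb_raises": False, "do_support": False},  # (9b)
    "modern": {"verb_raises": False, "do_support": True},         # (9c)
}


def surface_order(system: str) -> List[str]:
    props = SYSTEMS[system]
    if props["verb_raises"]:
        # Overt verb movement goes together with pronoun object shift.
        return ["V", "pronoun", "not"]
    # No overt verb movement, so the pronoun stays inside VP.
    order = ["not", "V", "pronoun"]
    if props["do_support"]:
        order.insert(0, "do")
    return order


for name in SYSTEMS:
    print(name, "-".join(surface_order(name)))
```

The sketch yields V-pronoun-not for the conservative system, as in (10a), not-V-pronoun for the ‘Scandinavian’ system, as in (11), and do-not-V-pronoun for the modern system, as in (12). The unattested order of (13), do-pronoun-not-V, is not produced by any setting.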

3. Object Shift and Cliticisation

In this section, we propose an analysis of object shift in terms of the feature-checking system put forward in Chomsky (1993). Recall that two questions remain open from our discussion of V-movement and object-movement in section 1. The first concerns why object shift should be obligatory when V moves. The second concerns the selectivity of object shift: why (in MSc and ENE) is it restricted to pronouns? We can answer the second of these questions by saying that pronouns are subject to some feature-checking requirement over and above the one which applies to ‘full’ DPs (very much in the spirit of earlier work by Holmberg and Vikner). A natural proposal is that pronouns, whose content is exhausted by phi-features, are required to check those features with some functional head, perhaps in addition to the usual requirement for checking Case features. If we take that head to be AgrO, we see why pronouns must undergo object shift under certain conditions. Object shift will be triggered to the extent that AgrO has strong phi-features. We now need to see what triggers AgrO’s strong phi-features. This will give us the answer to our first question. Our proposal is this: when V raises (for V2 in MSc, because of the V-features of AgrS in ENE) it must pass through AgrO in order to satisfy the Head Movement Constraint. In so doing, it ‘activates’ AgrO’s agreement property. We can think of this as AgrO’s strong phi-features being induced by V-movement. More technically, we can say that the presence of some element in the checking domain of AgrO activates AgrO’s potential for checking. Hence, when the verb moves to the checking domain of AgrO (by adjoining to AgrO), the pronoun is required to raise to Spec-AgrOP in order to check AgrO’s strong phi-features. In this way, we see why V-movement both allows and requires pronoun object shift. This approach extends straightforwardly to Icelandic. In this language, as we mentioned in note 2, pronominal object shift is obligatory whenever the verb moves and verbs systematically raise at least to AgrS in all finite clauses (i.e. AgrS has strong V-features). Hence pronouns always shift (in finite clauses). On the other hand, full DPs optionally undergo object shift. We take no position on what may trigger this, noting only that a mechanical option is to say that AgrO has an optionally strong N-feature. Clearly, Faroese differs from Icelandic in this last respect (cf. note 2). In all the Scandinavian languages, the trigger for pronoun object shift is the same, however. Moreover, we can maintain that English from at least the 16th century onwards patterns exactly like the Scandinavian languages. The difference in English is that verbs do not raise overtly, hence they never induce object shift. There is a well-known complication for the view that Modern English lacks overt V-movement: auxiliary verbs do raise. Roberts (1983) and Pollock (1989) have suggested that this is connected to the fact that auxiliaries lack θ-roles. Lacking in θ-roles, auxiliaries do not have direct objects, and so they are not relevant for the discussion of object shift. However, there is one exception to this statement: the possessive have of conservative dialects of British English. This verb behaves in inversion and negation contexts as though it moves to AgrS. Pollock (1989) argues that this fact can be made consistent with the view that only verbs which have no θ-roles are able to move to AgrS in Modern English; this is arguably correct, but the important point for present purposes is that possessive have is a verb which superficially has a direct object and which continued to raise to AgrS after this possibility had been lost for other English verbs with direct objects. So it is appropriate to ask how this verb behaves with respect to object shift. Unfortunately, it is difficult to ascertain what the precise situation is with regard to object shift in contemporary dialects which allow have to raise to AgrS.
In negatives, the paradigm is as follows:

(14) a. ?? I have it not.
     b. *I have not it.
     c. I haven’t it.

It seems that in this variety the contracted negation n’t is obligatory where have raises to I. Since n’t combines with have and moves with it, this makes it impossible to tell if object shift has taken place in (14c). The general requirement for the contracted negation with have is confirmed by the ungrammaticality of (15a), which is clearly independent of object shift:

(15) a. *I have not a car.
     b. I haven’t a car.

Despite this difficulty, native speakers are unanimous in preferring (14a) over (14b). (14a) sounds very archaic or poetic, while (14b) is simply impossible. So the evidence favours the view that object shift may still marginally remain with conservative possessive have. Clearly, this is expected if in fact Modern English pronouns have exactly the same property as their Scandinavian counterparts. Several authors (Deprez 1990, Josefsson 1992, Jonas and Bobaljik 1993) have proposed that pronoun object shift is head-movement (this was also my own view in an earlier version of this work, presented in 1991 and written in 1992). One argument which has been given in favour of this idea is that it assimilates object shift to cliticisation in Romance.7 However, it is likely, especially given the facts of Romance past-participle agreement discussed in Kayne (1989), that Romance clitic-placement involves DP-movement (see Roberts (1993), Sportiche (1992), Vikner and Sprouse (1988) and below). Moreover, pronoun object shift has many properties that are quite unlike any Romance cliticisation. So, postulating that object shift is head-movement does not achieve the alleged assimilation, and this assimilation is probably not desirable in any case. If we propose that pronominal object shift is head-movement, we are led to posit a kind of head-movement that is otherwise unattested, and to suggest relaxation or violation of well-known constraints on head-movement (this was a very real flaw in my earlier version of this material, cited above). One hypothesis, the simplest head-movement variant of what we have proposed, is that object shift adjoins the pronoun (a D) to AgrO. This operation either violates the Head Movement Constraint by moving D over V, or, if we adopt the more complex derivation with DP-movement to Spec-AgrOP applying first, the D-movement effectively becomes downgrading since it moves D from Spec-AgrOP to AgrO. An alternative is to assume some kind of excorporation (either of D from [D + V] in V, or of V from [D + V] in AgrO). This kind of operation is arguably undesirable on general grounds (even if, strictly speaking, Relativised Minimality allows it; cf.
Roberts (1991 [this volume, chapter 10])). If we exclude excorporation, then it is technically impossible to adjoin the head of the object DP to AgrO. Suppose then that the object’s host head is higher than AgrO. The Head Movement Constraint requires that DP-movement take place in order to permit D to attach to the host without incurring a violation. This kind of derivation may be right for Continental West Germanic (and Old English) and perhaps Romance clitics (but see below on the latter). However, these groups of languages each provide different kinds of independent motivation for this view, motivation which is lacking in MSc (and ENE). Continental West Germanic languages all have leftward scrambling of DP. While the nature of and trigger for scrambling remain rather unclear (cf. the references in Vikner (1994) and other articles in Corver and Van Riemsdijk (1994)), it is plausible to suppose that scrambling is the DP-movement operation which feeds cliticisation (cf. Roberts (1993)). MSc languages lack scrambling, and hence lack one type of independent motivation for one kind of cliticisation analysis of object shift.

It is also instructive to compare Romance clitics with MSc shifted objects. Romance clitics always occupy “special” positions, unlike MSc object pronouns, which have to remain in what appears to be their base position if V does not move. This is one reason why it is plausible to think that Romance clitics are base-generated in their “special” positions, as a kind of agreement (or Voice) head which triggers raising and licensing of a DP (pro in nonclitic-doubling cases). This is what Sportiche (1996) proposes. On this view, DP-movement (of pro) is still associated with Romance clitics. The facts of MSc clearly offer no scope for such an approach to shifted objects. More generally, any approach raising shifted objects higher than AgrO raises two questions: (i) why is DP raised? (ii) how do D-movement and V-movement interact? Although we have no satisfactory general answer to (i) for Continental West Germanic and Romance, we have independent evidence of the existence of such “long” DP-movement, as we have just seen. Question (ii) naturally leads to the postulation either of excorporation or head-adjunction to a head containing a trace. Neither of these options is conceptually attractive (see Kayne (1991) on the latter). We do not need to appeal to either of them if the shifted object stays at the AgrO level, but then it can only be in Spec-AgrOP. One variant of the cliticisation approach, explored in Holmberg (1991), is to say that pronominal objects always cliticise: where V does not leave VP, the cliticisation is string-vacuous D-movement to V. However, this putative cliticisation differs from Romance cliticisation in two important respects: (a) it is head-adjunction to the right of the host; (b) it is cliticisation to a lexical head.8 We see then that MSc and ENE object shift are actually rather unlike Romance cliticisation, and in any case Romance cliticisation involves a DP-movement component. 
Hence this kind of comparative consideration provides no argument at all for regarding object shift as head-movement. The technical problems associated with this idea are such that we continue to regard object shift as DP-movement to Spec-AgrOP, triggered and licensed as described above. To recapitulate: if pronoun object shift is DP-movement, we have a natural account of the synchronic and diachronic link with verb-movement. The ENE data are particularly clear in this regard. We must say that object shift is at least DP-movement; we have seen that there is no good reason for saying that it involves any more than this—and several good reasons not to say this. In this section we have implicitly introduced a typology of clitic, weak or shifted pronouns. In North Germanic and English they are required to check for phi-features with AgrO, where AgrO’s strong features are induced by verb-movement (always in Icelandic, sometimes in MSc, almost never in Modern English). In West Germanic they typically undergo scrambling and may cliticise to some head position above AgrS and below C (for a detailed proposal, which does not assume exactly the mechanisms sketched

here, see Haegeman (1993)). In Romance, clitics are base-generated in the higher head positions and trigger pro-raising there (Sportiche (1996)). Understanding what precisely underlies these differences is a topic that goes beyond our goals here. One point which arises concerns the status of the “higher” agreement or voice projections between AgrS and C in North Germanic and English. Are these positions present? If so, what are their reflexes? Sadly, we must leave these fascinating questions open here.

4. Conclusion

In this paper, we have proposed an analysis of MSc object shift which carries over to the essentially parallel phenomenon in ENE. An important aspect of our analysis is that it leads to the conclusion that English object pronouns have not changed at all since ENE. What has changed since ENE is the position of the inflected verb, as is well known. Since V (almost) never raises to AgrO, it neither triggers nor licenses object shift. So we arrive at the welcome conclusion that the observed change in the distribution of object pronouns is not an independent development, but simply a further reflex of the general loss of overt verb movement in English.

Acknowledgements

An earlier version of this material (which featured a rather different analysis) was presented at the GGS-Treffen, Bern, the 7th Comparative Germanic Syntax Workshop, Stuttgart, and the University of California, Irvine. Thanks to the audiences at those presentations for their comments. Thanks also to Bob Borsley, Sten Vikner and the editors of this collection for their comments. All errors are my own.

Notes

1. Josefsson (1992) presents evidence that the order XP–V–reflexive object pronoun–subject is possible in some varieties of Swedish. In that case, we would say that the object pronoun does follow the verb to C° in these varieties. I will leave this potentially important fact aside in what follows.

2. However, Holmberg’s analysis runs into problems in Faroese, as Vikner (1994) points out, since here NPs have morphological case, as in Icelandic, but object shift is limited to pronouns as in MSc (see Barnes 1992: 28):

(i) a. Jógvan keypti ikki bókina.
       J. bought not the-book.
    b. *Jógvan keypti bókina ikki.
       J. bought the-book not.
       ‘J. didn’t buy the book’.

(ii) a. *Jógvan keypti ikki hana.
        J. bought not it.
     b. Jógvan keypti hana ikki.
        J. bought it not.
        ‘J. didn’t buy it’.


Middle English poses the converse problem for Holmberg (1986), in that there is no morphological case-marking on non-pronominal NPs but they nevertheless appear to be able to undergo object shift. We will return briefly to ME below (see note 5). In section 3, we will make an alternative proposal as to what triggers pronoun movement. This proposal can handle both the Faroese and the ME data. A further point which is relevant here is that full NPs may undergo object shift in Icelandic, while pronouns must. See section 3. 3. The minimal domain of a head H is the smallest set of nodes such that its members dominate all nodes the categories in the domain of H dominate except those that contain H. The domain of H is the set of nodes contained in the maximal projection of H distinct from and not containing H. These definitions extend straightforwardly to head-chains. 4. In light of the recent work stemming from Pollock (1989), this assumption is obviously too simplistic. In fact, our remarks on do-support below will suggest a partial refinement. This is, of course, not the place to provide a full analysis of negation in ENE. 5. In the text, we do not consider the possibility of ‘Icelandic-style’ object shift of non-pronominal DPs. It does not seem that this was possible in ENE, at least. A simple count of the relative positions of objects and not in Spevack’s (1970) Shakespeare Concordance for eight plays revealed no examples at all of object shift of a non-pronominal DP. Conversely, of a total of 93 instances of ‘conservative’ negation with a pronominal object, 78 featured object shift (and several of those which did not had clearly emphatic objects, in addition to two cases of reflexive X-self objects) and 15 had no object shift. Of a total of 23 cases of ‘modern’ negation involving do-insertion, only one had object shift with another 22 having the object in the modern position. Finally, neither of the two cases of ‘Scandinavian’ negation had object shift. 
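The counts reported in this note can be turned into proportions with a few lines of arithmetic. The sketch below uses only the figures given above; the dictionary layout and the function name are illustrative assumptions, not part of the original count.

```python
# Figures as reported in the note: objects relative to "not" in eight plays.
# The labels and data layout are illustrative assumptions.
counts = {
    "conservative": {"shifted": 78, "unshifted": 15},
    "do-insertion": {"shifted": 1, "unshifted": 22},
    "Scandinavian": {"shifted": 0, "unshifted": 2},
}


def shift_rate(row):
    """Proportion of pronominal objects that undergo object shift."""
    total = row["shifted"] + row["unshifted"]
    return row["shifted"] / total


for name, row in counts.items():
    total = row["shifted"] + row["unshifted"]
    print(f"{name}: {row['shifted']}/{total} = {shift_rate(row):.1%}")
```

On these figures, object shift is the clear majority pattern with conservative negation and is nearly absent with do-insertion, which is the asymmetry the analysis in the text relies on.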
In light of Vikner’s (1994) suggestion that object shift of full DPs is related to overt V-to-AgrS movement (see Note 2), these facts may be problematic, since the ‘conservative’ form of ENE negation clearly involved overt V-to-AgrS movement. Object shift of full DPs appears to have been possible in Middle English, as the following example shows: (i) Triacle schal be leide to . . . forto þe posteme breke Treacle should be laid on . . . to the boil break (ca 1398, Trevisa)

However, it is extremely difficult to say whether such cases involve scrambling or object shift. We leave this question open for future research. 6. It is difficult to account for the ‘last-resort’ nature of do-insertion here without invoking the idea that lexical insertion for feature-checking is more costly than movement—see Chomsky (1991). It is altogether unclear to me why this should be, however. There is also the diachronic question of the 16th-century use of do in positive declaratives, i.e. the non-last-resort situations. See Roberts (1993a: 3.2) and, for an account using the mechanisms in Chomsky (1993), Watanabe (1993). 7. This is not the only argument that has been given in favour of a head-movement approach, of course. Jonas and Bobaljik (1993) have a theoretical motivation which derives from the fact that they establish a correlation between transitive expletive constructions (TECs, e.g. There ate a man an apple) and nominal object shift. This correlation can be explained in terms of the idea that Spec-TP must be a possible site for subject raising when the object is shifted to Spec-AgrOP; likewise, Spec-TP is the position of the argumental subject in TECs. Hence the availability of Spec-TP underlies both properties. Since MSc has pronoun object shift

only, and no TECs, our approach to pronoun object shift threatens to undermine Jonas and Bobaljik’s explanation for the correlation between nominal object shift and TECs. In fact, there are independent grounds for rejecting Jonas and Bobaljik’s approach. If we follow Kayne (1993) and assume that Specifiers are adjoined positions, and if we continue to adopt the definition of minimal domain in Chomsky (1993: 12), then we create the potential for AgrO-to-T movement to license object-shift to Spec-TP in the following configuration (because domains are defined in terms of containment and adjoined categories are contained in but not dominated by the category they adjoin to, according to Chomsky):

(i) [AgrSP AgrS [TP T [AgrOP Subj AgrO [ t V Obj . . .

The minimal domain of the chain formed by AgrO-to-T movement now contains Spec-TP, Spec-AgrOP and Spec-VP. We can prevent object-movement to Spec-TP if we assume that Spec-TP is either absent (Chomsky 1993) or an A´-position (Rizzi (1990)) and that domain extension by head-movement only ever creates one further potential landing site for movement, as pointed out by Jonas and Bobaljik. This approach is adopted in Roberts (1993) in the account of restructuring and clitic-climbing given there. See also Kayne (1994) for conceptual arguments in favour of treating Specifiers as adjoined categories. Object shift is still possible where T raises to AgrS, forming a chain with minimal domain {Spec-AgrSP, Spec-TP, Spec-AgrOP}, all positions accessible to the subject (but not to the object). To the extent that T-to-AgrS movement is reflected by overt V-movement (something of an open question at present), then the tie-in between verb-movement and object shift may be made still tighter: object shift would depend on V-movement to AgrS. Note that the MSc and ENE facts are compatible with this more stringent characterisation. In these terms, one is led to formulate an account of TECs which posits some Agr-recursion at the AgrS-level. One can regard Agr-recursion as substitution-movement of AgrS (given Chomsky’s (1993) domain-extension requirement on transformations, any substitution movement of a head will be indistinguishable from copying of that head). If AgrS raises only where V raises, then one can tie TECs to systematic overt V-raising to AgrS. This is a correct result for North Germanic, in that it distinguishes Icelandic from Mainland Scandinavian. This might be the beginning of an alternative to Jonas and Bobaljik’s generalisation, but this is not the place to develop it further.

8. Both of these objections could be avoided by postulating that D left-adjoins to some functional head while V moves to some higher position, giving a derived structure like (i):

(i) [Tree diagram; node labels as extracted: F, G, V, D, F]

But there is no independent motivation for this kind of structure. Contrast the evidence for V-movement in Romance enclisis given in Kayne (1991).

References

Barnes, Michael: 1992, ‘Faroese Syntax – Achievements, Goals, Problems’, in Jonna Louis-Jensen and Jóhan Hendrik W. Poulsen (eds.), The Nordic Languages and Modern Linguistics 7, Føroya Fróðskaparfelag, Tórshavn, pp. 17–37.
Chomsky, Noam: 1991, ‘Some Notes on Economy of Derivation and Representation’, in R. Freidin (ed.), Principles and Parameters in Comparative Grammar, MIT Press, Cambridge, MA, pp. 417–454.
Chomsky, Noam: 1993, ‘A Minimalist Program for Linguistic Theory’, in Kenneth Hale and Samuel Jay Keyser (eds.), The View From Building 20, MIT Press, Cambridge, MA, pp. 1–52.
Corver, Norbert and Henk van Riemsdijk: 1994, Scrambling, Foris/de Gruyter, Berlin.
Denison, David: 1985, ‘The Origins of Periphrastic Do: Ellegård and Visser Reconsidered’, in R. Eaton et al. (eds.), Papers from the 4th International Conference on Historical Linguistics, Amsterdam, April 10–13, 1985, John Benjamins, Amsterdam, pp. 45–60.
Denison, David: 1993, English Historical Syntax: Verbal Constructions, Longmans, London.
Deprez, Vivienne: 1990, ‘Parameters of Object Movement’, talk given at the Scrambling Workshop, University of Tilburg, October 1990.
Dikken, Marcel den: 1990, ‘Particles and the Dative Alternation’, in Proceedings of the Second Leiden Conference for Junior Linguists, pp. 71–86.
Ellegård, Alvar: 1953, The Auxiliary do: The Establishment and Regulation of its Use in English, edited by Fred Behre, Gothenburg Studies in English, Almqvist and Wiksell, Stockholm.
Emonds, Joe: 1976, A Transformational Approach to English Syntax: Root, Structure-Preserving and Local Transformations, Academic Press, New York.
Emonds, Joe: 1978, ‘The Verbal Complex V′–V in French’, Linguistic Inquiry 9, 151–175.
Haegeman, Liliane: 1993, ‘The Morphology and Distribution of Object Clitics in West Flemish’, Studia Linguistica 47, 57–94.
Holmberg, Anders: 1986, Word Order and Syntactic Features in the Scandinavian Languages, Dept of General Linguistics, University of Stockholm.
Holmberg, Anders: 1989, ‘What is Wrong with SOV Word Order in an SVO Language?’, ms. University of Uppsala.
Holmberg, Anders: 1991, ‘The Distribution of Scandinavian Weak Pronouns’, in Henk van Riemsdijk and Luigi Rizzi (eds.), Clitics and Their Hosts, EUROTYP Working Papers 8.1, European Science Foundation, Strasbourg, pp. 155–174.
Jespersen, Otto: 1909–49, A Modern English Grammar on Historical Principles, George Allen & Unwin, London.
Johnson, Kyle: 1991, ‘Object Positions’, Natural Language and Linguistic Theory 9, 577–636.
Jonas, Dianne and Jonathan Bobaljik: 1993, ‘Specs for Subjects’, in Jonathan Bobaljik and Colin Phillips (eds.), Papers on Case and Agreement I, MIT Working Papers in Linguistics 18, 59–98.
Josefsson, Gunlög: 1992, ‘Object Shift and Weak Pronominals in Swedish’, Working Papers in Scandinavian Syntax 49, 59–94.
Kayne, Richard: 1985, ‘Principles of Particle Constructions’, in Jacqueline Guéron, Hans Georg Obenauer and Jean-Yves Pollock (eds.), Levels of Syntactic Representation, Foris, Dordrecht, pp. 101–140.
Kayne, Richard: 1989, ‘Facets of Romance Past Participle Agreement’, in Paola Benincà (ed.), Dialect Variation and the Theory of Grammar, Foris, Dordrecht, pp. 85–104.
Kayne, Richard: 1991, ‘Romance Clitics, Verb Movement and PRO’, Linguistic Inquiry 22, 648–686.
Kayne, Richard: 1994, The Antisymmetry of Syntax, MIT Press, Cambridge, MA.
van Kemenade, Ans: 1987, Syntactic Case and Morphological Case in the History of English, Foris, Dordrecht.
Kroch, Anthony: 1989, ‘Reflexes of Grammar in Patterns of Language Change’, Journal of Language Variation and Change 1, 199–244.
Lightfoot, David: 1979, Principles of Diachronic Syntax, Cambridge University Press, Cambridge.
Lightfoot, David: 1994, ‘Why UG Needs a Learning Theory: Triggering Verb Movement’, in Adrian Battye and Ian Roberts (eds.), Clause Structure and Language Change, Oxford University Press, New York/Oxford, pp. 31–52.
Pollock, Jean-Yves: 1989, ‘Verb Movement, UG and the Structure of IP’, Linguistic Inquiry 20, 365–424.
Rizzi, Luigi: 1990, Relativized Minimality, MIT Press, Cambridge, MA.
Roberts, Ian: 1983, ‘The Syntax of English Modals’, in Dan Flickinger et al. (eds.), Proceedings of the Second West Coast Conference on Formal Linguistics, Stanford, pp. 227–246.
Roberts, Ian: 1985, ‘Agreement Parameters and the Development of English Modal Auxiliaries’, Natural Language and Linguistic Theory 3, 21–58 [this volume, Chapter 1].
Roberts, Ian: 1991, ‘Excorporation and Minimality’, Linguistic Inquiry 22, 209–218 [this volume, Chapter 10].
Roberts, Ian: 1993a, Verbs and Diachronic Syntax, Kluwer, Dordrecht.
Roberts, Ian: 1993b, ‘Restructuring, Pronoun Movement and Head Movement in Old French’, ms. University of Wales.
Spevack, Michael: 1970, A Complete and Systematic Concordance to the Works of Shakespeare, Volume V: Hildings–Severing, Olms, Hildesheim.
Sportiche, D.: 1996, ‘Clitic Constructions’, in J. Rooryck and L. Zaring (eds.), Phrase Structure and the Lexicon, Kluwer, Dordrecht, pp. 213–276.
Vikner, Sten: 1988, ‘Modals in Danish and Event Expressions’, Working Papers in Scandinavian Syntax 39.
Vikner, Sten: 1989, ‘Object Shift and Double Objects in Danish’, Working Papers in Scandinavian Syntax 44, 141–155.
Vikner, Sten: 1994, ‘Scandinavian Object Shift and West Germanic Scrambling’, in Norbert Corver and Henk van Riemsdijk (eds.), Scrambling, Foris/de Gruyter, Berlin, pp. 487–517.
Vikner, Sten: 1995, Verb Movement and Expletive Subjects in the Germanic Languages, Oxford University Press, Oxford/New York.
Vikner, Sten and Rex Sprouse: 1988, ‘Have/Be Selection as an A-Chain Membership Requirement’, Working Papers in Scandinavian Syntax 38.
Watanabe, Akira: 1993, ‘The Role of Triggers in the Extended Split INFL Hypothesis: Unlearnable Parameter Setting’, ms. University of Tokyo.
Zwicky, Arnold and Geoff Pullum: 1983, ‘Cliticization vs. Inflection: English n’t’, Language 59, 502–513.

4

Directionality and Word Order Change in the History of English

Ian Roberts

1. Introduction*

A standard view of the historical development of English word order involves the idea that Old English (OE) was, like all other attested West Germanic varieties (with the possible exception of Yiddish; see Santorini 1992), head-final at least in VP and IP (see Stockwell 1977; Canale 1978; Lightfoot 1979, 1991; Bean 1983; van Kemenade 1987; Pintzuk 1991; Traugott 1992; Denison 1993; although not all of these authors assume an IP). In this respect, OE (and West Germanic generally) differs from Modern English (NE) (and North Germanic generally) in the value of the directionality parameter in (1), for at least some values of Y:

(1) Directionality parameter:
    Y′ → Y XP
    Y′ → XP Y

According to the standard view, at some point in the Middle English (ME) period, probably in the twelfth century, there was a change in the value of the directionality parameter (for the relevant categories). As a result of this change, the language became uniformly head-initial. This means that OV and V–Aux orders, formerly abundantly attested (see section 2), are no longer found. Recently, Kayne (1994) has argued that UG cannot contain a parameter like (1). Kayne proposes a theory of phrase structure which derives many of the properties of X′-theory from the central idea that asymmetric c-command relations among non-terminals are intrinsically connected to linear order among terminals. We can phrase the central constraint as follows:1

(2) If A, a non-terminal, asymmetrically c-commands B, a non-terminal, then all terminals a dominated by A precede all terminals b dominated by B.

To see how (2) works in the case of head-complement order, consider the VP in (3):

(3) [VP [V see] [DP [D him]]]

Here V asymmetrically c-commands D (the definition of c-command is ‘X c-commands Y iff X does not contain Y and every category dominating X dominates Y’). Hence, by (2), see must precede him. This conclusion would follow even if we chose to draw the phrase marker the other way around. Thus there can be no parametric variation as regards head-complement order; (2) (or whatever it derives from, cf. note 3) requires that heads precede their complements. Hence, all languages are underlyingly head-initial. In a system like Kayne’s, superficial OV patterns, or, more generally, head-final typologies, must be derived by leftward-movement processes. Chomsky (1994) adopts a similar position. Zwart (1993) has shown that this approach yields positive results in the analysis of Dutch; in particular, one can dispense with the idea that Dutch is a mixed-branching language, with some categories (e.g. CP) right-branching and others (e.g. VP) left-branching. Zwart’s proposal is that ‘The SVO order of Dutch main clauses is derived from an “underlying” SOV order, visible in embedded clauses. However, this SOV order is derived from an underlying SVO order in the Dutch VP, still visible when the object is not a noun phrase but a clause’ (p. 29). The proposal accounts for the following generalizations about Dutch: (i) ‘top projections’, e.g. C and D, are always head-initial; (ii) ‘when a head allows its complement to appear on one side only, the complement always follows the head’; this is true of the complements of N, for example; (iii) ‘when the head allows its complement to appear on both sides, the head and the complement are never adjacent when the complement precedes the head’; this is true of the complements of A, P and V. Zwart captures these generalizations by assuming that all categories are head-initial, and that some complements, in particular direct objects of V, move leftward during the derivation. 
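The reasoning about (2) and (3) can be checked mechanically. The following is a minimal sketch, assuming the c-command definition quoted above plus the requirement that X and Y be distinct; it ignores the segment/category distinction of Kayne's full system. The class and function names are illustrative only.

```python
class Node:
    """A non-terminal; `word` is set when it directly dominates a terminal."""

    def __init__(self, label, children=(), word=None):
        self.label = label
        self.children = list(children)
        self.word = word
        self.parent = None
        for child in self.children:
            child.parent = self

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def dominates(self, other):
        return any(a is self for a in other.ancestors())

    def terminals(self):
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.terminals()]

    def nonterminals(self):
        yield self
        for child in self.children:
            yield from child.nonterminals()


def c_commands(x, y):
    # X c-commands Y iff X does not contain Y and every category dominating X
    # dominates Y; distinctness of X and Y is an added assumption.
    return (x is not y
            and not x.dominates(y)
            and all(a.dominates(y) for a in x.ancestors()))


def precedence(root):
    """Pairs (a, b) of terminals such that (2) requires a to precede b."""
    nodes = list(root.nonterminals())
    pairs = set()
    for a in nodes:
        for b in nodes:
            if c_commands(a, b) and not c_commands(b, a):
                pairs.update((wa, wb) for wa in a.terminals() for wb in b.terminals())
    return pairs


def build_vp(mirrored=False):
    # The VP in (3); `mirrored` draws the same phrase marker the other way round.
    v = Node("V", word="see")
    dp = Node("DP", [Node("D", word="him")])
    return Node("VP", [dp, v] if mirrored else [v, dp])


print(precedence(build_vp()))
print(precedence(build_vp(mirrored=True)))
```

Both calls yield the single pair placing see before him: V and DP c-command each other, so only the asymmetric pair (V, D) contributes an ordering, and the way the tree is drawn plays no role.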
In addition to accounting for these generalizations the proposal allows a simple treatment of verb raising (as, in fact, the absence of overt verb movement). Various other proposals made by Zwart will be discussed and adapted to OE below. The purpose of this chapter is to explore the consequences of what we might call the Zwart/Kayne view for OE and, in particular, for the word-order changes that took place in ME. I will argue that the Zwart/Kayne view is at least as good as the more standard views as regards the synchronic analysis of OE, and that it permits a more natural and revealing account of the ME changes. One advantage can be identified straightaway. On the ‘standard’ view, OE is not a uniformly head-final language. For example, there is no doubt that CP and DP are both head-initial projections, as in Dutch. The projections that are usually regarded as head-final are IP and VP (we leave aside AP, NP and PP; these are usually regarded as either head-initial or mixed, and so they do not affect the point at hand). However, there is a fair amount of evidence for a medial IP of some kind (cf. in particular Pintzuk 1991), some of which I review below. If we split IP into various functional projections, then we may be able to claim that some of them are head-initial and some head-final. A claim like this is made in Cardinaletti &

Roberts (2002 [this volume, Chapter 12]), for example. However, this kind of analysis shares with the standard analysis the consequence that at some point in the functional system that makes up the clause there is a switch from head-initial to head-final patterning. Note also that this is the most restrictive view compatible with the mixed typology. Such mixed typologies look very suspect: clearly it would be better to opt for a uniform direction in head-complement ordering. In that case, the only possibility that is seriously workable is to assume that OE is uniformly head-initial. More generally, the view I advocate is:

(4) Principle: Y′ → Y XP
    Parameter: Morphosyntactic features causing leftward movement from VP.

What changed in ME could not, ex hypothesi, have been the base expansion of V′ or I′.2 Instead, leftward movement possibilities are lost in ME. In this way, the word-order change becomes one of a type that is already very familiar: the loss of a movement dependency. It thus falls into the same general category of changes as the loss of V-to-I movement in early Modern English (cf. Roberts 1985 [this volume, Chapter 1], 1993a: ch. 3; Rohrbacher 1994a) or the loss of V2 in English (van Kemenade 1987; Platzack 1995) or French (Adams 1987; Roberts 1993a; Clark & Roberts 1993 [this volume, Chapter 2]), etc. More precisely, in each case I take it that strong features of the relevant kind (I’s V-feature in the case of V-to-I movement; the relevant feature of C in the case of V2) are lost, leading to the impossibility of movement thanks to UG-internal economy conditions (the Procrastinate Principle). In the case at hand, AgrO loses strong N-features, and so DP-movement to Spec, AgrOP is lost. Thus change in base expansion of X′ is reduced to the more familiar loss of movement dependencies, itself caused by changes in abstract features of functional heads. 
This kind of change is well attested, and, if Clark & Roberts are right, can be understood in terms of the idea that the language-learning algorithm contains a simplicity metric which values the absence of overt movement, and therefore weak features of functional heads, more highly than overt movement, i.e. strong features of functional heads. Hence language acquirers will tend to assign representations without overt movement to parts of the input which involve movement in the adult grammar. I take this treatment of word-order change to be a positive move. The chapter is organized as follows. In section 2, I review what I will call the ‘standard’ GB view of OE word order and describe the ME changes. The ‘standard’ view is a distillation of the work of many researchers, most notably van Kemenade (1987), although it probably does not correspond precisely to any one analysis that has been put forward. In section 3, I present an alternative view, arguing that OE can usefully be seen as a VO language. Section 4 deals with the word-order change in ME.
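The division of labour in (4) can be made concrete with a toy derivation. The sketch below is only an illustration under strong simplifying assumptions: a clause is reduced to subject, verb and object, the strong N-feature of AgrO is modelled as a boolean, and the trace is written "t". The function names and the example words (taken from the gloss of a subordinate clause) are hypothetical.

```python
def derive(subject, verb, obj, agro_strong_n):
    """Return (base, surface) orders for a schematic transitive clause."""
    # Principle in (4): the base is always head-initial, so V precedes its object.
    base = [subject, verb, obj]
    if agro_strong_n:
        # Parameter in (4): a strong N-feature on AgrO (assumed trigger) moves the
        # object leftward out of VP, leaving a trace "t" in the base position.
        surface = [subject, obj, verb, "t"]
    else:
        # Weak feature: no overt movement, so the base order surfaces unchanged.
        surface = list(base)
    return base, surface


def pronounce(words):
    """Drop traces to obtain the audible string."""
    return [w for w in words if w != "t"]


for setting in (True, False):
    base, surface = derive("he", "raised", "his-voice", agro_strong_n=setting)
    print(setting, base, pronounce(surface))
```

The base order is identical under both settings; only the surface order differs. That is the sense in which the ME change is treated as the loss of a movement dependency and not as a change in base expansion.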


2. ‘Standard’ GB Accounts of OE Word Order

2.1 Introduction

In this section, we first discuss the evidence for head-final order in OE IPs and VPs. Then we discuss the various operations that must be postulated in order to account for the attested facts if this order is assumed. These fall into two groups: three rightward movement operations (verb raising, verb-projection raising and extraposition) and two leftward movement rules (scrambling and clitic placement). We will discuss each of these operations in turn. I should emphasize at the outset that throughout this section we are reporting a point of view that will be replaced by an alternative later in the chapter. There is a consensus among scholars who have worked on OE syntax that the predominant word order in the clause is verb-final in subordinate clauses and verb second in main clauses (see Stockwell 1977; Canale 1978; Lightfoot 1979; Bean 1983; Denison 1993; van Kemenade 1987; Traugott 1992). The situation is thus very similar to that in Modern Dutch or German. The following examples illustrate OV order in subordinate clauses:

(5) a. . . . þæt ic þas boc of Ledenum gereorde to Engliscre spræce awende
       that I this book from Latin language to English tongue translate
       ‘. . . that I translate this book from the Latin language to the English tongue’
       (AHTh, I, pref, 6; van Kemenade 1987: 16)
    b. . . . þæt he his stefne up ahof
       that he his voice up raised
       ‘. . . that he raised up his voice’ (Bede 154.28; Pintzuk 1991: 77)
    c. . . . forþon of Breotone nædran on scippe lædde wæron
       because from Britain adders on ships brought were
       ‘. . . because vipers were brought on ships from Britain’ (Bede 30.1–2; Pintzuk 1991: 117)

The example in (5b) shows that verb-particle complexes can appear in the order Particle–V, again as is typical in Modern Dutch and German subordinate clauses (see Koster 1975). 
The example in (5c) shows that auxiliaries can follow participles in subordinate clauses (auxiliaries also follow infinitives in embedded clauses where there is no verb (-projection) raising; see below). This is another trait shared with Modern Dutch and German (although verb raising interferes with this pattern in Dutch; see below), and is also a typological feature of ‘OV’ languages (see Greenberg 1963; Hawkins 1983). The usual assumption is that verb-second order is derived by an operation which fronts the verb from its final position to the second

position (although the precise nature of the second position remains a matter of debate on which I will take no position here), see Koster (1975), den Besten (1983) on Dutch, and van Kemenade (1987) for a clear demonstration that the arguments applied to Dutch carry over to OE. Hence OV order is taken as underlyingly general to all clauses in OE. Aside from verb second, various factors disguise the basic OV order. In the following subsections, I illustrate these, summarizing the standard analysis in each case.

2.2 Rightward Movement Rules

2.2.1 Verb Raising

This phenomenon is well known from studies of Standard Dutch (see inter alia Evers 1975; den Besten & Edmonson 1983; Rutten 1991). As we mentioned above, finite auxiliaries usually follow non-finite participles or infinitives in languages conforming to the OV typology; (5c), where a finite auxiliary follows a participle in an embedded clause, illustrates this for OE. This is also generally the case in Standard German, for example. However, in Dutch a large class of finite auxiliaries and auxiliary-like verbs must or may precede the participle or infinitive. The following contrast (from van Kemenade 1987: 56) illustrates this:

(6) a. . . . dat Jan het boekje wilde hebben (Dutch)
       that John the booklet wanted to have
    b. . . . daß der Johann das Büchlein haben wollte (German)
       that the John the booklet to have wanted
       ‘. . . that John wanted to have the booklet’

The precise details of which Dutch verbs allow this, and under which conditions, are complex, and are treated in detail in the references just given. The essential point is that, alongside the expected order (given an assumed OV typology) of non-finite verb followed by finite verb (or aux), Dutch also allows the order in which the finite verb/aux precedes the non-finite verb. Verb raising is found in OE, as the examples in (7) show:

(7) a. þe æfre on gefeohte his hande wolde afylan
       who ever in battle his hands would defile
       ‘who would ever defile his hands in battle’ (ÆLS 25.858; Pintzuk 1991: 102)
    b. & from Offan kyninge Hygebryht wæs gecoren
       and by King O. H. was chosen
       ‘and H. was chosen by King O.’ (ChronA 52.8–54.1 (785); Pintzuk 1991: 102)

Assuming an OV, I-final structure for OE, (7a) would involve verb raising as in (8):

(8) þe æfre on gefeohte [VP his hande t_i] wolde afylan_i.

(8) shows that the infinitive afylan moves to the right. Most analyses (e.g. Rutten 1991) assume that it adjoins to the right of I, the position containing the inflected verb.

2.2.2 Verb-Projection Raising

This phenomenon is the counterpart of verb raising, with the extra complication that some further constituent, typically a complement of the non-finite verb, appears to the right of the finite auxiliary and (consistent with OV typology) preceding the non-finite verb. Although not attested in Standard Dutch or German, this phenomenon is found in many Continental West Germanic dialects, e.g. West Flemish (Haegeman & van Riemsdijk 1986; Haegeman 1992) and varieties of Swiss German (Haegeman & van Riemsdijk 1986). It is also found in OE, as the examples in (9) show:

(9) a. hwær ænegu þeod at oþerre mehte [frið begietan]
       where any people from other might peace obtain
       ‘where any people might obtain peace from another’ (Or 31.14–15; Pintzuk 1991: 113)
    b. þæt nan man ne mihte [ða meniu geniman]
       that no man NEG could the multitude count
       ‘that no man could count the multitude’ (ÆLS 25.418; Pintzuk 1991: 33)

Here the bracketing indicates the constituent that, on an OV analysis, must be assumed to move to the right of the finite auxiliary. This constituent is often thought of as a VP or other V-projection, whence the term ‘verb-projection raising’. Although the issue of the category of the post-verbal constituent is clearly distinct from the issue of the underlying position of this constituent, we will suggest below that these constituents represent something larger than VP, and in fact may be clausal. In any case, this constituent is assumed by proponents of an OV analysis of OE word order to move from a position immediately preceding the inflected verb to the superficial position seen in (9). 
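The two rightward operations just described differ only in how much material moves with the non-finite verb. A minimal sketch, assuming the OV analysis reported in this section and treating the clause-final portion as a flat list of words (the function names, the list representation and the parameter n_pied_piped are illustrative assumptions):

```python
def verb_projection_raising(vp, aux, n_pied_piped):
    """Move the non-finite verb, plus the n words before it, right of the auxiliary.

    `vp` is the assumed underlying verb-final string ending in the non-finite
    verb; `aux` is the finite verb that underlyingly follows it.
    """
    cut = len(vp) - 1 - n_pied_piped
    return vp[:cut] + [aux] + vp[cut:]


def verb_raising(vp, aux):
    """Verb raising: only the non-finite verb itself moves."""
    return verb_projection_raising(vp, aux, n_pied_piped=0)


# (7a)/(8): underlying "his hande afylan wolde" under the OV analysis.
print(verb_raising(["his", "hande", "afylan"], "wolde"))
# (9b): the bracketed string [ða meniu geniman] ends up right of "mihte".
print(verb_projection_raising(["ða", "meniu", "geniman"], "mihte", n_pied_piped=2))
```

The underlying strings fed to the functions are the ones the standard OV account has to assume; they are not attested orders for these particular clauses.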
2.2.3 Extraposition

If we assume an underlying OV order for OE, we must assume that this language, once again like Dutch and German, allows rightward extraposition

of PPs and CPs. Unlike these languages, however, it also allows apparent rightward extraposition of DPs (cf. Stockwell 1977 on ‘exbraciation’):

(10) a. drihten wæs acenned [PP on þære byrig]
the lord was born in the city
‘The Lord was born in the city’ (ÆCHom i.34.9; Pintzuk 1991: 69)
b. þæt turonisce folc wilnigende wæs [CP þæt Martinus wære to biscope gehalgod to heora burh-scire]
the Tours people desiring was that M. were as bishop consecrated of their city
‘the people of Tours wanted M. to be consecrated as bishop of their city’ (ÆLS 31.254–6; Pintzuk 1991: 69)
c. . . . þæt ænig mon atellan mæge [DP ealne þone demm]
that any man relate can all the misery
‘. . . that any man can relate all the misery’ (Or 52.6–7; Pintzuk 1991: 36)

In each of these cases, the bracketed constituent is assumed to have moved rightward from a VP-internal position to the left of the main predicate. We could schematize this operation as follows:

(11) [VP ti V] XPi (where XP = CP, PP and, possibly, DP)

Clearly, the existence of examples like (10c) is superficially incompatible with the OV typology, although the facts can be accounted for by postulating an optional DP-extraposition rule. Pintzuk & Kroch (1989) show, on the basis of a metrical analysis of Beowulf, that whenever a DP is found in final position in that text, after the finite verb in a subordinate clause, it receives stress. They interpret this as indicating that these orders result from the application of an operation like Focus NP Shift (cf. Ross 1967; Stowell 1981). They also show that final PPs and CPs are not always subject to stress of this kind; hence these elements appear finally due to the operation of an extraposition rule just like that of Dutch or German (cf. Koster 1975). In this way, the basic ‘OV’ typology of OE can be maintained; the sole difference between OE and Modern Dutch or German is the existence in OE of Focus NP Shift and the absence of a rule of this type in Dutch and German.
However, it is not clear if this result can be maintained for OE in general, aside from Beowulf. In that case, we must postulate optional DP-extraposition for OE prose texts. OV analyses of OE word order have not properly solved this problem. Verb raising, verb-projection raising and extraposition all involve rightward movements of various kinds. OE also had various leftward movement operations that distorted the basic OV typology. We now turn to these.

2.3 Leftward Movement Rules

2.3.1 Scrambling

Originally discussed in Ross (1967), scrambling is an operation which, in West Germanic languages, moves definite DPs leftwards within the clause, to some position outside VP. In Dutch and German, a diagnostic for scrambling is the order of a definite complement with respect to clausal negation, since the marker of clausal negation is taken to be at the left margin of VP (cf. Bennis & Hoekstra 1986; den Besten & Webelhuth 1990; Deprez 1994; Fanselow 1990; Lee & Santorini 1994; Webelhuth 1991). The following examples illustrate scrambling in German; in both cases, the fact that the direct object das Buch precedes nicht is taken to show that it has left VP:

(12) Gestern kaufteV Peter . . .
yesterday bought Peter
a. . . . das Buchi ohne Zweifel nicht [VP ti tV]
the book without doubt not
b. . . . ohne Zweifel das Buch nicht [VP ti tV]
without doubt the book not
‘Yesterday Peter without doubt didn’t buy the book’ (Vikner 1995: 5)

Assuming the adverb ohne Zweifel to occupy a constant position (an assumption that is denied by Zwart 1993, for example), the object is clearly moved further in (12a) than in (12b). There is little consensus in the literature on West Germanic either as to the nature of the scrambling operation as A-movement or Ā-movement or as to the landing sites of the operation. For our present expository purposes, this is not important; what is crucial is that the different positions occupied by the direct object in (12), along with the possibility of it following negation (i.e. remaining in the putatively VP-internal position), show that it can be moved to the left out of VP. In OE, clausal negation is signalled by pre-verbal ne, hence this diagnostic for scrambling is not available. Nevertheless, the following is a plausible instance of scrambling since the complement precedes an adjunct, and hence, given standard assumptions about the base positions of complements and adjuncts, must have moved leftward out of VP as indicated:

(13) ne mihton hi nænige fultumi æt him [VP ti begitan]
not could they any help from him get
‘They couldn’t get any help from him’ (Bede 48.9–10; Pintzuk 1991: 1.44)

We can also see the effects of scrambling from interactions with verb-projection raising. On the standard view of OE word order, the following example

involves scrambling of the direct object combined with rightward movement of the constituent containing the trace of the scrambled object:

(14) þæt he þæt godes husi wolde [myd fyre ti forbærnan]
that he the god’s house wanted with fire to-burn
‘that he wanted to burn the god’s house with fire’ (ÆLS 25.613–14; Pintzuk 1991: 39)

The above discussion does not take into account the possibility of movement to Spec,AgrOP. Clearly, ‘short’ scrambling of the type in (12b) might be analysable in this way (if NegP is taken to be contained in AgrOP). This possibility, which does not figure in ‘standard’ accounts of OE word order, will be considered in more detail below. If such an operation can be motivated, it adds a third case of leftward movement to the inventory of OE leftward movement rules (or, if it could be shown that there is only one position for leftward-moved complements outside VP, then it would substitute for scrambling on the view I am outlining here; however, we will see below that there are clearly two such positions). For our present exposition, we do not distinguish movement to Spec,AgrOP from scrambling.

2.3.2 Cliticization

OE had a special position, or class of positions, for clitics. These elements preceded the main verb in ordinary matrix declaratives; followed the verb in matrix interrogatives, negative clauses and clauses beginning with a certain class of adverbs; and followed the complementizer or the subject in embedded clauses. These positions are illustrated by the following examples:

(15) a. God him worhte þa reaf of fellum (clitic–verb)
God them made then garments of skin
‘God then made them garments of skin’ (AHTh, I, 18; van Kemenade 1987: 114)
b. Hwæt sægest þu, yrþlincg? (verb–clitic)
‘What sayest thou, ploughman?’ (AColl 22; van Kemenade 1987: 138)
c. . . . þæt him his fiend wæren æfterfylgende (clitic–subject)
that him his enemies were following
‘. . . that his enemies were following him’ (Oros., 48, 12; van Kemenade 1987: 113)
d. . . . þæt þa Deniscan him ne mehton þæs ripes forwiernan (subject–clitic)
that the Danes them NEG could the harvest refuse
‘. . . that the Danes could not refuse them the harvest’ (ChronA 89.10 (896); Pintzuk 1991: 188)
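The distribution illustrated in (15) can be restated as a small lookup table. The following Python sketch is only a compact summary of the descriptive generalization, not an analysis of it; the clause-type labels and the function name `clitic_orders` are hypothetical and are introduced here purely for exposition.

```python
# Descriptive summary of OE clitic placement as stated in the text.
# The labels and the function name are illustrative assumptions,
# not terminology taken from the works cited.

CLITIC_ORDERS = {
    # Ordinary matrix declaratives: the clitic precedes the verb, cf. (15a).
    "matrix_declarative": ["clitic-verb"],
    # Matrix interrogatives, negative clauses and clauses beginning with a
    # certain class of adverbs: the clitic follows the verb, cf. (15b).
    "matrix_interrogative": ["verb-clitic"],
    "matrix_negative": ["verb-clitic"],
    "matrix_adverb_initial": ["verb-clitic"],
    # Embedded clauses: the clitic follows the complementizer, cf. (15c),
    # or follows the subject, cf. (15d).
    "embedded": [
        "complementizer-clitic-subject",
        "complementizer-subject-clitic",
    ],
}


def clitic_orders(clause_type: str) -> list[str]:
    """Return the clitic orders described in the text for a clause type."""
    try:
        return CLITIC_ORDERS[clause_type]
    except KeyError:
        raise ValueError(f"unknown clause type: {clause_type!r}")


if __name__ == "__main__":
    for clause_type in CLITIC_ORDERS:
        print(clause_type, "->", ", ".join(clitic_orders(clause_type)))
```

The table deliberately records surface orders only; it says nothing about the position the clitic occupies structurally.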

It is usually assumed that the clitic moves to its special position from an underlying position within VP. The precise nature of this movement rule, and of the alternating cl–V and V–cl orders, is not clear. It is likely that West Germanic cliticization processes are connected to scrambling; see, for example, Haegeman (1993a) and below. On the relative order of verb and clitic, see van Kemenade (1987) and Tomaselli (1995).

The above paragraphs outline the kinds of assumptions and analyses that are typically proposed for OE, assuming that the language is head-final in VP and IP (although we should note that Pintzuk 1991 argues for a ‘double base’ system where both IP and VP were subject to synchronic variation in OE with respect to the headedness parameter). In the next section, I will propose a head-initial analysis for OE VPs and IPs.

3. OE as a Head-Initial Language

3.1 Introduction

The purpose of this section is to show that OE can plausibly be analysed as head-initial. Our goal is quite limited: I wish to show only that a head-initial analysis does no worse than a head-final one. In fact, a number of peculiar properties of OE have to be stipulated on a head-initial analysis just as they do on a head-final analysis; it is interesting to observe that many of the same descriptive puzzles arise on both approaches, suggesting that they are not mere artifacts of a given approach. I should also note that this section is compatible with a less restrictive theory of phrase structure than Kayne’s, such as standard GB versions of X′-theory: I argue that there are no compelling empirical reasons to analyse OE as head-final. What we will see is that since OE can be analysed as VO it does not pose a strong empirical challenge to a thesis such as Kayne’s. To put it another way: there is a weak version and a strong version of the thesis being put forward here. The weak version states that OE is not an OV language and hence that there are fewer underlyingly OV languages than previously thought. The strong version says that there are no OV languages, and hence OE is not an OV language. The facts do not decide between the weak and the strong version, and, in a sense, it is not crucial for the empirical claims of this chapter that Kayne’s view be accepted. Either version, however, requires that our conception of the attested word order changes in English be rethought. Moreover, if Kayne’s view is accepted, then all cases of OV-to-VO change will have to be looked at in the terms proposed here, and so our argument has a more general scope (see note 11). We begin by discussing the position of I, and then move to the position of V.

3.2 The Position of I

Zwart (1993) points out that if we adopt the checking theory of Chomsky (1993a) there is no reason to assume that any verbal functional projection,

i.e. any I-type element, is ever head-final. The basic motivation for this assumption in all the West Germanic languages has been that V-to-I movement creates an inflected verb. Hence, if inflected verbs are found in final position, I must be final. But if V is inserted fully inflected and raises to I to check its morphological features, it is possible to assume that its final position does not correspond to I but to V, with the possibility that the raising takes place covertly. Thus the final position of inflected verbs does not tell us anything about the position of I. (I will return to the question of V-movement below.)

Other properties can serve as diagnostics for the position of I-type elements, though. It has often been proposed (beginning with Kayne 1989) that clitics raise to I-type positions. It is clear that there are at least two types of ‘clitic’ elements crosslinguistically, and there are at least two types of positions that they raise to. On the first point, Cardinaletti (1994) distinguishes between clitics and weak pronouns by saying (a) that the latter are homophonous with strong pronouns while the former are not and (b) that the latter only optionally move to ‘special’ positions while the former do so obligatorily. By criterion (a) OE ‘clitics’ are weak pronouns; by criterion (b), they are clitics since they are always found in special positions (although there are some examples where this is debatable; see (19) below). While it may be correct to view OE ‘clitics’ as weak pronouns, this does not affect the argument to be made here since the important point for our purposes is the nature of the special position that is moved into. The fact that this movement is obligatory in OE (examples like (19) notwithstanding) strongly suggests that these elements are clitics, and I will take this view in what follows.
I will also follow Cardinaletti (1994) in regarding movement of these pronouns as head-movement, although I will return to this point in section 4. Concerning the second point, clitics or weak pronouns seem to be attracted either to the inflected verb (as in Romance) or to a position after C (as in Germanic; possibly identifiable as the Wackernagel position). Rivero (1994) distinguishes ‘I-oriented’ from ‘C-oriented’ clitics. In these terms, OE clitics are clearly C-oriented, like other West Germanic clitics. However, I will suggest directly that the ‘C-oriented’ position is a kind of Agr-position; in particular, although close to C, it cannot be identified with C, at least in OE. The OE clitic alternations in main clauses that are illustrated in (15) suggest strongly that clitics occupy a functional position lower than C. In the order WH–V–CL (15b), V can reasonably be thought of as being in C (Rizzi 1991); in the order XP–CL–V (15a), either we must explain why the clitic selectively moves with V only in these cases (cf. Tomaselli 1995 for an attempt to do this for the comparable OHG data) or we must conclude that the verb and the clitic are in positions lower than C. Cardinaletti & Roberts (2002 [this volume, Chapter 12]), Pintzuk (1991) and Kiparsky (1995) take the latter option. The alternative of regarding the clitics as consistently moving to C (with V left-adjoined to C in (15b) and not raising to C in (15a)) cannot be maintained for embedded clauses where clitics can follow the

subject. Taking the weaker position that clitics always move to C in main clauses would (a) mean that there is no unified target for clitic placement and (b) posit an otherwise unmotivated root–embedded distinction in clitic placement which would be quite separate from that affecting verb placement. I thus conclude that clitics do not always move to C, and, since it is desirable to assume that there is the minimum possible number of targets for clitic placement, that they never move to C. The alternative, directly motivated by the position of the clitic in (15d) and indirectly by its position in (15a), is that clitics occupy ‘medial’ head-initial functional projections. These projections do not seem to form part of the complementizer system, since their nature does not seem to be determined by factors to do with complementation, and so I conclude that they are part of the I-system. This seems natural to the extent that it is plausible to consider that clitics move to special positions because they are subject to special checking requirements. The features to be checked are presumably ϕ-features (since the content of these elements is exhausted by such features), and hence the checking position is presumably an Agr-type position (cf. Sportiche 1996 and section 4 for a proposal along these lines). Note that this view of putatively ‘C-oriented’ clitics reduces the distinction between ‘C-orientation’ and ‘I-orientation’ either to orientation to different parts of the I-system or, perhaps, to orientation to the same position with independent differences in V- and subject-placement.

We thus have evidence for medial I-positions and, if we assume inflectional affixes are attached to verbs in the lexicon, no evidence for final I-positions. Following Zwart (1993) (as the above reasoning does, modulo certain differences between Dutch and OE), we can conclude that IP, or the IP-type projections, are head-initial.
We will see further evidence that supports this conclusion below.

3.3 The Position of V

The V-raising operation is proposed in order to regularize the head-final order. If inflected verbs are assumed to have raised to I, then we must assume that the non-finite V has raised to the right of I in an example like (7a), repeated here:

(7) a. þe æfre on gefeohte his hande wolde afylan
who ever in battle his hands would defile
‘who would ever defile his hands in battle’ (ÆLS 25.858; Pintzuk 1991: 102)

However, if the finite verb has not moved, we are not forced to conclude that the non-finite form is external to the VP headed by the finite verb. Hence it is possible that the non-finite verb is in a constituent that is a complement of the finite verb; in this case, it is likely that it is something larger than VP. Note that the position of the object is immaterial here; we must

postulate leftward movement out of VP in any case, and so we could say that this element has moved here. In that case, the structure for (7a), rather than (8), would be (8′):

(8) þe æfre on gefeohte [VP his hande ti] wolde afylani
(8′) þe æfre on gefeohte his handei wolde [VP afylan ti]

Presumably, the object moves for case-checking purposes. It may be that the ‘verb raising triggers’ are verbs whose complements do not contain an (active) AgrO, and so movement out of them is forced (cf. note 10). It is certainly well known that this class of verbs is close to the class of restructuring verbs in Romance (cf. Evers 1975; Rizzi 1982; Rutten 1991), and it may be correct to analyse the phenomena associated with restructuring as the reflex of the functional substitution of the matrix AgrO for the low AgrO (cf. Roberts 1997). Where the non-finite verb precedes the finite verb, I assume that the complement (or, in some cases, part of the complement; see the discussion around (29) below) is fronted for checking. It seems, then, that infinitival clauses are subject to a distinct checking requirement from finite clauses, which are able to remain final.

Non-finite complements are subject to the same leftward movement processes as DPs and other complements (e.g. small clauses). However, we must impose one restriction on them: material cannot intervene between a non-finite verb and a finite verb. This property is shared with Dutch small clauses, and implies, as in Zwart’s (1993) analysis of the placement of these elements, that they can only be moved to a relatively ‘low’ position. Positions available to definite DP complements which are non-adjacent to V are not available for non-finite clauses (see note 4 for more on this). Given these assumptions, and an analysis of (7a) as in (8′), V-raising has no clear motivation in OE. V-projection raising is also rather suspect.
The same comments as were just made about verb raising apply: if we do not assume a final I it is not clear that we must treat a VP in post-finite-verb position as external to the VP headed by the finite verb. Zwart’s position on this is that ‘V-projection raising’ is exactly like ‘verb raising’ with the single difference that AgrO is available in the lower clause for checking the lower object (1993: 19, n. 14). We can adopt this view. This means that an example like (9a), repeated here for convenience, would involve only movement of the object frið internal to the complement, as illustrated in (9a′): (9) a. hwær ænegu þeod at oþerre mehte [frið begietan] where any people from other might peace obtain ‘where any people might obtain peace from another’ (Or 31.14–15; Pintzuk 1991: 113) (9′) a. hwær ænegu þeod at oþerre mehte [friði begietan ti]

One piece of evidence for this point of view for OE is discussed by Haeberli & Haegeman (1995). They show that OE contrasts minimally with West Flemish (WF) in that negative polarity items can appear in putatively raised VPs and form a single semantic negation with ne on the finite verb in OE, while this is impossible in WF:

(16) a. þæt heora nan ne mehte [nanes wæpnes gewealdan] (OE)
that of-them none NE might no weapon wield
‘that none of them could use any weapon’ (Mitchell 1989: 660, cited in Haeberli & Haegeman 1995)
b. *. . . dan-ze en-willen [tegen niemand klapen]
that-they en-want to no-one talk
c. . . . dan-ze tegen niemand en-willen klapen
that-they against no-one en-want talk

In (16b), the WF negative polarity item niemand cannot be licensed in a raised VP. This is unsurprising since it is a typical property of rightward-moved categories that they form islands of various kinds (cf. Haegeman & van Riemsdijk 1986 on this and other properties in both WF and Zurich German). However, (16a) indicates that the putatively rightward-moved VP in OE allows the negation to link up with the main negation ne (OE, like most varieties of English, has negative concord). Haeberli & Haegeman conclude that many instances of V-projection raising in OE should be analysed as involving verb movement to a medial I. This conclusion provides further evidence against a head-final IP, and partially undermines one of the rightward movement rules that the standard account assumes. (If we are to adopt Kayne’s general view, then we cannot account for the WF facts in terms of the islandhood of rightward-moved projections; for our purposes, it suffices to say that WF negative concord is subject to a restriction that OE negative concord is not subject to. I have no speculations to offer as to what that restriction might be.
However, the important point for present purposes is that the absence of rightward movement makes the claim possible, while a theory that treats orders like those in (16) as derived from an island-creating rightward movement rule has no recourse for accounting for the OE vs. WF differences). Haeberli & Haegeman nevertheless argue that V-projection raising is needed in some cases (where van Kemenade 1987 had proposed it). These are cases where there is both a non-subject and a subject before the inflected verb, creating a situation in which there is ‘not enough space’ for both constituents if the finite verb is in I (e.g., the bracketing here is what is implied by Haeberli & Haegeman’s analysis): (17) a . . . þæt [IP he [?? þæs gewinnesi [I mehtej] [VP mare ti gefremman tj ] that he the victory could better achieve ‘. . . that he could better achieve the victory’ (van Kemenade 1987: 21, Oros 47, 1)

b. . . . þæt [IP mon [?? ælcne ceapi [I mehtej] [VP be twiefealdan bet ti geceapian tj]
that people each commodity could by twofold better buy
‘. . . that people could buy each commodity twice as cheaply’ (van Kemenade 1987: 21, Oros 130, 2)

These examples are only problematic if one takes evidence of the type in (16a) as forcing a medial-I analysis. However, as mentioned above, another possibility is to treat the VPs (more precisely, the complement to mehte) as occupying the complement position of mehte. In that case, we are not forced to regard the finite verb as having raised to I, and we know that there must be landing sites for scrambling to its left. Haeberli & Haegeman also note the following example with a negative polarity item in the putatively raised VP:

(18) þæt wæs ða ða he Iudeas nolde nan wuht læran hwæt hi don scolden
that was when he the Jews not-wanted nothing/not to-advise what they do should
‘that was when he didn’t want to advise the Jews what to do’ (Haeberli & Haegeman 1995, (31c), CP 58.433)

The authors note that Iudeas here must be in an adjoined position; presumably it would be adjoined to I′. In terms of the idea just sketched, we can regard this DP as occupying a scrambled position to the left of the category whose head contains the inflected verb, clearly a more satisfactory analysis (in fact the I′-adjunction possibility is ruled out on Kayne’s 1994 assumptions).

There may be a ‘Pollockian’ argument for V-movement here. If the second element of negation, nan wuht, is in a position comparable to NE not or French pas, then the inflected V is not in VP. However, if there is a scrambled position to its left the verb cannot be in the position that inflected verbs occupy in French (Agrs). But the negative polarity evidence shows that the verb is not final with a raised VP following it. I conclude that it must be in a medial I-position (medial in the sense of being lower than Agrs but nevertheless VP-external).
The natural candidates are T and AgrO (assuming the clause structure proposed in Belletti 1990 and Chomsky 1993 where Agrs is higher than T and T higher than AgrO; AgrO is only a candidate to the extent that it is above NegP—see Roberts 1995 [this volume, Chapter 3]).3 For the sake of concreteness, we take this position to be AgrO. Another property that has been attributed to V-projection raising in WF and other contemporary West Germanic varieties is that pronouns cannot be part of ‘raised’ projections. However, Pintzuk (1991) gives examples where pronouns are in such positions in OE: (19) . . . þæt heo wolde hine læran that she wanted him to-teach ‘. . . that she wanted to teach him’

(Pintzuk 1991, ÆLS 18.291)

There are several ways to interpret examples like this. First, if we assume V-projection raising, then we are led to conclude that OE pronouns are different from those of WF and other Modern West Germanic varieties. In that case, (19) would be evidence that pronouns do not always move to a special position in OE on a ‘V-projection raising’ analysis, supporting the idea that OE ‘clitics’ are really weak pronouns in Cardinaletti’s terms (and conversely that they are clitics in Cardinaletti’s sense at least in WF). Second, we could deny that this example contains V-projection raising, and treat it as evidence for medial I. In principle, such a conclusion would not tell us anything about the existence of V-projection raising elsewhere in OE or about the position of V in VP. Third, if we consider that ‘V-projection raising’ reflects the presence of an in situ complement, then we must take it that the pronoun has moved within the complement in (19). This option is unavailable in WF, etc. On this view, we have no evidence that OE weak pronouns are anything other than clitics. The kind of variation in clitic positions between OE and WF that we posit is attested in Romance languages with the very similar operation of clitic climbing: in Standard Italian, clitics can climb (to an appropriate matrix verb) or cliticize to the lower verb, while in Sardinian and a number of southern Italian dialects climbing is obligatory wherever it is possible. This parallel can be maintained whether or not we consider clitic movement to be head movement, since West Germanic languages allow ‘long’ scrambling in contexts which can be plausibly regarded as restructuring contexts (Evers 1975; Rutten 1991; Zwart 1993). The third position is consistent with everything else we have said, and also indicates that OE weak pronouns are clitics in Cardinaletti’s sense. I thus adopt this position.
The evidence from Haeberli & Haegeman, together with the general considerations regarding verb raising and the position of functional heads that we raised earlier, casts some doubt on the existence of V-projection raising in OE. If V-projection raising is not assumed, we have more cases of leftward movement than was previously thought: orders where VP precedes the finite verb might be derivable by leftward movement (again, not necessarily of VP, rather of a larger constituent). Since leftward movement is required in any case, this is not a problem. We also have prima facie cases of complements to the right of the verb. As in the case of verb raising we do not take these complements to be VPs. They must be at least IPs; more generally, they are complements that are ‘transparent’ rather in the manner of the complements to restructuring predicates in Romance languages (a similarity that was originally observed by Evers 1975), hence clitic climbing and scrambling can take place from within them. I thus suggest that V-projection raising, which has always been a highly problematic operation, does not exist. More specifically, I conclude that OE provides no cases of such an operation: the derived structure of putative examples of this construction is as in (9a′).

I have now dispensed with two of the rightward movement rules of OE that the standard analyses assume. The remaining one is extraposition. For

CP- and PP-extraposition, I simply assume that the elements in question are able to remain in their complement positions (Zwart 1993 provides evidence in favour of the idea that this is the situation in Dutch, from the fact that post-verbal CPs are not necessarily islands, which extraposed clauses always are; I do not have comparable data in OE, unfortunately). DP-extraposition is more interesting. Here, one possibility is to adapt Pintzuk & Kroch’s analysis in the obvious fashion: focussed DPs are able to remain in complement position. An alternative, at least for complements of non-finite verbs, is to say that final DPs are fronted inside the complements and the remainder of the complement undergoes the usual leftward movement operation for non-finite complements. The derived structure of an example like (10c) would then be as in (20):

(20) . . . þæt ænig man [XP atellan ti ] mæge [YP [DP ealne þone demm]i tXP]

We will discuss this kind of derivation in more detail below; cf. (29). Focussing and remnant fronting account for many cases of final DPs. The residue must be demonstrably unfocussed and in embedded clauses with a single finite verb which are demonstrably not V2; and note that even this kind of example could be handled by postulating V-raising higher than AgrO and object movement to Spec,AgrO (see note 4). (Similarly, V-movement allows for cases where the final DP is not adjacent to the finite verb.)

An important argument for V-final orders, originally due to Koster (1975), has to do with the positions occupied by particles belonging to separable verbs like terug+geven ‘give back’ in V2 clauses. Koster’s observation was that the position of the particle always corresponds to the position of the verb+particle combination in a verb-final clause: it cannot be followed by a DP, cannot be preceded by a finite clause, and may be followed by a PP.
Koster argued that this distribution of particles could be simply accounted for if one assumed that the particle was stranded by V-movement. Hence the underlying position was final (modulo the position of finite clauses and certain PPs). Van Kemenade (1987: 29–39) applies Koster’s criteria to OE. What emerges is that particle positions pattern fairly systematically with V-positions, although they can be more readily separated from V in OE than in Dutch and they are also able to move with V in V2 clauses in OE, unlike Dutch. It seems, then, that the rule attaching Prt to V applies more liberally in OE. Aside from this, however, the particle positions in OE differ from those in Dutch in three main respects. i. Particles can be followed by complement DPs, as can finite embedded verbs: (21) . . . þa ahof Paulus up his heafod then raised P. up his head ‘. . . then P. raised up his head’ (van Kemenade 1987: 33, AHTh, I, 96)

As van Kemenade notes, the ‘postposing’ of DP here is just a case of the general possibility of having post-finite-verb DPs in OE (here it seems clear that the post-verbal DPs are not necessarily focussed). Pintzuk (1991) shows that particles do not occur after a non-finite verb, and interprets this as an argument for a medial I. I concur in this view; this is consistent with the suggestion made above that finite V moves to AgrO.

ii. In Dutch, predicate adjectives and participles have to precede the final V in embedded clauses and always precede the particle in V2 clauses. This is not the case for OE; (22a) shows that adjectives can follow V, and (22b) shows that an adjective can follow Prt:

(22) a. . . . forðam ðe hi licettað hie unscyldige
because that they pretend themselves innocent
‘. . . because they pretend themselves to be innocent’ (van Kemenade 1987: 35, CP, 439, 19)
b. . . . he ahof þæt cild up geedcucod and ansund
he raised the child up quickened and healthy
(van Kemenade 1987: 36, AHTh, II, 28)

These examples arguably involve small clauses. Zwart (1993) argues that small clauses have to move to a special checking position (which he calls Spec,PredP) in Dutch; this derives the fact that constituents of this type always precede V in embedded clauses. Suppose this is true; then the OE data indicate that V can raise to a higher position in that language. We thus have another piece of evidence for V-movement. We also arrive at an important difference between OE and Dutch.

iii. In Dutch, very few adverbs follow the finite V in embedded clauses and very few follow Prt in main clauses. OE does not show the embedded pattern of Dutch, i.e. it allows post-verbal adverbs in embedded clauses:

(23) . . . ðæt hie ðæt unaliefede doð aliefedlice
that they the unlawful do lawfully
‘. . . 
that they do unlawful things as if they were lawful’ (van Kemenade 1987: 36, CP, 144,10) There is no data available with respect to the root pattern. Again, I take (21) as evidence for V-to-AgrO movement in OE. The object has been moved to Spec,AgrO, or perhaps higher. More generally, I regard particles, following Kayne (1985), as small clause predicates. They optionally adjoin to the left of V in OE. I also have to assume that V can ‘excorporate’ from Prt in OE, as in Dutch. When Prt does not adjoin to the left of V, it occupies the same positions as

other small-clause predicates like those seen in (22). This gives the following possibilities:
(24) . . . V . . . Prt + t [ DP t]: (V-movement and Prt-movement) . . . þæt he ahof up the earcan (Pintzuk 1991: 78, GD(C) 42.6-7) ‘. . . that he lifted up the chest.’
(25) . . . V [ DP Prt] (no movement) . . . þæt he wearp þæt sweord onweg (Pintzuk 1991: 91, Bede 38.20) ‘. . . that he threw the sword away.’
(26) . . . Prt + V [ DP t] (Prt-movement) . . . þæt up arisað lease leogeras (Pintzuk 1991: 84, WHom 1b.16) ‘. . . that up arise false liars.’ ‘. . . that false liars rise up.’
(27) . . . V X [ DP Prt] (V-movement) . . . þa ahof Drihten hie up (van Kemenade 1987: 33, Blick 157) ‘. . . then raised the-Lord them up’ ‘. . . then the Lord raised them up.’
The examples of ‘V-movement’ in (24)–(27) are cases where V moves beyond AgrO. In some cases, e.g. (27), this is clearly movement to C; in others, e.g. (24), it may not be (note the different position of the verb with respect to the subject in (24) vs. (27)). The DP subject of the small clauses (24)–(27) may be moved out of the small clause to a checking position; this is very likely for the clitic hie in (27), and cannot be excluded in the other examples (this depends on what we assume about the position of V; see note 3). If V does not in fact move further than AgrO in (25) and (26) then we must assume that the object can check for case inside the small clause; notice that on this view (26) becomes analogous to our proposal for final DP-complements of non-finite verbs in (20). It is striking that this order, like that in (20), is not found in contemporary West Germanic. So far, we have seen that we can do without the three rightward movement processes assumed by the standard analysis. We have seen that V moves to AgrO, and that small clauses, non-finite complements and DP-complements move leftwards.
We have suggested that finite CPs, some PPs and, possibly, focussed DPs can stay in complement position, and that DP can be ‘stranded’ in final position by remnant complement-movement. It is natural to relate the CP- and PP-positions to the idea that such categories are not required to check for case, although given that we assume small-clause predicates and non-finite complement clauses are also subject to a checking requirement the notion of ‘case’ here is much more abstract than in GB theory—we return to this point briefly in the next section. We are also assuming both scrambling and cliticization. At this point, we should make our general position clear regarding the leftward movement of complements.

I propose that the OE clause contains the following positions:
1. a topic position: active only in V2 clauses (although V is not always in C in such clauses, as the clitic evidence shows);
2. C;
3. a subject position, although, as in Dutch and German, the subject can and often does remain in a lower position;
4. the clitic position, again comparable to German;
5. a scrambling position, which can also be occupied by the subject;
6. a second scrambling position;
7. the checking position for objects, non-finite complements and small clauses: Spec, AgrOP;
8. the position of the finite verb in non-V2 clauses: AgrO;
9. the base position of V (which in certain examples it appears to stay in);
10. the complement position, occupied by CP, some PPs and (possibly) focussed DPs.
I distinguish two scrambling positions on the assumption that OE scrambling is parallel to that of German as seen in (12), although we have not seen any direct evidence of this. There are thus three possible positions for the direct object: 5, 6 and 7. For concreteness, we identify the second scrambled position (Position 6) as Spec, TP. If we take Position 3 to be Spec, Agr1P in the sense of Cardinaletti & Roberts, then the clitic position is Agr1 and the first scrambling position may be Spec, AgrS (following Kayne 1994 we do not structurally distinguish specifiers from adjoined positions and assume a given YP cannot simultaneously support a specifier and an XP adjunct, hence Position 5 is the position adjoined to AgrSP and when the subject moves there it presumably checks with AgrS, but a scrambled element may not).4 These assumptions give us the following clause structure:
(28) [CP1 [C2 [Agr1P3 [Agr14 [AgrSP5 [AgrS [TP6 [T [AgrOP7 [AgrO8 [VP9 10]]]]]]]]]]]
The above proposal accounts for the word orders found in OE with little stipulation.
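The ten positions and the bracketing in (28) can be written out as a small data structure, which may help in keeping track of which label hosts which numbered position. The sketch below is purely illustrative: the names SPINE, bracketing and host_label are hypothetical helpers, not part of the analysis, and an underscore is used to keep category labels apart from position numbers (so Agr1_4 is the head Agr1 in Position 4).

```python
# Illustrative only: the clausal spine of (28) as (label, position) pairs.
# Position numbers follow the ten-position list in the text; the heads the
# text leaves unnumbered (AgrS, T) carry None.
SPINE = [
    ("CP", 1), ("C", 2), ("Agr1P", 3), ("Agr1", 4), ("AgrSP", 5),
    ("AgrS", None), ("TP", 6), ("T", None), ("AgrOP", 7), ("AgrO", 8),
    ("VP", 9),
]
COMPLEMENT_POSITION = 10  # Position 10: CP, some PPs, (possibly) focussed DPs

# The text identifies three possible positions for the direct object.
DIRECT_OBJECT_POSITIONS = (5, 6, 7)


def bracketing(spine, innermost):
    """Return a right-branching labelled bracketing for the spine."""
    opened = " ".join(
        f"[{label}" if pos is None else f"[{label}_{pos}"
        for label, pos in spine
    )
    return f"{opened} {innermost}" + "]" * len(spine)


def host_label(position, spine=SPINE):
    """Return the label in the spine that carries a numbered position."""
    for label, pos in spine:
        if pos == position:
            return label
    raise KeyError(f"no label in the spine carries position {position}")


if __name__ == "__main__":
    print(bracketing(SPINE, COMPLEMENT_POSITION))
    print([host_label(p) for p in DIRECT_OBJECT_POSITIONS])
```

Looking up Positions 5, 6 and 7 returns AgrSP, TP and AgrOP, matching the three object positions named in the text.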
Like all other analysts, we must allow for a certain optionality: for example, V-movement is not always required, even in main clauses. Also, we may have to allow for focussed DP complements to have a special privilege with respect to Case theory. It may be that the optionality in verb movement concerns the strength of features of functional heads which trigger movement (what this effectively means in a framework like that of Chomsky 1993 is that the optionality represents distinct parameter values; this view of OE is argued for by Pintzuk 1991, but there the idea is framed in terms of variation in the branching direction of I′ and V′. The proposal in Kiparsky 1995 that C may be absent from OE main clauses reduces to the same idea in Minimalist terms—C cannot be absent, but may have weak features in some instances and strong in others).

To see how the system works, let us consider the principal subordinate clause word orders, as discussed in van Kemenade (1987) and Kiparsky (1995). Here I gloss over clitic positions, and use Aux to mean finite V:
(29) a. S V Aux O     (standardly DP-extraposition)
     b. S O Aux V     (standardly V-raising)
     c. S Aux O V     (standardly V-projection raising)
     d. S O V Aux     (standardly underlying)
     e. S Aux V O     (V-raising, DP-extraposition)
     f. *S V O Aux    (underivable)

In the head-initial system being advocated here, the grammatical orders are derived by combinations of object movement and ‘VP-movement’ (here again, it is likely that the category being moved is larger than VP). I now outline the relevant derivations in detail. On our account, (29a) must involve fronting of the non-finite verb. We can view this as a kind of VP-fronting if we can motivate moving the object out of VP. As sketched earlier (cf. (20)), I assume that this happens on the lower cycle, i.e. that the object moves out of the fronted constituent inside the lower clause, and the remnant of the lower clause is fronted. This analysis ties the existence of post-verbal DPs in OE to the existence of examples like (19) in terms of the idea that nominals, full DPs and pronouns can be fronted for checking on the lower cycle (cf. also the discussion of (25)). Neither examples like (19) nor post-verbal DPs are found in Modern West Germanic; this then reduces to the same fact (but cf. the discussion of (29c) below). Recall again that similar variation is found in clitic-climbing constructions across Romance. On this view, the relevant parts of the S-structure for (29a) are as follows: (30) . . . [AgrOP [VP V ti] [AgrO Aux] [XP . . . Oi . . . We could capture the connection to focus observed by Pintzuk & Kroch in the following manner: at LF, the object must raise to a position c-commanding its trace. This can only happen where the object is focussed and undergoes LF raising. The well-known weak-crossover effects discussed in Chomsky (1977) show that LF raising places DPs in a position higher than the subject position. Therefore, this position is higher than the position of the fronted VP in (30), and the object c-commands its trace at LF.
This idea has two disadvantages: first, the empirical status of Pintzuk & Kroch’s result for texts other than Beowulf is uncertain, as we mentioned earlier; second, it is not clear that the fronted complement cannot be reconstructed (although if this is a kind of A-movement, we do not expect this possibility, unlike the Modern English VP-topicalization discussed in Huang 1993). We thus leave this question open. The order in (29b) results straightforwardly from either object scrambling or object movement to the higher Spec,AgrO. (29c) involves raising

of the object to Spec, AgrOP for case-checking inside the complement clause without movement of the remnant category containing V. Since this kind of order is found in some contemporary West Germanic varieties (it is standardly analysed as V-projection raising, cf. section 2.2.2), we have to conclude that checking on the lower cycle is allowed in these varieties. What is not allowed in these varieties, however, is (i) clitic placement on the lower cycle (cf. the discussion around (19)), and (ii) fronting of the remnant constituent into the higher clause (to give the order in (29a)). We can now derive the following prediction for West Germanic (other than English): a language in which non-finite complements are fronted has the order in (29a) only if it has V-projection raising, i.e. the order in (29c). This prediction is fulfilled; Old High German and Middle Dutch are like OE in having both orders, while no modern variety which has the order in (29c) has the order in (29a). (29d) can be analysed as object scrambling combined with VP-fronting (note that the object c-commands its trace inside VP), or as fronting of the entire complement including the direct object. We could regard (29e) as not involving movement (except for Aux-to-AgrO) and claim that the object must be focussed in order to escape the requirement that it move to Spec, AgrOP. Once again, if this cannot be sustained empirically, we need an alternative analysis. To capture (29e) while ruling out (29f) and without assuming that the final DP is obligatorily focussed, we must invoke a variant of the standard verb raising idea (cf. section 2.2.1). We propose, then, that the infinitive attaches to the auxiliary in these cases (and, presumably, in (29b)). This captures the further fact that nothing can intervene between Aux and V in such cases.
As in all analyses of verb raising in systems where the order in (29c) (‘V-projection raising’) is also found, we have to treat this infinitive-movement as optional.5,6 Consider now the illicit order (29f). There are several possible derivations to look at. First, we can rule out the possibility that V and O move as a constituent, since the object must leave VP in order to be licensed in the lower clause (and V does not move to Aux here, clearly). So we assume that the object and the VP are both fronted separately. Essentially, VP must front to a position lower than the object. We have already guaranteed this by assuming that VP must move to the same position as small-clause predicates (including particles): Position 7. Thus, the object would have to be in VP to give the order in (29f), and we have just seen that this is impossible. (We also need to prevent the object from being focussed here; perhaps this can be achieved by requiring LF-raising of focussed categories and preventing raising out of a constituent on a left branch). The above paragraphs give my account of OE word order. The account fares no worse than standard ones (which are highly stipulative; all the processes described in section 2 are motivated purely by the need to attain descriptive adequacy) and in some cases does better, e.g. regarding the crosslinguistic prediction about the relation between the orders in (29a)

and (29c). Our approach also captures the observation that non-finite Vs always form a ‘verbal cluster’ with finite Vs in West Germanic (cf. Evers 1975; den Besten & Edmondson 1983; Prinzhorn 1990), except when they undergo remnant topicalization to the first position (cf. den Besten & Webelhuth 1990). Verbal clusters result either from fronting a non-finite complement to Position 7 or from V-raising out of the complement (note that this is not optionality in the sense of two parametric systems; the same checking operation takes place in each case, and, seemingly, equally economically). One final point: the ten positions of the OE clause carry over straightforwardly to Dutch and German. There are three properties which distinguish both Dutch and German from OE, all of them well known and all of them unaccounted for on this or any other analysis. First, V always precedes clitics in V2 clauses in both Dutch and German but not OE. We illustrate with German; this example should be contrasted with the grammatical OE (15a): (31) *Gott ihnen werkte die Kleider God them made the clothes We take this to indicate that V moves to C obligatorily in topic-initial V2 clauses in Dutch and German, but not in OE (cf. Kiparsky 1995); we take no position on the analysis of SV V2 clauses (cf. Zwart 1993, Schwartz & Vikner 1996). Second, V does not raise to AgrO in embedded clauses in either Dutch or German (but cf. note 3), hence we do not find post-verbal final adverbs or small clauses in these contexts. Compare the following Dutch examples with (22) and (23) respectively: (32) a. * . . . omdat hij ontving de wijnglazen gebroken because he received the wine-glasses broken b. * . . . omdat hij zijn werk deed ijverig because he his work did industriously (van Kemenade 1987: 27) Third, as we have seen, final (light) DPs are not allowed, as the following Dutch example shows: (33) * . . . omdat hij kocht het boek. because he bought the book.
Dutch also differs from both German and OE in not allowing complement clitics to precede the subject. We take this to indicate that the subject must appear in Position 3 (Spec,Agr1P) in Dutch—cf. Cardinaletti & Roberts (2002 [this volume, Chapter 12]). These differences aside, Dutch and German pattern like OE.


4. Changes in ME
This section will outline my account of how word-order change took place in the early ME period. This account can link the word-order change to three other important changes that took place at this time and can be embedded in a general approach to language change. These advantages follow naturally from the assumption of head-initial order in OE. It follows that if Kayne’s theory of phrase structure forces us to assume that OE was head-initial, this theory has advantages in this particular domain over less restrictive theories such as the standard GB one. At the same time as the word-order change, two other important syntactic changes took place: (34) a. Loss of complement clitics (but cf. note 7) b. Loss of scrambling Van Kemenade (1987) argues at length that the object cliticization was dramatically reduced in the twelfth century and completely extinct by 1400. On the other hand, subject clitics are found until c.1400, especially in southern texts. The loss of the subject clitics was connected to the loss of V2 (cf. van Kemenade 1987). However, our main interest here is in complement clitics. It is often observed that English word order was ‘rigidified’ in ME.7,8 Although the word order in ME (and indeed ENE) was freer in various ways than in NE (see note 7), we interpret this observation as meaning that scrambling disappears quite early on, around the time of the word-order change. Another change that takes place in early ME is the loss of the morphological case declensions. OE had a system of case marking on nouns which distinguished four cases and two numbers, and up to seven declension classes (case marking on articles has a slightly different history; see note 9).
Owing at least in part to phonological changes (the reduction of unstressed vowels to [ə] and the loss of final nasals) and in part to standard processes of morphological ‘levelling’, this system was reduced by EME to one where nominative–accusative distinctions were essentially no longer made, and only the dative ([–ǝ]) and genitive singular ([–(ǝ)s]) survived. Of these, the former did not last long, and so we arrive at essentially the modern system. These changes can be illustrated with the following paradigms for stone, which in OE was a representative of the masculine a-stem declension, the one to which all other declensions were apparently levelled: (35) OE:

          sg          pl
nom:      stan        stanas
acc:      stan        stanas
gen:      stanes      stana
dat:      stane       stanum

12c.
nom:      ston        stones
acc:      ston        stones
gen:      stones      stone(s)
dat:      ston(e)     stonen/s

14c.      stoon (sg)  stoon(e)s (pl/gen sg)

For further details on the loss of the OE morphological case system, see Lass (1992: 103–112). The ‘standard’ account of the development of English word order postulates a change in the directionality parameter in (1), taking place in the twelfth century (see van Kemenade 1987 and Lightfoot 1991). It is unclear how to connect this change to the loss of clitics and scrambling, and to the loss of morphological case (although van Kemenade posits a connection between the loss of clitics and the loss of case morphology). The account that I propose here has the merit of connecting the word-order change to the changes in (34). It is also possible, as we shall see, to regard the loss of morphological case as the trigger for all these changes. The account of OE word order sketched in section 3 crucially involves object fronting to Spec,AgrOP. Along with scrambling, this gives rise to many OV orders. Suppose in fact that these two processes are linked. For concreteness, I take scrambling (in West Germanic) to be Ā-movement. That is, scrambling is movement to a non-L-related position; the landing site of scrambling is adjoined to a maximal projection that has no lexical feature to assign to the scrambled element. However, scrambled DPs must check for case; thus, they move through Spec,AgrOP en route to the scrambled position (I assume the same, or the analogous, for scrambled indirect objects and some PPs; the checking mechanism that is relevant here may correspond to the GB notion of inherent case, and so it is no surprise to see it extended to at least some PPs). Movement to Spec,AgrO is required because AgrO has a strong N-feature (although, possibly, focussed DPs are exempt from this requirement, as we have seen). I take the general view that ‘deep’ syntactic changes such as word-order change arise through restructuring of grammars by language acquirers (see in particular Lightfoot 1979, 1991). 
In that case, if we are to understand the word-order change in early ME, we must understand what leads language acquirers to postulate a strong N-feature associated with AgrO. On this point, I assume the following: (36) a. Morphological trigger: if a head H has the relevant L-morphology then H has strong L-features. b. Syntactic trigger: if a well-formed representation can be assigned to a given string by assuming that H has strong L-features, then H has strong L-features. c. In general, weak features are the default value. These are assumed in the absence of clear evidence to the contrary of the type in (a) or (b).
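The way the three clauses of (36) interact can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a formalization offered in the text: the names TriggerExperience and agr_o_n_feature are hypothetical, and the syntactic trigger in (36b) is modelled, as one possible reading, by the share of input strings that can only be parsed with a strong feature, compared against an arbitrary robustness threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerExperience:
    """A toy summary of the input available to the language acquirer."""
    has_relevant_morphology: bool   # (36a): e.g. overt case marking for AgrO
    unambiguous_share: float        # (36b), assumed reading: proportion of
                                    # strings parsable only with a strong feature


# Hypothetical cut-off for "robust" syntactic evidence; the text gives no number.
ROBUSTNESS_THRESHOLD = 0.5


def agr_o_n_feature(exp, threshold=ROBUSTNESS_THRESHOLD):
    """Return 'strong' or 'weak' for the N-feature of AgrO."""
    if exp.has_relevant_morphology:          # (36a) morphological trigger
        return "strong"
    if exp.unambiguous_share >= threshold:   # (36b) syntactic trigger
        return "strong"
    return "weak"                            # (36c) default value


if __name__ == "__main__":
    stages = {
        "morphology present": TriggerExperience(True, 0.2),
        "no morphology, robust OV": TriggerExperience(False, 0.9),
        "no morphology, OV not robust": TriggerExperience(False, 0.2),
    }
    for name, exp in stages.items():
        print(name, "->", agr_o_n_feature(exp))
```

The numerical shares are invented for the illustration. The point is only the ordering of the checks: morphology suffices on its own, the syntactic trigger matters once morphology is absent, and the weak value results when neither is available.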

(36a) states, in general terms, what the morphological trigger of a strong feature is. It is motivated by the attested facts involving the loss of V-to-I movement and the loss of agreement features in sixteenth-century English (see Roberts 1985 [this volume, Chapter 1], 1993a; Pollock 1989; Rohrbacher 1994a). Similarly, (36b) gives a rough statement of the syntactic trigger for a strong feature. The assumption in (36c) derives from the general idea that a preference for maximally simple representations of the input is a property of the learner (cf. Clark & Roberts 1993 [this volume, Chapter 2] for a formalization of this). The simplest representation compatible with the input is chosen, where representations lacking overt movement are defined as simpler than those featuring movement dependencies (arguably because overt movement always creates adjunction structures, while the lack of movement may not, and adjunction structures are more complex than non-adjunction structures). In this sense, we see that weak features give rise to simpler representations than strong features, and so robust positive evidence is required for strong features, while weak features represent the default (or unmarked) value (this conception of markedness is discussed and illustrated at greater length in Roberts 1999 [this volume, Chapter 5]). In the case of the word-order change in early ME, the morphological trigger in the sense of (36a) is provided by nominal morphology, i.e. case marking. As we saw above, the OE morphological case system broke down in the early ME period. Once this happened, there was no morphological trigger for strong N-features on AgrO.
So we see that the loss of case marking in English removed part of the trigger for these features.9 However, the loss of part of the trigger for the strong value of the AgrO parameter does not on its own change the value of that parameter; it simply means that the syntactic trigger became crucial for determining its value. To prevent the parameter defaulting to the weak value in the absence of morphology, OV orders must be robustly attested in the trigger experience, forcing the postulation of representations where AgrO has a strong N-feature. In this respect, the crucial factor was the existence of post-verbal DPs, and a number of other post-verbal constituents (particles and other small-clause predicates) owing to the existence of V-to-AgrO movement and stranding of final DPs by remnant movement of non-finite complements. As long as the morphology provided a clear trigger for a strong value of AgrO, these constructions were assigned representations of the type seen in section 3. Once the morphological trigger for DP-movement to Spec,AgrOP was lost, however, VO and other V-complement orders could be assigned simpler representations not involving DP-movement. In such representations, AgrO has a weak N-feature. This kind of analysis is favoured by the preference for weak feature-values, (36c) (which in turn derives from a general preference for simpler representations wherever possible, as mentioned above). Hence the presence of these orders weakens the syntactic trigger for a strong feature on AgrO by making possible more highly valued representations in which AgrO has a weak feature. Given this possible analysis of VO orders, the weak

value of AgrO is both the default value, and confirmed by part of the trigger experience. Hence there is no robust syntactic trigger for the strong value of the parameter, and the OV orders either die out or are reanalysed.10 Thus the AgrO-parameter changed as the formerly strong N-feature became weak. This entails the loss of the orders in (29b, c, d). As I mentioned above, scrambling must move through Spec,AgrOP in order for the scrambled DP to check its case. Once Spec,AgrOP loses its strong case feature, there is no reason to move there overtly. Hence, by Economy, movement to this position becomes impossible, and scrambling is lost. Of course, wh-movement survives the loss of a strong N-feature on Spec,AgrOP. We must assume that wh-movement can allow its trace to check for case, but that scrambling cannot. The reason for this might be that wh-traces are ‘true variables’ while the traces of scrambling are not, a fact which we can connect to the well-known fact that (Germanic) scrambling does not trigger weak crossover (cf. for example Lee & Santorini 1994, Vikner 1994b). Suppose further, following Sportiche (1998), that clitic placement (in West Germanic) involves DP-movement followed by local D-movement. The only kind of DP-movement that can place D within range of the clitic position without violating the head movement constraint is scrambling, given our analysis of OE clause structure. The loss of scrambling thus implies the loss of special positions for clitics. More precisely, it implies the loss of complement clitics; subject DPs can move within range of the clitic position by A-movement. As we saw above, complement clitics are essentially lost at the same time as the word order changes in early ME. We are thus able to treat the principal cause of the word-order change as the loss of morphological case marking, and connect this change to the loss of scrambling and complement clitics.
Notice that the loss of the latter two operations is guaranteed by central principles of the theory, in particular economy constraints on movement. Linking the word-order change to the loss of morphological case raises two major objections from a comparative Germanic perspective. First, Icelandic is VO and has a rich morphological case system. Second, Dutch is OV and lacks morphological case (by and large). We can account for Dutch in terms of the idea that (36) provides: the morphological trigger for the strong N-feature of AgrO. (36) states that AgrO has a strong feature if the morphology is present. Hence, the lack of morphology implies nothing. In Dutch, the syntactic properties that facilitated the change in AgrO’s feature-value in English are missing: V-to-AgrO movement and final DPs.11 Hence the reanalysis has not taken place, and Dutch retains a strong N-feature on AgrO despite the lack of a morphological trigger. For Icelandic, we must say that V always moves to a higher position than the object: here the issue of object shift comes up again. It is unclear, however, why Icelandic should lack scrambling. Again, though, our account simply says that there is a necessary condition for scrambling (a strong N-feature on AgrO); we have nothing to say about what the sufficient condition might be.


5. Conclusion
I have argued that the facts of OE and ME word order, and the changes relating the two systems, can be accounted for in terms of the idea that OE was a head-initial language. Our argument concerning the synchronic analysis of OE was simply that such an analysis does no worse than many recent analyses which postulate a head-final order. Many descriptive problems remain for both approaches. The real motivation for our approach comes from the treatment of the word-order changes in early Middle English. I argued that the loss of OV orders was caused by the loss of a strong N-feature on AgrO, a development which is related to the loss of morphological case on DPs by (36). In this way, the word-order change in English can be viewed as an instance of a typical kind of change: the loss of an overt movement rule caused by the loss of the morphological trigger for a strong feature of a functional head. The loss of overt movement of inflected verbs in early Modern English was arguably a change of a similar kind (cf. in particular Roberts 1993a). I further suggested that the loss of scrambling and of complement clitics was connected to the change in the value of this feature: once this feature had changed, economy constraints on movement prevented overt scrambling from taking place, and this in turn blocked cliticization of complements. In this way, our account connects four salient changes in early Middle English in terms of the loss of a single abstract feature, and the account of the causation of the change is embedded in a more general theory of language change, from which it follows that strong features may become weak when a morphological trigger for overt movement disappears from the input to language acquisition.12

Notes
* Earlier versions of this material were presented at the University of Venice; University of Geneva; University of York; the Centre National pour la Recherche Scientifique, Paris; the School of Oriental and African Studies, London; the ninth Comparative Germanic Syntax Conference (Harvard University); Georgetown University; and the 3rd Diachronic Generative Syntax Conference (Free University of Amsterdam). I would like to thank the audiences at those presentations, and particularly Anthony Kroch and Giuseppe Longobardi, for their comments and criticisms. Thanks also to the editors of this collection and one anonymous reviewer for comments and suggestions. All errors are my own.
1. This is not quite how Kayne puts it. Kayne proposes the Linear Correspondence Axiom (LCA):

(i) For a given phrase marker P, with T the set of terminals, d(A) is a linear ordering on T

where d(X), for X a non-terminal, is the set of terminals X dominates, and A is the set of pairs of non-terminals such that the first asymmetrically c-commands the second. These notions formalize the relation between asymmetric c-command and linear order. However, as Chomsky (1994) and Rohrbacher (1994b) both point out, it is necessary to stipulate precedence, not just ordering, in order to derive the result that Specifier–Head–Complement is the only possible order within XP. To see this, take the VP in (3). Here d(VP) is {⟨see, him⟩}. So

the LCA requires that the terminals see and him be ordered, but not necessarily that see precede him. Moreover, where d(X) contains more than one ordered pair, say {⟨x, y⟩, ⟨y, z⟩}, nothing in Kayne’s system prevents us from choosing ‘precede’ as the ordering among x and y and ‘follow’ as the ordering among y and z (giving xzy where x asymmetrically c-commands y and y asymmetrically c-commands z). The fact that precedence has to be stipulated makes Kayne’s system less elegant than it might have been.
2. Note that in a theory of the type advocated in Chomsky (1993), which lacks a single point in the derivation that can be defined as a base, it is not even clear that the notion ‘base expansion of X’ can be defined.
3. This conclusion depends on the assumption that there are no scrambling positions to the left of AgrSP. This assumption is somewhat dubious, and has been explicitly denied by Haegeman (1993b) and Sportiche (1996). If these authors are right in positing scrambling positions between C and AgrS, then we could maintain that V raises to AgrS in examples of this type (and more generally in OE and West Germanic). If there is a clear relation between the ‘richness’ of verbal inflection and verb movement to AgrS of the type proposed by Roberts (1985, 1993) and Rohrbacher (1994a), then we are forced to say this anyway for at least some of these languages. However, in that case superficial OV order is the result of obligatory scrambling to positions above AgrS; it is not clear what causes this (although this does not affect the account of the word-order change given in section 4, which postulates a relation between the loss of scrambling and the loss of overt movement to Spec, AgrOP; even if superficial OV derives from obligatory scrambling to positions above AgrS, loss of movement to Spec, AgrOP will nevertheless account for the loss of scrambling).
For the purposes of the present argument, however, we can maintain the slightly simpler assumption that the verb raises to T or AgrO in OE.
4. Alternatively, given the discussion in note 3 of scrambling positions above AgrS, both positions 5 and 6 may be above AgrSP. This would entail a number of modifications to the structure in (28).
5. Put this way, V-raising might be thought to involve right-adjunction, another structural configuration ruled out by the proposals in Kayne (1994). We can avoid this consequence by positing that V left-adjoins to Aux (i.e. the higher V-position), and Aux then raises to AgrO by excorporation. Note that any analysis of West Germanic verb raising must assume excorporation (cf. Roberts 1991 [this volume, Chapter 10]; Rutten 1991). To get the correct orders in clusters of more than two verbs, we have to assume that non-finite verbs also move; in fact, this is exactly the same operation.
6. The reason for verb clustering might be that ‘transparent’ complements must be licensed by an external head. The licensing head would always be the finite verb, and the relationship either a Spec-head one (where the complement moves to Position 7) or head-head one (motivating V-raising in (29b, d)). On this view, VPs can front to the specifier of a position occupied by the finite verb or its trace (e.g. Spec,AgrOP or, in V2 clauses, Spec,CP). A similar constraint holds for ‘restructuring’ in contemporary Romance languages; cf. Kayne (1989), Roberts (1997).
7. Until the sixteenth century, pronoun object shift of the sort found in contemporary Mainland Scandinavian is attested:

(i) They tell vs not the worde of God (1565, T. Stapleton, A Fortress of the Faith (Antwerp 1565); Roberts 1995: 27 [this volume, Chapter 3])

This is a different kind of pronoun movement from the type being considered in the text. It always places the pronoun in a position lower than the subject (i.e. below AgrS) and it is dependent on verb movement to a higher position (cf. Holmberg 1986). On the other hand, clitic movement of the kind found in West Germanic and OE places the pronoun above AgrS (but below C, we have argued) and is independent of V-movement. The historical development of English strongly suggests that the latter system developed into the former. Here we propose an account of how the latter system was lost. There are residual OV orders in ME with full DPs. These are mostly found with infinitives and participles, e.g. (thanks to Najib Jarad, p.c., for (ii-b, c)):

(ii) a. I may no rest haue a-mongys 3ow (MKempe A 122.19–20; Fischer 1992: 373)
b. and prattest hine to slayne and his cun to fordonne
‘and threaten to slay him and destroy his kin’ (CursM 12965; Visser 1963–73: §1039)
c. She did him excite . . . hir story for to write (Lydgate Fall Pr. 9.518; Visser 1963–73: §2279)

These are reminiscent of similar orders found in Old French (cf. Pearce 1990). This construction is too restricted to be considered a case of scrambling. Perhaps it is a variant of Icelandic-style object shift, although to show this we would have to show that ME infinitives move (and note that (iia–c) illustrate the three main kinds of ME infinitive: bare, to and for-to). We have no detailed analysis along these lines to offer here, although it is tempting to consider such a movement as the diachronic residue of OE V-raising—see below.

8. In minimalist terms, this may seem strange, but one could claim that scrambling assigns an interpretive feature rather than a formal, morphosyntactic feature; probably the basis of the distinction between L-related and non-L-related movement/positions can be recast in these terms, with L-related positions being ‘pure’ checking positions and non-L-related ones being associated with an interpretive feature of some kind. On what the interpretive property of scrambled positions might be, cf. Diesing (1992).

9. Here the question which arises is what we should call the ‘relevant’ morphology in terms of (36). One possibility is that it concerns the existence of overt morphological nominative-accusative distinctions. However, only two out of seven noun declensions ever distinguished nominative from accusative in OE, and those only in the singular. A more promising proposal would be to attribute the crucial properties to determiners. Masculine and feminine singular forms of the protodefinite article (a demonstrative at the time—cf. note 13) distinguished Nominative from Accusative in OE (masc se (NOM), þone (ACC), fem sēo (NOM), þā (ACC)) but this distinction dies out around the same time as the word-order becomes VO.
The correlation here seems quite close in that, for example, the Final Continuation of the Peterborough Chronicle (1132–55) shows an invariant þe as the singular definite article (Lass 1992: 112), and is usually thought to be VO (Mitchell 1964, cited in Fischer (1992: 372), shows that this text has 88% VO order). We might take it then, that morphological case-marking on articles, in particular a nominative-accusative distinction, provides the ‘relevant morphology’. This conclusion is consistent with the situation in Modern German, which provides a morphological trigger for a strong N-feature on Agro only if we regard case marking on articles as crucial. On Modern Dutch, see below.

10. As we saw in note 7, residual OV orders survive for some time after the twelfth century. We must assume that these orders neither depend on nor trigger a strong N-feature of Agro, although their analysis remains unclear. In this situation, we can only conclude that twelfth-century acquirers reanalysed OV orders as whatever construction the one seen in note 7 is.

11. Final DPs are found in Middle Dutch (Weerman 1989). Our expectation is then that these orders are lost before morphological case is lost. Weerman shows that Old and Middle High German also had final DPs. It is clear, then, that these properties can be lost independently of the English developments that we are discussing here. An intriguing possibility is suggested by our analysis of (29a). We suggested above that Modern West Germanic varieties with ‘V-projection raising’ and without final DPs lack the possibility of remnant fronting of non-finite clauses. A plausible speculation is that this is due to these complements having a more reduced functional structure than the OE (or OHG and Middle Dutch) counterparts. If non-finite complements develop a more reduced functional structure then there are fewer landing sites for the object on the lower cycle and correspondingly less possibility of remnant complement fronting. Such a development, which is quite independent of anything discussed in the text, may have led to the loss of final DPs in all of Continental West Germanic. If this can be maintained, then we do not have to predict that final DPs were lost before morphological case-marking in Dutch.

12. In the highly restrictive view of parametrization imposed by the Minimalist framework it is difficult to see what other properties might give rise to OV orders. It is then tempting to regard other cases of OV-to-VO change as being caused in the same way. This is a plausible speculation as regards the development from Latin to Romance. Latin was OV with free word order and morphological case; Modern Romance languages are all VO, have rigid word order and—with the possible exception of Rumanian—have no morphological case (outside the pronominal system). Taking Latin ‘free word order’ as indicative of scrambling (whether of a type precisely like that found in West Germanic or closer to what is found in Russian, Hindi or Japanese remains to be seen), and the Romance ‘rigid’ word order to indicate the absence of scrambling, it seems likely that an account of the sort given above would carry over.
The obvious anomaly concerns clitics/weak pronouns: why has Romance retained such elements while English has lost them? Although I cannot give a full answer here, I speculate that this is connected to the fact that Romance clitics are essentially V-related elements that license pro (cf. Rizzi 1993 on the former and Sportiche 1996 on the latter), rather than being Wackernagel elements like their Germanic (and Late Latin) counterparts. The changes described in the text eliminate Wackernagel pronouns (or ‘C-oriented’ clitics in Rivero’s terminology—cf. the discussion of this notion in section 3.2); they are not necessarily incompatible with clitics with the particular properties that the Modern Romance ones have. On this view, when the movement source for clitics described in the text was lost Romance pronouns changed status. The question now becomes: why did this change not happen in English? One possible answer is that it did: I mentioned in note 8 that Middle and early Modern English show pronoun object shift of the sort found in Mainland Scandinavian languages today—see Roberts (1995 [this volume, Chapter 3]) for an analysis that is largely, but not entirely, compatible with the analysis of word-order change given here. However, English/Scandinavian pronoun object-shift is radically different from Romance cliticization, essentially in that the English and Scandinavian pronouns do not seem to be V-related. A more intriguing possibility is that English lacked a sufficiently rich agreement system to allow a new class of licensers of pro. The question for this approach is why the clitics themselves did not become an agreement system; to attempt an answer to this would take us too far afield here. Giuseppe Longobardi (p.c.) points out another possible consequence of the account of word-order change given here. Suppose, as is often suggested, that scrambling is a way of marking the specificity of a DP.
Then the loss of scrambling led to the loss of this mode of marking specificity, and may thereby have contributed to the development of the article system of Middle and Modern English, absent in Old English (sē, sēo, þæt was a demonstrative at this time). More generally, OV languages with rich morphological case and scrambling tend to lack article systems: cf. once again Latin. Here we see how changes in clausal functional categories can interact with developments inside DPs, another very rich area that we cannot begin to investigate adequately here.

References Adams, M. 1987. Old French, Null Subjects and Verb-Second Phenomena. PhD dissertation, UCLA. Bean, M. 1983. The Development of Word Order Patterns in Old English. London: Croom Helm. Belletti, A. 1990. Generalized Verb Movement: Aspects of Verb Syntax. Turin: Rosenberg & Sellier. Bennis, H. & T. Hoekstra. 1986. Gaps and Parasitic Gaps. The Linguistic Review 4: 29–87. Den Besten, H. 1983. On the Interaction of Root Transformations and Lexical Deletive Rules. In W. Abraham (ed) On the Formal Syntax of the Westgermania. Amsterdam: John Benjamins. Den Besten, H. & J. Edmonson. 1983. The Verbal Complex in Continental West Germanic. In W. Abraham (ed) On the Formal Syntax of the Westgermania. Amsterdam: John Benjamins. Den Besten, H. & G. Webelhuth. 1990. Stranding. In G. Grewendorf & W. Sternefeld (eds) Scrambling and Barriers. Amsterdam: John Benjamins. Canale, M. 1978. Word Order Change in Old English: Base reanalysis in generative grammar. PhD dissertation, McGill University. Cardinaletti, A. 1994. On the Internal Structure of Pronominal DPs. Linguistic Review 11: 195–219. Cardinaletti, Anna, and Ian Roberts. 2002. Clause structure and X-second. In Functional Structure in DP and IP: The Cartography of Syntactic Structure Volume One ed. G. Cinque. New York/Oxford: Oxford University Press, pp. 123–166 [this volume, Chapter 12]. Chomsky, N. 1977. Essays on Form and Interpretation. Amsterdam: Elsevier. Chomsky, N. 1993. A Minimalist Program for Linguistic Theory. In K. Hale & S.J. Keyser (eds) The View from Building 20. Cambridge, MA: MIT Press, pp. 1–52. Chomsky, N. 1994. Bare Phrase Structure. In G. Webelhuth (ed) Government and Binding Theory and the Minimalist Program. Oxford: Blackwell, pp. 383–440. Clark, R. & I. Roberts. 1993. A Computational Model of Language Learnability and Language Change. Linguistic Inquiry 24:299–345 [this volume, Chapter 2]. Diesing, M. 1992. Indefinites. Cambridge, MA: MIT Press. Denison, D. 1993. English Historical Syntax. 
London: Longman. Deprez, V. 1994. Parameters of Object Movement. In N. Corver & H. van Riemsdijk (eds) Scrambling. Berlin: Mouton de Gruyter, pp. 101–152. Evers, A. 1975. The Transformational Cycle in Dutch and German. PhD dissertation, University of Utrecht. Distributed by Indiana University Linguistics Club. Fanselow, G. 1990. Scrambling as NP-Movement. In G. Grewendorf & W. Sternefeld (eds) Scrambling and Barriers. Amsterdam: John Benjamins. Fischer, O. 1992. Syntax. In N. Blake (ed) The Cambridge History of the English Language, Volume II: 1066–1476. Cambridge: Cambridge University Press, pp. 207–409.

136 Ian Roberts Greenberg, J. 1963. Some Universals of Grammar with Particular Reference to the Order of Meaningful Elements. In J. Greenberg (ed) Universals of Language. Cambridge, MA: MIT Press. Hawkins, J. 1983. Explaining Language Universals. London: Croom Helm. Haeberli, E. & L. Haegeman. 1995. Clause Structure in Old English: Evidence from Negative Concord. Journal of Linguistics 31:81–108. Haegeman, L. 1992. Generative Syntax: Theory and Description. A Case Study in West Flemish. Cambridge: Cambridge University Press. Haegeman, L. 1993a. Object Clitics in West Flemish. Geneva Generative Papers, 2. Haegeman, L. 1993b. Some Speculations on Argument Shift, Clitics and Crossing in West Flemish. In: W. Abraham and J. Bayer (eds) Dialektsyntax. Linguistische Berichte Sonderheft, vol 5. VS Verlag für Sozialwissenschaften, Wiesbaden. Haegeman, L. & H. van Riemsdijk. 1986. Verb Projection Raising, Scope and the Typology of Rules Affecting Verbs. Linguistic Inquiry 17:417–466. Holmberg, A. 1986. Word Order and Syntactic Features in Scandinavian Languages and English. PhD Dissertation, University of Stockholm. Huang, C.-T.J. 1993. Reconstruction and the Structure of VP: Some Theoretical Consequences. Linguistic Inquiry 24:103–138. Kayne, R. 1989. Null Subjects and Clitic Climbing. In O. Jaeggli & K. Safir (eds) The Null Subject Parameter. Dordrecht: Kluwer, pp. 239–261. Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge MA: MIT Press. Van Kemenade, A. 1987. Syntactic Case and Morphological Case in the History of English. Dordrecht: Foris. Kiparsky, P. 1995. The Indo-European Origins of Germanic Syntax. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. New York/Oxford: Oxford University Press, pp. 140–170. Koster, J. 1975. Dutch as an SOV language. Linguistic Analysis, 1. Lass, R. 1992. Phonology and Morphology. In N. Blake (ed) The Cambridge History of the English Language, Volume II: 1066–1476. Cambridge: Cambridge University Press, pp. 23–155. 
Lee Y.-S. & B. Santorini. 1994. Towards Resolving Webelhuth’s Paradox: Evidence from German and Korean. In N. Corver & H. van Riemsdijk (eds) Scrambling. Berlin: Mouton de Gruyter, pp. 257–300. Lightfoot, D. 1979. Principles of Diachronic Syntax. Cambridge: Cambridge University Press. Lightfoot, D. 1991. How to Set Parameters: Arguments from Language Change. Cambridge MA: MIT Press. Mitchell, B. 1964. Syntax and Word Order in the Peterborough Chronicle 1122-54. Neuphilologische Mitteilungen 65:113–144. Mitchell, B. 1989. A Guide to Old English. Oxford: Clarendon Press. Pearce, E. 1990. Parameters in Old French Syntax. Dordrecht: Kluwer. Pintzuk, S. 1991. Phrase Structures in Competition: Variation and Change in Old English Word Order. PhD dissertation, University of Pennsylvania. Pintzuk, S. & A. Kroch. 1989. The Rightward Movement of Complements and Adjuncts in the Old English of Beowulf. Language Variation and Change 1:115–143. Platzack, C. 1995. The Loss of Verb Second in French and English. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. New York/Oxford: Oxford University Press, pp. 200–226. Pollock, J.-Y. 1989. Verb Movement, UG and the Structure of IP, Linguistic Inquiry 20:365–424. Prinzhorn, M. 1990. Head Movement and Scrambling Domains. In G. Grewendorf & W. Sternefeld (eds) Scrambling and Barriers. Amsterdam: John Benjamins.

Directionality and Word Order Change 137 Rivero, M.-L. 1994. On Two Locations for Complement Clitic Pronouns: SerboCroatian, Bulgarian and Old Spanish. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 170–206. Rizzi, L. 1982. Issues in Italian Syntax. Dordrecht: Foris. Rizzi, L. 1991. Residual Verb Second and the wh-Criterion. University of Geneva Technical Reports in Formal and Computational Linguistics 2. Rizzi, L. 1993. Talk Given at EUROTYP Clitics Workshop, University of Durham. Roberts, Ian, 1985. Agreement Parameters and the Development of English Modal Auxiliaries. Natural Language and Linguistic Theory 3:21–58 [this volume, Chapter 1]. Roberts, I. 1991. Excorporation and Minimality. Linguistic Inquiry, 22:209–218 [this volume, chapter 10]. Roberts, Ian. 1993. Verbs and Diachronic Syntax. Dordrecht: Kluwer. Roberts, I. 1995. Object Movement and Verb Movement in Early Modern English. In H. Haider, S. Olsen & S. Vikner (eds) Studies in Comparative Germanic Syntax. Dordrecht: Kluwer, pp. 269–284. [This volume, Chapter 3]. Roberts, I. 1997. Restructuring, Head Movement and Locality. Linguistic Inquiry 28:423–460. Roberts, I. 1999. Verb Movement and Markedness. In Michel deGraff (ed) Language Creation and Language Change. Cambridge, MA: MIT Press, 287–328. [this volume, Chapter 5]. Rohrbacher, B. 1994a. The Germanic Languages and the Full Paradigm: a Theory of V-to-I Raising. PhD dissertation, University of Massachusetts, Amherst. Rohrbacher, B. 1994b. Notes on the Antisymmetry of Syntax. Ms., University of Massachusetts, Amherst. Ross, J. 1967. Constraints on Variables in Syntax. PhD dissertation, MIT. Rutten, J. 1991. Infinitival Complements and Auxiliaries. Amsterdam Studies in Generative Grammar 4. Santorini, B. 1992. Variation and Change in Yiddish Subordinate Clause Word Order. Natural Language and Linguistic Theory 10:595–640. Schwartz, Bonnie, and Sten Vikner. 1996. 
The verb always leaves IP in V2 clauses. In A. Belletti & L. Rizzi (eds) Parameters and Functional Heads: Essays in Comparative Syntax. New York/Oxford: Oxford University Press, pp. 11–62. Sportiche, D. 1999. Movement, Case and Agreement. In. D. Sportiche Partitions and Atoms of Clause Structure. London: Routledge, pp. 88–243. Sportiche, D. 1996, ‘Clitic Constructions’, in J. Rooryck & L. Zaring (eds) Phrase Structure and the Lexicon. Dordrecht: Kluwer, pp. 213–276. Stockwell, R. 1977. Motivations for exbraciation in Old English. In C. Li (ed) Mechanisms of Syntactic Change. Austin: University of Texas Press. Tomaselli, A. 1995. Cases of V3 in Old High German. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. New York/Oxford: Oxford University Press, pp. 345–369. Stowell, T. 1981. Origins of Phrase Structure. PhD dissertation, MIT. Traugott, E. 1992. Syntax. In R. Hogg (ed) The Cambridge History of the English Language, Volume I: The Origins to 1066. Cambridge: Cambridge University Press. Vikner, S. 1994. Scandinavian Object Shift and West Germanic Scrambling. In N. Corver & H. van Riemsdijk (eds) Scrambling. Berlin: Mouton de Gruyter, pp. 487–517. Vikner, S. 1995. Verb Movement and Expletive Subjects in the Germanic Languages. Oxford University Press, Oxford/New York.

Visser, F.T. 1963–73. An Historical Syntax of the English Language. Part I: Syntactic Units with One Verb. Leiden: Brill. Webelhuth, G. 1991. Syntactic Saturation Phenomena and the Modern Germanic Languages. New York/Oxford: Oxford University Press. Weerman, F. 1989. The V2 Conspiracy. Dordrecht: Foris. Zwart, J.-W. 1993. Dutch Syntax: A Minimalist Approach. Groningen: Groningen Dissertations in Linguistics 40.

5

Verb Movement and Markedness Ian Roberts

This chapter can be thought of as an exercise in the application of the theory of parameters to a set of data. I will discuss a fairly simple, well-known, and well-understood parameter and show how evidence from language change, language acquisition, and creolization supports the idea that there is an unmarked value of this parameter: the “weak” value in terms of Chomsky’s (1993) proposals. This leads to the contention that the weak value of a parameter is always the unmarked value. Drawing on work by Clark and Roberts (1993 [this volume, Chapter 2]), we will see that this conclusion is underpinned by the theory of learnability. The parameter in question is the one that governs verb movement to I.1 In section 5.1, I present the basic data that motivates the postulation of such a parameter. In section 5.2, I describe the evidence that the value of this parameter changed in sixteenth-century English. Section 5.3 offers an interpretation of the recently discovered “root-infinitive” stage of the acquisition of English and many other languages that amounts to proposing that root infinitives result from the verb-movement parameter being initially set to the unmarked value. In section 5.4, I argue, concentrating on Haitian Creole, that French-based creoles have the default value of this parameter, perhaps because creoles generally have default parameter values (see Bickerton 1984), an idea that I attempt to substantiate.

5.1 The Verb-Movement Parameter

It was originally argued by Emonds (1978) that French has a rule moving finite verbs out of VP, whereas English does not. The basic form of the observation is as follows: there is a class of elements X that can be plausibly regarded as positioned on the left edge of VP. These elements include VP adverbs, clausal negation, and floating quantifiers. In French, finite main verbs must precede X, but English main verbs always follow X. The relevant paradigms are as follows:

(1) Adverb
a. Jean embrasse souvent Marie.
*Jean souvent embrasse Marie.
b. *John kisses often Mary.
John often kisses Mary.

(2) Negation
a. Jean (ne) mange pas de chocolat.
*Jean (ne) pas mange de chocolat.
b. *John eats not chocolate.
John does not eat chocolate.

(3) Floating quantifiers
a. Les enfants mangent tous le chocolat.
*Les enfants tous mangent le chocolat.
b. *The children eat all chocolate.
The children all eat chocolate.

The evidence clearly shows that finite verbs are in different positions in the two languages. The alternative is to suggest that the X-elements differ between the two languages. This has been suggested by Williams (1994); however, Williams’s approach does not encompass what I take to be the very important interactions with inversion, which I turn to directly. Standard assumptions about inversion, deriving from the seminal work of den Besten (1983), treat this operation as involving movement of I to C. Given the Head Movement Constraint (Travis 1984; Baker 1988), V cannot move directly to C, and so inversion of main verbs depends on the prior operation of V-to-I movement to feed it. Thus we find that French main verbs are able to undergo inversion (subject to the independent restriction that the subject be a clitic; see Rizzi and Roberts 1989 [this volume, Chapter 9]), whereas English main verbs are unable to do so:

(4) a. Voit-il le cheval?
b. *Sees he the horse?

The contrast in (4) is further evidence that French main verbs move to I, whereas their English counterparts do not.2

Pollock (1989) developed Emonds’s original proposal in terms of principles-and-parameters theory (henceforth P&P). His proposal was that English I is theta opaque, in that it does not permit a category it contains to assign theta roles. Hence, a verb with argument structure could not move there without violating the Theta Criterion. A verb that lacks argument structure could move there, however.
On the assumption that auxiliaries do not assign theta roles, this derives the fact that the auxiliaries in English can undergo verb movement (cf. Emonds’s rule of have/be raising):

(5) a. John has often kissed Mary.
b. John has not kissed Mary.
c. The kids have all eaten the chocolate.
d. Has John seen Mary?

French I, on the other hand, is theta transparent, and so main verbs are able to move there without violating the Theta Criterion (and are required to move there because Tense must create a bound variable by LF; this requirement is met in English by movement of an abstract auxiliary when no overt auxiliary is present). Pollock notes that nonfinite I is theta opaque in French, and hence we find the same split between auxiliaries and main verbs as in English finite clauses:3

(6) a. N’avoir/*posséder pas de voiture en banlieue crée des problèmes.
to-have to-possess not of car in suburbs creates some problems
‘To have/possess not a car in the suburbs creates problems.’
b. N’être/*sembler pas heureux est une condition pour écrire des romans.
to-be to-seem not happy is a condition for to-write some novels
‘To be/seem not happy is a condition for writing novels.’

For Pollock, then, the parameter governing V-movement is the theta opacity or transparency of I. Pollock notes that it is likely that the value of this parameter in a given language is connected to the “richness” of agreement morphology because French agreement morphology is somewhat richer than that of English. Moreover, Pollock notes, following Roberts (1985 [this volume, Chapter 1]), that earlier stages of English had richer agreement and V-to-I movement (i.e., a theta transparent I). I discuss the diachronic evidence and the relation between V-to-I movement and the agreement system in more detail in section 5.2.

Chomsky (1993) proposes that the relevant parameter concerns the value of an abstract morphological feature that licenses verbs, and is associated with I. This feature is called I’s V-feature. In Chomsky’s system, such features are generated both on V and on I and must be canceled out by a checking operation prior to LF because they have no semantic content and will thus violate the Principle of Full Interpretation unless eliminated.
The feature varies parametrically as either strong or weak. If it is strong, it is visible to the PF component and hence must be eliminated prior to the mapping to that level of representation, Spell-Out. Because feature checking takes place in a highly local domain, V must move to I in order for feature checking to take place. Thus where the V-feature is strong, V raises overtly to I. Where the feature is weak, the Procrastinate principle, which delays movement to the covert, post-Spell-Out part of the grammar wherever possible, prevents this movement from taking place overtly. In these terms, then, French I has a strong V-feature and English I has a weak V-feature.4 Again, this account is compatible with the observation that the value of the parameter is connected to the richness of agreement morphology: relatively “rich” agreement morphology, in some sense, gives rise to a strong V-feature, whereas relatively “poor” agreement morphology is associated with a weak feature. I elaborate this idea in the following discussion.
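The strong/weak contrast just described can be pictured with a toy linearization. The sketch below is purely illustrative: the function name `linearize`, the flat word-list representation, and the treatment of the VP-edge element X as a single word are assumptions made for this example and are not part of Chomsky's or Pollock's formalism. It places a finite main verb to the left of X when I's V-feature is strong (overt raising) and to the right of X when the feature is weak (raising delayed by Procrastinate).

```python
from typing import List


def linearize(subject: List[str], verb: str, x: str, obj: List[str],
              v_feature: str) -> List[str]:
    """Surface order for a simple clause of the shape [Subj I [VP X V Obj]].

    v_feature is the value of I's V-feature: "strong" or "weak".
    All names here are hypothetical, chosen for illustration only.
    """
    if v_feature == "strong":
        # A strong feature must be checked before Spell-Out, so V raises
        # overtly to I and is pronounced to the left of the VP-edge element X.
        return subject + [verb, x] + obj
    if v_feature == "weak":
        # Procrastinate delays raising to the covert component, so V is
        # pronounced inside VP, to the right of X.
        return subject + [x, verb] + obj
    raise ValueError(f"unknown feature value: {v_feature!r}")


# Example (1): French I is strong, English I is weak.
french = linearize(["Jean"], "embrasse", "souvent", ["Marie"], "strong")
english = linearize(["John"], "kisses", "often", ["Mary"], "weak")
print(" ".join(french))
print(" ".join(english))
```

Under these assumptions the two calls reproduce the grammatical orders in (1a) and (1b); swapping the feature values yields the starred orders.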


5.2 The Loss of Verb Movement in English

It is well known that English has historically lost V-to-I movement (see Roberts 1985 [this volume, Chapter 1], 1993a; Kroch 1989; Pollock 1989). The historical evidence from English prior to roughly 1600 shows that, at this period, English verbs patterned like French verbs with respect to movement to I.

(7) a. if I gave not this accompt to you
‘if I gave not (= didn’t give) this account to you’
(1557: J. Cheke, Letter to Hoby, in Görlach 1991, 223)
b. How cam’st thou hither?
‘How camest thou (did you come) here?’
(1594: Shakespeare, Richard III)
c. The Turks . . . made anone ready a grete ordonnaunce.
‘The Turks . . . made soon ready (=soon prepared) a great ordnance.’
(c1482: Kaye, The Delectable Newsse of the Glorious Victorye of the Rhodyans)
d. In doleful wise they ended both their days.
‘In doleful way they ended both (= both ended) their days.’
(1589: Marlowe, The Jew of Malta, III, iii, 21)

These examples, and many like them, show that I had a strong V-feature at this period. According to most accounts (Kroch 1989; Lightfoot 1991; Roberts 1993a), verb movement of this type began to decline in the latter part of the sixteenth century and was lost from the colloquial language in the seventeenth century, although it remained in the literary language throughout the seventeenth century and perhaps slightly longer (see Jespersen 1909–49, VI, 502). Kroch (1989), reanalyzing the quantitative data of Ellegård (1953), shows that the crucial turning point was 1575. Beginning with Roberts (1985 [this volume, Chapter 1]), it has been argued that the loss of parts of the verbal conjugation in English is related to this change. There is little doubt that, as part of the general loss of morphology that took place during the Middle English period, a large number of verbal endings had disappeared by the end of the fifteenth century.
In particular, Gray (1985, 495ff) gives the following paradigms for London-area English circa 1400 and circa 1500:

(8) 1400        1500
    cast(e)     cast
    cast-est    cast-est
    cast-eth    cast-eth
    caste(n)    cast(e)
    caste(n)    cast(e)
    caste(n)    cast(e)

Presumably these changes are caused by phonological erosion of final nasals and of unstressed vowels. In particular, in the sixteenth century there are only very few attested survivals of any plural ending (see Roberts 1993a, 257ff). It is therefore plausible to propose that the presence of agreement morphology, particularly plural endings, played a very important role in triggering the strong V-feature.

Evidence from the Scandinavian languages supports the idea that the “richness” of agreement is related to the strength of I’s V-feature. Platzack (1987) establishes a diachronic correlation between the loss of agreement inflection and the loss of V-to-I movement in subordinate clauses in Swedish.5 A similar diachronic pattern can be shown for Danish also (see Roberts 1993a, 264). Moreover, Icelandic has retained a relatively “rich” agreement inflection (similar to that found in Old Swedish and Old Danish) and has retained V-to-I movement. Similarly, evidence from Scandinavian dialects supports this idea. On the one hand, Älvsdalsmålet, a dialect of Swedish spoken in Dalecarlia, Central Sweden, has plural agreement marking and verb movement in subordinate clauses (see Platzack 1987; Platzack and Holmberg 1989). On the other hand, the Norwegian dialect of Hallingdalen effectively lacks plural agreement endings and lacks verb movement in subordinate clauses (Trosterud 1989).

The crosslinguistic evidence from English and North Germanic clearly establishes a link between agreement marking and verb movement. Roberts (1993a) formulated this link in terms of number agreement, suggesting that overt, distinct marking of number agreement is the relevant kind of “richness.” Rohrbacher (1994) points out that Roberts’s proposal makes the wrong predictions for Faroese, which lacks verb movement but has overt, distinct plural marking.
Instead, Rohrbacher proposes that distinct first- and second-person markings in at least one number of one tense of the regular verbs is what is required; this generalization covers Faroese (which, unlike Icelandic, lacks such marking) and the other languages.6

If we could establish a simple and general biconditional relation between agreement morphology of a certain kind and a strong V-feature in I, we could simply attribute the loss of verb movement in English to the effects of Procrastinate because the loss of morphology would entail the change in the value of the feature from strong to weak and, hence, the loss of verb movement. However, it is clear that such a simple biconditional relation cannot be maintained. First, there is the case of V-to-I movement in infinitives—lacking in French (for main verbs) but obligatory in Italian. Belletti (1990) illustrates this with the following kind of contrast between French and Italian (Belletti argues that pas and più are in the same position):

(9) a. *Ne lire pas le livre.
neg read neg the book
b. Ne pas lire le livre.
neg neg read the book
‘To not read the book.’
c. Non leggere più il libro.
neg read more the book
‘To no longer read the book.’
d. *Non più leggere il libro.
neg more read the book

Neither language has any agreement morphology in infinitives (and infinitives themselves are morphologically marked in broadly the same way). Second, there are certainly languages with V-movement that lack the relevant morphology. Depending on other considerations (see note 5), Dutch and Afrikaans might be such cases. A clearer case is the English of Northern England and Scotland of the fourteenth to sixteenth centuries. In this language, the same ending (usually spelled -is) appeared in all persons in the present tense and there is clearly verb movement (the ending -it in (10b) is the past tense).

(10) a. Consideris thou this?
‘Considerest thou (=do you consider) this?’
(1515: Douglas, Aeneid, 1. 208; Görlach 1991, 208)
b. For then they observit not Flowing nor eschewit not Ryming in termes.
‘For then they observed not (= didn’t observe) flowing nor eschewed not (= didn’t eschew) rhyming in terms’
(1584: James VI, The Essays of a Prentice, Preface to the Reader; Görlach 1991, 309)

Third, the Kronoby dialect of Swedish combines an absence of agreement morphology with verb movement (Platzack and Holmberg 1989). Finally, in both Danish and English there was a considerable time lag between the loss of agreement morphology and the loss of V-to-I movement: Danish had lost its verbal morphology by 1400 (Karker 1974, 25), but V-to-I in subordinate clauses survived until at least the seventeenth century (Falk and Torp 1900, 302). In Standard English there was a gap of at least 75 years between the loss of relevant verbal morphology (c1500) and the loss of verb movement (c1575). These reasons lead me to formulate the relation between agreement morphology and the presence of a strong V-feature on I as a one-way implication: (11) If there is verbal agreement marking of the relevant type, then I has a strong V-feature. The statement in (11) implies that the loss of verbal agreement marking on its own is not sufficient for the change in the value of I’s V-feature, and

hence, such a change on its own has no effect on V-to-I movement. Thus the statement is consistent with the existence of languages that have verb movement but lack the “relevant” type of inflectional morphology. However, (11) says that the loss of this morphology is a necessary condition for the loss of V-to-I movement: a language that has this morphology has V-to-I movement, and this morphology must be lost if V-to-I movement is lost. Clark and Roberts (1993 [this volume, Chapter 2]) argue that the parameter-setting algorithm contains an elegance condition, which, all other things being equal, favors those parameter settings generating relatively simple representations over those generating relatively more complex ones. In Clark and Roberts (in progress), we further argue that the preference for greater simplicity acts at a higher level, causing the form of parameters to be as simple as possible—that is, binary feature values associated with underspecified categories (see section 5.4 on the view that functional categories are inherently underspecified). For the sake of the present exposition, I take simplicity to be a function of movement relations. Thus a structure involving movement is more complex for the learner than a structure not involving movement. This idea forms the basis of a theory of markedness of parameter values that holds that strong feature values are marked because they inevitably give rise to movement dependencies.7 In these terms, (11) (properly formulated, following Roberts, Rohrbacher, or some other characterization of “relevant morphological marking”) states the morphological trigger for a marked parameter setting. Hence the loss of the relevant morphology implies the loss of that morphological trigger. However, this does not in itself imply a change in the parameter value. Many marked parameter values are triggered by word order alone—call these syntactic triggers.
For V-to-I movement to be lost, it must be that this trigger somehow became inoperative, leading the learner to select the unmarked parameter value (i.e., the weak V-feature). We can phrase this in terms of the notion of P-expression introduced by Clark and Roberts (1993 [this volume, Chapter 2]): a sentence S expresses a parameter P just in case a grammar must have P set to some definite value in order to assign a well-formed representation to S. Given a theory of markedness, only marked values of parameters need to be expressed; unmarked values will be triggered by default if they are not expressed. I regard a marked parameter value as morphologically expressed when a condition like (11) is satisfied; its syntactic expression depends on word order unambiguously expressing the marked value. In the case of V-to-I movement, it is plausible that the most salient syntactic expression of the marked parameter value was movement of I containing V to C in inversion and the order V+I not in negation. A notable characteristic of sixteenth-century English is the emergence of do as a dummy auxiliary. In this period, do was freely available as a semantically empty carrier of tense and agreement marking, leaving the main verb inside VP and in an unmarked form in all contexts, including positive declaratives (where in standard Modern English it is, of course, ungrammatical;

Ellegård 1953; Visser 1963–73; Denison 1985). Roberts (1993a) argues that do developed from a homophonous causative and raising verb by essentially a process of grammaticization, the same process that created a syntactically identifiable class of modal auxiliaries (see also Lightfoot 1979 on the latter). The most important step in the process of grammaticization was signaled by the loss of nonfinite forms of do and modals. Sixteenth-century do was like Modern Standard English auxiliary do in that it could not appear in nonfinite contexts (see Visser 1963–73, section 419). The following is one of the last examples of do in a non-finite context:
(12) Now if I would then doe . . . tel hym. ‘Now if I would then do . . . tell him.’ (1534: St Thomas More, Works (1557) 1192, F4; Visser, ibid)
It may be significant that Thomas More is one of the last authors (of standard English) to use sequences of modals (see Lightfoot 1979, section 2.1). I take it that the inability of do to appear in a nonfinite context, combined with its lack of lexical content, meant that it was analyzed as an I-element, not as a V that raises to I (the same can be said of the modals; see Roberts 1985 [this volume, Chapter 1]). Because they were base-generated in I, these elements created an indeterminacy in the trigger for the V-to-I parameter: in contexts where these elements were present, the trigger for this parameter was simply not present (in the terms of Clark and Roberts (1993 [this volume, Chapter 2]), this parameter value was not expressed). According to Ellegård’s (1953) figures (reproduced in Kroch 1989 and Roberts 1985 [this volume, Chapter 1], 1993a), the incidence of do in questions averaged approximately 60% in 1575 and approximately 65% in 1600. In negatives, the figures were approximately 25% and 35%, respectively.
These figures, particularly those for questions, strongly suggest that the development of the auxiliary system, in particular the availability of dummy do, undermined the syntactic expression of the marked parameter value and hence led learners to default to the unmarked value. In this way, the V-feature of I was set as weak, and the Procrastinate principle blocked V-to-I movement. To summarize, three factors led to the parameter change. First, the loss of agreement morphology, given (11), removed the morphological trigger for the strong feature. Second, a number of constructions in the input, essentially those involving auxiliaries, were compatible with grammars lacking verb movement. Neither of these developments in itself guarantees the loss of V-to-I movement. The crucial third factor is the sensitivity of the learner to complexity, embodied in a markedness theory claiming that strong feature values, because they create more complex representations involving overt movement, are dispreferred relative to weak features. The situation for learners of English in the sixteenth century was one where the morphological and syntactic evidence for verb movement was no longer categorical.

Hence, given the general drive to minimize complexity, the system that did not feature the option of verb movement was preferred. Therefore, the parametric value of I changed: the former strong V-feature became weak.
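The logic of this three-factor account can be illustrated with a deliberately simplified sketch. It is not the parameter-setting algorithm of Clark and Roberts (1993); the class `TriggerExperience`, its two Boolean fields, and the function `set_v_feature` are hypothetical names introduced only for illustration, and treating the evidence as all-or-nothing is a simplifying assumption (the text describes evidence that is "no longer categorical", not simply absent).

```python
from dataclasses import dataclass


@dataclass
class TriggerExperience:
    """Hypothetical summary of what the learner's input contains."""
    relevant_agreement: bool      # morphological trigger, as in (11)
    unambiguous_v_to_i: bool      # word order that only a V-to-I grammar generates


def set_v_feature(exp: TriggerExperience) -> str:
    """Return the value of I's V-feature under a markedness-based default.

    The marked value ('strong') is chosen only when it is expressed,
    morphologically or syntactically; otherwise the learner falls back
    on the unmarked value ('weak').
    """
    if exp.relevant_agreement:        # (11): agreement marking implies strong
        return "strong"
    if exp.unambiguous_v_to_i:        # syntactic trigger
        return "strong"
    return "weak"                     # unmarked default: no movement


# Agreement lost but verb movement still robustly expressed (the Kronoby type).
kronoby_like = TriggerExperience(relevant_agreement=False, unambiguous_v_to_i=True)
# Agreement lost and dummy do obscuring the word-order evidence.
late_16c_english_like = TriggerExperience(relevant_agreement=False, unambiguous_v_to_i=False)

print(set_v_feature(kronoby_like))
print(set_v_feature(late_16c_english_like))
```

On these assumptions the loss of agreement alone leaves the strong value in place as long as the syntactic trigger survives, which is the sense in which (11) is a one-way implication; only when both triggers fail does the default take over.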

5.3 Root Infinitives and Parameter Setting

In this section, I briefly consider one way that the proposals made in the previous section about the variation and fixation of verb-movement parameters might interact with language acquisition. I have proposed that the setting of verb-movement parameters consists in locating the strong feature values of functional heads. Such features can be triggered either morphologically or syntactically: morphologically by the presence of relevant inflectional paradigms, and syntactically by the presence of clear evidence of movement of the relevant kind. Here, I present a case study of one aspect of developmental syntax that has received a good deal of attention in the recent literature: the so-called root infinitives (Rizzi 1994; Wexler 1994). I will try to show that this phenomenon is a reflex of the learner’s propensity for weak features; this propensity manifests itself in the production of root infinitives under certain highly specific circumstances. Wexler (1994) shows that acquirers of a range of languages, whose V-movement properties are well understood, go through a developmental phase (between roughly 20 and 30 months) where they show evidence of having acquired the relevant kinds of V-movement, but that, alongside adult-like instances of V-movement, these acquirers produce so-called root infinitives. The root infinitives are unmoved verbs in a form homophonous with the infinitive. The evidence, which I will now review, comes from a range of Germanic and Romance languages. Pierce (1989, 1992) shows that children acquiring French produce adult-like sentences such as (13a) with inflected verbs and V-to-I movement, alongside sentences where verb movement does not occur and the verb has a form homophonous with the infinitive, such as (13b):
(13) a. Patsy est pas là-bas. Patsy is not over-there
b. Pas manger la soupe. not eat-inf the soup
Example (13b), like (13a), is a declarative.
Moreover, although the -er infinitive ending for this conjugation is phonetically identical to both the second-plural present and the past participle (all are /-e/), Pierce shows that the form here is an infinitive on the basis of examples involving verbs from other conjugations where these forms are distinct (e.g., voir ‘see’ (infinitive) vs. vu (past participle) vs. voyez (second-plural present)). The correlation between the use of the infinitive and the absence of V-to-I movement is clear.

In German, the pattern is finite verb in second position, (14a), versus apparent infinitive in final position, (14b) (all the utterances are root clauses).
(14) a. Mein Hubsaube hat Tiere din. my helicopter had animals in-it
b. Ich der Fos hab’n. I the frog have
Weverink (1990) shows that Dutch children go through a comparable stage:
(15) a. Ik pak’t op. I pack-it up
b. Pappa schoenen wassen. daddy shoes wash
Plunkett and Stromqvist (1990) show that in Swedish child language, finite verbs are used when the verb precedes negation, as in the adult language (16a), and apparently non-finite forms are used when the verb follows negation (16b) (again, all clauses are root).
(16) a. Alg sager inte mu. elk say not moo
b. Inte ha den. not have(–FIN) it
Finally, Wexler (1994) insightfully relates this crosslinguistic evidence to an old observation from developmental studies of English that the third-singular marker -s is frequently “dropped”:
(17) He no bite you.
Wexler analyzes this kind of production as a further instance of a form homophonous with an infinitive appearing in child language where it would not appear in adult language. Wexler supports this analysis with the observation that not V + s forms are rarely attested in child language. In English, it is not verb movement that is lacking but rather the dummy auxiliary do, which can in a sense be seen as the counterpart of verb movement in other languages. Wexler considers three different possible analyses of root infinitives. One possibility is that all these clauses contain a null modal. This would immediately explain the form and distribution of the verbs in all the languages: they take the infinitive form and cannot move because those are the properties of verbs in the complement to modals. However, beyond this, such an approach is little more than a restatement of the problem: why should there

be a developmental phase in which a null modal is postulated? Notice further that, as Wexler points out, this putative modal has no semantic value (unlike, for example, the empty modal that has been proposed for English subjunctives by Culicover (1971), Emonds (1976) and Roberts (1993a), which roughly corresponds to should semantically). From the perspective of English, at least, it is more plausible to consider that there is a null do present in examples like (17); however, this idea does not translate naturally to the other languages. In any case, here we come up against the same problem of explanation: why should acquirers postulate such an element in the first place? A second possible account is that T is either optional or underspecified, at this developmental stage, for [±finite]. Either way, the adult distinction between finite and non-finite clauses would be lacking as a consequence of this. This analysis would then explain the data and indicate that this sole factor perturbs the children’s verb-movement and inflection system. Wexler adduces some support for this idea from the fact that children acquiring English sometimes use bare-stem forms for the past tense. The problem with this idea is that there is still no real explanation as to why such a phase should exist. The third possibility that Wexler considers is that Agr, or perhaps T, has weak V-features at this stage. Because this idea can be readily integrated into my system of markedness, learning, and parameter setting, this is the analysis I adopt. However, before going into a detailed discussion of this idea, we must consider a fourth analysis that has been put forward by Rizzi (1994). Rizzi proposes the Clausal Truncation Hypothesis. The central idea is that adult grammars contain the principle in (18).
(18) CP = root
The effect of this principle is that all clauses in adult grammar must contain a full functional structure, including at least C, AgrS, T, and AgrO.
T is assumed to contain a temporal variable that must be bound: in finite clauses verbal inflection performs this function (see Pollock 1989), whereas nonfinite clauses require some binder; for example the T-variable of embedded nonfinite clauses is bound by the higher T (see Enç 1988; Stowell 1982). Root infinitives are ungrammatical in adult language because this variable cannot be bound. Child language allows root infinitives because (18) does not hold and, in Rizzi’s words: [I]f the selected starting point is a category lower than TP . . , then one will get the root infinitive, or a root construction exhibiting whatever unmarked nonfinite form the language possesses; so, a root infinitive can be a bare VP in a language like English, in which the infinitival form is the bare stem . . ; or the maximal projection of the head corresponding to the infinitival morpheme . . . Since there is no T position, there is

no tense variable to bind, hence the root infinitive at this stage will not incur the violation of the binding requirement which bans this construction in adult systems . . . . (Rizzi 1994, 379)
The Clausal Truncation Hypothesis accounts for a number of important properties that seem to hold of root infinitives. First, root infinitives are incompatible with wh-movement. Examples like (19) are apparently not found.
(19) *Was Hans essen? what Hans eat-inf
In wh-interrogatives, the verb always moves and is always finite in form at this stage of acquisition of the languages that have been studied from this perspective. This follows from the Clausal Truncation Hypothesis because wh-movement requires CP, and if CP is present, then the option of clausal truncation is not taken, and root infinitives cannot appear for the same reason as in the adult language. Root infinitives are also largely incompatible with negation. This follows if NegP is held to be higher than TP (see Belletti 1990 and pace Ouhalla’s (1991) proposal that the position of NegP can vary crosslinguistically). Rizzi suggests that examples of negation that are found with root infinitives (e.g., (13b), (16), (17)) should be treated as constituent negation. He also notes the observation of Hoekstra and Jordens (1991) that a negation marker corresponding to the adult anaphoric negation was used in root infinitives, whereas the standard clausal negator of the adult language was used with finite verb forms (in this connection, see no instead of not in (17)). French root infinitives are incompatible with the presence of the subject clitics (e.g., je, tu, etc.). If French subject clitics are clitics in AgrS or pronouns that cliticize obligatorily to AgrS (in section 5.4 I argue for the latter approach), then this is explained by the Clausal Truncation Hypothesis: when there is a root infinitive, there is no TP and therefore no AgrSP.
A further important observation is that auxiliaries do not occur as root infinitives (Wexler 1994). Thus, root infinitives like those in (20) are not found. (20) a. *avoir mangé have-inf eaten b. *gekauft haben bought have-inf Rizzi proposes that auxiliaries can be treated either as generated in T or as obligatorily raised to T. Either way, they depend on T for licensing and so would not be found in root infinitives, where TP is absent.

A final point is that root infinitives are not found in Early Italian (Guasti 1992). Rizzi relates this fact to the obligatory raising of Italian infinitives to AgrS, as illustrated by the contrasts in (9) above. Since infinitives depend on AgrS, root infinitives involving truncation of TP are impossible. The Clausal Truncation Hypothesis explicitly claims that this stage of child language has at least one grammatical property that is not found in any adult system: the fact that (18) is inoperative. The suggestion is then that principle (18) matures (see Rizzi 1994, note 4). Another possibility Rizzi considers is that, given that (18) is nonoperative in adult language in certain registers or under certain pragmatic conditions (e.g., in question-answer pairs), perhaps children are simply not aware of the pragmatic conditions governing the suspension of (18) in early stages of acquisition and, hence, suspend it “too often.” This proposal too seems to entail that this stage of acquisition features a grammatical system that is not found in adult language (unless we can find a language where (18) is suspended more readily).8 I would like to explore the possibility that root infinitives are a reflection of the approach to markedness discussed earlier. If this idea can be maintained, it will yield the desirable result that this stage of acquisition is a grammatical system of the usual kind—that is, a particular combination of parameter settings and, in principle, one that could correspond to an adult language. And it will also, of course, be evidence for the operation of this theory of markedness in language acquisition. In the previous section, we saw a case of language change that provided evidence for a morphological trigger for strong values of parameters.
In its general form, the morphological trigger can be stated as follows (see the more specific statement given in (11)): (21) If a head H has the relevant L-morphology, then H has strong L-features. For example, the relevant V-morphology for AgrS seems to consist of particular properties of the agreement paradigms. In the present context, two things are important about the morphological trigger. First, its absence is simply the absence of a trigger, and not a trigger for some “negative” property like weak features (this would in fact be a kind of negative evidence and so must be discounted on general grounds). Second, to function as a trigger, there must be some morphological analysis of the input. It is a well-known fact about language change (that played a clear role in the previous section) that morphology can be diachronically “weakened”; this amounts to saying that successive generations discount aspects of the morphological system. So it is clear that the morphology itself may not always be acquired. A natural inference is that morphological acquisition may proceed in stages, depending on which aspects of the adult system are acquired. It is well known from acquisition studies that this is true (Brown 1973).

In the absence of a trigger, whether morphological or syntactic, this theory of markedness claims that weak values of parameters will always be preferred because they are associated with simpler overt representations. Suppose, then, that when confronted with an agreement system of a particular kind A, where agreement affixes are not abundant (and where referential null subjects do not appear—assuming that these are licensed via agreement morphology), the preference for simpler structures causes acquirers to ignore the morphological input and to assume weak feature values. In other words, in cases where the agreement system is “poor” enough to allow the morphological trigger to pass unnoticed, as it were, given the preference for weak features, a verb-movement system that can be characterized as follows emerges: (22) No agreement; no verb movement. In production, the nonagreeing verb form is used; in many languages, this is the infinitive (note that, in English, it is simply the stem form that appears in the infinitive, the imperative, and in all persons of the present tense except the third-singular form). A system like (22) would give rise to the production of “root infinitives.” Notice, however, that these forms are no more infinitives than the non-third-singular present-tense forms of English verbs; they are simply instantiations of nonagreeing verbs. Evidence that these forms are not infinitives comes from the fact that they are compatible with overt subjects, as examples like (14b), (15b), and (17) show; here nominative Case is assigned to the subject by I in exactly the way it is in simple adult English sentences like John smokes.
(Rizzi suggests that the subjects in (14b), (15b), and (17) may be dislocated, but it seems clear that an analysis like the one I present, which can explain the possibility of subjects with “root infinitives” without recourse to some special stipulation about subjects, is preferable to one that needs such a stipulation.) The system in (22) is a product of the lack of morphological expression of the strong value of the V-to-I parameter; it appears sporadically and is eliminated due to the syntactic expression of this parameter. So (22), combined with the theory of markedness I outlined, is the central idea of my account of “root infinitives,” which specifically links the occurrence of this phenomenon to the fact that the languages in question have somewhat impoverished agreement morphology. The “root infinitives” are a manifestation of the default option for AgrS’s V-feature in the case where the morphological trigger does not immediately rule this option out. In languages where the verbal agreement morphology is richer (and which allow null subjects)—namely, languages where the agreement system is not of type A—the morphological trigger is sufficiently salient that AgrS’s V-feature is immediately triggered, and so root infinitives are not found in acquisition. This is the situation in Italian, for example. So, my account of Italian is directly related to the nature of the trigger. More generally, I predict that,

other things being equal, “root infinitives” will be found only in languages with somewhat “impoverished” agreement morphology. It may be that the set of languages that manifest root-infinitive stages in acquisition is coextensive with the set of languages that do not allow referential null subjects; however, I will not speculate further on this point. Let us now consider the other morphosyntactic properties that, according to Rizzi, are correlated with “root infinitives.” It is possible to account for the absence of subject clitics in French root infinitives in essentially the same way as Rizzi (only without assuming clausal truncation). French subject clitics are effectively dependent on the presence of strong AgrS, because they are typical of Romance clitics in requiring both cliticization to AgrS and cliticization to an inflected verb. Thus, if the verb does not raise to AgrS, these pronouns cannot cliticize to it and hence cannot appear. An alternative would be to regard these clitics as manifestations of the strong features of AgrS. However, I retain the view that they are pronouns that obligatorily cliticize (see Kayne 1983; Rizzi 1986b). One reason for this is that if French subject clitics are AgrS then French is a null-subject language and the presence of root infinitives would therefore be surprising. As for the absence of auxiliaries, I could follow Rizzi in regarding them as dependent on (overt) raising to T for licensing; if raising is unavailable, auxiliaries will thus be unavailable. However, Pollock’s (1989) evidence that auxiliaries may but do not have to raise in French infinitives suggests that the licensing condition involving overt raising to T is not correct. The question that the absence of auxiliaries in root infinitives effectively raises is: why do we not find auxiliaries behaving like the tense/mood/aspect (TMA) markers of creoles (i.e., showing no agreement and no raising)?
I suggest that the answer to this question lies in the fact that auxiliaries in English and French clearly undergo more raising to AgrS than do main verbs and also, especially in English, have richer agreement than main verbs. These facts may create a situation where auxiliaries effectively follow the Italian pattern: there is a sufficiently robust trigger for their movement that the unmarked option (22) does not surface. This implies that Universal Grammar (UG) distinguishes auxiliaries from main verbs and that movement parameters can be set differently for auxiliaries and main verbs. These ideas are needed independently, of course, in order to account for the existence of have/be raising in Modern English (as well as its diachronic origin). My account of the absence of root infinitives with wh-preposing and negation is similar. Suppose that there exists a parameter determining the “lexicalization” of an I system containing [+wh] or negation. The value of that parameter is positive in the adult versions of all the languages in question (except for Mainland Scandinavian negation, although this may be related to the fact that negation in these languages is adverbial rather than being a head in the I system):9 lexicalization involves V-movement in all the languages except English, where it is achieved by do-insertion. The trigger for the lexicalization parameter is syntactic: inversion in the case of [+wh]

(linked to the value of the parameter determining I-to-C movement in interrogatives, again positive in all the languages in question), and V-movement or do-insertion in the case of negation. This parameter is also positively set in the root-infinitive stage of acquisition of the languages in question. For this reason, root infinitives, which are immobile, are incompatible with negation and inversion, which require movement. That the lexicalization parameter is distinct from the V-movement parameter is independently shown by Modern English, which requires lexicalization of [+wh] and [+neg] and yet does not allow V-movement (for main verbs). However, the lexicalization parameter and the V-movement parameter interact, thanks to the principle of Greed, which requires that a given category α move only to satisfy a property of α (see Chomsky 1993). Given that these languages all have I-to-C movement in interrogatives, V will raise through AgrS. Hence it must also check off its own strong AgrS features. We can extend this reasoning to the negation case if we assume that negation must combine with Tense (Laka 1990; Zanuttini 1991) and that, in finite clauses Tense must combine with AgrS (Chomsky 1993). Thus we derive the result that a verb moving in interrogatives and negatives is always inflected. To be consistent with the general theory of parameters of Chomsky (1993), the lexicalization parameter must be expressible in terms of V- or N-features. It seems natural to regard it as a case of strong V-features of I-elements with the relevant feature specifications; say, C and Neg, respectively, for [+wh] and [+neg]. In Modern English, the strong feature values of these elements cannot be checked by V because V is unable to move and hence do is inserted in just these contexts; do is able to move and so can check the strong V-features and hence cannot be inserted in positive declaratives because its own feature would then be unchecked.
In sixteenth-century English, do raised to check I’s strong V-feature independently of the values of [wh] and [neg]. When I’s V-feature became weak, the raising of do began to take place under different conditions (i.e., only where there was [+neg] or [+wh]). Thus the presence of do in positive declaratives declined from 1575 (see Kroch 1989). This analysis of root infinitives has several advantages over the one proposed by Rizzi. It makes a general prediction about the connection between root infinitives and the richness of the agreement system. It does not postulate properties of child language that are not found in any adult grammatical system, and, of course, it integrates the phenomena into a general theory of markedness and triggers. Rizzi’s account of the development from a root-infinitive system to an adult-like one lacking in root infinitives involves the maturation either of (18) itself or of the pragmatic principles regulating the suspension of (18). It is because the root-infinitive systems are “immature” in this sense that they do not correspond to any adult grammatical system. Because my analysis of root infinitives treats them as involving a weak setting for I’s V-feature, I must account for why this feature value changes in languages where the

adult system has the opposite setting. The account is straightforward: acquisition of the adult morphological paradigms will ensure, given my account of the trigger for strong feature values, that the I has strong features in a language such as French. The syntactically triggered strong values for I[+wh] and I[+neg] also, given the Greed-based analysis sketched out above, create a number of situations in which V has strong features. Thus the grammar with a strong V-feature for I is preferred; the trigger experience requires the marked parameter setting. Similar considerations extend to the trigger for V2, although here the precise nature of the trigger is less clear.10 I conclude with a speculation on how the I parameter may fail to be set to the same value as that underlying the trigger. It is a reasonable speculation, especially given the suggested correlation with “poor” agreement systems, that sixteenth-century English acquirers passed through a root-infinitive stage. We saw above that do was a freely available “dummy verb” at this stage of the language. Effectively, when do appears, adult English has a root infinitive (see Wexler 1994). All through the sixteenth century, do was particularly frequent in questions and negatives, removing one piece of evidence that I has strong V-features. Moreover, the crucial morphological trigger for V raising to I was lost by the sixteenth century. Given these considerations, it is easy to see how the I parameter was not reset so as to correspond to the then adult value.
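The distributional predictions of this section can be collected in a small illustrative sketch. It is only a toy restatement under simplifying assumptions: the function name `root_infinitive_possible` and its four Boolean parameters are hypothetical, agreement richness is reduced to a single yes/no value, and `clausal_negation` is meant to exclude the constituent or anaphoric negation seen in examples such as (13b) and (17).

```python
def root_infinitive_possible(rich_agreement: bool,
                             is_auxiliary: bool,
                             wh_question: bool,
                             clausal_negation: bool) -> bool:
    """Is the nonagreeing, unmoved verb form of (22) predicted to be available?

    Each early return corresponds to one context in which some trigger
    forces the marked (strong) value and therefore an inflected, moved verb.
    """
    if rich_agreement:
        # Salient morphological trigger: AgrS's V-feature is set at once
        # (the Italian-type situation).
        return False
    if is_auxiliary:
        # Auxiliaries raise more and agree more than main verbs, so they
        # have a robust trigger of their own.
        return False
    if wh_question or clausal_negation:
        # Strong V-features of C or Neg require movement; by Greed the
        # moved verb also checks AgrS and so is inflected.
        return False
    # No trigger applies: the unmarked option (22) can surface.
    return True


# A main verb in a plain declarative in a language with poor agreement.
print(root_infinitive_possible(False, False, False, False))
# The same verb in a wh-question, as in the unattested (19).
print(root_infinitive_possible(False, False, True, False))
```

The ordering of the checks carries no theoretical weight; any one of the three conditions suffices to exclude the root infinitive on this sketch.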

5.4 Creoles, Markedness, and the Language Bioprogram Hypothesis

In a series of very interesting papers, Bickerton (1981, 1984) has argued that creoles give a direct insight into the language faculty. His central idea is that creoles are acquired on the basis of a radically impoverished trigger (see Bickerton 1999), so impoverished that UG (in the form of the Language Bioprogram) has a more direct relationship with the final-state system than in the case of noncreole languages. Noncreole languages are, of course, underdetermined by experience, but the standard assumption is that aspects of experience profoundly influence the final state by setting parameters and so on. On the other hand, the characteristic property of creoles is that their history features a break in the normal generation-to-generation “transmission of language.” Of course, language is not really transmitted; parameter values are triggered thanks to their expression in the input to language acquisition (when they are not, a change may take place). In these terms, the special property of creoles is that they are based on highly defective, or even absent, triggers. At this point, “the human linguistic capacity is stretched to the uttermost” (Bickerton 1981, 4). The trigger experience is impoverished (consisting primarily of pidgin and/or jargon). In Bickerton’s words: It is debatable whether the P-input [pidgin input—IGR] is language at all. P-input is in no sense a reduced or simplified version of some

156 Ian Roberts

existing language: it is a pragmatic, asyntactic mode of communication using lexical (and very occasionally grammatical) items drawn mainly, but by no means exclusively, from the politically dominant language. (Bickerton 1991, 365)

At the point of creole genesis, then, a new system is effectively “invented.” Because of this, Bickerton argues, creoles are a unique window on the language faculty. Bickerton’s evidence for his point of view comes from the striking morphosyntactic similarities that hold among creoles that are based on different lexifier languages and are widely dispersed both geographically and historically. I review a number of these below. Bickerton argues that it is most unlikely that these similarities are the result of historical borrowing or contact and extremely unlikely that they are due to chance.11

The view that creoles, or any class of languages that one might be able to identify pretheoretically, have some kind of privileged relationship with UG is one of which I am skeptical. If creoles are genuine natural languages, and all the evidence indicates that they are (unlike pidgins, on most views), then they are simply instantiations of UG possibilities like any other system. The P&P approach cannot treat some set C of languages as “closer” to UG than its complement set Cʹ. Parameters must have determinate values for a grammatical system to function (notice that this is a theorem of the view of parameters in Chomsky 1993, not something that needs to be stipulated separately); creoles must therefore contain fixed parameter values. As such, creoles have exactly the same relationship to UG as any noncreole language. (Essentially this point is also made by Lightfoot (1991, 182).)

The conclusion just reached is, in a sense, pessimistic with regard to the interest of creoles (of course, creoles will always be just as fascinating as any other natural languages, but, as we have seen, special claims have been made on their behalf).
More importantly, it leaves open the question of the similarities that appear to hold among creoles. Also, it glosses over the fact that creoles are acquired on the basis of an unusual trigger (although Lightfoot (1991, 182) argues that both this fact and any consequences it may have are open to doubt). A weaker view, which is less “pessimistic” regarding the special interest of creoles, and which is moreover compatible with most versions of P&P theory, is that creoles are systems with predominantly unmarked parameter values. This is the position of Bickerton (1984) and is essentially what I wish to defend here. Lightfoot argues that this cannot be true across the board, primarily on the basis of word-order facts. Creoles are overwhelmingly SVO languages, and so one would be led to claim that SVO is the unmarked word order. Lightfoot is skeptical of such a claim. However, if we follow recent work by Kayne (1994), who argues that all languages must effectively be SVO, and Zwart (1993), who argues that Dutch is SVO with a great deal of leftward movement of complements, then we are led to conclude that OV

order results from a strong N-feature associated with some functional head, presumably AgrO. In that case, the approach to markedness adopted here tells us that OV is a marked word order, because it depends on the presence of a strong N-feature in AgrO. And so Lightfoot’s objection disappears.

The claim of this section is that, although Bickerton’s original idea that creoles effectively had some privileged relation to UG as compared with noncreoles is too strong (and in fact cannot be formulated in the theory of parameters assumed here), the idea that creoles fairly systematically reflect unmarked parameter settings (i.e., weak values of the features of functional heads) can account for many attested similarities among creoles and can be plausibly related to the unusual status of the trigger. My empirical focus will be on properties connected to verb movement, and so the claim made in the previous two sections that the lack of V-to-I movement reflects the unmarked parameter setting will be further substantiated. Naturally, a view of the relations among language acquisition, creolization, and language change emerges from the discussion; here, the unifying notion is again markedness.

My position will be a fairly weak one: I do not wish to assert that creoles always and only have unmarked values for all parameters. This would make the excessively strong prediction that creoles are all parametrically isomorphic to one another. As Bickerton (personal communication) points out, there are differences between creoles, and these differences arose from variation in the degree to which realizations of functional categories are triggered (Bickerton cites the example of the survival of French relative pronouns into French-based creoles, whereas these pronouns were lost in English-based creoles). The hypothesis I would like to entertain, then, is that creoles tend to have weak values of parameters.
As such, they will have a number of properties in common (but notice that these common properties could perfectly well be shared by noncreole languages, something that the strongest version of the Language Bioprogram Hypothesis does not obviously admit). We can relate the tendency towards weak parameter values to the nature of the trigger. Assume that a significant part of the trigger consists of pidgin. Such vernaculars lack inflectional morphology, and so the morphological triggers for strong feature values are missing. Similarly, we can speculate that pidgins do not robustly encode syntactic triggers; the discussion of Hawaiian Pidgin by Bickerton (1981, chap. 1; 1999) strongly suggests this. Notice that we do not need to make any very strong claim about the trigger for the acquisition of first-generation creoles—only that the trigger is morphologically and syntactically defective on crucial points, just in the sense that certain properties required for the triggering of strong features were not expressed with sufficient frequency or clarity. This strikes me as a plausible and conservative speculation as to the circumstances of creole genesis. Granted this much, if the triggering data for strong values is not available to the learner, the learner’s preference for maximally elegant representations will always

favor weak feature values because these give rise to representations that are simpler than those arising from strong feature values. Hence weak feature values will tend to predominate. So, we can relate the unusual circumstances of creole acquisition to a propensity for unmarked (i.e., weak) feature values. The central factor that determines the unmarked status of creoles is the simplicity metric that is intrinsic to the learning device.

It is now necessary to show that the syntactic characteristics of creoles can be accounted for as the reflexes of unmarked feature values. Muysken (1981, 1988) discusses six properties that he takes as characteristic of creoles. I will now discuss these one by one (making slight adaptations to the theoretical assumptions) and show how, with one partial exception, they can all be viewed as deriving from unmarked values of parameters.

5.4.1 Lack of Verb Movement

This property is particularly interesting in creoles whose lexifier languages have verb movement. It is also of obvious relevance given what was said in the preceding sections. A clear case of this type is Haitian Creole, whose lexifier language is French. Haitian Creole very clearly lacks V-to-I movement, as has been shown by DeGraff (1994) and DeGraff and Dejean (1994). DeGraff (1994) explicitly applies the Pollock-Emonds tests for V-to-I movement (see section 5.1 for discussion) to Haitian Creole, and the results consistently show that V-to-I movement is lacking in Haitian Creole. Contrasts with French of the following type are illustrative (although DeGraff discusses a much wider range of data, including many different classes of adverbs):

(23) Adverb placement
     a. i.  *Bouki pase deja rad yo. (Haitian Creole)
             Bouki iron already cloth the
        ii.  Bouki deja pase rad yo.
             ‘Bouki has already ironed their clothes.’
     b. i.   Bouqui repasse déjà le linge. (French)
             Bouqui irons already the clothes
             ‘Bouqui is already ironing the clothes.’
        ii. *Bouqui déjà repasse le linge.

(24) Negation
     a. i.   Boukinèt pa renmen Bouki. (Haitian Creole)
             Boukinèt neg love Bouki.
             ‘Boukinèt does not love Bouki.’
        ii. *Boukinèt renmen pa Bouki.
     b. i.  *Jean ne pas aime Marie. (French)
        ii.  Jean n’aime pas Marie.
             Jean neg-love neg Marie
             ‘Jean does not love Marie.’

These examples clearly show that elements that are usually held to intervene between I and VP, and whose position relative to a verb is therefore a test for verb movement, always precede V in Haitian Creole. (Michel DeGraff (personal communication) informs me that Haitian Creole lacks floating quantifiers, and so this test for verb movement is inapplicable; neither is inversion (i.e., I-to-C movement) found in Haitian Creole (DeGraff 1993a, 75).) The conclusion seems clear: Haitian Creole lacks V-to-I movement.

As DeGraff points out, Haitian Creole is also a typical creole in that there is no verbal inflectional morphology. There is no subject-verb agreement at all, and information regarding tense, mood, and so forth is carried by preverbal particles (to which I return below). Hence, the correlation between the lack of V-to-I movement and the lack of verbal inflectional morphology observed in the history of various Germanic languages in section 5.2 holds up in the relationship between French and Haitian Creole. The generalization about the trigger experience that I made there, repeated below as (25), is then supported:

(25) If a head H has the relevant L-morphology then H has strong L-features.

Haitian Creole clearly lacks the relevant verbal morphology, unlike French. Moreover, the impoverished trigger for Haitian Creole must also have lacked the syntactic cues for V-to-I movement; this is plausible for the reasons given above. Given the absence of a trigger for a strong V-feature in I, the drive to minimize the complexity of linguistic structures leads to a system where there is no overt V-movement, that is, a system where I has a weak feature.
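The reasoning behind (25) and the default to weak values can be put schematically. The following is only a minimal sketch under stated assumptions, not part of the analysis itself: the names `TriggerExperience`, `feature_strength`, and `ROBUSTNESS_THRESHOLD` are hypothetical, the numerical threshold is invented for illustration (the text gives no figure for how "robustly" a syntactic cue must be expressed), and the two sample inputs are toy values rather than corpus counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerExperience:
    """Toy summary of the input relevant to one functional head (hypothetical)."""
    has_rich_morphology: bool   # morphological trigger, in the spirit of (25)
    syntactic_cue_rate: float   # share of input clauses showing overt movement


# Illustrative value only; the chapter does not quantify robustness.
ROBUSTNESS_THRESHOLD = 0.3


def feature_strength(trigger: TriggerExperience) -> str:
    """Return "strong" only if some trigger is expressed; otherwise default to "weak".

    The default models the learner's preference for simpler representations:
    a weak feature induces no overt movement.
    """
    if trigger.has_rich_morphology:
        return "strong"
    if trigger.syntactic_cue_rate >= ROBUSTNESS_THRESHOLD:
        return "strong"
    return "weak"


# Toy inputs: a French-like trigger and a pidgin-like trigger (assumed values).
french_like = TriggerExperience(has_rich_morphology=True, syntactic_cue_rate=0.9)
pidgin_like = TriggerExperience(has_rich_morphology=False, syntactic_cue_rate=0.05)

print(feature_strength(french_like))
print(feature_strength(pidgin_like))
```

On these assumed inputs the French-like trigger yields a strong V-feature for I and the pidgin-like trigger yields the weak default, which is the pattern the text attributes to French and Haitian Creole respectively.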
We thus observe a striking parallel between the development from Middle/Early Modern English to Modern English and the development from French to Haitian Creole (in addition to the parallels between the development of English and the development of many Mainland Scandinavian languages discussed in section 5.2); this parallel is also observed by DeGraff (1994). The two developments were, of course, entirely unconnected. Moreover, the trigger experiences were altered in differing ways. In English, regular processes of phonological reduction eroded the verbal inflections and an independent syntactic change led to the development of an auxiliary system (see Lightfoot 1979; Roberts 1993a on the latter); moreover, there was a continuous development from one stage to the other. Haitian Creole, on the other hand, is not directly diachronically derived from French; it is a grammatical system that emerged spontaneously from a pidgin whose lexicon is largely French-based (but see Mufwene 1999 and Lumsden 1999 for alternative proposals). As suggested above, I take the nature of the pidgin to be the source of the lack of morphological and syntactic triggers for V-to-I movement.

The parallel extends to Mauritian Creole, another French-based creole, as the following example shows (from Green 1988):

(26) Li pa pu dir narjê.
     it neg prt say nothing
     ‘It doesn’t mean anything.’

On the other hand, Réunionnais appears to pattern with French, to judge by the following (ibid):

(27) Li mãz pa sel.
     he eat neg salt
     ‘He doesn’t eat salt.’

Interestingly, Réunionnais is known to be more “heavily influenced” by French than the other Indian Ocean creoles or Haitian Creole. In fact, Réunionnais is explicitly excluded from the class of creoles by Bickerton (1981, 4). Baker and Corne (1982, 107) provide evidence that Réunionnais had greater superstrate contact than other French-based creoles. This contact presumably provided a syntactic trigger for V-to-I movement in Réunionnais.

Mesolectal Louisiana Creole presents an interesting variation on this theme, according to Rottet (1993) (discussed in DeGraff 1994). This variety has “short” and “long” forms for a large number of verbs, where the long form is clearly derived from the French infinitive and the short form from the French present tense:

(28) Short form   Long form   Gloss
     aret         arête       ‘stop’
     frem         freme       ‘shut’
     kup          kupe        ‘cut’

The short form precedes VP adverbs and negation, whereas the long form follows these elements:

(29) a. Fo tuzhu kupe zerb-la.
        Fo always cut (long form) grass-the
        ‘It’s always necessary to cut the grass.’
     b. Fo kup tuzhu zerb-la.
        fo cut (short form) always grass-the
        ‘It’s always necessary to cut the grass.’

(30) a. Mo pa mõzhe.
        I not eat (long form)
        ‘I haven’t eaten’/ ‘I didn’t eat.’
     b. Mo mõzh pa.
        I eat (short form) not
        ‘I don’t eat.’

These examples show that the short form acts like a French verb and the long form acts like a Haitian Creole (or English) verb. The short form thus appears to move to I. The idea that the short form moves to I whereas the long form stays in V is confirmed by the interactions of these forms with certain preverbal

particles that mark tense and aspect. These particles are incompatible with short forms, but compatible with long forms:

(31) Le klosh ape sone/*son aster.
     the bell PROG ring now
     ‘The bells are ringing now.’

Assuming that the preverbal particle occupies an I-like position here, (31) further indicates that short-form V raises to I. Raising is blocked when the particle is present. Example (31) is thus analogous to the English case where the presence of a modal blocks have/be raising.

DeGraff (1994) follows Rottet (1993) in assuming that the long forms and the short forms are representatives of different grammatical systems. This is in part corroborated by the fact that only long forms are found in the basilect, and, of course, standard French root clauses require a morphologically inflected verb, the counterpart of the Mesolectal Louisiana Creole short form. DeGraff and Rottet thus take the mesolectal data to be the reflection of coexisting grammatical systems in the sense of Kroch (1989).

5.4.2 SVO Order

It appears that creoles are without exception SVO (Bickerton 1981, 1984, 1988; Mühlhäusler 1986; Muysken 1981, 1988). SVO is of course a very common order among noncreoles, but SOV is just as common, and there is a significant minority of VSO languages (the other word-order types are very rare; see Greenberg 1963). So, creoles as a group can be distinguished from noncreoles in that they do not show non-SVO typologies; on the other hand, SVO itself is not confined to creoles. This is a good example of how creoles occupy just part of the space of variation that is attested in language in general.

The facts are particularly striking in creoles that derive from non-SVO lexifier languages. This is the case of Berbice Dutch, derived from Dutch and Ijo, both surface OV languages. Another example is Rabaul Creole German, which is SVO despite being derived from German, a superficially SOV language.
Mühlhäusler (1986, 156) cites Thomason (1981, 333) who points out that Chinook Jargon is SVO, but the dominant order in the Northwestern Amerindian languages on which it is based is VSO.12 It is generally assumed that VSO order involves verb movement (see Emonds 1980; Sproat 1985). Hence it is a marked order, as compared to SVO, which does not necessarily feature verb movement. I have also suggested that VO is the unmarked order with respect to OV. This is because, following Kayne (1994), Chomsky (1994), and Zwart (1993, 1994), I take VO to be the only available underlying order and surface OV to be derived by overt DP-movement to Spec,AgrOP for Case checking. So, OV results from the presence of a strong N-feature in AgrO. The strong feature of AgrO entails more complex representations for the learner, since it induces

more overt movement. Hence it is marked. So, I conclude that OV order is marked. In this light, the fact that creoles are always VO is a reflection of their tendency to show unmarked values for parameters, given the extremely degenerate trigger for creole genesis. Thus, this property of creoles can be explained in terms of our theory of markedness.

In this connection, it is worth reflecting on the surface position of the subject. If we apply the above reasoning to subject movement, the straightforward prediction, assuming some version of the VP-internal subject hypothesis, would be that subject raising out of VP is a marked option. This predicts that the “least marked” word order is SVO (because verb movement, also a marked option, is required for VSO) where the subject is obligatorily adjacent to the verb (and the verb to the object). In such a system, VP adverbs and negation would always precede the subject, as would tense, mood, and aspect particles (if these are not verbs; see below), and any I-type material. As examples like (23), (24a), and (26) above show, this is not the situation in creoles; negation typically intervenes between the subject and the verb (as do tense, mood, and aspect particles). We are led to the conclusion that creoles typically show the marked value for subject movement and the unmarked value for object movement.

Why should this be? One possibility would be to deny the VP-internal subject hypothesis. It is certainly true that, if this hypothesis is adopted, we have to explain the general preference to raise subjects (notice that this is true even where subject raising is an option, independently of markedness considerations).
Languages with the “maximally unmarked” SVO order of the type just described are hard to find; similarly, VSO languages typically show adjacency effects between the verb and the subject (see McCloskey 1991 on Irish, for example; the same is true of Welsh), whereas a system with only V-raising would show adjacency effects between the subject and the object. These are general puzzles for our approach, as it stands. These observations must be related to two well-known facts: (a) that object agreement is never richer than subject agreement, and (b) that referential null objects are much rarer than referential null subjects. In general, then, AgrS tends to be “richer” or “stronger” than AgrO. To be more precise, it seems that we never find a system where AgrO has a strong N-feature and AgrS has a weak N-feature.13

However, abandoning the VP-internal subject hypothesis is not a solution. The proposal would be that subjects are base-generated in [Spec, AgrS] and that therefore there is no raising to this position, hence SXVO is not a marked order. However, there is clear evidence of raising constructions elsewhere in Haitian Creole, hence evidence that AgrS has a strong N-feature independently of what we assume about the base position of subjects. DeGraff (1993a) gives pairs like the following:

(32) a. Genlè Jak damou.
        seem Jak in-love
        ‘It seems that Jak is in love.’

     b. Jak genlè damou.
        Jak seems in-love
        ‘Jak seems in love.’

These examples clearly show that raising to subject of a familiar kind exists in Haitian Creole, and hence that AgrS has strong N-features. I propose that the strength of AgrS’s N-feature in Haitian Creole reflects a property of UG, not a parametrically variant property or a property of the learning device. The generalization can be phrased as follows:

(33) AgrS has a strong N-feature.

This statement corresponds to the Extended Projection Principle (EPP) of earlier work (Chomsky 1982). In the minimalist framework, the EPP reduces to a feature-checking requirement, as (33) states. However, the above considerations regarding the greater richness of AgrS as compared to AgrO, as well as the fact that we find subject expletives but not object expletives, indicate that AgrS’s N-feature does not vary parametrically.14 If this is the case, then no markedness issue arises, and we of course expect creoles to have a strong N-feature in AgrS.15 It remains an open question why (33) should hold. I will not speculate on this here.16

5.4.3 No Referential Null Subjects

This property seems to be generally true of creoles, according to Muysken (1981) and Lightfoot (1991).17 It is particularly striking in creoles based on null-subject lexifier languages like Spanish and Portuguese. A case in point is Papiamentu, based on Spanish. Muysken (1988, 291) gives the following examples to show that this is not a null-subject language:

(34) a. E ta kome.
        he asp eat
        ‘He is eating.’ (Él está comiendo.)
     b. *Ta kome.
        asp eat
        (Está comiendo. ‘He/she is eating.’)
     c. *Ta kome Maria.
        asp eat Maria
        (Está comiendo Maria. ‘Maria is eating.’)

We can attribute this to the fact that creole agreement systems are too impoverished to permit recovery of the content of referential pro.18 Assuming, following Rizzi (1986a), that this ability to recover pro’s content is the fundamental property of null subject languages, we can ask how this should

be expressed in terms of the system of parametric variation adopted here. Given what I said above, it cannot be that the ability to license null subjects is connected to AgrS’s N-feature because (33) would then imply that all languages are null-subject languages. Therefore, it must be a property of AgrS’s V-feature (recall that we assume that only these two features can vary). Accordingly, I adopt the following approach to licensing pro:

(35) a. pro is formally licensed by a strong N-feature.
     b. pro’s content is identified by specifier-head agreement with the relevant inflection.

The condition in (35a) is necessary for the occurrence of any kind of pro, expletive or argumental, and (35b) requires that the morphology permitting the recovery of pro’s features be present in AgrS (if something other than morphology can permit recovery of pro’s features (see note 18) then no particular requirement on AgrS may result). This is necessary at LF for the correct interpretation of pro, and it is necessary at PF for the correct identification (and perhaps elimination; see Chomsky 1993) of pro’s features. Assuming that there is no PF movement, this means that pro must be in the relevant configuration at Spell-Out. Moreover, assuming Greed, the head bearing the independent inflection must be required to move into a position satisfying (35b) for pro for reasons independent of pro. The relevant morphology may be a grammaticalized pronoun, as in many Northern Italian dialects, or it may be the verbal morphology. In the latter case, (35b) requires that V be in AgrS at Spell-Out (i.e., that AgrS have a strong V-feature). In this way, we derive the result that referential null-subject languages where pro’s content is recovered morphologically always have V-to-AgrS movement (although the converse does not necessarily hold: languages with V-to-AgrS movement do not necessarily allow referential null subjects; e.g., French, Middle English, Icelandic, etc.).
Thus (35b) gives us the result that referential null subjects of this type are a marked property, contingent on V-movement. It then follows from what I said above about verb movement that creoles will not have referential null subjects of this type. Independent evidence for (35b) comes from the fact that referential null subjects cannot occupy the “freely inverted” position in Italian (Rizzi 1987).

The requirements in (35) are a part of UG, comprising a well-formedness condition on a particular element, pro. As such, they are not parameters. The only syntactic variation with respect to null subjects concerns V-movement; the verb must be in a position where (35b) is satisfied in order to license subject pro at Spell-Out. Hence, AgrS’s V-feature plays a crucial role in the null-subject parameter. There is also morphological variation of an ill-understood kind: the verbal inflection must be “rich” enough to permit recovery of pro’s features. I take it that inflection has whatever properties it has at PF and that this may or may not suffice to satisfy the licensing condition in (35b) at PF; there is no syntactic variation on this point beyond verb movement.
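The interaction of (33) and (35) can be summarized in a small sketch. It is offered only as an illustration under simplifying assumptions: it covers just the case where pro’s content is recovered through verbal morphology (not grammaticalized pronouns or the non-morphological recovery mentioned in note 18), it treats morphological “richness” as a simple yes/no property, and the names `AgrS`, `formally_licenses_pro`, and `allows_referential_pro` are hypothetical. The three sample settings follow the chapter’s own characterization of Italian, French, and Haitian Creole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgrS:
    """Toy description of AgrS in one language (hypothetical model)."""
    strong_v_feature: bool  # does V raise overtly to AgrS?
    rich_agreement: bool    # is verbal inflection rich enough to recover pro's features?


def formally_licenses_pro(agrs: AgrS) -> bool:
    """(35a): pro is formally licensed by a strong N-feature.

    By (33) the N-feature of AgrS is strong in every language, so this
    condition is met regardless of the language-particular settings.
    """
    return True


def allows_referential_pro(agrs: AgrS) -> bool:
    """(35b), verbal-morphology case only.

    The inflected verb must be in AgrS at Spell-Out (strong V-feature) and
    its morphology must be rich enough to identify pro's content.
    """
    return formally_licenses_pro(agrs) and agrs.strong_v_feature and agrs.rich_agreement


# Settings as the chapter describes them; "rich" is a simplification.
italian = AgrS(strong_v_feature=True, rich_agreement=True)
french = AgrS(strong_v_feature=True, rich_agreement=False)
haitian_creole = AgrS(strong_v_feature=False, rich_agreement=False)

for name, agrs in [("Italian", italian), ("French", french), ("Haitian Creole", haitian_creole)]:
    print(name, allows_referential_pro(agrs))
```

The sketch makes the one-way implication explicit: a language can have V-to-AgrS movement without referential null subjects, but under these assumptions no language has morphologically identified referential pro without that movement.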

DeGraff (1993a) argues that Haitian Creole has null subjects. Although he clearly shows that Haitian Creole (and other creoles) can have null expletive subjects, DeGraff’s evidence that Haitian Creole has referential null subjects is not convincing. His argument is that “‘subject’ pronouns in HA [Haitian Creole] do not appear in subject position, but that they are clitics phonologically spelling person and number features of INFL” (p. 73). The evidence for this is as follows:

(36) Subject pronouns must be adjacent to VP
     a. Yaya, bèl ti abitan an, ap viv nan vil Sen-Mak.
        Yaya beautiful little peasant det prog live in town Sen-Mak
        ‘Yaya, the beautiful little peasant, lives in Sen-Mak.’
     b. *Li, bèl ti abitan an, ap viv nan vil Sen-Mak.
        3SG beautiful little peasant det prog live in town Sen-Mak

(37) Subject pronouns are phonologically proclitic on V or the first particle
     a. mwen ale → m ale
     b. mwen ap ale → m ap ale
     c. mwen pa te ale → m pa te ale

(38) Subject pronouns cannot be contrastively stressed
     a. BOUKI ale.
        Bouki left
        ‘BOUKI left.’
     b. *LI ale.
        he left

(39) Subject pronouns cannot occur in isolation
     Ki moun ki genyen? Bouki/*li.
     who comp won Bouki/him
     ‘Who won? Bouki/him.’

(40) Subject clitics cannot head complex NPs (e.g., appositive relatives)
     Bouki/*mwen avèk li, de abitan Sen-Mak, pral Leogàn.
     Bouki/*1sg with 3sg two peasant Sen-Mak go Leogàn
     ‘Bouki/I and he, two peasants from Sen-Mak, are going to Leogàn.’

(The li in (40) is a complement pronoun, not a clitic; see below.)

The properties of the subject pronouns in (36)–(40) indicate only that Haitian Creole subject pronouns are phonological clitics. In fact, standard French subject clitics share all these properties, and these have usually been regarded as phonological clitics (Kayne 1983; Rizzi 1986b; Rizzi and Roberts 1989 [this volume, Chapter 9]) or, more recently, as weak pronouns (Cardinaletti and Starke 1999). The data in (36)–(40) provide no reason to consider the Haitian Creole pronouns as distinct from their Standard French counterparts. Hence they do not provide evidence that Haitian Creole allows referential null subjects. Moreover, the fact that Haitian Creole clitics cannot be “doubled”

by a nominal subject also indicates that these pronouns are not syntactic clitics in AgrS:

(41) *Jan li ale.
     Jan 3sg leave
     ‘Jan left.’
     (DeGraff’s (1993a, 76) ex. (34))

This situation should be contrasted with that found in many Northern Italian dialects (Rizzi 1986b; Brandi and Cordin 1989; Poletto 1993), where the clitics are found in contexts like those in (41). Moreover, fully nominal subjects can occur with no clitic in AgrS:

(42) Jan ale.
     Jan leave
     ‘Jan left.’

Again, this situation parallels that of Standard French and differs from what is found in many Northern Italian dialects, favoring the conclusion that Haitian Creole is like French and not like a typical Northern Italian dialect.19

I conclude that Haitian Creole does not allow referential null subjects. However, it is clear from examples like (43) that it does allow expletive null subjects (and see note 17).

(43) Te fè frèt.
     ant make cold
     ‘It was cold.’
     (DeGraff (1993a, 72) ex. (2))

Also, Haitian Creole allows apparent violations of the C-trace filter, suggesting that it has expletive pro (following the analysis of comparable Italian data in Rizzi 1982, although free inversion of the Italian type is impossible; this must be due to the fact that whatever Case checking process licenses postverbal subjects in Italian is unavailable in Haitian Creole):

(44) Ki moun ou kwè (ki) pa vini?
     who 2sg believe (comp) will come
     ‘Who do you think will come?’
     (DeGraff (1993a, 80) ex. (43))

The condition in (35a), in combination with (33), allows any language to have expletive pro in Spec, AgrS. Hence we expect to find it in Haitian Creole. The question now becomes: why does English apparently not have expletive pro in Spec, AgrS? This may reduce to a simple matter of lexical

variation: the inventory of pronouns in English contains expletive elements; that of Haitian Creole does not (DeGraff (1993a, note 3) points out that an overt expletive must appear with certain adjectives having clausal complements, e.g., “difficult,” “impossible,” etc.; following Bennis (1986), these look like correlative argument pronouns). The inventory of pronouns in referential null-subject languages seems to consistently lack weak subject pronouns. This has usually been accounted for in terms of the Avoid Pronoun principle, which is presumably some kind of PF Economy principle; I have nothing to add to such an account here.

Expletive null subjects are also found in other creoles. DeGraff (1993a, 84) gives the following examples, among others, all of which have expletive null subjects:

(45) a. (A) (bi-) kendi/koto. (Saramaccan)
        it tns hot/cold
        ‘It was hot/cold.’
        (Byrne 1987, 76)
     b. pro tawata jobe. (Papiamentu)
        past rain
        ‘It was raining.’
        (Kouwenberg 1990, 46)

Such data are consistent with the Haitian Creole data presented by DeGraff, and with my proposed treatment of it. We see then that referential null subjects, to the extent that they depend on V-movement to AgrS, represent a marked parametric option. On the other hand, expletive null subjects are a lexical, rather than a parametric, option (in fact, it is arguable that pro is universally available but its distribution varies depending on the other expletive pronouns available in the language; in English, for example, it is arguably available in locative-inversion constructions (see Hoekstra and Mulder 1990)). Hence, if creoles tend to favor unmarked parameter values, we expect that referential null subjects will not be found in these languages, although expletive null subjects may be.

5.4.4 No Complement Clitics

This property is particularly striking in those creoles deriving from Romance languages with rich systems of complement clitics. Haitian Creole is again a good example because it derives from French. DeGraff (1994, 11ff) gives the following contrasts with French:

(46) a. Bouqui l’aime.
        Bouqui 3sg-like
        ‘Bouqui likes him/her/it.’
     b. *Bouqui aime le/la.
        Bouqui like 3sg-m/3sg-f

(47) a. Bouki renmen li.
        Bouki like 3sg
        ‘Bouki likes him/her/it.’
     b. *Bouki li renmen.
        Bouki 3sg like

Haitian Creole is quite typical in this respect. There are various views on the nature of Romance clitics and cliticization processes. One view, due to Sportiche (1988) and elaborated by Rizzi (1993), is that Romance clitics undergo a combination of D- and DP-movement. In that case, some strong feature must trigger the movements, and hence cliticization is a marked property. Another view, elaborated by Sportiche (1996), is that Romance clitics are themselves heads that trigger movement of pro. Although Sportiche claims that pro can move either before or after Spell-Out, the system of pro-licensing in (35) implies that pro would have to move prior to Spell-Out (although overtly doubled elements could procrastinate, as Sportiche proposes). Hence in this system, too, clitics imply more movement than nonclitics. We do not need to choose between these two different views of Romance clitics here; the important thing is that both imply the existence of an overt movement operation for clitics that is not required where there are no clitics. So, on either view, complement clitics represent a marked parameter value. We thus expect clitics of this kind to be lacking in creoles.

5.4.5 Preverbal Tense/Mood/Aspect Particles

This is another property that has often been noticed (see inter alia Bickerton 1981, 1984, 1988; Muysken 1981, 1988). Tense/mood/aspect (TMA) particles carry information that is typically indicated inflectionally in languages with richer morphology, including many lexifier languages. The particles themselves are usually related to auxiliaries of the lexifier language; for example, the Haitian anteriority marker te derives from either été or était, both past forms of French être ‘to be’.
Muysken (1981, 183) shows that TMA elements have the following properties:
• they occur adjacent to the first verb
• they cannot be fronted with the fronted verb in predicate clefts, as the following Papiamentu examples show:
(48) a. Ta ganja Wanchu a ganjabo.
foc lie John asp lie-you
‘John has really lied to you.’
b. *Ta a ganja Wanchu a ganjabo.
foc asp lie John asp lie-you
• they only occur with the first verb in serial-verb constructions20

Muysken concludes that these elements are auxiliaries. Although seemingly correct, Muysken’s conclusion leaves important questions open. English auxiliaries, for example, show quite different syntactic and morphological properties internal to the class; it is generally assumed that modals and do are generated in I, whereas have and be are generated in syntactically lower V-positions. The main reason for this is that modals and do are always finite and always precede negation, and have and be are both able to be non-finite and follow negation (see section 5.2). What all English auxiliaries have in common is that they are athematic; these elements appear to lack argument structure. Have and be are also able to appear as “main verbs,” where they may have an argument structure, although this is rather unclear (see Pollock 1989 and Williams 1994 for discussion of this point from different perspectives). It is interesting to consider the Haitian Creole TMA markers from the perspective of English. In his discussion of these elements, DeGraff (1993a, 74–75) makes two important observations. First, negation always precedes all these markers:

(49) a. Jan pa t ava ale nan mache.
Jan neg ant mood go in market
‘John would not have gone to the market.’
b. Jan te (*pa) ava (*pa) ale (*pa) nan mache.
Jan ant neg mood neg go neg in market

Second, most of the TMA markers are able to act as main verbs in isolation: Pral, marking future, also means ‘to go’; dwe, marking obligation or possibility, also means ‘to owe’; fini, marking completion, also means ‘to finish’; konnen, marking habituality, also means ‘to know’; sòti, marking recent past, also means ‘to leave’; etc. (DeGraff 1993a, 75)

It seems clear from the above that the TMA markers of Haitian Creole at least are auxiliaries comparable to English have and be, rather than to English modals and do (DeGraff also points out (note 14) that Saramaccan and Antiguan Creole are comparable to Haitian Creole in this respect, although Mesolectal Louisiana Creole has tense and mood markers that precede negation (DeGraff, personal communication)). However, an important difference as compared to have and be is that the TMA markers never raise, consistent with the general absence of V-movement in Haitian Creole.

It has often been argued that auxiliaries like have and be should be treated as verbs heading their own VPs (this proposal goes back to Ross (1969) and has been adopted in more recent work by Chomsky (1986), Pollock (1989), Roberts (1993a), and many others). In that case, they are verbs that are at least able to be thematically defective, even if they may also have thematic structure when they appear as main verbs. However, suppose that functional heads are underspecified lexical heads,21 then these elements must strictly speaking be functional heads. The fact that these elements appear lower than negation indicates that they represent a layer of functional structure below Neg. A proposal of this type has been made for complex auxiliary structures by Cinque (1994). Notice that, if these elements are functional heads, then they are functional heads with weak features because they do not trigger overt incorporation of lower verbs. This is of course consistent with my general proposal about creoles. These elements are licensed as LF affixes (when they have no argument structure; when they have an argument structure, they are licensed by assigning thematic roles, like all other verbs).

One question remains: why are the TMA markers like have/be rather than like modals? The answer to this becomes clear if we consider the account of grammaticization proposed by Roberts (1993b) and Clark and Roberts (in progress). This theory of grammaticization postulates that three properties were required for the grammaticalization of a lexical item l of category L:
• l is in a category L which is always moved to a functional head F
• l is aprosodic
• l has a potentially athematic interpretation
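The three conditions just listed work as a conjunction: an item that lacks any one of them does not meet the requirement. As a minimal illustrative sketch only (in Python; the names `LexicalItem`, `unmet_conditions`, and `meets_all_conditions` are hypothetical, and the feature values given to the two sample items are assumptions chosen for illustration, not data from the sources cited), the check might be written as:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LexicalItem:
    """A lexical item l of category L, reduced to the three listed properties."""
    form: str
    category_always_moves_to_f: bool  # condition 1: L is always moved to F
    aprosodic: bool                   # condition 2: l is aprosodic
    potentially_athematic: bool       # condition 3: l can be read athematically


def unmet_conditions(item: LexicalItem) -> List[str]:
    """Return the names of the listed conditions that the item fails."""
    checks = {
        "movement of L to F": item.category_always_moves_to_f,
        "aprosodic": item.aprosodic,
        "potentially athematic": item.potentially_athematic,
    }
    return [name for name, holds in checks.items() if not holds]


def meets_all_conditions(item: LexicalItem) -> bool:
    """True only if all three conditions hold together."""
    return not unmet_conditions(item)


# Hypothetical sample items with assumed feature values.
item_with_movement = LexicalItem("aux-1", True, True, True)
item_without_movement = LexicalItem("aux-2", False, True, True)
```

On this reading, an item in a system without L-to-F movement, like the second sample, fails the conjunction even when it is aprosodic and potentially athematic.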

Our analysis of the TMA markers of Haitian means that they clearly have the third of these properties. Also, Haitian TMA markers are readily subject to various processes of phonological reduction, which indicates that they are prosodically weak (Michel DeGraff, personal communication); notice the parallel with English auxiliary reduction. However, it is the first property that is relevant here: in order for a lexical category L to be grammaticalized as a functional item F that occupies a position above negation, it must be that at some prior stage in the history of the system there was movement from L to F. This is the case in English, for example, where the positions occupied by modals and do are a relic of the earlier V-movement system (see section 5.2 and Roberts 1993a, chap. 3). If we assume that creoles retain unmarked properties, then creoles lack V-movement; hence the conditions for creating auxiliaries like English modals have never obtained. We see, then, that the TMA system is a complex of functional heads with weak features. Above negation, there is at least one further functional head (depending on one’s assumptions about the position of Neg and the fine structure of C), which has weak V-features. All the features that are capable of variation thus have weak values in Haitian Creole, and as far

as I am aware, Haitian Creole is quite typical of creoles generally in these respects. This is consistent with our general view of creoles and our theory of markedness.

I must add a proviso to this discussion of TMA systems. Beginning with Bickerton (1975), there has been a great deal of discussion of highly specific regularities in these systems. Two principal claims have been made: (a) that the order is always precisely T–M–A; and (b) that there is a “Jakobsonian” (Bickerton’s term) system of markedness in operation, in that if T is phonologically realized, it is [+anterior], if M is phonologically realized, it is [+irrealis], and if A is phonologically realized, it is nonpunctual. The empirical basis of these claims has been disputed (see, for example, Muysken 1981 and the response in Bickerton 1981), and I do not wish to enter this controversy here. As it stands, my proposal for markedness makes no prediction about the finer structure of the TMA system. Further research on clause structure may yield some such prediction. The Jakobsonian markedness may be a reflection of markedness properties of the lexicon; alternatively, it may show “absorption” of the features of functional heads in the sense of Roberts and Shlonsky (1996).

One potential problem arises from our discussion of TMA markers. It could be taken to imply that creoles have reached a steady state, at least as far as V-features are concerned. This is so because the main mechanism by which strong features are created diachronically is grammaticization, and we have seen that this process depends on the prior existence of L-to-F movement—that is, in the verbal domain, the existence of strong features. It is possible that this is a correct result: creoles seem to be subject to a great deal of change, and yet the presence of TMA markers characterizes those of differing ages and in widely scattered parts of the world.
A maximally unmarked V-system may be particularly resistant to change. However, I do not want to say that change is impossible. Phonological factors (essentially prosodic reductions), a change in the negation system obscuring the boundary between the two functional layers, or the development of referential null subjects by the grammaticization of subject pronouns as AgrS (see note 19) could lead to the development of strong V-features in the higher functional domain. Hence we can conclude that creole TMA systems are robust, owing to their unmarked nature, but they are not a steady state. I conclude that it is possible to analyze the shared syntactic properties of creoles that are reasonably well understood in terms of the idea that they are reflexes of weak feature values of functional heads (i.e., as unmarked parameter-settings).22 So, I maintain the view put forward by Bickerton (1984) that creoles tend to have unmarked values for parameters. It is reasonable to speculate that this is due to the peculiar circumstances of creole genesis: if the trigger experience is made up largely of pidgin, then morphological triggers will be wholly lacking and syntactic triggers defective. The learner will then employ the elegance property in such a way that the simplest possible representations—that is, those with the minimum amount

of movement—will be selected. The result is that overt movement will be lacking, and creoles will have weak values for all or most parameters. In this precise and rather limited sense, Bickerton’s claim that creoles can tell us something special was right. However, this does not mean, pace Bickerton, that creoles give us a unique insight into the workings of the language faculty.23 Creoles, even if they truly have all parameters set to the weak value, are still just instantiations of what UG in combination with the learning device creates as possible variation. Moreover, languages other than creoles may have similar constellations of weak features: English for example has no main-verb movement to the higher functional domain, surface SVO order, no referential null subjects, no complement clitics, and a system of preverbal TMA markers somewhat similar to those found in creoles. Moreover, all of these properties, except for the null-subject one, have been innovated in the recorded history of the language (and the null-subject parameter must have changed since Indo-European, if not since Proto-Germanic). What gives us a privileged view of UG, and of the nature of the parameter-setting algorithm, is not creoles but language change. Creoles are particularly interesting because they represent an extreme of language change, but it is the mechanisms of language change, which are ubiquitous in the history of every language and every language family, that have made creoles what they are.

5.5 Conclusion

In this chapter, I have presented a number of the ramifications of a well-known parameter: the one that determines whether a finite V moves to I. After seeing the initial motivation for positing this parameter—essentially the English/French contrasts originally discussed by Emonds (1978)—we saw the evidence that this parameter changed its value in sixteenth-century English. I gave an account of this change in terms of a theory of markedness based ultimately on the elegance condition of the learner proposed by Clark and Roberts (1993 [this volume, Chapter 2], in progress) and the notions of morphological and syntactic expression of a parameter. We next considered the root-infinitive phenomenon of child language, discussed by Rizzi (1994) and Wexler (1994). Here, I argued that these facts could be understood in terms of an early attempt to set the V-to-I parameter to the weak value in languages where the morphological trigger for the parameter is not salient. Finally, we considered the nature of creoles and creolization, and I suggested that Bickerton’s (1984) conjecture that creoles tend to instantiate unmarked values of parameters is correct, once viewed from the perspective of the theory of markedness being proposed here. In the case of V-to-I movement, this implies that creoles typically lack V-movement, which appears to be correct. I also suggested that the possibility of referential null subjects is intimately connected to the V-to-I movement parameter.

If the ideas in this paper are anywhere near correct, then the next step for research is to extend this kind of reasoning to other parameters. In that way, a truly general theory of crosslinguistic variation will emerge.

Notes

Parts of this material have been presented in lectures at the Universities of Paris, Florence, Venice, Oxford, and Wales. I’d like to thank the audiences at those presentations for their comments. I’d also like to thank Robin Clark, from whom the ideas about learning theory are derived, and the students in my Seminar on Language Change at the University of Maryland—especially Jairo Nuñes, who suggested that I look at creoles. Thanks also to David Lightfoot and Juan Uriagereka for interesting discussions and much more. Finally, my grateful thanks to Derek Bickerton and Michel DeGraff for very helpful comments on an earlier draft. All the errors are mine.
1. For the purposes of exposition, I adopt the “split-I” version of clause structure, originally proposed by Pollock (1989), only where necessary. Elsewhere, I, IP, and so forth are used as cover terms for the functional system associated with the verb, although I recognize that this hides a more complex reality.
2. The straightforward implication that a language allows inversion only if it has the French-style orders in (1a), (2a), and (3a) does not hold. The Mainland Scandinavian languages—Swedish, Danish and Norwegian—are verb-second in root clauses and pattern like English with respect to the Pollock/Emonds tests in embedded clauses (Platzack 1987; Platzack and Holmberg 1989; Vikner 1995). If at least some V2 clauses involve V-to-I movement, as proposed by Travis (1984) and Zwart (1993), then the generalization can be maintained. The absence of French-style patterns in the equivalents of (1), (2), and (3) becomes a question of the trigger of V-to-I movement. See also note 5.
3. In fact, Pollock notes that main-verb infinitives can undergo “short” movement over VP adverbs but not over negation:
(i) A peine comprendre/comprendre à peine l’italien après cinq ans d’étude est une honte.
hardly to-understand/to-understand hardly Italian after five years of-study is a disgrace
‘Hardly to understand/to understand hardly Italian after five years of study is a disgrace.’

This led to the postulation of two landing sites for V-movement and hence the Split-I Hypothesis.
4. Chomsky accounts for have/be raising with the suggestion that these elements cannot obey the Procrastinate principle because they lack an LF interpretation and so must be eliminated prior to that level. I will not adopt this account but return to this matter briefly below.
5. Because Swedish has always been a V2 language, I moves to C in most main clauses; this is clearly a separate parameter from the V-to-I one, although they interact in that verb second triggers V-to-I independently of any property of C, suggesting that verb second is at least in part a property of I, although it must be a distinct one from that governing V-to-I movement of the kind we are concerned with here. See Zwart 1993 for an analysis of verb second that treats it as partially an I-property. These considerations do not alter the fact that inversion (of the “residual V2” kind found in English and French) interacts with V-to-I movement. This can be seen in the history of English, where the loss of

orders like (6a,c,d) patterns with the loss of inversion in interrogatives of the kind in (6b). French, of course, retains these patterns. English used to be a V2 language, but this property was lost around 1400, well before the change being discussed in the text (see Van Kemenade 1987).
6. On the other hand, Rohrbacher predicts that Dutch and Afrikaans lack verb movement whereas German has it, and Roberts predicts that all the West Germanic languages have it. If these languages have a final I, as both Roberts and Rohrbacher assume, it is not clear if any empirical differences follow from this. If, on the other hand, they have medial I, as argued by Zwart (1994), then differences are predicted. Both Roberts’s and Rohrbacher’s proposals are incompatible with Zwart’s analysis of Dutch clause structure and its natural extension to German. However, the adoption of a richer clause structure for West Germanic opens up the possibility again of detecting differences. One relevant consideration is that German allows more leftward DP-movement than Dutch; in Chomsky’s (1993) terms, this can be connected to German having more V-movement than Dutch. There are many open questions here, however, and to go into them fully would take us too far afield.
7. Strong features are marked wherever there is a choice. If there exists a functional head that always and only has strong features (for example, Watanabe (1993) has argued that wh-operators always have strong features), then these features would not count as marked. The crucial notion here is relative complexity.
8. A possible alternative, which retains the Clausal Truncation Hypothesis without postulating maturation of (18), is to say that CP is lacking at this stage of acquisition. Once CP is acquired, (18) comes with it, and root infinitives are banished. In order to avoid invoking maturation all over again by saying that CP matures, such an account would have to say that CP is not triggered early on.
However, as (19) shows, wh-questions are found at the root-infinitive stage, and so this stage and CP coexist, which makes it difficult to see how root infinitives could be due to the absence of CP from this stage of acquisition. One could perhaps claim that there are two grammars at this stage: one with CPs and no root infinitives, and one with root infinitives and no CPs. But how, in that case, is CP not triggered in the root-infinitive system? It is hard to see an answer to this that would not appeal to maturation. (Thanks to Derek Bickerton for raising this possibility.) 9. It seems clear that UG allows for negative items to be realized either as XPs or as heads, and that double-barrelled negation featuring both a head and an XP is allowed (the best example of this last situation is of course French, although Middle English, Old High German, Welsh, Breton, and numerous Northern Italian dialects also have a ne. . .pas type of negation). The XP is a specifier of NegP and the head a Neg0. Acquirers can distinguish them on the basis of their interactions with movement, particularly head movement. See Pollock 1989 and Belletti 1990. 10. It obviously is syntactic in part, but there may be a morphological trigger in some varieties in the form of complementizer agreement (see Zwart 1993). For further remarks on the trigger for V2, see Clark and Roberts 1993 (this volume, Chapter 2). See also note 5. 11. By contrast, the monogenesis theory of creole origins holds that all creoles are essentially derived from a Portuguese-based creole that was widely spoken in the fifteenth century (see Thompson 1961; Todd 1974); however, even if the monogenesis theory is correct, creoles have had 600 years of quite independent development in geographically dispersed locations and with, in every non-Portuguese-based creole, massive relexification. 
The Romance languages in 1100, separated from a common parent by 600 years of development— but with more contact and less relexification than creoles—show a number of

striking typological differences: some are V2, some are not; some have residual morphological Case, some do not; some have scrambling, some do not; and so on. Monogenesis alone is, in my view, very unlikely to explain why creoles should occupy such a small space of the available variation; in this sense, the monogenesis theory is irrelevant to the explanation of the morphosyntactic properties of creoles.
12. Mühlhäusler (1986) gives one or two counterexamples to the claim that all creoles are SVO. However, these may be pidgins rather than creoles. Sri Lankan Portuguese creole is SOV (Smith 1977, cited in Romaine 1988, 40). However, as Romaine says “Portuguese influence was removed in 1658, rather early in the development of the creole. This meant that the substratum languages provided the input during the creolization phase” (ibid). The substratum languages are Sinhala and Tamil, both rigidly OV. These languages may thus have provided a syntactic trigger for OV order.
13. I am assuming that AgrS triggers raising to its specifier, and not, pace Chomsky (1993), that T does this; the relevant generalization could be reformulated in terms of the exact assumptions that Chomsky makes, but I retain this formulation for clarity of exposition.
14. The statement in (33) does not explain the apparent absence of object expletives generally, but it does explain the prevalence of subject expletives. The fact that object expletives are not found, if it is a fact, suggests that there is more to say on the general question of Agr’s N-features.
15. This proposal implies that in VSO languages the subject raises to [Spec, AgrSP] and the verb higher. I believe that such an analysis is correct, but to go into this matter would take us too far afield here.
16. Derek Bickerton suggests an account of why creoles may have subjects in [Spec, AgrSP] that does not rely on (33). Pidgins do not have consistent word-order patterns because they do not have syntax.
However, they show a statistical tendency towards Topic-Comment order. Because Topics are typically agents, and agents are always subjects, creolizers may reanalyze Topics as subjects. If the Topics precede everything else, then the subject would be analyzed as occupying [Spec, AgrSP]. In other words, the tendency towards Topic-Comment order in the pidgin would act as a syntactic trigger for the marked N-feature of AgrS. This account is quite plausible (although one would need to demonstrate that pidgins always have a tendency towards Topic-Comment order). However, the facts about noncreoles mentioned in the text suggest that (33) is needed anyhow, and so I tentatively retain that account.
17. Several remarks are in order here. First, Michel DeGraff (personal communication) tells me that Mauritian Creole allows null subjects:
(i) e pu repar sa sime la dimen.
(we) mod repair det road det tomorrow
‘We will repair this road tomorrow.’

See Syea 1985 and Adone 1994. Adone suggests that such null arguments are variables bound by empty topics rather than occurrences of pro. If this is correct, then the claim in the text can be maintained. Second, Derek Bickerton raises the problem of tensed serial-verb constructions (SVCs), giving the following paradigm from Seychelles Creole, a French-based Indian Ocean creole:
(ii) a. Zot pran balye koko bat Kazer.
they take broom coconut hit Kaiser
b. Zot ti pran balye koko ti bat Kazer.
they ant take broom coconut ant hit Kaiser
c. Zot ti pran balye koko zot ti bat Kazer.
they ant take broom coconut they ant hit Kaiser
‘They hit the Kaiser with a coconut broom.’

Example (iib) looks like a clausal structure, because it contains a TMA marker, and can contain an overt subject (as in (iic)). This is not a coordinate structure because extraction is possible. The only possibility is to assimilate (iib) to (iia) and treat it as an SVC despite the presence of the TMA element (below I will in fact suggest that creole TMA markers in general are “low” functional elements that have more in common with verbs and in fact are directly comparable to English have and be). I have nothing to offer on the subject of serial verbs here, however. Bickerton also points to Hawaiian Creole English examples like the following: (iii) a. She bring food for I eat. b. She bring food for eat. c. *She bring food for me (to) eat.

The ungrammaticality of (iiic) shows that Hawaiian Creole English differs from English in that for is unable to assign Case to a following subject. And (iiia) shows that Hawaiian Creole English differs from English in that for can have a tensed complement. However, we can take (iiib) to be an instance of an infinitive complement to for containing a PRO subject that is controlled by the matrix subject. Examples (iiia) versus (iiib) thus parallel English (iva) and (ivb), respectively:

(iv) a. She brings food in order that I eat.
b. She brings food in order to eat.
If She_i bring food for she_i eat is bad in Hawaiian Creole English, then this parallels the mysterious impossibility of (va) versus (vb) in French:
(v) a. *Je veux que je parte.
I want that I go (subjunctive)
b. Je veux partir.
I want to-go
‘I want to go.’
None of the above instances provides clear evidence of null subjects in creoles.
18. I have nothing to say about the difficult question of null arguments in languages that entirely lack overt agreement (e.g., Chinese, Thai, etc.). Presumably, (35b) should be extended to include some general notion of “recovery from the syntactic context” (suggested to me by Bickerton, personal communication). See Huang 1989 for a proposal.
19. Poletto (1993) discusses a number of Central Veneto dialects (Venetian, Paduan, and Trevisan) that show the French pattern of non-co-occurrence with an overt nominal subject. These dialects show other null-subject properties, notably free inversion. Moreover, the subject clitics follow preverbal negation and are obligatory in all conjuncts of coordinated predicates (see Rizzi 1986b). Both of these properties contrast with standard French and lead Poletto to propose that the subject clitics are not in subject position but are Agr elements that license pro. However, these clitics are argumental, hence no subject argument can cooccur with them without violating the Theta Criterion. It is possible that the Haitian Creole subject pronouns that we are discussing here are like this. Note that this conclusion would not undermine the point being made in the text because only expletive null subjects are allowed in this kind of system, and my claim is that referential null subjects are connected to a marked parameter

setting. We might thus expect to find null-subject systems that are comparable to Central Veneto in creoles. However, note that we do not have the principal independent indicator of null-subject status in Haitian Creole—namely, free inversion. Haitian Creole pa is preverbal, but it arguably occupies a position comparable to French pas; see DeGraff 1993b for arguments that Haitian pa, unlike French pas, heads the negation phrase, NegP. In any case, one reason why pa precedes the verb in Haitian Creole is because the verb does not move, as we have seen; on the other hand, in French and in most Northern Italian dialects, the verb does move. In the absence of ne-type preverbal negation in Haitian Creole, the only indicator of the position of subject pronouns is whether they are obligatory in all conjuncts of coordinated predicates. Michel DeGraff (personal communication) informs me that this is tendentially the case, although not an absolute requirement:
(i) Li vini, *(li) manje epi *(li) ale.
3sg came 3sg eat and 3sg leave
‘S/He came, (s/he) ate and (s/he) left.’

DeGraff adds “However, this ‘constraint’ is often not obeyed in the speech and writings of Haitian Creole/French bilingual speakers—turning the stars in the examples above into question marks.” It is thus not clear whether this diagnostic is giving a clear indication of the status of the subject pronouns here. 20. Michel DeGraff (personal communication) informs me that Saramaccan is a counter-example to this generalization. In this creole, a tense marker can appear with both the first and the second verb. See Byrne 1992. Derek Bickerton (personal communication) points out that the same is true in Seychelles Creole, as illustrated by example (iib) of note 18. 21. The characterization of functional heads is a very complex matter that is central for current theory (and for the view of parameters adopted here). The one suggested in the text has the merit of simplicity, at least. “Underspecification” relates to the interfaces: functional heads are typically athematic and aprosodic (weakly stressed and often showing less-than-minimal prosodic structure (see McCarthy and Prince 1986)). Notice that, if parametric variation is the result of underspecification (Uriagereka 1988), then it follows that functional heads are the locus of parametric variation. Clark and Roberts (in progress) extend this idea by saying that parametric variation is imposed by the learner on an underspecified aspect of UG. Then the entire theory of parameters can be deduced from the statement that categories can be underspecified. 22. The above is a somewhat sketchy and second-hand overview of a number of salient syntactic properties of creoles. Bickerton (1981, chap. 2) lists a number of further properties as characteristic of creoles. Here I briefly list, exemplify, and comment on those which are most relevant to the present concerns: (i) No movement in yes/no questions Yo pa-t-a-vlé mênê-m lakay-li. 
(Haitian Creole) they not-tns-mon-want take-me house-his ‘They wouldn’t have wanted to take me to his house.’/’Wouldn’t they have wanted to take me to his house?’ (Bickerton 1981, 70)

This property fits straightforwardly with our claims (it is also discussed by Mühlhäusler (1986, 156)); inversion is I-to-C movement, triggered by a feature of C (see Rizzi 1991 and section 10.2), and as such is a marked property. Hence we do not expect to find it in creoles.


(ii) Same “verb” for existential and possessive constructions (as distinct from locatives) Gê you fâm ki gê you pitit-fi. (Haitian Creole) have one woman who have one child-daughter ‘There is a woman who has a daughter.’ (Bickerton 1981, 66)

This property is very interesting in the light of Kayne’s (1993) account of auxiliaries. Suppose that the existential auxiliary is Kayne’s “archi-auxiliary” BE; to form a possessive auxiliary, a functional head in the complement of BE must raise to BE. Hence there must be movement, and so the presence of distinct existential and possessive auxiliaries is a marked property. So we do not expect creoles to have them. (We cannot explain the systematic distinction with the locative auxiliary, however).

(iii) Wh-fronting
Wisaid yu bin de? (Guyanese)
which-side you tns be-loc
‘Where have you been?’ (Bickerton 1981, 70)

Watanabe (1993) has argued that syntactic wh-movement is universal (i.e., that the relevant “N-feature” of C is universally strong); in that case, the presence of wh-movement in creoles is not a reflection of a variant and so cannot be a marked property. Here, the reasoning essentially parallels what I said above in connection with subject-raising out of VP.

(iv) Negative concord on subjects (see also Déprez, 1999) Non dog na bait nan kyat. (Guyanese) ‘No dog bit any cat.’ (Bickerton 1981, 66) (v) Lack of subject relative pronouns Wan a dem a di man bin get di barn. (Guyanese) ‘One of them was the man who had the bomb.’ (Bickerton 1981, 62)

We cannot account for the phenomena in (iv) and (v), although it seems clear that they are independent of verb movement. Many of the other properties that Bickerton originally claimed to be common to creoles have either been claimed not to be, or claimed to be due to substrate influence, and hence I will not consider them here.

23. Bickerton (personal communication) has responded to this point by underlining that creoles always grammaticize certain functional elements (TMA markers and oblique Case markers) when these have been lost from the original language because of pidginization, and only much more sporadically grammaticize other parts of the functional system (e.g., relative pronouns). Thus, he claims, creoles “spell out the minimal amount of morphology required for the principles of UG to work.” This point seems to be connected to the issue of “Jakobsonian markedness” raised above: I have nothing to say at present as to why certain properties are overtly realized by grammaticalized items and others not. I agree (a) that these elements are morphology (LF affixes, see above) and (b) that they are “minimal,” precisely in the sense that they are LF affixes and not syntactic or lexical affixes, and so they do not induce the complexity of overt movement. However, it is not clear to me that other processes of language change do not reveal the same propensity for grammaticization of certain features of functional heads. Neither is it clear that, in any sense beyond that of instantiating weak values of parameters, these properties indicate that this is the minimum needed for UG.

References

Adone, Dany. 1994. The acquisition of Mauritian Creole. Amsterdam: John Benjamins.
Baker, Mark. 1988. Incorporation: A theory of grammatical-function changing. Chicago: Chicago University Press.
Baker, P., and C. Corne. 1982. Isle de France Creole. Ann Arbor, Mich.: Karoma.
Belletti, Adriana. 1990. Generalized verb movement. Turin: Rosenberg & Sellier.
Bennis, Hans. 1986. Gaps and dummies. Dordrecht: Foris.
Besten, Hans den. 1983. On the interaction of root transformations and lexical deletive rules. In W. Abraham (ed) On the formal syntax of the Westgermania. Amsterdam: John Benjamins, 47–131.
Bickerton, Derek. 1975. Dynamics of a creole system. Cambridge: Cambridge University Press.
Bickerton, Derek. 1981. Roots of language. Ann Arbor, Mich.: Karoma.
Bickerton, Derek. 1984. The Language Bioprogram Hypothesis. Behavioral and Brain Sciences 7.2: 173–222.
Bickerton, Derek. 1988. Creole language and the bioprogram. In F. Newmeyer (ed) Linguistics: The Cambridge survey; Volume II: Linguistic theory: Extensions and implications. Cambridge: Cambridge University Press, 268–285.
Bickerton, Derek. 1991. Haunted by the specter of creole genesis. Behavioral and Brain Sciences 14.2: 364–366.
Bickerton, D. 1999. How to Acquire Language without Positive Evidence: What Acquisitionists Can Learn from Creoles. In M. DeGraff (ed) Language Creation and Language Change: Creolization, Diachrony and Development. Cambridge, MA: MIT Press, pp. 49–4.
Brandi, Luciana, and Patrizia Cordin. 1989. Two Italian dialects and the null-subject parameter. In O. Jaeggli and K. Safir (eds) The null subject parameter. Dordrecht: Kluwer, 111–142.
Brown, Roger. 1973. A first language. Cambridge, Mass.: Harvard University Press.
Byrne, Francis. 1987. Grammatical relations in a radical creole. Amsterdam: John Benjamins.
Byrne, Francis. 1992. Scope and tense-spreading in Saramaccan. Journal of Pidgin and Creole Languages 7, 195–221.
Cardinaletti, Anna, and Michal Starke. 1999. The typology of structural deficiency. In H. van Riemsdijk (ed) Clitics in the Languages of Europe. Berlin: Mouton de Gruyter, pp. 145–234.
Chomsky, Noam. 1982. Some concepts and consequences of the theory of government and binding. Cambridge, Mass.: MIT Press.
Chomsky, Noam. 1986. Barriers. Cambridge, Mass.: MIT Press.
Chomsky, Noam. 1993. A minimalist program for linguistic theory. In K. Hale and S.J. Keyser (eds) The view from Building 20. Cambridge, Mass.: MIT Press, 1–52.
Chomsky, Noam. 1994. Bare phrase structure. MIT Occasional Papers in Linguistics 5, Cambridge, Mass.: MIT, Department of Linguistics and Philosophy, MITWPL.
Cinque, Guglielmo. 1994. Class lectures, Girona International Summer School in Linguistics.
Clark, Robin, and Ian Roberts. 1993. A computational approach to language learnability and language change. Linguistic Inquiry 24, 299–345 [this volume, Chapter 2].
Clark, Robin, and Ian Roberts. In progress. On complexity as the engine of variation. Ms., Universities of Pennsylvania and Wales.
Culicover, Peter. 1971. Syntax. New York: Academic Press.
DeGraff, Michel. 1993a. Is Haitian Creole a pro-drop language? In F. Byrne and J. Holm (eds) Atlantic meets Pacific. Amsterdam: John Benjamins, 71–90.
DeGraff, Michel. 1993b. A riddle on negation in Haitian. Probus 5.1/2, 63–94.
DeGraff, Michel. 1994. To move or not to move? Placement of verbs and object pronouns in Haitian Creole and in French. In K. Beals et al. (eds) Papers from the 30th Meeting of the Chicago Linguistic Society. Chicago: University of Chicago, Chicago Linguistic Society.
DeGraff, M., and Y. Dejean. 1994. On Haitian Creole’s ‘very strict’ Adjacency Principle. Paper given at the meeting of the Society for Pidgin and Creole Linguistics, Boston.
Denison, David. 1985. The origins of periphrastic do: Ellegård and Visser reconsidered. In R. Eaton et al. (eds) Papers from the 4th International Conference on Historical Linguistics. Amsterdam, April 10–13, 1985. Amsterdam: John Benjamins, 45–60.
Déprez, V. 1999. The Roots of Negative Concord in French and French-Lexicon Creoles. In M. DeGraff (ed) Language Creation and Language Change: Creolization, Diachrony and Development. Cambridge, MA: MIT Press, pp. 375–428.
Ellegård, Alvar. 1953. The auxiliary do: The establishment and regulation of its use in English. Stockholm: Almqvist & Wiksell, Gothenburg Studies in English.
Emonds, Joseph. 1976. A transformational approach to English syntax: Root, structure-preserving and local transformations. New York: Academic Press.
Emonds, Joseph. 1978. The complex V-V´ in French. Linguistic Inquiry 9: 151–175.
Emonds, Joseph. 1980. Word order in generative grammar. Journal of Linguistic Research 1: 33–54.
Enç, Mürvet. 1988. Anchoring conditions for tense. Linguistic Inquiry 18: 633–656.
Falk, H., and A. Torp. 1900. Dansk-Norskens syntaks. Kristiania: H. Aschehoug.
Görlach, Manfred. 1991. Introduction to Early Modern English. Cambridge: Cambridge University Press.
Gray, Daniel. 1985. The Oxford book of late medieval prose and verse. Oxford: Oxford University Press.
Green, John. 1988. Romance creoles. In M. Harris and N. Vincent (eds) The Romance languages. London: Routledge, 420–474.
Greenberg, Joseph. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg (ed) Universals of language. Cambridge, Mass.: MIT Press.
Guasti, Maria-Teresa. 1992. Verb syntax in Italian child grammar. Geneva generative papers, 145–162.
Hoekstra, Teun, and Pieter Jordens. 1991. From adjunct to head. Ms., University of Leiden.
Hoekstra, Teun, and René Mulder. 1990. Unergatives as copular verbs: Locational and existential predication. The Linguistic Review 7: 1–79.
Huang, C.-T. J. 1989. Pro-drop in Chinese: A generalized control theory. In O. Jaeggli and K. Safir (eds) The null subject parameter. Dordrecht: Kluwer, 185–214.
Jespersen, Otto. 1909–49. A Modern English grammar on historical principles. London: George Allen and Unwin.
Karker, A. 1974. Sproghistorisk oversigt. In E. Oxenvad (ed) Nudansk ordbog. Copenhagen: Politikens Forlag.
Kayne, Richard. 1983. Chains, categories external to S, and French complex inversion. Natural Language & Linguistic Theory 1: 109–137.
Kayne, Richard. 1993. Towards a modular account of auxiliary selection. Studia Linguistica 47: 3–31.
Kayne, Richard. 1994. The antisymmetry of syntax. Cambridge, Mass.: MIT Press.
Kemenade, Ans van. 1987. Syntactic Case and morphological Case in the history of English. Dordrecht: Foris.
Kouwenberg, S. 1990. Complementizer pa, the finiteness of its complements and remarks on empty categories in Papiamentu. Journal of Pidgin and Creole Languages 5: 39–51.
Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Journal of Language Variation and Change 1: 199–244.
Laka, Itziar. 1990. Negation in syntax: On the nature of functional categories. Doctoral dissertation, MIT.
Lightfoot, David. 1979. Principles of diachronic syntax. Cambridge: Cambridge University Press.
Lightfoot, David. 1991. How to set parameters: Degree-0 learnability. Cambridge, Mass.: MIT Press.
Lumsden, J. 1999. Language Acquisition and Creolization. In M. DeGraff (ed) Language Creation and Language Change: Creolization, Diachrony and Development. Cambridge, MA: MIT Press, pp. 129–148.
McCarthy, John, and Alan Prince. 1986. Prosodic morphology. Ms., Brandeis University.
McCloskey, Jim. 1991. Clause structure, ellipsis and proper government in Irish. Lingua 85: 259–302.
Mufwene, S. 1999. On the Language Bioprogram Hypothesis: Hints from Tazie. In M. DeGraff (ed) Language Creation and Language Change: Creolization, Diachrony and Development. Cambridge, MA: MIT Press, pp. 95–128.
Mühlhäusler, Peter. 1986. Pidgin and creole linguistics. Oxford: Blackwell.
Muysken, Pieter. 1981. Creole tense/mood/aspect systems: The unmarked case. In P. Muysken (ed) Generative studies on creole languages. Dordrecht: Foris, 181–199.
Muysken, Pieter. 1988. Are creoles a special type of language? In F. Newmeyer (ed) Linguistics: The Cambridge survey, Vol. 2. Cambridge: Cambridge University Press, 285–302.
Ouhalla, Jamal. 1991. Functional categories and parametric variation. London: Routledge.
Pierce, Amy. 1989. On the emergence of syntax: A cross-linguistic study. Doctoral dissertation, MIT.
Pierce, Amy. 1992. Language acquisition and syntactic theory: A comparative analysis of French and English child grammars. Dordrecht: Kluwer.
Platzack, Christer. 1987. The Scandinavian languages and the null-subject parameter. Natural Language & Linguistic Theory 5: 377–401.
Platzack, Christer, and Anders Holmberg. 1989. The role of AGR and finiteness. Working Papers in Scandinavian Syntax 43: 51–76.
Plunkett, K., and S. Stromqvist. 1990. The acquisition of Scandinavian language. In J. Allwood (ed) Gothenburg papers in theoretical linguistics.
Poletto, Cecilia. 1993. La sintassi del soggetto nei dialetti italiani settentrionali. Doctoral dissertation, Universities of Venice and Padua.
Pollock, Jean-Yves. 1989. Verb movement, Universal Grammar, and the structure of IP. Linguistic Inquiry 20: 365–424.
Rizzi, Luigi. 1982. Issues in Italian syntax. Dordrecht: Foris.
Rizzi, Luigi. 1986a. Null objects in Italian and the theory of pro. Linguistic Inquiry 17: 501–557.
Rizzi, Luigi. 1986b. On the status of subject clitics in Romance. In O. Jaeggli and C. Silva Corvalán (eds) Studies in Romance syntax. Dordrecht: Foris, 391–419.
Rizzi, Luigi. 1987. Three issues in Romance dialectology. Paper given at the GLOW Workshop on Dialectology. 10th GLOW Colloquium, University of Venice.
Rizzi, Luigi. 1991. Residual verb second and the wh-criterion. Technical reports in formal and computational linguistics, 2. University of Geneva.
Rizzi, Luigi. 1993. Some notes on Romance cliticization. Ms., University of Geneva.
Rizzi, Luigi. 1994. Some notes on linguistic theory and language development: The case of root infinitives. Ms., University of Geneva.
Rizzi, Luigi, and Ian Roberts. 1989. Complex inversion in French. Probus 1: 1–39 [this volume, Chapter 9].
Roberts, Ian. 1985. Agreement parameters and the development of English modal auxiliaries. Natural Language & Linguistic Theory 3: 21–58 [this volume, Chapter 1].
Roberts, Ian. 1993a. Verbs and diachronic syntax. Dordrecht: Kluwer.
Roberts, Ian. 1993b. A formal account of grammaticalization in the history of the Romance futures. Folia Linguistica Historica 13: 219–258.
Roberts, Ian, and Ur Shlonsky. 1994. Pronominal enclisis and VSO languages. In R. Borsley and I. Roberts (eds) Comparative Celtic syntax. Cambridge: Cambridge University Press, pp. 171–199.
Rohrbacher, Bernhard. 1994. The Germanic VO languages and the full paradigm: A theory of V to I raising. Doctoral dissertation, University of Massachusetts, Amherst.
Romaine, S. 1988. Pidgin and creole languages. London: Longman.
Ross, John. 1969. Auxiliaries as main verbs. In W. Todd (ed) Studies in philosophical linguistics, Series 1. Evanston, Ill.: Great Expectations Press.
Rottet, K. 1993. Functional categories and verb raising in Louisiana Creole. Paper given at the Society of Pidgin and Creole Linguistics, Los Angeles.
Smith, I. R. 1977. Sri Lanka Creole Portuguese phonology. International Journal of Dravidian Linguistics 7: 247–406.
Sportiche, Dominique. 1988. Conditions on silent categories. Ms., UCLA.
Sportiche, Dominique. 1996. Clitic constructions. In J. Rooryck & L. Zaring (eds) Phrase Structure and the Lexicon. Dordrecht: Kluwer, pp. 213–276.
Sproat, Richard. 1985. Welsh syntax and VSO structure. Natural Language & Linguistic Theory 3: 173–216.
Stowell, Tim. 1982. The tense of infinitives. Linguistic Inquiry 13: 561–570.
Syea, Anand. 1985. Aspects of empty categories in Mauritian Creole. Doctoral dissertation, University of Essex.
Thomason, S. G. 1981. Chinook Jargon in areal and historical context. University of Montana Occasional papers in linguistics, 2, 295–396.
Thompson, R. W. 1961. A note on some possible affinities between the creole dialects of the Old World and those of the New. In R. B. Le Page (ed) Proceedings of the Conference on Creole Language Studies. London: Macmillan, 107–113.
Todd, L. 1974. Pidgins and creoles. London: Routledge and Kegan Paul.
Travis, Lisa. 1984. Parameters and effects of word order variation. Doctoral dissertation, MIT.
Trosterud, Trond. 1989. The null subject parameter and the new Mainland Scandinavian word order: A possible counterexample from a Norwegian dialect. Papers from the 11th Scandinavian Conference of Linguistics, 87–100.
Uriagereka, Juan. 1988. On government. Doctoral dissertation, University of Connecticut.
Vikner, Sten. 1995. Verb movement and the licensing of NP positions in the Germanic languages. Oxford: Oxford University Press.
Visser, T. 1963–73. A historical syntax of the English language, Volumes I–IV. Leiden: E. J. Brill.
Watanabe, Akira. 1993. WH-in-situ, subjacency and chain formation. MIT Occasional papers in linguistics, 2. Cambridge, Mass.: MIT, Department of Linguistics and Philosophy, MITWPL.
Weverink, M. 1990. The subject in relation to inflection in child language. MA thesis, University of Utrecht.
Wexler, Ken. 1994. Finiteness and head movement in early child grammars. In D. Lightfoot and N. Hornstein (eds) Verb movement. Cambridge: Cambridge University Press, 305–350.
Williams, Edwin. 1994. A reinterpretation of evidence for verb movement in French. In D. Lightfoot and N. Hornstein (eds) Verb movement. Cambridge: Cambridge University Press, 189–206.
Zanuttini, Raffaella. 1991. Syntactic properties of sentential negation: A comparative study of Romance languages. Doctoral dissertation, University of Pennsylvania.
Zwart, Jan-Wouter. 1993. Dutch is head initial. Ms., University of Groningen.
Zwart, Jan-Wouter. 1994. Dutch syntax. Doctoral dissertation, University of Groningen.

6

Theoretical Consequences Anna Roussou and Ian Roberts

6.0 Introduction

In the Introduction and Chapter 1 (of Roberts & Roussou 2003; R&R henceforth) we provided the theoretical framework that underlies our approach to grammaticalisation, focusing on issues pertaining to language change and its relation to acquisition and the nature of parameters. Within this setting we proposed that grammaticalisation can be seen as the result of upward reanalysis which affects a subclass of lexical items. As such, its effects in the grammar can be explained and indeed predicted, without at the same time postulating a distinct process or mechanism of change. The empirical evidence for our approach was given in Chapters 2, 3, and 4 of R&R, where we focused on the grammaticalisation of T, C, and D elements respectively. Some of the cases we have considered have been treated as typical examples of grammaticalisation to the extent that they involve lexical to functional reanalysis (the cases in Chapter 2 of R&R, for example), while others have not been considered as such, partly because they involve functional to functional reanalysis (e.g. most of the cases in Chapters 3 and 4 of R&R).

In this chapter, we return to the theoretical issues raised in the Introduction and in Chapter 1 of R&R. Our goal here is to elucidate these as far as possible, in the light of the analyses of the various cases of grammaticalisation presented in Chapters 2–4 of R&R. We identified three main questions as themes in the Introduction of R&R: (i) the ubiquity of grammaticalisation: why is this kind of change so common? (ii) the apparent conflict between a descriptively adequate analysis of grammaticalisation, which amounts to identifying pathways of grammatical change, and an explanatorily adequate account of syntactic change as parametric change, which predicts random oscillation among possible UG instantiations; (iii) the inventory and nature of functional categories. In this chapter we will consider each of these issues in turn.
In brief, we will argue (i) that grammaticalisation is so common because it represents a natural form of endogenous change; (ii) that the conflict between description and explanation in diachrony can be resolved by introducing a notion of markedness into the parametric system; and (iii) for an account of the nature of functional categories which takes them to be inherently deficient in their interface properties. The crucial property of functional categories is that they are only fully defined in relation to the syntactic system; in all other respects they are defective. Before going into these questions, however, we first review and systematise the cases we have looked at. This will give us a useful synthesis of the results of the earlier chapters of R&R.

6.1 A General Characterisation of Grammaticalisation

In this section we list all the empirical cases discussed in R&R, by providing a schematic representation of the relevant structural changes. We then identify the parametric change and, where possible, the cause of the change as well. The cases considered are thus summarised as follows (here F* denotes the phonological realisation of F, a functional head, and the subscript “Merge” and/or “Move” indicates the mechanism of realisation):

(1) English modals (2.1 of R&R):
i. Structural change: [TP V+T [VP tV TP]] > [TP T VP]
ii. Parametric change: T*Move > T*Merge
iii. Cause: loss of infinitive marker.

(2) Romance future/conditionals (2.2 of R&R):
a. i. Structural change: [TP [VP XP thabeo [T habeo]]] > [TP XP [T habeo]]
   ii. Parametric change: T*Move > T*Merge
   iii. Cause: morphological irregularity/meaning of habeo
b. i. Structural change: [TP XP [T habeo]] > [T Infinitive [T habeo]]
   ii. Parametric change: T*Move > T*Merge/Move
   iii. Cause: reduced productivity of leftward XP-movement (“weakening” of OV)
c. i. Structural change: [TP [T V +Af] [VP tV]] > [TP [T V + Af] [VP tV+Af]]
   ii. Parametric change: T*Merge/Move > T*Move
   iii. Cause: loss of trigger for T*Merge/Move (e.g. mesoclisis)

(3) Greek tha (2.3 of R&R):
a. i. Structural change: [TP V+T [VP tV TP]] > [TP T+V1 [VP tV1 + V2]
   ii. Parametric change: T*Move > T*Merge
   iii. Cause: loss of infinitival morphology
b. i. Structural change: [MP T+M [TP tT VP]] > [MP M [TP V+T [VP tV]]]
   ii. Parametric change: M*Move > M*Merge, T*Merge > T*Move
   iii. Cause: reanalysis of impersonal thelei
c. i. Structural change: [MP the [TP [VP [CP na +Vlexical]]] > [MP tha [TP V+T [VP tV]]]
   ii. Parametric change: loss of C*, T*Merge > T*Move
   iii. Cause: reanalysis of thelei+V in the presence of the na.

(4) Greek na (3.1 of R&R):
i. Structural change: [CP C [MP hina [TP. . . > [CP oti/na [MP tna [TP . . .
ii. Parametric change: C > C*Move (?)
iii. Cause: loss of subjunctive morphology, reassignment of Mood features from T to M.

(5) Calabrian mu (3.2 of R&R):
i. Structural change: [CP AdvP C [NegP (Neg) [MP M [TP. . .> [CP C [NegP (Neg) [MP M [TP. . .
ii. Parameter change: none
iii. Cause: loss of subjunctive morphology, reassignment of Mood features from T to M.

(6) English to (3.3 of R&R):
i. Structural change: [PP to [DP V + enne]] > [VP V [CP [MP to [TP [T V+enne]]]]]
ii. Parametric change: M > M*Merge
iii. Cause: loss of infinitives/subjunctives

(7) Germanic that; Greek pou (3.4 of R&R):
i. Structural change: [CP Proni [C (Prt)] [IP…. ti…. ]]] > [CP [C that (+Prt)]
ii. Parametric change: C*Move (Merge in Germanic) > C*Merge
iii. Cause: ambiguity of relative clauses

(8) Serial verbs becoming complementisers (3.5 of R&R):
a. i. Structural change: [CP C [TP T [VP1 V1 [VP2 V2]]]] > [CP C [TP [T V1] [VP2 V2]]]
   ii. Parametric change: T > T*Merge
   iii. Cause: unknown

b. i. Structural change: [CP C [TP [T V1] [VP2 V2]]] > [CP [C V1] [TP T [VP2 V2]]]
   ii. Parametric change: T*Merge > T, C > C*Merge
   iii. Cause: unknown

(9) Romance determiners (4.1 of R&R):
i. Structural change: [DP [DemP ille] D .. > [DP [D (il)le]]
ii. Parametric change: D[+def] > D[+def]*
iii. Cause: loss of morphological case-marking on DP

(10) French n-words (4.2.2 of R&R):
i. Structural change: [DP [D Ø] [NumP [Num rien] [NP trien]]] > [DP [D Ø] [NumP [Num rien] NP]]
ii. Parametric change: Num*Move > Num*Merge
iii. Cause: loss of null indefinite D

(11) French Stage-Two negation of Jespersen’s Cycle (4.2.3 of R&R):
i. Structural change: V [DP mie/pas/point ([PP de DP])] > V [Neg mie/pas/point] [VP ([DP Ø de NP])]
ii. Parametric change: (low) Neg > Neg*Merge
iii. Cause: loss of non-negative content of negator, reanalysis of de-phrase (not pas)

(12) Greek oudhen > dhen (4.2.3 of R&R):
a. i. Structural change: [DP ou [NumP de [NP hen]]] > [DP dhen [NumP [NP] ]]]
   ii. Parametric change: Num* > Num
   iii. Cause: loss of ou due to phonological change
b. i. Structural change: [NegP [DP oudhen] Neg [MP M [. . .]]] > [NegP [Neg dhen ] [MP M . .]]]
   ii. Parametric change: Neg*Move > Neg*Merge
   iii. Cause: loss of ou due to phonological change

(13) Greek ti(s) (4.3 of R&R):
i. Structural change: [DP DØ [NumP ti [NP N]]] > [DP ti [NumP Num [NP N]]]
ii. Parametric change: D > D*Merge, Num*Merge > Num
iii. Cause: development of determiners

(14) Greek existentials (4.3 of R&R):
i. Structural change: [QP Q [DP pjoios [NumP [NP N]]] > [QP ka [DP pjoios [NumP Num [NP N]]]]

ii. Parametric change: Q > Q*Merge
iii. Cause: (13)

(15) Free relatives > free-choice indefinites (4.4 of R&R):
i. Structural change: [DP [D qui(s) [D 0 ] [CP [DP t ] C [IP vis tDP ]]] > [DP [D quivis ] NP ]
ii. Parametric change: D*Move > D*Merge
iii. Cause: phonological reduction

(16) Free relatives > universals (4.4 of R&R):
i. Structural change: [QP [Q quis [Q que]] [DP D [CP [DP tquis] C [IP. . .V. . .]]]] > [QP [Q quisque] DP]
ii. Parametric change: Q*Move > Q*Merge
iii. Cause: phonological reduction

(17) Northern Italian subject clitics (4.5 of R&R):
i. Structural change: [PersP DPi [Pers V] [NumP. . . [VP ti. . . > [PersP [Pers D [NumP [Num V] .
ii. Parametric change: Pers*Move > Pers*Merge
iii. Cause: paradigm levelling and split

(18) Welsh agreement (4.6 of R&R):
i. Structural change: [Agr V + D [YP. . . > [Agr V [YP. . .
ii. Parametric change: X*Move/Merge > X*Move
iii. Cause: V-movement

In the structures given in (1)–(18) we can identify a number of common properties and, as we will show, immediately reduce them to a single pattern, which we identify as structural simplification (in a way that will be made clear below). Let us proceed by grouping the changes given above.

The first pattern we identify is that prototypically exhibited by English modals, the Romance future, Greek tha, and the serial verb constructions: movement is lost and a new exponent for the higher functional head, which corresponds to the earlier target of movement, is created. The same pattern is found with changes inside the DP, for example: a lower head (Dem, Num, or even N) moves to a higher functional head, such as D or Q; movement is lost and the original moved item becomes reanalysed as the exponent of the higher head. This is in fact the pattern we identify in the structures in (1), (2a), (3), (8)–(10), (12a), (13), (15). In other words, the lexical item that formerly realised a lower head has now become the realisation of a higher functional head. This can be schematically represented as in (19):

(19) [XP Y+X [YP tY ] > [XP Y=X][YP Y]

What the structure in (19) essentially tells us is that the lexical item that at some point realised both X and Y, now becomes the realisation of X. This yields the possibility of a new realisation for Y.

The second group can be identified with the changes that gave rise to the creation of modal particles, such as na and mu, discussed in Chapter 3, Sections 3.1 and 3.2 of R&R respectively. As already argued in the relevant sections, these changes were triggered by the loss of subjunctive morphology. Recall that the morphological realisation of the subjunctive was given in the form of a series of agreement affixes which differed from the indicative series. Once the two paradigms collapsed, giving a single paradigm (the indicative), the subjunctive features are now realised in M and take the form of a distinct lexical item, while the different readings associated with the subjunctive are derived from the combination of the particle and the different forms of the finite verb. This change can be schematised as follows:

(20) [XP XF . . .[YP . . .YF . . .]] > [XP XF . . .[YP . . .Y. . .]]

This change is very similar to that in (19). In fact, to the extent that X’s content is exhausted by F, it has the same outcome, namely [XP Y=X [YP Y]]. The only clear difference between (19) and (20) is that in the latter case it is the features associated with Y that become part of X and not Y itself. In a more abstract sense, though, the relevant structure is changed in the same way. Furthermore, in both cases the reanalysis has created new exponents for Y (namely na from hina and mu from modo). This change is also relevant for the development of to, summarised in (6): the reanalysis was triggered by the loss of infinitival and subjunctive morphology (as in Greek and Calabrian), creating a new exponent for the realisation of these features, namely M* = to.
The third group we identify is that which appears to be a bit more complex, as it actually involves two relevant steps and covers all the rest of the cases where a lexical item associated with the realisation of the DP becomes the realisation of a functional head in the clause structure. The two steps proceed in parallel to some extent, as the changes inside the DP can be taken as responsible for the realisation of the clausal functional heads (and vice versa). The first step involves movement of a DP to a higher functional projection, giving a specifier. The second step involves reanalysis of this DP as a head. This is what we find in the development of n-words as negative morphemes (French pas, Greek dhen), and the reanalysis of pronominals as Agr heads (subject clitics in Italian, agreement affixes in Welsh, and possibly Indo-European) (the structures in (2b), (11)–(12b), (17)–(18)). The structural change is schematically given in (21):

(21) [XP YP X . . .[. . .tYP . . .] > [XP Y=X . . .[. . . .]]

Once again the structural change has created a new exponent for X. A similar account extends to the examples in (15) and (16) (universal quantifiers out of relative clauses) although the difference here, following Kayne (1994), is that the moved element is already a head that adjoins to another head. The final case that follows this pattern is that of the complementisers that and pou, given in (7) and (8): a DP element moves to SpecCP and becomes reanalysed as C. (14) comes under this schema too, assuming CG kan was originally an XP.

This covers all the changes listed in (1)–(18), except for (2c) and (3c). In (3c), however, the crucial change really involves the reduction of the biclausal structure to a monoclausal one; the lexical verb moves to the T-position of its clause both before and after the change, as finite verbs have done at all periods of Greek (recall that Greek is a null-subject language). Leaving this aside, (3c) amounts to the loss of a piece of structure. (2c) is a rather special case, which we will return to when we discuss markedness in the next section.

As the preceding discussion shows, it is possible to reduce the structures in (1)–(18) to the three basic configurations given in (19)–(21). In fact, we immediately observe that the structures in (19)–(21) are one and the same thing, given in (22):

(22) [XP Y=X … [YP Y]] (where YP does not have to be the complement of X)

In all cases the reanalysis gives rise to a new exponent for a higher functional head X; this is the formal correlate of grammaticalisation. What is even more interesting is that all the changes described above may have started from different configurations and were driven by different causes. It is intuitively clear that all of the schemata in (19–22) involve structural simplification, in that the structures on the right of the arrow are less elaborate than those on the left. We have mentioned this point repeatedly in our exposition of the individual cases in the preceding chapters. But how exactly is relative simplicity determined? In principle, there are several formal options available in syntactic representations or derivations: one could count nodes, branching nodes, traces, chain links, symbols, or features. Counting nodes yields the correct result in (19) and (21), but not (20): here both the conservative (left of the arrow) and the innovative (right of the arrow) structures have the same number of nodes. The same considerations apply if branching nodes are taken as the diagnostic for simplicity. Similar considerations hold if either traces (or copies) or chain links are taken as diagnostic; according to either of these criteria, in (19) and (21) the innovative structure is simpler than the conservative one, but not (20). Regarding the computation of symbols or features, (20) again poses problems, in that it is not clear that the reanalysed structure has fewer of either of these than the conservative one.

We must therefore go a step further and provide an account which is more in accordance with our notion of parametric variation. The simplification that takes place in all the cases of grammaticalisation discussed so far correlates with the morphological realisation of features. Prior to reanalysis what we find is one lexical item α spelling out the features of two (or perhaps more) heads X and Y. So what is at stake is the earlier item Y becoming a pure instantiation of the feature content of the relevant head X. This works in the case of loss of movement, as in (19); Y must have an X-feature in order to move. It works in cases like (21), as the original YP (which presumably had more than one feature, in virtue of being an XP; if not, it reduces to case (19)) becomes X. Most importantly in the present context, given our discussion above, it works in the case of (20), where F becomes the sole instantiation of X, having previously been syncretised on Y (recall that the examples of (20) all involve mood markers appearing in M, where previously mood had been part of the verb-morphology; we can think of the earlier system as involving an Agree relation between M and V (or perhaps T) for mood features, and the change as essentially the loss of Agree). The relevant notion of simplicity is determined by the following simplicity metric:

(23) A structural representation R for a substring of input text S is simpler than an alternative representation R’ iff R contains fewer formal feature-syncretisms than R’.

Feature syncretism can be defined as the presence of more than one formal feature in a given structural position: H [+F, +G . . .]. Thus the structure with the fewest occurrences of multiple features on single positions is the simplest. Structural simplification should be understood in terms of PF-realisation of these features, so a lexical item which realises X and Y is more complex than one which realises X only.
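The metric in (23) can be sketched as a simple count. The Python below is a minimal illustration under stated assumptions: a representation is modelled as a mapping from structural positions to sets of formal features, a syncretism is any position bearing more than one feature, and the particular feature assignments given for (19) and (20) are simplified stand-ins chosen for the example rather than the full feature matrices of any case in R&R. The names `syncretisms` and `is_simpler` are likewise hypothetical.

```python
from typing import Dict, Set

# Hypothetical model: structural position -> formal features realised there.
Representation = Dict[str, Set[str]]

def syncretisms(rep: Representation) -> int:
    """Number of positions carrying more than one formal feature."""
    return sum(1 for features in rep.values() if len(features) > 1)

def is_simpler(r: Representation, r_prime: Representation) -> bool:
    """R is simpler than R' iff R has fewer feature syncretisms, as in (23)."""
    return syncretisms(r) < syncretisms(r_prime)

# Schema (19): before reanalysis one item spells out both X and Y;
# afterwards it realises X alone. (Illustrative feature labels.)
CONSERVATIVE_19: Representation = {"X": {"X", "Y"}}
INNOVATIVE_19: Representation = {"X": {"X"}, "Y": {"Y"}}

# Schema (20): F is syncretised on Y before the change and is realised
# only on X afterwards. (Illustrative feature labels.)
CONSERVATIVE_20: Representation = {"X": {"F"}, "Y": {"Y", "F"}}
INNOVATIVE_20: Representation = {"X": {"F"}, "Y": {"Y"}}

if __name__ == "__main__":
    for label, old, new in [("19", CONSERVATIVE_19, INNOVATIVE_19),
                            ("20", CONSERVATIVE_20, INNOVATIVE_20)]:
        print(f"({label}): innovative structure simpler? {is_simpler(new, old)}")
```

Counting positions rather than individual surplus features is one possible reading of "occurrences of multiple features on single positions"; a variant that sums the extra features per position would rank these toy examples in the same way.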
This approach to structural simplification allows us to maintain the idea that there is a universal hierarchy of functional heads (on which, see the next subsection), and at the same time capture parametric variation in a rather clear way. One thing that this approach rules out, for example, is reordering of categories. The metric in (23) works for all the types of change in (19–21); where X(P) moves in the conservative structure (i.e. in (19) or (21)), it must have had at least two features, one allowing it to Merge in the original position, and one triggering movement (this is consistent with our approach to parametric variation in terms of PF-realisation, which is essentially a way of formalising movement). Also in (20), Y originally had more features than just F, which is why it Merges where it does in the conservative grammar. After the reanalysis, the lexical item α becomes the sole realisation of X. If the realisation of a given feature X is what parametric variation amounts to, then we can clearly see the link between parameter setting and syntactic change. So we arrive at a formal approach to grammaticalisation which is

192 Anna Roussou and Ian Roberts based on parameter setting (we will consider the types of parameter change listed in (1)–(18) in 6.2.3). The general schema in (22) has two more implications. First, it predicts that grammaticalisation can be cyclic. In other words, nothing prevents Y from being reanalyzed again, yielding new exponents for X. This is indeed a desirable result and one that seems to be supported by the empirical data. Second, it predicts that grammaticalisation can be successive, namely once Y has been reanalysed as X, it can further be reanalysed as an even higher functional head Z. Indeed, R&R show how this works in our discussion of the empirical cases in Chapters 2–4. The reanalysis of modals is again a clear example of how this works. In Chapter 2, Section 2.1 of R&R we argued that the dynamic (root) vs. epistemic modal readings can be structurally distinguished in the following way: dynamic modals are merged in v, while epistemic ones merge in T (and lexical verbs are simply merged in V). This way we are in a position to define the ‘path’ of reanalysis: from V (lexical), to v (dynamic/root), to T (epistemic), with the possibility of further reanalysis as C, as we argued with respect to the modal thelei in Greek (Section 2.3). Therefore our approach has a clear advantage as it allows us to capture the path of the structural change in the grammaticalisation cases along the hierarchy of functional heads. Furthermore we see that the path is always upwards.1 We elaborate on this in Section 6.2.2. Finally, note that this characterisation of grammaticalisation does not rely on an earlier stage involving visible movement, either of a head or of an XP. That is one option, but the option in (20), which we might characterise as loss of Agree, also falls under the general characterisation driven by (23).
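The claim that successive reanalysis always proceeds upwards can be stated as a simple check over an ordered list of heads. The sketch below assumes a deliberately reduced hierarchy (V, v, T, C) taken from the modal example; the list `HIERARCHY` and the function name are illustrative choices, and the full functional hierarchy is richer than this.

```python
# Reduced hierarchy from the modal example, ordered from lowest to highest head.
HIERARCHY = ["V", "v", "T", "C"]


def is_upward_path(path):
    """True if every step in the path moves to a strictly higher head."""
    heights = [HIERARCHY.index(head) for head in path]
    return all(lower < higher for lower, higher in zip(heights, heights[1:]))


print(is_upward_path(["V", "v", "T"]))       # lexical, dynamic, epistemic: True
print(is_upward_path(["V", "v", "T", "C"]))  # further reanalysis as C: True
print(is_upward_path(["T", "v"]))            # a downward step: False
```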

6.2 Grammaticalisation and the Theory of Language Change

6.2.1 Structural Simplification and Language Acquisition

In Chapter 1 of R&R we discussed the correlation between language change and language acquisition, following Lightfoot (1979, 1991, 1998). The idea has been that parametric change is triggered when a population of acquirers converges on a given parametric setting which is different from the one adopted by the adult grammar. The question we now need to address is how structural simplification, conceived as ‘avoid feature-syncretism’ in (23), works in the process of language acquisition. In Chapter 1 of R&R we said that the conservative nature of the language acquirer favours a ‘simplified’ structure. This term obviously needs to be redefined and/or clarified in the context of the present discussion. As already mentioned we assume that there is a universal hierarchy of functional categories present in all languages. The parametric options are nothing more than options of how the features of these categories are spelled out, if they are. What the language acquirer is faced with is the following: on the one hand, there is the universal order of functional heads which is

the same for all languages, or at least a universal pool of functional categories which project in the clause structure in a predicted way (we will elaborate on this in Section 6.3). The operations Merge and Agree basically see these features and relate them in a rather mechanical way (putting them together in the case of Merge, matching them in the case of Agree). This is essentially what the computational system (CHL) does. In a way, this is the invisible side of CHL, which interfaces with LF (and semantic interpretation in general). On the other hand, there is the interface with PF, where these features are pronounced somehow, so the computational system has to ensure that there is some matching between functional features and their realisation, that is lexical items. The list of lexical items available to each language is arbitrary, and this is one aspect (if not the only aspect) of the imperfection of the language system. Given this arbitrariness, and given that language is a formal system (possibly the only one) which interfaces with PF, the ideal situation would be if there was a one-to-one matching between lexical items and features. In other words, ideally we would expect that each feature has its own unique PF-realisation. Of course, this is just an idealisation. What we actually find is a rather blurred picture for a number of reasons: either because in some cases there is no realisation for a given feature (which is the ideal situation with respect to the LF side), or in many cases a given lexical item spells out a number of different features. This is the typical situation of what we know as movement. There is one more possibility, namely that a given feature may receive more than one realisation, in a way that is contextually determined. Recall the case of tha and na as modal particles and realisations of M* (Chapter 3, Section 3.1 of R&R). 
Despite the fact that they both have the same effect, they do not trigger the same interpretations. This is because tha at least has a subfeature (call it ‘future’) of the feature irrealis which is one value for M (we discuss subfeatures in more detail in 6.2.3). The picture that emerges is a rather conflicting one, but only apparently so: the perfection of the language system requires that all features are present, ideally without any PF-interference. On the other hand, PF imposes its own restrictions, and ideally the preferred option is to have a one-to-one mapping between features and lexical items. If this is correct, then this is precisely what the conservative nature of the language acquirer dictates: if a feature must have a realisation (because it is unambiguously cued—see below), then the same realization for more than one feature, i.e. syncretism, is preferentially avoided. This approach explains why inflectional morphology is so important in syntactic change (this of course is hardly a novel observation). Consider the structural changes in (1)–(18) in the previous section and the cause of the change in each case (where this is known). The “cause” should be construed as what prevents the acquirer from postulating the simpler structure in the earlier grammar. The cue for the more complex, conservative grammar is in most cases morphology. In other words, the cause is the trigger for the presence of a more complex system (complex in

connection with the notion of simplification as defined above). So we see that much syntactic change comes from outside syntax, basically from PF (cf. Keenan (1996), Longobardi (2001), on the inertial nature of syntactic change). We can rather easily support our claim above by looking at each of the cases in (1)–(18), or at least at all those cases where we have some indication of the cause that led to parametric change. Starting with (1) (English modals), we notice that this is due to the loss of the infinitival morphology. This same loss, corroborated by the loss of the subjunctive morphology, is also responsible for the development of to as the new ‘infinitival’ marker, namely for the change in (6). The cases in (4) and (5), i.e. of na and mu, also arise from the loss of the subjunctive morphology, in addition to the loss of the infinitival marking. The last change is partly responsible for the development of tha in the the na construction (cf. (3), and in particular (3c)), although the initial cause has to do with the loss of a distinct morphological paradigm for the future indicative (which collapsed the future indicative with the past tense subjunctive). This change is also responsible for the development of a periphrastic construction for the expression of the future in (2), followed by further causes such as the morphological irregularity of habeo, the reduced productivity of leftward XP-movement (probably related to the loss of morphological case; see Roberts (1997 [this volume, Chapter 4])), etc. The same extends to (9): the loss of case marking seems to have given rise to the development of D. In other cases, one reanalysis triggers another one, as is the case in (13) and (14) (the restriction of indefinites as wh-words gave rise to a new series of existential quantifiers in the history of Greek). Finally, morphological levelling is the cause for the reanalysis of subject clitics in (17). 
In most of the cases discussed so far, there is an ambiguity in the structure which has to do with morphophonological changes, or in some cases the ambiguity is purely structural (as in (7) for example). This supports the idea that syntactic change is triggered when marked input is obscured to the language acquirer, who then switches to the default. “Marked” input here simply means “input containing feature syncretisms”. As we will show below, following the preceding discussion, we now have a way to define formal markedness in terms of PF-realisations. To summarise, structural simplification can be seen in the form of (23), that is as a way of avoiding feature syncretism. Given that the latter is provided by the morphological system, which has to be learned and is furthermore parametrised, we have a clear way of linking the notion of simplification with the process of language acquisition. Once the cue (that is morphology mainly) becomes obscure or ambiguous the conservative nature of the language acquirer will opt for a simplified structure: maximise the correspondence between structure and lexical items. This of course yields new exponents for functional features as is indeed what we get in grammaticalisation cases.

6.2.2 Grammaticalisation and Other Syntactic Changes

In the present book we have treated grammaticalisation as an instance of upwards reanalysis, which gives rise to new functional material. Furthermore, this reanalysis (at least in most of the cases) involves loss of movement (Move > Merge change). In the present section we will compare grammaticalisation to other syntactic changes, showing their similarities and differences. Let us start by considering three well-known cases in the literature, namely the loss of V2, the loss of V-to-I movement, and the OV > VO change, summarised in (24), (25), and (26) respectively:

(24) Loss of V2: [C [T V]] [TP . . . tT . . . ] > C . . . [TP . . . [T V] . . . ]
(25) Loss of V-to-T: [T V] . . . [VP . . . tV . . . ] > T . . . [VP . . . V . . . ]
(26) OV > VO: [FP Obj . . . [VP . . . (V) tObj . . . ]] > [VP . . . (V) Obj . . . ]

The loss of V2, schematically represented in (24), has been discussed with respect to a number of languages, for example English (van Kemenade (1987), (1997), Haeberli (1999), Kroch & Taylor (1997), Pintzuk (1991), among others), French (Adams (1987), Vanelli, Renzi and Benincà (1986), Roberts (1993a), Vance (1989), (1997)), Northern Italian dialects (Renzi, Vanelli, and Benincà (1986)), Welsh (Willis (1998)). (25) represents the loss of V-to-T movement in the history of English (Roberts (1985 [this volume, Chapter 1]), (1993a), Pollock (1989), Warner (1997), among others; see also Vikner (1997) for Danish). Finally, (26) is an instance of word-order change, as exhibited in the history of English. The schema in (26) is based on Roberts (1997 [this volume, Chapter 4]), who accounts for this change in terms of Kayne’s (1994) antisymmetry approach: OV is the result of object raising to a position higher than that of the verb, so the change from OV to VO involves loss of object movement. 
In brief, what all the changes in (24)–(26) share is loss of movement to a higher functional head (C, T, or F respectively). Although the above changes involve loss of movement they are not treated as instances of grammaticalisation. Since in our approach grammaticalisation has also been related to the loss of movement, the obvious question is how we distinguish between the cases above and the typical cases of grammaticalisation discussed in the previous chapters. To illustrate the differences we will focus on the example in (25), namely the loss of V-to-T movement. The reason for choosing (25) is twofold: first, because we can easily compare it to the grammaticalisation of T elements, namely modals, in the history of English (see Chapter 2, Section 2.1 of R&R), and second,

because the empirical aspects of the change in (25) are more straightforward than the ones in (24) and (26). The examples in (27) illustrate the presence of V-to-T movement in pre-17th-century English:

(27) a. if I gave not this accompt to you
        if I gave not (=didn’t give) this account to you
        (1557: J. Cheke, Letter to Hoby; Görlach (1991: 223), Roberts (1999: 290) [see this volume, p. 142])
     b. How cam’st thou hither?
        How camest thou (did you come) here?
        (1594: Shakespeare, Richard III; Roberts (ibid.))
     c. The Turkes . . . made anone redy a grete ordonnaunce
        The Turks . . . made soon (=soon prepared) a great ordnance.
        (c1482: Kaye, The Delectable Newsse of the Glorious Victorye of the Rhodyans agaynest the Turkes; Gray (1985: 23), Roberts (1993a: 253))
     d. In doleful wise they ended both their days
        (1589: Marlowe, The Jew of Malta III, iii, 21; Roberts (ibid.))

The finite V precedes negation in (27a), inverts with the subject in questions in (27b), precedes the adverb in (27c), and precedes a floated quantifier in (27d). All the above examples then show clearly that V raised to T (and in the appropriate contexts to C as well). The picture is quite different in Modern English: the verb cannot raise to T or C. It is in part for this reason that negation and question formation trigger do-support. The loss of V-movement has been related to the loss of inflectional morphology. Vikner (1997: 200) formulates the following condition on V-raising:

(28) An SVO language has V-to-I movement if and only if person morphology is found in all tenses.

Whether (28) needs any refinement or not does not affect our present discussion (see for example Alexiadou & Fanselow (2000), Bobaljik (2000) for criticisms). What is crucial is the fact that the reanalysis in (25) affected all lexical verbs and moreover left T with no lexical realisation in the relevant context. 
In this respect the change in (25) (and for that matter those in (24) and (26)) can be abstractly represented as in (29):

(29) [XP Y+X [YP tY]] > [XP X [YP Y]]

The structure in (29) is also an instance of structural simplification, in the sense that the realisation of X under movement is no longer present after the reanalysis. If this is correct, then we need to outline what the difference is

between (29) and the structural simplification we get in grammaticalisation, as in (22) in Section 6.1, repeated below as (30):

(30) Grammaticalisation: [XP Y+X [YP tY]] > [XP Y=X [YP Y]]

The input in (29) and (30) is the same, but there is a clear difference in the output, as the structure on the right-hand side of the arrow shows in each case. (29) expresses the loss of V-movement, while (30) the grammaticalisation of modals, as discussed in Chapter 2, Section 2.1 of R&R. Given the two sets of data and the relevant structures, it is rather easy to identify the differences between the two. First, unlike the loss of V-to-T movement in (29), the grammaticalisation of modals under (30) created a new realisation for T (T*Merge). Second, the reanalysis in (29) is ‘downwards’, while grammaticalisation is an instance of ‘upwards’ reanalysis. Third, while loss of V-raising in (25) affected the whole class of lexical verbs, grammaticalisation affected a subclass of verbs with a number of properties in common (they are intensional, tend to lack argument structure, and are subject to morphological irregularities). Fourth, grammaticalisation is associated with semantic ‘bleaching’ and phonological reduction (for example the modal will lost its argument structure and failed to express volition; moreover, the contracted form shows up with the modal reading and not with the previous lexical one). On the other hand, ‘downward’ reanalysis as in (29) has no such consequences. The differences between the two types of reanalysis are summarised in (31) and (32) below. (31) covers all the changes in (24)–(26) which involve loss of movement, but are not instances of grammaticalisation, while (32) expresses the properties of grammaticalisation:

(31) ‘Downward’ changes as in (24)–(26):
     a. apply to all members of Y;
     b. do not change category of Y;
     c. involve no semantic or phonological change to Y-roots.

(32) ‘Upward’ changes, as in (30):
     a. apply only sporadically or to morphological subclasses of Y;
     b. change category of Y;
     c. are associated with semantic bleaching and morphophonological reduction.

What is interesting to note is that the ‘downward’ changes in (31) have no interface effects. For example, the loss of V-to-T movement did not affect the interpretation of lexical verbs (or T for that matter) and the reanalyses in (24)–(26) did not affect the argument structure of the verbs that underwent

the change (or in the case of (26) the nature of direct objects). Furthermore, the change in (31) does not give rise to any phonological effects in the sense of triggering phonological reduction, etc. (although arguably it affected the PF-realisation of T). On the other hand, the ‘upward’ changes in (32) have interface effects, as they go along with phonological reduction, and affect the meaning of the reanalysed element (see the loss of volitional meaning on will and Greek thelo as an auxiliary for example). The absence of interface effects in the case of ‘downward’ reanalysis can be directly linked to the fact that this change does not give rise to functional material, while ‘upward’ reanalysis does.2 We will discuss this more in the following section (6.3). As the above discussion shows, our approach can sufficiently express grammaticalisation and furthermore distinguish it from other cases which also involve loss of movement, by formulating it in terms of ‘upward’ vs. ‘downward’ change. This way we can capture the similarities of the two types of changes and at the same time explicitly state that only grammaticalisation gives rise to new functional material. Finally, it is interesting to note that one further similarity that the two changes share is that the cause can be identified with morphological changes: for example, as already mentioned the loss of V-to-T movement relates to loss of agreement marking, and the OV > VO change with the loss of case distinctions (the cause for the loss of V2 is more speculative, but there may be a correlation with the loss of particles in the C-system, cf. Ferraresi (1997), Roberts & Roussou (2002)).

6.2.3 Descriptive and Explanatory Adequacy: Questions of Markedness

In this section we address three issues. First, we return to the question of the tension between descriptive and explanatory adequacy adumbrated in the Introduction of R&R. 
We resolve this tension by adopting a particular point of view regarding markedness of parameter values, which is defined in terms of the simplicity metric in (23). Second, we show how the majority of the parameter changes listed in (1)–(18) are changes from a marked to an unmarked value. Third, we show how change from unmarked to marked is possible in this context, looking at the change in (2c) above. This completes the picture of syntactic change that we want to present here. As we said in the Introduction, the study of grammaticalisation raises the familiar tension between descriptive and explanatory adequacy in the diachronic domain. A descriptively adequate account of this class of changes results in defining pathways of change. In our terms, as we mentioned in 6.1, pathways of grammaticalisation are defined by the functional hierarchy through which grammaticalised material can travel by means of successive upward reanalyses. Thus grammaticalisation pathways can be deduced from the functional hierarchy (and possibly vice versa), once upward reanalysis is taken as a basic mechanism of syntactic change. However, if we take parameter setting to be an explanatory notion

for syntactic change (and, to the extent that we are to make the connection to language acquisition, as in section 6.2.1, it is), then we are led to an apparent difficulty. In principles and parameters theory, parameters can be thought of as creating a space of variation in which individual grammatical systems are distributed. Synchronically, different systems are viewed as scattered in this space. Diachronically, they randomly “walk” around the space as a function of time. This view is not compatible with the existence of diachronic drift, pathways of change, etc., a point repeatedly and cogently made by Lightfoot (see in particular Lightfoot (1979) and Lightfoot (1998)). So, as stated in the Introduction to R&R, we must reconcile the evidence for pathways of change at the descriptive level with the fact that an explanatory account of syntactic change must involve random parameter change. Following Clark & Roberts (1993 [this volume, Chapter 2]) and Roberts (2001), we propose that a version of the traditional linguistic concept of markedness is able to resolve this tension. The relevant notion of markedness is rooted in the simplicity metric, which we repeat here:

(23) A structural representation R for a substring of input text S is simpler than an alternative representation R’ iff R contains fewer formal feature-syncretisms than R’.

Now, we stated in the last section that movement operations are always associated with feature-syncretism: since a moved element has one feature licensing it in its merged position and one triggering movement (or Agree; see the discussion in 6.1), movement is always associated with relatively complex representations. Let us suppose, then, that F*Move is a marked option relative to F, precisely because it entails a more complex representation than F in terms of (23). 
Also, if no phonological matrix is simpler than the presence of a phonological matrix (since a phonological matrix consists of features, this could be related to (23)), F*Merge is relatively marked as compared to F, but less marked than F*Move as it lacks the features relevant for triggering movement. Finally, we consider that F*Move/Merge is the most marked option of all, as this involves two phonological matrices and the features involved in triggering movement. So we arrive at the markedness hierarchy for parameter values in (33) (where “>” means “more marked than”):

(33) F*Move/Merge > F*Move > F*Merge > F

Relatively marked parameter values require overt, robust cues. In the absence of such cues, a less marked option is taken, with F as the default. As we pointed out in the previous section, the notion of “cause” of the changes in (1)–(18) above, should be understood as the factor cuing a relatively marked setting.
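The hierarchy in (33) and the role of cues can be sketched as a small decision procedure. This is a toy illustration, not the learning algorithm itself: the two boolean cues (robust evidence for movement, robust evidence for a free morpheme realising F) and the function names are assumptions introduced here.

```python
# Parameter values ordered from least to most marked, following (33).
MARKEDNESS = ["F", "F*Merge", "F*Move", "F*Move/Merge"]


def more_marked(a, b):
    """True if value a is more marked than value b."""
    return MARKEDNESS.index(a) > MARKEDNESS.index(b)


def select_value(movement_cued, morpheme_cued):
    """Choose the least marked value compatible with the robust cues."""
    if movement_cued and morpheme_cued:
        return "F*Move/Merge"
    if movement_cued:
        return "F*Move"
    if morpheme_cued:
        return "F*Merge"
    return "F"  # default when no marked value is robustly cued


# If the cue for movement becomes obscure, the learner falls back to a less marked value.
before = select_value(movement_cued=True, morpheme_cued=False)
after = select_value(movement_cued=False, morpheme_cued=False)
print(before, after, more_marked(before, after))
```

The example mimics a change in which a head loses its movement cue and is left with no realisation, so the setting drops from F*Move to the default F.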

Let us look at these ideas in the light of the definitions of P-expression and trigger given in Chapter 1, section 1.1 of R&R:

(34) Parameter expression:

A substring of the input text S expresses a parameter $p_i$ just in case a grammar must have $p_i$ set to a definite value in order to assign a well-formed representation to S.

(35) Trigger:

A substring of the input text S is a trigger for parameter $p_j$ if S expresses $p_j$.

Given markedness, only marked values of parameters need to be expressed. P-expression then reduces to:

(36) a. expression of movement relations (through syntactic “displacement”);3
     b. expression of free functional morphemes (through PF-realisation).

More generally, acquirers are looking for overt realisations of functional heads. If they analyse a functional head as [F F ], we have the F*Merge option. If it is analysed as [F(P) G F ] (where G stands for moved material of any kind), we have the F*Move option, or, if F has its own phonological realisation, F*Move/Merge. The crucial point, however, is that the conservative nature of the learner, since it prefers maximally simple representations in the sense defined by (23), always favours the default option F. So, if the elements and relations which lead to one of the complex realisations of F are not robustly expressed in the trigger, the default option is chosen. Let us now consider the markedness hierarchy in (33) in relation to the changes summarized in (1)–(18). We have the following result:

(37) a. F*Move/Merge > F*Move: (2c), (18)
     b. F*Move > F*Merge: (1), (2a), (3a), (3b), (7),4 (10), (12b), (13), (15), (16), (17)
     c. F*Merge > F: (3c), (12a)

The clear majority of the changes involve reductions in markedness. The three types of markedness reduction seen in (37) correspond to different subtypes of grammaticalisation mentioned in the literature: (37a) creates new morphology; (37b) is “true” grammaticalisation, in that it creates a realisation of a new functional head; finally, (37c) is loss of realisation (note that Stage Three of Jespersen’s cycle, discussed in 4.2.3 of R&R, would be a further case of this type of change: Neg* > Neg applying to a “high”, C- or T-related Neg). However, on the basis of what has been said so far, some of the changes in (1)–(18) seem to involve an increase in markedness. This is the case for (2b), (4), (6), (8), (9), (11), and (14). Let us consider these more closely.
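The tally behind “the clear majority” can be checked mechanically. The sketch below simply transcribes the example numbers from (37) and the list of apparent exceptions given in the text; the ranking dictionary and the variable names are illustrative assumptions.

```python
# Rank of each parameter value on the hierarchy in (33); higher means more marked.
RANK = {"F": 0, "F*Merge": 1, "F*Move": 2, "F*Move/Merge": 3}

# Changes as classified in (37): (old value, new value) -> example numbers.
CHANGES = {
    ("F*Move/Merge", "F*Move"): ["2c", "18"],
    ("F*Move", "F*Merge"): ["1", "2a", "3a", "3b", "7", "10", "12b",
                            "13", "15", "16", "17"],
    ("F*Merge", "F"): ["3c", "12a"],
}

# Changes the text lists as apparent increases in markedness.
APPARENT_INCREASES = ["2b", "4", "6", "8", "9", "11", "14"]


def is_reduction(old, new):
    """True if the change moves to a less marked value."""
    return RANK[new] < RANK[old]


reductions = sum(
    len(examples)
    for (old, new), examples in CHANGES.items()
    if is_reduction(old, new)
)
total = reductions + len(APPARENT_INCREASES)
print(f"{reductions} of {total} listed changes reduce markedness")
```

On this count, 15 of the 22 labelled changes are reductions on the hierarchy; the remaining seven are the cases examined next.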

First, the change in (2b) involved the cliticisation of habere and its attraction of the infinitive. The infinitive movement was reanalysed from (possibly remnant) XP-movement of the category containing the infinitive to SpecTP; this was arguably a facet of the general OV-to-VO shift which took place in the transition from Latin to Romance (see Chapter 2, 2.2 of R&R for discussion and in fact an alternative analysis). Strictly speaking, the earlier structure was also an instance of F*Move/Merge, since we are not distinguishing XP-movement and head-movement. In that case, what changed in Late Latin was XP-movement becoming head-movement, which can certainly be seen as a simplification (and one which needs to be built into (33)). But how does F*Move/Merge arise in the first place? This is a question which must be addressed, if we are not to predict that such options are so marked that they will always irrevocably disappear. Recall that the F*Move/Merge option arose from an earlier structure in which habere moved to T, via the standard case of grammaticalisation Move > Merge. It seems, then, that the marked structure arose in part from a reanalysis of the head which made it less marked. Thus we can consider that T*Move/Merge arose from a still more marked option: T*Move/Move, i.e. the case where T* attracts two elements, a head and a specifier. In this sense, the reanalysis in (2b) in fact involved a reduction in markedness. But of course we now have to explain how T*Move/Move arose. We conjecture that this option arose from T*Move, via a reduction in feature content of the attracted element, thereby requiring that two things move in order to satisfy the property of the attractor. For example, in the case under discussion it is possible that Latin habere was already a light verb v, and that T required an element with a “full” V-feature. 
Hence, VP-movement is introduced (itself possibly a reanalysis of an earlier nominal infinitival; see 2.2 of R&R). Although the technical details are uncertain,5 this sketch shows us how simplification of the attracted element (reduction of habere from V to v), in line with (23), may create complexity elsewhere. The local nature of simplification is what creates complexity, and what prevents language change from leading to irreversible simplification. The changes in (4) and (6) are very similar, as we mentioned in Chapter 3 of R&R. Here new instantiations of M develop through the loss of feature syncretism elsewhere. Recall that the mood markers developed in Greek and English due to the loss of infinitival/subjunctive marking on the verb. Effectively, then, where the earlier system had Agree and feature syncretism on V (or T), the new system has no Agree relation and no feature syncretism. This is a simplification in terms of (23). It may appear that there is an increase in phonological markedness, but in fact this is not true: in the earlier system verbal morphology marked mood distinctions, and these distinctions were eradicated. So these cases are straightforward examples of simplification that are not directly captured in terms of the markedness hierarchy in (33). What is needed is a further option F*Agree, which, like F*Move, is more marked than F*Merge. It is likely that F*Agree is less marked

than F*Move (since Move involves a further component, indeed in Chomsky’s terms a further feature, in addition to Agree). (8), the development of serial verbs, appears to be a case of relabelling in the sense of Whitman (2001). Here we can note that T inherently has less feature content than V, as it lacks argument structure (see the next section on the loss of argument structure under grammaticalisation). It is at least possible that C has less feature content than T (note that fewer temporal distinctions are made in the C-system than in the T-system; see Rizzi (1997)). So this change may well be consistent with (23). Finally, the changes in (9), (11), and (14) all involve the loss of feature content on the part of the grammaticalised element (Dem > D, Num/N > Neg, focus particle > Q), and hence are in conformity with (23). So we see that all the changes in (1)–(18) are in conformity with (23). (23) is of course the fundamental notion; the approximate hierarchy in (33), which we have now seen to be too simple, derives from (23). Most importantly, the changes in (37a) show how new synthetic morphology may be created, despite the fact that (23) may appear at first sight to favour analyticity. It is interesting to note that in these cases other aspects of word order (OV or VS) are relevant to creating the environment in which new morphology can emerge. New morphology does not readily emerge in SVO languages, it seems, and we can observe that in such languages (English, Romance) there is a clear historical tendency towards analyticity. (23) can explain this, a good example of how we can combine parametric analyses of change with an account of diachronic drift. We would now like to briefly compare our notion of markedness with recent proposals of Cinque’s (1999). As part of his study of clause structure across languages, Cinque observes that functional heads seem to have both marked and unmarked values. 
A selection of these is given in (38):

(38)
            MoodSpeech Act   MoodEvaluative   MoodEvidential     ModEpistemic
Unmarked:   Declarative      -[-fortunate]    direct evidence    commitment
Marked:     -declarative     -fortunate       -direct evidence   -commitment

The observations of marked and unmarked values are based on familiar criteria: marked features are “more restricted [in] application, less frequent, conceptually more complex, expressed by overt morphology” (Cinque 1999: 128), while unmarked features are the opposite. Note in particular that marked features tend to be morphologically realised while unmarked features do not. How does this kind of markedness (which we will refer to as “Jakobsonian”) relate to the proposals we have just made? The two notions are quite distinct, in several important respects. First, Jakobsonian markedness refers to values of functional heads, while the one just sketched refers to realisations of those heads. Second, Jakobsonian markedness is not parameterised: the features are available in every language, and (presumably) stand

in the same markedness relations in every language; Jakobsonian markedness is thus given by UG, while the one just sketched derives from a formal property of the learning algorithm. They are thus quite different kinds of thing. Third, Jakobsonian markedness is a substantive notion (note the reference to conceptual complexity in the above quotation from Cinque), while that just sketched is a formal notion. So there are very good reasons to keep the two kinds of markedness distinct, as formal (the one sketched here) and substantive (Jakobsonian) notions with quite different cognitive status (the former deriving from the learning device, the latter from UG). However, two things lead us to say a little more than this. First, common to both notions is the idea that overt morphophonological realisation is marked, while zero realisation is unmarked. Second, there are very significant cross-linguistic generalisations in Cinque’s version of substantive markedness that we would like to find an expression for. Tentatively, we think that the two notions of markedness can be connected by taking a lead from Cinque (who takes it from Jakobson; see Cinque (1999: 128)) in regarding unmarked values as, in a sense, underspecified. What is needed is a feature hierarchy. Functional heads, as features F, G, H . . . , can come with various further feature specifications f, g, h . . . (we write the subfeatures with lower case and potentially autonomous functional features with upper case). We can then treat unmarked values of functional heads as simply the autonomous functional feature F, while the marked value will have a further subfeature, giving F+f. So MoodSpeech Act (or Force, or C; hereafter we refer to this category as C for convenience) means “declarative”, while C[-declarative] means nondeclarative. Of course, on this view, [-declarative] doesn’t exist (and neither does [+declarative], this being the unmarked value of the category C). 
What exist are other speech-act features: Q, Exclamative, Imperative, etc. These are all subfeatures of C. In other words, instead of saying that we have C with the two values [±declarative], we have C = Declarative by default and C = Imperative, Interrogative (etc.) as marked subfeatures. Now, if the parametrisation operator, which randomly distributes the * in the lexicon, applies to all types of features, F+f will have two chances of PF-realisation, while F will only have one. Thus, marked feature values are more likely to be overtly realised than unmarked ones and we derive implicational statements of the form “If a language has a declarative particle, then it has an interrogative particle,” etc., from the fact that where F* must be realised then so must all subfeatures of F*. Note that this idea carries over to the F*Move case, which seems right. In many languages, for example, marked illocutionary forces are associated with movement to C, while declaratives are not (this is approximately the situation in “residual V2” languages like Modern English). So we also derive the (correct) implicational universal “If a language has movement to C in declaratives, then it has such movement in interrogatives, etc.”6
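The reasoning about "two chances of PF-realisation" can be checked with a small sketch. It rests on assumptions that are not in the text: the * diacritic is modelled as landing independently on the head F and on the subfeature f, each with the same probability `p_star`, and the function names are hypothetical. Under that reading, the unmarked value (bare F) is overt only if F carries *, while the marked value (F+f) is overt if either F or f carries *.

```python
from itertools import product


def overtly_realised(star_on_F: bool, star_on_f: bool, marked: bool) -> bool:
    """Is the value overt, given where * landed in the lexicon?

    Unmarked value = bare F: overt only if F itself carries *.
    Marked value = F+f: overt if F or the subfeature f carries *.
    """
    return star_on_F or (marked and star_on_f)


def realisation_probability(p_star: float, marked: bool) -> float:
    """Chance of overt realisation if * lands on each feature
    independently with probability p_star (an assumption of this sketch)."""
    chances = 2 if marked else 1
    return 1 - (1 - p_star) ** chances


# Enumerate all four possible lexicons (* on F or not, * on f or not).
lexicons = list(product([False, True], repeat=2))

# Implicational universal: whenever the unmarked value (declarative)
# is overt, the marked value (e.g. interrogative) is overt as well.
implication_holds = all(
    overtly_realised(F, f, marked=True)
    for F, f in lexicons
    if overtly_realised(F, f, marked=False)
)

print(implication_holds)
print(realisation_probability(0.5, marked=False))
print(realisation_probability(0.5, marked=True))
```

The enumeration also shows that the converse does not hold: the lexicon with * on f alone realises the marked value without realising the unmarked one, which is the asymmetry the implicational statement encodes.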

There is an independent reason to adopt some concept of markedness in current principles-and-parameters theory (this argument was made in Roberts (2001: 89–92)): the parameter space as currently defined offers too many choices for comparative or historical work to be possible. Assuming functional categories are the locus of parametric variation, and we have four potential parameter values per functional head, then for n = |F|, the cardinality of the set of functional heads, the cardinality of the set of parameters |P| is $4n$ and the cardinality of the set of grammatical systems |G| is $2^{4n}$. Assuming 4 heads in the C-system (see Chapter 3 of R&R and Rizzi (1997)), 4 heads in the T-system (T, Asp, Neg and v) and 4 in the D-system (see Chapter 4 of R&R), we have n = 12. Then |P| = 48 and |G| = $2^{48}$. This is a very large space indeed, and bear in mind that this is based on a fairly conservative functional structure (we have not taken into account the functional structure associated with AP, for example). In a discussion of a 30-parameter system, giving 1,073,741,824 grammars, Clark (1990) points out that a learner who checks one grammar per second from birth would in the worst case take 34 years to converge if this is the number of possible grammars. Hence there must be a learning device which facilitates the search in this space. We can, I think, make a similar argument on the basis of diachronic considerations. Two assumptions are generally made in all comparative and historical linguistics (in fact, they really make historical linguistics possible, and have done since the beginnings of the discipline). These are articulated by Croft (1994) as follows:7 (39) a. Uniformitarianism: “the languages of the past are not different in nature from those of the present” (Croft (1994: 204)); b. Connectivity: “within a set of attested language states defined by a given typological classification, a language can . . .
shift from any state to any other state” (Croft (1994: 205)). We can reformulate these assumptions in terms of principles-and-parameters theory as follows: (40) a. Uniformitarianism: the languages of the past conform to the same UG as those of the present; b. Connectivity: a grammatical system can change into any other grammatical system given enough time (i.e. all parameters are equally variable). Put this way, both assumptions seem entirely reasonable. To deny (40a) would be to assert that speakers of languages of the past were cognitively different from speakers of currently existing languages. Presumably, though, at least as far back as the origin of modern homo sapiens, we do not want to say this. Effectively, (40a) is the null hypothesis regarding the relation of UG to language change. Denying (40b) would imply “privileging” certain

parameters, a conceptually highly dubious move for which there seems to be no empirical motivation: (40b) is the null hypothesis regarding the role of parameters in language change. So we want to maintain the assumptions in (40). Now, at present approximately 5,000 languages are spoken (Ruhlen (1987)). Suppose that this figure is constant throughout human history (back to the emergence of homo sapiens), and that every language changes with every generation, so if we have a new generation every 25 years, we have 20,000 languages per century. If the total number of grammatical systems is $2^{30}$ (following Clark’s (1990) discussion), it would take 18,000 centuries for each type to be realised once. At present, the usual reckoning is that humans have been around for about 2000 centuries (i.e. 200,000 years; see for example Bickerton (1991)). Of course, the figures given here are rather arbitrary, but the point should be clear: given the kind of parameter space we seem to have, on the basis of the empirical examination of existing languages, there simply has not been enough time since the emergence of the species (and therefore, we are assuming, of UG) for anything like the total range of possibilities offered by UG to be realised. This conclusion effectively empties uniformitarianism and connectivity of content. In theory, we simply couldn’t know whether a language of the past corresponded to the UG of the present or not, since the overwhelming likelihood is that it is typologically different from any language that existed before or since.8 One might conclude that 30 parameters define too big a parametric space, but, as we have seen, we have here rather conservatively given ourselves a 48-parameter system.
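The arithmetic behind the size of the parameter space can be reproduced with a short sketch. The function names and the 365.25-day year are assumptions made for illustration; the inputs (four values per head, 12 heads, one grammar checked per second, 5,000 languages, a 25-year generation) are the ones stated in the discussion. Run on exactly those inputs, the last calculation gives roughly 54,000 centuries rather than the 18,000 quoted above; since the figures are described as rather arbitrary, what matters is that either number dwarfs the roughly 2,000 centuries available.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: Julian year


def parameter_space(n_heads: int, values_per_head: int = 4):
    """Return (|P|, |G|) for n functional heads: |P| = 4n, |G| = 2^(4n)."""
    n_parameters = values_per_head * n_heads
    n_grammars = 2 ** n_parameters
    return n_parameters, n_grammars


def worst_case_years(n_grammars: int, grammars_per_second: float = 1.0) -> float:
    """Years needed by a learner who checks every grammar once."""
    return n_grammars / grammars_per_second / SECONDS_PER_YEAR


def centuries_to_realise_all(n_grammars: int,
                             languages: int = 5000,
                             generation_years: int = 25) -> float:
    """Centuries needed for every grammar to be instantiated once,
    if each language changes grammar once per generation."""
    new_grammars_per_century = languages * (100 / generation_years)
    return n_grammars / new_grammars_per_century


n_parameters, n_grammars = parameter_space(12)   # C-, T- and D-systems
clark_years = worst_case_years(2 ** 30)          # 30-parameter system
coverage = centuries_to_realise_all(2 ** 30)

print(n_parameters, n_grammars)
print(round(clark_years, 1))
print(round(coverage))
```

Replacing `2 ** 30` with `n_grammars` shows how much worse the picture becomes for the 48-parameter system.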
Here we are again faced—in a different context— with the familiar tension between the exigencies of empirical description, which lead us to postulate ever more entities, and the need for explanation, which requires us to eliminate as many entities as possible. Chomsky (1995: 4–5) notes that the principles-and-parameters model resolved this tension for synchronic comparative syntax, but we see that the problem reemerges at a higher level. It seems then that the parameter space is too big for the assumptions of uniformity and connectivity to have any empirical consequences. Since uniformity represents the null hypothesis about the relation of UG to change, and connectivity the null hypothesis about parametric change, this conclusion appears to cast doubt on the entire enterprise of looking at syntactic change from the point of view of principles and parameters theory. This is the conceptual problem caused by the size of the parameter space. The size of the parameter space also raises an empirical issue: the fact that on the basis of a small subset of currently-existing languages we can clearly observe language types, and note diachronic drift from one type to another, is simply astonishing. The view presented above implies that, as far as the history of humanity up to now is concerned, languages should appear to vary unpredictably and without assignable limits, even if we have a UG containing just 30 or so parameters. Obviously, we need to find ways

to reduce the range of parametric possibilities while retaining (at least) 30 parameters. In the next paragraphs, we will consider two ways to do this. First, we can suppose that something is causing grammatical systems to “clump” in the parametric space, rather like galaxies in the physical universe. What is the parameter-space equivalent of the forces that cause stars to bunch together into galaxies, etc.? We would like to suggest that the traditional linguistic concept of markedness creates basins of attraction in parameter space. In other words, unmarked values of parameters can effectively reduce the possible space that grammatical systems occupy, and so reduce the hyperastronomical range of possibilities ($2^{30}$, $2^{48}$, etc.) to a sufficiently small range of possibilities for language types and diachronic drift to be discernible. Given the general considerations about the relation between language types, language acquisition, and language change raised in the previous section, the attractors (markedness) must be introduced by the learning algorithm. This is exactly what we have proposed here, and so our approach to markedness has independent motivation.

6.3 On the Nature of Functional Categories

6.3.1 ‘Semantic Bleaching’ and the Logical Nature of Functional Categories

In the preceding discussion we have basically provided an account of how functional categories can be realised, when they are: by Merge, Move, in some cases by the combination of Merge and Move (cf. syntactic affixation). The development of a Merge realisation is what has been identified as the innovation in the cases of grammaticalisation. We have provided evidence for this kind of development by considering a number of functional heads, such as T (and for that matter v), C (which we argued splits into a number of heads, such as M, Op, etc.), Agr (which is actually a cover term for features such as Person and Number at least), and D (which again splits into a number of heads, such as Dem, Num, and Q at least). The implicit assumption in our discussion has been that these heads have semantic content (cf. Chomsky 1995, 2000, 2001), thus arguing against an approach that postulates heads as pure checking positions. Under this approach the postulation of a series of functional heads is possible to the extent that there is also semantic justification for their presence. The obvious question is what exactly we mean by ‘semantic’ content when it comes to functional heads. The cases of grammaticalisation we have considered so far show that lexical (or functional) to functional reanalysis goes along with a change in the meaning of the reanalysed element. In standard grammaticalisation theory terms this is called ‘semantic bleaching’ (cf. Hopper & Traugott (1993: 87) for references). We consider this term indicative of the semantic changes that are associated with grammaticalisation, so we will use it without committing ourselves to the theoretical framework where it originated. In order to illustrate how semantic bleaching works we will focus on a couple of cases we have considered in the previous chapters. The lexical to functional reanalysis is perhaps the best way to illustrate the changes in meaning. Recall from our discussion in Chapter 2 of R&R that a subset of lexical verbs becomes the realisation of a functional head. By doing so it loses part of its own semantic information and becomes compatible with the information which is associated with the head/projection it realises. The key notion here is that of ‘semantic information’. More precisely what needs to be clarified is which part of its lexical meaning is lost and which part remains, so that the reanalysed item can be the realisation of the functional position. Let us then consider modal verbs in English or thelo in Greek. These cases show a common pattern: as lexical verbs they have argument structure, and when they become functional elements they have no argument structure. For example the verb thelo in Greek is a 2-place predicate, which takes a DP as an external argument and a DP or CP (na-clause) as its internal argument. While there are no clear restrictions on the internal argument, this is not so for the external, which has to be +animate (a –animate subject is incompatible with the volitional reading). We can then assume that volitional thelo is merged in V and from there it moves to v and T, as shown in (41).

(41) [TP thelo [vP t_thelo [VP t_thelo]]]

Merger of thelo in V and then movement to v allows us to capture the fact that it has a complete argument structure. As already mentioned in Chapter 2, Section 2.3 of R&R thelo in MG can also be non-volitional, in which case it translates as need. Under this interpretation it does not impose any restrictions on the subject, but is sensitive to the properties of the complement (e.g.
it is incompatible with a definite DP, but compatible with a deverbal nominal which is interpreted as a complex event in the sense of Grimshaw (1990)). Furthermore, it is incompatible with perfective aspect, and only allows for a 3rd person reading (singular or plural) when it takes a na-complement (cf. Roussou (2005) for a discussion). It thus differs in some very clear ways from volitional thelo and although it has some argument structure, it is defective in a rather obvious sense. In our discussion in Chapter 2 of R&R, we argued that modals of this kind are merged in v, as in (42):

(42) [TP thelo [vP t_thelo [VP V]]]

A comparison between (41) and (42) shows how in the case of the same lexical item two different readings emerge with clear effects on the argument structure (and whatever this entails). To be more precise, if the interpretation of the external argument is determined in association with that of the internal one in a configurational approach (cf. Hale & Keyser (1993)), then

the fact that thelo is not merged in V, but in v directly can account for the fact that it is not on its own able to restrict the interpretation of the subject. As a result of this, a [–animate] subject is possible, which is otherwise unavailable. Note that at this point the meaning of thelo is also affected in a clear way. As a volitional verb it has the reading ‘desire/want/wish’, while as a semi-lexical verb it has the reading ‘need/require’ (which is perhaps derived as an implication of ‘want’: I want something implies I’m in need of something). In both cases though it expresses an unrealised wish or a request for something to happen (posterior to the speech time).9 Consider what happens next when it is reanalysed as a T element: it is obviously not compatible with any argument structure, as at no stage does it appear in V or v. In our discussion in Chapter 2 of R&R we argued that once a verbal element is merged in T it is able to assume an epistemic interpretation, namely it can encode necessity or possibility. In the absence of any argument structure what remains in other words is the purely modal content of a verb like thelo (and presumably this is what we find in the history of thelo on its way to becoming a future marker). As Bybee et al. (1994) argue, this is a change from an Agent-oriented to a speaker-oriented modality. We then notice, as is rather standard, that in each step of the reanalysis there is an effect on the semantic content of thelo (which as noted above is not independent of what may appear as its complement). As a result of the reanalyses, thelo loses any kind of descriptive content and becomes associated with logical content. To be more precise, it has no predicative properties (i.e. it no longer expresses a relation between two individuals, or between an individual and an unrealised eventuality). What remains is just the modal notion of unrealised event.
The same effect can be observed more or less in the same way for all the other modals under consideration (although the derived meaning can differ as a function of the different lexical source and the feature the reanalysed element realises). The crucial point is that merger in a functional position and loss of argument structure go together. So semantic bleaching is not just the random loss of content. It is rather the retention of logical meaning and the loss of non-logical content. Here, the notion of logical content is best glossed in terms of permutation invariance (see Mostowski (1957), Sher (1997)). The basic intuition here is that the logical content is independent of the external factors, or in von Fintel’s (1995) words, insensitive to facts about the world. The following quotation from his work captures the intuition behind permutation invariance (p. 179): The intuition is that logicality means being insensitive to specific facts about the world. For example, the quantifier all expresses a purely mathematical relationship between two sets of individuals (the subset relation). Its semantics would not be affected if we switched a couple of individuals while keeping the cardinality of the two sets constant. There

couldn’t be a logical item all blonde because it would be sensitive to more than numerical relations. More formally, consider the following definitions from the discussion in Sher (1996: 518):

(43) “[Logical] quantifiers should not allow us to distinguish between different elements [of the underlying universe]” (Mostowski (1957: 13), parentheses supplied by Sher).

(44) An A-quantifier is logical iff it is invariant under permutations of A (or, more precisely, permutations of P(A) induced by permutations of A). (An A-quantifier is a set of subsets of a universe A or a function from subsets of A to truth values).

In other words, as Sher says “$Q_A$ is logical iff for any permutation [p] of A and any subset B of A, $Q_A(B) = Q_A(p'(B))$, where p’ is the permutation of P(A) induced by p” (p. 518). The definition is elaborated as follows ((45) is attributed to Lindström (1966)):

(45) A term is logical iff it is invariant under isomorphic structures. (Sher (1996: 520))

Here a “structure” means an (n + 1)-tuple $\langle A, D_1, \ldots, D_n \rangle$, where $A \neq \emptyset$ and each $D_i$, $1 \le i \le n$, is a member of A, a subset of A or a relation of A. Again, a quotation from Sher can elucidate this definition: “A term invariant under isomorphic structures takes into account only the mathematical structure of its arguments in a given universe. Since individuals are, semantically, atomic elements, they are all structurally identical, and their difference is not detected by any logical (structural) term.” Sher points out that the definition in (45) includes the following elements as logical: cardinality quantifiers, the 1st-order identity relation and “most”-type quantifiers, “more than”, the membership predicate and the relational predicate “well-ordering”. These are elements (with the possible exception of the last one) which are naturally construed as D- or T-elements. (45) excludes predicates such as “is tall,” “is a relation between humans” etc., i.e. lexical predicates.
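The permutation-invariance test in (44) can be tried out on a toy universe. The following is only a brute-force sketch under stated assumptions: determiners are modelled as relations between a restrictor set and a scope set (following the remark that all expresses the subset relation), and the universe, the set `BLONDE` and all function names are invented for illustration. A determiner passes if no permutation of the universe ever changes its truth value.

```python
from itertools import combinations, permutations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for size in range(len(items) + 1)
            for c in combinations(items, size)]


def is_permutation_invariant(quantifier, universe) -> bool:
    """True iff swapping individuals never changes the quantifier's verdict."""
    items = sorted(universe)
    all_subsets = subsets(universe)
    for image in permutations(items):
        p = dict(zip(items, image))          # one permutation of the universe
        for restrictor in all_subsets:
            for scope in all_subsets:
                moved_restrictor = frozenset(p[x] for x in restrictor)
                moved_scope = frozenset(p[x] for x in scope)
                if quantifier(restrictor, scope) != quantifier(
                        moved_restrictor, moved_scope):
                    return False
    return True


UNIVERSE = {"ann", "bo", "cy"}        # hypothetical individuals
BLONDE = frozenset({"ann"})           # a contingent fact about the world


def all_(restrictor, scope):
    return restrictor <= scope                     # subset relation


def most(restrictor, scope):
    return len(restrictor & scope) > len(restrictor - scope)


def all_blonde(restrictor, scope):
    return (restrictor & BLONDE) <= scope          # depends on who is blonde


print(is_permutation_invariant(all_, UNIVERSE))
print(is_permutation_invariant(most, UNIVERSE))
print(is_permutation_invariant(all_blonde, UNIVERSE))
```

The hypothetical `all_blonde` fails because exchanging a blonde individual for a non-blonde one can flip its truth value, which is the sense in which it is sensitive to more than numerical relations.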
We believe that (45), or something very like it, may be the key to a formal characterisation of the nature of functional categories. It then follows that lexical material which grammaticalises as functional material loses all semantic content which cannot be construed under (45). For verbs, this entails the loss of argument structure; for nouns, the loss of descriptive content; for adjectives, the loss of descriptive content (cf. the discussion of “whole” > “all” below and in 4.4 of R&R); for Prepositions, the loss of content relating to spatial relations (cf. the discussion of Greek kata in 4.4 of R&R). In other words, because it involves a particular kind of change in syntactic category, grammaticalisation strips away the descriptive content and leaves the

logical content associated with the reanalysed element. Because the content of functional heads is limited to logical content, when a lexical element becomes functional, it loses all non-logical content. We understand non-logical content in terms of permutation/isomorphism invariance, as described above. The example von Fintel (1995) discusses is that of a universal quantifier, e.g. all. We considered the grammaticalisation of this element in Greek out of the adjective holos > olos (Chapter 4, Section 4.4 of R&R). Following Haspelmath (1995: 367), we assumed that the distributive/universal reading arises in combination with a collective noun, so that an expression like the whole family can be interpreted as all the family/every member of the family. The reanalysis is given in (46) below:

(46) [QP ola [DP ta [.. [NP spitia]]]]

Notice that by being merged as a Q element, olos can no longer modify the noun, i.e. there is no descriptive content to relate to that of the noun. Similar effects have been attested in our discussion of n-words in French for example. In Chapter 4, Section 4.2.2 of R&R we considered in detail the reanalysis of words like rien and personne. In this case reanalysis is a bit more complex as the negative meaning arises as a function of the changes that affected D inside the DP (the unavailability of a null D as an indefinite marker) and the Agree relation with clausal Neg. Leaving these changes aside what is crucial is that due to these factors the original descriptive content of the noun is reanalysed as the restriction on a quantificational relation (of course, the restriction represents non-logical content, but as n-words in Num, personne, etc. denote logical relations rather than purely descriptive content; this may be a way to characterise a semi-functional element; cf. also the discussion of thelo when merged in v above).
We have observed similar developments in other cases: nouns lose all their descriptive content in being reanalysed as negators (see 4.2.3 of R&R), and so elements such as pas, dhen, and shi lose all nominal content and simply become negators. In the cases just discussed above we have a lexical item being reanalysed as a functional one. We also allowed for a functional element being reanalysed as a functional one (e.g. modal particles and complementisers in Chapter 3 of R&R). In these cases as well we argued that there is a semantic change involved. Given that the source of the reanalysis is a functional feature the only difference here is that there is a switch from one type of logical element to another. In a way this is what makes these cases a bit less interesting than the ones discussed above in terms of the semantic change (but not in terms of the structural change involved). The notions discussed here open up the possibility of characterising functional categories in a new way. Many functional elements clearly have logical meanings, in a sense that seems very close to that defined above. This is clearly true of quantificational elements in DP (occupying D, Q, or Num). It is true also of modal elements, to the extent that these quantify over

possible worlds. It is also true of negation, as this just defines a complement relation between sets. It may be true of Tense and Aspect, to the extent that these notions can be construed as quantification over times or events. Alternatively, Tense may be an ordering predicate (Stowell (1996)), another kind of logical relation (see above). Complementisers, to the extent that they may be factive or realis, are connected to modality. The various degree markers which may make up a functional system associated with AP are also logical, as indicated in the discussion above. The status of demonstratives is unclear, however; perhaps Dem is not a functional category after all (this would not materially alter our discussion in 4.1 of R&R; we could treat Dem as an AP). Finally, Agreement cannot be functional on this definition (see the discussions in 1.3 and 4.5 of R&R), although, if split into Person and Number, as suggested in 4.6 of R&R, Number could qualify. The question of the status of Person features also relates to the demonstratives; essentially the question here is whether 1st- and 2nd-person features can be seen as logical elements. Of course, the above remarks are somewhat sketchy. But the important point to note is that if in the lexical > functional reanalysis what remains is the logical content, and given that the new item becomes the realisation of a functional feature, then this logical content must be in fact the content of the functional feature in question. In other words grammaticalisation provides us with a good way to understand the properties of functional categories. The way to understand ‘semantic bleaching’ then is precisely in terms of creating items whose meaning is purely logical, in the sense of isomorphism invariance as discussed above.

6.3.2 Speculations on Phonological Reduction

We have repeatedly observed that grammaticalisation involves “phonological reduction” of the grammaticalised element.
Indeed, if we consider our cases in (1–18), we can observe that most of them involve some process of this type: English modals developed unstressed and reduced forms in the 16th century (see Plank (1984)); habere clearly reduced first to a clitic and then an affix in becoming the future/conditional forms of various Romance languages; Greek thelo + na reduces to tha; hina reduces to na; Latin modo reduces to Calabrian mu; complementiser that can reduce, while demonstrative that cannot (as we commented in 3.4 of R&R); Latin ille reduced to a monosyllabic form as an article (and object clitic) in Romance (Giusti (2001) suggests this was a crucial step in the reanalysis of these elements); Greek oudhen reduced to dhen as part of its reanalysis as negation; kan + pjoios reduces to kapjos; free relatives undergo a spectacular morphophonological reduction in becoming free-choice indefinites or universal quantifiers; and pronouns undergo a certain amount of reduction in becoming agreement markers. Thirteen out of 18 of our cases of grammaticalisation involve phonological reduction of some type.10

There is nothing novel in this observation. For example, Hopper & Traugott (1993: 7) discuss the “cline of grammaticality” in (47), and similar observations have been frequently made:

(47) content item > grammatical word > clitic > inflectional affix

Of course, phonological change goes on all the time, and is in principle quite independent of syntactic change. But the kind of phonological reduction which is associated with grammaticalisation appears to be more radical than standard phonological change, and of course only affects grammaticalised elements; it is not exceptionless in the traditional neogrammarian sense (cf. the different possibilities of reducing the vowel of the noun can and that of the modal can to schwa). Here we would like to relate these observations regarding phonological reduction to observations about the prosodic nature of functional categories in the synchronic phonological literature. We mentioned in Chapter 1, section 1.3 of R&R, that phonologically realised functional elements are typically unstressed and “light”. In fact, it is clear that in many languages they fall below certain threshold prosodic values. In English, for example, monomoraic CV words are not found in the lexical vocabulary (Kenstowicz (1994: 640)). For this reason, one can define a minimal word in English as being bimoraic. However, Kenstowicz (1994: 642) notes that “elements drawn from the nonlexical class of pronouns, prepositions, and grammatical particles frequently escape minimality restrictions”. McCarthy & Prince (1986) make the notion of minimality more precise in terms of the prosodic hierarchy in (48):

(48) PrWd (prosodic word)
       |
     F (foot)
       |
     σ (syllable)
       |
     μ (mora)

Every foot must be binary, i.e. disyllabic or bimoraic, and so monosyllabic or monomoraic items cannot be feet, and therefore cannot be phonological words. Kenstowicz (1994: 643) comments “A degenerate element can escape the normal stress rules and hence cliticize. Thus, clitics tend to be monosyllabic. These prosodic dwarfs reside primarily in the nonlexical region of the vocabulary”. Phonologically realised functional elements are thus typically subminimal in terms of the prosodic system of the language, and as such liable to cliticise. Since clitics are phonologically bound elements, there is a natural propensity to reanalyse them as morphologically bound elements, i.e. affixes; in fact in section 6.2.3 above we treated this development as F*Move/Merge > F*Move. In these terms, then, we can naturally relate the emergence of F*Move/Merge to the clitic status of the exponent of F.

Here is a list of “weak forms” in English, from Gimson (1980: 261–263):

(49)
Word                Unaccented            Accented
a                   ǝ                     ei
am                  m, ǝm                 æm
an                  n, ǝn                 æn
and                 ǝnd, nd, ǝn, n        ænd
are                 ǝ(r)                  a(r)
as                  ǝz                    æz
at                  ǝt                    æt
be                  bi:                   bi:
been                bi:n                  bi:n
but                 bǝt                   bʌt
can (aux.)          kǝn, kn               kæn
could               kǝd, kd               kud
do (aux.)           du, dǝ, d             du:
does (aux.)         dǝz, z, s             dʌz
for                 fǝ(r)                 fo:(r)
from                frǝm                  from
had (aux.)          hǝd, ǝd, d            hæd
has (aux.)          hǝz, ǝz, z, s         hæz
have (aux.)         hǝv, ǝv, v            hæv
he                  hi:, i:, i:           hi:
her                 hǝ, ɜ:, ǝ             hɜ(r)
him                 i:m                   hi:m
his                 i:z                   hi:z
is                  s, z                  i:z
me                  mi:                   mi:
must                mǝst, mǝs             mʌst
not                 nt, n                 not
of                  ǝv, v, ǝ              ov
saint               sǝnt, snt, sǝn, sn    seint
shall               šǝl, šl               šæl
she                 ši:                   ši:
should              šǝd, šd               šud
sir                 sǝ(r)                 sɜ(r)
some (pl. indef.)   sǝm, sm               sʌm
than                ðǝn, ðn               ðæn
that (C)            ðǝt                   ðæt
the                 ðǝ, ði:               ði:
them                ðǝm, ǝm, m            ðǝm
there (expl.)       ðǝ(r)                 ðεǝ(r)
to                  tǝ, tu                tu:
us                  ǝs, s                 ʌs
was                 wǝz                   woz
we                  wi:                   wi:
were                wǝ(r)                 wɜ(r)
who                 hu, u:, u             hu:
will                l                     wi:l
would               wǝd, ǝd, d            wud
you                 ju                    ju:

Here a striking pattern emerges in nearly every case: the unaccented form is subminimal (it is almost always monomoraic, i.e. containing something smaller than CVC or CV:), the accented form is at least bimoraic, and nearly all the elements capable of being unaccented are functional. The unaccented forms are of course the usual ones in connected speech, unless the element in question is contrastively stressed. All the CVC unaccented forms contain schwa, which cannot be stressed in most varieties of English. So we observe a very clear correlation between functional elements and having the status of a “prosodic dwarf” to use Kenstowicz’s term.11 This correlation is borne out by the observation that certain items may be either functional or lexical; when functional they can be unaccented, when lexical they cannot be. We have already commented on can (noun) vs. can (aux) in this respect, as well as that in Chapter 3, 3.4 of R&R. Above we see that both do and have pattern in this way: main-verb do, for example, as in I do university administration every morning cannot be reduced to /dǝ/ or /d/, unlike auxiliary do in an example like Do universities serve any purpose? Similarly, the ability to reduce have correlates with its auxiliary syntax, as the following examples show:

(50) a. John hasn’t left.
     b. %John hasn’t a car.
(51) a. John’s left.
     b. %John’s a car. (in the interpretation “John has a car”)

The cases of some and there are similar, although the singular quantifier some, which cannot reduce, may be a problem. In general, then, we can observe a correlation between prosodic subminimality and functional elements. We can therefore understand why grammaticalisation, understood as the creation of new functional material, may involve phonological reduction. Where a lexical element is reanalysed as functional, it must involve phonological reduction if functional categories are required to be subminimal.
The evidence we have seen is compatible with the idea that functional categories are in fact obligatorily subminimal. Italian, as discussed in Vogel (1999), provides further evidence for the same conclusion. Vogel (citing Bullock (1991), Repetti (1989, 1991), Vogel (1994)) assumes that the minimal word in Italian consists of a bimoraic foot. She points out that rather few words are actually minimal (mentioning bel (“beautiful (m.sg.)”) and fai (“you(sg) do”) among others). Among the subminimal words of Italian are the pronominal clitics, which consist of a single light syllable. Other elements which consist just of a single light syllable are the articles and the complementisers che, di, and a (again, note that the latter two are also prepositions). So here we observe a similar correlation between functional categories and prosodic subminimality to the one we saw for English. No doubt this kind of observation could be repeated in other languages.
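The minimality reasoning used here for English and Italian can be made concrete with a small sketch. It is an illustrative simplification, not a phonological analysis taken from the text: the function names, the CV-skeleton encoding and the weighting scheme (onsets carry no weight, a short vowel counts one mora, a long vowel written `V:` counts two, any coda adds one) are all assumptions. A form counts as subminimal if it falls below the bimoraic threshold.

```python
MINIMAL_WORD_MORAS = 2  # bimoraic minimal word, as assumed for English


def mora_count(skeleton: str) -> int:
    """Count moras in a one-syllable CV skeleton such as 'CV', 'CV:' or 'CVC'."""
    rhyme = skeleton.lstrip("C")      # onset consonants carry no weight
    if not rhyme:                      # e.g. 'C': no vowel nucleus at all
        return 0
    if rhyme.startswith("V:"):         # long vowel
        moras, coda = 2, rhyme[2:]
    elif rhyme.startswith("V"):        # short vowel
        moras, coda = 1, rhyme[1:]
    else:
        raise ValueError(f"unexpected skeleton: {skeleton!r}")
    if coda:
        if set(coda) != {"C"}:
            raise ValueError(f"unexpected coda in: {skeleton!r}")
        moras += 1                     # simplification: any coda adds one mora
    return moras


def is_subminimal(skeleton: str) -> bool:
    """True if the form is too light to be a minimal word."""
    return mora_count(skeleton) < MINIMAL_WORD_MORAS


# Skeletons for a few forms mentioned in the discussion (encoding assumed).
examples = {
    "the, unaccented (CV)": "CV",
    "can, accented (CVC)": "CVC",
    "do, accented (CV:)": "CV:",
    "will, unaccented (C)": "C",
}
for label, skeleton in examples.items():
    print(label, mora_count(skeleton), is_subminimal(skeleton))
```

On this counting, unaccented CVC forms with schwa come out as bimoraic, so the sketch does not by itself capture their special status; the text attributes that to the fact that schwa cannot be stressed.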

Now, the above discussion has naturally concentrated on F*Merge, the case where a functional category has an overt realisation. The other possible parametric values of functional categories are also phonologically defective, and in fact can be seen in the same light as being prosodically subminimal. F*Merge/Move is naturally related to the clitic status of the exponent of F, as mentioned above. F, the case where there is no phonological realisation of the category, can clearly be seen as an extreme case of subminimality. In Chapter 1 of R&R we observed that lexical elements always have a phonological matrix (although they may of course be subject to operations like ellipsis and gapping, however these are to be characterised; in Chapters Two and Three of R&R we postulated structures containing radically empty VPs and NPs, but this does not alter the fact that the lexical entries of lexical categories are always associated with a phonological matrix); this can also be understood in terms of the minimal word requirement applying to fully lexical items. Finally, F*Move can be thought of as a phonological specification: the *-diacritic requires an element lacking its own lexically given phonological matrix to have one, and so it triggers movement (before Spell Out) of some appropriate element (presumably one it Agrees with; see Chomsky (2000, 2001)). So the parametric properties of functional categories are all differing instances of the fact that these categories are prosodically subminimal. One final point: we are adopting the standard view in principles-and-parameters theory that language acquisition involves parameter setting. The parameters we have proposed all involve the realisations (or the lack of realisation) of functional categories. In this section we have suggested that functional categories are inherently prosodically defective.
This means that many of the cues (those in (36b) above) for parameter settings reside in perceptually nonsalient parts of the input string: unstressed, subminimal formatives. This naturally places a burden on the language acquirer and creates the possibility of the kind of “mis-setting” of parameters that leads to language change (of course, movement itself is another type of cue for parameter settings; this raises different considerations). In this section, we have suggested that there is a phonological characterisation of functional categories: they are prosodically subminimal elements. In the next section, we will speculate as to why this should be.

6.3.3 A Speculative Characterisation of Functional Categories

In this section, we will try to put together the proposals made in the previous two sections, in order to arrive at a tentative characterisation of the nature of functional categories. Our proposal is that functional categories are inherently defective at the interfaces, and, as such, are categories with highly reduced lexical entries. In 6.3.1 we suggested a semantic characterisation of functional categories as being restricted to purely logical denotations, in the sense of isomorphism invariance as described there. In 6.3.2 we observed that functional categories are typically prosodically subminimal. Putting these two ideas together, we can make the following observation:12

(52) Functional categories are defective at the interfaces.

To see what (52) means, compare a volitional verb, such as English want, with a future marker such as the reduced ’ll of spoken colloquial English (similarly, one could compare Latin habeo with Late Latin/Early Romance aio, or perhaps Classical Greek thelo with Modern Greek tha, although in the latter two cases the prosodic facts are not entirely clear to us). As discussed in 6.3.1, want has an argument structure and a non-logical denotation; it is also larger than the minimal English word (it is CVCC, and therefore clearly bimoraic). Its lexical entry must therefore contain information regarding its argument structure and its prosodic structure; in other words, a range of interface properties need to be specified. Of course, this word also has formal syntactic features, at least V (or perhaps [+V, -N]) and maybe a specification of its Case-assigning properties. The reduced auxiliary ’ll, on the other hand, lacks both argument structure and prosodic structure. Its semantic content is arguably exhausted by the feature Future, its phonological content by /l/. However, it has a syntactic categorial feature, T.13 Thus the basic difference between lexical and functional elements is summarised by (52). Call this the Interface Defectivity Hypothesis (IDH). The IDH allows us to reduce the semantic bleaching and phonological reduction (each construed more precisely as in 6.3.1 and 6.3.2 respectively) associated with grammaticalisation to the kind of syntactic categorial reanalysis which we have documented in the foregoing chapters. Moreover, if the IDH is correct then we have a theoretical tool which we can use to characterise the inventory of functional heads.
This of course should be matched with the empirical evidence attested crosslinguistically. This can also take us one step towards a characterisation of the functional structure of the clause, and of other functional domains, notably DP. We return to this point below. Before discussing the functional hierarchy, let us make one further observation. In recent work (Marantz (1997), Chomsky (2000, 2001: 43)), it has been suggested that categorial features such as N, V, etc. are to be dispensed with. If so, then functional features play a still bigger role in the basic syntactic computations (Merge and Agree in particular) than previously. Moreover, lexical elements have no intrinsic formal features at all.14 Combining this idea with the IDH, we arrive at a near-perfect complementarity: functional categories bear all and only the features relevant for the syntactic computation; lexical categories bear all and only the interface features. If the syntax is then seen as the optimal way to satisfy interface properties for a certain array of lexical items, then we can understand why functional categories must be present, as items which make this procedure possible.

Of course, the complementarity between lexical and functional categories is not perfect, in that functional categories can have phonological properties (albeit reduced) and must have logical properties. It follows for Chomsky (2000, 2001) that they must have interpretable features, and we suggested in 6.3.1 that these features are semantically limited in a particular way (see Chomsky (2000: 138–139)). So what needs to be explained is why they can have phonological features. We have already speculated (see 6.2.1) that as far as the computation to LF is concerned, functional categories need have no PF-properties. So we need to understand why they have just the limited PF-properties they have. We conjecture that the answer to this is very simple: we have seen that functional categories lack metrical properties, which are clearly a major feature of phonological structure. Aside from requiring that functional categories lack phonological structure in this sense, no restriction is imposed. Hence functional categories vary randomly, below the minimal level for participation in prosodic structure. This line of reasoning explains the existence of parametric variation with the form it has. We can gloss the IDH in relation to the restriction to logical content on the LF side in the same way. Our claim that functional categories are restricted to logical meanings amounts to treating them as logical constants. Logical constants are the simplest elements of a logical system. More specifically, since functional categories lack interface structure, we can surmise that they can only have logically atomic properties: more complex denotations (involving relations with the world, and for example predicate/argument structure) are not allowed. Finally, we can understand the simplicity metric in (23) in a similar light.
Feature syncretism involves structure (in that the two features must stand in some kind of relation), and so is to be avoided. To summarise these speculations, then, we can conclude as follows:

(53) Functional categories are atomic, in that they (preferentially) lack structure in syntax, and obligatorily lack it at the interfaces.

(53) can explain semantic bleaching, phonological reduction, the nature and existence of parametric variation with the properties assumed here and, through (23), the nature of syntactic change and the existence of markedness. As such, it goes a long way towards explaining many of the apparent imperfections of language, including not least the propensity to variation and change in time and space. For these reasons, although it is highly speculative, we think that (53), and its congeners (23) and the IDH, are worth thinking about.

6.3.4 Remarks on the Functional Hierarchy

Here we restrict ourselves to a few comments on the functional hierarchy, applying some of the conclusions of the foregoing sections where relevant. Of course, this is a very large topic which we cannot begin to do justice to here, and our remarks should be taken as anything but definitive.

In order to determine what the functional hierarchy is, two things are required. The first is to identify the number of possible functional heads. In the previous section, we offered a general characterisation of functional heads, which ought in principle to be able to provide an answer to this question. The second is to establish the ordering of these functional heads, i.e. how exactly the universal ordering is derived. Cinque’s (1999) system is an attempt to characterise these positions in terms of the empirical evidence provided by the distribution and realisation of adverbial, auxiliary, and affixal elements. This distribution determines not only the nature of functional categories but also their respective order. We gave a preliminary version of Cinque’s hierarchy for the “IP-domain” in Chapter 1, (18) of R&R, which we repeat here as (54):

(54) MoodSpeech Act MoodEvaluative MoodEvidential ModEpistemic T(Past) T(Future) MoodIrrealis ModNecessity ModPossibility AspHabitual AspRepetitive(I) AspFrequentative(I) AspCelerative(I) ModVolitional AspCelerative(i) T(Anterior) AspTerminative AspContinuative AspPerfect(?) AspRetrospective AspProximative AspDurative AspGeneric/progressive AspProspective AspSgCompletive(I) AspPlCompletive Voice AspCelerative(II) AspSgCompletive(II) AspRepetitive(II) AspFrequentative(II) AspSgCompletive(II)
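A fixed hierarchy of this kind amounts to a total order over heads, which can be sketched as a ranked list. The following is a minimal illustration, assuming only the first nine heads of (54) as given above; the identifier spellings and the helper names `is_higher` and `respects_hierarchy` are hypothetical and introduced purely for exposition.

```python
# Excerpt from the top of the hierarchy in (54), highest head first.
# Only the first nine heads are listed; identifier spellings are ours.
HIERARCHY_EXCERPT = [
    "MoodSpeechAct",
    "MoodEvaluative",
    "MoodEvidential",
    "ModEpistemic",
    "T(Past)",
    "T(Future)",
    "MoodIrrealis",
    "ModNecessity",
    "ModPossibility",
]

# Map each head to its position; a lower rank means structurally higher.
RANK = {head: i for i, head in enumerate(HIERARCHY_EXCERPT)}


def is_higher(a: str, b: str) -> bool:
    """True if head `a` is structurally higher than head `b`."""
    return RANK[a] < RANK[b]


def respects_hierarchy(sequence: list) -> bool:
    """True if the heads occur in strictly descending structural order."""
    ranks = [RANK[head] for head in sequence]
    return all(x < y for x, y in zip(ranks, ranks[1:]))


if __name__ == "__main__":
    print(is_higher("ModEpistemic", "T(Past)"))
    print(respects_hierarchy(["MoodEvidential", "T(Future)", "ModPossibility"]))
    print(respects_hierarchy(["T(Past)", "ModEpistemic"]))
```

A sequence may skip heads and still respect the order; what the sketch rules out is a lower head preceding a higher one.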

The resulting order roughly involves a series of aspectual heads above V; above these lies a series of modal heads, above which are the T heads; finally, above these is a further series of modal heads. The latter may relate to discourse properties. The question that arises in a system of this type is where DP arguments fit, or for example where quantificational elements (e.g. wh-words) go. A related question arises with respect to DP-internal structure, although Cinque assumes that it parallels the structure found in the clausal system, as the positions and ordering of attributive adjectives show. In addition, a structure of this type is assumed to be embedded under a complex C-system of the type proposed by Rizzi (1997) (and see Chapter 3 of R&R). On the other hand, Chomsky (1995, 2000, 2001) takes a rather conservative view and identifies three basic functional heads, C, T, and v, leaving open the possibility that these are cover terms for more complex systems. Just by looking at these two rather different approaches we can identify a common theme: the division of the clausal domain into three basic parts (above V, above T, and above C). This is in fact rather well accepted in the literature (cf. Cardinaletti & Starke (1999), Grohmann (2000), Platzack (2001)). It is further supported by Belletti’s (2004) proposal to iterate the projections found in the C-system (Topic, Focus, etc.) in the space immediately above VP (the right periphery). A rather similar stand is taken by Manzini & Savoia (2005), who identify these positions with clitic shells (in the sense of Sportiche (1996)) and argue that each shell can be projected above V, above I, and above C (simultaneously in some cases). The clitic shell consists of the positions typically associated with the DP, thus bringing out the intuition that there is a correspondence between the nominal and the clausal structure.

As the above brief discussion shows, there is no general agreement either on the exact number or on the nature of the functional categories involved, although there seems to be consensus on the existence of C and D at least. One very interesting generalisation that emerges, despite the differences of approach and execution, is that functional heads can repeat themselves in different domains: this has been repeatedly observed for negation and modality, and may be true for focus and topic, if Belletti’s proposals are correct. There appears to be a cyclic structure to the functional hierarchy. This also has an important implication for the theory of grammaticalisation that we have been pursuing in the present book. If there is in fact repetition of features within the hierarchy, then certain kinds of reanalysis do not seem to be so unexpected. We raised this point in connection with the similarities between C, D, and P and the reanalysis of P to as a C (M) element (see Chapter 3 of R&R). We also observed similarities between the reanalysis of N to Num (e.g. n-words, see 4.2.1 of R&R) and V to T (e.g. modals, see 1.1 of R&R). It seems likely then that the functional structure consists of the iteration within the different domains of the same sequence of categories. Cardinaletti & Starke (1999: 184f) propose the structure C–Σ–I–lexical category. It is rather difficult to give these categories a general characterisation, and we will not attempt to do so here.
Instead, we simply observe that the same or very similar diachronic processes appear to operate across each domain, as we have seen in the foregoing chapters, and that our characterisation of functional categories predicts three things: (i) that these hierarchies will vary randomly in their PF-realisation from language to language, although all functional categories will remain prosodically subminimal; (ii) the denotations of the functional categories are isomorphism-invariant; (iii) feature syncretism is always avoided. The last point is rather important, in that it suggests that the fields cannot obviously be defined by a system of intersecting features, e.g. C = [+Ref, +V], D = [+Ref, +N], etc. (Cardinaletti & Starke (1999) employ a system of this sort). The only natural alternative is that the systems are defined semantically in terms of the types of individuals they quantify (note that it is natural to see all functional heads as quantificational, given what we said in 6.3.1), viz.: C-heads quantify over propositions, T-heads quantify over events, and D-heads over individuals. This conclusion implies that the way to understand functional structure is by understanding quantification.
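Prediction (iii) rests on the simplicity metric in (23), which compares two representations by the number of formal feature syncretisms they contain. A minimal sketch follows, on the assumption (ours, for illustration) that a representation can be modelled as a list of heads, each a set of formal features, and that every feature beyond the first on a single head counts as one syncretism. The encoding of the two toy representations is likewise hypothetical.

```python
from typing import FrozenSet, List

# Assumed encoding: a head is the set of formal features it realises,
# and a representation is a list of such heads.
Head = FrozenSet[str]
Representation = List[Head]


def syncretisms(rep: Representation) -> int:
    """Count feature syncretisms: each feature beyond the first on one head."""
    return sum(max(len(head) - 1, 0) for head in rep)


def is_simpler(r: Representation, r_prime: Representation) -> bool:
    """R is simpler than R' iff R contains fewer feature syncretisms, as in (23)."""
    return syncretisms(r) < syncretisms(r_prime)


# Toy illustration: one head bearing both T and V features, versus the
# same features distributed over two separate heads.
syncretic = [frozenset({"T", "V"})]
distributed = [frozenset({"T"}), frozenset({"V"})]

if __name__ == "__main__":
    print(syncretisms(syncretic), syncretisms(distributed))
    print(is_simpler(distributed, syncretic))
```

On this encoding, a field defined by intersecting features such as C = [+Ref, +V] would count as syncretic on every head, which is why the text takes such systems to be disfavoured.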

6.4 Conclusion

In this work we have attempted to give a general formal characterisation of grammaticalisation, the process by which new exponents of functional categories are created. We have argued, on the basis of 18 case studies from a range of languages, that grammaticalisation involves structural reanalysis so that some new element comes to be merged in a functional position F. The structural reanalysis is always simplification in the precise sense defined by the simplicity metric in (23), repeated here:

(23) A structural representation R for a substring of input text S is simpler than an alternative representation R’ iff R contains fewer formal feature syncretisms than R’.

We described above how (23) provides the basis for a theory of markedness of parameter values, and how changes which create more marked structures may be consistent with (23) at a local level. Finally, in this chapter we have sketched a general characterisation of functional categories, which can explain why grammaticalisation is always associated with phonological reduction and semantic bleaching. For us, this is the direct consequence of the development of new functional material. One question we have only touched on in passing concerns the type of material which is prone to grammaticalisation. In Chapter 1 of R&R, we mentioned isolated morphological subclasses such as the OE/ME preterit-presents and 2nd-conjugation stative verbs in Latin. It is also no accident that the English premodals were intensional verbs, as of course was Greek thelo, which gave rise to the future marker tha. The particles reanalysed as irrealis markers in M discussed in Chapter 3 of R&R also had an intensional meaning (“in order to” or “unless”). Of the cases discussed in Chapter 4 of R&R, it is clear that generic nouns naturally develop into indefinites of various kinds (and thus into n-words and/or wh-words); we suggested that this happens when the descriptive content of the noun can be reanalysed as the restriction on a quantifier. Finally, the reanalysis of pronouns as agreement markers involves no change in phi-features, but simply a loss of referential properties.
It is not clear what generalisations emerge from all this: we suggest in fact that the reanalysis which underlies grammaticalisation will act on any available lexical material, as long as it can be reconstrued as functional along the lines described above. The variety of cases discussed by Heine & Kuteva (2002) supports this: the noun “child” may be reanalysed as a partitive (p. 67), “ear” as a locative marker (p. 121),15 “song” as a noun classifier (p. 280). So we make no generalisations on this point. Once an element enters the functional system, it will tend to be reanalysed successively upwards in the structure, and this creates grammaticalisation paths, as we have already pointed out. In the Introduction to R&R, we identified several larger themes in the book. One was the tension between a descriptively adequate account of grammaticalisation paths and the standard principles-and-parameters view of language change as a random walk through a space defined by the range of parametric variation. We tried to show in section 6.1.3 above that this tension can be resolved in terms of an independently motivated account of the relative markedness values of different parameters. Markedness effectively creates “basins of attraction” in the parameter space, and thereby causes grammatical systems to “clump” around certain combinations of options. Another issue was the characterisation of a possible functional category. We looked at this in 6.3, and tentatively suggested the IDH, and then the idea that functional categories have no structure, put forward in (53). The IDH is supported by and supports our analysis of grammaticalisation. We believe that in this connection we have been able to provide a new perspective on the nature of functional categories by looking at their diachronic development. Finally, since we are framing our analyses in terms of (a variant of) Chomsky’s minimalist programme, we should ask ourselves whether our work has shed any light on the question of the nature of language as a perfect system. We believe that it may have. We remarked at the end of section 6.3.3 that the general idea that functional categories are atomic in structure may underlie the following properties:

(55) a. semantic bleaching (they have the minimal LF-structure, cf. the IDH)
b. phonological reduction (they have the minimal PF-structure, cf. the IDH)
c. the nature and existence of parametric variation (PF is indifferent to all properties except prosodic subminimality, hence random PF-variation)
d. the nature of syntactic change (random diachronic variation in PF-properties)
e. markedness (the interaction of (23), derived from (53), with (d)).

Explaining the nature of syntactic change, as we believe our proposals do, entails explaining the nature of synchronic variation, since synchronic variation is just the result of diachronic changes. So our proposals about functional categories explain not just the nature, but the existence, of parametric variation. If the above ideas are correct, a major feature of natural language is accounted for in a straightforward way.
The existence of functional categories, movement and parametric variation is quite mysterious in the framework of Chomsky (1995, 2000, 2001) and is considered to be at least an apparent imperfection of the system. In the approach outlined here, we can explain the imperfections in terms of (53). Interestingly, (53) is a natural aspect of a perfect system; it is simply a definition of the atomic elements of the system. But the system in question, the computational system of the syntax, must interface with LF and PF. The simplest system-internal properties give rise to system-external complications, particularly in the case of the PF interface, precisely because the syntax is indifferent to certain formal aspects of the interfaces. Since the PF-interface creates the input to language acquisition, the imperfect mapping from syntax to PF gives rise to variation and change in acquisition, and therefore in grammatical systems generally.

We conclude that Chomsky’s conjecture that the computational system that forms the syntax is perfect is not impugned by the existence of such an apparent imperfection as variation and change in time and space. These properties, i.e. the simple existence of different grammatical systems, follow from the interactions, or lack thereof, between the computational system and the PF interface.

Notes

1. Tabor & Closs-Traugott (1998) provide a formulation of grammaticalisation in terms of ‘Structural Scope Expansion’ which is very much in line with our approach. However, they argue that it is not clear that this approach can extend to all cases of grammaticalisation.

2. Note that our approach takes the notion of ‘unidirectionality’ to work to the extent that it can be structurally defined. In this respect we differ from standard functionalist approaches to grammaticalisation (but see Note 1). Furthermore, there is nothing in our approach that prevents instances of degrammaticalisation from taking place, yielding a lexical category out of a functional one (cf. the cases discussed in Newmeyer (1998: Chapter 5)). In Chapter 4, Note 21 of R&R, we mentioned the case of me:dhen > midhen (= zero) in Greek, which involves a quantifier becoming a lexical category (noun). We can account for this on the assumption that the other negative quantifiers dropped out of the system (and oudhen became the negator dhen), and me:dhen was no longer analysed as an element consisting of two (or three) morphemes, but was reanalysed as a single lexical item. This is further supported by the fact that as a Noun it can be preceded by the definite article (e.g. to midhen). We will not discuss these cases here, but we think that this is a rather good example to show that in our terms degrammaticalisation is indeed possible, albeit rather sporadic. On the other hand, grammaticalisation works in a very systematic way.

3. “Displacement” refers to a perturbation of the expected order, which we take to be given by UG in the form of a functional hierarchy. We will discuss the functional hierarchy in the next section.

4. In Greek; in Germanic, this was a case of F*Move/Merge > F*Merge.

5. In Chomsky’s (2000, 2001) terms, we could say that T had an EPP-feature. However, it is entirely unclear how such a mysterious property could be innovated.

6. One could extend this line of reasoning, following recent proposals by Giorgi & Pianesi (1997), and say that F can be entirely absent from the representation, but will be “read in” at LF by convention. On the other hand, F+f has to be syntactically present in order to be interpreted. Once syntactically present, F+f is parametrised, and so might be PF-realised. Cinque (1999: 133) criticises the Giorgi & Pianesi approach on the grounds that it leads to two ways of giving a default value for F: F is either present with the default value or absent and interpreted with a default value. In terms of the proposals being made here, though, we could think that F can only be present with a default value if PF-realised, and this is a case of formal markedness, as defined here, and so distinct from the maximal default. The maximally unmarked case is then where F has no PF-realisation and the default LF interpretation. It is natural to think of this as the absence of F from the numeration. What this idea requires, of course, is a theory of LF which can tell us how the defaults are filled in.

7. The concept of uniformitarianism was first put forward by the 18th-century geologist James Hutton. Hutton’s idea was that the features of the earth had evolved over long periods of time through processes of erosion, etc., rather than having been divinely created. The term became known thanks to Lyell (1830). See Ruhlen (1987: 25ff.).

8. Kayne (2000: 6–8) discusses the number of parameters and the number of grammatical systems, and makes an interesting and rather plausible case that there are at least as many grammatical systems in the world as there are people, i.e. upwards of 5 billion. Despite initial appearances, this conclusion does not alter the point being made in the text: if there are so many grammatical systems, then vast numbers of them differ only slightly from one another. But we still need to allow for “macrovariation” for gross properties such as basic word order, etc., and so still need to allow in principle for a wide typological range. Essentially, Kayne’s argument leads one to the conclusion that there may be more different grammatical systems in the world than is usually thought, but they are all clustering around the same basins of attraction. Markedness must, if anything, be a more powerful attractive force in the parameter space if Kayne is right.

9. Postma (1995) makes a similar point. He postulates that every time a lexical item moves its meaning is affected. Since all movement is to functional positions, the meaning is affected so as to become non-lexical.

10. Of the remainder, English to may have taken on the ability to reduce to /tǝ/ at the time of the reanalysis; the situation regarding serial verbs becoming complementisers is not known; Stage-Two negative words in French were already phonologically minimal, as was Greek ti(s). The only true exception to the generalisation regarding phonological reduction therefore concerns the French n-words (personne, rien, etc.), which appear to have undergone no phonological reduction at all in becoming functional. This may be a further reason to consider these items as semi-functional, as briefly suggested in the previous section.

11. There are some Prepositions in the list in (48): at, for, from, of and to. Of these, for and to are also C-elements (see Chapter 3, 3.3 of R&R). Regarding the others, it may be that these are “functional prepositions”; it has often been observed that the class of prepositions may be divided into functional and non-functional elements. A good example of a non-functional preposition is “through”; see the discussion in Chapter 4, 4.4 of R&R. Appellations such as Saint and Sir may be Ds.

12. The idea being put forward here is conceptually similar to Cardinaletti & Starke’s (1999) proposal regarding structural deficiency. The implementation of the intuition is rather different, however.

13. We are slightly simplifying matters here in order to illustrate our proposal. Will has a residual volitional sense, visible in particular in examples like (i):
(i) I won’t do it!
It also has a CVC form, and as such is (just) a minimal word. These facts may indicate that will is in fact semi-functional; cf. the suggestion in Chapter 1, 1.1 of R&R, that root modals are inserted in v.

14. This entails not viewing θ-roles as formal features, pace Hornstein (1999), and taking Accusative Case to be a property of v (as is standard).

15. In fact, it is not clear whether this case fulfils our criterion for grammaticalisation. See the discussion of “through” in 4.4 of R&R.

References

Adams, M. 1987. Old French, Null Subjects and Verb-Second Phenomena. PhD dissertation, UCLA.
Alexiadou, A. & G. Fanselow. 2000. Laws of diachrony as the source for syntactic generalizations: The case of V to I. GLOW Newsletter 46:57–58.
Belletti, A. 2004. Aspects of the low IP area. In L. Rizzi (ed) The Structure of CP and IP: The Cartography of Syntactic Structures, Volume 2. New York/Oxford: Oxford University Press, pp. 16–51.
Bickerton, D. 1991. Haunted by the specter of creole genesis. Behavioral and Brain Sciences 14.2:364–366.
Bobaljik, J. 2000. The implication of rich agreement: Why morphology does not drive syntax. GLOW Newsletter 46:28–29.
Bullock, B. 1991. The Mora and the Syllable as Prosodic Licensers in the Lexicon. PhD dissertation, Stanford University.
Bybee, J., R.D. Perkins & W. Pagliuca. 1994. The Evolution of Grammar: Tense, Aspect and Modality in the Languages of the World. Chicago: University of Chicago Press.
Cardinaletti, A. & M. Starke. 1999. The typology of structural deficiency: A case study of the three classes of pronouns. In H. van Riemsdijk (ed) Clitics in the Languages of Europe. Berlin: Mouton de Gruyter, pp. 145–233.
Chomsky, N. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. 2000. Minimalist inquiries: The framework. In R. Martin, D. Michaels & J. Uriagereka (eds) Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik. Cambridge, MA: MIT Press, pp. 89–156.
Chomsky, N. 2001. Derivation by phase. In M. Kenstowicz (ed) Ken Hale: A Life in Language. Cambridge, MA: MIT Press, pp. 1–52.
Cinque, G. 1999. Adverbs and Functional Heads: A Cross-Linguistic Perspective. New York/Oxford: Oxford University Press.
Clark, R. 1990. Papers on Learnability and Natural Selection. Technical Reports in Formal and Computational Linguistics, No. 1. Université de Genève.
Clark, R. & I. Roberts. 1993. A computational model of language learnability and language change. Linguistic Inquiry 24:299–345 [this volume, Chapter 2].
Croft, W. 1994. Typology and Universals. Cambridge: Cambridge University Press.
Ferraresi, G. 1997. Word Order and Phrase Structure in Gothic. PhD dissertation, University of Stuttgart.
Von Fintel, K. 1995. The formal semantics of grammaticalization. NELS 25:175–189.
Gimson, A. 1980. An Introduction to the Pronunciation of English. London: Arnold.
Giorgi, A. & G. Pianesi. 1997. Tense and Aspect: From Semantics to Morphosyntax. New York/Oxford: Oxford University Press.
Giusti, G. 2001. The birth of a functional category: From Latin ILLE to the Romance article and personal pronoun. In G. Cinque & G. Salvi (eds) Current Studies in Italian Syntax: Essays offered to Lorenzo Renzi. Amsterdam: North Holland, pp. 157–171.
Görlach, M. Early Modern English. Cambridge: Cambridge University Press.
Grimshaw, J. 1990. Argument Structure. Cambridge, MA: MIT Press.
Grohmann, K. 2000. Prolific Peripheries: A Radical View from the Left. PhD dissertation, University of Maryland, College Park.
Haeberli, E. 1999. Features, Categories and the Syntax of A-Positions: Synchronic and Diachronic Variation in the Germanic Languages. PhD dissertation, University of Geneva.
Hale, K. & S.J. Keyser. 1993. On argument structure and the lexical expression of syntactic relations. In K. Hale & S.J. Keyser (eds) The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger. Cambridge, MA: MIT Press, pp. 53–109.
Haspelmath, M. 1995. Diachronic sources of ‘all’ and ‘every’. In E. Bach, E. Jelinek, A. Kratzer & B. Partee (eds) Quantification in Natural Languages. Dordrecht: Kluwer, pp. 363–382.
Heine, B. & T. Kuteva. 2002. World Lexicon of Grammaticalization. Cambridge: Cambridge University Press.
Hopper, P. & E. Traugott. 1993. Grammaticalization. Cambridge: Cambridge University Press.
Hornstein, N. 1999. Movement and control. Linguistic Inquiry 30:69–96.
Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kayne, R. 2000. Parameters and Universals. New York/Oxford: Oxford University Press.
Kroch, A. & A. Taylor. 1997. Verb movement in Old and Middle English: Dialect variation and language contact. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 297–325.
Van Kemenade, A. 1987. Syntactic Case and Morphological Case in the History of English. Dordrecht: Foris.
Van Kemenade, A. 1997. V2 and embedded topicalization in Old and Middle English. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 326–352.
Kenstowicz, M. 1994. Phonology in Generative Grammar. Oxford: Blackwell.
Lightfoot, D. 1979. Principles of Diachronic Syntax. Cambridge: Cambridge University Press.
Lightfoot, D. 1991. How to Set Parameters: Arguments from Language Change. Cambridge, MA: MIT Press.
Lightfoot, D. 1998. The Development of Language: Acquisition, Change and Evolution. Oxford: Blackwell.
Lyell, C. 1830. Principles of Geology. London.
Manzini, M.-R. & L. Savoia. 2005. I Dialetti Italiani e Romanci: Morfosintassi Generativa. Alessandria: Edizioni dell’Orso.
Marantz, A. 1997. No Escape from the Syntax: Don’t Try a Morphological Analysis in the Privacy of Your Own Lexicon. Ms., MIT.
McCarthy, J. & A. Prince. 1986. Prosodic Morphology. Ms., Brandeis University.
Mostowski, A. 1957. On a generalization of quantifiers. Fundamenta Mathematicae 44:12–36.
Newmeyer, F. 1998. Language Form and Language Function. Cambridge, MA: MIT Press.
Pintzuk, S. 1991. Phrase Structures in Competition: Variation and Change in Old English Word Order. PhD dissertation, University of Pennsylvania.
Platzack, C. 2001. Multiple interfaces. In U. Nikanne & E. van der Zee (eds) Cognitive Interfaces: Constraints on Linking Cognitive Information. Oxford: Oxford University Press, pp. 295–324.
Plank, F. 1984. The modals story retold. Studies in Language 8:305–364.
Pollock, J.-Y. 1989. Verb movement, UG and the structure of IP. Linguistic Inquiry 20:365–424.
Postma, G. 1995. Zero Semantics. PhD dissertation, Holland Institute of Linguistics.
Repetti, L. 1989. The Bimoraic Norm of Tonic Syllables in Italo-Romance. PhD dissertation, UCLA.
Repetti, L. 1991. A moraic analysis of raddoppiamento sintattico. Rivista di Linguistica 3:307–330.
Rizzi, L. 1997. The fine structure of the left periphery. In L. Haegeman (ed) The New Comparative Syntax. London: Longman, pp. 281–337.
Roberts, I. 1985. Agreement parameters and the development of English modal auxiliaries. Natural Language and Linguistic Theory 3:21–58 [this volume, Chapter 1].
Roberts, I. 1993. Verbs and Diachronic Syntax. Dordrecht: Kluwer.
Roberts, I. 1997. Directionality and word order change in the history of English. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 397–426 [this volume, Chapter 4].
Roberts, I. 1999. Verb movement and markedness. In M. deGraff (ed) Language Creation and Language Change. Cambridge, MA: MIT Press, pp. 287–328 [this volume, Chapter 5].
Roberts, I. 2001. Language change and learnability. In S. Bertolo (ed) Parametric Linguistics and Learnability. Cambridge: Cambridge University Press, pp. 81–125.
Roberts, I. & A. Roussou. 2002. The EPP as a condition on tense dependencies. In P. Svenonius (ed) Subjects, Expletives and the EPP. New York/Oxford: Oxford University Press, pp. 125–156.
Roberts, I. & A. Roussou. 2003. Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press.
Roussou, A. 2005. The syntax of non-volitional θelo in modern Greek. In A. Stavrou & A. Terzi (eds) Advances in Greek Generative Syntax. Amsterdam: Benjamins, pp. 331–360.
Ruhlen, M. 1987. A Guide to the World’s Languages, Volume 1: Classification. London: Edward Arnold.
Sher, G. 1996. Semantics and logic. In S. Lappin (ed) The Handbook of Contemporary Semantic Theory. Oxford: Blackwell, pp. 511–537.
Sportiche, D. 1996. Clitic constructions. In J. Rooryck & L. Zaring (eds) Phrase Structure and the Lexicon. Dordrecht: Kluwer, pp. 213–276.
Stowell, T. 1996. The phrase structure of tense. In J. Rooryck & L. Zaring (eds) Phrase Structure and the Lexicon. Dordrecht: Kluwer, pp. 277–292.
Tabor, W. & E. Closs-Traugott. 1998. Structural scope expansion and grammaticalization. In A. Giacalone-Ramat & P. Hopper (eds) The Limits of Grammaticalization. Amsterdam: Benjamins, pp. 229–272.
Vanelli, L., L. Renzi & P. Benincà. 1986. Typologie des pronoms sujets dans les langues romanes. In Actes du XIIe Congrès de Linguistique et Philologie Romanes.
Aix-en-Provence. Vikner, S. 1997. V-to-I movement and inflection for person in all tenses. In L. Haegeman (ed) The New Comparative Syntax. London: Longman, pp. 189–213. Vogel, I. 1994. Phonological interfaces in Italian. In M. Mazola (ed) Issues and Theory in Romance Linguistics: Selected Papers from the Linguistics Symposium on the Romance Languages XXIII, pp. 109–125. Vogel, I. 1999. Subminimal constituents in prosodic phonology. In S.J. Hannahs & M. Davenport (eds) Issues in Phonological Structure. Amsterdam: Benjamins, pp. 251–267. Warner, A. 1997. The structure of parametric change, and V-movement in the history of English. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 380–393. Whitman, J. 2001. Relabelling. In S. Pintzuk, G. Tsoulas & A. Warner (eds) Diachronic Syntax: Models and Mechanisms. Oxford: Oxford University Press, pp. 220–238. Willis, D. 1998. Syntactic Change in Welsh: A Study of the Loss of Verb-Second. Oxford: OUP.

7

Cascading Parameter Changes
Internally-Driven Change in Middle and Early Modern English

Theresa Biberauer and Ian Roberts

1. Introduction

Keenan (1996: 3) puts forward an important principle of syntactic change, the Inertia Principle, which he formulates as follows:

(1) Things stay as they are unless a force (including decay) acts upon them.

We assume that syntactic change is a consequence of abductive reanalysis leading to parameter-resetting in first-language acquisition (see Lightfoot 1979, 1991, 1999). In that case, we can take (1) to mean that, all other things being equal, the target system in first-language acquisition will be converged on successfully. This is no doubt due to the highly restricted range of analyses of the Primary Linguistic Data (PLD) that Universal Grammar (UG) allows and the limited exposure to PLD needed for parameter fixation, i.e. standard poverty-of-stimulus considerations. Longobardi (2001: 278) adopts Keenan’s principle and puts forward the following very interesting version of it:

(2) Syntactic change should not arise, unless it can be shown to be caused (emphasis his).

In other words, as Longobardi says, “syntax, by itself, is diachronically completely inert” (277–8). In minimalist terms, this means that the computational system of human language (CHL in Chomsky’s (2001, 2004, 2005) terminology) is not itself capable of endogenous change. The question that then arises is under what circumstances syntactic change can in fact happen. This is the central question that we wish to address in this paper.

According to Longobardi’s version of Inertia in (2), syntactic change must be “a well-motivated consequence of other types of change (phonological changes and semantic changes, including the appearance/disappearance of whole lexical items) or, recursively, of other syntactic changes” (2001: 278, emphasis ours: MTB/IGR). Following and elaborating slightly on Longobardi’s point as just quoted, we take it that syntactic change can be caused by changes to PLD arising from independent phonological, morphological or lexical change, or from extra-grammatical factors such as contact. In this paper we intend to develop the idea of recursive syntactic change, i.e. change which arises when an initial, extra-syntactically induced parameter change creates a system which has a propensity to further parametric change. As we show, using data from the history of English, this may lead to cascades of parameter changes over several centuries, giving rise ultimately to a major typological shift and the illusion of “typological drift”, in the sense of Sapir (1921) (cf. Sapir’s (1921: 165) definition of drift as “the vast accumulation of minute modifications which in time results in the complete remodeling of the language”).

We explore this idea by looking at a series of changes which took place in the history of English between 1100 and 1700 and which had the net effect of transforming English from what one might think of as a “typologically standard” West Germanic language into the highly unusual system of Modern English, which has many features unattested in the neighbouring Germanic, Romance and Celtic languages. The changes we look at are the following: the shift from OV to VO (12th and early 13th century); the loss of “residual” OV orders (ca. 1400); the development of clause-internal expletives and of systematic raising of subjects (15th century); the loss of V2 (ca. 1450); the development of the auxiliary system (modals and do) (ca. 1525); the loss of “short” verb-movement (ca. 1575); the contraction of negation (ca. 1600); the development of negative auxiliaries (1630s); and the development of do-support (later 17th century).
The paper is organised as follows. In Section 2, we give the general theoretical background to the analyses we will propose, based on Biberauer & Roberts (2005). In Section 3, we summarise Biberauer & Roberts’ (2005, 2006) analysis of word-order change in Middle English (this covers the first three changes listed above). Section 4 deals with the loss of V2, the development of the auxiliary system and the loss of short V-movement, following the proposals in Biberauer & Roberts (2006); here we also present our analysis of the development of do-support. Section 5 concludes the paper.

2. Theoretical Background: Agree, EPP-Features and Pied-Piping

Chomsky (2001, 2004, 2005) proposes a system of feature-valuing and movement which relies on two main notions: Agree and Extended Projection Principle (EPP) features. Here we will briefly describe this system and how it is applied in the analysis of word-order change in Middle English (ME) put forward by Biberauer & Roberts (2005). Agree is a relation between two heads α and β, where the following conditions hold:

(3) a. α asymmetrically c-commands β;
b. α and β are non-distinct in formal features;
c. there is no third head γ which intervenes between α and β which would be able to Agree with α (i.e. there is no head γ bearing features of the relevant type which asymmetrically c-commands β but not α).

Where Agree holds, α is known as the Probe and β as the Goal. A precondition for Agree is that both the Probe and the Goal must be active, meaning that they must bear unvalued formal features.1 A typical example of the Agree relation is that which holds between T(ense), the head which bears φ-features relating to the subject, and the φ-features of the subject itself, merged in SpecvP, as shown in (4):

(4) [TP T[φ] [vP [DP ... D[φ]] v’ ]]

Here the structural conditions for Agree, given in (3), are satisfied: T asymmetrically c-commands D, they are non-distinct in formal features since both bear φ-features, and there is no head bearing φ-features intervening between them. Thus, if both T and D bear unvalued features, they are able to Agree. T is taken to have unvalued φ-features, while D, being a nominal element, is inherently specified for these features. D, on the other hand, has an unvalued Case feature. T and D in (4) can therefore Agree, with T’s φ-features being valued by D and D’s Case feature in turn being valued as Nominative by (finite) T in virtue of this Agree relation. This account therefore captures the inherent relation between Nominative Case and agreement with the subject.

Note, however, that the account of feature-valuing outlined above makes no reference to movement. In many languages, of course, the DP in (4) raises to the specifier of the head with which it Agrees, i.e. to SpecTP, the “canonical subject position” since Chomsky (1982). In terms of the theory we adopt here, this operation is in principle independent of Agree, although related to it. More specifically, movement in the Probe-Goal system under discussion here only takes place where the target of movement (i.e. the Probe) bears an EPP-feature.2 Thus if the Probe involved in an Agree relation between two heads bears an EPP-feature, the Goal will raise to the Probe, either to a head-adjoined position or to a specifier, depending on the structural status (head vs XP) of the category moved. In (4), this means that if T bears an EPP-feature, either a D-head will adjoin to T or a DP will raise to create a TP-specifier. This latter operation is what happens in Modern English (NE) and in many other languages. We construe the EPP-feature as a feature of a feature, i.e. as being specifically associated with (a) particular feature(s) of the Probe (cf. Pesetsky & Torrego 2001: 359). Thus where the EPP-feature is associated with, for example, D-features on T, rather than with T’s Tense features, we represent this as EPPD, etc.

A question that now remains is what determines whether a head or an XP undergoes movement. We propose that this depends on pied-piping. The dissociation of feature-valuing from movement makes it clear that a category larger than the Goal, but containing the Goal, may be moved. As we saw above, feature-valuing under Agree is a relation between heads, while the EPP-feature simply requires that the Goal must be moved, but does not in fact necessarily require that only the Goal be moved; it may therefore allow or require, as a matter of parametric variation, that a category larger than the Goal, but containing the Goal, be moved. This is the dimension of parametric variation that is explored in detail in Richards & Biberauer (2005), Biberauer & Richards (2006), Biberauer & Roberts (2005, 2006) and in Section 3 below. More generally, the pied-piping option is relevant in a configuration of the type in (5):

(5) ... XPROBE ... [YP ... ZGOAL ...] ...

Here X Agrees with Z, and, where X has an EPP-feature, UG allows cross-linguistic variation as to whether ZGOAL moves to X or the larger category YP containing ZGOAL moves to X. Movement of the larger category is pied-piping. A well-known example of a cross-linguistic difference of the type in question is the option of pied-piping as opposed to preposition-stranding in the case of wh-movement of the object of a preposition. Consider the contrast between English and French illustrated in (6):

(6) a. *Qui as-tu parlé à?
[A qui] as-tu parlé?
to whom have-you spoken
b. Who did you speak to?
[To whom] did you speak?

As shown above, French requires pied-piping of the PP, while English allows preposition-stranding as well as pied-piping. These are parametric options instantiating the schema in (5), since the wh-expression is the Goal of Agree (i.e. Z; the Probe (X) in this case is a [+wh] C), and PP is YP. With these technical preliminaries behind us, we can now move on to the syntactic changes in the history of English that we are interested in.
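The Probe-Goal mechanics described in this section can be made concrete with a small Python sketch. It is a toy illustration and not an implementation drawn from the works cited: the class `Head`, the functions `agree` and `moved_category`, and the feature labels `"phi"` and `"Case"` are all hypothetical names introduced here. Two simplifying assumptions are made: asymmetric c-command is modelled as a closest-first list of heads, and a head counts as a potential Goal only if it carries a value for the probed feature and is itself active.

```python
from dataclasses import dataclass, field


@dataclass
class Head:
    """A syntactic head with valued and unvalued formal features (toy model)."""
    label: str
    valued: dict = field(default_factory=dict)    # e.g. {"phi": "3sg"}
    unvalued: set = field(default_factory=set)    # e.g. {"Case"}
    epp: bool = False                             # does this head trigger movement?

    @property
    def active(self):
        # A head is active if it still bears an unvalued feature.
        return bool(self.unvalued)


def agree(probe, domain, feature):
    """Value `feature` on `probe` from the closest eligible Goal, cf. (3).

    `domain` lists the heads that `probe` asymmetrically c-commands,
    closest first (an assumption standing in for (3a) and (3c)).
    Returns the Goal, or None if Agree fails.
    """
    if feature not in probe.unvalued:
        return None                               # Probe not active for this feature
    for head in domain:
        if feature in head.valued and head.active:
            probe.valued[feature] = head.valued[feature]
            probe.unvalued.discard(feature)
            if probe.label == "T" and "Case" in head.unvalued:
                head.unvalued.discard("Case")
                head.valued["Case"] = "Nom"       # finite T values Nominative
            return head
    return None


def moved_category(probe, goal_phrase, container, pied_pipe):
    """Which category moves, if any: the Goal's phrase or a larger container."""
    if not probe.epp:
        return None                               # no EPP-feature, no movement
    return container if pied_pipe else goal_phrase


# The configuration in (4): T probes past v (no phi-features) and finds D.
T = Head("T", unvalued={"phi"}, epp=True)
v = Head("v")
D = Head("D", valued={"phi": "3sg"}, unvalued={"Case"})

goal = agree(T, [v, D], "phi")
print(goal.label, T.valued, D.valued)
print(moved_category(T, "DP", "vP", pied_pipe=False))   # stranding: DP moves
print(moved_category(T, "DP", "vP", pied_pipe=True))    # pied-piping: vP moves
```

Keeping `agree` and `moved_category` separate mirrors the point made in the text that feature-valuing and movement are independent: the same Agree relation is compatible with no movement, movement of the Goal alone, or movement of a larger category containing it.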

3. Word Order Changes in Middle English

Biberauer & Roberts (2005) (B&R) propose an analysis of Old English (OE) and Middle English (ME) word-order patterns in terms of which the patterns attested at the various stages of OE and ME are analysed as the output of a single grammar which permits restricted types of variation. As we shall see, the variation in question is exactly like that in (5) and (6) above. Their analysis is “Kaynian”, in that, following Roberts (1997 [this volume, Chapter 4]), van der Wurff (1997, 1999) and Fischer et al. (2000), they assume that the underlying word order throughout the history of English is head-initial (this follows from the Linear Correspondence Axiom of Kayne (1994); see Roberts (1997 [this volume, Chapter 4]) for discussion of this in relation to OE).3 B&R propose that West Germanic-like OE word orders, such as SOVAux in subordinate clauses (main-clause order is consistently complicated by the effects of Verb Second), were derived by the application of two types of “large XP” movement: VP-raising to SpecvP and vP-raising to SpecTP. To see how this works, consider an SOVAux example like (7):

(7) Ða se Wisdom þa þis fitte asungen hæfde . . .
when the Wisdom then this poem sung had
“When Wisdom had sung this poem . . .”
(Boethius 30.68.6; Fischer et al. 2000: 143, 25)

The order observed in (7) is obtained by means of the operations given in (8) in the order shown:4

(8) (i) V-to-v raising:
[vP [v V+v] [VP (V) O]]
(ii) VP-to-(inner)SpecvP movement:
[vP [VP (V) O] [v’ [v V+v] (VP)]]
(iii) merger of the subject in the topmost SpecvP:
[vP S [v’ [VP (V) O] [v’ [v V+v] (VP)]]]
(iv) vP-movement to SpecTP:
[TP [vP S [v’ [VP (V) O] [v’ [v V+v] (VP)]]] [T’ [T hæfde] (vP)]]

In (8i) we illustrate V-movement to the “light verb” position, v. Following Marantz (1997) and Chomsky (2004: 112, 122), we assume that this operation is universal and is required in order to “verbalise” the acategorial root (which we continue to write as V for convenience). The movement of VP to SpecvP shown in (8ii) is a case of pied-piping of the type discussed in the previous section. Here v probes the D-features of the object, and has an EPPD feature. The object is the Goal of Agree with v, but where we have the V-final order in (7) as opposed to a “leaking” order (see below), the larger category containing the object DP, namely VP, moves. This is a parametric option in OE. The effect of moving the remnant, verbless VP, i.e. [VP (V) DP] (we indicate “traces” of moved categories in parentheses), is therefore to create the surface order OV. (8iii) demonstrates merger of the subject DP in SpecvP.5 (8iv) shows a second instance of pied-piping, exactly analogous to the one in (8ii) but at a higher structural level. Following merger of hæfde (see Note 4 on the position of auxiliaries), T probes the D-features of the subject, and looks to satisfy its EPPD feature. The subject is the Goal of Agree with T, but where we have the VAux order shown in (7), the larger category containing the subject DP, namely vP, moves. This is a further parametric option in OE. The effect of moving the vP, along with the other operations seen in (8), is therefore to create the surface order SOVAux.

It is, of course, well known that the word orders exhibited by OE are not restricted to the SOVAux order considered above. B&R show how all of the other available orders, except those where the object is final (on which see below), can be derived by assuming that the EPPD features of v and T may, in fact, both be satisfied either by means of pied-piping (i.e. moving VP and vP as discussed above) or by moving just the Goal DP, and thus “stranding” VP/vP-internal material. Thus OE EPPD-satisfaction is directly analogous to EPPwh-satisfaction in the case of extraction of the complement of a preposition in NE, in that both the stranding and the pied-piping options are available. The stranding option in the vP-domain gives rise to the derived structure (9), as opposed to (8ii):

(9) [vP O [v’ [v V+v] [VP (V) (O)]]]


The principal consequence of the object-raising derivation illustrated in (9) is that any VP-internal material additional to the verb itself and the direct object, e.g. indirect objects, PPs, adverbial material, particles, etc., will appear in a postverbal position. So this option explains the attestation of “leaking” structures in OE. We can further explain the fact that languages such as German disallow “leaking” by saying that German does not allow the “stranding” option in this case (in other words, German EPPD satisfaction is parallel to French wh-movement from PPs). B&R are thus able to account for both the V-final and the “leaking” orders in OE in terms of a single grammar with an option for pied-piping vs stranding as regards EPPD satisfaction in the vP domain. B&R further argue that the same options were found in the TP-domain. (10) gives the structure that results if the “stranding” option is taken in place of vP-fronting at the stage of the derivation illustrated in (8iv) above:

(10) [TP S [T’ [T hæfde] [vP (S) [v’ [VP (V) O] [v’ [v V+v] (VP)]]]]]

The surface order that results here is SAuxOV, i.e. an order that is often referred to as “verb-projection raising” (see van Kemenade (1987) on these orders in OE and Haegeman & van Riemsdijk (1986) on this order in Swiss German and West Flemish; we consider structures of this kind in more detail below).6 In (10), we have “stranding” in the TP-domain, but pied-piping in the vP-domain. Stranding in both domains gives (11):

(11) [TP S [T’ [T hæfde] [vP (S) [v’ O [v’ [v V+v] [VP (V) (O)]]]]]]

Again we have SAuxOV, the “verb-projection raising” order, but this time with leaking of VP-internal material. This order, too, is attested in OE. Examples of the orders in (9–11) are given in (12):

(12) a. þa geat mon þæt attor ut on þære sæ
then poured man that poison out on the sea
“Then someone poured the poison out on the sea”
(Orosius 258.16; Lightfoot 1991: 61, 18b)
b. . . . þæt hi mihton swa bealdlice Godes geleafan bodian
that they could so boldly God’s faith preach
“. . . that they could preach God’s faith so boldly”
(ÆCHom. I, 16.232.23; Fischer et al. 2000: 156, 48)
c. . . . þæt mon hæfde anfiteatrum geworht æt Hierusalem
that man had amphitheatre made at Jerusalem
“. . . that man had made an amphitheatre at Jerusalem”
(Orosius, Or_6:31.150.22.3120; Trips 2002: 81, 23)

(12a) illustrates the “stranding” mode of EPPD satisfaction in the vP domain: only the object DP, þæt attor, raises, stranding the particle ut and the PP on þære sæ. (12b) illustrates “stranding” in the TP domain, with hi raising independently of the rest of the vP, swa bealdlice Godes geleafan bodian, to satisfy T’s EPPD feature. Finally, (12c) shows that it was also possible for just the subject (mon) and just the object (anfiteatrum) to raise to satisfy T and v’s EPP-features. B&R show how postulating a grammar which permits the option of moving just the Goal DP alongside the possibility of pied-piping a larger constituent containing that DP enables one to account for the attested, stable synchronic variation in OE.

They furthermore argue that the approach described above also affords a principled account of the word-order changes that took place in ME. The basic idea is that the grammar changed from one which allowed both the VP/vP pied-piping option and the “stranding” (i.e. DP-movement) option for satisfaction of v and T’s EPPD features to one which allowed only the latter mode of satisfaction.
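The way the two binary choices combine can be tabulated with a short Python sketch. This is a deliberately crude linearisation model written for illustration, not B&R's formal system: the function name `linearise`, the mode labels and the placeholder `X` (any VP-internal material other than the verb and the direct object) are assumptions made here. One further assumption is flagged in the comments: when vP is pied-piped after stranding at the v-level, the stranded VP-internal material is placed after the auxiliary.

```python
from itertools import product

MODES = ("pied-pipe", "strand")


def linearise(v_mode, t_mode, extra=("X",)):
    """Surface order of a subordinate clause with an auxiliary in T (toy model).

    v_mode: how v's EPP-D feature is satisfied
        "pied-pipe": the remnant VP moves to SpecvP, so the object and any
                     extra VP-internal material both precede the verb
        "strand":    only the object DP moves; extra material stays in VP
    t_mode: how T's EPP-D feature is satisfied
        "pied-pipe": vP moves to SpecTP
        "strand":    only the subject DP moves
    """
    if v_mode not in MODES or t_mode not in MODES:
        raise ValueError("modes must be 'pied-pipe' or 'strand'")

    if v_mode == "pied-pipe":
        fronted, in_situ = ["O", *extra], []
    else:
        fronted, in_situ = ["O"], list(extra)

    vp_edge = fronted + ["V"]            # what follows the subject inside vP

    if t_mode == "pied-pipe":
        # Assumption: material left inside VP is linearised after the auxiliary.
        order = ["S"] + vp_edge + ["Aux"] + in_situ
    else:
        order = ["S", "Aux"] + vp_edge + in_situ
    return " ".join(order)


for v_mode, t_mode in product(MODES, repeat=2):
    print(f"v: {v_mode:9}  T: {t_mode:9}  ->  {linearise(v_mode, t_mode)}")
```

With `extra=()` the model returns `S O V Aux` for vP pied-piping and `S Aux O V` for subject stranding, whichever v-level option is chosen. This is the string ambiguity at the v-level that the reanalysis discussed next relies on: only additional VP-internal material distinguishes remnant-VP fronting from object movement.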
B&R propose that this change first occurred at the v-level, in the 12th or early 13th century (see Canale (1978), van Kemenade (1987) and Lightfoot (1991) in this connection). The loss of VP-pied-piping involved a reanalysis of simple OV orders whereby the remnant-VP fronting was reanalysed as object-movement. This can be illustrated with the following example:

(13) The man the apple ate.
a. [vP S [VP (V) Obj] [v V+v ] (VP) ]
b. [vP S Obj [v V+v ] [VP (V) (Obj) ]]

B&R suggest that the cause of this reanalysis was a decrease in unambiguous evidence for pied-piping. A grammar allowing both pied-piping and stranding generates a larger language than one which only allows one of the two options, and is therefore less highly valued if one assumes the Subset Principle. In terms of this principle, originally put forward in Berwick (1985), “the learner selects the grammar that generates the smallest possible language that is compatible with the data” (Manzini & Wexler 1987: 425).7 In the OE context, the Subset Principle required the OE system, with its optionality of pied-piping vs. stranding, to be robustly triggered by examples of the sort illustrated in (12) above. Arguably, however, the pied-piping option was less robustly triggered in Early ME than before. To see this, it is important to realise that the basic difference between the conservative ((a)) and the innovative ((b)) structures in (13) is that the innovative structure allows only the object to feature in preverbal position, with any remaining VP-internal material following the verb, whereas the conservative grammar allows all VP-internal material to surface preverbally (although it need not do so). Given that finite clausal complements always appeared postverbally,8 the principal constructions where one can distinguish the two systems are verb-particle constructions and double-object constructions (we will discuss non-finite complements in Section 4). Verb-particle constructions with the order object – particle – V must be analysed as involving a pied-piping grammar, as the particle is fronted along with the object in the remnant VP (since Koster (1975) it has been assumed that the particle is merged in a VP-internal position in West Germanic). So this order would have triggered the pied-piping grammar, and is clearly found in OE (cf. Pintzuk 1991: 76f., Fischer et al. 2000: 185f.).
However, it has often been remarked that verb-particle constructions become vanishingly rare in the 12th and 13th centuries (Spasov 1966, cited in Kroch & Taylor 2000: 146); it is possible that this was due to the influx of French borrowings at this period, replacing earlier verb-particle constructions with a simple verb. Thus this important trigger for the pied-piping grammar may have been removed, or at least rendered less robust than formerly, owing to an entirely extraneous lexical factor. A second extraneous factor may have been at work in the case of ditransitive constructions. In these constructions, the order direct object – indirect object – V would have triggered the pied-piping grammar. Again, this order is attested in OE (cf. van Kemenade 1987, Koopman 1990, 1994, Allen 1995, Koopman & van der Wurff 2000). However, during early ME, the distinction between accusative and dative case was lost; Allen (1995: 158f.) shows in detail that the system had broken down in all the ME dialects except Kentish by the end of the 13th century at the latest (see her table 10.1, p. 441). One consequence of this was a rise in prepositional datives. The use of a PP to express indirect objects gives rise to greater positional freedom for these arguments, and consequently a greater incidence of “leaking”, and a correspondingly less frequent instantiation of the order triggering the conservative, pied-piping grammar. We propose, then, that the two factors just described would have undermined the trigger experience for the grammar with the pied-piping option. As a result, the word order changed in the way we observe. The word-order changes are thus the consequence of a reanalysis of the ever more liberal “stranding”-permitting pied-piping grammar as one which specifically targets DPs.

It is important to note that the reanalysis in (13) did not eliminate OV order, but that it simply changed the structure of OV sentences. Subsequently, starting from around 1400, object-movement of the type shown in (13b) became restricted to negative and quantified objects (van der Wurff 1997, 1999). This restriction of v’s D-attracting property (arguably to [+Op] DPs) led to an overall increase in the number of VO orders in the PLD. As a result, many instances of vP-movement of the type shown in (8iv) above were in fact indistinguishable from simple DP-movement of the subject of the type seen in (10). This led to the reanalysis in (14). We again illustrate the reanalysis with a simple example:

(14) The man ate an apple.
a. [TP [vP Subj [v V+v ] [VP (V) Obj ]] T (vP) ] (conservative)
b. [TP Subj T [vP (Subj) [v V+v ] [VP (V) Obj ]]] (innovative)

Whilst (14) illustrates the basic point that the reanalysis does not affect the surface word order in a simple SVO example of this type, it is, as given, not quite correct. Assuming that auxiliaries surface in T (see Note 4), (14a) predicts that the conservative grammar allowed the unattested SVOAux order. To solve this problem, B&R appeal to the fact that VP and everything it contains is inaccessible to syntactic operations once the derivation has proceeded past vP (this follows from the version of the Phase Impenetrability Condition (PIC) of Chomsky (2000)). As a consequence of this, the VP-internal object cannot surface in the position in the linear order indicated in (14a), but instead appears in the pre-vP-movement position following the surface position of the auxiliary in T. Thus, thanks to the PIC, the unattested SVOAux order cannot arise. The correct representation for (14a) is therefore (14a’), where the constituents indicated in outline have already been transferred to the interfaces and are therefore unavailable for syntactic operations:

(14a’) [TP [vP Subj [v Vv]] T([VP [V] Obj [VP V Obj ]])]

(14) illustrates a simplification parallel to that in (13). In (14a’), vP is pied-piped to SpecTP in order to satisfy T’s EPPD feature. In (14b), the subject alone raises to satisfy the same feature. We are thus once again dealing with pied-piping as opposed to “stranding”. The important point here is that, in the absence of clause-internal adverbial modification and auxiliaries (on which, see below), both structures give rise to the same linear order (SVO). The consequence of this is that there is, in cases of this kind, no unambiguous trigger for the more complex pied-piping operation. Moreover, as in the case of (13), the Subset Principle disfavours the grammar with the pied-piping option, since this generates a bigger language than one without it. Pied-piping must therefore be robustly triggered, and B&R suggest that by the 15th century, it was not.
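The role the Subset Principle plays in this argument can be illustrated with a minimal Python sketch. Everything in it is a simplification introduced for illustration: the names `language`, `GRAMMARS` and `subset_learner` are hypothetical, languages are reduced to sets of surface-order strings, and the object is assumed to remain inside VP, as after the restriction of object-movement described above. The strings follow the discussion of (14): without an auxiliary both modes yield SVO, while with an auxiliary in T pied-piping of vP yields SVAuxO and subject stranding yields SAuxVO.

```python
def language(t_modes):
    """Surface strings generated when T's EPP-D feature may be satisfied
    in any of `t_modes` (assumes the object stays inside VP)."""
    strings = set()
    for mode in t_modes:
        strings.add("S V O")             # no auxiliary: same string either way
        if mode == "pied-pipe":
            strings.add("S V Aux O")     # vP fronted; object spelled out after T
        else:
            strings.add("S Aux V O")     # subject alone raises
    return frozenset(strings)


# Two candidate grammars; the second generates a superset of the first.
GRAMMARS = {
    "stranding only": language({"strand"}),
    "pied-piping or stranding": language({"strand", "pied-pipe"}),
}


def subset_learner(pld):
    """Pick the grammar generating the smallest language compatible with the
    data. Language size stands in for the subset relation, which is safe here
    because the candidate languages are nested."""
    compatible = {name: lang for name, lang in GRAMMARS.items()
                  if set(pld) <= lang}
    if not compatible:
        return None
    return min(compatible, key=lambda name: len(compatible[name]))


print(subset_learner(["S V O", "S Aux V O"]))     # ambiguous data only
print(subset_learner(["S V O", "S V Aux O"]))     # unambiguous pied-piping trigger
```

In this sketch the larger grammar is selected only when the data contain a string that the smaller grammar cannot generate, which is the sense in which pied-piping must be robustly triggered.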

Biclausal structures initially provided an environment in which the conservative structure in (14a’) and the innovative structure in (14b) gave rise to different orders. These were thus important triggers for the conservative grammar. (14a’) gives rise to surface SVAuxO and (14b) gives rise to SAuxVO (recall that we are using the cover term “Aux” for restructuring verbs; see Note 4). Let us look at how the conservative grammar operated in biclausal cases, as this will also help us to see how the restriction on object-movement described above created the circumstances for the loss of the pied-piping option in biclausal environments, a development which also had important consequences for monoclausal structures. As first pointed out in van Kemenade (1987: 55f.), modal, causative and perception verbs were V(P)R (i.e. restructuring) triggers in OE, a state of affairs that entailed that the infinitival Vs selected by these verbs followed their selectors, as illustrated in (15):

(15) a. . . . þe æfre on gefeohte his handa wolde afylan.
who ever in battle his hands would defile
“. . . whoever would defile his hands in battle”
(Ælfric’s Lives of Saints 25.858; Pintzuk 1991: 102, 62)
b. . . . þæt hi mihton swa bealdlice Godes geleafan bodian.
that they could so boldly God’s faith preach
“. . . that they could preach God’s faith so boldly”
(ÆCHom I, 16.232.23; Fischer et al. 2000: 156)

(15a), with the order OAuxV, is known as the “verb raising” (VR) order; (15b), with AuxOV, is one case of “verb-projection raising”. Following B&R, we assume the structure in (16) for the complements of restructuring verbs (VR) in OE and Early ME; we take a VR structure of the kind in (15a) by way of illustration (note that the matrix vP is omitted):

(16) [TP DP-Subj [T’ T ... [VP VR [TPINF [vP [DP PRO] [v’ [VP (V) DP-Obj] [v’ (V+v) (VP)]]] [T’ [T V+v T] (vP)]]]]]

238 Theresa Biberauer and Ian Roberts We are assuming that the complement of a restructuring verb is a TP (cf. i.a. Wurmbrand 2001 and Lee-Schoenfeld 2005 for arguments in favour of the idea that restructuring complements are “smaller” than other clausal complements). In the context of the theoretical framework we are assuming here, the specific assumption is that restructuring complements are TPs headed by a “defective” T, i.e. one that is not selected by C (cf. B&R 14f.). For our present purposes, this idea has the important consequence that the material in the restructuring complement is not sent to Spellout prior to merger of VR, the way material in the clausal complements of non-restructuring verbs is (owing to the PIC; cf. the discussion of the object in (14a’) above). This accounts for the “clause union” effects commonly associated with restructuring structures. Let us see how our analysis of V(P)R works in more detail. The derivation of the VR order in (15a) proceeds by the following steps. First, as we saw in (8i), V moves to v inside the vP of the embedded clause. Second, as in (8ii), the remnant VP moves to SpecvP. Third, V+v moves to T in the infinitival clause. This operation is the key to deriving the Aux-V order here; Biberauer & Roberts (2006) take this infinitive-movement to be triggered by a selectional property of the main-clause verb VR. They assume the selectional property to be the nature of the (defective) TP that VR selects. The last step in the derivation of a VR structure is remnant vP movement to the specifier of the selected T (this is another instance of “pied-piping” satisfying an EPP-feature). The loss of generalised object movement described above had the effect in the V(P)R context that vP-movement to the lower SpecTP would not be distinguishable in terms of the surface string from just subject movement. 
To see how this works, consider the structure in (17), which illustrates vP-movement to Spec-TP in a structure where the object has not undergone raising: (17)

[TP DP-Subj [T’ T ... [VP VR [TP [vP [DP PRO] [v’ tv]] [T’ [T V+v T] tvP]]]]]
Already sent to Spellout: [VP tv DP-Obj]

As in the case of the direct object in (14a’), the VP indicated in outline here is merged as the complement of the lower v, and thanks to the operation of the PIC, this material is sent to Spellout and therefore becomes inaccessible for further operations as soon as the lower vP is completed. Hence, movement of this vP to SpecTP has no effect on the surface position of the object, which remains final. We thus straightforwardly derive optional VO orders in the complements of VR in both OE and ME.9 Moreover, in (17) the choice between pied-piping vP to the lower SpecTP and exclusively raising the subject to that position, which was operative throughout ME, has absolutely no effect on the surface order of elements, since the only overt material in vP which the PIC would allow to be spelled out in its moved position is the subject, which, in this case, is PRO, i.e. an element which cannot be assigned phonological form.10 As in the case of (14) above, we therefore once again see the relationship between the two changes: when the object is spelled out in postverbal position, crucial evidence in favour of the pied-piping option at the T-level is obscured. Thus the loss of generalised object movement had the consequence that the trigger experience began to feature many more structures for which it was impossible to distinguish subject-raising from vP-raising on the basis of the surface string. Because of the PIC then, acquirers had no evidence to distinguish a derivation involving pied-piping of vP to satisfy T’s EPPD feature from one in which only the subject moves to satisfy that feature. It is of course possible that the presence of vP-adverbials or other modifiers might disambiguate the two derivations, but in the vast majority of cases the ambiguity would have been present. We take it that this situation led to the reanalysis of (17) as (18): (18)

[TP DP-Subj [T’ T ... [VP VR [TP [DP PRO] [T’ [T V+v T] [vP (PRO) [v’ (v) [VP (V) DP-Obj]]]]]]]]

As the structure in (18) shows, the fronted vP in infinitival contexts may have contained no overt material at all: an empty subject (here indicated as PRO) and the trace/copy of v (see Note 10). Recall that VP has already been sent to Spellout, and hence is not realised in the moved position. Given the lack of evidence for vP-movement, the simpler option of DP-movement was preferred (assuming that language acquirers always take the simplest option consistent with the trigger experience, where simplicity is taken to mean the smallest structure consistent with the input; see Clark & Roberts 1993 [this volume, Chapter 2]); vP-movement was therefore lost as a means of satisfying T’s EPPD feature. This concludes our account of the loss of vP-pied-piping. We now consider the empirical consequences of this loss. The reanalysis of vP-movement as subject-movement had two major consequences, both deriving from the fact that T’s EPPD feature, in the innovative grammar, could only be satisfied by a DP in SpecTP. The two consequences were that (i) expletive insertion became obligatory where no appropriate, raisable subject was available, and (ii) movement of DP into SpecTP became obligatory in passives and unaccusatives. B&R illustrate both of these consequences in detail. They further show that both expletives and subject raising were options prior to the 15th century, owing to the fact that DP-raising to SpecTP was, in the conservative grammar, an available means of satisfying T’s EPPD feature. After the change in (14), however, this was the only way of satisfying T’s feature, and so expletive insertion and DP-raising became obligatory. B&R therefore provide a natural account both for the extended period of variation during which expletives and subject-raising were simply optional and for the fact that the change that ultimately took place went in the direction that it did: optionality is to be expected while the grammar has at its disposal two modes of EPP-satisfaction, but once the trigger experience for one of these modes has become insufficiently robust, language acquirers will opt for a simpler grammar which retains only the robustly attested mode.
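The surface ambiguity between (17) and (18), and the learner's preference for the smaller structure, can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: each derivation is flattened to a list of (label, overt form) pairs, silent elements such as PRO and traces carry an empty form, the Modern English words are placeholders, and the function names and the terminal-count measure of "smallest structure" are ours rather than B&R's.

```python
# Sketch: vP-raising and DP-raising to the lower SpecTP spell out identically.
# Assumption: each terminal is (label, overt_form); "" marks a silent element.

def spell_out(terminals):
    """Return the surface string, skipping phonologically empty elements."""
    return " ".join(form for _, form in terminals if form)

def derivation_17(subj, aux, verb, obj):
    """vP pied-piping: [vP PRO t_v] fronts; VP was already spelled out in situ."""
    moved_vp = [("PRO", ""), ("t_v", "")]  # no overt material is moved
    return ([("DP-Subj", subj), ("VR", aux)] + moved_vp
            + [("V+v in T", verb), ("t_vP", ""), ("DP-Obj", obj)])

def derivation_18(subj, aux, verb, obj):
    """DP-raising: only PRO moves to SpecTP; vP stays in place."""
    return [("DP-Subj", subj), ("VR", aux), ("PRO", ""),
            ("V+v in T", verb), ("DP-Obj", obj)]

def preferred(candidates, observed):
    """Among analyses matching the observed string, pick the smallest one.
    Counting terminals is a stand-in for 'smallest structure' (assumption)."""
    matching = {name: t for name, t in candidates.items()
                if spell_out(t) == observed}
    return min(matching, key=lambda name: len(matching[name]))

args = ("they", "could", "preach", "God's faith")  # placeholder words
candidates = {"(17) vP-raising": derivation_17(*args),
              "(18) DP-raising": derivation_18(*args)}
observed = "they could preach God's faith"

same_string = (spell_out(candidates["(17) vP-raising"])
               == spell_out(candidates["(18) DP-raising"]))
print(same_string)                       # the two analyses are string-identical
print(preferred(candidates, observed))   # the smaller analysis wins
```

Because both derivations yield the same string, nothing in the input favours (17), and the tie is broken purely by size. A vP-adverbial, if added as an overt terminal inside the moved vP, would break the tie in the other direction.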
As we have shown above, the changes that occurred in early ME conspired to create a scenario in which vP-raising became indistinguishable from DP-raising in a majority of contexts, with the consequence that the former mode of EPP-satisfaction was lost. A further consequence of the loss of vP-raising was the loss of the orders usually referred to as Stylistic Fronting (Styl-F; see Biberauer & Roberts (2006)). Kroch & Taylor (2000) argue that ME had this operation, which functioned along lines similar to those typically claimed for Modern Icelandic (see Maling (1990), Holmberg (2000)). The two principal properties of Styl-F are that there must be a subject-gap and that it is subject to an Accessibility Hierarchy which states that negation takes precedence over adverbs which in turn take precedence over participles and other verbal elements. (19) is an example of putative Styl-F in ME: (19) . . . wiþþ all þatt lac þatt offredd wass biforenn Cristess come with all that sacrifice that offered was before Christ’s coming “. . . with all the sacrifice that was made before Christ’s coming” (Ormulum I.55.525; Trips 2002: 306, 123)

(19) is a passivised relative, with the passive participle offredd (“offered”) representing the fronted element. Biberauer & Roberts (2006) propose that cases of Styl-F observed in ME, and V-Aux ordering more generally, involve vP-movement to SpecTP. In their terms, the TP inside the relative clause in an example like (19) has the structure given in (20):
(20) [TP [vP (Op) offredd] [T’ [T wass ] ([vP (Op offredd)) Op biforenn Cristess come])]]
The most important aspect of this structure for the purposes of this paper is that vP, containing the string [Op offredd], has raised from its first-merged position following [T wass] to SpecTP. This operation takes place in order to satisfy T’s EPPD-feature. In the case under consideration, the D-feature is borne by the passive participle offredd, which B&R, following Baker, Johnson & Roberts (1989 [this volume, Chapter 8]), assume to contain the “absorbed” logical subject (cf. also Richards & Biberauer 2005).11,12 Biberauer & Roberts’s (2006) analysis also affords a simple explanation of the loss of “Styl-F”. For them, it is simply a case of the loss of vP-fronting, i.e. the loss of the pied-piping option for satisfaction of T’s EPPD feature. In this section, we have summarised B&R’s account of the word-order changes in ME. We have left out a number of details, but the essential points are as given here: the idea that OE had the option of “stranding” or pied-piping VP- and vP-internal material at both the v and T level for EPPD satisfaction, and the idea that the pied-piping option was lost in two stages in ME: first in the 12th or early 13th century at the v-level, and in the 15th century at the T-level. There was also a further change around 1400 restricting object movement to negative and quantified objects.
The OE grammar had two options at both levels; independent morphological and lexical factors undermined the evidence for one of these options, in such a way that, thanks to the Subset Principle, one of the options was lost. As we have seen, this in fact took place initially at the v-level, and the change at this level, combined with the restriction on object movement, led to the change at the T-level. The first change was in accordance with the Inertia Principle, since it was caused by independent lexical and morphological factors. The change at the T-level was an example of a syntactic change caused by the net effects of two earlier syntactic changes. This thus provides an initial example of the “cascade” effect which we discussed in the Introduction.

4. The Loss of V2 and the Rise of the Auxiliary System

Let us turn now to the loss of V2 in the 15th century. We can date this change to approximately 1450 (cf. van Kemenade (1987: 219f.), Fischer et al. (2000: 133f.)). Starting with van Kemenade (ibid.), it has often been suggested that V2 was lost through “decliticisation”. This idea is related to a well-known OE phenomenon: the existence of a systematic class of apparent exceptions to V2 where a pronominal clitic was able to intervene between the initial constituent and the verb:

(21) a. hiora untrymnesse he sceal rowian on his heortan.
their weakness he shall atone in his heart
(CP 60.17; Pintzuk (1999: 136))
b. Þin agen geleafa þe hæfþ gehæledne.
thy own faith thee has healed
(BlHom 15.24–25)

Although there are many different analyses of this phenomenon (cf. i.a. van Kemenade (1987), Platzack (1995), Roberts (1996), Kroch & Taylor (1997), Fuss (1998), Fuss & Trips (2002), Haeberli (1999, 2002)), there is general agreement that the clitics do not “count” for the computation of V2. In terms of Chomsky’s (2008) idea that only phase heads can trigger movement, we could postulate that C is the host of the clitic in these cases (and cliticisation is to the left of the host, see Kayne (1994)).13 The core of the decliticisation idea is that, given the string XP–SCL–V, where “SCL” stands for “subject clitic”, if the SCL ceases to be a clitic, then this string is incompatible with V2. Van Kemenade (ibid.) proposes that precisely this decliticisation caused the loss of V2 in English. A consequence of the change in T’s mode of satisfying its EPPD feature discussed in the previous section is that a DP must appear in SpecTP from 1450 onwards, exactly the time of the loss of V2 (cf. van Kemenade’s (1997: 350) observation that “[t]he loss of V2 and the loss of expletive pro-drop [i.e. the development of a requirement for SpecTP always to be filled with a DP – MTB/IGR] .. coincide historically”). We propose the following reanalysis of sequences like those in (21) at this time (see below on the status of the SCL in (22b)): (22)

a. [CP XP [C SCL-[C [v V+v ] C]] [TP [vP (SCL) ([vVv])] T vP ]] > b. [CP XP C [TP SCL [T [v V+v ]] vP ]]

In (22a), T takes the pied-piping option for satisfaction of its EPPD feature, so vP moves to SpecTP. SCL cliticises to C, an operation which for present purposes we take to involve head-adjunction to the left of C. In (22b), SCL moves to SpecTP to satisfy this feature, as required in the innovative system. The reanalysis is forced by the loss of the pied-piping option. Furthermore, assuming that true clitics can only move to phase heads as we just mentioned, “SCL” in (22b) cannot be a true subject clitic, but must instead be a full subject pronoun. Hence decliticisation follows from the reanalysis in (22). This analysis also accounts for the observed gradualness of the loss of V2 (cf. “the loss of V2 is not an abrupt change, but a rather gradual one” (Haeberli (1999: 406)). Since the conservative grammar allowed the option of DP-movement to SpecTP before the reanalysis took place, the structure

in (22b) was already an option before the reanalysis took place, and so a gradual decline in V2, starting before 1450, is expected. In fact, V2 would have been strictly speaking optional throughout OE and ME (see Haeberli (2002) for discussion of this, and some evidence that this was indeed the case). The reanalysis in (22) was presumably also favoured by the fact that many V2 orders were in any case subject-initial, and such orders were prone to be reanalysed as TPs with V-to-T movement (see Kroch & Taylor (1997), Fischer et al. (2000); see also Adams (1987) and Roberts (1993) on Old French, and Willis (1998) on Middle Welsh). An important consequence of the loss of V2 due to the reanalysis in (22) was that V-to-T movement became a general feature of finite clauses. The same is true in the history of both French and Welsh (cf. the references given in the previous paragraph). Biberauer & Roberts (2010) propose that, in the case of English, this led to a marked option. The reason for this has to do with tense-marking in English. Briefly, Biberauer & Roberts (2010) propose that the trigger for V-to-T movement is not rich agreement morphology (as was proposed by Roberts (1985 [this volume, Chapter 1], 1993), Rohrbacher (1994, 1999), Vikner (1997) and others), but rather rich tense morphology. More concretely, they propose that T has an unvalued V-feature, while V has an unvalued T-feature. T and V thus enter an Agree relation in terms of which T’s V-feature probes its interpretable counterpart on V, with the latter’s T-feature being valued in the process (cf. the discussion in Section 2 above). In English, the reflex of this Agree relation is V’s tense morphology, i.e. “Affix Hopping” in the sense of Chomsky (1957) and much subsequent work is simply valuation of V’s T-features via Agree. The same is true in non-V2 environments in the other Germanic languages (except Icelandic).
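The T/V Agree relation just described can be sketched as value-copying between feature bundles. This is a minimal illustration, not the authors' formalism: the dictionary encoding, the use of `None` for an unvalued feature, the placeholder values `"+"` and `"past"`, and the function names are all our assumptions, and the locality conditions on Agree are not modelled.

```python
# Sketch of Agree as value-copying between attribute-value feature bundles.
# Assumptions: None marks an unvalued feature; "+" and "past" are placeholders.

def agree(probe, goal):
    """Value each unvalued feature on one head from its valued counterpart
    on the other. Both dictionaries are updated in place."""
    for attr in probe.keys() & goal.keys():
        if probe[attr] is None and goal[attr] is not None:
            probe[attr] = goal[attr]
        elif goal[attr] is None and probe[attr] is not None:
            goal[attr] = probe[attr]

def unvalued(head):
    """List the attributes that still lack a value."""
    return sorted(attr for attr, value in head.items() if value is None)

t_head = {"V": None, "T": "past"}   # T: unvalued V-feature
v_head = {"V": "+", "T": None}      # V: unvalued T-feature

agree(t_head, v_head)
print(v_head["T"])                          # V's T-feature after Agree
print(unvalued(t_head), unvalued(v_head))   # features still lacking a value
```

On this encoding, the English tense morphology on V corresponds to the value that ends up on V's T-feature, with no movement involved.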
In Romance, the Agree relation is associated with an EPP-feature on T which triggers V-movement. Most importantly in the present context, Biberauer & Roberts (2010) suggest that the difference between Germanic and Romance is correlated with the richer system of tense-marking in Romance: French and Italian have 5–7 synthetic tenses (depending on register), while Spanish and Portuguese have more. Germanic on the other hand, has at most 4 such tenses, with English and MSc effectively restricted to 2. Biberauer & Roberts (2010) propose that a Romance-style V/T-Agree system cannot be supported in a feebly tense-inflected language like Late ME (unlike Middle French or Middle Welsh, see above). The argument is therefore that the V-to-T movement grammar which resulted from the loss of V2 around 1450 was inherently unstable since a crucial morphological trigger for it—“rich” tense morphology of the Romance kind—was missing. This, it is argued, contributed to the reanalysis of modals and do as auxiliaries in the early 16th century and the subsequent loss of V-to-T movement later in the 16th century. Let us consider in a little more detail how this reanalysis happened.

Recall that the modals were a subclass of the members of VR. Consider again the structure of a sequence containing a modal with an infinitival complement after the reanalysis of (17) as (18). Following Roberts (1993: 262) and Roberts & Roussou (2003: 41–42), we take it that the loss of infinitival inflection, which had taken place by 1500, removed the trigger for V-to-T movement in the complement to VR (the assumption is therefore that the infinitival inflection specifically instantiated features on V that not only entered into an Agree relation with T, but also had to undergo movement under the influence of an associated EPP-feature). In this way, the evidence for the lower functional T-v system was removed from the trigger experience of acquirers. Hence (18) was reanalysed in the early 16th century as monoclausal, with modals being merged in v or T and the lexical verb remaining in V; cf. (23): (23)

[TP DP-Subj [T’ [T Modal] [vP (Subj) [v’ v [VP V Obj]]]]]

This change, again, was rather clearly a simplification, and a significant one in the context of the system at the time: as pointed out by Roberts (1985 [this volume, Chapter 1], 1993, 1999) and Warner (1997), the reanalysis which resulted in (23) in turn contributed to the conditions for the loss of (finite) V-to-T movement later in ENE by creating a system in which, firstly the modals, and, thereafter, increasingly other auxiliaries were always available to lexicalise T. Very importantly, do underwent the same reanalysis as the modals at about the same time (see Denison (1985), Roberts 1993: 292f.). But the system that resulted was not the NE one of obligatory do-support in certain environments, with do ungrammatical everywhere else. Instead, do was always optional, including in positive declaratives. The 16th century was thus the period of what Jespersen (1909–49) called “exuberant” do, exemplified in (24), where do’s non-emphatic nature is evident from the fact that it surfaces in an unstressed metrical slot:

(24) Thus cónscience does make cówards of us áll.
(Shakespeare: Hamlet, III, i. 83; Roberts 1993: 293)

The option of “exuberant” do in all contexts meant that any verb and any tense could be associated with an auxiliary. In other words, the trigger for V-to-T raising was obscured by the development of the auxiliaries, particularly do (Roberts 1999: 293 [this volume, Chapter 5]). Kroch (1989) shows that, although there was variation throughout the ENE period, as Warner (1997: 382–383) observes, the period 1575–1600 seems to be the crucial one as far as the loss of V-to-T movement is concerned. The reanalysis that took place at this time was of the following kind:

(25) a. [TP John [T walk-eth ] . . . [VP .. tV. . . ]]
b. [TP John T .. [VP. . . [V walks ]]]

By now, the verb-auxiliary system is rather similar to that of Modern English, with the exception of the absence of do-support. Do could still be freely inserted in positive declarative clauses, as just noted; conversely, clausal negation could appear without do, giving rise to examples with the order not–V, and no auxiliary (since V-to-T has been lost):

(26) a. Or if there were, it not belongs to you.
[1600: Shakespeare 2 Henry IV, IV, i, 98; Battistella & Lobeck (1988: 33)]
b. Safe on this ground we not fear today to tempt your laughter by our rustic play
[1637: Jonson Sad Shepherd, Prologue 37; Kroch (1989)]

The development of do-support was preceded by the development of forms featuring contracted negation, which took place around 1600, as the following remark by Jespersen (1909–49, V: 429), cited in Roberts (1993: 305), suggests:

The contracted forms seem to have come into use in speech, though not yet in writing, about the year 1600. In a few instances (extremely few) they may be inferred from the metre in Sh[akespeare], though the full form is written.

Around 1600, then, negation contracted onto T, but since V-to-T movement of main verbs had been lost, only auxiliaries were able to be negative.
This gave rise to a new system of clausal negation in which negative auxiliaries were used as the basic marker of clausal negation (it is clear from a range of languages, including those belonging to the Uralic family, Korean, Latin, Afrikaans, and others, that negative auxiliaries are a lexical option selected by a wide range of languages).14 The new class of auxiliaries included negative modals like won’t, can’t, shan’t, etc., but also the non-modal negator don’t/doesn’t/didn’t. Zwicky & Pullum (1983) argue convincingly that the negative auxiliaries must in fact be distinct items in the lexicon: negative n’t must be treated as an inflectional suffix, rather than a clitic, because inflections, but not clitics, trigger stem allomorphy, and n’t clearly triggers such allomorphy (see also Spencer (1991: 381f)). Biberauer & Roberts (2010)

follow this analysis and therefore conclude that negative auxiliaries became part of the English lexicon during the early part of the 17th century. In other words, they propose that the available stock of “T-elements” (i.e. elements lexicalising specifically T-related features) was further increased during the early 17th century by the establishment of negative auxiliaries, and that this lexical factor compounded the morphologically determined system-internal pressure against maintaining a grammar in which lexical content-bearing “main” verbs could undergo raising to T, leading to its rapid demise. Once the negative auxiliaries, including doesn’t, don’t, didn’t, are established as the unmarked expression of clausal negation (probably by the middle of the 17th century; cf. Roberts (1993: 308)), the modern system of do-support comes into being. In this system, merger of do in T depends either on the presence of an “extra” feature on T, in addition to Tense-, V- and D-features (i.e. the interrogative feature Q or the negation feature Neg) or on the presence of a discourse effect, in contexts of emphasis and VP-fronting, as in:

(27) a. John DOES (so/too) smoke.
b. He threatened to smoke Gauloises and [smoke Gauloises] he DID/*he’d --.

The discourse effect is once again required by Chomsky’s (2001: 34) proposal that “optional operations [here: Spellout of the features located in T – MTB/IGR] can apply only if they have an effect on outcome”. We could unite the two cases (Neg/Q-related do-support and discourse effect-related do-support) if we say that the auxiliaries are lexically associated with Neg- and Q-features (the former case giving rise to forms inflected with n’t; the latter not having any overt morphological reflex in English; but cf. Hunzib, Tunica, Gimira and other languages featuring interrogative verbal morphology discussed by Dryer 2013), and that their merger into the structure will thereby guarantee a discourse effect.
If we slightly modify the reanalysis which gave rise to auxiliaries shown in (23) so that the modals and do were merged in v and raised to T in the new structure (which was nevertheless monoclausal, in that the complement to matrix T had lost its T-layer),15 then we could maintain that, although V(-to-v)-to-T was lost by the end of the 16th century, v-to-T remained. In that case, we could think of the development of do-support in the 17th century as a shift from the earlier obligatory v-to-T movement (first fed by V-to-v movement, and as such moving a main verb to T, but later only moving an auxiliary merged in v) to optional v-to-T movement creating a discourse effect. The difference between the two systems concerns the status of phonologically empty v, which in the earlier grammar, until the 17th century, moved to T (i.e. in examples like (26)). In the later grammar, only v containing an auxiliary moved to T. Again, this is a natural simplification of the grammar, given that movement of empty v to T could never be directly

observed in the PLD (cf. a parallel case in the nominal domain discussed in Section 4 above: following the loss of generalised object movement, both vP- and DP-raising in the infinitival TP-domain associated with V(P)R structures during the OE and ME periods resulted in the movement exclusively of empty categories (PRO and a lower copy of v in the former case; see Note 10, and PRO alone in the latter). As indicated in this section, this also led to structural simplification in that the original biclausal structure (18) became monoclausal (23)). This simplification was the final development in the establishment of the present-day English verbal system. To summarise, then: what we have seen in this section is how a series of natural changes affecting verb-movement and the auxiliary system in a language that initially resembled its Germanic relatives rather closely ultimately led to the creation of a verbal system that is unique in the Germanic context. We saw that these changes were initially triggered by the loss of vP-pied-piping, which had specific consequences in the V2 domain, resulting in the reanalysis of V2 structures as TPs (cf. (22) above). Various factors, including the reanalysis of modal-containing structures (cf. (23)), the rise of a class of negative auxiliaries and of do as a non-modal auxiliary, then “remedied” the V-to-T raising system that briefly existed at the end of the 16th century, a system that was unsupportable in (tense-)inflectional terms. The ever-increasing availability of auxiliaries and their establishment as a syntactically distinct class of “T-elements” undermined the trigger for V-to-T raising to a significant extent and ultimately led to a situation in which V-to-v-to-T raising could be reanalysed as v-to-T raising, with only verbs merged in the relevant kind of v (see Note 15), i.e. the auxiliaries, consequently being able to undergo this raising.
The final change was the loss of “empty” v-to-T raising in positive declarative contexts, which resulted in the modern-day system of do-support, do being restricted to contexts in which it has an “interpretive effect”.
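The distribution of do in the modern system, as summarised in this section, can be restated as a small decision procedure. The sketch below is only illustrative: the class `FiniteT`, the feature labels `"Q"` and `"Neg"`, the flag `discourse_effect` and the function `merges_do` are hypothetical names of ours, and the sketch covers only clauses with no modal or other auxiliary.

```python
from dataclasses import dataclass

@dataclass
class FiniteT:
    """Finite T in a clause with no modal or other auxiliary (our assumption)."""
    extra_features: frozenset = frozenset()  # drawn from {"Q", "Neg"}
    discourse_effect: bool = False           # emphasis or VP-fronting

def merges_do(t: FiniteT) -> bool:
    """Modern system: do is merged only if T bears Q or Neg, or if spelling
    out T has a discourse effect; otherwise do is excluded."""
    return bool(t.extra_features & {"Q", "Neg"}) or t.discourse_effect

print(merges_do(FiniteT()))                          # plain positive declarative
print(merges_do(FiniteT(frozenset({"Neg"}))))        # clausal negation
print(merges_do(FiniteT(frozenset({"Q"}))))          # interrogative
print(merges_do(FiniteT(discourse_effect=True)))     # emphasis, as in (27a)
```

The 16th-century system of "exuberant" do would differ in that the first case is not excluded: do was optional there, which a Boolean function of this kind cannot express without an additional "optional" outcome.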

5. Conclusion

The result of the changes described in the foregoing sections is that the OE system with OV, V2, no syntactically distinct auxiliaries and no V-movement in non-V2 clauses developed into the NE system, which is VO, non-V2, and has a class of syntactically distinct positive and negative auxiliaries and do-support, via intermediate steps featuring processes found in neither OE nor NE, such as fully productive V-to-T and object movement restricted to negative and quantified objects. This remarkable series of changes can be seen as a cascade of parametric changes. We can summarise them as follows:

(28) a. Loss of VP-to-SpecvP movement (late 12th/early 13th century)
b. Restriction of object shift to negative and quantified objects (1400)
c. Loss of vP-movement to SpecTP (early 15th century)
d. Loss of V2 (1450)
e. Development of lexical T (modals and do) (1525)
f. Loss of V-to-T (1575)
g. Contraction of negation (1600)
h. Development of negative auxiliaries (1630s)
i. Development of do-support (later 17th century)

It has often been pointed out that English seems to diverge quite radically from the other West Germanic languages. It used to be thought that this had to do with the influence of Norman French, although more recently the effects of Old Norse have sometimes been regarded as responsible for this divergence (see for example Kroch & Taylor (1997), Trips (2002)). We, however, argue that the series of changes in (28) had the net effect of transforming English from a typologically “standard” West Germanic language into the unusual system of Modern English. In this paper, we have tried to show how each change led to the next and how each change, after the initial one, can be ascribed to the interaction of specific system-internal factors. There therefore appears to be no need to invoke contact as a direct cause of the changes as each syntactic change seems to be sufficient to cause the next. The initial change, as we suggested in Section 3, may have been due to extraneous lexical and morphophonological changes, the first perhaps connected to contact with French. We could think of this as “parametric drift”: a cascade of parametric changes diffused through parts of the functional-category system over a fairly long period of time. This point emerges more clearly if we restate the parameters in more technical terms (with the exception of (28g), which was initially a purely phonological change):

(29) a. Loss of pied-piping to satisfy v’s EPPD feature, which may have been optional throughout the attested OE period (thus guaranteeing the “interpretive effect” of defocalisation of material to the left of V; see Note 9).
b. Loss of v’s optional EPP-feature, but retention of specialised EPPD on v (see (a)).
c. Loss of pied-piping to satisfy T’s EPPD feature.
d. (Matrix) C loses EPP-feature triggering T-movement.
e. Modal and aspectual features of T realised by Merge.
f. v loses EPP-feature triggering V-movement (but see Note 15).
g. Possibly not a syntactic change.
h. Negative features of clause realised by Merge in T.
i. T loses obligatory feature triggering v-movement.

So we observe a series of small, incremental changes to the formal feature make-up of the core functional categories C, T and v. Taken together, they give rise to a major reorganisation of the English verb-placement and auxiliary system, and have created a system which is quite unlike anything found elsewhere in Germanic (or Romance).16
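For readers who wish to manipulate the cascade, the changes in (28) can be paired with the functional head that (29) identifies as the locus of each change. The encoding below is a convenience sketch only: the tuple layout and the function name are ours, the dates are copied as given in (28) rather than converted to numbers, and the head assignments are our reading of (29), with (28g) left unassigned since it is described as possibly not syntactic.

```python
from collections import Counter

# (label, change as listed in (28), date as given in (28), head per (29))
CASCADE = [
    ("a", "Loss of VP-to-SpecvP movement", "late 12th/early 13th century", "v"),
    ("b", "Restriction of object shift to negative and quantified objects", "1400", "v"),
    ("c", "Loss of vP-movement to SpecTP", "early 15th century", "T"),
    ("d", "Loss of V2", "1450", "C"),
    ("e", "Development of lexical T (modals and do)", "1525", "T"),
    ("f", "Loss of V-to-T", "1575", "v"),
    ("g", "Contraction of negation", "1600", None),  # possibly not syntactic
    ("h", "Development of negative auxiliaries", "1630s", "T"),
    ("i", "Development of do-support", "later 17th century", "T"),
]

def changes_by_head(cascade):
    """Count how many steps of the cascade target each functional head."""
    return Counter(head for *_, head in cascade if head is not None)

counts = changes_by_head(CASCADE)
for head in ("C", "T", "v"):
    print(head, counts[head])
```

On this reading, every one of the three core heads is affected at least once, with the changes beginning at v and ending at T.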

What causes the cascade effect? To answer this we need to understand exactly what is meant by the “propensity to change” alluded to above. The key idea, due to Lightfoot (1979), is that “grammars practice therapy, not prophylaxis”. Essentially, each parameter change skews the PLD in such a way that the next is favoured, perhaps in concert with other pre-existing factors (such as the existence of subject- and object-clitics with their particular behaviour in V2 contexts, as discussed in Section 4 (and see Note 13)). We have seen in the description above how each successive change was favoured. Let us now look at this in a little more detail. The crucial trigger for VP-pied-piping to SpecvP was the occurrence of VP-internal material other than the direct object in a preverbal position in subordinate clauses. OE, as is well known, showed a good deal of “leaking” of such material, and we account for this with the idea that VP-pied-piping was optional. We suggested in Section 3 that the two most important cases of VP-internal material were particles and indirect objects. Independent factors (the influx of French lexical items replacing verb-particle combinations, and the loss of dative case leading to a rise in the expression of indirect objects as PPs) may have undermined this trigger experience. The OE system, with the pied-piping/stranding option for EPPD satisfaction, was inherently marked in terms of the Subset Principle, since this grammar generated a larger language than one without the optionality, and hence robust trigger experience was crucial. How did (29a) lead to (29b)? The loss of v’s optional EPP-feature, resulting in the unavailability of a general object-raising trigger, could plausibly have been the consequence of contradictory input from V(P)R contexts.
Recall that these structures were biclausal in OE and ME and that the EPP-feature associated with the lower (infinitival) T-head could be satisfied either via pied-piping (i.e. vP-movement, where vP would have contained a raised object wherever the optional or specialised EPPD on v was present) or by “stranding” (i.e. subject-raising). Note that the “stranding” option of raising just the subject DP would have been just as available in cases where v was associated with an optional EPPD feature as in those where it was not. Consequently, V(P)R structures would have represented a context where objects that had undergone “defocusing” movement under the influence of v’s EPPD feature might nevertheless surface in postverbal position (V in V(P)R structures necessarily undergoing movement to T, as outlined in Section 3). Thus VO order, in V(P)R contexts at least would not have been consistently interpretable as a “focusing” structure and it is conceivable that this input may have compromised the trigger experience to the point where the “defocusing” EPP-feature was lost. This would, of course, have led to the situation that we see in late ME, namely that the only objects that still surface preverbally are those attracted by the remaining object-attracting feature, namely the specialised EPPD feature discussed in Note 9. Precisely when and how this feature arose and why it was retained for as long as it was are questions that we must leave to future research at this point.

What is clear is that the restrictions on object movement, combined with the loss of VP-pied-piping, led to the change in (29c): the loss of vP-pied-piping. We described in Section 3 how, both in monoclausal and biclausal contexts, the trigger experience could not distinguish the pied-piping from the stranding case, and so, once again, the Subset Principle led to the loss of the older pied-piping grammar. The loss of vP-pied-piping led to the general requirement that a DP had to appear in subject position. This led to the reanalysis of subject clitics as occupying this position in the exceptional V3 orders, and hence to “decliticisation” and the reanalysis of the XP–SCL–V as well as Subject–V orders as non-V2 structures with V moving to T. V-to-T movement was not, however, robustly triggered by the morphological system of Late ME, given the “rich tense” requirement for this operation identified in Biberauer & Roberts (2010). Hence the loss of V2 favoured the development of the auxiliary system (29e) and the loss of V-to-T (29f). The reanalysis of the modal auxiliaries, at least, was also favoured by the changes in restructuring complements caused by the loss of vP-pied-piping (as well as the loss of infinitival morphology, an independent morphophonological change). The development of contracted negation was initially simply a phonological reduction of not to n’t. However, in combination with the loss of V-to-T movement, it led to the development of a separate class of negative auxiliaries. This is a case of the development of an inflectional affix. In general, following the proposals in Fuss (2005), we can take this to involve the removal of a given feature from the syntactic system as an autonomous element, in favour of systematically associating it with a lexical item or class of lexical items. As a further case of restriction on the distribution of a lexical item, this might be thought of as driven by the Subset Principle.
The development of negative auxiliaries may have led to the development of general do-support if the conjecture at the end of the previous section regarding the status of obligatory v-movement is correct. Since this couldn’t be seen in many cases, once V-to-T had been lost, and since negative auxiliaries had developed (along with auxiliaries bearing a Q-feature, by analogy, we must suppose), v-movement became optional, and always had a discourse effect. Again, this is an example of a restriction being imposed on a movement operation. One factor which is very clearly at work in many of these changes is what we might call “restriction of function”: the narrowing down of an operation to a subset of the contexts in which it formerly applied. To the extent that this kind of change imposes new restrictions on the distributional freedom of (a class of) lexical items, it may derive from the Subset Principle. A further factor may be a general preference for relative simplicity of derivations, which frequently disfavours movement, or movement of relatively complex categories. In general, then, we see that it is possible to maintain a strong version of the Inertia Principle (which, as Longobardi 2001 points out, is desirable in the context of the Minimalist Programme) and yet at the same time account

for an intricate series of related syntactic changes, not all of which have a purely syntax-external cause. At the same time, we see what Sapir’s (1921: 165) intuition regarding “the vast accumulation of minute modifications which in time results in the complete remodelling of the language” might mean in principles-and-parameters terms.
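The role played by the Subset Principle in the changes summarised above can be made concrete with a toy learner. The following Python sketch is only an illustration under simplifying assumptions: the two grammars, their names and the surface patterns they generate are hypothetical stand-ins rather than data from the chapter, and the size of a pattern set is used as a proxy for the subset relation.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical toy grammars: each is the set of surface patterns it generates.
# The labels and pattern inventories are illustrative assumptions only.
GRAMMARS: Dict[str, FrozenSet[str]] = {
    "stranding_only": frozenset({"S-Aux-V-O"}),
    "stranding_or_pied_piping": frozenset({"S-Aux-V-O", "S-O-V-Aux"}),
}


def subset_choice(
    data: Iterable[str], grammars: Dict[str, FrozenSet[str]]
) -> Optional[str]:
    """Return the name of the smallest grammar compatible with the data."""
    observed = set(data)
    # A grammar is compatible if it generates every observed pattern.
    compatible = [name for name, lang in grammars.items() if observed <= lang]
    if not compatible:
        return None
    # Without negative evidence, prefer the smallest language
    # (set size stands in for the subset relation; ties broken by name).
    return min(compatible, key=lambda name: (len(grammars[name]), name))
```

If the input contains only patterns that both grammars generate, `subset_choice` returns the more restrictive grammar; the larger grammar is selected only when the data include a pattern that it alone generates.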

Notes

1. Formal features are those which are directly relevant to the functioning of the operations of syntax, such as φ-features (person and number features), Case features and categorial features. These features may or may not play a role at the phonological and semantic interfaces. Other features, such as [sonorant] or [monotone increasing], may play a role only at one or other interface. It is useful to think of formal features as attribute-value pairs, e.g. [Person: 3]. In this way, unvalued features can be seen as those simply lacking a value, and the Agree operation can be seen as copying values between the Probe and the Goal. A condition on the semantic interface is that all formal features must be valued (cf. the Principle of Full Interpretation).

2. This use of the term “EPP” bears only a rather indirect relation to the Extended Projection Principle as originally proposed in Chomsky (1982: 10). For our purposes here, it suffices to think of the EPP-feature as a movement-triggering diacritic.

3. B&R’s approach diverges, however, from some of the aspects of the theory of phrase structure in Kayne (1994), notably in that they assume that a single head may have more than one specifier.

4. For ease of exposition, we represent the auxiliary hæfde as being merged in T. It is likely that the structure of clauses containing auxiliaries was more complex than this in OE: “restructuring” verbs, which took infinitival complements, almost certainly had a TP complement and, as such, introduced biclausal structures (see below); habban, beon and weorþan, which typically had participial complements, may also have introduced a biclausal structure. We return to this point in Section 4, when we consider the development of the NE auxiliary system.

5. In assuming that the object’s D-feature is valued by v before the subject is merged, we are departing from Chomsky (1995: 355f.). Instead, we follow the account of the distinction between nominative-accusative and ergative-absolutive case-agreement marking put forward by Müller (2004). Müller argues that the contrast between the two types of pattern derives from a choice in the order of operations in a transitive clause when the derivation reaches v. At this point, v may either Agree with the direct object, or the subject may be merged. If Agree precedes Merge, v’s features Agree with the D-features of the object, and the subject, once merged, must Agree with T. This gives rise to a nominative-accusative system. An ergative-absolutive system derives from the opposite order of operations. Since OE was clearly nominative-accusative, the order of operations indicated in (8) is as predicted by Müller’s analysis.

6. We follow the literature on West Germanic, starting with Evers (1975), in using this terminology and the related term “verb raising” for OAuxV orders, although our analysis is very different to those relying on rightward-movement of verbs or verb-projections.

7. The Subset Principle arguably follows from the fact that language acquirers do not have access to negative evidence and therefore cannot retreat from a “superset trap” if they postulate a grammar which generates a language larger than that determined by the data.
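The attribute-value view of formal features described in note 1 lends itself to a small worked example. The Python sketch below is a hedged illustration, not an implementation of any particular version of Agree: the names `agree` and `fully_valued` are hypothetical, an unvalued feature is represented by `None`, and values are copied in one direction only (from Goal to Probe), which is a simplifying assumption.

```python
from typing import Dict, Optional

# A feature bundle maps attributes to values; None marks an unvalued feature.
FeatureBundle = Dict[str, Optional[str]]


def agree(probe: FeatureBundle, goal: FeatureBundle) -> FeatureBundle:
    """Return a copy of the probe whose unvalued features are valued by the goal."""
    valued = dict(probe)
    for attribute, value in probe.items():
        if value is None and goal.get(attribute) is not None:
            # Copy the goal's value onto the matching unvalued attribute.
            valued[attribute] = goal[attribute]
    return valued


def fully_valued(bundle: FeatureBundle) -> bool:
    """Interface condition from note 1: every formal feature must carry a value."""
    return all(value is not None for value in bundle.values())


# Illustrative bundles (assumed for the example): an unvalued probe on T
# and a third-person singular DP goal.
probe_T: FeatureBundle = {"Person": None, "Number": None}
goal_DP: FeatureBundle = {"Person": "3", "Number": "sg"}
```

Calling `agree(probe_T, goal_DP)` yields a bundle in which [Person: 3] and [Number: sg] are both valued, so `fully_valued` holds of the result but not of the original probe.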

8. This is, of course, also the pattern exhibited by Modern Dutch and German and West Germanic generally. Assuming v’s EPP-feature in these languages to specifically require movement of a D-bearing Goal, we can account for the consistently postverbal position of non-restructuring clausal complements by appealing to the fact that any D-features contained in complements of this type would no longer be accessible to v’s D-Probe at the point at which this head is merged (cf. the workings of the Phase Impenetrability Condition/PIC discussed below).

9. In addition to the VO orders which result from the effects of the PIC as described above, VAuxO was also available in OE in structures such as that illustrated in (i):

(i) . . . þæt ænig mon atellan mæge ealne þone demm
that any man relate can all the misery
“. . . that any man can relate all the misery”
(Orosius 52.6–7; Pintzuk, 2002: 283, 16b)

This order does not involve V(P)R, despite the fact that the matrix verb is one of the “restructuring” triggers discussed above: the non-finite verb atellan precedes the modal that it would follow in restructuring contexts. In order to allow for the possibility of VO orders in subordinate clauses in OE, B&R propose that v in OE was, with the exception of one class of object DPs (see below), only optionally associated with an EPP-feature, but that the presence of this optional EPP-feature systematically guaranteed an interpretive effect that was absent in structures where v lacked it (see Chomsky (2001: 34, 2004: 112)). Assuming leftward movement in Germanic to be a “defocusing” operation (cf. Pintzuk & Kroch 1989 on the obligatorily focus-bearing nature of the postverbal material in Beowulf), B&R propose that OE v’s optional EPP-feature triggered defocusing movement wherever it was present; wherever it was absent, unmoved material could therefore remain in focus. This implies that negative and quantified/indefinite objects which appear to have rather consistently surfaced preverbally during OE (and also in ME) were leftward-moved for different reasons (see also Kroch & Taylor (2000), Pintzuk (2002: 294ff)). B&R propose that the negative/quantified object movement was triggered by an obligatory EPP-feature specifically associated with a [+Op] D-seeking Probe. OE object movement thus results from two different types of EPP-feature-driven movement, one involving an obligatory EPP-feature, and the other involving an optional EPP-feature which triggers defocusing. See Reinhart (1995) for an account of object-scrambling and defocusing in Dutch.

10. Note that the raising of the lower copy of v in the vP-fronting (pied-piping) case vs. the non-raising of this copy wherever the DP-fronting (“stranding”) option is employed does not have any effect on surface order either. See Biberauer & Roberts (2006, Note 6) for discussion of a PIC-based Spellout mechanism that “distinguishes” higher vs. lower copies, privileging only the former with full Spellout (i.e. phonological realisation). Regardless of the correctness of this proposal, it is clear that any account employing remnant movement where the remnant is ultimately only partially spelled out (e.g. den Besten & Webelhuth’s (1987) analysis of German VP-fronting) must offer some explanation as to how copies contained in a remnant that eventually surfaces above “higher” copies are disqualified from phonological realisation. We leave this matter for further research, the crucial point here being that the copy of the infinitival verb adjoined to v is not available for Spellout, with the consequence that it cannot signal the difference between vP- and DP-raising to SpecTP.

11. In (20), the object is extracted under relativisation, which we have indicated by (Op); the leftmost occurrence of this symbol marks its successive-cyclic movement through SpecvP (note, however, that nothing here hinges on the assumption of a null-operator rather than a raising analysis of relatives). The PP biforenn Cristess come was also a constituent of VP (and therefore of vP). However, it does not appear before the auxiliary in the surface string because at this stage, the pied-piping option was no longer available for v. The PP therefore remains within the VP throughout the vP phase of the derivation, and it surfaces in final position owing to the effects of the PIC as described above.

12. B&R’s analysis also facilitates a very simple analysis of V-Aux structures that are very evidently not amenable to a Styl-F analysis, but which are nevertheless attested in ME. Consider (i) in this connection:

(i) er þanne þe heuene oðer eorðe shapen were
before that heaven or earth created were
“before heaven or earth were created”
(Trinity Homilies, 133.1776; Kroch & Taylor 2000: 137)

For B&R, (i) involves pied-piping of a vP containing heuene oðer eorðe shapen, and as such is quite straightforward, whereas in terms of a Styl-F analysis the VAux order is problematic since there is no subject gap.

13. This idea might form the basis of a general account of second-position clitics, a point that we will not develop further here.

14. Biberauer & Roberts (2010) take it that the negative auxiliaries with n’t represent the unmarked post-17th-century form. They note that many instances of non-contracted not involve constituent, not clausal, negation. This is clearly true whenever not is non-adjacent to the auxiliary, as in (i):

(i) a. John has always not smoked.
b. The kids have all not done their homework.

It should, however, be noted that clausal scope is possible if not (i.e. the full form) is adjacent to the auxiliary; thus:

(ii) John must/does not smoke.

In this connection, Biberauer & Roberts suggest that there is a “negative-concord”-style Agree relation between [+neg] T and not (cf. the fact that the presence of the [+neg] feature on T triggers do-support in NE; see below).

15. This might necessitate postulating an “extra” v-layer to host V-to-v movement. However, pace the proposals in Marantz (1997) and Chomsky (2001, 2004) mentioned in Section 3 in this connection, we might think that NE verbs are in fact category-neutral roots; note that, unlike in all the other (continental) Germanic languages, NE verbs are able to appear in an uninflected form in a very wide range of environments: all persons of the present tense except 3sg, the “subjunctive”, the infinitive and the imperative. On the other hand, the evidence adduced in Johnson (1991) does suggest that NE has at least “short” V-movement, and this may then imply the presence of a further v-layer if the proposal in the text is to be maintained.

16. It is interesting to note that Icelandic underwent the initial word-order change, but not the subsequent changes discussed here (see Hróarsdóttir (2000), Rögnvaldsson (1996)). It may be that Icelandic never lost V2 because it never had subject (pro)clitics. This would be consistent with the account of the relation between the loss of vP-pied-piping and the loss of V2 proposed in Section 4.


References

Adams, M. 1987. “From Old French to the theory of pro-drop”. Natural Language and Linguistic Theory 5: 1–32.
Allen, C. 1995. Case Marking and Reanalysis. Grammatical Relations from Old to Early Modern English. Oxford: OUP.
Baker, M., Johnson, K. and Roberts, I. 1989. “Passive arguments raised”. Linguistic Inquiry 20: 219–251 [this volume, Chapter 8].
Battistella, E. and Lobeck, A. 1988. An ECP account of verb second in Old English. Proceedings of the Conference on the Theory and Practice of Historical Linguistics.
Berwick, R. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Mass.: MIT Press.
den Besten, H. and Webelhuth, G. 1987. “Remnant topicalization and the constituent structure of VP in the Germanic SOV languages”. Paper presented at GLOW X (Venice).
Biberauer, T. and Richards, M. 2006. True optionality: When the grammar doesn’t mind. In C. Boeckx (ed) Minimalist Essays. Amsterdam: John Benjamins, pp. 35–67.
Biberauer, T. and Roberts, I. 2005. “Changing EPP parameters in the history of English: accounting for variation and change”. English Language and Linguistics 9(1): 5–46.
Biberauer, T. and Roberts, I. 2006. Loss of V-Aux orders and remnant fronting in Late Middle English. In J. Hartmann & L. Molnárfi (eds) Comparative Studies in Germanic Syntax. Amsterdam: Benjamins, pp. 263–298.
Biberauer, T. and Roberts, I. 2010. “Subjects, Tense and Verb-movement in Germanic and Romance”. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 263–302.
Bobaljik, J. and Wurmbrand, S. 2004. “The domain of Agreement”. Natural Language and Linguistic Theory 23: 809–865.
Canale, M. 1978. Word order change in Old English: base reanalysis in Generative Grammar. PhD dissertation, McGill University.
Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1982. Some concepts and consequences of the theory of Government and Binding. Cambridge, Mass.: MIT Press.
Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, N. 2000. “Minimalist inquiries: the framework”. In Step by step. Essays on Minimalist Syntax in Honor of Howard Lasnik, R. Martin, D. Michaels & J. Uriagereka (eds), 89–156. Cambridge, Mass.: MIT Press.
Chomsky, N. 2001. “Derivation by phase”. In Ken Hale: a life in language, Kenstowicz, M. (ed.), 1–52. Cambridge, Mass.: MIT Press.
Chomsky, N. 2004. “Beyond explanatory adequacy”. In The cartography of syntactic structures, volume 3: structures and beyond, Belletti, A. (ed.), 104–131. Oxford: OUP.
Chomsky, N. 2008. “On Phases”. In R. Freidin, C. Otero & M.-L. Zubizarreta (eds) Foundational Issues in Linguistic Theory. Cambridge MA: MIT Press, pp. 133–166.
Clark, R. and Roberts, I. 1993. “A computational model of language learnability and language change”. Linguistic Inquiry 24: 299–345 [this volume, Chapter 2].
Denison, D. 1985. “The origins of periphrastic do: Ellegård and Visser reconsidered”. In Papers from the 4th International Conference on Historical Linguistics, Eaton, R. et al. (eds), 45–60. Amsterdam: John Benjamins.
Dryer, M. S. 2013. Polar Questions. In: Dryer, M. S. & Haspelmath, M. (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/116, accessed on 2018-06-16.)

Evers, A. 1975. The transformational cycle in Dutch and German. Ph.D. dissertation, University of Utrecht.
Fischer, O., van Kemenade, A., Koopman, W. & van der Wurff, W. 2000. The syntax of early English. Cambridge: CUP.
Fuss, E. 1998. Zur Diachronie von Verbzweit. MA dissertation, University of Frankfurt.
Fuss, E. 2005. The Rise of Agreement. A Formal Approach to the Syntax and Grammaticalization of Verbal Inflection. PhD dissertation, University of Frankfurt.
Fuss, E. and Trips, C. 2002. “Variation and change in Old and Middle English. On the validity of the Double Base Hypothesis”. Journal of Comparative Germanic Linguistics 4: 171–224.
Haeberli, E. 1999. Features, categories and the syntax of A-positions. Synchronic and diachronic variation in the Germanic languages. Ph.D. dissertation, University of Geneva.
Haeberli, E. 2002. Features, categories and the syntax of A-positions. Cross-linguistic variation in the Germanic languages. Dordrecht: Kluwer.
Haegeman, L. and van Riemsdijk, H. 1986. “Verb projection raising, scope and the typology of rules affecting verbs”. Linguistic Inquiry 17: 417–466.
Holmberg, A. 2000. “Scandinavian Stylistic Fronting: how any category can become an expletive”. Linguistic Inquiry 31(3): 445–483.
Hróarsdóttir, Th. 2000. Word order change in Icelandic: from OV to VO. Amsterdam: John Benjamins.
Jespersen, O. 1909–1949. A Modern English Grammar on Historical Principles I-VII. London/Copenhagen: Allen and Unwin.
Johnson, K. 1991. “Object positions”. Natural Language and Linguistic Theory 9: 577–636.
Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Keenan, E. 1996. “Creating Anaphors: An Historical Study of the English Reflexive Pronouns”. Unpublished ms.: UCLA.
van Kemenade, A. 1987. Syntactic Case and Morphological Case in the history of English. Dordrecht: Foris.
van Kemenade, A. 1997. “V2 and embedded topicalization in Old and Middle English”. In Parameters of morphosyntactic change, van Kemenade, A. and Vincent, N. (eds), 326–352. Cambridge: CUP.
Koopman, W. 1990. Word Order in Old English. PhD dissertation, University of Amsterdam.
Koopman, W. 1994. “The order of dative and accusative objects in Old English”. Unpublished ms.: University of Amsterdam.
Koopman, W. and van der Wurff, W. 2000. “Two word order patterns in the history of English: stability, variation and change”. In Stability, Variation and Change of Word-Order Patterns over Time, Sornicola, R., E. Poppe and Shisha-Halevy, A. (eds), 259–283. Amsterdam: John Benjamins.
Koster, J. 1975. “Dutch as an SOV language”. Linguistic Analysis 1: 111–136.
Kroch, A. 1989. “Reflexes of grammar in patterns of language change”. Language Variation and Change 1: 199–244.
Kroch, A. and Taylor, A. 1997. “Verb movement in Old and Middle English: Dialect variation and language contact”. In Parameters of morphosyntactic change, van Kemenade, A. and Vincent, N. (eds), 297–325. Cambridge: CUP.
Kroch, A. and Taylor, A. 2000. “Verb-object order in Middle English”. In Diachronic syntax: models and mechanisms, Pintzuk, S., Tsoulas, G. and Warner, A. (eds), 132–163. Oxford: OUP.
Lightfoot, D. 1979. Principles of diachronic syntax. Cambridge: CUP.
Lightfoot, D. 1991. How to set parameters: arguments from language change. Cambridge, Mass.: MIT Press.

Lightfoot, D. 1999. The Development of Language. Acquisition, Change, and Evolution. Cambridge: CUP.
Longobardi, G. 2001. “Formal syntax, diachronic minimalism, and etymology: the history of French chez”. Linguistic Inquiry 32(2): 275–302.
Maling, J. 1990. “Inversion in embedded clauses in Modern Icelandic”. In Modern Icelandic Syntax, Maling, J. and Zaenen, A. (eds), 71–91. San Diego: Academic Press.
Manzini, R. and Wexler, K. 1987. Parameters, Binding Theory, and Learnability. Linguistic Inquiry 18(3): 413–444.
Marantz, A. 1997. No Escape from Syntax. University of Pennsylvania Working Papers in Linguistics: 201–225.
Müller, G. 2004. “Argument encoding and the order of elementary operations”. Unpublished ms.: University of Leipzig.
Pesetsky, D. and Torrego, E. 2001. “T-to-C movement: causes and consequences”. In Ken Hale: a life in language, Kenstowicz, M. (ed.), 355–426. Cambridge, Mass.: MIT Press.
Pintzuk, S. 1991/1999. Phrase structures in competition. Variation and change in Old English word order. New York: Garland.
Pintzuk, S. 2002. “Verb-Object order in Old English: variation as grammatical competition”. In Syntactic effects of morphological change, Lightfoot, D. (ed.), 276–299. Oxford: OUP.
Pintzuk, S. and Kroch, A. 1989. “The rightward movement of complements and adjuncts in the Old English of Beowulf”. Language Variation and Change 1: 115–143.
Platzack, C. 1995. “The loss of verb second in English and French”. In Clause structure and language change, Battye, A. and Roberts, I. (eds), 200–226. New York: OUP.
Reinhart, T. 1995. “Interface Strategies”. OTS Working Papers in Linguistics. Utrecht: University of Utrecht Publishers.
Richards, M. and Biberauer, T. 2005. “Explaining Expl”. In The function of function words and functional categories, den Dikken, M. and Tortora, C. (eds), 115–154. Amsterdam: John Benjamins.
Roberts, I. 1985. “Agreement Parameters and the Development of the English Modal Auxiliaries”. Natural Language and Linguistic Theory 3(1): 21–58 [this volume, Chapter 1].
Roberts, I. 1993. Verbs and diachronic syntax. Dordrecht: Kluwer.
Roberts, I. 1995. “Object movement and verb movement in Early Modern English”. In Studies in Comparative Germanic Syntax, Haider, H., Olsen, S. and Vikner, S. (eds), 269–284. Dordrecht: Kluwer [this volume, Chapter 3].
Roberts, I. 1996. “Remarks on the Old English C-system and the Diachrony of V2”. Linguistische Berichte: 154–167.
Roberts, I. 1997. “Directionality and word order change in the history of English”. In Parameters of morphosyntactic change, van Kemenade, A. and Vincent, N. (eds), 397–426. Cambridge: CUP [this volume, Chapter 4].
Roberts, I. 1999. “Verb movement and markedness”. In Language Change and Language Creation: Creolization, Diachrony and Development, de Graff, M. (ed.), 287–327. Cambridge, Mass.: MIT Press [this volume, Chapter 5].
Roberts, I. and Roussou, A. 2003. Syntactic Change. A Minimalist Approach to Grammaticalization. Cambridge: CUP.
Rögnvaldsson, E. 1996. Word order variation in the VP in Old Icelandic. Working Papers in Scandinavian Syntax 58: 55–86.
Rohrbacher, B. 1994. The Germanic languages and the full paradigm: a theory of V-to-I raising. PhD dissertation, University of Massachusetts, Amherst.
Rohrbacher, B. 1999. Morphology-driven syntax. Amsterdam: Benjamins.

Sapir, E. 1921. Language. An Introduction to the Study of Speech. New York: Harcourt Brace.
Spasov, D. 1966. English phrasal verbs. Sofia: Naouka Izkoustvo.
Spencer, A. 1991. Introduction to Morphological Theory. Oxford: Blackwell.
Trips, C. 2002. From OV to VO in Early Middle English. Amsterdam: John Benjamins.
Vikner, S. 1997. “V-to-I movement and inflection for person in all tenses”. In The New Comparative Syntax, Haegeman, L. (ed.), 189–213. London: Longman.
Warner, A. 1997. “The structure of Parametric Change and V-movement in the history of English”. In Parameters of morphosyntactic change, van Kemenade, A. and Vincent, N. (eds), 380–393. Cambridge: CUP.
Willis, D. 1998. Syntactic change in Welsh: a study of the loss of verb-second. Oxford: OUP.
van der Wurff, W. 1997. “Deriving object-verb order in late Middle English”. Journal of Linguistics 33: 485–509.
van der Wurff, W. 1999. “Objects and verbs in Modern Icelandic and fifteenth-century English: a word order parallel and its causes”. Lingua 109: 237–265.
Zwicky, A. and Pullum, G. 1983. “Cliticization vs. inflection: English n’t”. Language 59(3): 502–513.

Part II

Comparative Syntax

8

Passive Arguments Raised Mark Baker, Kyle Johnson and Ian Roberts

The purpose of this article is to develop and motivate a theory of passive constructions whose central claim is (1): (1) The passive morpheme (-en) is an argument. In pursuing (1), we elaborate the approach to passives originated in Jaeggli (1986), and our theory owes much to Jaeggli’s original insights. We will try to show that the “passive construction” reduces to (1), with the various (and cross-linguistically varied) properties of passives resulting from an interplay of autonomous forces acting in tandem with (1). Most prominent among these forces are the well-formedness conditions on arguments: the θ-Criterion with the related Visibility Condition, the Projection Principle, and conditions on binding. The article is organized as follows. In section 1 we present a brief description of our theory and discuss its major strengths and some apparent problems. In section 2 we discuss the evidence from binding theory that suggests that implicit arguments of passives are “syntactically active,” in that they show coreference properties typical of syntactically realized arguments. We concentrate here on the evidence that the argument -en gives rise to strong crossover effects, drawing on observations due to Postal (1971). The properties of the argument -en with respect to θ-theory and Case theory are the topics of sections 3 and 4, respectively. Here we derive the range of cross-linguistic variation from a small number of minimal assumptions about cross-linguistic differences in the features of the passive morpheme. Two major points emerge: (i) In certain languages -en appears to undergo NP Movement. 
(ii) The generalization underlying the 1-Advancement Exclusiveness Law of Relational Grammar (see Perlmutter and Postal (1984))—which states essentially that only verbs with logical-subject arguments can be passivized—can be derived from the θ-Criterion in the cases where it truly holds; moreover, the properties of languages where this generalization does not hold can be neatly accounted for. Finally, in section 5 we take up the issue of the structural representation of argument -en at various levels of the derivation and the interaction of this morpheme with other

aspects of the English verb-auxiliary system. The discussion here throws light on the nature of this system as a whole. The overall result of our investigation is of both empirical and conceptual interest for the formulation of a theory of grammatical-function (GF) changing processes in natural language. The empirical interest lies in the breadth and depth of our coverage of a central GF-changing process, the passive construction. The conceptual interest lies in the fact that we account for this construction in terms of the notion “argumental affix”, that is, a piece of morphology that is subject to the well-formedness conditions that apply to arguments. The natural continuation of this work would be to extend this notion to other GF-changing processes.

1. The Theory Outlined

We propose that -en, the passive argument, is base-generated under Infl and therefore that the D-Structure representation of a passive clause has the general form shown in (2):

(2) [S NP [I′ [I -en] [VP V XP]]]   (where S may equally be labelled IP)

We leave until section 5 the question of the representation of passive clauses involving sequences of auxiliaries, including be, as, for example, in They must have been being arrested. It is clear that the representation in (2) requires some elaboration in order to account for these auxiliary sequences, but we will abstract away from this matter in the early sections of the article, focusing instead on the properties of argument -en. If -en is an argument, then it must be in a θ-marked position at D-Structure (Chomsky (1981, chap. 2)). This requirement entails that Infl is a θ-marked position in (2). Hence, -en can only receive a θ-role assigned outside VP, namely, the logical-subject θ-role. This is a straightforward consequence of the representation in (2). It immediately explains four salient properties of passives:

(3) a. The fact that the logical-subject argument is not realized on an NP in passives
b. The phenomenon of “implicit arguments” in passives
c. The fact that the subject position is nonthematic in passives, permitting NP Movement into this position
d. The 1-Advancement Exclusiveness Law (1AEX)

We will discuss (3a–c) at this point but will defer discussion of the 1AEX until section 3. Concerning (3a), it is clear that the logical-subject argument is realized in some marked way in passives, whether another argument occupies the surface subject position or not. Moreover, as pointed out in Marantz (1984), this is independent of the actual θ-role associated with the logical subject. Marantz illustrates this point with the following examples:

(4) a. Hortense was pushed by Elmer. (AGENT)
b. Elmer was seen by everyone who entered. (EXPERIENCER)
c. The intersection was approached by five cars at once. (THEME)
d. The porcupine crate was received by Elmer’s firm. (GOAL)
e. The house is surrounded by trees. (LOCATION?)
(Marantz (1984, 129))

For us, this follows from the simple fact that the subject θ-role is realized on an argument in Infl, instead of one in the subject position. We turn to the question of the representation of by-phrases below. A second point that arises in this connection is that -en, like any subject argument, receives a compositional θ-role from VP. This is illustrated by the sentences in (5)–(6), where the type of activity performed by the agent varies as a function of the D-Structure complements of the verb:

(5) a. A baseball was thrown by Fernando.
b. Support was thrown behind the candidate by the CIA.
c. The match was thrown by the prizefighter.
d. The party was thrown by the department.

(6) a. A book was taken from the shelf by John.
b. The bus was taken to New York by Mary.
c. A nap was taken by the professor in his office.
(Roberts (1985, 55))

Such examples show clearly that if -en is an argument, then it is an external argument, inasmuch as it is assigned a θ-role by the entire D-Structure VP; the implications of this claim will be taken up in greater detail in section 3. Turning now to point (3b), we find further motivation for a D-Structure representation of the kind in (2) from the behavior of subject-oriented modifiers (essentially a particular class of adverbs) and the type of infinitival adjunct known as rationale clauses (henceforth RatCs). Here a comparison of passives and middles is instructive; in passives, but not in middles, the understood subject of the RatC may be “controlled,” and subject-oriented adverbs may find an argument to modify (see Manzini (1980), Roeper (1983)).

(7) a. This bureaucrat was bribed [PRO to avoid the draft].
b. *This bureaucrat bribes easily to avoid the draft.

(8) a. This bureaucrat was bribed deliberately.
b. *This bureaucrat bribes deliberately.

Under the hypothesis that RatCs and subject-oriented adverbs require the syntactic presence of an argument, the contrasts in (7)–(8) are accounted for by our theory. Only in (7a) and (8a) is an argument present that satisfies these requirements; that argument is -en. Point (3c) follows from the fact that VP is only capable of assigning one θ-role. Because of this, the subject position in (2) is not assigned a θ-role and therefore cannot be occupied by an argument at D-Structure. Thus, the subject position of passives is a possible landing site for NP Movement, and, where NP Movement does not apply, it is occupied by an expletive. So far nothing in our account predicts the obligatory movement of the object NP in (2) to the S-Structure subject position. To see how we account for this, consider the S-Structure representation derived from (2):

(9) [S NPi [I′ [I e] [VP [V [ ] + en] [NP ti]]]]   (where S may equally be labelled IP)

(9) differs from (2) in two respects: (i) the application of NP Movement placing the D-Structure object in subject position; (ii) the downgrading of -en from I to V. The exact nature of the operation in (ii) will be discussed in section 5; for the moment we take it as primitive. Given this, we can derive the requirement for NP Movement. The principle behind (i) is the Visibility Condition, proposed in Chomsky (1981, chap. 6). This condition is an adjunct to the θ-Criterion, in that it requires all θ-marked arguments to be “visible” for θ-role assignment at LF, meaning that they must receive abstract Case by S-Structure. This requirement applies, then, to the two arguments of the passivized verb in (2) and (9): -en and the object NP. Assuming that -en downgrades for independent reasons, only the verb can assign Case to -en in (9), since it is the only Case assigner that governs -en. Because the verb must assign Case to -en, it is no longer able to Case-mark NP. As a result, NP must move into the Case-marked subject position. Thus, we account for the “Case-absorption” property of passives (although we consider the term “Case absorption” to be a misnomer). Note that our account derives the effects of “Burzio’s Generalization” for passives. Burzio’s Generalization states the well-known correlation between the failure of assignment of a θ-role to subject position and the failure of

assignment of structural Case to object position. In our theory, these two “assignment failures” are both instances of assignment to morphology. Since Burzio’s Generalization holds for a range of other constructions (middles, ergatives, and so on) that do not involve morphological alterations of the verb, it could be objected that something is being missed here. However, we take the position that Burzio’s Generalization is the statement of a problem that needs to be solved. Like many problems, it is not a priori obvious that this one has a unitary solution. We therefore consider it an advance that the problem has been solved for passives. The above paragraphs outline what we consider to be the derivation of a “short” passive sentence. What about “long” passives, that is, those containing by-phrases? We have treated -en as an argument that affixes to the verb. We regard its essential properties to be like those of clitics. Thus, like other clitics, -en forms a chain with a full NP (see, for example, Jaeggli (1982), Borer (1984)). The NP that forms the coda of the chain may be overtly realized as a by-phrase, giving rise to “long passives.” In this case the situation resembles clitic-doubling.1 If the NP is not overt, a “short passive” is formed. The existence of a clitic chain in passives implies that -en has a referential index. Further, in short passives, our claim is that there is an empty category linked to the argument -en. In the next section we will find evidence for both these claims. We take a rather nonstandard view of clitics in certain respects. First, we consider that English has at least one clitic, -en, and, second, we allow for the possibility of morphologically “deep” clitics, which trigger quite complex morphophonological suppletions and so on.
Both of these differences amount to the same point: we are taking the syntactic properties of well-known cases of clitics—essentially the Romance clitics—as basic, and as completely independent of the morphophonological property of enclisis or proclisis. Thus, for the purposes of syntax, a clitic is an argument category realized as adjoined to a head. In this respect, -en is a clitic in good standing. The fact that -en has a morphophonological realization that is rather different from that of Romance clitics is the reflex, we take it, of completely independent properties of English phonology. In other words, in asserting that -en is a clitic we are assuming (i) a syntactic definition of clitic like the one above and (ii) the independence of phonology and syntax. Notice moreover that there are elements that are clitics phonologically but not, apparently, syntactically (see the discussion of French subject clitics in Rizzi (1986)). We propose that -en is syntactically a clitic but phonologically an affix. In this section we have outlined the general form of a passive derivation in our theory and commented on a number of matters that arise. We now turn to a more detailed discussion of various aspects of our account.

2. Binding Theory and Passive Arguments

Here we argue for (1), and more narrowly for the claim that -en forms a chain with an NP, by showing how this chain enters into various types of binding relations. We also explain why other kinds of binding relations do not arise.

2.1 Strong Crossover in Passives

The central fact showing that the passive argument is syntactically active is the existence of strong crossover (SC) effects in passives. Thus, short passives cannot be interpreted so that the understood subject is coreferential with the S-Structure subject. In other words, (10) cannot mean (11): (10) a. They were killed. b. They were admired. (11) a. They committed suicide. b. They admired themselves. This effect cannot be attributed to a semantic constraint, since there is nothing wrong with the sentences in (11). Moreover, we cannot say that the impossibility of coreference is a pragmatic effect due to the structural absence of the passive argument. This would entail that “coreference” with structurally absent arguments is impossible, whereas in fact “coreference” possibilities with such arguments are maximally free. In this respect, consider the following contrast: (12) a. John shaves (easily). b. John was shaved. Whether (12a) is taken as a middle (with John as Theme, meaning ‘it is easy to shave John’) or as an intransitive, the Agent and Theme arguments of shave can be taken to be coreferential (there is in fact a preference for this in the agentive intransitive reading). On the other hand, (12b) does not allow this; it cannot mean ‘John was shaved by John’. Our theory of passives can explain this fact. An example like (10a) has the following representation: (13) *Theyi were kill + eni ti IMPi. We use “IMP” to represent the coda of the chain headed by -en. Compare (13) with standard cases of SC—for example, those that involve Wh Movement: (14) *Whoi does hei love ti ? (14) is usually taken to be ruled out by Principle C of the binding theory (see Chomsky (1981)): (15) An R-expression must be A-free. Since R-expressions include variables—wh-traces—(15) prevents these elements from being A-bound. Since the trace of who in (14) is A-bound by he,

the example is ruled out. We could explain the ungrammaticality of (13) in the same terms if we make one assumption about IMP: (16) IMP is a variable. IMP is A-bound by the subject in (13), in violation of Principle C, hence the impossibility of coreference between “understood” underlying subjects and S-Structure subjects in passives.2 We will develop a different account, however, for three reasons: (i) if the empty category linked to -en is a variable, then it is unclear what the operator that binds this variable is; (ii) we have no way to tell whether the empty category is in fact a variable or a pronoun—everything we have seen up to now is compatible with either assumption; (iii) the ungrammaticality of the examples in (17) shows that more is going on: (17) a. *They were killed by themselves. b. *Theyi were kill + eni ti by themselvesi. (See Lees and Klima (1963, 21), Postal (1971, chap. 1).) Here the NP linked to -en is manifestly an anaphor, rather than a variable. We can account for the ungrammaticality of this example using the following conditions and definitions: (18) a. Chains: C = (x1, . . ., xn) is a chain iff, for 1 ≤ i < n, xi locally binds xi+1 (Rizzi (1986, (2))). b. Local binding: X locally binds Y iff X binds Y and there is no Z that binds Y but not X. c. Binds: X binds Y iff X c-commands and is coindexed with Y. (19) For each well-formed structure there exists a set of chains S, such that:

a. Each argument appears in a unique chain of S. b. Each chain of S contains a unique visible θ-position P and a unique argument. c. Each θ-position P is visible in a chain of S.

(20) a. A θ-chain is an element of the set S in (19). b. The Projection Principle requires arguments to appear in a θ-chain at every level. (Notice that we define “chain” independently of “θ-chain.” This creates a number of formal objects that may be regarded as chains without being relevant for the θ-Criterion; the point is that the θ-Criterion and the Projection Principle together force a one-to-one relation between arguments and well-formed θ-chains.)
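The definitions in (18) and the uniqueness condition in (19b) can be illustrated with a small computational sketch. It is not part of the original analysis and makes several simplifying assumptions, all hypothetical: a structure is modeled as a flat list in which each element c-commands everything to its right, chains are built maximally by following local-binding links, -en is treated as an argument that occupies a θ-position, and the names `Element`, `maximal_chains`, and `well_formed` are invented for the illustration. Under these assumptions, a passive in which the subject, -en, and the object trace share one index fails to yield a θ-chain, whereas the ordinary passive with contraindexed -en does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    index: str               # referential index, e.g. "i" or "j"
    is_argument: bool
    in_theta_position: bool


# ASSUMPTION: a structure is a list ordered so that each element
# c-commands every element to its right.
def binds(structure, x, y):
    """(18c): X binds Y iff X c-commands and is coindexed with Y."""
    return x < y and structure[x].index == structure[y].index


def locally_binds(structure, x, y):
    """(18b): X binds Y and no Z binds Y without also binding X."""
    if not binds(structure, x, y):
        return False
    return not any(
        z != x and binds(structure, z, y) and not binds(structure, z, x)
        for z in range(len(structure))
    )


def maximal_chains(structure):
    """(18a): follow local-binding links from each not-yet-placed element."""
    chains, placed = [], set()
    for start in range(len(structure)):
        if start in placed:
            continue
        chain, current = [start], start
        while True:
            nxt = next(
                (y for y in range(current + 1, len(structure))
                 if locally_binds(structure, current, y)),
                None,
            )
            if nxt is None:
                break
            chain.append(nxt)
            current = nxt
        placed.update(chain)
        chains.append(chain)
    return chains


def is_theta_chain(structure, chain):
    """(19b): exactly one argument and exactly one θ-position."""
    n_args = sum(structure[i].is_argument for i in chain)
    n_theta = sum(structure[i].in_theta_position for i in chain)
    return n_args == 1 and n_theta == 1


def well_formed(structure):
    # Chains containing neither an argument nor a θ-position are ignored.
    relevant = [
        c for c in maximal_chains(structure)
        if any(structure[i].is_argument or structure[i].in_theta_position
               for i in c)
    ]
    return all(is_theta_chain(structure, c) for c in relevant)


def passive(en_index):
    # Hypothetical encoding of "They were kill + en t".
    return [
        Element("they", "i", True, False),   # moved subject, non-θ-position
        Element("-en", en_index, True, True),
        Element("t", "i", False, True),      # object trace
    ]


crossover = passive("i")   # -en coindexed with the subject
ordinary = passive("j")    # -en contraindexed

print(well_formed(ordinary), well_formed(crossover))
```

In the coindexed case the only available chain contains two arguments, so no θ-chain exists for the subject; the flat c-command ordering is a convenience of the sketch and says nothing about the actual phrase structure.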

The crucial result of these conditions is that no structure of the following kind can exist: (21) Xi Yi ti (where X c-commands Y and Y c-commands t and there is movement from t to X) In this construction Xi must be in a non-θ-position (by (19a)). Thus, by itself, it is not a θ-chain (by (19b)). (20b) then implies that it must be in a nonsingleton chain with a θ-position that does not itself contain an argument—namely, the position of ti. However, if Xi and ti are both members of a chain, then Yi must be too, by (18a). If Yi is an argument, then this chain cannot be a θ-chain, since (19b) is violated. Thus, there is no θ-chain containing Xi in structures of the form (21), and they are ruled out. The essence of this account is taken from Rizzi (1986), who uses it to explain Italian examples like (22): (22) *Giannii sii è stato affidato ti t’i. Gianni self(dat) was been entrusted (‘Gianni was entrusted to himself.’) Lasnik (1985) also notices certain “loopholes” in Principle A of the binding theory that can be plugged this way. Consider (23): (23) *Johni is believed that hei likes ti. Both (22) and (23) are instances of the schema in (21). Moreover, so are (13) and (14). Hence, a chain-formation approach gives us a unified explanation of SC in passives (although this explanation may be redundant with Principle C in (13)). Two problems arise in connection with this idea. First, the examples with overt reflexives are not as bad as we might expect, and some of them are not very bad at all: (24) ?I am amused/impressed/worried by myself. However, it appears that all of the relatively good examples involve psych-verbs. Belletti and Rizzi (1988) argue that psych-verbs are essentially ditransitive unaccusatives and therefore cannot form syntactic passives.3 Hence, the apparent passives in (24) must be adjectival and thereby lack an argument -en. If so, no crossover violation arises. This can be tested by trying examples that cannot be adjectival passives.
Our prediction is that reflexive by-phrases in such cases will be very bad: (25) a. *John is considered a genius by himself. b. *John was sent a book by himself. c. *John was believed to have left by himself.

These examples are far worse than (24), and are also worse than (17) when it receives a stative reading. This can be attributed to the possibility of an adjectival interpretation of (17). The second problem is that reflexives and reciprocals contrast with respect to SC effects: (26) a. They were seen by each other. b. *They were seen by themselves. To account for this, we propose the following structure for reciprocals, following Lebeaux (1983): (27)

[NPj [Spec eachi] [N′ otherj]]

Each is a bare quantifier. As such, it undergoes Quantifier Raising (QR) in the mapping to LF. Moreover, Jaeggli (1982) shows that bare quantifiers require local antecedents, in much the same way that anaphors do. Other is a disjunctive pronoun that requires contraindexation with its specifier. As head of the NP, other gives its index to the whole NP. The result is that we find structures like the following in (26a), which are not ruled out by the θ-Criterion: (28) Theyi eachi were see + enj ti by [ti otherj]j. This account of SC effects in passives carries over to other examples, noticed by Postal (1971), of the configuration in (21). Note that each time the reflexive/reciprocal contrast holds up: (29) Raising

a. Theyi seem to each otherj [ti to like John]. b. *Theyi seem to themselvesi [ti to like John].

(30) Psych

a. Theyi strike each otherj [ti as smart]. b. *Theyi strike themselvesi [ti as smart].

(31) Tough

a. Theyi are tough for each otherj [to like ti]. b. *Theyi are tough for themselvesi [to like ti].

Thus, we are assuming that each is not an argument and therefore does not interfere with the formation of the chain between they and ti here.

In this way, we account straightforwardly for the properties of (13) and (17) and, by adopting fairly minimal assumptions, explain the reflexive/reciprocal contrast in by-phrases. The central role played by the passive argument in the account of these data is an argument for the correctness of a theory such as ours.

2.2 Passives and Arbitrary Reference

Here we will briefly address the question of the referential properties of the passive argument when no by-phrase is present. We will suggest that this argument has properties similar to arbitrary PRO. This idea on the one hand supports the hypothesis that the argument is a kind of clitic, since clitics generally have pronominal interpretation, and on the other hand allows an account of why the passive argument appears to remain “syntactically inert” in certain cases. Our suggestion, then, is that the [-en, IMP] argument has semantic properties akin to arbitrary PRO. This approach has a certain initial plausibility; passive sentences like John was killed/It is widely believed that . . . are naturally paraphrased as Someone or other killed John/People believe that. . . . Although we cannot provide a semantics for the passive argument here, this observation is sufficient to highlight the intuitive similarity between passives and arbitrary PRO. The advantage of the suggestion just made is that the binding contrast in (32) is now seen to parallel that in (33): (32) a. ?*This privilege was kept to themselves. b. Such privileges should be kept to oneself. (33) a. ?*To shave themselves is fun. b. To shave oneself can be fun. We account for this by invoking a feature-agreement requirement on anaphor binding (ruling out cases of the type John likes themselves) and a requirement that arbitrary pronouns lack features. Therefore, arbitrary pronouns are only able to bind other arbitrary pronouns, as the above examples show. This account carries over to the following cases involving pronouns:4 (34) a.
*Hisi mother was see + eni. b. This is the kind of show that onei’s children shouldn’t be take + eni to. (35) a. *PROi to abandon hisi children is irresponsible. b. PROi to abandon onei’s children is irresponsible. Again, we can say that the nonarbitrary pronoun and -en fail to agree in features, and coreference is therefore not possible in the (a) examples. On the other hand, the arbitrary pronoun imposes no feature-agreement requirement, and the (b) examples therefore do allow coreference. On this view, examples like (34a) and (35a) are not evidence against a structurally

present passive argument; properly understood, the contrasts here are in fact evidence for the presence of such an argument. Despite the basic similarities, there is a notable difference between the passive argument and arbitrary PRO: arbitrary PRO can be 1st person, whereas the passive argument must be 3rd person. This is shown by the fact that arbitrary PRO can bind a 1st person plural anaphor, whereas the passive argument cannot: (36) a. PRO to shave ourselves is fun. b. *Love letters were written to ourselves. The passive argument also contrasts with the otherwise very similar Italian si morpheme in this respect: (37) Si invia spesso lettere a noi stessi. SI sends often letters to us selves ‘Letters are often sent to ourselves.’ We do not have an account for this “non-1st person” restriction on the passive argument, so we simply stipulate the constraint here. Whatever really underlies this constraint, it clearly explains the relative syntactic inertness of the passive argument, while allowing us to continue to assume the existence of such an argument.

3. θ-Role Assignment and the 1AEX

In the preceding section we investigated some of the implicit argument properties of passives under the theory sketched in section 1 (see (3b)). Next we turn to property (3d): the fact that passives in many languages obey the 1-Advancement Exclusiveness Law of Relational Grammar (Perlmutter (1978), Perlmutter and Postal (1984)). This law stipulates roughly that only one argument can acquire subject status in the derivation of any given clause. Thus, verbs that have derived subjects cannot be passivized. This is illustrated in (38)–(40) for unaccusative verbs in Dutch (see Perlmutter (1978), Burzio (1986)), for raising verbs, and for passive verbs:

(38) a. In dit weeshuis groeien de kinderen erg snel. in this orphanage grow the children very fast ‘In this orphanage the children grow very fast.’ b. John seemed to have left. c. The vase was broken by John.

(39) a. *In dit weeshuis wordt er door de kinderen erg snel gegroeid. in this orphanage is it by the children very fast grown b. *It was seemed to have left (by John). c. *It was been broken (by the vase) by John.

(40) a. *In dit weeshuis werden de kinderen erg snel gegroeid. in this orphanage are the children very fast grown b. *John was seemed to have left. c. *The vase was been broken by John. (38) gives elementary examples, and (39) the potential results of forming (impersonal) passives of them (see Perlmutter and Postal (1984)).5 (40), on the other hand, gives the results of simply adding “redundant” passive morphology to the verbs, which already involve derived subject constructions (see Marantz (1984)). Both types of construction are impossible in languages like Dutch and English. Consider first examples like those in (40). In some Government-Binding accounts, these structures are the hardest to block, requiring auxiliary assumptions such as Marantz’s (1984) No Vacuous Affixation Principle, which bars the attachment of “superfluous” morphemes. The fundamental assumption that the passive morpheme is an argument makes an important advance in deriving this aspect of the 1AEX, however. Since the passive morpheme is an argument, it must receive a θ-role, by the θ-Criterion. Verbs like those in (40) have no extra θ-role to assign; therefore, the resulting structures are ungrammatical for the same fundamental reason that those in (41) are: (41) a. #John grew children very fast in the orphanage. b. *Mary seemed that John had left. c. *Peter was broken a vase by John. In each instance there is one more argument than θ-role (the -en in (40); the subject NP in (41)), making the sentence impossible. In this respect, we essentially follow an insight of Jaeggli (1986).6 The sentences in (39) are more interesting for two reasons—one theoretical, the other empirical. Thus, we focus on them for the remainder of this section.
Theoretically, the sentences in (39) do not present the same problem as those in (40), because they exhibit a one-to-one correspondence between available θ-roles and arguments (counting -en, but not the expletive subject or the optional by-phrase). Nevertheless, their ungrammaticality can still be derived because we assume that, by virtue of their categorial features, passive morphemes are generated under the Infl node. As such, they always appear outside the VP at D-Structure. This structural position implies that they can receive an external θ-role from the verb, but not an internal one in the sense of Williams (1981). Now, in Government-Binding Theory, the θ-Criterion implies that a verb with a derived subject does not assign a θ-role to the subject position (Chomsky (1981)). Thus, the 1AEX can largely be restated as the fact that passive morphology cannot appear with a verb that does not assign an external θ-role. But this follows immediately from the θ-Criterion: -en is an argument, structurally it can only receive an external θ-role, and the relevant verbs have no such θ-role; hence, the structure is ungrammatical.

This time a θ-role is in principle available to the passive morpheme, but the morpheme is in the wrong place to receive it. The fact that the passive observationally obeys the 1AEX thus follows completely from the θ-Criterion. The second reason that paradigms like (39) are of interest is that they are in fact grammatical in some languages. Thus, a number of researchers have demonstrated that the 1AEX is not universal, as one might have suspected (see Timberlake (1982), Nerbonne (1982), Keenan and Timberlake (1985), Postal (1986)). The following examples illustrate this for Lithuanian: (42) Kur mūs gimta, kur augta? where by-us bear/pass-n/sg where grow/pass-n/sg ‘Where were we born, where did we grow up?’ (lit. ‘Where by us was getting born, where getting grown up?’) (Timberlake (1982)) (43) Ar būta tenai langinių? and be/pass-n/sg there window-gen ‘Were there really windows there?’ (lit. ‘And had there been existed by windows there?’) (Timberlake (1982)) (44) Jo pasirodyta esant didvyrio. him-gen seem/pass-n/sg being hero ‘By him it was seemed to be a hero.’ (Keenan and Timberlake (1985)) (45) To lapelio būto vėjo nupūsto. that leaf-gen be/pass wind-gen blow/pass ‘By that leaf there was being blown down by the wind.’ (Timberlake (1982)) (42) and (43) are passives of unaccusative verbs; (44) is the passive of a raising verb; (45) is a double passive. Similar facts have been reported for Turkish (Özkaragöz (1980; 1982)), Sanskrit (Ostler (1979, chap. 5)), and Irish (Nerbonne (1982)). In fact, this cross-linguistic difference between English/Dutch on the one hand and Lithuanian/Turkish on the other can be given an elegant account on our analysis. One does not expect the θ-Criterion to vary from language to language; indeed, sentences like those in (40), which are pure θ-Criterion violations, are not found in any language.
The account of the ungrammaticality of (39) also relies on the assumption about the categorial features of -en, however; and the categories of corresponding items are known to vary to a certain extent from language to language. Thus, we can account for the Lithuanian/Turkish passives if we assume that the categorial properties of the passive morpheme in these languages are slightly different from those in English/Dutch. In particular, we can say that the passive morpheme in these

languages is not an Infl, but rather an N that cliticizes to Infl. In normal personal or impersonal passives, this morpheme will be generated in the subject position. From there, it will affix to Infl. This process is technically an Incorporation in the sense of Baker (1988) and will be governed by the principles of movement as explained there. From this point on, the derivation will be just like that of normal passives: the passive morpheme cliticizes to the verb,7 and an object may move into the vacated subject position if there is one.8 This gives derivations such as (46): (46) [-pass [I [V (NP)...]]] Incorporation [e [I + pass [V (NP)...]]] Cliticization [e [i [V + I + pass (NP)...]]] NP Movement [NPj [i [V + I + pass tj...]]] The important difference between the Lithuanian passive and the Dutch passive is that as an N, the Lithuanian passive morpheme can appear in any NP position generated by the base—including VP-internal positions, where it receives an internal θ-role. This allows a grammatical output with unaccusative verbs, for example. The passive morpheme is generated in the θ-position and then undergoes NP Movement to the subject position. From there it can affix to the Infl and ultimately the V in the usual way: (47) [e [I [V -pass]]] NP Movement [-pass [I [V t]]] Incorporation [e [I + pass [V t]]] Cliticization [e [i [V + I + pass t]]] (= (42), (43)) The raising example (44) is similar: here the passive morpheme is generated as the subject of the embedded clause and reaches the subject position of the matrix clause by NP Movement. From there the derivation proceeds as before. The double passive examples are slightly different. In English and Dutch double passives are always forbidden, because they would involve having two arguments (the two passive morphemes) outside the VP in Infl. Since verbs never assign more than one external θ-role, a θ-Criterion violation is inevitable.
In Lithuanian, however, one passive morpheme can be generated in an NP position inside the VP and one outside it. This makes the following derivation possible: (48) [-pass [I [V -pass]]] Incorporation [e [I + pass [V -pass]]] NP Movement [-pass [I + pass [V t]]] Incorporation [e [I + pass + pass [V t]]] Cliticization [e [i [V + I + pass + pass t]]] ( = (45)) In effect, this typological difference between passive constructions is accounted for by introducing a second way that the basic passive structure

(see (2)) can occur: it can be base-generated (English, Dutch) or derived by Move α (Lithuanian, Turkish). Since we assume that the Lithuanian passive morpheme can be generated in any NP position, we face potential problems of overgeneration. For example, we must prevent the association of the Lithuanian equivalent of (49a) with the sense in (49b) by the derivation in (49c):

(49) a. John was beaten (by Bill). b. John beat someone (/Bill). c. [John [I [beat -en]]] [John [I + en [beat t]]] [John [i [beat + I + en t]]]

The solution to this problem comes from the theory of syntactic affixation (Incorporation). Baker (1988) argues that an element X can affix to an element Y in the syntax only if Y governs X’s original position (compare the Head Movement Constraint of Travis (1984)). Now, the passive morpheme is stipulated as needing to affix to Infl. The only NP position that Infl governs is the subject position; hence, the derivation in (49) is ruled out. More generally, in order to fulfill its affixation requirements, the passive morpheme must come to be in the subject position. This can in principle take place in either of the two ways mentioned above: it can be base-generated in the subject position (as in (46)) or it can reach the subject position by NP Movement (as in (47), (48)). Other imaginable situations are ruled out. It is important to clarify the status we predict for (49). A form analogous to (49a) could perfectly well be associated with an interpretation like (49b) by a derivation much like (49c) if we made the minimal change of saying that the relevant morpheme affixes directly to verbs. In fact, such structures are attested; the following are two such examples: (50) a. Muk’ bu š-i-mil-van never asp-1sA-kill-Apass ‘I never killed anyone.’ (Tzotzil; Aissen (1983)) b. Man-li’i’ häm (guma’). Apass-see we(abs) (house) ‘We saw something (a house).’ (Chamorro; Gibson (1980)) However, morphemes with these properties will be unable to appear in a structure like (51a) with interpretation (51b): (51) a. John kill-van. b. Someone killed John.

c. [-van [I [beat John]]] [[I [beat + van John]]] [John [I [beat + van t]]] Essentially, this means that -van is not what we would call a passive morpheme; rather, it is an antipassive morpheme, an entirely different element (see Baker (1988) for discussion).9 In conclusion, we have shown that the 1AEX can be derived from the θ-Criterion once the passive morpheme is recognized as a true syntactic argument. The limited violations of the 1AEX that have been attested—those like (39) but not like (40)—are the result of a small difference in the lexical properties of the passive morpheme in question.
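The reasoning of this section can be summarized in a rough sketch that checks two conditions: the θ-Criterion (as many arguments as θ-roles, counting each passive morpheme as an argument) and the positional restriction on Infl-type passive morphemes (they can only receive an external θ-role). The encoding is an illustrative assumption rather than part of the analysis: the names `Verb` and `passive_allowed`, the category labels "Infl" and "N", and the θ-grids given for grow and break are all hypothetical, expletives and by-phrases are left out of the argument count, and the government condition on Incorporation that excludes (49) is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    name: str
    external_roles: int   # 0 or 1: θ-role assigned outside VP
    internal_roles: int   # θ-roles assigned inside VP


def passive_allowed(verb, morpheme_category, n_passive, n_overt_args):
    """Return True if some derivation satisfies the two conditions.

    morpheme_category: "Infl" (English/Dutch type) or "N" (Lithuanian type).
    n_overt_args: NP arguments other than passive morphemes; expletives
    and by-phrases are not counted (assumption of this sketch).
    """
    total_roles = verb.external_roles + verb.internal_roles
    # θ-Criterion: one θ-role per argument, passive morphemes included.
    if n_passive + n_overt_args != total_roles:
        return False
    if morpheme_category == "Infl":
        # Generated under Infl, so only an external θ-role is accessible.
        return n_passive <= verb.external_roles
    if morpheme_category == "N":
        # Generated in any NP position; may reach the subject by movement.
        return True
    raise ValueError(f"unknown morpheme category: {morpheme_category}")


grow = Verb("grow", external_roles=0, internal_roles=1)    # unaccusative
break_ = Verb("break", external_roles=1, internal_roles=1)

# Ordinary passive, as in (38c): fine with an Infl-type morpheme.
print(passive_allowed(break_, "Infl", n_passive=1, n_overt_args=1))
# Impersonal passive of an unaccusative, as in (39a) versus (42).
print(passive_allowed(grow, "Infl", 1, 0), passive_allowed(grow, "N", 1, 0))
# Double passive, as in (39c) versus (45).
print(passive_allowed(break_, "Infl", 2, 0), passive_allowed(break_, "N", 2, 0))
# Redundant morphology, as in (40a) and (40c): excluded in both types.
print(passive_allowed(grow, "N", 1, 1), passive_allowed(break_, "N", 2, 1))
```

The sketch separates what is expected to be invariant (the argument count) from what is taken to vary with the category of the passive morpheme (the positional check), which is the division of labor argued for above.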

4. Case Theory and Impersonal Passives

Having clarified the relationship between passive morphemes and θ-role assignment, we now try to do the same for Case assignment. In section 1 we discussed how the basic fact about Case theory in passives—that the verb’s Case is “absorbed”—can be derived from the assumption that the passive morpheme is an argument. The result followed immediately from two widely held principles: (i) the Visibility Condition, which says that arguments must be Case-marked to receive a θ-role at LF, and (ii) the fact that (structural) Case is assigned under government at S-Structure. At S-Structure the passive morpheme is an argument attached to the verb (see (2)); hence, it must be assigned the verb’s Case. Thus, this Case cannot be assigned to any (other) NP, and we say that it is “absorbed.” Although this works nicely for languages like English, Jaeggli (1986) points out that it raises a serious comparative issue: what about languages in which passives can be formed with intransitive verbs as well as with transitive ones?10 Examples of such impersonal passives in German are as follows:

(52) a. Es wurde bis spät in die Nacht getrunken. It was till late in the night drunk ‘Drinking went on till late at night.’ (Jaeggli (1986, (22b))) b. Sonntags wird nicht gearbeitet. Sundays is not worked ‘On Sundays there is no working.’ (Roberts (1985, 512))

Verbs such as these do not necessarily take accusative Case objects. If this means that they lack the ability to assign accusative Case, then the passive morpheme will not be Case-marked in structures like (52). Given the Visibility Condition, a θ-Criterion violation would result. Thus, it is necessary to study the Case theory properties of passives more carefully.

After considering several possibilities, Jaeggli assumes that there is a parametric difference between English and German such that intransitive verbs like work have the capacity to assign accusative Case in the latter but not the former. Normally such verbs will not have an accusative object even in German, because they have no θ-role to assign to it. Nevertheless, in passive structures this accusative Case is available to make the passive morpheme visible at LF, thereby allowing (52). Although this solution mechanically preserves the idea that the passive morpheme’s properties follow from the fact that it is an argument, it has serious problems. First, it is ad hoc in that it posits a fundamental difference between the lexical properties of English and German verbs without any independent motivation. Second, it makes the dubious prediction that no language could have two distinct passive constructions, where one allows impersonal passives and the other does not.11 Third and most important, recent literature has brought out the fact that there are impersonal passive constructions in which accusative Case is still visibly assigned to the thematic object of the verb. Interestingly, this never happens in German: (53) *Es wird diesen Roman von vielen Studenten gelesen. It is this-acc novel-acc by many students read Sobin (1985) shows that it does in Ukrainian, however. In this language the thematic object can appear in either a nominative Case form or an accusative Case form, in more or less free variation:

(54) a. Cerkv-u bul-o zbudova-n-o v 1640 roc’i. church-acc/fem was-imp built-pass-imp in 1640 ‘The church was built in 1640.’ b. Cerkv-a bul-a zbudova-n-a v 1640 roc’i. church-nom/fem was-fem built-pass-fem in 1640 ‘The church was built in 1640.’ (Sobin (1985, (13)))

Timberlake (1976) makes the same point for North Russian dialects. For convenience, we will refer to structures such as (54) as transitive passives in view of their accusative objects. These constructions (also called impersonals) are not particularly rare; they are also found in Welsh (Perlmutter and Postal (1984)), Polish (Keenan and Timberlake (1985), J. Zapior (personal communication)), and the impersonal si-passives in Italian (Belletti (1982)).12 Jaeggli’s solution for German impersonal passives clearly will not extend to these languages. Thus, we are forced to conclude, following Sobin (1985), that the absorption of accusative Case is a cross-linguistically more variable property of passive than the absorption of a θ-role.

In order to give an account of the range of impersonal passives, we must modify one of the assumptions from which we derived the Case-absorption effect. There are two possibilities. We could preserve the Visibility Condition as it is if we held that: (55) An element (here the passive morpheme) can receive Case from some item that does not (minimally) govern it. Alternatively, we could say that the Visibility Condition needs to be generalized somehow, such that: (56) An element (here the passive morpheme) can under certain circumstances be assigned a θ-role at LF without being assigned Case. Both of these assumptions are departures from generic Government-Binding Theory, but in fact both have independent motivation. We discuss each in turn. The motivation for (55) comes from the fact that nominative Case can visibly appear on NPs inside the verb phrase in German (Den Besten (1981; 1985)). Indeed, this is possible whenever the subject has lexically assigned Case and hence does not need the nominative Case itself. For example:

(57) a. . . . daß [S meinem Bruder [VP deine Musik nicht gefällt]]. that my brother-dat your music-nom not likes ‘. . . that my brother doesn’t like your music.’ b. . . . daß [S dem Karl [VP [S e [VP dein Buch zu gefallen]] scheint]]. that Karl-dat your book-nom to like seems ‘. . . that Karl seems to like your book.’

In (57a) nominative Case from the Infl is assigned into the VP; in (57b) it is passed down all the way from the matrix Infl, into the VP of the matrix verb’s infinitival complement. Clearly, some mechanism is available in German that allows such assignment to take place (see Den Besten (1981; 1985), Webelhuth (1986), Baker (1985), and Roberts (1985) for various suggestions). Now, on our analysis the passive morpheme is also an argument inside the VP at S-Structure; hence, the same mechanism could allow nominative Case to be assigned to it. The structure would be something like (58) (details omitted): (58) [s — [I′[VP getrunk + en] wurde]] nom. case

This allows the passive morpheme to be visible for θ-role assignment at LF in the usual way. See Roberts (1985, chap. 5) for further details.

The independent motivation for (56) comes from the study of Noun Incorporation (NI), the phenomenon found in many polysynthetic languages by which the head noun of the thematic object of the verb appears morphologically combined with the verb. An example of such a construction is the following, from Nahuatl:

(59) Na? ipanima ni-kwatini-itta.
I always 1sS-tree-see
‘I see trees all the time.’ (Merlan (1976, (8B)))

Baker (1985; 1988) has argued that, at least in some languages, this morphological combination comes about by adjoining the head of the object NP onto the verb in the syntax. If this is correct, then the verb in (59) has a structural object at all syntactic levels. Such objects are unusual, however, in that they need not necessarily be assigned Case. Thus, consider the following NI structures from Niuean (Austronesian; from Seiter (1980)):

(60) Fai gata nakai i Niuē?
exist-snake-Q in Niue
‘Are there snakes in Niue?’

(61) a. Ne fanogonogo a lautolu *(ke he) tau lologo ke he tau tūlā ne ua.
past listen abs they to pl song to pl clock nonft two
‘They were listening to (the) songs for a couple of hours.’
b. Ne fanogonogo lologo a lautolu ke he tau tūlā ne ua.
past listen song abs they to pl clock nonft two
‘They were listening to songs for a couple of hours.’

(62) a. Kua tā he tama e tau fakatino *(aki) e malala.
perf-draw erg-child abs-pl-picture with abs-charcoal
‘The child has been drawing pictures with a charcoal.’
b. Kua tā fakatino (h)e tama (aki) e malala.
perf-draw-picture (erg)-child with abs-charcoal
‘The child has been drawing pictures with charcoal.’

(60) shows that the argument of an unaccusative verb can incorporate, even though such verbs cannot generally assign Case to their objects. By itself, this example is not especially revealing, since the object NP that the noun incorporated out of could receive nominative (or, in this case, absolutive) Case from Infl by a mechanism like that discussed for German. This would not extend to (61), however, which illustrates a special class of “defective transitive” verbs in Niuean. These verbs are not Case assigners;

hence, the preposition ke he must normally be inserted to assign Case to their object, as in (61a).13 (61b), however, shows that if the head of the object is incorporated, the structure is grammatical without the insertion of ke he. This time, the object cannot be inheriting Case from Infl, because Infl’s Case is necessarily assigned to the overt thematic subject. Finally, (62) shows that when there is an instrument phrase in the clause, that instrument can receive structural Case from the verb if and only if the head of the object NP has incorporated into the verb.14 Given that each Case can only be assigned once, (62b) proves that incorporated objects need not be assigned Case in Niuean, since the two available structural Cases (ergative and absolutive) are visibly assigned to other NPs (the subject and the instrument, respectively). Their lack of Case notwithstanding, the incorporated nominals in (60)–(62) clearly receive θ-roles at LF. Thus, we have another counterexample to the standard formulation of the Visibility Condition. Hence, the condition must be revised along the lines indicated in (63):

(63) In order for an argument to be visible for θ-role assignment at LF, it must either
a. be assigned Case, or
b. have its head morphologically united with an X0.

The conceptual implications of this disjunctive formulation are discussed in Baker (1988, sec. 3.4). Its importance in the present context, however, is that it automatically solves the primary problem of impersonal passives. The passive morpheme is an argument that, like incorporated noun roots, is morphologically attached to the verb at S-Structure. Thus, it can be made visible by clause (b) of (63), and no θ-Criterion violation will result. Of course, the extended Visibility Condition in (63) does not give the complete account of impersonal passives, since it does not allow for the observed ambivalence of passive morphemes with respect to Case assignment. Whereas passive morphemes in Ukrainian never need to receive accusative Case in accordance with (63b), passive morphemes in German do absorb accusative Case obligatorily when the verb has such Case available (see (53)); furthermore, passive morphemes in English must be assigned accusative Case, making impersonal passives impossible. Thus, we take (63) to define the limits of what is allowed by Universal Grammar, but observe that individual languages have narrower restrictions—in particular concerning the use of clause (b). In this regard, it is striking that Noun Incorporation constructions have a parallel ambivalence with respect to accusative Case assignment. The examples from Niuean above illustrate the situation where the incorporated nominal root does not need Case at all; NI in this language is thus comparable

to passives in Ukrainian. Based on the description in Merlan (1976), however, Nahuatl seems to be slightly different. As in Niuean, Incorporation in Nahuatl is possible with certain unaccusative type verbs that would have no Case for the incorporated nominal:

(64) Tesiwi-weci-∅-∅. (compare (60))
hail-fall-pres-3s
‘Hail is falling.’/‘It is hailing.’ (Merlan (1976, (26)))

Nevertheless, when the object of a transitive verb is incorporated, accusative Case is not free to be assigned to an instrumental instead, as it is in Niuean; rather, the instrument still appears with its characteristic preposition ika:

(65) Ne? panci-tete?ki ika kočillo. (compare (62))
he bread-cut with knife
‘He cut the bread with a knife.’ (Merlan (1976, (35b)))

Assuming that these patterns are sufficiently general, NI in Nahuatl is comparable to passive in German: (64) is an example of “impersonal NI” and (65) shows that “transitive NI” constructions are not allowed. Finally, in Greenlandic Eskimo, NI never takes place with an intransitive verb (Sadock (1980; 1985)), and logically transitive verbs with incorporated nouns cannot in general assign structural Case to an instrument or some other NP.15 Thus, NI in Greenlandic is comparable to passive in English, where neither impersonal nor transitive passives are found. This Case-theoretic variation, together with the similarity between passive and NI, can be captured in the following statement of how particular languages use the options made available by (63):

(66) If α is an argument of language β, then α can be made visible by
A: (63a) or (63b)
B: (63a) if structurally possible; otherwise (63b)
C: (63a) only
where for a given language (and conceivably for specific morphemes in that language), one of A, B, or C is chosen.

The properties of the languages we have reviewed can then be summarized as follows:

(67) Visibility setting    Passive morpheme    Incorporated noun
     A                     Ukrainian           Niuean
     B                     German              Nahuatl
     C                     English             Eskimo

The characteristic patterns follow directly from these statements.
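How the patterns follow from (66) can be made explicit with a small sketch. The Python below is only an illustration under simplifying assumptions: the function names, the string labels "case" and "affix" for clauses (63a) and (63b), and the reduction of "structurally possible" to a single boolean are ours, not part of the analysis itself.

```python
# Illustrative sketch of the visibility settings in (66); all names are hypothetical.

def visibility_options(setting, case_available):
    """Return the ways an argument can be made visible under a setting.

    "case"  stands for clause (63a): the argument is assigned Case.
    "affix" stands for clause (63b): its head is united with an X0.
    """
    if setting == "A":   # (63a) or (63b)
        return {"case", "affix"} if case_available else {"affix"}
    if setting == "B":   # (63a) if structurally possible; otherwise (63b)
        return {"case"} if case_available else {"affix"}
    if setting == "C":   # (63a) only
        return {"case"} if case_available else set()
    raise ValueError(f"unknown setting: {setting!r}")


def allows_impersonal(setting):
    # Impersonal construction: the verb has no Case to give the argument.
    return len(visibility_options(setting, case_available=False)) > 0


def allows_transitive(setting):
    # Transitive construction: Case is available, but the argument may
    # leave it for another NP and be visible through (63b) instead.
    return "affix" in visibility_options(setting, case_available=True)


# The assignments summarized in (67).
TABLE_67 = {
    "Ukrainian passive": "A", "Niuean NI": "A",
    "German passive": "B", "Nahuatl NI": "B",
    "English passive": "C", "Greenlandic NI": "C",
}

for name, setting in TABLE_67.items():
    print(name, setting, allows_impersonal(setting), allows_transitive(setting))
```

On this simplified reading, setting A permits both impersonal and transitive constructions, setting B permits only impersonal ones, and setting C permits neither, which matches the three-way split described in the text.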

To summarize, we have sought in this section to deepen the typology of passive constructions from a Government-Binding perspective, in particular integrating transitive passives into the account. The classical fact that English passives absorb accusative Case still follows from the fact that the passive morpheme is an argument inside the VP at S-Structure, given that English (unlike German) has no way of assigning nominative inside the VP and that it (unlike Ukrainian) allows only the most restrictive interpretation of the Visibility Condition.16 Yet room is made for other languages to differ from this in well-defined ways. The cost of this achievement is a significant weakening of the association between θ-role assignment and Case assignment, as expressed in the extended Visibility Condition (63) and its parameters (66). Although this proposal is empirically motivated, its conceptual implications will require careful further study. Nevertheless, the theory still makes nontrivial predictions about the range of passive constructions found in particular languages. Finally, we have shown in this section that the Case-theoretic properties of passive morphemes are the same as those of incorporated noun roots. Now, it seems comparatively clear that incorporated nouns are arguments that receive a θ-role from the verb (possibly by originating in an independent NP at D-Structure, as in Baker (1985; 1988)). Inasmuch as the same principles that apply to them also apply to passive morphemes, we have important indirect evidence for the fundamental hypothesis that passive morphemes are arguments.

5. Passives With Auxiliary Verbs

To this point, our discussion has ignored the presence of the auxiliary verb in languages requiring one for the passive. This final section explores this aspect of the passive construction and its implications for our proposal.

5.1 Auxiliaries and the Passive Morpheme

Perhaps the most obvious way to integrate the auxiliary verb be into the D-Structure representation of a passive construction is as in (68). Here -en is introduced under the Infl node and later moved onto the “main” verb:

(68) [S [I –en ] [VP be [VP V NP]]]

This is consistent with standard assumptions about auxiliaries and the position of Infl. The difficulty with this proposal is that it entails either lowering -en through at least two maximal projections or raising the main verb through as many maximal projections. The problem is most acute in situations involving several auxiliary verbs; consider, for example, how (69) would be derived under such an account:

(69) Mary has been kissed.

(70) a. [S [I –en ] [VP has [VP been [VP kiss Mary]]]]
b. [S [I hasi –en ] [VP ti [VP been [VP kiss Mary]]]]

The first two steps of the derivation are given in (70). (70a) portrays the D-Structure representation. We assume that the auxiliary verbs have and be head their own VPs, following Ross (1969). The first of these auxiliary verbs is then moved under the Infl node, following essentially Emonds (1976).17 The difficulty comes with the third step. We have argued that -en is affixed onto the main verb, to form a passive participle, and that this in turn causes the object to move into subject position. This can be brought about in one of two ways. Either -en must move down, or the main verb must move up into Infl. If the main verb moves into Infl, then it must share this position with up to four auxiliary verbs. (Otherwise, our theory would wrongly move the main verb to the left of the auxiliary verbs.) This hypothesis would create a number of problems for an account of the placement of not, the landing site of Q-Float, and the characterization of Subject-Auxiliary Inversion, among other things; see Emonds (1976).18 But allowing -en to move downward is also problematic. There are no cases that we know of involving movement downward through three maximal projections, as in (70). This operation seems to be unavailable in languages that have easily identifiable subject clitics. In French, for example, the weak pronoun je is a subject clitic, and has attached to the auxiliary verb in (71):

(71) a. J’ai acheté ça pour Jean.
‘I bought this for Jean.’
b. *Ai j’acheté ça pour Jean.

See Kayne (1972, sec. 2.4) for discussion. Note that je is not able to move past the auxiliary verb and attach to the main verb, as in (71b).
If such a lowering process is not available for the subject clitic je, then it is a reasonable conjecture that such “long-distance” downward movement should be prohibited in general and that, in particular, such movement is not available to the passive morpheme. If the main verb may not move into Infl and -en may not move down onto the verb, then (70a) cannot be the D-Structure representation of (69). We are thus led to assign (69) a different D-Structure representation. We suggest that this representation is the one shown in (72) and that, in general, passive constructions have D-Structure representations akin to (73).19 Building on work by Stowell (1981), Couquaux (1981), and Burzio (1986), Kayne (1989) has suggested that the auxiliary verb be takes a clausal complement headed by Infl.20 If Kayne is correct, and if -en is introduced under Infl, then (73) emerges as the D-Structure representation for passive constructions.

(72) [S NP [I′ I [VP [V have] [VP [V be] [S [I -en] [VP [V kiss] [NP Mary]]]]]]]

(73) [S [NP e] [I′ I [VP [V be] [S [I -en] [VP V NP]]]]]

From (73), the S-Structure representation of a passive sentence may be derived in the way that we have already sketched. The auxiliary verb moves to Infl and the main verb merges with the passive morpheme, yielding (74):21

(74) [S [NP e] [I′ [I be] [VP ti [S [V + en] VP]]]]

In English the passive morpheme must be assigned Case by the main verb, and this forces the object NP to move to subject position. Rather surprising support for the conjecture that (73) is the D-Structure representation for passives with auxiliary verbs comes from VP Ellipsis data in English. Consider the facts in (75): (75) a. Gary should have been sleeping, and Mary should have been, too. b. ?Gary should have been sleeping, and Mary should have, too. c. ?Gary should have been sleeping, and Mary should, too. d. Gary should have been paid better, and Mary should have been, too. e. Gary should have been paid better, and Mary should have, too. f. ?Gary should have been paid better, and Mary should, too. g. Gary is being paid better nowadays, and Mary is, too. h. *Gary was being given a book, and Mary was being, too. In (75a)–(75g) VP Ellipsis has allowed a VP that otherwise would have been headed by the main verb or one of the auxiliary verbs to be empty. As these examples suggest, VP Ellipsis may delete any of the VPs, as long as the first auxiliary verb is stranded (although in several cases the result is somewhat marginal).22 There is one gap in the paradigm, however, illustrated by (75h). When both progressive be and passive (or copular) be are present, VP Ellipsis cannot delete the phrase headed by the lowest predicate. We shall now show how this fact supports the structure in (73). Our argument proceeds as follows. We give a precise description of the environments where VP Ellipsis may occur. This requires both describing the situations that (75h) exemplifies and defining the types of phrases that may undergo deletion. This second task requires defending a novel theory of main verb be, for clauses with main verb be do not behave as expected

under VP Ellipsis. Once this theory of main verb be is embraced, however, it becomes impossible to adequately characterize the environments illustrated by (75h), where VP Ellipsis is prevented. We are rescued from the conundrum if (73) is the true structure of passive clauses. We begin with the second task: determining which phrasal categories VP Ellipsis may delete.

5.2 VP Ellipsis

The contrasts in (76) suggest that only VPs may be elliptical:

(76) a. I said that Mary should leave, and that Tom should [VP e], too.
b. *I said that Mary should kiss Pete, and that Tom should kiss [NP e], too.
c. *I said that Mary thought that Bill left, and that Tom thought [S′ e], too.
d. *I said that Mary talked to Bill, and that Tom talked [PP e], too.
e. *I said that Mary believed Bill to be intelligent, and that Tom believed [S e], too.

NPs, S′s, PPs, and Ss may not be elliptical, as the ungrammaticality of (76b–e) shows. The problem comes with the examples in (77), which suggest that an AP may delete:

(77) a. I said that Mary was angry, and that Bill was [? e], too.
b. I claimed that Mary is happy with Tom, and that Bill is [? e], too.

In other situations, however, APs may not be elliptical; consider (78):

(78) a. *John seems angry, and [S Mary [VP seems [AP* e]]], too.
b. *Mary feels happy, and [S Gary [VP feels [AP* e]]], too.

We need a way of distinguishing between these two environments. Because the APs deleted in (77) do not differ in any relevant respect from those deleted in (78), the factor responsible for the difference in grammaticality must be the verb governing the AP. In (77) the verb is be, whereas in (78) it is either seem or feel. We will treat be as the exceptional case and link its unique behavior in this context with another of its exceptional properties: be is the only “main” verb in English that may raise into Infl.23 This is demonstrated by, among other things, its ability to appear before not:

(79) Gary is not happy.

Our explanation for this correspondence goes as follows. Imagine that main verb be is, in fact, not a main verb, but is instead an auxiliary verb.24 This immediately accounts for the fact that it, unlike all other “main” verbs, is able to appear in Infl position. Now, if all sentences must have a main verb, then sentences overtly containing just be must also contain a phonologically null main verb.25 It is the VP headed by this null verb that is deleted in (77).26 Our claim, then, is that the difference between be and other verbs rests on be’s unique subcategorization properties. Only be may be followed by a VP headed by an empty verb. This accounts at once for be’s anomalous behavior with respect to Verb Raising and VP Ellipsis. We may therefore conclude that VP Ellipsis does affect only verb phrases; where it seems that APs are being deleted, in fact a verb phrase containing an empty verb is deleted.27

5.3 VP Ellipsis and Passive Participles

We turn now to a description of the restriction on VP Ellipsis that prevents its application in (75h). In that example, repeated in (80), an empty VP cannot follow the present participle:

(80) *Gary was being given a book, and Mary was being, too.

A very similar restriction is found in clausal gerunds. As in (80), an empty VP cannot be found following the gerund:

(81) a. *I remember Mary having eaten the apple, and Gary having, too.
b. *I remember Mary having been angry about it, and Gary having, too.

A number of explanations have been proposed for the failure of VP Ellipsis in these contexts; see Lobeck (1986) for a review. It is sufficient for our purposes to rely on a descriptive statement of the restriction, as in (82):

(82) VP Ellipsis cannot apply to a VP governed by V + ing.

Because (82) refers to VPs governed by V + ing, it only prevents deletion of the VP immediately below V + ing. A VP more deeply embedded may delete.
Thus, (82) distinguishes the ungrammatical (81b) from the grammatical (83): (83) I remember Mary having been angry about it, and Gary [I havingi ] [VP ti [VP been [VP e]]], too.

In (83) the VP headed by an empty verb, and containing the AP angry about it, has deleted. This conforms to (82) because the elliptical VP is not governed by having. Now consider how (82) will apply in passive constructions involving progressive be, such as (84). Our theory of be will assign to such examples the representation in (85).

(84) Gary was being given a book.

(85) [S [NP Gary] [I′ [I be] [VP [V t] [VP [V being] [VP* [V ε] [? [V give] [NP a book]]]]]]]

(The “ε” in (85) represents the null verb that subcategorizes be.) (82) prevents VP* (the VP headed by an empty verb) from being elliptical, because it is governed by being. But it does allow the phrase labeled “?” to be elliptical. However, this is precisely what the example that began this section, (75h), shows cannot happen: (86) *Gary was being given a book, and Mary was being, too. There is no readily apparent way that (82) can be extended so as to prevent deleting “?” in (85), without wrongly preventing the grammatical (83). The solution must lie elsewhere. The structure we have given passive clauses provides that solution. If (73) is correct, and passive participle phrases are Ss headed by an Infl containing -en, then the phrase labeled “?” in (85) is an S:28

(87) [S [NP Gary] [I′ [I be] [VP [V t] [VP [V being] [VP* [V ε] [S [I -en] [VP [V give] [NP a book]]]]]]]]]

Recall that in the previous section we established that VP Ellipsis only affects VPs. Thus, the reason “?” cannot be elliptical in (85) is that it is not a VP, just as our theory of passives entails.
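The interaction between the category restriction of section 5.2 and the descriptive statement (82) can be laid out as a small check. The sketch below is a deliberately crude model: the `Phrase` record, the `can_elide` function, and the shortcut of detecting V + ing by its spelling are illustrative assumptions, and the requirement that the first auxiliary be stranded is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phrase:
    label: str               # name used in the text, e.g. "VP*" or "?"
    category: str            # syntactic category, e.g. "VP" or "S"
    governor: Optional[str]  # the head that governs the phrase


def can_elide(phrase: Phrase) -> bool:
    # Section 5.2: only VPs may be elliptical.
    is_vp = phrase.category == "VP"
    # (82): no ellipsis of a VP governed by V + ing (spelling used as a proxy).
    governed_by_ing = phrase.governor is not None and phrase.governor.endswith("ing")
    return is_vp and not governed_by_ing


vp_star = Phrase("VP*", "VP", "being")         # VP headed by the null verb
question_as_vp = Phrase("?", "VP", "ε")        # "?" in (85), if it were a VP
question_as_s = Phrase("?", "S", "ε")          # "?" analyzed as S, as in (87)
embedded_in_83 = Phrase("VP", "VP", "been")    # the elided VP in (83)

for p in (vp_star, question_as_vp, question_as_s, embedded_in_83):
    print(p.label, p.category, p.governor, can_elide(p))
```

If "?" were a VP, the check would wrongly permit the ellipsis in (86); treating it as an S, as (87) does, blocks the ellipsis without affecting the grammatical (83).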

6. Conclusion We have discussed aspects of the syntax of passive constructions, taking as our starting point Jaeggli (1986). We have argued that the only crucial property of a passive morpheme is that it is an argumental affix tied to the Infl node; we have shown how all of the well-known properties of the passive construction can be derived from this. Furthermore, we have shown that the passive morpheme behaves like other syntactic arguments with respect to binding theory (section 2), and with respect to θ-theory (section 3), Case theory (section 4), and X-bar theory (section 5). In this way, a diverse body of facts—ranging from “crossover” effects in passives, to cross-linguistic constraints on the class of passivizable verbs, to failures of VP Ellipsis— receives a unified and explanatory account.

Notes 1. Notice that this account of by-phrases does not entail that by-phrases cannot appear in other constructions, such as nominals or the Romance faire. . .par construction.

2. If IMP is in VP, then the trace in object position binds IMP as well. This would explain the impossibility of It was reasoned to John that . . . on the interpretation where IMP = John. The following examples confirm this:
(i) *Stories were told IMPi to Johni.
(ii) *Letters were sent IMPi to Maryi.
(iii) *Whoi was testimony given IMPi about ti?
(iv) *A book was presented IMPi to everyonei.

These examples are all bad, showing that there is a c-command relation between IMP and other arguments in VP. In that case there are two violations of Principle C. 3. See Pesetsky (1987) for an alternative account of psych-verbs. 4. We thank a reviewer for this point and for examples (34b), (35). 5. In Perlmutter and Postal’s original formulation of the 1AEX, it is assumed that in impersonal passives the expletive is inserted as an object and is promoted to subject in order to block these examples. 6. Jaeggli’s account contains a serious difficulty, however. In particular, to block sentences like (39), his account requires (i): (i) -en may only receive the external θ-role of the V.

Jaeggli seeks to derive this statement from the fact that the external argument is the only one for which the verb is not strictly subcategorized. Hence, its manifestation alone is not restricted to a particular syntactic frame and it alone can be expressed by an affix. However, this is too strong, since internal arguments can be expressed by affixes in languages of the world: (42), (43), (45), (50) are all examples of this. On our account, (i) is derived from the fact that -en is generated in Infl, as discussed directly below.
7. Alternatively, the verb could raise to the Infl position, joining the passive morpheme there, as assumed in Baker (1985) (see Chomsky (1986)). At least in this case, this alternative seems more natural. See section 5 for relevant discussion.
8. See Burzio (1986) on chains that “overlap” in the subject position.
9. Moreover, our approach is consistent with a language having two passive morphemes, one of which obeys the 1AEX and one of which does not. Italian seems to be such a language, having as it does both a copular passive and impersonal “passives” with the clitic si (see Belletti (1982), Manzini (1983), Burzio (1986)). This situation would be impossible if the 1AEX were itself a parameter of language. See Baker (1988, 332–334) for discussion.
10. Although it is true that verbs without an internal argument never passivize in English, certain verbs that never take an NP argument can. It is not obvious whether such verbs have accusative Case to assign to the passive morpheme. One such class of verbs are the pseudopassives: (i) John was talked about t last night.

We follow the tradition of assuming some process of reanalysis between talk and about: this complex verb can then inherit a Case feature from the P, and this will be assigned to -en, making it visible. Conceivably, this approach could be extended to impersonal passives in French, which are possible only if the verb takes an internal PP argument: (ii) Il a été parlé de vos frères hier soir. There was spoken of your brothers yesterday evening (Kayne (1984))

Here one might claim that the same reanalysis and NP Movement processes that occur at S-Structure in English occur at LF in French. (On LF reanalysis, see

Baker (1985), Haïk (1985); on LF NP Movement, see Chomsky (1986).) This would capture the close correspondence between permissible pseudopassives in English and permissible impersonal passives in French.
11. Italian and Spanish may be such languages, given that impersonal si passives are allowed with intransitive verbs but copular passives are not. The Case properties of the si morpheme are complex, however; in particular, there is evidence from infinitival constructions that si is dependent on nominative Case in a very unusual way, which goes beyond the bounds of our discussion. See Burzio (1986) for review and discussion. Polish also has two passive constructions: one with copular morphology that is allowed only with transitive verbs and one with clitic morphology that is allowed with intransitives. In this language the mysterious Case-theoretic complications in the clitic-type passive do not seem to arise (J. Zapior (personal communication)).
12. Keenan and Timberlake (1985) point out that on earlier Government-Binding views of the passive, these constructions would be exceptions to Burzio’s Generalization, because the passive verbs assign accusative Case but no external θ-role. On our analysis, this potential anomaly disappears, since the verb does assign an external θ-role (to the passive morpheme itself) as well as accusative Case.
13. Given the theory of Baker (1985), we know that ke he is an inserted preposition in (61a) because otherwise the incorporation of its object would be impossible. Independent evidence for this is the fact that ke he can mark the raised subject in “raising to object” constructions (see Massam (1985)); this would be impossible if ke he were always a θ-role assigner.
14. Crucially, the fact that the instrumental preposition cannot be dropped without incorporation shows that, unlike verbs in certain other languages, verbs in Niuean can only assign one structural Case.
15.
A handful of logically triadic verbs such as ‘give’ are exceptions to this. The important point is that “reassignment” of structural Case is not free across the class of incorporating transitive verbs, the way it is in Niuean. 16. Notice that we have made no attempt to choose between the two different approaches to impersonal passives (for example, in German), either of which could be sufficient. We see no need to do so for current purposes, since both have solid independent motivation. An LI reviewer has suggested to us that infinitives might provide a way of distinguishing the two proposals: if the passive morpheme needs to receive nominative Case from a tensed Infl, then impersonal passives should be impossible in infinitivals. In fact, infinitival impersonal passives are acceptable embedded under raising verbs but not in other contexts, both in German (Roberts (1985)) and in relevant Slavic languages (Sobin (1985), J. Zapior (personal communication)). It is not clear that this distinguishes the proposals, however, because even if the passive morpheme did not need Case from Infl, a null subject required by the Extended Projection Principle of Chomsky (1981) might, since (for some reason) PRO—the usual subject of an infinitive—cannot be an expletive (*PRO/*it to rain would bother us). Since the issues about control and expletives that arise lead far beyond passive constructions per se, we do not explore the matter further. 17. See also Akmajian and Wasow (1975), Culicover (1982), Jackendoff (1972), and—for recent discussion, where somewhat different structures are assumed— Lasnik (1981), Lobeck (1986), and Chomsky (1986). 18. In particular, not may have clausal scope only if it follows the first of a series of auxiliary verbs, the contrast in (i) shows. And if only verbs that have been

moved into Infl may undergo Subject-Auxiliary Inversion and “Subject-Auxiliary Inversion” actually moves the Infl node, then the contrast in (ii) also demonstrates that main verbs may not move into Infl:
(i) Mary can’t have left. ?Mary can have not left.
(ii) Can Mary leave? *Can leave Mary? *Leaves Mary?
19. Our proposal gives to a passive sentence a D-Structure representation very like the one adopted in Chomsky and Lasnik (1977), where the passive morpheme is assumed to head an AP.
20. Kayne’s claim is made, in particular, for the past tense and passive auxiliary be in the Romance languages. We extend this claim to the passive auxiliary be in English.
21. We do not know how to determine whether the main verb has raised to join -en in Infl position, or whether -en has lowered onto the verb. Note that raising the main verb in these contexts would not engender the empirical problems that arise if one hypothesizes that the main verb moves into the matrix Infl. (See footnote 18.) Of course, the explanation that is ultimately given for the inability of main verbs to move into Infl in general must have the property that main verbs are allowed to raise into an Infl bearing the passive morpheme. That is, this scenario relies on some unknown constraint on Verb Raising that distinguishes the Infls of passive clauses from others.
22. We disagree here with Lobeck’s (1986) claim that examples like (75b,c) are ungrammatical. In particular, we find a sharp contrast between these examples and (75h). Our judgments are consistent with Emonds (1976) and Iwakura (1977).
23. We are setting aside the exceptional, and somewhat archaic or dialectic, instances of main verb raising with need, have, and dare. See Pullum and Wilson (1977) for discussion.
24. We follow Schachter (1983) in this respect. With this hypothesis, we may define auxiliary verb as a verb that is subcategorized by a VP, and main verb as any other verb.
This leaves the task of distinguishing auxiliary verbs from causative and perception verbs, which also seem to be subcategorized by a VP. 25. This may be too strong. It is sufficient for our purposes to say that “main verb” be may be an auxiliary verb. That there are situations where be can be the main verb is suggested by such cases as (i): (i) We saw Mary be rewarded.

Bare clausal complements to perception verbs do not usually host auxiliary verbs. 26. An alternative account that also links the ability of be to raise to Infl with its ability to introduce an elliptical phrase turns on the structure that Verb Raising produces. Once be has raised, it is followed by a VP headed by its own trace, as in (i): (i) [Mary [I bei ] [VP ti [AP happy]]].

Now VP Ellipsis could simply delete this VP. This account, though adequate for simple clauses, does not extend to examples involving modal verbs, as in (ii): (ii) [Mary [I should] [VP be [AP happy]]].

In (ii) be does not raise into Infl because should is base-generated there (see Emonds (1976) and Jackendoff (1977)), but ellipsis of the material following be is still possible. The text account is therefore to be preferred.

27. If the passive participle results from raising a verb into an Infl with -en, then it is mysterious why VP Ellipsis cannot affect the VP vacated by the moved verb, as in (i):
(i) *Mary was believed happy, and Gary was believed, too.

This problem does not arise if VP Ellipsis is not a deletion process but instead results from base-generating an empty VP (see Williams (1977)). On this view, the participle believed in the second conjunct of (i) would have to be base-generated in Infl, and this is impossible under our account of the passive.

28. Note that (85) does not involve an “adjectival” passive, since given cannot assign the goal θ-role externally: the given book, but *the given man. Hence, one cannot resort to claiming that “?” is an AP in (85).

References

Aissen, J. (1983) “Indirect Object Advancement in Tzotzil,” in D. Perlmutter, ed., Studies in Relational Grammar 1, University of Chicago Press, Chicago, Illinois.
Akmajian, A. and T. Wasow (1975) “The Constituent Structure of VP and AUX and the Position of the Verb BE,” Linguistic Analysis 1, 205–245.
Baker, M. (1985) Incorporation: A Theory of Grammatical Function Changing, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Baker, M. (1988) Incorporation: A Theory of Grammatical Function Changing, University of Chicago Press, Chicago, Illinois.
Belletti, A. (1982) “‘Morphological’ Passive and Pro-Drop: The Impersonal Construction in Italian,” Journal of Linguistic Research 2, 1–34.
Belletti, A. and L. Rizzi (1988) “Psych-verbs and θ-theory,” Natural Language and Linguistic Theory 6, 291–352.
Besten, H. den (1981) “Government, Syntaktische Struktur und Kasus,” in M. Kohrt and J. Lenerz, eds., Sprache, Formen und Strukturen, Max Niemeyer Verlag, Tübingen.
Besten, H. den (1985) “The Ergative Hypothesis and Free Word Order in Dutch and German,” in J. Toman, ed., Studies in German Grammar, Foris, Dordrecht.
Borer, H. (1984) Parametric Syntax, Foris, Dordrecht.
Burzio, L. (1986) Italian Syntax, Reidel, Dordrecht.
Chomsky, N. (1981) Lectures on Government and Binding, Foris, Dordrecht.
Chomsky, N. (1986) Barriers, MIT Press, Cambridge, Massachusetts.
Chomsky, N. and H. Lasnik (1977) “Filters and Control,” Linguistic Inquiry 8, 425–504.
Couquaux, D. (1981) “French Predication and Linguistic Theory,” in R. May and J. Koster, eds., Levels of Syntactic Representation, Foris, Dordrecht.
Culicover, P. (1982) Syntax, 2nd ed., Academic Press, New York.
Emonds, J. (1976) A Transformational Approach to English Syntax, Academic Press, New York.
Gibson, J. (1980) Clause Union in Chamorro and in Universal Grammar, Doctoral dissertation, University of California, San Diego.
Haïk, I. (1985) The Syntax of Operators, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Iwakura, K. (1977) “The Auxiliary System in English,” Linguistic Analysis 3, 101–136.
Jackendoff, R. (1972) Semantic Interpretation in Generative Grammar, MIT Press, Cambridge, Massachusetts.
Jackendoff, R. (1977) X′-Syntax: A Study of Phrase Structure, MIT Press, Cambridge, Massachusetts.
Jaeggli, O. (1982) Topics in Romance Syntax, Foris, Dordrecht.
Jaeggli, O. (1986) “Passive,” Linguistic Inquiry 17, 587–622.
Kayne, R. (1972) “Subject Inversion in French Interrogatives,” in J. Casagrande and B. Saciuk, eds., Generative Studies in Romance Languages, Newbury House, Rowley, Massachusetts.
Kayne, R. (1984) Connectedness and Binary Branching, Foris, Dordrecht.
Kayne, R. (1989) “Facets of Romance Past Participle Agreement,” in P. Benincà, ed., Dialect Variation and the Theory of Grammar, Foris, Dordrecht, 85–103.
Keenan, E. and A. Timberlake (1985) “Predicate Formation Rules in Universal Grammar,” in J. Goldberg et al., eds., Proceedings of the West Coast Conference on Formal Linguistics 4, Stanford University, Palo Alto, California.
Lasnik, H. (1981) “Restricting the Theory of Transformations: A Case Study,” in N. Hornstein and D. Lightfoot, eds., Explanations in Linguistics: The Logical Problem of Language Acquisition, Longmans, London.
Lasnik, H. (1985) “Illicit NP Movement: Locality Conditions on Chains?” Linguistic Inquiry 16, 481–490.
Lebeaux, D. (1983) “A Distributional Difference between Reciprocals and Reflexives,” Linguistic Inquiry 14, 723–729.
Lees, R. B. and E. S. Klima (1963) “Rules for English Pronominalization,” Language 39, 17–28.
Lobeck, A. (1986) Syntactic Constraints on VP Ellipsis, Doctoral dissertation, University of Washington, Seattle.
Manzini, M. R. (1980) “On Control,” ms., MIT, Cambridge, Massachusetts.
Manzini, M. R. (1983) Restructuring and Reanalysis, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Marantz, A. (1984) On the Nature of Grammatical Relations, MIT Press, Cambridge, Massachusetts.
Massam, D. (1985) Case Theory and the Projection Principle, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Merlan, F. (1976) “Noun Incorporation and Discourse Reference in Modern Nahuatl,” IJAL 42, 177–191.
Nerbonne, J. A. (1982) “Some Passives Not Characterized by Universal Rules: Subjectless Impersonal,” in B. Joseph, ed., Grammatical Relations and Relational Grammar, Working Papers in Linguistics No. 26, Ohio State University, Columbus, Ohio.
Ostler, N. (1979) Case-Linking: A Theory of Case and Verb Diathesis Applied to Classical Sanskrit, Doctoral dissertation, MIT, Cambridge, Massachusetts. [Distributed by the Indiana University Linguistics Club, Bloomington.]
Özkaragöz, I. (1980) “Evidence in Turkish for the Unaccusative Hypothesis,” in Proceedings of the Sixth Annual Meeting of the Berkeley Linguistics Society, University of California, Berkeley.
Özkaragöz, I. (1982) “Monoclausal Double Passives in Turkish,” paper presented at the Conference on Turkish Language and Linguistics in Ataturk’s Turkey, University of California, Berkeley.
Perlmutter, D. (1978) “Impersonal Passives and the Unaccusative Hypothesis,” in J. Jaeger et al., eds., Proceedings of the Fourth Annual Meeting of the Berkeley Linguistics Society, University of California, Berkeley.
Perlmutter, D. and P. Postal (1984) “The 1-Advancement Exclusiveness Law,” in D. Perlmutter and C. Rosen, eds., Studies in Relational Grammar 2, University of Chicago Press, Chicago, Illinois.
Pesetsky, D. (1987) “Binding Problems with Experiencer Verbs,” Linguistic Inquiry 18, 126–140.
Postal, P. (1971) Cross-Over Phenomena, Holt, Rinehart and Winston, New York.
Postal, P. (1986) Studies of Passive Clauses, State University of New York Press, Albany, New York.
Pullum, G. K. and D. Wilson (1977) “Autonomous Syntax and the Analysis of Auxiliaries,” Language 53, 741–788.
Rizzi, L. (1986) “On Chain Formation,” in H. Borer, ed., Syntax and Semantics 19: The Syntax of Pronominal Clitics, Academic Press, New York.
Roberts, I. (1985) The Representation of Implicit and Dethematized Subjects, Doctoral dissertation, University of Southern California, Los Angeles.
Roeper, T. (1983) “Implicit Arguments and the Head-Complement Relation,” Linguistic Inquiry 18, 267–310.
Ross, J. R. (1969) “Auxiliaries as Main Verbs,” in W. Todd, ed., Studies in Philosophical Linguistics, Series One, Great Expectations Press, Evanston, Illinois.
Sadock, J. (1980) “Noun Incorporation in Greenlandic,” Language 56, 300–319.
Sadock, J. (1985) “Autolexical Syntax: A Theory of Noun Incorporation and Similar Phenomena,” Natural Language and Linguistic Theory 3, 379–440.
Schachter, P. (1983) “Explaining Auxiliary Order,” in F. Heny and B. Richards, eds., Linguistic Categories: Auxiliaries and Related Puzzles, vol. 2, Reidel, Dordrecht.
Seiter, W. (1980) Studies in Niuean Syntax, Garland, New York.
Sobin, N. (1985) “Case Assignment in Ukrainian Morphological Passive Constructions,” Linguistic Inquiry 16, 649–662.
Stowell, T. (1981) Origins of Phrase Structure, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Timberlake, A. (1976) “Subject Properties in the North Russian Passive,” in C. Li, ed., Subject and Topic, Academic Press, New York.
Timberlake, A. (1982) “The Impersonal Passive in Lithuanian,” in Proceedings of the Eighth Annual Meeting of the Berkeley Linguistics Society, University of California, Berkeley.
Travis, L. (1984) Parameters and Effects of Word Order Variation, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Webelhuth, G. (1986) “Some Data on the Verb-Object Relation in German,” Linguistic Inquiry 17, 772–776.
Williams, E. (1977) “Discourse and Logical Form,” Linguistic Inquiry 8, 103–139.
Williams, E. (1981) “Argument Structure and Morphology,” The Linguistic Review 1, 81–114.

9

Complex Inversion in French Luigi Rizzi and Ian Roberts

1. Introduction

In this paper1 we would like to show that some recent theoretical innovations permit a principled account of complex inversion, a French construction which has been on the agenda of theoretical and Romance syntacticians ever since Kayne’s (1972) seminal analysis. Some properties of the construction will lead us to revise and tighten current assumptions on Case, visibility and head-to-head movement, and to propose a new hypothesis on the nature of the root/non-root distinction. The major cases of complex inversion are found in root interrogative sentences:

(1) Quel livre Jean a-t-il lu?
Which book John has he read?
(2) Personne n’est-il venu?
No-one isn’t he come?
‘Didn’t anyone come?’

A striking property of the construction is that there are apparently two subjects: a full NP, which occurs to the left of the inflected verb (after a wh-word or initially in yes/no questions), and a pronoun to the right of the inflected verb. That the NP is not dislocated is shown by the fact that it follows the Spec of Comp in (1), and by the well-formedness of an example like (2) involving the quantified NP personne ‘no-one’, which is in general unable to appear in a dislocated position (see n. 2). The simultaneous presence of a lexical and a pronominal subject here gives the appearance of clitic doubling, either of the kind found with objects in various dialects of Spanish, as illustrated from the River Plate dialect in (3a), or the kind found with subjects in northern Italian dialects, illustrated from Fiorentino in (3b):

(3) a. Lo ví a Juan.
Him I-saw to John.
‘I saw John.’
b. La Maria la parla sempre.
The Mary she talks always.
‘Mary is always talking.’

However, despite apparent similarities, at least two fundamental properties distinguish the French case from those in (3). First, the French construction is highly selective in that it is restricted to direct questions and other environments featuring fronting of the inflected verb. No such construction-specific restriction is found with the ordinary cases of clitic doubling. Second, the pronominal elements in (3) have clear properties of syntactic clitics, which occur attached to the verb or under Infl, and do not occupy an NP position in the syntax. On the other hand, it appears to be the case that French unstressed subject pronouns are in NP position in the syntax, and are cliticized to the inflected verb in the phonology (for relevant evidence see Couquaux 1986; Kayne 1983; Rizzi 1986). The contrast with northern Italian dialects is revealing; while subject clitics and full subject NPs can, and in some cases must, co-occur in many dialects, the two elements are in full complementary distribution in standard French.2 If French subject pronouns manifest an NP position on syntactic levels of representation, then the kind of doubling shown in (1) and (2) must involve two NP positions, not just one, as in (3). Such a state of affairs thus raises different and more acute theoretical problems than the familiar cases of clitic doubling.

The basic goal of this paper is to show that the fundamental properties of complex inversion can be properly understood if we combine elements of the thorough analysis proposed in Kayne (1983) with certain more recent proposals:

a. Chomsky’s (1986b) extension of X-bar Theory to non-lexical categories;
b. an adaptation of Baker’s (1988) approach to visibility and head-to-head movement;
c. the idea, independently arrived at by a number of researchers, that subjects are base-generated in VP and raise to their surface subject position in IP.

Our adaptation of Baker’s theory of head-to-head movement, in conjunction with a strict interpretation of the Projection Principle, also yields a principled account of the fact that complex inversion is limited to root structures (cf. Den Besten 1983; Safir 1981/82; Safir and Pesetsky 1981), an account which can be extended to root phenomena in general (Emonds 1976).

In section 2 we outline an analysis of subject-clitic inversion, a necessary prerequisite. In section 3, we address the problem posed by the presence of two subjects, which we factor into three distinct problems:

a. how can the Case requirements of the two nominals be simultaneously fulfilled? (the Case problem);
b. which positions do the two subjects come from in the derivation? (the source problem);
c. which positions do the two subjects occupy at S-structure? (the landing-site problem).

In section 4, we turn to the question of the restriction of complex inversion to root contexts and we develop a general approach to the root/non-root distinction.

2. Subject-Clitic Inversion

First of all, it is necessary to sketch an analysis of one component of complex inversion which exists as an independent construction: subject-clitic inversion. This construction involves the inversion of a pronominal subject with the inflected verb, shown in (4):

(4) a. Est-il parti?
‘Has he left?’
b. Où est-il allé?
‘Where has he gone?’

Following den Besten (1983) and Kayne (1983), we assume that this inversion process involves leftward movement of the verb over the subject rather than rightward movement of the subject over the verb. Adopting the extension of X-bar Theory to non-lexical categories proposed in Chomsky (1986b), and the theory of head-to-head movement of Baker (1988), this process can be seen as raising of the inflected verb from I0 to C0, shown in (5) (cf. Rizzi 1987):3

(5) [CP [XP Où] [C′ [C0 [I0 est]] [IP [NP il] [I′ [I0 t] [VP allé]]]]]

This approach immediately explains why inversion is impossible if C0 is filled. For instance, in the Quebec dialect of French, where an overt C0 can co-occur with a wh-element in Spec-CP, inversion is restricted to the case in which this option is not taken (Goldsmith 1981):

(6) a. Qui que tu as vu?
Who that you have seen?
b. Qui as-tu vu?
Who have you seen?
c. *Qui qu’ as-tu vu?
Who that have you seen?

In (6c) C0 is filled by que and hence is not available as a landing site for movement of the inflected verb. Standard French does not allow the co-occurrence of a wh-element and que, but a reflex of the same phenomenon can be seen with a certain class of adverbs. These adverbs are able either to trigger inversion or to co-occur with a that-clause. Again, these options are exclusive:

(7) a. Peut-être qu’il a fait cela.
Perhaps that he has done that.
b. Peut-être a-t-il fait cela.
Perhaps has he done that.
c. *Peut-être qu’ a-t-il fait cela.
Perhaps that has he done that.

The natural account of (7) is to say that this class of adverbs (which includes peut-être, à peine ‘hardly,’ and a few others) is able to appear in Spec-CP. This brings the paradigm in (7) into line with that in (6). Third, again in standard French, a conditional clause can be introduced either by the overt complementizer si ‘if’ or by the inversion of a verb in the conditional mood, but not by both:

(8) a. Si tu avais fait cela . . .
‘If you had done that . . .’
b. Aurais-tu fait cela . . .
‘Had you done that . . .’
c. *Si aurais/avais-tu fait cela . . .
‘If had you done that . . .’

Si and the inflected verb thus appear to compete for the same position, namely C0. The analysis of subject-clitic inversion as involving I0-to-C0 movement follows and updates the basic idea proposed by den Besten in that it treats inversion in French as essentially the same phenomenon as the more pervasive kinds of inversion found in Germanic languages. There is nevertheless a striking difference between the French case and the Germanic case (illustrated below by subject-aux inversion in English); namely, that the process is restricted to pronominal subjects in French, unlike in Germanic:

(9) a. Has John spoken?
b. *A Jean parlé?

(10) a. Has he spoken?
b. A-t-il parlé?

Developing a suggestion by Szabolcsi (1983), we will propose that the impossibility of (9b) should be accounted for in terms of Case Theory. The idea is that raising of I0 to C0 destroys the context in which I0 assigns Case to the subject in French, but not in English or in other Germanic languages. A straightforward implementation of this proposal makes use of the idea of directionality of Case assignment; suppose that in French Nominative Case can only be assigned leftward, while in English and in other Germanic languages either direction of assignment is possible. In that case, a phonologically-realized NP will violate the Case Filter in the context created by I0-to-C0 movement in French. This is precisely the context of (9). So Jean violates the Case Filter in (9b). In English, there is no Case Filter violation here because Nominative can be assigned either leftward or rightward.4

As it stands, this proposal is too strong, as it rules out the well-formed example (10b). In order to account for (10b) we need to elaborate on what the Case Filter really requires. Following the general proposals of Baker (1988), we assume that the requirement that NPs be Case-marked is actually an instance of a more general requirement that nominals be associated with a Case feature. This association takes place in one of two ways: either by means of assignment of the feature from a head to the nominal, or by means of incorporation of the nominal into the head bearing the Case feature (for a precise formulation of this requirement, see Baker, Johnson and Roberts (1989: 239) [this volume, Chapter 8]):

(11) a. Assignment: [diagram: X0, Ni, +Case]
b. Incorporation: [diagram: X0, +Case, Ni]

One variety of incorporation is cliticization. Following Kayne (1983), we assume that the pronoun in subject position can cliticize to the inflected verb in the syntax, once the latter has been moved to C0.5 So (10b) has a representation as shown in (12):

(12) [CP [C0 ai(-t-)ilj] [IP [NP tj] [I′ [I0 ti] [VP parlé]]]]

Here the clitic escapes the effects of the strict directionality condition on Nominative assignment in French as it is associated with a Case feature (the Nominative feature borne by I0 in C0) by incorporation with C0, so that the fact that Case assignment to Spec-IP is blocked is irrelevant.

To sum up, we treat subject-clitic inversion as the combination of the raising of the inflected verb to C0 followed by incorporation of the subject pronoun with the inflected verb in C0. Incorporation of the pronoun is one way of associating it with a Case feature. Due to the directionality condition on Nominative assignment in French (or, alternatively, the language-specific mode of Case assignment discussed in n. 4), this is the only way that a subject can satisfy the requirements of Case Theory when I0-to-C0 movement takes place. The fact that I0-to-C0 movement can only occur with pronominal subjects is thus reduced to the fact that pronominals are the only elements that undergo incorporation in French (in fact, incorporation from subject position appears to be restricted to pronominals universally; see Baker and Hale 1988). With this background, we can go back to the issues raised by complex inversion.
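The Case-theoretic reasoning of this section can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: the names `NOMINATIVE_DIRECTIONS`, `SUBJECT_PRONOUNS_INCORPORATE`, `Subject` and `licensing_mode` are hypothetical, the two tables merely restate the parameter settings assumed in the text for French and English, and the alternative formulation mentioned in n. 4 is not modelled.

```python
from dataclasses import dataclass

# Hypothetical encoding of the directionality parameter assumed in the text:
# French assigns Nominative leftward only; English in either direction.
NOMINATIVE_DIRECTIONS = {
    "French": {"left"},
    "English": {"left", "right"},
}

# Hypothetical encoding of whether subject pronouns can incorporate (cliticize).
SUBJECT_PRONOUNS_INCORPORATE = {"French": True, "English": False}


@dataclass
class Subject:
    form: str
    pronominal: bool


def licensing_mode(language, subject, infl_in_c0):
    """Return how a subject in Spec-IP is associated with Nominative Case:
    'assignment', 'incorporation', or None (a Case Filter violation)."""
    # With I0 in situ the subject is to its left; once I0 has raised to C0,
    # the subject in Spec-IP is to its right.
    direction = "right" if infl_in_c0 else "left"
    if direction in NOMINATIVE_DIRECTIONS[language]:
        return "assignment"
    # Assignment is blocked; incorporation into the verb in C0 is the only
    # remaining option, and only pronouns of the right kind can take it.
    if infl_in_c0 and subject.pronominal and SUBJECT_PRONOUNS_INCORPORATE[language]:
        return "incorporation"
    return None


jean = Subject("Jean", pronominal=False)
il = Subject("il", pronominal=True)
john = Subject("John", pronominal=False)

print(licensing_mode("French", jean, infl_in_c0=True))   # (9b) *A Jean parlé?
print(licensing_mode("French", il, infl_in_c0=True))     # (10b) A-t-il parlé?
print(licensing_mode("English", john, infl_in_c0=True))  # (9a) Has John spoken?
```

Under these assumptions the function returns no licensing mode for (9b), incorporation for (10b), and assignment for (9a), which is the pattern the text describes.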

3. The Problem of Two Subjects

The existence of two apparent subjects in complex inversion constructions poses three problems. The first of these we call the “Case problem”: how are the two subjects assigned Case? The second problem is the “source problem”: where do these two subjects originate? The third problem is the “landing-site problem”: where do these subjects, in particular the NP, appear at S-structure? In this section, we will answer each of these questions in turn, thereby arriving at an analysis of complex inversion.

3.1 The Case Problem

It is implicit in most versions of Case Theory and explicit in some (e.g., Vergnaud 1985) that there is a biunique relation between Case assigners and Case assignees. If this is so, the Case problem can be put as follows: how do both the full NP and the clitic satisfy the requirements of Case Theory in complex inversion? We will show that the analysis of subject-clitic inversion given in the previous section provides an automatic solution to this problem.

Before presenting our analysis, we must make a preliminary assumption concerning the position of the full subject NP, a matter we will elaborate on below. For the moment, we simply recast Kayne’s (1983) proposal in terms of the assumptions about X-bar Theory of Chomsky (1986b). As the NP apparently occupies a position immediately to the right of Spec-CP and immediately to the left of C0, we take it that this NP is left-adjoined to C′. The complete structure is thus the following:

(13) [CP wh [C′ NP [C′ [C0 I0-Cl] IP]]]

In this structure the NP is governed by I0 and is to the left of it. Therefore it is assigned Nominative Case from right to left, in the usual way operative in simple declarative clauses (and, presumably, the two elements are in a configuration sufficiently close to Spec-head agreement, if the proposal in n. 4 is to be adopted). As for the clitic, we have seen that it cannot be assigned Case in the usual way because it is “on the wrong side” of I0, and need not be assigned a Case because it is associated with a Case feature by incorporation. The Case requirements of the two nominals are thus satisfied independently of each other. This account is not incompatible with the idea that there is a bi-unique relation between Case features and nominals; the bi-uniqueness condition is relativized to modes of association of Case features with nominals, in that assignment of a Case to a nominal is subject to bi-uniqueness, as is association of a nominal with a Case feature by incorporation. However, the two modes of association can independently associate a single Case feature with two nominals.6

This account allows us to see why complex inversion is impossible in English:

(14) *Which books John has he read?

Here I0 in C0 could assign Nominative Case either leftwards or rightwards, but not to both nominals at the same time. Since English subject pronouns never undergo incorporation, he cannot incorporate into C0, so this means of satisfying the requirements of Case Theory is unavailable. Hence there is no way that the requirements of Case Theory can be satisfied in (14). This analysis retains the idea of Kayne (1972) that the possibility of complex inversion in French is a consequence of the existence of subject clitics in this language.
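The relativized bi-uniqueness condition can be illustrated with a short Python sketch. This is a simplification for exposition only: the function name `license_subjects` and its arguments are hypothetical, and the sketch assumes the configuration in (13), with the inflected verb in C0 carrying a single Nominative feature and the full NP in a position where assignment can reach it.

```python
def license_subjects(nominals, pronouns_incorporate):
    """Try to associate every nominal with the single Nominative feature.

    nominals: list of (form, is_pronoun) pairs.
    pronouns_incorporate: whether subject pronouns of the language can
        incorporate into the inflected verb in C0.
    Returns a dict mapping each form to its mode of association, or None
    if some mode would have to be used twice (bi-uniqueness per mode).
    """
    used_modes = set()
    result = {}
    for form, is_pronoun in nominals:
        if is_pronoun and pronouns_incorporate:
            mode = "incorporation"
        else:
            mode = "assignment"
        if mode in used_modes:
            # Two nominals competing for the same mode of association.
            return None
        used_modes.add(mode)
        result[form] = mode
    return result


# French complex inversion, as in (1): Quel livre Jean a-t-il lu?
print(license_subjects([("Jean", False), ("il", True)], pronouns_incorporate=True))

# English (14): *Which books John has he read?
print(license_subjects([("John", False), ("he", True)], pronouns_incorporate=False))
```

In the French case the two nominals use different modes and both are licensed; in the English case both nominals would need assignment, so no licensing is found.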
3.2 The Source Problem

The Case problem is just one of the issues raised by the presence of two subjects. Another question which must be answered is: where do the two subjects come from, i.e., which positions are they base-generated in? We begin by giving a brief summary of Kayne’s (1983) answer to this question. In Kayne’s terms, the derivation of a complex inversion structure is as follows (we alter the category labels so as to accord with Chomsky 1986b):

(15) a. [CP [IP Jean a mangé]]
b. [CP a [IP Jean t mangé]]
c. [CP Jean a [IP t t mangé]]
d. [CP Jean a [IP il t mangé]]
e. [CP Jean a-t-il [IP t t mangé]]

The first step is movement of the inflected verb to Comp, deriving (15b) from (15a). Next, the subject left-adjoins to some projection of Comp, giving (15c). Example (15d) is derived by the insertion of an expletive pronoun in subject position. Finally, this pronoun cliticizes leftwards onto the inflected verb in Comp, giving (15e).

This derivation involves two problematic steps. First, strict cyclicity is violated in (15d) and (15e), in that the operations which derive these structures take place in a subdomain of the domain of operations deriving (15c). Such a violation is suspect, even if the Strict Cycle Condition does not itself turn out to be a primitive condition of the theory (see Freidin 1978); why should it hold as a theorem in general but not in this case? Second, a widely accepted if not explicitly formulated assumption concerning lexical insertion is that all phonetically realized material is present at D-Structure (see Burzio 1986). This means that derivational operations can only create traces or fill empty positions by means of movement (they may also possibly delete material). This plausible constraint is violated by the insertion of il in (15d).

It is fairly clear that both of these problems stem from the same cause: the fact that Kayne assumes that there is only one subject position in basic clause structure, at the time an uncontroversial assumption. This is why the same position must be the source of the two subjects, and thus why il must be inserted after the subject position has been vacated, leading to a violation both of strict cyclicity and of the condition on lexical insertion mentioned above.

In the context of recent proposals by a number of authors (Kitagawa 1986; Koopman and Sportiche 1985, 1991; Kuroda 1986; Manzini 1986; Sportiche 1988a; Zagona 1982; and others) regarding the base position of subjects, we can straightforwardly solve the source problem. We adopt a variant of these proposals according to which subjects are base-generated in the Specifier of VP and raise in the course of the derivation to the Specifier of IP. This amounts, in effect, to treating I as a raising trigger. The proposal is illustrated for a simple English sentence in (16):

(16) a. DS: [IP [I′ I0 [VP John [V′ loves Mary]]]]
b. SS: [IP Johni [I′ I0 [VP ti [V′ loves Mary]]]]

As in normal cases of raising, the subject moves to Spec-IP in order to satisfy the requirements of Case Theory at S-Structure. The relevance of this proposal for us is that it makes available two subject positions. We thus propose that the two subjects of the complex inversion construction each occupy one of the two subject positions at D-Structure: the pronoun, which following Kayne we assume to be an expletive,7 occupies Spec-IP and the full NP occupies Spec-VP. The following is the DS representation of an example like (15e):

(17) [IP ili [I′ I0 [VP Jeani [V′ a mangé]]]]

Here the subject argument, Jean, occupies a theta-position, and the expletive pronoun is in a non-theta-position. The Theta Criterion is thus met at D-Structure. In French, the leftmost verbal element must raise to a tensed inflection (cf. Emonds 1978; Pollock 1989), so the following configuration is derived:

(18) [IP il [I′ a [VP Jean [V′ t mangé]]]]

If no further movement takes place, the structure will be ruled out by Case Theory, since, given our assumptions, Jean will be unable to receive a Case here. In fact, if there is no interrogative or adverbial element present that activates the CP-level, this kind of configuration is ruled out by Case Theory. If the CP-level is activated by the presence of some appropriate element, I0-to-C0 movement can legitimately apply, yielding the following configuration:

(19) [CP a [IP il [I′ t [VP Jean [V′ t mangé]]]]]

The pronoun is now able to incorporate with the auxiliary, since the auxiliary c-commands it. Moreover, our assumptions about Case Theory, spelled out in the previous section, mean that the inflected verb still has the capacity of assigning a Nominative Case feature leftwards to an NP which it governs. The NP can then move directly from Spec-VP to a position to the left of the auxiliary where it will be assigned Nominative Case. These operations yield a well-formed complex inversion structure, illustrated in (20):

(20) Jeani [C aj-t-ili [IP ti [I′ tj [VP t′i [V′ tj mangé]]]]]

The structure can only arise where I0 moves to C0, because the environment in which the two subjects are both able to satisfy the requirements of Case Theory depends on the presence of the inflected verb in C0.8

A striking fact about the above derivation is that Jean raises from Spec-VP position to the pre-C0 position, skipping Spec-IP. In this representation, the Caseless trace left in the Spec-VP position, t′i, is not a variable. Moreover, being non-pronominal, we must take it to be an anaphor, analogous to an NP-trace. Thus (20) is analogous in relevant respects to cases of super-raising that have been discussed in the literature (cf. Lasnik 1985; Chomsky 1986b; Baker 1988). In general, super-raising leads to severely ungrammatical sentences, of the type in (21):

(21) *Johni seems that Bill likes ti

Why is it that the application of NP-movement skipping Spec-IP does not lead to ungrammaticality in (20)? There are two issues to be addressed here. The first concerns the Binding Theory, and the second the intersection of the ECP and Theta Theory.

Complex Inversion in French 305 Taking the binding-theoretic question first, the problem is that NP-traces are subject to Principle A of the Binding Theory. This principle requires that anaphors be bound in their binding domain. In (20), the binding domain for t′i is the minimal category containing a governor for t′i and a subject, i.e., IP. Therefore Jean has moved to a position outside the binding domain of its trace in (20). However, the representation in (20) is saved from Principle A by the fact that Jean and il can (and must) have the same index. This ensures that t′i satisfies Principle A, as it is bound by an element which is in its binding domain, namely the trace of il, which occupies Spec-IP. Thus the derivation of (20) violates Principle A of the Binding Theory, but the representation does not. Since, under current assumptions, the binding conditions are checked on representations and not on derivations, (20) does not lead to a violation. It is well known that the Binding Theory is too weak to deal with the whole class of super-raising structures, however. In particular, what we have just said will not distinguish (20) from examples such as (22): (22) *Johni seems that hei likes ti This sentence is very bad, despite the fact that the trace has an antecedent in its binding domain, the coindexed subject he. This leads us to the second issue mentioned above. Under current approaches, (22) is ruled out either as a violation of the ECP (Chomsky 1986b), or as a violation of Theta Theory (Rizzi 1990). Both accounts have in common that a crucial antecedent- government relation fails to obtain. We will develop here the theta-theoretic approach. In general, arguments in non-theta-positions must be connected to their theta-positions through chain formation. The basic condition on chain formation is that each element in a chain antecedent governs the next (see Chomsky 1986b). 
Moreover, well-formed theta-chains must preserve the bi-uniqueness condition imposed by the Theta Criterion in that they can contain exactly one argument, and can be assigned exactly one thetarole. Structures such as (22) violate this condition in that the only chains that would satisfy the Theta Criterion violate the antecedent-government requirement. In particular, no chain can unite the NP-trace and John. So (22) is ruled out ultimately by Theta Theory. If (22) is ruled out in this way, why is (20) grammatical? Being basegenerated in a non-theta-position, il is an expletive in (20), so that the chain (Jean, il, t, t’) contains exactly one argument. Moreover, this chain is well formed with respect to the antecedent-government requirement since each member antecedent-governs the next. Hence the Theta Criterion is satisfied here.9 To summarize, we propose that the D-Structure for complex inversion is as in (17) and the S-Structure as in (20). The derivation involves several types of movement: head-to-head movement of a, cliticization of il to a and NP-movement of Jean. All of these movements take place so that the two

subjects are able to satisfy the requirements of Case Theory, outlined in the previous section. Movement of the inflected verb to C0 is a necessary precondition for the satisfaction of these requirements, so that this approach explains why complex inversion can only occur in interrogatives or other constructions activating the CP-level. Raising the NP subject from Spec-VP to a position in C does not violate either the Binding Theory or other conditions on chains, despite being derivationally close to super-raising, because unlike other cases of super-raising, the NP moved across the subject lands within the same clause and the antecedent-government requirement on each link of the chain can be met.

3.3 The Landing-Site Problem

Two questions fall under the landing-site problem: (i) what is the structure of the sequence WH NP V-Cl? and (ii) how is the unique well-formed order to be guaranteed? Above we proposed that the natural updating of Kayne’s analysis would posit that the full NP subject occupies a position left-adjoined to C′. On this proposal, there is only one CP, whose Specifier is occupied by the wh-phrase, whose head is occupied by V-Cl, and the subject NP is left-adjoined to C′, as in (23):

(23) [CP wh [C′ NP [C′[C V-Cl] IP]]]

This analysis violates a putative constraint on adjunction, i.e., Chomsky’s (1986b) proposal that maximal projections can only be adjoined to other maximal projections. If the proposal in (23) is correct, Chomsky’s constraint should be weakened so as to allow adjunction of non-heads to non-heads. This would maintain the important restriction that non-heads cannot be adjoined to heads, and heads cannot be adjoined to non-heads. It is nevertheless worthwhile to explore some alternatives, although we shall tentatively conclude that the structure in (23) is to be kept.
One alternative is immediately suggested by the guiding intuition behind the proposals made in the previous section for the underlying structure of complex inversion, i.e., that the construction involves two subjects. Pushing this intuition, we would be led to the conclusion that the NP literally is in a subject position at S-Structure, as well as at D-Structure. This implies that basic clause structure makes available three subject positions, not just two, as we have been assuming up to now: the source position of the NP, the source position of the pronoun, and the landing-site position of the NP. In fact, Pollock (1989) proposes just such a structure for clauses. He argues that, instead of considering there to be a single node Infl containing two kinds of features, Tense and Agr, these two elements should be treated as heading their own maximal projections. This proposal, motivated by

facts from Verb Raising in French, leads to a considerably more articulated structure for the clause, namely that illustrated in (24):10

(24) [AgrP Spec-Agr [Agr′ Agr [TP Spec-T [T′ T [VP Spec-V [V′ V NP]]]]]]
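As an illustrative aside, the clause structure in (24) can be encoded as nested data so that the specifier positions it makes available can be listed mechanically. This is a minimal Python sketch, not part of the original analysis: the tuple encoding, the name `CLAUSE`, and the functions `to_brackets` and `specifiers` are assumptions introduced here, and Spec-V is assumed to precede V′.

```python
# Hypothetical encoding of (24): (label, child, child, ...); leaves are strings.
CLAUSE = ("AgrP", "Spec-Agr",
          ("Agr′", "Agr",
           ("TP", "Spec-T",
            ("T′", "T",
             ("VP", "Spec-V",
              ("V′", "V", "NP"))))))


def to_brackets(node):
    """Render a nested tuple as a labelled bracketing."""
    if isinstance(node, str):
        return node
    label, *children = node
    return "[" + label + " " + " ".join(to_brackets(c) for c in children) + "]"


def specifiers(node):
    """Collect the Spec positions, top-down; these are the candidate subject positions."""
    if isinstance(node, str):
        return [node] if node.startswith("Spec-") else []
    _, *children = node
    return [spec for child in children for spec in specifiers(child)]


print(to_brackets(CLAUSE))
print(specifiers(CLAUSE))
```

Listing the specifiers this way simply makes explicit that an articulated clause of this shape offers three Spec positions rather than two.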

This structure in principle makes available three subject positions, all of which we could exploit in the following representation for complex inversion:

(25) [AgrP [Spec-Agr Jeanj] [Agr′ [Agr ak-t-ilj] [TP [Spec-T tj] [T′ [T tk] [VP [Spec-V tj] [V′ mangé]]]]]]

In the D-Structure representation Jean occupies Spec-VP and il Spec-TP. The auxiliary raises to Agr0; the pronoun incorporates into Agr0; and Jean moves to Spec-AgrP. The main point in favour of this structure is that it provides a clear and simple solution to the landing-site problem by making available a sufficient number of structural positions. However, adopting this structure poses several problems in other areas. The basic problem is that the CP-level plays no role in (25). This means on the one hand that there is no obvious way to state the fact that complex inversion is characteristic of interrogatives. Nothing prevents the generation of sentences exactly like those in (25) as declaratives. Although a sentence such as Jean a-t-il mangé is grammatical in French, it must be understood as a question. This is clearly a fact that our analysis must capture, but which the proposal in (25) does not naturally deal with. Moreover, the fact that the

CP-level plays no role in (25) means that it is hard to see how this approach can provide an account of the root nature of complex inversion (see section 4 on this). More seriously, we would be left without an account of the fact that complex inversion is incompatible with the presence of an overt C0, as in:

(26) a. Peut-être Jean est-il parti.
        Perhaps John has he left.
     b. Peut-être que Jean est parti.
        Perhaps that John has left.
     c. *Peut-être que Jean est-il parti.
        Perhaps that John has he left.

The same observation holds for the Québecois phenomenon mentioned above (examples from Safir 1981/82: 461–462; thanks to Maria Teresa Guasti for drawing our attention to this fact):

(27) a. Quoi que Jean veut?
        What that John wants?
     b. *Quoi que Jean a-t-il voulu?
        What that John has he wanted?

If complex inversion involves movement of the inflected verb to C0 these paradigms are immediately accounted for, on a par with the simple subject-clitic inversion cases discussed earlier (see examples (6) and (7)). But given a structural representation such as (25), the gaps in the paradigms remain mysterious. All these problems clearly stem from the fact that movement to C0 is not involved in this analysis. We therefore reject the proposal in (25). In particular we will not assume that V-Cl is in Agr0, but in C0, as the evidence reviewed forcefully argues.11

A less radical alternative to C′-adjunction is CP-adjunction of the wh-phrase and structure-preserving movement of the subject NP to Spec-CP. This would give the structure in (28):

(28) [CP wh [CP NP [C′[C0 V-Cl] IP]]]

The order wh NP V-Cl would then involve assuming wh-adjunction to CP rather than NP-adjunction to C′, an assumption that avoids the technical problem mentioned in connection with (23). However, the structure in (28) poses some problems of its own.
These arise in part because it implies that wh-movement in the syntax can have a landing site which is not the typical position of wh-operators, the Spec of Comp, and in part because it involves movement of a non-operator, the subject NP, to an operator position. The second option raises the possibility of non-operator movement to Spec-CP in general, which would lead us to

expect generalized Verb Second (V2), a phenomenon not found in (Modern) French. The first option raises the question of what prevents iteration of the wh-adjunction, or the combination of wh-movement to Spec-CP and wh-adjunction to CP. This would give rise to clearly ungrammatical sentences such as the following:

(29) *Où quels livres Jean a-t-il trouvés?
     Where which books John has he found?

For these reasons, we maintain the analysis shown in (23), involving C′-adjunction of NP, and wh-movement to Spec-CP in wh-questions (or the presence of a null operator in this position in yes/no questions).12

Returning then to the structure in (23), it is important to see how a theory allowing C′-adjunction necessarily only gives rise to the order of elements found in complex inversion. Taking an example where a wh-phrase is present, there are four logical possibilities to be considered:

(30) a. [CP wh [C′ NP
     b. [CP wh [C′ wh
     c. [CP NP [C′ NP
     d. [CP NP [C′ wh

Clearly, all of these possibilities, except (30a), must be excluded. Example (30b) violates the constraint on the distribution of wh-elements in French which requires that they be either in operator position (i.e., Spec-CP) or in situ at S-Structure. Example (30c) is ruled out because a non-operator, NP, occupies an operator position, namely Spec-CP (in a non-V2 language). Finally, (30d) is ruled out for both of these reasons. We must also rule out the possibility of C′-adjunction of a non-subject in (30a), as well as multiple C′-adjunction. Following the Principle of Full Interpretation of Chomsky (1986a), we take it that an element occurring in a given position at LF must be licensed in that position by an interpretation.
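The exclusion of (30b) to (30d) can be sketched as a small filter over the four combinations. The Python below is only an illustration under stated assumptions: the string labels, the function name `violations`, and the wording of the two reasons are introduced here and simply paraphrase the two constraints cited in the text.

```python
from itertools import product

# Hypothetical labels for what may occupy Spec-CP and the C'-adjoined position.
FILLERS = ("wh", "NP")


def violations(spec_cp, c_bar_adjunct):
    """Return the reasons for which a (Spec-CP, C'-adjunct) pair is excluded."""
    reasons = []
    if c_bar_adjunct == "wh":
        # A moved wh-phrase must be in Spec-CP (or remain in situ).
        reasons.append("wh-phrase outside operator position")
    if spec_cp == "NP":
        # Spec-CP is an operator position in a non-V2 language.
        reasons.append("non-operator in Spec-CP")
    return reasons


candidates = {pair: violations(*pair) for pair in product(FILLERS, repeat=2)}
licit = [pair for pair, why in candidates.items() if not why]

for pair, why in candidates.items():
    print(pair, "OK" if not why else "; ".join(why))
```

Only the pair with wh in Spec-CP and NP adjoined to C′ survives, which corresponds to (30a); the pair corresponding to (30d) collects both reasons.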
As the C′-adjoined position is neither an operator position nor an argument position (nor a left-dislocation position, a position whose content is presumably licensed at LF by a rule of predication), an element occupying this position at LF can only be licensed by being in a well-formed theta-chain. The formation of a well-formed chain from this position is impossible for non-subject NPs, because the subject in Spec-IP will block chain-formation with any position it c-commands, since it will block antecedent-government of any such position.13,14 Thus the only way of licensing the C′-adjoined NP at LF is by linking it to a trace in subject (i.e., Spec-IP) position. Therefore the only possible candidate for C′-adjunction is the subject NP itself. The C′-adjunction option thus does not give rise to overgeneration.

The above approach to the landing-site problem has the advantage that it allows us to deal with two other properties of complex inversion noted

by Kayne. First, the construction does not allow questioning of the subject itself:

(31) *Qui est-il parti?
     Who did he leave?

Second, complex inversion is incompatible with stylistic inversion:

(32) a. Où Jean est-il allé?
        Where John has he gone?
     b. Où est allé Jean?
        Where has gone John?
     c. *Où est-il allé Jean?
        Where is he gone John?

According to the approach to the landing-site problem advocated above, the representations for the relevant parts of these examples would be as follows:

(33) a. [CP Qui [C′ t [ est-il [ t′ [ t′′ parti ]]]]]
     b. [CP Où [C′ pro [ est-il [ t′ [ t′′ allé ] Jean ]]]]

Following Kayne (1983), we can straightforwardly account for the ill-formedness of these two representations by exploiting the fact that the crucial empty category is in an adjoined, hence A′, position. Consider first (33a). Here t does not qualify as a variable because it is in an A′-position; t′ does not qualify either, since it has the status of an incorporation trace (a status that we assume to be incompatible with the status of syntactic variable); and t′′, the trace in the base position of the subject, cannot be a variable because it is in a Caseless position. Hence there is no syntactic variable that the operator can bind, and so the structure is ruled out by the general ban on vacuous quantification. Next, consider (33b). We assume that stylistic inversion involves a pro subject licensed by a C0 under certain conditions (as Pollock 1986 suggests for some cases). Recall that pro is really an abbreviation for the feature matrix [−anaphoric, +pronominal]. It is natural to assume that these features only classify empty categories in A-positions; in fact, the only distinction that is needed in A′-positions is that between intermediate traces and empty operators, a distinction that is not properly captured by the features [±anaphoric, ±pronominal]. Hence the empty category occupying the C′-adjoined position in (33b) cannot be pro.
If pro is a necessary component of stylistic inversion, (33b) will be ill-formed.15,16 Notice that the approach to the landing-site problem based on the representation in (25) is unable to account for the facts in (33) in an equally straightforward way, because in that approach the crucial empty category would be in an A-position non-distinct from an ordinary subject position.17

4. Root Phenomena

A salient property of complex inversion is the fact that it is limited to root clauses, as the ungrammaticality of (34) shows:

(34) *Je me demande qui Jean a-t-il vu.
     I wonder who John has he seen.

In this section we will propose an account of this restriction, which we phrase in the context of a general approach to the nature of root phenomena. The root character of complex inversion is undoubtedly to be related to the root character of one component of the construction, namely subject-clitic inversion:

(35) *Je me demande qui a-t-il vu.
     I wonder who has he seen.

Both (34) and (35) appear to conform to a fundamental generalization concerning root phenomena: movement of the inflected verb to C0 is by and large restricted to main clauses. This rough generalization subsumes, in addition to the French constructions, subject-aux inversion in English and the main types of V2 in other Germanic languages (cf. den Besten 1983).

The account we want to propose relies on the idea that the correct distinction is not main vs. embedded clause, but rather selected vs. non-selected clause (see Kayne 1982). A quick survey of the relevant cases supports this hypothesis. In the first instance, we should separate independent CPs from subject, complement and adjunct CPs; the former allow verb-movement to C0 while the latter do not. It is clearly true that independent CPs are not selected, and it follows from the Projection Principle, in conjunction with the Theta Criterion, that both complement and subject CPs must be selected. This leaves adjunct CPs. In typical adjuncts, for example the kind which can host a parasitic gap, CP is selected by a Preposition (in English, without, before, in order, etc.). Thus the whole adjunct is a PP containing a CP selected by the Preposition in such cases.
There is, however, one class of adjunct CPs which provides evidence that the correct generalization regarding the possibility of inversion concerns the selected/non-selected distinction rather than the main/embedded distinction, namely the class of conditional protases (see Kayne 1982). Conditionals are embedded adjuncts, and they are also not selected. As (36) shows, they optionally allow inversion:

(36) a. Had I the time, I’d help you.
     b. Aurais-je le temps, je vous aiderais.

Putting these observations together with previous remarks on the incompatibility of inversion with a filled complementizer (cf. (8c) and English *If had I the time . . .), the following generalization emerges:

(37) Inversion is possible only if
     (i) CP is not selected, and
     (ii) C0 is not filled.
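Generalization (37) amounts to a conjunction of two negated conditions, which can be sketched as a one-line predicate. The Python below is an illustration only: the function name `inversion_possible` and its boolean parameters are assumptions made here, and the predicate states only the necessary condition in (37), since the text notes that inversion is optional where it is permitted.

```python
def inversion_possible(cp_selected: bool, c0_filled: bool) -> bool:
    """Necessary condition (37): CP is not selected and C0 is not filled."""
    return not cp_selected and not c0_filled


# Cases discussed in the text, encoded under the assumptions above.
cases = {
    "root question": (False, False),
    "conditional protasis, cf. (36)": (False, False),
    "indirect question, cf. (34)": (True, False),
    "unselected clause with overt C0, cf. (26c)": (False, True),
}

for label, (selected, filled) in cases.items():
    print(label, "->", inversion_possible(selected, filled))
```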

In most cases the two conditions overlap, for example in embedded that-clauses, but there are cases both of unselected clauses with a filled C0 that block inversion (cf. (6c-8c), (26c), (27b)), and of selected clauses where C0 is empty and inversion is blocked (e.g., (34) and (35)). We have already seen that condition (ii) of (37) follows directly from the idea that inversion involves movement of the inflected verb into C0: if C0 is filled movement cannot take place. The main topic of this section will be to explain what underlies condition (i) of (37). One possible approach would be to try to reduce (i) to (ii) by assuming that a selected C0 is always filled in the relevant sense. This is not implausible in the case of indirect questions such as (34) and (35), as here we could claim that C0 is filled by the feature [+wh] selected by the main predicate, and hence is not available as a landing site for movement. However, the drawback to this approach is that there is no good way to ensure that all selected CPs have a filled C0, especially in cases where C0 is phonetically null. For this reason, we will explore a more principled approach.

We will claim that condition (i) of (37) derives from the Projection Principle. The Projection Principle requires that selectional properties be satisfied at all levels of syntactic representation. This requirement extends to categorial selectional properties, thereby imposing a strong structure-preservation constraint on all selected contexts. We will propose that I0-to-C0 movement or, more precisely, the instances of this process that concern us here, does not preserve the structure in the strong sense required by the Projection Principle, and so is banned in all selected contexts. To show how this idea can work, we must first introduce some assumptions concerning the nature of head-to-head movement.
We further constrain the approach of Baker (1988: 59) by assuming that head-to-head movement is always and only substitution of a head into another head position. In other words, we restrict the adjunction option to maximal projections (but see n. 18). In cases where incorporation results in a visible amalgam of the two heads, e.g., standard cases of Noun incorporation or V-to-I movement where V picks up tense and agreement marking, we assume that the incorporation host morphologically subcategorizes for the incorporee, hence a structural slot is created for the incorporee at D-Structure as a function of the lexical properties of the incorporation host (cf. Lieber 1980, on morphological subcategorization). So (tensed) I0 in a language like French has the subcategorization frame [+V0—], an incorporating V0 in Mohawk has the feature [+N0—], and so on. In general, where

an incorporation trigger X has the feature [+ Y0—], this means that the slot for Y0 is base-generated within X0, triggering substitution of Y0 during the derivation, leading to the creation of a complex head with the government and Case-marking properties discussed at length by Baker (1988, ch. 2). With this kind of incorporation, the head of the complex formed by incorporation remains X0, the incorporation trigger.18 Of course, nothing prevents an incorporation host of this kind from being selected by a higher head. Since incorporation does not alter categorial status, no problem is posed for the Projection Principle. Consider, for instance, Noun incorporation in an incorporating language. In such cases, the Verb has the morphological subcategorization feature [+N0—], creating a slot into which the Noun can be substituted.

(38) [IP NP [I′ I0 [VP [V0 [N0 e] [V0 [+N0—]]] [NP N0]]]]

In (38), Noun incorporation is strongly structure-preserving, in the sense that it moves N0 to a pre-existing slot and it does not change categories; the verb does not become a noun. If I0 selects a V-projection (cf. Chomsky 1986b), the Projection Principle is not violated since the complex head resulting from incorporation remains a verb at S-Structure. On the other hand, if the potential host does not provide a structural slot via morphological subcategorization, adjunction of heads being excluded (or limited to cliticization; see n. 18), the only way for a lower head to incorporate is by direct substitution into the host head. Of course, in most cases this operation will be excluded by the Recoverability Principle, the content of the host head being nonrecoverably erased. There is one case, though, in which recoverability is not violated: this is when the host head is radically empty, hence there is no content to recover. Our claim is that this is precisely what happens in the familiar cases of I0-to-C0 movement. This gives rise to a structure such as (39):

(39) a. [CP [C′ [C0 e] [IP I0]]] ⇒ b. [CP [C′ [C0 I0] [IP t]]]

Let us see how (39b) can be ruled out in selected contexts. We maintain the standard assumption that selection involves properties of heads. If CP is selected in (39b), then there is a higher selecting head requiring that its complement’s head be C0. This lexical requirement is met at D-Structure but not at S-Structure where the phrase’s head is a C0 and an I0 (under the standard definition of the “is-a” relation). So (39b), in a selected context, is ruled out by the Projection Principle.19 We thus derive condition (i) of (37).

This approach has a number of significant consequences. First, we account for the fact that V0-to-I0 movement is typically not restricted to unselected domains, while I0-to-C0 movement typically is.20 In our system, this difference follows from the fact that V0 to I0 is usually an instance of the first type of incorporation described above, i.e., that which is triggered by a morphological subcategorization feature of an agreement or tense affix. In this case, the categorial status of the host head is not affected, and even if I0 were selected by C0 (which it may or may not be) there would be no Projection Principle violation. This is why V0-to-I0 movement systematically differs from I0-to-C0 movement across languages.

The second consequence is that I0 to C0 is not necessarily excluded in all selected environments. If C0 has the relevant morphological subcategorization feature, movement of I0 to C0 would not involve substitution for C0 and would not violate the Projection Principle. This appears to be the case in the instances of I0-to-C0 movement attested in the Romance languages: Aux-to-Comp in Italian and the corresponding structure in inflected infinitives in Portuguese (cf. Rizzi 1982, Chs. 3 and 4; Raposo 1987). The Portuguese case is particularly telling: the construction involves an inflected verbal element in C0 position in various kinds of infinitival complements, as in (40) (from Raposo 1987: 98):

(40) O Manel pensa terem os amigos t levado o livro.
Manel thinks to-have-agr the friends taken the book ‘Manel thinks that the friends have taken the book.’ As this option is lexically selected (e.g., epistemic verbs allow it but volition verbs do not), a natural way to express this restriction is to say that epistemic verbs but not volition verbs select an embedded C0 with an agreement morpheme, which in turn morphologically subcategorizes an I0 slot. Then movement of the inflected auxiliary to C0 does not involve substitution for C0 itself, and no problems arise with the Projection Principle. So this kind of I0-to-C0 movement is allowed to apply in complement and other embedded contexts.21 To summarize, in this section we have proposed that the generalization underlying the restriction of complex inversion and subject-clitic inversion (and, more generally, I0-to-C0 phenomena) to root contexts is (37). The second part of this generalization follows straightforwardly from the very idea that these processes involve I0-to-C0 movement. We proposed that the first part is derived from the Projection Principle, once certain refinements are added to Baker’s theory of head-to-head movement.22
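The interaction between the two kinds of substitution and the Projection Principle, as summarized above, can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class `Host`, its fields, and the functions `head_after_incorporation` and `movement_allowed` are hypothetical names introduced here, and category labels are plain strings.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Host:
    category: str   # category of the host head, e.g. "C0" or "I0"
    has_slot: bool  # host morphologically subcategorizes a slot for the incorporee
    is_empty: bool  # host head is radically empty (nothing to recover)


def head_after_incorporation(host: Host, incorporee: str) -> Optional[Set[str]]:
    """Categories heading the phrase after movement, or None if movement is blocked."""
    if host.has_slot:
        return {host.category}              # substitution into a slot: host stays the head
    if host.is_empty:
        return {host.category, incorporee}  # substitution for the host: double head
    return None                             # would violate recoverability


def movement_allowed(host: Host, incorporee: str, selected: bool) -> bool:
    head = head_after_incorporation(host, incorporee)
    if head is None:
        return False
    # Under selection, the head must still be exactly what the selector requires.
    return head == {host.category} if selected else True


root_c0 = Host("C0", has_slot=False, is_empty=True)
print(movement_allowed(root_c0, "I0", selected=False))
print(movement_allowed(root_c0, "I0", selected=True))
```

In this encoding a C0 with an agreement slot, as assumed in the text for the Portuguese case, would be `Host("C0", has_slot=True, is_empty=False)`, for which movement is allowed even when the clause is selected.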


5. Conclusion

The analysis of complex inversion that we have proposed integrates a number of strands: the basic insights of Kayne’s (1983) analysis, Chomsky’s (1986b) extension of X-bar Theory, Baker’s (1988) theory of head-to-head movement and the more elaborated proposals for the structure of clauses that have been made recently. We have shown how these strands can be drawn together so as to give a fairly complete analysis of complex inversion. Moreover, the analysis has led to a number of theoretical proposals; in particular, we have refined the theory of head-to-head movement by proposing that such movement is always substitution (perhaps with adjunction limited to cases of cliticization). Substitution can be into a slot provided by the morphological subcategorization of the host, or directly into the host head when the latter is empty. The second kind is properly restricted to root environments by a strict interpretation of the Projection Principle.

APPENDIX I
Embedded Subject-Aux Inversion in English

Embedded Subject-Aux Inversion (SAI) is never found in indirect questions in English (*John wonders should he go to the store). However, SAI can be triggered by certain negative adverbials:

(41) a. Never in my life have I been so insulted!
     b. Only in America could you get away with that.

In certain embedded contexts, sentences of the type in (42) are possible (cf. Kayne 1982, 1983):

(42) He said that under no circumstances would he do it.

Two properties characterize this construction. First, that cannot be deleted:

(43) ?*He said under no circumstances would he do it.

Second, the complement is a weak island:

(44) ?*What did he say that under no circumstances would he do?

If we maintain that this type of inversion is an instance of I0-to-C0 movement, as is clearly shown by the impossibility of SAI where if is present (see above), we have no alternative other than to treat these cases as instances of CP-recursion.23 We propose, therefore, that that has the marked property in English of selecting CP. Thus, if that is not present, a structure such as (43)

can involve only one CP, where I0 to C0 is excluded for the reasons we have presented. That this option is by and large restricted to that is shown by the deviance of recursion with other choices of C0. For example, the structure is impossible with a [+wh] C0:

(45) *I wonder if/whether under no/any circumstances would John do that.

The islandhood of these complements is explained by the CP-recursion idea, as the embedded clause in (44) would have a representation such as the following:

(46) [CP t that [CP under no circumstances [C′ would [IP he t do t ]]]]

Extraction of the object in (46) would cross the lower tensed CP, which, in the system of Chomsky (1986b), has bounding properties akin to those of a standard wh-island since its Specifier is filled by the negative adverbial.

APPENDIX II
On the Landing-Site Problem

The approach to head-to-head movement developed in section 4 allows us to elaborate a more principled solution to the landing-site problem of complex inversion, which dispenses with the ad hoc step of C′ adjunction (cf. section 3.3).24 The background is provided by the uncontroversial assumption that different kinds of heads license different kinds of specifiers: I0 licenses an A-specifier, C0 licenses an A′-specifier, and so on. Let us now take seriously the idea, formulated in section 4, that the result of inversion is a clause headed by C0 and by I0. In that case, two specifier positions can be licensed: the typical specifier of C0, the landing site for wh-movement, and the typical specifier of I0, a subject position. Both positions are used in complex inversion:

(47) Où Jean [[est-il] [t t allé t]]

If we look at the problem derivationally, as we have done throughout the paper, we can simply assume that, when the new head is created by I0-to-C0 movement, the extra specifier position is automatically provided and made available for the lower subject to move into.
Notice that this option never arises in cases involving incorporation qua substitution for a slot created via morphological subcategorization by the host head (that is, V-to-I movement does not create an extra position within IP corresponding to the V-specifier): in such cases the host head remains the only head of the construction after

incorporation, and so no additional Spec position can be licensed. Only in the case in which incorporation involves substitution for the host head, i.e., I0-to-C0 movement in root contexts, does the construction involve a genuine double head, and therefore a double specifier can be allowed. Moreover, this option is excluded in a language lacking subject clitics, such as English, for Case-theoretic reasons, as before (I has only one Case to assign, and so cannot Case-mark both its newly created specifier and the original specifier). The fact that the two specifiers are strictly ordered can now be related to the fact that a Case relation is involved only with one specifier: in (47), Jean must be adjacent (in the appropriate sense) to the head that assigns Case to it, hence où cannot intervene.

The C′-adjunction solution made crucial use of the A′ status of the adjoined position to account for the incompatibility of complex inversion with wh-movement of the subject and stylistic inversion:

(48) *Qui t est-il venu?
(49) *Où pro est-il allé Jean?

This solution is no longer available within the more principled analysis that we are now adopting: if the NP position preceding the inflected verb is a legitimate I0-specifier, then it is an A-position, and (48) and (49) cannot be excluded as before because of the illicit A′-status of the variable and pro. A different approach is in order. Concerning (48), Marc-Ariel Friedemann (personal communication) pointed out to us that this structure is independently ruled out by the ECP within the system of Relativized Minimality (Rizzi 1990), regardless of the A- or A′-status of the trace.
In this system, traces must be properly head-governed, a requirement that is fulfilled for a subject trace in languages such as English or French by a C0 agreeing with its Spec:

(50) Qui C0 [ t est venu ]
(51) Who C0 [ t left ]

In (48) no such proper head governor can be provided for the trace of qui, as C0 containing I0 is on the wrong side of the trace, hence the structure is ruled out by the ECP. As for (49), we can now elaborate on Sportiche’s (1988b) approach to Case Theory presented in n. 4. If Case can be assigned under strict government or agreement, the choice of mode of assignment for each specific instance of Case being a parameter, then it is reasonable to look at the licensing of pro along the same lines. So, pro can be licensed under agreement from its licensing head (as is the case for subject pro in Italian) or under strict government (as is the case for object pro in Italian; cf. Rizzi 1986a). It appears that the non-argument pro responsible for stylistic inversion in

French is licensed under strict government from C0 (when additional conditions are met):

(52) Le jour [ où C0 [ pro est venu Jean ]]
     The day when came John

But then pro cannot be licensed in a structure such as (49) where it would be, if anything, in an agreement configuration with the appropriate head, and would not be strictly governed by it. The important facts illustrated by (48) and (49) can thus be naturally reconciled with our more principled approach to the landing-site problem.

Notes

1. Thanks to Adriana Belletti, Anna Cardinaletti, and the audience at the Séminaire interdépartemental de recherche linguistique at the University of Geneva for their comments on an earlier version of this material.
2. If subject pronouns occur in NP position in French, then a sentence such as:
(i) Marie, elle parle toujours.
    Mary, she speaks always.

must involve left-dislocation. This is supported by the fact that quantified NPs, generally excluded in cases of left-dislocation (cf. John/*Nobody, he’s a nice guy), are in fact impossible in structures of this kind: (ii) *Personne, il n’ est venu. No-one, he came.

The corresponding case is possible in various northern Italian dialects: (iii) Gnun l’a dit gnent. (Piedmontese) No-one he has said nothing. ‘No-one said anything.’

This is expected: if the clitic is under Infl in (iii), gnun can appear in subject position, where quantified NPs are generally allowed to occur. See Rizzi (1986b) for a detailed presentation of this argument. See also Renzi (1987) and Roberge (1986) for examples showing that certain dialectal varieties of French pattern with northern Italian dialects in this respect.
3. Pollock (1989), following Emonds (1978), shows that in French the leftmost verbal element must raise to I0 in tensed clauses. Such verb raising is impossible in (Modern) English for non-auxiliary verbs.
4. Alternatively, we could adopt the approach developed by Sportiche (1988b) (and also suggested by Jaeggli, personal communication) according to which Case can be assigned in one of two fundamentally different ways: either via government (defined in terms of strict c-command) or via Spec-head agreement. So, Objective and Oblique Cases are generally assigned via government by V or P, while Nominative Case is assigned via Spec-head agreement with I0 in declarative clauses in English and French (cf. also the earlier suggestion of Belletti and Rizzi 1981: 125). As the mode of assignment for I0 must be subject to parametric variation in this system, one could then claim that I0 can assign Nominative Case both by agreement and by government in English, the latter mode of assignment being relevant in inverted clauses, while it can only assign Case via

Complex Inversion in French 319 agreement in French. I -to-C movement destroys the Spec-head agreement configuration and makes Nominative assignment impossible in French in inverted clauses. One advantage of this approach is that it is relatively easy to see why a V0 which has been raised to I0 (or C0) may still assign Case to its object, while an I0 which has been raised to C0 has its Case-assignment capacity inhibited, as in French (this issue was raised by Alessandra Tomaselli, personal communication): a raised V0 still governs its object via Baker’s (1988, ch. 2) Government Transparency Corollary, while a raised I0 is simply no longer in a Spec-head configuration with Spec-IP. Once raised, I0 can only Case-mark Spec-IP by government, an option which is unavailable in French. 5. According to (the obvious updating of) Kayne (1983), the cliticization of the pronominal subject to the inflected verb is allowed to apply in the syntax only when I-to-C movement takes place, as only in this case is the cliticization target higher than the subject pronoun. If the inflected verb does not move, cliticization in the syntax would be downgrading, hence the clitic trace would not be bound by the clitic. The process is then restricted to apply in the phonology in this case. Notice that even if the pronoun is cliticized in the syntax in (12), it still manifests an NP position in that it fills the subject position at D-Structure. 6. Nothing in what we have said rules out the comparable situation with objects, i.e., a structure like complex inversion involving an object pronoun and an object NP. In such a structure, the pronoun could satisfy Case Theory by incorporating with V while the NP is assigned Objective Case under government by V. We suggest that Case Theory actually allows this possibility, but that Theta Theory rules it out since V would have only one object theta-role to assign but two object arguments. 
The basic difference between the hypothetical object case and the attested subject case, then, is that object pronouns cannot be expletives in French (cf. Kayne 1983), while subject pronouns can. If also in River Plate Spanish, Rumanian, etc., object clitics cannot be expletives, as appears to be the case, then object-clitic doubling in these languages must involve the composition of two argument chains, in the sense of Chomsky (1986a), Rizzi (1987a). 7. On the fact that the expletive agrees with the argument here, but not in other constructions, see Kayne (1983:127–129). 8. Generating the pronoun and the NP the other way around in (17), i.e., with il in Spec-VP and Jean in Spec-IP at D-Structure, gives rise to an S-Structure which could satisfy Case Theory without I0-to-C0 movement (the only movement needed would be incorporation of il with the inflected verb in I0). However, in such a sentence Theta Theory would be violated at D-Structure, as the argumental NP occupies a non-theta-position. 9. An example such as (i) is ruled out in English by the antecedent-government condition:


(i) *A man seems that there was killed t.

Here the chain (a man, there, t) is not well-formed because a man does not antecedent-govern there. The difference with the complex inversion example in (20) is that the raised NP antecedent-governs the clitic in (20). Recall that the configuration of (20) is impossible in English for Case reasons, as English pronouns do not incorporate. 10. We follow Belletti (1990) in assuming that AgrP dominates TP, while Pollock proposes that TP dominates AgrP. 11. If, because of its other virtues, we still want to adopt Pollock’s proposed clause structure, we must explain why (24) is not an option for complex inversion. To get this result, it is enough to assume that one of the Spec positions in (24) is either absent or an A′-position, hence not available as the base position for

320 Luigi Rizzi and Ian Roberts il. The most plausible candidate for this is Spec-TP. If Spec-TP is not present, it obviously cannot be occupied by il. If it is present but an A′-position, it could not be the base position of an expletive, since expletives belong to the A-system. So, il would have to be base-generated in the Spec-Agr position, which means that the representation in (25) could not arise since incorporation of il from Spec-Agr to Agr0 would violate the ECP (see Baker 1988). 12. Another possibility which comes to mind is CP-recursion. This means that the structure of complex inversion would be as follows: (i)

[CP1WH [C′1 C10 [CP2 NP [C′2[C20 V1] IP ]]]]

However, this proposal fails to account for nearly all the important properties of complex inversion. In particular, there would be no way to account for the root nature of the phenomenon (CP-recursion, if available, should be possible in both root and embedded contexts). So we reject this possibility. 13. This requires a version of the Relativized Minimality Principle (see Rizzi 1990), according to which subjects block antecedent-government not just in A-chains but in theta-chains, the latter also including some chains headed by an argument in an A′-position (cf. n. 16). The same reasoning extends to the case where the C′-adjoined position is occupied by a predicate or adjunct, assuming that such an element must be connected by a well-formed chain to its canonical functional position, and that the subject (or perhaps the main predicate; see Roberts 1988) is able to block antecedent-government in this case as well. 14. The presence of an object clitic on the verb in C0 (as in *Pourquoi cela l’as-tu dit) does not save the preposed object, because object clitics are unable to be expletives in French (cf. n. 5), therefore a chain including the two arguments cela and le inevitably violates the Theta Criterion here (cf. Kayne 1983:117). 15. The fact that variables are restricted to A-positions is actually a subcase of the restriction of the features [±anaphoric, ±pronominal] to A-positions, under the usual assumption that variables are defined in terms of this feature system. 16. It was proposed in Rizzi (2000) that this approach also gives an account of the fact that pro cannot appear in Spec-CP and thereby fulfill the V2 requirement in German: (i) Gestern wurde pro getanzt. Yesterday was danced. (ii) Es wurde t getanzt. It was danced. (iii) *Pro wurde t getanzt.

There is evidence that the element fulfilling the V2 requirement does not have to be phonetically realized, e.g., the empty operator involved in yes/no questions or the discourse-bound empty operator discussed in Huang (1983) can fulfill the V2 requirement. Thus the phonetic emptiness of Spec-CP is not in itself the cause of the ungrammaticality of (iii). Rather, (iii) is excluded because pro cannot appear in an A′-position such as Spec-CP. 17. We allow the possibility that theta-chains can be headed by A′-positions, as is the case with the theta-chain headed by the subject NP in the C′-adjoined position in complex inversion (other cases would be clitic chains and the chains relating preposed initial arguments to their theta-positions in V2 structures). 18. What is the status of cliticization with respect to our proposals for head-to-head movement? There are two possibilities. On the one hand, we could treat cliticization on a par with Noun incorporation, by taking cliticization hosts to have an appropriate morphological subcategorization frame. For languages such as Romance, which have cliticization but not Noun incorporation, we can make

the required categorial distinction by adopting the proposal made by Baker and Hale (1988) that pronouns are members of the category Determiner (D) (cf. Postal 1966). Cliticization hosts such as Romance Verbs (or perhaps Infl) would then have the specification [+D0 __]. On the other hand, we could distinguish cliticization from other types of affixation by weakening the ban on head adjunction and maintaining that cliticization is the one case of head-to-head movement which involves adjunction rather than substitution. 19. We assume, with Chomsky (1965), that a positive specification of categorial selection in a lexical entry implies a negative value of all the non-occurring specifications. So [+ __ C0] implies, among other things, [− __ I0], whence the desired result. This account further entails that there can be no operation of S′-deletion in the literal sense of elimination of the CP-level. If this were allowed, a predicate which selected CP at D-Structure would select IP at S-Structure and LF in a clear violation of the strong version of the Projection Principle required by our analysis. The obvious alternative is that “S′-deletion” verbs in fact select infinitival IPs at all levels. 20. For example, according to Pollock (1989), V0-to-I0 movement in French takes place in both main and embedded clauses; the same is true for V0-to-I0 movement in Italian (Belletti 1990), Middle English (Roberts 1985 [this volume, Chapter 1]) and Vata (Koopman 1984). 21. There is another class of apparently non-selected CPs, relative clauses, pointed out by Bonnie Schwartz (personal communication). These clauses clearly strongly disallow inversions (*The man who do I know). While it may be possible to claim that restrictive relatives are in fact selected by the Determiner of the head, such an account does not seem viable for appositives, where inversion is equally impossible. This suggests that an extension of our approach is needed. 
The Projection Principle serves to maintain the semantics/syntax correspondence in cases of selection, but there is no doubt that this correspondence must be maintained in other cases too. In particular it is plausible to suggest that the predication function can only be fulfilled by certain categories (see the list given in Williams 1980). In that case, full relative clauses presumably must be CPs at LF in order to be licensed by predication. If this is so, then the same result obtains as in the case of selection: no substitution for C0 would be possible, as the categorial status would be affected, thus preventing predication. The common factor behind relatives and indirect questions is, on this view, the fact that the Projection Principle and other well-formedness conditions on the syntax/semantics interface require that such clauses be projections of C0 alone at the relevant syntactic levels. 22. A problem with this approach is posed by cases of embedded V2 in German. The usual [–wh] complementizer in German is daß. Unlike English that, daß is generally obligatory. Thus a normal case of [–wh] subordination features daß in the embedded C0, with the tensed Verb in final position in the lower clause. However, certain verbs of saying and thinking allow daß to be dropped, and this triggers V2 in the complement CP: (i) a. Ich sagte er hatte meine Frau gesehen. I said he had my wife seen. b. Ich glaube er mag mich nicht. I think he likes me not.

The CPs here are clearly complements to sagen and glauben, respectively. So we are apparently faced with an instance of I to C in a selected context. This phenomenon in fact lends prima facie support to our first suggestion concerning condition (i) of (37), in that we could claim that C0 simply isn’t filled here.

Within the more principled approach involving the Projection Principle, we could explore the possibility that these examples involve incorporation triggered by the morphological subcategorization property of C0, as in the Romance cases discussed earlier. Alternatively, it could be the case that these structures are base-generated in extraposed position, hence the Projection Principle does not directly prevent categorial shift of an element in this position. 23. CP-recursion may also be in order to describe the colloquial varieties of French which allow subject clitic inversion in embedded interrogatives (René Amacker, personal communication). 24. Our proposal is conceptually close to Haider’s (1987) Matching Projection approach, even if the two ideas are formally and empirically quite different.

References
Baker, M. 1988. Incorporation: A Theory of Grammatical Function Changing. Chicago: University of Chicago Press.
Baker, M.C. and K. Hale. 1988. “Pronoun and Anti-Noun Incorporation,” ms, McGill University/MIT.
Baker, M.C., K. Johnson and I. Roberts. 1989. “Passive Arguments Raised,” Linguistic Inquiry 20:219–252 [this volume, Chapter 8].
Belletti, A. 1990. Generalized Verb Movement. Aspects of Verb Syntax. Turin: Rosenberg and Sellier.
Belletti, A. and L. Rizzi. 1981. “The Syntax of ne: Some Theoretical Implications,” The Linguistic Review 1:117–154.
den Besten, H. 1977/83. “On the Interaction of Root Transformations and Lexical Deletive Rules,” ms, University of Amsterdam. Published (1983) in W. Abraham (ed.), On the Formal Syntax of the Westgermania. Amsterdam: John Benjamins. 47–131.
Burzio, L. 1986. Italian Syntax: A Government-Binding Approach. Dordrecht: Reidel.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, N. 1986a. Knowledge of Language: Its Nature, Origins and Use. New York: Praeger.
Chomsky, N. 1986b. Barriers. Cambridge, Mass.: MIT Press.
Couquaux, D. 1986. “Les pronoms faibles sujet comme groupes nominaux,” in M. Ronat and D. Couquaux (eds.), La Grammaire Modulaire. Paris: Les Éditions de Minuit. 25–46.
Emonds, J. 1976. A Transformational Approach to English Syntax. New York: Academic.
Emonds, J. 1978. “The Verbal Complex of V′-V in French,” Linguistic Inquiry 9:151–175.
Freidin, R. 1978. “Cyclicity and the Theory of Grammar,” Linguistic Inquiry 9:519–549.
Goldsmith, J. 1981. “Complementizers and Root Sentences,” Linguistic Inquiry 12:541–574.
Huang, J. 1984. “On the Distribution and Reference of Empty Pronouns,” Linguistic Inquiry 15:531–574.
Kayne, R.S. 1972. “Subject Inversion in French Interrogatives,” in J. Casagrande and B. Saciuk (eds.), Generative Studies in Romance Languages. Rowley, Mass.: Newbury House. 70–126.
Kayne, R.S. 1982. “Predicates and Arguments, Verbs and Nouns,” GLOW Newsletter 8:24. [Abstract of paper presented at the 1982 GLOW Conference.]
Kayne, R.S. 1983. “Chains, Categories External to S, and French Complex Inversion,” Natural Language and Linguistic Theory 1:109–137.
Kayne, R.S. and J.-Y. Pollock. 1978. “Stylistic Inversion, Successive Cyclicity, and Move NP in French,” Linguistic Inquiry 9:595–621.
Kitagawa, Y. 1986. “Subjects in Japanese and English,” Ph.D., University of Massachusetts, Amherst.
Koopman, H. 1984. Verb-Movement and Universal Grammar: From the Kru Languages to Grammatical Theory. Dordrecht: Foris.
Koopman, H. and D. Sportiche. 1985. “Theta Theory and Extraction,” GLOW Newsletter 14:57–58. [Abstract of paper presented at the 1985 GLOW Conference.]
Koopman, H. and D. Sportiche. 1991. “The Position of Subjects,” Lingua 85:211–258.
Kuroda, Y. 1986. “Whether we Agree or Not: A Comparative Syntax of English and Japanese,” Lingvisticae Investigationes 12:1–47.
Lasnik, H. 1985. “Illicit NP-movement: Locality Conditions on Chains?” Linguistic Inquiry 16:481–490.
Lieber, R. 1980. “On the Organisation of the Lexicon,” Ph.D., MIT.
Manzini, M.-R. 1986. “Phrase Structure and Extraction,” GLOW Newsletter 16:55–57. [Abstract of paper presented at the 1986 GLOW Colloquium.]
Pollock, J.-Y. 1986. “Sur la syntaxe de EN et le paramètre du sujet nul,” in M. Ronat and D. Couquaux (eds.), La Grammaire Modulaire. Paris: Les Éditions de Minuit. 211–246.
Pollock, J.-Y. 1989. “Verb Movement, UG and the Structure of IP,” Linguistic Inquiry 20:365–424.
Postal, P. 1969. “On So-Called ‘Pronouns’ in English,” in D. Reibel and S. Schane (eds.), Modern Studies in English. Englewood Cliffs, New Jersey: Prentice Hall.
Raposo, E. 1987. “Case Theory and Infl-to-Comp: The Inflected Infinitive in European Portuguese,” Linguistic Inquiry 18:85–110.
Renzi, L. 1987. “I pronomi soggetto: un caso di parentela tipologica tra fiorentino e francese, e un capitolo poco noto di storia della lingua italiana,” ms, Università di Padova.
Rizzi, L. 1982. Issues in Italian Syntax. Dordrecht: Foris.
Rizzi, L. 1986a. “Null Objects in Italian and the Theory of pro,” Linguistic Inquiry 17:501–557.
Rizzi, L. 1986b. “On the Status of Subject Clitics in Romance,” in O. Jaeggli and C. Silva-Corvalàn (eds.), Studies in Romance Linguistics. Dordrecht: Foris. 391–420.
Rizzi, L. 1987b. “On the Structural Uniformity of Syntactic Categories,” paper presented at the Second World Basque Conference, San Sebastian, September 1987.
Rizzi, L. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press.
Rizzi, L. 1991. “On the Status of Referential Indices,” in A. Kasher (ed.), The Chomskian Turn. Oxford: Blackwell. 273–299.
Rizzi, L. 2000. “Three Issues in Romance Dialectology,” in L. Rizzi (ed.), Comparative Syntax and Language Acquisition. London: Routledge.
Roberge, Y. 1986. “The Syntactic Recoverability of Null Arguments,” Ph.D., University of British Columbia.
Roberts, I. 1985. “Agreement Parameters and the Development of English Modal Auxiliaries,” Natural Language and Linguistic Theory 3:21–58 [this volume, Chapter 1].
Roberts, I. 1989. “Thematic Minimality,” Rivista di Grammatica Generativa 13:111–137.
Safir, K. 1981/82. “Inflection-Government and Inversion,” The Linguistic Review 1:417–467.
Safir, K. and D. Pesetsky. 1981. “Inflection, Inversion and Subject Clitics,” Proceedings of NELS 11. 331–344.
Sportiche, D. 1988a. “A Theory of Floating Quantifiers and Its Corollaries for Constituent Structure,” Linguistic Inquiry 19:425–449.
Sportiche, D. 1988b. “Conditions on Silent Categories,” ms, University of California, Los Angeles.
Szabolcsi, A. 1983. “On the Non-Unitary Nature of Verb-Second,” ms, Max-Planck Institute for Psycholinguistics, Nijmegen.
Vergnaud, J.-R. 1985. Dépendances et niveaux de représentations en syntaxe. Amsterdam: John Benjamins.
Williams, E. 1980. “Predication,” Linguistic Inquiry 11:203–238.
Zagona, K. 1982. “Government and Proper Government of Verbal Projections,” Ph.D., University of Washington, Seattle.

10 Excorporation and Minimality Ian Roberts

1. Introduction* Baker (1988) presents a theory of syntactic incorporation according to which the operation that derives morphologically complex words from more basic elements (roots, stems, or affixes) is held to be the variant of Move-α that applies to heads. Thus, various kinds of affixation and incorporation processes are viewed as instances of head-to-head movement, as in (1): (1)

[XP [X0 + Y0i] [YP [Y0 ti]]]

A number of phenomena have been analyzed in this way: Baker himself treats noun incorporation, applicative constructions, causatives, and passives, and arguably the best-known work that builds on his has been done in the domain of verb movement (see Pollock (1989)). Other authors have also proposed treating Romance clitic climbing (Burzio (1986), Kayne (1989)) and Dutch verb raising (Haegeman (1988)) in terms of head-to-head movement. The major advantage of Baker’s approach is that it allows an account of certain constraints on morphological operations in terms of well-known and independently motivated syntactic conditions, notably the Empty Category Principle (ECP). The Head Movement Constraint, which prevents head movement from “skipping” intervening heads (Travis (1984)), can be derived from the ECP (Chomsky (1986)). Moreover, it follows from the ECP that only heads of complements can incorporate; incorporation is

impossible from subjects and from adjuncts. Abstractly, then, the following cases are ruled out by the ECP in a system like Baker’s (in other words, in each case the trace of head-to-head movement fails to be properly governed):

(2) a. *[XP [X0 + Z0] [YP Y0 [ZP t]]]
b. *[XP [X′ [X0 + Y0]] [YP t]]
c. *[XP [XP [X0 + Y0]] [YP t]]

The question we will address in this squib is, What is the status of excorporation? Excorporation is successive cyclic head-to-head movement where one head simply “passes through” another, first incorporating and then moving on, as in (3): (3)

[XP [X0 + Z0i] [YP [Y0 + ti] [ZP [Z0 ti]]]]
(incorp: Z0 raises from ZP and incorporates into Y0; excorp: Z0 moves on from Y0 to X0)

Excorporation seems to be impossible in the genuine morphological cases of head-to-head movement, such as noun incorporation and affixation. For example, assuming that do is inserted in English at PF in order to carry “stranded” verbal affixes (Chomsky (1957)) and that have and be raise from base V-positions to I (Emonds (1976; 1978)),1 we never find cases of subject-aux inversion of the following type:

(4) a. *Have John does t gone? (S-Structure: have John [t -s] t gone)
b. *Be John did t arrested? (S-Structure: be John [t -ed] t arrested)

Instead, once the auxiliary combines with agreement (I0), the two elements must move together to C0 (giving Has John gone? and Was John arrested? for (4)). Baker (1988, 73) suggests that derivations like (3) can be ruled out in terms of a ban on word-internal traces (although he also mentions (fn. 19) the possibility of ruling out (3) in terms of the ECP). On the other hand, if cliticization and verb raising are cases of incorporation, excorporation might have to be countenanced. In the case of cliticization, excorporation would be manifested by clitic climbing. In the case of verb raising, excorporation would be needed to account for the interaction of verb second with this process (Jean Rutten (personal communication)):

(5) a. Italian
La volevo t chiamare t ieri.
her I-wanted to-call yesterday
‘Yesterday I wanted to call her up.’
b. Dutch
Gisteren had ik [mijn vriendin op t] t willen bellen.
yesterday had I my girlfriend up want call
‘Yesterday I wanted to call my girlfriend up.’

I follow Kayne (1989) in taking clitics to be heads, and clitic climbing to be successive-cyclic head movement. In (5a) the clitic la moves through the lower I and on, possibly through an intermediate C, to its surface position. 
The clitic passes through these heads morphologically unscathed; it does not carry along any features of the heads it moves through (this is particularly clear if we assume, following Belletti (1988), that Italian infinitives always raise to I). In (5b) successive applications of verb raising (which I describe in more detail below) create a verbal complex had – willen – bellen, out of which had alone moves to satisfy the verb-second requirement.

I will show that the elaboration of Baker’s theory proposed in Rizzi and Roberts (1989 [this volume, Chapter 9]), in conjunction with the Minimality Condition (either the “rigid” condition of Chomsky (1986), or the relativized condition of Rizzi (1990)), gives exactly the correct results. Cases of incorporation that involve genuine affixation are prevented from undergoing subsequent excorporation by the ECP, whereas other instances of incorporation, those apparently operative in cliticization and verb raising, may allow excorporation. I will further show that this lends support to a treatment of verb raising as a particular kind of incorporation.

2. Background Assumptions I first introduce the modifications Rizzi and Roberts (1989 [this volume, Chapter 9]) make to the theory of head-to-head movement. Rizzi and Roberts further elaborate the approach of Baker (1988, 59) by assuming that head-to-head movement may be either substitution of a head into another head position or adjunction of a head to another head position. In cases where incorporation results in a visible amalgam of the two heads (such as standard cases of noun incorporation, or V-to-I movement where V “picks up” tense and agreement marking), we assume that the incorporation host morphologically subcategorizes for the incorporee; hence, a structural slot is created for the incorporee at D-Structure as a function of the lexical properties of the incorporation host. So (tensed) I0 in a language like French has the subcategorization frame [ + V0 __ ], an incorporating V0 in Mohawk has the feature [ + N0 __ ], and so on. In general, where an incorporation trigger X0 has the feature [ + Y0 __ ], this means that the slot for Y0 is base-generated within X0, triggering substitution of Y0 during the derivation, leading to the creation of a complex head with the government and Case-marking properties discussed at length by Baker (1988, chap. 2). With this kind of incorporation, the head of the complex formed by incorporation remains X0, the incorporation trigger. For convenience, I adopt the notation originally proposed in Selkirk (1982, 3ff.) and indicate the incorporation trigger as X−1. 
It is crucial for what follows, however, that we continue to consider this element to be a head as far as the ECP is concerned (as it clearly is, if we extend standard assumptions about X-bar theory below the X0 level).2 On the other hand, if the potential host does not provide a structural slot via morphological subcategorization, head-to-head movement may take place either as an instance of adjunction or, if the host head is radically empty, as substitution into the empty head position. In the case of adjunction, following the proposals concerning adjunction in May (1985), the host head is realized in two segments, neither of which is itself a head. I leave aside the case of substitution into a radically empty head (see Rizzi and Roberts (1989 [this volume, Chapter 9]) for discussion). Substitution incorporation and adjunction incorporation are illustrated in (6) (these structures

are left-headed purely for purposes of illustration; in fact, English inflections usually appear on the right of the stem):

(6) a. [XP [X0 X−1[+Y0 __] Y0] [YP t]]
b. [XP [X0 X0 Y0] [YP t]]

The structures in (6a) and (6b) have quite different properties with respect to the Head Movement Constraint as it is derived from the ECP, as we will see. The second piece of background that is required is the Minimality Condition. In Chomsky (1986, 10) the Minimality Condition operates in configurations of the kind shown in (7): (7) . . . X . . . [ Y . . . W . . . Z . . . ] Here, if W governs Z, then the Minimality Condition prevents X from governing Z even if X satisfies all the other criteria for governing Z. The Minimality Condition is particularly important for the computation of antecedent government relations; thus, an intervening governor may block government of a trace by its antecedent, leading potentially to a violation of the ECP (or perhaps of conditions on chain formation; see Rizzi (1990, chap. 3)). This condition derives the Head Movement Constraint in a configuration like (2a) in the following way: Y0 in (2a) is an intervening governor for the trace and so prevents Z0 from antecedent-governing its trace. In the next section we will see how putting these assumptions into action derives the correct account of the properties of excorporation.
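The blocking logic of (7), as applied to (2a), can be sketched computationally. The following is only a toy illustration, not part of the original analysis: it reduces government to the question of whether a head lies between an antecedent and its trace, and the names `Element`, `intervening_governor` and `antecedent_governed` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """A position lying between an antecedent and its trace."""
    label: str
    is_head: bool  # assumption: only heads count as potential governors here


def intervening_governor(between: List[Element]) -> Optional[Element]:
    """Return the first head found between antecedent and trace, or None."""
    for element in between:
        if element.is_head:
            return element
    return None


def antecedent_governed(between: List[Element]) -> bool:
    """The trace is antecedent-governed only if no head intervenes, cf. (7)."""
    return intervening_governor(between) is None


# (1): Y0 moves to the next head up; only the phrase YP separates it from
# its trace, and a phrase is ignored by this toy check.
local_movement = [Element("YP", is_head=False)]
# (2a): Z0 moves to X0 across Y0, which counts as an intervening governor.
skipping_movement = [Element("YP", is_head=False), Element("Y0", is_head=True)]

print(antecedent_governed(local_movement))     # True
print(antecedent_governed(skipping_movement))  # False
```

The sketch deliberately leaves out every other condition on government (barriers, c-command, L-marking); it isolates only the minimality effect that derives the Head Movement Constraint.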

3. Excorporation In this section I will show how excorporation is prevented in cases like (4), but may be possible in cases like (5). Consider first the cases in (4). Suppose that have/be raising in English is triggered by the presence of an agreement

affix in I0 that morphologically selects V0. So have/be raising is an instance of head-to-head movement of the type in (6a), with have/be = Y0 and I = X0. If have/be then moves to C alone, stranding the agreement affix, the following configuration results:

(8) *[C′ [C0 V0] [IP [I0 [V0 t] I−1[+V0 __]]]]

In this structure the I−1 sister to V0 is a head. In terms of minimality, then, I−1 counts as an intervening governor for the trace (see (7)), and antecedent government of this trace by V0 in C0 is therefore blocked. This situation will arise whenever excorporation takes place from a selected slot, that is, whenever Y0 in (6a) moves on, stranding X−1. This is what explains the ungrammaticality of (4). (I assume that any index possessed by an incorporee percolates to the complex formed after incorporation; this ensures that the trace in the base V0 position is antecedent-governed in (8) and permits the formation of the chain (V, t) here.)

Consider next the situation that arises when incorporation involves the adjunction of one head to another, as in (6b). Following the conception of adjunction outlined in May (1985) and adopted in Chomsky (1986), the two occurrences of the host head X0 in (6b) are the segments of the single head X0. We can thus propose that the X0 sister to Y0, since it is not itself a head, cannot block proper government of the trace of Y0. Therefore, Y0 is able, other things being equal, to move on, stranding the host head, and its trace will be properly governed (note the parallel with movement of maximal projections, where adjunction may void barrierhood for analogous reasons).

The situation with verb raising in examples like (5b) is actually more complex. Here it is not the adjoined element, the infinitival verb (willen bellen, itself the result of an earlier incorporation of bellen with willen), that excorporates, but rather the apparent original incorporation host, the inflected verb had. What allows this situation is the fact that the adjoined element is not the head of the resulting complex and thus arguably cannot count as an intervening governor for the trace of the inflected verb in cases of this type.3 This account does not affect the earlier account of (4) and (6a); here the selecting affix is the head of the complex I0.

In adjunction structures of the kind in (6b), then, both the host and the incorporee are free to move on. In clitic climbing, the incorporee moves on; in structures combining verb raising and verb second, the host moves on. This raises the possibility that both elements could excorporate independently, the subsequent derivation perhaps involving different kinds of head-to-head movement. If we look more closely, sentences like (5b) in fact exemplify this possibility. The derivation of (5b) involves several kinds of verb movement. The entire derivation is presented in (9):

(9) a. ik [[[mijn vriendin op-bellen] willen] heb] I0
b. ik [[[mijn vriendin op - t] willen-bellen] heb] I0
c. ik [[[mijn vriendin op - t] t] [V0 heb [V0 willen-bellen]]] I0
d. ik [[[mijn vriendin op - t] t] [V0 t [V0 willen-bellen]] [I0 had]]
e. ik [[[mijn vriendin op - t] t] [V0 t t] [I0[I0 had] [V0 willen-bellen]]]
f. [C0[I0 had]] ik [[[mijn vriendin op - t] t] [V0 t t] [I0 t [V0 willen-bellen]]]

(9a) is the D-Structure representation. In (9b) bellen adjoins to willen (stranding the particle op). In (9c) the complex willen bellen adjoins to the matrix V0 heb-. The latter is in its base position and therefore uninflected at this stage of the derivation. However, heb- is selected by the tense/agreement morphology in I0. So this element excorporates from [V0 heb- willen bellen] and moves to I0 in (9d), forming had. This possibility is allowed, since verb raising involves adjunction, for the reasons just outlined. Next, willen bellen adjoins to I0 in (9e). So the two parts of the complex formed by raising of willen bellen to heb- move to I0 independently of one another and produce different resulting structures. This is simply the combination of the two cases of excorporation we have discussed.4 As the last step in the derivation, I0 raises to C0 for verb second to give (9f); as described above, I0 can excorporate from the complex verb, but V0 (heb-) cannot excorporate from I0.
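The predictions of this section can be summarized in a small sketch. It is an illustrative simplification under stated assumptions, not the author's formalization: a complex head is reduced to a host, an incorporee and a mode of incorporation, and the names `Mode`, `ComplexHead` and `can_excorporate` are hypothetical. The blocking of host excorporation under substitution follows the refinement suggested in note 3.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SUBSTITUTION = "substitution"  # host selects the incorporee, as in (6a)
    ADJUNCTION = "adjunction"      # no selected slot, as in (6b)


@dataclass(frozen=True)
class ComplexHead:
    host: str        # the incorporation host, e.g. I0
    incorporee: str  # the element that moved into or onto the host
    mode: Mode


def can_excorporate(head: ComplexHead, mover: str) -> bool:
    """Predict whether `mover` may leave the complex head on its own."""
    if mover not in (head.host, head.incorporee):
        raise ValueError(f"{mover!r} is not part of this complex head")
    if head.mode is Mode.SUBSTITUTION:
        # The stranded sister counts as a head and blocks antecedent
        # government of the trace, as in (8); note 3 extends this to hosts.
        return False
    # Adjunction: the sister of the mover is a segment or a non-head,
    # so nothing blocks antecedent government (other things being equal).
    return True


have_in_infl = ComplexHead(host="I0", incorporee="have", mode=Mode.SUBSTITUTION)
clitic_on_infl = ComplexHead(host="I0", incorporee="la", mode=Mode.ADJUNCTION)
verb_cluster = ComplexHead(host="heb-", incorporee="willen bellen", mode=Mode.ADJUNCTION)

print(can_excorporate(have_in_infl, "have"))   # False: rules out (4a)
print(can_excorporate(clitic_on_infl, "la"))   # True: clitic climbing, (5a)
print(can_excorporate(verb_cluster, "heb-"))   # True: the host moves on, (9d)
```

The function returns a prediction only with respect to minimality; independent conditions, such as the Strict Cycle Condition discussed in the notes, are not modeled.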

4. Conclusion The above remarks are intended as an elaboration of Baker’s theory. We see that, when certain points left open by Baker are developed, a picture emerges of a range of head-to-head movements. Given that it seems justified to treat processes as superficially varied as noun incorporation, V-to-I movement, verb raising, and clitic climbing all as instances of head-to-head movement, a theoretically based account of what distinguishes them is most welcome. I hope that the foregoing analysis is a step in that direction.


Notes

* My thanks to Hagit Borer, Luigi Rizzi, Jean Rutten, and two anonymous LI reviewers for invaluable help with this squib. All errors are of course my own.

1. These assumptions about the functioning of the English auxiliary system are made purely for the sake of the argument here. The reality is undoubtedly much more complex; see Chomsky (1989) and Pollock (1989) for some recent proposals. As an anonymous reviewer points out, the proposal sketched in the text does not prevent I containing an affix raising to C followed by do-insertion into C. Where an aspectual auxiliary is present, this would give results like *Does John have left? This is ruled out by the Strict Cycle Condition and the idea that have/be raising is obligatory. A full treatment of this and related issues (notably the matter of long movement of auxiliaries via successive adjunctions, also pointed out by a reviewer) would go beyond the scope of this squib. These matters are dealt with, using essentially the system of head-to-head movement presented here, in Roberts (1993).

2. We could ask how the system carries over to cases of compounding, where the incorporation host is not a bound morpheme. One possibility would be to claim that morphological subcategorization is not the unique prerogative of X−1s. X−1s must have a morphological subcategorization, since, as a consequence of well-formedness conditions of X-bar theory, they cannot stand alone (that is, they cannot be dominated by X′ or by a nonbranching X0; this is the No-Stray-Affix Filter). This does not prevent X0s from having morphological subcategorization frames; such X0s would be the elements that trigger compound formation.

3. Here certain differences emerge between Rizzi’s Relativized Minimality Condition and Chomsky’s rigid Minimality Condition. Chomsky’s condition is phrased in terms of the idea that some projection of a given head blocks government from outside that projection (see Chomsky (1986, 42)); it therefore follows automatically that adjoined heads do not project minimality barriers, since, in their adjoined position, they do not project at all. In Rizzi’s system any c-commanding head blocks antecedent government of another head; adjoined heads are not immediately different from base-generated or substituted heads with respect to this principle. The natural refinement of Rizzi’s system is to say that only a head occupying a base-generated X0 position can block antecedent government of another head. This approach has the advantage that heads incorporated by substitution block excorporation of their hosts, an apparently correct result that Chomsky’s version of minimality cannot obtain for the same reason that adjoined heads do not block excorporation of their hosts in his system.

4. I assume that the Strict Cycle Condition is not operative here, since the two applications of head-to-head movement have exactly the same domains of application.

References

Baker, M. C. (1988) Incorporation: A Theory of Grammatical Function Changing, University of Chicago Press, Chicago, Illinois.
Belletti, A. (1990) Generalized Verb Movement: Aspects of Verb Syntax, Rosenberg & Sellier, Turin.
Burzio, L. (1986) Italian Syntax, Reidel, Dordrecht.
Chomsky, N. (1957) Syntactic Structures, Mouton, The Hague.
Chomsky, N. (1986) Barriers, MIT Press, Cambridge, Massachusetts.
Chomsky, N. (1989) “Some Notes on Economy of Derivation and Representation,” in I. Laka and A. Mahajan, eds., Working Papers in Linguistics 10, Department of Linguistics and Philosophy, MIT, Cambridge, Massachusetts. Reprinted in R. Freidin, ed., Principles and Parameters in Comparative Grammar, MIT Press, Cambridge, Massachusetts, pp. 417–454.
Emonds, J. (1976) A Transformational Approach to English Syntax, Academic Press, New York.
Emonds, J. (1978) “The Verbal Complex V′-V in French,” Linguistic Inquiry 9, 151–175.
Haegeman, L. (1988) “Verb Projection Raising and the Multidimensional Analysis: Some Empirical Problems,” Linguistic Inquiry 19, 671–684.
Kayne, R. (1989) “Null Subjects and Clitic Climbing,” in O. Jaeggli and K. Safir, eds., The Null Subject Parameter, Reidel, Dordrecht.
May, R. (1985) Logical Form: Its Structure and Derivation, MIT Press, Cambridge, Massachusetts.
Pollock, J.-Y. (1989) “Verb Movement, Universal Grammar, and the Structure of IP,” Linguistic Inquiry 20, 365–424.
Rizzi, L. (1990) Relativized Minimality, MIT Press, Cambridge, Massachusetts.
Rizzi, L. and I. Roberts (1989) “Complex Inversion in French,” Probus 1, 1–30 [this volume, Chapter 9].
Roberts, I. (1993) Verbs and Diachronic Syntax, Reidel, Dordrecht.
Selkirk, E. (1982) The Syntax of Words, MIT Press, Cambridge, Massachusetts.
Travis, L. (1984) Parameters and Effects of Word Order Variation, Doctoral dissertation, MIT, Cambridge, Massachusetts.

11 Two Types of Head Movement in Romance* Ian Roberts

1. Introduction

The basic point of this paper is to argue that there are two distinct kinds of head movement. One kind is triggered by morphological properties of the host head, while the other kind is not, and in fact often appears to be triggered by some property of the moved head. Adopting and extending the terminology of Chomsky & Lasnik (1993), we refer to the former as L-related head movement and the latter as non-L-related head movement.1 Both types of head movement are subject to the ECP, but, since the nature of the target of movement is different in each case, the antecedent-government requirement manifests itself in different ways. This gives the appearance of differing locality conditions; in particular, only L-related head movement obeys the “classical” Head Movement Constraint of Travis (1984). By a revision of the Relativized Minimality Condition of Rizzi (1990), however, we see that both the cases which obey this condition and those which do not are in conformity with the ECP.

We assume a conjunctive formulation of the ECP, as in Rizzi (1990: chapter 2). Moreover, we assume that traces of head movement are subject to a uniform head-government requirement. For non-L-related head movement, this raises the possibility that the head-governor and the antecedent-governor may be distinct. Our main empirical argument for the framework to be adopted relies on this fact; we will show that there is diachronic evidence from French that non-finite AGR ceased to be a head-governor for head traces in the 17th century, with the result that a number of instances of non-L-related head movement disappeared together. Moreover, we will see that the Romance languages as a group can be characterized in terms of whether non-finite AGR is or is not a head-governor. We will speculate on the relationship of this property to the null-subject parameter.


2. The Head Movement Constraint and the ECP

In the recent work on head movement, the fundamental locality constraint that has been assumed is the Head Movement Constraint (HMC). This was originally formulated by Travis (1984: 131) as follows:

(1) An X° may only move into the Y° which properly governs it.

The HMC prevents head movement from non-complement categories; the empirical consequences of this are discussed in depth in Baker (1988). To the extent that some kind of Minimality Condition is incorporated into the definition of proper government, it also blocks head movement which “skips” a locally c-commanding head moving directly to a non-local c-commanding head. The banned configuration is schematized in (2):

(2) [ZP [Z° X°] [YP Y° [XP t]]]

The instantiations of this configuration in familiar languages involving I-to-C movement are strongly ungrammatical, as the following illustrate:

(3) a. English SAI: *Have you would t said it to John
    b. French SCL-inversion: *Dit-il a t la vérité? (vs. A-t-il dit la vérité? ‘Has he told the truth?’)
    c. German V2: *Überall folgen ich dich t werde. (vs. Überall werde ich dich folgen, ‘I’ll follow you everywhere.’)

This is a desirable consequence of the HMC as formulated in (1). Starting with Chomsky (1986b), the ECP has been formulated such that the HMC is a deductive consequence of this principle rather than a separate generalization. For concreteness, we will frame our discussion in terms of a conjunctive version of the ECP where the definition of antecedent-government involves the Relativized Minimality Condition; cf. Rizzi (1990: chapter 2). Thus traces must satisfy both of the following conditions in order to be well-formed:

(4) A non-pronominal empty category must be
    (i) properly head-governed (formal licensing);
    (ii) θ-governed or antecedent-governed (identification).
    (Rizzi (1990: 74))

Head- and antecedent-government are the notions that will play a central role in our discussion throughout. Rizzi (1990: ch. 3) eventually abandons the θ-government component of the definition in (4); since we will be primarily concerned with functional categories in our discussion, θ-government plays no role. Both head- and antecedent-government are defined in terms of Relativized Minimality (RM) in the following way:

(5) For α (a head, antecedent), X α-governs Y only if there is no Z such that
    (i) Z is a typical potential α-governor for Y;
    (ii) Z c-commands Y and does not c-command X.

The definitions of typical potential α-governor are as follows:

(6) Z is a typical potential head-governor for Y = Z is a head m-commanding Y.

(7) a. Z is a typical potential antecedent-governor for Y, Y in an A-chain = Z is an A-specifier c-commanding Y.
    b. Z is a typical potential antecedent-governor for Y, Y in an A′-chain = Z is an A′-specifier c-commanding Y.
    c. Z is a typical potential antecedent-governor for Y, Y in an X-chain = Z is a head c-commanding Y.

Applying these definitions to the schema in (2), we see that the moved X does not head-govern its trace since there is an intervening head which m-commands the trace and so counts as a typical potential head-governor for that trace. This is the effect of (6). (7c) tells us that X does not antecedent-govern its trace since there is an intervening head which c-commands the trace, and so counts as a typical potential antecedent-governor for the trace. Thus X fails to head- and antecedent-govern its trace and the trace violates the ECP. This subsumes the basic case of the HMC under the ECP.

In the remainder of this section, we will discuss four conceptual objections to this account. The other sections of the paper demonstrate the empirical desirability of a different conception of the constraints on head movement by showing that the “classical” HMC of (1) does not give the correct descriptive characterization of them. We will argue for a “fully relativized” condition on head movement, linked to a division of heads into L-related and non-L-related.
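The way (5) and (7c) together exclude the schema in (2) can be made concrete with a small sketch. It is only an illustration under simplifying assumptions that are not part of the text: heads are placed on a single clausal spine, a structurally higher head is taken to c-command every lower one, and the names `Head`, `c_commands` and `antecedent_governs` are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    label: str
    depth: int  # position on the clausal spine; smaller = structurally higher


def c_commands(a: Head, b: Head) -> bool:
    # Simplifying assumption: on one spine, a higher head c-commands every lower one.
    return a.depth < b.depth


def antecedent_governs(x: Head, trace: Head, heads: list[Head]) -> bool:
    """(5) combined with (7c): X antecedent-governs its trace unless some other
    head Z c-commands the trace without c-commanding X."""
    if not c_commands(x, trace):
        return False
    for z in heads:
        if z in (x, trace):
            continue
        if c_commands(z, trace) and not c_commands(z, x):
            return False  # Z is an intervening typical potential antecedent-governor
    return True


# Schema (2): X has moved to Z°, skipping Y°.
moved_x, y, t_x = Head("X (in Z°)", 0), Head("Y°", 1), Head("t", 2)
long_move_ok = antecedent_governs(moved_x, t_x, [moved_x, y, t_x])

# Local movement: Y° itself moves to Z°, with no head in between.
moved_y, t_y = Head("Y (in Z°)", 0), Head("t", 1)
local_move_ok = antecedent_governs(moved_y, t_y, [moved_y, t_y])

print(long_move_ok, local_move_ok)  # False True
```

The sketch covers only the antecedent-government clause; the head-government clause in (6) would need m-command, which this one-dimensional model does not represent.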
As just mentioned, there are four objections to the RM account of how the HMC follows from the ECP. First, it is clear that many instances of head movement are triggered by some morphological property of the host head. For instance, V-to-I movement in languages like French or Middle English is clearly related to properties of the agreement paradigms of those languages. Rizzi & Roberts (1989 [this volume, Chapter 9]) refer to this kind of morphological triggering of head movement as m(orphological)-selection. It is arguable that the standard cases of verb-movement in the Germanic languages are all cases which are triggered by m-selection (cf. in particular Tomaselli (1990) for arguments that V2 Cs have an abstract morphological feature which triggers I-to-C). M-selection, like other types of selection, is strictly local (cf. Chomsky (1965)). Thus there is at best a redundancy between the notion of morphological selection (which is independently needed to account for the differential triggering of head movement, e.g. this is how we distinguish noun-incorporating languages from non-noun-incorporating languages, languages with applicatives from those without applicatives, languages with synthetic causatives from those without synthetic causatives, etc.) and the HMC. This is especially clear if we adopt some version of Chomsky’s (1991) economy guideline, since we would not, other things being equal, expect a head to move from a position where it was not m-selected. This means that heads inside adjuncts and subjects could never undergo head movement. It seems, then, that the “classical” HMC can be derived from m-selection combined with the economy guideline, both of which are needed quite independently of head movement. In that case, one might wonder whether the ECP should be able to “see” head traces at all; one could make the general assumption that the ECP (like the related pro-module) only checks XPs, and that head traces which do not contribute to semantic interpretation are typically deleted in the mapping to LF in accordance with economy. We will return to this point below.

Second, Rizzi’s definition of typical potential antecedent-governor in (7) is conceptually unsatisfactory (the same point is made by Chomsky & Lasnik (1993)). (7a) and (7b) appear to form a natural class of positions (pace Chomsky & Lasnik (ibid)), but (7c) does not appear to belong to this class. (7c) differs from (7a,b) in that (7a,b) refer crucially to the functional distinction between A and A′-positions, while (7c) refers to the purely structural notion of X. Also, (7a,b) refer to specifier positions, while (7c) does not. We should be suspicious of such an unnatural definition.
It is clear that (7c) is the problematic component and this is the part which derives the HMC.

Third, it is in fact (7c) alone that guarantees the ill-formedness of the traces in examples like (2). The definition of typical potential head-governor in (6) simply guarantees that X in (2) does not head-govern its trace, but it does not say that the trace is not head-governed. Neither (6) nor (7c) says anything about Y’s ability to head-govern this trace. Since the structure is ruled out because the trace is in any case not antecedent-governed, the point may seem moot, but we have just seen that the definition which gives the result that the trace is not antecedent-governed is suspect. Rizzi (1990) deals with this point by stipulating that head traces are not subject to an independent head-government requirement. This move effectively compounds the conceptual problem surrounding how (2) is explained by the ECP. We will show in what follows that this stipulation can be abandoned with empirical and conceptual gain.

Fourth, in the context of a conjunctive ECP, we expect to find selective violations of one or other clause of this condition. In the case of XP-movement, it is in fact quite easy to construct examples of each type of violation for both A′ and for A-movement (still adopting the background assumptions of Rizzi (1990: chapter 2)):

(8) a. *How do you wonder which problem to solve t t′?
    b. *Who did you say that t left?
    c. *John seems that it is likely t to win.
    d. *John was preferred for t to leave.

In (8a), t′ is head-governed (as the possibility of the corresponding short movement shows), but not antecedent-governed since which problem functions as a typical potential antecedent-governor for this trace. In (8b), on the other hand, t is antecedent-governed (in Rizzi’s system) since there is no typical potential antecedent-governor which intervenes between the trace and who. However, the trace fails to be head-governed since that is not a governor (and I cannot govern its specifier, cf. Rizzi (1990), Koopman & Sportiche (1991)). (8c,d) illustrate the same for NP-movement. (8c) is a case of super-raising where the trace fails to be antecedent-governed owing to the presence of the typical potential antecedent-governor it occupying an intervening A-specifier position. Nevertheless, the possibility of short raising shows that we must take this trace to be head-governed, presumably by the raising predicate. Finally, on the assumption that for, although it is a governor, is not a proper head-governor, the trace in (8d) fails to be properly head-governed, although it is antecedent-governed (as the grammaticality of the corresponding sentence without for shows). There are no corresponding examples involving head movement. Since we want to assimilate head movement as far as possible to other cases of movement, i.e. XP-movement, we should wonder why this is. Once again, Rizzi’s stipulation that head traces are not subject to a separate head-government requirement answers the question, but in an unsatisfactory way. What we wish to do in this paper is construct an account of the locality conditions on head movement which overcomes this objection, as well as the three given above. In the context of a research program which aims to assimilate head movement as far as possible to XP-movement, I take this goal to be desirable. In the foregoing, we have seen conceptual reasons to want to reformulate the conditions on head movement.
We have alluded to an alternative account in terms of m-selection and economy. Such an account would claim that head movement is always and only triggered by m-selection. This account would capture the impossibility of “long” head movement of the type schematized in (2) and exemplified in (3); these would be violations of m-selection, since only I can be m-selected by C, and violations of economy, since there is no trigger for movement of the lower head. The impossibility of head movement from subjects and adjuncts discussed in Baker (1988) would similarly result from the combination of the locality condition on m-selection and economy.

This kind of approach works for the kind of case just mentioned. However, there are cases of head movement which are not triggered by a morphological property of the host of head movement.2 In terms of what was just said, we would expect these cases of head movement not to be local in the same way as the cases discussed here. This prediction turns out to be correct, but there is a locality condition. The central point of this paper is to show (a) that this locality condition should be formulated in terms of antecedent-government, (b) that this condition is related to the putative antecedent-government condition on traces of “standard” kinds of head movement, along the general lines proposed by Rizzi (1990) and summarized above, and (c) that there is a head-government requirement on all head traces, both traces of m-selected head movement and traces of non-m-selected head movement. If (a–c) are correct, then the ECP does apply to head traces after all.3 In the next sections I will try to provide some empirical motivation for the claim that there exist cases of head movement which are not triggered by m-selection. Once we have seen some data, we can provide the details of the overall theory of constraints on head movement.

3. Lema & Rivero on “Long Head Movement”

In a series of interesting and important recent papers, Rivero (1991, 1993, 1994) and Lema & Rivero (L&R) (1990, 1991a,b) discuss a case of head movement which cannot be regarded as m-selected. Lema & Rivero refer to this phenomenon as Long Head Movement (LHM). LHM is found in various Slavic and conservative Romance varieties: Bulgarian, Czech, Rumanian,4 literary European Portuguese (EP) and Old Spanish (OSp) (L&R (1991b) also mention Old Provençal, Old Catalan and Early Italian; Rivero (1991) discusses Slovak and Serbo-Croatian, while Borsley et al. (1996) show that the same phenomenon exists in Breton). Here we will limit our discussion and illustration to the Romance cases. LHM constructions have the following general form:

(9) Prt/infinᵢ Aux+AGR tᵢ

In (9), a non-finite verb-form is moved over a finite auxiliary (the non-finite form may itself be an auxiliary). (10) gives some examples of the schema in (9) (see L&R (1991a)):

(10) a. Seguir-te-ei por toda a parte (EP)
        Follow-you-will-(I) by all the part
        ‘I will follow you everywhere’
     b. Darte he un exemplo (OSp)
        Give-you (I) will an example
        ‘I will give you an example’

To the extent that the finite auxiliary is a head which c-commands the base-position of the verb (and this is certainly the natural assumption to make), the derivations of these sentences clearly violate the HMC in that the non-finite verb “skips” the finite auxiliary. It is less clear whether the representations of this kind of sentence violate the HMC. One way to “save” the standard HMC is to propose that the non-finite verb and the auxiliary are coindexed (perhaps because they form a “tense chain” in the sense of Guéron & Hoekstra (1988)), or that the auxiliary undergoes head movement to the position occupied by the non-finite verb after verb-movement, possibly at LF. Variants of these solutions are proposed inter alia in Lema & Rivero (1990) and Lema (1990); they all give the result that at SS a single chain contains the non-finite verb, the finite auxiliary and the trace. In this way, the SS (and LF) representations of these sentences do not violate the HMC. For our purposes, the question of whether these constructions technically violate the HMC or not is secondary to the question of the trigger for movement of the non-finite verb. If we can show that this movement is not triggered by m-selection, then the fact that the movement does not obey the same constraints as m-selected head movement (i.e. the “classical” HMC) comes as no surprise if, as we suggested in the previous section, the “classical” HMC derives from m-selection.

LHM is triggered in EP and OSp by the ban on clitic-first orders, the Tobler-Mussafia Law of Romance philology (cf. Tobler (1875), Mussafia (1983); for recent analyses, see de Kok (1985), Benincà (1989, 1991), Alberton (1990), Salvi (1991) and Cardinaletti & Roberts (2002 [this volume, Chapter 12])). Similarly, the Slavic cases of LHM are all triggered by clitic pronouns, auxiliaries or particles that obligatorily occupy the second position in the clause. The Tobler-Mussafia Law can be roughly formulated as the following filter (cf. also Benincà (1991)):

(11) *[CP Y clitic . . . where Y = ∅ or another clitic.

LHM is the case where Y is a non-finite verb, and the clitic is an auxiliary verb. This constraint was operative in all the Medieval Romance languages (with some variation which need not concern us here; see Benincà (1991)), as the following examples illustrate:

(12) a. Rogó-le el alcalid que ge-lo departisse (OSp)
        Asked-him the judge that him-it tell
        ‘The judge asked him to tell it to him’
        (Zif 141: L&R (ibid))
     b. Voit le li rois (Old French)
        Sees him the king
        ‘The king sees him’
        (Charroi de Nimes, 58; Cardinaletti & Roberts (2002 [this volume, Chapter 12]))
     c. Vogliolo sapere da mia madre (Old Italian)
        (I) want-it to-know from my mother
        ‘I want to know it from my mother’
        (Novellino, III; Alberton (1990))

However, in Old French and in Old Italian we do not find cases where a non-finite verb has moved in front of an inflected auxiliary, to give the order V[-fin] CL AUX. Instead, we find the order AUX CL V[-fin], as in:

(13) a. Ai le jou bien fait? (Old French)
        Have it I well done?
        ‘Have I done it well?’
        (Clari: 71, 27; de Kok (1985: 83))
     b. Hailo tu fatto per provarmi? (Old Italian)
        Have-it you done to try me?
        ‘Have you done it to try me?’
        (Alberton (1990))

Examples of this type are not found with the future auxiliary in Spanish and Portuguese. So the future auxiliary cannot act as a first-position element which “protects” a second-position pronominal clitic from first position in OSp and EP. We propose that this is due to the fact that the auxiliary itself is a clitic. This property distinguishes the OSp future auxiliary from other auxiliaries in the Medieval Romance languages (the cognate of the OSp future auxiliary was an affix during the entire medieval period in French and elsewhere). Because the future auxiliary is a clitic, it itself requires that some element precede it, in accordance with the ban on first-position clitics in Medieval Romance. This gives rise to mesocliticization, i.e. the order infinitive–clitic pronoun–clitic auxiliary as in (10). If some other element, e.g. a wh-phrase, is fronted to a position within CP, LHM does not and in fact cannot take place:

(14) A quien nos dar-édes por cabdiello?
     to who us give-you+will as leader?
     ‘Who will you give us as leader?’

Here the infinitive is fronted to the auxiliary, but follows the pronominal clitic; it is in a different position from the one it occupies in (10) (in the Slavic languages the verb does not move at all when some other constituent is fronted). We conclude that LHM is a last-resort operation which satisfies (11).5 Moreover, the order of the elements in (10) clearly shows that the auxiliary cannot be considered an affix, since pronominal clitics do not intervene between stems and affixes. We conclude that LHM cannot be a case of m-selected head movement.
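The last-resort logic of this section can be illustrated with a toy sketch. It assumes one particular reading of the filter in (11), namely that the first overt element of CP may not be a clitic, and it treats clauses as flat lists of words. The clitic lexicon, the function names and the pre-movement word order given for (10a) are all hypothetical devices of the illustration, not claims of the analysis.

```python
# Illustrative clitic lexicon (an assumption of this sketch, not an exhaustive list).
# The future auxiliary forms "ei" and "he" are included because the text treats them as clitics.
CLITICS = {"te", "ei", "he", "me", "nos", "vos", "le", "lo"}


def violates_filter(clause, clitics=CLITICS):
    """Filter (11), read as: the first overt element of CP may not be a clitic."""
    return bool(clause) and clause[0].lower() in clitics


def last_resort_lhm(clause, nonfinite_verb, clitics=CLITICS):
    """Front the non-finite verb only if the clause would otherwise violate (11)."""
    if not violates_filter(clause, clitics):
        return list(clause)  # no trigger, so no movement
    i = clause.index(nonfinite_verb)  # raises ValueError if the verb is absent
    return [nonfinite_verb] + clause[:i] + clause[i + 1:]


# (10a), starting from a hypothetical clitic-auxiliary-infinitive order.
ep = last_resort_lhm(["te", "ei", "seguir", "por", "toda", "a", "parte"], "seguir")

# (14): the fronted wh-phrase already satisfies (11), so nothing is fronted.
osp = last_resort_lhm(["A quien", "nos", "dar-édes", "por", "cabdiello"], "dar-édes")

print(ep)   # ['seguir', 'te', 'ei', 'por', 'toda', 'a', 'parte']
print(osp)  # ['A quien', 'nos', 'dar-édes', 'por', 'cabdiello']
```

The sketch models only the trigger. It says nothing about the structural position the verb lands in, nor about the adjunction of the infinitive to the auxiliary seen in (14).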

L&R show that, among others, LHM has the following properties in all known cases:

(15) a. It is restricted to root contexts.
     b. It is blocked by negation.

Let us consider these properties one by one. The restriction to root contexts is illustrated by the Portuguese example in (16):

(16) Uma historia .. onde me referirei de espaço a elle
     A story where me (I) refer-will at length to her
     ‘A history where I will refer to her at length’
     (ibid., (9a), p. 4)

In the relative clause here, the order is clitic-infinitive-auxiliary. This is always the case in non-root contexts. In the Slavic languages, the verb remains in situ in non-root contexts, while in the Romance languages it moves to the auxiliary, as (16) shows (cf. (14) where we see the same phenomenon where some other constituent is fronted). We follow L&R in taking the root nature of this phenomenon as an indication that it involves movement to C; since den Besten (1983), it has been standard to regard root cases of head movement as involving C. So LHM is V-to-C movement (see Benincà (1991)). The clitic element occupies a functional head position immediately below C (arguably the AGR1 of Cardinaletti & Roberts (2002; this volume, Chapter 12)). The essential locality condition on LHM is revealed by the fact that negation blocks it. This is illustrated by the OSp example in (17):

(17) Aqui non vos faran si non todo plazer
     Here not to-you make-will-3pl if not all pleasure
     ‘Here they will not give you anything but pleasure’
     (ibid., (13b), p. 5)

Here we have the same order as in non-root contexts: clitic-infinitive-auxiliary. This is systematically the case in negative clauses. We could try to account for this by saying that negation is a head, and as such head movement cannot take place across it. Thus, the following derived structure is ruled out by the HMC:

(18) [C V[-fin]ᵢ] . . . Neg . . . tᵢ . . .

However, if this is true, why does an inflected auxiliary not similarly block head movement in the following structure?

(19) [C V[-fin]ᵢ] [AGR (cl-) Aux[+fin]] . . . tᵢ . . .

Note also that an analogous problem arises in the analysis of do-support in English. Negation clearly does not block have/be raising, but recent analyses (notably Chomsky (1991), Pollock (1989)) have proposed that do-support is obligatory with clausal negation because negation blocks either affix-hopping or LF verb-movement. Again (although this time at LF), negation appears to selectively block head movement.

In Roberts (1993: 1.3) I proposed treating the fact that negation seems to selectively block head movement by extending the A/A′ distinction to the head level. This meant that there were two types of heads: A-heads and A′-heads, with T and AGR being A-heads and Neg and C A′-heads. Then, by extending Relativized Minimality to the X°-level, it was possible to claim that movements to A and A′-head positions would not “interfere” with each other, and the fact that negation selectively blocks A′-movement could be explained. In her commentary on the earlier version of this paper, Iatridou pointed out that this did not seem to be a natural extension of the A/A′-distinction and also underlined the fact that this distinction is theoretically unclear even for maximal projections (cf. Chomsky & Lasnik (1993), Koopman & Sportiche (1991)). For this reason I propose to reformulate the basic idea of earlier work in terms of the notion of L-relatedness. Chomsky & Lasnik (1993) define L-relatedness as follows: “Given a lexical head L, we say that a position is L-related if it is the specifier or complement of a feature of L.” Certain functional heads, e.g. T and AGR, are taken to be features of V; others, e.g. C, are not. We propose, then, that the notion of L-relatedness can be extended to heads along the following lines:

(20) Given a lexical head L, a position is L-related if:
     (i) it is a feature of L;
     (ii) it is a specifier or complement of a feature of L.

Being a feature of L in Chomsky & Lasnik’s sense certainly includes m-selecting L, although it is presumably a broader notion (below we will see some evidence that AGR is always L-related whether it m-selects V or not). So L-related heads m-select other heads, thereby triggering “classical” local head movement (as mentioned in note 2, this property can be “induced” by the Specifier). Non-L-related heads do not directly trigger head movement but this in itself does not disqualify them from being landing sites for head movement. Under the right conditions (notably if head movement is triggered by something else, e.g. (11)), they may serve as landing sites for head movement. It should be borne in mind that at the X°-level, as at the XP-level, there is in principle no inherent connection between category and function; just as NPs can occupy either L-related or non-L-related positions, so we may expect that a given head can be either L-related or non-L-related, depending on factors other than its category (for example, its internal structure). Moreover, we do not mean to imply that a head is L-related iff it has an m-selection feature, since this would disqualify I in Modern English, with the undesirable consequence that Spec of IP would be a non-L-related position. Rather, there is a one-way implication: if a head has an m-selection feature, then it is L-related.

Taking up the suggestion in Chomsky & Lasnik (1993) that the A/A′ distinction is to be fully replaced by the notion of L-relatedness, we can now reformulate the definition of “typical potential antecedent governor” given in (7) as follows:

(21) W is a typical potential antecedent-governor for Z =
     i. in a non-L-related chain: for Z = XP, W is an XP in a non-L-related position c-commanding Z; for Z = X°, W is a non-L-related X° c-commanding Z.
     ii. in an L-related chain: for Z = XP, W is an XP in an L-related position c-commanding Z; for Z = X°, W is an L-related X° c-commanding Z.

Chomsky & Lasnik (1993) propose a characterization of the notion of a chain which is uniform with respect to L-relatedness, which amounts to the following:

(22) The chain C = (a₁, . . . , aₙ) is uniform with respect to L-relatedness if each aᵢ is L-related or each aᵢ is non-L-related.

Still following Chomsky & Lasnik (ibid), we assume that only in “mixed” chains, i.e. those where aₙ is an L-related position and all other positions are non-L-related, are intermediate traces of moved heads deletable. The definitions in (21) may in large part follow from other considerations, although we will retain them in this form for convenience. Chomsky & Lasnik argue that XP-movement is constrained as described in (21) as a consequence of economy. As we have seen, (21ii) in its application to heads follows from the locality condition on m-selection combined with economy (to the extent that m-selection is coextensive with L-relatedness for heads; see above).
(21i) in its application to heads may also derive from economy considerations; the very fact that all cases of LHM are instances of “last-resort” movement triggered by the filter in (11) strongly suggests that this is true (although some of the instances of non-L-related head movement to be discussed in section 4 are less obviously economy effects). As a consequence of (21), the Relativized Minimality Condition will apply to heads in a fashion analogous to the way in which it applies to maximal projections. This immediately overcomes the conceptual objection that Rizzi’s definition of typical potential antecedent-governor mixes structural and functional notions. This reformulation defines the intervener for head movement in functional terms too, eliminating this asymmetry.
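The head-level clauses of (21) and the uniformity condition in (22) can be restated as a small executable sketch. This is only an informal illustration under simplifying assumptions: the names `Head`, `is_uniform` and `interveners` are hypothetical, L-relatedness is reduced to a Boolean, and the heads passed to `interveners` are assumed to be exactly those that c-command the trace and lie below the antecedent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    """A head position; l_related is a simplifying Boolean assumption."""
    label: str
    l_related: bool


def is_uniform(links_l_related):
    """(22): uniform if every link is L-related or every link is non-L-related."""
    return all(links_l_related) or not any(links_l_related)


def interveners(chain_is_l_related, heads_between):
    """(21), for Z = X°: only heads whose L-relatedness matches that of the
    chain count as typical potential antecedent-governors, i.e. interveners."""
    return [h.label for h in heads_between if h.l_related == chain_is_l_related]


AGR = Head("AGR", l_related=True)    # AGR is taken to be L-related
NEG = Head("Neg", l_related=False)   # negation is taken to be non-L-related

# Movement to C forms a non-L-related chain: AGR is skipped, Neg is not.
print(interveners(False, [AGR]))       # no intervener
print(interveners(False, [NEG, AGR]))  # Neg intervenes
```

On this rendering, the asymmetry between negation and an inflected auxiliary in AGR reduces to a single comparison of the chain type with the type of each intervening head.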

Now, we have seen that LHM is non-m-selected movement to C. Therefore it is a case of head-movement to a non-L-related position. Consequently, AGR does not count as an intervener, since it is L-related, and so the non-finite verb is free to move over it. We assume that negation is non-L-related, at least in the languages in question (Ouhalla (1990) argues that negation m-selects V in Turkish and Berber; if so, then we make clear predictions about how negation interacts with head movement). In that case, negation does count as an intervener. This provides us with an explanation for the fact that negation blocks LHM, but an inflected auxiliary (which we take to be in AGR) does not. (The English case is clearly more complex since it involves an LF operation; we turn to it in section 6).6 So, to sum up, L&R’s data show that there exist cases of head movement which are not triggered by m-selection. This motivates a restatement of the condition on head movement in terms of the notion of “typical potential antecedent-governor” along the lines of (21). What we have argued in this section is that head-to-head movement is subject to antecedent-government, although antecedent-government in general may well derive from economy conditions on chain formation. However, if we continue to assume a conjunctive ECP (even as a descriptive artifact), there is another factor which we have not yet considered: the fact that traces of head movement must be head-governed. As we mentioned in the previous section, the independent nature of this requirement is obscured in a system in which the HMC holds in full, since the same head (the one which is the minimal governor of the trace of incorporation) will always be both the antecedent-governor and the head-governor of the lower trace. In a system like the one we are proposing here, however, this difference is of real empirical importance.
In configurations like (9) or (19), V[-fin] antecedent-governs t_i from C but the minimal governing head must head-govern it. In the case of LHM of non-finite verbs we can assume that the verb has moved through T (to pick up infinitival morphology in the examples we have discussed), and so what we really have is T-to-C movement. The minimal governor for T is AGR (AGR2 in terms of Cardinaletti & Roberts (2002 [this volume, Chapter 12]); note that cases where a finite verb moves to C to satisfy (11), e.g. (12), are different in that “long” movement is from AGR2 “skipping” AGR1). Thus we see that a necessary (but not sufficient) condition for LHM is that AGR must have the capacity to head-govern a head trace. In the rest of this paper, we will explore the consequences of an approach to the HMC based on the version of RM extended to heads as in (21), along with the related idea that the head- and antecedent-governors for a given head trace may be separate elements. Our argument runs as follows: first, we identify further, less obvious, cases of LHM in Romance; second, we show how all the LHM constructions depend on AGR’s ability to head-govern a head trace, an ability which we take to be subject to parametric variation; third, we show how this postulated parametric property of AGR allows us to give an interesting account of a series of related syntactic changes in the history of French, and a related typology of the Romance languages. The empirical results regarding the history of French and the typology of Romance thus support the contention that head traces are subject to a head-government requirement, and therefore that the ECP applies to head traces. It does not appear that this requirement can be subsumed under economy; one possibility, following Aoun et al. (1987), is that it is a PF-requirement. In the final section of this paper we will provide an argument to that effect.

4. Other “Long” Movements

4.1 Clitic-Climbing

In this section we propose an analysis of clitic-climbing which treats this construction as a case of non-m-selected head movement. The analysis is largely inspired by Kayne (1989), and is best considered as an elaboration of the proposals in that paper. Our analysis focuses initially on Italian, although as far as we are aware the comparable phenomena of Spanish could be handled in the same way. In section 5, we will discuss the historical development of this construction in French. We propose the following initial characterization of Italian clitics:

(23) Clitics adjoin to AGR, possibly via successive-cyclic movement.

As in the case of XP-movement, successive-cyclic movement takes place in order to satisfy locality conditions, in particular the ECP. We will follow Sportiche (1998) in assuming that an initial step of cliticization involves XP-movement, which we take to be adjunction to TP. We construe successive clitic-movement as excorporation, in the sense of Roberts (1991 [this volume, Chapter 10]). Roberts (1991 [this volume, Chapter 10]) shows that excorporation is possible from an adjoined position or from a position into which a head has “freely” substituted, but not from an m-selected position into which a head has been substituted to satisfy m-selection. In the domain of XP-movement, adjoined positions are always A′-positions, i.e. non-L-related positions. We will make the same assumption for head movement: no adjoined position is an L-related position. Thus the chain formed by clitic-movement is a non-L-related chain. Moreover, if we take the base position of clitics to be D (cf. Postal (1969)), then clitic chains are uniformly non-L-related chains in the sense defined in the previous section. What (23) states is that once adjoined to AGR, clitics do not further excorporate (although they may move with AGR).
This is not the place to further elaborate the theory of cliticization, but we tentatively take this attraction to AGR to be caused by some intrinsic property of the clitics, qua Ds. Although (23) requires clitics to adjoin to AGR, it does not say that they have to adjoin to the local c-commanding AGR. In the general case, the clitic will not in fact move further than the local AGR because further

movement will be impossible. Should a clitic move to the non-local AGR, we have the following configuration:

(24) [AGR cl_i [AGR]] . . . V [CP [C (t_i)] . . . AGR . . . t_i . . . VP

Here the clitic passes through the lower Comp, since Comp (in Italian) is a non-L-related head (CP is also a barrier by inheritance; clitic movement, like all other movement, must obey subjacency). Economy considerations then predict that short movement must be preferred to the representation in (24), since this representation contains an extra trace in C as compared to the derivation where the clitic moves just to the lower AGR. Thus (optional) clitic-climbing should never be found. One possibility would be that “restructuring” verbs, i.e. those which tolerate clitic climbing, have a “defective” CP complement which, unlike other CPs, does not block either head movement or NP-movement from within (cf. Rizzi (1982) on how restructuring verbs allow certain cases of “long” NP-movement). In that case, clitic-climbing would not necessarily create a “superfluous” trace in the lower C, and economy would not be violated in (24). We will not speculate on what exactly this “restructuring” property may be, but, as Kayne (1989) points out, it cannot be simply CP-deletion, since there is at least one restructuring verb that can take a [+wh] C, namely sapere ‘know’. Whatever the special property of restructuring, where V is volere ‘want’, for example, the configuration in (24) is allowed, but where V is decidere (‘decide’), it is not:

(25) a. Lo voglio fare
        ‘I want to do it’
     b. *Lo decido di fare
        ‘I decide to do it’

In our terms, it is natural to propose that the basic property of “restructuring” verbs is that they are able to head-govern a clitic-trace in the C of their complement, while non-restructuring verbs are not.
This in turn may be related to the intuition that these verbs are somehow “closer” to their complement clause than other types of infinitival-taking verbs, an idea instantiated in clause-union analyses of clitic-climbing. Note that in (24) the clitic does not pass through the lower AGR. The characterization of cliticization in (23) rules out this possibility: the clitic cannot excorporate from AGR. However, nothing prevents a clitic from moving to the lower AGR and staying there; this is what obligatorily happens with non-restructuring verbs and, in Standard Italian, this is also an option with restructuring verbs: (26) a. Voglio farlo b. Decido di farlo

(The position of the infinitive indicates that the infinitive has moved in these examples; cf. Kayne (1991) and section 4.2). In these terms, then, clitic-climbing is the case where (23) is satisfied by movement to a nonlocal AGR. There is a further condition on this movement, however: the low AGR must be able to head-govern the clitic trace in the DP adjoined to TP (we assume that this trace is undeletable at LF since it is the clitic’s base position). So, for (24) to be well-formed, V must head-govern the clitic-trace in C and AGR must head-govern the lower trace. As we have mentioned, V’s property seems to be lexically determined. We will argue that (non-finite) AGR’s head-government property reflects a parametric option that is taken in Italian but not in contemporary French. The clitic-trace in the lower T is antecedent-governed by the trace in C; AGR does not count as an intervening head since it is L-related.7 So we assimilate clitic-climbing, in essential respects, to LHM. Clitic-climbing clearly parallels LHM in one important way: it is blocked by negation. This is illustrated by the following paradigm:

(27) a. Non lo voglio fare
     b. *Lo voglio non fare
     c. Voglio non farlo
        ‘I don’t want to do it’

Taking non to be Neg and Neg to be a non-L-related head, we see why clitic-climbing is blocked in (27b): Neg prevents the trace of the climbed clitic in C from antecedent-governing the clitic trace in T. For this account to work, we must also prevent excorporation from Neg. To do this, we make the following assumption:

(28) Excorporation from non-L-related heads is impossible.

This idea is related to the fact that economy considerations seem to underlie the definition of typical potential antecedent-governor given in (21). Following Roberts (1991 [this volume, Chapter 10]), excorporation cannot take place from m-selected positions. However, adjunction to an m-selecting, L-related head followed by excorporation from it is allowed (except where that head is AGR in the case of clitic movement). On the other hand, adjunction to a non-m-selecting, non-L-related head followed by excorporation is in conflict with the idea that movement never goes further than the nearest appropriate landing site, which is essentially the idea behind deriving Relativized Minimality from economy. Thus negation, like any non-L-related head, will always break non-L-related head chains. As we have already mentioned, our analysis of clitic-climbing is very close to that proposed by Kayne (1989). Kayne proposes that Italian differs from French in that non-finite Infl is able to L-mark VP in Italian, but

not in French. Hence an infinitival VP is not a barrier to clitic-movement in Italian, while it is in French. Kayne in turn relates this capacity of Infl to the null-subject property of Italian, suggesting that an Infl which is able to license null subjects in finite clauses is sufficiently “lexical” to L-mark VP in non-finite clauses. However, if we combine the results of work done subsequent to Kayne (1989), essentially Pollock’s (1989) analysis of short movement of infinitives in French and the related “split-Infl” hypothesis along with the idea proposed inter alia by Koopman & Sportiche (1991) that the base position of the subject is VP-internal, it emerges that an L-marking account of the difference between French and Italian is not tenable. As is well-known, Pollock (1989) argues that infinitives in French move to T (reformulating his clause structure along the lines of Belletti (1994)). This is shown by an example like the following:

(29) Se laver souvent les mains . . .
     To wash-self often the hands . . .

The adverb souvent is taken to be adjoined to VP, and so the infinitive se laver must be outside VP. However, the fact that the infinitive follows the negator pas while finite verbs must precede this element leads Pollock to argue that infinitives undergo just the “short” movement to T in (contemporary) French. If this account is correct, then in Kayne’s terms we must say that T is able to L-mark VP. In that case the relevant transposition of Kayne’s analysis of clitic-climbing in the “split-Infl” system must involve AGR, and indeed this seems the natural way to retain the connection with null subjects. So we must say that French non-finite AGR is unable to L-mark TP while its Italian counterpart has this capacity. The example in (29), however, shows that this assumption is problematic. The clitic reflexive se is an anaphor requiring a local, c-commanding antecedent.
Thus the PRO subject of the infinitive must appear in an L-related position c-commanding se. Moreover, if we wish to maintain the results of Chomsky’s (1981) PRO theorem (cf. Koopman & Sportiche (1991) on how this result may hold in a system where subjects are base-generated in VP), then we must require PRO to move in infinitival clauses in order to escape government by T (and possibly V) in its base position. So it seems clear that PRO moves out of VP here. Moreover, if we adopt the general proposal in Koopman & Sportiche (1991: 237–239), then PRO must have moved to Spec of AGRP (Spec of AGRP is an L-related position in infinitives since AGR is always an L-related position). Further support for the idea that the subject raises from VP in infinitival clauses in French comes from examples with an overt subject, such as the following:

(30) J’ai vu Jean se laver souvent les mains
     I’ve seen John wash-self often the hands

Here the same reasoning applies as in (29). The infinitive is in T, as the position of the adverb shows, and the subject must be in an L-related position so as to bind se. In her study of the complements of perception verbs, Guasti (1992: 202–204) argues that these complements are AGRSPs (equivalent to our AGRP) in French. Hence, the only L-related position from which the subject can bind se in (30) is Spec of AGRP. Thus NP-movement to Spec of AGRP is possible in French infinitives, and AGR must L-mark TP. The movement of the infinitive alone shows that T L-marks AgrP. These facts do not vary between French and Italian (although infinitives move further in Italian; see Belletti (1990), Guasti (1992) and below), and so it seems that we are compelled to say that non-finite AGR can L-mark TP in French. In that case, we must look elsewhere for an account of the differences between French and Italian in terms of clitic-climbing. Kayne in fact leaves open the question of how NP-movement out of nonfinite VPs is possible in French (and his article was written prior to the general adoption of the VP-Internal Subject Hypothesis) but we consider that an account which can encompass the basic insight of his analysis of clitic-climbing and avoid the problem that his L-marking account poses for the analysis of NP-movement is to be preferred over the one he proposes. We adopt the following definition from Cinque (1991):

(31) XP is L-marked if XP is directly selected by an X ≠ [-V].

This means that non-complements (subjects and adjuncts) and complements to N and P are not L-marked, while complements to V and A are L-marked. Functional categories are not specified for [+V], hence they are not [-V], hence they L-mark their complements. This accounts for the possibility of NP-movement to Spec of AGRP in French infinitives (and in infinitives generally). We account for the possibility of clitic-climbing in Italian in terms of AGR’s head-government capacity.
Italian AGR is able to head-govern a clitic-trace, while French AGR is unable to. Just as in Kayne’s analysis, we are able to make the natural connection to the null-subject parameter: Italian AGR is “stronger” than French AGR (see section 6). This account implies that French clitics can move no further than TP, which seems correct; cf. Kayne (1989). Presumably, they adjoin first to VP, as proposed by Sportiche (1998). Since we are treating clitic-climbing as a variety of LHM, we might wonder why LHM of the type discussed in the previous section is absent in Italian. Recall, however, that we posited AGR’s head-government capacity as a necessary condition for LHM; there may be other conditions. For OSp and EP there are two further conditions: the Tobler-Mussafia Law, i.e. the filter in (11), must hold and there must be a class of clitic auxiliaries (a subvariety of what Lema & Rivero call functional auxiliaries) which are unable to act as hosts for the enclisis required by this filter. In Modern Italian the

Tobler-Mussafia Law does not hold, as the general availability of clitic-first orders shows. It is well known that this Law held for Medieval Italian, however, but the general absence of OSp/EP-style “mesocliticization” indicates that all the Italian auxiliaries were able to act as hosts for enclisis (cf. the contrast between (10) and (13) in the previous section). Whatever the precise account of the differences between Italian and the languages discussed in the previous section, the absence of LHM of the kind seen there from Italian does not pose a problem for our analysis of clitic-climbing. Conversely, however, we expect that the languages discussed in the previous section will have clitic-climbing, other things being equal, because AGR is able to head-govern a clitic-trace. This expectation is borne out, as the following examples show:

(32) a. Como lo podré catar? (OSp)
        ‘How will I be able to look at him?’ (L&R (1990b: 15))
     b. receio de que alguém nos pudesse ouvir (EP)
        ‘the fear that someone could hear us’ (1893, Machado de Assis)

(32) shows that the implicational statement “If LHM, then clitic-climbing” is true. This is because both constructions depend on AGR being able to head-govern a clitic trace. Moreover, Kayne (1989: 254) proposes the implication “If clitic-climbing, then null subjects.” It therefore follows that the following must hold: “If LHM, then null subjects.” To the best of my knowledge, this statement holds for the whole of Romance both synchronically and diachronically. In this section we have proposed an analysis of clitic climbing which assimilates it to LHM in the sense that, following Kayne (1989), it is a case of non-m-selected head movement. Hence clitic chains are (uniformly) non-L-related chains. Following (21), clitic movement can “skip” an intervening AGR if a higher AGR is accessible.
There are two preconditions for this: (i) the higher V must be able to head-govern a trace in the C of its complement CP; (ii) the lower AGR must be able to head-govern a clitic trace. The first of these conditions is satisfied by the so-called “restructuring verbs”; the second is a parametric property, related to the null-subject parameter, which distinguishes Modern Italian from Modern French.

4.2 “Long” Movement of Infinitives

In this section we will discuss a further instance of non-m-selected head movement, and relate it to the parameter we are proposing concerning AGR’s capacity to head-govern a trace of non-m-selected head movement. The construction, or really family of construction types, that we are concerned with are those which involve “long” movement of infinitives. By long movement of infinitives we mean those cases where an infinitive is moved further than in Pollock’s “short” movement, i.e. further than Belletti’s T. Putting things this way, there are at least two cases of long movement of infinitives to distinguish: movement beyond AGR and movement to AGR. In Italian, infinitives seem to move beyond AGR, as shown by the fact that (with the exception of negative imperatives; cf. Zanuttini (1991), Kayne (1992)) enclisis is obligatory with infinitives:

(33) Gianni ha deciso di non farlo/*lo fare più.
     ‘G. has decided to not do-it/*it-do any more’. (Belletti (1990))

Kayne (1991) proposes that enclisis of this kind is in general proclisis to an empty position. Assuming this is true, and still assuming that Italian clitics always move to AGR, far(e) here must have moved to some position above AGR. Note that the fact the infinitive + clitic complex precedes the negative adverb più (which Belletti shows generally occupies the same position as French pas) indicates that both elements are outside TP. We conclude then that Italian infinitives raise to a position above AGR, although we have nothing to say about the precise nature of that position here beyond the fact that the movement does not appear to be the result of m-selection (pace Belletti (1990); Kayne (1991) proposes that the infinitive adjoins to I’). Nor do we have anything to say concerning the trigger for infinitive movement beyond T. We treat movement of the infinitive beyond AGR as involving essentially the same properties as clitic-climbing and LHM: movement of the infinitive, being non-m-selected, forms a non-L-related chain, thus the infinitive is able to skip AGR since this is an L-related head. The trace of this movement is thus antecedent-governed by the moved infinitive and head-governed by AGR. So the well-formedness of this construction depends on AGR having the capacity to head-govern.
More precisely, we propose the following substructure for the relevant parts of sentences like (33):

(34) Inf_i [AGR cl_j AGR] . . . [TP t_j [TP [T t_i] . . .

T m-selects the infinitive. The chain linking the trace in T to the trace in the V-position is thus an L-related chain, while the subsequent movement to the position above AGR forms a non-L-related chain. The traces in TP are head-governed by AGR, and neither of them can be deleted at LF. The clitic-trace cannot be deleted because it is part of a uniformly non-L-related chain (cf. the discussion of clitic-movement in the previous section, and the discussion of Chomsky & Lasnik’s notion of uniformity in section 3). The chain formed by moving the infinitive successively from V to its SS position is not a uniform chain; however here T carries substantive information about finiteness, and so cannot be deleted in LF (cf. Chomsky (1991)). A number of Romance languages manifest the order clitic-infinitive-adverb. According to Kayne (1991), this is the case in Sardinian, while Motapanyane (1991) shows this to be the case in Rumanian. Moreover this was true of Middle French, as the following examples (from de Kok (1985: 335), cited in Alberton (1990)) illustrate:

(35) a. car elle (. . .) commença a ne les chercher pas
        for she began to NEG them look-for not
        ‘for she began not looking for them’ (Hept., 65)
     b. Le pauvre gentilz homme (. . .) les pria de ne les abandoner point
        The poor gentleman them begged to NEG them abandon not
        ‘The poor man begged them not to abandon them’ (Hept., 3)

Assuming that pas and point occupy their modern positions (cf. Pearce (1990) for a detailed discussion of the interaction of infinitives and negation in Old and Middle French), the infinitives in these cases are raised at least as far as AGR. These examples also illustrate the order clitic-infinitive, which is the usual one for most of Old and all of Middle French, although there are some cases of enclisis to infinitives in Old French (cf. de Kok (1985: 285)). We take the cases in (35) to be movement to AGR. This kind of case is thus not an instance of LHM, in that the infinitive obeys the classical HMC by moving from V to T to AGR. Nevertheless AGR does not m-select the movement of the infinitive (as shown by the total lack of agreement morphology on infinitives in Old French, Middle French and Rumanian; Sardinian, on the other hand, does have an inflected infinitive and so the following remarks do not necessarily apply to that language) and thus head-government of the trace of movement to AGR depends on AGR’s general ability to head-govern.
In this sense, the two types of “long” movement of infinitives both depend on this property of AGR, even though only movement beyond AGR violates the classical HMC. Once again, we see the connection to the null-subject parameter. Kayne (1991) points out that if a language has the order INF-ADV, it allows null subjects. In other words, if a language allows either kind of “long” infinitive-movement, either to AGR or past AGR, then it has null subjects. So AGR’s head-government capacity is related to its capacity to license null subjects. Let us refer to this capacity as that of being a “generalized licenser” for empty categories (in a manner rather similar to that proposed by van Kemenade & Hulk (1995)). More simply, we will refer to AGR which is a generalized licenser as [+L]. So we see that Italian AGR is [+L], while (Modern) French AGR is [-L].
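The implicational statements of sections 4.1 and 4.2 can be checked against a feature profile of a language. The sketch below is an informal illustration only: the names `Profile` and `violated_implications` are hypothetical, the Italian and Modern French values restate the characterization given in the text, and the third profile is an invented counterexample included purely to show what a violation would look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Boolean summary of the constructions a language allows (an assumption
    of this sketch; the text does not reduce languages to such records)."""
    name: str
    lhm: bool
    clitic_climbing: bool
    long_infinitive: bool
    null_subjects: bool


def violated_implications(p):
    """Return the implicational statements from the text that p falsifies."""
    checks = [
        ("LHM -> clitic-climbing", p.lhm, p.clitic_climbing),
        ("clitic-climbing -> null subjects", p.clitic_climbing, p.null_subjects),
        ("long infinitive movement -> null subjects",
         p.long_infinitive, p.null_subjects),
    ]
    return [label for label, antecedent, consequent in checks
            if antecedent and not consequent]


ITALIAN = Profile("Italian", lhm=False, clitic_climbing=True,
                  long_infinitive=True, null_subjects=True)
MODERN_FRENCH = Profile("Modern French", lhm=False, clitic_climbing=False,
                        long_infinitive=False, null_subjects=False)
# Hypothetical profile, not an attested language.
HYPOTHETICAL = Profile("hypothetical", lhm=True, clitic_climbing=True,
                       long_infinitive=False, null_subjects=False)

for profile in (ITALIAN, MODERN_FRENCH, HYPOTHETICAL):
    print(profile.name, violated_implications(profile))
```

Both attested profiles are consistent with the implications; only the invented one falsifies “If clitic-climbing, then null subjects”, and hence, by transitivity, “If LHM, then null subjects”.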

4.3 Aux-to-Comp

Rizzi (1982: chapter 3) discusses an Italian construction where a participial or infinitival (or, more marginally, a subjunctive) auxiliary inverts around a subject and assigns nominative Case to that subject, as in (36):

(36) Avendo Gianni fatto questo, . . .
     Having John done this, . . .

As with various types of Germanic and French inversion which yield the order Aux-subject-Prt, Rizzi argues that this is an instance of movement to C (Rizzi adduces the complementary distribution of conditional inversion with “if,” exactly parallel to Germanic and French, as evidence). Now, a striking property of (contemporary) Italian is that inversion around an overt nominal subject is in general impossible (these examples are more acceptable at a very high stylistic level, but many native speakers reject them):

(37) a. *Ha Gianni preso il libro?
        ‘Has G. taken the book?’
     b. *Che film ha Gianni visto?
        ‘Which film has G. seen?’

In this respect, Italian patterns with French and against Germanic, cf.:

(38) a. Has John spoken?
     b. *A Jean parlé?

Rizzi & Roberts (1989 [this volume, Chapter 9]), adapting ideas from Koopman & Sportiche (1991), rule out (38b) by assuming that French chooses a different parametric option for nominative assignment to that chosen by English and the other Germanic languages, one which gives the result that nominative cannot be assigned to Spec of AGRP in inversion contexts. In French, nominative Case cannot be assigned under government, hence there is no way for Jean to receive Case in (38b) as inversion destroys the context for nominative assignment (see Rizzi & Roberts (1989 [this volume, Chapter 9]) for details). This account carries over naturally to Italian, accounting for the ungrammaticality of (37). However, Aux-to-Comp as in (36) is now problematic: why is nominative assignment under government possible here but not in (37)?
The answer to this question requires an elaboration of the account of nominative assignment in Italian. First, it is well-known that Italian allows so-called “free inversion”: (39) Telefonò Gianni Phoned Gianni

There are good reasons to think that the inflected verb is not in C here: the fact that the subject must follow the participle in a compound tense rather than appear between the auxiliary and the participle; the fact that the construction is not sensitive to the root/embedded distinction; the fact that the verb is not sensitive to the content of the complementizer, etc. (cf. Kayne (1972)). We thus adopt a variant of the proposals in Giorgi & Longobardi (1991) and Rizzi (1990) and assume that the subject is in its base position here, and it receives Case from T under government. In (39), the inflected verb has moved on to AGR, but the trace in T is still able to Case-mark the subject thanks to the Government Transparency Corollary (GTC, cf. Baker (1988); note that there is no Agreement Transparency Corollary to save (37) and (38b)):

(40) [AGR′ [AGR V+T+AGR] [TP [T t] [VP t NP]]]

T assigns nominative to the postverbal subject in this configuration. In (37) on the other hand, T does not govern the position of the subject but the combination AGR+T(+V) does. AGR is also a nominative assigner in Italian, and so where T and AGR combine we have a head with the following structure:

(41) [AGR[+Nom] [T[+Nom]] [AGR[+Nom]]]

Following standard assumptions about headedness in derived words (i.e. complex heads) and the percolation of features to the whole word (cf. Lieber (1980), Marantz (1984)), the nominative feature associated with the dominating AGR here is AGR’s and is hence not able to be assigned under government. T’s feature is blocked by the presence of AGR, and T cannot assign nominative from its base position to the subject in Spec of AGRP. This account of nominative assignment in free inversion as opposed to “Germanic inversion” constructions in Italian leads to the following conclusion:

(42) T can assign nominative under government from positions where it has not combined with AGR [+Nom].

(42) provides the answer to the problem posed above. In Aux-to-Comp constructions, T does not combine with AGR [+Nom]. One reason for this could be that T does not combine with AGR at all, but moves to C directly, skipping AGR (this would be possible, since AGR does not m-select T here, as it does in finite clauses). This would give the following structure:

(43) [C V+T_i] AGR [T t_i′] [VP . . . t_i . . .]

Thus one natural way to account for the Case-assignment properties of Aux-to-Comp constructions is in terms of an LHM analysis. T can skip AGR here thanks on the one hand to the fact that AGR, being an L-related head, does not intervene in the non-L-related chain formed by T-to-C movement, and so T can antecedent-govern its own trace from C. On the other hand, we must assume that AGR is able to head-govern the trace in T. Since AGR does not m-select T, unlike in finite clauses, there is then no requirement that T move to AGR. Moreover, to the extent that we might wish to assume that AGR always carries a [+Nom] feature, independently of finiteness, then Case theory prevents T from moving through AGR. However, there are good reasons not to treat Aux-to-Comp as LHM, but rather as non-m-selected movement through AGR. This is because clitics are moved with the auxiliary to C in this construction:

(44) a. *Avendo Gianni lo fatto, . . .
     b. Avendolo Gianni fatto . . .
        ‘John having done it, . . .’

Given our account of Italian clitic-placement based on (23), we are obliged to say that (44b) involves AGR-to-C (recall that we have been assuming that clitics only move beyond AGR where AGR itself moves). So, rather than an LHM account, we propose that Aux-to-C involves adjunction of the nonfinite auxiliary to a non-finite AGR which neither m-selects T nor contains a [+Nom] feature. Moving through AGR in this way, the auxiliary “picks up” the clitic (this account means that Kayne’s (1991) analysis of enclisis must be slightly weakened since this instance of enclisis does not involve adjunction of the clitic to an empty functional head at SS, although such an adjunction does take place during the derivation). Since m-selection is not involved, non-finite AGR must be intrinsically a head-governor in order to allow the structure. Once again, the head chain is non-uniform, since T m-selects V.
The trace in T cannot be deleted, since it carries information about finiteness. On the other hand, the trace adjoined to AGR can—and therefore must—be deleted. This is a good result since it is unclear how this trace would be head-governed. So we see that the same property is manifested in Aux-to-C as in LHM constructions more generally, as well as in infinitive-to-AGR movement. This is true despite the fact that the derivation of Aux-to-C does not violate the classical HMC.

5. The History of French

We have now seen that four constructions are related to AGR’s ability to head-govern the trace of non-m-selected, and possibly therefore, long, head movement: LHM, clitic-climbing, long infinitive-movement and Aux-to-Comp. Moreover, following Kayne (1989, 1991), non-finite AGR’s ability to head-govern in this way appears to be related to its ability to license a null subject in finite clauses. In this section we will see further evidence for relating these constructions from the history of French. Essentially, French lost clitic-climbing, long infinitive-movement, Aux-to-Comp and null subjects together (there are no attested cases of LHM in the Lema & Rivero sense at any stage in the history of French, for reasons that are probably connected to the nature of the OF auxiliary system; cf. the discussion of Italian in section 4.1. We therefore leave this construction aside in what follows). Modern French lacks clitic-climbing, long infinitive-movement, Aux-to-Comp and null subjects:

(45) a. *Je le peux faire
        I it can do
     b. *Ayant Jean fait cela, . . .
        Having John done that . . .
     c. *N’aimer pas ses parents, . . .
        To love not one’s parents . . .
        *Je veux faire-le
        ‘I want to do it’
     d. *Avons fait cela
        ‘(We) have done that’

Clearly then, Modern French AGR is [-L]. In that case, (45a–c) are ruled out by the head-government requirement of the ECP, as the trace of clitic or verb-movement is not head-governed (but it is antecedent-governed on our assumptions). On the other hand, in earlier stages of French the constructions of (45) are attested. In (46) we illustrate clitic-climbing (a), Aux-to-Comp (b), and infinitive-movement (c):

(46) a. Nous lui devons rendre gloire (1536, Calvin)
        We to-him must give glory
     b. Ayant ce bon homme fait tout son possible . . . (Brunot (1905: 670))
        Having this good man done everything possible . . .
     c. Car vous avez le choix de combatre ou de ne combatre pas (c15, Jouvencel)
        For you have the choice to fight or to neg fight not

These data indicate that in Middle and Renaissance French AGR was [+L]. So French underwent a change in the parametric value assigned to AGR, roughly in the 17th century. The situation concerning the null-subject parameter at this stage of the history of French is quite complex. It seems best to characterize the system of this period as a “defective” null-subject system. Fully productive null subjects (whose distribution is limited in various ways at all periods of French; see below) are largely lost in the 16th century (cf. Roberts (1993: chapter 2)). In the 17th century, null subjects are still found, but subject to various restrictions. They were (a) when referential, restricted to certain persons, in particular 2pl; (b) most frequently non-referential; (c) sensitive to the properties of C, in that both referential and non-referential null subjects were favored in questions and relatives. These properties are illustrated in (47) (examples (a) and (b) are from Maupas’ (1607) contemporary grammar; (c) contains an expletive pro in Spec of AGRP while the subject has arguably remained in VP):

(47) a. Rarement advient que ces pronoms nominatifs soient obmis.
        ‘Rarely (it) happens that these nominative pronouns are omitted’
     b. J’ay receu les lettres que m’avez envoyees.
        ‘I’ve received the letters that (you) have sent me’
     c. Viendra jamais le jour qui doit finir ma peine? (late c16, Desporal)
        ‘Will-come never the day which must end my pain?’

Finally, it should be noted that the morphological agreement system was not at this time especially “rich”; the 17th-century paradigms were largely the same as the contemporary ones, which are generally regarded as insufficient for the identification of the content of a null subject. Nevertheless, AGR was presumably able at this period to formally license pro even if identification of the content of pro depended on other, rather hard to discern, factors. Strikingly, 17th-century French is not the only example of a defective null-subject system of this kind. In recent work, Poletto (1995) has shown that Renaissance Veneto null subjects were restricted in very similar ways (to 1sg and 1pl, otherwise expletive or in the presence of a [+wh] C), and at this period the Veneto agreement paradigms were also rather “poor.” The data in (47) confirm the idea that French AGR changed from [+L] to [−L] around the 17th century. This is sufficient to provide a diachronic argument for the correlations that we have proposed and the account that we have given of them, and thus to support the analyses in section 4 and the system of head movement proposed in section 3. The natural question to pose at this point is: what caused the value of [+L] to change in French? In other work (cf. Clark & Roberts (1993 [this volume, Chapter 2]), Roberts (1993)), I have argued for a major parametric change in the early 16th century, roughly between Middle and Renaissance French, which eliminated both V2 and interrogatives like (38b) (A Jean parlé ‘Has John spoken?’ cf. the discussion in 4.3). This change involved loss of the possibility of nominative-assignment under government, directly eliminating inversion around a nominal subject and so indirectly eliminating V2.

Now, no attested period of French shows the kind of freely available null subjects that we find in contemporary Italian or Spanish. In Old French, null subjects could only occur in V2 contexts (cf. inter alios Adams (1987), Hirschbühler (1990), Roberts (1993), Vance (1989)). Let us adopt the approach to the licensing of null subjects put forward in Rizzi (1986), which we can formulate as follows:

(48) If X licenses pro in position P in configuration C, X can Case-mark P in C.

This means that null subjects in V2 contexts also depend on the possibility of nominative-assignment under government. Thus the 16th-century loss of this possibility eliminated null subjects from the grammar of French and, in so doing, rendered AGR [-L], i.e. unable to license pro. A knock-on effect of this change was the loss of AGR’s ability to head-govern head traces, with the consequences that we have just seen. On this view, then, AGR really became [-L] in the 16th century; the defective null-subject system that survives into the 17th century is a transitional residue and there is a time lag of roughly a century before the knock-on effect of the change in the nominative parameter is felt. This account relates a whole series of important syntactic changes in French, and provides us with an example of how parametric changes may cascade through a system over a period of time. Although these remarks answer the question posed above, it may be possible to go a step further; here we move to rather more speculative ground. It is clear from what we have said that the loss of V2 is indirectly connected to the change in the value of [+L]. There are two significant facts about V2 systems generally:

(49) a. C is able to head-govern a subject trace in V2 languages.
     b. V2 languages with referential null subjects (e.g. Old French and Veneto) only allow such null subjects when AGR is in C.
The statement in (49a) is illustrated by the fact that SVO clauses are possible in V2 languages. We follow the tradition beginning with den Besten (1983) (and recently argued for by Schwartz & Vikner (1996)) in treating all root V2 clauses in V2 languages, including SVO clauses, as involving V-movement to C. Hence a German sentence like (50a) has the representation in (50b):

(50) a. Johann liebt Maria
        ‘John loves Mary’
     b. [CP Johanni [C′[C liebt ] [AGRP ti. . . Maria ]]]

Here C head-governs ti. This should be contrasted with the situation in English, where the ungrammaticality of (51) shows that C, when it contains an auxiliary, is unable to head-govern a subject trace (this argument is due to Rizzi (1990); see note 8):

(51) *Whoi did ti leave?

Thus C is a head-governor in V2 languages but not elsewhere. In terms of what we have said about AGR above, this implies that a V2 C is [+L] (note also that a number of authors, notably Tomaselli (1990), have suggested that C in V2 languages is able to license an expletive null subject). Putting this together with the fact that V2 null-subject languages like OF and Medieval Veneto characteristically have poorer agreement inflection than non-V2 null-subject languages like Italian, we are led to the idea that in such a system C is responsible for formally licensing null subjects, while AGR identifies their content. Since according to Rizzi (1986), the same head must perform both of these functions, it is only where AGR combines with C, i.e. in V2 clauses, that a referential null subject is possible. In such languages, then, both AGR and C are [+L]. The two cases of defective null-subject systems that we have seen in 17th century French and Renaissance Veneto are both found in languages where a formerly productive V2 system has been recently lost. We can understand this if we consider that the loss of V2 means a loss of productive AGR-to-C movement. Since it is not morphologically rich at this point, AGR alone is unable to license productive null subjects, but a defective system can survive for a time, after which AGR too becomes [-L]. This account retains the idea that defective null-subject systems are transitional between fuller systems and essentially non-null-subject systems (although on this latter point the history of Veneto since the 16th century is interesting and complex; cf. Poletto (1995), Vanelli (1987)), while attributing the defectiveness less directly to the weakness of the morphology and more to properties of the syntax.
This seems to be the correct move given that French has changed with regard to the possibilities of null subjects since the 17th century, but its morphological agreement system has changed only very slightly. It also makes the time lag between the loss of nominative-assignment under government and the loss of the constructions dependent on non-finite AGR being [+L] easier to understand. This section has provided empirical support for the system of head movement developed in section 3 and the analyses of various Romance constructions in section 4. The constructions discussed in Section 4 all depend on AGR being [+L], a property which also underlies AGR’s ability to license null subjects. The diachronic evidence from French shows that all these constructions are lost as null subjects are lost, strongly supporting the contention that a single property underlies all these constructions in the way that we have suggested. This evidence also supports our overall system for head movement, and our contention that there is a head-government requirement for head traces which is independent of antecedent-government.


6. Conclusion: Head-Government in PF

In the foregoing we have argued that there are two types of head movement which obey Relativized Minimality, in differing ways as a function of the differing nature of the chains that are created. In section 2, we gave three conceptual reasons why such a move is desirable: (a) given a conjunctive ECP, the lack of selective violations of antecedent or head-government just in the case of head movement is suspect; (b) Rizzi’s definition of typical potential antecedent-governor mixes structural and functional notions; (c) the status of head traces regarding the head-government requirement is in general unclear. We dealt with these objections by distinguishing between head movement which is m-selected and forms uniformly L-related chains, and head movement which is not m-selected and forms non-L-related chains. We then extended the RM system in full to the head level, in terms of the definitions of typical potential antecedent-governor given in (21). In their application to head movement, these definitions may derive in large part from economy constraints on chain formation. Head chains formed as a result of m-selection properties of the host head are strictly local as a result of the locality constraints on selection generally. Head chains formed for other reasons (and it should be underlined that it is unclear why infinitives undergo “long” movement in many Romance languages) are not subject to the locality requirement induced by m-selection, but they are subject to the basic constraint that they undergo the minimum movement possible; this is seen most clearly in the LHM cases discussed by Lema & Rivero which are triggered by the filter in (11). As we noted in section 2, this approach overcomes the conceptual objection that Rizzi’s definition of typical potential antecedent-governor mixes structural and functional notions.
We have also clarified the status of head traces with respect to head government: head traces, like all other traces, are subject to a head government requirement which is in principle independent of an antecedent-government requirement. And in fact we can construct cases where one requirement is violated while the other is satisfied. (52) is a case where a trace is head- but not antecedent-governed: (52) *Lo voglio che faccia. it I-want that I-do(subjunctive) This is an attempt to perform clitic-climbing from a tensed (subjunctive) clause. It fails because the non-L-related chain formed by clitic movement is unable to “skip” the non-L-related head C (either by excorporation or by substitution—cf. section 4.1). Nevertheless, AGR is able to head-govern the clitic trace in T (which is probably adjoined to T as in cases of long infinitive-movement discussed in 4.3).

Conversely, (53) is an example where a head-trace is antecedent-governed but not head-governed:

(53) *Have you would t said it to John?

English AGR is clearly [−L], and so it cannot head-govern the trace of long-moved have here. But have, occupying C, is able to antecedent-govern its trace, since AGR, occupied by would, does not count as an intervener (even if would raises from T to AGR, the same is true since T is an L-related head; see 4.3). What is the nature of the head-government requirement? If the antecedent-government requirement of the ECP largely follows from economy considerations, we might wonder whether the head-government requirement follows from anything. One possibility is that the head-government requirement is a PF condition. In fact, it has been argued in Aoun et al. (1987) that the head-government requirement does not hold in LF. If this is true, then we expect that LF head movement in English and French, if non-m-selected, can form non-local non-L-related chains. For English, this is the case with LF V-to-C movement of the type proposed in Pollock (1989) and Chomsky (1991), and this is why negation selectively blocks V-movement in English. Have/be raising is selected and hence not sensitive to the presence of negation (see note 9). On the other hand, LF V-raising is not m-selected, but really a kind of Quantifier Raising of the type discussed in May (1977, 1985), so negation blocks it, making do-insertion necessary in auxiliary-less negative clauses. The availability of trace-deletion makes no difference here, since there must be a trace in the V position. For French, Kayne (1989) provides evidence that there are LF “restructuring” effects, which we interpret as being instances of LF LHM uninhibited by the fact that French AGR is [-L]. The evidence comes from easy-to-please constructions, which were originally related to restructuring by Rizzi (1982).
The initial observation is that easy-to-please constructions are not cases of unbounded dependency in Romance languages, as they are in English (cf. Chomsky (1977); Jaeggli (1982) showed that Spanish patterns with the other Romance languages in this respect). The following English/Romance contrast illustrates the difference: (54) a. Bill is easy to convince Mary that we should talk to. b. *Questo lavoro è facile da promettere di finire per domani. ‘This job is easy to promise to finish for tomorrow.’ However, Rizzi (1982) notices that where the intermediate verb is a restructuring verb, long-distance versions of this construction rather similar to what is found in English are possible: (55) Questa canzone è facile da cominciare a cantare. ‘This song is easy to start to sing.’

Kayne (1989) observes that the same is true for French, and suggests that the availability of LF restructuring is behind this:

(56) a. ?Ce livre serait impossible à commencer à lire aujourd’hui.
        ‘This book would be impossible to start to read today.’
     b. *Ce genre de livre est facile à promettre de lire.
        ‘This kind of book is easy to promise to read.’

Combining our system with the idea that the head-government requirement does not hold in LF yields an account of (56) (and of the absence of (55)) in French. As in the case of overt movement of infinitivals discussed in 4.2, we assume that the trace in T cannot be deleted since it contains information as to finiteness. So we see that there is evidence that the head-government requirement is essentially a PF requirement. The components of the ECP seem to belong to different “interface” components, as Aoun et al. (1987) originally argued. The main theoretical result of the paper is that this is true for X°-traces just as it is true for XP-traces. The main empirical result of this study, which as we have seen supports the overall system of head movement, is a typology of the Romance languages according to their parametric choices among C and AGR as [+L], as follows:

(57)
        OF, Med. N. It.   Mod. It., Sp., Prt   Mod. Fr.   Rhaeto-Romansch
AGR     +                 +                    −          −
C       +                 −                    −          +

If AGR is [+L], null subjects and “LHM” constructions are possible ceteris paribus. If C is [+L], the language is V2. If both C and AGR are [+L], null subjects will depend on AGR-to-C movement (see note 10). We have not discussed the fourth possibility, as instantiated in Rhaeto-Romansch, but to the extent that this language is reported to be similar to the Germanic languages with regard to its V2 and null-subject possibilities (see Haiman (1988)), this would be its characterization in terms of our system. This characterization combines a wide range of empirical phenomena. In particular, it is to my knowledge a new claim of this paper to relate LHM, clitic-climbing, long-movement of infinitives and Aux-to-Comp to each other. We have shown that the facts of the history of French justify these connections, and that the theory of head movement explains them.
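The three conditionals that summarize the typology can be restated as a small decision procedure. The Python sketch below is purely illustrative and is not part of the analysis: the function name `predict`, the dictionary keys and the string labels are assumptions introduced here, and the values given for Rhaeto-Romansch are inferred from its description as the remaining "fourth possibility" rather than taken from a stated example.

```python
from typing import Dict, Tuple, Union


def predict(agr_l: bool, c_l: bool) -> Dict[str, Union[bool, str]]:
    """Restate the three conditionals of the typology for one grammar.

    agr_l is True when AGR is [+L]; c_l is True when C is [+L].
    The keys and string labels are hypothetical names chosen for this sketch.
    """
    if not agr_l:
        # AGR [-L]: AGR cannot license pro.
        null_subjects = "not licensed by AGR"
    elif c_l:
        # Both heads [+L]: null subjects require AGR-to-C movement.
        null_subjects = "dependent on AGR-to-C movement"
    else:
        # AGR [+L] alone: null subjects possible, ceteris paribus.
        null_subjects = "possible"
    return {
        "verb_second": c_l,               # C [+L] means the language is V2
        "lhm_type_constructions": agr_l,  # possible ceteris paribus if AGR [+L]
        "null_subjects": null_subjects,
    }


# (AGR [+L]?, C [+L]?) for one language per type. The Rhaeto-Romansch
# values are an inference from the text, not a quoted claim.
EXAMPLES: Dict[str, Tuple[bool, bool]] = {
    "Old French": (True, True),
    "Modern Italian": (True, False),
    "Modern French": (False, False),
    "Rhaeto-Romansch": (False, True),
}

for name, (agr_l, c_l) in EXAMPLES.items():
    print(name, predict(agr_l, c_l))
```

The sketch deliberately keeps "ceteris paribus" as a comment only: whether a given [+L] AGR language actually shows each construction depends on independent factors, such as the auxiliary system in the case of LHM in French.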

Notes

* Earlier versions of this paper have been presented at the 13th GLOW Colloquium, Leiden, the University of Maryland Workshop on Verb Movement and the Workshop on Romance Syntax at University College, London. I’d like to thank the audiences at those meetings for their comments. At Maryland, Sabine Iatridou was the Commentator on this paper; the modifications that the paper has subsequently undergone are in large part a response to her comments. I am especially grateful to her. I am also especially indebted to Adriana Belletti, Bob Borsley, Guglielmo Cinque, Maria-Teresa Guasti, Maria-Rita Manzini, Luigi Rizzi, Maria-Luisa Rivero for comments and discussion. None of these people are responsible for my mistakes, though.

1. One antecedent of our proposal, at least in the domain of verb-movement, is Koopman (1984), who proposed that there were both “A”-type verb-movement and “A’”-type verb-movement. Cf. also the proposal in Pollock (1989) that certain cases of verb-movement are triggered by a variable-binding requirement; this amounts to saying that these are cases of A’-movement of V.

2. Or the Specifier of the host; on Rizzi’s (1996) account of “residual” V2 in English and French questions, it is essentially Spec of CP which triggers I-to-C in virtue of an abstract morphological property. We can regard this as “induced” m-selection, and so the locality of I-to-C movement in cases of residual V2 is predicted. This account is incompatible with the account of the impossibility of I-to-C movement in indirect questions proposed by Rizzi & Roberts (1996 [this volume, Chapter 9]), but compatible with the one in Rizzi (1996): selection for a [+wh] C in indirect questions satisfies the WH-criterion and so I-to-C movement is blocked by economy.

3. It may be that the antecedent-government component of the ECP derives from conditions on chain-formation, as suggested both by Rizzi (1990b) and Chomsky & Lasnik (1993) in different ways. However, in what follows we provide clear evidence that head-traces must be head-governed. If the ECP is reduced to a head-government requirement, then we have empirical evidence that it applies to head traces. In section 6, we will suggest that the head-government requirement is a PF condition.

4. Rumanian may be a slightly different case from the others, since here LHM is triggered by illocutionary force, being found in questions and exclamations (Rivero (1993)). Following the reasoning in note 2, such movements are the result of an abstract morphological trigger. In that case, Rumanian LHM would be nonlocal head movement triggered by morphology. On the view being developed here, this should be impossible. We suspect that Rumanian LHM is in fact a type of “inverted conjugation” found in perfect tenses in a range of Romance languages: Old French (Dupuis (1989)), Sardinian (Jones (1993)), Old Spanish (Lema & Rivero (1991a)) and Southern Italian dialects. It is unclear what this phenomenon is, and how it is connected to VP Fronting and to Scandinavian-style Stylistic Fronting (Rögnvaldsson & Thráinsson (1990)), but it is clear that it is not the same as LHM. See Lema & Rivero (1991a) for discussion.

5. On what (11) may ultimately derive from, see Roberts (1992). (11) certainly is not a phonological property (pace Cardinaletti & Roberts (2002 [this volume, Chapter 12])), since dislocated elements, which are presumably outside CP, do not “count.” Thus we find enclisis with dislocated elements. This observation is crucial to an understanding of word order in Modern European Portuguese; cf. Benincà (1991), Salvi (1991).

6. Zanuttini (1991) argues that in languages where clausal negation occupies a higher position than the inflected verb, e.g. Italian, Spanish and Portuguese, it selects TP in finite clauses. This relation is c-selection of the standard kind, i.e. a relation between a head and a maximal projection (p. 54). Thus Zanuttini’s proposal does not affect the point being made in the text; in fact, she explicitly points out (p. 52) that Romance negative markers are not affixal. Hence they do not m-select anything, so we have no reason to treat Neg as L-related.


The account of OSp and EP LHM in the text implies that all cases of the Tobler-Mussafia Law in Medieval Romance involve LHM, i.e. V-to-C movement. We thus predict that V-to-C movement, i.e. enclisis, is blocked by negation. This is certainly true in contemporary European Portuguese, where a version of the Tobler-Mussafia Law is still operative (Benincà (1991:24)):

(i) a. Não os comprendo.
    b. *Não comprendo-os.
       ‘(I) not understand them’

7. Pace Belletti (1990), we assume that AGR does not m-select T in infinitives (in languages without inflected infinitives). One might then question the idea that AGR is L-related here. Nevertheless, we assume that AGR is intrinsically L-related independently of m-selection (cf. the theory of A-positions proposed by Rizzi (1991)). Recall that when we introduced the notion of L-relatedness we pointed out that we cannot maintain the very strong claim that “X is L-related iff X has an m-selection feature.” Instead, we retain the weaker implication that “if X has an m-selection feature, X is L-related.” We can tentatively add that AGR is always L-related; note that this could derive the existence of affix-hopping in Modern English.

8. In Chomsky (1991), the impossibility of (51) is attributed to a violation of economy, do being inserted in a context where it is not required. However economy cannot tell the whole story here, since a parallel argument, due to Friedemann (1991) and summarized in Rizzi (1990b), can be constructed for French on the basis of the ungrammaticality of (i):

(i) *Quei sent ti mauvais?
    ‘What smells bad?’

Interrogative que must cliticise to a verb in C. However, if the verb moves to C, the trace in subject position cannot be head-governed. If the verb does not move to C, que’s cliticization requirement is not satisfied. It is not obvious that economy considerations can explain the ungrammaticality of this example.

9. If we say this, we must reconsider the nature of the locality condition on selection. One possibility is that the only kinds of head that can be m-selected are L-related heads. So, in the spirit of Relativized Minimality, Neg does not block m-selection of T by AGR even when it is base-generated in a position which intervenes between these heads.

10. The approach to clitic-climbing presented in 4.2 implies that if C is L-related, clitic-movement can skip it. Thus, independently of any “restructuring” property of the matrix verb, clitics may be able to climb to the higher AGR. The prediction is that V2 languages with clitic systems allow clitic-climbing with verbs like “decide,” “promise,” etc. (cf. the ungrammaticality of (25b) in Modern Italian). In fact, this is what we find in Old French, in complete conformity with our analysis and the resulting typology; cf. Pearce (1990).

References

Adams, M. 1987. Old French, Null Subjects and Verb-Second Phenomena. PhD dissertation, UCLA.
Alberton, S. 1990. Enclise du pronom objet en français et en italien antique ou la loi Tobler-Mussafia. Mémoire de Licence, University of Geneva.
Aoun, J., N. Hornstein, D. Lightfoot & A. Weinberg. 1987. Two Types of Locality. Linguistic Inquiry 18:537–577.
Baker, M. 1988. Incorporation: A Theory of Grammatical-Function Changing. Chicago: University of Chicago Press.

Belletti, Adriana. 1990. Generalized Verb Movement: Aspects of Verb Syntax. Turin: Rosenberg and Sellier.
Belletti, A. 1994. Verb positions: evidence from Italian. In D. Lightfoot & N. Hornstein (eds) Verb Movement. Cambridge: Cambridge University Press, pp. 19–40.
Benincà, P. 1989. L’ordine delle parole nelle lingue romanze medievali. XIX Congreso Internacional de Lingüística e Filoloxia Romanicas. Santiago de Compostela.
Benincà, P. 1995. TOP and SpecCP in Medieval and Modern Romance. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. Cambridge: Cambridge University Press, pp. 325–344.
Den Besten, H. 1983. On the interaction of root transformations and lexical deletive rules. In W. Abraham (ed) On the Formal Syntax of the Westgermania. Amsterdam: John Benjamins.
Borsley, R., M.-L. Rivero & J. Stephens. 1996. Long head movement in Breton. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, pp. 53–74.
Cardinaletti, Anna, and Ian Roberts. 2002. Clause structure and X-second. In G. Cinque (ed) Functional Structure in DP and IP: The Cartography of Syntactic Structure Volume One. New York/Oxford: Oxford University Press, pp. 123–166 [this volume, Chapter 12].
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge MA: MIT Press.
Chomsky, N. 1977. On Wh-Movement. In P. Culicover, T. Wasow & A. Akmajian (eds) Formal Syntax. New York: Academic Press.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1986. Barriers. Cambridge, MA: MIT Press.
Chomsky, N. 1991. Some notes on economy of derivation and representation. In R. Freidin (ed) Principles and Parameters in Comparative Grammar. Cambridge, MA: MIT Press, pp. 417–454.
Chomsky, N. & H. Lasnik. 1993. The theory of principles and parameters. In J. Jacobs, A. von Stechow, W. Sternefeld & T. Vennemann (eds) Syntax: An International Handbook of Contemporary Research. Berlin: de Gruyter. Reprinted in N. Chomsky The Minimalist Program. Cambridge MA: MIT Press, 1995, pp. 13–128.
Cinque, G. 1991. Types of A’-Dependencies. Cambridge, MA: MIT Press.
Clark, R. & I. Roberts. 1993. A computational model of language learnability and language change. Linguistic Inquiry 24:299–345 [this volume, Chapter 2].
Dupuis, F. 1989. L’expression du sujet dans les subordonnées en ancien français. PhD dissertation, University of Montreal.
Friedemann, M.-A. 1991. Propos sur la montée du verbe en C dans certaines interrogatives françaises. Mémoire de licence, University of Geneva.
Giorgi, A. & G. Longobardi. 1991. The Syntax of Noun Phrases: Configuration, Parameters and Empty Categories. Cambridge: Cambridge University Press.
Guasti, M.-T. 1992. Causative and Perception Verbs. PhD dissertation, University of Geneva.
Guéron, J. & T. Hoekstra. 1988. T-chains and the constituent structure of auxiliaries. In A. Cardinaletti, G. Cinque & G. Giusti (eds) Constituent Structure. Dordrecht: Foris.
Haiman, J. 1988. Rhaeto-Romansch. In N. Vincent & M. Harris (eds) The Romance Languages. London: Routledge.
Hirschbühler, P. 1990. La légitimation de la construction V1 à sujet nul dans la prose et le vers en ancien français. Revue québécoise de linguistique 19:32–55.
Jones, M. 1993. Sardinian Syntax. London: Croom Helm.
Kayne, R. 1972. Subject inversion in French interrogatives. In J. Casagrande and B. Saciuk (eds) Generative Studies in Romance Languages. Rowley, MA: Newbury House.

Kayne, R. 1989. Null subjects and clitic climbing. In O. Jaeggli & K. Safir (eds) The Null Subject Parameter. Dordrecht: Kluwer, pp. 239–261.
Kayne, R. 1991. Romance clitics, verb movement and PRO. Linguistic Inquiry 22:647–686.
Kayne, R. 1992. Italian negative infinitival imperatives and clitic climbing. In L. Tasmowski & A. Zribi-Hertz (eds) De la musique à la linguistique. Hommages à Nicolas Ruwet. Ghent: Communication & Cognition, pp. 300–312.
Van Kemenade, A. & A. Hulk. 1995. Verb second, pro-drop, functional categories and language change. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. Cambridge: Cambridge University Press, pp. 227–256.
De Kok, A. 1985. La place du pronom personnel régime conjoint en français. Une étude diachronique. Amsterdam: Rodopi.
Koopman, H. 1984. The Syntax of Verbs: From Verb-Movement Rules in the Kru Languages to Universal Grammar. Dordrecht: Foris.
Koopman, Hilda, & Dominique Sportiche. 1991. The position of subjects. In James McCloskey (ed) The Syntax of Verb-Initial Languages, 211–258. Amsterdam: Elsevier. [Special issue of Lingua 85.]
Jaeggli, O. 1982. Topics in Romance Syntax. Dordrecht: Foris.
Lema, J. 1990. Licensing Conditions on Head Movement. PhD dissertation, University of Ottawa.
Lema, J. & M.-L. Rivero. 1990. Long head movement: ECP vs HMC. In J. Carter et al. (eds) Proceedings of NELS 20, pp. 333–347.
Lema, J. & M.-L. Rivero. 1991a. Types of verbal movement in Old Spanish: Modals, futures, and perfects. Probus 3:237–278.
Lema, J. & M.-L. Rivero. 1991b. Inverted conjugations and V-second effects in Romance. In C. Laeufer & T. Morgan (eds) Theoretical Analyses in Romance Linguistics. Amsterdam: Benjamins, pp. 311–328.
Lieber, R. 1980. On the Organization of the Lexicon. PhD dissertation, MIT.
Marantz, A. 1984. On the Nature of Grammatical Relations. Cambridge MA: MIT Press.
May, R. 1977. The Grammar of Quantification. PhD dissertation, MIT.
May, R. 1985. Logical Form: Its Structure and Derivation. Cambridge MA: MIT Press.
Motapanyane, V. 1991. Theoretical Implications of Complementation in Rumanian. PhD dissertation, University of Geneva.
Mussafia, A. 1983. Scritti di filologia e linguistica. A. Daniele and L. Renzi (eds). Padua: Antenore.
Ouhalla, J. 1990. Sentential negation, relativized minimality and the aspectual status of auxiliaries. The Linguistic Review 7:183–231.
Pearce, E. 1990. Parameters in Old French Syntax. Dordrecht: Kluwer.
Poletto, C. 1995. The diachronic development of subject clitics in North-Eastern Italian dialects. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. New York/Oxford: Oxford University Press, pp. 295–324.
Pollock, J.-Y. 1989. Verb movement, UG and the Structure of IP. Linguistic Inquiry 20:365–424.
Postal, P. 1969. On the so-called pronouns of English. In D. Reibel & S. Schane (eds) Modern Studies in English. Englewood Cliffs, NJ: Prentice-Hall.
Rivero, M.-L. 1991. Long head movement and negation: Serbo-Croatian vs Slovak and Czech. The Linguistic Review 8:319–351.
Rivero, M.-L. 1993. Long head movement vs V2 and null subjects in Old Romance. Lingua 89:217–245. Special Issue on Null Subjects in Diachrony edited by Aafke Hulk and Ans van Kemenade.
Rivero, M.-L. 1994. Clause structure and V-movement in the languages of the Balkans. Natural Language and Linguistic Theory 12:63–120.
Rizzi, L. 1982. Issues in Italian Syntax. Dordrecht: Foris.

368 Ian Roberts Rizzi, L. 1986. Null Objects in Italian and the Theory of pro. Linguistic Inquiry 17:501–557. Rizzi, L. 1996. Residual verb-second and the Wh-criterion. In A. Belletti & L. Rizzi (eds) Parameters and Functional Heads. New York/Oxford: Oxford University Press, pp. 63–90. Rizzi, L. 1990. Relativized Minimality. Cambridge MA: MIT Press. Rizzi, L. 1991. Proper head-government and the definition of A-positions. GLOW Newsletter 26:46–47. Rizzi, L. & I. Roberts. 1989. Complex inversion in French. Probus 1, 1‑30 (Reprinted in A. Belletti & L. Rizzi (eds). 1996. Parameters and functional heads. Oxford: Oxford University Press, pp. 91–118) [this volume, Chapter 9]. Roberts, I. 1991. Excorporation and minimality. Linguistic Inquiry, 22, 209–218 [this volume, Chapter 10]. Roberts, I. 1992. Wacknagel meets the extended projection principle. GLOW Newsletter, 28. Roberts, Ian. 1993. Verbs and Diachronic Syntax. Dordrecht: Kluwer. Rögnvaldsson, E. & H. Thráinsson. 1990. On Icelandic word order once more. In J. Maling & A. Zaenen (eds) The Syntax of Modern Icelandic. San Diego: Academic Press. Salvi, G. 1991. Difesa e illustrazione della legge di Wackernagel applicata alle lingue romanze antiche. In Miscelleanea G. B. Pellegrini. Padova: Unipress. Schwartz, Bonnie, and Sten Vikner. 1996. The verb always leaves IP in V2 clauses. In A. Belletti & L. Rizzi (eds) Parameters and Functional Heads: Essays in Comparative Syntax. New York/Oxford: Oxford University Press, pp. 11–62. Sportiche, D. 1998. Movement, agreement and case assignment. In. D. Sportiche (ed) Partitions and Atoms of Clause Structure. London: Routledge, pp. 88–243. Tobler, A. 1875. Review of J. Le Coultre De l’ordre des mots dan Chrétien de Troyes. Reprinted in A. Tobler (1912) Vermischte Beiträgen zu Französischen Grammatik, V. Leipzig. Tomaselli, A. 1990. La sintassi del verbo finito nelle lingue germaniche. Padua: Unipress. Travis, L. 1984. Parameters and Effects of Word Order Variation. 
PhD dissertation, MIT. Vance, B. 1989. Null Subjects and Syntactic change in Medieval French. PhD dissertation, Cornell University. Vanelli, L. 1987. I pronomi soggetto nei dialetti italiani settentrionali dal Medio Evo a oggi. Medievo Romanzo 12. Zanuttini, R. 1991. Syntactic Properties of Sentential Negation: A Comparative Study of Romance Languages. PhD dissertation, University of Pennsylvania.

12 Clause Structure and X-Second Anna Cardinaletti and Ian Roberts

The purpose of this essay is to propose a unified analysis of a range of “second-position” phenomena that have been attested in various languages, and in so doing to motivate a more elaborated theory of Nominative Case assignment. The proposal is that many languages, including a number of Germanic and Romance languages, have a projection that intervenes between Comp and the highest Infl-type projection (which, following Belletti 1990, we take to be AgrP). We refer to this projection as Agr1P, and we refer to the lower, “traditional” AgrP as Agr2P. Thus our claim is that in the languages in question there are two Agr-heads and two projections of Agr. These two Agrs are both “subject” Agrs; in this respect, our proposal is distinct from, but not incompatible with, Chomsky’s (1989) idea that, in addition to the standard “subject” Agr, there is also an “object” Agr. We show here that this proposal is of considerable empirical value in that it offers a new perspective on a range of second-position phenomena and allows us to connect “verb-second” effects with various kinds of “clitic-second” effects, known in traditional grammar as Wackernagel’s Law and the Tobler-Mussafia Law.1 In fact, we suggest that the presence of Agr1P is fundamentally related to Nominative Case assignment, in that the basic property of Agr1° seems to be that of assigning Nominative Case; the other properties that we ascribe to it (e.g., attracting clitics and attracting the inflected verb) are intimately related to its Nominative-assigning property. In this sense, it may be best to think of Agr1P as NomP. As a working hypothesis, then, we assume that in languages that have both Agr1P and Agr2P, Agr2° is not an assigner of Nominative Case. Because our focus is on the interaction of clause structure with structural Case assignment, we concentrate almost exclusively on the processes and properties of S-structure.
It is a classic tenet of generative grammar that inflectional affixes may be separate syntactic entities at pre-phonological levels of representation; see the analysis of the English auxiliary system in Chomsky (1957). At the turn of the 1990s, this idea received new impetus, beginning with Pollock (1989). Pollock’s proposal that the Infl-node of Chomsky (1981) should be split into its morphological components led to the working hypothesis in much recent research that any inflectional head which appears to be syntactically relevant heads its own maximal projection with the standard X-bar theoretic structure. Our proposal amounts to the claim that certain languages have a special X-bar projection for the assignment of Nominative Case; in terms of the connection with inflectional morphology, it may be possible to relate the existence of Agr1P to the possession of morphologically realized Nominative Case. The structure that we are proposing is as follows:

[CP Spec [C′ C° [Agr1P Spec [Agr1′ Agr1° [Agr2P Spec [Agr2′ Agr2° TP]]]]]]

The order of TP and Agr2° varies in the languages under consideration: in West Germanic TP is on the left of Agr2° (see Giusti 1986 for the proposal that IP is head-final in German); in North Germanic and Romance, it is on the right. By contrast, Agr1°, like C°, precedes Agr2P in all the languages we discuss. The essay is organized as follows: in section 1 we analyze the phenomenon of embedded verb-second, basing ourselves largely on the best-known case of this type: Icelandic (although we also analyze both Old French and Yiddish in these terms). In section 2 we show how our system gives a natural analysis of the “clitic-second” phenomena in Germanic and Romance; in our terms the traditionally recognized Wackernagel position of Germanic languages is Agr1°, as is the clitic position in those Romance languages that obey the Tobler-Mussafia Law. The last two sections deal with ways in which the properties of Agr1° vary parametrically: we discuss different modes of Nominative Case assignment in section 3, where we present and elaborate the recent proposals of Koopman and Sportiche (1991), and different kinds of null subjects in section 4. Finally, the appendix is devoted to the discussion of Stylistic Fronting. We have also added a postscript to this version of the paper, explaining its rather unusual and protracted prepublication history.
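
The proposed clause skeleton and its one point of linear variation can be made concrete in a short Python sketch. It is an illustration only and not part of the original analysis: the names `Node`, `clause`, `leaves`, and `bracket`, and the flag `agr2_head_final`, are hypothetical and introduced here. The flag is assumed to encode the statement that TP precedes Agr2° in West Germanic and follows it in North Germanic and Romance, while Agr1° and C° always precede their complements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A labelled tree node; a node without children is a terminal position."""
    label: str
    children: List["Node"] = field(default_factory=list)


def clause(agr2_head_final: bool) -> Node:
    """Build CP > Agr1P > Agr2P > TP (hypothetical encoding of the text's structure).

    agr2_head_final=True places TP to the left of Agr2° (the West Germanic
    setting described in the text); False places TP to the right.
    """
    agr2_head, tp = Node("Agr2°"), Node("TP")
    agr2_bar = Node("Agr2′", [tp, agr2_head] if agr2_head_final else [agr2_head, tp])
    agr2p = Node("Agr2P", [Node("SpecAgr2′"), agr2_bar])
    # Agr1° and C° precede their complements in every language discussed.
    agr1p = Node("Agr1P", [Node("SpecAgr1′"), Node("Agr1′", [Node("Agr1°"), agr2p])])
    return Node("CP", [Node("SpecC′"), Node("C′", [Node("C°"), agr1p])])


def leaves(node: Node) -> List[str]:
    """Return the terminal positions in left-to-right order."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]


def bracket(node: Node) -> str:
    """Render the tree as a labelled bracketing."""
    if not node.children:
        return node.label
    return "[" + node.label + " " + " ".join(bracket(c) for c in node.children) + "]"


if __name__ == "__main__":
    print(bracket(clause(agr2_head_final=False)))  # North Germanic / Romance order
    print(bracket(clause(agr2_head_final=True)))   # West Germanic order
```

Under these assumptions the two settings differ only in the last two terminal positions; everything from SpecC′ down to SpecAgr2′ is linearized identically.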

1. Agr1 as a Position for the Inflected Verb: Embedded V2

1.1 Icelandic

The postulation of Agr1P allows us to account in a straightforward way for certain differences within the class of verb-second languages. While in many languages (for example, German, Dutch, and Mainland Scandinavian) verb-second is essentially a root phenomenon, it appears to be generalized to all types of embedded clauses in Icelandic. The following data (from Rögnvaldsson and Thráinsson 1990 and Thráinsson, personal communication) illustrate this, showing that in a variety of embedded clauses we have the order XP > V > subject:2

(1) a. Ég held að þegar hafi María lesið þessa bók.
       I believe that already has M. read this book
       ‘I believe that Mary has read this book already.’
    b. Ég harma að þegar hafi María lesið þessa bók. (factive)
       I regret that already has M. read this book
       ‘I regret that Mary has already read this book.’
    c. Ég spurði hvort þegar hefði María lesið þessa bók. (Wh)
       I asked whether already had M. read this book
       ‘I asked whether Mary had already read this book.’
    d. sú staðreynd að þegar hefur María lesið þessa bók (NP)
       the fact that already had M. read this book
       ‘the fact that Mary had already read this book’
    e. bókin sem þegar hefur María lesið (relative)
       book-the that already had M. read
       ‘the book that Mary had already read’

The Icelandic situation, as illustrated in (1), contrasts with what we find in German. In German, embedded V2 is possible only in a limited class of embedded clauses, essentially the complements to “bridge verbs” of the type in (1a). Embedded V2 is excluded in all the contexts parallel to (1b–e):3

(2) a. Ich glaube, gestern hat Maria dieses Buch gelesen.
       I believe yesterday has M. this book read
       ‘I believe Mary read this book yesterday.’
    b. *Ich bedauere, (daß) gestern hat Maria dieses Buch gelesen.
       I regret (that) yesterday has M. this book read
    c. *Ich frage mich, ob gestern hat Maria dieses Buch gelesen.
       I ask myself whether yesterday has M. this book read
    d. *die Tatsache, gestern hat Maria dieses Buch gelesen
       the fact yesterday has M. this book read
    e. *das Buch, das gestern hat Maria gelesen
       the book which yesterday has M. read

For (2a), an analysis in terms of CP-recursion seems to be in order. As noted in Rizzi and Roberts (1996 [this volume, Chapter 9]), the same class of verbs allows an otherwise root phenomenon (subject-aux inversion triggered by a negative-polarity item) in its complement in English:

(3) a. I believe that only in America could you do such a thing.
    b. *I wonder whether only in America could you do such a thing.

It seems then that, independently of verb-second, the complements of bridge verbs are able to have root properties. We propose, still following Rizzi and Roberts (1996 [this volume, Chapter 9]), that this is because bridge verbs allow CP-recursion in their complements. More precisely, we propose that bridge verbs select a C° which selects another C°; to avoid unlimited recursion at the C-level, we clearly must propose that the two C°s have different properties (e.g., that the first allows a “propositional” complement, while the second only allows a “predicational” complement in the terms of Rizzi 1990b). In German, the two C°s are different in form: the first is null, and the second is filled by the inflected verb, like the C° of a matrix clause. (In English, too, the C° that selects CP is different from other C°s: that in (3a) cannot be deleted, while other occurrences of that can be.) Adopting this analysis for (2a), we propose the following partial structure:

(4) . . . glaube [CP Ø [CP gestern [C hat [ Maria . . . ] ] ] ]

Vikner (1995) proposes extending this analysis to embedded verb second in Icelandic. This entails that CP-recursion is generalized in Icelandic, while it is limited to a specific class of complements in German, Mainland Scandinavian, and English. In other words, the property of selecting C° is available for all classes of C° in Icelandic. If this were true, however, then there would be no way to avoid unlimited recursion of C°, which is clearly an undesirable consequence. Instead, our proposal provides a straightforward account of the data in (1). These examples have the following structure (although, to the extent that the class of verbs which allows CP-recursion in German also allows it in Icelandic, (1a) may also have a structure like (4) with að in the higher C° and the inflected verb in the lower C°):4

(5) [CP C° [Agr1P TOP [Agr1′ Vᵢ+Agr1 [Agr2P NPNom [Agr2° tᵢ] . . . ]]]]

As we will see in more detail in section 3, the special property of Icelandic is that SpecAgr1′ is a topic position, while the usual superficial subject position is SpecAgr2′. If we do not assume a double-Agr structure, no other position would be available for the subject. Since there is no generalized CP-recursion, SpecC′ is not available. Although we assume that the subject is base-generated in VP (see in particular Koopman and Sportiche 1991), the base-position of the subject is unavailable, at least for a definite NP, since it is not a position that can receive Nominative Case from Agr° (see section 4 for some evidence that indefinite NPs can appear in this position). This idea is confirmed by the fact that definite subjects always precede VP-adverbs, as in . . . að þegar hefur María oft lesið þessa bók ‘that already has Mary often read this book’ (Thráinsson, personal communication) (see also Vikner 1995, 68). Furthermore, we follow Rizzi (1990a) in assuming that SpecT′ is inherently

an A′-position and, as such, is not a possible landing site for the subject. In fact, if we adopt Rizzi’s (1991) characterization of potential A-positions as either θ-positions or specifiers of Agr, SpecT′ must be an A′-position. Hence, SpecAgr2′ is the position of the definite subject. Using this analysis of embedded clauses, we are not forced to treat matrix V2 in Icelandic as involving movement of the verb to C°. Movement to Agr1° would clearly suffice to derive the same orders (see Rögnvaldsson and Thráinsson 1990 and the references given there for recent discussion of similar proposals). At the same time, the data do not force us to reject a movement-to-C analysis. One property of Icelandic, however, suggests that, in fact, matrix V2 should be handled in terms of verb-movement to Agr1° rather than to C°. Icelandic makes much more frequent use of declarative V1 orders than do the other (Modern) Germanic languages (aside from Yiddish). Declarative V1 is illustrated in the following example:

(6) Hitti hann þá einhverja útlendingar.
    met he then some foreigners
    ‘He then met some foreigners.’ (Sigurðsson 1985, 1)

We propose that in (6) the inflected verb has undergone structure-preserving topicalization (i.e., topicalization of an X° category to another head position Y°). This operation probably takes place for reasons connected to information structure, since the examples in question seem to be presentational sentences. We propose that the landing site of this operation is C°; these are the only declarative clauses in which the verb is in C° (as in other Germanic and Romance languages, the inflected verb is typically in C° in matrix interrogatives, imperatives, and hypotheticals). As we will see in section 2.2, this kind of “verb-topicalization” is not restricted to Icelandic but also is found in Medieval Romance languages (see also Benincà 1989; Alberton 1990).
More generally, we expect this possibility to exist in all languages that realize V2 at the Agr1P-level since in such cases C° is freely available as a landing site for structure-preserving topicalization. Since the V2 requirement is satisfied elsewhere, SpecC′ in such languages can remain empty. This analysis implies that declarative operators do not exist, since otherwise we would expect V1 declaratives to be generally possible on a par with V1 interrogatives, hypotheticals, and so on.5 One reason to favor an analysis of (6) in which the verb moves to C° over one in which the verb moves only to Agr1°, while the subject stays in SpecAgr2′, is that this kind of V1 is a root phenomenon. Since den Besten (1983), the simplest treatment of root phenomena has been to say that they involve movement to C°, a position available in principle in matrix clauses but unavailable in embedded clauses. If matrix clauses are Agr1P in Icelandic, this implies that verb second is not a unified phenomenon in

the Germanic languages, at least in the sense that the landing site of the verb may vary cross-linguistically. We will see further evidence in favor of this conclusion as we proceed. See also Diesing (1988, 1990) and Santorini (1988, 1989) for a similar conclusion based on Yiddish evidence (and see section 1.3 here for some discussion of Yiddish). To sum up, following on from our “double-Agr” hypothesis about basic clause structure, the following conclusions emerge for Icelandic:

(7) a. Agr1° can assign Nom under government (see section 3).
    b. SpecAgr2′ is a subject position.
    c. SpecAgr1′ can be a topic position.
    d. SpecC′ is an operator position.

1.2 Old French

The V2 nature of Old French (OF) is illustrated clearly by the examples in (8) (non-nominative clitics, e.g., en in (8b), are effectively part of the inflected verb, and so do not “count” in the computation of the second position):

(8) a. Einsint aama la demoisele Lancelot. (Adams 1987b, 50)
       thus loved the lady Lancelot
       ‘Thus the lady loved L.’
    b. Desuz un pin en est li reis alez. (Schulze 1888, 200)
       under a pine-tree of-it is the king gone
       ‘The king went underneath a pine tree.’
    c. Quatre saietes ot li bers au costé. (Le Charroi de Nîmes, l. 20)6
       four boats-of-war had the baron at-the side
       ‘The baron had four boats of war at his side.’

Adams (1987a,b) shows that V2 is possible in the complements to bridge verbs. The class of bridge verbs in question is comparable to the class which in V2 Germanic languages typically allows complements with matrix properties (see section 1.1). Here are some examples with null subjects:

(9) a. Or voi ge bien, plains es de mautalant. (Le Charroi de Nîmes, l. 295)
       now see I well full are (you) of bad-intentions
       ‘And now I see clearly that you are full of bad intentions.’
    b. Je cuit plus sot de ti n’i a. (Adams 1987b, 17 (11a–c))
       I think more stupid than you not there has (it)
       ‘I think that there is no one stupider than you.’

In neither of these examples is que present, suggesting that these are cases of German-style embedded V2 (this implies that “conjunctive discourse” is not specific to German; see note 3).

The embedded sentences in examples like (9) can be analyzed as follows:

(10) [CP1 C1° [CP2 AdvP [C2′ [C2° V+Agr] [AgrP NP TP]]]]

It is clear that CP2 here is just like a matrix clause, and so V2 is possible, as expected. However, there are cases of V2 orders in Wh-clauses:

(11) a. Quant a eus est li rois venus,. . . (Dupuis 1989, 148 (40))
        when to them is the king come
        ‘When the king came to them,. . .’
     b. s’a la vostre bonté vousist mon pere prendre garde
        if against the your good-will wanted my father to-take precaution
        ‘if against your good will my father wanted to take precautions’ (Adams 1988b, (19c))

(12) a. Por l’esperance qu’an lui ont,. . .
        for the hope which in him have (they)
        ‘For the hope which they have in him,. . .’
     b. Et si ne sait que faire puisse.
        and so not knows what to-do can (he)
        ‘And so he doesn’t know what he can do.’

In terms of the standard assumption that the inflected verb cannot move to a [+wh] C° (see Rizzi and Roberts 1996 [this volume, Chapter 9] and Rizzi 1996 for an account of this), we are led to the conclusion that the verb is in Agr1° in these examples. So we assign the following structure to (11a), for example:

(13) [CP [XP quant] [C′ C°[+wh] [Agr1P [Spec a eus] [Agr1′ [Agr1° estᵢ] [Agr2P [Spec li rois] [Agr2′ [Agr2° tᵢ] [TP tᵢ venus]]]]]]]

Here the verb appears in Agr1° and assigns Case under government to the subject, li rois, in SpecAgr2′. What is the status of SpecAgr1′ in OF? On the basis of examples such as (11), it appears to be a topic position of the Icelandic kind. However, clear examples of the type in (11) are not very frequent. According to Dupuis (1989, 151–152), this possibility is only attested with any real frequency in the Quatre Livres du Roi (a text from around 1170); in other texts (including some from the same period), there are very few cases of embedded V2 (in non-bridge complements) with overt subjects. There are, however, cases of embedded V2 order with null subjects in a range of twelfth-century texts, as well as in some thirteenth-century verse texts (see Hirschbühler 1990 and section 4.2 here). Such examples are not clear cases of topicalization and can instead be treated as Stylistic Fronting. Stylistic Fronting is an operation found in Icelandic, Faroese, and Yiddish. The operation fronts some VP-constituent, usually an adverbial, a participle, or a complement (see Maling 1990 for details), to a position between C° and the inflected verb. The main condition on Stylistic Fronting is that the sentence must contain a subject gap, usually a trace of Wh-movement, but also possibly the trace of cliticization (Platzack 1988) or an NP-trace (Sigurðsson 1989). The following Icelandic example illustrates the application of Stylistic Fronting to a participle:

(14) Þarna er konan sem kosin var / var kosin forsetí.
     there is woman that elected was / was elected president
     ‘There is a woman that was elected president.’

One test favors a Stylistic-Fronting analysis of examples like (12) over a topicalization analysis. Platzack (1988) shows that the subject-gap condition can be satisfied by a cliticized subject pronoun. This gives rise to the order Complementizer > Subject > XP > V, which cannot be a case of embedded verb-second.
Such orders are found at the relevant period of OF:

(15) quant il de ci departiront (Vance 1988, 89 (11))
     when they from here will-leave
     ‘when they will leave here’

The existence of this kind of order, and the absence of clear embedded topicalization at the relevant period of OF, suggest that apparently ambiguous examples of the kind in (12) are to be treated as involving Stylistic Fronting. For reasons of space, we make no proposal for the analysis of Stylistic Fronting in the body of this essay, although we note that it does seem to correlate with the double-Agr structure in VO languages: Icelandic and OF both have both properties, as, arguably, do Middle English and the Medieval Mainland Scandinavian languages (see Platzack 1988, 1994). In the appendix, we suggest a tentative analysis of this operation.

The conclusion is that, with the possible exception of some of the very early texts, OF does not allow generalized embedded topicalization, although it does have Stylistic Fronting. We interpret this to mean that SpecAgr1′ was not a topic position in OF but was a subject position. Nevertheless, movement of the inflected verb to Agr1° was general, and so the movement of the inflected verb to Agr1° does not necessarily imply that SpecAgr1′ is a topic position (see section 3). In section 4.2, we will see that the distribution of null subjects in OF supports this conclusion. To sum up, OF had the following properties:

(16) a. Agr1° can assign Nominative under government.
     b. SpecAgr2′ is a subject position.
     c. SpecAgr1′ is also a subject position (see section 3).

1.3 Yiddish

The other Germanic language that has been claimed to allow generalized embedded topicalization is Yiddish (see Diesing 1988, 1990; Santorini 1988, 1989). The following examples illustrate topicalization in [−wh] and [+wh] complements:

(17) a. Jonas bedoyert az dos bukh hob ikh geleyent. (Vikner 1995, 72)
        J. regrets that this book have I read
        ‘John regrets that I have read this book.’
     b. Ikh veys nit far vos in tsimer iz di ku geshtanen. (Vikner 1995, 74)
        I know not for what in room is the cow stood
        ‘I don’t know why the cow has stood in the room.’
     c. Ikh veys nit tsi in tsimer iz di ku geshtanen. (Santorini, personal communication)
        I know not if in room is the cow stood
        ‘I don’t know if the cow has stood in the room.’

Since “regret” is a non-bridge verb, (17a) is most likely not a case of CP-recursion. Examples (17b,c) are cases of topicalization in a [+wh] complement. Both Diesing (1988, 1990) and Santorini (1988, 1989) propose to analyze this kind of embedded V2 in Yiddish by treating SpecI′ (the canonical subject position in a language like English) as a topic position and then taking the subject to be in its VP-internal base position. However, this analysis will not carry over straightforwardly in our terms. If we split IP into TP and AgrP, and assume that Nominative Case is assigned by Agr°, then the VP-internal subject position cannot receive Nominative Case. Instead, this position receives Partitive Case (see Belletti 1988) and therefore can only be occupied by indefinite NPs. Since we are assuming that SpecT′ is an A′-position (see section 1.1), no (nonscrambled) NP can occupy this position. Hence, the only position for a definite Nominative

NP is SpecAgr′. For these reasons, the examples in (17) motivate an analysis in terms of the “double-Agr” structure. So (17a) would have the following structure:

(18) [CP [C° az] [Agr1P [NP dos bukh] [Agr1′ [Agr1° hobᵢ] [Agr2P [NP ikh] [Agr2′ [Agr2° tᵢ] [TP tᵢ geleyent]]]]]]

Further evidence in favor of embedded topicalization and against CP-recursion comes from sentences like the following where an argument undergoes long Wh-movement:

(19) Vas hot er nit gevolt az in shul zoln di kinder leyenen? (Santorini 1989, 59)
     what has he not wanted that in school shall the children read
     ‘What didn’t he want the children to read in school?’

As expected, the analogous long extraction of an adjunct is impossible (see Schwartz and Vikner 1996, 3.1). In general, extraction from complements with CP-recursion is impossible (see Rizzi and Roberts 1989 [this volume, Chapter 9]). This, combined with the fact that CP-recursion under verbs of volition is otherwise unattested, argues that there is no CP-recursion here. Instead, then, this must be a case of embedded topicalization. However, there are a number of restrictions on embedded V2 in Wh-complements in Yiddish. Only a limited class of Wh-complements (for example, those in far vos ‘why’ and tsi ‘if’) allow embedded topicalization. Compare (20) with (17b,c):

(20) a. *Ikh veys nit vu nekhtn iz di ku geshtanen. (Vikner 1995, 74)
        I know not where yesterday is the cow stood
        ‘I don’t know where the cow stood yesterday.’
     b. *Ikh veys nit ven zayn khaver hot Moyshe getrofn. (Santorini, personal communication)
        I know not when his friend has M. met
        ‘I don’t know when Moyshe met his friend.’
     c. *Ikh veys nit (far) vemen zayn khaver hot Moyshe forgeshtelt. (Santorini, personal communication)
        I know not (to) whom his friend has M. introduced
        ‘I don’t know who Moyshe has introduced his friend to.’

Short extraction of an argument Wh-element seems to be possible over a topicalized adjunct PP, however: Ikh veys nit vemen in restoran hot Moyshe getrofn ‘I don’t know whom in the restaurant has M. met’. This situation should be compared with that in Icelandic mentioned in note 2. On the other hand, topicalization in a Wh-complement apparently becomes generally possible where the subject is extracted or the subject is indefinite and apparently VP-internal:

(21) a. Zi iz gekumen zen ver frier vet kontshen. (Diesing 1988, 132)
        she is come see who earlier will finish
        ‘She has come to see who will finish earlier.’
     b. Ikh veys nit vu nekhtn iz geshtanen a ku. (Santorini, personal communication)
        I know not where yesterday is stood a cow

Both contexts involve a gap in the canonical, preverbal subject position. As such they appear to be cases of Stylistic Fronting (see section 1.2 and appendix). In relative clauses, verb-second order arises where there is a subject gap, but not otherwise. Relatives on the object are possible, provided that a resumptive pronoun is used (examples (22) from Lowenstamm 1977, cited in Santorini 1989, 56; example (23) from Lowenstamm 1977, cited in Diesing 1990, 63):

(22) a. nokh epes, vos oyfn hitl iz geven
        still something, that on the hat-DIM is been
        ‘something else that was on the little hat’
     b. *Der yid vos in Boston hobn mir gezen iz a groyser lamdn.
        the man that in Boston have we seen is a great scholar
        ‘The man whom we saw in Boston is a great scholar.’

(23) der yid vos in Boston hobn mir im gezen
     the man that in Boston have we him seen
     ‘the man that we saw (him) in Boston’

Here again, the possibility of embedded topicalization depends on the presence of a subject gap, confirming the idea that these are cases of Stylistic Fronting. The contrast between (22b) and (23) supports the idea that embedded topicalization is incompatible with any form of Wh-movement (with the exception noted after (20)). It seems, then, that Yiddish allows embedded

verb second under rather limited conditions: in non-bridge complements and in Wh-complements where the Wh-element is not moved. Stylistic Fronting is possible in clauses where Wh-movement affects the subject. In other words, embedded topicalization creates an island for extraction in Yiddish Wh-complements (see Vikner 1995, 73–80). This seems to be a property specific to Yiddish, since embedded topicalization does not have this effect in Icelandic (or, perhaps, the effect is limited to extraction of adjuncts over adjunct topics; see note 2). In spite of this difference, Yiddish shows the same basic properties as Icelandic: it has a double-Agr clause structure, SpecAgr1′ is a topic position, SpecAgr2′ is a subject position, and Nominative Case can be assigned under government.
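
The settings reached for Icelandic in (7), for Old French in (16), and for Yiddish just above can be lined up in a small Python table. This is a rough sketch for orientation, not part of the original proposal: `AgrSettings`, `LANGUAGES`, and `allows_embedded_topicalization` are hypothetical names, and the predicate is a deliberately coarse assumption that ignores the Yiddish restrictions involving Wh-movement, the early OF texts, and Stylistic Fronting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgrSettings:
    """Hypothetical record of the properties summarized in section 1."""
    nom_under_government: bool  # Agr1° can assign Nominative under government
    spec_agr1: str              # status of SpecAgr1′: "topic" or "subject"
    spec_agr2: str              # status of SpecAgr2′: "subject"


# Values transcribed from (7), (16), and the closing summary of section 1.3.
LANGUAGES = {
    "Icelandic": AgrSettings(True, "topic", "subject"),
    "Old French": AgrSettings(True, "subject", "subject"),
    "Yiddish": AgrSettings(True, "topic", "subject"),
}


def allows_embedded_topicalization(s: AgrSettings) -> bool:
    """Coarse assumption: generalized embedded XP > V > subject order requires
    a topic in SpecAgr1′ and a subject in SpecAgr2′ that receives Nominative
    from Agr1° under government."""
    return (
        s.spec_agr1 == "topic"
        and s.spec_agr2 == "subject"
        and s.nom_under_government
    )


if __name__ == "__main__":
    for name, settings in LANGUAGES.items():
        print(name, allows_embedded_topicalization(settings))
```

On this encoding the three languages differ only in the status of SpecAgr1′, which is the single property the section uses to separate Old French from Icelandic and Yiddish.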

2. Agr1 as a Position for Clitics

It has often been noticed, since the earliest work on Indo-European syntax (see Wackernagel 1892), that unstressed elements of various kinds tend to be found in the second position in the clause. In this section, we propose a general analysis of this salient fact about the syntax of a variety of languages in terms of the following general idea:7

(24) Agr1° is a position for clitics.

In the languages we discuss in this section, Agr1° is preceded by an element occupying its Spec. Since Agr1° is occupied by a clitic, the result is a “clitic-second” structure. We will see that there is cross-linguistic variation in these structures regarding the position of the inflected verb relative to the clitic; our system makes it possible to analyze this variation in a straightforward way.

2.1 The Wackernagel Position in German

A striking property of German that differentiates it from other Germanic languages is the fact that pronouns can occur between C° and the subject. The following examples illustrate this phenomenon for both embedded and matrix clauses. (In all our German examples the subject is unstressed; if the subject receives focal stress, the judgments can change, presumably owing to the fact that in this case the subject occupies a different position, a matter which we leave aside.)

(25) a. . . . daß es ihm der Johann gestern gegeben hat.
        that it him-DAT the J. yesterday given has
        ‘. . . that John gave it to him yesterday.’
     b. Gestern hat es ihm der Johann gegeben.
        yesterday has it him-DAT the J. given
        ‘Yesterday John gave it to him.’

In terms of our system, these examples have the following structure:8

(26) [C′ [C° {daß / hat}] [Agr1P [Agr1′ [Agr1° es ihm] [Agr2P [NP der J.] [Agr2′ ...gegeben...]]]]]

As (26) shows, the subject (der Johann) can remain in SpecAgr2′; it is not required to move to SpecAgr1′. However, it is also possible to have the order in which the pronouns follow the subject. In our system, it is possible to say that in such cases the clitics are in the same position as in (26), with the subject occupying SpecAgr1′.

(27) a. [C′ [C° {daß / hat}] [Agr1P [NP der J.ᵢ] [Agr1′ [Agr1° es ihm] [Agr2P [NP tᵢ] [Agr2′ ...gegeben...]]]]]
     b. . . . daß der Johann es ihm gestern gegeben hat.
     c. Gestern hat der Johann es ihm gegeben.

The post-subject position of the pronouns is not a VP-internal position since the VP-internal order of arguments is Dative-Accusative in German. The fact that an Accusative pronoun must appear before a Dative NP indicates that it is outside VP:

(28) . . . daß der Johann es dem Hans gegeben hat.
     that the J. it the-DAT H. given has
     ‘. . . that John gave it to Hans.’

Moreover, a post-subject pronoun is typically unstressed and, like a presubject pronoun, has clitic properties (see Boschetti 1986). This suggests

that post-subject es occupies the same position as pre-subject es, rather than being in a scrambled position of some kind (although other pronouns aside from es may be scrambled; see note 9). As indicated in (26) and (27), the clitic position is Agr1°. A further argument in favor of our approach over a scrambling analysis comes from the absence of “clitic-splitting” in German. That is, we do not find sentences where one pronoun precedes the subject and one follows:9

(29) a. *. . . daß ihm der Hans es/sie wahrscheinlich gegeben hat.
        that him-DAT the H. it/her probably given has
     b. *. . . daß es/sie der Hans ihm wahrscheinlich gegeben hat.
        that it/her the H. him-DAT probably given has

Example (29) is ungrammatical because there is only one position available for unstressed pronouns: Agr1°. Where there is more than one unstressed pronoun, they all appear in Agr1°. Furthermore, it is unlikely that (25) is a case of scrambling, since many speakers reject scrambling to pre-subject position, as shown by (30):

(30) ??. . . daß den Roman ihm Johann gegeben hat.
that the-ACC novel him-DAT J. given has

An important confirmation of this analysis of German comes from the fact that Dutch shows a pattern which differs minimally from German, something that we can treat very simply in our system.10 Zwart (1991) argues convincingly that Dutch has object clitics that occupy a VP-external position (see also Jaspers 1989). In fact, Zwart proposes an analysis very similar to ours, involving a head-initial functional projection whose head is to the right of the canonical subject position. This analysis provides a simple explanation for the order “subject > pronouns” in Dutch:

(31) . . . dat Jan ’t gisteren aan Marie gaf.
that J. it-CL yesterday to M. gave
‘. . . that John gave it to Mary yesterday.’

Example (31) is like the German examples in (27b, c), and we can analyze it in the same way: ’t is in Agr1° and Jan is in SpecAgr1′. Dutch differs from German, however, in that the order in (25) is ungrammatical:

(32) *. . . dat ’t Jan aan Marie gaf.

This means that the structure in (26) is impossible in Dutch. Since (31) shows, in our terms, that the clitic can occupy Agr1°, (32) must be interpreted as indicating that the subject cannot be in SpecAgr2′. The minimal difference between German and Dutch lies in the possible positions of the Nominative subject. We conclude that, although in both languages SpecAgr1′ is a subject position, SpecAgr2′ is a position that receives Nominative only in German (see section 3).

2.2 “Tobler-Mussafia” Effects

A phenomenon related to those just discussed is found in all the Medieval Romance languages (Old French, Old Italian, Old Spanish) and, at least prescriptively, in European Portuguese (Benincà 1989; see also Galves 1994/2001 for an analysis of the clitic systems of both European and Brazilian Portuguese in terms of the double-Agr clause structure). The phenomenon in question amounts to a ban on clitic-first orders: in constructions where a proclitic would appear in first position, enclisis is obligatory and proclisis is excluded (Mussafia 1983). This phenomenon is known as the “Tobler-Mussafia Law” in the traditional literature. The following examples illustrate the operation of the Tobler-Mussafia Law in Old French (OF):

(33) a. Toutes ces choses te presta Nostre Sires. (de Kok 1985, 74)
all these things you lent our Lord
‘All these things our Lord lent you.’
b. Voit le li rois. (Le Charroi de Nîmes, l. 58)
sees him the king
‘The king sees him.’
c. Fust i li reis, n’i oüssum damage. (Harris 1978, 240)
were here the king, not here had (we) damage
‘If the king were here, we wouldn’t suffer any damage.’

In (33a), we have a regular V2 sentence (recall that OF was a V2 language; see section 1.2). The direct object toutes ces choses is topicalized to SpecC′; the finite verb appears in second position with the proclitic indirect-object pronoun te. Sentence (33b) is an example of “narrative V1” (see Hirschbuhler 1990). Here we see that the object clitic le follows the inflected verb; it is enclitic, not proclitic. Example (33c) is a verb-initial conditional clause where the same type of enclisis affects the locative i. Example (34) illustrates the Tobler-Mussafia Law in matrix interrogatives (de Kok 1985, 82):

(34) a. Conois la tu?
know her you
‘Do you know her?’
b. Et quex chevaliers i avra il . . . ?
and which knight there will-be there
‘And which knight will be there . . . ?’

In (34a), the clitic la is enclitic on the inflected verb (note that the subject pronoun follows it). In (34b), the clitic i is proclitic to the inflected verb (and the subject pronoun still follows the verb). In (35) and (36), we illustrate the analogous effect in Old Italian (OIt) (all examples are taken from Alberton 1990):

(35) a. Poi vi trovò tanto oro e tanta ariente. (Novellino, LXXXIV)
then there (he) found much gold and much silver
‘Then he found a lot of gold and silver there.’
b. Vogliolo sapere da mia madre. (Novellino, III)
(I) want-it to-know from my mother
‘I want to know it from my mother.’

(36) a. Hailo tu fatto per provarmi?
have-it you done to try me
‘Have you done it to try me?’
b. Chi si potrebbe tener di piangere e di lagrimare in cotanto dolore?
who self could (he) keep from crying and from weeping in such pain
‘Who could keep himself from crying and weeping in such pain?’

A comparison of (35) and (36) with the French examples in (33) and (34) shows that the basic phenomenon is the same: wherever proclisis would place the clitic in first position, the clitic follows the inflected verb. There are certain differences between OF and OIt, however; one important difference is that V1 declaratives are significantly rarer in OF than in OIt (Benincà 1989, 5). This may be due to the fact that the verb could remain in Agr1° in OIt matrix clauses (and so, to the extent that OIt was V2, V2 was realized at the Agr1-level), while in OF movement to C° was required (i.e., V2 was realized at the C-level). In this respect, OIt would be like Icelandic (see section 1.1), and OF would differ from Icelandic (for other differences between OF and OIt, see Alberton 1990).11 We gloss over these differences here since our main goal is to account for the obligatory enclisis.

Our account of the Tobler-Mussafia Law runs as follows. Following Kayne (1991), we assume that clitics always occupy functional-head positions.
In particular, the clitic is in Agr1° before verb-movement to C°. The difference between enclisis and proclisis lies in whether the clitic forms a complex with the verb, or whether the verb moves to C° independently of the clitic. Enclisis results from the latter situation: the verb moves to C°, while the clitic remains in Agr1°. This is what we find in yes/no questions like (34a) and (36a) and conditionals like (33c), as well as in examples where the verb undergoes structure-preserving topicalization to C°, as in (33b) and (35b). Proclisis, by contrast, involves the formation of the complex [Cl + V] in Agr1°, which may then be moved to C°, as clearly seen in Wh-questions such as (34b) and (36b) (and in V2 sentences in which [Cl + V] moves to C°, as in OF examples like (33a)).

A major advantage of this account is that it directly captures the fact that enclisis is a root phenomenon (Benincà 1989; Alberton 1990); like other root phenomena, enclisis involves the presence of an inflected verb in C°. In this respect, our analysis is more straightforward than analyses of the type proposed by Alberton and Benincà, in which the clitic is in C° with both enclisis and proclisis in matrix clauses and the verb topicalizes to SpecC′ to produce enclisis. The clause structure we wish to propose for these languages makes it possible to assume that the topicalization of V is movement to C°, rather than movement to SpecC′. As we said earlier in this essay, this integrates the account of enclisis more directly with accounts of other root phenomena (e.g., subject-aux inversion in English or subject-clitic inversion in contemporary French, and other phenomena discussed in den Besten 1983). A further advantage of our account as compared to approaches of the type in Alberton and Benincà is that the topicalization rule is structure-preserving in that it moves the inflected verb, which is an X°, to C°.

Our analysis leads to two questions: Why can the verb move to C° independently of the clitic in Agr1°? Why must the verb move to C° independently of the clitic in cases of topicalization and yes/no questions? There are two possible answers to the first of these questions: either the verb “excorporates” from Agr1° to C°, leaving the clitic behind in Agr1° (see Roberts 1991 [this volume, Chapter 10] on excorporation), or the verb is able, under the right conditions, to move from Agr2° to C°, “skipping” Agr1°. Of these two possibilities, we will assume the second in what follows (see Roberts 1993b). For the second question, we see no alternative to the traditional idea that the languages that show this effect do not permit clitic-first orders, which is presumably a phonological constraint.
Hence, the verb “skips” Agr1° exactly where the “regular” movement through Agr1°, picking up the clitic on the way, would give rise to a clitic-first order with [Cl + V] in C°.

The question that now arises is: why does V only “skip” Agr1° when moving through Agr1° would violate the ban on clitic-first orders? As we will suggest in section 3, the clitic in Agr1° attracts the inflected verb in the usual case (see note 13). The clitic thus seems to impose two distinct requirements: (a) it cannot appear first and (b) it must combine with the inflected verb. Where some other element (e.g., a Wh-constituent) appears in first position for independent reasons, requirement (a) is automatically satisfied and requirement (b) can be most economically satisfied by V-movement to Agr1°, and, hence, it must be satisfied in this way, following Chomsky (1989), although other requirements, e.g., Rizzi’s (1996) Wh-criterion, may lead to the [Cl + V] complex moving further. Alternatively, if no other element appears first, the two requirements imposed by the clitic are satisfied most economically if V moves to C°, skipping Agr1°, and the clitic left-adjoins to V (the latter operation possibly taking place in PF). Thus we see that the Tobler-Mussafia Law is a “last resort” operation in the sense of Chomsky (1989).

Concerning comparative questions, our analysis makes the prediction that a language which shows the Tobler-Mussafia effect has Agr1°. We can relate the possibility of independent movement of the verb with respect to the clitic to the existence of this position in the following way: in a system with both Agr1° and Agr2°, Agr2° contains the inflectional affixes that are required to form the inflected verb. Here the verb-stem must move to Agr2° so that an inflected verb can be formed. However, verb-movement to Agr1° is not forced by such morphological factors (although the verb may be “attracted” to Agr1° independently of the need to pick up an inflectional affix; see section 3 and note 13). Contemporary French and Italian have only one Agr position, and this is the position that contains the finite verbal inflection to which clitics are adjoined. Thus the finite verb always forms a unit with object clitics in these languages, and so, when the finite verb appears initially in C° in yes/no questions (as in (37)) and certain types of conditionals (as in (38)), the clitic remains proclitic. We illustrate this with a conditional in Italian, as no overt form of subject-verb inversion is found in interrogatives; see Rizzi (1982) on this form of conditional inversion:

(37) [C° [Agr° La connais ] ] tu? (vs. (34a))
her know you
‘Do you know her?’
(38) [C° [Agr° L’avessi]] io saputo in tempo, . . .
it-had-SUBJ I known in time
‘Had I known it in time, . . .’

Note that our analysis does not claim that clitics form a unit with the nonfinite verb. As Kayne (1991) shows, it is probably desirable to maintain that enclitics on nonfinite verbs in some Romance languages occupy a syntactic position separate from the verb, for example in forms like Modern Italian farlo ‘to do it’.
Moreover, Medieval Romance languages are like contemporary Germanic languages in that both groups tolerate clitic elements which can be independent of finite morphology; this is completely impossible in most contemporary Romance languages (but not in Romanian; see Motapanyane 1991 for an analysis of clitic placement in this language, which makes use of the same “double-Agr” system as that being proposed here). In our terms, both Medieval Romance and contemporary Germanic have clitics that occupy Agr1°, a functional-head position that is independent in principle of the position of finite morphology, Agr2°. Furthermore, our analysis captures the traditional idea that Tobler-Mussafia effects are related to Wackernagel’s Law; as we have seen, both sets of phenomena crucially involve the presence of an “autonomous” clitic in Agr1°.

2.3 V3 Orders in Old English and Old High German

In this section, we discuss one phenomenon of clitic-placement found in the older Germanic languages, Old English (OE) and Old High German (OHG). The data on OE are taken from van Kemenade (1987) and those on OHG from Tomaselli (1991).

Both OE and OHG were V2 languages, although, as we shall see later in this discussion, this does not necessarily imply that CP was activated in all matrix clauses in these languages. In this section, we focus on one case of V3 word order that is found in both of these languages, in which the second element is a clitic. (In OHG, this order is restricted to subject clitics; in OE, complement clitics are found in this position, too.) This order is the same as that in the Medieval Romance languages in matrix declarative sentences:

(39) a. God him worhte þa reaf of fellum. (van Kemenade 1987, 114)
God them wrought then garments of skin
‘God then made them garments of skin.’
b. Forðon we sceolan mid ealle mod & mægene to Gode gecyrran. (van Kemenade 1987, 110)
therefore we shall with all mind and power to God turn
‘Therefore we must turn to God with all our mind and power.’
c. Dhes martyrunga endi dodh uuir findemes mit urchundin dhes heilegin chiscribes. (Tomaselli 1991, 3)
his martyrdom and death we demonstrate with evidence of the holy writings
‘We demonstrate his martyrdom and his death with evidence from the holy scriptures.’

There is, however, a major difference from the Romance Tobler-Mussafia effects discussed in the previous section. In contexts where the verb is in C° and an XP of a particular class is in SpecC′, the clitic follows the verb. In OE, these contexts involve a Wh-constituent, the negative element ne, or the adverb þa ‘then’ in initial position, all of which can be plausibly treated as elements in SpecC′ triggering movement of the inflected verb to C° (Tomaselli 1991, 6 implies that the same is true in OHG; cf. Tomaselli 1990):

(40) a. Ne geseah hine nan man nates-hwon yrre. (van Kemenade 1987, 114)
not saw him no man so little angry
‘No one ever saw him so little angry.’
b. Hwæt sægest þu, yrþlincg? (van Kemenade 1987, 138–139)
what saist thou, ploughman
‘What do you say, ploughman?’

Parallel with our analysis of the Tobler-Mussafia effect in 2.2, we propose that in examples like (39) the clitic occupies Agr1°. The verb can occupy this position, as the following OE case of verb-subject order shows (from van Kemenade 1987, 114):

(41) Fela spella him sædon þa Beormas, ægþer ge of hiera agnum lande . . .
many stories him told the Permians, both of their own country
‘The Permians told him many stories, both about their own country . . .’

We cannot provide the parallel evidence in OHG, since the clitic in Agr1° is always a subject clitic. However, OHG examples like (39c) can be analyzed as having the inflected verb in Agr1° together with the clitic. In Germanic matrix declaratives, then, the CP-level is not activated. The main verb can also appear in a position quite distant from that occupied by the clitic in embedded clauses and in the characteristically non-V2 second conjuncts in main-clause coordination (see Kiparsky 1995). Notice that the clitic can appear either before or after the subject in OE, as in Modern German (see section 2.1):

(42) a. . . . þæt him his fiend wæren æfterfylgende. (van Kemenade 1987, 113)
that him his enemies were following
‘. . . that his enemies were chasing him.’
b. & se cyng him eac wel feoh sealde. (van Kemenade 1987, 113)
and the king him also well property gave
‘and in addition the king gave him much property.’
c. . . . daz íh nîeuuánne necúme in conventicula haereticorum (Tomaselli 1991, 2)
that I never not-come in the circle of-heretics
‘. . . that I never come into the circle of heretics.’

We take it that the inflected verb is in Agr2° in these examples. As in Medieval Romance languages, Agr2° is the position in which the inflected verb is formed by combining with the agreement affix, while Agr1° is the clitic position. The principal difference between these Germanic languages and the Medieval Romance languages discussed in section 2.2 is that in the Germanic languages the verb only moves to Agr1° in matrix declaratives, while in the Romance languages the verb moves to Agr1° both in matrix and in subordinate clauses, and, indeed, in any clause where the disallowed clitic-first order will not result from continued movement to C°. In Germanic interrogatives, and so on, where the inflected verb is clearly in C° (see (40)), the verb “skips” Agr1°, and so the clitic appears in third position here. This is consistent with the fact that verb-preposing is a root phenomenon in Germanic, while it is found also in embedded clauses in Romance.

Our analysis captures the same range of facts as those proposed by van Kemenade (1987) for OE and Tomaselli (1991) for OHG. Compared to those analyses, ours has the advantage of greater generality, in that it relates the Germanic phenomena to the Tobler-Mussafia effects found in Romance. Also, the more elaborated clause structure that we propose makes it possible to distinguish the clitic position from the position of the inflected verb and to maintain that the inflected verb is always formed in the standard way in these languages, by head-movement into a position containing the agreement affix (see Pollock 1989). In contrast, neither Tomaselli nor van Kemenade allows for a position reserved purely for verbal inflection; in such a framework, it is necessary to posit special rules for the formation of the inflected verb when it is “distant” from the clitic in examples like (42), an undesirable consequence that we are able to avoid.
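The surface orders discussed in sections 2.2 and 2.3 can be summarized in a small Python sketch. It is only an illustration of the linear generalizations, not an implementation of the syntactic analysis: the function names `romance_order` and `old_english_order`, and the flag `verb_in_c`, are hypothetical labels introduced here. The sketch assumes that Medieval Romance shows proclisis whenever some other element is first and enclisis otherwise, and that OE places the clitic before the verb in matrix declaratives but after it when the verb is in C°.

```python
from typing import List, Optional, Sequence


def romance_order(verb: str, clitics: Sequence[str],
                  first: Optional[str] = None,
                  rest: Sequence[str] = ()) -> List[str]:
    """Medieval Romance (hypothetical helper): ban on clitic-first orders."""
    clitic_list = list(clitics)
    if first is not None:
        # Another element is first, so [Cl + V] can form: proclisis.
        return [first, *clitic_list, verb, *rest]
    # Nothing precedes the verb: the verb is initial and the clitic follows it.
    return [verb, *clitic_list, *rest]


def old_english_order(verb: str, clitics: Sequence[str], first: str,
                      verb_in_c: bool,
                      rest: Sequence[str] = ()) -> List[str]:
    """Old English (hypothetical helper): V3 declaratives vs. V-in-C contexts."""
    clitic_list = list(clitics)
    if verb_in_c:
        # Wh-phrase, ne, or þa is first and V is in C°: the clitic follows V.
        return [first, verb, *clitic_list, *rest]
    # Matrix declarative: topic first, then clitic, then verb (V3 order).
    return [first, *clitic_list, verb, *rest]


if __name__ == "__main__":
    # (33a): topic first, proclitic te
    print(" ".join(romance_order("presta", ["te"], "Toutes ces choses",
                                 ["Nostre Sires"])))
    # (33b): verb-initial, enclitic le
    print(" ".join(romance_order("Voit", ["le"], rest=["li rois"])))
    # (39a): OE V3 declarative
    print(" ".join(old_english_order("worhte", ["him"], "God", False,
                                     ["þa reaf of fellum"])))
    # (40a): OE with ne in first position
    print(" ".join(old_english_order("geseah", ["hine"], "Ne", True,
                                     ["nan man"])))
```

The two functions differ only in what triggers the verb-clitic order, which mirrors the comparison drawn in the text: in Romance the trigger is the absence of any preceding element, while in OE it is movement of the verb to C°.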

3. Subject Positions

In this section, we will show how the subject positions are determined in the various languages which we have considered (we limit our attention throughout to non-θ subject positions, i.e., those outside VP). A central part of our discussion is based on the approach to Nominative-Case assignment advocated in Koopman & Sportiche (1991). Koopman & Sportiche propose that UG makes available two mechanisms of Nominative-Case assignment. Nominative Case can be assigned either under government, illustrated in (43a), or under agreement, illustrated in (43b):

(43) a. [X′ X° [YP NP Y′]]
b. [XP YP [X′ X°]]

Koopman & Sportiche consider that the choice between these two mechanisms is a “pure” parametric choice, in the sense that each language is in principle free to choose between these possibilities. We will suggest, however, that the choice of Nominative-assignment under agreement depends on other factors (in particular, on the nature of the Case-assigning head X°) and that the only pure parametric choice is that of selecting (43a) or not. In the context of the general assumption that Agr1° is the Nominative-Case assigner in all the languages we are concerned with (but see note 12 for a refinement of this assumption), if a language chooses the government option in (43a) then SpecAgr2′ will be the subject position; and if a language chooses the agreement option in (43b) then SpecAgr1′ will be the subject position. We further assume that the two options can be combined, making both SpecAgr1′ and SpecAgr2′ possible subject positions. We understand “subject position” to mean both Case-position (that is, a position to which Nominative is assigned) and A-position (although we are excluding θ-positions here, as we mentioned above).

The generalization that emerges from a consideration of the languages we have looked at is that wherever Agr1° is a clitic position, its specifier is a subject position. On the other hand, where Agr1° is a “pure” verbal position (in the sense that the only element that ever appears there is the verb), its specifier is a topic position. Suppose, then, that it is the fact that Agr1° may host a clitic that makes it possible for it to assign Nominative Case under agreement. We will return to this idea below. First, however, let us consider the evidence from the various languages that points to this conclusion.

Consider first the situation in Icelandic (and Yiddish; see section 1.3). The evidence we presented in section 1.1 shows that in this language Agr1° is a position which always hosts the inflected verb and that SpecAgr1′ is a topic (non-Case, A′) position, while SpecAgr2′ is a subject position. Schematically, then, we have the following:

(44) [Agr1P TOP [[Agr1° Vi] [Agr2P Subj. [[Agr2° ti] . . . ]]]]

This structure is common to main and embedded clauses. We can account for its basic properties in terms of Koopman & Sportiche’s system by saying that Agr1° assigns Nominative under government only. Note that Agr1° always contains the inflected verb and never contains a clitic in Icelandic; we propose that the lack of Case-assignment under agreement is related to this point.

Now compare German, on the basis of the analysis given in 2.1. In main clauses, we have the following:

(45) [CP TOP [[C° V] [Agr1P Subj. [[Agr1° Cl] [Agr2P Subj. . . .]]]]]

As in standard approaches, we take it that the verb moves to C° in main clauses and that SpecC′ is a topic position. Agr1° is the Wackernagel position and as such may contain one or more clitics. SpecAgr1′ is a potential subject position in German, and not a topic position (see section 2.1); this is the fundamental difference between German and Icelandic. SpecAgr2′ is also a possible subject position in German, as it is in Icelandic. In terms of Koopman & Sportiche’s system, we can say that in German Agr1° assigns Nominative both under government and under agreement. For this reason, SpecAgr1′ is a subject position rather than a topic position.
As mentioned earlier in the discussion, we believe that there is a nontrivial connection between the possible presence of a pronoun in Agr1° and the fact that SpecAgr1′ is a subject position.

There is a further complication concerning the topic positions in both German and Icelandic, that is, SpecC′ in German and SpecAgr1′ in Icelandic (this complication is probably common to all the V2 Germanic languages and to OF; see Roberts 1993, 2.3). These positions also display properties of subject positions. Apart from being available for topicalization of any XP, they accept elements which are clearly not topics: lexical expletives and non-topic subjects (both pronominal and nonpronominal); see Cardinaletti 1990b, 1994 for further arguments for this analysis.

We can treat this situation in terms of Rizzi’s (1991) proposal that any specifier that agrees in ϕ-features with its head can be an A-position (and therefore a subject position; other A-positions are all structural complements, not specifiers). This agreement can naturally be thought of as a coindexation relation between the subject in SpecX′ and X. We refer to this possibility henceforth as SPEC-HEAD coindexation: this substantive relation is a subcase of the structural relation of spec-head agreement. Hence, precisely where a subject NP appears in SpecAgr1′ in Icelandic, or in SpecC′ in German, the position can be assigned Nominative since there will be agreement in ϕ-features between the head and the NP. So, even in Icelandic, Nominative can be assigned under agreement in these conditions.12 Nominative-assignment under spec-head coindexation can thus be distinguished in principle from Nominative-assignment under spec-head agreement, since in the latter situation it is possible that the head which assigns Nominative is not coindexed with its specifier (on this point we differ from the proposals made by Koopman & Sportiche (1991)). In this sense, as we mentioned above, we do not take the possibility of Nominative-assignment under agreement given in (43b) to be an absolute parametric choice, but rather a possibility related to the nature of the Nominative-assigning head. On the status of the possibility of Nominative-assignment under government, see the following discussion.

Next, let us consider the more complex cases of OE and OF. Following the discussion in section 2.3, OE main clauses show the pattern in (46a), and OE embedded clauses that in (46b):

(46) a. [Agr1P TOP [[Agr1° Cl + Vi] [Agr2P Subj. [[Agr2° ti] . . . ]]]]
b. [Agr1P Subj. [[Agr1° Cl] [Agr2P Subj. [[Agr2° V] . . . ]]]]

Example (46b) clearly indicates that Agr1° can assign Nominative either under government or under agreement in OE, and so both SpecAgr1′ and SpecAgr2′ are possible subject positions (see section 2.3). So, for embedded clauses in OE, no problem arises. Structure (46a), which is motivated by the analysis of V3 orders in OE that we gave in section 2.3, leads to the following question: why is SpecAgr1′ a subject position in embedded clauses, but a topic position in main clauses? The fact that SpecAgr1′ is a subject position in embedded clauses follows from our proposal that the possible presence of a pronoun in Agr1° is correlated with the capacity to assign Case to SpecAgr1′. A pronoun can also occur in Agr1° in main clauses, as seen in (46a) (see section 2.3 for examples and discussion). However, Agr1° is also a position for the inflected verb in (46a). This must be the factor which makes the difference: where Agr1° is a position to which the inflected verb can move, its specifier is a subject position only where the two are coindexed (the same is true for C° in German, as we saw in the preceding discussion). If Agr1° hosts clitics but not inflected verbs, its specifier is uniquely a subject position (see the following for a slight refinement).

Finally, let us briefly consider the situation in OF, on the basis of the analyses proposed in sections 1.2 and 2.2 ((47a) is the structure of a main clause and (47b) that of an embedded clause):

(47) a. [CP TOP [[C° Cl + Vi] [Agr1P Subj. [[Agr1° ti] [Agr2P Subj. . . .]]]]]
b. [Agr1P Subj. [[Agr1° Cl + V] [Agr2P Subj. . . .]]]

In both main and embedded clauses, Agr1° assigns Nominative under both government and agreement. Thus, both SpecAgr1′ and SpecAgr2′ are subject positions. There is an important difference between OF and what we saw above for OE. In OE, the specifier of a head containing the combination [Cl + V] (SpecAgr1′ in matrix clauses) is a topic position; in contrast, the seemingly identical head position, [Agr1° Cl + V], has a subject-position specifier in OF embedded clauses, as (47b) shows. We propose that the difference between the two cases lies in the different status of V-to-Agr1 movement in OF as compared to OE, which we already noted in 2.3. In OF, V moves to Agr1° in all embedded clauses and all root clauses where no violation of the ban on clitic-first orders will result, so this movement is clearly the unmarked option. In contrast, in OE V moves to Agr1° only in root clauses. So V-to-Agr1 is a more general process in OF than in OE, since the verb forms a unit with the clitic wherever it can. This is clearly not the case in OE. Since the clitic in Agr1° triggers V-movement to this position in OF, we can consider Agr1° to be essentially a clitic position; hence, its specifier is a subject position. Conversely, in OE the head [Agr1° Cl + V] is formed by an instance of verb-movement that is not triggered by the clitic.
Because of this, we consider these cases of Agr1° to be verb positions, and so the account given earlier (after (46)) applies.13

To sum up, we can account for the different subject positions attested in the languages we have discussed. Icelandic allows Nominative-assignment only under government, while the other languages discussed here allow both possibilities. We arrived at these conclusions by holding constant the following two assumptions:

(48) a. Agr1° assigns Nominative Case (although it may not be the only Nominative-assigner; see note 12).
b. The nature of a specifier position depends on the possible content of the head specified.

Of these assumptions, (48b) deserves further comment. What emerged from our discussion is that Agr1° (and perhaps also C°; see note 12) assigns Case under Spec-head agreement where (a) it contains an inflected verb that is coindexed with the NP in the specifier position or (b) it is a clitic position.

This general conclusion is consistent with the spirit of Rizzi’s (1991) proposal that A-position specifiers are in a Spec-head relation with Agr. As we mentioned at the beginning of this section, this in turn suggests that the possibility of assigning Nominative under Spec-head agreement is not a “pure” parameter but, rather, is related to the intrinsic properties of the Nominative-assigner.

In contrast, the possibility of assigning Nominative under government may be a “pure” parametric choice. We saw evidence in section 2.1 that in Dutch Nominative Case can only be assigned under agreement (see (31) vs. (32)). In Dutch, Agr1° is clearly a clitic position, as Zwart’s (1991) data show; hence, SpecAgr1′ is a subject position. However, SpecAgr2′ cannot contain the subject, as (32) shows. We interpret this to mean that the government option is not chosen in Dutch. Where the verb raises to C°, SpecAgr1′ remains a subject position, since Agr1° is not carried along by this movement; as in German, the verb “skips” Agr1° on its way to C°. So Agr1° continues to assign Nominative to its specifier in these conditions. Dutch shows the same evidence as German and Icelandic that SpecC′ may be a subject position (Travis 1984); we can treat these facts in exactly the same way as we have treated the parallel German facts (see note 12).

To see that this system has some generality, and is not limited to languages with the “double-Agr” structure, let us briefly consider the situation in languages with one Agr projection. Here the same possibilities of Nominative-Case assignment are available, and they are determined in the same way in that Nominative-assignment under agreement depends on properties of (the single) Agr while Nominative-assignment under government is a “pure” parametric choice. Modern Standard Italian is an example of a language in which Nominative is assigned under agreement only.
This is consistent with our proposals in that Agr in Italian can contain both complement clitics and the inflected verb, as is well known. The possibility of Nominative-assignment under government does not exist, as the ungrammaticality of the following example shows (see Roberts 1993b; Rizzi 1996):

(49) *Ha Gianni fatto questo?
has G. done this
‘Has John done this?’

(Here we are only concerned with the Case properties of Agr°; it is possible that “free inversion” involves Nominative-assignment by T° under government to a VP-internal subject position; see Rizzi 1990a.) Where Nominative-assignment under government is impossible in a system with only one Agr-projection, verb-second and other kinds of inversion around a nominal subject are impossible (see Rizzi and Roberts 1996 [this volume, Chapter 9] on French). More generally, it is probable that systems with just one Agr-projection must allow Nominative-assignment under agreement since they are unable to exploit the government configuration in many cases (this conclusion depends on the assumption that SpecT′ is an A′-position; see section 1.1).

English, like Italian, has only one Agr-projection. Nominative-assignment under agreement is possible, since there is always a subject in SpecAgr′, triggering the required spec-head coindexation. Nominative-assignment under government is also possible in English:

(50) Has John done that?

The existence of Nominative-assignment under government in English shows that this possibility holds no necessary relationship to full verb-second.

Tables 12.1 and 12.2 sum up the properties of the different grammatical systems that we have discussed up to now. Recall that we have proposed that the possibility of assigning NOM under agreement is not a “pure” parameter but, rather, a function of parametrized properties of the NOM-assigning head.

Table 12.1 Variation among double-Agr languages

Language                        SpecAgr1′                           Agr1°                    SpecAgr2′
Icelandic                       Topic                               Governed nom             Subject
Yiddish                         Topic                               Governed nom             Subject
Old French                      Subject                             Governed/Agreeing nom    Subject
German                          Subject                             Governed/Agreeing nom    Subject
Dutch                           Subject                             Agreeing nom             – (a)
Old English, Old High German    Topic (main), Subject (embedded)    Governed/Agreeing nom    Subject

a. The position is never filled.

Table 12.2 Parameters proposed

Parameter        Positive Value                               Negative Value
Agr-recursion    See languages in Table 12.1                  Mod. English, Mod. French, Mod. Italian
V2               Germanic (other than English), Old French    Mod. Romance, English
Governed NOM     Germanic (other than Dutch), Old French      Mod. Romance, Dutch


4. Null Subjects

In this section, we will show how the analysis developed in the previous sections can account for certain aspects of the distribution of null subjects in V2 languages. We adopt the conception of the pro-module outlined in Rizzi (1986). The most important aspect of this system is the fact that null subjects depend on Case-marking for formal licensing in the sense that a null subject must occupy a position of potential (although not necessarily actual) Case-assignment. Because of this, what we saw in the previous sections concerning the interaction of the double-Agr clause structure with Nominative-assignment has interesting consequences for the distribution of null subjects.

4.1 Expletive Null Subjects

In Icelandic and German, there is an alternation between an overt expletive and a null expletive. This alternation is determined by several factors, most importantly the licensing condition for null subjects. In these languages, null subjects can only be licensed under government or under “pure” spec-head agreement (in the sense discussed in Section 3). This gives rise to complementary distribution between the null expletive and the lexical expletive. Wherever it is possible for the expletive null subject to appear, it must appear and so the lexical expletive is excluded (see Cardinaletti 1990a, 1994 for an account of this). Wherever the expletive null subject is not licensed, the overt expletive appears. This situation is illustrated by the following pairs of German, (51), and Icelandic, (52), matrix sentences (see note 14):

(51) a. *(Es) wurde getanzt.
        it was danced
     b. Gestern wurde (*es) getanzt.
        yesterday was it danced
        ‘Yesterday there was dancing.’

(52) a. *(Það) var dansað.
        it was danced
     b. Í gær var (*það) dansað.
        yesterday was it danced
        ‘Yesterday there was dancing.’

In the (a)-examples, the expletive is in topic position (although it is not a topic, see Section 3 and the following discussion). Since this is a position where null subjects are not licensed, the expletive pro cannot appear, and hence the overt expletive is required. Null subjects cannot be licensed in this position since this is a configuration in which Nominative is assigned under Spec-head coindexation (where the head is C° in German and Agr1° in Icelandic); as we will see in more detail below, Spec-head coindexation

does not license a null subject. In the (b)-examples, on the other hand, the expletive appears in a position where null subjects are licensed, and hence the lexical expletive is ungrammatical.

In Icelandic, the situation in matrix clauses just illustrated is also found in all types of embedded clauses, as long as the thematic subject is indefinite, and is as such able to remain in VP where it receives Partitive Case (in the examples below, we use enginn, ‘nobody’). As the examples show, the null expletive is in SpecAgr2′ (we are grateful to Hoski Thráinsson for providing us with these examples):

(53) a. Ég harma að þegar skuli pro enginn hafa lesið þessa bók.
        I regret that already should nobody have read this book
        ‘I regret that nobody should have read this book already.’
     b. Ég spurði hvort þegar hefði pro enginn lesið þessa bók.
        I asked whether already had nobody read this book
        ‘I asked whether nobody had read this book already.’
     c. sú staðreynd að þegar hefur pro enginn lesið þessa bók
        the fact that already has nobody read this book
        ‘the fact that nobody has read this book already’
     d. bókin sem þegar hefur pro enginn lesið
        book-the that already has nobody read
        ‘the book that nobody has read already’

As we said, overt expletives cannot appear where a null expletive is possible (Thráinsson, personal communication):

(54) a. *Ég held/harma að þegar skuli það María/enginn hafa lesið þessa bók.
        I believe/regret that already should it M./nobody have read this book
     b. *Ég spurði hvort þegar hefði það María/enginn lesið þessa bók.
        I asked whether already had it M./nobody read this book
     c. *sú staðreynd að þegar hefur það María/enginn lesið þessa bók
        the fact that already has it M./nobody read this book
     d. *bókin sem þegar hefur það María/enginn lesið
        book-the that already has it M./nobody read

In (53), the null expletive occupies SpecAgr2′. As we saw in Section 3, this is a subject position in the sense that it is a position which receives Nominative Case. Hence the licensing condition on pro is satisfied here, and the null expletive is present, making the overt expletive impossible as in (54). The reverse situation is found in SpecAgr1′. This can be seen in examples like the following (Thráinsson, personal communication):

(55) a. Ég harma að *(það_i) skuli t_i enginn hafa lesið þessa bók.
        I regret that (it) should nobody have read this book
        ‘I regret that nobody should have read this book.’

     b. Ég spurði hvort (það_i) hefði t_i enginn lesið þessa bók.
        I asked whether (it) had nobody read this book
        ‘I asked whether nobody had read this book.’
     c. sú staðreynd að *(það_i) hefur t_i enginn lesið þessa bók
        the fact that (it) has nobody read this book
        ‘the fact that nobody has read this book’
     d. bókin sem *(það_i) hefur t_i enginn lesið
        book-the that (it) has nobody read
        ‘the book that nobody has read’

(There is a complication concerning (55b), which improves this example in comparison with the others; see the following discussion.) Moreover, when there is no subject, for example, in impersonal passives, það is able to appear in SpecAgr1′ (example from Rögnvaldsson 1984, 17–18):

(56) Ég veit að *(það_i) er t_i ekið vinstra megin í Ástralíu.
     I know that (it) is driven on the left in Australia
     ‘I know that people drive on the left in Australia.’

In both (55) and (56), we see that the overt expletive is required, and pro is impossible. The null subject is ungrammatical in SpecAgr1′ because the only way in which Nominative Case can be assigned to this position is by spec-head coindexation and not by “pure” spec-head agreement or by government. In general, this implies that spec-head coindexation is not adequate for licensing pro. This approach is confirmed by the behavior of expletives in interrogatives where the inflected verb is raised to C°. In such examples, the overt expletive is impossible and the null expletive is required:

(57) Var (*það) dansað?
     was (it) danced
     ‘Was there dancing?’

Here the presence of the verb, containing Agr1°, in C° makes it possible to assign Nominative to SpecAgr1′ under government. Hence pro is possible. In (55), as in (53), there is a definiteness effect. See the ungrammaticality of the parallel examples where the thematic subject is definite (Thráinsson, personal communication):

(58) a. *Ég harma að það skuli María hafa lesið þessa bók.
        I regret that it should M. have read this book
     b. *Ég spurði hvort það hefði María lesið þessa bók.
        I asked whether it had M. read this book
     c. *sú staðreynd að það hefur María lesið þessa bók
        the fact that it has M. read this book
     d. *bókin sem það hefur María lesið
        book-the that it has M. read

We attribute the ungrammaticality of (58) to the fact that there are two NPs that compete for Nominative Case: það and María. In contrast, the indefinite enginn in (53) and (55) is able to receive Partitive Case (see Belletti 1988; Vikner 1995), making Nominative Case available for það (see note 15). Given these considerations, it is clear that no definite NP can appear lower than SpecAgr2′. However, it is clear from the discussion in section 1.1 that there is no definiteness effect with embedded V2 in Icelandic. Therefore, this consideration further confirms that we cannot adopt an analysis of Icelandic embedded V2 of the type proposed by Rögnvaldsson and Thráinsson (1990), and by Diesing (1988, 1990) and Santorini (1989) for Yiddish embedded V2 (moreover, given the general similarities between Yiddish and Icelandic noted in section 1.3, this analysis probably does not hold for Yiddish, either). This type of analysis relies on the idea that the standard SpecI′ (the canonical subject position of English) is able to be a topic position and that the subject can remain in an “internal” position. However, the contrasts between (53) and (54) and between (55) and (58) show that the “internal” position is subject to a definiteness effect. All this supports our analysis of embedded V2 in terms of the double-Agr structure, and so supports the postulation of that structure as a general possibility.

Icelandic also allows an empty SpecAgr1′ in embedded clauses. This is only possible where C° is able to license an expletive null subject and only where the subject is indefinite. 
The conditions under which C° can license an expletive null subject are rather difficult to understand; however, they correspond closely to those where in French Stylistic Inversion an expletive null subject may appear, being essentially cases where C° is [+wh] or subjunctive (see Kayne and Pollock 1978; Pollock 1986) (examples (59a–c) from Rögnvaldsson and Thráinsson 1990, 31; example (59d) from Rögnvaldsson 1984, 17–18; example (60) from Kayne and Pollock 1978):

(59) a. *Ég spurði hvort pro hefði María/hún lesið þessa bók.
        I asked whether had M./she read this book
     b. ?Ég spurði hvort pro_i hefði t_i enginn lesið þessa bók.
        I asked whether had nobody read this book
        ‘I asked whether anybody had read this book.’
     c. Hann spurði hvort pro_i hefðu t_i komið margir gestir.
        he asked whether had come many guests
        ‘He asked whether many guests had come.’
     d. Ég veit að pro_i væri t_i ekið vinstra megin í Ástralíu.
        I know that is driven on the left in Australia
        ‘I know that people drive on the left in Australia.’

(60) J’exige que pro soit procédé au réexamen de cette question.
     I require that be proceeded to-the reexamination of this question
     ‘I require that this question be reexamined.’

As the examples in (59) indicate, the definiteness effect here is due to the fact that pro is base-generated in SpecAgr2′ and raises to SpecAgr1′. The trace of pro in SpecAgr2′ blocks movement of a definite subject into that position.

One aspect of sentences of this kind may seem problematic, given what we said above: in at least some cases, null and overt expletive subjects are apparently not in complementary distribution. This can be seen by comparing examples like (55b) with (59b); we repeat (55b) for convenience here:

(55) b. Ég spurði hvort það hefði t_i enginn lesið þessa bók.
        I asked whether it had nobody read this book
        ‘I asked whether anybody had read this book.’

This noncomplementary distribution of það and pro is not inconsistent with the basic idea that an overt expletive cannot appear wherever a null expletive can be licensed. The crucial point here is that there are two different potential licensers for expletive pro: Agr1° and C° (where C° is [+wh] or subjunctive). Suppose that each of these heads is only able to license pro under government, so Agr1° licenses pro in SpecAgr2′ and C° licenses pro in SpecAgr1′. When Agr1° licenses pro in SpecAgr2′, það cannot appear in this position and so must move to SpecAgr1′. When C° licenses pro in SpecAgr1′, það cannot appear there for exactly the same reason. This is what we see in (59). So we arrive at the result that the only expletive possible in SpecAgr2′ is the null one, while in SpecAgr1′ either það or pro is possible depending on which head, Agr1° or C°, is the licenser. A similar situation occurs in French with Stylistic Inversion; alongside examples like (60), the comparable case with the overt expletive il is possible:

(61) J’exige qu’il soit procédé au réexamen de cette question.
     I require that it be proceeded to-the reexamination of this question
     ‘I require that this question be reexamined.’

The preceding paragraphs show that our proposed clause structure for Icelandic can account for the distribution of null and overt expletives in both matrix and embedded clauses. A major advantage of our account is that it explicitly relates the phenomenon of embedded verb-second to the distribution of these expletives in embedded clauses.

In German, the situation regarding null expletives is largely comparable to what we have just seen for Icelandic. A null expletive is possible in SpecAgr1′ as shown by (51b) and the following:

(62) a. . . . daß pro es ihm ein Mann gegeben hat.
        that it to-him a man given has
        ‘. . . that a man has given it to him.’
     b. . . . daß pro es ihm dieser Mann gegeben hat.
        that it to-him this man given has
        ‘. . . that this man has given it to him.’

In (62a) the thematic subject, ein Mann, is indefinite and so able to receive Case in VP independently of Agr1°. Hence pro can appear in SpecAgr1′. In (62b), the thematic subject is definite, and, as its position relative to the clitics shows, it occupies SpecAgr2′. In this example, then, pro must occupy SpecAgr1′. Note that here pro does not receive a Case; nevertheless it is licensed since it occupies a potential Case position (in the sense that Nominative can be assigned under spec-head agreement here; see section 3).

As we saw for SpecAgr1′ in Icelandic, no null subject can be licensed by spec-head coindexation. In German the same situation holds at the level of SpecC′ (it does not hold at SpecAgr1′ as this position receives Nominative Case under “pure” spec-head agreement in German). This is illustrated by both (51a) and the following:

(63) Es_i hat t_i ihm ein/dieser Mann das Buch gegeben.
     it has him a/this man the book given
     ‘A/This man has given him the book.’

In this section we have shown how the proposals we made in previous sections regarding both the clause structure and the modes of Nominative assignment in German and Icelandic can provide a straightforward account of the distribution of overt and null expletives, a matter that has long been regarded as problematic. This account depends crucially on the assumption that pro can be licensed in configurations of potential Case assignment, with the exception of those that depend on spec-head coindexation.

4.2 Referential Null Subjects

We next consider the interaction of our proposals with the analysis of referential null subjects, concentrating on Old French. As we saw in section 1.2, OF was a verb-second language, and it is well known that the distribution of null subjects interacted with the movement of the verb (see Thurneysen 1892; von Wartburg 1934; Price 1971; Benincà 1986; Foulet 1982; Benincà 1983–84, 1989; Vanelli, Renzi, and Benincà 1985; Adams 1987a, b). 
Null subjects are frequently found with verb second in OF. The following lines from Le Charroi de Nîmes illustrate this:

(64) a. Muetes de chiens font avec els mener. (Compl V)
        troops of dogs make (they) with them bring
        ‘They have troops of dogs brought with them.’
     b. Par Petit Pont sont en Paris entré. (PP V)
        by Petit-Pont are (they) in Paris come
        ‘They entered Paris by the Petit-Pont.’
     c. Li cuens Guillelmes fu molt gentix et ber. (Subj V)
        the count G. was very kind and good
        ‘Count Guillelmes was very kind and good.’

Beginning with Vanelli, Renzi, and Benincà (1985), various authors have made an analytic connection between the verb-movement required for verb second and the licensing of referential null subjects in this language. We concur with this basic approach, and, following Roberts (1993b), we interpret the connection in terms of the relationship between Case theory and formal licensing. So the connection between null subjects and verb-movement in OF is due to the fact that verb-movement creates the configuration in which pro can be formally licensed.

To the extent that verb second is crosslinguistically a root phenomenon, this approach leads to the expectation that OF null subjects are a root phenomenon. However, there is evidence (from Adams 1988a,b,c; Dupuis 1988, 1989; Hirschbühler and Junker 1988; Hirschbühler 1990; and Vance 1988) that null subjects were possible in a range of embedded clauses, including, most importantly, Wh-complements (they are also possible in complements to bridge verbs, which we argued in section 1.1 involve CP-recursion). Adams gives the following examples of V1 and V2 orders with null subjects in Wh-complements (1988b, 5–6 (9), (11); 1988c, 10–11 (9), (11)):

(65) a. Je sui le sire a cui [ volez parler].
        I am the lord to whom (you) wish to-speak
        ‘I am the lord to whom you wish to speak.’
     b. L’espee dont [ s’estoit ocis]
        the sword by-which himself-was killed
        ‘the sword that he killed himself with’
     c. quant [ vit le roi]
        when (he) saw the king
        ‘when he saw the king’

Continuing to assume that the inflected verb cannot move to a [+wh] C°, we are led to the conclusion that the verb is in Agr1° in these examples. If this is so, we need to explain where the null subject is and how it is licensed. The same question emerges from a consideration of the null subjects combined with Stylistic Fronting that we saw in section 1.2 (see (12)). Hirschbühler (1990) presents evidence that referential embedded null subjects are found in just one phase of the OF period. He studies twelfth- and thirteenth-century prose and verse and concludes that there are essentially two systems at work, according to the nature of the text. One system, which we call the “conservative” system, allows null subjects in embedded clauses fairly freely, as in (65). This system is typical of twelfth-century prose texts and of both twelfth- and thirteenth-century verse. In contrast, in thirteenth-century prose texts, embedded V1 orders with null subjects are

limited to two types of rather fixed expressions. Call this the “advanced” system. Here are some representative examples of the advanced system:

(66) a. se ne fu chiés le Roi Mehaignieé (Hirschbühler 1990, 7 (14))
        if not were “chez” King M.
        ‘if it were not at the court of King M.’
     b. quant vint a cele hore que . . . (Hirschbühler 1990, 7 (15a))
        when came to that time that
        ‘when the time came that . . . ’

These are clearly cases of expletive null subjects. Moreover, they seem to represent an unproductive option. We propose, on the basis of Hirschbühler’s data, that the “double Agr” system was at work in the more conservative texts. So, for instance, we assign the following structure to the embedded clause in (65a):

(67) [CP [XP a cui] [C′ [C° [+wh]] [Agr1P [Spec pro_i] [Agr1′ [Agr1° volez_k] [Agr2P [Spec t_i] [Agr2′ [Agr2° t_k] [TP t_k parler]]]]]]]

Here the verb appears in Agr1° and licenses the null subject, the pro in SpecAgr1′ under agreement. The verb moves from Agr2° to Agr1° to license the subject position in SpecAgr1′. So SpecAgr1′ is a subject position. As we saw in sections 1.2 and 3, SpecAgr2′ is also a subject position. Putting together what we have just seen with the results of section 1.2, our analysis of OF runs as follows. There may be a very early period, essentially represented by the Quatre Livres du Roi, in which OF is like Icelandic in that SpecAgr1′ was a topic position, and embedded topicalization was possible. However, most of the texts of “conservative” OF do not give sufficient evidence for embedded topicalization, as opposed to Stylistic Fronting, to support this analysis. For conservative OF (i.e., leaving aside the system that may be instantiated by the Quatre Livres du Roi), we propose that SpecAgr1′ was a subject position. So Nominative Case could be assigned to this position under spec-head agreement. On this view, “conservative”

OF is just like German with respect to the nature of both SpecAgr1′ and SpecAgr2′; the differences between the two languages are due to the independent existence of both Stylistic Fronting and referential null subjects in OF. In main clauses at this period, the well-known connection between verb-second and null subjects is retained since V is in C° and null subjects must be in SpecAgr1′.

Third, there is the “innovative” (i.e., “advanced”) system of thirteenth-century prose. Here, only expletive null subjects are allowed in embedded contexts. The obvious analysis is that Agr1° is no longer able to license a referential null subject. We think this is because Agr1° is not present at all at this stage. This leaves Agr2° and C° as potential licensers of null subjects. Of these, Agr2° is unable to license referential null subjects alone (i.e., without moving to C°). As a result, C° is the only potential licenser in embedded clauses, and so only expletive null subjects are available in these OF texts. Agr(2)° (i.e., the sole Agr after the thirteenth century) is unable to license a null subject under agreement in embedded clauses, but it continues to license null subjects under government when moved to C° in main clauses, giving rise to the clear asymmetries between main and embedded clauses found in thirteenth-century prose. We suggest that Agr(2)° is unable to license pro under agreement as the inflectional morphology it contains is too “poor”; for licensing under agreement genuinely rich morphology is required, as, for example, in Italian. French verbal morphology has been relatively impoverished since roughly the twelfth century (see Foulet 1935–1936, Roberts 1993b for discussion). This difference in the licensing properties of Agr(2)° may be related to the suggestion made in section 3 that relations that hold under spec-head agreement depend on the content of the head, while relations that hold under government are “pure” parametric choices. 
Note that this idea is consistent with the proposal, which we have seen several times already, that spec-head coindexation does not suffice for licensing null subjects.

Finally, we need to say something about why and how Agr1P was lost. Two, possibly related, factors are relevant to this question. First, there appears to be a relation between the loss of Agr1°, instantiated as the loss of embedded referential null subjects, and the loss of Tobler-Mussafia effects (see section 2.2). De Kok (1985, 93) gives early examples of clitic-first orders in yes/no questions in exactly the thirteenth-century prose texts that Hirschbühler (1990) uses to establish the development of the “advanced” system of OF null subjects. This diachronic correlation is in need of further investigation and documentation, but, to the extent that it holds up, it can be accounted for straightforwardly in our terms as the loss of Agr1P.

A second development that may be relevant is the loss of the morphological case system. It is well known that OF had a morphological case system that distinguished nominative from non-nominative in NPs headed by (most) masculine nouns. This system was lost between the twelfth and the fourteenth centuries (Foulet 1982, 32–33). Now, as we mentioned in section 3,

the presence of Agr1P makes Nominative-assignment under government possible independently of movement to C° and, hence, allows inversion to be a non-root phenomenon. Suppose, as seems intuitively reasonable, that non-root Nominative-assignment under government is linked to generalized nominative morphology (i.e., nominative morphology not restricted to the pronominal system). One way to think of this is by taking Agr1P to be really NomP, a projection of a head whose sole function is to assign Case. Then, the loss of the morphological case system triggers the loss of Agr1P/NomP. This proposal has the virtue of tying the loss of Agr1P/NomP to a very well-known development in the history of French. The Germanic languages offer other examples of a possible relation between a morphological case system and Agr1P/NomP: German and Icelandic both have both, while English and the Mainland Scandinavian languages have neither. (Dutch is unusual in that it has very little morphological Case and Agr1P/NomP.)

In conclusion, the preceding analysis of OF accomplishes two things: first, it allows us to retain the idea that verb-movement, either to C° or to Agr1°, was intimately connected with the licensing of referential null subjects in this language; second, we can account for a range of the word orders attested at different periods of OF (although up to now we have given no analysis of Stylistic Fronting; see the appendix). A further point is that our approach allows us to explain the restrictions on null subjects that developed during the OF period and to relate them to changes in word-order possibilities, to changes in clitic positions, and, possibly, to the loss of morphological case.
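
The licensing generalization that runs through this section can be restated as a small decision procedure. The following Python sketch is offered only as an illustration and is not part of the analysis itself: the names `Mode`, `licenses_pro`, and `expletive_form` are hypothetical, and the sketch encodes just two claims made above, namely that pro is formally licensed under government or under “pure” spec-head agreement but never under spec-head coindexation, and that the overt expletive appears only where the null expletive is not licensed.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for how Nominative can reach a specifier position."""
    GOVERNMENT = "government"
    PURE_AGREEMENT = "pure spec-head agreement"
    COINDEXATION = "spec-head coindexation"


def licenses_pro(modes):
    """Return True if at least one available mode formally licenses pro.

    Assumption taken from the text: government and "pure" spec-head
    agreement license pro; spec-head coindexation never does.
    """
    return any(m in (Mode.GOVERNMENT, Mode.PURE_AGREEMENT) for m in modes)


def expletive_form(modes):
    """Null and overt expletives are treated as complementary:
    the overt form surfaces only where pro is not licensed."""
    return "pro" if licenses_pro(modes) else "overt"


# (51a)/(52a): expletive in topic position, Case only via coindexation.
print(expletive_form({Mode.COINDEXATION}))  # overt
# (57): the verb in C° governs SpecAgr1′, so the null expletive is required.
print(expletive_form({Mode.GOVERNMENT}))    # pro
```

The sketch deliberately leaves out the apparent non-complementarity of (55b) and (59b), where the outcome depends on which head, Agr1° or C°, acts as the licenser; modeling that would require representing the licensing head as well as the mode.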

5. Conclusion

In this essay, we have proposed that some languages have a more complex clause structure than has usually been supposed, in that there are two Agr-projections available. These two projections locate properties of Agr° in different positions: in particular, clitics can be thought of as occupying a position which is distinct from that of the finite verbal agreement morphology. By means of this simple assumption, we have been able to give a straightforward account of a range of phenomena involving the interaction of verb-movement, clitic-placement, and Nominative-Case assignment.

On the empirical level, it emerges from our analysis that certain phenomena are characteristic of the “double-Agr” system. These are (i) generalized embedded topicalization; (ii) Tobler-Mussafia effects; and (iii) phenomena involving second-position clitics more generally (e.g., the V3 orders of OE and OHG, as well as the “Wackernagel effects” of Modern German). If the suggestion made at the end of the previous section that the existence of Agr1P is closely connected to the existence of generalized morphological nominative case is correct, then we make the further prediction that properties (i)–(iii) correlate with the presence of morphological nominative case.

On the theoretical level, we have developed a theory of subject positions. Taking as our starting point the proposals in Koopman and Sportiche

(1991), we have suggested that the inventory of subject positions in a given language is a function of the possible modes of Nominative-assignment. We elaborated Koopman and Sportiche’s proposal that Nominative Case can be assigned under either government or spec-head agreement. First, we suggested that spec-head coindexation, although a subcase of spec-head agreement, should be distinguished from spec-head agreement. Spec-head coindexation may allow Nominative-assignment to positions that are otherwise A′-positions (see Rizzi 1991). Also, this idea has consequences for the theory of null subjects, since spec-head coindexation never formally licenses a null subject. Second, we proposed that Nominative-assignment under agreement is not a “pure” parametric choice but, instead, is determined by the content of the Nominative-assigning head (which may be subject to parametric variation). In contrast, the possibility of Nominative-assignment under government appears to be a parametric choice.

Of course, various questions remain open. The most pressing of these concerns the factors that determine the movements of the inflected verb to the various functional-head positions. In terms of the distinction between L-related and non-L-related heads (see Chomsky 1995), since both C° and Agr1° would be non-L-related heads, while Agr2° would be L-related, the question becomes: what determines verb-movement to non-L-related heads? We hope that certain approaches to answering this question have emerged in the foregoing.

Postscript (2002)

The ideas presented in this chapter were first developed in 1989–1990. The material was submitted as an abstract to the GLOW Colloquium held at Cambridge University in 1990, and was accepted as an alternate paper. The essay was completed in its present form in 1990–1991, with a view to submission to the proceedings of the 1990 GLOW Colloquium. However, this volume failed to materialize, and so the paper, despite becoming somewhat well known, has not been published until now. The reason for publishing this essay now, despite the fact that it uses assumptions and technology that are at least ten years old, is that it was one of the first attempts at exploring the “structural cartography” of the clause at the CP-level. In fact, one of the goals of the essay was to extend the results of Pollock’s (1989) “split-Infl” analysis of English and French clause structure to the structural level immediately above the traditional IP. In this respect, then, we are very pleased that the paper has been accepted for publication in the first volume devoted to the structural cartography of the clause.

The essay deals with four principal empirical areas, corresponding to each of the four main sections: (embedded) verb-second; clitic-second; subject positions; and, finally, null subjects. In this postscript, we briefly indicate some of the developments in each of these areas since this essay

was written. First, however, we should say something about its basic idea: that there exists an Agreement projection occupying a structural position in between the traditional C and the traditional I, and that the head of this projection has the typical properties of such elements (it can attract the inflected verb, clitics and subjects; and it can assign Nominative Case). This proposal combines two strands that have emerged as distinct trends since the essay was written; we can call these the “split-Agr” idea and the “split-C” idea. Let us briefly look at these, in turn.

Numerous proposals for splitting Agr into categories such as Person, Number, and Gender began to emerge very quickly after Pollock’s initial split-Infl proposal. Shlonsky (1989) proposed a split Agr on the basis of Hebrew data. Most recently, the idea that “subject agreement” may correspond to up to four separate projections has been defended in great detail on the basis of data from Italian dialects by Poletto (2000) and Manzini and Savoia (2005). A distinct proposal was made in Chomsky (1991), cited in the early paper as Chomsky (1989) and later reprinted as chapter 2 of Chomsky (1995): that subject-agreement and object-agreement should be distinguished. Despite the conceptual argument made against the postulation of Agreement projections in Chomsky (1995, 4.10), there seem to be strong empirical reasons to postulate such positions in many languages, at least as hosts for clitics, as the work by Poletto and by Manzini and Savoia shows.

The idea of a “split-Comp” of one kind or another was proposed as early as Chomsky (1977) but has received greater impetus in recent theoretical work. 
Shlonsky (1992, 1994) proposed an Agr-C position with many properties similar to the Agr1 position argued for here; Shlonsky exploits this position, in particular, to account for the intricate distribution of clitic and non-clitic pronouns in West Flemish (see also Haegeman 1992). The most influential recent proposal is Rizzi (1997). According to this proposal, the traditional C is split into Force and Finiteness, the former being the category that interfaces with discourse in matrix clauses and with a selecting predicate in embedded clauses, and the latter being the category that interfaces with the I-system. It is thus natural to think that Force is the structurally highest C-projection and that Fin is the structurally lowest, because it is the category that selects IP. Rizzi provides evidence that intervening between Force and Fin are a Focus category and a potentially unlimited number of Topic Phrases. Finally, Rizzi also proposes that interspersed among the Focus and Topic Phrases there may be Agreement Phrases of various kinds. It is clear, then, that Rizzi’s system shares many features with the proposals made here, although it is considerably richer and is largely aimed at a different empirical domain (the nature of “left-periphery” elements in Italian and English and their interactions with subject extraction). Tentatively, one can identify Rizzi’s Fin position with our Agr1; note that ascribing the ability to assign Nominative Case to this position creates a symmetry with the Case-properties of T in the I-system.

Turning now to the topics of the specific sections, a prevalent view of the nature of embedded verb second of the type found in Yiddish and Icelandic has been that this construction involves V2 at the IP-level, with SpecIP acting as the position able to host any fronted XP (i.e., as an A′-position). This idea was proposed by both Diesing and Santorini for Yiddish (see section 1.3 and the references given there). This kind of analysis has also been widely assumed in discussions of “transitive expletive” constructions in Icelandic (see Bobaljik 1995; Chomsky 1995, ch. 4; Bobaljik and Jonas 1996, etc.), although the considerations raised here in section 4 suggest that this assumption may not be warranted. More generally, the analysis proposed here makes it possible to take the traditional SpecIP to be always and only an A-position (in more recent technical terms, a position to which DPs bearing agreement features are attracted). Clitic-second effects have often been treated as phonological phenomena. Many of the contributions in Halpern and Zwicky (1995) explicitly propose this, often appealing to a PF repair strategy such as Prosodic Inversion to place clitics, which otherwise would be illicitly in first position at PF, in the required second position. The most thorough and convincing rebuttal of this approach to second-position clitics in Slavic is Starke (1993). Madeira (1993) proposed an analysis of enclisis in European Portuguese in terms that are largely compatible with the proposals made here. The approach advocated here makes appeal to phonological processes in second-position clitic placement, as in other cases of clitic placement, unnecessary. During the 1990s a great deal of important work was done on the typology of weak and clitic pronouns, considerably improving our understanding of this area. 
The most important publications in this connection are Cardinaletti and Starke (1996, 1999); see also the chapters in van Riemsdijk (1999). It is natural that the proposal to split IP into various component projections should have led to the postulation of various subject positions. Cardinaletti (1997) pursues the ideas put forward here, arguing in detail that nonpronominal subjects in Italian and a number of other languages appear in a higher position than do various pronominal and null subjects. McCloskey (1996) argues that the subject position in Irish, a VSO language, is lower than the Tense position. Haider (1995), following Grewendorf (1989), proposes that German has no designated subject position; subjects are able to remain in their VP-internal position. This possibility is not considered here and would have a number of implications for the analyses of German that are proposed. Regarding null subjects, there has been a tendency with the advent of the minimalist program to reconsider the status of empty categories. Chomsky’s (1993, 1995) proposal that traces are actually copies of the moved material implies that the earlier traces have no status in the current theory. This, in turn, casts doubt on the existence of the pronominal empty categories PRO and pro. Two important recent proposals for theories of control that do away with PRO are Hornstein (1999) and Manzini and Roussou (2000).

Regarding pro, Borer (1986) proposed that this element may be eliminated in favor of taking Infl itself to act as a subject. This idea has been developed recently by Alexiadou and Anagnostopoulou (1998) and Manzini and Savoia (2005). Precisely how the proposal made here would fare in such a system is a matter for future research.

Notes

We would like to thank Paola Benincà, Guglielmo Cinque, Cecilia Poletto, Luigi Rizzi, Sten Vikner, and an anonymous reviewer for their comments; Beatrice Santorini for help with Yiddish data; and Hoski Thráinsson for help with Icelandic data. Versions of this material were presented at the Incontro di Grammatica Generativa in Pisa, 1990; at the Seminario di Ricerca in Venice, 1990; and at a Workshop on Comparative Syntax in Venice, 1990. Thanks to the audiences for their comments. For the specific concerns of the Italian Academy, Anna Cardinaletti is responsible for 1.1, 1.3, 2.1, 3, 4.1, Appendix, and Ian Roberts is responsible for 1.2, 2.2, 2.3, 4.2, Postscript. The references have been updated to incorporate later published works of the original papers and dissertations and to include references to authors in the postscript. 1. We use “verb-second” (V2) as a pretheoretical descriptive term, without prejudicing how to analyze it theoretically. 2. Although, as our data show, there is no restriction on the class of clauses which allows verb-second in Icelandic, there are some independent restrictions on the interaction of Wh-movement and topicalization. It seems that an adverbial Wh-constituent cannot introduce a verb-second clause where the topicalized element is also adverbial, as the following contrasts illustrate:

(i) Ég spurði hvar henni hefðu flestir aðdáendur gefið blóm.
I asked where her had most fans given flowers
‘I asked where the most fans had given her flowers.’
(Thráinsson 1986, 186; cited in Santorini 1989, 67)
(ii) bókin sem þegar hefur María lesið (cf. (1e))
book-the that already had M. read
‘the book that Mary had already read’
(iii) *Ég veit ekki hvar í gær hefur kýrin staðið. (Vikner 1995, 74)
I know not where yesterday has the-cow stood

Example (i) has an adverbial Wh-element and an argumental topic; (ii) shows the opposite pattern. Both examples are grammatical as cases of embedded topicalization. Example (iii), by contrast, involves an adverbial Wh-element and an adverbial topic and is ungrammatical. Vikner (ibid.) proposes that the impossibility of (iii) should be attributed to failure of antecedent government. However, more needs to be said (at least given standard accounts of antecedent-government) to distinguish (iii) from (i) and (ii). 3. In German, the complementizer must be absent from the embedded clause for V2 to be possible; if daß is included in (2a), the sentence becomes ungrammatical. This is not true in Mainland Scandinavian, where the equivalent of daß is required (see Vikner 1995, 84). Example (2d) is improved if the verb is subjunctive, but not to the point of full grammaticality:

(i) ??die Tatsache, gestern habe Maria dieses Buch gelesen
the fact yesterday have-SUBJ M. this book read

However, deverbal nouns seem to accept a V2 complement of this kind more readily:

(ii) die Behauptung, er wäre in Frankfurt
the statement he were-SUBJ in Frankfurt
‘the statement [that] he was in Frankfurt’
(iii) die Behauptung, gestern wäre er in Frankfurt
the statement yesterday were-SUBJ he in Frankfurt
‘the statement [that] yesterday he was in Frankfurt’

4. Cecilia Poletto (personal communication) raises the question of double topicalization. Why do we not find structures with both CP-recursion and Agr-recursion, which would give rise to orders of the following kind (where the inflected verb is either in the second Cº or in Agr1º):

(i) Cº[ TOP [ Cº/V [ TOP [ Agr1º/V [ SUBJECT . . .]]]]]

This structure would be manifested by double topicalization. In general, double topicalization is not possible, due to relativized minimality, since the lower topic prevents the higher one from antecedent-governing its trace. 5. An apparent problem, pointed out to us by Guglielmo Cinque (personal communication), concerns the possibility of generating VOS orders in V1 declaratives. If, as we are proposing, the verb is in Cº in such sentences, what prevents the object from moving to SpecAgr1′? This would give rise to VOS order, something we do not find (with a definite subject) in Icelandic. Notice that the problem is not limited to V1 declaratives, but, to the extent that they involve V-to-C movement, holds also of interrogatives, imperatives, and similar clause types in languages with Agr-recursion. A solution comes from the adjacency condition on Case-assignment; McCloskey (1991) shows that in a systematically V1 language like Irish, the verb must be adjacent to the subject for Nominative-assignment to take place and suggests that this is true wherever Nominative-assignment takes place under government. Adopting McCloskey’s proposal, we suggest that when the verb moves to Cº the subject must move to SpecAgr1′, blocking movement of the object to this position. 6. We have used Le Charroi de Nîmes as a primary source because it is a representative text from the relevant period (twelfth century) of Old French. 7. The idea that Agr1º can be a clitic position leads naturally to an extension of our system to account for the “reduplication of agreement” found in many Gallo-Italian dialects with rich systems of subject clitics. See Roberts (1993a) for a version of this idea. Also, our system may provide a straightforward account of inflected complementizers of the type found in West Flemish (see Haegeman 1992) and Bavarian (see Bayer 1984–1985). These phenomena are illustrated in (i):

(i) a. (La Maria) la parla. (Northern Italian dialects)
(the M.) she talks
‘Mary talks.’
b. . . . da-n-k goan. (West Flemish)
that-1sg-I go-1sg
‘. . . that I go.’
c. . . . da-ø-me goan. (West Flemish)
that-1pl-we go-1pl
‘. . . that we go.’
d. . . . wenn-st kummst. (Bavarian)
when-2sg come-2sg
‘. . . when you come.’

8. In German, V-to-C movement can bypass the Agr1 head. In principle (i.e., according to general conditions on locality), head-movement to C can skip Agr. In the languages where it cannot (for example, Icelandic), we take Agr1 to act as an attractor for V. 9. In fact, the situation is more complicated than (29) indicates in that sentences like (i) are possible on condition that at least the second pronoun is stressed:

(i) ?. . . daß es/sie der Hans ihm wahrscheinlich gegeben hat.
that it/her the H. him-DAT probably given has

We interpret this fact not as a counterexample to the analysis being put forward in the text, but as an example of scrambling of a (stressed) pronoun. When pronouns occupy some position other than the Wackernagel position, they have undergone scrambling in the same way as full NPs. 10. Another important confirmation comes from Penner’s (1990) work on the acquisition of Bernese Swiss German. Penner shows that acquisition of this dialect proceeds in phases that can be neatly accounted for by assuming that Agr2P is acquired before Agr1P and Agr1P before CP. 11. This connection between matrix V1 and embedded topicalization was noticed by Santorini (1989, 64). According to her, the languages that have both properties include Icelandic, Yiddish, and Old French. We capture this correlation in our system by saying that such languages allow matrix clauses to be Agr1Ps, with “Vº-topicalization” to Cº giving matrix V1. This is, in fact, what we said for Icelandic in section 1.1. The relative infrequency of matrix V1 in OF as compared to OIt suggests that this is not the right approach for OF, at least not for the “core” period where matrix declaratives are CPs (see sections 1.2 and 3). It is possible that this analysis will turn out to be correct for the early period of OF where embedded topicalization is possible (see section 1.2). 12. This proposal appears to lead to a problem in German. We have been operating under the assumption that Agr1º is the head that assigns Nominative. However, the analysis in section 2.1 clearly implies that V moves to Cº, “skipping” Agr1º. So, the Nominative-assigning head does not appear to be moved to Cº. To deal with this, we tentatively suggest that Cº in German independently has the capacity to assign Nominative when it contains a verb that agrees with its specifier. Evidence in favor of this comes from es-clauses with a definite subject, for example:

(i) ?Es_i hat t_i dieser Mann angerufen.
it has this man called
‘This man has called.’

It is hard to avoid the conclusion that there are two Nominative NPs here: es and dieser Mann. In terms of our analysis, C° assigns Nominative to es in SpecC′ under agreement, and Agr1° assigns an independent Nominative feature to dieser Mann in SpecAgr2′ under government. SpecAgr1′ is occupied by the trace of es, see Cardinaletti (1990a, 1994). This proposal captures the frequently made observation that the definiteness effect is less rigid in German than in other languages. 13. It is possible that the distinction that Rizzi and Roberts (1996) [this volume, Chapter 9] draw between morphologically selected and free head-to-head movement may be relevant here. Since Rizzi and Roberts use the notion of morphological selection only for cases where affixes attract stems, rather than for cases of cliticization, their system should be modified to be able to apply to cases where clitics simply impose a requirement of attachment to words of a certain class. As we saw in section 2.2, the best treatment of the Tobler-Mussafia

effect involves saying that the clitic must combine with the inflected verb, but we cannot state whether this requirement is one of enclisis or proclisis. In terms of Kayne’s (1991) proposal that enclisis involves a configuration where the clitic is structurally lower than the element it cliticizes to, it is not clear how the mechanism of morphological selection can be adapted since selection always requires that the selecting element govern the selected element. 14. As in Cardinaletti (1994), we are assuming that es and það are essentially like the expletives of non-V2 languages. They have no special affinity with the sentence-initial position (SpecC′ and SpecAgr1′, respectively), and they are not “fillers” of this (or any other) position. 15. According to the Partitive Hypothesis, an indefinite subject can receive Partitive Case in a position inside VP. If we assume that the base position of the subject is inside the projection of the main verb, the position assigned Partitive cannot be the base position of the subject, since this NP can precede an auxiliary that selects the main verb (see Vikner 1995 for further discussion).

References

Adams, M. (1987a) “From Old French to the Theory of Pro-Drop.” Natural Language and Linguistic Theory 5, 1–32. Adams, M. (1987b) “Old French, Null Subjects and Verb Second Phenomena,” Ph.D. Diss., University of California at Los Angeles. Adams, M. (1988a) “Embedded pro,” in J. Blevins and J. Carter (eds.) Proceedings of NELS 18, 1–21. Amherst: University of Massachusetts. Adams, M. (1988b) “Les effets V2 en ancien et en moyen français.” Revue québécoise de linguistique théorique et appliquée 7 (special issue on Aspects de la syntaxe historique du français), 13–40. Adams, M. (1988c) “Word Order and Null Subjects: Contributions from Old French.” Unpublished ms., University of California at Los Angeles. Alberton, S. (1990) “Enclise du pronom objet en français et en italien antique ou la loi Tobler-Mussafia,” Mémoire de Licence, Université de Genève. Alexiadou, A., and E. Anagnostopoulou (1998) “Parametrizing AGR: Word Order, V-movement and EPP-checking.” Natural Language and Linguistic Theory 16(3), 491–539. Bayer, J. (1984–1985) “COMP in Bavarian Syntax.” Linguistic Review 3, 209–274. Belletti, A. (1988) “Unaccusatives as Case Assigners.” Linguistic Inquiry 19, 1–34. Belletti, A. (1990) Generalized Verb Movement: Aspects of Verb Syntax. Turin: Rosenberg and Sellier. Benincà, P. (1983–84) “Un’ipotesi sulla sintassi delle lingue romanze medievali.” Quaderni Patavini di Linguistica, 3, 3–19. Benincà, P. (1989) “L’ordine delle parole nelle lingue romanze medievali.” Paper presented at the 19th International Congress of Linguistics and Romance Philology, Santiago de Compostela, Spain. Bobaljik, J.D. (1995) “Morphosyntax: The Syntax of Verbal Inflection,” Ph.D. Diss., MIT, Cambridge, Mass. Bobaljik, J.D., and D. Jonas (1996) “Subject Positions and the Roles of TP.” Linguistic Inquiry 27(2), 195–236. Borer, H. (1986) “I-Subjects.” Linguistic Inquiry 17, 375–416. Boschetti, L. (1986) “Zur Syntax des Pronomens Sich,” M.A. Thesis, Università di Venezia. 
Cardinaletti, A. (1990a) “Es, pro and sentential arguments in German.” Linguistische Berichte 126, 135–164.

Cardinaletti, A. (1990b) “Subject/Object Asymmetries in German Null Topic Constructions and the Status of SpecCP,” in J. Mascaró and M. Nespor (eds.) Grammar in Progress: GLOW Essays for Henk van Riemsdijk, 75–84. Dordrecht: Foris. Cardinaletti, A. (1994) La sintassi dei pronomi: Uno studio comparativo delle lingue germaniche e romanze. Bologna: Il Mulino. Originally cited as: (1990c) “Pronomi nulli e pleonastici nelle lingue germaniche e romanze: Saggio di sintassi comparata,” Ph.D. Thesis, Università di Padova e Venezia. Cardinaletti, A. (1997) “Subjects and Clause Structure,” in L. Haegeman (ed.) The New Comparative Syntax, 33–63. London: Longman. Cardinaletti, A., and M. Starke (1996) “Deficient Pronouns: A View from Germanic. A Study in the Unified Description of Germanic and Romance,” in H. Thráinsson, S. D. Epstein, and S. Peter (eds.), Studies in Comparative Germanic Syntax, vol. 2, 21–65. Dordrecht: Kluwer. Cardinaletti, A., and M. Starke (1999) “The Typology of Structural Deficiency: A Case Study of the Three Classes of Pronouns,” in H. van Riemsdijk (ed.) Clitics in the Languages of Europe, 145–233. Berlin: Mouton de Gruyter. Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton. Chomsky, N. (1977) “On wh-movement,” in A. Akmajian, P. Culicover, and T. Wasow (eds.) Formal Syntax, 71–132. New York: Academic Press. Chomsky, N. (1981) Lectures on Government and Binding. Dordrecht: Foris. Chomsky, N. (1989) “Some Notes on Economy of Derivations and Representations,” in I. Laka and A. Mahajan (eds.) MIT Working Papers in Linguistics 10. Cambridge, Mass.: MIT Press. Reprinted in Chomsky 1995, ch. 2. Chomsky, N. (1991) “Some Notes on Economy of Derivation and Representation,” in R. Freidin (ed.) Principles and Parameters in Comparative Grammar, 417–454. Cambridge, Mass.: MIT Press. Chomsky, N. (1993) “A Minimalist Program for Linguistic Theory,” in K. Hale and S. J. Keyser (eds.) The View from Building 20, 1–52. 
Cambridge, Mass.: MIT Press. Chomsky, N. (1995) The Minimalist Program. Cambridge, Mass.: MIT Press. de Kok, A. (1985) La place du pronom personnel régime conjoint en français: Une étude diachronique. Amsterdam: Rodopi. den Besten, H. (1983) “On the Interaction of Root Transformations and Lexical Deletive Rules,” in W. Abraham (ed.) On the Formal Syntax of the Westgermania, 47–131. Amsterdam: John Benjamins. Diesing, M. (1988) “Word Order and the Subject Position in Yiddish,” in J. Blevins and J. Carter (eds.) Proceedings of NELS 18, 124–140. Amherst: University of Massachusetts. Diesing, M. (1990) “Verb Second in Yiddish and the Nature of the Subject Position.” Natural Language and Linguistic Theory 8, 41–80. Dupuis, F. (1988) “Pro-drop dans les subordonnées en ancien français.” Revue québécoise de linguistique théorique et appliquée 7 (special issue on Aspects de la syntaxe historique du français), 41–62. Dupuis, F. (1989) “L’expression du sujet dans les subordonnées en ancien français,” Ph.D. Diss., Université de Montréal. Foulet, L. (1935–1936) “L’extension de la forme oblique du pronom personnel en ancien français,” Romania 61, 257–315, 401–463; Romania 62, 27–91. Foulet, L. (1982) Petite syntaxe de l’ancien français (3rd ed.). Paris: Editions Champion. Galves, C. (1994/2001) “V-Movement, Levels of Representation and the Structure of S.” In Letras de Hoje, Porto Alegre. Portuguese version published as “Movimento de V, níveis de representação e a estrutura de IP” in C. Galves Ensaios

sobre as gramáticas do português, Editora da Unicamp, Universidade Estadual de Campinas, Campinas, Brazil. Giusti, G. (1986) “On the Lack of Wh-Infinitivals with zu and the Projection of COMP in German,” Groninger Arbeiten zur germanistischen Linguistik 28, 115–169. Grewendorf, G. (1989) Ergativity in German. Dordrecht: Foris. Haegeman, L. (1992) Generative Syntax: Theory and Description. A Case Study from West Flemish. Cambridge: Cambridge University Press. Haider, H. (1995) Derivations and Representations in German (Arbeitspapiere des Sonderforschungsbereichs 340). Stuttgart/Tübingen: University of Stuttgart/Tübingen. Halpern, A., and A. Zwicky (1995) Approaching Second. Stanford, Calif.: Center for the Study of Language and Information. Harris, M. (1978) The Evolution of French Syntax: A Comparative Approach. London: Longman. Hirschbuhler, P. (1990) “La légitimation de la construction V1 à sujet nul dans la prose et le vers en ancien français.” Revue québécoise de linguistique 19, 32–55. Hirschbuhler, P., and M.-O. Junker (1988) “Remarques sur les sujets nuls en subordonnées en ancien et en moyen français.” Revue québécoise de linguistique théorique et appliquée 7 (special issue on Aspects de la syntaxe historique du français), 63–84. Hornstein, N. (1999) “Movement and Control.” Linguistic Inquiry 30(1), 69–96. Jaspers, D. (1989) “A Head Position for Dutch Clitics or: Wilma, Wim and Wackernagel,” in D. Jaspers, W. Klooster, and Y. Putseys (eds.) Sentential Complementation and the Lexicon, 241–253. Dordrecht: Foris. Kayne, R. (1991) “Romance Clitics, Verb Movement and PRO.” Linguistic Inquiry 22, 647–686. Kayne, R., and J.-Y. Pollock (1978) “Stylistic Inversion, Successive Cyclicity and Move-NP in French.” Linguistic Inquiry 9, 595–621. Kiparsky, P. (1995) “Origins of Germanic Syntax,” in A. Battye and I. Roberts (eds.) Clause Structure and Language Change, 140–170. New York: Oxford University Press. Koopman, H., and D. 
Sportiche (1991) “The Position of Subjects.” Lingua 85, 211–258. Le Charroi de Nîmes (1972) Ed. J. L. Perrier. Paris: Champion. Lowenstamm, J. (1977) “Relative Clauses in Yiddish: A Case for Movement.” Linguistic Analysis 3, 197–216. Madeira, A. (1993) “Clitic-second in European Portuguese.” Probus 5, 155–174. Maling, J. (1990) “Inversion in Embedded Clauses in Modern Icelandic,” in J. Maling and A. Zaenen (eds.) Modern Icelandic Syntax: Syntax and Semantics, vol. 24, 71–91. New York: Academic Press. Originally cited as: (1980) Islenskt mál og almenn málfræði 2, 175–193. Manzini, M. R., and A. Roussou (2000) “A Minimalist Theory of A-Movement and Control.” Lingua 110, 409–447. Manzini, M. R., and L. Savoia (2005) I dialetti italiani e romanci: morfosintassi generativa. Alessandria: Edizioni dell’Orso (3 volumes). McCloskey, J. (1991) “Clause Structure, Ellipsis and Proper Government in Irish.” Lingua 85, 259–302. McCloskey, J. (1996) “On the Scope of Verb Movement in Irish.” Natural Language and Linguistic Theory 14(1), 47–104. Motapanyane, V. (1991) “Theoretical Implications of Complementation in Rumanian,” Ph.D. Diss., Université de Genève. Mussafia, A. (1983) Scritti di filologia e linguistica. Ed. A. Daniele and L. Renzi. Padua: Antenore. Penner, Z. (1990) “The Acquisition of the Syntax of Bernese Swiss German: The Role of Propositional Elements in Restructuring Early Grammar.” Paper presented at

the Fifteenth Annual Boston University Conference on Language Development, October. Linguistic Institute, University of Berne. Platzack, C. (1987) “The Scandinavian Languages and the Null Subject Parameter.” Natural Language and Linguistic Theory 5, 377–401. Platzack, C. (1988) “The Emergence of a Word-Order Difference in Scandinavian Subordinate Clauses,” McGill Working Papers in Linguistics: Special Issue on Comparative Germanic Syntax, 215–238. Platzack, C. (1994) “The Loss of Verb Second in English and French,” in A. Battye and I. Roberts (eds.) Clause Structure and Language Change, 200–226. New York: Oxford University Press. Poletto, C. (2000) The Higher Functional Field: Evidence from the Northern Italian Dialects. New York: Oxford University Press. Pollock, J.-Y. (1986) “Sur la syntaxe de en et le paramètre du sujet nul,” in M. Ronat and D. Couquaux (eds.) La Grammaire Modulaire, 211–246. Paris: Editions de Minuit. Pollock, J.-Y. (1989) “Verb Movement, UG and the Structure of IP.” Linguistic Inquiry 20, 365–424. Price, G. (1971) The French Language: Present and Past. London: Edward Arnold. Rizzi, L. (1982) Issues in Italian Syntax. Dordrecht: Foris. Rizzi, L. (1986) “Null Objects in Italian and the Theory of pro.” Linguistic Inquiry 17, 501–557. Rizzi, L. (1990a) Relativized Minimality. Cambridge, Mass.: MIT Press. Rizzi, L. (1990b) “Speculations on Verb Second,” in J. Mascaró and M. Nespor (eds.) Grammar in Progress: GLOW Essays for Henk van Riemsdijk, 375–385. Dordrecht: Foris. Rizzi, L. (1991) “Proper Head Government and the Definition of A-Positions.” GLOW Newsletter 26, 46–47. Rizzi, L. (1996) “Residual Verb Second and the Wh Criterion,” in A. Belletti and L. Rizzi (eds.) Parameters and Functional Heads: Essays in Comparative Syntax, 63–90. New York: Oxford University Press. Originally cited as: (1991a) Technical Reports in Formal and Computational Linguistics, No. 2. Geneva: University of Geneva. Rizzi, L. 
(1997) “The fine structure of the left periphery,” in L. Haegeman (ed.) Elements of Grammar, 281–337. Dordrecht: Kluwer. Rizzi, L., and I. Roberts (1996) “Complex Inversion in French,” in A. Belletti and L. Rizzi (eds.) Parameters and Functional Heads: Essays in Comparative Syntax, 91–116. New York: Oxford University Press. Originally published as: (1989) Probus 1, 1–30. This volume, Chapter 9. Roberts, I. (1991) “Excorporation and Minimality.” Linguistic Inquiry 22(1), 209–218. This volume, Chapter 10. Roberts, I. (1993a) “The Nature of Subject Clitics in Franco-Provençal Valdôtain,” in A. Belletti (ed.) Syntactic Theory and the Dialects of Italy, 319–353. Turin: Rosenberg and Sellier. Roberts, I. (1993b) Verbs and Diachronic Syntax. Dordrecht: Kluwer. Rögnvaldsson, E. (1984) “Icelandic Word Order and Það-Insertion.” Working Papers in Scandinavian Syntax 8, 1–21. Rögnvaldsson, E., and H. Thráinsson (1990) “On Icelandic Word Order Once More,” in J. Maling and A. Zaenen (eds.) Modern Icelandic Syntax: Syntax and Semantics 24: 3–40. New York: Academic Press. Santorini, B. (1988) “Against a Uniform Analysis of All Verb-Second Clauses.” Unpublished ms., University of Pennsylvania. Santorini, B. (1989) “The Generalization of the Verb-Second Constraint in the History of Yiddish,” Ph.D. Diss., University of Pennsylvania. Schulze, A. (1888) Der altfranzösische direkte Fragesatz. Leipzig: S. Hirzel.

Schwartz, B. D., and S. Vikner (1996) “The Verb Always Leaves IP in V2 Clauses,” in A. Belletti and L. Rizzi (eds.) Parameters and Functional Heads: Essays in Comparative Syntax, 11–62. New York: Oxford University Press. Shlonsky, U. (1989) “The Hierarchical Representation of Subject-Verb Agreement.” Unpublished ms., University of Haifa. Shlonsky, U. (1992) “The Representation of Agreement in Comp.” Geneva Generative Papers 0, 39–52. Shlonsky, U. (1994) “Agreement in Comp.” Linguistic Review 11, 351–375. Sigurðsson, H. (1985) “Subordinate V/1 in Icelandic: How to Explain a Root Phenomenon.” Working Papers in Scandinavian Syntax 18, 1–58. Sigurðsson, H. (1989) “Verbal Syntax and Case in Icelandic,” Ph.D. Diss., University of Lund. Starke, M. (1993) “En deuxième position en Europe Centrale,” Mémoire de Licence, University of Geneva. Thráinsson, H. (1986) “V1, V2, V3 in Icelandic,” in H. Haider and M. Prinzhorn (eds.) Verb Second Phenomena in the Germanic Languages, 169–194. Dordrecht: Foris. Thurneysen, R. (1892) “Die Stellung des Verbums im Altfranzösischen.” Zeitschrift für Romanische Philologie 16, 289–371. Tomaselli, A. (1990) La sintassi del verbo finito nelle lingue germaniche. Padua: Unipress. Tomaselli, A. (1991) “Cases of V3 in Old High German,” Groninger Arbeiten zur germanistischen Linguistik, 33, 93–127. Travis, L. (1984) “Parameters and Effects of Word Order Variation,” Ph.D. Diss., MIT. Vance, B. (1988) “L’évolution du pro-drop en français medieval.” Revue québécoise de linguistique théorique et appliquée 7 (special issue on Aspects de la syntaxe historique du français), 85–112. Vanelli, L., L. Renzi, and P. Benincà (1985) “Typologie des pronoms sujets dans les langues romanes.” Actes du XVIIe Congrès International de Linguistique et Philologie Romanes (Aix-en-Provence, 29 August-3 September 1983), vol. 3, 162–176. Aix-en-Provence: Université de Provence. van Kemenade, A. 
(1987) Syntactic Case and Morphological Case in the History of English. Dordrecht: Foris. van Riemsdijk, H. (ed.) (1999) Clitics in the Languages of Europe. Berlin: Mouton de Gruyter. Vikner, S. (1995) Verb Movement and Expletive Subjects in the Germanic Languages. New York: Oxford University Press. Originally cited as: (1990) “Verb Movement and the Licensing of NP-Positions in the Germanic Languages,” Ph.D. Thesis, Université de Genève. von Wartburg, W. (1934) Evolution et structure de la langue française. Paris: Larousse. Wackernagel, J. (1892) “Über ein Gesetz der indogermanischen Wortstellung.” Indogermanische Forschungen 1, 333–435. Zwart, J.-W. (1991) “Clitics in Dutch: Evidence for the Position of INFL.” Groninger Arbeiten zur germanistischen Linguistik 33, 71–92.

Appendix Stylistic Fronting and Embedded Topicalization

Throughout this essay, and especially in section 1, we have kept Stylistic Fronting distinct from embedded topicalization. However, Thráinsson (1986) proposes that Stylistic Fronting is to be identified with embedded topicalization; he suggests that the “subject-gap” requirement for Stylistic Fronting is illusory and that Stylistic Fronting is simply topicalization of a non-subject where the subject is unavailable. In contrast, Vikner (1995, 117) gives two reasons to keep the two constructions separate. First, Stylistic Fronting typically affects non-nominal elements (participles, adverbs, and adjectives), while topicalization preferentially applies to NPs and PPs (see Maling 1990). Note that, for participles at least, it is not even clear that the moved category is a maximal projection. Second, topicalization from clauses containing an embedded topicalization is impossible in Icelandic, while topicalization from clauses where Stylistic Fronting has taken place is fine (examples from Vikner ibid.):

(A1) a. *Maríu veit ég að þessum hring lofaði Olafur.
Mary-DAT know I that this ring-ACC promised Olaf-NOM
‘I know that Olaf promised this ring to Mary.’
b. Þessi maður held ég að tekið hafí út peninga úr bankanum.
this man think I that taken has out money from bank-the
‘I think that this man has taken money out of the bank.’

Furthermore, our discussion of OF in section 1.2 and of Yiddish in section 1.3 suggests that the two processes are distinct, since these languages seem to allow Stylistic Fronting in cases where they do not allow embedded topicalization. Nevertheless, the two operations are very similar to each other, in both their effects and their cross-linguistic distribution, and so it is worth briefly exploring a way of relating them to each other. In this way, too, we can provide a concrete analysis of Stylistic Fronting, something we did not attempt in the main part of the chapter. 
Let us begin by considering again a minimal contrast between topicalization and Stylistic Fronting from Yiddish (Santorini, personal communication):

(A2) a. *Ikh veys nit vu nekhtn iz di ku geshtanen.
I know not where yesterday is the cow stood
‘I don’t know where the cow has stood yesterday.’
b. Ikh veys nit vu nekhtn iz geshtanen a ku.
I know not where yesterday is stood a cow

Example (A2a) has the same structure as parallel cases of embedded topicalization in Icelandic: nekhtn is in SpecAgr1′ and di ku is in SpecAgr2′, while the inflected verb is in Agr1°. As we saw in section 1.3, this structure is ruled out in Yiddish for reasons connected to constraints on extraction from clauses involving topicalization. What is the structure of (A2b)? The grammaticality of this kind of sentence depends on the presence of a subject gap, as is well known. The thematic subject is indefinite and appears to occupy some VP-internal position. We propose to explain the well-known requirement that Stylistic Fronting involves a subject gap in the following way: Stylistic Fronting is topicalization via SpecAgr2′. In (A2b), nekhtn moves to SpecAgr1′ via SpecAgr2′. Hence, SpecAgr2′ must not contain the subject in order for Stylistic Fronting to be possible, since it must contain the trace of the fronted element (for a very similar idea, see Vikner 1995, 118). Where Stylistic Fronting takes place, SpecAgr2′ is nevertheless a subject position in the sense of the discussion in section 3. This means that the movement is a case of fronting of a non-nominal to an A-position. This makes it possible for the trace in this position to be Case-marked (although the Case-marking is not obligatory; hence, where some NP undergoes Stylistic Fronting, there is no Case clash). In languages in which SpecAgr1′ is a topic position (e.g., Icelandic and Yiddish), the further movement from SpecAgr2′ to SpecAgr1′ is formally an instance of topicalization—that is, A′-movement. In languages in which SpecAgr1′ is not a topic position (e.g., Old French), this movement could be a further instance of “A movement” of a non-nominal. The “A” character of this movement means that (a) in OF, the fronted element can be moved to a non-topic position and (b) in Yiddish, the “A”-movement to SpecAgr2′ is not subject to the constraints that apply to Wh-movement. 
Where the precondition on Stylistic Fronting is satisfied by extraction of the subject, what we have said implies that the subject trace is not in SpecAgr2′ but is in a lower position, possibly the “internal” subject position in VP. This analysis is analogous to the analysis of postverbal subject extraction often proposed for Italian (see Rizzi 1982, ch. 4). Further support comes from the fact that postverbal subjects are found in the languages that have Stylistic Fronting. This is illustrated in the following examples from Icelandic (Platzack 1987, 378), Yiddish (Santorini 1989, 101), and Old French (Dupuis 1989, 163), respectively:
(A3) a. Það munu kaupa þessa bók margir stúdentar.
        it will buy this book many students
        ‘Many students will buy this book.’

     b. Es hot ongerufen mayner a khaver.
        it has on-called my-NOM a friend
        ‘A friend of mine called.’
     c. . . . que li soudans de Coine oï dire que si faitement avoient fait li Francois.
        . . . that the Sultan of Coine heard tell that so in this way had done the French.
        ‘. . . that the Sultan of Coine heard tell that the French had done in this way.’

13 The Analysis of VSO Clauses Ian Roberts

Introduction

Although R. Morris Jones & Thomas (1977) were the first to propose a verb-movement rule for Welsh, Emonds (1980) was the first to propose that VSO orders generally are derived by verb-movement (Emonds was not, however, directly concerned with Welsh). In general terms, Emonds’ idea was that the existence of VSO orders is not incompatible with the postulation of an underlying VP constituent. As long as verb-movement rules exist (see Roberts (2005, Chapter Five) for a technical discussion of this point in terms of the proposals in Chomsky (2001)), such a rule can move the verb out of the VP and to the left over the subject. This gives a derived VSO order from an underlying SVO or SOV order. In Welsh, we can very clearly see that something like this is going on in VSO clauses. There is a general possibility of using periphrastic tenses, up to the possibility of what we can think of as free “do-insertion”. Free “do-insertion” is available in the future and preterit tenses, as illustrated in (1) (Rouveret (1994: 71–85) argues that gwneud (“do”) and bod (“be”) are in fact main verbs with a particular type of complement in examples like (1b); in Roberts (2005, Chapter Three), we adopt a variant of Rouveret’s proposal, but this does not alter the fact that we can treat the general availability of “do” as a kind of free do-insertion, even if the “do” in question has properties distinct from those of English “do”):1
(1) a. Fe/mi welais i Megan
       PRT saw I Megan
       “I saw Megan.”
    b. Fe/mi wnes i weld Megan
       PRT did I see Megan
       “I saw Megan.”
(The particle fe is more standard and more widely used in South Welsh, while mi is characteristically Northern. Use of the synthetic (“short”) verb form is neutral as to register, while the periphrastic form in (1b) is more colloquial.) It seems clear that the lexical verb welais in (1a) is in the same position as the auxiliary verb wnes in (1b). Both elements appear in between the clause-initial particle mi and the subject i. Moreover, except for the possibility of infixed pronouns in Literary Welsh (which we briefly discuss and analyse in Section 2.2 below), the only thing which can appear in between the particle and the subject is a finite verb or auxiliary. It seems natural to think that this position is the position for the finite element of the clause, especially given the general possibility of periphrastic tenses (see Note 1). If the verb is chosen as the finite element, it moves to that position; if an auxiliary is chosen, the main verb remains in VP (although see Roberts (2005, Chapter 3, Section 2), for a refinement of the latter point). (1b) shows that the order of elements in the clause following the position for the finite element is SVO (and this is in line with the general head-initial typology of Welsh).2 Sproat (1985) gives a number of other arguments leading to the same conclusion; in particular, he shows that non-finite verbs typically appear in SVO orders, and that a non-finite verb can be fronted along with its complement, stranding the subject. The relevant facts are illustrated in (2a) and (2b), respectively:
(2) a. [Cyn i Siôn ladd draig], y mae rhaid iddo brynu llaeth i’r gath.
       Before to John kill dragon, Prt is necessary to-him buy milk for-the cat.
       “Before John kills a dragon, he has to buy milk for the cat.” (Sproat (1985: 205))
    b. [Gadael y glwyd ar agor] a wnaeth y ffermwr.
       Leave the gate on open Prt did the farmer
       “Leave the gate open, the farmer did.” (Rouveret (1994: 77))
We conclude with Sproat, and following the general consensus of work on Welsh (and the Celtic languages in general, see for example the Introduction to Borsley & Roberts (1996)), that VSO clauses involve an operation which moves the verb out of VP to the left over the subject. The above considerations are straightforward and easy to motivate.
However, I have not explicitly said (a) what the position that V moves to is, (b) what the position of the subject is, or (c) why either element occupies the position it does. In a theory which posits the existence of one or more functional categories above VP, a range of different options are available in this connection. Assuming a right-branching structure (presumably as a universal; see Kayne (1994)), the one thing we have to ensure is that the finite verb occupies a position higher than and therefore to the left of that of the subject at Spell Out. In the remainder of this chapter, I consider a range of possible analyses of VSO clauses. First, in terms of the EPP of Chomsky (1982) (the subject requirement), I will show that the subject leaves VP (section 1). Second, I will show that the EPP-preserving analysis of VSO, which places the subject in SpecIP and the verb in C, cannot work (section 2). This means that we have to accept an intermediate possibility involving a more elaborate functional structure than just CP containing IP containing VP. The full analysis of VSO clauses is given in Roberts (2005, Chapter Two, Section 1).

1. The Subject Leaves VP

On the basis of the above considerations, it should be clear what the simplest analysis of VSO would be. We have seen that V raises to a VP-external position, and that this position is associated with finiteness. It seems very natural to identify this position as Tense. Hence we could posit that (a) Tense (T) has a strong V-feature and (b) T has a weak D-feature. In terms of the system in Chomsky (1995, Chapters 3 and 4), this feature specification would give rise to a T that attracts V while at the same time preventing the subject DP from moving into its Specifier, thus giving rise to VSO order. Proposal (b) entails that the Extended Projection Principle (EPP) as formulated in Chomsky (1995, Chapter 4) can be parametrised.3 Given that Chomsky proposes that it is just the categorial features of functional heads that are subject to parametrisation, this seems like a good result (see Chomsky (1995: 284)). In fact, Chomsky assumes essentially this analysis for VSO clauses in Irish (see Chomsky (1995: 374–375)). Moreover, McCloskey (1996b) argues on independent grounds that the EPP does not hold in Irish. These proposals yield a derived structure like the following for (1a) at Spell Out (glossing over the matter of the position of the initial particle for now; see Section 2): (3)

[TP [T [V welais_i]] [VP [DP i] [V′ [V t_i] [DP Megan]]]]

This analysis is adopted in Chomsky (1993, 1995), and Koopman & Sportiche (1991) also consider it as a possibility. The approach is appealing from a cross-linguistic perspective in that it means that we can set up a typology of properties of T along the following lines (see also Chomsky (1993), Carnie (1995), Bobaljik & Carnie (1996)):
(4) A possible parametrisation of T’s features:
    Strong D, strong V: French, an SVO language with V-to-T movement (Pollock (1989))
    Strong D, weak V: English, an SVO language without V-movement
    Weak D, strong V: Celtic
    Weak D, weak V: German (see Haider (1997))
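The typology in (4) can be read as a small decision procedure. The following Python sketch is a toy illustration only and is not part of the analysis in the text. It assumes a base structure [TP Spec T [VP S [V′ V O]]], assumes that a strong feature on T forces overt movement before Spell Out, and uses a hypothetical `head_final_vp` flag to stand in for an OV base order in the German case. The function name and parameters are invented for the illustration.

```python
def surface_order(strong_d: bool, strong_v: bool, head_final_vp: bool = False) -> str:
    """Linearise a toy clause [TP Spec T [VP S [V' V O]]] at Spell Out.

    strong_d: T's D-feature is strong, so the subject raises overtly to SpecTP.
    strong_v: T's V-feature is strong, so the verb raises overtly to T.
    head_final_vp: hypothetical switch for an OV base order inside VP.
    """
    spec_tp = ["S"] if strong_d else []      # overt subject raising
    t_head = ["V"] if strong_v else []       # overt verb raising
    vp_subject = [] if strong_d else ["S"]   # subject stays inside VP
    vp_verb = [] if strong_v else ["V"]      # verb stays inside VP
    # Inside V', the verb precedes the object unless the VP is head-final.
    v_bar = ["O"] + vp_verb if head_final_vp else vp_verb + ["O"]
    return "".join(spec_tp + t_head + vp_subject + v_bar)


# The four settings listed in (4); the language labels follow the text.
typology = {
    "French (strong D, strong V)": surface_order(True, True),
    "English (strong D, weak V)": surface_order(True, False),
    "Celtic (weak D, strong V)": surface_order(False, True),
    "German (weak D, weak V)": surface_order(False, False, head_final_vp=True),
}

for language, order in typology.items():
    print(f"{language}: {order}")
```

In this toy model French and English both surface as SVO, since the string alone does not show whether V sits in T or in VP; that difference only becomes visible relative to material between T and VP, such as negation and adverbs.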

There are, however, a number of arguments against an analysis of the type in (3). Most of these are quite well-known by now, but I go through them here to show the strength of the case against this analysis. The first argument has to do with adjacency effects. McCloskey (1991) shows that V must be adjacent to the subject in VSO clauses in Irish:
(5) *Dúirt sí go dtabharfadh amárach a mac turas orm.
     said she that would-give tomorrow her son visit on-me.
     V X S
The same observation holds for Welsh:
(6) *Mi welith yfory Emrys ddraig.
     PRT will-see tomorrow Emrys dragon
     V X S
It is widely assumed that the space between T and VP contains positions in which adverbs of various kinds can appear (see Cinque (1999) for a very detailed set of hypotheses in this connection).4 If so, then we expect such adverbs to intervene between the raised verb and the subject if the structure of a VSO clause is (3). The fact that these orders are impossible indicates that (3) is not the correct analysis of VSO clauses. The second argument was noticed by Koopman & Sportiche (1991) and by Rouveret (1991). In a way it is the inverse of the first argument. In modern spoken Welsh, the form of negation seems to be similar to French ne ... pas, in that there are two elements, one a preverbal clitic-like element and the other a postverbal adverb-like element.5 The relevant observation in the present context is that the second element of negation follows the subject:6 (7)

a. Cheisiodd Gwyn ddim ateb y cwestiwn bob tro.
   Tried Gwyn not answer the question each turn
   “Gwyn didn’t try to answer the question every time.”
b. Dydy Gwyn ddim wedi mynd i Aberystwyth bob tro.
   Is Gwyn not after go to Aberystwyth each turn
   “Gwyn hasn’t always gone to Aberystwyth.” (Borsley & R. Morris-Jones (2000: 27))

If we assimilate (d)dim to French pas (see Notes 6 and 7), then we take it to occupy a VP-external position lower than T and higher than the position in which the subject is merged. In that case, the order subject–ddim–VP in (7) shows us that the subject is not in its merged (VP-internal, or vP-internal) position. Concretely, let us follow Pollock (1989) in assuming that the pas-type negative is in SpecNegP and that NegP intervenes between TP and VP.


This situation is illustrated both for Welsh and for the comparable French sentence in (8):
(8) (Ni) ddarllenodd Emrys . . . / Jean ne lut . . .
    [NegP ddim/pas [Neg′ Neg [VP [DP t_subj] [V′ [V t_v] [QP o’r llyfr]]]]]

We must conclude that the subject raises from VP. Not all the Celtic languages are like Welsh as regards the relative order of the subject and the second part of negation. The Goidelic languages just have preverbal negation (Irish ní (present), níor (past); Scots Gaelic and Ulster Irish cha). However, McCloskey (1996a) shows that certain adverbs, which it is plausible to think of as attached to VP (or vP), intervene between the subject and the object in Irish VSO clauses and cannot intervene between the verb and the subject:
(9) Níor shaotaigh Eoghan ariamh pingin.
    Neg-Past earned Owen ever penny
    “Owen has never earned a penny.”
This example indicates that the subject leaves VP in Irish. Breton has a “double negation” ne ... ket. Here, unlike Welsh, the order is verb–ket–subject–object, as (10) illustrates (as (10) also illustrates, the preverbal particle may delete; cf. Stephens (1982: 67)):7,8
(10) (Ne) lenn ket ar vugale levrioù.
     Neg read not the children books.
     “The children don’t read books.”
This observation raises an interesting comparative question, which I will not explore here. For our present concerns, (10) can be taken as indirect confirmation of the claim that the subject raises in Welsh. It may be that the subject remains in a VP- (or vP-) internal position in Breton (but see Note 16 and below for some indications that this is not so, despite initial appearances); but the comparative evidence then clearly indicates that it does not do so in Welsh.9 At least one dialect of Welsh appears to behave partly like Breton; this is the Pembrokeshire dialect discussed by Awbery (1990) and Rouveret (1994). In this dialect, (d)dim is the principal clausal negator; ni(d) does not appear overtly (although the initial consonant of the verb is mutated, suggesting that it is syntactically present but deleted in the phonology; see Roberts (2005, Chapter Two, Section 3)). Pronominal subjects must precede (d)dim:
(11) Chwrddes i ddim ag e.
     Met I not with him
     “I didn’t meet him.” (cited in Rouveret (1994: 137))
On the other hand, definite nominal subjects are able, but not required, to follow (d)dim:
(12) a. Ath ‘y nhad ddim i mâs i ddrychid.
        Went my father neg to outside to look.
        “My father didn’t go outside to look.”
     b. A fywodd ddim ‘r ‘en grwban bach.
        And lived neg the old tortoise little
        “And the little old tortoise did not survive.” (Rouveret (1994: 137))
In an example like (12b) it appears that the subject is able to remain in a “low”, possibly VP-internal, position (unless ‘r ‘en grwban bach has undergone Heavy NP-Shift, a possibility pointed out by Mair Parry (p.c.)). I have nothing to say about the existence of variation in the position of the subject in this dialect, but conclude that the (definite) subject raises from such a “low” position in other varieties of Welsh.10 The third argument is due to McCloskey (1996a), who makes the argument for Irish, although it carries over to Welsh. McCloskey points out that the standard A-dependencies (passive, unaccusative and raising) are all found in Irish. These are illustrated in (13):11
(13) a. Tá sé críochnaithe againn (passive)
        Is it finished by-us
        “That has been finished by us.”
     b. Neartaigh a ghlór. (unaccusative)
        Strengthened his voice
        “His voice strengthened.”
     c. Ó ráinís tigh a cheannach ar an maile so. (raising)
        Since you-happened house to buy in the town this
        “Since you happened to have bought a house in this village”
Such constructions are also found in Welsh:12
(14) a. Cafodd y dyn ei ladd. (passive)
        Got the man his killing
        “The man was killed.”
     b. Diflanodd y dyn. (unaccusative)
        disappeared the man
        “The man disappeared.”
     c. Mae Gwyn yn siwr o fod yma. (raising)
        is Gwyn in sure from be here
        “Gwyn is sure to be here.”

While there is no doubt about the existence of passive and raising verbs in Welsh, no reliable diagnostics for unaccusatives exist in the literature (although cf. Tallerman (2001) for some relevant considerations). One likely argument is based on the behaviour of deverbal adjectives in –edig. Although they are rather unproductive, such adjectives, like adjectival passive participles in English, have the property that the nominal they modify must be an internal argument, giving the following contrast:
(15) a. llyfr printiedig
        book printed
        “a printed book”
     b. dyn printiedig
        man printed
        “a printed man” (impossible in the interpretation of “man who has printed something”)
Unergative intransitives therefore cannot form such adjectives, while putative unaccusatives (intransitives whose single argument undergoes a change of state) can, at least marginally:
(16) a. *dyn canedig (unergative)
        man sung
        “a sung man”
     b. ?dyn diflanedig (simple unaccusative)
        man disappeared
        “a disappeared man”

Thus, in an example like (14b), y dyn appears to act as the internal argument of the verb and is therefore a good candidate for an underlying object which has raised to subject position (thanks to Mari Jones and David Willis for help with and useful discussion of this data). We can tell y dyn is in subject position here by making the tense compound. In that case the subject appears between the auxiliary and the verbal noun, while the object position follows the verbal noun:
(17) a. Mae’r dyn wedi diflanu.
        Is-the man Asp disappear
        “The man has disappeared.”
     b. *Mae wedi diflanu’r dyn.
        Is Asp disappear-the man

We conclude that in (14b) and (17a) the single argument of diflanu (“disappear”) has raised from the underlying object position to the surface subject position. On minimalist assumptions, A-movement, like all movement, must be driven exclusively by the need to check (or value; see Chomsky (2001)) features. Features are assumed to be checked as a result of movement to the Specifier of a functional head. It follows from these assumptions that movement to Spec, VP (a position in the checking domain of a lexical head, but not a functional one) is impossible; lexical heads do not offer the possibility of feature-checking (Hornstein (1999) proposes that lexical heads check θ-features and that correspondingly movement to their Specifiers is allowed; Hornstein claims that such movement to θ-positions gives rise to control relations, and as such it is not relevant here (see also Manzini & Roussou (2000))). Now, if the subject position were VP-internal, A-dependencies would precisely involve movement to Spec, VP. Hence the derived subjects of raising, passive and unaccusative verbs cannot be in Spec, VP. If we can show that these subjects are no different in their position in their clauses from other subjects, then we have an argument that subjects in general are not VP-internal. We can use the considerations raised earlier to show that the derived subjects in (14) are not in a position different from that occupied by other subjects. (18) shows that these subjects, like others, must be adjacent to the finite verb/auxiliary (cf. (6)):13
(18) a. *Cafodd ddoe y dyn ei ladd (passive)
        Got yesterday the man his kill
     b. *Diflanodd ddoe y dyn. (unaccusative)
        disappeared yesterday the man
     c. *Mae yfory Gwyn yn siwr o fod yma. (raising)
        is tomorrow Gwyn in sure from be here

(19) shows that these subjects precede (d)dim in negatives (cf. (7)):
(19) a. Chafodd y dyn ddim o’i ladd. (passive)
        Got the man neg of-his killing
        “The man wasn’t killed.”
     b. Ddiflawnodd y dyn ddim. (unaccusative)
        Disappeared the man neg
        “The man didn’t disappear.”
     c. Dydy Gwyn ddim yn siwr o fod yma. (raising)
        is Gwyn not in sure from be here
        “Gwyn isn’t sure to be here.”

There is thus no difference between the position occupied by the derived subjects of A-dependencies and the position occupied by other subjects. I conclude that derived subjects raise to the specifier of a functional head, and therefore not to Spec, VP, and that other subjects, since they do not appear to occupy a different position from clearly derived subjects, also raise to the specifier of a functional head. The above argument may be able to shed some light on the situation in Breton. We have seen that subjects follow the second element of negation ket (see (10)), and we suggested that perhaps subjects do not in fact raise from their merged position in this language. Of course, the order ket–subject does not, without further assumptions about the position of NegP, tell us that the subject has not moved at all. It simply shows that one indication of movement in Welsh is not found in Breton. In the light of the argument just made, however, we can see that if subjects are not raised in Breton then we expect that either A-dependencies will not be found or that the derived subjects of A-dependencies will be in different positions from other subjects. If neither of these expectations is fulfilled, and in particular if derived subjects of A-dependencies follow ket, then we must conclude that ket occupies a relatively “high” position in Breton, arguably higher than (d)dim does in Welsh (and in any case higher than the position given for (d)dim and pas in (8)). The first observation is that Breton has at least one kind of A-dependency, namely passive (illustrated in (20a)). (20b) is, at least on semantic grounds, a plausible instance of an unaccusative, although I am unaware of any clear syntactic tests for unaccusativity in Breton. It is unclear whether Breton has raising; the equivalent of “seem” is a nominal seblan, from whose complement raising unsurprisingly does not take place. Nevertheless, I tentatively take the existence of a fairly clear participial passive construction as evidence for the existence of the possibility of A-dependencies in this language, which is sufficient for the present argument:
(20) a. Gant ar vasonnerien eo savet mogerioù (passive)
        By the masons is built walls
        “The walls are built by the masons.”
     b. Aet eo an den. (unaccusative)
        Gone is the man
        “The man is gone.”

The derived subjects of these dependencies must be in the specifier of a functional head. If these subjects follow the second element of negation ket, then the subject position cannot be VP-internal. The relevant sentences are in (21) (vez in (21a) is habitual “be”):
(21) a. Ne vez ket savet ar mogerioù gant ar vasonnerien.
        Neg are neg built the walls by the masons
        “The walls are not built by the masons.”
     b. N’eo ket aet an den.
        Neg-is neg gone the man
        “The man has not gone.”

We conclude that the basic difference between Breton and Welsh which is illustrated by the contrast between (7) and (10) above has to do with the position of the second element of negation rather than with the fact that the subject leaves VP in one language and not the other (this does not entail that the subject is in the same position in both languages, however; there could be two differences between the languages). A fourth argument that the subject leaves VP can be derived from Huang’s (1993) work on VP-fronting and reconstruction. Huang analyses the differences in interpretation of anaphors in fronted predicates as compared to fronted arguments that were first observed by Barss (1986), e.g.:
(22) a. [Which pictures of himself] does John think that Bill would like __?
     b. [Criticise himself], John thinks Bill never would __.
Barss’ observation is that the anaphor in (22a) can be interpreted such that either John or Bill is its antecedent, while in (22b) only Bill can be interpreted as its antecedent. Barss proposes a mechanism allowing the complex Wh-phrase in (22a) to be “reconstructed” either into the binding domain of John or into the binding domain of Bill. Why is reconstruction into the binding domain of John not available for the fronted VP in (22b)? Huang’s answer to this question is that the antecedent of himself is the VP-internal trace contained in the fronted VP. The trace must be the trace of the local subject Bill. In other words, the full representation of (22b), including traces and coindexing relations, is as in (23):14
(23) [VP t_i criticise himself_i], John thinks Bill_i never would __.
Reconstruction of the fronted VP (by whatever mechanism) makes no difference to the interpretation of the anaphor since it is already contained in the binding domain of the trace, and therefore must take the antecedent of the trace as its antecedent.
It should be clear that Huang’s argument depends on (a) the VP-internal subject hypothesis and (b) the idea that the subject raises from VP. Because of this, we can use sentences like (22) as a diagnostic for whether the subject is raised from VP in Welsh. As we saw in (2b), what appears to be VP-fronting stranding the subject is available in Welsh. Now, in the case where these fronted constituents contain anaphors, if the interpretative judgements are the same as for English, then this implies that there is a subject trace inside the fronted constituent, and hence that the subject has raised. If not, then we have to treat the fronting operation as fronting of a category smaller than the category containing the merged position of the subject. The relevant data is as follows:
(24) a. [Pa luniau ohon’i hun_i/j] y mae John_j yn credu y mae Bill_i yn eu hoffi __?
        which pictures of-his self Prt is John Prt believe Prt is Bill Prt its like __
        “Which pictures of himself does John believe Bill likes?”

     b. [Siarad â’r hun_i/*j], y mae John_j yn meddwl bod Bill_i __.
        Speak with-his self, Prt is John Prt think that-is Bill
        “Talk to himself, John thinks Bill does.”
The judgements exactly parallel English, as described by Barss and Huang. Wh-movement of an argument category containing an anaphor, as in (24a), allows either John or Bill to be interpreted as the antecedent. On the other hand, Wh-movement of a predicative category, as in (24b,c), only allows the lower potential antecedent (Bill) to be the actual antecedent. We can account for these facts exactly as Huang does, with the consequence that we must assume that the subject leaves the VP in these cases. We thus have a further argument that the subject leaves VP in Welsh.15 Finally, let us briefly consider an argument that subjects leave VP that has been made on the basis of data from Northern dialects of Irish (the construction is also found in Scots Gaelic; see Adger (1996)). The argument, first made by Bobaljik & Carnie (1996) (see also Carnie (1995)), is based on the existence of SOV order in infinitival clauses in these varieties, as in:16
(25) Ba mhaith liom [ (é) an teach a thógáil].
     is good with-me (him) a house-ACC Prt build
     “I would like him to build the house.”
In Southern dialects, the direct object is Genitive and the order is SVO:
(26) Ba mhaith liom [ (é) a thógáil an tí]
     is good with-me (him) to build Prt house-GEN
     “I would like him to build the house.”
Bobaljik & Carnie propose that the Accusative form of the object is found when the object moves to SpecAgrOP. Therefore the subject must have raised out of VP, because it precedes the object. The lowest available position for the subject is SpecTP. Therefore V is higher than T. They conclude that V is in AgrS.17 In conclusion, we have seen several reasons to reject an analysis of the kind given in (3).
All our arguments point in the same direction: the subject is raised from its base position in Welsh (and we have seen evidence that the same can be maintained for Irish and Breton). We now know that the subject is in the specifier of a functional category, and we can be sure that the verb is in a functional head position higher than the subject. Moreover, the adjacency evidence proposed by McCloskey (1991), illustrated in (5) and (6) above, suggests that the subject is in the specifier position of the functional category which is immediately subjacent to the one whose head V moves to (and that further adjunction to this category is impossible). Schematically, the situation must be as follows: (27)

[F1P [F1 [wD] [sV]] [F2P DP [F2′ [F2 [sD]]]]]

Here we have indicated—following the system of Chomsky (1995)—what the relevant weak and strong feature-values associated with F1 and F2 must be. In the next section we will consider various hypotheses as to the categorial values of F1 and F2, keeping in mind the question of the status of the EPP in relation to these categories.

2. Against Generalized V-to-C

The first hypothesis to consider is: (28) F1 = C; F2 = I. For the time being, I remain agnostic as to the precise identity of I. We could assume, following Chomsky (1995, 4.10) that there is no AgrSP (or at least that there is no AgrSP in Welsh), and so I would be T. Alternatively, I could correspond to AgrS (or to a split AgrS, as proposed in Roberts (2005, Chapter Two)). The relevant point for what follows is that an analysis like (28) has V raising into the C-system (which we will also split into its component functional categories below) with the subject in the highest specifier position in IP. If such an analysis can be maintained, then of course we can maintain that Welsh (and presumably Celtic more generally) satisfies the EPP in the sense of requiring a clausal subject, i.e. that SpecIP be filled. The idea that V raises to C in Irish was proposed by Déprez & Hale (1986), Hale (1989), Stowell (1989) and Doherty (1996). However, there are three principal reasons to reject this idea for Welsh (two of which may carry over to Irish). The first concerns the lack of root-embedded asymmetries in verb-movement, as compared with the well-known situation in the Germanic languages. The second concerns the nature of the elements that can be found in the left periphery of the clause in Welsh (and in Irish). The third concerns the peculiarities of the auxiliary bod (“be”) and the distribution of certain tensed-verb forms in Welsh. I will now deal with each of these points in turn.

2.1 Root-Embedded Asymmetries and Movement to C

If we propose that VSO involves V in C (cf. (28)), then we assimilate V-movement in Welsh VSO clauses to the verb-movement found in Germanic verb second, whether full (as in German and the other Germanic languages aside from English; cf. den Besten (1983), Vikner (1995)) or residual (as in English and French; cf. Rizzi (1996)). These cases of verb-movement are illustrated in (29):18
(29) a. Morgen/ wann werden sie dieses Buch lesen? (German)
        Tomorrow/when will they this book read(?)
        “Tomorrow they will read this book/When will they read this book?”
     b. When will they read this book?
     c. Quand liront-ils ce livre? (French)
        When will-read-they this book?
        “When will they read this book?”

The first difficulty arises when we observe that this movement is largely restricted to root clauses in the languages just mentioned. This can be seen particularly clearly in indirect questions:19,20
(30) a. *Ich frage mich, ob morgen wird Maria dieses Buch lesen.
        I ask me if tomorrow will Maria this book read.
     b. *I wonder if will she read the book.
     c. *Je me demande si lira-t-elle le livre.
        I me ask if will-read-she the book.
The simplest account of this restriction of movement to C was proposed by den Besten (1983). He proposed that the presence of material in C blocks head-movement to C (+WH Cs have to be regarded as filled in cases like I wonder who left). Other accounts have been proposed by Kayne (1982), Rizzi & Roberts (1989 [this volume, Chapter 9]) and Rizzi (1996), which I will not discuss in detail here. The important point for present purposes is that the root-embedded asymmetry is held to be a major hallmark of verb-movement to C. In the next section I will offer a characterisation of the root-embedded asymmetry in terms of Rizzi’s (1997) “split-C” system. The important observation is that VSO orders in Celtic are not restricted to root clauses (as noted by Guilfoyle (1990), McCloskey (1996b), Bobaljik & Carnie (1996)):
(31) a. Tybed a geith hi ddiwrnod rhydd wythnos nesa? (Welsh)
        I-wonder Prt will-get she day free week next
        “I wonder if she’ll get a free day next week.”
     b. Goulenn a reas hag-en oac’h eveurus. (Breton)
        asked Prt did-3sg whether were-2sg happy
        “He asked whether you were happy.”
     c. Chuir sé ceist ort an raibh tú sásta. (Irish)
        Asked he question to-you Prt were you content
        “He asked whether you were content.”

Here the particles a (in (31a)), hag-en (in (31b)) and an (in (31c)) function in a fashion precisely parallel to that of ob, if and si in (30) in that they mark the embedded finite clause as an embedded question. I take it as uncontroversial that the embedded clauses in (30) are CPs, that the head of the clause, i.e. C, is responsible for marking clause-type (declarative, interrogative, etc.; more on this in Roberts (2005, Chapter Four)) and that in (30), therefore, the particles ob, if and si are in C (note that all the languages in (30) are uncontroversially head-initial, at least at the CP level). If we apply this reasoning to (31), we can observe that the finite embedded clauses are plausibly regarded as CPs (there appears to be no reason not to so regard them, and in any case, if it could be shown that they were not CPs then the basic point of this section, that V does not raise to C in VSO clauses, would be supported since the embedded clauses are clearly VSO here) and that the particles mark clause type. By parity of reasoning with the standard analysis of German, English and French, therefore, and since the Celtic languages are clearly head-initial, these particles can naturally be supposed to be in C. What the contrast between (30) and (31) therefore shows is that C-elements do not block verb-movement over the subject in Celtic, but they do in Germanic and in French. In fact, as mentioned earlier, the only factor conditioning verb-movement in Celtic seems to be finiteness (this will be slightly refined in Section 2.3 below, at least regarding Welsh; on Breton “long” V-movement see Roberts (2005, Chapter Four, Section 2)). In this respect, verb-movement in Celtic patterns like verb-movement in French (cf. Pollock (1989); this parallel was first made by McCloskey (1996b)). 
Thus, in order to maintain that V moves to C in Celtic, we would have to stipulate that complementisers do not block V-movement in these languages, but do in Germanic and French, and that the finiteness property which triggers V-movement in French is associated with C in Celtic. Clearly, it is much simpler and more natural to conclude that V moves to I in (31) rather than to C. The argument just given is suggestive, not conclusive. To make it more precise, we could say something like the following: (i) in many languages embedded C is not a position into which V can move; (ii) in many languages finiteness conditions V-movement to I, but not to C. Then we observe that V-movement in Celtic is not sensitive to the root-embedded asymmetry, but is sensitive to finiteness and, given (i) and (ii), conclude that V moves to I but not to C in these languages. However, this conclusion depends on the general cross-linguistic validity and theoretical underpinnings of (i) and (ii). As

we shall see in Section 2.3, there is good evidence that some tense-marking is a property of the C-system in Celtic; however, the tense-marking in question is restricted in various ways which suggest that (main) verbs do not in fact move to C. Point (i) depends on assumptions about the C-system that we now investigate in more depth.

Old Irish and Scots Gaelic are of some interest in the present context, since it seems that in these languages V-movement to C is morphologically marked. However, the root-embedded asymmetry persists, which implies that the inability of embedded C to host incorporation is due to the presence of embedded complementisers, along the lines of den Besten’s original insight. Carnie, Pyatt & Harley (2000) argue that Old Irish has a “filled-C” requirement, although it is not verb second (see also Doherty (1998, 1999, 2000); Doherty in fact argues that Old Irish is a residual V2 language). Old Irish, then, may be a genuine case of a language with just one part of the verb-second constraint, the part that relates to C (Ferraresi (1991) and Longobardi (1978) argue that Gothic also has a “filled-C” requirement, but only in embedded clauses). The element in C is therefore in absolute first position. Various kinds of material can satisfy this constraint in a root clause: “conjunct particles” (negation, question markers, complementisers), preverbs and the verb itself; these data are discussed and analysed in more detail in Roberts (2005, Chapter Four, Section 2). Most interestingly, when the verb itself occupies C, it takes on a specific morphological form, traditionally known as the independent form (the noninitial form is known as the dependent form). This alternation is illustrated in (32):

(32) a. Beirid in fer in claideb
        Carries (independent) the man the sword
     b. Ní beir/ *beirid in fer in claideb.
        Not carries (dependent)/*(independent) the man the sword
     (Carnie, Pyatt & Harley (2000: 45))

As Carnie, Pyatt & Harley suggest, the alternation in form can be interpreted as a morphological reflex of the features that attract the verb to C.21 Scots Gaelic retains the Old Irish distinction between independent and dependent verb forms. The distinction remains productive in the future tense, where we observe examples like the following:

(33) a. òlaidh mi (independent present/future tense)
        drink I
     b. òl mi (dependent present/future tense)
        drink I
     (Calder (1990: 223))

The dependent forms occur when there is a particle in C (e.g. gun “that”; see Roberts (2005, Chapter 4) for more discussion of the C-particles in Celtic).

The independent forms appear when there is no particle (see Calder (1990: 202f.)). Modern Irish retains the distinction with a handful of verbs. The dependent forms are found after a C-particle (except for the leniting particle aL; see McCloskey (2001)); the independent forms are found after aL and in absolute initial position:

(34) a. Creidim go bhfaca mé do nighean.
        I-believe that saw-dep I your daughter
        “I believe that I saw your daughter.”
     b. an bean a chonaic tú
        the woman aL saw-indep you
        “the woman that saw you”
     c. Chonaic tú í.
        saw-indep you her
        “You saw her.”
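The distribution just described for Modern Irish can be restated as a simple decision rule. The following Python sketch is purely illustrative: the function name `verb_form` and the string labels used for the particles are assumptions introduced here, and the rule is only meant to cover the handful of verbs that retain the alternation.

```python
from typing import Optional


def verb_form(particle: Optional[str]) -> str:
    """Return the verb form expected after a given C-particle.

    None stands for absolute initial position (no particle). The labels
    "go" and "aL" are illustrative stand-ins for the particles in (34).
    """
    # Independent form: absolute initial position, or after leniting aL.
    if particle is None or particle == "aL":
        return "independent"
    # Dependent form: after any other C-particle (e.g. go).
    return "dependent"


print(verb_form("go"))  # configuration of (34a)
print(verb_form("aL"))  # configuration of (34b)
print(verb_form(None))  # configuration of (34c)
```

Nothing in this sketch bears on whether the forms reflect V-movement to C; it only records the surface generalisation.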

It is unlikely that these forms are synchronically associated with V-movement to C; I am aware of no independent evidence for this, and Doherty (2000: 28–31) argues that the correlation between independent morphology and initial position (i.e. C) was weakening as early as Classical Old Irish (9th century AD).

In this section, we have seen the evidence that there is no root-embedded asymmetry in verb-movement in the Celtic languages. This suggests that, without further refinements regarding the nature of complementiser systems (see next section), we should conclude that V does not raise to C in these languages. However, the argument is not conclusive as it stands. In the next section, I look first at some other data and then return to the question of the root-embedded asymmetry. In the process, I adopt and adapt a much more articulated conception of the structure of complementiser systems, following Rizzi (1997).

2.2 Adverbs, Particles and the Structure of Comp

The second problem for the idea that V moves to C comes from the position of adverbs. On the basis of the relative positions of sentential adverbs and complementisers, McCloskey (1996a) argues that C lowers to I in Irish. The argument is based on the observation that in general across languages sentential adverbs don’t adjoin to CP:

(35) a. In general, he understands what’s going on.
     b. It’s probable that in general he understands what’s going on.
     c. *It’s probable [CP in general [CP that he understands what’s going on]].
     d. *[In general [that he understands what’s going on]] is surprising.

In (35c,d) the bracketing is meant to indicate that the adverb should be interpreted as modifying the that-clause. These readings are impossible in

English. McCloskey calls this general ban on the adjunction of adverbs to CP the Adjunction Prohibition. Irish shows the opposite distribution of adverbs in relation to CPs:

(36) a. Is doíche [faoi cheann cúpla lá [go bhféadfaí imeacht]].
        is probable at-the-end-of couple day that could leave
        Adv C I
     b. *Chreid siad go roimh i bhfad rachadh na cuarteoirí ‘na bhaile.
        Believed they that before long would-go the visitors home

McCloskey also shows that we cannot maintain that Irish simply lacks the Adjunction Prohibition (however exactly this idea might be formulated). The evidence against this idea is that the order adverb–WH phrase is bad:

(37) *Ní bhfuair siad amach ariamh an bhliain sin cé a bhí ag goid a gcuid móna
     Neg found they out ever the year that who Prt was stealing their turf

McCloskey proposes (i) that sentential adverbs adjoin to IP in Irish just as in English (and other languages), and (ii) that Irish has a rule which lowers C to I. The C-to-I lowering rule derives orders like that in (36a) and, since it is obligatory, it explains the ungrammaticality of (37). Schematically, the relevant parts of (36a) have the following structure:

(38) t [IP Adv [IP C+I . . .

McCloskey’s general conclusion is that the distribution of sentential adverbs in relation to complementisers is such that we cannot claim that V moves to the C position.

Before going on to look at the situation in this respect in Welsh, I’d like to propose a way of handling McCloskey’s data without having recourse to a C-to-I lowering operation. This in turn will lead to a suggestion regarding Germanic root-embedded asymmetries. McCloskey’s conclusion is retained under the analysis to be proposed. The basic idea is to capitalise on the overall similarity between the structure of McCloskey’s argument and the structure of Pollock’s (1989) arguments for V-to-I raising in French and I-to-V lowering in English.
Pollock observed, inter alia, the following contrast between French and English:

(39) a. French: V+Infl Adv direct object, as in: Jean embrasse souvent Marie
     b. English: Adv V+Infl direct object, as in: John often kisses Mary

(=(39b))

Pollock concluded that V raises to Infl in French, but that Infl lowers to V in English.

The Irish-English contrast that we saw in (35) vs. (36) can be handled in the same way. Here what we have is the following (note that this pattern only shows up clearly in embedded contexts):

(40) a. Irish: Adv C IP
     b. English: C Adv IP

So, the possibility emerges of saying that Irish complementisers do not move (overtly, at least) while English ones do. We can make this idea concrete if we adopt the recent proposals for a “split-Comp” system that have been developed by Rizzi (1997). Rizzi argues that at the left periphery of the clause, above IP, there are at least three separate projections: ForceP, FocusP and FiniteP, in that order, interspersed with possibly recursive TopPs. The categories we are interested in here are Fin and Force: Fin marks the clause as finite or not (and, since non-finite clauses usually designate unrealised events, may really be a variety of MoodP; see Cinque (1999)). Force is the position associated with clausal typing and illocutionary force. Now, as Rizzi points out, complementisers like English that and Irish go mark two things: that the clause they introduce is declarative and that it is finite. In this respect, they are each associated with features of two heads, Force and Fin, just as a finite verb is associated with properties of V (thematic structure) and T (tense). So we might expect that crosslinguistically, some complementisers are overtly realised in Force and others in Fin. Looking again at (40), we can see that in these terms, that can be analysed as appearing in Force and go as appearing in Fin. We can in fact conclude (with Rizzi (1997: 313), although Rizzi’s implementation of the idea is somewhat different) that English Force overtly attracts Fin in embedded contexts (this will be slightly qualified below), while in Irish this is not the case.
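The contrast in (40) can be made concrete with a small sketch. It assumes, anticipating the discussion that follows, that sentential adverbs occupy a position between Force and Fin (labelled Top here). The names `PERIPHERY` and `linearize` are hypothetical, and the four-way hierarchy is a deliberate simplification of Rizzi's (1997) structure rather than a faithful rendering of it.

```python
# Simplified top-down order of left-peripheral positions (an assumption made
# for illustration; Rizzi's structure allows recursive TopPs on both sides
# of Foc).
PERIPHERY = ["Force", "Top", "Foc", "Fin"]


def linearize(fillers: dict) -> str:
    """Spell out the overt fillers in hierarchical order, followed by IP."""
    words = [fillers[pos] for pos in PERIPHERY if pos in fillers]
    return " ".join(words + ["IP"])


# English: the complementiser is spelled out in Force, above the adverb.
english = linearize({"Force": "that", "Top": "Adv"})
# Irish: the complementiser is spelled out in Fin, below the adverb.
irish = linearize({"Top": "Adv", "Fin": "go"})

print(english)
print(irish)
```

On these assumptions a single hierarchy yields both orders, the only difference being the head in which the complementiser is realised.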
We need only add that sentential adverbs either appear in a TopP in between Force and Fin or occupy specifier positions reserved for them (along the lines proposed in Cinque (1999)), and we derive the different distributions of adverbs and complementisers without recourse to a lowering rule. The structure in (41) summarises the proposal:

(41) [ForceP [Force that] .. [TopP Adv .. [FinP [Fin go] IP ]]]

(41) transparently encodes the fact that English shows the order that–Adv–IP (as in (35b)) and not Adv–that–IP (see (35c,d)), while Irish allows precisely the latter (see (36a)) and not the opposite (see (36b)). We are thus able to retain the essential point of McCloskey’s analysis, that V does not raise to C in Irish, without recourse to an ad hoc and problematic lowering rule. The analytic devices which effectively replace C-to-I lowering in the analyses proposed here are the split-C structure and upward head-movement of that. The postulation of head-movement is quite uncontroversial (but see Chomsky (2001) and Roberts (2005, Chapter 5)); the former is motivated

by Müller and Sternefeld (1993) and Rizzi (1997), as well as numerous subsequent studies. The proposals regarding the positions of Celtic complementisers are more fully motivated and integrated into the overall comparative picture below and in Roberts (2005, Chapter Four). The fact that sentential adverbs cannot precede WH-phrases in Irish, illustrated in (37), can be accounted for if we assume that Wh-phrases in Irish move to a Specifier position higher than TopP; Rizzi argues that Italian interrogative Wh-phrases occupy Spec, FocP (FocP is situated in between ForceP and TopP). In that case, we can account straightforwardly for (37).22

McCloskey argues that C-to-I lowering takes place at PF on the basis of the distribution of negative-polarity items (NPIs). Assuming standardly that NPIs must be c-commanded by a negative element that licenses them at S-structure, McCloskey points out that Irish NPIs can be fronted by a process known as Narrative Fronting to a position which does not appear to be c-commanded by a negative element (the elements in question must be NPIs, since Irish lacks negative quantifiers of all kinds; see Acquaviva (1996)):

(42) Neach ar bith dínn ní bheidh beo
     Being any of-us neg will-be alive
     “Not one of us will be alive.”

McCloskey reconciles this fact with the general condition on NPIs by assuming that the NPI is c-commanded by ní at S-structure, with lowering of ní in PF. To fully deal with this point would take us quite far afield, into a detailed consideration of how NPIs are licensed. However, I would like to briefly sketch an alternative approach to this data. The requirement that NPIs be licensed by a c-commanding negation is motivated by data such as the following (see Ladusaw (1979), Laka (1990), Zanuttini (1991), and the references given there):

(43) a. He didn’t speak to anybody.
     b. *Anybody didn’t speak to him.
     c. Didn’t anybody speak to him?
(43b) shows that NPIs in subject position are bad unless the negation is raised to C (reverting momentarily to a unitary-C approach). In English, NPIs are systematically bad in SpecCP, as McCloskey observes (his (109), p. 43):

(44) a. *Ever haven’t I seen such a sight.
     b. *Under any circumstances wouldn’t I do that.

Suppose that the c-command condition derives from the fact that the Neg head must license NPIs. In English, the Neg head is found either in the

position of the finite auxiliary where there is no inversion, or in C (more precisely in Foc, according to Rizzi (1997)). The subject cannot form a chain whose head is the Neg-element in the non-inverted auxiliary position, but can if the Neg element is in the Foc position. Features of a complement are c-commanded by the auxiliary position. In this way, the data in (43) and (44) can be accounted for.

But what of the Irish example in (42)? Here the crucial observation is that the Irish negative elements are part of the C-system. Suppose, then, that in line with our analysis of go, ní occupies Fin in (42) and raises covertly to Force. We must regard the condition on NPIs as an LF condition, since one of the tenets of minimalism is that independent S-structure well-formedness conditions do not exist. Hence at LF, ní will be in a position which c-commands the fronted constituent. (I am assuming that the fronted negative constituent in (42) is in SpecFoc or SpecTop.)23 The essential difference between Irish and English lies (a) in the fact that Force does not overtly attract Fin in Irish as we see from apparent violations of the Adjunction Prohibition like (36a), and (b) in the fact that Fin is inherently able to contain negative material, i.e. the simple observation that Irish has negative complementisers.24 An important feature of this analysis is that the negative head in English cannot be supposed to raise into the C-system covertly,25 and when it does raise to Foc, as in the well-formed counterparts of (44), it does not raise further to Force.

More generally, the proposals just made for the difference between English and Irish complementisers point the way to an account of Germanic-style root-embedded asymmetries in verb-movement. Following Rizzi & Roberts (1989 [this volume, Chapter 9]), we assume that the generalisation has to do with selection.
Let us suppose, then, that a selected Force position has features in virtue precisely of being selected by a higher predicate. The fact that typical complementisers like English that raise from Fin to Force must then be attributed to the fact that selected Force triggers overt Fin-movement (presumably because it requires PF-realisation). Let us suppose that the verb-movement part of V2 is a reflex of the fact that Fin requires a PF-realisation; following the notation introduced in Roberts (2001: 99–103), we indicate a functional feature F requiring a PF-realisation as F*, so in this case we have Fin*.26 In these terms, we can see that complementisers are able to satisfy this requirement in embedded clauses, even if they subsequently raise to Force. So all the Germanic languages, including English, have Fin-to-Force movement where Force is selected (i.e. in embedded clauses). This blocks V-movement to Fin in embedded clauses as a general case of Merge preempting Move; exploiting the presence of complementisers in C as a way of blocking V2 is an idea that goes back to den Besten (1983).

Consistent with the idea that all the Germanic V2 languages have Fin* and that this is a significant component of the V2 phenomenon, we can observe that all these languages differ from English in requiring the presence of a complementiser in finite embedded declaratives (with the notable

exception of German; see below). German diverges somewhat from the general pattern in that it requires embedded V2 exactly where the complementiser is missing:

(45) Ich glaube, gestern habe Maria dieses Buch gelesen.
     I believe yesterday has M. this book read.
     ‘I believe Mary read this book yesterday.’

But of course this is consistent with the general proposal for Fin*. A further refinement is required in order to account for the presence of that in “CP-recursion” contexts in English like (46) (and the comparable situation in the Scandinavian languages; see Vikner (1995)):

(46) a. I said that never in my life had I seen a place like Bangor.
     b. Vi ved at denne bog har Bo ikke læst.
        We know that this book has Bo not read
        “We know that Bo has not read this book.”
        (Danish: Vikner (1994: 67))

Assuming V is in Foc and the negative constituent in SpecFocP in (46a), that cannot have raised to Force from Fin. To account for this, we assume that that is merged in Force in the complements to bridge verbs, again presumably to satisfy the requirement that embedded Force have a lexical realisation. In these cases in V2 languages, as for example in (46b), Fin* is satisfied by V-movement (Fin* is satisfied by V-movement passing through it in (46a)). The ability to directly select a complementiser in Force is a property of bridge verbs in the languages in question (this is presumably connected to the observation that these embedded clauses have assertive illocutionary force, cf. Hooper & Thompson (1973)); German presumably does not allow this but instead selects the subjunctive in examples like (45) (see also Penner & Bader (1995) on further properties of this construction). Non-bridge verbs (canonically factives like regret, etc) are unable to directly select a complementiser in this position.27

In wh-complements where no overt complementiser is present something more must be said. Here we can capitalise on an observation by Stowell (1981: 422) to the effect that selection for +WH neutralises selection for Fin. This can be illustrated by paradigms such as the following:

(47) a. I explained how to fix the sink.          [+WH, -Fin]
     b. I explained how we should fix the sink.   [+WH, +Fin]
     c. I explained that we should fix the sink.  [-WH, +Fin]
     d. *I explained to fix the sink.             [-WH, -Fin]

In Rizzi’s system, this is straightforwardly accounted for by the local nature of selection (see Chomsky (1965)) and the fact that both Force and Foc are structurally higher than Fin. In that case, it follows that selection for Force

as Interrogative and/or for Foc as +WH,28 blocks selection for a feature of Fin. Hence the requirement for a finite complement (i.e. selection for Fin), violated in (47d), is inoperative where a feature of Foc is selected, as in (47a,b). The crucial assumption here is that local selection can take place “across” a head lacking features relevant for selection; in other words, (48a) and (48b) are possible selection configurations, but not (48c) (where “. . .” contains no heads and each head asymmetrically c-commands the next from left to right):

(48) a. A[+F] . . . B[+F]                (local selection)
     b. A[+F] . . . B . . . C[+F]        (non-local selection across an inert head)
     c. A[+F] . . . B[+F] . . . C[+F]    (impossible for A to directly select C)

The parallels between selection so construed and the Agree relation of Chomsky (2001), as well as certain versions of relativised minimality are clear, but I will not pursue them here (see Roberts (2005, Chapter Two, Section 2.2), for a formulation in terms of relativised minimality). What we observe in (47), then, are the following selection relations (and non-relations):

(47a,b): V selects Force[+Q]; Force[+Q] selects Foc[+WH]; Foc[+WH] does not select any property of Fin.
(47c): V directly selects finite Fin, Force and Foc being inert (see (48b)); this requirement is violated in (47d) (assuming explain requires finite Fin where it is structurally able to select a property of Fin).

In matrix clauses, we observe essentially the same situation. Force[+Q] can select a [+WH] Foc, in which case Fin is not selected for. On the other hand, if Foc is inactive (i.e. [-WH]), then Fin must be finite. These facts are illustrated by the following paradigm:

(49) a. What to do?              Force[+Q], Foc[+WH], Fin not selected.
     b. What should we do?       Force[+Q], Foc[+WH], Fin not selected.
     c. Should we leave?         Force[+Q] selects finite Fin where Foc is inactive.
     d. *Whether/if to leave?    Violation of selection property of Force[+Q].
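The locality condition in (48) amounts to saying that a selecting head picks out the closest lower head that is active for selection. The following is a minimal sketch of that idea, assuming that "active" can be represented as a Boolean on each head; the function `closest_active` and this encoding are illustrative assumptions, not a formalisation taken from the text.

```python
from typing import List, Optional, Tuple


def closest_active(heads: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the label of the highest active head, or None if none is active.

    `heads` lists (label, active) pairs from structurally highest to lowest.
    Inert heads are skipped, as in (48b), but an active head blocks
    selection of anything below it, as in (48c).
    """
    for label, active in heads:
        if active:
            return label
    return None


# (47a,b): Force[+Q] is active, so V selects Force and cannot reach Fin,
# whatever the status of Fin.
wh_complement = [("Force", True), ("Foc", True), ("Fin", True)]
# (47c): Force and Foc are inert, so V selects Fin directly.
that_complement = [("Force", False), ("Foc", False), ("Fin", True)]

print(closest_active(wh_complement))
print(closest_active(that_complement))
```

The same function can be applied with Force[+Q] as the selector over the heads below it, which is the configuration relevant to the matrix paradigm in (49).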

If Fin is active, it is Fin*; as (49) shows, where Foc is [+WH], Fin may be either active or not, but where Foc is inactive, Fin* is selected by Force[+Q], and must be active.29 What about Fin in (47)? Why does it not trigger movement in V2 languages if it is not selected? Clearly, the reason is that this is not an inherently declarative Fin. Suppose then that Fin (and therefore the clause it heads) is interpreted as declarative just when no higher position in the C-system is active. This is very natural, since it amounts to saying that +finite Fin is interpreted as declarative when it is in a clear intuitive sense the head of

CP. It also entails that Force is inactive in (root) declarative clauses, and so declarative is the unmarked clause type. So, non-selected declarative Fin is equivalent to root, finite Fin. Non-selected declarative Fin is in principle distinct from selected, declarative Fin; the former is a kind of default, while the latter is a selected C-head, entering into selection relations like other heads. Selected, non-declarative Fin is different again. We thus arrive at the following general picture of the varieties of Fin (here “*” means that the relevant type of Fin needs a PF-realisation in the language in question, not that it is ungrammatical):

(50) a. +selected   +declarative   * in Germanic
     b. -selected   +declarative   * in Germanic, not in English (full V2)
     c. +selected   -declarative   * in Germanic, * in English (residual V2)
     d. -selected   -declarative   not found, as declarative is a default.

The above sketch of the C-system and the root-embedded asymmetry in Germanic clearly shows that Fin-to-Force movement is available in these languages. In Irish, on the other hand, Force does not overtly attract Fin, as McCloskey’s evidence shows. Given this, we might conclude that the absence of root-embedded asymmetries in verb-movement in Celtic is not an argument against a V-to-C analysis of VSO; V could be analysed as moving to Fin, just as suggested above for Yiddish and Icelandic. However, we have clear evidence that V does not raise to Fin in Irish. We have seen that go is in Fin at Spell Out, and if we follow Kayne (1994) in assuming that all head-adjunction must be to the left of the target, then the simple fact that we find go > Verb and not Verb > go tells us that V does not raise to Fin. In this respect, Irish differs from Yiddish and Icelandic, in that in those languages it is reasonable to postulate that complementisers appear in Force (see Note 28). Thus, by a roundabout route, we endorse McCloskey’s conclusion that the facts of adverb placement argue against V-raising into the C-system in Irish. (However, note that we have restricted our attention to the Irish complementiser go; according to McCloskey (2001), this item is typical of Irish C-elements, but at least the Wh-particle aL shows a different “morphosyntactic profile”, to use McCloskey’s term).

Let us now turn to Welsh, and see what the above considerations tell us about the position of V in VSO clauses in this language. In fact, Welsh is very interesting in this respect. In terms of the schema for the difference between English and Irish given in (40), we can observe that Welsh has the order C–Adv–C. In other words, adverbs can appear after certain complementisers but not others, as shown in Tallerman (1996). They cannot intervene between the particle y, which introduces finite clauses, and the verb:

(51) *Dywedodd ef y yfory bydd yn gadael.
     Said he Prt tomorrow he-will-be Asp leave

By parity of reasoning with what we said about Irish go, we could take (51) as evidence that y is in Fin. However, adverbs cannot precede y (and have lower scope) either (see also Tallerman (1996: 122)):

(52) Dywedodd ef yfory y bydd yn gadael.
     Said he tomorrow Prt he-will-be Asp leave
     “He said tomorrow that he’ll leave.”

Here, the adverb yfory must be interpreted as having matrix scope, giving an anomalous temporal interpretation for this clause, as in the English translation. So y behaves like English for (see Note 23), and doesn’t tell us very much about the structure of C or the position of the finite verb.

In Welsh there are particles which introduce affirmative main clauses (under certain conditions; see next section). These are fe and mi, the variation being dialectal as we mentioned above (cf. (1)). The particles have three properties: (i) they trigger soft mutation on the initial consonant of V (see Roberts (2005, Chapter Two, Section 2)); (ii) they host infixed pronouns e.g. fe/mi’ch gwelais i “I saw you(pl)” (I take these pronouns to be in the syntax Romance-style proclitics on the finite verb that are moved with the finite verb; however, they are PF-enclitic to the particle, and so the otherwise available particle-deletion operation, on which see Roberts (2005, Chapter Four, Note 1), cannot apply when these elements are present); (iii) they are adjacent to the finite verb, the only possible intervening element being the “infixed pronouns” just mentioned. The last of these properties is most relevant here. As (53) shows, adverbs can occur before these particles, but not in between them and the verb:

(53) a. Bore ‘ma, fe/mi glywes i’r newyddion ar y radio.
        Morning this, Prt heard I the news on the radio.
        “This morning, I heard the news on the radio.”
     b. *Fe/mi bore ‘ma glywes i’r newyddion ar y radio.
        Prt morning this heard I the news on the radio.

Again we can apply our analysis of McCloskey’s Irish data and take it that fe/mi are in Fin. Support for this comes in connection with property (i) above, the fact that fe/mi trigger ICM on V. In Roberts (2005, Chapter Two, 3.1), I show that ICM takes place under syntactic conditions that correspond to the GB notion of head-government; if fe/mi are in Fin and the verb in the highest head position in IP then this configuration is met by the particle–V relation. This analysis assures the adjacency of fe/mi and the finite verb, as long as IP-adjunction is ruled out (which I assume it is in general, following Rizzi (1997)) and the “infixed pronouns” are analysed as proclitics on the finite verb, as sketched above.

Welsh has a general focussing strategy (traditionally called the “mixed construction”) which allows exactly one XP to be fronted over the verb, followed by a or y and the rest of the clause. The choice of a or y depends on the nature of the fronted XP: a is associated with subjects, direct objects and VPs, y with all other XPs (it is possible that a is associated with movement and y with a resumption strategy, see Awbery (1976), Sadler (1988), McCloskey (1979, 1990), Rouveret (1994) and Note 3 of Roberts (2005, Chapter Four)). (54) gives some examples from Tallerman (1996: 100,103):

(54) a. Y dynion a werthodd y ci.
        The men Prt sold the dog.
        “It’s the men who have sold the dog.”
     b. Ym Mangor y siaradais i llynedd.
        In Bangor Prt spoke I last-year
        “It was in Bangor I spoke last year.”
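The choice between a and y in the mixed construction, as described above, can be restated as a lookup on the category of the fronted XP. The sketch below is only a restatement of that descriptive generalisation; the category labels and the function name `focus_particle` are assumptions made for illustration, and nothing here decides between the movement and resumption accounts mentioned above.

```python
# Fronted constituents associated with the particle a, per the description
# above; every other fronted XP is taken to be followed by y.
A_CATEGORIES = {"subject", "direct object", "VP"}


def focus_particle(fronted: str) -> str:
    """Return the particle expected after a fronted XP of the given category."""
    return "a" if fronted in A_CATEGORIES else "y"


print(focus_particle("subject"))  # configuration of (54a)
print(focus_particle("PP"))       # configuration of (54b)
```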

These particles have the same properties as fe/mi (they trigger mutation on V, and they must be adjacent with only an infixed pronoun able to intervene). Moreover, they are in complementary distribution with fe/mi. Clauses with a fronted focused XP like those in (54) can be preceded by one of a special class of complementisers, as in (55) (examples again from Tallerman (1996: 108,117,119)):

(55) a. Dywedais i mai [ ‘r dynion a fuasai’n gwerthu’r ci ].
        Said I MAI the men Prt would-Asp sell-the dog.
        “I said that it’s the men who would sell the dog.”
     b. Ai [ceffyl a fuasai hi’n gwerthu]?
        AI horse Prt would she-Asp sell?
        “Is it the horse that she’d sell?”
     c. Nid [ y dyn a ddaeth].
        NEG the man Prt came.
        “It wasn’t the man who came.”

Rouveret (1994) and Tallerman (1996) both treat these structures as involving “CP recursion”. Now, the interesting observation, due to Tallerman, is that adverbs can appear between mai and the focused constituent, but not, with embedded scope, before mai:

(56) a. Dywedais i mai fel arfer y dynion a fuasai’n gwerthu’r ci.
        Said I MAI as usual the men Prt would-Asp sell-the dog.
        “I said that it’s as usual the men who would sell the dog.”
     b. *Dywedais i fel arfer mai ‘r dynion a fuasai’n gwerthu’r ci.
        Said I as usual MAI the men Prt would-Asp sell-the dog.

However, it is also possible to separate a focused constituent from the focus particle with an adverb, as (57) shows:

(57) Dywedais i mai’r dynion fel arfer a fuasai’n gwerthu’r ci.
     Said I MAI the men as usual Prt would-Asp sell-the dog.
     “I said that it’s the men as usual who will sell the dog.”

Thus the natural position for a is in Fin. It seems reasonable to situate the focused XP in SpecFoc; this implies that the adverb in (57) occupies a position in between Foc and Fin (possibly SpecFinP or a special position for adverbial specifiers). We consider mai and the other elements that introduce “CP-recursion” illustrated in (55) to be in Force, like their English counterparts in the complements to bridge verbs. Hence the intervening adverb in (56a) can be thought of as occupying a specifier position in between Force and FocP. This analysis is preferable to one which relies on CP-recursion of the kind proposed by Tallerman (1996) in that it directly captures the fact, observed by Cardinaletti & Roberts (2002 [this volume, Chapter 12]), that “the two C-positions have different properties” (Tallerman (1996: 109)). This is because, on this analysis, they are distinct categories. Moreover, it is possible to have mai–Adv*–XP–Prt where exactly the last XP before the particle is interpreted as focused (cf. Tallerman (1996: 114)); this falls directly under Rizzi’s (1997) proposal that between Force and Foc there is a possibly recursive TopP. Third, we can account for the order in (57), where an adverb intervenes between the focused constituent and the focus particle, as here the focused constituent is in SpecFocP, the adverb is either in a lower SpecTopP or in SpecFinP, and the particle is in Fin, as shown in (58):

(58) Dywedais i [ForceP mai [FocP ’r dynion [TopP/FinP fel arfer [Fin a] fuasai’n gwerthu y ci ]]].
In order to account for these data, Tallerman must either assume that focused constituents can be adjoined, losing the generalization that there can only be one such constituent in the left periphery of the clause, or weaken the Adjunction Prohibition to allow adverbs like fel arfer to adjoin to C’ (the latter solution is adopted in Willis (1998)). The above analysis implies that the verb is not raised into any part of the C-system. In particular, we can see that a has not raised to Foc in (57), and so the fact that this particle precedes the verb combined with the fact that no adverb can intervene between it and the verb indicates that the verb remains in a position lower than Fin. I continue to assume, following Kayne (1994), and pace Willis (1998), that the verb can’t be right-adjoined to Fin. So our analysis of the C-system of Welsh (and Irish) indicates that finite V does not move there. In turn, this means that the analysis of VSO cannot be as in (28). And this in turn indicates that these languages do not have an English- or French-style EPP as subject requirement (where the subject position is taken to be the highest Specifier position in IP).

2.3 The Nature of “Bod”

A further argument that V does not raise to Fin in Welsh comes from the evidence that there is one verb with peculiar morphological and distributional properties which can be neatly accounted for if we assume that this particular verb, unlike all others, is able to raise to Fin. The verb in question is the auxiliary bod (“be”). If the peculiarities of bod are accounted for by its ability to move into Fin, then the fact that other verbs do not show any of these peculiarities shows that they do not raise to Fin. There is further evidence for this from a peculiar restriction on embedded finite verbs most recently discussed and analysed in Tallerman (1998). Bod is the only auxiliary in Welsh, since as we mentioned in Note 1 there is no equivalent of have (see Roberts (2005, Chapter Three, Section 3) for a proposal regarding this).30 The idea that auxiliaries have greater movement privileges than other verbs is of course not new: cf. Pollock (1989) for discussion of the evidence that this is true in both English and French, and Rizzi (1982, Chapter 3) for comparable evidence from Italian.31 The morphological peculiarity of bod lies in the fact that it has tense forms which other verbs lack. As shown in Note 1, all verbs have synthetic present/future, conditional and preterit forms, but all the other tenses are periphrastic (literary Welsh also has pluperfect and subjunctive forms but these are not part of the colloquial language). Bod, on the other hand, has synthetic present (distinct from the present/future) and imperfect forms (distinct from the conditional). This verb also has distinct paradigms in the present and imperfect according to the clause type.
Here I give the Northern colloquial forms (from King (1993: 146f.)):

(59) Present tense:
     Affirmative      Interrogative     Negative
     dw i             ydw i             (dy)dw i
     wyt ti           wyt ti            dwyt ti
     mae o/hi         ydy o/hi          dydy o/hi
     dan ni           ydan ni           (dy)dan ni
     dach chi         (y)dach chi       (dy)dach chi
     maen nhw         ydyn nhw          dydyn nhw
     “I am” etc.      “Am I?” etc.      “I’m not” etc.

(60) Imperfect tense:
     Affirmative      Interrogative     Negative
     roeddwn i        oeddwn i          doeddwn i
     roeddet ti       oeddet ti         doeddet ti
     roedd o/hi       oedd o/hi         doedd o/hi
     roedden ni       oedden ni         doedden ni
     roeddech chi     oeddech chi       doeddech chi
     roedden nhw      oedden nhw        doedden nhw
     “I was” etc.     “Was I?” etc.     “I wasn’t” etc.

These forms can to some extent be decomposed, particularly in the imperfect (although 3rd-person sg/pl present tense mae(n) is clearly a suppletive form; see Hendrick (1996) for an analysis). Here we see that the alternation r-/∅/d- indicates clause type. The prefix r- is the phonologically conditioned variant of y, a particle that occurs with these tenses of bod in declaratives in the literary language (the literary 3sg present is y mae, the 1sg present is yr ydwyf). This particle is in complementary distribution with fe/mi, which occur with other verbs and with bod in other tenses.32 The prefix d- is the relic of the older/literary preverbal negation ni(d) (see Note 6). Synchronically, though, it is arguable that in the variety illustrated in (59, 60) the prefixes r- and d- are lexically part of bod, given that they do not occur with other verbs. The natural analysis of this situation is to treat bod as able to raise to Fin, and to suppose that the Tense-features associated with present and imperfect are only licensed there. In this way, we can explain the extra forms of bod in terms of extra movement possibilities. In interrogatives, bod may raise further to Foc (see below). This analysis implies that verbs lacking the present and imperfect tenses (i.e. all verbs except bod) do not raise to Fin. Perhaps because of this, such verbs also lack the declarative r- and negative d- prefixes. Support for this analysis comes from the fact, alluded to above, that bod is in complementary distribution with many preverbal particles. As mentioned above, it is in complementary distribution with fe/mi (see also Note 33). We regard the r- prefix on the imperfect as part of bod rather than as a separate head in the contemporary colloquial language.
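The decomposition of the imperfect paradigm in (60) into a clause-type prefix (r-/∅/d-) plus a person-marked form of bod can be made concrete with a small sketch. This is only an illustration of the surface pattern; the function names `imperfect_form` and `decompose` and the person labels are hypothetical conveniences, and nothing here is meant to model the syntactic analysis (the text in fact treats the prefixes as lexically part of bod).

```python
# Toy illustration of the r-/zero/d- alternation in the imperfect of bod, (60).
# The prefix marks clause type; the remainder is the person-marked form.
CLAUSE_PREFIX = {"affirmative": "r", "interrogative": "", "negative": "d"}

# Interrogative (prefixless) forms from (60), keyed by hypothetical person labels.
IMPERFECT_BASE = {
    "1sg": "oeddwn i",
    "2sg": "oeddet ti",
    "3sg": "oedd o/hi",
    "1pl": "oedden ni",
    "2pl": "oeddech chi",
    "3pl": "oedden nhw",
}


def imperfect_form(person: str, clause_type: str) -> str:
    """Build an imperfect form of bod as clause-type prefix + base form."""
    return CLAUSE_PREFIX[clause_type] + IMPERFECT_BASE[person]


def decompose(form: str) -> tuple[str, str]:
    """Split an imperfect form into (clause type, base form)."""
    for clause_type in ("affirmative", "negative"):
        prefix = CLAUSE_PREFIX[clause_type]
        if form.startswith(prefix + "oedd"):
            return clause_type, form[len(prefix):]
    if form.startswith("oedd"):
        return "interrogative", form
    raise ValueError(f"not an imperfect form of bod: {form!r}")


if __name__ == "__main__":
    for person in IMPERFECT_BASE:
        print([imperfect_form(person, c) for c in CLAUSE_PREFIX])
```

The present-tense paradigm in (59) is deliberately left out: as noted above, forms such as mae(n) are suppletive and do not decompose in this way.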
Also, bod is in complementary distribution with the focus particles a/y,33 as can be seen if we change the synthetic verb forms in (54) and (55) into periphrastic tenses with the relevant form of bod (sydd in (61a) is a further suppletive form of bod which appears when the local subject has been extracted; see Hendrick (1996)):

(61) a. Y dynion sydd wedi gwerthu’r ci.
        The men SYDD Asp sell the dog
        “It’s the men who have sold the dog.”
     b. (?)Ym Mangor dw i wedi siarad iddo fo.
        In Bangor am I Asp speak to-3sg him.
        “It’s in Bangor I have spoken to him.”
     c. Ai [ceffyl mae wedi gwerthu]?34
        AI horse is Asp sell?
        “Is it a horse that she’s sold?”

We propose that bod raises to Fin, and so its syntax corresponds in an important respect to what we proposed above for a and y. Further evidence that bod raises into the C-system in certain contexts comes from Rouveret’s (1996) analysis of copular clauses. Rouveret argues that identificational clauses like those in (62) are V2 clauses in the sense that in his terms the precopular XP is in SpecCP and the copula in C:

(62) a. Y brenin ydy Arthur.
        The king is Arthur.
        “Arthur is the king.”
     b. Arthur ydy’r brenin.
        Arthur is the king.
        “It is Arthur who is the king.”
     c. *Ydy Arthur y brenin.
        Is Arthur the king.

In addition to pointing out the complementary distribution between bod and particles,35 Rouveret gives two further arguments for his analysis. First, he observes that an indefinite predicative nominal may appear in initial position, unlike what we find in English:

(63) a. Arwr ydy Siôn.
        Hero is John
        “John is a hero.”
     b. *A hero is John.

Rouveret suggests that the definite predicate nominal occupies SpecAgrSP in inverse predicational sentences in English such as The king is Arthur, a position from which indefinite predicate nominals are excluded. This suggests that the indefinite predicate nominal arwr in (63a) is not in this position, but instead presumably in a higher position.36 Second, fronted predicate nominals can contain an anaphor bound by the postcopular DP:

(64) Ei elyn pennaf ei hun ydy Siôn.
     His enemy chief his self is John
     “John is his own worst enemy.”

The fronted DP containing the anaphor must be reconstructed in order for the anaphoric relation to be well-formed. Since reconstruction is standardly assumed to be a property of A’-dependencies (but cf. Burzio (1986)), this implies that the fronted DP is in an A’-position, i.e. Spec,CP according to Rouveret. Adapting Rouveret’s conclusion to the split-C system being assumed here, I conclude that in predicative sentences bod raises into the C-system with the fronted XP occupying the Specifier of the head occupied by bod. Since the fronted predicate has a focused interpretation, it is natural to suppose that this head position is Foc (see also Note 36).
So Rouveret’s conclusion supports the general claim about the syntax of bod being made here.

My final piece of motivation for the idea that bod raises to Fin comes from embedded clauses. Since bod is able to syncretise with Fin-material such as a/y, we expect not to find any root-embedded asymmetry involving bod if it is attracted to Fin in the manner described. This is true, but only up to a point. First, the fact that the full range of forms of bod is found in relatives and in embedded questions (both embedded contexts par excellence) indicates that the prediction is correct in a straightforward way:

(65) a. Dw i ddim yn siwr beth mae hi ’n moyn wneud.
        Am I Neg Pred sure what is she Asp want do.
        “I’m not sure what she wants to do.” (King (1993: 310))
     b. Dw i’n nabod rhywun sy ’n medru siarad Hen Saesneg.
        Am I-Asp know someone SYDD Asp can speak Old English
        “I know someone who can speak Old English.” (King (1993: 301))

Thus far, my account is straightforwardly supported. In terms of the account of root-embedded asymmetries sketched in the previous section, the difference between Welsh and the Germanic languages lies in the absence of the Force-Fin relation in embedded clauses. The complementisers that appear in Force (mai and the other elements shown in (55)) do not attract Fin; since they must be followed by a focussed constituent, it appears that they select for properties of Foc. In fact, mai-type particles act exactly like Wh-complements in Germanic: since they activate Foc, they cannot affect Fin. However, since embedded Fin is in general not required to move to Force in Welsh, this does not have the consequence that Fin is unable to trigger movement, and so bod moves to Fin. However, there is a class of embedded positive declaratives with extremely interesting properties. Exactly where the embedded clause contains one of the tenses of bod that is specific to this verb (the present or the imperfect), the non-finite form bod appears instead of the finite form:

(66) a. Mae’n deud bod nhw fan hyn.
        Is Asp say BOD they place this
        “He says that they’re here.” (King (1993: 304))
     b. Dywedodd hi bod y trên yn hwyr.
        Said she BOD the train in late
        “She said that the train was late.” (King (1993: 304))

Since the non-finite form of bod is the verbal noun, it can, like other verbal nouns, be associated with a preceding pronoun in a historically possessive form, be mutated and be followed by an echo pronoun (see Roberts & Shlonsky (1996) and Roberts (2005, Chapter Two, Section 1, Chapter Three, Section 1.3) on Welsh pronouns):

(67) Mae o ’n deud fy mod i’n dwp.
     Is he Asp say my BOD I Asp idiot
     “He says I am an idiot.”

With standard verbal nouns in periphrastic tenses, pronominal objects are marked by a preceding historically genitive pronoun (and can be followed by an echo pronoun); see Roberts (2005, Chapter Three, Section 2):

(68) Mae Megan wedi ei weld (o).
     Is Megan Asp his see (he)
     “Megan has seen him.”

In Roberts (2005, Chapter Three, Section 2), I present an analysis of verbal nouns which has the consequence that non-finite clauses “headed” by verbal nouns are PrtPs, so (68) shows a kind of “participle agreement”, as analysed in Kayne (1989), without movement of the non-finite verb (see Roberts (2005, Chapter Three, Section 2), for details). If this analysis of (68) is correct, then the same pronoun in (67) should be in an agreement position. Assuming that dweud always selects a finite CP, the only likely candidate is AgrS. If that is true, then bod is no higher than AgrS. Let us suppose that it is in T, with the non-enclitic agreement marker (fy) in AgrS.37 In that case, we have a root-embedded asymmetry involving bod: it fails to raise to Fin in these examples. The clauses are nevertheless interpreted as present or imperfect as the relevant features are present in the empty Fin which here is not targeted by bod. We can account for this root-embedded asymmetry in a manner consistent with our account for the Germanic asymmetries if we say that in precisely this case Force is selected by the matrix verb and selects the tense features of Fin (naturally, the relevant tense features are optionally selected, in that these clauses do not have to be present or imperfect). In terms of the table given in (50), present/imperfect Fin, if -selected, +declarative, attracts V. Only bod can raise to Fin, because only bod has the relevant morphological features. Where present/imperfect Fin is +selected and +declarative, it has no PF-realisation property and therefore blocks the raising of bod.
In this sense, Welsh is a V2 system just with this auxiliary and just in these tenses (cf. Rouveret (1996) for a very similar conclusion). Where Fin cannot attract bod, bod stays in T and cannot bear present or imperfect morphology because these morphosyntactic features are inherently associated with Fin. In this case, bod defaults to its citation form and AgrS is manifested as a proclitic (see Roberts (2005, Chapter Three Section 2.4) for the idea that the proclitics are the default manifestation of Agr in Welsh). This analysis shows that the Germanic root-embedded asymmetry in interrogatives cannot be attributed to the nature of the Q-feature, because in Welsh bod-clauses we have the same situation but with entirely different features. So we see that just where bod shows peculiar behaviour we can plausibly say that it does not raise to Fin. This is consistent with the idea that where bod does not show such peculiar behaviour it does move to Fin, and therefore indirectly supports the proposal that verbs other than bod, which entirely lack bod’s peculiarities, do not raise to Fin.

Other verbs and the other tenses of bod are associated with the complementiser y in these contexts (examples from King (1993: 306)):

(69) a. Dw i ’n meddwl y dylech chi ddeud wrtho fo.
        Am I Asp think Prt ought you say to-3sg he
        “I think you ought to tell him.”
     b. Mae’n sicr y byddai hynny’n beryglus dros ben.
        Is-Asp certain Prt would-be that-Pred dangerous extremely
        “It’s certain that that would be extremely dangerous.”

Since bod does not raise to Fin in the conditional, it is unaffected by being in an embedded clause. The same is true for other verbs. As we have already mentioned, y is probably in Fin. It is important to note that this element is absent in bod clauses of the type in (66) and (67); it seems that the realisation of Fin under declarative, selected Force is y in all tenses except the present and imperfect, and zero in the present and imperfect. In root declaratives, where Force is not selected, Fin is realised as fe/mi in all tenses except the present and imperfect, where it is realised by the relevant form of bod (see Roberts (2005, Chapter 4) for an analysis of this that relates it to V2; Rouveret (1996) also suggests that Welsh shows V2 effects just with bod). Focused clauses, both root and embedded, require Fin to be realised by particles as here it is Foc which determines the form of Fin, depending on the status of the focused XP. We see the behaviour illustrated in (66) and (67) only in positive declarative embedded clauses. If the clause is negative, there are two options. One possibility, characteristic of the spoken language, is to negate the bod-clause with (d)dim:

(70) Dan ni ’n gobeithio bod chi ddim yn siomedig.
     Are we Asp hope BOD you Neg Pred disappointed
     “We hope that you’re not disappointed.” (King (1993: 305))

The other possibility is to introduce the complement clause with a special negative morpheme nad, followed by the interrogative/focus form of bod:

(71) Dan ni ’n gobeithio nad ydach chi yn siomedig.
     Are we Asp hope NAD are you Pred disappointed
     “We hope that you’re not disappointed.” (King (1993: 305))

The natural analysis of this construction is to treat nad as being in Force, in other words to assimilate it to the mai-like elements, with bod in Fin. Support for this analysis comes from the fact that nad can introduce a negative embedded focussed clause:

(72) Mi wn i nad y dyn a ddaeth.
     Prt know I NAD the man Prt came
     “I know that it wasn’t the man who came.” (Tallerman (1996))

Here we see the pattern of a mai-clause, with nad in Force and a in Fin, parallel to examples like (55). Further support for both the idea that some tense-features are selected on Fin by a higher predicate and the idea that V (other than bod) do not raise to Fin comes from a restriction on embedded past-tense lexical verbs (in the literary variety, and even there there appears to be some variation among speakers regarding the judgements; see Tallerman (1998: 71–73) for discussion). Past-tense lexical verbs are not allowed in the complements of verbs which take a finite complement, although other tenses are allowed (recall that verbs other than bod do not have present or imperfect forms):

(73) a. *Meddyliodd Aled [ yr aeth Mair adre’ ].
        Thought Aled that went Mary home
     b. Meddyliodd Aled [ y byddai Mair yn mynd adre’ ].
        Thought Aled that would-be Mary Asp go home
        “Aled thought that Mary would go home.”

The ban on past-tense lexical verbs is a root-embedded asymmetry, as (74) shows:

(74) Aeth Mair adre’.
     Went Mary home
     “Mary went home.”

We can account for this if we say that selected Fin bears the [past] feature, i.e. some tense-features show up on Fin (as we saw above, [present] and [imperfect] do as well, although these are in any case restricted to bod). But V cannot move to Fin, and hence cannot be licensed as past (in main clauses, and generally in varieties where the restriction in (73a) does not hold, [past] is presumably realised on T). This correctly predicts that preterit bod can show up in contexts like (73a) (Maggie Tallerman (p.c.)). Instead of (73a), we have what appears to be an infinitive, marked with i (“for/to”) (here we follow the presentation in Tallerman; however, some native speakers reject (75) and (76) in favour of a bod complement to meddwl (“think”): Meddyliodd Aled bod Mair wedi mynd adre’ “Thought Aled that Mair Asp go home” (“Aled thought Mair had gone home”)):

(75) Meddyliodd Aled [ i Mair fynd adre’ ].
     Thought Aled to Mair go home
     “Aled thought Mair had gone home.”

However, Tallerman (1998) (following Harlow (1992)) shows that these clauses are in fact finite, because (i) they are in complementary distribution with past-tense verbs, as we have seen; (ii) they have a past-tense interpretation; (iii) they coordinate with finite clauses; (iv) they form a binding domain exactly in the way finite clauses generally do and thus unlike nonfinite clauses. Properties (iii) and (iv) are illustrated here:

(76) Meddyliodd Aled [ i Alys fynd adre’ ] ac [ y byddai Mair yn mynd yn fuan ].
     Thought Aled to Alys go home and that would-be M. Asp go soon
     “Aled thought that Alys had gone home and that Mair would be going soon.” (Tallerman (1998: 8))

(77) a. Dywedodd Aled_i [ iddo fo_i/j fynd ].
        Said Aled to-3sg he go
        “Aled said that he’d gone.”
     b. *Dywedodd Aled [ iddo ei hun fynd ].
        Said Aled to-3sg himself go
        “Aled said that himself had gone.”

Presumably, i represents a default form of Fin here (normally it marks infinitives, and as such corresponds either to English to or for; see Tallerman (1998) for extensive discussion). Note the agreement that shows up on this element, as on many “prepositional” elements in Welsh (see Roberts (2005, Chapter Two, 1.2)). The agreement morpheme could be in AgrS, giving the structure [Fin i ] [AgrS -ddo ]. See the discussion of AgrS as a syntactic affix in Roberts (2005, Chapter 2, Section 1).38

In this section, we have seen two more arguments that (main) verbs do not move to Fin (i.e. into the C-system) in main clauses. The first was based on a number of distributional and morphological peculiarities of bod, all of which can be accounted for by assuming that bod moves to Fin in the present and imperfect in matrix clauses. The second was the fact that verbs cannot be past in form in embedded finite clauses. An important assumption behind these arguments is that Welsh Fin has tense features associated with it. In fact, it appears that Welsh differs from English (and many other familiar languages) in that some parts of its tense system are realised on Fin and some parts on T. Because only bod can raise to Fin, only bod can be associated with the tenses that are realised exclusively there, i.e. [present] and [imperfect].
[Past] is realised on Fin in embedded clauses, and this blocks its realisation on T. Future and conditional are realised on T consistently.39 Other Celtic languages show similar phenomena: Irish complementisers show tense distinctions (e.g. go (non-past)/gur (past) “that”; an (non-past)/ar (past) “Q”, etc.) and, as mentioned in Section 2.1, independent forms of certain verbs in Irish and Scots Gaelic are in complementary distribution with certain complementisers. Most interestingly, as observed both by Cottell (1995) and McCloskey (1999), the complementisers in question are exactly the ones that can show tense-marking (so, for example, the Wh-particle aL does not show tense-marking and is not in complementary distribution with independent verb forms; see (34)). In a system like Rizzi’s, which explicitly allows for two positions in the clause where temporal information can appear, languages where the tense systems are split across the two positions are to be expected.
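The distribution of Fin realisations and tense features described in this section can be summarised as a small lookup, sketched here under simplifying assumptions. It covers only positive, non-focused declaratives; the function names `fin_realisation` and `tense_locus`, the tense labels, and the string "zero" for an unrealised Fin are hypothetical labels chosen for illustration. The embedded [past] case follows the literary pattern in (73) and (75) for lexical verbs and ignores the speaker variation and the behaviour of preterit bod noted in the text.

```python
# Toy summary of Section 2.3 for positive, non-focused declaratives.
# 'selected' means Force is selected by a matrix verb (embedded clause).
BOD_ONLY_TENSES = {"present", "imperfect"}   # realised exclusively on Fin
OTHER_TENSES = {"past", "future", "conditional"}


def _check(tense: str) -> None:
    if tense not in BOD_ONLY_TENSES | OTHER_TENSES:
        raise ValueError(f"unknown tense: {tense!r}")


def fin_realisation(tense: str, selected: bool) -> str:
    """Return what appears in Fin for a given tense and clause type."""
    _check(tense)
    if tense in BOD_ONLY_TENSES:
        # Embedded: Fin is empty and bod stays in T as a verbal noun, (66).
        # Root: the relevant form of bod raises to Fin.
        return "zero" if selected else "bod"
    if tense == "past" and selected:
        # Literary variety, lexical verbs: default Fin 'i', as in (75).
        return "i"
    # Otherwise: complementiser y when embedded, fe/mi in root clauses.
    return "y" if selected else "fe/mi"


def tense_locus(tense: str, selected: bool) -> str:
    """Return the head on which the tense feature is realised."""
    _check(tense)
    if tense in BOD_ONLY_TENSES:
        return "Fin"
    if tense == "past":
        return "Fin" if selected else "T"
    return "T"   # future and conditional


if __name__ == "__main__":
    for tense in ("present", "imperfect", "past", "future", "conditional"):
        for selected in (False, True):
            print(tense, selected, fin_realisation(tense, selected),
                  tense_locus(tense, selected))
```

Laying the cases out this way makes the split visible: only the rows where `tense_locus` returns "Fin" interact with verb movement, and only bod can reach that position.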

2.4 Conclusion

In this section, we have investigated the possibility that V raises into the C-system with the subject remaining the highest specifier in IP. We have seen a number of arguments against this analysis: the interaction of adverbs, particles and finite verbs, the behaviour of bod, and the restriction on embedded past-tense verbs. In the process, we have adopted Rizzi’s (1997) split-C system. Here is a summary of some of the claims we have made about the structure of the left periphery in Welsh, Irish and Germanic using Rizzi’s system:

(78) Languages: WELSH, IRISH, ENGLISH, GERMAN
     Force:      mai/ai/nad      that_bridge V      daß
     Spec, Foc:  fronted XP      NPI(?)             WH
     Foc:        (Adv)           (Adv)
     Fin:        a/y/fe/mi/bod   go                 that_non-bridge      t_daß

Additionally, both Welsh and Irish have tense/agreement forms which can only be realised on Fin, while Germanic languages do not have this (although Fin attracts the finite verb in unselected contexts in these languages, giving rise to the V-movement part of V2—see Roberts (2005, Chapter 4, Section 3)). I also suggested an account of the root-embedded asymmetry, which made it possible to see how this asymmetry applies to bod in certain contexts, in addition to having some generality for Germanic.

3. Conclusion

From the arguments given in Section 2, I conclude that V does not raise into the C-system in Welsh (or in Modern Irish), and so an analysis of the kind in (28) is to be rejected. Since we have seen that V and the subject leave VP (Section 1), the only alternative is that both elements occupy functional positions in between C and V. Let us look again at the schema in (27):

(27) [F1P F1[wD][sV] [F2P DP [F2′ F2[sD] ]]]

Adopting the clause structure proposed in Chomsky (1993, 1995, except for 4.10), the most likely hypothesis is:

(79) F1 = AgrS; F2 = T.

An alternative possibility might be:

(80) F1 = T; F2 = H lower than T, e.g. AgrO or v

We can dismiss (80) immediately on two grounds. First, we know that there is very little structural “space” between the position of the particles in Fin, the position of the verb and the position of the subject. The only things that can intervene anywhere in this complex are infixed pronouns (see 2.2). Placing the verb as low as T thus seems implausible; one would expect to find at least temporal adverbs in between particles and V. Second, if SpecvP is the position subjects are merged in, as Chomsky proposes and as assumed in Roberts (2005, Chapter Two, Section 3), and the arguments that we gave in Section 1 to the effect that the subject leaves its base position are valid, then this idea cannot be maintained. So, we have eliminated all plausible analyses of VSO clauses except for (79). My analysis of simple VSO sentences like those in (1a) in Welsh is thus as in (81):

(81) [FinP [Fin Mi] [AgrSP [AgrS welais] [TP [DP i] [T′ [T tv] [VP [V tv] [DP Megan]]]]]]

The structure of (1b) would be:40

(82) [FinP [Fin Mi] [AgrSP [AgrS wnes] [TP [DP i] [T′ [T tv] [VP [V weld] [DP Megan]]]]]]

Adopting (79) entails in terms of the checking theory of Chomsky (1995, Chapter 4) that AgrS has a weak D-feature and a strong V-feature. These are the parameter settings that give rise to VSO order: strong D-features in AgrS would give a French-like system (see Pollock (1989)); weak V-features would presumably give an SVO system in which AgrS is completely “deactivated” (this may be the situation in German, although the question of OV vs. VO underlying order is complex; cf. Haider (1997)). One could think that the subject raises to SpecTP purely to satisfy the Extended Projection Principle, which therefore applies in Celtic languages, pace Chomsky (1995), McCloskey (1996b) (but see below and Roberts (2005, Chapter Two, section 1)). If the Extended Projection Principle is to be reduced to a strong D-feature of T, as Chomsky (1995) proposes, then we are led to propose the following feature-values for functional heads in the Welsh I-system (and presumably, the same holds at least for Irish):

(83) a. AgrS has weak D-features.
     b. AgrS has strong V-features.
     c. T has strong D-features.

However, these proposals are problematic, despite the fact that they clearly account for the data seen in this chapter. First, AgrS is a rather peculiar agreement head in Welsh (and in Celtic VSO languages generally), owing to the anti-agreement effect. This term designates the fact that the subject fails to agree with the verb under various conditions (the phenomenon is also found in Breton (Stephens (1982)), Irish (McCloskey & Hale (1984)) and Classical Arabic (Mohammad (1988)); see also Roberts & Shlonsky (1996)). The standard anti-agreement effect is illustrated by examples like (84) for Welsh:

(84) a. Canon
        Sing-3pl
        “They sang”
     b. Canodd
        Sing-3sg
        “He/she sang”
     c. Canodd y plant.
        Sang-3sg the children (pl)
        “The children sang”
     d. *Canon y plant.
        Sang-3pl the children (pl)

The generalisation is that a plural (3rd-person) subject always appears with a singular verb, i.e. (number) agreement systematically fails. The plural form is ungrammatical with an overt subject, as (84d) shows. However, with a null subject, the plural form is allowed, and indeed required for the plural interpretation, as (84a) shows. In Irish, a default form is used in some tenses and persons with an overt postverbal subject (see McCloskey & Hale (1984) for detailed discussion and analysis); an overt subject is impossible with a non-default form:

(85) a. Chuirfeadh Eoghan/na léachtóiri isteach ar an phost sin.
        Would-put-3sg Owen/the teachers in for the job that
        “Owen/the teachers would apply for that job.”
     b. Chuirfinn (*mé) isteach ar an phost sin.
        Would-put-1sg I in for the job that
        “I would apply for that job.”

Clearly, this striking property of many VSO systems (including all the Celtic ones) needs to be accounted for. At first sight, it is not obvious how the idea that the verb raises to an agreement position can account for this. Second, the natural way to account for the apparently obligatory movement of the subject to SpecTP is in terms of the EPP, as we just mentioned. However, there are two constructions in Welsh which provide clear evidence that the EPP does not apply (at least in anything like the standard sense). These are impersonal passives and existential constructions:41

(86) a. Gwelwyd plant.
        See-PASS children.
        “Children were seen.”
     b. ?Mae yn yr ardd blant.
        Is in the garden children.
        “There are children in the garden.”

In these constructions, there is no subject in the specifier immediately subjacent to the position occupied by the finite verb. One might think that plant in (86a) is in subject position, but it can be shown that this is not so. Plant can be cliticised by an infixed pronoun, a possibility restricted to non-subjects, and the impersonal passive can appear in a periphrastic tense where the single argument follows the verbal noun and so must be the direct object. These phenomena are illustrated in (87a) and (87b), respectively:42

(87) a. Fe ’i gwelwyd.
        PRT-him see-PASS.
        “He was seen.”
     b. Yr ydys yn gweld plant.
        PRT is-PASS Asp see children.
        “Someone is seeing children.” (i.e. There is been seeing children) (Harlow (1989: 310))

I look more closely at passives like (87a) as part of our overall account of Case-licensing in Welsh in Roberts (2005, Chapter Two), where I draw a rather different conclusion regarding the status of infixed i (see in particular Roberts (2005, Section 3.4.2)). However, the point that the subject position is unfilled in (86a) holds. Examples like (86b) alternate with the following:

(88) a. Mae(’r) plant yn yr ardd.
        Is (the) children in the garden.
        “(The) children are in the garden.”
     b. Mae yna blant yn yr ardd.
        Is there children in the garden.
        “There are children in the garden.”

In (88a), plant appears to occupy the subject position and there is no definiteness effect. In (88b), yna (“there”) occupies subject position and there is a definiteness effect, as (89) illustrates:

(89) *Mae yna’r blant yn yr ardd.
     Is there the children in the garden.
     “There’s the children in the garden.”

See Rouveret (1996) and Roberts (2005, Chapter Two, Section 3.4.4), for more discussion. The relevant point for present purposes is that there is no absolute requirement for a subject to appear in SpecTP. Of course, one could always propose that SpecTP is occupied by expletive pro in these cases. However, this seems undesirable in the context of a theory where elements can only be licensed at the interfaces. Expletive pro has no PF property, since it is an empty category, and it has no LF property, since it is an expletive. Therefore it has no interface property, and its postulation should be avoided; see Borer (1986), Alexiadou & Anagnostopoulou (1998) for a proposal for constructions formerly regarded as containing this element in null-subject languages like Italian and Modern Greek. There is also an empirical argument that expletive pro is not present in impersonal passives like (86a). As discussed in detail in Roberts (2005, Chapter Two, Section 3), soft mutation applies in Welsh wherever a category is immediately preceded by an XP that c-commands it (cf. Borsley & Tallerman (1998), Borsley (1999)). The direct object in (86a) is not mutated, which indicates that there is no immediately c-commanding XP, and so no expletive pro in subject position. In this respect, (86a) contrasts with sentences with an argumental null subject. In such sentences, soft mutation on the direct object is triggered:

(90) Mi welais pro blant.
     Prt saw-1sg children
     “I saw children.”

Here the initial /p/ of plant is mutated to /b/ (see Roberts (2005, Chapter 2, Section 3), for a detailed discussion of the phonology and syntax of soft mutation). The contrast between (90) and (86a) suggests that there is a null subject in (90), following Borsley & Tallerman’s generalisation, but that there isn’t one in (86a). Hence the EPP does not hold (or does not hold of SpecTP) in Welsh (this conclusion was also reached by Harlow (1989), on the basis of essentially the same data). So we conclude that, however appealing the scenario in (83) might be, the fact that the agreement position is not always associated with actual agreement and the fact that SpecTP does not obey the EPP are good reasons to look further. Moreover, recall that we began by pointing out that there are two essential desiderata of parameters: they must be typologisable and learnable. That is, they must both give rise to interesting cross-linguistic generalisations and help explain known cross-linguistic generalisations, and it must be that we can propose plausible, simple trigger experience for fixing their values. In Roberts (2005), I scrutinise the parameter values in (83) in the light of the criteria of typologisability and learnability. Unsurprisingly, given the problems just raised, this leads to a reformulation.
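For reference, the way the feature values in (83) are meant to derive surface order under the checking theory can be sketched as a toy calculation. This is a deliberately crude illustration, not part of the analysis: the names `landing_sites` and `surface_order` are hypothetical, the linear order of positions is read off the structure in (81), the object is assumed always to follow the verb, and the verb is assumed to reach at least T when the V-feature of AgrS is weak.

```python
# Toy derivation of subject/verb order from the feature strengths in (83).
# Positions are listed left to right, following the structure in (81).
POSITIONS = ["SpecAgrSP", "AgrS", "SpecTP", "T", "SpecvP"]


def landing_sites(agrs_d_strong: bool, agrs_v_strong: bool,
                  t_d_strong: bool) -> tuple[str, str]:
    """Return (subject position, verb position) for given feature strengths."""
    if agrs_d_strong:
        subject = "SpecAgrSP"      # strong D on AgrS attracts the subject
    elif t_d_strong:
        subject = "SpecTP"         # strong D on T (EPP reduced to a D-feature)
    else:
        subject = "SpecvP"         # subject stays where it is merged
    # Assumption: with a weak V-feature on AgrS the verb stops in T.
    verb = "AgrS" if agrs_v_strong else "T"
    return subject, verb


def surface_order(agrs_d_strong: bool, agrs_v_strong: bool,
                  t_d_strong: bool) -> str:
    """Return 'SVO' or 'VSO', assuming the object always follows the verb."""
    subject, verb = landing_sites(agrs_d_strong, agrs_v_strong, t_d_strong)
    subject_first = POSITIONS.index(subject) < POSITIONS.index(verb)
    return "SVO" if subject_first else "VSO"


if __name__ == "__main__":
    # The Welsh settings in (83): weak D and strong V on AgrS, strong D on T.
    print(surface_order(agrs_d_strong=False, agrs_v_strong=True, t_d_strong=True))
```

With the settings in (83) the verb lands in AgrS above the subject in SpecTP; making the D-feature of AgrS strong, or its V-feature weak, puts the subject first, as described in the discussion of (83). The sketch says nothing about anti-agreement or the EPP facts, which are the reasons given above for reformulating these settings.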

Notes

1 In fact, the generalisation is that any tense can be expressed periphrastically, but while no tense has only a synthetic form, some have both periphrastic and synthetic forms. King (1993:137) gives the following list of the tenses of the verb prynu (“buy”) in modern spoken Welsh:

Present              mae e’n prynu                        “he buys/is buying”
Imperfect            roedd e’n prynu                      “he was buying”
Perfect              mae e wedi prynu                     “he has bought”
Pluperfect           roedd e wedi prynu                   “he had bought”
Preterite            brynodd e                            “he bought” (. . .)
                     naeth e brynu
Future               brynith e                            “he will buy”
                     bydd e’n prynu
                     neith e brynu
Future perfect       bydd e wedi prynu                    “he will have bought”
Conditional          basai fe’n prynu (prynai fe)         “he would buy”
Conditional perfect  basai fe wedi prynu                  “he would have bought”

Here, (f)e is the 3sg masculine pronoun (the occurrence of initial f- is phonologically conditioned; this pronoun shows up as (f)o in many Northern varieties, and in some of the examples here). The forms mae, oedd, bydd and basai are the present, imperfect, future and conditional forms of the auxiliary bod (“be”). The forms naeth and neith are the preterite and future respectively of gwneud (“do”). Note that there is no “have” auxiliary; I discuss this point in Roberts (2005, Chapter Three, Section 3). I will leave aside the question of the nature of the aspectual particles (y)n and wedi here, simply assuming that they instantiate an aspectual functional head (see again Roberts (2005, Chapter Three, Section 2))).

I analyse the alternation in the initial consonant (soft mutation of /p/ to /b/) in Roberts (2005, Chapter Two, Section 2). Various tenses which are more characteristic of the literary language are not included here: the pluperfect and the subjunctive. The future and conditional are historically present and imperfect, respectively, and are often described as such in traditional grammars.
2 Welsh is a typologically very regular language. It is VO/Pr/NA/NG, i.e. it conforms to a general “head-initial” pattern. Welsh conforms to every one of Greenberg’s (1963) universals concerning VSO languages: namely, Universals 1, 2, 3, 6, 8, 9, 10, 12, 16, 17, 19, 21, 22. Following the criterion of typologisability, we would like whatever parameter accounts for VSO to account for the other properties too. See Roberts (2005, Chapter 3).
3 See also Alexiadou & Anagnostopoulou (1998), Chomsky (2000, 2001), Platzack (1998) and the papers in Svenonius (2002) for relevant considerations regarding the nature and functioning of the EPP.
4 Chomsky (1995: 329f.) excludes VP-adjoined adverbs. However, under the specific assumptions about clause structure made there, the subject is merged outside the core VP in the specifier of a higher phrase, vP. Adverb-adjunction to vP is not excluded in this system (and there is no strong motivation for such exclusion); in fact, this is arguably the position of often in (i):

(i) John often reads books.

Hence, the argument in the text goes through: if the subject were not raised from its merged position and V is raised to T, then we expect at least some adverbs to intervene between V and the subject, viz. adverbs adjoined to vP. V must leave vP in order to derive VSO order, if the subject is merged at Spec, vP. In Roberts (2005, Chapter Two, Section 2), I will adopt the idea that the subject is merged in Spec, vP (this is further refined in Roberts (2005, Chapter Three, Section 2)).
5 There is evidence that Welsh negation follows the diachronic pattern noticed by Jespersen (1917:195-6). Jespersen observed that English negation developed from a preverbal clitic negation in Old English (ic ne secge), to a French-style “double” negation in Middle English (I ne seye not) to a postverbal-only negation in Early Modern English (I say not) (the development is then complicated by the introduction of obligatory do-insertion, which is not relevant here). Different registers of Modern Welsh show all three patterns of negation, although it is a reasonable conjecture that the registers correlate with diachronic stages (see Willis (1997)). Literary Welsh has a preverbal negation ni before a verb beginning with a consonant, nid before a verb beginning with a vowel:
(i) Ni redodd Siôn i ffwrdd.
Neg ran John to far
“John didn’t run away.” (Rouveret (1994:127))

A “double-negation” pattern is also possible, with (d)dim (“at all/anything”) (the initial d- of the auxiliary is the survival of nid in the spoken language—I’ll say more about forms like dyw in Section 2.3 below):

(ii) Dyw Ffred ddim fan hyn. Neg-is Fred neg place this “Fred isn’t here.”

J. Morris-Jones (1913:314) remarks that “This adverbial ddim is nearly as frequent in the spoken lang. as pas after a neg. in French”. In sentences without an auxiliary, the only visible form of negation in modern spoken Welsh is (d)dim:

(iii) Chafodd Ffred ddim gwobr.
Got Fred neg prize
“Fred didn’t get a prize.”

Although a number of issues are unclear, the basic development appears to parallel that of English. French too shows the same development, viz. from ne V in Old/Middle French to ne V pas in Modern Literary French to V pas in the contemporary spoken language. Borsley & R. Morris-Jones (2000:20ff.) distinguish quantifier dim from adverbial dim. Quantifier dim shares the distribution of non-negative quantifiers like peth or rhai (both “some”), in appearing as a pronominal modifier or preceding partitive o (“of”):

(iv) a. Does [dim dyn/dim o’r dynion] yn yr ystafell.
Is no man/no of-the men in the room
“No man/none of the men are in the room”.
b. Welish i [ddim dyn/ddim o’r dynion].
Saw I no man/none of the men.
“I saw no man/none of the men.”
c. Mae [rhai dynion/rhai o’r dynion] yn yr ystafell.
Is some men/some of the men in the room.
“Some men/some of the men are in the room.”
d. Welish i [rai dynion/rai o’r dynion].
Saw I some men/some of the men.
“I saw some men/some of the men.”

It seems clear then that quantifier dim is a DP-internal element, as the bracketing in the above examples indicates. Adverbial dim is in complementary distribution with other sentential negators such as byth (“never”), and can appear as part of a sequence of adverbs:

(v) a. *Dw i byth ddim yn yfed cwrw.
Am I ever not in drink beer.
(Borsley & R. Morris-Jones’ (63), p. 29)
b. Dydy’r ceffyl ddim bob tro yn y cae.
Is-the horse not each turn in the field.
“The horse isn’t always in the field.” (Borsley & R. Morris-Jones’ (57), p. 27)

This is the element which is comparable to French (sentential) pas, and which we might thus think of as occupying SpecNegP. See below. See also Note 6 on a peculiar restriction on the distribution of adverbial dim.
6 Borsley & R. Morris-Jones (2000) point out that adverbial dim cannot cooccur with a nominal object:

(i) *Wela’ i ddim Gwen eto. Will-see I not Gwen again. “I will not see Gwen again.”

Although a surprising restriction, and one for which we have no immediate account, it does not affect the point being made in the text: for this point to hold it is sufficient for adverbial dim to occupy a position which is structurally higher than the position into which the subject is merged. If the subject is merged in SpecVP and dim can precede VP-adverbs, as in (vb) of the previous footnote, then this is clear. Cf. also Note 22 of Roberts (2005, Chapter Two).

7 Subjects can also appear in front of ne:
(i) a. Ne goll ket Yann e hent ar c’hoad.
Neg lose neg Yann his way in-the wood.
“Yann does not lose his way in the wood.”
b. Yann ne goll ket e hent ar c’hoad.
Yann neg lose neg his way in-the wood.
“Yann does not lose his way in the wood.” (Stephens (1982:128))

This is an instance of a general rule of XP-fronting to a position in front of the preverbal particle (ne in a negative clause). Other XPs can be fronted to this position:

(ii) a. [Al levr nevez] ne lenn ket ar vugale.
The book new neg read neg the children.
“The children didn’t read the new book.” (Object-preposing: Stephens (1982:256))
b. [Debrin krampouezh ed-du] ne ra ket Yann.
eat pancakes buckwheat neg does neg Yann
“Yann does not eat buckwheat pancakes.” (VP-preposing: Stephens (1982:105))

It seems clear that the order in (ib) involves an A’-movement rule, and so is not relevant for determining the “neutral order” of constituents. In this matter, I agree with Borsley & Stephens (1989) and not with Stump (1984). See Roberts (2005, Chapter 4, Section 2).
8 Stump (1984) gives the following example of a pronominal subject between the verb and ket:
(i) Ne gouskont-int ket.
Neg sleep -they neg.
“They do not sleep.”

This example, and the corresponding one where the pronoun follows ket, are of unclear grammaticality (see in particular Borsley & Stephens (1989:414), Schafer (1994:38, n. 19), Stump (1989:435-437)). It may be that in at least some dialects of Breton, pronominal subjects can or must raise further than nominal subjects; this point is not central to the text discussion, however, and so I leave it aside.
9 Things are apparently more complex in Breton, at least according to Schafer (1994). Schafer observes that there is an object-shift rule. This operation places the object pronoun in front of the subject and, in a negative clause, after ket:
(i) a. Ne wel ket anezhañ Maia.
Neg see neg it (a-pronoun) Maia.
“Maia doesn’t see it.” (Schafer (1994: 38))
b. Breman e wel anezhañ Maia.
Now Prt see it Maia.
“Now Maia sees it.” (Schafer (1994: 37))

Schafer (1994:40) suggests that Neg is situated above AgrOP, and that object shift places the object pronoun in SpecAgrOP. She also argues against a rightward-subject-movement analysis of (ia,b). Whatever the correct analysis of object shift, this construction appears to confirm that the subject occupies a “low” position in this language.

10 Rouveret (1994:140) suggests that in more literary varieties of Welsh where the principal negator is ni(d) and (d)dim is not required, there may be a similar variation in the position of the subject. Without the diagnostic afforded by the medial negation, this variation is hidden, he suggests. This conclusion is not readily compatible with minimalist assumptions, which entail that the convergence of a derivation without subject raising over the position of medial negation renders ungrammatical a derivation with such movement. In the absence of direct evidence in favour of two distinct positions for definite subjects in varieties of Welsh other than the Pembrokeshire dialect, I will assume that there is a single (VP-external) one.
11 (13a) is an example of the “perfective passive”. For reasons which we don’t need to go into, this passive construction is always perfective in interpretation. Irish, like Welsh, also has a synthetic impersonal passive. However, in both languages, there is some reason to think that there is no movement of the logical object in these cases; see Stenson (1981) on the Irish construction and Comrie (1977) on the Welsh one. See also Section 3.4.2 of Chapter Two. See McCloskey (1996b) for arguments that the object is raised to subject position in (13b). (13c) involves raising of a null subject, as the 2sg agreement on the matrix predicate shows. The 2sg argument is clearly the logical subject of the lower predicate.
12 For more on the cael-passive, see Note 11 of Roberts (2005), Chapter Three.
13 Here we have definite subjects. Indefinites are possible in (18b) because the verb is unaccusative, as we have just seen.
14 For a general alternative to Huang’s account, see Heycock (1995).
15 Interestingly, the same judgements carry over to Breton, according to Schafer (1994:90). If this is so, then we have a further indication that, despite the fact that they must follow ket, subjects are raised out of VP in this language. If the subject is generated in Spec,vP, as I assume in Roberts (2005, Chapter Two, Section 2), then this result implies that what is fronted here is at least vP and that the subject moves out of that category.
16 The subject can always be optionally realised as an Accusative DP in nonfinite clauses in Irish. See Chung & McCloskey (1987:211), Bobaljik & Carnie (1996:238) and the references given there.
17 For further discussion of this issue, and a summary of a number of analyses of SOV order in non-finite clauses in Irish, see Carnie (1995: 81-118). For my purposes here, it is enough to show how these orders argue for the idea that the subject leaves VP in Irish.
18 In English the movement is restricted to auxiliaries, owing to the interaction of the absence of main verb movement to T and the Head Movement Constraint; see Pollock (1989). In French, the subject must be a clitic pronoun when V moves to C; see Kayne (1972, 1983), Rizzi & Roberts (1989 [this volume, Chapter 9]), Roberts (1993), Sportiche (1998).
19 Many non-standard varieties of English allow residual verb second in indirect questions. The phenomenon has been carefully studied in Hiberno-English by McCloskey (1992). In this variety, examples like the following are fully acceptable:
(i) a. Ask your father does he want his dinner.
b. “Would a woman of this area dress herself like that?” “I don’t know would she.” (McCloskey (1992:15))

McCloskey notes that embedded residual verb second is only possible with verbs whose complement has a truly interrogative interpretation (verbs which introduce “true questions” in the terminology of Suñer (1993)). Verbs which take Wh-complements with a “semiquestion” interpretation do not allow embedded residual verb second, e.g.:
(ii) a. *It was amazing who did they invite.
b. *The police couldn’t establish who had they beaten up. (McCloskey (1992:16))

It seems clear that Q and Wh-features should be distinguished on the basis of data like this. See McCloskey (1992) for further discussion and analysis.
20 There are languages where it seems that the root-embedded asymmetry of verb second does not hold. The two best-known cases are Icelandic and Yiddish: because they lack the usual asymmetry, these languages are often referred to as symmetric V2 languages. Verb second appears to be generalized to all types of embedded clauses in Icelandic. This order is optional in relative clauses and adverbial clauses, and impossible in topicalised object clauses (cf. Sigurðsson (1989:44f.)). The other Germanic language which has been claimed to allow generalized embedded topicalization is Yiddish (cf. Diesing (1988, 1990), Santorini (1990, 1994)). The following examples should be contrasted with those in (27):
(i) a. Ég spurði hvort þegar hefði María lesið þessa bók. (Icelandic)
I asked whether already had Mary read this book.
‘I asked whether Mary had already read this book.’ (Rögnvaldsson & Thráinsson (1990))

b. Ikh veys nit far vos in tsimer iz di ku geshtanen. (Yiddish) I know not for what in room is the cow stood. ‘I don’t know why the cow has stood in the room.’ (Vikner (1995))

Various analyses of these phenomena have been proposed; for a summary, see Vikner (1995). See below for a suggestion as to what might be the situation in these languages, which does not extend to the differences noted by Sigurðsson.
21 Assuming that the initial element is always in C, this is one way of accounting for what is traditionally known as Bergin’s Law, as Carnie, Pyatt & Harley (ibid) point out. Bergin’s Law states that non-initial verbs are dependent (Bergin (1938:197), cited in Doherty (2000:6)); on the analysis sketched in the text, non-initial verbs will not be in C and thus will have the dependent form. See Doherty (2000:28-31) for some counterexamples to this, which, as he argues, probably represent a change in progress in the Old Irish period. The suggestion that absolute forms are the morphological realisation of C-features is supported by the fact that there is a third set of verb-paradigms that is found in relative clauses. These forms can be viewed as morphological reflexes of a different feature associated with relative C; the question of these different verbal paradigms in Old Irish comes up again in Chapter Four, Section 2 of Roberts (2005).
22 Rizzi proposes that complementisers may vary in their position in the articulated structure of Comp. In particular, he argues that that is in Force while for is in Fin, on the basis of contrasts like the following:

(i) Yesterday I said that, tomorrow, John will leave. (ii) *Yesterday we preferred for, tomorrow, John to leave.

The adjacency requirement that holds between for and the lower subject that it Case-licenses can be accounted for if there is no available adjunction site for adverbs between Fin and IP. (This can be achieved by assuming that there are no adjunction sites at all, as Rizzi proposes). Note the analogy with the way in which Pollock’s system derives the observed adjacency requirement that holds between verbs and their direct objects in English. This system predicts that the reverse order of for and the adverb will be grammatical:

(iii) *Yesterday we preferred tomorrow for John to leave.

As (iii) shows, this is not so. Rizzi (1997:330) suggests that for may be “syncretic” for Fin and Force. He also points out that the Italian prepositional complementiser di allows the analogous example:

(iv) Penso, a Gianni, di dovergli parlare.
I-think, to John, to have-to-to-him speak.
“I think, to John, to have to speak to.”
(v) *Penso di, a Gianni, dovergli parlare.
I-think to, to Gianni, have-to-to-him speak.

The proposal in the text is that Irish go is a finite version of di, at least as far as its position is concerned.
23 McCloskey (1996a) shows that an adverb can intervene between the fronted negative constituent and (the relevant form of) ní, which in the terms adopted here shows that the fronted negative constituent must at least be higher up than SpecFin.
24 It may seem uneconomical to posit two differences between Irish and English. However, Welsh supports the analysis being presented in the text as there is reason to think that Welsh is like Irish regarding property (a) (see below for examples of C-elements which do not raise from Fin to Force) but unlike Irish in that negation is not located in C. The evidence for the latter assertion comes from the following paradigm, given by Borsley & R. Morris-Jones (2000:19-20):
(i) Dw i ddim wedi gweld unrhyw un.
Am I neg after see anyone
“I haven’t seen anybody” (Borsley & Morris-Jones’ (15))
(ii) *Does unrhyw un yn yr ystafell.
Is anyone in the room
(Borsley & Morris-Jones’ (17))

The pattern is similar to the familiar one with any in English (which is not to imply that Welsh is in general like English as regards NPIs and n-words; see Borsley & R. Morris-Jones (2000) for further discussion).
25 Many authors have proposed that there is covert I-to-C raising universally (e.g. Stowell (1981), Pesetsky (1982) and den Besten (1983) among others). Roberts & Roussou (2002) propose that there is universally a chain between T and Fin. We must assume that this chain does not go further into the C-system than Fin.
26 Roberts also assumes that the presence/absence of a PF-realisation requirement is the only form of parametric variation. Hence there is no possibility of, for example, Fin and Force being ordered differently in different languages (in fact, to my knowledge, all versions of minimalism since Chomsky (1993) have assumed that functional heads are invariant in order).
27 This account also leaves open the possibility that there may be languages in which all predicates are like English bridge verbs in directly selecting complementisers merged in Force, but like Irish in that Force does not trigger movement of Fin. If such a language also has the Fin* property, V+T should be able to freely raise to Fin, giving rise to the absence of root-embedded asymmetries. This may well be the situation in the “symmetric” V2 languages Yiddish and Icelandic; see Note 20.

28 If Force is non-interrogative Foc may still be +WH. This can account for semiquestions and exclamatives like:

(i) It’s amazing how many people were there. (ii) We found out who did it.

See Note 19 and McCloskey (1992).
29 Following Roberts (2001:103), Foc* triggers movement to both specifier and head since the head which moves there has no feature which is capable of licensing the content of Foc, although it can morphologically realize it.
30 Welsh is often described, e.g. by Thomas (1982:213), as having modal auxiliaries, e.g. gallu, medru (both “can”), dylwn (“should/ought”). These verbs are rather like English modals in showing some temporal restrictions compared to regular verbs, in not having direct objects and, in the case of dylwn, in resisting non-finite forms. They are probably best analysed as defective main verbs. They are quite unlike bod, whose principal trait is that it possesses more forms than other verbs, not fewer.
31 Willis (1998:130ff.) shows that, despite the general obligatory XP-fronting in main clauses in Middle Welsh (i.e. the V2 nature of the language at this period), bot (the Middle Welsh equivalent of bod) systematically blocks XP-fronting. This fact must be connected to the idiosyncrasies of contemporary bod discussed here, but it is very hard to see how.
32 King (1993:138) notes that mi can marginally occur as a mild intensifier before present and imperfect forms of bod:
(i) Mi rydw i’n mynd.
Prt am I Asp go.
“I am going.” (King’s translation)

In terms of the account to be given in the text, we might think that in these cases mi can occur in Force, indicating strong assertion. On the other hand, mi here might be linked to its origin as a 1sg pronoun (see Willis (1998:225f., 227f.) on this development). An anonymous reviewer points out that sequences such as mi oedd are found in dialects, a point which clearly requires further analysis. Note that (i) was rejected by speakers of Southern Welsh I tested it with.
33 In the literary language, interrogative a precedes interrogative forms of the copula:
(i) A ydych chi’n mynd?
Prt are you-Asp go?
“Are you going?”
I return to this point in Roberts (2005, Chapter Four, Section 1).
34 Y is often written before mae in examples like (60c). However, we continue to regard this element on a par with the r- prefix on the affirmative imperfect forms as part of bod in these cases. On this view, it is restricted to occurring before mae because the two elements are really a single form ymae. It is tempting to place the y that cooccurs with present and imperfect bod in Force. However, examples like (i) show that this is not possible:
(i) Mi wn i mai yn yr ardd y mae Hefin.
Prt know I that in the garden Y is Hefin
“I know that Hefin is in the garden.” (Tallerman (1996:115))

As we saw above, mai is in Force here, and so y cannot be. Moreover it follows the focused PP yn yr ardd, and so is most likely in Fin with mae.

35 Rouveret actually handles this differently, as his data is taken from the literary language. As we have mentioned (see Notes 33 and 34), certain particles do appear with bod in this variety. They are however impossible in identificational sentences like those under consideration here, and this is Rouveret’s point. In our terms, the issue becomes a question of the absence of the prefix r- on the imperfect forms of bod in this context. Presumably, like a and y, bod does not raise from Fin to Foc in focussed sentences (the fact that it appears in Fin is enough to guarantee complementary distribution with focus particles, since the focus particles are in Fin as argued in the previous section, and to account for the extra tenses) but does in identificational copular clauses; appearance in Foc is incompatible with initial r-; cf. wh-questions where there is no r- and bod is plausibly in Foc.
36 The analysis of subject positions given in Roberts (2005, Chapter Two, Section 1), would have to treat Siôn as occupying a position lower than SpecAgrSP in (62a). This does not affect the point that Rouveret’s argument establishes here, though.
37 This implies that in these clauses the subject is in a position lower than T. In fact, this would only apply to echo pronouns, as full DPs are not allowed where an ei-pronoun precedes bod. I will not go into the implications of this here.
38 Interestingly, bod can show up in the imperfect exactly where there is extraction of the subject (see Willis 2000:554–555 for discussion and analysis):
(i) Pa lyfrau wyt ti ’n meddwl oedd yn addas?
Which books are you-Asp think was Pred suitable
‘Which books do you think were suitable?’ (Willis 2000:554 [53])
(ii) ??Rwy ’n meddwl oedd y hen lyfrau yn addas.
Am.I-Asp think was the old books Pred suitable
‘I think that the old books were suitable.’ (Willis 2000:555 [56b])

This contrast shows that oedd does not raise where extraction takes place. In this sense, the ungrammaticality of (ii) is parallel to (iii) (although the judgement does not seem to be as strong, as (ii) is seen as marginal rather than impossible):

(iii) a. *Who did leave?
b. *Que sent bon?
What smells nice?
(see Friedemann 1990)
For a proposal regarding these examples, see Roberts (2005, 5.1).
39 In this context, it is striking to observe that Cinque (1999) postulates two T-positions, one for Past and one for Future, and that T(Past) is structurally higher than T(Future).
40 Here I’m assuming that the dummy verb gwneud (“do”) is generated in T and raised to AgrS, like its English counterpart (pace Rouveret (1994, Chapter One)). I’m also assuming that verbal noun gweld (mutated here, hence the absence of initial /g/; see Table One of Roberts (2005, Chapter Two, Section 2.1)) is a non-finite verb form, following Borsley (1996), and pace Rouveret (1994). In Roberts (2005, Chapter Three, Section 2), I summarise Borsley’s arguments. In that section, I will propose a more articulated structure corresponding to the VP in (82), one which comes closer to Rouveret’s (1994) proposal.

41 It is natural to think that the PP yn yr ardd satisfies the EPP in (86b), but the marginal possibility of (i) shows that this is not right:
(i) Mae ’na yn yr ardd blant.
Is there in the garden children
“There are children in the garden.”

Judgements are divided on (i): some informants prefer it to (86b), while others reject it entirely. This gives some reason at least to doubt that the PP yn yr ardd satisfies the EPP in (86b). See also Roberts (2005, Chapter Two, Section 3.4.4).
42 These observations led Comrie (1977) to conclude that this construction violates the Motivated Chomage Law of Relational Grammar, in that it appears to be a case of “spontaneous demotion”. In this, it resembles Cinque’s (1988) non-argumental si of (i):
(i) Si mangia gli spaghetti.
SI eats the spaghetti
“People eat spaghetti.”

It is likely therefore that the passive construction shown in (87) is an impersonal, rather than a passive (see Roberts (1987), Blevins (2003) on this distinction). In Roberts (2005, Chapter Three, section 2.2), I suggest that Welsh does not have voice morphology.

References

Acquaviva, P. (1996) Negation in Irish and the representation of monotone decreasing quantifiers. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, 284–313.
Adger, D. (1996) Aspect, Agreement and Measure Phrases in Scottish Gaelic. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, 200–223.
Alexiadou, A. & E. Anagnostopoulou (1998) Parametrising Agr: Word-order, V-movement and EPP-checking. Natural Language and Linguistic Theory 16(3).
Awbery, G. (1976) The Syntax of Welsh. A Transformational Study of the Passive. Cambridge: Cambridge University Press.
Awbery, G. (1990) Dialect syntax: A neglected resource for Welsh. In R. Hendrick (ed) Syntax and Semantics XXIII: The Syntax of the Modern Celtic Languages. San Diego: Academic Press, 1–25.
Barss, A. (1986) Chains and Anaphoric Dependence. PhD Dissertation, MIT.
Bergin, O. (1938) On the syntax of the verb in Old Irish. Ériu 12:197–214.
den Besten, H. (1983) On the interaction of root transformations and lexical deletive rules. In W. Abraham (ed) On the Formal Syntax of the Westgermania. Amsterdam: John Benjamins, 47–138.
Blevins, James P. (2003) Passives and impersonals. Journal of Linguistics 39(3):473–520.
Bobaljik, J. & A. Carnie (1996) A minimalist approach to some problems of Irish word order. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, 223–240.
Borer, H. (1986) I-subjects. Linguistic Inquiry 17:375–416.
Borsley, Robert D. (1996) On a nominal analysis of Welsh verb-nouns. In Ahlqvist, A., & V. Čapková (eds) Dán do oide: Essays in memory of Conn R. Ó Cléirigh. Dublin: Institiúid Teangeolaíochta Éireann, 39–47.
Borsley, Robert D. (1999) Mutation and constituent structure in Welsh. Lingua 109:267–300.

Borsley, Robert D. & R. Morris Jones (2000) The syntax of Welsh negation. Transactions of the Philological Society 98(1), 15–47. Rowlett, P. (ed) Papers from the Salford Negation Conference.
Borsley, Robert D. & I. Roberts (1996) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press.
Borsley, Robert D. & J. Stephens (1989) Agreement and the position of subjects in Breton. Natural Language and Linguistic Theory 7:407–427.
Borsley, Robert D. & M. Tallerman (1998) Phrases and soft mutation in Welsh. Journal of Celtic Linguistics 5:1–33.
Burzio, L. (1986) Italian Syntax: A Government-Binding Approach. Dordrecht: Kluwer.
Calder, G. (1990) A Scots Gaelic Grammar. Glasgow: Gairm Publications.
Cardinaletti, Anna & Ian Roberts (2002) Clause structure and X-second. In G. Cinque (ed) Functional Structure in DP and IP: The Cartography of Syntactic Structure Volume One. New York/Oxford: Oxford University Press, pp. 123–166 [this volume, Chapter 12].
Carnie, A. (1995) Non-verbal predication and head movement. PhD Dissertation, MIT.
Carnie, A., E. Pyatt & H. Harley (2000) VSO order as raising out of IP? Some evidence from Old Irish. In A. Carnie and E. Guilfoyle (eds) The Syntax of Verb Initial Languages. Oxford/New York: Oxford University Press, 39–60.
Chomsky, N. (1965) Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, Mass.: MIT Press.
Chomsky, N. (1993) A Minimalist Program for Linguistic Theory. In K. Hale and S.J. Keyser (eds) The View from Building 20. Cambridge, Mass.: MIT Press, 1–52.
Chomsky, N. (1995) The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, N. (2000) Minimalist Inquiries: The Framework. In R. Martin, D. Michaels and J. Uriagereka (eds) Step by Step: Essays in Honor of Howard Lasnik. Cambridge, Mass.: MIT Press, 89–155.
Chomsky, N. (2001) Derivation by Phase. In M. Kenstowicz (ed) Ken Hale: A Life in Language. Cambridge, Mass.: MIT Press, 1–52.
Chung, S. & J. McCloskey (1987) Government, barriers and small clauses in Modern Irish. Linguistic Inquiry 18:173–238.
Cinque, G. (1999) Adverbs and Functional Projections. Oxford/New York: Oxford University Press.
Comrie, Bernard (1977) In defense of spontaneous demotion. In P. Cole & J. Sadock (eds) Syntax and Semantics 8: Grammatical Relations. New York: Academic Press, 25–55.
Cottell, S. (1995) The representation of tense in Modern Irish. GenGenP 3:105–124.
Déprez, V. & K. Hale (1986) Resumptive pronouns in Irish. Proceedings of the Harvard Celtic Colloquium 5:38–48.
Diesing, M. (1988) Bare plurals and the stage/individual contrast. In M. Krifka (ed) Genericity in Natural Language: Proceedings of the 1988 Tübingen Conference. SNS-Bericht 88-42. Seminar für Natürlich-Sprachliche Systeme, Universität Tübingen, pp. 107–154.
Diesing, M. (1990) Verb movement and the subject position in Yiddish. Natural Language and Linguistic Theory 8:41–79.
Doherty, C. (1996) Clause structure and the Modern Irish Copula. Natural Language and Linguistic Theory 14:1–46.

The Analysis of VSO Clauses 469 Doherty, C. (1998) The Syntax of Old Irish Clause Structure. Ms. University College Dublin. Doherty, C. (1999) Tmesis and verb second in Early Irish syntax. Annual Meeting of the Berkeley Linguistics Society 25:98–108 Doherty, C. (2000) Residual verb second in Early Irish: On the nature of Bergin’s construction. Diachronica 17(1):5–38. Emonds, J. (1980) Word order in generative grammar. Journal of Linguistic Research 1:33–54. Ferraresi, G. (1997) Word Order and Phrase Structure in Gothic, PhD Dissertation, University of Stuttgart. Friedemann, M.A. (1990) Le pronom interrogatif que. Rivista di grammatica generativa 15:123–139. Greenberg, J. (1963). Some universals of grammar with particular reference to the order of meaningful elements. In J. Greenberg (ed) UIniversals of Language. Cambridge, MA: MIT Press. Guilfoyle, E. (1990) Functional Categories and Phrase Structure Parameters. Doctoral dissertation, McGill University. Haider, H. (1997) Precedence among predicates. Journal of Comparative Germanic Linguistics 1:3–41. Hale, K. (1989) Some Remarks on Agreement and Incorporation. Ms. MIT. Harlow, Steve (1989) The syntax of Welsh soft mutation. Natural Language and Linguistic Theory 7: 289–317. Harlow, S. (1992) Finiteness and Welsh sentence structure. In H. Obenauer & A. Zribi-Hertz (eds) Structure de la phrase et théorie du liage. Saint-Denis: Presses Universitaires de Vincennes, 93–119. Hendrick, R. (1996) Some syntactic effects of suppletion in the Celtic copulas. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, 75–96. Heycock, C. (1995) Asymmetries in reconstruction. Linguistic Inquiry 26:547–570. Hooper, J. & S. Thompson. (1973). On the applicability of root transformations. Linguistic Inquiry 4:465–497. Hornstein, N. (1999) Movement and control. Linguistic Inquiry 30:69–96. Huang, C.T.J. (1993) Reconstruction and the structure of VP: Some theoretical consequences. 
Linguistic Inquiry 24:103–138. Jespersen, O. (1917) Negation in English and Other Languages. Copenhagen: Det Kgl. Danske Videnskabernes Selskab. Historisk-filologiske Meddelelser 1.1–151. Kayne, R. (1972) Subject inversion in French interrogatives. In J. Casagrande & B. Saciuk (eds) Generative Studies in Romance Languages. Rowley: Newbury House, 70–126. Kayne, R. (1983) Chains, categories external to S and French complex inversion. Natural Language and Linguistic Theory 1:107–139. Kayne, R. (1982) Predicate and arguments, nouns and verbs. GLOW Newsletter. Kayne, R. (1989) Facets of Romance past participle agreement. In P. Benincà (ed) Dialect Variation and the Theory of Grammar. Dordrecht: Foris, pp. 85–103. Kayne, R. (1994) The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press. King, Gareth (1993) Modern Welsh: A Comprehensive Grammar. London: Routledge. Koopman, H. & D. Sportiche (1991) The position of subjects. Lingua 85: 211–258. Ladusaw, W. (1979) Polarity Sensitivity as Inherent Scope Relations. Doctoral dissertation, University of Texas, Austin. Laka, Itziar (1990) Negation in Syntax: On the Nature of Functional Categories and Projections. Doctoral dissertation, MIT.

Longobardi, G. (1978) Problemi di sintassi gotica. Aspetti teorici e descrittivi. MA thesis, University of Pisa. Manzini, M.R. & A. Roussou (2000) Control and A-Dependencies. Lingua. McCloskey, J. (1979) Transformational Syntax and Model-Theoretic Semantics. Dordrecht: Kluwer. McCloskey, J. (1990) Resumptive pronouns, A-bar binding and levels of representation in Irish. In R. Hendrick (ed) Syntax and Semantics XXIII: The Syntax of the Modern Celtic Languages. San Diego: Academic Press, 199–248. McCloskey, J. (1991) Clause structure, ellipsis and proper government in Irish. Lingua 85: 259–302. McCloskey, J. (1992) Adjunction, selection and embedded verb second. Ms., University of California at Santa Cruz. McCloskey, J. (1996a) The scope of verb-movement in Irish. Natural Language and Linguistic Theory. McCloskey, J. (1996b) Subjects and subject positions in Irish. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, 241–283. McCloskey, J. (2001) On the morphosyntax of wh-movement in Irish. Journal of Linguistics 37:67–100. McCloskey, J. & K. Hale (1984) On the syntax of person-number inflection in Modern Irish. Natural Language and Linguistic Theory 1:487–553. Mohammad, M. (1988) The Sentential Structure of Arabic. PhD Dissertation, University of Southern California. Morris-Jones, J. (1913) A Welsh Grammar: Phonology and Accidence. Oxford: Oxford University Press. Morris-Jones, R. & A.R. Thomas (1977) The Welsh Language: Studies in its Syntax and Semantics. Cardiff: University of Wales Press. Müller, G. & W. Sternefeld (1993) Improper movement and unambiguous binding. Linguistic Inquiry 24:461–507. Penner, Z. & T. Bader (1995) Topics in Swiss German Syntax. Bern: Peter Lang. Pesetsky, D. (1982) Paths and Categories. MIT PhD Dissertation. Pollock, J.-Y. (1989) Verb movement, universal grammar, and the structure of IP. Linguistic Inquiry 20:365–424. Rizzi, L.
(1982) Issues in Italian Syntax, Foris, Dordrecht. Rizzi, L. (1996) Residual verb second and the WH-criterion. In A. Belletti & L. Rizzi (eds) Parameters and Functional Heads. New York/Oxford: Oxford University Press, 63–90. Rizzi, L. (1997) The fine structure of the left periphery. In L. Haegeman (ed) The New Comparative Syntax. London: Longman, pp. 281–337. Rizzi, L. & I. Roberts (1989) Complex inversion in French. Probus 1: 1–39. [this volume, Chapter 9]. Roberts, I. (1987) The Representation of Implicit and Dethematized Subjects. Dordrecht: Foris. Roberts, I. (1993) Verbs and Diachronic Syntax: A Comparative History of English and French. Dordrecht: Kluwer. Roberts, I. (2001) Language change and learnability. In S. Bertolo (ed) Language Acquisition and Learnability. Cambridge: Cambridge University Press, 81–125. Roberts, I. (2005). Principles and Parameters in a VSO Language: a Case Study in Welsh. Oxford/New York: Oxford University Press. Roberts, I. & A. Roussou. (2002). The EPP as a condition on tense dependencies. In Peter Svenonius (ed) Subjects, Expletives and the EPP. New York/Oxford: Oxford University Press, 125–156.

Roberts, I. & U. Shlonsky (1996) Pronominal enclisis in VSO languages. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, 171–199. Rouveret, Alain (1991) Functional categories and agreement. The Linguistic Review 8:353–387. Rouveret, Alain (1994) Syntaxe du gallois: principes généraux et typologie. Paris: CNRS Editions. Rouveret, Alain (1996) Bod in the present tense and in other tenses. In Robert D. Borsley and Ian Roberts (eds) The Syntax of the Celtic Languages: A Comparative Perspective, 125–171. Cambridge: Cambridge University Press. Sadler, L. (1988) Welsh Syntax: A Government-Binding Approach. London: Croom Helm. Santorini, B. (1990) The Generalization of the Verb Second Constraint in the History of Yiddish. Doctoral dissertation, University of Pennsylvania. Santorini, B. (1994) The rate of phrase structure change in the history of Yiddish. Language Variation and Change 5:257–283. Schafer, R. (1994) Nonfinite Predicate Initial Constructions in Modern Breton. PhD Dissertation, University of California, Santa Cruz. Sigurðsson, H. (1989) Verbal Syntax and Case in Icelandic. PhD Dissertation, University of Lund. Sportiche, D. (1998) Subject clitics in French and Romance Complex Inversion and clitic doubling. In K. Johnson & I. Roberts (eds) Beyond Principles and Parameters. Dordrecht: Kluwer, 189–222. Sproat, R. (1985) Welsh syntax and VSO structure. Natural Language and Linguistic Theory 3:173–216. Stenson, N. (1981) Studies in Irish syntax. Ars Linguistica 8. Tübingen: Gunter Narr Verlag. Stephens, J. (1982) Word Order in Breton. PhD Dissertation, University College, London. Stowell, T. (1981) The Origins of Phrase Structure. PhD Dissertation, MIT. Stowell, T. (1989) Raising in Irish and the Projection principle. Natural Language and Linguistic Theory 7:317–360. Stump, G. (1984) Agreement vs incorporation in Breton. Natural Language and Linguistic Theory 2:289–348.
Stump, G. (1989) Further remarks on Breton agreement. Natural Language and Linguistic Theory 7:429–471. Suñer, M. (1993) Indirect questions and the structure of CP. In H. Campos & F. Martínez-Gil (eds) Current Issues in Spanish Linguistics. Georgetown: Georgetown University Press. Svenonius, P. (ed) (2002) Subjects, Expletives and the EPP. New York/Oxford: Oxford University Press. Tallerman, M. (1996) Fronting constructions in Welsh. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, 97–124. Tallerman, M. (1998) On the uniform Case-licensing of subjects in Welsh. The Linguistic Review 15:69–133. Tallerman, M. (2001) A (very!) preliminary look at unaccusativity in Welsh. Talk given at the 8th Welsh Syntax Seminar, Plas Gregynog. Thomas, A. (1982) Change and Decay in Language. In D. Crystal (ed) Linguistic Controversies. London: Edward Arnold, 209–220. Vikner, S. (1995) Verb Movement and Expletive Subjects in the Germanic Languages. Oxford: Oxford University Press.

Willis, D. (1997) P-Celtic. Ms. University of Oxford, Oxford. Willis, D. (1998) Syntactic Change in Welsh: A Study of the Loss of Verb Second. Oxford: Clarendon Press. Willis, D. (2000) On the distribution of resumptive pronouns and wh-trace in Welsh. Journal of Linguistics 36:531–573. Zanuttini, R. (1991) Syntactic Properties of Sentential Negation: A Comparative Study of Romance Languages. PhD Dissertation, University of Pennsylvania.

14 Introduction: Parameters in Minimalist Theory1

Anders Holmberg and Ian Roberts

The papers collected here represent some of the work carried out in the period 2002–2007 by the group working on the project “Null Subjects and the Structure of Parametric Theory”, funded by the Arts and Humanities Research Council, Great Britain (Grant No. APN14458). The group consisted of Theresa Biberauer, Anders Holmberg, Chris Johns, Ian Roberts, Michelle Sheehan and David Willis. The central goal of that project was to investigate and, if possible, refine the notion of parameter of Universal Grammar, as it has been understood in generative theory since roughly 1980, by looking carefully at the phenomena associated with one of the best-known and most widely discussed examples of a parameter: the null-subject parameter (NSP). This volume brings together a number of articles focussing on the nature of null subjects in a range of languages; Biberauer (2008a) is a sister volume arising from the same project, which focuses more on parameter theory than on null subjects, while Holmberg (2009) focuses on partial null-subject languages.2 In this Introduction, we would like to set the papers in context. Accordingly, we first discuss the phenomena from English and various Romance languages which originally motivated the postulation of the NSP. Next, we summarise the main kinds of null-subject system that have been identified in the comparative-syntax literature. We complete Section 1 by summarising the two principal approaches to the analysis of null subjects, whose classical exponents are Rizzi (1986) and Borer (1986). Section 2 focuses on the debates surrounding the classical formulations of the NSP, and in particular the “typological” predictions that were initially made, beginning with Rizzi (1982).
Here we consider the systematic cross-linguistic investigation whose results apparently indicate that certain predicted correlations do not hold (Gilligan 1987), and the far-reaching and negative conclusions for parameter theory drawn partly on the basis of this by Newmeyer (2004, 2005). We once again attempt to defend a version of P&P theory against Newmeyer’s attacks (see also Roberts & Holmberg (2005), Newmeyer (2006), Biberauer (2008b)). Section 3 takes up the wider question of the nature of parameters; here we observe certain problems with the original view, which associated parametric variation closely with a rich, domain-specific array of UG principles. This view can no longer be maintained in full in the context of the minimalist programme, which undertakes to eliminate as many UG principles as possible, and which calls into question the domain-specificity of what principles we are required to postulate. Ongoing comparative work over the past twenty years or so has tended by and large to favour the postulation of a large number of microparameters (see Kayne (2005) for discussion, and Baker (1996, 2008) for a different view); this view of parameters is readily compatible with minimalist assumptions, arguably more so than a “macroparametric” approach. We suggest that the proliferation of microparameters is an instance of the familiar tension between descriptive and explanatory adequacy, which at earlier stages in the development of the theory motivated the simplification of rule systems and led to the development of the principles-and-parameters approach itself;3 as before, what seems to be required is a radical increase in theoretical abstraction. We make some tentative suggestions in this direction in Sections 3.4 and 3.5, developing ideas in Gianollo, Guardiano & Longobardi (2008), Roberts (2007a: 443, 2008: 17, 2010a), Roberts & Roussou (2003, Chapter 5 [this volume, Chapter 6]) and Mobbs (2008). This leads us to briefly propose a hierarchical model of parameter schemata which combines the notion of micro- and macroparameters, and to speculate on the shape of comparative syntax beyond explanatory adequacy.

1. Null Subjects: The Basics

1.1 The Observation

Traditional grammars of many languages, for example Latin, observe that a pronominal subject is marked “in the verb,” i.e. by the person-number agreement inflection on a finite verb, and as such is not in need of expression by an independent pronoun. The following comment from a well-known traditional grammar of Latin is representative: “Here [in the finite verb, AH/IGR] the form contains in itself all the necessary elements . . . , the persons being indicated by the endings” (Gildersleeve & Lodge (1895/1997: 144)). Jespersen takes up this idea in the following remark:

    In many languages the distinction between the three persons is found not only in pronouns, but in verbs as well . . . in Latin . . . Italian, Hebrew, Finnish, etc. In such languages many sentences have no explicit indication of the subject, and ego amo, tu amas is at first said only when it is necessary or desirable to lay special stress on the idea “I, thou.” (Jespersen (1924: 213))

This idea has an ancient pedigree, as indicated by the following remark by Apollonius Dyscolus on Ancient Greek:

    The nominative [subject] is implicitly present in [finite] verbs, and it is definite (i.e. has definite reference) in the first and second persons, but indefinite in the third because of the unlimited number of possible referents. (On Syntax, Book 1, §17; Householder (1981: 25))

What is being alluded to here is that, since a pronominal subject can be expressed “in the verb” in languages such as Greek and Latin, there is no general requirement to pronounce the subject separately as a nominative pronoun. The initial observation behind the NSP, in all its formulations, has to do with this second point: an overt pronominal subject is not required in finite clauses, and, when such a pronoun does not appear, there is no nominal element which overtly realises the subject function in the clause. This fact may reflect a trivial feature of morphology, person-marking on the verb, but the possibility of not expressing the nominal bearing the subject function is of obvious importance for syntax.

The first generative study of these matters was Perlmutter (1971). Perlmutter (pp. 100ff.) distinguished languages with the surface filter in (1), which he called Type A languages, from those lacking it, Type B languages:

(1) “Any sentence other than an Imperative in which there is an S that does not contain a subject in surface structure is ungrammatical.” (Perlmutter’s (9), p. 100)

Perlmutter relates the presence of the surface filter in (1) to the possibility of null subjects and of wh-movement of the subject from a finite embedded clause across a complementiser (this observation has since become known as “Perlmutter’s generalisation”), to the presence of obligatory expletives in the relevant kind of impersonal constructions, and to the existence of an arbitrary subject pronoun as a true subject (as opposed, for example, to an arbitrary subject clitic pronoun which surfaces as part of the object-clitic cluster). French and English are examples of Type A languages, while “Spanish, Italian, Serbo-Croatian, Arabic, Hebrew, Warlbiri and Basque” (Perlmutter (1971: 115)), as well as numerous others, are Type B languages. This “typological distinction,” as Perlmutter referred to it, is not, however, connected to the nature of agreement inflection in the Type B languages in Perlmutter’s treatment. With this exception, and with the important omission of any discussion of “free inversion” (see below), Perlmutter’s discussion identifies the NSP in all but name.4

As Perlmutter pointed out, the basic fact motivating the postulation of this parameter is that certain languages require finite clauses to overtly express a definite, referential, pronominal subject, while others do not. The contrast is illustrated by the following Italian and English examples:

(2) a. Parla italiano.
    b. *Speaks English.

Spanish and Greek, among many other languages, pattern like Italian, while, as Perlmutter pointed out, French appears to pattern like English ((3c) is ungrammatical as a declarative, although it would be a well-formed imperative):

(3) a. Habla español.
    b. Mila ellinika.
    c. *Parle français.

Thus Italian, Spanish and Greek are null-subject languages, while English and French are non-null-subject languages. The NSP relates, as stated above, to finite, discourse-neutral clauses, and canonically involves the interpretation of the null subject as a definite, referential pronoun. Many non-null-subject languages, including English, allow null subjects under other conditions. For example, both English and French extensively allow or require the subject of non-finite clauses to be null:5

(4) a. [(Him) smoking] bothers me.
    b. John expects [(Mary) to leave soon].
    c. Jean a essayé [de -partir].
       John has tried [ -to leave].

Such subjects have somewhat different properties from the null subjects of (2) and (3), in that in (4b,c) the empty subject of the infinitive must be coreferent with the subject of the main clause (this is subject control) and that in (4a) must be arbitrary. Accordingly, they have generally been analysed in a different way from those of (2) and (3).6 The initial observation, then, is that some languages allow a definite pronominal subject of a finite clause to remain unexpressed as a nominal bearing the subject function, while others do not. Traditional grammars of languages such as Latin and Greek relate this to the fact that personal endings on the verb distinguish person and number, thereby making a subject pronoun redundant.

Languages which allow null subjects are very common: most of the older Indo-European languages fall into this category, as do most of the Modern Romance languages (with the exception of some varieties of French and some varieties of Rhaeto-Romansch; see Roberts (2010b)), the Celtic languages, with certain restrictions in the case of Modern Irish (see McCloskey & Hale (1984), and, for arguments that Colloquial Welsh is not a null-subject language, Tallerman (1987)), and West and South Slavic, but probably not East Slavic (these appear to be “partial” null-subject languages in the sense of §1.2.4 below and Holmberg & Sheehan (chapter 3)). Indeed, it seems that languages which allow null subjects are significantly more widespread than those which do not (Gilligan (1987), cited in Newmeyer (2005: 85)). According to Haspelmath, Dryer, Gil & Comrie’s (2005) World Atlas of Language Structures (WALS), of 674 languages for which data is available, subject pronouns can be omitted in 409, and cannot be omitted in 77 (the remainder form various kinds of mixed categories involving clitics, displaced pronouns, etc.; see Map 101 “Expression of Pronominal Subjects”). So null-subject languages, of one kind or another, are considerably more common than non-null-subject languages.

1.2 Types of Null-Subject Systems

Since Rizzi’s early work on null subjects, it has been observed that there are different types of null-subject language. Rizzi (1982: 143) proposed that the NSP be divided into two subcases, one applying to languages in which the unexpressed pronoun can only be an expletive, and one applying to languages where it can be referential. Huang (1984) observed that many East Asian languages show a much more liberal option of non-expression of pronominal elements, and that this could not be related to person agreement, since that kind of inflection is generally absent in these languages. More recently, the existence of “partial” null-subject languages has been observed: languages in which the pronominal subject may remain unexpressed under restricted conditions determined by both the morphological and the syntactic context. We now briefly describe each of these types of null-subject language one by one.

1.2.1 Consistent Null-Subject Languages

These languages have been the most discussed and analysed among the various types of null-subject languages and have, mainly for historical reasons, often been taken to be the only kind of null-subject language. In consistent null-subject languages, all persons in all tenses can feature an unexpressed pronoun.7 These languages characteristically show “rich” agreement inflection, i.e. distinct personal endings on the verb, generally in all tenses. The Italian, Greek and Turkish forms in (5) illustrate:

(5) a. Italian          b. Greek            c. Turkish8
       bevo                pino                içiyorum
       bevi                pinis               içiyorsum
       beve                pini                içiyor
       beviamo             pinume              içiyoruz
       bevete              pinete              içiyorsunuz
       bevono              pinun               içiyorlar
       “I drink” (etc)     “I drink” (etc)     “I drink” (etc)

The Romance null-subject languages and Modern Greek are the paradigm examples of this kind of language, and have been much discussed and exemplified in the literature. These languages also illustrate the properties originally proposed to form a cluster determined by the positive value of the NSP, which we will turn to in Section 2 below. One further property that we can mention here is that, as pointed out by Jespersen in the quotation above, overt subject pronouns are generally allowed in finite clauses in null-subject languages, although they tend to have what we may loosely call an emphatic interpretation (this is indicated by putting the English pronoun in capitals in the translations below). Thus, alongside (2a) and (3a,b) we have:

(6) a. Lui parla italiano. (Italian)
       HE speaks Italian.
    b. Él habla español. (Spanish)
       HE speaks Spanish.
    c. Aftos mila ellinika. (Greek)
       HE speaks Greek.

This aspect of the interpretation of overt pronominal subjects in null-subject languages emerges slightly more clearly in (7). Here the overt pronoun in the adverbial clause does not allow the interpretation in which it corresponds to the subject of the main clause (see Vanelli, Renzi & Benincà (1986), Samek-Lodovici (1996) and Frascarelli (2007) for discussion):

(7) a. Il professore ha parlato dopo che (lui) è arrivato. (Italian)
       The professor has spoken after that (he) is arrived
       “The professor spoke after he arrived.”
    b. I Maria jelase afou (afti) idhe ton Yianni. (Greek)
       The Mary laughed after (she) saw the Yiannis.
       “Mary laughed after she saw Yiannis.”

In other words, the overt pronoun of the adjunct does not show the same ambiguity as its English and French counterparts in (8). Instead, it strongly prefers the interpretation which is disjoint from “the professor”, while the English and French pronouns are, out of context, ambiguous between this interpretation and the one where they correspond to “the professor”:

(8) a. The professor spoke after he arrived.
    b. Le professeur a parlé après qu’il est arrivé. (=(8a))

These interpretative differences involving the use of an overt pronoun appear to be related to the fact that subject pronouns may be unexpressed, i.e. to the positive value of the NSP. For the moment, we take the two diagnostic features of a consistent null-subject language to be (i) the possibility of leaving the definite subject pronoun unexpressed in any person-number combination in any tense, and (ii) the rich agreement inflection on the verb. We will suggest other properties as we proceed.

1.2.2 Expletive Null Subjects

Some languages apparently allow expletive null subjects, but not referential ones. German is one such language, as are some varieties of Dutch and Afrikaans, and a range of creoles (Nicolis (2005, 2008) mentions Cape Verdean, Berbice Dutch, Kriyol, Mauritian, Papiamentu and Saramaccan; Roberts (2007a: 413) adds Haitian and Jamaican). In (9a) the expletive pronoun es cannot be expressed, while in (9b) the same pronoun in the same syntactic position, only now with a referential interpretation, must be expressed (examples from Cardinaletti (1990: 5–6)):

(9) a. Gestern wurde (*es) getanzt.
       Yesterday was (it) danced.
       “Yesterday there was dancing.”
    b. Gestern war *(es) geschlossen.
       Yesterday was (it) closed.
       “Yesterday it was closed.”

Owing to this restriction on their null subjects, languages of this type are not regarded as “full” null-subject languages. Rizzi (1982: 143) identifies what he called two “related but autonomous parameters”: one concerns whether an unexpressed pronoun is allowed at all, and the other whether referential pronouns are allowed to be unexpressed. In languages like English, both parameters are negative, while in Italian both are positive. In German and the creoles just mentioned, the first is positive and the second negative. Hence German allows non-referential null subjects, as in (9a), but not referential ones. According to Rizzi, the fourth logical option, in which the first parameter is negative and the second positive, is impossible (“if an inflection cannot be pronominal, it cannot be referential either” (Rizzi (1982: 143))). There is thus an implicational relation between the presence of referential null subjects and the presence of expletive null subjects (see Holmberg (2010a); this point is discussed in some detail in Roberts (2007b: 31–38)). For an alternative analysis of the German facts illustrated in (9), and related phenomena elsewhere in Germanic, see Biberauer (2010), and the references given there.
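
The interaction of Rizzi’s two parameters amounts to a four-row truth table with one row ruled out. The following Python sketch is offered purely as an illustration: the function and variable names are hypothetical labels introduced here, not part of Rizzi’s formulation, and the language assignments simply restate those given in the preceding paragraph.

```python
from itertools import product

# Hypothetical encoding of Rizzi's (1982: 143) two "related but autonomous
# parameters" as booleans:
#   null_allowed        - can a pronoun be unexpressed at all?
#   referential_allowed - can a referential pronoun be unexpressed?


def is_possible(null_allowed: bool, referential_allowed: bool) -> bool:
    """Exclude only the combination with referential null subjects but no
    null pronouns at all (the implicational relation described in the text)."""
    return null_allowed or not referential_allowed


# Example languages as classified in the text (not an exhaustive typology).
EXAMPLES = {
    (False, False): "English",
    (True, False): "German",
    (True, True): "Italian",
}


def classify():
    """Return one row per logical combination of the two parameter values."""
    rows = []
    for null_allowed, referential_allowed in product([False, True], repeat=2):
        if is_possible(null_allowed, referential_allowed):
            label = EXAMPLES[(null_allowed, referential_allowed)]
        else:
            label = "excluded"
        rows.append((null_allowed, referential_allowed, label))
    return rows


for row in classify():
    print(row)
```

Encoding the restriction as a single implication (referential null subjects require null subjects in general) keeps the two parameters formally independent while still ruling out the fourth combination.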

Expletive null-subject languages (sometimes called ‘semi-pro-drop languages’), then, are distinguished from consistent null-subject languages in that non-dummy pronouns cannot be left unexpressed.

1.2.3 “Discourse Pro-Drop” (also called “Radical Pro-Drop”)

A good number of languages which are otherwise typologically and genetically distinct (Chinese, Japanese, Korean, Thai, Vietnamese and others) allow null subjects quite freely, but seem to be entirely without agreement marking of any kind. The case of Chinese is discussed in Huang (1984). Chinese allows both subjects and objects to remain unexpressed and have a definite pronominal interpretation, as illustrated in (10):

(10) a. -kanjian ta le
        (he) see he Asp
     b. Ta kanjian -le.
        He see (him) Asp
        “he saw him.”

Both pronouns can be dropped under the appropriate discourse conditions. It has been suggested since the earliest studies (Huang (1984), Rizzi (1986)) that the total absence of agreement marking may play a role in facilitating the very liberal availability of null subjects in these languages. Recently, three specific hypotheses have been put forward in this connection.

First, Tomioka (2003: 336) proposed the “Discourse Pro-Drop Generalization” (see Jayaseelan (1999) for a similar idea): “All languages which allow discourse pro-drop allow (robust) bare NP arguments”; “Null pronouns in Discourse Pro-Drop languages are simply the result of N’-Deletion/NP-Ellipsis without determiner stranding”. This idea expresses a relation between discourse pro-drop and the availability of bare NP arguments (i.e. the grammaticality of a sentence such as (I) saw cat, thereby relating discourse pro-drop in an interesting way to Chierchia’s (1998) Nominal Mapping Parameter).9 It also relates discourse pro-drop to ellipsis, as does Saito (2007) (see below and the discussion in Roberts (2010a)).

A different proposal is made by Neeleman & Szendrői (2007, 2008). These authors treat fully specified nominals as KPs (since they inherently contain a syntactic position for Case) and posit an operation of context-free KP-deletion. In languages with fusional pronoun morphology, this context-free operation is blocked by the principle of disjunctive ordering (the Elsewhere Condition of Kiparsky (1973)), which states that a more specific operation blocks a more general one in the case where both structural descriptions are met. They further adopt a “realisational” approach to the insertion of pronouns into positions created by the syntax; for example, the English pronoun him is the realisation, or “Spell Out,” of the feature complex [KP +pronoun, -anaphor, 3rd person, Singular, Masculine, Accusative]. The general “radical pro-drop” rule is the context-free zero-realisation rule (11):

(11) [KP +pronoun, -anaphor] ↔ ∅

The Elsewhere Condition will always block this realisation of pronouns in English, since, given their fusional nature, English pronouns always have more complex spell-out rules whose structural descriptions properly include that of (11). But this is not true in every language: in some languages, e.g. Japanese, regular, agglutinating case-markers are added to the pronominal root (watasi-ga “I”; watasi-o “me”, etc.). Japanese thus has separate spell-out rules for the Case (K) morpheme and for the pronoun, which is a category distinct from KP (probably NP). And here is the central idea of their analysis: because of the non-fusional make-up of pronominal KPs, the radical pro-drop rule for KP and the specific spell-out rules for NP and K do not stand in an “elsewhere” relation. Hence Japanese pronominal KPs are optionally allowed a zero realisation. The analysis leads to the following generalisation: fusional pronouns block radical pro-drop. Neeleman & Szendrői’s analysis entails a clear implicational relation between non-fusional pronoun morphology and discourse pro-drop, which they show holds across a very wide range of languages.

The third analysis of discourse pro-drop put forward recently is due to Saito (2007). Saito (2007) suggests that a single covert grammatical mechanism allows for radical pro-drop and argument ellipsis. This mechanism involves covert copying of elements into argument positions from a set of discourse-given entities. Understood pronouns may be included in this set, along with antecedents for the recovery of elided arguments.
The precondition for this covert operation is, effectively, the lack of surface agreement triggers. Hence arguments are not required to be overtly present in order to trigger surface agreement on verbs and other categories. Thus the relation between radical pro-drop and absence of agreement-marking is established (see also Kuroda (1988)). This analysis of discourse pro-drop is discussed in more detail in Roberts (2010a); see also Section 3.4 below.

Whatever the correct analysis, the characteristics of discourse pro-drop languages which distinguish them from consistent null-subject languages are (i) general possibility of non-expression/ellipsis of nominal arguments in various functions in addition to the subject; (ii) lack of person-agreement marking on verbs.

1.2.4 Partial Null-Subject Languages

The existence of partial null-subject languages as a separate type of null-subject language has been more difficult to establish. However, Holmberg (2005: 548–550), Holmberg & Sheehan (2010), Holmberg, Nayudu & Sheehan (2009) and Holmberg (2010a,b) identify a number of characteristics which can serve to distinguish languages of this type from languages of the Italian type described in Section 1.2.1 (see also the papers in Holmberg (2008)). Partial null-subject languages include Finnish, Hebrew, Russian, Icelandic, Marathi and probably several other Indic languages, and Brazilian Portuguese.10 Here we take Finnish as our example of a partial null-subject language. We observe three things which distinguish the null subjects of Finnish from those of a language like Italian. First, only 1st and 2nd person pronouns are freely able to be left unexpressed in any finite context:

(12) (Minä) puhun englantia      I speak-1sg English
     (Sinä) puhut englantia      You speak-2sg English
     *(Hän) puhuu englantia      He/she speak-3sg English
     (Me) puhumme englantia      We speak-1pl English
     (Te) puhutte englantia      You speak-2pl English
     *(He) puhuvat englantia     They speak-3pl English
     “I speak English, etc.”
     (Holmberg (2005: 539))

However, it is not the case that 3rd person pronouns can never be unexpressed. As Holmberg (2005: 539) says: “A 3rd person definite subject pronoun can be null when it is bound by a higher argument, under conditions that are rather poorly understood.” This possibility is illustrated by examples like the following: (13) Pekkai väittää [ että häni/j/Øi/*j puhuu englantia hyvin ]. Pekka claims that he speaks English well (Holmberg’s (2005: 539)) This is characteristic of partial null-subject languages. Holmberg & Sheehan (2010) is an investigation of the conditions under which these null subjects are licensed. Finally, in partial null-subject languages “generic pronouns can, and must, be null” (Holmberg’s (2005: 540)): (14) Täällä ei saa polttaa. Here not may smoke “One can’t smoke here” This contrasts with languages like Italian and Greek, where a special clitic (Italian) or verb form (Greek) is required:

Introduction 483

(15) a. Qui non si può fumare. [Italian]
        Here not SI can smoke
     b. Apoghorevete to kapnisma. [Greek]
        prohibit-3sg-.mediopass the smoking
     “One can’t smoke here”

In Italian, omission of the si-clitic gives rise to a string comparable to that in the Finnish example in (14), but the unexpressed pronoun must be understood as definite: “Here s/he can’t smoke”. The Greek example features the mediopassive form of the verb “to prohibit”, which gives rise to the same kind of impersonal interpretation as Italian si. The above differences suffice to establish that partial null-subject languages have a range of properties distinguishing them from consistent null-subject languages. It is very likely that many languages that have been seen as consistent null-subject languages are in fact partial null-subject languages; in this connection, the simple descriptive facts need to be clarified. One should regard categorical statements in descriptive grammars to the effect that a given language is a null-subject language with some scepticism (with the possible exception of the very best studied languages). The fact that subjects are sometimes unexpressed does not make a language a null-subject language in the technical sense. As is well known, even (spoken) English can drop its subjects in certain contexts (see Note 6). Distinguishing a consistent from a partial null-subject language requires consideration of contexts such as (7) and the syntax of impersonal constructions at a level of detail which is, probably, rarely attained in the research behind descriptive grammars.

1.2.5 Conclusion

We see that there are four identifiable types of null-subject language. We can range them along a scale of “liberality” as follows:

(16) expletive null subjects ⊃ partial null subjects ⊃ consistent null subjects ⊃ discourse pro-drop

Placing discourse pro-drop languages at the right edge of the scale is motivated particularly if other arguments (direct and indirect objects) are taken into account; discourse pro-drop languages allow (referential) null objects as well as subjects, which is not the case for consistent or partial null-subject languages, across the board. And of course, we could add non-null-subject languages such as English at the left-hand edge of the scale in (16). All other things being equal, for each system Si in (16), the set of positions in which a pronoun can remain unexpressed in Si is a proper subset of the set of positions in which a pronoun can remain unexpressed in all systems Sj, where Si is to the left of Sj in (16).11 We will return to the possibility of arranging different kinds of systems into some kind of hierarchy in sections 3.3 and 3.4.
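The proper-subset claim just made about the scale in (16) can be written compactly. What follows is only a notational sketch: the indexing S_0 to S_4 and the function Null(·), standing for the set of positions in which a pronoun can remain unexpressed in a system, are labels assumed here for illustration and are not part of the original formulation.

```latex
% Assumed indexing, following the left-to-right order of (16) with
% non-null-subject languages added at the left edge:
%   S_0 = non-null-subject, S_1 = expletive null subjects,
%   S_2 = partial null subjects, S_3 = consistent null subjects,
%   S_4 = discourse pro-drop.
% Null(S) is an assumed name for the set of positions in which a
% pronoun can remain unexpressed in system S.
\[
  \forall i, j \in \{0, \dots, 4\}:\quad
  i < j \;\Rightarrow\; \mathrm{Null}(S_i) \subsetneq \mathrm{Null}(S_j)
\]
```

As in the prose statement, the relation is meant to hold only with all other things being equal.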

1.3 Two Analyses of Null Subjects

There have been two main views on the nature of null subjects and the NSP in the literature for some time. One view, most influentially put forward in Rizzi (1986), is that null subjects are occurrences of a phonologically unrealised, or empty, pronoun pro in the subject position (see Section 1 of Roberts (2010a) and Section 9 of Holmberg (2010a) for more details of Rizzi’s account). The other view, which has its origins in Borer (1986), is that there is no overarching requirement for a subject position as such (i.e. that the Extended Projection Principle (EPP) of Chomsky (1982: 10) does not hold, or at least does not hold universally). The null subject may then be directly expressed by the rich verbal agreement inflection, and there is therefore no need for a distinct empty pronoun to realise the subject function. This view implicitly accepts that agreement inflection can function like a pronoun, in that it can bear a grammatical function, and a thematic relation, in the way that nominal expressions typically do. Since agreement is located in the Infl, or I, position, this view is known as the “I-subject” view. This view articulates the intuition behind comments in traditional grammars of the kind exemplified in Section 1.1 above.

The two views just sketched survive in current work. Developing Borer’s (1986) I-subjects idea, it has been suggested by various authors that, since person-number specification of the subject can be exhaustively computed from the verbal inflection, the preverbal subject is effectively optional and when it appears it acts as a left-dislocated (or more precisely “clitic left-dislocated”) element occupying a position peripheral to the clause (i.e. one not associated with a grammatical function) and with the verbal inflection functioning analogously to a clitic pronoun. In differing ways, this view is put forward by Alexiadou & Anagnostopoulou (1998), Barbosa (1995, 2009), Fassi Fehri (1993), Manzini & Savoia (2005), Nash & Rouveret (1997), Ordoñez (1997), Platzack (2004) and Pollock (1997). Holmberg (2010a) articulates what might be viewed as a version of this hypothesis. On the other hand, Cardinaletti (1997, 2004), Holmberg (2005) and Sheehan (2006) have argued that the subject position is present, at least in some null-subject languages, and hence it is filled by the null pronoun pro (see for example Sheehan (2010) on this). Holmberg (2005) and Roberts (2010a) follow Cardinaletti & Starke (1999) in taking pro to be a weak pronoun, that is a ‘deficient’ pronoun whose distribution is restricted to certain designated positions. Furthermore, following a long line of work going back at least to Rizzi (1982), they take pro to be licensed by a special pronominal feature (usually termed a D-feature) associated with the head bearing the features realised as person-agreement on the verb. The two views can be taken to converge on the idea that the inflectional head must be pronoun-like in a null-subject language (they may converge in other respects too; see Roberts (2010a, §2.5)). This appears to be the core of the null-subject parameter, whatever the further details. Let us encode this

property formally as the presence or absence of a D-feature associated with T. Following the yes/no-question format for parameters adopted in Roberts (2007a), let us then state the NSP as follows:

(17) The Null Subject Parameter: Does T bear a D-feature?

T clearly does not bear a D-feature in non-null-subject languages, while it clearly does in consistent null-subject languages. Partial null-subject languages and languages only allowing null expletives do not have a D-feature in T, which is why they allow referential null subjects only under very restricted circumstances (partial null-subject languages) or not at all; see Holmberg (2010a), Holmberg & Sheehan (2010) and Roberts (2010a) for refinements. We leave open for now the status of discourse pro-drop languages, although it is worth noting that if, following the Nominal Mapping Parameter, non-D elements can function as arguments, perhaps the requirement for a D-feature on T is waived in such languages. We will explore this a little more in Section 3.4 below (and see Roberts (2010a, §2.5)).
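The answers that the text gives to the yes/no question in (17) can be laid out as a small lookup. This is a minimal sketch for illustration only: the names `LanguageType` and `t_bears_d_feature` are hypothetical, and the value `None` is an assumed encoding of the case that the text explicitly leaves open (discourse pro-drop languages).

```python
from enum import Enum
from typing import Optional


class LanguageType(Enum):
    """The language types distinguished in Section 1.2 (labels are illustrative)."""
    NON_NULL_SUBJECT = "non-null-subject"
    EXPLETIVE_ONLY = "null expletives only"
    PARTIAL = "partial null-subject"
    CONSISTENT = "consistent null-subject"
    DISCOURSE_PRO_DROP = "discourse pro-drop"


def t_bears_d_feature(language_type: LanguageType) -> Optional[bool]:
    """Answer the question in (17) for a language type.

    True  : T bears a D-feature (consistent null-subject languages).
    False : T lacks a D-feature (non-null-subject, expletive-only, partial).
    None  : left open in the text (discourse pro-drop).
    """
    if language_type is LanguageType.CONSISTENT:
        return True
    if language_type is LanguageType.DISCOURSE_PRO_DROP:
        return None
    return False


if __name__ == "__main__":
    for language_type in LanguageType:
        print(language_type.value, "->", t_bears_d_feature(language_type))
```

The three-valued return type is a design choice of the sketch: it keeps the undecided status of discourse pro-drop languages visible instead of forcing it into a yes or a no.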

2. The NSP in the Context of P&P Theory 2.1 Rizzi (1982): Clustering Properties in Romance and English The NSP has played a prominent role in the theoretical study of comparative syntax in recent years, not just because of the characterisation it gives us of languages like Italian, and how they differ from English, but primarily because it has been seen as a good example of the way in which rather abstract grammatical properties (such as that given in (17)) may have proliferating effects, unifying apparently unrelated surface phenomena. To see the full importance of this idea, we need to consider Chomsky’s (1964: 28f.) definitions of levels of adequacy for linguistic theory. These were observational, descriptive and explanatory adequacy. An observationally adequate grammar presents the data correctly, while a descriptively adequate grammar “specifies the observed data . . . in terms of significant generalizations that express underlying regularities in the language” (Chomsky (1964: 28)). Explanatory adequacy “can be interpreted as asserting that data of the observed kind will enable a speaker whose intrinsic capacities are as represented in th[e] general theory to construct for himself a grammar that characterizes exactly this intuition”; in other words, attaining explanatory adequacy involves showing how a given empirical phenomenon can be deduced from UG. The postulation of parametric variation in UG principles was a very large step in the direction of explanatory adequacy, since, one could assume, if we can say that this syntactic feature of this language is due to setting that parameter to that value, we have provided an explanatorily

adequate account of the syntactic feature in question in that we have related it directly to UG, as well as a descriptively adequate account to the extent that the analysis of the relevant property of the language is correct. Moreover, the parametric account has immediate cross-linguistic implications, since it implies that another language lacking the property in question will set the parameter in question to a different value. Now, if each parameter value determines a cluster of disparate syntactic features, then explanatory adequacy is enhanced, especially if certain features are readily accessible to acquirers on the basis of impoverished evidence while others are hardly likely to be easily accessible. In this case, arriving at a parameter value determining both the accessible and relatively inaccessible feature gives us a simple account of how the inaccessible feature can be acquired, thus accounting for an aspect of the poverty of the stimulus to language acquisition and thereby, again, reaching explanatory adequacy. At the same time, other things being equal, a “typological” prediction is made: the inaccessible feature will be acquired whenever the accessible one is, since both reflect the same abstract property of Universal Grammar, the setting of a given parameter to a given value. Let us state the following conjecture in relation to the “clustering effect” associated with parameters:12

(18) A substring of the input text S expresses a parameter P just in case a grammar must have P set to a definite value in order to assign a well-formed representation to S. (Roberts (2007a:133))
(19) For any UG parameter P, fixing P at value v_i entails a cluster C of grammatical expressions of P(v_i), while setting P to value v_{j≠i} entails a disjoint cluster C’ of expressions of P(v_{j≠i}).
(20) A cluster of P-expressions is a set of properties of a surface morphosyntactic form of cardinality equal to or greater than 2, which are reflexes of P’s setting to a determinate value v_i.
(21) A substring of the input text S is a trigger for parameter P if S expresses a determinate value v_i of P. (Roberts (2007a:133))

It follows from these definitions that a trigger for a given parameter value is included in the cluster of expressions of that parameter value, and indeed properly included in that set if triggers must be unique (if there is to be a one-to-one mapping from triggers to parameter values). A trigger must be accessible in the primary linguistic data (PLD), while the other expressions in a given cluster C may be relatively inaccessible, along the lines described above. The properties connected to the NSP by Perlmutter (1971) and, in particular, Rizzi (1982), can be seen as a cluster. We take the NSP to be a UG parameter, and fixing the NSP at value + entails the cluster C of grammatical expressions of NSP(+) in (22):

(22) a. The possibility of a silent, referential, definite subject of finite clauses.
     b. “Free subject inversion.”
     c. The apparent absence of complementiser-trace effects.
     d. Rich agreement inflection on finite verbs.

(22b) refers to the general possibility of expressing an overt subject, usually with a focus interpretation, in postverbal position:

(23) a. Hanno telefonato molti studenti.
     b. *Ont téléphoné beaucoup d’étudiants.
        Have telephoned many students.
        “Many students have telephoned.”

“Free inversion” is in fact subject to slightly differing constraints in different languages, being more freely available in Spanish and Greek than in Italian, for example; see Sheehan (2006, 2010) and the references given there. (22c) relates to Perlmutter’s generalisation, since it originates in Perlmutter’s (1971) pioneering work. Perlmutter’s generalisation expresses the fact that in non-null-subject languages the subject of a finite clause cannot undergo wh-movement if the complementiser introducing the clause is present. This constraint holds of English and French, as the following examples show: (24) a. *Who did you say that -wrote this book? b. *Qui as-tu dit qu’ -a écrit ce livre? (=(24a)) Here the questioned constituent (who/qui) corresponds to the subject of the subordinate clause, so there is a “gap” in that position. The ungrammaticality of (24a) is known as a “complementiser-trace effect”, since in many versions of the theory of movement it is held that the empty subject position at the movement site in the complement clause contains a trace of the moved wh-element. The idea that the presence of the complementiser determines the ungrammaticality of such examples is supported by the fact that (24a) becomes grammatical if that is omitted. In French, (24b) can be rendered grammatical by altering the form of the complementiser from que to qui. These points are illustrated in (25): (25) a. Who did you say -wrote this book? b. Qui as-tu dit qui -a écrit ce livre? (=(25a)) In null-subject languages, as Perlmutter observed, it appears that complementiser-trace effects are not found (Rizzi (1982) argued that in fact this is not true if certain structures covertly derived at the level of Logical

Form are taken into consideration). The subject of a finite clause introduced by a complementiser can readily be questioned in these languages:

(26) a. Chi hai detto che __ ha scritto questo libro? (Italian)
        Who have-2sg said that __ has written this book
     b. Pjos ipes oti ___ egrapse afto to vivlio? (Greek)
        who said-2sg that ___ wrote this the book
        “Who did you say wrote this book?”

This feature of the null-subject cluster can be reasonably thought of as relatively inaccessible in the PLD, while rich agreement inflection is presumably very accessible (especially given the known sensitivity of acquirers to inflections; see Wexler (1998)), and the other two properties may be somewhat accessible. In terms of the definitions above, setting the NSP to the value [-] entails the disjoint cluster C’ of expressions of NSP(-): no null subjects, no free subject inversion, complementiser-trace effects and “poor” agreement inflection. English, French and the Mainland Scandinavian languages have the cluster C’, as do many creoles.

The above discussion relates primarily to language acquisition, and shows how parametric clusters take us towards explanatory adequacy. But it is also clear that this approach defines language types. In this way, typology, in the sense of the establishment of cross-linguistic relations and of a structure to cross-linguistic variation, and acquisition become intrinsically related. This is a very positive development as it clearly opens the way to a two-pronged empirical approach to understanding the nature of UG. It is also worth noting that the definitions given in (18–21) provide very strong definitions of types, far stronger than what is usually found in the Greenbergian typological tradition. According to these definitions, all the properties in C are biconditionally related to one another. In other words, all things being equal, we can derive the following set of implicational statements:

(27) a. A language has null subjects iff it has rich agreement.
     b. A language has null subjects iff it has free subject inversion.
     c. A language has null subjects iff it does not show complementiser-trace effects.
     d. A language has rich agreement iff it has free subject inversion.
     e. A language has rich agreement iff it does not show complementiser-trace effects.
     f. A language has free subject inversion iff it does not show complementiser-trace effects.

In Rizzi’s (1982) terms, all of these properties are connected by the presence of the silent pronoun pro in the subject position. This element is licensed

by rich agreement inflection, and can satisfy the general requirement for a subject position (the Extended Projection Principle of Chomsky (1982: 10)), allowing an overt subject to appear in the “freely inverted” position (see Sheehan 2010 for more on this), and indeed to be wh-moved from this position (as argued by Rizzi (1982)). Thus the formal property which underlies the NSP, on this analysis, is the availability of pro subjects. Once acquirers deduce this (on the basis of the universal principles determining the availability of null pronouns), they will immediately deduce the existence of the other properties in the cluster, and the very strong implicational links among the properties in (27) follow. Hence we expect strong typological correlations to support parametric clusters, and thereby to motivate analyses of the general type instantiated by Rizzi’s account of the cluster associated with the NSP. One might also, at least initially, think that all languages must fall on one side or the other of a parametric divide that distinguishes cluster NSP+ from cluster NSP-, and that perhaps this is true of all clusters defined by all parameters.

2.2 Gilligan (1987): Universal Clusters

Building on Rizzi (1982), then, it was possible to take the NSP to define two disjoint language types, in which the four properties in (22) are biconditionally related to one another. Putting aside the partial null-subject language Brazilian Portuguese, the validity of this typology across Romance looks reasonable: all the Romance languages except French (and some varieties of Rhaeto-Romansch) appear to show the properties in (22).13 But of course if this is a valid linguistic typology, it should extend further. Consider, then, the Celtic languages Welsh and Irish. (Literary) Welsh and Irish are both null-subject languages (on Irish, see McCloskey & Hale (1984); on Welsh, see Roberts (2005, Chapter 2), Borsley, Tallerman & Willis (2007: 34)). In both systems, the presence of null subjects appears to be fairly closely tied to rich agreement inflection on the finite verb. However, it is very hard to evaluate the status of the correlation with the absence of complementiser-trace effects, since subjects never appear adjacent to finite complementisers in these languages owing to the fact that the basic word order in finite clauses is VSO. This highlights a major difficulty in trying to generalise the correlations in (22): in general, such tightly connected clusters can only hold among languages where a range of further conditions are held constant, and this in practice often means rather closely related languages. Even in the case of closely related languages there may be other, independent differences which disguise the effect of a parameter. Among the Germanic languages English and the Mainland Scandinavian languages show none of the properties of (22) (except for the dialects which allow that-trace extractions; see Sobin (1987), Lohndahl (2007)). But the status of Afrikaans, Dutch, and German is less clear, since their SOV order makes it difficult to determine whether free inversion occurs. As for Icelandic, it shares with Brazilian

Portuguese the property of being a partial null-subject language, which therefore does not conform to the classical NSP (see Holmberg (2010a)). Simple, surface evaluation of correlations of the kind in (22) is not possible, without making further assumptions. In this respect, such correlations differ from Greenbergian correlations such as “If a language is OV, then it has postpositions” in that more assumptions are required in order to test them. This, of course, is connected to the fact that the correlations arose from a deeper syntactic analysis of the languages in question than that underlying the simpler Greenberg-style correlations. Our intention is not to criticise either Rizzi’s or Greenberg’s approach to establishing typological correlations; both may be able to reveal previously unsuspected cross-linguistic correlations. The point is simply that correlations of the type in (22) may be very difficult to establish and maintain as more and more extra variables are brought into play. For example, what do we conclude from the fact that Welsh and Irish are null-subject languages in which the presence of complementiser-trace effects cannot be determined in finite clauses owing to the existence of VSO order? It seems reasonable to conclude that VSO order neutralises this property, and so here we do not have a true counterexample to the proposed cluster, but at the same time we do not have the strong biconditional relation of the type in (27). Such biconditional statements must be subject to a general qualification along the lines of “all other things being equal”. Another potential counterexample is posed by those dialects of Mainland Scandinavian and English which do not show the that-trace effect in finite clauses (cf. Sobin (1987), Lohndahl (2007)):

(28) Vem tror du att _ är intelligentast? (Fenno-Swedish)
     who think you that is most intelligent
     ‘Who do you think is most intelligent?’

Again, such facts indicate that the correlation between the that-trace effect and the other properties of (22) is not biconditional. Null subjects and free inversion entail the absence of that-trace effects, but not vice versa. This is not necessarily an argument against Rizzi’s theory. Note that the theory has an explanation of the correlation of absence of that-trace effect with the other properties, which is that the presence of pro in Spec,TP makes it possible for A’-movement to by-pass this position, moving directly from a lower subject position to Spec,CP. In recent work, Rizzi argues that Spec,TP is a ‘criterial position’ in his terms, which means that movement out of this position is universally impossible; cf. Rizzi & Shlonsky (2005). This creates a problem for subject extraction which every language has to deal with somehow. In null-subject languages the problem is solved quite simply by by-passing Spec,TP. English and French solve it by operating on C, which in English is morphologically realised as deletion of that, in French as substitution of qui for que, thus modifying the criterial property of Spec,

TP (see Rizzi & Shlonsky for details). Fenno-Swedish, Ozark English, etc. have to solve this problem, too. The most likely hypothesis is that these varieties do not solve it in the manner of the null-subject languages, but perhaps in a manner akin to that in other varieties of Mainland Scandinavian and English, by operating on C, but without a morphological effect on the complementiser. A further alternative may be that the EPP does not apply at Spec,TP here, allowing that position to be bypassed by “long” wh-movement. The point here is that complementiser-trace violations are not predicted not to occur in non-null-subject languages. Again, we see that the correlations are not accurately formulated as biconditionals.

Returning to the Celtic languages, it is difficult to be sure of the status of free inversion in these languages. One might claim that the position occupied by the subject in VSO order is in fact that of a freely-inverted subject, although the fact that the subject position is interpolated between an auxiliary and non-finite verb in compound tenses in VSO languages, while in cases of free inversion the subject follows both the auxiliary and the non-finite verb (see (23a)), argues against this. Nonetheless, we could tentatively conclude for Celtic that complementiser-trace effects cannot be tested owing to the effect of VSO order. In both cases, the very strict nature of the correlations is part of the problem. Again, if the correlations were stated as one-way implications, or were hedged by a statement such as “all else, particularly basic word order, being equal”, the situation would be clearer in that the status of these languages as true counterexamples would be easier to determine.

The Celtic languages, although typologically quite different in a number of superficial respects from Germanic and Romance (see Haspelmath (2001)), are nonetheless Indo-European languages and have been in contact with some Germanic and Romance languages for millennia. If we want to really establish the universal coverage of a cluster like (22) and the typology it implies, we should of course move far beyond merely comparing Germanic, Romance and Celtic. However, we can only expect that the kind of difficulty involved in interpreting the data, sketched above in relation to Celtic, will multiply, perhaps out of control. Again, this stems from the degree of syntactic analysis needed to establish the relevant properties combined with the very strict nature of the clustering we have postulated. Gilligan (1987) tested the correlations put forward by Rizzi (1982) against a 100-language sample, which he attempted to correct for areal and genetic bias. As reported in Newmeyer (2004: 202–206; 2005: 88–92) and Croft (2003: 80–84), Gilligan found just four robust cross-linguistic correlations, taking the properties discussed by Rizzi pairwise:

(29) a. Free Inversion → expletive null subjects
     b. Free Inversion → allow complementiser-trace violations
     c. Referential null subjects → expletive null subjects
     d. Allow complementiser-trace violations → expletive null subjects
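The one-way implications in (29) can be checked mechanically against a language profile. The following is a minimal sketch under stated assumptions: the names `Profile`, `implies`, `IMPLICATIONS` and `violated` are hypothetical, the Italian and English profiles simply restate the clusters C and C’ described earlier (with standard English, not the dialects allowing that-trace extraction), and the third profile is invented purely to show what a counterexample to (29b) would look like.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Profile:
    """Surface properties of a language relevant to (29); field names are illustrative."""
    name: str
    free_inversion: bool
    comp_trace_violations: bool
    referential_null_subjects: bool
    expletive_null_subjects: bool


def implies(antecedent: bool, consequent: bool) -> bool:
    """Material implication: false only when the antecedent holds and the consequent fails."""
    return (not antecedent) or consequent


# The four one-way implications reported from Gilligan's survey in (29).
IMPLICATIONS = {
    "29a": lambda p: implies(p.free_inversion, p.expletive_null_subjects),
    "29b": lambda p: implies(p.free_inversion, p.comp_trace_violations),
    "29c": lambda p: implies(p.referential_null_subjects, p.expletive_null_subjects),
    "29d": lambda p: implies(p.comp_trace_violations, p.expletive_null_subjects),
}


def violated(profile: Profile) -> List[str]:
    """Return the labels of the implications in (29) that the profile contradicts."""
    return [label for label, holds in IMPLICATIONS.items() if not holds(profile)]


# Profiles restating the clusters C (Italian) and C' (standard English) from the text.
ITALIAN = Profile("Italian", True, True, True, True)
ENGLISH = Profile("English", False, False, False, False)
# Invented profile, not a real language: free inversion without
# complementiser-trace violations would contradict (29b).
HYPOTHETICAL = Profile("hypothetical", True, False, False, True)

if __name__ == "__main__":
    for profile in (ITALIAN, ENGLISH, HYPOTHETICAL):
        print(profile.name, violated(profile))
```

Because each statement is a one-way implication, a language with none of the properties satisfies all four vacuously, which is one way of seeing why these claims are weaker than the biconditionals in (27).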

Newmeyer (2005: 90–1) concludes that “[t]hese results are not very heartening, . . . In three of the four correlations, null non-thematic subjects are entailed, but that is obviously a simple consequence of the virtual non-existence of languages that manifest overt non-thematic subjects.” In a sense, this situation is exactly what we would expect, given the above considerations: expanding the database from roughly ten to roughly one hundred languages simply multiplies the number of uncontrolled variables to a point where, without further detailed analysis of a wide range of constructions in a wide range of languages, the correlations can no longer be discerned. In a word, then, these results are inconclusive (this is not at all the conclusion Newmeyer draws, as we will see in the next section). However, Gilligan’s results are not quite as inconclusive as they first appear. Consider the only implication which does not involve null expletive pronouns: (29b). This relates to one of the most important conjectures in Rizzi’s (1982) analysis: that true complementiser-trace violations are in fact universally ruled out (due to the ‘criterial’ nature of Spec,TP, according to Rizzi & Shlonsky (2005)), and null-subject languages can circumvent them owing to the availability of a different position from which movement can take place in finite clauses introduced by C, the “freely inverted” subject position, as discussed above.14 This claim has very clear explanatory force in relation to the poverty of the stimulus: the acquirer encountering the relatively accessible phenomenon of free inversion in the PLD will thereby “acquire” the possibility of the complementiser-trace violations, an otherwise fairly inaccessible aspect of the PLD. Moreover, we can in fact combine the implications in (29) to give a modest implicational scale along the following lines:

(30) Free Inversion → (allow that-trace violations → expletive null subjects).
This scale defines three types of language: Type I has all three properties; Type II allows complementiser-trace violations and hence allows expletive null subjects, and Type III only allows expletive null subjects (here “allowing complementiser-trace violations” is intended to mean that no manipulation of C is required for long wh-movement of the lower subject to be possible, hence the varieties of English and Mainland Scandinavian discussed above are not relevant here). It is worth noting that we have weakened the original clustering claim to a series of one-way implicational statements; in other words, the claims in (30) are weaker and therefore easier to support empirically than those in (27). As we have now observed several times, this seems desirable for establishing parameter-based typologies. This seems to be due to the intricate nature of parameter interactions. In his study of the null-subject parameter in creoles, Nicolis (2005, 2008) observes that Cape Verdean, Berbice Dutch, Kriyol, Mauritian, Papiamentu and Saramaccan all have expletive null subjects and tolerate that-trace violations, but do not allow referential null subjects (Nicolis (2008:x)); while

Haitian and basilectal Jamaican have expletive null subjects but do not tolerate that-trace violations. No creole has free inversion. So we have the following picture:15

(31) Type I: Italian, Spanish, Greek, etc.
     Type II: Cape Verdean, Berbice Dutch, Kriyol, Mauritian, Papiamentu, Saramaccan
     Type III: Haitian, basilectal Jamaican

So we see that a very strict, theory-driven typology such as that which emerges from Rizzi (1982) makes very strong predictions which almost immediately become very difficult to evaluate as soon as the cross-linguistic database is extended only modestly. Unsurprisingly, it becomes impossible to evaluate once further languages are taken into consideration, as shown by Gilligan’s survey. When a very large number of genetically and typologically highly diverse languages are compared for the “same” properties, with no control as to the other typological features of these languages, the original correlations are shown not to hold in their original form, although four implicational statements could still be gleaned. To us, this does not seem like a bad or shocking result for parametric theory, but rather a fairly promising result from the admixture of a very large amount of essentially random data into an originally carefully controlled database. The fact that any coherent patterns survived is telling, and a sign that Rizzi’s observations were clearly on the right track.
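The three-way typology in (31), read off the implicational scale in (30), can be sketched as a simple classifier. This is an illustration only: the function name `classify` and the dictionary `EXAMPLES` are hypothetical, the property values for the three example languages are the ones stated in the text, and returning `None` is an assumed way of marking any profile that the scale does not assign to Types I to III.

```python
from typing import Dict, Optional, Tuple

# (free inversion, allows complementiser-trace violations, expletive null subjects)
PropertyTriple = Tuple[bool, bool, bool]

# The three types defined by the scale in (30).
TYPES: Dict[PropertyTriple, str] = {
    (True, True, True): "Type I",
    (False, True, True): "Type II",
    (False, False, True): "Type III",
}


def classify(free_inversion: bool,
             comp_trace_violations: bool,
             expletive_null_subjects: bool) -> Optional[str]:
    """Return the type from (31), or None if the profile is not one of Types I to III."""
    return TYPES.get((free_inversion, comp_trace_violations, expletive_null_subjects))


# One representative per type, with property values as reported in the text.
EXAMPLES: Dict[str, PropertyTriple] = {
    "Italian": (True, True, True),
    "Cape Verdean": (False, True, True),
    "Haitian": (False, False, True),
}

if __name__ == "__main__":
    for language, properties in EXAMPLES.items():
        print(language, "->", classify(*properties))
```

A profile such as free inversion without complementiser-trace violations falls outside the table, which corresponds to a combination that the scale in (30) excludes.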
We conclude that the claim that results such as Gilligan’s invalidate the particular parametric cluster proposed is not warranted since, first, we simply do not know enough about dozens of problematic languages in order to be sure whether they are genuinely counterexamples; second, the most intriguing, non-obvious and explanatory conclusion of Rizzi (1982) remains (namely, the implication in (29b)); third, a more modest implicational hierarchy can be constructed on the basis of Gilligan’s results, one which shows that Rizzi’s cluster has some significance for language typology. In the next section, we review the consequences of drawing the opposite conclusion from this one. 2.3 Newmeyer (2004, 2005, 2006): Negative Conclusions We have seen that the kind of clustering of properties predicted by a parameter like the NSP can be construed as making very strong typological predictions, but the exact evaluation of these predictions may in practice be rather difficult. At every step, possible counterexamples and difficult cases proliferate, and, although in our judgement Rizzi’s original proposals certainly have a valid crosslinguistic core, it is of course always possible that a given proposal is wrong. This alone would not, of course, invalidate the approach.

In earlier work (Roberts & Holmberg (2005)), aside from defending parametric theory in general against a range of specific charges made against it in Newmeyer (2004), we suggested a different example of parametric clustering. This concerns an abstract feature of Agr, or T, which accounts for a range of differences between the Insular Scandinavian languages (ISc) and the mainland ones (MSc) involving the possibility of null nonreferential subjects, non-nominative subjects, stylistic fronting, V-to-T in embedded clauses and relatively rich subject-verb agreement; see Holmberg (2010a). Newmeyer’s (2006) response to this is that the parametric cluster in question lacks sufficient cross-linguistic justification:

For the facts in (1a-e) [i.e. the correlating syntactic properties just listed, AH/IR] to support parametric theory, it would be necessary to demonstrate that in language after language the constellation of properties particular to ISc and MSc reappear. But aside from some brief remarks about Middle English, Yiddish, and Old French, two of which, like Scandinavian, are Germanic, and the third of which was heavily influenced by Germanic, they ignore the typological dimension entirely. (Newmeyer (2006: 3)).

Here it becomes clear that Newmeyer sees it as a requirement that, in order for any observed, or predicted, cluster of properties “to support parametric theory”, what he refers to as “the typological dimension” must be taken into account. As becomes clear from the following discussion of each of the surface properties we proposed to be connected to a single formal grammatical property, what he calls the “typological dimension” means bringing in as much unanalysed data from as many unrelated languages as possible.
For the reasons given in the previous section, the likelihood is that this approach will only obscure any patterns that may be observable in more controlled data samples (although, as we have observed, any partial or weakened generalisations will be all the more striking, as is the case for (29b)). But for Newmeyer only the strongest possible justification can satisfy the “typological dimension”: every language must fall on one side or the other of the proposed parametric divide (i.e. every parameter must be shown to define a cluster corresponding to the value P+, and the complementary cluster C’ = P- in terms of (18-21) above), and this must be ascertainable on the basis of a superficial survey of the relevant properties. If not, the parameter lacks empirical support. But almost no known distinction among languages will meet this desideratum for empirical support. For example, the correlation between VO vs. OV order and pre- vs. postpositional order has 141 languages defined as not falling into one of the four types, and 48 straight counterexamples, according to Dryer (2005: 386). Taking Newmeyer’s “typological” stricture seriously would presumably cast doubt on this correlation, too. More generally, it will simply result in the abandonment of any attempt to establish cross-linguistic generalisations.

Newmeyer (2004, 2005, 2006) concludes from his discussion of the NSP, and equally superficial discussion of a range of other proposed parameters, that parameter-based approaches to cross-linguistic variation “have failed to live up to their promise” (2005: 181) and that “the hopeful vision of UG as providing a small number of principles each admitting of a small number of parameter settings is simply not workable” (2005: 185). Instead, he suggests that “language-particular differences are captured by differences in language-particular rules” (2005: 183). The nature of these “rules” is never really clarified, and so the last claim is hard to evaluate. One case he discusses, though, is that of the head-complement parameter, which, according to Newmeyer (2004, 2005: 74) should be replaced by two rules: (a) Complements are to the right of the head, and (b) Complements are to the left of the head. Some languages apply (a), other languages (b). In Roberts & Holmberg (2005) we pointed out that this is equivalent to a parameter in the sense of P&P theory: UG leaves two options open, and acquisition of head-complement order is a matter of choosing between the two options on the basis of experience. Newmeyer (2006) agrees, but insists that if a set of grammatical facts can be characterized as a rule, then it should be, rather than being characterized as the value of a parameter.16 However, if rules are, at least in certain cases, empirically equivalent to parameters, we can conclude that, in his view, they will fare just as badly, by his criteria, in accounting for cross-linguistic variation. At the very best, the proposed rules are as inadequate as he proposes parameters are. At worst, they may be much worse: the only construal of “rule” in the literature on generative syntax refers to phrase-structure rules and transformational rules. 
Such rules were thought of as largely language-particular, language acquisition consisting of the selection among the infinite class of such rules compatible with the PLD on the basis of an evaluation metric. If this is what Newmeyer is advocating, it is clearly a retrograde step, and undoubtedly represents a retreat from explanatory adequacy (the same observation is made in Dryer’s (2008) review of Newmeyer (2005): cf. Dryer’s (245–246) remark that “Newmeyer argues for . . . a retreat to a version of C[homskyan]G[enerative]T[heory] from the period of Chomsky (1973) up to, but not including Chomsky (1981)”). If it is not what he is advocating, he owes us an account of his notion of rule. Even then, if he retains the view that rules are largely equivalent to parameters, there is no reason to adopt his approach over a parameter-based one, since, as he has argued, the parameter-based approach is inadequate. What must be shown is that the notion of rule, as opposed to parameter, is both empirically superior to the notion of parameter (i.e. more descriptively adequate) and can take us closer to understanding UG, i.e. to explanatory adequacy, than the notion of parameter. But as far as we can tell, Newmeyer merely gives up the pursuit of explanatory adequacy along with the notion of parameter. For this reason, we cannot accept his conclusions.

Implicit in the above discussion has been a distinction between two approaches to universals, or to cross-linguistic correlations of any kind. These are, roughly speaking, the Chomskyan approach and the Greenbergian approach (see also Rizzi (1994)). The Chomskyan approach is based on the pursuit of explanatory adequacy. Hence the central idea is that a given piece of grammatical knowledge hardly accessible through experience given normal assumptions about the nature of the PLD can be acquired “for free” given the existence of a correlating, more accessible piece of evidence. Rizzi’s discussion of the NSP meets this desideratum, and the survival of implication (29b) under Gilligan’s empirical scrutiny is highly significant in this connection. The other approach involves the observation of properties which covary on the surface, without prejudice as to any deeper correlations or poverty-of-the-stimulus considerations. The empirical scope of the generalisations unearthed in this tradition is impressive (see in particular WALS, Dryer & Haspelmath (2013)), and seems to set an important research agenda, and a series of empirical challenges, to generative approaches committed to more abstract analyses and, in particular, to achieving explanatory adequacy. As Gianollo, Guardiano & Longobardi (henceforth GGL, 2008:x) point out, both approaches to typology have flaws, particularly if they are seen as heuristics for the accumulation of information regarding which properties (co-)vary and which do not. As they point out, the Chomsky-Rizzi approach, in which poverty-of-stimulus considerations remain paramount, “provides deep and often correct grammatical insights, but, alone, it may lead little further than the study of single parameters in a few languages” (ibid). We have seen above the reasons why this may be so. The Greenbergian approach, they say, is subject to two objections. 
First, “it is practically hardly usable for the relevant purposes, because of the depth of grammatical investigation required by sound parametric analysis” (again, we have made this point in the foregoing). Second, and perhaps more challengingly, “it is anyway likely to be insufficient if investigation is not guided by strong abductive and theory-oriented considerations, even if hypothetically extended to all languages, because the cardinality of the universal parameter set is so wide as to generate a number of possible languages of which the actual existing or known ones represent an infinitesimal sample” (ibid). In other words, typological observations, however well-supported by data from the currently available set of languages, may nevertheless represent accidental correlations coexisting contingently at this historical moment, and not reflect the true nature of a UG which generates a far larger set of languages than those currently extant. This second point, in fact, undermines Newmeyer’s objection to parameters such as the one put forward by Holmberg & Roberts (2005), whose effects are most visible in Germanic languages, on the grounds that “it leaves open the possibility of historical accident (language contact, drift due to some non-parametric cause)” (Newmeyer (2006: 5)); so does the Greenbergian approach, only on a wider historical and geographical scale. GGL go on to advocate a very interesting kind of “halfway” approach,

based on their notion of Modularised Global Parametrization, which they argue is largely immune to both types of objection. Whatever the merits of GGL’s approach, we suggest that the fundamental error in Newmeyer’s critique of parameters is that he conflates the two original approaches. He therefore judges parameters proposed in the Chomskyan tradition, and which have something to offer from the point of view of explanatory adequacy, by the extensional, taxonomic, surface-oriented criteria of the Greenbergian approach. This inevitably leads to his highly negative assessment of work such as Rizzi’s on the NSP. Essentially, any parameter would have to show clustering effects which exhaustively partition all languages into two classes, with this partitioning immediately accessible to superficial scrutiny in all cases, i.e. with no masking effects due to extraneous properties or extensional overlaps in coverage. Ideally, some of the clustering properties would be relatively inaccessible in the PLD, giving the parameter explanatory depth. Of course, no proposed parameter (or Greenbergian universal) meets such stringent requirements. In the case of the NSP, for example, any and every kind of phonologically non-realised subject ought to fall under its purview, and this mistake underlies much of the criticism of the nature of the parameters and the alleged failure of the correlations. But it has been known since the earliest work that there are at least two, if not three, different kinds of null-subject systems: the Italian-style “consistent” system, those allowing only non-referential null subjects and the East Asian type of “discourse pro-drop” (more recently, “partial” systems have been added to this list, as we have seen). Newmeyer arrives at his position because his critique of the parametric approach conflates and confuses the aims and methods of two distinct traditions. 
Given this, it is no surprise that he advocates a retreat from the goal of explanatory adequacy. And it is fundamentally for this reason that we cannot accept his conclusions.

2.4 A Performance–Efficiency-Based Alternative

So, after setting the bar for descriptive adequacy for any proposed parameter impossibly high with his allusion to the “typological dimension,” Newmeyer proposes a retreat from explanatory adequacy in the form of a return to the rule-based systems of the 1960s and 1970s, thereby dissolving the link between acquisition and typology discussed in §2.1 above. Nonetheless, he does believe that a number of typological observations (mostly made in the Greenbergian tradition) reflect genuine cross-linguistic generalisations. He suggests, however, that these are amenable to “performance explanations” of the type advocated by Hawkins (1994, 2004). Hawkins’ central idea is the Performance-Grammar Correspondence Hypothesis (PGCH), which we state as follows:

(32) Grammars have conventionalized syntactic structures in proportion to their degree of preference in performance, as evidenced by patterns of

selection in corpora and ease of processing in psycholinguistic experiments. (Hawkins (2004: 3), cited in Newmeyer (2005: 119))

Preferences are the reflection of efficiency: “speakers attempt to increase efficiency by reducing structural complexity” (Newmeyer (2005: 122–123)). Hawkins proposes three efficiency principles, one of which is Minimize Domains (MiD). This efficiency principle is said to be able to account for the fact that OV and postpositional orders tend to cooccur and VO and prepositional orders tend to cooccur, but OV tends not to cooccur with PO and VO tends not to cooccur with OP. The idea is that in the consistently “head-initial” (VO and PO) and “head-final” (OV and OP) orders, the “distance” from V to P or P to V is shorter than in the inconsistent orders. The structures in (33) and (34), adapted from Newmeyer (2005: 124), illustrate:

(33) a. [VP V NP [PP P NP ]]     VO and PO (not far from V to P)
     b. [VP [PP NP P ] NP V ]    OV and OP (not far from P to V)

(34) a. [VP V NP [PP NP P ]]     VO and OP (too far from V to P)
     b. [VP [PP P NP] NP V ]     OV and PO (too far from P to V)

According to these schematised structures, in the frequently occurring orders of (33), only one NP intervenes between the two heads V and P, favouring these structures on the grounds that they are easy to process. The rarer orders in (34) are disfavoured because two NPs intervene. Newmeyer concludes that MiD and the similar efficiency principles can do the work that parameters have been proposed to do, only better. Although there are a number of obvious objections that can be raised against this rather simplified account,17 we will concentrate our discussion here on the conceptual issues. The key point, we submit, is the notion of “efficiency”. This is left implicit in Newmeyer’s account, although some general notion of “ease” of processing is clearly intended. Since we are dealing with a cognitive capacity, processing as an aspect of performance, it is reasonable to think of efficiency in terms of reduction in computational effort. If that is so, then, following Mobbs (2008), we suspect that there may indeed be a case to be made for viewing typological skewings as the consequence of general computational principles. Rather than applying at the level of performance and processing, we suspect, like Mobbs, that they may be more deeply implicated in the organisation of the language faculty and the learning device. We will return to these points in §3.5 below, and look a little more at the other efficiency principles proposed by Hawkins, in addition to MiD.
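
The intervention count behind the MiD argument can be made concrete with a small sketch. It rests on simplifying assumptions that are ours rather than Hawkins’ or Newmeyer’s: each of the four schematised VP structures is flattened into a list of terminal labels, and “distance” is taken to be simply the number of NPs standing between the heads V and P. The names ORDERS and intervening_nps are hypothetical.

```python
# Flattened versions of the four schematised VP structures discussed above.
# Assumption: bracketing is ignored and only the linear order of V, P and NP matters.
ORDERS = {
    "VO and PO": ["V", "NP", "P", "NP"],   # [VP V NP [PP P NP]]
    "OV and OP": ["NP", "P", "NP", "V"],   # [VP [PP NP P] NP V]
    "VO and OP": ["V", "NP", "NP", "P"],   # [VP V NP [PP NP P]]
    "OV and PO": ["P", "NP", "NP", "V"],   # [VP [PP P NP] NP V]
}


def intervening_nps(tokens):
    """Count the NP tokens that stand strictly between the heads V and P."""
    first, last = sorted((tokens.index("V"), tokens.index("P")))
    return tokens[first + 1:last].count("NP")


for label, tokens in ORDERS.items():
    print(f"{label}: {intervening_nps(tokens)} NP(s) between V and P")
```

On this crude measure the two harmonic orders separate V and P by one NP and the two disharmonic orders by two, which is all that the schematic argument relies on.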

2.5 Conclusion

Here we have described and illustrated the idea of parametric clustering in relation to Rizzi’s classic (1982) work on the NSP. We have observed the difficulties in straightforwardly generalising the predictions without further analysis and/or taking into account interfering factors (such as VSO order in the Celtic languages). Newmeyer’s critique of the NSP and of parameter-based approaches to cross-linguistic work completely overlooks this point and conflates the two distinct approaches to comparative syntax that have emerged in recent years. This leads to him setting the bar for even descriptive adequacy for the postulation of parameters impossibly high, with the logical but regrettable result that he retreats entirely from the goal of explanatory adequacy in comparative syntax. We briefly illustrated his preferred approach, based on Hawkins’ Performance-Grammar Correspondence Hypothesis (PGCH) and associated efficiency principles, which we return to in §3.5. By and large, though, we reject Newmeyer’s conclusions regarding the principles-and-parameters approach to comparative syntax. At the same time, we recognise that the kind of clustering effect discussed in §2.1 and 2.2 is of limited typological interest, and that extending the typological purview of parameters like the traditional NSP may be problematic (although the brief discussion in §2.2 indicates that weaker clusterings and correlations can be envisaged). What is needed, as also pointed out by Biberauer (2008b), is a more systematic basis for extending the typological domain in which theoretically inspired parameters may be more readily evaluated. We now turn to this point.

3. Parametric Theory

3.1 Problems With the “Classical” Formulation

Although we believe that aspects of Newmeyer’s critique of parameter-based comparative syntax are seriously misguided, and, for the reasons given above, we cannot accept his conclusions, it is nonetheless true that there are problems with the original conception of parameters. In our view, this warrants a refinement of the idea, rather than its abandonment (see also Biberauer (2008b)). One valid criticism is raised by Newmeyer (2005: 83). A consequence of the large amount of comparative work that has been done since the 1980s is that there has been a proliferation of parameters as descriptive devices. This has been particularly apparent in the “microparametric” work on closely related languages and dialects, typified by the papers collected in Kayne (2000, 2005). Given this proliferation of parameters, a natural question which arises is, quite simply, how many parameters there are. It is very likely that the number of parameters is in the hundreds, and at least possible that it is in the thousands. For example, Roberts (2007) discusses five well-known

parameters (null subjects, V-to-T, T-to-C, negative concord, wh-movement). The “head parameter” is arguably non-unitary, and breaks up into several sub-parameters, probably at least ten. There is also a parameter governing subject-raising to SpecTP (whose negative value, along with a positive value for V-to-T raising, gives rise to VSO orders), at least four parameters governing auxiliary selection (and this does not take into account the impressive microvariation found in Central-Southern Italo-Romance), several parameters governing ergativity (to account for varieties of split ergativity), presumably pertaining to the feature content of v, a parameter concerning the ability of C to Agree for Case with the subject of the TP it introduces (related to “Exceptional Case-marking”), a parameter determining the availability of the option of preposition-stranding and a parameter concerning whether wh-expressions are DPs. This brings the total to 24. Adding Polysynthesis, Subject Side and Serial Verbs from Baker (2001), the total comes to 27. Baker (2008b) proposes two macroparameters governing agreement (Direction of Agreement and Case-Dependency of Agreement). Furthermore, the null-subject parameter may break up into at least three parameters (see Holmberg 2010a). Finally, Longobardi & Guardiano (2009) propose 51 parameters which affect DP-internal syntax only. This brings the number of parameters up to 80, and there is little doubt that many more than this would be needed just to reach descriptive adequacy. Newmeyer concludes that “we are not yet at the point of being able to ‘prove’ that the child is not equipped with 7,846 . . . parameters, each of whose settings is fixed by some relevant triggering experience. I would put my money, however, on the fact that evolution has not endowed human beings in such an exuberant fashion” (2005: 83). Although, as Newmeyer implicitly admits, this is only a plausibility argument, we agree with him. 
It seems highly implausible that UG should specify detailed microparameters governing the nature of clitic systems or agreement systems (or classifier systems or tone systems) when so many languages lack such systems entirely. Clearly, what is needed is some structure to parameter systems, at the very least along the lines of specifying “if L has a clitic/agreement/tone/classifier system, then what particular kind of system does L have?”, where the consequent may break down into a further series of implicational choices. A related point arises from the discussion above of the correlations connected to the NSP in (22) and (26). All other things being equal, this type of clustering, deriving from a parameter closely linked to a UG principle (such as, in this case, licensing of a particular empty category), does yield a series of biconditional relations determining a range of properties that are predicted to be either all present or all absent in any language. We saw above, very briefly, the kinds of analytic difficulty that approach can entail. What is preferable, in fact, is weaker, one-way implications of the kind in (29), or the kind typically put forward in the Greenbergian tradition. But these, too, require a rather different approach to the nature of parameters than the classical one illustrated by Rizzi (1982).
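
The kind of implicational structuring of parameter systems suggested above can be sketched as a simple dependency table. Everything in the sketch is a hypothetical illustration rather than a proposal about UG: the parameter labels (has_clitic_system, clitic_doubling and so on), the table HIERARCHY and the function questions_to_settle are invented here, and the only assumption carried over from the text is that a dependent choice becomes relevant only when the property it depends on is present in the language.

```python
# Hypothetical dependency table: a parameter's dependents are consulted
# only if the parameter itself is set positively. Labels are illustrative only.
HIERARCHY = {
    "has_clitic_system": ["clitic_placement", "clitic_doubling"],
    "clitic_doubling": ["doubling_limited_to_datives"],
    "has_tone_system": ["contour_tones"],
}
ROOTS = ["has_clitic_system", "has_tone_system"]


def questions_to_settle(settings):
    """Return, in order, the parameters that must be set for a language.

    `settings` maps parameter labels to True/False; missing labels count as False.
    Dependent parameters are reached only through a positively set parent.
    """
    pending = list(ROOTS)
    asked = []
    while pending:
        name = pending.pop(0)
        asked.append(name)
        if settings.get(name, False):
            pending.extend(HIERARCHY.get(name, []))
    return asked


# A language with neither system only ever faces the two top-level choices.
print(questions_to_settle({}))
# A language with clitics and clitic doubling faces the further, dependent choices.
print(questions_to_settle({"has_clitic_system": True, "clitic_doubling": True}))
```

On such a layout, a language lacking a given system never has to fix the detailed choices below it, which is the intuition behind the one-way implications.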

Arguably the real issue underlying both of these points is the familiar tension between descriptive and explanatory adequacy. Parameters have in recent years to an extent shared the fate of 1960s-style transformations. They are very powerful formal devices that make possible, for the first time ever, the precise, theory-internal description of cross-linguistic relations (and, correspondingly, descriptions of what children must be able to acquire). However, if over-exploited, and especially in the absence of any general restrictions on their form and functioning, these devices become mere facilitators of taxonomies. Newmeyer quite correctly observes that the very large number of parameters that we seem to need, just to approach cross-linguistic descriptive adequacy, reduces their efficacy as explanatory devices. What is required, as took place with the theory of transformations from the late 1960s onwards, is a theory of parameters which places substantive restrictions on their form and function while maintaining their descriptive power.18

3.2 The Minimalist View

In the context of the minimalist programme for linguistic theory, as it has been pursued in relation to syntax for almost two decades, a slightly different view of parameters has been widely accepted. This can be thought of, following Baker (2008a:3, 2008b:155f.), as the “Borer-Chomsky conjecture,” or BCC:

(35) All parameters of variation are attributable to differences in the features of particular items (e.g. the functional heads) in the Lexicon.

More precisely, we can restrict parameters of variation to a particular class of features, namely formal features in the sense of Chomsky (1995) (Case, φ and categorial features) or, perhaps still more strongly, to attraction/repulsion features (EPP features, Edge Features, etc.). 
This view has a number of advantages, especially as compared with the earlier view that parameters were points of variation associated with UG principles, and arguably takes us a little way towards resolving the tension between descriptive and explanatory adequacy at the parametric level as described in the previous section. Let us see why this is so, particularly in relation to the latter point. First, the BCC imposes a strong limit on what can vary. In most versions of syntactic theory informed by minimalist ideas there are a small number of extremely general principles: Merge, Agree, Select, etc. It is highly unlikely that any of these are subject to parametric variation (pace Baker (2008a)). For example, it would be strange to propose a parameter determining whether or not Merge has to be binary, or whether or not internal Merge (i.e. Move) is possible. (Kayne (2009:1) suggests that internal merge may be a defining feature of human language, i.e. the “narrow language faculty” in the sense of Hauser, Chomsky & Fitch (2002)). Similarly, although

languages may vary a fair amount as to which features may be subject to Agree relations, the nature of the operation itself (the locality restrictions operative, the nature of Match, the definition of an active Probe, etc.) seems unlikely to vary. Of course, we have no proof that these kinds of variation do not exist, but they seem unlikely, have not, to our knowledge, been postulated, and would be excluded by the BCC. Second, as has often been pointed out (initially by Borer (1984: 29)), an advantage of the BCC is that associating parameter values with lexical entries reduces them to the one part of a language which clearly must be learned anyway. Ultimately, on this view, parametric variation reduces to the fact that different languages have different lexica, in that sound-meaning pairs vary arbitrarily: the most fundamental and inescapable dimension of cross-linguistic variation. The child acquires the values of the parameters valid for its native language as it acquires the vocabulary (more precisely, as it acquires the formal features associated with the functional categories of its native language). A third advantage of the BCC is that it imposes a restriction on the form of parameters. The attraction property of a functional head, which may be responsible for fairly major aspects of word-order variation, can be formulated in terms of a simple diacritic associated with a given functional head, or a given feature of a functional head. For example, the matrix C of a V2 language may be associated with a movement-triggering diacritic, causing an XP to move into its specifier position; in a “residual V2” language like English or French, the diacritic would be associated with a [wh]-feature of C, giving rise to movement of a wh-XP into SpecCP exactly where C has the [wh]-feature. In a language like Chinese, where there is no movement to SpecCP at all, there would be no such feature associated with C. 
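
The three-way contrast just described can be pictured as a difference in where, if anywhere, the movement-triggering diacritic sits in the lexical entry for matrix C. The sketch below is a toy encoding under our own assumptions: the table C_ENTRIES, the key trigger_on and the function moves_to_spec_cp are hypothetical names, and the three language types are reduced to the single property discussed in the text.

```python
# Toy lexical entries for matrix C. "trigger_on" records where the
# movement-triggering diacritic is attached (an illustrative encoding only):
#   "C"  -> on the head itself (V2 type: some XP always moves to SpecCP)
#   "wh" -> on the [wh]-feature of C (residual V2 type, e.g. English or French)
#   None -> no diacritic at all (no movement to SpecCP, e.g. Chinese)
C_ENTRIES = {
    "V2": {"trigger_on": "C"},
    "residual_V2": {"trigger_on": "wh"},
    "no_movement": {"trigger_on": None},
}


def moves_to_spec_cp(language_type, c_has_wh):
    """Say whether an XP moves to SpecCP, given the type of C and whether C bears [wh]."""
    trigger = C_ENTRIES[language_type]["trigger_on"]
    if trigger == "C":
        return True
    if trigger == "wh":
        return c_has_wh
    return False


for language_type in C_ENTRIES:
    for c_has_wh in (False, True):
        print(language_type, "[wh]" if c_has_wh else "no [wh]",
              moves_to_spec_cp(language_type, c_has_wh))
```

The point of the encoding is only that the cross-linguistic difference is stated once, in the lexical entry, while the operation that reads it stays constant.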
We will give a more general formulation of possible parametric variation, consistent with the BCC, in §3.4 below. Clearly, if parametric variation can be restricted to formulations of this type, this would be a step in the direction of explanation. This simplicity of formulation in turn makes possible a statement of parametric variation at the UG level which relies on the logic of underspecification. It may, for example, be possible to state that parameter P has value vi, perhaps a movement-triggering/attraction property, when this is stated as such, and vj, the absence of this property, elsewhere. This in turn raises the prospect of applying disjunctive ordering to parameter setting, and thereby the setting up of markedness relations. For the moment, we merely note that the simplicity of the formulation of parameters, given the BCC and the typical minimalist conception of parameter setting, allows this. In subsequent sections we will explore the possible implications of this idea further. A further point, also made by Roberts (2001) and Roberts & Roussou (2003), is that the BCC, combined with a simple restriction of parameters to formal features of functional heads, allows us in principle to calculate the upper bound to the set of grammars. Suppose we have two potential

parameter values per formal feature (i.e., each feature offers a binary parametric choice), then for $n = |F|$, the cardinality of the set of formal features, the cardinality of the set of parameters is $|P| = 2n$ and the cardinality of the set of grammatical systems is $|G| = 2^{2n}$. For (rather implausible) illustration, assume 15 formal features: then $n = 15$, $|P| = 30$ and $|G| = 2^{30}$, or 1,073,741,824. One further parametrisable formal feature raises $|G|$ to $2^{32}$, and so on. Since it is likely that UG makes available more than 15 formal parametrised features, the upper bound to the number of possible grammatical systems is likely to be considerably greater than just over a billion, as in the example just given. Of course these are upper bounds; what is not taken account of is interaction among parameters so as to rule out possible combinations of values; as GGL (2008:x) show, such interactions are likely to be pervasive. The lower bound is therefore very likely to be a great deal lower than the kinds of numbers resulting from the above calculation; at this stage it is hard to determine a general characterisation for this (for a general speculation, derived from Kauffman’s (1995) work on dynamical systems, see Roberts (2001: 91–93)). Although not in itself a step towards greater explanatory adequacy, this feature of the BCC has the heuristic value of allowing, in principle, an exact quantification of the cost of the postulation of a new parametrised formal feature. What has the above to do with the NSP? Rizzi (1982: 143) identified the core property allowing null subjects as a “pronominal Agr” in Infl in terms of the system he was assuming. In more contemporary terms, this can easily be restated as the presence of a D-feature on T (see §1.3, (19)), or, following Barbosa (1995) and Alexiadou & Anagnostopoulou (1998), on the verbal inflection itself. 
These two alternatives correspond to the two basic hypotheses for accounting for (non-discourse-driven) null subjects in Holmberg (2005: 536–537) and discussed in Roberts (2010a). If the requirement for a subject in SpecTP (Chomsky’s (1982: 10) Extended Projection Principle) can be reduced to the realisation of a D-feature on T or its Specifier, then there is no need for the subject to raise to SpecTP in a language with D in T, giving rise to the possibility of free subject inversion and the associated possibility of extraction from the inverted position, evading the ban on extraction from SpecTP when C is finite and overtly realised (the complementiser-trace effect). Finally, if the D-feature is associated with rich agreement (see Roberts 2010a for a proposal for this), then we arrive at an account which ties together all four clustering properties associated with the NSP. Hence it is very easy to formulate the NSP in terms of a formal feature of a functional head.19 Finally, although the idea of reducing parameters to formal features of functional heads has largely been associated with “microparametric” approaches, it is not necessarily restricted to that case. In fact, we have just sketched a way to replicate the clustering predictions of Rizzi’s original NSP in these terms. Such clustering is usually seen as the hallmark of a “macroparameter.” Baker (2008a:4) points out that it “is perfectly possible

that a lexical parameter consistent with [the BCC, AH/IGR] could have a substantial impact on the language generated, particularly if it is concerned with some very prominent item (such as the finite Tense node)”. This is, in fact, exactly what we have just seen in relation to the NSP. In this section we have described the BCC, the idea that parametric variation is associated with formal features of functional heads, and we have indicated what we see as some of its advantages. We have pointed out that, by postulating a restriction on what can be parametrised, on the form of parameters, and, possibly, interactions among parameters determined by disjunctive ordering (and therefore markedness relations), the BCC can take us some way to resolving the tension between descriptive and explanatory adequacy we observed in §3.1. However, if we take seriously the idea that parameters are lexically determined, even as properties of functional heads, we run the risk of seeing them as “less universal” in various ways. For example, could there be a language-particular parameter? This presumably depends on whether we are willing to countenance language-particular features of functional heads. A second question is whether a single language, i.e. lexicon, can tolerate contrary specifications of the same parameter on distinct lexical items. Third, can a parameter simply fail to have a value in a given language, by having no lexical item which realises this feature? (Thanks to David Willis for raising these issues). We will return to this last question in §3.4 below; GGL, for example, allow for this possibility. But the most important objection to a purely microparametric approach remains the question of the highly specific innate endowment in formal features that the BCC appears to demand; this is clearly a stumbling block to true explanation. We will look at this question more closely in the next two sections. 

3.3 Micro- and Macroparameters

In introducing the BCC in the previous section, we have already introduced one way of thinking about the distinction between microparameters and macroparameters. Although it does not require it, as we saw, the BCC favours a microparametric approach. According to this view, cross-linguistic variation consists of variant features of (a subclass of) lexical items which determine a small range of variation, and larger-scale differences among languages represent the accumulation of numerous microvariants of this kind. On the other hand, macroparameters such as the Polysynthesis Parameter of Baker (1996) and, possibly, the Head Parameter determine in one fell swoop a huge range of possibly variant properties. As Baker (2008a:5) puts it “there are at most a few simple (not composite) parameters that define typologically distinct sorts of languages.” Baker gives interesting arguments for the existence of macroparameters alongside microparameters. In addition to two arguments based on the cross-linguistic distribution of different kinds of agreement marking (these are developed at greater length in Baker (2008b), but we will not go into

Introduction

505

them here), he gives an interesting statistical argument. Essentially, his point is that if all variation were microvariation, we would not expect to find coarse-grained types of the “head-initial”, “head-final” kind. If each category were able to vary freely, independently of all others, for its linear order in relation to its complement (this can of course be phrased in terms of triggering movement of its complement or not), then we would expect there to be a normal distribution of word-order variants across languages. As he says (Baker (2008a:10)), “there should be many mixed languages of different kinds, and relatively few pure languages of one kind or the other”. On the other hand, if there were only macroparameters, we predict, falsely, the kind of situation described above in relation to the NSP: every category in every language should pattern in one way or the other. But if we admit both macroparameters and microparameters, we expect to find a bimodal distribution: languages should tend to cluster around one type or another, with a certain amount of noise and a few outliers from either one of the principal patterns. This, Baker points out (again drawing on the statistics for OV/OP vs. VO/PO order in Dryer (2005: 386)), is essentially what we find. He suggests (pp. 11–12, citing his earlier 1996 work) that the same is true regarding polysynthesis. We find Baker’s argument fairly convincing. One might add a similar diachronic argument. A canonical example of microparametric variation comes from the Italo-Romance dialects. Although the variation among these varieties is highly impressive (as the 2,500 pages of Manzini & Savoia (2005) amply attest), a large number of features remain constant: all Italian dialects are SVO, all are prepositional, none show a systematic ergative case/agreement pattern (although some “split-ergativity” is attested), none is fully polysynthetic, none shows the Chinese value of Chierchia’s (1998) Nominal Mapping Parameter (i.e. 
in allowing a singular count noun to stand alone as an argument, giving I saw cat), all have definite and indefinite articles, all have moderately rich agreement systems, all (except a small number of Rhaeto-Romansch varieties; see Benincà & Poletto (2005)) have complement clitics, none has a full morphological case system, etc. On the other hand, the microparametric variation involving the existence and behaviour of subject clitics, the expression of negation, the position of both finite and non-finite verbs in relation to subject and complement clitics and various classes of adverbs, the nature of object- and subject-agreement on past participles in compound tenses, the nature and choice of aspectual auxiliaries, the expression of various forms of finite and non-finite complementation, and a range of other properties, is extremely intricate. These are exactly the conditions which favour productive microparametric work, as Kayne has convincingly argued (see in particular Kayne (2005)). But, one could ask, why are certain properties variable in Italo-Romance and others not? The microparametric answer, as it were, is that no theoretical significance should be attached to what varies and what does not in this particular synchronic geographically defined domain; this is attributable to a historical accident, in that the common features are due to a shared inheritance. But if we consider Latin, we find OV order, a full morphological case system, the complete absence of pronominal clitics, no (active) compound tenses, and a system of complementation in which finite clausal subordination was a minority pattern. As has often been observed, the Modern Romance languages (or the Italo-Romance subgroup) are more similar to one another than any of them are to their common ancestor Latin. The microparametric explanation for this observation would presumably appeal to the accumulation of microparametric changes in the common ancestor language before it broke up into the dialects, i.e. in Late or Vulgar Latin. The question here, though, is to what extent Vulgar Latin can be reasonably regarded as a single system; the term is generally used as a cover term for the varieties of non-literary Latin spoken in Italy and elsewhere in the Roman Empire, whose written records are somewhat uniform but have been argued to form a koiné (Palmer (1961: 223)). In this connection, Clackson (2004: 790) says: “the construction of a uniform ‘Vulgar Latin’ probably oversimplifies a very complex linguistic situation. Different communities of speakers used different varieties”. If there ever was a single “Proto(-Italo)-Romance” variety, it would probably have to be dated rather early, as Hall (1950) suggests on phonological grounds (proposing 250–200 BC, exactly the period in which Roman rule was extended to the whole Italian peninsula). Although the Latin of this period is known to differ somewhat from Classical Latin, and to have certain “Vulgar” features, it is highly unlikely that it had the syntactic characteristics of Romance rather than Classical Latin (OV rather than VO order, etc.).
It seems then that either the current microparametric variation derives historically from an archaic, typologically distinct, single ancestor variety of Latin, or there is no ancestor variety common to all the dialects. Either way, the major typological differences between Latin and (Italo)-Romance cannot be traced to a single microparametric change or series of microparametric changes in a single variety; there must have been typological drift across the varieties of Vulgar Latin. This poses a problem for a purely microparametric approach: other things being equal, we might have expected some dialects to have retained a case system, or OV order, or synthetic passive forms and not to have developed clitics, etc., others to have developed in the way we observe, and still others to have developed in a mixed fashion, preserving certain archaic features and innovating others. But what we observe, instead, is typological drift: from OV to VO, and in the general direction of greater analyticity (as elsewhere in Indo-European), allowing for a considerable amount of truly microparametric variation of the kind that we observe to develop. The simplest account of this kind of parallel development involves distinguishing macroparametric from microparametric change: certain macroparameters (OV vs. VO, for example) changed in the transition from Latin to Romance, while much of the synchronically observable variation among the Romance languages, and certainly among Italian dialects, involves microparameters.20 So let us conclude, with Baker, that macroparameters exist alongside microparameters. Then two related questions arise: (i) what are their properties? (ii) how are they distinguished from microparameters? Two rather unsatisfactory and partial answers to these questions are that macroparameters ought to be rather few in number, and they ought to be extremely pervasive in their influence on the grammatical system. The first point holds because, as Baker (2008a:7) says, “[i]f there were many macroparameters and they interacted with one another in complex ways, then languages could differ crazily in ways that would be hard to pull apart.” But this does not seem to be the case in practice: universal properties and microparameters account for much that is shared and much that varies. The second point holds because macroparameters, perhaps by definition, can affect large-scale aspects of the grammar such as all headed phrases, or all instances of Agree. In the preceding section we argued that the BCC, which Baker quite reasonably takes to underlie microparametric variation, has a number of desirable consequences, and may even take us some way towards constraining the form of parameters in the way that is required in order to resolve the tension between explanatory and descriptive adequacy which has arguably arisen in this domain. But here we have suggested that Baker is right in suggesting that a small number of macroparameters may also exist. Baker (2008a:3) explicitly proposes that macroparameters are to be formulated in a manner incompatible with the BCC. So we appear to be in a quandary. We tentatively suggest a way out of this quandary which, we believe, points the way to truly resolving the tension between explanatory and descriptive adequacy in the parametric domain.
This involves retaining a formally “microparametric” view of macroparameters, i.e. seeing macroparameters as aggregates of microparametric settings, but proposing that these aggregate settings are favoured by markedness considerations. This proposal was made in Roberts (2007a:274) for the Head Parameter (and is suggested as an “intermediate” approach to the question of macro- vs. microparametric variation by Baker (2008a, Note 2)). It has often been noted that the Head Parameter is rather problematic. If it were a single (macro)parameter, determining the order of head and complement across all categories once and for all, it would predict a spectacular clustering of properties, which is not actually attested in the majority of languages. If it is broken down into a series of related microparameters relating to each head-complement pair then, without some further statement, all predictions regarding word-order correlations are lost. The preference for “harmonic” ordering seems to derive from an overriding tendency for independent parameters to conspire to produce a certain type of grammar. To capture this, Roberts (2007a:194) suggested that a restatement of Hawkins’
(1983) generalisation regarding cross-categorial harmony is needed, along the following lines:

(36) There is a preference for the EPP feature of a functional head F to generalize to other functional heads G, H . . .

We can think of (36) as an approximation to a markedness convention of the type proposed for phonology by Chomsky & Halle (1968, Chapter 9). To take a specific example, suppose, following Kayne (1994), that VO is the universal underlying order and, following Biberauer (2003), that OV orders derive from the combination of V-to-v raising and remnant VP-fronting to SpecvP, as illustrated in (37):

(37) [vP [VP O (V)] v+V (VP)]

If movement represents a marked option, as suggested by Roberts & Roussou (2003), then v is set to a marked parameter value here. Following Chomsky & Halle’s notation, let’s write this as the mEPP value for v.21 In rigidly head-final languages like Malayalam or Japanese, many, perhaps all, functional heads will have at least one EPP-feature of this kind. Such systems will therefore emerge as very marked indeed, in terms of what we have said so far, and yet they are more common than “mixed” types like Latin, German, etc., which would be less marked on this approach. It is here that markedness conventions and the concept of the markedness of a whole system, or subsystem, of parameters come in. Let us postulate, for concreteness, the following convention:

(38) For a class of heads H, uEPP for H[F:__] ≠ v → [+EPP] / v[+EPP]; [-EPP] elsewhere

If acquirers assign a marked value to H, they will assign the same value to all comparable heads. What (38) says is that the unmarked value of the EPP feature for some head of a particular type with an unvalued feature (i.e. a Probe, capable in principle of triggering movement) is [+EPP], i.e. the presence of an EPP feature, just where v has an EPP feature, i.e. in an OV system.
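Read procedurally, the convention in (38) behaves like a default rule keyed to the value of v. The following Python sketch is purely illustrative. The function names, the encoding of a grammar as a dictionary from head labels to EPP values, and the idea of scoring a system by counting heads that depart from the default are all assumptions made here for concreteness; none of them is part of the proposal itself.

```python
def unmarked_epp(v_has_epp: bool) -> bool:
    """Default (unmarked) EPP value for a probe H other than v, per (38)."""
    # (38): [+EPP] in the context of v[+EPP]; [-EPP] elsewhere.
    return v_has_epp


def markedness(system: dict[str, bool]) -> int:
    """Count the heads whose EPP value departs from the default set by v.

    `system` maps head labels to True ([+EPP]) or False ([-EPP]) and must
    contain the key "v". This scoring is a hypothetical metric, not the
    authors' own.
    """
    default = unmarked_epp(system["v"])
    return sum(
        1 for head, epp in system.items() if head != "v" and epp != default
    )


# Toy systems; the head inventories are illustrative only.
head_final = {"v": True, "T": True, "C": True, "P": True}
head_initial = {"v": False, "T": False, "C": False, "P": False}
mixed = {"v": True, "T": True, "C": False, "P": False}

print(markedness(head_final), markedness(head_initial), markedness(mixed))
# prints: 0 0 2
```

On this toy metric the two harmonic systems score 0 and the mixed system scores 2, one point for each head that departs from the value of v.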
This has the effect that, for all head-complement pairs, head-final is the unmarked order in an OV system, and head-initial in a VO system. In these terms, rigidly head-final languages are relatively unmarked, as of course are rigidly head-initial languages, while “mixed” languages are relatively marked (and one can in principle quantify exactly how marked different types of mixed systems would be). Furthermore, Dryer’s (1992) observation that VO vs. OV order is the basic determinant of ordering among other head-complement pairs is directly captured. One might speculate that v is the crucial category determining the markedness of the system with respect to word order because it is the head of the phase which determines argument structure and therefore the category whose features are most important for determining the positioning and licensing of arguments. We can understand a markedness convention like (38) in terms of conservatism of the learner, assuming that the learner exploits pieces of input, perhaps marked input, to the full. So we could entertain something like the following:

(39) Generalisation of the input: If acquirers assign a marked value to H, they will assign the same value to all comparable heads.

(38) can naturally be understood in terms of (39). Moreover, both (38) and a simple feature-counting simplicity metric like that put forward by Roberts & Roussou (2003: 210 [see this volume, Chapter 6]) can be seen as different aspects of the overall conservatism of the learner, which essentially tries to set parameters in the most efficient way possible. The Subset Principle can also be seen in this light: one aspect of the learner’s conservatism is to avoid superset traps (see Berwick (1985), Clark & Roberts (1993 [this volume, Chapter 2]), Biberauer & Roberts (2009)). We will return to the question of why something like (39) should hold in §3.5. For the moment we may observe that macroparametric effects arise from aggregations of microparameters acting in concert for markedness reasons. One could perhaps extend this to the Polysynthesis Parameter. According to Baker (1996: 14, 17, 496), the central property of polysynthetic languages is that all argumental DPs must correspond to a morphological expression in the head that θ-marks them. The reflexes of this very general condition (which Baker calls the Morphological Visibility Condition) are syntactic noun-incorporation, rich object- and subject-agreement marking, “free pro-drop” of all arguments, free word order and a range of other major morphosyntactic properties (see table 11.1, Baker (1996: 498–9)). Kayne (2005: 7) observes that there is a similarity between these features of polysynthetic languages such as Mohawk and what is found in clitic doubling/dislocation constructions in Romance.
He speculates that “it could alternatively be the case that the systematic obligatoriness of pronominal agreement morphemes in Mohawk is just an extreme example of what is found to a lesser extent in (some) Romance” (ibid). Suppose, then, that Mohawk and other polysynthetic languages have generalised clitic left-dislocation (CLLD), while Romance languages have partial CLLD and languages such as English lack it altogether. There is a link to the NSP here, in that rich subject-agreement on the verb may be thought of as comparable to a clitic and the subject then seen as clitic left-dislocated (this is one variant of Hypothesis A in §1.3, (17), pursued by Barbosa (1995, 2009) and Alexiadou & Anagnostopoulou (1998)). Clitics can be thought of as the overt realisation of both the probe’s and the goal’s matching φ-features on the probe (see Roberts 2010a,c); this prevents the goal from being realised in its grammatical-function position (this applies to non-clitic doubling; the difference between clitic doubling and non-clitic doubling may have to do with the interpretability of the features, but we will leave that aside here).
We can thus envisage a markedness statement based on (38) which would specify that realising all features on all possible probes is a more marked option than never realising them, but less marked than realising them sporadically. Again, the Polysynthesis Parameter may reduce to an aggregation of microparameters concerning clitics/agreement, governed by a markedness constraint. Are there other macroparameters? One intriguing recent suggestion is due to Huang (2015). He observes that a range of properties, up to 20 or more, appear to cluster together in Chinese, as compared to English and other familiar European languages, or to Japanese, for example. Space prevents full discussion and illustration of these, but among them are the following:

(40) a. Generalized bare N (denoting kinds).
     b. A generalized classifier system.
     c. No plural morphology.

(41) a. Action verbs are atelic.
     b. No simplex accomplishment verbs.
     c. Resultative compounds or phrases.
     d. Periphrastic causatives.
     e. Extensive use of light verbs.

(42) a. No agreement, tense, case morphology.
     b. No wh-movement.
     c. Word order: “V2” counting backwards (no Kaynean VP-movement over adjuncts).22
     d. No V-to-T movement.

(43) Radical Pro drop.

Huang proposes that these properties cluster together in one language-type to the exclusion of another because they are manifestations of the same generalization. He proposes the macro-parameter High Analyticity, which states that Chinese lexical items are highly analytic at three levels: the level of lexical categories, functional categories and argument structure. In particular, in a detailed study of the diachronic development of Chinese, Huang shows that the following properties have been lost:

(44) a. wh-movement
     b. VP movement
     c. NP movement
     d. EPP movement
     e. Plural morphology
     f. Denominal suffix
     g. Causativizing suffix
These losses have taken place in conjunction with the development of numerous particles, particularly for marking tense, mood, aspect and illocutionary force. It seems tempting, then, to try to maintain that “analyticisation” is loss of movement, associated to some degree with loss of morphology. In particular, it seems that Chinese lacks movement at the lowest structural level, inside the lexical phase. It is plausible to think of the properties in (40) as being related to the lack of N-to-n movement (assuming classifiers to be ns and plural marking to be determined or fed by N-to-n movement). Similarly, at least (41a–c) could be connected to the lack of V-to-v movement, assuming that a complex event structure such as that involved in accomplishments requires some form of incorporation in the vP phase (see Ramchand (2008)). The properties in (42) are more familiar, but perhaps depend on the prior formation of verbs by V-to-v movement (in the case of V-to-T movement) and the prior existence of lexical elements in the n/v position that can be probed by uninterpretable φ-features merged outside the first phase, in the case of (42a) for example. Furthermore, there appear to be no EPP-type movement triggers at the higher phase level (see (42b, c)), although, since Chinese is topic-prominent, Edge Features must be present on C. We will return to the relation of radical pro-drop to the absence of φ-features below; see also Saito (2007), Roberts (2010a). Putting all of this together with the very well-known fact that in Chinese it is impossible to distinguish nominal and verbal roots by their morphological shape, and that very many basic roots are entirely ambiguous between nominal and verbal interpretation, it is tempting to suggest that head-movement, in particular L-to-l movement (i.e. movement of the lexical root to the local phase head) is systematically absent in Chinese.
More generally, phase heads appear to lack Agree-related movement triggers (EPP features). In the non-lexical phases (CP and DP), this has the consequence that probes are largely absent, as their putative goals are too deeply buried in the lexical phase to be accessible (assuming the version of the Phase Impenetrability Condition in Chomsky (2000), rather than the less restrictive version in Chomsky (2001)). So we might, very tentatively, conclude that Huang’s High Analyticity macroparameter results from an aggregate of head-movement parameters acting together, in this case in not triggering movement. In more familiar languages, we have the inherently more marked situation where heads sporadically trigger movement (although fairly systematically at first-phase level; but see Biberauer & Roberts (2010) for the proposal that (Modern) English lacks V-to-v movement). A final, very speculative suggestion comes to mind at this point. If there are macroparameters determining polysynthesis and high analyticity, are there parameters determining other morphological types? In fact, Julien (2002, Chapter 3), following a proposal in Kayne (1994), proposed that very many OV agglutinating languages typically showed the following structure, for all (or most) X: (45) [XP YP [X affix] (YP)]

This is in fact the variant of the general OV parameter, generalised by (38), where the host head contains a bound morpheme. So (38) may fall into two subcases, depending on precisely this, the former giving OV order, the latter agglutinating morphology and OV order. It has been observed many times that rigidly OV languages tend to be agglutinating. Finally, we are led to think that perhaps the very first kind of typology ever proposed, the morphological typology put forward by Schleicher (1862) and Sapir (1921) among others,23 was on the right track after all, but it was really a pre-theoretical observation about syntactic macroparameters. The values of the macroparameters are so salient that one could not fail to notice their effects in the data, but, in the absence of a theory of syntax, it was not possible to discern their true nature. And so the generalisations were mistakenly thought to be morphological.24 Current parametric theory, especially given the distinction between macroparameters and microparameters as construed here, enables us to tentatively begin to do this. Here we have argued first that Baker (2008a) is correct in distinguishing macroparameters from microparameters, but that it is inadvisable to abandon the BCC. Instead, we have suggested that macroparameters are the result of aggregates of microparameters acting in concert, guided by the acquisition-based markedness constraint in (38). In the next section, we will relate all this to the NSP (and to the somewhat incidental question of whether this is a macro- or microparameter), to GGL’s proposal for “parameter schemata” and to a proposal for learning paths, in the sense of Dresher (1990), which relate macro- and microparameters.

3.4 Epigenetic Parameter-Setting

A further critical question concerning the general nature of parameters that Newmeyer (2005: 44) very correctly raises is “whether all parameters are applicable to all languages.” In the principles-and-parameters literature, the answer to this question has generally been positive, although it does raise questions concerning the status of a complex range of microparameters related to clitics, or agreement, or classifiers, in languages where these properties are lacking, as discussed in §2.1. GGL (2008) explicitly propose that the answer to this question should be negative. They propose that, instead of innate parameters, UG makes available a small set of parameter schemata, which, in conjunction with the PLD, create the parameters that determine the non-universal aspects of the grammatical system. In this way, parameters are created through interaction with the PLD in a fashion reminiscent of the Piagetian concept of epigenesis.25 They propose the following form for their schemata (pp. 7–8):

(46) a. Grammaticalisation: is F, a functional feature, grammaticalised?
     b. Checking: is F, a grammaticalised feature, checked by X, X a category?
     c. Spread: is F, a grammaticalised feature, spread on Y, Y a category?
     d. Strength: is F, a grammaticalised feature checked by X, strong? (i.e. does it overtly attract X?)

They illustrate the functioning of such schemata in detail, in relation to the feature definiteness, and its effects on the internal syntax of DPs in some of the 24 languages for which they have obtained data regarding 46 parameters. In a similar vein, Roberts & Roussou (2003: 213) propose the following set of options relating to a given formal feature F on the basis of their extensive analysis of grammaticalisation as a diachronic operation affecting the realisation of functional categories:

(47) a. is F realised by (external) Merge (i.e. does it correspond to an overt grammatical formative?)
     b. does F enter an Agree relation?
     c. if so, does F attract?
     d. if so, does F attract a head or an XP?
     e. if (c), does F attract both a head and an XP?
     f. does F combine realisation by external and internal Merge?
     g. if so, does F attract a head or an XP?

(Roberts & Roussou do not assume GGL’s initial question: whether F is present at all, assuming instead that all languages use the same set of formal features (see Roberts & Roussou (2003:29)).) What (46) and (47) share is specifying a range of formal operations which can be associated with a given type of substantive feature (a formal feature of a functional head). They differ in detail, and this is certainly not the place to evaluate their relative merits. What we can note is that the sequence of statements involves a steady increase in specificity in each case. In fact, each statement is close to being disjunctively ordered in relation to the previous one, and it would certainly not be difficult to reformulate either (46) or (47) so as to make this more precise. Roberts & Roussou explicitly state that their system reflects a markedness hierarchy; GGL on the other hand make no such claim. Moreover, each set of statements has a kind of “branching” structure, which we can illustrate as follows for (47b-d), replacing (47a) with GGL’s option of the presence of F as a formal feature, and simplifying slightly:26

(48) F?
       No: STOP
       Yes: Does F Agree?
         No: STOP
         Yes: Does F have an EPP feature?
           Yes
           No: Does F trigger head-movement?
             Yes
             No: STOP
(This can also be done for (46).) Each “yes” option entails a further option. The “yes” options that do not dominate anything may entail further options regarding the type of head-movement, or, in the case of XP-movement, pied-piping options (see Richards & Biberauer (2005), Biberauer & Richards (2006) on the latter option); we leave further specifications aside here. Each more deeply embedded option is more marked than all less deeply embedded ones, since effectively the description of the parameter is the conjunction of all the dominating nodes, and so it increases in length as embedding deepens. “STOP” options on left branches are relatively unmarked options in each case. So here we see a parameter schema given as a network of options, each more embedded option representing a more specific, and therefore a more marked, option. Importantly, we can consider networks like (48) to define “learning paths” in the sense of Dresher (1990); again the conservatism of the learner is such that it prefers the path to be as short as possible, and so deeply embedded options are relatively marked owing to the fact that they have longer descriptions. Following GGL, we assume that the schema and the overall pool of possible features are given by UG; the network is created through epigenesis in acquisition, and markedness follows, on one standard construal, from increasing specificity (length of description relevant to F’s role in the grammar, and hence greater computational burden; we will say more about markedness in the next section). Now, parameter schemata of the kind in (46–48) apply to individual formal features. As such, they are classic examples of microparameters (and have many of the advantages of this kind of formulation of parameters discussed in §3.2).
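The idea that markedness tracks the length of a parameter's description can be made concrete with a small sketch. The Python below encodes one reading of the network in (48) as a binary tree and counts, for each question, how many questions lie on the path from the root down to it. The class and function names are hypothetical, the wording of the root question is our paraphrase, and the attachment of the head-movement question under the "no" branch of the EPP question is an assumption about how the diagram is to be read.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A question in a parameter network, with its two daughters."""
    question: str
    yes: "Node | None" = None  # None: no further option represented here
    no: "Node | None" = None   # None: STOP


# One reading of (48); the branch attachments are an assumption.
head_movement = Node("Does F trigger head-movement?")
epp = Node("Does F have an EPP feature?", no=head_movement)
agree = Node("Does F Agree?", yes=epp)
root = Node("Is F present as a formal feature?", yes=agree)


def description_lengths(node, depth=1, out=None):
    """Map each question to the number of questions on the path down to it."""
    out = {} if out is None else out
    out[node.question] = depth
    for child in (node.no, node.yes):
        if child is not None:
            description_lengths(child, depth + 1, out)
    return out


# Longer paths stand in for longer descriptions, i.e. more marked options.
for question, length in description_lengths(root).items():
    print(length, question)
```

Under these assumptions the head-movement question sits at length 4 and the root at length 1, which is the sense in which a more deeply embedded option comes later on the learning path.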
But we are now in a position to fruitfully combine these, or some of them at least, with markedness statements of the kind in (36), to derive some of the macroparameters discussed in the previous section. In a nutshell, macroparameters quantify over F in networks like (48). To see how this works, consider the EPP option embedded two levels down in (48). The markedness statement in (36) essentially says that the unmarked option for the grammatical system (i.e. not necessarily for F itself) is “no F has this value”, and that the next least marked option is “all F have this value”, and that the trigger for choosing is the value taken by v. More mixed, and therefore more marked, systems may relate the possession of F to further categorial features, and the options may become progressively more specific (have longer descriptions) and more marked. In other words, we have a cross-cutting set of options of the form:

(49) a. Are movement-triggering features absent from all probes?
     b. If not, are movement-triggering features obligatory on all probes?
     c. If neither (a) nor (b), are movement-triggering features present on {T, v, …}?

The positive value of (49a) gives a rigidly, harmonically head-initial language like Welsh. The positive value of (49b) gives a rigidly, harmonically
head-final language such as Japanese or Turkish. Again, (49c) breaks up into a series of microparameters, with a range of other factors entering here (options regarding pied-piping, and general constraints on disharmonic orders of the kind explored in Biberauer, Holmberg & Roberts (2007)). The existence of this set of cross-cutting options is determined by generalisation of the input. Applying (48) where F is universally quantified, we come very close to deriving the macroparameters discussed in the previous section (we have added one further option on the most embedded right branch, for further illustration):

(50) F?
       No: STOP
       Yes: Does F Agree?
         No: STOP
         Yes: Does F have an EPP feature?
           No (head-initial): Does F trigger head-movement?
             No: STOP (High analyticity)
             Yes: polysynthesis
           Yes (head-final): Is F realised by external Merge?
             No: STOP
             Yes: agglutinating

(This may wrongly predict that polysynthetic languages are head-initial, but actually they appear to have free word order, precisely owing to their polysynthetic nature (see Baker (1996: 10ff.)).) So we can finally arrive at a picture of the form of parameters as involving generalised quantification over formal features, as follows:

(51) $Q(f_{f \in C})\,[P(f)]$

Here Q is a quantifier, f is a formal feature, C is a class of grammatical categories providing the restriction on the quantifier, and P is a set of predicates defining formal operations of the system (“Agrees”, “has an EPP feature”, “attracts a head”, etc.). The longer the characterisation of either C or P, the more deeply embedded in a network/schema the parameter will be, the more marked it will be, and the further along the learning path it will be. True macroparameters sit at the top of the learning path, and in fact involve unrestricted universal quantification or its negation, as we saw in (49) (Baker (2008a:9) also suggests that macroparameters might be a kind of default). This seems to us to be a maximally simple theory of parameters, since ultimately it involves relations between sets of features of categories and predicates defining grammatical operations. Can we relate the NSP to these considerations? Roberts (2010a) speculates along just these lines. He suggests that putting together Saito’s (2007)
proposals regarding the nature of radical pro-drop (which were briefly described in §1.2.3) with the particular account of the “rich agreement” that facilitates consistent null subjects based on Müller’s (2005) notion of impoverishment (again, see Roberts (2010a) for details) we arrive at the following generalisations:

(52) a. Radical pro-drop is possible iff φ-agreement is not obligatory.
     b. Consistent null subjects are possible iff there is no impoverishment of T’s φ-features.

Where (52a) holds “discourse pro” is possible (i.e. subject pronouns can be merged at LF); where (52b) holds, deletion of subject pronouns is possible prior to LF (again, see Roberts’ paper for details). We see that the two systems are derivational mirror images of one another, and that this is the direct consequence of the different status of φ-features on probes (fully optional vs. obligatorily present and unimpoverished), which in turn is typically reflected in the agreement morphology (totally absent vs. “richly” realised).27 This further suggests a rethinking of the typological generalisations surrounding null arguments: perhaps the fundamental dimension of parametric variation is “radical” vs. “consistent” null-subject (or null-argument) languages, with partial and non-null-subject languages being subcases of the “consistent” type featuring varying degrees of impoverishment of the goal. The basic form of the parameter would then be as in (53):

(53) a. Are uφ-features obligatory on all probes?
       No: Radical pro-drop
       Yes: b. Are uφ-features fully specified on all probes?
         Yes: Polysynthesis
         No: c. Are uφ-features fully specified on some probes?
           No: Non-null-subject
           Yes: d. Are the uφ-features of {T, v, …} impoverished?

As indicated, the “No” value in (53a) gives radical pro-drop. The positive value of (53b) may give rise to a polysynthetic system (or at least to consistent head-marking; Baker’s notion of polysynthesis seems to combine this with consistent head-movement of arguments; see (50)). A negative value for (53c) gives a non-null-subject language like English. (53d) is intended to simply indicate the ways in which the null-subject parameter starts to
“break up” into microparameters as individual probes are evaluated in relation to it (cf. §1.2.5, (16)). Clearly, a “No” value for T and a “Yes” value for v will give a consistent null-subject language like Italian.28 In terms of the general schema for parameters in (51), we can state the NSP as follows:

(54) $\exists f : f \in D\;[S(D, T_{\mathrm{Fin}})]$

(54) reads “For some feature D, D is a sublabel of finite T”, where “sublabel” is understood as in Chomsky (1995: 268). This captures the force of the informal statement given in (18) above, and shows how the NSP fits with the general format for parameters, and how it is part of the parameter network in (53). Partial null-subject languages, and intricate cases like certain registers of French (see Roberts (2010b)), require still further specification. Again, these represent progressively more marked options, located more deeply in the schema/network and further down the learning path. In these terms, we can immediately observe a connection between the null-subject parameter and the other parameter schemata/networks discussed here. A further advantage of hierarchies of the type sketched above is that they restrict the upper bound of grammars that a given set of parameters can generate. The cardinality of G, the set of grammars, is equivalent to the cardinality of P, the set of parameters, plus 1, to the power of the number of hierarchies:

(55) $|G| = (|P| + 1)^{n}$, where $n = |H|$

Suppose, arbitrarily, that there are 5 hierarchies (we have seen two; there must be one for word order, and it is very easy to imagine that there are at least two more), and suppose that there are 30 parameters. Then |G| is $31^{5}$, or 28,629,151. This is a large number, but recall that 30 parameters yielded over a billion grammars on our earlier calculation based on unhierarchised microparameters (see §3.2). In this section and the previous one, we have clearly made some progress towards resolving the tension between descriptive and explanatory adequacy that we observed in §3.1. In particular, the question of the highly specific innate endowment in formal features that the BCC appears to demand has been eliminated, and we have also clarified the relation between micro- and macro-parameters. All parameters ultimately have the extremely simple form in (51), and they form schemata/networks which are related to markedness of the general kind in (48) and (50). The NSP for example is a case of the parameter schema in (53), specifically (53c) as it relates to T. We believe that these proposals go a long way towards restoring the explanatory value of parameters (as well as giving wide empirical coverage, if our speculations about macroparameters are on the right track). In the

next section, we will suggest that they may show the way beyond explanatory adequacy.

3.5 Why Parameters? Comparative Syntax Beyond Explanatory Adequacy

In recent work, Chomsky (2004, 2005, 2007) has proposed that an important property of the minimalist programme is that it can take us beyond explanatory adequacy (see Chomsky (2007: 19) for a lucid statement of this idea). Accordingly, we attribute the adult state of linguistic knowledge, adult competence, to three factors: (i) the genetic endowment, UG, (ii) experience of the PLD, (iii) principles not specific to language. The last have become known as “third-factor principles” and have to do in particular with principles of optimal and efficient computation. In this section, we would like to show how the view of parameters arrived at in the preceding sections can begin to take us beyond explanatory adequacy in this domain. The classical view of how principles and parameters interact to produce adult competence was based on the idea that the parametric options were specified as such as part of the genetic endowment, made manifest in the PLD (perhaps with inaccessible properties being triggered by accessible properties in a parametric cluster) and thereby fixed during language acquisition, to give an adult system which was an instantiation of UG with all parametric options fixed. So this view relied entirely on the interaction of factors (i) and (ii) above, characteristic of the classical notion of explanatory adequacy. The view of principles and parameters that follows from the considerations in the previous sections is rather different. UG does not even provide the parameter schemata. In essence, parameters reduce to the quantificational schema in (49), in which UG contributes the elements quantified over (formal features), the restriction (grammatical categories) and the nuclear scope (predicates defining grammatical operations such as Agree, etc.).
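The division of labour just described (elements quantified over, restriction, nuclear scope) can be illustrated with a small sketch. Everything named in it is an assumption made for illustration: the toy grammars, the operation labels `agree` and `epp`, and the helper `make_parameter` are invented here and are not analyses of any real language. For brevity the sketch also quantifies over heads rather than over individual formal features.

```python
from typing import Callable, Dict, Iterable, Optional, Set

# Toy grammar (hypothetical): each head is mapped to the operations
# that its formal features trigger. The labels are illustrative only.
Grammar = Dict[str, Set[str]]
Quantifier = Callable[[Iterable[bool]], bool]


def every(values: Iterable[bool]) -> bool:
    """Universal quantifier."""
    return all(values)


def some(values: Iterable[bool]) -> bool:
    """Existential quantifier."""
    return any(values)


def no(values: Iterable[bool]) -> bool:
    """Negative quantifier."""
    return not any(values)


def make_parameter(q: Quantifier, restriction: Optional[Set[str]], predicate: str):
    """Return a test for Q(f : f in C)[P(f)] over a toy grammar.

    restriction=None stands for unrestricted quantification (all heads).
    """
    def holds(grammar: Grammar) -> bool:
        heads = [h for h in grammar if restriction is None or h in restriction]
        return q(predicate in grammar[h] for h in heads)
    return holds


# Two invented systems, not analyses of any real language.
toy_a: Grammar = {"C": {"agree"}, "T": {"agree", "epp"}, "v": {"agree"}}
toy_b: Grammar = {"C": set(), "T": {"agree"}, "v": set()}

all_heads_agree = make_parameter(every, None, "agree")  # macro-style: unrestricted
t_has_epp = make_parameter(some, {"T"}, "epp")          # micro-style: restricted to T
no_head_has_epp = make_parameter(no, None, "epp")       # negative quantifier

print(all_heads_agree(toy_a), all_heads_agree(toy_b))   # True False
print(t_has_epp(toy_a), t_has_epp(toy_b))               # True False
print(no_head_has_epp(toy_a), no_head_has_epp(toy_b))   # False True
```

On this rendering the quantifiers are ordinary set-level functions with nothing linguistic about them, while the grammar-specific content sits entirely in the restriction and the predicate.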
The quantification relation itself is not given by UG, since we take it that generalised quantification (the ability to compute relations among sets) is an aspect of general human computational abilities not restricted to language. So even the basic schema for parameters results from an interaction of UG elements and general computation. The parameter schemata form networks defined by markedness relations. The markedness notions we have invoked include relative length of description, and generalisation of the input (relevant for statements such as (38), which ultimately define macroparameters). Both rely on a general notion of computational conservatism, which again we can think of as a facet of computational efficiency. Again, then, the schemata arise from third-factor properties. The different points in the schemata all instantiate the schema in (49); they differ in the specificity of the two arguments to the quantifier: C, the class of grammatical categories, and P, the predicates defining

(conjunctions of) grammatical operations. The more specific either of these arguments, the more marked, and indeed the more “micro”, the parameter. Third-factor considerations may contribute to markedness in other ways, too. Mobbs (2008) critically reviews Hawkins’ (2004) three performance efficiency principles we mentioned in §2.5, and which were invoked by Newmeyer (2005) as an alternative to parametric accounts. We give them here:

(56) a. Minimize Domains (MiD)
     b. Minimize Forms (MiF)
     c. Maximize Online Processing (MaOP)

We saw in §2.5 how Newmeyer, following Hawkins, tries to invoke MiD in order to account for a well-known Greenbergian implicational universal. Whether or not that account is successful (and there are many reasons to think it is not, see Note 19), one can wonder as to the precise status of these efficiency principles. Mobbs convincingly argues that Hawkins’ Performance-Grammar Correspondence Hypothesis (PGCH) reflects only a correlation, not causation, and that we might do better to reconsider the efficiency principles as third-factor principles that play a role in defining UG. In these terms, MiD naturally relates to the “minimal search” considerations which underlie locality conditions: the non-intervention condition on Agree and the PIC. It also relates to the No Tampering Condition (existing relations should not be altered by later operations) and cyclicity generally (Mobbs (2008: 8–9)). So a version of one of Hawkins’ efficiency principles, more abstractly construed as a general computational principle informing competence, rather than constituting performance, may lie behind certain fundamental properties of the language faculty. What concerns us more directly here, however, are third-factor constraints related to markedness. Mobbs suggests very plausibly that Hawkins’ Minimize Forms (MiF) constraint may underlie the markedness preference articulated in Roberts & Roussou (2003) for relatively simple forms.
More importantly, he proposes a further efficiency constraint, Generalise Features (GenF), which he states as follows: (57) Human computation shares features over forms in the same domain (Mobbs (2008: 11)). This, he suggests, may underlie the “generalisation of the input” form of markedness put forward in (38) above. We concur with this suggestion. It seems, then, that Hawkins-style efficiency principles may have a role to play in our theory of universals and typology, but not as performance constraints on processing. Instead, they may reflect deeper, rather general computational principles, which contribute to the third factor determining adult

competence. They may contribute to typology to the extent that they inform the markedness principles which determine parameter schemata/networks. Further, we have suggested that the general form of parameters themselves results from the interaction of UG primitives with the general principles of quantification. Finally, we can ask the most difficult question of all, but one which the minimalist programme requires us to ask: why do we have parameters at all? Our general format for parameters in (51), inasmuch as it allows Q to be a negative quantifier, basically states that formal features of functional heads are all in principle optional. UG says nothing more than this, which is about as little as could possibly be said (in particular, this is a more “minimal” statement than either forbidding or requiring the presence of such features). Moreover, the quantificational schema is maximally liberal: it states that the formal features may be in any set-theoretic relation with any predicate defined by the theory of grammar. So parametric variation arises because UG really doesn’t mind about the distribution of formal features in any given grammatical system. But we know that speakers fixate on given grammatical systems during language acquisition, and speech communities recognise given (aggregates of) grammatical systems as languages. Neither of these aspects of parametric variation directly concerns UG, however: “fixing” parameters may be a facet (actually, almost a definition) of learning. So the kind of stable parametric variation we observe in adults arises from the fixation on random values. These values take on cultural and social value (a very different kind of value) as “languages” for the kinds of reasons that have been revealed by sociolinguistics. But, even if UG doesn’t mind, how and why could variation in grammatical systems have emerged?
Here, the work of Niyogi & Berwick (1995, 1997) and Niyogi (2006) on modelling the acquisition and change of grammatical systems in populations of learners is revealing. They show that given a learning algorithm A, a probability distribution of linguistic tokens across a population (random PLD), and a restricted class of grammars from which to select (UG), variability will result as long as the time allowed for the selection of hypotheses is restricted. This idea emerges most clearly in the following quotation from Niyogi (2006: 14–15): imagine a world in which there are just two languages, Lh1 and Lh2. Given a completely homogeneous community where all adults speak Lh1, and an infinite number of sentences in the Primary Linguistic Data, the child will always be able to apply a learning algorithm to converge on the language of the adults, and change will never take place . . . Now consider the possibility that the child is not exposed to an infinite number of sentences but only to a finite number N after which it matures and its language crystallizes. Whatever grammatical hypothesis the child has after N sentences, it retains for the rest of its life. Under such a setting, if N is large enough, it might be the case that most children

learn Lh1, but a small proportion ε end up acquiring Lh2. In one generation, a completely homogeneous community has lost its pure character.

In other words, if we combine the heterogeneity in any speech community, the random distribution of PLD (poverty of the stimulus) and the limited time for learning (i.e. the critical period for language acquisition), change in grammatical systems is inevitable. If change is inevitable in the diachronic dimension, then variation is inevitable in the synchronic dimension. But, again, none of this reflects any aspect of UG except an indifference as to the distribution of formal features, as captured by (51). Here we begin to see the shape of comparative syntax, beyond explanatory adequacy.

3.6 Where Are Parameters? The Locus of Parametric Variation

We have already discussed the BCC, the idea that parameters are specified in the lexical entries of lexical items, at some length. However, there is also a trend in recent minimalist theory to locate all linguistic variation, including all syntactic variation, in the post-spell-out morphology and phonology components (see for example, Sigurðsson (2006a,b), Poole & Burton-Roberts (2006a,b), Boeckx (2011)). The following quote is representative: . . . it is not implausible to think of narrow syntax as completely uniform (meeting LF-demands), and not affected (design-wise) or adapted to cope with or code for variation in the guise of (syntactic) parameters. (Boeckx, 2011). This hypothesis (which Boeckx dubs the Strong Uniformity Theory, SUT) is still at a programmatic stage, and thus has not yet been seriously put to the test. It is an interesting hypothesis, though, and very much in the spirit of the minimalist programme. What we are proposing here can indeed be construed as coming close to the SUT. For example, whether to delete or pronounce a pronoun is clearly a matter of PF, and variation with regard to EPP-features discussed above in §3.3
is presumably best construed as taking effect after Spell Out (which is made technically possible by phase-theory, according to which Spell Out applies at several stages in the derivation of a sentence; Chomsky (2000, 2001, 2008), Svenonius (2004)). We would maintain, however, that some variation is encoded in narrow syntax. In fact, so does Boeckx. He concedes that the following variation is encoded there: (58) a. Features F1 and F2 may be expressed separately or as a bundle; b. F may or may not exhibit a uF variant; c. A given phase head may be strong, i.e. uF-bearing, or weak (defective). (The similarities with GGL’s parameter schema in (46) and Roberts & Roussou’s proposal in (47) are obvious; see also the other proposals

mentioned by Boeckx). But, he says, “all other ‘parametric’ options arise in the post-syntactic morpho-phonological component, such as whether a head H allows its specifier to be filled by overt material, or whether the head or the tail of a chain can or must be pronounced, or whether a given head H is affixal and requires its expression in the vicinity of another head, or whether a head H precedes or follows its complements” (Boeckx 2011: 215). This is not significantly different from what we are proposing. In fact, Boeckx’s parameter schemata (a,b,c) fall under our more general parameter schema (51) (if P can be ‘is bundled together with F2’, an idea which we have not discussed, but which will be exploited in Holmberg (2010a)).29 Interestingly, Boeckx presents these ideas in the context of a paper where he distances himself from GB-style parametric theory: “[T]he idea that a GB-style Principles-and-Parameters architecture provides the right format for a solution to Plato’s Problem is, I think, seriously mistaken, on both empirical and conceptual grounds” and “if minimalists are right, there cannot be any parametrized principles, and the notion of parametric variation must be rethought” (Boeckx 2011: 206). As should be obvious from the discussion above, this is not exactly how we see it. Instead, we see the theory of linguistic variation which is developed within the minimalist framework as a refinement of the “GB-style Principles-and-Parameters architecture” (see also Biberauer (2008b) for essentially the same proposal). One reason why Boeckx dismisses GB-style parameters, thus concurring with Newmeyer (2004, 2005, 2006) (discussed above in section 2.3), Culicover (1999), and Jackendoff (2002), is that he thinks of the parameters as being, by definition, principles which come as an addition to the set of universal principles of UG, but with the difference that they specify a range of options for some grammatical property.
In this view parameters make UG larger, with more specifications, and thereby also make it more specific to the language faculty, in apparent conflict with the minimalist programme of reducing UG, as far as possible, to third factor effects, as discussed in the previous section. It also raises the question how such a rich UG could have evolved (as pointed out by Boeckx 2011). But although the view of parameters as constituting an additional set of specifications to UG is often voiced in textbook presentations of parametric theory, it is not the only view, and it is not inherent to P&P theory. An alternative is that parameters are just those grammatical options which are not specified by UG, as we have tried to specify in the preceding sections. An obvious example is the traditional head-complement parameter. In the case of this parameter the values (head precedes complement or head follows complement) are not given by properties of UG, but instead follow from the ultimately physical fact that words must be linearly ordered, allowing exactly those two options (as Boeckx notes, in fact). However, this does not make it any less of a parameter in the P&P sense, as long as it remains true that there is a finite range of options which are left open by UG in that UG does not make the choice for the learner. Another classical parameter of

GB theory which clearly has this character is the wh-parameter of Huang (1982):

(59) Wh-movement takes place before/after S-structure.

Here UG prescribes that wh-movement happens whenever a wh-phrase is selected from the lexicon, but does not specify when. Given the GB-model, and given the semantic properties of wh-expressions, there are logically two possibilities: before S-structure or after, on the LF-side (in Huang’s (1982) terms “in Syntax or in LF”). In this case the options are given by independently motivated architectural properties of the system. Specifying the options as in (59) is done in the name of explicitness, presumably, but is in fact redundant, as should be obvious to a critical reader. Disagreement between us and Boeckx as regards the historical relation between GB-theory and minimalism is of limited importance, though, in the context of this volume. What is more important is whether we agree on the substantial issues concerning the nature of UG and the explanation of variation. One substantial, empirical claim that Boeckx (2011) makes, which we do not agree with, is that all parameters (he refers to them as ‘nanoparameters’) are independent; hierarchies of parameters and macroparameters do not exist. But the hierarchies we have proposed are determined by third-factor principles, as we have seen, and so there is no cost to UG in proposing these. Unaltered, Boeckx’s view will either be descriptively inadequate (too few parameters to account for the attested variation) or predict astronomical numbers of unattested systems, for the reasons alluded to in §3.2. So we see that the opposite view should be taken seriously. An even more substantial point on which we agree with Boeckx is that “the minimalist program offers us a different, more adequate way of exploring how principles and parameters may interact” (Boeckx 2011: 206, Note 2). We hope to have demonstrated this in the preceding sections. See also Biberauer (2008b).
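The numerical contrast behind the argument about independent versus hierarchically organised parameters can be checked with a few lines of arithmetic. The sketch below uses the illustrative figures given with (55), namely 30 parameters and 5 hierarchies; the function names are hypothetical and introduced only for this example.

```python
def independent_grammars(num_parameters: int) -> int:
    """Binary parameters set independently of one another: 2 ** |P|."""
    return 2 ** num_parameters


def hierarchical_grammars(num_parameters: int, num_hierarchies: int) -> int:
    """Upper bound stated in (55): (|P| + 1) ** |H|."""
    return (num_parameters + 1) ** num_hierarchies


print(independent_grammars(30))      # 1073741824
print(hierarchical_grammars(30, 5))  # 28629151
```

With these figures the hierarchical bound is smaller than the independent count by a factor of roughly 37, and the gap widens quickly as the number of parameters grows while the number of hierarchies stays fixed.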

4. Conclusion

The above discussions, combined with the papers to follow and those collected in Biberauer (2008a) and Holmberg (2008), attest to the continuing validity of the principles-and-parameters approach to UG. Naturally, this approach has changed significantly over the thirty years since it was first suggested (in Rizzi (1978)). However, the essential idea can still provide a way to resolve the traditional tension between description and explanation in comparative syntax. Moreover, we have seen that, by attempting to eliminate the same tension at the parametric level (caused by a proliferation of highly specific microparameters), we can arrive at interesting characterisations of epigenetic parameter-setting, parameter schemata/networks, and the relation between macro- and microparameters. A central concept in all of this is markedness, which may be largely determined by third-factor considerations of computational efficiency. As we already said, we are beginning to see how comparative syntax might look, beyond explanatory adequacy.

Notes 1. We’d like to thank the other members of the null-subject project group for their comments on various earlier versions of this work. We’d also like to thank Hans Broekhuis, Norbert Corver, Riny Huybregts, Ursula Kleinhenz and Jan Koster, the editors of Organising Grammar: Linguistic Studies in Honor of Henk van Riemsdijk (De Gruyter 2005) for their comments on the Roberts & Holmberg contribution to that collection, parts of which are taken up again in Sections 2.3 and 2.4. Our thanks also to Fritz Newmeyer, for criticising us (see Newmeyer 2006), and to Pino Longobardi, Cristina Guardiano and Claudia Gianollo for being such stimulating interlocutors. Last but not least, Roberts would like to thank Alastair Appleton, Bob Davies, Luke Donnan and Iain Mobbs, all students on the Cambridge Linguistics MPhil 2007–8, for listening to and reacting to some of the ideas discussed in Sections 2 and 3 so assiduously and so intelligently. 2. Other material, published and unpublished, generated by the project is detailed on the project website (http://people.pwf.cam.ac.uk/mtb23/NSP/Nullsubjects projecthome.html). 3. A similar point is made in Roberts (2001:90) in relation to parametric accounts of syntactic change. 4. One might also observe, with hindsight, that the surface filter in (1) is very close to the original formulation of the Extended Projection Principle put forward in Chomsky (1982:10), i.e. the requirement that every clause must have a subject. For Perlmutter, (1) is parametrised (although, again, this terminology postdates his insights); see Section 1.3 on proposals that the NSP involves parametrising the requirement for a structural subject. 5. Examples of this type do not fall under Perlmutter’s surface filter in (1) since it was generally assumed that such infinitives were VPs (resulting from Raising in (4b) where Mary is present, and Equi-NP Deletion where it is not, and in (4c), combined with tree-pruning in the sense of Ross (1967, Chapter 3)). 
In cases like (4a), the S-node of the gerund is pruned when the subject is not present. 6. Both English and French also allow null subjects in special discourse environments or registers. Haegeman (2000:130) gives the following examples from what she calls “written abbreviated registers” (“written registers in which pressures of economy seem to overrule the ‘core’ grammar” (132), including diaries, short notes and some kinds of colloquial speech):

(i) a. ___ cried yesterday morning. (Plath (1983:288))
    b. Elle est alsacienne. ___ paraît intelligente. (Léautaud (1989:48))
       She is Alsatian. Seems intelligent.

Such null subjects, in addition to being restricted to certain types of discourse and/or register, have special properties which distinguish them from the canonical null subjects of (2) and (3) (see Haegeman (2000:138–141) for details). See Holmberg (2010a, section 2). We will leave these cases aside here. 7. There are sometimes limited exceptions to this generalisation. In Italian, for example, the 2sg pronoun tu must appear when the verb is in the subjunctive. In the present subjunctive, the singular forms of the verb are not distinct: che (io) parli, che tu parli, che (lui/lei) parli (“that I/you/he/she speak”).

8. But see Öztürk (2001, 2008) for the view that Turkish is a “discourse pro-drop” language in the sense of §1.2.3 below. This paradigm is from Csató & Johansen (1998:214). 9. Chierchia (1998) formulated the Nominal Mapping Parameter, which distinguishes languages in which bare nouns are able to function as arguments from those in which they are not (more technically, can NP map directly into type ⟨e⟩). If yes, then the language has generalized bare arguments (allows bare singulars and plurals), has a generalized classifier system and lacks plural morphology. Chierchia proposes that Chinese has the positive value for this parameter, while English has the negative value. See Ramchand & Svenonius (2008) for a different view. 10. Brazilian Portuguese is an interesting case, as it seems to differ from European Portuguese in being a partial null-subject language, while European Portuguese is a consistent null-subject language. Duarte (1995) traces this development in 19th- and 20th-century Brazilian Portuguese. For some speculation regarding the diachronic development of Brazilian Portuguese in this respect, see Roberts (2014a). 11. As it stands, this is an idealisation. There is, in fact, considerable variation among the discourse pro-drop languages as regards the use of pro-drop. Chinese is apparently more restricted in this respect, making more use of overt pronouns, than, for example, Japanese, and possibly more than many consistent null-subject languages: see Cole (2009). 12. (19) is simplified in various ways. First, we are assuming that parameters always have binary values, although of course one could in principle add to (19) a clause specifying how values $v_i, v_{i+1}, \ldots, v_n$ each entail distinct clusters of P-expressions $C_1, \ldots, C_n$, each of which may be partially disjoint from the others. For simplicity, and following general practice in discussions of parameters, we assume that all parameters are in fact, or can be formulated as if they were, binary.
Second, taking C’ to be the negative correlate of C is clearly the simplest assumption we can make, although not required. We will nevertheless make it. 13. See Holmberg & Sheehan (2010), and especially Sheehan (2006: ch. 6) on Brazilian Portuguese. 14. On this basis, we might have expected (29b) to go the other way, stating a one-way implicational relation between complementiser-trace violations and free inversion. But as discussed above, languages have different strategies for avoiding complementiser-trace violations, including an operation on C, as in English and French. On the other hand, (29b) makes the highly substantive claim that any language with free inversion will allow complementiser-trace violations. 15. See Roberts (2007b:31ff.) on the possibility of including referential null subjects in the scale in (31). 16. A possible reason behind Newmeyer’s opposition to characterizing the choice between head-precedes-complement and complement-precedes-head as two values of a parameter is that, in this particular case, the values are not given by properties of UG, but rather follow from the ultimately physical fact that words must be linearly ordered, allowing exactly those two options. We return briefly to this point in §3.6. 17. These include: (i) why are all verbs with prepositional complements assumed to also have a direct object? English and most other languages have verbs like rely, hope, depend, etc., which can take PP but not NP complements. Do we therefore expect a different typological generalisation for this type of verb? If not, why not? (ii) why is V’s PP complement assumed to pattern with its NP complement such that both always systematically precede/follow the verb? In other words, why are orders such as the following not considered?

(i) [VP NP V [PP P NP]] (OV and PO)
(ii) [VP [PP NP P] V NP] (VO and OP)

Here the “distance” (however that is computed) between V and P is smaller than in the cross-linguistically common cases in (34). Therefore, all other things being equal, we expect to find these more frequently than (34), which is not the case. “Mixed” orderings of complements like these do exist: depending on one’s analysis, languages such as Mandarin and other Chinese varieties may have [VP [PP P NP] V NP] (see Li’s (1990) analysis of the ba-construction), which is predicted to be as common as (34) by MiD, but is cross-linguistically rare. German and Dutch allow the order in (i) in subordinate clauses with “PP-extraposition”; again, this pattern is predicted to be more frequent than (34) and (35), which is very probably not the case. These objections concern the presence of an “extra” NP in the representations given by Newmeyer and schematised in (34). Hawkins (1990:238–9) leaves the extraneous NP out, and makes essentially the same argument. Here, too, some objections can be made. First, what about adjunct PPs, which presumably occur in a different configuration from those in (34) and (35); do we predict a different typological generalisation for these? Second, Dryer’s (2005:386) figures show that (35b) is four times as rare as (35a). Why? Third, Dryer states that in 141 of the 1,033 languages he surveys the word order cannot be determined. It may be that some of these languages have “free” word order on the surface. For Newmeyer/Hawkins, such orders, if truly free, would presumably be the hardest of all to process, since there is no consistent “distance” in the “domain” (or perhaps the easiest, for the same reason). But actually they are neither the rarest type, being much commoner than the inconsistent types in (35), nor particularly common, being much rarer than the consistent types.
Of course, it would be possible to assume an underlying fixed order as in generative work, but such a move will make a processing account hard to maintain and raise the possibility of movement operations perturbing underlying order in the other cases. Whichever way things are construed, then, more than 10% of the languages in Dryer’s sample pose an insuperable problem for Hawkins’ approach. Newmeyer’s strictures regarding the “typological dimension”, so ruthlessly applied to generative work, mean that he cannot simply appeal to analyses that have not yet been carried out. Jumping the gun somewhat, then, we find Newmeyer’s preferred alternative as unconvincing as his arguments against the principles and parameters approach. 18. A similar point is made in connection with diachronic linguistics by Roberts (2001). GGL (2008: 6) observe a tension between explanatory and what they term “evolutionary” adequacy: “once parameters are included by a theory of UG, the minimization of the genetic endowment (produced by the supposed economy constraint on stored innate knowledge) should probably amount to minimizing the number of parameters as well as resulting, if anything, in a reduction, rather than in the observable extension, of the space of variation.” 19. Newmeyer (2005:208) asserts that the minimalist approach to parameters, which involves seeing them as inherently connected to features of functional heads, “makes it all but impossible to predict any significant degree of clustering.” This assertion is false, as we have just seen. 20. One could perhaps attempt a contact-based explanation for the parallel developments. The Southern dialects were in contact with Greek and the other Italic varieties of Indo-European: Oscan, Umbrian and related varieties, collectively known as Sabellian. These are broadly similar to Latin in typological terms, being predominantly SOV (Wallace (2004:832)), and so are unlikely to be responsible for the common development of the dialects. 
The Northern varieties were in contact with forms of Celtic and Venetic. In Tuscany and indeed in the early days of Rome itself, there was contact with Etruscan, a non-Indo-European SOV language (Rix (2004:961)). A very thorough study of contact between Latin and all of these languages, at various times and places, is Adams (2003). One possibility, which could have had far-reaching consequences, is that Latin demonstratives developed into articles partly due to contact with Greek, which had an article system. Adams (2003:518) points to the use of demonstratives as articles in a passage of Plautus. This is significant because it can be traced to Greek influence and because of its early date (see Adams’s discussion for details). 21. Here we are only concerned with the feature which attracts VP; let us leave aside whatever it is that attracts V. In Biberauer’s system, VP-movement is triggered by an EPP feature associated with the v’s property of probing for D, VP-movement representing the “pied-piping” option as compared to object shift (movement of the object DP). 22. This refers to the fact that Chinese canonically shows the order Adjunct-V-Complement rather than V-Complement-Adjunct (e.g. John often visits Mary vs. John visits Mary often). Following Kayne (1994), Cinque (1999), the English order may result from VP-raising into the functional field; Chinese lacks this operation. Hence, in general, the only material following V is the complement (and, given verb serialization, V usually has exactly one argument). This gives the appearance of “reverse V2”. 23. The earliest version of this typology was put forward by Schlegel (1817); see Morpurgo-Davies (1998:71–75) for discussion of this and its 17th- and 18th-century antecedents. 24. At this point, it is natural to ask whether there is an Inflectional/Fusional Parameter. We suspect not; this parameter arises where none of Polysynthesis, Analyticity or Agglutination are set to their positive values. 
Inflection/fusion seems to involve non-uniform behaviour among functional heads, and hence might be seen as a marked system. Certainly, these systems are prone to change, as much of the history of Indo-European attests, and inflectional languages tend to be of mixed type (cf. German, Latin, Sanskrit and other conservative, highly inflecting Indo-European languages). 25. Boden (2006: 493) characterises epigenesis as follows: “a self-organising dialectic between biological maturation and experience.” A similar suggestion, but restricted to what he calls “core parameters” (approximately equivalent to Baker’s notion of macroparameter), is put forward by Uriagereka (2007:106ff). See also Vercelli (2009). 26. In the theory of incorporation put forward in Roberts (2010c), a probe which triggers incorporation cannot have an EPP feature for principled reasons, which justifies the way the options are presented in (48), as opposed to the slightly richer set of options put forward by Roberts & Roussou. 27. The Mainland Scandinavian languages do not show any subject agreement in finite clauses (see Holmberg & Platzack 1995: 16), yet they are not radical pro-drop languages, but instead non-null subject languages. We maintain that they have a generalised uϕ-feature in finite T which does not have a morphologically realised valued form, but is nevertheless visible in virtue of obligatory movement of a nominal subject to Spec,TP; see Holmberg (2010a). 28. The clustering properties might follow, given §2.2(30/32). On the connection of rich agreement to all of this, see Roberts (2010a, §2.5). 29. The concept “F and G form a feature bundle” can also be expressed in the restriction, as follows (where “S” is again the predicate “is a sublabel of”): (i) ∃f∈F ∃g∈F [S(f, C) and S(g, C)].

References

Abels, Klaus & Ad Neeleman (2009) Universal 20 without the LCA. In J. M. Brucart, A. Gavarró & J. Solà (eds) Merging Features. Oxford: Oxford University Press. Adams, J.N. (2003) Bilingualism and the Latin Language, Cambridge: Cambridge University Press. Alexiadou, A. & E. Anagnostopoulou (1998) Parametrizing Agr: Word order, verb-movement and EPP-checking. Natural Language and Linguistic Theory 16: 491–539. Baker, M. (1996) The Polysynthesis Parameter. Oxford: Oxford University Press. Baker, M. (2001) The Atoms of Language. Oxford: Oxford University Press. Baker, M. (2008a) The macroparameter in a microparametric world. In T. Biberauer (ed) The Limits of Syntactic Variation. Amsterdam: John Benjamins. Baker, M. (2008b) The Syntax of Agreement and Concord. Cambridge: Cambridge University Press. Barbosa, P. (1995) Null Subjects. PhD Dissertation, MIT. Barbosa, P. (2009) Two kinds of subject pro. In A. Holmberg (ed) Studia Linguistica, Special Issue on Null Subjects. Benincà, P. & C. Poletto (2005) On some descriptive generalizations in Romance. In G. Cinque & R. Kayne (eds) The Oxford Handbook of Comparative Syntax. Oxford: Oxford University Press, pp. 221–258. Berwick, R. (1985). The Acquisition of Syntactic Knowledge. Cambridge, MA: MIT Press. Biberauer, T. (2003). Verb Second (V2) in Afrikaans: A Minimalist Investigation of Word-Order Variation. Ph.D. Dissertation, Cambridge University. Biberauer, T. (2008) The Limits of Syntactic Variation. Amsterdam: John Benjamins. Biberauer, T. (2010) Semi null-subject languages, expletives and expletive pro reconsidered. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 153–199. Biberauer, T., A. Holmberg & I. Roberts (2007) Disharmonic word-order systems and the final-over-final constraint. In C. Chang & H. Haynie (eds) Proceedings of 26th West Coast Conference on Formal Linguistics. 
Somerville, MA: Cascadilla Press, pp. 96–104. Biberauer, T. & Richards, M. (2006). True optionality: When the grammar doesn’t mind. In C. Boeckx (ed.), Minimalist Essays. Amsterdam: Benjamins, 35–67. Biberauer, T. & I. Roberts (2009) The return of the subset principle. In P. Crisma & G. Longobardi (eds) Historical Syntax and Linguistic Theory. Oxford: Oxford University Press, pp. 8–74. Biberauer, T. & I. Roberts. (2010) Subjects, Tense and verb-movement. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 263–302. Bobaljik, J. & D. Jonas (1996) Subject positions and the roles of TP. Linguistic Inquiry 27:195–236. Boden, M. (2006) Mind and Machine: The History of Cognitive Science, Volume I. Oxford: Oxford University Press. Boeckx, Cedric. (2011). Approaching parameters from below. In A.-M. di Sciullo & C. Boeckx (eds) The Biolinguistic Enterprise: New Perspectives on the Evolution and Nature of the Human Language Faculty. Oxford: Oxford University Press, pp. 205–212. Borer, H. (1984) Parametric Syntax. Dordrecht: Foris. Borer, H. (1986) I-Subjects. Linguistic Inquiry 17: 375–416.

Borsley, R., M. Tallerman & D. Willis (2007) The Syntax of Welsh. Cambridge: Cambridge University Press. Broekhuis, Hans, Norbert Corver, Riny Huybregts, Ursula Kleinhenz & Jan Koster, eds (2005) Organising Grammar: Linguistic Studies in Honor of Henk van Riemsdijk. Berlin: De Gruyter. Burton-Roberts, Noel & Geoffrey Poole (2006a). Syntax vs. phonology: A representational approach to stylistic fronting and verb-second in Icelandic. Lingua 116:562–600. Burton-Roberts, Noel & Geoffrey Poole (2006b). ‘Virtual conceptual necessity’, feature dissociation, and the Saussurian legacy in generative grammar. Journal of Linguistics 42:575–628. Cardinaletti, A. (1990) Impersonal Constructions and Sentential Arguments in German. Padua: Unipress. Cardinaletti, A. & M. Starke (1999) The typology of structural deficiency: A case study of the three classes of pronouns. In H. van Riemsdijk (ed) Clitics in the Languages of Europe. Berlin: de Gruyter, pp. 145–235. Chierchia, G. (1998) Reference to kinds across Languages. Natural Language Semantics, 339–405. Chomsky, N. (1964) Current Issues in Linguistic Theory. The Hague: Mouton. Chomsky, N. (1973) Conditions on transformations. In S. Anderson & P. Kiparsky (eds) A Festschrift for Morris Halle. New York: Holt, Rinehart & Winston, pp. 232–286. Chomsky, N. (1981) Lectures on Government and Binding. Dordrecht: Foris. Chomsky, N. (1982) Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, Mass.: MIT Press. Chomsky, N. (1995) The Minimalist Program. Cambridge, Mass.: MIT Press. Chomsky, N. (2000) Minimalist inquiries: The framework. In R. Martin, D. Michaels & J. Uriagereka (eds) Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik. Cambridge, Mass.: MIT Press, pp. 89–156. Chomsky, N. (2001) Derivation by Phase. In M. Kenstowicz (ed) Ken Hale: A Life in Language. Cambridge, Mass.: MIT Press, pp. 1–52. Chomsky, N. (2004) Beyond explanatory adequacy. In A. 
Belletti (ed) Structures and Beyond: The Cartography of Syntactic Structures, Volume 3, Oxford: Oxford University Press, pp. 104–131. Chomsky, N. (2005) Three factors in language design. Linguistic Inquiry 36:1–22. Chomsky, N. (2007) Approaching UG from below. In H.-M. Gärtner & U. Sauerland (eds) Interface + Recursion = Language? Chomsky’s minimalism and the view from syntax and semantics. Berlin: Mouton de Gruyter, pp. 1–29. Chomsky, N. (2008) On phases. In C. Otero & M.-L Zubizarreta (eds) Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud. Cambridge, MA: MIT Press. Chomsky, N. & M. Halle (1968) The Sound Pattern of English. New York: Harper & Row. Cinque, G. (1999) Adverbs and Functional Heads: A Cross-Linguistic Perspective. Oxford: Oxford University Press. Clackson, J. (2004) Latin. In R. Woodward (ed) The Cambridge Encyclopedia of the World’s Ancient Languages. Cambridge: Cambridge University Press, pp. 789–811. Clark, R. & I. Roberts (1993) A computational model of language learnability and language change. Linguistic Inquiry 24: 299–345 [this volume, Chapter 2]. Cole, M. (2009). Null Subjects: A Reanalysis of the Data. Linguistics 47:559–587. Croft, W. (2003) Typology and Universals. Cambridge: Cambridge University Press. Csató, E. & L. Johansen (1998) Turkish. In L. Johansen & E. Csató (eds) The Turkic Languages. London: Routledge, pp. 203–235.

Culicover, P. (1999) Syntactic Nuts. Oxford: Oxford University Press. Dresher, E. (1999) Charting the learning path: Cues to parameter setting. Linguistic Inquiry 30: 27–68. Duarte, E. (1995) A Perda do Princípio “Evite pronome” no Português Brasileiro. PhD Dissertation, Unicamp. Dryer, M. (1992) On the Greenbergian word-order correlations. Language 68:81–138. Dryer, M. (2005) Relationship between the order of object and verb and the order of adposition and noun phrase. In M. Haspelmath, M. Dryer, D. Gil & B. Comrie (eds) The World Atlas of Language Structures. Oxford: Oxford University Press, pp. 386–389. Frascarelli, M. (2007). Subjects, topics, and the interpretation of referential pro. Natural Language and Linguistic Theory 25:691–734. Gianollo, C., C. Guardiano & G. Longobardi (2008) “Three fundamental issues in parametric linguistics.” In T. Biberauer (ed) The Limits of Syntactic Variation. Amsterdam: John Benjamins. Gildersleeve, B.L. & G. Lodge (1895/1997) Latin Grammar. Bristol: Bristol Classical Press. Gilligan, G. (1987) A Cross Linguistic Approach to the Pro-Drop Parameter. PhD Dissertation, University of Southern California. Haegeman, L. (2000) Adult null subjects in non pro-drop languages. In M.-A. Friedemann & L. Rizzi (eds) The Acquisition of Syntax: Studies in Comparative Developmental Linguistics. London: Longman, pp. 129–169. Haider, Hubert. (2000). OV is more basic than VO. In P. Svenonius (ed.), The Derivation of VO and OV. Amsterdam: Benjamins, pp. 45–67. Hall, R.A. (1950) “The reconstruction of Proto-Romance.” Language 26:6–27. Haspelmath, M. (2001) The European linguistic area: Standard average European. In Martin Haspelmath, Ekkehard König, Wulf Oesterreicher & Wolfgang Raible (eds.) Language Typology and Language Universals (Handbücher zur Sprach- und Kommunikationswissenschaft). Berlin: de Gruyter, 1492–1510. Haspelmath, M., M. Dryer, D. Gil & B. Comrie (2005) World Atlas of Language Structures. 
Oxford: Blackwell. Hauser, M., N. Chomsky & W. Fitch (2002) The faculty of language: What is it, who has it, and how did it evolve? Science 298:1569–1579. Hawkins, J. (1990) A parsing theory of word order universals. Linguistic Inquiry 21:221–262. Hawkins, J. (1994) A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press. Hawkins, J. (2004) Efficiency and Complexity in Grammars. Oxford: Oxford University Press. Holmberg, A. (2005) Is there a little pro? Evidence from Finnish. Linguistic Inquiry 36: 533–564. Holmberg, A. ed (2009) Studia Linguistica, Special Issue on Null Subjects. Holmberg, A. (2010a). Null subject parameters. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 88–124. Holmberg, A. (2010b) The null generic pronoun in Finnish: A case of incorporation in T. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 200–230. Holmberg, A., A. Nayudu & M. Sheehan. (2009) Three partial null-subject languages: A comparison of Brazilian Portuguese, Finnish and Marathi. Studia Linguistica 63: 59–97.

Holmberg, A. & M. Sheehan. (2010) Control into finite clauses in partial null-subject languages. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 125–152. Householder, F. (1981) Syntax of Apollonius Dyscolus. Amsterdam: John Benjamins. Huang, J. (1982) Logical Relations in Chinese and the Theory of Grammar. PhD Dissertation, MIT. Huang, J. (1984) On the distribution and reference of empty pronouns. Linguistic Inquiry 15: 531–574. Huang, J. (2015) On syntactic analyticity and parametric theory. In A. Li, A. Simpson & W.-T. D. Tsai (eds) Chinese Syntax in a Cross-Linguistic Perspective. New York/Oxford: Oxford University Press, pp. 1–48. Jackendoff, R. (2002) Foundations of Language. Oxford: Oxford University Press. Jespersen, O. (1924) The Philosophy of Grammar. London: Allen & Unwin. Julien, M. (2002) Syntactic Heads and Word Formation. Oxford: Oxford University Press. Kauffman, S. (1995) At Home in the Universe. London: Viking. Kayne, R. (1994) The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press. Kayne, R. (2000) Parameters and Universals. Oxford: Oxford University Press. Kayne, R. (2005) Some notes on comparative syntax, with special reference to English and French. In G. Cinque & R. Kayne (eds) The Oxford Handbook of Comparative Syntax. Oxford: Oxford University Press, pp. 3–69. Kayne, R. (2009) Antisymmetry and the Lexicon. In J. van Craenenbroek (ed) Linguistic Variation Yearbook 2008, pp. 1–32. Kiparsky, P. (1973) ‘Elsewhere’ in Phonology. In S. Anderson & P. Kiparsky (eds) A Festschrift for Morris Halle. New York: Holt, Rinehart & Winston, pp. 93–106. Lohndahl, Terje (2007) That-t in Scandinavian and elsewhere: Variation in the position of C. Working Papers in Scandinavian Syntax 79: 47–73. Lund University. Longobardi, G. & C. Guardiano (2009) Evidence for syntax as a signal of historical relatedness. Lingua 119:1679–1706 (J. 
Nerbonne (ed) Special Issue: The Forests behind the Trees). Manzini, M.-R. & L. Savoia (2005) I dialetti italiani e romanci. Alessandria: Edizioni dell’Orso (3 volumes). McCloskey, J. & K. Hale (1984) On the syntax of person-number inflection in Modern Irish. Natural Language and Linguistic Theory 1:487–553. McShane, Marjorie. (2009) Subject ellipsis in Russian and Polish. Studia Linguistica 63:98–132. Mobbs, I. (2008) ‘Functionalism’, the Design of the Language Faculty, and Typology. PhD dissertation, University of Cambridge. Morpurgo-Davies, A. (1998) History of Linguistics; Volume IV: Nineteenth-Century Linguistics (general editor G. Lepschy), London: Longman. Müller, G. (2005) Pro-drop and impoverishment. In P. Brandt & E. Fuss (eds) Form, Structure and Grammar. A Festschrift Presented to Günther Grewendorf on the Occasion of his 60th Birthday. Tübingen: Narr, pp. 93–115. Müller, G. (2007) Some consequences of an impoverishment-based approach to morphological richness and pro-drop. Ms. University of Leipzig (available at http://www.uni-leipzig.de/~muellerg/mu228.pdf). Nash, L. & A. Rouveret (1997) Proxy categories in phrase structure theory. In K. Kusumoto (ed) Proceedings of NELS 27, pp. 287–304. Neeleman, A. & K. Szendrői (2007) Radical pro-drop and the morphology of pronouns. Linguistic Inquiry 38:671–714.

Neeleman, A. & K. Szendrői (2008) Radical pro-drop and the morphology of pronouns. In M.T. Biberauer (ed) The Limits of Syntactic Variation. Amsterdam: John Benjamins. Newmeyer, F. (2004) Against a parameter-setting approach to language variation. In P. Pica, J. Rooryk & J. van Craenenbroek (eds) Language Variation Yearbook, Volume 4. Amsterdam: Benjamins, pp. 181–234. Newmeyer, F. (2005) Possible and Probable Languages: A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press. Newmeyer, F. (2006) Reply to Holmberg & Roberts. Ms. University of Washington. Nicolis, M. (2004) On Pro-Drop. PhD Dissertation, University of Siena. Nicolis, M. (2008) The null subject parameter and correlating properties: The case of Creole languages. In T. Biberauer (ed) The Limits of Syntactic Variation. Amsterdam: John Benjamins. Niyogi, P. (2006) The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press. Niyogi, P & R. Berwick (1995) The Logical Problem of Language Change. A.I. Memo No. 1516, MIT Artificial Intelligence Laboratory. Niyogi, P. & R. Berwick (1997) A dynamical systems model for language change. Complex Systems 11: 161–204. Ordoñez, P. (1997) Word Order and Clause Structure in Spanish and Other Romance Languages. PhD Dissertation, CUNY Graduate Center. Ordoñez, P. (2006) The Order of Subjects in Spanish and Catalan. Talk given at the Encontro Lingua Falada e Escrita V, Federal University of Maceió. Öztürk, B. (2001) Turkish as a non-pro-drop language. In E. E. Taylan (ed) The Verb in Turkish. Amsterdam: John Benjamins, pp. 239–260. Öztürk, B. (2008) Non-configurationality: Free word order and argument drop in Turkish. In T. Biberauer (ed) The Limits of Syntactic Variation. Amsterdam: John Benjamins. Palmer, L. (1961) La Lingua Latina. Bologna: Einaudi. [Italian translation of L. Palmer (1954) The Latin Language. London: Faber and Faber]. Perlmutter, D. 
(1971) Deep and Surface Constraints in Syntax. New York: Holt, Rinehart & Winston. Platzack, C. (2004) Agreement and the Person Phrase hypothesis. Working Papers in Scandinavian Syntax, 73: 83–112. Pollock, J.Y. (1997) Langage et Cognition: Introduction au Programme Minimaliste de la Grammaire Générative. Paris: Presses Universitaires de France. Ramchand, G. (2008) First Phase Syntax. Cambridge: Cambridge University Press. Ramchand, G. & P. Svenonius (2008) Mapping a Parochial Lexicon onto a Universal Semantics. In T. Biberauer (ed) The Limits of Syntactic Variation. Amsterdam: John Benjamins. Richards, Marc (2004) Object Shift in North and West Germanic: Optionality, Scrambling and Base-Generated OV. PhD Dissertation, University of Cambridge. Richards, M. & Biberauer, T. (2005). Explaining Expl. In M.den Dikken & Tortora, C. (eds.), The Function of Function Words and Functional Categories. Amsterdam: Benjamins, pp. 115–154. Rix, H. (2004) Etruscan. In R. Woodward (ed) The Cambridge Encyclopedia of the World’s Ancient Languages. Cambridge: Cambridge University Press, pp. 943–966. Rizzi, L. (1978) Violations of the wh-island constraint and the subjacency condition. In C. Dubisson, D. Lightfoot & Y.-C. Morin (eds) Montreal Working Papers in Linguistics, 11. Rizzi, L. (1982) Issues in Italian Syntax. Dordrecht: Foris. Rizzi, L. (1986) Null objects in Italian and the theory of pro. Linguistic Inquiry 17: 501–557. Rizzi, L. & U. Shlonsky (2005) Strategies of Subject Extraction. Centro Interdipartimentale di Studi Cognitivi sul Linguaggio, University of Siena.

Roberts, I. (2001) Language change and learnability. In Stefano Bertolo (ed) Parametric Linguistics and Learnability. Cambridge: Cambridge University Press, pp. 81–125. Roberts, I. (2007a) Diachronic Syntax. Oxford: Oxford University Press. Roberts, I. (2007b) Introduction to I. Roberts (ed) Comparative Grammar: Critical Concepts. Volume II; The Null Subject Parameter. London: Routledge, pp. 1–44. Roberts, I. (2010a) A deletion analysis of null subjects. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 58–87. Roberts, I. (2010b) Varieties of French and the null subject parameter. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 303–327. Roberts, I. (2010c) Agreement and Head Movement: Clitics, Incorporation and Defective Goals. Cambridge MA: MIT Press. Roberts, I. (2014a) Taraldsen’s generalisation and language change: Two ways to lose null subjects. In P. Svenonius (ed) Functional Structure from Top to Toe: The Cartography of Syntactic Structure Volume 9. Oxford: Oxford University Press, pp. 115–147. Roberts, I. (2014b). The mystery of the overlooked discipline: Modern syntactic theory and cognitive science. Revue Roumaine de Linguistique 58: 151–178. Roberts, I. & Holmberg, A. (2005) On the role of parameters in Universal Grammar: a reply to Newmeyer. In H. Broekhuis, N. Corver, M. Everaert & J. Koster (eds) Organising Grammar: A Festschrift for Henk van Riemsdijk. Berlin: Mouton de Gruyter. Roberts, I. & A. Roussou (2003) Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press. Ross, J. (1967) Constraints on Variables in Syntax. PhD Dissertation, MIT. Saito, M. (2007) Notes on East Asian argument ellipsis. Language Research 43: 203–227. 
Samek-Lodovici, V. (1996) Constraints on Subjects: An Optimality Theoretic Analysis, PhD Dissertation, Rutgers University. Sapir, E. (1921) Language, New York: Harcourt Brace & Co. Schlegel, A. (1817) Über dramatische Kunst und Litteratur, Grundzüge einer Kultur- und Völkergeschichte Alteuropas. 2nd edition, 3 volumes. Heidelberg: Mohr & Winter. Schleicher, A. (1861-2) Compendium der vergleichenden Grammatik der indogermanischen Sprachen. Kurzer Abriss einer Laut- und Formenlehre der indogermanischen Ursprache, des Altindischen, Altiranischen, Altgriechischen, Altitalischen, Altkeltischen, Altslawischen, Litauischen und Altdeutschen, 2 volumes. Weimar: Böhlau. Sheehan, M. (2006) The EPP and Null Subjects in Romance. University of Newcastle-Upon-Tyne PhD Dissertation. Sheehan, M. (2010) ‘Free’ inversion in Romance and the null subject parameter. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 231–262. Sigurðsson, Halldor Á. (2006a). Agree in syntax, agreement in signs. In C. Boeckx (ed.) Agreement Systems. Amsterdam-Philadelphia: John Benjamins, pp. 201–237. Sigurðsson, Halldor Á. (2006b). PF is more ‘syntactic’ than often assumed. Working Papers in Scandinavian Syntax 77: 101–128. Sigurðsson, H. & A. Holmberg (2008) Icelandic dative intervention: Person and number are separate probes. In R. d’Alessandro, G. Hrafnbjargson & S. Fischer (eds) Agreement Restrictions. Berlin: Mouton de Gruyter, pp. 251–280.

Sobin, Nicholas. (1987) The variable status of Comp-trace phenomena. Natural Language and Linguistic Theory 5:33–60. Stowell, T. (1981) The Origins of Phrase Structure. PhD Dissertation, MIT. Svenonius, P. (2004) On the edge. In David Adger, Cécile de Cat, and George Tsoulas (eds) Peripheries: Syntactic Edges and Their Effects. Dordrecht: Kluwer, pp. 261–287. Tallerman, M. (1987) Mutation and the Syntactic Structure of Modern Colloquial Welsh, PhD Dissertation, University of Hull. Tomioka, S. (2003) The semantics of Japanese null pronouns and its cross-linguistic implications. In K. Schwabe & S. Winkler (eds) The interfaces: Deriving and Interpreting Omitted Structures. Amsterdam: Benjamins, pp. 321–340. Uriagereka, J. (2007) Clarifying the notion ‘parameter’. Biolinguistics 1:99–113. Vangsnes, Ø. (2002) Icelandic expletive constructions and the distribution of subject types. In P. Svenonius (ed) Subjects, Expletives and the EPP. New York/Oxford: Oxford University Press, pp. 43–70. Vanelli, L., L. Renzi & P. Benincà (1985/2007) Typologie des pronoms sujets dans les langues romanes. Actes du XIIe Congrès des Linguistique et Philologie Romanes. Aix-en-Provence. [Published in Italian translation as Vanelli, L., L. Renzi & P. Benincà (1986) “Tipologia dei pronomi soggetto nelle lingue romanze medievali,” Quaderni Patavini di Linguistica 5: 49–66, and reprinted in Benincà, P. (1994) La variazione sintattica, Bologna: Il Mulino, pp. 195–213. Published in English translation as Vanelli, L., L. Renzi & P. Benincà (2007) A typology of Romance subject pronouns. In Roberts, I. (ed) Comparative Grammar, Volume II: The Null Subject Parameter. London: Routledge, pp. 234–245.] Vercelli, D. (2009) Language in an epigenetic framework. In M. Piattelli-Palmarini, J. Uriagereka & P. Salaburu (eds) Of Minds and Language: A dialogue with Noam Chomsky in The Basque Country. Oxford: Oxford University Press, pp. 97–107. Wallace, R. 
(2004) Sabellian languages. In R. Woodward (ed) The Cambridge Encyclopedia of the World’s Ancient Languages. Cambridge: Cambridge University Press, pp. 812–839. Wexler, K. (1998) Very early parameter setting and the unique checking constraint: A new explanation of the optional infinitive stage. Lingua 106: 23–79.

15 Macroparameters and Minimalism
A Programme for Comparative Research

Ian Roberts

15.1 Introduction

This chapter is somewhat programmatic, and sketches a way of reconceiving the notion of parameter of Universal Grammar along novel lines, intended to meet recent criticisms from Newmeyer (2004, 2005) and Boeckx (2009), and to provide a fully workable research programme for the formal study of cross-linguistic variation in syntax (with obvious implications for historical language change, ontogeny, and phylogeny). The basic idea is the following: Baker’s (1996, 2008a,b) notion of macroparameter can be reconciled with the idea that parameters are specified as the formal features of functional categories by construing macroparameters as aggregates of microparameters. Apparent macroparametric variation appears when a group of functional heads are specified for the same properties: for example, if all heads implicated in determining word-order variation have the same word-order related property, the system is harmonically head-initial or head-final. This aggregate behaviour is determined, not by UG, but by a conservative learning strategy (input generalization), hence the distinction between micro- and macroparameters is not part of NS/UG, but is an emergent property of the interaction between the learner, the primary linguistic data, and UG. In these terms, we can set up networks of parameters. (1) illustrates how this might work for word order, assuming for concreteness that the default linearization option is head-initial, with head-final order derived by marking the relevant heads in some way (e.g. for triggering movement of their complements as in Kayne 1994):

(1)

Is the head-final feature present on all heads?
    Y: head-final (a)
    N: present on no heads?
        Y: head-initial (b)
        N: present on [+V] categories?
            Y: head-final in the clause only (c)
            N: present on...

Languages of type (a) are Japanese, Korean, Dravidian, etc.: the harmonically, rigidly head-final systems. Type (b) includes the rigidly, harmonically head-initial Celtic and Romance languages. Type (c) features German and Dutch, to a close approximation, since they show head-final TP, vP, and VP but are (almost) head-initial in all other categories. At the ‘deepest right branch’ (this notion is just a notational choice, with no theoretical significance), the parameter breaks down into a series of increasingly specific microparameters. Below I will sketch several parametric hierarchies of this kind. True macroparameters sit at the top of the network, as here all relevant parametrized heads behave as one. As we move successively down, systems become more marked, parameters become more ‘micro’, behaving in a nonuniform, differentiated fashion which is inherently more complex than the systems higher in the tree (we can suppose that the options move from subsets of the set of formal features F to singleton features of heads f∈F, to increasingly context-sensitive environments, ultimately perhaps to single lexical items), and the options have a longer description (the conjunction of all the ‘dominating nodes’ in the hierarchy). For language acquisition, each parameter hierarchy defines a learning path, much in the sense of Dresher (1999), with the higher options inherently preferred by the acquirer. More generally, we can think of the hierarchies as an epigenetic landscape (Waddington 1977), defined by incrementally more computationally complex options as the learner ‘moves down the tree’. The acquisition device searches the space by looking for the ‘easiest’ solution at each stage, where a solution is defined as a parameter-setting compatible with available primary linguistic data. The device moves from a relatively easy to the next-hardest stage only when forced to by primary linguistic data (PLD) incompatible with the current setting. 
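The search procedure just described can be made concrete with a small sketch. It is only an illustration under stated assumptions, not part of the proposal itself: the inventory of heads (C, T, v, V, P, D, N), the choice of T, v, and V as the [+V] categories, and the function name first_compatible_option are all hypothetical. The primary linguistic data are idealized as the set of heads observed to be head-final, and the learner takes the highest option in (1) that is compatible with that set.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical toy inventory of heads; not taken from the text.
ALL_HEADS: FrozenSet[str] = frozenset({"C", "T", "v", "V", "P", "D", "N"})
# Assumption: the [+V] (clausal) heads are T, v and V, as in the German/Dutch case.
VERBAL_HEADS: FrozenSet[str] = frozenset({"T", "v", "V"})

Option = Tuple[str, Callable[[FrozenSet[str]], bool]]

# The options of hierarchy (1), ordered from the top (least marked) downwards.
HIERARCHY: List[Option] = [
    ("(a) head-final", lambda final: final == ALL_HEADS),
    ("(b) head-initial", lambda final: len(final) == 0),
    ("(c) head-final in the clause only", lambda final: final == VERBAL_HEADS),
]


def first_compatible_option(final_heads: Iterable[str]) -> Tuple[str, int]:
    """Return the highest option compatible with the data, plus its depth.

    `final_heads` idealizes the PLD as the set of heads observed to be
    head-final. The depth counts how many questions had to be asked, a rough
    stand-in for the length of the option's description.
    """
    final = frozenset(final_heads)
    unknown = final - ALL_HEADS
    if unknown:
        raise ValueError(f"unknown heads: {sorted(unknown)}")
    for depth, (label, is_compatible) in enumerate(HIERARCHY, start=1):
        if is_compatible(final):
            return label, depth
    # Everything below (c) is left open in (1), so it is left open here too.
    return "more specific options, elided in (1)", len(HIERARCHY) + 1


if __name__ == "__main__":
    print(first_compatible_option(ALL_HEADS))          # rigidly head-final
    print(first_compatible_option([]))                 # rigidly head-initial
    print(first_compatible_option({"T", "v", "V"}))    # clause-only head-final
    print(first_compatible_option({"V"}))              # a more 'micro' system
```

The learner never considers a lower option while a higher one fits the data, which is one simple way of modelling the preference for the ‘easiest’ solution; a more realistic model would process the data incrementally rather than all at once.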
On the other hand, if language change is driven by acquisition, and in particular by choices made by acquirers on the basis of ambiguous PLD that differ from the choices made by previous generations (Lightfoot 1979), then the inference is that in diachrony systems will, all other things being equal, ‘move up’ the hierarchy. However, as the sketch in (1) should make clear, systems near to the top of a hierarchy are very different from one another (owing to the concerted action of aggregates of microparameters), and so, the higher the position in the hierarchy, the harder it is for systems to change. At the lower, microparametric levels, on the other hand, it is relatively easy for systems to change. A further question that arises, in the spirit of approaching UG ‘from below’ (see Chomsky 2005, 2007), is the extent to which parametric variation is specified by UG. Since the P&P model appears to presuppose a complex and richly structured UG, this is a problem (see Newmeyer 2005: 83). The recent resurgence of interest in the evolution of language points the same way: how would the Darwinian forces of mutation and selection give rise to a richly parametrized UG? Classical GB-style P&P theory, then, seems to pose a problem for both the ontogeny and the phylogeny of language. In this connection, Berwick and Chomsky (2011) argue that much of the observable variation in grammatical systems reflects the nature of what they refer to as ‘the externalization process’, i.e. the phonological/morphological interface (PF), rather than the narrow syntax (NS) itself. They suggest that true, NS-internal, syntactic variation may be negligible or non-existent. 
On the basis of the parameter hierarchies to be investigated below, I conclude that there are just two axes of variation: set-theoretic relations among formal features (person, number, gender, and case features) and distribution of the feature which triggers syntactic movement (whose formal nature we will say more about below). Of these, only the latter is UG-internal: the former follows from set theory combined with the simple and obvious idea that languages select for the set of formal features they deploy in derivations (just as they select for phonological distinctive features). I suggest a conceptual motivation for the UG-internal nature of the movement trigger below. Hence, we get a partial reconstruction of the original conception of syntactic parameter, in the sense that we expect to find large-scale typological variation as the reflex of a single (aggregate) property of (a set of) functional categories. Moreover, as already observed, the parameter hierarchies make clear predictions about both language acquisition and change. However, the ontological status of the parameters, and of the hierarchies, is very different from either the 1980s or the 1990s conception: parametric variation is not specified in UG itself. Instead, it arises from underspecified aspects of UG, and is structured by third-factor properties arising largely from the need for efficient learning. 
Because of this, we are not led to propose an over-elaborate innate endowment; in keeping with Chomsky’s recent proposals, the innate UG contains rather little more than the single combinatorial operation Merge and a schema for syntactic categories and features.

15.2 Parameter Hierarchies

In principle, parametric hierarchies may hold either of NS or of PF parameters. Fully investigating the nature of clearly PF parameters (concerning syllable structure, phoneme inventories, patterns of assimilation, metrical properties, etc.) would take us too far afield, and so I will concentrate here on parameters whose effects are morphosyntactic. For now I put aside the Berwick-Chomsky conjecture mentioned above, and return to the question of whether these are truly NS parameters after introducing and discussing the hierarchies. In the previous section I illustrated word order/linearization as one macroparameter, whereby the unordered structures created by merge in NS are linearized by Kayne’s Linear Correspondence Axiom, head-final categories

being marked with the movement-triggering diacritic. Looking at this hierarchy from a diachronic perspective, and taking seriously the idea that systems will, all other things being equal, tend to ‘move up’ the hierarchy, we can observe two things. First, change from type (c) (head-final in the clause only) to type (b) (head-initial) is readily observed in several major branches of Indo-European (North Germanic, Celtic, Romance, Greek) and in Western Finno-Ugric. English of course shows this change in its recorded history, as first observed in the generative literature by van Kemenade (1987). Second, we expect change from type (b) (head-initial) to type (a) (head-final) to be rarer, but a number of interesting cases are documented: in Niger-Congo (Nikitina 2008), in Ethiopian Semitic (Biberauer, Newton, and Sheehan 2009a,b) and in Chinese (Huang 2015). A further relevant point is that the possible mechanisms and directions of change may in all cases be subject to UG-derived constraints. In the area of word-order/linearization a particularly interesting one may be the ways in which the Final-over-Final Condition (FOFC) constrains possible ‘intermediate’ systems in change from general head-initial to general head-final order, or vice versa. FOFC rules out a head-initial category as the complement of a head-final category (in the same extended projection; see Biberauer, Holmberg, and Roberts 2010 and Biberauer and Sheehan 2012 for slightly different versions of the constraint):

(2) *[ZP [XP X YP] Z]

As pointed out in the above references, this implies the following two diachronic paths:

(3) a. [[[O V] I] C] → [C [[O V] I]] → [C [I [O V]]] → [C [I [V O]]].
    b. [C [I [V O]]] → [C [I [O V]]] → [C [[O V] I]] → [[[O V] I] C].

Any other path will violate FOFC at some diachronic stage. See in particular Biberauer, Newton, and Sheehan (2009a,b), Biberauer, Sheehan, and Newton (2010) for discussion.
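The claim that only the two paths in (3) avoid FOFC can be checked mechanically. The following is a minimal sketch, not part of the original analysis. It assumes a simplified clausal spine of just three heads (C, I, V), assumes that each diachronic stage changes the headedness of exactly one head, and treats FOFC as the ban in (2) on a head-final phrase with a head-initial complement. The function names are illustrative only.

```python
from itertools import permutations

# Heads of the clausal spine, highest first (C > I > V), as in the paths in (3).
HEADS = ("C", "I", "V")


def violates_fofc(orders):
    """True if some head-final phrase has a head-initial complement, as in (2)."""
    return any(
        orders[higher] == "final" and orders[lower] == "initial"
        for higher, lower in zip(HEADS, HEADS[1:])
    )


def linearize(orders):
    """Bracketed string for one stage, e.g. [C [I [V O]]]."""
    structure = "O"
    for head in reversed(HEADS):  # build from V upwards
        if orders[head] == "initial":
            structure = f"[{head} {structure}]"
        else:
            structure = f"[{structure} {head}]"
    return structure


def safe_paths(start, target):
    """Routes from all-`start` to all-`target`, flipping one head per stage,
    in which no stage violates FOFC."""
    routes = []
    for flip_order in permutations(HEADS):
        stage = dict.fromkeys(HEADS, start)
        route = [linearize(stage)]
        for head in flip_order:
            stage[head] = target
            if violates_fofc(stage):
                break
            route.append(linearize(stage))
        else:  # no break: every stage was FOFC-compliant
            routes.append(route)
    return routes


for route in safe_paths("final", "initial") + safe_paths("initial", "final"):
    print(" → ".join(route))
```

Under these assumptions the search finds exactly one route in each direction: head-initial order must spread from the top of the clause downwards, and head-final order from the bottom upwards.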
In the rest of this section, I will outline three further possible parameter hierarchies and briefly consider their possible diachronic consequences.

15.2.1 Null Arguments

Following Holmberg (2010) and Roberts (2010a,b), I take null arguments to arise through pronoun deletion, which can take place under the generalized recoverability condition that the formal features of the goal be (properly) included in the features of the probe (we refer to such goals as ‘defective’). The relevant parameter hierarchy is then as follows (here ‘fully specified’ means recoverably specified, permitting recoverability):

(4) Are φ-features obligatory on all probes?
      No: Radical pro-drop (a)
      Yes: Are φ-features fully specified on all probes?
           Yes: Pronominal arguments (b)
           No: Are φ-features fully specified on some probes?
               No: Non-null-subject (c)
               Yes: Are the φ-features of {T, v, ..} fully specified? Italian, etc.
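The hierarchy in (4) can be read as a sequence of yes/no questions in which each answer either fixes a type or hands over to the next question. The sketch below encodes that reading. It assumes, purely for illustration, that each question has a clean Boolean answer for a given language; the function and parameter names are hypothetical, and the sample calls follow the classification of languages given in this section.

```python
def null_argument_type(phi_on_all_probes,
                       fully_specified_on_all=False,
                       fully_specified_on_some=False):
    """Walk the hierarchy in (4) from the top and stop at the first decisive answer."""
    if not phi_on_all_probes:
        return "radical pro-drop (a)"
    if fully_specified_on_all:
        return "pronominal arguments (b)"
    if not fully_specified_on_some:
        return "non-null-subject (c)"
    # Below this point the hierarchy breaks down into microparameters.
    return "null arguments on some probes (Italian, etc.)"


# Idealized profiles, following the discussion of types (a)-(c) in the text.
print(null_argument_type(phi_on_all_probes=False))           # Japanese-like
print(null_argument_type(True, fully_specified_on_all=True)) # Navajo-like
print(null_argument_type(True))                               # English-like
print(null_argument_type(True, fully_specified_on_some=True)) # Italian-like
```

Later questions are consulted only when earlier ones leave the outcome open, which is one way of picturing why the higher options count as simpler.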

This parameter falls fairly cleanly into a hierarchy. Type (a) languages include Japanese, Chinese, and many other East Asian languages, which lack agreement-marking altogether and yet permit any pronoun to be dropped under appropriate discourse conditions. Saito (2007) argues that the absence of ϕ-features means that no head can be an active probe (since, following Chomsky 2001, active probes by definition have unvalued ϕ-features), and hence arguments may freely fail to appear in overt syntax, being inserted only in LF. This kind of system represents the maximum of ϕ-impoverishment. Type (b) languages include a number of Amerindian languages, notably Navajo, analysed by Jelinek (1984): these languages also allow all pronominal arguments to drop, but differ from the East Asian type in showing fully specified subject- and object-agreement, as well as possessor agreement in nominals, and free word order. Hence these languages represent the maximal case of morphological ‘richness’ allowing recoverable deletion of pronouns. Type (c) is represented by English and the North (and perhaps West) Germanic languages (with the possible exception of Icelandic), languages which do not allow null pronominals at all. As indicated, the hierarchy breaks down into microparameters at this point (see Holmberg and Roberts 2010 [this volume, Chapter 14] for details), starting with those languages which, like Italian, allow null pronominal subjects only. Looking at this hierarchy from a diachronic perspective, we can readily observe that the change from null-subject to non-null-subject is well-attested. Perhaps the best-known case study is French (see Adams 1987a,b, Roberts 1993, Vance 1989, 1997), but it is very likely that the Germanic languages, either together or separately, must have moved from a null-subject system to a non-null-subject system at some stage in their prehistory, given the general prevalence of null subjects in other branches of Indo-European.

Concerning the higher parts of the hierarchy, one question that can be asked is whether colloquial French is moving to a pronominal argument system in which all arguments are obligatorily realized as clitics with fully nominal ‘doubles’ realized in adjunct positions, along the general lines of pronominal-argument languages as analysed by Jelinek. Superficially, examples like (5) suggest that this is the case:

(5) (Moi), (le livre), je le lui ai donné, (à Jean).
    Me, the book, I it to-him have given, to John.
    ‘I have given the book to John.’

This kind of sentence is natural in contemporary spoken French. Here all the bracketed constituents are optional as long as all the arguments are realized as clitics. Among others, Harris (1978) suggests that French is moving in the direction of having a very rich system of preverbal agreement markers (the clitics) with optional doubling. However, the only obligatory clitic is the subject je, and if the other clitics are not present the relevant putative ‘double’ must occupy the appropriate argument position, with very little freedom of word order. French may thus show a tendency in the direction of a pronominal agreement system, but the only argument for which this is clearly the case is the subject. Nonetheless, these varieties of French give us a picture of how a general pronominal-argument system might develop. Can a system move from the pronominal-argument option to radical pro-drop? This would have to involve the loss of all agreement morphology. Of course, loss of agreement-marking is a readily attested kind of change, very evident in the history of English as is well-known. But here what is required is the complete loss of valuation of ϕ-features of all arguments. There is some suggestive evidence from Brazilian Portuguese that the loss of clitics can bring about argument drop (in particular the loss of direct-object and impersonal clitics seems to have brought about a number of what, from the general Romance perspective, are highly unusual cases of null subjects and objects respectively; see in particular Duarte 1995 and the papers in Kato and Negrão 2000). We can thus see ways in which languages may, over time, ‘move up’ the null-argument hierarchy.

15.2.2 Word Structure

This replicates the oldest typology proposed (Schlegel 1817, Schleicher 1861–2, Sapir 1921).
The conjecture is that nineteenth-century typology observed these highly salient properties of word structure and naturally attributed them to morphology, when in fact they are determined by syntax, more precisely by the syntax of incorporation. In Roberts (2010c) I put forward a general account of incorporation in terms of the framework of Chomsky (2001), in which the central idea, again, is that of a defective goal (i.e. a category whose formal features are properly included in those of its

probe). If the defective goal is a minimal category in bare-phrase-structure terms, then incorporation results from the feature-copying involved in the agree/match relation. In these terms, polysynthetic languages allow productive incorporation of lexical roots, notably N-to-V incorporation. Fully analytic languages such as Chinese disallow head-movement even at the lowest structural level (V-to-v and N-to-n; Huang 2015). Fusional languages relativize head-movement to categories: familiar V-movement parameters fall under this heading (cf. the link between V-to-T movement and inflection on V discussed by Vikner 1997, Biberauer and Roberts 2010 and others). The hierarchy is as follows:¹

(6) Do all probes trigger head-movement?
      Y: polysynthesis (a)
      N: Do some probes trigger head-movement?
         N: analytic (b)
         Y: does {C, T ...}? (c)

Type (a) is instantiated by a number of Amerindian polysynthetic languages, notably Mohawk, as analysed in detail by Baker (1996). Type (b) is Chinese, and type (c) is instantiated by the Romance and Celtic languages for V-movement to T, and the non-English Germanic languages for V-movement to (root) C. Looking at this hierarchy from a diachronic perspective, we can observe that the well-known tendency across Indo-European for inflection and concomitant head-movement to be lost (most clearly observed in the history of English and the North Germanic languages, see again Vikner 1997 and the references given there) shows how systems ‘move up’ to approximate to the analytic type. Of course, at the very top, it is hard to see how a system might change from analytic to polysynthetic. Perhaps in certain cases the distances between the highest values are simply too great for any normal, acquisition-driven, process of change to traverse.

15.2.3 Alignment

The alignment of case and agreement marking with grammatical functions such as subjects and objects is highly variable across languages, and most languages have some means of altering their unmarked alignment (arguably the commonest being the passive, although causatives, applicatives and ‘dative-shift’ also do this, while psychological predicates often show marked alignment and, internal to nominals, clausal patterns are only partially replicated). The basic patterns are captured in Chomsky’s (2000, 2001) probe-goal-agree system. There Chomsky proposes that if a probe bears the movement diacritic, it causes the goal to move to its specifier. This accounts for subject-raising to Spec,TP and basic cases of object shift.

Collins (2005) argues convincingly that passives are derived by ‘smuggling’ a participial phrase over the first-merged external argument in Spec,vP, thereby making the object closer to T and facilitating object movement to the subject position (with the subject staying in its first-merged position and being realised as a ‘by-phrase’). Roberts (forthcoming) generalizes Collins’ account of passives and argues for the existence of a class of ‘indirect’ derivations which are formally characterized by the simple difference that the probe triggers movement of a category distinct from the goal. In ‘direct’ derivations, on the other hand, the probe triggers movement of its goal. More precisely, consider (8):

(8) H1 [H2 [H3P DP1 H3 [YP Y DP2 ]]]

H2 is a phase head, endowed with movement-triggering (EPP/EF) and probing features; the latter may be ‘inherited’ by the non-phase head H3 (Chomsky 2008). H1 is external to the phase headed by H2, but may have probing features. Both DPs have unvalued features and so can be goals and can move. YP is a lexical predicative category and as such may move but is not a possible goal for any of H1–3; neither is it a probe, being lexical. In this situation there are just two possibilities for a well-formed derivation in which locality conditions are met and probing features valued:

(i) H2 attracts DP1, which is probed by H1 (and possibly further attracted), while H3 inherits H2’s probing features and probes DP2. This is a direct derivation.
(ii) H2 attracts YP and probes DP1, which does not move; H1 probes (and possibly attracts) DP2; H3 is inert. This is an indirect derivation.
The clausal structure assumed in relation to (8) is slightly more complex than the standard one, in that the head of the ‘internal phase’ is not v but Voice:

(9) T=H1 [VoiceP Voice=H2 [vP DP1=EA v=H3 [VP V DP2=IA ]]]

In an active derivation, v inherits probing features from Voice and thereby probes the IA, but Voice retains the movement-triggering feature and attracts the EA to its specifier; the EA is then further attracted to T. In a passive derivation, Voice withholds its ϕ-features from v and thereby licenses the EA; its movement-triggering feature then causes VP to move. In this way, the IA is placed in a position where it can and must be probed by T. This distinction between direct and indirect derivations underlies the active-passive alternation, two main types of causative constructions found cross-linguistically, ‘dative-shift’ in languages like English, and variation in the realization of arguments of psychological predicates both within and across languages. Most importantly for present purposes, it also underlies

the ergative-accusative alternation. The structure of an ergative clause is as in (10) (which is in fact the same as (9)):

(10) T/Aux [VoiceP Voice [vP EA v [VP V IA]]]

Where T takes the direct option, it licenses and raises the EA, as is standard. Where it takes the indirect option, VP raises to Spec,VoiceP over the subject in Spec,vP and the object, if present, is licensed by T: ergative alignment results, with the external argument being licensed by Voice. If the object is not present, T can license the subject (if Voice nonetheless has a probing feature, this may give rise to a tripartite system). Hence the ergative pattern (intransitive subject patterns with transitive object) can be derived. In a split-ergative system, T has the relevant set of tense/aspect properties, and selects the relevant type of VoiceP complement. Split-ergative patterns of the kind common in Indo-Iranian languages can be derived by distinguishing perfective and imperfective T, with just the former selecting Voice which triggers the indirect derivation and hence ergative alignment. The alignment hierarchy, to a first approximation, is as follows:

(11) Does Voice trigger a direct derivation?
       Y: accusative (a)
       N: does Voice trigger an indirect derivation?
          Y: does Voice probe the object?
             Y: tripartite (c)
             N: Voice indirect just with unaccusatives?
                Y: active (b)
                N: ergative (d)
          N: {[αAsp],[βPers]} T triggers indirect derivation?

Type (a) languages show the familiar accusative alignment, covertly in English, overtly in Latin, Russian, Japanese, etc. Type (b) languages show ergative alignment only with the single argument of an unaccusative verb, e.g. Basque. Type (c) languages distinguish transitive subjects, intransitive subjects, and direct objects, hence must have a probe distinct from T for one of the last two, i.e. Voice; this is what is found in Hindi and Marathi. Type (d) languages show the straightforward ergative alignment, e.g. Lezgian, Chukchi, numerous Australian languages. Again, the split-ergative pattern gives rise to a range of microparametric variation determined by the exact features of T. From a diachronic perspective, the hierarchy in (11) predicts a preference for accusative systems generally, and indeed a preference for systems to shift from ergative to accusative alignment. This is a further prediction which requires close empirical evaluation and is supported by Nichols’ (1994) observation that ergativity appears to be a recessive phenomenon.
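The surface patterns that distinguish types (a)–(d) can be stated as a comparison of the marking found on four argument types. The sketch below is a descriptive classifier only and does not model the direct/indirect derivations themselves. The marker labels (NOM, ACC, ERG, ABS) are placeholder strings chosen for illustration, and treating unergative subjects in an active system as patterning with transitive subjects is an assumption of the sketch.

```python
def classify_alignment(a, o, s_unergative, s_unaccusative):
    """Classify alignment from the marking on the transitive subject (a), the
    direct object (o), and the sole argument of unergative and unaccusative verbs."""
    if s_unergative != s_unaccusative:
        # Split intransitives: only the unaccusative argument patterns with the object.
        if s_unergative == a and s_unaccusative == o:
            return "active (b)"
        return "unclassified"
    s = s_unergative
    if s == a and s != o:
        return "accusative (a)"
    if s == o and s != a:
        return "ergative (d)"
    if s != a and s != o and a != o:
        return "tripartite (c)"
    return "unclassified"


print(classify_alignment("NOM", "ACC", "NOM", "NOM"))  # accusative (a)
print(classify_alignment("ERG", "ABS", "ERG", "ABS"))  # active (b)
print(classify_alignment("ERG", "ACC", "NOM", "NOM"))  # tripartite (c)
print(classify_alignment("ERG", "ABS", "ABS", "ABS"))  # ergative (d)
```

Split-ergative systems would need the same comparison to be run separately for each tense/aspect or person value, which the sketch does not attempt.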

One very interesting and well-known diachronic observation, which in fact appears to go against the immediate prediction of the hierarchy in (11), is that ergative alignment tends to originate from passive constructions: it has been proposed that many Polynesian languages have undergone a passive-to-ergative change (Hohepa 1969, Hale 1970), and that split-ergativity in Indo-Iranian is derived from an earlier passive construction (Butt and Deo 2005, Garrett 1990, Harris and Campbell 1995). We can in fact understand this in terms of two things: why it is that such a ‘low’ category moves in indirect derivations, and this in interaction with FOFC. On the first point, recall that in indirect derivations, the goal does not move by definition. The complement of the goal cannot move by anti-locality (Abels 2003; see Biberauer, Holmberg, and Roberts 2010 for a refinement), hence the closest category available for movement is the complement of the complement. This is why VP moves in indirect derivations triggered by Voice. However, here an important refinement becomes relevant. Consider the structure that results from VP-fronting in an indirect derivation:

(12) T [VoiceP [VP V IA] Voice [vP EA v (VP)]]

All other things being equal, the boldfaced part of (12) violates FOFC (see (2)). Now, in a head-final language, all things are not equal as the order of IA and V is inverted and we no longer have a FOFC violation here. Assuming general ‘roll-up’ of all complements, we have a structure like (13) instead:

(13) [VoiceP [vP EA [VP IA V (IA)] v (VP)] Voice (vP)] T (VoiceP)

This gives SOV surface order (with the verb possibly showing the structure root+voice+tense, which is very common crosslinguistically, see Julien 2002) and no FOFC violation. But of course passives (and other types of indirect derivation) are attested in SVO and languages with other word orders. Here, one possibility is that participial morphology plays a role.
Collins’ (2005) analysis of the passive is actually a little more complex than the above summary suggests in that he suggests that the VP is contained inside a participle phrase (PrtP). It is therefore this category which moves in an indirect derivation. In fact, if the direct object is indefinite and the expletive there appears in Spec,TP, we can see the PrtP in its fronted position, without object movement, in examples like (14):²

(14) There were [VoiceP [PrtP many students arrested VP ] Voice [vP EA v (PrtP)]].

Here again, the boldfaced part of the structure shows an apparent FOFC violation. However, Biberauer, Holmberg, and Roberts (2010) observe and explain what they call the Category Proviso to FOFC. This states that, in

the basic FOFC configuration in (2), repeated here, if XP and Z are distinct in category FOFC does not apply:

(2) *[ZP [XP X YP] Z]

The notion of category distinction that Biberauer, Holmberg, and Roberts apply is related to the notion of Extended Projection, as originated by Grimshaw (1991). Hence, all the functional categories making up the clause count as the same category, and similarly in the complex nominal making up the extended DP. However, we can adopt a traditional idea and consider that participles are not truly verbal, but share certain features (the ability to show gender but not person agreement in Romance, for example) with nominals rather than with verbs. If so, then we may be justified in regarding the PrtP in (14) as categorially distinct from Voice, which is clearly a verbal category. In that case, (14) does not instantiate a FOFC violation. More generally, indirect derivations in head-initial languages will be allowed, but on condition that the moving category be in some way nonverbal, characteristically a participial or infinitival element. This condition does not hold in head-final languages. So we predict that participial passives will be general in VO languages, and available as an option but not required in OV languages.³ What about ergative languages? Here there is no good motivation for invoking general participial morphology on all verbs. And in fact, we find a very clear skewing of ergativity in relation to word order: according to WALS (Map 81, Dryer 2008, and Map 98, Comrie 2008) SOV languages make up twenty-one out of forty (just over 50%) of ergative languages (including active and tripartite languages), and about 46% (497/1,223) of languages of all alignment types.
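The two proportions just cited can be recomputed directly from the counts given in the text. The snippet below takes those counts at face value and introduces no data beyond them; the variable names are illustrative.

```python
# Counts as quoted in the text from WALS Maps 81 and 98.
sov_among_ergative, ergative_total = 21, 40
sov_among_all, all_total = 497, 1223

share_ergative = sov_among_ergative / ergative_total
share_all = sov_among_all / all_total

print(f"SOV among ergative languages: {share_ergative:.1%}")
print(f"SOV among all languages:      {share_all:.1%}")
```

The first ratio is 52.5%, in line with ‘just over 50%’. The second works out at roughly 41% rather than 46%, so either the count or the percentage quoted for the full sample may need to be checked against WALS.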
SVO languages make up 36% of the 1,228 languages surveyed for clausal word order, but there are no SVO languages showing an ergative pattern out of 190 languages surveyed, and only one showing an active-inactive pattern (Drehu; Oceanic). Even when one corrects for the bias towards case-marking in SOV as opposed to SVO languages, this appears to be a significant and unexpected skewing. If ergative case-marking depends on an indirect derivation, FOFC is violated if VO order is retained. Hence the verb, the object or the entire VP must move again, entailing some order other than SVO. This evidence supports the idea that ergative derivations do not involve a special category ‘protecting’ the fronted category from the effects of FOFC, while passives do. We are now in a position to understand the nature of the passive-to-ergative shift. It may simply involve the loss of passive morphology (or the reinterpretation of this morphology as something else functioning as part of the verbal sequence of functional heads, e.g. as an aspect marker), and the retention of the indirect derivation. This derivation is ruled out by FOFC in an SVO system, but allowed in an SOV system. Hence the passive-to-ergative shift will be possible in SOV languages, and in this context

we can note that the Indo-Iranian languages are SOV (and in fact, their word order has become more rigidly OV diachronically, since Sanskrit allowed a good deal of deviation from what was probably a basic OV pattern, like most of the older Indo-European languages; see Hale 1995 on Sanskrit word order). The Polynesian languages which have undergone this change present more of a challenge. Since they are VSO alternating with VOS, however, they may be amenable to the analysis proposed in Massam (2005), who proposes that these languages may in fact allow nominalized predicates; this property might also explain the relative rarity of VOS languages.⁴ It is clear that a number of questions remain open, but at the same time the postulation of the hierarchy in (11), like the others in the previous sections, raises a number of interesting questions for both synchronic and diachronic syntax.

15.2.4 Conclusion

Although the above hierarchies almost certainly do not exhaust the inventory of syntactic macroparameters, together they determine a very large number of highly salient, yet still variable, surface properties. At the true macrolevel (i.e. the first choice point, highest in the hierarchy), and leaving aside word structure, these hierarchies determine the following surface properties (other things being equal):

(15) a. Head-final.
     b. Radical pro-drop.
     c. ‘Free word order.’
     d. Accusative alignment.

These properties together define a very common type, including the Dravidian, Altaic, and most Finno-Ugric (except Finnish and one or two others) languages, as well as Japanese and Korean. Of course, it is possible that these shared features are a macro-areal phenomenon, but this set of correspondences is predicted by the hierarchies, and the incompatibility of the polysynthesis value with the others is also predicted, since polysynthetic languages always allow free word order and pro-drop of all arguments (see Baker 1996).
A further point is that, as syntactic parameters, the effects of these parameters must be visible in every language: no language can fail to choose basic order, whether or not to realize pronominal arguments, basic word structure, and the type of derivation which licenses subjects and direct objects. Similarly, the evidence for setting these parameters is highly salient in the primary linguistic data of language acquisition, and hence acquirers can be expected to arrive at the macrovalues very early (see Wexler 1998 on very early parameter setting). This can be seen in terms of the notion of

parameter-expression, introduced in Clark and Roberts (1993 [this volume, Chapter 2]) (this definition is from Roberts and Roussou (2003:15)):

(16) A substring of the input text S expresses a parameter Pi just in case a grammar must have Pi set to a definite value in order to assign a well-formed representation to S.

Consider, for example, a very simple sentence such as the following:

(17) He ate it.

This sentence expresses the following macroparametric properties of English: SVO order, no null arguments (indirectly, if the referents of the pronouns are salient in context since they can be supposed to be empty in a system allowing null pronouns), morphological fusion and accusative alignment (indirectly, in relation to other parts of the paradigm). Comparable examples in other languages similarly express the major parameter values:

(18) a. L’ha mangiato. (Italian)
        It.s/he-has eaten
        ‘S/he has eaten it’
     b. Tabe-ta. (Japanese)
        eat-past
        ‘S/he has eaten it’
     c. Shako-núhwe’-s (Mohawk)
        MsS/3PO-like-HAB
        ‘He likes them’ (Baker 1996)

As (18a) shows, this simple Italian sentence expresses that language’s values for null arguments, i.e. that it is a null-subject language, in that the subject pronoun is not expressed while the object is; it also expresses morphological fusion and accusative alignment. What it does not express is the SVO word order of Italian; in fact the position of the pronominal object clitic expresses OV order (although of course the corresponding sentence with a non-pronominal object would clearly express the general VO order of Italian). The Japanese example (18b) expresses the general radical pro-drop nature of Japanese, and its agglutinating nature (the past-tense morpheme -ta regularly suffixes to verbs). Since the subject and object are not expressed, though, it does not express either alignment or word order. Finally, the Mohawk example in (18c) expresses the pronominal-argument nature of the language (through the syncretic subject/object agreement prefix). These examples illustrate the salience of the hierarchies in the PLD. Further, it is clear that the hierarchies interact. As mentioned above, polysynthetic languages always allow pro-drop of all arguments (Baker 1996), agglutinating languages show a strong tendency to head-finality (although

the SVO Bantu languages are an exception) and, as noted above, there are no SVO ergative languages. In some cases, as is particularly clear from Baker’s work on polysynthesis and from the suggestion in §2.5 regarding SVO ergative languages, there are principled reasons for this.

15.3 Theoretical Questions

In addition to the empirical richness of the predictions and implications of the parameter hierarchies, a number of theoretical issues arise. As mentioned in §1, the central theoretical question, given both Chomsky’s recent proposals and Newmeyer’s critique of P&P theory, concerns whether the parametric variation observed in the hierarchies is specified in UG (as thought in the 1980s and 1990s), or whether it may be derived through ‘third-factor’ considerations as suggested by Chomsky (2005, 2007). The parameter hierarchies are defined by complexity relations: the higher settings are simpler, having a shorter description, than the lower ones. Both this notion of complexity and the acquisition strategy of Input Generalization are instances of a general notion of computational conservatism, which we can think of as a facet of computational efficiency. Input Generalization can be defined as follows:

(19) Generalization of the input: If acquirers assign a marked value to H, they will assign the same value to all comparable heads. (Roberts 2007)

(19) leads all the potentially movement-triggering functional heads to ‘point the same way’. It is not a grammatical principle, but an acquisition strategy, and is motivated by computational conservativity. Mobbs (2008) suggests that this is a reflection of a non-language specific optimization principle. On this view, then, macroparametric effects in grammatical systems derive from markedness, which emerges from the computational conservativity of the learner. In these terms, as we have seen, there is no need to formulate a difference between micro- and macroparameters. It emerges given our characterization of markedness. Hence, the form of the hierarchies, and the nature of markedness, arise from third-factor properties.
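The strategy in (19) can be pictured as a default-filling procedure. The following sketch is a deliberately crude rendering and not a model of acquisition. It assumes that the learner has direct evidence about only some heads, that values are simple labels, and that a single marked value is extended to every comparable head for which there is no evidence to the contrary. The function name and the value labels are hypothetical.

```python
def generalize_input(heads, evidence, default="unmarked"):
    """Toy version of (19): extend a marked value found on one head to all
    comparable heads that the input does not bear on directly."""
    marked_values = {value for value in evidence.values() if value != default}
    settings = {}
    for head in heads:
        if head in evidence:
            settings[head] = evidence[head]           # the data decide
        elif len(marked_values) == 1:
            settings[head] = next(iter(marked_values))  # generalize the marked value
        else:
            settings[head] = default                   # nothing (unique) to generalize
    return settings


heads = ("C", "T", "v")
print(generalize_input(heads, {"v": "head-final"}))
print(generalize_input(heads, {"v": "head-final", "C": "unmarked"}))
print(generalize_input(heads, {}))
```

In the first call evidence about a single head yields a uniformly head-final system, the macroparametric effect. In the second, counter-evidence for C blocks generalization to that head only, giving a more ‘micro’ and more complex setting.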
Moreover, the different points in the hierarchies all instantiate the following general schema:

(20) $Q(f_{f \in C})\,[P(f)]$

Here Q is a quantifier, f is a formal feature, C is a class of grammatical categories providing the restriction on the quantifier, and P is a set of predicates defining formal operations of the system (‘agrees’, ‘has an EPP feature’, ‘attracts a head’, etc.). The positions in the hierarchies differ in the specificity of the two arguments to the quantifier, C, the class of grammatical

categories, and P, the predicates defining (conjunctions of) grammatical operations. The more specific either of these arguments, the more complex, and the more ‘micro’, the parameter. But arguably none of this is specified in UG: all that (20) really says is that UG leaves certain grammatical properties open (‘some quantification over formal features has some grammatical property’). The gaps left open are filled in by Input Generalization and the mode in which the learner ‘moves down’ the hierarchy, stopping, as stated in §1, at the earliest possible point compatible with experience. The form of parameters is thus not specified by UG, but is an emergent property of the interaction of UG, the acquirer and the data. In this way, parametric variation in fact arises from all three of the factors Chomsky (2005) discusses as contributing to language design: UG (under-specification), PLD, and the computational conservatism of the learner, which underlies (19). This approach is clearly highly compatible with recent minimalist thinking. Finally, a reconsideration of the formal mechanisms implicated in the four parametric hierarchies is of theoretical interest. Linearization and alignment depend on movement, and hence ultimately on the distribution of the movement trigger. On the other hand, the null-argument and word-structure hierarchies can be defined in purely set-theoretic terms; they are really just special cases of Agree, in that a given probe-goal relation has specific consequences (deletion, incorporation); the choice of features related by Agree may have a range of syntactic consequences, as well as giving rise to differing morphological results. However, the movement trigger is not really a feature: it cannot be valued, checked, or, arguably, counted.
Instead it should be seen as a consequence of the fact that merge is not restricted to applying only once: a head may choose to ‘remerge’ part of its complement, the second-merged occurrence of the complement will inevitably asymmetrically c-command the first-merged one and the head, and so PF will linearize it to the left of the head and delete the first-merged occurrence. A head can do this just once because the system cannot count. This, effectively the general option of movement, may be the only contribution UG itself makes to cross-linguistic variation. The rest is on account of third-factor considerations and externalization. So we arrive at a slightly nuanced version of the Berwick-Chomsky conjecture: UG does not mind about movement, hence the wide range of surface variation following from the simple general option of remerge.
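The schema in (20) can be given a toy computational reading in which a parameter value is a quantified statement over a class of heads. The sketch below assumes a drastically simplified encoding that is not part of the text: a grammar is a mapping from heads to the set of operations they trigger, the restriction C is a tuple of head labels, and the predicate P is membership of one operation label in that set. All names and the three toy grammars are invented for illustration.

```python
def none_of(values):
    """Quantifier 'no': true if the predicate holds of no member of the class."""
    return not any(values)


def holds(quantifier, heads, operation, grammar):
    """Evaluate Q(f in C)[P(f)]: does `operation` hold of Q-many heads in `heads`?"""
    return quantifier(operation in grammar.get(head, set()) for head in heads)


CLAUSAL_HEADS = ("C", "T", "v")
HM = "head-movement"

# Three invented grammars, loosely modelled on the types in (6).
polysynthetic = {"C": {HM}, "T": {HM}, "v": {HM}}
analytic = {"C": set(), "T": set(), "v": set()}
fusional = {"C": set(), "T": {HM}, "v": {HM}}

print(holds(all, CLAUSAL_HEADS, HM, polysynthetic))  # all probes
print(holds(none_of, CLAUSAL_HEADS, HM, analytic))   # no probes
print(holds(any, CLAUSAL_HEADS, HM, fusional))       # some probes
print(holds(all, ("T",), HM, fusional))              # narrower class: more 'micro'
```

Each call evaluates to True for the grammar it is paired with. The last call shows how narrowing the restriction from the whole class to a single head gives a longer, more specific and hence more ‘micro’ description of the same system.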

Notes

1 Agglutinative systems, on the other hand, may fall outside this hierarchy, if it is correct that they result from the combination of head-final order, involving complement-to-specifier movement with the head of the target phonologically realised (Julien 2002).
2 Assuming that the verb root moves to Prt here (in order to pick up participial morphology), this example also shows that Prt triggers movement of the object. Presumably this is connected to Prt probing a subset of the object’s φ-features, as can be overtly seen in past-participle agreement for gender and number in Romance.
3 It is unclear how well this prediction holds up; WALS unfortunately does not supply sufficient data on passive constructions, distinguishing just between their presence and absence.
4 A related point is that it is very likely that there are two different kinds of VSO languages, those which allow an alternative VOS order, have impoverished tense inflection, and allow ergative alignment, and those which show none of these properties. The former type is exemplified by Polynesian VSO languages, the latter by Celtic and Semitic VSO languages; see Biberauer and Roberts (2010) for details and a proposed explanation.

References

Abels, K. (2003) Successive Cyclicity, Anti-Locality and Adposition Stranding. PhD Dissertation, University of Connecticut.
Adams, Marianne. (1987a) From Old French to the theory of pro-drop. Natural Language and Linguistic Theory 5:1–32.
Adams, Marianne. (1987b) Old French, Null Subjects and Verb Second Phenomena. Doctoral dissertation, UCLA, Los Angeles, Calif.
Baker, M. (1996) The Polysynthesis Parameter. Oxford: Oxford University Press.
Baker, M. (2001a) The Atoms of Language. Oxford: Oxford University Press.
Baker, M. (2001b) The natures of non-configurationality. In M. Baltin & C. Collins (eds.). Oxford: Blackwell, pp. 407–438.
Baker, M. (2008a). The Syntax of Agreement and Concord. Cambridge: Cambridge University Press.
Baker, M. (2008b) The macroparameter in a microparametric world. In T. Biberauer (ed.) The Limits of Syntactic Variation. Amsterdam: Benjamins, pp. 351–374.
Berwick, R. & N. Chomsky (2011). The Biolinguistic Program: the current state of its development. In A-M. di Sciullo & C. Boeckx (eds.). The Biolinguistic Enterprise: New Perspectives on the Evolution of Language and the Nature of the Human Language Faculty. Oxford: Oxford University Press, pp. 19–41.
Biberauer, T., A. Holmberg & I. Roberts (2010) A Syntactic Universal and Its Consequences. Ms. Universities of Cambridge and Newcastle.
Biberauer, T., G. Newton & M. Sheehan (2009a) The Final-over-Final Constraint and predictions for diachronic change. In R. Compton & M. Irimia (eds) Toronto Working Papers in Linguistics 31:1–17.
Biberauer, T., G. Newton & M. Sheehan (2009b) Limiting synchronic and diachronic variation and change: The Final-Over-Final Constraint. Language and Linguistics 10:699–741.
Biberauer, T. & I. Roberts. (2010) Subjects, tense and verb-movement. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 263–302.
Biberauer, T. & M. Sheehan (2012) Disharmony, antisymmetry, and the final-over-final constraint. In M. Uribe-Etzebarria & V. Valmala (eds) Ways of Structure Building. Oxford: Oxford University Press, pp. 206–244.
Biberauer, T., M. Sheehan & G. Newton (2010). On impossible changes and impossible borrowings. In A. Breitbarth, C. Lucas, S. Watts & D. Willis (eds) Continuity and Change in Grammar. Amsterdam: Benjamins, pp. 35–60.
Boeckx, C. (2011) Approaching parameters from below. In A.-M. di Sciullo & C. Boeckx (eds) The Biolinguistic Enterprise: New Perspectives on the Evolution and Nature of the Human Language Faculty. Oxford: Oxford University Press, pp. 205–212.
Borer, H. (1984) Parametric Syntax. Dordrecht: Foris.
Butt, M. & A. Deo (2005) Ergativity in Indo-Aryan. Ms. Universities of Constance and Stanford.
Chomsky, N. (1995) The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. (2000) Minimalist inquiries: The framework. In R. Martin, D. Michaels and J. Uriagereka (eds.). Step by step: essays on minimalist syntax in honor of Howard Lasnik. Cambridge, MA.: MIT Press, pp. 89–156.
Chomsky, N. (2001) Derivation by phase. In M. Kenstowicz (ed.). Ken Hale: A Life in Language. Cambridge, MA.: MIT Press, pp. 1–53.
Chomsky, N. (2002) On Nature and Language. Cambridge: Cambridge University Press.
Chomsky, N. (2005) Three factors in language design. Linguistic Inquiry 36: 1–22.
Chomsky, N. (2007) Approaching UG from below. In H.-M. Gärtner and U. Sauerland (eds.) Interface + Recursion = Language? Chomsky’s Minimalism and the View from Syntax and Semantics. Berlin: Mouton de Gruyter, pp. 1–29.
Chomsky, N. (2008) On phases. In R. Freidin, C. Otero and M-L. Zubizarreta (eds). Foundational Issues in Linguistic Theory. Cambridge, MA: MIT Press, pp. 133–166.
Clark, R. & I. Roberts (1993) A computational model of language learnability and language change. Linguistic Inquiry 24: 299–345 [this volume, Chapter 2].
Collins, C. (2005) A smuggling approach to the passive in English. Syntax 8: 81–120.
Comrie, B. (2008) Alignment of case marking of full noun phrases. In Martin Haspelmath, Matthew S. Dryer, David Gil & Bernard Comrie (eds.) (2008) The World Atlas of Language Structures Online. Munich: Max Planck Digital Library, chapter 98. Available online at http://wals.info/feature/98. Accessed on February 20, 2010.
Dresher, E. (1999) Charting the learning path: Cues to parameter setting. Linguistic Inquiry 30: 27–68.
Dryer, M. (2008) Order of subject, object and verb. In Martin Haspelmath, Matthew S. Dryer, David Gil & Bernard Comrie (eds.) The World Atlas of Language Structures Online. Munich: Max Planck Digital Library, chapter 81. Available online at http://wals.info/feature/81. Accessed on February 20, 2010.
Duarte, E. (1995) A perda do princípio ‘Evite pronome’ no português brasileiro. PhD Dissertation, University of Campinas.
Garrett, A. (1990) The origin of NP split ergativity. Language 66: 261–296.
Grimshaw, J. (1991) Extended Projection. Ms: Rutgers.
Hale, K. (1970) The passive and ergative in language change: The Australian case. In S.A. Wurm and D. Laycock (eds.) Pacific Linguistic Studies in Honor of Arthur Capell, Sydney, pp. 757–781.
Harris, M. (1978) The Evolution of French Syntax: A Comparative Approach. London: Longman.
Harris, A. & L. Campbell (1995) Historical Syntax in Cross-Linguistic Perspective. Cambridge: Cambridge University Press.
Hohepa, P. (1969) The Accusative to Ergative drift in Polynesian languages. Journal of the Polynesian Society 78: 295–329.
Holmberg, A. (2010) Null subject parameters. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation. Cambridge: CUP, pp. 88–124.
Holmberg, A. & I. Roberts (2010) Introduction. In: T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan. Parametric Variation. Cambridge: CUP, pp. 1–54 [this volume, Chapter 14].
Huang, C.-T. J. (1982) Logical Relations in Chinese and the Theory of Grammar. PhD, MIT.

Huang, C.-T. J. (2015) On syntactic analyticity and parametric theory. In A. Li, A. Simpson & W.-T. D. Tsai (eds) Chinese Syntax in a Cross-Linguistic Perspective. New York/Oxford: Oxford University Press, pp. 1–48.
Jelinek, E. (1984) Empty categories, case, and configurationality. Natural Language and Linguistic Theory, 2(1):39–76.
Julien, M. (2002) Syntactic Heads and Word Formation. Oxford/New York: OUP.
Kato, M. & E. Negrão (eds) (2000) The Null Subject Parameter in Brazilian Portuguese. Frankfurt: Vervuert-IberoAmericana.
Kayne, R. (1994) The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Kayne, R. (2000) Parameters and Universals. Oxford: Oxford University Press.
Kayne, R. (2005a) Movement and Silence. New York/Oxford: Oxford University Press.
Kayne, R. (2005b) Some Notes on Comparative Syntax, with Special Reference to English and French. In G. Cinque and R. Kayne (eds.) Handbook of Comparative Syntax. New York: Oxford University Press, pp. 3–69.
Van Kemenade, A. (1987) Syntactic Case and Morphological Case in the History of English. Dordrecht: Foris.
Lightfoot, D. (1979) Principles of Diachronic Syntax. Cambridge: Cambridge University Press.
Massam, D. (2005) Lexical categories, lack of inflection, and predicate fronting in Niuean. In A. Carnie, S. Dooley and H. Harley (eds). Verb First: On the Syntax of Verb-Initial Languages. Amsterdam: Benjamins, pp. 227–242.
Mobbs, I. (2008) ‘Functionalism’, the Design of the Language Faculty, and Typology. PhD Dissertation, University of Cambridge.
Newmeyer, F. (2004) Against a parameter-setting approach to typological variation. Linguistic Variation Yearbook 4: 181–234.
Newmeyer, F. (2005) Possible and Probable Languages. A Generative Perspective on Linguistic Typology. Oxford: Oxford University Press.
Nichols, J. (1994) Ergativity and linguistic geography. Australian Journal of Linguistics 13:39–89.
Nikitina, T. (2008) Nominalization and Word-Order Change in Niger-Congo. Unpublished ms. Stanford University. [Available at http://www.stanford.edu/~tann/word_order.pdf].
Roberts, Ian. (1993) Verbs and Diachronic Syntax. Dordrecht: Kluwer.
Roberts, I. (2007) Diachronic Syntax. Oxford: Oxford University Press.
Roberts, I. (2010a) A deletion analysis of null subjects. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds) Parametric Variation. Cambridge: Cambridge University Press, pp. 58–87.
Roberts, I. (2010b) Varieties of French and the null subject parameter. In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 303–327.
Roberts, I. (2010c) Agreement and Head Movement: Clitics, Incorporation and Defective Goals. Cambridge, Mass.: MIT Press.
Roberts, I. (forthcoming) Direct and Indirect Derivations. Ms. University of Cambridge.
Roberts, I. & A. Roussou. (2003) Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press.
Saito, M. (2007) Notes on East Asian Argument Ellipsis. Language Research 43:203–227.
Sapir, E. (1921) Language. New York: Harcourt Brace & Co.
Schlegel, A. (1817) Über dramatische Kunst und Litteratur, Grundzüge einer Kultur- und Völkergeschichte Alteuropas. Heidelberg: Mohr & Winter.
Schleicher, A. (1861–2) Compendium der vergleichenden Grammatik der indogermanischen Sprachen. 2 volumes. Weimar: Böhlau.

Vance, B. (1989) Null Subjects and Syntactic Change in Medieval French. PhD dissertation, Cornell University.
Vance, B. (1997) Syntactic Change in Medieval French: Verb Second and Null Subjects. Dordrecht: Kluwer.
Vikner, S. (1997) V°-to-I° movement and inflection for person in all tenses. In L. Haegeman (ed) The New Comparative Syntax. London: Longman, pp. 189–213.
Waddington, C.H. (1977) Tools for Thought. London: Paladin.
Wexler, K. (1998) Very early parameter setting and the unique checking constraint: a new explanation of the optional infinitive stage. Lingua 106: 23–79.

Index

1AEX 262, 271–6
A’-movement 111, 128, 338
acquisition 31–2, 488, 495, 497, 514, 518, 520–1; and parameter hierarchies 536–7, 546, 548; and structural simplification 192–4; see also Language Bioprogram Hypothesis, learnability, primary linguistic data, root infinitive
agent/subject-oriented adverb 27, 263–4
agreement: poor 141, 152, 358, 360, 403; rich 12, 143, 243, 403, 477, 487–8, 494
alignment 541–6
A-movement 90–2, 130, 338, 343, 426
arbitrary reference 270–1
Baker, M. 325, 327–8, 504–5, 507, 509
Bickerton, D.; see Language Bioprogram Hypothesis
binding 265–70, 305–6
Boeckx, C. 521–3
Borer, H. 484
Borer-Chomsky Conjecture (BCC) 501–4, 507
Breton 423, 427–8, 432
bridge verb 371–2, 373, 439
Burzio’s generalisation 264–5
by-phrase 265, 268
Calabrian 186, 189, 211
case 17, 91, 95, 166, 354–6, 480–1; absorption of 264, 277–8; and impersonal passives 276–82; in Middle French 60–2, 70–7; in Modern French 60–6, 300–2, 304, 317, 354; in Old French 60–2, 66–70; and X-second 369, 377, 389–93, 397–8, 400–2
checking 95, 115–16, 122–6, 141, 426
Chinese 480, 510–11
Cinque, G. 202, 203, 218
Clausal Truncation Hypothesis 149–51
clitic: clitic-climbing 124, 346–51, 357, 361; clitic doubling 265, 296–7, 509; clitic left-dislocation (CLLD) 509; clitic-second 380–9; complement clitic 126–7, 130, 167–8, 393; see also cliticisation
cliticisation 112–13, 127, 168, 274, 327, 346–7; in French 71, 77, 79, 153, 300, 341
complementiser 299, 434–9, 441–3, 448, 450, 452; grammaticalisation of 186, 211
complementiser-trace effect 487–93
conditional mood 185, 188
connectivity 204–5
control 263, 476
CP-recursion 315–16, 371–2, 378, 439, 44
crossover effect 124, 130, 261, 266–70
crossover mechanism 48, 50
Danish 89–91, 439
Darwin, Charles 39
determiner 187
Directionality Parameter 104, 128; see also Head Parameter
discourse/radical pro-drop 480–1, 483, 516, 539
do-support 21–3, 244–8, 250, 343
downward reanalysis 197–8
Dutch 148, 382; verb raising in 108, 110–11, 121, 126, 327, 330–1, 382

elegance 47, 52–6, 70, 74, 81
embedded topicalization 377–80, 402, 416–18
Empty Category Principle (ECP) 304–5, 325–9, 335–9, 345–6
English 152, 154, 186, 302, 394, 475–6; Early Modern English object shift 92–9; loss of inflection 19–24; loss of verb movement 139–48, 242–7; Middle English (ME) 5, 15–19, 127–30, 236–8, 240–1; modals 3, 5, 24–8, 185, 188, 207; subject-auxiliary inversion 315–16; see also Old English
EPP-feature 228–30, 232–4, 236, 238–42, 248–9; and parameter hierarchies 508, 513, 514
equidistance 91
ergative 543–5
excorporation 121, 346–8
existential construction 456
existential quantifier 187, 194
explanatory adequacy 198–206, 485–6, 488, 495–7, 501, 502, 504, 518–21
expletive 17, 78, 240, 264, 303–5, 395–400, 544; see also null subject
Extended Projection Principle (EPP) 163, 430, 444, 456
extraposition 109–10, 119–20, 124
feature syncretism 191, 194, 199, 201
Final-over-Final Condition (FOFC) 538, 544
Finnish 482
fitness 40–4, 48–9, 51–9, 66, 69–70, 74–7, 79–91
Flemish 117–19
floated/floating quantifiers 25, 140, 126
free relatives 188
free-choice indefinites 188
French 187, 230, 340, 349–50, 353–60, 363, 398–404, 431, 475–6; acquisition of 59–66, 147, 150, 153; Middle French (MidF) 70–7, 353; multiple subjects 301–11; Old French (OF) 66–70, 374–7, 383, 392; root phenomena 311–314; subject-clitic inversion 298–301; verb movement 139–41, 143–4, 158–9
functional categories 206–11, 215–17, 220–1, 513
future tense 185, 188

generalisation of the input; see Input Generalisation
genetic algorithm 40–1, 43, 47–51, 59, 79–82
German 108, 111, 148, 150, 276–8, 380–3, 391; expletive null subjects 395, 399, 479; verb-second 126, 359, 371–2, 431, 439
Gianollo, C. 496–7, 503, 512–14
Gilligan, G. 491–3, 496
Greek 475–7, 483; grammaticalisation in 185–90, 192, 194, 198, 201, 207, 210
Greenberg, J. 490, 496–7
Guardiano, C.; see Gianollo, C.
Haitian Creole 158–63, 164–70
Hawkins, J. 497–8, 508–9, 519
Head Movement Constraint (HMC) 140, 325, 329, 335–9, 345, 353; and object shift 95, 97
Head Parameter 507–8
head-final order 105–13, 117, 498, 505, 508, 515, 535–6
head-initial order 105–6, 113–26, 498, 505, 507–8, 515, 535–6
High Analyticity 510–11, 515
Huang, J. 428–9, 510–11, 523
Icelandic 95–6, 130, 143, 370–6, 390–2, 395–8
impersonal passive; see passive
improvement 51
incorporation 274–5, 279–81, 300–2, 312–14, 540–1; see also excorporation
Inertia Principle 227, 241, 250
Input Generalisation 509, 515, 518–19, 548–9
Interface Defectivity Hypothesis (IDH) 216–17
inversion: in English 3–4, 11, 24, 29–30; free inversion 354–5, 487–93; simple inversion 59, 61, 79
Irish 422, 424, 432–8, 441, 489
Italian 143–4, 151, 166, 188–9, 214, 327, 477, 483; head movement 341, 347–8, 352, 354, 361–2, 384, 386, 393
I-to-C movement; see movement to C
Japanese 481, 547
Jespersen’s Cycle 187, 200

Kayne, R. 104–5, 113, 302–3, 346, 348–53, 537–8
Koopmann, H. 12, 62, 388–90
Language Bioprogram Hypothesis 155–7
Latin 201, 474–6, 506
learnability 40–1, 43, 77, 79–80, 82; see also learning problem
learning algorithm 106, 203, 206, 520
learning problem 43–7
left-dislocation 46–7, 74, 80–1; see also clitic left-dislocation
LF 94, 124, 164, 217, 221, 309, 362–3, 438
Lightfoot, D. 24–5, 29–30, 156–7
Lithuanian 273–5
Long Head Movement (LHM) 338–46, 348, 350–3, 356–7
long infinitive movement 351–3
Longobardi, G. 227, 496–7, 503, 512–14
Louisiana Creole 160–1
L-relatedness 128, 334, 336, 343–52, 356, 361–2
macroparameter 474, 500, 503–12, 514–15, 518, 523, 535–7, 547–8
Mainland Scandinavian 89–94, 97–8
markedness 32, 80, 145–6, 149, 151–2, 154, 198–203; and creoles 157, 162, 171; Jakobsonian 171, 202–3; and parameter hierarchies 502, 508–10, 513–14, 518–20
Mauritian Creole 159
microparameter 474, 499–500, 503–12, 514–15, 535–6, 539, 543
Minimality Condition 329, 335; see also relativised minimality
modal 3–5, 10–11, 29–32, 169, 185, 188, 197, 207–8; in Icelandic 90; in Middle English 15–19; modal particle 189, 191; loss of inflection 19–26; root modal 26–8
morphological case 91, 127–30, 187, 403–4, 505
movement of objects 162, 195, 232–4, 236, 238–9, 249; and cliticisation 95–9; in Early Modern English 92–5; in Mainland Scandinavian 89–92
movement to C 335, 342, 345, 356, 359–60, 362; in French 61, 63–4, 359, 433; see also inversion

movement to Infl/T/Agr 152, 158–61, 314; in English 11–15, 23–5, 142–7, 92–3, 96–7, 121, 129–3, 243–7, 392
mutation 48, 50, 56
negation 150, 153–4, 437; in Celtic 422–3, 427; in English 3–4, 11, 24, 93–4, 96, 111, 117–18, 245–6, 248, 362; in Romance 342–3, 345, 348; see also Jespersen’s cycle
negative-polarity item (NPI) 371, 437–8
Neogrammarians 39
Newmeyer, F. 492–501, 512, 519
null argument 538–40; see also null subject
null subject 474–8, 483–5, 487–93, 516–17, 539–40; expletive 163–7, 395–400, 402–3, 479–80, 483, 492–3; in French 59–62, 64–73, 76–7, 79, 357–60, 376–7; partial null-subject language 481–3, 517; referential 358–60, 400–4, 491–2; see also discourse/radical pro-drop, null subject parameter, pro
null subject parameter (NSP) 65–6, 350, 475–7, 484–6, 488–90, 503–5, 515–17; see also null subject
object movement; see movement of objects
object raising; see movement of objects
object shift; see movement of objects
Old English (OE) 4, 240–1, 387–8, 391–2; word order 107–12, 113–26, 234–5
OV 156–7, 161–2; in Old English 105, 107–10, 113, 128–9, 234–6; OV to VO 185, 195, 198, 201–2, 249; typological aspects of 494, 498, 505–8, 511–12, 545–7
p-ambiguity 64, 68–70, 75
Papiamentu 163, 167
parameter expression 512–14, 517–18, 520–2
parameter hierarchy 513–17, 536–46, 549
parameter schema 512–14, 517–18, 520–2
parameter space 204–6
parametric drift 248
parsing device 40, 44, 49–57, 74

passive 424–7, 541–5; with auxiliary verbs 282–9; and binding theory 265–71; impersonal passive 272, 274, 276–82, 397, 456–7; and theta-roles 271–6
p-encoding 57–9, 63–5, 67–8, 71–5, 78
Perlmutter, D. 475, 486–7
PF 164, 191, 193–4, 217, 219, 221, 361–3, 438
Phase Impenetrability Condition (PIC) 236, 238–9, 519
phi/φ-features 95, 98, 229, 509, 516
phonological reduction 211–17
pied-piping 228–30, 232–42, 248–50
polysynthesis 504–5, 509–11, 515–16, 541, 546
Portuguese 314, 342
Possessive have 96–7
poverty of the stimulus 482, 492, 496
primary linguistic data (PLD) 227, 249, 486, 495–7, 518, 520–1, 535–6
PRO 239, 270–1, 349
pro 98, 164, 168, 310, 317–18, 484, 490
Projection Principle 9–10, 267, 311–15
psych verb 268–9
radical pro-drop; see discourse/radical pro-drop
raising verb 16–18, 271, 425
rationale clause 263–4
reconstruction 428, 447
reflexives and reciprocals 268–9
relativised minimality 91, 317, 335–6, 344
reproduction 48–50, 58
restructuring 17–19, 116, 119, 237–8, 347, 362–3
Réunionnais 160
rightward movement 108–10, 112, 117, 119
Rizzi, L. 149–54, 344, 354, 477, 486–90, 493, 497; see also relativised minimality, split-C(omp)
root infinitive 147–55
root phenomenon 78, 311–14, 373, 385, 401
root-embedded asymmetry 430–5, 438, 441, 448–9, 451
Scots Gaelic 433
scrambling 97–8, 111–13, 108–9, 123–5, 127–8, 130

semantic bleaching 197, 206–11, 216–17
sentential adverb 27, 434–7
serial verb 169, 186, 188, 202
shifting 45, 66, 74, 76, 78
shortest link 91
simplicity 145, 191, 198–9
Spanish 339–40, 342, 351, 475–6
split-C(omp) 406, 436
Sportiche, D. 62, 68, 388–90
Steele, S. 30–1
strict cyclicity 303
structural change 185–9
structural simplification 191, 192–4, 197, 201
stylistic fronting (Styl-F) 240–1, 376–7, 379–80, 403, 416–18, 494
stylistic inversion 310, 317, 398–9
subjacency 48–9
subject-oriented adverb; see agent/subject-oriented adverb
Subset Condition 45, 53, 55, 66, 74, 81
superset 44–7, 52–6, 59, 66, 75, 81, 509
tense/mood/aspect (TMA) particles 168–72
that-trace effect; see complementiser-trace effect
theta/θ-role 9–11, 140–1, 262, 305; and auxiliaries 13, 15, 20, 27–8, 96; in passives 272–3, 280, 282
third factor 518–19, 522–4, 548–9
Tobler-Mussafia Law 340, 350–1, 370, 383–5
tough construction 269
translation function 49–51
Transparency Principle 31–2
T-to-C movement; see movement to C
Turkish 477–8
Ukrainian 277, 280–1
unaccusative 271–4, 279, 281, 424–7, 543
unergative 425
uniformitarianism 204–5
upward reanalysis 195, 197–8
V1 70, 373, 383–4, 401
V2 46–7, 195, 359–60, 370–4, 438–40, 447, 502; in English 107–8, 120–1, 123, 127, 241–7, 387; in Icelandic 370–4, 398; in Middle French 70–5, 77–81; in Old French 66–70, 359, 374–7, 383, 401; in Yiddish 377–80
V3 47, 74, 386–9, 391
verb raising 108–9, 115–16, 119, 330–1
verb second; see V2
verb-particle construction 93, 107, 235
verb-projection raising; see VP-raising, vP-raising
Visibility Condition 264, 278, 280
VO 161–2, 236, 239, 455; in Old English 113, 129–30; OV to VO; see OV; typological aspects of 494, 498, 505–6, 508, 545, 547; see also VOS
VOS 546
VP ellipsis 286–9
vP-raising 231, 239–40
VP-raising 109–11, 116–19, 231, 233–4, 237

VSO 489–91, 546; in Welsh 419–24, 431–2, 441, 444, 454–6
V-to-Agr movement; see movement to Infl/T/Agr
V-to-C movement; see movement to C
V-to-Infl movement; see movement to Infl/T/Agr
V-to-T movement; see movement to Infl/T/Agr
V-Visibility 10
weak pronoun 114, 119, 283
Welsh 188, 189, 489; see also VSO
wh-movement 130, 250, 380–1, 491–2; non-occurrence of 150, 308–9, 487, 510, 523
Yiddish 377–80, 416–17
Zwart, J.-W. 105, 114–17, 382