

Diachronic and Comparative Syntax

This book brings together in one comprehensive volume, for the first time, a series of previously published papers presenting Ian Roberts’ pioneering work on diachronic and comparative syntax over the last thirty years. Divided into two parts, the volume engages with key topics in recent empirical work in syntactic theory: the seven papers on diachronic syntax address major changes in the history of English as well as broader aspects of syntactic change, including the formal approach to grammaticalisation, and the eight papers on comparative syntax explore head-movement, the nature and distribution of clitics, and the nature of parametric variation and change. This comprehensive collection of the author’s body of research on diachronic and comparative syntax is an essential resource for scholars and researchers in theoretical, comparative, and historical linguistics. Ian Roberts is Professor of Linguistics in the Department of Linguistics at the University of Cambridge. His most recent publications include The Final-Over-Final Condition, with Theresa Biberauer, Anders Holmberg and Michelle Sheehan (2017), and The Wonders of Language (2017).


Ian Roberts

Diachronic Syntax


  1 Agreement Parameters and the Development of English Modal Auxiliaries



  2 A Computational Model of Language Learnability and Language Change



  3 Object Movement and Verb Movement in Early Modern English



  4 Directionality and Word Order Change in the History of English



  5 Verb Movement and Markedness



  6 Theoretical Consequences (Anna Roussou and Ian Roberts)



  7 Cascading Parameter Changes: Internally-Driven Change in Middle and Early Modern English




Comparative Syntax


  8 Passive Arguments Raised



  9 Complex Inversion in French



10 Excorporation and Minimality



11 Two Types of Head Movement in Romance



12 Clause Structure and X-Second



13 The Analysis of VSO Clauses



14 Introduction: Parameters in Minimalist Theory



15 Macroparameters and Minimalism: A Programme for Comparative Research





Preface

This book brings together in one place, for the first time, a series of papers on diachronic and comparative syntax that I have published over a period of more than three decades. The papers deal with central empirical topics in recent syntactic theory (verb-movement, word order, null subjects, the nature and distribution of clitics) and also several major theoretical topics (head-movement, argument structure, the nature of parametric change and variation). Together, they form a coherent body of work reflecting how syntactic theory, as it has developed since the 1980s, can shed new light on questions of variation and change. Part I includes papers on diachronic syntax. Several of these papers deal with well-known changes in the history of English. Chapter 1 was a pioneering paper in that it attempted to recast Lightfoot’s classic (1979) analysis of the development of English modal auxiliaries in terms of principles and parameters governing verb-movement. It was the first paper to observe that English lost V-to-I movement of main verbs in the 16th century and to connect this to the breakdown of verbal inflection. In important respects, it anticipated Pollock’s (1989) seminal work on V-to-I movement in English and French, as well as the large literature on verb-movement that grew out of Pollock’s work in the 1990s (see, for example, the papers in Hornstein and Lightfoot 1994, Vikner 1995, 1997, Rohrbacher 1999, Bobaljik 2002, Bobaljik and Thráinsson 1998, Bentzen 2007, Wiklund et al. 2007, Koeneman and Zeijlstra 2014 and Tvica 2017).
Chapter 3 shows how Holmberg’s (1986) generalisation (object shift can only apply when the verb moves) holds throughout the history of English (albeit vacuously in Modern English); a corollary of this is that there is no need to posit any change in the nature of English pronouns in order to account for their changed distribution since the Early Modern period, as this follows directly from the change in verb syntax discussed in Chapter 1 combined with Holmberg’s generalisation. Chapter 4 was the first attempt to account for the change from OV to VO word order in Middle English from the point of view of the antisymmetric theory of syntax of Kayne (1994), closely basing the analysis of Old English word order on the proposals for Dutch in Zwart (1993). This approach was largely superseded by the one developed in Biberauer & Roberts (2005, 2008), the latter of which is republished here as Chapter 7. This latter paper presents an overview of a series of changes in English, from the Old English period through to the 17th century, showing how these changes form a “cascade”, with each change creating the conditions for the next. The other chapters of Part I deal with more general aspects of syntactic change. Chapter 6 is excerpted from Roberts & Roussou (2003) and summarises the overall approach to grammaticalisation adopted there, considering its implications both for the theory of parameters and for the theory of functional categories. Chapter 2 is an ambitious attempt to develop an account of parameter setting using genetic algorithms, and to apply the idea to an account of how syntactic change originates in language acquisition. An important aspect of this approach is a “least-effort” principle in acquisition, which forms the basis for the theory of markedness developed and applied to a range of data in Chapter 5.

The chapters making up Part II, dealing with comparative syntax, treat head-movement, the nature of clitics and/or the nature of parametric variation. Chapter 8 proposes an influential analysis of passive constructions, whose central idea is that the passive morpheme is an argumental clitic. Chapter 10 points out that there is nothing in the systems of head-movement put forward in Chomsky (1986) and Baker (1988) that, without stipulation, prevents “excorporation”, i.e. successive-cyclic head-movement without pied-piping. It is suggested that this may be an empirical advantage. Chapter 11 proposes, on the basis of Romance data, that there are two distinct kinds of head-movement, “A-head-movement” and “A’-head-movement”, subject, in terms of Relativised Minimality (as formulated in Rizzi 1990), to different locality constraints.
It is argued that this can account for certain apparent violations of the Head Movement Constraint. Both Chapter 9 and Chapter 12 deal with clitic-placement and its interactions with verb-movement, the former in relation to a particular construction in French, the latter in relation to a variety of “second-position” effects across a range of languages. In some respects, the latter paper anticipates Rizzi’s (1997) proposals regarding the expanded left periphery. Chapter 13 analyses verb-movement in VSO clauses in Welsh, relating the situation in this language to verb-movement and clause structure in Germanic and Romance. Finally, Chapters 14 and 15 develop a new approach to parametric variation, the former on the basis of a thorough overview of work on null subjects, the latter in more general terms. Taken together, these papers form a coherent body of work applying the theory of principles and parameters, at different stages of its development, to a range of diachronic and comparative phenomena.

Ian Roberts
Los Angeles, October 2017

References

Baker, M. 1988. Incorporation: A Theory of Grammatical Function Changing. Chicago: University of Chicago Press.
Bentzen, K. 2007. Order and Structure in Embedded Clauses in Northern Norwegian. PhD dissertation, CASTL, University of Tromsø, Norway.
Biberauer, T. & I. Roberts. 2005. Changing EPP-parameters in the history of English: accounting for variation and change. English Language and Linguistics 9(1): 5–46.
Biberauer, T. & I. Roberts. 2008. Cascading Parameter Changes: Internally-driven Change in Middle and Early Modern English. In T. Eythórsson (ed.), Grammatical Change and Linguistic Theory: The Rosendal Papers. Amsterdam: Benjamins, pp. 79–114 [this volume, Chapter 7].
Bobaljik, J. 2002. Realizing Germanic Inflection: Why Morphology Does Not Drive Syntax. Journal of Comparative Germanic Linguistics 6: 129–167.
Bobaljik, J. & H. Thráinsson. 1998. Two heads aren’t always better than one. Syntax 1: 37–71.
Chomsky, N. 1986. Barriers. Cambridge, MA: MIT Press.
Holmberg, A. 1986. Word Order and Syntactic Features in Scandinavian Languages and English. PhD dissertation, University of Stockholm.
Hornstein, N. & D. Lightfoot (eds). 1994. Verb Movement. Cambridge: Cambridge University Press.
Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Koeneman, O. & H. Zeijlstra. 2014. The Rich Agreement Hypothesis Rehabilitated. Linguistic Inquiry 45: 571–615.
Lightfoot, D. 1979. Principles of Diachronic Syntax. Cambridge: Cambridge University Press.
Pollock, J.-Y. 1989. Verb movement, Universal Grammar and the structure of IP. Linguistic Inquiry 20: 365–424.
Rizzi, L. 1990. Relativized Minimality. Cambridge, MA: MIT Press.
Rizzi, L. 1997. On the fine structure of the left periphery. In L. Haegeman (ed.), Elements of Grammar. Dordrecht: Kluwer, pp. 281–337.
Roberts, I. & A. Roussou. 2003. Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press.
Rohrbacher, B. 1999. Morphology-Driven Syntax. Amsterdam: Benjamins.
Tvica, S. 2017. Agreement and Verb Movement. PhD dissertation, University of Amsterdam, LOT.
Vikner, S. 1995. Verb Movement and Expletive Subjects in the Germanic Languages. Oxford: Oxford University Press.
Vikner, S. 1997. V-to-I movement and inflection for person in all tenses. In L. Haegeman (ed.), The New Comparative Syntax. London: Longman, pp. 187–213.
Wiklund, A.-L., G. Hrafnbjargarson, K. Bentzen & T. Hróarsdóttir. 2007. Rethinking Scandinavian verb movement. Journal of Comparative Germanic Linguistics 10: 203–233.
Zwart, J.-W. 1993. Dutch Syntax. PhD dissertation, University of Groningen.


Acknowledgments

Chapter 1 was first published as Roberts, I. 1985, “Agreement Parameters and the Development of English Modal Auxiliaries,” Natural Language and Linguistic Theory 3: 21–58.

Chapter 2 was first published as Clark, R. & I. Roberts 1993, “A Computational Model of Language Learnability and Language Change,” Linguistic Inquiry 24: 299–345. Reprinted by kind permission of MIT Press.

Chapter 3 was first published as Roberts, I. 1995, “Object Movement and Verb Movement in Early Modern English,” in H. Haider, S. Olsen & S. Vikner (eds) Studies in Comparative Germanic Syntax. Dordrecht: Kluwer, pp. 269–284.

Chapter 4 was first published as Roberts, I. 1997, “Directionality and Word Order Change in the History of English,” in A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 397–426. Reprinted with permission.

Chapter 5 was first published as Roberts, I. 1999, “Verb Movement and Markedness,” in Michel DeGraff (ed) Language Creation and Language Change. Cambridge, Mass.: MIT Press, pp. 287–328. Reprinted by kind permission of MIT Press.

Chapter 6 was first published as Chapter 5 of Roberts, I. & A. Roussou 2003, Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press. Reprinted with permission.

Chapter 7 was first published as Biberauer, T. & I. Roberts 2008, “Cascading Parameter Changes: Internally-driven Change in Middle and Early Modern English,” in T. Eythórsson (ed) Grammatical Change and Linguistic Theory: The Rosendal Papers. Amsterdam: Benjamins, pp. 79–114. Reprinted with kind permission of John Benjamins.

Chapter 8 was first published as Baker, M., K. Johnson & I. Roberts 1989, “Passive Arguments Raised,” Linguistic Inquiry 20: 219–251. Reprinted with kind permission of MIT Press.

Chapter 9 was first published as Rizzi, L. & I. Roberts 1989, “Complex Inversion in French,” Probus 1: 1–30, and reprinted in A. Belletti & L. Rizzi (eds) Parameters and Functional Heads. Oxford/New York: Oxford University Press, 1996. Reprinted by permission of Oxford University Press.

Chapter 10 was first published as Roberts, I. 1991, “Excorporation and Minimality,” Linguistic Inquiry 22: 209–218. Reprinted by kind permission of MIT Press.

Chapter 11 was first published as Roberts, I. 1994, “Two Types of Head Movement in Romance,” in N. Hornstein & D. Lightfoot (eds) Verb Movement. Cambridge: Cambridge University Press, pp. 207–242. Reprinted by permission of Cambridge University Press.

Chapter 12 was first published as Cardinaletti, A. & I. Roberts 2002, “Clause Structure and X-Second,” in Guglielmo Cinque (ed) The Functional Structure of DP and IP. Oxford: Oxford University Press, pp. 123–167. Reprinted by permission of Oxford University Press.

Chapter 13 was first published as Chapter 1 of Roberts, I. 2005, Principles and Parameters in a VSO Language: A Case Study in Welsh. Oxford/New York: Oxford University Press. Reprinted by permission of Oxford University Press.

Chapter 14 was first published as the Introduction to Biberauer, T., A. Holmberg, I. Roberts & M. Sheehan 2010, Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press. Reprinted by permission of Cambridge University Press.

Chapter 15 was first published as Roberts, I. 2012, “Macroparameters and Minimalism: A Programme for Comparative Research,” in C. Galves, S. Cyrino, R. Lopes, F. Sândalo and J. Avelar (eds) Parameter Theory and Linguistic Change. Oxford: Oxford University Press, pp. 320–335. Reprinted by permission of Oxford University Press.


Contributors

I am grateful to the co-authors of several of the chapters herein for our collaboration, and for their agreement in republishing the material. Their contact details are as follows:

Professor Mark Baker, Department of Linguistics, Rutgers University, 18 Seminary Place, New Brunswick, NJ 08901-1184, USA
Dr Theresa Biberauer, St John’s College, Cambridge CB2 1CP, UK
Professor Anna Cardinaletti, Università Ca’ Foscari Venezia, Dipartimento di Studi Linguistici e Culturali Comparati, 30123 Venice, Italy
Professor Robin Clark, Department of Linguistics, University of Pennsylvania, Philadelphia, PA 19104, USA
Professor Anders Holmberg, Department of Linguistics, Newcastle University, Newcastle NE1 7RU, UK
Professor Kyle Johnson, Department of Linguistics, University of Massachusetts, N408 Integrative Learning Center, 650 North Pleasant Street, Amherst, MA 01002, USA
Professor Luigi Rizzi, Centro interdipartimentale di studi cognitivi sul linguaggio, University of Siena, Complesso S. Niccolò, Via Roma 56, Siena, Italy
Professor Anna Roussou, Division of Linguistics, University of Patras, Patra, Greece

Part I

Diachronic Syntax


Agreement Parameters and the Development of English Modal Auxiliaries*

Ian Roberts

1. Introduction

1.1 General Background

In Modern English there is a syntactically and morphologically definable subclass of verbs, the modal auxiliaries. These verbs1 differ from other verbs (main verbs) with respect to the following criteria, among others (cf. Jespersen, 1909–49; Palmer, 1974; Pullum and Wilson, 1977):

(1) a. Inversion: Must they leave? *Leave they?
    b. Negation: They cannot walk. *They walk not.
    c. Agreement: *He mays, musts, wills, cans, etc.
    d. Non-finite forms: *He has (?)might (etc.) to do it. *They are canning to do it. *They might could do it.2

This list of properties, while not exhaustive, suffices to establish the distinction between the two classes of verbs. This distinction did not exist at an earlier stage of the language. On this matter we quote Visser (1963–73):

Originally, . . . they [the modals – IGR] were not function-words, but full or independent notional verbs that did not differ syntactically in any way from other full verbs. Thus they could regularly be construed with direct objects: “ic can eow” (= “I know you”), “ic sculde tyn þusend punda” (= “I had to pay ten thousand pounds”), “eall þæt he ahte” (= “all that he possessed”). Since infinitives were nouns, the relation between them and the verbs shall, can, etc., to which they were joined must originally have been the same as that between a direct object and a full verb, so that there was structurally no difference in this respect between ‘he can manigfealdan spræce’ [= “he knows many languages” – IGR] and ‘he can sprecan’ [= “he can speak” – IGR]. (Visser, 1963–73, p. 548)

The sameness in syntactic distribution of these two classes in Middle English is shown in the examples in (2), where we can see that negation and various processes of V-fronting3 affected both modals and main verbs in the same fashion:4

(2) a. Inversion:
       (i) Al her cariage was stole be the Frenshmen, so mote they nedes go home on fote
           All their conveyance was stolen by the Frenchmen; so they had to go home on foot.
           (c. 1464 Capgrave Chronicle of England: V 1694)
       (ii) Wilt thow ony thinge with hym?
           Do you want him for anything?
           (1470–85 Malory Morte d’Arthure III, iii, 102, V 559)

       (iii) Than longen folk to goon on pilgrimages
           Then people want to go on pilgrimages.
           (c. 1386 Chaucer General Prologue Canterbury Tales, 12)
    b. Negation:
       (i) ʒif ʒe wollnot to haue mercy of God
           If you don’t want God’s mercy.
           (c. 1450 Mirk’s Festial 285: V 1177)
       (ii) Thy godfadirs wyff thow shalt not take
           You shall not take your godfather’s wife.
           (c. 1450 Idley Instructions 2 a. 1757: V 1489)
       (iii) A blynde man kan nat juggen wel in hewis
           A blind man cannot judge colours well.
           (c. 1387 Chaucer Troilus 2, 21: V 1624)
       (iv) He ne held it noght
           He did not hold it.
           (Mossé, 1952, p. 112)
       (v) My wyfe rose nott
           My wife did not get up.
           (Mossé, 1952, p. 112)

Example (3) shows that Middle English (ME) (and some Early Modern English (ENE)) modals had non-finite forms:

(3) (i) I shall not konne answere
        I will not be able to answer.
        (c. 1386 Chaucer Canterbury Tales B 2902: V 1649)
    (ii) Cunnyng no recour in so streit a neede
        Knowing no recourse in so desperate a need.
        (c. 1439 Lydgate Fall of Princes 7, 1346: V 1650)
    (iii) if we had mought conuenient come together
        If we had been able to meet conveniently.
        (c. 1528 St. Thomas More Works 107, 86: V 1687)

    (iv) if he had wolde
        If he had wanted to.
        (1525 Ld. Berners, Froiss. II, 402: V 1687)

We will see later that the ME modals were in fact distinct with respect to their agreement properties. This distinction will be crucial in what follows.

The change that involved the development of the subclass of modals and its consequences will be the object of study in this article. The paper is organized as follows: the remainder of this introduction will be devoted to giving theoretical background to our account. In particular, we will present those aspects of Government Binding theory that are important here: government and the theory of thematic roles. We also propose a condition on the distribution of verbs. This condition leads to the postulation of two kinds of agreement system: one syntactic and one morphological. We take these two agreement systems to represent the two values of a parameter of Universal Grammar (UG), and consider the development of English auxiliaries to be an instance of a shift in the value of this parameter from a morphological system to a syntactic system. In section 2, after discussing the syntax and morphology of ME modals (section 2.2), we go on to consider the causes of the parametric shift: the loss of subjunctive inflection (section 2.3) and the general loss of inflections marking verb agreement (section 2.4). In section 2.5, we describe the parametric shift in more detail and point out some of its effects. Finally, in section 2.6 we briefly consider how root modals in ENE and present-day English fit into our account. In the conclusion, we compare our account with those of Lightfoot (1974, 1979) and Steele et al. (1981), criticizing Lightfoot’s Transparency Principle and suggesting a way of viewing syntactic change in terms of a parameter-setting model of acquisition.

Thus the paper has three distinct but related goals. First, as a contribution to Government Binding theory, we propose and motivate the condition on verbs. Second, the paper is meant to contribute to our knowledge of the history of English by bringing together well-known facts in a novel way. Third, the paper exemplifies an approach to diachronic syntax adumbrated in Lightfoot (1979), where syntactic change is explicitly related to aspects of acquisition.

1.2 Theoretical Background

We assume the framework of Government Binding theory (henceforth GB theory), as in Chomsky (1981, 1982). The expansion of S is as in (4):

(4) S → NP INFL VP

GB theory consists of a small number of autonomous subsystems of principles. Of these subsystems, the most important in this paper is the theory of thematic relations (θ-theory). We briefly sketch the main points of θ-theory below. Before doing so, however, we introduce and define the notion of government, as this will be central in what follows.

1.2.1 Government


The definition of government is as in (5):

(5) α governs γ in a configuration like [β . . . γ . . . α . . . γ . . .], where:
    (i) α = X⁰ (a lexical element);
    (ii) where φ is a maximal projection, if φ dominates γ, then either φ dominates α, or φ is the maximal projection of γ;
    (iii) α c-commands γ.
    (from Belletti and Rizzi, 1981, p. 12)

Definition (5) means that a head α governs a node γ if and only if α c-commands γ and α c-commands no maximal projection which dominates γ except possibly the maximal projection of γ.5 C-command is defined in (6):

(6) α c-commands β iff the minimal maximal projection dominating α also dominates β.

Definitions (5) and (6) together define an upper limit to government. It is impossible for a head (α) to govern a node (γ) which is outside the minimal maximal projection dominating α. So α cannot govern γ in (7):

(7) [ . . . γ . . . [φ (= αmax) . . . α . . . ] ]

On the other hand, if γ is located within the minimal maximal projection (φ) dominating α, three situations are possible:

(i) φ is the minimal maximal projection dominating γ, as in (8):

(8) [φ . . . α . . . γ . . . ]

(ii) φ is the minimal maximal projection dominating the maximal projection of γ:

(9) [φ . . . α . . . [γmax . . . γ . . . ] ]

(iii) γ is more deeply embedded inside φ than in either (8) or (9):

(10) [φ . . . α . . . [δmax (≠ γmax) . . . γ . . . ] ]
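Since definitions (5) and (6) refer only to dominance, heads and maximal projections, they can be checked mechanically. The Python sketch below is an illustrative aid, not part of the original analysis: `Node` is a hypothetical helper class, the trees for (7)–(10) are built by hand from the verbal descriptions above, and `ZP` is an arbitrary placeholder label for the node containing both γ and φ in (7). The c-command test applies (6) literally, with no further side conditions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality: distinct nodes never compare equal
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    maximal: bool = False            # True for maximal projections (XP)
    is_head: bool = False            # True for X0 (lexical) nodes
    head: Optional["Node"] = None    # for an XP: the X0 it is a projection of
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def dominates(a: Node, b: Node) -> bool:
    """True if a properly dominates b."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False


def min_max_projection(n: Node) -> Optional[Node]:
    """The minimal maximal projection dominating n."""
    p = n.parent
    while p is not None and not p.maximal:
        p = p.parent
    return p


def c_commands(a: Node, b: Node) -> bool:
    """Definition (6), taken literally."""
    phi = min_max_projection(a)
    return phi is not None and dominates(phi, b)


def max_projection_of(g: Node) -> Optional[Node]:
    """The XP that g heads (g itself if g is already an XP)."""
    if g.maximal:
        return g
    p = g.parent
    while p is not None:
        if p.maximal and p.head is g:
            return p
        p = p.parent
    return None


def governs(alpha: Node, gamma: Node) -> bool:
    """Definition (5), clauses (i)-(iii)."""
    if not alpha.is_head or not c_commands(alpha, gamma):  # (i) and (iii)
        return False
    gamma_max = max_projection_of(gamma)
    n = gamma.parent
    while n is not None:  # (ii): every XP dominating gamma ...
        if n.maximal and not dominates(n, alpha) and n is not gamma_max:
            return False  # ... must dominate alpha or be gamma's own XP
        n = n.parent
    return True


def config(k: int) -> Tuple[Node, Node]:
    """Build configuration (k) for k in 7..10 and return (alpha, gamma)."""
    alpha, gamma = Node("α", is_head=True), Node("γ", is_head=True)
    phi = Node("φ", maximal=True, head=alpha)
    if k == 7:     # gamma lies outside phi
        Node("ZP", maximal=True).add(gamma, phi.add(alpha))
    elif k == 8:   # phi is the minimal XP dominating gamma
        phi.add(alpha, gamma)
    elif k == 9:   # phi is the minimal XP dominating gamma's own XP
        phi.add(alpha, Node("γmax", maximal=True, head=gamma).add(gamma))
    elif k == 10:  # a different XP, δmax, intervenes between phi and gamma
        phi.add(alpha, Node("δmax", maximal=True).add(gamma))
    return alpha, gamma


for k in (7, 8, 9, 10):
    a, g = config(k)
    print(k, c_commands(a, g), governs(a, g))
```

Each printed line gives the configuration number, whether α c-commands γ, and whether α governs γ: `7 False False`, `8 True True`, `9 True True`, `10 True False`. The design choice of storing a `head` pointer on each XP is what lets clause (ii) distinguish γ's own maximal projection from an intervening one such as δmax.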



In all of (8), (9) and (10), α c-commands γ. In (8) and in (9) α also governs γ. In (10), however, γ is too deeply embedded within φ to be governed by α. We can see from this discussion that government is a more restrictive version of c-command. More concretely, if in (7–10) α = V, φ = VP and γ = NP or N⁰, we have the following configurations:

(7ʹ) [S NP INFL [VP V . . . ] ]
(8ʹ) [VP V NP ]
(9ʹ) [VP V [NP . . . N⁰ . . . ] ]
(10ʹ) [VP V [δmax . . . NP . . . ] ]









In (7ʹ), NP is the subject of V. In (8ʹ, 9ʹ), NP is the object of V. In (10ʹ), NP is not a complement of V. So we see that government is intimately bound up with complementation. In fact, Chomsky (1981, p. 51) proposes that only positions governed by α can be subcategorized by α. Now consider certain proposals in the recent literature on morphology (cf. Lieber, 1980; Selkirk, 1982; Williams, 1981):

(11) In a word of the form [W Stem + Af]:
     a. Af subcategorizes for the stem;
     b. Af heads W.
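Read as a rule about affixes, (11) treats an affix as a head that selects the category of its stem (11a) and passes its own category to the whole word (11b). A minimal Python sketch under that reading follows; the `AFFIXES` table is a hypothetical one-entry lexicon containing only -al, and the output is a bracketed string rather than a spelled-out word, since spelling adjustments (industry → industrial) are outside the scope of (11).

```python
from typing import Optional, Tuple

# Hypothetical lexicon: affix -> (category it subcategorizes for,
#                                 category of the word it heads)
AFFIXES = {"al": ("N", "A")}


def attach(stem: str, stem_cat: str, affix: str) -> Optional[Tuple[str, str]]:
    """Attach `affix` to `stem` following (11); None if the affix rejects the stem."""
    wanted, result_cat = AFFIXES[affix]
    if stem_cat != wanted:   # (11a): the affix selects its stem's category
        return None
    return f"[{stem} + {affix}]", result_cat  # (11b): the affix heads the word


for stem, cat in [("nation", "N"), ("industry", "N"), ("out", "P"),
                  ("carry", "V"), ("fortunate", "A")]:
    print(stem, cat, attach(stem, cat, "al"))
```

Noun stems yield words of category A; the P, V and A stems are rejected, as with the starred forms discussed for -al.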

The Affix -al, for example, makes Nouns into Adjectives. This is captured by saying that -al subcategorizes for a Noun. Thus -al cannot attach to Stems which are not Ns:

(12) *[out]P + [al]A   *[carry]V + [al]A   *[fortunate]A + [al]A

In addition, (11b) says that when -al attaches to a stem, the resulting word will be of category A. This is illustrated in (13):

(13) [transformation + al]A   [industry + al]A   [nation + al]A

So the Affix-Stem relation is a kind of head-complement relation. Then we might suggest that in (11), Af governs Stem; (11) would then be a case of (8), where α = Af, and γ = Stem. The proposal that Affixes govern Stems is immediately supported by (i) the fact that Affixes do not govern outside the Word (this is the analog of (7), where φ = Word); and (ii) the fact that Affixes cannot ‘see into’ the structure of Stems they attach to (this is the analog of (10)). (The proposal also predicts that Affixes can govern the head of the stem they attach to (cf. (9)). It is not clear that this prediction has any consequences.) We conclude that there exist both syntactic government and morphological government. These are instances of the same relation.

1.2.2 θ-theory

An important aspect of complementation is that complement NPs are semantic arguments of the verb. Complement NPs are in a thematic relation with the verb, e.g., “source of action”, “goal of action”, etc. (cf. Gruber, 1965; Jackendoff, 1972). In other words, complement NPs bear thematic roles (θ-roles), which are assigned to them by the verb. It is clear that θ-role assignment is closely linked to subcategorization. However, it is also clear that θ-roles can be assigned to non-subcategorized material, in particular subjects. Associated with each lexical entry for a verb is a specification of the lexical item’s argument structure in the form of a thematic grid. The thematic grid is an unordered listing of θ-roles. For illustration, the θ-grids of some verbs are given in (14):

(14) a. give [θ1, θ2, θ3]
     b. hit [θ1, θ2]
     c. smile [θ1]

The θ-roles are assigned to the verb’s arguments, in accordance with the θ-criterion:

(15) Each argument bears one and only one θ-role, and each θ-role is assigned to one and only one argument. (Chomsky, 1981, p. 36)

Williams (1980) proposed that the arguments in a verb’s θ-grid be divided into two types: external arguments and internal arguments. Internal arguments are assigned their θ-roles from V by government, while the external argument is assigned its θ-role in subject position, i.e., externally to VP and so in a position that is not governed by V. (Chomsky (1981) proposes that VP compositionally assigns the external argument to the subject; cf. Marantz (1981) for evidence in favor of this idea.) θ-role assignment is in principle optional. However, θ-roles must in fact always be assigned because of the Projection Principle, which we state as follows:

(16) The thematic properties of lexical items must be preserved at all syntactic levels.
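The θ-criterion in (15) is a one-to-one condition between arguments and the roles in a grid, so it can be stated as a small check. The Python sketch below is illustrative only: the grids are those of (14), the argument names `John` and `Bill` are hypothetical, and `internal_positions` assumes (following Williams's external/internal split) that exactly one role in each grid is external, which is an assumption of the sketch rather than something the grids themselves encode.

```python
from collections import Counter
from typing import Dict, List

# θ-grids from (14); each grid is an unordered collection of role labels.
THETA_GRIDS: Dict[str, List[str]] = {
    "give": ["θ1", "θ2", "θ3"],
    "hit": ["θ1", "θ2"],
    "smile": ["θ1"],
}


def satisfies_theta_criterion(verb: str, assignment: Dict[str, List[str]]) -> bool:
    """(15): `assignment` maps each argument to the θ-roles it receives."""
    # Each argument bears one and only one θ-role ...
    if any(len(roles) != 1 for roles in assignment.values()):
        return False
    # ... and each θ-role in the grid goes to one and only one argument.
    counts = Counter(role for roles in assignment.values() for role in roles)
    return set(counts) == set(THETA_GRIDS[verb]) and all(c == 1 for c in counts.values())


def internal_positions(verb: str) -> int:
    """Complement positions the Projection Principle (16) keeps at every level.
    Assumption: exactly one role per grid is external (assigned in subject position)."""
    return len(THETA_GRIDS[verb]) - 1


print(satisfies_theta_criterion("hit", {"John": ["θ1"], "Bill": ["θ2"]}))  # True
print(satisfies_theta_criterion("hit", {"John": ["θ1", "θ2"]}))            # False: two roles
print(satisfies_theta_criterion("hit", {"John": ["θ1"]}))                  # False: θ2 unassigned
print({v: internal_positions(v) for v in THETA_GRIDS})  # {'give': 2, 'hit': 1, 'smile': 0}
```

The last line reproduces the counts of internal positions that follow from combining (15) and (16) for give, hit and smile.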

Together, (16) and (15) ensure that a verb like give will have exactly two internal positions at all syntactic levels; hit will have exactly one, and smile none at all.

1.3 V-Visibility6

The presentation of θ-theory and the Projection Principle so far has been standard. At this point, we introduce a further condition on θ-role assignment which is central to what follows:

(17) V assigns θ-roles iff V is governed.

Condition (17) holds at S-Structure. We will refer to (17) as the ‘V-VISIBILITY CONDITION’. “Governed” positions in (17) include both syntactically and morphologically governed positions. So (17), combined with the θ-criterion, forces a verb which assigns θ-roles to appear in one of the positions in (18) (where α is a head):

(18) a. [ α [VP V . . . ] ]
     b. [VP . . . α . . . V . . . ]
     c. [W V = Stem α = Af]

If a verb fails to appear in one of the environments in (18), it will not be ‘visible’ for θ-role assignment. However, the Projection Principle requires the arguments of the Verb to be present at all levels. As a result, these arguments will be present, but not θ-marked. And this violates the θ-criterion, (15). On the other hand, the θ-criterion is not violated if the arguments are not present, but then the Projection Principle is. Also, if a verb has no θ-roles to assign, then (17) will force such a verb to appear in an ungoverned position. If it appeared in a governed position it would have to assign θ-roles, but the Projection Principle prevents this. So a verb with no θ-roles to assign will have a radically different distribution compared to other verbs. We propose that there does exist a class of verbs with no θ-roles to assign: the modal auxiliaries. The examples of (1) show that these verbs have a different distribution to other verbs. Condition (17) forces modals to appear in an ungoverned position.
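The reasoning above reduces to a simple biconditional once (17) is combined with (15) and (16): a verb with θ-roles must assign them, so it must be governed, and a governed verb must assign θ-roles, so a verb with none must be ungoverned. The Python sketch below encodes that reduction; the position labels are illustrative, configuration (18b) is left out as it is in the text, and the single "ungoverned position" entry stands for any ungoverned slot.

```python
# Positions a verb may occupy, paired with whether V is governed there.
POSITIONS = {
    "(18a) syntactically governed by a head": True,
    "(18c) stem morphologically governed by an affix": True,
    "ungoverned position": False,
}


def satisfies_v_visibility(has_theta_roles: bool, governed: bool) -> bool:
    """(17) together with (15) and (16): the two values must match."""
    return has_theta_roles == governed


def licit_positions(has_theta_roles: bool) -> list:
    """All positions in POSITIONS where a verb of this kind satisfies (17)."""
    return [pos for pos, gov in POSITIONS.items()
            if satisfies_v_visibility(has_theta_roles, gov)]


print("main verb:", licit_positions(True))   # both governed configurations
print("modal:", licit_positions(False))      # only the ungoverned position
```

The split in the output is the "radically different distribution" described above: verbs with θ-roles are confined to governed configurations, verbs without them to ungoverned ones.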

We make the following assumption about negation and inversion:

(19) Negation and inversion are processes affecting INFL.

Given (19), (1a,b) are evidence that only modals can appear in INFL. INFL is an ungoverned position, and so modals must appear there because of (17). Main verbs cannot appear in INFL because of (17). Let us return now to the configurations in (18) and consider possible values for α. In (18a), V is syntactically governed by α. Here the obvious candidates for α are INFL and V. In (18c), V is morphologically governed by α. Here α is some kind of verbal inflection, e.g., an agreement affix or some kind of participial affix. Configuration (18b) also involves syntactic government, but here it is not so clear what α could be. It follows from X-bar theory that V is the only head in VP, so either V or α must have moved from its base position. We will leave (18b) aside for the remainder of the paper. If V is governed by another V in (18a), what is the status of the governing V with respect to (17)? The governing V may itself be governed by yet another V or by INFL. If the governing verb is an auxiliary, it must be in INFL, as we saw above. If the governing verb has θ-roles to assign, then it must subcategorize for VP: this possibility is exemplified by the class of causative and perception verbs make, let, see, hear, etc. (Cf. Manzini (1983) for a treatment of these verbs as taking small-clause VP complements.) A third possibility is that α = INFL with no lexical auxiliary. In this situation, the abstract agreement features (AGR) in INFL govern V, if the clause is finite. If the clause is nonfinite, the infinitival marker to governs V.7 Since (17) prevents auxiliaries from appearing in governed positions, we predict auxiliaries to be incompatible with agreement. This prediction is correct for modals, but incorrect for the aspectuals have and be.
In fact, we have now isolated three properties which distinguish auxiliaries from main verbs in ways predicted by the claim that auxiliaries have no θ-roles. Main verbs are correctly distinguished from modals, but aspectuals seem to straddle the division, as (20) shows:

(20)
                θ-roles   agreement   appearance in INFL
  Main verbs       +          +               −
  Modals           −          −               +
  Aspectuals       −          +               +
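The point of (20) can be made explicit by deriving the agreement and INFL columns from the θ-role column alone, as (17) predicts, and comparing the result with the observed values. The Python sketch below does this; the dictionary keys are illustrative labels, and the prediction rule encodes only the two consequences of (17) stated in the text (agreement means being governed by AGR; INFL is ungoverned).

```python
# Observed values from table (20).
OBSERVED = {
    "main verbs": {"theta_roles": True, "agreement": True, "in_INFL": False},
    "modals": {"theta_roles": False, "agreement": False, "in_INFL": True},
    "aspectuals": {"theta_roles": False, "agreement": True, "in_INFL": True},
}


def predicted(theta_roles: bool) -> dict:
    """What (17) predicts from θ-role status alone.

    Agreement means being governed by AGR, so it should co-occur with θ-roles;
    appearing in INFL, an ungoverned position, should co-occur with their absence.
    """
    return {"agreement": theta_roles, "in_INFL": not theta_roles}


def mismatches() -> list:
    """(class, property) pairs where table (20) departs from the prediction."""
    return [
        (cls, prop)
        for cls, props in OBSERVED.items()
        for prop, expected in predicted(props["theta_roles"]).items()
        if props[prop] != expected
    ]


print(mismatches())  # [('aspectuals', 'agreement')]
```

The single mismatch isolates agreement on aspectuals as the exceptional cell of the table.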

Agreement is the exceptional property here. For the purposes of this paper, we make the simplifying assumption that aspectuals show inherent agreement, and so in fact are not governed by AGR.8 In (18c), the Affix governing V may be a participial affix like passive -en or progressive -ing. Also, it may be an agreement affix. In languages with a variety of affixes marking agreement, we frequently find Verb-movement to INFL. This movement rule has been proposed for French (by Emonds, 1978), for German (by Safir, 1982) and for Welsh (by Sproat, 1983). Koopman (1983) refers to this as “the NP-type of V-movement”. She notes that this kind of V-movement has the following properties:

(21) a. Movement does not take place when an auxiliary is present.
     b. Movement only takes place in finite clauses.
     c. Movement is always clause bound.

We propose that the NP-type of V-movement is motivated by (17): it has to occur whenever the verb’s D-Structure position is not a visible position, i.e., is not governed by AGR or an auxiliary. Schematically, V-movement takes place as shown in (22):

(22) a. D-Structure: NP [INFL [V [V —] Af]] [VP V . . .]
     b. S-Structure: NP [INFL [V Vi + Af]] [VP [V ei] . . .]

We propose that languages with ‘rich’ agreement systems in fact lack AGR. Affixes are generated in INFL, with an empty subcategorized verb-stem attached. Affixes are unable to govern out of the word they head, as we saw. So, in order to satisfy (17), the verb-stem moves from its D-Structure position into the empty verb-stem position in INFL. In this way, the verb-stem appears in environment (18c) at S-Structure and so satisfies (17).9 We can derive Koopman’s generalizations about V-movement given in (21) from (17). First, the fact that movement does not take place when an auxiliary is present follows from the fact that the auxiliary already occupies the verb-stem position in INFL. Also, (17) is satisfied by the auxiliary syntactically governing V. Second, the lack of movement in nonfinite clauses is a consequence of the non-appearance of the agreement affixes in such clauses: (17) is satisfied in situ by infinitival affixes. Third, if we assume that movement always takes place to the nearest visible position, then the clause-boundedness follows, as V will always move to the nearest INFL.

The discussion of (18a) and (18c) amounts to a description of two systems of agreement. In the first, AGR governs V in the configuration of (18a).
We suggest that this type of agreement is typical of languages with little or no agreement morphology. We will refer to this system as a ‘syntactic agreement system’, since (17) is satisfied by syntactic government of V. The second system of agreement involves the base-generation of agreement affixes in INFL, and movement of V into the empty V position which these affixes subcategorize for. V then satisfies (17) in the configuration (18c). We refer to this second system as a ‘morphological agreement system’, since morphological government satisfies (17). Morphological agreement systems are typical of languages which are rich in agreement morphology.10 We now make explicit what was implicit in the above paragraph: the choice of a morphological or a syntactic agreement system is a parameter of UG. Our central proposal in this paper is that English has historically developed from having a morphological agreement system to having a syntactic agreement system. We propose that this parametric change took place during the sixteenth century. The development of a class of modal auxiliaries was part of this change. The parametric change involved a change in the structures in which modals appeared. Formerly, modals assigned θ-roles, appeared in the configuration in (23), and met condition (17) by moving into INFL in the syntax. Modals were just like all other verbs in this respect:

(23) [Tree diagram not reproduced: the modal, under V, takes a VP complement (VP . . .).]
In (23), the modal syntactically governs the lower V. When modals were reanalyzed as auxiliaries, they no longer assigned θ-roles (cf. section 2.6 for a slight restatement of this point). So (17) forced them to appear in an ungoverned position. They were thus analyzed as appearing in a configuration like (24):

(24) [Tree diagram not reproduced: the modal in INFL, syntactically governing V.]
Here no agreement morphology or abstract AGR can be present. The modal syntactically governs V. The factors which led to the parametric change were:

(25) (i) The use of modals as functional substitutes for the moribund system of subjunctive inflections.
     (ii) The morphological irregularity of the modals.
     (iii) The phonologically motivated obsolescence of agreement inflection.

The first factor, (25i), meant that modals were interpreted as clausal operators specifying the mood of the clause, exactly like subjunctive inflections. Clausal operators do not assign θ-roles, and so modals could be construed as not assigning θ-roles. The second factor, (25ii), made it appear that modals lacked agreement, and (17) now forced them to be construed as clausal operators with no arguments. The increasing frequency of periphrastic constructions like (24), where V is syntactically governed and no agreement is present, combined with the third factor, the general loss of agreement morphology due quite independently to phonological change, led to the resetting of the agreement parameter. The result of the change in the agreement parameter is that present-day English verbs can be divided into the two classes illustrated in (1). Modals can only appear in INFL, because of their lack of θ-roles and the resulting effects of (17). Thus modals are affected by negation and inversion. At the same time, (17) also prevents any kind of verbal affix from attaching to modals. Finally, present-day English lacks Verb-movement to INFL as a consequence of the parametric change. The absence of this rule has a range of consequences, as we shall see in section 2.5. We now consider in more detail what happened.
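The parameter and Koopman’s generalizations in (21) can be summarized as a small decision procedure. The following is a hypothetical Python sketch, not part of the original analysis. The names `AgreementSystem` and `verb_raises_to_infl` are invented here. Condition (17) is reduced to the informal requirement used in this section: a θ-assigning verb must occupy a governed (visible) position. Clause-boundedness (21c) is not modeled.

```python
from enum import Enum


class AgreementSystem(Enum):
    """The two settings of the agreement parameter (toy encoding)."""
    # Agreement affixes are base-generated in INFL with an empty verb-stem slot,
    # so a main verb must raise into INFL to be morphologically governed.
    MORPHOLOGICAL = "morphological"
    # INFL holds abstract AGR (or an auxiliary, or to), which governs V in situ.
    SYNTACTIC = "syntactic"


def verb_raises_to_infl(system: AgreementSystem, auxiliary_present: bool,
                        finite: bool) -> bool:
    """Return True if a theta-assigning main verb raises to INFL (toy model).

    Assumption: (17) is reduced to 'a theta-assigning verb must be in a
    governed (visible) position'. Clause-boundedness (21c) is not modeled.
    """
    if system is AgreementSystem.SYNTACTIC:
        # AGR, to or an auxiliary already governs V in its base position,
        # and INFL's single position offers no landing site.
        return False
    if auxiliary_present:
        # (21a): the auxiliary fills the verb-stem slot in INFL and governs V.
        return False
    if not finite:
        # (21b): no agreement affixes; infinitival affixes satisfy (17) in situ.
        return False
    # Finite clause, affix in INFL, no auxiliary: V must raise, as in (22).
    return True


if __name__ == "__main__":
    for system in AgreementSystem:
        for aux in (False, True):
            for finite in (True, False):
                print(system.value, "aux" if aux else "no-aux",
                      "finite" if finite else "nonfinite",
                      verb_raises_to_infl(system, aux, finite))
```

Under the morphological setting, only the finite, auxiliary-less case raises. Under the syntactic setting, nothing raises; this corresponds to the loss of V-movement discussed in section 2.5.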

2. The Changes

2.1 Introduction

In this section we discuss in detail the various changes that took place in the auxiliary and agreement systems. We will argue that the syntactic differences between the present-day English and Middle English verb systems, as outlined in the Introduction, are the result of one major parametric change and an associated lexical change. First we describe the syntax and morphology of ME modals, i.e., we look at their properties prior to their being reanalyzed as auxiliaries. We point out certain morphological and syntactic peculiarities which set these verbs off as a somewhat marked class even in ME. Sections 2.3 and 2.4 are both concerned with the loss of verbal inflection, in different ways. Section 2.3 focusses on the subjunctive mood, which as a morphological paradigm was moribund in late Middle English. Periphrastic constructions with modals replaced the subjunctive, adding to the incidence of syntactic government of V. Section 2.4 deals with the general loss of verbal inflection during late Middle English. We also consider the rise of the periphrastic construction with do at this point. This construction is important because its frequency greatly decreased the amount of evidence for a morphological agreement system available to learners of the language. This is so because do appears in INFL, syntactically governing V. Also, do was most common in the sixteenth century in precisely those structures where it is obligatory today, namely, in questions and negatives. Without do, these constructions would have provided evidence of Verb-movement to INFL, and therefore, given our assumptions, of a morphological agreement system.

Section 2.5 considers the parametric change and its consequences. The major consequence of the rise of a syntactic agreement system replacing the morphological system was the loss of Verb-movement to INFL. The loss of this rule has the immediate consequence of rendering do-support obligatory where it had formerly been optional. This, combined with the reanalysis of the modals as auxiliaries, explains the contrast in (1a,b). Other consequences of the loss of Verb-movement to INFL are the restriction of both quantifier-floating and adverb-placement to preverbal positions. We suggest that neither of these operations has changed historically, but that the loss of Verb-movement to INFL has given rise to this surface constraint. We also discuss the loss of other ‘verb-like’ properties of modals in this section: the loss of direct objects, and the loss of participial and other non-finite forms. Both of these developments follow from the absence of θ-roles, if (17) is correct. Finally, in section 2.6 we discuss the status of root modals since the reanalysis. We suggest that root modals have retained adjunct θ-roles. Adjunct θ-roles, however, are subject neither to the θ-criterion nor to (17). Hence, there is a sense in which ability can, for example, appears to take arguments. However, this kind of argument-taking is independent of (17), and so has no consequences for our proposals here.

2.2 Modals in Middle English

2.2.1 Syntax

In the examples we have given of ME modals, such as (2), the modals appear preceding a verb in the infinitive. We take it, then, that ME modals subcategorized for VP. Thus they appeared in a position governing VP, in a structure like the following:

(26) [Tree diagram not reproduced: the modal, under V, governing its VP complement.]
The modal must move to INFL in order to be morphologically governed and so meet (17). The modals have somewhat unusual θ-marking properties in (26). First, they subcategorize for VP, and so θ-mark VP. This means that VP in (26) is an argument. Arguments are generally interpreted as referential, but such an interpretation is not available for VPs. So we say that modals had nonreferential arguments. This is a marked property. More interestingly, it is clear from the examples in (2) that the subject of the modal is also the subject of the head of VP2. This can be seen most clearly in (2b,ii) and (2b,iii), which are repeated here for convenience:

(2) b. (ii) Thy godfadirs wyff thow shalt not take
            You shall not take your godfather’s wife.
            (c. 1450 Idley Instructions 2a. 1757: V 1489)
       (iii) A blynde man kan nat juggen wel in hewis
            A blind man cannot judge colours well.
            (c. 1387 Chaucer Troilus 2, 21: V 1624)

In these examples, it is clear that thow and a blynde man are the respective subjects of take and juggen ‘judge’. This fact, combined with the general ‘epistemic’ meanings of the modals, suggests that the modals were raising verbs in Middle English. The idea gains further plausibility from the fact that the equivalents of modals in a number of languages are raising verbs (e.g., most Germanic and Romance languages). If modals were raising verbs in Middle English, (26) would be replaced by (27):

(27) [Tree diagram not reproduced: a raising structure in which the subject of the lower clause moves to the nonthematic subject position of the modal.]
The subject of the lower clause becomes the subject of the matrix clause by means of Raising, one instantiation of Move-α. The subject of a raising verb is a nonthematic position. Because it is a nonthematic position, this position is a potential ‘landing site’ for NP-movement. All the diagnostics for raising are essentially tests to see if the subject position of a given verb is thematic or not. These tests include finding out whether expletives can appear in subject position, and whether idiom chunks can appear in subject position. If these possibilities exist, the subject is nonthematic.

Applying the second test first, it is very difficult to tell whether something is an idiom or not in the absence of native-speaker intuitions on the matter. Hence this test is not useful for us. As for the first test, it seems that ME modals could in some circumstances have expletive subjects. These were sometimes phonologically null, as in (28a) and (28c). However, all the examples have an oblique Case-marked NP associated with the subject of the complement clause. If we assume that oblique Case is inherent and that raising is motivated by the Case Filter, the presence of these oblique Case-marked NPs indicates that the sentences do not involve raising. (Presumably the oblique NP controls the PRO subject of the complement.)

(28) a. Mee moste nedys been dampned for this
        I will have to be damned for this.
        (1455 Speculum Misercordie, 251: V 1715)
     b. It deuit me no langare for to ly11
        I must no longer lie.
        (c. 1490 Lancelot of the Lake, 18: V 57)
     c. Vs muste make lies
        We must tell lies.
        (c. 1440, York Myst. 164, 321: V 33)

Thus the sentences in (28) do not provide evidence that ME modals were raising verbs. There are further difficulties with the idea that ME modals were raising verbs: it is difficult to show that the complement to modals was sentential. The complement is never tensed. On the assumption that to appears in INFL and that INFL is the head of S, the presence of to in infinitivals could be taken as evidence of a sentential complement. However, to rarely appears in such complements (but cf. (2b,i)). A final point on this matter: if we could show that ME modals were raising verbs, then the diachronic change could be schematized as in (29):

(29) e Modal [S NPi VP] ⇒ NPi Modal VP

This change closely parallels the synchronic rule of restructuring proposed for Italian by Rizzi (1982).
The surface evidence we have is consistent with the idea that ME modals were restructuring verbs, but we have no clear positive evidence for this hypothesis. If the ME modals were restructuring verbs, then the change in their syntax would involve a reanalysis of the restructured structure as the D-Structure, with a corresponding alteration of the thematic properties of the modals. We conclude for now that ME modals appeared in the S-Structure configuration given in (26), with the added possibility that this configuration was the result of the application of restructuring. Some of the ME modals also had direct objects, as shown in (30):

(30) a. for all the power thai mocht
        for all the power at their command.
        (1470 Henry, Wallace iii 396: Lightfoot (1979: 101))
     b. Ich hit wulle heortlicher
        I want it very much.
        (c. 1225 Ancrene Wisse 199, 23 (ed. Tolkien))
     c. God grante I mot wel achieve
        God grant that I’ll be able to achieve it.
        (c. 1390 Gower Conf. Am. I, 6 i: V 1689)

We can clearly see that modals had exactly the status of main verbs in these examples. They assigned θ-roles to their subcategorized objects, and so were subject to (17). Middle English had a morphological agreement system, so in this usage the modals, like any main verb, moved into INFL in the derivation from D-Structure to S-Structure. To sum up, modals appeared preceding NP and VP at S-Structure. In the latter case, the possibility exists that the S-Structure was the result of restructuring, although I have been unable to find evidence to confirm this. There is also the possibility that ME modals were raising verbs; on this point, too, crucial evidence is lacking.

2.2.2 Morphology

ME modals were a morphologically definable subclass of verbs. In fact, they had quite irregular conjugations. In the present tense, they had the regular second person singular agreement, but lacked third person singular agreement. They also had irregularly formed preterits, although the preterits showed the regular plural agreement. Both the lack of third person singular agreement in the present tense and the irregular preterits were a consequence of the modals’ membership of the Proto-Germanic class of preterit-present verbs. Preterit-present verbs were verbs whose preterit had taken over the functions of the present, so that a new preterit had been formed by analogy.
This class had about a dozen members in Old English, but only the modals were left by late Middle English in the standard dialect (cf. Lightfoot, 1979, pp. 101–103, for documentation). Plural agreement was lost in the early sixteenth century (cf. section 2.4). After the plural endings had disappeared, the only agreement distinctions remaining were the second person singular and the preterit/present distinction. However, as Lightfoot points out, the preterit/present morphological distinction did not correspond to the usual semantic opposition, and so to some extent pairs like shall and should came to be felt to be separate lexical items rather than different tenses of the same verb. Lightfoot comments:

the preterits seem to have been unstable from early times, perhaps as a result of competition from the subjunctive . . . The breakdown of the productivity of preterit/present relationship appears to have started quite early and the preterit and present tense forms developed uses independently of each other and the tense relationship between them was steadily eroded. (Lightfoot, 1979, p. 104)

So we can see that ME modals were morphologically marked, semantically anomalous in at least one respect, and syntactically marked in taking VP as an argument (or possibly in being restructuring verbs). These factors by themselves may not have been sufficient for reanalysis; in fact they were present in the language for centuries before the modals were reanalyzed. However, combined with the loss of verbal inflections, and in particular with the loss of the subjunctive paradigm, the marked properties of ME modals we have seen in this section allowed the reanalysis to take place.

2.3 Loss of Subjunctive Inflection

Here we consider the role played in the reanalysis of the modals by the loss of subjunctive inflections in Middle English. These inflections were not entirely lost; in fact the subjunctive still exists in some dialects and registers of present-day English.12 However, the ME period saw a considerable rise in the frequency of periphrastic constructions consisting of a modal with an infinitive. These periphrases presumably grew in frequency due to the loss of distinctions between the indicative and the subjunctive caused by phonological changes.
Visser commented on this:

In the earliest periods of the English language the modally marked form [the subjunctive—IGR] was extensively used in all sorts of writings. That this did not remain so was in the first place due to the phonological changes the language underwent in the course of time. (Visser, 1963–73, p. 789)

It seems that the subjunctive could be replaced by a modal in every major ME use of the subjunctive.13 With regard to the development of periphrastic subjunctives, I add nothing to the traditional account: the verbal inflections which manifested the subjunctive/indicative distinction no longer existed due to phonological change, hence a new means of expressing modality arose. This development was important for the parametric change because it meant that by late Middle English the modals commonly appeared as “semantic substitutes” for verbal inflection. This meant that modals were being construed as clausal operators, like subjunctive inflection. As clausal operators, modals assign no θ-roles. In a subjunctive clause, the head of VP, and not the subjunctive inflection, assigns θ-roles to the NPs in the clause; likewise, the modal assigns no θ-roles, and the head of its complement VP does. If modals have no θ-roles to assign, then (17) forces them to appear in ungoverned positions. In other words, if modals are semantic substitutes for the subjunctive and so have the same θ-properties as the subjunctive (i.e., none), then they must not be governed or show agreement. This in turn means that modals in these uses could appear in INFL, as in (24), governing V but not themselves governed. We suggest that it was possible for modals (rather than, say, adverbs) to substitute functionally for the subjunctive because they already expressed generally ‘modal’ notions. In other words, it is clear that the core lexical meaning of modals facilitated this part of the change. Also, we saw in section 2.2.2 that modals as a class appeared to lack agreement morphology. Thus, modals both appeared on morphological grounds not to meet (17), and were semantically compatible with an interpretation on which they were not construed as θ-role assigners. The decline in the subjunctive led to a rise in the number of constructions in which modals had to be construed as clausal operators. So, to be compatible with (17), modals were reanalyzed as lacking θ-roles. As a result of this reanalysis of the θ-properties of modals, modals were forced to appear only in ungoverned positions. This in turn increased still further the number of periphrastic constructions showing syntactic government of V.
2.4 Loss of Agreement Inflections

As we mentioned in section 2.2.2, agreement inflections had almost disappeared by the mid-sixteenth century.14 The prime cause for the loss of these inflections was phonological, as was the case with the loss of other inflections in English (e.g., case endings on nouns, and markings of adjectival concord). The loss of verbal inflections took place over a long period; certain OE distinctions had already been lost by the beginning of the ME period. Of course, not all agreement was lost: -s for third person singular (3Sg) survives to the present day, and second person singular (2Sg) -st lasted as long as the pronoun thou, a considerable time after the sixteenth century. The crucial fact seems to have been the loss of plural agreement, and this happened in the sixteenth century. Loss of plural agreement meant that preterits (except for be) showed no person agreement, and that present tenses agreed only in 2Sg and 3Sg. Thus language learners at this time were faced with a highly impoverished morphological agreement system.

We have already discussed, in section 2.3, the increase in the use of the periphrastic subjunctive in Middle English. Other periphrastic constructions also arose during this period. Many authors (Jespersen, 1938; Traugott, 1969, for example) point out that the otherwise independent development of the progressive, perfect and passive during Middle English added to the number of periphrastic constructions. The most important such construction, however, is that with do. Do in these constructions was a semantically empty tense carrier, and so, like Tense, assigned no θ-roles. So (17) forces do to appear in INFL. In this respect do parallels the modals: where modals were periphrastic substitutes for the subjunctive, do was a periphrastic substitute for tense. Periphrastic constructions with do were most common in the sixteenth century. One reason for this could be the loss of verbal inflection, leading to the use of a periphrastic construction to signal tense more clearly. We can put this intuitive statement into our terms and say that periphrastic do indicates a marked increase in the number of constructions with syntactic V-government and V in situ, as opposed to morphological V-government and V-movement to INFL. We now sketch briefly the rise of this construction. Do was used with a following infinitive throughout Middle English. It could be either a semantically empty tense carrier, as in Modern English, or a causativizer (cf. Ellegård, 1953, p. 208, on the relation between these). The following are examples of causatives with do:

(31) a. that they kepyn and do kepyn . . . accorde and pes
        that they keep and make (others) keep accord and peace.
        (c. 1475 Gregory’s Chronicle p. 138: V 1212)
     b. they shall putt or done putt in any certaine place
        they shall put or have put (i.e., do + infinitive).
        (c. 1475 Gregory’s Chronicle p. 145: V 1212)

We can see the causative meaning of these examples from the fact that the finite form of the main verb is given and then repeated with do.
Visser says of this construction that ‘Soon after 1500 this do + infinitive pattern became obsolete’. The last example given is:

(32) Every such person . . . shall doe make a seale
     Every such person shall have a seal made.
     (MMED: 1541 Act 33 Henry VIII: V 1212)

So the causative do disappeared in the sixteenth century. The sixteenth century is noted for the frequent occurrence of semantically empty do. Jespersen comments:

At first it [auxiliary do—IGR] was used indiscriminately without any definite grammatical purpose. In some poets such as Lydgate, in the beginning of the fifteenth century it served chiefly to fill up the line and to make it possible to place the infinitive at the end as a convenient rime-word. Sometimes it served to make the tense clear in verbs that are alike in present and preterite . . . “the holy spyryte dyd and dothe remayne and shall remayne” (Fisher c. 1535). The culmination was reached in the sixteenth century. . . But then a reaction set in and gradually restricted the use of do to those cases that are well known from grammars of Present English. (Jespersen, 1938, p. 195)

These remarks are supported by the following statistical evidence, presented in graph form, from Barber (1976):

(33) [Graph not reproduced; vertical axis: percentage of do forms, 10%–100%.]
Figure 1.1 Auxiliary do. Percentage of do forms in different types of sentence, 1500–1700.
Upper broken line: negative questions.
Upper solid line: affirmative questions.
Lower broken line: negative declarative sentences.
Lower solid line: affirmative declarative sentences.
(Adapted from Alvar Ellegård, The Auxiliary Do, University of Gothenburg 1953, from Barber (1976).)

Notice that do is particularly frequent in both questions and negatives. These are exactly the environments that would otherwise provide evidence of a morphological agreement system, by providing evidence of V-movement to INFL. In the late sixteenth century, up to almost 90% of negative questions, almost 60% of affirmative questions and almost 40% of negative sentences had a periphrasis with do. So in these cases no evidence for a morphological agreement system was available. The direct evidence for a morphological agreement system was rather slight after the loss of plural agreement inflections. The frequent occurrence of periphrastic constructions involving modals and do, combined with the impoverishment of agreement inflection, led to a change in the agreement system in the sixteenth century. The change was from a morphological agreement system to a syntactic one. In other words, V no longer moved into INFL in tensed clauses in order to be morphologically governed by an agreement affix and thereby meet condition (17). Instead, V met (17) by being syntactically governed in its base position by some element in INFL: an auxiliary or an abstract agreement feature (AGR). No agreement affixes appeared in INFL any more; instead, abstract AGR appeared. AGR is not an affix, and can govern out of INFL. Thus, where an auxiliary was not present, AGR syntactically governed V in its base position. Certain feature combinations were spelled out on the verb at PF: 2Sg as -st, 3Sg as -s or -(e)th. This situation obtains in present-day English, except that 2Sg has disappeared, and 3Sg is spelled only as -s. For convenience, we illustrate the change with the following diagrams:

(34) a. Middle English: [Tree diagram not reproduced: S, with the agreement affix Af in INFL.]
        Af morphologically governs V.
     b. Modern English: [Tree diagram not reproduced: S, with AGR or Aux in INFL.]
        AGR or Aux syntactically governs V.

2.5 Consequences of the Change

The principal consequence of the change from a morphological to a syntactic agreement system was the impossibility of V-movement to INFL. Verbs no longer needed to move to INFL, as (17) was satisfied by syntactic government of V. Moreover, verbs could no longer move to INFL because there was no available landing site. Assuming that INFL, like any other node, contains only one position, that position is always occupied after the change by AGR, to or some auxiliary. The contrasting behavior of modals and main verbs in questions and negatives in present-day English, illustrated in (1), is also a consequence of the change in the agreement system:

(1) a. Inversion: Must they leave?   *Leave they?
    b. Negation: They cannot walk.   *They walk not.

The contrast illustrated here must hold in Modern English: modals must appear in INFL because they do not assign θ-roles, and so cannot appear in V, which is always governed by AGR, an auxiliary or to. We are assuming that negation and inversion are operations on INFL; hence the behavior of modals in (1). Related to the above point was the development of obligatory do-support. As we saw in section 2.4, do-support had been optional in late Middle English and into the sixteenth century. Do-support is forced in the first place because V-movement to INFL became impossible. If we assume that not must attach to a lexical element in INFL, and that movement of INFL for inversion must move a lexical item, then we see why do is necessary in questions and negatives in Modern English. (17) predicts the absence of agreement on modals and the impossibility of modals appearing in nonfinite forms. In such cases, a modal would be morphologically governed by some participial affix, or syntactically governed by to in INFL. Thus the contrast in (1c,d) is explained. Another consequence was the disappearance of direct objects of modals. Lightfoot (1979, p. 101) gives the following as the last attestations of modals with direct objects (but see section 2.6):

(35) a. the leeste ferthyng þat y men shal
        the least farthing that I owe to people.
        (c. 1425 Hoccleve, Min. Poems xxiii 695)
     b. Yet can I Musick too
        Yet I know music too.
        (1649 Lovelace, Poems (1659) 120)
     c. for all the power thai mocht
        for all the power at their command.
        (1470 Henry, Wallace iii 396 (cf. (30a)))

Modals were no longer able to assign θ-roles, for the reasons we saw in sections 2.2 and 2.3. It follows from the θ-criterion that they could no longer have direct objects. We return to this issue, in the light of further data, in section 2.6. We have seen that the change in thematic properties of the modals, the change in the agreement system, and condition (17) together derive the facts of present-day English illustrated in (1) from the ME situation shown in (2). Some further consequences follow from the absence of V-movement to INFL. In Modern English, quantifiers semantically associated with the subject can ‘float’ rightward. However, a quantifier cannot float past a main verb, although it can appear to the right of auxiliaries:

(36) a. They must have all left.
     b. *They must have left all.

At an earlier stage of English, (36b) was allowed. In fact, (36b) died out in the sixteenth century (cf. Lightfoot, 1979, pp. 168–196). This change can be related to the loss of Verb-movement to INFL, without any change being posited in the rule of quantifier-floating itself. Assume that floated quantifiers have always appeared in the X-position in (37):

(37) [Tree diagram not reproduced: V raised to INFL, with the X-position between INFL and the verb’s base position inside VP.]
As long as English had a rule raising verbs to INFL, as illustrated in (37), floated quantifiers could appear after the main verb at S-Structure. Once the V-movement to INFL rule was lost after the parametric change, however, quantifiers could no longer appear following the main verb. Since auxiliaries appear in INFL, floated quantifiers can still appear after auxiliaries. So we explain the Modern English contrast in (36) and its absence in Middle English in terms of the parametric change in the agreement system. Similar reasoning may well explain why adverbs stopped appearing between a tensed verb and its object in the sixteenth century. Lightfoot comments: “In ME light Adverbs . . . regularly occurred here [in immediate postverbal position—IGR], but from ENE it ceased to be a possible position: *he wrote well the poem, *he touched lightly her shoulder.” If we take the X-position in (37) as the position of adverbs in such cases, then the sentences where the adverb intervenes between verb and object are just like those with floated quantifiers. The disappearance of V-movement to INFL as a consequence of the change in the agreement parameter entailed the disappearance of sentences like those cited by Lightfoot above.15
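The effect of losing V-movement on surface order can be illustrated with a toy linearizer. It covers floated quantifiers as in (36)–(37), light adverbs as in Lightfoot’s examples, and negation with do-support as discussed earlier in this section. This is a hypothetical Python sketch. The functions `linearize` and `negate` are invented for illustration, and verbal morphology (tense and agreement endings) is ignored. The single-slot INFL and the X-position follow the assumptions stated in the text rather than a full syntactic analysis.

```python
from typing import List, Optional


def _join(words: List[str]) -> str:
    # Drop empty slots and join the remaining words with single spaces.
    return " ".join(w for w in words if w)


def linearize(subject: str, verb: str, complement: str = "", x_item: str = "",
              auxiliary: Optional[str] = None, verb_raises: bool = False) -> str:
    """Toy surface order for: subject [INFL ...] X [VP V complement].

    X stands for the position assumed in (37) for floated quantifiers and light
    adverbs, between INFL and the verb's base position. Morphology is ignored.
    """
    if auxiliary is not None and verb_raises:
        # INFL is assumed to contain only one position.
        raise ValueError("an auxiliary in INFL leaves no landing site for V")
    infl = auxiliary if auxiliary is not None else (verb if verb_raises else "")
    vp_verb = "" if verb_raises else verb
    return _join([subject, infl, x_item, vp_verb, complement])


def negate(subject: str, verb: str, complement: str = "",
           auxiliary: Optional[str] = None, verb_raises: bool = False,
           do_form: str = "do") -> str:
    """Toy negation, assuming 'not' must attach to a lexical item in INFL."""
    if auxiliary is None and not verb_raises:
        # INFL holds only abstract AGR, so empty 'do' is inserted to carry 'not'.
        auxiliary = do_form
    # In this toy, 'not' immediately follows INFL, i.e. it occupies the X slot.
    return linearize(subject, verb, complement, x_item="not",
                     auxiliary=auxiliary, verb_raises=verb_raises)


if __name__ == "__main__":
    # verb_raises=True mimics the ME morphological system; False, the modern one.
    print(linearize("they", "left", x_item="all", verb_raises=True))  # they left all
    print(linearize("they", "left", x_item="all"))                    # they all left
    print(linearize("he", "touched", "her shoulder", x_item="lightly", verb_raises=True))
    print(linearize("he", "touched", "her shoulder", x_item="lightly"))
    print(negate("they", "walk", verb_raises=True))                   # they walk not
    print(negate("they", "walk"))                                     # they do not walk
```

The rule placing X after INFL is the same in both settings. Only the position of the verb differs, which mirrors the claim that quantifier-floating and adverb placement did not themselves change. In the toy, the caller chooses `verb_raises` freely; in the analysis, it follows from the parameter setting.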

The above shows that the parametric change in the agreement system had a number of consequences for English syntax. Most of the consequences follow from the loss of V-movement to INFL.

2.6 Root Modals

We have said that modals were reanalyzed as verbs with no θ-roles. One consequence of this reanalysis is that modals were unable to take direct objects. However, we do find late examples with direct objects, particularly with will and can. Sentence (38) is a seventeenth-century example of will with a direct object:

(38) Where we would no pardon they laboured to punish us
     Where we wanted no pardon they laboured to punish us.
     (1643, OED will B22)

See also (35b) for a late example with can. Warner (1983) points out that there is evidence that will and can, and to a lesser extent may, have main-verb properties later than the first half of the seventeenth century. The main-verb properties in question are infinitival and participial forms:

(39) a. In evill, the best condicioun is not to wille, the second not to can
        In evil, the best way to be is not to want to (do it) and the second best not to be able to.
        (1607: Bacon G. PI. Ess. Arb. 242, in OED Can A. 5)
     b. If he had woulde, he might easily have . . . occupied the Monarchy
        If he had wanted to, he could easily have occupied . . .
        (1633: Donne Hist. Septuagint 226, OED Will B. 49)

All of these examples involve root modals, as can clearly be seen from the glosses. (Note also that the last example of shall with a direct object is root, as it means ‘to owe’.) The above examples show that root readings, θ-role assignment and the availability of nonfinite forms are correlated. On the other hand, epistemic readings, the absence of nonfinite forms and the status of a clausal operator correlate. This was the situation for a time in Early Modern English, it seems. These correlations are consistent with (17).
Agreement Parameters and Modal Auxiliaries  27

ENE root modals assigned θ-roles, and so could be syntactically governed by to, as in (39a), or morphologically governed by participial morphology, as in (39b). Epistemic modals could not appear in these environments because of (17). We address two questions in this section: (i) What was the status of the modals in (38) and (39) with respect to the reanalysis of the modals as auxiliaries? (ii) What is the nature of the root/epistemic distinction in present-day English? To answer the second question first: we follow Zubizarreta (1982) in treating root modals essentially as modifiers. Root modals are analogous to a class of sentential adverbs: Jackendoff’s (1972) agent-oriented adverbs. The evidence for the parallelism comes from contrasts like the following:

(40) a. John can (ability) read Arabic.
     b. John deliberately read the forbidden text.
(41) a. *Arabic can (ability) read easily.
     b. *The forbidden text deliberately read easily.

Here we can see that both root modals and agent-oriented adverbs require an agent. Middle verbs, like read in (41), are formed from transitive verbs by a lexical process that deletes the agent θ-role. The root modal and the agent-oriented adverb are grammatical with transitive read in (40), but ungrammatical with middle read in (41). The contrast is captured by saying that root modals and agent-oriented adverbs require the presence of an agent argument in the clause they modify. Consider next (42):

(42) Klamath can be/was deliberately heard in the wilds of Oregon.

Example (42) shows what happens to this requirement in passives, where the derived subject cannot be construed agentively. The example is grammatical because an agent argument is considered to be ‘implicitly’ present (see Zubizarreta (1983) on implicit arguments), and this implicit argument licenses the root modal and the agent-oriented adverb. Some adverbs and some root modals are oriented to subject position rather than to an agent argument: with these, the requirement for an agent still holds, but of the subject, so (42) is ungrammatical with such modifiers (cf. Jackendoff, 1972; Zubizarreta, 1982). For us, distinctions among adverbs are less important than the fact that modals and adverbs pattern alike as modifiers. We have now seen evidence that both root modals and agent-oriented adverbs have a semantic argument. However, this argument is always the argument of some other predicate.
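The licensing pattern in (40)–(42) can be stated as a small decision procedure. The Python sketch below is only an illustration under simplifying assumptions of our own. Each clause is reduced to three hypothetical boolean features: an overt agent, an implicit agent, and an agentively construed subject. The names Clause and modifier_licensed are invented for the sketch, and the example sentences are hand-encoded rather than derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical feature bundle for the clause a modifier attaches to."""
    sentence: str
    explicit_agent: bool    # overt agent argument (transitive 'read' in (40))
    implicit_agent: bool    # implicit agent (the passive in (42))
    agentive_subject: bool  # surface subject construed as the agent


def modifier_licensed(clause: Clause, subject_oriented: bool = False) -> bool:
    """Return True if a root modal or agent-oriented adverb is licensed.

    Agent-oriented modifiers need an agent argument, overt or implicit.
    Subject-oriented modifiers need the subject itself to be the agent.
    """
    if subject_oriented:
        return clause.agentive_subject
    return clause.explicit_agent or clause.implicit_agent


# Examples (40a), (41a) and (42), encoded by hand.
ex40 = Clause("John can read Arabic.", True, False, True)
ex41 = Clause("Arabic can read easily.", False, False, False)  # middle: agent deleted
ex42 = Clause("Klamath can be heard in the wilds of Oregon.", False, True, False)

if __name__ == "__main__":
    for ex in (ex40, ex41, ex42):
        print(ex.sentence, modifier_licensed(ex),
              modifier_licensed(ex, subject_oriented=True))
```

Setting subject_oriented=True reproduces the observation that subject-oriented modifiers reject (42). The derived subject Klamath is not an agent, even though an implicit agent is present.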
So, given the θ-criterion, we are led to suppose that root modals and agent-oriented adverbs do not assign θ-roles to their arguments. However, there is a modification relation between the root modal or adverb and the agent argument in the examples above. Zubizarreta captures this by proposing a different class of thematic relations: adjunct θ-roles. Adjunct θ-roles differ from ‘main’ θ-roles (i.e., all the θ-roles we have considered so far) in that they are not subject to the θ-criterion. So adjunct θ-roles can be assigned to an argument already bearing a θ-role. Also, adjunct θ-role assignment is optional. We adopt the notion of adjunct θ-roles here, and we also exempt adjunct θ-role assigners from (17). So root modals appear in ungoverned positions in present-day English and assign adjunct θ-roles to the agent argument in the clause in which they appear.
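Condition (17), as summarized in note 1, requires verbs with θ-roles to appear in governed positions and verbs without θ-roles to appear in ungoverned positions. The sketch below encodes that requirement in Python, together with the exemption just adopted. As an assumption of this illustration, we read the exemption as meaning that adjunct θ-roles are simply ignored when (17) is checked. The type VerbToken, its fields and the function satisfies_condition_17 are hypothetical names. The field governor records whether a position is governed syntactically (e.g. by to), morphologically (e.g. by participial morphology), or not at all (e.g. INFL in present-day English).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbToken:
    form: str
    main_theta_roles: bool     # assigns main θ-roles
    adjunct_theta_roles: bool  # assigns adjunct θ-roles (ignored by (17) here)
    governor: Optional[str]    # "syntactic", "morphological", or None if ungoverned


def satisfies_condition_17(verb: VerbToken) -> bool:
    """Verbs with main θ-roles must be governed; verbs without them must not be.

    Adjunct θ-role assignment plays no role: this is how the sketch models
    the exemption of adjunct θ-role assigners from (17).
    """
    governed = verb.governor is not None
    return governed if verb.main_theta_roles else not governed


cases = [
    VerbToken("not to can (39a)", True, False, "syntactic"),      # ENE main-verb root modal
    VerbToken("had woulde (39b)", True, False, "morphological"),  # participial form
    VerbToken("to may, epistemic", False, False, "syntactic"),    # hypothetical; excluded by (17)
    VerbToken("can, root, in INFL", False, True, None),           # present-day root modal
]

if __name__ == "__main__":
    for v in cases:
        print(f"{v.form:22} {satisfies_condition_17(v)}")
```

The third case shows why epistemic modals do not appear after to or with participial morphology. Because they lack main θ-roles, they cannot occupy a governed position. A present-day root modal in INFL passes because only its adjunct θ-roles are at issue.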

To return now to the first question, we have seen that root modals, as adjunct θ-role assigners, pose no problem for our account, as long as we assume that adjunct θ-role assigners are exempt from (17). Also, root modals appearing as main verbs pose no problem, as they were governed (syntactically or morphologically), and so able to assign main θ-roles. The question now is: how did the constructions illustrated in (38) and (39) die out, and how did root senses of auxiliary modals develop? In fact, given Zubizarreta’s theory of adjunct θ-roles, nothing in our account of the development of modal auxiliaries prevents some members of that class from being adjunct θ-role assigners. Adjunct θ-roles seem to form quite a limited semantic class, having primarily to do with notions of volition and intention. The main θ-roles assigned by ME modals also had mainly to do with such notions, so these θ-roles were reanalyzed as adjunct θ-roles when (17) and the other factors discussed in sections 2.2 and 2.3 led to the reanalysis of the modals. The crucial aspect of the change is that modals were no longer construed as assigning main θ-roles. We can conclude that alongside the ‘main-verb’ ENE root modals in (38) and (39) there were auxiliary root modals, which made the main-verb root modals redundant. We commented earlier on the marked features of ME main-verb modals. After the reanalysis of modals as auxiliaries, the surviving main-verb root modals would have been even more marked. A further factor, noted by Lightfoot (1979, pp. 112–3), is the rise of ‘semi-auxiliaries’ like have to, be able to and be going to. These paraphrases took over many functions of the modals in nonfinite clauses. It is unclear whether they existed before the reanalysis, but they clearly became more common during the sixteenth century. The availability of these paraphrases must have hastened the decline of the highly marked main-verb modals. As a result, main-verb modals are not attested later than the seventeenth century.
So main-verb root modals were not exactly ‘replaced’ by auxiliary root modals. Auxiliary root modals could have existed from the time of the reanalysis, given certain plausible assumptions about adjunct θ-roles. Main-verb root modals, by contrast, were both marked and redundant, and were eventually replaced by auxiliaries in finite clauses and by semi-auxiliaries in nonfinite environments.

2.7  Conclusion

We have presented in this section a parametric change from the history of English. The change was from a morphological system of agreement to a syntactic system. An important aspect of this change was the development of a class of verbs which did not assign main θ-roles: the modals. The causes of the change were various morphological irregularities of modals in Middle English and the loss of verbal inflections, in particular the inflections marking plural person agreement and the subjunctive/indicative mood distinction. The effects of the change were quite wide-ranging, but the principal effect was the loss of Verb-movement to INFL.


3. Conclusion

This conclusion has three sections. In the first two we briefly discuss earlier accounts of the facts considered in section 2: section 3.1 deals with Lightfoot (1974, 1979) and section 3.2 covers Steele et al. (1981). In section 3.3 we suggest a way of relating language change and language acquisition that incorporates the insights of a parameter-setting model of acquisition. We propose that the Transparency Principle of Lightfoot (1979) can be eliminated, as its major results are guaranteed by a parameter-setting approach.

3.1  Lightfoot (1974, 1979)

Here we will focus primarily on the 1979 account. Lightfoot discusses the changes we have analyzed in the context of arguing for the possibility of radical restructuring of grammars. The crucial concept is the Transparency Principle. Lightfoot’s Transparency Principle forces changes to take place when the relationship between the adult grammar underlying the input data to acquisition and the actual input data the acquirer receives is ‘too opaque’. This opacity typically arises from the accumulation of irregularities. So, for Lightfoot, the change from the ME grammar (no separate class of auxiliaries) to the Modern English grammar (a distinct class of modals; do-support) took the form of a radical restructuring of the PS-Rules of the base in Early Modern English. This restructuring involved, in particular, the introduction of a new category: Aux. The restructuring was forced by the Transparency Principle, and took place abruptly in the early sixteenth century. The Transparency Principle came into operation because modals no longer had enough verb-like properties to be analyzed by language acquirers as verbs, so the reanalysis was forced. Lightfoot (1979, pp. 101–104) gives five independent causes for the opacity of the categorial membership of the ME modals.
Lightfoot goes on to claim that these five factors led to a situation in Early Modern English where language learners ceased to analyze the modals as members of the category V and instead posited a separate category Aux. This in turn led to the change in the PS-Rules of the base, and the reassignment of some lexical items, the modals, to the category Aux. The consequences of this change include the reformulation of the rules of negation and inversion so as to affect the new Aux node instead of V, as had formerly been the case. Aux includes do, so the change in these rules captures the development of do-support for negation and inversion. The other major consequence of the change in the PS-Rules and the reassignment of modals to Aux was the disappearance of infinitival and participial forms of modals. All these surface changes result from the restructuring of the base rules and the introduction of a new category. Aux is inherently tensed, so no nonfinite or participial forms can appear. Also, Aux does not iterate, so double-modal sequences do not appear.

Our account is quite close to Lightfoot’s. We consider many of the same factors to be involved (in particular, the morphological and semantic irregularity of ME modals). The major difference is that our account is framed within a principles-and-parameters theory, while Lightfoot’s is cast in terms of a rule-based theory. For us, the entire complex of changes is essentially driven by (17). This one principle underlies the different kinds of agreement system: the two systems represent the two ways of satisfying it. Also, because of (17), all the properties of Modern English modals (appearance in INFL, lack of affixes) follow from the fact that modals lack main θ-roles. Thus our account is conceptually superior to Lightfoot’s. Our account also covers a wider range of data than Lightfoot’s, as we can treat the changes in quantifier floating and adverb placement as consequences of the loss of V-movement to INFL (cf. section 2.5). Nevertheless, it should be clear that our account owes a lot to Lightfoot’s.

3.2  Steele et al. (1981)

Steele et al.’s account is primarily a reworking of Lightfoot’s data in terms of a theory of a universal Aux node. Their main point is that Lightfoot’s account ‘is on the right track, but incomplete’ (p. 283). There are two respects in which they alter Lightfoot’s account: (i) they regard the loss of the subjunctive/indicative distinction as central, and (ii) they claim that inversion and negation always affected Aux. We can see from the second point that Steele et al. consider Aux to have been present all along. In Old and Middle English, there was a rule of Aux-attachment to V, and Tense was contained in Aux. This is why inversion and negation affected only tensed verbs in Middle English.
They note that this means that the negation and inversion rules were unchanged by the reanalysis of the modals:

We attribute the changes [in the rules of negation and inversion—IGR] not directly to the reanalysis of the modals, but rather to the loss of the obligatory attachment of Aux. (Steele et al., 1981, p. 282)

The Aux-attachment rule became optional in Middle English. Steele et al. relate this optionality to the general rise of periphrastic constructions in this period. In particular, they relate it to the appearance of periphrastic do, which they attribute, as in classical analyses of do-support (e.g., Chomsky, 1957), to the stranding of Tense when the attachment rule does not apply. We concur with the account given by Steele et al. for negation and inversion. This account captures the fact that the change in these rules did not involve any change in the modals, or in the form of the rules in question, but rather in all the other verbs of English: the majority of verbs changed their behavior with respect to these processes while the modals did not. However, on our account, this change in the majority of verbs is captured by the loss of the V-movement to INFL rule, not the Aux-attachment rule. We have given a principled explanation, in terms of condition (17) and the theory of government, for why Verb-movement to INFL must exist. We have also shown how the loss of this rule was a consequence of a parametric shift in the system of agreement, caused by the impoverishment of agreement paradigms and the rise of periphrastic constructions. Our account therefore has a more principled basis and wider implications.

3.3  The Transparency Principle and Parameters

The main theoretical defect of Lightfoot’s approach concerns the Transparency Principle. This principle is put forward as an inductive generalization about the theory of grammar. The idea is that grammars can tolerate only a certain amount of opacity before they are necessarily restructured, abductively, through acquisition. However, as Warner (1983) points out, it is not clear what ‘opacity’ really is. We will suggest here that, given the notion of parameter-setting through acquisition described in Chomsky (1981), the effect of the Transparency Principle is to make one parametric setting ‘too opaque’ and thereby favor the selection of another. As choosing among parameter values is the main task of acquisition, the Transparency Principle is ‘built into’ a parameter-setting model. Thus we can dispense with the Transparency Principle as a separate principle. One consequence of abandoning the Transparency Principle as a motivating force behind syntactic change is that we are no longer compelled to regard changes as resulting from the accumulation of exceptional properties. Instead, they may result from the interaction of possibly quite independent factors.
It is also possible for a factor in a change to be a feature of the grammar for a long time before it combines with some other, otherwise independent factor to bring about a change. For example, modals were irregular for centuries before the sixteenth century (see section 2.2.2). However, in conjunction with the loss of subjunctive and plural verbal inflections, the irregularity of modals became a factor leading to a parametric change. We now briefly propose an alternative to Lightfoot’s earlier proposals for the relation of language change to language acquisition, one which does not make use of the Transparency Principle.16 The crucial notion is that of a ‘parameter of Universal Grammar’. Chomsky outlines this notion as follows:

In a highly idealized picture of language acquisition, UG [Universal Grammar—IGR] is taken to be a characterization of the child’s prelinguistic initial state. Experience—in part, a construct based on the internal state given or already attained—serves to fix the parameters of UG, providing a core grammar, guided perhaps by a structure of preferences and implicational relations among the parameters of core theory. If so, then considerations of markedness enter into the theory of core grammar. (Chomsky, 1981, p. 7)

If we take this view of acquisition, and continue to regard acquisition as the driving force behind language change, we are led to the parameter-changing view of language change that we have adopted in this article. We illustrate as follows. We define a syntactic change as a difference in the value of at least one parameter of UG over a period of time. Now imagine a parameter P with the potential values [+F] and [−F]. For concreteness, take P to be the agreement system, with [+F] morphological agreement and [−F] therefore syntactic agreement. From the point of view of a learner of English in the early sixteenth century, [+F] is the value of P in the core grammar which underlies the language behavior of the surrounding speech community. Owing to the large number of periphrastic constructions with modals and do and the lack of agreement morphology on verbs, the acquirer initially sets P at [−F], i.e., assumes that English has a syntactic agreement system. How can P now be ‘reset’ to [+F], the value corresponding to that of the surrounding speech community? The only possible way would be on the basis of strong positive evidence disconfirming the original hypothesis ([−F] = syntactic agreement) and causing the child to arrive at [+F] (= morphological agreement). However, such positive evidence is not always available in the trigger experience.
Moreover, the same evidence that led the acquirer to posit a parametric difference with respect to the adult grammar may often lead to further reanalysis—a case in point being the reanalysis of -s, -st agreement affixes as spell-outs of agreement features rather than base-generated affixes. We can see from this illustration how an acquirer may never be led to reset a parameter. In this way a syntactic change is initiated in the speech community,17 since the acquirer’s grammar contains one parametric setting which differs from that in the grammar underlying the input data he or she received. The above account is only intended as an outline. In the absence of a developed markedness theory and learnability theory we are not in a position to say how much irregularity or indeterminacy in the input data is enough to cause an acquirer to set a parameter in a way that does not correspond to the setting underlying the input data. However, theoretically motivated work in diachronic syntax can lay some groundwork for markedness and learnability theory by uncovering examples of parametric changes. We can then begin to approach the problems of markedness and learnability inductively.
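The learning scenario just outlined can be made concrete with a toy simulation. This Python sketch is not a model proposed in this paper. The three cue types, the majority rule for the initial setting and the reset threshold of 0.3 are all invented for illustration, and the input samples are made up rather than drawn from any corpus. The sketch shows only the logical point of the outline: a learner who starts from [−F] keeps that value unless unambiguous [+F] triggers are frequent enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical cues an input sentence may carry."""
    periphrastic: bool    # modal or periphrastic 'do' construction
    rich_agreement: bool  # overt agreement morphology on the verb
    verb_raised: bool     # unambiguous sign of V-movement, e.g. verb-not order


def initial_setting(data: list[Utterance]) -> str:
    """Set P at '-F' (syntactic agreement) if most input is periphrastic or agreement-poor."""
    if not data:
        raise ValueError("no input data")
    opaque = sum(1 for u in data if u.periphrastic or not u.rich_agreement)
    return "-F" if opaque > len(data) / 2 else "+F"


def final_setting(data: list[Utterance], reset_threshold: float = 0.3) -> str:
    """Reset '-F' to '+F' only on strong positive evidence (frequent triggers)."""
    setting = initial_setting(data)
    if setting == "-F":
        triggers = sum(1 for u in data if u.verb_raised)
        if triggers / len(data) >= reset_threshold:
            setting = "+F"
    return setting


PERIPHRASTIC = Utterance(periphrastic=True, rich_agreement=False, verb_raised=False)
BARE_VERB = Utterance(periphrastic=False, rich_agreement=False, verb_raised=False)
TRIGGER = Utterance(periphrastic=False, rich_agreement=True, verb_raised=True)

# Invented input samples, not corpus counts.
sixteenth_century_like = [PERIPHRASTIC] * 7 + [BARE_VERB] + [TRIGGER] * 2
conservative = [TRIGGER] * 6 + [PERIPHRASTIC] * 4
```

With the invented sixteenth-century-like sample, final_setting returns "-F", although the grammar behind the input is [+F]. In the terms of the outline, a syntactic change has been initiated. A sample with a higher share of triggers lets the learner reset to "+F". This illustrates that the outcome depends on what the trigger experience contains, not on a separate transparency metric.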


Notes

* The material in this paper has been presented before audiences at the Salzburg Comparative Syntax Festival, USC, MIT and UC Berkeley. Those audiences have all contributed helpful comments. I am also indebted to Joseph Aoun, Mürvet Enç, Osvaldo Jaeggli, George Lakoff, David Lightfoot, David Pesetsky and two anonymous NLLT reviewers for useful discussion of the ideas in this paper. The biggest thanks are due to Nigel Fabb, with whom the central notion of this paper was formulated. All mistakes, of course, are inalienably mine.
1. We take modals to be ordinary verbs. The motive for proposing that modals are members of a separate category of auxiliaries, or are verbs marked [+Aux], is precisely the exceptional properties of modals compared to main verbs illustrated in (1). In this paper we show that these properties derive from two factors: (i) the fact that modals assign no (main) θ-roles, and (ii) a condition requiring verbs with θ-roles to appear in governed positions and verbs with no θ-roles to appear in ungoverned positions (Condition (17)). Together, these two factors allow us to continue to regard modals as verbs. Since modals clearly were verbs in Middle English, as (2) shows, we do not need to view the historical change as a category change from V to Aux, or as the addition of the feature [+Aux] to the modals.
2. In some dialects of Modern English, sequences of modals are grammatical. I will have nothing to say about those dialects here. Traugott (1972) gives double-modal sequences as one example of a property of Black English taken over from Early Modern English without change.
3. Old and Middle English exhibited the ‘verb-second’ (V2) phenomenon, also found in Modern German and Dutch. In a tensed root clause, a tensed verb must appear in second position. (2a, i) and (2a, iii) exhibit this in Middle English.
We assume, following current proposals (den Besten, 1977; Evers, 1981; Haider, 1984; Koopmann, 1983; Koster, 1975; Thiersch, 1978), that a rule of INFL-fronting is involved in the derivation of clauses with V2 order. This rule is fed by a rule moving V into INFL in tensed clauses (cf. section 1.3). Examples (2a, i) and (2a, iii) show that this rule applied indifferently to modals and to main verbs. This situation no longer holds for the only Modern English INFL-fronting process, Subject-Aux Inversion (SAI), which can only affect auxiliaries, as (1a) shows. We assume that Middle and Modern English both have the same INFL-fronting rule, and we will see in the course of the paper how the range of application of this rule became restricted. The first restriction was on the range of environments in which the rule applied: the general development of SVO order, which replaced the earlier SOV order in Middle English, eliminated the V2 phenomenon. This left the type of inversion seen in (2a, ii) intact, but eliminated that seen in (2a, i) and (2a, iii). Section 2 accounts for the second restriction: the loss of the possibility of fronting main verbs, leaving only Subject-Aux Inversion. This change followed the loss of V-movement to INFL (cf. section 2.5).
4. Middle English examples are mostly taken from Visser (1963–73). In such examples, the citation is followed by the reference given by Visser, using his abbreviatory conventions, followed by V and the number of the paragraph of Visser the example was taken from. Other examples of Middle and Early Modern English are as cited.
5. I am grateful to an anonymous reviewer for this perspicuous reformulation of (5).
6. The ideas in this section were developed in close collaboration with Nigel Fabb.
7. So nonfinite AGR governs VP but not the subject.

8. If we do not make the simplifying assumption of the text, (20) suggests that the presence or absence of agreement relates to some property other than θ-role assignment. So condition (17) would relate to some condition other than θ-role assignment. In forthcoming work (Roberts, 1985), I suggest that (17) should be stated in terms of selection rather than θ-role assignment. Lack of θ-role assignment is the property that allows aspectuals to appear in INFL. However, (17) is stated in terms of selection, and aspectuals arguably select for properties of VP. This approach involves reformulating the Projection Principle and lexical theory in such a way that selection does not entail θ-role assignment. However, the details of the reformulation and all its implications go beyond this paper’s aims. For this reason, we leave aspectuals aside, in rather unsatisfactory limbo.
9. Note that this is an instance of movement to a ‘complement’ position. The Projection Principle and the θ-criterion generally rule out such movements (e.g., Raising to Object), as they force complements to be θ-marked, and movement to a θ-marked position always violates the θ-criterion. Here the Projection Principle is not violated, however, because affixes have no thematic relation with the stems which are their ‘complements’.
Also, it is only possible for a Stem position to be filled at D-Structure by an auxiliary; if a main verb, i.e., a verb with θ-roles, appears there at D-Structure, it will be unable to govern its complements and so the θ-criterion will be violated. In fact, this is the reason why only θ-role-less verbs (i.e., auxiliaries) can appear in INFL at D-Structure.
10. Another property has been correlated with rich agreement morphology in recent work: the pro-drop parameter (cf. Chomsky, 1981, Chapter 4; Rizzi, 1982, Chapter 4). Chomsky suggests that Rule R, which lowers INFL onto V, applies in the syntax in pro-drop languages, and that this is what leads to the properties associated with pro-drop. This would be consistent with the theory of Verb-movement proposed in the text. We could say that some languages have affix-movement to V instead of V-movement to affix, the latter being what is outlined in the text.
Another view has been adopted more recently by Osvaldo Jaeggli in class lectures. Jaeggli holds that agreement morphology is able to form a kind of clitic-chain with an empty category, pro, in subject position. The agreement morphology has the effect of identifying pro, identification being the main requirement for pro. This proposal makes interesting predictions in conjunction with our theory of Verb-movement to INFL. We can derive the following implications: (i) if there is rich agreement, there will be Verb-movement to INFL; (ii) if there is pro-drop, there will be Verb-movement to INFL; (iii) if there is little or no agreement, there will be no Verb-movement. Modern English is consistent with these implications, having little agreement, no Verb-movement to INFL, and no pro-drop. Italian and Spanish are also consistent, having rich agreement, Verb-movement to INFL and pro-drop. Middle English and Modern German have Verb-movement, rich agreement and no pro-drop.
Another interesting point in this connection is the assignment of Nominative Case. We are assuming that languages with rich agreement lack AGR, so we can ask how Nominative Case is assigned in these languages. For pro-drop languages we can say one of two things: either the empty category in subject position of a tensed clause does not need Case (certainly the Case Filter does not require it), or Case is transmitted via the chain formed with the agreement affix in INFL. For languages with no pro-drop, i.e., Middle English and Modern German, the question is more acute. We may conjecture that the INFL-fronting process that underlies verb-second is relevant here (cf. fn. 3 and Koopmann, 1983).
11. ‘deuit’ here is the 3Sg. present form of dowen (OE dugan). This verb was a ME modal and a former preterit-present verb meaning roughly ‘to be fitting’. It died out of the standard language in late Middle English but survived in certain dialects until the nineteenth century (cf. Lightfoot, 1979, pp. 102–3).
12. According to Visser, who cites other commentators on English, the Modern English subjunctive as seen in examples like (i) is unique to the twentieth century and most common in American English:
(i) I require that he be there at 8.

In fact, this construction can give further support to our claims about Verb-movement in Modern English.
We propose that the complement in (i) contains an empty modal. The empty modal appears in INFL and is selected by the matrix verb. The fact that the ‘subjunctive’ verb always appears in a stem form (with the exception of the fossilized if I/he were) follows automatically, as only stem forms can follow modals. Moreover, overt modals are impossible in subjunctive complements like (i). If the subjunctive is an empty modal, this fact is simply a case of the general prohibition against double-modal sequences, which is itself a consequence of (17). Likewise, the possibility of aspectual have and be in subjunctive complements is explained.
The empty modal syntactically governs the verb in (i). We can see that the verb does not move into INFL from the position of clausal negation, which precedes the verb:
(ii) I suggest that he not be there by 8.
In (ii), not is in its normal position, between INFL and VP.
Aspectual auxiliaries are able to move into INFL in present-day English (cf. Emonds, 1976; Akmajian et al., 1979). However, they are unable to appear before not in subjunctive complements:
(iii) *I require that he be not there by 8.
(iv) *I require that he have not left before I arrive.

If there is a phonologically null modal in INFL, the impossibility of have/be raising in subjunctive complements is explained.   In Middle and Early Modern English, however, the situation was different. At these periods, we find the verb and not in the reverse order in subjunctives: (v) Beware thou that thou bring not my son thither again. (1611, Bible, Gen 24, 6: V 869).

With the verb–not order, assuming that not is always between INFL and V, Verb-movement to INFL must have taken place. In this case, the verb was morphologically governed. The present-day order, on the other hand, shows no evidence of Verb-movement. Instead, the verb is syntactically governed by the empty modal in INFL. This change is a function of the parametric shift from morphological to syntactic government.
13. The subjunctive appeared mainly in the following environments: subject clauses (whoever hate his brother . . .), relatives (the properties that a king have), conditionals (thou art dead if thou speak one word), temporal clauses (if and when the need of work allow not such leisures to be taken), purpose clauses (the properties that are required to an argument, that it be full and formal), result clauses (God keep him, that he come not to such a pass), complements to verbs of saying (Ask his father where he be), complements to verbs of fearing (I dread that he become my bane), and complements to verbs of wishing (Christ wants that his glory last). For each of these uses of the ME subjunctive, it is possible to find parallel instances of periphrastic constructions with modals.
14. To quote an (almost) contemporary source:
In former times, till about the reigne of King Henry the eighth, they [plural forms of Verbs—IGR] were wont to be formed by adding -en thus loven, sayen, complainen. But now (whatsoever is the cause) it has growne quite out of use, and that other so generally prevailed, that I dare not presume to set it a-foote againe. Albeit (to tell you my opinion) I am perswaded, that the lack hereof well considered will be found a great blemish to our tongue. (Ben Jonson, 1637)
Henry VIII reigned from 1509 to 1547. So the final loss of plural agreement coincides very closely with the date for the parametric change, which we could put at the mid-to-late sixteenth century.
15. The Adjacency Condition on Case Assignment prevents the adverb from appearing inside VP, intervening between the verb and the NP it Case-marks. Notice how our theory of Verb-movement deals with apparent counterexamples to this condition in Middle English.
16. I should stress at this point that this view of language change has much in common with that given in Lightfoot (1979). The main innovations are due to the incorporation of advances in linguistic theory. In fact, this kind of view was proposed recently by Lightfoot (class lectures, LSA Institute, UCLA, 1983).
17. What we have said only covers the instigation of a change. The spread of a syntactic change through a speech community is presumably governed by constraints like those observed for phonological change by Labov (1972). Note that Labov’s account presupposes the existence of a change.
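The three implications stated in note 10 can be checked mechanically against the languages cited there. The Python sketch below does this. The binary encoding is a simplification of our own; in particular, it treats ‘little or no agreement’ simply as the absence of rich agreement. The names LANGUAGES, implies and violations are illustrative only.

```python
# Properties as reported in note 10, encoded as booleans.
LANGUAGES = {
    "Modern English": {"rich_agr": False, "v_to_infl": False, "pro_drop": False},
    "Italian":        {"rich_agr": True,  "v_to_infl": True,  "pro_drop": True},
    "Spanish":        {"rich_agr": True,  "v_to_infl": True,  "pro_drop": True},
    "Middle English": {"rich_agr": True,  "v_to_infl": True,  "pro_drop": False},
    "Modern German":  {"rich_agr": True,  "v_to_infl": True,  "pro_drop": False},
}


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q."""
    return (not p) or q


def violations(props: dict[str, bool]) -> list[str]:
    """Return the labels of the implications (i)-(iii) that a language violates."""
    checks = {
        "(i) rich agreement -> V-to-INFL": implies(props["rich_agr"], props["v_to_infl"]),
        "(ii) pro-drop -> V-to-INFL": implies(props["pro_drop"], props["v_to_infl"]),
        "(iii) little/no agreement -> no V-to-INFL": implies(not props["rich_agr"], not props["v_to_infl"]),
    }
    return [label for label, holds in checks.items() if not holds]


if __name__ == "__main__":
    for name, props in LANGUAGES.items():
        print(f"{name:15} {violations(props) or 'consistent'}")
```

All five languages come out consistent. A hypothetical pro-drop language with neither rich agreement nor Verb-movement would be flagged under (ii). The implications predict that no such language exists.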

References

Akmajian, A., S. Steele and T. Wasow: 1979, ‘The Category AUX in Universal Grammar’, Linguistic Inquiry 10(1), 1–64.
Barber, C.: 1976, Early Modern English, Andre Deutsch, London.
Belletti, A. and L. Rizzi: 1981, ‘The Syntax of “ne”: Some Theoretical Implications’, The Linguistic Review 2(1), 117–155.
den Besten, H.: 1983, ‘On the Interaction of Root Transformations and Lexical Deletive Rules’, in W. Abraham (ed.), On the Formal Syntax of the Westgermania, John Benjamins, Amsterdam, pp. 47–132.
Chomsky, N.: 1957, Syntactic Structures (Janua Linguarum, 4), Mouton, The Hague.
Chomsky, N.: 1981, Lectures on Government and Binding, Foris Publications, Dordrecht.
Chomsky, N.: 1982, Some Concepts and Consequences of the Theory of Government and Binding, MIT Press, Cambridge, MA.
Culicover, P.: 1976, Syntax, Academic Press, New York.
Ellegård, A.: 1953, The Auxiliary ‘do’: the Establishment and Regulation of its Use in English, Almqvist and Wiksell, Stockholm.
Emonds, J.: 1976, A Transformational Approach to English Syntax: Root, Structure-Preserving and Local Transformations, Academic Press, New York.
Emonds, J.: 1978, ‘The Complex V–V′ in French’, Linguistic Inquiry 9, 151–175.
Evers, A.: 1981, ‘Verb-Second Movement Rules’, Wiener Linguistische Gazette 26.

Fabb, N. and I. Roberts: in preparation, The English Auxiliary System.
Gruber, J. S.: 1965, Studies in Lexical Relations, MIT Ph.D. dissertation, distributed by Indiana University Linguistics Club.
Haider, H.: 1984, ‘Topic, Focus, and V-Second’, in GAGL 25.
Jackendoff, R.: 1972, Semantic Interpretation in Generative Grammar, MIT Press, Cambridge, MA.
Jespersen, O.: 1909–49, A Modern English Grammar on Historical Principles, Vols. I–III, Allen and Unwin, London.
Jespersen, O.: 1938, Growth and Structure of the English Language, Allen and Unwin, London.
Jonson, Ben: 1637, English Grammar (edited by A. V. Waite, 1909).
Koopmann, H.: 1983, The Syntax of Verbs: From Verb-Movement Rules in the Kru Languages to Universal Grammar, Unpublished Ph.D. dissertation, McGill University.
Koster, J.: 1975, ‘Dutch as an SOV Language’, Linguistic Analysis 1, 111–136.
Labov, W.: 1972, Sociolinguistic Patterns, University of Pennsylvania Press, Philadelphia, PA.
Lieber, R.: 1980, On the Organization of the Lexicon, Unpublished Ph.D. dissertation, MIT.
Lightfoot, D. W.: 1974, ‘The Diachronic Analysis of English Modals’, in J. Anderson and C. Jones (eds.), Historical Linguistics: Proceedings of the First International Conference on Historical Linguistics, North Holland, Amsterdam.
Lightfoot, D. W.: 1979, Principles of Diachronic Syntax, Cambridge University Press, Cambridge, England.
Manzini, M. R.: 1983, Restructuring and Reanalysis, Unpublished Ph.D. dissertation, MIT.
Marantz, A.: 1981, On the Nature of Grammatical Relations, Unpublished Ph.D. dissertation, MIT.
Palmer, F. R.: 1974, The English Verb, Longmans, London.
Pullum, G. and D. Wilson: 1977, ‘Autonomous Syntax and the Analysis of Auxiliaries’, Language 53(4), 741–789.
Rizzi, L.: 1982, Issues in Italian Syntax, Foris Publications, Dordrecht.
Roberts, I. G.: 1985, The Representation of Implicit and Dethematized Subjects, Ph.D. dissertation, USC.
Safir, K.: 1982, ‘Inflection-Government and Inversion’, The Linguistic Review 1, 417–467. Selkirk, E.: 1982, The Syntax of Words, MIT Press, Cambridge, MA. Sproat, R.: 1983, ‘VSO Languages and Welsh Configurationality’, in MIT Working Papers in Linguistics, Vol V. Steele, S. et al.: 1981, An Encyclopedia of AUX. A Study of Cross-Linguistic Equivalence, MIT Press, Cambridge, MA. Thiersch, C.: 1978, Topics in German Syntax, Unpublished Ph.D. dissertation. Traugott, E.: 1969, ‘Diachronic Syntax and Generative Grammar’, in R. Lass (ed.), Approaches to English Historical Linguistics, Holt, Rinehart and Winston, New York. Traugott, E.: 1972, A History of English Syntax: A Transformational Approach to the History of English Sentence Structure, Holt, Rinehart and Winston, New York. Travis, L.: 1984. Parameters and Effects of Word Order Variation. Unpublished Ph.D. dissertation, MIT. Visser, Th.: 1963–73, An Historical Syntax of the English Language, Vols. I–IIIb, E. J. Brill, Leiden, Holland. Warner, A.: 1983, ‘Review of D. W. Lightfoot: Principles of Diachronic Syntax’, Journal of Linguistics 19, 187–209. Williams, E.: 1980, ‘Predication’, Linguistic Inquiry 11(1), 203–238.

38  Ian Roberts Williams, E.: 1981, ‘On the Notions ‘Lexically Related’ and ‘Head of a Word’,’ Linguistic Inquiry 12(2), 245–274. Zubizarreta, M. L.: 1982, On the Relation of the Lexicon to Syntax, Unpublished Ph.D. dissertation, MIT. Zubizarreta, M. L.: 1983,‘The Relation of Morpho-Syntax to Morpho-phonology: The case of Romance Causatives’, Mimeographed, MIT.


A Computational Model of Language Learnability and Language Change

Robin Clark and Ian Roberts

1. Introduction

Darwin’s (1859) theory of natural selection had an important influence on the Neogrammarians. Like Darwin, they believed that diachronic change was the result of selective pressures on organisms from the environment operating on random variation within a population (see Haldane 1990 for a classic exposition of natural selection as the motive force underlying evolution). Darwin proposed that natural selection was accounted for by the greater reproduction rates of fitter organisms; in the linguistic realm, Paul (1920) proposed that language change is driven by restructuring of the target grammar that may take place during language acquisition. If the input to language acquisition is taken to be the environment and if language acquisition is taken to be the linguistic correlate of biological reproduction, a clear parallelism between Darwin’s view of natural selection and Paul’s view of the selection of grammars emerges. Despite the appeal of this notion, no successful evolutionary theory of the relationship between language acquisition and language change has been developed in the 130 years since Darwin’s On the Origin of Species. The purpose of this article is to relate natural selection, language acquisition, and language change in light of current computational models of learning. The basic problem for the hypothesis that language change is driven by acquisition concerns the relationship between the adult input, which is generated by one grammar, and the learner’s hypotheses, which may differ at certain points from the adult grammar. We have grown accustomed to thinking of acquisition as a relation between linguistic experience and a target grammar; the learner must converge to a single target grammar in order for learning to be considered successful (see Gold 1967, Osherson, Stob, and Weinstein 1986).
Although this idealization has proven useful in the study of the logical problem of language acquisition, it renders opaque the relationship between language acquisition and language change. If each generation converges successfully to the adult grammar, how can languages ever change? One would expect them to remain forever fixed, since change entails that at least one generation has a grammar that differs from its parents’ grammar; yet, by definition, this generation would have misconverged. We can easily state the problem in terms of parameter setting. Acquisition is a process of accurately fixing parametric values. That is, the learner sets parameter $p_n$ to the value $v_i$ in response to some property $c_i$ of the input text; the usual idealization states that the learner has successfully converged to the value $v_i$ for the parameter $p_n$ if the target grammar has $p_n$ set to $v_i$. Language change, on the other hand, presupposes that a population converges on a value $v_i$ for at least one parameter $p$ where the adult grammar has $p(v_j)$ and $v_i \neq v_j$. Strictly speaking, such a learner has failed to learn. More puzzling still, the property $c_i$ of the input text that allowed adults to induce $p_n(v_i)$ when they were learning the language should be present in the speech that they, in turn, address to children. How is it that, for one generation, property $c_i$ causes learners to hypothesize $p_n(v_i)$, whereas in a succeeding generation it loses its causal force? We will argue that the question of how parametric change can take place given reasonable constraints on learnability is fundamental both for understanding language acquisition and for understanding language change. Indeed, the logical problem of language change cannot be separated from the logical problem of language acquisition; one of the claims of this article is that the former problem is a subcase of the latter (see Lightfoot and Hornstein 1981), in that the answer reduces to the relation between property $c_i$, the structure of the learner, and $p_i$ (Lightfoot (1991) makes the same point). We will formalize this problem in light of current thinking on language learnability; doing this elucidates both the processes that underlie diachronic change and those that drive learning.
The result is important for understanding both language acquisition and diachronic change.¹ A central problem for acquisition theory is that of characterizing how the learner formulates and retracts hypotheses in light of its linguistic environment. Equally, one of the central problems for language change concerns how a population of learners can converge on a grammar that is systematically different from the adult grammar in the sense defined above. In both cases, hypothesis formation and retraction by learners appear to be the crucial mechanisms. We will adopt the genetic algorithm approach to learnability developed in Clark 1990, 1992.² This approach treats learning as a special case of natural selection. In what follows, we will show how to encode the learner’s hypotheses about the target sequence of parameter settings as “bit strings” (strings of 0s and 1s) that enumerate not only hypotheses but also, by extension, grammars and parsing devices. These bit strings can then be treated like genetic material that specifies grammatical “phenotypes,” which are expressed by parsing devices. These parsing devices are run against an input text, and their relative fitness is measured by a simple metric. The hypotheses judged most fit are then combined via a special mating operation; in other words, we will literally allow hypotheses to mate and thereby produce “offspring” hypotheses that share genetic material (subsequences of bit strings) of both parents. Since the mating operation prefers the most fit hypotheses, this technique allows the learner to search the hypothesis space efficiently while economizing on its computational resources. The genetic algorithm technique presupposes that the input text expresses each parameter with sufficient frequency that the learner’s hypotheses are placed under pressure to bear that parameter setting. Hypotheses that carry a parameter value corresponding to a parameter setting frequently expressed in the input text will be strongly selected for by the fitness metric. As a result, hypotheses containing “favorable” parameter settings will tend to reproduce more frequently, whereas “unfavorable” settings will disappear from the population, where favorable simply means ‘better able to parse the input’. If, on the other hand, a parameter is not expressed frequently in the input text, the learner will be under less pressure to set that parameter in accordance with the target setting. In this case, the fitness metric will not be decisive in driving the learner toward the target setting, so that either the correct setting or the incorrect setting can survive in the linguistic environment. The fitness metric, which we will describe in detail below, plays a crucial role in mediating between the learner and the input text. Implicit in this discussion is the notion that relative fitness determines convergence: the learner converges to the most fit hypothesis relative to the input text even if this grammar differs from the adult state for the values of some parameters. We will propose that parametric change occurs when the target of acquisition contains parameter values that cannot be uniquely determined on the basis of the linguistic environment.
This can occur when the evidence presented to the learner is formally compatible with a number of different, and conflicting, parameter settings. In these cases the learner must evaluate its hypotheses using criteria that are not purely a response to the external environment; in particular, the learner must consider factors like the Subset Condition (Berwick 1985) and elegance of derivations (the least effort strategy; Chomsky 1991). Thus, considering language change from a learnability perspective gives us access to how learners evaluate the relative merit of their hypotheses. Our goal here will be to characterize, in a precise manner, the conditions under which a learner arrives at a grammar distinct from the target, thus fueling diachronic change. Moreover, this approach reduces the logical problem of language change to the logical problem of language acquisition by relating both to the question of how learners set parameters to particular values. Intuitively, our argument will be that, because of various factors, the input data do not put pressure on the learner to set certain parameters to a definite value: several alternative grammars can adequately account for the input stream, so the appropriate choice of grammar is underdetermined by the linguistic environment, even given the learner’s rich internal structure. Since external pressures do not force the learner to select a particular grammar, it will turn in on itself and rely on its own internal structure to select from the alternatives at hand. If this is correct, then diachronic change can provide crucial information on the factors that learners rely on to select hypotheses. Since the external environment is not decisive in these cases, diachronic change reflects pure learnability considerations. Thus, diachronic change reflects what is, in a sense, “pathological” learning, and so a careful study of its properties can reveal a great deal about how learning transpires in nonpathological cases (Kiparsky (1982) develops a similar idea for phonological change). We will argue that parametric change can involve a variety of factors. Change in one component (for example, the phonology) can obscure the expression of syntactic parameters. The resulting text will not uniquely drive the learner toward the target. At this point the learner appeals to the fitness metric to select an appropriate parameter setting, and factors such as the Subset Condition or general economy of representations come into play rather than pure selective pressure from the input text. This type of change is exemplified by the introduction of subject clitics in 15th-century French. A second important factor is instability due to independent parametric changes within a component: a change in one parameter setting can trigger changes to a number of other parameter settings. As we will show, parametric change in 16th-century French provides a case study of how parametric change can cascade through a system (see Roberts 1993). During this period, French ceased to be both a null subject language and a verb-second (V2) language. We will show that, because of innovations in the 15th century, the system became unstable, and deep parametric change was forced on the learner via the fitness metric.
Fundamental to this analysis is the formalization of the notion of stability relative to a particular parameter setting: a parameter setting is stable to the degree that its expression in the input data is unambiguous. Following Clark (1990, 1992), we will say that a parameter value $p(v_j)$ is expressed by an input sentence $s_i$ just in case a grammar must have $p$ set to value $v_j$ in order to assign a well-formed representation to $s_i$ (see section 2.4). We should note that this does not mean that the parameter is set by raw data; rather, parameter expression defines a class of representations that are compatible with the current input sentence, together with the parameter values that those representations entail. An unstable parameter setting, then, is one whose expression is ambiguous. We will show that, through a variety of independent changes, 16th-century French became highly unstable, resulting in the loss of null subjects and V2 phenomena.

The article is organized as follows. In section 2 we discuss the formal and conceptual underpinnings of the learning theory. In section 3 we apply the learning theory to a particular case of change. Finally, in section 4 we discuss some of the consequences of the current approach for the theory of learning and change.
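The definition of parameter expression can be made concrete with a toy model. The sketch below is a hypothetical illustration, not Clark and Roberts’ implementation: it assumes a two-parameter space, stipulates by hand which abstract sentences (`s1`, `s2`, `s3`) each grammar can parse in place of real parsers, and treats a sentence as expressing $p(v_j)$ when every grammar able to parse it has that value. The share of sentences that unambiguously express some value of a parameter serves as a crude stand-in for that parameter’s stability.

```python
# Hypothetical toy model: grammars are tuples of two binary parameter
# values, and PARSES stipulates which abstract sentences each grammar can
# assign a well-formed representation to. The sentences are placeholders.
PARSES = {
    (0, 0): {"s1"},
    (0, 1): {"s1", "s2"},
    (1, 0): {"s1", "s3"},
    (1, 1): {"s1", "s2", "s3"},
}


def values_compatible_with(sentence, param):
    """Values of parameter `param` found in the grammars able to parse `sentence`."""
    return {grammar[param] for grammar, parsed in PARSES.items() if sentence in parsed}


def expresses(sentence, param, value):
    """True iff any grammar that parses `sentence` must set `param` to `value`."""
    return values_compatible_with(sentence, param) == {value}


def unambiguous_share(text, param):
    """Fraction of the sentences in `text` that express some value of `param`
    (a crude proxy for how stable that parameter's setting is)."""
    hits = sum(1 for s in text if len(values_compatible_with(s, param)) == 1)
    return hits / len(text)


text = ["s1", "s1", "s2", "s3"]
print(expresses("s2", 1, 1))       # True: s2 forces the second parameter to 1
print(expresses("s1", 0, 0))       # False: s1 is parsable under every grammar
print(unambiguous_share(text, 1))  # 0.25: only s2 expresses the second parameter
```

In this toy text most sentences are compatible with either value, which is the situation the text describes as unstable: the data exert little pressure on the learner.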


2.  Genetic Algorithms and Language Learnability

The basic problem faced by a language learner is to discover a target grammar based on a plausible input text.³ A principles-and-parameters approach to grammar (see Chomsky 1981) provides a powerful way of limiting the problem of discovering the appropriate target grammar given the impoverished nature of the input data. Parameters can be viewed as finite vectors along which natural languages may vary; the learner is thus faced with the problem of searching a finite space of possible grammars rather than the more difficult problem of inducing a set of rules that lies at an undetermined point in an infinite hypothesis space. Learning theory must provide an account of how the learner’s search through the set of possible combinations of parameter values takes place, and of how certain values are chosen over others. We believe, with Lightfoot (1979), that such an account should also provide a solution to the logical problem of language change. In this section we will describe in detail our account of how the learner searches through the available parameters and fixes their values. The approach is based on the notion of a genetic algorithm (Holland 1975, Goldberg 1989, Clark 1990, 1992). Genetic algorithms model the basic process of natural selection in the biological world: certain patterns of genetic material are more adapted to their environment (i.e., fitter) than others, and hence tend to reproduce at the expense of the others. Our account of language learning is analogous: the input text is the analogue of the environment, so “fitness” means consistency with the input text; parameter settings correspond to the genetic material of the biological world (and so a whole grammar would be a genome). Successful combinations of parameter settings “reproduce” (i.e., contribute to the formation of new hypotheses about the target grammar) at the expense of others.
In this way, the learning mechanism gradually eliminates “unfit” hypotheses (those that are not consistent with the input text) and arrives at a single fittest grammar. Since nothing in the approach requires this grammar to be identical to the one that underlies the input text, learners may arrive at final-state systems that differ from those of their parents; this, in essence, is our solution to the logical problem of language change.

2.1  The Nature of the Learning Problem

It is possible to see the learner as a relation between input data and a sequence of parameter values (see Clark 1990, 1992). More precisely, we can view the learner as a function from input texts to parameter values, as in (1).

(1) $\varphi(\sigma_i) = x_1, x_2, \ldots, x_n$

Here the learner is the function $\varphi$, which applies to an arbitrarily selected text $\sigma_i$ and yields a sequence of $n$ parameter values, $x_1, x_2, \ldots, x_n$. Given a sequence of parameter values, we can imagine that a special compiling function $\phi_n$ maps it onto a grammar $G_i$ for the input text $\sigma_i$. We can further define a function $\gamma$ which, given a grammar $G_i$, returns a parsing device $P_m$ for $G_i$. Thus, we can view learning as a relation between inputs and parsing devices. This is important, since the notion of fitness with respect to input texts is most naturally defined in terms of the number of failed or successful parses of those texts. We will discuss how this is done below. Putting the above together, the learning situation is as described in (2).

(2) $\gamma[\phi_n(\varphi(\sigma_i))] = P_m$

In considering the learning problem, it is important to recall that the learner is computationally bounded: it has finite resources in terms of time and memory. It cannot take indefinite periods of time before converging to the target grammar, nor does it have a perfect memory for past sequences in the input text or past (unsuccessful) hypotheses. Furthermore, the learner is given little information about the proper analysis to be accorded to the input data: it has only limited information about the proper structural analysis for any given datum, and little or no access to input that is ill formed with respect to the target. The claim that the hypothesis space is finite under a principles-and-parameters approach is not, in itself, sufficient to guarantee that the learner can converge in a reasonable amount of time, since finite problems can be large enough that their solution takes an impractical amount of time to compute. Suppose, for example, that the hypothesis space is determined by 30 binary parameters. In this case there are $2^{30}$, or 1,073,741,824, possible grammars.
If the learner could test each of these grammars at the rate of one per second, it might in the worst case take the learner over 34 years to converge on the target. Clearly, the learner must be capable of searching the hypothesis space in a more efficient manner. Beyond efficiency considerations, the learner cannot use a brute-force search technique to converge on the target, since certain parameters may fall into subset relations. Allowing 0 to stand for the negative value of a parameter and 1 for the positive value, we can indicate as in (3) that the language that results when a certain parameter $p_x$ is set to 0 is a proper subset of the language that results when $p_x$ is set to 1. All the sentences that are grammatical in the subset language will also be grammatical in the superset language. If the target is the subset language and the learner guesses the superset language, then no further evidence will contradict its hypothesis, so the learner will never have grounds to retract this (incorrect) hypothesis. Thus, the learner must guess the minimal language compatible with the input sequence $\sigma_i$. Given that the learner has no reliable access to negative evidence, it appears that the learner must guess the smallest possible language compatible with the input at each step of the learning procedure. This is, in essence, the Subset Condition proposed by Berwick (1985), which is intended to circumvent the sort of trap posed by subset parameters.

(3) $L[p_1, \ldots, p_{x-1}, p_x(0), p_{x+1}, \ldots, p_z] \subset L[p_1, \ldots, p_{x-1}, p_x(1), p_{x+1}, \ldots, p_z]$
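Both points in this subsection can be checked with a few lines of Python. The sketch below is an illustration under stated assumptions, not part of the original model: it first reproduces the brute-force estimate for 30 binary parameters, assuming one grammar tested per second and a 365.25-day year, and then implements a learner that obeys the Subset Condition for a single superset parameter $p_x$. The two languages in `LANGS` are hypothetical finite sets of placeholder strings, nested as in (3); all function names are likewise illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def brute_force_years(n_params, grammars_per_second=1.0):
    """Worst-case time (in years) to test every grammar defined by n binary parameters."""
    return 2 ** n_params / grammars_per_second / SECONDS_PER_YEAR


# Hypothetical languages for one superset parameter p_x, all other
# parameters held fixed: L[p_x(0)] is a proper subset of L[p_x(1)], as in (3).
LANGS = {
    0: frozenset({"a", "ab"}),
    1: frozenset({"a", "ab", "ba"}),
}


def consistent(value, text):
    """A value is consistent with the text if its language contains every datum."""
    return set(text) <= LANGS[value]


def subset_learner(text):
    """Subset Condition: choose the smallest language compatible with the input.

    Because the toy languages are nested, "smallest" can be read off their
    size; with non-nested languages one would need subset-minimality instead.
    """
    candidates = [v for v in LANGS if consistent(v, text)]
    if not candidates:
        return None  # no value of p_x accounts for the input
    return min(candidates, key=lambda v: len(LANGS[v]))


print(round(brute_force_years(30), 1))  # 34.0
print(subset_learner(["a", "ab"]))      # 0: stay in the subset language
print(subset_learner(["a", "ba"]))      # 1: "ba" is positive evidence for p_x(1)
```

Note that `consistent(1, text)` holds for every text drawn from `LANGS[0]`, which is exactly why a learner that guessed $p_x(1)$ would never meet a datum forcing it to retract.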

A further possibility arises if we consider that sets of parameters might interact in such a way as to generate superset languages. That is, the parameters in question may not generate superset languages when considered individually, but they do when they act as a group. This is the shifting relation observed by Clark (1990):⁴

(4) Shifting
Two parameters, $x_i$ and $x_j$, cause a shift at values $x_i(1)$ and $x_j(1)$ just in case:
a. $L[\phi_n(x_1, \ldots, x_i(1), \ldots, x_j(0), \ldots, x_n)] \not\subset L[\phi_n(x_1, \ldots, x_i(0), \ldots, x_j(1), \ldots, x_n)]$
b. $L[\phi_n(x_1, \ldots, x_i(0), \ldots, x_j(1), \ldots, x_n)] \not\subset L[\phi_n(x_1, \ldots, x_i(1), \ldots, x_j(0), \ldots, x_n)]$
c. $L[\phi_n(x_1, \ldots, x_i(1), \ldots, x_j(0), \ldots, x_n)] \subset L[\phi_n(x_1, \ldots, x_i(1), \ldots, x_j(1), \ldots, x_n)]$
d. $L[\phi_n(x_1, \ldots, x_i(0), \ldots, x_j(1), \ldots, x_n)] \subset L[\phi_n(x_1, \ldots, x_i(1), \ldots, x_j(1), \ldots, x_n)]$

In other words, a shift occurs when setting two parameters to particular values together generates a language that properly contains the languages obtained by setting only one of them to that value. Notice, crucially, that if the language generated by setting $x_i$ to 0 is a subset of the language generated by setting $x_i$ to 1, this relationship is preserved in the shifted language. In brief, a learner could obey the Subset Condition on the microscopic level (with respect to a single parameter) while violating it on the macroscopic level (because of shifting interactions between parameters). In order to avoid these higher-level violations of the Subset Condition, the learner would have to calculate interactions between parameter settings. But this would become increasingly difficult as the number of parameters that could “conspire” to generate a shifted language increased; given $n$ parameters, the learner may have to consider $n!$ possible interactions. The graph in (5) illustrates a case of shifting that involves superset parameters. In this example we have two parameters, $p_1$ and $p_2$, that interact to generate a shifted language, $L[p_1(1), p_2(1)]$.
In (5) dominance indicates the subset/superset relation.

(5) [Diagram: $L[p_1(1), p_2(1)]$ at the top; $L[p_1(0), p_2(1)]$ and $L[p_1(1), p_2(0)]$ below it; $L[p_1(0), p_2(0)]$ at the bottom]

In this case both $p_1$ and $p_2$ are superset parameters: any language with $p_1$ set to 0 is a subset of the corresponding language with $p_1$ set to 1, and any language with $p_2$ set to 0 is a subset of the corresponding language with $p_2$ set to 1. Note that $L[p_1(1), p_2(0)]$ and $L[p_1(0), p_2(1)]$ are not in the superset relation with each other. The language $L[p_1(1), p_2(1)]$, however, properly contains the other three possible options. As we will show, the learner will be reluctant to posit $L[p_1(1), p_2(1)]$ and will do so only if faced with a significant amount of empirical prodding in the form of failed parses. A more difficult case is illustrated in (6).

(6)

[Diagram relating the four languages $L[p_1(1), p_2(1)]$, $L[p_1(1), p_2(0)]$, $L[p_1(0), p_2(1)]$ and $L[p_1(0), p_2(0)]$]

In this case only one of the parameters, $p_1$, is a superset parameter. One might imagine that $p_1$ regulates the option of left-dislocating a constituent. The parameter $p_2$ does not generate languages in the superset relation; for example, one might take $p_2$ to be a parameter that regulates V2 phenomena in matrix clauses. Suppose that $p_1$ and $p_2$ interact in such a way that, when both are set to 1, the language allows left-dislocation of a constituent over the V2 structure of the root clause; the resulting language has all of the normal V2 orders plus clauses with an additional constituent left-dislocated before the normal V2 order. Such a language would be a shifted language. Take the case where the target language is V2 without left-dislocation. Suppose that the learner, during an early phase of the learning cycle, erroneously sets $p_1$ to 1, allowing left-dislocation of an NP (or DP) in response to the presence of nonsubject NPs/DPs in clause-initial position. This hypothesis, however, is inadequate to account for all the root V2 orders that the learner encounters, for example those with initial adverbials and possibly also those with initial NPs/DPs without a resumptive pronoun. In response, the learner sets $p_2$ to 1, allowing for the possibility of V2, but does not reset $p_1$ to 0. The learner has now entered a shifted language: because of the interaction between $p_1$ and $p_2$, all the target orders will be consistent with the learner’s hypothesis, which nevertheless overgenerates. We will show that such a hypothesis will be selected against in such a way that the learner can retract its overgeneral hypothesis without access to direct negative evidence. Such a shifted language, although empirically possible, will tend to be diachronically unstable, with one of the two superset possibilities, V2 or left-dislocation, being quickly lost. Notice that a learner will have two analyses available for “V3” structures (structures with two constituents before the tensed verb): either such a structure involves left-dislocation over a standard V2 structure, as in (7a), or it involves simple left-dislocation, as in (7b).

(7) a. [CP DP [CP DP [C′ V [IP . . .]]]]
    b. [CP DP [IP DP V . . .]]

We will argue that (7b) involves a simpler syntactic analysis, with shorter chains, than (7a). Thus, the learner will tend to prefer the hypothesis that allows (7b) over one that requires the analysis in (7a). We will discuss further below how this factor of “elegance” influences parameter setting. In our discussion of the data from French in section 3, we will present some cases of this type.
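The shifting relation in (4) is easy to test mechanically once languages are given extensionally. In the sketch below, which is a hypothetical illustration rather than anything proposed in the text, each toy language is a finite set of placeholder strings indexed by the values of $x_i$ and $x_j$ (all other parameters held fixed), and Python’s `<=` and `<` on `frozenset` stand in for $\subseteq$ and $\subset$. The `shifted` example has the shape described for (5): the two mixed settings are not in a subset relation, and the language with both parameters set to 1 properly contains the other three.

```python
def shifts(lang):
    """Check clauses (a)-(d) of (4) for a pair of parameters x_i, x_j.

    `lang` maps each pair of values (x_i, x_j) to the language (a frozenset)
    generated when the remaining parameters are held fixed.
    """
    l10, l01, l11 = lang[(1, 0)], lang[(0, 1)], lang[(1, 1)]
    return (
        not l10 <= l01      # (a) L[x_i(1), x_j(0)] is not a subset of L[x_i(0), x_j(1)]
        and not l01 <= l10  # (b) nor the reverse
        and l10 < l11       # (c) L[x_i(1), x_j(0)] is a proper subset of L[x_i(1), x_j(1)]
        and l01 < l11       # (d) and so is L[x_i(0), x_j(1)]
    )


# Hypothetical toy languages with the shape of (5).
shifted = {
    (0, 0): frozenset({"a"}),
    (0, 1): frozenset({"a", "b"}),
    (1, 0): frozenset({"a", "c"}),
    (1, 1): frozenset({"a", "b", "c"}),
}

# Same languages, except that setting both parameters to 1 yields nothing
# beyond L[x_i(0), x_j(1)], so clauses (c) and (d) fail and there is no shift.
unshifted = dict(shifted)
unshifted[(1, 1)] = frozenset({"a", "b"})

print(shifts(shifted))    # True
print(shifts(unshifted))  # False
```

The check is local to one pair of parameters; a learner trying to avoid every shifted language would have to run comparisons like this over every group of parameters that might conspire.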
2.2  Genetic Algorithms

Clark (1990, 1992) proposes that genetic algorithms provide a computational model of learning for a principles-and-parameters theory that circumvents the problems discussed in section 2.1 while accounting for the relationship between input evidence and parameter setting. Genetic algorithms mimic natural selection by representing hypotheses about a problem in a way that is similar to the way in which genetic material is represented. Hypotheses are then tested against the problem space, with the most fit hypotheses contributing to the formation of new hypotheses via reproduction (the combination of preexisting hypotheses to form new hypotheses in a way that is similar to the biological recombination of DNA in mating). By “breeding” the most fit hypotheses, testing them against the problem space, and pruning the least fit, a genetic algorithm can efficiently search large spaces and find optimal solutions.⁵ More precisely, a genetic algorithm defines a number of automatic mechanisms for combining hypotheses that are, in some sense to be defined below, “fit.” These mechanisms, which simulate breeding or reproduction, produce new hypotheses that are likely to replicate the advantageous properties of existing hypotheses while eliminating those properties that are ill adapted to the environment (in our case, the sequence of input sentences that the learner encounters). By repeating this process over successive “generations” of hypotheses, the learner is able to approximate the target sequence of parameter settings.⁶ A genetic algorithm consists of the following components:

• A representation of hypotheses in terms of strings, similar in structure to genetic material. In our case we will encode sequences of parameter values as strings of binary digits.
• A set of reproduction operators that combine or alter existing “parent” hypotheses in order to produce new “offspring” hypotheses. Reproduction will be based on the performance of the hypotheses relative to the input stream; those that perform best will reproduce most prolifically. Furthermore, since reproduction is based on existing hypotheses, the search of the hypothesis space is highly constrained and not random (see Holland 1975 and Goldberg 1989 for careful discussions of these points):
  1. A crossover mechanism. This mechanism combines two hypotheses and produces a new hypothesis by combining parts of each parent’s genetic material.
  2. A mutation operator. This mechanism randomly alters an offspring’s genotype to produce a new hypothesis close to, but not identical with, the parents’ genetic endowment.
• A measure of fitness of hypotheses in terms of their performance in an environment. The fitness metric defines how well adapted hypotheses are to their environment. In our case the fitness metric mainly measures success in parsing the input text (although it does contain other factors, as we will show).
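The two reproduction operators in the list above can be sketched as follows. This is a minimal illustration that assumes hypotheses are Python strings of `"0"`/`"1"` characters and that the crossover point and the mutated position are passed in explicitly; a full genetic algorithm would choose them at random, as the hypothetical helper `random_mutation` does. All function names here are illustrative, not taken from Clark’s implementation. With the cut after the third position, the calls reproduce the worked examples (11) to (15) given later in this section.

```python
import random


def crossover(a, b, cut):
    """Single-point crossover: swap the tails of two equal-length hypothesis
    strings after position `cut`, returning both offspring."""
    if len(a) != len(b) or not 0 < cut < len(a):
        raise ValueError("parents must have equal length and 0 < cut < length")
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(h, position):
    """Flip the bit at 1-based `position` (0 -> 1, 1 -> 0), yielding a nearby hypothesis."""
    i = position - 1
    flipped = "1" if h[i] == "0" else "0"
    return h[:i] + flipped + h[i + 1:]


def random_mutation(h, rng=random):
    """Apply `mutate` at a randomly chosen position."""
    return mutate(h, rng.randint(1, len(h)))


print(crossover("000111", "101000", 3))  # ('000000', '101111')
print(mutate("000111", 2))               # 010111
print(random_mutation("000111", random.Random(0)))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when comparing learners on the same input text.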
Most crucial for our purposes are the representation of hypotheses in terms of strings and the notion of a fitness metric. Let us first turn to a more careful consideration of the representation of hypotheses. It is common to think of parameters as variables in Universal Grammar that range over a limited set of values. The bounding nodes for classical Subjacency (Chomsky 1977) provide a good example of such a parameter. Here Subjacency is taken as an invariant property of natural languages, whereas the bounding nodes may be contingently selected from a restricted set:

(8) Subjacency
No rule may involve X and Y in the configuration
$\ldots X \ldots [_{\alpha} \ldots [_{\beta} \ldots Y \ldots ]_{\beta} \ldots ]_{\alpha} \ldots$ (order irrelevant),
where $\alpha$ and $\beta$ are bounding nodes; $\alpha, \beta \in \{\mathrm{NP}, \mathrm{IP}, \mathrm{CP}\}$.

Parameters can equally be viewed as variant properties of natural languages; in other words, a parameter can be thought of as a descriptive statement that may be either true or false of a given grammatical system. From this perspective, we could rewrite the parameter for the bounding nodes as a series of three statements:

(9) a. IP is a bounding node for Subjacency.
    b. CP is a bounding node for Subjacency.
    c. NP is a bounding node for Subjacency.

The learner’s task would be to scan the input data and attempt to assign truth-values, 1 for true and 0 for false, to each of the above propositions. The learner’s hypotheses could then be taken as strings of 0s and 1s corresponding to the truth-value associated with each parameter. For example, the string 100 could correspond to the hypothesis that IP is a bounding node for Subjacency but neither CP nor NP is. Thus, it is relatively natural to represent parameter settings in terms of strings. Notice that this binary representation of sequences of parameter values serves both to encode grammars as binary numbers and to enumerate the set of possible natural languages (see Clark 1992). Given this method of encoding parameter sequences, we must be able to recover the grammars and parsing devices that these encodings represent, because fitness will be measured in terms of the performance of parsers relative to a stream of input data, whereas the actual algorithm will operate on the string representation of the hypotheses. We must, then, have a translation function that relates our hypotheses (strings) to the parsers that they represent, as shown in (10):

(10)

[Diagram: Input stream; Genetic algorithm; Population of hypotheses; Translation function; Population of parsing devices]
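The encoding in (9) and the translation step in (10) can be illustrated together. In this hypothetical sketch, a hypothesis string is read position by position against the propositions in (9), and the resulting set of bounding nodes is wrapped in a function that plays the role of a parsing device for Subjacency alone. As a simplification of (8), a dependency is represented only by the labels of the nodes that contain Y but not X. Every such node lies on the path from Y upward, so any two of them are nested, and counting the bounding nodes among them suffices in this toy version. The names `decode` and `translate` are illustrative assumptions.

```python
# Order of the propositions in (9): IP, CP, NP.
PROPOSITIONS = ("IP", "CP", "NP")


def decode(h):
    """Read hypothesis string `h` as truth values for (9) and return the
    set of categories it treats as bounding nodes."""
    return {node for node, bit in zip(PROPOSITIONS, h) if bit == "1"}


def translate(h):
    """Hypothetical translation function: map a hypothesis string onto a
    toy 'parser' that checks Subjacency (8) for a single dependency.

    The parser receives the labels of the nodes that contain Y but not X
    and rejects the dependency if two or more of them are bounding nodes.
    """
    bounding = decode(h)

    def accepts(crossed):
        return sum(label in bounding for label in crossed) < 2

    return accepts


print(decode("100"))                   # {'IP'}
parser = translate("100")
print(parser(["IP", "CP"]))            # True: only one bounding node crossed
print(parser(["IP", "CP", "IP"]))      # False: two IPs crossed
print(translate("110")(["IP", "CP"]))  # False: IP and CP are both bounding nodes
```

The genetic operators work only on strings like `"100"`; fitness is assessed only through the functions that `translate` returns, mirroring the genotype/phenotype split in (10).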

In fact, we have already defined all the machinery needed to accomplish the above. We conceive of the learner, φ, as operating on strings of parameter settings; thus, φ is the set of reproduction operators in the genetic algorithm. The translation function in (10) then maps the learner’s hypothesis strings onto parsing devices; in other words, the translation function is comparable to the functions ϕn, which maps sequences of parameter settings onto grammars, and γ, which maps grammars onto parsers.

50  Robin Clark and Ian Roberts

In a sense, the hypothesis strings represent genotypes for parsing devices, whereas the translation function (ϕn and γ) maps genotypes onto phenotypes. Overlying all of this is the fitness metric, which guides the learner’s application of the reproduction operators.

The crossover operator combines two hypothesis strings to create new hypotheses. For example, suppose that the two hypotheses in (11) have been selected for reproduction.

(11) a. 000111
     b. 101000

Now suppose we “cut” both strings after the third position in the bit string:

(12) a. 000―111
     b. 101―000

The first part of string (12a) is then recombined with the second part of string (12b), and the first part of string (12b) is recombined with the second part of string (12a):

(13) a. 000―000
     b. 101―111

Two new “offspring” hypotheses are thus created, each of which has inherited genetic material (hypotheses about the settings of particular parameters) from both parents. It should be noted that fitness interacts in a crucial way with the crossover operation. Highly fit hypotheses are more likely to be selected to take part in crossover and are therefore more likely to pass on to new generations of hypotheses the parameter settings that made them fit.

The mutation operator similarly creates new hypotheses on the basis of existing ones. In essence, it slightly alters a hypothesis string in order to create a new, but “nearby,” hypothesis. We can do this by flipping a randomly selected bit position in a hypothesis string according to the following rules:

(14) a. 0→1
     b. 1→0

Thus, selecting the second position of the following hypothesis for mutation would yield a “mutant” that is nearly identical to its parent structure:

(15) 000111→010111

The mutation operation can be viewed as a means of searching the immediate hypothesis space surrounding a parameter string. Thus, the learner can, in a sense, experiment with near-optimal hypotheses that approximate, but do not correspond to, the target.

Computational Model of Language Learnability  51

In terms of an actual parsing framework, there would be a fixed central algorithm, corresponding to UG. Within this algorithm would be various flags, indicating points where code must be inserted for the parser to function. The 0s and 1s in the hypothesis strings could be interpreted as pointers to the parameterized code. Upon receiving a hypothesis string, the machine would look up the various pieces of code indicated by the 0s and 1s and systematically substitute the code it finds for the flags in the main algorithm. The result would be a special parsing device designed to analyze the language enumerated by the hypothesis string. Thus, a “self-constructing” parser would be the ensemble of the core algorithm, the parameterized code, and a learning device that would select the appropriate hypothesis string in response to the input text. We then have a straightforward model of the translation function required by the genetic algorithm to relate hypothesis strings to parsing devices. Recall that this translation function itself corresponds to the functions γ and ϕn in the formalization of the learning problem given above.

2.3 Fitness

Having shown how hypotheses can be represented in terms of strings and how these can be combined systematically to form new hypotheses, we still face the problem of defining the relative fitness of a hypothesis with respect to a linguistic environment. Ultimately, we want the learner to become better able to represent the input data. In other words, the learner should change its hypothesis on the basis of evidence from the external environment, and its new hypothesis must be better able to account for this evidence. In some sense, new hypotheses must be an improvement over the old hypotheses.
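Before turning to fitness, the two reproduction operators recapped above can be made concrete in a few lines of Python. This is a minimal sketch, not the simulation of Clark (1990): hypotheses are assumed to be plain strings of "0" and "1", the function names are hypothetical, and choosing the cut point at random in `random_crossover` is an assumption (example (12) simply fixes the cut after the third position).

```python
import random


def crossover(a: str, b: str, cut: int) -> tuple[str, str]:
    """Single-point crossover: cut both parents after `cut` positions and swap the tails."""
    if len(a) != len(b):
        raise ValueError("hypothesis strings must have the same length")
    if not 0 < cut < len(a):
        raise ValueError("cut point must fall strictly inside the string")
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(h: str, position: int) -> str:
    """Flip the bit at a 1-based position, following rules (14): 0 -> 1, 1 -> 0."""
    if not 1 <= position <= len(h):
        raise ValueError("position out of range")
    i = position - 1
    flipped = "1" if h[i] == "0" else "0"
    return h[:i] + flipped + h[i + 1:]


def random_mutation(h: str, rng: random.Random) -> str:
    """Mutate one randomly selected position, as the text describes."""
    return mutate(h, rng.randint(1, len(h)))


def random_crossover(a: str, b: str, rng: random.Random) -> tuple[str, str]:
    """Crossover at a random interior cut point (an assumption, see lead-in)."""
    return crossover(a, b, rng.randint(1, len(a) - 1))


print(crossover("000111", "101000", 3))  # the offspring in (13)
print(mutate("000111", 2))               # the mutant in (15)
```

Each offspring of `crossover` takes every bit from one parent or the other, so crossover only recombines existing parameter values; mutation is the only operator here that can introduce a value absent from both parents.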
Clark (1990) provides a crude definition of improvement in terms of failed parses of input sentences. We will modify his treatment by supposing that the crucial property of a failed parse is that it violates at least one principle of core grammar.7 In particular, we will suppose that a parser consists of a number of modules (Case, binding, X-bar theory, and so on) that operate in tandem to produce a full syntactic representation. When a principle in one of these modules is violated, that is, when the current grammar cannot assign a well-formed representation to some input, the offended component will signal a violation. With this in mind, we adopt the following notion of improvement of one hypothesis with respect to another:

(16) A hypothesis A is an improvement over a hypothesis B if, given an input datum si, A signals m violations of core grammatical principles while B signals n violations and m < n.
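Definition (16) reduces to comparing two totals of module-level violations. The sketch below assumes, purely for illustration, that a parser can be modeled as a function returning how many violations each module signals for a datum; the module names, the dictionary representation, and the toy parsers are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical model: a parser maps an input datum to the number of
# violations each grammatical module signals, e.g. {"Case": 1, "ECP": 0}.
Parser = Callable[[str], Dict[str, int]]


def violations(parser: Parser, datum: str) -> int:
    """Total number of violations signaled by all modules for one datum."""
    return sum(parser(datum).values())


def is_improvement(a: Parser, b: Parser, datum: str) -> bool:
    """Definition (16): A improves on B for this datum if m < n."""
    return violations(a, datum) < violations(b, datum)


def parser_a(datum: str) -> Dict[str, int]:
    # Toy stand-in for one hypothesis: only the Case module complains.
    return {"Case": 1, "Binding": 0, "ECP": 0}


def parser_b(datum: str) -> Dict[str, int]:
    # Toy stand-in for another hypothesis: three modules complain.
    return {"Case": 1, "Binding": 1, "ECP": 1}


print(is_improvement(parser_a, parser_b, "John loves Mary"))
print(is_improvement(parser_b, parser_a, "John loves Mary"))
```

Because the comparison is strict, two parsers with equal totals are not ordered by (16) at all; the superset and elegance terms of the fitness metric developed in this section are what distinguish such cases.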

Intuitively, a parser that signals 3 violations on a parse is rather better than one that signals 4 violations, and a parser that signals 2 violations is superior to one that signals 3. Crucially, parsers need not perform perfectly in order for their performance to be distinguished. We will suppose, then, that the various modules of the parser are connected to a summation function, Σ, as shown in (17).

(17)

[Figure: the modules X-bar, θ, Case, Binding, Bounding, and ECP, each connected to the summation function Σ]

Each module can signal a violation to the function Σ, which then sums up the number of violations and passes the number on to the learner. Notice that the learner has no access to which grammatical principles have been violated; it only receives a number representing the sum of the violations for each parse.

As noted above, the learner must be able to distinguish between hypotheses that generate a superset language and those that do not. If a superset hypothesis and a subset hypothesis can both account for an input datum, then, all things being equal, the learner should prefer the latter to the former. Thus, any fitness metric should generally rate a subset hypothesis more highly than a superset hypothesis, provided that the subset hypothesis is empirically adequate (does not fail to parse the input data).

Finally, we will assume that the learner can take into account the overall “elegance” of its hypotheses. That is, the learner will, all else being equal, prefer hypotheses that lead to more compact representations. Compactness, here, can be defined in terms of such factors as the number of nodes required to cover the input string, the length of the chains associated with arguments and operators, or both. For the moment we will assume that the measure of elegance is a raw node count from each parse.

With these factors in mind, we suggest the following as a fitness metric, defined over a population of parsing devices relative to an input sentence (see Clark 1990 for an earlier version of this metric). It should be noted that hypotheses are judged indirectly by means of the parsing devices that they determine, just as a genotype is judged through its expression as a phenotype. In particular, the learner has no information about why certain hypotheses perform better than others, only that certain hypotheses do, in fact, perform better. In assessing the performance of hypotheses, the fitness metric will consider a number of different factors. Above all, it will consider raw success or failure to parse; other factors, like subset relations and elegance of representation, are also taken into account, although their contribution is weighted so that they influence the learner slightly less than actual success or failure to parse.

Let the number of parsing devices be n. We then need a way to count up the number of violations incurred by a given parser Pi and, since we are defining relative fitness, to relate this to the number of violations signaled by all the parsing devices together. We indicate the total number of violations of all parsing devices by $\sum_{j=1}^{n} \upsilon_j$; this operation simply sums the number of violations in the entire population of parsing devices. We indicate the number of violations incurred by Pi as $\upsilon_i$. To relate $\upsilon_i$ to the number of violations incurred by other parsers, we follow a standard statistical technique and subtract the number of violations incurred by Pi from the total and divide that figure by the total:

(18)
$$\frac{\sum_{j=1}^{n} \upsilon_j - \upsilon_i}{\sum_{j=1}^{n} \upsilon_j}$$

Thus, if the total number of violations is 1,000 and Pi produces 10 violations, the metric will give (1000 − 10)/1000 = 0.99. Where Pj produces 100 violations, the metric gives (1000 − 100)/1000 = 0.9. Pi is thus more highly valued than Pj.

For complete precision, we must prevent the parser in question from being compared with itself, so we exclude it from the population as follows:

(19)
$$\frac{\sum_{j=1}^{n} \upsilon_j - \upsilon_i}{(n-1)\sum_{j=1}^{n} \upsilon_j}$$
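A short calculation, added here rather than taken from the original argument, shows what the factor n − 1 in (19) accomplishes. Assuming n ≥ 2 and a nonzero total $V = \sum_{j=1}^{n} \upsilon_j$, the ratings of all n parsers sum to exactly 1:

$$\sum_{i=1}^{n} \frac{V - \upsilon_i}{(n-1)V} = \frac{nV - \sum_{i=1}^{n} \upsilon_i}{(n-1)V} = \frac{nV - V}{(n-1)V} = 1$$

The same argument goes through unchanged for the weighted metrics (20) and (21), with $\upsilon_j$ replaced by the weighted cost of each parser. The ratings therefore behave like shares of the population, and the worked values given in (23), (24), and (25) each sum to 1 up to rounding.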

As noted earlier, we also want to evaluate whether a given hypothesis gives rise to a superset grammar. We can do this by proceeding in the same way as above: if we let $s_m$ represent the number of superset settings in the hypothesis $h_m$, then $\sum_{j=1}^{n} s_j$ is the number of parameters set to superset values in the population.8 We now introduce a “superset penalty,” the constant b < 1, and multiply the count of superset settings by b. In this way, Subset Condition violations are scaled so that they will have less weight in the overall metric than a simple failure to parse a sentence. The product of b and the superset count for a single parsing device is evaluated relative to the population of parsers in the same way as above. Combining the superset factor with the parsing factor, then, produces the metric in (20).





(20)
$$\frac{\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j - (\upsilon_i + b s_i)}{(n-1)\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j\right)}$$


Finally, we need to weigh in the relative elegance of parses as a factor. Again we proceed in the same fashion: $\sum_{j=1}^{n} e_j$ is the measure of the general elegance of the analyses in the entire population of parsers (which we continue to take to be a simple tally of the number of nodes), and $e_i$ is the measure for parser Pi. Analogous to the superset penalty, we introduce the constant c, which is a scaling factor for the elegance of the representation. Here again, this means that elegance is a less important factor than failure to parse. If we include the elegance factor in the equation, we arrive at the fitness metric:

(21) The fitness metric




$$\frac{\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j - (\upsilon_i + b s_i + c e_i)}{(n-1)\left(\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j\right)}$$



We will leave the question of the exact values of the constants b and c open, assuming only that 1 > b, c > 0 (preliminary calculations suggest that both of these constants are in fact very small, in the region of 0.00002; see Clark 1990). It is worth emphasizing that the fitness metric takes these factors into consideration, but that they are weighted so that they always count less than straightforward failure to parse. Notice, though, that they become crucial in distinguishing between successful parses. This will play an important role in our discussion of language change in section 3. Finally, the fitness metric in (21) blurs the reasons for success or failure of a hypothesis relative to a population; the learner has no way of knowing why a given hypothesis succeeds or fails.

It is perhaps useful to consider the contribution of each of the above factors, using some hypothetical examples. Let us turn first to the way in which the fitness metric treats grammatical violations. For the population, this is the term $\sum_{j=1}^{n} \upsilon_j$ in the fitness metric; for the individual parsing device, it is the term $\upsilon_i$. Suppose we have the three parsing devices p1, p2, and p3. Running these on an input sentence yields the following results:

(22) a. p1 returns 1 violation, covering the input with 15 nodes.
     b. p2 returns 2 violations, covering the input with 15 nodes.
     c. p3 returns 3 violations, covering the input with 15 nodes.

Running the above results through the fitness metric gives the following ratings, with b = 0.02 and c = 0.05 (we ignore, here, the contribution of the subset factor by assuming that none of the hypotheses underlying the parsers contain superset settings):

(23) a. p1 receives a fitness rating of 0.393939.
     b. p2 receives a fitness rating of 0.333333.
     c. p3 receives a fitness rating of 0.272727.

Thus, parser p1 is judged the most fit, p2 the next most fit, and p3 the least fit. Notice that the learner does not receive information about which grammatical principles are violated. It has no need of such information in order to distinguish between the hypotheses at hand. Instead, it need only observe the performance of its hypotheses in an external manner, without information about their inner workings. The learner will base its new hypotheses on those old ones that are relatively more fit, thus passing on to future generations the parameter settings that made those hypotheses fit. Those parameter settings that avoid grammatical violations relative to the input text will be preserved, and those that tend to generate violations will gradually disappear.

Let us turn now to the contribution of the superset penalty, the term $\sum_{j=1}^{n} s_j$ for the entire population and the term $s_i$ for a single parsing device. Suppose that p1 and p2 both signal no violations of any grammatical principles and both cover the input in 20 nodes. Suppose further that p2 contains a superset setting for one parameter and p1 contains no superset settings. The fitness metric will then return the following results:

(24) a. p1 receives a fitness rating of 0.50495.
     b. p2 receives a fitness rating of 0.49505.

Notice that the “smallest hypothesis,” in this case the one underlying p1, is judged more fit than the one that violates the Subset Condition.
Thus, the fitness metric can distinguish both between hypotheses that are unequal in their parsing powers and between hypotheses that are equal in parsing power but differ with respect to the Subset Condition.

We turn, finally, to the contribution of the “elegance” factor; this is the term $\sum_{j=1}^{n} e_j$ for the entire population and $e_i$ for individual parsing devices. Consider two hypotheses, p1 and p2, which both return no violations and contain no superset settings but cover the input with trees of different elegance. Suppose that p1 is able to cover the input with 17 nodes whereas p2 covers the input with 18 nodes. The results of the fitness metric are then as follows:

(25) a. p1 receives a fitness rating of 0.514286.
     b. p2 receives a fitness rating of 0.485714.
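The ratings in (23), (24), and (25) follow directly from metric (21). The sketch below is a plain transcription under stated assumptions: each parser's observable result is reduced to three numbers (violations, superset settings, and a node-count elegance score), and the `ParseResult` and `fitness` names are hypothetical. It reproduces the three worked examples with b = 0.02 and c = 0.05.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseResult:
    """What the learner observes for one parsing device on one input sentence."""
    violations: int  # upsilon_i: total violations passed on by the summation function
    superset: int    # s_i: number of superset settings in the underlying hypothesis
    elegance: int    # e_i: elegance measure, here assumed to be a raw node count


def fitness(results: list[ParseResult], b: float, c: float) -> list[float]:
    """Relative fitness of each parsing device, transcribing metric (21)."""
    n = len(results)
    if n < 2:
        raise ValueError("relative fitness needs a population of at least two parsers")
    if not (0 < b < 1 and 0 < c < 1):
        raise ValueError("the text assumes 1 > b, c > 0")
    # Weighted cost of each parser: upsilon_i + b*s_i + c*e_i.
    costs = [r.violations + b * r.superset + c * r.elegance for r in results]
    total = sum(costs)  # the population term of (21)
    if total == 0:
        raise ValueError("metric (21) is undefined when the population cost is zero")
    return [(total - cost) / ((n - 1) * total) for cost in costs]


# The hypothetical populations of (22), (24), and (25).
ex23 = fitness([ParseResult(1, 0, 15), ParseResult(2, 0, 15), ParseResult(3, 0, 15)], 0.02, 0.05)
ex24 = fitness([ParseResult(0, 0, 20), ParseResult(0, 1, 20)], 0.02, 0.05)
ex25 = fitness([ParseResult(0, 0, 17), ParseResult(0, 0, 18)], 0.02, 0.05)
for ratings in (ex23, ex24, ex25):
    print([round(x, 6) for x in ratings])  # should match (23)-(25) to the precision shown
```

Note that only the weighted sum of the three factors reaches the metric, which mirrors the point made above: the learner sees how well a hypothesis performs, never why.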

The first hypothesis in (25) is preferred by the fitness metric since it is able to span the input in a more elegant way than the second. In order to see the importance of this factor, consider the case where the target is SVO. Suppose that hypothesis h1 treats the subject as being in the Spec of IP at S-Structure whereas hypothesis h2 treats the subject as having moved to the Spec of CP, attracting the main verb with it. For a simple clause, h1 and h2 will return the following structures:

(26) a. h1: [IP DP [Iʹ I VP]]
     b. h2: [CP DPi [Cʹ Vj [IP ti [Iʹ tj VP]]]]

By assumption, both h1 and h2 can account for the input stream. Notice, however, that h2 involves systematically longer chains than h1, since the former always involves movement of the subject to the Spec of CP, with subsequent attraction of the verb to C0, whereas the latter does not. The representations returned by h1 are simpler than those returned by h2. Since the learner, via the fitness metric, can take into account the general elegance of representations, it can successfully distinguish between h1 and h2. Notice, however, that elegance is defined quite simply as a count of the nodes in the tree covering an input item plus the lengths of the chains in the representation.

The fitness metric can be considered to work as follows. The population of parsing devices specified by the learner’s hypothesis strings is run against each input item. The term $\sum_{j=1}^{n} \upsilon_j + b\sum_{j=1}^{n} s_j + c\sum_{j=1}^{n} e_j$ yields the total number of violations, the total number of superset settings, and the total elegance of representations of the entire population, with the various factors weighted appropriately by the constants b and c. Dividing this term by n, the size of the population, would give the average number of undesirable properties for the entire population. Next consider the term $\upsilon_i + b s_i + c e_i$. This yields the number of undesirable properties each individual parsing device carries.
As this term grows in relation to the population average, the relative fitness of the parsing device decreases. If this term decreases with respect to the population average, then the parsing device is judged relatively more fit.9 The opportunity to reproduce (that is, to be selected for crossover and mutation) is a direct function of relative fitness. The simulation developed in Clark 1990 assumes that the fitness associated with a hypothesis corresponds transparently to its proportion of the general population. In an environment with random mating, then, those hypotheses with a high proportion in the population are more likely to meet and reproduce. The fitness ratings are used to simulate a weighted roulette wheel, the results of which undergo the crossover and mutation operations. In other words, successful hypotheses will receive a high fitness rating, and the fitness rating corresponds to the probability that the hypothesis will get to reproduce. Thus, the fittest hypotheses will reproduce more frequently and pass on their parameter settings to new hypotheses. Cumulatively, then, the population will tend toward the optimal set of parameter settings for the target.

Crucially, the most fit hypotheses are the most likely to contribute to the formation of new hypotheses. These hypotheses have the greatest opportunity to pass on to new hypotheses the parameter settings that made them fit. Because weak hypotheses are pruned at random intervals, these are ultimately prevented from contributing their inferior parameter settings to the general pool. Thus, fit parameter settings tend to take over while unfit parameter settings are purged. By iterating the process of parsing, judging fitness, reproduction, and “death,” the learner is able to approach the target grammar incrementally.

2.4 P-Encodings

Before we turn to the diachronic data, two other definitions are required. Consider a simple example like (27).

(27) John loves Mary.

Notice that certain parameters must be set in a particular way if the sentence is to be parsed. Both John and Mary must receive θ-roles and Case, the verb love must be capable of picking up its inflectional affix, and so on. Any parsing device that can successfully account for these features of the sentence in (27) will return a well-formed representation. Other parameters (e.g., bounding nodes and those that regulate conditions on anaphora) are irrelevant to the representation of this sentence. It will not matter what values for these parameters the parsing device presupposes. This suggests that any given input sentence expresses certain parameters and that a set of distinct parsing devices can account for (27):

(28) Parameter expression
     A sentence σ expresses a parameter pi just in case a grammar must have pi set to some definite value in order to assign a well-formed representation to σ.
When a given datum expresses some parameter value, the learner will be under pressure to set that parameter to the value expressed by the datum. This is because the fitness metric will prefer hypotheses with the correct setting to those without it. This provides a simple definition of the intuitive notion of a triggering datum:

(29) Trigger
     A sentence σ is a trigger for a parameter pj if σ expresses pj.

Given the above interpretation of the input data, we can imagine a method of encoding the data in string form. Suppose we have a function ψ that maps a sentence onto the set of sequences of parameter settings that are compatible with that sentence. For example, suppose that a given input sentence, sm, can be accounted for by any grammar with the second and third parameters set to 0 and the fifth parameter set to 1. Applying ψ to sm would give the following set of parameter strings:

(30) ψ(sm) = {00001, 10001, 00011, 10011}

Using “*” as a variable ranging over 0 and 1, we could replace the above set of strings with a cover term:

(31) {00001, 10001, 00011, 10011} = [* 0 0 * 1]

We will refer to the sequence [* 0 0 * 1] as the p-encoding for sm; the p-encoding of a sentence may be thought of as a “pure” representation of the parameters expressed by the sentence.10 Notice that, in principle, one could replace the sentences in an input text with their p-encodings and so study the frequency of expression for various parameters and the overall structure of the text relative to parameter expression.

There is an important relationship between parameter expression and the fitness metric. Ultimately, the fitness associated with a hypothesis governs its probability of being selected for reproduction. The more fit a hypothesis is, the more likely it is to pass on those parameter settings that made it fit. Now consider parameter expression. When a parameter is expressed, those hypotheses that have the correct value for that parameter will be judged more fit than those that lack the proper value. If a parameter is expressed robustly by several different construction types (and, hence, has a higher probability of occurring in the input text), then those hypotheses bearing the correct value will have more opportunity to be selected for reproduction, and the appropriate parameter setting will tend to dominate in the population.
Furthermore, those hypotheses bearing the incorrect value will have a lower fitness rating and will tend to reproduce less, so that the parameter values that made them unfit are washed out of the population. Thus, parameter settings that are expressed robustly will tend to be set quickly and efficiently by the learner. Parameters that are not expressed robustly, however, will tend not to affect the fitness of a hypothesis in the same way. The learner will have correspondingly less stake in setting the parameter correctly and will not converge so readily to the parameter value.

Now consider the case where parameters are ambiguously expressed. In our terms, there might be, for example, several contradictory p-encodings associated with a class of data. Here the learner has several possible solutions available that can account for the input without generating grammatical violations. In this case frequency of parameter expression will not aid the learner in distinguishing between its hypotheses. Instead, the learner will have to rely on the structure of the hypotheses themselves, and not their empirical coverage, in order to select a winning hypothesis. These internal factors are the overall elegance of representations and the number of superset settings in each hypothesis, both of which are factors in the fitness metric. We argue here that it is this sort of case that provides the fuel for core diachronic change in a parameter setting. In the next section we will turn to a case where learners were faced with just such an ambiguity.
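The passage from ψ(sm) in (30) to the cover term in (31) is mechanical, and so is the reading of (28) it supports: a parameter is expressed exactly where every compatible string agrees. The sketch below assumes ψ(s) is already available as a set of equal-length bit strings (the text does not say how ψ is computed), and all function names are hypothetical.

```python
from itertools import product


def p_encoding(compatible: set[str]) -> str:
    """Cover term for psi(s): keep a bit where all strings agree, '*' elsewhere."""
    if not compatible:
        raise ValueError("psi(s) must contain at least one parameter string")
    strings = sorted(compatible)
    if any(len(s) != len(strings[0]) for s in strings):
        raise ValueError("parameter strings must all have the same length")
    return "".join(
        column[0] if len(set(column)) == 1 else "*"
        for column in zip(*strings)
    )


def expand(encoding: str) -> set[str]:
    """All bit strings covered by a p-encoding ('*' ranges over 0 and 1)."""
    choices = ["01" if ch == "*" else ch for ch in encoding]
    return {"".join(bits) for bits in product(*choices)}


def expressed(encoding: str) -> list[int]:
    """1-based positions of the parameters a sentence expresses, per (28)."""
    return [i for i, ch in enumerate(encoding, start=1) if ch != "*"]


def compatible(hypothesis: str, encoding: str) -> bool:
    """True if a hypothesis string agrees with every fixed position of the encoding."""
    return len(hypothesis) == len(encoding) and all(
        e in ("*", h) for h, e in zip(hypothesis, encoding)
    )


psi_sm = {"00001", "10001", "00011", "10011"}  # (30)
enc = p_encoding(psi_sm)
print(enc, expand(enc) == psi_sm, expressed(enc))
```

The cover term is an exact stand-in for ψ(s) only when `expand(p_encoding(S)) == S`, as in (30)–(31). For a set such as {00, 11} the collapse yields `**`, which covers four strings, so a sentence compatible with several distinct combinations of values is better described by a list of p-encodings, one per analysis.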

3.  A Case Study in Diachronic Change

We believe that applying genetic algorithms in the form outlined above to the acquisition of natural languages is not only possible but desirable. It is desirable in part because it avoids the problems discussed in the previous section: it allows convergence over a finite but large hypothesis space, and it can be defined such that superset traps can be avoided (which the version of the fitness metric given in (21) in fact does). Our main contention here, however, is that the genetic algorithm approach provides a solution to the logical problem of language change. We will now turn to an application of the genetic algorithm approach to learning and show how it can model diachronic change as well.

3.1  The History of French

Roberts (1993) analyzes three major syntactic changes in the history of French as reflexes of a single underlying parametric change. The three changes are as follows (here and elsewhere, unless otherwise noted, OF and MidF data are from Roberts 1993):

(32) Loss of “simple inversion” in interrogatives

a. *A Jean pris le livre?  (ModF)
    has Jean taken the book
b. Comment fu ceste lettre faitte?  (OF)
    how was this letter made

(33) Loss of null subjects

a. *Ainsi s’amusaient bien cette nuit.  (ModF)
    thus (they) had fun that night
b. Si firent pro grant joie la nuit.  (OF)
    thus (they) made great joy the night

(34) Loss of V2

a. *Puis entendirent-ils un coup de tonnerre.  (ModF)
    then heard they a clap of thunder
b. Lors oïrent ils venir un escoiz de tonoire.  (OF)
    then heard they come a clap of thunder

As Roberts shows in some detail, each of these constructions was lost in the early 16th century. Roberts argues that these changes reflect an underlying change in the value of the parameter, proposed by Koopman and Sportiche (1991), that determines nominative Case assignment: nominative Case may be assigned (by I) under government, under agreement, or under both. The central idea of this account of the history of French is that OF allowed nominative Case assignment under government, whereas ModF does not. More precisely, all of the OF constructions depend on the possibility of the inflected verb, V + I, assigning nominative Case from C to the subject in Spec of IP, as shown in (35).

(35)

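[Tree diagram: CP dominating C′, whose head C0 contains the inflected verb V + I0, from which nominative Case is assigned to the subject in Spec of IP]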



This situation was allowed in the grammar of OF (and is still allowed in, for example, the contemporary Germanic languages). In a grammar where this configuration of Case assignment is not allowed, no lexical NP can survive in subject position in inversion contexts; this is the situation in ModF, where (32a) thus violates the Case Filter. Following Kayne (1983) and Rizzi and Roberts (1989 [this volume, Chapter 9]), we assume that clitics can survive in subject position in this context since they are able to pass the Case Filter in other ways (see also Baker 1988, Everett 1986). Adopting Rizzi’s (1986a) proposal that the necessary condition on the formal licensing of pro is that it occupy a Case-marked position, Roberts accounts for the change illustrated in (33) by extending the nominative Case parameter to the pro module. It is well known that OF null subjects were licensed only in contexts of inversion (see Thurneysen 1892, Price 1971, Einhorn 1974, Foulet 1982, Vanelli, Renzi, and Benincà 1986, Adams 1987a,b), and so a natural interpretation of this is that null subjects could only be licensed where nominative Case was assigned under government, that is, in the configuration (35). This in turn accounts for why null subjects were lost when nominative Case assignment under government was lost.11 Regarding (34), V2 also depends on the capacity of I to assign nominative Case to the subject under government after being raised to C with the verb. Note that nominative Case assignment under government is a necessary, not a sufficient, condition for V2. Hence, a system without this possibility cannot have V2. However, a system with this possibility need not have V2 (Modern English is probably such a system). In fact, as we will illustrate, obligatory V2 was already eroding in MidF, and this was a crucial factor in the instability that led to the change in the nominative Case assignment parameter.

The principal trigger for the change in the possibilities of nominative Case assignment was the introduction of new word orders that did not strictly conform to V2, notably XSVO (where “X” could be a topic or an adverb). This innovation was probably caused by the development of a series of subject clitics in MidF (see below). The cumulative effect of the new word orders was to destabilize the system in such a way that setting the parameter for nominative Case assignment under government positively became impossible by about 1500, and learners converged on a grammar lacking this property. The result was the elimination of the structures in (32)–(34) from 16th-century texts, a major change in the grammar of French. Note that we do not consider the null subject parameter or the V2 parameter to be in any sense subsumed by the nominative Case parameter; however, the particular circumstances of French at the time the change took place were such that the loss of nominative Case assignment under government entailed the loss of null subjects and the elimination of V2. Our proposal is that the initial weakening of V2, combined with the development of a series of subject clitics, created a system that ultimately eliminated V2, and in doing so eliminated null subjects and simple inversion. In particular, the weakening of V2 had the effect that hypotheses that allowed an input datum to be analyzed as a V2 structure became more costly relative to the fitness metric; thus, the learner was under pressure from fitness to eliminate the V2 hypothesis.
Although we concentrate exclusively on French here, there is also evidence (see in particular Vanelli, Renzi, and Benincà 1986) that many Northern Italian dialects have undergone the same parametric change, since in their recorded history simple inversion, V2, and, arguably, null subjects have been lost (although the contemporary dialects in fact have a kind of “disguised” null subject system that probably represents an independent diachronic innovation; see Poletto 1990, Renzi and Vanelli 1983, Rizzi 1986b). Moreover, Renzi (1983) argues that Modern Standard Italian has undergone the same changes as French regarding inversion while retaining null subjects.

In all, five parameters are relevant to our account of the historical development of French. These are given in (36).

(36) a. Nominative Case is assigned (by I) under agreement. {1,0}
     b. Nominative Case is assigned (by I) under government. {1,0}
     c. Clitic nominative pronouns. {1,0}
     d. Null subjects licensed canonically (Case-dependently). {1,0}
     e. Obligatory V-movement to C in matrix declaratives (V2). {1,0}

Note that we split Koopman and Sportiche’s parameter for nominative Case assignment into two separate parameters in order to preserve a basically binary vocabulary for parameters (see the discussion of Subjacency and bounding nodes in section 2). We take it that (36a) has been constant at 1 throughout the entire period (but see section 4). As just mentioned, (36b,d,e) changed together in the 16th century. The shift in (36d) and (36e) was forced by the change in the value of (36b). This is presumably quite a standard situation with parametric change: changes in parameter values interact. Moreover, parameter values can be affected by nonsyntactic factors, notably phonological changes. This is the case with (36c): properties connected to the stress system may cause a class of pronouns to cliticize and thereby trigger a shift in the value of this parameter.

We now review the relevant data from the different periods of French and show how the data trigger parameter settings. To illustrate the general technique, we will first consider Modern French. Then we will consider Old French and finally the period of greatest “structural instability” (and, hence, of greatest interest), Middle French.

3.2  Learning Modern French

Before we consider the earlier periods of French, let us first look at the situation in the contemporary language. What are the parameter values for ModF? It is clear that nominative Case is assigned by I to its Spec position; hence, the first position in the string must be set to 1. On the other hand, Rizzi and Roberts (1989 [this volume, Chapter 9]) argue that ModF does not allow nominative Case assignment under government; this is what leads to the restriction to clitics in contexts where the inflected verb, a complex head that contains I, moves to C (e.g., in interrogatives or conditionals; cf. also (32)):

(37) a. Ont-ils/*les enfants vu ce film?
        have they/the children seen this film
     b. Aurait-elle/*Marie fait cela . . .
        had she/Marie done this

Once moved to C, I must Case-mark the subject position under government; the ungrammaticality of (37a–b) with a nonclitic subject shows this to be impossible. In terms of this analysis, I does not assign nominative Case under government in ModF, and we therefore set the second position in the string to 0.12 It is well known that ModF has a class of clitic nominative pronouns (see Kayne 1975, Rizzi 1986b); the contrasts in (37) in fact illustrate that these elements interact with Case theory in a manner distinct from nonclitics. Rizzi and Roberts (1989 [this volume, Chapter 9]) propose that clitics can satisfy Case theory by incorporating with the verb in C (see Baker 1988, Everett 1986, 1989). Thus, we take it that in ModF parameter (36c) is set to 1. Both parameters (36d) and (36e) are set to 0: ModF is neither a null subject nor a V2 language, as comparisons with contemporary Italian and German show, respectively.13

These remarks on the grammar of ModF (which we of course cannot fully substantiate here; see the references cited for further arguments) lead to the following conclusion regarding the representation of the parameters in (36) as a string of binary units:

(38) The “target string” for ModF is 10100.
     Nominative Case is assigned under agreement and subject clitics are allowed.

Let us now consider how the parameter values for ModF are expressed in the input text. Recall that a sentence S expresses a parameter Pi iff a grammar must have Pi set to a particular value in order to assign a well-formed representation to S. In such a situation, S is a trigger for Pi. The following examples illustrate a significant part of the trigger for the parameter values of ModF:

(39) a. Jean aime Marie.
        Jean loves Marie
     b. Hier Jean est parti.
        yesterday Jean left
     c. Où est-il allé?
        where did he go

Recall that the conditions of acquisition are such that starred examples like the (a) cases of (32)–(34), which can be used by the linguist to justify a particular analysis, are not available to the learner. Moreover, many sentences are amenable to differing structural analyses that can affect their status as triggers. This last point is crucial to understanding how change takes place, as we will show. Consider first (39a), a simple declarative sentence with canonical SVO order. In terms of the usual analysis of ModF, the relevant parts of this sentence are as follows:

(40) [IP Jean [I′ aime . . .]]

Parsed in this way, (39a) triggers nominative Case assignment under agreement and indicates that V-movement to C is not required in matrix declaratives; in other words, ModF is not V2. Thus, (40) is associated with the following p-encoding:

(41) [ 1 * * * 0 ]  Nominative Case is assigned under agreement, and V-movement to C is not allowed in matrix declaratives.

(41) indicates that (39a) tells the learner that nominative Case is assigned under agreement, and that French is not V2; but it does not say anything about whether nominative Case is assigned under government, whether subject clitics are allowed, or whether null subjects are allowed. However, strings exactly equivalent to (39a) are grammatical in the Germanic V2 languages. In these languages the relevant parts of the structure are as follows:

(42) [CP Jean [C′ aime [IP t [I′ t . . .]]]]

Call this the “V2 parse” of an SVO sentence. Here I assigns nominative Case to the Spec of I′ (i.e., the position occupied by the trace of the subject) under government; we will refine this analysis in section 4. Hence, the p-encoding for this parse is as follows:

(43) [ * 1 * * 1 ]  The parser must have nominative Case assignment under government, and V-movement to C is obligatory in matrix declaratives.

As (43) shows, (39a) remains silent regarding subject clitics and null subjects. To sum up, SVO declaratives in ModF have the following p-encodings:

(44) SVO declaratives p-encode
     a. [ 1 * * * 0 ]  Nominative Case is assigned under agreement, and V remains in I in matrix declaratives.
     b. [ * 1 * * 1 ]  Nominative Case is assigned under government, and V moves to C in matrix declaratives.

SVO sentences are thus associated with different p-encodings depending on the parse they are given. We can characterize this situation in terms of the following notion of p-ambiguity:

(45) A sentence S is p-ambiguous with respect to some parameter Pᵢ just in case S has the set of well-formed representations (R₁ . . . Rₙ) and Pᵢ must be set to some definite value v₁ in order to assign Rᵢ to S (i.e., Rᵢ triggers Pᵢ(v₁)), whereas Pᵢ does not need to be set to v₁ in order to assign Rⱼ ≠ Rᵢ to S.

ModF SVO sentences are p-ambiguous, as (44) shows. As will be discussed in section 3.3, however, the representation where V is in C is disfavored since it involves a more complex structure than the representation where V is in I. Now consider (39b).
In V2 languages generally, orders of this type are impossible (see Schwartz and Vikner 1996). This can be interpreted in terms of a ban on adjunction to CP. Supposing that this is so, this example must be parsed with the adverb attached to IP, V in I, and the subject in Spec of I′. In other words, the relevant parts of the structure are like the parse of (39a) given in (40), and the triggering properties of the sentence are the same. More generally, we can conclude the following:

(46) XSV p-encodes [ 1 * * * 0 ]  Nominative Case is assigned under agreement, and movement of V to C is not allowed in matrix declaratives.

Now consider the interrogative in (39c). (39c) provides evidence for the subject clitic (this evidence is probably morphological, given the existence of a separate paradigm of clitic pronouns) and therefore, given that clitic pronouns do not obey the Case Filter in the same way as nonclitic NPs, provides no evidence for either Case assignment parameter. We take it that interrogative sentences by their nature provide no evidence regarding V2 in declaratives, and the null subject parameter is not determined either. We therefore arrive at the p-encoding in (47) (where s indicates a subject clitic in the schematic word order).

(47) whVsO p-encodes [ * * 1 * * ]  Subject clitics are possible.

If the subject clitic is not recognized as such, but treated as a full NP, this sentence would p-encode (48).

(48) [ * 1 * * * ]  Nominative Case is assigned under government.

We assume, however, that phonological and morphological evidence disfavors this possibility. When we put the p-encodings in (43)–(47) together (and disregard the one in (48)), the following picture emerges:

(49) a. SVO, XSV: [ 1 * * * 0 ]  Nominative Case is assigned under agreement; no V2 is possible in declaratives.
     b. SVO:      [ * 1 * * 1 ]  Nominative Case is assigned under government; V2 is possible in declaratives.
     c. whVsO:    [ * * 1 * * ]  Subject clitics are possible.

The two parameters that are not positively set are nominative Case assignment under government and null subjects. These are both set to 0 in the optimal case. Let us consider why.
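The way the p-encodings in (49) jointly narrow down the five-place string can be sketched by brute force. This Python sketch is an illustration under stated assumptions, not the authors' learning model (which is a genetic algorithm driven by a fitness metric): it enumerates all 32 settings of (36a–e), keeps those under which every ModF sentence type has at least one compatible parse, and then, as a crude stand-in for the subset and shifting pressures, prefers the grammar with the fewest positive settings. The helper names (`compatible`, `parses`, `candidate_grammars`, `most_conservative`) are hypothetical.

```python
from itertools import product

# Positions follow (36a-e): nominative Case under agreement, nominative Case
# under government, subject clitics, null subjects, V2. '*' = left open.
MODF = {
    "SVO": ["1***0", "*1**1"],  # (44a), (44b): two competing parses
    "XSV": ["1***0"],           # (46)
    "whVsO": ["**1**"],         # (47)
}


def compatible(encoding: str, grammar: str) -> bool:
    """True if the 0/1 grammar string agrees with every position the
    p-encoding fixes; '*' positions impose no constraint."""
    return all(e == "*" or e == g for e, g in zip(encoding, grammar))


def parses(encodings: list[str], grammar: str) -> bool:
    """A sentence type is parsable if at least one of its parses is compatible."""
    return any(compatible(enc, grammar) for enc in encodings)


def candidate_grammars(data: dict[str, list[str]]) -> list[str]:
    """All five-place grammars that parse every sentence type in the data."""
    grammars = ("".join(bits) for bits in product("01", repeat=5))
    return [g for g in grammars if all(parses(encs, g) for encs in data.values())]


def most_conservative(grammars: list[str]) -> str:
    """Hypothetical proxy for subset/shifting pressure: fewest settings of 1."""
    return min(grammars, key=lambda g: g.count("1"))


cands = candidate_grammars(MODF)
print(cands)                     # ['10100', '10110', '11100', '11110']
print(most_conservative(cands))  # 10100, the target string in (38)
```

The two positions the data leave free are exactly (36b) and (36d), nominative Case under government and null subjects; the preference for fewer positive settings is what resolves both to 0 in this toy version.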

The two parameters determining nominative Case assignment by I, (36a–b), are in a shifting relation. Although neither parameter directly determines a superset relation (a grammar that allows nominative Case assignment under agreement generates a language that intersects with one that does not; similarly for nominative Case assignment under government), if both parameters are set to 1, they together generate a language that is the superset of the one that results from setting either parameter to 0. This is a classic case of shifting (of the type seen in section 2). Now, as we have shown, (36a) is unambiguously expressed in the input for ModF and thus is set to 1. In order to avoid shifting, a positive value for (36b) is strongly disfavored. Since there is no unambiguous evidence for nominative Case assignment under government, the pressure against shifting is decisive and the parameter is set to 0 in the optimal grammar. It should be noted that the only evidence for nominative Case assignment under government consists of sentences with the order SVO, with a V2 parse, which can also be analyzed more compactly under the assumption that nominative Case is assigned under agreement. In particular, the V2 parse for the SVO order must involve movement of the subject to the Spec of CP and thus entails a longer chain than would occur under the competing analysis. Thus, the non-V2 parse is again favored and the V2 parse is disfavored by the fitness metric. This provides the learner with further evidence in favor of setting the V2 parameter to 0, as well as disfavoring nominative Case assignment under government. For the null subject parameter, we could follow Berwick’s (1985) reasoning and invoke the Subset Condition. If null subject languages are a superset of non-null-subject languages, the lack of a trigger for a positive value of the null subject parameter will guarantee that (36d) is set to 0.
Alternatively, we could appeal to morphological conditions and say that, although the syntactic evidence does not determine a value for (36d), the “poverty” of French verbal inflection determines a negative value. We will leave this question open here. The above paragraphs demonstrate how the various factors we are concerned with work. On the basis of simple, plausible, positive evidence, the learner can converge on the correct parameter settings for Modern French. In what follows we will show how these same factors led to a major parametric change in French, circa 1500.

3.3  Old French

As mentioned earlier, OF allowed nominative Case assignment under government (see (34a–b)). We assume that nominative Case could also be assigned under agreement, although we will return to this point in section 4. (34b) shows that OF allowed null subjects, although it is well known that these were possible only in contexts of inversion. Another well-known and much-discussed difference between OF and ModF is that the OF nominative pronouns je, tu, il, etc., were potentially tonic elements, unlike their ModF counterparts (see Kayne 1975 on ModF; Adams 1987a,b, Roberts 1993:sec. 2.2, and below on OF). These facts about OF syntax lead to the following parameter settings, in terms of (36):

(50) The target string for OF is 11011.
     Nominative Case assignment was possible both under agreement and under government; null subjects were possible; V2 was obligatory in matrix declaratives.

As in the previous section, we now show how this string could be determined on the basis of simple, positive evidence.¹⁴ The following kinds of sentence were available as evidence, where (S) indicates a null subject:

(51) a. XVS     (Et) lors demande Galaad ses armes.
                (and) then asks Galahad (for) his arms
     b. SVO     Aucassins ala par le forest.
                Aucassin went through the forest
     c. XV(S)O  Si firent grant joie la nuit.
                so (they) made great joy the night

(52) a. whVSO   (Mais) ou fu cele espee prise . . . ?
                (but) where was that sword taken
     b. whVSO   Ne nos connoissez vos mie?
                NEG us know you not

(51a) is a V2 declarative (as in modern Germanic languages, conjunctions like ‘and’ and ‘but’ do not count in the computation of V2; these elements can be external to CP when they conjoin CPs). The relevant parts of the structure of this sentence are as follows:

(53) [CP Lors [C′ demande [IP Galaad . . .]]]

Here the inflected verb in C assigns nominative Case to the subject NP, Galaad, under government. Of the five parameters in (36), this example then positively triggers nominative Case assignment under government and V2. More generally, this word order has the following p-encoding:

(54) XVS p-encodes [ * 1 * * 1 ]  Nominative Case is assigned under government, and V2 is obligatory in matrix declaratives.

OF also allowed SVO sentences like (51b). As in the case of the ModF SVO order, this kind of sentence is p-ambiguous in the following way:

(55) SVO p-encodes either
        [ * 1 * * 1 ]  Nominative Case under government, V2
     or [ 1 * * * 0 ]  Nominative Case under agreement, no V2

We will return to this point below. As noted earlier and illustrated in (51c), OF allowed null subjects in V2 contexts. Such examples are also p-ambiguous from the point of view of the learner: if V is in C, then the null subject is licensed under government in Spec of I; if V is in I, then the null subject is licensed under agreement in Spec of I. In the former situation, nominative Case under government and V2 are triggered; in the latter, nominative Case under agreement is triggered along with a negative value for V2. In both cases, the null subject parameter is positively triggered. The following p-ambiguity arises:

(56) XV(S)O p-encodes either
        [ * 1 * 1 1 ]  As above, with null subject
     or [ 1 * * 1 0 ]  As above, with null subject

Now consider the interrogatives in (52). (52a) has the same trigger properties as a V2 declarative, except that by assumption interrogatives cannot trigger the V2 parameter. On the assumption that the nominative pronouns were tonic,¹⁵ (52b) involves nominative Case assignment under government to the subject pronoun vos, just as with any other NP subject. These examples, then, have the following p-encoding:

(57) whVSO p-encodes [ * 1 * * * ]  Nominative under government

Putting the above p-encodings together, we arrive at (58).

(58) a. [ * 1 * * 1 ]  Nominative under government and V2
     b. [ 1 * * * 0 ]  Nominative under agreement and no V2
     c. [ * 1 * 1 1 ]  Nominative under government, null subject, and V2
     d. [ 1 * * 1 0 ]  Nominative under agreement, null subject, and no V2
     e. [ * 1 * * * ]  Nominative under government
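Definition (45) can be applied mechanically to the OF p-encodings in (54)–(57). The following Python sketch (function and constant names are hypothetical) tests a sentence type for p-ambiguity parameter by parameter and collects the values on which all of its parses agree. On these encodings it shows that XVS is the only sentence type that fixes V2 unambiguously, while XV(S)O fixes the null subject parameter whichever parse is chosen.

```python
# OF sentence types and their alternative p-encodings, one string per parse.
# Positions follow (36a-e); '*' marks a parameter the parse leaves open.
OF = {
    "XVS": ["*1**1"],              # (54)
    "SVO": ["*1**1", "1***0"],     # (55)
    "XV(S)O": ["*1*11", "1**10"],  # (56)
    "whVSO": ["*1***"],            # (57)
}
NULL_SUBJECT, V2 = 3, 4  # indices of (36d) and (36e)


def p_ambiguous(encodings: list[str], i: int) -> bool:
    """Definition (45): some parse needs parameter i set to a definite
    value v, while some other parse does not need that value."""
    for j, enc in enumerate(encodings):
        v = enc[i]
        if v == "*":
            continue
        if any(other[i] != v for k, other in enumerate(encodings) if k != j):
            return True
    return False


def unambiguous_values(encodings: list[str]) -> dict[int, str]:
    """Parameters that every parse of the sentence type sets to the same value."""
    fixed = {}
    for i in range(len(encodings[0])):
        values = {enc[i] for enc in encodings}
        if len(values) == 1 and "*" not in values:
            fixed[i] = values.pop()
    return fixed


for name, encs in OF.items():
    print(name, unambiguous_values(encs), "V2 p-ambiguous:", p_ambiguous(encs, V2))

v2_triggers = [n for n, e in OF.items() if unambiguous_values(e).get(V2) == "1"]
print(v2_triggers)  # ['XVS']
```

Single-parse types can never be p-ambiguous under (45), which is why XVS and whVSO report no ambiguity; SVO, by contrast, fixes nothing at all.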

Both nominative Case parameters are triggered positively. (Notice that the positive evidence overrides the fact that these two parameters are in a shifting relationship; we return to this in section 4.) The null subject parameter is also positively triggered. V2 is also triggered if we take it that the positive evidence for the more complex trigger weighs more heavily than the pressure in favor of the simpler structure in the p-ambiguous cases; this is a matter that can be captured by the fitness metric. Finally, as mentioned in footnote 11, there is no morphological evidence in favor of subject clitics, in that there was only one series of subject pronouns at this time. Phonological evidence presumably militates against treating the nominative pronouns as obligatory clitics; for example, these pronouns could be stressed in OF, as their occurrence in topicalized position indicates:

(59) Je, que sai?
     me what do I know

Moreover, subject pronouns, unlike object pronouns, could appear first in V2 declaratives. This indicates that they “counted” just like other XPs for the determination of V2; object pronouns did not “count,” however:

(60) a. Tu es or riche et ge sui po proisié.
        you are now rich and I am little valued
     b. Toutes ces choses te presta Nostre Sires.
        all these things to you lent our Lord

On the basis of evidence of this kind, the subject clitic parameter was set to 0. Thus, we have demonstrated how simple, positive data could trigger the parameter settings for OF. Indeed, this discussion of the OF data brings out one important point: clear, positive evidence overrides all other considerations. We showed this in two cases. First, OF had a shifted system with respect to the nominative Case parameter, but learners nevertheless converged on this system since there was clear, positive evidence for it. Second, the p-ambiguities of SVO and V2/null subject examples are resolved by the unambiguous V2 cases, and moreover this resolution is in the direction of the more complex structure. In other words, clear, positive evidence can override both subset/shifting considerations and the pressure toward the simplest possible structure. In terms of our assumptions and definitions, “clear, positive evidence” means non-p-ambiguous evidence. Since the only non-p-ambiguous evidence for V2 is the XVS order, this type of sentence clearly played a crucial role. This order was very frequent in OF matrix declaratives. Roberts (1993:sec. 2.3.1) gives the following percentages for (X)VS and SV(X) order (based on the first 100 matrix declaratives with overt subjects in six representative texts):

(61) (X)VS = 58%
     SV(X) = 34%

Although a more sophisticated and exhaustive quantitative analysis is needed in order to fully demonstrate the point, we can conclude that (X)VS orders were sufficiently frequent to trigger a positive setting of the V2 parameter. This in turn means that SVO sentences could be analyzed as V2, unlike in ModF. Thus, a shifted system is allowed because there is clear evidence for it; the situation is quite different in ModF, where the only evidence for the shifted system is p-ambiguous and is therefore disregarded. In section 1 we introduced the notion of stability of parameter setting, proposing that a parameter setting is stable to the degree that its expression in the input data is unambiguous. Was the V2 parameter stable in OF? The only non-p-ambiguous trigger for V2 is provided by XVS orders. The frequency of these orders positively sets the nominative-Case-under-government parameter and thereby makes the V2 parse available for the p-ambiguous SVO and null subject structures. The potential instability created by the “non-V2 parses” of these examples is eliminated in the optimal grammar of OF. Nevertheless, it is likely that the non-V2 parse for SVO and null subject sentences was a close rival for the V2 parse, even in (later) OF, especially since elegance considerations always favor a non-V2 parse over a V2 parse where there is a choice. More explicitly, in terms of the fitness metric, the existence and frequency of an unambiguous trigger for V2 was sufficient to establish a positive setting for the V2 parameter. Recall that the relative elegance of a parse plays a less crucial role in judging fitness than real grammatical violations. This is because the elegance factor is scaled down by the constant c of the fitness metric, whereas violations are not scaled down.
Thus, a hypothesis that yields slightly less elegant representations without generating grammatical violations will ultimately drive out of the population a hypothesis that yields more elegant representations but generates violations. In the next section we will show how the MidF situation contrasts with what we have just described for OF. In particular, we will show that, in part because of the introduction of new word orders and in part because of the diminishing frequency of XVS, XVS orders were no longer able to trigger a positive value for the V2 parameter. As a result, the V2 parameter became maximally unstable. The instability was resolved by a parametric change that led to the loss of the constructions in (32)–(34).

3.4  Middle French

In MidF, XSV was introduced, and SVO and V1 became more frequent. These facts are standardly described in histories of French (see Harris 1978, Marchello-Nizia 1979, Vance 1989, and, for a detailed treatment in terms of the parameters under discussion here, Roberts 1993). Together, they meant that the V2 constraint was less rigorously respected than it had been in OF (although V2 orders were still possible throughout this period, unlike in ModF). Also, a separate series of nominative clitics emerged. For now, we will take the introduction of the new word orders as given, although we discuss possible causes for this change in section 4 (also see Adams 1987a,b, Roberts 1993). We treat the cliticization of nominative pronouns as a phonologically driven change. Otherwise, MidF was like OF and different from ModF, in particular with respect to nominative Case assignment under government and null subjects. We do not present a target string for MidF, however, since we precisely wish to show how indeterminacy in one parameter (V2) created indeterminacy elsewhere (nominative Case assignment under government and the possibility of null subjects). Let us consider the types of evidence available in MidF. As in OF, the following kinds of declaratives were found:

(62) a. XVS     Or avoit nostre curé priez des aultres prebtres.
                now had our priest asked the other priests
     b. SVO     Les Anglais veulent un roi guerrier.
                the English want a warrior king
     c. XV(S)O  Or ai eu plusseurs fois grant imagination.
                now have (I) had several times great imagination

Also as in OF, these constructions have the following p-encodings, corresponding to (62a), (62b), and (62c), respectively:

(63) a. XVS:    [ * 1 * * 1 ]  Nominative under government and obligatory V2 in matrix declaratives
     b. SVO:    [ * 1 * * 1 ]  As above
                or [ 1 * * * 0 ]  Nominative under agreement and no V2
     c. XV(S)O: [ * 1 * 1 1 ]  As in (a) with null subjects
                or [ 1 * * 1 0 ]  As in (b) with null subjects

The changes that took place in MidF created further possibilities, however. Consider the following examples (where s indicates a subject clitic):

(64) a. XVs  Or ai je proposé ensi que . . .
             now have I proposed thus that
     b. XsV  Et ce conseil nous vous donnons.
             and this advice we to you give



Taking these examples to positively trigger the subject clitic parameter, we propose that they have the p-encodings in (65a) and (65b), respectively.

(65) a. XVs: [ * * 1 * 1 ]  Subject clitics and V2
     b. XsV: [ * * 1 * 0 ]  Subject clitics and no V2

Since clitics can receive Case in ways unavailable to other nominal elements, sentences containing subject clitics provide no information about either nominative Case assignment parameter. The order verb-clitic in (64a) triggers a positive setting for the V2 parameter. On the other hand, since French subject clitics (then as now) do not attach to a verb and move with it (unlike object clitics), the order clitic-verb in (64b) triggers a negative value for the same parameter (but see below for further discussion of this kind of case). As mentioned earlier, MidF allowed, with growing frequency, other word orders that were not found in OF:

(66) a. XSV    Lors la royne fist Saintré appeller.
               then the queen had Saintré called
     b. (S)VY  Se appensa de faire ung amy.
               (he) to himself thought to make a friend

(66a), combined with the greater frequency of SVO orders in MidF as compared to OF, shows that V2 began to “erode” at this period. Sentences like (66b) illustrate another phenomenon, noticed and analyzed by Vance (1989): the fact that null subjects increase their distribution in this period, no longer being licensed only in inversion contexts. Roberts (1993:sec. 2.3.5) analyzes this situation in terms of the idea that null subjects could be licensed under agreement as well as under government in MidF, whereas in OF they were licensed only under government. So MidF allowed a null subject in the following configuration:

(67) [Tree diagram: IP with the null subject NP in its Spec, licensed under agreement]
The p-encodings for these orders are as follows:

(68) a. XSV:   [ 1 * * * 0 ]  Nominative under agreement and no V2
     b. (S)VY: [ 1 * * 1 0 ]  As in (a) with null subject
               or [ * 1 * 1 1 ]  Nominative under government, null subject, and V2

In interrogatives the same general situation holds as in declaratives. On the one hand, the same kinds of examples are found as in OF:

(69) a. whVSO  Que voelt ceste parolle dire?
               what wants this word to say
               ‘What does this word mean?’
     b. whVsO  A qui estes vous?
               whose are you

(69a) has the same p-encodings as its OF counterpart:

(70) whVSO: [ * 1 * * * ]  Nominative under government

(69b), on the other hand, no longer encodes nominative Case under government, since the subject has cliticized:

(71) whVsO: [ * * 1 * * ]  Subject clitics

Let us now put together the MidF p-encodings:

(72) a. [ * 1 * * 1 ]  Nominative under government and V2
     b. [ 1 * * * 0 ]  Nominative under agreement and no V2
     c. [ * 1 * 1 1 ]  Nominative under government, null subject, and V2
     d. [ 1 * * 1 0 ]  Nominative under agreement, null subject, and no V2
     e. [ * * 1 * 1 ]  Subject clitics and V2
     f. [ * * 1 * 0 ]  Subject clitics and no V2
     g. [ * 1 * * * ]  Nominative under government
     h. [ * * 1 * * ]  Subject clitics

In terms of p-encodings alone, the V2 parameter appears to be no more or less unstable than it was in OF. However, two factors distinguish the MidF situation from the OF one. First, the unambiguous trigger for V2, XVS order, was much less frequent in MidF than in OF. According to Marchello-Nizia (1979), the mean frequencies of the two orders in three texts from the late 15th century are as follows:

(73) (X)VS = 10%
     SV(X) = 60%

This is a significant difference in frequency as compared to OF (see (61)). The second factor concerns the status of SVO clauses. As shown earlier, the “V2 parse” for these clauses is disregarded in ModF, yet it was favored in OF. In MidF there is total indeterminacy on this point: there is (infrequent) evidence for V2 in the form of XVS order, and there is evidence against V2 in the form of XSV. Any parsing device with a positive setting for V2 would produce a violation on XSV order and would be disfavored by the fitness metric. Another factor that adds to the instability of V2 at this point is the development of left-dislocation with a resumptive pronoun (Priestley 1955, Kroch 1989). This is illustrated by the following example from Priestley:

(74) Les autres arts et sciences, Alexandre les honoroit bien.
     the other arts and sciences Alexandre them-honored well

The development of this type of construction led to shifting of the type described in (5) and (6). That is, the interaction between left-dislocation and V2 further obscured the latter, owing to surface “V3” orders. Kroch (1989: 215) shows that there is a real correlation between the rise of the construction in (74) and the loss of V2. The correlation results from the action of the fitness metric, which will judge a system of this type as relatively unfit. Late MidF V2 provides an instance of the situation described in section 1: learners are unable to converge on a single value for a parameter. In other words, the V2 parameter is maximally unstable. This case therefore exemplifies the “pathological” situation for acquisition. Since the available data cannot decide between two parametric values, other aspects of the fitness metric come into play: the Subset Condition and the elegance criterion. As noted earlier, a language with both V2 and left-dislocation will be disfavored by the Subset Condition, since it is a case of shifting.
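The indeterminacy can be made concrete by brute force. The Python sketch below (names hypothetical; frequency, elegance, and the fitness metric are deliberately ignored) lists the settings of (36a–e) under which every sentence type has at least one compatible parse, using the OF p-encodings of (54)–(57) and the MidF p-encodings of (63), (65), (68), (70), and (71). For OF the target string 11011 survives; for MidF no setting survives, because XVS requires V2 = 1 while XSV (and XsV) require V2 = 0.

```python
from itertools import product


def compatible(encoding: str, grammar: str) -> bool:
    """Grammar (0/1 string) agrees with every position the p-encoding fixes."""
    return all(e == "*" or e == g for e, g in zip(encoding, grammar))


def grammars_parsing_all(data: dict[str, list[str]]) -> list[str]:
    """Settings of (36a-e) under which every sentence type has a compatible parse."""
    grammars = ("".join(bits) for bits in product("01", repeat=5))
    return [
        g for g in grammars
        if all(any(compatible(enc, g) for enc in encs) for encs in data.values())
    ]


OF = {
    "XVS": ["*1**1"], "SVO": ["*1**1", "1***0"],
    "XV(S)O": ["*1*11", "1**10"], "whVSO": ["*1***"],
}
MIDF = {
    "XVS": ["*1**1"], "SVO": ["*1**1", "1***0"],
    "XV(S)O": ["*1*11", "1**10"],                   # (63)
    "XVs": ["**1*1"], "XsV": ["**1*0"],             # (65)
    "XSV": ["1***0"], "(S)VY": ["1**10", "*1*11"],  # (68)
    "whVSO": ["*1***"], "whVsO": ["**1**"],         # (70), (71)
}

print(grammars_parsing_all(OF))    # ['01011', '01111', '11011', '11111']
print(grammars_parsing_all(MIDF))  # []
```

Under this all-or-nothing criterion OF already admits several grammars, including the target 11011, and it is the fitness metric that must choose among them; what distinguishes MidF is that no grammar parses all of the data, so some experience has to be disregarded.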
Another factor that can decide between competing parses, and therefore competing p-encodings and triggers, is the criterion of elegance. It is reasonable to suppose that learners follow a least effort strategy in that they try to assign the simplest possible parse to the input string.¹⁶ This idea can be instantiated in terms of counting nodes, traces, or chain positions. We will not attempt to choose between those possibilities (Roberts (1993) opts for chain positions; for a formal statement of this, see his chapter 2, note 26); what is important here is that any parse that represents the inflected verb as being moved to C is more costly in terms of the least effort strategy than one that represents the verb as being moved only to I (by any of the above criteria). Suppose, then, that the least effort strategy plays a crucial role in resolving the instability in the data, by penalizing all p-encodings that depend on V-movement to C where there is a choice between this and V-movement to I. More technically, suppose that hypothesis h1 is identical to hypothesis h2 except that h2 allows for V2 in matrix declaratives whereas h1 does not. That is, h1 and h2 admit the same sentences and contain the same number of superset settings of parameters, differing only in the value for the V2 parameter. Hypothesis h2, then, systematically includes more structure in its representation than h1 since h2 will represent the verb as having moved to C (as well as movement of the subject in SVO). In other words, if h1 returns k nodes on a structure, h2 will return k + n nodes. Letting m represent the number of superset settings in each hypothesis, running each of the above through the fitness metric will yield the following ratings:

(75) a. h1: $1 - \dfrac{m + ck}{2m + c(2k + n)}$
     b. h2: $1 - \dfrac{m + c(k + n)}{2m + c(2k + n)}$

Since $1 - \frac{m + ck}{2m + c(2k + n)}$ is greater than $1 - \frac{m + c(k + n)}{2m + c(2k + n)}$ (the two ratings share the denominator $2m + c(2k + n)$, and with $c > 0$ and $n > 0$ the numerator $m + ck$ is smaller than $m + c(k + n)$), the fitness metric prefers h1 over h2 and the learner is under pressure to select h1. This, then, effectively sets the V2 parameter to 0. Like OF, MidF had one order where the V2 parameter was unambiguously p-encoded as 1: namely, XVS orders, which unambiguously p-encode [ * 1 * * 1 ]. In the situation of instability that reigned in MidF, the fitness metric, formulated to take account of the way in which the least effort criterion resolves p-ambiguities, will lead to convergence on a grammar where such experience is simply disregarded (i.e., not parsed where no alternative analysis can be found).¹⁷ Thus, this case shows how an unambiguous trigger for a given property can be disregarded when the system is maximally unstable, even if the instability is located in another area of the grammar. If the hypotheses where the V2 parameter has a positive value are penalized, the only remaining triggers for nominative Case assignment under government are whVSO orders. This order, too, is only weakly triggered in 15th-century French. The difference between MidF and OF in this regard was that several new constructions were available in MidF, notably complex inversion (as in Où Jean est-il allé? ‘(lit.) where Jean is-he gone’) and (qu’)est-ce que ‘(what) is-it that’.¹⁸ Whereas nominative Case assignment under agreement received strong support from the input data, nominative Case assignment under government received very little. Since the two parameters are in a shifting relationship, there was some pressure (built into the fitness metric, as shown in section 2) not to set them both to 1. In this situation, the fact that nominative Case assignment under government was only weakly triggered led to a change in the value of this parameter. The change to a system with nominative Case assignment only under agreement entailed a change in the null subject parameter (already only weakly triggered, as (74) shows) for theory-internal reasons. Under the assumption that null subjects can only be licensed in positions where Case is assigned (Rizzi 1986a), once nominative Case could no longer be assigned under government, null subjects could no longer be licensed under government. In this way, French lost null subjects with no significant change in the verbal inflectional morphology. There is a complication here, however, namely that MidF, unlike OF, also allowed null subjects to be licensed in configurations of agreement. Why were these null subjects lost along with those licensed in government configurations? Roberts (1993) answers this question in terms of a postulate concerning the identification of null subjects that we can phrase as follows:

(76) Where null subjects are licensed only in configurations of agreement, they require a “pronominal” Agr for identification.

A “pronominal” Agr is an Agr that morphologically distinguishes at least five persons, that is, an Agr of the kind found in languages like Spanish and Italian. French Agr is not pronominal in this sense, and indeed has not been since early in the OF period.
The intuition behind (76) is that a system where null subjects are licensed under government requires less inflectional morphology to recover the content of those null subjects than one where the only licensing configuration is agreement, since government is a closer syntactic relation than agreement. A system that licenses null subjects both under government and under agreement, like MidF, tolerates a relatively poorer agreement morphology. Therefore, once null subjects could no longer be licensed under government in French, the relative “poverty” of the verbal morphology became crucial, and null subjects were also lost in contexts where they had been licensed under agreement. As Roberts shows, the parallel development of Northern Italian dialects, in particular Veneto, supports the postulation of (76). Thus, at the beginning of the MidF period (ca. 1300) the relevant parameter settings were those in (77a); by the end of this period (ca. 1500) they had become those in (77b).

(77) a. 11011 (= OF)
     b. 10100 (= ModF)
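The least-effort comparison in (75) between h1 (no V2) and h2 (V2) can be checked numerically. This Python sketch assumes only the two-hypothesis form given in (75), in which each hypothesis is rated 1 minus its share of the summed penalties m + c × (nodes); the values chosen for m, c, k, and n are arbitrary illustrations, not estimates from the French data, and the function names are hypothetical.

```python
def least_effort_penalties(m: float, c: float, k: int, n: int) -> tuple[float, float]:
    """Penalties for h1 (no V2, k nodes) and h2 (V2, k + n nodes); both
    hypotheses have m superset settings, and c scales the elegance term."""
    return m + c * k, m + c * (k + n)


def pairwise_fitness(p1: float, p2: float) -> tuple[float, float]:
    """Two-hypothesis ratings as in (75): 1 - own penalty / total penalty,
    where the total equals 2m + c(2k + n)."""
    total = p1 + p2
    return 1 - p1 / total, 1 - p2 / total


p1, p2 = least_effort_penalties(m=2, c=0.5, k=10, n=3)
f1, f2 = pairwise_fitness(p1, p2)
print(f"h1: {f1:.3f}  h2: {f2:.3f}")  # h1: 0.548  h2: 0.452

# With c > 0 and n > 0, h1 is preferred however small the elegance weight c is.
for c in (1.0, 0.1, 0.001):
    a, b = pairwise_fitness(*least_effort_penalties(m=2, c=c, k=10, n=3))
    assert a > b
```

Because both ratings share the denominator, the comparison reduces to ck < c(k + n): shrinking c narrows the gap without reversing it, in line with the text's point that elegance is scaled down by c and decides only where the data leave the hypotheses otherwise tied.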

It is clear that the crucial element of instability was created by the gradual erosion of V2 as a rigid constraint on word order in matrix declaratives. In particular, the introduction and spread of XSV orders brought about a situation that eliminated a crucial trigger for nominative Case assignment under government, namely XVS order. The previous discussion shows how the genetic algorithm approach to learnability, and in particular the fitness metric, can shed light on this. What seems to have happened is that V2 was mildly unstable in, say, 1300 (recall the discussion at the end of section 3.3) in the sense that non-V2 parses for certain types of sentence (e.g., SVO) were close competitors for V2. These competitors generated “mutant” word orders, notably XSV, which were highly successful. The critical point was reached in the late 15th century, when V2 was eliminated. For completely contingent reasons (which concern the overall organization of the MidF grammatical system), the loss of V2 led to the loss of nominative Case assignment under government. And for reasons having to do with the organization of UG, this entailed the loss of null subjects. Moreover, Roberts (1994 [this volume, Chapter 11]) argues that this in turn led to the loss of clitic climbing (also see Kayne 1989). This account of syntactic changes in the history of French illustrates how syntactic change can be internally driven: change in one parameter can destabilize another. We will provide another example of this in section 4. However, we now find ourselves up against the problem posed by innovations: how were XSV orders introduced into a V2 system? Since these orders are ungrammatical in modern V2 Germanic languages, their introduction into a V2 system requires some comment. If we say that the weakening of V2 was a condition for this development, we risk falling into an unproductive regress.
It was in part for this reason that we avoided the issue earlier and simply took this innovation as given. However, there are good reasons to think that the introduction of XSV order is related to the cliticization of subject pronouns. Adams (1987b) points out that the overwhelming majority of early cases of XSV involved a pronominal subject. As Adams suggests, it is possible that XSV originates from cases of V2 where the clitic subject pronoun is not counted in determining V2.19 If Adams’s idea is correct, then the initial stimulus for the erosion of V2 comes from a morphophonological change in the subject pronouns. As is frequently the case, syntactic change can be traced back to extrasyntactic factors, although the relationship between the extrasyntactic factors and the syntactic changes they cause can be extremely indirect. This is because instability, once introduced, can propagate through a grammatical system.
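The dynamics described in this section, in which close competitors keep being reproduced until a critical point is reached, can be pictured with a textbook genetic algorithm of the kind introduced by Holland (1975) and surveyed in Goldberg (1989). What follows is a minimal sketch under explicit assumptions, not Clark and Roberts's own procedure: hypotheses are five-place binary strings like those in (77); each generation draws parents with probability proportional to fitness, combines them by single-point crossover, and flips each value with a small mutation probability. The function names, the mutation rate, and the toy fitness function are all hypothetical.

```python
import random

def roulette_select(population, weights, rng):
    """Pick one hypothesis with probability proportional to its fitness weight."""
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(mother, father, rng):
    """Single-point crossover: a prefix of one parent joined to the suffix of the other."""
    point = rng.randrange(1, len(mother))
    return mother[:point] + father[point:]

def mutate(hypothesis, rate, rng):
    """Flip each parameter value independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in hypothesis
    )

def next_generation(population, fitness_fn, rng, mutation_rate=0.01):
    """Replace the population with children bred from fitness-weighted parents."""
    weights = [fitness_fn(h) for h in population]  # must be >= 0 and not all zero
    children = []
    for _ in population:
        mother = roulette_select(population, weights, rng)
        father = roulette_select(population, weights, rng)
        children.append(mutate(crossover(mother, father, rng), mutation_rate, rng))
    return children

# Hypothetical toy fitness: 1 plus the number of positions agreeing with "11011".
def toy_fitness(hypothesis):
    return 1 + sum(a == b for a, b in zip(hypothesis, "11011"))

rng = random.Random(0)
population = ["11011", "10100", "11111", "00000"]
for _ in range(5):
    population = next_generation(population, toy_fitness, rng)
print(population)
```

Because selection is proportional rather than winner-take-all, a hypothesis whose fitness is close to the best one's continues to be copied into later generations, which is one way to picture a "close competitor". Seeding `random.Random` makes a run reproducible.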

4. Some Concluding Remarks

Here we wish to address some of the wider issues raised by our case study of language change. These concern the shifting relationship between the nominative Case parameters in section 3.1 with respect to the OF data and what our approach has to say about the classic questions for diachronic linguistics concerning the nature of innovation and loss. How is it that a massively unstable system of parameter settings, like the one in MidF, can come into being in the first place? Of course, factors external to the syntax can destabilize a syntactic system, but we believe that instability can propagate within a syntactic system and that exactly this has happened in the history of French. Consider again the p-encodings for the OF data:

(78) a. [*1**1]
     b. [1***0]
     c. [*1*11]
     d. [1**10]
     e. [*1***]
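Note 10 relates p-encodings to the "schemata" of the genetic-algorithm literature: each encoding in (78) is a pattern over five binary parameters, with * marking a value the string leaves open. The sketch below, a minimal illustration rather than the authors' code, lists the settings each encoding is compatible with and which parameters the V2 parses actually fix. It assumes, following the text's identification of (78b) and (78d) as non-V2 parses, that a 0 in the final position marks a non-V2 parse; the function names are hypothetical.

```python
from itertools import product

# The p-encodings in (78); '*' marks a parameter the string leaves open.
P_ENCODINGS = {
    "78a": "*1**1",
    "78b": "1***0",
    "78c": "*1*11",
    "78d": "1**10",
    "78e": "*1***",
}

def matches(grammar: str, encoding: str) -> bool:
    """A grammar fits an encoding if it agrees on every fixed position."""
    return all(e == "*" or e == g for g, e in zip(grammar, encoding))

def compatible_grammars(encoding: str) -> list[str]:
    """Every setting of the binary parameters that the encoding allows."""
    candidates = ("".join(bits) for bits in product("01", repeat=len(encoding)))
    return [g for g in candidates if matches(g, encoding)]

def fixed_positions(encodings) -> set[int]:
    """1-based positions that at least one encoding sets to a definite value."""
    return {i + 1 for enc in encodings for i, e in enumerate(enc) if e != "*"}

# Assumption: a final 0 marks a non-V2 parse, as with (78b) and (78d).
v2_parses = {k: enc for k, enc in P_ENCODINGS.items() if enc[-1] != "0"}

print(sorted(v2_parses))                                  # ['78a', '78c', '78e']
print(sorted(fixed_positions(v2_parses.values())))        # [2, 4, 5]
print([k for k, enc in P_ENCODINGS.items() if matches("11011", enc)])  # ['78a', '78c', '78e']
```

Once the non-V2 parses are set aside, no remaining encoding fixes the first position. If, as the surrounding discussion suggests, that position is nominative Case assignment under agreement, this is one way to see why the setting was poorly supported by OF matrix clauses; the OF setting 11011 of (77a) fits exactly the V2 parses.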

Bearing in mind that the correct grammar for OF did not contain non-V2 parses (i.e., the p-encodings in (78b) and (78d) are discarded in the correct grammar), it seems that nominative Case assignment under agreement had a quite precarious status in OF. There was another trigger for nominative Case assignment under agreement, however: the fact that subordinate clauses regularly had SVO order (assuming, contra Lightfoot (1989, 1991), that subordinate word order can trigger parameter settings). Thus, it is the fact that OF had a root/embedded asymmetry with respect to V2 order that is crucial for triggering nominative Case assignment under agreement. Now, there is evidence that early OF (prior to ca. 1200) allowed embedded V2 (Cardinaletti and Roberts 2002 [this volume, Chapter 12]; Dupuis 1989; Hirschbuhler 1990). This means that nominative Case assignment under agreement was an OF innovation, emerging in subordinate clauses as V2 became a uniquely root phenomenon. This innovation started the chain of changes leading to the MidF innovations that were crucial to our account in section 3.4 (and hence to the later changes discussed there). Assume that an archaic stage of OF did not allow nominative Case assignment under agreement. How can Case assignment under agreement arise? Notice that when such assignment comes into the grammar, a shifted system is introduced on the basis of a nonshifted one. Following an idea originally due to Cardinaletti (1990), let us suppose that expletive elements can never topicalize. In a V2 system, however, Spec of C’ is a topic position: it is an Ā-position and a position that does not receive Case. Cardinaletti proposes that when an expletive occupies this position, as frequently happens in the V2 Germanic languages, the position is able to count as an A-position in that (nominative) Case can be assigned there. 
Thus, we can attribute the introduction of nominative Case assignment under agreement to the introduction of a lexical expletive capable of occupying Spec of CP in matrix declaratives. OF had a lexical expletive il that appeared in Spec of C’ in examples like (79) (from Einhorn 1974: 123).

(79) Il ne me chaut.
     it not to me matters
     ‘It doesn’t matter to me.’

Supposing that this construction emerged in archaic OF, we can then say that nominative Case assignment under agreement was triggered by this kind of example.

Finally, let us briefly consider what implications our proposals may have for traditional preoccupations of diachronic syntax: the nature of innovation and the nature of loss. Of course, it should be immediately clear that the conception of how grammatical systems differ from one another that lies at the heart of the principles-and-parameters approach means that parameters themselves never change.20 What changes over time are parametric values. Nevertheless, at the level of constructions (e.g., available word order types) it is clear that possibilities are both innovated and lost. In our terms, innovation may arise from one of two sources: either internally, when a parametric change makes new constructions available, or externally, when phonological or morphological change weakens evidence for certain hypotheses. The second type of innovation is likely to lead to instability at the level of parameter settings, as in the case of the introduction of XSV orders triggered by the cliticization of subject pronouns in MidF.21 Concerning loss, it seems that only parametric change can truly eliminate a construction in the sense that construction C is accepted by native speakers of language L at time T and rejected at a later time T′. This has been the fate of simple inversion, V2, and null subjects in French. In terms of the standard view of language acquisition, this situation seems problematic. Put very simplistically, why is one generation’s trigger experience the next generation’s fossil? This is the logical problem of language acquisition again. Various solutions have been proposed, but we believe we have discovered a new and interesting one. 
An approach to learnability based on a genetic algorithm including a version of the fitness metric makes it possible to see how a data point can be disregarded in a situation of instability (where instability can be formalized); this was what happened in the case of XVS orders in 15th-century French. Although relatively infrequent and often parsable as some other construction, XVS was certainly found in 1500, and so, given the standard assumption that parameters can be set on the basis of quite impoverished experience, an account of loss based on frequency considerations alone will not answer the fundamental question. The fitness metric, properly formulated so that frequency and other considerations are taken into account, seems able to resolve this tension between standard views of acquisition and the fact that structures are lost in the course of language change, since it can be seen why one class of input strings may be rendered unparsable. This can happen even where, as in the case of XVS orders, the input in question is intrinsically simple and structurally “transparent”; here we see a major difference between our account and the approach to language change based exclusively on something like Lightfoot’s (1979) Transparency Principle, although we believe our approach retains the basic insight behind the Transparency Principle in the elegance part of the fitness metric.22 Another important consideration that emerges from our discussion is that exactly the same string Si can successfully trigger a parameter setting P(υ1) in one grammatical system Gi, but fail to trigger P(υ1) in system Gj ≠ Gi. French XVS order is a case of exactly this sort, where Gi is the grammar of OF and Gj that of late MidF. In terms of the genetic algorithm, Si can trigger a successful hypothesis or an unsuccessful one. As in the biological world, successful propagation depends as much on the external environment as on internal properties, so that little can be predicted purely on the basis of internal structural criteria. It is this aspect of the genetic algorithm that makes possible a deeper understanding of language change and demonstrates how successive generations may treat the “same” trigger experience differently. Note also that in these terms, language change refers not only to the “limit cases” of innovation and loss, but also to the varying success of strings in encoding viable parameter settings. Our approach also has implications for the theory of markedness. It is part of the classical concept of markedness that marked properties are both diachronically unstable and “difficult” in terms of acquisition. A shifted system of parameter settings can be thought of as a marked system. It is clear from our discussion that a shifted system is diachronically unstable. Consider again the shifted system discussed in section 2, which featured both V2 and left-dislocation. 
Neither V2 nor left-dislocation is marked on its own (note the stability of Germanic V2 and the fact that all periods of French feature left-dislocation of one kind or another); however, their combined presence in a system leads to markedness—witness the instability of MidF.23 So we suggest that in general markedness, rather than being an inherent property of certain parameter values, is a property that derives from the interaction of parameters in a given grammatical system, relative to the fitness metric. This in turn implies that a given parameter value can be marked in one grammatical system (or at one period) and unmarked in another system (e.g., at another period). Diachronic studies of the type discussed here also have important implications for the study of language learnability and language acquisition. As discussed briefly above, diachronic change represents a type of “pathological” learning, where learners systematically arrive at the wrong grammar for the target language. Strictly speaking, these are cases where learners fail. We would argue that learners fail for reasons that reveal something important about their internal structure. Parametric change is the result of an input text that places indifferent pressure on the learner’s hypotheses; several different grammars can provide an acceptable account for the input text. We have shown that other factors, always related to the learner’s internal fitness metric, come into play to distinguish between the competing hypotheses.

These factors involve the Subset Condition and a measure of elegance. Let us return to the fitness metric, repeated here as (80).

(80) The fitness metric

$$\frac{\left(\sum_{j=1}^{n}\upsilon_j + b\sum_{j=1}^{n}s_j + c\sum_{j=1}^{n}e_j\right) - \left(\upsilon_i + b\,s_i + c\,e_i\right)}{(n-1)\left(\sum_{j=1}^{n}\upsilon_j + b\sum_{j=1}^{n}s_j + c\sum_{j=1}^{n}e_j\right)}$$
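The metric in (80) can be computed directly once the counts are known. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes υ_i, s_i, and e_i have already been counted for each of the n hypotheses (note 8 suggests the superset count could come from a lookup table), and the function and argument names are hypothetical. Because e_i enters the formula with the same sign as the violation count, a larger e_i lowers fitness, so the sketch treats it as an inelegance cost. The values for a population always sum to 1, since the n numerators add up to (n − 1) times the shared total.

```python
def fitness(violations, supersets, inelegance, b, c):
    """Relative fitness of each hypothesis i in a population of n, following (80).

    cost_i = v_i + b*s_i + c*e_i
    total  = sum of cost_j over the whole population
    fit_i  = (total - cost_i) / ((n - 1) * total)
    """
    n = len(violations)
    if not n == len(supersets) == len(inelegance):
        raise ValueError("need one (v, s, e) triple per hypothesis")
    if n < 2:
        raise ValueError("the metric compares at least two hypotheses")
    costs = [v + b * s + c * e for v, s, e in zip(violations, supersets, inelegance)]
    total = sum(costs)
    if total == 0:
        # Assumption (not in the source): with no costs at all, all hypotheses tie.
        return [1.0 / n] * n
    return [(total - cost) / ((n - 1) * total) for cost in costs]


# Hypothetical counts: A has one superset setting but cheaper representations,
# B has no superset setting but costlier ones; neither violates the input.
v, s, e = [0, 0], [1, 0], [1, 2]
print(fitness(v, s, e, b=1, c=2))  # c > b: A is fitter
print(fitness(v, s, e, b=2, c=1))  # b > c: B is fitter
```

In the example, A's cost is b + c and B's is 2c, so A comes out fitter exactly when b < c; the counts are invented purely to isolate the trade-off between the scaling constants b and c.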



Our study of diachronic change reveals certain facts about the scaling constants b and c. We assume that empirical coverage of the input text is the learner’s central interest; thus, violations (calculated by $\sum_{j=1}^{n}\upsilon_j$ for the population and by $\upsilon_i$ for the individual) are the single most important factor in the equation. Both superset settings and elegance are scaled down by the constants b and c, respectively. Let us now consider what the relative magnitudes of b and c are. At a certain point, French was a V2 language that allowed for left-dislocation (the latter associated with atonic pronouns), and it was a shifted language that would be selected against by the fitness metric. Furthermore, the frequency of structures that would have required both V2 and left-dislocation was relatively low, placing little pressure on the learner in terms of violations. All else being equal, learners could have preferred either a language with matrix V2 and no left-dislocation or a language with left-dislocation and no matrix V2. Notice that left-dislocation is a superset parameter; a language that allows left-dislocation in addition to its basic word order is a superset of a language that allows only the basic word order. We argued, on the other hand, that matrix V2 led to more complex representations, relative to the input text, than a grammar without matrix V2. Now, the changes we have illustrated in French involve the abandonment of matrix V2, a nonsuperset parameter, and the persistence of left-dislocation, a superset parameter. Given our premises, then, the fitness metric must have preferred a grammar that generated an elegant set of representations and a superset language over a grammar that generated inelegant representations and a subset language. 
Thus, learners appear to consider elegance a more important factor than superset settings when evaluating hypotheses:

(81) c > b

Our study of diachronic change has therefore enabled us to make a concrete hypothesis about how learners evaluate parameter settings. We can now test this hypothesis against actual child grammars, perhaps by attempting to characterize successive developmental stages in child language. In general, we should see children avoiding grammars that create inelegant representations. More to the point, we should find children resisting grammars that force longer chains to the point of, temporarily at least, preferring grammars with superset settings if these grammars can approximate the target.

We have shown how a theory of language learning based on a genetic algorithm affords a novel and insightful account of language change, taking as our case study the development of word order and null subjects in French. We believe that our account sheds light both on the mechanisms of language change and on those of language acquisition, and goes some way toward building a bridge between these two domains; in this respect, our work is conceptually very close to work by Lightfoot (1989, 1991). Moreover, we have shown that it is possible to characterize the markedness of systems and to see clearly the role played by such factors as elegance and frequency of input, and the interactions between these factors. We know of no other approach to language learnability and language change that achieves these results.

Notes

The first author received support from grant 11–25362.88 from the Fonds national suisse pour la recherche scientifique and from a grant from the Fondation Ernst et Lucie Schmidheiny. This article has greatly benefited from comments made by two anonymous reviewers for Linguistic Inquiry.

1. The first person to formulate this problem in terms of generative syntax was Lightfoot 1979.
2. Genetic algorithms were developed by Holland; see particularly Holland 1975. Goldberg 1989 provides a comprehensive overview of the technique; see also Booker, Goldberg, and Holland 1990. Clark 1990 develops a model of parameter setting in terms of genetic algorithms as an approach to demonstrating the learnability property. See also Clark 1992 for a comprehensive theoretical treatment.
3. We will assume, with many researchers in developmental psycholinguistics, that an input text consists of short, simple, grammatical sentences. Little in the present discussion hinges on the precise nature of the text, so long as the basic constructions of the language are adequately exemplified. For further discussion (and debate) on the nature of the input evidence, see Wexler and Culicover 1980, Lightfoot 1989, and the discussion of the latter work. For a formal characterization of the input evidence and its relation to learning, see Osherson, Stob, and Weinstein 1986.
4. As we will show, shifting is more than a logical possibility and serves to force parametric change over time.
5. Space prevents a comprehensive discussion of this class of algorithms; see Goldberg 1989 for a general introduction and Clark 1990, 1992, for an application to the learnability problem for natural languages.
6. Genetic algorithms are part of a class of algorithms that approximate some desired optimum but are not absolutely guaranteed to return the optimum. Other such algorithms include the “simulated annealing” found in some applications using neural networks, as pointed out by an anonymous reviewer. The property of returning a result that is “probably approximately correct” (PAC) is important for our purposes since such approximations are the fuel for language change (see the discussion of PAC learning in Natarajan 1991). We have selected genetic algorithms from the class of PAC algorithms because genetic algorithms incorporate a notion of relative fitness and for the formal clarity of the resulting model of parameter setting. We will argue below that this notion of fitness provides some insight into the nature of the learner and how properties of the learner govern diachronic change.
7. See Clark 1992 for an extensive formal discussion of fitness and reproduction and of their influence on convergence. Here, we will mainly be concerned with the intuitions that underlie the formalism.
8. For simplicity, we assume that the learner has access to a table that tells it which settings are superset settings; this is much simpler than forcing the learner to calculate whether or not a given parameter value generates a superset language. Note that shifting relations will not be included on the table. These will be selected against by the fitness metric in an indirect way.
9. The results discussed here receive a more formal discussion in Clark 1992, where proofs of certain theorems entailed by the fitness metric are given. For present purposes, the important point is that, relative to an input text, the fitness metric drives the learner toward a hypothesis that minimizes the number of violations and the number of superset settings and that generates the most elegant syntactic representations possible, given that grammatical violations are avoided.
10. The notion of p-encoding defined here is essentially isomorphic to that of “schemata,” which has been widely discussed in the genetic algorithm literature (see, in particular, Goldberg 1989 and the references cited there). There is one important difference, however: schemata are usually taken as ranging over empirical generalizations whereas p-encodings represent the ambiguities inherent in the input stream. The two are similar in that p-encodings represent the set of grammars that can, in principle, assign a well-formed representation to a given string.
11. Vance (1989) in fact shows that 15th-century MidF null subjects could be licensed under agreement as well as government. Nevertheless, both null subjects licensed under government and null subjects licensed under agreement are lost with simple inversion in the 16th century. Roberts (1993:sec. 2.4.3) proposes that the loss of null subjects where they were licensed under government also entailed their loss throughout the system on the basis of the idea that, for null subjects to be licensed only under agreement, a very rich “pronominal” morphology is required. This type of morphology is found in Italian and Spanish but not in MidF or ModF. Hence, the “poverty” of French agreement, combined with the change in the nominative Case parameter, led to the loss of null subjects everywhere. We will discuss Vance’s data further below.
12. In our presentation, we abstract away from the “split Infl” hypothesis of Pollock (1989), restricting ourselves to projections of I. To fully account for the facts of ModF inversion, however, it is necessary to split I into at least Agr and T (and their projections). In terms of the “Agr over T” system proposed by Belletti (1990), our nominative Case parameter refers to Agr. To account for stylistic inversion, we probably need to say that T can assign nominative Case to a postverbal subject under government (see Rizzi 1990). (Also see note 13.)
13. Literary ModF allows strings that appear to be V2, for example, Dans cette maison vécut Racine ‘In this house lived Racine’. However, such examples should be treated as instances of stylistic inversion. Stylistic inversion differs from V2 and subject-clitic inversion in that the subject appears in a position following the entire verbal complex in a compound tense and is not sensitive to the root/embedded distinction, unlike true V2. (See Kayne and Pollock 1978 and Pollock 1986.) In fact, Pollock (1986) suggests that stylistic inversion may involve a nonreferential null subject in Spec of Iʹ. If so, ModF allows at least some highly restricted occurrences of null subjects and (36d) should therefore be reformulated to refer to referential null subjects.
14. In the case of OF, as in the case of all languages now without native speakers, negative evidence in the form of grammaticality judgments is unavailable. Linguists working on such languages are in a situation almost analogous to that of children acquiring their native language, although in fact the linguists’ situation is worse since they have no access to UG and their data are seriously degenerate owing to dialect mixture, scribal error, and so on. Unlike children, however, linguists have no access to a regular input text. Children are surrounded by native speakers producing grammatical utterances. Linguists obviously are not, since all the native speakers are dead.
15. In fact, there are reasons to think that in the position immediately following the inflected verb, as in (52b), these pronouns did cliticize in OF (see Dupuis 1989:119f., Roberts 1993:sec. 2.2.2, and Vance 1989:70ff.). However, Roberts argues that the crucial step in the development of the system of subject pronouns in French was the emergence of complementary distribution between the je-series and the moi-series. This happened because the cliticization of the je-pronouns became obligatory in MidF. What the OF evidence shows is that these pronouns were optionally clitics in that they cliticized only in certain contexts. In other contexts, such as those in (59) and (60), these pronouns were clearly tonic. It may be, then, that the correct formulation of the parameter in (36c) should refer to obligatory cliticization of nominative pronouns, or, more likely, to the existence of a special series of clitic pronouns. Note that in the latter case the trigger for the parameter is morphological: the learner must recognize two paradigms of subject pronouns.
16. This idea is discussed at length in the context of syntactic change by Roberts (1993), who notes the close resemblance between this idea and the Transparency Principle proposed by Lightfoot (1979). Also see de Vincenzi’s (1989) proposal that something of this kind is a general parsing strategy, not limited to language learners. Note that the least effort strategy as conceived here is not a principle of grammar; in this we differ from Chomsky (1991).
17. An alternative analysis is often available. Roberts (1993:sec. 2.4.1) shows that many cases of V2 could be treated as “free inversion.”
18. For a synchronic analysis of the former construction, see Rizzi and Roberts 1989 [this volume, Chapter 9], and for a discussion of its diachronic development, Roberts 1993:sec. 2.3.4. Concerning the development of the latter construction as a nonemphatic interrogative, see Foulet 1921.
19. We do not want to propose that preverbal subject pronouns in MidF or ModF are syntactic clitics; rather, following Kayne (1983), we believe that these pronouns cliticize only in PF. However, the ultimately unsuccessful hypothesis that these pronouns were indeed syntactic clitics could nevertheless have given rise to XSV orders at the time when the subject-pronoun system was undergoing change. See Roberts 1993 for a more elaborated approach.
20. Except perhaps at the higher diachronic level of phylogenetic change; it is a reasonable assumption that the set of parameters available to modern Homo sapiens is not the same as the set that was available to the first hominids with a language faculty. Of course, we are concerned in the text with changes in the recorded history of languages that by assumption fall within the set of human languages, so this question does not arise.
21. There is at least a metaphorical sense in which cases like XSV are successful rogue hypotheses, where success is determined by the least effort criterion. This is mutation at the level of constructions, not at the level of parameters, so the mutation operator of section 2 is presumably not relevant.
22. More recently, Lightfoot (1991) has proposed a new approach to change based on “Degree-0 learnability.” A detailed comparison of that approach with the one developed here is beyond the scope of this article (though see Clark, in preparation).
23. Modern German also has left-dislocation, but with a tonic resumptive pronoun. On the other hand, MidF left-dislocation featured atonic resumptive pronouns. This was yet another way in which the clitic nature of pronouns in MidF created instability.

References

Adams, Marianne. 1987a. From Old French to the theory of pro-drop. Natural Language and Linguistic Theory 5:1–32.
Adams, Marianne. 1987b. Old French, null subjects and verb second phenomena. Doctoral dissertation, UCLA, Los Angeles, Calif.
Baker, Mark. 1988. Incorporation: A theory of grammatical function changing. Chicago: University of Chicago Press.
Belletti, Adriana. 1990. Generalized verb movement: Aspects of verb syntax. Turin: Rosenberg and Sellier.
Berwick, Robert. 1985. The acquisition of syntactic knowledge. Cambridge, Mass.: MIT Press.
Booker, L. B., David E. Goldberg, and John H. Holland. 1990. Classifier systems and genetic algorithms. In Machine learning: Paradigms and methods, ed. Jaime Carbonell, 235–282. Cambridge, Mass.: MIT Press.
Cardinaletti, Anna. 1990. Pronomi nulli e pleonastici nelle lingue germaniche e romanze: Saggio di sintassi comparata. Dottorato di ricerca in linguistica, Università di Padova.
Cardinaletti, Anna, and Ian Roberts. 2002. Clause structure and X-second. In Functional structure in DP and IP: The cartography of syntactic structure, vol. 1, ed. G. Cinque, 123–166. New York/Oxford: Oxford University Press. [This volume, Chapter 12.]
Chomsky, Noam. 1977. On wh-movement. In Formal syntax, ed. Peter Culicover, Thomas Wasow, and Adrian Akmajian, 71–132. New York: Academic Press.
Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, Noam. 1991. Some notes on economy of derivation and representation. In Principles and parameters in comparative grammar, ed. Robert Freidin, 417–454. Cambridge, Mass.: MIT Press.
Clark, Robin. 1990. Papers on learnability and natural selection. Technical Reports in Formal and Computational Linguistics, No. 1. Université de Genève.
Clark, Robin. 1991. A computational model of parameter setting. Paper presented at the American Association for Artificial Intelligence Spring Symposium on Machine Learning, Natural Language, and Ontology. Stanford, Calif.
Clark, Robin. 1992. The selection of syntactic knowledge. Language Acquisition 2:85–149.
Clark, Robin. In preparation. Finitude, boundedness, and approximate learning of natural languages. Ms., Université de Genève.
Darwin, Charles. 1859. On the origin of species. London: John Murray.
Dupuis, Fernande. 1989. L’expression du sujet dans les subordonnées en ancien français. Thèse de Ph.D., Université de Montréal, Montréal, Québec.
Einhorn, Einar. 1974. Old French: A concise handbook. Cambridge: Cambridge University Press.
Everett, Daniel. 1986. Pirahã clitic doubling and the parametrization of nominal clitics. In MIT working papers in linguistics 8, 85–127. Department of Linguistics and Philosophy, MIT, Cambridge, Mass.
Everett, Daniel. 1989. Clitic doubling, reflexives and word order alternations in Yagua. Language 65:339–372.
Foulet, Lucien. 1921. Comment ont évolué les formes de l’interrogation? Romania 47:243–348.
Foulet, Lucien. 1982. Petite syntaxe de l’ancien français. 3d ed. Paris: Editions Champion.
Gold, E. M. 1967. Language identification in the limit. Information and Control 16:447–474.
Goldberg, David. 1989. Genetic algorithms in search, optimization, and machine learning. Reading, Mass.: Addison-Wesley.
Haldane, J. B. S. 1990. The causes of evolution. Princeton, N.J.: Princeton University Press.
Harris, Martin B. 1978. The development of French syntax: A comparative approach. London: Longmans.
Hirschbuhler, Paul. 1990. La légitimation de la construction V1 à sujet nul dans la prose et le vers en ancien français. Revue québécoise de linguistique 19:32–55.
Holland, John. 1975. Adaptation in natural and artificial systems. Ann Arbor, Mich.: University of Michigan Press.
Kayne, Richard. 1975. French syntax. Cambridge, Mass.: MIT Press.
Kayne, Richard. 1983. Chains, categories external to S, and French complex inversion. Natural Language and Linguistic Theory 1:109–137.
Kayne, Richard. 1989. Null subjects and clitic climbing. In The null subject parameter, ed. Osvaldo Jaeggli and Ken Safir, 239–261. Dordrecht: Kluwer.
Kayne, Richard, and Jean-Yves Pollock. 1978. Stylistic inversion, successive cyclicity, and Move NP in French. Linguistic Inquiry 9:595–621.
Kiparsky, Paul. 1982. Explanation in phonology. Dordrecht: Foris.
Koopman, Hilda, and Dominique Sportiche. 1991. The position of subjects. In The syntax of verb-initial languages, ed. James McCloskey, 211–258. Amsterdam: Elsevier. [Special issue of Lingua 85.]
Kroch, Anthony. 1989. Reflexes of grammar in patterns of language change. Language Variation and Change 1:199–244.
Lightfoot, David. 1979. Principles of diachronic syntax. Cambridge: Cambridge University Press.
Lightfoot, David. 1989. The child’s trigger experience: Degree-0 learnability. Behavioral and Brain Sciences 12:321–334; commentary 334–375.
Lightfoot, David. 1991. How to set parameters. Cambridge, Mass.: MIT Press.
Lightfoot, David, and Norbert Hornstein. 1981. Explanation in linguistics. London: Longmans.
Marchello-Nizia, C. 1979. Histoire de la langue française aux XIVe et XVe siècles. Paris: Bordas.
Natarajan, Balas. 1991. Machine learning: A theoretical approach. Palo Alto, Calif.: Morgan Kaufmann.
Osherson, Daniel, Michael Stob, and Scott Weinstein. 1986. Systems that learn: An introduction to learning theory for cognitive and computer scientists. Cambridge, Mass.: MIT Press.
Paul, Hermann. 1920. Prinzipien der Sprachgeschichte. 5th ed. Halle: Niemeyer.
Poletto, Cecilia. 1990. Diachronic development of subject clitics. Talk given at the Crucial Languages Workshop, Université de Genève.
Pollock, Jean-Yves. 1986. Sur la syntaxe de en et le paramètre du sujet nul. In La grammaire modulaire, ed. Mitsou Ronat and Daniel Couquaux, 211–246. Paris: Editions de Minuit.
Pollock, Jean-Yves. 1989. Verb movement, Universal Grammar, and the structure of IP. Linguistic Inquiry 20:365–424.
Price, Glanville. 1971. The French language: Present and past. London: Edward Arnold.
Priestley, Lawrence. 1955. Reprise constructions in French. Archivum Linguisticum 7:1–28.
Renzi, Lorenzo. 1983. Fiorentino e italiano: Storia dei pronomi personali soggetto. In Italia linguistica: Idee, storia, struttura, ed. F. Albano Leoni et al., 223–239. Bologna.
Renzi, Lorenzo, and Laura Vanelli. 1983. I pronomi soggetto in alcune varietà romanze. In Scritti linguistici in onore di Giovan Battista Pellegrini, 121–145. Pisa.
Rizzi, Luigi. 1986a. Null objects in Italian and the theory of pro. Linguistic Inquiry 17:501–557.
Rizzi, Luigi. 1986b. On the status of subject clitics in Romance. In Studies in Romance syntax, ed. Osvaldo Jaeggli and Carmen Silva-Corvalàn, 391–419. Dordrecht: Foris.
Rizzi, Luigi. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press.
Rizzi, Luigi, and Ian Roberts. 1989. Complex inversion in French. Probus 1:1–30. [This volume, Chapter 9.]
Roberts, Ian. 1993. Verbs and diachronic syntax. Dordrecht: Kluwer.
Roberts, Ian. 1994. Two types of head-movement in Romance. In Verb movement, ed. David Lightfoot and Norbert Hornstein, 207–242. Cambridge: Cambridge University Press. [This volume, Chapter 11.]
Schwartz, Bonnie, and Sten Vikner. 1996. The verb always leaves IP in V2 clauses. In Parameters and functional heads: Essays in comparative syntax, ed. Adriana Belletti and Luigi Rizzi, 11–62. New York/Oxford: Oxford University Press.
Thurneysen, Robert. 1892. Die Stellung des Verbums im Altfranzösischen. Zeitschrift für Romanische Philologie 16:289–371.
Vance, Barbara. 1989. Null subjects and syntactic change in medieval French. Doctoral dissertation, Cornell University, Ithaca, N.Y.
Vanelli, Laura, Lorenzo Renzi, and Paola Benincà. 1986. Typologie des pronoms sujets dans les langues romanes. In Actes du XIIe Congrès de Linguistique et Philologie Romanes. Aix-en-Provence.
de Vincenzi, Maria. 1989. Syntactic parsing strategies in a null subject language. Doctoral dissertation, University of Massachusetts, Amherst.
Wexler, Kenneth, and Peter Culicover. 1980. Formal principles of language acquisition. Cambridge, Mass.: MIT Press.


Object Movement and Verb Movement in Early Modern English

Ian Roberts

0. Introduction

This paper provides evidence that an earlier stage of English had a rule of ‘object shift’, similar to that found in the Mainland Scandinavian (MSc) languages. The evidence of object shift in English sheds light on the nature of object shift in general and provides a new perspective on the well-known loss of overt verb-movement in the history of English. We begin by illustrating the phenomenon of object shift from Swedish and Danish, drawing on the important work by Holmberg (1986, 1991) and Vikner (1989, 1994). In our discussion of MSc, we underline the central fact about object shift: the object moves just when the verb moves. This is the topic of section 1. Having illustrated object shift in MSc, we turn in section 2 to the English data. What we show is that the Early Modern English (ENE) of the 16th century had object shift of a type very similar to that found in MSc; in particular, the connection between object movement and verb-movement is attested. The ENE facts are thus amenable to an analysis parallel to that of MSc. Similarly, the loss of object shift since ENE can be naturally connected to the loss of overt verb-movement, and we can thus explain the absence of shifted objects in Modern English in terms of the absence of overt verb-movement. Section 3 elaborates on the analysis, showing how a small extension of Chomsky’s (1993) system of feature-checking, head-movement and locality can provide a straightforward account of object shift in MSc and ENE, and of the diachronic development of English. The analysis also extends, at least in part, to Icelandic and Faroese. We are led to two main conclusions on the basis of the observation that object shift is attested in English for as long as verb-movement is. First, we see that the English pronoun system is essentially parallel to that of the MSc languages. In particular, English pronouns are not cross-linguistically unusual in any sense.
Their apparently unusual syntax derives from the fact that, in the absence of overt verb-movement, they never (or almost never) occupy a ‘special’ syntactic position. Second, the English pronouns have not changed since ENE; what has changed in English is AgrS, in that overt verb-movement is no longer possible (for main verbs).

1. Object Shift in Mainland Scandinavian

Holmberg (1986, 1989, 1991) and Vikner (1989, 1994) discuss the phenomenon of object shift in MSc. In these languages, unstressed pronominal objects are obligatorily moved leftward out of VP if the main verb moves out of VP (here and throughout, object pronouns are assumed to be unstressed). Taking the negative adverb ikke to be at the left margin of VP (whatever its precise position may be), the following Danish examples illustrate:

(1) a.  Hvorfor læste studenterne ikke [t artiklen]?
        Why read the-students not the-article
    b. *Hvorfor læste studenterne artiklen ikke [t t]?
        Why read the-students the-article not
        ‘Why didn’t the students read the article?’

(2) a. *Hvorfor læste studenterne ikke [t den]?
        Why read the-students not it
    b.  Hvorfor læste studenterne den ikke [t t]?
        Why read the-students it not
        ‘Why didn’t the students read it?’

In all these examples the inflected verb has moved to C, as is usual in both declarative and interrogative main clauses in MSc, since these are V2 languages (see Vikner (to appear, ch. 2)). In (1), the non-pronominal direct object DP artiklen cannot be moved out of VP, as the ungrammaticality of (1b) shows. In (2), we observe the converse behaviour of the pronominal object: where the verb leaves VP, so must the object pronoun. (2a) is ungrammatical because the object has remained in VP while the verb has moved out of VP. In (2b), the object pronoun, although it has left VP, has not ‘followed’ the verb to C. This is evident from the relative positions of the object pronoun and the subject DP here. There is no reason to say that the subject DP is anywhere other than in its usual Spec-AgrS position, and the object pronoun cannot precede this DP.1 Hence the object pronoun does not move to C, but moves to a position somewhere in IP but outside VP.
The data in (1) and (2) provide us with the essentials of object shift, and illustrate the two basic generalisations about the phenomenon:
(a) the object pronoun leaves VP when the verb does;
(b) the object pronoun does not ‘follow’ the verb to C but instead remains in I.
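Generalisation (a), read as a biconditional for MSc (a pronoun shifts exactly when the verb leaves VP; a full DP never shifts), can be checked mechanically against the judgements in (1) and (2). The sketch below is a toy encoding for illustration only: the function names `predict_shift` and `grammatical` and the boolean features are assumptions, not part of the analysis, and ‘shifted’ simply means that the object surfaces to the left of ikke.

```python
def predict_shift(verb_leaves_vp: bool, object_is_pronoun: bool) -> bool:
    """Toy version of generalisation (a) for MSc: an unstressed pronominal
    object leaves VP exactly when the verb does; a full DP never shifts."""
    return verb_leaves_vp and object_is_pronoun


def grammatical(verb_leaves_vp: bool, object_is_pronoun: bool, shifted: bool) -> bool:
    """A string counts as well-formed iff its object position matches the prediction."""
    return shifted == predict_shift(verb_leaves_vp, object_is_pronoun)


# Danish examples (1)-(2): in all four the finite verb is in C, i.e. outside VP.
# Each tuple is (verb_leaves_vp, object_is_pronoun, object_shifted).
EXAMPLES = {
    "1a": (True, False, False),  # ikke [t artiklen]
    "1b": (True, False, True),   # *artiklen ikke [t t]
    "2a": (True, True, False),   # *ikke [t den]
    "2b": (True, True, True),    # den ikke [t t]
}

for label, features in EXAMPLES.items():
    print(label, "ok" if grammatical(*features) else "*")
```

The printed verdicts reproduce the judgements given above: (1a) and (2b) come out well-formed, (1b) and (2a) ill-formed.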

Vikner (1989, 1994) gives a range of further data which illustrate these properties of object shift. (3) shows that object shift does not take place in embedded clauses where the verb does not move:

(3) a.  Det var godt at han ikke [købte den].
        It was good that he not bought it
    b. *Det var godt at han den ikke [købte t].
        It was good that he it not bought
        ‘It was good that he didn’t buy it.’

Still assuming ikke to be at the left edge of VP, the fact that ikke precedes the verb here shows that the verb has not left VP. In (3a) the object pronoun also remains in VP and the sentence is grammatical. In (3b) object shift takes place and, in the absence of verb-movement, the result is ungrammatical. This shows that generalisation (a) above should be tightened up to say that object shift takes place only when verb-movement does. In (4), we give examples with a modal followed by an infinitive:

(4) a.  Hvorfor skal studenterne ikke [læse den]?
        why shall the-students not read it
    b. *Hvorfor skal studenterne den ikke [læse t]?
        why shall the-students it not read
        ‘Why don’t the students have to read it?’

It is likely that MSc modals are main verbs with clausal infinitive complements, like their counterparts in many languages (cf. Vikner 1988). There is no reason to suppose that the infinitive moves (that is, we do not find the order infinitive–ikke with the negative in the lower clause). The object cannot move either to a position between the negative and the infinitive (giving ikke den læse) or, as shown in (4b), to a position preceding the negative. So, whatever the position of ikke, object shift is impossible, and (4) is consistent with generalisation (a). (5) shows what happens in periphrastic tenses:

(5) a.  Hvorfor har studenterne ikke [læst den]?
        Why have the-students not read it
    b. *Hvorfor har studenterne den ikke [læst t]?
        Why have the-students it not read

Again, there is no reason to say that past participles move in Danish. Hence, the ungrammaticality of (5b) further confirms generalisation (a). Holmberg (1986) and Vikner (1989, 1994) have convincingly shown that object shift is distinct from scrambling of the type found in Continental West Germanic languages. Moreover, they argue that it is a variety of A-movement. Consequently, they propose that, like other A-movements, object shift is driven by Case theory. The essential idea is that Case theory applies to unstressed pronouns in a more stringent way than to full DPs, and hence these elements are required to undergo some ‘extra’ movement in order to satisfy this further requirement. Holmberg (1986) relates this further requirement to morphological case, observing that pronouns show morphological case in MSc while full DPs do not. Holmberg’s analysis carries over to Icelandic, where full DPs show morphological case and undergo object shift.2 Vikner’s (1989) account of Danish essentially endorses this view. An apparent problem for the idea that object shift is A-movement has to do with its interaction with A-movement of the subject (as noted by Vikner 1994). Movement of the object crosses the base position of the subject, and subsequent movement of the subject crosses the landing site of object shift. Suppose, for concreteness, that the landing site of object shift is Spec-AgrOP (this position is in fact the only plausible candidate; see section 3). The relevant parts of the derived structure of a clause with object shift, e.g. (2b), must then be as in (6):

(6) [AgrSP subjᵢ ... [AgrOP objⱼ [AgrO′ [VP* tᵢ [VP ... tⱼ]]]]]

Spec-AgrOP is an A-position, and c-commands the trace of the subject without c-commanding the derived position of the subject. Formally, both the movement of the object and the movement of the subject violate the Shortest Link requirement on movement (Chomsky 1993), i.e. Relativised Minimality. The structure in (6) has the abstract character of a superraising example like (7):

(7) *Johnᵢ seems that it is likely [tᵢ to win].

Why is (6) not a violation of Shortest Link on a par with (7)? Chomsky (1993: 21–26) proposes an answer to this question. He posits an operation of V-raising to AgrO (either at S-structure or LF), which creates a chain C = (V, t).
The minimal domain of this chain consists of Spec-AgrOP, the base position of the subject and the base position of the object.3 Chomsky then introduces the following notion of equidistance:

(8) If α and β are in the same minimal domain, they are equidistant from γ.

Equidistant positions are those which cannot act as interveners for each other, since movement to either of them has the same status with respect to the Shortest Link requirement on chains. In (6), Spec-AgrOP and the base position of the subject are equidistant from the base position (or launching site) of the object. Hence the object can move directly to Spec-AgrOP, ‘skipping’ the subject position, since the subject is not an intervener for the resulting chain. Movement of AgrO to a higher position creates a further chain C′ whose minimal domain will contain both Spec-AgrOP and the specifier of the target head, allowing the subject to ‘skip’ Spec-AgrOP but preventing the object from skipping the subject. Thus, according to Chomsky, overt V-movement facilitates overt object movement, i.e. object shift. Two questions remain open, however. First, V-movement only allows object shift; it does not require it. Yet we have seen that in MSc object shift is obligatory for unstressed pronouns. Second, we need an account of the restriction of object shift to pronouns in MSc. We will return to both of these points in section 3. For now, the important conclusion is that treating object shift as A-movement (to Spec-AgrOP) can at least explain why overt V-movement is a precondition for overt object shift. In this section we have reviewed the basic data concerning object shift, and we have seen that it is plausible to regard this operation as A-movement, probably to Spec-AgrOP. We now consider the evidence that object shift existed in ENE.
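The equidistance reasoning of section 1 can be sketched as a toy Shortest Link check. This is an illustration under simplifying assumptions, not Chomsky’s formal definitions: the clause is reduced to a four-position spine (no T, no C), the position labels, the function name `shortest_link_ok` and the `filled` parameter (positions occupied by an overt element or a trace) are hypothetical, and the two minimal domains are simply copied from the description of the chains (V, t) and C′ in the text, with the target of AgrO-raising simplified to AgrS.

```python
# Simplified clause spine, top-down: each position c-commands every later one.
SPINE = ["Spec-AgrSP", "Spec-AgrOP", "Spec-VP*", "Obj"]


def shortest_link_ok(source, target, filled, minimal_domains):
    """Toy Shortest Link check: moving from `source` up to `target` is licit
    unless a filled position strictly between them is not equidistant with
    `target`, i.e. shares no minimal domain with it (definition (8))."""
    top, bottom = SPINE.index(target), SPINE.index(source)
    if top >= bottom:
        raise ValueError("target must c-command source")
    for position in SPINE[top + 1:bottom]:
        equidistant = any({position, target} <= domain for domain in minimal_domains)
        if position in filled and not equidistant:
            return False
    return True


# Minimal domain of the chain (V, t) created by V-raising to AgrO ...
C1 = {"Spec-AgrOP", "Spec-VP*", "Obj"}
# ... and of the chain C' created when AgrO raises further (here: to AgrS).
C2 = {"Spec-AgrSP", "Spec-AgrOP"}

# Object shift crosses the subject's base position: licit only with V-raising.
print(shortest_link_ok("Obj", "Spec-AgrOP", {"Spec-VP*"}, [C1]))  # True
print(shortest_link_ok("Obj", "Spec-AgrOP", {"Spec-VP*"}, []))    # False
# Subject raising then crosses the shifted object: licit given C'.
print(shortest_link_ok("Spec-VP*", "Spec-AgrSP", {"Spec-AgrOP"}, [C1, C2]))  # True
```

The second call is the point of the exercise: with no head chain there is no minimal domain containing both Spec-AgrOP and the subject position, so object shift is blocked, which mirrors the claim that overt V-movement is a precondition for overt object shift.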

2. Object Shift in Early Modern English

In this section we will show that the correlation between V-movement and object shift holds diachronically in English: as long as English had overt V-movement, it had object shift. This observation is interesting for two reasons. First, it supports the approach to object shift outlined in the previous section. Second, it implies that, although Modern English appears to lack ‘special’ clitic pronouns (in Zwicky and Pullum’s 1983 sense), this fact can be attributed to the well-known lack of overt V-movement rather than to any peculiar feature of the pronoun system. Since we need to say that overt V-movement is lacking in Modern English in any case, this conclusion regarding the pronoun system is advantageous. We now review the diachronic evidence. We restrict our attention largely to the 16th century for two reasons. First, as we will show directly, overt V-movement is not always found at this period, but the correlation between overt V-movement and object shift is systematic. Second, at this period object shift is, as in MSc, restricted to pronouns. Earlier periods of the language show more general object shift (see note 5). We follow Emonds (1978) and Pollock (1989) in taking clausal negation as a diagnostic for movement from V to AgrS. In fact, we make the simplifying assumption that not, when it has clausal scope, is at the left edge of VP in ENE.4 ENE had three possibilities for clausal negation, which are illustrated by the following examples:

(9) a. it serveth not (1513, Anon)
    b. it not belongs to you (1600, Shakespeare)
    c. whose sore task / Does not divide the Sunday from the week (1605, Shakespeare)

The order V–not in (9a) involves V-to-AgrS movement. This was the only form of clausal negation prior to the 16th century, and it died out soon after 1600 (see Kroch 1989, Jespersen 1909–49, Lightfoot 1979, Roberts 1985 [this volume, Chapter 1], 1993). For these reasons, we refer to this as the ‘conservative’ system. The order not–V of (9b) is a 16th-century innovation which died out in the 17th century; we will say more about this order below. Owing to its similarity to non-V2 clauses in MSc, we will designate this the ‘Scandinavian’ order. The construction in (9c) is of course the only one to survive in contemporary English. This was also a 16th-century innovation (see Ellegård 1953, Denison 1985, 1993, Kroch 1989, Roberts 1993). We refer to this as the ‘modern’ construction.

Let us now consider the behaviour of pronominal objects in each of these constructions in turn.5 With the ‘conservative’ form of negation, which involves V-to-AgrS movement, we consistently find object shift. The following are representative examples:

(10) a. if you knew them not (1580, John Lyly)
     b. they tell vs not the worde of God (1565, Thomas Stapleton)
     c. yf thou smyte it not of (1534, Thomas More)
     d. Why bring you him not up? (1614, Jonson)

(10a) is a straightforward case of the order V–pronoun–not. (10b) features a double-object verb with one pronominal object and one non-pronominal object. The pronominal object precedes not while the non-pronominal object follows it. This is what we expect, given what we have seen regarding object shift, and is exactly parallel to what is found in MSc (see Vikner 1994). (10c) involves the verb-particle combination smite . . . off. There is a variety of approaches to the analysis of this construction (e.g. Emonds 1976, Kayne 1985, den Dikken 1990, Johnson 1991), but they all have in common the idea that the particle is in VP at S-structure in the sequence V–DP–Prt. In that case, the order pronoun–not–Prt in (10c) provides clear evidence of object shift. Finally, (10d) is a further case of a verb-particle construction. Since it is an interrogative, the inflected verb is in C in this example. As in MSc, the object pronoun does not ‘follow’ the verb to C. The situation regarding object shift is thus substantially the same as in MSc. Of course, there are independent differences between ENE and MSc which to some degree obscure the similarities. There are two relevant differences, both concerned with verb-movement. In conservative ENE, V consistently moves to AgrS. This is not the case in MSc, as we saw in section 1. On the other hand, in MSc V consistently moves to C in all main-clause declaratives, while ENE was no longer a V2 language (cf. van Kemenade 1987, and, for a different view, Lightfoot 1994). In ENE the inflected verb moves to C only in interrogatives and a restricted range of declarative main clauses. If we abstract from these differences, we see that ENE and MSc are exactly alike regarding object shift.

Consider next the ‘Scandinavian’ construction in (9b). With this kind of negation, we only find the order not–V–pronoun. This is illustrated in (11):

(11) a. She not denies it (1599, Shakespeare)
     b. I not bid thee that (1611, Jonson)
     c. I not see them (1633, Jonson)

Here, too, the pattern parallels MSc, modulo differences in verb-movement. Given our assumption about the position of not, these examples have no overt verb-movement at all. So this construction is exactly like what we saw for MSc embedded clauses in section 1. These examples are thus consistent with generalisation (a) about object shift: the object moves only when the verb moves. Since the verb does not move here, neither does the object. The pattern of negation in (11) is of course ungrammatical in contemporary English. As we mentioned earlier, it is only found from roughly 1550 to 1650. Roberts (1993), drawing on ideas in Kroch (1989), proposes an account of this which is consistent with most theories of negation, do-insertion and verb-movement in contemporary English (see, among others, Pollock 1989, Rizzi 1990, Chomsky 1991). Suppose, following Chomsky (1993), that V raises to T and AgrS at LF in Modern English in order to check its features, i.e. that the V-features of T and AgrS are ‘weak’. This raising is blocked by the presence of Neg (inter alia), and so in negative clauses V cannot be inserted bearing an affix, as that affix will fail to be checked at LF. Do is inserted to check the features of T and AgrS (the insertion takes place prior to LF, despite the fact that the features in question are weak and hence do not need to be checked before LF, since post-S-structure lexical insertion is impossible).6 In these terms, the ‘Scandinavian’ negation pattern of (11) can in fact be analysed exactly as MSc negation would be: not does not block LF-raising since it is not, at this period, a head, but rather an adverbial (in an analysis which features NegP, the most plausible assumption is that not occupies Spec-NegP). In the mid-17th century, not becomes a head, and so do-insertion becomes obligatory with negation because not now blocks LF verb-raising.
Independent evidence that not becomes a head comes from the appearance at this time of the reduced form n’t in texts from the 1660s, cf. Jespersen (1909–49, V: 429). With do-insertion, the pronominal object always occupies the modern position. Thus sentences which are negated in the contemporary manner and which contain pronominal objects appear to be exactly like their counterparts in contemporary English. This is illustrated by the following examples:

(12) a. ye do not remembre me (1463, Anon)
     b. this sorrow does not leave me (c1480, Anon)
     c. they dyde not assaile it (1523–25, Berners)

In the 16th century, we do not find cases where the object appears adjacent to the inflected verb do. That is, the order do–pronoun–not–V is unattested; we do not find examples like (13):

(13) I did him not see.

The best way to make this evidence compatible with everything else we have seen about object shift in ENE is to assume that sentences involving do-insertion are parallel to compound tenses. In that case, they are comparable with MSc examples of the type in (4) and (5). In this section, we have seen that ENE object shift was exactly parallel to MSc object shift once the differences in verb-movement that distinguish ENE from MSc are taken into account.

3. Object Shift and Cliticisation

In this section, we propose an analysis of object shift in terms of the feature-checking system put forward in Chomsky (1993). Recall that two questions remain open from our discussion of V-movement and object movement in section 1. The first concerns why object shift should be obligatory when V moves. The second concerns the selectivity of object shift: why (in MSc and ENE) is it restricted to pronouns? We can answer the second of these questions by saying that pronouns are subject to some feature-checking requirement over and above the one which applies to ‘full’ DPs (very much in the spirit of earlier work by Holmberg and Vikner). A natural proposal is that pronouns, whose content is exhausted by phi-features, are required to check those features with some functional head, perhaps in addition to the usual requirement for checking Case features. If we take that head to be AgrO, we see why pronouns must undergo object shift under certain conditions: object shift will be triggered to the extent that AgrO has strong phi-features. We now need to see what triggers AgrO’s strong phi-features. This will give us the answer to our first question. Our proposal is this: when V raises (for V2 in MSc, because of the V-features of AgrS in ENE), it must pass through AgrO in order to satisfy the Head Movement Constraint. In so doing, it ‘activates’ AgrO’s agreement property. We can think of this as AgrO’s strong phi-features being induced by V-movement. More technically, we can say that the presence of some element in the checking domain of AgrO activates AgrO’s potential for checking. Hence, when the verb moves to the checking domain of AgrO (by adjoining to AgrO), the pronoun is required to raise to Spec-AgrOP in order to check AgrO’s strong phi-features. In this way, we see why V-movement both allows and requires pronoun object shift. This approach extends straightforwardly to Icelandic.
In this language, as we mentioned in note 2, pronominal object shift is obligatory whenever the verb moves, and verbs systematically raise at least to AgrS in all finite clauses (i.e. AgrS has strong V-features). Hence pronouns always shift (in finite clauses). On the other hand, full DPs optionally undergo object shift. We take no position on what may trigger this, noting only that a mechanical option is to say that AgrO has an optionally strong N-feature. Clearly, Faroese differs from Icelandic in this last respect (cf. note 2). In all the Scandinavian languages, however, the trigger for pronoun object shift is the same. Moreover, we can maintain that English from at least the 16th century onwards patterns exactly like the Scandinavian languages. The difference in English is that verbs do not raise overtly, hence they never induce object shift. There is a well-known complication for the view that Modern English lacks overt V-movement: auxiliary verbs do raise. Roberts (1983) and Pollock (1989) have suggested that this is connected to the fact that auxiliaries lack θ-roles. Lacking θ-roles, auxiliaries do not have direct objects, and so they are not relevant for the discussion of object shift. However, there is one exception to this statement: the possessive have of conservative dialects of British English. This verb behaves in inversion and negation contexts as though it moves to AgrS. Pollock (1989) argues that this fact can be made consistent with the view that only verbs which have no θ-roles are able to move to AgrS in Modern English; this is arguably correct, but the important point for present purposes is that possessive have is a verb which superficially has a direct object and which continued to raise to AgrS after this possibility had been lost for other English verbs with direct objects. So it is appropriate to ask how this verb behaves with respect to object shift. Unfortunately, it is difficult to ascertain what the precise situation is with regard to object shift in contemporary dialects which allow have to raise to AgrS.
In negatives, the paradigm is as follows:

(14) a. ??I have it not.
     b. *I have not it.
     c. I haven’t it.

It seems that in this variety the contracted negation n’t is obligatory where have raises to I. Since n’t combines with have and moves with it, this makes it impossible to tell whether object shift has taken place in (14c). The general requirement for the contracted negation with have is confirmed by the ungrammaticality of (15a), which is clearly independent of object shift:

(15) a. *I have not a car.
     b. I haven’t a car.

Despite this difficulty, native speakers are unanimous in preferring (14a) over (14b): (14a) sounds very archaic or poetic, while (14b) is simply impossible. So the evidence favours the view that object shift may still marginally remain with conservative possessive have. Clearly, this is expected if in fact Modern English pronouns have exactly the same property as their Scandinavian counterparts. Several authors (Deprez 1990, Josefsson 1992, Jonas and Bobaljik 1993) have proposed that pronoun object shift is head-movement (this was also my own view in an earlier version of this work, presented in 1991 and written in 1992). One argument which has been given in favour of this idea is that it assimilates object shift to cliticisation in Romance.7 However, it is likely, especially given the facts of Romance past-participle agreement discussed in Kayne (1989), that Romance clitic-placement involves DP-movement (see Roberts (1993), Sportiche (1992), Vikner and Sprouse (1988) and below). Moreover, pronoun object shift has many properties that are quite unlike any Romance cliticisation. So, postulating that object shift is head-movement does not achieve the alleged assimilation, and this assimilation is probably not desirable in any case. If we propose that pronominal object shift is head-movement, we are led to posit a kind of head-movement that is otherwise unattested, and to suggest relaxation or violation of well-known constraints on head-movement (this was a very real flaw in my earlier version of this material, cited above). One hypothesis, the simplest head-movement variant of what we have proposed, is that object shift adjoins the pronoun (a D) to AgrO. This operation either violates the Head Movement Constraint by moving D over V, or, if we adopt the more complex derivation with DP-movement to Spec-AgrOP applying first, the D-movement effectively becomes lowering, since it moves D from Spec-AgrOP down to AgrO. An alternative is to assume some kind of excorporation (either of D from [D + V] in V, or of V from [D + V] in AgrO). This kind of operation is arguably undesirable on general grounds (even if, strictly speaking, Relativised Minimality allows it; cf.
Roberts (1991 [this volume, Chapter 10])). If we exclude excorporation, then it is technically impossible to adjoin the head of the object DP to AgrO. Suppose then that the object’s host head is higher than AgrO. The Head Movement Constraint requires that DP-movement take place in order to permit D to attach to the host without incurring a violation. This kind of derivation may be right for Continental West Germanic (and Old English) and perhaps Romance clitics (but see below on the latter). However, these groups of languages each provide different kinds of independent motivation for this view, motivation which is lacking in MSc (and ENE). Continental West Germanic languages all have leftward scrambling of DP. While the nature of and trigger for scrambling remain rather unclear (cf. the references in Vikner (1994) and other articles in Corver and Van Riemsdijk (1994)), it is plausible to suppose that scrambling is the DP-movement operation which feeds cliticisation (cf. Roberts (1993)). MSc languages lack scrambling, and hence lack one type of independent motivation for one kind of cliticisation analysis of object shift.

It is also instructive to compare Romance clitics with MSc shifted objects. Romance clitics always occupy “special” positions, unlike MSc object pronouns, which have to remain in what appears to be their base position if V does not move. This is one reason why it is plausible to think that Romance clitics are base-generated in their “special” positions, as a kind of agreement (or Voice) head which triggers raising and licensing of a DP (pro in non-clitic-doubling cases). This is what Sportiche (1996) proposes. On this view, DP-movement (of pro) is still associated with Romance clitics. The facts of MSc clearly offer no scope for such an approach to shifted objects. More generally, any approach raising shifted objects higher than AgrO raises two questions: (i) why is DP raised? (ii) how do D-movement and V-movement interact? Although we have no satisfactory general answer to (i) for Continental West Germanic and Romance, we have independent evidence of the existence of such “long” DP-movement, as we have just seen. Question (ii) naturally leads to the postulation either of excorporation or of head-adjunction to a head containing a trace. Neither of these options is conceptually attractive (see Kayne (1991) on the latter). We do not need to appeal to either of them if the shifted object stays at the AgrO level, but then it can only be in Spec-AgrOP. One variant of the cliticisation approach, explored in Holmberg (1991), is to say that pronominal objects always cliticise: where V does not leave VP, the cliticisation is string-vacuous D-movement to V. However, this putative cliticisation differs from Romance cliticisation in two important respects: (a) it is head-adjunction to the right of the host; (b) it is cliticisation to a lexical head.8 We see, then, that MSc and ENE object shift are actually rather unlike Romance cliticisation, and in any case Romance cliticisation involves a DP-movement component.
Hence this kind of comparative consideration provides no argument at all for regarding object shift as head-movement. The technical problems associated with this idea are such that we continue to regard object shift as DP-movement to Spec-AgrOP, triggered and licensed as described above. To recapitulate: if pronoun object shift is DP-movement, we have a natural account of the synchronic and diachronic link with verb-movement. The ENE data are particularly clear in this regard. We must say that object shift is at least DP-movement; we have seen that there is no good reason for saying that it involves any more than this, and several good reasons not to say this. In this section we have implicitly introduced a typology of clitic, weak or shifted pronouns. In North Germanic and English they are required to check for phi-features with AgrO, where AgrO’s strong features are induced by verb-movement (always in Icelandic, sometimes in MSc, almost never in Modern English). In West Germanic they typically undergo scrambling and may cliticise to some head position above AgrS and below C (for a detailed proposal, which does not assume exactly the mechanisms sketched here, see Haegeman (1993)). In Romance, clitics are base-generated in the higher head positions and trigger pro-raising there (Sportiche (1996)). Understanding what precisely underlies these differences is a topic that goes beyond our goals here. One point which arises concerns the status of the “higher” agreement or voice projections between AgrS and C in North Germanic and English. Are these positions present? If so, what are their reflexes? Sadly, we must leave these fascinating questions open here.
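The North Germanic and English part of this typology can be restated as a small decision procedure: overt V-movement through AgrO activates AgrO’s strong phi-features, which unstressed pronouns must then check in Spec-AgrOP, while full DPs shift only under the optionally strong N-feature mentioned above as a mechanical option for Icelandic. The sketch below is a hypothetical encoding for illustration; the `Grammar` class, its field names, and the string labels for positions are assumptions, not part of the analysis, and the West Germanic and Romance patterns are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical encoding of the settings the section 3 proposal relies on."""
    verb_passes_through_agro: bool          # overt V-raising (to AgrS or C) goes via AgrO
    agro_optional_strong_n: bool = False    # the 'mechanical option' for Icelandic full DPs


def object_positions(grammar: Grammar, is_pronoun: bool) -> set:
    """Return the set of surface positions allowed for an unstressed object."""
    if not grammar.verb_passes_through_agro:
        # AgrO's phi-features are never activated, so nothing triggers shift.
        return {"in VP"}
    if is_pronoun:
        # Activated strong phi-features must be checked in Spec-AgrOP: shift is obligatory.
        return {"Spec-AgrOP"}
    if grammar.agro_optional_strong_n:
        # Full DPs may, but need not, shift.
        return {"in VP", "Spec-AgrOP"}
    return {"in VP"}


DANISH_MAIN = Grammar(verb_passes_through_agro=True)       # V2: V moves to C, as in (2)
DANISH_EMBEDDED = Grammar(verb_passes_through_agro=False)  # no verb-movement, as in (3)
ICELANDIC_FINITE = Grammar(True, agro_optional_strong_n=True)
ENE_CONSERVATIVE = Grammar(verb_passes_through_agro=True)  # V-to-AgrS, as in (10)
MODERN_ENGLISH = Grammar(verb_passes_through_agro=False)   # main verbs do not raise overtly

print(object_positions(DANISH_MAIN, is_pronoun=True))      # {'Spec-AgrOP'}
print(object_positions(MODERN_ENGLISH, is_pronoun=True))   # {'in VP'}
```

On this encoding the only thing separating ENE_CONSERVATIVE from MODERN_ENGLISH is the verb-movement setting; the pronoun branch is shared, which is the sense in which the pronouns themselves have not changed.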

4. Conclusion

In this paper, we have proposed an analysis of MSc object shift which carries over to the essentially parallel phenomenon in ENE. An important aspect of our analysis is that it leads to the conclusion that English object pronouns have not changed at all since ENE. What has changed since ENE is the position of the inflected verb, as is well known. Since in Modern English V (almost) never raises to AgrO, it neither triggers nor licenses object shift. So we arrive at the welcome conclusion that the observed change in the distribution of object pronouns is not an independent development, but simply a further reflex of the general loss of overt verb-movement in English.

Acknowledgements

An earlier version of this material (which featured a rather different analysis) was presented at the GGS-Treffen, Bern, the 7th Comparative Germanic Syntax Workshop, Stuttgart, and the University of California, Irvine. Thanks to the audiences at those presentations for their comments. Thanks also to Bob Borsley, Sten Vikner and the editors of this collection for their comments. All errors are my own.

Notes

1. Josefsson (1992) presents evidence that the order XP–V–reflexive object pronoun–subject is possible in some varieties of Swedish. In that case, we would say that the object pronoun does follow the verb to C° in these varieties. I will leave this potentially important fact aside in what follows.

2. However, Holmberg’s analysis runs into problems in Faroese, as Vikner (1994) points out, since here NPs have morphological case, as in Icelandic, but object shift is limited to pronouns, as in MSc (see Barnes 1992: 28):

(i) a.  Jógvan keypti ikki bókina.
        J. bought not the-book
    b. *Jógvan keypti bókina ikki.
        J. bought the-book not
        ‘J. didn’t buy the book.’

(ii) a. *Jógvan keypti ikki hana.
         J. bought not it
     b.  Jógvan keypti hana ikki.
         J. bought it not
         ‘J. didn’t buy it.’


Middle English poses the converse problem for Holmberg (1986): non-pronominal NPs carry no morphological case-marking, yet they nevertheless appear able to undergo object shift. We will return briefly to ME below (see note 5).

In section 3, we will make an alternative proposal as to what triggers pronoun movement. This proposal can handle both the Faroese and the ME data.

A further relevant point is that full NPs may undergo object shift in Icelandic, while pronouns must. See section 3.

3. The minimal domain of a head H is the smallest set of nodes such that its members dominate all nodes the categories in the domain of H dominate except those that contain H. The domain of H is the set of nodes contained in the maximal projection of H distinct from and not containing H. These definitions extend straightforwardly to head-chains.

4. In light of the recent work stemming from Pollock (1989), this assumption is obviously too simplistic. In fact, our remarks on do-support below will suggest a partial refinement. This is, of course, not the place to provide a full analysis of negation in ENE.

5. In the text, we do not consider the possibility of ‘Icelandic-style’ object shift of non-pronominal DPs. It does not seem that this was possible in ENE, at least. A simple count of the relative positions of objects and ‘not’ in Spevack’s (1970) Shakespeare Concordance for eight plays revealed no examples at all of object shift of a non-pronominal DP. Conversely, of a total of 93 instances of ‘conservative’ negation with a pronominal object, 78 featured object shift and 15 had no object shift (several of those without object shift had clearly emphatic objects, and two had reflexive X-self objects). Of a total of 23 cases of ‘modern’ negation involving do-insertion, only one had object shift, the other 22 having the object in the modern position. Finally, neither of the two cases of ‘Scandinavian’ negation had object shift.
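The contrast reported in note 5 is easier to see as proportions. The short Python sketch below only recomputes them from the counts given in the note; the dictionary layout and the helper name `shift_rate` are illustrative choices, not part of the original count.

```python
# Counts reported in note 5 (pronominal objects, eight Shakespeare plays).
COUNTS = {
    "conservative": {"shifted": 78, "unshifted": 15},
    "modern (do-insertion)": {"shifted": 1, "unshifted": 22},
    "Scandinavian": {"shifted": 0, "unshifted": 2},
}


def shift_rate(cell):
    """Return (total cases, proportion with object shift) for one negation type."""
    total = cell["shifted"] + cell["unshifted"]
    return total, cell["shifted"] / total


if __name__ == "__main__":
    for neg_type, cell in COUNTS.items():
        total, rate = shift_rate(cell)
        print(f"{neg_type:<22} n = {total:>2}  object shift: {rate:.1%}")
```

Run as is, this gives about 83.9% object shift with ‘conservative’ negation (n = 93) against about 4.3% with do-insertion (n = 23). The ‘Scandinavian’ figure rests on only two tokens, so it is not informative by itself.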
In light of Vikner’s (1994) suggestion that object shift of full DPs is related to overt V-to-AgrS movement (see note 2), these facts may be problematic, since the ‘conservative’ form of ENE negation clearly involved overt V-to-AgrS movement.

Object shift of full DPs appears to have been possible in Middle English, as the following example shows:

(i) Triacle schal be leide to . . . forto þe posteme breke
    Treacle should be laid on . . . to the boil break
    (ca 1398, Trevisa)

However, it is extremely difficult to say whether such cases involve scrambling or object shift. We leave this question open for future research.

6. It is difficult to account for the ‘last-resort’ nature of do-insertion here without invoking the idea that lexical insertion for feature-checking is more costly than movement (see Chomsky 1991). It is altogether unclear to me why this should be, however.

There is also the diachronic question of the 16th-century use of do in positive declaratives, i.e. the non-last-resort situations. See Roberts (1993a: 3.2) and, for an account using the mechanisms in Chomsky (1993), Watanabe (1993).

7. This is not the only argument that has been given in favour of a head-movement approach, of course. Jonas and Bobaljik’s (1993) motivation is theoretical: they establish a correlation between transitive expletive constructions (TECs, e.g. There ate a man an apple) and nominal object shift. This correlation can be explained in terms of the idea that Spec-TP must be a possible site for subject raising when the object is shifted to Spec-AgrOP; likewise, Spec-TP is the position of the argumental subject in TECs. Hence the availability of Spec-TP underlies both properties. Since MSc has pronoun object shift only, and no TECs, our approach to pronoun object shift threatens to undermine Jonas and Bobaljik’s explanation for the correlation between nominal object shift and TECs.

In fact, there are independent grounds for rejecting Jonas and Bobaljik’s approach. If we follow Kayne (1993) and assume that Specifiers are adjoined positions, and if we continue to adopt the definition of minimal domain in Chomsky (1993: 12), then we create the potential for AgrO-to-T movement to license object shift to Spec-TP in the following configuration (because, according to Chomsky, domains are defined in terms of containment, and adjoined categories are contained in, but not dominated by, the category they adjoin to):

(i) [AgrSP AgrS [TP T [AgrOP Subj AgrO [ t V Obj . . .

The minimal domain of the chain formed by AgrO-to-T movement now contains Spec-TP, Spec-AgrOP and Spec-VP. We can prevent object movement to Spec-TP if we assume that Spec-TP is either absent (Chomsky 1993) or an A′-position (Rizzi 1990), and that domain extension by head-movement only ever creates one further potential landing site for movement, as pointed out by Jonas and Bobaljik. This approach is adopted in Roberts (1993) in the account of restructuring and clitic-climbing given there. See also Kayne (1994) for conceptual arguments in favour of treating Specifiers as adjoined categories.

Object shift is still possible where T raises to AgrS, forming a chain with minimal domain {Spec-AgrSP, Spec-TP, Spec-AgrOP}, all positions accessible to the subject (but not to the object). To the extent that T-to-AgrS movement is reflected by overt V-movement (something of an open question at present), the tie-in between verb movement and object shift may be made still tighter: object shift would depend on V-movement to AgrS. Note that the MSc and ENE facts are compatible with this more stringent characterisation.

In these terms, one is led to formulate an account of TECs which posits some Agr-recursion at the AgrS level. One can regard Agr-recursion as substitution movement of AgrS (given Chomsky’s (1993) domain-extension requirement on transformations, any substitution movement of a head will be indistinguishable from copying of that head). If AgrS raises only where V raises, then one can tie TECs to systematic overt V-raising to AgrS. This is a correct result for North Germanic, in that it distinguishes Icelandic from Mainland Scandinavian. This might be the beginning of an alternative to Jonas and Bobaljik’s generalisation, but this is not the place to develop it further.

8. Both of these objections could be avoided by postulating that D left-adjoins to some functional head while V moves to some higher position, giving a derived structure like (i):

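(i) [Tree diagram not reproduced: D left-adjoined to a functional head F, with V in some higher position]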
But there is no independent motivation for this kind of structure. Contrast the evidence for V-movement in Romance enclisis given in Kayne (1991).

References

Barnes, Michael: 1992, ‘Faroese Syntax – Achievements, Goals, Problems’, in Jonna Louis-Jensen and Jóhan Hendrik W. Poulsen (eds.), The Nordic Languages and Modern Linguistics 7, Føroya Fróðskaparfelag, Tórshavn, pp. 17–37.
Chomsky, Noam: 1991, ‘Some Notes on Economy of Derivations and Representations’, in Robert Freidin (ed.), Principles and Parameters in Comparative Grammar, MIT Press, Cambridge, MA, pp. 417–454.
Chomsky, Noam: 1993, ‘A Minimalist Program for Linguistic Theory’, in Kenneth Hale and Samuel Jay Keyser (eds.), The View From Building 20, MIT Press, Cambridge, MA, pp. 1–52.
Corver, Norbert and Henk van Riemsdijk: 1994, Scrambling, Foris/de Gruyter, Berlin.
Denison, David: 1985, ‘The Origins of Periphrastic Do: Ellegård and Visser Reconsidered’, in R. Eaton et al. (eds.), Papers from the 4th International Conference on English Historical Linguistics, Amsterdam, April 10–13, 1985, John Benjamins, Amsterdam, pp. 45–60.
Denison, David: 1993, English Historical Syntax: Verbal Constructions, Longman, London.
Deprez, Vivienne: 1990, ‘Parameters of Object Movement’, talk given at the Scrambling Workshop, University of Tilburg, October 1990.
den Dikken, Marcel: 1990, ‘Particles and the Dative Alternation’, in Proceedings of the Second Leiden Conference for Junior Linguists, pp. 71–86.
Ellegård, Alvar: 1953, The Auxiliary do: The Establishment and Regulation of its Use in English, edited by Fred Behre, Gothenburg Studies in English, Almqvist and Wiksell, Stockholm.
Emonds, Joe: 1976, A Transformational Approach to English Syntax: Root, Structure-Preserving and Local Transformations, Academic Press, New York.
Emonds, Joe: 1978, ‘The Verbal Complex V′–V in French’, Linguistic Inquiry 9, 151–175.
Haegeman, Liliane: 1993, ‘The Morphology and Distribution of Object Clitics in West Flemish’, Studia Linguistica 47, 57–94.
Holmberg, Anders: 1986, Word Order and Syntactic Features in the Scandinavian Languages, Dept of General Linguistics, University of Stockholm.
Holmberg, Anders: 1989, ‘What is Wrong with SOV Word Order in an SVO Language?’, ms. University of Uppsala.
Holmberg, Anders: 1991, ‘The Distribution of Scandinavian Weak Pronouns’, in Henk van Riemsdijk and Luigi Rizzi (eds.), Clitics and Their Hosts, EUROTYP Working Papers 8.1, European Science Foundation, Strasbourg, pp. 155–174.
Jespersen, Otto: 1909–49, A Modern English Grammar on Historical Principles, George Allen & Unwin, London.
Johnson, Kyle: 1991, ‘Object Positions’, Natural Language and Linguistic Theory 9, 577–636.
Jonas, Dianne and Jonathan Bobaljik: 1993, ‘Specs for Subjects’, in Jonathan Bobaljik and Colin Phillips (eds.), Papers on Case and Agreement I, MIT Working Papers in Linguistics 18, 59–98.
Josefsson, Gunlög: 1992, ‘Object Shift and Weak Pronominals in Swedish’, Working Papers in Scandinavian Syntax 49, 59–94.
Kayne, Richard: 1985, ‘Principles of Particle Constructions’, in Jacqueline Guéron, Hans Georg Obenauer and Jean-Yves Pollock (eds.), Levels of Syntactic Representation, Foris, Dordrecht, pp. 101–140.
Kayne, Richard: 1989, ‘Facets of Romance Past Participle Agreement’, in Paola Benincà (ed.), Dialect Variation and the Theory of Grammar, Foris, Dordrecht, pp. 85–104.
Kayne, Richard: 1991, ‘Romance Clitics, Verb Movement and PRO’, Linguistic Inquiry 22, 648–686.
Kayne, Richard: 1994, The Antisymmetry of Syntax, MIT Press, Cambridge, MA.
van Kemenade, Ans: 1987, Syntactic Case and Morphological Case in the History of English, Foris, Dordrecht.
Kroch, Anthony: 1989, ‘Reflexes of Grammar in Patterns of Language Change’, Language Variation and Change 1, 199–244.
Lightfoot, David: 1979, Principles of Diachronic Syntax, Cambridge University Press, Cambridge.
Lightfoot, David: 1994, ‘Why UG Needs a Learning Theory: Triggering Verb Movement’, in Adrian Battye and Ian Roberts (eds.), Clause Structure and Language Change, Oxford University Press, New York/Oxford, pp. 31–52.
Pollock, Jean-Yves: 1989, ‘Verb Movement, UG and the Structure of IP’, Linguistic Inquiry 20, 365–424.
Rizzi, Luigi: 1990, Relativized Minimality, MIT Press, Cambridge, MA.
Roberts, Ian: 1983, ‘The Syntax of English Modals’, in Dan Flickinger et al. (eds.), Proceedings of the Second West Coast Conference on Formal Linguistics, Stanford, pp. 227–246.
Roberts, Ian: 1985, ‘Agreement Parameters and the Development of English Modal Auxiliaries’, Natural Language and Linguistic Theory 3, 21–58 [this volume, Chapter 1].
Roberts, Ian: 1991, ‘Excorporation and Minimality’, Linguistic Inquiry 22, 209–218 [this volume, Chapter 10].
Roberts, Ian: 1993a, Verbs and Diachronic Syntax, Kluwer, Dordrecht.
Roberts, Ian: 1993b, ‘Restructuring, Pronoun Movement and Head Movement in Old French’, ms. University of Wales.
Spevack, Marvin: 1970, A Complete and Systematic Concordance to the Works of Shakespeare, Volume V: Hildings–Severing, Olms, Hildesheim.
Sportiche, Dominique: 1996, ‘Clitic Constructions’, in Johan Rooryck and Laurie Zaring (eds.), Phrase Structure and the Lexicon, Kluwer, Dordrecht, pp. 213–276.
Vikner, Sten: 1988, ‘Modals in Danish and Event Expressions’, Working Papers in Scandinavian Syntax 39.
Vikner, Sten: 1989, ‘Object Shift and Double Objects in Danish’, Working Papers in Scandinavian Syntax 44, 141–155.
Vikner, Sten: 1994, ‘Scandinavian Object Shift and West Germanic Scrambling’, in Norbert Corver and Henk van Riemsdijk (eds.), Scrambling, Foris/de Gruyter, Berlin, pp. 487–517.
Vikner, Sten: 1995, Verb Movement and Expletive Subjects in the Germanic Languages, Oxford University Press, Oxford/New York.
Vikner, Sten and Rex Sprouse: 1988, ‘Have/Be Selection as an A-Chain Membership Requirement’, Working Papers in Scandinavian Syntax 38.
Watanabe, Akira: 1993, ‘The Role of Triggers in the Extended Split INFL Hypothesis: Unlearnable Parameter Setting’, ms. University of Tokyo.
Zwicky, Arnold and Geoff Pullum: 1983, ‘Cliticization vs. Inflection: English n’t’, Language 59, 502–513.


Directionality and Word Order Change in the History of English

Ian Roberts

1. Introduction*

A standard view of the historical development of English word order involves the idea that Old English (OE) was, like all other attested West Germanic varieties (with the possible exception of Yiddish; see Santorini 1992), head-final at least in VP and IP (see Stockwell 1977; Canale 1978; Lightfoot 1979, 1991; Bean 1983; van Kemenade 1987; Pintzuk 1991; Traugott 1992; Denison 1993; although not all of these authors assume an IP). In this respect, OE (and West Germanic generally) differs from Modern English (NE) (and North Germanic generally) in the value of the directionality parameter in (1), for at least some values of Y:

(1) Directionality parameter:
    Y′ → Y XP
    Y′ → XP Y

According to the standard view, at some point in the Middle English (ME) period, probably in the twelfth century, there was a change in the value of the directionality parameter (for the relevant categories). As a result of this change, the language became uniformly head-initial. This means that OV and V–Aux orders, formerly abundantly attested (see section 2), are no longer found.

Recently, Kayne (1994) has argued that UG cannot contain a parameter like (1). Kayne proposes a theory of phrase structure which derives many of the properties of X′-theory from the central idea that asymmetric c-command relations among non-terminals are intrinsically connected to linear order among terminals. We can phrase the central constraint as follows:1

(2) If A, a non-terminal, asymmetrically c-commands B, a non-terminal, then all terminals a dominated by A precede all terminals b dominated by B.

To see how (2) works in the case of head-complement order, consider the VP in (3):

(3) [VP [V see] [DP [D him]]]

Here V asymmetrically c-commands D (the definition of c-command is ‘X c-commands Y iff X does not contain Y and every category dominating X dominates Y’). Hence, by (2), see must precede him. This conclusion would follow even if we chose to draw the phrase marker the other way around. Thus there can be no parametric variation as regards head-complement order; (2) (or whatever it derives from, cf. note 3) requires that heads precede their complements. Hence, all languages are underlyingly head-initial.

In a system like Kayne’s, superficial OV patterns, or, more generally, head-final typologies, must be derived by leftward-movement processes. Chomsky (1994) adopts a similar position. Zwart (1993) has shown that this approach yields positive results in the analysis of Dutch; in particular, one can dispense with the idea that Dutch is a mixed-branching language, with some categories (e.g. CP) right-branching and others (e.g. VP) left-branching. Zwart’s proposal is that ‘The SVO order of Dutch main clauses is derived from an “underlying” SOV order, visible in embedded clauses. However, this SOV order is derived from an underlying SVO order in the Dutch VP, still visible when the object is not a noun phrase but a clause’ (p. 29). The proposal accounts for the following generalizations about Dutch: (i) ‘top projections’, e.g. C and D, are always head-initial; (ii) ‘when a head allows its complement to appear on one side only, the complement always follows the head’ (this is true of the complements of N, for example); (iii) ‘when the head allows its complement to appear on both sides, the head and the complement are never adjacent when the complement precedes the head’ (this is true of the complements of A, P and V). Zwart captures these generalizations by assuming that all categories are head-initial, and that some complements, in particular direct objects of V, move leftward during the derivation.
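The reasoning from (2) and (3) to ‘see precedes him’ can be checked mechanically. The Python sketch below is a simplification, assuming that ‘contain’ can be read as proper dominance (Kayne’s distinction between categories and segments is ignored) and that terminals are plain strings; the class and function names are our own, not Kayne’s. It collects every pair of non-terminals standing in asymmetric c-command and maps each pair onto precedence among the terminals they dominate, for (3) and for the same tree drawn the other way around.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass(eq=False)
class Node:
    """A non-terminal category; children are Nodes or terminal strings."""
    label: str
    children: list = field(default_factory=list)


def nonterminals(root):
    """All non-terminal nodes in the tree rooted at root."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(c for c in node.children if isinstance(c, Node))
    return found


def dominates(a, b):
    """Proper (irreflexive) dominance of non-terminal b by non-terminal a."""
    return any(c is b or (isinstance(c, Node) and dominates(c, b)) for c in a.children)


def terminals(node):
    """Terminal strings dominated by a non-terminal."""
    out = []
    for c in node.children:
        out.extend(terminals(c) if isinstance(c, Node) else [c])
    return out


def c_commands(x, y, nodes):
    """X c-commands Y iff X does not contain Y and every category dominating X dominates Y."""
    if x is y or dominates(x, y):
        return False
    return all(dominates(z, y) for z in nodes if dominates(z, x))


def lca_precedence(root):
    """Precedence pairs (a, b) among terminals induced by asymmetric c-command, as in (2)."""
    nodes = nonterminals(root)
    pairs = set()
    for a, b in permutations(nodes, 2):
        if c_commands(a, b, nodes) and not c_commands(b, a, nodes):
            pairs.update((s, t) for s in terminals(a) for t in terminals(b))
    return pairs


vp = Node("VP", [Node("V", ["see"]), Node("DP", [Node("D", ["him"])])])
vp_flipped = Node("VP", [Node("DP", [Node("D", ["him"])]), Node("V", ["see"])])
print(lca_precedence(vp))          # {('see', 'him')}
print(lca_precedence(vp_flipped))  # {('see', 'him')}
```

V and DP c-command each other symmetrically and so contribute nothing; the only asymmetric pair is (V, D), which is why the order comes out the same however the tree is drawn.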
In addition to accounting for these generalizations, the proposal allows a simple treatment of verb raising (in fact, as the absence of overt verb movement). Various other proposals made by Zwart will be discussed below and adapted to OE. The purpose of this chapter is to explore the consequences of what we might call the Zwart/Kayne view for OE and, in particular, for the word-order changes that took place in ME. I will argue that the Zwart/Kayne view is at least as good as the more standard views as regards the synchronic analysis of OE, and that it permits a more natural and revealing account of the ME changes.

One advantage can be identified straightaway. On the ‘standard’ view, OE is not a uniformly head-final language. For example, there is no doubt that CP and DP are both head-initial projections, as in Dutch. The projections that are usually regarded as head-final are IP and VP (we leave aside AP, NP and PP; these are usually regarded as either head-initial or mixed, and so they do not affect the point at hand). However, there is a fair amount of evidence for a medial IP of some kind (cf. in particular Pintzuk 1991), some of which I review below. If we split IP into various functional projections, then we may be able to claim that some of them are head-initial and some head-final. A claim like this is made in Cardinaletti & Roberts (2002 [this volume, Chapter 12]), for example. However, this kind of analysis shares with the standard analysis the consequence that at some point in the functional system that makes up the clause there is a switch from head-initial to head-final patterning. Note also that this is the most restrictive view compatible with the mixed typology. Such mixed typologies look very suspect: clearly it would be better to opt for a uniform direction in head-complement ordering. In that case, the only seriously workable possibility is to assume that OE is uniformly head-initial. More generally, the view I advocate is:

(4) Principle: Y′ → Y XP
    Parameter: morphosyntactic features causing leftward movement from VP.

What changed in ME could not, ex hypothesi, have been the base expansion of V′ or I′.2 Instead, leftward movement possibilities are lost in ME. In this way, the word-order change becomes one of a type that is already very familiar: the loss of a movement dependency. It thus falls into the same general category of changes as the loss of V-to-I movement in early Modern English (cf. Roberts 1985 [this volume, Chapter 1], 1993a: ch. 3; Rohrbacher 1994a) or the loss of V2 in English (van Kemenade 1987; Platzack 1995) or French (Adams 1987; Roberts 1993a; Clark & Roberts 1993 [this volume, Chapter 2]), etc. More precisely, in each case I take it that strong features of the relevant kind (I’s V-feature in the case of V-to-I movement; the relevant feature of C in the case of V2) are lost, leading to the impossibility of movement thanks to UG-internal economy conditions (the Procrastinate Principle). In the case at hand, AgrO loses strong N-features, and so DP-movement to Spec, AgrOP is lost. Thus a change in the base expansion of X′ is reduced to the more familiar loss of movement dependencies, itself caused by changes in abstract features of functional heads.
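The proposal in (4) can be illustrated with a toy derivation. The Python sketch below makes several simplifying assumptions: a single AgrOP dominates a head-initial VP, the strength of AgrO’s N-feature is reduced to a boolean, traces and the empty AgrO head are spelled out as nothing, and the words of (3) stand in for any verb and object. The names `derive` and `spell_out` are hypothetical. The point it shows is that one head-initial base yields OV order when the feature is strong (the OE setting on this view) and VO order once the feature is lost (the ME change).

```python
def derive(verb, obj, agro_strong_n):
    """Build a labelled bracketing [AgrOP Spec [AgrO' AgrO [VP V Obj]]].

    Every head precedes its complement, as in the Principle in (4).
    None marks silent material: an empty Spec, the AgrO head, or a trace.
    """
    if agro_strong_n:
        # Strong N-feature on AgrO: the object raises overtly to Spec,AgrOP.
        return ("AgrOP", obj, ("AgrO'", None, ("VP", verb, None)))
    # Weak feature: Procrastinate keeps the object in situ in overt syntax.
    return ("AgrOP", None, ("AgrO'", None, ("VP", verb, obj)))


def spell_out(node):
    """Read off the overt words left to right, skipping labels and silent slots."""
    if node is None:
        return []
    if isinstance(node, str):
        return [node]
    _label, *daughters = node
    return [word for d in daughters for word in spell_out(d)]


for strong in (True, False):
    print(strong, " ".join(spell_out(derive("see", "him", strong))))
# True him see
# False see him
```

Only the feature value differs between the two runs. The base expansion of V′ stays the same throughout, which is the sense in which (4) recasts the ME word-order change as the loss of a movement dependency.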
This kind of change is well attested, and, if Clark & Roberts are right, can be understood in terms of the idea that the language-learning algorithm contains a simplicity metric which values the absence of overt movement, and therefore weak features of functional heads, more highly than overt movement, i.e. strong features of functional heads. Hence language acquirers will tend to assign representations without overt movement to parts of the input which involve movement in the adult grammar. I take this treatment of word-order change to be a positive move. The chapter is organized as follows. In section 2, I review what I will call the ‘standard’ GB view of OE word order and describe the ME changes. The ‘standard’ view is a distillation of the work of many researchers, most notably van Kemenade (1987), although it probably does not correspond precisely to any one analysis that has been put forward. In section 3, I present an alternative view, arguing that OE can usefully be seen as a VO language. Section 4 deals with the word-order change in ME.


2. ‘Standard’ GB Accounts of OE Word Order

2.1 Introduction

In this section, we first discuss the evidence for head-final order in OE IPs and VPs. Then we discuss the various operations that must be postulated in order to account for the attested facts if this order is assumed. These fall into two groups: three rightward movement operations (verb raising, verb-projection raising and extraposition) and two leftward movement rules (scrambling and clitic placement). We will discuss each of these operations in turn. I should emphasize at the outset that throughout this section we are reporting a point of view that will be replaced by an alternative later in the chapter.

There is a consensus among scholars who have worked on OE syntax that the predominant word order in the clause is verb-final in subordinate clauses and verb-second in main clauses (see Stockwell 1977; Canale 1978; Lightfoot 1979; Bean 1983; Denison 1993; van Kemenade 1987; Traugott 1992). The situation is thus very similar to that in Modern Dutch or German. The following examples illustrate OV order in subordinate clauses:

(5) a. . . . þæt ic þas boc of Ledenum gereorde to Engliscre spræce awende
        that I this book from Latin language to English tongue translate
        ‘. . . that I translate this book from the Latin language to the English tongue’
        (AHTh, I, pref, 6; van Kemenade 1987: 16)
    b. . . . þæt he his stefne up ahof
        that he his voice up raised
        ‘. . . that he raised up his voice’
        (Bede 154.28; Pintzuk 1991: 77)
    c. . . . forþon of Breotone nædran on scippe lædde wæron
        because from Britain adders on ships brought were
        ‘. . . because vipers were brought on ships from Britain’
        (Bede 30.1–2; Pintzuk 1991: 117)

The example in (5b) shows that verb-particle complexes can appear in the order Particle–V, again as is typical in Modern Dutch and German subordinate clauses (see Koster 1975).
The example in (5c) shows that auxiliaries can follow participles in subordinate clauses (auxiliaries also follow infinitives in embedded clauses where there is no verb (-projection) raising; see below). This is another trait shared with Modern Dutch and German (although verb raising interferes with this pattern in Dutch; see below), and is also a typological feature of ‘OV’ languages (see Greenberg 1963; Hawkins 1983). The usual assumption is that verb-second order is derived by an operation which fronts the verb from its final position to the second position (although the precise nature of the second position remains a matter of debate on which I will take no position here); see Koster (1975) and den Besten (1983) on Dutch, and van Kemenade (1987) for a clear demonstration that the arguments applied to Dutch carry over to OE. Hence OV order is taken to be the underlying order in all OE clauses. Aside from verb second, various factors disguise the basic OV order. In the following subsections, I illustrate these, summarizing the standard analysis in each case.

2.2 Rightward Movement Rules

2.2.1 Verb Raising

This phenomenon is well known from studies of Standard Dutch (see inter alia Evers 1975; den Besten & Edmondson 1983; Rutten 1991). As we mentioned above, finite auxiliaries usually follow non-finite participles or infinitives in languages conforming to the OV typology; (5c), where a finite auxiliary follows a participle in an embedded clause, illustrates this for OE. This is also generally the case in Standard German, for example. However, in Dutch a large class of finite auxiliaries and auxiliary-like verbs must or may precede the participle or infinitive. The following contrast (from van Kemenade 1987: 56) illustrates this:

(6) a. . . . dat Jan het boekje wilde hebben (Dutch)
        that John the booklet wanted to have
    b. . . . daß der Johann das Büchlein haben wollte (German)
        that the John the booklet to have wanted
        ‘. . . that John wanted to have the booklet’

The precise details of which Dutch verbs allow this, and under which conditions, are complex, and are treated in detail in the references just given. The essential point is that, alongside the order expected under an OV typology (non-finite verb followed by finite verb or auxiliary), Dutch also allows the order in which the finite verb or auxiliary precedes the non-finite verb. Verb raising is found in OE, as the examples in (7) show:

(7) a. þe æfre on gefeohte his hande wolde afylan
        who ever in battle his hands would defile
        ‘who would ever defile his hands in battle’
        (ÆLS 25.858; Pintzuk 1991: 102)
    b. & from Offan kyninge Hygebryht wæs gecoren
        and by King O. H. was chosen
        ‘and H. was chosen by King O.’
        (ChronA 52.8–54.1 (785); Pintzuk 1991: 102)

Assuming an OV, I-final structure for OE, (7a) would involve verb raising as in (8):

(8) þe æfre on gefeohte [VP his hande tᵢ] wolde afylanᵢ

(8) shows that the infinitive afylan moves to the right. Most analyses (e.g. Rutten 1991) assume that it adjoins to the right of I, the position containing the inflected verb.

2.2.2 Verb-Projection Raising

This phenomenon is the counterpart of verb raising, with the extra complication that some further constituent, typically a complement of the non-finite verb, appears to the right of the finite auxiliary and (consistent with OV typology) precedes the non-finite verb. Although not attested in Standard Dutch or German, this phenomenon is found in many Continental West Germanic dialects, e.g. West Flemish (Haegeman & van Riemsdijk 1986; Haegeman 1992) and varieties of Swiss German (Haegeman & van Riemsdijk 1986). It is also found in OE, as the examples in (9) show:

(9) a. hwær ænegu þeod at oþerre mehte [frið begietan]
        where any people from other might peace obtain
        ‘where any people might obtain peace from another’
        (Or 31.14–15; Pintzuk 1991: 113)
    b. þæt nan man ne mihte [ða meniu geniman]
        that no man NEG could the multitude count
        ‘that no man could count the multitude’
        (ÆLS 25.418; Pintzuk 1991: 33)

Here the bracketing indicates the constituent that, on an OV analysis, must be assumed to move to the right of the finite auxiliary. This constituent is often thought of as a VP or other V-projection, whence the term ‘verb-projection raising’. Although the question of the category of the post-verbal constituent is clearly distinct from that of its underlying position, we will suggest below that these constituents represent something larger than VP, and may in fact be clausal.
In any case, proponents of an OV analysis of OE word order assume that this constituent moves from a position immediately preceding the inflected verb to the superficial position seen in (9).

2.2.3 Extraposition

If we assume an underlying OV order for OE, we must assume that this language, once again like Dutch and German, allows rightward extraposition of PPs and CPs. Unlike these languages, however, it also allows apparent rightward extraposition of DPs (cf. Stockwell’s (1977) ‘exbraciation’):

(10) a. drihten wæs acenned [PP on þære byrig]
         the lord was born in the city
         ‘The Lord was born in the city’
         (ÆCHom i.34.9; Pintzuk 1991: 69)
     b. þæt turonisce folc wilnigende wæs [CP þæt Martinus wære to biscope gehalgod to heora burh-scire]
         the Tours people desiring was that M. were as bishop consecrated of their city
         ‘the people of Tours wanted M. to be consecrated as bishop of their city’
         (ÆLS 31.254–6; Pintzuk 1991: 69)
     c. . . . þæt ænig mon atellan mæge [DP ealne þone demm]
         that any man relate can all the misery
         ‘. . . that any man can relate all the misery’
         (Or 52.6–7; Pintzuk 1991: 36)

In each of these cases, the bracketed constituent is assumed to have moved from a VP-internal position to the left of the main predicate. We could schematize this operation as follows:

(11) [VP tᵢ V] XPᵢ (where XP = CP, PP and, possibly, DP)

Clearly, the existence of examples like (10c) is superficially incompatible with the OV typology, although the facts can be accounted for by postulating an optional DP-extraposition rule. Pintzuk & Kroch (1989) show, on the basis of a metrical analysis of Beowulf, that whenever a DP is found in final position in that text, after the finite verb in a subordinate clause, it receives stress. They interpret this as indicating that these orders result from the application of an operation like Focus NP Shift (cf. Ross 1967; Stowell 1981). They also show that final PPs and CPs are not always stressed in this way; hence these elements appear finally due to the operation of an extraposition rule just like that of Dutch or German (cf. Koster 1975). In this way, the basic ‘OV’ typology of OE can be maintained; the sole difference between OE and Modern Dutch or German is the existence in OE of Focus NP Shift and the absence of a rule of this type in Dutch and German.
However, it is not clear whether this result can be maintained for OE in general, beyond Beowulf. If it cannot, we must postulate optional DP-extraposition for OE prose texts. OV analyses of OE word order have not properly solved this problem. Verb raising, verb-projection raising and extraposition all involve rightward movements of various kinds. OE also had various leftward movement operations that obscured the basic OV typology. We now turn to these.
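On the ‘standard’ view just summarized, the three rightward operations can be mimicked as string rewrites on an assumed underlying OV order. The Python sketch below is only a toy. The helper `move_right` is hypothetical, constituents are treated as contiguous runs of words, hierarchical structure and landing sites are ignored, and the underlying orders are reconstructions under the OV assumption rather than attested strings. It maps those underlying orders onto the surface orders of (7a), (9b) and (10c).

```python
def move_right(words, constituent, after=None):
    """Move a contiguous constituent rightward.

    The constituent is reinserted immediately after the word `after`
    (e.g. the finite auxiliary) or, if `after` is None, clause-finally.
    """
    n = len(constituent)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == constituent:
            break
    else:
        raise ValueError(f"constituent {constituent} not found")
    rest = words[:i] + words[i + n:]
    j = len(rest) if after is None else rest.index(after) + 1
    return rest[:j] + constituent + rest[j:]


# Verb raising, cf. (7a)/(8): the infinitive moves to the right of the auxiliary.
vr = move_right("þe æfre on gefeohte his hande afylan wolde".split(), ["afylan"], "wolde")
# Verb-projection raising, cf. (9b): a larger V-projection moves instead.
vpr = move_right("þæt nan man ða meniu geniman ne mihte".split(),
                 ["ða", "meniu", "geniman"], "mihte")
# DP-extraposition, cf. (10c)/(11): the object surfaces clause-finally.
ex = move_right("þæt ænig mon ealne þone demm atellan mæge".split(),
                ["ealne", "þone", "demm"])

for result in (vr, vpr, ex):
    print(" ".join(result))
# þe æfre on gefeohte his hande wolde afylan
# þæt nan man ne mihte ða meniu geniman
# þæt ænig mon atellan mæge ealne þone demm
```

The sketch also shows how much machinery the OV analysis needs: each surface order requires its own rightward rule on top of the head-final base.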

Directionality and Word Order Change  111 2.3 Leftward Movement Rules 2.3.1 Scrambling Originally discussed in Ross (1967), scrambling is an operation which, in West Germanic languages, moves definite DPs leftwards within the clause, to some position outside VP. In Dutch and German, a diagnostic for scrambling is the order of a definite complement with respect to clausal negation, since the marker of clausal negation is taken to be at the left margin of VP (cf. Bennis & Hoekstra 1986; den Besten & Webelhuth 1990; Deprez 1994; Fanselow 1990; Lee & Santorini 1994; Webelhuth 1991). The following examples illustrate scrambling in German; in both cases, the fact that the direct object das Buch precedes nicht is taken to show that it has left VP: (12) Gestern  kauftev Peter. . . yesterday bought Peter

a. . . . das Buchi ohne Zweifel nicht [VP ti tV] the book without doubt not b. . . . ohne Zweifel das Buch nicht [VP ti tV] without doubt  the  book not ‘Yesterday John without doubt didn’t buy the book’ (Vikner1995: 5) Assuming the adverb ohne Zweifel to occupy a constant position (an assumption that is denied by Zwart 1993, for example), the object is clearly moved further in (12b) than in (12a). There is little consensus in the literature on West Germanic either as to the nature of the scrambling operation as A-movement or Ā-movement or as to the landing sites of the operation. For our present expository purposes, this is not important; what is crucial is that the different positions occupied by the direct object in (12), along with the possibility of it following negation (i.e. remaining in the putatively VP-internal position) show that it can be moved to the left out of VP. In OE, clausal negation is signalled by pre-verbal ne, hence this diagnostic for scrambling is not available. Nevertheless, the following is a plausible instance of scrambling since the complement precedes an adjunct, and hence, given standard assumptions about the base positions of complements and adjuncts, must have moved leftward out of VP as indicated: (13) ne  mihton hi  nænige fultumi æt him [VP ti begitan] not could  they any help from him  get ‘They couldn’t get any help from him’ (Bede 48.9–10; Pintzuk 1991: 1.44) We can also see the effects of scrambling from interactions with verb-projection raising. On the standard view of OE word order, the following example

112  Ian Roberts

involves scrambling of the direct object combined with rightward movement of the constituent containing the trace of the scrambled object:

(14) þæt he þæt godes husi wolde [myd fyre ti forbærnan]
     that he the god’s house wanted with fire to-burn
     ‘that he wanted to burn the god’s house with fire’ (ÆLS 25.613–14; Pintzuk 1991: 39)

The above discussion does not take into account the possibility of movement to Spec,AgrOP. Clearly, ‘short’ scrambling of the type in (12b) might be analysable in this way (if NegP is taken to be contained in AgrOP). This possibility, which does not figure in ‘standard’ accounts of OE word order, will be considered in more detail below. If such an operation can be motivated, it adds a third case of leftward movement to the inventory of OE leftward movement rules. (Alternatively, if it could be shown that there is only one position for leftward-moved complements outside VP, it would replace scrambling on the view I am outlining here; however, we will see below that there are clearly two such positions.) For our present exposition, we do not distinguish movement to Spec,AgrOP from scrambling.

2.3.2 Cliticization

OE had a special position, or class of positions, for clitics. These elements preceded the main verb in ordinary matrix declaratives; followed the verb in matrix interrogatives, negative clauses and clauses beginning with a certain class of adverbs; and followed the complementizer or the subject in embedded clauses. These positions are illustrated by the following examples:

(15) a. God him worhte þa reaf of fellum (clitic–verb)
        God them made then garments of skin
        ‘God then made them garments of skin’ (AHTh, I, 18; van Kemenade 1987: 114)
     b. Hwæt sægest þu, yrþlincg? (verb–clitic)
        ‘What sayest thou, ploughman?’ (AColl 22; van Kemenade 1987: 138)
     c. . . . þæt him his fiend wæren æfterfylgende (clitic–subject)
        that him his enemies were following
        ‘. . . that his enemies were following him’ (Oros., 48, 12; van Kemenade 1987: 113)
     d. . . . þæt þa Deniscan him ne mehton þæs ripes forwiernan (subject–clitic)
        that the Danes them NEG could the harvest refuse
        ‘. . . that the Danes could not refuse them the harvest’ (ChronA 89.10 (896); Pintzuk 1991: 188)

It is usually assumed that the clitic moves to its special position from an underlying position within VP. The precise nature of this movement rule, and of the alternating cl–V and V–cl orders, is not clear. It is likely that West Germanic cliticization processes are connected to scrambling (see, for example, Haegeman 1993a, and below). On the relative order of verb and clitic, see van Kemenade (1987) and Tomaselli (1995). The above paragraphs outline the kinds of assumptions and analyses typically proposed for OE on the assumption that the language is head-final in VP and IP (although we should note that Pintzuk 1991 argues for a ‘double base’ system in which both IP and VP were subject to synchronic variation in OE with respect to the headedness parameter). In the next section, I will propose a head-initial analysis for OE VPs and IPs.

3. OE as a Head-Initial Language

3.1 Introduction

The purpose of this section is to show that OE can plausibly be analysed as head-initial. Our goal is quite limited: I wish to show only that a head-initial analysis does no worse than a head-final one. In fact, a number of peculiar properties of OE have to be stipulated on a head-initial analysis just as they do on a head-final analysis; it is interesting to observe that many of the same descriptive puzzles arise on both approaches, suggesting that they are not mere artifacts of a given approach. I should also note that this section is compatible with a less restrictive theory of phrase structure than Kayne’s, such as standard GB versions of X′-theory: I argue that there are no compelling empirical reasons to analyse OE as head-final. What we will see is that, since OE can be analysed as VO, it does not pose a strong empirical challenge to a thesis such as Kayne’s. To put it another way, there are a weak version and a strong version of the thesis being put forward here. The weak version states that OE is not an OV language and hence that there are fewer underlyingly OV languages than previously thought. The strong version says that there are no OV languages, and hence that OE is not an OV language. The facts do not decide between the weak and the strong version, and, in a sense, it is not crucial for the empirical claims of this chapter that Kayne’s view be accepted. Either version, however, requires that our conception of the attested word order changes in English be rethought. Moreover, if Kayne’s view is accepted, then all cases of OV-to-VO change will have to be looked at in the terms proposed here, and so our argument has a more general scope (see note 11). We begin by discussing the position of I, and then move to the position of V.

3.2 The Position of I

Zwart (1993) points out that if we adopt the checking theory of Chomsky (1993a) there is no reason to assume that any verbal functional projection,
i.e. any I-type element, is ever head-final. The basic motivation for assuming a head-final I in the West Germanic languages has been that V-to-I movement creates an inflected verb. Hence, if inflected verbs are found in final position, I must be final. But if V is inserted fully inflected and raises to I to check its morphological features, it is possible to assume that its final position corresponds not to I but to V, with the raising possibly taking place covertly. Thus the final position of inflected verbs tells us nothing about the position of I. (I will return to the question of V-movement below.) Other properties can, though, serve as diagnostics for the position of I-type elements. It has often been proposed (beginning with Kayne 1989) that clitics raise to I-type positions. It is clear that there are at least two types of ‘clitic’ elements crosslinguistically, and at least two types of positions that they raise to. On the first point, Cardinaletti (1994) distinguishes clitics from weak pronouns by two criteria: (a) weak pronouns are homophonous with strong pronouns while clitics are not, and (b) weak pronouns only optionally move to ‘special’ positions while clitics do so obligatorily. By criterion (a), OE ‘clitics’ are weak pronouns; by criterion (b), they are clitics, since they are always found in special positions (although there are some examples where this is debatable; see (19) below). While it may be correct to view OE ‘clitics’ as weak pronouns, this does not affect the argument to be made here, since the important point for our purposes is the nature of the special position that is moved into. The fact that this movement is obligatory in OE (examples like (19) notwithstanding) strongly suggests that these elements are clitics, and I will take this view in what follows.
I will also follow Cardinaletti (1994) in regarding movement of these pronouns as head-movement, although I will return to this point in section 4. Concerning the second point, clitics or weak pronouns seem to be attracted either to the inflected verb (as in Romance) or to a position after C (as in Germanic; possibly identifiable as the Wackernagel position). Rivero (1994) distinguishes ‘I-oriented’ from ‘C-oriented’ clitics. In these terms, OE clitics are clearly C-oriented, like other West Germanic clitics. However, I will suggest directly that the ‘C-oriented’ position is a kind of Agr-position; in particular, although close to C, it cannot be identified with C, at least in OE. The OE clitic alternations in main clauses illustrated in (15) strongly suggest that clitics occupy a functional position lower than C. In the order WH–V–CL (15b), V can reasonably be thought of as being in C (Rizzi 1991); in the order XP–CL–V (15a), either we must explain why the clitic selectively moves with V only in these cases (cf. Tomaselli 1995 for an attempt to do this for the comparable OHG data), or we must conclude that the verb and the clitic are in positions lower than C. Cardinaletti & Roberts (2002 [this volume, Chapter 12]), Pintzuk (1991) and Kiparsky (1995) take the latter option. The alternative of regarding the clitics as consistently moving to C (with V left-adjoined to C in (15b) and not raising to C in (15a)) cannot be maintained for embedded clauses, where clitics can follow the
subject. Taking the weaker position that clitics always move to C in main clauses would (a) mean that there is no unified target for clitic placement and (b) posit an otherwise unmotivated root–embedded distinction in clitic placement, quite separate from that affecting verb placement. I thus conclude that clitics do not always move to C and, since it is desirable to assume the minimum possible number of targets for clitic placement, that they never move to C. The alternative, directly motivated by the position of the clitic in (15d) and indirectly by its position in (15a), is that clitics occupy ‘medial’ head-initial functional projections. These projections do not seem to form part of the complementizer system, since their nature does not seem to be determined by factors to do with complementation, and so I conclude that they are part of the I-system. This seems natural to the extent that it is plausible to consider that clitics move to special positions because they are subject to special checking requirements. The features to be checked are presumably ϕ-features (since the content of these elements is exhausted by such features), and hence the checking position is presumably an Agr-type position (cf. Sportiche 1996 and section 4 for a proposal along these lines). Note that this view of putatively ‘C-oriented’ clitics reduces the distinction between ‘C-orientation’ and ‘I-orientation’ either to orientation to different parts of the I-system or, perhaps, to orientation to the same position combined with independent differences in V- and subject-placement. We thus have evidence for medial I-positions and, if we assume that inflectional affixes are attached to verbs in the lexicon, no evidence for final I-positions. Following Zwart (1993) (as the above reasoning does, modulo certain differences between Dutch and OE), we can conclude that IP, or the IP-type projections, are head-initial.
We will see further evidence that supports this conclusion below.

3.3 The Position of V

The V-raising operation is proposed in order to regularize the head-final order. If inflected verbs are assumed to have raised to I, then we must assume that the non-finite V has raised to the right of I in an example like (7a), repeated here:

(7) a. þe æfre on gefeohte his hande wolde afylan
       who ever in battle his hands would defile
       ‘who would ever defile his hands in battle’ (ÆLS 25.858; Pintzuk 1991: 10)

However, if the finite verb has not moved, we are not forced to conclude that the non-finite form is external to the VP headed by the finite verb. Hence it is possible that the non-finite verb is in a constituent that is a complement of the finite verb; in this case, that constituent is likely to be something larger than VP. Note that the position of the object is immaterial here; we must
postulate leftward movement out of VP in any case, and so we could say that the object has moved here. In that case, the structure for (7a) would be (8′) rather than (8):

(8)  þe æfre on gefeohte [VP his hande ti] wolde afylani
(8′) þe æfre on gefeohte his handei wolde [VP afylan ti]

Presumably, the object moves for case-checking purposes. It may be that the ‘verb raising triggers’ are verbs whose complements do not contain an (active) AgrO, so that movement out of them is forced (cf. note 10). It is certainly well known that this class of verbs is close to the class of restructuring verbs in Romance (cf. Evers 1975; Rizzi 1982; Rutten 1991), and it may be correct to analyse the phenomena associated with restructuring as the reflex of the functional substitution of the matrix AgrO for the low AgrO (cf. Roberts 1997). Where the non-finite verb precedes the finite verb, I assume that the complement (or, in some cases, part of the complement; see the discussion around (29) below) is fronted for checking. It seems, then, that infinitival clauses are subject to a checking requirement distinct from that of finite clauses, which are able to remain final. Non-finite complements are subject to the same leftward movement processes as DPs and other complements (e.g. small clauses). However, we must impose one restriction on them: material cannot intervene between a non-finite verb and a finite verb. This property is shared with Dutch small clauses, and implies, as in Zwart’s (1993) analysis of the placement of these elements, that non-finite complements can only be moved to a relatively ‘low’ position. Positions available to definite DP complements which are non-adjacent to V are not available for non-finite clauses (see note 4 for more on this). Given these assumptions, and an analysis of (7a) as in (8′), V-raising has no clear motivation in OE. V-projection raising is also rather suspect.
The same comments as were just made about verb raising apply: if we do not assume a final I, it is not clear that we must treat a VP in post-finite-verb position as external to the VP headed by the finite verb. Zwart’s position on this is that ‘V-projection raising’ is exactly like ‘verb raising’, with the single difference that AgrO is available in the lower clause for checking the lower object (1993: 19, n. 14). We can adopt this view. This means that an example like (9a), repeated here for convenience, would involve only movement of the object frið internal to the complement, as illustrated in (9a′):

(9) a. hwær ænegu þeod æt oþerre mehte [frið begietan]
       where any people from other might peace obtain
       ‘where any people might obtain peace from another’ (Or 31.14–15; Pintzuk 1991: 113)
(9a′) hwær ænegu þeod æt oþerre mehte [friði begietan ti]

One piece of evidence for this point of view for OE is discussed by Haeberli & Haegeman (1995). They show that OE contrasts minimally with West Flemish (WF): in OE, negative polarity items can appear in putatively raised VPs and form a single semantic negation with ne on the finite verb, while this is impossible in WF:

(16) a. þæt heora nan ne mehte [nanes wæpnes gewealdan] (OE)
        that of-them none NE might no weapon wield
        ‘that none of them could use any weapon’ (Mitchell 1989: 660, cited in Haeberli & Haegeman 1995)
     b. *. . . dan-ze en-willen [tegen niemand klapen]
        that-they en-want to no-one talk
     c. . . . dan-ze tegen niemand en-willen klapen
        that-they against no-one en-want talk

In (16b), the WF negative polarity item niemand cannot be licensed in a raised VP. This is unsurprising, since it is a typical property of rightward-moved categories that they form islands of various kinds (cf. Haegeman & van Riemsdijk 1986 on this and other properties in both WF and Zurich German). However, (16a) indicates that the putatively rightward-moved VP in OE allows the negation to link up with the main negation ne (OE, like most varieties of English, has negative concord). Haeberli & Haegeman conclude that many instances of V-projection raising in OE should be analysed as involving verb movement to a medial I. This conclusion provides further evidence against a head-final IP, and partially undermines one of the rightward movement rules that the standard account assumes. (If we are to adopt Kayne’s general view, then we cannot account for the WF facts in terms of the islandhood of rightward-moved projections; for our purposes, it suffices to say that WF negative concord is subject to a restriction that OE negative concord is not subject to. I have no speculations to offer as to what that restriction might be.
However, the important point for present purposes is that the absence of rightward movement makes it possible to attribute the contrast to a restriction of this kind, while a theory that derives orders like those in (16) from an island-creating rightward movement rule has no means of accounting for the difference between OE and WF.) Haeberli & Haegeman nevertheless argue that V-projection raising is needed in some cases (where van Kemenade 1987 had proposed it). These are cases where both a non-subject and a subject precede the inflected verb, so that there is ‘not enough space’ for both constituents if the finite verb is in I (the bracketing here is that implied by Haeberli & Haegeman’s analysis):

(17) a. . . . þæt [IP he [?? þæs gewinnesi [I mehtej] [VP mare ti gefremman tj]
        that he the victory could better achieve
        ‘. . . that he could better achieve the victory’ (van Kemenade 1987: 21, Oros 47, 1)

     b. . . . þæt [IP mon [?? ælcne ceapi [I mehtej] [VP be twiefealdan bet ti geceapian tj]
        that people each commodity could by twofold better buy
        ‘. . . that people could buy each commodity twice as cheaply’ (van Kemenade 1987: 21, Oros 130, 2)

These examples are problematic only if one takes evidence of the type in (16a) as forcing a medial-I analysis. However, as mentioned above, another possibility is to treat the VPs (more precisely, the complement to mehte) as occupying the complement position of mehte. In that case, we are not forced to regard the finite verb as having raised to I, and we know that there must be landing sites for scrambling to its left. Haeberli & Haegeman also note the following example with a negative polarity item in the putatively raised VP:

(18) þæt wæs ða ða he Iudeas nolde nan wuht læran hwæt hi don scolden
     that was when he the Jews not-wanted nothing/not to-advise what they do should
     ‘that was when he didn’t want to advise the Jews what to do’ (Haeberli & Haegeman 1995, (31c), CP 58.433)

The authors note that Iudeas here must be in an adjoined position; presumably it would be adjoined to I′. In terms of the idea just sketched, we can regard this DP as occupying a scrambled position to the left of the category whose head contains the inflected verb, which is clearly a more satisfactory analysis (in fact, the I′-adjunction possibility is ruled out on Kayne’s 1994 assumptions). There may be a ‘Pollockian’ argument for V-movement here. If the second element of negation, nan wuht, is in a position comparable to NE not or French pas, then the inflected V is not in VP. However, if there is a scrambled position to its left, the verb cannot be in the position that inflected verbs occupy in French (AgrS). But the negative polarity evidence shows that the verb is not in final position followed by a raised VP. I conclude that it must be in a medial I-position (medial in the sense of being lower than AgrS but nevertheless VP-external).
The natural candidates are T and AgrO (assuming the clause structure proposed in Belletti 1990 and Chomsky 1993, where AgrS is higher than T and T higher than AgrO; AgrO is only a candidate to the extent that it is above NegP; see Roberts 1995 [this volume, Chapter 3]).3 For the sake of concreteness, we take this position to be AgrO. Another property that has been attributed to V-projection raising in WF and other contemporary West Germanic varieties is that pronouns cannot be part of ‘raised’ projections. However, Pintzuk (1991) gives examples where pronouns are in such positions in OE:

(19) . . . þæt heo wolde hine læran
        that she wanted him to-teach
     ‘. . . that she wanted to teach him’

(Pintzuk 1991, ÆLS 18.291)

There are several ways to interpret examples like this. First, if we assume V-projection raising, then we are led to conclude that OE pronouns are different from those of WF and other Modern West Germanic varieties. In that case, (19) would be evidence that, on a ‘V-projection raising’ analysis, pronouns do not always move to a special position in OE, supporting the idea that OE ‘clitics’ are really weak pronouns in Cardinaletti’s terms (and, conversely, that WF pronouns at least are clitics in her sense). Second, we could deny that this example contains V-projection raising, and treat it as evidence for medial I. In principle, such a conclusion would tell us nothing about the existence of V-projection raising elsewhere in OE or about the position of V in VP. Third, if we consider that ‘V-projection raising’ reflects the presence of an in situ complement, then we must take it that the pronoun has moved within the complement in (19). This option is unavailable in WF, etc. On this view, we have no evidence that OE weak pronouns are anything other than clitics. The kind of variation in clitic positions between OE and WF that we posit is attested in Romance languages with the very similar operation of clitic climbing: in Standard Italian, clitics can either climb (to an appropriate matrix verb) or cliticize to the lower verb, while in Sardinian and a number of southern Italian dialects climbing is obligatory wherever it is possible. This parallel can be maintained whether or not we consider clitic movement to be head movement, since West Germanic languages allow ‘long’ scrambling in contexts which can plausibly be regarded as restructuring contexts (Evers 1975; Rutten 1991; Zwart 1993). The third position is consistent with everything else we have said, and also indicates that OE weak pronouns are clitics in Cardinaletti’s sense. I thus adopt this position.
The evidence from Haeberli & Haegeman, together with the general considerations regarding verb raising and the position of functional heads raised earlier, casts some doubt on the existence of V-projection raising in OE. If V-projection raising is not assumed, we have more cases of leftward movement than was previously thought: orders where VP precedes the finite verb might be derivable by leftward movement (again, not necessarily of VP, but rather of a larger constituent). Since leftward movement is required in any case, this is not a problem. We also have prima facie cases of complements to the right of the verb. As in the case of verb raising, we do not take these complements to be VPs. They must be at least IPs; more generally, they are ‘transparent’ complements, rather like the complements to restructuring predicates in Romance languages (a similarity originally observed by Evers 1975); hence clitic climbing and scrambling can take place from within them. I thus suggest that V-projection raising, which has always been a highly problematic operation, does not exist. More specifically, I conclude that OE provides no cases of such an operation: the derived structure of putative examples of this construction is as in (9a′). I have now dispensed with two of the rightward movement rules of OE that the standard analyses assume. The remaining one is extraposition. For
CP- and PP-extraposition, I simply assume that the elements in question are able to remain in their complement positions. (Zwart 1993 argues that this is the situation in Dutch, on the grounds that post-verbal CPs are not necessarily islands, whereas extraposed clauses always are; unfortunately, I do not have comparable data for OE.) DP-extraposition is more interesting. Here, one possibility is to adapt Pintzuk & Kroch’s analysis in the obvious fashion: focussed DPs are able to remain in complement position. An alternative, at least for complements of non-finite verbs, is to say that final DPs are fronted inside the complements and the remainder of the complement undergoes the usual leftward movement operation for non-finite complements. The derived structure of an example like (10c) would then be as in (20):

(20) . . . þæt ænig man [XP atellan ti] mæge [YP [DP ealne þone demm]i tXP]

We will discuss this kind of derivation in more detail below (cf. (29)). Focussing and remnant fronting account for many cases of final DPs. Any residual examples would have to be demonstrably unfocussed and occur in embedded clauses with a single finite verb which are demonstrably not V2; and even examples of this kind could be handled by postulating V-raising higher than AgrO and object movement to Spec,AgrO (see note 4). (Similarly, V-movement allows for cases where the final DP is not adjacent to the finite verb.) An important argument for V-final orders, originally due to Koster (1975), has to do with the positions occupied by particles belonging to separable verbs like terug+geven ‘give back’ in V2 clauses. Koster’s observation was that the position of the particle always corresponds to the position of the verb+particle combination in a verb-final clause: it cannot be followed by a DP, cannot be preceded by a finite clause, and may be followed by a PP.
Koster argued that this distribution of particles could be simply accounted for if one assumed that the particle was stranded by V-movement, so that the underlying position of V was final (modulo the position of finite clauses and certain PPs). Van Kemenade (1987: 29–39) applies Koster’s criteria to OE. What emerges is that particle positions pattern fairly systematically with V-positions, although particles can be more readily separated from V in OE than in Dutch and, unlike in Dutch, they are also able to move with V in V2 clauses. It seems, then, that the rule attaching Prt to V applies more liberally in OE. Aside from this, however, the particle positions in OE differ from those in Dutch in three main respects.

i. Particles can be followed by complement DPs, as can finite embedded verbs:

(21) . . . þa ahof Paulus up his heafod
        then raised P. up his head
     ‘. . . then P. raised up his head’ (van Kemenade 1987: 33, AHTh, I, 96)

As van Kemenade notes, the ‘postposing’ of DP here is just a case of the general possibility of having post-finite-verb DPs in OE (here it seems clear that the post-verbal DPs are not necessarily focussed). Pintzuk (1991) shows that particles do not occur after a non-finite verb, and interprets this as an argument for a medial I. I concur in this view, which is consistent with the suggestion made above that finite V moves to AgrO.

ii. In Dutch, predicate adjectives and participles have to precede the final V in embedded clauses and always precede the particle in V2 clauses. This is not the case in OE; (22a) shows that adjectives can follow V, and (22b) shows that an adjective can follow Prt:

(22) a. . . . forðam ðe hi licettað hie unscyldige
        because that they pretend themselves innocent
        ‘. . . because they pretend themselves to be innocent’ (van Kemenade 1987: 35, CP, 439, 19)
     b. . . . he ahof þæt cild up geedcucod and ansund
        he raised the child up quickened and healthy (van Kemenade 1987: 36, AHTh, II, 28)

These examples arguably involve small clauses. Zwart (1993) argues that in Dutch small clauses have to move to a special checking position (which he calls Spec,PredP); this derives the fact that constituents of this type always precede V in embedded clauses. Suppose this is true; then the OE data indicate that V can raise to a higher position in that language. We thus have another piece of evidence for V-movement. We also arrive at an important difference between OE and Dutch.

iii. In Dutch, very few adverbs follow the finite V in embedded clauses and very few follow Prt in main clauses. OE does not show the embedded pattern of Dutch, i.e. it allows post-verbal adverbs in embedded clauses:

(23) . . . ðæt hie ðæt unaliefede doð aliefedlice
        that they the unlawful do lawfully
     ‘. . . that they do unlawful things as if they were lawful’ (van Kemenade 1987: 36, CP, 144, 10)

No data are available for the root pattern. Again, I take (23) as evidence for V-to-AgrO movement in OE: the object has been moved to Spec,AgrO, or perhaps higher. More generally, following Kayne (1985), I regard particles as small clause predicates. They optionally adjoin to the left of V in OE. I also have to assume that V can ‘excorporate’ from Prt in OE, as in Dutch. When Prt does not adjoin to the left of V, it occupies the same positions as
other small-clause predicates like those seen in (22). This gives the following possibilities:

(24) . . . V . . . Prt+t [DP t] (V-movement and Prt-movement)
     . . . þæt he ahof up the earcan (Pintzuk 1991: 78, GD(C) 42.6–7)
     ‘. . . that he lifted up the chest.’
(25) . . . V [DP Prt] (no movement)
     . . . þæt he wearp þæt sweord onweg (Pintzuk 1991: 91, Bede 38.20)
     ‘. . . that he threw the sword away.’
(26) . . . Prt+V [DP t] (Prt-movement)
     . . . þæt up arisað lease leogeras (Pintzuk 1991: 84, WHom 1b.16)
     ‘. . . that up arise false liars.’
     ‘. . . that false liars rise up.’
(27) . . . V X [DP Prt] (V-movement)
     . . . þa ahof Drihten hie up (van Kemenade 1987: 33, Blick 157)
     ‘. . . then raised the-Lord them up’
     ‘. . . then the Lord raised them up.’

The examples of ‘V-movement’ in (24)–(27) are cases where V moves beyond AgrO. In some cases, e.g. (27), this is clearly movement to C; in others, e.g. (24), it may not be (note the different position of the verb with respect to the subject in (24) vs. (27)). The DP subject of the small clauses in (24)–(27) may be moved out of the small clause to a checking position; this is very likely for the clitic hie in (27), and cannot be excluded in the other examples (this depends on what we assume about the position of V; see note 3). If V does not in fact move further than AgrO in (25) and (26), then we must assume that the object can check for case inside the small clause; notice that on this view (26) becomes analogous to our proposal for final DP-complements of non-finite verbs in (20). It is striking that this order, like that in (20), is not found in contemporary West Germanic. So far, we have seen that we can do without the three rightward movement processes assumed by the standard analysis. We have seen that V moves to AgrO, and that small clauses, non-finite complements and DP-complements move leftwards.
We have suggested that finite CPs, some PPs and, possibly, focussed DPs can stay in complement position, and that DPs can be ‘stranded’ in final position by remnant complement-movement. It is natural to relate the CP- and PP-positions to the idea that such categories are not required to check for case, although, given that we assume that small-clause predicates and non-finite complement clauses are also subject to a checking requirement, the notion of ‘case’ here is much more abstract than in GB theory (we return to this point briefly in the next section). We are also assuming both scrambling and cliticization. At this point, we should make clear our general position regarding the leftward movement of complements.

I propose that the OE clause contains the following positions:

  1. a topic position: active only in V2 clauses (although V is not always in C in such clauses, as the clitic evidence shows);
  2. C;
  3. a subject position, although, as in Dutch and German, the subject can and often does remain in a lower position;
  4. the clitic position, again comparable to German;
  5. a scrambling position, which can also be occupied by the subject;
  6. a second scrambling position;
  7. the checking position for objects, non-finite complements and small clauses: Spec,AgrOP;
  8. the position of the finite verb in non-V2 clauses: AgrO;
  9. the base position of V (in which it appears to stay in certain examples);
 10. the complement position, occupied by CP, some PPs and (possibly) focussed DPs.

I distinguish two scrambling positions on the assumption that OE scrambling parallels that of German as seen in (12), although we have not seen any direct evidence of this. There are thus three possible positions for the direct object: 5, 6 and 7. For concreteness, we identify the second scrambling position (Position 6) as Spec,TP. If we take Position 3 to be Spec,Agr1P in the sense of Cardinaletti & Roberts, then the clitic position is Agr1 and the first scrambling position may be Spec,AgrS. (Following Kayne 1994, we do not structurally distinguish specifiers from adjoined positions, and we assume that a given YP cannot simultaneously support a specifier and an XP adjunct; hence Position 5 is the position adjoined to AgrSP, and when the subject moves there it presumably checks with AgrS, but a scrambled element may not.)4 These assumptions give us the following clause structure, with position numbers as in the list above:

(28) [CP 1 [C 2 [Agr1P 3 [Agr1 4 [AgrSP 5 [AgrS [TP 6 [T [AgrOP 7 [AgrO 8 [VP 9 10]]]]]]]]]]]

The above proposal accounts for the word orders found in OE with little stipulation.
Like all other analysts, we must allow for a certain optionality: for example, V-movement is not always required, even in main clauses. Also, we may have to allow for focussed DP complements to have a special privilege with respect to Case theory. It may be that the optionality in verb movement concerns the strength of features of functional heads which trigger movement (what this effectively means in a framework like that of Chomsky 1993 is that the optionality represents distinct parameter values; this view of OE is argued for by Pintzuk 1991, but there the idea is framed in terms of variation in the branching direction of I′ and V′. The proposal in Kiparsky 1995 that C may be absent from OE main clauses reduces to the same idea in Minimalist terms—C cannot be absent, but may have weak features in some instances and strong in others).

To see how the system works, let us consider the principal subordinate clause word orders, as discussed in van Kemenade (1987) and Kiparsky (1995). Here I gloss over clitic positions, and use Aux to mean finite V:

(29) a. S V Aux O (standardly DP-extraposition)
     b. S O Aux V (standardly V-raising)
     c. S Aux O V (standardly V-projection raising)
     d. S O V Aux (standardly underlying)
     e. S Aux V O (V-raising, DP-extraposition)
     f. *S V O Aux (underivable)

In the head-initial system being advocated here, the grammatical orders are derived by combinations of object movement and ‘VP-movement’ (here again, it is likely that the category being moved is larger than VP). I now outline the relevant derivations in detail. On our account, (29a) must involve fronting of the non-finite verb. We can view this as a kind of VP-fronting if we can motivate moving the object out of VP. As sketched earlier (cf. (20)), I assume that this happens on the lower cycle, i.e. that the object moves out of the fronted constituent inside the lower clause, and the remnant of the lower clause is fronted. This analysis ties the existence of post-verbal DPs in OE to the existence of examples like (19) in terms of the idea that nominals, full DPs and pronouns can be fronted for checking on the lower cycle (cf. also the discussion of (25)). Neither examples like (19) nor post-verbal DPs are found in Modern West Germanic; this then reduces to the same fact (but cf. the discussion of (29c) below). Recall again that similar variation is found in clitic-climbing constructions across Romance. On this view, the relevant parts of the S-structure for (29a) are as follows:

(30) . . . [AgrOP [VP V t_i] [AgrO Aux] [XP . . . O_i . . .

We could capture the connection to focus observed by Pintzuk & Kroch in the following manner: at LF, the object must raise to a position c-commanding its trace. This can only happen where the object is focussed and undergoes LF raising. The well-known weak-crossover effects discussed in Chomsky (1977) show that LF raising places DPs in a position higher than the subject position. Therefore, this position is higher than the position of the fronted VP in (30), and the object c-commands its trace at LF.
This idea has two disadvantages: first, the empirical status of Pintzuk & Kroch’s result for texts other than Beowulf is uncertain, as we mentioned earlier; second, it is not clear that the fronted complement cannot be reconstructed (although if this is a kind of A-movement, we do not expect this possibility, unlike the Modern English VP-topicalization discussed in Huang 1993). We thus leave this question open. The order in (29b) results straightforwardly from either object scrambling or object movement to the higher Spec,AgrO. (29c) involves raising of the object to Spec,AgrOP for case-checking inside the complement clause without movement of the remnant category containing V. Since this kind of order is found in some contemporary West Germanic varieties (it is standardly analysed as V-projection raising, cf. section 2.2.2), we have to conclude that checking on the lower cycle is allowed in these varieties. What is not allowed in these varieties, however, is (i) clitic placement on the lower cycle (cf. the discussion around (19)), and (ii) fronting of the remnant constituent into the higher clause (to give the order in (29a)). We can now derive the following prediction for West Germanic (other than English): a language in which non-finite complements are fronted has the order in (29a) only if it has V-projection raising, i.e. the order in (29c). This prediction is fulfilled; Old High German and Middle Dutch are like OE in having both orders, while no modern variety which has the order in (29c) has the order in (29a). (29d) can be analysed as object scrambling combined with VP-fronting (note that the object c-commands its trace inside VP), or as fronting of the entire complement including the direct object. We could regard (29e) as not involving movement (except for Aux-to-AgrO) and claim that the object must be focussed in order to escape the requirement that it move to Spec,AgrOP. Once again, if this cannot be sustained empirically, we need an alternative analysis. To capture (29e) while ruling out (29f) and without assuming that the final DP is obligatorily focussed, we must invoke a variant of the standard verb raising idea (cf. section 2.2.1). We propose, then, that the infinitive attaches to the auxiliary in these cases (and, presumably, in (29b)). This captures the further fact that nothing can intervene between Aux and V in such cases.
As in all analyses of verb raising in systems where the order in (29c) (‘V-projection raising’) is also found, we have to treat this infinitive movement as optional.5,6 Consider now the illicit order (29f). There are several possible derivations to look at. First, we can rule out the possibility that V and O move as a constituent, since the object must leave VP in order to be licensed in the lower clause (and V clearly does not move to Aux here). So we assume that the object and the VP are both fronted separately. Essentially, VP must front to a position lower than the object. We have already guaranteed this by assuming that VP must move to the same position as small-clause predicates (including particles): Position 7. Thus, the object would have to be in VP to give the order in (29f), and we have just seen that this is impossible. (We also need to prevent the object from being focussed here; perhaps this can be achieved by requiring LF-raising of focussed categories and preventing raising out of a constituent on a left branch.) The above paragraphs give my account of OE word order. The account fares no worse than standard ones (which are highly stipulative; all the processes described in section 2 are motivated purely by the need to attain descriptive adequacy) and in some cases does better, e.g. regarding the crosslinguistic prediction about the relation between the orders in (29a) and (29c). Our approach also captures the observation that non-finite Vs always form a ‘verbal cluster’ with finite Vs in West Germanic (cf. Evers 1975; den Besten & Edmondson 1983; Prinzhorn 1990), except when they undergo remnant topicalization to the first position (cf. den Besten & Webelhuth 1990). Verbal clusters result either from fronting a non-finite complement to Position 7 or from V-raising out of the complement (note that this is not optionality in the sense of two parametric systems; the same checking operation takes place in each case, and, seemingly, equally economically). One final point: the ten positions of the OE clause carry over straightforwardly to Dutch and German. There are three properties which distinguish both Dutch and German from OE, all of them well known and all of them unaccounted for on this or any other analysis. First, V always precedes clitics in V2 clauses in both Dutch and German but not in OE. We illustrate with German—this example should be contrasted with the grammatical OE (15a):

(31) *Gott ihnen werkte die Kleider
      God  them  made   the clothes

We take this to indicate that V moves to C obligatorily in topic-initial V2 clauses in Dutch and German, but not in OE (cf. Kiparsky 1995); we take no position on the analysis of SV V2 clauses—cf. Zwart (1993), Schwartz & Vikner (1996). Second, V does not raise to AgrO in embedded clauses in either Dutch or German (but cf. note 3), hence we do not find post-verbal final adverbs or small clauses in these contexts. Compare the following Dutch examples with (22) and (23) respectively:

(32) a. * . . . omdat hij ontving  de wijnglazen  gebroken
            because he  received the wine-glasses broken
     b. * . . . omdat hij zijn werk deed ijverig
            because he  his  work did  industriously
     (van Kemenade 1987: 27)

Third, as we have seen, final (light) DPs are not allowed, as the following Dutch example shows:

(33) * . . . omdat hij kocht  het boek.
            because he  bought the book.
Dutch also differs from both German and OE in not allowing complement clitics to precede the subject. We take this to indicate that the subject must appear in Position 3 (Spec,Agr1P) in Dutch—cf. Cardinaletti & Roberts (2002 [this volume, Chapter 12]). These differences aside, Dutch and German pattern like OE.
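The subordinate-clause derivations of (29) can be checked mechanically with a toy enumeration. The Python sketch below is our own simplification, not the author's formalism, and its option names are hypothetical. It assumes just three object sites (scrambled into the matrix clause above Position 7, in Spec,AgrOP of the non-finite complement, or left in VP, which the text allows only for a focussed object), three options for what fronts to Position 7 (nothing, the VP remnant, or the whole complement), and optional attachment of the infinitive to Aux. As further simplifying assumptions, V-raising is not combined with fronting, and a focussed object left in VP may not sit inside a fronted constituent, following the left-branch suggestion above.

```python
from itertools import product
from typing import Optional

OBJECT_SITES = ("scrambled", "lower_spec", "in_vp")  # in_vp = focussed DP in situ
FRONTING = (None, "VP", "complement")                # what occupies Position 7


def derive(obj_site: str, fronted: Optional[str], v_raises: bool) -> Optional[str]:
    """Return the surface order of one toy derivation, or None if it is excluded."""
    # Simplifying assumption: the infinitive cannot both attach to Aux and front.
    if v_raises and fronted is not None:
        return None
    # A focussed object left in VP cannot sit inside a fronted (left-branch) constituent.
    if obj_site == "in_vp" and fronted is not None:
        return None

    # Inside the non-finite complement: [Spec,AgrOP (O)] [VP (V) (O)].
    vp = ([] if v_raises else ["V"]) + (["O"] if obj_site == "in_vp" else [])
    spec = ["O"] if obj_site == "lower_spec" else []

    if fronted == "VP":
        position7, remainder = vp, spec
    elif fronted == "complement":
        position7, remainder = spec + vp, []
    else:
        position7, remainder = [], spec + vp

    matrix = ["S"]
    if obj_site == "scrambled":
        matrix.append("O")                          # Position 5 or 6, above Position 7
    matrix += position7
    matrix += ["Aux", "V"] if v_raises else ["Aux"]  # Aux in AgrO (Position 8)
    matrix += remainder
    return " ".join(matrix)


derivable = {
    order
    for site, front, raises in product(OBJECT_SITES, FRONTING, (False, True))
    if (order := derive(site, front, raises)) is not None
}
print(sorted(derivable))
# ['S Aux O V', 'S Aux V O', 'S O Aux V', 'S O V Aux', 'S V Aux O']
```

The enumeration yields exactly the orders (29a–e). Deleting the check on `in_vp` with fronting would let `derive("in_vp", "VP", False)` return S V O Aux, the order (29f), which mirrors the observation above that (29f) needs the object to remain inside the fronted VP.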

4. Changes in ME

This section will outline my account of how word-order change took place in the early ME period. This account can link the word-order change to three other important changes that took place at this time and can be embedded in a general approach to language change. These advantages follow naturally from the assumption of head-initial order in OE. It follows that if Kayne’s theory of phrase structure forces us to assume that OE was head-initial, this theory has advantages in this particular domain over less restrictive theories such as the standard GB one. At the same time as the word-order change, two other important syntactic changes took place:

(34) a. Loss of complement clitics (but cf. note 7)
     b. Loss of scrambling

Van Kemenade (1987) argues at length that object cliticization was dramatically reduced in the twelfth century and completely extinct by 1400. On the other hand, subject clitics are found until c.1400, especially in southern texts. The loss of the subject clitics was connected to the loss of V2 (cf. van Kemenade 1987). However, our main interest here is in complement clitics. It is often observed that English word order was ‘rigidified’ in ME.7,8 Although the word order in ME (and indeed ENE) was freer in various ways than in NE (see note 7), we interpret this observation as meaning that scrambling disappears quite early on, around the time of the word-order change. Another change that takes place in early ME is the loss of the morphological case declensions. OE had a system of case marking on nouns which distinguished four cases and two numbers, and up to seven declension classes (case marking on articles has a slightly different history—see note 9).
Owing at least in part to phonological changes (the reduction of unstressed vowels to [ə] and the loss of final nasals) and in part to standard processes of morphological ‘levelling’, this system was reduced by EME to one where nominative–accusative distinctions were essentially no longer made, and only the dative ([–ǝ]) and genitive singular ([–(ǝ)s]) survived. Of these, the former did not last long, and so we arrive at essentially the modern system. These changes can be illustrated with the following paradigms for stone, which in OE was a representative of the masculine a-stem declension, the one to which all other declensions were apparently levelled: (35) OE:

         sg        pl
nom:     stan      stanas
acc:     stan      stanas
gen:     stanes    stana
dat:     stane     stanum

12c.:
         sg        pl
nom:     ston      stones
acc:     ston      stones
gen:     stones    stone(s)
dat:     ston(e)   stonen/s

Later: stoon (sg), stoon(e)s (pl/gen sg)

For further details on the loss of the OE morphological case system, see Lass (1992: 103–112). The ‘standard’ account of the development of English word order postulates a change in the directionality parameter in (1), taking place in the twelfth century (see van Kemenade 1987 and Lightfoot 1991). It is unclear how to connect this change to the loss of clitics and scrambling, and to the loss of morphological case (although van Kemenade posits a connection between the loss of clitics and the loss of case morphology). The account that I propose here has the merit of connecting the word-order change to the changes in (34). It is also possible, as we shall see, to regard the loss of morphological case as the trigger for all these changes. The account of OE word order sketched in section 3 crucially involves object fronting to Spec,AgrOP. Along with scrambling, this gives rise to many OV orders. Suppose in fact that these two processes are linked. For concreteness, I take scrambling (in West Germanic) to be Ā-movement. That is, scrambling is movement to a non-L-related position; the landing site of scrambling is adjoined to a maximal projection that has no lexical feature to assign to the scrambled element. However, scrambled DPs must check for case; thus, they move through Spec,AgrOP en route to the scrambled position (I assume the same, or the analogous, for scrambled indirect objects and some PPs; the checking mechanism that is relevant here may correspond to the GB notion of inherent case, and so it is no surprise to see it extended to at least some PPs). Movement to Spec,AgrO is required because AgrO has a strong N-feature (although, possibly, focussed DPs are exempt from this requirement, as we have seen). I take the general view that ‘deep’ syntactic changes such as word-order change arise through restructuring of grammars by language acquirers (see in particular Lightfoot 1979, 1991). 
In that case, if we are to understand the word-order change in early ME, we must understand what leads language acquirers to postulate a strong N-feature associated with AgrO. On this point, I assume the following:

(36) a. Morphological trigger: if a head H has the relevant L-morphology then H has strong L-features.
     b. Syntactic trigger: if a well-formed representation can be assigned to a given string by assuming that H has strong L-features, then H has strong L-features.
     c. In general, weak features are the default value. These are assumed in the absence of clear evidence to the contrary of the type in (a) or (b).
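The interaction of (36a–c) can be sketched as a decision procedure for the value of AgrO's N-feature. This is a minimal illustration under assumptions of our own, not the author's learning model: the input is reduced to a list of strings, each tagged (hypothetically) by whether it also has a well-formed analysis with a weak AgrO, and ‘robust’ syntactic evidence is modelled as a share of strong-only strings at or above an arbitrary `threshold`. Neither the data format nor the threshold comes from the text; Clark & Roberts (1993) is cited above for an actual formalization of the simplicity preference.

```python
from dataclasses import dataclass


@dataclass
class Datum:
    """One string in the trigger experience (hypothetical representation)."""
    words: str
    weak_analysis_available: bool  # can it be parsed with a weak AgrO?


def set_agro_feature(case_morphology: bool, data: list[Datum],
                     threshold: float = 0.3) -> str:
    """Return 'strong' or 'weak' for AgrO's N-feature, loosely following (36a-c)."""
    # (36a) Morphological trigger: the relevant morphology fixes a strong feature.
    if case_morphology:
        return "strong"
    # (36b) Syntactic trigger: count strings that only a strong AgrO can analyse.
    # Strings with a simpler weak analysis do not count, since representations
    # without overt movement are preferred. The threshold is an assumption.
    if data:
        strong_only = sum(not d.weak_analysis_available for d in data)
        if strong_only / len(data) >= threshold:
            return "strong"
    # (36c) Weak is the default in the absence of robust evidence.
    return "weak"


robust_ov = [Datum("S O V", False), Datum("S V O", True)]
mostly_vo = [Datum("S O V", False)] + [Datum("S V O", True)] * 4

print(set_agro_feature(True, robust_ov))   # strong: morphology present
print(set_agro_feature(False, robust_ov))  # strong: 1/2 of strings are strong-only
print(set_agro_feature(False, mostly_vo))  # weak: 1/5 is below the threshold
```

The second call corresponds to the situation claimed for Dutch (no morphological trigger, but robust syntactic evidence), the third to the early ME learner once VO strings receive weak analyses.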

(36a) states, in general terms, what the morphological trigger of a strong feature is. It is motivated by the attested facts involving the loss of V-to-I movement and the loss of agreement features in sixteenth-century English (see Roberts 1985 [this volume, Chapter 1], 1993a; Pollock 1989; Rohrbacher 1994a). Similarly, (36b) gives a rough statement of the syntactic trigger for a strong feature. The assumption in (36c) derives from the general idea that a preference for maximally simple representations of the input is a property of the learner (cf. Clark & Roberts 1993 [this volume, Chapter 2] for a formalization of this). The simplest representation compatible with the input is chosen, where representations lacking overt movement are defined as simpler than those featuring movement dependencies (arguably because overt movement always creates adjunction structures, while the lack of movement may not, and adjunction structures are more complex than non-adjunction structures). In this sense, we see that weak features give rise to simpler representations than strong features, and so robust positive evidence is required for strong features, while weak features represent the default (or unmarked) value (this conception of markedness is discussed and illustrated at greater length in Roberts 1999 [this volume, Chapter 5]). In the case of the word-order change in early ME, the morphological trigger in the sense of (36a) is provided by nominal morphology, i.e. case marking. As we saw above, the OE morphological case system broke down in the early ME period. Once this happened, there was no morphological trigger for strong N-features on AgrO.
So we see that the loss of case marking in English removed part of the trigger for these features.9 However, the loss of part of the trigger for the strong value of the AgrO parameter does not on its own change the value of that parameter; it simply means that the syntactic trigger became crucial for determining its value. To prevent the parameter defaulting to the weak value in the absence of morphology, OV orders must be robustly attested in the trigger experience, forcing the postulation of representations where AgrO has a strong N-feature. In this respect, the crucial factor was the existence of post-verbal DPs and a number of other post-verbal constituents (particles and other small-clause predicates), owing to the existence of V-to-AgrO movement and stranding of final DPs by remnant movement of non-finite complements. As long as the morphology provided a clear trigger for a strong value of AgrO, these constructions were assigned representations of the type seen in section 3. Once the morphological trigger for DP-movement to Spec,AgrOP was lost, however, VO and other V-complement orders could be assigned simpler representations not involving DP-movement. In such representations, AgrO has a weak N-feature. This kind of analysis is favoured by the preference for weak feature values in (36c) (which in turn derives from a general preference for simpler representations wherever possible, as mentioned above). Hence the presence of these orders weakens the syntactic trigger for a strong feature on AgrO by making possible more highly valued representations in which AgrO has a weak feature. Given this possible analysis of VO orders, the weak value of AgrO is both the default value and confirmed by part of the trigger experience. Hence there is no robust syntactic trigger for the strong value of the parameter, and the OV orders either die out or are reanalysed.10 Thus the AgrO-parameter changed as the formerly strong N-feature became weak. This entails the loss of the orders in (29b, c, d). As I mentioned above, scrambled DPs must move through Spec,AgrOP in order to check their case. Once Spec,AgrOP loses its strong case feature, there is no reason to move there overtly. Hence, by Economy, movement to this position becomes impossible, and scrambling is lost. Of course, wh-movement survives the loss of a strong N-feature on AgrO. We must assume that wh-movement can allow its trace to check for case, but that scrambling cannot. The reason for this might be that wh-traces are ‘true variables’ while the traces of scrambling are not, a fact which we can connect to the well-known fact that (Germanic) scrambling does not trigger weak crossover (cf. for example Lee & Santorini 1994, Vikner 1994b). Suppose further, following Sportiche (1998), that clitic placement (in West Germanic) involves DP-movement followed by local D-movement. The only kind of DP-movement that can place D within range of the clitic position without violating the head movement constraint is scrambling, given our analysis of OE clause structure. The loss of scrambling thus implies the loss of special positions for clitics. More precisely, it implies the loss of complement clitics; subject DPs can move within range of the clitic position by A-movement. As we saw above, complement clitics are essentially lost at the same time as the word order changes in early ME. We are thus able to treat the principal cause of the word-order change as the loss of morphological case marking, and connect this change to the loss of scrambling and complement clitics.
Notice that the loss of the latter two operations is guaranteed by central principles of the theory, in particular economy constraints on movement. Linking the word-order change to the loss of morphological case raises two major objections from a comparative Germanic perspective. First, Icelandic is VO and has a rich morphological case system. Second, Dutch is OV and (by and large) lacks morphological case. We can account for Dutch in terms of the idea that (36) provides a sufficient, not a necessary, morphological trigger for the strong N-feature of AgrO: (36) states that AgrO has a strong feature if the morphology is present. Hence, the lack of morphology implies nothing. In Dutch, the syntactic properties that facilitated the change in AgrO’s feature value in English are missing: V-to-AgrO movement and final DPs.11 Hence the reanalysis has not taken place, and Dutch retains a strong N-feature on AgrO despite the lack of a morphological trigger. For Icelandic, we must say that V always moves to a higher position than the object: here the issue of object shift comes up again. It is unclear, however, why Icelandic should lack scrambling. Again, though, our account simply says that there is a necessary condition for scrambling (a strong N-feature on AgrO); we have nothing to say about what the sufficient condition might be.
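The dependencies argued for in this section can be summarized as a small set of necessary conditions. The sketch below encodes only what the text states: scrambled DPs pass through Spec,AgrOP, so scrambling needs a strong N-feature on AgrO; complement clitics are reached only via scrambling; and the orders (29b, c, d) need overt object movement to Spec,AgrOP. Because these are necessary rather than sufficient conditions, a strong feature only makes the properties available, which leaves room for a language like Icelandic that lacks scrambling. The function name `predicted_options` and the output format are our own illustrative choices.

```python
def predicted_options(agro_strong_n: bool) -> dict[str, bool]:
    """Which options the grammar makes available, given AgrO's N-feature (sketch)."""
    # Scrambled DPs must pass through Spec,AgrOP to check case; with a weak
    # feature, Economy bars overt movement there, so scrambling is unavailable.
    scrambling = agro_strong_n
    # Complement clitics arise only via scrambling (DP-movement followed by
    # local D-movement), so they are available only if scrambling is.
    complement_clitics = scrambling
    # The OV orders (29b, c, d) require overt object movement to Spec,AgrOP.
    ov_orders_29bcd = agro_strong_n
    return {
        "scrambling": scrambling,
        "complement clitics": complement_clitics,
        "OV orders (29b, c, d)": ov_orders_29bcd,
    }


for label, strong in (("OE (strong AgrO)", True), ("later ME (weak AgrO)", False)):
    print(label, predicted_options(strong))
```

A single feature change thus removes all three options at once, which is the sense in which the account links the word-order change to the loss of scrambling and of complement clitics.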

5. Conclusion

I have argued that the facts of OE and ME word order, and the changes relating the two systems, can be accounted for in terms of the idea that OE was a head-initial language. Our argument concerning the synchronic analysis of OE was simply that such an analysis does no worse than many recent analyses which postulate a head-final order. Many descriptive problems remain for both approaches. The real motivation for our approach comes from the treatment of the word-order changes in early Middle English. I argued that the loss of OV orders was caused by the loss of a strong N-feature on AgrO, a development which is related to the loss of morphological case on DPs by (36). In this way, the word-order change in English can be viewed as an instance of a typical kind of change: the loss of an overt movement rule caused by the loss of the morphological trigger for a strong feature of a functional head. The loss of overt movement of inflected verbs in early Modern English was arguably a change of a similar kind (cf. in particular Roberts 1993a). I further suggested that the loss of scrambling and of complement clitics was connected to the change in the value of this feature: once this feature had changed, economy constraints on movement prevented overt scrambling from taking place, and this in turn blocked cliticization of complements. In this way, our account connects four salient changes in early Middle English in terms of the loss of a single abstract feature, and the account of the causation of the change is embedded in a more general theory of language change, from which it follows that strong features may become weak when a morphological trigger for overt movement disappears from the input to language acquisition.12

Notes

* Earlier versions of this material were presented at the University of Venice; the University of Geneva; the University of York; the Centre National de la Recherche Scientifique, Paris; the School of Oriental and African Studies, London; the ninth Comparative Germanic Syntax Conference (Harvard University); Georgetown University; and the 3rd Diachronic Generative Syntax Conference (Free University of Amsterdam). I would like to thank the audiences at those presentations, and particularly Anthony Kroch and Giuseppe Longobardi, for their comments and criticisms. Thanks also to the editors of this collection and one anonymous reviewer for comments and suggestions. All errors are my own.

1. This is not quite how Kayne puts it. Kayne proposes the Linear Correspondence Axiom (LCA):

(i) For a given phrase marker P, with T the set of terminals, d(A) is a linear ordering on T

where d(X), for X a non-terminal, is the set of terminals X dominates, and A is the set of pairs of non-terminals such that the first asymmetrically c-commands the second. These notions formalize the relation between asymmetric c-command and linear order. However, as Chomsky (1994) and Rohrbacher (1994b) both point out, it is necessary to stipulate precedence, not just ordering, in order to derive the result that Specifier–Head–Complement is the only possible order within XP. To see this, take the VP in (3). Here d(VP) is {⟨see, him⟩}. So the LCA requires that the terminals see and him be ordered, but not necessarily that see precede him. Moreover, where d(X) contains more than one ordered pair, say {⟨x, y⟩, ⟨y, z⟩}, nothing in Kayne’s system prevents us from choosing ‘precede’ as the ordering among x and y and ‘follow’ as the ordering among y and z (giving xzy where x asymmetrically c-commands y and y asymmetrically c-commands z). The fact that precedence has to be stipulated makes Kayne’s system less elegant than it might have been.

2. Note that in a theory of the type advocated in Chomsky (1993), which lacks a single point in the derivation that can be defined as a base, it is not even clear that the notion ‘base expansion of X’ can be defined.

3. This conclusion depends on the assumption that there are no scrambling positions to the left of AgrSP. This assumption is somewhat dubious, and has been explicitly denied by Haegeman (1993b) and Sportiche (1996). If these authors are right in positing scrambling positions between C and AgrS, then we could maintain that V raises to AgrS in examples of this type (and more generally in OE and West Germanic). If there is a clear relation between the ‘richness’ of verbal inflection and verb movement to AgrS of the type proposed by Roberts (1985, 1993) and Rohrbacher (1994a), then we are forced to say this anyway for at least some of these languages. However, in that case superficial OV order is the result of obligatory scrambling to positions above AgrS; it is not clear what causes this (although this does not affect the account of the word-order change given in section 4, which postulates a relation between the loss of scrambling and the loss of overt movement to Spec,AgrOP; even if superficial OV derives from obligatory scrambling to positions above AgrS, loss of movement to Spec,AgrOP will nevertheless account for the loss of scrambling).
For the purposes of the present argument, however, we can maintain the slightly simpler assumption that the verb raises to T or AgrO in OE.

4. Alternatively, given the discussion in note 3 of scrambling positions above AgrS, both Positions 5 and 6 may be above AgrSP. This would entail a number of modifications to the structure in (28).

5. Put this way, V-raising might be thought to involve right-adjunction, another structural configuration ruled out by the proposals in Kayne (1994). We can avoid this consequence by positing that V left-adjoins to Aux (i.e. the higher V-position), and Aux then raises to AgrO by excorporation. Note that any analysis of West Germanic verb raising must assume excorporation (cf. Roberts 1991 [this volume, Chapter 10]; Rutten 1991). To get the correct orders in clusters of more than two verbs, we have to assume that non-finite verbs also move; in fact, this is exactly the same operation.

6. The reason for verb clustering might be that ‘transparent’ complements must be licensed by an external head. The licensing head would always be the finite verb, and the relationship either a Spec-head one (where the complement moves to Position 7) or a head-head one (motivating V-raising in (29b, d)). On this view, VPs can front to the specifier of a position occupied by the finite verb or its trace (e.g. Spec,AgrOP or, in V2 clauses, Spec,CP). A similar constraint holds for ‘restructuring’ in contemporary Romance languages—cf. Kayne (1989), Roberts (1997).

7. Until the sixteenth century, pronoun object shift of the sort found in contemporary Mainland Scandinavian is attested:

(i)  They tell vs not the worde of God (1565, T. Stapleton, A Fortress of the Faith (Antwerp 1565); Roberts 1995: 27 [this volume, Chapter 3])

This is a different kind of pronoun movement from the type being considered in the text. It always places the pronoun in a position lower than the subject (i.e. below AgrS) and it is dependent on verb movement to a higher position (cf. Holmberg 1986). On the other hand, clitic movement of the kind found in West Germanic and OE places the pronoun above AgrS (but below C, we have argued) and is independent of V-movement. The historical development of English strongly suggests that the latter system developed into the former. Here we propose an account of how the latter system was lost.

There are residual OV orders in ME with full DPs. These are mostly found with infinitives and participles, e.g. (thanks to Najib Jarad, p.c., for (ii-b, c)):

(ii) a. I may no rest haue a-mongys ȝow (MKempe A 122.19–20; Fischer 1992: 373)
     b. and prattest hine to slayne and his cun to fordonne
        ‘and threaten to slay him and destroy his kin’ (CursM 12965; Visser 1963–73: §1039)
     c. She did him excite . . . hir story for to write (Lydgate Fall Pr. 9.518; Visser 1963–73: §2279)

These are reminiscent of similar orders found in Old French (cf. Pearce 1990). This construction is too restricted to be considered a case of scrambling. Perhaps it is a variant of Icelandic-style object shift, although to show this we would have to show that ME infinitives move (and note that (iia–c) illustrate the three main kinds of ME infinitive: bare, to and for-to). We have no detailed analysis along these lines to offer here, although it is tempting to consider such a movement as the diachronic residue of OE V-raising—see below.

8. In minimalist terms, this may seem strange, but one could claim that scrambling assigns an interpretive feature rather than a formal, morphosyntactic feature; probably the basis of the distinction between L-related and non-L-related movement/positions can be recast in these terms, with L-related positions being ‘pure’ checking positions and non-L-related ones being associated with an interpretive feature of some kind. On what the interpretive property of scrambled positions might be, cf. Diesing (1992).

9. Here the question which arises is what we should call the ‘relevant’ morphology in terms of (36). One possibility is that it concerns the existence of overt morphological nominative-accusative distinctions. However, only two out of seven noun declensions ever distinguished nominative from accusative in OE, and those only in the singular. A more promising proposal would be to attribute the crucial properties to determiners. Masculine and feminine singular forms of the proto-definite article (a demonstrative at the time—cf. note 13) distinguished nominative from accusative in OE (masc. se (NOM), þone (ACC); fem. sēo (NOM), þā (ACC)), but this distinction dies out around the same time as the word order becomes VO.
The correlation here seems quite close in that, for example, the Final Continuation of the Peterborough Chronicle (1132–55) shows an invariant þe as the singular definite article (Lass 1992: 112), and is usually thought to be VO (Mitchell 1964, cited in Fischer (1992: 372), shows that this text has 88% VO order). We might take it, then, that morphological case marking on articles, in particular a nominative-accusative distinction, provides the ‘relevant morphology’. This conclusion is consistent with the situation in Modern German, which provides a morphological trigger for a strong N-feature on AgrO only if we regard case marking on articles as crucial. On Modern Dutch, see below.

10. As we saw in note 7, residual OV orders survive for some time after the twelfth century. We must assume that these orders neither depend on nor trigger a strong N-feature of AgrO, although their analysis remains unclear. In this situation, we can only conclude that twelfth-century acquirers reanalysed OV orders as whatever construction the one seen in note 7 is.

11. Final DPs are found in Middle Dutch (Weerman 1989). Our expectation is then that these orders are lost before morphological case is lost. Weerman shows that Old and Middle High German also had final DPs. It is clear, then, that these properties can be lost independently of the English developments that we are discussing here. An intriguing possibility is suggested by our analysis of (29a). We suggested above that Modern West Germanic varieties with ‘V-projection raising’ and without final DPs lack the possibility of remnant fronting of non-finite clauses. A plausible speculation is that this is due to these complements having a more reduced functional structure than their OE (or OHG and Middle Dutch) counterparts. If non-finite complements develop a more reduced functional structure, then there are fewer landing sites for the object on the lower cycle and correspondingly less possibility of remnant complement fronting. Such a development, which is quite independent of anything discussed in the text, may have led to the loss of final DPs in all of Continental West Germanic. If this can be maintained, then we do not have to predict that final DPs were lost before morphological case marking in Dutch.

12. In the highly restrictive view of parametrization imposed by the Minimalist framework, it is difficult to see what other properties might give rise to OV orders. It is then tempting to regard other cases of OV-to-VO change as being caused in the same way. This is a plausible speculation as regards the development from Latin to Romance. Latin was OV with free word order and morphological case; Modern Romance languages are all VO, have rigid word order and—with the possible exception of Rumanian—have no morphological case (outside the pronominal system). Taking Latin ‘free word order’ as indicative of scrambling (whether of a type precisely like that found in West Germanic or closer to what is found in Russian, Hindi or Japanese remains to be seen), and the Romance ‘rigid’ word order to indicate the absence of scrambling, it seems likely that an account of the sort given above would carry over.
The obvious anomaly concerns clitics/weak pronouns: why has Romance retained such elements while English has lost them? Although I cannot give a full answer here, I speculate that this is connected to the fact that Romance clitics are essentially V-related elements that license pro (cf. Rizzi 1993 on the former and Sportiche 1996 on the latter), rather than being Wackernagel elements like their Germanic (and Late Latin) counterparts. Although the changes described in the text eliminate Wackernagel pronouns (or ‘C-oriented’ clitics in Rivero’s terminology; cf. the discussion of this notion in section 3.2), they are not necessarily incompatible with clitics that have the particular properties of the Modern Romance ones. On this view, when the movement source for clitics described in the text was lost, Romance pronouns changed status. The question now becomes: why did this change not happen in English? One possible answer is that it did: I mentioned in note 8 that Middle and early Modern English show pronoun object shift of the sort found in the Mainland Scandinavian languages today (see Roberts 1995 [this volume, Chapter 3] for an analysis that is largely, but not entirely, compatible with the analysis of word-order change given here). However, English/Scandinavian pronoun object shift is radically different from Romance cliticization, essentially in that the English and Scandinavian pronouns do not seem to be V-related. A more intriguing possibility is that English lacked a sufficiently rich agreement system to allow a new class of licensers of pro. The question for this approach is why the clitics themselves did not become an agreement system; to attempt an answer to this would take us too far afield here.

Giuseppe Longobardi (p.c.) points out another possible consequence of the account of word-order change given here. Suppose, as is often suggested, that scrambling is a way of marking the specificity of a DP.
Then the loss of scrambling led to the loss of this mode of marking specificity, and may thereby have contributed to the development of the article system of Middle and Modern English, absent in Old English (sē, sēo, þæt was a demonstrative at this time). More generally, OV languages with rich morphological case and scrambling tend to lack article systems: cf. once again Latin. Here we see how changes in clausal functional categories can interact with developments inside DPs, another very rich area that we cannot begin to investigate adequately here.

Verb Movement and Markedness

Ian Roberts

This chapter can be thought of as an exercise in the application of the theory of parameters to a set of data. I will discuss a fairly simple, well-known, and well-understood parameter and show how evidence from language change, language acquisition, and creolization supports the idea that there is an unmarked value of this parameter: the “weak” value in terms of Chomsky’s (1993) proposals. This leads to the contention that the weak value of a parameter is always the unmarked value. Drawing on work by Clark and Roberts (1993 [this volume, Chapter 2]), we will see that this conclusion is underpinned by the theory of learnability. The parameter in question is the one that governs verb movement to I.1 In section 5.1, I present the basic data that motivates the postulation of such a parameter. In section 5.2, I describe the evidence that the value of this parameter changed in sixteenth-century English. Section 5.3 offers an interpretation of the recently discovered “root-infinitive” stage of the acquisition of English and many other languages that amounts to proposing that root infinitives result from the verb-movement parameter being initially set to the unmarked value. In section 5.4, I argue, concentrating on Haitian Creole, that French-based creoles have the default value of this parameter, perhaps because creoles generally have default parameter values (see Bickerton 1984), an idea that I attempt to substantiate.

5.1 The Verb-Movement Parameter

It was originally argued by Emonds (1978) that French has a rule moving finite verbs out of VP, whereas English does not. The basic form of the observation is as follows: there is a class of elements X that can plausibly be regarded as positioned at the left edge of VP. These elements include VP adverbs, clausal negation, and floating quantifiers. In French, finite main verbs must precede X, but English main verbs always follow X. The relevant paradigms are as follows:

(1) Adverb
a. Jean embrasse souvent Marie.    *Jean souvent embrasse Marie.
b. *John kisses often Mary.    John often kisses Mary.
(2) Negation
a. Jean (ne) mange pas de chocolat.    *Jean (ne) pas mange de chocolat.
b. *John eats not chocolate.    John does not eat chocolate.
(3) Floating quantifiers
a. Les enfants mangent tous le chocolat.    *Les enfants tous mangent le chocolat.
b. *The children eat all chocolate.    The children all eat chocolate.

The evidence clearly shows that finite verbs are in different positions in the two languages. The alternative is to suggest that the X-elements differ between the two languages. This has been suggested by Williams (1994); however, Williams’s approach does not encompass what I take to be the very important interactions with inversion, to which I turn directly. Standard assumptions about inversion, deriving from the seminal work of den Besten (1983), treat this operation as involving movement of I to C. Given the Head Movement Constraint (Travis 1984; Baker 1988), V cannot move directly to C, and so inversion of main verbs depends on the prior operation of V-to-I movement to feed it. Thus we find that French main verbs are able to undergo inversion (subject to the independent restriction that the subject be a clitic; see Rizzi and Roberts 1989 [this volume, Chapter 9]), whereas English main verbs are unable to do so:

(4) a. Voit-il le cheval?
b. *Sees he the horse?

The contrast in (4) is further evidence that French main verbs move to I, whereas their English counterparts do not.2 Pollock (1989) developed Emonds’s original proposal in terms of principles-and-parameters theory (henceforth P&P). His proposal was that English I is theta opaque, in that it does not permit a category it contains to assign theta roles. Hence, a verb with argument structure could not move there without violating the Theta Criterion. A verb that lacks argument structure could move there, however.
On the assumption that auxiliaries do not assign theta roles, this derives the fact that auxiliaries in English can undergo verb movement (cf. Emonds’s rule of have/be raising):

(5) a. John has often kissed Mary.
b. John has not kissed Mary.
c. The kids have all eaten the chocolate.
d. Has John seen Mary?

French I, on the other hand, is theta transparent, and so main verbs are able to move there without violating the Theta Criterion (and are required to move there because Tense must create a bound variable by LF; this requirement is met in English by movement of an abstract auxiliary when no overt auxiliary is present). Pollock notes that nonfinite I is theta opaque in French, and hence we find the same split between auxiliaries and main verbs as in English finite clauses:3

(6) a. N’avoir/*posséder pas de voiture en banlieue crée des problèmes.
to-have/to-possess not of car in suburbs creates some problems
‘To have/possess not a car in the suburbs creates problems.’
b. N’être/*sembler pas heureux est une condition pour écrire des romans.
to-be/to-seem not happy is a condition for to-write some novels
‘To be/seem not happy is a condition for writing novels.’

For Pollock, then, the parameter governing V-movement is the theta opacity or transparency of I. Pollock suggests that the value of this parameter in a given language is likely to be connected to the “richness” of agreement morphology, since French agreement morphology is somewhat richer than that of English. Moreover, Pollock notes, following Roberts (1985 [this volume, Chapter 1]), that earlier stages of English had richer agreement and V-to-I movement (i.e., a theta-transparent I). I discuss the diachronic evidence and the relation between V-to-I movement and the agreement system in more detail in section 5.2. Chomsky (1993) proposes that the relevant parameter concerns the value of an abstract morphological feature that licenses verbs and is associated with I. This feature is called I’s V-feature. In Chomsky’s system, such features are generated both on V and on I and must be canceled out by a checking operation prior to LF, because they have no semantic content and would thus violate the Principle of Full Interpretation unless eliminated.
The feature varies parametrically as either strong or weak. If it is strong, it is visible to the PF component and hence must be eliminated before Spell-Out, the point at which the derivation is mapped to PF. Because feature checking takes place in a highly local domain, V must move to I in order for feature checking to take place. Thus where the V-feature is strong, V raises overtly to I. Where the feature is weak, the Procrastinate principle, which delays movement to the covert, post-Spell-Out part of the grammar wherever possible, prevents this movement from taking place overtly. In these terms, then, French I has a strong V-feature and English I has a weak V-feature.4 Again, this account is compatible with the observation that the value of the parameter is connected to the richness of agreement morphology: relatively “rich” agreement morphology, in some sense, gives rise to a strong V-feature, whereas relatively “poor” agreement morphology is associated with a weak feature. I elaborate this idea in the following discussion.
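As a rough illustration of how the strong/weak contrast translates into surface order, the following Python sketch linearizes a simple clause under each value of I's V-feature. It is a toy model under stated assumptions, not part of the theory itself: clauses are flattened to four words, the VP-edge element stands for X in (1)–(3) (an adverb, negation, or floating quantifier), and the names `Clause` and `linearize` are hypothetical. It does not model do-support, so English negation with a main verb, (2b), is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """A toy clause: subject, finite main verb, VP-edge element X, object."""
    subject: str
    verb: str
    vp_edge: str  # adverb, negation or floating quantifier at the left edge of VP
    obj: str


def linearize(clause: Clause, v_feature_strong: bool) -> str:
    """Spell out the clause under a given value of I's V-feature (toy model).

    Strong: the feature must be checked before Spell-Out, so V raises overtly
    to I, a position to the left of the VP-edge element.
    Weak: Procrastinate defers the movement until after Spell-Out, so V is
    pronounced inside VP, to the right of the VP-edge element.
    """
    if v_feature_strong:
        words = [clause.subject, clause.verb, clause.vp_edge, clause.obj]
    else:
        words = [clause.subject, clause.vp_edge, clause.verb, clause.obj]
    return " ".join(words) + "."


french = Clause("Jean", "embrasse", "souvent", "Marie")
english = Clause("John", "kisses", "often", "Mary")
print(linearize(french, v_feature_strong=True))    # Jean embrasse souvent Marie.
print(linearize(english, v_feature_strong=False))  # John often kisses Mary.
```

Flipping the value reproduces the starred orders in (1): `linearize(english, True)` yields *John kisses often Mary*, and `linearize(french, False)` yields *Jean souvent embrasse Marie*.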

5.2 The Loss of Verb Movement in English

It is well known that English has historically lost V-to-I movement (see Roberts 1985 [this volume, Chapter 1], 1993a; Kroch 1989; Pollock 1989). The historical evidence from English prior to roughly 1600 shows that, at this period, English verbs patterned like French verbs with respect to movement to I.

(7) a. if I gave not this accompt to you
‘if I gave not (= didn’t give) this account to you’
(1557: J. Cheke, Letter to Hoby, in Görlach 1991, 223)
b. How cam’st thou hither?
‘How camest thou (= did you come) here?’
(1594: Shakespeare, Richard III)
c. The Turks . . . made anone ready a grete ordonnaunce.
‘The Turks . . . made soon ready (= soon prepared) a great ordnance.’
(c1482: Kaye, The Delectable Newsse of the Glorious Victorye of the Rhodyans)
d. In doleful wise they ended both their days.
‘In doleful way they ended both (= both ended) their days.’
(1589: Marlowe, The Jew of Malta, III, iii, 21)

These examples, and many like them, show that I had a strong V-feature at this period. According to most accounts (Kroch 1989; Lightfoot 1991; Roberts 1993a), verb movement of this type began to decline in the latter part of the sixteenth century and was lost from the colloquial language in the seventeenth century, although it remained in the literary language throughout the seventeenth century and perhaps slightly longer (see Jespersen 1909–49, VI, 502). Kroch (1989), reanalyzing the quantitative data of Ellegård (1953), shows that the crucial turning point was 1575. Beginning with Roberts (1985 [this volume, Chapter 1]), it has been argued that the loss of parts of the verbal conjugation in English is related to this change. There is little doubt that, as part of the general loss of morphology that took place during the Middle English period, a large number of verbal endings had disappeared by the end of the fifteenth century.
In particular, Gray (1985, 495ff) gives the following paradigms for London-area English circa 1400 and circa 1500:

(8)     1400        1500
1sg     cast(e)     cast
2sg     cast-est    cast-est
3sg     cast-eth    cast-eth
1pl     caste(n)    cast(e)
2pl     caste(n)    cast(e)
3pl     caste(n)    cast(e)

Presumably these changes are caused by phonological erosion of final nasals and of unstressed vowels. In particular, in the sixteenth century there are only very few attested survivals of any plural ending (see Roberts 1993a, 257ff). It is therefore plausible to propose that the presence of agreement morphology, particularly plural endings, played a very important role in triggering the strong V-feature. Evidence from the Scandinavian languages supports the idea that the “richness” of agreement is related to the strength of I’s V-feature. Platzack (1987) establishes a diachronic correlation between the loss of agreement inflection and the loss of V-to-I movement in subordinate clauses in Swedish.5 A similar diachronic pattern can be shown for Danish also (see Roberts 1993a, 264). Moreover, Icelandic has retained a relatively “rich” agreement inflection (similar to that found in Old Swedish and Old Danish) and has retained V-to-I movement. Similarly, evidence from Scandinavian dialects supports this idea. On the one hand, Älvsdalsmålet, a dialect of Swedish spoken in Dalecarlia, Central Sweden, has plural agreement marking and verb movement in subordinate clauses (see Platzack 1987; Platzack and Holmberg 1989). On the other hand, the Norwegian dialect of Hallingdalen effectively lacks plural agreement endings and lacks verb movement in subordinate clauses (Trosterud 1989). The crosslinguistic evidence from English and North Germanic clearly establishes a link between agreement marking and verb movement. Roberts (1993a) formulated this link in terms of number agreement, suggesting that overt, distinct marking of number agreement is the relevant kind of “richness.” Rohrbacher (1994) points out that Roberts’s proposal makes the wrong predictions for Faroese, which lacks verb movement but has overt, distinct plural marking.
Instead, Rohrbacher proposes that distinct first- and second-person markings in at least one number of one tense of the regular verbs is what is required; this generalization covers Faroese (which, unlike Icelandic, lacks such marking) and the other languages.6 If we could establish a simple and general biconditional relation between agreement morphology of a certain kind and a strong V-feature in I, we could simply attribute the loss of verb movement in English to the effects of Procrastinate, because the loss of morphology would entail the change in the value of the feature from strong to weak and, hence, the loss of verb movement. However, it is clear that such a simple biconditional relation cannot be maintained. First, there is the case of V-to-I movement in infinitives, lacking in French (for main verbs) but obligatory in Italian. Belletti (1990) illustrates this with the following kind of contrast between French and Italian (Belletti argues that pas and più are in the same position):

(9) a. *Ne lire pas le livre.
neg read neg the book
b. Ne pas lire le livre.
neg neg read the book
‘To not read the book.’
c. Non leggere più il libro.
neg read more the book
‘To no longer read the book.’
d. *Non più leggere il libro.
neg more read the book

Neither language has any agreement morphology in infinitives (and infinitives themselves are morphologically marked in broadly the same way). Second, there are certainly languages with V-movement that lack the relevant morphology. Depending on other considerations (see note 5), Dutch and Afrikaans might be such cases. A clearer case is the English of Northern England and Scotland of the fourteenth to sixteenth centuries. In this language, the same ending (usually spelled -is) appeared in all persons in the present tense, and there is clearly verb movement (the ending -it in (10b) is the past tense).

(10) a. Consideris thou this?
‘Considerest thou (= do you consider) this?’
(1515: Douglas, Aeneid, 1. 208; Görlach 1991, 208)
b. For then they observit not Flowing nor eschewit not Ryming in termes.
‘For then they observed not (= didn’t observe) flowing nor eschewed not (= didn’t eschew) rhyming in terms.’
(1584: James VI, The Essays of a Prentice, Preface to the Reader; Görlach 1991, 309)

Third, the Kronoby dialect of Swedish combines an absence of agreement morphology with verb movement (Platzack and Holmberg 1989). Finally, in both Danish and English there was a considerable time lag between the loss of agreement morphology and the loss of V-to-I movement: Danish had lost its verbal morphology by 1400 (Karker 1974, 25), but V-to-I in subordinate clauses survived until at least the seventeenth century (Falk and Torp 1900, 302). In Standard English there was a gap of at least 75 years between the loss of the relevant verbal morphology (c1500) and the loss of verb movement (c1575). These reasons lead me to formulate the relation between agreement morphology and the presence of a strong V-feature on I as a one-way implication:

(11) If there is verbal agreement marking of the relevant type, then I has a strong V-feature.

The statement in (11) implies that the loss of verbal agreement marking on its own is not sufficient for the change in the value of I’s V-feature, and hence that such a loss on its own has no effect on V-to-I movement. Thus the statement is consistent with the existence of languages that have verb movement but lack the “relevant” type of inflectional morphology. However, (11) says that the loss of this morphology is a necessary condition for the loss of V-to-I movement: a language that has this morphology has V-to-I movement, and so this morphology must be lost if V-to-I movement is to be lost. Clark and Roberts (1993 [this volume, Chapter 2]) argue that the parameter-setting algorithm contains an elegance condition, which, all other things being equal, favors those parameter settings generating relatively simple representations over those generating relatively more complex ones. In Clark and Roberts, in progress, we further argue that the preference for greater simplicity acts at a higher level, causing the form of parameters to be as simple as possible, that is, binary feature values associated with underspecified categories (see section 5.4 on the view that functional categories are inherently underspecified). For the sake of the present exposition, I take simplicity to be a function of movement relations. Thus a structure involving movement is more complex for the learner than a structure not involving movement. This idea forms the basis of a theory of markedness of parameter values that holds that strong feature values are marked because they inevitably give rise to movement dependencies.7 In these terms, (11) (properly formulated, following Roberts, Rohrbacher, or some other characterization of “relevant morphological marking”) states the morphological trigger for a marked parameter setting. Hence the loss of the relevant morphology implies the loss of that morphological trigger. However, this does not in itself imply a change in the parameter value. Many marked parameter values are triggered by word order alone; call these syntactic triggers.
For V-to-I movement to be lost, the syntactic trigger must somehow have become inoperative, leading the learner to select the unmarked parameter value (i.e., the weak V-feature). We can phrase this in terms of the notion of P-expression introduced by Clark and Roberts (1993 [this volume, Chapter 2]): a sentence S expresses a parameter P just in case a grammar must have P set to some definite value in order to assign a well-formed representation to S. Given a theory of markedness, only marked values of parameters need to be expressed; unmarked values will be triggered by default if they are not expressed. I regard a marked parameter value as morphologically expressed when a condition like (11) is satisfied; its syntactic expression depends on word order unambiguously expressing the marked value. In the case of V-to-I movement, it is plausible that the most salient syntactic expression of the marked parameter value was movement of I containing V to C in inversion, and the order V+I not in negatives. A notable characteristic of sixteenth-century English is the emergence of do as a dummy auxiliary. In this period, do was freely available as a semantically empty carrier of tense and agreement marking, leaving the main verb inside VP and in an unmarked form in all contexts, including positive declaratives (where in standard Modern English it is, of course, ungrammatical; Ellegård 1953; Visser 1963–73; Denison 1985). Roberts (1993a) argues that do developed from a homophonous causative and raising verb essentially by a process of grammaticization, the same process that created a syntactically identifiable class of modal auxiliaries (see also Lightfoot 1979 on the latter). The most important step in the process of grammaticization was signaled by the loss of nonfinite forms of do and the modals. Sixteenth-century do was like Modern Standard English auxiliary do in that it could not appear in nonfinite contexts (see Visser 1963–73, section 419). The following is one of the last examples of do in a nonfinite context:

(12) Now if I would then doe . . . tel hym.
‘Now if I would then do . . . tell him.’
(1534: St Thomas More, Works (1557) 1192, F4; Visser, ibid)

It may be significant that Thomas More is one of the last authors (of standard English) to use sequences of modals (see Lightfoot 1979, section 2.1). I take it that the inability of do to appear in a nonfinite context, combined with its lack of lexical content, meant that it was analyzed as an I-element, not as a V that raises to I (the same can be said of the modals; see Roberts 1985 [this volume, Chapter 1]). Because they were base-generated in I, these elements created an indeterminacy in the trigger for the V-to-I parameter: in contexts where these elements were present, the trigger for this parameter was simply not present (in the terms of Clark and Roberts (1993 [this volume, Chapter 2]), this parameter value was not expressed). According to Ellegård’s (1953) figures (reproduced in Kroch 1989 and Roberts 1985 [this volume, Chapter 1], 1993a), the incidence of do in questions averaged approximately 60% in 1575 and approximately 65% in 1600. In negatives, the figures were approximately 25% and 35%, respectively.
These figures, particularly those for questions, strongly suggest that the development of the auxiliary system, in particular the availability of dummy do, undermined the syntactic expression of the marked parameter value and hence led learners to default to the unmarked value. In this way, the V-feature of I was set as weak, and the Procrastinate principle blocked V-to-I movement. To summarize, three factors led to the parameter change. First, the loss of agreement morphology, given (11), removed the morphological trigger for the strong feature. Second, a number of constructions in the input, essentially those involving auxiliaries, were compatible with grammars lacking verb movement. Neither of these developments in itself guarantees the loss of V-to-I movement. The crucial third factor is the sensitivity of the learner to complexity, embodied in a markedness theory claiming that strong feature values, because they create more complex representations involving overt movement, are dispreferred relative to weak features. The situation for learners of English in the sixteenth century was one where the morphological and syntactic evidence for verb movement was no longer categorical. Hence, given the general drive to minimize complexity, the system that did not feature the option of verb movement was preferred. Therefore, the parametric value of I changed: the former strong V-feature became weak.
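The three-factor account can be pictured as a toy learner. The sketch below is a simplification under explicit assumptions and is not Clark and Roberts's parameter-setting algorithm: each input sentence is pre-annotated for whether it P-expresses the strong V-feature (an inverted main verb, as in (7b)) or is also compatible with a weak-V grammar (an inverted dummy do), condition (11) is reduced to a boolean, and the `threshold` standing in for "categorical enough" syntactic evidence is a hypothetical parameter. The names `InputSentence`, `set_v_feature` and `toy_corpus`, and the do-variant of (7b), are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InputSentence:
    text: str
    expresses_strong_v: bool  # True only if no weak-V grammar can parse it


def set_v_feature(relevant_agreement: bool,
                  corpus: Iterable[InputSentence],
                  threshold: float = 0.5) -> str:
    """Return 'strong' or 'weak' for I's V-feature (toy model, hypothetical threshold)."""
    # Morphological trigger, condition (11): relevant agreement implies strong.
    # The implication is one-way, so its absence decides nothing by itself.
    if relevant_agreement:
        return "strong"
    sentences = list(corpus)
    if not sentences:
        return "weak"  # nothing expresses the marked value: take the default
    # Syntactic trigger: share of the input that unambiguously expresses V-to-I.
    expressing = sum(s.expresses_strong_v for s in sentences)
    if expressing / len(sentences) > threshold:
        return "strong"
    # Marked value insufficiently expressed: fall back to the unmarked weak value.
    return "weak"


def toy_corpus(n_inverted: int, n_with_do: int) -> List[InputSentence]:
    """Hypothetical questions: main-verb inversion vs. inversion of dummy do."""
    inverted = [InputSentence("How cam'st thou hither?", True)] * n_inverted
    with_do = [InputSentence("How didst thou come hither?", False)] * n_with_do
    return inverted + with_do


print(set_v_feature(True, toy_corpus(2, 8)))   # strong: agreement suffices
print(set_v_feature(False, toy_corpus(8, 2)))  # strong: syntactic trigger robust
print(set_v_feature(False, toy_corpus(4, 6)))  # weak: the default wins
```

The last call uses a 4:6 split, loosely echoing the roughly 60% incidence of do in questions around 1575 cited above; how such a figure maps onto a learner's evidential threshold is an assumption of the sketch, not a claim of the text.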

5.3 Root Infinitives and Parameter Setting

In this section, I briefly consider one way that the proposals made in the previous section about the variation and fixation of verb-movement parameters might interact with language acquisition. I have proposed that the setting of verb-movement parameters consists in locating the strong feature values of functional heads. Such features can be triggered either morphologically or syntactically: morphologically by the presence of relevant inflectional paradigms, and syntactically by the presence of clear evidence of movement of the relevant kind. Here, I present a case study of one aspect of developmental syntax that has received a good deal of attention in the recent literature: the so-called root infinitives (Rizzi 1994; Wexler 1994). I will try to show that this phenomenon is a reflex of the learner’s propensity for weak features; this propensity manifests itself in the production of root infinitives under certain highly specific circumstances. Wexler (1994) shows that acquirers of a range of languages whose V-movement properties are well understood go through a developmental phase (between roughly 20 and 30 months) in which they show evidence of having acquired the relevant kinds of V-movement, but that, alongside adult-like instances of V-movement, these acquirers produce so-called root infinitives. Root infinitives are unmoved verbs in a form homophonous with the infinitive. The evidence, which I will now review, comes from a range of Germanic and Romance languages. Pierce (1989, 1992) shows that children acquiring French produce adult-like sentences such as (13a), with inflected verbs and V-to-I movement, alongside sentences where verb movement does not occur and the verb has a form homophonous with the infinitive, such as (13b):

(13) a. Patsy est pas là-bas.
Patsy is not over-there
b. Pas manger la soupe.
not eat-inf the soup

Example (13b), like (13a), is a declarative.
Moreover, although the -er infinitive ending for this conjugation is phonetically identical to both the second-plural present and the past participle (all are /-e/), Pierce shows that the form here is an infinitive on the basis of examples involving verbs from other conjugations where these forms are distinct (e.g., voir ‘see’ (infinitive) vs. vu (past participle) vs. voyez (second-plural present)). The correlation between the use of the infinitive and the absence of V-to-I movement is clear.

In German, the pattern is finite verb in second position, (14a), versus apparent infinitive in final position, (14b) (all the utterances are root clauses).

(14) a. Mein Hubsaube hat Tiere din.
my helicopter had animals in-it
b. Ich der Fos hab’n.
I the frog have

Weverink (1990) shows that Dutch children go through a comparable stage:

(15) a. Ik pak’t op.
I pack-it up
b. Pappa schoenen wassen.
daddy shoes wash

Plunkett and Strömqvist (1990) show that in Swedish child language, finite verbs are used when the verb precedes negation, as in the adult language (16a), and apparently nonfinite forms are used when the verb follows negation (16b) (again, all clauses are root).

(16) a. Älg säger inte mu.
elk say not moo
b. Inte ha den.
not have(–FIN) it

Finally, Wexler (1994) insightfully relates this crosslinguistic evidence to an old observation from developmental studies of English that the third-singular marker -s is frequently “dropped”:

(17) He no bite you.

Wexler analyzes this kind of production as a further instance of a form homophonous with an infinitive appearing in child language where it would not appear in adult language. Wexler supports this analysis with the observation that not V + s forms are rarely attested in child language. In English, it is not verb movement that is lacking but rather the dummy auxiliary do, which can in a sense be seen as the counterpart of verb movement in other languages. Wexler considers three different possible analyses of root infinitives. One possibility is that all these clauses contain a null modal. This would immediately explain the form and distribution of the verbs in all the languages: they take the infinitive form and cannot move because those are the properties of verbs in the complement to modals. However, beyond this, such an approach is little more than a restatement of the problem: why should there be a developmental phase in which a null modal is postulated? Notice further that, as Wexler points out, this putative modal has no semantic value (unlike, for example, the empty modal that has been proposed for English subjunctives by Culicover (1971), Emonds (1976) and Roberts (1993a), which roughly corresponds semantically to should). From the perspective of English, at least, it is more plausible to consider that there is a null do present in examples like (17); however, this idea does not translate naturally to the other languages. In any case, here we come up against the same problem of explanation: why should acquirers postulate such an element in the first place? A second possible account is that T is either optional or underspecified for [±finite] at this developmental stage. Either way, the adult distinction between finite and nonfinite clauses would be lacking as a consequence. This analysis would explain the data and would imply that this single factor perturbs the children’s verb-movement and inflection system. Wexler adduces some support for this idea from the fact that children acquiring English sometimes use bare-stem forms for the past tense. The problem with this idea is that there is still no real explanation as to why such a phase should exist. The third possibility that Wexler considers is that Agr, or perhaps T, has weak V-features at this stage. Because this idea can be readily integrated into my system of markedness, learning, and parameter setting, this is the analysis I adopt. However, before going into a detailed discussion of this idea, we must consider a fourth analysis, put forward by Rizzi (1994). Rizzi proposes the Clausal Truncation Hypothesis. The central idea is that adult grammars contain the principle in (18).

(18) CP = root

The effect of this principle is that all clauses in adult grammar must contain a full functional structure, including at least C, AgrS, T, and AgrO.
T is assumed to contain a temporal variable that must be bound: in finite clauses verbal inflection performs this function (see Pollock 1989), whereas nonfinite clauses require some binder; for example, the T-variable of embedded nonfinite clauses is bound by the higher T (see Enç 1988; Stowell 1982). Root infinitives are ungrammatical in adult language because this variable cannot be bound. Child language allows root infinitives because (18) does not hold and, in Rizzi’s words: [I]f the selected starting point is a category lower than TP . . . , then one will get the root infinitive, or a root construction exhibiting whatever unmarked nonfinite form the language possesses; so, a root infinitive can be a bare VP in a language like English, in which the infinitival form is the bare stem . . . ; or the maximal projection of the head corresponding to the infinitival morpheme . . . Since there is no T position, there is

150  Ian Roberts no tense variable to bind, hence the root infinitive at this stage will not incur the violation of the binding requirement which bans this construction in adult systems . . . . (Rizzi 1994, 379) The Clausal Truncation Hypothesis accounts for a number of important properties that seem to hold of root infinitives. First, root infinitives are incompatible with wh-movement. Examples like (19) are apparently not found. (19) *Was Hans essen? what Hans eat-inf In wh-interrogatives, the verb always moves and is always finite in form at this stage of acquisition of the languages that have been studied from this perspective. This follows from the Clausal Truncation Hypothesis because wh-movement requires CP, and if CP is present, then the option of clausal truncation is not taken, and root infinitives cannot appear for the same reason as in the adult language. Root infinitives are also largely incompatible with negation. This follows if NegP is held to be higher than TP (see Belletti 1990 and pace Ouhalla’s (1991) proposal that the position of NegP can vary crosslinguistically). Rizzi suggests that examples of negation that are found with root infinitives (e.g., (13b), (16), (17)) should be treated as constituent negation. He also notes the observation of Hoekstra and Jordens (1991) that a negation marker corresponding to the adult anaphoric negation was used in root infinitives, whereas the standard clausal negator of the adult language was used with finite verb forms (in this connection, see no instead of not in (17)). French root infinitives are incompatible with the presence of subject clitics (e.g., je, tu, etc.). If French subject clitics are clitics in AgrS or pronouns that cliticize obligatorily to AgrS (in section 5.4 I argue for the latter approach), then this is explained by the Clausal Truncation Hypothesis: when there is a root infinitive, there is no TP and therefore no AgrSP.
A further important observation is that auxiliaries do not occur as root infinitives (Wexler 1994). Thus, root infinitives like those in (20) are not found. (20) a. *avoir   mangé have-inf eaten b. *gekauft haben bought  have-inf Rizzi proposes that auxiliaries can be treated either as generated in T or as obligatorily raised to T. Either way, they depend on T for licensing and so would not be found in root infinitives, where TP is absent.

A final point is that root infinitives are not found in Early Italian (Guasti 1992). Rizzi relates this fact to the obligatory raising of Italian infinitives to AgrS, as illustrated by the contrasts in (9) above. Since infinitives depend on AgrS, root infinitives involving truncation of TP are impossible. The Clausal Truncation Hypothesis explicitly claims that this stage of child language has at least one grammatical property that is not found in any adult system: the fact that (18) is inoperative. The suggestion is then that principle (18) matures (see Rizzi 1994, note 4). Another possibility Rizzi considers is that, given that (18) is nonoperative in adult language in certain registers or under certain pragmatic conditions (e.g., in question-answer pairs), perhaps children are simply not aware of the pragmatic conditions governing the suspension of (18) in early stages of acquisition and, hence, suspend it “too often.” This proposal too seems to entail that this stage of acquisition features a grammatical system that is not found in adult language (unless we can find a language where (18) is suspended more readily).8 I would like to explore the possibility that root infinitives are a reflection of the approach to markedness discussed earlier. If this idea can be maintained, it will yield the desirable result that this stage of acquisition is a grammatical system of the usual kind, that is, a particular combination of parameter settings and, in principle, one that could correspond to an adult language. And it will also, of course, be evidence for the operation of this theory of markedness in language acquisition. In the previous section, we saw a case of language change that provided evidence for a morphological trigger for strong values of parameters.
In its general form, the morphological trigger can be stated as follows (see the more specific statement given in (11)): (21) If a head H has the relevant L-morphology, then H has strong L-features. For example, the relevant V-morphology for AgrS seems to consist of particular properties of the agreement paradigms. In the present context, two things are important about the morphological trigger. First, its absence is simply the absence of a trigger, and not a trigger for some “negative” property like weak features (this would in fact be a kind of negative evidence and so must be discounted on general grounds). Second, to function as a trigger, there must be some morphological analysis of the input. It is a well-known fact about language change (one that played a clear role in the previous section) that morphology can be diachronically “weakened”; this amounts to saying that successive generations discount aspects of the morphological system. So it is clear that the morphology itself may not always be acquired. A natural inference is that morphological acquisition may proceed in stages, depending on which aspects of the adult system are acquired. It is well known from acquisition studies that this is true (Brown 1973).

In the absence of a trigger, whether morphological or syntactic, this theory of markedness claims that weak values of parameters will always be preferred because they are associated with simpler overt representations. Suppose, then, that when confronted with an agreement system of a particular kind A, where agreement affixes are not abundant (and where referential null subjects do not appear—assuming that these are licensed via agreement morphology), the preference for simpler structures causes acquirers to ignore the morphological input and to assume weak feature values. In other words, in cases where the agreement system is “poor” enough to allow the morphological trigger to pass unnoticed, as it were, given the preference for weak features, a verb-movement system that can be characterized as follows emerges: (22) No agreement; no verb movement. In production, the nonagreeing verb form is used; in many languages, this is the infinitive (note that, in English, it is simply the stem form that appears in the infinitive, the imperative, and in all persons of the present tense except the third-singular form). A system like (22) would give rise to the production of “root infinitives.” Notice, however, that these forms are no more infinitives than the non-third-singular present-tense forms of English verbs; they are simply instantiations of nonagreeing verbs. Evidence that these forms are not infinitives comes from the fact that they are compatible with overt subjects, as examples like (14b), (15b), and (17) show; here nominative Case is assigned to the subject by I in exactly the way it is in simple adult English sentences like John smokes.
(Rizzi suggests that the subjects in (14b), (15b), and (17) may be dislocated, but it seems clear that an analysis like the one I present, which can explain the possibility of subjects with “root infinitives” without recourse to some special stipulation about subjects, is preferable to one that needs such a stipulation.) The system in (22) is a product of the lack of morphological expression of the strong value of the V-to-I parameter; it appears sporadically and is eventually eliminated by the syntactic expression of this parameter. So (22), combined with the theory of markedness I outlined, is the central idea of my account of “root infinitives,” which specifically links the occurrence of this phenomenon to the fact that the languages in question have somewhat impoverished agreement morphology. The “root infinitives” are a manifestation of the default option for AgrS’s V-feature in the case where the morphological trigger does not immediately rule this option out. In languages where the verbal agreement morphology is richer (and which allow null subjects), namely languages where the agreement system is not of type A, the morphological trigger is sufficiently salient that AgrS’s V-feature is immediately triggered, and so root infinitives are not found in acquisition. This is the situation in Italian, for example. My account of Italian is thus directly related to the nature of the trigger. More generally, I predict that, other things being equal, “root infinitives” will be found only in languages with somewhat “impoverished” agreement morphology. It may be that the set of languages that manifest root-infinitive stages in acquisition is coextensive with the set of languages that do not allow referential null subjects; however, I will not speculate further on this point. Let us now consider the other morphosyntactic properties that, according to Rizzi, are correlated with “root infinitives.” It is possible to account for the absence of subject clitics in French root infinitives in essentially the same way as Rizzi does (only without assuming clausal truncation). French subject clitics are effectively dependent on the presence of strong AgrS, because they are typical of Romance clitics in requiring both cliticization to AgrS and cliticization to an inflected verb. Thus, if the verb does not raise to AgrS, these pronouns cannot cliticize to it and hence cannot appear. An alternative would be to regard these clitics as manifestations of the strong features of AgrS. However, I retain the view that they are pronouns that obligatorily cliticize (see Kayne 1983; Rizzi 1986b). One reason for this is that if French subject clitics are AgrS, then French is a null-subject language and the presence of root infinitives would therefore be surprising. As for the absence of auxiliaries, I could follow Rizzi in regarding them as dependent on (overt) raising to T for licensing; if raising is unavailable, auxiliaries will thus be unavailable. However, Pollock’s (1989) evidence that auxiliaries may, but need not, raise in French infinitives suggests that the licensing condition involving overt raising to T is not correct. The question that the absence of auxiliaries in root infinitives effectively raises is: why do we not find auxiliaries behaving like the tense/mood/aspect (TMA) markers of creoles (i.e., showing no agreement and no raising)?
I suggest that the answer to this question lies in the fact that auxiliaries in English and French clearly undergo more raising to AgrS than do main verbs and also, especially in English, have richer agreement than main verbs. These facts may create a situation where auxiliaries effectively follow the Italian pattern: there is a sufficiently robust trigger for their movement that the unmarked option (22) does not surface. This implies that Universal Grammar (UG) distinguishes auxiliaries from main verbs and that movement parameters can be set differently for auxiliaries and main verbs. These ideas are needed independently, of course, in order to account for the existence of have/be raising in Modern English (as well as its diachronic origin). My account of the absence of root infinitives with wh-preposing and negation is similar. Suppose that there exists a parameter determining the “lexicalization” of an I system containing [+wh] or negation. The value of that parameter is positive in the adult versions of all the languages in question (except for Mainland Scandinavian negation, although this may be related to the fact that negation in these languages is adverbial rather than being a head in the I system):9 lexicalization involves V-movement in all the languages except English, where it is achieved by do-insertion. The trigger for the lexicalization parameter is syntactic: inversion in the case of [+wh] (linked to the value of the parameter determining I-to-C movement in interrogatives, again positive in all the languages in question), and V-movement or do-insertion in the case of negation. This parameter is also positively set in the root-infinitive stage of acquisition of the languages in question. For this reason, root infinitives, which are immobile, are incompatible with negation and inversion, which require movement. That the lexicalization parameter is distinct from the V-movement parameter is independently shown by Modern English, which requires lexicalization of [+wh] and [+neg] and yet does not allow V-movement (for main verbs). However, the lexicalization parameter and the V-movement parameter interact, thanks to the principle of Greed, which requires that a given category α move only to satisfy a property of α (see Chomsky 1993). Given that these languages all have I-to-C movement in interrogatives, V will raise through AgrS. Hence it must also check off its own strong AgrS features. We can extend this reasoning to the negation case if we assume that negation must combine with Tense (Laka 1990; Zanuttini 1991) and that, in finite clauses, Tense must combine with AgrS (Chomsky 1993). Thus we derive the result that a verb moving in interrogatives and negatives is always inflected. To be consistent with the general theory of parameters of Chomsky (1993), the lexicalization parameter must be expressible in terms of V- or N-features. It seems natural to regard it as a case of strong V-features of I-elements with the relevant feature specifications, say C and Neg, respectively, for [+wh] and [+neg]. In Modern English, the strong features of these elements cannot be checked by V, because V is unable to move; hence do is inserted in just these contexts. Do is able to move and so can check the strong V-features, but it cannot be inserted in positive declaratives, because its own feature would then be unchecked.
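The interaction between the lexicalization parameter and the V-movement parameter can be made concrete as a toy decision procedure. The sketch below is a hypothetical illustration, not part of the analysis itself: it reduces a clause to the features named in the text (a [+wh] or [+neg] head bearing strong V-features, whether I has a strong V-feature, and whether the verb is an auxiliary of the have/be type that raises even in Modern English), and the function name `lexicalize` and its string outcomes are invented for exposition.

```python
from typing import AbstractSet


def lexicalize(heads: AbstractSet[str], i_has_strong_v: bool, verb_is_aux: bool) -> str:
    """Toy model of how a clause satisfies the lexicalization requirement.

    heads          -- I-system features present, e.g. {"wh"} or {"neg"} (empty if neither)
    i_has_strong_v -- True where main verbs raise to I (e.g. French)
    verb_is_aux    -- True for have/be, which still raise in Modern English
    """
    # Positive lexicalization parameter: C[+wh] and Neg bear strong V-features.
    needs_checking = bool(heads & {"wh", "neg"})
    # A verb can move if I's V-feature is strong or if it is a raising auxiliary.
    if i_has_strong_v or verb_is_aux:
        return "verb raises"
    if needs_checking:
        # The main verb cannot move, so dummy do is inserted to check the strong feature.
        return "do-insertion"
    # Positive declarative: inserted do would leave its own feature unchecked,
    # so nothing is inserted and the main verb stays in VP.
    return "verb in situ"


# Modern English (weak I), main verbs and an auxiliary, then French (strong I).
print(lexicalize({"wh"}, False, False))   # What did John eat?
print(lexicalize({"neg"}, False, False))  # John did not eat.
print(lexicalize(set(), False, False))    # John ate.
print(lexicalize({"neg"}, False, True))   # John has not eaten.
print(lexicalize({"neg"}, True, False))   # Jean n'aime pas Marie.
```

The sketch deliberately leaves out the sixteenth-century situation, in which do itself could raise to check a strong V-feature on I in any clause type; modelling that would require treating do as an optional alternative to the raising main verb.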
In sixteenth-century English, do raised to check I’s strong V-feature independently of the values of [wh] and [neg]. When I’s V-feature became weak, the raising of do began to take place under different conditions (i.e., only where there was [+neg] or [+wh]). Thus the presence of do in positive declaratives declined from 1575 (see Kroch 1989). This analysis of root infinitives has several advantages over the one proposed by Rizzi. It makes a general prediction about the connection between root infinitives and the richness of the agreement system. It does not postulate properties of child language that are not found in any adult grammatical system, and, of course, it integrates the phenomena into a general theory of markedness and triggers. Rizzi’s account of the development from a root-infinitive system to an adult-like one lacking root infinitives involves the maturation either of (18) itself or of the pragmatic principles regulating the suspension of (18). It is because the root-infinitive systems are “immature” in this sense that they do not correspond to any adult grammatical system. Because my analysis of root infinitives treats them as involving a weak setting for I’s V-feature, I must account for why this feature value changes in languages where the adult system has the opposite setting. The account is straightforward: acquisition of the adult morphological paradigms will ensure, given my account of the trigger for strong feature values, that I has strong features in a language such as French. The syntactically triggered strong values for I[+wh] and I[+neg] also, given the Greed-based analysis sketched out above, create a number of situations in which V has strong features. Thus the grammar with a strong V-feature for I is preferred; the trigger experience requires the marked parameter setting. Similar considerations extend to the trigger for V2, although here the precise nature of the trigger is less clear.10 I conclude with a speculation on how the I parameter may fail to be set to the same value as the one underlying the trigger experience. It is a reasonable speculation, especially given the suggested correlation with “poor” agreement systems, that sixteenth-century English acquirers passed through a root-infinitive stage. We saw above that do was a freely available “dummy verb” at this stage of the language. Effectively, when do appears, adult English has a root infinitive (see Wexler 1994). All through the sixteenth century, do was particularly frequent in questions and negatives, removing one piece of evidence that I has strong V-features. Moreover, the crucial morphological trigger for V raising to I was lost by the sixteenth century. Given these considerations, it is easy to see how the I parameter was not reset so as to correspond to the then-adult value.
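The learning logic behind this account, in which a strong value is adopted only when a morphological trigger of the kind stated in (21) or a syntactic trigger is detected and the weak value is otherwise retained as the default, can be sketched as a small function. This is a hypothetical illustration under simplifying assumptions: the `Evidence` record, its field names, and the staged inputs are invented for exposition, and the boolean fields abstract away from how the salience or frequency of a trigger would actually be measured (the text notes, for instance, that the root-infinitive option appears only sporadically, which a deterministic function cannot capture).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical summary of what the learner has analysed in the input so far."""
    agreement_analysed: bool         # agreement paradigm analysed as V-morphology on I, cf. (21)
    syntactic_cue_for_raising: bool  # e.g. robust verb-before-negation/adverb orders


def i_v_feature(evidence: Evidence) -> str:
    """Set I's V-feature: 'strong' only when some trigger is detected, else the unmarked 'weak'."""
    if evidence.agreement_analysed or evidence.syntactic_cue_for_raising:
        return "strong"
    # No trigger is not negative evidence; 'weak' is simply the default value,
    # preferred because it yields the simpler overt representation.
    return "weak"


def main_verb_output(feature: str) -> str:
    """System (22): with a weak feature the main verb neither agrees nor moves."""
    if feature == "strong":
        return "agreeing verb raised to I"
    return "non-agreeing verb in situ"


# Hypothetical stages; the boolean values are illustrative assumptions, not data.
stages = {
    "French-type learner, agreement not yet analysed": Evidence(False, False),
    "French-type learner, paradigm analysed": Evidence(True, False),
    "16th-century English learner, trigger lost, do frequent": Evidence(False, False),
}
for label, evidence in stages.items():
    feature = i_v_feature(evidence)
    print(f"{label}: I[V] = {feature}; {main_verb_output(feature)}")
```

Because the function never treats the absence of a trigger as evidence, the only route to the strong value is positive evidence; fed the input assumed for sixteenth-century English, it simply remains at the default.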

5.4 Creoles, Markedness, and the Language Bioprogram Hypothesis

In a series of very interesting papers, Bickerton (1981, 1984) has argued that creoles give a direct insight into the language faculty. His central idea is that creoles are acquired on the basis of a radically impoverished trigger (see Bickerton 1999), so impoverished that UG (in the form of the Language Bioprogram) has a more direct relationship with the final-state system than in the case of noncreole languages. Noncreole languages are, of course, underdetermined by experience, but the standard assumption is that aspects of experience profoundly influence the final state by setting parameters and so on. On the other hand, the characteristic property of creoles is that their history features a break in the normal generation-to-generation “transmission of language.” Of course, language is not really transmitted; parameter values are triggered thanks to their expression in the input to language acquisition (when they are not, a change may take place). In these terms, the special property of creoles is that they are based on highly defective, or even absent, triggers. At this point, “the human linguistic capacity is stretched to the uttermost” (Bickerton 1981, 4). The trigger experience is impoverished (consisting primarily of pidgin and/or jargon). In Bickerton’s words: It is debatable whether the P-input [pidgin input—IGR] is language at all. P-input is in no sense a reduced or simplified version of some existing language: it is a pragmatic, asyntactic mode of communication using lexical (and very occasionally grammatical) items drawn mainly, but by no means exclusively, from the politically dominant language. (Bickerton 1991, 365) At the point of creole genesis, then, a new system is effectively “invented.” Because of this, Bickerton argues, creoles are a unique window on the language faculty. Bickerton’s evidence for his point of view comes from the striking morphosyntactic similarities that hold among creoles that are based on different lexifier languages and are widely dispersed both geographically and historically. I review a number of these below. Bickerton argues that it is most unlikely that these similarities are the result of historical borrowing or contact and extremely unlikely that they are due to chance.11 The view that creoles, or any class of languages that one might be able to identify pretheoretically, have some kind of privileged relationship with UG is one of which I am skeptical. If creoles are genuine natural languages, and all the evidence indicates that they are (unlike pidgins, on most views), then they are simply instantiations of UG possibilities like any other system. The P&P approach cannot treat some set C of languages as “closer” to UG than its complement set Cʹ. Parameters must have determinate values for a grammatical system to function (notice that this is a theorem of the view of parameters in Chomsky 1993, not something that needs to be stipulated separately); creoles must therefore contain fixed parameter values. As such, creoles have exactly the same relationship to UG as any noncreole language. (Essentially this point is also made by Lightfoot (1991, 182)). The conclusion just reached is, in a sense, pessimistic with regard to the interest of creoles (of course, creoles will always be just as fascinating as any other natural languages, but, as we have seen, special claims have been made on their behalf).
More importantly, it leaves open the question of the similarities that appear to hold among creoles. Also, it glosses over the fact that creoles are acquired on the basis of an unusual trigger (although Lightfoot (1991, 182) argues that both this fact and any consequences it may have are open to doubt). A weaker view, which is less “pessimistic” regarding the special interest of creoles, and which is moreover compatible with most versions of P&P theory, is that creoles are systems with predominantly unmarked parameter values. This is the position of Bickerton (1984) and is essentially what I wish to defend here. Lightfoot argues that this cannot be true across the board, primarily on the basis of word-order facts. Creoles are overwhelmingly SVO languages, and so one would be led to claim that SVO is the unmarked word order. Lightfoot is skeptical of such a claim. However, if we follow recent work by Kayne (1994), who argues that all languages must effectively be SVO, and Zwart (1993), who argues that Dutch is SVO with a great deal of leftward movement of complements, then we are led to conclude that OV order results from a strong N-feature associated with some functional head, presumably AgrO. In that case, the approach to markedness adopted here tells us that OV is a marked word order, because it depends on the presence of a strong N-feature in AgrO. Lightfoot’s objection therefore disappears. The claim of this section is that, although Bickerton’s original idea that creoles effectively had some privileged relation to UG as compared with noncreoles is too strong (and in fact cannot be formulated in the theory of parameters assumed here), the idea that creoles fairly systematically reflect unmarked parameter settings (i.e., weak values of the features of functional heads) can account for many attested similarities among creoles and can be plausibly related to the unusual status of the trigger. My empirical focus will be on properties connected to verb movement, and so the claim made in the previous two sections that the lack of V-to-I movement reflects the unmarked parameter setting will be further substantiated. Naturally, a view of the relations among language acquisition, creolization, and language change emerges from the discussion; here again, the unifying notion is markedness. My position will be a fairly weak one: I do not wish to assert that creoles always and only have unmarked values for all parameters. This would make the excessively strong prediction that creoles are all parametrically isomorphic to one another. As Bickerton (personal communication) points out, there are differences between creoles, and these differences arise from variation in the degree to which realizations of functional categories are triggered (Bickerton cites the example of the survival of French relative pronouns into French-based creoles, whereas these pronouns were lost in English-based creoles). The hypothesis I would like to entertain, then, is that creoles tend to have weak values of parameters.
As such, they will have a number of properties in common (but notice that these common properties could perfectly well be shared by noncreole languages, something that the strongest version of the Language Bioprogram Hypothesis does not obviously admit). We can relate the tendency towards weak parameter values to the nature of the trigger. Assume that a significant part of the trigger consists of pidgin. Such vernaculars lack inflectional morphology, and so the morphological triggers for strong feature values are missing. Similarly, we can speculate that pidgins do not robustly encode syntactic triggers; the discussion of Hawaiian Pidgin by Bickerton (1981, chap. 1; 1999) strongly suggests this. Notice that we do not need to make any very strong claim about the trigger for the acquisition of first-generation creoles, only that the trigger is morphologically and syntactically defective on crucial points, just in the sense that certain properties required for the triggering of strong features were not expressed with sufficient frequency or clarity. This strikes me as a plausible and conservative speculation as to the circumstances of creole genesis. Granted this much, if the triggering data for strong values are not available to the learner, the learner’s preference for maximally elegant representations will always favor weak feature values because these give rise to representations that are simpler than those arising from strong feature values. Hence weak feature values will tend to predominate. We can thus relate the unusual circumstances of creole acquisition to a propensity for unmarked (i.e., weak) feature values. The central factor that determines the unmarked status of creoles is the simplicity metric that is intrinsic to the learning device. It is now necessary to show that the syntactic characteristics of creoles can be accounted for as the reflexes of unmarked feature values. Muysken (1981, 1988) discusses six properties that he takes as characteristic of creoles. I will now discuss these one by one (making slight adaptations to the theoretical assumptions) and show how, with one partial exception, they can all be viewed as deriving from unmarked values of parameters.

5.4.1 Lack of Verb Movement

This property is particularly interesting in creoles whose lexifier languages have verb movement. It is also of obvious relevance given what was said in the preceding sections. A clear case of this type is Haitian Creole, whose lexifier language is French. Haitian Creole very clearly lacks V-to-I movement, as has been shown by DeGraff (1994) and DeGraff and Dejean (1994). DeGraff (1994) explicitly applies the Pollock-Emonds tests for V-to-I movement (see section 5.1 for discussion) to Haitian Creole, and the results consistently show that V-to-I movement is lacking in Haitian Creole. Contrasts with French of the following type are illustrative (although DeGraff discusses a much wider range of data, including many different classes of adverbs):

(23) Adverb placement
  a. i.  *Bouki pase deja rad yo.                    (Haitian Creole)
          Bouki iron already cloth the
     ii.  Bouki deja pase rad yo.
          ‘Bouki has already ironed their clothes.’
  b. i.  Bouqui repasse déjà le linge.
          Bouqui irons already the clothes
          ‘Bouqui is already ironing the clothes.’
     ii.  *Bouqui déjà repasse le linge.

(24) Negation
  a. i.  Boukinèt pa renmen Bouki.                   (Haitian Creole)
          Boukinèt neg love Bouki
          ‘Boukinèt does not love Bouki.’
     ii.  *Boukinèt renmen pa Bouki.
  b. i.  *Jean ne pas aime Marie.
     ii.  Jean n’aime pas Marie.
          Jean neg-love neg Marie
          ‘Jean does not love Marie.’


These examples clearly show that elements that are usually held to intervene between I and VP, and whose position relative to a verb is therefore a test for verb movement, always precede V in Haitian Creole. (Michel DeGraff (personal communication) informs me that Haitian Creole lacks floating quantifiers, and so this test for verb movement is inapplicable; neither is inversion (i.e., I-to-C movement) found in Haitian Creole (DeGraff 1993a, 75)). The conclusion seems clear: Haitian Creole lacks V-to-I movement. As DeGraff points out, Haitian Creole is also a typical creole in that there is no verbal inflectional morphology. There is no subject-verb agreement at all, and information regarding tense, mood, and so forth is carried by preverbal particles (to which I return below). Hence, the correlation between the lack of V-to-I movement and the lack of verbal inflectional morphology observed in the history of various Germanic languages in section 5.2 holds up in the relationship between French and Haitian Creole. The generalization about the trigger experience that I made there, repeated here as (25), is thus supported: (25) If a head H has the relevant L-morphology, then H has strong L-features. Haitian Creole clearly lacks the relevant verbal morphology, unlike French. Moreover, the impoverished trigger for Haitian Creole must also have lacked the syntactic cues for V-to-I movement; this is plausible for the reasons given above. Given the absence of a trigger for a strong V-feature in I, the drive to minimize the complexity of linguistic structures leads to a system where there is no overt V-movement, that is, a system where I has a weak feature.
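The word-order test applied to (23) and (24) can be stated as a simple procedure: if a VP-initial adverb or the clausal negator is taken to mark the left edge of VP, a finite verb that precedes it must have raised out of VP, while a verb that follows it has stayed in situ. The function below is a hypothetical sketch of that diagnostic on whitespace-tokenized examples. It ignores glosses, assumes that the verb and the marker each occur exactly once, and does not implement the other Pollock-Emonds tests (floating quantifiers, inversion); the name `verb_precedes_marker` is invented for illustration.

```python
def verb_precedes_marker(sentence: str, verb: str, marker: str) -> bool:
    """Return True if `verb` precedes `marker` (an adverb or negator at the left edge of VP).

    True is evidence for V-to-I movement; False is consistent with the verb staying in VP.
    Assumes whitespace tokenization and a single occurrence of each word.
    """
    tokens = sentence.rstrip(".").split()
    return tokens.index(verb) < tokens.index(marker)


examples = [
    # (grammatical sentence, finite verb, VP-edge marker, language), from (23) and (24)
    ("Bouki deja pase rad yo.", "pase", "deja", "Haitian Creole"),
    ("Boukinèt pa renmen Bouki.", "renmen", "pa", "Haitian Creole"),
    ("Bouqui repasse déjà le linge.", "repasse", "déjà", "French"),
    ("Jean n'aime pas Marie.", "n'aime", "pas", "French"),
]
for sentence, verb, marker, language in examples:
    moved = verb_precedes_marker(sentence, verb, marker)
    print(f"{language:15} {sentence:32} V-to-I: {'yes' if moved else 'no'}")
```

Applied to the grammatical members of (23) and (24), the test separates the two languages exactly as the text describes: the Haitian Creole verbs follow deja and pa, while the French verbs precede déjà and pas.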
We thus observe a striking parallel between the development from Middle/Early Modern English to Modern English and the development from French to Haitian Creole (in addition to the parallels between the development of English and the development of many Mainland Scandinavian languages discussed in section 5.2); this parallel is also observed by DeGraff (1994). The two developments were, of course, entirely unconnected. Moreover, the trigger experiences were altered in different ways. In English, regular processes of phonological reduction eroded the verbal inflections and an independent syntactic change led to the development of an auxiliary system (see Lightfoot 1979; Roberts 1993a on the latter); moreover, there was a continuous development from one stage to the other. Haitian Creole, on the other hand, is not directly diachronically derived from French; it is a grammatical system that emerged spontaneously from a pidgin whose lexicon is largely French-based (but see Mufwene 1999 and Lumsden 1999 for alternative proposals). As suggested above, I take the nature of the pidgin to be the source of the lack of morphological and syntactic triggers for V-to-I movement. The parallel extends to Mauritian Creole, another French-based creole, as the following example shows (from Green 1988): (26) Li pa pu dir narjê. it neg prt say nothing ‘It doesn’t mean anything.’

On the other hand, Réunionnais appears to pattern with French, to judge by the following (ibid.): (27) Li mãz pa sel. he eat neg salt ‘He doesn’t eat salt.’ Interestingly, Réunionnais is known to be more “heavily influenced” by French than the other Indian Ocean creoles or Haitian Creole. In fact, Réunionnais is explicitly excluded from the class of creoles by Bickerton (1981, 4). Baker and Corne (1982, 107) provide evidence that Réunionnais had greater superstrate contact than other French-based creoles. This contact presumably provided a syntactic trigger for V-to-I movement in Réunionnais. Mesolectal Louisiana Creole presents an interesting variation on this theme, according to Rottet (1993) (discussed in DeGraff 1994). This variety has “short” and “long” forms for a large number of verbs, where the long form is clearly derived from the French infinitive and the short form from the French present tense:

(28)  short form    long form
      aret          arête        ‘stop’
      frem          freme        ‘shut’
      kup           kupe         ‘cut’

The short form precedes VP adverbs and negation, whereas the long form follows these elements: (29) a. Fo tuzhu kupe zerb-la. Fo always cut (long form) grass-the ‘It’s always necessary to cut the grass.’ b. Fo kup tuzhu zerb-la. Fo cut (short form) always grass-the ‘It’s always necessary to cut the grass.’ (30) a. Mo pa mõzhe. I not eat (long form) ‘I haven’t eaten’/‘I didn’t eat.’ b. Mo mõzh pa. I eat (short form) not ‘I don’t eat.’ These examples show that the short form acts like a French verb and the long form acts like a Haitian Creole (or English) verb. The short form thus appears to move to I. The idea that the short form moves to I whereas the long form stays in V is confirmed by the interactions of these forms with certain preverbal particles that mark tense and aspect. These particles are incompatible with short forms, but compatible with long forms: (31) Le klosh ape sone/*son aster. the bell PROG ring now ‘The bells are ringing now.’ Assuming that the preverbal particle occupies an I-like position here, (31) further indicates that short-form V raises to I. Raising is blocked when the particle is present. Example (31) is thus analogous to the English case where the presence of a modal blocks have/be raising. DeGraff (1994) follows Rottet (1993) in assuming that the long forms and the short forms are representatives of different grammatical systems. This is in part corroborated by the fact that only long forms are found in the basilect, and, of course, standard French root clauses require a morphologically inflected verb, the counterpart of the Mesolectal Louisiana Creole short form. DeGraff and Rottet thus take the mesolectal data to be a reflection of coexisting grammatical systems in the sense of Kroch (1989).

5.4.2 SVO Order

It appears that creoles are without exception SVO (Bickerton 1981, 1984, 1988; Mühlhäusler 1986; Muysken 1981, 1988). SVO is of course a very common order among noncreoles, but SOV is just as common, and there is a significant minority of VSO languages (the other word-order types are very rare; see Greenberg 1963). Creoles as a group can therefore be distinguished from noncreoles in that they do not show non-SVO typologies; on the other hand, SVO itself is not confined to creoles. This is a good example of how creoles occupy just part of the space of variation that is attested in language in general. The facts are particularly striking in creoles that derive from non-SVO lexifier languages. This is the case with Berbice Dutch, derived from Dutch and Ijo, both surface OV languages. Another example is Rabaul Creole German, which is SVO despite being derived from German, a superficially SOV language.
Mühlhäusler (1986, 156) cites Thomason (1981, 333) who points out that Chinook Jargon is SVO, but the dominant order in the Northwestern Amerindian languages on which it is based is VSO.12 It is generally assumed that VSO order involves verb movement (see Emonds 1980; Sproat 1985). Hence it is a marked order, as compared to SVO, which does not necessarily feature verb movement. I have also suggested that VO is the unmarked order with respect to OV. This is because, following Kayne (1994), Chomsky (1994), and Zwart (1993, 1994), I take VO to be the only available underlying order and surface OV to be derived by overt DP-movement to Spec,AgrOP for Case checking. So, OV results from the presence of a strong N-feature in AgrO. The strong feature of AgrO entails more complex representations for the learner, since it induces

162  Ian Roberts more overt movement. Hence it is marked. So, I conclude that OV order is marked. In this light, the fact that creoles are always VO is a reflection of their tendency to show unmarked values for parameters, given the extremely degenerate trigger for creole genesis. Thus, this property of creoles can be explained in terms of our theory of markedness. In this connection, it is worth reflecting on the surface position of the subject. If we apply the above reasoning to subject movement, the straightforward prediction, assuming some version of the VP-internal subject hypothesis, would be that subject raising out of VP is a marked option. This predicts that the “least marked” word order is SVO (because verb movement, also a marked option, is required for VSO) where the subject is obligatorily adjacent to the verb (and the verb to the object). In such a system, VP adverbs and negation would always precede the subject, as would tense, mood, and aspect particles (if these are not verbs; see below), and any I-type material. As examples like (23), (24a), and (26) above show, this is not the situation in creoles; negation typically intervenes between the subject and the verb (as do tense, mood, and aspect particles). We are led to the conclusion that creoles typically show the marked value for subject movement and the unmarked value for object movement. Why should this be? One possibility would be to deny the VP-internal subject hypothesis. It is certainly true that, if this hypothesis is adopted, we have to explain the general preference to raise subjects (notice that this is true even where subject raising is an option, independently of markedness considerations). 
Languages with the “maximally unmarked” SVO order of the type just described are hard to find; similarly, VSO languages typically show adjacency effects between the verb and the subject (see McCloskey 1991 on Irish, for example; the same is true of Welsh), whereas a system with only V-raising would show adjacency effects between the subject and the object. These are general puzzles for our approach as it stands. These observations must be related to two well-known facts: (a) object agreement is never richer than subject agreement, and (b) referential null objects are much rarer than referential null subjects. In general, then, AgrS tends to be “richer” or “stronger” than AgrO. To be more precise, it seems that we never find a system where AgrO has a strong N-feature and AgrS has a weak N-feature.13 However, abandoning the VP-internal subject hypothesis is not a solution. The proposal would be that subjects are base-generated in [Spec, AgrS], so that there is no raising to this position and SXVO is therefore not a marked order. However, there is clear evidence of raising constructions elsewhere in Haitian Creole, and hence evidence that AgrS has a strong N-feature independently of what we assume about the base position of subjects. DeGraff (1993a) gives pairs like the following:

(32) a. Genlè Jak damou.
        seem Jak in-love
        ‘It seems that Jak is in love.’
     b. Jak genlè damou.
        Jak seems in-love
        ‘Jak seems in love.’

These examples clearly show that raising to subject of a familiar kind exists in Haitian Creole, and hence that AgrS has strong N-features. I propose that the strength of AgrS’s N-feature in Haitian Creole reflects a property of UG, not a parametrically variant property or a property of the learning device. The generalization can be phrased as follows:

(33) AgrS has a strong N-feature.

This statement corresponds to the Extended Projection Principle (EPP) of earlier work (Chomsky 1982). In the minimalist framework, the EPP reduces to a feature-checking requirement, as (33) states. However, the above considerations regarding the greater richness of AgrS as compared to AgrO, as well as the fact that we find subject expletives but not object expletives, indicate that AgrS’s N-feature does not vary parametrically.14 If this is the case, then no markedness issue arises, and we of course expect creoles to have a strong N-feature in AgrS.15 It remains an open question why (33) should hold; I will not speculate on this here.16

5.4.3 No Referential Null Subjects

This property seems to be generally true of creoles, according to Muysken (1981) and Lightfoot (1991).17 It is particularly striking in creoles based on null-subject lexifier languages like Spanish and Portuguese. A case in point is Papiamentu, based on Spanish. Muysken (1988, 291) gives the following examples to show that it is not a null-subject language:

(34) a. E ta kome.
        he asp eat
        ‘He is eating.’ (Él está comiendo.)
     b. *Ta kome.
        asp eat
        (Está comiendo. ‘He/she is eating.’)
     c. *Ta kome Maria.
        asp eat Maria
        (Está comiendo Maria. ‘Maria is eating.’)

We can attribute this to the fact that creole agreement systems are too impoverished to permit recovery of the content of referential pro.18 Assuming, following Rizzi (1986a), that this ability to recover pro’s content is the fundamental property of null-subject languages, we can ask how this should be expressed in terms of the system of parametric variation adopted here. Given what I said above, it cannot be that the ability to license null subjects is connected to AgrS’s N-feature, because (33) would then imply that all languages are null-subject languages. Therefore, it must be a property of AgrS’s V-feature (recall that we assume that only these two features can vary). Accordingly, I adopt the following approach to licensing pro:

(35) a. pro is formally licensed by a strong N-feature.
     b. pro’s content is identified by specifier-head agreement with the relevant inflection.

The condition in (35a) is necessary for the occurrence of any kind of pro, expletive or argumental, and (35b) requires that the morphology permitting the recovery of pro’s features be present in AgrS (if something other than morphology can permit recovery of pro’s features (see note 18), then no particular requirement on AgrS may result). This is necessary at LF for the correct interpretation of pro, and it is necessary at PF for the correct identification (and perhaps elimination; see Chomsky 1993) of pro’s features. Assuming that there is no PF movement, this means that pro must be in the relevant configuration at Spell-Out. Moreover, assuming Greed, the head bearing the independent inflection must be required to move into a position satisfying (35b) for pro for reasons independent of pro. The relevant morphology may be a grammaticalized pronoun, as in many Northern Italian dialects, or it may be the verbal morphology. In the latter case, (35b) requires that V be in AgrS at Spell-Out (i.e., that AgrS have a strong V-feature). In this way, we derive the result that referential null-subject languages where pro’s content is recovered morphologically always have V-to-AgrS movement (although the converse does not necessarily hold: languages with V-to-AgrS movement do not necessarily allow referential null subjects; e.g., French, Middle English, Icelandic).
Thus (35b) gives us the result that referential null subjects of this type are a marked property, contingent on V-movement. It then follows from what I said above about verb movement that creoles will not have referential null subjects of this type. Independent evidence for (35b) comes from the fact that referential null subjects cannot occupy the “freely inverted” position in Italian (Rizzi 1987). The requirements in (35) are part of UG, comprising a well-formedness condition on a particular element, pro. As such, they do not constitute a parameter. The only syntactic variation with respect to null subjects concerns V-movement: the verb must be in a position where (35b) is satisfied in order to license subject pro at Spell-Out. Hence, AgrS’s V-feature plays a crucial role in the null-subject parameter. There is also morphological variation of an ill-understood kind: the verbal inflection must be “rich” enough to permit recovery of pro’s features. I take it that inflection has whatever properties it has at PF and that these may or may not suffice to satisfy the licensing condition in (35b) at PF; there is no syntactic variation on this point beyond verb movement.

DeGraff (1993a) argues that Haitian Creole has null subjects. Although he clearly shows that Haitian Creole (and other creoles) can have null expletive subjects, DeGraff’s evidence that Haitian Creole has referential null subjects is not convincing. His argument is that “‘subject’ pronouns in HA [Haitian Creole] do not appear in subject position, but that they are clitics phonologically spelling person and number features of INFL” (p. 73). The evidence for this is as follows:

(36) Subject pronouns must be adjacent to VP
     a. Yaya, bèl ti abitan an, ap viv nan vil Sen-Mak.
        Yaya beautiful little peasant det prog live in town Sen-Mak
        ‘Yaya, the beautiful little peasant, lives in Sen-Mak.’
     b. *Li, bèl ti abitan an, ap viv nan vil Sen-Mak.
        3sg beautiful little peasant det prog live in town Sen-Mak
(37) Subject pronouns are phonologically proclitic on V or the first particle
     a. mwen ale → m ale
     b. mwen ap ale → m ap ale
     c. mwen pa te ale → m pa te ale
(38) Subject pronouns cannot be contrastively stressed
     a. BOUKI ale.
        Bouki left
        ‘BOUKI left.’
     b. *LI ale.
        he left
(39) Subject pronouns cannot occur in isolation
     Ki moun ki genyen? Bouki/*li.
     who comp won Bouki/him
     ‘Who won? Bouki/him.’
(40) Subject clitics cannot head complex NPs (e.g., appositive relatives)
     Bouki/*mwen avèk li, de abitan Sen-Mak, pral Leogàn.
     Bouki/*1sg with 3sg two peasant Sen-Mak go Leogàn
     ‘Bouki/I and he, two peasants from Sen-Mak, are going to Leogàn.’

(The li in (40) is a complement pronoun, not a clitic; see below.) The properties of the subject pronouns in (36)–(40) indicate only that Haitian Creole subject pronouns are phonological clitics. In fact, standard French subject clitics share all these properties, and these have usually been regarded as phonological clitics (Kayne 1983; Rizzi 1986b; Rizzi and Roberts 1989 [this volume, Chapter 9]) or, more recently, as weak pronouns (Cardinaletti and Starke 1999). The data in (36)–(40) provide no reason to consider the Haitian Creole pronouns as distinct from their Standard French counterparts. Hence they do not provide evidence that Haitian Creole allows referential null subjects. Moreover, the fact that Haitian Creole clitics cannot be “doubled” by a nominal subject also indicates that these pronouns are not syntactic clitics in AgrS:

(41) *Jan li ale.
     Jan 3sg leave
     ‘Jan left.’ (DeGraff’s (1993a, 76) ex. (34))

This situation should be contrasted with that found in many Northern Italian dialects (Rizzi 1986b; Brandi and Cordin 1989; Poletto 1993), where the clitics are found in contexts like those in (41). Moreover, fully nominal subjects can occur with no clitic in AgrS:

(42) Jan ale.
     Jan leave
     ‘Jan left.’

Again, this situation parallels that of Standard French and differs from what is found in many Northern Italian dialects, favoring the conclusion that Haitian Creole is like French and not like a typical Northern Italian dialect.19 I conclude that Haitian Creole does not allow referential null subjects. However, it is clear from examples like (43) that it does allow expletive null subjects (and see note 17).

(43) Te fè frèt.
     ant make cold
     ‘It was cold.’ (DeGraff (1993a, 72) ex. (2))

Also, Haitian Creole allows apparent violations of the C-trace filter, suggesting that it has expletive pro (following the analysis of comparable Italian data in Rizzi 1982, although free inversion of the Italian type is impossible; this must be due to the fact that whatever Case-checking process licenses postverbal subjects in Italian is unavailable in Haitian Creole):

(44) Ki moun ou kwè (ki) pa vini?
     who 2sg believe (comp) will come
     ‘Who do you think will come?’ (DeGraff (1993a, 80) ex. (43))

The condition in (35a), in combination with (33), allows any language to have expletive pro in Spec, AgrS. Hence we expect to find it in Haitian Creole. The question now becomes: why does English apparently not have expletive pro in Spec, AgrS? This may reduce to a simple matter of lexical variation: the inventory of pronouns in English contains expletive elements; that of Haitian Creole does not (DeGraff (1993a, note 3) points out that an overt expletive must appear with certain adjectives having clausal complements, e.g., “difficult,” “impossible,” etc.; following Bennis (1986), these look like correlative argument pronouns). The inventory of pronouns in referential null-subject languages seems consistently to lack weak subject pronouns. This has usually been accounted for in terms of the Avoid Pronoun principle, which is presumably some kind of PF Economy principle; I have nothing to add to such an account here. Expletive null subjects are also found in other creoles. DeGraff (1993a, 84) gives the following examples, among others, all of which have expletive null subjects:

(45) a. (A) (bi-) kendi/koto.
        it tns hot/cold
        ‘It was hot/cold.’ (Byrne 1987, 76)
     b. pro tawata jobe.
        past rain
        ‘It was raining.’ (Kouwenberg 1990, 46)



Such data are consistent with the Haitian Creole data presented by DeGraff, and with my proposed treatment of them. We see, then, that referential null subjects, to the extent that they depend on V-movement to AgrS, represent a marked parametric option. On the other hand, expletive null subjects are a lexical, rather than a parametric, option (in fact, it is arguable that pro is universally available but its distribution varies depending on the other expletive pronouns available in the language; in English, for example, it is arguably available in locative-inversion constructions (see Hoekstra and Mulder 1990)). Hence, if creoles tend to favor unmarked parameter values, we expect that referential null subjects will not be found in these languages, although expletive null subjects may be.

5.4.4 No Complement Clitics

This property is particularly striking in those creoles deriving from Romance languages with rich systems of complement clitics. Haitian Creole is again a good example because it derives from French. DeGraff (1994, 11ff.) gives the following contrasts with French:

(46) a. Bouqui l’aime.
        Bouqui 3sg-like
        ‘Bouqui likes him/her/it.’
     b. *Bouqui aime le/la.
        Bouqui like 3sg-m/3sg-f
(47) a. Bouki renmen li.
        Bouki like 3sg
        ‘Bouki likes him/her/it.’
     b. *Bouki li renmen.
        Bouki 3sg like

Haitian Creole is quite typical in this respect. There are various views on the nature of Romance clitics and cliticization processes. One view, due to Sportiche (1988) and elaborated by Rizzi (1993), is that Romance clitics undergo a combination of D- and DP-movement. In that case, some strong feature must trigger the movements, and hence cliticization is a marked property. Another view, elaborated by Sportiche (1996), is that Romance clitics are themselves heads that trigger movement of pro. Although Sportiche claims that pro can move either before or after Spell-Out, the system of pro-licensing in (35) implies that pro would have to move prior to Spell-Out (although overtly doubled elements could procrastinate, as Sportiche proposes). Hence in this system, too, clitics imply more movement than nonclitics. We do not need to choose between these two views of Romance clitics here; the important point is that both imply an overt movement operation for clitics that is not required where there are no clitics. So, on either view, complement clitics represent a marked parameter value, and we expect clitics of this kind to be lacking in creoles.

5.4.5 Preverbal Tense/Mood/Aspect Particles

This is another property that has often been noticed (see inter alia Bickerton 1981, 1984, 1988; Muysken 1981, 1988). Tense/mood/aspect (TMA) particles carry information that is typically indicated inflectionally in languages with richer morphology, including many lexifier languages. The particles themselves are usually related to auxiliaries of the lexifier language; for example, the Haitian anteriority marker te derives from either été or était, both past forms of French être ‘to be’.
Muysken (1981, 183) shows that TMA elements have the following properties:

• they occur adjacent to the first verb
• they cannot be fronted with the fronted verb in predicate clefts, as the following Papiamentu examples show:

(48) a. Ta ganja Wanchu a ganjabo.
        foc lie John asp lie-you
        ‘John has really lied to you.’
     b. *Ta a ganja Wanchu a ganjabo.
        foc asp lie John asp lie-you

• they only occur with the first verb in serial-verb constructions20

Muysken concludes that these elements are auxiliaries. Although seemingly correct, Muysken’s conclusion leaves important questions open. English auxiliaries, for example, show quite different syntactic and morphological properties internal to the class; it is generally assumed that modals and do are generated in I, whereas have and be are generated in syntactically lower V-positions. The main reason for this is that modals and do are always finite and always precede negation, whereas have and be can be non-finite and can follow negation (see section 5.2). What all English auxiliaries have in common is that they are athematic: these elements appear to lack argument structure. Have and be can also appear as “main verbs,” where they may have an argument structure, although this is rather unclear (see Pollock 1989 and Williams 1994 for discussion of this point from different perspectives). It is interesting to consider the Haitian Creole TMA markers from the perspective of English. In his discussion of these elements, DeGraff (1993a, 74–75) makes two important observations. First, negation always precedes all these markers:

(49) a. Jan pa t ava ale nan mache.
        Jan neg ant mood go in market
        ‘John would not have gone to the market.’
     b. Jan te (*pa) ava (*pa) ale (*pa) nan mache.
        Jan ant neg mood neg go neg in market

Second, most of the TMA markers are able to act as main verbs in isolation:

Pral, marking future, also means ‘to go’; dwe, marking obligation or possibility, also means ‘to owe’; fini, marking completion, also means ‘to finish’; konnen, marking habituality, also means ‘to know’; sòti, marking recent past, also means ‘to leave’; etc.
(DeGraff 1993a, 75)

It seems clear from the above that the TMA markers of Haitian Creole at least are auxiliaries comparable to English have and be, rather than to English modals and do (DeGraff also points out (note 14) that Saramaccan and Antiguan Creole are comparable to Haitian Creole in this respect, although Mesolectal Louisiana Creole has tense and mood markers that precede negation (DeGraff, personal communication)). However, an important difference from have and be is that the TMA markers never raise, consistent with the general absence of V-movement in Haitian Creole.

It has often been argued that auxiliaries like have and be should be treated as verbs heading their own VPs (this proposal goes back to Ross (1969) and has been adopted in more recent work by Chomsky (1986), Pollock (1989), Roberts (1993a), and many others). In that case, they are verbs that are at least able to be thematically defective, even if they may also have thematic structure when they appear as main verbs. However, if we suppose that functional heads are underspecified lexical heads,21 then these elements must, strictly speaking, be functional heads. The fact that these elements appear lower than negation indicates that they represent a layer of functional structure below Neg. A proposal of this type has been made for complex auxiliary structures by Cinque (1994). Notice that, if these elements are functional heads, then they are functional heads with weak features, because they do not trigger overt incorporation of lower verbs. This is of course consistent with my general proposal about creoles. These elements are licensed as LF affixes (when they have no argument structure; when they have an argument structure, they are licensed by assigning thematic roles, like all other verbs). One question remains: why are the TMA markers like have/be rather than like modals? The answer becomes clear if we consider the account of grammaticization proposed by Roberts (1993b) and Clark and Roberts (in progress). This theory of grammaticization postulates that three properties were required for the grammaticalization of a lexical item l of category L:

• l is in a category L which is always moved to a functional head F
• l is aprosodic
• l has a potentially athematic interpretation

Our analysis of the TMA markers of Haitian means that they clearly have the third of these properties. Also, Haitian TMA markers are readily subject to various processes of phonological reduction, which indicates that they are prosodically weak (Michel DeGraff, personal communication); notice the parallel with English auxiliary reduction. However, it is the first property that is relevant here: in order for a lexical category L to be grammaticalized as a functional item F that occupies a position above negation, there must have been movement from L to F at some prior stage in the history of the system. This is the case in English, for example, where the positions occupied by modals and do are a relic of the earlier V-movement system (see section 5.2 and Roberts 1993a, chap. 3). If we assume that creoles retain unmarked properties, then creoles lack V-movement; hence the conditions for creating auxiliaries like English modals have never obtained. We see, then, that the TMA system is a complex of functional heads with weak features. Above negation, there is at least one further functional head (depending on one’s assumptions about the position of Neg and the fine structure of C), which has weak V-features. All the features that are capable of variation thus have weak values in Haitian Creole, and as far as I am aware, Haitian Creole is quite typical of creoles generally in these respects. This is consistent with our general view of creoles and our theory of markedness. I must add a proviso to this discussion of TMA systems. Beginning with Bickerton (1975), there has been a great deal of discussion of highly specific regularities in these systems. Two principal claims have been made: (a) that the order is always precisely T–M–A; and (b) that there is a “Jakobsonian” (Bickerton’s term) system of markedness in operation, in that if T is phonologically realized, it is [+anterior], if M is phonologically realized, it is [+irrealis], and if A is phonologically realized, it is nonpunctual. The empirical basis of these claims has been disputed (see, for example, Muysken 1981 and the response in Bickerton 1981), and I do not wish to enter this controversy here. As it stands, my proposal for markedness makes no prediction about the finer structure of the TMA system. Further research on clause structure may yield some such prediction. The Jakobsonian markedness may be a reflection of markedness properties of the lexicon; alternatively, it may show “absorption” of the features of functional heads in the sense of Roberts and Shlonsky (1996). One potential problem arises from our discussion of TMA markers. It could be taken to imply that creoles have reached a steady state, at least as far as V-features are concerned. This is so because the main mechanism by which strong features are created diachronically is grammaticization, and we have seen that this process depends on the prior existence of L-to-F movement, that is, in the verbal domain, the existence of strong features. It is possible that this is a correct result: creoles seem to be subject to a great deal of change, and yet the presence of TMA markers characterizes creoles of differing ages and in widely scattered parts of the world.
A maximally unmarked V-system may be particularly resistant to change. However, I do not want to say that change is impossible. Phonological factors (essentially prosodic reductions), a change in the negation system obscuring the boundary between the two functional layers, or the development of referential null subjects by the grammaticization of subject pronouns as AgrS (see note 19) could lead to the development of strong V-features in the higher functional domain. Hence we can conclude that creole TMA systems are robust, owing to their unmarked nature, but they are not a steady state. I conclude that it is possible to analyze the shared syntactic properties of creoles that are reasonably well understood in terms of the idea that they are reflexes of weak feature values of functional heads (i.e., as unmarked parameter settings).22 So, I maintain the view put forward by Bickerton (1984) that creoles tend to have unmarked values for parameters. It is reasonable to speculate that this is due to the peculiar circumstances of creole genesis: if the trigger experience is made up largely of pidgin, then morphological triggers will be wholly lacking and syntactic triggers defective. The learner will then employ the elegance property in such a way that the simplest possible representations, that is, those with the minimum amount of movement, will be selected. The result is that overt movement will be lacking, and creoles will have weak values for all or most parameters. In this precise and rather limited sense, Bickerton’s claim that creoles can tell us something special was right. However, this does not mean, pace Bickerton, that creoles give us a unique insight into the workings of the language faculty.23 Creoles, even if they truly have all parameters set to the weak value, are still just instantiations of what UG in combination with the learning device creates as possible variation. Moreover, languages other than creoles may have similar constellations of weak features: English, for example, has no main-verb movement to the higher functional domain, surface SVO order, no referential null subjects, no complement clitics, and a system of preverbal TMA markers somewhat similar to those found in creoles. Furthermore, all of these properties, except for the null-subject one, have been innovated in the recorded history of the language (and the null-subject parameter must have changed since Indo-European, if not since Proto-Germanic). What gives us a privileged view of UG, and of the nature of the parameter-setting algorithm, is not creoles but language change. Creoles are particularly interesting because they represent an extreme of language change, but it is the mechanisms of language change, which are ubiquitous in the history of every language and every language family, that have made creoles what they are.

5.5 Conclusion

In this chapter, I have presented a number of the ramifications of a well-known parameter: the one that determines whether a finite V moves to I. After seeing the initial motivation for positing this parameter (essentially the English/French contrasts originally discussed by Emonds 1978), we saw the evidence that this parameter changed its value in sixteenth-century English. I gave an account of this change in terms of a theory of markedness based ultimately on the elegance condition of the learner proposed by Clark and Roberts (1993 [this volume, Chapter 2], in progress) and the notions of morphological and syntactic expression of a parameter. We next considered the root-infinitive phenomenon of child language, discussed by Rizzi (1994) and Wexler (1994). Here, I argued that these facts could be understood in terms of an early attempt to set the V-to-I parameter to the weak value in languages where the morphological trigger for the parameter is not salient. Finally, we considered the nature of creoles and creolization, and I suggested that Bickerton’s (1984) conjecture that creoles tend to instantiate unmarked values of parameters is correct, once viewed from the perspective of the theory of markedness proposed here. In the case of V-to-I movement, this implies that creoles typically lack V-movement, which appears to be correct. I also suggested that the possibility of referential null subjects is intimately connected to the V-to-I movement parameter.

If the ideas in this paper are anywhere near correct, then the next step for research is to extend this kind of reasoning to other parameters. In that way, a truly general theory of crosslinguistic variation will emerge.

Notes

Parts of this material have been presented in lectures at the Universities of Paris, Florence, Venice, Oxford, and Wales. I’d like to thank the audiences at those presentations for their comments. I’d also like to thank Robin Clark, from whom the ideas about learning theory are derived, and the students in my Seminar on Language Change at the University of Maryland, especially Jairo Nuñes, who suggested that I look at creoles. Thanks also to David Lightfoot and Juan Uriagereka for interesting discussions and much more. Finally, my grateful thanks to Derek Bickerton and Michel DeGraff for very helpful comments on an earlier draft. All the errors are mine.

1. For the purposes of exposition, I adopt the “split-I” version of clause structure, originally proposed by Pollock (1989), only where necessary. Elsewhere, I, IP, and so forth are used as cover terms for the functional system associated with the verb, although I recognize that this hides a more complex reality.

2. The straightforward implication that a language allows inversion only if it has the French-style orders in (1a), (2a), and (3a) does not hold. The Mainland Scandinavian languages (Swedish, Danish, and Norwegian) are verb-second in root clauses and pattern like English with respect to the Pollock/Emonds tests in embedded clauses (Platzack 1987; Platzack and Holmberg 1989; Vikner 1995). If at least some V2 clauses involve V-to-I movement, as proposed by Travis (1984) and Zwart (1993), then the generalization can be maintained. The absence of French-style patterns in the equivalents of (1), (2), and (3) then becomes a question of the trigger of V-to-I movement. See also note 5.

3. In fact, Pollock notes that main-verb infinitives can undergo “short” movement over VP adverbs but not over negation:

(i) A peine comprendre/comprendre à peine l’italien après cinq ans d’étude est une honte.
    hardly to-understand/to-understand hardly Italian after five years of-study is a disgrace
    ‘Hardly to understand/to understand hardly Italian after five years of study is a disgrace.’

This led to the postulation of two landing sites for V-movement and hence the Split-I Hypothesis.   4. Chomsky accounts for have/be raising with the suggestion that these elements cannot obey the Procrastinate principle because they lack an LF interpretation and so must be eliminated prior to that level. I will not adopt this account but return to this matter briefly below.   5. Because Swedish has always been a V2 language, I moves to C in most main clauses; this is clearly a separate parameter from the V-to-I one, although they interact in that verb second triggers V-to-I independently of any property of C, suggesting that verb second is at least in part a property of I, although it must be a distinct one from that governing V-to-I movement of the kind we are concerned with here. See Zwart 1993 for an analysis of verb second that treats it as partially an I-property. These considerations do not alter the fact that inversion (of the “residual V2” kind found in English and French) interacts with V-to-I movement. This can be seen in the history of English, where the loss of

174  Ian Roberts orders like (6a,c,d) patterns with the loss of inversion in interrogatives of the kind in (6b). French, of course. retains these patterns. English used to be a V2 language, but this property was lost around 1400, well before the change being discussed in the text (see Van Kemenade 1987).   6. On the other hand, Rohrbacher predicts that Dutch and Afrikaans lack verb movement whereas German has it, and Roberts predicts that all the West Germanic languages have it. If these languages have a final I, as both Roberts and Rohrbacher assume, it is not clear if any empirical differences follow from this. If, on the other hand, they have medial I, as argued by Zwart (1994), then differences are predicted. Both Roberts’s and Rohrbacher’s proposals are incompatible with Zwart’s analysis of Dutch clause structure and its natural extension to German. However, the adoption of a richer clause structure for West Germanic opens up the possibility again of detecting differences. One relevant consideration is that German allows more leftward DP-movement than Dutch; in Chomsky’s (1993) terms, this can be connected to German having more V-movement than Dutch. There are many open questions here, however, and to go into them fully would take us too far afield.   7. Strong features are marked wherever there is a choice. If there exists a functional head that always and only has strong features (for example, Watanabe (1993) has argued that wh-operators always have strong features), then these features would not count as marked. The crucial notion here is relative complexity.   8. A possible alternative, which retains the Clausal Truncation Hypothesis without postulating maturation of (18), is to say that CP is lacking at this stage of acquisition. Once CP is acquired, (18) comes with it, and root infinitives are banished. In order to avoid invoking maturation all over again by saying that CP matures, such an account would have to say that CP is not triggered early on. 
However, as (19) shows, wh-questions are found at the root-infinitive stage, so this stage and CP coexist, which makes it difficult to see how root infinitives could be due to the absence of CP at this stage of acquisition. One could perhaps claim that there are two grammars at this stage: one with CPs and no root infinitives, and one with root infinitives and no CPs. But how, in that case, is CP not triggered in the root-infinitive system? It is hard to see an answer to this that would not appeal to maturation. (Thanks to Derek Bickerton for raising this possibility.)

9. It seems clear that UG allows negative items to be realized either as XPs or as heads, and that double-barrelled negation featuring both a head and an XP is allowed (the best example of this last situation is of course French, although Middle English, Old High German, Welsh, Breton, and numerous Northern Italian dialects also have a ne . . . pas type of negation). The XP is a specifier of NegP and the head a Neg0. Acquirers can distinguish them on the basis of their interactions with movement, particularly head movement. See Pollock 1989 and Belletti 1990.

10. It is obviously syntactic in part, but there may be a morphological trigger in some varieties in the form of complementizer agreement (see Zwart 1993). For further remarks on the trigger for V2, see Clark and Roberts 1993 (this volume, Chapter 2). See also note 5.

11. By contrast, the monogenesis theory of creole origins holds that all creoles are essentially derived from a Portuguese-based creole that was widely spoken in the fifteenth century (see Thompson 1961; Todd 1974). However, even if the monogenesis theory is correct, creoles have had 600 years of quite independent development in geographically dispersed locations, accompanied, in every non-Portuguese-based creole, by massive relexification.
The Romance languages in 1100, separated from a common parent by 600 years of development (but with more contact and less relexification than creoles), show a number of striking typological differences: some are V2, some are not; some have residual morphological Case, some do not; some have scrambling, some do not; and so on. Monogenesis alone is, in my view, very unlikely to explain why creoles should occupy such a small space of the available variation; in this sense, the monogenesis theory is irrelevant to the explanation of the morphosyntactic properties of creoles.

12. Mühlhäusler (1986) gives one or two counterexamples to the claim that all creoles are SVO. However, these may be pidgins rather than creoles. Sri Lankan Portuguese creole is SOV (Smith 1977, cited in Romaine 1988, 40). However, as Romaine says, “Portuguese influence was removed in 1658, rather early in the development of the creole. This meant that the substratum languages provided the input during the creolization phase” (ibid.). The substratum languages are Sinhala and Tamil, both rigidly OV. These languages may thus have provided a syntactic trigger for OV order.

13. I am assuming that AgrS triggers raising to its specifier, and not, pace Chomsky (1993), that T does this; the relevant generalization could be reformulated in terms of Chomsky’s exact assumptions, but I retain this formulation for clarity of exposition.

14. The statement in (33) does not explain the apparent absence of object expletives generally, but it does explain the prevalence of subject expletives. The fact that object expletives are not found, if it is a fact, suggests that there is more to say on the general question of Agr’s N-features.

15. This proposal implies that in VSO languages the subject raises to [Spec, AgrSP] and the verb raises higher. I believe that such an analysis is correct, but to go into this matter would take us too far afield here.

16. Derek Bickerton suggests an account of why creoles may have subjects in [Spec, AgrSP] that does not rely on (33). Pidgins do not have consistent word-order patterns because they do not have syntax. However, they show a statistical tendency towards Topic-Comment order. Because Topics are typically agents, and agents are always subjects, creolizers may reanalyze Topics as subjects. If the Topics precede everything else, then the subject would be analyzed as occupying [Spec, AgrSP]. In other words, the tendency towards Topic-Comment order in the pidgin would act as a syntactic trigger for the marked N-feature of AgrS. This account is quite plausible (although one would need to demonstrate that pidgins always have a tendency towards Topic-Comment order). However, the facts about noncreoles mentioned in the text suggest that (33) is needed anyhow, and so I tentatively retain that account.

17. Several remarks are in order here. First, Michel DeGraff (personal communication) tells me that Mauritian Creole allows null subjects:

(i) e pu repar sa sime la dimen.
    (we) mod repair det road det tomorrow
    ‘We will repair this road tomorrow.’

See Syea 1985 and Adone 1994. Adone suggests that such null arguments are variables bound by empty topics rather than occurrences of pro. If this is correct, then the claim in the text can be maintained. Second, Derek Bickerton raises the problem of tensed serial-verb constructions (SVCs), giving the following paradigm from Seychelles Creole, a French-based Indian Ocean creole:

(ii) a. Zot pran balye koko bat Kazer.
        they take broom coconut hit Kaiser
     b. Zot ti pran balye koko ti bat Kazer.
        they ant take broom coconut ant hit Kaiser
     c. Zot ti pran balye koko zot ti bat Kazer.
        they ant take broom coconut they ant hit Kaiser
        ‘They hit the Kaiser with a coconut broom.’

Example (iib) looks like a clausal structure, because it contains a TMA marker and can contain an overt subject (as in (iic)). It is not a coordinate structure, because extraction is possible. The only possibility is to assimilate (iib) to (iia) and treat it as an SVC despite the presence of the TMA element (below I will in fact suggest that creole TMA markers in general are “low” functional elements that have more in common with verbs and are directly comparable to English have and be). I have nothing to offer on the subject of serial verbs here, however. Bickerton also points to Hawaiian Creole English examples like the following:

(iii) a. She bring food for I eat.
      b. She bring food for eat.
      c. *She bring food for me (to) eat.

The ungrammaticality of (iiic) shows that Hawaiian Creole English differs from English in that for is unable to assign Case to a following subject. And (iiia) shows that Hawaiian Creole English differs from English in that for can have a tensed complement. However, we can take (iiib) to be an instance of an infinitive complement to for containing a PRO subject that is controlled by the matrix subject. Examples (iiia) versus (iiib) thus parallel English (iva) and (ivb), respectively:

(iv) a. She brings food in order that I eat.
     b. She brings food in order to eat.

If She_i bring food for she_i eat is bad in Hawaiian Creole English, then this parallels the mysterious impossibility of (va) versus (vb) in French:

(v) a. *Je veux que je parte.
        I want that I go (subjunctive)
    b. Je veux partir.
        I want to-go
        ‘I want to go.’

None of the above instances provides clear evidence of null subjects in creoles.

18. I have nothing to say about the difficult question of null arguments in languages that entirely lack overt agreement (e.g., Chinese, Thai, etc.). Presumably, (35b) should be extended to include some general notion of “recovery from the syntactic context” (suggested to me by Bickerton, personal communication). See Huang 1989 for a proposal.

19. Poletto (1993) discusses a number of Central Veneto dialects (Venetian, Paduan, and Trevisan) in which subject clitics show the French pattern of non-co-occurrence with an overt nominal subject. These dialects show other null-subject properties, notably free inversion. Moreover, the subject clitics follow preverbal negation and are obligatory in all conjuncts of coordinated predicates (see Rizzi 1986b). Both of these properties contrast with standard French and lead Poletto to propose that the subject clitics are not in subject position but are Agr elements that license pro. However, these clitics are argumental, hence no subject argument can co-occur with them without violating the Theta Criterion. It is possible that the Haitian Creole subject pronouns that we are discussing here are like this. Note that this conclusion would not undermine the point being made in the text, because only expletive null subjects are allowed in this kind of system, and my claim is that referential null subjects are connected to a marked parameter setting. We might thus expect to find null-subject systems comparable to Central Veneto in creoles. However, note that we do not have the principal independent indicator of null-subject status in Haitian Creole, namely free inversion. Haitian Creole pa is preverbal, but it arguably occupies a position comparable to French pas; see DeGraff 1993b for arguments that Haitian pa, unlike French pas, heads the negation phrase, NegP. In any case, one reason why pa precedes the verb in Haitian Creole is that the verb does not move, as we have seen; in French and in most Northern Italian dialects, on the other hand, the verb does move. In the absence of ne-type preverbal negation in Haitian Creole, the only indicator of the position of subject pronouns is whether they are obligatory in all conjuncts of coordinated predicates. Michel DeGraff (personal communication) informs me that this is tendentially the case, although not an absolute requirement:

(i) Li vini, *(li) manje epi *(li) ale.
    3sg came 3sg eat and 3sg leave
    ‘S/He came, (s/he) ate and (s/he) left.’

DeGraff adds: “However, this ‘constraint’ is often not obeyed in the speech and writings of Haitian Creole/French bilingual speakers—turning the stars in the examples above into question marks.” It is thus unclear whether this diagnostic reliably indicates the status of the subject pronouns here.

20. Michel DeGraff (personal communication) informs me that Saramaccan is a counterexample to this generalization. In this creole, a tense marker can appear with both the first and the second verb. See Byrne 1992. Derek Bickerton (personal communication) points out that the same is true in Seychelles Creole, as illustrated by example (iib) of note 17.

21. The characterization of functional heads is a very complex matter that is central to current theory (and to the view of parameters adopted here). The one suggested in the text has the merit of simplicity, at least. “Underspecification” relates to the interfaces: functional heads are typically athematic and aprosodic (weakly stressed and often showing less-than-minimal prosodic structure; see McCarthy and Prince 1986). Notice that, if parametric variation is the result of underspecification (Uriagereka 1988), then it follows that functional heads are the locus of parametric variation. Clark and Roberts (in progress) extend this idea by saying that parametric variation is imposed by the learner on an underspecified aspect of UG. Then the entire theory of parameters can be deduced from the statement that categories can be underspecified.

22. The above is a somewhat sketchy and second-hand overview of a number of salient syntactic properties of creoles. Bickerton (1981, chap. 2) lists a number of further properties as characteristic of creoles. Here I briefly list, exemplify, and comment on those which are most relevant to the present concerns:

(i) No movement in yes/no questions
    Yo pa-t-a-vlé mênê-m lakay-li. (Haitian Creole)
    they not-tns-mon-want take-me house-his
    ‘They wouldn’t have wanted to take me to his house.’/‘Wouldn’t they have wanted to take me to his house?’ (Bickerton 1981, 70)

This property fits straightforwardly with our claims (it is also discussed by Mühlhäusler (1986, 156)); inversion is I-to-C movement, triggered by a feature of C (see Rizzi 1991 and section 10.2), and as such is a marked property. Hence we do not expect to find it in creoles.


(ii) Same “verb” for existential and possessive constructions (as distinct from locatives)
     Gê you fâm ki gê you pitit-fi. (Haitian Creole)
     have one woman who have one child-daughter
     ‘There is a woman who has a daughter.’ (Bickerton 1981, 66)

This property is very interesting in the light of Kayne’s (1993) account of auxiliaries. Suppose that the existential auxiliary is Kayne’s “archi-auxiliary” BE; to form a possessive auxiliary, a functional head in the complement of BE must raise to BE. Hence there must be movement, and so the presence of distinct existential and possessive auxiliaries is a marked property. We therefore do not expect creoles to have them. (We cannot explain the systematic distinction with the locative auxiliary, however.)

(iii) Wh-fronting
      Wisaid yu bin de? (Guyanese)
      which-side you tns be-loc
      ‘Where have you been?’ (Bickerton 1981, 70)

Watanabe (1993) has argued that syntactic wh-movement is universal (i.e., that the relevant “N-feature” of C is universally strong); in that case, the presence of wh-movement in creoles is not a reflection of a variant and so cannot be a marked property. Here, the reasoning essentially parallels what I said above in connection with subject-raising out of VP.

(iv) Negative concord on subjects (see also Déprez 1999)
     Non dog na bait nan kyat. (Guyanese)
     ‘No dog bit any cat.’ (Bickerton 1981, 66)

(v) Lack of subject relative pronouns
    Wan a dem a di man bin get di bam. (Guyanese)
    ‘One of them was the man who had the bomb.’ (Bickerton 1981, 62)

We cannot account for the phenomena in (iv) and (v), although it seems clear that they are independent of verb movement. Many of the other properties that Bickerton originally claimed to be common to creoles have since been argued either not to be common to them or to be due to substrate influence, and hence I will not consider them here.

23. Bickerton (personal communication) has responded to this point by underlining that creoles always grammaticize certain functional elements (TMA markers and oblique Case markers) when these have been lost from the original language because of pidginization, and only much more sporadically grammaticize other parts of the functional system (e.g., relative pronouns). Thus, he claims, creoles “spell out the minimal amount of morphology required for the principles of UG to work.” This point seems to be connected to the issue of “Jakobsonian markedness” raised above: I have nothing to say at present as to why certain properties are overtly realized by grammaticalized items and others not. I agree (a) that these elements are morphology (LF affixes; see above) and (b) that they are “minimal,” precisely in the sense that they are LF affixes and not syntactic or lexical affixes, and so they do not induce the complexity of overt movement. However, it is not clear to me that other processes of language change do not reveal the same propensity for grammaticization of certain features of functional heads. Nor is it clear that, in any sense beyond that of instantiating weak values of parameters, these properties indicate that this is the minimum needed for UG.

Theoretical Consequences Anna Roussou and Ian Roberts

6.0 Introduction

In the Introduction and Chapter 1 of Roberts & Roussou 2003 (R&R henceforth) we provided the theoretical framework that underlies our approach to grammaticalisation, focusing on issues pertaining to language change and its relation to acquisition and the nature of parameters. Within this setting we proposed that grammaticalisation can be seen as the result of upward reanalysis affecting a subclass of lexical items. As such, its effects in the grammar can be explained and indeed predicted without postulating a distinct process or mechanism of change. The empirical evidence for our approach was given in Chapters 2, 3, and 4 of R&R, where we focused on the grammaticalisation of T, C, and D elements, respectively. Some of the cases we considered have been treated as typical examples of grammaticalisation to the extent that they involve lexical-to-functional reanalysis (for example, the cases in Chapter 2 of R&R), while others have not been considered as such, partly because they involve functional-to-functional reanalysis (e.g., most of the cases in Chapters 3 and 4 of R&R). In this chapter, we return to the theoretical issues raised in the Introduction and in Chapter 1 of R&R. Our goal here is to elucidate these as far as possible in the light of the cases of grammaticalisation analysed in Chapters 2–4 of R&R. We identified three main questions as themes in the Introduction of R&R: (i) the ubiquity of grammaticalisation: why is this kind of change so common? (ii) the apparent conflict between a descriptively adequate analysis of grammaticalisation, which amounts to identifying pathways of grammatical change, and an explanatorily adequate account of syntactic change as parametric change, which predicts random oscillation among possible UG instantiations; (iii) the inventory and nature of functional categories. In this chapter we will consider each of these issues in turn.
In brief, we will argue (i) that grammaticalisation is so common because it represents a natural form of endogenous change; (ii) that the conflict between description and explanation in diachrony can be resolved by introducing a notion of markedness into the parametric system; and (iii) for an account of the nature of functional categories which takes them to be inherently deficient in their interface properties. The crucial property of functional categories is that they are only fully defined in relation to the syntactic system; in all other respects they are defective. Before going into these questions, however, we first review and systematise the cases we have looked at. This will give us a useful synthesis of the results of the earlier chapters of R&R.

6.1 A General Characterisation of Grammaticalisation

In this section we list all the empirical cases discussed in R&R, providing a schematic representation of the relevant structural changes. We then identify the parametric change and, where possible, the cause of the change as well. The cases considered are summarised as follows (here F* denotes the phonological realisation of F, a functional head, and the subscript “Merge” and/or “Move” indicates the mechanism of realisation):

(1) English modals (2.1 of R&R):
    i. Structural change: [TP V+T [VP tV TP]] > [TP T VP]
    ii. Parametric change: T*Move > T*Merge
    iii. Cause: loss of infinitive marker
(2) Romance future/conditionals (2.2 of R&R):
    a. i. Structural change: [TP [VP XP thabeo [T habeo]]] > [TP XP [T habeo]]
       ii. Parametric change: T*Move > T*Merge
       iii. Cause: morphological irregularity/meaning of habeo
    b. i. Structural change: [TP XP [T habeo]] > [T Infinitive [T habeo]]
       ii. Parametric change: T*Move > T*Merge/Move
       iii. Cause: reduced productivity of leftward XP-movement (“weakening” of OV)
    c. i. Structural change: [TP [T V+Af] [VP tV]] > [TP [T V+Af] [VP tV+Af]]
       ii. Parametric change: T*Merge/Move > T*Move
       iii. Cause: loss of trigger for T*Merge/Move (e.g. mesoclisis)
(3) Greek tha (2.3 of R&R):
    a. i. Structural change: [TP V+T [VP tV TP]] > [TP T+V1 [VP tV1 + V2]
       ii. Parametric change: T*Move > T*Merge
       iii. Cause: loss of infinitival morphology

    b. i. Structural change: [MP T+M [TP tT VP]] > [MP M [TP V+T [VP tV]]]
       ii. Parametric change: M*Move > M*Merge, T*Merge > T*Move
       iii. Cause: reanalysis of impersonal thelei
    c. i. Structural change: [MP the [TP [VP [CP na +Vlexical]]] > [MP tha [TP V+T [VP tV]]]
       ii. Parametric change: loss of C*, T*Merge > T*Move
       iii. Cause: reanalysis of thelei+V in the presence of na
(4) Greek na (3.1 of R&R):
    i. Structural change: [CP C [MP hina [TP . . . > [CP oti/na [MP tna [TP . . .
    ii. Parametric change: C > C*Move (?)
    iii. Cause: loss of subjunctive morphology, reassignment of Mood features from T to M
(5) Calabrian mu (3.2 of R&R):
    i. Structural change: [CP AdvP C [NegP (Neg) [MP M [TP . . . > [CP C [NegP (Neg) [MP M [TP . . .
    ii. Parametric change: none
    iii. Cause: loss of subjunctive morphology, reassignment of Mood features from T to M
(6) English to (3.3 of R&R):
    i. Structural change: [PP to [DP V+enne]] > [VP V [CP [MP to [TP [T V+enne]]]]]
    ii. Parametric change: M > M*Merge
    iii. Cause: loss of infinitives/subjunctives
(7) Germanic that; Greek pou (3.4 of R&R):
    i. Structural change: [CP Proni [C (Prt)] [IP . . . ti . . . ]]] > [CP [C that (+Prt)]
    ii. Parametric change: C*Move (Merge in Germanic) > C*Merge
    iii. Cause: ambiguity of relative clauses
(8) Serial verbs becoming complementisers (3.5 of R&R):
    a. i. Structural change: [CP C [TP T [VP1 V1 [VP2 V2]]]] > [CP C [TP [T V1] [VP2 V2]]]
       ii. Parametric change: T > T*Merge
       iii. Cause: unknown

    b. i. Structural change: [CP C [TP [T V1] [VP2 V2]]] > [CP [C V1] [TP T [VP2 V2]]]
       ii. Parametric change: T*Merge > T, C > C*Merge
       iii. Cause: unknown
(9) Romance determiners (4.1 of R&R):
    i. Structural change: [DP [DemP ille] D . . . > [DP [D (il)le]]
    ii. Parametric change: D[+def] > D[+def]*
    iii. Cause: loss of morphological case-marking on DP
(10) French n-words (4.2.2 of R&R):
    i. Structural change: [DP [D Ø] [NumP [Num rien] [NP trien]]] > [DP [D Ø] [NumP [Num rien] NP]]
    ii. Parametric change: Num*Move > Num*Merge
    iii. Cause: loss of null indefinite D
(11) French Stage-Two negation of Jespersen’s Cycle (4.2.3 of R&R):
    i. Structural change: V [DP mie/pas/point ([PP de DP])] > V [Neg mie/pas/point] [VP ([DP Ø de NP])]
    ii. Parametric change: (low) Neg > Neg*Merge
    iii. Cause: loss of non-negative content of negator, reanalysis of de-phrase (not pas)
(12) Greek oudhen > dhen (4.2.3 of R&R):
    a. i. Structural change: [DP ou [NumP de [NP hen]]] > [DP dhen [NumP [NP] ]]]
       ii. Parametric change: Num* > Num
       iii. Cause: loss of ou due to phonological change
    b. i. Structural change: [NegP [DP oudhen] Neg [MP M [. . .]]] > [NegP [Neg dhen] [MP M . . .]]]
       ii. Parametric change: Neg*Move > Neg*Merge
       iii. Cause: loss of ou due to phonological change
(13) Greek ti(s) (4.3 of R&R):
    i. Structural change: [DP DØ [NumP ti [NP N]]] > [DP ti [NumP Num [NP N]]]
    ii. Parametric change: D > D*Merge, Num*Merge > Num
    iii. Cause: development of determiners
(14) Greek existentials (4.3 of R&R):
    i. Structural change: [QP Q [DP pjoios [NumP [NP N]]] > [QP ka [DP pjoios [NumP Num [NP N]]]]
    ii. Parametric change: Q > Q*Merge
    iii. Cause: (13)
(15) Free relatives > free-choice indefinites (4.4 of R&R):
    i. Structural change: [DP [D qui(s) [D 0 ] [CP [DP t ] C [IP vis tDP ]]] > [DP [D quivis ] NP ]
    ii. Parametric change: D*Move > D*Merge
    iii. Cause: phonological reduction
(16) Free relatives > universals (4.4 of R&R):
    i. Structural change: [QP [Q quis [Q que]] [DP D [CP [DP tquis] C [IP . . . V . . .]]]] > [QP [Q quisque] DP]
    ii. Parametric change: Q*Move > Q*Merge
    iii. Cause: phonological reduction
(17) Northern Italian subject clitics (4.5 of R&R):
    i. Structural change: [PersP DPi [Pers V] [NumP . . . [VP ti . . . > [PersP [Pers D [NumP [Num V] . . .
    ii. Parametric change: Pers*Move > Pers*Merge
    iii. Cause: paradigm levelling and split
(18) Welsh agreement (4.6 of R&R):
    i. Structural change: [Agr V + D [YP . . . > [Agr V [YP . . .
    ii. Parametric change: X*Move/Merge > X*Move
    iii. Cause: V-movement

In the structures given in (1)–(18) we can identify a number of common properties and, as we will show, immediately reduce them to a single pattern, which we identify as structural simplification (in a sense to be made clear below). Let us proceed by grouping the changes given above. The first pattern we identify is that prototypically exhibited by English modals, the Romance future, Greek tha, and the serial verb constructions: movement is lost and a new exponent is created for the higher functional head, which corresponds to the earlier target of movement. The same pattern is found with changes inside the DP, for example: a lower head (Dem, Num, or even N) moves to a higher functional head, such as D or Q; movement is then lost, and the originally moved item is reanalysed as the exponent of the higher head. This is in fact the pattern we identify in the structures in (1), (2a), (3), (8)–(10), (12a), (13), (15).
In other words, the lexical item that formerly realised a lower head has now become the realisation of a higher functional head. This can be schematically represented as in (19):

(19) [XP Y+X [YP tY]] > [XP Y=X [YP Y]]

What the structure in (19) essentially tells us is that the lexical item that at some point realised both X and Y now becomes the realisation of X. This yields the possibility of a new realisation for Y. The second group comprises the changes that gave rise to modal particles, such as na and mu, discussed in Chapter 3, Sections 3.1 and 3.2 of R&R, respectively. As argued in the relevant sections, these changes were triggered by the loss of subjunctive morphology. Recall that the subjunctive was morphologically realised by a series of agreement affixes which differed from the indicative series. Once the two paradigms collapsed into a single paradigm (the indicative), the subjunctive features came to be realised in M and took the form of a distinct lexical item, while the different readings associated with the subjunctive are derived from the combination of the particle and the different forms of the finite verb. This change can be schematised as follows:

(20) [XP XF . . . [YP . . . YF . . .]] > [XP XF . . . [YP . . . Y . . .]]

This change is very similar to that in (19). In fact, to the extent that X’s content is exhausted by F, it has the same outcome, namely [XP Y=X [YP Y]]. The only clear difference between (19) and (20) is that in the latter case it is the features associated with Y that become part of X, not Y itself. In a more abstract sense, though, the relevant structure is changed in the same way. Furthermore, in both cases the reanalysis has created new exponents for Y (namely na from hina and mu from modo). This change is also relevant for the development of to, summarised in (6): the reanalysis was triggered by the loss of infinitival and subjunctive morphology (as in Greek and Calabrian), creating a new exponent for the realisation of these features, namely M* = to.
The third group is somewhat more complex: it involves two steps and covers all the remaining cases, in which a lexical item associated with the realisation of the DP becomes the realisation of a functional head in the clause structure. The two steps proceed in parallel to some extent, as the changes inside the DP can be taken as responsible for the realisation of the clausal functional heads (and vice versa). The first step involves movement of a DP to a higher functional projection, yielding a specifier. The second step involves reanalysis of this DP as a head. This is what we find in the development of n-words as negative morphemes (French pas, Greek dhen), and in the reanalysis of pronominals as Agr heads (subject clitics in Italian, agreement affixes in Welsh, and possibly Indo-European) (the structures in (2b), (11)–(12b), (17)–(18)). The structural change is schematically given in (21):

(21) [XP YP X . . . [. . . tYP . . .]] > [XP Y=X . . . [. . .]]

Once again the structural change has created a new exponent for X. A similar account extends to the examples in (15) and (16) (universal quantifiers out of relative clauses), although the difference here, following Kayne (1994), is that the moved element is already a head that adjoins to another head. The final case that follows this pattern is that of the complementisers that and pou, given in (7) and (8): a DP element moves to SpecCP and is reanalysed as C. (14) also comes under this schema, assuming CG kan was originally an XP. This covers all the changes listed in (1)–(18) except for (2c) and (3c). In (3c), however, the crucial change really involves the reduction of the biclausal structure to a monoclausal one; the lexical verb moves to the T-position of its clause both before and after the change, as finite verbs have done at all periods of Greek (recall that Greek is a null-subject language). Leaving this aside, (3c) amounts to the loss of a piece of structure. (2c) is a rather special case, which we will return to when we discuss markedness in the next section. As the preceding discussion shows, it is possible to reduce the structures in (1)–(18) to the three basic configurations given in (19)–(21). In fact, we immediately observe that the structures in (19)–(21) are one and the same thing, given in (22):

(22)


… Y

YP (where YP does not have to be the complement of X)

In all cases the reanalysis gives rise to a new exponent for a higher functional head X; this is the formal correlate of grammaticalisation. What is even more interesting is that the changes described above may have started from different configurations and been driven by different causes. It is intuitively clear that all of the schemata in (19)–(22) involve structural simplification, in that the structures on the right of the arrow are less elaborate than those on the left. We have mentioned this point repeatedly in our exposition of the individual cases in the preceding chapters. But how exactly is relative simplicity determined? In principle, several formal options are available in syntactic representations or derivations: one could count nodes, branching nodes, traces, chain links, symbols, or features. Counting nodes yields the correct result in (19) and (21), but not in (20): here the conservative (left of the arrow) and the innovative (right of the arrow) structures have the same number of nodes. The same holds if branching nodes are taken as the diagnostic. Likewise, if either traces (or copies) or chain links are taken as the diagnostic, the innovative structure is simpler than the conservative one in (19) and (21), but not in (20). As for counting symbols or features, (20) again poses problems, in that it is not clear that the reanalysed structure has fewer of either than the conservative one.

Theoretical Consequences  191

We must therefore go a step further and provide an account which is more in accordance with our notion of parametric variation. The simplification that takes place in all the cases of grammaticalisation discussed so far correlates with the morphological realisation of features. Prior to reanalysis, what we find is one lexical item α spelling out the features of two (or perhaps more) heads X and Y. What is at stake, then, is the earlier item Y becoming a pure instantiation of the feature content of the relevant head X. This works in the case of loss of movement, as in (19): Y must have an X-feature in order to move. It works in cases like (21), as the original YP (which presumably had more than one feature by virtue of being an XP; if not, it reduces to case (19)) becomes X. Most importantly in the present context, given our discussion above, it works in the case of (20), where F becomes the sole instantiation of X, having previously been syncretised on Y. (Recall that the examples of (20) all involve mood markers appearing in M, where previously mood had been part of the verb morphology; we can think of the earlier system as involving an Agree relation between M and V (or perhaps T) for mood features, and of the change as essentially the loss of Agree.) The relevant notion of simplicity is determined by the following simplicity metric:

(23) A structural representation R for a substring of input text S is simpler than an alternative representation R’ iff R contains fewer formal feature-syncretisms than R’.

Feature syncretism can be defined as the presence of more than one formal feature in a given structural position: H [+F, +G . . .]. Thus the structure with the fewest occurrences of multiple features on single positions is the simplest. Structural simplification should be understood in terms of the PF-realisation of these features, so a lexical item which realises X and Y is more complex than one which realises X only.
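The metric in (23) can be made concrete with a small sketch. Purely for illustration, we assume a hypothetical encoding in which a representation is a list of structural positions, each paired with the set of formal features realised there; the names Position, syncretisms and simpler, and the toy feature labels, are ours and are not part of the theory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """A structural position (head) and the formal features realised there.

    Hypothetical encoding, for illustration only.
    """
    label: str
    features: frozenset


def syncretisms(representation):
    """Count positions hosting more than one formal feature: H[+F, +G ...]."""
    return sum(1 for pos in representation if len(pos.features) > 1)


def simpler(r, r_prime):
    """(23): R is simpler than R' iff R contains fewer feature-syncretisms."""
    return syncretisms(r) < syncretisms(r_prime)


# Toy contrast modelled on the modals. Before reanalysis one item realises
# both V and T (merged in V, moved to T), so T hosts no features of its own;
# afterwards T has its own exponent.
conservative = [
    Position("T", frozenset()),
    Position("V", frozenset({"V", "T"})),
]
innovative = [
    Position("T", frozenset({"T"})),
    Position("V", frozenset({"V"})),
]

print(syncretisms(conservative), syncretisms(innovative))  # 1 0
print(simpler(innovative, conservative))                   # True
```

Because the count is over positions rather than nodes, the two structures can differ in simplicity even when their node counts are equal; whether a given toy encoding faithfully represents a particular change remains an analytical question the code does not settle.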
This approach to structural simplification allows us to maintain the idea that there is a universal hierarchy of functional heads (on which, see the next subsection), and at the same time to capture parametric variation in a rather clear way. One thing this approach rules out, for example, is the reordering of categories. The metric in (23) works for all the types of change in (19)–(21): where X(P) moves in the conservative structure (i.e. in (19) or (21)), it must have had at least two features, one allowing it to Merge in the original position and one triggering movement (this is consistent with our approach to parametric variation in terms of PF-realisation, which is essentially a way of formalising movement). In (20), too, Y originally had more features than just F, which is why it Merges where it does in the conservative grammar. After the reanalysis, the lexical item α becomes the sole realisation of X. If the realisation of a given feature X is what parametric variation amounts to, then we can clearly see the link between parameter setting and syntactic change. So we arrive at a formal approach to grammaticalisation which is based on parameter setting (we will consider the types of parameter change listed in (1)–(18) in 6.2.3).

The general schema in (22) has two further implications. First, it predicts that grammaticalisation can be cyclic: nothing prevents Y from being reanalysed again, yielding new exponents for X. This is a desirable result, and one that seems to be supported by the empirical data. Second, it predicts that grammaticalisation can be successive: once Y has been reanalysed as X, it can be further reanalysed as an even higher functional head Z. Indeed, R&R show how this works in the discussion of the empirical cases in Chapters 2–4. The reanalysis of modals is again a clear example. In Chapter 2, Section 2.1 of R&R we argued that dynamic (root) and epistemic modal readings can be structurally distinguished as follows: dynamic modals are merged in v, while epistemic ones are merged in T (and lexical verbs are simply merged in V). This allows us to define the ‘path’ of reanalysis: from V (lexical), to v (dynamic/root), to T (epistemic), with the possibility of further reanalysis as C, as we argued with respect to the modal thelei in Greek (Section 2.3). Our approach therefore has a clear advantage, as it allows us to capture the path of structural change in grammaticalisation along the hierarchy of functional heads. Furthermore, the path is always upwards.1 We elaborate on this in Section 6.2.2. Finally, note that this characterisation of grammaticalisation does not rely on an earlier stage involving visible movement, either of a head or of an XP. That is one option, but the option in (20), which we might characterise as loss of Agree, also falls under the general characterisation driven by (23).
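The claim that successive reanalysis always proceeds upwards along the functional hierarchy can be illustrated with a short check. As an assumption, the sketch uses only the four heads named for the modals (V, v, T, C) as a stand-in for the full universal hierarchy; the names HIERARCHY, rank and is_upward_path are hypothetical.

```python
# Lowest to highest: lexical V, dynamic/root v, epistemic T, then C.
# Only the heads mentioned for the modals; the full hierarchy is richer.
HIERARCHY = ("V", "v", "T", "C")


def rank(head):
    """Position of a head in the hierarchy (raises ValueError if unknown)."""
    return HIERARCHY.index(head)


def is_upward_path(path):
    """True if every successive reanalysis targets a strictly higher head."""
    return all(rank(lower) < rank(higher) for lower, higher in zip(path, path[1:]))


print(is_upward_path(["V", "v", "T"]))       # True: lexical > root > epistemic
print(is_upward_path(["V", "v", "T", "C"]))  # True: further reanalysis as C
print(is_upward_path(["T", "v"]))            # False: a downward step
```

On this encoding, cyclicity would correspond to a new lexical item entering the same path from below, which the check treats as a separate path.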

6.2  Grammaticalisation and the Theory of Language Change

6.2.1  Structural Simplification and Language Acquisition

In Chapter 1 of R&R we discussed the correlation between language change and language acquisition, following Lightfoot (1979, 1991, 1998). The idea is that parametric change is triggered when a population of acquirers converges on a parametric setting different from the one adopted by the adult grammar. The question we now need to address is how structural simplification, conceived as ‘avoid feature-syncretism’ in (23), works in the process of language acquisition. In Chapter 1 of R&R we said that the conservative nature of the language acquirer favours a ‘simplified’ structure. This term obviously needs to be redefined and/or clarified in the context of the present discussion. As already mentioned, we assume that there is a universal hierarchy of functional categories present in all languages. The parametric options are nothing more than options of how the features of these categories are spelled out, if they are. What the language acquirer is faced with is the following. On the one hand, there is the universal order of functional heads, which is the same for all languages, or at least a universal pool of functional categories which project in the clause structure in a predictable way (we will elaborate on this in Section 6.3). The operations Merge and Agree basically see these features and relate them in a rather mechanical way (putting them together in the case of Merge, matching them in the case of Agree). This is essentially what the computational system (CHL) does. In a way, this is the invisible side of CHL, which interfaces with LF (and semantic interpretation in general). On the other hand, there is the interface with PF, where these features are pronounced somehow, so the computational system has to ensure that there is some matching between functional features and their realisation, that is, lexical items. The list of lexical items available to each language is arbitrary, and this is one aspect (if not the only aspect) of the imperfection of the language system. Given this arbitrariness, and given that language is a formal system (possibly the only one) which interfaces with PF, the ideal situation would be a one-to-one matching between lexical items and features. In other words, ideally we would expect each feature to have its own unique PF-realisation. Of course, this is just an idealisation. What we actually find is a rather blurred picture, for a number of reasons: in some cases there is no realisation for a given feature (which is the ideal situation with respect to the LF side), while in many cases a given lexical item spells out a number of different features. The latter is the typical situation of what we know as movement. There is one more possibility, namely that a given feature may receive more than one realisation, in a contextually determined way. Recall the case of tha and na as modal particles and realisations of M* (Chapter 3, Section 3.1 of R&R).
Although both have the same effect, they do not give rise to the same interpretations. This is because tha, at least, has a subfeature (call it ‘future’) of the feature irrealis, which is one value of M (we discuss subfeatures in more detail in 6.2.3). The picture that emerges seems conflicting, but only apparently so: the perfection of the language system requires that all features are present, ideally without any PF-interference. On the other hand, PF imposes its own restrictions, and the preferred option is a one-to-one mapping between features and lexical items. If this is correct, then this is precisely what the conservative nature of the language acquirer dictates: if a feature must have a realisation (because it is unambiguously cued; see below), then using the same realisation for more than one feature, i.e. syncretism, is preferentially avoided. This approach explains why inflectional morphology is so important in syntactic change (this, of course, is hardly a novel observation). Consider the structural changes in (1)–(18) in the previous section and the cause of the change in each case (where this is known). The “cause” should be construed as what prevents the acquirer from postulating the simpler structure in the earlier grammar. The cue for the more complex, conservative grammar is in most cases morphology. In other words, the cause is the trigger for the presence of a more complex system (complex in connection with the notion of simplification as defined above). So we see that much syntactic change comes from outside syntax, basically from PF (cf. Keenan (1996) and Longobardi (2001) on the inertial nature of syntactic change). We can rather easily support this claim by looking at each of the cases in (1)–(18), or at least at all those cases where we have some indication of the cause that led to parametric change. Starting with (1) (English modals), we notice that this change is due to the loss of infinitival morphology. This same loss, corroborated by the loss of subjunctive morphology, is also responsible for the development of to as the new ‘infinitival’ marker, namely for the change in (6). The cases in (4) and (5), i.e. of na and mu, also arise from the loss of subjunctive morphology, in addition to the loss of infinitival marking. The last change is partly responsible for the development of tha in the construction the na (cf. (3), and in particular (3c)), although the initial cause has to do with the loss of a distinct morphological paradigm for the future indicative (which collapsed the future indicative with the past tense subjunctive). This change is also responsible for the development of a periphrastic construction for the expression of the future in (2), with further contributing causes such as the morphological irregularity of habeo and the reduced productivity of leftward XP-movement (probably related to the loss of morphological case; see Roberts (1997 [this volume, Chapter 4])). The same extends to (9): the loss of case marking seems to have given rise to the development of D. In other cases, one reanalysis triggers another, as in (13) and (14) (the restriction of indefinites as wh-words gave rise to a new series of existential quantifiers in the history of Greek). Finally, morphological levelling is the cause of the reanalysis of subject clitics in (17).
In most of the cases discussed so far, there is an ambiguity in the structure which has to do with morphophonological changes; in some cases the ambiguity is purely structural (as in (7), for example). This supports the idea that syntactic change is triggered when marked input is obscured to the language acquirer, who then switches to the default. “Marked” input here simply means “input containing feature syncretisms”. As we will show below, following the preceding discussion, we now have a way to define formal markedness in terms of PF-realisations. To summarise, structural simplification can be stated as in (23), that is, as a way of avoiding feature syncretism. Given that syncretism is provided by the morphological system, which has to be learned and is furthermore parametrised, we have a clear way of linking the notion of simplification with the process of language acquisition. Once the cue (mainly morphology) becomes obscure or ambiguous, the conservative nature of the language acquirer will opt for a simplified structure: maximise the correspondence between structure and lexical items. This of course yields new exponents for functional features, which is indeed what we find in grammaticalisation.
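The acquisition scenario summarised above can be sketched as a choice among candidate analyses of the same input. This is a toy model under stated assumptions: each candidate is a hypothetical record giving its number of syncretisms in the sense of (23) and whether it can assign a well-formed representation to the input; the learner adopts the compatible analysis with the fewest syncretisms. The names Analysis, acquire and candidates_for, and the cue_present switch, are illustrative inventions, not part of the theory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """A candidate analysis of an input string (hypothetical encoding)."""
    name: str
    syncretisms: int   # feature-syncretisms in the sense of (23)
    compatible: bool   # can it assign a well-formed representation to the input?


def acquire(candidates):
    """Adopt the compatible analysis with the fewest syncretisms."""
    compatible = [a for a in candidates if a.compatible]
    if not compatible:
        raise ValueError("no candidate analysis fits the input")
    return min(compatible, key=lambda a: a.syncretisms).name


def candidates_for(cue_present):
    # While the morphological cue is present, only the complex analysis fits;
    # once the cue is obscured, the simpler one fits as well.
    return [
        Analysis("modal merged in V, moves to T", syncretisms=1, compatible=True),
        Analysis("modal merged in T", syncretisms=0, compatible=not cue_present),
    ]


print(acquire(candidates_for(cue_present=True)))   # modal merged in V, moves to T
print(acquire(candidates_for(cue_present=False)))  # modal merged in T
```

The design choice mirrors the text: the cue does not make the complex analysis preferable, it only rules out the simpler one; once the cue is lost, the preference for fewer syncretisms decides.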

6.2.2  Grammaticalisation and Other Syntactic Changes

In the present book we have treated grammaticalisation as an instance of upwards reanalysis, which gives rise to new functional material. Furthermore, this reanalysis (at least in most cases) involves loss of movement (a Move > Merge change). In the present section we compare grammaticalisation to other syntactic changes, showing their similarities and differences. Let us start by considering three well-known cases in the literature, namely the loss of V2, the loss of V-to-T movement, and the OV > VO change, summarised in (24), (25), and (26) respectively:

(24) Loss of V2: [C [T V]] [TP . . . tT . . . ]  >  C . . . [TP . . . [T V] . . . ]
(25) Loss of V-to-T: [T V] . . . [VP . . . tV . . . ]  >  T . . . [VP . . . V . . . ]
(26) OV > VO: [FP Obj . . . [VP . . . (V) tObj . . . ]]  >  [VP . . . (V) Obj . . . ]

The loss of V2, schematically represented in (24), has been discussed with respect to a number of languages, for example English (van Kemenade (1987), (1997), Haeberli (1999), Kroch & Taylor (1997), Pintzuk (1991), among others), French (Adams (1987), Vanelli, Renzi and Benincà (1986), Roberts (1993a), Vance (1989), (1997)), Northern Italian dialects (Renzi, Vanelli, and Benincà (1986)), and Welsh (Willis (1998)). (25) represents the loss of V-to-T movement in the history of English (Roberts (1985 [this volume, Chapter 1]), (1993a), Pollock (1989), Warner (1997), among others; see also Vikner (1997) for Danish). Finally, (26) is an instance of word-order change, as exhibited in the history of English. The schema in (26) is based on Roberts (1997 [this volume, Chapter 4]), who accounts for this change in terms of Kayne’s (1994) antisymmetry approach: OV is the result of object raising to a position higher than that of the verb, so the change from OV to VO involves loss of object movement.
In brief, what all the changes in (24)–(26) share is loss of movement to a higher functional head (C, T, or F respectively). Although these changes involve loss of movement, they are not treated as instances of grammaticalisation. Since in our approach grammaticalisation has also been related to loss of movement, the obvious question is how we distinguish the cases above from the typical cases of grammaticalisation discussed in the previous chapters. To illustrate the differences we will focus on the example in (25), namely the loss of V-to-T movement. The reason for choosing (25) is twofold: first, because we can easily compare it to the grammaticalisation of T elements, namely modals, in the history of English (see Chapter 2, Section 2.1 of R&R), and second, because the empirical aspects of the change in (25) are more straightforward than those in (24) and (26). The examples in (27) illustrate the presence of V-to-T movement in pre-17th-century English:

(27) a. if I gave not this accompt to you
        if I gave not (=didn't give) this account to you
        (1557: J. Cheke, Letter to Hoby; Görlach (1991: 223), Roberts (1999: 290) [see this volume, p. 142])
     b. How cam’st thou hither?
        How camest thou (did you come) here?
        (1594: Shakespeare, Richard III; Roberts (ibid.))
     c. The Turkes . . . made anone redy a grete ordonnaunce
        The Turks . . . made soon (=soon prepared) a great ordnance.
        (c1482: Kaye, The Delectable Newsse of the Glorious Victorye of the Rhodyans agaynest the Turkes; Gray (1985: 23), Roberts (1993a: 253))
     d. In doleful wise they ended both their days
        (1589: Marlowe, The Jew of Malta III, iii, 21; Roberts (ibid.))

The finite V precedes negation in (27a), inverts with the subject in the question in (27b), precedes the adverb in (27c), and precedes a floated quantifier in (27d). All of these examples thus show clearly that V raised to T (and, in the appropriate contexts, to C as well). The picture is quite different in Modern English: the verb cannot raise to T or C. It is in part for this reason that negation and question formation trigger do-support. The loss of V-movement has been related to the loss of inflectional morphology. Vikner (1997: 200) formulates the following condition on V-raising:

(28) An SVO language has V-to-I movement if and only if person morphology is found in all tenses.

Whether (28) needs any refinement does not affect our present discussion (see, for example, Alexiadou & Fanselow (2000) and Bobaljik (2000) for criticisms). What is crucial is that the reanalysis in (25) affected all lexical verbs and, moreover, left T with no lexical realisation in the relevant context.
In this respect the change in (25) (and, for that matter, those in (24) and (26)) can be abstractly represented as in (29):

(29) [XP Y+X [YP tY]]  >  [XP X [YP Y]]

The structure in (29) is also an instance of structural simplification, in the sense that the realisation of X under movement is no longer present after the reanalysis. If this is correct, then we need to outline what the difference is between (29) and the structural simplification we get in grammaticalisation, as in (22) in Section 6.1, repeated below as (30):

(30) Grammaticalisation: [XP Y+X [YP tY]]  >  [XP Y=X [YP Y]]

The input in (29) and (30) is the same, but there is a clear difference in the output, as the structure to the right of the arrow shows in each case. (29) expresses the loss of V-movement, while (30) expresses the grammaticalisation of modals, as discussed in Chapter 2, Section 2.1 of R&R. Given the two sets of data and the relevant structures, it is rather easy to identify the differences between the two. First, unlike the loss of V-to-T movement in (29), the grammaticalisation of modals under (30) created a new realisation for T (T*Merge). Second, the reanalysis in (29) is ‘downwards’, while grammaticalisation is an instance of ‘upwards’ reanalysis. Third, while loss of V-raising in (25) affected the whole class of lexical verbs, grammaticalisation affected a subclass of verbs with a number of properties in common (they are intensional, tend to lack argument structure, and are subject to morphological irregularities). Fourth, grammaticalisation is associated with semantic ‘bleaching’ and phonological reduction (for example, the modal will lost its argument structure and ceased to express volition; moreover, the contracted form shows up with the modal reading and not with the earlier lexical one). ‘Downward’ reanalysis as in (29), on the other hand, has no such consequences. The differences between the two types of reanalysis are summarised in (31) and (32) below. (31) covers all the changes in (24)–(26), which involve loss of movement but are not instances of grammaticalisation, while (32) expresses the properties of grammaticalisation:

(31) ‘Downward’ changes, as in (24)–(26):
     a. apply to all members of Y;
     b. do not change the category of Y;
     c. involve no semantic or phonological change to Y-roots.

(32) ‘Upward’ changes, as in (30):
     a. apply only sporadically or to morphological subclasses of Y;
     b. change the category of Y;
     c. are associated with semantic bleaching and morphophonological reduction.

It is interesting to note that the ‘downward’ changes in (31) have no interface effects. For example, the loss of V-to-T movement did not affect the interpretation of lexical verbs (or of T, for that matter), and the reanalyses in (24)–(26) did not affect the argument structure of the verbs that underwent the change (or, in the case of (26), the nature of direct objects). Furthermore, the changes in (31) do not give rise to any phonological effects in the sense of triggering phonological reduction (although arguably they affected the PF-realisation of T). The ‘upward’ changes in (32), on the other hand, have interface effects, as they go along with phonological reduction and affect the meaning of the reanalysed element (see, for example, the loss of volitional meaning on will and on Greek thelo as an auxiliary). The absence of interface effects in the case of ‘downward’ reanalysis can be directly linked to the fact that this change does not give rise to functional material, while ‘upward’ reanalysis does.2 We discuss this further in Section 6.3. As the above discussion shows, our approach can adequately express grammaticalisation and, furthermore, distinguish it from other cases which also involve loss of movement, by formulating it in terms of ‘upward’ vs. ‘downward’ change. In this way we can capture the similarities between the two types of change and at the same time state explicitly that only grammaticalisation gives rise to new functional material. Finally, it is interesting to note that one further property the two types of change share is that the cause can be identified with morphological changes: for example, as already mentioned, the loss of V-to-T movement relates to the loss of agreement marking, and the OV > VO change to the loss of case distinctions (the cause of the loss of V2 is more speculative, but there may be a correlation with the loss of particles in the C-system; cf. Ferraresi (1997), Roberts & Roussou (2002)).

6.2.3  Descriptive and Explanatory Adequacy: Questions of Markedness

In this section we address three issues. First, we return to the question of the tension between descriptive and explanatory adequacy adumbrated in the Introduction of R&R.
We resolve this tension by adopting a particular view of the markedness of parameter values, defined in terms of the simplicity metric in (23). Second, we show how the majority of the parameter changes listed in (1)–(18) are changes from a marked to an unmarked value. Third, we show how change from unmarked to marked is possible in this context, looking at the change in (2c) above. This completes the picture of syntactic change that we want to present here. As we said in the Introduction, the study of grammaticalisation raises the familiar tension between descriptive and explanatory adequacy in the diachronic domain. A descriptively adequate account of this class of changes results in defining pathways of change. In our terms, as we mentioned in 6.1, pathways of grammaticalisation are defined by the functional hierarchy, through which grammaticalised material can travel by means of successive upward reanalyses. Thus grammaticalisation pathways can be deduced from the functional hierarchy (and possibly vice versa), once upward reanalysis is taken as a basic mechanism of syntactic change. However, if we take parameter setting to be an explanatory notion for syntactic change (and, to the extent that we are to make the connection to language acquisition, as in Section 6.2.1, it is), then we are led to an apparent difficulty. In principles and parameters theory, parameters can be thought of as creating a space of variation in which individual grammatical systems are distributed. Synchronically, different systems are viewed as scattered in this space. Diachronically, they randomly “walk” around the space as a function of time. This view is not compatible with the existence of diachronic drift, pathways of change, etc., a point repeatedly and cogently made by Lightfoot (see in particular Lightfoot (1979) and Lightfoot (1998)). So, as stated in the Introduction to R&R, we must reconcile the evidence for pathways of change at the descriptive level with the fact that an explanatory account of syntactic change must involve random parameter change. Following Clark & Roberts (1993 [this volume, Chapter 2]) and Roberts (2001), we propose that a version of the traditional linguistic concept of markedness is able to resolve this tension. The relevant notion of markedness is rooted in the simplicity metric, which we repeat here:

(23) A structural representation R for a substring of input text S is simpler than an alternative representation R’ iff R contains fewer formal feature-syncretisms than R’.

Now, as we stated in Section 6.1, movement operations are always associated with feature-syncretism, since a moved element has one feature licensing it in its merged position and one triggering movement (or Agree; see the discussion in 6.1). Movement is therefore always associated with relatively complex representations. Let us suppose, then, that F*Move is a marked option relative to F, precisely because it entails a more complex representation than F in terms of (23).
Also, if the absence of a phonological matrix is simpler than the presence of one (since a phonological matrix consists of features, this could be related to (23)), F*Merge is relatively marked compared to F, but less marked than F*Move, as it lacks the features relevant for triggering movement. Finally, we take F*Move/Merge to be the most marked option of all, as it involves two phonological matrices as well as the features involved in triggering movement. So we arrive at the markedness hierarchy for parameter values in (33) (where “>” means “more marked than”):

(33) F*Move/Merge > F*Move > F*Merge > F

Relatively marked parameter values require overt, robust cues. In the absence of such cues, a less marked option is taken, with F as the default. As we pointed out in the previous section, the notion of the “cause” of the changes in (1)–(18) above should be understood as the factor cuing a relatively marked setting.
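Because (33) is a strict ordering, it can be encoded as a ranked list, and the idea that marked values survive only when robustly cued can be sketched as a small decision procedure. The names MARKEDNESS, more_marked and acquired_value are hypothetical, and acquired_value simplifies the cue-based reasoning to two yes/no questions (is movement robustly cued, is a separate exponent for F robustly cued).

```python
# (33), ordered from least to most marked; ">" in (33) means "more marked than".
MARKEDNESS = ("F", "F*Merge", "F*Move", "F*Move/Merge")


def more_marked(a, b):
    """True if parameter value a is more marked than value b under (33)."""
    return MARKEDNESS.index(a) > MARKEDNESS.index(b)


def acquired_value(movement_cued, own_exponent_cued):
    """Value of F a learner settles on, given which overt cues are robust.

    Simplifying assumption: a marked value is adopted only when robustly
    cued; otherwise a less marked option is taken, with F as the default.
    """
    if movement_cued and own_exponent_cued:
        return "F*Move/Merge"
    if movement_cued:
        return "F*Move"
    if own_exponent_cued:
        return "F*Merge"
    return "F"


# A robust movement cue is lost while F comes to have its own exponent:
before = acquired_value(movement_cued=True, own_exponent_cued=False)
after = acquired_value(movement_cued=False, own_exponent_cued=True)
print(before, "->", after, more_marked(before, after))  # F*Move -> F*Merge True
```

The sketch hard-codes only the four values of (33); it makes no claim about how any particular historical change was cued.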

Let us look at these ideas in the light of the definitions of P-expression and trigger given in Chapter 1, Section 1.1 of R&R:

(34) Parameter expression:

A substring of the input text S expresses a parameter pi just in case a grammar must have pi set to a definite value in order to assign a well-formed representation to S.

(35) Trigger:

A substring of the input text S is a trigger for parameter pj if S expresses pj.

Given markedness, only marked values of parameters need to be expressed. P-expression then reduces to:

(36) a. expression of movement relations (through syntactic “displacement”);3
     b. expression of free functional morphemes (through PF-realisation).

More generally, acquirers are looking for overt realisations of functional heads. If they analyse a functional head as [F F ], we have the F*Merge option. If it is analysed as [F(P) G F ] (where G stands for moved material of any kind), we have the F*Move option or, if F has its own phonological realisation, F*Move/Merge. The crucial point, however, is that the conservative nature of the learner, since it prefers maximally simple representations in the sense defined by (23), always favours the default option F. So if the elements and relations which lead to one of the complex realisations of F are not robustly expressed in the trigger, the default option is chosen. Let us now consider the markedness hierarchy in (33) in relation to the changes summarised in (1)–(18). We have the following result:

(37) a. F*Move/Merge > F*Move: (2c), (18)
     b. F*Move > F*Merge: (1), (2a), (3a), (3b), (7),4 (10), (12b), (13), (15), (16), (17)
     c. F*Merge > F: (3c), (12a)

The clear majority of the changes involve reductions in markedness. The three types of markedness reduction seen in (37) correspond to different subtypes of grammaticalisation mentioned in the literature: (37a) creates new morphology; (37b) is “true” grammaticalisation, in that it creates a realisation of a new functional head; finally, (37c) is loss of realisation (note that Stage Three of Jespersen’s cycle, discussed in 4.2.3 of R&R, would be a further case of this type of change: Neg* > Neg applying to a “high”, C- or T-related Neg). However, on the basis of what has been said so far, some of the changes in (1)–(18) seem to involve an increase in markedness. This is the case for (2b), (4), (6), (8), (9), (11), and (14). Let us consider these more closely.

First, the change in (2b) involved the cliticisation of habere and its attraction of the infinitive. The movement of the infinitive was reanalysed from (possibly remnant) XP-movement of the category containing the infinitive to SpecTP; this was arguably a facet of the general OV-to-VO shift which took place in the transition from Latin to Romance (see Chapter 2, Section 2.2 of R&R for discussion and, in fact, an alternative analysis). Strictly speaking, the earlier structure was also an instance of F*Move/Merge, since we are not distinguishing XP-movement from head-movement. In that case, what changed in Late Latin was XP-movement becoming head-movement, which can certainly be seen as a simplification (and one which needs to be built into (33)). But how does F*Move/Merge arise in the first place? This question must be addressed if we are not to predict that such options are so marked that they will always irrevocably disappear. Recall that the F*Move/Merge option arose from an earlier structure in which habere moved to T, via the standard case of grammaticalisation, Move > Merge. It seems, then, that the marked structure arose in part from a reanalysis of the head which made it less marked. Thus we can consider that T*Move/Merge arose from a still more marked option, T*Move/Move, i.e. the case where T* attracts two elements, a head and a specifier. In this sense, the reanalysis in (2b) in fact involved a reduction in markedness. But of course we now have to explain how T*Move/Move arose. We conjecture that this option arose from T*Move via a reduction in the feature content of the attracted element, thereby requiring two things to move in order to satisfy the property of the attractor. For example, in the case under discussion it is possible that Latin habere was already a light verb v, and that T required an element with a “full” V-feature.
Hence, VP-movement is introduced (itself possibly a reanalysis of an earlier nominal infinitival; see 2.2 of R&R). Although the technical details are uncertain,5 this sketch shows how simplification of the attracted element (reduction of habere from V to v), in line with (23), may create complexity elsewhere. It is the local nature of simplification that creates complexity, and that prevents language change from leading to irreversible simplification. The changes in (4) and (6) are very similar, as we mentioned in Chapter 3 of R&R. Here new instantiations of M develop through the loss of feature syncretism elsewhere. Recall that the mood markers developed in Greek and English owing to the loss of infinitival/subjunctive marking on the verb. Effectively, then, where the earlier system had Agree and feature syncretism on V (or T), the new system has no Agree relation and no feature syncretism. This is a simplification in terms of (23). It may appear that there is an increase in phonological markedness, but in fact this is not so: in the earlier system verbal morphology marked mood distinctions, and these distinctions were eradicated. So these cases are straightforward examples of simplification that are not directly captured by the markedness hierarchy in (33). What is needed is a further option, F*Agree, which, like F*Move, is more marked than F*Merge. It is likely that F*Agree is less marked

202  Anna Roussou and Ian Roberts
than F*Move (since Move involves a further component, indeed in Chomsky’s terms a further feature, in addition to Agree). (8), the development of serial verbs, appears to be a case of relabelling in the sense of Whitman (2001). Here we can note that T inherently has less feature content than V, as it lacks argument structure (see the next section on the loss of argument structure under grammaticalisation). It is at least possible that C has less feature content than T (note that fewer temporal distinctions are made in the C-system than in the T-system; see Rizzi (1997)). So this change may well be consistent with (23). Finally, the changes in (9), (11), and (14) all involve the loss of feature content on the part of the grammaticalised element (Dem > D, Num/N > Neg, focus particle > Q), and hence conform to (23). So we see that all the changes in (1)–(18) conform to (23). (23) is of course the fundamental notion; the approximate hierarchy in (33), which we have now seen to be too simple, derives from (23). Most importantly, the changes in (37a) show how new synthetic morphology may be created, despite the fact that (23) may at first sight appear to favour analyticity. It is interesting to note that in these cases other aspects of word order (OV or VS) are relevant to creating the environment in which new morphology can emerge. New morphology does not readily emerge in SVO languages, it seems, and we can observe that in such languages (English, Romance) there is a clear historical tendency towards analyticity. (23) can explain this, which is a good example of how we can combine parametric analyses of change with an account of diachronic drift. We would now like briefly to compare our notion of markedness with Cinque’s (1999) recent proposals. As part of his study of clause structure across languages, Cinque observes that functional heads seem to have both marked and unmarked values.
A selection of these is given in (38):
(38)
                Mood(Speech Act)   Mood(Evaluative)   Mood(Evidential)     Mod(Epistemic)
    Unmarked:   declarative        -[-fortunate]      direct evidence      commitment
    Marked:     -declarative       -fortunate         -direct evidence     -commitment
The observations of marked and unmarked values are based on familiar criteria: marked features are “more restricted [in] application, less frequent, conceptually more complex, expressed by overt morphology” (Cinque 1999: 128), while unmarked features are the opposite. Note in particular that marked features tend to be morphologically realised while unmarked features do not. How does this kind of markedness (which we will refer to as “Jakobsonian”) relate to the proposals we have just made? The two notions are quite distinct in several important respects. First, Jakobsonian markedness refers to values of functional heads, while the notion just sketched refers to realisations of those heads. Second, Jakobsonian markedness is not parameterised: the features are available in every language, and (presumably) stand
in the same markedness relations in every language; Jakobsonian markedness is thus given by UG, while the notion just sketched derives from a formal property of the learning algorithm. They are thus quite different kinds of thing. Third, Jakobsonian markedness is a substantive notion (note the reference to conceptual complexity in the above quotation from Cinque), while that just sketched is a formal notion. So there are very good reasons to keep the two kinds of markedness distinct, as formal (the one sketched here) and substantive (Jakobsonian) notions with quite different cognitive status (the former deriving from the learning device, the latter from UG). However, two things lead us to say a little more than this. First, common to both notions is the idea that overt morphophonological realisation is marked, while zero realisation is unmarked. Second, there are very significant cross-linguistic generalisations in Cinque’s version of substantive markedness that we would like to find an expression for. Tentatively, we think that the two notions of markedness can be connected by taking a lead from Cinque (who takes it from Jakobson; see Cinque (1999: 128)) in regarding unmarked values as, in a sense, underspecified. What is needed is a feature hierarchy. Functional heads, as features F, G, H . . . , can come with various further feature specifications f, g, h . . . (we write the subfeatures in lower case and potentially autonomous functional features in upper case). We can then treat unmarked values of functional heads as simply the autonomous functional feature F, while the marked value will have a further subfeature, giving F+f. So Mood(Speech Act) (or Force, or C; hereafter we refer to this category as C for convenience) means “declarative”, while C[-declarative] means nondeclarative. Of course, on this view, [-declarative] does not exist (and neither does [+declarative], this being the unmarked value of the category C).
What exist are other speech-act features: Q, Exclamative, Imperative, etc. These are all subfeatures of C. In other words, instead of saying that we have C with the two values [±declarative], we have C = Declarative by default and C = Imperative, Interrogative (etc.) as marked subfeatures. Now, if the parametrisation operator, which randomly distributes the * in the lexicon, applies to all types of features, F+f will have two chances of PF-realisation, while F will only have one. Thus, marked feature values are more likely to be overtly realised than unmarked ones and we derive implicational statements of the form “If a language has a declarative particle, then it has an interrogative particle,” etc., from the fact that where F* must be realised then so must all subfeatures of F*. Note that this idea carries over to the F*Move case, which seems right. In many languages, for example, marked illocutionary forces are associated with movement to C, while declaratives are not (this is approximately the situation in “residual V2” languages like Modern English). So we also derive the (correct) implicational universal “If a language has movement to C in declaratives, then it has such movement in interrogatives, etc.”6
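The probabilistic argument can be sketched under two simplifying assumptions of our own: the parametrisation operator stars each feature independently, and it does so with the same probability p. The function names are hypothetical. p_overt gives the chance of overt realisation for a head with one or two starrable features. star_assignments, implication_holds, and converse_failures check the implicational statement over every possible assignment of stars to C and its subfeature Q.

```python
from fractions import Fraction
from itertools import product


def p_overt(n_starrable: int, p: Fraction) -> Fraction:
    """Probability that at least one of n independently starred features is overt."""
    return 1 - (1 - p) ** n_starrable


def star_assignments():
    """Every way of starring (C, Q): bare C is declarative, C+Q is interrogative."""
    for c_star, q_star in product([False, True], repeat=2):
        declarative_overt = c_star                # only C itself can be starred
        interrogative_overt = c_star or q_star    # C or its subfeature Q
        yield (c_star, q_star), declarative_overt, interrogative_overt


def implication_holds() -> bool:
    """'Declarative particle => interrogative particle' in every assignment."""
    return all(inter for _, decl, inter in star_assignments() if decl)


def converse_failures():
    """Assignments giving an interrogative but no declarative particle."""
    return [stars for stars, decl, inter in star_assignments() if inter and not decl]


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(p_overt(1, half), p_overt(2, half))         # 1/2 3/4
    print(implication_holds(), converse_failures())   # True [(False, True)]
```

The converse fails only where Q alone is starred. This is the asymmetry behind the implicational universal.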

There is an independent reason to adopt some concept of markedness in current principles-and-parameters theory (this argument was made in Roberts (2001: 89–92)): the parameter space as currently defined offers too many choices for comparative or historical work to be possible. Assume that functional categories are the locus of parametric variation, and that we have four potential parameter values per functional head. Then, for $n = |F|$, the cardinality of the set of functional heads, the cardinality of the set of parameters $|P|$ is $4n$ and the cardinality of the set of grammatical systems $|G|$ is $2^{4n}$. Assuming 4 heads in the C-system (see Chapter 3 of R&R and Rizzi (1997)), 4 heads in the T-system (T, Asp, Neg and v) and 4 in the D-system (see Chapter 4 of R&R), we have $n = 12$. Then $|P| = 48$ and $|G| = 2^{48}$. This is a very large space indeed, and bear in mind that it is based on a fairly conservative functional structure (we have not taken into account the functional structure associated with AP, for example). In a discussion of a 30-parameter system, giving 1,073,741,824 ($2^{30}$) grammars, Clark (1990) points out that a learner who checks one grammar per second from birth would in the worst case take 34 years to converge if this is the number of possible grammars. Hence there must be a learning device which facilitates the search in this space. We think a similar argument can be made on the basis of diachronic considerations. Two assumptions are generally made in all comparative and historical linguistics (in fact, they are what make historical linguistics possible, and have done since the beginnings of the discipline). These are articulated by Croft (1994) as follows:7
(39) a. Uniformitarianism: “the languages of the past are not different in nature from those of the present” (Croft (1994: 204));
     b. Connectivity: “within a set of attested language states defined by a given typological classification, a language can . . .
shift from any state to any other state” (Croft (1994: 205)).
We can reformulate these assumptions in terms of principles-and-parameters theory as follows:
(40) a. Uniformitarianism: the languages of the past conform to the same UG as those of the present;
     b. Connectivity: a grammatical system can change into any other grammatical system given enough time (i.e. all parameters are equally variable).
Put this way, both assumptions seem entirely reasonable. To deny (40a) would be to assert that speakers of languages of the past were cognitively different from speakers of currently existing languages. Presumably, though, at least as far back as the origin of modern Homo sapiens, we do not want to say this. Effectively, (40a) is the null hypothesis regarding the relation of UG to language change. Denying (40b) would imply “privileging” certain
parameters, a conceptually highly dubious move for which there seems to be no empirical motivation: (40b) is the null hypothesis regarding the role of parameters in language change. So we want to maintain the assumptions in (40). Now, at present approximately 5,000 languages are spoken (Ruhlen (1987)). Suppose that this figure is constant throughout human history (back to the emergence of Homo sapiens), and that every language changes with every generation; if we have a new generation every 25 years, we have 20,000 languages per century. If the total number of grammatical systems is $2^{30}$ (following Clark’s (1990) discussion), it would take roughly 53,700 centuries for each type to be realised once. At present, the usual reckoning is that humans have been around for about 2,000 centuries (i.e. 200,000 years; see for example Bickerton (1991)). Of course, the figures given here are rather arbitrary, but the point should be clear: given the kind of parameter space we seem to have, on the basis of the empirical examination of existing languages, there simply has not been enough time since the emergence of the species (and therefore, we are assuming, of UG) for anything like the total range of possibilities offered by UG to be realised. This conclusion effectively empties uniformitarianism and connectivity of content. In theory, we simply could not know whether a language of the past corresponded to the UG of the present or not, since the overwhelming likelihood is that it is typologically different from any language that existed before or since.8 One might conclude that 30 parameters define too big a parametric space, but, as we have seen, we have here rather conservatively given ourselves a 48-parameter system.
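The figures in this argument are simple arithmetic and can be checked directly. In the sketch below the function names are hypothetical. The inputs come from the text: 12 heads with four parameter values each, Clark's 30-parameter system at one grammar per second, 5,000 languages, and a 25-year generation. A Julian year of 365.25 days is an added assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year (an assumption of this sketch)


def parameter_space(n_heads: int, per_head: int = 4) -> tuple[int, int]:
    """|P| = 4n binary parameters and |G| = 2**|P| grammars, following the text."""
    n_params = per_head * n_heads
    return n_params, 2 ** n_params


def years_to_enumerate(n_grammars: int, per_second: float = 1.0) -> float:
    """Worst-case years for a learner testing `per_second` grammars per second."""
    return n_grammars / per_second / SECONDS_PER_YEAR


def centuries_to_realise_all(n_grammars: int, languages: int = 5_000,
                             generation_years: int = 25) -> float:
    """Centuries for each grammar to occur once, if every language changes
    with every generation (the text's simplifying assumption)."""
    languages_per_century = languages * (100 // generation_years)
    return n_grammars / languages_per_century


if __name__ == "__main__":
    print(parameter_space(12))                        # (48, 281474976710656)
    print(round(years_to_enumerate(2 ** 30), 1))      # 34.0
    print(round(centuries_to_realise_all(2 ** 30)))   # 53687
```

Dividing 2**30 grammars by 20,000 languages per century gives roughly 53,700 centuries. That is more than twenty-five times the 2,000 centuries usually assumed for the species.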
Here we are again faced, in a different context, with the familiar tension between the exigencies of empirical description, which lead us to postulate ever more entities, and the need for explanation, which requires us to eliminate as many entities as possible. Chomsky (1995: 4–5) notes that the principles-and-parameters model resolved this tension for synchronic comparative syntax, but we see that the problem re-emerges at a higher level. It seems, then, that the parameter space is too big for the assumptions of uniformity and connectivity to have any empirical consequences. Since uniformity represents the null hypothesis about the relation of UG to change, and connectivity the null hypothesis about parametric change, this conclusion appears to cast doubt on the entire enterprise of looking at syntactic change from the point of view of principles-and-parameters theory. This is the conceptual problem caused by the size of the parameter space. The size of the parameter space also raises an empirical issue. On the basis of a small subset of currently existing languages, we can clearly observe language types and note diachronic drift from one type to another, and this is simply astonishing. The view presented above implies that, as far as the history of humanity up to now is concerned, languages should appear to vary unpredictably and without assignable limits, even if we have a UG containing just 30 or so parameters. Obviously, we need to find ways
to reduce the range of parametric possibilities while retaining (at least) 30 parameters. In the next paragraphs, we will consider two ways to do this. First, we can suppose that something is causing grammatical systems to “clump” in the parametric space, rather like galaxies in the physical universe. What is the parameter-space equivalent of the forces that cause stars to bunch together into galaxies? We would like to suggest that the traditional linguistic concept of markedness creates basins of attraction in parameter space. In other words, unmarked values of parameters can effectively reduce the space that grammatical systems occupy. They thereby reduce the hyperastronomical range of possibilities ($2^{30}$, $2^{48}$, etc.) to a range small enough for language types and diachronic drift to be discernible. Given the general considerations about the relation between language types, language acquisition, and language change raised in the previous section, the attractors (markedness) must be introduced by the learning algorithm. This is exactly what we have proposed here, and so our approach to markedness has independent motivation.
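The idea that markedness creates basins of attraction can be given a rough quantitative illustration. The sketch below is a toy model of our own, not the authors' proposal. It assumes that each of n binary parameters independently takes its marked value with probability q, so that q < 0.5 models a bias towards unmarked values. It then uses the perplexity 2^(n·H(q)), a standard information-theoretic measure, to estimate how many grammars effectively carry the probability mass. The function names are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of one binary parameter whose marked value has probability q."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def effective_grammars(n_params: int, q_marked: float) -> float:
    """Perplexity 2**(n * H(q)): roughly how many grammars carry the probability
    mass when n independent parameters are each marked with probability q."""
    return 2 ** (n_params * binary_entropy(q_marked))


if __name__ == "__main__":
    for q in (0.5, 0.2, 0.05):
        print(f"q = {q}: about {effective_grammars(30, q):,.0f} effective grammars")
```

With no bias (q = 0.5) the full 2**30 grammars remain. At q = 0.05 the effective range falls to a few hundred. This is the kind of clumping the text has in mind, although nothing here fixes the actual strength of markedness.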

6.3  On the Nature of Functional Categories

6.3.1 ‘Semantic Bleaching’ and the Logical Nature of Functional Categories

In the preceding discussion we have basically provided an account of how functional categories can be realised, when they are: by Merge, by Move, and in some cases by the combination of Merge and Move (cf. syntactic affixation). The development of a Merge realisation is what has been identified as the innovation in cases of grammaticalisation. We have provided evidence for this kind of development by considering a number of functional heads. These include T (and for that matter v) and C, which we argued splits into a number of heads, such as M, Op, etc. They also include Agr, which is actually a cover term for at least features such as Person and Number, and D, which again splits into a number of heads, such as at least Dem, Num, and Q. The implicit assumption in our discussion has been that these heads have semantic content (cf. Chomsky 1995, 2000, 2001), thus arguing against an approach that treats heads as pure checking positions. Under this approach the postulation of a series of functional heads is possible to the extent that there is also semantic justification for their presence. The obvious question is what exactly we mean by ‘semantic’ content when it comes to functional heads. The cases of grammaticalisation we have considered so far show that lexical (or functional) to functional reanalysis goes along with a change in the meaning of the reanalysed element. In the terms of standard grammaticalisation theory this is called ‘semantic bleaching’ (cf. Hopper & Traugott (1993: 87) for references). We consider this term indicative of the semantic changes that are associated with grammaticalisation, so we will use it without
committing ourselves to the theoretical framework in which it originated. In order to illustrate how semantic bleaching works, we will focus on a couple of cases considered in the previous chapters. Lexical-to-functional reanalysis is perhaps the best way to illustrate the changes in meaning. Recall from our discussion in Chapter 2 of R&R that a subset of lexical verbs becomes the realisation of a functional head. By doing so, such a verb loses part of its own semantic information and becomes compatible with the information associated with the head/projection it realises. The key notion here is that of ‘semantic information’. More precisely, what needs to be clarified is which part of its lexical meaning is lost and which part remains, so that the reanalysed item can be the realisation of the functional position. Let us then consider modal verbs in English or thelo in Greek. These cases show a common pattern: as lexical verbs they have argument structure, and when they become functional elements they have no argument structure. For example, the verb thelo in Greek is a 2-place predicate, which takes a DP as an external argument and a DP or CP (na-clause) as its internal argument. While there are no clear restrictions on the internal argument, this is not so for the external argument, which has to be [+animate] (a [–animate] subject is incompatible with the volitional reading). We can then assume that volitional thelo is merged in V and from there moves to v and T, as shown in (41):
(41) [TP thelo [vP t_thelo [VP t_thelo]]]
Merger of thelo in V and then movement to v allows us to capture the fact that it has a complete argument structure. As already mentioned in Chapter 2, Section 2.3 of R&R, thelo in MG can also be non-volitional, in which case it translates as need. Under this interpretation it does not impose any restrictions on the subject, but is sensitive to the properties of the complement (e.g.
it is incompatible with a definite DP, but compatible with a deverbal nominal which is interpreted as a complex event in the sense of Grimshaw (1990)). Furthermore, it is incompatible with perfective aspect, and only allows a 3rd-person reading (singular or plural) when it takes a na-complement (cf. Roussou (2005) for discussion). It thus differs in some very clear ways from volitional thelo, and although it has some argument structure, it is defective in an obvious sense. In our discussion in Chapter 2 of R&R, we argued that modals of this kind are merged in v, as in (42):
(42) [TP thelo [vP t_thelo [VP V]]]
A comparison between (41) and (42) shows how, in the case of the same lexical item, two different readings emerge with clear effects on the argument structure (and whatever this entails). To be more precise, if the interpretation of the external argument is determined in association with that of the internal one in a configurational approach (cf. Hale & Keyser (1993)), then
the fact that thelo is merged directly in v, not in V, can account for the fact that it cannot on its own restrict the interpretation of the subject. As a result, a [–animate] subject, which is otherwise unavailable, becomes possible. Note that at this point the meaning of thelo is also affected in a clear way. As a volitional verb it has the reading ‘desire/want/wish’, while as a semi-lexical verb it has the reading ‘need/require’ (which is perhaps derived as an implication of ‘want’: I want something implies I’m in need of something). In both cases, though, it expresses an unrealised wish or a request for something to happen (posterior to the speech time).9 Consider what happens next, when it is reanalysed as a T element: it is obviously not compatible with any argument structure, as at no stage does it appear in V or v. In our discussion in Chapter 2 of R&R we argued that once a verbal element is merged in T it is able to assume an epistemic interpretation, namely it can encode necessity or possibility. In other words, in the absence of any argument structure, what remains is the purely modal content of a verb like thelo (and presumably this is what we find in the history of thelo on its way to becoming a future marker). As Bybee et al. (1994) argue, this is a change from agent-oriented to speaker-oriented modality. We then notice, as is rather standard, that each step of the reanalysis affects the semantic content of thelo (which, as noted above, is not independent of what may appear as its complement). As a result of the reanalyses, thelo loses any kind of descriptive content and becomes associated with logical content. To be more precise, it has no predicative properties (i.e. it no longer expresses a relation between two individuals, or between an individual and an unrealised eventuality). What remains is just the modal notion of an unrealised event.
The same effect can be observed, in more or less the same way, for all the other modals under consideration (although the derived meaning can differ as a function of the different lexical source and the feature the reanalysed element realises). The crucial point is that merger in a functional position and loss of argument structure go together. So semantic bleaching is not just the random loss of content. It is rather the retention of logical meaning and the loss of non-logical content. Here, the notion of logical content is best glossed in terms of permutation invariance (see Mostowski (1957), Sher (1997)). The basic intuition is that logical content is independent of external factors or, in von Fintel’s (1995) words, insensitive to facts about the world. The following quotation from his work captures the intuition behind permutation invariance (p. 179):
The intuition is that logicality means being insensitive to specific facts about the world. For example, the quantifier all expresses a purely mathematical relationship between two sets of individuals (the subset relation). Its semantics would not be affected if we switched a couple of individuals while keeping the cardinality of the two sets constant. There
couldn’t be a logical item all blonde because it would be sensitive to more than numerical relations.
More formally, consider the following definitions from the discussion in Sher (1996: 518):
(43) “[Logical] quantifiers should not allow us to distinguish between different elements [of the underlying universe]” (Mostowski (1957: 13), parentheses supplied by Sher).
(44) An A-quantifier is logical iff it is invariant under permutations of A (or, more precisely, permutations of P(A) induced by permutations of A). (An A-quantifier is a set of subsets of a universe A or a function from subsets of A to truth values.)
In other words, as Sher says, “$Q_A$ is logical iff for any permutation $p$ of $A$ and any subset $B$ of $A$, $Q_A(B) = Q_A(p'(B))$, where $p'$ is the permutation of $P(A)$ induced by $p$” (p. 518). The definition is elaborated as follows ((45) is attributed to Lindström (1966)):
(45) A term is logical iff it is invariant under isomorphic structures. (Sher (1996: 520))
Here a “structure” means an (n + 1)-tuple $\langle A, D_1, \ldots, D_n \rangle$, where $A \neq \emptyset$ and each $D_i$, $1 \le i \le n$, is a member of $A$, a subset of $A$, or a relation on $A$. Again, a quotation from Sher can elucidate this definition: “A term invariant under isomorphic structures takes into account only the mathematical structure of its arguments in a given universe. Since individuals are, semantically, atomic elements, they are all structurally identical, and their difference is not detected by any logical (structural) term.” Sher points out that the definition in (45) includes the following elements as logical: cardinality quantifiers, the 1st-order identity relation and “most”-type quantifiers, “more than”, the membership predicate and the relational predicate “well-ordering”. These are elements (with the possible exception of the last one) which are naturally construed as D- or T-elements. (45) excludes predicates such as “is tall”, “is a relation between humans”, etc., i.e. lexical predicates.
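Definition (44) can be tested mechanically on a small universe. In the sketch below, the four-element universe A, the set BLONDE and all function names are hypothetical. is_logical_unary checks the invariance condition in Sher's formulation of (44). is_logical_binary applies the same test to a relation between two sets, which is what von Fintel's all and all blonde require.

```python
from itertools import combinations, permutations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def all_permutations(universe):
    """Each permutation p of the universe, as a dict from element to image."""
    items = sorted(universe)
    return [dict(zip(items, image)) for image in permutations(items)]


def induced(p, s):
    """p'(s): the image of subset s under the permutation of P(A) induced by p."""
    return frozenset(p[x] for x in s)


def is_logical_unary(q, universe):
    """(44): Q_A(B) == Q_A(p'(B)) for every permutation p of A and every B in P(A)."""
    subs = subsets(universe)
    return all(q(b) == q(induced(p, b))
               for p in all_permutations(universe) for b in subs)


def is_logical_binary(q, universe):
    """The same invariance test for a relation between two subsets of A."""
    subs = subsets(universe)
    return all(q(b, c) == q(induced(p, b), induced(p, c))
               for p in all_permutations(universe) for b in subs for c in subs)


A = {"ann", "bob", "cat", "dan"}     # hypothetical four-element universe
BLONDE = frozenset({"ann", "bob"})   # hypothetical fact about the world


def at_least_two(b):                 # cardinality quantifier
    return len(b) >= 2


def contains_ann(b):                 # singles out one individual
    return "ann" in b


def every(b, c):                     # 'all': the subset relation
    return b <= c


def every_blonde(b, c):              # hypothetical 'all blonde': all blonde Bs are Cs
    return (b & BLONDE) <= c


if __name__ == "__main__":
    print(is_logical_unary(at_least_two, A), is_logical_unary(contains_ann, A))  # True False
    print(is_logical_binary(every, A), is_logical_binary(every_blonde, A))      # True False
```

As expected, the cardinality quantifier and the subset relation (all) pass. The quantifier that singles out an individual fails, and so does the hypothetical all blonde. In both failing cases the value changes when two individuals are swapped.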
We believe that (45), or something very like it, may be the key to a formal characterisation of the nature of functional categories. It then follows that lexical material which grammaticalises as functional material loses all semantic content which cannot be construed under (45). For verbs, this entails the loss of argument structure; for nouns, the loss of descriptive content; for adjectives, the loss of descriptive content (cf. the discussion of “whole” > “all” below and in 4.4 of R&R); for prepositions, the loss of content relating to spatial relations (cf. the discussion of Greek kata in 4.4 of R&R). In other words, because it involves a particular kind of change in syntactic category, grammaticalisation strips away the descriptive content and leaves the
logical content associated with the reanalysed element. Because the content of functional heads is limited to logical content, when a lexical element becomes functional, it loses all non-logical content. We understand the distinction between logical and non-logical content in terms of permutation/isomorphism invariance, as described above. The example von Fintel (1995) discusses is that of a universal quantifier, e.g. all. We considered the grammaticalisation of this element in Greek out of the adjective holos > olos (Chapter 4, Section 4.4 of R&R). Following Haspelmath (1995: 367), we assumed that the distributive/universal reading arises in combination with a collective noun, so that an expression like the whole family can be interpreted as all the family/every member of the family. The reanalysis is given in (46):
(46) [QP ola [DP ta [.. [NP spitia]]]]
Notice that, by being merged as a Q element, olos can no longer modify the noun, i.e. there is no descriptive content to relate to that of the noun. Similar effects were observed in our discussion of n-words in French, for example. In Chapter 4, Section 4.2.2 of R&R we considered in detail the reanalysis of words like rien and personne. In this case the reanalysis is a bit more complex. The negative meaning arises as a function of the changes that affected D inside the DP (the unavailability of a null D as an indefinite marker) and of the Agree relation with clausal Neg. Leaving these changes aside, what is crucial is that, due to these factors, the original descriptive content of the noun is reanalysed as the restriction on a quantificational relation. Of course, the restriction represents non-logical content, but n-words in Num such as personne denote logical relations rather than purely descriptive content. This may be a way to characterise a semi-functional element (cf. also the discussion of thelo when merged in v above).
We have observed similar developments in other cases: nouns lose all their descriptive content in being reanalysed as negators (see 4.2.3 of R&R), and so elements such as pas, dhen, and shi lose all nominal content and simply become negators. In the cases just discussed we have a lexical item being reanalysed as a functional one. We also allowed for a functional element being reanalysed as another functional one (e.g. modal particles and complementisers in Chapter 3 of R&R). In these cases, too, we argued that there is a semantic change involved. Given that the source of the reanalysis is a functional feature, the only difference here is that there is a switch from one type of logical element to another. In a way, this is what makes these cases a bit less interesting than the ones discussed above in terms of the semantic change (but not in terms of the structural change involved). The notions discussed here open up the possibility of characterising functional categories in a new way. Many functional elements clearly have logical meanings, in a sense that seems very close to that defined above. This is clearly true of quantificational elements in DP (occupying D, Q, or Num). It is also true of modal elements, to the extent that these quantify over
possible worlds. It is also true of negation, as this just defines a complement relation between sets. It may be true of Tense and Aspect, to the extent that these notions can be construed as quantification over times or events. Alternatively, Tense may be an ordering predicate (Stowell (1996)), another kind of logical relation (see above). Complementisers, to the extent that they may be factive or realis, are connected to modality. The various degree markers which may make up a functional system associated with AP are also logical, as indicated in the discussion above. The status of demonstratives is unclear, however; perhaps Dem is not a functional category after all (this would not materially alter our discussion in 4.1 of R&R; we could treat Dem as an AP). Finally, Agreement cannot be functional on this definition (see the discussions in 1.3 and 4.5 of R&R), although, if it is split into Person and Number, as suggested in 4.6 of R&R, Number could qualify. The question of the status of Person features also relates to the demonstratives; essentially, the question here is whether 1st- and 2nd-person features can be seen as logical elements. Of course, the above remarks are somewhat sketchy. But the important point to note is this. Suppose that what remains in lexical > functional reanalysis is the logical content, and that the new item becomes the realisation of a functional feature. Then this logical content must in fact be the content of the functional feature in question. In other words, grammaticalisation provides us with a good way to understand the properties of functional categories. The way to understand ‘semantic bleaching’, then, is precisely in terms of the creation of items whose meaning is purely logical, in the sense of isomorphism invariance discussed above.

6.3.2  Speculations on Phonological Reduction

We have repeatedly observed that grammaticalisation involves “phonological reduction” of the grammaticalised element.
Indeed, if we consider our cases in (1)–(18), we can observe that most of them involve some process of this type:
- English modals developed unstressed and reduced forms in the 16th century (see Plank (1984)).
- Habere clearly reduced first to a clitic and then to an affix in becoming the future/conditional forms of various Romance languages.
- Greek thelo + na reduces to tha, and hina reduces to na.
- Latin modo reduces to Calabrian mu.
- Complementiser that can reduce, while demonstrative that cannot (as we commented in 3.4 of R&R).
- Latin ille reduced to a monosyllabic form as an article (and object clitic) in Romance (Giusti (2001) suggests this was a crucial step in the reanalysis of these elements).
- Greek oudhen reduced to dhen as part of its reanalysis as negation, and kan + pjoios reduces to kapjos.
- Free relatives undergo a spectacular morphophonological reduction in becoming free-choice indefinites or universal quantifiers.
- Pronouns undergo a certain amount of reduction in becoming agreement markers.
Thirteen of our 18 cases of grammaticalisation involve phonological reduction of some type.10

There is nothing novel in this observation. For example, Hopper & Traugott (1993: 7) discuss the “cline of grammaticality” in (47), and similar observations have frequently been made:
(47) content item > grammatical word > clitic > inflectional affix
Of course, phonological change goes on all the time, and is in principle quite independent of syntactic change. But the kind of phonological reduction which is associated with grammaticalisation appears to be more radical than standard phonological change, and of course only affects grammaticalised elements. It is not exceptionless in the traditional neogrammarian sense (cf. the different possibilities of reducing the vowel of the noun can and that of the modal can to schwa). Here we would like to relate these observations regarding phonological reduction to observations about the prosodic nature of functional categories in the synchronic phonological literature. We mentioned in Chapter 1, section 1.3 of R&R, that phonologically realised functional elements are typically unstressed and “light”. In fact, it is clear that in many languages they fall below certain threshold prosodic values. In English, for example, monomoraic CV words are not found in the lexical vocabulary (Kenstowicz (1994: 640)). For this reason, one can define a minimal word in English as being bimoraic. However, Kenstowicz (1994: 642) notes that “elements drawn from the nonlexical class of pronouns, prepositions, and grammatical particles frequently escape minimality restrictions”. McCarthy & Prince (1986) make the notion of minimality more precise in terms of the prosodic hierarchy in (48):
(48) PrWd (prosodic word)
       |
     F (foot)
       |
     σ (syllable)
       |
     μ (mora)
Every foot must be binary, i.e. disyllabic or bimoraic. Monomoraic items (which are necessarily monosyllabic) therefore cannot be feet, and so cannot be phonological words. Kenstowicz (1994: 643) comments: “A degenerate element can escape the normal stress rules and hence cliticize.
Thus, clitics tend to be monosyllabic. These prosodic dwarfs reside primarily in the nonlexical region of the vocabulary”. Phonologically realised functional elements are thus typically subminimal in terms of the prosodic system of the language, and as such liable to cliticise. Since clitics are phonologically bound elements, there is a natural propensity to reanalyse them as morphologically bound elements, i.e. affixes; indeed, in section 6.2.3 above we treated this development as F*Move/Merge > F*Move. In these terms, then, we can naturally relate the emergence of F*Move/Merge to the clitic status of the exponent of F.
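To make the minimality condition in (48) concrete, here is a minimal Python sketch. It is our illustration rather than the authors' formalism, and it rests on assumptions we state plainly: syllables are written as hypothetical CV skeletons (C = consonant, V = vowel nucleus, ':' = vowel length), onsets carry no weight, the nucleus counts one mora, length adds one, and any coda adds one (a simplified weight-by-position rule). A word then meets the minimal-word requirement if it can form a binary foot, i.e. if it is disyllabic or consists of one bimoraic syllable.

```python
def count_moras(skeleton: str) -> int:
    """Count moras in one syllable written as a CV skeleton, e.g. 'CV', 'CVC', 'CV:'.

    Simplified weight-by-position assumptions (illustrative only):
    onset consonants add nothing, the vowel nucleus adds one mora,
    a length mark ':' on the nucleus adds one more, and a coda of
    one or more consonants adds one more.
    """
    nucleus = skeleton.find("V")
    if nucleus == -1:
        raise ValueError(f"no vowel nucleus in {skeleton!r}")
    rest = skeleton[nucleus + 1:]
    moras = 1                      # the short vowel itself
    if rest.startswith(":"):       # long vowel
        moras += 1
        rest = rest[1:]
    if "C" in rest:                # any coda consonant(s)
        moras += 1
    return moras


def is_minimal_word(syllables: list[str]) -> bool:
    """True if the syllables can form a binary foot: disyllabic, or one bimoraic syllable."""
    if len(syllables) >= 2:
        return True
    return len(syllables) == 1 and count_moras(syllables[0]) >= 2


# Skeletons for a few forms discussed in this chapter (our own CV coding).
examples = {
    "unaccented 'to' (tǝ)": ["CV"],
    "accented 'to' (tu:)": ["CV:"],
    "'want'": ["CVCC"],
}
for label, sylls in examples.items():
    status = "minimal word" if is_minimal_word(sylls) else "subminimal"
    print(f"{label}: {status}")
```

The sketch deliberately ignores vowel quality, so it would count a schwa-headed CVC weak form such as kǝn 'can' as bimoraic; any treatment of the inability of schwa to bear stress would have to be added as a separate condition.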

Theoretical Consequences  213 Here is a list of “weak forms” in English, from Gimson (1980: 261–263):

(49)
Word                 Unaccented              Accented
a                    ǝ                       ei
am                   m, ǝm                   æm
an                   n, ǝn                   æn
and                  ǝnd, nd, ǝn, n          ænd
are                  ǝ(r)                    a(r)
as                   ǝz                      æz
at                   ǝt                      æt
be                   bi:                     bi:
been                 bi:n                    bi:n
but                  bǝt                     bʌt
can (aux.)           kǝn, kn                 kæn
could                kǝd, kd                 kud
do (aux.)            du, dǝ, d               du:
does (aux.)          dǝz, z, s               dʌz
for                  fǝ(r)                   fo:(r)
from                 frǝm                    from
had (aux.)           hǝd, ǝd, d              hæd
has (aux.)           hǝz, ǝz, z, s           hæz
have (aux.)          hǝv, ǝv, v              hæv
he                   hi:, i:, i:             hi:
her                  hǝ, ɜ:, ǝ               hɜ(r)
him                  i:m                     hi:m
his                  i:z                     hi:z
is                   s, z                    i:z
me                   mi:                     mi:
must                 mǝst, mǝs               mʌst
not                  nt, n                   not
of                   ǝv, v, ǝ                ov
saint                sǝnt, snt, sǝn, sn      seint
shall                šǝl, šl                 šæl
she                  ši:                     ši:
should               šǝd, šd                 šud
sir                  sǝ(r)                   sɜ(r)
some (pl. indef.)    sǝm, sm                 sʌm
than                 ðǝn, ðn                 ðæn
that (C)             ðǝt                     ðæt
the                  ðǝ, ði:                 ði:
them                 ðǝm, ǝm, m              ðǝm
there (expl.)        ðǝ(r)                   ðεǝ(r)
to                   tǝ, tu                  tu:
us                   ǝs, s                   ʌs
was                  wǝz                     woz
we                   wi:                     wi:
were                 wǝ(r)                   wɜ(r)
who                  hu, u:, u               hu:
will                 l                       wi:l
would                wǝd, ǝd, d              wud
you                  ju                      ju:

Here a striking pattern emerges in nearly every case: the unaccented form is subminimal (it is almost always monomoraic, i.e. smaller than CVC or CV:), the accented form is at least bimoraic, and nearly all the elements capable of being unaccented are functional. The unaccented forms are of course the usual ones in connected speech, unless the element in question is contrastively stressed. All the CVC unaccented forms contain schwa, which cannot be stressed in most varieties of English. So we observe a very clear correlation between being a functional element and having the status of a “prosodic dwarf”, to use Kenstowicz’s term.11 This correlation is borne out by the observation that certain items may be either functional or lexical; when functional they can be unaccented, when lexical they cannot be. We have already commented on can (noun) vs. can (aux) in this respect, as well as on that in Chapter 3, 3.4 of R&R. The list in (49) shows that both do and have pattern in this way: main-verb do, for example, as in I do university administration every morning, cannot be reduced to /dǝ/ or /d/, unlike auxiliary do in an example like Do universities serve any purpose? Similarly, the ability to reduce have correlates with its auxiliary syntax, as the following examples show:

(50) a. John hasn’t left.
     b. %John hasn’t a car.
(51) a. John’s left.
     b. %John’s a car. (in the interpretation “John has a car”)

The cases of some and there are similar, although the singular quantifier some, which cannot reduce, may be a problem. In general, then, we can observe a correlation between prosodic subminimality and functional elements. We can therefore understand why grammaticalisation, understood as the creation of new functional material, may involve phonological reduction: if functional categories are required to be subminimal, then the reanalysis of a lexical element as functional must involve phonological reduction.
The evidence we have seen is compatible with the idea that functional categories are in fact obligatorily subminimal. Italian, as discussed in Vogel (1999), provides further evidence for the same conclusion. Vogel (citing Bullock (1991), Repetti (1989, 1991), Vogel (1994)) assumes that the minimal word in Italian consists of a bimoraic foot. She points out that rather few words are actually minimal (mentioning bel (“beautiful (m.sg.)”) and fai (“you (sg.) do”), among others). Among the subminimal words of Italian are the pronominal clitics, which consist of a single light syllable. Other elements which consist of just a single light syllable are the articles and the complementisers che, di, and a (again, note that the latter two are also prepositions). So here we observe a correlation between functional categories and prosodic subminimality similar to the one we saw for English. No doubt similar observations could be made for other languages.

Now, the above discussion has naturally concentrated on F*Merge, the case where a functional category has an overt realisation. The other possible parametric values of functional categories are also phonologically defective, and in fact can be seen in the same light as being prosodically subminimal. F*Move/Merge is naturally related to the clitic status of the exponent of F, as mentioned above. F, the case where there is no phonological realisation of the category, can clearly be seen as an extreme case of subminimality. In Chapter 1 of R&R we observed that lexical elements always have a phonological matrix (although they may of course be subject to operations like ellipsis and gapping, however these are to be characterised; in Chapters 2 and 3 of R&R we postulated structures containing radically empty VPs and NPs, but this does not alter the fact that the lexical entries of lexical categories are always associated with a phonological matrix); this can also be understood in terms of the minimal word requirement applying to fully lexical items. Finally, F*Move can be thought of as a phonological specification: the *-diacritic requires an element lacking its own lexically given phonological matrix to have one, and so it triggers movement (before Spell Out) of some appropriate element (presumably one it Agrees with; see Chomsky (2000, 2001)). So the parametric properties of functional categories are all different instances of the fact that these categories are prosodically subminimal. One final point: we are adopting the standard view in principles-and-parameters theory that language acquisition involves parameter setting. The parameters we have proposed all involve the realisation (or lack of realisation) of functional categories. In this section we have suggested that functional categories are inherently prosodically defective.
This means that many of the cues for parameter settings (those in (36b) above) reside in perceptually nonsalient parts of the input string: unstressed, subminimal formatives. This naturally places a burden on the language acquirer and creates the possibility of the kind of “mis-setting” of parameters that leads to language change (of course, movement itself is another type of cue for parameter settings; this raises different considerations). In this section, we have suggested that there is a phonological characterisation of functional categories: they are prosodically subminimal elements. In the next section, we will speculate as to why this should be.

6.3.3  A Speculative Characterisation of Functional Categories

In this section, we will try to put together the proposals made in the previous two sections, in order to arrive at a tentative characterisation of the nature of functional categories. Our proposal is that functional categories are inherently defective at the interfaces, and, as such, are categories with highly reduced lexical entries. In 6.3.1 we suggested a semantic characterisation of functional categories as being restricted to purely logical denotations, in the sense of isomorphism invariance as described there. In 6.3.2 we observed that functional categories are typically prosodically subminimal. Putting these two ideas together, we can make the following observation:12

(52) Functional categories are defective at the interfaces.

To see what (52) means, compare a volitional verb such as English want with a future marker such as the reduced ’ll of spoken colloquial English (similarly, one could compare Latin habeo with Late Latin/Early Romance aio, or perhaps Classical Greek thelo with Modern Greek tha, although in the latter two cases the prosodic facts are not entirely clear to us). As discussed in 6.3.1, want has an argument structure and a non-logical denotation; it is also larger than the minimal English word (it is CVCC, and therefore clearly bimoraic). Its lexical entry must therefore contain information regarding its argument structure and its prosodic structure; in other words, a range of interface properties need to be specified. Of course, this word also has formal syntactic features, at least V (or perhaps [+V, -N]) and maybe a specification of its Case-assigning properties. The reduced auxiliary ’ll, on the other hand, lacks both argument structure and prosodic structure. Its semantic content is arguably exhausted by the feature Future, its phonological content by /l/. However, it has a syntactic categorial feature, T.13 Thus the basic difference between lexical and functional elements is summarised by (52). Call this the Interface Defectivity Hypothesis (IDH). The IDH allows us to reduce the semantic bleaching and phonological reduction associated with grammaticalisation (each construed more precisely as in 6.3.1 and 6.3.2 respectively) to the kind of syntactic categorial reanalysis which we have documented in the foregoing chapters. Moreover, if the IDH is correct, then we have a theoretical tool which we can use to characterise the inventory of functional heads.
This characterisation should of course be checked against the empirical evidence attested crosslinguistically. It can also take us one step towards a characterisation of the functional structure of the clause, and of other functional domains, notably DP. We return to this point below. Before discussing the functional hierarchy, let us make one further observation. In recent work (Marantz (1997), Chomsky (2000, 2001: 43)), it has been suggested that categorial features such as N, V, etc. are to be dispensed with. If so, then functional features play a still bigger role in the basic syntactic computations (Merge and Agree in particular) than previously assumed. Moreover, lexical elements have no intrinsic formal features at all.14 Combining this idea with the IDH, we arrive at a near-perfect complementarity: functional categories bear all and only the features relevant for the syntactic computation; lexical categories bear all and only the interface features. If the syntax is then seen as the optimal way to satisfy interface properties for a certain array of lexical items, then we can understand why functional categories must be present, as the items which make this procedure possible.

Of course, the complementarity between lexical and functional categories is not perfect, in that functional categories can have phonological properties (albeit reduced) and must have logical properties. It follows for Chomsky (2000, 2001) that they must have interpretable features, and we suggested in 6.3.1 that these features are semantically limited in a particular way (see Chomsky (2000: 138–139)). So what needs to be explained is why they can have phonological features. We have already speculated (see 6.2.1) that as far as the computation to LF is concerned, functional categories need have no PF-properties. So we need to understand why they have just the limited PF-properties they have. We conjecture that the answer to this is very simple: we have seen that functional categories lack metrical properties, which are clearly a major feature of phonological structure. Aside from requiring that functional categories lack phonological structure in this sense, no restriction is imposed. Hence functional categories vary randomly below the minimal level for participation in prosodic structure. This line of reasoning explains both the existence of parametric variation and the form it takes. We can gloss the IDH in relation to the restriction to logical content on the LF side in the same way. Our claim that functional categories are restricted to logical meanings amounts to treating them as logical constants. Logical constants are the simplest elements of a logical system. More specifically, since functional categories lack interface structure, we can surmise that they can only have logically atomic properties; more complex denotations (involving relations with the world, and for example predicate/argument structure) are not allowed. Finally, we can understand the simplicity metric in (23) in a similar light.
Feature syncretism involves structure (in that the two features must stand in some kind of relation), and so is to be avoided. To summarise these speculations, then, we can conclude as follows:

(53) Functional categories are atomic, in that they (preferentially) lack structure in syntax, and obligatorily lack it at the interfaces.

(53) can explain semantic bleaching, phonological reduction, the nature and existence of parametric variation with the properties assumed here and, through (23), the nature of syntactic change and the existence of markedness. As such, it goes a long way towards explaining many of the apparent imperfections of language, including not least the propensity to variation and change in time and space. For these reasons, although it is highly speculative, we think that (53), and its congeners (23) and the IDH, are worth thinking about.

6.3.4  Remarks on the Functional Hierarchy

Here we restrict ourselves to a few comments on the functional hierarchy, applying some of the conclusions of the foregoing sections where relevant. Of course, this is a very large topic which we cannot begin to do justice to here, and our remarks should be taken as anything but definitive.

In order to determine what the functional hierarchy is, two things are required. The first is to identify the number of possible functional heads. In the previous section, we offered a general characterisation of functional heads, which ought in principle to be able to provide an answer to this question. The second involves the ordering of these functional heads, i.e. how exactly the universal ordering is derived. Cinque’s (1999) system is an attempt to characterise these positions in terms of the empirical evidence provided by the distribution and realisation of adverbial, auxiliary, and affixal elements. This distribution determines not only the nature of functional categories but also their relative order. We gave a preliminary version of Cinque’s hierarchy for the “IP-domain” in Chapter 1, (18) of R&R, which we repeat here as (54), from highest to lowest:

(54) Mood_SpeechAct > Mood_Evaluative > Mood_Evidential > Mod_Epistemic > T(Past) > T(Future) > Mood_Irrealis > Mod_Necessity > Mod_Possibility > Asp_Habitual > Asp_Repetitive(I) > Asp_Frequentative(I) > Mod_Volitional > Asp_Celerative(I) > T(Anterior) > Asp_Terminative > Asp_Continuative > Asp_Perfect(?) > Asp_Retrospective > Asp_Proximative > Asp_Durative > Asp_Generic/progressive > Asp_Prospective > Asp_SgCompletive(I) > Asp_PlCompletive > Voice > Asp_Celerative(II) > Asp_Repetitive(II) > Asp_Frequentative(II) > Asp_SgCompletive(II)

The resulting order roughly involves a series of aspectual heads above V; above these lies a series of modal heads, above which are the T heads; finally, above the T heads there is a further series of modal heads. The latter may relate to discourse properties. The question that arises in a system of this type is where DP arguments fit, or, for example, where quantificational elements (e.g. wh-words) go. A related question arises with respect to DP-internal structure, although Cinque assumes that it parallels the clausal system, as the positions and ordering of attributive adjectives show. In addition, a structure of this type is assumed to be embedded under a complex C-system of the type proposed by Rizzi (1997) (and see Chapter 3 of R&R). On the other hand, Chomsky (1995, 2000, 2001) takes a rather conservative view and identifies three basic functional heads, C, T, and v, leaving open the possibility that these are cover terms for more complex systems. Just by looking at these two rather different approaches we can identify a common theme: namely, the division of the clausal domain into three basic parts (above V, above T, and above C). This is in fact rather well accepted in the literature (cf. Cardinaletti & Starke (1999), Grohmann (2000), Platzack (2001)). It is further supported by Belletti’s (2004) proposal to iterate the projections found in the C-system (Topic, Focus, etc.) in the space immediately above VP (the right periphery). A rather similar stance is taken by Manzini & Savoia (2005), who identify these positions with clitic shells (in the sense of Sportiche (1996)) and argue that each shell can be projected above V, above I, and above C (simultaneously in some cases). The clitic shell consists of the positions typically associated with the DP, thus bringing out the intuition that there is a correspondence between nominal and clausal structure.

As the above brief discussion shows, there is no general agreement either on the exact number or on the nature of the functional categories involved, although there seems to be consensus on the existence of C and D at least. One very interesting generalisation that emerges, despite the differences of approach and execution, is that functional heads can repeat themselves in different domains: this has been repeatedly observed for negation and modality, and may be true for focus and topic, if Belletti’s proposals are correct. There appears to be a cyclic structure to the functional hierarchy. This also has an important implication for the theory of grammaticalisation that we have been pursuing in the present book. If there is in fact repetition of features within the hierarchy, then certain kinds of reanalysis are less unexpected than they might seem. We raised this point in connection with the similarities between C, D, and P and the reanalysis of the preposition to as a C (M) element (see Chapter 3 of R&R). We also observed similarities between the reanalysis of N to Num (e.g. n-words, see 4.2.1 of R&R) and that of V to T (e.g. modals, see 1.1 of R&R). It seems likely, then, that the functional structure consists of the iteration of the same sequence of categories within the different domains. Cardinaletti & Starke (1999: 184f) propose the structure C–Σ–I–lexical category. It is rather difficult to give these categories a general characterisation, and we will not attempt to do so here.
Instead, we simply observe that the same or very similar diachronic processes appear to operate across each domain, as we have seen in the foregoing chapters, and that our characterisation of functional categories predicts three things: (i) that these hierarchies will vary randomly in their PF-realisation from language to language, although all functional categories will remain prosodically subminimal; (ii) that the denotations of the functional categories are isomorphism-invariant; and (iii) that feature syncretism is always avoided. The last point is rather important, in that it suggests that the fields cannot obviously be defined by a system of intersecting features, e.g. C = [+Ref, +V], D = [+Ref, +N], etc. (Cardinaletti & Starke (1999) employ a system of this sort). The only natural alternative is that the systems are defined semantically in terms of the types of individuals they quantify over (note that it is natural to see all functional heads as quantificational, given what we said in 6.3.1), viz.: C-heads quantify over propositions, T-heads over events, and D-heads over individuals. This conclusion implies that the way to understand functional structure is by understanding quantification.
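Point (iii) rests on the simplicity metric in (23). Purely as an illustration, the comparison that (23) defines can be sketched in Python. The encoding is our assumption, not the authors': a structural representation is modelled as a list of heads, each carrying a set of formal feature labels, and a head bearing n formal features is counted as contributing n − 1 syncretisms. The class `Head`, the feature labels, and the two toy analyses of a modal (merged directly in T versus merged in V and raised to T) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    """A head in a toy structural representation (hypothetical encoding)."""
    label: str
    features: frozenset[str]  # formal feature labels borne by this head


def syncretisms(representation: list[Head]) -> int:
    """Count feature syncretisms, assuming a head with n formal features contributes n - 1."""
    return sum(max(len(head.features) - 1, 0) for head in representation)


def simpler(r: list[Head], r_prime: list[Head]) -> bool:
    """Metric (23): R is simpler than R' iff R has fewer formal feature syncretisms than R'."""
    return syncretisms(r) < syncretisms(r_prime)


# Hypothetical toy analyses of a modal auxiliary:
# merged directly in T, so T bears only its own feature ...
merged_in_t = [Head("T", frozenset({"T"}))]
# ... versus merged in V and raised, so T also hosts the V-feature.
raised_from_v = [Head("T", frozenset({"T", "V"}))]

print(syncretisms(merged_in_t), syncretisms(raised_from_v))
print("Merge analysis simpler:", simpler(merged_in_t, raised_from_v))
```

On this encoding the Merge analysis comes out simpler (no syncretisms against one). The counting rule is the part most open to revision; (23) itself only requires that the syncretism counts of two competing representations be compared.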

6.4 Conclusion

In this work we have attempted to give a general formal characterisation of grammaticalisation, the process by which new exponents of functional categories are created. We have argued, on the basis of 18 case studies from a range of languages, that grammaticalisation involves structural reanalysis so that some new element comes to be merged in a functional position F. The structural reanalysis is always simplification in the precise sense defined by the simplicity metric in (23), repeated here:

(23) A structural representation R for a substring of input text S is simpler than an alternative representation R’ iff R contains fewer formal feature syncretisms than R’.

We described above how (23) provides the basis for a theory of markedness of parameter values, and how changes which create more marked structures may be consistent with (23) at a local level. Finally, in this chapter we have sketched a general characterisation of functional categories, which can explain why grammaticalisation is always associated with phonological reduction and semantic bleaching. For us, this is the direct consequence of the development of new functional material. One question we have only touched on in passing concerns the type of material which is prone to grammaticalisation. In Chapter 1 of R&R, we mentioned isolated morphological subclasses such as the OE/ME preterite-presents and 2nd-conjugation stative verbs in Latin. It is also no accident that the English premodals were intensional verbs, as, of course, was Greek thelo, which gave rise to the future marker tha. The particles reanalysed as irrealis markers in M discussed in Chapter 3 of R&R also had an intensional meaning (“in order to” or “unless”). Of the cases discussed in Chapter 4 of R&R, it is clear that generic nouns naturally develop into indefinites of various kinds (and thus into n-words and/or wh-words); we suggested that this happens when the descriptive content of the noun can be reanalysed as the restriction on a quantifier. Finally, the reanalysis of pronouns as agreement markers involves no change in phi-features, but simply a loss of referential properties.
It is not clear what generalisations emerge from all this: we suggest in fact that the reanalysis which underlies grammaticalisation will act on any available lexical material, as long as it can be reconstrued as functional along the lines described above. The variety of cases discussed by Heine & Kuteva (2002) supports this: the noun “child” may be reanalysed as a partitive (p. 67), “ear” as a locative marker (p. 121),15 and “song” as a noun classifier (p. 280). So we make no generalisations on this point. Once an element enters the functional system, it will tend to be reanalysed successively upwards in the structure, and this creates grammaticalisation paths, as we have already pointed out. In the Introduction to R&R, we identified several larger themes in the book. One was the tension between a descriptively adequate account of grammaticalisation paths and the standard principles-and-parameters view of language change as a random walk through a space defined by the range of parametric variation. We tried to show in section 6.1.3 above that this tension can be resolved in terms of an independently motivated account of the relative markedness values of different parameters. Markedness effectively creates “basins of attraction” in the parameter space, and thereby causes grammatical systems to “clump” around certain combinations of options. Another issue was the characterisation of a possible functional category. We looked at this in 6.3, and tentatively suggested the IDH, and then the idea, put forward in (53), that functional categories have no structure. The IDH is supported by, and supports, our analysis of grammaticalisation. We believe that in this connection we have been able to provide a new perspective on the nature of functional categories by looking at their diachronic development. Finally, since we are framing our analyses in terms of (a variant of) Chomsky’s minimalist programme, we should ask ourselves whether our work has shed any light on the question of the nature of language as a perfect system. We believe that it may have. We remarked at the end of section 6.3.3 that the general idea that functional categories are atomic in structure may underlie the following properties:

(55) a. semantic bleaching (they have the minimal LF-structure, cf. the IDH)
     b. phonological reduction (they have the minimal PF-structure, cf. the IDH)
     c. the nature and existence of parametric variation (PF is indifferent to all properties except prosodic subminimality, hence random PF-variation)
     d. the nature of syntactic change (random diachronic variation in PF-properties)
     e. markedness (the interaction of (23), derived from (53), with (d))

Explaining the nature of syntactic change, as we believe our proposals do, entails explaining the nature of synchronic variation, since synchronic variation is just the result of diachronic changes. So our proposals about functional categories explain not just the nature, but the existence, of parametric variation. If the above ideas are correct, a major feature of natural language is accounted for in a straightforward way.
The existence of functional categories, movement and parametric variation is quite mysterious in the framework of Chomsky (1995, 2000, 2001) and is considered to be at least an apparent imperfection of the system. In the approach outlined here, we can explain the imperfections in terms of (53). Interestingly, (53) is a natural aspect of a perfect system; it is simply a definition of the atomic elements of the system. But the system in question, the computational system of the syntax, must interface with LF and PF. The simplest system-internal properties give rise to system-external complications, particularly in the case of the PF interface, precisely because the syntax is indifferent to certain formal aspects of the interfaces. Since the PF-interface creates the input to language acquisition, the imperfect mapping from syntax to PF gives rise to variation and change in acquisition, and therefore in grammatical systems generally.

We conclude that Chomsky’s conjecture that the computational system that forms the syntax is perfect is not impugned by the existence of such an apparent imperfection as variation and change in time and space. These properties, i.e. the simple existence of different grammatical systems, follow from the interactions (or lack thereof) between the computational system and the PF interface.

Notes

1. Tabor & Closs-Traugott (1998) provide a formulation of grammaticalisation in terms of ‘Structural Scope Expansion’ which is very much in line with our approach. However, they argue that it is not clear that this approach can extend to all cases of grammaticalisation.
2. Note that our approach takes the notion of ‘unidirectionality’ to hold only to the extent that it can be structurally defined. In this respect we differ from standard functionalist approaches to grammaticalisation (but see Note 1). Furthermore, there is nothing in our approach that prevents instances of degrammaticalisation from taking place, yielding a lexical category out of a functional one (cf. the cases discussed in Newmeyer (1998: Chapter 5)). In Chapter 4, Note 21 of R&R, we mentioned the case of me:dhen > midhen (= zero) in Greek, which involves a quantifier becoming a lexical category (noun). We can account for this on the assumption that the other negative quantifiers dropped out of the system (and oudhen became the negator dhen), and me:dhen was no longer analysed as an element consisting of two (or three) morphemes, but was reanalysed as a single lexical item. This is further supported by the fact that as a noun it can be preceded by the definite article (e.g. to midhen). We will not discuss these cases here, but we think that this is a rather good example showing that in our terms degrammaticalisation is indeed possible, albeit rather sporadic. Grammaticalisation, on the other hand, works in a very systematic way.
3. “Displacement” refers to a perturbation of the expected order, which we take to be given by UG in the form of a functional hierarchy. We will discuss the functional hierarchy in the next section.
4. In Greek; in Germanic, this was a case of F*Move/Merge > F*Merge.
5. In Chomsky’s (2000, 2001) terms, we could say that T had an EPP-feature. However, it is entirely unclear how such a mysterious property could be innovated.
6. One could extend this line of reasoning, following recent proposals by Giorgi & Pianesi (1997), and say that F can be entirely absent from the representation, but will be “read in” at LF by convention. On the other hand, F+f has to be syntactically present in order to be interpreted. Once syntactically present, F+f is parametrised, and so might be PF-realised. Cinque (1999: 133) criticises the Giorgi & Pianesi approach on the grounds that it leads to two ways of giving a default value for F: F is either present with the default value or absent and interpreted with a default value. In terms of the proposals being made here, though, we could think that F can only be present with a default value if PF-realised, and this is a case of formal markedness, as defined here, and so distinct from the maximal default. The maximally unmarked case is then where F has no PF-realisation and the default LF interpretation. It is natural to think of this as the absence of F from the numeration. What this idea requires, of course, is a theory of LF which can tell us how the defaults are filled in.
7. The concept of uniformitarianism was first put forward by the 18th-century geologist James Hutton. Hutton’s idea was that the features of the earth had evolved over long periods of time through processes of erosion, etc., rather than having been divinely created. The term became known thanks to Lyell (1830). See Ruhlen (1987: 25ff.).
8. Kayne (2000: 6–8) discusses the number of parameters and the number of grammatical systems, and makes an interesting and rather plausible case that there are at least as many grammatical systems in the world as there are people, i.e. upwards of 5 billion. Despite initial appearances, this conclusion does not alter the point being made in the text: if there are so many grammatical systems, then vast numbers of them differ only slightly from one another. But we still need to allow for “macrovariation” in gross properties such as basic word order, etc., and so still need to allow in principle for a wide typological range. Essentially, Kayne’s argument leads one to the conclusion that there may be more different grammatical systems in the world than is usually thought, but that they all cluster around the same basins of attraction. If Kayne is right, markedness must, if anything, be a more powerful attractive force in the parameter space.
9. Postma (1995) makes a similar point. He postulates that every time a lexical item moves its meaning is affected. Since all movement is to functional positions, the meaning is affected so as to become non-lexical.
10. Of the remainder, English to may have taken on the ability to reduce to /tǝ/ at the time of the reanalysis; the situation regarding serial verbs becoming complementisers is not known; Stage-Two negative words in French were already phonologically minimal, as was Greek ti(s). The only true exception to the generalisation regarding phonological reduction therefore concerns the French n-words (personne, rien, etc.), which appear to have undergone no phonological reduction at all in becoming functional.
This may be a further reason to consider these items as semi-functional, as briefly suggested in the previous section.
11. There are some prepositions in the list in (49): at, for, from, of and to. Of these, for and to are also C-elements (see Chapter 3, 3.3 of R&R). Regarding the others, it may be that these are “functional prepositions”; it has often been observed that the class of prepositions may be divided into functional and non-functional elements. A good example of a non-functional preposition is “through”; see the discussion in Chapter 4, 4.4 of R&R. Appellations such as Saint and Sir may be Ds.
12. The idea being put forward here is conceptually similar to Cardinaletti & Starke’s (1999) proposal regarding structural deficiency. The implementation of the intuition is rather different, however.
13. We are slightly simplifying matters here in order to illustrate our proposal. Will has a residual volitional sense, visible in particular in examples like (i):
(i) I won’t do it!
It also has a CVC form, and as such is (just) a minimal word. These facts may indicate that will is in fact semi-functional; cf. the suggestion in Chapter 1, 1.1 of R&R, that root modals are inserted in v.
14. This entails not viewing θ-roles as formal features, pace Hornstein (1999), and taking Accusative Case to be a property of v (as is standard).
15. In fact, it is not clear whether this case fulfils our criterion for grammaticalisation. See the discussion of “through” in 4.4 of R&R.

References
Adams, M. 1987. Old French, Null Subjects and Verb-Second Phenomena. PhD dissertation, UCLA.
Alexiadou, A. & G. Fanselow. 2000. Laws of diachrony as the source for syntactic generalizations: The case of V to I. GLOW Newsletter 46:57–58.

Belletti, A. 2004. Aspects of the low IP area. In L. Rizzi (ed) The Structure of CP and IP: The Cartography of Syntactic Structures, Volume 2. New York/Oxford: Oxford University Press, pp. 16–51.
Bickerton, D. 1991. Haunted by the specter of creole genesis. Behavioral and Brain Sciences 14.2:364–366.
Bobaljik, J. 2000. The implication of rich agreement: Why morphology does not drive syntax. GLOW Newsletter 46:28–29.
Bullock, B. 1991. The Mora and the Syllable as Prosodic Licensers in the Lexicon. PhD dissertation, Stanford University.
Bybee, J., R.D. Perkins & W. Pagliuca. 1994. The Evolution of Grammar: Tense, Aspect and Modality in the Languages of the World. Chicago: University of Chicago Press.
Cardinaletti, A. & M. Starke. 1999. The typology of structural deficiency: A case study of the three classes of pronouns. In H. van Riemsdijk (ed) Clitics in the Languages of Europe. Berlin: Mouton de Gruyter, pp. 145–233.
Chomsky, N. 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. 2000. Minimalist inquiries: The framework. In R. Martin, D. Michaels & J. Uriagereka (eds) Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik. Cambridge, MA: MIT Press, pp. 89–156.
Chomsky, N. 2001. Derivation by phase. In M. Kenstowicz (ed) Ken Hale: A Life in Language. Cambridge, MA: MIT Press, pp. 1–52.
Cinque, G. 1999. Adverbs and Functional Heads: A Cross-Linguistic Perspective. New York/Oxford: Oxford University Press.
Clark, R. 1990. Papers on Learnability and Natural Selection. Technical Reports in Formal and Computational Linguistics, No. 1. Université de Genève.
Clark, R. & I. Roberts. 1993. A computational model of language learnability and language change. Linguistic Inquiry 24:299–345 [this volume, Chapter 2].
Croft, W. 1994. Typology and Universals. Cambridge: Cambridge University Press.
Ferraresi, G. 1997. Word Order and Phrase Structure in Gothic. PhD dissertation, University of Stuttgart.
Von Fintel, K. 1995. The formal semantics of grammaticalization. NELS 25:175–189.
Gimson, A. 1980. An Introduction to the Pronunciation of English. London: Arnold.
Giorgi, A. & F. Pianesi. 1997. Tense and Aspect: From Semantics to Morphosyntax. New York/Oxford: Oxford University Press.
Giusti, G. 2001. The birth of a functional category: From Latin ILLE to the Romance article and personal pronoun. In G. Cinque & G. Salvi (eds) Current Studies in Italian Syntax: Essays Offered to Lorenzo Renzi. Amsterdam: North Holland, pp. 157–171.
Görlach, M. Early Modern English. Cambridge: Cambridge University Press.
Grimshaw, J. 1990. Argument Structure. Cambridge, MA: MIT Press.
Grohmann, K. 2000. Prolific Peripheries: A Radical View from the Left. PhD dissertation, University of Maryland, College Park.
Haeberli, E. 1999. Features, Categories and the Syntax of A-Positions: Synchronic and Diachronic Variation in the Germanic Languages. PhD dissertation, University of Geneva.
Hale, K. & S.J. Keyser. 1993. On argument structure and the lexical expression of syntactic relations. In K. Hale & S.J. Keyser (eds) The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger. Cambridge, MA: MIT Press, pp. 53–109.
Haspelmath, M. 1995. Diachronic sources of ‘all’ and ‘every’. In E. Bach, E. Jelinek, A. Kratzer & B. Partee (eds) Quantification in Natural Languages. Dordrecht: Kluwer, pp. 363–382.

Heine, B. & T. Kuteva. 2002. World Lexicon of Grammaticalization. Cambridge: Cambridge University Press.
Hopper, P. & E. Traugott. 1993. Grammaticalization. Cambridge: Cambridge University Press.
Hornstein, N. 1999. Movement and control. Linguistic Inquiry 30:69–96.
Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kayne, R. 2000. Parameters and Universals. New York/Oxford: Oxford University Press.
Van Kemenade, A. 1987. Syntactic Case and Morphological Case in the History of English. Dordrecht: Foris.
Van Kemenade, A. 1997. V2 and embedded topicalization in Old and Middle English. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 326–352.
Kenstowicz, M. 1994. Phonology in Generative Grammar. Oxford: Blackwell.
Kroch, A. & A. Taylor. 1997. Verb movement in Old and Middle English: Dialect variation and language contact. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 297–325.
Lightfoot, D. 1979. Principles of Diachronic Syntax. Cambridge: Cambridge University Press.
Lightfoot, D. 1991. How to Set Parameters: Arguments from Language Change. Cambridge, MA: MIT Press.
Lightfoot, D. 1998. The Development of Language: Acquisition, Change and Evolution. Oxford: Blackwell.
Lyell, C. 1830. Principles of Geology. London.
Manzini, M.-R. & L. Savoia. 2005. I Dialetti Italiani e Romanci: Morfosintassi Generativa. Alessandria: Edizioni dell’Orso.
Marantz, A. 1997. No Escape from the Syntax: Don’t Try a Morphological Analysis in the Privacy of Your Own Lexicon. Ms., MIT.
McCarthy, J. & A. Prince. 1986. Prosodic Morphology. Ms., Brandeis University.
Mostowski, A. 1957. On a generalization of quantifiers. Fundamenta Mathematicae 44:12–36.
Newmeyer, F. 1998. Language Form and Language Function. Cambridge, MA: MIT Press.
Pintzuk, S. 1991. Phrase Structures in Competition: Variation and Change in Old English Word Order. PhD dissertation, University of Pennsylvania.
Plank, F. 1984. The modals story retold. Studies in Language 8:305–364.
Platzack, C. 2001. Multiple interfaces. In U. Nikanne & E. van der Zee (eds) Cognitive Interfaces: Constraints on Linking Cognitive Information. Oxford: Oxford University Press, pp. 295–324.
Pollock, J.-Y. 1989. Verb movement, UG and the structure of IP. Linguistic Inquiry 20:365–424.
Postma, G. 1995. Zero Semantics. PhD dissertation, Holland Institute of Linguistics.
Repetti, L. 1989. The Bimoraic Norm of Tonic Syllables in Italo-Romance. PhD dissertation, UCLA.
Repetti, L. 1991. A moraic analysis of raddoppiamento sintattico. Rivista di Linguistica 3:307–330.
Rizzi, L. 1997. The fine structure of the left periphery. In L. Haegeman (ed) Elements of Grammar: Handbook in Generative Syntax. Dordrecht: Kluwer, pp. 281–337.
Roberts, I. 1985. Agreement parameters and the development of English modal auxiliaries. Natural Language and Linguistic Theory 3:21–58 [this volume, Chapter 1].
Roberts, I. 1993. Verbs and Diachronic Syntax. Dordrecht: Kluwer.

Roberts, I. 1997. Directionality and word order change in the history of English. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 397–426 [this volume, Chapter 4].
Roberts, I. 1999. Verb movement and markedness. In M. DeGraff (ed) Language Creation and Language Change. Cambridge, MA: MIT Press, pp. 287–328 [this volume, Chapter 5].
Roberts, I. 2001. Language change and learnability. In S. Bertolo (ed) Parametric Linguistics and Learnability. Cambridge: Cambridge University Press, pp. 81–125.
Roberts, I. & A. Roussou. 2002. The EPP as a condition on tense dependencies. In P. Svenonius (ed) Subjects, Expletives and the EPP. New York/Oxford: Oxford University Press, pp. 125–156.
Roberts, I. & A. Roussou. 2003. Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press.
Roussou, A. 2005. The syntax of non-volitional θelo in Modern Greek. In A. Stavrou & A. Terzi (eds) Advances in Greek Generative Syntax. Amsterdam: Benjamins, pp. 331–360.
Ruhlen, M. 1987. A Guide to the World’s Languages, Volume 1: Classification. London: Edward Arnold.
Sher, G. 1996. Semantics and logic. In S. Lappin (ed) The Handbook of Contemporary Semantic Theory. Oxford: Blackwell, pp. 511–537.
Sportiche, D. 1996. Clitic constructions. In J. Rooryck & L. Zaring (eds) Phrase Structure and the Lexicon. Dordrecht: Kluwer, pp. 213–276.
Stowell, T. 1996. The phrase structure of tense. In J. Rooryck & L. Zaring (eds) Phrase Structure and the Lexicon. Dordrecht: Kluwer, pp. 277–292.
Tabor, W. & E. Closs Traugott. 1998. Structural scope expansion and grammaticalization. In A. Giacalone Ramat & P. Hopper (eds) The Limits of Grammaticalization. Amsterdam: Benjamins, pp. 229–272.
Vanelli, L., L. Renzi & P. Benincà. 1986. Typologie des pronoms sujets dans les langues romanes. In Actes du XIIe Congrès de Linguistique et Philologie Romanes. Aix-en-Provence.
Vikner, S. 1997. V-to-I movement and inflection for person in all tenses. In L. Haegeman (ed) The New Comparative Syntax. London: Longman, pp. 189–213.
Vogel, I. 1994. Phonological interfaces in Italian. In M. Mazzola (ed) Issues and Theory in Romance Linguistics: Selected Papers from the Linguistics Symposium on the Romance Languages XXIII, pp. 109–125.
Vogel, I. 1999. Subminimal constituents in prosodic phonology. In S.J. Hannahs & M. Davenport (eds) Issues in Phonological Structure. Amsterdam: Benjamins, pp. 251–267.
Warner, A. 1997. The structure of parametric change, and V-movement in the history of English. In A. van Kemenade & N. Vincent (eds) Parameters of Morphosyntactic Change. Cambridge: Cambridge University Press, pp. 380–393.
Whitman, J. 2001. Relabelling. In S. Pintzuk, G. Tsoulas & A. Warner (eds) Diachronic Syntax: Models and Mechanisms. Oxford: Oxford University Press, pp. 220–238.
Willis, D. 1998. Syntactic Change in Welsh: A Study of the Loss of Verb-Second. Oxford: Oxford University Press.


Cascading Parameter Changes: Internally-Driven Change in Middle and Early Modern English
Theresa Biberauer and Ian Roberts

1. Introduction
Keenan (1996: 3) puts forward an important principle of syntactic change, the Inertia Principle, which he formulates as follows:
(1) Things stay as they are unless a force (including decay) acts upon them.
We assume that syntactic change is a consequence of abductive reanalysis leading to parameter-resetting in first-language acquisition (see Lightfoot 1979, 1991, 1999). In that case, we can take (1) to mean that, all other things being equal, the target system in first-language acquisition will be converged on successfully. This is no doubt due to the highly restricted range of analyses of the Primary Linguistic Data (PLD) that Universal Grammar (UG) allows and the limited exposure to PLD needed for parameter fixation, i.e. standard poverty-of-stimulus considerations. Longobardi (2001: 278) adopts Keenan’s principle and puts forward the following very interesting version of it:
(2) Syntactic change should not arise, unless it can be shown to be caused (emphasis his).
In other words, as Longobardi says, “syntax, by itself, is diachronically completely inert” (2001: 277–278). In minimalist terms, this means that the computational system of human language (CHL in Chomsky’s (2001, 2004, 2005) terminology) is not itself capable of endogenous change. The question that then arises is: under what circumstances can syntactic change in fact happen? This is the central question that we wish to address in this paper. According to Longobardi’s version of Inertia in (2), syntactic change must be “a well-motivated consequence of other types of change (phonological changes and semantic changes, including the appearance/disappearance of whole lexical items) or, recursively, of other syntactic changes” (2001: 278; emphasis ours, MTB/IGR). Following and elaborating slightly on Longobardi’s point as just quoted, we take it that syntactic change can be caused by changes to PLD arising from independent phonological, morphological or lexical change, or from extra-grammatical factors such as contact. In this paper we intend to develop the idea of recursive syntactic change: change that arises when an initial, extra-syntactically induced parameter change creates a system which has a propensity to further parametric change. As we show, using data from the history of English, this may lead to cascades of parameter changes over several centuries, ultimately giving rise to a major typological shift and the illusion of “typological drift” in the sense of Sapir (1921) (cf. Sapir’s (1921: 165) definition of drift as “the vast accumulation of minute modifications which in time results in the complete remodeling of the language”). We explore this idea by looking at a series of changes which took place in the history of English between 1100 and 1700, which had the net effect of transforming English from what one might think of as a “typologically standard” West Germanic language into the highly unusual system of Modern English, which has many features unattested in the neighbouring Germanic, Romance and Celtic languages. The changes we look at are the following: the shift from OV to VO (12th and early 13th centuries), the loss of “residual” OV orders (ca. 1400), the development of clause-internal expletives and of systematic raising of subjects (15th century), the loss of V2 (ca. 1450), the development of the auxiliary system (modals and do) (ca. 1525), the loss of “short” verb-movement (ca. 1575), the contraction of negation (ca. 1600), the development of negative auxiliaries (1630s), and the development of do-support (later 17th century).
The paper is organized as follows: in Section 2, we give the general theoretical background to the analyses we will propose, based on Biberauer & Roberts (2005); in Section 3, we summarise Biberauer & Roberts’ (2005, 2006) analysis of word-order change in Middle English (this covers the first three changes listed above); Section 4 deals with the loss of V2, the development of the auxiliary system and the loss of short V-movement, following the proposals in Biberauer & Roberts (2006); here we also present our analysis of the development of do-support. Section 5 concludes the paper.

2. Theoretical Background: Agree, EPP-Features and Pied-Piping
Chomsky (2001, 2004, 2005) proposes a system of feature-valuing and movement which relies on two main notions: Agree and Extended Projection Principle (EPP) features. Here we briefly describe this system and how it is applied in the analysis of word-order change in ME put forward by Biberauer & Roberts (2005). Agree is a relation between two heads α and β, where the following conditions hold:
(3) a. α asymmetrically c-commands β;
b. α and β are non-distinct in formal features;
c. there is no third head γ which intervenes between α and β and which would be able to Agree with α (i.e. there is no head γ bearing features of the relevant type which asymmetrically c-commands β but not α).
Where Agree holds, α is known as the Probe and β as the Goal. A precondition for Agree is that both the Probe and the Goal must be active, meaning that they must bear unvalued formal features.1 A typical example of the Agree relation is that which holds between T(ense), the head which bears φ-features relating to the subject, and the φ-features of the subject itself, merged in SpecvP, as shown in (4):
(4)

[TP T[φ] [vP [DP … D[φ] …] …]]


Here the structural conditions for Agree given in (3) are satisfied: T asymmetrically c-commands D, they are non-distinct in formal features since both bear φ-features, and there is no head bearing φ-features intervening between them. Thus, if both T and D bear unvalued features, they are able to Agree. T is taken to have unvalued φ-features, while D, being a nominal element, is inherently specified for these features. D, on the other hand, has an unvalued Case feature. T and D in (4) can therefore Agree, with T’s φ-features being valued by D and D’s Case feature in turn being valued as Nominative by (finite) T in virtue of this Agree relation. This account therefore captures the inherent relation between Nominative Case and agreement with the subject.
Note, however, that the account of feature-valuing outlined above makes no reference to movement. In many languages, of course, the DP in (4) raises to the specifier of the head with which it Agrees, i.e. to SpecTP, the “canonical subject position” since Chomsky (1982). In terms of the theory we adopt here, this operation is in principle independent of Agree, although related to it. More specifically, movement in the Probe-Goal system under discussion here only takes place where the target of movement (i.e. the Probe) bears an EPP-feature.2 Thus if the Probe involved in an Agree relation between two heads bears an EPP-feature, the Goal will raise to the Probe, either to a head-adjoined position or to a specifier, depending on the structural status (head vs XP) of the category moved. In (4), this means that if T bears an EPP-feature, either a D-head will adjoin to T or a DP will raise to create a TP-specifier. This latter operation is what happens in Modern English (NE) and in many other languages. We construe the EPP-feature as a feature of a feature, i.e. as being specifically associated with (a) particular feature(s) of the Probe (cf. Pesetsky & Torrego 2001: 359). Thus where the EPP-feature is associated with, for example, D-features on T, rather than with T’s Tense features, we represent this as EPPD, etc.
The question that now remains is what determines whether a head or an XP undergoes movement. We propose that this depends on pied-piping. The dissociation of feature-valuing from movement makes it clear that a category larger than the Goal, but containing the Goal, may be moved. As we saw above, feature-valuing under Agree is a relation between heads, while the EPP-feature simply requires that the Goal be moved, but does not in fact necessarily require that only the Goal be moved; it may therefore allow or require, as a matter of parametric variation, that a category larger than the Goal, but containing the Goal, be moved. This is the dimension of parametric variation that is explored in detail in Richards & Biberauer (2005), Biberauer & Richards (2006), Biberauer & Roberts (2005, 2006) and in Section 3 below. More generally, the pied-piping option is relevant in a configuration of the type in (5):
(5) . . . X_PROBE . . . [YP . . . Z_GOAL . . .] . . .
Here X Agrees with Z, and, where X has an EPP-feature, UG allows cross-linguistic variation as to whether Z_GOAL moves to X or the larger category YP containing Z_GOAL moves to X. Movement of the larger category is pied-piping. A well-known example of a cross-linguistic difference of the type in question is the option of pied-piping as opposed to preposition-stranding in the case of wh-movement of the object of a preposition. Consider the contrast between English and French illustrated in (6):
(6) a. *Qui as-tu parlé à?
       [À qui] as-tu parlé?
       to whom have-you spoken
    b. Who did you speak to?
       [To whom] did you speak?
As shown above, French requires pied-piping of the PP, while English allows preposition-stranding as well as pied-piping. These are parametric options instantiating the schema in (5), since the wh-expression is the Goal of Agree (i.e. Z; the Probe (X) in this case is a [+wh] C), and PP is YP. With these technical preliminaries behind us, we can now move on to the syntactic changes in the history of English that we are interested in.
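The conditions in (3), together with the valuation of T’s φ-features and D’s Case in (4), can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not the authors’ formalism. Clause structure is flattened into a top-down list of heads (the hypothetical `spine`), and each head is assumed to asymmetrically c-command every head listed after it. Case is treated as a value that a probe passes to its goal as a reflex of Agree: finite T assigns Nominative, and v assigns Accusative, following the standard view mentioned in note 14. The names `Head`, `agree` and `assigns_case` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # heads are compared by identity, not by field values
class Head:
    label: str
    features: dict = field(default_factory=dict)  # feature -> value; None = unvalued
    assigns_case: Optional[str] = None  # hypothetical: "Nom" for finite T, "Acc" for v

    def is_active(self) -> bool:
        """Active = still bearing at least one unvalued feature."""
        return any(value is None for value in self.features.values())


def agree(spine: list, probe: Head, goal: Head, ftype: str) -> bool:
    """Attempt Agree between probe and goal for feature type `ftype`.

    `spine` lists heads top-down; in this toy model each head is assumed
    to asymmetrically c-command every head listed after it.
    """
    i, j = spine.index(probe), spine.index(goal)
    if i >= j:  # (3a) the probe must asymmetrically c-command the goal
        return False
    if ftype not in probe.features or ftype not in goal.features:  # (3b)
        return False
    if any(ftype in head.features for head in spine[i + 1:j]):  # (3c) intervener
        return False
    if not (probe.is_active() and goal.is_active()):  # activity condition
        return False
    # The goal values the probe's unvalued feature ...
    if probe.features[ftype] is None:
        probe.features[ftype] = goal.features[ftype]
    # ... and, as a reflex of the same relation, the probe values the goal's Case.
    if probe.assigns_case and "Case" in goal.features and goal.features["Case"] is None:
        goal.features["Case"] = probe.assigns_case
    return True


if __name__ == "__main__":
    # (4): finite T probes the subject D in SpecvP; v probes the object D.
    T = Head("T", {"phi": None}, assigns_case="Nom")
    subj = Head("D", {"phi": "3sg", "Case": None})
    v = Head("v", {"phi": None}, assigns_case="Acc")
    obj = Head("D", {"phi": "3pl", "Case": None})
    spine = [T, subj, v, obj]
    print(agree(spine, T, subj, "phi"), T.features, subj.features)
    print(agree(spine, v, obj, "phi"), v.features, obj.features)
```

Because φ-bearing heads count as interveners under (3c), a T that tries to Agree with an object across a φ-bearing subject fails in this model. Once T’s φ-features have been valued, T is no longer active, mirroring the activity precondition stated in the text.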

3. Word Order Changes in Middle English
Biberauer & Roberts (2005) (B&R) propose an analysis of Old English (OE) and Middle English (ME) word-order patterns in terms of which the patterns attested at the various stages of OE and ME are analysed as the output of a single grammar which permits restricted types of variation. As we shall see, the variation in question is exactly like that in (5) and (6) above. Their analysis is “Kaynian”, in that, following Roberts (1997 [this volume, Chapter 4]), van der Wurff (1997, 1999) and Fischer et al. (2000), they assume that the underlying word order throughout the history of English is head-initial (this follows from the Linear Correspondence Axiom of Kayne (1994); see Roberts (1997 [this volume, Chapter 4]) for discussion of this in relation to OE).3 B&R propose that West Germanic-like OE word orders, such as SOVAux in subordinate clauses (main-clause order is consistently complicated by the effects of Verb Second), were derived by the application of two types of ‘large XP’ movement: VP-raising to SpecvP and vP-raising to SpecTP. To see how this works, consider an SOVAux example like (7):
(7) Đa se Wisdom þa þis fitte asungen hæfde . . .
    when the Wisdom then this poem sung had
    “When Wisdom had sung this poem . . .”
    (Boethius 30.68.6; Fischer et al. 2000: 143, 25)
The order observed in (7) is obtained by means of the operations given in (8), applied in the order shown:4
(8) (i) V-to-v raising:
        [vP [v V+v] [VP (V) O]]
    (ii) VP-to-(inner) SpecvP movement:
        [vP [VP (V) O] [v' [v V+v] (VP)]]
    (iii) merger of the subject in the topmost SpecvP:
        [vP S [v' [VP (V) O] [v' [v V+v] (VP)]]]
    (iv) vP-movement to SpecTP:
        [TP [vP S [v' [VP (V) O] [v' [v V+v] (VP)]]] [T' T (vP)]]

In (8i) we illustrate V-movement to the “light verb” position, v. Following Marantz (1997) and Chomsky (2004: 112, 122), we assume that this operation is universal and is required in order to “verbalise” the acategorial root (which we continue to write as V for convenience). The movement of VP to SpecvP shown in (8ii) is a case of pied-piping of the type discussed in the previous section. Here v probes the D-features of the object, and has an EPPD feature. The object is the Goal of Agree with v, but where we have the V-final order in (7) as opposed to a “leaking” order (see below), the larger category containing the object DP, namely VP, moves. This is a parametric option in OE. The effect of moving the remnant, verbless VP, i.e. [VP (V) DP] (we indicate “traces” of moved categories in parentheses), is therefore to create the surface order OV. (8iii) demonstrates merger of the subject DP in SpecvP.5 (8iv) shows a second instance of pied-piping, exactly analogous to the one in (8ii) but at a higher structural level. Following merger of hæfde (see Note 4 on the position of auxiliaries), T probes the D-features of the subject, and looks to satisfy its EPPD feature. The subject is the Goal of Agree with T, but where we have the VAux order shown in (7), the larger category containing the subject DP, namely vP, moves. This is a further parametric option in OE. The effect of moving the vP, along with the other operations seen in (8), is therefore to create the surface order SOVAux. It is, of course, well known that the word orders exhibited by OE are not restricted to the SOVAux order considered above. B&R show how all of the other available orders, except those where the object is final (on which see below), can be derived by assuming that the EPPD features of v and T may, in fact, both be satisfied either by means of pied-piping (i.e. moving VP and vP as discussed above) or by moving just the Goal DP, and thus “stranding” VP/vP-internal material.
Thus OE EPPD-satisfaction is directly analogous to EPPwh-satisfaction in the case of extraction of the complement of a preposition in NE, in that both the stranding and the pied-piping options are available. The stranding option in the vP-domain gives rise to the derived structure (9), as opposed to (8ii): (9)



[vP O [v' [v V+v] [VP (V) (O) …]]]


The principal consequence of the object-raising derivation illustrated in (9) is that any VP-internal material additional to the verb itself and the direct object, e.g. indirect objects, PPs, adverbial material, particles, etc., will appear in a postverbal position. So this option explains the attestation of “leaking” structures in OE. We can further explain the fact that languages such as German disallow “leaking” by saying that German does not allow the “stranding” option in this case (in other words, German EPPD satisfaction is parallel to French wh-movement from PPs). B&R are thus able to account for both the V-final and the “leaking” orders in OE in terms of a single grammar with an option for pied-piping vs stranding as regards EPPD satisfaction in the vP domain. B&R further argue that the same options were found in the TP-domain. (10) gives the structure that results if the “stranding” option is taken in place of vP-fronting at the stage of the derivation illustrated in (8iv) above: (10)


[TP S [T' [T hæfde] [vP (S) [v' [VP (V) O] [v' [v V+v] (VP)]]]]]


The surface order that results here is SAuxOV, i.e. an order that is often referred to as “Verb-projection raising” (see van Kemenade (1987) on these orders in OE and Haegeman & van Riemsdijk (1986) on this order in Swiss German and West Flemish; we consider structures of this kind in more detail below).6 In (10), we have “stranding” in the TP-domain, but pied-piping in the vP-domain. Stranding in both domains gives (11): (11)


[TP S [T' [T hæfde] [vP (S) [v' O [v' [v V+v] [VP (V) (O) …]]]]]]


Again we have SAuxOV, the “verb-projection raising” order, but this time with leaking of VP-internal material. This order, too, is attested in OE. Examples of the orders in (9)–(11) are given in (12):
(12) a. þa geat mon þæt attor ut on þære sæ
        then poured man that poison out on the sea
        “Then someone poured the poison out on the sea”
        (Orosius 258.16; Lightfoot 1991: 61, 18b)
     b. . . . þæt hi mihton swa bealdlice Godes geleafan bodian
        that they could so boldly God’s faith preach
        “. . . that they could preach God’s faith so boldly”
        (ÆCHom. I, 16.232.23; Fischer et al. 2000: 156, 48)
     c. . . . þæt mon hæfde anfiteatrum geworht æt Hierusalem
        that man had amphitheatre made at Jerusalem
        “. . . that man had made an amphitheatre at Jerusalem”
        (Orosius, Or_6:; Trips 2002: 81, 23)
(12a) illustrates the “stranding” mode of EPPD satisfaction in the vP domain: only the object DP, þæt attor, raises, stranding the particle ut and the PP on þære sæ. (12b) illustrates “stranding” in the TP domain, with hi raising independently of the rest of the vP, swa bealdlice Godes geleafan bodian, to satisfy T’s EPPD feature. Finally, (12c) shows that it was also possible for just the subject (mon) and just the object (anfiteatrum) to raise to satisfy T’s and v’s EPP-features. B&R show how postulating a grammar which permits the option of moving just the Goal DP alongside the possibility of pied-piping a larger constituent containing that DP enables one to account for the attested, stable synchronic variation in OE. They furthermore argue that the approach described above also affords a principled account of the word-order changes that took place in ME. The basic idea is that the grammar changed from one which allowed both the VP/vP pied-piping option and the “stranding” (i.e. DP-movement) option for satisfaction of v’s and T’s EPPD features to one which allowed only the latter mode of satisfaction.
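The two OE options can be summarised procedurally. The following Python sketch is a toy model, not B&R’s own implementation. It linearises the derivations in (8)–(11) from two parameter choices. The first is how v’s EPPD feature is satisfied: by pied-piping the remnant VP, or by moving the object DP alone. The second is how T’s EPPD feature is satisfied: by pied-piping vP, or by moving the subject alone. A third object mode, `"none"`, covers clauses in which the object does not raise at all, the situation the chapter goes on to discuss for later ME. The sketch rests on two assumptions. First, constituent labels (`S`, `O`, `V`, `Aux`, plus extra VP-internal material such as a particle written `Prt`) stand in for full structures. Second, following the PIC reasoning the chapter gives for later ME, a fronted vP carries only its edge (its specifiers and V+v), while v’s complement stays in place below T. The function name `derive` is hypothetical.

```python
def derive(v_mode: str, t_mode: str, extra: tuple = ()) -> list:
    """Linearise a toy clause with subject S, object O, verb V and auxiliary Aux.

    v_mode: "pied-pipe" (remnant VP to SpecvP, as in (8ii)),
            "strand"    (object DP alone to SpecvP, as in (9)), or
            "none"      (no object movement at all).
    t_mode: "pied-pipe" (vP to SpecTP, as in (8iv)) or
            "strand"    (subject DP alone to SpecTP, as in (10)/(11)).
    extra:  further VP-internal material, e.g. a particle "Prt".
    """
    # (8i) V-to-v raising is assumed to apply in every derivation.
    if v_mode == "pied-pipe":
        spec_v = ["O", *extra]      # the whole verbless VP is fronted
        complement = []             # only the VP trace is left behind
    elif v_mode == "strand":
        spec_v = ["O"]              # the object alone raises ...
        complement = [*extra]       # ... so other VP material "leaks"
    elif v_mode == "none":
        spec_v = []
        complement = ["O", *extra]
    else:
        raise ValueError(f"unknown v_mode: {v_mode}")

    # (8iii) the subject merges in the outer SpecvP; the vP edge is
    # subject + inner specifier + the V+v complex.
    edge = ["S", *spec_v, "V"]

    if t_mode == "pied-pipe":
        # (8iv) the vP edge fronts to SpecTP; v's complement, assumed to be
        # already transferred under the PIC, is pronounced below Aux in T.
        return edge + ["Aux"] + complement
    if t_mode == "strand":
        # only the subject raises; the rest of vP follows Aux in T.
        return ["S", "Aux"] + edge[1:] + complement
    raise ValueError(f"unknown t_mode: {t_mode}")


if __name__ == "__main__":
    for v_mode in ("pied-pipe", "strand", "none"):
        for t_mode in ("pied-pipe", "strand"):
            print(f"{v_mode:9} {t_mode:9}", " ".join(derive(v_mode, t_mode, ("Prt",))))
```

Without extra material, the model reproduces SOVAux for (8) and SAuxOV for (10). With `extra=("Prt",)`, pied-piping in both domains yields S O Prt V Aux, with the particle fronted inside the remnant VP. Stranding in both domains yields the leaking S Aux O V Prt of (11). Combining object-only movement with vP-fronting returns S O V Aux Prt. That output is simply what the toy model predicts; the text does not discuss this order.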
B&R propose that this change first occurred at the v-level, in the 12th or early 13th century (see Canale (1978), van Kemenade (1987) and Lightfoot (1991) in this connection). The loss of VP-pied-piping involved a reanalysis of simple OV orders whereby remnant-VP fronting was reanalysed as object-movement. This can be illustrated with the following example:
(13) The man the apple ate.
     a. [vP S [VP (V) Obj] [v V+v] (VP)]
     b. [vP S Obj [v V+v] [VP (V) (O)]]
B&R suggest that the cause of this reanalysis was a decrease in unambiguous evidence for pied-piping. A grammar allowing both pied-piping and stranding generates a larger language than one which only allows one of the two options, and is therefore less highly valued if one assumes the Subset Principle. In terms of this principle, originally put forward in Berwick (1985), “the learner selects the grammar that generates the smallest possible language that is compatible with the data” (Manzini & Wexler 1987: 425).7 In the OE context, the Subset Principle required the OE system, with its optionality of pied-piping vs. stranding, to be robustly triggered by examples of the sort illustrated in (12) above. Arguably, however, the pied-piping option was less robustly triggered in Early ME than before. To see this, it is important to realise that the basic difference between the conservative (13a) and the innovative (13b) structures is that the innovative structure allows only the object to feature in preverbal position, with any remaining VP-internal material following the verb, whereas the conservative grammar allows all VP-internal material to surface preverbally (although it need not do so). Given that finite clausal complements always appeared postverbally,8 the principal constructions where one can distinguish the two systems are verb-particle constructions and double-object constructions (we will discuss non-finite complements in Section 4). Verb-particle constructions with the order object–particle–V must be analysed as involving a pied-piping grammar, as the particle is fronted along with the object in the remnant VP (since Koster (1975) it has been assumed that the particle is merged in a VP-internal position in West Germanic). So this order would have triggered the pied-piping grammar, and it is clearly found in OE (cf. Pintzuk 1991: 76f., Fischer et al. 2000: 185f.).
However, it has often been remarked that verb-particle constructions become vanishingly rare in the 12th and 13th centuries (Spasov 1966, cited in Kroch & Taylor 2000: 146); it is possible that this was due to the influx of French borrowings at this period, which replaced earlier verb-particle constructions with a simple verb. Thus this important trigger for the pied-piping grammar may have been removed, or at least rendered less robust than formerly, owing to an entirely extraneous lexical factor. A second extraneous factor may have been at work in the case of ditransitive constructions. In these constructions, the order direct object–indirect object–V would have triggered the pied-piping grammar. Again, this order is attested in OE (cf. van Kemenade 1987, Koopman 1990, 1994, Allen 1995, Koopman & van der Wurff 2000). However, during early ME, the distinction between accusative and dative case was lost; Allen (1995: 158f.) shows in detail that the system had broken down in all the ME dialects except Kentish by the end of the 13th century at the latest (see her table 10.1, p. 441). One consequence of this was a rise in prepositional datives. The use of a PP to express indirect objects gives rise to greater positional freedom for these arguments, and consequently a greater incidence of “leaking”, and a correspondingly less frequent instantiation of the order triggering the conservative, pied-piping grammar. We propose, then, that the two factors just described would have undermined the trigger experience for the grammar with the pied-piping option. As a result, the word order changed in the way we observe. The word-order changes are thus the consequence of a reanalysis of the increasingly liberal, ‘stranding’-permitting pied-piping grammar as one which specifically targets DPs. It is important to note that the reanalysis in (13) did not eliminate OV order; it simply changed the structure of OV sentences. Subsequently, starting from around 1400, object-movement of the type shown in (13b) became restricted to negative and quantified objects (van der Wurff 1997, 1999). This restriction of v’s D-attracting property (arguably to [+Op] DPs) led to an overall increase in the number of VO orders in the PLD. As a result, many instances of vP-movement of the type shown in (8iv) above were in fact indistinguishable from simple DP-movement of the subject of the type seen in (10). This led to the reanalysis in (14). We again illustrate the reanalysis with a simple example:
(14) The man ate an apple.
     a. [TP [vP Subj [v V+v] [VP (V) Obj]] T (vP)] (conservative)
     b. [TP Subj T [vP (Subj) [v V+v] [VP (V) Obj]]] (innovative)
Whilst (14) illustrates the basic point that the reanalysis does not affect the surface word order in a simple SVO example of this type, it is, as given, not quite correct. Assuming that auxiliaries surface in T (see Note 4), (14a) predicts that the conservative grammar allowed the unattested SVOAux order. To solve this problem, B&R appeal to the fact that VP and everything it contains is inaccessible to syntactic operations once the derivation has proceeded past vP (this follows from the version of the Phase Impenetrability Condition (PIC) of Chomsky (2000)). As a consequence, the VP-internal object cannot surface in the position in the linear order indicated in (14a), but instead appears in its pre-vP-movement position, following the surface position of the auxiliary in T. Thus, thanks to the PIC, the unattested SVOAux order cannot arise.
The correct representation for (14a) is therefore (14a’), where the constituents indicated in outline (here enclosed in angle brackets) have already been transferred to the interfaces and are therefore unavailable for syntactic operations:

(14a’) [TP [vP Subj [v V+v] ⟨VP (V) Obj⟩] T ([vP (Subj) (V+v) [VP (V) Obj]])]

(14) illustrates a simplification parallel to that in (13). In (14a’), vP is pied-piped to SpecTP in order to satisfy T’s EPPD feature. In (14b), the subject alone raises to satisfy the same feature. We are thus once again dealing with pied-piping as opposed to “stranding”. The important point here is that, in the absence of clause-internal adverbial modification and auxiliaries (on which see below), both structures give rise to the same linear order (SVO). Consequently, in cases of this kind there is no unambiguous trigger for the more complex pied-piping operation. Moreover, as in the case of (13), the Subset Principle disfavours the grammar with the pied-piping option, since this generates a bigger language than one without it. Pied-piping must therefore be robustly triggered, and B&R suggest that by the 15th century it was not.
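The ambiguity argument can be made concrete with a toy linearisation. The sketch below is our own illustration, not B&R’s formalism: it assumes that T is either empty or hosts an auxiliary (“Aux”), and that, by the PIC, VP (and hence the object) is pronounced in its base position after T whichever option is taken. The function name `linearize` and the option labels are hypothetical.

```python
from typing import List, Optional


def linearize(option: str, subj: str, verb: str, obj: str,
              aux: Optional[str] = None) -> List[str]:
    """Toy surface order under two ways of satisfying T's EPPD feature.

    Hypothetical simplification: T is empty or hosts an auxiliary, and VP
    (containing the object) has already been sent to Spellout by the PIC,
    so the object is pronounced in its base position after T either way.
    """
    t_content = [aux] if aux else []   # overt material in T, if any
    spelled_out_vp = [obj]             # VP pronounced in situ (PIC)
    if option == "pied-piping":
        # Conservative grammar (14a'): vP (subject + V+v) raises to SpecTP.
        return [subj, verb] + t_content + spelled_out_vp
    if option == "stranding":
        # Innovative grammar (14b): only the subject DP raises to SpecTP.
        return [subj] + t_content + [verb] + spelled_out_vp
    raise ValueError(f"unknown option: {option!r}")


# Without an auxiliary both derivations yield SVO ...
print(linearize("pied-piping", "the man", "ate", "an apple"))
print(linearize("stranding", "the man", "ate", "an apple"))
# ... but with a restructuring verb in T they diverge: SVAuxO vs SAuxVO.
print(linearize("pied-piping", "S", "V", "O", aux="Aux"))  # ['S', 'V', 'Aux', 'O']
print(linearize("stranding", "S", "V", "O", aux="Aux"))    # ['S', 'Aux', 'V', 'O']
```

Only strings of the SVAuxO kind constitute unambiguous evidence for pied-piping; in simple SVO clauses the learner cannot tell the two derivations apart.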

Biclausal structures initially provided an environment in which the conservative structure in (14a’) and the innovative structure in (14b) gave rise to different orders. These were thus important triggers for the conservative grammar: (14a’) gives rise to surface SVAuxO and (14b) to SAuxVO (recall that we are using the cover term “Aux” for restructuring verbs; see Note 4). Let us look at how the conservative grammar operated in biclausal cases, as this will also help us to see how the restriction on object movement described above created the circumstances for the loss of the pied-piping option in biclausal environments, a development which also had important consequences for monoclausal structures. As first pointed out in van Kemenade (1987: 55f.), modal, causative and perception verbs were V(P)R (i.e. restructuring) triggers in OE, a state of affairs that entailed that the infinitival Vs selected by these verbs followed their selectors, as illustrated in (15):

(15) a. . . . þe æfre on gefeohte his handa wolde afylan.
who ever in battle his hands would defile
“. . . whoever would defile his hands in battle”
(Ælfric’s Lives of Saints 25.858; Pintzuk 1991: 102, 62)
b. . . . þæt hi mihton swa bealdlice Godes geleafan bodian.
that they could so boldly God’s faith preach
“. . . that they could preach God’s faith so boldly”
(ÆCHom I, 16.232.23; Fischer et al. 2000: 156)

(15a), with the order OAuxV, is known as the “verb raising” (VR) order; (15b), with AuxOV, is one case of “verb-projection raising”. Following B&R, we assume the structure in (16) for the complements of restructuring verbs (VR) in OE and Early ME, taking a VR structure of the kind in (15a) as our example (note that the matrix vP is omitted):

(16)


[Tree diagram (16) not recoverable from the source; surviving node labels: DP-Subj, T, T’, v’, (V+v), (VP).]

We are assuming that the complement of a restructuring verb is a TP (cf. i.a. Wurmbrand 2001 and Lee-Schoenfeld 2005 for arguments in favour of the idea that restructuring complements are “smaller” than other clausal complements). In the context of the theoretical framework we are assuming here, the specific assumption is that restructuring complements are TPs headed by a “defective” T, i.e. one that is not selected by C (cf. B&R 14f.). For our present purposes, this idea has the important consequence that the material in the restructuring complement is not sent to Spellout prior to merger of VR, whereas material in the clausal complements of non-restructuring verbs is (owing to the PIC; cf. the discussion of the object in (14a’) above). This accounts for the “clause union” effects commonly associated with restructuring structures. Let us see how our analysis of V(P)R works in more detail. The derivation of the VR order in (15a) proceeds in four steps. First, as we saw in (8i), V moves to v inside the vP of the embedded clause. Second, as in (8ii), the remnant VP moves to SpecvP. Third, V+v moves to T in the infinitival clause. This operation is the key to deriving the Aux–V order here; Biberauer & Roberts (2006) take this infinitive movement to be triggered by a selectional property of the main-clause verb VR, namely the nature of the (defective) TP that VR selects. The last step in the derivation of a VR structure is remnant vP movement to the specifier of the selected T (this is another instance of “pied-piping” satisfying an EPP-feature). In the V(P)R context, the loss of generalised object movement described above had the effect that vP-movement to the lower SpecTP could not be distinguished from subject movement alone on the basis of the surface string.
To see how this works, consider the structure in (17), which illustrates vP-movement to SpecTP in a structure where the object has not undergone raising: (17)


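[Tree diagram (17) not recoverable from the source; surviving labels include DP-Subj, T, T’, v’, V+v, VR and tv, with the embedded VP (and tv) marked “Already sent to Spellout”.]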


As in the case of the direct object in (14a’), the VP indicated in outline here is merged as the complement of the lower v, and thanks to the operation of the PIC, this material is sent to Spellout, and therefore becomes inaccessible to further operations, as soon as the lower vP is completed. Hence, movement of this vP to SpecTP has no effect on the surface position of the object, which remains final. We thus straightforwardly derive optional VO orders in the complements of VR in both OE and ME.9 Moreover, in (17) the choice between pied-piping vP to the lower SpecTP and raising only the subject to that position, which was operative throughout ME, has absolutely no effect on the surface order of elements, since the only material in vP which the PIC would allow to be spelled out in its moved position is the subject, which in this case is PRO, i.e. an element which cannot be assigned phonological form.10 As in the case of (14) above, we therefore once again see the relationship between the two changes: when the object is spelled out in postverbal position, crucial evidence in favour of the pied-piping option at the T-level is obscured. Thus the loss of generalised object movement had the consequence that the trigger experience began to feature many more structures in which subject-raising could not be distinguished from vP-raising on the basis of the surface string. Because of the PIC, then, acquirers had no evidence to distinguish a derivation involving pied-piping of vP to satisfy T’s EPPD feature from one in which only the subject moves to satisfy that feature. The presence of vP-adverbials or other modifiers might of course disambiguate the two derivations, but in the vast majority of cases the ambiguity would have been present. We take it that this situation led to the reanalysis of (17) as (18): (18)


[Tree diagram (18) not recoverable from the source; surviving labels include DP-Subj, T, T’, vP, VR, VP and (V).]


As the structure in (18) shows, the fronted vP in infinitival contexts may have contained no overt material at all: only an empty subject (here indicated as PRO) and the trace/copy of v (see Note 10). Recall that VP has already been sent to Spellout, and hence is not realised in the moved position. Given the lack of evidence for vP-movement, the simpler option of DP-movement was preferred (assuming that language acquirers always take the simplest option consistent with the trigger experience, where simplicity is taken to mean the smallest structure consistent with the input; see Clark & Roberts 1993 [this volume, Chapter 2]); vP-movement was therefore lost as a means of satisfying T’s EPPD feature. This concludes our account of the loss of vP-pied-piping. We now consider the empirical consequences of this loss. The reanalysis of vP-movement as subject movement had two major consequences, both deriving from the fact that T’s EPPD feature, in the innovative grammar, could only be satisfied by a DP in SpecTP: (i) expletive insertion became obligatory where no appropriate, raisable subject was available, and (ii) movement of DP into SpecTP became obligatory in passives and unaccusatives. B&R illustrate both of these consequences in detail. They further show that both expletives and subject raising were options prior to the 15th century, owing to the fact that DP-raising to SpecTP was, in the conservative grammar, one available means of satisfying T’s EPPD feature. After the change in (14), however, this was the only way of satisfying T’s feature, and so expletive insertion and DP-raising became obligatory. B&R therefore provide a natural account both of the extended period of variation during which expletives and subject raising were simply optional and of the fact that the change that ultimately took place went in the direction that it did: optionality is to be expected while the grammar has at its disposal two modes of EPP-satisfaction, but once the trigger experience for one of these modes has become insufficiently robust, language acquirers will opt for a simpler grammar which retains only the robustly attested mode.
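The learning logic (the Subset Principle plus a robustness requirement on triggers) can be sketched as a toy grammar selector. This is a hypothetical illustration under strong simplifying assumptions: each grammar is identified with a finite set of surface orders, an order counts as trigger evidence only if it occurs at least `min_count` times (our stand-in for “robust”), and the learner picks the smallest grammar whose language contains all robust data. The grammar names, orders and threshold are ours, not B&R’s.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable

# Hypothetical toy languages (surface orders only; "Aux" = restructuring verb).
GRAMMARS: Dict[str, FrozenSet[str]] = {
    "DP-raising only": frozenset({"S V O", "S Aux V O"}),
    "DP-raising + vP-pied-piping": frozenset({"S V O", "S Aux V O", "S V Aux O"}),
}


def select_grammar(corpus: Iterable[str], min_count: int = 1) -> str:
    """Return the smallest toy grammar consistent with the robust data."""
    counts = Counter(corpus)
    robust = {order for order, n in counts.items() if n >= min_count}
    compatible = [name for name, lang in GRAMMARS.items() if robust <= lang]
    if not compatible:
        raise ValueError("no toy grammar generates the robust data")
    # Subset Principle: prefer the grammar that generates the smaller language.
    return min(compatible, key=lambda name: len(GRAMMARS[name]))


plenty = ["S V O"] * 50 + ["S Aux V O"] * 10
print(select_grammar(plenty + ["S V Aux O"] * 5, min_count=3))  # -> DP-raising + vP-pied-piping
print(select_grammar(plenty + ["S V Aux O"] * 1, min_count=3))  # -> DP-raising only
```

A sporadic SVAuxO token is not enough to retain the superset grammar; only a robustly attested one is, which mirrors the claim that pied-piping was lost once its trigger became too rare.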
As we have shown above, the changes that occurred in early ME conspired to create a scenario in which vP-raising became indistinguishable from DP-raising in most contexts, with the consequence that the former mode of EPP-satisfaction was lost. A further consequence of the loss of vP-raising was the loss of the orders usually referred to as Stylistic Fronting (Styl-F; see Biberauer & Roberts (2006)). Kroch & Taylor (2000) argue that ME had this operation, which functioned along lines similar to those typically claimed for Modern Icelandic (see Maling (1990), Holmberg (2000)). The two principal properties of Styl-F are that there must be a subject gap and that it is subject to an Accessibility Hierarchy, according to which negation takes precedence over adverbs, which in turn take precedence over participles and other verbal elements. (19) is an example of putative Styl-F in ME:

(19) . . . wiþþ all þatt lac þatt offredd wass biforenn Cristess come
with all that sacrifice that offered was before Christ’s coming
“. . . with all the sacrifice that was made before Christ’s coming”
(Ormulum I.55.525; Trips 2002: 306, 123)

(19) is a passivised relative, with the passive participle offredd (“offered”) as the fronted element. Biberauer & Roberts (2006) propose that cases of Styl-F observed in ME, and V–Aux ordering more generally, involve vP-movement to SpecTP. In their terms, the TP inside the relative clause in an example like (19) has the structure given in (20):

(20) [TP [vP (Op) offredd] [T’ [T wass] ([vP Op offredd]) biforenn Cristess come]]

The most important aspect of this structure for the purposes of this paper is that vP, containing the string [Op offredd], has raised from its first-merged position following [T wass] to SpecTP. This operation takes place in order to satisfy T’s EPPD-feature. In the case under consideration, the D-feature is borne by the passive participle offredd, which B&R, following Baker, Johnson & Roberts (1989 [this volume, Chapter 8]), assume to contain the “absorbed” logical subject (cf. also Richards & Biberauer 2005).11,12 Biberauer & Roberts’s (2006) analysis also affords a simple explanation of the loss of “Styl-F”: for them, it is simply a case of the loss of vP-fronting, i.e. the loss of the pied-piping option for satisfaction of T’s EPPD feature. In this section, we have summarised B&R’s account of the word-order changes in ME. We have left out a number of details, but the essential points are as given here: the idea that OE had the option of “stranding” or pied-piping VP- and vP-internal material at both the v- and the T-level for EPPD satisfaction, and the idea that the pied-piping option was lost in two stages in ME: first in the 12th or early 13th century at the v-level, and then in the 15th century at the T-level. There was also a further change around 1400, restricting object movement to negative and quantified objects.
The OE grammar had two options at both levels; independent morphological and lexical factors undermined the evidence for one of these options, in such a way that, thanks to the Subset Principle, one of the options was lost. As we have seen, this in fact took place initially at the v-level, and the change at this level, combined with the restriction on object movement, led to the change at the T-level. The first change was in accordance with the Inertia Principle, since it was caused by independent lexical and morphological factors. The change at the T-level was an example of a syntactic change caused by the net effects of two earlier syntactic changes. This thus provides an initial example of the “cascade” effect which we discussed in the Introduction.

4. The Loss of V2 and the Rise of the Auxiliary System

Let us turn now to the loss of V2 in the 15th century. We can date this change to approximately 1450 (cf. van Kemenade (1987: 219f.), Fischer et al. (2000: 133f.)). Starting with van Kemenade (ibid.), it has often been suggested that V2 was lost through “decliticisation”. This idea is related to a well-known OE phenomenon: the existence of a systematic class of apparent exceptions to V2 in which a pronominal clitic was able to intervene between the initial constituent and the verb:

(21) a. hiora untrymnesse he sceal rowian on his heortan.
their weakness he shall atone in his heart
(CP 60.17; Pintzuk (1999: 136))
b. Þin agen geleafa þe hæfþ gehæledne.
thy own faith thee has healed
(BlHom 15.24–25)

Although there are many different analyses of this phenomenon (cf. i.a. van Kemenade (1987), Platzack (1995), Roberts (1996), Kroch & Taylor (1997), Fuss (1998), Fuss & Trips (2002), Haeberli (1999, 2002)), there is general agreement that the clitics do not “count” for the computation of V2. In terms of Chomsky’s (2008) idea that only phase heads can trigger movement, we could postulate that C is the host of the clitic in these cases (and that cliticisation is to the left of the host; see Kayne (1994)).13 The core of the decliticisation idea is that, given the string XP–SCL–V, where “SCL” stands for “subject clitic”, if the SCL ceases to be a clitic, then this string is incompatible with V2. Van Kemenade (ibid.) proposes that precisely this decliticisation caused the loss of V2 in English. A consequence of the change in T’s mode of satisfying its EPPD feature discussed in the previous section is that a DP must appear in SpecTP from 1450 onwards, exactly the time of the loss of V2 (cf. van Kemenade’s (1997: 350) observation that “[t]he loss of V2 and the loss of expletive pro-drop [i.e. the development of a requirement for SpecTP always to be filled with a DP, MTB/IGR] . . . coincide historically”). We propose the following reanalysis of sequences like those in (21) at this time (see below on the status of the SCL in (22b)): (22)

a. [CP XP [C SCL-[C [v V+v] C]] [TP [vP (SCL) ([v V+v])] T (vP)]] >
b. [CP XP C [TP SCL [T [v V+v]] vP]]
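The decliticisation argument turns on what counts as a constituent preceding the finite verb in the string XP–SCL–V. As a deliberately crude illustration of our own, the check below treats a clause as a list of (form, category) pairs, ignores items tagged as clitics, and requires exactly one remaining constituent before the finite verb. The tags `"CL"` and `"Vfin"` and the function name are hypothetical.

```python
from typing import List, Tuple

# (form, category) pairs; "CL" marks a clitic, "Vfin" the finite verb.
Token = Tuple[str, str]


def is_v2(clause: List[Token]) -> bool:
    """Toy V2 check: exactly one non-clitic constituent precedes the finite verb."""
    verb_positions = [i for i, (_, cat) in enumerate(clause) if cat == "Vfin"]
    if not verb_positions:
        raise ValueError("clause has no finite verb")
    prefield = [form for form, cat in clause[:verb_positions[0]] if cat != "CL"]
    return len(prefield) == 1


# (21a) with "he" as a subject clitic: the clitic does not count, so V2 holds.
clitic_he = [("hiora untrymnesse", "XP"), ("he", "CL"), ("sceal", "Vfin"),
             ("rowian on his heortan", "VP")]
# The same string once "he" is a full pronoun: XP-SCL-V is now verb-third.
full_he = [("hiora untrymnesse", "XP"), ("he", "DP"), ("sceal", "Vfin"),
           ("rowian on his heortan", "VP")]
print(is_v2(clitic_he), is_v2(full_he))  # True False
```

The surface string is identical in both cases; only the category of he differs, which is why decliticisation alone makes XP–SCL–V incompatible with V2.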

In (22a), T takes the pied-piping option for satisfaction of its EPPD feature, so vP moves to SpecTP. SCL cliticises to C, an operation which for present purposes we take to involve head-adjunction to the left of C. In (22b), SCL moves to SpecTP to satisfy this feature, as required in the innovative system. The reanalysis is forced by the loss of the pied-piping option. Furthermore, assuming that true clitics can only move to phase heads, as just mentioned, “SCL” in (22b) cannot be a true subject clitic, but must instead be a full subject pronoun. Hence decliticisation follows from the reanalysis in (22). This analysis also accounts for the observed gradualness of the loss of V2 (cf. “the loss of V2 is not an abrupt change, but a rather gradual one” (Haeberli (1999: 406))). Since the conservative grammar allowed the option of DP-movement to SpecTP, the structure in (22b) was already available before the reanalysis took place, and so a gradual decline in V2, starting before 1450, is expected. In fact, V2 would strictly speaking have been optional throughout OE and ME (see Haeberli (2002) for discussion of this, and some evidence that this was indeed the case). The reanalysis in (22) was presumably also favoured by the fact that many V2 orders were in any case subject-initial, and such orders were prone to be reanalysed as TPs with V-to-T movement (see Kroch & Taylor (1997), Fischer et al. (2000); see also Adams (1987) and Roberts (1993) on Old French, and Willis (1998) on Middle Welsh). An important consequence of the loss of V2 due to the reanalysis in (22) was that V-to-T movement became a general feature of finite clauses. The same is true in the history of both French and Welsh (cf. the references just given). Biberauer & Roberts (2010) propose that, in the case of English, this led to a marked option. The reason for this has to do with tense-marking in English. Briefly, Biberauer & Roberts (2010) propose that the trigger for V-to-T movement is not rich agreement morphology (as was proposed by Roberts (1985 [this volume, Chapter 1], 1993), Rohrbacher (1994, 1999), Vikner (1997) and others), but rather rich tense morphology. More concretely, they propose that T has an unvalued V-feature, while V has an unvalued T-feature. T and V thus enter an Agree relation in which T’s V-feature probes its interpretable counterpart on V, with the latter’s T-feature being valued in the process (cf. the discussion in Section 2 above). In English, the reflex of this Agree relation is V’s tense morphology, i.e. “Affix Hopping” in the sense of Chomsky (1957) and much subsequent work is simply valuation of V’s T-features via Agree. The same is true in non-V2 environments in the other Germanic languages (except Icelandic).
In Romance, the Agree relation is associated with an EPP-feature on T which triggers V-movement. Most importantly in the present context, Biberauer & Roberts (2010) suggest that the difference between Germanic and Romance correlates with the richer system of tense-marking in Romance: French and Italian have 5–7 synthetic tenses (depending on register), while Spanish and Portuguese have more. Germanic, on the other hand, has at most 4 such tenses, with English and Mainland Scandinavian (MSc) effectively restricted to 2. Biberauer & Roberts (2010) propose that a Romance-style V/T-Agree system cannot be supported in a feebly tense-inflected language like Late ME (unlike Middle French or Middle Welsh; see above). The argument is therefore that the V-to-T movement grammar which resulted from the loss of V2 around 1450 was inherently unstable, since a crucial morphological trigger for it, namely “rich” tense morphology of the Romance kind, was missing. This, it is argued, contributed to the reanalysis of modals and do as auxiliaries in the early 16th century and the subsequent loss of V-to-T movement later in the 16th century. Let us consider in a little more detail how this reanalysis happened.
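The contrast between Agree alone and Agree plus an EPP-feature on T can be illustrated with a toy derivation. The sketch is hypothetical: it spells out V’s valued T-feature as a tense affix (standing in for “Affix Hopping” as valuation under Agree), places clausal negation between T and vP, and moves the verb across negation only when T also bears an EPP-feature. The function and parameter names are ours.

```python
from typing import List


def derive_clause(subj: str, verb_stem: str, tense_affix: str,
                  rest: List[str], t_has_epp: bool,
                  negated: bool = False) -> List[str]:
    """Toy V/T Agree with an optional EPP-feature on T (hypothetical sketch)."""
    # Agree: T's unvalued V-feature probes V and V's T-feature is valued;
    # here the valued feature is simply spelled out as a tense affix on V.
    verb = verb_stem + tense_affix
    neg = ["not"] if negated else []   # clausal negation between T and vP
    if t_has_epp:
        # EPP on T: the verb moves to T, i.e. to the left of negation.
        return [subj, verb] + neg + rest
    # No EPP on T: Agree alone; the verb stays in situ, after negation.
    return [subj] + neg + [verb] + rest


print(" ".join(derive_clause("it", "belong", "s", ["to", "you"], True, True)))
# -> it belongs not to you
print(" ".join(derive_clause("it", "belong", "s", ["to", "you"], False, True)))
# -> it not belongs to you
```

With the EPP-feature removed, the sketch produces the not–V order without an auxiliary that is attested in (26a), where tense is still realised on the in-situ verb.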

Recall that the modals were a subclass of the VR verbs. Consider again the structure of a sequence containing a modal with an infinitival complement after the reanalysis of (17) as (18). Following Roberts (1993: 262) and Roberts & Roussou (2003: 41–42), we take it that the loss of infinitival inflection, which had taken place by 1500, removed the trigger for V-to-T movement in the complement of VR (the assumption is therefore that the infinitival inflection specifically instantiated features on V that not only entered into an Agree relation with T, but also had to undergo movement under the influence of an associated EPP-feature). In this way, the evidence for the lower functional T-v system was removed from the trigger experience of acquirers. Hence (18) was reanalysed in the early 16th century as monoclausal, with modals being merged in v or T and the lexical verb remaining in V; cf. (23): (23)

[TP DP-Subj [T′ [T Modal] [vP (Subj) [v′ v [VP V]]]]]


This change, again, was rather clearly a simplification, and a significant one in the context of the system at the time: as pointed out by Roberts (1985 [this volume, Chapter 1], 1993, 1999) and Warner (1997), the reanalysis which resulted in (23) in turn contributed to the conditions for the loss of (finite) V-to-T movement later in ENE by creating a system in which first the modals and thereafter, increasingly, other auxiliaries were always available to lexicalise T. Very importantly, do underwent the same reanalysis as the modals at about the same time (see Denison (1985), Roberts (1993: 292f.)). But the system that resulted was not the NE one of obligatory do-support in certain environments, with do ungrammatical everywhere else. Instead, do was always optional, including in positive declaratives. The 16th century was thus the period of what Jespersen (1909–49) called “exuberant” do, exemplified in (24), where the non-emphatic nature of do is evident from the fact that it surfaces in an unstressed metrical slot:

(24) Thus cónscience does make cówards of us áll.
(Shakespeare: Hamlet, III, i, 83; Roberts 1993: 293)

The option of “exuberant” do in all contexts meant that any verb and any tense could be associated with an auxiliary. In other words, the trigger for V-to-T raising was obscured by the development of the auxiliaries, particularly do (Roberts 1999: 293 [this volume, Chapter 5]). Kroch (1989) shows that there was variation throughout the ENE period; nonetheless, as Warner (1997: 382–383) observes, the period 1575–1600 seems to be the crucial one as far as the loss of V-to-T movement is concerned. The reanalysis that took place at this time was of the following kind:

(25) a. [TP John [T walk-eth] . . . [VP . . . tV . . . ]]
b. [TP John T . . . [VP . . . [V walks]]]

By now, the verb-auxiliary system is rather similar to that of Modern English, except for the absence of do-support. Do could still be freely inserted in positive declarative clauses, as just noted; conversely, clausal negation could appear without do, giving rise to examples with the order not–V and no auxiliary (since V-to-T had been lost):

(26) a. Or if there were, it not belongs to you.
[1600: Shakespeare 2 Henry IV, IV, i, 98; Battistella & Lobeck (1988: 33)]
b. Safe on this ground we not fear today to tempt your laughter by our rustic play
[1637: Jonson Sad Shepherd, Prologue 37; Kroch (1989)]

The development of do-support was preceded by the development of forms featuring contracted negation, which took place around 1600, as the following remark by Jespersen (1909–49, V: 429), cited in Roberts (1993: 305), suggests:

The contracted forms seem to have come into use in speech, though not yet in writing, about the year 1600. In a few instances (extremely few) they may be inferred from the metre in Sh[akespeare], though the full form is written.

Around 1600, then, negation contracted onto T, but since V-to-T movement of main verbs had been lost, only auxiliaries were able to be negative.
This gave rise to a new system of clausal negation in which negative auxiliaries were used as the basic marker of clausal negation (languages of the Uralic family, Korean, Latin, Afrikaans and others make it clear that negative auxiliaries are a lexical option available to a wide range of languages).14 The new class of auxiliaries included negative modals like won’t, can’t, shan’t, etc., but also the non-modal negator don’t/doesn’t/didn’t. Zwicky & Pullum (1983) argue convincingly that the negative auxiliaries must in fact be distinct items in the lexicon: negative n’t must be treated as an inflectional suffix rather than a clitic, because inflections, but not clitics, trigger stem allomorphy, and n’t clearly triggers such allomorphy (see also Spencer (1991: 381f.)). Biberauer & Roberts (2010) follow this analysis and therefore conclude that negative auxiliaries became part of the English lexicon during the early part of the 17th century. In other words, they propose that the available stock of “T-elements” (i.e. elements lexicalising specifically T-related features) was further increased during the early 17th century by the establishment of negative auxiliaries, and that this lexical factor compounded the morphologically determined system-internal pressure against maintaining a grammar in which lexical content-bearing “main” verbs could raise to T, leading to its rapid demise. Once the negative auxiliaries, including doesn’t, don’t and didn’t, were established as the unmarked expression of clausal negation (probably by the middle of the 17th century; cf. Roberts (1993: 308)), the modern system of do-support came into being. In this system, merger of do in T depends either on the presence of an “extra” feature on T, in addition to its Tense-, V- and D-features (i.e. the interrogative feature Q or the negation feature Neg), or on the presence of a discourse effect, in contexts of emphasis and VP-fronting, as in:

(27) a. John DOES (so/too) smoke.
b. He threatened to smoke Gauloises and [smoke Gauloises] he DID/*he’d --.

The discourse effect is once again required by Chomsky’s (2001: 34) proposal that “optional operations [here: Spellout of the features located in T, MTB/IGR] can apply only if they have an effect on outcome”. We could unite the two cases (Neg/Q-related do-support and discourse-effect-related do-support) if we say that the auxiliaries are lexically associated with Neg- and Q-features (the former giving rise to forms inflected with n’t; the latter having no overt morphological reflex in English, but cf. Hunzib, Tunica, Gimira and other languages featuring interrogative verbal morphology discussed by Dryer 2013), and that their merger into the structure thereby guarantees a discourse effect.
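The Modern English conditions on merger of do in T can be summarised as a toy decision procedure. This is a hypothetical sketch rather than B&R’s formalisation: it assumes a third-person singular present-tense clause with a regular main verb, treats Q, Neg and emphasis as the only triggers, and assumes (a standard simplification not discussed here) that in questions the auxiliary moves on from T to C.

```python
def realise_clause(subj: str, verb: str, neg: bool = False,
                   q: bool = False, emphasis: bool = False) -> str:
    """Toy Modern English do-support for a 3sg present clause (hypothetical)."""
    if not (neg or q or emphasis):
        # No extra feature on T and no discourse effect: do is not merged,
        # and tense is realised on the in-situ main verb.
        return f"{subj} {verb}s"
    do = "does" + ("n't" if neg else "")   # n't as an inflectional suffix
    if emphasis:
        do = do.upper()                     # stressed, emphatic do
    if q:
        # Assumption: in questions the auxiliary raises from T to C.
        return f"{do} {subj} {verb}"
    return f"{subj} {do} {verb}"


print(realise_clause("John", "smoke"))                  # John smokes
print(realise_clause("John", "smoke", neg=True))        # John doesn't smoke
print(realise_clause("John", "smoke", q=True))          # does John smoke
print(realise_clause("John", "smoke", emphasis=True))   # John DOES smoke
```

The single condition at the top of the function encodes the unification suggested in the text: do is merged only when T carries Neg or Q, or when its merger yields a discourse effect.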
If we slightly modify the reanalysis which gave rise to auxiliaries shown in (23) so that the modals and do were merged in v and raised to T in the new structure (which was nevertheless monoclausal, in that the complement to matrix T had lost its T-layer),15 then we could maintain that, although V(-to-v)-to-T was lost by the end of the 16th century, v-to-T remained. In that case, we could think of the development of do-support in the 17th century as a shift from the earlier obligatory v-to-T movement (first fed by V-to-v movement, and as such moving a main verb to T, but later only moving an auxiliary merged in v) to optional v-to-T movement creating a discourse effect. The difference between the two systems concerns the status of phonologically empty v, which in the earlier grammar, until the 17th century, moved to T (i.e. in examples like (26)). In the later grammar, only v containing an auxiliary moved to T. Again, this is a natural simplification of the grammar, given that movement of empty v to T could never be directly observed in the PLD (cf. a parallel case in the infinitival domain discussed in Section 3 above: following the loss of generalised object movement, both vP- and DP-raising in the infinitival TP-domain associated with V(P)R structures during the OE and ME periods resulted in the movement exclusively of empty categories, namely PRO and a lower copy of v in the former case (see Note 10) and PRO alone in the latter). As indicated in this section, this also led to structural simplification, in that the original biclausal structure (18) became monoclausal (23). This simplification was the final development in the establishment of the present-day English verbal system. To summarise, then: we have seen in this section how a series of natural changes affecting verb movement and the auxiliary system in a language that initially resembled its Germanic relatives rather closely ultimately led to the creation of a verbal system that is unique in the Germanic context. We saw that these changes were initially triggered by the loss of vP-pied-piping, which had specific consequences in the V2 domain, resulting in the reanalysis of V2 structures as TPs (cf. (22) above). Various factors, including the reanalysis of modal-containing structures (cf. (23)) and the rise of a class of negative auxiliaries and of do as a non-modal auxiliary, then “remedied” the V-to-T raising system that briefly existed at the end of the 16th century, a system which was unsupportable in (tense-)inflectional terms. The ever-increasing availability of auxiliaries and their establishment as a syntactically distinct class of “T-elements” undermined the trigger for V-to-T raising to a significant extent and ultimately led to a situation in which V-to-v-to-T raising could be reanalysed as v-to-T raising, with the consequence that only verbs merged in the relevant kind of v (see Note 15), i.e. the auxiliaries, were able to undergo this raising.
The final change was the loss of “empty” v-to-T raising in positive declarative contexts, which resulted in the modern-day system of do-support, do being restricted to contexts in which it has an “interpretive effect”.

5. Conclusion

The result of the changes described in the foregoing sections is that the OE system, with OV, V2, no syntactically distinct auxiliaries and no V-movement in non-V2 clauses, developed into the NE system, which is VO, non-V2, and has a class of syntactically distinct positive and negative auxiliaries and do-support, via intermediate steps featuring processes found in neither OE nor NE, such as fully productive V-to-T and object movement restricted to negative and quantified objects. This remarkable series of changes can be seen as a cascade of parametric changes. We can summarise them as follows:

(28) a. Loss of VP-to-SpecvP movement (late 12th/early 13th century)
b. Restriction of object shift to negative and quantified objects (1400)
c. Loss of vP-movement to SpecTP (early 15th century)
d. Loss of V2 (1450)
e. Development of lexical T (modals and do) (1525)
f. Loss of V-to-T (1575)
g. Contraction of negation (1600)
h. Development of negative auxiliaries (1630s)
i. Development of do-support (later 17th century)

It has often been pointed out that English seems to diverge quite radically from the other West Germanic languages. It used to be thought that this had to do with the influence of Norman French, although more recently the effects of Old Norse have sometimes been regarded as responsible for this divergence (see for example Kroch & Taylor (1997), Trips (2002)). We, however, argue that the series of changes in (28) had the net effect of transforming English from a typologically “standard” West Germanic language into the unusual system of Modern English. In this paper, we have tried to show how each change led to the next and how each change, after the initial one, can be ascribed to the interaction of specific system-internal factors. There therefore appears to be no need to invoke contact as a direct cause of the changes, since each syntactic change seems to be sufficient to cause the next. The initial change, as we suggested in Section 3, may have been due to extraneous lexical and morphophonological changes, the first perhaps connected to contact with French. We could think of this as “parametric drift”: a cascade of parametric changes diffused through parts of the functional-category system over a fairly long period of time. This point emerges more clearly if we restate the parameters in more technical terms (with the exception of (28g), which was initially a purely phonological change):

(29) a. Loss of pied-piping to satisfy v’s EPPD feature, which may have been optional throughout the attested OE period (thus guaranteeing the “interpretive effect” of defocalisation of material to the left of V; see Note 9).
b. Loss of v’s optional EPP-feature, but retention of specialised EPPD on v (see (a)).
c. Loss of pied-piping to satisfy T’s EPPD feature.
d. (Matrix) C loses EPP-feature triggering T-movement.
e. Modal and aspectual features of T realised by Merge.
f. v loses EPP-feature triggering V-movement (but see Note 15).
g. Possibly not a syntactic change.
h. Negative features of clause realised by Merge in T.
i. T loses obligatory feature triggering v-movement.

So we observe a series of small, incremental changes to the formal feature make-up of the core functional categories C, T and v. Taken together, they give rise to a major reorganisation of the English verb-placement and auxiliary system, and have created a system which is quite unlike anything found elsewhere in Germanic (or Romance).16

Cascading Parameter Changes  249
What causes the cascade effect? To answer this, we need to understand exactly what is meant by the "propensity to change" alluded to above. The key idea, due to Lightfoot (1979), is that "grammars practice therapy, not prophylaxis". Essentially, each parameter change skews the primary linguistic data (PLD) in such a way that the next change is favoured, perhaps in concert with other pre-existing factors (such as the existence of subject and object clitics with their particular behaviour in V2 contexts, as discussed in Section 4; see also Note 13). We have seen in the description above how each successive change was favoured. Let us now look at this in a little more detail. The crucial trigger for VP-pied-piping to SpecvP was the occurrence of VP-internal material other than the direct object in preverbal position in subordinate clauses. OE, as is well known, showed a good deal of "leaking" of such material, and we account for this by assuming that VP-pied-piping was optional. We suggested in Section 3 that the two most important cases of VP-internal material were particles and indirect objects. Independent factors may have undermined this trigger experience: the influx of French lexical items replacing verb-particle combinations, and the loss of dative case leading to a rise in the expression of indirect objects as PPs. The OE system, with the pied-piping/stranding option for EPPD satisfaction, was inherently marked in terms of the Subset Principle, since this grammar generated a larger language than one without the optionality; hence robust trigger experience was crucial for its maintenance. How did (29a) lead to (29b)? The loss of v's optional EPP-feature, resulting in the unavailability of a general object-raising trigger, could plausibly have been the consequence of contradictory input from V(P)R contexts.
Recall that these structures were biclausal in OE and ME, and that the EPP-feature associated with the lower (infinitival) T-head could be satisfied either via pied-piping (i.e. vP-movement, where vP would have contained a raised object wherever the optional or specialised EPPD on v was present) or by "stranding" (i.e. subject-raising). Note that the "stranding" option of raising just the subject DP would have been just as available in cases where v was associated with an optional EPPD feature as in those where it was not. Consequently, V(P)R structures would have represented a context where objects that had undergone "defocusing" movement under the influence of v's EPPD feature might nevertheless surface in postverbal position (V in V(P)R structures necessarily undergoing movement to T, as outlined in Section 3). Thus VO order, in V(P)R contexts at least, would not have been consistently interpretable as a "focusing" structure, and it is conceivable that this input compromised the trigger experience to the point where the "defocusing" EPP-feature was lost. This would, of course, have led to the situation that we see in late ME, namely that the only objects that still surface preverbally are those attracted by the remaining object-attracting feature, the specialised EPPD feature discussed in Note 9. Precisely when and how this feature arose, and why it was retained for as long as it was, are questions that we must leave to future research.

What is clear is that the restrictions on object movement, combined with the loss of VP-pied-piping, led to the change in (29c): the loss of vP-pied-piping. We described in Section 3 how, in both monoclausal and biclausal contexts, the trigger experience could not distinguish the pied-piping case from the stranding case, and so, once again, the Subset Principle led to the loss of the older pied-piping grammar. The loss of vP-pied-piping led to the general requirement that a DP had to appear in subject position. This led to the reanalysis of subject clitics as occupying this position in the exceptional V3 orders. It hence led to "decliticisation" and to the reanalysis of both XP–SCL–V and Subject–V orders as non-V2 structures with V moving to T. V-to-T movement was not, however, robustly triggered by the morphological system of Late ME, given the "rich tense" requirement for this operation identified in Biberauer & Roberts (2010). Hence the loss of V2 favoured the development of the auxiliary system (29e) and the loss of V-to-T (29f). The reanalysis of the modal auxiliaries, at least, was also favoured by the changes in restructuring complements caused by the loss of vP-pied-piping (as well as by the loss of infinitival morphology, an independent morphophonological change). The development of contracted negation was initially simply a phonological reduction of not to n't. However, in combination with the loss of V-to-T movement, it led to the development of a separate class of negative auxiliaries. This is a case of the development of an inflectional affix. In general, following the proposals in Fuss (2005), we can take this to involve removing a given feature from the syntactic system as an autonomous element and instead systematically associating it with a lexical item or class of lexical items. As a further case of restriction on the distribution of a lexical item, this might be thought of as driven by the Subset Principle.
The development of negative auxiliaries may have led to the development of general do-support if the conjecture at the end of the previous section regarding the status of obligatory v-movement is correct. Once V-to-T had been lost, this movement could not be seen in many cases. Negative auxiliaries had also developed, along with auxiliaries bearing a Q-feature (by analogy, we must suppose). As a result, v-movement became optional, and always had a discourse effect. Again, this is an example of a restriction being imposed on a movement operation. One factor which is very clearly at work in many of these changes is what we might call "restriction of function": the narrowing down of an operation to a subset of the contexts in which it formerly applied. To the extent that this kind of change imposes new restrictions on the distributional freedom of (a class of) lexical items, it may derive from the Subset Principle. A further factor may be a general preference for relatively simple derivations, which frequently disfavours movement, or movement of relatively complex categories. In general, then, we see that it is possible to maintain a strong version of the Inertia Principle (which, as Longobardi (2001) points out, is desirable in the context of the Minimalist Programme) and yet at the same time account for an intricate series of related syntactic changes, not all of which have a purely syntax-external cause. At the same time, we see what Sapir's (1921: 165) intuition regarding "the vast accumulation of minute modifications which in time results in the complete remodelling of the language" might mean in principles-and-parameters terms.
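The Subset Principle reasoning invoked above, and spelled out in Note 7, can be made concrete with a toy learner. The learner sees only positive evidence and adopts the most conservative grammar consistent with it. The Python sketch below is not the authors' analysis; it rests on several hypothetical simplifications:

- Languages are finite sets of schematic word-order strings (S = subject, O = object, V = lexical verb, Aux = auxiliary).
- The grammar labels, the strings assigned to each grammar and the function names are illustrative inventions.
- The pied-piping grammar is assumed to add V-Aux orders of the kind discussed in Note 12.

```python
"""Toy Subset Principle learner (illustrative sketch only).

Grammar labels and strings are hypothetical, schematic simplifications.
"""
from typing import Dict, FrozenSet, Iterable

# ASSUMPTION: the optional pied-piping grammar generates a superset of the
# stranding-only grammar, adding a V-Aux order.
GRAMMARS: Dict[str, FrozenSet[str]] = {
    "stranding-only": frozenset({"S Aux O V"}),
    "pied-piping-or-stranding": frozenset({"S Aux O V", "S O V Aux"}),
}


def consistent(language: FrozenSet[str], data: Iterable[str]) -> bool:
    """Positive evidence only: every observed string must be generated."""
    return set(data) <= language


def subset_learner(grammars: Dict[str, FrozenSet[str]], data: Iterable[str]) -> str:
    """Return the consistent grammar whose language is a subset of every
    other consistent grammar's language (the most conservative choice)."""
    observed = list(data)
    candidates = [g for g, lang in grammars.items() if consistent(lang, observed)]
    for g in candidates:
        if all(grammars[g] <= grammars[other] for other in candidates):
            return g
    raise ValueError("no unique subset-minimal grammar is consistent with the data")


if __name__ == "__main__":
    # Without V-Aux evidence, the learner settles on the smaller grammar.
    print(subset_learner(GRAMMARS, ["S Aux O V"]))
    # Robust V-Aux evidence forces the larger grammar.
    print(subset_learner(GRAMMARS, ["S Aux O V", "S O V Aux"]))
```

If the trigger experience contains no string that only the larger grammar generates, the learner has no positive evidence for the optionality and chooses the smaller grammar. This mirrors the pattern the text attributes to the loss of pied-piping.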

Notes
1. Formal features are those which are directly relevant to the functioning of the operations of syntax, such as φ-features (person and number features), Case features and categorial features. These features may or may not play a role at the phonological and semantic interfaces. Other features, such as [sonorant] or [monotone increasing], may play a role only at one or the other interface. It is useful to think of formal features as attribute-value pairs, e.g. [Person: 3]. In this way, unvalued features can be seen as those simply lacking a value, and the Agree operation can be seen as copying values between the Probe and the Goal. A condition on the semantic interface is that all formal features must be valued (cf. the Principle of Full Interpretation).
2. This use of the term "EPP" bears only a rather indirect relation to the Extended Projection Principle as originally proposed in Chomsky (1982: 10). For our purposes here, it suffices to think of the EPP-feature as a movement-triggering diacritic.
3. B&R's approach diverges, however, from some aspects of the theory of phrase structure in Kayne (1994), notably in that they assume that a single head may have more than one specifier.
4. For ease of exposition, we represent the auxiliary hæfde as being merged in T. It is likely that the structure of clauses containing auxiliaries was more complex than this in OE. "Restructuring" verbs, which took infinitival complements, almost certainly had a TP complement and, as such, introduced biclausal structures (see below). Habban, beon and weorþan, which typically had participial complements, may also have introduced a biclausal structure. We return to this point in Section 4, when we consider the development of the NE auxiliary system.
5. In assuming that the object's D-feature is valued by v before the subject is merged, we are departing from Chomsky (1995: 355f.). Instead, we follow the account of the distinction between nominative-accusative and ergative-absolutive case-agreement marking put forward by Müller (2004). Müller argues that the contrast between the two types of pattern derives from a choice in the order of operations in a transitive clause when the derivation reaches v. At this point, either v may Agree with the direct object, or the subject may be merged. If Agree precedes Merge, v's features Agree with the D-features of the object, and the subject, once merged, must Agree with T. This gives rise to a nominative-accusative system. An ergative-absolutive system derives from the opposite order of operations. Since OE was clearly nominative-accusative, the order of operations indicated in (8) is as predicted by Müller's analysis.
6. We follow the literature on West Germanic, starting with Evers (1975), in using this terminology and the related term "verb raising" for OAuxV orders. Our analysis is, however, very different from those relying on rightward movement of verbs or verb projections.
7. The Subset Principle arguably follows from the fact that language acquirers do not have access to negative evidence. They therefore cannot retreat from a "superset trap" if they postulate a grammar which generates a language larger than that determined by the data.
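Notes 1 and 5 together describe a small, almost procedural idea:

- Features are attribute-value pairs.
- Agree copies values from the Goal to the Probe.
- The alignment type falls out of whether v Agrees with the object before or after the subject is merged.

The Python sketch below illustrates that logic as a toy model, not as Müller's (2004) own formalism. Several elements are hypothetical:

- the names `DP`, `Head`, `agree`, `transitive` and `alignment`;
- the labels `v-case` and `T-case`;
- the assumption that, in the Merge-first order, the freshly merged subject is v's closest Goal;
- the assumption that the sole argument of an intransitive clause Agrees with T.

```python
"""Toy model of Agree (note 1) and Müller-style operation ordering (note 5).

All names, and the two assumptions flagged below, are hypothetical
simplifications for illustration; this is not Müller's (2004) formalism.
"""
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DP:
    name: str
    phi: Dict[str, object]          # valued phi-features, e.g. {"Person": 3}
    case: Optional[str] = None      # unvalued (None) until a head Agrees with it


@dataclass
class Head:
    name: str
    case_value: str                 # case this head values on its Goal
    phi: Dict[str, Optional[object]] = field(
        default_factory=lambda: {"Person": None, "Number": None})


def agree(probe: Head, goal: DP) -> None:
    """Copy the Goal's phi-values onto the Probe and value the Goal's case."""
    for attr in probe.phi:
        probe.phi[attr] = goal.phi[attr]
    goal.case = probe.case_value


def fully_valued(heads, dps) -> bool:
    """Simplified Full Interpretation: no formal feature may remain unvalued."""
    return (all(None not in h.phi.values() for h in heads)
            and all(d.case is not None for d in dps))


def transitive(order: str) -> Dict[str, Optional[str]]:
    """Derive a transitive clause; order is 'agree_first' or 'merge_first'."""
    v, t = Head("v", "v-case"), Head("T", "T-case")
    obj = DP("object", {"Person": 3, "Number": "sg"})
    subj = DP("subject", {"Person": 1, "Number": "pl"})
    if order == "agree_first":
        agree(v, obj)    # v Agrees with the object before the subject is merged
        agree(t, subj)   # the subject, once merged, Agrees with T
    elif order == "merge_first":
        agree(v, subj)   # ASSUMPTION: the freshly merged subject is v's closest Goal
        agree(t, obj)    # the object is left to Agree with T
    else:
        raise ValueError(f"unknown order: {order}")
    if not fully_valued([v, t], [obj, subj]):
        raise RuntimeError("an unvalued formal feature survives the derivation")
    return {"A": subj.case, "O": obj.case}


def alignment(order: str) -> str:
    """Classify the system by which transitive argument patterns with S."""
    s_case = "T-case"    # ASSUMPTION: the sole intransitive argument Agrees with T
    cases = transitive(order)
    if cases["A"] == s_case:
        return "nominative-accusative"
    if cases["O"] == s_case:
        return "ergative-absolutive"
    return "unclassified"


if __name__ == "__main__":
    print(alignment("agree_first"))
    print(alignment("merge_first"))
```

Under this toy encoding, the Agree-first order yields `nominative-accusative` and the Merge-first order yields `ergative-absolutive`. This matches the correlation Note 5 attributes to Müller, but the Merge-first result depends entirely on the flagged closest-Goal assumption.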

8. This is, of course, also the pattern exhibited by Modern Dutch and German and West Germanic generally. We assume that v's EPP-feature in these languages specifically requires movement of a D-bearing Goal. On this assumption, we can account for the consistently postverbal position of non-restructuring clausal complements: any D-features contained in complements of this type would no longer be accessible to v's D-Probe at the point at which this head is merged (cf. the workings of the Phase Impenetrability Condition (PIC) discussed below).
9. In addition to the VO orders which result from the effects of the PIC as described above, VAuxO was also available in OE in structures such as that illustrated in (i):
(i)  . . . þæt ænig mon atellan mæge ealne þone demm
           that any man relate can all the misery
     ". . . that any man can relate all the misery" (Orosius 52.6–7; Pintzuk 2002: 283, 16b)

This order does not involve V(P)R, despite the fact that the matrix verb is one of the "restructuring" triggers discussed above: the non-finite verb atellan precedes the modal that it would follow in restructuring contexts. B&R propose the following in order to allow for the possibility of VO orders in subordinate clauses in OE. With the exception of one class of object DPs (see below), v in OE was only optionally associated with an EPP-feature. However, the presence of this optional EPP-feature systematically guaranteed an interpretive effect that was absent in structures where v lacked it (see Chomsky (2001: 34, 2004: 112)). B&R assume leftward movement in Germanic to be a "defocusing" operation (cf. Pintzuk & Kroch (1989) on the obligatorily focus-bearing nature of the postverbal material in Beowulf). They propose that OE v's optional EPP-feature triggered defocusing movement wherever it was present; wherever it was absent, unmoved material could therefore remain in focus. This implies that negative and quantified/indefinite objects, which appear to have rather consistently surfaced preverbally during OE (and also in ME), were leftward-moved for different reasons (see also Kroch & Taylor (2000), Pintzuk (2002: 294ff.)). B&R propose that the negative/quantified object movement was triggered by an obligatory EPP-feature specifically associated with a [+Op] D-seeking Probe. OE object movement thus results from two different types of EPP-feature-driven movement: one involving an obligatory EPP-feature, and the other involving an optional EPP-feature which triggers defocusing. See Reinhart (1995) for an account of object scrambling and defocusing in Dutch.
10. Note that the raising of the lower copy of v in the vP-fronting (pied-piping) case does not have any effect on surface order either, as compared with the non-raising of this copy wherever the DP-fronting ("stranding") option is employed. See Biberauer & Roberts (2006, Note 6) for discussion of a PIC-based Spellout mechanism that "distinguishes" higher vs. lower copies, privileging only the former with full Spellout (i.e. phonological realisation). Whether or not this proposal is correct, any account employing remnant movement where the remnant is ultimately only partially spelled out must explain how copies contained in a remnant that eventually surfaces above "higher" copies are disqualified from phonological realisation (e.g. den Besten & Webelhuth's (1987) analysis of German VP-fronting). We leave this matter for further research. The crucial point here is that the copy of the infinitival verb adjoined to v is not available for Spellout, with the consequence that it cannot signal the difference between vP- and DP-raising to SpecTP.

11. In (20), the object is extracted under relativisation, which we have indicated by (Op); the leftmost occurrence of this symbol marks its successive-cyclic movement through SpecvP (note, however, that nothing here hinges on the assumption of a null-operator rather than a raising analysis of relatives). The PP biforenn Cristess come was also a constituent of VP (and therefore of vP). However, it does not appear before the auxiliary in the surface string because at this stage the pied-piping option was no longer available for v. The PP therefore remains within the VP throughout the vP phase of the derivation, and it surfaces in final position owing to the effects of the PIC as described above.
12. B&R's analysis also provides a very simple analysis of V-Aux structures that are evidently not amenable to a Styl-F analysis, but which are nevertheless attested in ME. Consider (i) in this connection:
(i)  er þanne þe heuene oðer eorðe shapen were
     before that heaven or earth created were
     "before heaven or earth were created" (Trinity Homilies, 133.1776; Kroch & Taylor 2000: 137)

For B&R, (i) involves pied-piping of a vP containing heuene oðer eorðe shapen, and as such is quite straightforward. In terms of a Styl-F analysis, by contrast, the V-Aux order is problematic, since there is no subject gap.
13. This idea might form the basis of a general account of second-position clitics, a point that we will not develop further here.
14. Biberauer & Roberts (2010) take the negative auxiliaries with n't to represent the unmarked post-17th-century form. They note that many instances of non-contracted not involve constituent, not clausal, negation. This is clearly true whenever not is non-adjacent to the auxiliary, as in (i):
(i)  a. John has always not smoked.
     b. The kids have all not done their homework.

It should, however, be noted that clausal scope is possible if not (i.e. the full form) is adjacent to the auxiliary; thus:

(ii)  John must/does not smoke.

In this connection, Biberauer & Roberts suggest that there is a "negative-concord"-style Agree relation between [+neg] T and not (cf. the fact that the presence of the [+neg] feature on T triggers do-support in NE; see below).
15. This might necessitate postulating an "extra" v-layer to host V-to-v movement. However, pace the proposals in Marantz (1997) and Chomsky (2001, 2004) mentioned in Section 3 in this connection, we might think that NE verbs are in fact category-neutral roots. Note that, unlike verbs in all the other (continental) Germanic languages, NE verbs are able to appear in an uninflected form in a very wide range of environments: all persons of the present tense except the 3sg, the "subjunctive", the infinitive and the imperative. On the other hand, the evidence adduced in Johnson (1991) does suggest that NE has at least "short" V-movement. This may then imply the presence of a further v-layer if the proposal in the text is to be maintained.
16. It is interesting to note that Icelandic underwent the initial word-order change, but not the subsequent changes discussed here (see Hróarsdóttir (2000), Rögnvaldsson (1996)). It may be that Icelandic never lost V2 because it never had subject (pro)clitics. This would be consistent with the account of the relation between the loss of vP-pied-piping and the loss of V2 proposed in Section 4.



References
Adams, M. 1987. "From Old French to the theory of pro-drop". Natural Language and Linguistic Theory 5: 1–32.
Allen, C. 1995. Case Marking and Reanalysis: Grammatical Relations from Old to Early Modern English. Oxford: OUP.
Baker, M., Johnson, K. and Roberts, I. 1989. "Passive arguments raised". Linguistic Inquiry 20: 219–251 [this volume, Chapter 8].
Battistella, E. and Lobeck, A. 1988. "An ECP account of verb second in Old English". Proceedings of the Conference on the Theory and Practice of Historical Linguistics.
Berwick, R. 1985. The Acquisition of Syntactic Knowledge. Cambridge, Mass.: MIT Press.
den Besten, H. and Webelhuth, G. 1987. "Remnant topicalization and the constituent structure of VP in the Germanic SOV languages". Paper presented at GLOW X (Venice).
Biberauer, T. and Richards, M. 2006. "True optionality: when the grammar doesn't mind". In C. Boeckx (ed.), Minimalist Essays. Amsterdam: John Benjamins, pp. 35–67.
Biberauer, T. and Roberts, I. 2005. "Changing EPP parameters in the history of English: accounting for variation and change". English Language and Linguistics 9(1): 5–46.
Biberauer, T. and Roberts, I. 2006. "Loss of V-Aux orders and remnant fronting in Late Middle English". In J. Hartmann & L. Molnárfi (eds), Comparative Studies in Germanic Syntax. Amsterdam: John Benjamins, pp. 263–298.
Biberauer, T. and Roberts, I. 2010. "Subjects, Tense and Verb-movement in Germanic and Romance". In T. Biberauer, A. Holmberg, I. Roberts & M. Sheehan (eds), Parametric Variation: Null Subjects in Minimalist Theory. Cambridge: Cambridge University Press, pp. 263–302.
Bobaljik, J. and Wurmbrand, S. 2004. "The domain of Agreement". Natural Language and Linguistic Theory 23: 809–865.
Canale, M. 1978. Word order change in Old English: base reanalysis in Generative Grammar. PhD dissertation, McGill University.
Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1982. Some concepts and consequences of the theory of Government and Binding. Cambridge, Mass.: MIT Press.
Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, N. 2000. "Minimalist inquiries: the framework". In Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, R. Martin, D. Michaels & J. Uriagereka (eds), 89–156. Cambridge, Mass.: MIT Press.
Chomsky, N. 2001. "Derivation by phase". In Ken Hale: a life in language, Kenstowicz, M. (ed.), 1–52. Cambridge, Mass.: MIT Press.
Chomsky, N. 2004. "Beyond explanatory adequacy". In The cartography of syntactic structures, volume 3: structures and beyond, Belletti, A. (ed.), 104–131. Oxford: OUP.
Chomsky, N. 2008. "On Phases". In R. Freidin, C. Otero & M.-L. Zubizarreta (eds), Foundational Issues in Linguistic Theory. Cambridge, Mass.: MIT Press, pp. 133–166.
Clark, R. and Roberts, I. 1993. "A computational model of language learnability and language change". Linguistic Inquiry 24: 299–345 [this volume, Chapter 2].
Denison, D. 1985. "The origins of periphrastic do: Ellegård and Visser reconsidered". In Papers from the 4th International Conference on Historical Linguistics, Eaton, R. et al. (eds), 45–60. Amsterdam: John Benjamins.
Dryer, M. S. 2013. "Polar Questions". In Dryer, M. S. & Haspelmath, M. (eds), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wals.info/chapter/116, accessed on 2018-06-16.)

Evers, A. 1975. The transformational cycle in Dutch and German. PhD dissertation, University of Utrecht.
Fischer, O., van Kemenade, A., Koopman, W. and van der Wurff, W. 2000. The syntax of early English. Cambridge: CUP.
Fuss, E. 1998. Zur Diachronie von Verbzweit. MA dissertation, University of Frankfurt.
Fuss, E. 2005. The Rise of Agreement: A Formal Approach to the Syntax and Grammaticalization of Verbal Inflection. PhD dissertation, University of Frankfurt.
Fuss, E. and Trips, C. 2002. "Variation and change in Old and Middle English: on the validity of the Double Base Hypothesis". Journal of Comparative Germanic Linguistics 4: 171–224.
Haeberli, E. 1999. Features, categories and the syntax of A-positions: synchronic and diachronic variation in the Germanic languages. PhD dissertation, University of Geneva.
Haeberli, E. 2002. Features, categories and the syntax of A-positions: cross-linguistic variation in the Germanic languages. Dordrecht: Kluwer.
Haegeman, L. and van Riemsdijk, H. 1986. "Verb projection raising, scope and the typology of rules affecting verbs". Linguistic Inquiry 17: 417–466.
Holmberg, A. 2000. "Scandinavian Stylistic Fronting: how any category can become an expletive". Linguistic Inquiry 31(3): 445–483.
Hróarsdóttir, Th. 2000. Word order change in Icelandic: from OV to VO. Amsterdam: John Benjamins.
Jespersen, O. 1909–1949. A Modern English Grammar on Historical Principles I–VII. London/Copenhagen: Allen and Unwin.
Johnson, K. 1991. "Object positions". Natural Language and Linguistic Theory 9: 577–636.
Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Keenan, E. 1996. "Creating Anaphors: An Historical Study of the English Reflexive Pronouns". Unpublished ms., UCLA.
van Kemenade, A. 1987. Syntactic Case and Morphological Case in the history of English. Dordrecht: Foris.
van Kemenade, A. 1997. "V2 and embedded topicalization in Old and Middle English". In van Kemenade, A. and Vincent, N. (eds), Parameters of morphosyntactic change, 326–352. Cambridge: CUP.
Koopman, W. 1990. Word Order in Old English. PhD dissertation, University of Amsterdam.
Koopman, W. 1994. "The order of dative and accusative objects in Old English". Unpublished ms., University of Amsterdam.
Koopman, W. and van der Wurff, W. 2000. "Two word order patterns in the history of English: stability, variation and change". In Stability, Variation and Change of Word-Order Patterns over Time, Sornicola, R., Poppe, E. and Shisha-Halevy, A. (eds), 259–283. Amsterdam: John Benjamins.
Koster, J. 1975. "Dutch as an SOV language". Linguistic Analysis 1: 111–136.
Kroch, A. 1989. "Reflexes of grammar in patterns of language change". Language Variation and Change 1: 199–244.
Kroch, A. and Taylor, A. 1997. "Verb movement in Old and Middle English: dialect variation and language contact". In Parameters of morphosyntactic change, van Kemenade, A. and Vincent, N. (eds), 297–325. Cambridge: CUP.
Kroch, A. and Taylor, A. 2000. "Verb-object order in Middle English". In Diachronic syntax: models and mechanisms, Pintzuk, S., Tsoulas, G. and Warner, A. (eds), 132–163. Oxford: OUP.
Lightfoot, D. 1979. Principles of diachronic syntax. Cambridge: CUP.
Lightfoot, D. 1991. How to set parameters: arguments from language change. Cambridge, Mass.: MIT Press.

Lightfoot, D. 1999. The Development of Language: Acquisition, Change, and Evolution. Cambridge: CUP.
Longobardi, G. 2001. "Formal syntax, diachronic minimalism, and etymology: the history of French chez". Linguistic Inquiry 32(2): 275–302.
Maling, J. 1990. "Inversion in embedded clauses in Modern Icelandic". In Modern Icelandic Syntax, Maling, J. and Zaenen, A. (eds), 71–91. San Diego: Academic Press.
Manzini, R. and Wexler, K. 1987. "Parameters, Binding Theory, and Learnability". Linguistic Inquiry 18(3): 413–444.
Marantz, A. 1997. "No Escape from Syntax". University of Pennsylvania Working Papers in Linguistics: 201–225.
Müller, G. 2004. "Argument encoding and the order of elementary operations". Unpublished ms., University of Leipzig.
Pesetsky, D. and Torrego, E. 2001. "T-to-C movement: causes and consequences". In Ken Hale: a life in language, Kenstowicz, M. (ed.), 355–426. Cambridge, Mass.: MIT Press.
Pintzuk, S. 1991/1999. Phrase structures in competition: variation and change in Old English word order. New York: Garland.
Pintzuk, S. 2002. "Verb-Object order in Old English: variation as grammatical competition". In Syntactic effects of morphological change, Lightfoot, D. (ed.), 276–299. Oxford: OUP.
Pintzuk, S. and Kroch, A. 1989. "The rightward movement of complements and adjuncts in the Old English of Beowulf". Language Variation and Change 1: 115–143.
Platzack, C. 1995. "The loss of verb second in English and French". In Clause structure and language change, Battye, A. and Roberts, I. (eds), 200–226. New York: OUP.
Reinhart, T. 1995. "Interface Strategies". OTS Working Papers in Linguistics. Utrecht: University of Utrecht Publishers.
Richards, M. and Biberauer, T. 2005. "Explaining Expl". In The function of function words and functional categories, den Dikken, M. and Tortora, C. (eds), 115–154. Amsterdam: John Benjamins.
Roberts, I. 1985. "Agreement Parameters and the Development of the English Modal Auxiliaries". Natural Language and Linguistic Theory 3(1): 21–58 [this volume, Chapter 1].
Roberts, I. 1993. Verbs and diachronic syntax. Dordrecht: Kluwer.
Roberts, I. 1995. "Object movement and verb movement in Early Modern English". In Studies in Comparative Germanic Syntax, Haider, H., Olsen, S. and Vikner, S. (eds), 269–284. Dordrecht: Kluwer [this volume, Chapter 3].
Roberts, I. 1996. "Remarks on the Old English C-system and the Diachrony of V2". Linguistische Berichte: 154–167.
Roberts, I. 1997. "Directionality and word order change in the history of English". In Parameters of morphosyntactic change, van Kemenade, A. and Vincent, N. (eds), 397–426. Cambridge: CUP [this volume, Chapter 4].
Roberts, I. 1999. "Verb movement and markedness". In Language Change and Language Creation: Creolization, Diachrony and Development, DeGraff, M. (ed.), 287–327. Cambridge, Mass.: MIT Press [this volume, Chapter 5].
Roberts, I. and Roussou, A. 2003. Syntactic Change: A Minimalist Approach to Grammaticalization. Cambridge: CUP.
Rögnvaldsson, E. 1996. "Word order variation in the VP in Old Icelandic". Working Papers in Scandinavian Syntax 58: 55–86.
Rohrbacher, B. 1994. The Germanic languages and the full paradigm: a theory of V-to-I raising. PhD dissertation, University of Massachusetts, Amherst.
Rohrbacher, B. 1999. Morphology-driven syntax. Amsterdam: John Benjamins.

Sapir, E. 1921. Language: An Introduction to the Study of Speech. New York: Harcourt Brace.
Spasov, D. 1966. English phrasal verbs. Sofia: Naouka Izkoustvo.
Spencer, A. 1991. Introduction to Morphological Theory. Oxford: Blackwell.
Trips, C. 2002. From OV to VO in Early Middle English. Amsterdam: John Benjamins.
Vikner, S. 1997. "V-to-I movement and inflection for person in all tenses". In The New Comparative Syntax, Haegeman, L. (ed.), 189–213. London: Longman.
Warner, A. 1997. "The structure of Parametric Change and V-movement in the history of English". In Parameters of morphosyntactic change, van Kemenade, A. and Vincent, N. (eds), 380–393. Cambridge: CUP.
Willis, D. 1998. Syntactic change in Welsh: a study of the loss of verb-second. Oxford: OUP.
van der Wurff, W. 1997. "Deriving object-verb order in late Middle English". Journal of Linguistics 33: 485–509.
van der Wurff, W. 1999. "Objects and verbs in Modern Icelandic and fifteenth-century English: a word order parallel and its causes". Lingua 109: 237–265.
Zwicky, A. and Pullum, G. 1983. "Cliticization vs. inflection: English n't". Language 59(3): 502–513.

Part II

Comparative Syntax


Passive Arguments Raised
Mark Baker, Kyle Johnson and Ian Roberts

The purpose of this article is to develop and motivate a theory of passive constructions whose central claim is (1):
(1) The passive morpheme (-en) is an argument.
In pursuing (1), we elaborate the approach to passives that originated in Jaeggli (1986), and our theory owes much to Jaeggli's original insights. We will try to show that the "passive construction" reduces to (1). On this view, the various (and cross-linguistically varied) properties of passives result from an interplay of autonomous forces acting in tandem with (1). Most prominent among these forces are the well-formedness conditions on arguments: the θ-Criterion with the related Visibility Condition, the Projection Principle, and conditions on binding. The article is organized as follows. In section 1 we present a brief description of our theory and discuss its major strengths and some apparent problems. In section 2 we discuss the evidence from binding theory suggesting that implicit arguments of passives are "syntactically active," in that they show coreference properties typical of syntactically realized arguments. We concentrate here on the evidence that the argument -en gives rise to strong crossover effects, drawing on observations due to Postal (1971). The properties of the argument -en with respect to θ-theory and Case theory are the topics of sections 3 and 4, respectively. Here we derive the range of cross-linguistic variation from a small number of minimal assumptions about cross-linguistic differences in the features of the passive morpheme. Two major points emerge:

(i) In certain languages -en appears to undergo NP Movement.
(ii) The generalization underlying the 1-Advancement Exclusiveness Law of Relational Grammar (see Perlmutter and Postal (1984)) states essentially that only verbs with logical-subject arguments can be passivized. This generalization can be derived from the θ-Criterion in the cases where it truly holds; moreover, the properties of languages where it does not hold can be neatly accounted for.

Finally, in section 5 we take up the issue of the structural representation of argument -en at various levels of the derivation and the interaction of this morpheme with other aspects of the English verb-auxiliary system. The discussion here throws light on the nature of this system as a whole. The overall result of our investigation is of both empirical and conceptual interest for the formulation of a theory of grammatical-function (GF) changing processes in natural language. The empirical interest lies in the breadth and depth of our coverage of a central GF-changing process, the passive construction. The conceptual interest lies in the fact that we account for this construction in terms of the notion "argumental affix": a piece of morphology that is subject to the well-formedness conditions that apply to arguments. The natural continuation of this work would be to extend this notion to other GF-changing processes.

1.  The Theory Outlined

We propose that -en, the passive argument, is base-generated under Infl and therefore that the D-Structure representation of a passive clause has the general form shown in (2):

(2) [S (or IP) e [I′ [I -en] [VP V NP]]]
We leave until section 5 the question of the representation of passive clauses involving sequences of auxiliaries—including be—as, for example, in They must have been being arrested. It is clear that the representation in (2) requires some elaboration in order to account for these auxiliary sequences, but we will abstract away from this matter in the early sections of the article, focusing instead on the properties of argument -en. If -en is an argument, then it must be in a θ-marked position at D-Structure (Chomsky (1981, chap. 2)). This requirement entails that Infl is a θ-marked position in (2). Hence, -en can only receive a θ-role assigned outside VP, namely, the logical-subject θ-role. This is a straightforward consequence of the representation in (2). It immediately explains four salient properties of passives:

(3) a. The fact that the logical-subject argument is not realized on an NP in passives
    b. The phenomenon of “implicit arguments” in passives
    c. The fact that the subject position is nonthematic in passives, permitting NP Movement into this position
    d. The 1-Advancement Exclusiveness Law (1AEX)

Passive Arguments Raised  263

We will discuss (3a–c) at this point but will defer discussion of the 1AEX until section 3. Concerning (3a), it is clear that the logical-subject argument is realized in some marked way in passives, whether another argument occupies the surface subject position or not. Moreover, as pointed out in Marantz (1984), this is independent of the actual θ-role associated with the logical subject. Marantz illustrates this point with the following examples:

(4) a. Hortense was pushed by Elmer. (AGENT)
    b. Elmer was seen by everyone who entered. (EXPERIENCER)
    c. The intersection was approached by five cars at once. (THEME)
    d. The porcupine crate was received by Elmer’s firm. (GOAL)
    e. The house is surrounded by trees. (LOCATION?)
    (Marantz (1984, 129))

For us, this follows from the simple fact that the subject θ-role is realized on an argument in Infl, instead of one in the subject position. We turn to the question of the representation of by-phrases below. A second point that arises in this connection is that -en, like any subject argument, receives a compositional θ-role from VP. This is illustrated by the sentences in (5)–(6), where the type of activity performed by the agent varies as a function of the D-Structure complements of the verb:

(5) a. A baseball was thrown by Fernando.
    b. Support was thrown behind the candidate by the CIA.
    c. The match was thrown by the prizefighter.
    d. The party was thrown by the department.

(6) a. A book was taken from the shelf by John.
    b. The bus was taken to New York by Mary.
    c. A nap was taken by the professor in his office.
    (Roberts (1985, 55))

Such examples show clearly that if -en is an argument, then it is an external argument, inasmuch as it is assigned a θ-role by the entire D-Structure VP; the implications of this claim will be taken up in greater detail in section 3.

Turning now to point (3b), we find further motivation for a D-Structure representation of the kind in (2) from the behavior of subject-oriented modifiers (essentially a particular class of adverbs) and the type of infinitival adjunct known as rationale clauses (henceforth RatCs). Here a comparison of passives and middles is instructive; in passives, but not in middles, the understood subject of the RatC may be “controlled,” and subject-oriented adverbs may find an argument to modify (see Manzini (1980), Roeper (1983)).

(7) a. This bureaucrat was bribed [PRO to avoid the draft].
    b. *This bureaucrat bribes easily to avoid the draft.

(8) a. This bureaucrat was bribed deliberately.
    b. *This bureaucrat bribes deliberately.

Under the hypothesis that RatCs and subject-oriented adverbs require the syntactic presence of an argument, the contrasts in (7)–(8) are accounted for by our theory. Only in (7a) and (8a) is an argument present that satisfies these requirements; that argument is -en.

Point (3c) follows from the fact that VP is only capable of assigning one θ-role outside itself. Because of this, the subject position in (2) is not assigned a θ-role and therefore cannot be occupied by an argument at D-Structure. Thus, the subject position of passives is a possible landing site for NP Movement, and, where NP Movement does not apply, it is occupied by an expletive. So far nothing in our account predicts the obligatory movement of the object NP in (2) to the S-Structure subject position. To see how we account for this, consider the S-Structure representation derived from (2):

(9) [S (or IP) NPi [I′ [I e] [VP [V V + en] ti]]]
(9) differs from (2) in two respects: (i) the application of NP Movement placing the D-Structure object in subject position; (ii) the downgrading of -en from I to V. The exact nature of the operation in (ii) will be discussed in section 5; for the moment we take it as primitive. Given this, we can derive the requirement for NP Movement. The principle behind (i) is the Visibility Condition, proposed in Chomsky (1981, chap. 6). This condition is an adjunct to the θ-Criterion, in that it requires all θ-marked arguments to be “visible” for θ-role assignment at LF, meaning that they must receive abstract Case by S-Structure. This requirement applies, then, to the two arguments of the passivized verb in (2) and (9): -en and the object NP. Assuming that -en downgrades for independent reasons, only the verb can assign Case to -en in (9), since it is the only Case assigner that governs -en. Because the verb must assign Case to -en, it is no longer able to Case-mark NP. As a result, NP must move into the Case-marked subject position. Thus, we account for the “Case-absorption” property of passives (although we consider the term “Case absorption” to be a misnomer).

Note that our account derives the effects of “Burzio’s Generalization” for passives. Burzio’s Generalization states the well-known correlation between the failure of assignment of a θ-role to subject position and the failure of assignment of structural Case to object position. In our theory, these two “assignment failures” are both instances of assignment to morphology. Since Burzio’s Generalization holds for a range of other constructions (middles, ergatives, and so on) that do not involve morphological alterations of the verb, it could be objected that something is being missed here. However, we take the position that Burzio’s Generalization is the statement of a problem that needs to be solved. Like many problems, it is not a priori obvious that this one has a unitary solution. We therefore consider it an advance that the problem has been solved for passives.

The above paragraphs outline what we consider to be the derivation of a “short” passive sentence. What about “long” passives, that is, those containing by-phrases? We have treated -en as an argument that affixes to the verb. We regard its essential properties to be like those of clitics. Thus, like other clitics, -en forms a chain with a full NP (see, for example, Jaeggli (1982), Borer (1984)). The NP that forms the coda of the chain may be overtly realized as a by-phrase, giving rise to “long passives.” In this case the situation resembles clitic-doubling.1 If the NP is not overt, a “short passive” is formed. The existence of a clitic chain in passives implies that -en has a referential index. Further, in short passives, our claim is that there is an empty category linked to the argument -en. In the next section we will find evidence for both these claims.

We take a rather nonstandard view of clitics in certain respects. First, we consider that English has at least one clitic, -en, and, second, we allow for the possibility of morphologically “deep” clitics, which trigger quite complex morphophonological suppletions and so on. Both of these differences amount to the same point: we are taking the syntactic properties of well-known cases of clitics (essentially the Romance clitics) as basic, and as completely independent of the morphophonological property of enclisis or proclisis. Thus, for the purposes of syntax, a clitic is an argument category realized as adjoined to a head. In this respect, -en is a clitic in good standing. The fact that -en has a morphophonological realization that is rather different from that of Romance clitics is the reflex, we take it, of completely independent properties of English phonology. In other words, in asserting that -en is a clitic we are assuming (i) a syntactic definition of clitic like the one above and (ii) the independence of phonology and syntax. Notice moreover that there are elements that are clitics phonologically but not, apparently, syntactically (see the discussion of French subject clitics in Rizzi (1986)). We propose that -en is syntactically a clitic but phonologically an affix.

In this section we have outlined the general form of a passive derivation in our theory and commented on a number of matters that arise. We now turn to a more detailed discussion of various aspects of our account.
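The Case-absorption argument of this section can be restated as a small matching problem: every argument, -en included, must receive Case from an assigner that governs it, and each assigner discharges a single structural Case. The Python sketch below is a toy model under exactly those assumptions; the function `case_assignment` and the government sets standing in for (2) and (9) are our own hypothetical encodings, not part of the theory's formal apparatus.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set


def case_assignment(governs: Dict[str, Set[str]],
                    arguments: List[str]) -> Optional[Dict[str, str]]:
    """Toy Visibility check: map each argument to the Case assigner that
    marks it, or return None if some argument must remain Caseless.

    Hypothetical encoding: `governs` maps each Case assigner to the set of
    arguments it governs; each assigner marks at most one argument, and
    only an argument it governs.
    """
    assigners = list(governs)
    # Try every way of giving each argument a distinct assigner.
    for chosen in permutations(assigners, len(arguments)):
        if all(arg in governs[a] for arg, a in zip(arguments, chosen)):
            return dict(zip(arguments, chosen))
    return None


# Before NP Movement: -en has been downgraded onto V, the object is still in
# object position, and Infl governs only the (empty) subject position.
before = {"V": {"-en", "they"}, "Infl": set()}
# After NP Movement: the object sits in subject position, governed by Infl.
after = {"V": {"-en"}, "Infl": {"they"}}

print(case_assignment(before, ["-en", "they"]))  # None: 'they' is not visible
print(case_assignment(after, ["-en", "they"]))   # {'-en': 'V', 'they': 'Infl'}
```

In this toy model the verb's Case is never deleted; it simply has to go to -en, which no other assigner governs, so the object can only become visible by moving to the subject position.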

2.  Binding Theory and Passive Arguments

Here we argue for (1), and more narrowly for the claim that -en forms a chain with an NP, by showing how this chain enters into various types of binding relations. We also explain why other kinds of binding relations do not arise.

2.1  Strong Crossover in Passives

The central fact showing that the passive argument is syntactically active is the existence of strong crossover (SC) effects in passives. Thus, short passives cannot be interpreted so that the understood subject is coreferential with the S-Structure subject. In other words, (10) cannot mean (11):

(10) a. They were killed.
     b. They were admired.

(11) a. They committed suicide.
     b. They admired themselves.

This effect cannot be attributed to a semantic constraint, since there is nothing wrong with the sentences in (11). Moreover, we cannot say that the impossibility of coreference is a pragmatic effect due to the structural absence of the passive argument. This would entail that “coreference” with structurally absent arguments is impossible, whereas in fact “coreference” possibilities with such arguments are maximally free. In this respect, consider the following contrast:

(12) a. John shaves (easily).
     b. John was shaved.

Whether (12a) is taken as a middle (with John as Theme, meaning ‘it is easy to shave John’) or as an intransitive, the Agent and Theme arguments of shave can be taken to be coreferential (there is in fact a preference for this in the agentive intransitive reading). On the other hand, (12b) does not allow this; it cannot mean ‘John was shaved by John’. Our theory of passives can explain this fact. An example like (10a) has the following representation:

(13) *Theyi were kill + eni ti IMPi.

We use “IMP” to represent the coda of the chain headed by -en. Compare (13) with standard cases of SC, for example, those that involve Wh Movement:

(14) *Whoi does hei love ti?

(14) is usually taken to be ruled out by Principle C of the binding theory (see Chomsky (1981)):

(15) An R-expression must be A-free.

Since R-expressions include variables (wh-traces), (15) prevents these elements from being A-bound. Since the trace of who in (14) is A-bound by he, the example is ruled out. We could explain the ungrammaticality of (13) in the same terms if we make one assumption about IMP:

(16) IMP is a variable.

IMP is A-bound by the subject in (13), in violation of Principle C, hence the impossibility of coreference between “understood” underlying subjects and S-Structure subjects in passives.2 We will develop a different account, however, for three reasons: (i) if the empty category linked to -en is a variable, then it is unclear what the operator that binds this variable is; (ii) we have no way to tell whether the empty category is in fact a variable or a pronoun, since everything we have seen up to now is compatible with either assumption; (iii) the ungrammaticality of the examples in (17) shows that more is going on:

(17) a. *They were killed by themselves.
     b. *Theyi were kill + eni ti by themselvesi.
     (See Lees and Klima (1963, 21), Postal (1971, chap. 1).)

Here the NP linked to -en is manifestly an anaphor, rather than a variable. We can account for the ungrammaticality of this example using the following conditions and definitions:

(18) a. Chains: $C = (x_1, \ldots, x_n)$ is a chain iff, for $1 \le i < n$, $x_i$ locally binds $x_{i+1}$ (Rizzi (1986, (2))).
     b. Local binding: X locally binds Y iff X binds Y and there is no Z that binds Y but not X.
     c. Binds: X binds Y iff X c-commands and is coindexed with Y.

(19) For each well-formed structure there exists a set of chains S, such that:
     a. Each argument appears in a unique chain of S.
     b. Each chain of S contains a unique visible θ-position P and a unique argument.
     c. Each θ-position P is visible in a chain of S.

(20) a. A θ-chain is an element of the set S in (19).
     b. The Projection Principle requires arguments to appear in a θ-chain at every level.

(Notice that we define “chain” independently of “θ-chain.” This creates a number of formal objects that may be regarded as chains without being relevant for the θ-Criterion; the point is that the θ-Criterion and the Projection Principle together force a one-to-one relation between arguments and well-formed θ-chains.)
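The definitions in (18) are explicit enough to be executed directly. The Python sketch below encodes (18a–c) over a toy structure in which c-command is simply listed as a set of pairs rather than computed from a tree; the `Node` and `Structure` classes are hypothetical illustrations, not part of the analysis. Applied to a (13)-like configuration in which they, -en, and the trace are coindexed, it shows that they cannot form a chain with the trace while skipping -en.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Node:
    label: str   # e.g. "they", "-en", "t" (labels assumed unique)
    index: int   # referential index


class Structure:
    """A toy structure: nodes plus an explicitly listed c-command relation."""

    def __init__(self, nodes: List[Node],
                 c_command: FrozenSet[Tuple[str, str]]) -> None:
        self.nodes = nodes
        self.c_command = c_command  # (commander label, commanded label) pairs

    def binds(self, x: Node, y: Node) -> bool:
        # (18c): X binds Y iff X c-commands Y and is coindexed with Y.
        return (x.label, y.label) in self.c_command and x.index == y.index

    def locally_binds(self, x: Node, y: Node) -> bool:
        # (18b): X binds Y, and no other Z binds Y without also binding X.
        if not self.binds(x, y):
            return False
        return not any(self.binds(z, y) and not self.binds(z, x)
                       for z in self.nodes if z not in (x, y))

    def is_chain(self, seq: Sequence[Node]) -> bool:
        # (18a): each member locally binds the next one.
        return all(self.locally_binds(a, b) for a, b in zip(seq, seq[1:]))


they, en, t = Node("they", 1), Node("-en", 1), Node("t", 1)
s = Structure([they, en, t],
              frozenset({("they", "-en"), ("they", "t"), ("-en", "t")}))
print(s.is_chain([they, t]))      # False: -en binds t but does not bind 'they'
print(s.is_chain([they, en, t]))  # True
```

Changing the index of -en (as happens with the reciprocal structure discussed later in this section) removes it as an intervener, and the two-membered chain becomes well formed.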

The crucial result of these conditions is that no structure of the following kind can exist:

(21) Xi Yi ti (where X c-commands Y and Y c-commands t and there is movement from t to X)

In this construction Xi must be in a non-θ-position (by (19a)). Thus, by itself, it is not a θ-chain (by (19b)). (20b) then implies that it must be in a nonsingleton chain with a θ-position that does not itself contain an argument, namely, the position of ti. However, if Xi and ti are both members of a chain, then Yi must be too, by (18a). If Yi is an argument, then this chain cannot be a θ-chain, since (19b) is violated. Thus, there is no θ-chain containing Xi in structures of the form (21), and they are ruled out. The essence of this account is taken from Rizzi (1986), who uses it to explain Italian examples like (22):

(22) *Giannii sii è stato affidato ti t’i.
      Gianni self(dat) was been entrusted
      ‘Gianni was entrusted to himself.’

Lasnik (1985) also notices certain “loopholes” in Principle A of the binding theory that can be plugged this way. Consider (23):

(23) *Johni is believed that hei likes ti.

Both (22) and (23) are instances of the schema in (21). Moreover, so are (13) and (14). Hence, a chain-formation approach gives us a unified explanation of SC in passives (although this explanation may be redundant with Principle C in (13)). Two problems arise in connection with this idea. First, the examples with overt reflexives are not as bad as we might expect, and some of them are not very bad at all:

(24) ?I am amused/impressed/worried by myself.

However, it appears that all of the relatively good examples involve psych-verbs. Belletti and Rizzi (1988) argue that psych-verbs are essentially ditransitive unaccusatives and therefore cannot form syntactic passives.3 Hence, the apparent passives in (24) must be adjectival and thereby lack an argument -en. If so, no crossover violation arises. This can be tested by trying examples that cannot be adjectival passives. Our prediction is that reflexive by-phrases in such cases will be very bad:

(25) a. *John is considered a genius by himself.
     b. *John was sent a book by himself.
     c. *John was believed to have left by himself.

These examples are far worse than (24), and are also worse than (17) when it receives a stative reading. This can be attributed to the possibility of an adjectival interpretation of (17). The second problem is that reflexives and reciprocals contrast with respect to SC effects:

(26) a. They were seen by each other.
     b. *They were seen by themselves.

To account for this, we propose the following structure for reciprocals, following Lebeaux (1983):

(27) [NP each [N other]]
Each is a bare quantifier. As such, it undergoes Quantifier Raising (QR) in the mapping to LF. Moreover, Jaeggli (1982) shows that bare quantifiers require local antecedents, in much the same way that anaphors do. Other is a disjunctive pronoun that requires contraindexation with its specifier. As head of the NP, other gives its index to the whole NP. The result is that we find structures like the following in (26a), which are not ruled out by the θ-Criterion:

(28) Theyi eachi were see + enj ti by [ti otherj]j.

This account of SC effects in passives carries over to other examples, noticed by Postal (1971), of the configuration in (21). Note that each time the reflexive/reciprocal contrast holds up:

(29) Raising
     a. Theyi seem to each otherj [ti to like John].
     b. *Theyi seem to themselvesi [ti to like John].

(30) Psych
     a. Theyi strike each otherj [ti as smart].
     b. *Theyi strike themselvesi [ti as smart].

(31) Tough
     a. Theyi are tough for each otherj [to like ti].
     b. *Theyi are tough for themselvesi [to like ti].

Thus, we are assuming that each is not an argument and therefore does not interfere with the formation of the chain between they and ti here.
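To see how (19b) separates the reflexive and reciprocal cases, here is a toy Python check. It assumes that the elements on the relevant path are listed top-down, each c-commanding every later one, so that by (18) a chain between the first and the last must include every coindexed intervener. The names `Elem`, `forced_chain`, and `is_theta_chain` are hypothetical, visibility in (19b) is not modeled, and -en's own chain in (28) is left aside.

```python
from typing import List, NamedTuple


class Elem(NamedTuple):
    label: str
    index: str            # referential index, e.g. "i" or "j"
    argument: bool        # is the element an argument?
    theta_position: bool  # does it occupy a θ-marked position?


def forced_chain(path: List[Elem]) -> List[Elem]:
    """Chain linking path[0] to path[-1], where each element c-commands every
    later one. Under (18) such a chain must contain every intervening element
    that shares their index; returns [] if head and tail are not coindexed."""
    head, tail = path[0], path[-1]
    if head.index != tail.index:
        return []
    return [e for e in path if e.index == head.index]


def is_theta_chain(chain: List[Elem]) -> bool:
    """(19b), ignoring visibility: exactly one θ-position and one argument."""
    return (len(chain) > 0
            and sum(e.argument for e in chain) == 1
            and sum(e.theta_position for e in chain) == 1)


# (13)/(17b): they_i ... -en_i ... t_i, where -en is an argument in a θ-position.
crossover = [Elem("they", "i", True, False), Elem("-en", "i", True, True),
             Elem("t", "i", False, True)]
# (28): they_i each_i ... -en_j ... t_i, where 'each' is not an argument.
reciprocal = [Elem("they", "i", True, False), Elem("each", "i", False, False),
              Elem("-en", "j", True, True), Elem("t", "i", False, True)]

print(is_theta_chain(forced_chain(crossover)))   # False
print(is_theta_chain(forced_chain(reciprocal)))  # True
```

The whole contrast rests on two data choices that mirror the text: -en carries the index j in (28), and each is flagged as a non-argument.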

In this way, we account straightforwardly for the properties of (13) and (17) and, by adopting fairly minimal assumptions, explain the reflexive/reciprocal contrast in by-phrases. The central role played by the passive argument in the account of these data is an argument for the correctness of a theory such as ours.

2.2  Passives and Arbitrary Reference

Here we will briefly address the question of the referential properties of the passive argument when no by-phrase is present. We will suggest that this argument has properties similar to arbitrary PRO. This idea on the one hand supports the hypothesis that the argument is a kind of clitic, since clitics generally have pronominal interpretation, and on the other hand allows an account of why the passive argument appears to remain “syntactically inert” in certain cases.

Our suggestion, then, is that the [-en, IMP] argument has semantic properties akin to arbitrary PRO. This approach has a certain initial plausibility; passive sentences like John was killed/It is widely believed that . . . are naturally paraphrased as Someone or other killed John/People believe that. . . . Although we cannot provide a semantics for the passive argument here, this observation is sufficient to highlight the intuitive similarity between passives and arbitrary PRO. The advantage of the suggestion just made is that the binding contrast in (32) is now seen to parallel that in (33):

(32) a. ?*This privilege was kept to themselves.
     b. Such privileges should be kept to oneself.

(33) a. ?*To shave themselves is fun.
     b. To shave oneself can be fun.

We account for this by invoking a feature-agreement requirement on anaphor binding (ruling out cases of the type John likes themselves) and a requirement that arbitrary pronouns lack features. Therefore, arbitrary pronouns are only able to bind other arbitrary pronouns, as the above examples show. This account carries over to the following cases involving pronouns:4

(34) a. *Hisi mother was see + eni.
     b. This is the kind of show that onei’s children shouldn’t be take + eni to.

(35) a. *PROi to abandon hisi children is irresponsible.
     b. PROi to abandon onei’s children is irresponsible.

Again, we can say that the nonarbitrary pronoun and -en fail to agree in features, and coreference is therefore not possible in the (a) examples. On the other hand, the arbitrary pronoun imposes no feature-agreement requirement, and the (b) examples therefore do allow coreference. On this view, examples like (34a) and (35a) are not evidence against a structurally present passive argument; properly understood, the contrasts here are in fact evidence for the presence of such an argument.

Despite the basic similarities, there is a notable difference between the passive argument and arbitrary PRO: arbitrary PRO can be 1st person, whereas the passive argument must be 3rd person. This is shown by the fact that arbitrary PRO can bind a 1st person plural anaphor, whereas the passive argument cannot:

(36) a. PRO to shave ourselves is fun.
     b. *Love letters were written to ourselves.

The passive argument also contrasts with the otherwise very similar Italian si morpheme in this respect:

(37) Si invia spesso lettere a noi stessi.
     SI sends often letters to us selves
     ‘Letters are often sent to ourselves.’

We do not have an account for this “non-1st person” restriction on the passive argument, so we simply stipulate the constraint here. Whatever really underlies this constraint, it clearly explains the relative syntactic inertness of the passive argument, while allowing us to continue to assume the existence of such an argument.
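The feature-agreement account of (32)–(35) can be stated as a single predicate. The Python sketch below assumes a hypothetical encoding in which a featured element carries a (person, number) pair and an arbitrary element (arbitrary PRO, one, oneself, the passive argument) carries no features at all, so that it can bind, and be bound by, only another arbitrary element. It deliberately leaves out the option, noted for (36a), of arbitrary PRO being 1st person.

```python
from typing import Optional, Tuple

# Hypothetical encoding: (person, number), or None for an arbitrary,
# featureless element.
Features = Optional[Tuple[str, str]]


def can_bind(antecedent: Features, bound: Features) -> bool:
    """Feature agreement on binding: featured elements must match, and an
    arbitrary (featureless) element binds, and is bound by, only another
    arbitrary element."""
    if antecedent is None or bound is None:
        return antecedent is None and bound is None
    return antecedent == bound


PASSIVE_ARG = None        # [-en, IMP], treated like arbitrary PRO
ONESELF = None            # arbitrary anaphor
THEMSELVES = ("3", "pl")
HIS = ("3", "sg")

print(can_bind(PASSIVE_ARG, THEMSELVES))  # False, cf. (32a)
print(can_bind(PASSIVE_ARG, ONESELF))     # True, cf. (32b)
print(can_bind(PASSIVE_ARG, HIS))         # False, cf. (34a)
```

The same predicate also rules out the non-passive case John likes themselves, since ("3", "sg") and ("3", "pl") do not match.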

3.  θ-Role Assignment and the 1AEX

In the preceding section we investigated some of the implicit argument properties of passives under the theory sketched in section 1 (see (3b)). Next we turn to property (3d): the fact that passives in many languages obey the 1-Advancement Exclusiveness Law of Relational Grammar (Perlmutter (1978), Perlmutter and Postal (1984)). This law stipulates roughly that only one argument can acquire subject status in the derivation of any given clause. Thus, verbs that have derived subjects cannot be passivized. This is illustrated in (38)–(40) for unaccusative verbs in Dutch (see Perlmutter (1978), Burzio (1986)), for raising verbs, and for passive verbs:

(38) a. In dit weeshuis groeien de kinderen erg snel.
        in this orphanage grow the children very fast
        ‘In this orphanage the children grow very fast.’
     b. John seemed to have left.
     c. The vase was broken by John.

(39) a. *In dit weeshuis wordt er door de kinderen erg snel gegroeid.
        in this orphanage is it by the children very fast grown
     b. *It was seemed to have left (by John).
     c. *It was been broken (by the vase) by John.

(40) a. *In dit weeshuis werden de kinderen erg snel gegroeid.
        in this orphanage are the children very fast grown
     b. *John was seemed to have left.
     c. *The vase was been broken by John.

(38) gives elementary examples, and (39) the potential results of forming (impersonal) passives of them (see Perlmutter and Postal (1984)).5 (40), on the other hand, gives the results of simply adding “redundant” passive morphology to the verbs, which already involve derived subject constructions (see Marantz (1984)). Both types of construction are impossible in languages like Dutch and English.

Consider first examples like those in (40). In some Government-Binding accounts, these structures are the hardest to block, requiring auxiliary assumptions such as Marantz’s (1984) No Vacuous Affixation Principle, which bars the attachment of “superfluous” morphemes. The fundamental assumption that the passive morpheme is an argument makes an important advance in deriving this aspect of the 1AEX, however. Since the passive morpheme is an argument, it must receive a θ-role, by the θ-Criterion. Verbs like those in (40) have no extra θ-role to assign; therefore, the resulting structures are ungrammatical for the same fundamental reason that those in (41) are:

(41) a. #John grew children very fast in the orphanage.
     b. *Mary seemed that John had left.
     c. *Peter was broken a vase by John.

In each instance there is one more argument than θ-role (the -en in (40); the subject NP in (41)), making the sentence impossible. In this respect, we essentially follow an insight of Jaeggli (1986).6 The sentences in (39) are more interesting for two reasons, one theoretical, the other empirical. Thus, we focus on them for the remainder of this section.
Theoretically, the sentences in (39) do not present the same problem as those in (40), because they exhibit a one-to-one correspondence between available θ-roles and arguments (counting -en, but not the expletive subject or the optional by-phrase). Nevertheless, their ungrammaticality can still be derived because we assume that, by virtue of their categorial features, passive morphemes are generated under the Infl node. As such, they always appear outside the VP at D-Structure. This structural position implies that they can receive an external θ-role from the verb, but not an internal one in the sense of Williams (1981). Now, in Government-Binding Theory, the θ-Criterion implies that a verb with a derived subject does not assign a θ-role to the subject position (Chomsky (1981)). Thus, the 1AEX can largely be restated as the fact that passive morphology cannot appear with a verb that does not assign an external θ-role. But this follows immediately from the θ-Criterion: -en is an argument, structurally it can only receive an external θ-role, and the relevant verbs have no such θ-role; hence, the structure is ungrammatical.
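Both halves of this argument, too many arguments in (40)/(41) and an argument in the wrong position in (39), reduce to whether arguments and θ-roles can be paired one-to-one. The Python sketch below assumes a hypothetical encoding in which each θ-role is tagged "external" or "internal" and each argument lists the kinds of role its D-Structure position lets it receive; following the text, expletives and by-phrases (which double -en) are not counted as separate arguments. The `ANY_POSITION` setting is our stand-in for the N-type passive morpheme proposed for Lithuanian later in this section.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

# (role name, "external" | "internal")
Role = Tuple[str, str]
# (argument label, kinds of role its position can receive)
Arg = Tuple[str, FrozenSet[str]]

EXTERNAL_ONLY = frozenset({"external"})             # English/Dutch -en under Infl
INTERNAL_ONLY = frozenset({"internal"})             # NP generated inside VP
ANY_POSITION = frozenset({"external", "internal"})  # hypothetical N-type morpheme


def satisfies_theta_criterion(roles: List[Role], args: List[Arg]) -> bool:
    """True iff arguments and θ-roles can be paired one-to-one so that each
    argument receives a role of a kind its position allows."""
    if len(roles) != len(args):
        return False  # an extra argument, as in (40)/(41), or a stray role
    return any(all(kind in allowed
                   for (_, allowed), (_, kind) in zip(args, order))
               for order in permutations(roles))


push = [("Agent", "external"), ("Theme", "internal")]
grow = [("Theme", "internal")]  # unaccusative: no external θ-role

print(satisfies_theta_criterion(push, [("-en", EXTERNAL_ONLY), ("Hortense", INTERNAL_ONLY)]))  # True
print(satisfies_theta_criterion(grow, [("-en", EXTERNAL_ONLY), ("children", INTERNAL_ONLY)]))  # False, cf. (40a)
print(satisfies_theta_criterion(grow, [("-en", EXTERNAL_ONLY)]))   # False, cf. (39a)
print(satisfies_theta_criterion(grow, [("-pass", ANY_POSITION)]))  # True, cf. Lithuanian (42)
```

The only difference between the (39a) and (42) cases is the position set of the passive morpheme, which is exactly where the text locates the cross-linguistic variation.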

This time a θ-role is in principle available to the passive morpheme, but the morpheme is in the wrong place to receive it. The fact that the passive observationally obeys the 1AEX thus follows completely from the θ-Criterion.

The second reason that paradigms like (39) are of interest is that they are in fact grammatical in some languages. Thus, a number of researchers have demonstrated that the 1AEX is not universal, as one might have suspected (see Timberlake (1982), Nerbonne (1982), Keenan and Timberlake (1985), Postal (1986)). The following examples illustrate this for Lithuanian:

(42) Kur mūs gimta, kur augta?
     where by-us bear/pass-n/sg where grow/pass-n/sg
     ‘Where were we born, where did we grow up?’ (lit. ‘Where by us was getting born, where getting grown up?’) (Timberlake (1982))

(43) Ar būta tenai langinių?
     and be/pass-n/sg there window-gen
     ‘Were there really windows there?’ (lit. ‘And had there been existed by windows there?’) (Timberlake (1982))

(44) Jo pasirodyta esant didvyrio.
     him-gen seem/pass-n/sg being hero
     ‘By him it was seemed to be a hero.’ (Keenan and Timberlake (1985))

(45) To lapelio būto vėjo nupūsto.
     that leaf-gen be/pass wind-gen blow/pass
     ‘By that leaf there was being blown down by the wind.’ (Timberlake (1982))

(42) and (43) are passives of unaccusative verbs; (44) is the passive of a raising verb; (45) is a double passive. Similar facts have been reported for Turkish (Özkaragöz (1980; 1982)), Sanskrit (Ostler (1979, chap. 5)), and Irish (Nerbonne (1982)). In fact, this cross-linguistic difference between English/Dutch on the one hand and Lithuanian/Turkish on the other can be given an elegant account on our analysis. One does not expect the θ-Criterion to vary from language to language; indeed, sentences like those in (40), which are pure θ-Criterion violations, are not found in any language.
The account of the ungrammaticality of (39) also relies on the assumption about the categorial features of -en, however; and the categories of corresponding items are known to vary to a certain extent from language to language. Thus, we can account for the Lithuanian/Turkish passives if we assume that the categorial properties of the passive morpheme in these languages are slightly different from those in English/Dutch. In particular, we can say that the passive morpheme in these languages is not an Infl, but rather an N that cliticizes to Infl. In normal personal or impersonal passives, this morpheme will be generated in the subject position. From there, it will affix to Infl. This process is technically an Incorporation in the sense of Baker (1988) and will be governed by the principles of movement as explained there. From this point on, the derivation will be just like that of normal passives: the passive morpheme cliticizes to the verb,7 and an object may move into the vacated subject position if there is one.8 This gives derivations such as (46):

(46) [-pass [I [V (NP)...]]]
     (Incorporation) [e [I + pass [V (NP)...]]]
     (Cliticization) [e [i [V + I + pass (NP)...]]]
     (NP Movement) [NPj [i [V + I + pass tj...]]]

The important difference between the Lithuanian passive and the Dutch passive is that as an N, the Lithuanian passive morpheme can appear in any NP position generated by the base, including VP-internal positions, where it receives an internal θ-role. This allows a grammatical output with unaccusative verbs, for example. The passive morpheme is generated in the θ-position and then undergoes NP Movement to the subject position. From there it can affix to the Infl and ultimately the V in the usual way:

(47) [e [I [V -pass]]]
     (NP Movement) [-pass [I [V t]]]
     (Incorporation) [e [I + pass [V t]]]
     (Cliticization) [e [i [V + I + pass t]]] (= (42), (43))

The raising example (44) is similar: here the passive morpheme is generated as the subject of the embedded clause and reaches the subject position of the matrix clause by NP Movement. From there the derivation proceeds as before. The double passive examples are slightly different. In English and Dutch double passives are always forbidden, because they would involve having two arguments (the two passive morphemes) outside the VP in Infl. Since verbs never assign more than one external θ-role, a θ-Criterion violation is inevitable.
In Lithuanian, however, one passive morpheme can be generated in an NP position inside the VP and one outside it. This makes the following derivation possible:

(48) [-pass [I [V -pass]]]
     (Incorporation) [e [I + pass [V -pass]]]
     (NP Movement) [-pass [I + pass [V t]]]
     (Incorporation) [e [I + pass + pass [V t]]]
     (Cliticization) [e [i [V + I + pass + pass t]]] (= (45))

In effect, this typological difference between passive constructions is accounted for by introducing a second way that the basic passive structure (see (2)) can occur: it can be base-generated (English, Dutch) or derived by Move α (Lithuanian, Turkish).

Since we assume that the Lithuanian passive morpheme can be generated in any NP position, we face potential problems of overgeneration. For example, we must prevent the association of the Lithuanian equivalent of (49a) with the sense in (49b) by the derivation in (49c):

(49) a. John was beaten (by Bill).
     b. John beat someone (/Bill).
     c. [John [I [beat -en]]]
        [John [I + en [beat t]]]
        [John [i [beat + I + en t]]]

The solution to this problem comes from the theory of syntactic affixation (Incorporation). Baker (1988) argues that an element X can affix to an element Y in the syntax only if Y governs X’s original position (compare the Head Movement Constraint of Travis (1984)). Now, the passive morpheme is stipulated as needing to affix to Infl. The only NP position that Infl governs is the subject position; hence, the derivation in (49c) is ruled out. More generally, in order to fulfill its affixation requirements, the passive morpheme must come to be in the subject position. This can in principle take place in either of the two ways mentioned above: it can be base-generated in the subject position (as in (46)) or it can reach the subject position by NP Movement (as in (47), (48)). Other imaginable situations are ruled out.

It is important to clarify the status we predict for (49). A form analogous to (49a) could perfectly well be associated with an interpretation like (49b) by a derivation much like (49c) if we made the minimal change of saying that the relevant morpheme affixes directly to verbs. In fact, such structures are attested; the following are two such examples:

(50) a. Muk’ bu š-i-mil-van
        never asp-1sA-kill-Apass
        ‘I never killed anyone.’ (Tzotzil; Aissen (1983))
     b. Man-li’i’ häm (guma’).
        Apass-see we(abs) (house)
        ‘We saw something (a house).’ (Chamorro; Gibson (1980))

However, morphemes with these properties will be unable to appear in a structure like (51a) with interpretation (51b):

(51) a. John kill-van.
     b. Someone killed John.
     c. [-van [I [beat John]]]
        [[I [beat + van John]]]
        [John [I [beat + van t]]]

Essentially, this means that -van is not what we would call a passive morpheme; rather, it is an antipassive morpheme, an entirely different element (see Baker (1988) for discussion).9

In conclusion, we have shown that the 1AEX can be derived from the θ-Criterion once the passive morpheme is recognized as a true syntactic argument. The limited violations of the 1AEX that have been attested (those like (39) but not like (40)) are the result of a small difference in the lexical properties of the passive morpheme in question.

4. Case Theory and Impersonal Passives

Having clarified the relationship between passive morphemes and θ-role assignment, we now try to do the same for Case assignment. In section 1 we discussed how the basic fact about Case theory in passives (that the verb’s Case is “absorbed”) can be derived from the assumption that the passive morpheme is an argument. The result followed immediately from two widely held principles: (i) the Visibility Condition, which says that arguments must be Case-marked to receive a θ-role at LF, and (ii) the fact that (structural) Case is assigned under government at S-Structure. At S-Structure the passive morpheme is an argument attached to the verb (see (2)); hence, it must be assigned the verb’s Case. Thus, this Case cannot be assigned to any (other) NP, and we say that it is “absorbed.” Although this works nicely for languages like English, Jaeggli (1986) points out that it raises a serious comparative issue: what about languages in which passives can be formed with intransitive verbs as well as with transitive ones?10 Examples of such impersonal passives in German are as follows:

(52) a. Es wurde bis spät in die Nacht getrunken.
It was till late in the night drunk
‘Drinking went on till late at night.’ (Jaeggli (1986, (22b)))
b. Sonntags wird nicht gearbeitet.
Sundays is not worked
‘On Sundays there is no working.’ (Roberts (1985, 512))

Verbs such as these do not necessarily take accusative Case objects. If this means that they lack the ability to assign accusative Case, then the passive morpheme will not be Case-marked in structures like (52). Given the Visibility Condition, a θ-Criterion violation would result. Thus, it is necessary to study the Case theory properties of passives more carefully.

After considering several possibilities, Jaeggli assumes that there is a parametric difference between English and German such that intransitive verbs like work have the capacity to assign accusative Case in the latter but not the former. Normally such verbs will not have an accusative object even in German, because they have no θ-role to assign to it. Nevertheless, in passive structures this accusative Case is available to make the passive morpheme visible at LF, thereby allowing (52).

Although this solution mechanically preserves the idea that the passive morpheme’s properties follow from the fact that it is an argument, it has serious problems. First, it is ad hoc in that it posits a fundamental difference between the lexical properties of English and German verbs without any independent motivation. Second, it makes the dubious prediction that no language could have two distinct passive constructions, where one allows impersonal passives and the other does not.11 Third and most important, recent literature has brought out the fact that there are impersonal passive constructions in which accusative Case is still visibly assigned to the thematic object of the verb. Interestingly, this never happens in German:

(53) *Es wird diesen Roman von vielen Studenten gelesen.
It is this-acc novel-acc by many students read

Sobin (1985) shows that it does in Ukrainian, however. In this language the thematic object can appear in either a nominative Case form or an accusative Case form, in more or less free variation:

(54) a. Cerkv-u bul-o zbudova-n-o v 1640 roc’i.
church-acc/fem was-imp built-pass-imp in 1640
‘The church was built in 1640.’
b. Cerkv-a bul-a zbudova-n-a v 1640 roc’i.
church-nom/fem was-fem built-pass-fem in 1640
‘The church was built in 1640.’ (Sobin (1985, (13)))

Timberlake (1976) makes the same point for North Russian dialects. For convenience, we will refer to structures such as (54) as transitive passives in view of their accusative objects. These constructions (also called impersonals) are not particularly rare; they are also found in Welsh (Perlmutter and Postal (1984)), Polish (Keenan and Timberlake (1985), J. Zapior (personal communication)), and the impersonal si-passives in Italian (Belletti (1982)).12 Jaeggli’s solution for German impersonal passives clearly will not extend to these languages. Thus, we are forced to conclude, following Sobin (1985), that the absorption of accusative Case is a cross-linguistically more variable property of passive than the absorption of a θ-role.

In order to give an account of the range of impersonal passives, we must modify one of the assumptions from which we derived the Case-absorption effect. There are two possibilities. We could preserve the Visibility Condition as it is if we held that:

(55) An element (here the passive morpheme) can receive Case from some item that does not (minimally) govern it.

Alternatively, we could say that the Visibility Condition needs to be generalized somehow, such that:

(56) An element (here the passive morpheme) can under certain circumstances be assigned a θ-role at LF without being assigned Case.

Both of these assumptions are departures from generic Government-Binding Theory, but in fact both have independent motivation. We discuss each in turn.

The motivation for (55) comes from the fact that nominative Case can visibly appear on NPs inside the verb phrase in German (Den Besten (1981; 1985)). Indeed, this is possible whenever the subject has lexically assigned Case and hence does not need the nominative Case itself. For example:

(57) a. . . . daß [S meinem Bruder [VP deine Musik nicht gefällt]].
that my brother-dat your music-nom not likes
‘. . . that my brother doesn’t like your music.’
b. . . . daß [S dem Karl [VP [S e [VP dein Buch zu gefallen]] scheint]].
that Karl-dat your book-nom to like seems
‘. . . that Karl seems to like your book.’

In (57a) nominative Case from the Infl is assigned into the VP; in (57b) it is passed down all the way from the matrix Infl, into the VP of the matrix verb’s infinitival complement. Clearly, some mechanism is available in German that allows such assignment to take place (see Den Besten (1981; 1985), Webelhuth (1986), Baker (1985), and Roberts (1985) for various suggestions). Now, on our analysis the passive morpheme is also an argument inside the VP at S-Structure; hence, the same mechanism could allow nominative Case to be assigned to it. The structure would be something like (58) (details omitted), with nominative Case assigned from Infl (wurde) to -en inside the VP:

(58) [S e [I′ [VP getrunk + en] wurde]]

This allows the passive morpheme to be visible for θ-role assignment at LF in the usual way. See Roberts (1985, chap. 5) for further details.

The independent motivation for (56) comes from the study of Noun Incorporation (NI), the phenomenon found in many polysynthetic languages by which the head noun of the thematic object of the verb appears morphologically combined with the verb. An example of such a construction is the following, from Nahuatl:

(59) Na? ipanima ni-kwatini-itta.
I always 1sS-tree-see
‘I see trees all the time.’ (Merlan (1976, (8B)))

Baker (1985; 1988) has argued that, at least in some languages, this morphological combination comes about by adjoining the head of the object NP onto the verb in the syntax. If this is correct, then the verb in (59) has a structural object at all syntactic levels. Such objects are unusual, however, in that they need not necessarily be assigned Case. Thus, consider the following NI structures from Niuean (Austronesian; from Seiter (1980)):

(60) Fai gata nakai i Niuē?
exist-snake-Q in Niue
‘Are there snakes in Niue?’

(61) a. Ne fanogonogo a lautolu *(ke he) tau lologo ke he tau tūlā ne ua.
past listen abs they to pl song to pl clock nonft two
‘They were listening to (the) songs for a couple of hours.’
b. Ne fanogonogo lologo a lautolu ke he tau tūlā ne ua.
past listen song abs they to pl clock nonft two
‘They were listening to songs for a couple of hours.’

(62) a. Kua tā he tama e tau fakatino *(aki) e malala.
perf-draw erg-child abs-pl-picture with abs-charcoal
‘The child has been drawing pictures with a charcoal.’
b. Kua tā fakatino (h)e tama (aki) e malala.
perf-draw-picture (erg)-child with abs-charcoal
‘The child has been drawing pictures with charcoal.’

(60) shows that the argument of an unaccusative verb can incorporate, even though such verbs cannot generally assign Case to their objects. By itself, this example is not especially revealing, since the object NP that the noun incorporated out of could receive nominative (or, in this case, absolutive) Case from Infl by a mechanism like that discussed for German. This would not extend to (61), however, which illustrates a special class of “defective transitive” verbs in Niuean. These verbs are not Case assigners; hence, the preposition ke he must normally be inserted to assign Case to their object, as in (61a).13 (61b), however, shows that if the head of the object is incorporated, the structure is grammatical without the insertion of ke he. This time, the object cannot be inheriting Case from Infl, because Infl’s Case is necessarily assigned to the overt thematic subject. Finally, (62) shows that when there is an instrument phrase in the clause, that instrument can receive structural Case from the verb if and only if the head of the object NP has incorporated into the verb.14 Given that each Case can only be assigned once, (62b) proves that incorporated objects need not be assigned Case in Niuean, since the two available structural Cases (ergative and absolutive) are visibly assigned to other NPs (the subject and the instrument, respectively).

Their lack of Case notwithstanding, the incorporated nominals in (60)–(62) clearly receive θ-roles at LF. Thus, we have another counterexample to the standard formulation of the Visibility Condition. Hence, the condition must be revised along the lines indicated in (63):

(63) In order for an argument to be visible for θ-role assignment at LF, it must either
a. be assigned Case, or
b. have its head morphologically united with an X⁰.

The conceptual implications of this disjunctive formulation are discussed in Baker (1988, sec. 3.4). Its importance in the present context, however, is that it automatically solves the primary problem of impersonal passives. The passive morpheme is an argument that, like incorporated noun roots, is morphologically attached to the verb at S-Structure. Thus, it can be made visible by clause (b) of (63), and no θ-Criterion violation will result.

Of course, the extended Visibility Condition in (63) does not give the complete account of impersonal passives, since it does not allow for the observed ambivalence of passive morphemes with respect to Case assignment. Whereas passive morphemes in Ukrainian never need to receive accusative Case, in accordance with (63b), passive morphemes in German do absorb accusative Case obligatorily when the verb has such Case available (see (53)); furthermore, passive morphemes in English must be assigned accusative Case, making impersonal passives impossible. Thus, we take (63) to define the limits of what is allowed by Universal Grammar, but observe that individual languages have narrower restrictions, in particular concerning the use of clause (b).

In this regard, it is striking that Noun Incorporation constructions have a parallel ambivalence with respect to accusative Case assignment. The examples from Niuean above illustrate the situation where the incorporated nominal root does not need Case at all; NI in this language is thus comparable to passives in Ukrainian. Based on the description in Merlan (1976), however, Nahuatl seems to be slightly different. As in Niuean, Incorporation in Nahuatl is possible with certain unaccusative type verbs that would have no Case for the incorporated nominal:

(64) Tesiwi-weci-∅-∅. (compare (60))
hail-fall-pres-3s
‘Hail is falling.’/‘It is hailing.’ (Merlan (1976, (26)))

Nevertheless, when the object of a transitive verb is incorporated, accusative Case is not free to be assigned to an instrumental instead, as it is in Niuean; rather, the instrument still appears with its characteristic preposition ika:

(65) Ne? panci-tete?ki ika kočillo. (compare (62))
he bread-cut with knife
‘He cut the bread with a knife.’ (Merlan (1976, (35b)))

Assuming that these patterns are sufficiently general, NI in Nahuatl is comparable to passive in German: (64) is an example of “impersonal NI” and (65) shows that “transitive NI” constructions are not allowed. Finally, in Greenlandic Eskimo, NI never takes place with an intransitive verb (Sadock (1980; 1985)), and logically transitive verbs with incorporated nouns cannot in general assign structural Case to an instrument or some other NP.15 Thus, NI in Greenlandic is comparable to passive in English, where neither impersonal nor transitive passives are found. This Case-theoretic variation, together with the similarity between passive and NI, can be captured in the following statement of how particular languages use the options made available by (63):

(66) If α is an argument of language β, then α can be made visible by
A: (63a) or (63b)
B: (63a) if structurally possible; otherwise (63b)
C: (63a) only
where for a given language (and conceivably for specific morphemes in that language), one of A, B, or C is chosen.

The properties of the languages we have reviewed can then be summarized as follows:

(67)
Visibility setting    A           B          C
Passive morpheme      Ukrainian   German     English
Incorporated noun     Niuean      Nahuatl    Eskimo

The characteristic patterns follow directly from these statements.
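As a toy illustration only (not the authors’ formalism), the following Python sketch shows how (63), (66), and (67) jointly predict which passive and NI constructions a language allows. The names Setting, is_visible, predictions, and TABLE_67, the three boolean inputs, and the reading of “structurally possible” in (66B) as “structural Case is available to the argument” are all our assumptions; the sketch also ignores other routes to visibility, such as nominative assignment into the VP along the lines of (55).

```python
from enum import Enum


class Setting(Enum):
    """The three visibility settings of (66)."""
    A = "(63a) or (63b)"
    B = "(63a) if structurally possible; otherwise (63b)"
    C = "(63a) only"


def is_visible(setting: Setting, case_available: bool,
               gets_case: bool, united: bool) -> bool:
    """Is an affixal argument visible for theta-role assignment at LF?

    case_available: structural Case could reach the argument (our reading
                    of "structurally possible" in (66B); an assumption).
    gets_case:      the argument actually receives that Case, clause (63a).
    united:         its head is morphologically united with an X0, clause (63b).
    """
    if setting is Setting.A:
        return gets_case or united
    if setting is Setting.B:
        # If Case is available, the argument must take it; otherwise (63b).
        return gets_case if case_available else united
    return gets_case  # Setting.C: only (63a) counts


def predictions(setting: Setting) -> dict:
    """Which constructions with an affixal argument a setting permits."""
    return {
        # Transitive verb whose accusative Case goes to the affix itself.
        "ordinary": is_visible(setting, True, True, True),
        # Intransitive verb: no accusative Case for the affix at all.
        "impersonal": is_visible(setting, False, False, True),
        # Accusative goes to another NP instead of the affix.
        "transitive": is_visible(setting, True, False, True),
    }


# Table (67): passive morphemes and incorporated nouns by setting.
TABLE_67 = {
    "Ukrainian passive": Setting.A,
    "German passive": Setting.B,
    "English passive": Setting.C,
    "Niuean NI": Setting.A,
    "Nahuatl NI": Setting.B,
    "Eskimo NI": Setting.C,
}

for language, setting in TABLE_67.items():
    print(f"{language:18} {setting.name} {predictions(setting)}")
```

Run as written, the loop prints one line per row of (67): setting A permits both impersonal and transitive constructions, B permits only impersonal ones, and C permits neither, while the ordinary Case-absorbing construction is licensed under all three settings.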

To summarize, we have sought in this section to deepen the typology of passive constructions from a Government-Binding perspective, in particular integrating transitive passives into the account. The classical fact that English passives absorb accusative Case still follows from the fact that the passive morpheme is an argument inside the VP at S-Structure, given that English (unlike German) has no way of assigning nominative inside the VP and that it (unlike Ukrainian) allows only the most restrictive interpretation of the Visibility Condition.16 Yet room is made for other languages to differ from this in well-defined ways. The cost of this achievement is a significant weakening of the association between θ-role assignment and Case assignment, as expressed in the extended Visibility Condition (63) and its parameters (66). Although this proposal is empirically motivated, its conceptual implications will require careful further study. Nevertheless, the theory still makes nontrivial predictions about the range of passive constructions found in particular languages. Finally, we have shown in this section that the Case-theoretic properties of passive morphemes are the same as those of incorporated noun roots. Now, it seems comparatively clear that incorporated nouns are arguments that receive a θ-role from the verb (possibly by originating in an independent NP at D-Structure, as in Baker (1985; 1988)). Inasmuch as the same principles that apply to them also apply to passive morphemes, we have important indirect evidence for the fundamental hypothesis that passive morphemes are arguments.

5. Passives With Auxiliary Verbs

To this point, our discussion has ignored the presence of the auxiliary verb in languages requiring one for the passive. This final section explores this aspect of the passive construction and its implications for our proposal.

5.1 Auxiliaries and the Passive Morpheme

Perhaps the most obvious way to integrate the auxiliary verb be into the D-Structure representation of a passive construction is as in (68). Here -en is introduced under the Infl node and later moved onto the “main” verb:

(68) [S [I -en] [VP be [VP V NP]]]

This is consistent with standard assumptions about auxiliaries and the position of Infl. The difficulty with this proposal is that it entails either lowering -en through at least two maximal projections or raising the main verb through as many maximal projections. The problem is most acute in situations involving several auxiliary verbs; consider, for example, how (69) would be derived under such an account:

(69) Mary has been kissed.

(70) a. [S [I -en] [VP has [VP been [VP kiss Mary]]]]
b. [S [I hasi -en] [VP ti [VP been [VP kiss Mary]]]]

The first two steps of the derivation are given in (70). (70a) portrays the D-Structure representation. We assume that the auxiliary verbs have and be head their own VPs, following Ross (1969). The first of these auxiliary verbs is then moved under the Infl node, following essentially Emonds (1976).17 The difficulty comes with the third step. We have argued that -en is affixed onto the main verb, to form a passive participle, and that this in turn causes the object to move into subject position. This can be brought about in one of two ways. Either -en must move down, or the main verb must move up into Infl. If the main verb moves into Infl, then it must share this position with up to four auxiliary verbs. (Otherwise, our theory would wrongly move the main verb to the left of the auxiliary verbs.) This hypothesis would create a number of problems for an account of the placement of not, the landing site of Q-Float, and the characterization of Subject-Auxiliary Inversion, among other things; see Emonds (1976).18 But allowing -en to move downward is also problematic. There are no cases that we know of involving movement downward through three maximal projections, as in (70). This operation seems to be unavailable in languages that have easily identifiable subject clitics. In French, for example, the weak pronoun je is a subject clitic, and has attached to the auxiliary verb in (71):

(71) a. J’ai acheté ça pour Jean.
‘I bought this for Jean.’
b. *Ai j’acheté ça pour Jean.

See Kayne (1972, sec. 2.4) for discussion. Note that je is not able to move past the auxiliary verb and attach to the main verb, as in (71b).
If such a lowering process is not available for the subject clitic je, then it is a reasonable conjecture that such “long-distance” downward movement should be prohibited in general and that, in particular, such movement is not available to the passive morpheme. If the main verb may not move into Infl and -en may not move down onto the verb, then (70a) cannot be the D-Structure representation of (69). We are thus led to assign (69) a different D-Structure representation. We suggest that this representation is the one shown in (72) and that, in general, passive constructions have D-Structure representations akin to (73).19 Building on work by Stowell (1981), Couquaux (1981), and Burzio (1986), Kayne (1989) has suggested that the auxiliary verb be takes a clausal complement headed by Infl.20 If Kayne is correct, and if -en is introduced under Infl, then (73) emerges as the D-Structure representation for passive constructions.

(72) [Tree diagram, not recoverable from the source. Surviving labels: S, I′, VP, V have, an embedded S and I′ with an empty NP (e), and V be.]

From (73), the S-Structure representation of a passive sentence may be derived in the way that we have already sketched. The auxiliary verb moves to Infl and the main verb merges with the passive morpheme, yielding (74):21

(74) [Tree diagram, not recoverable from the source. Surviving labels: S, I′, an empty NP (e), a VP containing the trace ti, and V + en.]

In English the passive morpheme must be assigned Case by the main verb, and this forces the object NP to move to subject position.

Rather surprising support for the conjecture that (73) is the D-Structure representation for passives with auxiliary verbs comes from VP Ellipsis data in English. Consider the facts in (75):

(75) a. Gary should have been sleeping, and Mary should have been, too.
b. ?Gary should have been sleeping, and Mary should have, too.
c. ?Gary should have been sleeping, and Mary should, too.
d. Gary should have been paid better, and Mary should have been, too.
e. Gary should have been paid better, and Mary should have, too.
f. ?Gary should have been paid better, and Mary should, too.
g. Gary is being paid better nowadays, and Mary is, too.
h. *Gary was being given a book, and Mary was being, too.

In (75a)–(75g) VP Ellipsis has allowed a VP that otherwise would have been headed by the main verb or one of the auxiliary verbs to be empty. As these examples suggest, VP Ellipsis may delete any of the VPs, as long as the first auxiliary verb is stranded (although in several cases the result is somewhat marginal).22 There is one gap in the paradigm, however, illustrated by (75h). When both progressive be and passive (or copular) be are present, VP Ellipsis cannot delete the phrase headed by the lowest predicate. We shall now show how this fact supports the structure in (73).

Our argument proceeds as follows. We give a precise description of the environments where VP Ellipsis may occur. This requires both describing the situations that (75h) exemplifies and defining the types of phrases that may undergo deletion. This second task requires defending a novel theory of main verb be, for clauses with main verb be do not behave as expected under VP Ellipsis. Once this theory of main verb be is embraced, however, it becomes impossible to adequately characterize the environments illustrated by (75h), where VP Ellipsis is prevented. We are rescued from the conundrum if (73) is the true structure of passive clauses. We begin with the second task: determining which phrasal categories VP Ellipsis may delete.

5.2 VP Ellipsis

The contrasts in (76) suggest that only VPs may be elliptical:

(76) a. I said that Mary should leave, and that Tom should [VP e], too.
b. *I said that Mary should kiss Pete, and that Tom should kiss [NP e], too.
c. *I said that Mary thought that Bill left, and that Tom thought [S′ e], too.
d. *I said that Mary talked to Bill, and that Tom talked [PP e], too.
e. *I said that Mary believed Bill to be intelligent, and that Tom believed [S e], too.

NPs, S′s, PPs, and Ss may not be elliptical, as the ungrammaticality of (76b–e) shows. The problem comes with the examples in (77), which suggest that an AP may delete:

(77) a. I said that Mary was angry, and that Bill was [? e], too.
b. I claimed that Mary is happy with Tom, and that Bill is [? e], too.

In other situations, however, APs may not be elliptical; consider (78):

(78) a. *John seems angry, and [S Mary [VP seems [AP* e]]], too.
b. *Mary feels happy, and [S Gary [VP feels [AP* e]]], too.

We need a way of distinguishing between these two environments. Because the APs deleted in (77) do not differ in any relevant respect from those deleted in (78), the factor responsible for the difference in grammaticality must be the verb governing the AP. In (77) the verb is be, whereas in (78) it is either seem or feel. We will treat be as the exceptional case and link its unique behavior in this context with another of its exceptional properties: be is the only “main” verb in English that may raise into Infl.23 This is demonstrated by, among other things, its ability to appear before not:

(79) Gary is not happy.

Our explanation for this correspondence goes as follows. Imagine that main verb be is, in fact, not a main verb, but is instead an auxiliary verb.24 This immediately accounts for the fact that it, unlike all other “main” verbs, is able to appear in Infl position. Now, if all sentences must have a main verb, then sentences overtly containing just be must also contain a phonologically null main verb.25 It is the VP headed by this null verb that is deleted in (77).26 Our claim, then, is that the difference between be and other verbs rests on be’s unique subcategorization properties. Only be may be followed by a VP headed by an empty verb. This accounts at once for be’s anomalous behavior with respect to Verb Raising and VP Ellipsis. We may therefore conclude that VP Ellipsis does affect only verb phrases; where it seems that APs are being deleted, in fact a verb phrase containing an empty verb is deleted.27

5.3 VP Ellipsis and Passive Participles

We turn now to a description of the restriction on VP Ellipsis that prevents its application in (75h). In that example, repeated in (80), an empty VP cannot follow the present participle:

(80) *Gary was being given a book, and Mary was being, too.

A very similar restriction is found in clausal gerunds. As in (80), an empty VP cannot be found following the gerund:

(81) a. *I remember Mary having eaten the apple, and Gary having, too.
b. *I remember Mary having been angry about it, and Gary having, too.

A number of explanations have been proposed for the failure of VP Ellipsis in these contexts; see Lobeck (1986) for a review. It is sufficient for our purposes to rely on a descriptive statement of the restriction, as in (82):

(82) VP Ellipsis cannot apply to a VP governed by V + ing.

Because (82) refers to VPs governed by V + ing, it only prevents deletion of the VP immediately below V + ing. A VP more deeply embedded may delete. Thus, (82) distinguishes the ungrammatical (81b) from the grammatical (83):

(83) I remember Mary having been angry about it, and Gary [I havingi] [VP ti [VP been [VP e]]], too.

In (83) the VP headed by an empty verb, and containing the AP angry about it, has deleted. This conforms to (82) because the elliptical VP is not governed by having. Now consider how (82) will apply in passive constructions involving progressive be, such as (84). Our theory of be will assign to such examples the representation in (85).

(84) Gary was being given a book.

(85) [Tree diagram, not recoverable from the source. Surviving labels: S, I′, NP Gary, being, V ε, and a book.]

(The “ε” in (85) represents the null verb heading the VP that be subcategorizes for.) (82) prevents VP* (the VP headed by an empty verb) from being elliptical, because it is governed by being. But it does allow the phrase labeled “?” to be elliptical. However, this is precisely what the example that began this section, (75h), shows cannot happen: (86) *Gary was being given a book, and Mary was being, too. There is no readily apparent way that (82) can be extended so as to prevent deleting “?” in (85), without wrongly preventing the grammatical (83). The solution must lie elsewhere. The structure we have given passive clauses provides that solution. If (73) is correct, and passive participle phrases are Ss headed by an Infl containing -en, then the phrase labeled “?” in (85) is an S:28

(87) [Tree diagram, not recoverable from the source. Surviving labels: S, I′, NP Gary, being, V ε, I -en, and a book.]

Recall that in the previous section we established that VP Ellipsis only affects VPs. Thus, the reason “?” cannot be elliptical in (85) is that it is not a VP, just as our theory of passives entails.
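As a toy illustration (not the authors’ formalism), the two conditions on VP Ellipsis, the VP-only restriction of section 5.2 and (82), can be applied mechanically to simplified versions of (83), (85), and (87). The Phrase class, the complement-spine representation, and the suffix test for V + ing are our assumptions: government is approximated as head-complement sisterhood, and traces and specifiers are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phrase:
    """A toy phrase: category label, head, and complement (if any)."""
    label: str
    head: str
    complement: Optional["Phrase"] = None


def may_elide(phrase: Phrase, governor: str) -> bool:
    """Only VPs may be elliptical (5.2), and (82): not a VP governed by V + ing.

    Assumption: government is reduced to "phrase is the complement of governor",
    and V + ing is detected crudely by the suffix of the governing head.
    """
    return phrase.label == "VP" and not governor.endswith("ing")


def elidable_sites(top: Phrase) -> List[str]:
    """Walk down the complement spine and list every complement that may elide."""
    sites = []
    node = top
    while node.complement is not None:
        comp = node.complement
        if may_elide(comp, node.head):
            sites.append(f"{comp.label}[{comp.head}]")
        node = comp
    return sites


# (83): ... Gary [I having] [VP t [VP been [VP e]]]; the trace is omitted.
ex_83 = Phrase("I'", "having", Phrase("VP", "been", Phrase("VP", "ε")))

# (85) under the rejected hypothesis that "?" is a VP headed by the participle.
ex_85_if_vp = Phrase("VP", "being", Phrase("VP", "ε", Phrase("VP", "given")))

# (87): "?" is an S headed by an Infl containing -en, as (73) claims.
ex_87 = Phrase("VP", "being", Phrase("VP", "ε", Phrase("S", "-en")))

print(elidable_sites(ex_83))        # ['VP[ε]']: ellipsis below been is fine
print(elidable_sites(ex_85_if_vp))  # ['VP[given]']: would wrongly allow (75h)
print(elidable_sites(ex_87))        # []: (75h) correctly excluded
```

Only the (87) structure both blocks ellipsis in (75h) and still allows (83), which mirrors the argument of this section.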

6. Conclusion

We have discussed aspects of the syntax of passive constructions, taking as our starting point Jaeggli (1986). We have argued that the only crucial property of a passive morpheme is that it is an argumental affix tied to the Infl node; we have shown how all of the well-known properties of the passive construction can be derived from this. Furthermore, we have shown that the passive morpheme behaves like other syntactic arguments with respect to binding theory (section 2), and with respect to θ-theory (section 3), Case theory (section 4), and X-bar theory (section 5). In this way, a diverse body of facts, ranging from “crossover” effects in passives, to cross-linguistic constraints on the class of passivizable verbs, to failures of VP Ellipsis, receives a unified and explanatory account.

Notes

1. Notice that this account of by-phrases does not entail that by-phrases cannot appear in other constructions, such as nominals or the Romance faire . . . par construction.

2. If IMP is in VP, then the trace in object position binds IMP as well. This would explain the impossibility of It was reasoned to John that . . . on the interpretation where IMP = John. The following examples confirm this:
(i) *Stories were told IMPi to Johni.
(ii) *Letters were sent IMPi to Maryi.
(iii) *Whoi was testimony given IMPi about ti?
(iv) *A book was presented IMPi to everyonei.
These examples are all bad, showing that there is a c-command relation between IMP and other arguments in VP. In that case there are two violations of Principle C.

3. See Pesetsky (1987) for an alternative account of psych-verbs.

4. We thank a reviewer for this point and for examples (34b), (35).

5. In Perlmutter and Postal’s original formulation of the 1AEX, it is assumed that in impersonal passives the expletive is inserted as an object and is promoted to subject in order to block these examples.

6. Jaeggli’s account contains a serious difficulty, however. In particular, to block sentences like (39), his account requires (i):
(i) -en may only receive the external θ-role of the V.

Jaeggli seeks to derive this statement from the fact that the external argument is the only one for which the verb is not strictly subcategorized. Hence, its manifestation alone is not restricted to a particular syntactic frame and it alone can be expressed by an affix. However, this is too strong, since internal arguments can be expressed by affixes in languages of the world: (42), (43), (45), (50) are all examples of this. On our account, (i) is derived from the fact that -en is generated in Infl, as discussed directly below.

7. Alternatively, the verb could raise to the Infl position, joining the passive morpheme there, as assumed in Baker (1985) (see Chomsky (1986)). At least in this case, this alternative seems more natural. See section 5 for relevant discussion.

8. See Burzio (1986) on chains that “overlap” in the subject position.

9. Moreover, our approach is consistent with a language having two passive morphemes, one of which obeys the 1AEX and one of which does not. Italian seems to be such a language, having as it does both a copular passive and impersonal “passives” with the clitic si (see Belletti (1982), Manzini (1983), Burzio (1986)). This situation would be impossible if the 1AEX were itself a parameter of language. See Baker (1988, 332–334) for discussion.

10. Although it is true that verbs without an internal argument never passivize in English, certain verbs that never take an NP argument can. It is not obvious whether such verbs have accusative Case to assign to the passive morpheme. One such class of verbs are the pseudopassives:
(i) John was talked about t last night.

We follow the tradition of assuming some process of reanalysis between talk and about: this complex verb can then inherit a Case feature from the P, and this will be assigned to -en, making it visible. Conceivably, this approach could be extended to impersonal passives in French, which are possible only if the verb takes an internal PP argument:
(ii) Il a été parlé de vos frères hier soir.
There was spoken of your brothers yesterday evening
(Kayne (1984))

Here one might claim that the same reanalysis and NP Movement processes that occur at S-Structure in English occur at LF in French. (On LF reanalysis, see

Passive Arguments Raised  291 Baker (1985), Haïk (1985); on LF NP Movement, see Chomsky (1986).) This would capture the close correspondence between permissible pseudopassives in English and permissible impersonal passives in French. 11. Italian and Spanish may be such languages, given that impersonal si passives are allowed with intransitive verbs but copular passives are not. The Case properties of the si morpheme are complex, however; in particular, there is evidence from infinitival constructions that si is dependent on nominative Case in a very unusual way, which goes beyond the bounds of our discussion. See Burzio (1986) for review and discussion.   Polish also has two passive constructions: one with copular morphology that is allowed only with transitive verbs and one with clitic morphology that is allowed with intransitives. In this language the mysterious Case-theoretic complications in the clitic-type passive do not seem to arise (J. Zapior (personal communication)). 12. Keenan and Timberlake (1985) point out that on earlier Government-Binding views of the passive, these constructions would be exceptions to Burzio’s Generalization, because the passive verbs assign accusative Case but no external θ-role. On our analysis, this potential anomaly disappears, since the verb does assign an external θ-role (to the passive morpheme itself) as well as accusative Case. 13. Given the theory of Baker (1985), we know that ke he is an inserted preposition in (61a) because otherwise the incorporation of its object would be impossible. Independent evidence for this is the fact that ke he can mark the raised subject in “raising to object” constructions (see Massam (1985)); this would be impossible if ke he were always a θ-role assigner. 14. Crucially, the fact that the instrumental preposition cannot be dropped without incorporation shows that—unlike verbs in certain other languages—verbs in Niuean can only assign one structural Case. 15. 
A handful of logically triadic verbs such as ‘give’ are exceptions to this. The important point is that “reassignment” of structural Case is not free across the class of incorporating transitive verbs, the way it is in Niuean. 16. Notice that we have made no attempt to choose between the two different approaches to impersonal passives (for example, in German), either of which could be sufficient. We see no need to do so for current purposes, since both have solid independent motivation. An LI reviewer has suggested to us that infinitives might provide a way of distinguishing the two proposals: if the passive morpheme needs to receive nominative Case from a tensed Infl, then impersonal passives should be impossible in infinitivals. In fact, infinitival impersonal passives are acceptable embedded under raising verbs but not in other contexts, both in German (Roberts (1985)) and in relevant Slavic languages (Sobin (1985), J. Zapior (personal communication)). It is not clear that this distinguishes the proposals, however, because even if the passive morpheme did not need Case from Infl, a null subject required by the Extended Projection Principle of Chomsky (1981) might, since (for some reason) PRO, the usual subject of an infinitive, cannot be an expletive (*PRO/*it to rain would bother us). Since the issues about control and expletives that arise lead far beyond passive constructions per se, we do not explore the matter further. 17. See also Akmajian and Wasow (1975), Culicover (1982), Jackendoff (1972), and, for recent discussion where somewhat different structures are assumed, Lasnik (1981), Lobeck (1986), and Chomsky (1986). 18. In particular, not may have clausal scope only if it follows the first of a series of auxiliary verbs, as the contrast in (i) shows. And if only verbs that have been

292  Mark Baker et al. moved into Infl may undergo Subject-Auxiliary Inversion and “Subject-Auxiliary Inversion” actually moves the Infl node, then the contrast in (ii) also demonstrates that main verbs may not move into Infl: (i) Mary can’t have left. ?Mary can have not left. (ii) Can Mary leave? *Can leave Mary? *Leaves Mary? 19. Our proposal assigns a passive sentence a D-Structure representation very similar to the one adopted in Chomsky and Lasnik (1977), where the passive morpheme is assumed to head an AP. 20. Kayne’s claim is made, in particular, for the past tense and passive auxiliary be in the Romance languages. We extend this claim to the passive auxiliary be in English. 21. We do not know how to determine whether the main verb has raised to join -en in Infl position, or whether -en has lowered onto the verb. Note that raising the main verb in these contexts would not engender the empirical problems that arise if one hypothesizes that the main verb moves into the matrix Infl. (See footnote 18.) Of course, the explanation that is ultimately given for the inability of main verbs to move into Infl in general must have the property that main verbs are allowed to raise into an Infl bearing the passive morpheme. That is, this scenario relies on some unknown constraint on Verb Raising that distinguishes the Infls of passive clauses from others. 22. We disagree here with Lobeck’s (1986) claim that examples like (75b,c) are ungrammatical. In particular, we find a sharp contrast between these examples and (75h). Our judgments are consistent with Emonds (1976) and Iwakura (1977). 23. We are setting aside the exceptional, and somewhat archaic or dialectal, instances of main verb raising with need, have, and dare. See Pullum and Wilson (1977) for discussion. 24. We follow Schachter (1983) in this respect. With this hypothesis, we may define auxiliary verb as a verb that is subcategorized by a VP, and main verb as any other verb. 
This leaves the task of distinguishing auxiliary verbs from causative and perception verbs, which also seem to be subcategorized by a VP. 25. This may be too strong. It is sufficient for our purposes to say that “main verb” be may be an auxiliary verb. That there are situations where be can be the main verb is suggested by such cases as (i): (i) We saw Mary be rewarded.

Bare clausal complements to perception verbs do not usually host auxiliary verbs. 26. An alternative account that also links the ability of be to raise to Infl with its ability to introduce an elliptical phrase turns on the structure that Verb Raising produces. Once be has raised, it is followed by a VP headed by its own trace, as in (i): (i) [Mary [I bei ] [VP ti [AP happy]]].

Now VP Ellipsis could simply delete this VP. This account, though adequate for simple clauses, does not extend to examples involving modal verbs, as in (ii): (ii) [Mary [I should] [VP be [AP happy]]].

In (ii) be does not raise into Infl because should is base-generated there (see Emonds (1976) and Jackendoff (1977)), but ellipsis of the material following be is still possible. The text account is therefore to be preferred.

27. If the passive participle results from raising a verb into an Infl with -en, then it is mysterious why VP Ellipsis cannot affect the VP vacated by the moved verb, as in (i): (i) *Mary was believed happy, and Gary was believed, too.

This problem does not arise if VP Ellipsis is not a deletion process but instead results from base-generating an empty VP (see Williams (1977)). On this view, the participle believed in the second conjunct of (i) would have to be base-generated in Infl, and this is impossible under our account of the passive. 28. Note that (85) does not involve an “adjectival” passive, since given cannot assign the goal θ-role externally: the given book, but *the given man. Hence, one cannot resort to claiming that “?” is an AP in (85).

References

Aissen, J. (1983) “Indirect Object Advancement in Tzotzil,” in D. Perlmutter, ed., Studies in Relational Grammar 1, University of Chicago Press, Chicago, Illinois.
Akmajian, A. and T. Wasow (1975) “The Constituent Structure of VP and AUX and the Position of the Verb BE,” Linguistic Analysis 1, 205–245.
Baker, M. (1985) Incorporation: A Theory of Grammatical Function Changing, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Baker, M. (1988) Incorporation: A Theory of Grammatical Function Changing, University of Chicago Press, Chicago, Illinois.
Belletti, A. (1982) “‘Morphological’ Passive and Pro-Drop: The Impersonal Construction in Italian,” Journal of Linguistic Research 2, 1–34.
Belletti, A. and L. Rizzi (1988) “Psych-verbs and θ-theory,” Natural Language and Linguistic Theory 6, 291–352.
Besten, H. den (1981) “Government, Syntaktische Struktur und Kasus,” in M. Kohrt and J. Lenerz, eds., Sprache, Formen und Strukturen, Max Niemeyer Verlag, Tübingen.
Besten, H. den (1985) “The Ergative Hypothesis and Free Word Order in Dutch and German,” in J. Toman, ed., Studies in German Grammar, Foris, Dordrecht.
Borer, H. (1984) Parametric Syntax, Foris, Dordrecht.
Burzio, L. (1986) Italian Syntax, Reidel, Dordrecht.
Chomsky, N. (1981) Lectures on Government and Binding, Foris, Dordrecht.
Chomsky, N. (1986) Barriers, MIT Press, Cambridge, Massachusetts.
Chomsky, N. and H. Lasnik (1977) “Filters and Control,” Linguistic Inquiry 8, 425–504.
Couquaux, D. (1981) “French Predication and Linguistic Theory,” in R. May and J. Koster, eds., Levels of Syntactic Representation, Foris, Dordrecht.
Culicover, P. (1982) Syntax, 2nd ed., Academic Press, New York.
Emonds, J. (1976) A Transformational Approach to English Syntax, Academic Press, New York.
Gibson, J. (1980) Clause Union in Chamorro and in Universal Grammar, Doctoral dissertation, University of California, San Diego.
Haïk, I. (1985) The Syntax of Operators, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Iwakura, K. (1977) “The Auxiliary System in English,” Linguistic Analysis 3, 101–136.
Jackendoff, R. (1972) Semantic Interpretation in Generative Grammar, MIT Press, Cambridge, Massachusetts.
Jackendoff, R. (1977) X′-Syntax: A Study of Phrase Structure, MIT Press, Cambridge, Massachusetts.
Jaeggli, O. (1982) Topics in Romance Syntax, Foris, Dordrecht.
Jaeggli, O. (1986) “Passive,” Linguistic Inquiry 17, 587–622.
Kayne, R. (1972) “Subject Inversion in French Interrogatives,” in J. Casagrande and B. Saciuk, eds., Generative Studies in Romance Languages, Newbury House, Rowley, Massachusetts.
Kayne, R. (1984) Connectedness and Binary Branching, Foris, Dordrecht.
Kayne, R. (1989) “Facets of Romance Past Participle Agreement,” in P. Benincà, ed., Dialect Variation and the Theory of Grammar, Foris, Dordrecht, 85–103.
Keenan, E. and A. Timberlake (1985) “Predicate Formation Rules in Universal Grammar,” in J. Goldberg et al., eds., Proceedings of the West Coast Conference on Formal Linguistics 4, Stanford University, Palo Alto, California.
Lasnik, H. (1981) “Restricting the Theory of Transformations: A Case Study,” in N. Hornstein and D. Lightfoot, eds., Explanations in Linguistics: The Logical Problem of Language Acquisition, Longmans, London.
Lasnik, H. (1985) “Illicit NP Movement: Locality Conditions on Chains?” Linguistic Inquiry 16, 481–490.
Lebeaux, D. (1983) “A Distributional Difference between Reciprocals and Reflexives,” Linguistic Inquiry 14, 723–729.
Lees, R. B. and E. S. Klima (1963) “Rules for English Pronominalization,” Language 39, 17–28.
Lobeck, A. (1986) Syntactic Constraints on VP Ellipsis, Doctoral dissertation, University of Washington, Seattle.
Manzini, M. R. (1980) “On Control,” ms., MIT, Cambridge, Massachusetts.
Manzini, M. R. (1983) Restructuring and Reanalysis, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Marantz, A. (1984) On the Nature of Grammatical Relations, MIT Press, Cambridge, Massachusetts.
Massam, D. (1985) Case Theory and the Projection Principle, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Merlan, F. (1976) “Noun Incorporation and Discourse Reference in Modern Nahuatl,” IJAL 42, 177–191.
Nerbonne, J. A. (1982) “Some Passives Not Characterized by Universal Rules: Subjectless Impersonal,” in B. Joseph, ed., Grammatical Relations and Relational Grammar, Working Papers in Linguistics No. 26, Ohio State University, Columbus, Ohio.
Ostler, N. (1979) Case-Linking: A Theory of Case and Verb Diathesis Applied to Classical Sanskrit, Doctoral dissertation, MIT, Cambridge, Massachusetts. [Distributed by the Indiana University Linguistics Club, Bloomington.]
Özkaragöz, I. (1980) “Evidence in Turkish for the Unaccusative Hypothesis,” in Proceedings of the Sixth Annual Meeting of the Berkeley Linguistics Society, University of California, Berkeley.
Özkaragöz, I. (1982) “Monoclausal Double Passives in Turkish,” paper presented at the Conference on Turkish Language and Linguistics in Ataturk’s Turkey, University of California, Berkeley.
Perlmutter, D. (1978) “Impersonal Passives and the Unaccusative Hypothesis,” in J. Jaeger et al., eds., Proceedings of the Fourth Annual Meeting of the Berkeley Linguistics Society, University of California, Berkeley.
Perlmutter, D. and P. Postal (1984) “The 1-Advancement Exclusiveness Law,” in D. Perlmutter and C. Rosen, eds., Studies in Relational Grammar 2, University of Chicago Press, Chicago, Illinois.
Pesetsky, D. (1987) “Binding Problems with Experiencer Verbs,” Linguistic Inquiry 18, 126–140.
Postal, P. (1971) Cross-Over Phenomena, Holt, Rinehart and Winston, New York.
Postal, P. (1986) Studies of Passive Clauses, State University of New York Press, Albany, New York.
Pullum, G. K. and D. Wilson (1977) “Autonomous Syntax and the Analysis of Auxiliaries,” Language 53, 741–788.
Rizzi, L. (1986) “On Chain Formation,” in H. Borer, ed., Syntax and Semantics 19: The Syntax of Pronominal Clitics, Academic Press, New York.
Roberts, I. (1985) The Representation of Implicit and Dethematized Subjects, Doctoral dissertation, University of Southern California, Los Angeles.
Roeper, T. (1983) “Implicit Arguments and the Head-Complement Relation,” Linguistic Inquiry 18, 267–310.
Ross, J. R. (1969) “Auxiliaries as Main Verbs,” in W. Todd, ed., Studies in Philosophical Linguistics, Series One, Great Expectations Press, Evanston, Illinois.
Sadock, J. (1980) “Noun Incorporation in Greenlandic,” Language 56, 300–319.
Sadock, J. (1985) “Autolexical Syntax: A Theory of Noun Incorporation and Similar Phenomena,” Natural Language and Linguistic Theory 3, 379–440.
Schachter, P. (1983) “Explaining Auxiliary Order,” in F. Heny and B. Richards, eds., Linguistic Categories: Auxiliaries and Related Puzzles, vol. 2, Reidel, Dordrecht.
Seiter, W. (1980) Studies in Niuean Syntax, Garland, New York.
Sobin, N. (1985) “Case Assignment in Ukrainian Morphological Passive Constructions,” Linguistic Inquiry 16, 649–662.
Stowell, T. (1981) Origins of Phrase Structure, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Timberlake, A. (1976) “Subject Properties in the North Russian Passive,” in C. Li, ed., Subject and Topic, Academic Press, New York.
Timberlake, A. (1982) “The Impersonal Passive in Lithuanian,” in Proceedings of the Eighth Annual Meeting of the Berkeley Linguistics Society, University of California, Berkeley.
Travis, L. (1984) Parameters and Effects of Word Order Variation, Doctoral dissertation, MIT, Cambridge, Massachusetts.
Webelhuth, G. (1986) “Some Data on the Verb-Object Relation in German,” Linguistic Inquiry 17, 772–776.
Williams, E. (1977) “Discourse and Logical Form,” Linguistic Inquiry 8, 103–139.
Williams, E. (1981) “Argument Structure and Morphology,” The Linguistic Review 1, 81–114.


Complex Inversion in French

Luigi Rizzi and Ian Roberts

1. Introduction

In this paper1 we would like to show that some recent theoretical innovations permit a principled account of complex inversion, a French construction which has been on the agenda of theoretical and Romance syntacticians ever since Kayne’s (1972) seminal analysis. Some properties of the construction will lead us to revise and tighten current assumptions on Case, visibility and head-to-head movement, and to propose a new hypothesis on the nature of the root/non-root distinction. The major cases of complex inversion are found in root interrogative sentences:

(1) Quel livre Jean a-t-il lu?
    Which book John has he read?
(2) Personne n’est-il venu?
    No-one isn’t he come?
    ‘Didn’t anyone come?’

A striking property of the construction is that there are apparently two subjects: a full NP, which occurs to the left of the inflected verb (after a wh-word, or initially in yes/no questions), and a pronoun to the right of the inflected verb. That the NP is not dislocated is shown by the fact that it follows the Spec of Comp in (1), and by the well-formedness of an example like (2) involving the quantified NP personne ‘no-one’, which is in general unable to appear in a dislocated position (see n. 2). The simultaneous presence of a lexical and a pronominal subject here gives the appearance of clitic doubling, either of the kind found with objects in various dialects of Spanish, as illustrated from the River Plate dialect in (3a), or of the kind found with subjects in northern Italian dialects, illustrated from Fiorentino in (3b):

(3) a. Lo ví a Juan.
       Him I-saw to John.
       ‘I saw John.’

    b. La Maria la parla sempre.
       The Mary she talks always.
       ‘Mary is always talking.’

However, despite apparent similarities, at least two fundamental properties distinguish the French case from those in (3). First, the French construction is highly selective: it is restricted to direct questions and other environments featuring fronting of the inflected verb. No such construction-specific restriction is found with ordinary cases of clitic doubling. Second, the pronominal elements in (3) have clear properties of syntactic clitics: they occur attached to the verb or under Infl, and do not occupy an NP position in the syntax. French unstressed subject pronouns, by contrast, appear to be in NP position in the syntax and to be cliticized to the inflected verb in the phonology (for relevant evidence see Couquaux 1986; Kayne 1983; Rizzi 1986). The contrast with northern Italian dialects is revealing: while subject clitics and full subject NPs can, and in some cases must, co-occur in many dialects, the two elements are in full complementary distribution in standard French.2 If French subject pronouns occupy an NP position at syntactic levels of representation, then the kind of doubling shown in (1) and (2) must involve two NP positions, not just one as in (3). Such a state of affairs thus raises different and more acute theoretical problems than the familiar cases of clitic doubling.

The basic goal of this paper is to show that the fundamental properties of complex inversion can be properly understood if we combine elements of the thorough analysis proposed in Kayne (1983) with certain more recent proposals:

a. Chomsky’s (1986b) extension of X-bar Theory to non-lexical categories;
b. an adaptation of Baker’s (1988) approach to visibility and head-to-head movement;
c. the idea, independently arrived at by a number of researchers, that subjects are base-generated in VP and raise to their surface subject position in IP.

Our adaptation of Baker’s theory of head-to-head movement, in conjunction with a strict interpretation of the Projection Principle, also yields a principled account of the fact that complex inversion is limited to root structures (cf. den Besten 1983; Safir 1981/82; Safir and Pesetsky 1981), an account which can be extended to root phenomena in general (Emonds 1976).

In section 2 we outline an analysis of subject-clitic inversion, a necessary prerequisite. In section 3, we address the problem posed by the presence of two subjects, which we factor into three distinct problems:

a. how can the Case requirements of the two nominals be simultaneously fulfilled? (the Case problem);
b. which positions do the two subjects come from in the derivation? (the source problem);
c. which positions do the two subjects occupy at S-structure? (the landing-site problem).

In section 4, we turn to the question of the restriction of complex inversion to root contexts and we develop a general approach to the root/non-root distinction.

2. Subject-Clitic Inversion

First of all, it is necessary to sketch an analysis of one component of complex inversion which exists as an independent construction: subject-clitic inversion. This construction involves the inversion of a pronominal subject with the inflected verb, shown in (4):

(4) a. Est-il parti?
       ‘Has he left?’
    b. Où est-il allé?
       ‘Where has he gone?’

Following den Besten (1983) and Kayne (1983), we assume that this inversion process involves leftward movement of the verb over the subject rather than rightward movement of the subject over the verb. If we adopt the extension of X-bar Theory to non-lexical categories proposed in Chomsky (1986b), and the theory of head-to-head movement of Baker (1988), this process can be seen as raising of the inflected verb from I0 to C0, as shown in (5) (cf. Rizzi 1987):3

(5) [Tree diagram not reproduced: the inflected verb raises from I0 to C0, with XP in Spec-CP.]
This approach immediately explains why inversion is impossible if C0 is filled. For instance, in the Quebec dialect of French, where an overt C0 can co-occur with a wh-element in Spec-CP, inversion is restricted to the case in which this option is not taken (Goldsmith 1981):

(6) a. Qui que tu as vu?
       Who that you have seen?
    b. Qui as-tu vu?
       Who have you seen?
    c. *Qui qu’as-tu vu?
       Who that have you seen?

In (6c) C0 is filled by que and hence is not available as a landing site for movement of the inflected verb. Standard French does not allow the co-occurrence of a wh-element and que, but a reflex of the same phenomenon can be seen with a certain class of adverbs. These adverbs can either trigger inversion or co-occur with a that-clause, but the two options are mutually exclusive:

(7) a. Peut-être qu’il a fait cela.
       Perhaps that he has done that.
    b. Peut-être a-t-il fait cela.
       Perhaps has he done that.
    c. *Peut-être qu’a-t-il fait cela.
       Perhaps that has he done that.

The natural account of (7) is that this class of adverbs (which includes peut-être, à peine ‘hardly,’ and a few others) can appear in Spec-CP. This brings the paradigm in (7) into line with that in (6). Third, again in standard French, a conditional clause can be introduced either by the overt complementizer si ‘if’ or by inversion of a verb in the conditional mood, but not by both:

(8) a. Si tu avais fait cela . . .
       If you had done that . . .
    b. Aurais-tu fait cela . . .
       Had you done that . . .
    c. *Si aurais/avais-tu fait cela . . .
       If had you done that . . .

Si and the inflected verb thus appear to compete for the same position, namely C0. The analysis of subject-clitic inversion as involving I0-to-C0 movement follows and updates the basic idea proposed by den Besten, in that it treats inversion in French as essentially the same phenomenon as the more pervasive kinds of inversion found in Germanic languages. There is nevertheless a striking difference between the French case and the Germanic case (illustrated below by subject-aux inversion in English): in French, unlike in Germanic, the process is restricted to pronominal subjects:

(9) a. Has John spoken?
    b. *A Jean parlé?

(10) a. Has he spoken?
     b. A-t-il parlé?

Developing a suggestion by Szabolcsi (1983), we will propose that the impossibility of (9b) should be accounted for in terms of Case Theory. The idea is that raising of I0 to C0 destroys the context in which I0 assigns Case to the subject in French, but not in English or in other Germanic languages. A straightforward implementation of this proposal makes use of the idea of directionality of Case assignment: suppose that in French Nominative Case can only be assigned leftward, while in English and in other Germanic languages either direction of assignment is possible. In that case, a phonologically realized NP in Spec-IP will violate the Case Filter in the context created by I0-to-C0 movement in French, since it now stands to the right of the Case assigner. This is precisely the context of (9). So Jean violates the Case Filter in (9b). In English, there is no Case Filter violation here because Nominative can be assigned either leftward or rightward.4

As it stands, this proposal is too strong, as it rules out the well-formed example (10b). In order to account for (10b) we need to elaborate on what the Case Filter really requires. Following the general proposals of Baker (1988), we assume that the requirement that NPs be Case-marked is actually an instance of a more general requirement that nominals be associated with a Case feature. This association takes place in one of two ways: either by assignment of the feature from a head to the nominal, or by incorporation of the nominal into the head bearing the Case feature (for a precise formulation of this requirement, see Baker, Johnson and Roberts (1989: 239) [this volume, Chapter 8]):

(11) a. Assignment: [diagram not reproduced]
     b. Incorporation: [diagram not reproduced]

One variety of incorporation is cliticization. Following Kayne (1983), we assume that the pronoun in subject position can cliticize to the inflected verb in the syntax once the latter has been moved to C0.5 So (10b) has the representation shown in (12), given here as a labelled bracketing of the original tree:

(12) [CP [C0 ai(-t-)ilj] [IP [NP tj] [I′ I0 . . . ]]]

Here the clitic escapes the effects of the strict directionality condition on Nominative assignment in French, since it is associated with a Case feature (the Nominative feature borne by I0 in C0) by incorporation with C0; the fact that Case assignment to Spec-IP is blocked is therefore irrelevant. To sum up, we treat subject-clitic inversion as raising of the inflected verb to C0 followed by incorporation of the subject pronoun with the inflected verb in C0. Incorporation of the pronoun is one way of associating it with a Case feature. Due to the directionality condition on Nominative assignment in French (or, alternatively, the language-specific mode of Case assignment discussed in n. 4), this is the only way that a subject can satisfy the requirements of Case Theory when I0-to-C0 movement takes place. The fact that I0-to-C0 movement can only occur with pronominal subjects is thus reduced to the fact that pronominals are the only elements that undergo incorporation in French (in fact, incorporation from subject position appears to be restricted to pronominals universally; see Baker and Hale 1988). With this background, we can go back to the issues raised by complex inversion.
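The two conditions discussed in this section (I0-to-C0 movement needs an empty C0; a subject must be associated with a Case feature either by directional assignment or by incorporation) can be made concrete in a small Python sketch. It is an illustration only, not part of Rizzi and Roberts’ formalism: the parameter tables NOMINATIVE_DIRECTIONS and PRONOUN_INCORPORATES, the Clause record and the function well_formed are hypothetical names, each clause is reduced to a handful of booleans, and the sketch encodes the directionality version of the Case proposal rather than the alternative mentioned in n. 4.

```python
from dataclasses import dataclass

# Hypothetical encoding of the directionality parameter assumed in the text:
# the directions in which I0 can assign Nominative Case to a governed NP.
NOMINATIVE_DIRECTIONS = {"French": {"left"}, "English": {"left", "right"}}

# Can a subject pronoun incorporate (cliticize) into the head bearing the
# Case feature? Per the text: yes in French, never in English.
PRONOUN_INCORPORATES = {"French": True, "English": False}


@dataclass
class Clause:
    language: str
    c0_filled: bool           # overt complementizer (que, si) already in C0
    i_to_c: bool              # has the inflected verb raised from I0 to C0?
    subject_is_pronoun: bool  # is the Spec-IP subject a weak pronoun?


def subject_has_case(clause: Clause) -> bool:
    """Is the subject in Spec-IP associated with a Case feature?"""
    directions = NOMINATIVE_DIRECTIONS[clause.language]
    if not clause.i_to_c:
        # I0 in situ: the subject in Spec-IP precedes it (leftward assignment).
        return "left" in directions
    # I0 now sits in C0, so the subject in Spec-IP follows it.
    if "right" in directions:
        return True  # ordinary rightward assignment, as in English
    # Otherwise the only route is incorporation into I0-in-C0.
    return clause.subject_is_pronoun and PRONOUN_INCORPORATES[clause.language]


def well_formed(clause: Clause) -> bool:
    # I0-to-C0 movement needs an empty C0 to land in.
    if clause.i_to_c and clause.c0_filled:
        return False
    return subject_has_case(clause)


examples = {
    "Has John spoken?": Clause("English", c0_filled=False, i_to_c=True, subject_is_pronoun=False),
    "*A Jean parlé?": Clause("French", c0_filled=False, i_to_c=True, subject_is_pronoun=False),
    "A-t-il parlé?": Clause("French", c0_filled=False, i_to_c=True, subject_is_pronoun=True),
    "Qui que tu as vu?": Clause("French", c0_filled=True, i_to_c=False, subject_is_pronoun=True),
    "*Qui qu'as-tu vu?": Clause("French", c0_filled=True, i_to_c=True, subject_is_pronoun=True),
}

for sentence, clause in examples.items():
    print(f"{sentence:<20} predicted well-formed: {well_formed(clause)}")
```

Under these assumptions the sketch accepts Has John spoken?, A-t-il parlé? and the Quebec Qui que tu as vu?, and rejects *A Jean parlé? (Case Filter) and *Qui qu’as-tu vu? (C0 already filled). Keeping the two parameters in separate tables mirrors the claim that the French/English contrast follows from directionality plus the availability of subject-pronoun incorporation.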

3. The Problem of Two Subjects

The existence of two apparent subjects in complex inversion constructions poses three problems. The first of these we call the “Case problem”: how are the two subjects assigned Case? The second problem is the “source problem”: where do these two subjects originate? The third problem is the “landing-site problem”: where do these subjects, in particular the NP, appear at S-structure? In this section, we will answer each of these questions in turn, thereby arriving at an analysis of complex inversion.

3.1 The Case Problem

It is implicit in most versions of Case Theory and explicit in some (e.g., Vergnaud 1985) that there is a biunique relation between Case assigners and Case assignees. If this is so, the Case problem can be put as follows: how do both the full NP and the clitic satisfy the requirements of Case Theory in complex inversion? We will show that the analysis of subject-clitic inversion given in the previous section provides an automatic solution to this problem. Before presenting our analysis, we must make a preliminary assumption concerning the position of the full subject NP, a matter we will elaborate on below. For the moment, we simply recast Kayne’s (1983) proposal in terms of the assumptions about X-bar Theory of Chomsky (1986b). As the NP apparently occupies a position immediately to the right of Spec-CP and immediately to the left of C0, we take it that this NP is left-adjoined to C′. The complete structure is thus the following:

(13) [CP wh [C′ NP [C′ [C0 I0-Cl] IP]]]

In this structure the NP is governed by I0 and is to the left of it. Therefore it is assigned Nominative Case from right to left, in the usual way operative in simple declarative clauses (and, presumably, the two elements are in a configuration sufficiently close to Spec-head agreement, if the proposal in n. 4 is to be adopted). As for the clitic, we have seen that it cannot be assigned Case in the usual way because it is “on the wrong side” of I0, and it need not be assigned Case because it is associated with a Case feature by incorporation. The Case requirements of the two nominals are thus satisfied independently of each other. This account is compatible with the idea that there is a biunique relation between Case features and nominals, provided that biuniqueness is relativized to modes of association: assignment of Case to a nominal is subject to biuniqueness, and so, separately, is association of a nominal with a Case feature by incorporation. The two modes of association can, however, independently associate a single Case feature with two nominals.6

This account allows us to see why complex inversion is impossible in English:

(14) *Which books John has he read?

Here I0 in C0 could assign Nominative Case either leftward or rightward, but not to both nominals at the same time. Since English subject pronouns never undergo incorporation, he cannot incorporate into C0, so this means of satisfying the requirements of Case Theory is unavailable. Hence there is no way that the requirements of Case Theory can be satisfied in (14). This analysis retains the idea of Kayne (1972) that the possibility of complex inversion in French is a consequence of the existence of subject clitics in this language.
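The relativized biuniqueness condition can likewise be sketched, again as a hypothetical illustration rather than the authors’ own formalization. In the Python below, NominativeFeature and complex_inversion_ok are invented names; the feature records separately which nominal it has been assigned to and which nominal has incorporated into it, so it can serve two nominals only if each uses a different mode of association. As in (13), the full NP is assumed to stand to the left of I0-in-C0 and the pronoun to its right.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class NominativeFeature:
    """Hypothetical model of the Nominative feature on I0 once it is in C0.

    Biuniqueness is relativized to the mode of association: the feature may be
    assigned to at most one nominal and, separately, incorporate at most one.
    """
    directions: set[str]
    assigned_to: Optional[str] = None
    incorporated: Optional[str] = None

    def assign(self, nominal: str, side: str) -> bool:
        # `side` is where the nominal stands relative to I0-in-C0.
        if self.assigned_to is not None or side not in self.directions:
            return False
        self.assigned_to = nominal
        return True

    def incorporate(self, nominal: str, can_incorporate: bool) -> bool:
        if self.incorporated is not None or not can_incorporate:
            return False
        self.incorporated = nominal
        return True


def complex_inversion_ok(directions: set[str], pronoun_incorporates: bool) -> bool:
    """Can both subjects of 'NP [C0 I0(-Cl)] [IP pronoun ...]' satisfy Case Theory?

    The full NP is left-adjoined to C' (left of I0-in-C0); the pronoun is in
    Spec-IP (right of I0-in-C0).
    """
    nom = NominativeFeature(directions)
    np_ok = nom.assign("NP", side="left")
    # Try incorporation first; fall back to rightward assignment.
    pronoun_ok = nom.incorporate("pronoun", pronoun_incorporates) or nom.assign(
        "pronoun", side="right"
    )
    return np_ok and pronoun_ok


print(complex_inversion_ok({"left"}, pronoun_incorporates=True))           # French, as in (1)
print(complex_inversion_ok({"left", "right"}, pronoun_incorporates=False))  # English, as in (14)
```

Run as is, the two calls print True and then False. In the English case the leftward and rightward assignments compete for the single assignment slot, and incorporation is unavailable to he, which is the reasoning given for (14).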
3.2 The Source Problem

The Case problem is just one of the issues raised by the presence of two subjects. Another question which must be answered is: where do the two subjects come from, i.e., which positions are they base-generated in? We begin by giving a brief summary of Kayne’s (1983) answer to this question. In Kayne’s terms, the derivation of a complex inversion structure is as follows (we alter the category labels so as to accord with Chomsky 1986b):

(15) a. [CP [IP Jean a mangé]]
     b. [CP a [IP Jean t mangé]]
     c. [CP Jean a [IP t t mangé]]
     d. [CP Jean a [IP il t mangé]]
     e. [CP Jean a-t-il [IP t t mangé]]

The first step is movement of the inflected verb to Comp, deriving (15b) from (15a). Next, the subject left-adjoins to some projection of Comp, giving (15c). Example (15d) is derived by the insertion of an expletive pronoun in subject position. Finally, this pronoun cliticizes leftward onto the inflected verb in Comp. This derivation involves two problematic steps. First, strict cyclicity is violated in (15d) and (15e), in that the operations which derive these structures take place in a subdomain of the domain of the operations deriving (15c). Such a violation is suspect, even if the Strict Cycle Condition does not itself turn out to be a primitive condition of the theory (see Freidin 1978); why should it hold as a theorem in general but not in this case? Second, a widely accepted, if not explicitly formulated, assumption concerning lexical insertion is that all phonetically realized material is present at D-Structure (see Burzio 1986). This means that derivational operations can only create traces or fill empty positions by means of movement (they may also possibly delete material). This plausible constraint is violated by the insertion of il in (15d). It is fairly clear that both of these problems stem from the same cause: Kayne assumes that there is only one subject position in basic clause structure, an assumption that was uncontroversial at the time. This is why the same position must be the source of the two subjects, and thus why il must be inserted after the subject position has been vacated, leading to a violation both of strict cyclicity and of the condition on lexical insertion mentioned above. Recent proposals by a number of authors (Kitagawa 1986; Koopman and Sportiche 1985, 1991; Kuroda 1986; Manzini 1986; Sportiche 1988a; Zagona 1982; and others) regarding the base position of subjects allow us to solve the source problem straightforwardly.
We adopt a variant of these proposals according to which subjects are base-generated in the Specifier of VP and raise in the course of the derivation to the Specifier of IP. This amounts, in effect, to treating I as a raising trigger. The proposal is illustrated for a simple English sentence in (16):

(16) a. DS: [IP [I′ I0 [VP John [V′ loves Mary ]]]]
     b. SS: [IP Johni [I′ I0 [VP ti [V′ loves Mary ]]]]

As in normal cases of raising, the subject moves to Spec-IP in order to satisfy the requirements of Case Theory at S-Structure. The relevance of this proposal for us is that it makes two subject positions available. We thus propose that the two subjects of the complex inversion construction each occupy one of the two subject positions at D-Structure: the pronoun, which following Kayne we assume to be an expletive,7 occupies Spec-IP, and the full NP occupies Spec-VP. The following is the D-Structure representation of an example like (15e):

(17) [IP ili [I′ I0 [VP Jeani [V′ a mangé]]]]

Here the subject argument, Jean, occupies a theta-position, and the expletive pronoun is in a non-theta-position. The Theta Criterion is thus met at D-Structure. In French, the leftmost verbal element must raise to a tensed inflection (cf. Emonds 1978; Pollock 1989), so the following configuration is derived:

(18) [IP il [I′ a [VP Jean [V′ t mangé]]]]

If no further movement takes place, the structure will be ruled out by Case Theory, since, given our assumptions, Jean will be unable to receive Case here. In fact, whenever there is no interrogative or adverbial element present to activate the CP-level, this kind of configuration is ruled out by Case Theory. If the CP-level is activated by the presence of some appropriate element, I0-to-C0 movement can legitimately apply, yielding the following configuration:

(19) [CP a [IP il [I′ t [VP Jean [V′ t mangé]]]]]

The pronoun is now able to incorporate with the auxiliary, since the auxiliary c-commands it. Moreover, our assumptions about Case Theory, spelled out in the previous section, mean that the inflected verb still has the capacity to assign a Nominative Case feature leftward to an NP which it governs. The NP can then move directly from Spec-VP to a position to the left of the auxiliary, where it will be assigned Nominative Case. These operations yield a well-formed complex inversion structure, illustrated in (20):

(20) Jeani [C aj-t-ili [IP ti [I′ tj [VP t′i [V′ tj mangé]]]]]

The structure can only arise where I0 moves to C0, because the environment in which the two subjects are both able to satisfy the requirements of Case Theory depends on the presence of the inflected verb in C0.8 A striking fact about the above derivation is that Jean raises from Spec-VP position to the pre-C0 position, skipping Spec-IP. In this representation, the Caseless trace left in the Spec-VP position, t′i, is not a variable.
Moreover, since it is non-pronominal, we must take it to be an anaphor, analogous to an NP-trace. Thus (20) is analogous in relevant respects to cases of super-raising that have been discussed in the literature (cf. Lasnik 1985; Chomsky 1986b; Baker 1988). In general, super-raising leads to severely ungrammatical sentences, of the type in (21): (21) *Johni seems that Bill likes ti Why is it that the application of NP-movement skipping Spec-IP does not lead to ungrammaticality in (20)? There are two issues to be addressed here. The first concerns the Binding Theory, and the second the intersection of the ECP and Theta Theory.

Complex Inversion in French  305

Taking the binding-theoretic question first, the problem is that NP-traces are subject to Principle A of the Binding Theory. This principle requires that anaphors be bound in their binding domain. In (20), the binding domain for t′i is the minimal category containing a governor for t′i and a subject, i.e., IP. Therefore Jean has moved to a position outside the binding domain of its trace in (20). However, the representation in (20) is saved from a Principle A violation by the fact that Jean and il can (and must) have the same index. This ensures that t′i satisfies Principle A, as it is bound by an element which is in its binding domain, namely the trace of il, which occupies Spec-IP. Thus the derivation of (20) violates Principle A of the Binding Theory, but the representation does not. Since, under current assumptions, the binding conditions are checked on representations and not on derivations, (20) does not lead to a violation. It is well known, however, that the Binding Theory is too weak to deal with the whole class of super-raising structures. In particular, what we have just said will not distinguish (20) from examples such as (22): (22) *Johni seems that hei likes ti This sentence is very bad, despite the fact that the trace has an antecedent in its binding domain, the coindexed subject he. This leads us to the second issue mentioned above. Under current approaches, (22) is ruled out either as a violation of the ECP (Chomsky 1986b), or as a violation of Theta Theory (Rizzi 1990). Both accounts have in common that a crucial antecedent-government relation fails to obtain. We will develop here the theta-theoretic approach. In general, arguments in non-theta-positions must be connected to their theta-positions through chain formation. The basic condition on chain formation is that each element in a chain antecedent-governs the next (see Chomsky 1986b).
Moreover, well-formed theta-chains must preserve the bi-uniqueness condition imposed by the Theta Criterion, in that they contain exactly one argument and are assigned exactly one theta-role. Structures such as (22) violate this condition in that the only chains that would satisfy the Theta Criterion violate the antecedent-government requirement. In particular, no chain can unite the NP-trace and John. So (22) is ruled out ultimately by Theta Theory. If (22) is ruled out in this way, why is (20) grammatical? Being base-generated in a non-theta-position, il is an expletive in (20), so that the chain (Jean, il, t, t′) contains exactly one argument. Moreover, this chain is well formed with respect to the antecedent-government requirement, since each member antecedent-governs the next. Hence the Theta Criterion is satisfied here.9 To summarize, we propose that the D-Structure for complex inversion is as in (17) and the S-Structure as in (20). The derivation involves several types of movement: head-to-head movement of a, cliticization of il to a and NP-movement of Jean. All of these movements take place so that the two subjects are able to satisfy the requirements of Case Theory, outlined in the previous section. Movement of the inflected verb to C0 is a necessary precondition for the satisfaction of these requirements, so that this approach explains why complex inversion can only occur in interrogatives or other constructions activating the CP-level. Raising the NP subject from Spec-VP to a position in C does not violate either the Binding Theory or other conditions on chains, despite being derivationally close to super-raising, because, unlike other cases of super-raising, the NP moved across the subject lands within the same clause and the antecedent-government requirement on each link of the chain can be met.

3.3  The Landing-Site Problem

Two questions fall under the landing-site problem: (i) what is the structure of the sequence WH NP V-Cl? and (ii) how is the unique well-formed order to be guaranteed? Above we proposed that the natural updating of Kayne’s analysis would posit that the full NP subject occupies a position left-adjoined to C′. On this proposal, there is only one CP, whose Specifier is occupied by the wh-phrase and whose head is occupied by V-Cl, and the subject NP is left-adjoined to C′, as in (23): (23) [CP wh [C′ NP [C′[C V-Cl] IP]]] This analysis violates a putative constraint on adjunction, i.e., Chomsky’s (1986b) proposal that maximal projections can only be adjoined to other maximal projections. If the proposal in (23) is correct, Chomsky’s constraint should be weakened so as to allow adjunction of non-heads to non-heads. This would maintain the important restriction that non-heads cannot be adjoined to heads, and heads cannot be adjoined to non-heads. It is nevertheless worthwhile to explore some alternatives, although we shall tentatively conclude that the structure in (23) is to be kept.
One alternative is immediately suggested by the guiding intuition behind the proposals made in the previous section for the underlying structure of complex inversion, i.e., that the construction involves two subjects. Pushing this intuition further, we would be led to the conclusion that the NP literally is in a subject position at S-Structure, as well as at D-Structure. This implies that basic clause structure makes available three subject positions, not just two, as we have been assuming up to now: the source position of the NP, the source position of the pronoun, and the landing-site position of the NP. In fact, Pollock (1989) proposes just such a structure for clauses. He argues that, instead of considering there to be a single node Infl containing two kinds of features, Tense and Agr, these two elements should be treated as heading their own maximal projections. This proposal, motivated by facts from Verb Raising in French, leads to a considerably more articulated structure for the clause, namely that illustrated in (24):10

(24) [AgrP Spec-Agr [Agr′ Agr [TP Spec-T [T′ T [VP Spec-V [V′ V … ]]]]]]

This structure in principle makes available three subject positions, all of which we could exploit in the following representation for complex inversion:

(25) [AgrP Jeanj [Agr′ [Agr ak-t-ilj ] [TP tj [T′ [T tk ] [VP Spec-V … ]]]]]

In the D-Structure representation Jean occupies Spec-VP and il Spec-TP. The auxiliary raises to Agr0; the pronoun incorporates into Agr0; and Jean moves to Spec-AgrP. The main point in favour of this structure is that it provides a clear and simple solution to the landing-site problem by making available a sufficient number of structural positions. However, adopting this structure poses several problems in other areas. The basic problem is that the CP-level plays no role in (25). This means, on the one hand, that there is no obvious way to state the fact that complex inversion is characteristic of interrogatives. Nothing prevents the generation of sentences exactly like (25) as declaratives. Although a sentence such as Jean a-t-il mangé is grammatical in French, it must be understood as a question. This is clearly a fact that our analysis must capture, but which the proposal in (25) does not naturally deal with. Moreover, the fact that the CP-level plays no role in (25) means that it is hard to see how this approach can provide an account of the root nature of complex inversion (see section 4 on this). More seriously, we would be left without an account of the fact that complex inversion is incompatible with the presence of an overt C0, as in: (26) a. Peut-être Jean est-il parti. Perhaps John has he left. b. Peut-être que Jean est parti. Perhaps that John has left. c. *Peut-être que Jean est-il parti. Perhaps that John has he left. The same observation holds for the Québécois phenomenon mentioned above (examples from Safir 1981/82: 461–462; thanks to Maria Teresa Guasti for drawing our attention to this fact): (27) a. Quoi que Jean veut? What that John wants? b. *Quoi que Jean a-t-il voulu? What that John has he wanted? If complex inversion involves movement of the inflected verb to C0, these paradigms are immediately accounted for, on a par with the simple subject-clitic inversion cases discussed earlier (see examples (6) and (7)). But given a structural representation such as (25), the gaps in the paradigms remain mysterious. All these problems clearly stem from the fact that movement to C0 is not involved in this analysis. We therefore reject the proposal in (25). In particular, we will assume that V-Cl is not in Agr0 but in C0, as the evidence reviewed forcefully argues.11 A less radical alternative to C′-adjunction is CP-adjunction of the wh-phrase and structure-preserving movement of the subject NP to Spec-CP. This would give the structure in (28): (28) [CP wh [CP NP [C′ [C0 V-Cl] IP]]] The order wh NP V-Cl would then involve assuming wh-adjunction to CP rather than NP-adjunction to C′, an assumption that avoids the technical problem mentioned in connection with (23). However, the structure in (28) poses some problems of its own.
These arise in part because it implies that wh-movement in the syntax can have a landing site which is not the typical position of wh-operators, the Spec of Comp, and in part because it involves movement of a non-operator, the subject NP, to an operator position. The second problem raises the possibility of non-operator movement to Spec-CP in general, which would lead us to expect generalized Verb Second (V2), a phenomenon not found in (Modern) French. The first problem raises the question of what prevents iteration of the wh-adjunction, or the combination of wh-movement to Spec-CP and wh-adjunction to CP. This would give rise to clearly ungrammatical sentences such as the following: (29) *Où quels livres Jean a-t-il trouvés? Where which books John has he found? For these reasons, we maintain the analysis shown in (23), involving C′-adjunction of NP, and wh-movement to Spec-CP in wh-questions (or the presence of a null operator in this position in yes/no questions).12 Returning then to the structure in (23), it is important to see how a theory allowing C′-adjunction gives rise only to the order of elements found in complex inversion. Taking an example where a wh-phrase is present, there are four logical possibilities to be considered: (30) a. [CP wh [C′ NP b. [CP wh [C′ wh c. [CP NP [C′ NP d. [CP NP [C′ wh Clearly, all of these possibilities, except (30a), must be excluded. Example (30b) violates the constraint on the distribution of wh-elements in French which requires that they be either in operator position (i.e., Spec-CP) or in situ at S-Structure. Example (30c) is ruled out because a non-operator, NP, occupies an operator position, namely Spec-CP (in a non-V2 language). Finally, (30d) is ruled out for both of these reasons. We must also rule out the possibility of C′-adjunction of a non-subject in (30a), as well as multiple C′-adjunction. Following the Principle of Full Interpretation of Chomsky (1986a), we take it that an element occurring in a given position at LF must be licensed in that position by an interpretation.
As the C′-adjoined position is neither an operator position nor an argument position (nor a left-dislocation position, a position whose content is presumably licensed at LF by a rule of predication), an element occupying this position at LF can only be licensed by being in a well-formed theta-chain. The formation of a well-formed chain from this position is impossible for non-subject NPs, because the subject in Spec-IP blocks antecedent-government, and hence chain formation, for any position it c-commands.13,14 Thus the only way of licensing the C′-adjoined NP at LF is by linking it to a trace in subject (i.e., Spec-IP) position. Therefore the only possible candidate for C′-adjunction is the subject NP itself. The C′-adjunction option thus does not give rise to overgeneration. The above approach to the landing-site problem has the advantage that it allows us to deal with two other properties of complex inversion noted by Kayne. First, the construction does not allow questioning of the subject itself: (31) *Qui est-il parti? Who did he leave? Second, complex inversion is incompatible with stylistic inversion: (32) a. Où Jean est-il allé? Where John has he gone? b. Où est allé Jean? Where has gone John? c. *Où est-il allé Jean? Where is he gone John? According to the approach to the landing-site problem advocated above, the representations for the relevant parts of these examples would be as follows: (33) a. [CP Qui [C′ t [ est-il [ t′ [ t′′ parti ]]]]] b. [CP Où [C′ pro [ est-il [ t′ [ t′′ allé ] Jean ]]]] Following Kayne (1983), we can straightforwardly account for the ill-formedness of these two representations by exploiting the fact that the crucial empty category is in an adjoined, hence A′, position. Consider first (33a). Here t does not qualify as a variable because it is in an A′-position; t′ does not qualify either, since it has the status of an incorporation trace (a status that we assume to be incompatible with the status of syntactic variable); and t′′, the trace in the base position of the subject, cannot be a variable because it is in a Caseless position. Hence there is no syntactic variable that the operator can bind, and so the structure is ruled out by the general ban on vacuous quantification. Next, consider (33b). We assume that stylistic inversion involves a pro subject licensed by a C0 under certain conditions (as Pollock 1986 suggests for some cases). Recall that pro is really an abbreviation for the feature matrix [−anaphoric, +pronominal]. It is natural to assume that these features only classify empty categories in A-positions; in fact, the only distinction that is needed in A′-positions is that between intermediate traces and empty operators, a distinction that is not properly captured by the features [±anaphoric, ±pronominal]. Hence the empty category occupying the C′-adjoined position in (33b) cannot be pro.
If pro is a necessary component of stylistic inversion, (33b) will be ill-formed.15,16 Notice that the approach to the landing-site problem based on the representation in (25) is unable to account for the facts in (33) in an equally straightforward way, because in that approach the crucial empty category would be in an A-position non-distinct from an ordinary subject position.17

4.  Root Phenomena

A salient property of complex inversion is the fact that it is limited to root clauses, as the ungrammaticality of (34) shows: (34) *Je me demande qui Jean a-t-il vu. I wonder who John has he seen. In this section we will propose an account of this restriction, which we phrase in the context of a general approach to the nature of root phenomena. The root character of complex inversion is undoubtedly to be related to the root character of one component of the construction, namely subject-clitic inversion: (35) *Je me demande qui a-t-il vu. I wonder who has he seen. Both (34) and (35) appear to conform to a fundamental generalization concerning root phenomena: movement of the inflected verb to C0 is by and large restricted to main clauses. This rough generalization subsumes, in addition to the French constructions, subject-aux inversion in English and the main types of V2 in other Germanic languages (cf. den Besten 1983). The account we want to propose relies on the idea that the correct distinction is not main vs. embedded clause, but rather selected vs. non-selected clause (see Kayne 1982). A quick survey of the relevant cases supports this hypothesis. In the first instance, we should separate independent CPs from subject, complement and adjunct CPs; the former allow verb-movement to C0 while the latter do not. It is clearly true that independent CPs are not selected, and it follows from the Projection Principle, in conjunction with the Theta Criterion, that both complement and subject CPs must be selected. This leaves adjunct CPs. In typical adjuncts, for example the kind which can host a parasitic gap, CP is selected by a Preposition (in English, without, before, in order, etc.). Thus the whole adjunct is a PP containing a CP selected by the Preposition in such cases.
There is, however, one class of adjunct CPs which provides evidence that the correct generalization regarding the possibility of inversion concerns the selected/nonselected distinction rather than the main/embedded distinction, namely the class of conditional protases (see Kayne 1982). Conditionals are embedded adjuncts, and they are also not selected. As (36) shows, they optionally allow inversion: (36) a. Had I the time, I’d help you. b. Aurais-je le temps, je vous aiderais.

Putting these observations together with previous remarks on the incompatibility of inversion with a filled complementizer (cf. (8c) and English *If had I the time . . .), the following generalization emerges: (37) Inversion is possible only if

(i) CP is not selected, and (ii) C0 is not filled.
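Generalization (37) is a conjunction of two necessary conditions, and the examples cited in this section differ in which condition they violate. A minimal Python sketch, assuming a hypothetical two-feature encoding of clauses (`cp_selected`, `c0_filled`) that is not part of the original text, classifies three of them: the conditional protasis (36b) is unselected with an empty C0; (26c) is unselected but has C0 filled by que; the indirect question (34) is selected with an empty C0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical encoding of the two properties that (37) refers to."""
    example: str
    cp_selected: bool  # is the CP selected by a higher head?
    c0_filled: bool    # is C0 already occupied, e.g. by que, if or that?


def permitted_by_37(clause: Clause) -> bool:
    """(37) states necessary conditions only: True means that (37) does not
    exclude inversion, not that inversion is obligatory."""
    return not clause.cp_selected and not clause.c0_filled


CASES = [
    Clause("(36b) Aurais-je le temps, je vous aiderais.", cp_selected=False, c0_filled=False),
    Clause("(26c) *Peut-être que Jean est-il parti.", cp_selected=False, c0_filled=True),
    Clause("(34) *Je me demande qui Jean a-t-il vu.", cp_selected=True, c0_filled=False),
]

if __name__ == "__main__":
    for clause in CASES:
        verdict = "not excluded" if permitted_by_37(clause) else "excluded"
        print(f"{clause.example} -> {verdict}")
```

Because (37) is stated as "only if", the function is named for what it can actually decide: whether (37) rules a clause out, not whether inversion must apply.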

In most cases the two conditions overlap, for example in embedded that-clauses, but there are cases both of unselected clauses with a filled C0 that block inversion (cf. (6c-8c), (26c), (27b)), and of selected clauses where C0 is empty and inversion is blocked (e.g., (34) and (35)). We have already seen that condition (ii) of (37) follows directly from the idea that inversion involves movement of the inflected verb into C0: if C0 is filled, movement cannot take place. The main topic of this section will be to explain what underlies condition (i) of (37). One possible approach would be to try to reduce (i) to (ii) by assuming that a selected C0 is always filled in the relevant sense. This is not implausible in the case of indirect questions such as (34) and (35), as here we could claim that C0 is filled by the feature [+wh] selected by the main predicate, and hence is not available as a landing site for movement. However, the drawback to this approach is that there is no good way to ensure that all selected CPs have a filled C0, especially in cases where C0 is phonetically null. For this reason, we will explore a more principled approach. We will claim that condition (i) of (37) derives from the Projection Principle. The Projection Principle requires that selectional properties be satisfied at all levels of syntactic representation. This requirement extends to categorial selectional properties, thereby imposing a strong structure-preservation constraint on all selected contexts. We will propose that I0-to-C0 movement (or, more precisely, the instances of this process that concern us here) does not preserve the structure in the strong sense required by the Projection Principle, and so is banned in all selected contexts. To show how this idea can work, we must first introduce some assumptions concerning the nature of head-to-head movement.
We further constrain the approach of Baker (1988: 59) by assuming that head-to-head movement is always and only substitution of a head into another head position. In other words, we restrict the adjunction option to maximal projections (but see n. 18). In cases where incorporation results in a visible amalgam of the two heads, e.g., standard cases of Noun incorporation or V-to-I movement where V picks up tense and agreement marking, we assume that the incorporation host morphologically subcategorizes for the incorporee; hence a structural slot is created for the incorporee at D-Structure as a function of the lexical properties of the incorporation host (cf. Lieber 1980, on morphological subcategorization). So (tensed) I0 in a language like French has the subcategorization frame [+V0—], an incorporating V0 in Mohawk has the feature [+N0—], and so on. In general, where an incorporation trigger X has the feature [+Y0—], this means that the slot for Y0 is base-generated within X0, triggering substitution of Y0 during the derivation, leading to the creation of a complex head with the government and Case-marking properties discussed at length by Baker (1988, ch. 2). With this kind of incorporation, the head of the complex formed by incorporation remains X0, the incorporation trigger.18 Of course, nothing prevents an incorporation host of this kind from being selected by a higher head. Since incorporation does not alter categorial status, no problem is posed for the Projection Principle. Consider, for instance, Noun incorporation in an incorporating language. In such cases, the Verb has the morphological subcategorization feature [+N0—], creating a slot into which the Noun can be substituted. In (38), Noun incorporation is strongly structure-preserving, in the sense that it moves N0 to a pre-existing slot and it does not change categories; the verb does not become a noun. If I0 selects a V-projection (cf. Chomsky 1986b), the Projection Principle is not violated, since the complex head resulting from incorporation remains a verb at S-Structure. On the other hand, if the potential host does not provide a structural slot via morphological subcategorization, adjunction of heads being excluded (or limited to cliticization; see n. 18), the only way for a lower head to incorporate is by direct substitution into the host head. Of course, in most cases this operation will be excluded by the Recoverability Principle, the content of the host head being non-recoverably erased. There is one case, though, in which recoverability is not violated: this is when the host head is radically empty, hence there is no content to recover. Our claim is that this is precisely what happens in the familiar cases of I0-to-C0 movement. This gives rise to a structure such as (39):



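(38) [Tree diagram, only partly recoverable: V0 bearing the subcategorization feature [+N0—], with N0 substituted into the slot it provides; an NP node also appears.]

(39) a. [Tree diagram not recoverable.]
     b. [Tree diagram rooted in C′; not recoverable.]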
Let us see how (39b) can be ruled out in selected contexts. We maintain the standard assumption that selection involves properties of heads. If CP is selected in (39b), then there is a higher selecting head requiring that its complement’s head be C0. This lexical requirement is met at D-Structure but not at S-Structure, where the phrase’s head is a C0 and an I0 (under the standard definition of the “is-a” relation). So (39b), in a selected context, is ruled out by the Projection Principle.19 We thus derive condition (i) of (37). This approach has a number of significant consequences. First, we account for the fact that V0-to-I0 movement is typically not restricted to unselected domains, while I0-to-C0 movement typically is.20 In our system, this difference follows from the fact that V0 to I0 is usually an instance of the first type of incorporation described above, i.e., that which is triggered by a morphological subcategorization feature of an agreement or tense affix. In this case, the categorial status of the host head is not affected, and even if I0 were selected by C0 (which it may or may not be) there would be no Projection Principle violation. This is why V0-to-I0 movement systematically differs from I0-to-C0 movement across languages. The second consequence is that I0 to C0 is not necessarily excluded in all selected environments. If C0 has the relevant morphological subcategorization feature, movement of I0 to C0 would not involve substitution for C0 and would not violate the Projection Principle. This appears to be the case in the instances of I0-to-C0 movement attested in the Romance languages: Aux-to-Comp in Italian and the corresponding structure in inflected infinitives in Portuguese (cf. Rizzi 1982, Chs. 3 and 4; Raposo 1987). The Portuguese case is particularly telling: the construction involves an inflected verbal element in C0 position in various kinds of infinitival complements, as in (40) (from Raposo 1987: 98): (40) O Manel pensa terem os amigos t levado o livro.
Manel thinks to-have-agr the friends taken the book ‘Manel thinks that the friends have taken the book.’ As this option is lexically selected (e.g., epistemic verbs allow it but volition verbs do not), a natural way to express this restriction is to say that epistemic verbs but not volition verbs select an embedded C0 with an agreement morpheme, which in turn morphologically subcategorizes for an I0 slot. Then movement of the inflected auxiliary to C0 does not involve substitution for C0 itself, and no problems arise with the Projection Principle. So this kind of I0-to-C0 movement is allowed to apply in complement and other embedded contexts.21 To summarize, in this section we have proposed that the generalization underlying the restriction of complex inversion and subject-clitic inversion (and, more generally, I0-to-C0 phenomena) to root contexts is (37). The second part of this generalization follows straightforwardly from the very idea that these processes involve I0-to-C0 movement. We proposed that the first part is derived from the Projection Principle, once certain refinements are added to Baker’s theory of head-to-head movement.22
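The argument of section 4 turns on two kinds of head-to-head movement and on how each affects the categorial status of the resulting head. The following Python sketch is a toy model under stated assumptions, not the authors' formalism: heads are encoded as a hypothetical `Head` record with a category, an optional subcategorization slot and an emptiness flag, and the Projection Principle is reduced to the check that a selected head still "is-a" only the selected category at S-Structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Head:
    category: str                   # e.g. "V", "I", "C", "N"
    slot_for: Optional[str] = None  # category the host morphologically subcategorizes for
    empty: bool = False             # radically empty head: nothing to recover


def incorporate(host: Head, incorporee: Head) -> FrozenSet[str]:
    """Return the categories the resulting complex head 'is-a'.

    Option 1: substitution into a slot provided by the host's subcategorization
    feature; the host keeps its category (V-to-I, Noun incorporation).
    Option 2: substitution for the host itself, allowed by recoverability only
    when the host is radically empty; the result is both host and incorporee.
    """
    if host.slot_for == incorporee.category:
        return frozenset({host.category})
    if host.empty:
        return frozenset({host.category, incorporee.category})
    raise ValueError("blocked: no slot, and the host's content would be unrecoverably erased")


def projection_principle_ok(result: FrozenSet[str], selected: bool, required: str) -> bool:
    """In a selected context the head must still be exactly the selected
    category at S-Structure; unselected (root) clauses impose no requirement."""
    return (not selected) or result == frozenset({required})


if __name__ == "__main__":
    v_to_i = incorporate(Head("I", slot_for="V"), Head("V"))
    root_i_to_c = incorporate(Head("C", empty=True), Head("I"))
    portuguese = incorporate(Head("C", slot_for="I"), Head("I"))  # agreement-bearing C0
    print(projection_principle_ok(v_to_i, selected=True, required="I"))        # True
    print(projection_principle_ok(root_i_to_c, selected=False, required="C"))  # True
    print(projection_principle_ok(root_i_to_c, selected=True, required="C"))   # False
    print(projection_principle_ok(portuguese, selected=True, required="C"))    # True
```

The four calls mirror the cases discussed above: V0-to-I0 is harmless even under selection, substitution into an empty C0 is fine only in root clauses, and a C0 that provides an I0 slot (the Portuguese inflected-infinitive case) survives selection.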


5.  Conclusion

The analysis of complex inversion that we have proposed integrates a number of strands: the basic insights of Kayne’s (1983) analysis, Chomsky’s (1986b) extension of X-bar Theory, Baker’s (1988) theory of head-to-head movement and the more elaborated proposals for the structure of clauses that have been made recently. We have shown how these strands can be drawn together so as to give a fairly complete analysis of complex inversion. Moreover, the analysis has led to a number of theoretical proposals; in particular, we have refined the theory of head-to-head movement by proposing that such movement is always substitution (perhaps with adjunction limited to cases of cliticization). Substitution can be into a slot provided by the morphological subcategorization of the host, or directly into the host head when the latter is empty. The second kind is properly restricted to root environments by a strict interpretation of the Projection Principle.

APPENDIX I
Embedded Subject-Aux Inversion in English

Embedded Subject-Aux Inversion (SAI) is never found in indirect questions in English (*John wonders should he go to the store). However, SAI can be triggered by certain negative adverbials: (41) a. Never in my life have I been so insulted! b. Only in America could you get away with that. In certain embedded contexts, sentences of the type in (42) are possible (cf. Kayne 1982, 1983): (42) He said that under no circumstances would he do it. Two properties characterize this construction. First, that cannot be deleted: (43) ?*He said under no circumstances would he do it. Second, the complement is a weak island: (44) ?*What did he say that under no circumstances would he do? If we maintain that this type of inversion is an instance of I0-to-C0 movement, as is clearly shown by the impossibility of SAI where if is present (see above), we have no alternative but to treat these cases as instances of CP-recursion.23 We propose, therefore, that the complementizer that has the marked property in English of selecting CP. Thus, if that is not present, a structure such as (43) can involve only one CP, where I0-to-C0 movement is excluded for the reasons we have presented. That this option is by and large restricted to the complementizer that is shown by the deviance of recursion with other choices of C0. For example, the structure is impossible with a [+wh] C0: (45) *I wonder if/whether under no/any circumstances would John do that. The islandhood of these complements is explained by the CP-recursion idea, as the embedded clause in (44) would have a representation such as the following: (46) [CP t that [CP under no circumstances [C′ would [IP he t do t ]]]] Extraction of the object in (46) would cross the lower tensed CP, which, in the system of Chomsky (1986b), has bounding properties akin to those of a standard wh-island, since its Specifier is filled by the negative adverbial.

APPENDIX II
On the Landing-Site Problem

The approach to head-to-head movement developed in section 4 allows us to elaborate a more principled solution to the landing-site problem of complex inversion, which dispenses with the ad hoc step of C′-adjunction (cf. section 3.3).24 The background is provided by the uncontroversial assumption that different kinds of heads license different kinds of specifiers: I0 licenses an A-specifier, C0 licenses an A′-specifier, and so on. Let us now take seriously the idea, formulated in section 4, that the result of inversion is a clause headed by C0 and by I0. In that case, two specifier positions can be licensed: the typical specifier of C0, the landing site for wh-movement, and the typical specifier of I0, a subject position. Both positions are used in complex inversion: (47) Où Jean [[est-il] [t t allé t]] If we look at the problem derivationally, as we have done throughout the paper, we can simply assume that, when the new head is created by I0-to-C0 movement, the extra specifier position is automatically provided and made available for the lower subject to move into.
Notice that this option never arises in cases involving incorporation qua substitution for a slot created via morphological subcategorization by the host head (that is, V-to-I movement does not create an extra position within IP corresponding to the V-specifier): in such cases the host head remains the only head of the construction after incorporation, and so no additional Spec position can be licensed. Only in the case in which incorporation involves substitution for the host head, i.e., I0-to-C0 movement in root contexts, does the construction involve a genuine double head, and therefore a double specifier can be allowed. Moreover, this option is excluded in a language lacking subject clitics, such as English, for Case-theoretic reasons, as before (I has only one Case to assign, and so cannot Case-mark both its newly created specifier and the original specifier). The fact that the two specifiers are strictly ordered can now be related to the fact that a Case relation is involved with only one specifier: in (47), Jean must be adjacent (in the appropriate sense) to the head that assigns Case to it, hence où cannot intervene. The C′-adjunction solution made crucial use of the A′ status of the adjoined position to account for the incompatibility of complex inversion with wh-movement of the subject and with stylistic inversion: (48) *Qui t est-il venu? (49) *Où pro est-il allé Jean? This solution is no longer available within the more principled analysis that we are now adopting: if the NP position preceding the inflected verb is a legitimate I0-specifier, then it is an A-position, and (48) and (49) cannot be excluded as before because of the illicit A′-status of the variable and pro. A different approach is in order. Concerning (48), Marc-Ariel Friedemann (personal communication) pointed out to us that this structure is independently ruled out by the ECP within the system of Relativized Minimality (Rizzi 1990), regardless of the A- or A′-status of the trace.
In this system, traces must be properly head-governed, a requirement that is fulfilled for a subject trace in languages such as English or French by a C0 agreeing with its Spec: (50) Qui C0 [ t est venu ] (51) Who C0 [ t left ] In (48) no such proper head governor can be provided for the trace of qui, as C0 containing I0 is on the wrong side of the trace; hence the structure is ruled out by the ECP. As for (49), we can now elaborate on Sportiche’s (1988b) approach to Case Theory presented in n. 4. If Case can be assigned under strict government or agreement, the choice of mode of assignment for each specific instance of Case being a parameter, then it is reasonable to look at the licensing of pro along the same lines. So, pro can be licensed under agreement from its licensing head (as is the case for subject pro in Italian) or under strict government (as is the case for object pro in Italian; cf. Rizzi 1986a). It appears that the non-argument pro responsible for stylistic inversion in French is licensed under strict government from C0 (when additional conditions are met): (52) Le jour [ où C0 [ pro est venu Jean ]] The day when came John But then pro cannot be licensed in a structure such as (49), where it would be, if anything, in an agreement configuration with the appropriate head, and would not be strictly governed by it. The important facts illustrated by (48) and (49) can thus be naturally reconciled with our more principled approach to the landing-site problem.

Notes   1. Thanks to Adriana Belletti, Anna Cardinaletti, and the audience at the Séminaire interdépartemental de recherche linguistique at the University of Geneva for their comments on an earlier version of this material.   2. If subject pronouns occur in NP position in French, then a sentence such as: (i) Marie, elle parle toujours. Mary, she speaks always.

must involve left-dislocation. This is supported by the fact that quantified NPs, generally excluded in cases of left-dislocation (cf. John/*Nobody, he’s a nice guy), are in fact impossible in structures of this kind: (ii) *Personne, il n’est venu. No-one, he came.

The corresponding case is possible in various northern Italian dialects: (iii) Gnun l’a dit gnent.  (Piedmontese) No-one he has said nothing. ‘No-one said anything.’

This is expected: if the clitic is under Infl in (iii), gnun can appear in subject position, where quantified NPs are generally allowed to occur. See Rizzi (1986b) for a detailed presentation of this argument. See also Renzi (1987) and Roberge (1986) for examples showing that certain dialectal varieties of French pattern with northern Italian dialects in this respect.   3. Pollock (1989), following Emonds (1978), shows that in French the leftmost verbal element must raise to I0 in tensed clauses. Such verb raising is impossible in (Modern) English for non-auxiliary verbs.   4. Alternatively, we could adopt the approach developed by Sportiche (1988b) (and also suggested by Jaeggli, personal communication) according to which Case can be assigned in one of two fundamentally different ways: either via government (defined in terms of strict c-command) or via Spec-head agreement. So, Objective and Oblique Cases are generally assigned via government by V or P, while Nominative Case is assigned via Spec-head agreement with I0 in declarative clauses in English and French (cf. also the earlier suggestion of Belletti and Rizzi 1981: 125). As the mode of assignment for I0 must be subject to parametric variation in this system, one could then claim that I0 can assign Nominative Case both by agreement and by government in English, the latter mode of assignment being relevant in inverted clauses, while it can only assign Case via agreement in French. I-to-C movement destroys the Spec-head agreement configuration and makes Nominative assignment impossible in French in inverted clauses.   One advantage of this approach is that it is relatively easy to see why a V0 which has been raised to I0 (or C0) may still assign Case to its object, while an I0 which has been raised to C0 has its Case-assignment capacity inhibited, as in French (this issue was raised by Alessandra Tomaselli, personal communication): a raised V0 still governs its object via Baker’s (1988, ch. 2) Government Transparency Corollary, while a raised I0 is simply no longer in a Spec-head configuration with Spec-IP. Once raised, I0 can only Case-mark Spec-IP by government, an option which is unavailable in French.   5. According to (the obvious updating of) Kayne (1983), the cliticization of the pronominal subject to the inflected verb is allowed to apply in the syntax only when I-to-C movement takes place, as only in this case is the cliticization target higher than the subject pronoun. If the inflected verb does not move, cliticization in the syntax would be downgrading, hence the clitic trace would not be bound by the clitic. The process is then restricted to apply in the phonology in this case. Notice that even if the pronoun is cliticized in the syntax in (12), it still manifests an NP position in that it fills the subject position at D-Structure.   6. Nothing in what we have said rules out the comparable situation with objects, i.e., a structure like complex inversion involving an object pronoun and an object NP. In such a structure, the pronoun could satisfy Case Theory by incorporating with V while the NP is assigned Objective Case under government by V. We suggest that Case Theory actually allows this possibility, but that Theta Theory rules it out since V would have only one object theta-role to assign but two object arguments.
The basic difference between the hypothetical object case and the attested subject case, then, is that object pronouns cannot be expletives in French (cf. Kayne 1983), while subject pronouns can. If in River Plate Spanish, Rumanian, etc., object clitics likewise cannot be expletives, as appears to be the case, then object-clitic doubling in these languages must involve the composition of two argument chains, in the sense of Chomsky (1986a), Rizzi (1987a).   7. On the fact that the expletive agrees with the argument here, but not in other constructions, see Kayne (1983:127–129).   8. Generating the pronoun and the NP the other way around in (17), i.e., with il in Spec-VP and Jean in Spec-IP at D-Structure, gives rise to an S-Structure which could satisfy Case Theory without I0-to-C0 movement (the only movement needed would be incorporation of il with the inflected verb in I0). However, in such a sentence Theta Theory would be violated at D-Structure, as the argumental NP occupies a non-theta-position.   9. An example such as (i) is ruled out in English by the antecedent-government condition:


(i) *A man seems that there was killed t.

Here the chain (a man, there, t) is not well-formed because a man does not antecedent-govern there. The difference from the complex inversion example in (20) is that the raised NP antecedent-governs the clitic in (20). Recall that the configuration of (20) is impossible in English for Case reasons, as English pronouns do not incorporate.   10. We follow Belletti (1990) in assuming that AgrP dominates TP, while Pollock proposes that TP dominates AgrP.   11. If, because of its other virtues, we still want to adopt Pollock’s proposed clause structure, we must explain why (24) is not an option for complex inversion. To get this result, it is enough to assume that one of the Spec positions in (24) is either absent or an A′-position, hence not available as the base position for il. The most plausible candidate for this is Spec-TP. If Spec-TP is not present, it obviously cannot be occupied by il. If it is present but an A′-position, it could not be the base position of an expletive, since expletives belong to the A-system. So, il would have to be base-generated in the Spec-Agr position, which means that the representation in (25) could not arise since incorporation of il from Spec-Agr to Agr0 would violate the ECP (see Baker 1988).   12. Another possibility which comes to mind is CP-recursion. This means that the structure of complex inversion would be as follows: (i)

[CP1 WH [C′1 C10 [CP2 NP [C′2 [C20 V1] IP ]]]]

However, this proposal fails to account for nearly all the important properties of complex inversion. In particular, there would be no way to account for the root nature of the phenomenon (CP-recursion, if available, should be possible in both root and embedded contexts). So we reject this possibility.   13. This requires a version of the Relativized Minimality Principle (see Rizzi 1990), according to which subjects block antecedent-government not just in A-chains but in theta-chains, the latter also including some chains headed by an argument in an A′-position (cf. n. 17). The same reasoning extends to the case where the C′-adjoined position is occupied by a predicate or adjunct, assuming that such an element must be connected by a well-formed chain to its canonical functional position, and that the subject (or perhaps the main predicate; see Roberts 1988) is able to block antecedent-government in this case as well.   14. The presence of an object clitic on the verb in C0 (as in *Pourquoi cela l’as-tu dit) does not save the preposed object, because object clitics are unable to be expletives in French (cf. n. 6); therefore a chain including the two arguments cela and le inevitably violates the Theta Criterion here (cf. Kayne 1983:117).   15. The fact that variables are restricted to A-positions is actually a subcase of the restriction of the features [±anaphoric, ±pronominal] to A-positions, under the usual assumption that variables are defined in terms of this feature system.   16. It was proposed in Rizzi (2000) that this approach also gives an account of the fact that pro cannot appear in Spec-CP and thereby fulfill the V2 requirement in German: (i) Gestern wurde pro getanzt. Yesterday was danced. (ii) Es wurde t getanzt. It was danced. (iii) *Pro wurde t getanzt.

There is evidence that the element fulfilling the V2 requirement does not have to be phonetically realized, e.g., the empty operator involved in yes/no questions or the discourse-bound empty operator discussed in Huang (1983) can fulfill the V2 requirement. Thus the phonetic emptiness of Spec-CP is not in itself the cause of the ungrammaticality of (iii). Rather, (iii) is excluded because pro cannot appear in an A′-position such as Spec-CP.   17. We allow the possibility that theta-chains can be headed by A′-positions, as is the case with the theta-chain headed by the subject NP in the C′-adjoined position in complex inversion (other cases would be clitic chains and the chains relating preposed initial arguments to their theta-positions in V2 structures).   18. What is the status of cliticization with respect to our proposals for head-to-head movement? There are two possibilities. On the one hand, we could treat cliticization on a par with Noun incorporation, by taking cliticization hosts to have an appropriate morphological subcategorization frame. For languages such as Romance, which have cliticization but not Noun incorporation, we can make the required categorial distinction by adopting the proposal made by Baker and Hale (1988) that pronouns are members of the category Determiner (D) (cf. Postal 1966). Cliticization hosts such as Romance Verbs (or perhaps Infl) would then have the specification [ +D0—]. On the other hand, we could distinguish cliticization from other types of affixation by weakening the ban on head adjunction and maintaining that cliticization is the one case of head-to-head movement which involves adjunction rather than substitution.   19. We assume, with Chomsky (1965), that a positive specification of categorial selection in a lexical entry implies a negative value of all the non-occurring specifications. So [+—C0] implies, among other things, [−—I0], whence the desired result. This account further entails that there can be no operation of S′-deletion in the literal sense of elimination of the CP-level. If this were allowed, a predicate which selected CP at D-Structure would select IP at S-Structure and LF in a clear violation of the strong version of the Projection Principle required by our analysis. The obvious alternative is that “S′-deletion” verbs in fact select infinitival IPs at all levels.   20. For example, according to Pollock (1989), V0-to-I0 movement in French takes place in both main and embedded clauses; the same is true for V0-to-I0 movement in Italian (Belletti 1990), Middle English (Roberts 1985 [this volume, Chapter 1]) and Vata (Koopman 1984).   21. There is another class of apparently non-selected CPs, relative clauses, pointed out by Bonnie Schwartz (personal communication). These clauses strongly disallow inversion (*The man who do I know). While it may be possible to claim that restrictive relatives are in fact selected by the Determiner of the head, such an account does not seem viable for appositives, where inversion is equally impossible. This suggests that an extension of our approach is needed.
The Projection Principle serves to maintain the semantics/syntax correspondence in cases of selection, but there is no doubt that this correspondence must be maintained in other cases too. In particular it is plausible to suggest that the predication function can only be fulfilled by certain categories (see the list given in Williams 1980). In that case, full relative clauses presumably must be CPs at LF in order to be licensed by predication. If this is so, then the same result obtains as in the case of selection: no substitution for C0 would be possible, as the categorial status would be affected, thus preventing predication. The common factor behind relatives and indirect questions is, on this view, the fact that the Projection Principle and other well-formedness conditions on the syntax/semantics interface require that such clauses be projections of C0 alone at the relevant syntactic levels.   22. A problem with this approach is posed by cases of embedded V2 in German. The usual [−wh] complementizer in German is daß. Unlike English that, daß is generally obligatory. Thus a normal case of [−wh] subordination features daß in the embedded C0, with the tensed Verb in final position in the lower clause. However, certain verbs of saying and thinking allow daß to be dropped, and this triggers V2 in the complement CP: (i) a. Ich sagte er hatte meine Frau gesehen. I said he had my wife seen. b. Ich glaube er mag mich nicht. I think he likes me not.

The CPs here are clearly complements to sagen and glauben, respectively. So we are apparently faced with an instance of I-to-C movement in a selected context. This phenomenon in fact lends prima facie support to our first suggestion concerning condition (i) of (37), in that we could claim that C0 simply isn’t filled here.

Within the more principled approach involving the Projection Principle, we could explore the possibility that these examples involve incorporation triggered by the morphological subcategorization property of C0, as in the Romance cases discussed earlier. Alternatively, it could be the case that these structures are base-generated in extraposed position, hence the Projection Principle does not directly prevent categorial shift of an element in this position.   23. CP-recursion may also be in order to describe the colloquial varieties of French which allow subject clitic inversion in embedded interrogatives (René Amacker, personal communication).   24. Our proposal is conceptually close to Haider’s (1987) Matching Projection approach, even if the two ideas are formally and empirically quite different.

References Baker, M. 1988. Incorporation: A Theory of Grammatical Function Changing. Chicago: University of Chicago Press. Baker, M.C. and K. Hale. 1988. “Pronoun and Anti-Noun Incorporation,” ms, McGill University/MIT. Baker, M.C., K. Johnson and I. Roberts. 1989. “Passive Arguments Raised,” Linguistic Inquiry 20:219–252 [this volume, Chapter 8]. Belletti, A. 1990. Generalized Verb Movement. Aspects of Verb Syntax. Turin: Rosenberg and Sellier. Belletti, A. and L. Rizzi. 1981. “The Syntax of ne: Some Theoretical Implications,” The Linguistic Review 1:117–154. den Besten, H. 1977/83. “On the Interaction of Root Transformations and Lexical Deletive Rules,” ms, University of Amsterdam. Published (1983) in W. Abraham (ed.), On the Formal Syntax of the Westgermania. Amsterdam: John Benjamins. 47–131. Burzio, L. 1986. Italian Syntax: A Government-Binding Approach. Dordrecht: Reidel. Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press. ———. 1986a. Knowledge of Language: Its Nature, Origins and Use. New York: Praeger. ———. 1986b. Barriers. Cambridge, Mass.: MIT Press. Couquaux, D. 1986. “Les pronoms faibles sujet comme groupes nominaux,” in M. Ronat and D. Couquaux (eds.), La Grammaire Modulaire. Paris: Les Éditions de Minuit. 25–46. Emonds, J. 1976. A Transformational Approach to English Syntax. New York: Academic. ———. 1978. “The Verbal Complex of V′-V in French,” Linguistic Inquiry 9:151–175. Freidin, R. 1978. “Cyclicity and the Theory of Grammar,” Linguistic Inquiry 9:519–549. Goldsmith, J. 1981. “Complementizers and Root Sentences,” Linguistic Inquiry 12:541–574. Huang, J. 1984. “On the Distribution and Reference of Empty Pronouns,” Linguistic Inquiry 15:531–574. Kayne, R.S. 1972. “Subject Inversion in French Interrogatives,” in J. Casagrande and B. Saciuk (eds.), Generative Studies in Romance Languages. Rowley, Mass.: Newbury House. 70–126. ———. 1982. “Predicates and Arguments, Verbs and Nouns,” GLOW Newsletter 8:24.
[Abstract of paper presented at the 1982 GLOW Conference.] ———. 1983. “Chains, Categories External to S, and French Complex Inversion,” Natural Language and Linguistic Theory 1:109–137.

Kayne, R.S. and J.-Y. Pollock. 1978. “Stylistic Inversion, Successive Cyclicity, and Move NP in French,” Linguistic Inquiry 9:595–621. Kitagawa, Y. 1986. “Subjects in Japanese and English,” Ph.D., University of Massachusetts, Amherst. Koopman, H. 1984. Verb-Movement and Universal Grammar: From the Kru Languages to Grammatical Theory. Dordrecht: Foris. Koopman, H. and D. Sportiche. 1985. “Theta Theory and Extraction,” GLOW Newsletter 14:57–58. [Abstract of paper presented at the 1985 GLOW Conference.] Koopman, H. and D. Sportiche. 1991. “The Position of Subjects.” Lingua 85:211–258. Kuroda, Y. 1986. “Whether we Agree or Not: A Comparative Syntax of English and Japanese.” Lingvisticae Investigationes 12:1–47. Lasnik, H. 1985. “Illicit NP-movement: Locality Conditions on Chains?” Linguistic Inquiry 16:481–490. Lieber, R. 1980. “On the Organisation of the Lexicon,” Ph.D., MIT. Manzini, M.-R. 1986. “Phrase Structure and Extraction,” GLOW Newsletter 16:55–57. [Abstract of paper presented at the 1986 GLOW Colloquium.] Pollock, J.-Y. 1986. “Sur la syntaxe de EN et le paramètre du sujet nul,” in M. Ronat and D. Couquaux (eds.), La Grammaire Modulaire. Paris: Les Éditions de Minuit. 211–246. Pollock, J.-Y. 1989. “Verb Movement, UG and the Structure of IP,” Linguistic Inquiry 20:365–424. Postal, P. 1969. “On So-Called ‘Pronouns’ in English,” in D. Reibel and S. Schane (eds.), Modern Studies in English. Englewood Cliffs, New Jersey: Prentice Hall. Raposo, E. 1987. “Case Theory and Infl-to-Comp: The Inflected Infinitive in European Portuguese,” Linguistic Inquiry 18:85–110. Renzi, L. 1987. “I pronomi soggetto: un caso di parentela tipologica tra fiorentino e francese, e un capitolo poco noto di storia della lingua italiana,” ms, Università di Padova. Rizzi, L. 1982. Issues in Italian Syntax. Dordrecht: Foris. Rizzi, L. 1986a. “Null Objects in Italian and the Theory of pro,” Linguistic Inquiry 17:501–557. Rizzi, L. 1986b.
“On the Status of Subject Clitics in Romance,” in O. Jaeggli and C. Silva-Corvalàn (eds.), Studies in Romance Linguistics. Dordrecht: Foris. 391–420. Rizzi, L. 1987b. “On the Structural Uniformity of Syntactic Categories,” paper presented at the Second World Basque Conference, San Sebastian, September 1987. Rizzi, L. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press. Rizzi, L. 1991. “On the Status of Referential Indices,” in A. Kasher (ed.), The Chomskian Turn. Oxford: Blackwell. 273–299. Rizzi, L. 2000. “Three Issues in Romance Dialectology,” in L. Rizzi (ed.), Comparative Syntax and Language Acquisition. London: Routledge. Roberge, Y. 1986. “The Syntactic Recoverability of Null Arguments,” Ph.D., University of British Columbia. Roberts, I. 1985. “Agreement Parameters and the Development of English Modal Auxiliaries,” Natural Language and Linguistic Theory 3:21–58 [this volume, Chapter 1]. Roberts, I. 1989. “Thematic Minimality,” Rivista di Grammatica Generativa 13:111–137. Safir, K. 1981/82. “Inflection-Government and Inversion,” The Linguistic Review 1:417–467. Safir, K. and D. Pesetsky. 1981. “Inflection, Inversion and Subject Clitics,” Proceedings of NELS 11. 331–344.

Sportiche, D. 1988a. “A Theory of Floating Quantifiers and Its Corollaries for Constituent Structure,” Linguistic Inquiry 19:425–449. Sportiche, D. 1988b. “Conditions on Silent Categories,” ms, University of California, Los Angeles. Szabolcsi, A. 1983. “On the Non-Unitary Nature of Verb-Second,” ms, Max-Planck Institute for Psycholinguistics, Nijmegen. Vergnaud, J.-R. 1985. Dépendances et niveaux de représentations en syntaxe. Amsterdam: John Benjamins. Williams, E. 1980. “Predication,” Linguistic Inquiry 11:203–238. Zagona, K. 1982. “Government and Proper Government of Verbal Projections,” Ph.D., University of Washington, Seattle.

10 Excorporation and Minimality Ian Roberts

1. Introduction*

Baker (1988) presents a theory of syntactic incorporation according to which the operation that derives morphologically complex words from more basic elements (roots, stems, or affixes) is held to be the variant of Move-α that applies to heads. Thus, various kinds of affixation and incorporation processes are viewed as instances of head-to-head movement, as in (1):

(1) [X0 X0 + Y0i ] . . . [YP . . . ti . . . ]
A number of phenomena have been analyzed in this way: Baker himself treats noun incorporation, applicative constructions, causatives, and passives, and arguably the best-known work that builds on his has been done in the domain of verb movement (see Pollock (1989)). Other authors have also proposed treating Romance clitic climbing (Burzio (1986), Kayne (1989)) and Dutch verb raising (Haegeman (1988)) in terms of head-to-head movement. The major advantage of Baker’s approach is that it allows an account of certain constraints on morphological operations in terms of well-known and independently motivated syntactic conditions, notably the Empty Category Principle (ECP). The Head Movement Constraint, which prevents head movement from “skipping” intervening heads (Travis (1984)), can be derived from the ECP (Chomsky (1986)). Moreover, it follows from the ECP that only heads of complements can incorporate; incorporation is impossible from subjects and from adjuncts. Abstractly, then, the following cases are ruled out by the ECP in a system like Baker’s (in other words, in each case the trace of head-to-head movement fails to be properly governed):

(2) a. *[XP [X0 X0 + Z0i ] [YP Y0 [ZP ti . . . ]]]   (Z0 skips the intervening head Y0)
    b. *[ [YP . . . ti . . . ] [X′ [X0 X0 + Y0i ] . . . ] ]   (incorporation from a subject)
    c. *[ [XP . . . [X0 X0 + Y0i ] . . . ] [YP . . . ti . . . ] ]   (incorporation from an adjunct)
The question we will address in this squib is, What is the status of excorporation? Excorporation is successive-cyclic head-to-head movement where one head simply “passes through” another, first incorporating and then moving on, as in (3):

(3) [Z0 Y0i + Z0 ] . . . [X0 ti + X0 ] . . . [YP . . . ti . . . ]   (Y0 incorporates into X0, then excorporates to Z0)
Excorporation seems to be impossible in the genuine morphological cases of head-to-head movement, such as noun incorporation and affixation. For example, assuming that do is inserted in English at PF in order to carry “stranded” verbal affixes (Chomsky (1957)) and that have and be raise from base V-positions to I (Emonds (1976; 1978)),1 we never find cases of subject-aux inversion of the following type:

(4) a. *Have John does t gone?    (S-Structure: have John [t -s] t gone)
    b. *Be John did t arrested?    (S-Structure: be John [t -ed] t arrested)

Instead, once the auxiliary combines with agreement (I0), the two elements must move together to C0 (giving Has John gone? and Was John arrested? for (4)). Baker (1988, 73) suggests that derivations like (3) can be ruled out in terms of a ban on word-internal traces (although he also mentions (fn. 19) the possibility of ruling out (3) in terms of the ECP). On the other hand, if cliticization and verb raising are cases of incorporation, excorporation might have to be countenanced. In the case of cliticization, excorporation would be manifested by clitic climbing. In the case of verb raising, excorporation would be needed to account for the interaction of verb second with this process (Jean Rutten (personal communication)):

(5) a. Italian
       La volevo t chiamare t ieri.
       her I-wanted to-call yesterday
       ‘Yesterday I wanted to call her up.’
    b. Dutch
       Gisteren had ik [mijn vriendin op t] t willen bellen.
       yesterday had I my girlfriend up want call
       ‘Yesterday I wanted to call my girlfriend up.’

I follow Kayne (1989) in taking clitics to be heads, and clitic climbing to be successive-cyclic head movement. In (5a) the clitic la moves through the lower I and on, possibly through an intermediate C, to its surface position.
The clitic passes through these heads morphologically unscathed; it does not carry along any features of the heads it moves through (this is particularly clear if we assume, following Belletti (1988), that Italian infinitives always raise to I). In (5b) successive applications of verb raising (which I describe in more detail below) create a verbal complex had – willen – bellen, out of which had alone moves to satisfy the verb-second requirement.

I will show that the elaboration of Baker’s theory proposed in Rizzi and Roberts (1989 [this volume, Chapter 9]), in conjunction with the Minimality Condition (either the “rigid” condition of Chomsky (1986), or the relativized condition of Rizzi (1990)), gives exactly the correct results. Cases of incorporation that involve genuine affixation are prevented from undergoing subsequent excorporation by the ECP, whereas other instances of incorporation, those apparently operative in cliticization and verb raising, may allow excorporation. I will further show that this lends support to a treatment of verb raising as a particular kind of incorporation.

2. Background Assumptions

I first introduce the modifications Rizzi and Roberts (1989 [this volume, Chapter 9]) make to the theory of head-to-head movement. Rizzi and Roberts further elaborate the approach of Baker (1988, 59) by assuming that head-to-head movement may be either substitution of a head into another head position or adjunction of a head to another head position. In cases where incorporation results in a visible amalgam of the two heads (such as standard cases of noun incorporation, or V-to-I movement where V “picks up” tense and agreement marking), we assume that the incorporation host morphologically subcategorizes for the incorporee; hence, a structural slot is created for the incorporee at D-Structure as a function of the lexical properties of the incorporation host. So (tensed) I0 in a language like French has the subcategorization frame [ + V0 __ ], an incorporating V0 in Mohawk has the feature [ + N0 __ ], and so on. In general, where an incorporation trigger X0 has the feature [ + Y0 __ ], this means that the slot for Y0 is base-generated within X0, triggering substitution of Y0 during the derivation, leading to the creation of a complex head with the government and Case-marking properties discussed at length by Baker (1988, chap. 2). With this kind of incorporation, the head of the complex formed by incorporation remains X0, the incorporation trigger. For convenience, I adopt the notation originally proposed in Selkirk (1982, 3ff.) and indicate the incorporation trigger as X−1.
It is crucial for what follows, however, that we continue to consider this element to be a head as far as the ECP is concerned (as it clearly is, if we extend standard assumptions about X-bar theory below the X0 level).2 On the other hand, if the potential host does not provide a structural slot via morphological subcategorization, head-to-head movement may take place either as an instance of adjunction or, if the host head is radically empty, as substitution into the empty head position. In the case of adjunction, following the proposals concerning adjunction in May (1985), the host head is realized in two segments, neither of which is itself a head. I leave aside the case of substitution into a radically empty head (see Rizzi and Roberts (1989 [this volume, Chapter 9]) for discussion). Substitution incorporation and adjunction incorporation are illustrated in (6) (these structures are left-headed purely for purposes of illustration; in fact, English inflections usually appear on the right of the stem):

(6) a. [X0 X−1[+Y0 __] Y0 ]   (substitution: Y0 fills the slot selected by X−1)
    b. [X0 X0 Y0 ]   (adjunction: the two X0 nodes are segments of a single head)
The structures in (6a) and (6b) have quite different properties with respect to the Head Movement Constraint as it is derived from the ECP, as we will see. The second piece of background that is required is the Minimality Condition. In Chomsky (1986, 10) the Minimality Condition operates in configurations of the kind shown in (7): (7) . . . X . . . [ Y . . . W . . . Z . . . ] Here, if W governs Z, then the Minimality Condition prevents X from governing Z even if X satisfies all the other criteria for governing Z. The Minimality Condition is particularly important for the computation of antecedent government relations; thus, an intervening governor may block government of a trace by its antecedent, leading potentially to a violation of the ECP (or perhaps of conditions on chain formation; see Rizzi (1990, chap. 3)). This condition derives the Head Movement Constraint in a configuration like (2a) in the following way: Y0 in (2a) is an intervening governor for the trace and so prevents Z0 from antecedent-governing its trace. In the next section we will see how putting these assumptions into action derives the correct account of the properties of excorporation.
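As a purely illustrative aid, not part of Chomsky's or Rizzi's formulation, the blocking effect of the rigid Minimality Condition in (7) on head chains can be sketched in Python. The sketch assumes a drastically simplified representation in which the path from an antecedent head down to its trace is a flat list of positions, and any head strictly between the two endpoints is treated as an intervening governor W. The names `Node` and `antecedent_governs` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A position on the path from an antecedent head down to its trace."""

    label: str
    is_head: bool  # True if this position can act as a head governor


def antecedent_governs(path: list[Node]) -> bool:
    """Toy rigid minimality check for configuration (7).

    path[0] is the antecedent X, path[-1] is the trace Z, and everything in
    between intervenes. Simplifying assumption (not from the text): every
    intervening head governs Z, so any intervening head blocks X.
    """
    if len(path) < 2:
        raise ValueError("path needs at least an antecedent and a trace")
    return not any(node.is_head for node in path[1:-1])


# (2a): Z0 incorporates into X0 across the head Y0, so government fails.
skip_y = [Node("X0+Z0", True), Node("Y0", True), Node("t(Z0)", False)]
# Incorporation of the head of a complement: no head intervenes.
local = [Node("X0+Y0", True), Node("t(Y0)", False)]

print(antecedent_governs(skip_y))  # False
print(antecedent_governs(local))   # True
```

The sketch collapses government (c-command plus the relevant barrier conditions) into simple intervention on a list; it is meant only to make concrete why Y0 in (2a) prevents Z0 from antecedent-governing its trace.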

3. Excorporation

In this section I will show how excorporation is prevented in cases like (4), but may be possible in cases like (5). Consider first the cases in (4). Suppose that have/be raising in English is triggered by the presence of an agreement affix in I0 that morphologically selects V0. So have/be raising is an instance of head-to-head movement of the type in (6a), with have/be = Y0 and I = X0. If have/be then moves to C alone, stranding the agreement affix, the following configuration results:

(8) [C0 V0i ] . . . [I0 I−1[+V0 __] ti ] . . . [VP ti . . . ]
In this structure the I−1 sister to V0 is a head. In terms of minimality, then, I−1 counts as an intervening governor for the trace (see (7)), and antecedent government of this trace by V0 in C0 is therefore blocked. This situation will arise whenever excorporation takes place from a selected slot, that is, whenever Y0 in (6a) moves on, stranding X−1. This is what explains the ungrammaticality of (4). (I assume that any index possessed by an incorporee percolates to the complex formed after incorporation; this ensures that the trace in the base V0 position is antecedent-governed in (8) and permits the formation of the chain (V, t) here.) Consider next the situation that arises when incorporation involves the adjunction of one head to another, as in (6b). Following the conception of adjunction outlined in May (1985) and adopted in Chomsky (1986), the two occurrences of the host head X0 in (6b) are the segments of the single head X0. We can thus propose that the X0 sister to Y0, since it is not itself a head, cannot block proper government of the trace of Y0. Therefore, Y0 is able, other things being equal, to move on, stranding the host head, and its trace will be properly governed (note the parallel with movement of maximal projections, where adjunction may void barrierhood for analogous reasons). The situation with verb raising in examples like (5b) is actually more complex. Here it is not the adjoined element, the infinitival verb (willen bellen, itself the result of an earlier incorporation of bellen with willen), that excorporates, but rather the apparent original incorporation host, the inflected verb had. What allows this situation is the fact that the adjoined element is not the head of the resulting complex and thus arguably cannot count as an intervening governor for the trace of the inflected verb in cases of this type.3 This account does not affect the earlier account of (4) and (6a); here the selecting affix is the head of the complex I0.
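The asymmetry between (6a) and (6b) can likewise be summarized as a small decision procedure. The Python sketch below is an illustrative assumption, not Roberts's own formalism: it encodes only the three cases discussed in the text (excorporation of the incorporee from a substitution structure, and excorporation of either the incorporee or the host from an adjunction structure) and asks whether the element stranded next to the trace counts as a head, and hence as an intervening governor under (7). The names `Mode`, `stranded_element_is_head`, and `can_excorporate` are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    SUBSTITUTION = "substitution"  # (6a): Y0 fills a slot selected by X-1
    ADJUNCTION = "adjunction"      # (6b): Y0 adjoins to X0, which has two segments


def stranded_element_is_head(mode: Mode, mover: str) -> bool:
    """Does the element stranded beside the trace count as a head?

    mover is "incorporee" (Y0 moves on) or "host" (X0 moves on).
    Only the cases discussed in the text are encoded.
    """
    if mode is Mode.SUBSTITUTION and mover == "incorporee":
        return True   # X-1 stays behind and is a head, as in (8)
    if mode is Mode.ADJUNCTION and mover == "incorporee":
        return False  # only a segment of X0 stays behind, not a head
    if mode is Mode.ADJUNCTION and mover == "host":
        return False  # the adjoined element does not head the complex
    raise ValueError(f"case not treated in the text: {mode.value}, {mover}")


def can_excorporate(mode: Mode, mover: str) -> bool:
    """Excorporation is licit iff no stranded head blocks antecedent government."""
    return not stranded_element_is_head(mode, mover)


# (4): have/be leaves the agreement affix I-1 behind, so it is ruled out.
print(can_excorporate(Mode.SUBSTITUTION, "incorporee"))  # False
# (5a): clitic climbing, the adjoined clitic moves on.
print(can_excorporate(Mode.ADJUNCTION, "incorporee"))    # True
# (5b): the host verb moves on, stranding willen bellen.
print(can_excorporate(Mode.ADJUNCTION, "host"))          # True
```

The host-moves-on case for substitution structures deliberately raises an error, since the text does not discuss it; the sketch only restates the three configurations the analysis actually covers.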

In adjunction structures of the kind in (6b), then, both the host and the incorporee are free to move on. In clitic climbing, the incorporee moves on; in structures combining verb raising and verb second, the host moves on. This raises the possibility that both elements could excorporate independently, the subsequent derivation perhaps involving different kinds of head-to-head movement. If we look more closely, sentences like (5b) in fact exemplify this possibility. The derivation of (5b) involves several kinds of verb movement. The entire derivation is presented in (9):

(9) a. ik [[[mijn vriendin op-bellen] willen] heb] I0
    b. ik [[[mijn vriendin op - t] willen-bellen] heb] I0
    c. ik [[[mijn vriendin op - t] t] [V0 heb [V0 willen-bellen]]] I0
    d. ik [[[mijn vriendin op - t] t] [V0 t [V0 willen-bellen]] [I0 had]]
    e. ik [[[mijn vriendin op - t] t] [V0 t t] [I0 [I0 had] [V0 willen-bellen]]]
    f. [C0 [I0 had]] ik [[[mijn vriendin op - t] t] [V0 t t] [I0 t [V0 willen-bellen]]]

(9a) is the D-Structure representation. In (9b) bellen adjoins to willen (stranding the particle op). In (9c) the complex willen bellen adjoins to the matrix V0 heb-. The latter is in its base position and therefore uninflected at this stage of the derivation. However, heb- is selected by the tense/agreement morphology in I0. So this element excorporates from [V0 heb- willen bellen] and moves to I0 in (9d), forming had. This possibility is allowed, since verb raising involves adjunction, for the reasons just outlined. Next, willen bellen adjoins to I0. So the two parts of the complex formed by raising of willen bellen to heb- move to I0 independently of one another and produce different resulting structures.
This is simply the combination of the two cases of excorporation we have discussed.4 As the last step in the derivation, I0 raises to C0 for verb second to give (9f); as described above, I0 can excorporate from the adjunction structure it forms with willen bellen, but V0 (heb-) cannot excorporate from I0.
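The contrast between (6a) and (6b) amounts to a simple decision rule about what may move on out of a complex head. The Python sketch below encodes it, assuming, as in the text and the refinement of note 3, that a stranded element blocks antecedent government only when it occupies a head slot created by substitution. The class `ComplexHead`, the function `can_excorporate` and the string labels are hypothetical illustrations; c-command and index percolation are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexHead:
    """A complex head formed by incorporating `incorporee` into `host`.

    `mode` is "substitution" (the incorporee fills a selected X-1 slot, as
    in (6a)) or "adjunction" (the incorporee adjoins to X0, as in (6b)).
    """
    host: str
    incorporee: str
    mode: str


def can_excorporate(complex_head: ComplexHead, mover: str) -> bool:
    """Return True if `mover` may move on, stranding the rest of the complex."""
    if mover not in (complex_head.host, complex_head.incorporee):
        raise ValueError(f"{mover!r} is not part of this complex head")
    if complex_head.mode == "substitution":
        # Whatever is left behind occupies a head slot (the selected X-1, or
        # the head substituted into it, per note 3) and counts as an
        # intervening governor: the mover's trace is not antecedent-governed.
        return False
    if complex_head.mode == "adjunction":
        # The host's lower segment is not a head, and the adjoined element is
        # not the head of the complex: neither blocks, so either part may move.
        return True
    raise ValueError(f"unknown incorporation mode {complex_head.mode!r}")


# Two complexes from derivation (9); the labels are illustrative only.
i_complex = ComplexHead(host="I-1 (tense/agreement)", incorporee="heb-", mode="substitution")
vr_complex = ComplexHead(host="heb-", incorporee="willen-bellen", mode="adjunction")

print(can_excorporate(i_complex, "heb-"))            # False: V0 cannot leave I0
print(can_excorporate(vr_complex, "heb-"))           # True: the host excorporates, as in (9d)
print(can_excorporate(vr_complex, "willen-bellen"))  # True: the incorporee may move on too
```

Modelling the incorporation mode as an explicit field keeps the point visible: the same two heads behave differently depending only on whether the complex was built by substitution or by adjunction.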

4. Conclusion

The above remarks are intended as an elaboration of Baker’s theory. We see that, when certain points left open by Baker are developed, a picture emerges of a range of head-to-head movements. Given that it seems justified to treat processes as superficially varied as noun incorporation, V-to-I movement, verb raising, and clitic climbing all as instances of head-to-head movement, a theoretically based account of what distinguishes them is most welcome. I hope that the foregoing analysis is a step in that direction.


Notes

* My thanks to Hagit Borer, Luigi Rizzi, Jean Rutten, and two anonymous LI reviewers for invaluable help with this squib. All errors are of course my own.

1. These assumptions about the functioning of the English auxiliary system are made purely for the sake of the argument here. The reality is undoubtedly much more complex; see Chomsky (1989) and Pollock (1989) for some recent proposals. As an anonymous reviewer points out, the proposal sketched in the text does not prevent an I containing an affix from raising to C, followed by do-insertion into C. Where an aspectual auxiliary is present, this would give results like *Does John have left? This is ruled out by the Strict Cycle Condition and the idea that have/be raising is obligatory. A full treatment of this and related issues (notably the matter of long movement of auxiliaries via successive adjunctions, also pointed out by a reviewer) would go beyond the scope of this squib. These matters are dealt with, using essentially the system of head-to-head movement presented here, in Roberts (1993).

2. We could ask how the system carries over to cases of compounding, where the incorporation host is not a bound morpheme. One possibility would be to claim that morphological subcategorization is not the unique prerogative of X−1s. X−1s must have a morphological subcategorization, since, as a consequence of well-formedness conditions of X-bar theory, they cannot stand alone (that is, they cannot be dominated by X′ or by a nonbranching X0; this is the No-Stray-Affix Filter). This does not prevent X0s from having morphological subcategorization frames; such X0s would be the elements that trigger compound formation.

3. Here certain differences emerge between Rizzi’s Relativized Minimality Condition and Chomsky’s rigid Minimality Condition. Chomsky’s condition is phrased in terms of the idea that some projection of a given head blocks government from outside that projection (see Chomsky (1986, 42)); it therefore follows automatically that adjoined heads do not project minimality barriers, since, in their adjoined position, they do not project at all. In Rizzi’s system any c-commanding head blocks antecedent government of another head; adjoined heads are not immediately different from base-generated or substituted heads with respect to this principle. The natural refinement of Rizzi’s system is to say that only a head occupying a base-generated X0 position can block antecedent government of another head. This approach has the advantage that heads incorporated by substitution block excorporation of their hosts, an apparently correct result that Chomsky’s version of minimality cannot obtain, for the same reason that adjoined heads do not block excorporation of their hosts in his system.

4. I assume that the Strict Cycle Condition is not operative here, since the two applications of head-to-head movement have exactly the same domains of application.

References

Baker, M. C. (1988) Incorporation: A Theory of Grammatical Function Changing, University of Chicago Press, Chicago, Illinois.
Belletti, A. (1990) Generalized Verb Movement: Aspects of Verb Syntax, Rosenberg & Sellier, Turin.
Burzio, L. (1986) Italian Syntax, Reidel, Dordrecht.
Chomsky, N. (1957) Syntactic Structures, Mouton, The Hague.
Chomsky, N. (1986) Barriers, MIT Press, Cambridge, Massachusetts.
Chomsky, N. (1989) “Some Notes on Economy of Derivation and Representation,” in I. Laka and A. Mahajan, eds., Working Papers in Linguistics 10, Department of Linguistics and Philosophy, MIT, Cambridge, Massachusetts. Reprinted in R. Freidin, ed., Principles and Parameters in Comparative Grammar, MIT Press, Cambridge, Massachusetts, 417–454.
Emonds, J. (1976) A Transformational Approach to English Syntax, Academic Press, New York.
Emonds, J. (1978) “The Verbal Complex V-V’ in French,” Linguistic Inquiry 9, 151–175.
Haegeman, L. (1988) “Verb Projection Raising and the Multidimensional Analysis: Some Empirical Problems,” Linguistic Inquiry 19, 671–684.
Kayne, R. (1989) “Null Subjects and Clitic Climbing,” in O. Jaeggli and K. Safir, eds., The Null Subject Parameter, Reidel, Dordrecht.
May, R. (1985) Logical Form: Its Structure and Derivation, MIT Press, Cambridge, Massachusetts.
Pollock, J.-Y. (1989) “Verb Movement, Universal Grammar, and the Structure of IP,” Linguistic Inquiry 20, 365–424.
Rizzi, L. (1990) Relativized Minimality, MIT Press, Cambridge, Massachusetts.
Rizzi, L. and I. Roberts (1989) “Complex Inversion in French,” Probus 1, 1–30 [this volume, Chapter 9].
Roberts, I. (1993) Verbs and Diachronic Syntax, Reidel, Dordrecht.
Selkirk, E. (1982) The Syntax of Words, MIT Press, Cambridge, Massachusetts.
Travis, L. (1984) Parameters and Effects of Word Order Variation, Doctoral dissertation, MIT, Cambridge, Massachusetts.

11 Two Types of Head Movement in Romance* Ian Roberts

1. Introduction

The basic point of this paper is to argue that there are two distinct kinds of head movement. One kind is triggered by morphological properties of the host head, while the other kind is not, and in fact often appears to be triggered by some property of the moved head. Adopting and extending the terminology of Chomsky & Lasnik (1993), we refer to the former as L-related head movement and the latter as non-L-related head movement.1 Both types of head movement are subject to the ECP, but, since the nature of the target of movement is different in each case, the antecedent-government requirement manifests itself in different ways. This gives the appearance of differing locality conditions; in particular, only L-related head movement obeys the “classical” Head Movement Constraint of Travis (1984). By a revision of the Relativized Minimality Condition of Rizzi (1990), however, we see that both the cases which obey this condition and those which do not are in conformity with the ECP. We assume a conjunctive formulation of the ECP, as in Rizzi (1990: chapter 2). Moreover, we assume that traces of head movement are subject to a uniform head-government requirement. For non-L-related head movement, this raises the possibility that the head-governor and the antecedent-governor may be distinct. Our main empirical argument for the framework to be adopted relies on this fact; we will show that there is diachronic evidence from French that non-finite AGR ceased to be a head-governor for head traces in the 17th century, with the result that a number of instances of non-L-related head movement disappeared together. Moreover, we will see that the Romance languages as a group can be characterized in terms of whether non-finite AGR is or is not a head-governor. We will speculate on the relationship of this property to the null-subject parameter.


2. The Head Movement Constraint and the ECP

In recent work on head movement, the fundamental locality constraint that has been assumed is the Head Movement Constraint (HMC). This was originally formulated by Travis (1984: 131) as follows:

(1) An X° may only move into the Y° which properly governs it.

The HMC prevents head movement from non-complement categories; the empirical consequences of this are discussed in depth in Baker (1988). To the extent that some kind of Minimality Condition is incorporated into the definition of proper government, it also blocks head movement which “skips” a locally c-commanding head, moving directly to a non-local c-commanding head. The banned configuration is schematized in (2):

(2) [ZP [Z° X°] [YP Y° [XP t]]]

The instantiations of this configuration in familiar languages involving I-to-C movement are strongly ungrammatical, as the following illustrate:

(3) a. English SAI: *Have you would t said it to John
    b. French SCL-inversion: *Dit-il a t la vérité? (vs. A-t-il dit la vérité? ‘Has he told the truth?’)
    c. German V2: *Überall folgen ich dich t werde. (vs. Überall werde ich dich folgen, ‘I’ll follow you everywhere.’)

This is a desirable consequence of the HMC as formulated in (1). Starting with Chomsky (1986b), the ECP has been formulated such that the HMC is a deductive consequence of this principle rather than a separate generalization. For concreteness, we will frame our discussion in terms of a conjunctive version of the ECP where the definition of antecedent-government involves the Relativized Minimality Condition; cf. Rizzi (1990: chapter 2). Thus traces must satisfy both of the following conditions in order to be well-formed:

(4) A non-pronominal empty category must be
    (i) properly head-governed (formal licensing);
    (ii) θ-governed or antecedent-governed (identification).
    (Rizzi (1990: 74))

Head- and antecedent-government are the notions that will play a central role in our discussion throughout. Rizzi (1990: ch. 3) eventually abandons the θ-government component of the definition in (4); since we will be primarily concerned with functional categories in our discussion, θ-government plays no role. Both head- and antecedent-government are defined in terms of Relativized Minimality (RM) in the following way:

(5) For α = head or antecedent, X α-governs Y only if there is no Z such that
    (i) Z is a typical potential α-governor for Y;
    (ii) Z c-commands Y and does not c-command X.

The definitions of typical potential α-governor are as follows:

(6) Z is a typical potential head-governor for Y = Z is a head m-commanding Y.
(7) a. Z is a typical potential antecedent-governor for Y, Y in an A-chain = Z is an A-specifier c-commanding Y.
    b. Z is a typical potential antecedent-governor for Y, Y in an A′-chain = Z is an A′-specifier c-commanding Y.
    c. Z is a typical potential antecedent-governor for Y, Y in an X°-chain = Z is a head c-commanding Y.

Applying these definitions to the schema in (2), we see that the moved X does not head-govern its trace, since there is an intervening head which m-commands the trace and so counts as a typical potential head-governor for that trace. This is the effect of (6). (7c) tells us that X does not antecedent-govern its trace, since there is an intervening head which c-commands the trace and so counts as a typical potential antecedent-governor for the trace. Thus X fails both to head-govern and to antecedent-govern its trace, and the trace violates the ECP. This subsumes the basic case of the HMC under the ECP.

In the remainder of this section, we will discuss four conceptual objections to this account. The other sections of the paper demonstrate the empirical desirability of a different conception of the constraints on head movement by showing that the “classical” HMC of (1) does not give the correct descriptive characterization of them. We will argue for a “fully relativized” condition on head movement, linked to a division of heads into L-related and non-L-related.
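Definitions (5) to (7) can be read as a lookup followed by a scan: (7) fixes which kind of element counts as a typical potential antecedent-governor for a given chain type, and (5) asks whether any element between trace and antecedent is of that kind. The Python sketch below is a minimal illustration under stated assumptions: the caller supplies exactly the interveners that satisfy (5ii) (c-command and m-command are not computed), and the names `Intervener`, `antecedent_governs` and `moved_head_head_governs` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# (7): the typical potential antecedent-governor for each chain type.
TYPICAL_ANTECEDENT_GOVERNOR = {
    "A": "A-spec",    # (7a) an A-specifier
    "A'": "A'-spec",  # (7b) an A'-specifier
    "X0": "head",     # (7c) any head
}


@dataclass(frozen=True)
class Intervener:
    """An element Z that c-commands the trace but not the antecedent (5ii)."""
    kind: str   # "A-spec", "A'-spec" or "head"
    label: str


def antecedent_governs(chain_type: str, interveners: list[Intervener]) -> bool:
    """(5) + (7): blocked iff some intervener is of the type (7) picks out."""
    target = TYPICAL_ANTECEDENT_GOVERNOR[chain_type]
    return not any(z.kind == target for z in interveners)


def moved_head_head_governs(interveners: list[Intervener]) -> bool:
    """(5) + (6): any intervening head (assumed to m-command the trace)
    blocks head-government of the trace by the moved head."""
    return not any(z.kind == "head" for z in interveners)


# Schema (2): X0 skips the head Y0 on its way to Z0.
skip_y = [Intervener("head", "Y0")]
print(antecedent_governs("X0", skip_y), moved_head_head_governs(skip_y))  # False False

# (8a): 'which problem' in an A'-specifier intervenes in the chain of 'how'.
print(antecedent_governs("A'", [Intervener("A'-spec", "which problem")]))  # False

# (8c): 'it' in an A-specifier intervenes in the raising chain.
print(antecedent_governs("A", [Intervener("A-spec", "it")]))  # False
```

The table in `TYPICAL_ANTECEDENT_GOVERNOR` makes the asymmetry criticized below easy to see: two entries are functional specifier types, while the X° entry is simply "head".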
As just mentioned, there are four objections to the RM account of how the HMC follows from the ECP. First, it is clear that many instances of head movement are triggered by some morphological property of the host head. For instance, V-to-I movement in languages like French or Middle English is clearly related to properties of the agreement paradigms of those languages. Rizzi & Roberts (1989 [this volume, Chapter 9]) refer to this kind of morphological triggering of head movement as m(orphological)-selection. It is arguable that the standard cases of verb-movement in the Germanic languages are all cases which are triggered by m-selection (cf. in particular Tomaselli (1990) for arguments that V2 Cs have an abstract morphological feature which triggers I-to-C). M-selection, like other types of selection, is strictly local (cf. Chomsky (1965)). Thus there is at best a redundancy between the notion of morphological selection (which is independently needed to account for the differential triggering of head movement; e.g. this is how we distinguish noun-incorporating languages from non-noun-incorporating languages, languages with applicatives from those without applicatives, languages with synthetic causatives from those without synthetic causatives, etc.) and the HMC. This is especially clear if we adopt some version of Chomsky’s (1991) economy guideline, since we would not, other things being equal, expect a head to move from a position where it was not m-selected. This means that heads inside adjuncts and subjects could never undergo head movement. It seems, then, that the “classical” HMC can be derived from m-selection combined with the economy guideline, both of which are needed quite independently of head movement. In that case, one might wonder whether the ECP should be able to “see” head traces at all; one could make the general assumption that the ECP (like the related pro-module) only checks XPs, and that head traces which do not contribute to semantic interpretation are typically deleted in the mapping to LF in accordance with economy. We will return to this point below.

Second, Rizzi’s definition of typical potential antecedent-governor in (7) is conceptually unsatisfactory (the same point is made by Chomsky & Lasnik (1993)). (7a) and (7b) appear to form a natural class of positions (pace Chomsky & Lasnik (ibid)), but (7c) does not appear to belong to this class. (7c) differs from (7a,b) in that (7a,b) refer crucially to the functional distinction between A- and A′-positions, while (7c) refers to the purely structural notion of X°. Also, (7a,b) refer to specifier positions, while (7c) does not. We should be suspicious of such an unnatural definition.
It is clear that (7c) is the problematic component, and this is the part which derives the HMC.

Third, it is in fact (7c) alone that guarantees the ill-formedness of the traces in examples like (2). The definition of typical potential head-governor in (6) simply guarantees that X in (2) does not head-govern its trace, but it does not say that the trace is not head-governed. Neither (6) nor (7c) says anything about Y’s ability to head-govern this trace. Since the structure is ruled out because the trace is in any case not antecedent-governed, the point may seem moot, but we have just seen that the definition which gives the result that the trace is not antecedent-governed is suspect. Rizzi (1990) deals with this point by stipulating that head traces are not subject to an independent head-government requirement. This move effectively compounds the conceptual problem surrounding how (2) is explained by the ECP. We will show in what follows that this stipulation can be abandoned with empirical and conceptual gain.

Fourth, in the context of a conjunctive ECP, we expect to find selective violations of one or the other clause of this condition. In the case of XP-movement, it is in fact quite easy to construct examples of each type of violation for both A′- and A-movement (still adopting the background assumptions of Rizzi (1990: chapter 2)):

(8) a. *How do you wonder which problem to solve t t′?
    b. *Who did you say that t left?
    c. *John seems that it is likely t to win.
    d. *John was preferred for t to leave.

In (8a), t′ is head-governed (as the possibility of the corresponding short movement shows), but not antecedent-governed, since which problem functions as a typical potential antecedent-governor for this trace. In (8b), on the other hand, t is antecedent-governed (in Rizzi’s system), since there is no typical potential antecedent-governor which intervenes between the trace and who. However, the trace fails to be head-governed, since that is not a governor (and I cannot govern its specifier; cf. Rizzi (1990), Koopman & Sportiche (1991)). (8c,d) illustrate the same for NP-movement. (8c) is a case of super-raising where the trace fails to be antecedent-governed owing to the presence of the typical potential antecedent-governor it occupying an intervening A-specifier position. Nevertheless, the possibility of short raising shows that we must take this trace to be head-governed, presumably by the raising predicate. Finally, on the assumption that for, although it is a governor, is not a proper head-governor, the trace in (8d) fails to be properly head-governed, although it is antecedent-governed (as the grammaticality of the corresponding sentence without for shows).

There are no corresponding examples involving head movement. Since we want to assimilate head movement as far as possible to other cases of movement, i.e. XP-movement, we should wonder why this is. Once again, Rizzi’s stipulation that head traces are not subject to a separate head-government requirement answers the question, but in an unsatisfactory way. What we wish to do in this paper is to construct an account of the locality conditions on head movement which overcomes this objection, as well as the three given above. In the context of a research program which aims to assimilate head movement as far as possible to XP-movement, I take this goal to be desirable.

In the foregoing, we have seen conceptual reasons to want to reformulate the conditions on head movement.
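The pattern in (8) follows from the conjunctive form of (4): each example satisfies exactly one of the two clauses, and that is enough to fail. The small Python sketch below tabulates the judgments stated in the text. The function name `ecp_satisfied` and the dictionary `EXAMPLES_8` are hypothetical, and θ-government defaults to False, since the discussion sets it aside.

```python
def ecp_satisfied(head_governed: bool, antecedent_governed: bool,
                  theta_governed: bool = False) -> bool:
    """Conjunctive ECP (4): clause (i) AND clause (ii) (theta- OR antecedent-government)."""
    return head_governed and (theta_governed or antecedent_governed)


# (head-governed, antecedent-governed) for the offending trace in (8a-d),
# as described in the text.
EXAMPLES_8 = {
    "8a": (True, False),   # t' blocked by 'which problem'
    "8b": (False, True),   # 'that' is not a head-governor
    "8c": (True, False),   # super-raising across 'it'
    "8d": (False, True),   # 'for' is not a proper head-governor
}

for name, (hg, ag) in EXAMPLES_8.items():
    failed = "antecedent-government" if hg else "head-government"
    print(name, ecp_satisfied(hg, ag), "fails:", failed)
```

A disjunctive ECP would accept all four traces; the fact that each is ungrammatical is what the conjunctive formulation captures.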
We have alluded to an alternative account in terms of m-selection and economy. Such an account would claim that head movement is always and only triggered by m-selection. This account would capture the impossibility of “long” head movement of the type schematized in (2) and exemplified in (3); these would be violations of m-selection, since only I can be m-selected by C, and violations of economy, since there is no trigger for movement of the lower head. The impossibility of head movement from subjects and adjuncts discussed in Baker (1988) would similarly result from the combination of the locality condition on m-selection and economy.

This kind of approach works for the kind of case just mentioned. However, there are cases of head movement which are not triggered by a morphological property of the host of head movement.2 In terms of what was just said, we would expect these cases of head movement not to be local in the same way as the cases discussed here. This prediction turns out to be correct, but there is a locality condition. The central point of this paper is to show (a) that this locality condition should be formulated in terms of antecedent-government, (b) that this condition is related to the putative antecedent-government condition on traces of “standard” kinds of head movement, along the general lines proposed by Rizzi (1990) and summarized above, and (c) that there is a head-government requirement on all head traces, both traces of m-selected head movement and traces of non-m-selected head movement. If (a–c) are correct, then the ECP does apply to head traces after all.3 In the next sections I will try to provide some empirical motivation for the claim that there exist cases of head movement which are not triggered by m-selection. Once we have seen some data, we can provide the details of the overall theory of constraints on head movement.

3. Lema & Rivero on “Long Head Movement”

In a series of interesting and important recent papers, Rivero (1991, 1993, 1994) and Lema & Rivero (L&R) (1990, 1991a,b) discuss a case of head movement which cannot be regarded as m-selected. Lema & Rivero refer to this phenomenon as Long Head Movement (LHM). LHM is found in various Slavic and conservative Romance varieties: Bulgarian, Czech, Rumanian,4 literary European Portuguese (EP) and Old Spanish (OSp) (L&R (1991b) also mention Old Provençal, Old Catalan and Early Italian; Rivero (1991) discusses Slovak and Serbo-Croatian, while Borsley et al. (1996) show that the same phenomenon exists in Breton). Here we will limit our discussion and illustration to the Romance cases. LHM constructions have the following general form:

(9) Prt/Infᵢ Aux+AGR tᵢ

In (9), a non-finite verb-form is moved over a finite auxiliary (the non-finite form may itself be an auxiliary). (10) gives some examples of the schema in (9) (see L&R (1991a)):

(10) a. Seguir-te-ei por toda a parte (EP)
        Follow-you-will-(I) by all the part
        ‘I will follow you everywhere’
     b. Darte he un exemplo (OSp)
        Give-you (I) will an example
        ‘I will give you an example’

To the extent that the finite auxiliary is a head which c-commands the base-position of the verb (and this is certainly the natural assumption to make), the derivations of these sentences clearly violate the HMC in that the non-finite verb “skips” the finite auxiliary. It is less clear whether the representations of this kind of sentence violate the HMC. One way to “save” the standard HMC is to propose that the non-finite verb and the auxiliary are coindexed (perhaps because they form a “tense chain” in the sense of Guéron & Hoekstra (1988)), or that the auxiliary undergoes head movement to the position occupied by the non-finite verb after verb-movement, possibly at LF. Variants of these solutions are proposed inter alia in Lema & Rivero (1990) and Lema (1990); they all give the result that at SS a single chain contains the non-finite verb, the finite auxiliary and the trace. In this way, the SS (and LF) representations of these sentences do not violate the HMC.

For our purposes, the question of whether these constructions technically violate the HMC or not is secondary to the question of the trigger for movement of the non-finite verb. If we can show that this movement is not triggered by m-selection, then the fact that the movement does not obey the same constraints as m-selected head movement (i.e. the “classical” HMC) comes as no surprise if, as we suggested in the previous section, the “classical” HMC derives from m-selection. LHM is triggered in EP and OSp by the ban on clitic-first orders, the Tobler-Mussafia Law of Romance philology (cf. Tobler (1875), Mussafia (1983); for recent analyses, see de Kok (1985), Benincà (1989, 1991), Alberton (1990), Salvi (1991) and Cardinaletti & Roberts (2002 [this volume, Chapter 12])). Similarly, the Slavic cases of LHM are all triggered by clitic pronouns, auxiliaries or particles that obligatorily occupy the second position in the clause.
The Tobler-Mussafia Law can be roughly formulated as the following filter (cf. also Benincà (1991)):

(11) *[CP Y clitic . . .
     where Y = ∅ or another clitic.

LHM is the case where Y is a non-finite verb, and the clitic is an auxiliary verb. This constraint was operative in all the Medieval Romance languages (with some variation which need not concern us here; see Benincà (1991)), as the following examples illustrate:

(12) a. Rogó-le el alcalid que ge-lo departisse (OSp)
        Asked-him the judge that him-it tell
        ‘The judge asked him to tell it to him’ (Zif 141: L&R (ibid))

     b. Voit le li rois (Old French)
        Sees him the king
        ‘The king sees him’ (Charroi de Nimes, 58; Cardinaletti & Roberts (2002 [this volume, Chapter 12]))

     c. Vogliolo sapere da mia madre (Old Italian)
        (I) want-it to-know from my mother
        ‘I want to know it from my mother’ (Novellino, III; Alberton (1990))

However, in Old French and in Old Italian we do not find cases where a non-finite verb has moved in front of an inflected auxiliary, to give the order V[-fin] CL AUX. Instead, we find the order AUX CL V[-fin], as in:

(13) a. Ai le jou bien fait? (Old French)
        Have it I well done?
        ‘Have I done it well?’ (Clari: 71, 27; de Kok (1985: 83))

     b. Hailo tu fatto per provarmi? (Old Italian)
        Have-it you done to try me?
        ‘Have you done it to try me?’ (Alberton (1990))

Examples of this type are not found with the future auxiliary in Spanish and Portuguese. So the future auxiliary cannot act as a first-position element which “protects” a second-position pronominal clitic from first position in OSp and EP. We propose that this is due to the fact that the auxiliary itself is a clitic. This property distinguishes the OSp future auxiliary from other auxiliaries in the Medieval Romance languages (the cognate of the OSp future auxiliary was an affix during the entire medieval period in French and elsewhere). Because the future auxiliary is a clitic, it itself requires that some element precede it, in accordance with the ban on first-position clitics in Medieval Romance. This gives rise to mesocliticization, i.e. the order infinitive–clitic pronoun–clitic auxiliary as in (10). If some other element, e.g. a wh-phrase, is fronted to a position within CP, LHM does not and in fact cannot take place:

(14) A quien nos dar-édes por cabdiello?
     to who us give-you+will as leader?
     ‘Who will you give us as leader?’

Here the infinitive is fronted to the auxiliary, but follows the pronominal clitic; it is in a different position from the one it occupies in (10) (in the Slavic languages the verb does not move at all when some other constituent is fronted). We conclude that LHM is a last-resort operation which satisfies (11).5 Moreover, the order of the elements in (10) clearly shows that the auxiliary cannot be considered an affix, since pronominal clitics do not intervene between stems and affixes. We conclude that LHM cannot be a case of m-selected head movement.
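The last-resort character of LHM can be made concrete with a toy linearization check: test a clause against filter (11), and front the non-finite verb only if the filter would otherwise be violated. Since (11) bans a clitic preceded either by nothing or by another clitic, both cases reduce to "the clause begins with a clitic". This Python sketch is hypothetical throughout: the `Word` flags, the flat word-list representation and the assumed pre-movement order for (10b) are simplifications, orthography is normalized, and it models neither the short raising of the infinitive seen in (14) nor negation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    clitic: bool = False
    nonfinite_verb: bool = False


def violates_filter_11(clause: list[Word]) -> bool:
    """Filter (11), *[CP Y clitic ...] with Y = nothing or another clitic.

    In both cases the clause begins with a clitic, so that is what is checked.
    """
    return bool(clause) and clause[0].clitic


def last_resort_lhm(clause: list[Word]) -> list[Word]:
    """Front the non-finite verb only when (11) would otherwise be violated."""
    if not violates_filter_11(clause):
        return clause  # something non-clitic is already first: no LHM
    for i, word in enumerate(clause):
        if word.nonfinite_verb:
            return [word] + clause[:i] + clause[i + 1:]
    return clause  # no verb to front: the clause remains ill-formed


def render(clause: list[Word]) -> str:
    return " ".join(word.form for word in clause)


# Assumed order for (10b) before LHM: clitic pronoun + clitic future auxiliary.
before = [Word("te", clitic=True), Word("he", clitic=True),
          Word("dar", nonfinite_verb=True), Word("un"), Word("exemplo")]
print(render(last_resort_lhm(before)))   # dar te he un exemplo

# With an abstract non-clitic constituent "XP" already first, nothing moves.
fronted = [Word("XP")] + before
print(render(last_resort_lhm(fronted)))  # XP te he dar un exemplo
```

Making the repair conditional on the filter, rather than on any property of the auxiliary, mirrors the claim that LHM is not m-selected.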

L&R show that LHM has, among others, the following properties in all known cases:

(15) a. It is restricted to root contexts.
     b. It is blocked by negation.

Let us consider these properties one by one. The restriction to root contexts is illustrated by the Portuguese example in (16):

(16) Uma historia .. onde me referirei de espaço a elle
     A story where me (I) refer-will at length to her
     ‘A history where I will refer to her at length’ (ibid., (9a), p. 4)

In the relative clause here, the order is clitic-infinitive-auxiliary. This is always the case in non-root contexts. In the Slavic languages, the verb remains in situ in non-root contexts, while in the Romance languages it moves to the auxiliary, as (16) shows (cf. (14), where we see the same phenomenon when some other constituent is fronted). We follow L&R in taking the root nature of this phenomenon as an indication that it involves movement to C; since den Besten (1983), it has been standard to regard root cases of head movement as involving C. So LHM is V-to-C movement (see Benincà (1991)). The clitic element occupies a functional head position immediately below C (arguably the AGR1 of Cardinaletti & Roberts (2002 [this volume, Chapter 12])).

The essential locality condition on LHM is revealed by the fact that negation blocks it. This is illustrated by the OSp example in (17):

(17) Aqui non vos faran si non todo plazer
     Here not to-you make-will-3pl if not all pleasure
     ‘Here they will not give you anything but pleasure’ (ibid., (13b), p. 5)

Here we have the same order as in non-root contexts: clitic-infinitive-auxiliary. This is systematically the case in negative clauses. We could try to account for this by saying that negation is a head, and as such head movement cannot take place across it. Thus, the following derived structure is ruled out by the HMC:

(18) [C V[-fin]ᵢ ] . . . Neg . . . tᵢ . . .

However, if this is true, why does an inflected auxiliary not similarly block head movement in the following structure?

(19) [C V[-fin]ᵢ ] [AGR (cl-) Aux[+fin] ] . . . tᵢ . . .

Note also that an analogous problem arises in the analysis of do-support in English. Negation clearly does not block have/be raising, but recent analyses (notably Chomsky (1991), Pollock (1989)) have proposed that do-support is obligatory with clausal negation because negation blocks either affix-hopping or LF verb-movement. Again (although this time at LF), negation appears to selectively block head movement.

In Roberts (1993: 1.3) I proposed to account for the fact that negation seems to selectively block head movement by extending the A/A′ distinction to the head level. This meant that there were two types of heads, A-heads and A′-heads, with T and AGR being A-heads and Neg and C A′-heads. Then, by extending Relativized Minimality to the X°-level, it was possible to claim that movements to A- and A′-head positions would not “interfere” with each other, and the fact that negation selectively blocks A′-movement could be explained. In her commentary on the earlier version of this paper, Iatridou pointed out that this did not seem to be a natural extension of the A/A′-distinction and also underlined the fact that this distinction is theoretically unclear even for maximal projections (cf. Chomsky & Lasnik (1993), Koopman & Sportiche (1991)). For this reason I propose to reformulate the basic idea of earlier work in terms of the notion of L-relatedness. Chomsky & Lasnik (1993) define L-relatedness as follows: “Given a lexical head L, we say that a position is L-related if it is the specifier or complement of a feature of L.” Certain functional heads, e.g. T and AGR, are taken to be features of V; others, e.g. C, are not. We propose, then, that the notion of L-relatedness can be extended to heads along the following lines:

(20) Given a lexical head L, a position is L-related if:
     (i) it is a feature of L;
     (ii) it is a specifier or complement of a feature of L.
Being a feature of L in Chomsky & Lasnik’s sense certainly includes m-selecting L, although it is presumably a broader notion (below we will see some evidence that AGR is always L-related whether it m-selects V or not). So L-related heads m-select other heads, thereby triggering “classical” local head movement (as mentioned in note 2, this property can be “induced” by the Specifier). Non-L-related heads do not directly trigger head movement, but this in itself does not disqualify them from being landing sites for head movement. Under the right conditions (notably if head movement is triggered by something else, e.g. (11)), they may serve as landing sites for head movement. It should be borne in mind that at the X°-level, as at the XP-level, there is in principle no inherent connection between category and function; just as NPs can occupy either L-related or non-L-related positions, so we may expect that a given head can be either L-related or non-L-related, depending on factors other than its category (for example, its internal structure). Moreover, we do not mean to imply that a head is L-related iff it has an m-selection feature, since this would disqualify I in Modern English, with the undesirable consequence that Spec of IP would be a non-L-related position. Rather, there is a one-way implication: if a head has an m-selection feature, then it is L-related.

Taking up the suggestion in Chomsky & Lasnik (1993) that the A/A′ distinction is to be fully replaced by the notion of L-relatedness, we can now reformulate the definition of “typical potential antecedent-governor” given in (7) as follows:

(21) W is a typical potential antecedent-governor for Z =
     i. in a non-L-related chain:
        for Z = XP, W is an XP in a non-L-related position c-commanding Z;
        for Z = X°, W is a non-L-related X° c-commanding Z.
     ii. in an L-related chain:
        for Z = XP, W is an XP in an L-related position c-commanding Z;
        for Z = X°, W is an L-related X° c-commanding Z.

Chomsky & Lasnik (1993) propose a characterization of the notion of a chain which is uniform with respect to L-relatedness, which amounts to the following:

(22) The chain C = (a₁, . . . , aₙ) is uniform with respect to L-relatedness if each aᵢ is L-related or each aᵢ is non-L-related.

Still following Chomsky & Lasnik (ibid), we assume that only in “mixed” chains, i.e. those where aₙ is an L-related position and all other positions are non-L-related, are intermediate traces of moved heads deletable. The definitions in (21) may in large part follow from other considerations, although we will retain them in this form for convenience. Chomsky & Lasnik argue that XP-movement is constrained as described in (21) as a consequence of economy. As we have seen, (21ii) in its application to heads follows from the locality condition on m-selection combined with economy (to the extent that m-selection is coextensive with L-relatedness for heads; see above).
(21i) in its application to heads may also derive from economy considerations; the very fact that all cases of LHM are instances of “last-resort” movement triggered by the filter in (11) strongly suggests that this is true (although some of the instances of non-L-related head movement to be discussed in section 4 are less obviously economy effects). As a consequence of (21), the Relativized Minimality Condition will apply to heads in a fashion analogous to the way in which it applies to maximal projections. This immediately overcomes the conceptual objection that Rizzi’s definition of typical potential antecedent-governor mixes structural and functional notions: the intervener for XP-movement is defined functionally (A vs. A′ specifiers), while the intervener for head movement is defined structurally (any c-commanding head). This reformulation defines the intervener for head movement in functional terms too, eliminating this asymmetry.
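The intervention logic of (21) and the uniformity condition in (22) can be made concrete with a small toy model. The sketch below is an illustrative encoding under our own assumptions, not part of the proposal itself: the `Position` type, its boolean `l_related` flag and the function names are hypothetical, and c-command is reduced to a list of the heads standing between antecedent and trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Hypothetical encoding of a syntactic position: a label plus its L-relatedness."""
    label: str
    l_related: bool


def is_uniform(chain):
    """(22): a chain is uniform iff all its positions are L-related or none are."""
    flags = [p.l_related for p in chain]
    return all(flags) or not any(flags)


def typical_interveners(chain_is_l_related, heads_between):
    """(21), head case: a c-commanding head counts as a typical potential
    antecedent-governor (and so blocks the chain) only if its L-relatedness
    matches that of the chain. heads_between lists the heads that c-command
    the trace and are c-commanded by the antecedent."""
    return [h.label for h in heads_between if h.l_related == chain_is_l_related]


AGR = Position("AGR", l_related=True)
NEG = Position("Neg", l_related=False)

# Non-L-related movement to C (as in LHM) skips an L-related AGR ...
print(typical_interveners(False, [AGR]))       # []
# ... but is blocked by a non-L-related Neg.
print(typical_interveners(False, [NEG, AGR]))  # ['Neg']
# A chain of non-L-related (adjoined) positions is uniform; a mixed chain is not.
print(is_uniform([Position("C-adjoined", False), Position("TP-adjoined", False)]))  # True
print(is_uniform([Position("Spec,CP", False), Position("complement of V", True)]))  # False
```

The contrast between the first two calls is the toy counterpart of the claim that an inflected auxiliary in AGR does not block LHM while negation does.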

Now, we have seen that LHM is non-m-selected movement to C. Therefore it is a case of head movement to a non-L-related position. As such, AGR does not count as an intervener since it is L-related, and so the non-finite verb is free to move over it. We assume that negation is non-L-related, at least in the languages in question (Ouhalla (1990) argues that negation m-selects V in Turkish and Berber; if so, then we make clear predictions about how negation interacts with head movement). In that case, negation does count as an intervener. This provides us with an explanation for the fact that negation blocks LHM, but an inflected auxiliary (which we take to be in AGR) does not. (The English case is clearly more complex since it involves an LF operation; we turn to it in section 6.)⁶ So, to sum up, L&R’s data show that there exist cases of head movement which are not triggered by m-selection. This motivates a restatement of the condition on head movement in terms of the notion of “typical potential antecedent-governor” along the lines of (21). What we have argued in this section is that head-to-head movement is subject to antecedent-government, although antecedent-government in general may well derive from economy conditions on chain formation. However, if we continue to assume a conjunctive ECP (even as a descriptive artifact), there is another factor which we have not yet considered: the fact that traces of head movement must be head-governed. As we mentioned in the previous section, the independent nature of this requirement is obscured in a system in which the HMC holds in full, since the same head (the one which is the minimal governor of the trace of incorporation) will always be both the antecedent-governor and the head-governor of the lower trace. In a system like the one we are proposing here, however, this difference is of real empirical importance.
In configurations like (9) or (19), V[-fin] antecedent-governs tᵢ from C, but the minimal governing head must head-govern it. In the case of LHM of non-finite verbs we can assume that the verb has moved through T (to pick up infinitival morphology in the examples we have discussed), and so what we really have is T-to-C movement. The minimal governor for T is AGR (AGR2 in terms of Cardinaletti & Roberts (2002 [this volume, Chapter 12]); note that cases where a finite verb moves to C to satisfy (11), e.g. (12), are different in that “long” movement is from AGR2, “skipping” AGR1). Thus we see that a necessary (but not sufficient) condition for LHM is that AGR must have the capacity to head-govern a head trace. In the rest of this paper, we will explore the consequences of an approach to the HMC based on the version of RM extended to heads as in (21), along with the related idea that the head- and antecedent-governors for a given head trace may be separate elements. Our argument runs as follows: first, we identify further, less obvious, cases of LHM in Romance; second, we show how all the LHM constructions depend on AGR’s ability to head-govern a head trace, an ability which we take to be subject to parametric variation; third, we show how this postulated parametric property of AGR allows us to give an interesting account of a series of related syntactic changes in the history of French, and a related typology of the Romance languages. The empirical results regarding the history of French and the typology of Romance thus support the contention that head traces are subject to a head-government requirement, and therefore that the ECP applies to head traces. It does not appear that this requirement can be subsumed under economy; one possibility, following Aoun et al. (1987), is that it is a PF-requirement. In the final section of this paper we will provide an argument to that effect.
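The conjunctive ECP for head traces argued for in this section separates two checks that coincide under the classical HMC: antecedent-government, evaluated by the intervention logic of (21), and head-government by the minimal c-commanding head. As a toy illustration under our own simplifying assumptions (a clause reduced to a top-down list of heads, with a hypothetical `head_governor` flag standing in for the parametric capacity of AGR), T-to-C movement over AGR succeeds only if no head with the chain's L-relatedness intervenes and AGR can head-govern the trace in T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    label: str
    l_related: bool
    head_governor: bool  # hypothetical flag: can this head head-govern a head trace?


def trace_licensed(spine, target, source, chain_is_l_related):
    """Toy conjunctive ECP for the trace left by moving spine[source] to spine[target].

    spine lists heads from highest to lowest; target < source are indices into it.
    """
    between = spine[target + 1:source]
    # Antecedent-government: no intervening head of the chain's L-relatedness type, cf. (21).
    antecedent_governed = all(h.l_related != chain_is_l_related for h in between)
    # Head-government: the closest c-commanding head must be able to head-govern.
    head_governed = spine[source - 1].head_governor
    return antecedent_governed and head_governed


def t_to_c(agr_head_governs, with_neg=False):
    """Non-L-related T-to-C movement (LHM) in a hypothetical C (Neg) AGR T spine."""
    spine = [Head("C", l_related=False, head_governor=True)]
    if with_neg:
        spine.append(Head("Neg", l_related=False, head_governor=False))
    spine += [Head("AGR", True, agr_head_governs), Head("T", True, True)]
    return trace_licensed(spine, target=0, source=len(spine) - 1, chain_is_l_related=False)


print(t_to_c(agr_head_governs=True))                 # True: AGR skipped and head-governs
print(t_to_c(agr_head_governs=False))                # False: no head-governor for the trace
print(t_to_c(agr_head_governs=True, with_neg=True))  # False: Neg intervenes
```

Keeping the two checks as separate conjuncts is the point of the design: under the classical HMC the moved head's landing site is also the minimal governor, so the second check could never fail independently of the first.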

4.  Other “Long” Movements
4.1 Clitic-Climbing
In this section we propose an analysis of clitic-climbing which treats this construction as a case of non-m-selected head movement. The analysis is largely inspired by Kayne (1989), and is best considered as an elaboration of the proposals in that paper. Our analysis focuses initially on Italian, although as far as we are aware the comparable phenomena of Spanish could be handled in the same way. In section 5, we will discuss the historical development of this construction in French. We propose the following initial characterization of Italian clitics:
(23) Clitics adjoin to AGR, possibly via successive-cyclic movement.
As in the case of XP-movement, successive-cyclic movement takes place in order to satisfy locality conditions, in particular the ECP. We will follow Sportiche (1998) in assuming that an initial step of cliticization involves XP-movement, which we take to be adjunction to TP. We construe successive clitic-movement as excorporation, in the sense of Roberts (1991 [this volume, Chapter 10]). Roberts (1991 [this volume, Chapter 10]) shows that excorporation is possible from an adjoined position or from a position into which a head has “freely” substituted, but not from an m-selected position into which a head has been substituted to satisfy m-selection. In the domain of XP-movement, adjoined positions are always A′-positions, i.e. non-L-related positions. We will make the same assumption for head movement: no adjoined position is an L-related position. Thus the chain formed by clitic-movement is a non-L-related chain. Moreover, if we take the base position of clitics to be D (cf. Postal (1969)), then clitic chains are uniformly non-L-related chains in the sense defined in the previous section. What (23) states is that once adjoined to AGR, clitics do not further excorporate (although they may move with AGR).
This is not the place to further elaborate the theory of cliticization, but we tentatively take this attraction to AGR to be caused by some intrinsic property of the clitics, qua Ds. Although (23) requires clitics to adjoin to AGR, it does not say that they have to adjoin to the local c-commanding AGR. In the general case, the clitic will not in fact move further than the local AGR because further movement will be impossible. Should a clitic move to the non-local AGR, we have the following configuration:
(24) [AGR clᵢ [AGR ]] . . . V [CP [C (tᵢ)] . . . AGR . . . tᵢ . . . VP
Here the clitic passes through the lower Comp, since Comp (in Italian) is a non-L-related head (CP is also a barrier by inheritance; clitic movement, like all other movement, must obey subjacency). Economy considerations then predict that short movement must be preferred to the representation in (24), since this representation contains an extra trace in C as compared to the derivation where the clitic moves just to the lower AGR. Thus (optional) clitic-climbing should never be found. One possibility would be that “restructuring” verbs, i.e. those which tolerate clitic-climbing, have a “defective” CP complement which, unlike other CPs, does not block either head movement or NP-movement from within (cf. Rizzi (1982) on how restructuring verbs allow certain cases of “long” NP-movement). In that case, clitic-climbing would not necessarily create a “superfluous” trace in the lower C, and economy would not be violated in (24). We will not speculate on what exactly this “restructuring” property may be, but, as Kayne (1989) points out, it cannot be simply CP-deletion, since there is at least one restructuring verb that can take a [+wh] C, namely sapere ‘know’. Whatever the special property of restructuring, where V is volere ‘want’, for example, the configuration in (24) is allowed, but where V is decidere ‘decide’, it is not:
(25) a. Lo voglio fare   ‘I want to do it’
b. *Lo decido di fare   ‘I decide to do it’
In our terms, it is natural to propose that the basic property of “restructuring” verbs is that they are able to head-govern a clitic-trace in the C of their complement, while non-restructuring verbs are not.
This in turn may be related to the intuition that these verbs are somehow “closer” to their complement clause than other types of infinitival-taking verbs, an idea instantiated in clause-union analyses of clitic-climbing. Note that in (24) the clitic does not pass through the lower AGR. The characterization of cliticization in (23) rules out this possibility: the clitic cannot excorporate from AGR. However, nothing prevents a clitic from moving to the lower AGR and staying there; this is what obligatorily happens with non-restructuring verbs and, in Standard Italian, this is also an option with restructuring verbs:
(26) a. Voglio farlo
b. Decido di farlo

(The position of the infinitive indicates that the infinitive has moved in these examples; cf. Kayne (1991) and section 4.2.) In these terms, then, clitic-climbing is the case where (23) is satisfied by movement to a non-local AGR. There is a further condition on this movement, however: the low AGR must be able to head-govern the clitic trace in the DP adjoined to TP (we assume that this trace is undeletable at LF since it is the clitic’s base position). So, for (24) to be well-formed, V must head-govern the clitic-trace in C and AGR must head-govern the lower trace. As we have mentioned, V’s property seems to be lexically determined. We will argue that (non-finite) AGR’s head-government property reflects a parametric option that is taken in Italian but not in contemporary French. The clitic-trace in the lower T is antecedent-governed by the trace in C; AGR does not count as an intervening head since it is L-related.⁷ So we assimilate clitic-climbing, in essential respects, to LHM. Clitic-climbing clearly parallels LHM in one important way: it is blocked by negation. This is illustrated by the following paradigm:
(27) a. Non lo voglio fare
b. *Lo voglio non fare
c. Voglio non farlo
‘I don’t want to do it’

Taking non to be Neg and Neg to be a non-L-related head, we see why clitic-climbing is blocked in (27b): Neg prevents the trace of the climbed clitic in C from antecedent-governing the clitic trace in T. For this account to work, we must also prevent excorporation from Neg. To do this, we make the following assumption:
(28) Excorporation from non-L-related heads is impossible.
This idea is related to the fact that economy considerations seem to underlie the definition of typical potential antecedent-governor given in (21). Following Roberts (1991 [this volume, Chapter 10]), excorporation cannot take place from m-selected positions. However, adjunction to an m-selecting, L-related head followed by excorporation from it is allowed (except where that head is AGR in the case of clitic movement). On the other hand, adjunction to a non-m-selecting, non-L-related head followed by excorporation is in conflict with the idea that movement never goes further than the nearest appropriate landing site, which is essentially the idea behind deriving Relativized Minimality from economy. Thus negation, like any non-L-related head, will always break non-L-related head chains. As we have already mentioned, our analysis of clitic-climbing is very close to that proposed by Kayne (1989). Kayne proposes that Italian differs from French in that non-finite Infl is able to L-mark VP in Italian, but not in French. Hence an infinitival VP is not a barrier to clitic-movement in Italian, while it is in French. Kayne in turn relates this capacity of Infl to the null-subject property of Italian, suggesting that an Infl which is able to license null subjects in finite clauses is sufficiently “lexical” to L-mark VP in non-finite clauses. However, if we combine the results of work done subsequent to Kayne (1989), essentially Pollock’s (1989) analysis of short movement of infinitives in French and the related “split-Infl” hypothesis, along with the idea proposed inter alia by Koopman & Sportiche (1991) that the base position of the subject is VP-internal, it emerges that an L-marking account of the difference between French and Italian is not tenable. As is well known, Pollock (1989) argues that infinitives in French move to T (reformulating his clause structure along the lines of Belletti (1994)). This is shown by an example like the following:
(29) Se laver souvent les mains . . .
To wash-self often the hands . . .
The adverb souvent is taken to be adjoined to VP, and so the infinitive se laver must be outside VP. However, the fact that the infinitive follows the negator pas while finite verbs must precede this element leads Pollock to argue that infinitives undergo just the “short” movement to T in (contemporary) French. If this account is correct, then in Kayne’s terms we must say that T is able to L-mark VP. In that case the relevant transposition of Kayne’s analysis of clitic-climbing in the “split-Infl” system must involve AGR, and indeed this seems the natural way to retain the connection with null subjects. So we must say that French non-finite AGR is unable to L-mark TP while its Italian counterpart has this capacity. The example in (29), however, shows that this assumption is problematic. The clitic reflexive se is an anaphor requiring a local, c-commanding antecedent.
Thus the PRO subject of the infinitive must appear in an L-related position c-commanding se. Moreover, if we wish to maintain the results of Chomsky’s (1981) PRO theorem (cf. Koopman & Sportiche (1991) on how this result may hold in a system where subjects are base-generated in VP), then we must require PRO to move in infinitival clauses in order to escape government by T (and possibly V) in its base position. So it seems clear that PRO moves out of VP here. Moreover, if we adopt the general proposal in Koopman & Sportiche (1991: 237–239), then PRO must have moved to Spec of AGRP (Spec of AGRP is an L-related position in infinitives since AGR is always L-related). Further support for the idea that the subject raises from VP in infinitival clauses in French comes from examples with an overt subject, such as the following:
(30) J’ai vu Jean se laver souvent les mains
I’ve seen John wash-self often the hands

Here the same reasoning applies as in (29). The infinitive is in T, as the position of the adverb shows, and the subject must be in an L-related position so as to bind se. In her study of the complements of perception verbs, Guasti (1992: 202–204) argues that these complements are AGRSPs (equivalent to our AGRP) in French. Hence, the only L-related position from which the subject can bind se in (30) is Spec of AGRP. Thus NP-movement to Spec of AGRP is possible in French infinitives, and AGR must L-mark TP. The movement of the infinitive alone shows that T L-marks VP. These facts do not vary between French and Italian (although infinitives move further in Italian; see Belletti (1990), Guasti (1992) and below), and so it seems that we are compelled to say that non-finite AGR can L-mark TP in French. In that case, we must look elsewhere for an account of the differences between French and Italian in terms of clitic-climbing. Kayne in fact leaves open the question of how NP-movement out of non-finite VPs is possible in French (and his article was written prior to the general adoption of the VP-Internal Subject Hypothesis), but we consider that an account which can encompass the basic insight of his analysis of clitic-climbing and avoid the problem that his L-marking account poses for the analysis of NP-movement is to be preferred over the one he proposes. We adopt the following definition from Cinque (1991):
(31) XP is L-marked if XP is directly selected by an X ≠ [-V].
This means that non-complements (subjects and adjuncts) and complements to N and P are not L-marked, while complements to V and A are L-marked. Functional categories are not specified for [±V]; hence they are not [-V], and so they L-mark their complements. This accounts for the possibility of NP-movement to Spec of AGRP in French infinitives (and in infinitives generally). We account for the possibility of clitic-climbing in Italian in terms of AGR’s head-government capacity.
Italian AGR is able to head-govern a clitic-trace, while French AGR is unable to. Just as in Kayne’s analysis, we are able to make the natural connection to the null-subject parameter: Italian AGR is “stronger” than French AGR (see section 6). This account implies that French clitics can move no further than TP, which seems correct (cf. Kayne (1989)). Presumably, they adjoin first to VP, as proposed by Sportiche (1998). Since we are treating clitic-climbing as a variety of LHM, we might wonder why LHM of the type discussed in the previous section is absent in Italian. Recall, however, that we posited AGR’s head-government capacity as a necessary condition for LHM; there may be other conditions. For OSp and EP there are two further conditions: the Tobler-Mussafia Law, i.e. the filter in (11), must hold, and there must be a class of clitic auxiliaries (a subvariety of what Lema & Rivero call functional auxiliaries) which are unable to act as hosts for the enclisis required by this filter. In Modern Italian the Tobler-Mussafia Law does not hold, as the general availability of clitic-first orders shows. It is well known that this Law held for Medieval Italian, however, but the general absence of OSp/EP-style “mesocliticization” indicates that all the Italian auxiliaries were able to act as hosts for enclisis (cf. the contrast between (10) and (13) in the previous section). Whatever the precise account of the differences between Italian and the languages discussed in the previous section, the absence of LHM of the kind seen there from Italian does not pose a problem for our analysis of clitic-climbing. Conversely, however, we expect that the languages discussed in the previous section will have clitic-climbing, other things being equal, because AGR is able to head-govern a clitic-trace. This expectation is borne out, as the following examples show:
(32) a. Como lo podré catar? (OSp)
‘How will I be able to look at him?’ (L&R (1990b: 15))
b. receio de que alguém nos pudesse ouvir (EP)
‘the fear that someone could hear us’ (1893, Machado de Assis)
(32) shows that the implicational statement “If LHM, then clitic-climbing” is true. This is because both constructions depend on AGR being able to head-govern a clitic trace. Moreover, Kayne (1989: 254) proposes the implication “If clitic-climbing, then null subjects.” It therefore follows that the following must hold: “If LHM, then null subjects.” To the best of my knowledge, this statement holds for the whole of Romance both synchronically and diachronically. In this section we have proposed an analysis of clitic-climbing which assimilates it to LHM in the sense that, following Kayne (1989), it is a case of non-m-selected head movement. Hence clitic chains are (uniformly) non-L-related chains. Following (21), clitic movement can “skip” an intervening AGR if a higher AGR is accessible.
There are two preconditions for this: (i) the higher V must be able to head-govern a trace in the C of its complement CP; (ii) the lower AGR must be able to head-govern a clitic trace. The first of these conditions is satisfied by the so-called “restructuring verbs”; the second is a parametric property, related to the null-subject parameter, which distinguishes Modern Italian from Modern French.

4.2  “Long” Movement of Infinitives
In this section we will discuss a further instance of non-m-selected head movement, and relate it to the parameter we are proposing concerning AGR’s capacity to head-govern a trace of non-m-selected head movement. The construction, or really family of construction types, that we are concerned with are those which involve “long” movement of infinitives. By long movement of infinitives we mean those cases where an infinitive is moved further than in Pollock’s “short” movement, i.e. further than Belletti’s T. Putting things this way, there are at least two cases of long movement of infinitives to distinguish: movement beyond AGR and movement to AGR. In Italian, infinitives seem to move beyond AGR, as shown by the fact that (with the exception of negative imperatives; cf. Zanuttini (1991), Kayne (1992)) enclisis is obligatory with infinitives:
(33) Gianni ha deciso di non farlo/*lo fare più.
‘G. has decided to not do-it/*it-do any more.’ (Belletti (1990))
Kayne (1991) proposes that enclisis of this kind is in general proclisis to an empty position. Assuming this is true, and still assuming that Italian clitics always move to AGR, far(e) here must have moved to some position above AGR. Note that the fact that the infinitive + clitic complex precedes the negative adverb più (which Belletti shows generally occupies the same position as French pas) indicates that both elements are outside TP. We conclude then that Italian infinitives raise to a position above AGR, although we have nothing to say about the precise nature of that position here beyond the fact that the movement does not appear to be the result of m-selection (pace Belletti (1990); Kayne (1991) proposes that the infinitive adjoins to I′). Nor do we have anything to say concerning the trigger for infinitive movement beyond T. We treat movement of the infinitive beyond AGR as involving essentially the same properties as clitic-climbing and LHM: movement of the infinitive, being non-m-selected, forms a non-L-related chain; thus the infinitive is able to skip AGR since this is an L-related head. The trace of this movement is thus antecedent-governed by the moved infinitive and head-governed by AGR. So the well-formedness of this construction depends on AGR having the capacity to head-govern.
More precisely, we propose the following substructure for the relevant parts of sentences like (33):
(34) Infᵢ [AGR clⱼ AGR] . . . [TP tⱼ [TP [T tᵢ] . . .
T m-selects the infinitive. The chain linking the trace in T to the trace in the V-position is thus an L-related chain, while the subsequent movement to the position above AGR forms a non-L-related chain. The traces in TP are head-governed by AGR, and neither of them can be deleted at LF. The clitic-trace cannot be deleted because it is part of a uniformly non-L-related chain (cf. the discussion of clitic-movement in the previous section, and the discussion of Chomsky & Lasnik’s notion of uniformity in section 3). The chain formed by moving the infinitive successively from V to its SS position is not a uniform chain; however, here T carries substantive information about finiteness, and so cannot be deleted in LF (cf. Chomsky (1991)). A number of Romance languages manifest the order clitic-infinitive-adverb. According to Kayne (1991), this is the case in Sardinian, while Motapanyane (1991) shows this to be the case in Rumanian. Moreover, this was true of Middle French, as the following examples (from de Kok (1985: 335), cited in Alberton (1990)) illustrate:
(35) a. car elle (. . .) commença a ne les chercher pas
for she began to NEG them look-for not
‘for she began not looking for them’ (Hept., 65)
b. Le pauvre gentilz homme (. . .) les pria de ne les abandoner point
The poor gentleman them begged to NEG them abandon not
‘The poor man begged them not to abandon them’ (Hept., 3)
Assuming that pas and point occupy their modern positions (cf. Pearce (1990) for a detailed discussion of the interaction of infinitives and negation in Old and Middle French), the infinitives in these cases are raised at least as far as AGR. These examples also illustrate the order clitic-infinitive, which is the usual one for most of Old and all of Middle French, although there are some cases of enclisis to infinitives in Old French (cf. de Kok (1985: 285)). We take the cases in (35) to be movement to AGR. This kind of case is thus not an instance of LHM, in that the infinitive obeys the classical HMC by moving from V to T to AGR. Nevertheless AGR does not m-select the movement of the infinitive (as shown by the total lack of agreement morphology on infinitives in Old French, Middle French and Rumanian; Sardinian, on the other hand, does have an inflected infinitive and so the following remarks do not necessarily apply to that language), and thus head-government of the trace of movement to AGR depends on AGR’s general ability to head-govern.
In this sense, the two types of “long” movement of infinitives both depend on this property of AGR, even though only movement beyond AGR violates the classical HMC. Once again, we see the connection to the null-subject parameter. Kayne (1991) points out that if a language has the order INF-ADV, it allows null subjects. In other words, if a language allows either kind of “long” infinitive-movement, either to AGR or past AGR, then it has null subjects. So AGR’s head-government capacity is related to its capacity to license null subjects. Let us refer to this capacity as that of being a “generalized licenser” for empty categories (in a manner rather similar to that proposed by van Kemenade & Hulk (1995)). More simply, we will refer to AGR which is a generalized licenser as [+L]. So we see that Italian AGR is [+L], while (Modern) French AGR is [-L].
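The implicational statements of sections 4.1 and 4.2 can be checked mechanically once the parameter is fixed. The sketch below is a hypothetical encoding, not a claim about any grammar beyond what the text states: each grammar records AGR's [±L] value plus the two extra LHM conditions named for OSp and EP (the Tobler-Mussafia filter in (11) and a class of clitic auxiliaries unable to host enclisis). Treating null subjects as following directly from [+L] is our simplifying assumption; the text says only that the two capacities are related.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    name: str
    agr_plus_l: bool                  # AGR is a generalized licenser, [+L]
    tobler_mussafia: bool = False     # the filter in (11) holds
    clitic_auxiliaries: bool = False  # auxiliaries that cannot host the required enclisis
    # Unstated extra conditions default to False; they only matter for lhm().


def null_subjects(g):
    # Simplifying assumption: null subjects track [+L] directly.
    return g.agr_plus_l


def clitic_climbing(g, restructuring_verb=True, neg_intervenes=False):
    # (i) the higher V head-governs the trace in C; (ii) the lower AGR head-governs
    # the clitic trace; and no non-L-related Neg breaks the chain, cf. (27b).
    return restructuring_verb and g.agr_plus_l and not neg_intervenes


def lhm(g):
    # [+L] is necessary but not sufficient; OSp/EP add two further conditions.
    return g.agr_plus_l and g.tobler_mussafia and g.clitic_auxiliaries


GRAMMARS = [
    Grammar("Modern Italian", agr_plus_l=True),
    Grammar("Modern French", agr_plus_l=False),
    Grammar("Old Spanish", agr_plus_l=True, tobler_mussafia=True, clitic_auxiliaries=True),
    Grammar("European Portuguese", agr_plus_l=True, tobler_mussafia=True, clitic_auxiliaries=True),
]
ITALIAN, FRENCH = GRAMMARS[0], GRAMMARS[1]

# Italian paradigm: (25a) Lo voglio fare, (25b) *Lo decido di fare, (27b) *Lo voglio non fare.
print(clitic_climbing(ITALIAN))                            # True
print(clitic_climbing(ITALIAN, restructuring_verb=False))  # False
print(clitic_climbing(ITALIAN, neg_intervenes=True))       # False
print(clitic_climbing(FRENCH))                             # False

# "If LHM, then clitic-climbing" and "If LHM, then null subjects" hold for every entry.
for g in GRAMMARS:
    if lhm(g):
        assert clitic_climbing(g) and null_subjects(g), g.name
```

Because lhm() conjoins [+L] with further conditions, the implication to clitic-climbing holds by construction here; the empirical content lies in whether the [±L] values assigned to real grammars are correct.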

4.3 Aux-to-Comp
Rizzi (1982: chapter 3) discusses an Italian construction where a participial or infinitival (or, more marginally, a subjunctive) auxiliary inverts around a subject and assigns nominative Case to that subject, as in (36):
(36) Avendo Gianni fatto questo, . . .
Having John done this, . . .
As with various types of Germanic and French inversion which yield the order Aux-subject-Prt, Rizzi argues that this is an instance of movement to C (Rizzi adduces as evidence the complementary distribution of conditional inversion with “if,” exactly parallel to Germanic and French). Now, a striking property of (contemporary) Italian is that inversion around an overt nominal subject is in general impossible (these examples are more acceptable at a very high stylistic level, but many native speakers reject them):
(37) a. *Ha Gianni preso il libro?   ‘Has G. taken the book?’
b. *Che film ha Gianni visto?   ‘Which film has G. seen?’
In this respect, Italian patterns with French and against Germanic, cf.:
(38) a. Has John spoken?
b. *A Jean parlé?
Rizzi & Roberts (1989 [this volume, Chapter 9]), adapting ideas from Koopman & Sportiche (1991), rule out (38b) by assuming that French chooses a different parametric option for nominative assignment from that chosen by English and the other Germanic languages, one which gives the result that nominative cannot be assigned to Spec of AGRP in inversion contexts. In French, nominative Case cannot be assigned under government; hence there is no way for Jean to receive Case in (38b), as inversion destroys the context for nominative assignment (see Rizzi & Roberts (1989 [this volume, Chapter 9]) for details). This account carries over naturally to Italian, accounting for the ungrammaticality of (37). However, Aux-to-Comp as in (36) is now problematic: why is nominative assignment under government possible here but not in (37)?
The answer to this question requires an elaboration of the account of nominative assignment in Italian. First, it is well-known that Italian allows so-called “free inversion”:
(39) Telefonò Gianni
Phoned Gianni

There are good reasons to think that the inflected verb is not in C here: the fact that the subject must follow the participle in a compound tense rather than appear between the auxiliary and the participle; the fact that the construction is not sensitive to the root/embedded distinction; the fact that the verb is not sensitive to the content of the complementizer, etc. (cf. Kayne (1972)). We thus adopt a variant of the proposals in Giorgi & Longobardi (1991) and Rizzi (1990) and assume that the subject is in its base position here, and that it receives Case from T under government. In (39), the inflected verb has moved on to AGR, but the trace in T is still able to Case-mark the subject thanks to the Government Transparency Corollary (GTC, cf. Baker (1988); note that there is no Agreement Transparency Corollary to save (37) and (38b)):
(40) [AGR′ [AGR V+T+AGR] [TP [T t] [VP t NP]]]
T assigns nominative to the postverbal subject in this configuration. In (37), on the other hand, T does not govern the position of the subject, but the combination AGR+T(+V) does. AGR is also a nominative assigner in Italian, and so where T and AGR combine we have a head with the following structure:
(41) [Tree diagram: complex head formed from AGR[+Nom] and T[+Nom], dominated by AGR]
Following standard assumptions about headedness in derived words (i.e. complex heads) and the percolation of features to the whole word (cf. Lieber (1980), Marantz (1984)), the nominative feature associated with the dominating AGR here is AGR’s and is hence not able to be assigned under government. T’s feature is blocked by the presence of AGR, and T cannot assign nominative from its base position to the subject in Spec of AGRP. This account of nominative assignment in free inversion as opposed to “Germanic inversion” constructions in Italian leads to the following conclusion:
(42) T can assign nominative under government from positions where it has not combined with AGR[+Nom].
(42) provides the answer to the problem posed above. In Aux-to-Comp constructions, T does not combine with AGR[+Nom]. One reason for this could be that T does not combine with AGR at all, but moves to C directly, skipping AGR (this would be possible, since AGR does not m-select T here, as it does in finite clauses). This would give the following structure:
(43) [C V+Tᵢ] AGR [T tᵢ′] [VP . . . tᵢ . . .]

356  Ian Roberts
Thus one natural way to account for the Case-assignment properties of Aux-to-Comp constructions is in terms of an LHM analysis. T can skip AGR here thanks, on the one hand, to the fact that AGR, being an L-related head, does not intervene in the non-L-related chain formed by T-to-C movement, and so T can antecedent-govern its own trace from C. On the other hand, we must assume that AGR is able to head-govern the trace in T. Since AGR does not m-select T here, unlike in finite clauses, there is then no requirement that T move to AGR. Moreover, to the extent that we might wish to assume that AGR always carries a [+Nom] feature, independently of finiteness, Case theory prevents T from moving through AGR. However, there are good reasons not to treat Aux-to-Comp as LHM, but rather as non-m-selected movement through AGR. This is because clitics are moved with the auxiliary to C in this construction:

(44) a. *Avendo Gianni lo fatto, . . .
     b. Avendolo Gianni fatto, . . .
        'John having done it, . . .'

Given our account of Italian clitic-placement based on (23), we are obliged to say that (44b) involves AGR-to-C (recall that we have been assuming that clitics only move beyond AGR where AGR itself moves). So, rather than an LHM account, we propose that Aux-to-C involves adjunction of the non-finite auxiliary to a non-finite AGR which neither m-selects T nor contains a [+Nom] feature. Moving through AGR in this way, the auxiliary "picks up" the clitic (this account means that Kayne's (1991) analysis of enclisis must be slightly weakened, since this instance of enclisis does not involve adjunction of the clitic to an empty functional head at SS, although such an adjunction does take place during the derivation). Since m-selection is not involved, non-finite AGR must be intrinsically a head-governor for this structure to be possible. Once again, the head chain is non-uniform, since T m-selects V.
The trace in T cannot be deleted, since it carries information about finiteness. On the other hand, the trace adjoined to AGR can—and therefore must—be deleted. This is a good result since it is unclear how this trace would be head-governed. So we see that the same property is manifested in Aux-to-C as in LHM constructions more generally, as well as in infinitive-to-AGR movement. This is true despite the fact that the derivation of Aux-to-C does not violate the classical HMC.

5.  The History of French

We have now seen that four constructions are related to AGR's ability to head-govern the trace of non-m-selected, and therefore potentially long, head movement: LHM, clitic-climbing, long infinitive-movement and Aux-to-Comp. Moreover, following Kayne (1989, 1991), non-finite AGR's ability to head-govern in this way appears to be related to its ability to license a null subject in finite clauses. In this section we will see further evidence from the history of French for relating these constructions. Essentially, French lost clitic-climbing, long infinitive-movement, Aux-to-Comp and null subjects together (there are no attested cases of LHM in the Lema & Rivero sense at any stage in the history of French, for reasons that are probably connected to the nature of the OF auxiliary system (cf. the discussion of Italian in section 4.1), so we leave this construction aside in what follows). Modern French lacks clitic-climbing, long infinitive movement, Aux-to-Comp and null subjects:

(45) a. *Je le peux faire
        I it can do
     b. *Ayant Jean fait cela, . . .
        Having John done that . . .
     c. *N'aimer pas ses parents, . . .
        To love not one's parents . . .
        *Je veux faire-le
        'I want to do it'
     d. *Avons fait cela
        '(We) have done that'

Clearly then, Modern French AGR is [-L]. In that case, (45a–c) are ruled out by the head-government requirement of the ECP, as the trace of clitic or verb movement is not head-governed (although, on our assumptions, it is antecedent-governed). In earlier stages of French, on the other hand, the constructions of (45) are attested. In (46) we illustrate clitic-climbing (a), Aux-to-Comp (b) and infinitive-movement (c):

(46) a. Nous lui devons rendre gloire (1536, Calvin)
        We to-him must give glory
     b. Ayant ce bon homme fait tout son possible . . .
        Having this good man done everything possible . . . (Brunot (1905: 670))
     c. Car vous avez le choix de combater ou de ne combatre pas
        For you have the choice to fight or to neg fight not (c15, Jouvencel)

These data indicate that in Middle and Renaissance French AGR was [+L]. So French underwent a change in the parametric value assigned to AGR, roughly in the 17th century. The situation concerning the null-subject parameter at this stage of the history of French is quite complex. It seems best to characterize the system of this period as a "defective" null-subject system. Fully productive null subjects (whose distribution is limited in various ways at all periods of French; see below) are largely lost in the 16th century (cf. Roberts (1993: chapter 2)). In the 17th century, null subjects are still found, but subject to various restrictions. They were (a) restricted, when referential, to certain persons, in particular 2pl; (b) most frequently non-referential; (c) sensitive to the properties of C, in that both referential and non-referential null subjects were favored in questions and relatives. These properties are illustrated in (47) (examples (a) and (b) are from Maupas' (1607) contemporary grammar; (c) contains an expletive pro in Spec of AGRP while the subject has arguably remained in VP):

(47) a. Rarement advient que ces pronoms nominatifs soient obmis.
        'Rarely (it) happens that these nominative pronouns are omitted'
     b. J'ay receu les lettres que m'avez envoyees.
        'I've received the letters that (you) have sent me'
     c. Viendra jamais le jour qui doit finir ma peine?
        'Will-come never the day which must end my pain?' (late c16, Desporal)

Finally, it should be noted that the morphological agreement system was not especially "rich" at this time; the 17th-century paradigms were largely the same as the contemporary ones, which are generally regarded as insufficient for the identification of the content of a null subject. Nevertheless, AGR was presumably able at this period to formally license pro, even if identification of the content of pro depended on other, rather hard to discern, factors. Strikingly, 17th-century French is not the only example of a defective null-subject system of this kind. In recent work, Poletto (1995) has shown that Renaissance Veneto null subjects were restricted in very similar ways (to 1sg and 1pl, otherwise expletive or in the presence of a [+wh] C), and at this period the Veneto agreement paradigms were also rather "poor." The data in (47) confirm the idea that French AGR changed from [+L] to [−L] around the 17th century. This is sufficient to provide a diachronic argument for the correlations that we have proposed and the account that we have given of them, and thus to support the analyses in section 4 and the system of head movement proposed in section 3. The natural question to pose at this point is: what caused the value of [+L] to change in French? In other work (cf. Clark & Roberts (1993 [this volume, Chapter 2]), Roberts (1993)), I have argued for a major parametric change in the early 16th century, roughly between Middle and Renaissance French, which eliminated both V2 and interrogatives like (38b) (A Jean parlé 'Has John spoken?'; cf. the discussion in 4.3). This change involved loss of the possibility of nominative-assignment under government, directly eliminating inversion around a nominal subject and so indirectly eliminating V2.

Now, no attested period of French shows the kind of freely available null subjects that we find in contemporary Italian or Spanish. In Old French, null subjects could only occur in V2 contexts (cf. inter alios Adams (1987), Hirschbühler (1990), Roberts (1993), Vance (1989)). Let us adopt the approach to the licensing of null subjects put forward in Rizzi (1986), which we can formulate as follows:

(48) If X licenses pro in position P in configuration C, X can Case-mark P in C.

This means that null subjects in V2 contexts also depend on the possibility of nominative-assignment under government. Thus the 16th-century loss of this possibility eliminated null subjects from the grammar of French and, in so doing, rendered AGR [-L], i.e. unable to license pro. A knock-on effect of this change was the loss of AGR's ability to head-govern head traces, with the consequences that we have just seen. On this view, then, AGR really became [-L] in the 16th century; the defective null-subject system that survives into the 17th century is a transitional residue, and there is a time lag of roughly a century before the knock-on effect of the change in the nominative parameter is felt. This account relates a whole series of important syntactic changes in French, and provides us with an example of how parametric changes may cascade through a system over a period of time. Although these remarks answer the question posed above, it may be possible to go a step further; here we move onto rather more speculative ground. It is clear from what we have said that the loss of V2 is indirectly connected to the change in the value of [+L]. There are two significant facts about V2 systems generally:

(49) a. C is able to head-govern a subject trace in V2 languages.
     b. V2 languages with referential null subjects (e.g. Old French and Veneto) only allow such null subjects when AGR is in C.
The statement in (49a) is illustrated by the fact that SVO clauses are possible in V2 languages. We follow the tradition beginning with den Besten (1983) (and recently argued for by Schwartz & Vikner (1996)) in treating all root V2 clauses in V2 languages, including SVO clauses, as involving V-movement to C. Hence a German sentence like (50a) has the representation in (50b):

(50) a. Johann liebt Maria
        'John loves Mary'
     b. [CP Johanni [C′ [C liebt] [AGRP ti . . . Maria]]]

Here C head-governs ti. This should be contrasted with the situation in English, where the ungrammaticality of (51) shows that C, when it contains an auxiliary, is unable to head-govern a subject trace (this argument is due to Rizzi (1990)):8

(51) *Whoi did ti leave?

Thus C is a head-governor in V2 languages but not elsewhere. In terms of what we have said about AGR above, this implies that a V2 C is [+L] (note also that a number of authors, notably Tomaselli (1990), have suggested that C in V2 languages is able to license an expletive null subject). Putting this together with the fact that V2 null-subject languages like OF and Medieval Veneto characteristically have poorer agreement inflection than non-V2 null-subject languages like Italian, we are led to the idea that in such a system C is responsible for formally licensing null subjects, while AGR identifies their content. Since, according to Rizzi (1986), the same head must perform both of these functions, it is only where AGR combines with C, i.e. in V2 clauses, that a referential null subject is possible. In such languages, then, both AGR and C are [+L]. The two cases of defective null-subject systems that we have seen in 17th-century French and Renaissance Veneto are both found in languages where a formerly productive V2 system had recently been lost. We can understand this if we consider that the loss of V2 means a loss of productive AGR-to-C movement. Since it is not morphologically rich at this point, AGR alone is unable to license productive null subjects, but a defective system can survive for a time, after which AGR too becomes [-L]. This account retains the idea that defective null-subject systems are transitional between fuller systems and essentially non-null-subject systems (although on this latter point the history of Veneto since the 16th century is interesting and complex; see Poletto (1995), Vanelli (1987)), while attributing the defectiveness less directly to the weakness of the morphology and more to properties of the syntax.
This seems to be the correct move, given that French has changed with regard to the possibilities of null subjects since the 17th century while its morphological agreement system has changed only very slightly. It also makes the time lag between the loss of nominative-assignment under government and the loss of the constructions dependent on non-finite AGR being [+L] easier to understand. This section has provided empirical support for the system of head movement developed in section 3 and the analyses of various Romance constructions in section 4. The constructions discussed in section 4 all depend on AGR being [+L], a property which also underlies AGR's ability to license null subjects. The diachronic evidence from French shows that all these constructions are lost as null subjects are lost, strongly supporting the contention that a single property underlies all these constructions in the way that we have suggested. This evidence also supports our overall system for head movement, and our contention that there is a head-government requirement for head traces which is independent of antecedent-government.


6.  Conclusion: Head-Government in PF

In the foregoing we have argued that there are two types of head movement which obey Relativized Minimality in different ways, depending on the nature of the chains they create. In section 2, we gave three conceptual reasons why such a move is desirable: (a) given a conjunctive ECP, the lack of selective violations of antecedent- or head-government just in the case of head movement is suspect; (b) Rizzi's definition of typical potential antecedent-governor mixes structural and functional notions; (c) the status of head traces regarding the head-government requirement is in general unclear. We dealt with these objections by distinguishing between head movement which is m-selected and forms uniformly L-related chains, and head movement which is not m-selected and forms non-L-related chains. We then extended the RM system in full to the head level, in terms of the definitions of typical potential antecedent-governor given in (21). In their application to head movement, these definitions may derive in large part from economy constraints on chain formation. Head chains formed as a result of m-selection properties of the host head are strictly local as a result of the locality constraints on selection generally. Head chains formed for other reasons (and it should be underlined that it is unclear why infinitives undergo "long" movement in many Romance languages) are not subject to the locality requirement induced by m-selection, but they are subject to the basic constraint that they undergo the minimum movement possible; this is seen most clearly in the LHM cases discussed by Lema & Rivero, which are triggered by the filter in (11). As we noted in section 2, this approach overcomes the conceptual objection that Rizzi's definition of typical potential antecedent-governor mixes structural and functional notions.
We have also clarified the status of head traces with respect to head-government: head traces, like all other traces, are subject to a head-government requirement which is in principle independent of an antecedent-government requirement. In fact, we can construct cases where one requirement is violated while the other is satisfied. (52) is a case where a trace is head-governed but not antecedent-governed:

(52) *Lo voglio che faccia.
      it I-want that I-do(subjunctive)

This is an attempt to perform clitic-climbing from a tensed (subjunctive) clause. It fails because the non-L-related chain formed by clitic movement is unable to "skip" the non-L-related head C (either by excorporation or by substitution; cf. section 4.1). Nevertheless, AGR is able to head-govern the clitic trace in T (which is probably adjoined to T, as in the cases of long infinitive-movement discussed in 4.3).

Conversely, (53) is an example where a head trace is antecedent-governed but not head-governed:

(53) *Have you would t said it to John?

English AGR is clearly [−L], and so it cannot head-govern the trace of long-moved have here. But have, occupying C, is able to antecedent-govern its trace, since AGR, occupied by would, does not count as an intervener (even if would raises from T to AGR, the same is true, since T is an L-related head; see 4.3). What is the nature of the head-government requirement? If the antecedent-government requirement of the ECP largely follows from economy considerations, we might wonder whether the head-government requirement follows from anything. One possibility is that the head-government requirement is a PF condition. In fact, it has been argued in Aoun et al. (1987) that the head-government requirement does not hold in LF. If this is true, then we expect that LF head movement in English and French, if non-m-selected, can form non-local non-L-related chains. For English, this is the case with LF V-to-C movement of the type proposed in Pollock (1989) and Chomsky (1991), and this is why negation selectively blocks V-movement in English. Have/be raising is selected and hence not sensitive to the presence of negation.9 On the other hand, LF V-raising is not m-selected, but really a kind of Quantifier Raising of the type discussed in May (1977, 1985), so negation blocks it, making do-insertion necessary in auxiliary-less negative clauses. The availability of trace-deletion makes no difference here, since there must be a trace in the V position. For French, Kayne (1989) provides evidence that there are LF "restructuring" effects, which we interpret as instances of LF LHM uninhibited by the fact that French AGR is [-L]. The evidence comes from easy-to-please constructions, which were originally related to restructuring by Rizzi (1982).
The initial observation is that, unlike in English, easy-to-please constructions in the Romance languages are not unbounded dependencies (cf. Chomsky (1977); Jaeggli (1982) showed that Spanish patterns with the other Romance languages in this respect). The following English/Romance contrast illustrates the difference:

(54) a. Bill is easy to convince Mary that we should talk to.
     b. *Questo lavoro è facile da promettere di finire per domani.
        'This job is easy to promise to finish for tomorrow.'

However, Rizzi (1982) notes that where the intermediate verb is a restructuring verb, long-distance versions of this construction rather similar to what is found in English are possible:

(55) Questa canzone è facile da cominciare a cantare.
     'This song is easy to start to sing.'

Kayne (1989) observes that the same is true for French, and suggests that the availability of LF restructuring is behind this:

(56) a. ?Ce livre serait impossible à commencer à lire aujourd'hui.
        'This book would be impossible to start to read today.'
     b. *Ce genre de livre est facile à promettre de lire.
        'This kind of book is easy to promise to read.'

Combining our system with the idea that the head-government requirement does not hold in LF yields an account of (56) (and of the absence of (55)) in French. As in the case of overt movement of infinitivals discussed in 4.2, we assume that the trace in T cannot be deleted since it contains information as to finiteness. So we see that there is evidence that the head-government requirement is essentially a PF requirement. The components of the ECP seem to belong to different "interface" components, as Aoun et al. (1987) originally argued. The main theoretical result of the paper is that this is true for X°-traces just as it is true for XP-traces. The main empirical result of this study, which as we have seen supports the overall system of head movement, is a typology of the Romance languages according to whether they set C and AGR as [+L], as follows:

(57)                         AGR    C
     OF, Med. N. It.          +     +
     Mod. It., Sp., Prt       +     −
     Mod. Fr.                 −     −
     Rhaeto-Romansch          −     +

If AGR is [+L], null subjects and "LHM" constructions are possible ceteris paribus. If C is [+L], the language is V2. If both C and AGR are [+L], null subjects will depend on AGR-to-C movement.10 We have not discussed the fourth possibility, as instantiated in Rhaeto-Romansch, but to the extent that this language is reported to be similar to the Germanic languages with regard to its V2 and null-subject possibilities (see Haiman (1988)), this would be its characterization in terms of our system (C [+L], AGR [−L]). This characterization brings together a wide range of empirical phenomena. In particular, to my knowledge, relating LHM, clitic-climbing, long movement of infinitives and Aux-to-Comp to each other is a new claim of this paper. We have shown that the facts of the history of French justify these connections, and that the theory of head movement explains them.
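As an informal illustration only, the conditional predictions just stated can be read as a small decision procedure over the two values in (57). The Python sketch below assumes that each language group is fully characterized by its [±L] values for AGR and C; the names `Grammar`, `predictions` and `TYPOLOGY`, and the output labels, are hypothetical and introduced here for exposition. It encodes only the three conditionals above (read ceteris paribus), together with the tentative Rhaeto-Romansch row, and claims nothing beyond them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A language (group) classified by its two [+/-L] values, as in (57)."""
    name: str
    agr_l: bool  # True if AGR is [+L]
    c_l: bool    # True if C is [+L]


def predictions(g: Grammar) -> dict:
    """Ceteris-paribus predictions read off the conditionals stated for (57)."""
    if g.agr_l and g.c_l:
        # Both heads [+L]: referential null subjects depend on AGR-to-C.
        null_subjects = "only under AGR-to-C"
    elif g.agr_l:
        null_subjects = "possible"
    else:
        # AGR [-L]: referential null subjects are not predicted.
        null_subjects = "not predicted"
    return {
        "V2": g.c_l,  # C [+L] -> V2
        "null_subjects": null_subjects,
        # LHM, clitic-climbing, long infinitive movement, Aux-to-Comp
        "agr_head_government": g.agr_l,
    }


# The four rows of (57); the Rhaeto-Romansch row is the tentative one.
TYPOLOGY = [
    Grammar("OF, Med. N. It.", agr_l=True, c_l=True),
    Grammar("Mod. It., Sp., Prt", agr_l=True, c_l=False),
    Grammar("Mod. Fr.", agr_l=False, c_l=False),
    Grammar("Rhaeto-Romansch", agr_l=False, c_l=True),
]

if __name__ == "__main__":
    for g in TYPOLOGY:
        print(g.name, predictions(g))
```

On this encoding, Modern French (both values [−L]) comes out with neither V2, null subjects nor the AGR-dependent head-movement constructions, which is the cluster of losses traced in section 5.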

Notes

* Earlier versions of this paper have been presented at the 13th GLOW Colloquium, Leiden, the University of Maryland Workshop on Verb Movement and the Workshop on Romance Syntax at University College, London. I'd like to thank the audiences at those meetings for their comments. At Maryland, Sabine Iatridou was the commentator on this paper; the modifications that the paper has subsequently undergone are in large part a response to her comments. I am especially grateful to her. I am also especially indebted to Adriana Belletti, Bob Borsley, Guglielmo Cinque, Maria-Teresa Guasti, Maria-Rita Manzini, Luigi Rizzi and Maria-Luisa Rivero for comments and discussion. None of these people are responsible for my mistakes, though.

1. One antecedent of our proposal, at least in the domain of verb-movement, is Koopman (1984), who proposed that there was both "A"-type verb-movement and "A′"-type verb-movement. Cf. also the proposal in Pollock (1989) that certain cases of verb-movement are triggered by a variable-binding requirement; this amounts to saying that these are cases of A′-movement of V.

2. Or the Specifier of the host; on Rizzi's (1996) account of "residual" V2 in English and French questions, it is essentially Spec of CP which triggers I-to-C in virtue of an abstract morphological property. We can regard this as "induced" m-selection, and so the locality of I-to-C movement in cases of residual V2 is predicted. This account is incompatible with the account of the impossibility of I-to-C movement in indirect questions proposed by Rizzi & Roberts (1996 [this volume, Chapter 9]), but compatible with the one in Rizzi (1996): selection for a [+wh] C in indirect questions satisfies the WH-criterion, and so I-to-C movement is blocked by economy.

3. It may be that the antecedent-government component of the ECP derives from conditions on chain-formation, as suggested both by Rizzi (1990b) and Chomsky & Lasnik (1993) in different ways. However, in what follows we provide clear evidence that head traces must be head-governed. If the ECP is reduced to a head-government requirement, then we have empirical evidence that it applies to head traces. In section 6, we will suggest that the head-government requirement is a PF condition.

4. Rumanian may be a slightly different case from the others, since here LHM is triggered by illocutionary force, being found in questions and exclamations (Rivero (1993)). Following the reasoning in note 2, such movements are the result of an abstract morphological trigger. In that case, Rumanian LHM would be non-local head movement triggered by morphology. On the view being developed here, this should be impossible. We suspect that Rumanian LHM is in fact a type of "inverted conjugation" found in perfect tenses in a range of Romance languages: Old French (Dupuis (1989)), Sardinian (Jones (1993)), Old Spanish (Lema & Rivero (1991a)) and Southern Italian dialects. It is unclear what this phenomenon is, and how it is connected to VP Fronting and to Scandinavian-style Stylistic Fronting (Rögnvaldsson & Thráinsson (1990)), but it is clear that it is not the same as LHM. See Lema & Rivero (1991a) for discussion.

5. On what (11) may ultimately derive from, see Roberts (1992). (11) is certainly not a phonological property (pace Cardinaletti & Roberts (2002 [this volume, Chapter 12])). Dislocated elements, which are presumably outside CP, do not "count"; thus we find enclisis with dislocated elements. This observation is crucial to an understanding of word order in Modern European Portuguese; cf. Benincà (1991), Salvi (1991).

6. Zanuttini (1991) argues that in languages where clausal negation occupies a higher position than the inflected verb, e.g. Italian, Spanish and Portuguese, it selects TP in finite clauses. This relation is c-selection of the standard kind, i.e. a relation between a head and a maximal projection (p. 54). Thus Zanuttini's proposal does not affect the point being made in the text; in fact, she explicitly points out (p. 52) that Romance negative markers are not affixal. Hence they do not m-select anything, so we have no reason to treat Neg as L-related.


The account of OSp and EP LHM in the text implies that all cases of the Tobler-Mussafia Law in Medieval Romance involve LHM, i.e. V-to-C movement. We thus predict that V-to-C movement, i.e. enclisis, is blocked by negation. This is certainly true in contemporary European Portuguese, where a version of the Tobler-Mussafia Law is still operative (Benincà (1991: 24)):

(i) a. Não os compreendo.
    b. *Não compreendo-os.
    '(I) do not understand them'

7. Pace Belletti (1990), we assume that AGR does not m-select T in infinitives (in languages without inflected infinitives). One might then question the idea that AGR is L-related here. Nevertheless, we assume that AGR is intrinsically L-related independently of m-selection (cf. the theory of A-positions proposed by Rizzi (1991)). Recall that when we introduced the notion of L-relatedness we pointed out that we cannot maintain the very strong claim that "X is L-related iff X has an m-selection feature." Instead, we retain the weaker implication that "if X has an m-selection feature, X is L-related." We can tentatively add that AGR is always L-related; note that this could derive the existence of affix-hopping in Modern English.

8. In Chomsky (1991), the impossibility of (51) is attributed to a violation of economy, do being inserted in a context where it is not required. However, economy cannot tell the whole story here, since a parallel argument, due to Friedemann (1991) and summarized in Rizzi (1990b), can be constructed for French on the basis of the ungrammaticality of (i):

(i) *Quei sent ti mauvais?
    'What smells bad?'

Interrogative que must cliticize to a verb in C. However, if the verb moves to C, the trace in subject position cannot be head-governed. If the verb does not move to C, que's cliticization requirement is not satisfied. It is not obvious that economy considerations can explain the ungrammaticality of this example.

9. If we say this, we must reconsider the nature of the locality condition on selection. One possibility is that the only kinds of head that can be m-selected are L-related heads. So, in the spirit of Relativized Minimality, Neg does block m-selection of T by AGR even when it is base-generated in a position which intervenes between these heads.

10. The approach to clitic-climbing presented in 4.2 implies that if C is L-related, clitic-movement can skip it. Thus, independently of any "restructuring" property of the matrix verb, clitics may be able to climb to the higher AGR. The prediction is that V2 languages with clitic systems allow clitic-climbing with verbs like "decide," "promise," etc. (cf. the ungrammaticality of (25b) in Modern Italian). In fact, this is what we find in Old French, in complete conformity with our analysis and the resulting typology; cf. Pearce (1990).

References Adams, M. 1987. Old French, Null Subjects and Verb-Second Phenomena. PhD dissertation, UCLA. Alberton, S. 1990. Enclise du pronom objet en français et en italien antique ou la loi Tobler-Mussafia. Mémoire de Licence, University of Geneva. Aoun, J., N. Hornstein, D. Lightfoot & A. Weinberg. 1987. Two Types of Locality. Linguistic Inquiry 18:537–577. Baker, M. 1988. Incorporation: A Theory of Grammatical-Function Changing. Chicago: University of Chicago Press.

366  Ian Roberts Belletti, Adriana. 1990. Generalized Verb Movement: Aspects of Verb Syntax. Turin: Rosenberg and Sellier. Belletti, A. 1994. Verb positions: evidence from Italian. In D. Lightfoot & N. Hornstein (eds) Verb Movement. Cambridge: Cambridge University Press, pp. 19–40. Benincà, P. 1989. L’ordine delle parole nelle lingue romanze medievali. XIX Congreso Internacional de Lingüística e Filoloxia Romanicas. Santiago de Compostela. Benincà, P. 1995. TOP and SpecCP in Medieval and Modern Romance. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. Cambridge: Cambridge University Press, pp. 325–344. Den Besten, H. 1983. On the interaction of root transformations and lexical deletive rules. In W. Abraham (ed) On the Formal Syntax of the Westgermania. Amsterdam: John Benjamins. Borsley, R., M.-L. Rivero & J. Stephens. 1996. Long head movement in Breton. In R. Borsley & I. Roberts (eds) The Syntax of the Celtic Languages. Cambridge: Cambridge University Press, pp. 53–74. Cardinaletti, Anna, and Ian Roberts. 2002. Clause structure and X-second. In G. Cinque (ed) Functional Structure in DP and IP: The Cartography of Syntactic Structure Volume One New York/Oxford: Oxford University Press, pp. 123–166 [this volume, Chapter 12]. Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge MA: MIT Press. Chomsky, N. 1977. On Wh-Movement. In P. Culicover, T. Wasow & A. Akmajian (eds) Formal Syntax. New York: Academic Press. Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris. Chomsky, N. 1986. Barriers. Cambridge, MA: MIT Press. Chomsky, N. 1991. Some notes on economy of derivation and representation. In R. Friedin (ed) Principles and Parameters in Comparative Grammar. Cambridge, MA: MIT Press, pp. 417–454. Chomsky, N. & H. Lasnik. 1993. The theory of principles and parameters. In J. Jacobs, A. von Stechow, W Sternefeld & T. Vennemann (eds) Syntax: An International Handbook of Contemporary Research. Berlin: de Gruyter. 
Reprinted in N. Chomsky The Minimalist Program. Cambridge MA: MIT Press, 1995, pp. 13–128. Cinque, G. 1991. Types of A’-Dependencies. Cambridge, MA: MIT Press. Clark, R. & I. Roberts. 1993. A computational model of language learnability and language change. Linguistic Inquiry 24:299–345 [this volume, Chapter 2]. Dupuis, F. 1989. L’expression du sujet dans les subordonnées en ancien français. PhD dissertation, University of Montreal. Friedemann, M.-A. 1991. Propos sur la montée du verbe en C dans certaines interrogatives françaises. Mémoire de licence, University of Geneva. Giorgi, A. & G. Longobardi. 1991. The Syntax of Noun Phrases: Configuration, Parameters and Empty Categories. Cambridge: Cambridge University Press. Guasti, M.-T. 1992. Causative and Perception Verbs. PhD dissertation, University of Geneva. Guéron, J. & T. Hoekstra. 1988. T-chains and the constituent structure of auxiliaries. In A. Cardinaletti, G. Cinque & G. Giusti (eds) Constituent Structure. Dordrecht: Foris. Haiman, J. 1988. Rhaeto-Romansch. In N. Vincent & M. Harris (eds) The Romance Languages. London: Routledge. Hirschbuhler, P. 1990. La légitimation de la construction V1 à sujet nul dans la prose et le vers en ancien français. Revue québécoise de linguistique 19:32–55. Jones, M. 1993. Sardinian Syntax. London: Croom Helm. Kayne, R. 1972. Subject inversion in French interrogatives. in J. Casagrande and B. Saciuk (eds.) Generative Studies in Romance Languages. Rowley, MA: Newbury House.

Jaeggli, O. 1982. Topics in Romance Syntax. Dordrecht: Foris.
Kayne, R. 1989. Null subjects and clitic climbing. In O. Jaeggli & K. Safir (eds) The Null Subject Parameter. Dordrecht: Kluwer, pp. 239–261.
Kayne, R. 1991. Romance clitics, verb movement and PRO. Linguistic Inquiry 22:647–686.
Kayne, R. 1992. Italian negative infinitival imperatives and clitic climbing. In L. Tasmowski & A. Zribi-Hertz (eds) De la musique à la linguistique. Hommages à Nicolas Ruwet. Ghent: Communication & Cognition, pp. 300–312.
Van Kemenade, A. & A. Hulk. 1995. Verb second, pro-drop, functional categories and language change. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. New York/Oxford: Oxford University Press, pp. 227–256.
De Kok, A. 1985. La place du pronom personnel régime conjoint en français. Une étude diachronique. Amsterdam: Rodopi.
Koopman, H. 1984. The Syntax of Verbs: From Verb-Movement Rules in the Kru Languages to Universal Grammar. Dordrecht: Foris.
Koopman, H. & D. Sportiche. 1991. The position of subjects. In J. McCloskey (ed) The Syntax of Verb-Initial Languages. Amsterdam: Elsevier, pp. 211–258. [Special issue of Lingua 85.]
Lema, J. 1990. Licensing Conditions on Head Movement. PhD dissertation, University of Ottawa.
Lema, J. & M.-L. Rivero. 1990. Long head movement: ECP vs HMC. In J. Carter et al. (eds) Proceedings of NELS 20, pp. 333–347.
Lema, J. & M.-L. Rivero. 1991a. Types of verbal movement in Old Spanish: Modals, futures, and perfects. Probus 3:237–278.
Lema, J. & M.-L. Rivero. 1991b. Inverted conjugations and V-second effects in Romance. In C. Laeufer & T. Morgan (eds) Theoretical Analyses in Romance Linguistics. Amsterdam: Benjamins, pp. 311–328.
Lieber, R. 1980. On the Organization of the Lexicon. PhD dissertation, MIT.
Marantz, A. 1984. On the Nature of Grammatical Relations. Cambridge MA: MIT Press.
May, R. 1977. The Grammar of Quantification. PhD dissertation, MIT.
May, R. 1985. Logical Form: Its Structure and Derivation. Cambridge MA: MIT Press.
Motapanyane, V. 1991. Theoretical Implications of Complementation in Rumanian. PhD dissertation, University of Geneva.
Mussafia, A. 1983. Scritti di filologia e linguistica, ed. by A. Daniele & L. Renzi. Padua: Antenore.
Ouhalla, J. 1990. Sentential negation, relativized minimality and the aspectual status of auxiliaries. The Linguistic Review 7:183–231.
Pearce, E. 1990. Parameters in Old French Syntax. Dordrecht: Kluwer.
Poletto, C. 1995. The diachronic development of subject clitics in North-Eastern Italian dialects. In A. Battye & I. Roberts (eds) Clause Structure and Language Change. New York/Oxford: Oxford University Press, pp. 295–324.
Pollock, J.-Y. 1989. Verb movement, UG and the structure of IP. Linguistic Inquiry 20:365–424.
Postal, P. 1969. On the so-called pronouns of English. In D. Reibel & S. Schane (eds) Modern Studies in English. Englewood Cliffs, NJ: Prentice-Hall.
Rivero, M.-L. 1991. Long head movement and negation: Serbo-Croatian vs Slovak and Czech. The Linguistic Review 8:319–351.
Rivero, M.-L. 1993. Long head movement vs V2 and null subjects in Old Romance. Lingua 89:217–245. [Special issue on null subjects in diachrony, ed. by A. Hulk & A. van Kemenade.]
Rivero, M.-L. 1994. Clause structure and V-movement in the languages of the Balkans. Natural Language and Linguistic Theory 12:63–120.
Rizzi, L. 1982. Issues in Italian Syntax. Dordrecht: Foris.

Rizzi, L. 1986. Null objects in Italian and the theory of pro. Linguistic Inquiry 17:501–557.
Rizzi, L. 1990. Relativized Minimality. Cambridge MA: MIT Press.
Rizzi, L. 1991. Proper head-government and the definition of A-positions. GLOW Newsletter 26:46–47.
Rizzi, L. 1996. Residual verb-second and the Wh-criterion. In A. Belletti & L. Rizzi (eds) Parameters and Functional Heads. New York/Oxford: Oxford University Press, pp. 63–90.
Rizzi, L. & I. Roberts. 1989. Complex inversion in French. Probus 1:1–30. (Reprinted in A. Belletti & L. Rizzi (eds). 1996. Parameters and Functional Heads. Oxford: Oxford University Press, pp. 91–118.) [This volume, Chapter 9.]
Roberts, I. 1991. Excorporation and minimality. Linguistic Inquiry 22:209–218. [This volume, Chapter 10.]
Roberts, I. 1992. Wackernagel meets the extended projection principle. GLOW Newsletter 28.
Roberts, I. 1993. Verbs and Diachronic Syntax. Dordrecht: Kluwer.
Rögnvaldsson, E. & H. Thráinsson. 1990. On Icelandic word order once more. In J. Maling & A. Zaenen (eds) The Syntax of Modern Icelandic. San Diego: Academic Press.
Salvi, G. 1991. Difesa e illustrazione della legge di Wackernagel applicata alle lingue romanze antiche. In Miscellanea G. B. Pellegrini. Padua: Unipress.
Schwartz, B. & S. Vikner. 1996. The verb always leaves IP in V2 clauses. In A. Belletti & L. Rizzi (eds) Parameters and Functional Heads: Essays in Comparative Syntax. New York/Oxford: Oxford University Press, pp. 11–62.
Sportiche, D. 1998. Movement, agreement and case assignment. In D. Sportiche (ed) Partitions and Atoms of Clause Structure. London: Routledge, pp. 88–243.
Tobler, A. 1875. Review of J. Le Coultre, De l’ordre des mots dans Chrétien de Troyes. Reprinted in A. Tobler (1912) Vermischte Beiträge zur französischen Grammatik, vol. V. Leipzig.
Tomaselli, A. 1990. La sintassi del verbo finito nelle lingue germaniche. Padua: Unipress.
Travis, L. 1984. Parameters and Effects of Word Order Variation. PhD dissertation, MIT.
Vance, B. 1989. Null Subjects and Syntactic Change in Medieval French. PhD dissertation, Cornell University.
Vanelli, L. 1987. I pronomi soggetto nei dialetti italiani settentrionali dal Medio Evo a oggi. Medioevo Romanzo 12.
Zanuttini, R. 1991. Syntactic Properties of Sentential Negation: A Comparative Study of Romance Languages. PhD dissertation, University of Pennsylvania.

12 Clause Structure and X-Second
Anna Cardinaletti and Ian Roberts

The purpose of this essay is to propose a unified analysis of a range of “second-position” phenomena that have been attested in various languages, and in so doing to motivate a more elaborated theory of Nominative Case assignment. The proposal is that many languages, including a number of Germanic and Romance languages, have a projection that intervenes between Comp and the highest Infl-type projection (which, following Belletti 1990, we take to be AgrP). We refer to this projection as Agr1P, and we refer to the lower, “traditional” AgrP as Agr2P. Thus our claim is that in the languages in question there are two Agr-heads and two projections of Agr. These two Agrs are both “subject” Agrs; in this respect, our proposal is distinct from, but compatible with, Chomsky’s (1989) idea that, in addition to the standard “subject” Agr, there is also an “object” Agr. We show here that this proposal is of considerable empirical value: it offers a new perspective on a range of second-position phenomena and allows us to connect “verb-second” effects with various kinds of “clitic-second” effects, known in traditional grammar as Wackernagel’s Law and the Tobler-Mussafia Law.1 In fact, we suggest that the presence of Agr1P is fundamentally related to Nominative Case assignment, in that the basic property of Agr1° seems to be that of assigning Nominative Case; the other properties that we ascribe to it (e.g., attracting clitics and attracting the inflected verb) are intimately related to its Nominative-assigning property. In this sense, it may be best to think of Agr1P as NomP. As a working hypothesis, then, we assume that in languages that have both Agr1P and Agr2P, Agr2° is not an assigner of Nominative Case. Because our focus is on the interaction of clause structure with structural Case assignment, we concentrate almost exclusively on the processes and properties of S-structure.
It is a classic tenet of generative grammar that inflectional affixes may be separate syntactic entities at pre-phonological levels of representation; see the analysis of the English auxiliary system in Chomsky (1957). At the turn of the 1990s, this idea received new impetus, beginning with Pollock (1989). Pollock’s proposal that the Infl-node of Chomsky (1981) should be split into its morphological components led to the working hypothesis in much recent research that any inflectional head which appears to be syntactically relevant heads its own maximal projection with the standard X-bar theoretic structure. Our proposal amounts to the claim that certain languages have a special X-bar projection for the assignment of Nominative Case; in terms of the connection with inflectional morphology, it may be possible to relate the existence of Agr1P to the possession of morphologically realized Nominative Case. The structure that we are proposing is as follows:

[CP Spec [C′ C° [Agr1P Spec [Agr1′ Agr1° [Agr2P Spec [Agr2′ Agr2° TP ]]]]]]

The order of TP and Agr2° varies in the languages under consideration: in West Germanic TP is on the left of Agr2° (see Giusti 1986 for the proposal that IP is head-final in German); in North Germanic and Romance, it is on the right. By contrast, Agr1°, like C°, precedes Agr2P in all the languages we discuss. The essay is organized as follows: in section 1 we analyze the phenomenon of embedded verb-second, basing ourselves largely on the best-known case of this type: Icelandic (although we also analyze both Old French and Yiddish in these terms). In section 2 we show how our system gives a natural analysis of the “clitic-second” phenomena in Germanic and Romance; in our terms the traditionally recognized Wackernagel position of Germanic languages is Agr1°, as is the clitic position in those Romance languages that obey the Tobler-Mussafia Law. The last two sections deal with ways in which the properties of Agr1° vary parametrically: we discuss different modes of Nominative Case assignment in section 3, where we present and elaborate the recent proposals of Koopman and Sportiche (1991), and different kinds of null subjects in section 4. Finally, the appendix is devoted to the discussion of Stylistic Fronting. We have also added a postscript to this version of the paper, explaining its rather unusual and protracted prepublication history.

1.  Agr1 as a Position for the Inflected Verb: Embedded V2

1.1  Icelandic
The postulation of Agr1P allows us to account in a straightforward way for certain differences within the class of verb-second languages. While in many languages (for example, German, Dutch, and Mainland Scandinavian) verb-second is essentially a root phenomenon, it appears to be generalized to all types of embedded clauses in Icelandic. The following data (from Rögnvaldsson and Thráinsson 1990 and Thráinsson, personal communication) illustrate this, showing that in a variety of embedded clauses we have the order XP > V > subject:2

(1) a. Ég held að þegar hafi María lesið þessa bók.
       I believe that already has M. read this book
       ‘I believe that Mary has read this book already.’
    b. Ég harma að þegar hafi María lesið þessa bók. (factive)
       I regret that already has M. read this book
       ‘I regret that Mary has already read this book.’
    c. Ég spurði hvort þegar hefði María lesið þessa bók. (Wh)
       I asked whether already had M. read this book
       ‘I asked whether Mary had already read this book.’
    d. sú staðreynd að þegar hefur María lesið þessa bók (NP)
       the fact that already had M. read this book
       ‘the fact that Mary had already read this book’
    e. bókin sem þegar hefur María lesið (relative)
       book-the that already had M. read
       ‘the book that Mary had already read’

The Icelandic situation, as illustrated in (1), contrasts with what we find in German. In German, embedded V2 is possible only in a limited class of embedded clauses, essentially the complements to “bridge verbs” of the type in (1a). Embedded V2 is excluded in all the contexts parallel to (1b–e):3

(2) a. Ich glaube, gestern hat Maria dieses Buch gelesen.
       I believe yesterday has M. this book read
       ‘I believe Mary read this book yesterday.’
    b. *Ich bedauere, (daß) gestern hat Maria dieses Buch gelesen.
       I regret (that) yesterday has M. this book read
    c. *Ich frage mich, ob gestern hat Maria dieses Buch gelesen.
       I ask myself whether yesterday has M. this book read
    d. *die Tatsache, gestern hat Maria dieses Buch gelesen
       the fact yesterday has M. this book read
    e. *das Buch, das gestern hat Maria gelesen
       the book which yesterday has M. read

For (2a), an analysis in terms of CP-recursion seems to be in order. As noted in Rizzi and Roberts (1996 [this volume, Chapter 9]), the same class of verbs allows an otherwise root phenomenon (subject-aux inversion triggered by a negative-polarity item) in its complement in English:

(3) a. I believe that only in America could you do such a thing.
    b. *I wonder whether only in America could you do such a thing.

It seems, then, that independently of verb-second, the complements of bridge verbs are able to have root properties. We propose, still following Rizzi and Roberts (1996 [this volume, Chapter 9]), that this is because bridge verbs allow CP-recursion in their complements. More precisely, we propose that bridge verbs select a C° which selects another C°; to avoid unlimited recursion at the C-level, we must assume that the two C°s have different properties (e.g., that the first allows a “propositional” complement, while the second only allows a “predicational” complement in the terms of Rizzi 1990b). In German, the two C°s are different in form: the first is null, and the second is filled by the inflected verb, like the C° of a matrix clause. (In English, too, the C° that selects CP differs from other C°s: the complementizer that in (3a) cannot be deleted, whereas other occurrences of the complementizer that can be.) Adopting this analysis for (2a), we propose the following partial structure:

(4) . . . glaube [CP Ø [CP gestern [C hat [ Maria . . . ] ] ] ]

Vikner (1995) proposes extending this analysis to embedded verb-second in Icelandic. This entails that CP-recursion is generalized in Icelandic, while it is limited to a specific class of complements in German, Mainland Scandinavian, and English. In other words, the property of selecting C° would be available for all classes of C° in Icelandic. If this were true, however, there would be no way to avoid unlimited recursion of C°, which is clearly an undesirable consequence. Our proposal, by contrast, provides a straightforward account of the data in (1). These examples have the following structure (although, to the extent that the class of verbs which allows CP-recursion in German also allows it in Icelandic, (1a) may also have a structure like (4), with að in the higher C° and the inflected verb in the lower C°):4

(5) [CP C° [Agr1P TOP [Agr1′ Vi+Agr1 [Agr2P NPNom [Agr2° ti ] . . . ]]]]

As we will see in more detail in section 3, the special property of Icelandic is that SpecAgr1′ is a topic position, while the usual superficial subject position is SpecAgr2′. If we did not assume a double-Agr structure, no other position would be available for the subject. Since there is no generalized CP-recursion, SpecC′ is not available. Although we assume that the subject is base-generated in VP (see in particular Koopman and Sportiche 1991), the base position of the subject is unavailable, at least for a definite NP, since it is not a position that can receive Nominative Case from Agr° (see section 4 for some evidence that indefinite NPs can appear in this position). This idea is confirmed by the fact that definite subjects always precede VP-adverbs, as in . . . að þegar hefur María oft lesið þessa bók ‘that already has Mary often read this book’ (Thráinsson, personal communication; see also Vikner 1995, 68). Furthermore, we follow Rizzi (1990a) in assuming that SpecT′ is inherently an A′-position and, as such, is not a possible landing site for the subject. In fact, if we adopt Rizzi’s (1991) characterization of potential A-positions as either θ-positions or specifiers of Agr, SpecT′ must be an A′-position. Hence, SpecAgr2′ is the position of the definite subject.

Using this analysis of embedded clauses, we are not forced to treat matrix V2 in Icelandic as involving movement of the verb to C°. Movement to Agr1° would clearly suffice to derive the same orders (see Rögnvaldsson and Thráinsson 1990 and the references given there for recent discussion of similar proposals). At the same time, the data do not force us to reject a movement-to-C analysis. One property of Icelandic, however, suggests that matrix V2 should in fact be handled in terms of verb-movement to Agr1° rather than to C°: Icelandic makes much more frequent use of declarative V1 orders than do the other (Modern) Germanic languages (aside from Yiddish). Declarative V1 is illustrated in the following example:

(6) Hitti hann þá einhverja útlendingar.
    met he then some foreigners
    ‘He then met some foreigners.’ (Sigurðsson 1985, 1)

We propose that in (6) the inflected verb has undergone structure-preserving topicalization (i.e., topicalization of an X° category to another head position Y°). This operation probably takes place for reasons connected to information structure, since the examples in question seem to be presentational sentences. We propose that the landing site of this operation is C°; these are the only declarative clauses in which the verb is in C° (as in other Germanic and Romance languages, the inflected verb is typically in C° in matrix interrogatives, imperatives, and hypotheticals). As we will see in section 2.2, this kind of “verb-topicalization” is not restricted to Icelandic but is also found in Medieval Romance languages (see also Benincà 1989; Alberton 1990).
More generally, we expect this possibility to exist in all languages that realize V2 at the Agr1P level, since in such cases C° is freely available as a landing site for structure-preserving topicalization. Since the V2 requirement is satisfied elsewhere, SpecC′ in such languages can remain empty. This analysis implies that declarative operators do not exist, since otherwise we would expect V1 declaratives to be generally possible on a par with V1 interrogatives, hypotheticals, and so on.5 One reason to favor an analysis of (6) in which the verb moves to C° over one in which the verb moves only to Agr1°, while the subject stays in SpecAgr2′, is that this kind of V1 is a root phenomenon. Since den Besten (1983), the simplest treatment of root phenomena has been to say that they involve movement to C°, a position available in principle in matrix clauses but unavailable in embedded clauses. If matrix clauses are Agr1Ps in Icelandic, this implies that verb-second is not a unified phenomenon in the Germanic languages, at least in the sense that the landing site of the verb may vary cross-linguistically. We will see further evidence in favor of this conclusion as we proceed. See also Diesing (1988, 1990) and Santorini (1988, 1989) for a similar conclusion based on Yiddish evidence (and see section 1.3 here for some discussion of Yiddish). To sum up, following on from our “double-Agr” hypothesis about basic clause structure, the following conclusions emerge for Icelandic:

(7) a. Agr1° can assign Nom under government (see section 3).
    b. SpecAgr2′ is a subject position.
    c. SpecAgr1′ can be a topic position.
    d. SpecC′ is an operator position.

1.2  Old French
The V2 nature of Old French (OF) is illustrated clearly by the examples in (8) (non-nominative clitics, e.g., en in (8b), are effectively part of the inflected verb, and so do not “count” in the computation of the second position):

(8) a. Einsint aama la demoisele Lancelot. (Adams 1987b, 50)
       thus loved the lady Lancelot
       ‘Thus the lady loved L.’
    b. Desuz un pin en est li reis alez. (Schulze 1888, 200)
       under a pine-tree of-it is the king gone
       ‘The king went underneath a pine tree.’
    c. Quatre saietes ot li bers au costé. (Le Charroi de Nîmes, l. 20)6
       four boats-of-war had the baron at-the side
       ‘The baron had four boats of war at his side.’

Adams (1987a,b) shows that V2 is possible in the complements to bridge verbs. The class of bridge verbs in question is comparable to the class which in V2 Germanic langu