Diachronic Syntax (Oxford Textbooks in Linguistics) [2 ed.] 9780198861461, 019886146X

This second edition of Ian Roberts's highly successful textbook on diachronic syntax has been fully revised and upd

159 71 7MB

English Pages 752 [751] Year 2022

Polecaj historie

Natural Language Syntax (Oxford Textbooks in Linguistics) 9780199230174, 9780199230181, 019923017X

In this book, Peter Culicover introduces the analysis of natural language within the broader question of how language wo

151 43 3MB Read more

Compositional Semantics: An Introduction to the Syntax/Semantics Interface (Oxford Textbooks in Linguistics) 9780199677146, 9780199677153, 019967714X

This book provides an introduction to compositional semantics and to the syntax/semantics interface. It is rooted within

154 47 3MB Read more

Pragmatics (Oxford Textbooks in Linguistics) [2 ed.] 9780199577767, 0199577765

Yan Huang's highly successful textbook on pragmatics - the study of language in use - has been fully revised and up

185 16 2MB Read more

Grammaticalization (Oxford Textbooks in Linguistics) 9780198748540, 9780198747857, 019874854X

This textbook introduces and explains the fundamental issues, major research questions, and current approaches in the st

140 103 3MB Read more

Linguistic Typology (Oxford Textbooks in Linguistics) 9780199677092, 9780199677498, 0199677093

This textbook provides a critical introduction to major research topics and current approaches in linguistic typology, t

186 91 4MB Read more

Diachronic syntax : the Kartvelian case 9780126135183, 0126135185

151 88 111MB Read more

Diachronic and Comparative Syntax 1138233048, 9781138233041

This book brings together for the first time a series of previously published papers featuring Ian Roberts' pioneer

759 138 3MB Read more

The Oxford Handbook of Comparative Syntax (Oxford Handbooks in Linguistics) 0195136500, 9780195136500

Comparison across formal languages is an essential part of formal linguistics. The study of closely-related varieties ha

109 3 6MB Read more

An Introduction to Multilingualism: Language in a Changing World (Oxford Textbooks in Linguistics) 9780198791102, 9780198791119, 0198791100

This book offers an introduction to the many facets of multilingualism in a changing world. It begins with an overview o

175 122 7MB Read more

Austroasiatic Syntax in Areal and Diachronic Perspective 9004396950, 9789004396951

This volume elevates historical morpho-syntax to a research priority in the field of Southeast Asian language history, t

950 101 6MB Read more

Diachronic Syntax (Oxford Textbooks in Linguistics) [2 ed.]
9780198861461, 019886146X

Author / Uploaded
Ian Roberts

Table of contents :
Cover
Diachronic Syntax
Copyright
Dedication
Contents
Preface to the second edition
Preface to the first edition
List of figures, tables, and boxes
Figures
Tables
Boxes
List of abbreviations and acronyms
Introduction
Further reading
Works on the history of linguistics
Textbooks on syntax
Chomsky’s work and introductions to it
Historical linguistics
1: Formal comparative and historical syntax
Introduction
1.1. UG and variation in grammatical systems
1.2. Null subjects
1.2.1. Null subjects in the synchronic dimension I: consistent null-subject languages
1.2.2. Null subjects in the synchronic dimension II: radical null-subject languages
1.2.3. Null subjects in the synchronic dimension III: partial null-subject languages
1.2.4. Null subjects in the synchronic dimension IV: types of null subjects
1.2.5. Null subjects in the diachronic dimension I: European and Brazilian Portuguese
1.2.6. Null subjects in the diachronic dimension II: early Germanic
1.2.7. Null subjects in the diachronic dimension III: French
1.2.8. Conclusion: types of parameters
1.3. Verb-movement parameters
1.3.1. Verb-movement in the synchronic dimension
1.3.1.1. Verb-movement to T
1.3.1.2. V-movement to C: full and residual V
1.3.1.3. Further properties related to verb-movement
1.3.2. Verb-movement in the diachronic dimension
1.3.2.1. V-to-T in earlier English
1.3.2.2. V in diachrony
1.4. Negative concord
1.4.1. Negative concord synchronically
1.4.2. Negative concord and n-words in the diachronic dimension
1.4.3. Summary and conclusion
1.5. Wh-movement
1.5.1. Wh-movement parameters
1.5.2. Wh-movement in the diachronic domain
1.6. Head-complement order
1.6.1. Head-complement order synchronically
1.6.2. Head-complement order diachronically
1.6.3. The Final-Over-Final Condition
1.7. Summary
Further reading
The history of French
Null subjects
Subject clitics
General aspects of generative theory
Verb-movement
Negation
Wh-movement
Language acquisition and parameter-setting
Language typology
The history of English
Germanic syntax
Older Indo-European languages
2: Types of syntactic change
Introduction
2.1. Reanalysis
2.1.1. The nature of reanalysis
2.1.2. Transparency and Tolerance
2.1.3. Phonology and reanalysis
2.1.4. Expressing parameters
2.1.5. Reanalysis and the poverty of the stimulus
2.1.6. Conclusion
2.2. Grammaticalization
2.3. Argument structure
2.3.1. Thematic roles and grammatical functions
2.3.2. Changes in English psych verbs and recipient passives
2.4. Changes in complementation
2.5. Word-order change
2.5.1. Introduction
2.5.2. Early typological approaches to word-order change
2.5.3. Generative accounts and directionality parameters
2.5.4. ‘Antisymmetric’ approaches to word-order change and the Final-Over-Final Condition
2.5.5. Conclusion
2.6. Conclusion
Further reading
Reanalysis, abduction, and learnability
V-to-T movement and the development of English auxiliaries
The effects of the loss of dative case and the development of psychological predicates in English
Grammaticalization
Word-order change in English
Other work on the history and varieties of English
Germanic syntax
Latin and Romance syntax
The structure of DPs
3: Acquisition, learnability, and syntactic change
Introduction
3.1. Parameter-setting in first-language acquisition: a scenario for language change?
3.1.1. Introduction
3.1.2. Root infinitives
3.1.3. Early null subjects
3.1.4. Conclusion
3.2. The logical problem of language change
3.3. The changing trigger
3.3.1. Contact-driven parameter-resetting
3.3.2. Cue-driven parameter-resetting
3.3.3. Morphologically driven parameter-resetting
3.3.4. Conclusion
3.4. Markedness and complexity
3.4.1. The concept of markedness
3.4.2. Markedness and parameters
3.4.3. The Subset Principle
3.4.4. Markedness and core grammar
3.4.5. Markedness and inflectional morphology
3.4.6. Markedness, directionality, and uniformitarianism
3.4.7. Conclusion
3.5. Parameter-setting, parameter hierarchies, and syntactic change
3.5.1. Parameter hierarchies
3.5.2. Parameters and diachronic stability
3.5.3. From unmarked to marked: traversing parameter hierarchies
3.5.4. Conclusion
3.6. Conclusion
Further reading
Principles-and-parameters theory
Learnability and markedness
Language acquisition
Further relevant works on syntactic theory
Other relevant work on historical and typological syntax
Phonological theory and phonological change
4: The dynamics of syntactic change
Introduction
4.1. Gradualness
4.1.1. Introduction
4.1.2. Lexical diffusion: nanoparametric change
4.1.3. Microparametric change
4.1.4. Formal optionality
4.1.5. The Constant Rate Effect
4.1.6. Conclusion
4.2. The spread of syntactic change
4.2.1. Introduction
4.2.2. Orderly differentiation and social stratification
4.2.3. Grammars in competition
4.2.4. Formal optionality again
4.2.5. Abduction and actuation
4.2.6. Change in progress? Null subjects in Brazilian Portuguese
4.2.7. Conclusion
4.3. Drift: the question of the direction of change
4.3.1. Introduction
4.3.2. Typology, the Final-Over-Final Condition and drift
4.3.3. Drift and parametric change
4.3.4. Cascading parameter changes in the history of English
4.3.5. Conclusion
4.4. Reconstruction
4.4.1. Introduction
4.4.2. Traditional comparative reconstruction
4.4.3. Questions about syntactic reconstruction
4.4.4. The correspondence problem
4.4.5. The ‘pool of variants’ problem
4.4.6. Parametric comparison
4.4.7. Conclusion
4.5. Conclusion
Further reading
Historical and Indo-European linguistics
Morphology and phonology
Italian dialects
Sociolinguistics
5: Contact, creoles, and change
Introduction
5.1. Second-language acquisition, interlanguage, and syntactic change
5.2. Contact and substrata
5.2.1. Introduction
5.2.2. Contact and word-order change in the history of English
5.2.3. Substratum effects: Hiberno-English and Welsh English
5.2.4. A ‘borrowing scale’
5.2.5. Conclusion
5.3. Creoles and creolization
5.3.1. Introduction: pidgins and creoles
5.3.2. The Language Bioprogram Hypothesis
5.3.3. The substratum/relexification hypothesis
5.3.4. Conclusion: how ‘exceptional’ are creoles?
5.4. Language creation in Nicaragua
5.5. Conclusion
Further reading
Pidgins and creoles
The Celtic languages and English
Contact and the history of English
Nicaraguan and other sign languages
Epilogue
Glossary
References
Index of Subjects
Index of Names
Index of Languages

Citation preview

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

Diachronic Syntax

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

O X F OR D T E X T B O O K S IN L I N G UI S T I C S

PUBLISHED

The Grammar of Words An Introduction to Linguistic Morphology   by Geert Booij

Compositional Semantics An Introduction to the Syntax/Semantics Interface by Pauline Jacobson

A Practical Introduction to Phonetics   by J. C. Catford

The History of Languages An Introduction by Tore Janson

An Introduction to Multilingualism Language in a Changing World by Florian Coulmas

The Lexicon An Introduction by Elisabetta Ježek

Meaning in Use An Introduction to Semantics and Pragmatics   by Alan Cruse

A Functional Discourse Grammar for English by Evelien Keizer Grammaticalization by Heiko Narrog and Bernd Heine

Natural Language Syntax by Peter W. Culicover Principles and Parameters An Introduction to Syntactic Theory by Peter W. Culicover A Semantic Approach to English Grammar by R. M. W. Dixon

Diachronic Syntax   by Ian Roberts Speech Acts and Clause Types English in a Cross-Linguistic Context by Peter Siemund Linguistic Typology by Jae Jung Song

Semantic Analysis A Practical Introduction by Cliff Goddard

Cognitive Grammar An Introduction by John R. Taylor

Pragmatics   by Yan Huang

Linguistic Categorization   by John R. Taylor IN PREPARATION

Codeswitching by Jeff MacSwan An Introduction to Phonetics and Phonology by Nathan Sanders

Cognitive Grammar An Introduction   by John R. Taylor

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

Diachronic Syntax IAN ROBERTS SECOND EDITION

1

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

3

Great Clarendon Street, Oxford,  , United Kingdom Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries © Ian Roberts ,  The moral rights of the author have been asserted First Edition published in  Second Edition published in  Impression:  All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above You must not circulate this work in any other form and you must impose this same condition on any acquirer Published in the United States of America by Oxford University Press  Madison Avenue, New York, NY , United States of America British Library Cataloguing in Publication Data Data available Library of Congress Control Number:  ISBN –––– Printed in Great Britain by Ashford Colour Press Ltd, Gosport, Hampshire Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

To all parameter-setting devices everywhere, but most especially Julian and Lydia (whose cognitive states have crystallized since the first edition)

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

Contents xiii xiii xv xvii

Preface to the second edition Preface to the first edition List of figures, tables, and boxes List of abbreviations and acronyms



Introduction



Further reading

1.

Formal comparative and historical syntax

 

Introduction 1.1. UG and variation in grammatical systems



1.2. Null subjects 1.2.1. Null subjects in the synchronic dimension I: consistent null-subject languages 1.2.2. Null subjects in the synchronic dimension II: radical null-subject languages 1.2.3. Null subjects in the synchronic dimension III: partial null-subject languages 1.2.4. Null subjects in the synchronic dimension IV: types of null subjects 1.2.5. Null subjects in the diachronic dimension I: European and Brazilian Portuguese 1.2.6. Null subjects in the diachronic dimension II: early Germanic 1.2.7. Null subjects in the diachronic dimension III: French 1.2.8. Conclusion: types of parameters



1.3. Verb-movement parameters 1.3.1. Verb-movement in the synchronic dimension 1.3.1.1. Verb movement to T 1.3.1.2. V-movement to C: full and residual V 1.3.1.3. Further properties related to verb-movement

    

vii

       

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

CONTENTS

1.3.2. Verb-movement in the diachronic dimension 1.3.2.1. V-to-T in earlier English 1.3.2.2. V in diachrony

  

1.4. Negative concord 1.4.1. Negative concord synchronically 1.4.2. Negative concord and n-words in the diachronic dimension 1.4.3. Summary and conclusion

 

1.5. Wh-movement 1.5.1. Wh-movement parameters 1.5.2. Wh-movement in the diachronic domain

  

1.6. Head-complement order 1.6.1. Head-complement order synchronically 1.6.2. Head-complement order diachronically 1.6.3. The Final-Over-Final Condition

   

1.7. Summary

 

Further reading

2.

 



Types of syntactic change



Introduction 2.1. Reanalysis 2.1.1. The nature of reanalysis 2.1.2. Transparency and Tolerance 2.1.3. Phonology and reanalysis 2.1.4. Expressing parameters 2.1.5. Reanalysis and the poverty of the stimulus 2.1.6. Conclusion

      

2.2. Grammaticalization



2.3. Argument structure 2.3.1. Thematic roles and grammatical functions 2.3.2. Changes in English psych verbs and recipient passives

  

2.4. Changes in complementation



2.5. Word-order change 2.5.1. Introduction 2.5.2. Early typological approaches to word-order change 2.5.3. Generative accounts and directionality parameters

   

viii

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

CONTENTS

2.5.4. ‘Antisymmetric’ approaches to word-order change and the Final-Over-Final Condition 2.5.5. Conclusion



2.6. Conclusion



Further reading

3.

 

Acquisition, learnability, and syntactic change

 

Introduction 3.1. Parameter-setting in first-language acquisition: a scenario for parameter change? 3.1.1. Introduction 3.1.2. Root infinitives 3.1.3. Early null subjects 3.1.4. Conclusion

    

3.2. The logical problem of language change



3.3. The changing trigger 3.3.1. Contact-driven parameter-resetting 3.3.2. Cue-driven parameter-resetting 3.3.3. Morphologically driven parameter-resetting 3.3.4. Conclusion

    

3.4. Markedness and complexity 3.4.1. The concept of markedness 3.4.2. Markedness and parameters 3.4.3. The Subset Principle 3.4.4. Markedness and core grammar 3.4.5. Markedness and inflectional morphology 3.4.6. Markedness, directionality, and uniformitarianism 3.4.7. Conclusion

       

3.5. Parameter-setting, parameter hierarchies, and syntactic change 3.5.1. Parameter hierarchies 3.5.2. Parameters and diachronic stability 3.5.3. From unmarked to marked: traversing parameter hierarchies 3.5.4. Conclusion

     

3.6. Conclusion



Further reading

ix

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

CONTENTS

4.

The dynamics of syntactic change

 

Introduction 4.1. Gradualness 4.1.1. Introduction 4.1.2. Lexical diffusion: nanoparametric change 4.1.3. Microparametric change 4.1.4. Formal optionality 4.1.5. The Constant Rate Effect 4.1.6. Conclusion

      

4.2. The spread of syntactic change 4.2.1. Introduction 4.2.2. Orderly differentiation and social stratification 4.2.3. Grammars in competition 4.2.4. Formal optionality again 4.2.5. Abduction and actuation 4.2.6. Change in progress? Null subjects in Brazilian Portuguese 4.2.7. Conclusion

     

4.3. Drift: the question of the direction of change 4.3.1. Introduction 4.3.2. Typology, the Final Over Final Condition and drift 4.3.3. Drift and parametric change 4.3.4. Cascading parameter changes in the history of English 4.3.5. Conclusion

   

4.4. Reconstruction 4.4.1. Introduction 4.4.2. Traditional comparative reconstruction 4.4.3. Questions about syntactic reconstruction 4.4.4. The correspondence problem 4.4.5. The ‘pool of variants’ problem 4.4.6. Parametric comparison 4.4.7. Conclusion

       

4.5. Conclusion



 

 



Further reading

x

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

CONTENTS

5.

Contact, creoles, and change

 

Introduction 5.1. Second-language acquisition, interlanguage, and syntactic change



5.2. Contact and substrata 5.2.1. Introduction 5.2.2. Contact and word-order change in the history of English 5.2.3. Substratum effects: Hiberno-English and Welsh English 5.2.4. A ‘borrowing scale’ 5.2.5. Conclusion

 

5.3. Creoles and creolization 5.3.1. Introduction: pidgins and creoles 5.3.2. The Language Bioprogram Hypothesis 5.3.3. The substratum/relexification hypothesis 5.3.4. Conclusion: how ‘exceptional’ are creoles?

    

5.4. Language creation in Nicaragua



5.5. Conclusion



   



Further reading

     

Epilogue Glossary References Index of subjects Index of names Index of languages

xi

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

Preface to the second edition First of all, I’d like to thank Vicki Sunter at OUP for suggesting that I produce a second edition of this book, and then for so patiently waiting for it to materialize. Second, I’d like to thank many friends and colleagues for their comments on various revisions and for their help in obtaining copies of books, papers, and other resources in the period from April to August , when I had no access to my own books or direct physical access to libraries: Theresa Biberauer, Margaret Deuchar, Eugênia Duarte, Teresa Guasti, Caroline Heycock, Mary Kato, Adam Ledgeway, Anna Roussou, Bonnie Schwartz, Bert Vaux, Nigel Vincent, and Charles Yang. A special thanks to Olenka Dmytryk of the Modern and Medieval Languages and Linguistics Library at the University of Cambridge for locating the Atlas of Pidgin and Creole Structures (APiCS) online. Third, I’d like to thank the audiences at the presentations of some of the material in Chapters  to  in Cambridge and Geneva for their comments and questions. Finally, thanks to Alex Cairncross for compiling the Indexes. San Polo di Torrile, September 

Preface to the first edition I’d like to thank the following people for their help, at different times and in different ways, with this book: Roberta d’Alessandro, Bob Berwick, Theresa Biberauer, Anna Cardinaletti, Lucia Cavalli, Chris Cummins, Teresa Guasti, Anders Holmberg, Nina Hyams, Judy Kegl, Ruth King, Adam Ledgeway, Glenda Newton, Ilza Ribeiro, Luigi Rizzi, Anna Roussou, Bonnie Schwartz, Christina Sevdali, Michelle Sheehan, Nigel Vincent, David Willis, the reviewers at Oxford University Press, and John Davey. All the errors are, as ever, entirely mine. I’d also like to thank the University of Cambridge for giving me sabbatical leave in –, which made it possible for me to finish the book. Downing College Cambridge January 

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

List of figures, tables, and boxes Figures . An idealized graphical change . Auxiliary do. Percentage of do forms in different types of sentence, – . Social stratification of (r) in New York City . The rate of overt pronominal subjects in the nineteenth and twentieth centuries . Sample parametric distances based on the data in Table . . Result of application of KITSCH to the data in Table .

Tables . Synchronic verb-movement parameters . Parameters of verb-movement in older Germanic, Romance, and Celtic languages . Verbal agreement inflection in Middle English . Occurrences of null and overt third-person subjects in spoken EP and BP (adapted from Duarte : , table ) . Contexts of verb-movement into the left periphery in Old Germanic (from Walkden : ) . Parameter grid for nominal syntax . Comparison of word order (head-initiality) and null subjects in creoles and non-creoles (APiCS and WALS) . Comparison of other features in APiCS and WALS . Comparison of properties alleged to be rare/non-existent in creoles (APiCS and WALS) . Incidence of word-order and null-subject properties in APiCS not surveyed in WALS

xv

     

         

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

LIST OF FIGURES, TABLES, AND BOXES

Boxes . Technical aspects of movement . Further cross-linguistic variation in overt wh-movement . The significance of wh-movement . Merge and the LCA

xvi

   

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

List of abbreviations and acronyms AAVE ABSL ACC AMP AN APiCS AP ASL BC BCC BDT BP CCH CI CL CORREL CNSL CP DAT DP DR E&F EF EM EME ENE EP EPP FC

African-American Vernacular English Al-Sayyid Bedouin Sign Language accusative case Algonquian Macroparameter adjective-noun Atlas of Pidgin and Creole Structures (Michaelis et al. ) adjective phrase American Sign Language blocking category Borer-Chomsky Conjecture Branching Direction Theory Brazilian Portuguese Cross-Categorial Harmony Conditional Inversion clitic correlative marker consistent null-subject language complementizer phrase dative case determiner phrase Diachronic Reanalysis Emonds and Faarlund () Edge Feature External Merge Early Middle English Early Modern English European Portuguese Extended Projection Principle free choice

xvii

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

LIST OF ABBREVIATIONS AND ACRONYMS

FE FF FLB FLN FOFC G G HC HPSG IG IE IM IP IS ISL ISN LA LA LA LAD LCA LFG LSN ME MidF MMM MSc NA NE NOM NP NPA NPI NSL

Feature Economy formal feature Faculty of Language in the Broad sense Faculty of Language in the Narrow sense Final-Over-Final Condition Generation  Generation  Haitian Creole Head-Driven Phrase Structure Grammar Input Generalization Indo-European Internal Merge inflection phrase information structure Israeli Sign Language Idioma de Señas Nicaragüense first-language acquisition second-language acquisition Labelling Algorithm Language Acquisition Device Linear Correspondence Axiom Lexical-Functional Grammar Lenguaje de Señas Nicaragüense Middle English Middle French Maximize Minimal Means Mainland Scandinavian noun-adjective Modern English nominative case noun phrase Negative Polarity Adverb negative polarity item null-subject language xviii

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

LIST OF ABBREVIATIONS AND ACRONYMS

NNSL NSP OE OF OHG ON OS OV P&P PAC PCM PEI PF PIE PL PLD PNSL PP PRT PSN QI QP QS RAH RI RL RNSL SCL SL SOV SVO TL TMA TP

non-null-subject language Natural Serialisation Principle Old English Old French Old High German Old Norse Old Saxon object-verb principles and parameters Probably Approximately Correct Parametric Comparison Method Prince Edward Island phonological form Proto Indo-European plural marker primary linguistic data partial null-subject language prepositional phrase particle Pidgin de Señas Nicaragüense quantity-insensitive quantifier phrase quantity-sensitive Rich Agreement Hypothesis root infinitives Receiving Language radical null-subject language subject clitic Source Language subject-object-verb subject-verb-object Target Language tense/mood/aspect tense phrase xix

OUP CORRECTED PROOF – FINAL, 5/10/2021, SPi

LIST OF ABBREVIATIONS AND ACRONYMS

UG UT VO VP VSO WALS WHL WLB

Universal Grammar utterance time verb-object verb phrase verb-subject-object World Atlas of Language Structures (Haspelmath et al. ) Weinreich, Labov, and Herzog () Willis, Lucas, and Breitbarth ()

xx

OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

Introduction

Just under , years ago, a monk named Ælfric translated the Latin Vulgate Bible into English. Here are a few lines from his translation of the passage in Genesis (:–) describing the temptation of Eve:1 Ēac swelċe sēo nǣdre wæs ġēappre þonne ealle þā ōðre nīetenu þe God ġeworhte ofer eorðan; and sēo nǣdre cwæð tō þām wīfe: ‘Hwȳ forbēad God ēow þæt ġē ne ǣten of ǣlcum trēowe binnan Paradīsum?’ Þæt wīf andwyrde: ‘Of þara trēowa wæstme þe sind on Paradīsum wē etað: and of þæs trēowes wæstme, þe is onmiddan neorxenawange, God bebēad ūs þæt wē ne ǣten, ne wē þæt trēow ne hrepoden þȳ lǣs wē swulten.’ Þā cwæð sēo nǣdre eft tō þām wīfe: ‘Ne bēo ġē nāteswhōn dēade, þēah ġē of þām trēowe eten.’

This is the West Saxon dialect of Old English (OE), the nearest thing to a standard language of Anglo-Saxon England. Different varieties of OE were spoken in England from the time of the Anglo-Saxon invasions of the island of Britain in the fifth century until, according to the usual chronology, the Norman invasion in . After this Norman French became the language of the ruling class; the English of the period – is conventionally known as Middle English (ME). Early Modern English (ENE) conventionally dates from , though sometimes , the date of the introduction of printing into England, is taken as the beginning of this period. Modern English (NE) refers to the period from  to the present. OE and NE are strikingly different; for most speakers of NE, OE appears to be a foreign language. To the untrained eye passages such as the above are indecipherable. 1 The text is taken from Mitchell and Robinson (: ). Details regarding the source of the text are given by Mitchell and Robinson (: ). I have followed Mitchell and Robinson’s ‘normalization’ of the orthography and accents (see Mitchell and Robinson (: –)). On Ælfric’s life and work, with particular emphasis on his authorship of the first grammar of Latin written in English, see Law (: –).



OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

By contrast, here is the King James Bible version of the same passage, from . This variety of English is of course somewhat archaic, representing a literary variety of the ENE, but it is nonetheless relatively comprehensible for modern readers: Now the serpent was more subtil than any beast of the field which the LORD God had made. And he said unto the woman, Yea, hath God said, Ye shall not eat of every tree of the garden?  And the woman said unto the serpent, We may eat of the fruit of the trees of the garden:  But of the fruit of the tree which is in the midst of the garden, God hath said, Ye shall not eat of it, nor shall ye touch it, lest ye die.  And the serpent said unto the woman, Ye shall not surely die. (Mitchell and Robinson : )

Here we are observing language change. As our OE passage illustrates, languages can change almost out of recognition in the course of a millennium. This book aims to present some recent ideas regarding certain aspects of this phenomenon of language change, in the context of an influential general theory of language. The particular aspect of language change that this book is concerned with is syntactic change, change in the ways in which words and phrases are combined to form grammatical sentences. If we update all the other aspects (vocabulary, orthography, etc.) of our passage from Ælfric above, but keep the syntax the same as the OE, we have something like the following: Also such the snake was deceitfuller than all the other beasts that God made on earth; and the snake said to the woman: ‘Why forbade God you that ye not eat of each tree in Paradise?’ The woman answered: ‘Of the trees’ fruit that are in Paradise we eat: and of the tree’s fruit, that is in-the-middle-of Paradise, God bade us that we not eat, nor that we the tree not touch lest we die.’ Then said the serpent back to the woman: ‘Not be ye not-at-all dead, though that ye of the tree eat.’

This is a word-for-word rendering of the passage into NE (hence ‘inthe-middle-of ’ is hyphenated, as it corresponds to the single OE word onmiddan, and similarly ‘not-at-all’ for nāteswhōn). Although it is now comprehensible, my rendering brings to light a number of syntactic differences between Ælfric’s English and today’s. Of these we can note the form of the question the serpent puts to Eve: ‘Why forbade God you . . . ?’ In NE, main verbs like forbid do not invert with subjects in questions, the auxiliary do being used instead (i.e. ‘Why did God forbid 

OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

you . . . ?’). We see another instance of ‘main-verb inversion’—but this time not in a question—in the penultimate line: ‘Then said the serpent back to the woman’. Another striking difference occurs twice in the last two sentences: Eve says ‘that we the tree not touch’ and the serpent, in his reply, says ‘though that ye of the tree eat’. Here we see the order subject (we/ye), object ((of) the tree), verb (touch/eat); this is an order which NE does not usually allow but which is usual in OE subordinate clauses. Further differences can be observed. Look, for example, at the position of the negative word ne: I have translated this as ‘not’; although in fact ne died out in late ME, and in NE not derives from OE nan wuht ‘no wight’ (‘no creature’). Notice too the occurrence of ‘that’ in various places where it is not allowed in NE, such as following ‘though’ in the last sentence. I have not attempted to represent the OE case marking on nouns and articles, but this can be observed in the different forms of the word I have translated as ‘the’: sēo (nominative singular masculine), þæt (nominative/accusative singular neuter), þāra (genitive plural), þæs (genitive singular masculine/neuter), þām (dative singular masculine/ neuter), etc. These case markings have all but disappeared in NE, a development which, although in itself a morphological or phonological change, may have affected English syntax. So we can see that English syntax has changed in a number of ways in the past , years. But during that period the language has been passed on from generation to generation in the normal way, first in England and later in the various countries where English speakers settled. Children have learnt the language at their mothers’ knees, and there is no good reason to think that invaders or other foreign influences caused the kinds of changes we have just observed, with the possible exception of the Norse invaders of the ninth to eleventh centuries (see §..). In particular, although English has absorbed a great deal of vocabulary from French and Latin in the past millennium, there is no clear evidence that either of these two languages has influenced English as far as the types of changes we have just observed are concerned. So how and why do changes like these take place? That is the central question this book will address. The recorded histories of other languages also attest to syntactic change. The example of Ælfric’s Bible translation could easily be replicated by comparing an excerpt from a twelfth-century chanson de geste with Modern French, or by comparing Plato’s syntax with Modern Greek, or the Vedic hymns with Modern Hindi. Like all other 

OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

types of language change, syntactic change can be observed wherever we compare surviving ancient texts with those in a corresponding modern language. As has often been observed, change appears to be almost an inherent feature of all aspects of language. Language, to use McWhorter’s (: ) phrase, appears to show a kind of ‘structured variation’. The purpose of this book is to present some recent ideas concerning this structured variation in syntax and apply them to change over time. To do this, we must develop a general theory of the nature of the structures and of the nature of the variations. The theory of syntactic variation is the object of the first chapter, and so I will say no more about it here. Concerning the nature of syntactic structure, I will adopt what is arguably the most influential theory of recent years: that developed by Noam Chomsky and his associates and usually known as generative grammar.2 The most recent variant of generative grammar is known as the Minimalist Programme, and I will assume a version of this in what follows. However, since my goal here is neither to develop nor to defend this particular version of generative syntax, I will try to keep the technical details to a minimum. I hope that readers who are fully conversant with these details will not see my approach as too simplistic, and that those who are unfamiliar with them will not be deterred. Three aspects of Chomsky’s thinking about language are central to what follows, and we must be explicit about these. The first is the idea that sentences can be exhaustively divided up into smaller constituents, down at least to the level of the word,3 and that the basic combinatorial principles are discrete, algorithmic, recursive and purely formal. By ‘discrete’ I mean that the elements of syntax are clearly distinguished from one another: clines, squishes, fuzzy sets, and continua play no role. By ‘algorithmic’, I mean that syntactic structures can be determined in an explicit, step-by-step fashion. By ‘recursive’, I mean that syntactic operations can apply to their own output, thus in principle creating infinite structures from a finite set of symbols and operations. And by ‘formal’ I mean that syntactic operations are not directly

2

Boldfaced items in the text are defined in the Glossary. This idea is not original to Chomsky. It was an aspect of the American structuralist school of linguistics which was dominant in the United States prior to the s, and has older historical antecedents; Seuren (: ) traces it back to Wundt (). 3



OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

determined by semantics, but can be seen as operations which manipulate symbols independently of any denotation those symbols may have. The simplest way to illustrate these ideas is in terms of the basic operation Merge, proposed in Chomsky (). Merge combines two syntactic elements (in the simplest case, two words) into a more complex entity which consists of those two elements and its label, the label being determined by one of the two elements. More recently, Chomsky (, ) has proposed that Merge just combines two elements, with the label of the resulting syntactic object being supplied by a separate Labelling Algorithm. For example, Merge may combine the noun apple with the determiner the, forming the larger phrase the apple. The resulting phrase is usually regarded as a determiner phrase (DP) in current work, reflecting the assumption that the label of the larger unit formed by Merge is contributed by the determiner the. Merge may then combine the verb ate with the DP the apple, giving the phrase ate the apple, which is taken to be a verb phrase (VP)—the verb determines the label of the larger unit in this case. The structure that results from these operations can be represented as a labelled bracketing, as in (a), or as a tree diagram, as in (b): ()

a. [VP [V ate ] [DP [D the ] [N apple ]]] b.

VP

V ate

DP

D the

N apple

These two representations are entirely equivalent, the choice between them being determined by didactic or typographical considerations. Merge is discrete, in that it combines distinct elements (words, categories); algorithmic, in that it can be seen to apply in a mechanical, step-by-step fashion (and it can be formalized rather more precisely than I have done here—see Chomsky : ff.; Stabler ; Stabler and Collins: : f.); recursive, in that it applies to its own output, as our example illustrates with the DP formed by Merge being itself part of 

OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

the input to the next operation of Merge forming the VP; and formal in that reference is not made to the meaning of the symbols combined. The second aspect of Chomsky’s thinking which is important for our purposes has two components: that the fundamental principles of syntax are universal, and that they may therefore reflect some aspect of human cognition. These two points are logically distinct, although they naturally go together. The first idea is that operations like Merge are not specific to any particular language but are formal universals of language. This is a radical and thought-provoking idea,4 which has given rise to much debate over many years. It implies that principles such as Merge must have been operational in Ælfric’s English, Plato’s Greek, etc., every bit as much as they are in present-day English, Greek, etc. Assuming formal universals in this way means that our approach to historical questions adheres to the uniformitarian hypothesis, the idea that ‘the languages of the past . . . are not different in nature from those of the present’ (Croft : ; see Roberts a for discussion). Rather than attempt to justify Chomsky’s radical idea here, I hope that the chapters to follow will show that this idea has a number of very interesting empirical and conceptual consequences. Chomsky’s further proposal that the formal universals of language represent an aspect of human cognition has given rise to even more controversy. What is most relevant in the present context is that it allows us to relate syntactic structure to children’s acquisition of their first language. During the early years of life these universals are put into action as the child develops the capacity to speak and understand. There are two ways to think about how this may happen. On the one hand, if the universals themselves must be acquired, then this of course must happen during language development. On the other hand, if the universals are inherited (since language is common to all—and only— humans, and inherited universals may be thought of as ultimately derivable from the nature of the human genome, along with other 4 Again, I do not mean to imply that it is new. In the Western tradition, the concept of universal grammar has its origins in Plato and Aristotle; see Maat (). The Cartesian Port-Royal grammarians in the seventeenth century and the medieval speculative grammarians (the modistae) (Law : ) developed what we could in hindsight think of as theories of UG. Chomsky (: ch. , /) discusses his own view of some of the historical antecedents of his ideas on this and other matters. Chomsky (/) is critically reviewed by Aarsleff () and is also commented on by Simone (), among many others (see the references given in Simone : ); see in particular McGilvray (, ).



OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

specifically human features), then they are simply applied to the task of language acquisition. The celebrated argument from the poverty of the stimulus (which I will briefly review at the beginning of Chapter ) asserts that the second of these views is the more plausible of the two. But, whether we accept this or not, it is clear that differences between languages must be acquired as part of the process of first-language acquisition. Going back to our notion of structured variation, we see that, while the universal structures may be either inherited or acquired, the variation must be acquired (on the problem of acquiring variation, see Fodor and Sakas ; Crisma, Guardiano, and Longobardi ; Roberts and Longobardi ). Since historical change is variation in time, this in turn implies a connection between historical change and language acquisition. This is an old idea (Paul ); see Harris and Campbell (: ); Morpurgo-Davies (: –), which has been taken up most notably and influentially by Lightfoot (, , ) in the context of generative grammar. In much of what follows we will explore the ramifications of this idea. The third idea of Chomsky’s is the distinction he makes between Externalized Language (E-language) and Internalized Language (Ilanguage); see Chomsky (: –). These are two quite distinct conceptions of language. I-language refers to the intensional, internal, individual knowledge of language. E-language is really an ‘elsewhere’ concept, referring to all notions of language that are not I-language. Universal Grammar (UG) is a more general notion than I-language; indeed, it can be defined as ‘the general theory of I-language’ (Berwick and Chomsky : ). Concepts such as Merge, and formal universals of language more generally, as introduced above, are concepts relating to I-language, in fact to UG as the general theory of I-language. E-language is arguably a more complicated notion than I-language, involving society, culture, history, and so on. Concepts like ‘English’ and ‘French’ in their everyday senses are E-language concepts. In the context of studying syntactic change, notions such as ‘Old English’, ‘Middle English’, etc., are clearly E-language concepts; earlier we defined them in a directly historical way, as is standard practice in historical linguistics. Our evidence for these languages exists purely in the form of texts, and as such is also E-language. However, we take these texts to reflect the I-language of the individuals who produced them (just as the text you are now reading reflects my I-language). What we are concerned with in studying syntactic change, then, is 

OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

seeing how observable differences in the syntax of E-languages in texts must reflect changes in I-languages across different generations. The E-language vs. I-language distinction can shed some light on what we called above the ‘odd state of affairs’ that OE texts are very hard for untrained speakers of NE to understand. The term ‘English’ in both ‘Old English’ and ‘Modern English’ is an E-language term; it cannot refer to any individual’s I-language as it designates a historical entity more than , years old. But we assume, somewhat simplifying a more complex historical reality, that there has been a continuous line of transmission of I-languages from generation to generation among (at least a certain subset of) ‘English speakers’ over that millennium, and that in that process the nature of the I-languages has changed significantly, as their E-language by-products show. Our goal here is to investigate how and why I-languages can apparently change in this way (see Battye and Roberts : ‒, Hale , , and Walkden : ‒ for further discussion of the connections between I-language and syntactic change). To sum up, three of Chomsky’s ideas are central to the discussions to follow: the idea that there are formal universals of syntax, the idea that these universals are an aspect of human cognition and the distinction between I-language and E-language. I have sketched these ideas here, but I have not attempted to do them justice. Recent introductions to the theory of syntax, all of which go over these points to a greater or lesser extent, are Haegeman (, ); Poole (); Adger (); Lasnik, Uriagereka, and Boeckx (); Radford (, , ); Hornstein, Nunes, and Grohmann (); Carnie (); Larson (); Freidin (); Sportiche, Koopman, and Stabler (); and Koeneman and Zeijlstra (). A basic grounding in Chomsky’s ideas about the cognitive status of language is presented in Smith (), while Chomsky (, , ) goes into these questions in more detail and, in the case of the latter two, with specific reference to the Minimalist Programme. However, this book is not intended to build directly on the textbooks in the sense of providing more sophisticated analytical techniques or theory-internal reflections. Discussions of the relative merits of the Minimalist Programme, or of any other designated approach to the nature of the formal universals of syntax, will not figure: I will simply adopt an informal version of minimalism. In the widest sense, then, the goal of this book is to illustrate how Chomsky’s three ideas just summarized can form the basis for the study of historical syntax. These ideas can shed light on how and why English 

OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

has changed since Ælfric’s time in the ways we observed above, and allow us to integrate our account of these changes with a general theory of structure and variation in syntax. I hope that this book will provide a clear conception of the implications of Chomskyan thinking for traditional questions in historical linguistics and a different perspective on the nature of UG and first-language acquisition. For those already familiar with Chomskyan syntax, I hope it will provide an illustration of the importance and relevance of syntactic change for our conception of how grammatical systems vary syntactically over time and how such systems are acquired. In this sense, the book is written from an explicitly Chomskyan perspective, although the emphasis is on the interpretation and extension of Chomsky’s thinking, rather than on the defence, exegesis, or criticism of specific proposals—technical or philosophical—in Chomsky’s writings. Finally, I should point out what this book is not intended to do. It is not intended as a manual for syntactic analysis; the textbooks cited above fulfil this function. Neither is it intended as a guide for doing historical work, whether of a traditional philological kind or of a computational, corpus-driven kind. Instead, as stated above, the book is intended as an introduction to a particular area of linguistic theory. Further reading At the end of each chapter, I will give a few details and comments on the more important works mentioned, as well as other relevant works. Naturally, a number of works are mentioned in more than one chapter; I will comment on each work at the end of the first chapter in which it is mentioned. Thus, if the reader does not find a comment on a work at the end of a later chapter, the preceding chapters should be checked. Not every single reference mentioned in the text is commented on in these sections, but all of the more significant and useful works are. The further reading mentioned in this chapter falls into various categories. Works on the history of linguistics Robins () is the classic introduction to the history of linguistics, ‘from Plato to Chomsky’ as the subtitle says. Law () is a very 

OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

thorough overview of the history of linguistic thought in Western Europe from antiquity to . Chomsky (/) contains Chomsky’s own assessment of the seventeenth-century antecedents to his thinking on the nature of language, mind, and grammar. Aarsleff () is a very critical assessment. Morpurgo-Davies () surveys the history of linguistics in nineteenth-century Western Europe, and provides a valuable perspective on the development of modern historical linguistics. Seuren () is a very interesting history of Western linguistics, usefully combined with a history of logic in Western Europe. The views expressed on generative grammar are somewhat idiosyncratic, however. Allan () is an extremely wide-ranging and useful handbook, covering all major periods and subfields, including chapters on non-Western linguistic traditions. Textbooks on syntax Adger () is an introduction to minimalist syntax, which presupposes no prior knowledge of earlier versions of syntactic theory. Similarly, Freidin () presupposes no background, and provides an excellent conceptual introduction to the overall goals of generative grammar and the Minimalist Programme. Koeneman and Zeijlstra () also presupposes no prior knowledge of syntax, and provides a particularly reader-friendly introduction to aspects of the Minimalist Programme. Larson () is another excellent general introduction, presupposing no prior knowledge, with a particular emphasis on theory-construction. Radford (, , ) are very comprehensive introductions, again presupposing no prior knowledge of syntax. Sportiche, Koopman, and Stabler () is very comprehensive and sophisticated, presupposing no prior knowledge and reaching an advanced level. Carnie () is a more general introduction and combines elements of minimalist syntax with those of the earlier government-and-binding theory, as do Ouhalla () and Roberts (). Haegeman () is the most comprehensive introduction to government-and-binding theory available, and Haegeman () is a general introduction to syntactic theory. Poole (), Lasnik, Uriagereka, and Boeckx (), and Hornstein, Nunes, and Grohmann () are introductions to



OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

the technical aspects of the Minimalist Programme written by teams of leading experts in the field. Chomsky’s work and introductions to it Chomsky () remains in many ways the foundational text of generative grammar; Chapter  of this book is arguably Chomsky’s fullest and most lucid introduction to the goals of generative grammar to date. Chomsky () is a collection of papers from the early s, including (Chapter ) the first exposition of the Minimalist Programme, and (Chapter ) some very important refinements of those initial ideas. The technical notions of minimalism are further developed and refined in Chomsky (, ) and elsewhere (see the Further reading sections in later chapters), while Chomsky (; ) present the conceptual background to the Minimalist Programme. Cook and Newson () is an accessible introduction to Chomsky’s thinking on UG, although some of the ideas presented are a little outdated. Smith () is more up to date, and covers Chomsky’s thinking on a range of issues, including politics. Roberts (c) is a general handbook on UG, which contains several chapters dealing with fundamental aspects of Chomsky’s thinking. Historical linguistics Lightfoot () is arguably the foundational text in diachronic generative syntax, and the direct inspiration for much of the material in this book. Lightfoot () develops a number of the central ideas of the earlier work, as well as introducing the notion of ‘degree- learnability’, which will play a role in our discussion, notably in Chapter . Lightfoot () restates and elaborates a number of the ideas from the earlier works. Lightfoot () further elaborates these ideas, with the emphasis on the emergence of new languages. Harris and Campbell () is a very interesting survey of the issues in diachronic syntax from a non-Chomskyan theoretical perspective, and contains a number of clarifications of core questions, as well as some interesting novel proposals. Mitchell and Robinson () is the most comprehensive introduction to Old English and Anglo-Saxon literature and culture



OUP CORRECTED PROOF – FINAL, 4/10/2021, SPi

INTRODUCTION

available. Paul () is a classic statement of the concepts and methods of historical linguistics, written by a major neogrammarian. This work remains influential to this day. Yang (a) introduces the influential ‘variationist’ approach to language change, and Yang () is devoted to the very important Tolerance Principle (which we look at in §..). Ledgeway and Roberts (c) is a handbook covering the major concepts in syntactic change.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

1 Formal comparative and historical syntax

1.1. 1.2. 1.3. 1.4. 1.5. 1.6. 1.7.

Introduction UG and variation in grammatical systems Null subjects Verb-movement parameters Negative concord Wh-movement Head-complement order Summary Further reading

        

Introduction In this chapter, I will present the way in which syntactic variation is analysed in current theory. The central notion is that of parameter of Universal Grammar, a term which is fully explicated in §.. The rest of the chapter is devoted to illustrating certain parameters of Universal Grammar (UG), with examples taken from both the synchronic and the diachronic domains. We establish that this analytic device, which has been used to describe synchronic variation across languages, can also be used to describe diachronic changes between different stages of the same language. Before introducing parametric variation, however, we need to be more precise about what does not vary, i.e. about the nature of the formal universals of syntax that were mentioned in the Introduction. Chomsky has always argued that one of the goals of linguistic theory is to develop a general theory of linguistic structure that goes beyond simply describing the structures of individual languages (see Chomsky 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

: ; Matthews : ff. and references given there). In other words, a major concern of linguistic theory is to develop a characterization of a possible human grammar. To do this, we elaborate a theory of the formal universals of human language, known as Universal Grammar (UG). Roberts (c: ) defines UG as follows: ‘UG is the general theory of I-languages, taken to be constituted by a subset of the set of possible generative grammars, and as such characterizes the genetically determined aspect of the human capacity for grammatical knowledge’. Thus, UG embodies the essential invariant parts of the structure of language. Whilst our main concern in this book is with syntax, UG also contains principles related to phonology, morphology, and semantics. Whether these aspects of language are subject to parametric variation in the same way as syntax is an open question; there is some reason to think that this is true of phonology and morphology (see the discussion of Dresher  in §.., for an example of a phonological parameter), while it is possible that semantics is not subject to variation. However, owing to my own lack of relevant expertise, I will leave these other subsystems of UG aside and concentrate on syntax. One important way in which syntax makes human language possible has to do with its recursive nature. As mentioned in the Introduction, recursion makes it possible to construct infinite structures from a finite number of elements. The recursive nature of syntax is a necessary component of what Chomsky has called the ‘creative aspect of language use’: the fact that humans are able to produce and understand utterances that have never been produced before. This formal property of naturallanguage syntax allows us to give expression to our freedom of will. In saying that UG defines a possible human language, I mean that UG is intended as a general theory of the structure of human language; we have seen that Berwick and Chomsky (: ) define it as the general theory of I-language. Thus UG is not simply an account of the structure of the set of languages that happens to exist at this—or any other—historical moment. To be more precise, UG is intended to give an account of the nature of human grammar, rather than language; the notion of grammar, or I-language, is more precise and less subject to confusion due to social, political, and cultural factors than that of language (or, regarding the latter factors, E-language). Moreover, whilst a language can be thought of just as a set of strings of symbols, a grammar is more abstract, being the device which determines which 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

INTRODUCTION

sets of symbols are admitted in the language. In other words, even if we had at our disposal the means, both intellectual and practical, to write an exhaustive description of the grammar of every language currently spoken and every language for which textual evidence survives, any resulting inductive distillation of the results of such a survey would not yield UG. It could yield an extensional definition of the common features of all currently (and recently) existing grammars, and would be universal in this weak sense. But what UG aims for is an intensional characterization of the class of human grammars: a characterization of what makes a grammar what it is. UG should tell us what the defining properties of any possible human grammar are. A natural question to ask is whether UG is a purely abstract entity (for example, a set of some kind) or whether it has some physical or mental existence. Chomsky’s view is that UG has mental reality, in that it is part an aspect of the human mind, the language faculty. We can define UG as our theory of a central facet of the language faculty, the mental faculty or faculties which both facilitate and delimit the nature of grammar. This view has the advantage that UG can now be seen as being in principle a theory of an aspect of physical reality; the language faculty—as a mental reality—is physically instantiated in the brain (somehow—a number of complex philosophical, psychological, and neurological issues arise here). In a very important article, Hauser, Chomsky, and Fitch () distinguish the Faculty of Language in the Narrow sense (FLN) from the Faculty of Language in the Broad sense (FLB). They propose that the FLB includes all aspects of the human linguistic capacity, including much that is shared with other species: ‘FLB includes an internal computational system (FLN, below) combined with at least two other organism-internal systems, which we call “sensory-motor” and “conceptual-intentional”’ (Hauser, Chomsky, and Fitch : ); ‘most, if not all, of FLB is based on mechanisms shared with nonhuman animals’ (Hauser, Chomsky, and Fitch : ). FLN, on the other hand, refers to ‘the abstract linguistic computational system alone, independent of the other systems with which it interacts and interfaces’ (Hauser, Chomsky, and Fitch : ). This consists in just the operation which creates recursive hierarchical structures over an unbounded domain, Merge, which ‘is recently evolved and unique to our species’ (Hauser, Chomsky, and Fitch : ). Chomsky’s () three factors in language design relate to FLB. These are: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

() Factor One: the genetic endowment, UG. Factor Two: experience, Primary Linguistic Data (PLD) for language acquisition. Factor Three: principles not specific to the faculty of language/nondomain-specific optimization strategies and general physical laws. In this context, UG is just one factor that contributes to FLB.1 Furthermore, there is nothing mysterious about the idea that the language faculty may be genetically inherited—the innateness hypothesis. This is the idea that the particular aspects of cognition which constitute the language faculty are a consequence of genetic inheritance, and it is no more or less surprising and problematic than the general idea that cognition is to some degree genetically facilitated. And if cognition is physically instantiated in the brain (somehow), then the claim is just that aspects of the physical functioning of the brain are genetically inherited. To recapitulate: it is a goal of linguistic theory to attempt to develop a general characterization of a possible human grammar. It is reasonable, although not a matter of logical necessity, to take this characterization to be a reflection of some aspect of how the mind works, i.e. as a facet of human cognition which we call the language faculty. If cognition has a physical basis in the brain, then so does the language faculty. Finally, it 1 Since aspects of the language faculty may not be domain-specific on this view, it may not be correct to think in terms of a specialized ‘mental module’ for language, although there is some evidence from language pathology for this (see in particular Smith and Tsimpli  and Smith ). There is also evidence for a critical period specific to language acquisition, see Guasti (: –), which may in turn favour the postulation of a ‘language module’, although not as a logical necessity. Clearly, though, the claim that language is a facet of cognition and physically instantiated in the brain does not entail the postulation of a language module. For a response to Hauser, Chomsky, and Fitch (), defending the conception of modularity, see Pinker and Jackendoff (); Jackendoff and Pinker (). In this connection, it is worth considering an interesting terminological proposal made by Rizzi (: –). He suggests that we may want to distinguish ‘UG in the narrow sense’ from ‘UG in the broad sense’, deliberately echoing Hauser, Chomsky, and Fitch’s FLN–FLB distinction. UG in the narrow sense is just the first factor of (), while UG in the broad sense includes third factors as well. Furthermore, Tsimpli, Kambanaros, and Grohmann () find it useful to distinguish ‘big UG’ from ‘small UG’ in their discussion of certain language pathologies. The definition of UG from Roberts (c) given above is ambiguous between these two senses of UG, in that it states that UG ‘characterizes the genetically determined aspect of the human capacity for grammatical knowledge’. This genetically determined property could refer narrowly to the first factor or more broadly to both the first and the third factors; the distinction is determined by domain-specificity in that the first factor is specific to language and the third more general.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

INTRODUCTION

may be that the language faculty is genetically inherited; that some aspect of the human genome determines its existence in all normal humans. The innateness hypothesis is highly controversial. As mentioned in the Introduction, the principal argument for it is the poverty-of-thestimulus argument. Here I will briefly summarize this argument (for a more detailed presentation, see Roberts : –; Jackendoff : –; and, in particular, Guasti : –; Lasnik and Lidz ; Pullum and Scholz () present a very strong version of the poverty-of-the-stimulus argument, which they subject to a detailed critique; see Lasnik and Lidz  for discussion). As its name implies, the poverty-of-the-stimulus argument is based on the observation that there is a significant gap between what seems to be the experience facilitating first-language acquisition (the PLD in ()) and the nature of the linguistic knowledge which results from first-language acquisition, i.e. one’s knowledge of one’s native language. The following quotation summarizes the essence of the argument: The astronomical variety of sentences any natural language user can produce and understand has an important implication for language acquisition . . . A child is exposed to only a small proportion of the possible sentences in its language, thus limiting its database for constructing a more general version of that language in its own mind/brain. This point has logical implications for any system that attempts to acquire a natural language on the basis of limited data. It is immediately obvious that given a finite array of data, there are infinitely many theories consistent with it but inconsistent with one another. In the present case, there are in principle infinitely many target systems . . . consistent with the data of experience, and unless the search space and acquisition mechanisms are constrained, selection among them is impossible . . . No known ‘general learning mechanism’ can acquire a natural language solely on the basis of positive or negative evidence, and the prospects for finding any such domainindependent device seem rather dim. The difficulty of this problem leads to the hypothesis that whatever system is responsible must be biased or constrained in certain ways. Such constraints have historically been termed ‘innate dispositions’, with those underlying language referred to as ‘universal grammar.’ (Hauser, Chomsky, and Fitch : –)

Similarly, in introducing the general question of the nature of the learning problem for natural languages, Niyogi (: ) points out that the basic problem is the inherent difficulty of inferring an unknown target from finite resources and in all such investigations, one concludes that tabula rasa learning is not possible. Thus children do not entertain every possible hypothesis that is consistent with the data



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

they receive but only a limited class of hypotheses. This class of grammatical hypotheses H is the class of possible grammars children can conceive and therefore constrains the range of possible languages that humans can invent and speak. It is UG in the terminology of generative linguistics.

As an illustration of the complexity of the task of language acquisition, consider the following sentences: () a. The clowns expect (everyone) to amuse them. b. The clowns expected (everyone) to amuse themselves. If everyone is omitted in (a), the pronoun them cannot correspond to the clowns, while if everyone is included, this is possible. If we simply change them to the reflexive pronoun themselves, as in (b), exactly the reverse results. In (b), if everyone is included, the pronoun themselves must correspond to it. If everyone is left out, themselves must correspond to the clowns. (One might object that facts such as these are semantic, but they are usually considered to be partially determined by syntax—the usual analyses of these phenomena are described in the textbooks cited in the Introduction.) The point here is not how these facts are to be analysed, but rather the precision and the subtlety of the grammatical knowledge at the native speaker’s disposal. It is legitimate to ask where such knowledge comes from. Another striking case involves the interpretation of missing material, as in (): () John will go to the party, and Bill will—too. Here there is a notional gap following will, which we interpret as go to the party; this is a ‘missing’ VP, and the phenomenon is known as VP-ellipsis. In (), we have another example of VP-ellipsis: () John said he would come to the party, and Bill said he would— too. Here there is a further complication, as the pronoun he can, out of context, correspond to either John or Bill (or an unspecified third party). Now consider (): () John loves his mother, and Bill does—too. Here the gap is interpreted as loves his mother. What is interesting is that the missing pronoun (the occurrence of his that isn’t there following does) has exactly the three-way ambiguity of he in (): it may 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

INTRODUCTION

correspond to John, to Bill or to a third party. Example () shows we have the capacity to apprehend the ambiguity of a pronoun which is not pronounced. The above cases are examples of native grammatical knowledge. The basic point in each case is that native speakers of a language constantly hear and produce novel sentences in that language, and yet are able to make very subtle judgements of interpretation and ambiguity. They are also able to distinguish well-formed sentences from ill-formed ones. Here is a further example, uttered while planning a party, for example. This example is based on Radford (: ): () Who did he think was likely to drink what? This sentence has a natural interpretation, known as the ‘pair-list’ interpretation, according to which an answer to ‘who’ and an answer to ‘what’ are paired (i.e. ‘He thought John was likely to drink vodka, Mary gin, Bill orange juice’, etc.). We understand the sentence this way naturally, and moreover we immediately understand that he cannot refer to the same individual as who (more technically, he must be disjoint from who). Also, we can recognize the following variants of this example as ungrammatical (indicated by an asterisk), even if they are trying to mean the same thing: (0 )

a. *Who did he think that was likely to drink what? b. *What did he think who was likely to drink? (cf. What did he think John was likely to drink?) c. *Who was he thought likely to drink what? d. *Did he think who was likely to drink what?

The question is why and how we are able to distinguish previously unheard examples like () from ungrammatical but extremely similar ones such as (0 ). In first-language acquisition, negative evidence— information about ungrammatical sentences—is unavailable; children may be exposed to ungrammatical sentences but they are not told that they are ungrammatical; where explicit instruction is intended, it appears to be either ignored or misunderstood. Meaning probably isn’t much help in distinguishing the examples in (), as the sentences in (0 ) mean the same as those in (), to the extent they mean anything, which (0 a) pretty clearly does. This knowledge must either come from experience or from within. If we truly have no experience of novel sentences like (), then it must come from within. Moreover, if the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

knowledge of these properties of English comes from within, it must represent some aspect of UG, as there is no genetic disposition to English. Here we see the links between the poverty of the stimulus, the postulation of an innate language faculty, and UG. To put it another way, if we deny that knowledge of grammar of the type illustrated in ()–() can be innate, then we must maintain that the conditions of language acquisition and the nature of our minds (minds by hypothesis lacking any special predisposition to grammatical knowledge) are such that we are able to glean subtle aspects of the interpretation of pronouns purely from experience, including absent pronouns as in (); we must also be able to distinguish sentences like () from non-sentences like (0 ). Despite much criticism of the poverty-of-thestimulus argument (see Pullum and Scholz  and the references given there), no clear account of why and how native speakers can do any of this has emerged. On the other hand, introductory textbooks of the kind referred to earlier offer such an account in terms of an innate UG. Of course, a natural response is to say that, while we may never have heard (), we have heard plenty of examples like it. But here we must be very clear about what ‘like ()’ actually means. If ‘like ()’ means ‘containing the same, or nearly the same words, as ()’ then of course (0 ) are very like (); these examples contain exactly the same words as () in all cases except one. But the examples in (0 ) are ungrammatical while () is grammatical. Construing ‘like ()’ in any other sense involves attributing knowledge of some aspect of syntactic structure to speakers who recognize the difference between () and (0 ), and this is exactly what the poverty-of-the-stimulus argument is trying to explain. Thus the question of the mental status and the origins of that knowledge is begged. The idea of some kind of superficial resemblance among sentences as informing language acquisition has underlain many behaviourist theories of acquisition. Chomsky () showed how one rather wellworked-out behaviourist theory of language acquisition was doomed to failure. More recently, Guasti (: –) provides a detailed discussion of why mechanisms such as imitation, reinforcement, and association are unable to account for the first-language acquisition of such aspects of grammar. Moreover, there is evidence that firstlanguage acquisition takes place on the basis of ‘positive evidence’ only, in the sense that children do not have access to information 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

INTRODUCTION

regarding what is not allowed; they only hear examples of what is possible (see Guasti : –, : –, and the references given there; this issue is complex as it involves making assumptions regarding what children ‘do’ with what they hear, about which almost nothing is known). Also, language acquisition takes place in a largely uniform way across children from different social groups and language backgrounds, does not rely on explicit instruction, and happens very quickly given the complexity of the task and the relatively rudimentary nature of more general reasoning and other cognitive skills at an early age. Most of first-language acquisition is effectively accomplished by the age of . The poverty-of-the-stimulus argument asserts that, given the factors mentioned above, it is highly implausible to think that there is no predisposition to language at all. If there is a ‘predisposition to language’, then some aspect of linguistic knowledge is innate. In the absence of any account of how grammatical knowledge like that illustrated in (–)—and myriad similar examples (see Anderson and Lightfoot : –; Crain and Pietroski ; Fodor and Crowther ; Jackendoff : –; Freidin : –; and the references given in these sources)—may be determined purely on the basis of experience by a mind with no predisposition to language, we conclude that knowledge of language arises from the interaction of innate knowledge with relevant experience. This does not mean that UG directly determines facts of the type in ()–() regarding ellipsis, anaphora, etc., but rather that such facts can be seen as consequences of fairly abstract innate principles interacting with experience. The question of the balance between innate knowledge and experience is difficult and complex; it is also to a considerable extent an empirical matter, i.e. it cannot be determined purely by theoretical speculation. (This point is made by Pullum and Scholz ().) We will come back to this in §. below. The important point is that the innateness hypothesis can provide a solution to the poverty-of-the-stimulus problem. As Guasti says: The nativist hypothesis explains why language acquisition is possible, despite all limitations and variations in learning conditions. It also explains the similarities in the time course and content of language acquisition. How could language acquisition proceed in virtually the same ways across modalities and across languages, if it were not under the control of an innate capacity? (Guasti : )

For many years, Chomsky has argued for an innate language faculty, and takes UG to be the theory of this faculty. Here I will follow this 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

view, in part because of the force of the poverty-of-the-stimulus argument as just given. In the chapters to follow, I will try to show that this point of view can be revealing for our understanding of language change.

1.1. UG and variation in grammatical systems In the previous section, we saw the reasons for postulating the existence of an innate language faculty. Nevertheless, it seems that we cannot escape the fact that different languages have different grammars. We can easily observe that a sentence which is syntactically well-formed in one language may be ill-formed in some other language. Compare the following very simple sentences and non-sentences in English and German: () a. b. c. d.

Tomorrow John will visit Mary. *Morgen Johann wird besuchen Maria. Morgen wird Johann Maria besuchen. *Tomorrow will John Mary visit.

Example (a) is a quite unremarkable English sentence, but its exact syntactic counterpart in German—a word-for-word translation with the words retaining their English order—is ungrammatical in German. Conversely, (c) is a correct German rendering of the English (a), but if we translate it back into English retaining the German word order, we arrive at the impossible (d). The conclusion is clear: English syntax differs from German syntax. How are we to reconcile this conclusion with the postulation of a uniform language faculty? One simple way to answer the question would be to say that English speakers and German speakers are genetically distinct: one aspect of this genetic difference is a difference in the respective language faculties, which has the consequence that English and German have different syntax. This gives rise to the differences observed in (). However, this view cannot be maintained, since we have ample evidence from immigrant communities the world over that children of speakers of one language are perfectly able to become native speakers of the language of their adopted community. In the case of English and German, it suffices to point to the large numbers of German-speaking immigrants to the United States in the late nineteenth and early twentieth centuries whose 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

UG AND VARIATION IN GRAMMATICAL SYSTEMS

descendants have, by now for several generations, been native speakers of English. This simple fact is incompatible with the idea that the syntactic differences observed in () are attributable to some genetic difference between English speakers and German speakers. (Of course, this view would also lead us to postulate genetic differences in cognition between different nationalities and ethnic groups, a highly dubious move on ethical grounds. As we see, however, there is no support for this position, and good evidence against it, in this as in many other domains.) If differences between the grammars of different languages cannot be accounted for in directly genetic terms, then how are we to account for them, given the assumption of an innate UG? It would seem that these differences are not part of the innate language faculty. However, it does not take much technical knowledge of syntax to be able to tell that the differences in word order between English and German in () involve fairly central aspects of syntax. (They involve at least the position of the verb and the position of the direct object, as we shall see in §. and §..) Moreover, there are good reasons to think that syntactic variation across languages is not random; this conclusion has been established by language typologists (quite independently of the assumption of UG in the sense described in the previous section). For example, in () we can see that in English the verb always precedes the direct object; one aspect of the ungrammaticality of (d) is the fact that the direct object precedes the verb (*Mary visit). For this reason, English is referred to as a VO (verb-object) language. On the other hand, in German nonfinite verbs generally follow their objects (cf. Maria besuchen in (c) and the ungrammaticality of the reverse order in (b)); German may therefore be considered a kind of OV language. Language typologists have shown that a number of other variant traits are correlated with VO vs. OV order (see Comrie , Song , , , Croft , and Moravscik  for introductions to language typology and wordorder typology; we will return to these questions in more detail in §., §., and §.). It is thus now generally accepted that syntactic variation among languages is not random. For this reason, coupled with the fact that fairly central properties seem to vary, we might want to ‘build variation in’ to our theory of the language faculty. What seems to be required is a way of expressing syntactic variation within the theory of the language faculty itself. This is achieved by adopting the notion of parameters of variation. The central idea is quite 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

simple: alongside the invariant principles of UG there may be certain limited options which remain open, to be ‘filled in’, as it were, by experience. These options determine the parameters along which grammars may vary, and are thus known as the parameters of variation. In this view, the language faculty consists of invariant principles and associated parameters. In terms of the three factors of language design in (), we could consider both the principles and parameters to be innate, i.e. part of the first factor, as well as the range of options specified by the parameters. Experience, the second factor, is necessary only to fix the values of the parameters. Alternatively, the third factors may play a role too; we will develop this idea in what follows. On this latter view, variation stems less from the innate endowment and more from the interaction of the three factors (see also Crain and Thornton : ; Guasti : ). We can illustrate the interaction of principles and parameters in an informal way using our examples of word-order differences between English and German seen in (). We saw that English is a VO language and that German is OV (at least where V is non-finite). So we could say that UG determines the nature of V (the universal theory of syntactic categories, i.e. Nouns, Verbs, etc., would do this), the nature of O (the universal theory of grammatical functions such as subject, direct object, indirect object, etc., would do this) and how they combine to form a VP (this may be determined in part by the nature of Merge, as we saw in the Introduction, and in part by the theory of grammatical functions, which would state for example that a determiner phrase (DP) merged with V is a direct object). So UG dictates that a Verb and its direct object combine by Merge to form a VP. The parametric option concerns the linear order of V and O—UG says nothing about this. Hence grammars may choose either of the two logical options: OV or VO. As we saw, German takes the former option while English takes the latter. There are several points to note regarding this brief and somewhat simplified illustration. First, we see that the role of experience lies simply in determining the linear order of rather salient elements: verbs and their direct objects. The actual learning task is thus rather simple, and, impoverished though it may be, the stimulus is presumably not so defective that this information cannot be detected by language acquirers. So we reconcile poverty-of-the-stimulus considerations with cross-linguistic variation. This, in essence, is the great 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

UG AND VARIATION IN GRAMMATICAL SYSTEMS

attraction of this approach. It also specifies quite clearly the relation between experience and the innate faculty. If an invariant UG is associated with a set of binary parameters of variation, such that grammatical knowledge arises in the learner as it ‘interrogates’ the linguistic data it is exposed to and thereby sets the parameters that define its grammar; Fodor and Sakas (: ), following Yang (: ff.), called this the ‘ questions’ model of learning. Second, just as in the game of twenty questions, a choice has to be made: not deciding is not an option. At the relevant level of abstraction, the task of acquisition of syntax consists purely in fixing the values of parameters in this way. The ability to acquire a given language may thus be construed as the ability to set the parameters to determinate values. Each grammar must choose a value for each parameter, although certain values may be determined by default—I return to this point in §.. An implication of this in the present case is that all languages can and must be defined as OV or VO; ‘free word-order’ languages cannot exist, for example. Third, options may be determined by ‘gaps’ in UG principles. This appears to be the case in our example of OV vs. VO: everything about the Merge of V and its object to form a VP is determined by invariant Merge except the relative order of merged elements. These elements must be ordered, and if UG provides no specification, a parametric option is created. It seems that the content of the parametric option is simply to force a consistent choice on a grammar. We can take this approach further and view parameters as emergent properties of the three factors of language design given in (); see Roberts (a, a), Biberauer and Roberts (, ). Parameters arise from the interaction of the three factors, if we construe the latter as follows: () Factor One: underspecified Universal Grammar (UG). Factor Two: Primary Linguistic Data (PLD). Factor Three: two third-factor principles, e.g.: (i) Feature Economy (FE) (Roberts and Roussou : ): Postulate as few formal features as possible. (ii) Input Generalization (IG) (based on Roberts :  and §.): Maximize available features. Together, the two third-factor conditions given here form an optimal search/optimization strategy of a minimal kind (Biberauer () unifies them as Maximize Minimal Means); we will look at these and other 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

third factors more closely in §.. Roberts (a) argues that these third factors interact with the other two factors to give rise to parametric variation, expressed in the form of parameter hierarchies. I will return to these points in §. and §.. This approach is conceptually attractive as it makes variation an inherent part of the system rather than an unexplained accretion. Fourth, parameters are usually thought of as binary, either/or options. This of course relates to the previous two points. The importance of this idea is that, like other aspects of UG, parameters are discrete entities: here as elsewhere, clines, continua, squishes, and the like are ruled out. This idea does not prevent us from postulating parameters with more than two values; such parameters can always be reconstrued as networks of binary parameters. Again, I will say much more about hierarchies of parameters in §. and §.. The parameters of UG tell us what is variant, and by implication what is invariant, in UG. They do three things that are of considerable general interest. First, they predict the dimensions of language typology. In our example, this implies that all languages can be divided into VO or OV. The VO languages include, in addition to English, the Romance languages, Greek, the Bantu languages, Thai, and many Papuan languages. The OV languages include Latin, the Indic languages, the Dravidian languages, Japanese, Korean, the Turkic languages, and many Amerindian languages (see Dryer a and §.). Parameters can thus play a central role in the classification of languages. An important facet of this idea is that parameters may be able to define clusters of covarying properties, of the type stated by implicational and other types of universal put forward by typologists and others. We will come back to implicational universals in §.. For example, VO vs. OV order seems to correlate with the relative order of auxiliaries and main verbs. We observe that auxiliary-verb (AuxV) and VO pattern together in English, while in German VAux and OV pattern together (again, we limit our attention to non-finite auxiliaries in German for the sake of simplicity): () a. John can visit Mary. b. Johann wird Maria besuchen können.

(AuxV) (VAux)

Such clustering of variant properties is of central importance for language typology, since it establishes that syntactic variation is non-random. I will explore this and other word-order correlations more in §.. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

Second, parameters should predict aspects of first-language acquisition. As noted above, first-language acquisition of syntax consists largely, perhaps exclusively, in fixing the values of parameters. In that case, we expect to be able to observe the effects of this parameter-fixing process in the development of syntax. Intensive research on this topic over recent years has provided some intriguing conclusions on this point, which we will review in §.. (See Guasti  for a much more detailed summary.) If variant properties cluster, as mentioned above, then it may be that one aspect of a grammar is acquired ‘for free’ once another is acquired (for example, AuxV order as a consequence of VO order). If this idea can be maintained, the task of the language acquirer is further simplified, and acquisition and typology are bound together. Third, and of most concern to us here, parameters can tell us which aspects of syntax are subject to change in the diachronic dimension. On the basis of the English and German examples in (), we can see that the relative order of the verb and its direct object, as a parameter, may be subject to change. In fact, we observed this in the Introduction when we compared Ælfric’s English with present-day English. There we saw that, at least in OE subordinate clauses, direct objects precede their verbs (‘that we the tree not touch’; ‘though that ye of the tree eat’), while of course such orders are not possible in present-day English. If variant properties cluster together, then the clear prediction is that, all other things being equal, they will change together. (We will look at the case of VAux and OV order in the history of English in §.. and §..) The nature of parametric change will be a central focus of this book. In this section, we have seen the motivation for the notion of parameter in UG, and, albeit in a fairly rough form, an example of a parameter. We have also suggested that parameters may arise from the interaction of the three factors in language design as in (). The rest of this chapter is devoted to giving more detailed examples of parameters, both in the synchronic and the diachronic domains.

1.2. Null subjects The first phenomenon we will look at is null subjects. We will see that there are three different kinds of null-subject language, alongside nonnull-subject languages such as English. As with all the parameters to be discussed in this chapter, we first present the relevant phenomena in 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

the synchronic dimension, and then present the diachronic corollaries, i.e. the evidence that the synchronic variation described by a parameter finds parallels in diachronic change. 1.2.1. Null subjects in the synchronic dimension I: consistent null-subject languages The original motivation for the postulation of a null-subject parameter is that certain languages allow finite clauses to express a definite, referential, pronominal subject covertly, or silently. In other languages, this is impossible. The contrast is illustrated by the following examples: ()

Parla italiano. speaks- Italian ‘He/she speaks Italian.’

[Italian]

()

Habla español. speaks- Spanish ‘He/she speaks Spanish.’

[Spanish]

()

*Parle français. speaks- French ‘He/she speaks French.’

[French]

()

*Speaks English.

These examples show us that where French and English require an overt pronoun, Italian and Spanish permit a phonologically null pronoun pro. Hence () and () have the form pro parla italiano and pro habla español respectively. This null pronoun can alternate with an overt form: ()

Lui/lei parla italiano. he/she speaks Italian ‘He/she speaks Italian.’

[Italian]

()

Él/ella habla español. he/she speaks Spanish ‘He/she speaks Spanish.’

[Spanish]

The overt pronoun-containing cases are generally agreed by nativespeakers to be interpreted as ‘emphatic’. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

Null-subject languages of the Spanish or Italian type also permit alternations between null and overt pronouns in embedded contexts, which don’t always have the same interpretation: ()

Todos los estudiantesi piensan que ellosi/proi Every student thinks that (they) son inteligentes. are intelligent.

[Spanish]

According to Montalbetti (), the overt pronoun ellos here resists the bound-variable interpretation (‘For every x, x a student, x thinks x is intelligent’). Generally pro behaves like overt pronominals as far as its interpretative properties are concerned. So, in an example like (), both covert pro and overt él can either refer to Juan or to some linguistically unspecified male third party, just like English he in the translation: ()

Juani piensa que proi/j/éli/j es inteligente. J. thinks that he is intelligent.

[Spanish]

Furthermore, pro acts like a typical pronoun in that it only allows ‘strict’ interpretations, those which are identical to the interpretations of an overt antecedent. Hence in (b) the silent pronoun can refer to su propuesta, but where the su refers to María (a rather natural reading out of context) so must the understood su in (b): ()

a. María cree que su propuesta Maria believes that her proposal aceptada. accepted. b. Juan también cree que—será Juan also believes that—will-be

será will-be

[Spanish]

aceptada. accepted.

This behaviour contrasts with what we observe under VP-ellipsis in English as in (): ()

John wants his proposal to be accepted and Bill does—too.

Here, the elided material following does can be understood as containing a fully ambiguous pronoun: Bill may either want his own proposal to be accepted, or he may want John’s proposal to be accepted. The latter is known as the ‘strict’ reading (since the elided material has exactly the same interpretation as the overt material in the previous 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

clause), and the former is known as the ‘sloppy’ reading. The silent material in () corresponds to an elided VP, not a pronoun. In () the silent material is a pronoun. Pronouns typically only allow ‘strict’ readings. In the terminology introduced in Hankamer and Sag (), VP-ellipsis is a case of ‘surface’ anaphora while pro, as a typical pronoun, involves ‘deep’ anaphora. So we postulate pro because it acts like an overt pronoun in an English-type language and is interpreted as an argument. More generally, if subject arguments typically denote individuals while pronouns, like other nominal expressions, denote individuals, then we should treat pro as a nominal. Moreover, there are cases where subjects are semantically empty. These are known as ‘expletive’ or ‘non-referential’ subjects. English has two such subject pronouns, it and there, illustrated in (): ()

a. It is raining. b. It seems that John is intelligent. c. There arrived a train.

In null-subject languages of the Spanish-Italian type, there is no overt counterpart to such pronouns. Instead, the subject is obligatorily silent in the translations of (), as the following Italian examples illustrate: ()

a. Piove. rains ‘It is raining.’ b. Sembra [che Gianni sia intelligente]. seems that John be intelligent ‘It seems that John is intelligent.’ c. È arrivato un treno. is arrived a train ‘There arrived a train.’

[Italian]

One way to account for the appearance of overt expletive pronouns in English of the kind seen in () is to propose the following (see Chomsky : ): ()

The Extended Projection Principle (EPP): Every clause must have a filled subject position.

We can generalize () to null-subject languages of the Spanish-Italian type if we postulate expletive pro here (see Rizzi , a). Since 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

() is not a requirement for a phonologically overt subject, silent pro satisfies the EPP in (). We can extend this idea to a construction known as ‘free inversion’ in Italian and similar languages. Here, the subject appears in a position which follows the verb; in fact it follows the whole ‘verbal complex’ of auxiliary and participle in compound tenses: ()

Ha telefonato Marco. has telephoned Marco ‘Marco has called.’

[Italian]

Taking ‘subject position’ in () to refer specifically to a preverbal subject position, we postulate pro in this position in (), just as in (). An important advantage of this idea, first pointed out by Rizzi (), is that it allows us to account for a contrast in certain kinds of wh-interrogatives between English and Spanish/Italian-type languages. In English, a wh-question cannot be formed on the subject of a finite subordinate clause where a complementizer is present. This phenomenon, known as the that-trace effect, is illustrated by the contrast in (): ()

a. *Who did you say that — would fix the bike? b. Who did you say — would fix the bike?

In (a), wh-question-formation on the subordinate-clause subject is degraded, while in (b), where there is no complementizer present, such question-formation is allowed. Languages such as Spanish and Italian do not show this effect, as the example in () illustrates: ()

Chi hai detto che — ha telefonato? who have- said that has telephoned ‘Who did you say called?’

[Italian]

Rizzi () showed that, despite initial appearances, Italian has ‘thattrace effects’ too; this was based on a complex argument regarding the interpretation of negative quantifiers which I will not go into here. Rizzi then attributed the grammaticality of () to the fact that the whquestion is formed on the postverbal, ‘freely-inverted’ subject position seen in (). Owing to the possibility of ‘free inversion’ then, Italian has a wh-question-formation option for subjects that English doesn’t have. Since ‘free inversion’ is possible thanks to the availability of expletive pro to occupy the preverbal subject and thereby satisfy the Extended Projection Principle of (), ultimately the contrast in wh-questions 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

between Italian and English can also be traced back to the availability of pro in the former language but not the latter. All of this led Rizzi () to postulate the following ‘parametric cluster’. Italian has all the properties in (), while English has none of them: ()

a. The possibility of a silent, referential, definite subject of finite clauses. b. ‘Free subject inversion’. c. The apparent absence of complementizer-trace effects. d. Rich agreement inflection on finite verbs.

Properties (a–c) are all consequences of the availability of pro in the preverbal subject position, as we have seen. The natural question to ask at this point is what allows a language to have pro. (d) is a plausible answer: languages like Italian and Spanish clearly differ from languages like English in that the finite verb shows a rich array of subjectagreement markers. This is illustrated for Italian, Spanish, and Greek (another language of this kind) in (): ()

a. Italian: bev-o, bev-i, bev-e, bev-iamo, bev-ete, bev-ono. b. Spanish: beb-o, beb-es, beb-e, beb-emos, beb-éis, beb-en. c. Greek: pin-o, pin-is, pin-i, pin-ume, pin-ete, pin-un. All: ‘I drink’, etc.

In (), we observe that all six person-number combinations making up the present tense of the verb ‘to drink’ have distinct endings, and the same is true for nearly all tenses of nearly all verbs in these languages and others like them. This obviously contrasts with the minimal person-number inflection of English, limited to the marking the third-person singular of the present tense with -s, and with the Mainland Scandinavian languages (Danish, Swedish, and Norwegian) which entirely lack person-number agreement on finite verbs and which, like English, do not allow null subjects. However, there are languages which are ‘poorer’ in agreement inflection than the languages shown in () but ‘richer’ than English or Mainland Scandinavian. Compare, for example, the German and Romanian verb paradigms in (): ()

a. German: schlaf-e, schläf-st, schläf-t, schlaf-en, schlaf-t, schlaf-en. b. Romanian: dorm, dorm-i, doarm-e, dorm-im, dorm-iţi, dorm. Both: ‘I sleep’, etc. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

Here we see five out of six distinct forms in each case. However, German is not a null-subject language while Romanian is. Outside of the really clear cases like (), then, ‘rich agreement’ is not a fully reliable guide to what determines the presence of pro; see Roberts (a: –) for further discussion of ‘rich agreement’ and the suggestion that it is a second- and third-factor phenomenon in terms of the three factors in (). Italian, Spanish, and Greek are examples of the first type of nullsubject language to be discussed in the generative literature, notably by Rizzi () as we have seen. Following Roberts and Holmberg (), we call these ‘consistent null-subject languages’ (or CNSLs). The properties of CNSLs are as follows: ()

Consistent null-subject languages: (i) the possibility of leaving the definite subject pronoun unpronounced in any person-number combination in any tense; (ii) rich agreement inflection on the verb; (iii) sg null subjects restricted to a definite interpretation; an arbitrary null subject (in a finite clause) needs a special marker in the sg; (iv) conform to the cluster in (); (v) allow overt subject pronouns, but with a different interpretation.

We have seen all of the properties listed in () except for (iii). This is shown by examples such as (): ()

Qui non (si) può fumare. Here not (SI) can smoke ‘He/she/one can’t smoke here.’

[Italian]

This example features a null subject pro. If the ‘medio-passive/impersonal’ clitic si is left out, pro must be interpreted as definite, translating therefore as ‘he’ or ‘she’. If si is included, on the other hand, pro is interpreted as indefinite, translating roughly as ‘one’. The important point is that the indefinite interpretation is unavailable without the presence of the special marker si (on whose status, see Manzini ; Cinque ; D’Alessandro ). In the next two subsections, we look at the other main types of null-subject languages: radical and partial null-subject languages. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

1.2.2. Null subjects in the synchronic dimension II: radical null-subject languages Many languages, including Japanese and Chinese, appear to allow null arguments very freely as subjects and objects with no agreementmarking at all to ‘recover’ their content: ()

a. John-wa [zibun-no tegami-o ] suteta. John- self- letter- discarded. ‘John threw out his (own) letters.’ b. Mary-mo — suteta. Mary-also discarded. ‘Mary threw out—too.’

[Japanese]

Should we analyse the silent object in (b) as a pro comparable to the null subject in our Italian and Spanish examples in the previous subsection? If so, the notion that pro is connected to rich agreement will clearly have to be abandoned, or at the minimum radically modified, since there is no agreement marking of any kind in Japanese. An alternative approach to null arguments in languages of this kind has been developed in recent years, originating in proposals by Oku () and Tomioka (). This approach relies on the fact that Japanese doesn’t have determiners, and so the null arguments could be cases of NP-ellipsis where no determiners are present. This idea is seen most clearly if we assume that nominals can have at least two levels of internal structure, illustrated with a simple English example in (): ()

[DP the/this/a/every [NP man ]]

Here DP stands for Determiner Phrase; as a first approximation the/ this/a and every can be seen as determiners of various kinds. In certain contexts, notably after demonstratives, the NP can be elided: ()

I like those shirts but I don’t like [DP these [NP — ]].

In a language like Japanese, where determiners are not obligatory in referential nominals (this is not to imply that Japanese lacks demonstratives and quantifiers, but it does lack definite and indefinite articles and it does allow singular count nouns in argument positions—see (), ()), we could envisage a structure like the following: ()

[DP Ø [NP — ]] 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

Such a nominal would constitute a silent argument, but it would not be pro. In this connection, Tomioka () put forward the following generalization (Tomioka : ): ()

Tomioka’s Generalization: ‘All languages which allow discourse pro-drop allow (robust) bare NP arguments . . . Null pronouns in Discourse Pro-Drop languages are simply the result of N’-Deletion/NP-Ellipsis without determiner stranding.’

Languages like English and Italian have rich determiner systems, including definite and indefinite articles. In these languages, bare singular count nouns cannot function as arguments; they must be marked by some form of determiner: ()

a. *John read book. b. *Gianni ha letto libro.

[Italian = (a)]

In Japanese, on the other hand, bare singular count nouns are readily found: ()

Taroo-ga hono-o yonda. Taro- book- read ‘Taro read a/the book.’

[Japanese]

As the translation shows, hono-o in this example can be read out of context as either definite or indefinite. In the previous subsection we introduced the distinction between ‘strict’ and ‘sloppy’ readings of pronouns and elided material. This distinction is illustrated in the following English examples: ()

a. John threw out his letters and Bill did—too. b. John threw out his letters and Bill threw them out too.

In (a) we have VP-Ellipsis and a sloppy reading (Bill threw out his letters) is allowed. Following the terminology introduced by Hankamer and Sag (), this is ‘surface anaphora’. On the other hand, the pronoun in (b) does not allow the sloppy reading: them here can only refer to John’s letters. There is no strict vs sloppy ambiguity; this is ‘deep anaphora’. The key observation regarding Japanese is that (b) allows the sloppy interpretation (‘Bill threw out his letters’). Therefore, like the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

English VP-ellipsis example (a), this is a case of surface anaphora, and as such appears to be a case of ellipsis (of NP, in terms of the structure in ()), rather than pro (which, as a pronoun, is associated with deep anaphora). (b) features a silent object, but the parallel observation can be made about silent subjects in Japanese, as the following examples show: ()

a. Taroo-wa [zibun-no kodomo-ga eigo-o [Japanese] Taro- self- child- English- hanasu to] omotteiru. speak  think ‘Taroi thinks that hisi child speaks English.’ b. Ken-wa [— furansugo-o hanasu to ] omotteiru. Ken- e French- speak  think ‘Ken thinks that—speaks French.’

The null subject in (b) can be interpreted as either Ken’s child or as Taro’s child; the former is a case of the sloppy reading. The fact that the sloppy reading is allowed means that this is a further case of surface anaphora, implying that here too we have ellipsis and not pro. Compare () with its English counterpart in (): ()

Taro thinks that his child speaks English, but Ken thinks that he speaks French.

Here, only the strict reading is available for he in the second clause. So this is deep anaphora, typical of pronouns. This example contrasts minimally with the Japanese examples in (). A further observation concerns what are known as ‘quantificational readings’ of numeral and other expressions, as in: ()

a. Sannin-no mahootukai-ga Taroo-ni ain-ni [Japanese] Three- wizard- Taro- see-to kita. came ‘Three wizards came to see Taroo.’ b. Hanako-ni-mo ai-ni kita. Hanako--also see-to came ‘They came to see Hanako too.’



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

In (b) the three wizards in question do not have to be interpreted as the same three as in (a). This is a further example of surface anaphora, akin to sloppy readings of pronouns, and further supports the idea that these null arguments are cases of ellipsis rather than pro. Compare again the parallel English examples, with a pronoun and with VPEllipsis: ()

a. Three wizards came to visit John. They came to visit me too. b. John met three wizards and I did—too. —quantificational reading possible.

In (a) the pronouns they can take three wizards in the previous sentence as its antecedent, and here it is understood that they refers to the same three wizards as three wizards in the first sentence (e.g. Gandalf, Dumbledore, and Merlin). In (b), on the other hand, while the elided VP is understood as ‘met three wizards’, the three wizards can be a different set of three wizards as compared to the antecedent. The consistent pattern is that ellipsis is associated with quantificational readings of numeral phrases (as opposed to coreferent readings), while pronominal anaphora is associated with strictly coreferent readings. Following this pattern, we are led to see the null arguments in Japanese as cases of nominal ellipsis. I conclude that null arguments in radical null-subject languages involve ellipsis, not pro. More generally, we can summarize the properties of radical null-subject languages (RNSLs) as follows: ()

(i) no restrictions on omission of subject or object pronouns of any kind; (ii) no agreement inflection on the verb (or anywhere else, seemingly); (iii) null subjects can be either definite (= ‘he/she’) or indefinite (= ‘one’); (iv) nothing really comparable to ‘free-inversion’ and these languages don’t generally conform to the cluster in (), but many of these languages lack overt wh-movement and many may lack complementizers; (v) sloppy readings possible, supporting the idea that the silent arguments are cases of ellipsis.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Property (iii) is illustrated by examples like the following: () Haru-ga kure-ba, tabi-ni de-taku-naru. [Japanese] spring- come-when trip- leave-want-become ‘When spring comes, one wants to go on a trip.’ Here the null subject has an indefinite interpretation, as its translation by ‘one’ reveals. There is no special marker comparable to Italian si here. 1.2.3. Null subjects in the synchronic dimension III: partial null-subject languages Starting with Holmberg (), a third type of null-subject language has been recognized, known as partial null-subject languages (PNSLs). The properties of PNSLs are as follows: ()

(i) person restrictions on omission of a definite subject pronoun, especially third-person in root contexts; (ii) not necessarily very rich agreement inflection on the verb; (iii) sg null subjects can have an indefinite interpretation without the need for a special marker; (iv) subjects are typically preverbal, there is no general ‘freeinversion’ option, hence these languages don’t generally conform to the cluster in (); (v) allow overt subject pronouns with no interpretative difference.

Holmberg () illustrates property (i) with Finnish: () a. (Minä) I b. (Sinä) You c. *(Hän) He/she d. (Me) We e. (Te) You f. *(He) They

puhun speak- puhut speak- puhuu speak- puhumme speak- puhutte speak- puhuvat speak-

englantia ‘I speak English, etc.’ [Finnish] English englantia English englantia English englantia English englantia English englantia English 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

In general, third-person null subjects are not possible with definite interpretations in main clauses. Regarding third-person null subjects in subordinate clauses, Holmberg (: ) says: ‘A rd person definite subject pronoun can be null when it is bound by a higher argument, under conditions that are rather poorly understood’. He illustrates this with the following example: () Pekkai väittää [että häni/j/Øi/*j puhuu Pekka claims that he speaks englantia hyvin ]. English well.

[Finnish]

Here we see that the overt subject pronoun in the subordinate clause, hän, may take the main-clause subject Pekka as its antecedent or may refer to a contextually given male individual. The null subject, by contrast, can only have the linguistic antecedent Pekka as its antecedent. This contrasts with what we see in CNSLs; see the Spanish example in (). A further contrast with CNSLs concerns the possibility of an indefinite interpretation for third-person singular main-clause subjects without a special marker, as in: () Täällä ei saa polttaa. Here not may smoke ‘One can’t smoke here.’

[Finnish]

This example contrasts minimally with the Italian example in (). Other PNSLs include Marathi and other Indo-Aryan languages (Holmberg, Nayudu, and Sheehan ), Russian (Barbosa ), Hebrew (Borer , , ; Landau ; Shlonsky ), Brazilian Portuguese (Figuereido-Silva ; Holmberg ; §.., §..), Icelandic (Holmberg, p.c.). As we will see in §.., several old Germanic languages also appear to have been PNSLs at some point in their history (see Walkden , Rusten  on Old English; Axel  on Old High German; Kinn  on Old Norwegian; and Kinn, Rusten, and Walkden  on early Icelandic; and §..). 1.2.4. Null subjects in the synchronic dimension IV: types of null subjects So we arrive at the following typology of null-subject languages: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

() a. Consistent NSL (Italian, etc.): agreement-determined subjectpronoun drop; null arguments intrinsically definite. b. Radical NSL (Japanese, etc.): apparently free pronoun-drop, argument-ellipsis. c. Partial NSL (Finnish, etc.): subject pronoun-drop restricted according to persons and structural context; null arguments intrinsically indefinite. The obvious question to ask now is why we find these different types of null-subject language. Roberts (a: ) makes a connection between null subjects and the nature of determiner systems. Alongside ‘rich agreement’, as discussed above, we can informally describe a notion of ‘rich determiners’: languages which overtly mark both definiteness and indefiniteness in nominals, usually along with gender and number marking on the articles. In these terms, we can cross-classify ‘rich agreement’ and ‘rich determiner’ languages as follows: () Rich D? Rich agreement? + +

+

Italian, etc., i.e. consistent NSLs (CNSLs) partial NSLs (PNSLs) radical NSLs (RNSLs)

The fourth logical possibility is given in (); this characterizes non-nullsubject languages (NNSLs) such as English (English has definite and indefinite determiners, but they do not bear overt number and gender features): () +

the fourth possibility: non-NSLs

In these terms, CNSLs and NNSLs (Romance, Germanic, etc.) have definite and indefinite articles. We must add a proviso for the Southern and Western Slavic languages (with the exception of Bulgarian/Macedonian); these languages are CNSLs but do not have definite or indefinite determiners. On the other hand, they do have rich clitic systems, which can be analysed as determiner-like elements bearing person and number features. Russian, a PNSL, has lost its pronominal clitic system, and contrasts minimally with Polish, a CNSL, which has retained its clitics. PNSLs typically have ‘more’ verbal agreement inflection than English, but have no articles or clitics (Russian/Finnish), only definite articles (Hebrew, Icelandic) or the possibility of bare singular count nouns in argument position (Brazilian Portuguese). Finally, RNSLs lack articles and clitics altogether. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

We expect from the above that PNSLs contrast with CNSLs (and NNSLs) in allowing bare nominals in argument positions: () a. Kirja on pöydällä. book is table-on ‘The book is on the table.’ b. polis-An-nI cor pakaD-I-A police-- thief. catch-- ‘The police caught the thief.’ c. Cachorro(s) gosta(m) da gente. Dog(s) like(s) of people ‘Dogs like people.’

[Finnish]

[Marathi (Holmberg et al. )] [BP]

So it seems that the type of null subject is partly a consequence of the nature of the determiner system. Why should this be the case? We saw above (see ()) that apparent argument ellipsis in RNSLs is the result of NP-ellipsis under a null determiner. Barbosa () argues at length that pro should be seen as a minimal NP, rather than as a full DP. In these terms, we can characterize the three kinds of null-subject language as follows: () RNSL: [DP Ø [NP e ]] PNSL: [DP Ø/D[–def] [NP pro ]] CNSL: [DP D[def] [NP pro ]]

(e = ellipsis site)

(Here we see that PNSLs may entirely lack determiners, as in Finnish and Russian, or may have determiners which do not make definiteness distinctions, as in Hebrew and Icelandic.) We now need to complete the picture by looking at what happens in NNSLs. In a classic paper, Postal () argued that in English pronouns are really a kind of determiner. One important piece of evidence for this idea comes from non-possessive adnominal pronouns (studied in great detail in Höhn ), as in examples such as the following: () We professors love you students. The definite article can be seen as the third-person version of the determiner, with we as first-person and you as second-person: () a. [DP [D The ] [NP professors ]] love [DP [D the ] [NP students ]]. b. [DP [D We ] [NP professors ]] love [DP [D you ] [NP students ]]. Now, adopting Barbosa’s proposal that pro is an NP, we can propose the following structure for sentences containing apparently simple pronouns: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

() [DP [D We ] [NP pro ]] love [DP [D you ] [NP pro ]]. The difference between CNSLs and PNSLs, on the one hand, and NNSLs on the other, is that in the former languages the pronominal D can form a morphosyntactic unit (a complex head D+V to a first approximation) with the finite verb, while in NNSLs this doesn’t happen. The operation forming a complex head is known as incorporation; see the discussion of verb-movement in §.. We are now able to give a more precise account of the properties which distinguish the various kinds of null-subject languages, including NNSLs such as English. Let us take ‘rich D’ as in () and () to mean that Person and Number features are associated with the determiner system (following the idea in Longobardi  and Richards  that Person and Definiteness may reduce to the same thing), while ‘rich agreement’ refers to the same property of the position occupied by the finite verb (which, as we will see in detail in §..., is the Tense position, so incorporation creates a complex D+T/V head). Then we can reformulate () and () as follows, at the same time subsuming (): () a. b. c. d.

RNSL: PNSL: CNSL: NNSL:

T[-Person] . . . T[+Person] . . . T[+Person] . . . T[-Person] . . .

[DP Ø[-Person] . . . [DP D/Ø[-Person] . . . [DP D[+Person] . . . [DP D[+Person] . . .

[NP e ]] [NP pro ]] [NP pro ]] [NP pro ]]

We can now understand why CNSLs and PNSLs, but not NNSLS, allow a pronoun in D to incorporate into T, if we suppose that a condition on incorporation is that the category which incorporates cannot have a positive feature that the incorporation host lacks. This condition is met in (b) and (c), but not in (d). The condition may be met in RNSLs, but these are really distinct kinds of system, in that both Person and NP pro appear to be absent. As we have observed, RNSLs fail to show any form of clitics or agreement; in these languages then, it seems all bets are off. The different types of null-subject systems are distinguished by the different features associated with their D and T heads. This is consistent with a general approach to the nature of parameters known as the Borer-Chomsky Conjecture, which we can state as follows: () All parameters of variation are attributable to differences in the features of the functional heads in the Lexicon (Borer : –; Chomsky : ; Baker : f.). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

Since D and T are functional heads (as opposed to lexical heads such as Nouns, Verbs, and Adjectives), differentiating the various null-subject systems as in () is in line with the Borer-Chomsky Conjecture. We conclude that the current state of synchronic knowledge regarding null-subject languages involves at least the following empirical and theoretical claims. First, there are (at least, see below) three types of null-subject language: CNSLs, RNSLs, and PNSLs, with the distinct properties as given in (), (), and () above. Second, only CNSLs are expected to conform to the parametric cluster in (). Third, the internal structure of DP and the nature of the D-system are relevant for null subjects. There are also a number of open questions. First, the question of the nature and characterization of both ‘rich agreement’ and ‘rich determiners’ is to some extent open, and in any case in need of greater precision (as we will see in §..). Second, does the four-way typology given above (including NNSLs) exhaust the typology of NSLs? Barbosa (: –) argues that there is a further type, which she calls ‘semiprodrop’ languages. Semi-prodrop languages are more restricted than PNSLs, allowing only non-referential and indefinite null subjects; fully referential, definite null subjects are never possible in languages of this kind. Barbosa suggests that various creoles and Icelandic are of this type (see §. on creoles). It is not clear how this type, if it is distinct from PNSLs (see Roberts a: , n.  on this), would fit in to the typology suggested above. Third, it should be apparent from the above discussion that RNSLs are radically different from the other types; we will return to this point in §... Fourth, null objects, both direct and indirect, need to be integrated into our typology. Finally, the position and status of overt preverbal subjects in CNSLs needs to be clarified. To see what is at issue here, consider a simple subject-verbobject sentence in a CNSL such as Italian: () Gianni parla italiano. John speaks Italian.

[Italian]

There are two options for the position of the subject here. On the one hand, it may be a true preverbal subject occupying a preverbal subject position directly comparable to the position of the subject in the English translation. Alternatively, it could be in a left-dislocated position with pro (now construed as [DP [NP pro]]) in the true subject position. This would be the null-subject counterpart of an English sentence like John, he speaks Italian. The alternatives are schematized in (): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

() a. [Gianni V O] b. [Gianni [pro V O]] Barbosa (), following a line of work going back to Borer (), argues that (b) is the correct analysis in that the subject occupies a topicalized or left-dislocated position (although she does not propose there is a pro in subject position). This conclusion is not uncontroversial; see the discussion and references in Roberts (a). The distinction between (a) and (b) may be relevant for certain cases of diachronic change, as we will see in §... Having illustrated the various kinds of null-subject system in the synchronic domain and the associated parameters in (), it is now time to look at cases of diachronic change involving these systems. 1.2.5. Null subjects in the diachronic dimension I: European and Brazilian Portuguese In () we listed the properties of CNSLs as follows: () (i) the possibility of leaving the definite subject pronoun unpronounced in any person-number combination in any tense; (ii) rich agreement inflection on the verb; (iii) sg null subjects restricted to a definite interpretation; an arbitrary null subject (in a finite clause) needs a special marker in the sg; (iv) conform to the cluster in (); (v) allow overt subject pronouns, but with a different interpretation. These are to be contrasted with the properties of PNSLs, repeated here from (): () (i) person restrictions on omission of a definite subject pronoun, especially third-person in root contexts; (ii) not necessarily very rich agreement inflection on the verb; (iii) sg null subjects can have an indefinite interpretation without the need for a special marker; (iv) subjects are typically preverbal, there is no general ‘freeinversion’ option, hence these languages don’t generally conform to the cluster in (); (v) allow overt subject pronouns with no interpretative difference. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

When a CNSL changes into a PNSL, then, we expect to observe the following: ()

(i) (ii) (iii) (iv)

the introduction of person restrictions on null subjects; possible loss of rich agreement on the verb; loss of special markers for indefinite markers of null subjects; loss of ‘free inversion’ and associated apparent complementizer-trace effects; (v) loss of interpretative differences between overt and null subjects.

Given the nature of the diachronic record of many languages, properties (iii–v) may be difficult to observe. On the other hand, erosion of inflection in general, including verbal agreement inflection, is easy to observe; in particular, it has very clearly happened in many Germanic languages. The introduction of person restrictions on null subjects can be observed through the selective increase in the use of overt pronouns in some persons but not others, perhaps in specific contexts, as we shall see. All of the changes in () can be observed in the recent history of Brazilian Portuguese (BP), since the early nineteenth century. This has been documented by Duarte (, , ). Concerning (i), Figuereido-Silva (: ) claims that in main clauses in an ‘out-of-the-blue’ context (e.g. where there is no salient discourse topic; see Barbosa : , note ) thirdperson null subjects are dispreferred, while first- or second-person null subjects are readily possible (I will return to this point in §..): () a. pro encontrei a meet.. the ‘I met Mary yesterday.’ b. *pro encontrou a meet.. the

Maria ontem. Mary yesterday

[BP]

Maria ontem. Mary yesterday

European Portuguese, on the other hand, readily allows third-person null-subject pronouns at greater frequency than BP; see Duarte (: , table ). Concerning (ii), it is well-known that BP has undergone a reorganization of the pronominal system in such a way that formerly third-person pronouns came to be used as sg, pl, and pl forms, replacing the earlier pronouns, with the result that sg verbal inflection is now used in these persons as well. The effect is a levelling of the verb paradigm, as shown in (): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

() a. EP and former BP paradigm: (eu) falo (nos) falamos (tu) falas (vos) falais (ele/ela) fala (eles) falam b. Reorganized colloquial BP paradigm: eu falo a gente fala você fala vocês falam ele/ela fala eles falam ‘I speak, etc.’ (There are some complications concerning the use of tu in twentiethcentury BP; see §.., note .) The effect of the reorganization of the pronominal system is a reduction of the number of distinctions in the verbal inflection from six to three. As such the new system arguably falls below the threshold for ‘rich’ agreement and is thus incompatible with CNSL status (see also Roberts b). Concerning (iii) (Holmberg : ), following an earlier observation by Rodrigues (), observed the following contrast between European and Brazilian Portuguese: () a. É assim que faz o doce. [BP] Be.. thus that make.. the. sweet ‘This is how one makes the dessert.’ b. É assim que se faz o doce. [EP] be.. thus that SE make.. the. sweet ‘This is how one makes the dessert.’ In (a), the arbitrary/indefinite interpretation of the null subject (corresponding to English one) is available without the presence of the ‘impersonal’ clitic se, while in European Portuguese se is required for this interpretation; (a) is grammatical in European Portuguese, but the null subject must have the definite interpretation ‘he/she’. Concerning (iv), European Portuguese shows all the standard hallmarks of a CNSL: () a. ‘Free inversion’: Telefonou ontem o João. called- yesterday John ‘John called yesterday.’



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

b. Wh-movement of a subject over a finite complementizer: Que aluno disseste que – comprou um computador? which student said- that – bought- a computer ‘Which student did you say bought a computer?’ (Barbosa, Duarte, and Kato : ) BP, on the other hand, does not allow free inversion, except with unaccusative verbs—see Kato (), although it does allow whmovement as in (c); this fact about BP remains anomalous. Finally, examples comparable to () do not show the interpretative difference in BP observed by Montalbetti () for Spanish, in that in examples like () the overt pronoun ele can have a bound-variable interpretation (thanks to Eugênia Duarte, p.c., for providing this example): () Cada um podia fazer as perguntas que ele Each one could- do- the questions that he queria. wanted- ‘Each one could ask the questions that he wanted.’ (Rádio CBN) It appears, then, that BP has changed, or is changing, from a CNSL to a PNSL in its fairly recent history; we will return to the possibly ongoing nature of this change, and some empirical complications, in §... This is correlated with the weakening of agreement seen in () and with the possibility of bare count nouns in argument positions as seen in (c). 1.2.6. Null subjects in the diachronic dimension II: early Germanic Another example of a change from CNSL to PNSL and, at least sporadically, to NNSL comes from the early Germanic languages, although here the textual record is less abundant than in Portuguese. With the exception of Modern Icelandic, which is arguably a kind of PNSL (although, as we mentioned in §.., Barbosa () treats Icelandic as a ‘semi prodrop’ language, distinct from canonical PNSLs), the Modern Germanic languages are all NNSLs. However, it is now fairly clearly established that this was not exactly the situation in several of the older Germanic languages: OE, Old High German (OHG), Old Norse (ON), Old Saxon (OS) and Gothic (see van Gelderen , , Rusten  on OE; Axel  on OHG; Falk , 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Sigurðsson , Faarlund b, Håkansson , Kinn , Kinn, Rusten, and Walkden  on ON; Ferraresi , , Miller : – on Gothic; and in particular Walkden : – on OS and for general overview and analysis). All of these languages allowed referential null subjects to varying degrees, in contrast to the modern languages (although Modern Dutch and German apparently allow expletive null subjects; see Wurmbrand , Biberauer  for convincing arguments that these putative null subjects are simply absent positions rather than subject positions filled by ‘expletive pro’). Null subjects in the old Germanic languages are illustrated by the following examples: () a. Nu scylun hergan hefaenricaes uard.2 [OE] Now must. praise heavenly-kingdom. guard ‘Now we must praise the lord of the heavenly kingdom.’ (Caedmon’s Hymn, Cambridge University Library MS M, l ; van Gelderen : , (); Walkden : ) b. Sume hahet in cruci. [OHG] some. hang. to cross ‘Some of them you will crucify.’ (Monsee Fragments XVIII.; Matthew :; Axel : ; Walkden : ) c. þer diþi ok drak miolk [Old Swedish] there sucked and drank milk of moþor spina of mother. teats ‘There he sucked and drank milk from his mother’s teats.’ (Tjuvabalken in Den äldre Codex of Westgöta-Lagen, dated ; Falk : , (a); Walkden : ) d. naht jah dag in diupiþai was [Gothic] night-. and day.. in deep was Mareins sea.. ‘a night and a day (I) was on the deep of the sea’ (Cor : B; Miller : )

2

For discussion of variation in the incidence of we in this example across various mss., see Walkden (: ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

e. Giuuitun im thô eft te Hierusalem went. . then after to Jerusalem iro sunu sôkean their son seek. ‘They then went to Jerusalem to seek their son.’ (Heliand –; Walkden : )

[OS]

These examples illustrate the occurrence of null subjects in the older Germanic languages, showing that something significant has changed in the languages that have survived (i.e. all of them except Gothic). But what kind of null-subject systems are they? We can immediately exclude the possibility that these are RNSLs from the simple presence of verbal subject-agreement marking in all of these languages, illustrated in more detail below (see also Walkden : – for discussion, and dismissal, of this possibility). So we are left with the option of CNSLs or PNSLs. Here the diagnostics () and () are relevant, as well as their diachronic counterparts in (). In practice, given the nature of the available data, only diagnostics (i) (person restrictions on null subjects) and (ii) (presence of ‘rich’ agreement) are easily applicable, although (iii) (indefinite subjects with/without a special marker) and (v) (interpretative differences between null and overt subjects) can also play a role in determining the status of these languages. Throughout the following discussion, I will leave (iv) aside. It seems fairly clear that Gothic was a CNSL. Walkden (: ) states that ‘Gothic seems to pattern with Italian-style languages’. Harbert (: ) says ‘[a]mong the GMC languages, this possibility [‘full’ prodrop—IR] is attested only in GO[thic]’, and Miller (: ) says ‘Gothic was the most null-subject Germanic language’. The agreement paradigm for the present tense of Gothic nasjan (‘to save’) is as follows: () nasj-a, nasj-is, nasj-iþ, nasj-ōs (), nasj-ats (), nasj-am, nas-iþ, nasj-and (‘I save’, etc.; Walkden : , table ., following Wright : –) Here we see seven distinct endings out of eight forms (the usual six person/number combinations, plus first- and second-person dual). This corresponds to the definition of ‘rich agreement’ given in Roberts (a: ), which allows up to one syncretism; in () we observe a syncretism between sg and pl (there is also a syncretism between sg and sg in the past tense; see Walkden : ). Null subjects are 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

found in all persons and, to the extent the data allow us to determine, all tenses in Gothic. Combined with the observation of ‘rich agreement’ in (), we see that diagnostics (i) and (ii) hold for Gothic. Concerning (iii), the older Indo-European languages—all very plausibly CNSLs given their very rich agreement paradigms (see §..)—had synthetic mediopassive forms whose logical external argument was typically a null arbitrary pronoun. Strikingly, Gothic has a synthetic passive of this kind (Miller : ; see also Harbert : f.): () saei gabairada weihs haitada [Gothic] REL bear.. holy... call.. sunus gudis son. god. ‘the holy one who will be born will be called the son of God’ (Luke ;; Miller : ) Information concerning diagnostics (iv) and (v) is unavailable for Gothic. The attested corpus of Gothic dates largely from the sixth century, and as such, aside from a few North Germanic runic inscriptions (Faarlund a: ), is the oldest attested form of Germanic. Thus we may expect Gothic to be more conservative than the other attested Germanic languages—closer to the other older Indo-European languages, e.g. Latin—for the simple reason that it is older. One thing that we clearly see when we compare Gothic with the other early Germanic languages is that the other languages all have more impoverished agreement paradigms, as () shows: () a. Present tense of Old Icelandic taka (‘to take’): tek, tek-r, tek-r, tǫk-um, tak-ið, tak-a (‘I take’, etc.; Faarlund b: ). b. Present tense of OE nerian (‘to save’): ner-ie, ner-est, ner-eþ, ner-iaþ, ner-iaþ, ner-iaþ (‘I save’, etc.; Walkden : , following Mitchell and Robinson : ). c. Present subjunctive of OHG nerine (‘to save’): neri-e, neri-ēs(t), neri-e, neri-ēm, neri-ēt, neri-ēn (‘I save’, etc.; Walkden : , following Braune and Eggers : ). d. Present tense of OS nērian (‘to save’): nēri-u, nēri-s, nēri-ēd, nēri-ad, nēri-ad, nēri-ad (‘I save’, etc.; Walkden : , following Cordes and Holthausen : –). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

Here the syncretic forms are boldfaced. The Old Icelandic and OHG paradigms show just one syncretism here, but Walkden (: ) shows that fuller paradigms of present and past indicative and subjunctive show at least one further syncretism, which is not the case in Gothic. The OE and OS present tenses show two syncretisms in the plural across all four tense/mood paradigms. So these languages all have slightly, but crucially, impoverished agreement forms compared to Gothic (and compared to canonical CNSLs such as Italian). They also lack inflection for dual number. Furthermore, all of the languages except Gothic show person restrictions. In Old Icelandic, first- and second-person singular and plural pronouns are null in .% of examples and third-person singular and plural pronouns are null in .% (Kinn, Rusten, and Walkden : , table , data based on the IcePaHC corpus of Wallenberg et al. ). Although the incidences are low (and decline through the Old Icelandic period, see Kinn, Rusten, and Walkden : , table ), the person asymmetry is clear and statistically significant, as Kinn, Rusten, and Walkden show. Intriguingly, it is the opposite of that observed in BP; see (). Finally, arbitrary/indefinite null subjects can appear in Old Icelandic with no special marker: () má þar ekki stórskipum fara Can there not big-ships. travel ‘One cannot travel there with big ships’ (Hkr II..; Faarlund b: )

[Old Icelandic]

Old Icelandic thus displays three of the diagnostics of PNSLs. Moreover, the very low overall incidence of null subjects, averaging well under % of all examples, suggests that null and overt subjects did not show interpretative differences in that overt subjects could appear wherever null ones did not. So, Old Icelandic, despite the low overall incidence of null subjects, had the properties of a PNSL (as, arguably, does Modern Icelandic, as we mentioned above). Concerning OE, Walkden (: –) investigated the incidence of null subjects in twenty-five OE texts of over , words from the York-Toronto-Helsinki Parsed Corpus of Old English Prose (Taylor et al. ) and the York-Helsinki Parsed Corpus of Old English Poetry (Pintzuk and Plug ). He concludes that ‘[m]any of these texts . . . show a frequency of overt pronouns of –% in all clause types’ 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

(); see also the data presented in Rusten (: –, , tables . and ., ). The vast majority of these texts are in the ‘standard’ literary variety of OE, West Saxon. However, Walkden (: , table .), following Berndt (: –) and van Gelderen (: , table .), also shows that in two Anglian OE texts, the Rushworth Gospels and the Lindisfarne Gospels, first- and second-person singular and plural null subjects average .% of examples and third-person singular and plural null subjects average .%. In Beowulf, Bald’s Leechbook, Bede and MS E of the Chronicle, all texts traditionally classified as West Saxon but known to have Anglian features, first- and secondperson null subjects occur in .% of examples, and third-person null subjects occur in .%. Walkden concludes that Anglian OE had null subjects subject to a clear person constraint. Rusten () based his quantitative study of null subjects in OE on five corpora (the York-Toronto-Helsinki Parsed Corpus of Old English Prose, the York-Helsinki Parsed Corpus of Old English Poetry, The Penn-Helsinki Parsed Corpus of Middle English (Kroch and Taylor a), the Parsed Corpus of Middle English Poetry (Zimmerman ) and the Penn-Helsinki Parsed Corpus of Early Modern English (Kroch et al. ), a total of  texts and . million words). He concludes that ‘Walkden’s dialect-split hypothesis must be considered falsified’ (Rusten : ). However, in line with Walkden’s observations on the Anglian texts, Rusten finds a general split between thirdperson and non-third-person null subjects, particularly clear in poetic texts, in favour of third-person null subjects (Rusten : , tables . and .). He also finds that the strongest bias in favour of null subjects is in verb-initial clauses, in both prose and poetry (Rusten : , tables . and ., ). However, these biases are very weak statistically, leading him to conclude that they are ‘a type of “residue” that surfaces very unevenly in different genres and texts’ (Rusten : ) and they may be ‘reflective of a formerly productive partial pro-drop property’ (Rusten : ; see also p. ). He thus concurs with Walkden’s conclusion that the null-subject ‘property must have been lost . . . during and before the time that our earliest texts were being produced’ (Walkden : ) (although Walkden states this solely for West Saxon OE while Rusten intends it as applying to all OE dialects). The example from Cædmon’s Hymn in (a) is consistent with this conclusion, being from a text which is somewhat older than most of the surviving OE corpus; Cædmon’s Hymn is usually dated to 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

the late seventh century, while most OE texts are from the ninth and tenth centuries (see also note ). Let us look at these conclusions in the light of the diagnostics in () and (). Concerning diagnostic (v), as in Old Icelandic the general incidence of overt subjects strongly suggests that there were no interpretative restrictions on overt subject pronouns; one indication of this is the general use of hit as the overt subject of meteorological verbs, as in (): () & hit rinde ða ofer eorðan feowertig daga [OE] And it rained then over earth forty days & feowertig nihta on an and forty nights on one ‘and it rained then over earth forty days and forty nights without cease’ (Gen .; Rusten : ) Concerning (iii), generic null subjects are attested in OE with no special marker (here the finite verb mæg is sg): () Þær mæg nihta gewhæm niðwundor [OE] There may night.. each evil-wonder seon, fyr on flode. see, fire on flood ‘There one may each night see an evil portent, fire on flood.’ (Beowulf ; Rusten : ) We saw above in (b) that OE verbal agreement shows two systematic syncretisms, and therefore does not count as ‘rich’, diagnostic (ii). Finally, we have seen the evidence for residual person restrictions, diagnostic (i). Still leaving aside diagnostic (iv), we see that OE has all the properties of a PNSL listed in (). Given the extremely low incidence of null subjects generally, as pointed out by Rusten () (and Walkden for West Saxon), however, it seems that the main part of the OE corpus is essentially non-null-subject, but that there are residual PNSL properties. Null subjects in OHG were analysed in detail in Axel (); see also Axel and Weiß () and the summary and discussion in Walkden (: –). Axel (: ) says that ‘[i]n the older OHG prose texts a person split can clearly be observed’, as shown in Axel (: , table ). As in ON and OE, third-person null subjects are 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

preferred; see Walkden (: –, table ., figure .). So OHG conforms to diagnostic (i) for PNSLs. Concerning ‘rich’ agreement, diagnostic (ii), we saw in the discussion of () that OHG shows one syncretism more than Gothic. Concerning diagnostic (iii), Axel (: ) gives several examples of arbitrary null subjects with pl agreement: () tho brahtun imo luzile Then brought. him little ‘then little children were brought to him’ (T , ; Axel : ; my English translation)

[OHG]

(This example translates a Latin passive, oblati sunt ‘they were brought’.) Again, it is impossible to evaluate diagnostic (iv). Concerning diagnostic (v), Axel (: ) says that ‘there are some indications in the OHG texts that in contrast to Modern Italian, null subjects and overt subject pronouns had the same referential properties in postfinite environments’. For example, in () ‘the overt realization of the subject pronoun does not trigger an emphatic or contrastive reading’: () Dhar ir quhad . . . chiuuiso meinida ir dhar [OHG] Where he said . . . certainly meant he there sunu endi fater son and father ‘where he said . . . he certainly meant there the Son and the Father’ (I ; Axel : ) (Here there is no overt pronoun in the Latin original.) As far as can be determined, then, OHG meets the criteria for a PNSL. This is true of the earlier OHG texts from the eighth and ninth centuries. Axel (: –) points out that in later OHG ‘presumably before the turn of the eleventh century . . . referential null subjects are hardly attested anymore’. OHG may then have undergone the transition from PNSL to NNSL around AD, two or three centuries later than OE. Finally, Walkden (: –) discusses OS. Here too third-person null subjects are preferred; see Walkden’s (: –) table . and figure .. OS shares with OE syncretic forms in the plural across all the verb tense/mood paradigms, as shown in (d); Hogg (: ) says that this is characteristic of North Sea Germanic, including Old Frisian, and so may represent a single very early change in this branch 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

of West Germanic (see also Harbert : ). Evidence concerning diagnostics (iii–iv) is unavailable. OS thus appears to be a PNSL. Walkden (: ) concludes that Proto West Germanic and Proto Northwest Germanic, i.e. the ancestors of all the languages under discussion except Gothic, must have been PNSLs. He tentatively concludes, citing Grimm (: ), Paul (: ), Lockwood (: ), and Fertig (: ), that Proto Germanic was a CNSL. Ringe (: ) also points out that Proto Germanic ‘probably continued to be a pro-drop language’ (i.e. continuing from Indo-European, see §..). This conclusion is supported by the Gothic evidence we have seen, by the fact that Gothic is the oldest attested Germanic language and by the fact that all the oldest attested daughters on other branches of Indo-European (i.e. Latin, Greek, Old Irish, Sanskrit, Old Church Slavonic) show rich agreement and apparently unrestricted referential null subjects and as such are almost certainly CNSLs (see the discussion of syntactic reconstruction in §..). The change from CNSL to PNSL may have taken place between Proto Germanic and Proto Northwest Germanic or it may have taken place separately in Proto West Germanic and Proto North Germanic; parsimony would suggest the former scenario (see again §..). At some stage just before the attestation of (most of) the OE corpus, OE seems to have become an NNSL; the same change occurred in OHG around  , according to Axel (: –). If Modern Icelandic is a PNSL, this change has not occurred there, but it has occurred in the history of the Mainland Scandavian languages, which are clearly NNSLs today (see Håkansson  on Old Swedish; and Kinn ,  on the history of Norwegian; some modern Mainland Scandinavian dialects remain PNSLs, see Sigurðsson  and Rosenkvist  on Ölvdalian). Across Germanic, then, we see a development from CNSL to PNSL to NNSL. Walkden (: ) observes that the development from CNSL to PNSL can also be inferred in the development of Marathi (and other Indo-Aryan PNSLs; see Roberts a:  and the references given there) from Sanskrit and in the development of Russian, argued by Holmberg, Nayudu, and Sheehan () to be a PNSL, from Old Church Slavonic. As we have seen, the change can also be documented in the recent history of Brazilian Portuguese. If we consider the change from CNSL to PNSL in the light of (), we have inferred the introduction of person restrictions on null subjects in Germanic by comparing Gothic with the other old Germanic languages, and Duarte’s (, , ) evidence shows this happening 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

in Brazilian Portuguese. This change seems to go hand in hand with the introduction of syncretisms in the verbal agreement paradigm; in both Germanic and Brazilian Portuguese, the threshold seems to lie between one and two syncretisms (see Müller  for an account of this in terms of the notion of impoverishment in Distributed Morphology). We saw in () that Gothic had synthetic mediopassive verb forms. As Harbert (: ) observes, the synthetic mediopassive is found only in relic forms elsewhere in Germanic, usually with the verb ‘to be called/named’, e.g. ON haiti, OE hātte. So we can infer the loss of special markers for indefinite null subjects in North and West Germanic (see also Hogg :  on the absence of ‘passive inflection’ in the OE verb). Finally, as null subjects become unavailable in certain persons, any earlier interpretative differences between overt and null subjects disappeared. We see that the change from CNSL to PNSL is attested. In OE, later OHG, and Mainland Scandinavian we see the further change from PNSL to NNSL. In Germanic and in Brazilian Portuguese, the erosion of agreement marking, introducing crucial syncretisms, seems to have played an important role. However, loss of agreement cannot be the only factor causing the change from CNSL to PNSL since Finnish, a canonical PNSL, has an Italian-like rich-agreement system with no syncretisms (see ()). In §.., we suggested a correlation between null subjects and the nature of the determiner system; this was given in () and (), repeated here: () Rich D? +

Rich agreement? + Italian, etc., i.e. consistent NSLs + partial NSLs radical NSLs

() +

NNSLs

If it is correct that Gothic was a CNSL while all the other Old Germanic languages were PNSLs later turning into non-NSLs, then () and () lead us to expect that Gothic had rich agreement and rich D while the other Old Germanic languages lacked the rich-D property (and, as we saw, had slightly impoverished agreement as compared to Gothic). So the question () and () lead us to investigate is the nature of the determiner systems in Gothic and the other Old Germanic languages. We expect that Gothic will have the rich-D property while the other languages will lack it. Moreover, we expect that, as they move from PNSL to NNSL, the other languages will lose agreement and develop a rich determiner system. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

Roberts (a: ) proposes that the ‘rich-D’ property is manifested either by the presence of both definite and indefinite determiners or by ‘overt instantiation of φ-features on third-person determiners where a lexical noun is present’. None of the Old Germanic languages, Gothic included, had indefinite articles, and so they all fail to qualify as rich-D languages by this criterion. This implies that the second possibility must be found in Gothic and not in the other Old Germanic languages. Gothic either lacks definite articles or has ‘incipient’ definite articles (Lehmann : , Miller : , n. ; although Miller also points out that ‘it is unclear whether any Gothic Ds are actually in head position’). However, one striking property of Gothic is that there is a paradigm of deictically neutral third-person anaphoric determiners or pronouns, sa, þata, so. These elements cannot be fully assimilated to a definite article since they fail to appear in contexts where a known unique entity is being referred to where the reference is not anaphoric or emphatic in a general sense (see Miller :  for a discussion of the various uses of these elements). The following example illustrates the occurrence of the bare nominals sunus mans (‘the son of man’) and haubiþ (‘head’) with definite interpretations, where the Greek original has definite articles: () iþ sunus mans ni habaiþ ƕar haubiþ [Gothic] But son man.  has where head galagjai - lay.. ‘but the son of man does have anywhere he may lay his head down’ The sa, þata, so paradigm inflects for gender, number, and case. These elements can stand alone, as in gudis sunus ist sa (Matthew :), which Miller (: ) translates as ‘This is God’s son’. They also cooccur with nouns in the relevant contexts: () ahma ina ustauh in auþida.j jah was [Gothic] Spirit him led in desert. And was in þizai auþidai dage fidwor tiguns in ... desert. days four tens ‘the spirit led him out into the desert, and he was in the/that desert forty days’ (Mark .f.; Miller : ) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Here Miller translates þizai as ‘the/that’; the desert is first mentioned in the immediately preceding sentence (without sa, þata, so, like ‘spirit’). Miller (: ) concludes that ‘Gothic Ds correspond to Greek articles in emphatic, demonstrative and relative contexts.’ The sa, þata, so paradigm cannot be fully assimilated to demonstratives, because (a) it is deictically neutral and (b) Gothic has full paradigms for proximal (sah, etc.) and distal (jains, etc.) demonstratives. So we can take sa, þata, so to be a φ-bearing D-element which is distinct both from definite articles and from demonstratives. Nonetheless, it may occupy D (although, as Miller points out, this is hard to tell). If so, then the presence of these elements in Gothic instantiates the rich-D property. Coupled with the fact that Gothic has rich agreement, as we saw above (see ()), this makes Gothic a CNSL in terms of (). OE, on the other hand, has just two demonstratives, lacking a deictically neutral one (Hogg : ; Mitchell and Robinson : ; van Kemenade : –; Ringe and Taylor : –). Similarly, ON has just two demonstratives (Faarlund b: ). However, according to Braune and Eggers (: –), OHG had three sets of demonstratives: proximal (dëse/dësēr), distal (jenēr), and neutral (dër) just like Gothic. OHG differs from Gothic in not having rich agreement, as we saw above. However, following (), this would predict that OHG was a NNSL, which at least for earlier OHG does not seem to be true, as we have also seen. It may be, then, that we can only rather weakly conclude that the older Germanic languages were in various stages of transition from having neutral demonstratives and rich agreement, and therefore being CNSLs. This was the situation in Gothic and Proto Germanic. More innovative systems (pre-OE, early OHG, ON) may have lacked a separate neutral demonstrative but had no real article system in that the status of the originally distal demonstrative is unclear and there were certainly no indefinite articles. They were therefore PNSLs. Later, the development of an article system, which was incipient in OE and OHG and developed fully in subsequent stages, made them NNSLs. A further possibility, which could lead us to assert more confidently that the earliest Germanic (at least Proto Germanic) was a CNSL, comes from the fact that a distinction between weak and strong adjectival inflection can be observed across the whole family (except where adjectival agreement has been lost, as in Modern English). Weak inflection is associated with definite nominals, strong inflection occurs elsewhere (Ringe : ). The weak inflection is characterized by a 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

suffix in -n-; Ringe (: ) states that it is ‘reasonable to hypothesize that the n-stem suffix of the weak adjective paradigm was originally a definite article—the first of several that arose within the development of Germanic.’ If this is the case, then Proto Germanic would have had definite determiners and rich agreement and therefore CNSL status. With the reanalysis of the original definite markers as adjectival inflection, the languages developed into PNSLs and then, as agreement eroded and definite determiners developed, into NNSLs. 1.2.7. Null subjects in the diachronic dimension III: French It appears that French has changed its NSL status at least once in its history. At earlier stages, French was a null-subject language of a rather unusual kind. In Old French (OF, –) and Middle French (MidF, –) we can readily find examples of null subjects, such as the following: () Old French: a. Tresqu’en la mer cunquist la tere altaigne. until the sea conquered- the land high ‘He conquered the high land all the way to the sea.’ (Roland, ) b. Si chaï en grant povreté. thus fell- into great poverty ‘Thus I fell into great poverty.’ (Perceval, ) c. Si en orent moult grant merveille. thus of-it had- very great marvel ‘So they wondered very greatly at it.’ (Merlin, ; Roberts a: ff.) () Middle French: a. Et ly direz que je me recommande and her will-say- that I myself recommend humblement a elle . . . humbly to her ‘And you will say to her that I humbly ask her good will . . . ’ (S , )



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

b. Ne vous pourroye a demi dire le tresgrant dueil.  you could- at half say the very-great grief ‘I could not tell you half the great grieving.’ (S , ; Vance : ) The relevant subject pronouns are conspicuously absent in these examples. As we mentioned in §.., such pronouns must be present in Modern French finite clauses; see (). In other words, if we perform the exercise carried out with Ælfric’s English and Modern English in the Introduction, and update all aspects of the examples in () except for their syntax, the result is ungrammatical Modern French, viz.: (0 )

a. *Et lui direz que je me recommande humblement à elle . . . b. *Ne vous pourrais à moitié dire le très grand deuil.

We can see that the grammar of French has changed historically; structures which were formerly grammatical no longer are. This is a clear indication that a parameter has changed. So it seems clear that French used to be a null-subject language. As in the case of the early Germanic languages, the possibility that it was an RNSL can be immediately excluded by the fact that subject-verb agreement is very clearly attested. Can we determine whether it was a CNSL or a PNSL? Here, once more, are the properties which define CNSLs: ()

(i) the possibility of leaving the definite subject pronoun unpronounced in any person-number combination in any tense; (ii) rich agreement inflection on the verb; (iii)  null subjects restricted to a definite interpretation; an arbitrary null subject (in a finite clause) needs a special marker in the ; (iv) conform to the cluster in (); (v) allow overt subject pronouns, but with a different interpretation.

Concerning (i), in () we see examples of sg, sg, and pl null subjects in OF. () gives examples of null subjects in the other person/ number combinations: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

() a. A si grant tort m’ociz mes [ null subject] To so great wrong me kill. my cumpaignuns, . . . comrades ‘You are very wrong to kill my comrades, . . . ’ (La Chanson de Roland, p. ; Zimmermann : ) b. La veïssez si grant dulor [ null subject] there see... so great pain de gent . . . of people ‘There you would have such dreadful human suffering . . . ’ (La Chanson de Roland, p. ; Zimmermann : ) c. Or redevons d’Erec parler [ null subject] now re-must. of=Erec speak ‘Now we must speak of Eric again’ (Chrétien de Troyes, Erec et Enide ; Vance : ) It seems, then, that there were no person restrictions on null subjects in OF. Zimmermann (: ), from whom (a, b) are taken, makes the same point. Concerning (ii), Roberts (a: ), following Foulet (/: ff.), Vance (: ), gives the following paradigm for the present-tense conjugation of the -er verb chanter (‘to sing’): () chant, chant-es, chant-e(t), chant-ons, chant-ez, chant-ent Roberts (a: –) observes: Here, if we count the zero-inflection for  as an ending, there are six distinct person inflections, as in Latin, Spanish or Italian. However, in spoken OF (according to Foulet, early in this period), those endings were reduced to the three we find in ModFr as a result of two processes: (i) phonetic erosion of final consonants, eliminating  -s,  -t (the latter already ‘hardly more than a memory’ in the twelfth century, cf. Foulet /: ) and  -nt; (ii) an operation of analogy, which added -e in the  . . . so the situation since the thirteenth century has been one where only  and pl have distinct endings, the rest of the paradigm being identical.

Vance (: ) gives essentially the same characterization of Early OF, ninth to early twelfth century, but a slightly different one for later OF, in that sg retains -әs and pl -ә(n)t. This latter system has just one syncretism. Modern French thus has three syncretisms while Late OF has one, and Early OF none. In these terms, Modern French lacks rich 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

agreement while both periods of OF had it. On these grounds, we can treat OF as a CNSL (see Roberts b: – for a different analysis of Modern French agreement marking, leading to the same conclusion but on different premises). Turning now to diagnostic (iii), sg null subjects are readily found in OF, as in (a), but seem to very consistently have a definite interpretation in all contexts. It is hard to determine how arbitrary or generic subjects were expressed in OF, but Robert (, II: –, ON) says that ‘the use of on as a personal pronoun is essentially fixed from the thirteenth century’ (‘L’usage de on, pronom personnel, est fixé pour l’essentiel avant le XIIIe siècle’). This suggests that OF expressed arbitrary subjects as in Modern French, with the overt pronoun on (‘one’). If this is true, then again OF behaves like a CNSL in that a special marker appears where the subject is indefinite. Since this is an overt pronoun, it seems that indefinite null subjects are not found in OF. Zimmermann (: –) makes the same observation, and gives the following examples: () a. Unches mais hom tel ne vit ajustee. [OF] Once more one such not saw battle ‘Never before has one seen such a battle.’ (La Chanson de Roland, p. ; Zimmermann : ) b. Einz que om alast un sul arpent de [OF] before that one goes a single arpent of camp, . . . flat land ‘in less time than one goes a single arpent of flat land, . . . ’ (La Chanson de Roland, p. ; Zimmermann : ) This diagnostic indicates, then, that OF was a CNSL. Next consider diagnostic (iv). Earlier stages of French show free inversion:3

Vance (: ff.) argues that OF examples of this type are not really cases of free inversion of the Italian type. Whilst it is true that there are clear differences in comparison with Italian, which Vance amply illustrates, we have already seen that there are differences among null-subject languages regarding the properties of the free-inversion construction: in fact, the absence of free inversion may be characteristic of PNSLs. I will return below to the possibility that MidF was a PNSL. Cf. the discussion of BP in §... 3



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

() a. Tant fu de bone hore nez li chevaliers! [OF] so-much was of good hour born the knight ‘The knight was born at such a propitious hour!’ (Q , ) b. car assez l’ot eschaufé li serpenz for much him-had warmed-up the snake ‘for the snake had heated him up considerably’ (Q , ; Vance : ) French does not appear to have allowed complementizer-trace effects. Sentences of this kind are rather rare, and the main studies of OF and MidF word order do not comment on them. Nevertheless I am unaware of any clear examples of such effects in OF or MidF. If, as we shall see below, null subjects were limited to main clauses in OF, this is what we would expect, as complementizer-trace phenomena are by definition phenomena found only in embedded clauses. Finally, diagnostic (v). On this, Zimmermann (: ) says ‘referential subject pronouns are frequently overt when not having any emphatic or contrastive interpretation, especially in embedded clauses’. This appears to contradict the other diagnostics in that OF appears not to behave like a CNSL. However, reference to embedded clauses leads us to consider the most striking feature of OF null subjects. With the exception of the last diagnostic, what we have said so far implies that OF was like Modern Italian or Spanish in being a CNSL. However, there is a significant complication in OF, in that null subjects are sensitive to the type of clause in which they appear. Null subjects are much more widely attested in main clauses than in embedded clauses (see Price ; Einhorn ; Vanelli, Renzi, and Benincà /; Adams a,b; Vance , ; Foulet ; Roberts a; Zimmermann ). Examples such as the following are of great interest in this connection: () a. Ainsi s’acorderent que il prendront par nuit. [OF] thus they-agreed that they will-take by night ‘This they agreed that they would take by night.’ (Le Roman du Graal, B. Cerquiglini (ed.), Union Générale d’Editions, Paris, : ; Adams (b: ); Roberts a: ) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

b. dont la joye fut tant grant par la ville qu’elle of-which the joy was so great in the town that-it ne se pourroit compter  self could count ‘the joy concerning which was so great around town that it could not be counted’ (Jehan de Saintré , ; Sprouse and Vance : ) In (a) we observe a null pl subject in the main clause, and an overt pl subject in the complement clause (il could be plural in OF; the verb form is clearly pl here). The context clearly favours an interpretation where the overt pronoun in the subordinate clause corresponds to the null subject in the main clause. This is generally not possible in nullsubject languages. Similarly, in the MidF example (b) the overt pronoun elle in the adverbial clause corresponds to la joye in the main clause, a further case of the same type. If we suppose, however, that null subjects were only allowed in main clauses in OF and MidF, then the subordinate clauses in () would be non-null-subject contexts. The overt pronouns would be required, much as in Modern French or English, whether or not they correspond to a main-clause element. In §..., we will see that this restriction is related to the fact that the finite verb occupies a different position in main and embedded clauses in these periods of French (a further difference with the modern language).4 If OF was a CNSL, then the typology in () predicts that it would have had ‘rich’ determiners. Indeed, OF had both definite and indefinite articles (Foulet : ), although both had slightly different forms and distributions as compared to Modern French. The forms varied for person, number and case, the latter since OF distinguished nominative from non-nominative case (the latter is traditionally known as ‘casrégime’, or ‘governed case’). The forms of the two articles were as follows: () a. Definite article:   Nom: li ()/la () li ()/ les () Non-Nom: le ()/la () les ( & ) 4 In fact, in MidF the differences between main and embedded clauses regarding both null subjects and verb position are less marked than in OF. See Vance () for detailed discussion and analysis of this.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

b. Indefinite article:   Nom: uns ()/une () un ()/ unes () Non-Nom: un ()/une () uns ()/ unes () As Foulet (: –) points out, the definite article had a more restricted distribution in OF as compared to Modern French, functioning as a true marker of definiteness in a fashion somewhat similar to Modern English. The indefinite article seems to have been restricted to specific interpretations, to judge from Foulet’s (: –) discussion. In conditional, negative and interrogative contexts, no article appears (see also the discussion in Roberts and Roussou : ): () a. je ne nourriroie trahitor I  feed.. traitor ‘I would not feed [a] traitor’ (La Chastelaine de Vergi l. –; Foulet : ) b. S’anmie volés avoir . . . If enemy want. have. ‘If you want to have an enemy . . . ’ (Courtois d’Arras l. ; Foulet : ) c. Avés vous dont borse trovée? Have. you then purse found? ‘Have you then found a purse?’ (Courtois d’Arras l. ; Foulet : )

[OF]

OF also had bare plurals and bare mass nouns, again like Modern English and unlike Modern French (see Déprez , Roberts and Roussou :  for discussion and examples). None of these differences in form and distribution between OF and Modern French articles alters the fact that OF had rich D by the criteria put forward in Roberts (a: ): ‘overt instantiation of φ-features on third-person determiners where a lexical noun is present’. Combining this with the clear evidence for rich agreement in (), we see that OF had both rich D and rich agreement and, therefore, by (), is correctly predicted to be a CNSL. The unusual property of OF is that null subjects were largely restricted to main clauses, but, in those contexts OF showed CNSL properties, as we have seen. The null-subject property of French appears to have changed in the seventeenth century (see Fontaine ; Roberts a: ff.; Vance : ff.; Sprouse and Vance ). The remarks of a contemporary 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

grammarian, Maupas, in his  Grammaire française, are interesting in this respect. Maupas points out that subject pronouns are omitted in three main contexts that are of interest. These are (i) where the subject is pl or pl, (ii) after certain conjunctions, notably et (‘and’) and si (‘thus, so’), and (iii) where the subject is a non-referential. Examples of null subjects of these types are as follows: () a. J’ay receu les lettres que m’avez [th c. French] I have received the letters that me-have- envoyees. sent ‘I have received the letters you have sent me.’ b. Il vous respecte et si – vous servira bien. he you respects and so you will-serve well ‘He respects you and will serve you well.’ c. Rarement advient que ces pronoms nominatifs Rarely happens that these pronouns nominative soient obmis. be omitted ‘It rarely happens that these nominative pronouns are omitted.’ (Roberts a: –) (Example (c) is from Maupas’ own discussion of null subjects.) Although null subjects are still permitted at this point, the range of possible contexts is highly restricted, and became still more restricted in the seventeenth century. It is possible that late sixteenth- and early seventeenth-century French was a PNSL of a rather restricted kind: the person restriction observed by Maupas, the availability of third-person null subjects just in coordinations and the possibility of null nonreferential subjects are all consistent with this. On the other hand, the existence of both definite and indefinite articles continuously from OF through to Modern French is not consistent with this. In later seventeenth-century French, only non-referential null subjects like that in (c) are found, and these become progressively limited to fixed expressions. It is reasonable to conclude, then, that French ceased to be a fully null-subject language in the seventeenth century, perhaps, like the early Germanic languages discussed in the previous section,



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

passing through a brief, and highly constrained, PNSL stage.5 The development of the syncretisms in verbal inflection discussed around () above, presumably led to the loss of the CNSL property. We mentioned in §.. above that the position and status of overt preverbal subjects in CNSLs needs to be clarified. To see what is at issue here, consider a simple subject-verb-object sentence in a CNSL such as Italian, e.g. (), repeated here: () Gianni parla italiano. John speaks Italian.

[Italian]

As we saw in §.., there are two options for the position of the subject here. On the one hand, it may be a true preverbal subject occupying a preverbal subject position directly comparable to the position of the subject in the English translation. Alternatively, it could be in a leftdislocated position with a null subject in the true subject position, the null-subject counterpart of an English sentence like John, he speaks Italian. The alternatives were schematized in (), repeated here: () a. [Gianni V O] b. [Gianni [pro V O]] A further complication arises in many Northern Italian dialects which have obligatory clitics marking the subject function. Given the obligatory nature of these markers, the question arises as to whether they are true pronominal subjects or agreement markers. Somewhat analogous to () then, the question arises as to whether a sentence containing a 5

It is possible that null expletives have survived in Modern French in one construction, Stylistic Inversion (see Kayne and Pollock , ; Pollock ). This construction looks similar to Italian free inversion (see Pollock  on this) but occurs only in certain contexts: questions, relatives, exclamatives, clefts (i.e. ‘wh’ contexts), and subjunctives. Here is an interrogative example: (i) A qui a téléphoné Marie? [ModFr] to whom has phoned Marie ‘Who did Marie call?’ It can be observed that the preverbal subject position is empty. This construction may thus be a survival of expletive null subjects in one set of highly restricted contexts (see Roberts a: –; Vance : ff.). Kayne and Pollock () suggest that examples like (i) contain a null subject which ‘doubles’ the postverbal subject, a silent version of elle. Their analysis implies that Modern French retains referential null subjects in certain contexts, again making it perhaps a kind of PNSL (see the discussion in Roberts b: –). Nonetheless, it is clear, pace Zimmermann (: –) that a major change in the distribution of null subjects has taken place in the history of French.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

nominal subject and a subject clitic has the structure in (a) or (b), where [cl+V] in (b) is intended to indicate that the clitic is an agreement marker on the head: () a. [NPsubject [cl V . . . ]] b. [NPsubject [[cl+V] . . . ] If an analysis like (b) is correct for Northern Italian dialects (see Poletto and Tortora  for overview and references), could it be true for Modern French? In other words, which of the two structures in () is correct for a simple sentence like Jean, il parle (‘John, he speaks’), where, as in (), [ il+parle ] is intended to indicate that il is an agreement marker on the verb and Jean occupies the true subject position: () a. [Jean [il parle . . . ]] b. [Jean [[il+parle] . . . ] Here, the pronoun il is certainly a subject clitic in that it is a phonological dependent of the verb: no material except for other clitics can intervene between it and the verb, and it cannot be stressed, coordinated, or modified. These characteristics were originally noticed in Kayne () and are subject to close scrutiny and an interesting theoretical interpretation in Cardinaletti and Starke (). This issue is relevant to the question of null subjects since, if (b) is the correct analysis, then the simple sentence il parle has the structure in (): () [pro [[il+parle] . . . ] In other words, Modern French would have to be treated as a nullsubject language of some kind (probably a CNSL; Roberts a, a argues that Northern Italian dialects are CNSLs in which the subject clitics are exponents of agreement). Certain authors (among them Jaeggli ; Roberge ; Sportiche ; Culbertson ) have argued that there is little or no difference between French and Northern Italian dialects regarding the status of subject clitics. Without going into the details of this debate, it is possible that at least some varieties of Modern French treat the subject pronouns as agreement markers. One particularly interesting case is Algerian French, as reported in Roberge and Vinet (). In this variety,



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

‘doubling’ of the subject is ‘extremely frequent’ (Roberge and Vinet : ), as indeed in many varieties of modern colloquial French. Hence we find examples of the following type (here the absence of a comma after the first word indicates that they are to be read without an intonational break at this point, i.e. with a single intonation contour): () a. Marie Mary b. Pierre Peter

elle she il he

vient. comes mange. eats

[ColloquialFr]

Nevertheless, unlike in many Northern Italian dialects, the doubling is not obligatory in French (although there are also Northern Italian dialects where doubling is not obligatory; see Poletto and Tortora ). Algerian French allows examples like (): () Personne il sait que c’est leur mère. [Algerian French] no-one he knows that it’s their mother (Roberge and Vinet : ) Here, a non-referential nominal like the negative quantifier personne (‘nobody’) cannot be a topic or a left-dislocated element: cf. English *Nobody, he knows that it’s his mother. Instead, personne must occupy the preverbal subject position, and so il cannot be in that position but must be treated as an agreement prefix. Hence, in this variety of French, at least, we are led to conclude that the subject clitics are agreement markers rather than pronouns, although their presence is optional when an overt subject is present. The consequence of this is that an example like il parle features a null subject, as shown in (). Hence Algerian French is a null-subject language, whatever the status of Standard French. It may be, then, that French has developed, or is developing, as follows: () Stage I: null subjects (in main clauses), strong subject pronouns; Stage II: no null subjects, weak subject pronouns; Stage III: null subjects, subject clitics. In connection with Stage I, we can observe that in OF and MidF, subject pronouns were not phonologically dependent on the verb in that they could be separated from the verb, modified and co-ordinated: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

() a. Et il, a toz ses oz, s’en and he, and all his army, went (Price : ; Roberts a: ) b. Se je meïsmes ne li di . . . if I self not him say . . . ‘If I don’t tell him myself . . . ’ (Franzén : ; Roberts a: ) c. Et qui i sera? Jou et and who there will-be? I and ‘And who will be there? I and you.’ (Price : ; Roberts a: )

ala. away

[OF]

tu. you

These are the characteristics of ‘strong’ pronouns (see Cardinaletti and Starke ). By the sixteenth century, these pronouns had ‘weakened’ and, as we have seen, null subjects disappeared at just about this time. This gave rise to the Stage II system. Stage III is what we find in Algerian French, whether or not other varieties have reached this stage (on this last point, see Zribi-Hertz , Roberts b, and in particular Culbertson  for evidence from a variety of sources). Here the subject clitics have further ‘weakened’ to become agreement markers and the value of the null-subject parameter changes again. Although it is rather approximate, and raises a variety of questions, the diachronic development in () illustrates the interactions between subject pronouns and null subjects. Since the Medieval Northern Italian dialects had null subjects in main clauses, like OF (see in particular Vanelli, Renzi, and Benincà / on this), we may conjecture that these dialects have also gone through the stages in (); see also Poletto () on null subjects and subject pronouns in Renaissance and seventeenth-century Veneto, and Renzi () on eighteenth-century Florentine. 1.2.8. Conclusion: types of parameters In this section we have illustrated the three main kinds of null-subject languages identified in the recent literature (see Roberts a: – and the references given there) on the basis of synchronic evidence (§..–..). In §.., we turned to the diachronic aspects of this typology, identifying first the developments in the recent history of Brazilian Portuguese from CNSL to PNSL, and then the nature of the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NULL SUBJECTS

early Germanic languages, where the trajectory CNSL to PNSL to NNSL can be observed (§..). Finally, in §.. we turned to French, where we showed, in agreement with Zimmermann (), that OF was a CNSL, although there was the well-known restriction that null subjects were only found in main clauses. The later history of French was briefly discussed: Middle French shows some properties of PNSLs, but probably not all of them, and there are varieties of Modern French which may have ‘returned’ to CNSL status by reanalysing subject-clitic pronouns as agreement markers. Although many questions remain open, especially regarding French, the above discussion illustrates how null-subject languages may change over time, suggesting that the trajectory CNSL > PNSL > NNSL is a natural one, due to the erosion of ‘rich’ agreement marking (and the development of articles, in line with (), ()) with the possibility of NNSLs becoming CNSLs through reanalysis of subject pronouns as agreement markers. These are our first examples of parametric change. On the other hand, it is difficult to find clear cases of RNSLs changing. Japanese and Korean, for example, have been RNSLs throughout their recorded history (which, for Japanese, goes back to roughly   and for Korean to roughly  ). The same may be true of the much longer attested history of Chinese, but the evidence from the early stages is difficult to interpret. In these terms, the following taxonomy of parameters, introduced by Biberauer and Roberts (, , ), is relevant: ()

For a given value vi of a parametrically variant feature F: a. Macroparameters: all heads of the relevant type share vi; b. Mesoparameters: all heads of a given naturally definable class, e.g. [+V], share vi; c. Microparameters: a small subclass of heads (e.g. modal auxiliaries, pronouns) shows vi; d. Nanoparameters: one or more individual lexical items is/are specified for vi.

In Chapters  and , I will develop and motivate the idea that parametric change involves reanalysis through language acquisition (see in particular §. and §.). In these terms, since the evidence for macroparameters is abundant for language acquirers, they must be ‘easily’ set, and hence resist reanalysis and are strongly conserved diachronically. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Meso- and microparameters, on the other hand, are correspondingly less salient for acquirers and hence somewhat less diachronically stable. Roberts (a: ff.) argues that RNSLs result from a macroparameter regulating the presence of a Person feature. This is consistent with the great stability of this property that we can observe in the history of various East Asian languages. CNSLs, on the other hand, involve fully specified Person (and other) features on both the clausal Tense head and the Determiner head in nominals. As such, CNSLs involve a mesoparametric property. CNSLs are somewhat stable, tending to hold of genera in the sense of Dryer (), i.e. language groups ‘whose relatedness is fairly obvious without systematic comparative analysis’ with a common ancestor not more than four millennia old, and often considerably less; the major subgroups of Indo-European, e.g. Romance, are canonical examples of genera. Both Latin and the majority of Modern Romance languages, as well as many older IndoEuropean languages, as mentioned above, are CNSLs. PNSLs may be microparametric, crucially involving properties of the D-system, and as such are more diachronically unstable, as we have seen. Finally, the residual null-subject of Modern French described in note  may represent a nanoparametric setting. Thus the various kinds of null-subject system illustrate the various kinds of parameters both synchronically and diachronically. We will return to the taxonomy of parameters in () repeatedly below. To conclude our discussion of null subjects, we replace parameter A as formulated in §.. with the following: A. Are null arguments allowed in the absence of agreement marking? (Macroparameter) YES: RNSLs (e.g. Japanese, Korean, Mandarin . . . ). NO: non-RNSLs (e.g. most, perhaps all, Indo-European languages). A. Are null subjects allowed without person restrictions and with associated ‘rich’ verbal-agreement inflection? (Mesoparameter) YES: CNSLs (e.g. Italian, Spanish, Greek, Latin, Gothic, European Portuguese, Old French). NO: non-CNSLs (e.g. English, German, Finnish, etc.).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

A. Do null subjects show person restrictions and allow indefinite interpretations in the sg? (Microparameter) YES: PNSLs (e.g. Finnish, Brazilian Portuguese, Marathi, Russian, Early Old High German, etc.). NO: NNSLs (e.g. Modern English, Modern French, Modern Swedish, Modern Norwegian, etc.). It is easy to see that A–A form a parameter hierarchy. We will encounter several such parameter hierarchies in the subsequent sections and chapters.

1.3. Verb-movement parameters Starting with the seminal work of Emonds (, ), den Besten (), and Pollock (), a great deal of attention has been paid to parameters involving the overt position of the verb in the clause. Since, as we stated in the Introduction, we take verbs universally to merge with their complement(s) to form a VP, the natural way to analyse variation in the surface position of the verb is by postulating a further operation of movement which places the verb in some other position in the clause after it has formed the VP. We will see that there are two principal varieties of verb-movement, which can be distinguished in terms of the target of movement, i.e. the position to which the verb is moved. We will also see that each of the two main cases has several associated subcases. As in the previous section, I first present the parametric variation from a synchronic point of view, and then move on to the evidence for parametric change. 1.3.1. Verb-movement in the synchronic dimension 1.3.1.1. Verb-movement to T

If, first, V merges with its direct object to form a VP as described in the Introduction, and, second, the grammatical function ‘direct object’ is defined in terms of Merge in this way (as we briefly mentioned in §.), and, third, if Merge is strictly binary, as is usually assumed, then it follows that verbs are always adjacent to their direct objects. To a good first approximation, this is true for English. The main verb and its 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

direct object are generally adjacent, with adverbial and other material such as negative markers unable to intervene between them:6 () a. *John kisses often Mary. John often kisses Mary. b. *John eats not chocolate. John does not eat chocolate. In many other languages, however, this straightforward situation does not obtain. In French, finite verbs are naturally separated from their direct objects by adverbs of various kinds, while the negative element pas obligatorily intervenes between the finite verb and its object: () a. Jean embrasse souvent Marie. *Jean souvent embrasse Marie. b. Jean (ne) mange pas de chocolat. *Jean (ne) pas mange de chocolat.

(=(a)) (=(b))

Beginning with Emonds (), two ideas have been put forward which together describe what is going on in French examples of this type. First, it is assumed that adverbs like often and souvent as well as the negative elements pas and not are merged in positions to the left of VP (positions whose precise nature need not detain us; Pollock (: –) shows that negation and adverbs in fact occupy different positions—below we will see that there are several distinct adverb positions). Second, it is proposed that in French the finite verb is required to move to a position still further to the left, outside

One construction where this is not true is the so-called ‘double-object’ construction, where the indirect object must intervene between the verb and the direct object: (i) a. John sent Mary a bunch of flowers. b. *John sent a bunch of flowers Mary. 6

Here it is the notional indirect object that must be adjacent to the verb: (ii) a. John often sends/*sends often Mary flowers. b. John does not send/*sends not Mary flowers. The ‘dative alternation’ relates (ia) to (iii): John sent a bunch of flowers to Mary.

(iii)

This alternation has given rise to much discussion in the literature on syntactic theory. Most of the textbooks cited in the Introduction devote some space to this question, and so I will leave it aside here. We will come back to this construction again briefly in §..



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

VP. Very roughly, then, we have a structure like the following for (a), where embrasse moves to some position P: ()

P embrasse

VP

Adv/Neg souvent/pas

VP V DP (embrasse) Marie

What is the position P merged with VP here? We can get a clue, perhaps, by comparing English examples containing auxiliaries with () and (): () a. John has often kissed Mary. b. John has not kissed Mary. The natural position for often, and the only position for not, is between the auxiliary and the main verb. Now, auxiliaries mark, among other things, tense, aspect, and mood. Observe also that in the grammatical version of (b), the auxiliary does, whose sole content is that of tense/ agreement marker, must precede not; in fact, auxiliary do is in complementary distribution with the auxiliary have of (). If auxiliaries carry tense information, then we might think that there’s a special position for tense-markers outside VP to the left of the position of adverbs and negation. Moreover, we have mentioned that the verb-movement in French examples like () only affects finite verbs. Infinitives, for example, cannot precede pas:7

7 Pollock (: ) points out that infinitives can in fact precede souvent, but not pas. The logical conclusion is that there are two targets for movement, one above pas and one between pas and souvent: (i) . . . X pas Y souvent V . . .

Here X corresponds to P in (). Finite verbs obligatorily move to X = P in French, as we have said, while infinitives move optionally to Y. See Pollock for details, and Belletti (). Below, we will refine this structure by adding distinct positions for several classes of adverbs.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Box 1.1. Technical aspects of movement A brief word on some technical aspects of movement is perhaps required here. Chomsky () suggests that movement is really second Merge, or internal Merge. That is, when movement happens, Merge applies to an element that has already been merged once, and merges it again, as shown in ():

C

()

A

C

B

A

Here A has first merged with B, and is subsequently remerged with C. Since Merge builds structure, second Merge places the moved element ‘higher’ in the tree than the position of first Merge. In fact, movement generally ‘extends the tree’ in this sense. (Structure is always built up leftwards, i.e. the right branches are generally the recursive branches, for reasons put forward in Kayne (), which we will briefly review in §...) When an element undergoes movement, a copy remains in the original position. Usually, this copy is not pronounced—it undergoes deletion in the phonological component. I have indicated the copy of the moved verb here in brackets in the main text, and will continue to follow this practice. In the examples involving questioning of the subject of a finite clause of the type discussed in the previous section we thus have the following:

() a. Who did you say (who) wrote this book?

b. Chi hai detto che pro ha scritto questo libro (chi)?

Assuming Rizzi’s (a) analysis of null subjects, there are two different types of silent category in (b). The formation of morphosyntactically complex heads is construed as headto-head movement, e.g. V-to-T movement as discussed in this section, and D-to-T movement in CNSLs and PNSLs as discussed in §... ‘Incorporation’ is the term often used for this kind of movement. What causes movement to take place? Since its incidence varies crosslinguistically, as this section is intended to show, it must be controlled by a parametrically varying property. We can for present purposes regard the property of triggering movement (or functioning as an attractor for some category) as a property arbitrarily associated with different positions in different grammatical systems. In §..., I will introduce a notation which encodes this arbitrary, cross-linguistically variant property of certain positions.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

a. Souvent . . . often b. Ne pas  not

VERB-MOVEMENT PARAMETERS

embrasser to.kiss embrasser to-kiss

Marie, Mary . . . Marie, . . . Mary . . .

[French]

c. *N’embrasser pas Marie, . . . There appears to be a connection between the target of finite-verbmovement and the temporal specification of the clause. Accordingly, let us identify P with the category Tense, i.e. the category of morphemes (including auxiliaries) or features whose content determines the temporal specification of the clause. For simplicity, we can think of this as the position T, associated with features such as [Present], [Past], etc., although the reality is certainly considerably more complex. When T merges with VP, the resulting category is a TP, a temporal expression. So we can replace () with (): ()

TP

T embrasse

VP

Adv/Neg souvent/pas

VP

V (embrasse)

DP Marie

The structure for English examples like the grammatical version of (b) is as follows:8

8 Here T contains the feature [Present], I assume, which lacks a phonological realization. The relation between T and the inflection on the unmoved verb is presumably mediated by the operation Agree, which I will introduce in the next section.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

TP

()

T

VP

Adv/Neg often

VP

V kisses

DP Mary

Comparing () and (), we can observe that English and French differ in the position occupied by the finite main verb. More precisely, finite main verbs move to T in French, while their English counterparts do not. So we formulate the following parameter: B.

Does V move to T in finite clauses? YES: French, Welsh, Icelandic, Greek . . . NO: English, Swedish, Danish, . . .

The difference in the value of the parameter in B accounts for the wordorder differences between English and French that we have observed here, and similar differences among the other languages listed in B. The basic difference is that languages with a positive value for this parameter allow the order V–Adv/Neg–object while those with a negative value for it require the order Adv/Neg–V–object. Schifano () has demonstrated that we can find microparametric variation in the position targeted by the finite lexical verb across the Romance languages by distinguishing different classes of adverb. There are several different adverb classes, which can be distinguished both syntactically and semantically. This can be seen by comparing the position of the verb in relation to various classes of adverbs in the main Romance standard languages (French, Romanian, Italian, European Portuguese, and Spanish; Schifano also looks at a number of other varieties, see below). Consider first the position of the finite verb in relation to an adverb like ‘probably’ which expresses a modal meaning and typically appears in a relatively ‘high’, leftward position in the clause: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

VERB-MOVEMENT PARAMETERS

a. Antoine confond probablement (*confond) [French] A. confuse.. probably le poème. the poem ‘A. is probably confusing the poem.’ b. Andrei greşeşte probabil (greşeşte). [Romanian] A. is-wrong.. probably. ‘A. is probably wrong.’ c. Antonio (*confonde) probabilmente [Italian] A. probably confonde la poesia. confuse.. the poem ‘A. is probably confusing the poem.’ d. O João (*conhece) provavelmente [Portuguese] The J. probably conhece o assunto. know.. the matter ‘J. probably knows the matter.’ e. Sergio (*confunde) probablemente [Spanish] S. probably confunde este poema. confuse.. this poem ‘S. is probably confusing this poem.’

Here we see that verb-movement over an adverb of the ‘probably’ class varies across Romance: it is obligatory in French, (a), optional but with the higher placement judged to be pragmatically neutral in Romanian, (b), and impossible in Italian, Spanish, and Portuguese (c– e). Compare () with the verb positions in the various languages in relation to the equivalents of ‘already’, an adverb with a temporal/ aspectual meaning typically occupying a medial position in the clause which follows ‘probably’ (cf. the contrast in English between John probably has already finished and *John already has probably finished): ()

a. Marie connaît déjà (*connaît) M. know.. already cette histoire. this story ‘Marie already knows this story.’ 

[French]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

b. Maria cunoaşte deja (cunoaşte) [Romanian] M. know.. already povestea asta. story=the this ‘Marie already knows this story.’ c. Maria conosce già (*conosce) questa [Italian] M. know.. already this ricetta. recipe ‘Marie already knows this recipe.’ d. A Maria (*sabe) já sabe esta [Portuguese] The Mary already know.. this história. story ‘Marie already knows this story.’ e. María (conoce) ya conoce esta [Spanish] M. already know.. this historia. story ‘Mary already knows this story.’ Here we see two differences compared to (): first, in Italian, the verb obligatorily raises over adverbs of this kind, and, second, the verbraising over these adverbs is optional but pragmatically marked in Spanish. The other languages show the same pattern as with ‘probably’. A third pattern emerges with ‘always’, a further kind of aspectual adverb which tends to occur in a position to the right of ‘already’ (cf. English In those days, John was already always talking about that vs *? In those days, John was always already talking about that): ()

a. Antoine confond toujours (*confond) [French] A. confuse.. always ce genre de poèmes. this type of poems ‘A. always confuses this type of poems.’ b. Maria face mereu (face) desertul. [Romanian] M. make.. always desert=the ‘M. always makes the dessert.’



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

c. Gianni confonde sempre (*confonde) [Italian] G. confuse.. always queste poesie. these poems ‘G. always confuses these poems.’ d. O João vê sempre (*vê) este [Portuguese] The J. see.. always this tipo de filmes. type of movies ‘J. always watches this type of movies.’ e. Sergio (confunde) siempre confunde [Spanish] S. always confuse.. estos poemas. these poems ‘Sergio always confuses these poems.’ Here French, Romanian, and Italian show the same pattern of either obligatory (French, Italian) or pragmatically neutral (Romanian) verbmovement, while in Portuguese the movement is obligatory and in Spanish, as with ‘often’, it is optional but judged to be pragmatically marked in the higher position. We can summarize the patterns in ()–() as follows, leaving aside the pragmatically marked options in Romanian and Spanish: ()

French/Romanian probably Italian already Portuguese always Spanish

So we see that verbs move to the highest position in French and Romanian, to a medial position in Italian, to a relatively low position in Portuguese and to a still lower position in Spanish. Schifano presents data from a wider range of Romance languages and dialects, showing that Sardinian and Northern Italian dialects pattern with Italian,9 while Southern Italian dialects and Southern Regional Italian pattern with Portuguese and Catalan (of Valencia) patterns with Spanish.

9 Or, more precisely Northern and Central Regional Italian: as noted in the text directly, Southern Regional Italian patterns differently. In the text, ‘Italian’ means Northern and Central Regional Italian unless otherwise indicated.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Cinque () demonstrates in detail that, in addition to T, there is a series of Mood heads structurally ‘above’ TP and there is a further series of Aspect heads in the structural area between T and VP. A simplified version of the structure Cinque proposes is given in ():

MoodP

()

Mood

TP T

AspP VP

Asp

We can insert the adverbs in (), along the lines seen in () and () (still simplifying Cinque’s proposal):

MoodP

()

Mood

TP Adv probably

TP AspP

T

AspP

Adv already Asp

VP Adv always

VP

In these terms, we can see that verb-movement targets Mood in French and Romanian, Tense in Italian, and Aspect in Portuguese. It might appear that the verb does not move out of VP in Spanish, but in fact there is evidence that the verb precedes a class of adverbs occupying a lower Aspect head, between Aspect and VP in ():



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

VERB-MOVEMENT PARAMETERS

Sergio contesta bien (*contesta) a las S. answer.. well to the preguntas. questions ‘Sergio answers the questions well.’

[Spanish]

Hence we see evidence for a parameter hierarchy, which we can formulate as follows (see Schifano :  for a fuller presentation): B. Does V move out of VP in finite clauses? (Mesoparameter) YES: Romance languages, Welsh, Icelandic, Greek, etc. NO: English, Swedish, Danish, etc. B. Does V move to the higher Aspect head? (Microparameter) NO: Spanish, Catalan, etc. YES: French, Italian, Portuguese, etc. B. Does V move to Tense? (Microparameter) NO: Portuguese, Southern Italian dialects, Southern Regional Italian, etc. YES: French, (Northern Regional) Italian, Romanian, etc. B. Does V move to the Mood head? (Microparameter) NO: (Northern Regional) Italian, Sardinian, Northern Italian dialects, etc. YES: French, Romanian, etc. There is an implicational relation among B–B, in that a negative value for each ‘closes off ’ the later options. We can illustrate this more clearly if we draw the hierarchy as a tree indicating the implicational relations directly: ()

NO English

B1: does V move out of VP in finite clauses? YES: B2: does V move to the higher Aspect head? NO Spanish

YES: B3: does V move to the Tense head?

NO Portuguese

YES: B4: does V move to the Mood head?

NO Italian

YES: French, Romanian 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

The mesoparameter B is higher in the hierarchy than the microparameters B–B. B is a mesoparameter since it involves movement of the lexical verb V out of its own phrase VP; see (b). The other heads in (), Mood, Tense, and Aspect, are all functional heads associated with microparameters. The functional heads are hierarchically arranged in the structure of the clause in a way which parallels the parameter hierarchy in (). A particularly interesting case of verb-movement is found in the VSO language Welsh, which we can in fact take to be representative of the Celtic VSO languages generally, and perhaps of other VSO languages such as Classical Arabic and Biblical Hebrew.10 The usual order in finite clauses in Welsh is VSO, as () illustrates: ()

a. Fe/mi welais  saw ‘I saw Megan.’ b. Fe/mi wnes  did ‘I saw Megan.’

i Megan. I Megan

[Welsh]

i weld Megan. I see Megan

By definition, in VSO order the verb is not adjacent to the direct object. If we are to retain the assumptions about the structural relation of the verb and its direct object described above, we must assume that the verb moves ‘over’ the subject in simple VSO sentences like (a). This analysis is supported, in the case of Welsh, by the existence of an alternative way of expressing almost any sentence whose verb is in a simple tense by using a construction involving the auxiliary gwneud (‘do’) and a non-finite form of the verb (the so-called ‘verbal noun’), as in (b). Here the auxiliary precedes the subject, which in turn precedes the verb, which in turn precedes the direct object. So we have the order AuxSVO. If auxiliaries are in T, as we suggested for English, then we can assign a structure like () to (b); the ‘particle’ fe/mi is merged with TP in a higher position, and so we leave it out of ():

10

The type of analysis to be described almost certainly does not generalize to all languages which have surface VSO order, however. See the papers in Carnie and Guilfoyle (), especially Lee (); Massam (); Rackowski and Travis (). In particular, see Biberauer and Roberts () and Roberts (: –) for a distinction between two basic, and rather different, types of VSO language. For an in-depth study of Hebrew and Palestinian Arabic dialects, see Shlonsky ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

TP

()

T wnes

VP DP i

VP

V weld

DP Megan

Here the subject is merged with the ‘core’ VP containing the verb and the direct object, and the auxiliary in T precedes the subject. Examples without auxiliaries and with VSO order involve V-movement of the type we observed above for French, as follows: ()

TP T welais

VP DP i

VP V (welais)

DP Megan

VSO order thus involves V-movement to T. What distinguishes Welsh from French is a further parameter concerning whether the subject appears in VP or in a higher position, which we may take to be the analogous position in relation to TP, viz.: ()

[TP Jean [TP embrasse souvent Marie]].

Here the internal structure of the inner TP is that diagrammed in (). So Welsh subjects appear in VP, as in (, ), and French 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

subjects appear in TP as in ()—here we presumably have a further parameter. If subjects are universally defined in terms of their merged position, like direct objects, then we can take them to be universally merged with VP as in the putative Welsh structures in (), () and moved in languages like French (and English) to the position shown in (). (We will modify this idea in §...) In that case, VSO order of the Welsh type arises where the subject does not move and the verb does. However, there are two main reasons to think that things are slightly more complex than this in Welsh. First, there is evidence that in Welsh and in some other Celtic languages (perhaps all of them) the subject does move from the VP-internal position, although it does not move as far as it does in English and French. One piece of evidence comes from the fact that the subject must precede the negative element (d)dim: ()

(Ni) ddarllenodd Emrys ddim o’r llyfr. [Welsh] () read Emrys not of the book ‘Emrys didn’t read the book.’

The best way to ensure this is to place negation outside VP (unlike the structure given in (), but in line with what was implicit in note ) and then cause the subject to move to a position in between T and negation. Second, we have seen from the Romance V-movement microparameters that there may be more functional heads than just T outside VP. Such functional heads are found in compound tenses in Welsh, as in the following example: ()

Oedd y bachgen wedi bod yn ymlad. [Welsh] Be.. the boy  be  fight ‘The boy had been fighting.’

Here we see Tense, occupied by the finite auxiliary oedd (‘was’) and two Aspect heads, one containing the perfect marker wedi, the other the progressive marker yn. Most importantly, the subject y bachgen (‘the boy’) occupies a position immediately following the auxiliary and preceding the first Aspect head. The structure of () is given in ():



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

TP

()

T oedd

AspP

DP y bachgen

AspP Asp wedi

AuxP Aux bod

AspP Asp yn

VP

DP (y bachgen)

VP | V ymlad

Here we see the two distinct Aspect heads and phrases, with a further functional position housing the non-finite auxiliary bod (‘be’) in between them. The subject appears in a position immediately following T, as in a simple clause like (a). Where no auxiliaries are present, the main verb raises out of VP, at least to T (whether the verb raises as high as Mood is not clear, and we will leave this question aside here). So, in Welsh, parameters B and B are positive. As far as verb-movement is concerned, then, Welsh is either like Italian or like French. The parameter that distinguishes Welsh from Romance concerns the position that the subject moves to: this position is, at it were, ‘one step lower’ in Welsh than in Romance, so that the subject always appears in a position immediately following the finite verb or auxiliary. Positions of the type occupied by the subjects in both French and Welsh, which we can generalize as in (a), are known as specifiers. Positions formed by simple merger with a head, such as that occupied by direct objects, are known as complements. Finally, the category formed by the head and the complement is conventionally labelled X0 (so the inner TP in () should really be T0 ): ()

a. [XP YP [X0 X . . . ] ] (YP is a specifier of XP) b. [X0 X YP] (YP is the complement of X) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Since Merge is binary, there can be only one complement per head. On the other hand, there is no restriction in principle on the number of specifiers. We can now define ‘direct object’ as the complement of V and ‘subject’ as the specifier of V. The parameter distinguishing French from Welsh concerns whether the subject moves to a specifier of TP (‘SpecTP’)—this is what we have in French—or whether it moves to the next-lowest specifier position, the specifier of the higher AspP in (). XP-movement always creates specifiers, since it must always ‘extend the tree’. Head-movement, on the other hand, always targets other head-positions (see Box .). Here we have seen several cases of verb-movement. In SVO systems, the mesoparameter B can distinguish systems like English from those of the kind we find in Romance (incidentally, this shows that SVO order does not result from a unique parameter-setting). Verbmovement can play a central role in deriving VSO order in VSO languages, if we allow for the possibility that the subject might be realized relatively ‘low’ in the structure in such languages. (It is very likely that VSO does not represent a single parameter-setting either, as mentioned in note .) 1.3.1.2. V-movement to C: full and residual V

A further group of parameters of verb-movement distinguishes verbsecond (V) languages from languages like English, French, and Welsh. In V languages, as the name implies, the finite verb is the second element in the clause. More precisely, it follows exactly one XP. German is perhaps the best known example of a V language, and we can illustrate the V phenomenon with the following German examples: ()

a. Ich las schon letztes Jahr diesen Roman. [German] I read already last year this novel b. Diesen Roman las ich schon letztes Jahr. this novel read I already last year c. Schon letztes Jahr las ich diesen Roman. already last year read I this book ‘I read this novel last year already.’ d. *Schon letztes Jahr ich las diesen Roman.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

In each of the grammatical examples, one XP precedes the finite verb las: the subject ich in (a) (which is a DP, although it consists of just one word); the direct object diesen Roman in (b), and the adverbial phrase schon letztes Jahr in (c). It is impossible to ‘stack up’ more than one XP before the finite verb, as the ungrammaticality of (d) shows. This should be compared with English and French examples like the following: ()

a. Last year I read this novel. b. L’an dernier j’ai lu ce roman. (=(a))

It is easy to see that in (a) and (c) at least the verb is separated from the direct object and has thus moved out of VP. In (b) the direct object has moved—let us refer to the operation which places an XP in first position in V clauses as topicalization. So in this example both the verb and the direct object have moved out of VP. It is clear that V only applies to finite verbs; infinitives and participles are unaffected by it. The counterparts of () in a periphrastic tense show just the auxiliary in second position: ()

a. Ich habe schon letztes Jahr diesen [German] I have already last year this Roman gelesen. novel read b. Diesen Roman habe ich schon letztes Jahr gelesen. this novel have I already last year read c. Schon letztes Jahr habe ich diesen Roman gelesen. already last year have I this book read ‘I read this novel last year already.’

These examples also show that non-finite verbs follow their direct objects in German, as we briefly saw in §., and will see once more in §.. Since it affects finite main verbs and auxiliaries, V seems to involve the T-position. The question is: does it involve movement just to the T-position or does it involve movement from the T-position to a still higher position? It is usually thought that in V clauses the verb moves to a higher position than T (but see note ). The reason for this is that V is largely restricted to main clauses in many languages, with German



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

again being a typical case. In embedded clauses, the finite verb or auxiliary must appear in final position in German: ()

Du weißt You know a . . . daß . . . that b . . . daß . . . that habe. have

()

Ich frage I ask a . . . ob . . . if b . . . ob . . . if habe. have

wohl, [German] well ich schon letztes Jahr diesen Roman las. I already last year this novel read ich schon letztes Jahr diesen Roman gelesen I already last year this book read

mich, [German] myself ich schon letztes Jahr diesen Roman las. I already last year this book read ich schon letztes Jahr diesen Roman gelesen I already last year this book read

Embedded clauses are typically introduced by a complementizer: daß (‘that’) in () and ob (‘if/whether’) in (). Let us label the category complementizer C. We can then see that C merges with TP to form an embedded clause: ()

[CP ob/daß [TP ich schon letztes Jahr diesen Roman las]]

English embedded clauses like whether/that I read this novel last year would also have the structure in () at the CP-level. Now, we can understand the ban on V in embedded clauses if we take it that, first, the verb moves to C in V clauses and, second, since C already contains a complementizer in embedded clauses, the position is already filled and so the verb is unable to move there. This idea was first proposed by den Besten (). If the verb moves to C in V clauses the structure of an example like (b) will be as shown in example ():11

11

In the lowest VP, the copy of the direct object precedes V, an instance of the general OV pattern of German. I return to this in §. and §..



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

VERB-MOVEMENT PARAMETERS

CP

DP diesen Roman

Cʹ

C las

TP

DP ich

Tʹ

T (las)

VP

Adv VP schon letztes Jahr DP (ich)

Vʹ

DP (diesen Roman)

V (las)

Here we see that the topicalized XP occupies a Specifier of CP (see (a)). We can thus postulate the following parameter: C.

Does the finite verb move to C in finite main clauses? YES: German, Dutch, Swedish, Icelandic, Danish, Kashmiri, Romansch . . . NO: English, French, Italian, Welsh . . .

Of course, this parameter is connected with only one aspect of V, the verb-movement part. Why and how movement of just one XP to SpecCP is required is a separate matter that I will not go into here.12 12 It should also be pointed out that many analysts do not accept that V moves to C in all types of V clause. Travis () and Zwart () argue that V does not move beyond T in subject-initial V clauses such as (a) and (a). See Schwartz and Vikner () for a defence of the view presented in the text. In some languages, notably Yiddish and Icelandic,

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Verb-movement to C and verb-movement to T interact in various ways. Both English and French have been referred to as ‘residual V’ languages, since a finite verb or auxiliary may occupy C in just some kinds of main clauses. In particular, we find this construction in mainclause interrogatives in both languages: ()

a. How many books have you read? b. Combien de livres as-tu lus? (=(a))

Here the thing to notice is that have/as precedes the subject. We are assuming that auxiliaries are T-elements, and we saw earlier that subjects occupy SpecTP in English and French, and so here the auxiliaries must move to a higher position. The natural candidate is C, given the analysis of V in (). This gives rise to the structure in () for (a) (leaving aside the Aspect and Mood heads):

CP

()

DP how many books

C

C have

TP

DP you

T

T (have)

VP

DP (you)

V V DP read (how many books)

the restriction to main clauses is much less rigid, which has led to the suggestion that V does not move to C in these–so-called ‘symmetric V’—languages (see Vikner : –, Holmberg  for discussion).

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

The idea that the auxiliaries move to C in () is supported by the fact that indirect questions, where a complementizer is present, do not allow this order: ()

a. *I wonder if has he read that book. b. *Je me demande si a-t-il lu ce livre.

(=(a))

Now, observe that main verbs can move to C in main-clause questions in French, but this possibility is limited to auxiliaries in English: ()

a. Quel genre de livre préfères-tu lire (=(b)) en vacances? b. *What kind of book prefer you to read on holiday?

Instead of (b), where no other semantically motivated auxiliary is present, the dummy auxiliary do must be used: ()

What kind of book do you prefer to read on holiday?

We can understand the fact that only auxiliaries can move to C in residual V clauses in English in terms of parameter B introduced earlier, combined with the following restriction on head-movement:13 ()

The Head-Movement Constraint: In a single step of movement, a head can only move as far as the next head-position up.

The Head-Movement Constraint means that the verb cannot move directly from V to C in a structure like (), ‘skipping’ an intermediate head-position like T. Now, in English, main verbs do not move to T; we saw above that English has the negative value for parameter B. Thus English main verbs cannot move to C: they cannot move there directly because of (), and they cannot move there via T because of the negative value of parameter B. On the other hand, auxiliaries, which are merged as T (or Aspect or Mood), may move to C in English, and French main verbs may move to C via T thanks to French having the positive value of parameter B. 13 () is an instance of an important general condition on movement, which requires that any instance of movement target the nearest ‘available’ position. The precise structural definition of ‘nearest’ will emerge in the next section. The definition of ‘available’ is rather complex and will not detain us here. Again, these matters are dealt with in detail in the textbooks cited earlier. See also Rizzi (, , ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Table 1.1. Synchronic verb-movement parameters

German, Dutch, Icelandic Swedish, Danish, Norwegian French, Italian, Spanish, Greek Breton English Welsh, Irish (unattested) (unattested)

Parameter B

Parameter C

Subject moves to SpecTP

Yes No* Yes** Yes No Yes No No

Yes Yes No Yes No No Yes No

Yes Yes Yes No Yes No No No

* (134a) below shows that Danish, like Swedish and Norwegian, lacks V-to-T movement in non-V2 clauses, since the verb is adjacent to the direct object in VP and follows ikke (‘not’). These are nevertheless V2 languages. The same applies in Dutch and German if VP follows T, as shown in (125). However, if VP precedes T, then it may be that V moves to T in these languages—see note 38. To comply with (131), V must pass through T ‘on its way’ to C in V2 clauses in these languages (and possibly through Mood and Aspect heads, as must auxiliaries in English and other languages which have auxiliaries). That English does not allow this possibility in examples like (129b) may be connected either with the existence of the dummy auxiliary do in English (an item with no counterpart in the Scandinavian languages) or with the precise formulation of the way in which C triggers V-movement in full V2 languages. ** Here I leave aside the further microvariation concerning verb-movement in Romance discussed in the previous section and captured by parameters B2–B4.

By adopting (), we can understand the difference between English and French observed in () as a consequence of the different values of parameter B in the two languages. Taken together, then, parameters B and C, along with the parameter concerning subject position which distinguishes Welsh from Romance as we saw in the previous section, account for a considerable range of cross-linguistic variation, which we can summarize in Table ..14 All of the major Germanic, Romance, and Celtic languages, as well as Greek, appear to fall into the typology created by the three parameters we have discussed in this section. (I have included two Romance CNSLs, alongside Greek, also a CNSL.) We can also deduce the properties of certain languages which we have not yet discussed. For example, Breton combines (a form of) V with VSO (see Shafer ; Jouitteau /). Here we see how parameters may combine to define a range of cross-linguistic options. The last two possibilities in Table . may be unattested because T must always be a target for movement, of a head, an XP or both. This rather mysterious property of T is captured in some versions of the Extended Projection Principle of Chomsky (, ). See §.. for a different conception of the Extended Projection Principle (or EPP). 14



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

1.3.1.3. Further properties related to verb-movement

Two further properties may be related to verb-movement. First, Holmberg () observed a phenomenon in the Scandinavian languages which has come to be known as ‘object shift’. This can be observed in Danish examples like the following (from Vikner ): ()

a. Hvorfor læste studenterne den ikke? [Danish] why read students-the it not ‘Why didn’t the students read it?’ b. Hvorfor læste studenterne ikke artikeln? why read students-the not articles-the ‘Why didn’t the students read the articles?’

Examples (a,b) show that pronouns appear before the negation in V clauses, while non-pronominal objects follow the negation. Taking negation in Danish to occupy a position similar to English not and French pas (see ()), we see that the object has been moved out of VP in (a). The non-pronominal object in (b), on the other hand, presumably remains in VP (these objects may move in Icelandic, but not in Swedish, Norwegian, or Danish). Object shift does not take place where the verb does not move; for example in subordinate clauses— (a)—or where V is non-finite, as in (b): ()

a. Det var godt at han ikke købte it was good that he not bought ‘It was good that he didn’t buy it.’ b. Hvorfor skal studenterne ikke læse why shall students-the not read ‘Why don’t the students have to read it?’

den. [Danish] it den? it

We thus arrive at what has become known as Holmberg’s generalization: the object moves only if the verb moves.15 Object shift is therefore connected to a positive value for one of the two parameters we have seen in this section. Second, Bobaljik and Jonas (: –) propose that ‘transitiveexpletive constructions’, illustrated in (), are found just in languages which have a positive value for parameter B: 15

Holmberg () reformulated this generalization along different lines, but I retain his earlier formulation here for expository reasons.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

a. Es essen einige Mäuse Käse in der [German] there eat some mice cheese in the Küche. kitchen ‘There are some mice eating cheese in the kitchen.’ b. Er hat iemand een appel gegeten. [Dutch] there has someone an apple eaten ‘Someone has eaten an apple.’ c. Það hafa margir jólasveinar [Icelandic] there have many Christmas.trolls borðað búðing. eaten pudding ‘Many Christmas trolls have eaten pudding.’ (Icelandic: Bobaljik and Jonas : )

Compare the situation in English and the Mainland Scandinavian languages, which have the negative value for parameter B: ()

a. *There has someone eaten an apple. b. *Der har nogen spist et æble. there has someone eaten an apple c. *Det har någon ätit ett apple. there has someone eaten an apple

[Danish] [Swedish]

Bobaljik and Jonas restrict their attention almost exclusively to Germanic languages, and it is unclear exactly how transitive expletive constructions might manifest themselves in a null-subject language or a VSO language; I therefore leave this question aside. Having seen two important parameters regarding verb-movement, and a number of assumptions regarding the structure of the clause and the nature of movement, I now turn to the diachronic evidence that the values of these parameters can change. 1.3.2. Verb-movement in the diachronic dimension Here we will see evidence that the two parameters B and C, and possibly one of B or B, have changed at various points in the history of English. Parameter C has also changed in the history of French and in other Romance languages (see Wolfe a,b), and this latter change 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

may be connected to the changes concerning null subjects in the history of French which we discussed in §.. above. Furthermore, the history of Romance, combined with certain facts from the history of English, provide evidence for breaking Parameter C down into subparameters. 1.3.2.1. V-to-T in earlier English

In earlier English, until approximately , main verbs were able to move out of VP, apparently to T (but see below for some important provisos to this statement). Warner (: –) provides a very interesting discussion of the chronology of this change; see also the discussion of Kroch () in §... We can see this from the fact that main verbs could be separated from their direct objects by negation and by adverbs, as in the following examples: ()

a. if I gave not this accompt to you [ENE] ‘if I didn’t give this account to you’ (: J. Cheke, Letter to Hoby; Görlach : ; Roberts : ) b. The Turkes . . . made anone redy a grete ordonnaunce. ‘the Turks . . . soon prepared a great ordnance.’ (c: Kaye, The Delectable Newsse of the Glorious Victorye of the Rhodyans agaynest the Turkes; Gray : ; Roberts a: )

Examples like these have a slightly familiar ‘Shakespearean’ feel for many speakers of present-day English. Shakespeare lived from  to , and so in his English V-movement of this kind was possible; hence examples of the type in () can be found in his plays and poems. Despite this air of familiarity, the examples in () are ungrammatical in present-day English. This shows us that the mesoparameter B has changed since Shakespeare’s time. Until at least the seventeenth century, the lexical verb was able to raise out of VP, as it does in the Modern Romance languages. Is it possible to tell which of the microparameters B–B discussed and illustrated in §.. held for pre-seventeenth-century English? Haeberli and Ihsane () demonstrate very clearly that movement of the lexical verb out of VP into the Tense-Mood-Aspect (TMA) ‘field’ was lost in two distinct stages in Late Middle and Early Modern English. First, SVAdvO order of the kind seen in (b) declines in 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

the fifteenth century; the steepest drop in incidence of this order falls between – and –, dropping from .% of relevant examples to .%, while the incidence of SAdvVO rises from .% to .% between – and – (see Haeberli and Ihsane : , table ). Haeberli and Ihsane (: ) conclude their examination of the data saying ‘the decline of V-movement past adverbs to T starts in the middle of the fifteenth century and is to a large extent completed by the middle of the sixteenth century’. However, they go on to demonstrate that V-not orders the kind seen in (a) are lost later: ‘the negation data seem to show a very slow decline starting from  with V-movement remaining very strong up to ’ (Haeberli and Ihsane : ; see also their table , p. ). Assuming that negation systematically occupied a place above VP and below T, it appears that the first change left the verb in a position above negation but ‘at the same time in a position that is lower than the head position it occupies in the early fifteenth century’ (Haeberli and Ihsane : ). Schematically, then, we have the situation in (): ()

. . . T . . . Adv . . . H . . . Neg . . . [VP . . . ]

Until the mid-to-late fifteenth century, English verb-movement targeted T (there was an earlier stage when V moved to a higher position, as we will see in §...). This movement was lost by the middle of the sixteenth century, but the evidence from negative clauses shows that verbs still moved over Neg to H. Later, from the seventeenth century on, verb-movement over Neg was lost. It is difficult to identify H precisely; Haeberli and Ihsane (: –) examine the data from the relevant periods in the light of the distinctions drawn by Schifano () for the Romance languages on the basis of adverb positions, and conclude that ‘no intermediate stages affecting different adverb types can be identified’ (Haeberli and Ihsane : ). As they point out, this is at least in part a consequence of the fact that the relevant adverbs are rather infrequently attested in the surviving texts. However, given our discussion in §... and in particular the clause structure in (), we can conclude that the first stage of verbmovement must have been at least to T and the second stage was to Asp. Hence, ME had parameter B set to positive, while Early Modern English from the middle of the sixteenth century until at least the middle of the seventeenth century had either B or B (and therefore B) positive. Modern English, as we have seen, sets B to negative. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

Leaving aside the microparametric changes discussed by Haeberli and Ihsane, we can take examples like () to tell us that preseventeenth-century English had the ‘Romance’ value for mesoparameter B. If so, then following the reasoning outlined at the end of the previous section, we expect that main verbs were able to move to C in residual V environments at this time. This is correct, as () shows: ()

What menythe this pryste? [ENE] what does this priest mean (–: Anon., from J. Gairdner (ed.), , The Historical Collections of a London Citizen; Gray : ; Roberts a: )

Example () is ungrammatical for modern speakers. Here the main verb moves to C. At this stage, i.e. late fifteenth century, V-to-T movement was still possible, and this in turn made V-movement to C possible, as in Modern French. Compare () with the French example in (a); we take the two sentences to be structurally isomorphic in relevant respects.16 Also, given Holmberg’s generalization as presented in the last section, we expect to find object shift in sixteenth-century English. Again, this expectation is borne out:

16

Examples comparable to () continue to be found in the late sixteenth century, including in Shakespeare. In fact, the evidence is that inversion of the lexical verb over the subject was lost at the same time as V-not orders (as we will see in more detail in §..); this implies that the ability to raise the verb out of VP was sufficient to make movement to C possible in interrogatives. Given the Head Movement Constraint (see ()), this suggests that movement from Asp to T and on to C is triggered in these cases; see the discussion and references in the first note to Table . above. Eric Haeberli (p.c.) informs me that there is no evidence in the parsed corpora that the loss of V-to-C started earlier, but the amount of direct questions per text is very limited in the parsed material from around . Like Haeberli and Ihsane (), Han () also proposes a sequential loss scenario of V-movement in English, with movement to a high inflectional head being lost before the complete loss at a later point. On the basis of the data in Ellegård (), Han observes that do is more frequent in interrogatives than in negatives at the beginning of the change and argues that this can be related to the loss of V-movement to a higher inflectional head (i.e. T in ()); owing to the Head Movement Constraint, once V-movement to this high head starts to decline, V-movement to C declines as well, making the do-insertion necessary. Han (: ) suggests that this analysis is supported by the sudden increase of do in interrogatives in the early stages of its emergence. In negative declaratives, on the other hand, the loss of V-movement to a high inflectional head has a more limited effect. Many thanks to Eric Haeberli for drawing my attention of the relevance to Han’s paper to these issues.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

a. if you knew them not (, John Lyly; Roberts : ) b. They tell vs not the worde of God. (, Thomas Stapleton; Roberts : )

[ENE]

Here V has moved out of VP; we know this because it precedes not. The pronominal object (in fact it is an indirect object in (b)—see note ) also precedes not and so we take this element, too, to have left VP. In (b), the direct object presumably remains within VP; see again note . Transitive expletive constructions are also found in earlier periods of English, up until approximately the sixteenth century, as () shows: ()

a. Within my soul there doth conduce a fight. [ENE] (Shakespeare; Jonas : ) b. . . . there had fifteene severall Armados assailed her. (; Ralegh Selections ; Jonas : )

So we witness the clustering of non-adjacency of the main verb and its direct object, main-verb-movement to C in residual V, transitive expletive constructions and object-shift in sixteenth-century English. This cluster of properties in sixteenth-century English, and the absence of these properties in present-day English, can be described by postulating a change in the value of parameter B at some point between roughly  and the present. 1.3.2.2. V in diachrony

At a still earlier period, until approximately the fifteenth century (see Fischer et al. (: –) for discussion), English had the positive value of parameter C. In other words, OE and ME were V languages. The following examples illustrate V in OE: ()

a. Se Hæland wearð þa gelomlice ætiwed the Lord was then frequently shown his leornung-cnihtum. his disciples ‘The Lord then frequently appeared to his disciples.’ (ÆCHom I, ..; Fischer et al. : )



[OE]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

b. On twam þingum hæfde God þæs mannes sawle in two things had God this man’s soul gegodod. endowed ‘With two things had God endowed the man’s soul.’ (ÆCHom I, ..; Fischer et al. : ) c. þa astah se Hælend up on ane dune. then rose the Lord up on a mountain ‘Then the Lord went up on a mountain.’ (ÆCHom I, . .; Fischer et al. : ) In (a), the subject precedes the finite auxiliary; in (b) a PP precedes the auxiliary; in (c) the adverb þa (‘then’) precedes the finite verb. Just like the other modern Germanic languages (with the exceptions given in note ), V is only found in main clauses in OE. The verb is typically final in embedded clauses (although this word order is less consistent in OE than in Modern Dutch or German—see Fischer et al. (: ff.); §..; and §. for more detailed discussion of this point): ()

a. . . . þæt ic þas boc of Ledenum gereorde [OE] . . . that I this book from Latin language to Engliscre spræce awende. to English tongue translate ‘ . . . that I translate this book from the Latin language to the English tongue’ (AHTh, I, pref, ; van Kemenade : ) b. . . . þæt he his stefne up ahof. . . . that he his voice up raised ‘ . . . that he raised up his voice.’ (Bede .; Pintzuk : ) c. . . . forþon of Breotone nædran on scippe lædde . . . because from Britain adders on ships brought wæron. were ‘ . . . because vipers were brought on ships from Britain.’ (Bede .–; Pintzuk : )

The general picture is thus strikingly similar to Modern German, as a comparison with the previous section and the brief discussion of German word order in §. reveals. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

In ME, at least until the latter part of this period, V is still consistently observed: ()

a. On þis gær wolde þe king Stephne tæcan [ME] in this year wanted the king Stephen seize Rodbert. Robert ‘During this year King Stephen wanted to seize Robert.’ (c; ChronE (Plummer) .; Fischer et al. : ) b. Oþir labur sal þai do. other labour shall they do (Ben. Rule(I) (Lnsd) .; Fischer et al. : ) c. Nu loke euerich man toward himseluen. now look every man to himself ‘Now it’s for every man to look to himself.’ (c; Ken.Serm .; Fischer et al. : )

In (a) an adverbial PP precedes the finite auxiliary; in (b), the direct object precedes the finite auxiliary, and in (c) an adverb precedes the finite verb. It appears then that both parameters B and C have changed in the recorded history of English. We will look again at V in OE at the end of this section. French, too, was a V language at an earlier stage of its history, as the following OF examples illustrate: ()

a. Par Petit Pont sont en Paris entré. [OF] by Petit Pont are (they) in Paris come ‘They entered Paris by the Petit Pont.’ (Charroi de Nîmes, ; Roberts a: ) b. Sa venoison fist a l’ostel porter. his goods made (he) to the hostel carry ‘He had his goods carried to the hostel.’ (Charroi de Nîmes, ; Roberts a: ) c. Li cuens Guillelmes fu molt gentix et ber. the count G. was very kind and good ‘Count G. was very kind and good.’ (Charroi de Nîmes, ; Roberts a: )

In (a) the PP par Petit Pont precedes the verb, while in (b) the direct object sa venoison occupies the preverbal position. In (c), the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

subject precedes the finite verb. Embedded clauses generally did not allow V order in OF, although this may not have been true in early OF, prior to the thirteenth century—see Adams (a,b,c); Dupuis (, ); Hirschbuhler and Junker (); Hirschbuhler (); Roberts (a: ff.); Vance (: –). Vance (: , table .) shows that subject-initial order is found in % of main clauses in her thirteenth-century prose text (La Queste del Saint Graal) and in –% of embedded clauses, depending on the precise nature of the clause in question; Wolfe (a: , table .) gives .% subjectinitial main clauses and .% subject-initial subordinate clauses (Wolfe a: , table .). This period of OF, at least, appears therefore to feature movement of the finite verb to C quite generally in main clauses. We conclude that OF had the positive value for parameter C. In MidFr, the nature of V changes somewhat, but V main clauses are still found, as the following examples illustrate. In (a), the conjunction et can be thought of as being outside the V clause; it does not ‘count’ as the initial constituent in the V clause: ()

a. Et aussi fis je de par vous. and also did I from by you ‘And I did likewise with respect to you.’ (S , ; Vance : ) b. Si suis je aussi bien armé. thus am I as well armed ‘Thus I am as well armed.’ (S , ; Vance : )

[MidFr]

We saw in §.. that OF was a null-subject language. We also saw from the discussion of () that there is reason to think that null subjects were restricted to main clauses, unlike in Modern Italian, Spanish, and Greek. We are now in a position to state the observation more clearly: in OF, null subjects are allowed only in V clauses. This observation was first made by Thurneysen (), and has been restated and analysed in the context of generative approaches by Vanelli, Renzi, and Benincà (/ ); Adams (a,b); Vance (, ); Roberts (a); and, most recently and comprehensively, Wolfe (a). In terms of the analysis of V put forward in the previous subsection, we can say that the subject may be null just where the verb moves to C. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

One advantage of linking V and null subjects in this way has to do with a class of embedded clauses which allows V. These are known as the so-called complements to ‘bridge verbs’, verbs of saying, thinking, etc.17 V is allowed in such clauses in all of the V Germanic languages, including those which do not otherwise allow V in embedded clauses (see (), (), and note ). In German, the complementizer disappears and the verb must be in the subjunctive, at least according to prescriptive grammar, while in the other languages the V order follows the complementizer, illustrated here by Danish (from Vikner : ): ()

a. Watson behauptete, dieses Geld hätte [German] Watson claimed this money had(subjunc) Moriarty gestohlen. M. stolen b. Watson påstod at disse penge havde [Danish] Watson claimed that this money had Moriarty stjålet. Moriarty stolen ‘Watson claimed Moriarty had stolen this money.’

However such clauses are to be analysed (and see Vikner (: ff.) for a survey of the possibilities; see also Penner and Bader  on some semantic properties of this construction in certain varieties of German), we can make an immediate inference regarding OF if it is true that V and null subjects are connected in that language. The inference is this: if OF also allows V in complements to bridge verbs, then it should allow null subjects in exactly these embedded clauses. In (a) I illustrate that embedded V is indeed possible in this context, and in (b) I show that null subjects are also possible here (OF allowed the complementizer que (‘that’) to optionally drop, as in (b)): 17 They are known as bridge verbs because a constituent in their complement clause can readily undergo wh-movement. Compare (ia), which features a bridge verb, with (ib), which does not: (i) a. Who does John think that Mary likes (who)? b. ?Who does John regret that Mary likes (who)?

The metaphorical notion is that verbs such as think in some way form a bridge across which the wh-constituent may move from its merged position in the embedded clause to the SpecCP position of the main clause. On the nature of wh-movement, see §., especially Box ..



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

VERB-MOVEMENT PARAMETERS

a. Et il respondirent que de ceste nouvele [OF] and they replied that of this news sont il moult lié. are they very happy ‘And they replied that they were very happy with this news.’ (Le Mort le Roi Artu, J. Frappier (ed.), Droz, Genève, : ; Adams b) b. Or voi ge bien, plains es de mautalant. now see I well, full are- of bad-intentions ‘Now I see clearly that you are full of bad intentions.’ (Charroi de Nîmes ; Roberts a: )

We expect to find (b) if we have (a) and if null subjects and V are linked in OF. So let us conclude that this is the case. If so, a diachronic prediction immediately emerges: we might expect the loss of V and the loss of null subjects to be connected in the history of French. The two changes are at least chronologically correlated, in that both took place in the sixteenth century (see Adams (a,b), Roberts (a), Vance () for extensive discussion and analysis of this). This is then an interesting case where two putatively independent parameter values appear to be linked. A similar connection between V and null subjects is attested in the history of the Northern Italian dialects mentioned in §.. in connection with the null-subject parameter; these varieties had the positive setting for both parameters, and both appear to have changed, with the subsequent reversal of the value of the null-subject parameter caused by the change in status of subject pronouns (as discussed in relation to French in §..). Wolfe (a), following earlier proposals in Benincà (b, ), who in turn drew to some extent on earlier philological work on Romance, argues that all the Medieval Romance languages were V (see the discussion and references in Wolfe a: , –). The principal evidence for this comes from three basic properties observable across a range of Medieval Romance languages which have clear counterparts in the Modern Germanic V languages and which are not found in the Modern Romance languages. The first of these is a ‘preverbal field capable of hosting a range of constituents’ (Wolfe a: ) as well as the subject. Wolfe (a: ) gives the following examples of object-initial clauses:



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

a. E tutte queste cose fece perchè [Old Italian] And all these things do.. because Roboam regnasse dopo di lui Roboam reign... after of him And he did all these things because Roboam would reign after him’ (Novellino ; Vanelli : ) b. una fetra fei lo reis . . . [Old Piedmontese] a chair make.. the king Salomon Salomon ‘King Salomon made a sedan chair’ (Sermoni Subalpini V, Benincà : ) c. este logar mostro dios a abraam [Old Spanish] this place showed God to Abraham ‘God showed Abraham this place’ (GE-I.v, Fontana : )

The second property indicative of general Medieval Romance V is verb-subject inversion when a non-subject appears in preverbal position. This is seen in (); () gives examples where the initial constituent is an AdvP, (a), and a non-finite clause, (b): ()

a. et così levai -e’ lo me rem [Old Venetian] And thus lift.. =I the my oar ‘And I thus lifted my oar . . . ’ (Lio Mazor , , Benincà a: ) b. por le fazer plaser mando el [Old Spanish] to her. make. pleasure send.. the rrey fenchir king fill. ‘To please her, the King had [them] fill . . . ’ (Lucanor , Fontana : )

Third, we observe robust asymmetries between main and embedded clauses, with main clauses showing V order and embedded clauses SVO:



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

VERB-MOVEMENT PARAMETERS

a. Quand’io presi arme . . . [Old Italian] When=I took.. arms ‘When I took up arms . . . ’ (Novellino , , Vanelli, Renzi, and Benincà : ) b. Yo se que aaron tu hermano [Old Spanish] I know. that aaron your brother es omne bien razonado be. man well reasoned ‘I know that your brother Aaron is a reasonable man’ (GE-I r, Fontana : )

So a positive setting for Parameter C may have been general across Medieval Romance. If this was the case, then the connection between the loss of V (i.e. change in Parameter C from positive to negative) and the loss of null subjects cannot be as close as was suggested above for the history of French and the Northern Italian dialects since Modern Spanish and Italian are clearly full-fledged CNSLs and yet have lost V. In this connection, Wolfe (a: –) makes a very interesting proposal for further microvariation in V systems. Wolfe follows and develops the proposal in Rizzi () that the left-periphery of the clause, the CP, should be split into several functional heads. The most important of these are the Force and Fin(iteness) heads, the former interfacing ‘upwards’ with a higher predicate or the discourse (in the case of a root clause), and the latter interfacing ‘downwards’ with TP. In between these two heads Topic and Focus heads may appear. Given this structure for the ‘split CP’, Wolfe proposes that the Medieval Romance languages vary as to whether the target of verb-movement is Force or Fin. Slightly simplying Wolfe’s proposal, we can illustrate the two possibilities as in (): ()

a. Fin V2: ForceP Force

TopicP Topic

FocusP Focus

FinP

XP

Fin’ Fin VFin

TP

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

b. Force V2: ForceP XP

Force’ TopicP

Force VFin Topic

FocusP Focus Fin

FinP TP

Wolfe (a: ) comments on these structures as follows: The most important point to note is that the grammar in [(b)] will yield a necessarily stricter V grammar than that in [(a)]. Whilst in the Fin-V grammar in [(a)] the full range of left-peripheral structure is accessible above the primary landing site of the finite verb . . . , this is not the case in the Force-V grammar in [(b)].

Wolfe goes to on to show how Fin-V systems allow greater freedom of V, V, and V orders than Force-V ones, as well as a range of other differences. It emerges that Old Spanish, OF and Old Venetian (along with possibly other Medieval Northern Italian dialects) are Force-V while Old Italian, Old Occitan, and Old Sicilian are Fin-V. Moreover, OF changed from Fin-V to Force-V around . Most importantly in the present context, Wolfe (a: , note ) suggests a correlation between Force-V and the loss of null subjects: ‘once V undergoes reanalysis from a Fin- to a Force-V system, Null Subjects . . . will become unstable as they will only ever be realized in a postverbal position’. Given these proposals, we can retain the idea that the loss of V and the loss of null subjects are linked in the history of French, since later OF was Force-V; the same may be true of the history of Venetian (see also Poletto ) and the other Northern Italian dialects. On the other hand, Spanish and Italian lost Fin-V but retained their CNSL status. We can now reformulate the V parameter as a hierarchy (see Wolfe b:  for a similar proposal): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

VERB-MOVEMENT PARAMETERS

Parameter C: V? YES/NO (mesoparameter) Parameter C: If YES, then FinV? (microparameter) Parameter C: If NO, then ForceV? (microparameter)

The similarity between Parameters C–C and Parameters B–B seen in §... is clear. In each case we observe a mesoparameter determining the superordinate option: in the case of Parameter B, verbmovement into the TMA domain; in the case of Parameter C, verbmovement into the left periphery. Furthermore, logically consequent on a positive setting of the mesoparameter, the microparameters B–B and C–C determine exactly which head in the relevant domain is targeted. B and C are mesoparameters since they involve movement of the lexical verb V out of its own phrase VP; see (b). The individual heads in the TMA field (Mood, Tense, and Aspect), and those in the left periphery (Force and Fin) are functional heads associated with microparameters. The functional heads are hierarchically arranged in the structure of the clause in a way which parallels the respective parameter hierarchies in () and (). Comparing () and (), it seems that there are fewer microparameters associated with verb-movement into the left periphery than are associated with movement into the TMA field, i.e. fewer C-parameters than B-parameters. However, if we look again at V in OE we can observe a complication to the pattern as we described it above (see the discussion around (–)), first pointed out by van Kemenade (: –). It appears that pronouns can intervene between the first constituent and the verb in what would otherwise be a V clause. This is illustrated in (a) with a subject pronoun and in (b) with a complement pronoun: ()

a. Hiora untrymnesse he sceal rowian on his their weakness he shall atone in his heortan. heart ‘He shall atone in his heart for their weakness.’ (CP .; Pintzuk : ) b. Fela spella him sædon þa Beormas many stories him told the Permians ‘the Permians told him many stories’ (Oros, , ; van Kemenade : ) 

[OE]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Furthermore, where the initial constituent is a wh-phrase, the negative element (ne) or a class of discourse adverbs including canonically þa (very roughly ‘then’), the verb was required to follow this consituent directly, and thus precede a pronoun. This is illustrated in (): ()

a. Hwæt sægest þu, yrþlincg? [OE] what sayest thou, earthling ‘What do you say, ploughman?’ (ÆColl., .; Fischer et al. : ) b. Þa weard he to deofle awend. then was he to devil changed ‘Then he was changed to a devil.’ (AHTh, I, ; van Kemenade : ) c. Ne worhte he þeah nane wundre openlice. nor wrought he yet no miracles openly ‘But he worked no miracles openly.’ (AHTh, I, ; van Kemenade : )

The same effect is observed in southern dialects of ME, but not, according to Kroch and Taylor (), in northern dialects. It is also seen in OHG, with some variation in wh-questions (Tomaselli ; Axel : –), but not in OS or ON (Faarlund : ; Walkden : –). Wolfe (b: ) argues that similar orders are found in twelfth-century Spanish, thirteenth-century Occitan, and fourteenthcentury Sicilian, and Meelen () gives examples from Middle Welsh. The correct analysis of this construction has been and remains the subject of much debate: see van Kemenade (: –), Pintzuk (), Roberts (), Fischer et al. (: –), Fuß and Trips (), Haeberli (), Axel (: –), and Walkden (: ff.) for discussion and different types of analysis. Following Roberts () and Walkden (: ff.), we can account for this situation by postulating that the pattern seen in () is a case of Fin-V, with the pronoun in a position between SpecForceP (occupied by the initial XP) and Fin, probably SpecTopP (since pronouns typically represent old, i.e. topical, information). The ‘stricter’ V pattern in () can then be accounted for as Force-V: the verb is in Force with the fronted XP in Spec Force. If this analysis is on the right track, then OE (and possibly also OHG, as we mentioned above) combined elements of Fin-V and Force-V, depending on the type of CP involved. This conclusion 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

VERB-MOVEMENT PARAMETERS

implies that there must be further microparameters determining verbmovement to Fin and Force, or at least the latter, as a function of the feature-content of Force, along the lines of C: ()

Parameter C: If NO, then Force[+F]-V? (microparameter)

Here [+F] is intended to cover wh-, negation, and whatever discourserelated features best characterize the þa-class of adverbs.18 To conclude this discussion, I present (Table .) the diachronic correlate of Table ., given at the end of the previous subsection (the second line is the same as in the earlier table, as are lines  and ; the latter two are however relevant diachronically in that earlier stages of these languages appear elsewhere in the table). As in Table ., I extrapolate away from the microparameters B–B and, now, C–C. Where different stages of the same language appear in different lines of the table, this indicates that a parameter change has taken place. As the table shows, Parameter C changed in seventeenth-century Welsh (see Willis ; Meelen ); Meelen () argues that Middle Welsh was Fin-V. Table 1.2. Parameters of verb-movement in older Germanic, Romance, and Celtic languages Parameter B Parameter C Subject moves to Spec TP OE, ME, OF, MidFr (Danish, Swedish, Norwegian) ENE, Modern French Middle Welsh* NE Modern Welsh (unattested) (unattested)

Yes No Yes Yes No Yes No No

Yes Yes No Yes No No Yes No

Yes Yes Yes No Yes No No No

* Willis (1998) argues that Middle Welsh was a V2, VSO language, with the V2 parameter changing in the seventeenth century. See also Meelen (2016).

18

Wolfe (b) distinguishes two types of Force-V systems, with a more restricted kind, represented by Modern Standard Germn, only allowing hanging topics and dislocated constituents in the pre-V position, while the Force-V Medieval Romance languages additionally allowed various adverbs and wh-phrases in this position. Wolfe suggests a natural diachronic evolution from the less strict to the more strict form of Force-V. Following Haegeman () and Benincà and Poletto (), Wolfe suggests that there is FrameP above ForceP in structures such as (), and that there is a microparameter regulating the kind of category SpecFrameP can host.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

1.4. Negative concord In this section, I want to present a rather different sort of parameter from the ones discussed up to now. This kind of parameter has to do with the presence or absence of a particular class of items and how these items interact with syntactic principles and mechanisms. As we will see, the class of lexical items in question can be characterized in terms of formal features, whose nature and role in the theory will be introduced. The parameters are also more connected with semantics than those we have seen so far, in that questions of how certain words and syntactic relations among words are semantically interpreted arise. The parameters have to do with the expression of negation and with the phenomenon of negative concord. 1.4.1. Negative concord synchronically The basic observation concerning negative concord can be simply stated: in some languages, each morphologically negative expression corresponds to a logical negation, while in others, several morphologically negative expressions in a given syntactic domain may combine to express a single logical negation. Languages of the latter type are known as negative-concord languages; languages of the former type are non-negative-concord languages. English is a non-negative concord language, as the following set of examples, taken together, show: ()

a. I saw nothing. b. I didn’t see anything. c. I didn’t see nothing.

Examples (a,b) each contain a single logical negation, and can be translated into a quasi-logical statement such as ‘There is no x such that I saw x’, and a single negative morpheme: no or nothing in (a) and n’t in (b). In (b) we also see the polarity item anything; polarity items are words or phrases that must occur in a particular structural relation with a negative element, or at least with non-assertive elements of various kinds (English any-words can also appear in questions and conditionals, for example: Did you see anyone? If you see anyone, . . . ). Combining the two expressions of negation in a single sentence, as in (c), gives rise to a slightly awkward sentence, which clearly has a 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

positive interpretation.19 In (Standard) English, then, two realizations of negation in a single sentence (or clause) give rise to a positive statement. Each negative expression ‘counts’ as an autonomous logical negation, and so the positive interpretation is a consequence of the logical truth p ¬ ¬ p. In French, (a,b) are both naturally translated as (): ()

Je n’ai rien vu. I -have nothing seen ‘I have seen nothing/I haven’t seen anything.’

[French]

Here, we observe that there are two negative morphemes, ne and rien. We can see that rien is intrinsically negative because in isolation it corresponds to ‘nothing’, unlike anything, which is ungrammatical in isolation: ()

a. Qu’est-ce que tu as vu? Rien! what is it that you have seen? Nothing ‘What have you seen? Nothing!’ b. What have you seen? *Anything!

[French]

The same is true of the other negative words (or n-words) in French, such as those given in () below. The two negative expressions in () correspond to a single logical negation. This is an example of negative concord; the two expressions must agree, i.e. show concord, in the formal expression of a single logical negation. French makes extensive use of negative concord, as the following examples show. I have systematically given the English translations using both no and not . . . any (or the equivalent of any):

19

Many readers will recognize (c) as the non-standard equivalent of (b). This shows that non-standard varieties of English have negative concord. Therefore, if negative concord is determined by a parameter, we conclude that non-standard English and Standard English have at least one different parameter value and therefore are different grammatical systems (again we see that a single E-language, ‘English’, corresponds to multiple I-languages). Labov (: –) provides a detailed discussion of negative concord in one variety of non-standard English, which we will come back to in §..; see also van der Auwera () for a survey of negative concord across World Englishes and English-based creoles. The normative notion that it is illogical to interpret sentences like (c) as containing a single negation reflects ignorance of the cross-linguistically common phenomenon of negative concord on the part of normative grammarians, rather than ignorance of logic on the part of speakers of non-standard English.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

a. Jean n’a jamais dit cela. [French] John -has never said that ‘John has never said that/John hasn’t ever said that.’ b. Jean n’a aucun espoir de gagner. John -has no hope to win ‘John has no hope of winning/John hasn’t any hope of winning.’ c. Jean n’a vu personne. John -has seen no-one ‘John has seen no-one/John hasn’t seen anyone.’

What is the nature of the negative-concord relation? Here I will adopt the analysis of negation and negative concord put forward in Zeijlstra (, ) and Biberauer and Zeijlstra (a,b). They propose that negative concord is an abstract relation of agreement between two negative elements. To see how this works, we must first define agreement in formal terms. This is the relation of Agree, introduced in Chomsky (, ). Agree is a matching relation holding between formal features in a particular syntactic domain. Formal features are categorial features such as V and N, as well as features such as Person, Number, Gender, and Case, which are clearly relevant for morphosyntactic well-formedness. Some formal features, such as Number, have semantic content; others, such as certain Case features (for example, Nominative, Accusative) lack semantic content. Formal features which have a semantic interpretation are known as interpretable features; those which lack such an interpretation are uninterpretable. Chomsky argues that certain formal features may be interpretable in one position and uninterpretable in another: for example, Person and Number features are uninterpretable on verbs but interpretable on nouns. The Agree relation eliminates uninterpretable features, which is a necessary condition for a sentence to be grammatical. Interpretable features are not eliminated, as they are interpreted by the semantic component. Agree is defined as follows (the Greek letters here stand for any syntactic category): ()

α Agrees with β where: (i) α and β have non-distinct formal features; (ii) α asymmetrically c-commands β.

We define asymmetric c-command in (): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

NEGATIVE CONCORD

α asymmetrically c-commands β if and only if β is contained in the structural sister of α.

Two categories are structural sisters if they are ‘at the same level’ in a tree diagram. To put it another way, two categories α and β are structural sisters as a result of being merged with one another. Asymmetric c-command as defined in () thus obtains between α and β in the following schematic configuration: ()

α

γ

... β ... Here β is contained in γ, the structural sister of α, and so, by (), α asymmetrically c-commands β. We can illustrate how the Agree relation works with English subjectverb agreement. The standard relation of subject-verb agreement in English, as in an example like John kisses Mary, corresponds to the following structure:

TP

()

DP John

Tʹ

T

VP

DP (John)

Vʹ

V kisses



DP Mary

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Here, the structural sister of the position occupied by John is T0 , and the verb is contained in this category. Hence John asymmetrically c-commands the verb, and so the verb’s uninterpretable features can be eliminated under Agree with those of John. The Agree relation refers to non-distinctness of features, since John is third-person, singular, masculine, while the verb only marks third-person and singular. Features of Person, Number, and Gender are collectively known as φ-features.20 Where the conditions for Agree as given in () are met, α is referred to as the Probe and β the Goal of the Agree relation. Chomsky proposes that Probes and Goals may have uninterpretable features (in addition to interpretable features) which render them active for the Agree relation. Now, it is natural to suppose that Negation is also a formal feature, and so we can construe negative concord as in the French examples in () as an instance of the Agree relation. As mentioned above, the French negative expressions jamais, rien, and aucun in () are known as n-words. N-words, found in all the Romance languages and many other languages, but not in (Standard) English, are negatively marked words which are able to function either autonomously as negative quantifiers (i.e. expressing, roughly, ‘no x’) or participate in negative-concord relations. Zeijlstra (, ) treats them as bearing an uninterpretable Negation feature, written as uNeg (see also Watanabe ). In most of the Romance languages, e.g. Italian, the clausal negative marker (i.e. the word that translates English not) carries an interpretable Negation iNeg, so an Italian example like (a) has the distribution of Neg features and the corresponding Agree relation in (b): ()

a. Gianni non telefona a nessuno. [Italian] Gianni  telephone.. to nobody ‘Gianni doesn’t telephone anybody.’ b. [TP Gianni non[iNEG] telefona [VP a nessuno[uNEG]]]

Nothing other than unstressed, clitic pronouns can intervene between non and the finite verb in Italian. We saw in §... that the finite verb moves to T in Italian (i.e. Parameters B and B are positively set in Italian). So non is either part of T or in Spec,TP. Either way it asymmetrically c-commands nessuno: if non is part of T, the sister of T is VP 20

The verb is also present tense, of course. This uninterpretable feature is eliminated under Agree with T.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

(or an AspP containing VP) and nessuno is inside VP; if non is in Spec, TP its sister is T’, which likewise contains VP. Since non and nessuno share the formal Negation feature and non asymmetrically c-commands nessuno, the two conditions for Agree given in () are met. This is ‘upward’ Agree, in that the element bearing the uninterpretable feature uNeg is c-commanded by the element bearing the interpretable feature iNeg; this is the version of Agree adopted by Zeijlstra (; see also Adger ; Wurmbrand a,b; Keine and Dash ). In many negative-concord languages (but not Italian, as we shall see below) an n-word subject co-occurs with the main expression of clausal negation. This is the case in Romanian (as well as Hungarian and many Slavonic languages), shown in (): ()

Nimeni nu a venit a petrecere. no-one not has come to the-party ‘No-one came to the party.’ (Martins : )

[Romanian]

Languages like Romanian are known as strict negative-concord languages. Zeijlstra (; ) analyses them by postulating that the clausal negation (nu in Romanian) has an uninterpretable Negation feature, and there is an abstract negative operator Op¬, occupying the specifier of a higher category (which Zeijlstra :  calls NegP) which Agrees with both the n-word nimeni and the clausal negator nu (Op¬ was proposed by Laka ; and Ladusaw ). Thus, an example like () has the distribution of negators and Neg features shown in (): ()

[Op¬[iNeg] [TP nimeni[uNeg] nu[uNeg] a venit a petrecere]].

Italian, on the other hand, is known as a non-strict negative-concord language. Here n-words in subject position are able to negate the clause alone and cannot co-occur with the clausal negator: ()

Nessuno è venuto. Nobody be.. come. ‘Nobody has come.’

[Italian]

Here Op¬ Agrees with the n-word in subject position. N-words cannot appear in non-subject positions without an overt clausal negator:21 According to Bernini and Ramat (: –), examples comparable to () are grammatical without non, and with the appropriate intonation contour, as yes/no questions, 21



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

?*Gianni ha telefonato a nessuno. G. have.. call. to no one.

[Italian]

Biberauer and Zeijlstra (b: ) account for examples of this kind in terms of the following idea: ‘in the absence of an overt [iNEG]bearing element, an abstract negative operator will always be postulated in the lowest position in the clause that still c-commands all instances of [uNEG].’ Hence, in an example like (), Op¬ negates VP, but not the clause. This gives rise to the interpretation that there was a calling event but no one was called, which is a contradiction. Hence they claim that examples of this kind are not ungrammatical but that their unacceptability arises from the fact that they are inherently contradictory. In these terms, then, the distinction between strict negative-concord languages like Romanian and non-strict negative-concord languages like Italian reduces to interpretable (non-strict) as opposed to uninterpretable (strict) Neg feature on the clausal negator. In strict negativeconcord languages like Romanian, the negative marker carries uNeg; in non-strict negative-concord languages it carries iNeg. For non-negative-concord languages like (Standard) English, negative quantifiers like nothing in (a,c) have an iNeg feature, hence (a) is negative in the absence of not (and Op¬) and (c) is interpreted as double negation, since not also has iNeg. So we see that the Neg features of both clausal negation and negatively marked phrasal categories can vary in terms of whether they contribute a semantic negation or not. Again, we can organize these parameters into an implicational hierarchy as follows (this is a simplified version of the hierarchy given by Biberauer : ; see also Giannakidou , Biberauer et al. , Gianollo : , Roberts a: –): ()

a. Parameter D: Are all negative elements iNeg? YES: non-negative concord, e.g. Standard English. NO: negative concord, e.g. Italian, Romanian, French. b. Parameter D: Are all sentential negators iNeg? YES: non-strict negative concord, e.g. Italian. NO: strict negative concord, e.g. Romanian, French.

meaning ‘Has John phoned anybody?’ It may be that n-words such as nessuno are at least marginally able to function as more general polarity items, similar to English any; see Rizzi (: , n. ) on why these elements cannot be fully assimilated to polarity items.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

In addition to Standard English, Standard Dutch, Standard German, and Latin have the positive value for Parameter D, and so lack negative concord.22 Other possible cases are the Uto-Aztecan languages Tümpisha Shoshone, Hopi, and Yaqui discussed by Haspelmath (: –), and some dialects of Arabic (Haspelmath : –), but Lucas (: f.) points out that many Modern Arabic dialects are mixed, having both negative quantifiers and n-words. Van der Auwera and van Alsenoy (: , table ; : , table .), in a survey of  languages controlled for genus and macro-area (Dryer ; Miestamo ; van der Auwera and van Alsenoy : ), found that .% of the languages surveyed showed negative quantifiers as in Standard English (John bought nothing), while % had negative concord; the cross-linguistically commonest pattern was negation combined with an unmarked, i.e. not ‘special’, indefinite (John did not buy something), with .%, followed by negation combined with a negative polarity item (John did not buy anything), at .% (the frequencies sum to more than % as some languages have more than one option). If Parameter D in (a) is negative, the next relevant parameter is D, which determines whether there is non-strict negative concord. Since Parameter D is negative and hence not all formal negators are semantic negators, if Parameter D is positive, and all sentential negators are therefore iNeg, it follows that negatively marked Ds may be uNeg, giving rise to negative concord. Most of the Romance languages show non-strict negative concord, as illustrated with Italian in (), (), () above. Spanish and Portuguese are like Italian in

‘Negative spread’ can be distinguished from ‘negative concord’ (den Besten ). Negative spread involves cooccurrence of two negative expressions with a single negative interpretation, in the absence of a clausal negation marker (see Willis et al. :  for discussion and illustration). This is found in Modern German (while ‘standard’ negative concord is not): (i) Hier hilft keiner keinem. [German] Here help.. no one. no one. ‘Nobody helps anybody here.’ 22

If keiner/keinem has iNeg, then here we have a case of ‘quantifier absorption’ akin to what is found in multiple questions; see §... Jäger (, ) and Breitbarth and Jäger () document the diachrony of negative spread, alongside negative concord with sentential negation, in the history of both High and Low German.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

these respects, while Catalan shows variable behaviour; see Martins (: ). Romanian, on the other hand, has strict negative concord, as we have seen; see (). Attic Greek of the fourth and fifth centuries  also showed non-strict negative concord (Chatzopoulou : ). French is also a strict negative-concord language in regard to ne, but not pas, which, if added to (), gives rise to a double-negation reading (Déprez ): ()

Personne n’ est venu. No one =be.. come. ‘No one has come.’

[French]

As proposed in Zeijlstra (: ), we can assign uNeg to ne and iNeg to pas, as shown in (): ()

Jean n[uNeg]’est pas[iNeg] venu.

Here ne Agrees with pas (‘downwards’, in this case). An example with an n-word in subject position like () has the configuration of negators and Neg features in (): ()

[Op¬[iNeg] [TP personne[uNeg] n[uNeg]’est venu]].

Here Op¬ Agrees with both the n-word personne and ne. Adding pas to () gives the representation in (): ()

[Op¬[iNeg] [TP personne[uNeg] n[uNeg]’est pas[iNeg] venu]].

Here there are two iNeg features, and so the double negation interpretation arises (see Zeijlstra : ). In addition to the two basic conditions on Agree given in (), a third aspect of the Agree relation emerges when we consider examples of the following type: ()

*Je n’ai pas vu personne. I -have not seen no-one

Examples like this are ‘very marginal’ and ‘always have a double negation reading’ (Déprez : ). The structure of () is given in ():



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

TP

()

DP Je

Tʹ

T n’ai

VP

Neg pas

VP DP (je)

Vʹ

V vu

DP personne

Here, ne c-commands pas and pas c-commands personne. However, these elements are unable to form a single instance of negative concord, unlike in the case of (). We can understand this if we take it that we have two distinct Agree relations, each involving an interpretable and an uninterpretable Neg: one holding between Op¬ and ne, the other holding between pas and personne. It is impossible for Op¬, ne and personne to form a single Agree relation, the way they do in (), because the presence of pas blocks the Agree relation between ne and personne. This last point brings us to the third part of the definition of Agree: ()

(iii) there is no γ non-distinct in formal features from α such that γ c-commands β and α c-commands γ.

In the structure in () pas acts as γ, since it is non-distinct in formal features from α (ne), and it c-commands β (personne) and is c-



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

commanded by ne. In the relevant sense, then, pas ‘breaks’ the Agree relation between ne and personne.23 Given this analysis of French, we can now add a further parameter to those in (), as follows: ()

Parameter D: Are some sentential negators iNeg? YES: French, some Northern Italian Dialects. NO: strict negative concord with a single negator, e.g. Romanian.

The positive value of this parameter determines the presence of two clausal negators, one iNeg and one uNeg, as suggested for French in (), and the interaction of the two negators with negation and n-words that we see in () and (). If a language with two clausal negators had the positive value for either of Parameters D or D, all basic negations would be double negations, i.e. () would mean ‘John didn’t not come’, and single negation with a clausal negator would be unavailable. This option is presumably unavailable for functional reasons, since all languages seem to require a way to semantically negate clauses (see Bernini and Ramat : ; Talmy : ; Willis, Lucas, and Breitbarth : ). This concludes our discussion of negative concord in the synchronic domain. Parameters D–D in () and () rather clearly form a parameter hierarchy comparable to Parameters A–A in §.., Parameters B–B in () and Parameters C–C in (), (). Parameter D is a macroparameter, since it regulates all possible occurrences of negation across all categories, while D and D are microparameters since they relate to a maximum of two or three lexical items. In this section we have seen two main things: first, the concept of negative concord and the observation that this is a property of some languages but not others; moreover, it has at least the strict and non-strict variants. Second, we have introduced the technical notion of Agree and the associated distinction between interpretable and uninterpretable features as a way of analysing negative concord. This led us to the postulation of Parameters D–D, to whose diachronic effects we turn in the next section. 23

(i)

The grammaticality of (i) reveals an important difference between rien and pas: Il n’a rien dit à personne. [French] he -has nothing said to no-one ‘He said nothing to anyone.’

Here rien is uNeg, like pas and personne, and all three enter an Agree relation with Op¬.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

1.4.2. Negative concord and n-words in the diachronic dimension Here we will look at examples of changes in the values of Parameters D and D. A change in the setting of Parameter D involves change from a negative-concord to a non-negative-concord system (or vice-versa). A change in Parameter D involves change from strict to non-strict negative concord (or vice-versa); we will begin by looking at how this happened in the history of French, along with looking at the related development of the system of Modern French n-words from formerly non-negative expressions. I defer discussion of the development of multiple clausal negation (Parameter D) to the discussion of grammaticalization in §.. Let us look first at how the Modern French negation system developed. We will see that a change in parameter D is involved. The expressions which are n-words in Modern French (rien, personne, jamais, etc., as discussed in the previous section) were, at an earlier stage in the history of the language, either simple nouns in the case of rien and personne or positive adverbs in the case of jamais. The fact that these words are not historically negative can be seen from their etymologies: rien comes from Latin rem, meaning ‘thing’; personne is etymologically related to the feminine noun personne (‘person’—the English word is of course borrowed from French); and jamais derives from a compound of ja (‘already’) and mais (‘more’). The other nwords of French have similar non-negative etymologies, with the single exception of nul ‘no’ from Latin nullus ‘no’. Regarding the nature of these elements, Foulet (: ) comments as follows: Although ne is the essential negation in Old French and needs no extra help to express the idea of negation, it is nevertheless the case that from an early stage there is a preference to reinforce it with a series of words whose usage is sometimes rather curious. These words, with one exception, . . . take their negative value purely from their association with ne, and it is impossible to use them with a negative meaning without ne preceding or following them.24

‘Si ne est la négation essentielle du vieux français et n’a besoin d’aucun secours étranger pour exprimer l’idée négative, il est vrai pourtant que depuis longtemps on aime à la renforcer par une série de mots dont l’emploi est parfois bien curieux. Ces mots, à une exception près, . . . tiennent leur valeur négative uniquement de leur association avec ne, et il est impossible de les employer au sens négatif sans les faire précéder ou suivre de ne’ (my translation). 24



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

In stating that ne is the ‘essential negation’ of Old French, Foulet means that it alone sufficed to express clausal negation, unlike in Modern French. This is illustrated by examples like (): ()

a. Je ne nourriroie trahitor. [OF] ‘I would not feed [a] traitor.’ (Ch. –; Foulet : ) b. Li ostes ne set que il vent. the landlord not knows what he sells ‘The landlord doesn’t know what he is selling.’ (Bodel, Le jeu de saint Nicholas, l. ; Ayres-Bennett : ) c. Un moyne . . . ne laboure comme le paisant, [MidFr] a monk not works like the peasant, ne garde le pays comme l’homme de guerre, ne not guards the country like the soldier not guerit les malades comme le medicin, ne cures the sick like the doctor, not presche ny endoctrine le monde comme le preaches nor teaches the world like the bon docteur evangelicque et pedagoge, ne good doctor evangelical and pedagogical, not porte les commoditez et choses necessaires à brings the commodities and things necessary to la republicque comme le marchant. the republic like the merchant ‘A monk . . . doesn’t work like the peasant, doesn’t protect the land like the soldier, doesn’t cure the sick like the doctor, doesn’t preach or teach like the good preacher and teacher, doesn’t bring goods and commodities essential to the nation like the merchant.’ (Rabelais, Gargantua (ed. Calder, Droz : ); AyresBennett : , her translation)

It seems that we should conclude from examples of this kind that ne had an iNeg feature at this period. The example in (c), which is from the sixteenth-century author Rabelais, shows that this was the case at least until this period. In fact, it is generally stated that ne . . . pas negation became obligatory in the seventeenth century (Harris : ; Robert : ; Ayres-Bennett : ; Mosegaard-Hansen : ). (See note , Chapter .) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

In the quotation given above, Foulet also points out that the words which ‘reinforce’ negation are not negative unless ne appears in the same context. This can be seen from examples such as the following: ()

a. comment qu’il onques en aviegne how that it ever of-it happens ‘how it might ever happen’ (Courtois d’Arras ; Foulet : ) b. Aucuns se sont aati . . . some selves are boasted . . . ‘Some (people) have boasted . . . ’ (le Bossu, Jeu de la Feuillée ; Foulet : ) c. douce riens por cui je chant sweet thing for whom I sing ‘sweet one for whom I sing’ (Muset, Chansons VIII, ; Foulet : )

[OF]

In (a), onques is either an indefinite adverb or a negative polarity item. Since it appears in an indirect question, we cannot tell which it is on the basis of this example. In (b), it is fairly clear that aucuns is an indefinite, while in (c) rien is a noun (feminine in gender, as the form of the adjective douce shows). These examples clearly show that these items did not at this time have an iNeg feature. These expressions appear in negative clauses in OF where ne is present: ()

a. Ce n’avint onques. [OF] that not-happened ever ‘That didn’t ever happen/that never happened.’ (Chastelaine de Vergi ; Foulet : ) b. k’il n’aient de vous aucun bien that-they not-have from you a good ‘that they won’t have any good(s) from you’ (Jeu de la Feuillée ; Foulet : ) c. . . . li feus, qu’il ne pooit por riens estaindre . . . the fire that-he not could for thing put-out ‘ . . . the fire that he couldn’t put out for anything’ (Huon le Roi, Le Vair Palefroi –; Foulet : )



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

As the English translations in ever and any indicate, it may be possible to consider onques, aucun, and rien as polarity items in these examples. Alternatively, they may be regarded as indefinites. In fact, negativepolarity items are a kind of indefinite; see Giannakidou () and Horn () for a critical discussion of this question. But, given the evidence in (), they cannot be n-words in the way they are in Modern French. It seems, then, that in OF ne was the true negation, and thus had an iNeg feature. The fact that the future n-words could appear in nonnegative contexts like () shows that they did not at this time have a Neg feature, whether interpretable (which would have made the relevant clauses negative) or uninterpretable (which, without the presence of Op¬, have rendered the clauses ungrammatical). When they appeared in negative contexts they may have been indefinites in the scope of negation or negative-polarity items. Presumably their distribution changed so that they appeared predominantly in the scope of negation, again perhaps because they were polarity items. This led to the development of an Agree relation with ne’s iNeg feature and their reinterpretation as n-words. Still following Zeijlstra’s approach, the formal correlate of the development of rien etc. as genuine n-words must have been their taking on a uNeg feature. This may have been a consequence of the development of the bipartite ne . . . pas negation, with pas bearing the iNeg feature and requiring ne to be reanalysed as uNeg. Where pas is not present, Op¬ must be postulated to Agree with the uNeg features of the n-words. Hence three things seem to have changed at the time bipartite negation was introduced into French: (i) ne became uNeg; (ii) rien, etc., became full-fledged n-words, i.e. they too took on uNeg; (iii) Op¬ was introduced for clauses containing ne and an n-word (but not pas). Presumably, the cause of all this was the development of pas as a ‘second’ clausal negator with [iNeg], a matter we will come back to in §.. We thus expect the development of negative concord (and hence n-word status for rien, etc.) to correlate chronologically with the development of the ne . . . pas clausal negation.25 Below I will suggest, on the basis of 25

This predicts that rien, etc., never co-occur with ne . . . pas at any period of French. In the pre- period, rien was a negative-polarity item, co-occuring with the original autonomous negator ne, as seen in (c). In the post- period, after the introduction of bipartite negation, rien is an n-word licensed by Op¬ in examples like (i) of note  above. Like personne, rien can only very marginally co-occur with pas, and then it has a double-negation interpretation; see the discussion of () in the previous section.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

further evidence from OF, that the only development at this time was the association of uNeg with rien. Martins (: –) shows that other Old Romance languages (Old Spanish, Portuguese, Galician-Leonese, and Italian) were like OF in allowing what are now n-words to appear in ‘modal contexts’, i.e. ‘questions, imperatives, conditionals, comparatives, the scope of modal verbs, the scope of words expressing prohibition, generic constructions, subjunctive clauses introduced by . . . “before” (). It may be, then, that these elements were polarity items at this period. Martins also shows that Modern Catalan has this pattern. Interestingly, some of the polarity items in Catalan have a non-negative etymology, like those in French. This is true of res (‘nothing’), like French rien from Latin rem, and cap (‘no’), as in: ()

a. T’ha passat res? [Catalan] to-you-has happened nothing ‘Did anything happen to you?’ b. Si hi trobeu cap defecte, digueu-m’ho. if in-it you-find no defect, tell-me-about-it ‘If you find any defect, let me know.’

These elements cannot appear in (non-modal) positive declarative clauses: ()

*T’ha passat res. to-you-has happened nothing

[Catalan]

Instead, the unambiguously positive alguna cosa (‘something’) must be used here. This situation appears to resemble OF, as Martins points out. We see from the above that the Modern French system of negation and n-words, including strict negative concord with ne in examples like (), probably emerged with the development of bipartite negation around . The development of this negation pattern, and the However, some examples where rien cooccurs with ne . . . pas are found in the seventeenth century, e.g. the following from Molière (Bernini and Ramat : ; gloss added): (i) Ne

faites pas semblant  do.. not seeming ‘Act as if nothing happened.’

de of

rien. .thing

[th c. French]

We can account for this by supposing that rien may have retained its negative-polarity item status after the development of bipartite negation, effectively becoming lexically ambiguous for a time.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

subsequent loss of ne in Colloquial French, is an instance of Jespersen’s Cycle (see Jespersen ; Willis, Lucas, and Breitbarth ); I will discuss this in §.. We have suggested that three things changed with the introduction of bipartite negation, i.e. the introduction of pas bearing iNeg: the status of ne (changing from iNeg to uNeg), the status of rien, etc. (changing from negative-polarity items or negative indefinites to n-words), and the introduction of Op¬. However, while examples of the kind in () show that French did not have obligatory bipartite negation until the seventeenth century, they do not have to be interpreted as showing that ne had an iNeg feature. As we saw in the previous section, Zeijlstra (, ) analyses the clausal negators of strict negative-concord languages as having a uNeg feature which agrees with the abstract negative operator Op¬. In fact, there is evidence that OF had strict negative concord, in that we find negative expressions in subject position co-occurring with ne: ()

a. Nient ne nous vaut, vous en venrés. [OF] Nothing not us is-worth, you of-it will-see ‘Nothing is as valuable as we are, you will see.’ b. Nus hom ne les peüst irier. No man not them can anger ‘No man can anger them.’ (Huon le Roi, Le Vair Palefroi, ; Foulet : )

(Martins : – gives similar examples from Old Spanish, Old Portuguese, and Old Italian.) Here the negative expression cannot be a polarity item as it is not c-commanded by ne (Gianollo : , , draws the same conclusion from similar OF data). Following Zeilstra’s () analysis of strict negative concord as summarized in the previous section, we revise the analysis given above and attribute uNeg to ne rather than iNeg and postulate the presence of Op¬ in a higher position, Agreeing with ne (and any n-words present). We can also revise what we said above and treat the examples in () in the same way. In this way, the systematic introduction of pas bearing an iNeg feature led to just one change: postverbal negatives other than pas (i.e. rien, etc.) changed from being indefinites or polarity items to being nwords, as we saw above. We thus witness a change, at least in part, from



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

NEGATIVE CONCORD

strict to non-strict concord in French, parameter D. The status of ne (as uNeg) and Op¬ (present and, as always, iNeg) did not change. In both Low and High German we witness a change in parameter D, the loss of negative concord. As Breitbarth and Jäger (: –, ) and Jäger (, : f.) show, both OHG and Middle High German (MHG) showed negative concord alongside examples where just the negative expression appears, without a clausal negator: ()

OHG: a. gibot her/in tho thaz sie niheingamo nisagatin. told he/them then that they nobody .told ‘then he told them not to tell anybody’ (Tatian , –; Jäger : ) b. Inthemo noh nu nioman/ Ingisezzit uuas. In.which still now nobody put was ‘in which nobody had been put yet’ (Tatian , –; Jäger : )

()

MHG: a. Da enwart nymand konig, (er enwúrd darzu there .became nobody king he .was to.it erkorne.) chosen ‘Nobody became king there, unless he was chosen.’ (Proslancelot , ; Jäger : ) b. Und sie hatten nymant miteinander gewunnen and they had nobody with.each.other won dann ein junges knebelin kleyn than a young boy small And they had no child with each other apart from a small boy’ (Proslancelot , ; Jäger : )

Jäger shows that the negative-concord pattern is much more frequent in OHG (averaging % of negative clauses) than in MHG (where it averages %), while the pattern without the clausal negator is quite rare in OHG, averaging %, compared to % in MHG (see tables . and ., Jäger : , ). There is also a third pattern for negation, combining a clausal particle with a non-negative indefinite, which accounts for % (OHG) and % (MHG) of negative clauses 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

respectively. Jäger (: ) proposes that ‘[t]he loss of NC [negative concord, IR] is basically caused by the loss of the preverbal clitic-neg particle.’ In terms of Zeijlstra’s system, we can see this as the loss of the iNeg-bearing negative particle and the shift of the former n-words from uNeg to iNeg. (Breitbarth and Jäger : f. arrive at a different analysis, treating Modern German negative indefinites as uNeg.)26 Breitbarth (, ) documents a similar development in the history of Low German, showing that the loss of the preverbal negative particle took place during the fifteenth and sixteenth centuries, with some variation across dialects (see in particular the data and discussion in Breitbarth : –). As Breitbarth (: ) says, in Low German ‘the developments are largely parallel to the High German ones . . . , though with a delay of about  years’. This is a change in the value of parameter D from positive to negative. Still following the general approach put forward by Zeijlstra, we can attribute the difference between German and French to the fact that preverbal negation has persisted in French, remaining present in more formal varieties to the present day, with the consequence that Op¬ must be posited to license both this element and n-words where pas is not present. In High German, on the other hand, the preverbal negator disappeared completely, along with Op¬, and the former n-words became negative quantifiers (there are some Low German varieties which retain bipartite negation, notably West Flemish; see Haegeman c).

26 There is evidence that OHG (and OLG) were strict negative-concord languages. In OHG we find examples like (i) and (ii): (i) got nioman nigisah io in altere [OHG] God nobody .saw ever in ages ‘Nobody has ever seen God.’ (Tatian , ; Breitbarth and Jäger : )

(ii)

Neo nist zi chilaubanne dhasz fona dhemu Never .is to believe that of the chiforabodot. predicted ‘It is not to be believed that this was predicted of Salomo.’ (Isidor IX, ; Jäger : )

salomone Salomo

sii be

dhiz this

This implies that OHG n(i), like OF ne, was uNeg and Op¬ occupied a higher position in the clause, licensing the clausal negator and n-words. This does not alter the proposals in the text, in particular the idea that we witness a change in the value of parameter D in the history of German.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WH-MOVEMENT

1.4.3. Summary and conclusion In () I present the three parameters concerning the expression of negation that we have looked at here: ()

a. Parameter D: Are all negative elements iNeg? YES: non-negative concord, e.g. Standard English, Modern High German. NO: negative concord, e.g. Italian, French (all periods), OHG, MHG. b. Parameter D: Are all sentential negators iNeg? YES: non-strict negative concord, e.g. Italian, Spanish, Classical Attic Greek. NO: strict negative concord, e.g. Romanian, French (all periods), OHG. c. Parameter D: Are some sentential negators iNeg? YES: post- French, some Northern Italian Dialects. NO: strict negative concord with a single negator, e.g. Romanian, OF.

We have looked at cases of change in the setting of D and D. We postpone the discussion of the development of bipartite negation, i.e. change in the value of D, to §.. As we pointed out above, Parameter D is a macroparameter, since it regulates all possible occurrences of negation across all categories, while D and D are microparameters since they relate to a maximum of two or three lexical items.

1.5. Wh-movement 1.5.1. Wh-movement parameters An important parameter, which has attracted a great deal of attention since it was first proposed in Huang (), concerns the incidence of ‘overt’, as opposed to ‘covert’ wh-movement. Wh-movement is an operation that takes place in, among other constructions, wh-questions. We came across this operation in our discussion of the null-subject parameter in §.. and again in our discussion of residual V in §.... I repeat (a) here for convenience: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

a. How many books have you read?

As we saw, (a) has the structure in (), also repeated here:

CP

()

DP how many books

C

C have

TP

DP you

T

T (have)

VP

DP (you)

V V DP read (how many books)

In this structure, movement of the DP how many books to SpecCP is an instance of wh-movement, leaving a copy in the direct-object position, as shown here. Wh-movement can also take place in indirect questions, although here there is no T-to-C movement in Standard English—see §.. on some non-standard varieties: ()

I wonder [CP how many books he has read (how many books)]

In English, movement of one wh-phrase to SpecCP is obligatory in both direct and indirect wh-questions, as the ungrammaticality of () shows: ()

*Have you read how many books?

If we ‘undo’ T-to-C movement, we have a well-formed ‘echo question’, as distinct from a simple request for information: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

WH-MOVEMENT

You have read HOW many books?!

As the punctuation is meant to indicate, () is only grammatical with a very particular intonation. Indirect echo questions are impossible: ()

*I wonder he has read HOW many books?!

In the clausal complement to verbs like wonder, wh-movement to SpecCP is required. Many languages do not have wh-movement of the kind we find in English, although they certainly allow their speakers to ask whquestions. In these languages, the wh-phrase remains unmoved, in its grammatical-function position (subject, direct object, adverb, etc.). Chinese and Japanese are perhaps the best known examples of such languages, called wh-in-situ languages, as illustrated by the examples of indirect questions in (): ()

a. Zhangsan xiang-zhidao [Lisi mai-le sheme ]. [Chinese] Zhangsan wonder Lisi bought what ‘Zhangsan wonders what Lisi bought.’ (Watanabe : , ()) b. Boku-wa [CP [IP John-ga nani-o [Japanese] I-Top John-Nom what- katta] ka] shiritai. bought Q want-to-know ‘I want to know what John bought.’ (Watanabe : , (a))

We can thus formulate a parameter to distinguish English-style languages from Chinese-Japanese style languages: Parameter E:

Does a wh-phrase move to the Specifier of an interrogative CP? YES: English, Italian, Spanish, German, Welsh . . . NO: Chinese, Japanese, Thai, Korean, Turkish . . .

What properties co-occur with parameter E? One possibility is that insitu languages lack wh-determiners and wh-pronouns of the kind exemplified by which, what, etc., in English. The Chinese and Japanese words that translate these English expressions are in fact indefinite pronouns; the interpretation of the clause as a wh-question depends on the presence and nature of a sentential particle in these and many other languages. This is clearly seen in Chinese, where the nature of the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

particle determines the interpretation of the clause, and of the indefinite DP, as involving wh-interrogation or not: ()

a. Hufei chi-le sheme (ne) Hufei eat- what Qwh ‘What did Hufei eat?’ b. Qiafong mai-le sheme ma Qiafong buy- what Qy/n ‘Did Qiafong buy anything?’ (Cheng : –)

[Chinese]

Here we see that the choice of the sentence-final particle determines whether sheme is interpreted as an indefinite in a yes/no question or as a wh-phrase in a wh-question. Cheng () makes two interesting generalizations in this connection. First, she proposes that every clause needs to be ‘typed’, i.e. its ‘force’ as an interrogative, declarative, etc. must somehow be marked. In the case of typing a wh-question, she proposes that either a wh-particle in C or fronting of a wh-word to the SpecCP is used (). Assuming that C Agrees for a wh-feature with a given DP, we could think of this as realization of the wh-feature on either the Probe or the Goal, with associated movement of the Goal when the wh-feature is morphologically realized there. This implies that C is final in the Chinese examples in () above; we will turn to the question of cross-linguistic variation in word order in the next section. The strongest version of Cheng’s generalization probably cannot be maintained, as there are in-situ languages which have unambiguous wh-words, e.g. Hindi and Turkish; similarly, there are languages with both wh-movement and the possibility of interpreted in-situ wh-phrases as indefinites, e.g. German. For discussion of other aspects of Cheng’s generalizations, see Roberts (a: –). Languages which have overt wh-movement vary along a further important dimension. In English, it is possible for more than one whphrase to appear in a single sentence. We saw an example of this in the Introduction to this chapter, which I repeat here, with the copy of whmovement, who, indicated: ()

Who did he think (who) was likely to drink what?

As we saw, examples like these can have a ‘pair-list’ interpretation. What is impossible, however, is movement of more than one whphrase to SpecCP: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

WH-MOVEMENT

*Who what did he think (who) was likely to drink (what)?

But some languages allow this. The best known ‘multiple whmovement’ languages are the Slavonic languages, illustrated by the following Bulgarian example (from Rudin ): ()

Koj kogo e vidjal? who whom aux saw-s ‘Who saw whom?’

[Bulgarian]

Here a further parameter must be at work, distinguishing among languages with the positive value for parameter E, as follows: Parameter E:

Do all wh-phrases move to the Specifier of an interrogative CP? NO: English, Italian, Spanish, German, Welsh . . . YES: Bulgarian, Russian, Serbo-Croatian . . .

There is an obvious implicational relation between parameters E and E, indicating a further parameter hierarchy (but see Bošković  for a different view, and Roberts a: – for a more elaborate whmovement parameter hierarchy which integrates Bošković’s proposals). Another potentially important property connected with whmovement is word order in the VP. Bach () observed that wh-insitu languages tend to be OV (on OV languages in general, see §.). He formulated what has become known as Bach’s generalization: ()

If a language is OV, then it has wh in situ (see Kayne : ).

This is an example of a possible implicational universal, something I will come back to in much more detail in §.. Languages like Dutch, German, and Latin are problematic for () since they clearly have wh-movement and are OV (see §..). Dryer (a,c) together provide data on the correlation between VO order and overt wh-movement. Of the  languages with determinate properties regarding both wh-movement and OV or VO word order, i.e. excluding ‘no dominant order’ and ‘mixed’ wh-movement, we observe the following pattern of co-occurrence: ()

OV and wh in situ VO and wh in situ VO and wh-movement OV and wh-movement

    

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Box 1.2. Further cross-linguistic variation in overt wh-movement Other parameters clearly distinguish among languages with wh-movement. One, which we will look at in more detail in §., concerns the possibility of Preposition-stranding. Preposition-stranding is the cross-linguistically rather rare option of moving the complement of a preposition, while leaving the preposition ‘stranded’. English and Mainland Scandinavian languages allow this in wh-questions, illustrated in (a) (it is also allowed in passives, but I will leave that point aside here):

() a. Who did you speak to __?

b. To whom did you speak __?

As (b) shows, English also allows ‘pied-piping’ of the Preposition along with the wh-phrase. It is not clear what exactly permits this in English and the Scandinavian languages. In most other languages which have overt whmovement, the preposition must be pied-piped. This is the case of Standard French, for example:

() a. *Qui as-tu parlé à __?

[French]

b. A qui as-tu parlé __?

French has some property which requires pied-piping of the DP in all cases. Kayne () made a very interesting and influential proposal concerning this, but in the context of technical assumptions which have not been carried over into most versions of minimalism. We will look at Preposition-stranding in one variety of French in our discussion of contact in §... Another parameter concerns the possibility of ‘left-branch extraction’, i.e. moving a wh-phrase from the left branch of the category containing it. This is not allowed in English, as () shows:

() *Whose did you read [DP (whose) book]? Here whose has moved from the left branch of the DP. The equivalent of () is allowed in many languages, however. () is an example from Russian:

() Č’ju

ty čital knigu? whose you read book ‘Whose book did you read?’ (Biberauer and Richards : , ())

We will encounter this fact about Russian again in §...



[Russian]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WH-MOVEMENT

Since the generalization is formulated as a one-way implication, the fact that VO languages seem to readily allow wh-in-situ is not problematic. Most of the languages combining OV and wh-movement are Australian, Amazonian, or Amerindian (along with some Khoisan and Nilo-Saharan languages), which may explain why this possibility was earlier thought to be rare or non-existent; these languages have become much better described in the years since Bach (). We can observe, however, that it is less common than the other patterns, with less than % of the surveyed languages showing it. Bach’s generalization, combined with the fact that this combination is outnumbered four-to-one by OV, in-situ languages, may indicate that () describes a tendency in the world’s languages. As we will see in the next section, Watanabe () suggests Old Japanese did not obey this generalization. Finally, it has been suggested that ‘wh-agreement’ is only found in contexts of wh-movement, and never in wh-in-situ languages. Whagreement can be defined as ‘a phenomenon in which verbal inflection and complementizers display distinct morphosyntactic properties in the clauses which immediately contain the displaced wh-phrase and its traces [i.e. copies—IR]’ (Watanabe : ). A familiar example is past-participle agreement in compound tenses in French, which, at least in literary French, is found when a direct object undergoes whmovement (as well as in other contexts, which I leave aside here): ()

Quelle voiture as-tu conduite? which car (..) have-you driven (.) ‘Which car have you driven?’

[French]

In Irish, the complementizer introducing a finite clause from which wh-movement has taken place changes its form from go to aL (the ‘L’ indicates that this element causes lenition in the initial consonant of the following word—see McCloskey  for details): ()

a. Deir said gur ghoid na síogaí í. say they -[] stole the fairies her ‘They say that the fairies stole her away.’ b. an ghirseach a ghoid na síogaí the girl a stole the fairies ‘the girl that the fairies stole away’



[Irish]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

c. rud a gheall tú a dhéanfá thing a promised you a do [-] ‘something that you promised that you would do’ (McCloskey : –) In Palauan, when any argument other than the local subject (in a direct question, the main-clause subject) undergoes wh-movement, the verb shows irrealis morphology. This is illustrated in (a), where the direct object is wh-moved, and in (b), where the embedded subject is moved (here the main verb shows irrealis marking): ()

a. ng-ngera [a le- silseb-ii (ng-ngera) a se’el-il] [Palauan] -what --burn- friend- ‘What did his friend burn?’ (Georgopoulos : , in Watanabe : ) b. ng-te’a [a l-ilsa a Miriam  who --see Miriam [el milnguiu er a buk er ngii (ng-te’a)] ]  --read  book her ‘Who did Miriam see reading her book?’ (Georgopoulos : , in Watanabe : )

Watanabe (: ) concludes that ‘if a language exhibits a whagreement phenomenon, it always shows sensitivity to overt whmovement’. We will return to this point when we take up Watanabe’s discussion of Old Japanese in the next subsection. In this section we have seen the basic cross-linguistic properties of wh-movement: the existence of wh-movement and wh-in-situ languages, a difference captured by Parameter E; a further distinction among wh-movement languages between ‘single’ and ‘multiple’ whmovement, captured by the subordinate parameter E. Arguably both of these are macroparameters, since they both concern all wh-phrases (of all categories) in the relevant languages. We also briefly discussed Bach’s generalization and Watanabe’s observation that wh-agreement may correlate with wh-movement. There is much more to say about wh-movement; in fact the investigation of this phenomenon has been a central theme in generative grammar for many years (see Box .), but the above points are sufficient here.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WH-MOVEMENT

Box 1.3. The significance of wh-movement Although not directly relevant to the aims of this book, it may be worth briefly indicating why wh-movement has been of such importance for generative theory. The basic reason is that wh-movement appears to be able to operate over an unlimited amount of material. For this reason, it is known as an unbounded dependency. The apparently unbounded nature of whmovement is illustrated in ():

() a. What did Bill buy (what)?

b. What did you force Bill to buy (what)? c. What did Harry say you had forced Bill to buy (what)? d. What was it obvious that Harry said you had forced Bill to buy (what)?

However, beginning with Ross (), a range of constructions in which even unbounded dependencies cannot be formed was recognized. These constructions are known as islands. Two examples of the various islands that were first recognized in Ross () are given in () and ():

() The complex DP constraint:

a. *Which writer did you write [DP a play which [TP was about (which writer)] ] ? b. *Which writer did you believe [DP the claim that [TP we had met (which writer)] ] ?

() The left branch condition (LBC):

a. *Whose did you play [DP (whose) guitar]? b. Mick’s friend’s favourite guitar c. Whose guitar did you play (whose guitar)?

‘Island effects’ of this type are not found in wh-in-situ languages such as Chinese, as Huang () first showed. However, very intriguingly, Huang also showed that other locality effects associated with wh-movement are found in wh-in-situ languages. In particular, adjunct wh-elements cannot be interpreted with wide scope in certain islands. The following example illustrates this for the adjunct weishenme (‘why’) in a complex NP in Chinese: xihuan [weishenme mai shu de ren]? [Chinese] you most like why buy book  person ‘Why do you like the man who bought the books?’

() Ni zui

This demonstrates that, while movement is sensitive to islands, there are further locality constraints on wh-interpretation which are independent of overt movement. See Watanabe () for more systematic discussion of this, and Richards () for an overview of wh-movement and island constraints across a range of languages. (continued)



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Box 1.3. (Continued) Chomsky () proposed an account of island constraints in terms of the subjacency condition, a general condition on the formation of unbounded dependencies. The idea is that, despite appearances, wh-movement does not in fact apply over unbounded distances, but instead proceeds in a series of short ‘hops’ via each SpecCP position. Thus an example like (c) above has the structure in (), with the various copies indicated:

() [CP What [C0  did [TP Harry say [CP (what) [C0  [TP you had forced Bill [CP (what) [C0  [TP to buy (what)]]]]]]]]]?

This kind of movement is known as successive-cyclic movement. Subjacency is then formulated so as to prevent movement of a wh-constituent across more than one blocking category (BC). The blocking categories (in English) are DP and TP. Movement from a complex DP as in () crosses a DP and a TP boundary, as does movement from the left branch of a DP as in (), hence these examples are ruled out by the subjacency condition. Chinese examples like () cannot be subject to subjacency, and so some other principle must be at work here (see the textbooks on government-and-binding theory cited in the Introduction for discussion of this). It has been claimed that phenomena such as the alternation in the Irish complementizers illustrated in () are evidence for successivecyclic movement, since exactly the complementizers in clauses containing the position from which the wh-constituent was moved show the complementizer alternation. In general, wh-movement has been important as it has provided clear evidence for the local nature of syntactic dependencies, even where they appear to hold over unlimited domains. It has also provided a clear indication of the precise nature of the locality constraints that hold. It has thus been a central tool in the investigation of UG.

1.5.2. Wh-movement in the diachronic domain Watanabe () argues that parameter E changed its value in Old Japanese, specifically between the Nara Period (eighth century) and the Heian Period (ninth to tenth centuries). In the earlier period, then, Japanese had overt wh-movement. The basic evidence for this is the observation, which Watanabe attributes to Nomura (), that ‘the



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WH-MOVEMENT

wh-phrase must precede the nominative subject in the Nara Period’ (: ). This is illustrated by examples such as the following: ()

a. Kasugano-no fuji-ha chiri-ni-te [Old Japanese] Kasugano- wisteria- fall-. nani-wo-ka-mo mikari-no hito-no ori-te what--- hike- person- pick- kazasa-mu? wear.on.the.hair-will ‘Since the wisteria flowers at Kasugano are gone, what should hikers pick and wear on their hair?’ b. Kado tate-te to-mo sashi-taru-wo izuku-yu-ka gate close- door-also shut-. where-through- imo- ga iriki-te yume-ni mie-tsuru? wife-  enter- dream- appear- ‘From where did my wife come and appear in my dream, despite the fact that I closed the gate and shut the door?’ (Man’youshuu nos.  and ; Watanabe : , (a,b))

The wh-phrase here bears the focus marker ka. In fact, focused nonwh-phrases, marked with ka, occupy this pre-subject position, which also follows a topic marked with hu: ()

Hatsuse-no kawa-ha ura na-mi-ka [Old Japanese] Hatsuse- river- shore absent-ness- fune-no yori-ko-nu? boat- approach-come- ‘Is it because Hatsuse River has no shore that no boat comes near?’ (Man’youshuu no. ; Watanabe : , (b))

The existence of overt wh-movement in Old Japanese explains the presence of a wh-agreement phenomenon, known in the Japanese philological tradition as kakasimusubi. Kakasimusubi relates the nature of a preposed XP to the form of the verb: where the preposed XP is marked with ka, for example, the verb must appear in the ‘adnominal form’, which is morphologically distinct from the ‘conclusive form’ of (unfocused) declaratives. Watanabe (: ) gives further details. Both the particles and the special verb forms were lost by the fifteenth century, although Watanabe



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

argues that the wh-movement system triggering kakasimusubi was lost by the beginning of the eleventh century (). As he points out, ‘[r]ecognising overt wh-/focus-movement in Old Japanese makes it possible to understand why it used to have the system of kakasimusubi’ (); the wh-agreement was lost as a consequence of the loss of overt whmovement, i.e. the change in parameter E. Watanabe documents five changes which take place between the Nara and Heian Periods: ()

a. loss of overt wh/focus-movement; b. decrease in the use of ka with the wh-phrase in genuine whquestions; c. use of wh-ka in rhetorical questions; d. loss of ka with non-wh forms; e. increase in subject topicalization.

Changes (b–d) amounted to the loss of ka as the morphological trigger for wh-/focus-movement. (e) removed the word-order trigger for focus/wh-movement, since it created orders in which the focused XP did not systematically precede the subject. The changes in () combined with the consistent verb-final order (which, as we saw in our discussion of Bach’s generalization in the previous section, tends to disfavour overt whmovement) to eliminate overt wh-movement from the grammar of Japanese. In this way, the value of parameter E changed. Kuroda () endorses Watanabe’s general conclusions, noting also that there was already a wh-in-situ construction in Old Japanese, which was independent of kakasimusubi (this is consistent with a positive value of E, as is the existence of in-situ wh-phrases in multiple questions in languages where parameter E is negative, such as English).27 As a related point, Kuroda () observes that Old Japanese optionally also had wh-movement in relative clauses, where Modern Japanese has internally headed relative 27

Kuroda analyses Old Japanese wh-movement as targeting SpecIP, while Watanabe (: ) argues that it targets SpecFocP in a split-CP (of the kind discussed in relation to V in §...; see ()). Clearly, Watanabe’s analysis is more directly in line with the formulation of parameter E given above than Kuroda’s. Aldridge () takes a different view again, suggesting that Old Japanese wh-movement targeted a clause-medial position, roughly speaking immediately at the left of VP, while Aldridge () proposes an analysis closer to Kuroda’s in postulating that the focused XP in kakasimusubi constructions moves to SpecTP. See Dadan () for further discussion of these alternatives. Despite these differences, all these analyses share the idea that Old and Modern Japanese differed in the incidence of overt wh-movement, which is our central concern here.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WH-MOVEMENT

clauses. In fact, Kuroda relates the presence of wh-movement in Old Japanese to a range of other differences between Old and Modern Japanese, all connected to an abstract agreement macro-parameter (cf. the discussion of Japanese as an RNSL in §..). Loss of overt wh-movement is attested elsewhere too. Aldridge () argues that Archaic Chinese had wh-movement to a clausemedial position, implying a version of parameter E has also changed in the history of Chinese. Dadan (: –) gives evidence, based on Hale () and Fortson () that Sanskrit and Avestan had overt wh-movement, while Modern Hindi and Modern Persian are wh-insitu languages. In fact, Fortson (: ) argues that all the older Indo-European languages had overt wh-movement (see below on Latin and Classical Greek), hence any contemporary Indo-European language with wh-in-situ will have undergone change in parameter E. Finally, Duguine and Irurtzun () show that one variety of Modern Basque, the Navarro-Labourdin variety, is developing wh-in-situ, possibly under the influence of Colloquial French (on which, see below). As Dadan (: ) points out, the loss of overt wh-movement ‘is indeed a cross-linguistic trend in need of explanation’. The development from Latin to some of the Modern Romance languages provides evidence of a change in the value of both parameters E and E. There is now quite clear evidence that Latin had multiple wh-movement (see in particular Devine and Stephens : ; Danckaert : ff.; Dadan : –, and the references given there). The following examples from Dadan (: ), with his sources indicated, illustrate: ()

a. Quis quem fraudasse dicatur? [Latin] who whom defrauded is.said ‘Who is said to have defrauded whom?’ (Cicero, Pro Roscio Comoedo ; Devine and Stephens : ) b. Quis apud quos quibus praesentibus sit acturus who. with who. who. is.present be do. ‘Who is going to plead before whom in whose presence?’ (Quintilian ..; Devine and Stephens : ) c. Ego [quid cui debe-a-m] sci-o. I. what. whom. owe-.-. know-.. ‘I know what I owe to whom.’ (Seneca, De Beneficiis ..; Danckaert : ) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

(a) presents two fronted wh-phrases, while in (b) there are three. (c) shows that multiple fronting was possible in embedded clauses in Latin. Roussou (: ) provides evidence that Classical Greek also had multiple fronting. The Modern Standard Romance languages behave like English, in requiring movement of one wh-phrase, while requiring all other whphrases in multiple questions to remain in situ.28 The following Spanish and French examples illustrate: ()

a. *A quien que le pidio Ivan? [Spanish] to who what  asked Ivan b. A quien le pidio Ivan que? to who  asked Ivan what ‘What did Ivan ask who?’ (Dadan : , citing Reglero and Ticio : )

()

a. *A qui quoi Jean a-t-il demandé? to who what Jean has-he asked b. A qui Jean a-t-il demandé quoi? to who Jean has-he asked what ‘What did John ask who?’

[French]

However, in various registers of French, Brazilian Portuguese, and Spanish, wh-movement has undergone, or may be undergoing, change in that wh-in-situ questions seem to have developed in their fairly recent history. Foulet (: ) discusses this in relation to French, giving, among others, the following examples (boldface, glosses and translations added): ()

a. Vous avez vu qui là-bas? You have seen who over there ‘Who did you see over there?’ b. Vous pensez à quoi alors? You think of what then ‘What are you thinking of then?’

28

[French]

It has been claimed that Italian does not allow multiple questions of any kind; see Rizzi (: ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WH-MOVEMENT

c. Vous irez où cet été? You will-go where this summer ‘Where are you going this summer?’ Foulet (: –) comments on these examples as follows:29 These sentences hardly get into books . . . However they are very frequent and appear to be gaining ground . . . these sentences introduce a singularly new note into the language. To leave the interrogative word behind the verb, to take away its privileged position, to reduce it to the role of a secondary complement, is to turn upside down the best established rules of syntax, is to break not only with the French tradition but with the Latin tradition, it is to go against twenty-four centuries of history. It is not surprising that the written language refuses to take seriously sentences which give the impression of an infant’s babbling.

Leaving aside the normative tone of these comments, we see that Foulet correctly observes that the loss of wh-movement marks a change to a long-standing syntactic property (the positive value of parameter E); in fact, if Fortson (: , : ) is correct, the property is older than Latin, going back possibly to Proto IndoEuropean (see §..). Foulet also correctly observes that wh-in-situ is not part of the literary language, but that it is frequent and, as of , increasing in frequency; Dadan (: –) cites recent studies showing that it appears in up to % of interrogatives in the speech of middle-class Parisians. Finally, Foulet is correct in pointing out that wh-in-situ is a feature of child language (although this might be a generous interpretation of his remark). Oiry (: ) gives the following examples from children’s production data: ()

a. Tu penses que j’habite où? [Child French] You think. that I=live.. where ‘Where do you think I live?’

‘Ces phrases ne pénètrent guère dans les livres . . . Elles sont pourtant très fréquentes et semblent gagner du terrain . . . ces phrases introduisent dans la langue une note singuliérement nouvelle. Rejeter le mot interrogatif après le verbe, lui enlever sa place privilégiée, le réduire à un role de complement secondaire, c’est bouleverser les règles les mieux établies de la syntaxe, c’est romper non seulement avec la tradition française mais avec la tradition latine, c’est aller à l’encontre de vingt-quatre siècles d’histoire. Il n’est pas surprenant que la langue écrite refuse de prendre au sérieux des phrases qui lui font l’effet de balbutiements d’enfant’. My translation. 29



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

b. Tu penses qu’il y a quoi dans You think. that=it  have.. what in la malle? the trunk ‘What do you think is in the trunk?’ c. Tu penses que le policier fait quoi? You think.. that the policemen do.. what ‘What do you think the policeman is doing?’ Here we see wh-in-situ in the subordinate clauses selected by verbs which are incompatible with indirect questions (*You think what the policeman is doing?), hence the wh-element has scope over the entire sentence containing the main and the subordinate clause. In other words, the overt-movement counterpart of (a), for example, would be (): ()

Où penses -tu/est-ce que tu penses Where think. =you/is=it that you think. que j’habite? that I=live.. ‘Where do you think I live?’

[French]

Similar examples are given from colloquial varieties of French by Starke (: –) and Bobaljik and Wurmbrand (: ). Following Obenauer () and Starke (), Shlonsky (: ) shows that French wh-in-situ is possible in complex DP-islands, i.e. relative clauses, as in (gloss added):30 ()

Vous connaissez des gens qui pourraient You know.. some people who can.. héberger combien de personnes? host. how.many of persons ‘You know people who could host how many people?’ (cf. *How many people do you know people who can host (them)?)

30 There are further complexities concerning the interaction of French wh-in-situ with islands of various kinds, but we leave those aside here; see the discussion and references in Shlonsky (: –).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WH-MOVEMENT

Such examples are normatively dispreferred but, as the child-language evidence in particular shows, they are attested in everyday speech.31 Starting with Lopes-Rossi (: ff.; ), it has been observed that Brazilian Portuguese also shows wh-in-situ: in direct questions, in subordinate complements to verbs like ‘think’ which do not take interrogative complements and, according to Kato (: ), in complex-DP islands (see Box .). This is shown by the following examples from Kato (: , , glosses and translations added): ()

a. A senhora veio fazer o que aqui? [BP] The lady come.. do.Inf the what here ‘What did you (polite feminine address) come to do here?’ b. Você mora em que andar ai no Itaim? You live.. on what floor there in-the Itaim? ‘Which floor do you live on in Itaim?’ c. Maria pensa que Jean comprou o que ? Mary think.. that John buy.. the what ‘What does Mary think John bought?’ d. Maria ama o livro que quem escreveu? Mary like.. the book that who write.. *‘Who does Mary like the book that (he) wrote?’

(Note that the English translations of () and (d), with overt whmovement, lead to strong ungrammaticality with a gap, and clear illformedness with the overt ‘resumptive pronoun’ in the gap position; this is clear evidence of the islandhood of this structure—see again Box ..) Lopes-Rossi (, ) observes that examples of this kind first

31

However, wh-in-situ is generally not possible in indirect questions, or in simple main clauses where there is an overt complementizer que: (i) a. *Pierre a demandé tu as embrassé qui. [CollFr] Pierre have.. ask. you have.. kiss. who. b. Pierre a demandé qui tu as embrassé. Pierre have.. ask. who you have.. kiss. ‘Pierre asked who you kissed.’ c. *Que tu as vu qui?  you have.. see. who d. Qui que tu as Who  you have.. ‘Who have you seen?’

vu? see.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

appear in Brazilian Portuguese in the twentieth century. They are also found in European Portuguese, but are less common (Kato : ). In Spanish too we find examples of wh-in-situ in main clauses, in embedded clauses and inside relative-clause (complex DP) islands (see Dadan : –): ()

a. ¿Y tú invitaste a tu fiesta a quién? [Spanish] And you invited to your party to who ‘And who did you invite to your party?’ b. ¿Y tú te vas a vestir para la fiesta cómo? And you  are-going to dress for the party how ‘And how are you going to dress up for the party?’ c. ¿Te has enamorado del hombre que vive con quién? You have fallen-in-love of-the man that lives with who ‘Who have you fallen in love with the man that lives with?’

So we see that parameter E changed its value between Latin and Romance, ruling out multiple wh-movement in Romance,32 and that parameter E has changed, or is in the process of changing, in contemporary French, Brazilian Portuguese, and Spanish. Mathieu and Sitaridou () show that Classical Greek allowed leftbranch extraction, as in (b) (see Box . on this), while Modern Greek does not, as () shows: ()

a. Tina dynamin echei? [Classical Greek] what... power... have. ‘What power does it have?’ (Plato, Laws, a; Mathieu and Sitaridou : , (a)) b. Tina echei (tina) dynamin. (Plato, Republic, b; Mathieu and Sitaridou : , (b))

32

Modern Romanian has multiple wh-fronting, as examples like the following show: cui ce ziceai [că ia promis] [Romanian] Who to.whom what said. that to-him= has promised ‘Who did you say promised what to whom?’ (Rudin /: , gloss modified)

(i) Cine

Whether this is an innovation due to contact with Slavic languages, e.g. Bulgarian, which have multiple fronting, or an indication that parameter E did not change in this part of the Romance-speaking area is not clear. Gheorghe () shows that the oldest attested stages of Romanian had multiple fronting (thanks to Adam Ledgeway for drawing my attention to this article).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

HEAD-COMPLEMENT ORDER

a. Ti dinami exi? what power... have. ‘what power does he/she/it have?’ b. *Ti exi (ti) dinami? (Mathieu and Sitaridou : , ())

[Modern Greek]

Mathieu and Sitaridou relate the change to the loss of rich agreement on wh-elements in Greek and the reanalysis of ti, the definite article in Modern Greek, from an Adjective to a Determiner, which had the consequence that ti lost its earlier ‘indefinite’ uses. (See Roberts and Roussou : – for details.) We return to the change discussed by Mathieu and Sitaridou in §... We conclude that parameters connected to wh-movement, both E and E, are subject to diachronic change. In the history of Latin and Romance, we can observe changes to both.

1.6. Head-complement order As the final example of parametric variation, I take one of the most pervasive and well-studied instances of cross-linguistic variation: the variation in the linear order of heads and complements. I will concentrate on the aspect of this variation which has been recognized since Lehmann () as the most important: the relative order of verbs and their objects. In fact, we took the variation between verb-object (VO) languages and object-verb (OV) languages as our initial example of a parameter in §.. Here I develop the points made there in more detail. We will also look at an important generalization regarding headcomplement order: the Final-Over-Final Condition (FOFC) and some of its diachronic implications. 1.6.1. Head-complement order synchronically As we saw in §., English and German differ regarding the order of infinitival verbs and their direct objects. This difference is illustrated by the contrasts in (), repeated here: () a. Tomorrow John will visit Mary. b. *Morgen Johann wird besuchen Maria. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

c. Morgen wird Johann Maria besuchen. d. *Tomorrow will John Mary visit. These examples show us that German has OV order in infinitives, while English has VO. Since we have seen that the verb moves to C in main finite clauses in German (see §...), giving rise to verb-second order, we can suppose that the verb occupies its unmoved position in examples like (c). Finite verbs may also occupy this position in clauses where movement to C is not allowed. Again, we saw this in §... in examples such as () and (): ()

Du weißt wohl, [German] You know well a . . . daß ich schon letztes Jahr diesen Roman las. . . . that I already last year this novel read b . . . daß ich schon letztes Jahr diesen Roman gelesen habe. . . . that I already last year this book read have

()

Ich frage mich, I ask myself a . . . ob ich schon letztes . . . if I already last b . . . ob ich schon letztes Jahr . . . if I already last year

[German] Jahr year diesen this

diesen this Roman book

Roman las. book read gelesen habe. read have

Indeed, in our representation of a verb-second clause in () we assumed that the finite verb was merged in a position following the direct object. German thus combines OV and V orders. These two properties are nevertheless independent. We saw in §... that the Scandinavian languages are V but VO, and many OV languages are not V. Examples of OV languages in which all clauses—main and subordinate—are OV include Japanese, Korean, and Turkish, as sentences like () show: ()

a. Sensei-wa Taro-o sikata. teacher- Taro- scolded ‘The teacher scolded Taro.’ b. Kiho-ka saca-li-l cha-ass-ta. keeho- lion- kick-- ‘Keeho kicked the/a lion.’ 

[Japanese]

[Korean]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

c. Ahmet kitab-i oku-du. Ahmet book- read- ‘Ahmet read the book.’

[Turkish]

We can thus formulate the OV/VO parameter as follows: F. Do direct objects precede or follow their verbs in overt order? PRECEDE: German, Dutch, Japanese, Korean, Turkish, Basque . . . FOLLOW: English, Romance, Thai, Zapotec . . . However, as we have already seen, other word-order properties are correlated with the OV/VO parameter. We saw in §. that the relative order of auxiliaries and their associated verbs correlates with OV or VO order, in that VAux order is characteristic of OV languages and AuxV order of VO languages. Controlling once more for the effects of verbsecond order in finite main clauses, we can observe this correlation by comparing English and German. The following examples, again repeated from §., illustrate: () a. John can visit Mary. b. Johann wird Maria besuchen können.

(AuxV) (VAux)

Similarly, Japanese has VAux order, as in: ()

John-ga Mary-to renaisite iru. (VAux) John- Mary-with in-love is ‘John is in love with Mary.’

[Japanese]

Other languages showing the combination of OV and VAux orders include Basque, Burushaski, Chibcha, Hindi, Kannada, Nubian, Quechua, and Turkish. These are languages from Greenberg’s () original thirty-language sample, as summarized in Hawkins (: –). Conversely, in the same sample, the languages which show VO and AuxV order are Finnish, Greek, Italian, Mayan, Norwegian, Serbian, and Swahili. This list excludes the VSO languages Welsh, Zapotec, and Masai, all of which have AuxV; if VSO in all these languages is derived as described for Welsh in §..., then these languages also fit the pattern. Only one language, Guaraní, with VAux and VO, diverges from the pattern, although a number of the languages Greenberg studied lacked a class of auxiliaries according to Greenberg’s definition of what an auxiliary is (see note ) and therefore their status regarding this correlation could not be determined. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Correlations of this kind were first observed by Greenberg (), and have formed essential data for the field of language typology ever since. They are usually known as implicational universals, since they are often stated in the form of logical implications, i.e. ‘if a language has property P, then it has property Q’; we saw an example of an implicational universal in our formulation of Bach’s generalization in () in the previous section. What such statements effectively assert is that, of the four combinations of presence vs. absence of properties P and Q which are in principle available, one is not found: presence of P and absence of Q. Here, for example, is the statement of the correlation between OV and VAux (from Hawkins : ): In languages with dominant order SOV, an inflected auxiliary always follows the main verb.

If we are to adopt the idea that parameters are responsible for variation in grammatical systems, then such implicational universals must be derived either from clustering effects created by a single parameter, or by some theoretical articulation of the relations among the values of distinct parameters. In this respect, the results of work in language typology can be seen as setting a particular research agenda for parameter-based comparative syntax. For example, let us state parameter F as follows:33 F. Do main verbs precede or follow their auxiliaries in overt order? PRECEDE: German, Dutch, Japanese, Korean, Turkish, Basque . . . FOLLOW: English, Romance, Zapotec . . . If, in terms of the clause structure we adopted in §., we continue to take auxiliaries to be members of T,34 then we can conflate F and F as follows:35 33 Thai is in F, but not F. This is because according to Greenberg’s (: ) definition of auxiliaries in terms of possible person/number inflection (‘a closed class of verbs . . . inflected for person and number, . . . in construction with an open class of verbs not inflected for both person and number’), Thai cannot have auxiliaries as it lacks such inflections. A different definition of auxiliaries may yield a different result, although the facts in Thai are rather complex. In any case, Thai probably does not constitute a counterexample to the implicational correlation between F and F. 34 Here I gloss over the distinction made in §... among Tense, Mood, and Aspect heads for ease of exposition. 35 The structure we gave in () for German does not correspond to F, as we indicated T as preceding its VP complement. However, we could assume that VP precedes T in



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

F. Does the structural complement of V/T precede or follow V/T? PRECEDE: German, Dutch, Japanese, Korean, Turkish, Basque . . . FOLLOW: English, Romance, Thai, Zapotec . . . F represents a first step in the direction of cross-categorial harmony as a way of understanding and unifying some of Greenberg’s wordorder universals. This idea originates in Vennemann () and was developed using a variant of the theory of phrase structure assumed here in Hawkins (). A further property which in many languages correlates with F is the relative order of adpositions (i.e. pre- or postpositions) and their complements. In Greenberg’s thirty-language sample, all but three of nineteen VO languages are prepositional, and all eleven OV languages are postpositional. Dryer (b) gives the following figures concerning this correlation in , languages he sampled ( languages were defined as ‘other’, i.e. not straightforwardly falling into one of the four types): ()

OV and Po(stpositions) OV and Pr(epositions) VO and Po VO and Pr

   

More than % () of the languages sampled show consistent orders in this respect, while only fifty-six (just under %) of the languages diverge. This is clearly a significant result, although the divergent languages require further investigation. We could thus formulate parameter F, and unify it with F as F: F. Do objects of adpositions precede or follow their adpositions in overt order? PRECEDE: Japanese, Korean, Turkish, Basque, Amharic . . . FOLLOW: English, Romance, Thai, Zapotec, German, Dutch . . .

German without having to change any fundamental aspect of what was stated there. In fact, as pointed out by Vikner (), if T is final in German, then we may assume that V always moves to T, as the finite verb is always final in non-V clauses in that language. See also the first note to Table .. Dryer (: –) points out that there is no correlation between the relative order of tense/aspect particles and V and that of verb and object; however, as he makes clear in a subsequent section (–), this applies to all cases of tense/aspect markers, including affixes, particles, and auxiliaries. If purely verbal auxiliaries are taken into consideration, then the correlation discussed in the text holds for his much larger sample.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

F. Does the structural complement of V/T/P precede or follow V/T/ P in overt order? F takes us temptingly close to a fully category-neutral statement of word-order variation, which might look like F: F. For all heads H, does the structural complement of a head H precede or follow H in overt order? This is fairly close to Dryer’s (: ) Branching Direction Theory (BDT), which he states as follows:36 Verb patterners are heads and object patterners are fully recursive phrasal dependents, i.e., a pair of elements X and Y will employ the order XY significantly more often among VO languages than among OV languages if and only if Y is a phrasal dependent of X.

Here ‘verb patterners’ and ‘object patterners’ refer to elements which pattern with, respectively, the verb and the object in their relative ordering. Dryer’s BDT is based on a carefully selected sample of  languages from all over the world, and as such has impressive empirical scope. In the World Atlas of Language Structures (WALS henceforth; Haspelmath et al. (), https://wals. info) a total of , languages are surveyed, but not for all the properties given below. In terms of what we have seen so far, F and F present an obvious difficulty concerning German, Dutch, and Latin: these languages apparently pattern one way with regard to F and the other way with regard to F, thereby making the conflation of F and F given in F impossible. According to Dryer (b) there are ‘repeated instances’ of the co-occurrence of OV and Prepositions in Iranian languages, including Persian, Tajik, and Kurdish; VO co-occurs with Postpositions in West African languages, some Finno-Ugric languages, and in

36

F may seem closer to what Dryer () calls the Head-Complement Theory, i.e. the theory that verb patterners are complement-takers while object-patterners are complements. His objections to this theory have to do with the relative ordering of verb and manner adverbs, with noun-genitive order and with relatives. Clearly, none of these cases involves complementation if this concept is related to subcategorization, argument structure, assignment of thematic roles, etc. (See §., for more on these notions.) However, F refers to structural complementation, i.e. the structure [HP H XP], where H is non-recursive and XP is (potentially) recursive. In this sense, F is equivalent to Dryer’s generalization.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

South America. He adds that languages of both types are ‘often not typical OV or VO languages’. Biberauer and Sheehan (: –) distinguish three types of OV languages. Type I OV languages allow just one argument to precede the verb, so where there is both a direct and an indirect object (e.g. with a verb like ‘give’) one of them precedes and one follows the verb. Also, manner adverbs, adpositional and clausal arguments follow the verb. Examples include a range of West African languages, some Papuan languages, Iraqw (a Southern Cushitic language spoken in Tanzania), and Neo-Aramaic. In Type II OV languages, nominal, adjectival, and adpositional arguments precede the verb, but finite clausal complements must follow the verb and certain adverbials either precede or follow. Examples of languages of this type include Dutch, German, Turkish, Persian, and Hindi. Type III OV languages are those generally taken to be fully harmonic, such as Japanese, Korean, Malayalam (on which see below), and other Dravidian languages. In languages of this kind, all arguments and adverbials precede the verb. Biberauer and Sheehan (: , note ) observe that the distinctions among these types can be captured by assuming differing degrees of departure from a general VO template, i.e. just inverting V and O (Type I), allowing or requiring a range of arguments, especially those with relatively little internal structure of their own (see §..), to precede V, and requiring all arguments and modifiers to precede V; see §.. and Roberts (a: ff.) for more on this idea. The expected cross-categorial harmony is not found in cases like German, Dutch, and Latin, then. To deal with this difficulty, we have several options. The first and worst option is to retract any attempt at cross-categorial generalization and revert to the position that the relative order of complement and head is to be restated for each category of head. This is a very weak theory, which would effectively make F—and similar correlations to be discussed below—appear to be accidental. It also goes against Dryer’s conclusions, as enshrined in his BDT. An intermediate approach might group heads into subclasses for the purposes of the statement of cross-categorial wordorder correlations. We could, for example, consider F to be the subcase of F which refers to clausal heads, those which make up the ‘core’ of the clause, TP. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

It is here that one of the central ideas in what follows comes into play. In essence, in order to capture both the fact that there are exceptions to F and the fact that cross-categorial harmony is preferred, we need to be able both to state the parameter in categoryneutral terms as in F and to weaken it to specific categories as necessary, along the lines of F–F. This can be achieved if we assume (i) that all parameters are properties of individual heads, in line with the Borer-Conjecture given in () above, but (ii) that there are independent factors which can cause groups of heads—over which we can generalize with an appropriately formulated feature system—to act in concert. We must further assume that (iii) there is a preference for the independent factors to act in this way, but that this preference is ‘soft’, i.e. defeasible. If we make these assumptions, and if we can understand more precisely the weighting of the preference, then we can maintain that ‘with overwhelmingly greater than chance frequency’ (in Greenberg’s apposite formulation) the category-neutral parameter in F holds, while at the same time allowing for disharmonic patterns in languages such as German, Dutch, and Latin. The independent factors we have just alluded to are the third set of factors in language design, which we introduced as () in §.. We repeat them here: ()

(i) Feature Economy (FE) (Roberts and Roussou : ): Postulate as few formal features as possible. (ii) Input Generalization (IG) (based on Roberts :  and §.): Maximize available features.

As we mentioned in §., the third-factor conditions combine to form an optimal search/optimization strategy of a minimal kind; Biberauer () unifies them as Maximize Minimal Means, we will come back to this in §.. Roberts (a) argues that these third factors interact with the other two factors to give rise to parametric variation, expressed in the form of parameter hierarchies. In effect, F–F do form a hierarchy, with F as the superordinate option, F as the second option, and F–F as subsidiary options (see §.. () for a statement of the hierarchy).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

Let us finally consider the whole range of word-order correlations established by Dryer. These are as follows (see his table , ):37 ()

a. b. c. d. e. f. g. h. i. j. k.

V want Copular Aux Neg-Aux Comp Q-marker Adverb V Article Plural marker l. Noun m. Noun n. Adjective

PP infinitive Predicate VP VP Sentence Sentence Sentence Manner Adverb Noun Noun

slept [on the floor] wants [to see Mary] is [a teacher] has [eaten dinner] don’t [know French] that [John is sick] if [John is sick] because [John is sick] ran [slowly] the [man] PL [man] (= ‘men’)

Relative Clause movies [that we saw] Genitive father [of John] Standard of comparison taller [than John]

All of the correlations in () fall under F, given the assumptions we have made about clause structure, and other fairly plausible assumptions regarding the structure of nominals and the structure of complex adjectival constructions. (a,b) both involve the relative order of verbs and their complements, PP in (a) and one of CP, TP, or VP in (b). (c–e) all plausibly involve the order of T and its structural complement, VP in (d, e) and either a special PredicateP (see Bowers , ) or a variety of different phrasal complements in (c). (f,g) clearly involve the relative order of C and TP, and (h) may too; alternatively, (h) involves P with a CP complement (this depends on the exact status of adverbial subordinators as P or C). (i) is slightly more surprising; however, manner adverbials may be taken to occupy complement positions inside VP, as long as these positions are dissociated from the requirements lexically imposed by 37

Dryer also shows that the relative order of verb and subject is predicted by the relative order of verb and object. He comments, ‘the proportion of the genera containing SV languages is higher among OV languages than it is among VO languages, largely because of the extreme rarity of OVS languages’ (). For the reason given, it may not be correct to subsume the correlation under the same principle as the others, i.e. under F in the terms adopted here. (See Dryer :  for relevant discussion.) I will leave this correlation aside; it is not clear how to capture it in terms of the assumptions about phrase structure and grammatical functions being made here.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

heads (pace the theory proposed in Chomsky (: chapter ), for example). The correlation in (j) can be understood in terms of the DPhypothesis (which we briefly mentioned in the Introduction), according to which articles are members of the category D, which has an NP complement. Similarly, (k) may reflect an elaboration of that hypothesis which claims that D takes NumberP as its complement, and NP is the structural complement of Number. (We will see this idea again in a different context in §..) (l) is consistent with the idea that the relative clause (a CP) is the structural complement either of the head noun, or of the D which introduces the entire structure (see Kayne , Bianchi , Cinque  on the latter idea). (m) is straightforward for some instances of genitives (for example, the one given as illustration; here of John is uncontroversially the complement of father), but not for others (for example, Caesar in Caesar’s destruction of the village). Finally, (n) can be maintained if the ‘standard of comparison’ is taken to be the (CP) complement either of the comparative/superlative morpheme or of than. (Bhatt and Pancheva  give a recent version of this analysis, which has its origins in Chomsky : ff.) All of the above points raise analytical questions, but they are compatible with the idea that a parameter like F may underlie Dryer’s results. English shows a very consistent VO pattern, except that genitives may precede their nouns (as in John’s father); in French and other Romance languages, where this kind of genitive is not available, the VO pattern is completely consistent with the VO value of F. We can illustrate a consistent OV pattern with the Dravidian language Malayalam. To the extent that the phenomena listed in () are found, Malayalam instantiates a consistent OV pattern, as shown in () (all examples are from Asher and Kumari ; capitals indicate retroflex consonants): ()

a. mee ∫ayuTe meele/miite vecciTTiTuNTə table- on put-- (PP V) ‘put on the table’ (Asher and Kumari : )



[Malayalam]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

b. vi ∫vasikka vayya (infinitive V)38 believe- cannot ‘cannot believe’ c. Ii kuTTi nallavan aaNə (Predicate Copula) this child good- be- ‘The child is good.’ (Asher and Kumari : ) d. Nii atə ceyyaNam. (VP Aux)39 you it do- ‘You must/should do it.’ (Asher and Kumari : ) e. Nii atə ceyyaNTa. (VP Neg-Aux) you it do-- ‘You need not do it.’ (Asher and Kumari : ) f. Ellaarum etti ennə kuTTi paraňňu. (Sentence C) all arrive-Past 40 child say- ‘The child said that all had arrived.’ (Asher and Kumari : ) g. Avan vannoo? (Sentence Q) he come-- ‘Did he come?’ (Asher and Kumari : ) h. Ňaan paraňňa poole avan (Sentence Adverb) I say-- like he pravartticcu. act- ‘He acted in the way I told him.’ (Asher and Kumari : )

An example with ‘want’ was unavailable, but it is well-known that in many languages the complement of ‘can / be able to’ is syntactically very similar to that of ‘want’, and so this example may suffice. 39 Malayalam being an agglutinating language, it is unclear whether the relevant morphemes in this example or the next are endings or auxiliaries. It is nevertheless clear that they follow the VP, and so they may illustrate the relevant property. 40 ‘Quotative Particle’: rather clearly a kind of complementizer, to judge by its distribution (Asher and Kumari : ff.). 38



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

i. AvaL bhamgiyaayi prasamgiccu. (Manner-Adv V) she beauty- speak- ‘She spoke beautifully.’ (Asher and Kumari : ) j. There is no separate category of articles. (Noun Article) (Asher and Kumari : ) k. Number is marked by a suffix. (Noun Plural) (Asher and Kumari : ) l. naaLe naTakkunna ulsavam (RelCl Noun)41 tomorrow take-place-- festival ‘the festival that takes place tomorrow’ (Asher and Kumari : ) m. aa kuTTiyuTe peena (Genitive Noun) that child- pen ‘that child’s pen’ (Asher and Kumari : ) n. Raaman kᴙşNane kaaLum miTukan (Standard Adj) Raman Krishnan- than clever aaNə. be- ‘Raman is cleverer than Krishnan.’ (Asher and Kumari : ) Thus Malayalam represents the ‘precede’ value of F, while English (with the proviso for genitives mentioned) and the Romance languages represent the ‘follow’ value. There is, however, a problem with F. Dryer shows that a minority of the languages in his sample actually conform to the BDT on all points. The majority of languages diverge at least in some respects (the commonest divergence being N-Rel order in OV languages). Thus, if F is a single parameter, it predicts a spectacular clustering of properties, which is not actually attested in the majority of languages; see also

41 There is another relative-clause construction involving wh-movement, in which the head noun appears to precede the clause—see Asher and Kumari (: ). However, the participial construction illustrated here is more common.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

Biberauer and Sheehan (), Cinque (). In order to solve this problem, we need to be clearer about how the linear orders relevant to F are produced. F can only be a true parameter if it is associated with a unitary grammatical operation, for example, whatever operation determines which branch of a binary-branching pair is the recursive one. (Such a formulation would be the closest parametric analogue to Dryer’s BDT.) However, it may be that linear order is the result of the interaction of various grammatical operations. If this were the case, then we may be able to regard a more abstract version of F as a ‘pure’ word-order parameter, and then appeal to other operations which are independently parameterized in order to capture the many divergences from the two ‘pure’ patterns given by F. The preference for ‘harmonic’ ordering may thus derive from an overriding tendency for independent parameters to conspire to produce a certain type of grammar; here again, the third-factor principles of Feature Economy and Input Generalization given in (i) and (ii) are relevant. We will return to this point in §. and §.. In this section, I have outlined the idea that systematic crosslinguistic variation in head-complement orders exists, at least across significant subsets of heads. I have attempted no theoretical explanation of this, beyond simply observing that the tendency for cross-categorial harmony, in essentially the sense introduced by Hawkins (), and developed by Dryer (), alongside the widespread existence of disharmonic order of various kinds, is a promising area for the application of the idea of parameter hierarchies. 1.6.2. Head-complement order diachronically In this section, I will focus simply on OV and VO orders, limiting my attention to the restricted cases F and F discussed in the previous section. We already made the basic observation in the Introduction: Old English showed OV word order in embedded clauses. I mentioned this in the discussion of verb second in §..., and I repeat example () from that discussion here:



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

a. . . . þæt ic þas boc of Ledenum gereorde to [OE] . . . that I this book from Latin language to Engliscre spræce awende. English tongue translate ‘ . . . that I translate this book from the Latin language to the English tongue’ (AHTh, I, pref, ; van Kemenade : ) b. . . . þæt he his stefne up ahof. . . . that he his voice up raised ‘ . . . that he raised up his voice.’ (Bede .; Pintzuk : ) c. . . . forþon of Breotone nædran on scippe lædde . . . because from Britain adders on ships brought wæron. were ‘ . . . because vipers were brought on ships from Britain.’ (Bede .–; Pintzuk : )

We can thus observe that OE had OV order in subordinate clauses. We saw in §... that OE had verb-second order in main clauses, and so we do not expect to find overt OV order in such clauses (unless of course the object is fronted to first position). An objectinitial verb-second clause in OE like () can thus be treated as having a structure just like that proposed for such clauses in German in ():42 ()

Maran cyððe habbað englas to Gode þonne men. [OE] more affinity have angels to God than men ‘Angels have more affinity to God than people.’ (AHTh, I, ; van Kemenade : )

42 Here I have placed the indirect object to Gode in a position adjoined to VP. This is an approximation of the standard analysis of indirect objects (on which see in particular Radford : ch. ; Hornstein et al. : ff.; and §.), and should not be interpreted as indicating a formal parallelism with the adverbial phrase schon letztes Jahr in (). I have omitted þonne men from the representation as it obscures the parallel with (), and it is rather unclear where this constituent (which is most likely a CP containing a great deal of elided material) should attach.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

HEAD-COMPLEMENT ORDER

CP

Cʹ

DP maran cyððe C habbað

DP englas

TP

Tʹ

T (habbað)

VP

PP to Gode

VP

DP (englas)

V

V DP (maran cyððe) (habbað) The position of the auxiliary in (c) indicates a further parallel with German. Here we see the finite auxiliary in a subordinate clause following the non-finite verb. (See note  on one possibility for analysing this order in German, which would carry over to OE.) So we conclude that parameter F had the value ‘precede’ in OE, and that it has therefore changed in the subsequent history of English. While the situation concerning F is reasonably clear, that regarding F is not. Hawkins (: ) states that OE is prepositional, with Article-Noun, Genitive-Noun, and Noun-Relative order. To this we can add that the complementizer þæt always precedes the sentence it introduces. So OE is clearly mixed as far as F is concerned; here the points 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

regarding the role of the third factors raised in relation to German, Dutch, and Latin in the previous section come into play once more. Although broadly similar to Modern German, OE word order was rather freer, an observation which has given rise to much discussion. In particular, non-pronominal direct objects could follow the non-finite verb as well as precede it: ()

a. þæt hi urum godum geoffrian magon that they our gods offer may ðancwurðe onsægednysse grateful sacrifice ‘that they may offer a grateful sacrifice to our gods’ (ÆCHom I, ..; Fischer et al. : )

[OE]

b. ðe is genemned on Læden Pastoralis, & which is named in Latin Pastoralis and on Englisc Hierdeboc in English Shepherd’s-Book ‘which in Latin is called Pastoralis and in English Shepherds’ Book’ (CPLetWærf ; Fischer et al. : ) In (a), the auxiliary follows the main verb, while in (b) the auxiliary precedes the main verb. We also find examples where the auxiliary precedes the non-finite verb (in contexts which are clearly not those of the general verb-second rule—see §...) but where the object precedes both verbal elements, as in: ()

a. se ðe nan ðing nele on ðissum life [OE] he who no thing not-wants in this life ðrowian suffer ‘he who will suffer nothing in this life’ (ÆCHom I, ..; Fischer et al. : ) b. Gif he ðonne ðæt wif wille forsacan if he then the woman wish refuse ‘if he then wishes to refuse the woman’ (CP ..; Fischer et al. : )

This variation in word order presents a further challenge to the postulation of a parameter like F, which we will come back to in §.. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

Nevertheless, it is fairly clear that the OV pattern is the statistically predominant one in OE. See Pintzuk () for discussion of this and a number of related points, in particular regarding the relation between surface OV or VO order and other variant properties, such as the position of particles, etc.; Pintzuk interprets the variation in OE word order as evidence that there is more than one grammar underlying the corpus. See also Biberauer and Roberts (); Taylor and van der Wurff (); Pintzuk and Taylor (); Taylor and Pintzuk (); and §... Dating the change in parameter F in English is rather difficult. On the one hand, there is evidence for VO and AuxV order in OE, as we have already seen. On the other hand, superficial OV order can be found in the fifteenth century and even, in the relevant context and the right register, as late as the seventeenth century. Once again, this points perhaps to a non-unitary parameter. I reserve judgement on this question here; we can simply observe that the change took place during the ME period (but see also §..). The general question of the gradualness or otherwise of parameter changes will be discussed in more detail in §.. The same kind of word-order change can be observed in the development from Latin to Romance. Classical Latin word order, although rather free, is tendentially OV. See Harris (: ff.); Vincent (: ff.); Ernout and Thomas (: ); Salvi (); Devine and Stephens (); Ledgeway (); Danckaert (: –, ). The following examples illustrate: ()

a. Caesar exercitum reduxit. [Latin] Caesar. army. led-back.. ‘Caesar led back his army.’ (Caesar, De Bello Gallico ..; Danckaert : , Ledgeway : ) b. cum populus regem iussisset when people. king. order.. ‘when the people have chosen a king.’ (Livy, ab Urbe condita ..; Danckaert : )

In Classical Latin, it was also usual for auxiliaries to follow main verbs. This is true to the extent that a class of auxiliaries can be discerned; the clearest case involves esse (‘be’) in the perfect passive or perfect of deponent verbs, as in (): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

()

a. Orgetorix mortuus est [Latin] Orgetorix die.. be.. ‘Orgetorix died.’ (Caesar, De Bello Gallico .; Devine and Stephens :  (gloss added)) b. subitum bellum in Gallia coortum est sudden. war. in Gaul break-out.. be.. ‘A sudden war broke out in Gaul’ (Caesar, De Bello Gallico .; Devine and Stephens :  (gloss added))

So here we see evidence for the ‘precede’ value of parameter F, as in OE (Danckaert : –, shows that there is a major difference between the development of ‘BE-periphrases’ of the kind seen in () and complements to modals in Late Latin). And again, it is clear that the Modern Romance languages have the ‘follow’ value, just like Modern English, as the following French translations of (a) and (a) show: ()

a. César Caesar b. Orgetorix Orgetorix

ramena son armée. led back his army. est mort. died.

[French]

It is not entirely clear when this change took place in the history of Latin/Romance. Ledgeway (: ), citing Adams (), says that ‘this change in directionality had already been largely completed as early as the Classical Latin period, but that its effects were in many cases masked by the deliberately archaizing patterns of the literary language’. This point is substantiated in Adams (: –), who says that by the time of Plautus (c.– ) ‘in spoken Latin of the informal varieties VO was already established as the unmarked order, but . . . OV was preferred in literary Latin as the unmarked pattern’. See also the quantitive evidence in Ledgeway (: , table .; , table .). As with OE, the other features of Latin are mainly VO, as first argued by Adams (: –; see also Harris : ff., Hawkins : , Clackson and Horrocks : –, Ledgeway : –, : –, and references given there). Classical Latin was mainly prepositional (Ledgeway : –), had Noun-Relative order, and showed either order of Noun and Genitive, as shown clearly by the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

statistical evidence in Ledgeway (: , table .); the order of Genitive and Noun was not free, however, it was conditioned by various pragmatic and grammatical factors (Devine and Stephens : ch. , Ledgeway : –). Moreover, it is clear that complementizers and question words such as ut and ne precede their clauses (Ledgeway : ): ()

a. ad Romam (Pr) [Latin] to Rome b. Germani qui trans Rhenum incolunt (N-Rel)43 Germans who across Rhine lived ‘Germans who lived across the Rhine’ (Caesar, Bello Gallico I, , ; Ernout and Thomas : ) c. liber Petri (N-Gen) book of-Peter ‘Peter’s book’ d. Ubii Caesarem orant ut sibi (C-Sentence) Ubii Caesar beg that them(selves) parcat. he-spare ‘The Ubii beg Caesar to spare them.’ (Vincent : )

Comparatives also showed both an archaic head-final pattern, where the standard precedes the adjective (see (n), (n)) and a headinitial one where the adjective precedes the standard: ()

a. nil hoc homine audacius nothing this. man. braver ‘none is braver than this man’ (Plautus, Menaechmi ; Ledgeway : )

[Latin]

43 Head-initial relatives replaced an earlier, archaic correlative construction, probably inherited from Proto Indo-European. Ledgeway (: ) gives examples like the following: (i) cui testimonium defuerit, is obuagulatum ito [Archaic Latin] Who-. witness. lack... he wail. will-go-. ‘He whose witness has failed to appear may summon him by loud calls’ (Leges XII Tabulorum ; Ledgeway : )

Ledgeway comments: ‘it is evident that this archaic pattern constituted a stylistically and pragmatically marked structure, which by early Latin (e.g. Plautus) had already long yielded to the head-initial structure’.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

b. hominem callidiorem uidi neminem/ quam man. cleverer. I.saw nobody. than Phormionem Phormio. ‘I never saw a more cunning fellow than this Phormio’ (Terence, Phormio –; Ledgeway : ) So it appears that F may have changed in early Latin, but with OV orders being retained in literary Classical Latin, giving the impression that F had not changed. To quote Ledgeway once more: since at least the early Latin period the arrangement of head and dependent already betrays an unmistakeable shift from an original head-final order, still largely discernable in early Latin and the earliest attestations of the other Italic varieties, towards an increasingly unmarked head-initial order across a broad range of categories and constructions. (Ledgeway : )

He goes on to describe the OV order of many classical texts as a ‘deliberate literary affectation’. Danckaert (: –) provides more detailed corpus evidence on verb-object orders, whose initial results indicate that VO is present from the earliest periods and that there is little overall change over the period from early to Late Latin. He goes on to argue at length that Latin at all periods had several object positions, and that any analysis of verb-object order alone, without taking into account the syntactic context, in particular cases where the VO constituent is in the complement of a modal, is inevitably misleading owing to the degree of structural ambiguity at play. In essence, he shows that parameters F and F cannot be investigated separately, but at the minimum a parameter such as F is required in order to fully understand the diachronic data. We can infer or observe a similar OV-to-VO change elsewhere in Indo-European. Rögnvaldsson () and Hróarsdóttir (, , ) show in detail that Old Icelandic was OV. The relevant examples for parameter F are:44

Both Rögnvaldsson () and Hróarsdóttir (, , ) use the term ‘Old Icelandic’ for the language which is frequently referred to as ‘Old Norse’. See Faarlund (b: ) and Rögnvaldsson (: ) for a clarification of the terms for the older stages of the Scandinavian languages. 44



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

HEAD-COMPLEMENT ORDER

a. so þorsteinn skyldi lífinu tapa so Thorstein should life-the lose ‘so that Thorsteinn should die’ b. þú munt fret hafa, að . . . you will heard have, that . . . ‘you will have heard that . . . ’ (Hróarsdóttir : )

[Old Icelandic]

Indeed, OV order is usually assumed for early Germanic generally (see the discussion in Hawkins : ff. and the references given there). Taylor (, ) argues that Greek changed from OV in the Homeric period to VO in the Classical period. Early Old Irish shows some OV orders; see Bergin (–); Thurneysen (: ); Russell (: ff.); Doherty (a,b). Continental Celtic also shows SOV order (Russell : ff.; Eska : –, ), although Gaulish may have been SVO while the more conservative Lepontic was SOV; it appears that in Gaulish, as in Latin and OE, the orders are mixed in that Noun-Genitive and prepositional orders are found. On the other hand, the modern Celtic languages are all VO, in fact VSO. It seems, then, that changes in parameter F, and probably F, are widespread, at least in the history of several branches of IndoEuropean, in particular in the Indo-European languages of Western Europe (Bauer : ). Indeed, Lehmann (: ) cites Delbrück (–) for proposing ‘OV order for the early dialects of Indo-European and the parent language as well’. Fortson (: , : ) says that ‘[i]t is almost universally asserted that most of the ancient IE languages were verb-final, and that PIE [Proto-IndoEuropean—IGR] was as well’ (although he goes on to point out that this claim ‘needs tighter formulation and convincing demonstration’); see also Hoenigswald, Woodard, and Clackson (: ). If this is correct, then all present-day Indo-European languages with VO order (English, North Germanic, Romance, Celtic, Greek, Slavonic, etc.) must have undergone a change in parameter F, at least. Lehmann (: –) also cites evidence that Proto Indo-European may have been postpositional and had Standard-Adjective and Relative-Noun orders. This suggests a more far-reaching OV typology than can be ascertained for OE or Latin and, concomitantly, the possibility of changes in F–F. A great deal has been said about



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Indo-European syntax,45 and we will return to this topic in §., when we discuss syntactic reconstruction. The OV-to-VO change is not, however, restricted to Indo-European: Kiparsky (: ) observes that ‘Western Finno-Ugric languages, including Finnish’ have undergone a change from OV to VO. The above observations show that, despite certain difficulties in interpreting the data and despite the problematic nature of parameter F, there is good evidence that parameter F has changed in English, Icelandic, and Romance, and reason to think that these parameters may have also changed in other branches of Indo-European and elsewhere. 1.6.3. The Final-Over-Final Condition Biberauer, Holmberg, and Roberts () and Sheehan, Biberauer, Roberts, and Holmberg () observe an important cross-linguistic generalization about word order, which they call the The Final Over Final Condition (FOFC). FOFC can be stated informally as follows: ()

A head-final phrase βP cannot dominate a head-initial phrase αP where α and β are heads in the same Extended Projection.

The notion of Extended Projection is taken from Grimshaw (, ); it corresponds to the traditional notion of clause or nominal, i.e. the combination of the lexical categories VP or NP and all their respective associated functional categories up to CP in the case of the clause and DP (or PP) in the case of the nominal. We can see the import of FOFC if we look at the following abstract structures: ()

a. b. c. d.

[βP [βP [βP *[βP

β [αP α γP]] [αP γP α] β] β [αP γP α]] [αP α γP] β]

45 The older Indo-European languages conform to a general type in showing non-rigid head-final order, second-position effects, a very active left periphery, sub-extraction from DP, null subjects and objects, synthetic verbal morphology and case inflections. On Latin, see Ledgeway (), Danckaert (, ), Devine and Stephens (), Salvi (); on Greek, Taylor (, ); on Sanskrit, Hale (, ), Kiparsky (); on Old Church Slavonic, Večerka (); on Celtic, Watkins (, ), Russell (: –), Newton (), Eska (, ); on Germanic, Walkden (: –), Ringe (: ); on Gothic, Miller (); on Old Iranian, Skjærvø (: f.); and on Anatolian, Garrett ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

(a) is a harmonically head-initial structure, typical of a system where F is set to ‘follow’. (b) is a harmonically head-final structure, with F set to ‘precede’. (c) is a disharmonic system where β is set to ‘follow’ and α to ‘precede’; this is what we observed in Latin and German where βP corresponds to CP and αP to TP (i.e. F is set to ‘precede’, while the relevant parameter for CP is set such that C’s complement follows C), giving rise to the following structure in CP: ()

[CP . . . C [TP VP T]]

We observed this structure in examples like () and () in §... and the Latin example in (d) in the previous section. Most importantly, though, FOFC asserts that the disharmonic order in (d) is not found. In other words, the mirror image of the German or Latin structure in () is claimed not to exist. FOFC is thus a constraint on disharmonic orders; it effectively asserts that certain combinations of settings of F–F are not found. Consider for example the combination of setting F to ‘follow’ and F to ‘precede’. Assuming auxiliaries are in T, this combination of settings gives rise to the structure in (): ()

[TP [VP V O] [T Aux]]

() instantiates the abstract structure in (d), where α is V, β is T, and γ is O. On their own, the differing values of F and F can combine to give rise to the structure in (), but FOFC asserts that this is not found. So FOFC predicts that the order V>O>Aux is not found in the world’s languages. In addition to ruling out V>O>Aux order, and Aux>VP>C order (the mirror image of ()), FOFC explains, or is part of the explanation of, a range of cross-linguistic generalizations; see Biberauer, Holmberg, and Roberts (: ) and Biberauer, Holmberg, Roberts, and Sheehan () for details, examples, and references. The most important synchronic generalizations are the following: ()

a. The higher a head is in an Extended Projection, the less likely it is to be final. b. In OV languages which have embedded finite clauses with an initial complementizer the embedded clause is always extraposed (e.g. German).

The generalization in (a) is supported by the fact that OV order is more common cross-linguistically than clause-final complementizers; 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

this follows from the fact that final Cs are not found in VO languages, but initial Cs are found in both VO and OV languages (in other words, FOFC rules out the mirror image of (), as we saw above). The generalization in (b) refers to the following structure, where V is final in VP in C is initial in the complement CP: ()

[VP [CP C TP] V]

Here again, we have a structure which instantiates (d). German is a language of this kind. It is a well-known fact about German syntax that finite CP complements are always postverbal, as the following example illustrates:46 ()

Er weißt, daß sie kommen. He know.. that they come.. ‘He knows they’re coming.’

[German]

Many other languages have the combination of V-final VPs and C-initial subordinate clauses, and all have a postposing rule comparable to the German one. Moreover, some languages, for example Turkish and Hindi/Urdu, have both initial and final complementizers; the postposing rule only applies to CPs with initial complementizers (see Biberauer, Holmberg, Roberts, and Sheehan : – for examples and references). Most importantly for our present purposes, FOFC makes strong predictions about diachronic change. On the one hand, it predicts that change from head-final to head-initial starts at the top of the Extended Projection; for example, change from OV to VO is preceded by change from VP-T to T-VP, which is preceded by TP-C to C-TP. Conversely, change from head-initial to head-final starts at the bottom of the Extended Projection, beginning in VP and affecting the order of C and TP last.

46

German allows D-initial DPs and P-initial PPs to precede V in a V-final VP: hat [VP [DP ein Buch ] gelesen.] [German] He has a book read ‘He has read a book.’ nach Berlin ] gefahren.] b. Sie ist [VP [PP She is to Berlin driven ‘She went to Berlin.’

(i) a. Er

In examples like this, the notion of Extended Projection becomes relevant: DP and PP are distinct Extended Projections from VP, and hence FOFC does not apply to the V-D and VP relations.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

HEAD-COMPLEMENT ORDER

Biberauer, Sheehan, and Newton (: –) give evidence from word-order change in the history of English that the FOFC predictions are correct. In OE, as we have seen, we find evidence for verb-final VPs and, assuming auxiliaries are in T, T-final TPs; see (c) in the previous section. We also find Aux>O>V and Aux>V>O orders, as seen in () and () above. On the other hand, the FOFC-violating order V>O>Aux is not found in OE. As mentioned in the previous section, we will return to the question of variation in OE word order in §.. Moreover, Biberauer, Sheehan, and Newton point out, following Pintzuk (), the change from VP>Aux to Aux>VP order was completed by Early ME, while the OV-to-VO change was completed by only Late Middle English. Here is an example of OV order from Chaucer (late fourteenth century): ()

I may my persone and myn hous so kepen and [ME] deffenden. ‘I can keep and defend myself and my house in such a way.’ (Chaucer, Melibee ; Fischer et al. : )

Foster and van der Wurff () give the following ratios of VO to OV orders in poetry at fifty-year intervals in late ME:  (),  (),  (); in prose , , and . It seems, then, the order in TP changed before that in VP. If the changes had happened in the opposite order, we would find examples of V>O>Aux (head-initial VP combined with head-final TP) at some change in the history of English, which we do not. FOFC predicts that the changes took place in this order (of course, FOFC would also allow the two changes to take place simultaneously; see §. for discussion of gradualness in syntactic change). Similarly, Taylor and Pintzuk (: –) argue that the change from head-final to head-initial TP started earlier in the history of English than the change from head-final to head-initial VP. As they point out, FOFC dictates that the change in TP is a precondition for the change in VP, but that this does not mean that the first change had to be completed before the second change began. In fact, although the time spans of the two changes overlap almost completely, there is some evidence that the variation in headedness in the TP starts earlier than the variation in headedness in the VP. Two of the oldest OE texts, Beowulf and Laws of Æthelbert, were uniformly OV but, at least in the case of Beowulf, with variation in the headedness of TP (Pintzuk and Kroch ; Pintzuk ). As Taylor and Pintzuk () point out, 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

‘[b]ased on the Germanic ancestors of OE and on the diachronic trends, it is reasonable to assume that the earliest stage of OE was uniformly head-final in both the VP and the TP’; see also their table ., p. . Ledgeway (: –) shows that word-order change from headfinal to head-initial in Latin follows a pattern consistent with FOFC. Concerning clausal complements, he observes that Latin retained an archaic Accusative-cum-Infinitive (AcI) construction, inherited from Proto Indo-European, alongside an innovative construction with finite clauses introduced by the complementizers ut, quod, or quia (in §. we will look more closely at changes in clausal complementation between Latin and Romance). He argues that the AcI complements feature a silent final complementizer and goes on to observe that these complements regularly precede the main-clause verb, while, as in German, C-initial clauses follow the main verb. So we see contrasts such as the following (I have made two minor changes to the category labels here): ()

a. [CP [TP tacitum te dicere] Ø] credo [Latin] silent. you. say. I.believe ‘I fancy you say to yourself.’ (Martial, ..; Ledgeway : ) b. scis enim [CP quod [TP epulum dedi ]] you.know for that feast. I.gave ‘for you remember that I gave a public banquet once’ (Petronius, Satyricon .; Ledgeway : )

Ledgeway goes on to show that in Classical authors (Cicero, Caesar, and Livy), Christian authors, and in Medieval Latin, the C-initial clauses overwhelmingly appear postverbally, while the AcI clauses tend to be preverbal in Classical Latin and appear equally pre- or postverbally in the later periods (see Ledgeway : –, tables ., ., and .). Ledgeway takes PP to be part of the nominal Extended Projection (this is consistent with the observation that P-initial PPs can precede V in a V-final PP in German; see note ). In this connection, he observes that ‘original head-final adpositional structures . . . have been replaced by head-initial adpositional structures’ (Ledgeway : ). He goes on to propose that bare-case nominals feature a silent postposition, contrasting the following examples: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

HEAD-COMPLEMENT ORDER

a. Pompeius [ . . . ] profiscitur [PP [ Canusium ] Ø ] [Latin] Pompey. sets.out Canusium. ‘Pompey . . . sets out for Canusium.’ (Caesar, De Bello Ciuili ..; Ledgeway : ) b. miles [PP ad [ Capuam ]] profectus sum soldier. to Capua. set.out I.am ‘I set out as a soldier for Capua.’ (Cicero, De senectute ; Ledgeway : )

Ledgeway also points out that head-initial PPs have been observed to tend to appear postnominally, while bare case-marked nominals (by hypothesis, containing a null postposition), appear both pre- and postnominally. So PPs behave rather similarly to CPs, and this behaviour can be explained by FOFC. Ledgeway then provides a general account of the word-order changes in Latin and on to Romance based on ‘a principle of crosscategorial harmonization, such that once head-initiality becomes established in the topmost CP and PP layers, it is then free to percolate down harmonically to the phrases that these in turn embed’ (Ledgeway : ). He gives several examples of FOFC-determined asymmetries in word order, e.g. involving preposition and gerundive complements where the gerund in turn has a nominal complement (see his table ., p. ; Roberts a: ). Concerning the relative orders of auxiliaries, verbs and objects, he suggests that there is one true Latin auxiliary, esse (‘be’), found in the perfective forms of passive and deponent verbs (see the discussion of () in the previous section). The unmarked order in these constructions is O>Aux>V (although ‘O’ may not be in the VP-internal object position in passives; object-tosubject movement in passives was optional in Latin, as it still is in Modern Spanish and Italian). However, esse was a phonologically weak element which tended to appear enclitic to an initial stressed element, particularly in main clauses. We thus find the two orders seen in (): ()

a. Cur armatorum corona senatus [Latin] Why armed.. crown. senate. saeptus est surrounded. is ‘Why has the senate been surrounded with a belt of armed men?’ (Cicero, Orationes Philippicae .; Ledgeway : ) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

b. IS est hodie locus saeptus that. =is today place. guarded. ‘THAT site is guarded to this day’ (Cicero, De Diuinatione .; Ledgeway : ) Ledgeway goes on to point out that, once the enclisis operation became a syntactic rule it was reinterpreted as a rule raising T to C in main clauses, a precursor of the general V rule that came to be prevalent in Medieval Romance, as we saw in §.... However, the FOFC-violating order V>O>Aux is found in Latin, as pointed out by Biberauer, Roberts, and Holmberg (: ): ()

damnetur is qui [Latin] condemn.... that.. who.. fabricatus gladium est manufactured.. sword. be.. ‘he should be condemned, who manufactured the sword.’ (Cicero, pro Rabirio ; Danckaert : )

Following Remberger (), Biberauer, Holmberg, and Roberts suggest that these participles are nominal in Latin, and hence form a distinct Extended Projection from the auxiliary esse. As such they are not counterexamples to FOFC. Danckaert (: ff.) takes issue with this conclusion, and argues instead that an earlier period of Latin including Classical Latin indeed had these orders, but the structure was not that seen in () and so did not violate FOFC. In Later Latin, the string was reanalysed in such a way that the underlying structure did violate FOFC and as a consequence this order disappeared. In essence, then, FOFC requires head-final to head-initial wordorder change, of the kind seen in the history of English and of Latin/ Romance, to take place in ‘top-down’ order within Extended Projections. We have evidence that this was the case in these languages. Of course, the prediction is that we should find the same order of changes in the other Indo-European languages that appear to have undergone this change mentioned in the previous section (North Germanic, Celtic, Greek, and Slavonic), as well as in Western Finno-Ugric. FOFC thus makes a rich array of diachronic predictions. FOFC also makes predictions about the change from head-initial to head-final order; we return to this in §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

SUMMARY

1.7. Summary Here I will summarize the main conclusions regarding the diachronic changes in parameter settings that we have seen in this chapter. The goal of this chapter has been to demonstrate that the notion of parameter, as it has been construed in work on comparative syntax since Chomsky (), can play a useful role in describing syntactic variation in the diachronic dimension, just as it can in the synchronic domain. In other words, parameter change may be, at the very least, a useful analytical tool for diachronic syntax. Here are the changes we have seen: A. Null-subject parameters: A. Are null arguments allowed in the absence of agreement marking? A. Are null subjects allowed without person restrictions and with associated ‘rich’ verbal-agreement inflection? A. Do null subjects show person restrictions and allow indefinite interpretations in the sg? Parameter A changed value in French c.seventeenth century, also in Northern Italian dialects at about the same time, and presumably in prehistoric OE; in OHG it changed c.. Some varieties of Brazilian Portuguese have changed from A to A. (See Duarte , , ; the papers in Kato and Negrão ; Barbosa, Duarte, and Kato ; Kato and Duarte ; §.., and the references given there.) The same change has happened in Welsh (Tallerman .) B. B. B. B.

B. Verb-movement parameters (non-V): Does V move out of VP in finite clauses? Does V move to the higher Aspect head? Does V move to the Tense head? Does V move to the Mood head?

Parameter B changed in English in the seventeenth century, also in Danish and Swedish c., and appears to have changed in many French-based Creoles (see §., and the references given there). Parameter B may have changed in fifteenth-century English, yielding B or B before B changed in the seventeenth century. C. V parameters: C. V? YES/NO C. If YES, then FinV? 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

C. If NO, then ForceV? C. If NO, then Force[+F]-V? Parameters C and C changed from positive to negative in English in the fifteenth century. Parameter C changed from positive to negative, making C positive in Old French c., and C and C became negative c.. C or C became negative along with C in most, if not all, other Romance languages at various points in the Medieval period. C became negative in Welsh c. (Willis ) and possibly also in pre-Old Irish (Doherty a,b). D. Negative-concord parameters: D. Are all negative elements iNeg? D. Are all sentential negators iNeg? D. Are some sentential negators iNeg? Parameter D changed from negative to positive in MHG, with High German losing negative concord; the same happened in fifteenth– sixteenth-century Low German. In French, Parameter D changed from positive to negative, with D becoming positive, around  with the introduction of the ‘second’ negator pas. E. Wh-movement parameters: E. Does a wh-phrase move to the Specifier of an interrogative CP? E. Do all wh-phrases move to the Specifier of an interrogative CP? Parameter E changed its value between Latin and Romance, eliminating multiple wh-movement in Romance (except in Modern Romanian, see note ). Parameter E has changed, or is changing, in recent colloquial French, Brazilian Portuguese, and Spanish; it also changed between the Nara and Heian Periods of Old Japanese. F. Head-complement parameters: Do direct objects precede or follow their verbs in overt order? Do main verbs precede or follow their auxiliaries in overt order? Does the structural complement of V/T precede or follow V/T? Do objects of adpositions precede or follow their adpositions in overt order? F. Does the structural complement of V/T/P precede or follow V/T/P in overt order? F. For all heads H, does the structural complement of a head H precede or follow H in overt order? F. F. F. F.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

SUMMARY

F is a generalization over F and F; F generalizes over F and F; and F includes F, as well as applying to CP and all nominal and adjectival projections. F changed in ME, with some evidence that F changed in late ME and F in early ME; it also changed at some stage in the recorded history of Icelandic. There is reason to think that F has changed its value, perhaps incrementally (i.e. starting with CP and PP and then affecting structurally lower heads, as determined by FOFC) in Latin, between Homeric and Classical Greek (– ), and presumably in Celtic, as well as elsewhere in Indo-European (see note ). It is clear that there are implicational relations within each set of parameters A–F and that some parameters have more wide-ranging effects than others. In () we distinguished four different types of parameters, and the relevant definitions are repeated here: () a. b. c. d.

For a given value vi of a parametrically variant feature F: Macroparameters: all heads of the relevant type share vi; Mesoparameters: all heads of a given naturally definable class, e.g. [+V], share vi; Microparameters: a small subclass of heads (e.g. modal auxiliaries, pronouns) shows vi; Nanoparameters: one or more individual lexical items is/are specified for vi

We can assign the parameters in A–F to the different parameter types, as follows: ()

a. Macroparameters: A, D, E, E, F. b. Mesoparameters: A, B, C, F, F. c. Microparameters: A, B–B, C–C, D–D, F, F, F.

We have seen no clear examples of nanoparameters here, but see Biberauer and Roberts (, ) for an example from the history of English. The different types of parameters stand in implicational relations such that if a macroparameter has a given value then the corresponding mesoparameters become relevant; and if the mesoparameters have a given value then the microparameters become relevant. Clearly, then, the parameter types form an implicational hierarchy; we will look at the hierarchical relations among parameters in more detail in §.. On the basis of what we have seen in this chapter, then, there can be no doubt that the notion of parametric change has a role to play in the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

study of diachronic syntax, and a central role at that. However, this conclusion raises a number of questions which I have so far been avoiding. Foremost among these is the one summarized in the following quotation: If we isolate a parametric difference between, say, English and Italian, then we can simply describe the parameter and its consequences, and ideally say something about the typology it implies and the trigger experience that sets it . . . Then our job is done and we can go to the beach . . . If, however, we isolate a parametric difference between one historical stage of English and another, then we need to explain not just what the parameter is and what its effects are, but how, at some point in the generation-togeneration transmission of language, the new value was favoured over the older one. (Roberts : ; emphasis in original).

The question of how parameters can change their value in the generation-to-generation transmission of language is a very difficult and intriguing one. It is also of some importance for the theory of comparative syntax, since a successful answer to this question is likely to tell us much about the nature of parameters, the kind of PLD required to set them to a particular value, whether there are default values, and potentially many other matters. Much of what follows is devoted to fleshing out and considering these consequences of the point that changes in the values of parameters are an important force in the historical syntax of languages.

Further reading The history of French Thurneysen () is the original study in which the observation that null subjects could only appear in V clauses was made. Vanelli, Renzi, and Benincà (/) was the first paper to clearly show that Old French and the Medieval Northern Italian dialects had a V system with non-clitic subject pronouns and null subjects, which later developed into a non-V system with clitic subject pronouns and no null subjects. Adams (a,b) were the first works in government-binding theory to analyse the OF pattern of combined V and null subjects just in V contexts, and they were influential for much subsequent work. Adams (a,b,c) dealt with some of the problems left unresolved in the earlier work, particularly concerning early OF. Vance (, ) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

looks in detail at Middle French, arguing that the changes affecting V and null subjects may not be a single change. Roberts (a) extends Adams’ analyses and attempts to unite a number of changes in the history of French as a single parameter change. Further studies of these and related phenomena in the history of French are Dupuis (, ); Hirschbuhler and Junker (); Hirschbuhler (); Labelle and Hirschbuhler (, ); Labelle (). Kaiser (), Sitaridou (), and Zimmermann (, ) present different views on the nature of verb-movement and null subjects in the history of French and their possible connections. Ayres-Bennett () is a very useful collection of texts from various periods in the history of French, with commentary on all structural features, including morphology and syntax. Ayres-Bennett () is primarily a study of attitudes to language in seventeenth-century France; since French was undergoing a number of major changes at that time, much careful documentation of the textual evidence for the grammatical changes is adduced. Harris () is a wide-ranging study of the development of French, as well as to a lesser extent Spanish and Italian, from Latin, using a loosely Greenbergian descriptive-typological framework. De Kok () is a detailed study of the distribution of complement clitics in Old French. Foulet () is a detailed descriptive study of the development of the syntax of interrogatives in the history of French. Price () is a descriptive historical treatment of the development of Latin into French. Einhorn () is a general history of French. Foulet () is the standard reference for the descriptive syntax of Old French. Robert () is a comprehensive historical dictionary of French. The ,-page Grande Grammaire Historique du Français edited by Marchello-Nizia, Combettes, Prévost, and Scheer was published in . Null subjects Rizzi () contains a number of classic articles in principles-andparameters theory. In particular, in ch.  he presents a highly influential account of the null-subject parameter, in which he argues for the first time that wh-movement of a subordinate-clause subject across a complementizer moves the wh-phrase from postverbal position. Rizzi (a) presents the standard government-binding analysis of null subjects and (arbitrary) null objects. Gilligan () and Nicolis () are detailed studies of the status of the cross-linguistic 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

predictions made by the version of the null-subject parameter put forward in Rizzi (), and given in () of §... Gilligan concludes that the correlations essentially do not hold while Nicolis concludes that the correlations hold up fairly well across a wide range of languages, but that the distribution of expletive null subjects in creoles is problematic (see §.., on these). Alexiadou and Anagnostopoulou () is an influential proposal concerning the analysis of expletive null subjects in the context of the version of minimalism in Chomsky (: ch. ), which much subsequent work proposes carrying over to referential null subjects. The latter idea is anticipated and developed at length in Barbosa (). The same basic idea was first developed in government-binding terms by Borer (). Barbosa, Duarte, and Kato () is a systematic study of a number of apparent differences in the distribution and interpretation of null subjects between European and Brazilian Portuguese, partly based on the thorough diachronic studies of Brazilian Portuguese in Duarte (, ). See also Duarte () and Kato and Duarte () and §... The papers in Kato and Negrão () also concentrate on Brazilian Portuguese. Haegeman () is a detailed study of the phenomenon of ‘diary drop’—apparent null subjects in a stylistically restricted register—in English and French. Cardinaletti () is an early analysis of expletive null subjects in German. Roussou and Tsimpli () present a novel analysis of VSO orders in Modern Greek. The first explicit account of a partial null-subject system was Holmberg (), this was developed in Holmberg, Nayudu, and Sheehan () and several of the papers in Biberauer, Holmberg, Roberts, and Sheehan (). Camacho () is a further useful overview of many of the issues from a different perpective. A different line of research, looking at the informationstructural properties of null subjects and, to some extent, assimilating partial and consistent null subjects, is developed in Frascarelli (, ) and Frascarelli and Jiménez-Fernández (). Kinn (, ) and Kinn, Rusten, and Walkden () look at the evidence for null subjects in Old Norse, Middle Norwegian, and Old Icelandic respectively. Walkden () features an overview of null subjects in early Germanic, and Rusten () is a close study of Old English, while Axel () looks at Old High German and Breitbarth (, ) looks at varieties of Low German. A very important paper, bringing together the analysis of partial and radical pro-drop, is Barbosa (); see ch.  of Roberts (a) for an extension. The papers 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

in Cognola and Casalicchio () give a further recent overview. Huang (, ) were early contributions on the nature of radical/discourse pro-drop in East Asian languages. Neeleman and Szendrői () develop and defend a different theory of radical/ discourse pro-drop. Argument ellipsis was first proposed for Japanese by Oku (). Tomioka () made the connection between bare arguments and radical pro-drop. Saito (, ) develops the theory, originally put forward by Fukui () and Kuroda () that radical pro-drop and argument ellipsis are connected to the general lack of agreement in Japanese and other East Asian languages. Takahashi (, ) and Şener and Takahashi (), are further important works on argument ellipsis, the first of these extending the approach to Turkish. Li (, ) bring out important differences between Chinese and Japanese pro-drop. Simpson, Choudhury, and Menon () looks at argument ellipsis in Bangla, Hindi, and Malayalam. Duguine () proposes that argument ellipsis is found in Spanish and Basque; Saab () argues against this conclusion for Spanish. Subject clitics Brandi and Cordin () is an early study of subject clitics in Trentino and Fiorentino, in which they establish that the subject clitics in these dialects are not pronouns but agreement markers, and that whmovement of a subject over a complementizer moves an inverted subject only. Rizzi (b) presents further arguments that subject clitics in many Northern Italian dialects are agreement markers rather than pronouns. Poletto () provides an account of the development of subject pronouns into subject clitics in the history of Veneto. Poletto () is an analysis of the nature and positions of subject clitics in over a hundred Northern Italian dialects. Manzini and Savoia () look at the same phenomena in almost  dialects. Cardinaletti and Repetti () argue that the subject clitics of at least some Northern Italian dialects are pronouns, rather than agreement markers. Several of the papers in D’Alessandro, Ledgeway, and Roberts () consider these issues, taking various positions. Poletto and Tortora () provides an overview of subject clitics across Romance. Jaeggli () is the earliest government-binding analysis of clitics and null subjects in Romance, concentrating on Spanish and French. He argues that the French subject pronouns are agreement markers. Roberge () also 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

argues that in at least some varieties of French the subject pronoun is in fact an agreement marker, as does Sportiche (). Renzi () looks at subject clitics in eighteenth-century Florentine. Cardinaletti and Starke () present a detailed typology of pronouns, dividing them into strong, weak, and clitic. They analyse this typology in terms of a theory of structural deficiency. Roberge and Vinet () is a collection dealing with clitics and related matters in a range of Romance varieties. Zribi-Hertz () and Culbertson () consider the status of subject clitics in varieties of Modern French. General aspects of generative theory Chomsky () is the earliest monograph on generative grammar, and remains in many ways a classic exposition of the theory. The analysis of English auxiliaries presented there remains influential. Chomsky () is a review of Skinner’s Verbal Behavior, a behaviourist account of first-language acquisition. The review effectively destroyed behaviourist accounts of language acquisition, at least in linguistics, and went a long way towards establishing a nativist alternative. Chomsky () is the foundational text of governmentbinding theory, and gives the first clear overview of how the modules of that theory (binding theory, Case theory, bounding theory, etc.) were thought to interact. Chomsky () introduces some refinements, notably the Extended Projection Principle. Chomsky () is the work that most of the technical notions of the theory of syntax introduced in this book are based on. Chomsky () further refines and develops this model, while Chomsky (a,b) discusses the conceptual background to and implications of the current theory, notably introducing the third factors in language design. Chomsky () further develops the theory of phases, while Chomsky (, ) propose an important Labelling Algorithm. Hauser, Chomsky, and Fitch () is an important paper in which it is suggested that certain aspects of the human language faculty have correlates in cognitive abilities in other animals. Berwick and Chomsky (, ) develop some of these ideas further in the context of the evolution of language. Aspects of ‘narrow syntax’ (the formal operations of syntax, especially Merge) may, however, be uniquely human. Pinker and Jackendoff () and Jackendoff and Pinker () are responses to Hauser, Chomsky, and Fitch, arguing that the earlier view of a dedicated 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

language module of cognition is correct. Smith and Tsimpli () is a study of a linguistic savant, an individual in whom linguistic abilities appear to be hypertrophic. They argue that this supports the view that there is a dedicated cognitive module for language. Anderson and Lightfoot () is a general introduction to Chomskyan linguistics, which includes a useful chapter on language change. Pullum and Scholz () present a very strong version of the argument from the poverty of the stimulus, which they then argue must be false. Jackendoff () is a general introduction to generative linguistic theory with emphasis on the place of linguistic theory in cognitive science. Verb-movement Den Besten (), written and first circulated in , remains the classic treatment of verb-movement to C, unifying the analysis of Germanic V, French subject-clitic inversion and English subject-aux inversion. Emonds () is an early study of verb-movement, in which what was later analysed (in the terminology used here) as V-to-T movement was first identified. Pollock () is the fundamental article on verb-movement, systematically showing how English and French differ in this respect. This article also introduced the ‘splitINFL’ clause structure, which ultimately led to more elaborate structures such as that proposed by Cinque (). Belletti () extended Pollock’s () analysis of verb placement to Italian, showing that Italian infinitives move to a higher position than their French counterparts; Schifano () develops these works and proposes a very detailed account of the placement of finite and non-finite verbs and auxiliaries across Romance using the general framework of Cinque (). Other important recent studies of verb-movement include Koeneman and Zeijlstra () and the typological study in Tvica (). Travis () is an important and influential study of V and expletive constructions in the Germanic languages, noted for the fact that she argued, contra den Besten (), that subject-initial V clauses are TPs. Vikner () is a comprehensive study of verbmovement and transitive expletive constructions across the Germanic languages. Schwartz and Vikner () argue that the verb always leaves TP in V clauses, contra Travis () and Zwart (). Zwart () presents an early minimalist analysis of verb-movement in Dutch, in which he argues, in agreement with Travis (), that 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

subject-initial V clauses are TPs. Roberts () looks at the status of V-to-T in various French-based creoles; we will look at this in more detail in §... Cinque () is an important and influential study of adverbs, functional categories, and clause structure, in which he advocates that simple clauses may contain more than forty functional categories. Cinque () applies these results to the analysis of restructuring and clitic-climbing phenomena in Romance. Emonds () features the first version of the analysis of VSO presented in §.... Carnie and Guilfoyle () is a collection of papers on VSO and VOS languages. Massam () proposes an analysis of VOS and VSO orders in the Polynesian language Niuean in terms of (remnant) VP-fronting. Rackowski and Travis () is an analysis of Malagasy and Balinese, taking into account also Niuean, of the same type. Shafer () is a comprehensive discussion of word order and clause structure in Breton. Jouitteau (/) is a more recent and comprehensive study of word order, agreement, and related phenomena in Breton. Willis () is a detailed study of the loss of V in Welsh. Shlonsky () studies word order and clause structure in Hebrew and a range of Arabic dialects. Thurneysen () is the most comprehensive grammar of Old Irish to date, and the main source of information on that language. Bergin () put forward what has become known as ‘Bergin’s Law’, an important observation concerning Old Irish syntax which states that when the verb is not in initial position in the clause, it must take on a particular form of inflection, the ‘conjunct’ inflection. Russell () is a comprehensive introduction to Celtic philology, with an interesting chapter on the development of VSO and the complex double-inflectional system of Old Irish. Doherty (a,b) analyses Bergin’s Law and related phenomena concerning the position and inflection of the Old Irish verb from a minimalist perspective. Other important generative works on Old Irish are Adger () and Newton (). Negation Ladusaw () is the first systematic study of negative-polarity items in modern formal semantics. He proposes an important generalization concerning the semantic nature of the context in which these items appear, a generalization challenged in Giannakidou (, ). Giannakidou (, , ) are in-depth studies of negation, negative 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

concord and negative-polarity, primarily from a semantic point of view, although there is some syntactic analysis too. Horn () summarizes the issues arising in connection with the analysis of both free-choice and negative-polarity any. Horn and Kato () is a collection of articles on the general topic of the syntax and semantics of negation. Watanabe () gives an analysis of negative concord in Japanese and other languages in terms of Agree, and draws different conclusions on the nature of n-words from Giannakidou (). Zeijlstra () also analyses negative concord in terms of Agree, applying the analysis to a wide range of languages, as well as looking at Jespersen’s Cycle and other diachronic changes affecting negation; Zeijlstra () and Biberauer and Zeijlstra (a,b) develop this approach further. Déprez (, , ) are detailed treatments of French and Haitian Creole negation. Martins () analyses negative concord and n-words synchronically and diachronically across a range of Romance languages. Borsley and Morris-Jones () is an analysis of Welsh negation. Jespersen () famously proposed the cycle of changes in clausal-negation marking, which has since become known as Jespersen’s Cycle; see §.. Willis, Lucas, and Breitbarth () is a very comprehensive overview of aspects of the diachrony of negation in many languages, with a focus on Jespersen’s Cycle. Breitbarth () is a history of Low German negation, while Jäger (, , ) look at High German, Chatzopoulou () looks at the history of Greek negation, and Gianollo (: chs  and ) looks at the development of negation in Latin and Romance. Ingham and L’Arrivee () is a collection of articles concentrating mainly on the history of negation in English and French, while Ingham () surveys the history of English. Van der Auwera and van Alsenoy (, ) are typological surveys of negative concord and negative indefinites, and van der Auwera () looks at negative indefinites in World Englishes. Wh-movement Ross () is the classic study in which island phenomena were first presented and analysed. Chomsky () is the foundational article of the Extended Standard Theory of generative grammar, the version of the theory current in the s. It contains the first statement of the subjacency condition. Huang () is the classic work on whexpressions in Chinese, showing that, while they are in-situ elements, 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

facts concerning scope and selection strongly suggest a ‘covert movement’ analysis. Watanabe () provides an overview of research on wh-in-situ languages. Bach () is one of the earliest discussions of wh-in-situ; this is where Bach’s generalization was first put forward. Cheng () is an important study of cross-linguistic variation in whconstructions, in which the importance of clause-typing is brought to the fore. Rudin () was the first analysis of multiple wh-movement in Slavonic. Bošković () puts forward an analysis of the difference between single and multiple wh-movement languages which does not treat the latter as a parameter dependent on the positive setting of parameter E. Instead, he suggests that parameter E also distinguishes among multiple-fronting languages, and that what distinguishes the latter from single-movement and in-situ languages is a further requirement that all wh-phrases must be focused. Multiple-movement thus involves a single instance of wh-movement and multiple-focus-movement. McCloskey () puts forward an analysis of the alternating complementizers in Irish, apparently triggered by wh-movement from the clause introduced by the complementizer. Watanabe () provides the account of the development of wh-in-situ in Old Japanese that was summarized in §.., while Watanabe () provides an analysis of wh-agreement in Palauan and other languages. Mathieu and Sitaridou () present an analysis of the loss of wh-movement from a leftbranch in the history of Greek, while Roussou () argues for multiple wh-fronting in Classical Greek. Lopes-Rossi () and Kato () look at the emergence of wh-in-situ in Brazilian Portuguese; Obenauer (), Starke (), Oiry (), and Shlonsky () look at this phenomenon in French, while Oiry and Roeper () look at child French. Bobaljik and Wurmbrand () analyse wh-in-situ in a range of languages, including English. Dadan () analyses the loss of wh-movement as driven by the Labelling Algorithm of Chomsky (, ). Rizzi (, , ) presents a general theory of locality of wh-movement and other operations, known as relativized minimality. Language acquisition and parameter-setting Clark and Roberts () attempts to develop a general account of how parameter values are set and changed on the basis of PLD; the account is applied to the changes in the history of French discussed in 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

§... Dresher () presents a cue-based learning theory for phonological parameters, and shows how a natural ‘learning path’ emerges. We will return to these ideas in detail in Chapter . Guasti () is a recent textbook on language acquisition, which gives a thorough overview of work to date on first-language acquisition from the parameterbased perspective. We will refer to this work extensively in Chapter . Biberauer () considers parameter-setting in the context of an emergentist theory of parameters based on the three factors of language design introduced in (). Language typology Greenberg () is the foundational text in language typology, where, on the basis of a thirty-language sample, forty-five syntactic and morphological universals of various types were tentatively proposed. Some of these have not been falsified by much larger recent surveys. Lehmann () was the first attempt to apply Greenbergian typology to word-order change. This article was also the first to propose that OV and VO are the most important predictors of other aspects of word order. Vennemann () developed Lehmann’s ideas, proposing the Natural Serialization Principle as an account of Greenbergian wordorder correlations. Hawkins () was a major contribution to language typology, where, among other things, the notion of crosscategorial harmony was developed in terms of X0 -theory. Dryer () is a major study, the most comprehensive of its time, of the Greenbergian word-order correlations. On the basis of a sample of  languages from all over the world, the Branching Direction Theory described in §.. is proposed. The World Atlas of Language Structures (WALS; Haspelmath et al. ; https://wals.info) is the most comprehensive survey of the structural features of the known languages of the world to date, covering  phonological, morphological, syntactic, and lexical features in , languages. Haspelmath () is a thorough and interesting typological study of indefinite pronouns. Comrie () is a classic introduction to language typology. Song () and Croft () are more recent and more detailed introductions. Song () and Aikhenvald and Dixon () are excellent handbooks of language typology; and Song () is an introduction to word-order typology. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

The history of English Van Kemenade () was the first detailed generative study of Old English syntax. It is clearly shown that many of the central features of OE syntax are similar to or identical with those in Dutch and/or German. Kroch () introduces the important concepts of the Constant Rate Effect and grammars in competition, applying both of them to a number of case studies including notably the development of dosupport and the loss of V-to-T movement in Early Modern English; we will look at these proposals in detail in §... Pintzuk (, ) applied Kroch’s () idea of grammars in competition to word-order variation in Old English. In Pintzuk (), these ideas are developed further; in particular it is argued that quantified objects undergo a special leftward-movement rule. Kiparsky () proposed the first account of word-order change which assumed Kayne’s () universal underlying word order, while Roberts () makes a similar attempt. Fuß and Trips () also propose an account of word-order change in English, which makes use of a restricted version of the competing grammars idea. (We return to the topic of competing grammars in more detail in §. and §..) Other works on word-order change in OE and ME include Biberauer and Roberts (, ); Taylor and van der Wurff (); Pintzuk and Taylor (); Taylor and Pintzuk (). Haeberli () is a study of Old English word order with special emphasis on V and the nature of the subject position. Jonas () is a study of verb-movement, transitive expletive constructions, and related matters in Faroese and in the history of English. Warner () provides an overview and attempted synthesis of the accounts of the loss of V-to-T movement in Early Modern English available at the time. He also provides a detailed chronology, which is only partly compatible with the results of Kroch (). We will return to this in §.. Haeberli and Ihsane () observe that V-to-T movement was lost in two stages in late ME/ENE. Kroch and Taylor () propose an account of the loss of V in Middle English which relies on the idea that contact between Northern and Southern dialects played a causal role. Fischer et al. () is an overview of the state of the art concerning generative studies of the historical syntax of English at that time. Van Kemenade and Los () is an excellent handbook on the history of English, while Hogg and Denison () provides a general overview.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

Görlach () is a very useful general overview of Early Modern English. Gray () is a collection of Middle English texts, with commentary. Hogg (–) is the invaluable Cambridge History of the English Language, a six-volume work which provides extremely detailed information about every aspect of the history of the language, from its Germanic and Indo-European origins to the present day. Germanic syntax Holmberg () is a study of word order and case in the Scandinavian languages, and features the original statement of Holmberg’s generalization: the object moves only if the verb does. Holmberg () restates the generalization in the light of new data and theoretical developments. Holmberg and Platzack () and Vikner () are works of lasting importance on Scandinavian syntax. Roberts () shows that Holmberg’s generalization, in its earlier form, held in Early Modern English for as long as verb-movement was still found. Rögnvaldsson () is an early study of word-order change in the history of Icelandic, which uses the ‘grammars-in-competition’ approach. Hróarsdóttir (, , ) together represent the most detailed studies of word-order change in the history of Icelandic to date. Penner and Bader () provide a series of highly detailed analyses of V and related phenomena in Swiss German. They give evidence that in certain cases V affects the interpretation of the clause. Bobaljik and Jonas () propose an early minimalist analysis of Germanic transitive expletive constructions. Haider (, ) are general studies of German, with an emphasis on the analysis of word order. Axel () and Jäger () are in-depth studies of Old High German syntax. Miller () is the most complete modern study of Gothic. Breitbarth () is a detailed study of Low German syntax, concentrating on negation. Zwart () was the first study of an OV Germanic language in terms of Kayne’s () antisymmetric theory of linearization (which we will look at in §..), Zwart () is an overview of Dutch syntax, and Thráinsson () of Icelandic. Walkden () makes important proposals regarding the possible reconstruction of Proto Germanic, which we will return to in §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FORMAL COMPARATIVE AND HISTORICAL SYNTAX

Older Indo-European languages Lehmann () is an in-depth study and a personal view of the state of the art in Indo-European studies at the time. Taylor (, ) is a study of the change from OV order in Homeric Greek to VO order in Classical Greek, using the grammars-in-competition approach. Vincent () is an overview of the structure of Latin, with a particularly careful and interesting analysis of the clausal complementation system of that language, which we will look at in more detail in §.. Salvi (), Devine and Stephens (), Ledgeway (), and Danckaert (, ) are all detailed studies of Latin word order, greatly enhancing our understanding of the synchrony and diachrony of Latin and Romance. Clackson and Horrocks () surveys the history of Latin from Proto Indo-European to the Romance languages. There is now a fair body of work on the syntax of the older Indo-European languages in addition to Latin and Greek: on Sanskrit, Hale (, ), Kiparsky (); on Old Church Slavonic, Večerka (), Pancheva (, ); on Celtic, Watkins (, ), Russell (), Newton (), Eska (, ); on Germanic, Walkden () (on Gothic, Miller ); on Old Iranian, Skjærvø (); and on Anatolian, Garrett (). Fortson (, ) and Clackson () are general manuals of Indo-European with good chapters on syntax.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

2 Types of syntactic change



Introduction 2.1. Reanalysis



2.2. Grammaticalization



2.3. Argument structure



2.4. Changes in complementation



2.5. Word-order change



2.6. Conclusion

 

Further reading

Introduction Unlike the previous chapter, here the discussion will focus entirely on diachronic questions. Now that the notion of parametric change has been justified on empirical grounds, my goal here is to discuss a number of different kinds of syntactic change, and to show how the notion of parametric change can account for them. Thus my goal is to illustrate the power and utility of the parametric approach to syntactic change. The goal of the last chapter was to show that parametric variation was operative in the diachronic domain, i.e. that at least some examples of syntactic change can be analysed as parameter change. Here I want to show that all the major kinds of syntactic change involve parameter change. Thus the notion of parameter change is not merely useful, it is pervasive; in fact, I wish to maintain that it is the principal explanatory mechanism in diachronic syntax. In §., I look at reanalysis, which has frequently been considered a mechanism of syntactic change (the most important discussions of 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

reanalysis in relation to syntactic change are Andersen ; Lightfoot , , ; Roberts a; Harris and Campbell ; Yang a: ff.; Roberts and Roussou ). Here I will try to show how, properly defined, reanalysis is forced by parameter change. §. deals with grammaticalization, the development of new grammatical elements from other grammatical elements or ‘full’ lexical items. The phenomena will be discussed and illustrated, and the formal analysis summarized, following the main ideas put forward by Roberts and Roussou (, ). In §., I turn to changes in argument structure; perhaps the best-known case of this kind of change is the development of psychological predicates in the history of English. This will be summarized, and a parametric analysis will be proposed, developing and updating certain ideas in Lightfoot (); Fischer and van der Leek (); Kayne (); and in particular Allen (). In §., I discuss changes in clausal complementation, taking the very well-known and extensive changes that can be observed in the development from Latin to Romance as the principal example (see Vincent (: –) for a summary of these). Again, I will propose that these changes represent changes in parameter values. Finally, §. picks up the discussion of word-order change from Chapter , and discusses word-order change in the history of English in some detail; this leads us to a more refined approach to the variation in word order than was described in §..

2.1. Reanalysis 2.1.1. The nature of reanalysis Harris and Campbell (: , ) define reanalysis as ‘a mechanism which changes the underlying structure of a syntactic pattern and which does not involve any modification of its surface manifestation’, although they add that there can be a surface manifestation in the form of word-order or morphological change, perhaps appearing after the reanalysis of underlying structure has taken place. What I want to show here is that reanalysis is intimately bound up with parameter change. In fact, reanalysis is usually a symptom of a change in the value of a parameter; given that a single parameter value (of one kind or another; see the discussion of parameter types in §..) may underlie a cluster of surface grammatical properties, this implies that a parameter change 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

may manifest itself as a cluster of reanalyses, and a reanalysis is usually one symptom of a parameter change. The idea that reanalysis is central to syntactic change is hardly novel. Harris and Campbell (: ) show that it may go back as far as Aristotle and the Arabic grammatical tradition. Later (–), they give examples of the concept from the writings of Bopp (); Paul (); Brugmann (); and Wackernagel (–). They also state () that reanalysis ‘has been perhaps the single most important factor in modern treatments of syntactic change’. If we can relate reanalysis to parameter change, then, we will clearly be giving parameter change a central role in diachronic syntax. In a sense, we have little choice other than to relate reanalysis to parameter change, given our general assumptions. Following Harris and Campbell’s definition, reanalysis affects the structural representation associated with a surface string, without altering the string itself. The structural representation, given the assumptions made up to now, is built from three major operations: Merge, Move, and Agree (strictly speaking, this is just the two operations of Merge and Agree, since Move is just one case of Merge: Internal Merge; see Box .). These operations are essentially invariant, and, as such, cannot be open to reanalysis. However, the formal features which form the basis of parametric variation given the Borer-Chomsky Conjecture (BCC; see §.., ()) can and do vary, and this drives variation in Move and Agree across languages, as we saw in detail in Chapter . Hence parameters relating to these operations are what changes when reanalysis takes place. We will see examples of this below. Beginning, it seems, with Paul (; see Harris and Campbell’s :  discussion of his ideas), reanalysis has often been related to child-language acquisition. An important concept here is that of abductive change, as put forward, in the context of a discussion of phonological change in Czech, by Andersen (). Abduction was distinguished from induction and deduction by the philosopher Charles Sanders Pierce. Deduction proceeds from a law and a case to a result (for example, ‘All men are mortal’ (law); ‘Socrates is a man’ (case), therefore ‘Socrates is mortal’ (result)). Induction proceeds from a case and a result to a law (for example, an immortal being may observe that men (cases) eventually die (result) and conclude that all men are mortal (law)). Abduction proceeds from a law and a result to a case. Abduction is open to error in a way that induction and deduction 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

are not. With deduction, the case instantiates the law, and so the result must follow. With induction, the result is intrinsically associated with the case, and so the law follows. But abduction cannot follow necessarily: the connection between the case and the results known to follow from the law might be accidental. To take our example, from the statement ‘x is mortal’ (result) and the law that all men are mortal, one cannot conclude that a random mortal x is human (case). It is easy to see that x could be a mortal non-human. In part because of its logically flawed nature, the notion of abduction gives us a useful way of thinking about reanalysis in language acquisition. Following Andersen (: ), we can schematize abductive change as follows: () Generation 1: G1 → Corpus1

Generation 2: G2 → Corpus2 Here, ‘Corpus’ refers to a body of sentences produced by speakers. This is called an ‘Output’ by Andersen, and, in work on learnability it is called a ‘text’; see the discussions of learnability theory in Bertolo (); Fodor and Sakas (). ‘G(rammar)’ refers to an instantiation of UG with parameters set. Generation  (G; which we can think of, somewhat simplistically, as the ‘parental’ generation, the term ‘generation’ being intended in its everyday sense) has grammar G which underlies Corpus. Generation ’s (G’s) grammar (simplistically, the ‘children’s grammar’),1 G, derives from Corpus and the nature of the language faculty, given the assumptions about language acquisition we have adopted here (which were summarized in §.). The notion of abduction comes in here, since we can think of the language faculty as the law, and Corpus as the result: the child then abduces the case, i.e. a particular grammar. But, as illustrated above, the child may make an error of abduction, and, as it were, mistake a similar case (G) for the actual case, G. The important thing about language acquisition that the schema in () brings out is that there is no direct link between G and G. This is because, in the last analysis, grammars are mental entities and it is impossible to have direct access to the contents of 1 In §. we will see a reason to modify this simplistic terminology.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

another mind. Grammars are only transmitted from one generation to the next via corpora, and corpora may give rise to errors of abduction. Still putting things rather simplistically, the possibility arises of ‘mismatches’ between G and G as a consequence of the way in which grammars are transmitted. The general view has been that reanalyses are just such mismatches. This has been widely regarded as the basic factor underlying change (see the discussion in Harris and Campbell (: –, ff., and the references given there). To quote Kroch (: ): ‘[l]anguage change is by definition a failure in the transmission across time of linguistic features’. If syntactic change centrally involves reanalysis and if reanalyses are mismatches, and if reanalysis is symptomatic of parameter change, then it follows that parametric change is the basic factor underlying syntactic change. Moreover, if reanalysis is driven by abduction in language acquisition, then so is parameter change. So we arrive at one of the main ideas we will explore in this book (mainly in the next chapter): that parametric change is driven by language acquisition. As already stated, this idea is not new: a version of it (without the technical notion of parameters) seems to have first been put forward by Hermann Paul, and has been argued for in the context of generative grammar most notably by Lightfoot (, , ). This scenario for abductive change naturally raises two fundamental questions: What are ‘mismatches’? and How can mismatches arise? Let us assume for the moment that ‘mismatches’ are reanalyses in exactly the sense defined by Harris and Campbell as given above, and that these must be linked to a parameter change; at the abstract level, mismatches must be connected to features driving the operations Move (Internal Merge) and Agree. Then we can see that Generation  may abduce some difference in underlying structure for some part of Corpus as compared to Generation , and this may have some effect (in morphology or word order, as Harris and Campbell suggest) on Corpus; these effects are the overt signs of the parameter change. Putting things this way brings out the problems with this approach. There are two principal problems, which we can call the Regress Problem and the Chicken-and-Egg Problem.2 The Regress Problem 2 Croft (: ) also refers to a ‘Chicken-and-Egg Problem’ in diachronic syntax. But his problem is different from the one I discuss below. Croft’s problem is that reconstructed changes may be used to support hypotheses about typological change, while a postulated



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

can be put as follows: an innovation in Corpus may be ascribable to a mismatch in G (compared to G), but it must have been triggered by something in Corpus—otherwise where did it come from? But if Corpus could trigger this, then how could G produce this property without itself having the innovative property? To quote Kroch (: –) again: Since, in an instance of syntactic change, the feature that learners fail to acquire is learnable in principle, having been part of the grammar of the language in the immediate past, the cause of the change must lie either in some change, perhaps subtle, in the character of the evidence available to the learner or in some difference in the learner, for example in the learner’s age at acquisition, as in the case of change induced through second-language acquisition by adults in situations of language contact.

Here Kroch illustrates the problem and the only possible solutions: either Corpus is subtly changed so that G is more readily abduced from it than G, or some external factor such as language contact is at work. There is no doubt that language contact plays an important role in many syntactic changes, and that it can provide a straightforward solution to the Regress Problem. This will be the subject matter of Chapter . But it seems that not all changes can be explained through contact, and where contact is not a causal factor, subtle changes in Corpus seem to offer the only mode of explanation for change. These subtle changes may be caused by some extrasyntactic, but still intralinguistic, factor such as phonological or morphological change; we will see examples of this below. If some change in Corpus is responsible for reanalysis but is not itself the reanalysis, we face the Chicken-and-Egg Problem. If we observe two correlated changes, how can we know which caused the other? To put it another way, we might want to say that two innovations in Corpus are due to a single mismatch in G caused perhaps by a single feature of Corpus. This will solve the Regress Problem along the lines just sketched, for one of the innovations. But if Corpus shows the two innovations, how do we know which is playing the causal role? How do we know which innovation is a cause and which an effect of the reanalysis? And, for whichever one we call the cause, we still have the typological change may be supported by a reconstructed change. As Croft says: ‘[t]his appears to be a vicious circle.’ As we will see below, however, the Chicken-and-Egg Problem for us relates to distinguishing causes and effects of change, which is a different matter.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

Regress Problem. This problem can be observed in two different treatments of the causal role of reanalysis. On the one hand, Lightfoot () proposes a series of different changes leading to accumulated opacity in the grammar, ultimately causing a reanalysis (we will see an example of this directly); on this view, the prior changes are not explained and are subject to the Regress Problem, although the reanalysis is explained. On the other hand, Timberlake () and Harris and Campbell () propose that reanalysis causes a group of unrelated changes; this approach explains the changes but not the reanalysis (see Harris and Campbell : ). Of course we are always free to assert that there is no causal relation between the two innovations, but in doing this we flout Occam’s razor (by having more entities, i.e. underlying changes, than necessary) and have the Regress Problem twice over.3 2.1.2. Transparency and Tolerance In this section, I will describe two ways of dealing with the problems introduced in the previous section. The first, the Transparency Principle, was proposed in the pioneering work in generative diachronic syntax, Lightfoot (). The second, Yang’s () Tolerance Principle, is of more recent vintage but no less significant for that. Lightfoot’s () Transparency Principle offered a way of dealing with both the Regress Problem and the Chicken-and-Egg Problem. This can be seen from his discussion of the development of English modal auxiliaries (e.g. can, must, may, will, shall, ought). Lightfoot argues that several changes affecting these items took place together in the sixteenth century.4 These include the loss of the ability to take direct objects (or indeed any kind of complement other than an apparently bare VP, with ought a consistent exception in requiring a to-complement), and the loss of non-finite forms. () illustrates an early 3 Harris and Campbell (: –) criticize Lightfoot’s (: ff.) discussion of the differences between parametric changes and other kinds of changes in part because it does not solve the Chicken-and-Egg Problem. The criticisms are partly justified, but they apply to any approach involving reanalysis, as Harris and Campbell () acknowledge. 4 Many authors have pointed out that Lightfoot’s chronology seems to be incorrect, in that it is not clear that all these changes took place at the same time; see in particular Warner (; ). Nonetheless, I present the developments approximately as Lightfoot did, since they illustrate the general point regarding transparency and reanalysis that I wish to make here.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

example of will with a direct object, and an example of an infinitival modal (konne, corresponding to NE can): () a. Wultu kastles and kinedomes? Wilt thou castles and kingdoms? (c., Anon; Visser –: §) b. I shall not konne answere. I shall not can answer. (, Chaucer; Roberts : )

[ME]

Moreover, after the loss of V-to-T movement (the change in parameter B discussed in §..), modals diverged syntactically from all the other verbs of English (except the aspectual auxiliaries have and be, and dummy do) in that they retained the earlier pattern of negation and inversion syntax, i.e. they precede clausal negation and are inverted over the subject in main-clause interrogatives. This is of course still the case in present-day English: () a. I cannot speak Chinese. b. Can you speak Chinese? This can be accounted for, consistently with the idea that parameter B changed value in ENE, if we assume that by the time this parameter changed, modals were merged in T rather than V. Hence, once the V-to-T parameter changed, the syntactic differences between modals and main verbs in negation and inversion emerge since main verbs no longer move to T, while modals are merged there. So we then have the NE situation in which modals have ‘T syntax’ and main verbs have ‘V syntax’. According to Lightfoot (), the creation of a new class of modal auxiliaries was due to the accumulation of exception features— morphological, semantic, and syntactic—on the modal verbs, which made them ‘opaque’ as main verbs. The morphological exception feature was that the modals, by the sixteenth century, were the only surviving members of the class of OE ‘preterit-present’ verbs. These verbs are characterized by having ‘a strong past tense with present meaning . . . and a new weak past tense’ (Mitchell and Robinson : ). By late ME, the consequence of this was that these were the only verbs in the language to lack a sg ending in the present tense (-(e)s or -(e)th); in a language with as impoverished an inflectional system as English, it is reasonable to suppose that this is a highly irregular feature. The semantic ‘irregularity’ of these verbs was their modal meaning, and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

in particular their ability to act as a periphrastic substitute for the moribund subjunctive inflections. In virtue of their meaning, the usual form–meaning correlation between preterit morphology and past time did not always hold (for example, in I should do it tomorrow). One syntactic irregularity may have been that, with the exception of ought, the modals never took to-infinitives as their complement, although this was established as the main form of non-finite sentential complementation by the end of the ME period (Fischer et al. : ff.). Lightfoot (: –) is the original presentation of these and other opacity-inducing factors. So, Lightfoot’s claim is that the Transparency Principle forced the modals to change category once this opacity became too great. This approach has two notable advantages. First, it narrows down the Regress Problem; as long as we know how much opacity can be tolerated and what the nature of opacity really is, we can know at which point Corpus will have sufficient exception features to cause Generation  to abduce G rather than G. More precisely, suppose the Transparency Principle states that a certain structure can only be acquired if it requires the postulation of less than n exception features. G is acquired on this basis, but something in Corpus must be abduced as a further exception feature, making G unlearnable for Generation , and hence triggering reanalysis. We can see that the Regress Problem still appears in that there is some feature of Corpus which must be an exception for Generation  but not for Generation . Similarly, a characterization of exception features would also solve the Chicken-and-Egg Problem; otherwise this arises in connection with exactly the same feature of Corpus, which is what is really driving the reanalysis. Nevertheless, the merit of the Transparency Principle is that it forces us to say that reanalysis is caused by at least one exception feature too many. The problems with the Transparency Principle also emerge from this discussion. The most fundamental of these is that there is no definition of transparency or its converse, opacity. Without these notions, it is clear that the potential advantages relative to the Regress Problem or the Chicken-and-Egg Problem are not realizable. Unfortunately, Lightfoot () offered no such definitions, and neither have any arisen in more recent work by Lightfoot or others. So we must conclude that the Transparency Principle does not offer true solutions to the Regress Problem and the Chicken-and-Egg Problem. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

The Transparency Principle can be seen as an evaluation metric on possible grammars, i.e. a way of comparing grammars. An evaluation metric is a way of choosing between different grammars. This naturally raises the question of who or what is doing the choosing. The choice can be seen at an abstract, mathematical level as simply a way of optimising grammars. Alternatively, we may consider an evaluation metric as a guide for linguists making analytical choices. But the most interesting way of thinking of evaluation metrics, and the one most relevant for syntactic change, is in terms of language acquisition: the acquirer may choose between two (or more) grammars which account for the same primary linguistic data (PLD) in terms of an evaluation metric, i.e. the evaluation metric may cause an acquirer to prefer one grammar over another. A valid evaluation metric will ultimately ensure that the most highly-valued grammar compatible with the PLD is the one a learner converges on. Looking again at (), both G and G may be able to generate the same subset S of grammatical sentences of both Corpus and Corpus, but if the evaluation metric favours G then Generation  favours G, and so a new grammar emerges. In principle, as sketched above in relation to the Transparency Principle, a good evaluation metric will indicate exactly what is ‘wrong’ with G as compared with G, and thereby—in principle—solve the Regress Problem. By the same token, the Chicken-and-Egg Problem can in principle be tackled by the right evaluation metric. As we saw above, however, the Transparency Principle lacks the precision to achieve this in practice, although it isolates the problem in principle. The conception of language acquisition that was prominent until around  relied heavily on evaluation metrics to distinguish the competing classes of rule systems which could generate the PLD (until that time grammars were thought to consist of systems of rule systems). The best-known early example of an evaluation metric was put forward for phonological rule systems in Chomsky and Halle (: f.). This was a feature-counting metric, in fact a version of Feature Economy as introduced in §. (): of the two grammars (or rule systems) G and G, the more highly valued was the one whose rules involved the least number of phonological distinctive features, or the least number of distinctive features of a particular type.5 The Transparency Principle, at 5 As we will see in §.., Chomsky and Halle’s evaluation metric only counted marked feature values. We will discuss markedness at length in §..



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

least in the way Lightfoot applied it in his analysis of the emergence of the English modals, is an exception-feature-counting principle. Yang (: –) introduces a mathematically precise and conceptually elegant evaluation metric, the Tolerance Principle. This is stated in (): () Tolerance Principle If R is a productive rule applicable to N candidates, then the following relation holds between N and e, the number of exceptions that could but do not follow R: N e θN where θN :¼ lnN (Here ‘lnN’ means ‘the logarithm of N’; Yang : – discusses the motivations for this formulation, which I will take for granted here.) The Tolerance Principle defines a tipping point beyond which exceptions to a productive rule R become ‘intolerable’, and hence require the system to change so that R is no longer productive (possibly entailing thereby the disappearance of R). Where R either becomes unproductive or disappears, the grammar changes. So the Tolerance Principle defines the conditions for grammar change. The grammar change in question may occur during language acquisition, where the acquirer abandons one grammar in favour of another, or in diachronic change. Looking once again at (), the Tolerance Principle could be the causal factor favouring G over G for a given generation of acquirers. This opens up the possibility that the Regress Problem and the Chicken-and-Egg Problem could be solved quantitatively: a statistical fluctuation in the PLD could cause e to become too large. Another possibility would be the case where some other rule change in G affecting R’ 6¼ R, affects the PLD in such a way that e increases past the tipping point defined in (). As mentioned above, R’ may be a phonological rule whose change leads to the masking of a crucial aspect of the PLD relevant for maintaining e at a level below the threshold defined in (); we will look at a case of this kind in the next section. Or, of course, R’ could be a distinct syntactic rule. As Yang (: –) makes clear, recursive application of the Tolerance Principle ‘is critical for the learner to acquire the complex structures in the world’s languages’. The recursive application of the Tolerance Principle may be fundamentally important in relation to parameter hierarchies, as we will see below. Yang gives many examples of the application of the Tolerance Principle in language acquisition, and also applies it to several cases 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

of language change. As he points out, it ‘offers a concrete program that integrates language acquisition and change’ and ‘should be extended to additional cases of language change for which quantitative data is available’ (Yang : ). Let us therefore look more closely at how the Tolerance Principle works. Yang’s principal example from language acquisition is the well-known one of the acquisition of regular and irregular past-tense forms of English verbs. The course of acquisition of these forms proceeds in three stages (Marcus et al. ; Pinker ). At the first stage, irregular verbs are correctly inflected for past tense (e.g. hold—held). This is followed by a second stage which involves overgeneralization of the regular pattern (e.g. hold—holded; following Yang, I will refer to the regular past-tense rule, which adds –ed to a verb stem, as the d-rule). Finally, the child stabilizes at the ‘correct’ adult-like system in which regulars and irregulars are distinguished. Yang (: ) comments: [t]he trajectory of English past-tense learning is quite typical when considered in the crosslinguistic study of language development . . . In many (but not all) cases of language acquisition, children show an initial stage of conservatism, during which they do not seem to generalize beyond the input, before the emergence of productivity. How do they calibrate the balance between rules and exceptions? Where is the critical juncture at which children recognize the productivity of rules?

The Tolerance Principle provides answers to these questions. Since rules and their productivity can change diachronically, and since we take change to be driven by acquisition, the Tolerance Principle can provide answers to the diachronic counterparts of these questions. Yang (: ff.) shows in detail how the various rules for pasttense formation of English verbs vary in productivity as a function of the size of the acquirer’s vocabulary, as predicted by the Tolerance Principle. The key point is that the quantity θN, being defined logarithmically, has the property that ‘the proportion of tolerable exceptions drops quite sharply as N increases’ (Yang : ); see table . of Yang (: ) for an illustration of this. Yang compares four rules for the English past tense, but here I will illustrate with just the two in (), one (from the adult perspective) regular and one irregular: () a. ɪ → æ/ __ ŋ (‘i’ becomes ‘a’ before ‘ng’: sing-sang, ring-rang, etc.) b. PAST → d (add ‘(e)d’ to the verb stem elsewhere).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

Rule (b) is an ‘elsewhere’ rule: no context for the application for the rule is specified so it applies in any context where other rules fail to apply (this is the principle of disjunctive ordering, which is pervasive in language; see Kiparsky  and §.). The  most frequent verbs (extracted from a five-million-word corpus of child-directed North American English; details are given by Yang : ) include bring, ring, and sing. Here the ratio of rule-obeying verbs to exceptions is three to one (i.e. one exception out of three possibilities) and so rule (a) is seen as productive, and indeed many English-acquiring children overgeneralize, producing brang as the past tense of bring. At  verbs spring and swing are added to the list of verbs potentially undergoing (a) (in technical terms, we say these verbs meet the structural description of the rule). So the ratio of rule-obeying verbs to exceptions is five to one. At  verbs, there are  regulars, while θ = . Hence with a vocabulary this size, the d-rule (b) is not productive. This is what Yang (: ) calls the ‘transient stage of productivity’ for rule (a). When we reach  verbs, there are eight verbs meeting the structural description of (a), i.e. ending in /ɪŋ/: bring, fling, ring, sing, spring, sting, swing, and wing. Three obey (a), three change the vowel to /ᴧ/ (spelt ‘u’), bring is idiosyncratic (past tense brought) and wing is regular. As Yang (: ) says: ‘[a]ll of them lose because none is numerically dominant enough to tolerate the rest’. At this stage, there are  verbs obeying the d-rule, while θ = , so again (b) is not productive. By the time the total number of verbs extracted from the corpus, ,, is reached, e =  and θ, = . So the d-rule is now productive, while rule (a) remains at eight. So N (the number of verbs meeting the structural description of the rule) = , and e (the number of exceptions) = . Rule (a) is therefore unproductive, since θ = . So we see how the Tolerance Principle makes very clear predictions about changes in the productivity of the rules as more verbs are acquired; Yang shows that other irregular rules (feed-fed, fly-flew) are almost always irregular (although fly-flew is briefly productive at the first, one-hundred-verb stage). The data are summarized in Yang (: , table .). What applies to changes in the grammar of a child acquiring their first language can apply to changes in the productivity of rules over time. As Yang (: ) points out, ‘morphological change must be driven by the numerical relationship between N and e for any given rule R’: R will be productive if e θN and unproductive if e > θN. These 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

quantities define the threshold for the analogical levelling of morphological classes, a well-known kind of morphological change. Further, ‘one may construct a dynamical system of (N, e)t that undergoes perturbation over generations (t)’ (Yang : ); we will come back to this idea in §.. and §... Can we apply the Tolerance Principle to syntactic change? Can it fill in the gaps left open by the Transparency Principle that we mentioned above? This may be possible. In this connection, an important corollary of the Tolerance Principle introduced by Yang (: ) is relevant. This is the Sufficiency Principle, stated in (): () Sufficiency Principle Let R be a generalization over N items, of which M items are attested to follow R. R can be extended to all N items if and only if: N N—M < θN where θN :¼ lnN The essence of the change which affected the English modals as described by Lightfoot was that the modals changed from being a subclass of verbs, the preterit-presents, to being a separate syntactic category of auxiliaries (as above, for the sake of illustration we leave aside the empirical problems for Lightfoot’s account which emerged later; see note ). If we say, again for the purposes of illustration, that there were eight verbs in this irregular subclass in ME, then, given that θ=, if more than three of these verbs failed to follow a relevant generalization R which holds of verbs, then the entire subclass may fail to be analysed as verbs. This would lead to the creation of a grammatical category distinct from that of verbs to which the modals would be reassigned: the category of modal auxiliary. A good candidate for R may be the ability to appear in non-finite forms, which some of the ‘pre-modal’ preterit-present verbs (e.g. sculan/shall) failed to do already in OE (see Warner , ). The tipping point for a small class such as the preterit-presents could be just one more verb failing to appear in non-finite forms at a given historical moment, leading to a sudden change of the kind Lightfoot proposed despite centuries of lexical variability and notional irregularity prior to the change. Of course, to substantiate this conjecture a thorough quantitative analysis of the behaviour of the preterit-present verbs throughout, at least, the whole ME period would be required. But the point of principle holds, and with it the relevance of both Tolerance and Sufficiency for understanding syntactic change. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

The changes in the modals may have been a category change, arguably a microparametric change in terms of the taxonomy of parameters given in §.. (). But what of the kinds of parameter changes documented at length elsewhere in Chapter ? Here there is an important connection between the idea of parameter hierarchies introduced in Chapter  and the Tolerance Principle. As Biberauer (a,b, ) points out, one can construe the implicational relations among the different parameter types, or, more generally, between parameters low in a hierarchy in contrast to relatively ‘higher’ ones, as increasingly fine-grained featural distinctions. Each move ‘down a hierarchy’ involves the introduction of a new feature, effectively making a (sub) categorial distinction in the grammar. Hence, a macroparameter applies to a maximally undifferentiated class of elements, while mesoparameters relate to large classes (e.g. verbal vs. nominal categories), microparameters to more fine-grained categories (e.g. modal auxiliaries, as just mentioned) and nanoparameters may be lexically idiosyncratic (like irregular verbs, lexically flagged as unproductive options); see §.. (). As Yang (: ) points out in his discussion of the acquisition of English stress, which requires a distinction between nouns and verbs, the recursive application of the Tolerance Principle can create categorial distinctions. At an initial stage, corresponding to languages with much simpler stress systems than English (of which there are many) the dominant stress pattern satisfies the Tolerance Principle over the entire vocabulary and no further subdivision is necessary. The next stage is to partition the lexical items into nouns and verbs. If productivity is satisfied then, the learner stops. Otherwise the learner looks to break up the data (within nouns and verbs) even more. (Yang : )

This is exactly parallel to Biberauer’s conception of how acquirers traverse a parameter hierarchy (it is also very close to Dresher’s ,  Successive Division Algorithm for acquiring systems of phonological distinctive features; see §.). It follows that changes in productivity will lead to different paths within a parameter hierarchy for acquirers; in other words, changes in productivity as characterized by the Tolerance Principle will lead to parametric change. Hence, recursive application of the Tolerance Principle will account for both the acquisition of and change in parametric systems. The Tolerance Principle appears to fulfil the initial promise of the Transparency Principle: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

it is an evaluation metric which can apply equally well to language acquisition and language change. What is the status of the Tolerance Principle in relation to the language faculty? The natural answer is to treat it as a third factor in terms of Chomsky (); see §. () (this is also argued by Biberauer (), who suggests that it may be a subcase of Maximize Minimal Means). This conclusion really arises by elimination: it is highly implausible to consider it part of UG (first factor) and it cannot be acquired from an environmental stimulus (second factor). Therefore it must be a third factor, along with Feature Economy and Input Generalization (or perhaps, like them, subsumed by Maximize Minimal Means, as Biberauer a,b proposes; see §.. for a different view). As such, we take it to be innate and, potentially, domain-general; it is plausible to conjecture that it may apply in other cognitive domains. This leads us to a larger question, on which I once again quote Yang: ‘I am at a loss to say why the Tolerance Principle appears to work so well . . . It is as if the assumptions that underly its derivation, which are meant to approximate reality, in fact are the reality’ (Yang : ). I have nothing to add to this comment here, except to note that we seem to be faced with an example of ‘the unreasonable effectiveness of mathematics in the natural sciences’ (Wigner ). If this conclusion implies that diachronic syntax is a natural science, then it is a very positive, if metaphysically perplexing, one. Here we have seen that both Lightfoot’s Transparency Principle and Yang’s Tolerance Principle can be construed as evaluation metrics allowing acquirers to choose more highly valued grammars over relatively low-valued ones. Since the right evaluation metric can provide answers, at least in principle, to both the Regress Problem and the Chicken-and-Egg Problem, they can make the programme of relating language change and language acquisition feasible. We saw that the Transparency Principle left many crucial questions open, while the Tolerance Principle and its corollary the Sufficiency Principle provide precise and plausible answers to these questions and as such may play a central role in our account of language acquisition and change. 2.1.3. Phonology and reanalysis As we have already mentioned, one way to tackle both the Regress Problem and the Chicken-and-Egg Problem is to attribute the crucial 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

factor leading to reanalysis to another part of the grammar, for example, phonology or morphology. An example where phonology plays a role is the development of the question particle ti in Colloquial French (see Harris , Bennett , Roberts a: –, Harris and Campbell : ; a similar development has taken place in the history of Occitan (Wheeler : –) and some varieties of Franco-Provençal Valdôtain (Roberts b: ff.)). This element is a reanalysis of the epenthetic consonant /t/ and the sg masculine pronoun il in inversion contexts, roughly as follows: () (Jean) a-t-il fait cela? → Jean a ti fait cela? [French] (John) has he done that John has  done that ‘Has John done that?’ This change, which in fact involves the reanalysis of subject–clitic inversion and the loss of pronominal features associated with il, depends on the ability to drop word-final /l/ after /i/ in colloquial French. The effects of it can be seen where the preverbal subject is not sg masculine, as in: ti souvent? () a. Elle t’écrit she you-writes  often ‘Does she write to you often?’

[Colloquial French]

b. On t’a ti demandé ton adresse? one you-has  asked your address ‘Have you been asked for your address?’ Also, since the ‘complex inversion’ construction from which ti was reanalysed could not have an initial subject clitic (*Il habite-t-il Lyon? ‘he lives-he in Lyon’ = ‘Does he live in Lyon?’; see Kayne ; Rizzi and Roberts  on this), the existence of examples with il in subject position is a further indication of this reanalysis: () Il habite ti Lyon? he lives  Lyon ‘Does he live in Lyon?’

[Colloquial French]

For this reanalysis to take place, it suffices that Generation  produced an inversion structure containing epenthetic /t/ and the pronoun il, with a low-level phonological rule deleting word-final /l/. This gives rise to a surface string containing the phonological sequence /ti/. By treating /ti/ as a Q-marker, Generation  can analyse this string as 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

containing no post-verbal subject clitic (in the complex inversion construction, a preverbal subject is present in any case), no /t/-epenthesis, and no /l/-deletion. Syntactic opacity may play no role here; rather it is the indeterminacy of the earlier form which makes it subject to reanalysis (although I will return to this point directly). Both the Regress Problem and the Chicken-and-Egg Problem are solved by appealing to the idea that the crucial causal factor was the deletion of word-final /l/.6 Roberts (a: ff.) puts forward a general notion of Diachronic Reanalysis (DR) which is operative here. The reanalysis relating the two constructions in () is given in () (see Roberts (a: ), although the structures proposed here are simpler in various respects) the (-t-) in (a) is presumably not present in the syntactic structure, being an epenthetic consonant:7 ()

a. [CP Jean [C [T a] C] [TP (-t-) il [VP fait cela]]] > b. [TP Jean [T a] ti [VP fait cela]]

This change can be dated to the early seventeenth century (Roberts a: –). According to Roberts (ff.), both structural ambiguity and structural simplicity are preconditions for a DR of this type in that (b) is clearly a simpler structure than (a). I will discuss various ways of characterizing structural simplicity in §.. Hence opacity does in fact play a role, in the guise of relative structural simplicity; the idea is that reanalysis is motivated by a general preference on the part of language acquirers to assign the simplest possible structural representations to the strings they hear (as part of Corpus). I will henceforth refer to this as the ‘simplicity preference’ (again, this can be seen as a third factor, presumably related to Feature Economy). 6 Actually the problems are solved for syntax, but they may be shifted to the phonology. If final /l/-deletion is a productive option, why is it not postulated by Generation  in this case? Again, we do not fully understand why Generation  tolerates the earlier grammar and why Generation  innovates. See note . 7 See Roberts (a: ), and the references given there, on the structural position of ti. It is clearly lower than the position of the finite auxiliary, which we are assuming to be T, but external to VP. It is possible that the phonological opacity concerns /t/-epenthesis rather than /l/-deletion. No morphological or phonological operation equivalent to /t/-epenthesis is found elsewhere in French, while final-consonant deletion is rife on most analyses (see Tranel ; Dell ; Pagliano ). Moreover, the opacity of /t/-epenthesis would carry over to the many varieties (including Quebec French, as well as varieties of Occitan and Franco-Provençal) where the question particle appears to have arisen from the sg pronoun tu. I leave these complex and interesting questions aside here.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

Moreover, DR of the type illustrated in () is associated with parameter change. DRs are seen as the symptoms of parameter change. Here, the development of ti is associated with the loss of subject–clitic inversion in main-clause yes–no questions; to the extent that inversion involves T-to-C movement of the kind described in §..., and depends on the ability of the relevant type of C to trigger movement, it is a property subject to parametric variation. Roberts () suggests that ‘the notion DR may . . . prove to be epiphenomenal. All DRs may turn out to be instances of Parametric Change.’ The approach to reanalysis which regards it as caused both by simplicity and ambiguity gives rise to an interesting angle on the Regress Problem: assuming that language acquirers (Generation  in ()) will always prefer the simplest possible representation of the strings of Corpus, to solve the Regress Problem we could look for what prevented the simpler analysis for Generation . In this case, we take this to be phonology; Generation  has an underlying /l/ in il, which is deleted by the /l/-deletion rule (see notes  and  for some provisos to this). Similarly, the Chicken-and-Egg Problem may be reduced to phonology; presumably some change in the underlying phonological form led Generation  to abandon the underlying /l/ in il, with the reanalysis as a direct consequence. From the point of view of syntax, then, the problems are solved, although they may resurface in accounting for the relevant phonological changes. Lightfoot (: –) critiques DRs for having no really useful role to play in an account of language change. Strictly speaking, this may be true; we have already seen that Roberts (a) suggests DRs may be epiphenomenal, and we are following that suggestion here. Lightfoot correctly states that DRs are to be regarded as relating grammatical representations of subsequent generations, but incorrectly points out that ‘they occur where grammatical shifts have already taken place’ (). In fact, DRs are intended as an indication of how a potentially ambiguous string had one analysis at one period (Generation ) and another at a later period (Generation ). Their utility lies in bringing out the alleged role of simplicity and ambiguity in driving reanalysis. We have seen that Lightfoot () regards opacity as the principal cause of reanalysis, although he also mentions ambiguity (: ). Timberlake () and Harris and Campbell (: ff.) consider reanalysis to be a consequence of ambiguity. Finally, Roberts (a) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

regards reanalysis as driven by both factors, assuming that the preference for simplicity can be seen in terms of opacity of the earlier structure. 2.1.4. Expressing parameters If we are to view reanalysis as always accompanying parameter change, i.e. as the structural manifestation of the change in the value of at least one parameter, then we have to consider how Corpus in () succeeds or fails in triggering different values for a given parameter, i.e. in leading language acquirers to set a given parameter to a given value. Lightfoot () tackles this question, following Dresher (), by introducing the notion of cue for a parameter. See also Lightfoot (: ff.), where a number of examples of cues are given; the loss of V-to-T movement in ENE and the development of the modals and do are also discussed there (–). In a similar vein, Clark and Roberts () introduced the concept of parameter expression. This can be defined as follows (this definition is from Roberts and Roussou : ): ()

A substring of the input text S expresses a parameter pi just in case a grammar must have pi set to a definite value in order to assign a well-formed representation to S.

To give a simple example, a sentence like () expresses the positive value of the wh-movement parameter, since this parameter must be given the positive value in order for the sentence to be grammatical (in a non-OSV language): ()

Who did you see?

The notion of ‘trigger’ (or, equivalently, cue) can be defined in terms of parameter expression, as follows: ()

A substring of the input text S is a trigger for parameter pi if S expresses pi.

Thus () is a trigger for the positive value of the wh-movement parameter E of Chapter . Clearly, for Generation  to converge on the same grammar as Generation  in the scenario in (), Corpus must express all the relevant parameters.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

We can begin to connect p-expression to reanalysis by introducing the following notions (again, originally from Clark and Roberts , but slightly reformulated here): ()

a. P-ambiguity: A substring of the input text S is p-ambiguous with respect to a parameter pi just in case a grammar can have pi set to either value and assign a well-formed representation to S. b. A strongly p-ambiguous string may express either value of pi and therefore trigger either value of pi. c. A weakly p-ambiguous string expresses neither value of pi and therefore triggers neither value of pi.

Our example in () illustrates strong p-ambiguity in relation to Parameter E of Chapter , the parameter which determines whether a language has multiple wh-movement; on the basis of sentences with just one wh-phrase, the acquirer cannot tell whether multiple wh-movement is possible or not. Strong p-ambiguity is arguably linked to reanalysis. We might suppose that reanalysis takes place given a class of strongly ambiguous strings in relation to a particular parameter in a given corpus, and where a simpler representation is associated with one value rather than the other. In the example involving French interrogative ti given above, the relevant strings are rendered p-ambiguous with respect to subject– clitic inversion (T-to-C movement; the ‘residual’ version of the V parameter of §...) by the phonological option of /l/-deletion or selection of an underlying form lacking final /l/. Since the reanalysed structure is simpler than the earlier one (see () and the following discussion, as well as notes  and  on /t/-epenthesis), this is the preferred structure. So p-ambiguity and the simplicity preference are what drives reanalyses, seen as surface manifestations of parameter change. We can give a more extended example of how this approach works with the loss of V-to-T movement (Parameter B of Chapter ) in ENE. As we saw in Chapter , examples like the following (repeated from () of §...) indicate that V moves to T at this period:



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

()

a. if I gave not this accompt to you [ENE] ‘if I didn’t give this account to you’ (c.: J. Cheke, Letter to Hoby; Görlach : ; Roberts : ) b. The Turkes . . . made anone redy a grete ordonnaunce ‘The Turks . . . soon prepared a great ordnance.’ (c.: Kaye, The Delectable Newsse of the Glorious Victorye of the Rhodyans agaynest the Turkes; Gray : ; Roberts a: )

In terms of the notion of p-expression introduced above, we can say that examples like this express the positive value of the V-to-T parameter. On the other hand, at that time as at this, many very simple sentences, which must have been extremely prominent in the trigger experience, were p-ambiguous. A simple sentence such as John walks expresses either value of the V-to-T parameter, as illustrated by the two possible structures in (), and is therefore strongly p-ambiguous: ()

a. John [T walks] [VP . . . (walks) . . . ] b. John T [VP walks]

Furthermore, following the change in status of the modal auxiliaries and do (which appears to have taken place slightly earlier in the ENE period than the final loss of V-movement over negation; see Roberts a: ff.; Warner : –; Haeberli and Ihsane ; §...; and below), any simple sentence containing a finite auxiliary was weakly p-ambiguous regarding the V-to-T parameter, assuming the auxiliary was in T: ()

a. I may not speak. b. I do not speak.

So we see that there was much p-ambiguity regarding the V-to-T parameter in sixteenth-century English. However, this ambiguity existed, albeit in a slightly different form, in ME too. () was strongly p-ambiguous prior to reanalysis of modals and do as auxiliaries, since these elements were at that stage main verbs. In this connection, the fact that dummy do became an auxiliary at the same time as the modals (see Denison  and Roberts a: ) played a very important role. This is the case because in the sixteenth century do could appear in



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

positive declaratives, as shown in () (both examples are from Shakespeare’s Richard III, discussed in Barber : ): ()

a. Where eyes did once inhabit. b. Thou didst receive the sacrament.

[ENE]

In fact, do could seemingly appear in any context, except where a modal or aspectual auxiliary was present. Thus, do was always available as an alternative to verb-movement. In particular, this meant that there was always a non-V-movement alternative to constructions like (), which otherwise expressed the positive value of the V-to-T parameter. We still have to ask why it is that the p-ambiguity of examples like () and () only became crucial in the sixteenth century. In other words, what prevented this p-ambiguity from leading to a reanalysis of V-to-T movement structures, and hence a resetting of the V-to-T parameter, prior to this time? One answer has to do with morphology. Southern varieties of English lost a large part of their verbal agreement morphology in the latter part of the fifteenth century.8 For example, Gray (: ff.) gives the agreement paradigms shown in Table . Table 2.1. Verbal agreement inflection in Middle English 



cast-(e)

cast

cast-est

cast-est

cast-eth

cast-eth

cast-e(n)

cast-(e)

cast-e(n)

cast-(e)

cast-e(n)

cast-(e)

8 Northern varieties, notably including Older Scots (spoken and written in the Kingdom of Scotland in the fifteenth and sixteenth centuries—see Derrick McClure ) had rather different paradigms from the OE ones. By the sixteenth century, these paradigms were apparently invariant, although they were already subject to what may have been a precursor of the modern Northern Pronoun Rule, in that the agreement endings disappeared in certain persons where the subject was non-pronominal; see Roberts (a: ff.) and the references given there; Jones () on Scots varieties from a synchronic and diachronic point of view; Henry () on the variant of the Northern Pronoun Rule found in presentday Belfast English (a variety which derives from Older Scots); and Jonas () on the present-day Shetland dialect of English. See also Tortora and den Dikken (); Buchstaller et al. (); and references given there.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

for the present tense of the verb cast in East Midlands English at the beginning and end of the fifteenth century. Shortly after , what remained of the plural agreement marking was lost. This development is striking in that many authors have observed a correlation between the ‘richness’ of verbal agreement inflection and the positive value of the V-to-T parameter, notably in a range of Scandinavian languages and dialects. Vikner (: ) sums up the relation as follows: ()

An SVO language has V-to-T movement if and only if person morphology is found in all tenses.

This generalization (and its precursors; see the very thorough discussion of these in Vikner ) has become known as the Rich Agreement Hypothesis (RAH). It has been criticized for being empirically too strong. There appear to be a number of varieties in which verbal inflection has disappeared or nearly disappeared, but which nevertheless continue to show V-to-T movement. Well-known cases are the Kronoby dialect of Swedish (spoken in Finland), the Norwegian dialect of Tromsø, and Middle Scots (but see note  on the latter): ()

a. He va bra et an tsöfft int bootsen. [Kronoby it was good that he bought not book-the Swedish] ‘It was good that he didn’t buy the book.’ (Platzack and Holmberg : ) b. Vi va’ bare tre støkka før det at [Tromsø Nor.] we were just three pieces for it that han Nielsen kom ikkje. he Nielsen came not ‘There were only three of us because Nielsen didn’t come.’ (Vikner : note , ()) c. Quhy sing ye nocht, for schame! [Middle Scots] why sing you not, for shame (Anon. The Unicornis Tale; Gray : ; Roberts a: )

In (a,b) we see that the finite verb in the embedded clause precedes negation, in (c) the verb is inverted to C. These examples are therefore equivalent to sixteenth-century English examples like (a), and are taken to indicate that V moves to T in these varieties. However, these varieties have no subject-verb agreement at all (like the standard 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

Mainland Scandinavian languages, which lack V-to-T movement). As Thráinsson () points out, this indicates that a biconditional statement of the type in () cannot be right. See also Bobaljik () and Alexiadou and Fanselow () for further arguments and evidence against a strong, biconditional correlation of the kind in (). On the other hand, Koeneman and Zeijstra () defend precisely such a strong, biconditional version of (), which they state as follows: ()

A language exhibits V-to-I movement if and only if the regular paradigm manifests featural distinctions that are as rich as those featural distinctions manifested in the smallest pronoun inventories universally possible (Koeneman and Zeijlstra : ).

Koeneman and Zeijlstra argue that the smallest pronoun inventories distinguish [speaker] (first-person vs. second- and third-person), [participant] (first- and second-person vs. third) and [plural]. They show that many languages which have been analysed as having V-to-T movement, e.g. Icelandic, French, and others (see §...), make such distinctions in regular verbal paradigms while languages lacking V-to-T movement like English and Danish do not. They also argue that the correlation between the morphological property defined in () and syntactic verb-movement is rooted in language acquisition (see also Roberts a: f., , on the relation between ‘rich’ agreement and null subjects and/or verb-movement). Heycock and Sundquist () argue against Koeneman and Zeijlstra’s conclusions using diachronic evidence from Danish. Partly on the basis of earlier work by Sundquist (, ), they conclude that ‘V-to-I persisted for more than two centuries, and multiple generations, after the loss of rich agreement in Danish’ (Heycock and Sundquist : ; see also Heycock and Wallenberg ). There is similar evidence from elsewhere in the history of the VO Germanic languages, including English; see below. Tvica () is a typological study intended to investigate the wider cross-linguistic validity of (). Following the typological sampling methodology of Rijkhoff, Bakker, Hengeveld, and Kahrel (), Tvica examined twenty-four non-Indo-European, non-OV languages from eleven language phyla9 for rich agreement (defined, following 9 A language phylum is equivalent to a traditionally recognized language family (Rijkhoff, Bakker, Hengeveld, and Kahrel : ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Koeneman and Zeijlstra, as showing distinctions in three persons and two numbers) and verb-movement, diagnosed in terms of whether the order V>adverb>object is attested or not, with careful controls on the type of adverb (in essence, only manner adverbs are considered; see the discussion in Tvica : –). The general conclusions are summarized in the following quotation: While the investigated languages vary greatly across different phyla with respect to the syntax of finite verbs as well as the morpho-syntactic and morpho-phonological properties of agreement, careful scrutiny of the interaction (or absence thereof) between the two variables leads to the conclusion that once we control for other displacement phenomena, the Rich Agreement Hypothesis (RAH) makes correct predictions in most languages, and cannot be falsified in others. (Tvica : )

In particular, the biconditional statements (), () appear to be confirmed, or at least not disconfirmed, by Tvica’s results. Given the nature of Tvica’s investigation, diachronic evidence involving a possible correlation between the loss or gain of one of the two variables could not be taken into account. One way to approach this issue, particularly in relation to diachronic change, is to hypothesize that morphological paradigms of certain types may express parameter values (see Roberts : ). For example, let us restate Vikner’s generalization in () as follows: ()

If (finite) V is marked with person agreement in all simple tenses, this expresses a positive value for the V-to-T parameter.

The statement in () differs from Vikner’s generalization in two important ways. First, it is a statement about the expression of a parameter, and thus ultimately about the trigger experience, rather than being a statement about something internal to UG (recall that both Koeneman and Zeijlstra  and Roberts a:  consider that the correlation must be rooted in some aspect of language acquisition). In other words, it represents ‘a choice on the basis of surface cues from among the limited set of possibilities provided by Universal Grammar’ in the words of Anderson (: ), who criticizes approaches of the type put forward by Vikner in which morphology determines syntax. Second, it is a one-way implication; it allows for languages with a positive value of the V-to-T parameter but without verbal agreement inflection, just as has been observed in varieties such as Kronoby Swedish, Tromsø Norwegian, Early Modern Danish, and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

Middle Scots. Thráinsson (: ) similarly proposes a one-way implicational relation between V-to-T movement and the relevant verbal agreement inflection; see also Bobaljik and Thráinsson (). The generalization in () applies to the past, too. As already mentioned, Middle Scots had a seemingly invariant verbal agreement paradigm (with the complication mentioned in note ) and yet allowed V-to-T movement, as we saw in (c), and cf. also Heycock and Sundquist () on Middle Danish. So we can conclude that, prior to  or shortly afterwards, verbal agreement inflection in Southern varieties of English expressed a positive value for the V-to-T parameter. Interestingly, if we treat V-to-T as an undifferentiated option (i.e. verb-movement or not), there seems to have been a delay between the loss of agreement marking and the loss of V-to-T movement, in that verbal inflection is lost approximately seventy-five years before V-to-T movement is lost. However, if we take the results of Haeberli and Ihsane () as summarized in §... into account, then the loss of V-to-T movement correlates closely to the loss of agreement inflections, but V-to-H movement (in terms of the schema in §... ()) remained. Thráinsson (: –) shows that, as far as can be ascertained, there is an apparent delay between the loss of agreement and the loss of V-to-T in the Mainland Scandinavian languages; as we have seen, Heycock and Sundquist () confirm this for Danish and explicitly argue that Koeneman and Zeijlstra’s conclusions are too strong. All this is consistent with the one-way implicational statement in (). What remains to be seen is whether the loss of V-to-T in Mainland Scandinavian falls into two stages as it does in English. Warner (: –) divides the chronology for the loss of V-to-T in English into four periods. In Period  (up to c.), T attracts V, due to its agreement morphology, as we have seen. In Period  ( – roughly ), T loses the attraction property and variation ensues as data like () triggers V-to-T, but the evidence of modals and do in T does not favour this, being weakly P-ambiguous in relation to this parameter. In Period  (c.–), V-to-T movement is no longer found, but there are lexical exceptions (mainly know and doubt) which continue to show the older pattern. Finally, in Period , from  on, V-to-T movement of main verbs no longer occurs. Note that Period  corresponds closely to the period Haeberli and Ihsane () isolate as involving V-movement to a head intermediate between T and V (see §...). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

The shift from Period  to Period  is the crucial one in the present context. If this line of reasoning is correct, the loss of morphological expression of the V-to-T parameter created the strong p-ambiguity needed for a reanalysis of the following kind (() could easily be reformulated in terms of V-to-H movement in the innovative structure): ()

[TP John [T walk-eth] . . . [VP . . . (V) . . . ]] > [TP John T . . . [VP . . . [V walks]]]

Following Warner, this reanalysis led to variation for a period, but favoured the innovative, structurally more economical grammar. The reanalysis manifests a change in the V-to-T parameter, which, as we saw in §.., is associated with a cluster of properties: main-verb inversion in questions, V–not order, and possibly also pronominal object shift and transitive expletive constructions (V–adverb–object order was lost earlier, roughly at the end of Warner’s Period , according to Haeberli and Ihsane ; see again §...). Postulating that the morphological expression of the parameter played a crucial role in preventing the earlier reanalysis effectively deals with the Regress Problem in this instance. Nevertheless, we can ask whether this is a really principled and general answer to the problem. In particular, this solution to the Regress Problem shifts it to morphology, rather as the answer proposed in the case of the development of French ti sketched in the previous subsection shifted it to the phonology. Furthermore, although Haeberli and Ihsane’s account gives us a chronological connection between the loss of agreement morphology and the change in the V-to-T parameter, it raises the tricky question of the nature of and the trigger for V-to-H movement: here the morphological expression of the V-to-T parameter is lost, and yet limited verb-movement appears to remain for a generation or two at least. Of course, if Haeberli and Ihsane are right, what we are witnessing here is V-movement to a head lower than T; such V-movement need not be connected to morphological expression in the form of agreement if the relevant target H is not associated with agreement features, but then of course the further question arises as to what H is and what the p-expression for V-to-H is. If Warner is right in saying that there was variation at this period, we may be able to appeal to mechanisms of ongoing change of the kind we will discuss in more detail in §. and §., in particular a version of Kroch’s () notion 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

of competing grammars. But it is important to see that what we have just sketched above, while clearly indicating the relations among p-expression, reanalysis, and parameter change, undoubtedly leaves important questions open.10 One currently spoken Scandinavian language appears to be at a late stage in the loss of V-to-T movement at present: Faroese. The careful investigation of the incidence of V-to-T movement in this language by Heycock et al. (, ) leads to the conclusion that it is likely that the loss of V-to-T is ‘currently at the very tail end of the ‘S-shaped curve’ that characterizes replacement of one variant by another’ (Heycock et al. : ; we will discuss the S-curve in §., see Figure .). The reasons for this are, first, that unambiguous examples of V-to-T, illustrated by verb>negation order in non-V subordinate clauses are found, although truly unambiguous examples are rare since embedded V is fairly liberal in Faroese (as it is in Icelandic; see §..., note ), a property we may be able to connect to Wolfe’s (a) notion of ‘Fin V’ discussed in §.... The textual evidence in Heycock et al.’s () tables – indicates this and shows a clear contrast with Danish, where verb>negation order is categorically impossible outside V contexts. Second, extensive testing of native-speaker intuitions in Faroese reveals a certain acceptance of verb>negation order in non-V environments, again in sharp contrast to the judgements of native speakers of Danish on comparable data sets. Third, the absence of a discernible effect of the age variable on Faroese native-speaker intuitions, which at first sight might seem to argue against ongoing change, favours the idea that the loss of V-to-T is almost, but not quite, complete. As Heycock et al. (: ) say: ‘as a change goes to completion, the rate of change slows, and generational differences are expected to become less and less marked’. Finally, the verbal-agreement paradigms of Faroese are significantly reduced as compared to Icelandic (which clearly has V-to-T 10 One of these is the technical question of why morphological marking of agreement on V should be associated with attraction by T. In terms of the theory assumed here, basically that of Chomsky (; ), T and V Agree for tense features, in that T has interpretable tense features and the morphology on V encodes uninterpretable tense features. T has uninterpretable agreement features, but these can only be valued against DPs. Hence it is unclear why V should be attracted to T (i.e. why we should have Move and Agree holding between V and T) just when V has rich morphological marking of agreement. One possibility may be to appeal to the nature of the morphological marking of tense and aspect rather than agreement; different versions of this approach are pursued by



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

movement; see Table . of Chapter ). Following Thráinsson (), Heycock et al. (: ) conclude that morphological reduction has led to a situation in which the morphological trigger for V-to-T has been lost and other factors have become crucial, leading to a certain amount of variability in native-speaker judgements and a low incidence of unambiguous cases of the relevant orders. The situation in contemporary Faroese is in many respects directly comparable to what we observed above regarding ENE. There are two important differences between ENE and contemporary Faroese, however. First, ENE, by the seventeenth century, had developed a class of auxiliaries (modals and do) which undermined a great deal of potential evidence in negation and inversion contexts for V-to-T, as we saw above. On the other hand, Faroese, like all the other Germanic languages, lacks an English-style auxiliary system.11 Second, Faroese is V, unlike ENE, and in fact a V language which is fairly liberal regarding the incidence of V in various kinds of embedded clauses: verb>negation orders in these contexts are strongly p-ambiguous between a positive value for the V-to-T parameter and a positive value for what we can loosely call the ‘liberal embedded-V parameter’. As Heycock, Sorace, and Svabo Hansen (: ) observe, an embedded-V analysis of sentences with verb>negation order (which may have been generated by the conservative V-to-T grammar) could allow speakers of the innovative grammar to accommodate the earlier grammar: here the fact that such examples are strongly p-ambiguous plays an important role in, as it were, ‘diverting’ otherwise crucial aspects of the trigger experience. Heycock and Wallenberg () apply Yang’s (a,b, , ) variational model of language acquisition to modelling the loss of V-toT in a language with embedded V, using quantitative data from Icelandic and Swedish corpora. The central idea of this model is that parameter values are associated with a probability weighting that changes during acquisition in response to the input. The weighting of a given parameter value vi changes as a function of the evidence in favour of vi: the more evidence in its favour, the more it is rewarded. We could naturally construe this idea in terms of the incidence of unambiguous p-expression of vi in the PLD. In terms of Yang’s 11 Although Heycock et al. () show that Faroese-acquiring children do produce more verb>negation orders where the verb is actually an auxiliary.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

REANALYSIS

model, the question is which of two grammars, one with V-to-T and one without V-to-T but with embedded V, ‘wins’ and emerges as preferred through acquisition (in Yang’s terms, which is the fitter grammar). It emerges that the grammar without V-to-T wins, without this being an inherent preference (or an unmarked parameter setting; see §.). Heycock and Wallenberg (: ) conclude that ‘a V-to-T grammar could only be truly stable diachronically if it showed some evidence to the learner other than word order, e.g. agreement morphology’. Again we see that ‘rich’ agreement morphology expresses a positive value of the V-to-T parameter. Generally speaking, the Faroese situation is not unlike the one Warner suggested for Period  of ENE described above. Moreover, as we shall see in §., we expect variation of this type as a change is ongoing (even when very close to completion as seems to be the case with Faroese V-to-T); this lends support to the competing-grammars idea. The close comparison of contemporary Faroese with ENE may well be able to tell us a lot about the details of ongoing changes. 2.1.5. Reanalysis and the poverty of the stimulus One final point before we leave the discussion of reanalysis. It may seem at first sight that the scenario for abductive reanalysis that we have described is actually inconsistent with the argument from the poverty of the stimulus, in that abductive reanalysis is precisely a case where children do not necessarily converge on the grammar underlying their trigger experience. This point becomes clear if we reconsider part of the quotation from Hauser, Chomsky, and Fitch () given in the Introduction to Chapter : A child is exposed to only a small proportion of the possible sentences in its language, thus limiting its database for constructing a more general version of that language in its own mind/brain. This point has logical implications for any system that attempts to acquire a natural language on the basis of limited data. It is immediately obvious that given a finite array of data, there are infinitely many theories consistent with it but inconsistent with one another. In the present case, there are in principle infinitely many target systems . . . consistent with the data of experience, and unless the search space and acquisition mechanisms are constrained, selection among them is impossible . . .

Under abductive reanalysis, the child does in fact construct for itself a system which is consistent with the data from its experience but which 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

is not exactly consistent with that underlying the trigger experience, as we have seen. But the important thing is that the search space and the acquisition mechanisms are highly constrained, and so reanalyses, although possible and actually attested (if the view of syntactic change being sketched here is correct), do not vary ‘wildly’ over just any imaginable possibilities. Instead, they appear to be of a rather limited type: it has often been observed that syntactic changes fall into fairly well-defined patterns; see Harris and Campbell (, ch. ) for an overview of various approaches to syntactic change. Here we have adopted the approach to parametric variation based on the BCC given in §.. (), repeated here: ()

All parameters of variation are attributable to differences in the features of the functional heads in the Lexicon (Borer : –; Chomsky : ; Baker a: f.)

In these terms, parameter change can only involve the formal features of functional categories. To the extent that reanalyses reflect the rather limited range of parametric options made available by the BCC, they too can only involve the reflexes of different choices of formal features of functional heads, as these are manifested, usually through differing options of movement and Agree. Furthermore, as we shall see in §. and §., acquirers can discern and reproduce variation and optionality in the PLD in their internalized grammars. For this reason, studying them may eventually shed light on an important aspect of linguistic theory. 2.1.6. Conclusion In this section, I have introduced the central concept of reanalysis, largely following Harris and Campbell’s () definition. I have suggested that reanalysis is symptomatic of underlying parametric change, and that it results from the abductive nature of language acquisition. I identified two problems which have often been discussed in the literature, which I called the Regress Problem and the Chicken-andEgg Problem. These problems were discussed in relation to examples of reanalysis from the literature. Furthermore, we saw how an evaluation metric such as Yang’s () Tolerance Principle may provide a way to solve them. I will return to these matters in §., where we will see that the Regress Problem falls under the more general logical problem of 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRAMMATICALIZATION

language change. In §., where we discuss the nature of the trigger for parameter values in more detail, we will come back to the Chicken-andEgg Problem. Bearing all this in mind, we turn in the next section to a well-known and highly pervasive type of syntactic change: grammaticalization.

2.2. Grammaticalization Grammaticalization can be defined as the process by which new grammatical morphemes are created. The term was first coined by Meillet (), although, as Harris and Campbell (: ) point out, the notion certainly predates the introduction of the term; see Hopper and Traugott (: ff.) for a discussion of the history of the nineteenthcentury antecedents of the concept. Since the s, grammaticalization has been the focus of much attention in the typological and functional literature on syntactic change. (See in particular Heine and Reh ; Lehmann , ; Heine, Claudi, and Hünnemeyer ; Traugott and Heine ; Heine et al. ; Bybee, Perkins, and Pagliuca ; Hopper and Traugott ; the compendia of cases of grammaticalization in Heine and Kuteva , the Narrog and Heine  handbook; and Kuteva, Heine, Hong, Long, Narrog, and Rhee .) In more formal approaches to syntax, there has been work by Roberts and Roussou ( ); van Kemenade (); Wu (); Simpson and Wu (); Munaro (); Tremblay, Dupuis, and Dufresne (); Batllori et al. (), and in particular van Gelderen (, , , , ). Here I will focus on the formal approaches to grammaticalization presented in Roberts and Roussou (, ) and in van Gelderen (, , , , ) as these approaches clearly illustrate how the phenomenon may be reduced to reanalysis and associated parameter change. In terms of the kind of theory of syntax being assumed here, in which many grammatical morphemes (complementizers, auxiliaries, determiners, etc.) are seen as exponents of functional categories, the idea that grammaticalization involves the creation of new grammatical morphemes implies that grammaticalization frequently involves the development of new exponents of functional categories. Given the BCC (see ()), the formal features of functional heads are the locus of parametric change, i.e. able to trigger the cross-linguistically varying 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

properties of Agree and Move. In these terms we can see how creating a new exponent of a functional head F may involve changing parametric options involving Agree and Move by introducing new features associated with F. A frequently discussed example of grammaticalization involves the history of French negation (Jespersen ; Foulet ; Hock ; Déprez , ; and especially Willis, Lucas, and Breitbarth, henceforth WLB, , Breitbarth, Lucas, and Willis ). In §.., we saw how a number of what are now n-words in Modern French developed from formerly positive expressions: aucun (formerly ‘some’, now ‘no’), rien (formerly ‘thing’, now ‘nothing’) and personne (formerly ‘person’, now ‘no-one’). We also mentioned this change may have correlated in the seventeenth century with the development of the ne . . . pas pattern as the standard form of clausal negation, with pas bearing the interpretable Negation (iNeg) feature from that time on. We saw how these changes can be accounted in terms of the general approach to negation and negative concord proposed in Zeijlstra (, ). What we did not discuss there is the origin of pas. This word comes from the noun meaning ‘step’, which still exists in contemporary French. It was grammaticalized as a negative marker at the relevant stage in the history of French. The development of the two-part clausal negation is part of a series of changes first observed by Jespersen () which have become known as Jespersen’s Cycle. They can be illustrated for both French and English as follows: ()

Stage I: a. OE: b. OF: Stage II: a. ME: b. Standard French:

ic ne I  jeo ne I 

secge say dis say

ne  je ne 

seye say dis says

I

Stage III: a. ENE: I say not b. Colloquial French: je dis pas 

not  pas 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRAMMATICALIZATION

As () shows, both French and English illustrate the ‘cyclic’ development from a preverbal negative marker, to a combination of pre- and postverbal marking, to a final stage where only the postverbal marking survives. Stage III represents ENE, and, as we have seen, English then developed a rather different pattern of Negation involving the auxiliary do; this was linked to the change in the V-to-T parameter which we discussed in the previous section. Jespersen’s Cycle, perhaps involving more than just the three stages shown in (), has now been shown to have occurred in many languages in addition to English and French. Zeijlstra () documents the various stages of the cycle in Dutch and its dialects, Slavonic, Greek, Romanian, Hebrew, Hungarian, Italian, Spanish, Portuguese, German, Swedish, Norwegian, Quebec French, Bavarian, and Yiddish. Van Gelderen (: –) analyses Old Norse, Modern Scandinavian, Finnic, Sami, Kamassian, Berber, Arabic, Amharic, Koorete, Hausa, Athabascan, Eyak, Tlingit, Haida, Chinese, and various creoles. WLB () present very detailed case studies of French, English, Italo-Romance, High and Low German, Dutch, Brythonic Celtic, Greek, Slavonic, Arabic, various other Afro-Asiatic languages including Berber and Coptic, and the Uralic language Mordvin. So it is very clear that Jespersen’s observations represent a significant and prevalent pattern of morphosyntactic change, and one with import for semantics since it concerns the expression of negation. Moreover, there appears to be a significant and consistent interaction with negative concord, negative quantifiers and negative polarity items (see §. for discussion of these phenomena). Looking again at (), the transition from Stage II to Stage III may at first sight not seem to be of great interest from a syntactic point of view, since it appears to involve just the loss of an unstressed element. However, we saw in §.. that Jäger () links the loss of the preverbal negator to the loss of negative concord in the history of German, i.e. to a change in the value of Parameter D, introduced in () of Chapter . Moreover, Zeijlstra (: , , , –) explicitly makes this connection. Zeijlstra (: ) formulates this as follows: ‘All languages with a negative marker X are NC languages.’ This is a one-way implication, in that it allows for NC languages which lack a negative marker of the relevant kind. What is ruled out, though, is a non-NC language with negative marker which is a head.12 12 Standard English may be a problem for this. While many nonstandard varieties of English exhibit negative concord, Standard English does not exhibit negative concord



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

The change from Stage I to Stage II, on the other hand, involves the grammaticalization of the negative element. Here I will briefly summarize the development of the French negator point, as described in Roberts and Roussou (: ff.), since this seems to represent a fairly typical case (see also Déprez , ; Breitbarth, Lucas, and Willis ). The reason for choosing point rather than pas is that the development of this element is in certain respects more interesting for our conception of grammaticalization than that of pas; recall also that point remains an alternative form of sentential negation alongside pas, at least in literary varieties of French. The negator point developed from a noun meaning ‘point’; this noun was borrowed into English, and survives in Modern French as a masculine noun. The negator point, on the other hand, is not a noun in contemporary French; it lacks grammatical gender and other nominal features such as number. This element occupies the same position as pas, i.e. it follows a finite verb and precedes a non-finite verb (see §...); in fact, it is simply a stylistic alternative to pas in the relevant registers of contemporary French: ()

a. Jean ne mange point de chocolat. John  eats not of chocolate ‘John does not eat chocolate.’ b. ne point embrasser Marie, . . .  not to-kiss Mary . . . ‘not to kiss Mary . . . ’

[French]

In order to understand the change that converted point from a noun into a clausal negator, we need to take a closer look at the internal structure of DPs and at the development of the French article system. Let us suppose, following initial proposals by Bernstein (, ), Ritter (), Zamparelli (), among many others, that the structural complement of D is not in fact NP, but rather a further functional category indicating Number, i.e. NumP, as shown in ():

between its n-marked indefinites and the negator n’t, which certainly appears to be a head. Zeijlstra (: –) suggests that Standard English is ‘underlyingly an NC language’.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRAMMATICALIZATION

DP

()

D

NumP

Num

NP

Furthermore, we can assume that the usual postnominal position of adnominal APs in both Old and Modern French is a reflex of the fact that nouns in general move to Num, with adnominal APs adjoined to NP (cf. Longobardi a, : –; Crisma and Longobardi ; and the references given there), as in:

DP

()

D un a

NumP

Num chevalier knight

NP

AP preu noble

NP (chevalier)

‘a noble knight’ Num is also the position for certain quantifiers, as argued by Zamparelli (); see also Crisma and Longobardi (: –), and references given there. In these terms, we can understand the development of a class of n-words in French (including point, but also aucun, rien, personne, etc.) as involving the loss of N-to-Num movement for these items and their reanalysis as exponents of Num. This can explain the change in distribution of these elements and the loss of phi-features; after the loss of N-to-Num movement these elements are no longer 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

nouns and so, we may assume, they can no longer enter the relevant Agree relations with Num and D.13 How did this change take place? Here we follow Déprez’s (, ) analysis. Déprez observes that Modern French DPs almost always require an article. In particular, sentences like () are ungrammatical if no article is present: ()

Jean a mangé *(des) pommes. John has eaten (some) apples.

[French]

Déprez further observes that this was not the case in earlier French. As we saw in §.., null Ds are found in OF with singular mass nouns and with bare plurals, much as in English or in other Romance languages (see Longobardi (a) on the latter point): ()

a. Si mengierent pain et burent cervoise. [OF] so they-ate bread and drank beer ‘So they ate bread and drank beer.’ (Gr. , –; Foulet : ) b. En me bourse grande a il deniers a grant in my purse big has there coins in great planté. plenty ‘In my big purse there is money in great plenty.’ (Av. –; Foulet : )

We see then that French has lost a class of null indefinite determiners; these were replaced by the indefinite article un(e), the ‘partitive article’ du, de la, des, and, for generic plurals, the plural definite article les. In this connection, Déprez (: ) points out that ‘an attractive conjecture is that the use of bare rien and personne in environments from which bare NPs gradually disappeared, survived by . . . undergoing incorporation into the obsolete empty indefinite determiners which preceded them’. Roberts and Roussou (: ff.) develop Déprez’s conjecture by supposing, following Longobardi (a, , ; Crisma and Longobardi ), that Ds give nominals their referential properties, i.e. their ability to refer to objects or sets of objects in the 13 Actually aucun turned into a D and so retained phi-features. D may be the only position in DP where phi-features are systematically marked in Modern French (Harris : –).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRAMMATICALIZATION

world. Once the OF null indefinite Ds were lost (which was presumably due to the extension of the use of the indefinite and ‘partitive’ articles; see Foulet (: ff.) on these developments), DPs with null Ds could no longer be referential. Words such as rien, personne, and point, as well as a small number of others including chose (‘thing’) and âme (‘soul’) (Foulet : ff.), which for a time were also negative expressions, could remain in such DPs, but had to be interpreted as non-referential quantifiers occupying Num. The fact that Nouns like rien and personne denoted ‘generic’ entities (‘thing’, ‘person’) clearly helped in their reanalysis as quantifiers; in this respect point is rather different, being originally a Noun denoting a ‘minimal quantity’ (a ‘minimizer’ in the terminology of Bolinger ; Horn ), a point I return to below. This accounts for how these words ceased to be nouns, but it does not account for how they became negative (i.e. took on a uNeg feature in terms of what was proposed in §.., following Zeijlstra , ). Roberts and Roussou suggest that this change is bound up with the loss of null indefinite Ds in French, as mentioned above, along with the development of a null negative D in examples like () (see Kayne : ff. for an analysis of the DP bracketed in () as containing a null negative determiner): ()

Jean n’a pas mangé [DP e de pommes] John -has not eaten of apples ‘John has not eaten (any) apples.’

[French]

This is the only case of a null D with common Nouns in Modern French, and it is negative.14 In OF, this construction did not exist; see the detailed discussion in Foulet (: ff., ff.). Instead, a singular negative indefinite typically lacked an overt article altogether:

14 Except in indefinites with the ‘partitive article’ where the head Noun is premodified by an Adjective: (i)

a. J’ai acheté du pain. I’ve bought of-the bread ‘I’ve bought some bread.’ b. J’ai acheté de bon pain. I’ve bought of good bread ‘I’ve bought some good bread.’

[French]

Kayne (: ) suggests that this de is an article, rather than there being a null determiner or a quantifier.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

()

a. je ne nourriroie [DP trahitor]. [OF] ‘I would not feed [a] traitor.’ (Ch. –; Foulet : ) b. [DP Offrande] hui mais n’i prenderai. offering today more not-there I-will-take ‘I will take no more offerings today.’ (F. ; Foulet : )

This construction is slightly different from that in () as it features singular count nouns in negative contexts, while in () we have mass or plural nouns in non-negative contexts. The article-less DPs in () are non-specific indefinites. Roberts and Roussou follow Foulet (: ff.) in taking the development of the null negative D, associated with de, as being caused by the same reanalysis as that which created clausal point. (a) is an example of point in a positive context (albeit an if-clause, and as such a context for negative-polarity items—§..) and (b) is an example of point in a negative context. In both cases, it is followed by partitive de: ()

a. Ja por rien nel te deïsse [OF] already for nothing not-it you I-would-say se point de ton bien i veïsse. if bit of your goods there would-see ‘I would not tell you if I saw the smallest piece of your goods.’ (P. –; Foulet : ) b. cel aweule la qui n’a point d’argent that blind-man there who not-has bit of money ne de houce ausi nor of clothes too ‘that blind man who doesn’t have a single bit of money nor clothes’ (Av. –; Foulet : )

In the examples above, the verb is transitive, and point can be interpreted as the head of the direct-object DP taking a partitive PP-complement. Thus the relevant part of the structure of (b) would be as follows: ()

V [DP [DØ] [NumP [Num point] [NP (point) [PP d’argent . . . ]]]]

In this structure, point, like aucun, rien, and personne as discussed above, is reanalysed as merged in Num when the loss of the null 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRAMMATICALIZATION

indefinite D meant that referential Nouns were no longer legitimate in determinerless DPs. However, two things distinguish point from the other elements which were reanalysed from N to Num. The first is a semantic difference: point lacks the descriptive content susceptible of being reinterpreted as a negative quantifier, i.e. it does not have the ‘generic’ meaning of words like rien and personne, whose core meanings, ‘thing’ and ‘person’, were reinterpreted as the restriction on the negative quantifier after grammaticalization (see below). Instead, it is a minimizer in Bolinger’s sense. Second, point was able to be syntactically separate from the following de-phrase, as in (): ()

De contredit n’i avra point. of opposition not-there will-have bit ‘There will not be a bit of opposition.’ (P.,  and ; Foulet : )

[OF]

In examples like () the de-phrase satisfies the V constraint operative in OF (see §...).15 The syntactic separability of point from the dephrase combines with point’s lack of semantic content beyond ‘pure’ negation to create the circumstances for the reanalysis of point as a clausal negator, and thus the reanalysis of the DP headed by the null article as negative. This second reanalysis, which affected point but not rien and personne, can be schematized as follows: ()

a. ne V [DP [D Ønon-specific] [NumP [Num point] [NP d’argent . . . ]]] > b. ne V [Neg point] [VP [DP Ønegative d’argent]]

This is the origin of both the null negative determiner and the clausal negator point. The other clausal negators pas and dialectal mie (from the noun meaning ‘crumb’, another minimal quantity) underwent a similar reanalysis. As Foulet (: ) points out, once expressions like il n’y a pas d’argent (‘there is no money’) arise, the development of the negative de-phrase is complete, since these are etymologically absurd, i.e. they could not mean ‘there is not a step of money’, although negative pas derives from a noun meaning ‘step’. So we see how the development of the null negative determiner is connected with the development of clausal negator point. The result of 15 Whether the fronted constituent de contredit here is a PP, an NP or a DP is not clear, but not crucial for the point at issue.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

this development, combined with the loss of the null non-specific indefinite article of (), is that null Ds have a negative feature, i.e. uNeg, since the null D is effectively an n-word, in Modern French. Now, since rien and personne were the only nouns able to appear with a null determiner, they too became inherently negative. In terms of Zeijlstra’s () analysis summarized in §.., they took on a uNeg feature. In this way, all three developments—the development of clausal point, the development of the null negative D, and the development of rien and personne as n-words—are linked together by the loss of N-to-Num movement and the reanalysis in (). Breitbarth, Lucas, and Willis (: ff.) provide a finer-grained analysis of the stages of Jespersen’s Cycle (as does Zeijlstra : , –), based on the ten case studies in WLB (). First, they divide Stage I of () into two sub-stages. Stage Ia corresponds to Stage I of (), with ne (taking French as the paradigm case once again) as the single logical negator. To this they add Stage Ib, in which the paselement appears as an ‘optional pseudoargumental reinforcer’. By this they mean that pas (or equivalent, in the case of French point, mie, goutte, etc.) is, following Jespersen’s original insight, an optional ‘extra’ element adding ‘emphasis’, loosely speaking, to the negative force of the sentence. These elements are typically minimizers, expressing the lowest end of a pragmatic/semantic scale (Fauconnier ; Horn ; Chierchia ; Eckardt ). Breitbarth, Lucas, and Willis refer to them as pseudoargumental since they originate as the selected complements of verbs, but in many cases the verbs in question are optionally transitive, allowing these elements to begin to be interpreted as adverbial reinforcers rather than as optional arguments; another ‘bridging context’ in their terminology involves reanalysing a nominal with a partitive complement as a reinforcer (as we saw with point in (), () above). The transition from Stage Ib to Stage II involves reanalysing the pseudoargumental reinforcer as a Negative Polarity Adverb (NPA); a precondition for this is that the pre-existing negator (the ne-element) must have an uninterpretable negation feature, uNeg, in Zeijlstra’s (, ) terms. We saw in §.. that OF had strict negative concord, so ne was already uNeg. This transition involves the former pseudoarguments escaping the selectional restrictions of the verbs which originally selected them and thereby being reanalysed as adverbs. As minimizers, they ‘make the clauses in which they are used more informative, and focus the absence of alternatives or their 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRAMMATICALIZATION

improbability’ (Breitbarth, Lucas, and Willis : ); this captures the intuition that they emphasize or reinforce the negation. The reanalysis of these elements as NPAs requires semantic bleaching of the original nominal so that it can escape selection restrictions imposed by the verb of which it was originally a complement, along with syntactic changes in internal structure and external distribution, similar but (for Breitbarth, Lucas, and Willis) not identical to the kind of reanalysis seen in (). Breitbarth, Lucas, and Willis argue that at this stage these elements function as restrictors on indefinite quantifiers. They next become adjuncts, i.e. full-fledged NPAs. Stage IIb of the cycle involves their reanalysis as negative elements, initially possibly combined with an inherent Focus feature. Since the original negator has uNeg, these elements take on an interpretable negation feature iNeg and Op¬ disappears from the relevant contexts (see §..). Whilst the original negator is typically a clitic, or otherwise morphosyntactically bound to the finite verb, the new negator has greater syntactic freedom and is able to function as a constituent negator and as a narrow focus element. In Stage IIc of the cycle, the original negator loses its negation feature altogether and remains as an ‘optional remnant’ or is exapted into a different function. Finally, in Stage III the original negator disappears, leaving just the innovated negator carrying the iNeg feature. Breitbarth, Lucas, and Willis’s account of Jespersen’s cycle is broadly compatible with the account of the grammaticalization of point above, although it adds important details regarding semantic and pragmatic factors, as well as introducing and utilising the important notion of bridging contexts which favour reanalysis. So we see that this case of grammaticalization involves several reanalyses. The most important reanalysis, the one which fundamentally changes the nature of negation system, was caused by the ambiguous expression of the iNeg feature. In the older structure in (a) the iNeg feature is associated with Op¬, while in the innovative structure in (b) an iNeg feature is associated with point (see §..). The null negative D is a silent n-word, so it has a uNeg feature which Agrees with point (or pas), as an instance of negative concord. This reanalysis changed the value of Parameter D of Chapter , (), giving rise to a system of negative concord with multiple clausal negation.16 As we saw 16 However, it is difficult to be sure of the chronology. We saw in §.. that ne . . . pas became the obligatory form of negation in the seventeenth century. Personne became an



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

in §.., it also caused postverbal negatives other than pas/point to be reanalysed as n-words, making French a non-strict negative-concord language (recall that OF was a strict negative-concord language, as we saw in §..). As a typical case of grammaticalization, this change shows that we can regard grammaticalization in general as parameter change with associated reanalysis. The parameter change takes place when the p-expression is ambiguous and a reanalysis takes place. Like many grammaticalization cycles, the changes in negation systems illustrated by Jespersen’s Cycle are not strictly cyclic (see Hopper and Traugott : ; and, in particular, Van Gelderen , , , ), but we can observe an interesting series of apparently related changes that recur across languages. Roberts and Roussou () present a number of cases of the same type affecting the T-, C-, and D-systems; in each instance grammaticalization involves reanalysis triggered by ambiguous p-expression and associated parameter change. Van Gelderen (, ) explores and illustrates the nature of grammaticalization cycles in great detail; see also the papers in Bouzouita, Breitbarth, Danckaert, and Witzenhausen (). In van Gelderen (), she illustrates the following cycles across a wide range of languages and language families (see her table ., p. ; see also van Gelderen : , table , van Gelderen : , table .): ()

a. Subject agreement: demonstrative/emphatic/noun > pronoun > agreement > zero; b. Object agreement: demonstrative/pronoun > agreement > zero; c. Copula cycle: demonstrative > copula > zero; d. Case or Definiteness or DP: demonstrative > definite article > ‘Case’ > zero; e. Negative: i. negative argument > negative adverb > negative particle > zero (i.e. Jespersen’s Cycle); ii. verb > aspect > negative > C; f. Future and Aspect Auxiliary: A/P > M > T > C.

n-word in the seventeenth century, and Brunot and Bruneau (), cited in Déprez (: ), point out that the changes in the article system were not complete until that time. We may therefore tentatively continue to date the change as taking place at this period. Breitbarth, Lucas, and Willis (: figure .) give the beginning of the seventeenth century as the transition point from Stage I/II to Stage II in French. See however Chapter , note , for some indication that this chronology may not be fully correct.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRAMMATICALIZATION

From the fact that several of the cycles end in zero, we see that they are not exactly cycles: the cyclic nature of the changes lies in the fact as one exponent traverses the stages illustrated, it is potentially ‘renewed’ by another element which enters the cycle and then ‘follows’ the original one through the various stages. In this sense, we can think of pas or point as ‘renewing’ ne in French as the clausal exponent of negation. Van Gelderen argues that cycles of the kind illustrated in () are a consequence of a general principle of Feature Economy, which she formulates as follows (Van Gelderen : ): ()

Minimize the semantic and interpretable features in the derivation.17

The typical pathway of change is as in (): ()

Adjunct Specifier Semantic > [iF] >

Head [uF] >

Ø Ø

As Van Gelderen points out, we see the pattern of change in () in Jespersen’s Cycle, where Adjunct corresponds to Breitbarth, Lucas, and Willis’s () NPA, as described above and Specifier to the widelyheld idea, first proposed in Pollock () for French and adopted by Zanuttini (), Zeijlstra (), Van Gelderen (), WLB (: –), among many others, that pas-type negators occupy a Specifier position, usually simply called SpecNegP. The later reanalysis from Specifier to head also follows van Gelderen’s construal of Feature Economy (see below) and a change from interpretable to uninterpretable F. The disappearance of an exponent of a functional head (Head > Ø in ()) follows from a preference for acquirers to analyse overt 17 This version of Feature Economy is rather different from the one put forward in in Chapter , (), repeated here: (i) Postulate as few formal features as possible. Following Roberts and Roussou (: ), (i) is a simple feature-counting mechanism; see also the general discussion of evaluation metrics in §... Van Gelderen’s approach, on the other hand, penalizes certain types of features and so involves a more elaborate notion of economy. Whilst in itself this is not problematic (we mentioned in §.. that Chomsky and Halle’s () original feature-counting evaluation metric was formulated so as to count only marked features), one could wonder why semantic features, which play no role in the narrow-syntactic derivation, should be preferred to formal features, or why interpretable features, which do not imply an Agree relation with another feature bundle, should be preferred to uninterpretable features, which do. If anything, from the point of view of the cost to the syntactic derivation, the relative economy relations could be the exact opposite to what van Gelderen proposes.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

material as lexical rather than functional (Van Gelderen : , following Faarlund : ). Van Gelderen (: , : ) puts forward a further important principle, the Head Preference Principle (HPP): ()

Be a head, rather than a phrase.

Again, van Gelderen (: –) justifies this in terms of evidence from language acquisition. In more recent work, van Gelderen () connects this to Chomsky’s (, ) Labelling Algorithm; see also Dadan (). A further general principle that van Gelderen (: , –; : ) adopts is the Late Merge Principle: ()

Merge as late as possible.

() applies to External Merge. Late External Merge is preferred over early External Merge combined with movement (i.e. Internal Merge) to a higher position for economy reasons. In this way () leads to ‘upward reanalysis’ whereby an earlier movement operation raising an element from a lower position to a higher one is replaced by direct (external) Merge into the higher position; in this respect van Gelderen’s () derives the merge-over-move preference in diachronic change from economy in a way that converges with Roberts and Roussou’s () approach based on Feature Economy as defined in () of Chapter  (see also note ). However, van Gelderen’s approach is more general as it dissociates Late Merge from loss of movement in principle, although the two can and do coincide in many cases of grammaticalization; see van Gelderen (: ) for discussion. Although there are a number of differences, both Roberts and Roussou’s approach and van Gelderen’s share the idea that grammaticalization can be analysed in formal terms as reanalysis, typically ‘upwards’, leading to category change, driven by economy preferences of some kind which derive from third factors in the sense of Chomsky (); see Chapter , (). The approaches differ somewhat in how economy is to be construed (see note ), and van Gelderen places greater emphasis on cycles of renewal of grammatical material. But the two approaches have much in common. We have illustrated grammaticalization with arguably the best-known cycle, Jespersen’s Cycle, showing how Roberts and Roussou account for the grammaticalization of the pas-type negator (the transition from Stage I to Stage II in ()). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

This change is also linked to two parameter changes in French, affecting Parameters D and D of §.. (), bringing about the Modern French system of non-strict negative concord combined with multiple exponence of clausal negation. This also completes the account of the history of French negation presented in §... It seems clear that grammaticalization is a pervasive kind of syntactic change which is driven by central aspects of the design of the language faculty. A further important aspect of grammaticalization is that it is typically microparametric, in that it affects small numbers of lexical items, typically as exponents of functional heads; in fact, it generally creates new exponents of functional heads. We mentioned in §.. that Parameters D and D, which regulate the type of negative concord and the number of exponents of clausal negation, are microparameters. Again, Jespersen’s Cycle emerges as a canonical case of grammaticalization, which we can now see as pervasive, usually endogenous microparametric change driven by reanalysis.

2.3. Argument structure 2.3.1. Thematic roles and grammatical functions In this section, I turn to changes in argument structure, the way in which the participants in the action, event or state of affairs described by a predicate are realized in the structure of the sentences containing that predicate. An important distinction to be made in this connection is that between semantic (or thematic) roles such as Agent, Patient, Recipient, etc., and grammatical functions such as subject, direct object, indirect object, etc. Following a standard view in generative grammar, I take thematic roles to be primitives associated with each verb18 as a matter of lexical semantics (which is not to say that verbs do not fall into lexical classes; they do, as we shall see), and grammatical functions to be defined in terms of structural positions. Thus, the subject is the DP Specifier of TP and the direct object is the DP complement of V, for example. Although distinct, there are connections between thematic 18 Actually all lexical categories assign thematic roles, but I restrict attention here to verbs as this is the richest category in terms of thematic structure, and because the changes we will be looking at concern verbs.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

roles and grammatical functions: for example, Agents are always subjects (in active clauses), although subjects need not always be Agents, as the subjects of stative verbs like know, believe, and contain show. The relation between thematic role and grammatical function is specified lexically for each verb. As just mentioned, verbs fall into lexical subclasses. These can be defined in terms of the number of thematic roles they have and the way in which they distribute these. A thorough discussion of the verb classes of English can be found in Levin and Rappaport-Hovav ();19 here I will limit my attention to the one or two types which are relevant for the discussion of changes to follow. Traditional grammar recognizes a distinction between transitive and intransitive verbs (see Law :  on the origin of this notion); the former are verbs with two thematic roles (for example, eat, like, hit) while the latter are verbs with just one (for example, laugh, cough, fall, die). There are also verbs with three thematic roles, such as give, send, show. Some of these verbs can appear in what is often called the ‘double-object’ construction as in John sent Mary a letter (see Chapter , note ); below we will look at one change that has affected this construction in the history of English. Modern linguistic theory has established a distinction between two types of intransitives: unergatives such as laugh and sing, and unaccusatives such as fall and die (see Perlmutter ; Burzio ). In the former, the single argument of the verb is a true subject. (These verbs are usually agentive.) In the latter, the verb’s argument is first-merged as an object, and moves to the subject position. There is much crosslinguistic evidence for this distinction, although in English the evidence is rather indirect. The clearest indication of unaccusativity in English lies in the availability of a deverbal adjective formed from the verb’s participle: thus we have a fallen leaf, meaning ‘a leaf which has fallen’, but not a laughed man (meaning ‘a man who has laughed’); so we see that fall is an unaccusative verb and laugh is an unergative. For a thorough critique and discussion of this and other tests for unaccusativity in English, see Baker (b).

19 See also Borer (), Ramchand (), and Baker (a) for analyses of verb classes from a non-lexicalist perspective in which argument-structure distinctions are captured by structural differences in complex, iterated VPs (“VP-shells”; see Larson  and below).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

The distinction between thematic roles and grammatical functions can be observed when we compare agentive transitives with so-called ‘psychological’ verbs (henceforth psych verbs), i.e. those which describe a psychological event or state. Consider the following pair of sentences: ()

a. John reads the newspaper. b. John likes the newspaper.

In both of these examples, John is the subject and the newspaper is the direct object. However, while in (a) John is the Agent of the action described by read and the newspaper is the Patient of the action, in (b) John has the thematic role of Experiencer, the person of whom the psychological state described by like holds, and the newspaper is what that state is about, the Theme.20 Psych verbs, unlike action transitives, can in fact distribute their thematic roles ‘the other way around’, as it were, making the Theme the subject and the Experiencer the object: compare the newspaper pleases/amuses/annoys/appals John with (b). This possibility gives rise to doublets of psych verbs which are very close in meaning but which distribute their thematic roles differently, such as like/please, fear/frighten, etc. Many languages have a third psych-verb construction in which the Theme is the subject and the Experiencer is marked like an indirect object. This construction is restricted to one verb in present-day English, appeal (as in the newspaper appeals to John), but is cross-linguistically common. The (near) loss of this construction in the history of English is one important change in argument structure that we will look at below. Argument structure can be manipulated by syntactic operations. The best-known and probably most widespread such operation is the passive. In the passive of a transitive verb, the DP which corresponds to the direct object in an active sentence functions as the subject and the DP corresponding to the subject of an active sentence is either absent or appears in a by-phrase, so the passive of (a) is the newspaper was read (by John), for example. Double-object verbs passivize the first object,

20 The terminology associated with thematic roles is notoriously varied. I will attempt to use the most neutral labels possible, and hence use ‘Theme’ here. Pesetsky (: ff.) argues that there are in fact various thematic roles associated with what I am calling the Theme argument of psych verbs. See Landau () for discussion of Pesetsky’s proposals.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

which in fact corresponds to the notional indirect object, as the following examples illustrate:21 ()

a. b. c. d. e.

John sent Mary a letter. John sent a letter to Mary. Mary was sent a letter (by John). A letter was sent to Mary (by John). %A letter was sent Mary (by John).

The passive in (c) is known as the ‘recipient passive’. This construction has changed in the recorded history of English, as we shall see below. I follow the standard assumption in taking both thematic roles and grammatical functions to be universal. Languages vary somewhat in how grammatical functions are morphosyntactically marked: the Modern English system relies primarily on word order, but many languages have morphological case marking on DP constituents (nouns, articles, and other DP-internal elements such as adnominal adjectives), which plays a major role in marking grammatical function; this was the situation in Latin and, to some degree, in OE. Still other languages may mark grammatical functions by means of agreement, and many languages combine these various methods. We take all of this to involve parametric variation (concerning the Agree and Move relations), retaining the view that grammatical functions are defined in structural terms however they are overtly marked. 2.3.2. Changes in English psych verbs and recipient passives Where there is synchronic variation, there is diachronic change. The ways in which languages mark grammatical functions can change, as 21 The ‘%’ diacritic in front of (e) indicates that the example is not acceptable in all dialects of English. Most American speakers reject examples of this type. They are natural in North-Western varieties of British English; there are in fact several distinct varieties in this region (see Siewierska and Hollman ; Haddican ; Haddican and Holmberg ; Myler ; Biggs , ; Holmberg, Sheehan, and van der Wal ) which differ regarding the precise range of theme passives of the general kind in (e) they allow, and how this is related to various kinds of ‘Preposition-drop’ or silent Prepositions (as in I’m going the pub). Biggs () shows that there is a recent innovation in Liverpool English, giving rise, for speakers roughly under , to a much more liberal variety than elsewhere in the North-West (or of course Standard British or American English), which allows examples like (i): (i)

a. The book was given the teacher. b. The package was sent her nan’s.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

indeed they have done in the history of English and in the development from Latin to Modern Romance. Here I want to focus on two changes involving the marking of grammatical functions in the history of English. I will suggest that one of these changes was a microparametric change while the other was a nanoparametric change affecting individual lexical items, or rather a series of nanoparametric changes diffusing through the lexicon. The first change concerns recipient passives and the second concerns psych constructions. Both have been much discussed in the literature on diachronic syntax: see Lightfoot (, : ff., : ff.); Fischer and van der Leek (); Allen (, ); Anderson (); Denison (, : ff.); Guidi (); van Gelderen (); and Stein, Ingham, and Trips (). The main traditional studies are van der Gaaf () and Jespersen (–: III). Much of the following discussion is based on Allen (), which is the most thorough and authoritative study of these and other changes. The first change affects recipient passives. () shows that in recipient passives in OE the subject of the passive retained its indirect-object marking, i.e. dative case: ()

ac him næs getiðod ðære lytlan lisse. [OE] and him- not-was granted the small favour- ‘But he was not granted that small favour.’ (ÆCHom I ..; Denison : )

Here the other argument of ‘grant’, ðære lytlan lisse, is in the genitive case; I will return to this point below. Concerning the second change, one class of psych constructions is illustrated in (). This is the third type of psych construction described above—that where the Experiencer is marked as an indirect object. () shows that this construction was found in OE: ()

hu him se sige gelicade how him- the victory- pleased ‘how the victory had pleased him’ (Or .; Denison : )

[OE]

Here the Experiencer, him, is in the dative case, the typical marking of an indirect object in OE (the sg accusative masculine pronoun was hine in OE). The Theme argument is in the nominative case, although it was not the subject; see Allen () for extensive discussion of this point, along with Guidi (). The verb, translated as ‘please’ here, is 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

lician, the ancestor of NE like. We observe that this verb has undergone a redistribution of its thematic roles (although not really a change in meaning, since the core meaning has involved causing pleasure all along), presumably associated with the loss of the construction in (). A further point that is relevant here is that certain two-argument verbs required their object to have some case other than the accusative. When passivized, the case of the active object is retained on the passive subject. This can be seen in the following example with the passive of ‘help’, a verb whose object was required to be dative: ()

Ac ðæm mæg beon suiðe hraðe geholpen [OE] and that- may be very quickly helped from his lareowe. by his teacher ‘But that may be remedied very quickly by his teacher.’ (CP ..; Fischer et al. : )

The three constructions shown in ()–() were all lost during the ME period. Since they all feature dative case, and English lost its morphological case system during the same period, it is tempting to relate these developments, as many authors have done. Aside from their intrinsic interest, this lends some importance to the discussion of these changes, as they may reflect some of the syntactic consequences of the loss of morphological case marking. The NE counterparts to the OE constructions in ()–() are as follows: ()

a. How he liked the victory. b. But he was not granted that small favour. c. But that may be helped very quickly by his teacher.

Allen () argues that the change affecting psych verbs was distinct from that affecting the passive constructions. Regarding the psych verbs, she says ‘the loss of case distinctions did not make the ‘“impersonal” constructions impossible, but contributed to the decline in frequency of these constructions which ultimately resulted in changes to the grammar which made them ungrammatical’ (). Regarding the passive constructions in () and (), on the other hand, she states that these changes ‘support the generative view that a syntactic change can be an essentially sudden reanalysis or change in parameter-settings which take place as a by-product of another change which removes . . . the evidence available 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

to language learners for the old analysis’ (). This contrasts with what she refers to as the ‘lexically-implemented’ change involving the loss of the psych construction (and the associated changes in the relation between thematic roles and grammatical functions in some verbs such as like). The changes in the psych verbs affected individual verbs and diffused through the lexicon over a considerable period. Allen (: ff.) argues that the beginnings of this change may be discerned in the optional assignment of ‘lexical case’ (for the moment this can be taken to mean dative case, see below) to the Experiencer arguments of certain verbs in OE (for example, sceamian ‘to be ashamed’; Allen gives the full range of data in her table ., ) and says that ‘while the loss of morphological case distinctions may well have exacerbated the tendency to treat Experiencers as nominative subjects (at least as an option), it did not create it’ (). The change was completed only by approximately , in that the sixteenth-century examples of this construction appear to be fixed expressions, the best-known of which is methinks. Although Allen describes this as non-parametric change, since it appears to affect individual lexical entries one by one rather than changing a productive syntactic property, we could view this as a series of nanoparametric changes. In §.. (d), we defined nanoparameters as follows: ‘one or more individual lexical items is/are specified for vi.’ Diffusion of a change through the lexicon could therefore be seen as a series of nanoparametric changes affecting individual psych verbs over a long period. The difficulty with this idea is that, at first sight, it appears to be incompatible with the BCC (see §.. () and () above), in that vi in the definition of nanoparameters refers to a value of a formal feature of a functional head. I will return to this point below, once the structural properties of psych-predicates have been more fully introduced. In §.., I discuss the nature of nanoparametric change in more detail. In contrast to the psych verbs, as Allen says, the changes to the passive constructions are good candidates for treatment as parametric change. I will suggest that there was a parameter change and associated reanalysis involving the case features associated with the DPs bearing thematic roles (the arguments) in these constructions. This parameter change was associated with a reanalysis caused by the loss of overt morphological case distinctions. We will see that this was a microparametric change in the sense of the taxonomy in §.. (). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

OE had a morphological case system in which nominative, accusative, dative, and genitive cases were distinguished. There was already some syncretism among case forms in OE, and Allen (: ff.) shows in detail that the system had broken down in all the ME dialects except Kentish by the end of the thirteenth century at the latest (see her table ., p. ). Allen argues that this change directly caused the loss of indirect passives in the early thirteenth century: ‘[t]he indirect passives [i.e. the construction in ()—IR], disappear just at the time when this morphological distinction disappeared in most of the country and follows straightforwardly from the fact that there was no longer any evidence available to language-learners for two types of objects of monotransitive verbs’ (). We can understand this change as a reanalysis and associated parameter change, but in order to see this, certain assumptions about case and arguments must be introduced. Generative theory postulates the existence of abstract Case (written with a capital ‘C’ to distinguish it from morphological case). Nominative and Accusative Case can be thought of as uninterpretable features associated with argument DPs which must be deleted under an Agree relation with features of a relevant head, subject to the usual conditions on Agree (see §..). Nominative Case is deleted under Agree with the φ-features of finite T, giving rise to subject-verb agreement in finite clauses. Accusative Case is similarly deleted under Agree with v, a verbal functional category which takes the lexical VP as its structural complement. Both the Probe and the Goal bear uninterpretable features in these instances. The φ and Case features in a simple transitive sentence (for example, John met Mary) are thus as in (): ()

TP DP John [iφ, uNom]

T’ T [uφ]

vP (DP John)

v’ v [uφ]

VP V met



DP Mary [iφ, uAcc]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

Here we have the Agree relations described in (): ()

a. v’s [uφ] features Agree with those of the object Mary, resulting in deletion of v’s features and of DP’s ACC feature; b. T’s [uφ] features Agree with those of the subject John, resulting in deletion of T’s φ-features and DP’s NOM feature.

In addition, the subject DP raises to SpecTP. We will return to the mechanism which causes this movement to happen in §... For now, it is important to note that the subject is first-merged in SpecvP. From now on, I will adopt this version of the idea that the subject is merged in a predicate-internal position, rather than the idea that the subject is merged in SpecVP, which was discussed in §.... In () the Case/ Agree relations are purely structural, in that they are blind to the lexical properties of the verb. This is not true for all Case features, as we will see below. Burzio (: ) puts forward an important generalization regarding the nature of transitive clauses: Accusative Case is present if and only if the verb assigns a subject thematic role. This idea has become known as Burzio’s generalization. Assuming a category such as v makes it possible to locate the two properties related by Burzio’s generalization on a single head. (I will refer to these as the ‘Burzio properties’ henceforth.) Let us suppose that v assigns the subject thematic role and, as just outlined, is responsible for deleting the uninterpretable Accusative feature of the direct object. Following Chomsky (: ), when v has this double property we write it as v*. In unaccusative clauses, v is either defective or absent; either way, it lacks the ‘Burzio properties’ of Agreeing for Accusative Case and having a thematic subject merged in its specifier. I will assume for simplicity that v is simply absent in these cases. (This will be revised in §...) Under these conditions, the object can be marked as Nominative and raises to the subject position: ()

[TP John+NOM T[uφ] [VP arrived (John+NOM)]]

In passives, v also lacks the Burzio properties. This is what causes the object to be able to appear in the subject position, and the original active subject to be either absent or to appear in a by-phrase: ()

[TP John+NOM [T[uφ] was] [vP v [VP arrested (John+NOM)]]]



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Presumably the passive morphology on the verbal participle plays a crucial role in determining the ‘defective’ nature of v here—see Baker, Johnson, and Roberts () for more on this point. In both () and (), all the conditions for T to Agree with the DP John inside the VP are met: T and DP have non-distinct formal features, T asymmetrically c-commands DP and there is no third category non-distinct in formal features from T which T c-commands and which in turn c-commands the VP-internal DP (see the discussion of Agree in §..). In a transitive clause like (), on the other hand, T cannot ‘reach’ the VP-internal DP Mary since the DP John in SpecvP (before it is moved to SpecTP) is closer to T in the sense that T c-commands John and John c-commands Mary. So John ‘breaks’ the potential Agree relation between T and the VP-internal DP here, but not in () or ().22 Now, let us suppose that languages with richer morphological case marking than NE, such as OE, make syntactic distinctions among the features which license the verb’s arguments in addition to the simple Accusative vs. Nominative Case of NE.23 As the OE evidence we have seen clearly shows, we need to allow for Dative arguments of V, i.e. an abstract Dative Case which corresponds to morphological dative case. (We also saw an example of a Genitive argument in (); the same considerations apply here.) Let us suppose that morphological dative case is a realization of an abstract Dative Case feature associated with a particular functional head. This head could be a further example of ‘little’ v, comparable to the transitive v* but distinct from it in its feature make-up. For clarity, I will follow the terminology introduced by Pylkkänen (), which originates in descriptive Bantu linguistics, and refer to this as the Applicative head, Appl for short. Appl Agrees for Dative with a DP under the usual conditions for Agree. Furthermore, ‘inherent’ Cases of the Dative type are directly associated with thematic roles in a way in which structural Nominative and Accusative 22 The analysis of the passive sketched in () raises the question of the status and position of the by-phrase in passives, as well as the question of the representation of the ‘understood’ counterpart of the active subject in passives where there is no by-phrase (note that John was arrested means ‘someone arrested John’, so the question is what the syntactic status of the argument ‘someone’ is here). For different answers to these questions, see Baker, Johnson, and Roberts (); Collins (); Roberts (a: –); den Dikken (); and Akkus, Legate, Ringe, and Šereikaitė (). 23 NE also has a Genitive Case, operative inside DPs, but I will leave that aside here and restrict the discussion to Case at the clausal level.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

are not; the presence of Appl must be compatible with the lexical semantics of the verb, in that the verb assigns a Recipient θ-role to the Dative argument which agrees with Appl. Now we are in a position to see what the parametric change was in thirteenth-century English. In OE and EME, as long as the morphological case distinctions were manifest, v Agreed with abstract Accusative Case, corresponding to morphological accusative case only, just as described above for NE with the single difference that the Accusative feature corresponded to a particular inflectional marking on the DP object. Recipient arguments of V were associated with a Dative Appl. After the loss of morphological case distinctions, Dative Appl disappeared from these contexts and v’s feature makeup changed in such a way that it Agreed with any and all non-subject arguments, i.e. its φ-features valued the Case feature of any available DP. This parametric change was associated with the reanalysis shown in ():24 ()

[CP Him+DAT [TP [T was] [vP v [ApplP ApplDAT [VP helped (him +DAT)]]]]] > [TP He+NOM [T was+ uφ] [vP v [VP helped (he+NOM)]]]

In the old structure in (), if v is present, it is inert since there is nothing it can Agree with; Appl is a closer Probe for the Recipient argument him since it is c-commanded by v but c-commands him. In the new structure in (), on the other hand, the passive v has a different status, in that its ability to value Case on the object has been switched off, while in the old structure v had no such property to be switched off; this was a variety of impersonal passive in the sense that no internal argument needed to have its Case feature deleted under passive as the only Case feature available was Appl’s DAT feature. The head v here is thus intransitive, as it is in unergative intransitives, or when the only complement of V is a PP or a CP. I attribute the cause of reanalysis to the loss of morphological case distinctions, exactly as stated by Allen (: ): it ‘follows straightforwardly from the fact 24 Dative Experiencers could be subjects in OE and ME, as in Modern Icelandic (Sigurðsson ); Allen (: ff.) argues extensively that some preverbal datives in OE were subjects. On the other hand, Allen (: ) points out that ‘although PDEs [preposed dative Experiencers—IGR] behaved like subjects, preposed dative Recipients in passive ditransitive constructions did not’. Accordingly, I treat the root category as CP here, with the Dative argument a topic in SpecCP (the internal argument of help is a Recipient). See also Eythórssen and Barðdal (: –).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

that there was no longer any evidence available to language-learners for two types of objects of monotransitive verbs’ (). Once him+DAT could no longer be distinguished from him+ACC in the active, the latter option was chosen and, by implication, v was taken to be ‘passivisable’ with its ACC-licensing property (uninterpretable φ-features) switched off, as described above for (). This also led to subject agreement, since T has taken on subject φ-features as a consequence of this change. In the older structure, T was ‘impersonal’, i.e. its φ-features bore the default sg values. This change affected v’s feature makeup, and as a parametric change had a number of consequences. The first was the loss of indirect passives. As Allen shows, this construction was lost in the early thirteenth century. Since v took on uninterpretable φ-features in all (transitive) clauses, a further consequence was the loss of all non-subject arguments which formerly bore inherent Case (either Genitive or Dative), i.e. all arguments in the c-command (and therefore the Agree) domain of v; more precisely, the head(s) (at least Appl for Dative and perhaps others for Genitive) that licensed the earlier inherent Cases had to disappear once v took on uninterpretable φ-features as they would otherwise have blocked Agree with the internal argument as described above. Allen points out in connection with genitive-marked object arguments that ‘we can say that in those dialects in which there is clear evidence of a dative/accusative distinction, genitive objects are still found, while in those dialects in which this distinction was lost, genitive objects are not found’ (). (She does go on to point out that genitive objects were in any case being replaced by accusative or prepositional objects from late OE onwards.) Third, dative-marked subject arguments can survive. Allen (: ff.) argues that exactly this happened in the case of many psych verbs; we will return to this point below. A fourth and related point is that this development contributed to the loss or lexical reorganization of psych verbs, i.e. the nanoparametric changes affecting these verbs, whose exact nature we will also consider below. As we have seen, the changes in psych verbs were gradual, but the microparametric change in v’s properties just described ruled out a formerly available possibility for these verbs and so played a role in furthering the ongoing nanoparametric changes. Finally, the reanalysis in () affected recipient passives. However, Allen shows that the modern construction is not reliably attested before , over a century after the parametric change just discussed. She 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

comments ‘the historical record does not support the notion of a replacement of the dative-fronted passive by the recipient passive . . . Rather, the dative-fronted passive seems to have died out from the texts some time before the recipient passive was introduced’ (). There is, in fact, a period in the fourteenth century when neither construction is found. The microparametric change we have proposed will account for the disappearance of the old, dative-fronted construction, since this contained a Dative argument licensed by Appl in the thirteenth century. But in itself it predicts nothing about the introduction of the modern-style recipient passive. In fact, the NE recipient passive requires the presence of two occurrences of v*, one associated with the subject thematic role and uninterpretable φ-features, the other associated with the indirect-object thematic role and uninterpretable φ-features, as shown in () (on the motivation for this type of analysis of double-object constructions, see Larson ): ()

[TP John [v*P (John) v*[uφ] [VP gave [v*P Mary+ACC v*[uφ] [VP V a book+ACC]]]]] Agree

Agree

Here the upper v*’s [uφ] features Agree with those of Mary and the lower v’s Agrees with the φ-features of a book. In modern recipient passives, only the upper v* is ‘switched off ’ in the sense mentioned above: ()

[TP John [T was] [vP v [VP given [v*P (John) v* [uφ] [VP V a book+ACC]]]]] Agree

Stein, Ingham, and Trips () add important further data and analysis to Allen’s general conclusions. On the basis of a corpus study of the relevant periods, they show that the earliest appearances of clear cases of the recipient passive date to the mid-fifteenth century and that, prior to the sixteenth century, all of them involve verbs of French origin (e.g. inform, allow, assign, and others). The earliest clear examples with native verbs (give, hinder, and bid) are from , , and  respectively. On the other hand, theme passives appear frequently with native verbs throughout ME. They propose that the recipient passive was introduced into late ME in part as a result of extensive contact with French. Stein, Ingham, and Trips (: , ) refer to ‘copying’ of abstract structural properties into the English grammatical 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

system when verbs having these properties were borrowed from French. If this is copying of formal features such as structural Case, we could think of this as ‘borrowing’ of new parameter values; we will return to this point when we look more closely at contact-induced change in §... Two things emerge from Stein, Ingham, and Trips’ additions and refinements of Allen’s account of recipient passives. First, it seems clear that the earlier indirect passive was completely lost. As Allen says, the recipient passive did not ‘replace’ it; instead, it was an independent development which took place approximately two centuries after the loss of indirect passives. Second, the innovation of the recipient passive was, if Stein, Ingham, and Trips () are correct, partly the result of contact with French. So the loss of the indirect passive and the introduction of the recipient passive were separate parametric changes: the first entailed the loss of Dative Appl, while the second entailed the introduction of an ‘extra’ v*. Both of these are clearly microparametric changes. Finally, let us briefly reconsider the psych verbs. The reanalysis affecting a verb such as like must have had the form in (): ()

[CP/TP Him+DAT . . . T[uφ] [vPExp vExp [VP (him+DAT) like pears+NOM]]] > [TP He+NOM T[uφ] [v*P (he+NOM) v*[uφ] [VP likes pears +ACC]]]

The OE psych construction featured a further kind of v, which I have labelled simply vExp here (see Belletti and Rizzi  for a similar proposal). This v, like Appl, Agreed for Dative Case with the Experiencer argument first-merged in SpecVP. Although similar to Appl, we must distinguish vExp from Appl as it was not lost when the indirect passive disappeared, as we have seen. If we treat inherent Case as a different kind of feature from structural Case, then T would be able to Agree for NOM with the direct object, as shown in the first line of (), since the Dative Experiencer would not count as a closer goal for T and so would not ‘break’ the Agree relation between T and the Theme DP. (Bejar : ,  proposes a similar analysis for these constructions.) The reanalysis involves replacing vExp with v*, the latter with its normal properties of being associated with a subject argument in its Specifier and Agreeing with the VP-internal object. (Again, Bejar : ,  proposes the same thing; van Gelderen  also 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

ARGUMENT STRUCTURE

proposes that this change in psych verbs involved changes in v’s feature make-up, although she assumes that aspectual rather than Case features are involved.) In line with Allen’s () conclusions, as reported above, I take it that this reanalysis was facilitated, but not caused, by the loss of Dative-licensing Appl and of distinct morphological dative case. One consequence of the reanalysis in () is that the first-merged position of the Experiencer argument changed from SpecVP to Specv*P. Another is that the Theme ceases to Agree with T but instead Agrees with v like a ‘regular’ direct object. These changes together give the effect of a change in the grammatical functions of the arguments of like, although, as we pointed out above, like does not actually change meaning. We said above that the change to the psych verbs, since it affected psych verbs gradually over a period of several centuries between the OE period and the sixteenth century, can be seen as a series of nanoparametric changes as defined in §.. (). If this was a parametric change, then, to be consistent with the BCC, it should affect the formal features of a functional head. We can now maintain that this is the case in that we can regard the change shown in () as a change in the formal features of v, from having Dative Case to having φ-features. This might appear to be a microparametric change to the formal features of v, but, since Dative is an inherent Case, vExp can only appear where the lexical verb is a psych verb, i.e. has an Experiencer θ-role to assign. Thus the occurrence of vExp is indirectly lexically conditioned. Not all psych verbs are associated with vExp; in OE as in ME and NE there were psych verbs which appeared with a nominative Experiencer and an accusative Theme, i.e. which were associated with v* rather than vExp, e.g. cwēman ‘to please’ in the following example: ()

ic cweme drihtne on rice lyfigendra [OE] I. please lord- in kingdom of.the.living ‘I please the Lord in the kingdom of the living.’ (Miura : ; PsGlF (Kimmens) [ (.)]; DOE s.v. cwēman)

What we observe, in late ME in particular, is the gradual reduction in the incidence of vExp once the morphological reflex of abstract Dative Case, i.e. morphological dative case, is lost. The indirect lexical conditioning of the microparametric property of vExp caused its loss to be lexically conditioned; this is our construal of Allen’s statement that the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

change was ‘lexically implemented.’ This sensitivity to lexical items made this a nanoparametric change. In conclusion, in this section we have looked at well-known examples of changes affecting both the realization of argument structure of psych verbs and changes in the functioning of the major grammatical-function changing operation, the passive. Largely thanks to Allen (), these changes are empirically well-documented. We have seen how two parametric changes and associated reanalyses can account for many aspects of these changes, in line with the general approach being advocated in this chapter. As it affected the history of English, this change is usually thought of as an example of how the loss of morphological case marking may affect syntax and the lexicon. In the parametric analysis sketched above, this idea is directly reflected, in that v* is associated with φ-features in systems where the complements of the verb do not show morphological case distinctions. The most important general conclusion from the above discussion is that we have observed the interaction of two kinds of parametric change: microparametric changes (changes in Appl and in the distribution and feature content of v) and an indirectly lexically conditioned series of nanoparametric changes. All of these changes are associated with reanalyses, as we have seen (see () and ()). Although the microparametric changes influenced the lexically conditioned nanoparametric change, the two kinds of change are in principle independent and operate in rather different ways, with microparametric change being rather sudden and the nanoparametric changes appearing to diffuse through the lexicon as the associated reanalyses take place in the context of different lexical verbs.

2.4. Changes in complementation25 In this section we are once again concerned with the nature of the arguments bearing thematic roles determined by verbs. However, the focus here is not on changes in how thematic roles are mapped onto the grammatical functions or on changes in grammatical-function changing operations such as passives, but rather on how the same 25 The revision of this section for the second edition has benefitted enormously from detailed comments by Adam Ledgeway, for which I am immensely grateful.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

argument in the same grammatical function may change status. Moreover, I will concentrate on arguments that express a proposition of some kind, and which are therefore typically realized as clausal constituents (mostly but not exclusively as CPs, as we will see). So the main focus will be on how the propositional arguments associated with verbs like ‘order’, ‘desire’, ‘say’, etc., may change their syntactic properties, without changing either their thematic role (roughly Theme, in each of these cases) or their grammatical function (structurally the complement of the verb in each of these cases). The particular examples of changes in complementation I will look at here are those which distinguish the Romance languages as a whole from Latin. Given the range of languages and constructions to be discussed, the treatment will of necessity be rather coarse-grained. Nevertheless it is possible to observe some interesting diachronic processes at work, and once again we encounter reanalysis and different kinds of parametric change. The general conclusion will be that these notions are relevant to the diachrony of (clausal) complementation, as they are to the other diachronic processes discussed in this chapter. Vincent (: –) summarizes the clausal complementation system of Latin, presenting five main types of complement, as follows (see also Ledgeway : –, and table .):26 ()

a. ut/ne + subjunctive (verbs of ordering, desiring, warning, requesting, urging, fearing, etc.):27 Ubii Caesarem orant ut sibi parcat. [Latin] ubii- Caesar- beg- UT selves- spare ‘The Ubii beg Caesar to spare them.’

26 Horrocks (: –) gives a very similar list, adding quin + subjunctive complement to a negated main verb such as ‘hinder’ or ‘be doubtful’, accusative + participle complements to perception verbs, gerund/gerundive complements to ‘see to’ and various adjectives and nouns, as well as supines in complements to adjectives. All of these completely disappeared in Romance, and so I leave them aside here. 27 There is a further class of complements in ut following mostly impersonal verbs expressing existence, non-existence or simple events: (i)

Accidit ut esset luna happened UT -.- moon ‘There happened to be a full moon.’ (B.G. , , I; Ernout and Thomas : )

plena. full

[Latin]

This ut is negated with ut non rather than ne. I will leave it aside in what follows; see Ernout and Thomas (: ff.), Horrocks (: f.).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

b. (Bare) infinitive (‘want’, ‘prefer’, ‘dare’, ‘try’, ‘begin’, etc.): Volo vincere. want- to-win ‘I want to win.’ c. Accusative-cum-infinitive (AcI: ‘verbs of saying, thinking, hoping, perceiving’ ()): Dicit te errare. says- you()- to-go-wrong ‘He says you are going wrong.’ d. Quod (or quia) + indicative (‘verbs of emotion where in a loose sense the complement can be said to express the cause or origin of the emotion’ ()): Dolet mihi quod tu nunc stomacharis. pains- me- QUOD you()- now are-angry- ‘It pains me that/because you are angry now.’ e. Indirect question (‘any verb with the appropriate meaning’ (), marked by an initial wh-expression in the subordinate clause with the verb in the subjunctive): Ab homine quaesivi quis esset. from man- asked- who be--- ‘I asked the man who he was.’ (Ernout and Thomas : ) In most of the Romance languages, in particular French and (Standard) Italian, this system has changed quite drastically. This can be seen if we translate the above examples into French: ()

a. Les ubii supplient César de les épargner. [French] the Ubii beg- Caesar DE them spare ‘The Ubii beg Caesar to spare them.’ b. Je veux gagner. I want to-win ‘I want to win.’ c. Il/elle dit que tu te trompes. s/he says- that you()- you()- mistake ‘He says you are going wrong.’



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

d. Ça me fait de la peine parce que tu it me- makes of the pain because that you() es fâché maintenant. are angry now ‘It pains me that/because you are angry now.’ e. J’ai demandé à l’homme qui il était. I have asked to the man who he was- ‘I asked the man who he was.’ The changes can be summarized as follows (see Vincent : ):28 ()

a. b. c. d. e.

loss of ut/ne + subjunctive; restriction in distribution of the bare infinitive; loss of AcI; the spread of quod-clauses into former (a) and (c) environments; no change in wh-clauses (except that the mood of the lower clause may now be indicative, but see below and §.. on multiple wh-movement in Latin).

The different changes illustrate a variety of patterns of loss, restriction, spread, and, in the case of (e), (near) stability. Let us look at the changes more closely and see whether we can see any more general patterns. (a) In the Modern Romance languages, the Latin ut + subjunctive construction has completely disappeared and has been replaced by ‘prepositional infinitives’, i.e. infinitival clauses introduced by a complementizer derived from a preposition, a/à or di/de. Prepositions frequently grammaticalize as complementizers; this kind of development is discussed in Haspelmath (); Hopper and Traugott (: –); Roberts and Roussou (: ff.); van Gelderen (: –); and the references given there. It seems quite reasonable to treat ut and ne as complementizers in Latin, members of category C heading the CP complement of the relevant classes of verbs (see Ledgeway : –, :  on the C-position in Classical Latin). Modern Romance a/à and di/de clauses are usually treated as CPs (see, for example, Jones (: ); Ledgeway (: –), but 28 Vincent adds the development of the causative construction from facere (‘do/ make’) + infinitive. I will leave this construction aside here, since it arguably also involves changes in grammatical functions. On the development of Romance causatives, see Davies (); Sheehan (: ); and Vincent (: –).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

see Kayne (: ff.) for a very different view), and we can take a/à and di/de to be complementizers (Kayne : –; Ledgeway : ; : –). In that case, the change that has taken place here seems to involve one type of CP (non-finite, introduced by a grammaticalized preposition) replacing another (finite, introduced by a particle). It is important to bear in mind that ‘replacement’ does not imply ‘reanalysis’ here; it is not clear whether or when the infinitival constructions replaced the ut ones, although there is some evidence for complementizer a in Vulgar Latin (Gamillscheg : , cited in Hopper and Traugott : ). In addition, ut-clauses were also replaced by quod/quid in Late Latin, leading to the occurrence of both finite (que/che-clauses) and infinitival irrealis complements in Romance (thanks to Adam Ledgeway, p.c., for reminding me of this last point). (b) The change affecting bare infinitives also involves prepositional infinitives. The latter have replaced bare infinitives in various contexts, notably object control, i.e. cases where the reading of the understood subject of the infinitive is determined by the object of the main verb. In Latin, this construction could involve bare infinitives, but in Modern French and Italian a prepositional complementizer is always required in these cases:29 ()

a. Ab opere . . . legatos discedere vetuerat. [Latin] from work- legates move-away had-forbidden- ‘He had forbidden the legates to move away from the work.’ (Caesar, B.G. , , ; Ernout and Thomas : ) b. Il avait défendu aux légats de [French] he had forbidden to-the legates  s’éloigner des travaux. selves-distance from-the works ‘He had forbidden the legates to move away from the work.’ (Ernout and Thomas’ (: ) translation of (a))

29 As Adam Ledgeway, p.c., points out, this is not the case in Ibero-Romance, where bare infinitival complements are very common in object-control complements. Finite subjunctive complements also appear fairly naturally in this context in Ibero-Romance: (i)

Pidió a María que comprara el coche. Ask. to Maria that buy. the car . . ‘He asked María to buy the car (i.e. that she buy the car).’



[Spanish]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

Similarly, a number of the subject-control verbs (i.e. verbs with infinitival complements whose implicit subject is understood as corresponding to the main-clause subject) listed by Ernout and Thomas (: ) require a prepositional infinitive in Modern Romance, for example, studeo (‘be eager’), postulo (‘claim’), although not all of them do: in French and Italian a few verbs have preposition-less infinitival complements (e.g. aimer/amare ‘to love’). Again, Ibero-Romance shows a bare infinitive very frequently in subject-control contexts, as in object-control contexts as mentioned above (Adam Ledgeway, p.c.). Many of the verbs which take a bare infinitival complement in Modern Romance are ‘semiauxiliary’ verbs (for example, vouloir (‘want’), pouvoir (‘can’), the causatives faire (‘do/make’) and laisser (‘let’), perception verbs such as voir (‘see’), entendre (‘hear’), and some impersonals such as falloir (‘be necessary’). In these cases, it is likely that the complement clause is somehow reduced, i.e. not a CP but perhaps a TP or vP; see Sheehan , forthcoming, Sheehan and Cyrino ). One important class of verbs with a propositional bare-infinitive complement in Modern French is the so-called ‘cognitive verbs’, which express ‘belief or the communication of belief ’ (Jones : ). These in fact correspond to the Latin AcI construction, where the subject of the infinitive was overt and marked accusative, unlike in French, where the subject is understood and ‘controlled’ by the main-clause subject:30 ()

a. Te abisse hodie hinc negas? [Latin] you()- go-away-past- today here deny-- ‘Do you deny that you left here today?’ (Vincent : ) b. Est-ce que tu nies être parti [French] is it that you deny to-be left d’ici aujourd’hui? from-here today ‘Do you deny that you left here today?’

30 AcI is found in higher registers of French where the subject undergoes whmovement, as in: (i)

Quelle femme crois -tu avoir volé notre Which woman think -you have. stolen our ‘Which woman do you think to have/has stolen our money?’

See Ledgeway (: ).



argent? money

[French]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

It seems that this construction derives from the earlier AcI construction, which in these cases was replaced by a bare-infinitive construction with subject control (see again Ledgeway : –). (c) The Latin AcI construction has disappeared with propositional complements of the type illustrated in (c) in Romance.31 This construction appears to resemble the English construction in (), although in Latin it is found in the complement of a wider range of verbs: ()

I believe him to be mistaken.

In the construction in () the Accusative Case of the subject of the infinitival clause depends on the verb (or, more precisely, v*—see the previous section) of the main clause. The clearest way to see this is by passivizing the main verb, in which case the subject of the infinitive becomes the subject of the main clause: ()

He is believed to be mistaken.

This is what we expect if the Case of the infinitival subject depends on the main-clause v*. Under passivization, as we saw in the previous section, v* is rendered ‘defective’ and as such unable to Agree with φfeatures on any category in its c-command domain. The subject of the infinitive can thus not be Accusative; instead, it Agrees for Nominative with the main-clause finite T (and moves to the main-clause subject position). In Latin, it seems we can find the same pattern as in English () and (). In particular, we find examples where the main verb is passivized and the subject of the infinitive appears in the nominative case: ()

a. Galli dicuntur in Italiam transisse. [Latin] Gauls- say-- in Italy- to-have-crossed ‘The Gauls are said to have crossed into Italy.’ (Ernout and Thomas : ) b. Traditur Homerus caecus fuisse. report-- Homer- blind- to-have-been ‘Homer is reported to have been blind.’ (Vincent : )

31 It survives with perception verbs and causative laisser. Here again, though, it is likely that the complement is not a full CP but a smaller category; see Sheehan (: , forthcoming), Sheehan and Cyrino (). In any case, it arguably denotes an event rather than a proposition, see Guasti (: ff., ff.).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

We can thus analyse these examples along the same lines as the English one in (). This implies that in the active AcI constructions the Accusative Case of the subject of the infinitive Agrees with v* of the main clause. (This construction is usually analysed as a TP rather than a CP complement in English (for example, in Chomsky : ); below I will suggest that both the Latin and the English constructions are CPs, following Kayne , and, for Latin, Ledgeway : f.) However, the possibility illustrated in () was restricted to a subclass of the verbs of saying. (Woodcock :  gives an indication of which authors used which verbs in this construction.) The apparently more productive option features the subject of the infinitive in the Latin construction in the accusative independently of the main clause. The evidence for this is summarized in Bolkestein (). First, alongside examples like (), we find examples where the main verb is passive and yet the subject of the complement infinitive is nevertheless accusative: ()

Dicitur Gallos in Italiam transisse. [Latin] say-- Gauls- in Italy- to-have-crossed ‘It is said that the Gauls have crossed into Italy.’ (Ernout and Thomas : )

Second, AcI clauses are found as complements to Nouns, which is impossible with the nominal equivalents of English verbs which appear in the AcI construction: ()

a. *the belief (of) him to be mistaken b. nuntius oppidum teneri message town- to-be-held ‘the message that the town was being held’ (Bolkestein : )

[Latin]

Third, many verbs which appear in the AcI construction do not otherwise have an Accusative object. This is true of dicere (‘to say’), as in shown in (): ()

a. Dico te venisse. I-say you- come-- ‘I say that you have come.’ b. *Dico te. I-say you (Bolkestein : ) 

[Latin]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Finally, there are verbs such as credere (‘to think/believe’) which allow an AcI complement but take a direct nominal complement in the dative, rather than the accusative (Cann : ; Vincent : ). We must therefore allow for some mechanism of Accusative Agreement inside the infinitival clause, since by assumption the v of the main clause cannot be responsible for this in cases like () and (), neither can the noun nuntius in (b) or a dative-taking verb such as credere. I propose, following Cecchetto and Oniga (), linking this to the fact that infinitivals bear morphological marking of tense/aspect and voice in Latin. Thus, alongside the present active infinitive, for example, facere (‘to do’), we have the perfect active fecisse (‘to have done’), the future active facturum esse (‘to be about do’) and the corresponding passive forms factum esse ‘to have been done’ (perfect passive), fieri ‘to be done’ (present passive) and factum iri ‘to be about to be done’ (future passive) (see Harris : ). Although many of these forms are periphrastic and imply a rather complex morphosyntactic analysis which I cannot go into here, the coexistence of synthetic forms like fecisse and facere suggests that Latin infinitives are significantly different from those of Modern Romance, where no such opposition survives. This is further supported, as Cecchetto and Oniga (: ) point out, by the fact that Latin AcI clauses allow the full range of infinitivaltense forms: ()

a. Dicunt eum laudare - him- praise-- ‘They say that he praises her.’ b. Dicunt eum laudavisse say- him- praise-- ‘They say that he praised her.’ c. Dicunt eum laudaturum say- him- praise-- ‘They say that he will praise her.’

eam. her-

[Latin]

eam. her- esse eam. to-be her-

Let us suppose that the tensed nature of Latin infinitives implies the presence of a functional head with the capacity to Agree with an Accusative subject. This can explain the data in ()–(). (Cecchetto and Oniga (: ) make the same connection between tensed forms of infinitives and the possibility of Accusative subjects of infinitives, but in a technically more indirect way.) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

The analysis of () just sketched provides a way of understanding why this variant of the AcI construction does not survive in Romance. All other things being equal, we predict that it died out with the tense/ aspect marking of infinitives, which appears to have died out in Late Latin (Harris : ; Haverling ). I will return to the question of why the English-style option for the AcI was lost. What is clear is that the AcI was replaced by clauses introduced by quod, the final change to be considered and to which we turn immediately. (Recall that there has been essentially no change in the nature of C-system in indirect questions; see (e) and (e) above.) (d) The commonest pattern of clausal complementation in Modern Romance involves a finite clause introduced by que/che, which derives from Latin quod/quid, the nominative/accusative neuter form of the relative pronoun (Harris : ). Since que/che clauses commonly appear as the complements of verbs of saying and believing in Modern Romance, they have clearly taken over much of the distribution of the Latin AcI construction. Quod-clauses were originally found in various non-complement positions: for example as subjects or adverbials, as in (): ()

a. Multum ei detraxit . . . quod [Latin] much him- detracted . . . QUOD alienae erat civitatis. foreign- was- city- ‘The fact that he was from a foreign city detracted from him a great deal.’ (Nep. , , ; Ernout and Thomas : ) b. Adsunt propterea quod officium sequuntur. are.- on.that.account QUOD duty- follow- ‘They are present because they follow duty.’ (Cicero; Kennedy : )

Quod-clauses also followed adverbials such as nisi (‘unless’), praeterquam (‘except’), etc., and appeared to require a factive meaning, in that the truth of the proposition expressed by the complement clause was presupposed (see Ernout and Thomas : ff. and note the factive interpretations of (d); the factive interpretation is clearest where the verb is indicative). Still according to Ernout and Thomas (: ), quod-clauses appear as complements to ‘metalinguistic’ verbs like 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

addere (‘add’), praeterire (‘elude’), mittere (‘omit’), and with impersonal eventive verbs, usually accompanied by an adverb, or facere (‘do/make’) accompanied by an adverb: ()

accidit perincommode quod eum [Latin] happened- unfortunately QUOD him nusquam vidisti. nowhere saw- ‘It is unfortunate that you didn’t see him anywhere.’32 (Cicero, At. , , ; Ernout and Thomas : )

Here too the factive interpretation is clear. Finally, both Woodcock (: ) and Ernout and Thomas (: ) point out that quod-clauses first appear with verbs of saying and believing in apposition with a neuter form. Kühner and Stegmann (: ) give examples from Plautus illustrating this development; see also Adams (: –). I will return to this point below. According to Ernout and Thomas, quod-clauses appear as a direct complement to verbs of saying and believing only in Late Latin, in Petronius’ imitations of the speech of the lower classes or freed slaves (‘affranchis ou de petites gens’), and the language of translations from Greek (following the légo óti (‘I say that’) pattern), especially Christian ones. Woodcock (: ) points out that quod-clauses commonly appear instead of the AcI ‘from the second century of our era’, while Adams (: ) says that they become common in the third century; see also the references in Ledgeway (: , note ). Similarly, Kühner and Stegmann (: ) say that quod-clauses replaced AcIclauses in Late Latin. On the other hand, Adams (: ) points out that, although attested in Late Latin, ‘they remained in the minority compared with the accusative + infinitive’ (see also Adams ). Ledgeway (: ) says ‘the AcI construction was commonly replaced by a finite complement clause introduced by the complementizers QUOD and QUIA, a usage finally consolidated as the core complementation pattern in vulgar texts after the fall of the empire’. According to Ernout and Thomas, the earliest example of a quod-clause is (a); (b) is from Petronius; (c) is from the Vulgate:

32 Ernout and Thomas’ French translation is ‘il est très malheureux que tu ne l’aies vu nulle part.’



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

CHANGES IN COMPLEMENTATION

a. Legati Carteienses renuntiaverunt quod [Latin] legates- from-Carteia announced- QUOD Pompeium in potestate haberent. Pompey- in power- had-- ‘The legates of the people of Carteia announced that they had Pompey in their power.’ (B. Hisp. , ; Ernout and Thomas : ) b. Scis quod epulum dedi. know- QUOD meal- gave- ‘You know that I gave a meal.’ (Petronius , ; Ernout and Thomas : ) c. Scimus quia hic est filius noster. we-know QUIA this is son our ‘We know that this is our son.’ (Vulgate: John , ; Ernout and Thomas : )

(Quia, ‘because’, was an alternative to quod at this stage, at least in ecclesiastical writers—see below.) It seems fairly clear, then, that quod-clauses were not true complements to verbs of saying and believing in Classical Latin, although they developed into complement clauses in Late Latin. The factive interpretation of quod-clauses, their ability to appear as subjects and the origin of quod as a relative pronoun all point to an original status as a nominal. Let us suppose then that quod-clauses were DPs in Classical Latin, headed by the D quod, which in turn selected a CP (see Kiparsky and Kiparsky , Farkas , Roussou , , , Ledgeway  on the notion of factives as ‘nominalized clauses’; traditionally, such clauses are known as ‘noun clauses’). This structure was reanalysed as a CP with C quod in Late Latin, and as such it was able to appear in the complement to CP-taking verbs which in Classical Latin took the AcI construction, i.e. verbs of saying and believing. The reanalysis is schematized in (): ()

[DP [D quod] [CP [TP epulum dedi]]] > [CP [C quod] [TP epulum dedi]]

I will return below to the parameter change associated with this reanalysis. The development of quod-complements in Late Latin follows a fairly well-established diachronic path, at least across several branches of Indo-European and probably more widely (see Rix ; Clackson 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

: –; Hackstein ). We can summarize this as in (), using approximate English paraphrases:33 ()

a. Parataxis: ‘I said that. That is true.’ b. Correlative: ‘I said that, that/which is true.’ c. Subordinate clause: ‘I said that that is true.’

The paratactic stage (a) is of relatively little relevance here. It is worth emphasizing that this putative stage was not one at which clausal embedding did not exist. The evidence is that the most archaic forms of clausal embedding in Latin and elsewhere in Indo-European involved non-finite and participial clauses of various kinds, a pattern that is amply attested in Sanskrit, Archaic and Classical Latin, Homeric and Classical Greek (Roberts and Roussou : –), and Old Irish, among other languages; see Kiparsky (), Bate (). Reanalysis along these lines seems also to have taken place in Germanic (Harris and Campbell : –; Kiparsky ; and Roberts and Roussou : –). The fact that finite complementizers very often develop from relative pronouns is clearly consistent with this: English/Germanic that and variants originate in the neuter nominative/accusative form of the relative pronoun; the Modern Greek complementizer oti derives from the Classical Greek relative pronoun hōs/hóti; Bayer and Dasgupta (: ) note that the Bangla complementizer/relativizer/ discourse particle je derives from the masculine singular relative pronoun of Vedic; in Old Irish the main subordinator was the nasalizing relative clause, i.e. nasal mutation on the initial consonant of the verb in the subordinate clause (which was initial, Old Irish being VSO); Old Iranian had finite subordinate clauses introduced by initial complementizers derived from relative pronouns (Skjærvø : , –). Hackstein () observes a development from (co)relatives to finite factive complementation in Tocharian and elsewhere in older Indo-European. In Hittite, clausal embedding often involved kuit, again a complementizer derived from a neuter relative pronoun (Cotticelli-Kurras ). So there is ample evidence across Indo-European for a reanalysis of the kind seen in () linking the schema in (b) with that in (c). 33 Roussou (: ) proposes that the reanalysis creating complementizers from relative pronouns involves a change from anaphora to complementation, without a change in the categorial status of the pronoun/complementizer (quod in ()). In other words, the crucial change is the structural reanalysis of (b) as (c) without an associated categorial change in the pronoun/complementizer.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

Correlative structures as in (b) typically involve headless relative clauses with an anaphoric pronoun in the main clause. Here is a Classical Latin example: quidem [dictum [Latin] () ne hoc  ... even [say..... putas], think..] quod . . . Taurum ipse transisti? QUOD . . . Taurus... self... surmount.. ‘[Do you think that] not even this was said, that he himself surmounted the Taurus?’ (Cic. Fam. ,,; Hackstein : ) Here the quod-clause is a relative clause, with quod the neuter singular nominative/accusative form of the relative pronoun. As such, it is a DP. Adams (: ) says that ‘[t]here can be little doubt that the quod-construction arose from the use of verba dicendi et sentiendi [verbs of saying and thinking, IR] with a pronominal neuter object anticipating a following explanatory clause introduced by quod’, as we see in (). A further relevant property of many of the older IndoEuropean languages, including Early and Classical Latin, was that they allowed null objects. Hence a pronoun in the position of hoc in an example like () could be null. Hackstein (: ) points out that this first occurs with factive verbs. Thus an example like (b) was in principle amenable to reanalysis from a correlative with null equivalent of hoc or id ‘anticipating’ the quod-clause, to a structure in which the quod-clause is the direct clausal complement to scire (‘know’), with quod the initial complementizer as shown in (). This reanalysis creates finite complement clauses. Up to now I have been assuming a generic ‘Romance’ complementation system. However, it is worth pointing out that the system is quite different in a number of Southern Italian dialects; see Rohlfs (: ); Ledgeway (, , , , : –, : –); and Manzini and Savoia (: ff.).34 This is illustrated for some of these varieties in (), an abridged version of Ledgeway’s (: , table .): 34 Colasanti () describes a variety from Southern Lazio which has ternary complementizer system in which factive complements are marked by co (from quod) as opposed to realis non-factives marked by ca (from quia) and irrealis complements marked by che (from quid).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

()

a. Sicilian: pensu ca vèni b. N. Calabrian: criju ca vèni c. Salentino: crisciu ca vène ‘I think he’ll come.’

vògghiu chi mmanciassi vuogliu chi mmangia ogghiu cu mmancia ‘I want that he eat.’

The ca complementizer, corresponding to Latin AcI, derives from Latin quia, which, as just mentioned, was an alternative to quod in Classical and Late Latin. However, it appears that quia was more common than quod in the relevant contexts in older Latin (Ernout and Thomas : ), and so it is possible that this different system arose in the area where Latin had been spoken longer. In this area, quia-clauses, reanalysed as in (), took over from Latin AcI, and quod-clauses, reanalysed in the same way, replaced ut-clauses.35 A consequence of this is that prepositional infinitives are rarer in these varieties than in many Romance varieties (although we observed above that they are rarer in Ibero-Romance than in French or Italian, for example). In roughly this area, infinitives are highly restricted, occurring only in complements to ‘semi-auxiliary’ verbs like volere (‘want’), etc. (see Ledgeway : ff. and especially Ledgeway : , table . and references given there).36

35 Actually there must have been more to these changes than what is sketched here. Ledgeway (: –) argues that ca (the reflex of quia) occupies a higher position in the left periphery than chi/cu, the reflex of quid/quod in the modern Southern Italian dialects. In terms of the ‘split-CP’ of Rizzi (), introduced in §..., Ledgeway proposes that ca occupies Force and chi/cu occupies Fin. This entails a complication of the simple reanalysis seen in (), which I will not go into here. It may also imply that ut occupied Fin in Classical Latin, with the possibility that ut/ne clauses where FinPs, while AcIs were ForcePs. This possibility requires much closer scrutiny than we can give it here, however. 36 There is a further general complementation pattern in Romance, found in Romanian and in two dialect areas of the extreme South of Italy (Southern Calabria/North-East Sicily and Salento). Here infinitives are almost entirely absent, and specific particles introduce the subjunctive clauses corresponding to Latin ut-clauses, as in the Romanian examples in (i) and (ii): (i) Cred că va I-believe that will ‘I believe he’ll come.’

veni. come

[Romanian]

(ii) Vreau să vină. I-want  come- ‘I want him to come.’ This system is characteristic of the Balkan Sprachbund, and is plausibly attributable to the influence of Byzantine Greek (see Calabrese :  on Salentino; see Ledgeway  on Southern Calabrian/North-East Sicilian, and Ledgeway , , , , : –).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

Returning to the mainstream Romance system, let us recapitulate the changes we have discussed: ()

a. loss of ut/ne + subjunctive, largely replaced by the prepositional infinitive in French and Italian, although not in IberoRomance; b. restriction in distribution of the bare infinitive (except with ‘cognitive verbs’, and, again not in Ibero-Romance); c. loss of AcI in propositional complements; d. the spread of quod-clauses into former (c) environments.

All these changes affected the realization of CPs. We can summarize the situation further by saying that two new kinds of complementizers emerged in Late Latin: finite quid/quod (through the reanalysis in (), possibly originating in correlatives of the kind in ()) and, with more varied distribution across Romance, as we have seen, the prepositional complementizers a and de (through reanalysis of PPs as CPs). The former took over from the AcI, and the latter from ut and from many instances of bare infinitives. The Late Latin constructions arose through the reanalyses we have described and replaced the Classical Latin constructions.37 What caused the Classical Latin constructions to disappear? We can simply assume that ut disappeared through phonological attrition. Regarding the AcI construction, however, more needs to be said. This is where parameter change becomes relevant. First, we need to clarify the structure of the Latin AcI. As we saw in §.., Ledgeway (: –) proposes that this construction, as a conservative complementation pattern, ‘instantiates an earlier head-final construction introduced by a null head-final C’. So the structure is as follows (this example is repeated from §.., (a)): ()

[CP [TP tacitum te dicere] Ø] credo. [Latin] silent. you. say. I.believe ‘I fancy you say to yourself ’ (Ledgeway : ; Martial ..)

37 It is interesting to observe that ut probably underwent an earlier reanalysis of a kind similar to that affecting quod/quia, in that it originates as an adverbial introducing a clause. According to Sihler (: ) ut, or uti, comes from an earlier *kwuta, an indefinite/wh pronoun as the initial labiovelar indicates. (The initial kw was lost by reanalysis of the negative form *ne+cut(e)i as nec+uti.) The original meaning was ‘where, so that, as’. In being replaced by reanalysed a/de we observe a further case of a ‘cycle’ of grammaticalization (see §.).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

As Ledgeway points out, this null complementizer is similar to the overt infinitival complementizer for in English, as in () (Cecchetto and Oniga  propose the same, although they propose the null complementizer in the AcI is initial rather than final; see also Kayne :  and below): ()

a. [For [him to leave]] would be a mistake. b. It’s nice for the rich [for [the poor to do the work]].

The proposal for ‘null head-final for’ as we can call Ledgeway’s proposal for the AcI allows us to exploit an earlier idea of Kayne’s (: ) regarding certain differences between French and English. Kayne observes that English and French differ in two ways as regards infinitive constructions. English allows AcI constructions of the kind seen in () and () (i.e. where Accusative Case on the subject of the infinitive Agrees with v* in the matrix clause), while French does not. Second, the English prepositional complementizer for itself probes Accusative Case on the subject of the complement infinitive, as we see in (). In French, on the other hand, prepositional complementizers cannot be followed by an accusative subject, or indeed any kind of overt subject. We can adopt Kayne’s essential idea in terms of the following parameter: G. Does non-finite C probe accusative subjects in SpecTP? We can see that Classical Latin and English have a positive value for this parameter, while the Modern Romance languages (quite uniformly, despite all the other differences in their complementation systems) do not.38 A positive value for this parameter allows the English-style AcI construction and requires prepositional complementizers introducing infinitives to be followed by overt subjects. Given Ledgeway’s analysis of the AcI, we conclude that Classical Latin also had a positive value for this parameter (although it had a different value for the relevant word-order parameter in this case; see §..); Kayne (: ) observes these similarities and differences between Classical Latin and English and in fact suggests that Latin may have had a null C. A negative value of Parameter G bans the English-style AcI construction and overt subjects of infinitives introduced by prepositional 38 Recall that I am assuming that the complements to causative and perception verbs in Modern Romance are not CPs; see in particular Sheehan (, forthcoming) for relevant discussion.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

complementizers. (In §.. we also saw Ledgeway’s : – evidence that the head-final AcI clauses and head-initial quod-clauses pattern in relation to the verb as predicted by FOFC; we will return to this point in §...) The value of Parameter G changed between Classical and Late Latin.39 This was caused by the reanalysis of a/de as C-elements and the loss of tense/aspect distinctions in infinitives; in terms of the analysis just proposed these aspect-marked infinitives may have been licensed by the null C in AcI constructions (this would be a difference between the Latin constructions and English for-infinitives).40 The consequences of this parameter change were: the complete loss of the AcI propositional complements and the associated reanalysis of factive quod-clauses as CPs, along with the reanalysis of AcI clauses with a subject coreferent to the main clause as bare infinitives. This parameter may be linked to much broader changes taking place in Late Latin (which began at a much earlier stage, as Ledgeway shows): Ledgeway (: ) suggests that the change from a null final marker to an overt initial particle may be reflected in the nominal domain by the change from postpositions, crucially including null postpositions associated with oblique case marking, to prepositions. The earlier nominallicensing pattern is shown in (a) and the innovative one in (b) (these examples are repeated from §.., ()):

39 It changed in the opposite direction in late ME, perhaps as a consequence of OV > VO word-order change. Fischer et al. (: ff.) provide a very interesting discussion of these developments. 40 In fact, the Latin AcI on Ledgeway’s analysis resembles more closely the English ACC-ing gerund, as in: (i) (ii)

[Him smoking those cigars] bothers me. I dislike [him smoking those cigars].

Here there is arguably a null C which licenses the gerundive form of the verb (we can see that this is a gerund and not a progressive from the fact that stative verbs are possible: I dislike [him knowing all the answers]), analogous to the null C in Latin which licenses the aspectually marked infinitives. There is in fact an interesting interaction with aspect in English, as the progressive is impossible: (iii) *I dislike [him being eating dinner so late these days]. (Compare the for-infinitive in I am happy [for him to be eating dinner so late these days].) This is a case of the well-known, but poorly understood, doubl-ing filter first observed by Ross ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

()

a. Pompeius [ . . . ] profiscitur [PP [Canusium] ØP] [Latin] Pompey. sets-out Canusium. ‘Pompey . . . sets out for Canusium.’ (Ledgeway : ; Caesar, De Bello Gallico ..) b. miles [PP ad [Capuam]] profectus sum soldier. to Capua. set-out I.am ‘I set out as a soldier for Capua.’ (Ledgeway : ; Cicero, De senectute )

Parameter G, as stated above, is fairly clearly a microparameter affecting a class of non-finite Cs. However, if Ledgeway’s proposals are correct, then there may be a mesoparameter affecting probes of a particular type including at least adpositions and complementizers. This mesoparameter gives the following two possible Agree configurations for some subset P of probes: ()

a. [XP [ . . . Goal . . . ] [Probe Ø]] b. [XP [Probe X 6¼ Ø] [ . . . Goal . . . ]]

Formulated in this way, we can see that this parameter is connected to word-order; we saw in §.. that Ledgeway (: ) relates these changes in CP and PP to the more general shift from primarily headfinal order in Early Latin to primarily head-initial order in Romance, a point we return to in §... However, the intriguing, and at first sight very difficult, question is why the ‘head-final probe’ should tend to be silent while the head-initial one tends to be overt.41 We cannot explore this question in detail here, but it suggests that these parameters (the microparameter G and the putative mesoparameter ()) are connected to a possible macroparameter determining head vs. dependent marking in the sense of Nichols (), or, more precisely, the formal congeners of those properties in terms of the probe-goal-Agree system. In any case, here we see the role of reanalysis and associated parameter change in changes in complementation, and possibly how changes in complementation may reflect other parametric changes. The development of the prepositional infinitives seems, however, to have been a separate change from the change in Parameter G (although it played a causal role in relation to the change in Parameter G). This 41 Although it may not always be overt if the comments in the previous note concerning English ACC-ing gerunds are on the right track.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CHANGES IN COMPLEMENTATION

change seems to be connected to a separate parametric change concerning the nature of C. After this change, overt elements in C mark finiteness (a/de mark a non-finite CP and, after the change in the status of quod, this element marks a finite CP). In Classical Latin, however, overt complementizers marked the mood of the clause: ut/ne marked the clause as subjunctive, and wh-complementizers ((e) above), prescriptively at least, were also always followed by a subjunctive. Ernout and Thomas (: –) point out that indicative indirect questions are found in Plautine Latin and in Late Latin.42 The fact that indirect questions clearly show up in the indicative in Late Latin can be considered a further consequence of this parametric change. One cause of the reanalysis of prepositions as complementizers, as often pointed out (see for example Harris : ) was the growing use of prepositions to mark case relations as the morphological case system began to suffer phonological erosion. This was particularly relevant as gerunds and supines, both arguably CPs contained in DPs in Classical Latin, required case. Hence, with the erosion of case marking and its replacement with prepositional constructions, prepositions began to be used with certain kinds of non-finite clauses. This may have facilitated the reanalysis of certain prepositions as complementizers. This in turn suggests that this change may in fact be a further reflex of the mesoparameter in (). So all the changes in (a–d) which took place between Classical and Late Latin may have been a reflex of a change from (a) to (b). On the other hand, (e) did not change in the sense that the nature of the [+wh] C introducing indirect questions did not change. However, we saw in §.. that Latin, unlike Modern Romance (with the exception of Romanian, see note  of Chapter ), allowed multiple wh-movement, including in indirect questions, as the following example, repeated from §.. (c), shows: ()

Ego [quid cui debe-a-m] sci-o. [Latin] I. what. whom. owe-.- know-. ‘I know what I owe to whom.’ (Seneca, De Beneficiis ..; Danckaert : )

Examples like () show that Parameter E, which determines the choice between multiple and single overt wh-movement, has also changed between Latin and Romance. It is uncertain when or how 42 Recall that we have suggested that quod was not a complementizer in Classical Latin.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

this happened, however (see Dadan  for a proposal). It appears to be unconnected to the changes in (a–d), for which we now have a unified account. Of course, this brief sketch has left many questions open, but it serves to illustrate in general terms how our parameterbased approach to diachronic syntax can account for many kinds of change, as well as illustrating the utility of distinguishing among different kinds of parameters. One traditional and often repeated view is that clausal subordination, or hypotaxis, is a relatively recent reanalysis of parataxis, or clausechaining (see for example Ernout and Thomas : ). This idea has a long history, going back at least to Schlegel () (see Harris and Campbell : –, ff.). However, the claim that earlier stages of certain languages may have lacked subordination altogether violates the uniformitarian hypothesis, the idea that all languages at all times reflect the same basic UG, and so cannot be taken seriously in the approach adopted here; for discussion of uniformitarianism in relation to the theory of parameters, see Roberts (a: –). Harris and Campbell (: ff.) provide good arguments against this idea. Their most incisive criticism runs as follows: ‘[e]ven if parataxis does develop into hypotaxis, in and of itself this does not tell us how hypotaxis, true subordination, developed’ (: ). So I conclude that the traditional parataxis-to-hypotaxis idea should be abandoned, as it is conceptually problematic and in practice unrevealing. On the other hand, it is quite plausible that a language may lack finite clausal subordination of the familiar type exemplified by English thatclauses and Romance que/che-clauses. In fact, prior to the grammaticalization of ut as a complementizer (see note ), Latin was probably such a language. Moreover, it is very likely that this is typical of the older Indo-European languages, as we mentioned above. Turkish is a further example of a language in which the familiar pattern of finite complementation plays a fairly marginal role: complementation is typically expressed by various kinds of nominalization (Kornfilt : ff.). In fact, the analysis of the development of Romance complementation sketched above implies that finite complementation with a dedicated, overt complementizer of the that/que/che type is a parametric option, and the synchronic and diachronic evidence is that this is basically correct. Of course, it is entirely likely that a notion such as ‘finite complement clause’ may need to be replaced by something more abstract; here again, the mesoparameter in () is relevant. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

We have seen in this section that rather complex and pervasive changes in clausal complementation can be linked to two relatively simple but rather abstract microparametric changes affecting C, which may be subcases of a larger mesoparametric change, and their associated reanalyses. I have illustrated this with one case: the changes from Latin to Romance. However, there is nothing particularly unusual in these developments, and they are representative of the kinds of changes which can take place in complementation systems, as the evidence from elsewhere in Indo-European confirms. If so, then what we have seen here is an illustration of how changes in the complementation system can be handled in terms of parametric change affecting the category C.

2.5. Word-order change 2.5.1. Introduction In this section, I will focus primarily on the alternation between OV and VO orders, i.e. I will largely restrict attention to parameters F and F as defined in §... Concerning parameter F, we have already observed several times that Old English showed OV word order in embedded clauses. The following examples, which by now may be familiar, illustrate this: ()

a. . . . þæt ic þas boc of Ledenum gereordre [OE] . . . that I this book from Latin language to Engliscre spræce awende. to English tongue translate ‘ . . . that I translate this book from the Latin language to the English tongue.’ (AHTh, I, pref, ; van Kemenade : ) b. . . . þæt he his stefne up ahof. . . . that he his voice up raised ‘ . . . that he raised up his voice.’ (Bede .) c. . . . forþon of Breotone nædran on scippe lædde . . . because from Britain adders on ships brought wæron. were ‘ . . . because vipers were brought on ships from Britain.’ (Bede .–; Pintzuk : ) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

We saw in §... that OE had verb-second order in main clauses, and so we do not expect to find overt OV order in such clauses, unless of course the object is fronted to first position. The position of the auxiliary in (c) indicates a further pattern which has changed since OE: the finite auxiliary in a subordinate clause followed the non-finite verb. This fact can be related to OV order in terms of parameter F of §., which I repeat here:43 ()

F. Does the structural complement of V/T precede or follow V/T?

It appears, then, that F had the value PRECEDE in OE, while of course it has the value FOLLOW in NE. It has therefore changed in the course of the history of English. The main goal of this section is to investigate in more detail what this assertion may mean, whether it is correct, and, if it is incorrect, how examples like those in () are to be interpreted. 2.5.2. Early typological approaches to word-order change The earliest approaches to word-order change were directly inspired by Greenberg’s () observations of implicational relations among word-order types. Lehmann () made two important proposals in this connection. First, he argued that subjects, since they may be dropped in many languages and can be pleonastic in any language (as far as is known), are not ‘primary elements’ of the clause. This reduces the word-order types to two: OV and VO. Second, Lehmann proposed that, in typologically consistent OV languages, verbal modifiers appear to the right of V and nominal modifiers to the left of O; in consistent VO languages we find the opposite pattern. Now, many, or probably most, languages are inconsistent in relation to this typology (cf. NE, which is VO with preverbal modifiers, consistent with the typology, but it also has prenominal adjectives and possessors, inconsistent with the typology). To account for this, Lehmann proposed that ‘when languages show patterns other than those expected, we may assume they are undergoing change’ (: ). Applied to the history of English, then, we could observe 43 F might be reformulated in the light of the postulation of vP between T and VP, but I leave this possible complication aside for the moment. I will return to v’s possible role in word-order change in §.. below.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

that, throughout its history, English has been drifting from OV to VO. Presumably, ME represents the period in which this very determinant of word order was in transition from one type to the other. This statement, in Lehmann’s system, would automatically predict a shift from VAux to AuxV order at the same period; we pointed out in §.. that this is directly predicted by FOFC, since ‘mixed’ VAux and VO orders give rise to the illicit V>O>Aux order. Lehmann’s system also predicts a shift from RelN to NRel order in the history of English (see §. for an illustration of this typological property); this is correct, although one could not claim that ME was the transition period for this change, as NRel already predominates in OE.44 It seems unsatisfactory to consider the historical persistence of such ‘mixed’ systems as simply a feature of diachronic transition from one type to another. By this criterion, English has been in transition throughout its entire history, probably since Proto Germanic. If F of §.. is a true parameter, then the same point could of course be made in relation to a putative change in its value from PRECEDE to FOLLOW. Vennemann () also advocated reducing Greenberg’s wordorder types to OV and VO, leaving subjects out of the picture. He develops the Natural Serialization Principle (or NSP; originally proposed by Bartsch and Vennemann : ), which requires Operators and Operands to be serialized in a consistent order—either Operator>Operand, or Operand>Operator—in any language. Since objects, along with adjectives, relative clauses, etc., are Operators and verbs and nouns Operands, the NSP predicts the correlations with OV and VO orders which we observed in §.., and of course also predicts diachronic correlations. As in the case of Lehmann’s approach, the difficulty is that we are led to regard ‘mixed’ systems as persisting over very long periods, and, correspondingly, individual changes in parameters like those in F as taking place over similar periods (cf. Vennemann’s :  remark that ‘a language may become fairly consistent within a type in about  years’ (for example, English)). As various authors, for example, Comrie () and Song () have pointed out,

44 See Hawkins (: ), who states that Late Common Germanic was already NRel. The claim that the oldest attested IE languages were already NRel is important for Lehmann’s reconstruction of IE relatives (Lehmann : ). NE also retains AdjectiveNoun order rather than the predicted Noun-Adjective order although ME arguably had a greater incidence of NA order than does NE (Hawkins : ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

this casts doubt on the strength of word-order conformity as a causal factor in change. This point is succinctly summarized in the following remark by Song (: ): [T]ypological consistency must at the same time be considered to be strong and weak . . . It must be weak enough to permit incongruous word order properties to be incorporated into typologically consistent languages in the first place and it must also be strong enough to remedy the resulting situation by bringing all remaining word order properties into line with the new ones.

Regarding the synchronic predictions, Hawkins (: ) points out that up to % of the languages in Greenberg’s original thirty-language sample do not conform to the NSP, and observes that ‘the NSP’s predictions are both too strong—there are too many exceptions—and too weak—there are distinctions between attested and non-attested language types that it is failing to capture’ (). Again we see the essential empirical inadequacy of this kind of approach. And once again, we must bear this in mind in relation to the status of the generalized head-complement parameter F of §... Lightfoot (: ff.) makes a different kind of criticism, observing that long-term changes of the type envisaged by Lehmann and Vennemann are incompatible with a view of grammar as a cognitive module in the sense advocated by Chomsky. (This idea and the justification for it were presented in the introduction to Chapter .) His point is that if grammars are psychological entities, then they are properties of individuals; they are reinvented anew with each generation of children. Long-term diachronic drift would then, all things being equal, entail a kind of ‘racial memory’ (: ) on the part of the children acquiring language in order to cause the drift to continue in a consistent direction. Notions such as racial memory have no place in modern scientific theories, and so, to the extent that an approach like Lehmann’s or Vennemann’s, combined with a generally Chomskyan view of the nature of language, entails such a thing, we must reject either the Lehmann–Vennemann account or the Chomskyan view of language. Lightfoot strongly advocates rejection of the former. We will reconsider this argument when we come to consider Sapir’s () notion of drift in §.. Despite these general difficulties with the NSP, Vennemann (: ) proposes an interesting analysis of OV > VO change. The central idea relies on Greenberg’s Universal : 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

()

WORD-ORDER CHANGE

If in a language the verb follows both the nominal subject and nominal object in dominant order, the language almost always has a case system (Greenberg : ).

Vennemann’s central idea is that ‘as reductive phonological change weakens the S-O [subject-object distinguishing—IR] morphology, and does not develop some substitute S-O morphology, the language becomes a VX language.’ This would naturally link the loss of case morphology, one type of ‘S-O morphology’, with the change from OV to VO. This seems attractive as an account of this change in both English and in the development from Latin to Romance, as this change was accompanied by the loss of morphological case marking (on nonpronouns) in both languages. However, it is clear that the loss of case is neither necessary nor sufficient for OV > VO change. Greek and Icelandic have both undergone this change while retaining their case systems; this point was also made by Kiparsky (: ). (See Taylor  on Greek, and Hróarsdóttir ,  on Icelandic, which we also mentioned in §...) According to Comrie (: –), the Baltic and the Slavonic languages may be similar. Conversely, Dutch has largely lost its case system and yet has remained underlyingly OV, according to mainstream generative analyses, beginning with Koster (). Finally, Comrie (: ) points out that Proto Niger-Congo has been reconstructed as an SOV language without case, and many languages of this family have changed to SVO, still with no case; and so this is an example of OV changing to VO quite independently of the loss of case. Hawkins (: ) proposed Cross-Categorial Harmony (CCH) as a generalization over many of Greenberg’s implicational universals, as well as a number of exceptions to them. Hawkins states it as follows: ‘there is a quantifiable preference for the ratio of preposed to postposed operators within one phrasal category . . . to generalize to the others.’ The term ‘operator’ is taken from Vennemann’s work, and so may be understood as described above. It is important to note that the principle is stated as a preference, rather than as an absolute requirement, and so grammars tend to correspond to it but do not have to. Furthermore, the principle makes reference to phrasal categories, explicitly acknowledging that phrase structure plays a role in accounting for these correlations. In fact, Hawkins (: ff.) adopts X0 -theory as the prime structural explanation for the CCH. X0 -theory is the theory of phrase 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

structure, a variant of which was summarized in () of §.... The central idea of X0 -theory is that all categories conform to the same structural template, which I repeat here as (): ()

a. [XP YP [X0 X . . . ]] (YP is a specifier of XP) b. [X0 X ZP . . . ] (ZP is the complement of X)

As Hawkins points out (: ), the basic advantage of X0 -theory is that, since it offers a category-neutral template for phrase structure, it is well-suited to the expression of cross-categorial generalizations like the CCH. We observed in §.. that a general head-complement ordering parameter of the kind given there as Parameter F—repeated here as ()—would predict spectacular cross-categorial harmony: ()

F. For all heads H, does the structural complement of H precede or follow H in overt order?

However, it is clear that, as it stands, such a parameter is subject to criticisms of the kind summarized above in relation to the early proposals of Lehmann and Vennemann. Hawkins, however, interprets the CCH as a preference (and note that Dryer  formulates his BDT as a preference too; see §..). In fact, Hawkins suggests that the CCH may derive from a preference for relatively simple grammars, since ‘the more similar the ordering of common constituents across phrasal categories at the relevant bar levels, the simpler are the word order rules of the grammar.’ If this is correct, then F cannot be a single parameter; cross-categorial harmony must derive from some higherorder factor determining interactions among formally independent parameter values, perhaps an evaluation metric of some kind. We will return to this idea in §.. 2.5.3. Generative accounts and directionality parameters Van Kemenade (), to some extent developing ideas in Canale () and Hiltunen (), influentially applied X0 -theory to accounting for the OV > VO change in the history of English. A general assumption in syntactic theory at the time was that there was a level of syntactic representation, known as the base, where the X0 template held in a ‘pure’ form. (This assumption has been dropped in minimalist versions of syntactic theory.) This template was subject to manipulation through movement operations at later stages of the derivation 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

(cf. the discussion of Move in Chapter , Box .). We can thus speak of ‘underlyingly’ VO and OV languages, i.e. we may take a parameter such as F or F to hold in the base but to be to some extent obscured by the action of subsequent movement operations. The word-order parameters assumed in this type of theory clearly have a more abstract status than the word-order variants assumed by Lehmann, Vennemann, and Hawkins. Moreover, this approach allows for the possibility that surface orders may diverge from the underlying order. As long as this divergence is somehow accessible to language acquirers, then the underlying order can be maintained, i.e. the parameter controlling the base order can be set. If the divergence becomes too great, then acquirers may reset the parameter (the Transparency Principle might be relevant here; see §..). In these terms, a parameter like F derives from the option of headinitial or head-final order within the category formed by the head and its complement in the base, prior to the operation of any movement rules. Correspondingly, a parameter like F derives from a similar option where the head X is specified for some set of categorial features. This order may thus be H–XP (head-initial) or XP–H (head-final). Van Kemenade assumed the OE order to be XP–H for H=V; this corresponds to our parameter F, since van Kemenade assumes that auxiliaries, which we take to be T-elements, are verbs with sentential complements.45 Van Kemenade shows how certain movement operations disguised this underlying OV order in various ways. One such operation is known as extraposition, which moves a range of complements to the right of the verb, as shown by examples like the following: ()

a. . . . þæt ænig mon atellan mæge [ealne . . . that any man relate can all demm] misery ‘ . . . that any man can relate all misery’ (Orosius .–; Pintzuk : )

þone the

[OE]

45 In fact, van Kemenade took this order to be the consequence of the direction of assignment of thematic roles. If thematic roles are assigned to the right, the head assigning those roles precedes the complement being assigned them. If the roles are assigned to the left, the complement precedes the head. This, however, causes the parameter only to apply to cases where the head takes the complement as its semantic argument. It seems, though, that F has a wider purview than this, as was discussed in §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

b. Æfter ðisum gelamp þæt micel manncwealm becom after this happened that great pestilence came [ofer þære Romaniscan leode]. over the Roman people ‘Then it happened that a great plague came over the Roman people.’ (AHTh, II, , ; van Kemenade : ) In (a) an object DP, and (b) a PP, appears to the right of the finite verb in a subordinate clause. We thus say that the DP and PP are extraposed here. Now, PP-extraposition as in (b) is also found in Modern Dutch and German, but not DP-extraposition of the kind seen in (a), for ‘light’ DPs of the kind seen here; heavier ones, for example those including a relative clause, may extrapose in these languages. Starting with Stockwell (), it has been proposed that OE, especially in later periods, extended the incidence of DP-extraposition to a wider range of DPs than Dutch or German, in particular to ‘light’ DPs. Pintzuk and Kroch () showed that in early OE (in the eighth-century epic poem Beowulf) only prosodically heavy DPs were postverbal in subordinate clauses and these were preceded by a metrical break. In later OE prose, on the other hand, this is clearly not the case; see Fischer et al. (: –) for discussion. Examples like () illustrate this: ()

Þu hafast gecoren [DP þone wer]. thou hast chosen the man (ApT .; Fischer et al. : )

[OE]

Here, just the light DP þone wer is extraposed, and there is no evidence for a prosodic break after the participle gecoren. Van Kemenade (: ) concludes: ‘It is quite possible then, that the phenomenon of extraposition started off in early OE as the postposing of heavy constituents such as S [sentence/clause—IR], PP and heavy NP’s, and was extended later to include other constituents, light NP’s, adverbials.’ A second factor was ‘verb raising’ and ‘verb-projection raising’. These operations, the former found in Dutch and the latter in numerous Netherlandic, Swiss German, and other West Germanic dialects, derive orders in which the non-finite verb and possibly some of its complements appear to the right of the finite auxiliary. This is the opposite of the expected order in a language where V and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

T underlyingly follow their complements. For analyses of these phenomena in Modern West Germanic languages, see Hinterhölzl (), Koopman and Szabolcsi (), and in particular Wurmbrand (). OE had both operations, as the following examples illustrate: ()

a. đæt he Saul ne dorste ofslean (verb raising) [OE] that he Saul not dared murder ‘that he didn’t dare to murder Saul’ (CP, , ; van Kemenade : ) b. þæt he mehte his feorh generian (verb-projection that he could his property save raising) ‘so that he could save his property’ (Oros, , ; van Kemenade : )

Van Kemenade (: ) points out that the underlying order of OE was ‘not easily retrievable from surface patterns’ owing to the surface orders created by extraposition and verb(-projection) raising.46 This was a major factor leading to the change in the parameter governing the underlying order. As a consequence ‘the underlying SOV order changed to SVO. This change was completed around ’ (van Kemenade : ). A similar idea is proposed by Stockwell (), while Stockwell and Minkova () suggest that main-clause V, which gave rise to many surface VO orders, may have influenced the acquisition of subordinate OV order, and once this became VO it in turn influenced main-clause order. Lightfoot () makes a different proposal regarding the relation between word-order change in main and embedded clauses. He proposes a broadly similar account to van Kemenade’s in that movement rules obscure the underlying order in such a way as to ultimately lead to a change in the value of the parameter determining the underlying order. He assumes a parameter determining underlying word order of the same general kind as that assumed by van Kemenade (his (b),

46 Van Kemenade also assumes that the parameter-settings responsible for OE word order were inherently marked. This is because she assumes that Nominative and Accusative Case are assigned from left to right while, as we mentioned in note , thematic roles are assigned from right to left. In current work, the earlier notion of Case-assignment is subsumed under the Agree relation introduced in §.., as we saw in §.. For this reason I leave this aspect of van Kemenade’s account aside here. The idea that extraposition and verb(-projection) raising would have obscured the underlying OV parameter value still holds, independently of these details of van Kemenade’s analysis.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

()). However, he assumes that language acquirers only have access to main clauses as trigger experience. They are ‘degree- learners’ in Lightfoot’s terminology, meaning that they can only access material which involves no clausal embedding. In OE main clauses, as we have seen (see §...), the finite verb appeared in second position. Hence the underlying OV order was obscured by the movement of the verb (to C, according to the standard analysis of V as summarized in §..., but see the discussion of the nature of OE V in §...). Of course, the same situation obtains in Modern Dutch and German. However, Lightfoot argues (ff.) that ‘[v]erb-second languages typically have unembedded ‘“signposts” indicating the movement site of the verb’. These include the final position of the particle where the verb is a particle verb (these are known as ‘separable prefixes’ in many pedagogical grammars of German), the position of the non-finite main verb where there is a finite auxiliary, or auxiliary-like, verb (i.e. where a single clause contains two verbal elements), and the position of ‘verbal specifiers such as negatives and certain closed-class adverbials’ which must, he assumes, be merged immediately to the left of VP. The linear separation of the finite verb from these elements in V clauses is illustrated by the following Dutch examples: ()

a. Jan belt de hoogleraar op. John calls the professor up. b. Jan moet de hoogleraar opbellen. John must the professor up-call ‘John must call the professor up.’ c. Jan belt de hoogleraar niet op. John calls the professor not up ‘John doesn’t call the professor up.’

[Dutch]

This linear separation of V from its complement triggers V-movement, and allows the degree- learner to postulate the underlying OV word order. This is the situation in Modern Dutch and German. Furthermore, Dutch and German both allow main-clause infinitives with a particular illocutionary force; () for example is a rhetorical question: ()

Ik de vuilnisbak buiten zetten? Nooit. I the garbage-can outside put? Never ‘Me put the garbage can outside? Never.’ 

[Dutch]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

In OE, however, at least two of the three ‘signposts’ showing linear separation of the finite verb and its complement are either absent or unclear, while the third has, according to Lightfoot, an uncertain status. The clearest observation is that OE negation involved the preverbal proclitic ne, rather than an adverbial element appearing to the left of VP as in Dutch and German (this is a consequence of Jespersen’s Cycle; see §.). As a proclitic, ne is always directly adjacent and to the left of the finite verb: ()

Ne geseah ic næfre ða burh. not saw I never the city ‘I never saw the city.’ (Ælfric, Homilies I..; Lightfoot : )

[OE]

So the position of negation in OE does not function as a ‘signpost’ for the base position of the verb. Second, particles were often fronted along with the verb in second position in OE, unlike their Dutch and German counterparts: ()

Stephanus up-astah þurh his blod S. up-rose through his blood gewundorbeagod. [OE] glory-crowned (Homilies of the Anglo-Saxon Church I, ; Lightfoot : )

Regarding the position and status of non-finite verbs, Lightfoot () suggests that the relevant structures did not exist in OE. Thus, Lightfoot has grounds for asserting that the ‘signposts’ for underlying OV order in main clauses were less clear in OE than in Modern Dutch or German. Since, by the degree- hypothesis, language acquirers have no access to potential triggers in embedded clauses, the relatively systematic OV order of embedded clauses was not relevant to determining the value of the parameter. On the other hand, OE did allow finite clauses with the verb in final position, in particular in the second conjunct of coordinate constructions, as in: ()

and his eagan astungon [OE] and his eyes (they) put.out (Parker, Anglo-Saxon Chronicle ; Lightfoot : )



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Such constructions gradually become less frequent during OE. Then ‘as matrix instances of object-verb diminished to a certain point, underlying object-verb order became unlearnable and the verb-order parameter came to be set differently’ (). The indications are that this was an abrupt change in word order in embedded clauses in the twelfth century, as originally shown by Canale (). (We will see directly that this factual claim regarding both the date and suddenness of the change has been challenged.) Whatever the merits of the idea that children are degree- learners, Lightfoot’s account illustrates in a rather different way from van Kemenade’s how a parameter determining underlying word order can change owing to that word order being distorted on the surface in a crucial way as the result of a movement operation. Both Lightfoot and van Kemenade date the change in the relevant parameter to approximately the twelfth century. However, there are some difficulties with this. As van Kemenade (: ) says, ‘the older word order did not, of course, become immediately ungrammatical . . . For a long time we continue to find OV structures, but . . . these were not firm enough in the language environment to trigger the older, marked situation.’ Similarly, Fischer et al. (: ) point out that ‘[i]t is only after about  that clauses with VO order begin to vastly outnumber those with OV order’. Fischer et al. give, among others, the following late example of OV order, from Chaucer (late fourteenth century): ()

I may my persone and myn hous so kepen and [ME] deffenden. ‘I can keep and defend myself and my house in such a way.’ (Chaucer, Melibee ; Fischer et al. : )

Foster and van der Wurff () give the following ratios of VO to OV orders in poetry at fifty-year intervals in late ME:  (),  (),  (); in prose , , and . Clearly, some account must be given of the persistence of the OV orders into later ME. Conversely, as already mentioned, OE shows a wider range of possible word orders, in both main and embedded clauses, than do Modern Dutch and German. Of particular importance in this connection are subordinate clauses with elements following a non-finite verb which are known not to appear to the right of such verbs in any Modern 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

Germanic language: particles, pronouns, and light adverbs. These orders are illustrated in (): ()

a. He wolde adræfan ut anne æþeling. [OE] he would drive out a prince ‘He would drive out a prince.’ (ChronB (T) .–; Pintzuk : ) b. swa þæt hy asettan him upp on so that they transported themselves inland on ænne sið. one journey ‘so that they transported themselves inland in one journey.’ (ChronA . (); Pintzuk : ) c. Þæt Martinus come þa into þære byrig. that Martin came then into the town ‘that Martin then came into the town.’ (ÆLS .–; Pintzuk : )

These orders look very similar to those of NE, and are usually interpreted as being an instantiation of the innovative order, in part because an extraposition analysis would either have to involve extraposition of particles, pronouns, or light adverbs, not usually thought to be possible, or of the implausible constituents consisting of these elements and the following material; ut anne æþeling, him upp on ænne sið, þa into þære byrig in the examples in (), for example. If this is literally true, then OE must have already allowed head-initial orders. Pintzuk (; ) develops this idea by proposing that OE had a ‘double base’: both the OV and the VO value of the relevant parameter were allowed, giving rise to two distinct grammars through the OE period. This approach can account elegantly for some of the variation in word order (see Pintzuk : ff. for discussion), and we will return to the notion of grammars in competition in §., §., and §..47 However, it is not clear what caused one of the two grammars (the OV one, in the case of the history of English) to fall out of use at the time it did. The notion of change in parameter values is not useful here. What appears to be clear, 47 An interesting variant of this approach, where a major restriction is imposed on which categories can show parametric variation in head-complement order, and therefore coexisting word-order patterns, is developed by Fuß and Trips (). We will briefly consider Fuß and Trips’ proposal in §..



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

however, is that the transition from the old OV system to the new VO one was not as abrupt as van Kemenade and, in particular, Lightfoot, imply. An important feature of the OV orders which appear in later ME is that they show a growing preponderance of quantified or negative objects. By comparing a group of early thirteenth-century texts with a group of Late Middle English texts, Kroch and Taylor (b) show that quantified objects appeared preverbally at both periods. In the later period, however, OV order was all but confined to quantified and negative objects. This conclusion is supported by data from fifteenthcentury correspondence reported in Moerenhut and van der Wurff () and Ingham (; ).48 Fischer et al. (: ) state that fourteenth-century English continued to allow OV with non-quantified objects, but that fifteenth-century English only allowed OV where O was negative or quantified, or where there was an empty subject, as in a co-ordinate or relative clause. (a,b) are examples from later ME of OV order with a quantified object, and (c) illustrates OV order with an empty subject: ()

a. Þei schuld no meyhir haue. (negative object) [Late ME] ‘They were not allowed to have a mayor.’ (Capgrave, Chronicles .; Fischer et al. : ) b. He haþ on vs mercy, for he may al þynge do (quantified ‘He has mercy on us, for he can do everything.’ object) (Barlam ; van der Wurff : ) c. alle þat þis writinge redden or heere ‘all that will read or hear this writing’ (Sermon ; Fischer et al. : )

It seems, then, that OV order with quantified and negative objects was lost later than OV order with non-quantified and non-negative objects. In fact, Pintzuk (: –) suggests that OV orders with quantified 48 Ingham () relates the restriction to quantified/negative preverbal objects in fifteenth-century English to the constructions like that in (i), with the expletive there and a negative subject: (i)

Ther shal no thing hurte hym. (PL , ; Ingham : )

This is a further example of a transitive expletive construction, of the type discussed in §....



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

objects had a different status from those with non-quantified objects as early as OE. If so, then it is not surprising that the two types of OV order may have been lost at different times. In this connection, we note that the Greenbergian categories OV and VO are not sufficiently finegrained to account for the observations that have been made. Finally, van der Wurff and Foster () observe that surface OV disappears completely from prose writings during the sixteenth century. One approach to making the OV vs. VO difference more finegrained has been to take into consideration the information-structural (IS) properties of the direct object. Information structure refers to the status of the referent of a category, in this case the direct-object nominal, as given or new information in relation to preceding discourse. Building on earlier work (Taylor and Pintzuk a,b, ), Taylor and Pintzuk () present a finer-grained account of wordorder change in the history of English in which the IS status of the direct object plays a central role. They distinguish three grammars, G– G. G, the most conservative system, probably inherited from Proto Germanic, had head-final TP and VP giving rise to OVAux as the unmarked order. However, VAuxO was a possible order, where the object was either heavy or new information. This system disappeared in early ME, c.. G was innovated in early OE and also died out in early ME; it thus coexisted with G for much of the attested OE period (Taylor and Pintzuk, like Pintzuk , , adopt the ‘grammars in competition’ approach to syntactic change; see above). G differed from G in that TP was head-initial, although VP, as in G, was head-final; we observed in §.. that this is the FOFC-compliant order of changes from head-initial to head-final categories in the clausal Extended Projection. G generates unmarked AuxOV order, but AuxVO order is possible, again where the object is either heavy or focused. Finally, G is the fully head-initial grammar, where both TP and VP are head-initial. This system survives into NE, of course. Taylor and Pintzuk take the first occurrences of postverbal pronouns as unambiguous evidence for this grammar (pronouns are typically given information and are of course very ‘light’, hence they would not be moved to a postverbal position by the postposing operations of G or G). There is some evidence for this order in pre- OE, although it is not found in all texts from this period (see Taylor and Pintzuk’s note , p. ). In this grammar, there are no IS effects associated with postverbal objects. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

We see then that all three grammars coexisted at least in later OE and early ME, giving rise to the four orders OVAux, AuxOV, VAuxO, and AuxVO. Taylor and Pintzuk illustrate these orders with the following examples: ()

a. OVAux: Gif heo þæt bysmor forberan wolde [OE] If she that disgrace tolerate would ‘if she would tolerate that disgrace’ (Taylor and Pintzuk : ; coaelive,+ALS_[Eugenia]: .) b. VAuxO: þæt he friðian wolde þa leasan wudewan that he make-peace-with would the false widow ‘that he would make peace with the false widow’ (Taylor and Pintzuk : ; coaelive,+ALS_[Eugenia]: .) c. AuxOV: þurh þa heo sceal hyre scippend understandan through which it must its creator understand ‘through which it must understand its creator’ (Taylor and Pintzuk : ; coaelive,+ALS_[Christmas]: .) d. AuxVO: swa þæt heo bið forloren þam ecan life so that it is lost the eternal life ‘so that it is lost to the eternal life’ (Taylor and Pintzuk : ; coaelive,+ALS_[Christmas]: .)

These orders coexist at the period when all of G, G, and G coexist in competition. Crucially, the IS status of postverbal objects is different in the VAux as compared to the AuxV orders. This is particularly important in the case of AuxVO order, which can be generated by G with IS-sensitive object-postposing, or can be generated by G with no ISsensitivity of the object. The loss of IS-sensitive objects in this order is thus a clear sign of the change to a fully head-initial grammar. Struik and van Kemenade () take issue with some of Taylor and Pintzuk’s conclusions, although they too consider IS important for 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

understanding OE word-order patterns. They argue that in OE preverbal objects were overwhelmingly given information in IS terms. They conclude that ‘if an object is preverbal it is given, but this does not mean that a given object is necessarily preverbal’ (Struik and van Kemenade : ). Postverbal objects are new, but may also be given, or be determined by heaviness (as argued by Taylor and Pintzuk). Struik and van Kemenade point out that their results are consistent with the idea that OE was underlyingly VO with an object-preposing rule applying to given objects, presumably a form of scrambling. This operation was gradually lost in ME. Elenbass and van Kemenade (: –) show that in Early ME (the M period, –) preverbal direct objects are always definite; indefinite direct objects are invariably postverbal. In this period, then, scrambling could only apply to definite DPs. Earlier, we criticized the Lehmann–Vennemann kind of approach to word-order change, in part because it leads to the conclusion that the change takes place over too long a period. What we have seen above suggests that an approach of the sort advocated by van Kemenade and Lightfoot has the change taking place too quickly; both before and after the alleged turning point, which they situate in the twelfth century, we find, respectively, the innovative order (cf. Pintzuk’s evidence from OE, of the kind in ()) and the conservative order (it is clear in particular from Fischer et al. :  that OV with a non-quantified object was found in the fourteenth century, as illustrated by ()). Moreover, neither the gradualist Lehmann–Vennemann approach nor the catastrophist van Kemenade–Lightfoot approach can account satisfactorily for differences between early and late ME regarding negative and quantified objects in OV order, or the various IS-related effects discussed by Elenbaas and van Kemenade (); Taylor and Pintzuk (); and Struik and van Kemenade (). Where does all this leave the idea that parameter F changed in ME? 2.5.4. ‘Antisymmetric’ approaches to word-order change and the Final-Over-Final Condition We have several options in dealing with this situation. One option is to retract any attempt at cross-categorial generalization and revert to the position that the relative order of complement and head is to be restated for each category of head. As mentioned in §.., in connection with certain difficulties posed by German and Dutch for 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

approaches to cross-categorial harmony, this would effectively make any implicational generalizations about word order of a synchronic or diachronic nature appear to be an accident. However, the Final-OverFinal Condition (FOFC) alleviates this situation somewhat. Here I repeat the informal statement of FOFC from §..: ()

A head-final phrase βP cannot dominate a head-initial phrase αP where α and β are heads in the same Extended Projection.

The consequence of FOFC is that even the weakest theory of wordorder variation, one which allows every head to have ‘its own’ parameter determining the position of the complement, still excludes a particular kind of disharmonic order, i.e. that in which a head follows a complement which is itself head-initial, as shown in (): ()

*[βP [αP α γP] β]

In this section we will pursue the implications of FOFC for word-order change along with a radically different approach to head-final orders, the ‘antisymmetric’ theory of Kayne (). Kayne () proposes the Linear Correspondence Axiom (LCA) as a principle of phrase structure. This can be stated as follows (this is an informal, simplified statement; for the original, see Kayne : –): ()

A terminal node α precedes another terminal node β, if and only if α asymmetrically c-commands β.

Terminal nodes are nodes which do not dominate anything, while nonterminals dominate something. We can see the implication of () for word order if we consider a simple verb-complement structure like (), where for simplicity we assume that pronouns such as him are Ds (see Postal ; Höhn , ; Crisma and Longobardi : –): VP

()

V see

DP

D him



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

Here V asymmetrically c-commands D and so, by the LCA, see must precede him. Given the LCA, there is no possibility of parametric variation in underlying head-complement order of the type assumed by van Kemenade () and Lightfoot (). Instead, the natural assumption (although not the only logically possible one) is that all languages are underlyingly VO, and OV orders are derived by leftward-movement of objects. This idea extends to all cases where a complement precedes its head in surface order; it must have been fronted to that position from an underlying post-head position determined by the LCA. When I introduced the idea of movement in §.., I stated that movement was a matter of purely arbitrary variation among grammatical systems. In fact, Chomsky (, ) proposes that movement depends on Agree. A precondition for movement to relate two positions is that the two positions be in a Probe-Goal relation. Movement takes place where the Probe, in addition to having uninterpretable morphosyntactic features of some kind (see §..), also has an extra property which causes the Goal to undergo ‘second Merge’ (see Chapter , Box .). Let us call this property ‘attraction’ of the Goal. Whether a given category is an attractor in this sense indeed appears to be a matter of arbitrary variation, although it is thought that only functional categories can be attractors. Following Chomsky (, ), I will indicate this attraction property with an Extended Projection Principle, or EPP, feature. (We mentioned the earlier conception of the EPP in note  of Chapter .) Where a Probe P is an attractor I will write it as P[+EPP]. Since many parameters involve the presence or absence of movement (for example, those connected to verbmovement discussed in §.), whether or not a Probe has an EPP feature is an important aspect of parametric variation; I will return to this point in §.. Given this view of movement along with the interpretation of Kayne’s proposals just sketched, OV systems differ from VO systems in that some Probe P to the left of and structurally ‘higher’ than VP attracts the object, i.e. it enters the Agree relation with the object and triggers object-movement, so P would have an uninterpretable feature of some kind. The obvious candidate for P is v, since we saw in §. that this element has uninterpretable φ-features which probe the direct object and allow the object’s ACC-feature to be deleted. OV order arises when v* has an EPP feature. This gives the following derived structure: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Box 2.1. Merge and the LCA Strictly speaking, the presentation of the LCA here is not compatible with the structure-building operation Merge. In the Introduction, I presented Merge as the operation which ‘combines two syntactic elements (in the simplest case, two words) into a more complex entity which consists of those two elements and its label; the label being determined by one of the two elements’. Merge is thus an intrinsically binary operation. However, in () himD does not appear to have merged with anything. The discrepancy is in part due to the fact that Kayne () does not assume Merge. If we ‘correct’ the structure in () to bring it into line with Merge we have ():

VP

()

V see

D him

Here him is simultaneously maximal, in that it is immediately contained in a different category, and minimal, in that it does not itself contain anything. More generally, if we wish to make (), and hence the LCA, compatible with the symmetrical nature of Merge as we have defined it we will always run into problems with the most deeply embedded category; the right branch of all higher categories is recursive and hence there is an asymmetric relation between the terminals which permits the LCA to determine linear order. This can be seen from the abstract phrase marker in (), where the LCA defines the linear order A>B>C>{D, E}, but cannot order D and E. Since all trees must terminate with a non-recursive right branch, the problem of ordering the terminal on this branch with that on its sister will always arise:

() A

B

C

D

E (continued)

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

Box 2.1. (Continued) One way to solve this ‘rightmost-branch problem’, suggested by Chomsky (: ), is to require that one of the elements merged at the most deeply embedded level is phonologically null. This is legitimate since ‘there is no reason for the LCA to order an element that will disappear at PF’ (Chomsky : ). In this case, we can assume an empty N-element is merged with D, perhaps as a consequence of the inherent nature of D, so this would give () instead of ():

VP

()

V see

DP

D him

N ∅

Cardinaletti and Starke () present a general analysis of the internal structure of pronominal DPs; see also Déchaine and Wiltschko (), Panagiotidis (, ), Elbourne (), Barbosa (), and Roberts (a: f.) for similar proposals, all of them broadly compatible with the structure in (). It is not clear whether this proposal can provide a general solution to the rightmost-branch problem, however.

()

[v*P DP-Obj v*[uφ, EPP] [VP V (DP-Obj)]]

The change from OV to VO must then be seen as the loss of the trigger for movement, i.e. the loss of v*’s EPP feature. Using different technical devices, an approach to word-order change in English of this kind was first proposed in Kiparsky (: ) and developed in Roberts (); Struik and van Kemenade’s () results regarding the IS status of preverbal objects are also compatible with an analysis of this kind, as are the results of Elenbaas and van Kemenade concerning scrambling in Early ME which we mentioned in the previous section. Taylor and Pintzuk (: ) also comment that their data could be analysed in these terms, although they themselves do not actually do this. This kind of approach has also been adopted by van der Wurff (, ); 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Fischer et al. (); and Ingham (, ), and by Hróarsdóttir (, ) for word-order change in Icelandic. The approach, which we will refer to as the ‘antisymmetric approach’ since the LCA requires phrase structure to be antisymmetric, has a number of advantages. One conceptual advantage is that it eliminates the possibility of changes in the operation of External Merge itself; only Move and Agree may vary and these are completely conditioned by the feature-content of functional heads, in conformity with the BCC. This is advantageous in that the range of formal options that the child must consider in setting parameters is limited. Indeed, we can continue to maintain that all parametric variation concerns Agree or Move; External Merge is invariant. (We return to this point in §..) Limiting the options of formal variation in this way is a good thing, in that it takes us a small step further towards reconciling poverty-of-the-stimulus considerations with the attested variation in grammatical systems (cf. the discussion in §.). This point has an important corollary when we consider change: it implies that changes must always concern Agree or Move, a point I will return to in §. and §.. Second, the antisymmetric approach to word-order change makes possible a more fine-grained empirical analysis, as movements can be selectively triggered. For example, we might claim that until around , following Fischer et al., v* triggered movement of objects generally, much as schematized in (), while in fifteenth-century English v* only triggered movement of quantificational or negative objects, in virtue of having one of the two specifications [uφ] or [uφ, uOp, EPP] (where ‘Op’ here designates both negation and quantifiers). Since we have already seen that the Op feature may enter into Agree relations in our discussion of negation in §.., this suggestion has some independent plausibility. Of course, it is also possible that quantificational and negative objects were attracted by some other category than v, either from the fifteenth century (as suggested by van der Wurff ,  and Ingham , ), or perhaps through ME and even in OE, as argued respectively by Kroch and Taylor (b) and Pintzuk (). In that case v would have been [uφ, EPP] up to c. and the other category would have been [uOp, EPP] all along. Whichever of these analyses turns out be correct, we can see that the antisymmetric approach, while conceptually more restrictive than the approach considered earlier, is also more flexible. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

A further empirical advantage of the antisymmetric approach concerns the adjacency of the verb and direct object. To quote Kiparsky (: ): ‘[r]igid VO languages . . . require adjacency of verb and object . . . whereas rigid OV languages . . . allow adverbs to intervene freely’; see also Haider (, ). We can observe this difference in the history of English: the examples of OE and ME OV order in () and () all have material intervening between the object and the verb; on the other hand, in NE the verb must be adjacent to the direct object (as mentioned in §...). If OV order is derived by leftwardmovement of the object as in (), then it is quite conceivable that adverbial and other material may intervene between the target of this movement and VP. VO order, on the other hand, does not involve object-movement, and so, to the extent that the verb does not move, the adjacency created by merging these two elements will be undisturbed. This difference between OV and VO systems cannot be so readily captured if we assume that the relevant parameter concerns the firstmerged order of object and verb. What I have said about the antisymmetric approach so far might lead one to consider that it is potentially so fine-grained that it has no hope of capturing larger-scale implicational relations such as those behind parameters like F or F. But in fact this is not the case. If we ally the antisymmetric approach to the idea that potential movement triggers, i.e. various (sub)classes of functional heads, tend to trigger or fail to trigger movement harmonically, then we can in fact begin to understand the implicational relations that we have seen. Thus, parameter F would be the option of leftward-movement of the complements of V and T, and parameter F would be the option of leftward-movement of many different types of complements. The tendency for ‘head-initial’ and ‘head-final’ patterns to hold across categories, as revealed by the typological work of Greenberg, Hawkins, and Dryer, would result from a preference for potential movement triggers to act together. We could perhaps restate Hawkins’ () generalization of cross-categorial harmony in the context of Kaynian antisymmetry as follows: ()

There is a preference for the EPP feature of a functional head F to generalize to other functional heads G, H . . .

() is actually a case of Input Generalization, one of the third factors introduced in §. (). I will return to this question in §. and §., where I will propose a more precise version of (). For now it 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

suffices to note that the antisymmetric theory is both fine-grained enough to allow an analysis of the differences between fourteenthand fifteenth-century English, and at the same time at least in principle able to capture large-scale implicational relations. To quote Kiparsky (), the antisymmetric theory ‘would then predict . . . that the mixed system of head-complement relations of Germanic would become uniform.’ But Kiparsky goes on to point out that ‘OV commonly changes to VO but the converse does not happen.’ As things stand, what has been suggested here does not predict this; in §., I will return to this point. A potential problem with the antisymmetric approach is that it runs the risk of entailing a complication of the structure of the clause, as triggers and landing sites for movement need to be postulated. To the extent that these are postulated purely to account for leftward-moved complements, the approach is no better than the one which postulates parameters determining the linear order of merged elements; it simply shifts the locus of variation from Merge to Move. If, on the other hand, the movements and positions needed to derive head-final orders are independently required, then this point is not problematic. A second problem concerns the specific proposal that OV order is derived by leftward-movement of objects. In true OV systems, all complements must precede V. At first sight, this may seem to imply that a whole host of movements and positions must be postulated in order to account for preverbal PPs, particles, adverbs, etc. Again, to the extent that the movements are postulated purely to derive the headfinal surface order, the antisymmetric approach loses its advantage over other approaches. An interesting way of handling this last problem has been put forward. Following an initial proposal by Hinterhölzl (), various authors (Hróarsdóttir ; Haegeman ; Koopman and Szabolcsi ; Koster ; Biberauer ) have proposed what has come to be known as the ‘roll-up’ analysis, first for West Germanic OV languages, but of course it extends in principle to any and all head-final languages. The basic idea is that the Goal for leftward-movement may be contained in a larger category which is moved along with the Goal (thereby undergoing pied-piping) when EPP-driven movement takes place. So, for example, instead of the object alone being attracted by v*’s EPP feature, as in (), the entire VP might move. This would give the structure in (0 ): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

(0 ) [vP [VP V DP-obj] v*[uφ, EPP] (VP)]] In this structure, all VP-internal material in addition to the direct object is moved to the left of v*. But, also, as it stands, V moves too, and so OV order is not derived. But suppose V independently raises to v* (Chomsky (: ) assumes that this is a separate operation from verb-movement to T, object-movement, etc.), and then the ‘remnant’ VP moves to SpecvP. This will give the derived structure in (00 ): (00 ) [v*P [VP (V) DP-obj] V+v*[uφ, EPP] (VP)] The structure in (00 ) gives rise to the surface order where all V’s complements precede V. Suppose we now iterate the pied-piping operation at the TP-level, i.e. we allow T to attract the entire vP to its Specifier (pied-piping the subject, with which T’s [uφ] features Agree). This gives (): ()

[TP [v*P DP-subj [v*P [VP (V) DP-obj] V+v*[uφ, T (vP)]

EPP]

(VP)]]

If T is the category containing an auxiliary, then this sequence of operations will give us the surface word order Obj–V–Aux, which is the usual order found in subordinate clauses in West Germanic. Also, all other verbal complements will appear to the left of V and Aux. Let us illustrate this kind of derivation with (c), which we repeat here: (c)

. . . forþon of Breotone nædran on scippe lædde . . . because from Britain adders on ships brought wæron. [OE] were ‘ . . . because vipers were brought on ships from Britain.’ (Bede .–; Pintzuk : )

The order in which the VP-constituents are first-merged is as in ():49 ()

[VP lædde of Breotone nædran on scippe]

When VP is merged with v*, V raises to v* and VP to the Specv*P, giving (): 49 If Merge is binary, as stated in the Introduction, there must be further structure inside the VP. I leave this aside here. The important point is that all these elements, except V itself, move as a unit.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

()

[v*P [VP (lædde) of Breotone nædran on scippe] [v*lædde v* [+EPP]] (VP)]

Next, the auxiliary wæron is merged in T and v*P moves to SpecTP. This gives the structure in (): ()

[TP [v*P [VP (lædde) of Breotone nædran on scippe] [v* lædde v*] (VP)] [T wæron] (vP)]

Although this derivation appears rather complex, as long as roll-up can be motivated, the approach has all the advantages of the antisymmetric approach to word-order variation and change which we enumerated above.50 In particular the loss of these roll-up operations involving vP and VP pied-piping plays a central role in the OV > VO change we have discussed here. Accounting for head-final orders in this way has a further very important consequence, in that it gives us a natural account of FOFC. We can derive FOFC from a version of the Strict Cycle Condition of Chomsky (), which we can formulate as follows:51 ()

Strict Cycle Condition: No rule R can apply to a domain dominated by a node A in such a way as to solely affect B, a proper subdomain of A.

() says all rule applications must apply to the smallest portion of structure they can, so in (), if R can apply to B rather than A, then it must do so: ()

. . . [A . . . [B . . . ] . . . ] . . .

Once R has applied to B, then A becomes the smallest structural domain for it to apply to, and so on. Applying the Strict Cycle to roll-

50 We may further note that ‘verb-projection raising’ order as in (b) and ‘verb-raising’ order as in (a) can be straightforwardly derived by lack of vP-movement and objectmovement instead of VP-movement respectively. In other words, these orders reflect movement of DP only to SpecTP and SpecvP, rather than pied-piping of vP and VP. We thus have a rather natural way of accounting for the synchronic variation in OE as a stable option of pied-piping vP or VP vs. ‘stranding’, i.e. movement of the DP Goal alone. This idea is developed in Biberauer and Roberts (). We return to the question of optionality in §... 51 This formulation is closely modelled on that in Chomsky (: ), with reference to cyclic nodes removed. It is also close in spirit to the phonological cycle, introduced in Chomsky and Halle (: ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

up directly captures FOFC. Consider the case where roll-up applies to vP but not VP in (): ()

[CP C [TP T [vP v [VP V O]]]]

If roll-up applies in vP first, moving VP around v, () is violated because it can apply in the smaller domain VP and the familiar standard FOFC violation in () results, possibly deriving V > O > Aux order if Aux is in v: ()

*[CP C [TP T [vP [VP V O] v]]]

Once roll-up has applied at the lowest possible level/in the smallest possible domain, it can apply at the next-lowest/smallest domain, etc. By the same token, ‘stop/go’ roll-up is impossible. Roll-up is therefore maximally cyclic and maximally local. So FOFC derives from the LCA, roll-up, and the version of the Strict Cycle in (). We saw in §.. that Ledgeway (: –) shows that wordorder change from head-final to head-initial in Latin follows a pattern consistent with FOFC and that both Biberauer, Sheehan, and Newton () and Taylor and Pintzuk () say the same about word-order change in English. In Late Latin, the change at the CP level from AcI complements introduced by a null final complementizer to finite clauses introduced by the overt initial complementizers ut, quod, or quia meant that the conditions for TP, vP, and VP to change ‘headedness’, which we now construe as the loss of roll-up, were met. Similarly, Ledgeway takes PP to be part of the nominal Extended Projection and observes that ‘original head-final adpositional structures . . . have been replaced by head-initial adpositional structures’ (Ledgeway : ). Since oblique case-marked nominals had a silent postposition, the introduction of overt prepositions created the conditions for a general shift of headedness in nominals. Ledgeway then provides a general account of the word-order changes in Latin and on to Romance based on ‘a principle of cross-categorial harmonization, such that once head-initiality becomes established in the topmost CP and PP layers, it is then free to percolate down harmonically to the phrases that these in turn embed’ (Ledgeway : ). This principle may in turn be related to Input Generalization as in §. (ii), a point we return to in §... Furthermore, Ledgeway shows that the C-initial clauses overwhelmingly appear postverbally, while the AcI clauses tend to be preverbal in 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Classical Latin and appear equally pre- or postverbally in the later periods. For example, following Herman’s (: ) survey of four Late Latin Christian texts,52 he observes the following ratios of postverbal to preverbal quod/quia-clauses: :, :, :, :. This compares with the ratios for AcI clauses, which are :, :, :, and : (see Ledgeway : , table .). As Ledgeway observes, this distribution follows from FOFC since, as we pointed out in §.., C-initial complement CPs in OV languages must extrapose in order to avoid a FOFC violation, while there is no such requirement on head-final complements such as the AcI complements. In this way, FOFC played a further role in the general word-order change, in that it created surface head-complement orders with clausal complements, making possible a reanalysis of the extraposed clause as occupying its first-merged postverbal position and thereby eliminating this case of roll-up. The reanalysis is shown in (): ()

[[VP (CP) [V’ V (CP)]] . . . CP] > [V’ V CP]

In the older structure, CP first rolls up around V and then extraposes to the right of VP (for simplicity of exposition, here I ignore V-movement and assume that CP rolls up to SpecVP; see Biberauer, Holmberg, and Roberts  on the latter point). The reanalysis eliminates both of these movements. If roll-up is eliminated in VP, then FOFC requires it to be eliminated throughout the Extended Projection of V, and so a class of systematically head-initial clauses results from this reanalysis, strongly favouring the general word-order change. Ledgeway also points out that head-initial PPs tend to appear postnominally, while bare case-marked nominals (by hypothesis, containing a null postposition), appear both pre- and postnominally. So PPs behave rather similarly to CPs. This behaviour can also be explained by FOFC and, just as with postverbal head-initial CPs, leads to a further case of surface head-initial order prone to a reanalysis eliminating roll-up. There may be a further connection to the mesoparameter in (), repeated here: ()

a. [XP [ . . . Goal . . . ] [Probe Ø]] b. [XP [Probe X 6¼ Ø] [ . . . Goal . . . ]]

52 These are St Cyprian (third century), Lucifer of Cagliari (died ), the fourthcentury Peregrinatio Aetheriae and Salvian of Marseille (–).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

WORD-ORDER CHANGE

In (a), the category containing the Goal has been rolled up (presumably into the Specifier of the Probe), while in (b) roll-up of this category has not taken place. So () connects roll-up to the lack of overt exponence of the Probe combined with inflectional marking on the Goal. In this way, dependent marking and head-final orders are connected, as has often been observed in the typological literature (Nichols : f., : f., tables –; Song : ). Here we may have a glimpse of a macroparameter, although, as we noted in §., the question of why roll-up triggers are null while heads which fail to trigger roll-up tend to be overt remains open. Note, though, that the latter tendency provides a morphological trigger for the absence of rollup: this could be the p-expression of the non-rollup option. A null head would then be a (weakly) ambiguous p-expression of roll-up. Clearly these questions need to be further pursued, as they have many implications for typology, acquisition, and diachronic change. So we see that FOFC plays a major role in the head-final to headinitial word-order change seen in the history of English and of Latin/ Romance. FOFC, especially when construed as a constraint on structures derived from cyclic roll-up, is of central importance for understanding word-order change. 2.5.5. Conclusion In this section, we have discussed three different approaches to wordorder change: the typological approach advocated by Lehmann and Vennemann, the approach postulating variation in underlying headcomplement order of van Kemenade and Lightfoot, and the ‘antisymmetric’ approach based on Kayne (). In connection with the antisymmetric approach, we elaborated on the relevance of FOFC, developing some of the points made in §... We have also mentioned the ‘grammars in competition’ idea, influentially advocated by Pintzuk (, ) and Kroch and Taylor (b). Furthermore, we have seen that the data are more complex than the simple statement that English changed from OV to VO might seem to imply, while this statement nevertheless contains an important kernel of truth (and there is no reason to doubt that the same applies to Latin/Romance and other languages which have undergone this change, some of them mentioned in §..). In particular, OV order and its implicational 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

correlate VAux appear to have been incrementally lost in English, beginning in OE, with the final disappearance of OV only taking place at the end of the fifteenth century. If language change is driven by language acquisition, this cannot be a single parameter change; instead we must view OV order as arising from the interaction of several parameters, which tend to act harmonically. It seems that the antisymmetric approach lends itself particularly well to this view of things, although it is not without problems. From the foregoing discussion, we see that word-order change in English and elsewhere is somewhat more complex than previously thought, in that it involves several separate but related parameter changes (see Fischer et al. : – for a clear statement of what the various stages may have been in the history of English). On this view, there is no simple OV/VO parameter, as different types of OV order are derived by different operations at different periods. This conclusion considerably refines our notion of ‘word-order type’ both synchronically and diachronically, and entails a Hawkins-esque notion of cross-categorial harmony, formulated in terms of the association of an EPP feature with different categories (which may reduce to Input Generalization, one of the third factors in language design; see §.). An important conceptual advantage of word-order typology is thus retained, while the descriptive inadequacies of the ‘traditional’ OV/VO opposition are replaced by a more fine-grained analysis.

2.6. Conclusion The goal of this chapter was to illustrate the power and the utility of the notion of parametric change by showing how most of the principal kinds of syntactic change which have been discussed in the literature can be reduced to this mechanism. I have tried to show that reanalysis, grammaticalization, changes in argument structure, complementation, and word order can all be understood in these terms. From here on, I will take it that this is the case; although there is much in the preceding two chapters that is open to debate, I maintain that they together constitute support for the thesis that the key notion for an understanding of diachronic syntax is parameter change. Furthermore, I follow Lightfoot (, , ) in taking parameter change to be driven by language acquisition. What we have not seen in 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

detail up to now, though, is how parameter change is connected to parameter-setting in language acquisition. One of the principal goals of the next chapter is to do just this, which will entail a clearer characterization of parameters as a formal aspect of the theory of grammar.

Further reading Reanalysis, abduction, and learnability Andersen () is the classic exposition of the concept of abductive reanalysis. The empirical focus of the article is not syntax but phonology: sound changes in various Czech dialects. The conceptual importance of abductive reanalysis for our general understanding of change is however made very clear. Timberlake () is a classic study of syntactic reanalysis, in which it is proposed that the effects of reanalysis may not manifest themselves in surface changes immediately. Longobardi () deals with the development of the French preposition chez from the Latin noun casa. The Inertia Principle plays a major role in the analysis, and its nature and implications are discussed in some detail. We will look at the Inertia Principle in §.. Bertolo () is a collection of articles all dealing with aspects of learnability in relation to principles-and-parameters theory. Kroch () is an excellent survey of the issues and results in generative diachronic syntax. Yang (a, b, ) propose different models of acquisition whose implications for language change are explicitly discussed. Fodor and Sakas () is an excellent overview of how the theory of learnability has developed over the history of generative grammar, with a particular focus on the advantages and disadvantages of different versions of principles-andparameters theory. V-to-T movement and the development of English auxiliaries Warner () is a thorough and highly critical review of Lightfoot (), calling into question many of the empirical claims made there concerning the historical development of English modal auxiliaries. Warner () is a monograph on the development of the auxiliary system in general, with the analysis stated in terms of Head-Driven Phrase Structure Grammar (HPSG). Denison () defends the idea 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

that auxiliary do developed from an earlier raising/control verb, which had a bare-infinitive complement, an idea developed a little further in Roberts (a). Roberts () reconsiders the reanalysis of the English modals as auxiliaries, first dealt with in Lightfoot (). This is also the first paper to clearly recognize that English has lost V-to-T movement and to attempt to relate this to impoverishment of verbal agreement inflection. Lightfoot () takes up these points, following on from the discussion in Lightfoot (). Vikner () gives a clear and systematic statement of the correlation between V-to-T movement and agreement inflection, which he restricts to VO languages. Alexiadou and Fanselow () argue that this correlation is not an aspect of grammar, but rather a contingent fact about diachrony. Anderson () is another critique of the proposals in Vikner (); again, the thrust of the argument is that conditions directly relating agreement inflection to movement are somewhat implausible. Bobaljik () also criticizes Vikner’s proposals, mainly on empirical grounds. Thráinsson () looks at ongoing change in Faroese regarding V-to-T movement and agreement morphology, arguing that these developments pose problems for the proposed correlation between V-to-T movement and rich agreement; this question is taken up in more detail in Heycock, Sorace, and Svabo Hansen (), Heycock, Sorace, Hansen, Wilson, and Vikner (, ) and Heycock and Wallenberg (). Bobaljik and Thráinsson () propose a parameterized version of Pollock’s () split-Infl idea: some languages combine T and Agr features on a single head, others split them into two projections. The cue for the difference is, again, the richness of verbal agreement morphology. Koeneman and Zeijlstra (, ) propose a biconditional version of the correlation between verb-movement and agreement, attempting to show that the earlier criticisms can be overcome; Tvica () is a typological survey whose results support their proposals. Heycock and Sundquist () present counterevidence to Koeneman and Zeijlstra based on the history of Danish. Schifano () is a thorough cartographic survey of verb-movement into the tense-mood-aspect field across a wide range of Romance varieties whose principal results are that there are at least four distinct landing sites for verb-movement in Romance and that verbmovement is correlated with the absence of tense-mood-aspect inflection rather than with the presence of agreement inflection. The question of how these results may carry over to Germanic is unclear, however (see Roberts a: f.). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

The effects of the loss of dative case and the development of psychological predicates in English Allen () is a thorough and interesting discussion of the development of recipient passives and constructions involving psych verbs. It is arguably the most in-depth study of the syntactic effects of the loss of the morphological marking of inherent Case in English to date. Bejar () looks at Allen’s analysis of the changes in psych verbs from a minimalist perspective. Eythórssen and Barðdal () survey the behaviour of ‘quirky subjects’ in a range of Germanic languages, and come to conclusions only slightly different from Allen’s. Lightfoot () was one of the earliest analyses of the effects of the loss of morphological dative case on English syntax. Fischer and van der Leek () is in part a response to this, going into much greater empirical detail regarding the development of ME psych verbs. Van der Gaaf () is the principal traditional study of the development of psych verbs in the history of English. Belletti and Rizzi () is an influential analysis of psych verbs, mostly in Italian, in terms of governmentbinding theory. Pesetsky () includes a very detailed study of NE psych verbs, breaking them up into a number of thematically-defined subclasses, and arguing against some of Belletti and Rizzi’s conclusions. Landau () explicitly connects locative and experiencer arguments in his analysis of psych-predicates, and advocates a theory closer to Belletti and Rizzi’s. Belletti and Rizzi () sketch a new analysis of some kinds of psych-predicates based on Collins’ () notion of a ‘smuggling derivation’ (see below). Baker, Johnson, and Roberts () is an influential analysis of passives using government-binding theory. Collins () is the most thorough and interesting alternative account of passives, which develops the idea of ‘smuggling’ derivations, a particular manifestation of restrictions on the locality of movement. Two rather different recent analyses of passives are den Dikken () and Akkus, Legate, Ringe, and Šereikaitė (). Grammaticalization Bybee, Perkins, and Pagliuca () is a major typologically-based survey of grammaticalization phenomena. Heine et al. (), Heine, Claudi, and Hünnemeyer (), Heine and Kuteva (), Kuteva, 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

Heine, Hong, Long, Narrog, and Rhee (), Traugott and Heine (), and Heine and Reh () are all important collections of materials on grammaticalization, again looked at from a functionaltypological perspective, along with the handbook edited by Narrog and Heine (). Lehmann (, ) provide useful overviews of the phenomena. Haspelmath () presents an account of the development of infinitival markers from purposive conjunctions, relying on the functional-typological notion of grammaticalization. Batllori et al. () is a recent collection of articles adopting a formal, mostly minimalist, approach to different kinds of grammaticalization phenomena. Meillet () is a general paper on grammatical change. It is noteworthy for the first recorded use of the term ‘grammaticalization’ and for the claim that this, along with analogy, are the only mechanisms of grammatical change. Bopp () was one of the first major treatises on comparative Indo-European grammar. Some of the ideas put forward prefigure more recent ideas about grammaticalization. Hopper and Traugott () is the main textbook on grammaticalization; again the focus is largely functional-typological. Simpson and Wu () is an interesting formal account of grammaticalization in various East Asian languages. Wu () is a formal treatment of a number of cases of grammaticalization in the history of Chinese. Roberts and Roussou () is an early version of the later monograph on grammaticalization, in which the formal, minimalist-based approach is proposed (Roberts and Roussou ; see the Further reading section in Chapter ). Van Gelderen (, , , ) are a series of important works on formal aspects of grammaticalization, particularly on grammaticalization cycles; van Gelderen () is a useful overview. Willis, Lucas, and Breitbarth (), Breitbarth, Lucas, and Willis () are definitive studies of negation, quantification, and indefiniteness cycles; the first volume features ten case studies in the languages of Europe and the Mediterranean, while the second provides a general, detailed cross-linguistic account of the cycles, taking into account semantic and pragmatic issues as well as language contact. Word-order change in English Foster and van der Wurff () study OV vs. VO orders in Late ME, with some very revealing quantitative results. Moerenhut and van der 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

Wurff () is another partly quantitative analysis of the incidence of OV orders of various kinds in ME. Van der Wurff (, ) and van der Wurff and Foster () further investigate details of ME word order, supporting an antisymmetric approach and arguing that object-movement in Late ME was restricted to certain types of object. Ingham (, ) look at the nature of the preverbal object in Late ME OV orders, showing clearly the preference for negative or quantified objects in this order. Kroch and Taylor (b) argue that quantified objects move to a designated position throughout the ME period. Pintzuk and Kroch () is an important early study of OE word order, in which the authors demonstrate that postverbal objects in subordinate clauses in Beowulf are always preceded by a metrical pause. They argue that this is consistent with the idea that such objects are extraposed. Roberts () proposed an ‘antisymmetric’ VO analysis of OE word order, suggesting that this has the advantage of allowing us to see the change from surface OV to VO as the loss of leftward object-movement rather than as a reanalysis of the base. Biberauer () proposes an account of synchronic variation in Modern Spoken Afrikaans which makes use of roll-up and pied-piping options as sketched in §... Biberauer and Roberts (, ) apply these ideas to word-order variation and change in the history of English. Stockwell () and Stockwell and Minkova () are further studies of OE word order and ME word-order change, the former being one of the earliest generative analyses of OE. Canale () is an early study of word-order change in ME. Hiltunen () is another important early study. Pintzuk (, , , ) all develop the idea that word-order change resulted from competition between head-initial and head-final grammars, while Pintzuk and Taylor () is a useful overview. Taylor and Pintzuk (a,b, ) propose that information structure is relevant to understanding the development of VO orders in English; this idea is taken up in different ways in Elenbass and van Kemenade () and Struik and van Kemenade (). Other work on the history and varieties of English Denison () is a comprehensive review of nearly all the major work on the historical syntax of English available at the time, very coherently organized into topical sections and with extremely useful commentary 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

and bibliography. Henry () is a detailed study of a number of syntactic peculiarities of the English of Belfast. Some of them, including the Northern Pronoun Rule, are shared by other regional varieties of English deriving from Northumbrian OE. Jones () is a comprehensive survey of Scots English, with much useful historical material. Los () looks at the rise of the to-infinitive in ME, arguing that it did not derive directly from an OE purposive, but had an earlier origin, with its distribution being enlarged during ME as it replaced subjunctive that-clauses in a number of contexts. Jespersen (–) is a classic survey of the historical grammar of English. It remains the most comprehensive work of its kind. Visser (–) is a very large compendium of syntactic constructions from all periods of English; before the advent of electronic corpora, this was an invaluable tool, and it remains useful today. Hogg (–) is the definitive survey of the history of the language, while Hogg and Denison () is a briefer general history. Van Kemenade and Los () and Nevalainen and Traugott () are important and extremely useful handbooks. Huddleston and Pullum () is the definitive synchronic descriptive grammar of contemporary English. Germanic syntax Hinterhölzl () deals with verb-raising and verb-projection raising in dialects of German. This was one of the first studies in which a rollup analysis was proposed in order to account for this kind of phenomenon. Koopman and Szabolcsi () develop and extend the roll-up approach to a range of constructions including those involving preverbs in Hungarian. Koster () invokes roll-up in the analysis of verb-raising in Dutch. Wurmbrand (, ) provide analyses and overviews of verb-raising and related restructuring phenomena in German, in which it is argued that these ‘clause-union’ phenomena involve reduced complements of various kinds. Kiparsky () presents an intriguing reconstruction of clausal subordination in IndoEuropean, and a proposal for the development of finite complementizers in Germanic which relates this to the rise of V. Koster () is a classic article in which it is shown that the most economical analysis of Dutch V clauses involves generating the verb in final position and raising it to C. Sigurðsson () is an important study of case marking and non-finite clauses in Icelandic. The major result is the evidence that 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

PRO can bear dative case, contrary to a central tenet of governmentbinding theory. Jonas () looks at various aspects of the syntax of Norn, the North Germanic language spoken on the Shetland Islands until the eighteenth century. Thráinsson () is a thorough overview of the main issues and results in Icelandic syntax; Haider () provides a comparable treatment of German syntax, and Zwart () of Dutch syntax. Latin and Romance syntax Bolkestein () provides detailed arguments that the Latin AcI construction is not the same as English-style Exceptional Case Marking. Cecchetto and Oniga () develop Bolkestein’s analysis further, adding more data and greater sophistication. Sihler () is an important and very thorough historical grammar of Latin and Greek, clearly demonstrating the relations between these languages and Proto Indo-European. Vincent () is a very useful survey of the structure of Latin, with a thorough discussion of how Latin clausal complementation differs from that of the Modern Romance languages; Vincent () is an analysis of the changes from Latin to Romance couched in terms of the Lexical-Functional Grammar (LFG) framework. Woodcock () is a traditional grammar of Latin. Perlmutter () was the first to observe the systematic differences in behaviour between unaccusative and unergative intransitives in Italian. He provided a detailed analysis of these and related constructions in terms of relational grammar. Burzio () was the first in-depth study of unaccusatives and related constructions in government-binding theory. It also deals with a very wide range of constructions from Italian and its dialects. Levin and Rappaport-Hovav () is the most thorough study of unaccusativity in English to date, featuring very insightful analyses of a number of subsystems of the English lexicon. Baker (b) develops a theoretically up-to-date analysis of verb classes in English (as well as Basque, Georgian, and Tibetan; see also Baker a). Guasti () presents an analysis of the complements of causative and perception verbs in French and Italian in terms of a late version of government-binding theory, while Sheehan (, forthcoming) develops and updates these. Calabrese () presents an analysis of control and raising in Salentino, a Southern Italian dialect almost entirely lacking in infinitives. Ledgeway () similarly studies 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

TYPES OF SYNTACTIC CHANGE

these and related phenomena in the dialects of Southern Calabria and North-East Sicily, which also almost entirely lack infinitives. Ledgeway () is the most detailed study of the syntax of Southern Italian dialects to date, while Ledgeway () is a detailed description of Medieval Neapolitan. Rohlfs () is the classic traditional description of Italian dialects. Jones () is probably the most comprehensive survey of Sardinian syntax in terms of generative grammar to date. Kayne () is a classic treatment of ‘complex inversion’ in French. Rizzi and Roberts () reanalyse French complex inversion, exploiting the VP-internal subject hypothesis in order to account for the presence of two realizations of the subject in this construction. Roberts (b) is an analysis of clitics and inversion in Franco-Provençal Valdôtain, a variety of Franco-Provençal spoken in the Val d’Aoste in North-Western Italy. Kayne () is a collection of articles on universals, Romance syntax, and English syntax, with an introduction in which, among other things, Kayne argues that the number of distinct grammatical systems currently extant is probably greater than the human population. Wheeler () is a thorough survey of the history and structure of Occitan. Salvi (), Devine and Stephens (), Ledgeway (), and Dankaert (, ) are all detailed treatments of Latin word order. Clackson and Horrocks () is a general overview of the history of Latin from Proto Indo-European to Romance; Clackson () is a general handbook on all aspects of Latin. Maiden, Smith, and Ledgeway (, ) is an overview of all aspects of the history of the Romance languages, while Ledgeway and Maiden () is a monumental, largely synchronic, overview of all the Romance languages. The structure of DPs Bernstein (, , ) present a general analysis of the structure of nominals across languages using the DP hypothesis and N-to-D movement. Longobardi (a) is another major study of nominal syntax in terms of the DP hypothesis, in which N-to-D movement plays a central role. Longobardi () develops the earlier work, developing a general theory of the syntax-semantics mapping in nominals, in part developing and challenging the proposals in Chierchia (). Crisma and Gianollo () and Crisma () are important works on the diachrony of DPs. Crisma and Longobardi () is an excellent 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

overview of much comparative work on DPs. Ritter () also looks at nominals in terms of the DP hypothesis and N-to-D movement. Zamparelli () is another treatment of the internal structure of nominals, arguing in particular, and partly on semantic grounds, for a distinct functional projection from D to house certain types of quantifier. Alexiadou, Haegeman, and Stavrou () is a general overview of many aspects of DP syntax.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

3 Acquisition, learnability, and syntactic change

Introduction 3.1. Parameter-setting in first-language acquisition: a scenario for parameter change? 3.2. The logical problem of language change 3.3. The changing trigger 3.4. Markedness and complexity 3.5. Parameter-setting, parameter hierarchies, and syntactic change 3.6. Conclusion Further reading

       

Introduction Having by now established that parameter change is the central analytical notion in accounting for syntactic change, here we look at the deeper questions this conclusion raises. The purpose of this chapter is to explore the idea, introduced in §., that parameter change is driven by the first-language acquisition process (this idea is pursued in Lightfoot , , ; see Croft : –,  for critical discussion), and thereby to illustrate how the study of syntactic change may be relevant for our understanding of the processes involved in first-language acquisition. One way of construing the idea that parameter change is driven by the first-language acquisition process is to think that a parameter value changes because an innovative alternative is more ‘accessible’ to acquirers, thereby rendering the conservative value in effect ‘inaccessible’, or unlearnable. This view



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

INTRODUCTION

has two important consequences. First, to the extent that abductive reanalysis of the sort discussed in §. is a symptom of an underlying parameter change, it can explain the pervasiveness of this kind of reanalysis in syntactic change. Second, it entails that language learnability is intimately connected to change—in fact learnability becomes the key to understanding syntactic change. This further tightens the connection between L acquisition of syntax and syntactic change. In fact, Niyogi () points out that ‘every theory of language acquisition also makes predictions about the nature of language change’. We begin by looking in §. at the current state of knowledge regarding first-language acquisition of syntax, basing our presentation fairly closely on the discussion in Guasti (). In §. we consider what we call, following Clark and Roberts (), ‘the logical problem of language change’; we will see that this is closely related to, and maybe subsumes, the Regress Problem discussed in §.. In this context, we will develop more fully the role of the third factors, Feature Economy (FE) and Input Generalization (IG). These third factors play a major role in determining the nature of parameters and therefore of parameter change. In §. we try to get a picture of what kinds of external circumstances could cause a parameter change; this relates closely to the discussion of the logical problem of language change. Under this heading, we discuss in a preliminary way the possible role of language contact (although this will be the focus of Chapter ), as well as the notion of cue introduced by Dresher () and discussed at length in relation to syntactic change in Lightfoot (). We also discuss the role of morphological change in triggering syntactic change. §. introduces the notion of markedness, characterizing it directly in terms of third factors. Finally, in §., we bring the strands of the discussion together by presenting parameter hierarchies in detail and looking at how systems may change their position in a hierarchy through parameter change. We also consider diachronic (in)stability; here again the distinctions among the types of parameters introduced in §.., macroparameters, mesoparameters, microparameters, and nanoparameters, will be important. This concludes the general discussion of parametric change as the mechanism of syntactic change, the remaining chapters of the book being concerned with the wider implications of this view. But let us now begin at the beginning, i.e. with firstlanguage acquisition. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

3.1. Parameter-setting in first-language acquisition: a scenario for language change? 3.1.1. Introduction In this section, I will summarize some aspects of the burgeoning literature on first-language (L) acquisition of syntax. My focus will be on the major empirical observations and their implications for the thesis that parametric change is driven by the acquisition process. These are, first, that many important parameters appear to be set rather early in the acquisition process (see () below), and, second, that there are two phenomena of interest in children’s early production: the so-called root or optional infinitives (see Radford , ; Weissenborn ; Crisma ; Platzack ; Pierce ; Wexler , , ; Behrens ; Kramer ; Poeppel and Wexler ; Guasti /; Rizzi /, ; Haegeman b; Phillips ; Torrens ; Hyams ; Schütze ; Hamann and Plunkett ; Hoekstra and Hyams ; Wijnen ; Rasetti ; Hyams , ; see also Guasti : ff. and the references given there) and early null subjects (see inter alia Hyams , ; O’Grady, Peters, and Masterson ; Bloom ; Valian ; Gerken ; Ingham ; Wang et al. ; Weissenborn ; Hyams and Wexler ; Rizzi , ; Bromberg and Wexler ; Clahsen, Kursawe, and Penke ; Haegeman a, ; Hamman, Rizzi, and Frauenfelder ; Guasti , ; Lee ; Roeper and Rohrbacher ; the papers in Friedemann and Rizzi ; as well as the references given in Guasti : .). The purpose of our discussion is to indicate to what extent our understanding of the parameter-setting process has been furthered by this work, and to see if an empirical connection with a parameter-changing approach to syntactic change can be discerned. Before looking at the phenomena which have been observed, however, we need to be clear about our general conception of first-language acquisition. In the introduction to Chapter , I presented and tried to justify Chomsky’s claim that the human language faculty is a facet of human cognition, physically instantiated in the brain and, most importantly, genetically inherited as an aspect of the human genome.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

Under this conception of the language faculty, first-language acquisition can be characterized in the following terms:1 The language faculty is a distinct system of the mind/brain, with an initial state S common to the species . . . and apparently unique to it in certain respects [footnote omitted]. Given appropriate experience, this faculty passes from the state S to some relatively stable state SS, which then undergoes only peripheral modification. (Chomsky : )

The stable state is adult competence in a given language, which remains unaltered in essential respects from childhood on. Here we can take the initial state to be not just Universal Grammar (UG) as the domainspecific first factor in language design (see note ) but rather what Rizzi (: , note ) calls ‘UG in the broad sense’, which includes the innate but non-domain-specific third factors. The process of firstlanguage acquisition is the process by which the language faculty ‘passes from the state S to some relatively stable state SS’ in Chomsky’s formulation. We thus have the following schematic notions: () a. (‘Broad’) UG = S (initial state); b. Adult competence = SS (stable state); c. Stages of acquisition = < S, . . . , Si>, . . . , Sn>i, Sn+. Here, (c) indicates the stages of first-language acquisition. These can be thought of as an ordered n-tuple of states of indeterminate, but certainly finite, number, occurring later than S and earlier than SS. They correspond neither to the initial state nor to the adult competence, but rather to what we can think of as an immature competence. How are the various states defined in relation to one another? To put it crudely, what does Sn+ have that Sn lacks in (c)? As we have seen, the ‘innateness hypothesis’ claims that S is determined wholly by the As we mentioned in note  of Chapter , modularity—the idea that the language faculty is a domain-specific innately endowed system of the mind/brain—is not essential to the minimalist conception of the language faculty. But the crucial point for the purpose of the discussion of first-language acquisition here is that even if the language faculty (in either the broad or the narrow sense as defined by Hauser, Chomsky, and Fitch ) is in some sense ‘emergent’ and may lack a single neurophysiological locus in brain architecture and a single phylogenetic source (see Berwick and Chomsky ,  on the latter), we can nevertheless meaningfully distinguish the initial state of the system (or of the relevant subparts) in the newborn child from the modified state which is the stable, adult state. This process appears to be subject to a critical, or sensitive, period; see Guasti (: ‒) and §. for discussion. The notion of a critical/sensitive period for language acquisition is important for Niyogi’s () account of language change and diversification; see §... 1



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

genome, independently of any experience. This, as we have seen (see §.) is at least the first factor in language design in the sense of Chomsky (), possibly, following Rizzi (), also the third factors. The second and third factors play a role in the transitions between the stages of acquisition in (c). SS differs from S in that it is at least partly determined by experience of the linguistic environment: the primary linguistic data, or PLD, the second factor in language design. The PLD, among other things such as providing the vocabulary of the first language, clearly plays a central role in causing parameters to be set to determinate values. As we shall see in more detail below, the third factors (including at least Feature Economy, Input Generalization, and the Tolerance Principle; we will also discuss the Subset Principle in §..) play an important role and to a significant extent determine the form of the parameters and their interactions. We can therefore assume that the various intermediate states are distinguished by having differing values of various parameters. Each Sn differs from Sn+ in one of two ways: either insufficient experience has been accumulated at Sn for setting certain parameters, or the overall system has not matured sufficiently at Sn to permit certain parameters or parameter values to be attained. If parameters emerge from the interaction of the three factors (with ‘narrow UG’ remaining constant in simply underspecifying formal-feature values), then different parameters come ‘on line’ for acquisition at different times, with the system gradually developing with respect to the kinds of PLD it is able to accommodate, as well as to interactions among parameters, in accordance with the third factors (see §. and §.). In any case, we can consider that the various intermediate states differ from one another in representing successively closer approximations to the adult system (SS) in terms of the values of the parameters. To put it rather simplistically: if m parameters are set to the adult value at stage Sn then at least m +  parameters are set to the adult value at stage Sn+. Beginning with the pioneering study in Hyams (), the nature of the intermediate grammars has been studied fairly intensively. Guasti (: , , ) points out that between the ages of  and  years old, i.e. some time before linguistic maturity if this is characterized as the stable state, children know at least the following about the parameter values of the language they are in the process of acquiring (relevant references are given in Guasti : ‒, , ‒): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

() a. the value of the head direction parameter in their native language; b. the value of the V-to-T parameter in their native language; c. the value of the V parameter in their native language; d. the value of the topic drop and null-subject parameters; e. the value of the parameters governing question-formation, overt movement or in-situ placement of the wh-element and application or non-application of T-to-C movement. To this we can add that Hamann, Rizzi, and Frauenfelder () show that as soon as a French-acquiring child produces clitics, they are placed in the correct clitic position for French, even though there is much parametric variation in clitic-placement across languages. Wexler (: ) describes this general phenomenon as follows: ‘[b]asic parameters are set correctly at the earliest observable states, that is, from the time that the child enters the two-word stage around  months of age’. He continues: ‘[q]uite possibly . . . children have set basic parameters . . . before the entry into the two-word stage’. This observation has become known as Very Early Parameter-Setting. Guasti (: ‒) also provides evidence that, still between the ages of  and , children know the properties of unaccusative verbs (see §..). Furthermore, by the age of , children comprehend and produce passives based on actional verbs, although they have difficulty with passives of non-actional verbs (Guasti : ‒); by around  to  years old, they have acquired most, but not all, of the principles governing the distribution and interpretation of anaphoric and other pronouns (the binding theory: see the textbooks mentioned in the Introduction for discussion of this aspect of syntactic theory) (: ‒): by the age of , they have also acquired the principles concerning the distinction between referential and quantified expressions (for example, John vs. every boy) and many aspects of the interpretation of quantified expressions (: ). Most of the parameters listed in () are familiar from Chapter . (a) refers to word-order parameters, of which we identified several subtypes in the discussion in §.. (parameters F–F). The L-acquisition literature has shown that children are sensitive to these parameters and that ‘from the onset of multiword utterances (or even earlier)’ (Guasti : ) they have correctly identified the relevant values for the ambient language; see Guasti (: ‒) for discussion of 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

how word-order parameters may be acquired. (b) clearly refers to parameters B‒B of §..., most likely the mesoparameter B; see Guasti (: ‒) on evidence for English-acquiring children’s knowledge of lack of V-to-T movement in English. (c) refers to parameter C, the basic verb-second mesoparameter, of §.... (d) refers to parameters A‒A of §.., although the notion of ‘topic drop’ was not discussed there; I will return to this in the discussion of ‘early null subjects’ in L acquisition below. (e) refers to parameter E of §... It seems then that the early-set parameters in () are all mesoparameters. All of the parameters listed in () are important and salient for synchronic description, as we saw in Chapter , and Guasti’s summary of the L-acquisition evidence shows that they are salient for language acquirers; these parameters are set early and correctly, or so it appears. As it stands this observation supports the poverty-of-the-stimulus argument, as Guasti implies (: ); children are able to glean the values of these parameters from the PLD almost before they are able to produce multiword utterances. This strongly suggests a predisposition to the task, given the very young age of the children (multiword utterances normally begin between  and  months old; see Guasti : ), the complexity of the PLD, and the rather abstract nature of these parameters. However, as we saw in Chapter , several of these parameters can be shown to have changed their values in the recorded history of various languages. This brings us face to face with what Clark and Roberts (: –) termed the logical problem of language change, which I will discuss in more detail in the next section. For the moment, it suffices to note both the tension between Guasti’s conclusions and what we saw in Chapter , and that the obvious resolution of this tension must involve some intergenerational change in the nature of the PLD; some of the questions raised by this conclusion were discussed in §., and will be further discussed later in this chapter. Returning to the discussion of the intermediate stages of acquisition as defined in (c), it may seem that we have little evidence regarding the setting of the mesoparameters in () beyond the fact that they are typically set early and accurately. Far from shedding light on a parameter-resetting approach to syntactic change, this appears to pose a problem, as just noted. Of course, it is entirely possible that parameter-resetting can take place between  and  months, but this will be very difficult to document on the basis of children’s 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

utterances prior to the two-word stage (which normally starts at around  months); it is possible that some experimental methodology could be developed in order to ascertain this, but I am aware of none that has been developed at present. (Wexler : , note  makes the same point.) But the grammar of the earliest stages of multiword production is still an immature one in the sense that it is subject to modification through further stages of acquisition. Guasti’s conclusions as listed in and immediately below () appear to show that there is an intermediate, fairly early, state of the language faculty Si> at which a number of important parameters have been set, but that there are nevertheless further stages of acquisition, and presumably therefore of parameter-setting, Sn>i, Sn+ negation order in a non-V environment, show that that Old Norse had V-to-T movement: ()

a. ef hann var eigi þinn broðir [ON] if he was not your brother ‘if he was not your brother’ b. ok spurðu hvárt þeir hefði ekki þar um riðit and asked whether they had not there  ridden ‘and asked whether they had not ridden there’ (Faarlund b: )

This clearly indicates that the positive value of the V-to-T parameter is the conservative one. So Icelandic and Faroese form the required minimal pair as regards the V-to-T parameter. Concerning (b), there is evidence that Icelandic children go through a root-infinitive phase. Sigurjónsdóttir (: ) reports examples such as the following: ()

a. Kisa finna dúkkinni. Cat find. the-doll. ‘The cat finds the doll.’ b. Ég fá húbahúba hjá ömmu. I get. hubahuba at grandma’s.

[Icelandic]

Jonas (: ‒) gives evidence for root infinitives in child Faroese, illustrated in (): ()

a. Radio danda har radio stand-inf. here b. Maria ikki klippa Maria- not cut- (it) ‘Maria didn’t cut (it).’



[Faroese]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

In (a), we see a simple root infinitive in a main clause; in (b) the nonfinite verb follows the negator ikki. Jonas (: ) also gives examples showing finite verb preceding ikki.6 Concerning (c), Sigurjónsdóttir (: ) shows that the incidence of root infinitives declines from % to % of clauses between age ; and age ; in Icelandic. Sigurjónsdóttir () shows that there are two stages, a ‘pure’ root-infinitive stage from ; to ;, and an optional root-infinitive stage from ; to nearly ; (see her table , p. ). Furthermore, root infinitives overwhelmingly follow the negator ekki, which finite verbs always precede in the adult language (Sigurjónsdóttir : , : , table ). Moreover, raised finite verbs show agreement while root infinitives do not. Between ; and ; finite verbs effectively replace root infinitives and consistently raise across ekki (Sigurjónsdóttir : , ); this indicates that the loss of root infinitives is connected to the acquisition of finite, agreeing verb-forms. Sigurjónsdóttir does not comment directly on agreement, but she does point out that her results show a relationship between the child’s ‘realization that the infinitive form in Icelandic has a stem and a suffix and [the] acquisition of finiteness’ (Sigurjónsdóttir : ). So we see evidence that acquisition of the agreement-finiteness inflectional system correlates with the disappearance of root infinitives. Again, this can be seen as the child determining the correct setting of the [Tense] parameter in Legate and Yang’s () terms. Finally, concerning (d), we saw in §. that Thráinsson () and Heycock et al. () conclude that morphological reduction of agreement inflections has played a causal role in the loss of V-to-T in Faroese. We conclude that the setting of the V-to-T parameter in the 6

Heycock et al. () also show that children acquiring Faroese produce and accept examples with V-to-T movement. They observe that ‘[g]iven the very scant data available in an SVO language to determine whether V-to-T is possible in the absence of V, it might have been surmised that the diachronic change could initially have been triggered by children failing to get enough evidence for V-to-T, and that it might subsequently have been driven by children underestimating its frequency in the target grammar. These findings suggest that this is an unlikely scenario, since the acquisition bias appears to be in the other direction (the children initially hypothesize more V-to-T than is warranted by the input)’ (Heycock et al. : ). Heycock and Wallenberg (: ) cite Waldmann () for examples of finite verb>negation order in non-V contexts in child Swedish. The behaviour of Faroese and Swedish children is clearly very different from that of children acquiring (Modern) English, as observed above. The reasons for this are unclear, but it seems likely that it is connected to the nature of embedded V in these languages, something clearly absent in English.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

acquisition of Icelandic and its loss in the recent history of Faroese comes very close to instantiating the ideal scenario for establishing the connection between language acquisition and language change. Since Faroese-acquiring children largely fail to acquire V-to-T movement (but see note  for an important proviso) but do set the [Tense] parameter correctly, the likely supposition is that successful acquisition of the Faroese V system is what is crucial; here the relevant evidence from studies of the acquisition of Faroese is unfortunately lacking at present, but the prediction is clear. We see that it may be possible to link root infinitives to the loss of V-to-T movement in both English and Faroese. What may have happened at the crucial historical stage in both languages is that the morphological trigger for the positive value of the V-to-T parameter was too meagrely expressed. It cannot be the case that the relevant generation of acquirers simply became ‘stuck’ in the root-infinitive stage of acquisition, since both English and Faroese show verbal inflection for tense and agreement, albeit in a limited way compared to languages with V-to-T movement. The difference is that, in non-V contexts in Faroese and where no auxiliary is present in English, tense/ agreement marking is associated with the finite verb by Agree rather than by movement (see §..); this particular manifestation of Agree in English has been known as Affix Hopping since Chomsky (). Let us therefore modify Legate and Yang’s proposal, while keeping their central idea that root infinitives are a reflection of ongoing variational parameter-setting, along the following lines: the morphological evidence of past-tense marking in both English and Faroese is sufficient to set the [Tense] parameter to the positive value, but the nature of the agreement marking is insufficient to set the V-to-T parameter to its positive value (see the discussion in §..). English and Faroese differ in their variants of the [+Tense] parameter: in the case of Faroese, we can take this to be V, especially the ‘liberal embedded V’ found in that language (which, as we saw in §.., also may account for speakers with the innovative value of the parameter being able to accommodate to strings generated by the conservative grammar, e.g. verb>negation order in embedded clauses). As we have seen, English had already lost V before it lost V-to-T. In English, the operative variant of [+Tense] must be expressed by the auxiliary system, particularly do-support, another property known to be incompatible with root infinitives. Correctly acquiring embedded V in Faroese and do-support in English 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

then causes the root-infinitive stage to be passed (in English this takes longer than in many other languages, which we can attribute to the highly marked nature of do-support). This step essentially involves acquiring Affix Hopping: the morphological operations associating the content of Tense with the unmoved verb (we could elaborate this to say that Faroese has Affix Hopping from C to V in non-V embedded clauses, while English has Affix Hopping from T to V). The rootinfinitive stage reflects a grammar which lacks both V-to-T movement and Affix Hopping, hence the verb appears in the morphologically unmarked form; clearly the postulation of a silent modal is compatible with this stage too, as argued by Guasti and Rizzi () for English and Sigurjónsdóttir () for Icelandic. This account is consistent with the idea that the [Tense] option is a macroparameter, V-to-T is a mesoparameter and the Affix Hopping options are microparameters (do-support, since it affects just one lexical item, might be a nanoparameter). Moreover, this account is consistent with the spirit, but not exactly the letter, of the parameter hierarchies for verb-movement proposed in Roberts (a: , ), and with very early setting of macroparameters, at least; see note  and Tsimpli () on early vs. late parameter-setting.7 We conclude that there is a connection between root infinitives in acquisition and the diachronic loss of V-to-T movement: in acquisition, ‘rich’ agreement triggers V-to-T movement, while in diachrony the loss of agreement is linked to the loss of movement. The connection is somewhat indirect, despite the approximation of Faroese-Icelandic to

7 Although we can treat the [Tense] macroparameter as set early, there is still an issue with V-to-T. In fact, only in French and Icelandic do we observe vacillation for V-to-T, precisely in the form of root infinitives. Recall that English children never produce strings like John speaks not, and in Dutch, German, and Swedish root infinitives are in complementary distribution with V; no ‘intermediate’ V-to-T system, with the finite verb not in second position, is observed (with the exception of Swedish, mentioned in note ). French and Icelandic are precisely the languages whose agreement is insufficient for consistent null subjects, but sufficient for V-to-T movement. Presumably it is the fact that these systems lie in this grey area which leads acquirers to pass through a stage where they treat them as lacking V-to-T, i.e. effectively lacking any form of ‘rich’ agreement. Recall also that lack of V-to-T movement is in principle preferred over V-to-T movement (see §..), and so the root-infinitive stage could result from this preference operating in an NNSL. A complication here is that there is some evidence that Icelandic is a PNSL, but I will leave this aside (see §..). Apart from in these cases, V-to-T is always set early. Early Faroese and Swedish production of V-to-T, mentioned in the previous note, may be linked to the nature of embedded V, as we said there.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

the ideal scenario of (). Let us now turn to the other L-acquisition phenomenon of interest: early null subjects. 3.1.3. Early null subjects Early null subjects are illustrated by the French example in () (from Guasti : ): () Est trop gros. is too big ‘It is too big.’

[French]

While similar to root infinitives, early null subjects differ from them in that the verb is clearly finite in form, as can be seen from (), and the fact that they are compatible with the presence of auxiliaries (see the discussion in Guasti : ). Guasti (: , figure .) shows that early null subjects are attested in several NNSLs in addition to French: Dutch, Flemish, German, Danish, and English. Sigurjónsdóttir (, ) shows that they are found in Icelandic. At first sight, it might seem that early null subjects provide evidence that the null-subject parameter is not set as early as we have been supposing up to now. In fact, in her pioneering work on this phenomenon, Hyams () proposed that early null subjects were an indication of ‘parameter-missetting’ in relation to the null-subject parameter, in that children acquiring NNSLs initially set the parameter to the positive value. This led to the suggestion that the null-subject parameter may have a default positive value. Hyams’ account of early null subjects comes close to fulfilling our ideal scenario for the connection between language acquisition and language change. In terms of that scenario as presented in (), we could suppose that L is Italian and L0 is English; P is the null-subject parameter; vi is the positive value of that parameter; and vj is the negative value. Although, as with the English-French case discussed above, English and Italian are not closely related in that they do not belong to the same genus, we can say with some confidence that Proto Indo-European, the common parent of English and Italian, was a nullsubject language (see §..), and that Italian has retained that value from Latin, which retained it from the parent language. In that case, English is syntactically innovative with respect to Italian as far as this parameter is concerned. However, this case departs from () in that it is 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

the innovative language which shows a property characteristic of the conservative system, since it is early English which shows the Italianlike property of null subjects. However, we can pursue the acquisitionchange connection here by asking what property of English acquisition ‘resets’ the parameter to the innovative, non-null-subject, value, and then asking whether we can observe this property changing at a relevant stage of the history of English. In these terms, the crucial factor F, causing children to fix the parameter correctly after a period of ‘aberrant’ production, could be either the presence of modal auxiliaries (originally identified by Hyams as inherently incompatible with null subjects) or overt expletive pronouns (often thought to be incompatible with a positive value for the null-subject parameter—see Rizzi a, and, for a different view, Holmberg ). The difficulty with this is that feature F is predicted to have arisen when English, or the relevant ancestor of English, ceased to be a null-subject language. We saw in §.. that OE was probably not a null-subject language, although there is some evidence that the stage of the language immediately prior to the earliest texts may have been a partial null-subject language (PNSL), and that Walkden (: ) concludes that Proto West Germanic was probably a PNSL, with Proto Germanic a CNSL. But both modal auxiliaries and overt expletive pronouns are much more recent innovations in the recorded history of English. It is usually thought that modal auxiliaries of the Modern English type arose in the sixteenth century (as was briefly discussed in §..; see note  of that section on the chronology of this change), while overt expletives appeared during ME (Williams ; Biberauer ; Biberauer and Roberts ). Thus there is a clear chronological mismatch regarding this feature, which vitiates this particular application of the scenario. An alternative might be to take L0 to be French. Here we are on much firmer ground regarding criterion (a): French and Italian clearly belong to the same genus and it is clear that Latin was a null-subject language. So French can certainly be defined as syntactically innovative in relation to Italian in this respect. It is also clear that French shows early null subjects (see ()), and so we see the relevant kind of ‘aberrant’ production in child language. However, we run into difficulty with the third part of the scenario: neither French nor Italian has modal auxiliaries, and so the relevant factor must be overt expletive pronouns, which of course Modern French has but Italian lacks. But the problem 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

is that OF, which, as we saw in §.., was a CNSL (albeit of an unusual kind), had overt expletives, as the following examples show: () a. Il est juget que nus les ocirum. it is judged that we them will-kill (Roland, ; Roberts a: ) b. Il ne me chaut. it not to-me matters (Einhorn : )

[OF]

One possibility for saving this approach might be to claim that the expletives illustrated in () are in SpecCP (which they almost certainly are, given the V nature of OF; see §...), and that the relevant property for changing the null-subject parameter, in both acquisition and change, involves expletives in SpecTP. However, there are examples of expletives in SpecTP in OF, such as the following: () car ainsin estoit il ordonne for thus was it ordained ‘for thus it was ordained’ (Vance  (b), ; Roberts a (b): )

[OF]

We must therefore conclude that OF had overt expletive pronouns able to appear in SpecTP. We therefore do not know what caused the nullsubject parameter to change its value in the history of French, and so we are unable to relate this change to the acquisition of a given value of the null-subject parameter (see the discussion of the historical development of null subjects in the history of French in §..). There is a more fundamental problem in extrapolating from early null subjects to changes in the null-subject parameter however. Since Hyams’ early work, evidence has emerged that early null subjects are not the result of a ‘missetting’ of the null-subject parameter to the Italian value. The main reason for this is that early null subjects do not occur in the following environments (Guasti : ‒): () a. questions with a fronted wh-element; b. subordinate clauses; c. matrix clauses with a fronted non-subject.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

On the other hand, null subjects readily occur in these environments in adult null-subject languages, as the following Italian examples (from Guasti : ‒) show: () a. Cosa mangi? What eat. ‘What are you eating?’ b. Non so cosa mangi.  know. what eat. ‘I don’t know what you are eating.’ c. Ieri ho dormito tanto. Yesterday have. slept a lot ‘Yesterday, I slept a lot.’

[Italian]

Because of data like this, it has been widely concluded that, despite initial appearances, early null subjects are not a case of missetting of the nullsubject parameter to the ‘Italian’ value. Hence our comparison of Italian and French above in relation to the ‘ideal scenario’ was to no avail. Another option, pursued by Hyams (), was to claim that early null subjects result not from a ‘subject-drop’ option of the familiar Italian kind, but from a ‘topic-drop’ option of the kind seen in languages such as Chinese and Japanese (see in particular Huang (, ) on this). The advantage of this idea is that it reconciles the occurrence of early null subjects in languages with impoverished agreement systems with the known facts of adult languages: while CNSLs like Italian appear to require ‘rich’ verbal agreement for the recovery of the content of null subjects (see §..), topic-drop languages like Chinese have no agreement at all and yet allow null arguments of various kinds, as the following Chinese examples illustrate:8 () a. __ kanjian ta le. (he) see he 

[Chinese]

b. Ta kanjian __ le. he see (him)  ‘He saw him.’

8

We saw in §.. that languages with null subjects, null objects, and no agreement are radical null-subject languages (RNSLs). RNSL-hood is arguably independent of topic drop, as German is typically thought to have topic drop (Ross ; Haider ; Trutkowski ) but is a non-NSL. The comments concerning null objects in the text to follow are unaffected by this complication, however.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

The disadvantage of this approach is also apparent from (b): topicdrop languages allow null objects fairly freely, in addition to null subjects. On the other hand, Hyams and Wexler () show that early null objects are rather rare in child English, and Wang et al. () show that child Chinese allows null objects significantly more freely than child English. So the idea that the putative parametermissetting is in the ‘Chinese’ direction rather than in the ‘Italian’ one does not appear to hold up either. So it appears that there is no obvious ‘parameter-missetting’ going on with early null subjects. Other possibilities which have been explored to account for this phenomenon include relating it to diary drop, a phenomenon, originally identified by Haegeman (), characteristic of certain registers in NNSLs such as English and French. Relevant examples are: () a. . . . cried yesterday morning. (Plath : ) b. Elle est alsacienne. . . . Paraît intelligente. She is Alsatian. Seems intelligent. (Léautaud : ) These examples have been argued to involve clausal truncation (but at a higher level of clausal structure than that involved in root infinitives as described above) by Haegeman () and Rizzi (). Indeed, a truncation analysis seems to account for this phenomenon quite well; see Guasti (: ff.) for summary and discussion. The other accounts which have been put forward involve extrasyntactic factors, such as processing difficulties (Bloom ) and metrical difficulties (dropping of weakly stressed syllables; Gerken ). As these do not involve factors which may be relatable in any direct way to parametric change, I will leave them aside here. (See Guasti : ‒ for critical discussion of Gerken’s proposal.) 3.1.4. Conclusion In conclusion, for all their intrinsic interest and the light they shed on L acquisition, it seems that early null subjects cannot clearly be related to the kinds of phenomena seen in parametric change; root infinitives, on the other hand, do allow us to make a tentative connection between 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

studies of immature competence and the acquisition-driven conception of parametric change.9 Hence we can take a small, initial step in empirically establishing the acquisition-driven conception of parametric change: the change in the V-to-T parameter, both in acquisition and in diachrony, is the consequence of the nature of agreement. Acquisition of agreement leads to the loss (or non-occurrence, in the case of CNSLs) of root infinitives. Diachronic loss of agreement (itself a byproduct of independent morphological and phonological changes) leads to the loss of V-to-T movement. These conclusions give us a clear indication of the importance of the second factor, the PLD, in parameter-setting. There are several reasons why making the connection between parameter-setting in acquisition and parameter-(re)setting in diachrony is difficult in practice. First, there is something of a sociological divide between linguists working on L acquisition and those working on diachronic syntax.10 This is a regrettable, but entirely contingent state of affairs, and something which can in principle easily be remedied. Rizzi (: –) conjectures that root infinitives and early null subjects (as well as ‘determiner drop’ and ‘copular drop’, two other features of the production of - and -yearolds not discussed here) may arise from the fact that ‘the child initially assumes all the parametric values which facilitate the task of the immature production by reducing computational load’ (). This general strategy is constrained by the values of the parameters which are fixed early, hence the observation that the null-subject parameter is correctly fixed in the acquisition of null-subject languages, but early null subjects and root infinitives may appear in the acquisition of NNSLs owing to the adoption of the conjectured strategy. On the other hand, it might be tempting to regard the generalized ‘dropping’ option as a macroparameter in the sense of §.. (). This is implausible, however: if parameter hierarchies are intended to reflect learning paths, then macroparameters must be amongst the most early-set parameters, and so we would not expect to see evidence that they are not fixed at these relatively late stages of acquisition. We suggested above that root infinitives reflect a stage of acquisition in which Legate and Yang’s () [Tense] parameter, a macroparameter, is positively set, V-to-T is negatively set, but the relevant Affix Hopping microparameters are not set. Perhaps an analogue of this proposal could account for early null subjects as well as determiner and copula drop. Such a conclusion would not necessarily be inconsistent with Rizzi’s conjecture, which could be relativized to morphological microparameters such as Affix Hopping. 10 As mentioned several times already, Lightfoot has consistently made the connection between syntactic change and the acquisition of syntax; see Lightfoot (, , , , ). In particular, Lightfoot () develops the ‘degree- learnability’ theory with a view to explaining aspects of both. Lightfoot’s application of degree- learnability to word-order change in the history of English was briefly discussed in §.; he also applies the same notion to aspects of language acquisition. This, Clark and Roberts (), Hale (, ), Roberts (), DeGraff (), and Yang (, ) are the early cases in the literature where an explicit connection is attempted between change and acquisition; for more recent discussions, see Hale (, ); Yang (, , ); Aboh and DeGraff (); Fuß (). The connection is mentioned in Hyams (: , n. ). 9



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER‐SETTING IN FIRST‐LANGUAGE ACQUISITION

Second, good empirical coverage of early production is limited to a few languages: English, French, Dutch, German, Catalan, Spanish, Greek, Mandarin, Japanese, and Italian most prominent among them to judge by Guasti (); in diachronic syntax, selected topics have been studied in the histories of a range of languages, but a good overall picture of the syntactic history of languages outside the Germanic and Romance families largely remains to be established. Thus our database of languages is at present rather small, and so our chances of finding the ideal case correspondingly restricted. Third, and most importantly, the nature of the data in both cases may make the ideal scenario described above hard to identify. The immature competence of small children goes hand in hand with a general cognitive immaturity, notably for example a smaller short-term memory capacity, which means that comparing children’s grammars with adult grammars may really be like comparing chalk with cheese. The diachronic data we have is the output of adult competence, but of course the surviving texts have been subject to many vicissitudes of history; one of the principal goals of traditional philology is simply to unravel the sometimes tortuous histories of extant texts. In their different ways both the acquisition data and the diachronic data are corrupt, and this makes comparing data from the two sources in any reliable way all the more difficult. Of course, what we would ideally like is an acquisition study of an earlier stage of a language. Since acquisition studies only began in the midtwentieth century (see Grégoire –; Leopold –; Jakobson ), this is not possible at present.11 Although empirically establishing the connection between language acquisition and language change is challenging, three points are worth bearing in mind. First, as stated above, what is lacking is clear empirical confirmation through acquisition studies that parameter change is driven by acquisition; the lack of evidence does not disconfirm this approach, especially since we can quite easily see why such evidence 11 I am aware of one striking exception to this generalization: the journal kept by Jean Héroard in the period – of the speech of the young dauphin Louis XIII (AyresBennett : ff.). Héroard, who was the dauphin’s personal physician, transcribed samples of the dauphin’s speech between the ages of ; and ; (i.e. in the years –). This immediately yields some interesting observations, notably that ne was already dropped, i.e. the dauphin’s production represented a very early instance of Stage III of Jespersen’s Cycle in French (Ayres-Bennett : ; see §., note  and §..). Ernst () is an edition of Héroard’s journal. Ayres-Bennett (: ff.) provides further examples of the sporadic omission of ne in seventeenth-century French.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

may be lacking, and the conceptual arguments (see §.) are unaffected. Second, the ‘ideal scenario’ of () may well be realized by some new data; in fact, the evidence concerning Icelandic root infinitives and Faroese verb-movement comes close to the ideal scenario, as we saw above. Third, as mentioned above, Guasti’s conclusions regarding early and accurate setting of several important parameters, listed in (), lead us to pose the intriguing logical problem of language change. It is to this last issue that we now turn.

3.2. The logical problem of language change In this section, we will look in more detail at the apparent tension we noted in the previous section between the evidence that many important parameters are set at an early stage of language acquisition and the idea that syntactic change is driven by abductive reanalysis, associated with parametric change, of parts of the PLD. This will lead us to formulate the logical problem of language change, as a kind of paradox for learnability theory. At the same time, some basic concepts of learnability theory are introduced. In considering a possible approach to solving this problem, we introduce the Inertia Principle first put forward in an early version of Keenan (), as developed in Longobardi (). This in turn leads to an explicit characterization of the conditions under which abductive change can take place, in terms of ambiguity and complexity. The discussion will sharpen some of the notions connected to parameter-setting that we are concerned with, and so our overall account of syntactic change will be further elucidated, and a number of questions raised for detailed consideration in later sections of this chapter. Let us begin by recapitulating some of the ideas we have put forward up to now: () a. The central mechanism of syntactic change is parameter change. b. Parameter change is manifested as (clusters of) reanalyses. c. Reanalysis is due to the abductive nature of acquisition. The idea in (a) was argued for at length in Chapters  and . Both (b) and (c) were introduced in the discussion of reanalysis in §.. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE LOGICAL PROBLEM OF LANGUAGE CHANGE

There we also saw Andersen’s () familiar schematization of abductive change, which I repeat here: () Generation 1: G1 → Corpus1

Generation 2: G2 → Corpus2 Owing to the abductive nature of language acquisition, Grammar  (G) may in principle not be identical to Grammar  (G). If identity between grammars is defined in terms of identity of parameter-settings, this implies that G may differ from G in at least one parameter value, and, by (b), this will give rise to a cluster of reanalyses. Following Roberts and Roussou (: ), we can give the following characterization of abductive change (cf. also Lightfoot (, , ), Fuß ): () (A population of) language acquirers converge on a grammatical system which differs in at least one parameter value from the system internalized by the speakers whose linguistic behaviour provides the input to those acquirers. This essentially states what the schema in () illustrates. So where G may differ from G in at least one parameter value in (), an abductive change takes place. Let us consider () and () in the light of the main concepts of learnability theory, the abstract, formal theory which deals ‘with idealized “learning procedures” for acquiring grammars on the basis of exposure to evidence about languages’ (Pullum : ). In terms of this theory, any learning situation can be characterized in terms of the answers to the following questions (this presentation is from Bertolo : ff.): () a. ‘What is being learned . . . ? b. What kind of hypotheses is the learner capable of entertaining? c. How are the data from the target language presented to the learner? d. What are the conditions that govern how the learner updates her responses to the data? e. Under what conditions, exactly, do we say that a learner has been successful in the language learning task?’ 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

The answers to some of these questions are obvious, given the assumptions we are making here regarding UG and parametric variation: for example, the answer to (b) is that the learner can only consider distinct values of parameters in the acquisition of syntax. Similarly, the answer to the question of how the data are presented to the learner—(c)—appears to be simply in the form of spontaneous linguistic behaviour which makes up the PLD. As we saw in the introduction to Chapter , no negative evidence is available to the learner. One could put the answer to this question more abstractly, and say that the data is presented in the form of p-expression—as defined in () of §..—in the strings in the PLD. Some of the other questions in () are trickier. For instance, our entire discussion of the possibility of evidence of ‘parameter-missetting’ in the previous section can be construed as addressing (d). On the basis of that discussion, Yang’s (a,b, , ) variational learning model could be seen as providing the possibility of updating of at least some parameter-settings; moreover, the Tolerance and Sufficiency Principles of () and () of §.. may be able to formalize the relation between the acquisition of lexical items (exponents of functional heads bearing certain formal features (FFs), in terms of the Borer-Chomsky Conjecture (BCC)) and the fixation of parameter values. But it is (a) and (e) which are most important to our concern with the relation between learnability/acquisition and change. One possible answer to (a) is that acquirers are learning the parameter values of the grammar that produces the PLD. But in that case the learning task would be seen as unsuccessful when abductive parametric change of the sort schematized in () takes place. This seems to be the wrong conclusion, since the learners in () have acquired a grammar, just not—under the usual idealization—the parental one (since G 6¼ G). So the answer to (a) would be simply that a parametric system is learned. This point relates to (e), too: the criterion for successful learning cannot be replication of the parental grammar, but approximation to it, in such a way that abductive change of the sort shown in () is possible. For this last reason, it is often thought that Probably Approximately Correct (PAC) algorithms are learning algorithms which can provide useful simulations of language acquisition. (See Clark and Roberts ; Bertolo : –; Pullum : ; and in particular Niyogi : ff. and the references given there.) 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE LOGICAL PROBLEM OF LANGUAGE CHANGE

Most work on L acquisition assumes that the stable state of acquisition, SS, corresponds exactly to the target grammar. In other words, it assumes that no ‘mismatch’ arises between G and G in (). In considering the stages of L acquisition in the previous section, we implicitly took this view in saying we can consider that the various intermediate states differ from one another in representing successively closer approximations to the adult system (SS) in terms of the values of the parameters. To put it another way: if m parameters are set to the adult value at stage Sn then at least m +  parameters are set to the adult value at stage Sn+.

But, as we have seen, abductive change requires a slightly looser formulation than this, in order to allow for the possibility of ‘mismatch’ between G and G in (); the same point is made by Fuß (: f.). The following remark by Niyogi and Berwick (: ) summarizes the difference between the standard assumptions in work on language acquisition and what seems to be required for an acquisition-driven account of change: ‘it is generally assumed that children acquire their . . . target . . . grammars without error. However, if this were always true, . . . grammatical changes within a population would seemingly never occur, since generation after generation children would have successfully acquired the grammar of their parents’. Thus language acquisition is usually taken to be deterministic in that its final state converges with the target grammar that acquirers are exposed to. The postulation of abductive change challenges exactly this assumption. In the previous section, we saw good support for the deterministic assumption in L acquisition. Recall Guasti’s list of parameter values which appear to be correctly fixed from roughly the time of the earliest multiword utterances: ()

a. the value of the head direction parameter in their native language; b. the value of the V-to-T parameter in their native language; c. the value of the V parameter in their native language; d. the value of the topic drop and null-subject parameters; e. the value of the parameters governing question-formation, overt movement or in-situ placement of the wh-element and application or non-application of T-to-C movement.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

Moreover, we saw in the previous section that two of the best-studied phenomena of child production, root infinitives and early null subjects, cannot be directly analysed in terms of ‘parameter-missetting’, i.e. in terms of mismatches between G and G in (). As Roberts and Roussou (: ) put it: ‘the standard paradigm for language acquisition is not immediately compatible with the observation that grammatical systems change over time’. This ‘standard paradigm for language acquisition’ is more than just a methodological simplification on the part of linguists working on L acquisition: () gives the evidence of accuracy and earliness in the setting of a number of important parameters. And yet we saw in Chapter  that all the parameters referred to in () (with the exception of topic drop, which was not discussed there) have changed their values in the recorded histories of various languages. So we see a tension between the results of L acquisition research and what we can observe about syntactic change. The obvious explanation for the fact that children are able to set many parameters very early lies in the highly restricted range of analyses of the PLD that UG allows them to entertain and the limited exposure to PLD needed for parameter fixation. In this respect, the facts reported in () support the argument from the poverty of the stimulus, as noted in the previous section. But then, how are we to explain the fact that these parameters are subject to change over time? Notice that our answers to the learnability questions in () do not answer this question; they simply allow for abductive parametric change without explaining how it happens. This leads us to the logical problem of language change, which we can formulate as follows (this formulation is based on unpublished work with Robin Clark (Clark and Roberts : )): () If the trigger experience of one generation permits members of that generation to set parameter pk to value vi, why is the trigger experience produced by that generation insufficient to cause the next generation to set pk to vi? The logical problem of language change as formulated here is close to the Regress Problem for reanalytical approaches to change which we introduced in §. in the following terms: an innovation in Corpus [in ()] may be ascribable to a mismatch in G (compared to G), but it must have been triggered by something in Corpus—otherwise where did



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE LOGICAL PROBLEM OF LANGUAGE CHANGE

it come from? But if Corpus could trigger this, then how could G produce this property without itself having the innovative property?

Essentially, this formulation puts the problem the other way around as compared to (), in addition to not being directly phrased in terms of parameter change. I take it, though, that () subsumes what I called the Regress Problem in the earlier discussion. The first thing that is required in order to find our way out of the dilemma stated in () is a slightly weaker notion of the deterministic nature of L acquisition than that which is usually assumed in the Lacquisition literature. So let us propose, following Roberts and Roussou (: ), that ‘the goal of language acquisition is to fix parameter values on the basis of experience; all parameter values must be fixed, but there is no requirement for convergence with the adult grammar’. More precisely, as mentioned in the discussion of (e) above, let us suppose that the goal of acquisition is to approximate the parental grammar, not to replicate it. Making this move allows for pk in () to receive a different value from that found in the input, therefore making space for language change. The stable state SS of language acquisition now amounts to the situation where all parameters are fixed to a given value (cf. the remark in relation to parameter-setting in §. that ‘not deciding is not an option’). Let us call this view of the endpoint of language acquisition ‘weak determinism’. The ‘approximation’ approach may seem too weak, in particular in that it does not appear to account for the results of L-acquisition research as summarized in (), since it in principle allows parameters to vary freely and randomly from generation to generation. However, Roberts and Roussou (: ) add an important proviso to the above quotation to the effect that convergence with the adult grammar ‘happens most of the time’; that is, approximation usually amounts to replication. This brings us to an important principle of syntactic change, first put forward in Keenan (): the Inertia Principle. Keenan formulates this as follows: () Things stay as they are unless acted on by an outside force or decay (Keenan : ). This principle is very general; in fact it holds of the physical world in general, taking decay to include entropy, i.e. the second law of thermodynamics. For our purposes, we can take () to mean that, although L 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

acquisition is not inherently deterministic but rather weakly deterministic in Roberts and Roussou’s sense, the target system is successfully converged on, i.e. the stable state SS of acquisition has the same parameter values as that of the parent system; G and G in () do not differ. This is no doubt due to the highly restricted range of analyses of the PLD that UG allows and the limited exposure to PLD needed for parameter fixation, i.e. standard poverty-of-stimulus considerations. Longobardi (: ) adopts Keenan’s principle, and puts forward the following very interesting version of it:12 () ‘syntactic change should not arise, unless it can be shown to be caused’ (emphasis his). In other words, as Longobardi says, ‘syntax, by itself, is diachronically completely inert’ (–). It is clear that this view is compatible with the results of L-acquisition research, as reported in the previous section and in (). If we combine () with Roberts and Roussou’s weak determinism, we arrive at the following: () If a definite value vi is expressed for a parameter pi in the PLD, then (a population of) acquirers will converge on vi. In other words, given adequate p-expression, Inertia will hold. So Inertia implies that most of the time abductive change does not happen. P-expression was introduced and defined in §.; the definition is repeated here, along with the corollary definition of trigger: () a. Parameter expression: A substring of the input text S expresses a parameter pi just in case a grammar must have pi set to a definite value in order to assign a well-formed representation to S. b. Trigger: A substring of the input text S is a trigger for parameter pi if S expresses pi. As long as there is a trigger for a given parameter value, then, Inertia will hold and abductive change will not take place.

12 See Walkden () for a critique of inertia, focusing on the notion of ‘cause’ in Longobardi’s formulation, and Roberts (b) for a defence of it in terms of weak pambiguity as defined in (c) below.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE LOGICAL PROBLEM OF LANGUAGE CHANGE

Under what circumstances does abductive change happen, then? This must be when no definite value vi is expressed for a parameter pi in the PLD. According to Longobardi’s version of Inertia in (), this lack of robust p-expression must be ‘a well-motivated consequence of other types of change (phonological changes and semantic changes, including the appearance/disappearance of whole lexical items) or, recursively, of other syntactic changes’ (: ). More precisely, we propose, following our discussion of reanalysis in §., that both ambiguity and opacity of the p-expression are required in order for abductive change to take place. Ambiguity is defined in relation to parametric systems as follows: () a. P-ambiguity: A substring of the input text S is strongly p-ambiguous with respect to a parameter pi just in case a grammar can have pi set to either value and assign a well-formed representation to S. b. A strongly p-ambiguous string may express either value of pi and therefore trigger either value of pi. c. A weakly p-ambiguous string expresses neither value of pi and therefore triggers neither value of pi. In fact, as we saw in §., strong p-ambiguity is what is required for reanalysis. These definitions are repeated from that discussion, where they were illustrated in relation to certain reanalytical changes. Weak p-ambiguity arises where some parameter value is undetermined, also possibly leading to change in a parameter value, although not necessarily through reanalysis. We can define opacity in terms of complexity (see Lightfoot ). Following an idea developed in Clark and Roberts (), let us assume that learners are conservative in that they have a built-in preference for relatively simple representations (the precise characterization of simplicity will be discussed in §. and §.). If a given piece of PLD is p-ambiguous, there will be at least two representations for it, each corresponding to a different grammar, i.e. representing systems with distinct parameter values. Assuming that any two representations differ in complexity and therefore opacity, the learner will choose the option that yields the simpler representation. The more complex representation will be both opaque (in virtue of being more complex than the other available representation(s)) and ambiguous (by definition). Therefore it is inaccessible to the learner, i.e. it is effectively unlearnable. The Inertia 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

Principle tells us that the strong p-ambiguity of the trigger (and therefore its relative opacity, assuming that any two alternative representations differ in overall complexity) arises through either extrasyntactic factors or as the consequence of an independent syntactic change. Actually, closer reflection reveals that the circumstances just described do not guarantee a change; they merely suspend inertia, since we can take it that p-expression (and therefore strong p-ambiguity) and the preference for relative simplicity are forces acting on the learner, in the sense relevant for the Inertia Principle as stated in (). Hence, it is possible that things will not stay as they are. Whether a parameter value of G actually changes will depend on the relative complexity of the representations of aspects of the PLD entailed by the parameter-settings in G compared to those of G: a parameter will change if it corresponds to the single option expressed by the PLD for G and the more opaque of two options expressed by the strongly ambiguous PLD for G. Here again weak determinism is relevant, in that it implies that under these conditions a definite value vi for pi will be assigned. This value will still be compatible with the input, but— again thanks to weak determinism—may differ from that of the target grammar, in which case an abductive change takes place. So, our tentative answer to the question posed in () is that between the two generations in question there is a change in the PLD. In other words, some extrasyntactic factor, or at least a factor independent of the change in question, introduces p-ambiguity into the expression of at least one parameter in the PLD of G. Still assuming that any two representations can be distinguished in terms of complexity (and still leaving complexity undefined, but see below), complexity/opacity will then choose between the two values, possibly leading to a parametric change.13 The crucial question becomes that of locating the factors that may introduce p-ambiguity into the PLD. This is what we turn to in the next section. Clearly, all of the above discussion turns on the notion of complexity, which we must therefore define. In fact, we have already introduced this notion, as the third-factor principle of Feature Economy, in §. (i): 13 Kroch (: ) points out that it is also possible that ‘extrasyntactic’ change may be attributable to some property of the learner, for example, age at the time of acquisition. This is relevant ‘in the case of change induced through second-language acquisition by adults in situations of language contact’ (). I will discuss this case briefly in the next section, returning to it in detail in Chapter .



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE LOGICAL PROBLEM OF LANGUAGE CHANGE

() Feature Economy (Roberts and Roussou : ): Postulate as few formal features as possible. Here is a more formal version of Roberts and Roussou’s proposal (for the original, see Roberts and Roussou : ): () Given two structural representations R and R0 for a substring of input text S, R is simpler than R0 if R contains fewer formal features than R0 . The notion of ‘formal feature’ here is the standard one in current versions of syntactic theory, as introduced in §..: it includes features such as Person, Number, Gender, Case, and Negation. And, as we saw in §.., movement takes place where the Probe of an Agree relation has uninterpretable FFs and ‘an extra property’ triggering movement. Chomsky (, ) proposes that that ‘extra property’ is in fact a further FF known as the EPP feature. This means that Probes, in terms of FFs, are more complex than non-Probes, and Probes that cause movement are more complex than those which do not. We will see the effects of this definition of complexity in more detail in the next section. In this section, we resumed certain aspects of the discussion of reanalysis in §., notably the question of abductive change. Applying this idea strictly to parameter change, we arrived at the logical problem of language change as stated in (), partly on the basis of some of the observations regarding language acquisition made in the previous section. We suggested a way of solving this problem on the basis of the Inertia Principle of () and the corollary stated by Longobardi in (). We are led to the conclusion that abductive parametric change only occurs when the trigger for a given parameter value, as defined in (b), is both ambiguous and opaque. P-ambiguity is defined in () and opacity/complexity in (). P-ambiguity can only be introduced through extrasyntactic factors, for example, through language contact (see note ), morphophonological erosion, or through an independent syntactic change. In the remaining sections of this chapter, I will address the two main issues that this account of change raises: the nature of changes to the trigger, and the nature of complexity as the principal force which acts on parameter-setting, preventing things from staying the same (see ()); this will be linked to the concept of markedness of parameter values. Finally I will draw these threads together and attempt a full characterization of parameter hierarchies and their role in syntactic change. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

3.3. The changing trigger Following on from our rather abstract discussion of the logical problem of language change, this section looks at the idea that changes in the trigger experience—in particular the introduction of strong p-ambiguity—are responsible for language acquirers resetting parameter values. In other words, we investigate what both Kroch (: ) and Longobardi (: –) put forward implicitly as the solution to the problem: the idea that intergenerational changes in the PLD render earlier parametersettings prone to abductive change. We saw that this is due to the expression of these parameters becoming ambiguous and opaque. I will discuss three ways in which the PLD may be rendered ambiguous and/or opaque: contact-driven parameter-resetting, as suggested in Kroch and Taylor (), King (), and Kroch, Taylor, and Ringe (); cue-based resetting of the type advocated by Dresher () and Lightfoot () (the cue-based model was originally proposed by Dresher and Kaye ); and morphology-driven parameter-resetting, as suggested in Roberts (, ); Roberts and Roussou (). These ways of rendering the PLD ambiguous and opaque are not mutually exclusive; it is very likely that all three possibilities exist. 3.3.1. Contact-driven parameter-resetting The contact-driven view of parameter-resetting can be construed, in our terms, as the case where PLD is affected by an alien grammatical system. What this means is that Generation  in the schema for abductive change in () is subjected to a different kind of PLD from Generation  in that Generation  receives PLD that either directly or indirectly reflects a distinct grammatical system (i.e. set of parameter values) from that which underlay the PLD for Generation . The direct case of contact is that where the PLD simply contains a significant quantity of tokens from a distinct system (where ‘distinct’ means that the grammar in question generates strings that cannot express the original grammar); this would naturally arise where Generation  is brought up in an environment which contains a language or dialect absent from the early experience of Generation . Such situations can and do arise through many different types of historical contingency: emigration, invasion, and intermarriage being the most obvious but certainly not the only ones. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

The indirect case of contact arises where Generation  uses a second language in interaction with Generation . Here the PLD for the two generations is very obviously distinct. On a standard view of creole genesis, in the case where this second language is a pidgin, Generation  may form a creole; I will leave this possible situation aside here and return to it in §.. If Generation  has learnt the second language after the sensitive (or critical) period for language acquisition, the PLD will consist of interlanguage and many parameter-settings may be radically underdetermined by the PLD; see Guasti (: ‒). I will come back to this in the discussion of second-language acquisition and the nature of interlanguage in §.; see also Labov (). This situation gives rise to weak p-ambiguity and hence potentially to change, which may have the properties of the creation of new grammatical features ex nihilo. The direct and indirect cases of language contact influencing PLD are diagrammed in (): () a. Direct contact:

Generation 1: G1 → Corpus1 CorpusAlien

Generation 2: G2 → Corpus2 b. Indirect contact: Generation 0: G0 → Corpus0

Generation 1: G1 → Corpus1; GAlien → CorpusAlien

Generation 2: G2 → Corpus2



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

Lucas (, ) and Breitbarth, Lucas, and Willis (: ‒) adopt the framework of Van Coetsem (, ) for analysing contact-driven syntactic change. This involves making a distinction between a Source Language (SL) and a Receiving Language (RL) in a contact situation. Further, borrowing is distinguished from imposition: a change made by a speaker dominant in the RL, under ‘RL-agentivity’, is borrowing, while a change made by a speaker dominant in the SL is imposition. In terms of the schemata in (), direct contact in (a) involves RL-agentivity and hence is borrowing. Indirect contact as in (b) is more likely to involve SL-agentivity (the alien grammar here is the SL grammar and the alien corpus consists of strings generated by this grammar) and hence imposition; this is also pointed out by Willis, Lucas, and Breitbarth (: ‒); see also their discussion of Bowern (). As already mentioned, I will leave aside further discussion of issues to do with the nature of language contact, second-language acquisition and possibly creolization here and return to them in Chapter . An example of what looks like direct contact, that is, contact-induced borrowing affecting aspects of syntax, comes from King’s () study of the French spoken on Prince Edward Island in Canada. As Winford (: ) points out, this instance of contact involves RL-agentivity. Prince Edward Island (PEI) French is a variety of Acadian French, also spoken in the other Canadian Maritime Provinces (New Brunswick and Nova Scotia) and in parts of Newfoundland. French has been spoken in Acadia since the early seventeenth century (King : ), but according to the  census, only . per cent of the population of PEI were native speakers of French, and only . per cent spoke the language at home.  per cent of these people lived in a single area, Prince County (King : ). Contact with English is clearly very extensive, and King (: ch. ) documents much lexical borrowing and code-switching. What is of interest in the present context is that King (: ch. ) documents cases of preposition-stranding in PEI French. As we saw in Box . of Chapter , preposition-stranding is the cross-linguistically rather rare option of moving the complement of a preposition, while leaving the preposition ‘stranded’. English and Mainland Scandinavian languages allow this, both in wh-questions, illustrated in (a), and in passives, shown in (b) (Icelandic allows the equivalent of (a), but not (b) with a nominative subject; Kayne : ): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

() a. Who did you speak to __? b. John was spoken to __. It is not clear what exactly permits this in English and the Scandinavian languages (see Abels  for a suggestion which exploits the structure of the PP). In most other languages which have overt wh-movement, the preposition must move with the wh-phrase, i.e. it must be pied-piped (see §.. on the wh-movement parameter). This is the case of Standard French, for example (both examples in () are equivalent to (a)): () a. *Qui as-tu parlé à __? b. A qui as-tu parlé __?

[French]

Of course, English also allows the equivalent of (b) (To whom did you speak __?). This is probably a case of formal optionality in English; a [+wh] C with an EPP feature can cause either the DP complement of the preposition or the whole PP to move, although Agreewh holds just between C and the DP. (We will look in more detail at the concept of formal optionality in §...) Optionality has social value, in that the pied-piping option is characteristic of formal registers. French has some property which requires pied-piping of the DP in all cases. Kayne () made an interesting and influential proposal concerning this. The central idea in Kayne’s analysis was that English prepositions resemble verbs in that their complements are always accusative and able to undergo movement. French prepositions, on the other hand, have inherently Case-marked complements (in the sense defined in §..), which cannot undergo movement independently of the preposition. So the parameter distinguishing French from English is connected to the differential lexical properties of prepositions in the two languages. The existence of inherent Case in French is morphologically signalled by the contrast between dative and accusative third-person clitic pronouns, for example, le (acc) vs. lui (dat.); English has no comparable contrast. (Icelandic distinguishes accusative and dative complements, and has Preposition-stranding of some types, as mentioned above; see Kayne :  for discussion.) PEI French behaves like English in allowing preposition-stranding. () illustrates this in a wh-question, a relative clause and a passive:14 14

King points out () that preposition-stranding has been observed in Montreal French (by Vinet : ), but the phenomenon is much more restricted in that variety,



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

() a. Où ce-qu’elle vient de __? [PEI French] where that she comes from ‘Where does she come from?’ (King : , ()) b. Ça, c’est le weekend que je me souviens de __ That it is the weekend that I me remember of ‘That’s the weekend that I remember.’ (King : , ()) c. Robert a été beaucoup parlé de __ au meeting. Robert has been much talked of at-the meeting ‘Robert was talked about a lot at the meeting.’ (King : , ()) Data like this indicate that PEI French has what appears to be Englishstyle Preposition-stranding. Thus, PEI French, thanks to the very extensive contact with English, has developed a parametric option which is lacking in Standard French, and not fully instantiated in other varieties of Canadian French; see note . King insightfully and convincingly relates PEI French prepositionstranding to the fact that PEI French has borrowed a number of English prepositions and particle-verb combinations (see King : ‒, tables  and ). Some of these are illustrated in (): () a. Ils avont layé off du monde à They have laid off some people at factorie. factory ‘They have laid off people at the factory.’ (King : , ()) b. Il a parlé about le lien he has talked about the link ‘He talked about the fixed link.’ (King : , ())

la [PEI French] the

fixe. fixed

according to King’s account of Vinet’s observations. Roberge (, ) surveys a range of Canadian varieties of French and observes that Alberta French is intermediate between Montreal French and PEI French.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

These prepositions also allow stranding: () a. Qui ce-qu’a été layé off __? who that has been laid off ‘Who has been laid off?’ (King : , ()) b. Quoi ce-qu’il a parlé about __? what that he has talked about ‘What did he talk about?’ (King : , ())

[PEI French]

King argues that ‘the direct borrowing of English-origin prepositions has resulted in the extension of a property of English prepositions, the ability to be stranded, to the whole set of Prince Edward Island prepositions’ (). If the option of stranding is genuinely a lexical property of prepositions, as roughly sketched in our remarks on Kayne () above, then we might expect that option to be borrowed with the English prepositions, although PEI French does retain an accusative–dative distinction in pronouns, as King (: , table .) shows. So here we have a fairly clear case of direct contact: at some point in the history of PEI French, elements from an alien grammatical system—English prepositions—were borrowed and this affected the parameter governing preposition-stranding. Acquirers seem to have generalized the input based on the English prepositions; we have already mentioned Input Generalization as a third factor (see §., (i)) and will return to it in §.. This is borrowing, rather than in any sense imperfect learning of French by native speakers of French; nor does it reflect imperfect learning of English by French speakers. PEI French appears to allow preposition-stranding in contexts where English does not. Hornstein and Weinberg () observed that examples like the following are unacceptable for most English speakers:15 () *Who did Pugsley give a book yesterday to __? 15

Hornstein and Weinberg suggest that this is because the verb must c-command the stranded preposition, but that in () the PP is ‘extraposed’ outside of VP and to its right, as its position relative to the adverb yesterday shows, and therefore the PP is not c-commanded by the verb. The relevant parts of the structure would thus be approximately as in (i): (i)

[ . . . [VP V] yesterday] PP

The definition of c-command was given in §.., ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

PEI French appears to allow examples equivalent to (): () a. Quoi ce-que tu as parlé hier à [PEI French] what that that you have spoken yesterday to Jean de __? John of ‘What did you speak yesterday to John about?’ (King : , ()) b. Quoi ce-que tu as parlé hier de __ à Jean? what that that you have spoken yesterday of to John ‘What did you speak yesterday about to John?’ (King : , ()) King relates this to the independent fact that ‘French does not have the strong adjacency requirements found in English’ ().16 The Prince Edward Island case is, as we said, a clear case of direct contact affecting the trigger experience. Reanalysis is relevant to the extent that the structure [PP P DP] changes its properties as DP becomes extractable from PP. To the extent that this was caused by extensive lexical borrowing of English prepositions, we can assume that whatever structural property of P allows stranding was ‘copied’ to French Ps and then generalized.17 It is hard to evaluate the role of strong p-ambiguity here, as the pied-piping option is apparently only available with certain prepositions; with de, it is not found: *De quel enfant as-tu parlé? (‘About which child did you speak?’) is not good, but in other cases it is possible: Pour quelle raison qu’il a parti? / Quelle raison qu’il a parti pour? (‘For what reason did he leave? / What reason did he leave for?’) (Ruth King, p.c.). It is clear, though, that strings with preposition-stranding crucially affected the PLD at the time of contact, leading PEI French to change the value of the preposition-stranding parameter, thus creating a difference with Standard French.

16 It could in fact be connected to the possibility of raising a participle to a slightly higher structural position than that occupied by English participles, allowing the verb a wider range of c-command possibilities, roughly in line with what is suggested in note . On raising of French participles, see Pollock (: ). We mentioned movement of nonfinite verbs in French in §.., note . 17 If we treat prepositions as a lexical category and maintain that all parameters are associated with functional heads (in line with the Borer-Chomsky Conjecture) then the relevant property must be a property of a functional head associated with P; this is the essence of Abels’ () analysis of preposition-stranding.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

In §.., we saw a similar case of copying of an abstract structural property associated with borrowing of a class of lexical items. There we saw how Stein, Ingham, and Trips () show that recipient passives with nominative derived subjects (of the type He was given a book) originated in mid-fifteenth century English with verbs of French origin. Stein, Ingham, and Trips (: , ) refer to copying of abstract structural features, which we suggested involved the innovation of an ‘extra’, passivizable v*. The parallels with King’s PEI study are very striking. In both cases, we are dealing with microparametric changes involving a class of functional items. Microparametric borrowing is thus readily attested. This naturally raises the question of whether we can find cases of mesoparametric borrowing. A possible example is that of null subjects and verb-initial word order found in Early Modern Irish English discussed in Corrigan (: f.). We will discuss this case in §... 3.3.2. Cue-driven parameter-resetting Let us now turn to the second way of making the trigger experience ambiguous and opaque: the cue-based approach. Lightfoot () and Dresher () argue that learners use input forms, i.e. pieces of PLD, as cues for setting parameters. The trigger in this case is not sets of sentences but fragments of utterances (partial structures) (cf. also Fodor  and Fodor and Sakas : ‒ on the potential importance of fragments of sentences for parameter-setting). For Dresher () each parameter has a marked and a default setting, and comes with its cue, as part of the UG specification of parameters. For example, Dresher (ff.) proposes that there is a parameter determining whether a given language’s stress system is quantity-sensitive (QS) or not (Q(uantity)I(nsensitive)). English, for example, is QS, in that the basic stress rule states that the penultimate syllable is stressed if heavy; otherwise the antepenult is stressed (cf. Cánada, with a nonheavy CV penult, as opposed to Vancóu:ver, with a heavy CV penult). Thus the heaviness (or quantity) of a syllable plays a role in determining stress-assignment. Not all languages have quantity-sensitive stressassignment; QS thus represents a value of a particular parameter. The parameter in question is formulated as follows:



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

() Quantity (in)sensitivity a. Parameter: The language {does not/does} distinguish between light and heavy syllables . . . b. Default: Assume all syllables have the same status (QI). c. Cue: Words of n syllables, conflicting stress contours (QS). (Dresher : , ()) Dresher (: ) points out that: In QI systems all words with n syllables should have the same stress contour, since they are all effectively equivalent. Taking quantity insensitivity to be the default case, a learner will continue to assume that stress is QI until it encounters evidence that words of equal length can have different stress contours.

Dresher goes on to point out that there is abundant evidence that English is not QI, and so the learner quickly sets this parameter to the QS value. Thus we see how the unmarked value requires no evidence, and the marked value is associated with the cue. We will look more closely at the question of the marked and default settings of syntactic parameters in the next section. Lightfoot (: ), however, takes a much stronger view and argues that ‘there are no independent “parameters”; rather, some cues are found in all grammars, and some are found only in certain grammars, the latter constituting the points of variation’. He illustrates this approach with the loss of V-to-T movement in ENE. He assumes that the NE situation whereby tense and agreement morphology is realized on V is the default (Lightfoot says that this is a morphological rule, but we can continue to think of it as an instance of Agree; see §.. and the discussion of root infinitives in §..), and so the V-to-T grammar needs the cue [T V] (‘[I V]’ in his notation). Lightfoot suggests that the main expression of the cue (where his notion of ‘expression’ is like our notion of p-expression in being a structure which requires the cue in order to be grammatical) to be subject-verb inversion, as in () of Chapter , repeated here as (): () What menythe this pryste? ‘What does this priest mean?’ (–: Anon., from J. Gairdner (ed.), , The Historical Collections of a London Citizen; Gray : ; Roberts a: ) This cue was perturbed by three factors: (i) the reanalysis of modals as T-elements (see §..); (ii) the development of dummy do in the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

sixteenth century, also a T-element (we briefly mentioned this in our discussion of the loss of V-to-T in §..; it may well have been the same change as that affecting modals); (iii) the loss of V, which clearly took away many environments in which verb-subject order had formerly been found. He concludes: with the reanalysis of the modal auxiliaries, the increasing frequency of periphrastic do, and the loss of the verb-second system, the expression of I[V] in English became less and less robust in the PLD. That is, there was no longer anything very robust in children’s experience which had to be analysed as I[V], which required V to I, given that the morphological I-lowering operation was always available as the default. (Lightfoot : )

This account is very similar to the one we proposed in §.., with the notion of cue playing the role of our notion of p-expression. Lightfoot also points out that ‘weak’ verbal agreement inflection is a precondition for the change (‘the possibility of V to I not being triggered first arose in the history of English with the loss of rich verbal inflection’; : ), although he does not explicitly say that this inflection is the expression of a cue, which would be the equivalent in his terminology of our claim in §.. that the relevant agreement morphology expresses the parameter. The similarities between Lightfoot’s cue-based account of the loss of V-to-T and the one we gave in §.., are, as we see, very close. In fact, the definition of ‘trigger’ in (b), in making reference to ‘a substring of the input text S’, is equivalent to the Lightfoot/Dresher notion of cue. However, in this sense, cues cannot be identified with parameters: parameters are abstract properties of grammars, features of part of an individual’s mental representation. As we have seen, parameters are an emergent property of the three factors of language design. In this perspective, cues are an aspect of the second factor: the learner’s interaction with the relevant PLD. In fact, we can take our definition of p-expression and trigger in () to both subsume the Lightfoot/ Dresher notion of cue and to define the second factor (as it relates to syntax). So we can see very clearly that, although the notion of cue is useful, it must be kept distinct from the notion of parameter. A further point is that Lightfoot’s cue-based approach is too unconstrained: if there is no independent definition of cues, then, in his terms, we have no way of specifying the class of possible parameters, and hence the range along which languages may differ, synchronically or 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

diachronically. It is, however, possible to maintain that parameters can be independently defined and that learners also make use of cues provided by the input (this is closer to Dresher’s view); if we do this we do not run into this difficulty. So, it seems reasonable to take the view that cues, i.e. triggers as defined in (b), are provided by the input while parameters are emergent properties of the three factors in language design. Construed in this way, the Lightfoot/Dresher view is essentially the one I have been presenting here, as the close similarities in the analysis of the loss of V-to-T in English show. However, there is a difference: Lightfoot has no account for the shift in the cue. He says: this model . . . has nothing to say about why the distribution of cues should change. This may be explained by claims about language contact or socially defined speech fashions, but it is not a function of theories of grammar, acquisition or change—except under one set of circumstances, where the new distribution of cues results from an earlier grammatical shift; in that circumstance, one has a ‘chain’ of grammatical changes. One example would be the recategorization of the modal auxiliaries . . . , which resulted in the loss of V to I. (Lightfoot : )

A morphologically based approach like that sketched in §.. can, on the other hand, explain the change in the cue/p-expression in terms of morphological loss. Let us now return to that account. 3.3.3. Morphologically driven parameter-resetting It is worth making two points here. First, we observed in §.. that there was a strong p-ambiguity in the analysis of very simple positive declarative sentences like John walks in sixteenth-century English, as schematized in () (repeated from () of Chapter ): () a. John [T walks] [VP . . . (walks) . . . ] b. John T [VP walks] We saw that Lightfoot also makes this observation. (a) represents the conservative structure with V-to-T movement, and (b) the innovative structure without V-to-T movement. The proposal was that the conservative system was preferred as long as there was a morphological expression of V-to-T movement through the agreement system. This was stated in terms of the following postulate, linking agreement



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

marking on the verb to V-to-T movement (repeated from () of Chapter ): () If (finite) V is marked with person agreement in all simple tenses, this expresses a positive value for the V-to-T parameter. We then proposed that the loss of much verbal agreement, particularly plural endings in both simple tenses, led to the loss of the morphological expression of the V-to-T parameter and thus to a reanalysis of (a) as (b) with the concomitant change in the value of the V-to-T parameter. Since (a) contains an occurrence of movement, V-to-T movement, missing from (b), if movement is triggered by FFs it must have at least one more FF than (b). Hence, by Feature Economy as in (), (), it is more complex than (b). So here the crucial factor creating ambiguity and opacity in the PLD is the erosion and loss of certain endings, something I take to be a morphological property. (In fact, it is more than likely that it is ultimately phonological; see Lass : ff.) This solves the Regress Problem and gives an account of what changed in the p-expression/cue-expression, unlike Lightfoot’s analysis. The second point concerns weak p-ambiguity, as defined in (c). As a comparison of the definition of strong p-ambiguity in (b) and that of weak p-ambiguity in (c) shows, the essential difference lies in the fact that a weakly p-ambiguous string triggers neither value of a parameter. This notion can be relevant to understanding certain aspects of change in that an independent change can render a former trigger weakly p-ambiguous, i.e. render it irrelevant for triggering some value of a parameter that it triggered prior to the independent change. The loss of V-to-T movement in ENE exemplifies this. The reanalysis of the modals and do as functional elements first-merged in T had the consequence that examples like () (repeated from () of Chapter ) no longer triggered V-to-T movement, i.e. they became weakly pambiguous: () a. I may not speak. b. I do not speak. Prior to reanalysis of modals and do as T-elements, such examples provided an unambiguous trigger for a positive setting for the V-to-T parameter, in that modals and do were verbs (with plural agreement marking) which moved to T and expressed the morphological trigger for V-to-T movement. Once modals and do are first-merged in T, such 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

sentences become weakly p-ambiguous in relation to the V-to-T parameter in that they are compatible with either value of the parameter. As such, an important, and frequently occurring, kind of trigger for the positive setting of the V-to-T parameter is lost. Weak p-ambiguity may be relevant to understanding certain changes in this way. (We will discuss weak p-ambiguity more in §. and §..) Another change discussed in Chapter  illustrates further how morphological change may affect the PLD in such a way as to create ambiguity and opacity in triggers and hence abductive change. This concerns the loss of so-called recipient passives in thirteenth-century English, as discussed in §... There we schematized the crucial reanalysis as in () (() of Chapter ): () [CP Him+DAT [TP [T was] [vP v [VP helped (him+DAT)]]]] > [TP He+NOM [T[uφ] was] [vP v [VP helped (he+NOM)]]] We treated this reanalysis as directly caused by the loss of dative-case morphology, i.e. by the loss of any morphological distinction between morphological accusative and morphological dative case. This led, we proposed, to a parametric change in the nature of v, in that it henceforth had the capacity to value Case features of the object in all transitive clauses, with the consequence that all inherently Case marked non-subject arguments were lost. We tied the reanalysis in (), and the associated parametric change in the Case-licensing capacity of v, to the loss of case-marking distinctions among complements. This is directly supported by Allen’s () detailed account of the breakdown of case morphology in English, as we saw. Let us now consider the effect of the loss of dative-case morphology on the relevant PLD in more detail. Consider a variant of () with a non-pronominal argument, which therefore after the loss of dative-case morphology is fully ambiguous as to which Case-feature it bears. We also make the argument singular, so that verbal agreement cannot tell us whether it is Nominative. The string in question is thus the man was helped. One could assume that the situation is straightforward here: if there is no dative morphology there is no abstract Dative Case. However, two points militate against this simple assumption. First, Allen (: ff.) shows that direct passives (i.e. those with Nominative subjects) appear in dialects where the dative/accusative distinction still remains. Second, the relation between abstract Case 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

and concrete case is not usually one-to-one; instead, it normally has the form of a one-way implication, viz. (see Kayne (: –) on this): () If a DP has morphological dative case, then the grammar has abstract Dative Case. This is in fact the simplest statement of the relation between morphological case and abstract Case that one can postulate, assuming the existence of any kind of abstract Case at all. What this implies is that as long as there was morphological dative case there could be no ambiguity at all regarding these constructions. However, it says nothing about the situation once the case morphology has been lost. One might conclude that the string in question is weakly p-ambiguous, since it provides no unambiguous information regarding the parametric property of v. But the very fact that, thanks to the one-way implication in (), there could be an abstract DAT, shows that the structure must be strongly p-ambiguous (since the presence of DAT implies one feature make-up for v, while the absence of DAT implies that active v* has uninterpretable φ-features; see §..). The structural ambiguity is partially represented in (), bearing in mind that English was a V language at this time, and supposing that our example is a main clause:18 () [CP The man+DAT/NOM [C was] [TP [T[uφ] (was)] [vP v [VP helped (the man+DAT/NOM)]]]] The structure in () shows that the clause is a CP, and that the man, which we are taking to be ambiguously Nominative or Dative, occupies SpecCP. What is left unclear in () is the nature of SpecTP and the way in which T’s uninterpretable φ-features are eliminated where the man is DAT. It is usually assumed that SpecTP must be filled, and whatever fills this position must be able to Agree with some feature in T, i.e. that T has an EPP feature (see §..). Where the man is NOM, we can assume that it moves through the SpecTP position on its way to SpecCP, thereby satisfying the EPP. And of course, the man’s interpretable φ-features Agree with T’s uninterpretable φ-features and so these uninterpretable features are able to be eliminated from the derivation along with the man’s NOM-feature. On the other hand, where See note  of Chapter  on the reason for assuming that these clauses are V-clauses, i.e. CPs. 18



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

the man is DAT, it has no feature which can Agree with T. (This is clear where we have a plural argument, as there is no agreement in number between a dative argument and the verb, i.e. between a DAT DP and T’s φ-features; see Allen (: ff., ff.) for discussion.) So, if the man is DAT, it is unable to move through SpecTP on its way to SpecCP as it cannot Agree with any feature of T.19 Therefore the EPP must be satisfied in some other way in this situation. There are two options as to what can fill SpecTP. On the one hand, we can postulate an expletive null subject (i.e. pro), an element which can also bear NOM and thus Agree with T (see §.. on expletive, i.e. non-referential, pro). The other option is to assume roll-up movement, in the sense introduced in §.., of either vP or VP into SpecTP (see examples (‒) in Chapter ). In order for rollup to satisfy the requirement that the element in SpecTP Agree with some feature of T, we have to assume that the moved category may contain the element which Agrees with T. In the present case, we may assume, following Baker, Johnson, and Roberts (), that the passive marker itself is a nominal element capable of bearing a Case feature which can Agree with T. (This idea is updated in the context of rollup by Richards and Biberauer .) So in fact our example in () really manifests a strong p-ambiguity between (a) and either (b) or (b0 ) (this is a more elaborate version of the reanalysis given in () above and in () of Chapter ): () a. [CP The man+NOM [C was] [TP (the man+NOM) [T[uφ] (was)] [vP v [VP helped (the man+NOM)]]]] b. [CP The man+DAT [C was] [TP pro+NOM [T[uφ] (was)] [vP v [VP helped (the man+DAT)]]]] 0 b . [CP The man+DAT [C was] [TP [vP v [VP helped+NOM (the man+DAT)]] [T[uφ] (was)] (vP)]] The structure in (a) is quite unproblematic: the man Agrees for Nominative with T, and moves through SpecTP on its way to SpecCP, satisfying the requirement for an element in SpecTP, which Agrees with T. In (b), the man bears the interpretable DAT feature which does 19 Chomsky (: ) suggests that the very similar dative subjects in Icelandic might Agree for a Person feature with T, but, as mentioned in note  of Chapter , there is no evidence that preposed datives were subjects in passives in OE. It thus seems correct to take the DAT DP to occupy SpecCP and to have not moved through SpecTP. So, at least in OE and ME, there is no Agree for person features in this construction.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE CHANGING TRIGGER

not need to Agree with T. It moves in one step to SpecCP. SpecTP is filled by expletive pro, which is NOM and so Agrees with T. In (b0 ), the man behaves exactly as in (b). However, vP raises to SpecTP and the passive marker on helped, which is contained in vP, Agrees with T. The ambiguity in () is created by the loss of dative morphology. Given (), where there is dative morphology, one of the options (b) or (b0 ) is the only one. Once dative case is lost, (a) becomes available. (a) involves two movements of the man+NOM: first to SpecTP and then to SpecCP. (b) involves just one movement of the man+DAT to SpecCP, and insertion of expletive pro in SpecTP. (b) therefore appears to be less complex than (a). We could attempt to introduce some further complexity cost associated with the postulation of expletive pro, but probably the best course of action is to assume that expletive pro is not relevant here and that instead the correct representation of the Dative option involves a structure with rollup like (b0 ). Since this structure involves copying of all the vP-internal material, along with the FFs (EPP features, see §..), this structure will be more complex than (a). In that case, the definition of economy in (), () gives the right results. As a consequence of this reanalysis and the associated parametric change, v’s feature make-up was changed as described in §.. with the consequences outlined there. Without the loss of dative-case morphology, Inertia ensures that the structure remained as (b0 ) (not (b) if we rule out expletive pro). A final illustration of the role of morphological change causing the PLD to change comes from our discussion of complementation in Latin and Romance in §.. Among various other changes discussed there, we suggested that Latin infinitives had a null C which was able to Agree for Accusative with an argument in the subordinate clause. This gives rise to the accusative + infinitive construction in Latin (or, more precisely, one variant of it—see the discussion in §. and below), as in () (repeated from (a) of Chapter ): () Dico te venisse. I-say you- come-- ‘I say that you have come.’ I tentatively proposed that Latin non-finite null C lost this capacity when the tense/aspect forms of the infinitive such as perfect venisse were lost. In line with () and () above, let us state this as a one-way implicational statement, as follows: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

() If C[-finite] probes Accusative DPs, then it licenses inflectional distinctions marking tense/aspect. This means that after loss of forms such as venisse, the unmarked (formerly present) form of the infinitive was no longer associated with an Accusative-licensing C. Nevertheless, examples like (), with an Accusative subject of the complement clause and the unmarked form of the infinitive, would have been possible: () Dico te venire. I-say you- to-come We noted in §. that this construction was ambiguous in Classical Latin in that the Accusative subject of the infinitive te could be licensed either by the infinitival C, or by the main-clause v*. (The evidence for this comes from the two attested passive constructions in (a) and () of Chapter .) After the loss of the tense/aspect forms of the infinitives, examples like () became less ambiguous than previously, in that they became unambiguously English-style accusative + infinitives, with the Accusative subject agreeing with v* in the superordinate clause. This possibility was, however, then ruled out by a change in the value of Parameter G, which, following Ledgeway (), was probably a subcase of a deeper mesoparametric change in possible Agree configurations shown in () of Chapter . So the loss of morphology played a role in eliminating the accusative + infinitive construction in the development from Latin to Romance, although in this instance by eliminating an ambiguity rather than creating one. 3.3.4. Conclusion Here we have tried to apply the solution to the logical problem of language change proposed in the previous section to actual cases of change discussed in the earlier chapters. If contact, cues, or morphology cause changes in PLD, then p-ambiguity and consequent opacity of one representation result, leading to abductive reanalysis and associated parametric change. We also saw that there is often an implicational relation between a morphological trigger and a parameter value. We will return to this last point briefly in the next section.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

The main question that has been begged throughout the discussion is that of the definition of opacity. This, along with the closely related notion of markedness, is the subject of the next section.

3.4. Markedness and complexity The purpose of this section is to connect economy as defined in (), () above to markedness, and thereby arrive at a basis for defining the marked and unmarked values of parameters with a view to formulating parameter hierarchies along the lines of () and () of Chapter . Here I will discuss an approach to determining in general the marked and unmarked values of parameters which correlates marked parameter values to the relative opacity or complexity of representations or derivations. The idea is that marked settings are associated with opaque, that is relatively complex, constructions. 3.4.1. The concept of markedness The concept of markedness originated in Prague School phonology, apparently with Trubetzkoy, and was taken up by Jakobson (). The history of the concept is described in detail in Battistella (: ff.). The basic idea can be stated as follows: given a binary opposition, the two terms of the opposition may stand in a symmetric relation or in an asymmetric one. In the former case, we say that the terms are equipollent; in the latter, we refer to one term as the marked one and the other as the unmarked one. The asymmetry lies in how the absence of specification of the terms is interpreted: the absence of the marked term implies the unmarked term, but the absence of the unmarked term does not on its own imply the marked term. In other words, all other things being equal we assume the presence of the unmarked term; the presence of the marked term requires something special, however, i.e. some kind of ‘mark’. This asymmetric formulation can be maintained independently of the nature of the terms involved, the nature of the asymmetry, or the correlates of the asymmetry in some other domain. It is often supposed that the marked term is associated with relatively greater complexity; this is arguably inherent in the idea of it requiring an extra ‘mark’. For example, Cinque (: f.) proposes a series of markedness conventions for the features associated with various 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

functional heads in his analysis of clause structure; see especially his table ., p. . He states that marked features are ‘more restricted [in] application, . . . less frequent, conceptually more complex, expressed by overt morphology’ (), while unmarked features are in each case the opposite. We can illustrate the essential asymmetry that characterizes markedness relations with phonological distinctive features. A phonological opposition, for example that of voicing, can be thought of as an equipollent opposition between [+Voice] and [Voice] or an asymmetric opposition between [mVoice] and [uVoice]. (Here and below, ‘u’ before the name of a feature means ‘unmarked’, not ‘uninterpretable’ as in the specifications of formal syntactic features in earlier sections; ‘m’ means ‘marked’.) Chomsky and Halle (: ch. ) discuss markedness in relation to the phonological distinctive-feature system they propose; the approach to markedness and complexity adopted here is largely inspired by their discussion. Where the opposition is equipollent, if a segment is not [+Voice] then it is [Voice] and viceversa. But where the opposition involves a markedness asymmetry, markedness conventions and perhaps other statements are required to determine the value of the coefficient of a feature (Chomsky and Halle : ff.). Moreover, there need not be a straightforward relation between the u/m values and the +/ values; for example, Chomsky and Halle (: ) proposed that [uVoice] is [Voice] if the segment is [sonorant], but [+Voice] if it is [+sonorant]. Marking conventions of the type first put forward by Chomsky and Halle imply that the underspecified feature [Voice] can ‘default’ to a given value under various circumstances, something impossible in the case of the equipollent +/ opposition.20 In turn, this leads to the possibility of an ‘elsewhere convention’, a notion going back to the Sanskrit grammarians of Indian antiquity (see Kiparsky ): in the absence of specification, the unmarked feature value is assumed, while a marked value requires positive specification, and therefore a longer description; the elsewhere convention can be seen as the linguistic case of a general notion of default and as such a third factor. For Chomsky and Halle, the markedness asymmetry 20 The idea that there is no single unmarked value for a feature, but that this may depend on other features, represents an important difference between Chomsky and Halle’s approach and the Prague School approach to markedness.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

relates to the evaluation metric they propose for determining the relative simplicity of rule systems: ‘the unmarked value of a feature was cost-free with respect to the evaluation metric, while the marked values were counted by the metric’ (Battistella : ). The correlates of the asymmetry were stated by the marking conventions (Chomsky and Halle : – propose thirty-nine of these), which are intended to capture aspects of the intrinsic content of distinctive features. The correlates of markedness in the distinctive-feature system include: cross-linguistic frequency of unmarked terms (all languages have voiceless obstruents, but not all have voiced ones: note how in implicational universals, the marked value of an opposition entails the unmarked one (see Croft  for discussion)); unmarked terms appear earlier than marked ones in language acquisition and are lost later in language deficits (this was first proposed by Jakobson ); and the fact that unmarked values emerge under neutralization in certain positions, for example, the coda of a syllable or the end of a word (for example, final-obstruent devoicing is cross-linguistically very common, while obstruent voicing in this context is relatively rare). Kenstowicz (: –) discusses these points in more detail in relation to phonology. To summarize, the notion of markedness that has developed in phonological theory involves three central ideas: (i) the unmarked value of an opposition is underspecified, i.e. no specification implies the unmarked specification; (ii) the unmarked value is the ‘elsewhere’ case; (iii) the unmarked value has the shorter description. This last notion relates to a simplicity criterion often employed in computer science, the idea of minimum description length (see Berwick : ff. for an application of this idea to language acquisition). 3.4.2. Markedness and parameters Since we take parameters to have binary values (see §. for general discussion, and note that all the examples of parameters we have discussed have been formulated in a strictly binary fashion), we can in principle apply markedness logic to the opposition between these values; in other words, we can treat the binary opposition between the two values of a parameter as an asymmetric one in the sense described above. This has in fact been suggested in various places ever since the earliest formulations of principles-and-parameters theory (see 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

Chomsky : , and the discussion in Battistella : ff.).21 If we do this, we have to answer three questions: (i) what is the nature of the features involved in the asymmetric relation? (ii) what is the nature of the asymmetry? (iii) what are the correlates of the asymmetry in other domains? Regarding question (i), it suffices for now to simply treat the features in question as the values of a parameter, which, following the BorerChomsky Conjecture (see §.., () and §.., ()), involve the properties of FFs of functional heads; giving a fuller answer requires a proper statement of the form of parameters, something we have yet to do. In the next section, I will give a general characterization of parameter hierarchies which will provide a more substantive and nuanced answer to this question. We could answer question (ii) in terms of the definition of Feature Economy given in (), which we repeat for convenience: () Given two structural representations R and R0 for a substring of input text S, R is simpler than R0 if R contains fewer formal features than R0 . In terms of (), the nature of the asymmetry between the parameter values lies in the complexity of the structures generated by the grammars determined by the different values. The unmarked value of a parameter determines a grammar which generates simpler structures than those generated by the marked value. In the next section, we will consider the role of the other principal third factor of relevance to parameters, Input Generalization (see §., (ii)). The interaction of Feature Economy and Input Generalization gives rise to parameter hierarchies, which automatically express markedness relations, as we will see.

21

Chomsky’s discussions of markedness here and in Knowledge of Language (), make use of the distinction between the ‘core grammar’ and the ‘periphery’ in various ways. I am not assuming this distinction here, as seems to be more in line with Chomsky’s assumptions in his more recent work on the Minimalist Programme, where this distinction should no longer play a role. (This is what I take to be the implication of Chomsky’s remark that it ‘should be regarded as an expository device, reflecting a level of understanding that should be superseded as clarification of the nature of linguistic inquiry advances’ (Chomsky : , note )). As the text discussion will make clear, I follow Chomsky’s thinking in taking markedness to impose a preference structure on parameters for the language acquirer.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

Question (iii) brings us back to the issue of most concern here. At least some of the correlates of the markedness asymmetry between parameter values lie in syntactic change: since abductive reanalysis and parametric change arise through p-ambiguity and opacity/complexity of the trigger, with the less complex structure being preferred, then—all other things being equal—parametric change will be in the direction of unmarked values.22 We expect to find correlates in language acquisition (for example, marked values being harder to acquire and hence acquired later) and typology (marked values being less crosslinguistically frequent). We discussed language acquisition in §. and saw that it is difficult, although in principle certainly not impossible, to observe the postulated connection between acquisition and change. I will come back to the relationship between markedness and typology in the next section. Feature Economy as in () comes remarkably close to the notion of ‘value’ put forward by Chomsky and Halle (: ): ‘The ‘value’ of a sequence of rules is the reciprocal of the number of symbols in its minimal representation’. Taking the relevant symbols to be FFs, the definition in () would make it possible to value syntactic derivations just as Chomsky and Halle propose valuing phonological derivations. We did essentially this in our discussion of the role of complexity/ opacity in abductive reanalysis in the previous section. A further point arises from this. Roberts and Roussou (: –) derive a series of markedness hierarchies from the definition of complexity in (). Here I give a simplified version of their hierarchy: () Move > Agree > neither Here ‘>’ means ‘is more marked than’. So a category set to a parameter value which requires movement is more marked than one which merely causes an Agree relation, which is in turn more complex than one which has neither property. This follows straightforwardly from the feature-counting idea: in order to give rise to movement, a category must have both uninterpretable FFs and the movement-triggering (EPP) feature. In order to give rise to an Agree relation, a category need only have an uninterpretable FF. Finally, in order to trigger 22 Of course, we do not want only this kind of change to be possible: change from unmarked to marked must be allowed somehow. This point will be dealt with in the next section.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

neither operation, a category should lack both EPP and uninterpretable FFs.23 This approach is developed in some detail in Roberts and Roussou (: ch. ), and we will consider some of its implications below.24 For a fuller picture of how parameters interact, we need to bring IG into the picture; we turn to this in §.. 3.4.3. The Subset Principle Other approaches to determining the marked and unmarked values of parameters have been put forward. One important and influential proposal was the Subset Principle of Berwick (: ‒, ff.). This states that ‘[t]he learning function maps the input data to that value of a parameter which generates a language: (a) compatible with the input data; and (b) smallest among the languages compatible with the input data’ (Manzini and Wexler : , cited in Roussou : ). The interest of the Subset Principle is that it relates to an important and fairly well-established aspect of language acquisition: the fact that language acquirers do not have access to negative evidence. In other words, language acquirers are not presented with ungrammatical sentences which are marked as such. As Guasti (: ) puts it, ‘negative evidence is not provided to all children on all occasions, is generally noisy, and is not sufficient . . . Children have the best chance to succeed in acquiring language by relying on positive evidence [emphasis omitted—IGR], the utterances they hear/see around them—a resource that is abundantly available’. (We touched on the absence of negative evidence in our discussion of the poverty of the stimulus in Chapter .) 23 One might wonder whether a category with three uninterpretable features is more marked than one with one uninterpretable feature and an EPP feature. This is predicted by (), but is not consistent with the hierarchy in (). () should be understood as holding in relation to a given feature: in that case, Move-F will always be more complex, and therefore more marked, than Agree-F, for any F, as Move requires the EPP feature in addition to F. Chomsky () introduces the possibility of Move occurring independently of Agree, possibly triggered by a further kind of feature known as an Edge Feature (EF). EFtriggered movement is characteristic of wh-movement, topicalization, and focalization, movements which typically target the ‘left periphery’ of the clause. It is possible, and would follow from (), that this type of movement is less marked than that triggered by EPP where Agree is involved. 24 We saw in §. that van Gelderen (, , , ) develops broadly similar ideas using a slightly different notion of economy. Her focus is on grammaticalization cycles of the kind illustrated by Jespersen’s Cycle; see §. (). In §.., I will propose a somewhat different notion of cyclic change in terms of parameter hierarchies.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

Because they only have access to positive evidence, acquirers are, other things being equal, at risk of falling into a ‘superset trap’. This can happen if acquirers posit a grammar which generates a language which is a superset of the language generated by the actual grammar in their environment, in the sense that it contains no examples that are incompatible with the PLD to which the children are exposed, but it generates examples that are incompatible with the target grammar. (This situation would correspond to the schema for abductive change in () of §. above, with G a subset of G.) If children only have access to positive evidence, they will never hear any example which causes them to ‘retreat’ from the superset grammar. Thus they may posit a grammar which is incompatible with the target, and recall that it is a standard assumption in work on language acquisition that this does not happen; this is what underlies the empirical force of the Inertia Principle—see the discussion in §. above. In order to rule out the risk of superset traps, the Subset Principle is proposed as a condition on language acquisition. The Subset Principle, as just given in the quotation from Manzini and Wexler (), forces children to hypothesize the grammar which generates the smallest language compatible with the trigger experience. In this way, it is argued, they do not run the risk of falling into superset traps. The notion of markedness which derives from this then is that marked parameter values will generate bigger languages. The contrast between CNSLs like Italian and NNSLs like English may serve as a (slightly artificial) example (Parameter A of §..). As we saw in §.., CNSLs allow a definite, referential pronoun subject to be dropped in finite clauses, while NNSLs do not: () a. Parla italiano. b. *Speaks Italian.

[Italian]

On the other hand, CNSLs typically allow the pronominal subject to be expressed, just like NNSLs: () a. Lui parla italiano. b. He speaks Italian.

[Italian]

(As we saw in connection with examples ()‒() of Chapter , there are interpretative differences between CNSLS and NNSLs where subject pronouns are expressed in the former; I will gloss over these for the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

purposes of illustration of the Subset Principle, however.) Thus the grammar of Italian generates a larger set of strings than that of English. In other words, NNSLs are a subset of null-subject languages. This can be illustrated as follows: ()

Parla italiano

Lui parla italiano/he speaks Italian

The Subset Principle therefore implies that the positive setting of the null-subject parameter is more marked than the negative setting. One empirical problem that one could raise here is the evidence of early null subjects in L acquisition of NNSLs. However, as we saw in §.., this phenomenon probably is not attributable to a ‘missetting’ of the null-subject parameter at an early stage of language acquisition, and so the objection does not hold. A much more serious problem with the above line of reasoning emerges if we consider the various parameters we have put forward in our discussion: verb-movement parameters (both V-to-T and V), the parameters relating to negation and whmovement, and word-order parameters all define intersecting grammars. That is, in each case, one setting of the parameter allows one type of structure S and disallows another type S0 , while the other setting allows S0 and disallows S. This is clearest in the case of word-order parameters: one setting of parameter F, for example, allows VO and at the same time disallows OV, while the other setting has just the opposite effect. The intersection relation can be illustrated as follows: ()

John Mary loves G1 (OV)

John walks



John loves Mary G2 (VO)

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

The material in the intersection is weakly p-ambiguous in relation to the parameter P in question in terms of the definition in (c), while the material in the complement of the intersection expresses the value of P. The same exercise could be repeated for the verb-movement parameters, the negative-concord parameter, and the wh-movement parameter. In fact, it can also be repeated for parameter A, to the extent that it is true that CNSLs do not have overt expletives (see §..): ()

Parla italiano. John speaks Italian./ E’ arrivato un uomo. Gianni parla italiano.

There arrived a man.

As Battistella (: ) points out: ‘[i]f markedness relations obtain between parameters that are not in a subset relation, they must be accounted for in some other way’. The above considerations seem to indicate that the Subset Principle is not a useful way of predicting markedness relations in general among parameters, since most—if not all—parameters define intersection relations of the kind seen in () and (). A further issue arises if we look at the Subset Principle in the diachronic domain. If NNSLs are subsets of CNSLs, then we expect a diachronic preference for change from the latter to the former. We saw in §. that there appears to be a tendency for change from to CNSL to PNSL to NNSL (see in particular §..), seen both in the history of Germanic and in the history of French. Assuming PNSLs are subsets of CNSLs and supersets of NNSLs, which the distribution of null subjects seems to indicate (although the distribution of overt subjects and the indefinite interpretations of null subjects in PNSLs may be problematic for this assumption; see §..), then this is consistent with what the Subset Principle predicts. However, we also saw at least certain varieties of contemporary French may be reintroducing consistent null subjects by reanalysing subject clitics as agreement markers; such a change goes against the predictions of the Subset Principle. We will return to these changes in §.. in the light of parameter hierarchies and other third factors. So we conclude, rather reluctantly, that the Subset Principle is not useful in providing the basis for determining the markedness of parameter values in cases like the above. The reluctance is due to the fact that the Subset Principle has the great conceptual merit of being firmly 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

grounded in an important fact about acquisition: that children do not have access to negative evidence. It is also a good candidate as a third factor, since it represents a general constraint on hypothesis formation. However, there is one area where the Subset Principle may be useful. This lies in distinguishing between a grammar which allows genuine formal optionality and one which does not. Abstractly, a case of this type would be where G allows an alternation between two constructions C and C while G, thanks to a different parameter-setting, does not. An example might be the difference between English and (Standard) French regarding preposition-stranding or pied-piping. (This was discussed in Chapter , Box ., and §...) English has the option of prepositionstranding or pied-piping, while French only allows the latter. The French situation is illustrated by () above, which we repeat here: () a. Who did you speak to __? b. To whom did you speak __? () a. *Qui as-tu parlé à __? b. A qui as-tu parlé __? The fact that English allows both options, while French only allows one of them means that the French parameter-settings generate a language which is a subset of the English one. We could, therefore, regard English as marked in relation to French in this respect. Of course, (b) is characteristic of a relatively ‘high’ register as compared to the more colloquial (a), but for the purpose of this illustration of the logic of the Subset Principle I abstract away from this; we will come back to the concept of the ‘social value’ of variants in §.. Biberauer and Roberts () develop this idea in relation to certain changes in the history of English. One case they discuss is the change in object movement in Late ME. We saw in §.. that several word-order parameters changed in ME. One of these was Parameter F, controlling whether VP rolls up around T. Combined with object-rollup around V (Parameter F), the rollup option here gives rise to general O>V>Aux order.25 After the loss of VP-rollup, movement of definite objects was optional until c. (van der Wurff , ). Around , this was lost, while negative and quantified DPs continued to 25

Recall that the FOFC requires that O precedes V in VP if VP precedes T in TP; see §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

move (see §..). So the earlier ME grammar allowed the orders in (), while the post- one only allowed those in (): () a. O[def] V b. V O[def] () a. O[-def] V b. V O[def] The grammar which generates the orders in () is clearly favoured by the Subset Principle. Biberauer and Roberts argue that this change was caused by certain ambiguities in V and Verb Projection Raising structures which placed defocused shifted definite objects in stringfinal position, obscuring the possibility of (a); the Subset Principle then favoured the more restrictive grammar underlying (). Quantified and negative objects continued to undergo operator-movement to a position to the left of the ‘core’ VP, presumably because they were readily identifiable as operators. Biberauer and Roberts refer to this kind of change as ‘restriction of function’, whereby in one system a given operation applies more freely than in another. The more restrictive grammar then produces a subset of the grammatical strings of the more liberal one. In the case of fifteenth-century English, given that OV order was an option with non-negative, non-quantified DPs in the earlier stage (i.e. ME from roughly  to ), we have a situation where the fifteenth-century grammar only allowed OV for a particular class of objects and required VO elsewhere, while the earlier grammar allowed OV with any kind of object. Thus object-movement was restricted in function. There seems to be a tension between concepts of markedness based on the Subset Principle and those based on feature-counting, since the more restrictive grammar, favoured by the Subset Principle, requires more features than the less restrictive one. In terms of parameter hierarchies of the kind seen in A-F of §., however, we can think of Feature Economy and the Subset Principle as distinct third factors which can lead to different directions of change: Feature Economy can favour a parametric change in the ‘upward’ direction in a hierarchy, since each ‘lower’ option is associated with more features than all ‘higher’ ones (cf. the discussion of the Tolerance Principle in relation to Feature Economy in §..). The Subset Principle, on the other hand, favours the postulation of more restrictive, i.e. ‘lower’ options. Since, all 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

else equal, we want changes to be possible in either direction on a hierarchy, this is a good result. What remains to be clarified, however, is what causes one third factor or the other to play the dominant causal role. Presumably, ambiguous p-expression of a featural distinction in the PLD will prevent distinctions being made and therefore cause movement down the hierarchy to halt; the Tolerance Principle will not apply, and the Subset Principle is not activated as the ‘lower’ candidate option is simply not considered. We will return to these questions in §.., where we will see that the role of Feature Economy is more complex than indicated here. We see then that the Subset Principle has a major conceptual advantage, being based on what appears to be an important fact about language acquisition: namely, that language acquirers do not make use of negative evidence. Its actual application to parametric systems may be somewhat restricted, since so many parameters appear to define languages in intersection, rather than inclusion, relations. However, it may play a role in relation to true formal optionality, in predicting that such systems would be marked, and it may play a role in accounting for the diachronic phenomenon of restriction of function, which is a case of ‘moving down’ a parameter hierarchy. 3.4.4. Markedness and core grammar Another proposal, which was not intended to form the basis of a general account of the markedness of parameter values, was made by Hyams (: ff.). She took the view that markedness was a feature only of the ‘periphery’ of the grammar (in the sense briefly discussed in note  above). The null-subject parameter, however, is a property of core grammar, and so the question of the markedness of its settings does not arise. Nevertheless, on the basis of her observation of early null subjects in the production of children acquiring NNSLs, Hyams argues that the null-subject value is the more accessible value (Hyams : –) since null subjects do not require the costly process of lexicalization of pronouns (). Hence children acquiring English begin with the assumption that it is a null-subject language, and the parameter is reset to the negative value during the course of language acquisition (on the basis of the evidence from modals and overt expletive subjects, as mentioned in §.). One could imagine that this would favour a tendency in language change in the direction of null subjects, but, as 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

we saw in §.., the general view now held amongst researchers on L acquisition is that early null subjects do not reflect a ‘missetting’ of the null-subject parameter, but rather some property of immature competence. For this reason we leave this proposal aside. 3.4.5. Markedness and inflectional morphology At this point it is justifiable to ask what the advantages of an analysis of parameter values into marked and unmarked values might be. Aside from connecting syntactic change to the form of parameters, as we have done, one independent point has to do with the nature of language acquisition. Lasnik () observes that there is an intrinsic connection between markedness in L acquisition and the question of indirect negative evidence. The notion of indirect negative evidence is discussed by Chomsky (: –): although, as we have suggested and as is the standard view amongst L‒acquisition researchers, children do not have direct negative evidence in the sense of not having access to the information that a given structure or string is ungrammatical, Chomsky suggests that indirect negative evidence may nevertheless be available in the case where some feature is expected by acquirers but is not actually found in the PLD. In such a situation, the lack of the ‘expected’ feature may be a kind of evidence: indirect negative evidence. A sufficiently clear and robust characterization of the markedness of parameter-settings may provide a form of indirect negative evidence. For example, if the marked value of a parameter associated with a given feature is that associated with movement, then if there is no evidence for movement in the PLD, the acquirer has indirect evidence that the marked value of the parameter in question does not hold. In other words, evidence for marked features requires direct positive evidence, and indirect negative evidence that the positive setting does not hold arises simply when the direct positive evidence is not available. That this is the case follows from our basic characterization of the asymmetric nature of markedness relations: the presence of the marked feature must be in some way signalled. So if there is no evidence for a marked parameter value, it is not assumed. This in itself is a form of indirect negative evidence. Absence of evidence can thus play a role in parameter-setting. If absence of evidence is the ‘elsewhere’ case, then we see a connection between the elsewhere condition, indirect negative evidence and markedness: the elsewhere case is the unmarked case. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

This conclusion is consistent with Chomsky and Halle (); see §... Seen in this light, the elsewhere condition is a general principle of computational conservatism that applies to language acquisition, and as such is a good candidate for a third factor. Feature Economy is a purely formal notion of markedness. As such, it differs from other approaches which have tried to relate markedness to substantive universals, either directly or indirectly. Chomsky and Halle’s () marking conventions relate the purely formal, featurecounting evaluation metric they propose to substantive phonetic and phonological universals. We also saw in §.. that Cinque (: ) proposes a series of markedness conventions for the features associated with various functional heads in his analysis of clause structure. His postulations of marked and unmarked values are based on familiar Jakobsonian criteria. For example, Cinque assumes that the unmarked value of his postulated MoodSpeech Act category is ‘declarative’, while the marked value is ‘-declarative’; the unmarked value of Modepistemic is ‘direct evidence’ and the marked value is ‘-direct evidence’, the idea being that in each case the unmarked value is inherently simpler than the marked one. How do Cinque’s proposals regarding markedness relate to the proposal made above regarding the relation between complexity and markedness? The two notions are quite distinct, in several important respects. The fundamental difference between the two is that Cinque’s proposals regarding markedness relate to the substantive content of features of functional heads, ultimately their notional semantic properties, while what was sketched in §.. is a purely formal, featurecounting notion. We therefore might want to keep the two kinds of markedness distinct. We could call the complexity-based notion of markedness discussed above formal markedness and Cinque’s notion substantive markedness. (This distinction is proposed in Roberts and Roussou (: ), although on slightly different grounds.) We saw earlier that Chomsky and Halle (: ch. ) link formal markedness (their feature-counting evaluation metric) to substantive markedness by means of markedness conventions. We might want to contemplate a similar move in the present context. One reason for this is that, as we saw above, Cinque proposes as one criterion of markedness a greater likelihood of morphological expression. This connects to the postulates introduced in the previous section regarding the morphological expression of certain parameter values. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

Let us repeat those statements here: () If (finite) V is marked with person agreement in all simple tenses, this expresses a positive value for the V-to-T parameter. () If a DP has morphological dative case, then the grammar has abstract Dative Case. () If C[-finite] probes Accusative DPs, then it licenses inflectional distinctions marking tense/aspect. We can note that, directly in the case of () and indirectly (by means of the marked way of realizing SpecTP in the absence of a Nominative DP) in the case of () and (), the realization of a morphological feature implies the marked value of the parameter. This suggests that the following general template might hold for the relationship between morphological expression of a parameter and the markedness of that parameter:26 () If a formal feature of a head H is inflectionally expressed, then H is associated with a marked parameter value. Although rather vague as it stands, something like () could serve as a marking convention linking overt inflectional morphology with the marked values of syntactic parameters, as well as providing a clear general statement of the kind of morphological triggers (or cues, in Lightfoot’s  terminology) that are relevant in acquisition and change. It also predicts that the loss of inflectional morphology, at least for certain types of inflection, may perturb the PLD in such a way as to lead to abductive change along the lines we saw in the previous section. In the next section, we will propose a further marking convention related to cross-categorial harmony in word-order patterns and word-order change. 3.4.6. Markedness, directionality, and uniformitarianism One final very general point regarding markedness concerns the concept of uniformitarianism. We briefly mentioned this concept in §. in our discussion of diachronic aspects of subordination. This idea 26

() is deliberately vague in formulation and no doubt requires a great deal of refinement.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

is formulated by Croft (: ) as follows: ‘[t]he languages of the past . . . are not different in nature from those of the present’. In terms of the parameter-based approach to syntactic change, we can take this to mean that all languages at all times (in the history of our species) reflect the same basic language faculty (both broad and narrow in the sense of Hauser, Chomsky, and Fitch ) and therefore the same set of parametric options, and that those parametric options have the same markedness properties.27 Stated as above, the uniformitarian hypothesis seems very plausible. In fact, one can argue that it is a precondition for applying the UG-based approach to diachronic questions (see Roberts : ).28 However, one question we can raise has to do with the transition from unmarked to marked parameter values, an issue we have postponed until the next section. We clearly want to allow for the transition to marked parameter values, although we have not yet seen how this may be possible in terms of the general approach outlined above. If we do not allow for the innovation of marked values, then two highly problematic issues arise. First, we predict that all languages are tending towards a steady state, from which they will not be able to escape, where all parameters are fixed to unmarked values. Second, we have to explain where the marked parameter values currently observable in the world’s languages came from. In order to avoid these consequences it is necessary to have a mechanism for the innovation of marked parameter values. 27 Roberts (a: ‒) suggests that the view of parameters as emergent properties of the three factors of language design, as espoused here, may lead to a different view in that ‘what is a possible language has not changed and cannot change in diachrony . . . , but the array of probable languages may be different today from what it was in . . . human linguistic (pre)history from at least kya to roughly kya’ (). This somewhat nuanced view of uniformitarianism does not materially alter the main point to be made here, although it does allow for the diminution of the range of parametric options actually instantiated by the world’s languages over very long periods. Nichols’ () conclusions, to be discussed directly, may be consistent with this view. 28 Or indeed any kind of historical linguistics. Interestingly, for most of the history of linguistic thought in the West, uniformitarianism was not assumed, in that it was thought that the three languages of the Holy Scriptures, Latin, Greek, and Hebrew, were not subject to change or decay. See the discussion of Dante’s De vulgari eloquentia in Law (: , ). Clearly, the assumption that Latin and Greek could change was necessary for comparative Indo-European philology to be possible, although the Renaissance recognition of the changeability of Latin did not give rise to the postulation of the Indo-European family (see Law (: ff.) and Simone (: ) on this). See Roberts (a: ‒) and the references given there for discussion.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

MARKEDNESS AND COMPLEXITY

The question of uniformitarianism arises here, in that if every language were in the maximally steady state we would have a violation of a strong version of this thesis. However, a weaker interpretation would allow for the idea that change from marked to unmarked is more regular and frequent than change from unmarked to marked. This would entail that the set of languages in the world would gradually change towards ever less marked systems. On this view, UG and the available parameters do not change, and so uniformitarianism is not violated, but at the same time the range of different sets of options actually instantiated in the world’s languages steadily diminishes. In other words, if we think of the set of parameters as defining an abstract space (perhaps a ‘state-space’ in the terminology of dynamical systems—see §..) within which grammars can exist, a general move towards more and more unmarked systems implies that ever smaller pockets of the available space are occupied by actually existing systems. Something like this is certainly possible in principle; whether it is actually happening is an empirical question, albeit a rather difficult one to answer with any certainty. At first sight, there appears to be some evidence for something like this from typological studies: Nichols (: –), for example, observes that the overall level of structural diversity in (some aspects of) grammatical systems is lower in the Old World than in the New World and the Pacific. She points out that ‘[t]he high diversity there [in the New World and the Pacific—IGR] can be regarded as a peripheral conservatism in dialect-geographical terms; these areas, secondarily settled, are far enough from the Old World centres of early spread to have escaped the developments that have lowered genetic density and structural diversity in the Old World’ () (but see Campbell and Poser : ‒ for a critique of Nichols’ methods and conclusions). However, determining whether there is a global tendency for reduction in diversity requires knowledge of change at very great time depths, greater than the maximum of ,–, years which the traditional methods of comparative and internal reconstruction seem to allow. Nichols () addresses this very question, and in fact concludes that ‘today’s linguistic universals are the linguistic universals of the early prehistory of language’ (Nichols : ). This conclusion strongly favours the uniformitarian view, and the concomitant view that the world’s languages are no less evenly spread among the options made available by UG than they were in prehistory. As Nichols states ‘[t]he only thing that has demonstrably changed is the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

geographical distribution of diversity’ (). We thus clearly need a mechanism for introducing marked parameter values, as there seems to be an overall equilibrium over time in the grammatical systems attested, as far as we can tell; I will return to this point in the next section. 3.4.7. Conclusion In this section, we introduced the concept of markedness and applied it to parameter values, in terms of the definition of complexity given in (), (). We also looked at other approaches to the markedness of parameter values, notably the Subset Principle. Further, we briefly considered the relationship between markedness and indirect negative evidence, as defined by Chomsky (), and Cinque’s () proposals for substantive markedness values associated with functional heads. We considered the relationship between inflectional morphology and syntactic markedness, tentatively suggesting the correlation in (). Last, the possibility that the world’s languages are tending towards ever more unmarked systems was considered and rejected, following Nichols’ () conclusions. In the next section, I will conclude the general discussion of parameter-setting which has been the theme of this chapter by developing in full the notion of parameter hierarchy that we have touched on several times already (see in particular §.) and considering some of its consequences for syntactic change. This will lead to a somewhat different view of the relation between markedness and economy from that described here.

3.5. Parameter-setting, parameter hierarchies, and syntactic change In this section, I will synthesize the discussion in the preceding sections, by proposing a general format for parameters and parameter hierarchies and looking at the consequences of such an approach for language acquisition and change. The goal of this exercise is to give a clear view of the issues involved, and to bring together the strands of the discussion in this chapter and the preceding chapters, rather than to make new theoretical proposals. The conclusions we reach here will also form the basis of the discussion in much of the remaining two chapters. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

Accordingly, we first present a general introduction to the idea of parameter hierarchies, in particular as it is presented in Roberts (a, ch. ). We then flesh out further the discussion of markedness from the previous section, considering in particular the third-factor conditions underlying markedness. 3.5.1. Parameter hierarchies In this section, I will lay out the approach to parameter hierarchies proposed in Roberts (a: ‒). We have seen the motivation for distinguishing different kinds of parameters, and we have elaborated a four-way taxonomy of parameters which I repeat here from §.. (): () For a given value vi of a parametrically variant feature F: a. Macroparameters: all heads of the relevant type share vi; b. Mesoparameters: all heads of a given naturally definable class, e.g. [+V], share vi; c. Microparameters: a small subclass of heads (e.g. modal auxiliaries, pronouns) shows vi; d. Nanoparameters: one or more individual lexical items is/are specified for vi. We have also adopted the BCC (see §.., ()): () All parameters of variation are attributable to differences in the features of the functional heads in the Lexicon (Borer : ‒; Chomsky : ; Baker a: f.). In these terms, we can characterize macro- and mesoparameters as different-sized aggregates of microparameters, in that they represent different distributions of identical FFs across functional heads. Our characterization of FFs in relation to the lexical entries of functional heads leads us to see parameters as reflecting UG-underspecified parts of lexical entries rather than add-ons. Here we will look at how the third factors in language design, general principles of computational optimization, constrain and determine the shape of parametric variation. This is the central idea behind the emergentist approach to parameters: parameters are the result of the interaction of the three factors, and not directly predetermined by UG. So in this section, we will look at the nature of parameter hierarchies, also seen as the result of the interaction of the three factors. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

Let us begin by looking again at the three factors in language design as put forward by Chomsky (: ) (this formulation is closer to Chomsky’s original one, and a little different in detail from that given in §., ()): ()

I. genetic endowment (‘apparently nearly uniform for the species, which interprets part of the environment as linguistic experience, a nontrivial task that the infant carries out reflexively, and which determines the general course of the development of the language faculty’); II. experience (‘which leads to variation, within a fairly narrow range’); III. principles not specific to the faculty of language.

(.I) is the innate UG, which I take to contain the following invariant and variant mechanisms (see Roberts a: ): () a. invariant operations of the computational system (Merge, delete, copy, Agree); b. a specification of the non-functional lexicon; c. a list of FFs with a specification of which FFs are optional and which not. (.II) is the PLD, or more precisely the PLD as mediated by the acquirer (‘intake’ rather than ‘input’ in the words of Evers and van Kampen ). It was thought since at least as early as Chomsky (: ch. ) that these two factors were crucial for determining the nature of the mature language faculty. The innovation in Chomsky (), although adumbrated in earlier work (see e.g. : ), is (.III). These third factors include in particular ‘principles of structural architecture and developmental constraints that enter into canalization, organic form, and action over a wide range, including principles of efficient computation’ (Chomsky : ). For the specific case of parametric variation, we can describe the three factors as follows: ()

I. underspecification of FFs in the lexical entries of functional items; II. triggers, defined in terms of p-expression; III. computationally efficient general strategies of L acquisition. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

It is important to emphasize that, on the current view, (I) is UG’s sole contribution to parametric variation. The key aspect of UG in this respect is (c): a list of FFs with a specification of which FFs are optional and which not, from which each system makes a one-time selection of the optional FFs (e.g. φ-features, but not negation if the latter is a substantive universal, on which, see Willis, Lucas, and Breitbarth : ). In a nutshell, to adopt the apposite formulation in Biberauer and Richards (), parametric variation arises ‘where UG doesn’t mind’. The second factor relates to the contribution of the environment. This is not simply exposure to PLD. As already mentioned, this concerns intake rather than input. More specifically, the parts of the PLD relevant for parameter-setting are constituted by triggers, in turn defined in terms of p-expression as in (), repeated here: () a. Parameter expression: A substring of the input text S expresses a parameter pi just in case a grammar must have pi set to a definite value in order to assign a well-formed representation to S. b. Trigger: A substring of the input text S is a trigger for parameter pi if S expresses pi. Following proposals in Biberauer (, a,b, ) we can tentatively locate the domain of p-expression as departures from the simplest Saussurean form–meaning mapping. The non-functional lexicon relates semantic and phonological forms in a classic arbitrary relation, as in (): () Lexical entry for cat: /kæt/ (phonological form); λx[cat(x)] (semantic form). Clearly these arbitrary pairings must be acquired, and the ‘trigger’ here is simply exposure (of course, lexical acquisition is highly non-trivial and raises important stimulus-poverty questions, as pointed out by Chomsky : –, but these are not our concern here). Biberauer (: –, a: –, : –) lists a number of possible departures from the simplest form–meaning mapping: doubling/agreement (where there are two or more forms for one meaning, as in standard φ-agreement and negative concord), silence (where there are no overt forms for a given meaning, as with null arguments), 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

multifunctionality (several meanings for one form, e.g. epistemic and deontic modals), movement (given Chomsky’s ,  proposed ‘duality of semantics’, whereby at least A’-movement gives rise to information-structural effects), and same-category recursion (as in nominal compounds). The key point throughout is that in all these cases we have departures from a simple, arbitrary one-to-one form– meaning relation, and that these are the sign that there is more going on than just a mapping between phonological and semantic features; instead, FFs must be at work. In other words, all of these departures indicate the presence of FFs to the acquirer, and hence the possibility of variation (see also Guasti : f. for discussion of ways of bootstrapping from phonology and semantics into syntax). So UG provides a range of optional FFs, the acquirer interrogates the PLD looking for triggers expressed, following Biberauer’s proposals, as departures from the simplest form–meaning mapping (and other means of bootstrapping described by Guasti). This entire procedure is optimized in terms of computational efficiency by the domain-general third-factor conditions. The next step is to look at those conditions in detail. The third factors introduced in (.III) fall into several kinds, but I will concentrate on principles of computational efficiency, or more precisely, principles of computational conservatism. The central idea here is that cognitive computations are costly, and there is therefore a general pressure to make them as inexpensive as possible. Efficiency should be understood as the optimal conservative use of computational resources (this can in fact be understood in a variety of ways; see the discussion in Mobbs ). Two important third-factor strategies are given in (): ()

(i) Feature Economy: Given a pair of adequate structural representations R, R0 for a substring of input text of the PLD S, choose R iff R has n distinct FFs and R0 has m > n distinct FFs. (ii) Input Generalization: If a functional head Hi of class C is assigned FFi, assign FFi to all functional heads {H . . . Hn} in C.

Both of these have already been discussed, in particular FE in §., and IG was mentioned in relation to harmonic word-order patterns in §.. and §.., although the formulations here are slightly more 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

precise than those given earlier. FE and IG do not exhaust the third factors; we have already considered Yang’s () Tolerance and Sufficiency Principles (see §..), the Subset Principle (§..) and the elsewhere convention (§.. and §..). Together, FE and IG form a minimax search/optimization algorithm, with FE minimizing FFs and IG maximizing available FFs. Biberauer (, a) combines them under a single notion, Maximize Minimal Means (MMM). To put it in a nutshell, the combination of FE and IG implies that acquirers first postulate NO heads bearing an FF (of the relevant type). This maximally satisfies FE and IG. Then, once an FF is detected in the PLD, that feature is generalized to ALL relevant heads, satisfying IG. Next, if a (class of) heads not bearing F is detected (but, crucially, F is known to be present in the system), the procedure retreats from the maximal generalization preferred by IG, and concludes that SOME heads bear F. At the same time, a distinction is made between the set of heads bearing F and its set-theoretic complement, and the procedure is iterated for the subsets. Once no further features are detected, the procedure halts. This NO > ALL > SOME procedure, in which the ‘SOME’ step creates a new distinction, closely parallels Dresher’s (, ) Successive Division Algorithm, which is a mechanism for detecting phonological distinctive features and hence determining phoneme inventories.29 In addition to Dresher’s approach, Biberauer (, a) points out connections between this approach and various other aspects of optimization and learning, including Lenneberg’s () notion of ‘differentiation’, the Chinese Restaurant Process in Computer Science, and suggestions by Jaspers () regarding learning of logical and colour terms (see Biberauer , a,  for further discussion, especially of Jaspers’ work, and more non-linguistic parallels). 29

(i)

The SDA is stated as follows (Dresher : ): The Successive Division Algorithm: a. Begin with no feature specifications: assume all sounds are allophones of a single undifferentiated phoneme. b. If the set is found to consist of more than one contrasting member, select a feature and divide the set into as many subsets as the feature allows for.[footnote omitted] c. Repeat step (b) in each subset: keep dividing up the inventory into sets, applying successive features in turn, until every set has only one member.

The similarity with the NO>ALL>SOME learning path described in the text is clear, as is the similarity with the more formal statement in () below.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

The notion of ‘adequate structural representation’ in (i) should be understood as an adequate parse of the relevant piece of PLD (which could be a complex head). This is important, since FE must interact with the PLD or no FFs would ever be postulated. What FE is intended to guarantee is that p-expression will always give rise to the postulation of the smallest number of FFs consistent with the expression. IG allows features, once triggered, to generalize across functional heads in a given class. Here the classes are those which define the different parameter types given in (). Macroparameters represent the largest possible class for the FF in question, hence macroparameters are inherently preferred by IG. Mesoparameters represent a smaller class, but nonetheless a large natural class, microparameters a still smaller class, and nanoparameters no real class at all, but just a small number of lexical items. It is clear that these options are successively less favoured by IG while, if we take FE to refer to FF-types, not tokens, FE is neutral. IG then effectively arranges parameter types into a preference hierarchy, with ‘larger’ settings preferred over relatively ‘smaller’ ones. IG embodies the logically invalid, but heuristically useful, learning mechanism of moving from an existential to a universal generalization: ‘if FF is present on some head, it is present on all (similar) heads’ (where ‘similar’ is defined in terms of class membership). Like FE, it is stated as a preference, since it too is always defeasible by the PLD: the acquirer navigates the parameter hierarchies by retreating from overgeneralizations created by IG. The crucial aspects of the PLD are those described above in the discussion of the second factor. This view is consistent with the following remark of Chomsky’s (: ): ‘the early steps of compiling linguistic experience might be accounted for in terms of general principles of data analysis applied to representations preanalysed in terms of principles specific to the language faculty, the kind of interaction one should expect among the three factors’. Given FE and IG, we can clearly see how the three factors interact to give rise to parameters of different types and to parameter hierarchies. We have presented FE and IG as computational-optimization strategies, but they also determine the way the data will be analysed, with data analysis driven by optimization. Furthermore, the PLD is ‘preanalysed in terms of principles specific to the language faculty’, in that it must contain triggers as defined above in order to be usable at all. The third-factor principles in () lead all the relevant functional heads to prefer to ‘point the same way’, creating the macroparametric 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

effect out of microparametric pieces. They interact with the PLD and the optional FFs made available by UG to define a learning path. First, acquirers will always by default postulate that NO heads bear FF, for a given UG-optional FF. Given the underspecified nature of parameters, and the idea that the negative value for a parameter involving FF is simply the absence of FF, this satisfies FE. If FF is nowhere postulated, it also vacuously satisfies IG. The absence of a feature is not triggered by the PLD; it is simply the default assumption (NO is the elsewhere case). In this way, we avoid the problem of discovering negative values for parameters in the PLD: optional FFs are absent until triggered. At the second step, once an FF is detected in the PLD, IG requires that that feature is generalized to ALL relevant heads. This will lead to FF taking on its macroparametric value. In other words, thanks to IG, macroparameters are the most readily set kind of parameter. This is consistent with their diachronic stability, which we look at in the next subsection. At the third step, the learner retreats from the maximal generalization and moves to a mesoparametric option of some kind. Again, the absence of FF is not directly triggered. Instead, what happens is that the learner detects a formal contrast between one subclass of heads and another; as mentioned in §.., the Tolerance Principle gives a formal expression of how this happens. As mentioned above, there is a strong affinity between this approach and Dresher’s (, ) Successive Division Algorithm for the acquisition of phoneme inventories, as well as learning procedures observed in other domains. Let us apply the approach to a concrete case of parametric variation, word-order parameters construed in terms of the LCA as described in §... More precisely, suppose, following Biberauer, Holmberg, and Roberts (), that heads which follow their complements have an FF lacking on heads which precede their complements. Still following Biberauer, Holmberg, and Roberts, call this FF ^. Now, the learning path as we have described it will have as the default the postulate that NO heads bear ^, maximally satisfying FE and IG. This gives rise to harmonic head-initial order: no heads roll-up around their complement so all heads precede their complement. If ^ is detected in the PLD, e.g. by the need to parse a substring as complement > head (in the basic cases presumably determined by the acquirer’s ability to discern argument-structure relations, on which see Guasti : ‒), IG requires that ^ is generalized to ALL relevant heads. This gives rise to 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

harmonic head-final order. Next, if a head which does not bear ^ is detected (i.e. if head > complement order is recognized for some pair), a contrast among the maximal set of potentially ^-bearing heads is made such that one subset has ^ and the other does not. In other words, the learner postulates that SOME heads bear F. In the case of the roll-up parameters, this gives us a mesoparametrically based disharmonic system. We see then that, as first pointed out by Biberauer, Holmberg, Roberts, and Sheehan (), FE and IG conspire with the underspecified parameters to give rise to the NO > ALL > SOME learning trajectory. At the SOME stage, the operation is iterated on further subsets, and so on down to the point where no more distinctions are triggered by the PLD. Yang’s Tolerance Principle plays a role here: as we observed in §.. the recursive application of the Tolerance Principle makes it possible to make categorical, i.e. featural, distinctions. Hence the Tolerance Principle may cause acquirers to traverse the parameter hierarchies as they acquire more lexical items, including functional items with FFs specified. The learning path is restated in slightly more formal terms in () (see Biberauer, Holmberg, Roberts, and Sheehan : ): ()

(i) default assumption: ¬∃h [F(h)] (ii) ∃h [F(h)] ! 8h [F(h)]; (iii) if ∃h ¬[F(h)] is detected, select h’ ⊂ h as the domain of quantification and go back to (i); (iv) if no further F(x) is detected, stop.

Here FFs are treated as predicates and heads as their arguments, which seems natural as FFs are intuitively properties of individual heads. (i) states that the default is that FF is absent from the system; this is driven by FE and is consistent with IG. (ii) captures the abductive leap from existential to universal quantification driven by IG. (iii) expresses the notion of contrast in terms of subsets; and (iv) states that the procedure stops when all accessible features have been detected. The learning path just proposed appears to be problematic in relation to the Subset Principle (which, as we have seen, is probably another third factor), since it involves retreat from an overgeneralization while the Subset Principle is based on the idea that acquirers do not do this. However, there are two points to be made here. First, looking at our example involving head–complement order, we can see that the retreat is driven by positive evidence of head > complement orders, i.e. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

evidence that the complement > head order favoured by IG is not fully general. The two systems in question—a harmonically head-final system and a disharmonic system—are not in a subset–superset relationship but in an intersection relationship. As we saw in §.., this may be true of most parametric options. More interestingly, the potential for a superset trap does not arise if we think of the acquirer as overgeneralizing due to ignorance of categorial distinctions, for example as being unable to distinguish N from V as proposed by Biberauer (, a,b, ) and Branigan (). This ignorance gradually erodes as the acquirer traverses the learning path: finer and finer distinctions are made as a consequence of this, as we have seen. As mentioned above, the Subset Principle may favour more restrictive options, hence it may have the effect of ‘pushing’ learners down the hierarchy. Concretely, the roll-up hierarchy we have been considering looks as follows (Roberts a: ‒ gives a more precise and detailed version of this hierarchy): () a. Is ^ present in the system? (Y/N) If N: rigidly, harmonically head-initial language. b. If Y: is ^ generalized to all (relevant) heads? Y: rigidly, harmonically head-final language. c. If N: is ^ restricted to some subset of heads e.g. [+V]: West Germanic. d. If N, . . . further restrictions on the distribution of ^ ! greater disharmony. It should be clear how the options in () conform to the NO > ALL > SOME pattern. It is important to stress once more that at the first stage the acquirer is not looking for ^, or, as it were, positively ‘setting’ ^ to ‘absent’. The feature is simply not posited because it is not triggered. The hierarchy in () has a number of very attractive properties. First, by situating the options that give rise to harmonic word orders at the top, as macroparameters (affecting the maximal range of heads), it captures the cross-linguistic preference for such orders, noticed in the typological tradition since Greenberg () and enshrined in Hawkins’ () principle of Cross-Categorial Harmony (see §..). This is because harmonic orders represent the two initial steps of the learning path: the default absence of the FF (caused by FE) and the next-default maximal 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

generalization of FF (caused by IG). If there is no PLD inconsistent with these options, then the learning path stops there. So we directly link default (or near-default) ‘parameter-settings’ to ease of acquisition and cross-linguistic frequency.30 In other words, we can simultaneously capture typological and acquisitional preferences. There is also, via ease of acquisition, a connection to diachronic stability, in that higher options are predicted to be more stable. I will return to the question of the relative stability of macroparameters in the next subsection. Second, a hierarchy like that in () allows us to capture disharmonic orders without proliferating parameters or FFs. There is just one FF relevant for determining head–complement order; it merely distributes across categories differently in disharmonic systems. Third, hierarchies like () give us a very easy way to state markedness relations. In a nutshell, the lower an option is on a hierarchy, the more marked it is. This can in fact be understood in terms of a minimum-description-length (or, more correctly, programme-size complexity) definition of complexity, if we assume that the more complex category descriptions at the lower reaches of the hierarchy will involve more category information and so will be, in the relevant sense, larger (see e.g. Rissanen and Ristad : –). Finally, and very importantly, the hierarchies dramatically restrict the space of possible grammars by creating dependencies among parameters. As observed by Roberts and Holmberg (: ), given parameter hierarchies, the cardinality of the set G of grammars is the cardinality of the set P of parameters plus one, to the power of the number of hierarchies. As they point out, if there are five hierarchies and thirty parameters, then the cardinality of G is 5 = ,,. This is significantly less than the billion grammars allowed by thirty binary independent parameters; see also Biberauer, Holmberg, Roberts, and Sheehan (: –). The hierarchical approach allows us to posit quite a large number of parameters, but almost all of them are dependent on the values of other parameters. In this section, we have introduced and illustrated the concept of parameter hierarchy, along with the idea that such hierarchies are emergent properties arising from the interaction of the three factors of language design put forward by Chomsky (). The third factors The roll-up hierarchies proposed by Roberts (: ‒) lead to a more nuanced view of the nature of the ‘pure’ word-order types and their cross-linguistic distribution. 30



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

in question are FE and IG, although we have also at least aspects of the roles played by the Subset Principle and the Tolerance Principle. We have seen that these hierarchies have a number of attractive features, one of the most important of which is that they directly provide a definition of markedness in terms of the topography of the hierarchy: all things being equal, lower positions in hierarchies are more marked and higher ones less so. 3.5.2. Parameters and diachronic stability In this section, we will look at the diachronic implications of the fourway typology of parameters given in (). Before doing that, though, one or two further conceptual points should be clarified. First, it is important to see that the definitions in () do not add anything to UG. The basic notion of FF is unchanged, as is the account of what can and cannot vary (see ()). These definitions merely combine our notions of FF and functional head into sets of different sizes (thanks to Richard Kayne, p.c., for pointing this out). Also, since each definition relates to the value vi of FF, the different ‘parameter types’ really refer to parameter-settings, with a different distribution of FF constituting a different parameter-setting. Second, we can connect the definitions in () to the notion of pexpression as follows: () a. Macroparameters are maximally p-expressed; b. Mesoparameters are systematically p-expressed in relation to large natural classes of functional heads, core functional categories, or their effects; c. Microparameters are systematically p-expressed in relation to small, featurally definable subclasses of functional heads; d. Nanoparameters are unsystematically p-expressed on individual lexical items. It follows from () and, in particular, () that macroparameters are highly salient in the PLD. Therefore they are readily set by acquirers and, given the general view that diachronic change arises through parameter-resetting by reanalysis of aspects of the PLD in acquisition that we are defending here, will therefore tend to not be easily reset. Hence macroparametric values are strongly conserved diachronically. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

In fact, we could conjecture that they hold at the genus level (in the sense of Dryer : ‒, ‘genetic groups roughly comparable in time depth to the subfamilies of Indo-European’) or higher, possibly at the level of stocks such as Indo European. Mesoparameters also tend to characterize genera, but perhaps with exceptions and outliers owing to various diachronic contingencies (see Roberts b on the status of English in Germanic and Brazilian Portuguese in Romance in this connection, and §.. for more on English). Microparameters, on the other hand, will characteristically vary among closely related languages, in line with the general characterization in Kayne (). Finally, nanoparameters are more idiosyncratic, in a sense rather like irregular verbs, and simply have to be acquired on an item-by-item basis. We will look at some examples of nanoparameters and nanoparametric change in §.. and at microparametric change in §... Here I will describe and exemplify macro- and mesoparameters in more detail. Given the Inertia Principle combined with the fact that macroparameters are saliently expressed in the PLD, we can look for the effects of macroparameters in diachrony by looking for properties that have not changed, i.e. properties which are diachronically highly stable. What we are looking for is very simple: we can show that certain syntactic properties, which we know to be subject to parametric variation, are, as far as the record shows, extremely stable in the history of certain languages or language families. Here we consider three cases of this kind. The first case is harmonic head-final order. Dravidian, Japanese, and Korean are all good examples of languages that are usually described as being of this type, as is well-known. There are almost no departures from head-final order across Dravidian (the North Dravidian language Brahui has four prepositions, a result of extensive contact with IndoAryan; Steever : ). The same is true for the oldest surviving Dravidian attestations in Old Tamil, which date back to the fifth century . Furthermore, given the prevalence of head-final orders across the family and in its recorded history, Proto Dravidian is reconstructed as head-final. Proto Dravidian is dated by Steever (: ) to  , i.e. , years ago. In order to calibrate stability, let us assume that a new generation of native speakers emerges every twenty-five years (this is probably a conservative estimate: twenty or fifteen years is probably nearer the mark, but a lower estimate will only strengthen the point we are making here). For each new generation of native speakers parameter-settings 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

have to be replicated in that generation. Call this an ‘iteration of the learning cycle’. In each iteration, each parameter value has to be replicated. Change takes place when a parameter value is not faithfully replicated. Given a new generation every twenty-five years, there are four iterations of the learning cycle per century. Therefore in , years we have  iterations of the learning cycle. Thus the relevant generalized rollup parameters (corresponding to option (b) in the hierarchy in the previous subsection) have remained constant over at least  iterations of the learning cycle. The oldest texts in Japanese date from around ‒  (Frellesvig ), and so are ,‒, years old. These show both headfinality and radical pro-drop, as in Modern Japanese (see Yanagida , Yanagida and Whitman  on word order in the history of Japanese). So these parameters have remained constant over forty-eight to fifty-two iterations of the learning cycle. Concerning Old Korean, one of the oldest attestations is from a Silla stele, Imsin sŏgi sŏk, dated to either  or  (Lee and Ramsey : ). Lee and Ramsey say that In this text, all the Chinese characters are used in their original, Chinese meanings, but the order in which they are put together is completely different from that of Classical Chinese. The syntax is almost purely Korean. For example, instead of the Chinese construction ‘from now’, the order of the two characters is reversed, Korean-style . . . Sentences end in verbs.

This seems to be a clear indication that Old Korean was head-final. If we date this text to   in round figures, then head-finality has remained constant for , years, or fifty-six iterations of the learning cycle. The second phenomenon is ‘radical’ pro-drop, or radical null-subject languages (RNSL) status as defined in §... We have already mentioned that Japanese has retained this property throughout its recorded history. The same is true of Korean (Lee and Ramsey ). Again we observe a strongly retained property. Third is multiple incorporation in the Algonquian languages. Branigan () shows that the polysynthetic nature of the Algonquian language Innuaimûn (formerly known as Montagnais) can be analysed as ‘multiple attraction’ by T, Fin, v, p, and n. He analyses an example like () as having the underlying structure in (), from which () is derived by multiple attraction of Aux and V-v by T:



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

ueueshtâ-pan n-ûtâpân. () Pûn tshî Paul CAN repair-/ -car ‘Paul can repair my car.’

[Innuaimûn]

TP

()

T -pan

AuxP Aux tshî

vP

DP Pûn

v’ v -tâ

VP V ueuesh

DP n-ûtâpan

TP

()

T Aux tshî

AuxP T

v V ueuesh

v -tâ

Aux t T Pan

vP DP Pûn

v’ v t

VP V t

DP n-ûtâpan

Branigan proposes that () is derived from () by, first, movement of Aux to T, followed by movement of V-v (independently formed by headmovement in the standard way), which ‘tucks in’ between Aux and T, as shown in (). This, combined with raising the subject DP from SpecvP to SpecTP, derives the order seen in () from the underlying structure in (). See Branigan (: ff.) for arguments in favour of this analysis and detailed technical exposition. Branigan goes on to show that similar derivations apply involving v, p, Fin and n (again, see Branigan  for the detailed argumentation and exposition). He concludes that the highly polysynthetic nature of Innuaimûn is a consequence of what he refers to as the Algonquian Macroparameter (AMP): 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

() For all functional categories F, F attracts multiple heads. Branigan argues that a single macroparameter like () is preferable to a series of category-specific options of the kind ‘Fin attracts multiple heads’, ‘T attracts multiple heads’, ‘v attracts multiple heads’, etc. What is most relevant here are Branigan’s twin observations that (a) multiple incorporation is a ‘signature’ property of the Algonquian family, and (b) this family is very old and geographically widespread. Regarding (a), Branigan (: ) points out that ‘all Algonquian languages appear to make use of multiple head-movement in essentially identical ways’. Similarly, Mithun (: ) observes that Algonquian languages are polysynthetic, and Pentland (: ) says that the ‘Algonquian languages are polysynthetic, hierarchical, nonconfigurational head-marking languages.’ More specifically, the Algonquian verb has a tripartite structure, containing an ‘initial’ verb-stem, a ‘medial’ incorporated noun, and a ‘final’ which indicates transitivity and animacy of the subject (intransitives) or the object (transitives) (see Mithun :  and Branigan :  for illustration and references). Concerning the age of Algonquian, there appears to be general agreement with the view expressed in Goddard (: ), who says that ‘[t]here are no accurate means of estimating how long ago the Proto-Algonquian parent language was spoken, but a reasonable guess would be about , to , years ago.’ To quote Branigan again: ‘This history is reflected in numerous ways . . . There are lexical differences between all Algonquian languages . . . There are phonological differences . . . There are also morphological differences, often involving inflectional morphology’ (Branigan : ). The geographical extent of the family, especially if the Ritwan languages Yurok and Wiyot of Northern California are included (giving the higher-order family known as Algic), is impressive: ‘[t]he Algic or Algonquian-Ritwan family, consisting of around  languages, stretches from present Labrador southward along the Atlantic coast as far as North Carolina, westward across the plains into Alberta and Montana, and to the Pacific coast’ (Mithun : ). As Branigan points out, [P]opulations in different regions have shared areal contact with very different types of other languages, from the Salish languages in the northwest, to Inuktutut in the north and Siouan and Iroquoian in large areas in the middle. In short, groups of Algonquian speakers have . . . been dispersed for a very long time. (Branigan : )



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

What we observe here is a syntactic property, polysynthesis reflected in multiple head-attraction, which has not changed for millennia. In , years, we have  iterations of the learning cycle. So Branigan’s claim that the AMP in () has remained constant throughout the history of Algonquian can be seen as the claim that the parameter has successfully replicated itself over  iterations. In these terms, a harmonic head-final system associates the roll-up trigger (^ in the notation adopted by Biberauer, Holmberg, and Roberts ; see the discussion of () in the previous section) with all heads. Therefore, by the definition in (), harmonic head-finality is the reflex of a macroparameter. Similarly, a radical pro-drop system allows all probes to ‘license pro’ in the absence of agreement marking (see §.. and Parameter A of §..), and so this is also a macroparameter. Finally, polysynthesis involves head-movement of all goals, as Branigan’s AMP given above attests. So again this is a macroparameter. We observe, then, three cases, each independently thought to be macroparameters, which are conserved for millennia. As (b) says, mesoparameters concern a naturally definable class of syntactic categories (e.g. all verbal, all nominal, all φ-bearing heads or all finite Cs) and, as such, are ‘smaller’ than macroparameters (which, as we have seen, concern all relevant categories), but ‘larger’ than microparameters (which affect much smaller natural clauses). A good candidate for a mesoparameter is the classical consistent null-subject parameter, as manifested in most Romance languages (see §..). In §.., we stated this parameter as follows: () Are null subjects allowed without person restrictions and with associated ‘rich’ verbal-agreement inflection? (See Holmberg , Barbosa , and Roberts a: f. for more sophisticated accounts.) This parameter affects all (finite) subjects, arguably a natural class of elements. The consistent null-subject parameter has been stable from Latin through most of the recorded histories of Italian, Spanish, and European Portuguese (although Brazilian Portuguese has diverged since c.—see §.. and §..). French and certain Northern Italian dialects are rather different from the rest of Romance. It is possible that these varieties were subject to Germanic influence in the latter part of the first millennium and that this destabilized the old system thereby causing various microparametric changes in those varieties (see Vanelli, Renzi, and Benincà /). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

Another likely case of a mesoparameter is V in Germanic (Parameter C of §... and §.). This property characterizes all of Modern Germanic except for Modern English, which lost V in the fifteenth century, as we saw in §.... However, the diachronic emergence of V is obscure and the evidence from Gothic, Old High German and Old English—none of which were ‘fully V’ in the way the Modern Germanic varieties are—suggests it was not present in Proto Germanic; cf. Walkden (). Nonetheless, we can give Modern Germanic V the following characterization (see Roberts b): () Root declarative C attracts V and has an Edge Feature (EF). This parameter affects all verbs, and hence applies to a large natural class. V has remained remarkably stable across nearly all North and West Germanic varieties, with English changing in the fifteenth century for reasons that remain unclear (cf. Kroch and Taylor  for the suggestion that this was caused by contact). Significantly, where V does not consistently hold of all root Cs, as is the case in Modern English, movement to C is much less stable; this point is emphasized in Biberauer and Roberts (), and we return to it below. A further example of a mesoparameter is verb-movement in Romance. As we saw in §..., the Romance languages show varying degrees of verb-movement in the inflectional domain of the clause. We suggested that the Romance languages share a mesoparametric value which causes the finite lexical verb to raise out of VP into the TenseMood-Aspect (TMA) field (Parameter B). There is an associated set of microparameters (Parameters B‒B) arranged in a hierarchy, which we repeat here (see Schifano : ): ()

B1: does V move out of VP in finite clauses? NO English

YES: B2: does V move to the higher Aspect head?

NO Spanish

YES: B3: does V move to the Tense head?

YES: B4: does V move to the Mood head? NO Portuguese NO YES: Italian French, Romanian 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

V-movement into the inflectional field characterizes all the Romance languages; V characterizes the whole of Germanic except for postfifteenth-century English (see §...). It seems to be a typical property of mesoparameters that they hold of language genera, but that we frequently find outliers as mesoparameters are more amenable to change—perhaps especially through contact—than macroparameters. In this section, we have looked at some of the diachronic implications of the four-way taxonomy of parameters as macro, meso, micro, and nano, concentrating on macro- and mesoparameters; we will look at examples of micro- and nanoparameters in §.. and §... It is important to stress that what all four types of parameter have in common is that they concern variation in the FFs of functional heads; in this sense, they are all consistent with the BCC. Formally, they differ only in the size of the set of heads that bears the same FFs: for macroparameters, this is the maximal set; for mesoparameters, it is a large natural class; for microparameters, a small natural class; and for nanoparameters, either a very small, not necessarily natural class of items or the minimal set of one item. Since parameters determine typology and change, each type of parameter has correspondingly large-scale or small-scale effects. Macroparameters determine gross typological properties and are diachronically stable; mesoparameters determine ‘signature’ properties of genera which often have exceptions and are, accordingly, somewhat less diachronically stable; microparameters determine intricate local variation, often best observed among closely related systems (as stressed by Kayne ) and are somewhat unstable, with grammaticalization a pervasive kind of microparametric change, while nanoparameters are highly unstable and very parochial. These typological and diachronic properties are determined directly by the p-expression of each parameter type as stated in (). It is clear that these parameter types fall into a hierarchical relationship of the kind we described in the previous section. For the remainder of this book, I will assume this approach to parameters and parameter hierarchies. 3.5.3. From unmarked to marked: traversing parameter hierarchies Parameter hierarchies give us an automatic definition of markedness, as we have seen: the lower an option is in a hierarchy, the more marked it is. This follows from FE and IG, in that SOME options, the third and most marked of the NO>ALL>SOME pattern, seem to require some 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

violation of both: features must be postulated, in violation of ‘pure’ FE, and their domain of application must be restricted to some subset of functional heads, in violation of ‘pure’ IG. The more narrow, i.e. microparametric, the SOME option, the more features seem to be required and the less they are generalized. It follows that any change which involves a system moving from a higher to a lower option on a parameter hierarchy is in fact a change from a relatively less to a relatively more marked option. Conversely, any change which involves movement upwards in a parameter hierarchy involves movement from a more to a less marked option. We have seen several examples of changes involving feature loss and concomitant movement upwards in a parameter hierarchy. The loss of V-to-T movement in ENE brought about a situation in which English has the negative value of parameter B in (); as such, English is higher in the hierarchy than any Romance language and, although we don’t know where earlier English was situated in () (see the discussion of Haeberli and Ihsane  in §...), it must have had a positive value for one of B‒B. Hence the loss of V-to-T movement led to a less marked system; NE has at least one FF less than ENE, the FF responsible for raising lexical verbs out of VP (however exactly that feature is characterized).31 Similarly, a system lacking V altogether is less marked than any kind of V system, presumably; I will return to this point below in the discussion of conditional inversion in the history of English. Wh-movement seems to show the same diachronic tendency. We saw evidence in §.. that Latin had multiple wh-movement, i.e. a positive value for Parameter E. Most of the Medieval and Modern Romance languages have English-style single wh-fronting per wh-C, but contemporary Colloquial French, Brazilian Portuguese, and Spanish have wh-in-situ to varying extents depending in part on register, i.e. Parameter E has changed or is changing in these varieties. Here again, we can think in terms of the loss of the relevant movement-triggering features and concomitant diachronic movement upwards in the In fact, since it seems that verb-movement was lost in two stages between the fifteenth and seventeenth centuries (see §...), pre-fifteenth-century English must originally have been either at B or B, with the first change moving ‘up’ to either B or B in the fifteenth century. As Haeberli and Ihsane () show, the second change in the seventeenth century then fixes English at B. Both changes are clearly from more to less marked as characterized here. 31



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

relevant parameter hierarchy (see the hierarchy for wh-movement in Roberts a: ). On the other hand, we have also seen examples of changes in which a system moves downwards in a hierarchy. In §.., we saw that the Final-Over-Final Condition (FOFC) requires word-order change from head-final to head-initial to affect higher heads in a given Extended Projection before, or at least not later than, it affects lower ones. If this is not the case, then an intermediate stage with FOFC-violating orders of the kind schematized in () will result: () *[βP [αP α

γP] β]

We saw evidence supporting the prediction that TP changed from head-final to head-initial before VP changed from head-final to headinitial in both the history of English and the history of Latin/Romance; see the examples and references given there. In terms of the word-order hierarchy in (), there is a series of further options subsequent to (d), which determines head-final order in the clause (assuming all clausal heads to be [+V]). These options must give rise to systems which obey FOFC, and so they must amount to head-finality in TP and lower, and, last, head-finality in VP, with further options for any other clausal heads situated between TP and VP such as AspP, vP, etc. (see Biberauer and Sheehan : ‒, Roberts a: ‒ for slightly differing examples of such ‘lower’ parts of the word-order/roll-up hierarchy). Hence the evidence presented in §.. for successive shifts from head-final to head-initial orders in successively lower categories in the clause is evidence of systems moving down the hierarchy over time, and hence change from relatively less to relatively more marked. A further case of this type is the change from CNSL to PNSL to NNSL, which we saw evidence for across Germanic in §.. and for French (with an unusual and rather short PNSL phase) in §... In §.. we posited the following parameters for the different kinds of NSLs: () A. Are null arguments allowed in the absence of agreement marking? (Macroparameter) YES: RNSLs (e.g. Japanese, Korean, Mandarin . . . ). NO: non-RNSLs (e.g. most, perhaps all, Indo-European languages). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

A. Are null subjects allowed without person restrictions and with associated ‘rich’ verbal-agreement inflection? (Mesoparameter) YES: CNSLs (e.g. Italian, Spanish, Greek, Latin, Gothic, European Portuguese, Old French). NO: non-CNSLs. A. Do null subjects show person restrictions and allow indefinite interpretations in the Sg? (Microparameter) YES: PNSLs (e.g. Finnish, Brazilian Portuguese, Marathi, Russian, Early Old High German, etc.). NO: NNSLs (e.g. Modern English, Modern French, Modern Swedish, Modern Norwegian, etc.). It is clear that we can regard A‒A as a hierarchy for null subjects (fuller versions of this hierarchy are given in Roberts a: , ). So again we see changes moving down the hierarchy, from less to more marked. On the one hand, it is desirable to have a mechanism allowing change from less to more marked systems for the reasons discussed in §..: our theory of change must accommodate the fact that the overall range of diversity in the world’s languages does not, as far as can be discerned, seem to have changed. Certainly there is no evidence for any kind of general tendency towards a steady-state maximally unmarked system. On the other hand, if the notion of markedness is to have real content, there should be at least a weak dispreference for change from less to more marked in contrast to change in the opposite direction. If we consider the various third factors we have discussed, this may in fact be the case. We said above that FE disfavours movement down the hierarchy, because such movement involves the addition of new features. But in fact this isn’t always true: in both the word-order/roll-up hierarchy in () and the null-subject hierarchy in () the features involved are the same: ^ for () and, arguably, Person in () (the null-argument hierarchies in Roberts a referred to above are formulated in terms of the distribution of Person features, with PNSLs and NNSLs having a more restricted distribution of Person than CNSLs). What differs in the more marked options lower in the hierarchy is the distribution of the features. Therefore, the lower options are successively more marked in relation to IG, but not in relation to FE. Both FE and IG favour higher positions in parameter hierarchies and therefore 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

favour upward parametric changes, but only IG actively disfavours lower positions; FE, to the extent that new features are not introduced in lower positions but only the distribution of features is affected, is neutral.32 Furthermore, as discussed in §.. above, the role of the Subset Principle can also be seen in this light: if we think of the Subset Principle as a preference to avoid superset traps by preferring more restricted feature distributions over less restricted ones, then it will always favour going down the hierarchy as an acquisition strategy. To this extent, the Subset Principle favours lower, and therefore more marked, parameter-settings. But the effects of the Subset Principle are slight, for the reasons discussed in §.., and indirect as far as syntactic change is concerned. Taking the three third factors together then, we have the general picture shown in Table .. Table 3.1. Third factors in relation to direction of change in a parameter hierarchy Upward change

Downward change

Feature Economy

Favoured

Neutral

Input Generalization

Favoured

Disfavoured

Subset Principle

Disfavoured

Favoured

Our fourth third factor, the Tolerance Principle, is entirely neutral as to the direction of change, since it simply defines the threshold at which a downward step in a hierarchy is taken. The elsewhere condition can be seen as complementary to Tolerance, in that it defines defaults, i.e. what happens when the tipping-point defined by Tolerance is not reached. Hence we see that, taken together, the third factors give rise to a slight preference for upward over downward change. Given that markedness always involves preferences, this seems like a good result. The unmarked nature of upward change arguably manifests itself in a particular phenomenon observed by Biberauer and Roberts (: 32 For this to be maintained, we need to slightly weaken what was said in §.. on the interaction of the Tolerance Principle with parameter hierarchies, namely the idea that all movement down a hierarchy involves new feature distinctions. Movement down a hierarchy may involve new feature distributions (i.e. new combinations, for example) but not necessarily and not always new distinctions.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

), which they refer to as ‘instantaneous simplification’. This consists in a change from a low position to a significantly higher position on a hierarchy caused either by the complete loss of a feature (favoured by FE and, vacuously, IG) or a major enlargement of the distribution of a feature (favoured by IG). Biberauer and Roberts discuss the development of Conditional Inversion (CI) in the history of English. In contemporary English, this is a case of T-to-C movement in the antecedent of an unreal conditional as in examples such as the following: () a. b. c. d. e.

Had I been rich, everything would have been ok. Should he do that, everything will be ok. *Did I do that, everything would be ok. ?Were I/he to do that, . . . *Were I rich / a rich man / in London, . . .

As (c, e) show, in contemporary English CI is restricted to a small subset of auxiliaries: had, should, and, marginally, were (with an infinitival complement as in (d); note the contrast with predicative were in (e)). Given this restriction, Biberauer and Roberts argue that CI represents a nanoparameter associated with irrealis C in contemporary English. As a case of T-to-C movement, CI is an example of residual V, as described in §.... Biberauer and Roberts show that at the OE stage (which has survived fairly robustly almost everywhere else in Germanic), V-movement into the C-system was generally triggered, although as we saw in §... both Fin and Force only triggered V in particular clause types (see (, ) of §...). From roughly  onwards, only non-declarative C (or Force) triggers movement of T. As Biberauer and Roberts () point out, this is a change from a mesoparameter to a microparameter. In the modern period, nondeclarative C/Force is no longer a uniform movement trigger; conditional C/Force must be distinguished from interrogative C/Force. Again the movement-triggering features have become more specific and the parameters involved more micro. At the last stage, which began in the mid-nineteenth century (see Biberauer and Roberts  for references and discussion), CI represents a nanoparameter, as it is restricted to just the three auxiliaries seen in (). So we see clear movement down the hierarchy relevant for (V-to-)T-to-C movement over the history of English. As Biberauer and Roberts (: ) point out: ‘The constructions have become more marked, typologically more 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

unusual (no known language shares the Modern English auxiliaryinversion system), more subject to microvariation . . . and, presumably, harder to acquire’ (on this last point, see the discussion of English root infinitives in §..). Biberauer and Roberts () go on to make a very interesting and, in the present context, highly relevant point. They observe that several diachronically innovative varieties of World English have entirely lost CI and some have lost inversion, i.e. T-movement to C, altogether. This is the case in Singapore English, Singlish. In this variety, inversion is not found in wh-questions (which tend to show wh-in-situ, as in (a)): () a. b. c. d.

Go where? Why you so stupid? Why she never come here? How to fix?

[Singlish]

Yes/no questions typically involve clause-final particles borrowed from Chinese varieties, while conditionals have neither if nor inversion (as in colloquial varieties of English spoken elsewhere): () a. You do that I hit you. b. You put there, then how to go up? c. Disturb him again, I call Daddy to come down. To quote Biberauer and Roberts once again: The Singlish system, then, represents a situation where even the nanoparametric version of inversion has disappeared, giving rise to a system which is fundamentally simpler than the Standard Modern English one by virtue of the fact that inversion is entirely absent . . . Singlish can thus be thought of as having taken a simplifying ‘leap’ away from the ever more nano-systems that define modern Englishes. (: )

They tentatively suggest that Singlish has thus lost head-movement altogether, reverting to a macroparametric NO value for this property (as argued for Chinese by Huang ; Huang and Roberts ). This is what they call instantaneous simplification. Here, FE and IG together override the weak preferences induced by the Subset Principle, and the Tolerance Principle is inoperative.33 We see from this that parameter 33 In terms of the Yang’s formulation, given in §.., (), and repeated here, the antecedent clause does not apply, as R (T-to-C movement in this case) is no longer productive since it has disappeared from the grammar:



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

PARAMETER-SETTING, PARAMETER HIERARCHIES

hierarchies, combined with the approach to markedness just described, allow us to understand how systems may become gradually more marked, in the sense of moving down the hierarchy, until a feature (or feature class) simply disappears from the system, and the system radically simplifies, ‘leaping up’ to a much higher position in the hierarchy. We thus come to a position rather similar to that originally proposal by Lightfoot () in relation to the Transparency Principle as described in §..: the accumulation of marked properties eventually culminates in a resetting of the system to a less marked state. Here, however, this applies to parametric systems rather than rule systems as on Lightfoot’s conception. We have observed something similar regarding the null-subject parameters of (). In §.. we suggested that there may be innovative varieties of contemporary French in which subject-clitic pronouns have been reanalysed as agreement markers, thereby causing the system, having undergone the changes from CNSL to PNSL to NNSL, to revert to CNSL status. We summarized this in () of Chapter , repeated here: () Stage I: null subjects (in main clauses), strong subject pronouns; Stage II: no null subjects, weak subject pronouns; Stage III: null subjects, subject clitics. In this case, the ‘leap’ is from a negative microparameter to a positive mesoparameter, and involves a significant increase in the distribution of the Person feature. As such, it is a less radical change than the one just described involving inversion in Singlish, and is in fact favoured only by IG, with FE neutral since no new features are introduced. It is worth briefly reconsidering van Gelderen’s (, , ) cycles of change in the light of the foregoing. In §., we listed several of the cycles she discusses and analyses, some of which we repeat here (see §., (a‒d)):

(i)

If R is a productive rule applicable to N candidates, then the following relation holds between N and e, the number of exceptions that could but do not follow R: N e θN where θN :¼ lnN

Similarly, if there is no rule, there is no elsewhere case.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

() a. Subject agreement: demonstrative/emphatic/noun > pronoun > agreement > zero; b. Object agreement: demonstrative/pronoun > agreement > zero; c. Copula cycle: demonstrative > copula > zero; d. Case or Definiteness or DP: demonstrative > definite article > ‘Case’ > zero. As we saw there, van Gelderen also sees these cycles as driven by a version of FE. In particular, she points out that the typical pathways of change are as in () (repeated from §., ()): () Adjunct Specifier Head Ø Semantic > [iF] > [uF] > Ø The fact that these pathways end in zero suggests that features disappear. Hence we predict that the last stage of these cycles should involve ‘instantaneous simplification’ as characterized by Biberauer and Roberts (). Here we see a different kind of cyclic change from that discussed by van Gelderen, in that systems may move down hierarchies over long periods and then ‘leap’ back to a much higher position. If new grammatical material is then introduced through regular processes of grammaticalization as described by van Gelderen and Roberts and Roussou () (see §.), then the system may begin to move down the hierarchy once again. In this section, we have looked at the role of the third factors in relation to markedness, construed in terms of parameter hierarchies. We have seen that change involving movement down a hierarchy introduces greater markedness while change involving upward movement is from more to less marked. This is not a very strong preference, however, as the interactions among the third factors schematized in Table . show. The other main kind of change is instantaneous simplification, favoured by IG alone or, in more radical cases, IG and FE together; this involves a ‘leap’ of a system from a low position to a relatively high one, and as such a potentially radical ‘resetting’ of the system, possibly giving rise to a significant typological change. We see that parameter hierarchies can make rich and interesting predictions about markedness and types of change.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONCLUSION

3.5.4. Conclusion In this section, we have attempted to consolidate the discussion in the earlier sections of the chapter, and to some degree that in the earlier chapters as well, by considering the format for parameters and the various ways in which parameters may interact, in the form of parameter hierarchies. We then developed a concept of markedness in relation to parameter hierarchies, resulting in new predictions about types of change.

3.6. Conclusion This chapter has attempted to consolidate the ideas which were introduced in the first two chapters. There we first demonstrated the utility of the notion of parameter of UG for analysing syntactic change (Chapter ) and for giving a (near-)unified account of different types of change (Chapter ). Here, we showed how parameter change can be seen as driven by language acquisition. The essential notion is that of the conservatism of the learning device, which always attempts to set parameters on the basis of the greatest computational efficiency. This is connected to three of the third factors we have considered: FE, IG, and the Subset Principle. The interactions of these factors lead to a particular conception of markedness and markedness preferences. The interaction of the third factors with underspecified UG (first factor) and the learner’s intake of PLD (second factor) leads to emergent parameter hierarchies. These hierarchies make rich predictions for both acquisition and change, as well as of course synchronic typological variation as shown at length in Roberts (a). In seeking to relate parametric change to language acquisition, we undertook a survey of recent work on the acquisition of syntax in §.. Here we encountered the Very Early Parameter-Setting observation, which to some extent hampers empirically establishing a straightforward relationship between acquisition and change, although it does not conceptually preclude such a relationship. §. discussed the logical problem of language change, which led us to an initial formulation of FE. The subject of §. was the changing trigger. Given Inertia, i.e. the idea that syntactic change must be caused (to paraphrase Longobardi 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

: ), we considered how contact and morphological erosion may induce change. §. dealt with markedness at some length. We suggested that FE and IG should form the basis of markedness. The Subset Principle is also relevant in two cases: first, if parameter systems allow formal optionality and, second, if construed as a general preference for more restrictive (in fact more marked) parameter-settings. Finally, in §. we presented parameter hierarchies in full and suggested how marked properties can be innovated. We saw that the preference for change from marked to unmarked is slight, but crucial, in particular in that it can induce ‘leaps’ up the hierarchies. In the next two chapters, we look at the consequences of the general view of syntactic change that we have outlined over the preceding chapters. We begin, in Chapter , by looking at the dynamic aspect of syntactic change—and considering how it might be handled in the terms described here. Chapter  focuses on questions connected to contact, substratum effects, and creoles.

Further reading Principles-and-parameters theory Baker () is an excellent introduction to the principles-and-parameters conception of UG. Baker pursues a sustained analogy between contemporary comparative syntax and nineteenth-century chemistry, culminating in a ‘periodic table’ of parameters, which can be seen as a predecessor to parameter hierarchies of the kind seen in () and (). The principal difference between Baker’s table and the hierarchies proposed here (and elsewhere, notably in Roberts a) is that Baker’s table was intended to subsume all parameters, while the hierarchies discussed here relate to specific areas of grammar such as word order, null subjects, etc., ultimately to specific features. Baker correctly observes that any analogue to the quantum-theoretic explanation of why the periodic table has the properties it has is far off. Newmeyer (, ) argues at length that the principles-and-parameters approach to comparative syntax has failed, and that variation across grammatical systems should be handled in terms of performance systems of various kinds. Roberts and Holmberg () is a reply to Newmeyer, arguing that the principles-and-parameters approach is a 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

valid and useful approach to comparative syntax. Roberts and Holmberg () takes up this argument, focusing on null subjects, and in fact proposes an early version of the null-subject parameter hierarchy. Biberauer () is an important collection of papers on parameter theory, notably featuring an excellent introductory overview. Two of the most important papers in that volume are Baker (a), an influential defence of the distinction between macro- and microparameters, and Haspelmath (), a critique of formal parametric approaches. Baker (, b) are both book-length studies of macroparameters concerning, respectively, polysynthesis and agreement/case. Parameter hierarchies are developed in Roberts (a), Biberauer, Holmberg, Roberts, and Sheehan (), and, in particular, in Roberts (a). Learnability and markedness Lasnik () is an early discussion of learnability in relation to principles-and-parameters theory. Lightfoot () is the first statement of the degree- learnability idea, developed at much greater length in Lightfoot (). Fodor () proposes a learnability theory for syntax. Roberts () looks at the relation between syntactic change and learnability, proposing a version of an FE-based approach to markedness. Dresher and Kaye () is the initial proposal for cuebased learning of phonological parameters, later developed in Dresher (). Berwick () first put forward the Subset Principle as a natural learnability-driven constraint on the language-acquisition process. Manzini and Wexler () offer an account of parametric variation involving long-distance reflexives, which makes explicit reference to the subset relations among the languages produced by grammatical systems defined by the different parameter settings proposed. This represents a further case where the Subset Principle may be relevant for understanding the relations among parameter values, and perhaps as a basis for a theory of markedness. Biberauer and Roberts () highlight the relevance of the Subset Principle for change in systems featuring formal optionality. Niyogi and Berwick () is a pioneering study of how syntactic change can be mathematically modelled. Battistella () is an introduction to and historical overview of the concept of markedness, with particular reference to syntax. Fodor and Sakas () is an excellent, partly historical overview of how the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

concept of learnability has evolved and an assessment of the learnability of parametric systems. Biberauer and Roberts () introduce the important notion of instantaneous simplification in relation to parameter hierarchies. Lightfoot () is a useful overview of many of the issues raised in this chapter, assuming a cue-based approach to parameters; see also Roussou (). Language acquisition Hyams () is the ground-breaking study of the acquisition of syntax using the principles-and-parameters approach, in which the phenomenon of early null subjects is first described. Hyams analysed these as null subjects of the Italian type, an idea she has abandoned in subsequent work (see Hyams , ; Hyams and Wexler ). Radford () was the first to generalize Hyams’ () account, and argue that English children, at least, go through a stage of acquisition in which no functional categories are available at all. This work led directly to the postulation of root infinitives and early null subjects. Pierce () is a pioneering study of Early French, in which it is shown that at the root-infinitive stage, the infinitival form of the verb does not raise to T while the optional finite form does. Poeppel and Wexler () is an important study of Early German, in which they argue for a root-infinitive stage in that language, and that infinitive verbs do not undergo the verb-second operation (i.e. they do not move to C). Rizzi () is an influential study of root infinitives, in which it is argued that these derive from the possibility of ‘clausal truncation’, i.e. realising a clause as a VP only, at a stage of acquisition in which the language faculty is not fully mature. Rizzi () proposes something similar for ‘diary-drop’. Clahsen and Smolka (), Clahsen and Penke (), and Clahsen, Kursawe, and Penke () are all studies of Early German, in which it is shown that the complex adult verbmovement system develops according to a series of well-defined stages. Haegeman (a,b), Guasti (, ), Hamann and Plunkett (), and Hoekstra and Hyams () are all studies of the early stages of the acquisition of various Romance and Germanic languages from the perspective of principles-and-parameters theory. Hoekstra and Hyams’ article is noteworthy for advocating that Early Null Subjects of the kind found in NNSLs such as English and other Germanic languages are not to be equated with those found in null-subject 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

languages such as Italian. Wexler (, , ) provides overviews and summaries of much of the work in this field, as well as developing more general ideas, notably the Very Early ParameterSetting observation discussed in §.. Friedemann and Rizzi () is a collection of important articles on the acquisition of the syntax of a range of Germanic and Romance languages. Brown () is an early and very influential study of the first-language acquisition of English, while Jakobson () is, among other things, a ground-breaking study of language acquisition and language disorders, in which the concept of markedness plays an important role. Ernst () is an extremely detailed study of Héroard’s journal, in which the speech of the young dauphin was recorded over a period of several years. Héroard’s journal is a unique document, of potentially great interest for language acquisition and language change, as well as providing a valuable record of the nature of spoken French in the early seventeenth century. DeGraff () is a highly original collection of articles dealing with creolization, language acquisition, and language change. The introduction and epilogues are extremely useful and thought-provoking. This collection represents a unique attempt to bring together these areas, which have often been studied somewhat in isolation from one another. Guasti () is an up-to-date introduction to the state of the art in languageacquisition research from the perspective of generative grammar, with detailed discussion of both root infinitives and early null subjects. Bavin () and Lidz, Snyder, and Pater () are comprehensive handbooks on language acquisition. Further relevant works on syntactic theory Giorgi and Pianesi () propose a general theory of the syntax of temporal relations, and an analysis of the temporal systems of Italian, English, and Latin. A facet of their approach is the idea that functional heads are structurally present only when needed in order to bear certain features. This view differs notably from that put forward by Cinque (). Massam and Smallwood () and Massam (, , ) are analyses of Austronesian languages with VSO and VOS orders in which the central idea is that the V-initial orders derive from VP-fronting, possibly of a remnant VP. Other work on Austronesian and Mayan languages which typically feature the VSO/VOS alternation includes Lee (), Rackowski and Travis (), and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

Coon (, ); see also Clemens and Coon () for a different analysis of the Mayan language Chol. Carnie and Guilfoyle () and Carnie, Harley, and Dooley () are important collections on verbinitial languages. Chomsky () is an important development of phase theory. This paper pays particular attention to the ‘A’-system’, i.e. wh-movement, topicalization, and focalization, all movements to the Specifier(s) (or ‘edge’) of CP. It is proposed that these movements are triggered by the E(dge) F(eature), a feature that has no connection with the Agree system. Chomsky (, ) propose the Labelling Algorithm. Richards and Biberauer () develop an analysis of the distribution of what have often been seen as overt and null expletives in Germanic (see the discussion of null expletives in §..) which makes use of the twin notions of roll-up of vP to SpecTP (as briefly described in §..) and the optionality of pied-piping operations. Biberauer and Richards () make a very interesting and well-argued case for formal optionality in syntax, arguing in particular that this is a natural outcome of the kind of minimalist syntax proposed in Chomsky (; ). We will look at some of their proposals in more detail in §... The FOFC is introduced and analysed in Biberauer, Holmberg, and Roberts (); the papers in Sheehan, Biberauer, Holmberg, and Roberts () discuss the constraint from a variety of empirical and theoretical perspectives. Other relevant work on historical and typological syntax King () is a detailed and very interesting study of Prince Edward Island French. In addition to arguing convincingly that prepositionstranding in this variety is the result of extensive borrowing of English prepositions, as summarized in §.. above, King looks at the syntactic consequences of the borrowing of the particle back and the whelements whoever, whichever, etc., into this variety of French. Keenan () is a very detailed, original and interesting study of the development of English reflexives, arguing convincingly that they were originally emphatic forms. It is here that the Inertia Principle is proposed for the first time. Kroch et al. () is a detailed and influential study of the loss of V in the history of English, arguing that this change was driven by contact between Northern and Southern dialects of ME. Croft () puts forward a general account of syntactic change in functional-typological terms, hence differing in its basic assumptions 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

for what is being put forward here. Nichols () makes the distinction between ‘head-marking’ and ‘dependent-marking’. This is the distinction between a system in which a grammatical notion is marked on a head or on a dependent of that head, for example, marking grammatical functions through verb-agreement (head-marking) vs. case on nominals (dependent-marking). Nichols () made a number of important and influential proposals regarding the worldwide areal distribution of typologically variant properties, including but not limited to head- vs. dependent-marking. Since , the Oxford University Press series Oxford Studies in Diachronic and Historical Linguistics (edited by the present author and Adam Ledgeway) has published over forty monographs in this area. Many of these are important contributions to diachronic syntax adopting the general framework presented here. These include Ledgeway (); Galves et al. (); Willis, Lucas, and Breitbarth (); Poletto (); Benincà, Ledgeway, and Vincent (); Kiss (); Walkden (); Breitbarth (); Biberauer and Walkden (); Hill and Alboiu (); Pană Dindelegan (); Cardoso (); Mathieu and Truswell (); Danckaert (); Gianollo (); Jäger, Ferraresi, and Weiß (); Martins and Cardoso (); Chatzopoulou (); Wolfe (a,b); Rusten (); Nicolae (); Breitbarth, Lucas, and Willis (); and Maiden and Wolfe (). Phonological theory and phonological change Chomsky and Halle () is the classic exposition of generative phonology. It is notable for the system of distinctive features proposed, for the explication of the functioning of an ordered system of phonological rules, for the postulation of the levels of ‘systematic phonetics’ and ‘systematic phonemics’, for the analysis of the cyclic nature of stress-assignment in English, and for the markedness conventions connected to the evaluation metric discussed in §.. Kiparsky () is a treatment of the nature of rule-ordering in the standard model of generative phonology as put forward by Chomsky and Halle (), in which the Elsewhere Condition is put forward as a condition determining one kind of rule-ordering. The origins of this condition in the works of the Indian grammarian Pāṇini are explicitly acknowledged. Kenstowicz () is a standard, comprehensive introduction to (preoptimality-theory) generative phonology. Martinet () is a classic 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

ACQUISITION, LEARNABILITY, AND SYNTACTIC CHANGE

structuralist account of phonetic and phonological change, in which the fundamental idea is that sound changes arise from the interaction of economy (ease of articulation) and expressiveness (the need to make distinctions). Lass () is a further contribution to the Cambridge History of the English Language, in which the phonology and morphology of Middle and Early Modern English are described in detail. Blevins (, ) are major contributions to evolutionary phonology, the idea that synchronic phonological processes find their explanation in diachronic processes of sound change. Anderson () reviews the idea that diachronic forces can explain synchronic generalizations, concentrating on phonology but also briefly discussing morphology and syntax. Garrett () is a recent overview of the venerable question of the causes of sound change. Honeybone and Salmons () is an important handbook on phonological change.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

4 The dynamics of syntactic change

4.1. 4.2. 4.3. 4.4. 4.5.

Introduction Gradualness The spread of syntactic change Drift: the question of the direction of change Reconstruction Conclusion Further reading

      

Introduction The previous chapters have outlined the general approach to syntactic change that a parameter-based approach makes possible. As we have seen, syntactic change is seen as changes in values of parameters taking place through the process of first-language acquisition. The changes are driven by the nature of the parameter-setting device, which, as it were, tries to set parameters on the basis of the primary linguistic data (PLD) as efficiently as possible, where efficiency can be defined in terms of the third factors we reviewed in the previous chapter (see §.). In particular, the drive towards efficiency of parameter-setting leads to both the preference for simplicity of postulated derivations or representations and the tendency to generalize the input: the interaction of these preferences and other third factors underlies the markedness values associated with parameters, as we saw in some detail. The general view of change, acquisition, and parameters that we have outlined up to now is the one that we will maintain in the remainder of the book. The goal of this chapter is to see to what extent this view can shed any light on, or may be challenged by, certain fairly standard 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

observations concerning syntactic change. For the purposes of this chapter, the two most important aspects of the approach to syntactic change that we have described are: (i) that it is catastrophic, in the sense that a given parameter changes its value suddenly and irrevocably, at a given historical moment; and, (ii) that it is ‘internal’ to the language acquirer, a matter of I-language, and as such in principle entirely independent of the social, cultural, or historical environment of that acquirer; all that counts is the acquirer’s linguistic environment, i.e. whatever aspect of the PLD that leads to the change in the parameter value. Both of the aspects of parametric change just described are entailed by the approach described up to now, and yet both are to a certain degree inimical to what is often assumed, explicitly or implicitly, in much work in historical linguistics. Language change very often seems to be gradual rather than in any way sudden or catastrophic, and ‘external’ forces of the kind that shape E-language—social, cultural, and historical—have often been thought to be central to understanding the nature of language change; both of these points are forcefully made in Weinreich, Labov, and Herzog (; WLH). Two important facets of gradualness are, on the one hand, the concept of lexical diffusion (the idea that changes may gradually ‘diffuse’ through the lexicon item by item, rather than affecting the entire grammar at once) and, on the other, the concept of drift (the idea that languages may change over very long periods in certain preferred directions). Both lexical diffusion and drift contrast sharply with, and might at least seem to be in conflict with, the hypothesized sudden, discrete nature of parametric change; we will discuss to what extent there truly is a conflict here in the first three sections of this chapter. What we will see is that the notions of micro- and nanoparameter as introduced in the preceding chapters (see in particular §..) can capture important aspects of diffusion, while drift can be thought of as the interaction of meso- and macroparameters. A further aspect of parametric change is the fact that it is irrevocable; this has been argued by Lightfoot (, a,b) to render syntactic reconstruction impossible. Since reconstruction has proven to be a powerful and revealing tool in historical phonology and morphology, it is natural to ask whether it can also be used in syntax. In §., I will review Lightfoot’s arguments that the nature of parametric change makes reconstruction impossible; at the same time, I will briefly review some recent work on quantifying linguistic relatedness, and how this may be adapted in the context of parametric theory for the purposes of 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

syntactic reconstruction. I will also summarize the proposals for the syntactic reconstruction of aspects of Proto Germanic in Walkden (, , ). One difficult question related to ‘external’ aspects of change has to do with the fact that ‘successful’ changes—those which have actually led to the elimination of an earlier system in favour of an innovative one in the attested record—are not confined to individuals, but concern aggregates of individuals, i.e. speech communities.1 In order for our approach to account fully for change we need to say something about how changes may spread through a speech community; this will be addressed in §.. More generally, at least the first two sections of this chapter are an attempt to deal, in the context of the theoretical assumptions regarding syntactic change that have been made in the preceding chapters, with what WLH (–) refer to as the paradoxical relationship between language structure and language history, and which they claim (–) to have been recognized by de Saussure () when the distinction between synchrony and diachrony was first made. How can we reconcile the postulation of discrete, homogeneous, algorithmic systems like generative grammars (or structuralist-style systems of elements in opposition) which are properties of individuals, with the apparently gradual change of a language at a given historical period in a given speech community? We have touched on this problem occasionally in previous chapters, but now it is time to try and deal with it systematically.

4.1. Gradualness 4.1.1. Introduction As I mentioned in the Introduction, language change, including syntactic change, appears to be gradual. For example, Kroch (: ) says: ‘[s]tudies of syntactic change which trace the temporal evolution 1 Strictly speaking, we should use a different term here, since ‘speech community’ could be taken to exclude Deaf communities which make use of sign language (which we will discuss in §.). However, I will continue to use this term owing to its familiarity, bearing in mind that it really refers to a linguistic community without prejudice as to the linguistic modality used in that community.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

of the forms in flux universally report that change is gradual’. He goes on to say: [b]efore the rise of generative grammar, this sort of gradualness was taken for granted. Syntactic change, once actuated, was conceived primarily as a slow drift in usage frequencies, which occasionally led to the loss of some linguistic forms. New forms, whether they entered the language as innovations or borrowings, would normally affect the language only marginally at the outset and then, if adopted by the speech community, would spread and rise in frequency.

Harris and Campbell (: ) state that ‘syntactic change may be considered gradual in a number of respects’, although they also point out that ‘it is not useful to consider syntactic change in terms of a dichotomy between gradual vs. abrupt’ (: ). Moreover, we noted in §.. that Vennemann considers that certain changes may take millennia to come to completion, in commenting that ‘a language may become fairly consistent within a type in about , years (for example, English)’ (: ).2 Further, Haspelmath (a: ) refers to grammaticalization as ‘gradual unidirectional change . . . turn[ing] . . . lexical items into grammatical items’. Lightfoot (: ) observes that ‘common sense rests to some degree on the notion of continuity and gradual change’, but goes on to make the very important point that ‘things depend on the units of analysis: languages, seen as social entities, change gradually, but I-languages change abruptly’. The dichotomy between I-language and E-language is crucial to understanding the apparent paradoxes and, once the distinction is properly made and understood, in many cases the paradoxes disappear. The idea that change is gradual arguably has two sources: on the one hand, the time course of change is impossible to pin down to a single historical moment. On the other hand, the emphasis on gradualness may stem in part from the influence that both geology—in the form of the uniformitarian thesis (introduced in §.), which was taken over from the work of the pioneering geologist Lyell (–)—and Darwin’s theory of biological evolution have had on historical linguistics since the nineteenth century (see Lightfoot : ; MacMahon : ff. on the relation between Darwinian thinking and historical linguistics; Morpurgo-Davies : –, –, , n. , ,  on the influence of Darwin and Lyell on nineteenth-century linguists. This comment really relates to the question of ‘typological drift’, which we will look at in §.. There is nonetheless a clearly implicit notion of gradual syntactic change. 2



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

Labov : – insightfully points out that uniformitarianism need not imply gradualism in historical linguistics, although for a long time it was thought to do so in geology).3 Both geology and the Darwinian approach to evolution emphasize the importance of gradual change over very long periods, although in the theory of evolution, ‘punctuated equilibrium’ has more recently become an influential point of view; see Eldredge and Gould (). As Kroch points out, syntactic changes are not traceable to a specific historical moment, in the way that historical events such as the end of the First World War or the fall of the Byzantine Empire are. Instead, syntactic changes seem to be temporally diffuse. For this reason, it may be thought that the entailment of the approach to syntactic change that parametric change must be abrupt, sudden, or, to use Lightfoot’s (, ) terminology, catastrophic, is problematic. However, further reflection reveals that there are compelling conceptual reasons to view syntactic change as sudden, in exactly the way we would expect given what we have seen up to now, and that there is in fact no strong empirical reason to consider that things should be otherwise, once a range of additional factors, both grammatical and sociolinguistic, are fully taken into consideration. There are two main reasons why a gradualist view of syntactic change is conceptually untenable, given the general view of syntactic change that I have been propounding here. The first was lucidly articulated in Lightfoot (). Discussing the idea that syntactic change might be a form of drift (see §.), Lightfoot points out that no change involving more than a single generation is possible, given an acquisition-driven view of change: Languages are learned and grammars constructed by the individuals of each generation. They do not have racial memories such that they know in some sense that their language has gradually been developing from, say, an SOV and towards an SVO type, and that it must continue along that path. After all, if there were a prescribed hierarchy of changes to be performed, how could a child, confronted with a language exactly half-way along this hierarchy, know whether the language was changing from type x to type y, or vice versa? (Lightfoot : ) 3 More recently, the connection between Darwin’s ideas and historical linguistics has been taken up in a different way; see in particular Longobardi et al. (), where a connection is made between parametric syntax and the ‘New Synthesis’ of genetics, linguistics, and archeology pioneered by Cavalli-Sforza (). We will discuss Longobardi’s research programme in §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

(We have already touched upon this argument of Lightfoot’s in relation to word-order change in §.., but the point applies to all types of syntactic change.) As Lightfoot points out, this raises a seemingly insuperable problem for the postulation of any notion of typological drift, a matter we return to in §.. The second point is a very simple, but conceptually powerful, one: parameter settings, like all other formal entities in generative grammar (and recall that parameter settings are no more than values of formal features, in line with the Borer-Chomsky Conjecture (BCC)—see §.., ()) are discrete entities. As we pointed out in the initial discussion of parameters in §., ‘clines, continua, squishes and the like are ruled out’. Therefore gradual change from one value of a parameter to another is simply impossible: a parameter must be in one state or the other; it cannot be in between (cf. Kroch’s :  remark that ‘the change from one grammar to another is necessarily instantaneous’). This is really a matter of logic, not linguistics: the Law of the Excluded Middle (p v ¬p) is the relevant concept. Any system which appears to be, for example, ‘tendentially VO’ (and, of course, plenty do, as one simply surveys the data) must be analysed as being one thing or the other; strictly speaking no system can be in a state intermediate between two parameter values. This conclusion holds for any approach to linguistics which makes use of discrete categories, for example, grammatical categories such as verb, noun, etc., or phonemes of the usual type. Hockett (: –), quoted in WLH () makes a similar observation when he points out that ‘[s]ound change itself is constant and slow. A phonemic restructuring, on the other hand, must in a sense be absolutely sudden’. No special claim is being made about parametric systems of the sort being assumed here in this respect. Hale (: ) makes a related point in defining change as a set of differences between two grammars. He points out that this means that ‘“change” therefore has no temporal properties—it is a set of differences’ (emphasis in original). Since gradualness is usually intended as a temporal concept in discussions of language change, a consequence of Hale’s views is that change, as he defines it (and this definition applies to parametric change), cannot be gradual. But it is undeniable that the ‘mirage’ (to use a term attributed to Mark Hale by Lightfoot : ) of gradualness is very apparent to us in our observation of the phenomena. There are several reasons for this, many of which have been pointed out (see Roberts a: ; Harris and Campbell : ; Willis : –; Hopper and Traugott : , ). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

The most obvious one concerns the time course of change: changes do not appear to be instantaneous in the historical record. Instead, they typically follow an S-shaped curve when frequency of occurrence of a new form vs. an old one is plotted against time. (This observation is due originally to Osgood and Sebeok : ; see Kroch : , WLH: , n. .) That is, changes start slowly, gather speed, and then taper off slowly again. Denison () dubbed this pattern of change ‘slow, slow, quick, quick, slow’ (see also Denison ). The S-curve is shown in Figure .. Kroch () reconciled this fact about the time course of change with the discrete nature of grammars by postulating the ideas that grammars may ‘compete’, and that one replaces another at a constant rate. We will consider Kroch’s proposals in detail below and in the next section. Alongside competing grammars, the other factors giving rise to the mirage of gradualness fall into two main kinds. First, there are sociolinguistic factors: these include the often highly artificial and restricted register of surviving texts, variation in individual usage conditioned by style, accommodation, etc., and variation in a speaker’s or writer’s age, gender, class, etc. We will consider some of these issues in slightly more detail in the next section. Second, there are factors which arise from the nature of the grammatical system. These include lexical diffusion, micro- and nanoparametric variation whose precise nature may have gone unrecognized by analysts, and true optionality. All of these factors, both sociolinguistic and grammatical, are, as far as anyone knows, independent of one another. Their combined effect is the mirage of gradualness. In other words, these factors may cushion the effects of an instantaneous, discrete, structural change in the historical record. 100

% new

80 60 40 20 0 time

Figure 4.1

An idealized graphical change

From Denison (: ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Hale (: –) distinguishes ‘change’ which, as we saw above, has no temporal properties, from ‘diffusion’, by which he means the process whereby a change spreads ‘from the innovator to (a subset of) those with whom the innovator comes into contact’ (Hale : ). As such, diffusion may have temporal properties, and the evidence of change is preserved in the historical record in the form of ‘diffusion events’, as we witness one system replacing another over time. Once again, no special claim is being made about parametric systems in this respect which would not be made about any formal system making use of discrete entities. 4.1.2. Lexical diffusion: nanoparametric change Let us now look at the grammatical factors that may ‘cushion’ change in the sense described above in more detail, beginning with lexical diffusion. The term lexical diffusion refers to the idea that a change may spread through the lexicon gradually, perhaps in the last analysis one word at a time. This view has been proposed, somewhat controversially, for sound changes (see in particular Labov : ff.). It is arguably more widely accepted as an operative concept in syntactic change, at least according to Harris and Campbell (: ). We have seen an example of lexical diffusion in syntax in our discussion of changes to psych verbs in ME in §... There, we saw how one type of OE psych construction—that in which the Experiencer is marked dative—was lost, with the original dative Experiencer being reanalysed as the subject and the original nominative Theme as the direct object. Thus, for example, OE lician, the ancestor of NE like, used to appear in examples like () (repeated from () of Chapter ): () hu him se sige gelicade how him- the victory- pleased ‘how the victory had pleased him’ (Or .; Denison : )

[OE]

We observed that this verb has undergone a redistribution of its thematic roles (although not really a change in meaning, since the core meaning has involved causing pleasure all along), associated with the loss of the construction in (). Allen () argues that the changes in the psych verbs were changes in lexical entries of individual verbs, which diffused through the lexicon over a considerable period. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

She argues (ff.) that the beginnings of this change may be discerned in the optional assignment of dative case to the Experiencer arguments of certain verbs in OE and that the change was completed only by approximately . One piece of evidence for this is the innovation of new verbs with dative Experiencers as late as the fourteenth century; Allen (: ff.) gives examples of this with ought and need. Allen argues that the loss of morphological case distinctions in ME facilitated but did not cause this change. We also saw that the individual changes may be treated as nanoparametric, in that they involve specifications of individual lexical items. To be consistent with the BCC in involving changes to functional rather than lexical items, we suggested that the changes involved the gradual reduction in distribution of vExp (a variant of v which licenses a Dative Experiencer argument in SpecVP; see the discussion of () in §..). The change is sensitive to properties of lexical items since vExp can only occur where V has an Experiencer argument, i.e. with psych verbs. Hence the change is indirectly lexically conditioned, but involves a change in the formal features of the functional head v, from licensing Dat to having φ-features which Agree with the direct object. So the changes diffuse through a well-defined subclass of lexical items; see Kiparsky (: f.) for a discussion of this in relation to phonological change. The important point for our present purposes, however, is that lexical diffusion seen as nanoparametric change can definitely give rise to gradual change: if we consider the change in psych verbs in English as a single change, then we have to say that it went on essentially throughout the entire ME period. However, strictly speaking, looking at the grammatical systems underlying the E-language ME, there was a series of discrete reanalyses across a subclass of lexical verbs as the occurrences of vExp were reanalysed one by one; there is no reason to treat these as gradual (and, indeed, reason not to treat them as gradual, since they affected discrete formal entities: the features of v). These changes, especially taken individually, had a far smaller effect on the derivations and representations produced by the grammar: changing a feature of C or T will affect any clause, but changing a feature of v associated with a psych verb has a considerably smaller effect. We conclude, then, that apparent lexical diffusion is really BCCcompliant nanoparametric change. Although each individual change is discrete, if we speak (loosely) of ‘changes in ME psych verbs’, the changes take place gradually over a long period. There is no 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

contradiction here; as we quoted from Lightfoot above ‘things depend on the units of analysis’. 4.1.3. Microparametric change In §.. () we defined microparameters as involving variation in the formal features of a small subclass of functional heads. As such, microparameters are associated with fine-grained synchronic variation; again, apparently gradual diachronic change may arise as those features alter their values. A case in point, which is of considerable general interest, concerns the phenomenon of ‘auxiliary selection’ in Romance languages, particularly in Central and Southern Italian dialects. The term ‘auxiliary selection’ refers to the choice of the equivalent of ‘have’ or ‘be’ as the auxiliary for compound tenses such as the perfect. Henceforth, I will use the notations HAVE and BE to designate the respective auxiliaries in a cross-linguistically neutral way. In Standard Italian, as in German and Dutch, the choice of auxiliary depends on the argument structure of the main verb: if the verb is transitive or an unergative intransitive, the auxiliary is HAVE; if the verb is an unaccusative intransitive, passive or marked with the clitic si (which has various uses we will not go into here), the auxiliary is BE. These facts are illustrated in () and ():4 () a. Maria ha visto Gianni. (transitive)[Standard Italian] Maria has seen-.. Gianni ‘Mary has seen John.’ b. Maria ha telefonato. (unergative intransitive) Maria has phoned-.. ‘Mary has phoned.’

4 Where the auxiliary is BE in the examples in (), the participle agrees with the subject. But where the auxiliary is HAVE, the participle does not agree with the subject, and here shows the default masculine singular form. In the discussion to follow, I will largely leave aside the important question of the relation of participle agreement to auxiliary selection, not least because the Central and Southern Italian dialects show a range of at least partially independent variation in this respect; see Ledgeway (forthcoming); Ledgeway and Roberts (a: ‒); Roberts (a: ‒) and references given there.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

() a. Maria è arrivata. (unaccusative intransitive) [Standard Italian] Maria is arrived-.. ‘Mary has arrived.’ b. Maria è stata accusata. (passive) Maria is been.-. accused-. ‘Mary has been accused.’ c. Maria si è accusata. (reflexive) Maria SI is accused-. ‘Mary accused herself.’ (The distinction between unaccusative and unergative intransitives was introduced in §...) If we suppose that aspectual auxiliaries are always merged in v, then we can state the generalization for Standard Italian in terms of the notion of ‘defectivity’ of v in the sense of Chomsky (: –), which we introduced in §... Where v lacks what we called the ‘Burzio properties’ of Agreeing with the Case feature of the direct object and assigning a thematic role to the subject, it is defective. (In the earlier discussion we took it that v could also be absent, but we will see directly that this cannot be maintained here.) The generalization regarding Standard Italian auxiliary selection can be stated as in () (this is an updating and simplification of Burzio : ff.): ()

v*Perfect = HAVE; vPerfect = BE

The ‘perfect’ feature here may well be a selectional feature, stating that the VP complement to v must be headed by a perfect participle; this feature must be present on all the instances of v discussed here, and so we will leave it out from now on. As we said, v* denotes a non-defective v, one capable of Agreeing with the direct object’s Case and assigning an external thematic role to the subject. In other words, v* is shorthand for two features: an external thematic role and a feature Agreeing with an internal Case. Moreover, if we assume that the direct object’s structural Case feature Agrees with v*’s person and number features (see §..), we see that defectivity is signalled by a rather complex set of



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

features. Auxiliary selection in Standard Italian is conditioned by this set of features.5 In languages like Modern English or Modern Spanish, where HAVE is the sole perfect auxiliary but where BE is used for the passive, the features of v which determine which auxiliary is merged (or which determine when HAVE is merged—see note ), relate rather simply to voice and aspect. We can formulate the generalization as follows: ()

v[active] = HAVE.

The statement in () is clearly simpler than that in (), given that the notion of non-defectivity of v refers to a thematic feature and a set of person/number features. It is worth pointing out in this connection that both Spanish and English have lost an Italian-style auxiliaryselection system in their recorded histories (see Penny : , Loporcaro : ff., Aranovich , Rosemeyer  on Spanish; Mateu  on Catalan and Spanish; and Denison : –, McFadden and Alexiadou ,  on English).6 We can see by comparing () and () that this change involved a simplification of the specification of the features of v required for the merger of HAVE. Across Romance, and in particular in the Central and Southern dialects of Italy, we see a range of diachronic and synchronic variation in the factors governing the choice of aspectual auxiliary (see Ledgeway : ‒, ‒, for references). Ledgeway and Roberts (a: ) put forward the hierarchy in ():

5 HAVE-selection is naturally seen as the marked option. There are several reasons to think this. First, HAVE-auxiliaries are cross-linguistically rather rare; in Indo-European they are not found in Celtic or Slavonic (with the exception of Macedonian (David Willis, p.c.)), or in Hindi (Mahajan ), for example. Second, any context where HAVE is found corresponds to one where BE can be found in some other language, but not vice versa. For example, HAVE is never, to my knowledge, the basic passive auxiliary. In all the languages mentioned here, this is always BE. Third, we can observe that the context for merger of HAVE has a longer description than that for BE. BE can thus be considered to be the default auxiliary. Accordingly, all we need to do in order to give an account of auxiliary selection is specify the context where HAVE is merged. I will follow this practice for the remainder of this discussion. 6 Swedish (Faarlund : ), Portuguese, and Romanian (Loporcaro : ) have also undergone this change.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

() 1. Does L present auxiliary alternation? No: Modern English

Yes: 2. Sensitive to mood?

Yes: Romanian (7)

No 3. Sensitive to tense?

Yes: Sanleuciano (8)

No 4. Sensitive to person?

Yes: Ariellese (9a‒b) No 5. Sensitive to argument structure? Yes: Standard Italian (2)‒(3)

The hierarchy in () reveals five broad dimensions of microparametric variation in auxiliary distribution, the markedness and complexity of which grows as we move down the hierarchy. Option () makes the simplest distinction between languages which do not show any alternation in the perfective auxiliary on the one hand and those displaying varying patterns of alternation between BE and HAVE on the other. Of course, Modern English exemplifies this option, which we take to be the simplest. In Italo-Romance, certain varieties of the extreme south also have this pattern.7 Ledgeway and Roberts (a: ) claim that ‘[i]f a variety does present auxiliary alternation, then as indicated in [()] this variation can, in order of complexity, be determined by mood, tense, person, and argument structure’. In Romanian, auxiliary choice is determined by the realis vs. irrealis mood distinction: the HAVE auxiliary appears in the present indicative (a) and the invariable BE auxiliary appears in the present subjunctive (b) (as well as in the future and conditional perfect and the perfect infinitive): 7

A HAVE-only system of this kind is not the least marked option. Given what we said in note , that would be a BE-only system. This is found in Welsh and in many Slavic languages lacking HAVE. In Italo-Romance, it is found in many central-southern dialects of Italy (e.g. Pescolanciano, illustrated in Ledgeway and Roberts a: ‒; see also Tuttle : ‒, Manzini and Savoia , II: ‒, Legendre : ‒, Ledgeway : ‒).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

() a. Am mâncat. Have.. eat. ‘I have eaten.’ b. Vor fi mâncat. Will. be. eat.ppt ‘They will have eaten.’

[Romanian]

In the Campanian dialect of San Leucio del Sannio in Southern Italy (Iannace : ‒, f.; Ledgeway : f.) auxiliary distribution is sensitive to tense distinctions. The present perfect (a) shows HAVE and the pluperfect indicative shows BE (b): () a. Èggio fatto tutto. [Campanian of S. Leucio di Sannio] Have.. do. all ‘I have done everything.’ b. Erem’ auta dice quéllo. Be.. have. say. that ‘We had had to say that.’ Ledgeway and Roberts place sensitivity to mood higher in the hierarchy in () than sensitivity to tense on the assumption that, while the basic modal distinction is simply Realis, tense involves a distinction between Past, Future, and Anterior (Vikner ; Cinque : ‒). Person-driven auxiliary selection is common in Central and Southern Italo-Romance. One such variety is the Eastern Abruzzese dialect of Arielli (D’Alessandro and Ledgeway ; D’Alessandro and Roberts ). Here BE appears in first and second persons and HAVE in third persons. This is illustrated in the singular persons in (): () So /si /a fatecate. [Eastern Abruzzese of Arielli] Be.//have. work. ‘I/you/(s)he have/has worked.’ As Ledgeway (forthcoming) shows, auxiliary systems that operate fundamentally in terms of person and argument structure distinctions frequently blend these with modal and temporal restrictions to produce increasingly marked and complex proper subsets of person and verb class combinations. For example, the South-Eastern Marchigiano dialect of Ortezzano (Central Italy) combines all three factors, showing the first/second vs. third Person split with argument-structure 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

sensitivity in the third Persons of the present perfect only and BE generalized elsewhere. Ledgeway and Roberts emphasize that the modal and temporal contrasts in such systems do not override the person or verb class distinctions. This justifies placing person and argument structure lower than mood and tense in (). As they say ‘mood and tense as independent determinants of auxiliary variation are not constrained, at least in Romance, by person and argument structure, whereas person and argument structure as independent dimensions of auxiliary selection are frequently augmented by the incorporation of restrictions relating to mood and tense’ (Ledgeway and Roberts a: ‒). Considering the diachronic implications of (), Ledgeway and Roberts observe that the historically most conservative pattern of auxiliary selection is that determined by argument structure (cf. Vincent ; Bentley ; Ledgeway : ‒; Adams ). Placing this option at the bottom of the hierarchy implies that the changes giving rise to other patterns in the history of Romance involve movement up the hierarchy; as such, they involve movement from more to less marked options, implying that the auxiliary-selection rule in () is the most complex option (see the discussion of markedness and parameter hierarchies in §.). Aside from the fact that argument-structure restrictions often combine with person, tense and mood restrictions as just noted, we pointed out above that ‘v*’ is really an abbreviation for a complex of Case/Agree and thematic properties, the ‘Burzio properties’ of licensing both an external argument and Accusative Case. Significantly, no movements downwards from, say, mood-driven auxiliation to person-auxiliation are observed. We saw in §. that movement upwards in a parameter hierarchy may involve a ‘leap’ (what Biberauer and Roberts  called ‘instantaneous simplification’). Ledgeway and Roberts point out that Dragomirescu and Nicolae () and Ledgeway () show that the shift from an original argument-structure-based split to a mood-driven system in the history of Romanian was a ‘saltational’ change, with no evidence of auxiliary variation having first passed through intermediate person- and tensedriven splits. A particularly good example of both synchronic and diachronic variation in this domain is the variation in the Neapolitan area discussed by Ledgeway (: ff.). According to Ledgeway, Literary Neapolitan has a system like that of Standard Italian, where auxiliary 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

selection is determined by argument structure. Urban Neapolitan, on the other hand, has a system like that of English or Spanish, where HAVE is always found in the active. Third, peripheral varieties (Torre del Greco, Torre Annunziata, Pompei, Sorrento; see Ledgeway :  and the references given there) show the pattern where BE appears where the subject is in the first or second person and HAVE where it is in the third person. These systems are illustrated in ()–() (Ledgeway’s ()–(), ):8 () Literary Neapolitan: a. Aggiu visto a Ciro. I-have seen A Ciro ‘I have seen Ciro.’ b. So’ arrevato. I-am arrived ‘I have arrived.’

(transitive)

(unaccusative)

8

Ledgeway discusses two further varieties: Procidano, where the choice of auxiliary is determined by tense, and an obsolescent urban variety, where auxiliary selection is ‘determined by a combination of grammatical person and clitic-doubling’ (Ledgeway : ): (i)

Procidano: a. Hó visto a Ciro/arrevèto. I-have seen A Ciro/arrived ‘I have seen Ciro/arrived.’ b. Fove visto a Ciro/arrevèto. I-had seen A Ciro/arrived ‘I had seen Ciro/arrived.’

(ii) Obsolescent urban dialect: a. Aggiu visto a Ciro. I-have seen A Ciro/arrived ‘I have seen Ciro/arrived.’ b. ‘O so’ visto a Ciro. him I-am seen A Ciro ‘I have seen Ciro.’ c. (L’) ha visto a Ciro. him he-has seen A Ciro ‘He has seen Ciro.’ In (iib,c) we see clitic-doubling of the direct object. The a marker preceding the direct object here and in ()–() is comparable to that found in Spanish, and is another widespread feature of Central and Southern Italian dialects—see Ledgeway (: ch. ) and the references given there. The contrasts here show that the person-driven auxiliary alternation only appears where the direct object is clitic-doubled.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

() Urban Spoken Neapolitan: a. Aggiu visto a Ciro. I-have seen A Ciro ‘I have seen Ciro.’ b. Aggiu arrevato. I-have arrived

(transitive)

(unaccusative)

() Peripheral varieties: a. So’ visto a Ciro/arrevato. (first/second person subject) I-am seen A Ciro/arrived ‘I have seen Ciro/arrived.’ b. Ha visto a Ciro/arrevato. (third-person subject) s/he-has seen A Ciro/arrived ‘s/he has seen Ciro/arrived.’ Ledgeway argues that the urban variety is innovative in relation to the literary one. Here we see the same change taking place as that which has happened in English, Spanish, and the other languages mentioned in note . As we said above, this can be seen as a simplification in the environment for merger of HAVE. The peripheral varieties also represent an innovation (see Rohlfs : ; Tuttle ; Bentley and Eythórssen ). Again, we see upward movement in the hierarchy in () here. We see then that the variation in auxiliary selection in the Central and Southern Italian dialects, and elsewhere in Romance, can be reduced to variation in the features of v associated with merger of the HAVE-auxiliary; this is a case of complex microparametric variation which can be handled in terms of the general approach to parametric variation assumed here, in particular Ledgeway and Roberts’ hierarchy in (). Alongside synchronic microvariation in auxiliary selection (witness Ledgeway’s description of the variation in the Neapolitan area), we find seemingly gradual diachronic change: this is a further instance of apparent gradualness which can be reduced to discrete parameter settings, i.e. the formal features of functional categories. 4.1.4. Formal optionality The final way in which apparent gradualness may come about as a consequence of properties of the grammatical system itself has to do 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

with optionality. If grammars allow for true formal optionality, with no semantic consequences, then surface variants may exist (without grammar competition, since by assumption there is just one grammar underlying the variants). Random fluctuations in usage may then give rise to an impression of gradual change. But in fact in such a case there would be no grammatical change at all, simply a difference in the use of one or another form that grammar makes available. I will return to this last point in the next section, looking at how social factors may interact with formal, grammatical options. Optionality has been seen as problematic in the context of minimalist syntax, since a central tenet in minimalism has been the idea that formal operations only apply when forced to: they are either obligatory, for example in the presence of the relevant triggering feature (such as Internal Merge being triggered by an EPP feature, or Agree by the presence of uninterpretable features), or, where the triggering feature is absent, they are impossible. This kind of approach appears to leave little room for optionality, beyond, of course, optionality in the choice of lexical items.9 However, it is not quite correct to conclude that true optionality within a single grammar is absolutely impossible. Biberauer (), Biberauer and Roberts (), and Richards and Biberauer (), Biberauer and Richards () have shown that in the context of the versions of minimalism presented in Chomsky (, , , , ), true optionality is a technical possibility. Moreover, Biberauer and Richards () demonstrate that this possibility is in fact attested. (See also the discussion of movement of definite direct objects in ME in §...) Briefly, the technical possibility of optionality derives from the dissociation of the operation eliminating uninterpretable features, Agree, from the triggering of movement by means of EPP features. (In earlier versions of minimalism, for example in Chomsky , , these were conflated.) One of the principal ways in which movement is triggered is in the case where the Probe of Agree has an EPP feature. In that case, the Goal of Agree is attracted to the Probe. 9 Of course, one could always postulate optional EPP or uninterpretable features. But if these are the features which define parameters, and if different grammars are defined in terms of whether and how they differ in parameter values, then postulating optional features in this way is equivalent to postulating different grammars. Our concern here, however, is with the possibility of optionality in a single grammar.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

However, and this is the central point in the present context, nothing prevents the category which moves from being larger than the Goal; all that the system requires is that the Goal either be exactly what is moved, or be contained in what is moved. More generally, movement involves the abstract configuration in (): ()

. . . X[+EPP] . . . [YP . . . Z . . . ] . . .

In this situation, in principle it does not matter whether Z or YP moves to X. Therefore the possibility of true optionality arises.10 This is purely formal optionality, with no semantic consequences at all. Biberauer and Richards () discuss a number of cases where exactly this kind of optionality appears to obtain. One such case concerns wh-movement in Russian. In this language, a wh-pronoun may move alone (off the ‘left branch’ of the wh-DP) or the entire wh-DP may move: () a. Č’ju knigu ty čital? whose book you read b. Č’ju ty čital knigu? whose you read book ‘Whose book did you read?’ (Biberauer and Richards , ())

[Russian]

Here C triggers movement of a wh-phrase to its specifier; in (a) the wh-element pied-pipes the larger DP, č’ju knigu (‘which book’), while in (b) just the wh-expression č’ju moves. In English (and many other languages), the equivalent of (b) is ungrammatical (*Whose did you read book?). Biberauer and Richards propose that this is because English wh-expressions are determiners, i.e. Ds, while wh-movement must involve movement of a phrasal category. In Russian, wh-expressions like č’ju are Quantifier Phrases (QPs), and so are able to move alone to SpecCP. So we see that grammars may, under certain conditions, allow true optionality. In that case, if the two variants are attested at gradually varying frequencies over time, no grammatical change is at work, but 10 It may also be the case that one or other option is excluded as a parametric choice, but this choice has to be determined by something else in the grammar or we would again have two grammars—see the previous note. Richards and Biberauer () and Biberauer and Richards () discuss this point in detail, and show how optional pied-piping in the Germanic languages which have it is determined by independent aspects of the grammar.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

simply a gradual shift in choice of options. A potential example of the diachronic situation, unfortunately lacking in full documentation for the relevant period, comes from the history of Greek. Biberauer and Richards (: ) show that Ancient Greek displayed the same options as those seen in Russian regarding wh-movement. However, Modern Greek is like English in disallowing the equivalent of (b). Their data and the account of the development in Greek is based on Mathieu and Sitaridou (), summarized in §..; see (), () there. They argue that this indicates that the Greek wh-expressions have been historically reanalysed from QP to D, in line with the proposals in Mathieu and Sitaridou () and Roberts and Roussou (: –). If, at some intermediate stage of Greek, a gradual preference for the equivalent of (a) over (b) is attested, this would be exactly the case in point. Horrocks (: ) dates the reorganization of the article system to spoken Byzantine Greek, and so in principle we might expect to find this gradual shift towards a preference for (a) in this period, or just before (see also Manolessou ). Latin also allowed both multiple wh-movement (see §.., ()) and examples comparable to (b); it is well-known that the article system of Modern Romance is a post-Latin development (see Vincent ; Giusti ). It is important to see that while the grammar, to use Biberauer and Richards’ formulation, ‘doesn’t mind’ which option is taken, other factors—including those of a sociolinguistic nature—may determine the choice made by a given speaker at a given historical moment. I will come back to this last point in the next section. 4.1.5. The Constant Rate Effect Finally, let us return to Kroch’s seminal work on the Constant Rate Effect and consider the proposal in more detail. Kroch () identified the Constant Rate Effect for the first time. The Constant Rate Effect claims that ‘when one grammatical option replaces another with which it is in competition across a set of linguistic contexts, the rate of replacement, properly measured, is the same in all of them’ (Kroch : ). Kroch goes on to point out that ‘the grammatical analysis that defines the contexts of a change is quite abstract’ (), and that we must therefore ‘look for causes of change at more abstract levels of structure’ than simply contextual effects (). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

Kroch illustrates the Constant Rate Effect with a number of cases, the most striking of which is the development of periphrastic do in Early Modern English. Since this is connected to the change in the value of the V-to-T parameter discussed in §..., §.., §.., and §.., let us now look in detail at this. The starting point of Kroch’s demonstration of the Constant Rate Effect is the observation we reported above that changes tend to follow an S-shaped curve when the frequency of new vs. old forms is plotted against time (see Figure . above). The S-curve can be mathematically modelled by a function called the logistic. This function permits one to determine, as a linear function of time, the values of two quantities s and k. The first of these, s, is the ‘slope’ of the function and ‘hence represents the rate of replacement of the new form by the old’ (Kroch : ). The second, k, is the ‘intercept’ value, which measures the frequency of the new form at some fixed point in time. What is of interest for the Constant Rate Effect are the values of s; as Kroch says ‘[b]ecause fitting empirical data to the logistic function will allow us to estimate the slope parameters for each context of a changing form, we can determine, where sufficient data are available, whether the rates of change in different contexts are the same or different’ (Kroch : –). The Constant Rate Effect summarizes the empirical result that they are the same. Kroch bases his study of the rise of periphrastic do on Ellegård’s () survey of the instances of this element in texts for the period –. Ellegård’s data is summarized in the graph in Figure .. As Kroch (: ) points out, the curves are approximately S-shaped up to Ellegård’s Period , –.11 Kroch (: ) observes that it ‘seems plausible to hypothesize that the point of inflection in Period  corresponds to a major reanalysis of the English auxiliary system’; this is the loss of V-to-T movement. Kroch applies the logistic to the data underlying Ellegård’s graph in Figure . and arrives at the following values for the slope parameter for Periods – (the contexts are those employed by Ellegård):

Ellegård’s periods are as follows: Period  ‒; Period  ‒; Period  ‒; Period  ‒; Period  ‒; Period  ‒; Period  ‒; Period  ‒; Period  ‒; Period  ‒; Period  ‒. See Kroch’s table , p. . 11



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

% 90 80 70 60 50 40 30 20 10

1500

1600

1700

Figure 4.2 Auxiliary do. Percentage of do forms in different types of sentence, – Upper broken line: negative questions Upper solid line: affirmative questions Lower broken line: negative declarative sentences Lower solid line: affirmative declarative sentences Adapted from Ellegård, The Auxiliary Do. University of Gothenburg, ; see also Lenoch (: ).

() Negative declaratives: . Negative questions: . Affirmative transitive adverbial and yes/no questions: . Affirmative intransitive adverbial and yes/no questions: . Affirmative wh-object questions: . Kroch points out that the single best slope for all five contexts has a value of . and that the probability of random fluctuations giving rise to the deviations from this value observed in () is . (χ2 = .). As he says, the results ‘support the hypothesis that the slopes of the curves are underlyingly the same and the observed differences among them are random fluctuations’ (). This shows that the grammar with do-insertion is replacing the grammar with V-to-T movement at a constant rate in all contexts, a conclusion which justifies the postulation of something like the V-to-T parameter of §..., as well as the idea that this parameter was changing its value in the sixteenth century (see §...). Of course, we still need to understand what it means for the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

grammars to compete; we will come back to this in the next section. Kroch goes on to confirm this conclusion by comparing the rate of replacement of V-Adv order with Adv-V order (the adverb in question is never); recall that this is a further indication of the loss of V-to-T movement.12 It emerges that the slope of the curve plotting the replacement of V-Adv by Adv-V has very nearly the same value as that plotting the rise of do in questions and negatives. He concludes ‘we have here substantial evidence that all contexts reflecting the loss of V-to-I raising [i.e. V-to-T raising—IGR] change at the same rate’ (). A well-known feature of sixteenth-century English, which we commented on in §.., is the fact that periphrastic do could also appear in positive declarative contexts, where it is no longer available (without emphasis) in Modern Standard English (see §.., ()). Kroch shows that this occurrence of do changed at the same rate as do in the other contexts up to Period . After this, however, the use of positive declarative do declines, while its use increases in the other contexts.13 Kroch suggests that this is because positive declarative do starts to compete with affix hopping (i.e. the Agreeφ relation between T and V—see §..) once V-to-T is lost, i.e. after Period , and ‘loses’ this competition. Hence the development of positive declarative do does not follow an S-curve and does not, after Period , change at a constant rate with the other contexts for periphrastic do. Although not required by Kroch’s assumptions, we can take ‘grammatical option’ in the statement of the Constant Rate Effect to refer to a parametric option. This is justified by the fact that the Constant Rate Effect can reveal clusterings of surface changes of the type that clearly indicate change in a single underlying parameter. As we have just seen, in Kroch’s discussion of the rise of periphrastic do, we are witnessing the change in the value of the V-to-T parameter. We can thus formulate the Constant Rate Effect in parametric terms as follows: () Value vj of parameter Pi replaces value vi6¼j at a constant rate. 12

There is a complication in that ME allowed Adv-V order as well. See Kroch (: –) for a discussion of the problem and a proposed solution. See also Haeberli and Ihsane () and §.... 13 Although not monotonically: the interrogative contexts continue to change together and to increase in frequency, but negative declarative do decreases for the period immediately after , and only picks up again after . Kroch suggests, very plausibly, that this was caused by independent changes in the syntax of not. This is discussed further in §.. below and in Roberts (a: –).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

It is important to see that the Constant Rate Effect implies that one parameter setting will be in competition with another for a certain period; as already mentioned, I will come back to this notion of competing grammars (differing in the value of at least one parameter) in more detail in the next section. The Constant Rate Effect is of great interest for two reasons. First, it reduces the observed gradual replacement of one form by another to competing grammars. Since distinct grammars are distinct entities, defined as differing from one another in at least one parameter value, the appearance of gradual change really reflects one discrete entity gradually replacing another, either in a speech community or in an individual. The gradualness is then an effect either of spread through a population, or of some factor causing an individual endowed with competence in more than one grammar to access one of these grammars more readily than the other(s) over time. In these terms, () can be reformulated as follows: (0 ) In a given speech community/individual, grammar Gi with parameter Pi set to value vi replaces grammar Gj with parameter Pi set to value vj6¼i at a constant rate. One might wonder why (0 ) should hold. It is unlikely to be a fact about the grammars themselves. Instead, it is plausible that it may be a fact either about speech communities or about the ways in which individuals choose among grammars available to them. As such, it may be attributable to sociolinguistic factors or to the dynamics of populations, or both factors acting in tandem. Kauhanen and Walkden () propose an answer to this question. They argue that the Constant Rate Effect is indeed a fact about how individuals choose among grammars available to them, as suggested above. More precisely, they propose that the Constant Rate Effect ‘occurs because of contextspecific production biases which serve either to promote or to hinder an underlying change in progress’ (Kauhanen and Walkden : ). The result of their investigation is ‘a first step towards a mechanistic model of the CRE [Constant Rate Effect, IGR] that not only describes the diachronic phenomenon but explains it by deriving it from independently plausible first principles of language acquisition and use’ (Kauhanen and Walkden : ). The principles of language acquisition in question amount to a version of Yang’s (a,b) variational learning model described in §... The principles of language use 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

GRADUALNESS

involve production biases in relation to contexts for which Kauhanen and Walkden provide a formal expression. However, Kauhanen and Walkden do not provide an exhaustive account of what the production biases may amount to: for syntactic change, they mention processing preferences of the kind discussed by Hawkins (, ) and the possible effects of Gricean pragmatic inferencing (we saw in §. that Breitarth, Lucas, and Willis  introduce pragmatic factors into their account of Jespersen’s Cycle). They go on to demonstrate that their model can distinguish genuine cases of the Constant Rate Effect (including the emergence of English periphrastic do as discussed by Kroch  and summarized above) from various ‘false positives’. The second reason that the Constant Rate Effect is of great interest is that it can provide direct evidence of the clustering effect we expect from parametric change. The ‘different contexts’ Kroch refers to are the different surface manifestations of a given parameter setting. We saw this in detail with the example of periphrastic do above. In principle, it could be repeated with other examples of parameters. For example, Kroch (: –) discusses the loss of V in French; see §.... Willis (: ) makes this point regarding the Constant Rate Effect, and observes that what he calls the ‘uniform diffusion’ of the consequences of a parametric change manifest themselves as the Constant Rate Effect. It is worth pointing out that a statistical treatment of the data connected to a parametric change of at least the level of sophistication and detail of Kroch’s study may be needed in order to demonstrate the uniform diffusion of a parametric change; the textual record may very well not wear the effects of parametric change on its sleeve. 4.1.6. Conclusion In this section we have seen several reasons to think that the observed gradualness in the time course of syntactic change is illusory. This is so since grammars rely on discrete entities, the relevant one for our purposes being parameter values, and because language acquirers can have no information as to ‘ongoing changes’ in the PLD to which they are exposed. The illusion of gradual change can be traced to a variety of factors, both grammatical and sociolinguistic. Here I have concentrated on the grammatical factors: lexical diffusion, construed as 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

nanoparametric change, microparametric variation, and true optionality. I also illustrated, following Kroch (), the Constant Rate Effect, showing how this reduces gradualness in syntactic change to extragrammatical grammar competition. It is now time to review the nature of this grammar competition in more detail, and to consider the sociolinguistic factors which influence the time course of syntactic change. More generally, we must address the role of ‘external’ forces (i.e. aspects of E-language external to the language acquirer other than the PLD itself) in syntactic change.14

4.2. The spread of syntactic change 4.2.1. Introduction The central question that I want to address in this section is: how does a parameter change spread through a speech community? An account of this is necessary in order to give a full description of the nature of ‘successful’ syntactic changes, i.e. those whose result is the complete replacement of an old grammar by a new one. The internalist account of parameter setting as being a consequence of aspects of the firstlanguage acquisition process, which we described in the preceding chapter, gives us a very interesting perspective on change at the individual, I-language level, but, on its own, it cannot tell us about how changes affect speech communities, i.e. E-language aspects of change. This point is really at the heart of what WLH (ff.) refer to as the paradoxes of historical linguistics; discussing Paul’s () account of

14

Maria-Teresa Guasti (p.c.) points out that there is evidence for gradual change, even in the usage of a single child, in first-language acquisition: she presents data showing the gradual time course of the loss of both early null subjects and root infinitives (see figures ., ., and . in Guasti : , ). It would be very interesting to know if the Constant Rate Effect can be observed in these cases. It may be possible to appeal to competing grammars in language acquisition, as suggested in Clark and Roberts () and Yang (, a,b, , ), in order to account for this. It is also possible that these facts further demonstrate that language acquisition is fundamentally different from language change, owing perhaps to the effects of maturation. If maturation can be shown to be at work in first-language acquisition, then the uniformitarian thesis does not hold in this domain, whereas we are following the standard assumption that it does in language change. (See §. and §...)



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

language change, they state that ‘[i]n isolating the language of the individual from the language custom of the group, Paul developed a dichotomy which was adopted by generations of succeeding linguists and which lies . . . at the bottom of the twentieth-century paradoxes concerning language change’ (). We can put this issue in terms of the distinction between the actuation and the various aspects of the implementation of change as described by WLH (–). The most important aspect of implementation for our purposes is the transition problem: ‘the intervening stages which can be observed, or which must be posited, between any two forms of a language defined for a language community at different times’ (WLH: ). The actuation of a change is the introduction of a novel form; the transition of a change is the spread of that form through a speech community. The approach to syntactic change as driven by first-language acquisition that we have advocated here arguably provides an account of the actuation of syntactic change (although WLH: – make some critical remarks about Halle’s () similar proposals regarding phonological change; I will return to these below), but says nothing about the transition problem. That is what I want to address here. We begin by taking up again Kroch’s notion of grammars in competition. We will scrutinize this more closely, looking at grammar competition both in the speech community and in the individual. In this context, I take up the question posed at the end of the previous section: what are the Constant Rate Effect and S-curve describing change really properties of? Here I also introduce the concept of ‘syntactic diglossia’: the idea that individuals, and therefore speech communities, may synchronically instantiate several grammatical systems, with only minimal phonological and lexical variation. This leads naturally to a discussion of code-switching. Diglossia—syntactic or otherwise—and code-switching are cases of what WLH () call ‘orderly differentiation’, which they take to be a key notion in understanding language change. Orderly differentiation can include both coexistent systems—languages or dialects in contact— or variation within a single system, and it may be keyed either to other aspects of the system or to sociological factors such as age, sex, class, and ethnicity.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

4.2.2. Orderly differentiation and social stratification A well-known example of orderly differentiation involving the social factors related to a sound change comes from the incidence of final and preconsonantal (r)15 in New York City (see Labov ; WLH: –). Figure . shows the frequency of final and preconsonantal (r) (i.e. the pronunciation of car as /kar/ as opposed to /ka:/) in the casual speech of adult native New Yorkers in the mid-s. The horizontal axis plots age and socio-economic class; the vertical axis the percentage of (r). As WLH () point out, this figure ‘shows an increase in the stratification of (r): the distance between the upper middle class and the rest of the population is increasing’. They further point out that the pronunciation of (r) ‘has evidently acquired the social significance of a prestige pronunciation’ for younger speakers, while for older speakers ‘there is no particular pattern to the distribution of (r)’

SOCIAL CLASS 6–8: Lower middle 9: Upper middle (20–29 yrs) 9: Upper middle 3–5: Working class 0–2: Lower class

70

% constricted [r]

60 50 40 30 20 10 0 Casual Speech

Figure 4.3

Careful Speech

Reading Style

Word Lists

Minimal Pairs

Social stratification of (r) in New York City

From Labov (: ).

15 Sociolinguistic variables having to do with phonetic or phonological variation are commonly written in normal parentheses, as opposed to the square brackets used for phonetic transcription and the obliques used for phonemic transcription.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

(). Here we see an example of a linguistic variable taking on social value; Labov also demonstrated how speakers may be conscious of such social value.16 Labov (, , , ) has shown very convincingly how social factors influence the spread of sound change through a speech community, and so it is natural to ask whether they also influence syntactic change. In fact, one of Labov’s pioneering studies concerned negative concord in African-American Vernacular English (AAVE). Labov () makes a number of striking observations concerning negative concord, and other aspects of the syntax of negation, in AAVE. The most important observations are (i) that negative concord can manifest itself across clauses, in that formal, clausal negation in a complement or adjunct clause can be construed as an instance of negative concord with clausal negation in a higher clause; and (ii) negative concord can affect the subject. (Thus AAVE is a strict negative-concord language in the terminology introduced in §...) These two phenomena are illustrated in (): () a. Well, wasn’t much I couldn’t do. (Derek; Labov : ) b. Down here nobody don’t know about no club. (William T., , Florida; Labov : ) The two phenomena combine to give rise to examples of the following type: () a. It ain’t no cat can’t get in no coop. b. When it rained, nobody don’t know it didn’t. Each of these examples actually contains a single logical negation (‘No cat can get into a coop’; ‘nobody knew it did’), the further expressions of formal negation being reflexes of the negative-concord system of this variety. In the terms we introduced in §.., it must be the case that AAVE clausal negation (usually expressed by an auxiliary combined with n’t) has an uninterpretable Neg feature. The non-clause-bounded negative concord seen in () poses a problem for the Agree-based 16 Labov’s original data, published in , was collected in  and was replicated by Fowler (). She found that, overall, little had changed in the social stratification of this variable in the intervening years. Labov (: ) concludes that this change is still in the initial ‘slow’ change of the lower part of the S-curve, saying ‘at some point in the process we must expect a sudden acceleration’ ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

analysis of negative concord described in §.. (largely based on Zeijlstra ); see the comments in Roberts (a: ‒). Labov points out that in examples like those in () we observe two changes. The first is that negative concord has ‘lost its emphatic character’ in AAVE, and the second is its spread into the new environment, in that it extends to clausal negation in lower clauses. The ‘loss of emphatic character’ is seen partly through the existence of other devices to indicate emphatic negation in AAVE, as follows: () a. Introduction of ‘extra’ quantifiers: She ain’t in no seventh grade. b. ‘Free floating negatives’: But not my physical structure can’t walk through that wall. c. Negative inversion: Ain’t nobody on the block go to school. d. Concord with new quantifiers: Don’t so many people do it. Labov argues that even Standard English has negative concord in emphatic contexts (pace the rather simplified discussion of English in §..), and so the loss of the emphatic interpretation is an important change. We saw in our discussion of Breitbarth, Lucas, and Willis’s () account of Jespersen’s Cycle in §. that the introduction of ‘extra’ emphatic negative elements corresponds to what they identify as Stage Ib of Jespersen’s Cycle. To speakers unfamiliar with AAVE, it is the extension of negative concord to clausal negation in lower clauses that is the really striking innovation; as just observed, this poses difficulties for the account of negative concord adopted here. Labov compares AAVE with two non-standard varieties of English spoken by whites: one in New York City and the other in Atlanta. Both have negative concord, but the New York variety does not allow negative concord on a lower verb; the Atlanta variety does but does not allow ‘long-distance’ negative concord otherwise. (These facts are summed up in Labov’s table ., .) Moreover, negative concord is variable in these varieties, but not in AAVE, where non-concord is only found in contexts which can be construed as involving code-switching with the standard variety (Labov : –). All of this clearly shows that AAVE is a distinct grammatical system from Standard English (and from the non-standard white varieties), and that it may have 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

undergone, or be undergoing, certain parametric changes which do not affect other varieties (see Roberts a: , note ). And of course this distinct variety has social value, since it is associated with a particular ethnic group. Syntactic variation and change can also, therefore, be associated with social value and social stratification. 4.2.3. Grammars in competition Let us return now the question of grammars in competition, another potential case of orderly differentiation in the sense of WLH. In addition to the cases discussed in Kroch (), Santorini (, , ) argues for grammar competition in word-order change in the history of Yiddish; Pintzuk () very influentially argued the same for word-order change in the history of English (as we mentioned in §..); similarly Taylor () reaches the same conclusion regarding word-order change between Homeric and Classical Greek. Finally, Fontana () argues that the loss of V in Spanish involved grammar competition. These and a number of other studies have contributed a substantial body of work on syntactic change making use of the idea of competing grammars (see Kroch , : ; Pintzuk : ). As we saw at the end of §., the Constant Rate Effect can be seen as the consequence of the presence of two grammars in an individual or a speech community where one grammar is in the process of replacing the other—see (0 ). In fact, the Constant Rate Effect is an aspect of the transition of a change: the logistic function tracks the fraction of the advancing form in relation to the slope and the intercept, and in relation to time. As Kroch (: ) points out, the fraction of the advancing form ‘jumps from zero to some small positive value in a temporal discontinuity’; this is the point of actuation. The actuation can be seen as the abductive reanalyses associated with parametric change, as described in the preceding chapters, i.e. as an aspect of I-language. The transition is the spread of the new parameter setting, with the associated reanalysed structures, through a speech community (or really through a body of texts representing the E-language of a speech community, a linguistic fossil record). So (0 ) is a generalization about the implementation of a change. If the Constant Rate Effect, in the guise of (0 ), is a generalization about the transition of a change, what is the S-curve? We can phrase this question in another way: following Kroch, in order to countenance 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

the Constant Rate Effect at all, we must assume that more than one grammar may be present in a speech community at a given historical moment. If we are to abandon the idealization of an ideal speaker/ hearer in a homogeneous speech community (see Chomsky : ), which has proven so useful for the purposes of linguistic theory, and instead consider ‘actual people in specific circumstances’ (Kroch : ), such as speakers of sixteenth-century English or contemporary Neapolitans controlling a range of dialects in addition to Standard Italian, then we simply must admit that a single speech community may feature more than one grammar. It seems, then, that we have no choice other than to adopt the idea of grammars in competition. However, we are really only forced to adopt the idea of coexisting grammars, and there is no reason—as far as the grammars themselves are concerned—why this coexistence should not be peaceful. To deny this would be to deny that an individual can be truly, natively bilingual, and this seems contrary to fact. So, why do we find change: the case where one of the grammars ousts the other one, in conformity with the S-curve? In other words, although the Constant Rate Effect entails competing or coexisting grammars, competing/coexisting grammars do not entail the Constant Rate Effect.17 As it stands, the competition/coexistence model explains neither the inception of the change nor its completion (this is also true for Kauhanen and Walkden, who take the existence of competing/coexisting grammars for granted). We can consider the inception of the change to be actuation, and perhaps explain it along the lines described in the previous chapters, but this still leaves open the question of the completion of the change, i.e. the simple fact that one grammar eventually ousts the other. So the 17 Kauhanen and Walkden (: ) claim that this may not be true of their account of Constant Rate Effects: they ‘claim that Constant Rate Effects occur if, and only if, diachronically constant biases impinge on an underlying change’ (emphasis mine, IGR). Since they assume grammars in competition, then the relation is in fact a biconditional one in their approach. The question then becomes whether the production biases behind their account of the Constant Rate Effects are ‘diachronically constant’ (see the summary of their discussion of production biases in §..). This is very important for their model, as they point out: ‘[t]ime-dependent biases, or random biases, would result in change processes in which the trajectories of different linguistic contexts are not parallel to each other— something we might refer to as an “Inconstant Rate Effect.” Having said that, it is not inconceivable that some non-innate biases are constant on sufficiently long timescales so that they may give rise to Constant Rate Effects’. Here it may be that third factors of the kind discussed in particular in §.., which would in fact be innate but not domain-specific biases, play a role. As innate cognitive biases, these would certainly be ‘constant on sufficiently long timescales’.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

competition/coexistence idea needs supplementing if we are to arrive at an account of the transition of change. Yang’s (, a,b, , ) variational learning model, combined with the idea that change is actuated by abductive reanalysis, driven by third-factor principles such as Feature Economy and Input Generalization, as we saw in Chapter , can provide an answer to this question. As we saw in §.., the central idea of the variational learning model is that parameter values are associated with a probability weighting that changes during acquisition in response to the input. The weighting of a given parameter value vi changes as a function of the evidence in favour of vi: the more evidence in its favour, the more it is rewarded. This leads Yang to propose the Fundamental Theorem of Language Change (Yang : ): when grammars compete, the one with the greater parsing advantage will win. ‘Greater parsing advantage’ can be interpreted more generally in terms of greater efficiency in relation to third factors; in that case, the innovative grammar has an inherent advantage and will always win. Kauhanen and Walkden (: ) take a more nuanced view: for them, the production biases may alter the outcome, potentially in favour of the conservative grammar (if, as suggested in note , the biases reduce to third factors, this amounts to saying that the third factors which facilitate actuation do not necessarily guarantee transmission, which is in line with our conclusion regarding the weak effects of markedness biases in §..). Another possibility would be to construe the actuation/implementation distinction as illusory, and to consider transition to be simply multiple cases of actuation (i.e. reanalyses and associated parameter change) across a population. The initial actuation could take place in a very small group of acquirers, for the kinds of reasons discussed in the previous chapter, and spread through the population as more and more acquirers are exposed to the innovative system. Assuming the innovative system to be favoured by the parameter-setting device, then we can understand why this system eventually ‘wins’. The S-curve is then seen as a property of population dynamics; cf. Kroch’s (: ) observation that ‘[i]n the domain of population biology, it is demonstrable that the logistic governs the replacement of organisms and of genetic alleles that differ in Darwinian fitness’. (Recall that the logistic underlies the S-curve.) If we construe markedness as fitness, i.e. the less marked a parameter-setting is, the fitter it is (see Clark and Roberts  for an implementation of this idea), then we can immediately see why 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

syntactic change follows the S-curve. Yang (: ) develops a similar idea, proposing as a corollary to the variational learning model of change that ‘[o]nce a grammar is on the rise . . . it is unstoppable’; cf. §.., §.. and below (see also the discussion of actuation in relation to the Tolerance Principle in Yang : ‒). Niyogi and Berwick (, ) show that it is possible to model the interaction of a set of grammars, a learning algorithm of a particular type, and a quantity Pi ‘the distribution with which sentences of the ith grammar gi . . . are presented if there is a speaker of gi in the adult population’ (: ) as a dynamical system. (Lass : – gives a very useful introduction to some of the concepts of dynamicalsystems theory in the context of historical linguistics; see note  below.) We can think of Pi as relating to the nature of the PLD. They apply their model to the case of the loss of V in the history of French (see §...). Niyogi and Berwick demonstrate that if the initial condition assumes a homogeneous speech community (i.e. that Pi is uniform), the V system dies out rather slowly (‘within  generations,  percent of the speakers have lost Verb second completely’; : ). Very interestingly, they go on to show that if the initial condition involves a mix of V and non-V grammars, V is lost across the population much more quickly. They say that ‘[s]urprisingly small proportions of Modern French [i.e. a non-V grammar—IGR] cause a disproportionate number of speakers to lose Verb second’ (: ). It seems, then, that variation might lead to change simply because of the ways in which the alternate grammars, the learning algorithm, and the distribution of PLD interact dynamically (in essence, this is also Kauhanen and Walkden’s () general conclusion, although their discussion focuses in particular on the Constant Rate Effect). This conclusion is confirmed in Niyogi and Berwick (), where, alongside the conclusion just described, that V becomes unstable if only a very small proportion of non-V examples are introduced into the sentences presented to the learning algorithm, they show that null subjects are not lost under these conditions, and so the system does not replicate the actual historical development of French. An important further result is that they are able to derive the S-curve as the typical (but not the only) form of propagation of a new system through a population. These simulations are very suggestive and arguably represent, as Niyogi and Berwick (: ) point out, a first real effort to ‘place historical linguistics . . . on a scientific platform’. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

Niyogi () develops these ideas further, and shows in detail how learning theory can contribute to our understanding of language change (as well as relating it to other questions, such as the evolution of language), once the fundamental concepts are properly formalized. As in the case of Niyogi and Berwick (; ), the central idea is that, given a learning algorithm, a probability distribution of linguistic tokens across a population, and a restricted class of grammars from which to select, variability will readily result as long as the time allowed for the selection of hypotheses is restricted. In other words, the existence of a critical, or sensitive, period for language acquisition may be sufficient to guarantee variation in a speech community after a single generation.18 To illustrate this idea, Niyogi (: ) invites us to consider a world in which there are just two languages, Lh and Lh. Given a completely homogeneous community where all adults speak Lh, and an infinite number of sentences in the PLD, the child will always be able to apply a learning algorithm to converge on the language of the adults, and change will never take place. On the other hand, Niyogi continues: Now consider the possibility that the child is not exposed to an infinite number of sentences but only to a finite number N after which it matures and its language crystallizes. Whatever grammatical hypothesis the child has after N sentences, it retains for the rest of its life. Under such a setting, if N is large enough, it might be the case that most children learn Lh, but a small proportion end up acquiring Lh. In one generation, a completely homogeneous community has lost its pure [footnote omitted—IGR] character.

What happens ‘next’, as it were, i.e. in the third and subsequent generations, depends on the nature of the grammars of Lh and Lh, the quantity N, and the nature of the learning algorithm. All these issues are discussed, illustrated, and formalized in detail by Niyogi. Given the hetereogeneity in any speech community, the random distribution of PLD, and the limited time for learning, change is inevitable. Moreover, ‘the dynamics of language evolution are typically non-linear’ 18 Guasti (: ‒) discusses the evidence for a critical, or sensitive, period for language acquisition. She concludes: ‘The evidence to date shows that there are critical periods for acquiring the phonology, morphosyntax, and syntax of one’s native language and also of a second language. These periods may differ for different aspects of language; that is, there are multiple critical periods. Moreover, recent evidence shows that different types of experiences may alter the onset, duration, and offset of critical periods’ (Guasti : ‒).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

(Niyogi : xiv); ‘much like phase transitions in physics, . . . the continuous drift of such frequency effects could lead to discontinuous changes in the stability of languages over time’ (Niyogi : xv). Modelling the interaction of learners, data, and grammars at the level of populations ‘provides some understanding of how a major transition in the linguistic behavior of a community may come about as a result of a minor drift in usage frequencies provided those usage frequencies cross a critical threshold’ (Niyogi : ; emphasis in original). In this work, we see several important strands of thought (natural languages as formal systems, the heterogeneity of speech communities, the critical period for language acquisition, and the existence of an innate capacity for acquiring grammars) combine to give us an understanding, expressible in formal, mathematical terms, of how and why languages change. In fact, we arrive at a diachronic criterion of adequacy for grammatical theories, since ‘[t]he class of grammars G (along with a proposed learning algorithm A) can be reduced to a dynamical system whose evolution must be consistent with that of the true evolution of human languages (as reconstructed from the historical data)’ (Niyogi : ). We saw in §.. that Yang (: ) points out that the Tolerance Principle can be ‘dynamicised’ such that N (the number of candidates to which a productive rule R is applicable) and e (the number of exceptions that could but do not follow R) can ‘undergo[] perturbation over generations (t)’, i.e. we can postulate the dynamical system (N, e)t. If, as suggested in §.., Tolerance is the mechanism by which parameter hierarchies are traversed, then this gives us a dynamical system based on parameter hierarchies. Thus a system of variation based on parameter hierarchies meets the diachronic criterion of adequacy. Niyogi and Berwick’s approach may be workable for modelling grammar competition/coexistence in the speech community. However, an important aspect of Kroch’s thinking involves the possibility that a single individual may have more than one grammar. The following quotation summarizes Kroch’s position: One difficulty with the Niyogi and Berwick model . . . is that it presumes that the competing parameter settings are located in different speakers, so that the quantitative element in syntactic change is located in the population, not in the individual. However, the data from the empirical studies that reveal the gradual nature of change are not consistent with Niyogi and Berwick’s model in this respect. On the contrary, . . . the variation in usage that reflects different parameter settings is found within texts . . . To



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

model this variation, it is necessary to allow for syntactic diglossia within individual authors as the normal situation during a period of change. (Kroch : )

What Kroch refers to as ‘syntactic diglossia’ here means that individuals have competence in more than one syntactic system. Although the phonologies and lexica associated with each system are the same or very similar, the two systems differ in the value of at least one parameter and are distinct grammars by definition. (Of course, if parameter values are specified in lexical entries, then that amounts to a difference between the two lexical entries in question.) In fact, Niyogi (: ff.) devotes some space to modelling the kind of dynamical system that arises if learners are assumed to be capable of acquiring more than one grammar. The model is applied to the changes involving the loss of V and null subjects in the history of French discussed in Clark and Roberts () and Niyogi and Berwick (, ) (see §.. and §... for more details), with the very interesting result that V must have been lost before null subjects were, and that the loss of V must have been triggered by an increase in the use of pronominal subjects (Niyogi : ). Both of these empirical claims are probably correct; see Roberts (a) and especially Vance () for more details. In general, then, it seems that modelling the interaction of grammars, possibly multilingual learners, and populations as dynamic systems may be capable of contributing much to our understanding of the nature of language change. Let us examine Kroch’s concept of ‘syntactic diglossia’ more closely. The possibility that a single individual may have two distinct grammars must of course be acknowledged for the case of true bilinguals, individuals who have native competence in what we clearly would consider two different languages (for example, English and Italian, where at least the value of the null-subject parameter would be distinct). In many speech communities, the two grammars are in a relation of diglossia, in that one system is appropriate for a given range of sociolinguistic functions and the other for a quite distinct range. Diglossic situations typically involve a contrast between a ‘high’ variety, appropriate to relatively formal situations, contrasting with a ‘low’ variety, appropriate to more informal situations (see Ferguson ; WLH: ; MartinJones ). Kroch suggests that syntactic diglossia may, where the innovative grammar is ‘more native’ than the conservative, prestige



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

variety ‘acquired a bit later in life’ (: ), account for the gradual spread of changes: it could easily be the case that the forms in competition in syntactic diglossia represent an opposition between an innovative vernacular and a conservative literary language. Since the former would have both a psycholinguistic advantage and the advantage of numbers, it should win out over time, even in written texts. Under this model, the gradualism found in texts might not reflect any basic mechanism of language change, but rather the psycho- and sociolinguistics of bilingualism. (Kroch : )

This seems to imply that the ‘psycho- and sociolinguistics of bilingualism’, rather than facts about population dynamics or dynamic systems more generally, may underlie the S-shaped trajectory of change. If Kroch’s view is correct, we would expect to find similarities between syntactic diglossia and other forms of diglossia or bilingualism. This is entirely possible in principle, and certainly constitutes a way of understanding syntactic change which could reconcile our ‘internalist’, I-language approach with the need to account for the spread of change through a speech community. Unfortunately, however, there is in practice rather little evidence in favour of syntactic diglossia in some of the cases where competing grammars have been proposed. (This does not mean, of course, that the evidence has not been lost, or was simply never recorded.) The first problem is that, although we must postulate the use of multiple grammars by a single author in a single text, there is no clear evidence that the authors in question control the variants in the typical diglossic way, using one grammar for formal, institutional contexts and the other in less formal contexts. For example, Pintzuk (: ) treats the following two OE examples as representatives of competing grammars, since (a) has AuxOV order and (b) OVAux order: () a. He ne mæg his agene aberan. he  can his own support ‘He cannot support his own.’ (CP .) b. hu he his agene unðeawas ongietan wille how he his own faults perceive will ‘how he will perceive his own faults’ (CP .–) 

[OE]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

Pintzuk argues that (a) is generated by grammar with head-initial TP (IP in her terminology), since auxiliaries are taken to occupy T, while (b) is generated by a grammar with head-final TP. Both examples are taken from a single text, Alfred’s translation of Gregory the Great’s Cura Pastoralis written in the late ninth century. So, given Pintzuk’s analysis, the sentences in () support Kroch’s assertion that competing grammars may be found in a single text by a single author. However, they also raise a problem for the notion of diglossia: it is not clear whether one would want to categorize Alfred’s style as ‘high’ or ‘low’. The translations of various Latin works such as the Cura Pastoralis were carried out on Alfred’s orders to preserve knowledge of Latin culture in England at a time of a decline in learning owing to the Danish invasions: see Mitchell and Robinson (: ),19 but there seems to be no reason to suppose that the translator is consciously manipulating ‘high’ or ‘low’ registers in using these different word orders. Certainly, no such claim is made by Pintzuk, although she comments (: ) that ‘competition occurs within an individual and can be understood as code-switching or register-switching’. The notion that the different word orders reflect different grammars in a diglossic relation does not receive any further discussion or support, however.20 Pintzuk’s mention of code-switching raises another possibility. It seems clear from studies of code-switching that a single speaker may employ two distinct grammars even in a single sentence.21 The following example of English–Spanish code-mixing illustrates this: () No creo que son fifty dollar suede ones. ‘(I) don’t think that (they) are . . . ’ (Poplack : , cited in Muysken : ) Here we see the main clause and most of the embedded clause, as far as the post-copular nominal predicate (fifty dollar suede ones), is Spanish.

These translations of Latin works are sometimes referred to as ‘vernacular’, for example, by Mitchell and Robinson (: ), but in this context this simply means OE as opposed to Latin. 20 Harris and Campbell (: ) also point out, discussing Lightfoot’s (: –) appeal to diglossia in a similar context, that this line of reasoning is likely to lead to the postulation of ‘a plethora of grammars’. 21 Muysken (: ) prefers the term ‘code-mixing’, suggesting that ‘code-switching’ already implies a particular analysis of the processes involved. 19



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Being Spanish, null subjects are allowed: the subject of each clause is null. So in code-mixing, speakers can produce sentences parts of which are generated by a grammar with one parameter setting, and parts by another grammar with an opposite parameter setting. One might think that () is a Spanish sentence with English words inserted in it, but if parameters are specified by lexical entries, then, although the negative value of the null-subject parameter is not expressed by the English portion of (), there is a switch in grammars along with the switch in lexical items; this can in fact be seen from the order of prenominal modifiers in the DP fifty dollar suede ones, showing the typical English order of adnominal modifiers, which is ungrammatical in Spanish. However one accounts for the phenomenon illustrated in () (and see Muysken : ff. for an interesting discussion), it shows that different grammars may be used even in a single sentence. If this is true for code-mixing in general, and if the ‘syntactic diglossia’ discussed by Kroch is also subject to code-mixing, then we expect to find it at the subsentential level. But this creates an empirical problem, at least for Pintzuk’s study of word-order change in the history of English. It is well-known that OE allowed a wide range of subordinate-clause word orders (see the discussions in §.. and §.). In fact all the logically possible orderings of auxiliary (Aux), verb (V), and direct object (O) are attested, as pointed out by Pintzuk (: , : ), with the striking exception of the order VOAux (Pintzuk : , n., : ; see also Kiparsky : ; Roberts : ). This order is ruled out by the Final-Over-Final Condition (FOFC); see §.., §... But this order should be allowed if subsentential code-switching is possible where the grammars are in competition, and if head-initial and headfinal VPs and TPs are in competition. That is, a structure such as the following should be possible (extrapolating away from vP, which is not assumed by Pintzuk):22 () [TP DP-subject [VP V DP-object] [T Aux]] Muysken (: –) gives evidence for linearly discontinuous code-mixing of the type that would produce the structure in (). 22 In Pintzuk (: , ()), a double-VP structure is assumed, with the VPs labelled VP and VP and the auxiliary moving from the higher V, V, to T (her I). Presumably, VP could be assimilated to vP.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

Here TP is produced by a head-final system and VP by a head-initial one. So such orders are predicted by Pintzuk’s approach, and yet are not found; therefore they must be ruled out by stipulation (see Pintzuk : ). This point is also made by Fuß and Trips (: ), who propose a refinement of the competing-grammars model in order to deal with it.23 So it seems that the competing grammars must be allowed to coexist in a single individual, without standing in a diglossic relationship and without the kind of possibility of subsentential code-mixing that is common in bilingual individuals and speech communities. Muysken (: –) points out that a ‘growing number of studies have demonstrated . . . that many bilinguals will produce mixed sentences in ordinary conversations . . . for some speakers it is the unmarked code in certain circumstances’. For speakers commanding two very similar systems, which are all but identical lexically and phonologically and minimally different in some syntactic parameter, as is proposed for OE by Pintzuk for example, such code-mixing should be highly prevalent. But the absence of the VOAux order in OE suggests that this is not the case, or at the very least poses a serious analytical problem for accounts like those put forward by Pintzuk. So the first problem with the competing-grammars approach is that it predicts unattested structures, given the expected effects of code-mixing. The second problem, which has often been pointed out, is that the competing-grammars hypothesis creates learnability problems. Kroch (: ) argues that ‘the learner will postulate competing grammars only when languages give evidence of the simultaneous use of incompatible forms’. Similarly, Niyogi (: ) states that the fact that ‘learners do not attain a single grammar . . . reflects the fact that a single grammar is not adequate to account for the conflicting data they receive’. This must be true in the case of ‘true’ bilingualism: 23 Fuß and Trips propose that only lexical categories can vary for the head parameter, while functional categories are universally head-initial. If roll-up movement of the type described in §.. is not allowed, then this is enough to rule out the VOAux order, since T will always take its VP complement to its right, still assuming Aux is in T. At the same time, the orders attested in OE are allowed. Clearly, however, one can question the theoretical basis of this directionality distinction between lexical and functional categories. Moreover, a number of examples of Greenbergian cross-categorial harmony of the type illustrated in §.. will be hard to account for, for example, the order NP-D, or clause-final complementizers. (See the discussion following the Malayalam examples in () of §...)



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

presumably a child exposed to PLD from both Italian and English is able to tell, on the basis of a host of cues including of course phonological and lexical evidence, that s/he is confronted with two systems (see Guasti : ff., ‒). But this is not so clear if the two systems are minimally syntactically distinct, not obviously sociolinguistically distinct (since they coexist in the same speaker and the same text) and not, or only minimally, distinct in terms of the lexicon and phonology. Kroch (: ) states that ‘[h]ow learners acquire diglossic competence is, of course, an important issue for language acquisition, but there is no doubt that they do’. Niyogi (: ) proposes a way to formally model how learners may estimate the instances of different grammars in the PLD, but this does not really give rise to diglossia in the sociolinguistic sense, and it is not clear under what conditions multiple grammars would be postulated rather than a single one. Hence the acquisition of ‘covert diglossia’ may be a genuine problem, or in any case it raises a complication which is not raised if we do not postulate competing grammars, and hence the complication is justified to the extent that competing grammars play a role in explaining syntactic change. It may also be the case that assigning marked and unmarked values to parameters, for example by arranging them in hierarchies as described in §., may pose learnability problems for a competinggrammars approach. Each parameter is associated with a marked and unmarked value, and where the marked value is not triggered (i.e. expressed in the PLD), the unmarked value is assumed. Suppose then that we have a parameter p with two values m and u, the marked and unmarked ones. (In §. the marked option involves moving further down a parameter hierarchy; the unmarked value implies the learner will stop searching at that point in the hierarchy). Then, if the cue or expression for value m of p is present, p will be set to that value; otherwise (including in the case where there is absolutely no evidence relevant to p in the PLD, the NO option in a parameter hierarchy—cf. §..) it will be set to the unmarked value. This approach appears to leave little room for the postulation of competing grammars: ‘evidence of the simultaneous use of incompatible forms’ would presumably favour the postulation of the marked value, since this would effectively constitute evidence for this value, given that the default value by assumption requires no evidence. In all and only cases where the marked value is not expressed in the PLD, the unmarked form results. This is particularly 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

clear if, as suggested in §.., the Tolerance Principle defines the tipping point for movement down a hierarchy, then postulation of competing grammars would be impossible in principle: either the occurrence of a form incompatible with a given parameter value crosses the threshold or it does not. So we see that the postulation of competing grammars, although a possible example of what WLH called ordered heterogeneity, may not be without problems for the account of setting and changing parameter values that we put forward in §.. Leaving aside language contact for now, there are three ways in which competing grammars could be acquired in terms of the approach to parameter-setting we have put forward. The first possibility is that the syntactically distinct systems are also distinct in some other ways: if there are clear phonological and lexical differences systematically associated with the parametrically differing systems, then presumably acquirers will keep the two systems apart. Again, this is what must happen in the case of ‘true’ bilingualism.24 By assumption, this is not what happens in the case of competing grammars. The second possibility is to postulate that children associate the competing grammars with some sociolinguistic or contextual marker. As Croft (: –) and Denison (: ) point out, this can also account for change, and possibly for the S-shaped propagation of change: if an innovating form has a social value, this may favour it in the grammar competition. The idea of distinct systems being available to speakers in a given speech community is discussed in WLH (–). They insist that the two systems must be functionally distinct () for two reasons. First, in order to be of interest for the theory, the two systems must be in competition. Second, there must be ‘a rigorous description of the conditions which govern the alternation of the two systems. Rules of this sort must include extralinguistic factors as governing environments’ (–). These extra-linguistic factors include the kinds of factors which have been much discussed by sociolinguists: age, sex, gender, social class, ethnicity, etc. What has not been clearly shown 24 There is evidence from first-language acquisition that both phonological and lexical acquisition begin in the first year of life. In particular, very young children (in the first month of life) seem able to distinguish their native language from other languages, and begin to be able to discriminate consonants. These findings are documented in detail in Guasti (: ch. ); see in particular her table ., p. . It is possible, then, that two completely distinct systems are kept apart from a stage prior to the onset of the acquisition of syntax.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

in the studies of grammars in competition following on from Kroch () is that the putatively competing grammars are functionally distinct in WLH’s sense. Indeed, if anything, the evidence suggests that they are not, as examples like () indicate (although of course it must always be kept in mind that the relevant evidence is somewhat flimsy). The third and arguably most interesting way to think of the acquisition of competing grammars would be to invoke, presumably as an aspect of the language faculty (i.e. a first or third factor), a ban on true doublets. As Pintzuk (: ) suggests, one could, following Kroch (: ), assume that the presence of ‘syntactic doublets in the lexicon’ will give rise to variation and change. As she says, Kroch () ‘suggests that in syntax, as in morphology, doublets that are semantically and functionally non-distinct are disallowed; and that doublets of this type . . . compete in usage until one of the forms wins out’ (Pintzuk : ). In morphology, the Blocking Effect, defined as ‘the non-occurrence of one form due to the simple existence of another’ (Aronoff : ), bans doublets, hence, for example, the non-existence of *graciosity in English is due to the existence of graciousness. In other words, acquirers could be led to the postulation of two systems by the mere existence of syntactic doublets. This could follow if system-internal optionality were not allowed. Then the existence of doublets, analysable in principle as options, would have to be seen by acquirers as contradictory properties, and hence as evidence for distinct systems. Such a line of reasoning could clearly be applied to parameters viewed as formal features of functional heads, as assumed here. Moreover, since these are features of lexical entries, one might think that this is no different in principle from something as simple as the variant pronunciations of economics (with initial /i:/ or initial /ϵ/), with the exception that free variation is simply not allowed. This approach is very appealing, and technically compatible with earlier versions of minimalism such as Chomsky (, ) where, as we mentioned in §.., optionality was not allowed. However, as we have seen, syntactic doublets genuinely do appear to exist: the Russian example in () illustrates this (and see the discussion of object movement in ME in §..). More generally, true optionality, with no functional or semantic import, does seem to be allowed in the current system, as we saw in the previous section. If that is true, then Kroch’s () ‘no-doublets’ assumption is empirically wrong, and therefore 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

this way of accounting for the acquisition of competing grammars is not available. I will return to the possible implications of true optionality in the next subsection. I conclude that the competing grammars idea, although in principle providing a way of accounting for the spread of change through a speech community and over time in a way which would be broadly compatible with the approach to syntactic change we described in the previous chapters, has not really been shown to be free of the twin problems of allowing for unattested examples through subsentential code-mixing (including examples violating a generalization as pervasive and powerful as the FOFC) and of posing the learnability problem of when such a system should be postulated. A clear demonstration of the social value of one or the other competing grammars would, however, illustrate the utility of this approach (and we will see a likely example of this in §.. below). It is also important to remember that we cannot exclude the possibility that the competing grammars postulated by Kroch, Pintzuk, and others for the earlier stages of various languages for which we have little or no sociolinguistic information, did in fact have a social value whose nature has been completely obscured by the passage of time and the nature of the extant texts. 4.2.4. Formal optionality again WLH (–) discuss the notion of system-internal variation. We have seen that the current model of syntax allows for this (cf. the discussion of (), () in §..). So, a further possibility for accounting for the spread of change lies in true optionality. Let us reconsider the example of optionality of pied-piping in wh-movement constructions in Russian from Biberauer and Richards (): () a. Č’ju knigu ty čital? whose book you read b. Č’ju ty čital knigu? whose you read book ‘Whose book did you read?’ (Biberauer and Richards , (), )

[Russian]

According to Biberauer and Richards, the difference between Russian and English is that in Russian wh-words are QPs, while in 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

English they are Ds. Both languages have overt wh-movement in interrogatives, i.e. in both languages a [+wh] C forces movement of a wh-XP to its Specifier in virtue of having an EPP feature (i.e. Parameter E of §.. is positive in both languages). In both languages wh-DPs, containing the wh-word and its associated NP, can move. But Russian has an option lacking in English: that of moving the wh-word alone, as a QP, to SpecCP. Russian is thus an example of a single grammatical system, with all parameters set to determinate values (in particular the parameter determining the categorical status of wh-words), which can nevertheless produce doublets, i.e. true optionality where there is no semantic, functional, or social difference. The learnability problem for competing grammars discussed in the previous section does not arise, since there is clear evidence in the PLD for the Russian value of the wh-word parameter from examples like (a). Given a system with pure formal optionality in a single grammar, it might then suffice for one of the doublets to take on social value in order for one of the variants to become predominant (in certain social groups) and hence for a change to take place. However, to fully understand change in this way, we need to bear in mind the interaction of structural, sociolinguistic, and psycholinguistic factors. The structural factor is that UG disallows the possibility of non-XPs undergoing wh-movement (i.e. there is no wh-head movement), but the observation that no language only has the option instantiated by (b) appears to be correct (see Gavarrò and Solà : , who observe that this is also true for child language). Hence, this doublet cannot oust the other. On the other hand, a change of the kind whereby option (b) disappears and only (a) remains (which we suggested, following Biberauer and Richards, may have taken place in the history of Greek) is favoured by the Subset Principle since the elimination of (b) and the retention of (a) involves a shift from a superset to a subset grammar (see §..). But we can consider that the taking on of social value on the part of the variant in (a) may have had the effect of skewing the PLD for speakers in such a way that the unmarked option of the grammar which only allows (a) was preferred. Here we see how sociolinguistic factors may interact with other factors in syntactic change. It may be that such an interaction of sociolinguistic, psycholinguistic, and structural factors is required in order to have a complete picture of how syntactic change works. (Of course, the account of this particular 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

change in the history of Greek remains very sketchy owing to the lack of relevant data.) However, if Niyogi and Berwick’s (, ) results, as described above, are borne in mind, it may be enough to simply have variation in the PLD; the social value of the variants may influence the change in various important ways, but change is guaranteed by the mere presence of variation (this conclusion is consistent with Kauhanen and Walkden’s () conclusions regarding the Constant Rate Effect; see §..). For this kind of scenario to have any generality, we need to formulate parameters in such a way that true syntactic doublets may be a recurrent possibility; this has not been done in most of the work on parameters in recent years, and an important research question concerns the extent to which it can be done. As Biberauer and Richards (: ) observe, however, the technical make-up of the current system leads us to expect formal optionality: ‘[o]ptionality . . . would have to be stipulated not to exist’. 4.2.5. Abduction and actuation A final question brings us back to the matter of the actuation of change. We have offered a purely ‘internalist’, I-language account of this, seeing change as driven by abductive reanalysis, associated with parametric change, in language acquisition. Let us consider again the schema for abductive reanalysis that we discussed in §..: () Generation 1: G1 → Corpus1 Generation 2: G2 → Corpus2

In our earlier discussion, we referred to G as the ‘parental grammar’ and G as the children’s grammar. Corpus was seen as the output of the parental grammar. We observed that this description involved a certain amount of idealization. In fact, WLH (–) criticize Halle’s () reanalytical model of phonological change, which is quite similar to what is schematized in (), on several grounds, two of which are relevant here. First, they point out that the speech of older peers, only slightly older than the acquirer, plays a more important role in determining the nature of the acquired system than does parental speech. (This point is also taken up by Croft : .) Second, they observe that change is not purely 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

intergenerational, but that changes appear to last longer, and continue in the same direction, across generations. The second point concerns gradualness: as we have seen, grammatical and sociolinguistic factors can cushion the effects of parametric change and cause it to appear to be temporally diffuse; at the level of the speech community this is very likely if the community has any sociolinguistic complexity, as all speech communities do. What it implies is that successive generations can replicate the ordered heterogeneity of the speech community, whether this is done through the postulation of competing grammars or through the exploitation of formal options in the grammatical system, or a combination of the two. In other words, language acquirers can acquire variation; Westergaard (: ) observes that ‘linguistically conditioned variation does not constitute conflicting input for children, as they seem to easily resolve this variation within a single grammar’. The first point is well-taken, and requires us to see the PLD as more diverse than we have hitherto been assuming for expository purposes. Again, this means that orderly differentiation must be present in the PLD and that acquirers must be sensitive to it. Rather than referring to G in () as the parental grammar, then, we should call it the system (or systems) underlying the PLD. That this may not be produced by parents, but rather by older siblings or peers, does not in itself require that we abandon the concept of abductive change. And of course, we should not refer to G and G as properties of different generations (itself a highly diffuse concept, as WLH:  point out), but rather as properties of a relatively older group providing PLD for a relatively younger group. This implies that where there is abductive reanalysis in G of (), the age discontinuity between the two groups may be rather small, perhaps less than ten years.25 This may be a further source of the observed gradualness of change discussed in §.. However, we are able to retain the notion of discrete parameter values underlying the formal distinctions between the grammatical systems. WLH (–) present the following summary account of their proposals for how change spreads (the ‘transition problem’ in their terminology):

25 This implies that the loose calculations of diachronic stability in §.. should be recalibrated. As mentioned there, this in fact strengthens the points made concerning the stability differences among different parameter types.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

This transition or transfer of features from one speaker to another appears to take place through the medium of bidialectal speakers, or more generally, speakers with heterogeneous systems characterized by orderly differentiation. Change takes place () as a speaker learns an alternate form, () during the time that the two forms exist in contact within his competence, and () when one of the forms becomes obsolete.

This approach can be taken over into the parameter-based account of syntactic change being proposed here, with the ‘alternate forms’ in question being either options generated by a single system or competing grammars. The new parametric option (represented either by a new grammar or a resetting of a parameter such that a new option is introduced) enters the speech community and some individual’s competence at stage (), and the old parametric option (represented either by an old grammar or by a resetting of a parameter such that an option is lost) disappears at stage (). At stage (), one option replaces the other as it spreads through the community, with a changing proportion of individuals having two grammars or a setting of the relevant parameter permitting formal optionality. Clearly, this scenario gives rise to more apparent gradualness. It can also account for the typical S-curve of change as one grammar replaces the other through the community at stage (). Moreover, it is broadly in accordance with the results of Niyogi and Berwick’s (, ) dynamic models as summarized above. Thus we have a general model for gradual spread of change through a speech community, which is compatible with the results of sociolinguistic research on synchronic variation of the kind reported above by Labov (, , , ), and which nevertheless allows us to maintain an acquisition-driven view of the actuation of change and a notion of discrete, binary parameters as the locus of variation among grammatical systems. Of course, much more needs to be done to really establish this view, and much more direct evidence for it needs to be gathered, but it represents a promising research programme for getting a full picture of the nature of syntactic change. 4.2.6. Change in progress? Null subjects in Brazilian Portuguese26 An example of ongoing, or at least fairly recent, change may come from Brazilian Portuguese (BP). As we saw in §.., BP contrasts with 26

I am very grateful to Eugênia Duarte and Mary Kato for helpful and extensive comments on this section.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

European Portuguese (EP) in that it shows some typical properties of a partial null-subject language (PNSL), while EP is a consistent nullsubject language (CNSL) (but see note ). The following examples show the standard hallmarks of a CNSL in EP (see §.., ()): () a. Null definite pronominal subjects in any person-number combination in any tense: Telefonaram ontem. called- yesterday ‘They called yesterday.’ b. ‘Free inversion’ (see §.., (a)): Telefonou ontem o João. called- yesterday John ‘John called yesterday.’ c. Wh-movement of a subject over a finite complementizer (see §.., (b)): Que aluno disseste que – comprou um computador? which student said- that – bought- a computer ‘Which student did you say bought a computer?’ (Barbosa, Duarte, and Kato : ) d. ‘Rich’ agreement (see §.., (a)): (eu) falo (nos) falamos (tu) falas (vos) falais (ele/ela) fala (eles) falam e. Arbitrary Sg null subjects require a special marker (see §.., (b)): É assim que se faz o doce. be.. thus that SE make.. the. sweet ‘This is how one makes the dessert.’ BP, on the other hand, does not allow free inversion, except with unaccusative verbs—see Kato ().27 It does, however, allow null subjects as in (a) and wh-movement as in (c), but the incidence of third-person null subjects compared to EP is considerably reduced

27 Kato and Duarte () relate the loss of general free inversion in BP to the loss of clitics in that variety (on which see in particular Nunes ) and to a prosodic constraint which prevents the inflected verb from appearing in linear first position.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

(see below). We saw in §.. that Figuereido-Silva (: ) claims that in main clauses in an ‘out-of-the-blue’ context (e.g. where there is no salient discourse topic; see Barbosa : , note ) third-person null subjects are dispreferred, while first- or second-person null subjects are readily possible (see §.., ()): () a. pro

encontrei a meet.. the ‘I met Mary yesterday.’ b. *pro encontrou a meet.. the

Maria ontem. Mary yesterday

[BP]

Maria ontem. Mary yesterday

This contrast may be due to the fact that first- and second-person referents are naturally salient, while third-person ones are not necessarily so. Roberts (a: ), following Holmberg (), adopts the proposal that there are first- and second-person speech-act operators in the CP-system in order to account for this difference. As we saw in §.., EP allows null-third-person subject pronouns; see Duarte (: , table ). However, despite this difference, null subjects are declining in all persons, as shown in Duarte (, , ). Duarte () reports the results shown in Table . for the rate of overt and null third-person subjects in the two varieties. Moreover, BP allows overt inanimate subjects much more readily than EP, as illustrated in () (underlining indicates intended coreference): () A casa virou un filme quando ela teve de ir [BP] the house became a movie when it had to go abaixo. down ‘The house became a movie when it had to be pulled down.’ Duarte () finds a % incidence of null inanimate subjects in EP and % for BP; from her data it seems clear that overt inanimate pronouns are increasing significantly in BP. Barbosa, Duarte, and Kato illustrate a host of other respects in which EP patterns like a CNSL and BP diverges; these include subject left dislocation with an overt resumptive subject pronoun in BP, postverbal emphatic subject pronouns in EP, the possibility of interpreting overt pronouns as bound pronouns in BP, and the possibility of relativeclause extraposition in BP. The contrasts regarding left dislocation are 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

illustrated in () and the possibility of interpreting overt pronouns as bound variables in () (repeated from §.., ()):28 () a. A Clarinha, ela cozinha que é uma maravilha. [BP] the Clarinha, she cooks that is a wonder b. A Clarinha, __ cozinha que é uma maravilha. [EP] the Clarinha, cooks that is a wonder ‘Clarinha cooks wonderfully.’ (Barbosa, Duarte, and Kato : –) () Cada um podia fazer as perguntas que ele queria. Each one could- do- the questions that he wanted- ‘Each one could ask the questions that he wanted.’ (Rádio CBN; Eugênia Duarte, p.c.) Table . gives figures from Duarte () based on recent recordings of spoken BP. Table 4.1. Occurrences of null and overt third-person subjects in spoken EP and BP Adapted from Duarte (: , table ). Variety

Null subject

Overt subject

Total

EP

 (%)

 (%)

 (%)

BP

 (%)

 (%)

 (%)

So we see that contemporary spoken BP shows a range of properties typical of PNSLs, in contrast with EP which is a typical CNSL (along with Spanish, Italian, Romanian, and the majority of the Romance languages, with the provisos in note ). Duarte (, ) documents the use of overt subject pronouns in popular plays written in Rio de Janeiro from the first half of the nineteenth century up to  and in interviews recorded in Rio de Janeiro in . Her results, shown in Figure ., indicate that the rate of pronominal subjects rose from % in the mid-nineteenth century to % by the end of the twentieth century. It is possible to discern an

28 The authors point out that the string corresponding to the BP one in (b) is grammatical in EP, but it carries a sense of redundancy and is ‘uttered only when the speaker is hesitating’ (), which is not the case in BP.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

80%

74% 67%

70% 60% 46%

50%

50%

40% 30% 20%

20%

23%

25%

10% 0% 1845

1882

1918

1937

1955

1975

1992

Figure 4.4 The rate of overt pronominal subjects in the nineteenth and twentieth centuries From Barbosa, Duarte, and Kato (: ) and adapted from Duarte (: ).

S-curve here, giving the result that contemporary BP is coming close to completion of this change. Duarte suggests that the change was caused by a reorganization of the pronominal system in such a way that formerly third-person DPs came to be used as sg, pl, and pl forms, replacing the earlier pronouns, with the result that sg verbal inflection is now used in these persons as well. The effect is a levelling of the verb paradigm, as can be seen by comparing the EP paradigm in (d) with the BP one in ():29 () Reorganized colloquial BP paradigm: eu falo a gente fala você fala vocês falam ele/ela fala eles falam ‘I speak, etc.’ The effect of the reorganization of the pronominal system is a reduction of the number of distinctions in the verbal inflection from six to three. As such the new system arguably falls below the threshold for ‘rich’

29 The replacement of the sg familiar pronoun tu by the polite form você is reminiscent of the replacement of the pl intimate pronoun thou by the pl/formal you in ENE (see Lass : – and the references given there). It is not quite the same, however, in that você is etymologically third-person; the original pl form in Portuguese being vos. There is a further important difference in that, while você replaces tu in the plays analysed by Duarte (, ) after the s, from the s, tu seems to return, in variation with você, as documented in Duarte ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

agreement (see §.., ()), again showing a property characteristic of PNSLs rather than CNSLs; compare (ii) and (ii) in §.. (see Duarte ,  for this idea). Furthermore, in terms of the idea put forward in §.., () that PNSLs have impoverished determiner systems, it is worth pointing out that BP, unlike all other non-creole Romance languages, allows bare singular count nouns in argument position, as in () (see Munn and Schmitt ): () Cachorro(s) gosta(m) da gente. Dog() like..(). of.the people ‘Dogs like people.’

[BP]

There is therefore some reason to think that BP has become, or perhaps is becoming, a PNSL, diverging from the CNSL status of EP (and more general Romance). In these terms, we could see Stage () of the change in terms of the quotation from WLH in the previous section—the innovation of an alternate form—as the initial reorganization of the pronoun system. WLH’s Stage () is documented by Duarte’s data given in Table .; thus individuals in the BP speech community either have competing grammars or the system permits formal optionality. The former conclusion is favoured by both ‘internal’ and ‘external’ considerations. First, it is difficult to see how a single grammar could be simultaneously a PNSL and a CNSL; at the very minimum this would entail a radical rethinking of the null-subject parameter hierarchy in A‒A of §.. and §... Second, the CNSL and PNSL varieties have social value in Brazil, with CNSL properties found only in formal written language. In particular, there is evidence that null subjects emerge in BP as a function of education and that there is code-switching between a grammar allowing third-person NSLs and one which does not (Duarte , ; Kato ; Kato and Duarte ). BP may be a genuine case of grammars in competition, then, with fairly clear register correlates with the choice of grammar. Again, this would firmly situate contemporary BP in WLH’s Stage . Contemporary BP is arguably not yet at WLH’s Stage () however; this would entail the obsolescence of the CNSL variant. Duarte () reconsiders the current BP situation in the light of a refinement of the contexts that still license the null subject, in order to look more closely at the contrasts between BP and EP, using recordings of spontaneous speech in both varieties made in ‒. The results show that BP and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

THE SPREAD OF SYNTACTIC CHANGE

EP null subjects are significantly different, in particular in the case where there is no c-commanding antecedent for the null subject: here the incidence of first-person null subjects in BP is .% compared with EP (Duarte : , table ). Furthermore, null subjects in both varieties are licensed in the presence of a prominent, syntactically accessible antecedent, although even here BP shows a preference for an overt subject. Both of these properties appear to favour an analysis of BP as a PNSL, but Duarte raises questions about the validity of this characterization of BP, on the grounds of the difficulty of making a straightforward comparison with other PNSLs (e.g. Finnish) and questions about the status of generic null subjects in BP.30 Duarte further emphasizes that BP null subjects are clearly declining in frequency in all contexts, and that it does not seem possible to treat BP as a synchronically stable system. The indications are, therefore, that BP is changing, at least in that it is losing its earlier CNSL status, largely retained in EP (but see note ). The indications are that BP is changing into a PNSL, although the provisos raised by Duarte (, ) are important and clearly indicate that further investigation, both syntactic and sociolinguistic, is needed. Importantly, the dynamics of the change correspond broadly to the scenario outlined by WLH. Of course, this brief summary does not do justice to the full complexity of the situation in BP, either sociolinguistically or syntactically, but the example is illustrative. In particular, it shows us how we may be able to use the tools of syntactic theory combined with sociolinguistic theory in order to observe syntactic change in progress, a point also emphasized by Duarte (). 4.2.7. Conclusion In this section, I have tried to show how sociolinguistic and other considerations can interact with a parametric syntactic system to give a full picture of how a change goes to completion. A number of questions remain open, of course, but the two central ideas are (i) that change starts through the abductive reanalysis of PLD by 30 A further complication is that, contrary to what was stated in §.., there are some possibly significant divergences between EP and the other ‘core’ Romance CNSLs Spanish and Italian. EP allows first-person overt subjects somewhat more readily than Spanish and Italian (cf. Marins ; Soares da Silva ; Duarte : ). Duarte and Soares da Silva () conclude that EP is not a fully well-behaved or ‘prototypical’ (C)NSL.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

individuals, and (ii) there is a period of variation, which may be associated with grammars in competition, formal optionality, diglossia, nanoparametric diffusion, etc., until one system finally takes over from the other. If the diglossic situation is sufficiently clear, in that the use of the varieties perceived as ‘high’ and ‘low’ is associated with quite unambiguous and stable social value, then the grammars may peacefully coexist across a number of generations. Something like this may be the situation in the German-speaking areas of Switzerland where Standard German is the ‘high’ variety and local varieties of Swiss German—significantly different from Standard German in their syntax as in many other structural respects—represent the ‘low’ variety (see for example Brandner  and the references given there on the syntactic features of the Alemannic dialects of South-Western Germany and Switzerland). Our emphasis here has been on the nature of the process that gives rise to the initial change in the individual, but, in order to have a full picture of how changes may actually go to completion, we must take into account the period of variation and the nature of variation as well. We have seen that syntactic variation can take on a range of forms, not always indicating the presence of two distinct grammars in an individual or a speech community. One important task for future research is to clarify the differences among these types of syntactic variation, and investigate what implications these may have for syntactic change.

4.3. Drift: the question of the direction of change 4.3.1. Introduction Here we take up WLH’s observation that changes seem to take place over longer periods than an approach which claims that it is caused by generation-to-generation transmission of PLD would lead one to expect. In fact, if abduction as in () relates not generations but older and younger peers within a single generation, then the prediction of the abductive approach is that change is still more rapid. So we have to face the question of the direction of change over periods longer than required for an abductive change of the type schematized in (). But is it meaningful, or even possible, to think of syntactic change as having an inherent direction? The idea that language change is directional, and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

that languages pass through cycles, or ‘life cycles’, of changes was common in the nineteenth century: Morpurgo-Davies (: –) identifies this kind of thinking in the work of Bopp, Grimm, Humboldt, Schlegel, Schleicher, and Max Müller, and observes the influence of Darwinian and Hegelian thought on the last two of these (–). The idea that change is directional was stated very influentially, and eloquently, by Sapir (: ff.) in terms of the concept of drift. He says that ‘[l]anguage moves down time in a current of its own making. It has a drift’ (), and goes on to assert that ‘[e]very word, every grammatical element, every locution, every sound and accent is a slowly changing configuration, molded by the invisible and impersonal drift that is the life of language. The evidence is overwhelming that this drift has a certain consistent direction’ (). Moreover, we can infer the ‘drift’ of a language from its history: ‘[t]he linguistic drift has direction . . . [t]his direction may be inferred, in the main, from the past history of the language’ (–). He suggests three connected drifts in the history of English: the ‘drift toward the abolition of most case distinctions and the correlative drift toward position as an all-important grammatical method’ and ‘the drift toward the invariable word’ (). These are all long-term drifts: ‘[e]ach of these has operated for centuries, . . . , each is almost certain to continue for centuries, possibly millennia’ (). Each change is therefore just one in a series of changes, created by what went before and in turn creating the conditions for subsequent changes. In this section, I want to address the question of whether we could entertain a notion of ‘parametric drift’. Clearly, this question relates directly to the gradualness issue, discussed in §., as well as to WLH’s criticism of the abductive model which we discussed in the previous section. But a number of further questions arise. Most prominent among these is the matter of causation: if there is such a thing as parametric drift, what causes it? We do not wish to invoke Hegelstyle laws of history, or the nineteenth-century organic metaphor of languages (or grammars) as going through a ‘life cycle’. But if grammars are transmitted discontinuously from generation to generation, or from older to younger peers, through language acquisition, how can acquirers know which way the system they are acquiring is drifting? As we have seen, this point has been very forcefully made by Lightfoot (, , ). A second question is: if drift does exist, what are the natural directions for it? Third, how does this concept relate to the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

general idea that syntactic change is subject to the Inertia Principle, as discussed in §.? Here I will suggest that parametric drift can be thought of as a cascade of changes, a kind of ‘domino effect’ in the parametric system, whereby an initial, exogenous change destabilizes the system and causes it to transit through a series of marked states until it eventually restabilizes as a relatively unmarked system again. The series of marked states could in principle cover several cohorts of acquirers, with each successive group of acquirers being led to reanalyse different aspects of the PLD which have been rendered marked by an earlier change. I will briefly illustrate this idea by looking at changes affecting the English auxiliaries in the ENE period, some of which we have already seen in earlier sections. In fact, we have already seen some cases of this type in §..: there we saw examples of how, given the interaction of the third factors which gives rise to markedness, systems may become gradually more marked by diachronically ‘moving down’ parameter hierarchies, until a feature or a class of features disappears from the system. The system may then ‘leap up’ to a much less marked state; this is what Biberauer and Roberts () referred to as ‘instantaneous simplification’. This approach answers the questions raised above in the following ways: markedness, construed as the interaction of the third factors Feature Economy, Input Generalization, and the Subset Principle, may be what underlies parametric drift and determines the natural direction of change; it is also what causes the reanalyses, explaining why the Inertia Principle does not hold in these cases (although I will tentatively suggest that the Inertia Principle does hold in a slightly different way). 4.3.2. Typology, the Final-Over-Final Condition and drift Typological studies have often emphasized the directional nature of change. The earliest kind of structural typology proposed was the morphological typology which distinguishes isolating, agglutinating, and inflectional languages. Isolating languages tend to lack inflectional affixes, marking grammatical notions either by word order or by separate particles of various kinds: Chinese and Vietnamese are usually cited as good examples of this type. Agglutinating languages add affixes to roots in a predictable way, with something close to a one-form–onemeaning correlation, while inflectional languages add affixes to roots in a way subject to various kinds of phonological and morphological 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

conditioning. Malayalam is a good example of an agglutinating language (see §..), and the older Indo-European languages such as Latin, Classical Greek, or Sanskrit are good examples of inflectional languages (Comrie : – gives details and examples). This typology is usually attributed to the early nineteenth-century linguists Friedrich and August Schlegel (but see Morpurgo-Davies : – on its seventeenth- and eighteenth-century antecedents), and formed the basis of a directional account of morphological change in Schleicher (–: , –). Schleicher proposed that languages change from isolating to agglutinating to inflectional in that direction and not in the opposite one. Change in morphological type has thus been seen as unidirectional (see Croft : – on this). As we saw in §.., the idea of long-term typological change in word order is associated in particular with Lehmann () and Vennemann (). Lehmann (: ) proposed that languages which are not typologically consistent in terms of his OV vs. VO typology are undergoing change. He discusses directional change from OV to VO in IndoEuropean (–) and mentions the possibility of VO changing to OV in Tocharian (). We could apply this idea to the history of English, and observe that English has been drifting from OV to VO since at least the Proto-Germanic period, and has not yet completed the drift (since English retains AN and DemN orders, at least). Vennemann’s () approach, based on the Natural Serialization Principle, also runs into the difficulty that we are led to regard ‘mixed’ systems as persisting over very long periods. We have already quoted Vennemann’s (: ) remark that ‘a language may become fairly consistent within a type in about  years’ (for example, English). Both Lehmann and Vennemann proposed long-term, directional, typological drift from OV to VO as a mechanism of change. A well-known and very striking example of apparently directional typological change comes from Greenberg’s () study of wordorder change in the Ethiopian Semitic languages. Greenberg looked at synchronic word-order differences among a number of these languages (Ge’ez, Tigre, Tigrinya, fourteenth-century Amharic, Modern Amharic, Old Harari, and Harari) and observed that Ge’ez had free word order in the clause, with a tendency towards VSO, and has AN order alternating with NA in nominals, along with NGen (alternating with GenN) and Prepositions. Tigre is SOV and AN, but predominantly NGen and Prepositional. Tigrinya is the same, but with GenN. Amharic of both 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

periods is like Tigrinya but with GenN and a growing tendency for Postpositions. Finally, Harari is basically postpositional. As Croft (: ) says, these systems ‘differ from each other in small enough ways that the actual historical process can be perceived, just as motion is perceived in a sequence of stills from a movie’. Greenberg (: –) also compares the situation in the Ethiopic languages with what seems to be a parallel situation in the Iranian languages. The important point here concerns directionality. Croft (: –) observes that if we concentrate on three of the variant wordorder dyads, AN/NA, GenN/NGen, and Prepositions vs. Postpositions, we can observe the following combinations: () a. b. c. d.

NA and NGen and Prep AN and NGen and Prep AN and GenN and Prep AN and GenN and Postp

Croft says that this can be seen as a unidirectional historical process, whereby each change leads to the next and the reverse process is not found. Unidirectional changes of this kind impose ‘a major constraint on possible language changes. In fact, [they] cut[s] out half of the logically possible language changes’ (: ). Discovering such unidirectional processes is therefore a major goal of diachronic typological linguistics. Here we are again dealing with long-term typological drift. If we take it, following Greenberg (), Hawkins (), and others, that NA and NGen are both head-initial orders, then what we see in () is an incremental shift from a harmonically head-initial system to a harmonically head-final one.31 We saw in §.. and §.. that Ledgeway () documents an almost opposite path of change over the history of Latin and into Romance, with the inherited (near-) harmonically head-final system of Archaic Latin changing into the harmonic head-initial system of Romance regarding these same dyads (on the evidence for a very early unmarked AN order, see Ledgeway : ‒; and on the GenN order Ledgeway : ‒, especially table ., p. ). As Ledgeway argues, and as we 31

Since Dryer (), it has been generally thought that the order of adjective and noun, i.e. the AN/NA dyad, is not one which participates in cross-linguistic implicational universals. Dryer showed that the earlier view that it did was the result of faulty sampling. Nevertheless, we retain it here as we are reporting Greenberg’s () results.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

saw in §.. and §.., this follows from the FOFC. In that case, we could view the order of changes in the Ethiopian Semitic languages observed by Greenberg as reflecting the opposite FOFC-constrained diachronic order in the transition from harmonic head-initial to harmonic head-final order. We saw in §.. that Biberauer, Holmberg, and Roberts (: ) and Biberauer, Holmberg, Roberts, and Sheehan (: ) point out that FOFC predicts just two possible trajectories of word-order change: change from harmonic head-final to harmonic head-initial order must be ‘top-down’ in a given Extended Projection, while change from harmonic head-initial to harmonic head-final order must be ‘bottom-up’. Any other order of changes will give rise to intermediate stages featuring a final-over-initial order, in violation of FOFC. We see then that FOFC is the ‘major constraint on possible language changes’ observed by Croft. In §.. and §.. we saw the relevance of FOFC for change from head-final to headinitial order; here we see its relevance to the opposite direction of wordorder change.32

32

Given the characterization of head-final to head-initial change in terms of the successive loss of roll-up movements in §.., the natural account of the change from head-initial to head-final order ought to involve the gradual, FOFC-compliant introduction of roll-up. Analysing the changes in the nominal Extended Projection in () in these terms requires some rather unorthodox assumptions about the position of genitives and APs inside the nominal, stemming from the fact that the simplest structural treatment of the adjective-noun and genitive-noun pairs involves treating them as head-complement relations. The simplest such analysis would involve the following structure for the head-initial order in (a): (i) [PP P [DP D [nP

n [NP GenP

[N’ N AP]]]]]

To obtain the order in (a), we assume general N-to-n movement. (b) then derives from AP-movement to adjoin to NP or to SpecnP (which option is taken depends on the relative order of GenP and AP in the order in (c)). (c) is obtained by movement of the remnant NP, containing just GenP, to SpecDP. Finally, (d) is obtained by movement of DP to SpecPP. Accounting for the changes in () in terms of successive introduction of roll-up is thus technically feasible. Note that the somewhat implausible assumption that an adnominal AP is the complement of N is essentially the technical replication of the assumption in early word-order typology that the AN/NA dyad was relevant for implicational word-order universals, an assumption abandoned since Dryer () (see previous note). If we drop this idea, and simply leave AP out of the picture, the structure in (i) becomes much more plausible. Of course, it is well-known that there are many kinds of genitive (subjective, object, partitive, possessive, etc.). Discussion of these here would take us too far afield, and would of course go beyond the results of Greenberg (). For recent discussion and analysis of different kinds of genitive, see Crisma and Longobardi ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

If we take into account clausal word order, we can add an initial stage to () on the basis of Greenberg’s data from Ge’ez. We could, abstracting slightly away from the possibility of alternate orders, bring in the OV/VO dyad and restate () as (): () a. b. c. d. e.

VO and NA and NGen and Prep OV and NA and NGen and Prep OV and AN and NGen and Prep OV and AN and GenN and Prep OV and AN and GenN and Postp

(Ge’ez) (Tigre) (Tigrinya) (Fourteenth-century Amharic) (Harari)

It seems then that the first of the series of changes, taking Ge’ez to represent the most conservative stage, was from VO to OV. In (a) we see harmony across the clausal and nominal Extended Projections. The initial change from VO to OV is followed by the incremental changes inside the nominal Extended Projection discussed above. This conforms to Input Generalization: the feature (perhaps Biberauer, Holmberg, and Roberts’  diacritic ^; see §..) which triggers roll-up is generalized across Extended Projections and then within the nominal Extended Projection. This is the reverse of the general change from head-final to head-initial orders across the clausal and nominal Extended Projections seen in the development of Latin/Romance as analysed by Ledgeway (), discussed in §... More generally, in terms of the interactions of the third factors discussed in §.., Input Generalization favours word-order harmony over disharmony, while, if we continue to assume that roll-up is triggered by some kind of formal feature, Feature Economy (construed in terms of feature types, as discussed there) favours harmonic headinitial orders and is neutral between partially and wholly head-final orders. The Subset Principle favours disharmonic head-final orders over harmonic ones. Harmonic head-initial orders may thus, all else equal, be seen as the least marked of all the possibilities. Although the typological validity of this conclusion may be dubious (suggesting that all else is not in fact equal), it has been proposed that change from VO to OV is an unusual change, and that OV to VO is more common (see for example Kiparsky ). It is certainly true that English, North Germanic, Romance, Celtic, Greek, Slavonic, and Western Finno-Ugric have all undergone the change from OV to VO at some point in their history (see again §. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

and §.).33 However, there may be some reason to doubt that OV to VO is a ‘natural’ change, or a more natural one than VO to OV. If we look beyond Indo-European we see many cases of highly stable harmonic head-final order at the clause level: in §.. we saw that this is true of Japanese, Korean, and Dravidian (see the references given there), to which we can add Turkic (Kornfilt ). We suggested in §.. that this is due to the fact that harmonic head-final order is the reflex of a macroparameter. In that case, some factor must have ‘destabilized’ the former head-final orders in various branches of Indo-European. In §. we saw evidence that finite clausal complementation arose from a reanalysis of relative clauses in Latin and a number of other Indo-European languages, including Germanic, Greek, Old Irish, Old Iranian, Tocharian and Hittite (see the references given there). In all these languages, the finite complementizers develop from (usually neuter) relative pronouns: English/Germanic that, Classical Greek relative pronoun hōs/hóti; in Old Irish the main subordinator was the nasalizing relative clause, i.e. nasal mutation on the initial consonant of the verb in the subordinate clause (which was initial, Old Irish being VSO); Old Iranian had finite subordinate clauses introduced by initial complementizers derived from relative pronouns (Skjærvø : , –). In Hittite, clausal embedding often involved kuit, again a complementizer derived from a neuter relative pronoun (CotticelliKurras ). Since all these languages had wh-movement in relative clauses, the finite complementizer was invariably initial. In TP, on the other hand, the order was generally head-final (except in Old Irish, although Continental Celtic was OV, as we saw in §..). This diachronic path for the development of finite complementation is fairly well-established (see Rix , Clackson : ‒, Hackstein ). So, again taking our cue from Ledgeway’s (: ‒) discussion of

33

As we noted in §.., several Iranian languages are typologically unusual. Greenberg (: ) notes that we would expect Old Persian to be VO, given the synchronic variation in the languages he documents. But it is OV. He goes on to suggest, following Friedrich (), that, while Proto Iranian was SOV, Avestan allowed VSO order and showed other VO traits (prepositions and NGen order). He concludes that ‘[t]his was the situation when a wave of SOV spread over Iranian territory during the middle Iranian period’ (). It seems, then, that these languages began developing from OV to VO, but, for unclear reasons, this change was reversed. See Harris and Campbell (: ‒) for further discussion of the possible reasons for the typologically unusual word-order patterns in ‘Old Persian’.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Latin/Romance word-order change, we can take it that that the presence of head-initial finite CPs triggered a more general word-order change from head-final to head-initial, as clause-medial heads ‘harmonise’, in line with Input Generalization. There is a further aspect to this change. We saw in §.. that an important consequence of FOFC is that OV languages which have embedded finite clauses introduced by an initial complementizer are forced to extrapose those clauses, in order to avoid a FOFC-violating configuration of the kind *[VP . . . [CP C TP] V] (see §.., (b)). In all the languages mentioned above, the finite subordinate clauses obeyed this constraint. Hence the development of FOFC-compliant head-initial, postverbal finite CPs can ‘destabilize’ a harmonically head-final system. There were really two aspects of this: first, the simple existence of head-initial CPs and, second, the surface headcomplement order V > CP required by FOFC. Thus there was evidence for head-initiality at the bottom (in VP) and the top (in CP) of the clausal Extended Projection. This combines with Input Generalization to favour a general shift to harmonic head-initial order. On the other hand, the Indic languages show rigid head-final order, despite the fact that Sanskrit had the principal typological features found in the older Indo-European languages listed above. It is striking to note in this connection that Sanskrit had no general subordinator (Kiparsky ; Davison : ). This provides indirect evidence to confirm the idea that initial finite subordinators played a crucial role in bringing about the general head-final to head-initial change across much of Indo-European. Many other head-final languages and families are stably head-final. Change from head-initial to head-final may be somewhat rarer than the opposite, partly because head-initial orders have a slight markedness preference (see above) and partly because, while initial complementizers can be innovated through grammaticalization of a relative pronoun along the lines described in §., change to OV order requires a change in the properties of v and/or V (Anders Holmberg, p.c.). We see that, by ruling out a wide range of imaginable word-order changes, FOFC can explain certain tendencies towards drift. In this sense, drift is purely epiphenomenal, at least in these cases. Is drift always an epiphenomenon, or is there a deeper insight in Sapir’s observation of an ‘invisible and impersonal drift that is the life of the language’? 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

4.3.3. Drift and parametric change Lightfoot has criticized approaches to syntactic change which invoke typological drift as a mechanism in various places, as we have already mentioned (see §.. and §.). The most detailed and explicit discussion is in Lightfoot (: –). He criticizes this general approach to word-order change on three grounds, all of them essentially stemming from the single general point that grammars must be seen as being replicated through language acquisition, rather than transmitted in any direct way across the generations (or from older to younger peers). We gave the crucial quotation in the discussion of gradualness above, and repeat it here: Languages are learned and grammars constructed by the individuals of each generation. They do not have racial memories such that they know in some sense that their language has gradually been developing from, say, an SOV and towards an SVO type, and that it must continue along that path. After all, if there were a prescribed hierarchy of changes to be performed, how could a child, confronted with a language exactly half-way along this hierarchy, know whether the language was changing from type x to type y, or vice versa? (Lightfoot : )

From this he concludes: ‘[t]herefore, when one bears in mind the abductive nature of the acquisitional process, the concept of an independent diachronic universal (i.e. unrelated to the theory of grammar) becomes most implausible’. He identifies three ways in which one could countenance long-term drift against the background of ‘the abductive nature of the acquisitional process’: racial memory, ‘mystical metaconditions on linguistic families or goal-oriented clusters of changes’ (), and typological drift as seen in the previous subsection. Of these, clearly the first can be disregarded with no further discussion. The second was proposed in Lakoff ’s (: ) account of the drift towards analyticity in various Indo-European languages. Lightfoot objects that such metaconditions run the risk of postulating diachronic grammars, a notion inherently incompatible with the nature of the language acquisition and, as such, untenable.34 34

Such a metacondition could conceivably be a condition on grammar which is successively inherited across generations by the usual language-acquisition process. For Lightfoot () this kind of condition would fall foul of the Transparency Principle, being very abstract. In terms of the assumptions being made here, it could derive from thirdfactor markedness conventions of the kind discussed in §., or the interaction among them, or between them and a condition such as FOFC, as seen in the previous section.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

The third way in which one could countenance long-term drift is of course the kind of typological drift that we illustrated above using Croft’s summary of Greenberg’s work on the Ethiopic languages. Here the criticism is that there is no way of building such long-term teleology into the language-acquisition process, as the above quotation states. Moreover, Lightfoot questions the empirical basis for the diachronic generalizations: we have good long-term textual attestation of very few languages, we know that typological shifts can go in different directions (for example, OV to VO or VO to OV), and we know that even in Indo-European a fair amount of variation in basic word order can be observed: SVO (English, Romance, etc.), SOV (Indic), and VSO (Celtic).35 Finally, he observes that many changes do not involve alterations to observed word order: the development of the modal auxiliaries in ENE (discussed in §.) did not change the surface order of modals and main verbs, but was arguably nonetheless a significant change in the syntax of English. Despite all these criticisms, Lightfoot does not deny that changes can be ‘provoked by earlier changes and in turn themselves provoke others’ (). The conceptual underpinning of much of Lightfoot’s critique of typological drift is fully valid: here I am advocating exactly the same general approach to syntactic change as Lightfoot. All the points made by Lightfoot, summarized in the previous paragraph, are well-taken, with the possible exception of the one concerning the different directions of change: see the discussion of this in the previous section. Lightfoot (: ) does acknowledge, however, that ‘implicational sequences of syntactic changes do exist . . . there are sets of roughly analogous changes which cluster together in independent languages’. His critique focuses on the fact that the ‘implicational sequences of

35

We saw how VO orders may have arisen over a swathe of Indo-European languages in the previous section. VSO, at least of the Indo-European (i.e. Celtic) type, involves productive verb-movement to a high position in TP and a relatively low position for subjects (see §...). It is uncertain how VSO order originated in Celtic, although it seems clear it is a feature of Insular rather than Continental Celtic (see §..). Newton () suggests that it arose through a reanalysis of (remnant) VP-fronting into the left periphery, an operation found elsewhere in the older Indo-European languages; see Chapter , note , and §... It has long been observed that the Insular Celtic and certain Semitic languages, including notably Classical Arabic and Biblical Hebrew, share a range of morphosyntactic properties including VSO order. Vennemann () proposed that this was due to ancient contact leading to an Afro-Asiatic substrate in Insular Celtic, although this idea is not generally accepted by historical linguists (Baldi and Page ; Isaac ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

syntactic changes’ are something to be explained rather than an explanatory notion in themselves. Again, this point is well-taken. So, how can we understand ‘implicational sequences of syntactic changes’ in terms of a parametric theory of change and variation? Here, we seem at first sight to be confronted with a paradox, as noted by Roberts and Roussou (: –). They point out that one can view the parameters of UG as defining an abstract space of variation. In that case, the natural, and arguably the usual, way of viewing synchronic variation is to see grammars as randomly scattered through this space, while the natural way to think of diachronic change would be to see it as a random ‘walk’ around this space. This is consistent with the idea that typological drift is an incoherent notion, essentially for the reasons Lightfoot gives. But, in that case, what of ‘implicational sequences of syntactic changes’ in the diachronic domain, and what of implicational relations in the synchronic domain? Alongside the evidence for ‘implicational sequences of syntactic changes’, typological work has yielded evidence for a further type of change which strongly tends to be unidirectional: grammaticalization, the process by which new grammatical morphemes are created, either from other grammatical (or functional) elements or from lexical elements (see Croft : –). In §. we discussed and illustrated the approach to this phenomenon adopted by Roberts and Roussou () and van Gelderen (, , , ), an approach which is fully compatible with the general assumptions being made here in that grammaticalization is reduced to parameter change. Although a few isolated cases of degrammaticalization have been observed (see Willis ), grammaticalization appears to be a pervasive phenomenon, and strongly tends to follow certain ‘pathways’; for example, minimizers or generic terms tend to become n-words and/or clausal negators, demonstratives tend to become definite determiners, and verbs of certain semantic classes tend to become auxiliaries. Heine and Kuteva () and Kuteva, Heine, Hong, Long, Narrog, and Rhee () give very extensive lists of examples of this sort. This, then, is a further example of tendential directionality in parameter change. If so, then ‘grammaticalization paths’ are a further case of implicational sequencing of syntactic change, and raise the same questions for a parametric approach as the other ones. A further point arises from Sapir’s original discussion of drift. His concern is to understand why languages differ from each other at all. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

He observes that social, geographical, or even individual variation do not actually provide an explanation. He says that if ‘individual variations ‘on a flat’ were the only kind of variability in language, I believe we should be at a loss to explain why and how dialects arise, why it is that the linguistic prototype gradually breaks up into a number of mutually unintelligible languages’ (: ). It is in this context that he introduces drift. So, for Sapir, drift explains the very existence of different grammatical systems. Some explanation of this phenomenon is also required even if a parameterized syntax is assumed: why, even if there are different parameters, should there be different systems? We saw in our discussion of the dynamic-systems models of syntactic change proposed by Niyogi and Berwick (, ) in §. that variation may spontaneously arise through the interaction of the set of possible grammars, the learning algorithm, and fluctuations in the PLD, but Sapir’s point is that the variation over time in a group in relation to the ‘linguistic prototype’ must be centrifugal. Otherwise, as he says ‘[o]ught not the individual variations of each locality, even in the absence of intercourse between them, to cancel out to the same accepted speech average?’ (). He proposes drift as the centrifugal force; arguably, a parametric approach needs something similar. A final argument in favour of treating diachronic change, seen as parameter change, as something more than a random walk through the range of possibilities defined by UG comes from the observation that the UG-defined space, all other things being equal, is simply too big. It is a consequence of the Borer-Chomsky Conjecture that the number of possible parameter values is a function of the number of formal features of functional heads. In the simplest case, if n is the number of formal features then n is the number of parameter values. If all the parameters can in principle be independently set (although their effects may interact), then () also holds: () The cardinality of the set of grammatical systems jGj is n. n gives us the size of the space of possible grammars that has to be searched by the acquirer. If there are thirty formal features, then there are sixty parameter values and jGj = 30, or ,,,. This estimate is almost certainly on the low side; Roberts (a: ‒) lists sixty-nine parameters in TP/VP domain, while Longobardi () identifies ninety-one parameters just in the nominal domain, 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

giving . If there are just a hundred FFs, then jGj = ,,,,,,,,,,. Kayne (a: ) claims that a grammar space of this size is not problematic for acquirers since they do not have to ‘work directly with the space of possible grammars’; however, since Clark (), it has been widely held that domainsearch is required (see the discussion in Fodor and Sakas : ‒). Given that the acquirer must search the entire domain space and cannot set parameters piecemeal, this very large space may pose a severe learning problem: at the very minimum, acquirers must be able to search the space extremely efficiently so as to rule out billions of possible grammars (and, following standard poverty-of-stimulus considerations, on the basis of impoverished PLD). There is also a diachronic/typological issue here. Consider the following thought experiment (based on Roberts , a: –, a: ‒). Suppose that at present approximately , languages are spoken and that this figure has been constant throughout human history (back to the emergence of the language faculty in modern Homo sapiens); this is highly unlikely, but the supposition suffices for the point to be made. Suppose further that every language changes in at least one parameter value with every generation; this is quite reasonable (in fact, where there is no normative pressure from educational and other institutions, it is quite likely that there is more parametric change than this, but again this simple supposition will suffice). Third, if we have a new generation every twenty-five years, we have , languages per century. Finally, suppose that modern humans with modern UG have existed for , years, i.e. , centuries. It then follows that ,, languages have been spoken in the whole of human history, i.e. 7 ; here again, the numbers can be revised as necessary without fundamentally affecting the argument. The figure of ,, is twenty-seven orders of magnitude smaller than the number of possible grammatical systems arising from the postulation of a hundred independent binary parameters. So if the parameter space is as large as a hundred parameters, there simply has not been enough time since the emergence of the species (and therefore, we assume, of UG) for anything other than a tiny fraction of the total range of possibilities offered by UG to be realized. So, even with a UG containing just a hundred independent parameters, we should expect that languages appear to ‘differ from each other without limit and in unpredictable ways’, in the famous words of Joos (: ). 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Alternatively, if all languages ever spoken derive from a single one (perhaps an idiolect), then, if diachronic change is constrained, there may not have been time—and there may never be time—to see the full range of variation that UG allows, with the consequence that no typological generalization of a non-extensional kind can ever be maintained (thanks to Anders Holmberg for pointing this out). Either way, it is clear that this kind of parametrized UG creates too large a space of possible grammars (and recall that these conclusions are based on a UG with a hundred parameters, but it is probable that there are many more). Perhaps the most important consequence of this line of reasoning is that the uniformitarian assumption ought to have no force. So many grammatical systems are available that they could appear to vary wildly, and there would be no way to establish uniformity, even with a restrictive UG and a number of binary parameters in the low hundreds. For all these reasons we must add something to the conception that grammars diachronically ‘walk’ around the parameter space defined by UG. Something must cause grammatical systems to ‘clump together’ synchronically in certain areas of that space and to drift towards those areas diachronically, i.e. something creates local maxima in that space, which we could think of as macro-types. The natural hypothesis here is that the force responsible for this is third-factor driven markedness of the kind described in §.. To phrase things in terms of the theory of dynamical systems, we can think of UG as a state space, a multidimensional space in which each point represents a system-state, i.e. a set of values of parameters, a grammar. This is rather different from seeing the ensemble of the learning algorithm, the set of grammars, and the probability distribution of PLD items as a dynamical system in the way we saw Niyogi and Berwick (, ) and Niyogi () do in §..: here I am simply looking at the set of grammars. Markedness creates basins of attraction, or sinks, in this space, i.e. points towards which grammars always tend to move. In other words, certain areas of the parameter space attract grammatical systems, by being relatively unmarked. We saw this in the case of harmonically head-initial systems in the previous section. There are presumably a number of basins of attraction and therefore a number of macro-types; in terms of the parameter hierarchies we argued for in §., these will be connected to macroparametric settings. Since there are several third factors which, 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

to a degree, create different preferences as we saw in §.., there may be—and presumably are—various basins of attraction.36 I do not want to insist unduly on the dynamical-systems terminology here; for present purposes, the notions of attractor, basin of attraction, and space of variation can be taken as metaphorical. The central point is that something like Sapir’s notion of drift is required on both empirical and conceptual grounds, even given a highly restrictive theory of the language faculty and a relatively small number of binary parameters of variation, and that this notion can be understood in terms of markedness: markedness defines the areas in the space towards which grammars tend to drift. This view is also compatible with the Inertia Principle: essentially if no other force acts on a grammatical system (i.e. no external contingency radically alters the PLD), a grammar will continue to drift in a given direction—it is conceivable that Inertia does not entail stasis.37 Furthermore, this idea is compatible with Lightfoot’s (: –) view that parametric change is chaotic in the sense that it is highly sensitive to very small variations in initial conditions. However, Niyogi () shows that the dynamical system constituted by the learning algorithm, the set of grammars and the distribution of PLD is not chaotic, although it is non-linear. Again speaking metaphorically, a grammar may only need a very small ‘push’ in order to start to drift in a direction which may take it, given enough time, a long distance in the state space from its starting point. In other words, a long-term typological change may come about through a cascade of smaller changes all tending in the same overall typological direction. These points are arguably illustrated by the word-order changes in Latin/Romance and other branches of Indo-European discussed above, which may ultimately have resulted from the reanalysis of neuter 36 Possibly of various kinds: Lass (: ) describes the distinction between pointattractors, limit-cycle attractors and gutters. Point-attractors resemble a level ‘valley’ in the state space (or ‘epigenetic landscape’ (Waddington )) in which a system will come to rest at a steady state. Limit-cycle attractors are like a bowl, around the side of which the system will oscillate. Gutters are sloping valleys, down which the system will continue to move. It is an open question whether these distinctions are useful in diachronic syntax, but arguably one worth exploring, as indeed Lass does (: –). 37 Although if we take this view we can no longer construe Inertia as deriving from perfect convergence by language acquirers, as we suggested in §.. It implies that optimal convergence, determined by markedness, will force change in a particular direction— towards an attractor.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

relative pronouns as complementizers. These changes took place over a long period (for example, through the entire period in which Latin is documented in its various forms according to Ledgeway ) and give the impression of a single, large, gradual change. There is nothing inevitable about such a sequence of changes: it is the consequence of what we might call ‘inertial drift’. Of course, in the actual history of actual languages, many other forces, notably contact, also play a role. As a result, systems are deflected from their ‘path of drift’. 4.3.4. Cascading parameter changes in the history of English It has often been pointed out that English seems to diverge quite radically from the other West Germanic languages. It used to be thought that this had to do with the influence of Norman French, although more recently the effects of Old Norse have sometimes been regarded as responsible for this divergence; Emonds and Faarlund () propose that Norse replaced English in the ME period, a matter we return to in §... A series of changes took place in the history of English between  and , which had the net effect of transforming English from a typologically ‘standard’ West Germanic language into the cross-linguistically unusual system of Modern English which differs in a number of important respects from the neighbouring North Germanic, Romance, and Celtic languages, as well as from the other West Germanic languages (making Modern English an ‘outlier’ in Germanic; see Roberts b and §..). Arguably, this is an instance of the formal account of drift I suggested in the previous subsection: a cascade of parametric changes diffused through parts of the functionalcategory system over a fairly long period of time. It amounts to what Sapir referred to as ‘the vast accumulation of minute modifications which in time results in the complete remodeling of the language’ (: ). Here I want to look at the subgroup of those changes which affected the ENE verb-movement and auxiliary system. (For more detailed discussion of both the empirical and technical aspects of these and the earlier changes, see Biberauer and Roberts .) Let us begin with the loss of V in the fifteenth century. We can date this change to approximately  (see van Kemenade : ff.; Fischer et al. : ff.). It is often said, starting with van Kemenade (: ff.), that this change came about through ‘decliticization’: the earlier V orders, where a clitic pronoun intervened between the initial 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

constituent and the verb in C (see §... on V in OE; this system persisted in southern dialects of ME, as Kroch and Taylor  show), were reanalysed as involving a non-clitic pronoun in a lower position and the verb in T. So we had the following reanalysis: () [CP XP [C S-cl [C[T V]]] [TP (Subj) tT . . . ]] > [CP XP C [TP S-pron [T V] . . . ]] This reanalysis was presumably favoured by the fact that many V orders were in any case subject initial, and such orders were prone to be reanalysed as TPs with V-to-T movement (see Adams a,b; Roberts a on Old French; Willis  on Middle Welsh). Therefore, a consequence of the loss of V was that V-to-T movement became a general feature of finite clauses; see also Haeberli and Ihsane (: ‒). Where there was a pronoun, the relevant reanalysis was as in () (here the set notation indicates that the order of the copy of V and the copy of XP inside TP is irrelevant): () [C XP [T V]] [TP { . . . (XP) . . . (V) . . . }] > [TP XP [T V] . . . (XP) . . . ] This change eliminated V-movement to C and XP-movement to SpecCP in a range of cases. The next change to take place was the lexicalization of T by the reanalysis of modals and do as auxiliaries. As we saw in §.., this most probably happened c.– (Lightfoot ; Roberts , a: ff.; Warner : –). Roberts (a: ff.) and Roberts and Roussou (: ‒) argue that a further factor in this change was the loss of the infinitival ending on verbs, which had the consequence that constructions consisting of modals followed by an infinitive were reanalysed as monoclausal: the absence of the infinitival ending meant that there was no evidence for the lower functional T-v system. Denison () and Roberts (a) suggest that do became an auxiliary at the same time as the modals, in the early sixteenth century. Formerly, in Late ME, do had been a raising or causative verb (see Roberts a: ff.). The reanalysis of the modals favoured the loss of V-to-T movement, the first step of which may have happened in the early sixteenth century, according to Haeberli and Ihsane () (see §...). We saw in our discussion of Kroch () and the Constant Rate Effect in §.. that, although there was variation throughout the ENE period, as Warner (: –) observes (see the discussion of his periodization 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

of ENE in §..), the period – seems to be the crucial one as far as this change is concerned (but see again the discussion of Haeberli and Ihsane  in §...). The development of auxiliaries, and particularly the free availability of ‘dummy’ do, including in positive declaratives, meant that T was frequently lexically filled and that this option was always available. I suggested in §.. that the loss of morphological expression of the V-to-T parameter created the strong p-ambiguity needed for a reanalysis of the following kind (repeated from () of §..): () [TP John [T walk-eth] . . . [VP . . . (V) . . . ]] > [TP John T . . . [VP . . . [V walks]]] Following Kroch () and Warner (), this reanalysis led to variation for a period, but the innovative, structurally more economical grammar was favoured. By now the verb-auxiliary system is rather similar to that of Modern English, with the exception of the absence of do-support. Do could still be freely inserted in positive declarative clauses, and, conversely, clausal negation could appear without do in the absence of any other auxiliary, giving rise to examples with the order not–V (since V-to-T has been lost):38 () a. Or if there were, it not belongs to you. (: Shakespeare,  Henry IV, IV, i, ; Battistella and Lobeck ) b. Safe on this ground we not fear today to tempt your laughter by our rustic play. (: Jonson, Sad Shepherd, Prologue ; Kroch ) The development of NE-style do-support was preceded by the development of contracted negation, which took place around , as the following remark by Jespersen (–, V: ) suggests: ‘The contracted forms seem to have come into use in speech, though not yet in writing, about the year . In a few instances (extremely few) they may be inferred from the metre in Sh[akespeare], though the full form is written’. Around , then, negation contracted onto T, but since V-to-T movement of main verbs had been lost, only auxiliaries were 38

This is what lies behind the differential rate of change in the introduction of do noted in Kroch (), which we mentioned in note  above.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

able to bear contracted negation. This gave rise to a new system in which negative auxiliaries were used as the basic marker of clausal negation. (It is clear from a range of languages, including Uralic, Latin, Afrikaans, OE, and others, that negative auxiliaries are a UG option; see Holmberg , Roberts a: ‒.) The new class of auxiliaries included negative modals like won’t, can’t, and shan’t, but also the non-modal negator don’t/doesn’t/didn’t.39 I take these to be the sociolinguistically neutral expressions of negation in contemporary spoken English; Aux + not forms with clausal-negation interpretation are an instance of negative concord between a silent negative T and not (see Biberauer and Roberts : ; if this analysis is correct, it has implications for what we said concerning NE negation in §.). Once the negative auxiliaries, including doesn’t, don’t, and didn’t, are established as the unmarked expression of clausal negation (probably by the middle of the seventeenth century; Roberts a: ), the modern system of do-support comes into being. In this system, merger of do in T depends either on the presence of an ‘extra’ feature on T, in addition to Tense and φ-features (i.e. Q or Neg), or on the presence of a discourse effect, in contexts of emphasis and VP-fronting, as in:40 () a. John DOES (so/too) smoke. b. He threatened to smoke Gauloises and [smoke Gauloises] he did. . . . With this final development, the present-day English system is in place. The series of changes just described can be seen as a cascade of parametric changes. We can summarize them, giving approximate dates for each one, as follows: () a. b. c. d.

loss of V () >; development of lexical T (modals and do) () >; loss of V-to-T () >; contraction of negation () >;

39

Zwicky and Pullum () argue that the negative auxiliaries must be distinct items in the lexicon. The negative n’t must be treated as an inflectional suffix, rather than a clitic, since inflections but not clitics trigger stem allomorphy, and n’t triggers such allomorphy (see also Spencer : ff.). The same is argued by E. Williams (: ) on the basis of the unpredictable relative scope relations between various modals and negation. See also Biberauer and Roberts (: ‒). 40 This is connected to Chomsky’s (: ) proposal that ‘optional operations can apply only if they have an effect on outcome’.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

e. development of negative auxiliaries (s) >; f. development of do-support (later seventeenth century). In more technical terms, the parametric changes involved are arguably the following (with the exception (d)): () a. b. c. d. e. f.

(Matrix) C loses features triggering V-movement. Modal and aspectual features of T realized by External Merge. T loses features triggering V-movement. (Possibly not a syntactic change). Negative features of clause realized by External Merge in T. Obligatory merger of do in T restricted to contexts where T has a Neg, Q feature (or clear discourse effect).

All the syntactic changes except (a) involve changes in the feature make-up of elements merged in T, and there is a clear formal parallel between (a) and (c). So we observe exactly the kind of small, incremental changes discussed above, although this is not lexical diffusion, but a series of changes mostly affecting the features of a single functional head; as such these are arguably microparametric changes. Taken together, they give rise to a major reorganization of the English verb-placement and auxiliary system, and have created a system which is quite unlike anything found elsewhere in Germanic (or Romance), from a starting point in  which was comparable to what we find in Modern Icelandic (see §..., Table .). As we have seen, English word order changed from OV to VO and from VAux to AuxV in the ME period. VAux order involved roll-up of vP to SpecTP. The loss of this order is clearly another change in the movement-triggering properties of T. It is interesting to note that Icelandic underwent the word-order change but not the subsequent changes discussed above (see Rögnvaldsson ; Hróarsdóttir , ; and §..). It may be that Icelandic never lost V because it never had subject (pro)clitics, if the account of the loss of V sketched above based on decliticization is right. Bringing the word-order change into the picture, we see ‘the vast accumulation of minute modifications which in time results in the complete remodeling of the language’ Sapir described. So this is a clear example of parametric drift. As already mentioned, this account of cascading parameter change in the history of English is developed in more detail in Biberauer and Roberts ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

DRIFT: THE QUESTION OF THE DIRECTION OF CHANGE

What causes the cascade effect? The key idea, due to Lightfoot (: ), is that ‘grammars practice therapy, not prophylaxis’. Essentially, each parameter change skews the PLD in such a way that the next is favoured. We have seen in the description above how each successive change was favoured. In general, then, we see that it is possible to maintain a version of the Inertia Principle and at the same time account for an intricate series of related syntactic changes, not all of which have a purely syntax-external cause. There are many details, both technical and empirical, that remain to be clarified in the account of these changes in ME and ENE, but in general terms we can see this as an example of parametric drift, as described above. The changes probably concern a single parameter hierarchy concerning verb-movement options, and fairly clearly involve increasing markedness. Although these changes do not exactly map on to the verb-movement hierarchy in Roberts (a: ), they clearly involve movement down the hierarchy from a V grammar to an NE-style grammar. We saw in §.. that Biberauer and Roberts (, ) conclude that the development of the NE auxiliary system involved an increase in markedness and in §. we commented on the late acquisition of aspects of that system, in particular the acquisition of agreement and the relatively high frequency of root infinitives, which we suggested was connected to the late acquisition of Affix Hopping. Two final points remain to be made, both stemming from Sapir’s original discussion of drift, and linking up with questions we have looked at elsewhere in this book. First, Sapir remarks (: ) that ‘if this drift of language is not merely the familiar set of individual variations seen in vertical perspective, that is historically, instead of horizontally, that is in daily experience, what is it?’ Here I take Sapir to be asserting essentially that change spreads through orderly heterogeneity in the speech community, in the sense discussed by WLH and in §... It is clear from Kroch () that there was variation in sixteenth- and seventeenth-century English regarding V-to-T movement and do-insertion, and we can assume that the same must have been true regarding the status of the modals and the nature of contracted negation and negative auxiliaries. Presumably, the variants had social value (although it is hard to find evidence of this; they certainly do in most varieties of contemporary English, though—see also Jespersen –, V: ). So in principle we can, in line with Sapir and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

WLH, tie sociolinguistic variation in with our account of the changes in the ENE verb/auxiliary system discussed above. A final quotation from Sapir illustrates the second point: The general drift of language has its depths. At the surface the current is relatively fast. In certain features dialects drift apart rapidly. By that very fact these features betray themselves as less fundamental to the genius of the language than the more slowly modifiable features in which the dialects keep together long after they have grown to be mutually alien forms of speech . . . The momentum of the more fundamental, predialectic, drift is often such that languages long disconnected will pass through the same or strikingly similar phases. (Sapir : )

Here Sapir is asserting that there are relatively superficial, ‘faster’ changes which may not affect the ‘more slowly modifiable features’ which are more ‘fundamental to the genius of the language’. Our fourway division of parameters into macro-, meso-, micro-, and nanoparameters arguably captures Sapir’s intuition. We saw in §.. how the different parameter types correspond to differing degrees of diachronic stability, and as such may reflect shared features of stocks, genera, or individual language families. We also see from our discussion of the history of English that a given path of drift can make a language an outlier in its family, as NE rather clearly is in West Germanic (we return to this point in §..). The opposite case to outliers is also relevant here: the question of parallel development, language families in which several daughters diverge typologically from the parent language, but all in the same way; this is true of the Romance languages in relation to Latin, for example, as has often been pointed out (cf. Harris : –); Gianollo, Guardiano, and Longobardi () show that the Romance languages are parametrically more similar to each other than any of them are to Latin, at least as regards their DP-syntax (see also Longobardi : ‒). We also saw in §. and §.. that several branches of Indo-European have undergone parallel development in introducing finite complementation by a reanalysis of relative pronouns as complementizers. Again, given a certain common ‘starting point’ (cf. the general typological features of older Indo-European languages mentioned in Chapter , note , and §..), some parameters are more likely to change than others. Those more likely to change uniformly may be mesoparameters, while micro- and nanoparameters are more likely to be idiosyncratic, reflecting Sapir’s ‘surface . . . current’. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

4.3.5. Conclusion In this section, we have discussed the possibility of parametric drift, considered its relation to typological drift of the more familiar kind, and Lightfoot’s () well-known objections to the latter, and sketched a possible example of this drift from the history of English. If parametric change admits of directionality, as I have suggested (pace Lightfoot) that it must, the next question to investigate is whether we can exploit this potential directionality in reconstructing lost parametric systems. This is the topic of the next section.

4.4. Reconstruction41 4.4.1. Introduction In this section, I want to take up a further question from traditional historical linguistics in relation to the parametrically based approach to syntax and syntactic change that I have been presenting here: the question of syntactic reconstruction. Comparative reconstruction is a powerful and effective methodology in historical linguistics; Campbell (: ), for example, says that the ‘comparative method is central to historical linguistics, the most important of the various methods and techniques we use to recover linguistic history’. Its importance for establishing the nature of the phonological systems of unattested languages, or stages of languages, is all but undoubted (Fox  is a useful introduction to the general topic). However, in the area of syntax, the situation is a little different. To quote Campbell again, ‘[o]pinions are sharply divided concerning whether syntax is reconstructible by the comparative method’ (: ). In particular, Lightfoot (: –, : –, : –) has argued that the reanalytical nature of parametric change makes reconstruction impossible. On the other hand, Harris and Campbell (: ch. ) suggest, against the background of a rather different model of syntax but nonetheless an approach to syntactic change which assumes reanalysis as a central 41

This section is based on a talk given with Nigel Vincent at the University of Konstanz in February . I am grateful to Nigel for his collaboration and for extremely stimulating discussions of these questions with him, which greatly clarified my thinking. The views expressed here are my own responsibility, however.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

mechanism, that syntactic reconstruction is possible. In earlier work (Roberts ), I suggested that parameters can actually be used as a basis for reconstruction. This idea was taken up and applied in detail to Germanic by Walkden (, ), as we shall see. Here I will reconsider the issues involved, and make a further suggestion regarding the possible reconstruction of aspects of the syntax of Proto IndoEuropean (PIE). The conclusion will be that syntactic reconstruction using the model assumed here is possible within certain clear limits. Being possible, it is definitely desirable, and sets a clear agenda for future research. As Clackson (: ) says: ‘[s]yntactic reconstruction through comparison of related languages and the formation of explanatory hypotheses for the structure of the parent language seems to work. Perhaps it is time to leave the arguments about the methodology to one side, and concentrate on reconstructing syntax’. 4.4.2. Traditional comparative reconstruction Before entering into the discussion of the issues surrounding syntactic reconstruction, let us briefly review the reconstructive method, using an example from phonology. Campbell (: ff.) discusses this at some length (see also Fox : ch. ; Clackson : ‒). Consider the following Romance forms, the words for ‘goat’ in various languages: () Italian: Spanish: Portuguese: French:

capra cabra cabra chèvre

/kapra/ /kabra/ /kabra/ /∫εvr(ə)/

Of course, in this case, we know the original Latin form was capra (/kapra/). So we can use the reconstruction of Latin on the basis of Romance forms as a check that the method is reliable. The method proceeds as follows: once we have a set of cognate forms, we establish the likely sound correspondences. In the case of the forms in (), we clearly want to say that Italian, Spanish, and Portuguese /k/ and /a/ correspond to French /∫/ and /ε/ respectively, for example, and that Italian /p/, Spanish/Portuguese /b/, and French /v/ correspond. The next step is to hypothesize what the likely proto-form was, i.e. the form in the ancestor which gave rise to the attested forms in the daughter languages. This is done on the basis of ‘majority rule’ (not because of any predilection for democracy on the part of historical 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

linguists, but rather as a matter of methodological parsimony, since this option involves postulating fewer changes), the likely direction of changes, the factoring of common features, the plausibility of the proposed changes, and the typological plausibility of the overall reconstructed system. Another factor which may be relevant is the relative age of the daughter languages; the older a daughter language is, the more likely it is to be closer to the ancestor (all other things being equal), and so forms in an older language may carry more weight in determining a reconstructed form than those in a relatively younger language. (This criterion does not play any role in the case illustrated by (), however.) In the case of the initial consonant, the majority clearly favours /k/, and since /k/ is likely to become palatalized before a front vowel like /ε/, we postulate the change from /k/ to /∫/ in the development from Latin to French. Majority rule similarly favours /a/ over /ε/ as the vowel, here and in the case of the final vowel, for which we posit that /a/ changed to /ε/, then to /ə/, and then dropped in final position (Campbell : ). The factoring of common features is relevant for the medial consonant /p ~ b ~ b ~ v/; the common feature here is labiality, and the natural direction of consonant lenition suggests that the original consonant was /p/, which was voiced in Ibero-Romance and lenited to /v/ in French. Since the /r/ is common to all the cognate forms, we arrive at the reconstructed form /kapra/, which, as I said above, we know to be correct from the Latin record. This brief illustration of reconstruction shows how the method works. Much more data is needed to really indicate its effectiveness, and to illustrate the points regarding typology and overall plausibility. But this is enough to allow us to proceed to the questions raised by syntactic reconstruction. 4.4.3. Questions about syntactic reconstruction The first question that needs to be addressed concerns the ontological status of reconstructed forms. When we posit a reconstructed form, what are we actually positing? This question applies to all kinds of reconstruction, of course. There are, roughly speaking, two views on this question as far as phonological and morphological reconstruction are concerned. First, there is what we can call the ‘realist’ view, represented notably by Lass (: –). According to this view, accurately reconstructed forms reflect the reality of an earlier stage of a 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

language: what was actually spoken by real people at a given historical moment. Lass (: ) says, of the reconstructed Indo-European /p/ as the initial consonant in the word for ‘pig’, ‘if we met a Proto-IndoEuropean speaker we’d expect that his word for ‘pig’ would begin with some kind of voiceless labial stop’. This realist view of reconstructed forms can be opposed to the conventionalist view, as favoured by Meillet () and by Lightfoot (: –, : –, : –). According to this view, reconstructions simply summarize existing knowledge about relationships among attested stages of languages; there is no commitment as to what the actual historical forms were. In Lightfoot’s words, reconstruction ‘is the exploitation of acquired knowledge to express genetic relations’ (: ). I will adopt the definition of a proto-grammar, a reconstructed grammar, put forward by Mark Hale (: ). Hale proposes that a proto-grammar is a ‘set of grammars which are non-distinct in their recoverable features’. I take this to mean that, while we cannot claim to be able to isolate a unique proto-grammar, we can approximate it by isolating its recoverable features on the basis of comparison of attested descendant grammars. The recovered features are real features, not simply conventions summarizing existing knowledge about descendant grammars. The features in question are structural features of grammars which are in principle open to variation (or there would be no question as to what to reconstruct); in the domain of syntax, this definition implies the following question: is there any reason to think that parameter values should not in principle be recoverable, in the same way as other grammatical features? Since parameter values reduce to formal features of functional heads, we need a very strong reason to answer this question negatively. In the absence of such a reason, I will take the answer to be positive. Lightfoot’s argument has always been that reconstruction of syntax is impossible because syntactic change is largely or totally driven by reanalysis and reanalysis is a process with no inherent directionality. In other words, there is no way to say, for example, that an SOV system is more likely to turn into an SVO one or a VSO one. Because of this, one cannot infer from the present existence of, say, two related languages, one SVO and VSO, what the basic word order of the parent language may have been. As Lightfoot puts it (: ): ‘the kind of reanalyses that occur in catastrophic change constitute cutoff points to reconstruction’. Proto-languages are no more amenable to 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

reconstruction than proto-weather: ‘one can no more reconstruct the syntax of a proto-language than one can reconstruct last week’s weather, and for the same reason: both reflect chaotic systems’ (Lightfoot : ).42 On the other hand, Harris and Campbell (: ) claim that syntactic reconstruction may be possible provided we can solve the correspondence problem. The correspondence problem relates to the very first step in the method of reconstruction as outlined in the previous subsection: setting up the group of putative cognates whose common ancestor is to be reconstructed. In phonology, this problem is straightforward: yesterday’s segments correspond in some fairly systematic way to today’s (thus French chèvre is the inherited reflex (or continuation) of Latin capra). But in syntax, we cannot take sentences as the unit of correspondence: of which Latin sentence is L’état, c’est moi the continuation, or reflex? As Watkins (: ) puts it: ‘[t]he first law of comparative grammar is that you’ve got to know what to compare’. Harris and Campbell (: –) propose a way of solving this problem, and illustrate it on the basis of interesting data from Kartvelian languages. I will not dwell on their approach, however, since it is based on a different set of assumptions about the nature of syntax from those I am adopting here. Their point, however, is welltaken: the correspondence problem appears much more difficult in the case of syntactic reconstruction than in the case of phonological or morphological reconstruction. This is because phonological and morphological reconstructions are based on a finite array of lexical items, while syntax deals with the unbounded range of sentences. There is little doubt that the sentence cannot be the unit of reconstruction; here we can clearly see the nature of the correspondence problem for syntactic reconstruction. A third point of view is represented by Watkins (: ). He makes the simple but telling observation that syntactic reconstruction, whatever its methodological basis, has in fact been put to the test, and has passed it very successfully: ‘the confirmation by Hittite of virtually every assertion about Indo-European word order patterns made by Berthold Delbrück [see Delbrück (–)—IGR] [is] . . . as dramatic as the surfacing of the laryngeals in that language’ (Watkins 42

In this connection, it is worth pointing to the aptness of the title of Willis’ () reconstruction of Brythonic free relatives.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

: ). So one might be tempted not to worry unduly about methodological matters, as Clackson (: ) suggests in the quotation at the end of §... Vincent and Roberts () point out two further problems for syntactic reconstruction, in addition to the correspondence problem. One of these is the directionality problem: choosing which attested forms to reconstruct depends on knowing which of the putative changes is the most natural or likely. Do we have a way of deciding which kind of word-order change is most natural, for example? Lightfoot claims that we cannot determine directionality. But we saw in the previous section that it might be possible, and even desirable, to develop a theory of parametric drift, determined by a robust theory of markedness of parameter values and parameter hierarchies. Such a theory may be readily formalizable as dynamical system, and may be compatible both with the Inertia Principle and with Niyogi’s () observation that change is non-linear. The third problem Vincent and Roberts identify is the ‘pool of variants’ problem. This can be simply illustrated with the forms of the future in a range of Romance varieties: () French: chanter-ai (‘I will sing’) Italian: canter -ò ‘ Spanish: cantar -é ‘ Rumanian: voi cînta ‘ Sardinian: appo a cantare ‘ Calabrese, Salentino: no form How are we to decide what the original form might have been on this basis? In this connection it is worth pointing out that the Classical Latin future forms amabo ‘I will love’, dicam ‘I will say’, etc., cannot be reconstructed at all on the basis of comparative Romance evidence.43 Lass (: , note ) points out that a number of other properties of Classical Latin cannot be reconstructed just on the basis of the surviving Romance languages: the ‘case-system, verb morphology, OV syntax, three genders, etc.’. This is perhaps slightly overstated: Spanish, some Francoprovençal varieties and possibly Romanian show some residue of a neuter gender, while Old Occitan, Old French, and some Francoprovençal varieties have residual case systems (see §.., (), on Old French). The consistent proclitic placement of unstressed complement pronouns in nearly all Modern Romance languages could conceivably provide a basis for a suspicion of an earlier OV order, although we would need a more predictive theory of syntax than we currently have to state this with any real confidence. 43



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

Let us now consider the first and third of these issues one at a time. I will leave the directionality problem aside, since it was discussed in §.. 4.4.4. The correspondence problem Let us entertain the idea that identifying the parameters of syntactic variation offers a way to solve the correspondence problem. To make this idea clearer, recall that our guiding assumption throughout this book is that UG contains a finite set of binary parameters which defines the possible variation in grammatical systems. Now for a notational convention: being binary, parameter values can be expressed as members of the set {, }.44 So, for example, if we state the null-subject parameter as ‘every tensed clause requires an overt subject’ (see §.) then we can say that languages like French and English have the value , while Italian, Spanish, Greek, etc., have the value . Here we are doing nothing other than identifying a point of correspondence between the grammatical systems. As we saw in Chapter , what can be and has been achieved in the synchronic domain using parameters can be applied to the diachronic domain. In accordance with our demonstration in §.. that the null-subject parameter changed its value between Old French and Modern French, then, we can say that Old French had the value  for the statement of the parameter just given, while Modern French has the value . Again, we have identified a point of correspondence between Old and Modern French. There is no conceptual difficulty here at all.45 To illustrate how we might proceed with ‘parametric reconstruction’, consider the case where we have four daughter languages with determinate values for a given parameter, and we wish to reconstruct the

44

Following a notational device standard in formal semantics, we can then think of parameter values as the truth values of contingent statements about grammars. 45 Lightfoot (: –) argues that what is suggested in the text (which was first suggested in Roberts ) does not help with syntactic reconstruction, since we need an approach to directionality. He then goes on to mention four parameters which he considers cannot stand in a markedness relation. Three of these, OV vs. VO, V-to-T movement, and the presence or not of English-type auxiliaries, have been extensively discussed here and a view as to which is the marked value has been suggested (see §.). The fourth parameter concerns the bounding of wh-movement, as put forward by Rizzi (: ch. ). However, the status of this parameter is somewhat uncertain—see Rizzi ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

value of that parameter in the parent language. The situation is schematized in (): proto-language: p = ?

()

daughter 1: p = 0

daughter 2: p = 0

daughter 3: p = 0

daughter 4: p = 1

Here we would be justified in postulating p =  in the parent language on two grounds. First, ‘majority rule’ (or more precisely, analytical economy): this hypothesis entails that only one of the four daughters changed, and is therefore the simplest. This is justified as long as we know that the daughters with p =  are older than the one with p = . Also, if we know that  >  is the more natural change for p (the directionality question) than  > , postulating p =  is justified. Considerations of typological naturalness might also play a role. In other words, exactly the considerations that were at work in our illustration of phonological reconstruction above are relevant here; there are no differences of principle in syntactic reconstruction. To put the discussion on a slightly more concrete basis, suppose that p is the VO/OV parameter, stated for simplicity and concreteness as ‘V {precedes()/follows()} the direct object’.46 And suppose that we are trying to reconstruct the value of this parameter in PIE on the basis of what we know about Latin, Gothic, Sanskrit, and Old Irish. Then we have the situation in (): ()

Latin: p = 0

IE: p = ?

Sanskrit: p = 0

Gothic: p = 0

Old Irish: p = 1

Majority rule clearly favours p =  (the OV setting) here. So do several other considerations: (i) Latin, Gothic, and Sanskrit are all older than Old Irish;47 (ii) Archaic Old Irish shows some evidence of OV orders, and we saw in §.. that Continental Celtic, at least Lepontic, also 46

Of course, this is unlikely to be the correct formulation of this parameter in practice— see §.—but it is adequate for the point being made here. 47 Most of the Old Irish corpus dates from the eighth and ninth centuries (Mac Eoin : ), Gothic from about four centuries earlier.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

shows SOV order (see Russell : ff.; Doherty a,b; Eska : ‒, ); (iii) OV > VO is (marginally) preferred over VO > OV (see §..). On the basis of the criteria normally used in comparative reconstruction, then, we would come to the conclusion that p =  was the PIE value of this parameter. This agrees with the discussion in Brugmann and Delbrück (–); Lehmann (: ff.), and the references given in the latter; Clackson (: ), Fortson (: , : ‒), and Keydana (: ). Pursuing this idea, we can also define the notion of a ‘continuation’ of a form. We can say that a continuation is a potentially variant property which is stable across grammars in the diachronic domain. Applying this to parameters, this is the case where a parameter value is ‘correctly’ set by acquirers over a given period. In other words, continuation is the opposite of abductive reanalysis (and represents the straightforward case of Inertia, leaving aside the possibility of ‘inertial drift’). In terms of the illustration just given, we would say that p =  in Latin, Gothic, and Sanskrit is the continuation of the PIE value of this parameter. In other words, over many generations (two or three millennia, in fact) the relevant trigger experience was sufficiently robust for acquirers to have continued to correctly set this parameter value on the basis of their PLD. Given what we saw in §.., macroparameters are those whose values are most likely to continue, with meso- and microparameters proportionally less likely to. Macroparameters are thus in principle those most valuable for reconstruction. The above considerations suffice to establish that parameters can be used as a solution to the correspondence problem for syntax. If parameters are formal features of lexical entries, this is no surprise. Lexical items can be used as corresponding items in general: those relevant for parameters are simply somewhat more abstract than the non-functional lexicon. Given this, and our general remarks on directionality, I conclude that there is no problem of principle in using parameter values as features of a proto-grammar in Hale’s sense as defined above. We can thus now state the following: () A proto-grammar is a set of grammars which are non-distinct in their recoverable parameter values. We can illustrate () by reconsidering some of the parameters we have been looking at in the preceding chapters. Here we adopt a notation first used in Clark and Roberts (). We assign one of the set {, , *} 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

to each parameter, where  is the positive value,  the negative value and * an unknown value ranging over {, }.48 Each parameter is stated independently of its position in a hierarchy here: () a. b. c. d.

T licenses a (consistent) null subject. (Root) C attracts V. C attracts WH. v attracts VP.

Italian: ; English:  German: ; English:  English: ; Chinese:  German: ; English: 

Now, if we impose an arbitrary order on the sequence of parameter values (for example, that given in (a‒d)), we can give the index of each grammatical system, as follows:49 () English: Italian: Chinese: German:

   

The correspondence problem can now be restated in a maximally general form as: for some proto-language P, what is P’s index regarding the properties listed in () or some comparable list of parameters? For illustration, let us briefly consider what these might have been for PIE. First, the null-subject parameter. As mentioned in §.., there is very little doubt that PIE was a null-subject language, almost certainly a CNSL. All the oldest attested daughters are CNSLs, and so we can simply assume that nothing changed in this regard. To the extent that CNSLs have ‘rich’ verbal inflection, the traditional idea that IE was a language with a rich verbal inflection (see Szemerényi : ; Fortson : , : ‒; Clackson : ‒; Lundquist and

48 This assumes that all parameters are set in any given system: ‘*’ is intended to indicate the unrecoverable parameter values in the sense of (). However, there is the further possibility that some parameters are neutralized by other parameter values (see §..). I will leave this complication aside here for expository simplicity. 49 In (), I have given the value for consistent null subjects in Chinese as  since Chinese is usually regarded as an RNSL in the sense of §.. (Huang , but see Li  on certain differences between Chinese and Japanese null arguments). On Chinese as a wh-in-situ language, see Huang (), §.. and Box .. See Huang (, ) for V-movement in Chinese, and Li (); Sybesma () for detailed discussion and analysis of Chinese word order. On Chinese syntax more generally, see in particular Huang, Li, and Li (); Huang, Li, and Simpson (); and Li, Simpson, and Tsai (). The values of the other parameters given here in English, Italian, and German were discussed in §., §.., and §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

Yates : ) supports this conclusion. We have already seen reasons to treat PIE as OV (see ()). So we can immediately state (): () PIE: ** Concerning V, there is some evidence for an optional verb-fronting rule in Indo-European (Keydana : ‒), and clear evidence— possibly the best-known fact about Indo-European syntax—that various ‘light’ elements, pronouns, particles, and light adverbs, were attracted to the second position (Wackernagel ; see also Fortson : –, : ‒, Clackson : ‒). If we identify this second position with (root) C, then we observe that PIE may have had a kind of category-neutral version of V (see Roberts b, forthcoming). This is easily expressible in the formal system assumed here: the EPP feature of C which attracts heads is simply not specified for V. So it is possible that PIE had the positive value for a parameter closely related to the V parameter. Finally, it is widely agreed that PIE had wh-movement in both interrogatives and relatives to a position in the left periphery of the clause (see Hale  on Sanskrit; Garrett  on Hittite; Kiparsky  on Archaic Latin and various older forms of Germanic; and Fortson : , : ‒, Keydana : ). If this is correct, then we can conclude, with the proviso that the ‘second-position’ parameter did not relate specifically to verbs but to heads more generally, that the ‘parametric index’ of PIE for the four parameters listed in () was (): ()  There is evidence that both Latin and Classical Greek had multiple wh-fronting (see §.. and §..), but it is unclear how widespread this was in the other older languages. Walkden (: ) surveys the distribution of verb-movement into the left periphery in all the oldest Germanic languages and arrives at the results shown in Table .. (Here ‘Run.’ refers to Runic Norse, inscriptions from Scandinavia from approximately ‒AD; see Faarlund , a,b.) The evidence from Runic is extremely scanty, and very little can be concluded from it except that there was some V-fronting (Runic was fairly clearly OV, see Faarlund : ). The notable thing about Old Norse is that it is the only variety to show V-fronting in all contexts, 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Table 4.2. Contexts of verb-movement into the left periphery in the oldest Germanic languages Yes/ noQ

whQ

Neginit

Imperative

Narrative Inversion

XPFronting

Matrix declaratives

Subordinate Clauses

Run. ?

?

?

Yes

?

Yes

Yes

?

ON

Yes

Yes

Yes

Yes

Yes

Yes

Yes

Yes

Goth Yes

Yes

Yes

Yes

Yes

yes*

No

No

OE

Yes

Yes

Yes

Yes

Yes

yes?

Yes

No

OHG Yes

Yes

Yes

Yes

Yes

Yes

Yes

No

* only where a definite subject has been fronted From Walkden (2009: 57).

including subordinate clauses. It appears to have been a ‘symmetrical’ V language, then, just like Modern Icelandic (see §..., note ). Although Modern Icelandic has no pronominal clitics or weak pronouns, there is some evidence that weak pronouns followed the verb in V clauses in Old Norse (Faarlund : ). It is plausible to think, then, that the Proto-Germanic starting point showed the following order (assuming the more complex left periphery introduced in §..., (); see also Walkden : ‒, ‒): () Topic > (V) > focus > (V) > clitic/topic (V) [TP . . . . (V)] There are three XP positions in the left periphery, and the verb may move to a position following each one. Such movement may have been optional at the earliest stages. Walkden’s proposals are an important contribution to the general enterprise of parameter-based syntactic reconstruction. Moreover, his conclusions regarding Proto Germanic fit well with what has been observed elsewhere in older Indo-European varieties. For example, Hale () gives the structure in () for the Vedic Sanskrit left periphery: () [TopP Top [CP

C [FocP

Foc

IP]]]

Hale shows that V-movement to C was an option here. This structure closely approximates the Force-Foc-Fin structure we have been assuming. Very similar structures have been proposed by Garrett () for Anatolian; Carnie, Pyatt, and Harley () and Newton () for Old 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

Irish; Salvi (), Devine and Stephens (), and Ledgeway () for Latin.50 Fortson (: –, : ‒) gives evidence from a range of archaic Indo-European languages (Hittite, Old Avestan, Greek, Latin, Armenian, Vedic, and Gothic) for ‘verb-topicalization’, wh-movement, a pre-wh topic position and second-position clitics. All of this points to a common structure for the left periphery along the lines of (): () [ForceP XP/V Force(EPP) [FocP wh Focwh,EPP [FinP Pronouns Fin . . . ]]] As we mentioned above, pronominal enclisis, with the pronoun typically in second position, is common across the older branches of IndoEuropean, as originally observed by Wackernagel (). The examples in () from Old Persian and Homeric Greek illustrate this. () a. Old Persian: pasāva=maiy Auramazdā upastām abara then=me. Ahura-Mazda- aid- he-brought ‘Then Ahura Mazda brought me aid.’ (XPh ; Clackson : ) b. (Homeric) Greek: autàr egṑ theós eimi, diamperès hḗ se  I- god- am, thoroughly who- you- phulássō I-protect ‘But I am a goddess, the one who protects you steadfastly.’ (Odyssey .‒; Clackson : ) Here we see cliticization to an adverbial sentential connective (‘then’) and to a relative pronoun respectively. Situating these elements in SpecForceP and SpecFocP respectively, we can place the pronouns in SpecFinP, right-adjacent in the string to the cliticization hosts. The cliticization patterns observed are thus consistent with the structure in (). Old Irish, like OE, OHG, and Gothic, gives evidence that negation can appear at the FocP level (either in Foc in SpecFocP), triggering enclisis:

Adams () is a detailed survey of ‘Wackernagel pronouns’ in Latin. His conclusion is that these pronouns are enclitic to the focused constituent on the left edge of the colon. This is clearly consistent with the general proposals in the text: Latin Foc attracts a variety of elements and the weak pronouns are in SpecFinP, in a position immediately right adjacent to the focused category (Foc is empty: Latin does not have a focus particle or other morphological mark of focus), in a position to feed PF enclisis. 50



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

() Maisse doíne ní=s toimled Glory of-men not=of-it he-partook ‘The glory of men, he did not partake of it.’ (Thes. II .; Clackson : )

[Old Irish]

Here, there is a dislocated XP (maisse doíne; note that the pronoun is a resumptive) in a higher Spec position, presumably SpecForceP. Additionally, Old Irish showed enclisis to what are traditionally called ‘preverbs’, i.e. aspectual markers of various kinds. There is a complex interaction between the enclitic pronoun, the preverb, and ‘conjunct particles’, a class which includes negation, question particles, and complementizers (elements naturally construed as associated with Foc and/or Force). The following examples illustrate this: () a. aton.cí (Preverb > clitic > verb) [Old Irish] +us.see. ‘He sees us’ b. Ní.m- accai (‘conjunct’ > clitic > preverb > verb) .me-see. ‘He doesn’t see me’ (See Carnie, Pyatt, and Harley ; Newton ). In (a) we see the preverb moving over the clitic with the verb following the clitic. We can account for this with the following structure: () [FocP/ForceP

Preverb [FinP clitics [Fin V] . . . ]]

So we can maintain that clitics/weak pronouns are attracted to SpecFinP. Taking the preverbs to be aspectual particles, incapable of bearing tense or agreement marking, and taking V to raise from T to Fin (V-to-T raising is justified by the ‘rich’ subject-agreement marking and the null-subject nature of Old Irish; see §..), we have a nested head-movement dependency here. This can be accounted for if Fin attracts T and T attracts V, hence the finite verb, while the preverb moves independently to satisfy an EPP-feature associated with either Force or Foc (cf. the discussion of V-movement to SpecForceP in Old Romance in §...). In (b) we see enclisis to negation, as in (), but with the preverb prefixed to the lexical verb. In this case and only this, the preverb incorporates with the verb and together these elements move to Fin. The optimal analysis would allow preverb-incorporation in all cases, but with excorporation of the preverb where it satisfies a 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

higher head’s EPP feature as in (a) (excorporation is allowed in the framework for the analysis of head-movement proposed in Roberts ). Gothic too had a class of preverbs, such as the aspect-marker ga. Elements like the interrogative marker -u attached to these (in addition to attaching to lexical verbs): () ga-u-laubjats? --believe. ‘Do you two believe?’ (M , ; Longobardi b: )

[Gothic]

The similarity to Old Irish here is striking. If the Q-marker occupies Foc, then the preverb is either SpecFocP or SpecForceP. Either way, it functions as a simultaneously minimal and maximal category satisfying an EPP-feature. Structures similar to those in () and (), showing the order preverb > clitic > verb are also found in Archaic Latin (see Vincent : ): () Sub vos placo . . .   place. ‘I implore you’

[Archaic Latin]

Similar examples are attested in Old Lithuanian (Eythórsson ). A structure of the general kind in () is thus supported across several branches of older Indo-European, with microparametric variation regarding the detailed featural make-up of Foc and Force. In particular, weak/clitic pronouns are consistently attracted to SpecFinP. Clackson (: ‒) makes very similar observations. He also points out that sentential enclitics in Sanskrit, Greek, and Latin occur further left (i.e. higher in the left periphery, in our terms) than pronominal enclitics, suggesting that the former are in Force or Foc with the pronouns in SpecFinP. Hittite, on the other hand, has only one position for all enclitics, which follows the focused elements—enclisis to topics is not found. In Hittite, enclitics cannot attach to ‘left detached’ (i.e. possibly ForceP-external elements, in any case clearly higher than any focus position) elements, neither can they be ‘delayed’, i.e. appear later than second (see Clackson’s table ., p. ). The following examples illustrate how the ‘particle chain’ follows the first constituent in Hittite: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE d Hantasepes andulisas harsa[r]=a () a. Harkanzi=ma=an They-hold== H-. man- head-=and GIS

SUKURA=ya spear-=and ‘The Hantasepa-divinities hold both human heads and lances.’ (StBoT ,,; Clackson : ) b. V SESMES=SU nu=smas ÉMES taggasta  brothers+his =them. houses he-assigned ‘As for his five brothers, he assigned them houses.’ (THeth  II ; Clackson : ) If the connective particles here are in Force, then we have evidence that Hittite Force had an EPP feature. The pronouns encliticize to that position. The more rigid ordering in Hittite as compared to the other languages is reminiscent of Wolfe’s (a,b) distinction between Fin-V and Force-V introduced in §... in that the former allows the verb to be ‘delayed’, i.e. third or later, while in the latter the verb is more rigidly in second position. It may be that the locus of encliticization in Hittite is always Force while in the other languages it may be various left-peripheral elements. This intuition would have to be formalized in terms of preferential targets of cliticization, a matter I will not go into here. We can conclude that there is support for a general structure of the left periphery of the kind seen in () across Indo-European, with microvariation among the daughter languages regarding the feature make-up of the various heads, and this in turn determines the different movement requirements observed. Hittite appears to show a different pattern, the more rigid second-position patterns suggesting Force is the principal attractor, although the precise determination and formalization of this is left open here. This picture is consistent with what we saw regarding second-position phenomena in Old Romance and Old English in §...; the structural template is the same, but there is microvariation in the featural properties of the heads making up the left periphery. So, although many questions of data and analysis remain open, this conclusion shows that reconstruction of some of the parameters of PIE is at least possible. In this way, a potentially interesting research agenda emerges, since all the issues raised above need to be clarified across the range of older Indo-European languages. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

4.4.5. The ‘pool of variants’ problem Let us now return to the ‘pool of variants’ problem. We saw in () that the forms for the expression of the future in various Romance languages and dialects show quite a range of formal options, and it is at first sight rather unclear which of these forms may represent the Latin original. However, our approach to markedness can point us in the right direction, even if it does not provide an exact conclusion. Let us assume that the representation of future concerns the realization of a tense or mood feature, associated with a head in the TP-field (see §...). The French, Italian, and Spanish forms in () all express this feature synthetically: they realize it by means of movement of the inflected verb into the functional system. Romanian and Sardinian, on the other hand, have periphrastic constructions in which it is plausible to think that the auxiliary is merged in the relevant functional position. If movement implies more formal features then, the periphrastic constructions represent a less marked option. The Southern Italian dialects in fact represent a still less marked option, in that there is no formal expression of the future feature at all: here there may in fact be no formal feature of the relevant kind in the grammar. Since these are the most conservative dialects, they may tell us more than the others. From this we can conclude that the original Latin future form, if it had one, has been lost; and of course we know that this is true. Here our approach is useful in a rather negative way, but it is possible to say a little more. If the External Merge (EM) option is less marked than the Internal Merge (IM) one (since IM requires more features it is more marked in terms of FE; see §.), we might conclude that the French, Italian, and Spanish forms derived from an option that was originally more similar to the Sardinian and Romanian one. Given the evidence that Romanian has undergone extensive influence from the neighbouring non-Romance Balkan languages, all of which have a future auxiliary diachonically derived from the verb ‘to want’, we can give preference to the Sardinian variant as the reconstructed form. Hence we could reconstruct an original periphrastic future formed with a ‘have’ auxiliary. All the future forms, whether synthetic or analytic, clearly preserve the verbal infinitive, and so we can think that the common ancestor forms consisted of a ‘have’ auxiliary combined with the infinitive. This is the correct reconstruction for Vulgar Latin (Harris (: –); 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Roberts and Roussou (: –) and the references given there).51 As we mentioned above, we could never reconstruct the Classical Latin -bo or -am forms, but we can see how our markedness-based approach can help with the pool-of-variants problem. 4.4.6. Parametric comparison Beginning with Longobardi (), Giuseppe Longobardi, Cristina Guardiano, Chiara Gianollo, and various collaborators have developed a systematic method of parameter-based grammatical comparison: the Parametric Comparison Method (PCM). For recent expositions of the approach, see Guardiano and Longobardi (), Longobardi and Guardiano (), and Ceolin et al. (). The PCM is ‘a novel tool for studying the historical evolution and classification of languages’ and ‘an innovative system for reconstructing language phylogenies’ (Longobardi and Guardiano : ). In particular, the PCM gives a concrete method for quantifying the degree of grammatical difference between, in principle, any given pair of languages. It is easy to observe that, while languages differ from one another in all aspects of their structure, some pairs of languages differ from each other more than others do: Spanish and Portuguese are very similar to each other indeed; English is quite similar to German and more similar to Dutch, while Japanese is significantly unlike all of these languages. Many, if not all, of these degrees of structural and lexical difference can be correlated to historical relationships. The PCM provides a systematic way of quantifying our intuitions about grammatical relatedness. The relevance of the method to historical syntax concerns the fact that if, as Guardiano and Longobardi (: ) claim, the method is more reliable than attempts to quantify lexical, morphological, and phonological distance (on these, see in particular the papers in Ringe, Warnow, and Taylor  (with a brief summary of this below), MacMahon and MacMahon , , MacMahon , Nakleh, Ringe, and Warnow , Nakleh, Warnow, Ringe, and Evans , Warnow, Ringe, and Warnow , and for an overview 51

This is not to imply that Sardinian preserves intact the Vulgar Latin form. A future of almost exactly this form is found at all attested stages of Sardinian, but, intriguingly, lacking the a element between aere (‘have’) and the infinitive at the earlier stages, which suggests it might be a syntactically rather different entity.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

of the computational methodologies, see Dunn ), then it may be possible to use parametric comparison to establish or confirm relationships among languages. The PCM has been restricted to a specific syntactic domain, on the grounds that this controls the number of variable properties and makes the cross-linguistic data more comparable and manageable. The domain is the DP. In §.. we observed that within the DP there is variation in Article-Noun order, Plural marker-Noun order, NounRelative Clause order, and Noun-Genitive order (see §.., (j– m)). Among the other properties that may vary are the presence/ absence of number marking (English: YES; Japanese: NO), the presence/absence of a system of articles (English: YES; Japanese: NO), and the presence/absence of a system of classifiers (English: NO; Japanese: YES). An actual example of a DP-NP parameter, adapted from Longobardi et al. (: ) is the following: () P: NP over D separates languages in which most elements normally associated with the D-area, such as ‘articles’ or, in some languages, demonstratives and numerals, surface phraseinitially in the DP (e.g. Indo-European languages) from languages wherein they occur in absolute phrase-final position (e.g. Basque); this is taken to be a signal that the whole complement of D raises to some position to the left of D. The description of each parameter indexes the parameter with a number (here P), a three-letter abbreviation (P is NOD), a brief characterization of its empirical scope along the lines of (), and its preconditions and its implications. Ceolin et al. () define ninetyfour parameters in sixty-nine languages in the context of the PCM, all restricted to the nominal domain. The parameters are organized into a table illustrating the values of each parameter in a range of languages (in Ceolin et al.  the total is sixty-nine languages). Table . presents a sample table of twenty parameters in forty languages from Guardiano and Longobardi (: ‒, figure .). The first column simply assigns an arbitrary number to each parameter. The second column gives the parameter’s threeletter abbreviation. The third column gives the parameter’s name and the conditions which must hold for the parameter to be active. From the fifth column on, each column represents the set of parameter values in a given language. Each row corresponds to a parameter value. Each 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

cell contains ‘+’, for a positive value of a parameter, ‘’ for a negative value or ‘’, to indicate that a parameter is neutralized by the value of some other parameter or combination of parameters, as specified in the third column. The interactions among parameters as expressed by the occurrence of  in the cells is very important. As Guardiano and Longobardi (: ) say, a  appears in a cell: a. as a consequence of interactions with parameter m, no construction which could trigger parameter n is ever manifested in the weakly generated language; b. the interaction between m and n necessarily produces surface consequences which are indeed identical to those obtained when n is set to + or to -, respectively.

Guardiano and Longobardi go on to stress the importance of the large amount of ’s in Table ., attesting to the richness of the implicational relations among parameters. For example, parameter , DGR or grammaticalized article governs the requirement that the definite reading of a nominal argument be marked overtly. In languages with a definite article, such as the Romance languages, this parameter is set to +; instead, it is set to in, say, Russian or Hindi. Now, the value for parameter  (DGR) neutralizes, or contributes to neutralizing, the relevance of  other parametric choices, and indirectly (i.e., by transitivity) that of  additional parametric choices. (Guardiano and Longobardi : )

Parameter hierarchies of the kind discussed in §. also encode implicational relations; effectively, in each hierarchy the one value selected by a grammar neutralizes all the others. Converting parameter hierarchies into a format like Table ., or vice versa, may be an important future task for parameter theory. Guardiano and Longobardi (: ‒) suggest that the implicational relations may be more complex than this; they allow for any kind of Boolean interaction ‘representing implicational conditions either as simple states . . . of another parameter, or as conjunctions . . . , disjunctions, . . . or negation thereof ’ (). It remains an open empirical question which is the correct view of parameter interactions, or whether one can be reduced to the other. A table like Table . gives an enormous amount of information regarding the DP-internal syntax of the languages in question, far more than can be summarized in the space available here. The important point for present purposes, however, concerns the range of similarities and differences in parameter values that we find in grids like 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

this. The most important point from a diachronic perspective is summed up by the following quotation: / has no probative value at all. However, . . . it is easy to increase the number of comparanda:  binary independent parameters generate 30 languages = ,,,. Now the probability for two languages to coincide in the values of  chosen parameters =  in 30, of three languages  in (30)2, i.e. less than one in a billion of billions. But even choosing just  such parameters, the probability that their values coincide in three languages is less than one in a million. (Guardiano and Longobardi : )

Hence, if we can observe significant coincidences in parameter values across languages, then we are very likely to be observing some nonaccidental relationship. The greater the number of parameters that coincide in value and the greater the number of languages in which we observe the coincidence, the more likely it is that we are observing a genuine relationship. Guardiano and Longobardi argue that it is overwhelmingly likely to be the case that the relationships so observed are due to common historical ancestry rather than to contact, although the latter possibility cannot be completely excluded (for discussion of ‘horizontal transmission’, see Guardiano and Longobardi :  and the references given there, in particular Guardiano et al. ). In other words, parametric similarities reflect retention of parameter settings across generations, i.e. successive convergence over many generations by acquirers (see Longobardi and Guardiano : ). Given the discussion of diachronic stability and parameter types in §.., we expect macroparameters to be the most conserved, with meso- and microparameters proportionately less conserved. One might conclude that macroparameters are therefore the most informative for historical relations, but in fact they may be too slow-changing for certain purposes, and so meso- or microparameters may be more useful (Barbançon et al.  make the same point about changes in general). Guardiano and Longobardi also observe that ‘different parametric changes can non-accidentally trace splits of variable historical depth’ (). The majority of the parameters investigated by the PCM are arguably mesoparameters, as pointed out by Guardiano and Longobardi (: ). Their parameter , which, as mentioned above, determines the grammaticalization of definiteness and is very rich in implicational consequences, may be a macroparameter (or perhaps the reflex of a deeper macroparameter connected to the grammaticalization 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Table 4.3. Parameter grid for nominal syntax

From Guardiano and Longobardi (: ‒).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

of Person; see Longobardi ; Richards ; and Roberts a: , , f.); if the proposal in §.. () that the ‘richness’ of D, which can be instantiated by an article system, plays a role in determining the type of null subjects a language allows, is correct then we see that this parameter may also be relevant at the clausal level and may have numerous further implications. Evolutionary biologists have developed computational methods for constructing phylogenetic relationships on the basis of data regarding shared characteristics. The programs that construct such relationships are, naturally, indifferent to whether the data they are organizing concerns shared linguistic traits or shared genetic features (both can be regarded as polymorphisms). Details regarding some of these programs are given in MacMahon and MacMahon (: –); see also Dunn () and the references given there for an overview of the techniques and issues in the area of computational phylogeny. Such programs can present the phylogenies they construct as trees, similar to those familiar in historical linguistics. Figure . gives the sample parametric distances among the forty languages shown in Table . and Figure . gives the tree of relations generated from this data by Kitsch, one of the PHYLIP programs developed by Felsenstein (). The tree is rooted by convention, and so all the languages appear to be related. What is more relevant is what we find on the lower branches. Here we see that the principal language families are correctly grouped together: the Semitic, Germanic, Romance, Slavonic, Sinitic, Uralic, and Turkic languages are all correctly grouped together to the exclusion of other languages. The outliers are Japanese, which is generally regarded as an isolate, Wolof, which is a member of the very large Niger-Congo family, but is the only language of that family in the sample, and Kuikúro, a Carib language of Central Brazil, similarly the only representative of its family in the sample. Also, very closely related languages form subgroups together, e.g. the two varieties of Greek, Danish with Norwegian, and Italian with Sicilian. There are some errors in relation to traditional classifications (based on comparative reconstruction) however: notably Farsi is not grouped with the other Indo-European languages. (The closest-related language in the sample is Pashto, and this is correctly grouped with the other Indo-Iranian languages Marathi and Hindi.) But, by and large, the groupings replicate the known historical relationships among the languages in question. Of course, these groupings are well-known as the historical relations among the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

Figure 4.5

RECONSTRUCTION

Sample parametric distances based on the data in Table .

From Guardiano and Longobardi (: ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

Spanish Portuguese French Sicilian Italian Romanian Danish Norwegian Icelandic English German Serbo-Croatian Slovenian Polish Russian Bulgarian Greek (standard) Cypriot Greek Marathi Hindi Pashto Irish Welsh Estonian Finnish Hungarian Turkish Buryat Basque (central) Basque (western) Farsi Inuktitut Arabic Hebrew Cantonese Mandarin Kadiweu Japanese Wolof Kuikuro

Figure 4.6

Result of application of KITSCH to the data in Table .

From Guardiano and Longobardi (: ).

languages considered, and so nothing new is being discovered here. However, the fact that the parametric comparison method can independently and automatically reveal the correct historical relationships provides an important indication that it is a valid method for detecting historical relationships. Guardiano and Longobardi (: ) further argue that an approach to measuring relatedness which relies on parametric syntax has certain advantages over an approach based on lexical similarities. The first of these is that parameters are less likely to be borrowed than 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

lexical items. Of course, we have seen cases of syntactic borrowing in §.., and we will turn to the general question of borrowing in §.. Nonetheless, as already mentioned, where two languages share a parameter value the overwhelming likelihood is that this is due to sharing a common ancestor rather than horizontal transmission of some kind. Second, parameters are unconsciously set early in life in language acquisition (see §.); it is highly unlikely that conscious individual choice can play a role in setting parameters in either childhood or adulthood; in this parameters differ from ordinary lexical items. Third, thanks to Inertia, parameters tend to be stable; again, we discussed the relative stability of different parameter types in §.., and this may be significant for PCM as we saw above. In this connection, it is worth comparing the PCM with the research reported in Ringe, Warnow, and Taylor (). These authors used similar techniques to PCM to try to identify the first-order subgrouping of the Indo-European languages. A central concept in this connection is that of a character, which they define as ‘an identifiable point of grammar or lexical meaning which evolves formally over the course of the language family’s development, . . . each state of the character ought to represent an identifiable unique historical stage of development—a true homology [shared trait inherited from a common ancestor—IGR]’ (). An example of a lexical character would be English hand (=), Ger Hand (=), Fr main (=), It mano (=), Rus ruká (=); the numbers are arbitrary values designating each state of the character. An example of a phonological character would be the sequence of changes including Grimm’s Law, Verner’s Law, initialsyllable stress, merger of unstressed *e with *i except before *r. If these changes are absent, the character has value , if they are present, it has value . Unsurprisingly, this character singles out Germanic among the subgroups of Indo-European. Ringe, Warnow, and Taylor used a database of twenty-four languages representing all ten Indo-European subgroups, and compared  characters (twenty-two of them phonological, fifteen morphological, and the rest lexical). The result of running the tree-optimization software was that eighteen characters were incompatible with the best tree. As they put it: ‘in computational terms our result is a total failure’ (). However, since fourteen of the eighteen incompatibilities involved Germanic, they simply removed the Germanic languages from the comparison. This gave a more satisfactory result, and one according 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

to which Italo-Celtic, Balto-Slavonic, the satem group, and GraecoArmenian emerge as Indo-European subgroups. Nakleh, Ringe, and Warnow () developed a system of phylogenetic networks, rather than trees, in an attempt to reconstruct contact relationships in addition to genetic inheritance relationships. In this system, they were able to include Germanic, postulating ancient contact with Balto-Slavonic and Italo-Celtic. Given Guardiano and Longobardi’s () observations regarding the advantages of PCM over lexical, phonological, or morphological comparison, a natural proposal would be to combine the two approaches, and treat parameter values as characters, thereby adding syntax to the structural comparison.52 In addition to the DP-related parameters that have been the focus of the PCM, one could readily consider several of the clause-level parameters we have been looking at throughout this book as characters (for example, second-position phenomena, wh-movement, null subjects, the presence of clausal particles such as question particles, OV vs. VO order and the associated properties, the presence or absence of pronominal clitics, auxiliaries, etc.); most of these are relevant to the older Indo-European languages. For example, alongside () we could formulate a clause-level parameter such as () (here the number  and the abbreviation TOC are arbitrarily assigned to this parameter for purposes of illustration): () P: TP over C (TOC) separates languages in which most elements normally associated with the C-area, such as complementizers or, in some languages, question particles or other clause-type markers, surface phrase-initially in the CP from languages wherein they occur in absolute clause-final position; this is taken to be a signal that the whole complement of C raises to some position to the left of C. In this way, syntactic properties could play a real role in reconstructing the relationships among the Indo-European languages. However, Ringe, Warnow, and Taylor () observe that there are two things which must be avoided if the phylogenetic method they adopt 52 Ceolin, Guardiano, Irimia, and Longobardi (: ‒) report the results of a comparison of an application of PCM to  languages from a wide range of families with the comparison of phonemic inventories in Creanza, Ruhlen, Pemberton, Rosenberg, Feldman, and Ramachandran (). They conclude that ‘phonemic data exhibit a much shallower historical signal than syntactic data’ ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

RECONSTRUCTION

is to work. The first of these is what they call ‘backmutation’, the case where a change takes place in one direction, followed, at some later stage in the development of the family, by its exact reversal. In the areas of the lexicon, morphology, and phonology, they observe, backmutation is ‘either improbable or vanishly rare’ (). They say that ‘we simply do not find cases in which the contrast between two elements A and B in a structured system is eliminated from the language, then . . . reintroduced in precisely the same distribution that it originally exhibited’ (). This seems to be clearly true of such changes as phonemic split and merger, the loss or gain of inflection, and changes in word-meaning. However, we have seen that it is probably not true of syntactic parameters. (Recall the discussion of the null-subject parameter in the history of French and various Northern Italian dialects in §...) However, we may also observe that the parametric system as a whole does not undergo backmutation. To avoid this problem, it may be necessary to consider sets of parameters as characters (just as Ringe, Warnow, and Taylor  propose treating sets of sound changes as phonological characters). The second thing which must be eliminated in order for phylogenies to be computed, according to Ringe, Warnow, and Taylor (), is parallel development (in genetic terms, this is analogy rather than homology). As we have seen, there may be parallel syntactic development in both the Romance languages and at least a subset of the older Indo-European languages. One way to avoid this problem might again be to take sets of changes rather than individual changes as evidence for groups of related languages (or clades). In any case, it seems doubtful that syntactic change poses any problems not already encountered, perhaps in a slightly different form, in the area of phonology. A further obvious difficulty in using syntactic parameter values as characters is that much less is known about the syntax of a number of older IE languages compared to their phonology, lexicon, and morphology. However, here there is no issue of principle (beyond the usual difficulties of working with extinct languages): we simply need to apply the analytic, descriptive techniques of syntactic theory to the analysis of the older languages, as has to some extent already been done by Garrett (), Hale (), Kiparsky (); this has been done with great success in the analysis of various stages of Latin in recent years: see in particular Salvi (); Devine and Stephens (); Ledgeway (); and Danckaert (, ). There is no deep reason why the syntax of the older Indo-European languages should not be as well-known as 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

their phonology and morphology, opening up the possibility of syntactic reconstructions of the parent language that have as much validity as phonological or morphological reconstructions. In general, then, PCM can give us a way to quantify grammatical differences. As such, it may play a major role in developing our theories of language typology and language change. It may also shed light on major questions in historical linguistics. In this connection, Longobardi (: ), in an early exposition of PCM, observes that we can pose ourselves five fundamental questions in linguistic theory, as follows: () a. b. c. d. e.

What are the recorded samples of human linguistic behaviour? What are the actual human languages? What are the biologically possible human languages? Why do we have precisely these actual languages? Why do we have precisely these biologically possible human languages?

The answers to these questions correspond to different levels of adequacy in linguistic theory. The first three correspond to observational, descriptive, and explanatory adequacy as defined in Chomsky (). The fourth, Longobardi calls ‘actual historical adequacy’, while the fifth relates directly to the goals of the Minimalist Programme as described in Chomsky (, , , ). Concerning historical adequacy, Longobardi points out that ‘one of the best examples of historical explanation . . . in the human sciences has been provided by the historical-comparative paradigm in linguistics, where reconstructed protolanguages play the role of initial conditions deriving observed languages in conjunction with some general hypotheses about possible linguistic changes’ (: ). He further urges that we should take advantage of the combined insights of the two major scientific revolutions in linguistics, those which gave rise respectively to the historical-comparative paradigm during the XIX century and the ‘synchronic-cognitive’ paradigm in the XX. It is such a combination that may yield substance to a good deal of the historical-explanatory program. (: )

It is in this way that the parametric approach may take its place in the long tradition of historical and comparative linguistics. More generally, Ceolin et al. (: ) conclude that ‘the syntactic structures of I-languages . . . are an effective tool of historical knowledge’. This is clearly consistent with the overall goal of the present book. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONCLUSION

4.4.7. Conclusion In this section, I have considered the possibility of using a parametric system for syntactic reconstruction. It is possible to deal with the three problems identified by Vincent and Roberts () in terms of the general approach to change we have been advocating throughout. We can also define a notion of proto-grammar in parametric terms, following Hale (), and provide tentative but fairly detailed reconstructions of certain specific parameters of PIE; as we saw, this was achieved for Proto Germanic by Walkden (, ). There is no reason not to take a realist view of the partial reconstructions offered here. I conclude from this that syntactic reconstruction is possible, even in a parameter-based reanalysis-driven approach to change (pace Lightfoot , , ). Moreover, we have seen how the PCM may offer the possibility of confirming and establishing historical relationships among languages. 4.5. Conclusion As I said at the beginning of this chapter, the purpose of the discussion here has been to consider to what extent the general model for syntactic change developed in Chapter  poses problems for, is compatible with, or is even useful or insightful for a number of questions concerning what can be rather generally thought of as the dynamics of syntactic change. These questions mainly concern ‘external’ aspects of change, and how these appear to be incompatible with the kind of ‘internalist’ account of change that was developed in the preceding chapters. To a degree, this distinction reflects the distinction between E-language and I-language, originally made by Chomsky () and presented in the Introduction. Our account of syntactic change is focused on I-language, but of course all actual records of language change are manifested in E-language, usually as texts. This discrepancy gives rise to some of the apparent incompatibilities between our view of change and aspects of the historical record, notably the fact that recorded changes typically appear to be gradual. Here I have tried to isolate and explain those incompatibilities. The general goal of this is to provide support for the internalist, I-language approach to syntactic change. What has emerged is that the parametric approach does not really pose any special problems for reconciling the external aspects of change 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

with what I take to be the necessarily internalist account of how systems change. Concerning the question of reconciling the apparently gradual nature of change with the fact that parametric change must be seen as instantaneous, we observed that several factors may ‘cushion’ the effects of a parametric change: sociolinguistic factors relating to diffusion, nano- and micro-parametric change and variation, and the possibility of competing grammars and/or true formal optionality. In §., I looked in some detail at WLH’s discussion of sociolinguistic factors in change, and how notions like diglossia, social stratification, and ‘ordered heterogeneity’ might be compatible with a parametric approach. We saw how integrating these aspects into a parametric approach would in principle not be problematic, and how Kroch’s notion of grammar competition may or may not be helpful here. In §., I argued, pace Lightfoot (, : –) that ‘parametric drift’ probably does exist and suggested how it can be integrated into our theory. A clearer understanding of markedness along the lines of §. should provide an account of this phenomenon. Finally, in §., I argued that syntactic reconstruction using parameter values as the point of correspondence is possible and may be desirable; again, this is contrary to what was argued in Lightfoot (, , ). In general, a parametric approach to language change does not really raise any special problems, or make any special predictions, as far as issues of the type considered here are concerned. The main contribution that a parametric approach has to offer to historical work does not concern the nature of change, since, as Hale () and Longobardi () both point out in different ways, parametric models are inherently atemporal and change is of course a temporal notion. Instead, what a parametric approach has to offer is a firm analytical foundation for the analysis and comparison of grammatical systems; it is a precise, rigorous, and flexible formal system for representing the knowledge and acquisition of syntax in the individual, and for representing the dimensions along which that knowledge can vary across individuals or aggregates of individuals. It provides an essential basis for the analysis of syntactic change, while at the same time not prejudging any of the more difficult, external questions which have been raised in this chapter. In the final chapter, I turn to the most complex ‘external’ question of all, that of how language contact should be thought of in parametric terms, and to what extent it is responsible for observed syntactic changes. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

Further reading Historical and Indo-European linguistics Campbell () is a very good general introduction to historical linguistics, with the emphasis on phonological and morphological change. MacMahon () is a very useful general introduction to historical linguistics, with a good balance of discussion of syntactic, morphological, and phonological change, as well as some intriguing discussion of the historical connections between the study of language change and the study of evolutionary change in organisms/species. Durie and Ross () is a collection of articles on the nature and status of the methodology of reconstruction and its usefulness in historical linguistics. Lightfoot () is a rebuttal of the claims regarding syntactic reconstruction made in Harris and Campbell () and, to a lesser extent, Roberts (). The latter is a review of Harris and Campbell () (see the Further reading section in the Introduction), in which an early version of the idea discussed in §.. that parameters can solve the correspondence problem for reconstruction is put forward. Vincent and Roberts () argues, in agreement with Harris and Campbell () but on different grounds, in favour of syntactic reconstruction. The discussion of the ‘pool-of-variants’ issue in §.. relies heavily on this paper. Guardiano and Longobardi () and Longobardi and Guardiano () are the most detailed presentations of PCM to date. Earlier papers exploring the same idea are Longobardi (), Guardiano and Longobardi (, ), Gianollo, Guardiano, and Longobardi (, ). MacMahon (), MacMahon and MacMahon (, ), Nakleh, Ringe, and Warnow (), Nakleh et al. (), and Ringe, Warnow, and Taylor () pursue the question of quantifying relatedness using lexical and, in the case of the latter three articles, phonological and morphological information. Nakleh, Ringe, and Warnow () develops Ringe, Warnow, and Taylor () in considering phylogenetic networks rather than simply trees, in an attempt to reconstruct contact relations. These two papers arrive at a very interesting reconstruction of the first-order relations among the Indo-European languages. Hale () is a general article on the generative approach to diachronic syntax. A number of very useful conceptual clarifications are made, notably regarding the atemporal nature of grammars. Benveniste () is a classic study of the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

expression of possession in relation to perfectivity. Although the focus is on Classical Armenian, a number of important typological observations are made regarding the cross-linguistic incidence of ‘have’-like verbs and auxiliaries. Denison () is a very useful critical summary of the explanatory role of the S-curve in historical linguistics. Ellegård () is a classic study of the incidence of do in fifteenth-, sixteenth-, and seventeenth-century English, whose results are used extensively by Kroch (), and rather less extensively by Roberts (, a). Horrocks () is a wide-ranging and useful account of the history of all aspects of the structure of Greek from the earliest times up to Modern Greek. Clackson and Horrocks () is an analogous account of the history of Latin. Jespersen (–) is a monumental study of the historical syntax of English, containing invaluable data and discussions of an extremely wide range of phenomena. Roberts and Kato () is a collection of articles dealing with the diachronic syntax of Brazilian Portuguese, many of which focus on the question of the apparently changing status of null subjects in contemporary Brazilian Portuguese; Galves, Kato, and Roberts () is a recent collection, updating and revisiting many of the same issues. Penny () is arguably the main history of Spanish written in English in recent years. Pintzuk (, a,b) continues the research in Pintzuk (, ) (see the Further reading section in Chapter ) in looking at word-order variation and change in the history of English in terms of grammars in competition. Pintzuk and Taylor () is an overview of this approach to word-order change. Taylor and Pintzuk (, a, b,c, , ) continue this line of research, but take aspects of information structure systematically into account. All of these articles look in detail at different types of OV constructions in OE and ME. Biberauer and Roberts () is an extension of the earlier work on word-order change by these authors (see the Further reading section in Chapter ), which relates the ME word-order changes to the changes in Late ME and ENE discussed in §... Santorini (, , ) deals with word-order change in the history of Yiddish, arguing for a competing-grammars approach to these phenomena. Fontana () is a detailed study of word order in Old Spanish, in which it is argued that this language, like a number of other Romance languages, went through a V stage. Wolfe (a,b) are detailed and thorough overviews of V across Medieval Romance. Lakoff () looks at the development of the expression of modality and sentential 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

complementation, concentrating primarily on Latin. It is a rare example of work on syntactic change in the generative-semantics framework. Mathieu and Sitaridou () looks at the ‘hyperbaton’ construction of Classical Greek, arguing that it is a case of general leftbranch extraction. Brugmann and Delbrück (–) is a reprint of the classic summary of the state of the art in Indo-European linguistics at the end of the nineteenth century. Wackernagel () is a classic study of one aspect of the syntax of the Indo-European languages, the central observation of which is that in many of the older Indo-European languages various ‘light’ elements (pronouns, light adverbs, and sentential particles) show a strong tendency to appear in the second position in the clause. Wackernagel’s observations are still highly relevant to the study of the left periphery of the clause, and remain largely unexplained in satisfactory theoretical terms. Watkins () is a very interesting discussion of the possibility of reconstructing Indo-European syntax, featuring a very intriguing proposal regarding relative clauses. Hale () looks at word order in Sanskrit in government-and-binding terms, arguing that this language has a fixed clause structure in which topics precede foci, both of which precede the ‘core’ of the clause (TP in the terminology used here). This analysis has been influential and has been applied to a number of other older IndoEuropean languages—see Fortson (: –, : ‒) for discussion. Szemerényi () is another very thorough introduction to Indo-European linguistics. Clackson () and Fortson () are recent and very useful textbooks on Indo-European. Each devotes a whole chapter to syntax. Klein, Joseph, and Fritz (‒) is a monumental three-volume handbook summarizing the state of the art in all aspects of Indo-European linguistics. Morphology and phonology Aronoff () is an important work on morphology in generative grammar, which is in many respects still relevant to current theoretical concerns. Spencer () is a very thorough study of pre-optimality morphological theory. Zwicky and Pullum () put forward an influential account of the differences between cliticization and affixation. Some of the ideas put forward here are taken up in Spencer (: –). Halle () is one of the first articles to look at language change from a generative perspective, proposing an approach 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

THE DYNAMICS OF SYNTACTIC CHANGE

to sound change in terms of an early conception of generative phonology. Abductive reanalysis through language acquisition plays a central role in this approach. Kiparsky () is a recent discussion of phonological change. Italian dialects Cocchi () is an early treatment of auxiliary selection in Central and Southern Italian dialects, in terms of an early version of minimalism. Loporcaro () is a very detailed study of past-participle agreement in Romance, from both a synchronic and a diachronic point of view, using the theoretical framework of relational grammar. Manzini and Savoia () is a monumental study of the phonology, morphology, and syntax of Italian and Rhaeto-Romance dialects. Tuttle () is a diachronic study of the person-driven auxiliary systems of the CentralSouthern Italian dialects, while Vincent () looks at the general development of the perfect periphrasis from Latin to Romance. Ledgeway (forthcoming) is a general overview of auxiliary selection in Romance, which adopts a parameter hierarchy to describe the variation. Legendre () is a survey of auxiliary-selection patterns in Central and Southern Italo-Romance, using the framework of optimality theory. D’Alessandro and Roberts () analyses auxiliary selection and past-participle agreement in the Central Italo-Romance Eastern Abruzzese variety Ariellese. D’Alessandro, Ledgeway and Roberts () and Benincà, Ledgeway and Vincent () are both collections on various aspects of the syntax of Italian dialects. Ledgeway and Maiden () is a monumental survey of all aspects, both synchronic and diachronic, of all the Romance languages, including much material on the dialects of Italy. Sociolinguistics Weinreich, Labov, and Herzog () is the foundational text for the sociolinguistically oriented study of language change, particularly sound change in progress. Many of the ideas first put forward here have been developed in greater detail in Labov (, , ). The article is most relevant and useful in demonstrating how any general theory of the structure of language must take ‘external’ factors into account if a full picture of how historical changes take place is to be 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

FURTHER READING

obtained. Labov () is a seminal study in sociolinguistics, looking at the incidence of postvocalic /r/ in the English of New York City, in which it is shown that this element functions as a sociolinguistic variable, in that its incidence correlates with the age, gender, and social class, of speakers. Furthermore, the variation across age-groups, classes, and genders indicates that this variety of English is moving from lacking postvocalic /r/ to having it. Finally, the incidence of postvocalic /r/ in this variety shows it to be a prestige form for many groups of speakers. Labov () is a collection of important articles, several of them on Black English Vernacular (the English of the Afro-American population of the United States, now usually referred to as AfricanAmerican Vernacular English). These include a seminal article on negative concord in this variety, which is briefly summarized in §... Labov (, , ) is a three-volume study of the nature of language change, with the emphasis primarily on sound change, and which brings together many of Labov’s concerns going back to his earliest work in the s. Muysken () is a detailed study of code-switching (or code-mixing), with emphasis on the widespread existence of subsentential code-switching.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

5 Contact, creoles, and change



Introduction 5.1. Second-language acquisition, interlanguage, and syntactic change



5.2. Contact and substrata



5.3. Creoles and creolization



5.4. Language creation in Nicaragua



5.5. Conclusion

 

Further reading

Introduction In the previous chapters, I have argued for the existence of parametric change in the historical record (Chapter ), for its general utility in accounting for types of syntactic change (Chapter ), and for an acquisition-driven model of parameter-setting and change (Chapter ). In Chapter , the discussion centred on how this approach to syntactic change may or may not shed light on a number of issues connected to the dynamics of change, concluding that the parametric approach has little that is special to contribute to these questions per se, but that the interest of the approach is that it provides a formally rigorous yet flexible analytical tool for comparative syntax. Moreover, in being unambiguously an I-language approach, it is possible to understand, maybe even resolve, some of the traditional paradoxes of historical linguistics in terms of the distinction between I- and E-language. The purpose of this final chapter is somewhat different: the goal here is to discuss second-language learning and acquisition as factors in syntactic change and, more generally, issues related to the phenomenon 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

INTRODUCTION

of language contact, and how this may affect the primary linguistic data (PLD) in a way that causes parametric change. We discussed contact in relation to parametric change in §.., where we introduced the framework of Van Coetsem (, ) for analysing contact-driven syntactic change and in particular the distinction between a Source Language (SL) and a Receiving Language (RL). However, the discussion there was only concerned with illustrating how the PLD could change in such a way that the Regress Problem could be solved. In particular, we did not discuss the nature and role of ‘imperfect learning’ of a second language by adult learners; this and related issues, including the nature of substrate effects and the question of the status of creoles are the main issues we will look at in this chapter. §. looks at interlanguage and second-language acquisition (LA). In §., I consider contact and substratum phenomena. §. considers creoles and creolization. Here we consider two approaches to creoles: one is based on the idea that Bickerton’s (, ) Language Bioprogram causes all parameters to be set to unmarked values, while the other is based on the idea that creoles show radical relexification (usually by a European language) of a substrate system which retains syntactic features of the original language or languages of the creolespeaking group. The interest of creoles lies in the fact that children whose PLD consists largely of pidgin would be faced with a particularly extreme instance of an impoverished and/or unstable stimulus for acquisition, and so how they cope with this may be of great interest for our general understanding of how children handle underdetermined input, which I take to be a feature of all first-language acquisition (LA). Against this background, I will also consider DeGraff ’s arguments against what he calls ‘creole exceptionalism’ (see DeGraff , , , ), focusing in particular on Aboh and deGraff (). In §., we look at a genuinely ‘exceptional’ situation, and what might be the most spectacular example of impoverished PLD ever documented. This is the case of a group of deaf children in Nicaragua who apparently invented their own sign language over a period of a few years, documented by Kegl, Senghas, and Coppola (; henceforth KSC) and further discussed in Kegl (). The structural properties and the ‘history’, insofar as it can be traced, of this language are clearly of the highest theoretical interest. In a similar vein, we will briefly consider recent research on Al-Sayyid Bedouin Sign Language (ABSL), a signed language which seems to have emerged in the last 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

seventy-five years (Meir et al. ). We will consider what conclusions may be drawn for our approach to syntactic change and to linguistic theory more generally from the evidence these languages provide.

5.1. Second-language acquisition, interlanguage, and syntactic change1 The topic of the acquisition of a non-native second (or third, etc.) language has given rise to a fairly significant research literature: the references given in Hawkins (, ), White (), and Schwartz and Sprouse () attest to this; see also the handbooks edited by Doughty and Long (), Ritchie and Bhatia (), Gass and Mackey (), Herschensohn and Young-Scholten (). In this section, I do not propose to provide an introduction to this field; Hawkins and White do this. Instead, I want to present an overview of the issues which are relevant to understanding the phenomena of language contact and substrate phenomena in the diachronic domain, which are the subject matter of the next subsection. To some extent, issues related to LA are also relevant to creoles and creolization. More generally, LA is relevant to syntactic change since, as we have mentioned, imperfect learning of a second language by adults in a contact situation may affect the PLD of a subsequent generation and lead to contact effects of various kinds. We noted in Chapter  that this is one way in which the Regress Problem can be solved. So our goal here is to look at what the concept of ‘imperfect learning’ really means.2 1 I am grateful to Bonnie Schwartz and Rex Sprouse for very helpful comments on this section. 2 The idea that ‘imperfect learning’ may be linked to complexity was developed in relation to sociolinguistics and language change by Trudgill (). The core idea is that complexity can be defined in relation to late L acquirers and that change occurs through simplification as a result of sociolinguistic situations involving substantial late L acquisition. I will not explore Trudgill’s approach here because (a) it has been largely limited to phonological and morphological change, (b) while both simplicity/complexity and sociolinguistic factors are highly relevant for understanding change, including syntactic change, as we have seen in the previous two chapters, there is no reason to conflate or combine them, (c) there are no good grounds to question the idea, originally due to Hockett (), that languages do not vary in simplicity/complexity in some (rather ill-defined) holistic sense. Furthermore, pace Walkden and Breitbarth (), Trudgill’s approach does not seem necessary to account for changes involving Jespersen’s Cycle (see §.) or null subjects (see §..‒). For a view of complexity in terms of parameter hierarchies, see Biberauer, Holmberg, Roberts, and Sheehan ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

SECOND LANGUAGE ACQUISITION, INTERLANGUAGE

In §.., we saw how LA is conceptualized in terms of current syntactic theory. There we saw that LA is the process by which the language faculty passes from the state S, corresponding to Universal Grammar (UG), to the stable state SS of adult competence. Schematically, LA falls into several stages , . . . , Sn>i, Sn+. We suggested that the intermediate stages correspond to an immature competence, in which the process of fixing the parameters of the grammar to an approximation of the grammar underlying the PLD is ongoing. LA differs from LA in a number of respects. In fact, we can observe differences in all three aspects of the acquisition process: the initial state, the intermediate states, and the final state. The initial state differs from that of LA by definition: the L acquirer has already acquired the L grammar. One might therefore expect that this grammar influences the L grammar and the process of L acquisition in various ways. The extent to which this is true is one of the central points which is debated in the research literature on LA. Whatever the correct point of view on this matter, we can see that there is a clear difference with L acquisition. The intermediate states of L acquisition differ in principle from those of L in up to three main ways. First, they may be affected by the L grammar, just as the initial state may be. Second, the course of L acquisition may not be the same as that of L acquisition; this follows from the first point to the extent that the L and the Target Language (TL) for LA differ. Third, since it seems to be a matter of casual observation that L acquisition is very frequently imperfect, it may be that the acquisition process ‘fossilizes’ at an intermediate state; an intermediate state may thus amount to a final state. To this one can also add that the time course of L acquisition may vary much more than that of L. Whilst it is usually said that L acquisition begins very early in life (see Guasti : f.) and is largely complete by the age of  or so, L acquisition can be a much more protracted affair, with even highly accomplished L speakers only approximating true TL competence after many years of practice in and exposure to the TL. This is the question of ‘ultimate attainment’; see also Lardiere (, , ; and the discussion of Coppieters  and Birdsong  in White : –). Finally, the final state of L acquisition very often fails to correspond to the target. Although one of the central ideas of this book is that this 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

may also be true of L acquisition, and this is what underlies much syntactic change, it is clear that L acquisition usually approximates the target very closely, so much so that much work on L acquisition assumes perfect convergence, and the Inertia Principle can be seen as reflecting this. But the outcome of L acquisition is very often conspicuously divergent in relation to the target. In fact, explaining this is one of the central issues in LA research. As Schwartz and Sprouse (: ) put it: ‘Why, if UG constrains adult L acquisition, does the ultimate attainment of adult Lers, even under optimal environmental circumstances, not simply mirror that of L acquisition?’ From opposing perspectives, most of the references given here try to deal with this question, or try to deny the presupposition that UG (newly) constrains L acquisition. Behind all of these differences between LA and LA lies the question of the critical, or sensitive, period: the idea that ‘there is a time period which is optimal for language acquisition, with a maturational decline with increasing age’ (White : ; see also Guasti : ‒) (we will see some very interesting evidence for the critical period in §.). Adult LA takes place after the putative critical period, and it is plausible to think that at least some of the differences in the intermediate states and in the final state are due to this. In fact, if we imagine that setting parameters involves effectively fossilizing them, then once L acquisition is complete no further language acquisition of any kind can take place and we would expect to find the differences between LA and LA just described. We will see directly that interlanguage studies have provided evidence against such a strong view of the critical period. If it is correct to think of the critical period in terms of the atrophying either of UG, perhaps for the reason just given, or of the parametersetting device (the learning algorithm), then it follows that LA can have no access to UG. In that case, we expect L production and comprehension (and any intuitions L acquirers may have about the TL) to be guided by UG only via the L competence, and we might expect L behaviour to be ‘wild’, in the sense that it might reflect structures which are not allowed by UG. Furthermore, we do not expect to find evidence of parameter-setting in LA of the kind that we find in LA. Studies of LA and interlanguage competence—investigating the production and comprehension capacities, and intuitions concerning grammaticality, of L learners and speakers—have centred on this 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

SECOND LANGUAGE ACQUISITION, INTERLANGUAGE

question, as documented in the references given above, and in particular Schwartz and Sprouse (). Again, we can divide the acquisition process into the three types of state in order to see more clearly what is involved. If UG is not involved in LA, then the initial state of LA must be either the L grammar or a non-UG-determined cognitive state, perhaps due to ‘general learning’ strategies of some kind. The former point of view characterizes the ‘local impairment’ approach of Beck (); the latter the ‘global impairment’ approach of Clahsen and Hong (). On the other hand, if UG is involved in LA then the initial state can again be the L grammar, or it can be UG itself, i.e. equivalent to the initial state of LA. The former point of view is that of the Full Transfer/Full Access approach (Schwartz and Sprouse ), the No Parameter-Resetting approach (for example, Smith and Tsimpli ), the Failed Functional Features Hypothesis (Hawkins and Chan ) and Tsimpli and Dimitrakopoulou’s () variant of this approach, which makes crucial reference to uninterpretable features (the Interpretability Hypothesis). White () called the latter point of view Full Access (without Transfer), which was advocated notably by Flynn and Martohardjono () and Flynn (). The intermediate states can be characterized by parameter-resetting, on a UG-based approach: this is the view of both the Full Transfer/Full Access approach and the Full Access (no Transfer) approach. Alternatively, one can think that parameters cannot be reset—this is the view taken by Smith and Tsimpli. If UG plays no role in LA then the possibility of parameter-resetting does not arise, and general-learning strategies must be at work; this of course does not rule out the possibility that an able L learner might be able to ‘simulate’ fluent, nativelike linguistic behaviour in performance. Such a learner would by definition not have true competence in the TL, having instead targetlike competence in their interlanguage. Regarding the final state, the UG-based theories all agree that a target-like grammar can, given favourable learnability conditions, be attained in principle, while the non-UG-based theories claim that this is impossible. As mentioned above, we would then expect L performance to reflect UG properties only via the L, if at all, and we would expect ‘wild’ structural features to be an aspect of the TL competence, observable in production. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

White (: ch. ) considers the evidence for the various points of view just given from experimental studies of various kinds regarding the nature of interlanguage performance. She concludes: On the whole, the results are consistent with Full Transfer Full Access: learners start out with L functional categories, features, and feature strength and are able to acquire L categories, features, and feature strength. Nevertheless, some aspects of the results are puzzling for this view, in that effects of the L reveal themselves more often than not in the form of variability, with L and L properties co-occurring, rather than there being initial effects of the L setting alone. (White : )

(‘Feature strength’ here refers to the movement-triggering property in the version of minimalism in Chomsky (; ): it is extensionally equivalent to the EPP features of the version of the theory assumed here, see §...) If parameters represent underspecified features which simply have to be set to a constant value by the learning algorithm, then we might think that L and adult L differ in that the consistency requirement does not obtain in adult LA. It may be that the learning algorithm does not really do its job in the adult LA case, but only once, for L. Atrophy of the learning algorithm may also be one way to construe the critical-period hypothesis.3 Two other aspects of interlanguage performance are relevant for our purposes here. First, it is widely observed that interlanguage performance is better in syntax than in morphology, particularly inflectional morphology. White comments that it ‘is well known that L learners exhibit optionality or variability in their use of verbal and nominal inflection and associated lexical items’ (: ). If morphology is a major cue for parameter values (see §..), then we can see how interlanguage performance may affect the PLD for a subsequent generation. Second, although L acquisition frequently fails to match the target grammar, there is no reason to assume that it does not correspond to a possible instantiation of UG, i.e. a grammar in the sense of having the parameters set (although possibly not to determinate values). To quote White again: ‘steady-state interlanguage grammars are not wild’ (: ). It is quite probable that the question of the UG-based status of interlanguage is not of central importance for understanding contact 3 This line of reasoning leaves aside the question of child LA, perhaps during the critical period (see Schwartz ). I will largely leave child LA aside here, since the influence on the PLD from an Alien Corpus as in () in the next section assumes adult LA.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

SECOND LANGUAGE ACQUISITION, INTERLANGUAGE

phenomena in diachrony. It may simply suffice to observe that ‘imperfect learning’, which could be construed in terms of any of the accounts of interlanguage sketched above, is enough to perturb the PLD of subsequent generations in such a way as to bring about syntactic change, and thereby solve the Regress Problem. In principle, different types of contact situation and different patterns of change could help decide among the various approaches to interlanguages, but in practice we have too little crucial data both regarding interlanguage and regarding the contact situations and the changes caused by them to be able to do this in the current state of knowledge. One could also think that nonUG-based interlanguage might be ineffectual in bringing about change, if acquirers ignore non-UG-based input. However, certain views of creolization take pidgins to be non-UG-based systems, and it is clear that the Deaf Nicaraguan children who invented a new language did not have UG-based input (or at least the first cohort of those children; see §.), so it seems that linguistic behaviour which does not have a basis in UG can function as a somewhat impoverished form of PLD. However, we can see that interlanguage is probably UG-based, and has three important properties: (i) it is variable, in that certain parameters seem inherently unstable; (ii) inflectional morphology tends to be inconsistent; and (iii) it does not correspond exactly to the target. We can take these three points as the content of the notion of imperfect learning. All of these factors can significantly perturb the PLD. Consider a situation where an older generation (or cohort of speakers) speaks an interlanguage with these properties, owing, perhaps, to a foreign invasion, and so this forms (part of) the PLD for the next generation (or cohort). It is intuitively clear, but also has been shown in the computational models produced by Niyogi and Berwick (, ) and Niyogi (), that the PLD will be such that it is able to trigger an L grammar in the younger generation/cohort which differs in some parameter values from the L of the older generation/ cohort, despite the fact that the latter group may have successively converged on its L grammar on the basis of the PLD from the previous cohort prior to the contact situation. This, then, is how contact may affect PLD. We see that the conclusions regarding the likely nature of interlanguage are quite important for our understanding of how contact can bring about syntactic change. Let us now turn to some concrete examples of contact and substratum effects. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

5.2. Contact and substrata 5.2.1. Introduction There is no doubt about the existence of the phenomenon of language contact, and its potential importance for understanding language change. Thomason (: ) says ‘most of what historical linguists study under the designation “language change” is due to contact’. Moreover, Kroch (: ) describes contact as an ‘actuating force for syntactic change whose existence cannot be doubted’.4 We have already discussed the role it may play in changing the PLD, and thereby solving the Regress Problem, in §.. and in the previous section. My goal in this section is to look more closely at the role of contact in affecting the PLD, and thereby acting as an ‘actuating force’, to use Kroch’s expression (which in turn harks back to WLH). I begin by reintroducing the distinction between direct and indirect contact, introduced in §.., making some modifications in the light of the discussion in §.. In §.., I will discuss the possibility that the change in English word order from OV to VO, which, as we have mentioned, may have been caused by contact with speakers of Old Norse (ON) in the Danelaw (see in particular Kroch, Taylor, and Ringe ; Trips ), before moving on to a discussion of the more radical proposals concerning the outcome of contact between Old English (OE) and ON put forward by Emonds and Faarlund (; henceforward E&F). This will lead to a discussion of the notion of an ‘outlier’ language. In §.., I will illustrate the notion of substratum in relation to varieties of English spoken in Wales and Ireland, considering the possibility that certain syntactic peculiarities of these varieties may be attributable to a Celtic substratum (Kallen ; Thomas ; Cottell ). Finally, in §.., I will consider the typology of contact relations put forward by Thomason and Kaufman () in the light of the general approach I have been developing here; here we will see that the division of parameters in macro-, meso-, micro-, and nanoparameters, introduced in §.., (), may play an important role.

4 Meisel, Rinke, and Elsig () argue that this is true for all cases of syntactic change, but this view overlooks the possibility of endogenous changes such as many cases of grammaticalization, the loss of V-to-T movement in English, etc., for which contactbased accounts seem difficult or impossible.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

I have pointed out that contact may bring about parameter-resetting because where two grammatical systems are in contact, the PLD is affected by an alien grammatical system. What this means is that Generation  in the schema for abductive change in () of Chapter , or more precisely, the younger group of speakers, is subjected to a different kind of PLD from Generation , or the older group.5 The younger group receives PLD that either directly or indirectly reflects a distinct set of parameter values from that which underlay the PLD for the older group. We distinguished the direct and indirect cases of language contact influencing PLD as in () (modified from Chapter , ()): () a. Direct contact: Older group: G1 →

Corpus1 CorpusAlien

Younger group: G2 →

Corpus2

b. Indirect contact: Oldest group: G1

Older group: G1 →

→

Corpus0

Corpus1; GAlien

→ CorpusAlien

Younger group: G2 → Corpus2

5 From now on, I will use the terms ‘older group’ and ‘younger group’ rather than ‘Generation ’ and ‘Generation ’, given the discussion of Weinreich, Herzog, and Labov’s () critique of the notion of intergenerational change in §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

The core of the distinction between direct and indirect contact, as () shows, is that in the direct case the PLD simply contains a significant quantity of tokens from a distinct system, while in the indirect case the older group uses a second language in interaction with the younger group. Here the PLD for the two groups is very obviously distinct. The distinction between direct and indirect contact corresponds approximately to that which is sometimes made between borrowing and imperfect learning, respectively (see for example Kroch : ), although the direct case does not really have to result in ‘borrowing’ on the part of either the younger or the older group: the Alien Corpus merely has an effect on the younger group’s PLD. As we saw in §.., Lucas () and Breitbarth, Lucas, and Willis (: ‒) adopt the framework of Van Coetsem (, ) for analysing contact-driven syntactic change. There we introduced the distinction between the SL and the RL in a contact situation. Borrowing can be distinguished from imposition in these terms: a change made by a speaker dominant in the RL, i.e. under RL-agentivity, is borrowing, while a change made by a speaker dominant in the SL is imposition. In terms of the schemata in (), direct contact in (a) involves RLagentivity and hence is borrowing. Indirect contact as in (b) is more likely to involve SL-agentivity since the alien grammar here is the SL grammar and the alien corpus consists of strings generated by this grammar (although we will see in the next section that possible contactdriven change in the early Middle English (ME) was indirect and involved RL-agentivity). See again Breitbarth, Lucas, and Willis (: ‒) and §... 5.2.2. Contact and word-order change in the history of English This section concentrates on the history of English, and more specifically on the question of the nature and type of contact influence from Scandinavian, i.e. some form of ON, on syntactic changes that took place mainly in the Middle English (ME) period. The idea of syntactic influence of ON on OE is touched on by Jespersen (: ), who states that ‘the intimate fusion of the two languages must certainly have influenced syntactical relations’. Regarding word order, he mentions () the fact that ME and Scandinavian show Genitive-Noun order, while OE usually had the opposite, but adds ‘in these delicate matters it is not safe to assert too much, as in fact many similarities may have 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

been independently developed in the two languages’. Also, Mitchell and Robinson (: ) comment that ‘the influence of the dialects spoken by the Danish invaders of the ninth century could have made itself felt and may well have been more advanced in colloquial OE than in the more conservative forms of the language recorded in the manuscripts’. I will first take up the possibility that the change from OV to VO, which has figured several times in our discussions by now, may have resulted from contact between ON and OE in the Danelaw in the ninth to eleventh centuries. Second, I will discuss the claims in E&F that there was language shift from OE to a form of relexified ON in the early ME period. Finally, I will briefly discuss the idea that Modern English (NE) is an ‘outlier’ in Germanic (as argued for Brazilian Portuguese in relation to the other Romance languages in Roberts b). The idea that the change from OV to VO in early ME resulted from contact between OE and ON is pursued in some depth by Trips (), who shows that one North-Eastern ME text, the Ormulum, written probably in Bourne, Lincolnshire, around  (Burnley (: )), shows a number of syntactic features that are typical of the Scandinavian languages but not typical of OE. Trips concludes that ‘the contact situation between Scandinavian and English brought about a number of syntactic changes, especially the word order change from OV to VO’ (). The contact between OE and ON in the Danelaw was probably of an indirect kind: the older group in the schema in (b) can be thought of as Norse immigrants who imperfectly learned OE. This adult-acquired OE formed part of the PLD for the younger group, who were native OE speakers. The PLD for the younger group was thus clearly different from that of the older group, whose native language was ON, and from that of an older group of OE speakers, or that of OE speakers outside the areas of Danish immigration, settlement, and intermarriage, whose PLD was not affected by ON. In these terms, the change would have taken place under RL-agentivity in Van Coetsem’s terms as described in the previous section. The extent of Scandinavian influence on English has often been commented on, for example, by Jespersen (: –), as just mentioned. There was undoubtedly a high degree of lexical borrowing: elements of core vocabulary such as sky, skin, die, ill, ugly, etc., and, more strikingly, prepositions and functional items such as from, at, the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

plural forms of the present tense of be, and—perhaps the best-known case—the pl pronouns they, them, and their. These are all lexical items of Scandinavian origin which must have come into English through contact with native speakers of ON at or shortly after the period of the Danelaw. There is also reason to think that the verbal inflections of Northern OE and ME were simplified due to contact with ON (this idea is considered by Kroch and Taylor : ‒). If pronouns can be borrowed, perhaps word-order parameters can be changed; after all, on the assumptions being made here, word-order parameters are specified as formal features of functional categories such as v, while pronouns are the realizations of formal features of the functional category D. Hence there is little difference, as far as the grammatical system is concerned, between one and the other. So the hypothesis certainly has an initial plausibility. Following the hypothesis that contact with speakers of ON was responsible for the actuation of the change from OV to VO in English, we must assume that ON was a VO language (Trips : ). This idea can be entertained independently of the nature of the parameter(s) responsible for VO order. The older group’s imperfect learning of OE meant that they spoke VO OE, or at least produced a larger number of surface strings with this order owing to the nature of their secondlanguage competence in OE than native OE speakers would have done. (Recall the complexities of OE word order, as discussed in §...) This would have sufficed to give acquirers of OE, the younger group in (b), exposure to PLD favouring a VO grammar for OE. Thus the imperfect learning of OE by Norse immigrants who were native speakers of VO ON sufficed to provide an ‘alternate form’ of OE to a given group of acquirers of OE in the ninth/tenth-century Danelaw. We thus arrive, through the schema in (b), at stage () of the change as described by Weinreich, Labov, and Herzog and discussed in §..: ‘a speaker learns an alternate form’. In our terms, a speaker acquires a grammar with an innovative parameter value. Stage (), whereby ‘the two forms exist in contact within his competence’ may correspond to a period of competing grammars, as documented by Pintzuk (, ) and described—with the difficulties noted there—in §... This stage could also in principle correspond to a single grammar with a new parameter value allowing for the option of surface VO order alongside OV; I will not explore this alternative here, but it is developed in Biberauer and Roberts (). Stage () of the change from OV to 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

VO—when the OV system becomes obsolete—is independent of the question of Scandinavian influence and so is not relevant to our present concerns. Stage () took place in the ninth and tenth centuries, and is not directly attested. Stage () is reflected in the variation we see in the Early ME texts (see Kroch and Taylor  on this) and in the differences between ME dialects in the area of the Danelaw and those spoken elsewhere: Trips’ () study of the Ormulum supports this conclusion. It is important to see that contact with ON only introduces a new variant; presumably acquirers of OE in the Danelaw were also exposed to PLD produced by native speakers of OV OE which was able to trigger an OV grammar—this is what leads to the variation at stage (). This account relies on two assumptions: one concerning the grammatical structure of the ON spoken in the Danelaw, and the other concerning the sociolinguistic situation in the Danelaw and in early ME: the idea that ON speakers imperfectly learned OE and thereby altered the PLD for subsequent generations. The first assumption is that ON (or the variety of it spoken in England) was VO, or at least more predominantly VO than OE. The second assumption is that Norse settlers in the Danelaw learned OE as a second language under historical circumstances which meant that the PLD which resulted from their linguistic behaviour significantly influenced the wordorder patterns of the language in later generations. This is the combination of RL-agentivity and indirect contact mentioned above. Let us now look at each of these assumptions in turn. First, ON word order. We have no direct evidence at all of the word order of the ON spoken in England, as this was never written down; see Kastovsky (: ): ‘practically no Scandinavian manuscripts exist’. We must therefore consider what is known about the older attested stages of North Germanic more generally, and draw what conclusions we can from that evidence. Whilst all the Modern North Germanic languages are VO, Old Icelandic was OV (cf. Hróarsdóttir , ). Rögnvaldsson () showed that all attested stages of Icelandic from the earliest texts (the Family Sagas, written in the thirteenth and fourteenth centuries) up to the early nineteenth century show either pure or mixed OV order; in fact, Rögnvaldsson (: –) argues for a competing-grammars analysis for this long period of the history of Icelandic. (Recall that both Hróarsdóttir and Rögnvaldsson use the term ‘Old Icelandic’ to include ON; see Chapter , note .) Delsing (: ) shows that Old 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

Swedish, up to , was predominantly OV. Faarlund (: –, b) says that ON had mixed word order, with OV more common in subordinate clauses and in poetry. He says ‘it may be that Old Scandinavian was still underlying OV’ (). He also observes that Ancient Scandinavian, the language of the early runic inscriptions up to the seventh century, was OV. Since the ON of the Danelaw represents an older stage than the earliest attested ON, which dates from the thirteenth century, it is possible that it was ‘more OV’ than attested ON, both in its structural features and in the frequency of OV order. Thus, the assumption that the ON spoken in the Danelaw was entirely or predominantly VO, and so significantly different from OE in this respect that contact with ON was the actuating force in the shift from OV to VO in English generally, does not seem to be supported by what little evidence is available. The most plausible hypothesis seems to be that that variety of ON was a ‘mixed’ OV system, where ‘mixed’ could mean either that there were competing grammars, or that the system allowed true optionality. In other words, there is no reason to think that ON was significantly different from OE as far as the OV/VO parameter is concerned. Therefore, contact with ON cannot have actuated wordorder change in the way described above, and assumed by Trips. Let us now turn to the second assumption behind the contact-based account of English word-order change: that ON speakers learned OE as a second language under circumstances which brought about their influence on the PLD for later generations. Here we can make an argument based on the external historical facts, as far as they are clear, against the idea that contact with ON was the actuating force in word-order change in this way. Examples like () from Chapter , repeated here for convenience, show, as Pintzuk (: ) says, that late ninth-century West Saxon was already a mixed variety: () a. He ne mæg his agene aberan. he  can his own support ‘He cannot support his own.’ (CP .) b. hu he his agene unðeawas ongietan wille how he his own faults perceive will ‘how he will perceive his own faults’ (CP .–)



[OE]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

Recall that (a) illustrates AuxOV order and (b) OVAux. As we have seen, Pintzuk interprets this variation, and variation between OV and VO orders at the same period, as evidence for grammars in competition, in other words as part of an ongoing word-order change. But the word-order variation observed here cannot be attributed to ON influence, since the translation of the Cura Pastoralis was written in King Alfred’s lifetime (Burnley : ), i.e. before , while the earliest date for Norse settlement in the Danelaw is  (Sawyer ; in Thomason and Kaufman : ; Kastovsky : ; and the references given there). The Norse settlers were certainly fairly few in number to begin with (whatever their later numbers—see below) and only occupied a part of the country distant from Wessex, where the Cura Pastoralis was translated. It is highly unlikely that the influence of ON contact, especially if this was due to imperfect learning of English by Scandinavians, could have spread from the North-Eastern portion of the country, where the Scandinavians mostly settled, to Wessex in just thirty-four years, which is the largest window of time the facts permit. So it seems that ON simply did not have enough time to have actuated the grammar competition that led to the word-order change. One might think that, although both ON and OE were ‘mixed’ OV/ VO systems, the contact between the two engendered something akin to a creolization situation. If creoles tend to show unmarked values of parameters, then we could think that this gave rise to the shift to the unmarked VO value of the parameter. (Recall the discussion of unmarked values for parameters in §., and the conclusion that there is a very slight markedness preference for head-initial orders.) However, Thomason and Kaufman (: ff.) argue very convincingly against this, concluding that ‘[w]hen all the relevant data are examined . . . it is apparent that a creolization hypothesis is not required to explain the facts of Northern Middle English, nor is it even likely’ (). They also point out that ‘Norse influence could not have modified the basic typology of English because the two were highly similar in the first place’ (–). I conclude that the idea that ON influence in the Danelaw functioned as an actuating force in the change from OV to VO in the history of English cannot be supported by the available evidence regarding ON word order, and that the possibility of creolization of OE as a result of contact with ON is not realistic. External facts such as the very short window of time for Norse influence to make itself felt on Alfredian West Saxon OE, especially given the initially small numbers 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

of Norse settlers in the Danelaw and the physical separation of the Danelaw from Wessex, also militate against this idea. Let us consider the evidence regarding contact between ON and OE more carefully. Thomason and Kaufman (: ff.) survey the available evidence in detail. There is no doubt that Norse speakers learned English: Thomason and Kaufman state that ‘Norse probably lasted no more than two generations after ’ (), and ‘we are convinced that Norse was largely or entirely absorbed by English in the Danelaw by . ’ (). Therefore the idea that the PLD for language acquirers may have been affected in the manner described above by the imperfect learning of OE on the part of native speakers of ON has some initial support. ON speakers settled in England between  and , probably not in very large numbers.6 The evidence for large-scale immigration is uncertain (see also the comments in Kastovsky : ); what is clear is that Viking soldiers and their families settled in various localities over a period of about a century. The densest area of settlement was Leicestershire, Lincolnshire, Nottinghamshire, and Yorkshire (see also Burnley : ). It is likely that the Scandinavians mixed with the native population everywhere they settled. The contact between ON and OE must have taken place, then, between roughly  and  primarily in the counties mentioned, i.e. in the North-East Midlands and Yorkshire. Thomason and Kaufman (: –) discuss in detail what they refer to as the ‘Norsification’ of OE (see also Meisel, Rinke, and Elsig : ). Thomason and Kaufman claim to deal with ‘all the discoverable grammatical influence of Norse on OE (apart from syntactic rules)’ (; emphasis theirs). Since it doesn’t deal with ‘syntactic rules’, their survey cannot provide direct information on the situation regarding word-order change, but it does provide some very useful indications concerning the effect of the contact between ON and OE on the OE and Early ME grammatical systems. They define what they call the ‘Norsified dialects’; these are the varieties spoken in the counties just mentioned as those most densely settled by Scandinavians, along with 6 This and the comments in the rest of the paragraph on the nature of the settlement in the Danelaw are based on the extensive quotations from Sawyer () in Thomason and Kaufman (: –, notes –). These remarks are largely corroborated by the discussion in Kastovsky (: –), although he considers Sawyer’s estimates of the Danish population to be on the low side.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

Northern East Anglia, parts of Northamptonshire and Cambridgeshire, Cheshire, Lancashire, and Derbyshire.7 The Norsified dialects ‘have not only heavy lexical influence from Norse, but also have adopted a significant number (between  and ) of Norse derivational and inflectional affixes, inflectional processes, and closed-class grammatical words’ (). They argue that this variety originated in the North-East Midlands, specifically in the areas identified as those most heavily settled by Scandinavians, and that it significantly influenced Northern ME. (In fact, they argue that it had more influence on Northern ME than did Northumbrian OE ().) Norsified English ‘arose at a time when Norse was still spoken but going out of use in its area’. The grammatical features in question include some I have already mentioned, such as the pl pronouns, are and art forms of the present tense of be, as well as a number of strong-verb forms, plural forms, comparative forms, etc., along with the loss of some inflections (for example, the ge- prefix on perfect participles).8 The entire list is given in Thomason and Kaufman (: –, table ). Many of these seem to be cases of contact-induced borrowing, whether due to direct contact or indirect contact (imperfect learning) in the sense defined above. The apparently rather rapid language shift suggests indirect contact, and this is what we described above. As we mentioned, this is compatible with the idea that word-order change was actuated by contact; but what militates against this is the complete lack of evidence that ON was in any way ‘more VO’ than OE. One issue which is important is the inflectional simplification— notably in verbal inflection—which took place in ME, and which may have been very important for various syntactic changes (see the discussions in §. and §.). Kroch and Taylor (: –) suggest that the simplification of verbal inflection came about in Northern ME due to the effects of imperfect learning of English on the part of ON speakers. Thomason and Kaufman (: ) argue that the simplification of the verbal inflection was a native, i.e. not a contact-induced, feature of Northumbrian OE (although Traugott :  says the loss of nominal inflection happened ‘possibly under Scandinavian 7 Thomason and Kaufman (: –) provide their own characterization of the ‘ethnolinguistic areas’ of England and Scotland, which I am somewhat simplifying here. 8 According to DeGraff (: ), Schleicher () had already suggested that the inflectional poverty of English as compared to Icelandic may have been due ‘to the much higher instance of language contact in the history of English’ (DeGraff : ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

influence’). They say that the ‘degree of simplification in the North, versus its rarity in the rest of the Danelaw, correlates with the rather high level of social upheaval prevalent in the North between  and . It does not, however, correlate with anything in the structure of Norse’ (). Furthermore, the dropping of final schwa in ME, which had the effect of eliminating the phonological exponence of many inflections and which is known to have taken place earlier in Northern than in Southern dialects of ME ‘cannot be blamed on the language contact situation’ () since it happened after . It may be, then, that not all instances of morphological simplification that took place in ME, even if they took place first in the North, were caused by contact. (See Kastovsky : – for a general discussion of the effects of contact between ON and OE.) In conclusion, then, there is no question that contact with ON led to many changes in English. However, there is no strong basis for the idea that word-order change was among these. In this connection, it is worth recalling that change from OV to VO is well-known elsewhere (in particular among the European branches of Indo European; see §.. and §.. and the references given there), and can take place independently of contact: the change in nineteenth-century Icelandic appears to be a case in point (see Hróarsdóttir , ; Rögnvaldsson ). The general account of markedness developed in §. has the result that there is a slight preference for head-initial over headfinal orders, and so this may explain a slight tendency for OV systems to change to VO (see again §..). The case for contact-driven change from OV to VO in OE and/or ME is not empirically clear, and is in any case not required: given the general similarity in word order between ON and OE, whatever internal factor might have caused ON to be ‘more VO’ than OE and thereby exert putative influence on OE could have been, and most likely was, operative in OE anyway. The evidence for competing grammars or formal optionality, depending on how one interprets the word-order variation in subordinate clauses, does not reveal anything about contact with ON, as it is present in varieties of West Saxon which could not have been influenced by ON. Let us now turn to the second topic of this section, the proposals made by E&F. E&F argue that ME was not a continuation of OE, but instead it continued the Norse spoken in the Danelaw. In other words, a language shift took place in England at the beginning of the ME period. As they put it: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

While the parents may have been speaking mutually comprehensible amalgams of their different native Germanic languages, their children were already creating from these vocabularies a new North Germanic tongue consistent with Universal Grammar—the language which today we call Middle English. And so the language descended from it, Modern English, might more aptly be called Modern Norse. (E&F : ; emphasis theirs)

E&F support their claim with a range of evidence, including lexical, morphological, syntactic, and phonological properties. By far the most interesting (and relevant) evidence is morphosyntactic, but let us first briefly look at what they say about the lexical evidence. Concerning the borrowing of open-class lexical items, E&F point out that it is hard to get clear evidence for or against their thesis, since about half of the ME Germanic vocabulary was common to North and West Germanic. About a third of the open-class ME vocabulary was unambiguously derived from OE and about a sixth unambiguously from Norse. Nonetheless, it is very well-known that the Norse items include very common, core vocabulary items of a kind not generally thought to be readily borrowed, as mentioned above. One cannot, on the basis of this rather indeterminate evidence, exclude the possibility that ME is relexified Norse. In this connection, E&F essentially invert Thomason and Kauffman’s (: ‒) notion of ‘Norsified English’, instead positing ‘Anglicised Norse’. But, as they rightly point out, perhaps the most important observation here is that ‘[N]o principles of linguistic descent even remotely depend on differences in sources of vocabulary of this order. If they did, English would necessarily be classed as a Romance language’ (E&F: ). So the lexical evidence is indeterminate regarding the possibility of language shift. Let us now look at the morphosyntactic evidence. Regarding morphosyntax, E&F’s general argument is that there are a number of properties found in ME from the earliest texts which exhibit features not found in Common West Germanic or in OE, but found in North Germanic. These properties must have appeared in a short period of about two centuries. The traditional view is that these innovations are due to the combination of internal change and syntactic borrowing, as we saw above, but E&F propose that these changes are evidence of language shift from West Germanic (OE) to North Germanic (Anglicized Norse). Thus, ‘English’, in the sense of a West Germanic variety more or less directly continuing OE, disappeared after the twelfth century. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

Interesting and provocative though E&F’s thesis is, there are two very important caveats. First, as we have already mentioned, no texts at all survive of the Norse of the Danelaw. It was of course a North Germanic language, and surely a conservative one in relation to Medieval or Modern Mainland Scandinavian (MSc) (in part, because it is centuries older than any surviving MSc texts). Of the surviving attested North Germanic languages, Old Icelandic, the language of the sagas, dating from twelfth-century Iceland, is almost certainly the closest approximation we have to Danelaw Norse. So, all comparison of Danelaw Norse should be with this language. Comparison with Modern Icelandic or any stage of MSc is irrelevant, since we know that there were significant changes in the histories of these languages after ; they were spoken in different times and different places. Second, E&F acknowledge that endogenous change is possible and that Norse underwent internal changes. But if this is true then what prevented English from undergoing similar changes? The languages are quite closely related both historically and typologically, and contact may have accelerated the changes (in both directions). Convergent change among languages of a similar type is common; in fact it is widely acknowledged that this has happened in the history of Romance (see the discussion of changes in complementation in Ledgeway ; Ramat and Ricca ; Vincent ; and §.). Given these caveats, we need to be sure of the following points for E&F’s case for language shift to be convincing: () a. the relevant morphosyntactic property is found in Old Icelandic, as this is our closest proxy for Danelaw Norse (Modern Icelandic and MSc are historically and geographically too distant from Danelaw Norse to be reliable); b. the relevant morphosyntactic property is clearly absent from OE and West Germanic; c. the relevant morphosyntactic property could not have arisen through endogenous change (such as grammaticalization); d. the relevant morphosyntactic property cannot easily be borrowed, and so must entail language shift. So let us now look at a representative selection of E&F’s morphosyntactic evidence in light of the criteria in ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

The first piece of evidence concerns word-order change from OV to VO. It is well-known that OE had verb-final order in subordinate clauses, as we have seen repeatedly (see in particular §.. and §.). In this connection E&F comment that ‘Norse VO word order is the source of the innovative VO order that came to predominate in twelfthcentury Middle English, as there is no other source for this pervasive change’. In other words, the OV-to-VO shift was caused by language shift according to E&F. The same problems hold for the idea OV-toVO change was due to language shift as those which hold for the idea that it was actuated by language contact as discussed above: the lack of evidence for VO order in Danelaw ON (and indeed the probability that it was mixed, like OE); the fact that ON word order changed independently of OE on this view, despite being typologically and genetically close to OE, so that whatever explains the change in ON could very plausibly explain the change in OE;9 the evidence for VO and AuxV order in Alfredian West Saxon OE, from a time and place where Scandinavian influence was highly unlikely. So the word-order change does not provide evidence for language shift. E&F’s second piece of evidence concerns verbal particles. They point out that both Scandinavian and ME but not OE and the rest of Germanic, have ‘a system of post-verbal directional and aspectual free morpheme particles, contrasted with WGmc systems of pre-verbal separable prefixes’: () a. þæt he his stefne up ahof. that he his voice up raised. ‘that he raised up his voice.’ (Bede .; Pintzuk : ) b. te eorl stæl ut [and] ferde after Rodbert the earl stole out and went after Robert, eorl of Gloucester Earl of Gloucester ( Peterborough Chronicle; E&F: )

[OE]

[ME]

However, again there are problems using these facts as evidence for language shift. First, no examples of Verb > Particle order are given for Old Icelandic, and so this argument fails criterion (a) above. Second, 9 E&F assert that ‘such word-order change [OV to VO, IGR] . . . is certainly very rare’. However, in fact this change is not at all rare, certainly among the European branches of Indo-European, as we have repeatedly seen (see §.. and §..).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

the change from Particle > Verb to Verb > Particle order has long been seen as related to word-order change, so the comments made above regarding the OV-to-VO change hold. Hence this change is not evidence for language shift. The third case involves future auxiliaries. E&F say that the ‘[OE] ancestors of the later future auxiliaries . . . sculan and willan, had the lexical content of necessity or obligation, and wish or intention, respectively’ (E&F: ), i.e. they were modal rather than purely future auxiliaries. But this is a plausible endogenous change, a case of grammaticalization; we witness exactly the same development of Latin habere and debere in Romance. Habere had deontic interpretations in Late Latin and in many of the Modern Standards it is the origin of the future and conditional tenses (see §..). Debere, clearly a modal in most Romance, forms the future auxiliary in certain varieties of Sardinian (Mensching and Remberger : ). Therefore this change does not meet criterion (c); this kind of change is a common form of grammaticalization, and so language shift is not needed to explain it. The fourth change concerns perfect infinitives. E&F () say these are rare in OE (Fischer : ‒) but common in ME and Norse. () is an ME example: () And wolde have kist his feet (Chaucer, Knight )

[ME]

However, OE didn’t have anything like a periphrastic perfect; this developed in ME (Biberauer and Roberts , ). Again, this is a natural and very common endogenous change (grammaticalization of HAVE as a perfect auxiliary), one which has taken place in the histories of French, Italian, German, and Dutch (see §..). Again this change fails criterion (c). The fifth property is preposition-stranding, as in (): () a. What did you talk about (what)? b. Kva snakka de om? What talked you about?

[Norwegian]

This is undoubtedly a cross-linguistically rare property shared by English and Modern MSc, and so looks like a good contender for supporting the language-shift account. However, once again there are problems. First, preposition-stranding is not clearly attested in early ME (Lightfoot ). Second, E&F (‒) give examples from 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

thirteenth-century Old Danish and ON, but none from Old Icelandic, again failing criterion (a); Bech and Walkden (: ‒) argue that there are no attested cases of preposition-stranding in the older North Germanic languages, including ON. Third and most importantly, we know that this construction can be borrowed, as we saw in our discussion of Prince Edward Island (PEI) French in §... So prepositionstranding fails to demonstrate language shift by criteria (a) and (d). Sixth, there is the well-known question of the nature of V in OE and ME. As we saw in §..., OE allowed surface V, with the first element any XP, and the second usually (but not only) a pronoun unless the initial element was negation, a wh-phrase or one of a small class of discourse adverbs. This order is shown in () (repeated from §..., (b)): () þin agen geleafa þe hæfþ gehæledne thy own faith thee has healed ‘Thine own faith has healed thee.’ (BlHom .‒, cited in Pintzuk : )

[OE]

This is not found at all in Scandinavian, or in Northern ME (Kroch and Taylor , b). Kroch and Taylor () account for the loss of V in fifteenth-century English in terms of a North-South dialect mixture, with northern (rigid) V speakers reinterpreting the southern variety as non-V and adapting to it, thereby generally losing V. This entails that southern, fifteenth-century ME was a continuation of OE. But if all posttwelfth-century ME is Norse, then, given that V is completely rigid and stable throughout the history of MSc, why would Anglicized Norse have lost it? E&F’s claim that ME is North Germanic forecloses on the Kroch and Taylor account of the loss of V in English, arguably the most widely accepted account of this very important change. E&F’s seventh argument comes from relative pronouns. Early ME loses OE relative pronouns that display case or gender (the se þe relatives; see Mitchell and Robinson : ). As in Old Scandinavian, Early ME relativizers are invariant. However, Old High German lost the same system (although in a different way).10 So this argument fails by (b): the same change is observed in the history of German.

10 It is worth noting that English later calqued wh-relative pronouns from French, showing that French too had some grammatical influence on English (other possible morphosyntactic borrowings from Norman French include the progressive and the arbitrary pronoun one).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

E&F (‒) also ascribe the change in negation from ne+V to ne+V +naht to V+naht to Scandinavian, but this is of course Jespersen’s Cycle, and has taken place in French, many Italian dialects, Welsh, Breton, Dutch/Flemish, and German (at least); see §. and the references given there, particularly Willis, Lucas, and Breitbarth (), Breitbarth, Lucas, and Willis (). Again, this argument fails criterion (b). Finally, E&F argue (‒) that general loss of inflection in ME can also be attributed to language shift. However, the loss of inflection is more radical than what has happened in Scandinavian. Moreover, if Old Icelandic is the proxy for Danelaw Norse, then there has been considerable loss of inflection which cannot be attributed to the putative language shift. () lists some of the inflections lost in ME and compares that with both MSc and Modern Icelandic: () a. b. c. d.

grammatical gender (not lost in MSc); adjectival agreement (not lost in MSc); case (lost in ME and MSc, but not in Icelandic); subjunctive (lost in ME and MSc, but not in Icelandic).

E&F also claim that ME lost two inflections that survive in MSc: the ‘reflexive’ -s (which forms the MSc ‘-s passive’), and the enclitic definite article. Concerning the former, they claim that it’s ‘the dropping of a simple consonantal inflection’ (). However, this is rather surprising, since -s inflection is almost the only one that survives in NE (in the sg present, the plural, and the possessive). There is no trace of the reflexive -s anywhere in ME or NE, which makes ME/NE the only putative Scandinavian language to lack this form. This is of course easily explained if ME/NE are West Germanic languages. Concerning the enclitic article, E&F claim it too is an inflection, but the well-known distribution of this element in DPs modified by adnominal adjectives in Danish is problematic for this proposal, cf.: () a. hunden ‘dog-the’ b. den store hund ‘the big dog’

[Danish]

No trace of this is found anywhere in ME or NE. This otherwise strange fact is easily explained if ME/NE are West Germanic languages. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

In conclusion, E&F’s case is interesting, but weak. The evidence of uniquely Scandinavian features in ME doesn’t force us to assume language shift rather than borrowing. Thomason and Kauffman’s (: ff.) notion of Norsified English looks more plausible than E&F’s Anglicized Norse. Bech and Walkden () review E&F’s claims and come to very similar conclusions. They summarize their position as follows: ‘the claim that English is Norse is a bold one, but E&F’s manifesto fails to convince on methodological, empirical, and theoretical grounds. The traditional view that English is a West Germanic language thus stands intact’ (Bech and Walkden : ). It is nonetheless true that NE is an outlier in the Germanic context, showing a range of syntactic features not found elsewhere in the family. Here are the properties that distinguish NE from the other West Germanic languages, ME and MSc (several of these were discussed in §..): () a. b. c. d. e. f. g.

Loss of V (fifteenth century) Loss of transitive expletive constructions (sixteenth century) Loss of object shift (fifteenth/sixteenth century) Development of modal auxiliaries (sixteenth century) Loss of infinitive inflection (fifteenth century) Do-support (sixteenth/seventeenth century) Clitic negation/negative auxiliaries (seventeenth century)

No-one invaded England in the fifteenth to seventeenth centuries. These changes were either endogenous (a cascade, as argued by Biberauer and Roberts  and summarized in §..) or due to the longterm consequences of a substrate which cannot have been Norse, since not one of the properties in () is found anywhere in North Germanic. A possibility that may be worth considering is that there was a Celtic substrate in English. The Celts arrived in Britain about a millennium before the Angles, Saxons, and Jutes (estimates vary) and had extensive contact with English speakers all over the country and arguably for many centuries, especially in Western areas. Moreover, Welsh, the extant descendent of the Celtic spoken on the main island of Britain at the time of the Germanic invasions, is a head-initial language with a dummy auxiliary and negative auxiliaries, and which went through a V phase in the medieval period (Willis ; Meelen ). Hence, whatever is responsible for the outlier status of NE in Germanic, it is not the Vikings. There was of course significant influence from North 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

Germanic (Danelaw Norse), as we have seen, but the case for actual language shift is slight. English, especially in the Danelaw areas, became somewhat ‘Norsified’, as documented by Thomason and Kaufman (), summarized above. But the key syntactic differences between English and the rest of Germanic emerged later. It is difficult to attribute these to a substrate rather than to a cascade of parameters as argued by Biberauer and Roberts (), although some role for Celtic cannot be absolutely ruled out. 5.2.3. Substratum effects: Hiberno-English and Welsh English Let us now turn to substratum effects, which I will argue to be another case of indirect contact. The change to the system introduced through indirect contact is preserved once acquired by the younger group, following the schema in (b). The usual notion of substratum in historical linguistics refers to the situation where a community gives up its original language in favour of a new one, but some feature or features of the original language survive and influence the structure of the adopted language (whether this happens due to the RL or the SL, i.e. through borrowing or imposition in Van Coetsem’s terms introduced in §..). A standard example comes from the history of Spanish: Cantabrian Spanish is thought to have a Basque substrate, which may be responsible for the loss of initial /f/ in that variety, which spread to other dialects—notably Castilian—in the Middle Ages, giving, for example Spanish hacer from Latin facere (‘to do’) (Menéndez-Pidal : ff.; Penny : –). This feature of Spanish may thus originate in language shift on the part of Basque speakers, who, as it were, spoke Spanish ‘with a Basque accent’ and therefore without initial /f/, since at this period it is thought that Basque had no /f/. (For a critical evaluation of this and other standard examples of substratum effects, see Hock and Joseph : –.) Here I follow Thomason and Kaufman (: –) in not distinguishing between the traditional notions of substratum, adstratum, and superstratum, although I will distinguish substrate from superstrate in the discussion of creoles in the next section; in that context, it is important to distinguish these notions in order to understand the ‘relexification’ account of creole genesis, as we shall see. All three notions, as traditionally used, refer to the socio-political relations between different groups shifting languages rather than to any 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

structural features. French, for instance, is said to have a Celtic substratum since the inhabitants of Gaul who eventually abandoned Gaulish for Latin/Romance were conquered by the Romans, whilst it has a Germanic superstrate since the Franks conquered France somewhat later and then gave up their Germanic language for Gallo-Romance. Thomason and Kaufman (: ) point out that one feature seems to correlate with the substratum vs. superstratum distinction: ‘superstratum interference is more likely to include lexical items than substratum interference is’. They also point out that French is in a superstratum relation with English, and this correlates with the large amount of lexical borrowing from French into English. One might propose something similar for ON in relation to English, given the discussion in the previous section. The case of possible substratum influence I want to discuss concerns certain syntactic features of Hiberno-English, the English of Ireland, and Welsh English. In both Wales and Ireland, the inhabitants have been steadily shifting from Welsh and Irish respectively to English over several centuries, starting in the seventeenth century in the case of Irish (Ó Murchú : –) and the sixteenth in the case of Welsh (Owen Jones : –). The English spoken in these countries has certain features which may be attributable to substratum influence from the original languages, despite a paucity of loan words from them. Here I will look at three features, one for each variety and one common to both, before briefly looking at Corrigan’s () evidence from Early Modern Irish English. Thomas (: ff.) presents several features of Welsh English which he attributes to substratal influence from Welsh.11 The first is ‘fronting’, shown in () (Thomas (: ), his italics): () a. Coal they’re getting out, mostly. b. Singing they were. c. Now they’re going.

11 Unsurprisingly perhaps, contact with English has brought about syntactic changes in Welsh. Davies and Deuchar () document the evidence for an ongoing change concerning deletion of an initial finite auxiliary in contemporary Welsh. Since the word order in Welsh in sentences where there is a finite auxiliary and a main verb is AuxSVO (see §..., ()), this gives rise to SVO orders. Davies and Deuchar (: ) conclude that ‘contact with English may be seen as acting as a catalyst to this change and promoting convergence in the word order of Welsh towards that of English’.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

As Thomas says, these examples are equivalent to both clefting and pseudoclefting examples in Standard English, as the fronted material is new information: () a. What they’re getting out mostly is coal. b. What they were doing was singing. c. It’s now that they’re going. Welsh does not distinguish clefts from pseudoclefts, having instead a general fronting construction which fronts any XP to the left periphery of the clause, very likely to SpecCP, and can function to indicate new information. (See Tallerman ; Willis ; Roberts  on this construction.) The following examples illustrate fronting of a DP, nonfinite VP, and an adverbial PP: () a. [Y dynion] a werthodd y ci. [Welsh] the men  sold the dog ‘It’s the men who have sold the dog.’ (Tallerman : ) b. [Gadael y glwyd ar agor] a wnaeth y ffermwr. Keep the gate on open  did the farmer ‘Leave the gate open, the farmer did.’ (Rouveret : ) c. [Ym Mangor] y siaradais i llynedd. in Bangor  spoke I last year ‘It was in Bangor I spoke last year.’ (Tallerman : ) Fronting of what appears to be a non-finite VP, as in (b), is unacceptable in Standard English, whether to indicate new or old information. This variant of the construction at least, but perhaps the general use of fronting to indicate new information, may have entered this variety of English through imperfect learning of English by native speakers of Welsh and the subsequent transmission of this construction to later generations of native speakers of English, first perhaps bilingual in Welsh and then later not, following the schema for indirect contact in (b). Although the precise grammatical analysis of this construction is unclear, it probably represents a parametric option connected to the structural realization of new information (and as such, its formal correlate concerns the distribution of EPP features on C). Heine and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

Kuteva (: ‒) document similar examples of clefting of predicate APs, adverbs, and VPs in Hiberno-English; since Irish has similar clefting possibilities to Welsh, a similar account can be given to that just sketched for Welsh English regarding how the construction came into Hiberno-English. Heine and Kuteva also show, following Filppula (), that clefts of this type are most frequent in the spoken English of areas of the West of Ireland where Irish influence is most recent, and less frequent in Dublin where Irish influence is least strong; see also Cottell (). Another example of the Irish substrate in Hiberno-English is the well-known perfect with after, as in (see Kallen :  ff., ; Cottell ): () a. I’m after writing a letter. b. She is after selling the boat. This construction has a direct counterpart in Irish: () Tá sí tréis an bád a dhíol. is she after the boat  sell ‘She has sold the boat.’ (Kallen : )

[Irish]

Once again, it is very tempting to see this construction as having originated in imperfect learning of English by native speakers of Irish, followed by its adoption into the native English of later generations and its retention after Irish had been abandoned, even as a second language. Again, the schema for indirect contact in (b) applies. In this case, too, the precise nature of the parameter at work is not completely clear, although it is connected to the options for the realization of perfect v, and hence to the options we observed in Italian and various dialects in §.. (see Roberts : – for some relevant discussion). Third, a further construction common to both Welsh English and Hiberno-English is the option of inversion in indirect questions:12 12 Both of these examples illustrate what McCloskey (: ff.) calls ‘semi-questions’ in the subordinate clause. The embedded clause does not convey true interrogative force, but rather a kind of indefinite proposition. McCloskey proposes a subtle semantic test which brings out this distinction; see the references given there as well as McCloskey’s own discussion of the semantics of these clauses. The semantic difference between true questions and semi-questions is perhaps best intuitively seen in the distinction between ask, which takes a true question as its complement, and ask about, which takes a semi-question. Many



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

() a. I wouldn’t know would there be any there now. [Welsh English] (Thomas : ) b. I know is he going or not, but I’m not letting on. [Ulster English] (Henry :  ff.) This is not allowed in Standard English, where inversion is only possible in main clauses and a small class of embedded declarative clauses. We saw in §... that movement of the inflected auxiliary in T to C is blocked by the presence of a complementizer in C (see example () there). In fact, the condition in Standard English is that any complement interrogative C, whether realized by a complementizer or wh-expression or not, blocks inversion, i.e. movement of T. In Hiberno-English and Welsh English, this condition does not seem to hold, as () shows. Both Welsh and Irish, as VSO languages, allow orders very close to those seen in (). These examples are from Roberts (: –): () a. Tybed a geith hi ddiwrnod I-wonder  will-get she day wythnos nesa. week next ‘I wonder if she’ll get a free day next week.’ b. Chuir sé ceist ort an raibh tú asked he question to-you  were you ‘He asked whether you were content.’

rhydd [Welsh] free

sásta. [Irish] content

However, there are good reasons to think that in fact the verb has not raised as far as C in (), but only to T with the subject failing to raise to SpecTP. (See the discussion of the Welsh examples in §..., (‒); that discussion carries over to Irish, see McCloskey .) The element a/an glossed as ‘Prt’ in () is most likely to be an

varieties of English, including the variety of Hiberno-English described by McCloskey (), disallow inversion in subordinate semi-questions but allow it in true questions: (i) (ii)

I asked them what would they do. I asked them about what would they do.

(true question) (semi-question)

Henry (: ) explicitly states that the variety of Irish English she discusses differs from the one McCloskey investigated. It seems that the Welsh English reported by Thomas is like the variety Henry discusses rather than that discussed by McCloskey, although it is not easy to be sure on the basis of the evidence presented (cf. in particular, McCloskey’s observation that in the variety he investigates, complements to know with inversion improve if know is negated).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

interrogative complementizer comparable to English if, a fact that would be consistent with the idea that the verb has moved only as far as T in these examples. Nevertheless, the general proposal for transmission of these structures to later generations through imperfect learning of English on the part of speakers of Irish or Welsh can be maintained. We see then that native speakers of these VSO languages have overgeneralized inversion in embedded clauses, partly due to interference from the native VSO grammar due to imperfect adult L learning along the lines described in the previous section.13 This then becomes part of the PLD for subsequent generations, and the substratum effect is once again created. The parameter in question concerns which kinds of C are able to trigger movement of T in embedded clauses; in terms of the discussion of T-toC movement in §..., this is residual, symmetric V. Corrigan (: f.) discusses aspects of Early Modern Irish English of the seventeenth and eighteenth centuries, soon after English was first introduced into Ireland at a time when the majority of the Irish population would have been first-language speakers of Irish with English as a second language. Here we find some evidence of null subjects and, possibly, verb-movement to initial position: () a. pro Shal Drink the good ale Whil feather-Cock crow! ‘[I] shall drink the good ale until the weather-cock crows!’ (, Bliss : , cited in Corrigan : ) b. is none of you strong enough . . . to owercome him (, Bliss : , cited in Corrigan : ) In (a), we see a Sg null subject. Corrigan plausibly attributes this to transfer from Irish, which is clearly a null-subject language (McCloskey and Hale ). In (b), the auxiliary is is in first position in the string. The sentence is therefore naturally construed as a yes/no question by speakers of NE, but Corrigan observes that the sentence ends in an exclamation mark in the original. The example could be a case of verbinitial order; alternatively, there may be an expletive null subject here. A further (possibly non-exclusive) possibility is that this example 13 It is worth noting that Thomas (: ) also comments that the ‘elision of the conjunction [i.e. the complementizer, in the terminology being used here—IGR] (if/ whether) is also facilitated by the Welsh rule of eliding the corresponding conjunction (a/ os) in similar environments in the vernacular’. I do not know whether Irish allows a similar process of elision, however.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

features ‘inversion’ of the negatively quantified subject in a way comparable to that documented for African-American Vernacular English (AAVE) by Labov (), see (a) of §... In this connection, Corrigan (: ) quotes Bliss’ (, §) comment that ‘[t]he initial position of the verb in Irish is rarely imitated in Hiberno-English, although it does occur’. What we seem to observe at this early stage of contact between Irish and English is fairly direct transfer of features of Irish to English. Presumably, greater exposure to English at later stages, combined with native bilingualism, led to a situation where the substratum effects were slighter, as in (). The alternative to substrate analyses which is often put forward is that the same construction could develop independently of contact; in fact, we raised exactly this objection to the idea that ON actuated the OV-toVO change in English above. In the case of (), it would suffice to point to the numerous non-standard varieties of English which allow inversion in indirect questions, including the variety of Hiberno-English discussed in McCloskey (), AAVE (Mufwene : ) and other varieties of American English. However, as McCloskey () and Henry () have shown, there may be very subtle differences among varieties allowing inversion in embedded questions—see note . One might expect that the most liberal varieties are those which show substrate effects through indirect contact, in the sense described here. This must remain a question for further investigation, however. 5.2.4. A ‘borrowing scale’ Let us finally consider Thomason and Kaufman’s (: –) borrowing scale, in relation to what we have said regarding contact and our general approach to syntactic change. () summarizes Thomason and Kaufman’s scale, restricting attention to lexical and syntactic traits: () () Casual contact: lexical borrowing of content words only. () Slightly more intense contact: lexical borrowing of some function words; syntactic borrowing of new functions and ‘new orderings that cause little or no typological disruption’ (: ). () More intense contact: function words, derivational affixes, inflectional affixes with borrowed vocabulary. Syntax: no complete typological change, but perhaps a partial one. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CONTACT AND SUBSTRATA

() Strong cultural pressure: moderate structural borrowing. ‘[F]airly extensive word order changes will occur . . . borrowed inflectional affixes . . . will be added to native words’ (: ). () Very strong cultural pressure: heavy structural borrowing. ‘Major structural features that cause significant typological disruption’ (: ). Thomason and Kaufmann (: ) state that the relation between ON and OE is of a different type, which they call typologically favoured borrowing; this is ‘structural borrowing at a higher level than the intensity of contact would seem to warrant, thanks to a close typological fit between source-language and borrowing-language structures’. As we concluded in §.., this seems a more plausible characterization of the contact between ON and OE than the E&F’s proposal that there was language shift. Aside from the first, all of these degrees of borrowing might involve the transmission of parameter values by disruption of the PLD according to the schemas for direct and indirect contact in (). What distinguishes the degrees in (()–()) is, on the one hand, the frequency of expression of a parameter: a change to a parameter associated with C, T, or v will have a greater effect than one associated with P, for example, since all clauses involve realizations of the former but only some involve a realization of the latter. It is very tempting to characterize the degrees of contact summarized in () as nanoparametric (), microparametric (), mesoparametric (), and macroparametric () borrowing. Clearly, a change in a more highly ranked parameter may have much more impact on the overall grammatical system, and thus require a much greater degree of contact for the younger group in the schema in () to be able to disregard the evidence for the very different indigenous structure. Something along the lines of Thomason and Kaufman’s scale could in fact be indicative of the general form of parameter hierarchies (see §.) and even be used as a tool to investigate them. This would represent a major, but very interesting, research project. 5.2.5. Conclusion In this section I have considered how language contact can be looked at in relation to the general approach to syntactic change that I have been describing here. It can be integrated quite usefully into the approach 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

and serve as a way of solving the Regress Problem, as well as giving us one way of seeing how Inertia can be violated. Here we looked at two cases of contact: contact between ON and OE in the Danelaw in the period –, and contact between Celtic languages and English in Ireland and Wales. We concluded that the OE-ON contact in the Danelaw is not necessary, and probably not sufficient, to explain the word-order changes in ME, pace Trips (). We also saw good reasons to reject E&F’s proposal that there was language shift from OE to relexified ON in England in the twelfth century; in this we agreed with the conclusions reached by Bech and Walkden (). Thomason and Kaufman’s () proposed ‘Norsified English’ seems a more accurate characterization of the situation in the former Danelaw area in ME than E&F’s ‘Anglicised Norse’. The possible Celtic substratum effects on English represent plausible accounts for some of the syntactic peculiarities of the varieties of English spoken in Ireland and Wales, although they may not be the only possible accounts. Both cases discussed here are probably indirect contact as in (b); the ME case, as we pointed out, involves RL-agency in Van Coetsem’s (, ) terms, while the Celtic case was one of imposition of the SL on the original Celtic-speaking communities. In §.., we saw an example of direct contact as in (a): the borrowing of preposition-stranding, along with some English prepositions, into PEI French. This is a very straightforward case of lexical borrowing creating a new syntactic option, in this instance preposition-stranding. In general, though, we can see that the evidence of language contact can be integrated into a parametric, acquisition-driven model of syntactic change quite unproblematically, and indeed in a way which yields a number of interesting research questions, particularly if Thomason and Kaufman’s borrowing scale can be interpreted in terms of parameter hierarchies.

5.3. Creoles and creolization 5.3.1. Introduction: pidgins and creoles Having looked at contact in general in the previous section, let us now look at what has often been seen as a special or extreme case of contact: the development of pidgins and creoles. The study of pidgins and 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

creoles has been an important part of linguistics since the pioneering work of Schuchardt (see the collected translations in Schuchardt , , and the historical overviews in Holm : ‒ and Aboh and DeGraff : ‒). The essential interest of these varieties for historical linguistics lies in their origins. At least the external aspects of the origins of creoles are somewhat unusual, in that many of them arise from vernaculars developed in contact situations, usually known as pidgins. This has led to the idea that creoles might be significantly different from non-creole languages in that they may not have developed through the ‘normal’ mode of generation-to-generation transmission of language, an idea which Aboh and DeGraff () take issue with. Instead, they are generally thought to develop from pidgins, which are simplified communication systems limited to the contact situation in which they arise. Most importantly, pidgins are no-one’s native language. Holm (: –) defines pidgins as follows: A pidgin is a reduced language that results from extended contact between groups of people with no language in common; it evolves when they need some means of verbal communication, perhaps for trade, but no group learns the native language of any other group . . . By definition the resulting pidgin is restricted to a very limited domain . . . and it is no-one’s native language. (emphasis in the original)

Creoles, on the other hand, are thought to arise when a pidgin functions as PLD for a generation of children, and thereby becomes a native language. To quote Holm (: ) again: A creole has a jargon or pidgin as its ancestry; it is spoken natively by an entire speech community, often one whose ancestors were displaced geographically so that their ties with their original language and socio-cultural identity were partly broken. Such social conditions were often the result of slavery.

Thus the opposition between pidgins and creoles can be stated as follows: pidgins have no native speakers, are typically acquired by adults, have restricted communicative functions, are structurally simple, show inconsistent structural patterns, and may even depart from the universal properties imposed by UG. Creoles, on the other hand, have native speakers, are acquired by children, have a full range of communicative functions, are structurally complex with consistent patterns just like any other language, and are compatible with what we know about universals, both from a typological point of view and in the sense of not showing features which violate UG. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

If creoles arise exclusively from pidgins, as pidgins are described here, then this implies an ‘exceptional’ kind of origin for creoles. As already mentioned, this is the idea that has given rise to much of the interest in creoles. However, if the role of the superstrate or lexifier language (i.e. the—usually European—language that at least provides most of the pidgin and creole vocabulary) and that of the substrate language (i.e. the—most often African—language that was spoken by the displaced populations who are the ancestors of the eventual creole speakers) are taken into account, along with what is known about contact and substratum effects, then it is possible that the history of creoles does not feature such an exceptional ‘interruption’. This is the view advocated by Mufwene (, ) and DeGraff (, , , ), and, in particular, by Aboh and DeGraff (), as we shall see. 5.3.2. The Language Bioprogram Hypothesis It has frequently been pointed out that to the extent that the process of creolization involves, to quote Sankoff and Laberge (), ‘the acquisition of native speakers by a language’, creoles may be able to tell us much about language acquisition, language change, and learnability. Pidgins and creoles are therefore of central importance to the concerns of this book. Among the best-known work in this connection is that of Derek Bickerton (, , ), who has argued that creoles give a direct insight into the language faculty. Bickerton () put forward what he called the ‘Language Bioprogram Hypothesis’, whose central idea is that creoles are acquired on the basis of a radically impoverished trigger, so impoverished that the Language Bioprogram, the innate capacity which makes language and language acquisition possible, has a more direct relationship with the final-state system than in the case of non-creole languages; in later work Bickerton () identified this with Chomsky’s conception of UG, as I shall do here. (A good overview and assessment of the Language Bioprogram Hypothesis is Veenstra .) Non-creole languages are, as we argued in the Introduction to Chapter , underdetermined by experience, but the standard assumption is that aspects of experience profoundly influence the final state of language acquisition by triggering parameter-setting, etc. On the other hand, as we mentioned above, creoles have been taken by Bickerton and many others to have the characteristic property that their history 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

features a break in the normal generation-to-generation ‘transmission of language’. The special property of creoles is that they are based on highly defective, or even absent, PLD, as this comes from pidgin. In this situation, to quote Bickerton, ‘the human linguistic capacity is stretched to the uttermost’ (Bickerton : ). To further quote Bickerton: It is debatable whether the P-input [pidgin input—IGR] is language at all . . . P-input is in no sense a reduced or simplified version of some existing language: it is a pragmatic, asyntactic mode of communication using lexical (and very occasionally grammatical) items drawn mainly, but by no means exclusively, from the politically dominant language. (Bickerton : )

At the point of creole genesis, then, a new system is effectively ‘invented’. Because of this, Bickerton argues, creoles provide a unique window on the language faculty. The substantive empirical claim in connection with the Language Bioprogram Hypothesis (and, more generally, ‘exceptionalist’ views of creoles), which can be thought of as following from the idea that ‘the human linguistic capacity is stretched to the uttermost’ in creolization, is that, while creoles represent systems that conform to UG, they in fact only show a small amount of the variation that we know from the study of non-creole languages to be available. To put it in terms of parameters, the claim is that creoles occupy only a small sub-area of the general space of possible parametric variation. (See §.. on the idea that the parameters of UG make available a multidimensional state space.) We will look at this claim more systematically in §.. on the basis of some of the evidence available in The Atlas of Pidgin and Creole Structures Online (Michaelis et al. , https://apics-online.info/; henceforth APiCS). If this claim proves to be correct, then there is indeed something special about creoles, and it would certainly be natural to attribute this to the special circumstances surrounding their origins. The evidence for the above claim comes from the morphosyntactic similarities claimed to hold among creoles that are based on different lexifier languages and are widely dispersed both geographically and historically. Bickerton (: ch. ) lists twelve such properties, and other authors (Taylor : ; Muysken : –; Romaine : –) have listed others. Holm and Patrick () systematically consider a total of ninety-seven possible shared properties across eighteen creoles from different parts of the world and with different lexifier languages (although they observe that the similarities are 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

‘limited’ (xi). It is most unlikely that such similarities are the result of historical borrowing or contact among the creoles themselves, and extremely unlikely that they are due to chance.14 Let us now look at some of the most striking morphosyntactic similarities that have been claimed for creoles, and then consider how they might be understood in terms of some of the parameters of UG that we have been dealing with in this book. The first concerns the nature of verbal inflection. Holm (: ) states that ‘[w]ith few exceptions, basilectal Atlantic creole verbs have no inflections’.15 Person– number agreement marking is entirely absent, including in creoles whose lexifier languages are inflectionally rich null-subject Romance languages such as Spanish and Portuguese. Tense, mood, and aspect (TMA) are indicated by preverbal particles, whose nature we will return to below. Mufwene (: –) shows that Kituba, a Bantu-based creole, lacks the typical Bantu pronominal prefixes on verbs indicating person and number and the agglutinating tense-aspect system. Person and number are expressed with overt pronouns, and tense-aspect by invariant particles. Verbal inflection is reduced to finite vs. non-finite marking and the passive, applicative, and causative extensions. In our discussion of the loss of V-to-T movement in the history of English in §.., we suggested verbal agreement inflection expressed the positive value of this parameter until the sixteenth century. A corollary of the absence of the person–number agreement on verbs, then, is the absence of V-to-T movement in creoles. This property is particularly striking in French-based creoles, as the evidence for V-to-T movement is particularly clear in French (see §...). The following examples from Haitian Creole illustrate the lack of V-to-T movement in this language (see Aboh and DeGraff : f.; DeGraff : –; Roberts : –): 14 The ‘monogenesis’ theory of creole origins holds that all creoles based on European languages are derived from Sabir, a Portuguese-based creole that was spoken in Africa in the fifteenth century, and which perhaps descended in turn from the older Mediterranean contact vernacular Lingua Franca (cf. Thompson ; Whinnom ; Todd ; Holm : ‒). Mufwene (: –) points out that the monogenesis theory only begs the question of universalist vs. substratist explanations for the nature of creoles, since we do not know how Sabir or Lingua Franca were formed; for a detailed and critical discussion of monogenesis, see Holm (: –, –). On Lingua Franca, see Holm (: –, : ‒). The table given in Romaine (: ) indicates one possible set of ‘monogenetic’ relations. 15 Atlantic creoles are those spoken in ‘the Caribbean area and coastal West Africa’ (Holm : ).



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

() Adverb: a. Bouqui repasse déjà le linge. Bouqui irons already the cloth ‘Bouqui is already ironing the clothes.’ b. *Bouqui déjà repasse le linge. c. *Bouki pase deja rad yo. Bouki iron already cloth the(ir) d. Bouki deja pase rad yo. Bouki already iron cloth the(ir) ‘Bouki has already ironed the(ir) clothes.’ () Negation: a. *Jean ne pas aime Marie. b. Jean n’aime pas Marie. ‘John does not love Mary.’ c. Boukinèt pa renmen Bouki. Boukinèt  love Bouki d. *Boukinèt renmen pa Bouki. Boukinèt love  Bouki ‘Boukinèt does not love Bouki.’

[French]

[Haitian]

[French]

[Haitian]

Here we see that the usual diagnostics for V-to-T movement clearly show that Haitian has the ‘English’ value for this parameter. The same is true of the Indian Ocean creole Mauritian (from Green : ): () li pa pu dir narjẽ. he neg prt say nothing ‘He won’t say anything.’

[Mauritian]

Trinidad Creole French also shows this pattern (Hancock , cited in Holm : ).16 16 Réunionnais, the French-based ‘semi-creole’ (Holm : –, ) spoken on the island of Réunion in the Indian Ocean, appears to have V-to-T movement: (i) Li mãz pa sel. [Réunionnais] he eat not salt ‘He doesn’t eat salt.’ However, Réunionnais is known to be more ‘heavily influenced’ by French than the other Indian Ocean creoles or Haitian. In fact, Réunionnais is explicitly excluded from the class of creoles by Bickerton (: ). Baker and Corne (: ) provide evidence that Réunionnais had greater superstrate contact than other French-based creoles. Another



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

A second morphosyntactic similarity has to do with word order: creoles are almost without exception SVO (Bickerton , , ; Mühlhäusler ; Muysken ). There is nothing remarkable about this in creoles based on English or Romance, but it is striking that Dutch-based creoles such as Negerhollands and Berbice Dutch are also SVO in both main and embedded clauses (Holm : ). Similarly, Rabaul Creole German (also known as Unserdeutsch), spoken in Papua New Guinea, is consistently SVO; it is also not V (Romaine : ). SVO is of course a very common order among non-creoles, but SOV is just as common, and there is a significant minority of VSO languages. So creoles as a group can be distinguished from non-creoles in that they do not show non-SVO typologies; on the other hand, SVO itself is not confined to creoles. This is a good example of how creoles occupy just part of the space of variation that is attested in language in general.17 Consider next null subjects. It seems that creoles generally do not have referential null subjects. This, combined with the general absence of verbal agreement inflection, means that they are not consistent nullsubject languages (CNSLs) in the sense described in §... Here the interesting cases are creoles derived from CNSLs such as Spanish and Portuguese. The following examples from the Spanish-based creole Papiamentu illustrate both the impossibility of omitting the subject

interesting intermediate case is Mesolectal Louisiana Creole, as discussed by Rottet (); DeGraff (); DeGraff and Dejean (); Roberts (). Cape Verdean has rather ‘low’ verb-movement, perhaps to a low Asp head of the kind introduced in §...(); see Baptista (), Roberts (a: ‒). It has been claimed that Palenquero has V-to-T movement—see DeGraff (: –) and the references given there. 17 Some Indo-Portuguese creoles show at least optional OV orders. The variety spoken in Sri Lanka, reported in I. Smith (, a,b, ) and discussed in Holm (: –), shows some OV orders, along with ‘case inflections on nouns, verbal inflections, postpositions, a phrase-final quotative particle and conditional marker, and various postposed particles’ (Holm : ); in other words, a considerable range of ‘OV’ properties. Holm further observes that all speakers of this variety are bilingual with Tamil, a language which, like the closely-related Malayalam (see Chapter , ()), has these features. He suggests that ‘it is possible that the creole’s morphology is the result of recent wholesale borrowing associated with language death’ (: ). There are other possible counterexamples to the claim made in the text. Bickerton (: ) mentions Spanish-based creoles spoken in the Philippines which have VSO order; again, according to Bickerton, all speakers of these creoles speak a verb-initial Filipino language. Romaine (: –) mentions Trader Navajo, which is VSO, along with Hiri Motu and Eskimo Trade Jargon, both SOV, but the latter is a contact jargon and the former a pidgin (Holm : –, –; it is unclear whether Trader Navajo is a pidgin or a creole). See Table . and the discussion in §...



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

pronoun (b) and the impossibility of ‘free inversion’ (c); see §.., on the relation between null subjects and free inversion. I have also indicated the corresponding grammatical Spanish sentences in parentheses (the examples are from Muysken : ): () a. E ta kome. he  eat (él está comiendo) ‘He is eating.’ b. *Ta kome. (está comiendo) ‘(S/he) is eating.’ c. *Ta kome Maria.  eat Maria (está comiendo Maria) ‘Maria is eating.’

[Papiamentu]

Nicolis (: –) provides a very detailed survey of the status of the null-subject parameter in a wide range of creoles. He observes that referential null subjects are absent in the following creoles: Kriyol (Portuguese-based, spoken in Guinea-Bissau), Saramaccan (Spanishand Portuguese-based, spoken in Suriname), and Cape Verdean (Portuguese-based, spoken in Cape Verde). Holm (: –) states that ‘[i]t is not possible to omit the subject pronoun in Principe C[reole] P[ortuguese]’ () and that ‘the pronominal systems of the other Iberianbased creoles have some points in common with it’ (), implying that unstressed subject pronouns are obligatory where null subjects can appear in the lexifier languages Spanish and Portuguese. It seems clear, then, that creoles in general do not permit referential null subjects. The situation concerning non-referential null subjects is rather more complex. All of the creoles just mentioned allow these: () a. (A) (bi-) kendi/ koto. it  hot/ cold ‘(It) was hot/cold.’ (Byrne : ) b. Tawata jobe.  rain ‘(It) was raining.’ (Kouwenberg : ) 

[Saramaccan]

[Papiamentu]

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

c. Falta puku karu maja l.18 lack little car hit him ‘A car nearly hit him.’ (Kihm : , cited in Nicolis : ) d. Sta faze kalor oji. is make hot today ‘It’s hot today.’ (Nicolis : )

[Kriyol]

[Cape Verdean]

Moreover, many creoles whose lexifiers are non-null-subject languages show the same pattern in allowing non-referential null subjects but disallowing referential ones. This is true of Berbice Dutch (Dutchbased), Haitian (French-based), Jamaican (English-based), and Mauritian (French-based), as the following examples show: () a. Te fè frèt. [Haitian]  make cold ‘It was cold.’ (DeGraff : ) b. O bi masi mεnlε dunggrə. [Berbice Dutch]  say must middle night ‘He said (it) must be midnight.’ (Kouwenberg , cited in Nicolis : ) c. (I) look like im nuh like yu. [Jamaican] () look like   like  ‘It looks like s/he does not like you.’ (Durrleman , cited in Nicolis : ) d. Posib Pyer lakaz. [Mauritian] possible Peter house ‘It is possible that Peter is at home.’ (Nicolis : )

18 Kriyol seems to differ from the other creoles mentioned here in that it does not allow ‘meteorological’ null subjects: (i) I na burfa. [Kriyol] ‘It’s drizzling.’ (Nicolis : ) See again Table ..



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

Some creoles allow arbitrary null subjects. This is true of Cape Verdean and Mauritian: () a. Na veron, ta corda sedu. In.the summer  wake early ‘In the summer one wakes up early.’ (Barbosa : ) b. Fer rom ar disik. Make rum from sugar ‘Rum is made from sugar.’ (Syea , cited in Nicolis : )

[Cape Verdean]

[Mauritian]

It is also true in Papiamentu (Muysken and Law , cited in Barbosa : ). Following Roberts (a: , note ), it seems that these creoles are ‘a particularly restrictive kind of PNSL [partial null-subject language]’. Barbosa (: ) argues that creoles instantiate a further type of null-subject language (NSL), which she terms ‘semi-pro-drop’ languages. Many creoles seem to pattern this way, irrespective of the null-subject status of their lexifier languages. A fourth property which creoles seem to share is the absence of ‘special’ complement clitics, typically occupying a preverbal position in finite clauses. Such clitics are found in nearly all the Romance languages, but appear to be systematically absent in Romance-based creoles. Holm (: ) comments that ‘object pronouns . . . always follow the verb in Romance-based creoles’. The following contrasts between French and Haitian illustrate the situation: () a. Bouqui l’aime. Bouqui -like b. Bouki renmen li. Bouki like  ‘Bouqui likes him/her/it.’ c. *Bouki li renmen.

[French] [Haitian]

Haitian is quite typical of Romance-based creoles in this respect. A fifth morphosyntactic property is the nature of the preverbal TMA particles. These are invariant, monomorphemic elements which appear in a fixed sequence preceding and adjacent to the first verb. Some particles are postverbal in some creoles (for example, the Cape Verdean anterior marker ba (Baptista : )), but the overwhelming 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

tendency is for preverbal positioning of these elements. They constitute the basic way in which TMA are expressed in creoles, and are common to all creoles. These particles have appeared in several of the examples given above, for example, the Haitian anterior marker te in (a), the Papiamentu past marker tawata in (b), and the progressive marker ta in (a). Here are some further examples from Green (: ): () a. Li pa ti kapav fer he   able do ‘He couldn’t do that.’ b. Mo te bezwẽ fε l. I  must do it ‘I had to do it.’ c. E tabata sigi bende he   sell ‘He went on selling fish.’

sa. that

[Mauritian]

[Haitian]

piska. fish

[Papiamentu]

The TMA particles cannot be fronted with the fronted verb in the ‘predicate-cleft’ construction, which involves focusing and fronting of a copy of the VP. This is illustrated by the following contrast in Papiamentu: () a. Ta [ganja] Wanchu a ganjabo.  lie John  lie-you ‘John has really lied to you.’ b.*Ta [a ganja] Wanchu a ganjabo.

[Papiamentu]

Negation (pa in both Haitian and Mauritian) must precede all TMA markers:19 () a. Jan pa t ava ale nan mache. [Haitian] Jan    go in market ‘John would not have gone to the market.’ b. Jan te (*pa) ava (*pa) ale (*pa) nan mache. Jan     go  in market Note also the Mauritian example in (a), where pa precedes the anterior marker ti. 19 This is not true in all French-based creoles. DeGraff (: ) points out that in Louisiana Creole pa follows the anterior marker te.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

The TMA markers arise through fairly typical ‘grammaticalization paths’, as the following quotation from DeGraff (: ) illustrates (see also Mufwene : –, ; Aboh and DeGraff : ‒): ‘Pral, marking future, also means “to go”; dwe, marking obligation or possibility, also means “to owe”; fini, marking completion, also means “to finish”; konnen, marking habituality, also means “to know”; sòti, marking recent past, also means “to leave”; etc.’. There is also a more detailed discussion of the origins of the Haitian TMA markers in DeGraff (: –). All the above points clearly indicate that the TMA markers should be analysed as functional elements occupying Mood, T, and Asp heads in terms of the clause structure proposed in §.... The predicate-cleft evidence suggests that they are VP-external, possibly vP-external, and their preverbal, post-subject position clearly points to positions in Mood, T, or Asp heads. The fact that they may correspond to grammaticalized main verbs in the lexifier languages, combined with the loss of V-to-T movement where the lexifier had this, suggests that these elements have a diachronic source which is comparable to that of the English modals as discussed in Lightfoot (, ); Roberts (, a); Warner (); Roberts and Roussou (); and §.. See also Mufwene (: ); Aboh and DeGraff (: ‒). The morphosyntactic properties discussed above and exemplified in ()–() illustrate some of the striking similarities among creoles. Others have also been suggested, such as serial verbs (Holm :  ff.); negative concord (Bickerton : ; Holm : –; Déprez , ); lack of inversion in direct questions (Bickerton : ; Holm : ; DeGraff : ); various complexities in copular constructions (Holm :  ff.; DeGraff ; ); overt wh-movement (Bickerton : ); and the absence of passive morphology in passive constructions (Haspelmath a: ). We will reconsider some of these claims by comparing data from APiCS and WALS in §... Properties frequently found in non-creoles are rare or unattested in creoles: basic SOV order (but cf. note ), ergative alignment, referential null subjects, etc. If this is genuinely the case, then it would appear that creoles occupy a small part of the parameter space and some kind of explanation would be required; we will look at this question too in the light of the data from APiCS and WALS in §... It is important at this stage to be clear on two points. First, it is not being claimed that creoles are all identical in their syntax: Muysken 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

(: –) observes a number of differences among serial-verb constructions in creoles, as well as pointing out (: ) that some creoles allow preposition-stranding while others do not; we have observed in note  above that there are certain differences among creoles regarding null subjects, although the generalization that fully referential null subjects are at best very rare in creoles appears to hold (see Table .); finally, there appear to be a number of detailed and intricate differences among the systems of TMA markers found in creoles, despite the general similarity in the existence of such systems in the first place (see Holm :  ff.; Muysken : ). All of these observed differences among creoles, and many others, are amply documented in APiCS (see also Holm and Patrick ). Second, it is not being claimed that creoles are synchronically exceptional in their morphosyntax. None of the properties discussed above is absent in non-creoles: NE and the MSc languages lack V-to-T movement (see §...); Icelandic may be a further case of highly restrictive PNSL, allowing only expletive and arbitrary null subjects (Barbosa :  classifies Icelandic as semi-pro-drop; see Roberts a: , note ); the North Germanic languages and English lack ‘special’ pronominal clitics, and, as we already mentioned, the English auxiliary system is rather like a system of TMA markers. What is being suggested is that there is a striking overall typological similarity among creoles, in that they appear to occupy a rather small space of the overall variation made possible by UG; this is the question we will investigate more fully in §... As mentioned, Bickerton () explained this kind of observation by invoking the Language Bioprogram. This forces grammars to take on a certain form under the extreme conditions of acquisition based on pidgin PLD. In this way, it may emerge that creoles can provide a special kind of ‘window’ onto the language faculty. This account has some appeal, but it raises a problem as it stands. The parameter-based approach to syntactic variation cannot treat some set C of languages as ‘closer’ to UG than its complement set C0 . We must maintain that creoles have exactly the same relationship to UG as any non-creole language, in that they represent a grammatical system with the values of the parameters fixed (see also Lightfoot : ; DeGraff : ; Aboh and DeGraff ). However, what we could maintain, more along the lines explored in Bickerton (, ), on the basis of the idea that the PLD is particularly deficient in the case of 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

creole acquisition, is that creoles can tell us something about the default, unmarked values of parameters. This idea is put forward in the following terms by Bickerton (: ): ‘The consistency of this typology, despite the absence of any consistent empirical model for it, argues strongly that in addition to universal principles of syntax we must assume the existence of an unmarked set of grammatical options by which those principles can be realized.’ In the discussion of markedness and parameter-setting in §. and §., I suggested that in the absence of a clear expression of the value of a given parameter, the default option is always taken. The absence of a clear expression of a value for a parameter amounts to weak p-ambiguity, in the sense defined in (c) of §., and repeated here: () A weakly p-ambiguous string expresses neither value of pi and therefore triggers neither value of pi. It is at least possible that much pidgin input may be weakly pambiguous in this sense, given what is known about the nature of pidgins (although, as Aboh and DeGraff :  point out, not a great deal is known for certain about many pidgins). In that case, as Bickerton suggests, it may be that owing to the nature of the pidgin input, creoles demonstrate a high preponderance of unmarked parameter-settings and tend to look alike as they all tend to have the same default parameter values. This view does not imply that creoles are qualitatively different from non-creoles, since unmarked parameter-settings are equally available to non-creoles. We can test the idea that creoles tend towards unmarked parametersettings by looking at the parameter hierarchies for null subjects, verbmovement and word order given in Chapter , assuming that the general tendencies in relation to these properties in creoles are as described above (see §.. for further discussion of this point). The hierarchies are repeated here as ()‒() (recall that ^ in () is the general rollup feature of Biberauer, Holmberg, and Roberts , responsible for giving rise to surface head-final order): () a. Is ^ present in the system? (Y/N) If N: rigidly, harmonically head-initial language b. If Y: is ^ generalized to all (relevant) heads? Y: rigidly, harmonically head-final language



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

c. If N: is ^ restricted to some subset of heads e.g. [+V]: German, Dutch, . . . d. If N, . . . further restrictions on the distribution of ^ ! greater disharmony. () B1: does V move out of VP in finite clauses? NO English

YES: B2 : does V move to the higher Aspect head? NO Spanish

YES: B3: does V move to the Tense head?

NO Portuguese

YES: B4: does V move to the Mood head? NO Italian

YES: French, Romanian

() A. Are null arguments allowed in the absence of agreement marking? (Macroparameter) YES: RNSLs (e.g. Japanese, Korean, Mandarin . . . ). NO: non-RNSLs (e.g. most, perhaps all, Indo-European languages). A. Are null subjects allowed without person restrictions and with associated ‘rich’ verbal-agreement inflection? (Mesoparameter) YES: CNSLs (e.g. Italian, Spanish, Greek, Latin, Gothic, European Portuguese, Old French). NO: non-CNSLs. A. Do null subjects show person restrictions and allow indefinite interpretations in the Sg? (Microparameter) YES: PNSLs (e.g. Finnish, Brazilian Portuguese, Marathi, Russian, Early Old High German, etc.). NO: NNSLs (e.g. Modern English, Modern French, Modern Swedish, Modern Norwegian, etc.). As we saw in §., the lower a parameter-setting is in a hierarchy, the more marked it is. It is clear that the creole values in the word-order and verb-movement hierarchies are the negative settings of the highest option: these can be seen as defaults. In the case of null-subject 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

hierarchy in (), however, if creoles represent a particular kind of PNSL, as we have suggested, then this is the positive value of the lowest parameter, and hence highly marked. So we see that Bickerton’s conjecture as summarized in the above quotation has some support, but this is not unequivocal. Once again, it is worth emphasizing that creoles do not seem at all exceptional from this perspective. As DeGraff () points out, non-creoles may also develop the unmarked parameter values we have observed to hold in creoles. In fact, as Roberts (: ), DeGraff (), and Aboh and DeGraff (: ‒) point out, English has undergone many changes comparable to what we observe in creoles. In fact, English has almost the same parameter values as a typical creole for the properties in ()‒(), with the exception of (), where English has the negative value for A. Moreover, assuming that Proto Germanic was a null-subject language (see §.., §..), all of these parameters have changed in the history of the language. This does not imply that English has undergone creolization; this idea is criticized at length in Thomason and Kaufman (: –), as we mentioned in §... Nor does it imply that creoles are special; it simply shows that ‘normal’ processes of change can lead to unmarked parameter values, as well as to marked ones (see §.. on this last point). This may be more common in creoles owing to the fact that PLD consisting of pidgin is relatively prone to weak p-ambiguity. (Kihm  critically assesses the idea that creoles tend towards unmarked parameter-settings, although his discussion of the problems posed by the nature of verb-movement in Portuguese and Cape Verdean have been superseded by Schifano ; see Roberts a: ‒, ‒ and §.. for discussion.) We will come back to the question of markedness in creoles when we discuss the question of creole exceptionalism (DeGraff , , , ; Aboh and DeGraff ) in §.. below. 5.3.3. The substratum/relexification hypothesis A different and widespread view of the nature of creoles, which also purports to explain the putative morphosyntactic similarities among them that we discussed in the previous section, is the substratum hypothesis. The central idea here is that the shared grammatical features of creoles derive from the language or languages originally spoken by the creole speech community, along with relexification from the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

superstratum language (see Holm’s :  discussion of the percentages of the ninety-seven features surveyed in Holm and Patrick  in Portuguese, the Portuguese-lexifier creole of Guiné-Bissau and one of its substrate languages, Balanta). Informally, one can think of this as substituting words from the lexifier languages into syntactic structures characteristic of the substratum language(s). Given the discussion of substrate effects in terms of imperfect learning in §., the idea would be that creoles arise through an extreme case of imperfect learning of the superstrate, lexifier language, such that the syntax of the substratum language is preserved to a very large extent. Such imperfect learning could be attributed to the extralinguistic situation in which pidgins are formed, which was certainly not one in which pidgin speakers attempting to learn the lexifier language as a second language would have been given any encouragement or tuition (but see Aboh and DeGraff : ‒ on the likely interaction of L and L acquisition in the history of Haitian Creole, which presents a different, more complex and less ‘exceptionalist’ picture, one we will return to in the next section). We could think that a system with the syntax of the substratum language and the lexicon of the superstratum language combine to form the PLD for subsequent generations, and in this way the substratum may be maintained for many generations. In terms of the Borer-Chomsky Conjecture, one way to think of this would be to say that the functional lexicon, with the parameter values instantiated in the formal features of functional heads, comes from the substrate language with the contentful lexicon coming from the superstrate language. Aboh and DeGraff (: ‒) show that in fact the situation regarding the interaction of the West African substrate, French superstrate and innovation (through abductive reanalysis) in Haitian Creole is more complex than this, with all three factors interacting to create the grammatical system of Modern Haitian Creole. A number of researchers have suggested that some of the features that we mentioned above, which Bickerton and others have taken as evidence for something like a Language Bioprogram, or the prevalence of unmarked parameter values in creoles owing to the circumstances of creolization, are in fact evidence of a substratum. In the Atlantic creoles, at least, this substratum is usually taken to be West African; the people who were forced into slavery in the Caribbean, Latin America, and elsewhere originated in West Africa. This view has been put forward by various researchers, and arguably goes back to Schuchardt 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

(see the discussion in Holm : –). More recently, it has been put forward by Taylor (); Boretzky (); Muysken (); Koopman (; ); Lefebvre and Lumsden (); Lefebvre (); Lumsden (); see also the discussion in Meyerhoff (). Taylor summarizes the basic ‘substratist’ position, saying ‘[w]hile African loan words are relatively few in most West Indian creoles . . . African loan constructions are both common and striking’ (Taylor : , quoted in Holm : ). One striking property is the predicate-cleft, or verb-topicalization, construction illustrated in () above. This construction does not exist in English or French (although it exists in Spanish (Vicente ), Hebrew (Landau ), Colloquial Italian, and various other languages) and is not an obvious candidate for an unmarked parametersetting, at the very least since it appears to involve an overt copy, (which, for unclear reasons appears to be a rare option; see Box . of Chapter  on some technical aspects of movement). It is, however, quite widespread in creoles: Romaine (: ) gives examples from Sranan, Krio, and Mauritian, and, according to APiCS (Feature ; Maurer et al. ), twenty-nine of seventy-two, or %, of creoles surveyed have this construction. It is also found in some languages of West Africa, as the following examples (again from Romaine : , with my parentheses; see the sources cited there) show: () a. [Mi mun] ni won mun mi. me take is they took me ‘They actually arrested me.’ b. [Hwe] na kwasi hwe ase. fall is Kwasi fell down ‘Kwasi actually fell.’

[Yoruba]

[Twi]

The Kwa languages, spoken in several West African countries (Ivory Coast, Ghana, Togo, and Benin), are often posited as potential substratum languages, at least for Atlantic creoles (Koopman , ; Lefebvre and Lumsden ; Lefebvre ; Lumsden ; Winford ; Aboh ; Aboh and DeGraff ). Among the properties of creoles that might be attributable to a West African substrate such as the Kwa languages, are SVO order, preverbal TMA particles, serial verbs, the form of comparative constructions, and postnominal determiners. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

Regarding SVO order, according to the data in WALS, twelve out of nineteen Kwa languages surveyed are SVO, with one reported as showing SVO/SOV alternations (see below) and no relevant information available on the others (see https://wals.info/languoid/genus/ kwa#/./-.). On the other hand, Bambara, another West African language which is sometimes mentioned as a possible substrate language (see Holm : ), is OV. Some of these languages show a complication to basic VO order, however, in that in compound tenses a non-pronominal object precedes the main verb but follows the auxiliary. The following examples, from DeGraff (: , and see the sources cited there; see also Aboh , ,  on Gungbe), illustrate this: ˇ lìnkún. () a. Ùn Dú mO I eat rice ‘I eat rice.’ ˇ lìnkún Dú wε. b. Ùn Dò mO I be rice eat  ‘I am eating rice.’

[Fongbe]

[Fongbe]

Moreover, Aboh (; ) analyses the Gbe languages—including Fongbe—as having systematic V-to-T movement. Haitian and the other Atlantic creoles entirely lack this kind of alternation. As DeGraff (: ) points out, this ‘is unexpected in the strict-relexification proposals’. Regarding TMA particles, there are certainly some striking similarities between creoles and Kwa and other West African languages, clearly exemplified in the table in Holm (: ). However, the data in WALS is rather equivocal on this point: five of the Kwa languages have preverbal tense-aspect particles, two have an inflectional future, one language has mixed prefixing and suffixing, one marks tense-aspect with tone and one has no tense-aspect marking at all. Yoruba has no tense marking and Bambara has suffixes. DeGraff (: ) points out that postverbal and suffixal aspectual markers have been documented in the Gbe languages (see also Aboh ; ). Once again, then, in this regard the case for substratum effects is not straightforward. In languages which allow serial verbs, more than one verb can appear in a single VP (or perhaps vP; Baker : ), as seen in the West African language Edo: 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

() Òzó ghá lè èvbàré khiẹ’n. Ozo will cook food sell ‘Ozo will cook the food and sell it.’ (Baker : )

[Edo]

Verb serialization is a widespread property in the world’s languages, being found in Chinese and other East Asian languages, and many African languages, including the Kwa languages and other West African languages. As such, it may be a good candidate for substrate influence. A common form for comparative constructions in creoles is a serialverb construction with a verb meaning ‘exceed’ following the expression of the standard of comparison and the adjective. Holm (: ) gives examples of this from Principe Creole Portuguese, Lesser Antillean (French-based), Ndjuka, and Gullah (both English-based). Here is the Ndjuka example: () A bigi pasa mi. ‘He is taller than I.’ (Hancock : , cited in Holm : )

[Ndjuka]

According to the data in WALS, this kind of comparative is very common in West Africa, being found in Yoruba, Igbo, and Hausa, for example, although no data is given concerning the Kwa languages. This, then, is another good candidate for a substratum effect, although it seems very likely that this kind of comparative construction is connected to the existence of verb serialization. Finally, a striking common property of many creoles and at least some West African languages is the presence of postnominal articles (Holm : –; Lumsden ). The following examples illustrate the similarities between the creoles Haitian and Principe and the West African languages Fongbe and Yoruba: () a. tab la table the ‘the table’ (Lumsden : ) b. bato a yo boat the  ‘the boats’ (Lumsden : )

[Haitian]



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

() a. DìDè Ò lÈ sketch the  ‘the sketches’ (Lumsden : ) b. wèmá Ò lÈ book the  ‘the books’ (Lumsden : )

[Fongbe]

() a. básta di óru sé can of gold the ‘the can of gold’ (Boretzky : , cited in Holm : ) b. owó tí nwọ fún mi náà money which they gave me the ‘the money which they gave me’ (Rowlands : , cited in Holm : )

[Principe]

[Yoruba]

Aboh and DeGraff (: ‒) show that Gungbe and Haitian also pattern very similarly regarding the relative position of adjectives, except that Haitian resembles French in having some prenominal adjectives: () a. Súrù xɔ` ɔ´ ɖàxó àwè lɔ́ lέ. Suru buy horse big two   ‘Suru bought the two big horses (in question).’ b. Jinyò achte (gwo) chwal (blan) yo. ‘Jinyo bought the big white horses in question’.

[Gungbe]

[Haitian]

As the use of ‘in question’ in the translations indicates, the nominals have a specific reading (see Aboh and DeGraff , : ‒ for further discussion). The examples in () and () show that the article comes after the noun and various adnominal complements and modifiers, such as relative clauses, but, at least in Haitian, Fongbe, and Gungbe, precedes the number marker. This looks like a further good candidate for a substrate feature, then. The available data in WALS only partially confirms this: the Kwa languages vary somewhat although there is a tendency for them to have definite articles, a strong tendency for Noun-Numeral order (found in all the languages for which there is data on this point), and some variation in plural marking between prefixes, suffixes, clitics and separate words. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

The substratum idea is most strongly supported by constructions which do not obviously represent unmarked parameter-settings, which are not cross-linguistically common, which are robustly attested both in West African languages and in creoles, and which are not found in the superstrate, lexifier languages. Of the properties just reviewed, predicate-clefting may meet these criteria, although the exact status of this feature in relation to the relevant parameter hierarchy is not clear. Postnominal articles represent a D-final system, and so FOFC predicts general head-finality in DP; the evidence of general Noun>Numeral and Noun>Adjective order, along with widespread Noun>Demonstrative order, in the Kwa languages supports this: these languages seem to represent the reverse form of disharmonic order as compared to German, with head-final nominals and head-initial clauses (see §. and §.., ()). As such, they constitute a system with word-order mesoparameters of the kind in (c), and so are intermediate in markedness. TMA particles are not very robustly attested in the Kwa languages, as we saw, and as such do not fully meet the criteria just outlined. Further, if serial verbs represent an unmarked parameter value (see the discussion of deep analyticity in vP in Huang , Huang and Roberts , and Roberts a: ‒; Roberts suggests that this represents a macroparametric option, see the hierarchy in Roberts a: ), then they may meet the criteria. It seems, then, that there is a case for a substratum explanation for at least some properties of creoles. We could therefore suppose that speakers of Kwa and other West African languages carried over the syntactic properties of those languages in their attempts to learn French, English, Dutch, Spanish, or Portuguese, for the most part simply substituting lexical items from the European languages into their native syntactic structures. This relexified Kwa would have constituted the PLD for subsequent generations, and the substratum would have thereby persisted across the generations, much along the lines of the general scenario for substratum effects that I suggested in the previous section. However, simple relexification cannot be the whole story. As already mentioned, Aboh and DeGraff (: ‒) make a strong case, on the basis of both clausal and nominal syntax, that superstrate, substrate, and innovation through abductive reanalysis interact in complex ways in history of Haitian, and there is no compelling reason to suppose that Haitian is an atypical creole. A further relevant point is that we find conflicting features in Kwa languages and creoles. One such feature is 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

adposition-complement order. Postpositions appear to be widespread in the Kwa languages, according to WALS, but creoles are typically prepositional: according to APiCS (Feature ; Huber et al. ), sixtysix out of seventy-six creoles surveyed (%) are prepositional. Arguably, then, the substratum hypothesis needs to be supplemented with an account of which features of the substrate languages are most likely to be retained, and why. Here again markedness may have a role to play. Aside from the empirical issues discussed above, we can identify two conceptual problems with the relexification approach. First, if parameters are associated with lexical entries, as we are assuming here, then it is difficult to see how the notion of relexification can be formulated. The intuitive idea that lexical items from one system are inserted into the syntactic structures of another cannot be maintained under the minimalist assumptions being adopted here. In fact, the technical approach to lexical insertion assumed in Chomsky (: ch. ) and subsequent work does not allow for lexical items to be substituted into slots created by syntax; instead, merging lexical items creates syntactic structure—this was implicit in our discussion of Merge in the Introduction. The second problem is very clearly articulated by DeGraff (: ) in the following passage, where he is discussing Lefebvre’s () claim that the creation of Haitian involved relexification of West African substrate with French vocabulary: Lefebvre must assume that the Creole creator was somehow able to segment and (re) analyze French strings and adopt and adapt a great deal of French phonetics and surface order—down to the phonetic shapes and surface distribution of many affixes and grammatical morphemes—while ignoring virtually all abstract structural properties of French. Such a feat would make the Creole creator unlike any other language learner documented in the psycholinguistics and language-acquisition literature. After all, word segmentation and word- and affix-order are reflexes of abstract morphosyntactic properties. (DeGraff : , emphasis in original)

The evidence for Very Early Parameter-Setting which we reviewed in §. bears out DeGraff ’s point: the evidence is that, if anything, parameters are set before many surface forms are in fact acquired. If this is so, then the question why the French grammatical system was not acquired along with its lexicon by the creators of Haitian becomes very acute. DeGraff (: ) also points out that this implies that creoles cannot be entirely new creations, as implied by the Language Bioprogram Hypothesis, since they are made up of lexical items from the superstrate 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

language which presumably retain some of the associated abstract morphosyntactic properties. The idea that the lexical items of an alien system come associated with certain parameters is also supported by our discussion of PEI French and Late ME in §... There we saw that abstract syntactic features such as preposition-stranding or the ability to license a particular structural Case (a particular property of v*) can be introduced into a system along with borrowed lexical items. A simple notion of ‘relexification’ predicts that all syntactic properties of the superstrate language would survive, and it is certainly clear that this is not the case. As we have seen, substrate and superstrate influences are both found in creoles and they interact. The question is whether what has happened in the history of creoles is in any profound sense distinct from the diachronic processes at work in the history of non-creoles. This is the subject of the next subsection. 5.3.4. Conclusion: how ‘exceptional’ are creoles? In conclusion, the case that creoles typically favour unmarked values of parameters, as discussed by Bickerton (, : , : ff.), is an interesting idea which has a certain amount of evidence in its favour but the evidence is far from equivocal and does not really support an ‘exceptionalist’ view of creoles, as we have seen. At the same time, there is some evidence for substrate effects from West African languages, although this is not always straightforward. Some features which seem to be widespread in creoles could perhaps be equally well accounted for either way; this may be true for SVO order, TMA markers, or serial verbs, for example. But in fact, as Mufwene (: –) pointed out, there is no real contradiction between the ‘universalist’, markednessbased view of creoles and a substratum account of the origin of at least some common traits: ‘most of the features of pidgins and creoles that the substrate hypothesis has been claimed to explain are not really accounted for unless some universal principles are accepted to apply at some stage in the formation of these languages’. We also saw that Aboh and DeGraff () stress the interaction of substratum and superstratum effects with innovation through abductive reanalysis. It is clear that there is nothing about the theory of markedness that would prevent the Kwa languages, or any other group, having a number of parameters set to unmarked values: indeed at least VO order and serial verbs represent unmarked parameter-settings on our assumptions. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

These parameter-settings would naturally emerge in the creoles, either through the PLD or by default. More marked parameter values may not be sufficiently triggered by the PLD. As MacMahon (: ) puts it, ‘[w]hen structures in different substrates coincide, these will be especially likely to be introduced into the creole; the bioprogram is here seen as a last-resort explanation, to be invoked when the relevant substrate structures conflict or no evidence for a particular structure is available’. In other words, markedness considerations become crucial when the PLD is either strongly or weakly p-ambiguous. As we have seen, this is a situation we encounter in many cases of syntactic change, not just creolization. This brings us to a final point on the topic of creoles: the question of creole exceptionalism. As I stated at the beginning of this section, much of the interest in pidgins and creoles has been stimulated by the idea that these systems can tell us something special about the nature of language change, language acquisition, or the language faculty. This is because the process of creolization has been thought to represent a break in the usual generation-to-generation transmission of language. DeGraff (, , a,b, ) and Aboh and DeGraff () have argued against ‘creole exceptionalism’ of this kind. Aboh and DeGraff () summarize this position, and argue in favour of what they call a null theory of creole formation. Their view is that creoles have emerged through normal processes of language change, a view they support by pointing out that observed changes in the history of French, the Scandinavian languages, and, in particular, English, have yielded in many cases similar results to what we observe in many creoles. (We observed above that English has changed most of the parameters discussed above in connection with creoles, and in the same direction, in the course of its history.) As DeGraff (a: ) says, ‘H[aitian]-C[reole] morphosyntax does not, and could not, isolate HC and its diachrony in some exclusively “Creole” empirical domain’. Similarly, DeGraff (b: ) says that ‘[i]n recent work, the joint investigation of language contact, language change and language acquisition, suggests that there is not, and could not be, any deep theoretical divide between the outcome of language change and that of Creole formation’. Aboh and DeGraff () argue that creole studies have since their inception stressed the alleged exceptional nature of creoles for dubious reasons. They begin by highlighting ‘the various biases that these 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

definitions [of creoles, IGR] may have introduced into linguistics from the start’ (), suggesting that ‘[m]any of these biases go against the spirit of Universal Grammar’ (). They further point out, citing Mufwene (: ‒) that ‘creolization should [be] taken as a socio-historical, and not a linguistic, concept’ (). They observe, as we did above, that the ‘belief that Creole languages manifest the most extreme structural simplicity is often related to, among other things, an alleged ‘break in transmission’ due to the emergence of a structurally reduced Pidgin spoken as a lingua franca, immediately prior to their formation’ (). They argue strongly against the claim ‘that Creole languages as a class do not manifest complexity levels that surpass that of older, non-Creole languages’ (), demonstrating that Haitian nominals and relative clauses have a range of complex and intricate syntactic and semantic properties ‘that are not attested in contemporary dialects of French and Gbe’ () (the respective superstrate and substrate languages for Haitian) (see also Aboh and DeGraff ; Aboh : ‒). They also point out that there is no actual evidence for a pre-Haitian Pidgin: ‘there isn’t, to the best of our knowledge, any documentation of any such pre-Creole Pidgin in the history of the colonial Caribbean’ (). Furthermore, the alleged prevalence of certain unmarked, simple features in creoles to the exclusion of non-creoles is not empirically supported: see below for more on this point. As already mentioned, they observe that English has undergone a number of changes in its recorded history comparable to those that have occurred in the development of Haitian from French (see also DeGraff : ‒), and point out that a highly analytic language such as Vietnamese ‘seems more “Pidgin”-like than the earliest documented varieties of Creole in th-century Haiti’ (). More generally, ‘whatever evidence we have about the earliest documented varieties of HC [Haitian Creole, IGR] contradicts claims about a drastically reduced Pidgin as an essential ingredient in Creole formation’ (). This claim holds more generally than just for Haitian: ‘[o]nce Creoles are analyzed holistically . . . it becomes doubtful that there ever was a structureless Pidgin in their history—especially one so reduced that it would have massively blocked the transmission of features from the languages in contact into the emergent Creole’ (; see also Winford :  on Atlantic creoles generally). Moreover, as mentioned above, Aboh and DeGraff argue against the idea, discussed in the previous two sections, that creoles uniformly 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

manifest typological restrictions, documenting tense inflection in Kenya Pidgin Swahili, Pidgin Ojibwe, Taymir Pidgin Russian, and Fanakalo; subject-agreement inflection in Pidgin Ojibwe, Central Hiri Motu Pidgin, Arafundi-Enga Pidgin, and Lingala; evidential markers in Chinese Pidgin Russian; noun-class markers in Fanagalo, Kitúba, and Lingala; tone in Nubi Arabic and Lingala; and both SOV and, strikingly, OSV order in Ndjuka Trio Pidgin. They argue that sampling biases lie behind ‘claims that try to isolate Creole languages into one small corner of linguistic typology with grammars that fit a narrowly defined uniform structural template that, in turn, is placed at the bottom of some arbitrarily defined hierarchy of complexity’ (). In particular, they criticise Parkvall’s () study of the  languages, including thirtyfour pidgins and creoles, based on features from WALS. They point out that ‘[m]ost of the Creoles in this study are historically related to typologically similar European lexifiers (typically Germanic or Romance) with relatively little affixal morphology and with few crosslinguistically rare features, and to African substrates (mostly NigerCongo) that fall in narrow bands of typological variation as well’ (); in fact, we saw in the previous section that the Kwa languages are somewhat typologically uniform. More generally, ‘these Creole languages are in the set-union of Germanic, Romance, and Niger-Congo and have relatively similar profiles—within a relatively small window of typological variation’ (). Hence, for Aboh and DeGraff, the fact that Creoles appear to occupy a small part of the available variation in UG is an accident of the substrate and superstrate languages they have historically developed from. Again, there is nothing more exceptional here than anything we know from the history of English. In fact, as DeGraff (: f.) observes, in some respects (e.g. a highly mixed lexicon), English ‘out-creoles’ Haitian and other creoles. They go on to argue at length that there is little or nothing in the diachronic processes that caused French to develop from Latin that is distinct from the processes at work in the development of Haitian from French. In particular, they point out that one could postulate a ‘discontinuity’ in the development of French from Latin, but ‘French is a genetic descendant of Latin even if it belongs to ‘a typological class that is quite remote from the structural type represented by Latin’ (Meillet : )’ (Aboh and DeGraff : ). Latin appears to instantiate the main typological features common to the older Indo-European languages (see the discussion of the possible reconstruction of aspects 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

of PIE syntax in §..), and we have mentioned several times that the Romance languages as a group have diverged from Latin in largely convergent typological directions. They conclude that ‘Caribbean Creoles duly fall in the scope of the comparative method as languages genealogically affiliated with their ancestors, notwithstanding the documentation of quasi-Sprachbund phenomena due to the pervasive influence of overlapping substrate and superstrate languages such as Germanic, Romance and subsets of Niger-Congo languages’ (). They go on to elaborate their account of L and L interaction in creating the PLD for successive generations of creole acquirers; as we mentioned, this entails an intricate interaction of (apparent) substrate and superstrate effects and innovations through abductive reanalysis of the kind described in §.. If Aboh and DeGraff () are correct, there is little or nothing that is ‘unusual’ about creoles, except perhaps the fact that the PLD may have been particularly unstable for several generations owing to the constant arrival of new L speakers in the creole speech communities. They suggest that there is an L‒L ‘recursive cascade’, ‘where the utterances produced by both L and L learners feed into the PLD for subsequent L and L learners, and then the latter’s utterances in turn feed the PLD of newly born L learners and newly arrived L learners, and so on’ (). This comes fairly close to an iteration of the indirect-contact scenario in (b). ‘Grammatical inventions’, to use a term introduced in Rizzi (), arise through the usual process of creation of a grammar by language acquirers on the basis of whatever PLD is available, but are perhaps more likely to arise when the PLD is unstable due to the presence of successive Alien Corpora over successive generations. If Aboh and DeGraff () are right, then creole genesis may not be qualitatively different from the cases of imperfect learning discussed in §.. However, even this approach begs the question of how much unstable PLD is usual, and this is a question that simply cannot be answered in our current state of ignorance regarding the relation between (first- or second-) language acquisition and language change (see §. for discussion in relation to L acquisition). Lightfoot (: ff.) reaches a view on creolization largely compatible with DeGraff ’s views, while dissenting views are expressed in Bickerton () and McWhorter (). In particular, McWhorter (: ‒) argues that four morphosyntactic properties common to 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

creoles support the postulation of pidgin origin and therefore a break in transmission of a kind that justifies seeing this class of languages as exceptional: copula omission, generalization of the infinitive form of verbs, loss of case distinctions on pronouns and preverbal negation. Two of these claims can be evaluated on the basis of data from APiCS. Three features in APiCS are relevant to the claim about copulas: Feature  (Predicative Noun Phrases; Michealis et al. b), Feature  (Predicative Adjectives; Michaelis et al. c), and Feature  (Predicative Locative phrases; Michaelis et al. d). Here we see that the copula is absent in twenty-two of seventy-six creoles (%) with predicative NPs; in thirty-four of seventy-six creoles (%) with predicative APs and in ten of seventy-six creoles (%) with predicative locatives. Furthermore, Feature  can be compared to its counterpart in WALS (Stassen ), which reports that  of  languages (%) have null copulas with predicative NPs. So this property does not support McWhorter’s claim: zero copulas are not a general property of creoles and, at least with predicative NPs, are more frequent in noncreoles. The other feature which can be investigated is the position of negation. Here McWhorter’s claim is on much firmer ground: sixty-five of seventy-six creoles (%) show preverbal negation (Feature , Position of Standard Negation; Haspelmath et al. ). According to WALS (Feature A, Order of Negative Morpheme and Verb; Dryer d),  of the , languages investigated (%) show NegV order. This is the commonest order of negation and verb, but of course it is still significantly less frequent than preverbal negation in creoles. It seems, then, that McWhorter’s evidence is inconclusive. Is there some way to test the claim that creoles occupy a limited typological space? One way to do this is compare the distributions of certain syntactic properties (and hence the associated parameter values) between creoles and non-creoles. As the discussion of negation in the previous paragraph suggests, this can be done by comparing the data in APiCS with that in WALS. APiCS in fact makes available, where possible, a direct comparison with WALS, and so this may be potentially revealing. Table . presents the results of such a comparison of the features relevant to word order and null subjects. What we see here is a clear preference for head-initial orders, particularly in the clause (VO and Prepositions), in creoles as compared to non-creoles. But at the same time this is only a preference; a greater proportion of creoles show head-initial orders as compared to 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

Table 5.1. Comparison of word order (head-initiality) and null subjects in creoles and noncreoles (APiCS and WALS) Feature No.

APiCS Number

Feature description

APiCS %

WALS Number

WALS %

APiCS ; WALS A

SVO order/total

/



/



APiCS ; WALS A

Prepositions/total

/



/



APiCS ; WALS A

Noun Genitive/total

/



/



APiCS ; WALS A

Noun Adjective/total

/



/



APiCS ; WALS A

Dem Noun/total

/



/



APiCS ; WALS A

Num Noun/total

/



/



APiCS ; WALS A

NRel/total

/



/



APiCS ; WALS A

DegreeMod Adjective/total

/



/



APiCS ; WALS A

Obligatory subject pronoun

/



/



non-creoles. This can be attributed to the fact that the majority of substrate (e.g. Niger-Congo) and superstrate (e.g. Romance and English) languages are head-initial, perhaps combined with a slight markedness preference for head-initiality (see §..). The situation regarding null subjects is different: here we see a significant difference between creoles and non-creoles in the incidence of non-null-subject systems (% vs. %). Given that non-null-subject languages (NNSLs) represent a fairly marked option in the null-argument hierarchy (see () and §..), this cannot be attributed to a markedness preference. Instead, it is most plausibly attributed to the loss of verbal inflection, which as we have seen is a very common feature of creoles. However, although the null-subject evidence is an instance of a typological skewing in creoles as compared to non-creoles, it does not argue for a break in transmission. In both §.. and §.. we commented on the fact that the development from CNSL to PNSL to NNSL is a natural one in non-creoles related to the erosion of agreement (see also the discussion of Brazilian Portuguese in §..); what we see here in creoles could be exactly the same development, in the case where the lexifier is a CNSL. I will return to this point in the discussion of creoles as PNSLs below. Table . compares some other features documented in both APiCS and WALS. 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

Table 5.2. Comparison of other features in APiCS and WALS Feature number

Feature description

APiCS Numbers

WALS APiCS % Numbers

WALS %

APiCS ; WALS A

Inversion in YNQ/total

/

%

%

/

APiCS ; WALS A

YN Particle/total

/

%

/

%

APiCS ; WALS A

Negative concord/total

/

%

/

%

APiCS ; WALS A

Wh-movement/total

/

%

/

%

Here we see that inversion in yes-no questions is very rare in both creoles and non-creoles, but in fact rarer in non-creoles (being, with the single exception of the Austronesian language Palauan, entirely limited to Europe; see Dryer e). Similarly, question particles are common in creoles, but commoner in non-creoles. Furthermore, negative concord is equally prevalent in both creoles and non-creoles.20 The incidence of overt wh-movement differs however, being quite a bit (%) commoner in creoles than in non-creoles (%). This may be related to the fact that overt wh-movement may be commoner in VO languages than in OV languages (Bach’s Generalisation; see (, ) in §..) and hence indirectly to the tendency to head-initiality in creoles. Table . complements the first two tables by giving the incidence of allegedly rare features in creoles as compared to non-creoles. Table 5.3. Comparison of properties alleged to be rare/non-existent in creoles (APiCS vs WALS) Feature Number

Feature description

APiCS numbers

APiCS %

WALS numbers

WALS %

APiCS ; WALS A

SOV/total

/



/

%

APiCS ; WALS A

Ergativity (NPs)/total

/



/

%

APiCS ; WALS A

Agreement-licensed NS/total

/

%

/

%

Here we see that SOV creoles, ergative creoles and CNSL creoles all exist, but that in every case their incidence is much lower than in noncreoles. So, the tendency to head-initiality in creoles is confirmed, along 20 This is not consistent with the results of van der Auwera and van Alsenoy (), mentioned in §.., for reasons that are unclear but probably a consequence of the different language samples used in each case.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

CREOLES AND CREOLIZATION

with a strong tendency for accusative alignment and a lack of consistent null-subject systems.21 Finally, Table . presents data relating to other syntactic properties that we have discussed in this section, for which no direct comparison with non-creole data from WALS is available. Table 5.4. Incidence of word-order and null-subject properties in APiCS not surveyed in WALS APiCS Feature No.

Feature description

APiCS Number

APiCS %



Definite Art > N order

/





Indefinite Art > N order

/





AdvVO order/total

/





Null subject of rain

/





Null subject of seem

/





Null subject of existential

/





Preverbal TMA markers

/



The evidence here confirms the prevalence of preverbal TMA systems in creoles, as well as a strong tendency to Article-Noun order (especially for indefinite articles). Two further very interesting and relevant points are the prevalence of Adverb-VO order (%) and the strong preference for at least one kind of expletive null subject, in existentials (%). The first of these observations confirms the tendency for lack of verb-movement, which we can, as in the case of the history of Faroese, the other MSc languages and English (see §... and §..), take to be connected to the loss of verbal inflection. The second confirms, at least in part, a tendency for creoles to be PNSLs, although the evidence concerning two other kinds of expletive null subject here does not confirm this. What conclusions can we draw from this evidence concerning the typological status of creoles? We see that there is a clear tendency for creoles to show certain typological properties (head-initial order, lack of referential null subjects, lack of verb-movement, overt whmovement, negative concord, etc.), but this is just a more or less strong 21 In fact, the single ergative ‘creole’ is Gurindji Kriol, a mixed language derived from Gurindji (Pama-Nyungan) and Kriol (an English-lexifier creole). It may therefore not count as a true creole on certain definitions. For a structural sketch of Gurindji Kriol see Meakins ().



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

tendency. Markedness factors do not seem to play a very major role, particularly as regards null subjects. The typological evidence alone does not provide any evidence for a break in transmission in the history of the creoles; the one strikingly different property from non-creoles, a tendency to lack referential null subjects, can be accounted for in exactly the same way as for the history of French, Brazilian Portuguese and the Germanic family as a whole in §., by loss of verbal inflection. So, to the extent that the data in these tables tells us anything at all about the history of creoles, it provides support for Aboh and DeGraff ’s position that creoles ‘are historically related to typologically similar European lexifiers (typically Germanic or Romance) with relatively little affixal morphology and with few cross-linguistically rare features, and to African substrates (mostly Niger-Congo) that fall in narrow bands of typological variation as well’ (: ). When, as in the case of clausal head-initial word order, substrate, superstrate and (weakly) markedness preferences all ‘push the same way’, it is hardly surprising, and rather uninformative, that the creoles tend to be head-initial. As mentioned, the evidence for a development, in the case of creoles with Ibero-Romance lexifiers at least, from CNSL to PNSL and, at least in some cases, to NNSL, parallels the diachronic developments we have observed across Germanic and Romance. So here there are really no purely linguistic grounds for exceptionalism. Indeed, Occam’s Razor would favour not positing a break in transmission, since none is required to explain this development in the history of creoles any more than in the history of Germanic (on which see §..). A further, more speculative, argument could be made: as we saw in relation to the null-argument parameter hierarchy in (), the development from CNSL to PNSL involves an increase in markedness (see the discussion in §.. on how such changes may come about). This, as we have seen, is the result of a very common diachronic process of loss of inflection. However, one could speculate that had there been a break in transmission and a phase in which the original grammatical system was entirely eradicated and a new one built purely on the basis of maximally unmarked parameter values across the board, then we might expect the loss of verbal inflection to have created PLD which would have favoured RNSL creoles, since RNSLs are the least marked form of null-argument system (and are indeed cross-linguistically quite common among non-creoles). Yet creoles clearly do not, as a class, show RNSL properties (although there may be some among the 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

LANGUAGE CREATION IN NICARAGUA

minority which allow referential null subjects). All else equal, including independently motivated markedness assumptions, this is what we would expect from a break in transmission. The fact that we do not find this, but instead find developments in Romance- and Germaniclexifier creoles which parallel the developments in many non-creoles (including all the Germanic languages, with the possible exception of Gothic, see §..), may be taken as an argument against a break in transmission and hence against exceptionalism. To conclude, I would like to quote my own comment in Roberts (: ) on the allegedly exceptional nature of creoles and what this might be able to tell us about UG and language acquisition: What gives us a privileged view of UG, and of the nature of the parameter-setting algorithm, is not creoles but language change. Creoles are particularly interesting in that they represent an extreme of language change, but it is the mechanisms of language change, which are ubiquitous in the history of every language and every language family, that have made creoles what they are.

To this I would only add that the ‘extreme’ aspect of language change in the case of creoles may in fact have more to do with the extreme social conditions under which these varieties were formed than with any intrinsic aspect of the linguistic mechanisms of change; this would be in line with Aboh and DeGraff ’s () conclusions as summarized above. The parameter-setting device may have functioned quite normally, but under abnormal external conditions, in this case.

5.4. Language creation in Nicaragua In this section, I want to turn to what might really be an exceptional case: language creation in Nicaragua. I will summarize some very interesting and important research on sign language in Nicaragua, reported in KSC. (See also Senghas a,b; Kegl .) This appears to be a case of language creation in that a new signed language has emerged where none existed before. If they are correct, KSC’s work has implications for language-acquisition theory, the study of pidgins and creoles, and for the study of language change under the general assumptions I have been arguing for here: the creation of a language is in a sense an extreme case, perhaps the extreme case, of the reanalysis of PLD that underlies change, and so this case may have something to 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

tell us about language change in general (see also Guasti : ‒ on the importance of Nicaraguan Sign Language for language acquisition, and Guasti’s :  discussion of Goldin-Meadow and Mylander ). Before embarking on the discussion of signed languages in Nicaragua, a word or two regarding sign languages in general is perhaps in order. It is now an accepted result of modern linguistics that sign languages as used by Deaf communities22 in many parts of the world are true languages in every sense (see Goldin-Meadow : –). They differ from spoken languages only in the modality of transmission: gestural/visual as opposed to oral/aural. The signed languages which have been studied show all the structural features of spoken languages, including notably a syntax which has all the hallmarks of a generative system, being discrete, algorithmic, recursive, and purely formal. Signed languages also have a phonology, in that signs, which have a meaning, are made up of smaller units which themselves lack meaning, just as words (or morphemes) in spoken languages are made up of phonemes. Phonological processes such as assimilation have been observed, and phonological units such as the syllable proposed (see Sandler and Lillo-Martin : – for a summary of the evidence for sign-language phonology). Signed languages also show the full range of semantic properties of spoken language. To quote Sandler and Lillo-Martin (: ), sign languages: are natural languages, in the sense that they are not consciously invented by anyone, but, rather, develop spontaneously wherever deaf people have the opportunity to congregate and communicate regularly with each other. Sign languages are not derived from spoken languages; they have their own independent vocabularies and their own grammatical structures. Although there do exist contrived sign systems that are based on spoken languages . . . such systems are not natural languages.

The emergence of a new sign language, then, is an instance of the emergence of a new natural language, and as such of great interest. Let us now look at KSC’s documentation of this remarkable event. These developments are also summarized and discussed in Lightfoot (: ff.).

22 I follow the standard practice in work on sign languages in using the capitalized term ‘Deaf ’ to refer to Deaf communities and their members, and the non-capitalized ‘deaf ’ to refer to hearing loss.



OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

..

LANGUAGE CREATION IN NICARAGUA

KSC look at the signing of a community of approximately  Deaf children and young adults in Managua, Nicaragua. Their claim is that the sign language this community uses for internal communication has come into existence since approximately . Writing in the mids, they state that this ‘newly emergent language has been in existence for barely more than a decade’ (). They therefore have ‘one of the first documented cases of the birth of a natural human language’ (). The background to this situation lies in the social and political situation of Nicaragua. Until the Sandinista revolution in , Nicaragua was ruled by a dictatorship which made care for the disadvantaged and education for the majority of the population a low priority. Accordingly, there were no schools or any kind of social or welfare provision for the Deaf. Furthermore, there was considerable social stigma attached to deafness. For these reasons, there was no Deaf community in pre-Sandinista Nicaragua, and correspondingly no sign language. In , the Sandinista government set up a number of special schools as part of its general campaign to provide education for the population as a whole. In Managua, about  Deaf children came together over a few years in the largest of these schools. The children were of varied ages (see below for relevant details). In the schools, teachers attempted to teach the children Spanish by using fingerspelling. This is an example of a ‘contrived sign system[s] . . . based on spoken languages’ mentioned by Sandler and Lillo-Martin in the quotation above. The Deaf children communicated spontaneously with one another using homesigns, and soon developed a kind of pidgin. This in turn developed into a fully-fledged language, arguably a creolized version of the earlier pidgin, used to begin with only by children under  years old. KSC argue that this creole is a new language that was created by the under-s at this school, beginning around . KSC distinguish four types of signing system used in the Deaf community at the school in Managua. First, homesigns, or mimicas in Spanish and for the signers themselves. These are ad-hoc signs invented by isolated individuals: ‘idiosyncratic gestural systems . . . used by isolated deaf individuals’ (). KSC say ‘[e]ach isolate’s homesign system is unique, idiosyncratic, variable even within the individual, and lacking most characteristics, particularly syntactic, of what we would recognize as a full-fledged human language’ (–). Homesigns are still in use among the Deaf in Nicaragua, although nowadays they are 

OUP CORRECTED PROOF – FINAL, 8/10/2021, SPi

CONTACT, CREOLES, AND CHANGE

used only by older signers who made contact with the Deaf community after they had passed the critical period for language acquisition, and hence are unable to acquire sign language.23 Most important for present purposes, KSC distinguish Lenguaje de Señas Nicaragüense (LSN) from Idioma de Señas Nicaragüense (ISN).24 LSN is ‘a highly variable and ever-changing form of communication that developed from the point when the[se] homesigners came together’ (), while ISN is ‘a coexisting . . . fully articulated signed language form that is in use only by individuals who entered the schools at ages well below the end of what would count as their critical period for language acquisition’ (). We will look in more detail at some of the structural differences between LSN and ISN below, as well as the age profiles of ISN users. Finally, KSC distinguish a fourth variety: El Pidgin de Señas Nicaragüense (PSN). This is ‘a pidgin used between hearing individuals and deaf signers’ () which ‘characteristically involves the interspersing, sometimes overlapping, of Spanish and Nicaraguan signs’ (–). Interestingly, Deaf signers using PSN consider themselves to be speaking Spanish, while Spanish speakers consider themselves to be signing. I will have very little to say about PSN here, as it clearly appears to be a pidgin or contact vernacular. Instead, I will concentrate on LSN and ISN, as it is here that the evidence for language creation lies. KSC show that there are a number of important structural differences between LSN and ISN. First, there is a general difference in the nature of the signing gestures and the signing space (the area in front of the signer’s upper body in which signs are made). In LSN ‘signs are large and tend to be symmetrical’, while in ISN there is a ‘smaller signing space . . . The use of the two hands is more asymmetric’ (–). There are also quantifiable differences in fluency and rate of information conveyed. (LSN signers averaged twenty-four events/ minute in the study in Senghas et al. (), while ISN signers averaged forty-six, p