The Oxford Handbook of Universal Grammar [Illustrated] 0199573778, 9780199573776

This handbook provides a critical guide to the most central proposition in modern linguistics: the notion, generally known as Universal Grammar …


English, 674 [672] pages, 2016


Table of contents:
Cover
Preface
Contents
List of Figures and Tables
List of Abbreviations
The Contributors
1. Introduction
PART I PHILOSOPHICAL BACKGROUND
2. Universal Grammar and Philosophy of Mind
3. Universal Grammar and Philosophy of Language
4. On the History of Universal Grammar
PART II LINGUISTIC THEORY
5. The Concept of Explanatory Adequacy
6. Third- Factor Explanations and Universal Grammar
7. Formal and Functional Explanation
8. Phonology in Universal Grammar
9. Semantics in Universal Grammar
PART III LANGUAGE ACQUISITION
10. The Argument from the Poverty of the Stimulus
11. Learnability
12. First Language Acquisition
13. The Role of Universal Grammar in Nonnative Language Acquisition
PART IV COMPARATIVE SYNTAX
14. Principles and Parameters of Universal Grammar
15. Linguistic Typology
16. Parameter Theory and Parametric Comparison
PART V WIDER ISSUES
17. A Null Theory of Creole Formation Based on Universal Grammar
18. Language Change
19. Language Pathology
20. The Syntax of Sign Language and Universal Grammar
21. Looking for UG in Animals: A Case Study in Phonology
References
Index of Authors
Index of Subjects




The Oxford Handbook of

UNIVERSAL GRAMMAR



OXFORD HANDBOOKS IN LINGUISTICS

Recently Published

THE OXFORD HANDBOOK OF CORPUS PHONOLOGY Edited by Jacques Durand, Ulrike Gut, and Gjert Kristoffersen

THE OXFORD HANDBOOK OF DERIVATIONAL MORPHOLOGY Edited by Rochelle Lieber and Pavol Štekauer

THE OXFORD HANDBOOK OF HISTORICAL PHONOLOGY Edited by Patrick Honeybone and Joseph Salmons

THE OXFORD HANDBOOK OF LINGUISTIC ANALYSIS Second Edition Edited by Bernd Heine and Heiko Narrog

THE OXFORD HANDBOOK OF THE WORD Edited by John R. Taylor

THE OXFORD HANDBOOK OF INFLECTION Edited by Matthew Baerman

THE OXFORD HANDBOOK OF DEVELOPMENTAL LINGUISTICS Edited by Jeffrey Lidz, William Snyder, and Joe Pater

THE OXFORD HANDBOOK OF LEXICOGRAPHY Edited by Philip Durkin

THE OXFORD HANDBOOK OF NAMES AND NAMING Edited by Carole Hough

THE OXFORD HANDBOOK OF INFORMATION STRUCTURE Edited by Caroline Féry and Shinichiro Ishihara

THE OXFORD HANDBOOK OF MODALITY AND MOOD Edited by Jan Nuyts and Johan van der Auwera

THE OXFORD HANDBOOK OF LANGUAGE AND LAW Edited by Peter M. Tiersma and Lawrence M. Solan

THE OXFORD HANDBOOK OF ERGATIVITY Edited by Jessica Coon, Diane Massam, and Lisa Travis

THE OXFORD HANDBOOK OF PRAGMATICS Edited by Yan Huang

THE OXFORD HANDBOOK OF UNIVERSAL GRAMMAR Edited by Ian Roberts

For a complete list of Oxford Handbooks in Linguistics please see pp 649–650.



The Oxford Handbook of

UNIVERSAL GRAMMAR

Edited by

IAN ROBERTS




Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.

© editorial matter and organization Ian Roberts 2017
© the chapters their several authors 2017

The moral rights of the authors have been asserted

First Edition published in 2017
Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2016944780

ISBN 978–0–19–957377–6

Printed in Great Britain by Clays Ltd, St Ives plc

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.



This book is dedicated to Noam Chomsky, il maestro di color che sanno.





Preface

The idea for this volume was first proposed to me by John Davey several years ago. Its production was delayed for various reasons, not least that I was chairman of the Faculty of Modern and Medieval Languages in Cambridge for the period 2011–2015. I'd like to thank the authors for their patience and, above all, for the excellence of their contributions. I'd also like to thank Julia Steer of Oxford University Press for her patience and assistance. Finally, thanks to the University of Cambridge for giving me sabbatical leave from October 2015, which has allowed me to finalize the volume, and a special thanks to Bob Berwick for sending me a copy of Berwick and Chomsky (2016) at just the right moment.

Oxford, February 2016





Contents

List of Figures and Tables
List of Abbreviations
The Contributors

1. Introduction
   Ian Roberts

PART I PHILOSOPHICAL BACKGROUND

2. Universal Grammar and Philosophy of Mind
   Wolfram Hinzen

3. Universal Grammar and Philosophy of Language
   Peter Ludlow

4. On the History of Universal Grammar
   James McGilvray

PART II LINGUISTIC THEORY

5. The Concept of Explanatory Adequacy
   Luigi Rizzi

6. Third-Factor Explanations and Universal Grammar
   Terje Lohndal and Juan Uriagereka

7. Formal and Functional Explanation
   Frederick J. Newmeyer

8. Phonology in Universal Grammar
   Brett Miller, Neil Myler, and Bert Vaux

9. Semantics in Universal Grammar
   George Tsoulas

PART III LANGUAGE ACQUISITION

10. The Argument from the Poverty of the Stimulus
    Howard Lasnik and Jeffrey L. Lidz

11. Learnability
    Janet Dean Fodor and William G. Sakas

12. First Language Acquisition
    Maria Teresa Guasti

13. The Role of Universal Grammar in Nonnative Language Acquisition
    Bonnie D. Schwartz and Rex A. Sprouse

PART IV COMPARATIVE SYNTAX

14. Principles and Parameters of Universal Grammar
    C.-T. James Huang and Ian Roberts

15. Linguistic Typology
    Anders Holmberg

16. Parameter Theory and Parametric Comparison
    Cristina Guardiano and Giuseppe Longobardi

PART V WIDER ISSUES

17. A Null Theory of Creole Formation Based on Universal Grammar
    Enoch O. Aboh and Michel DeGraff

18. Language Change
    Eric Fuß

19. Language Pathology
    Ianthi Maria Tsimpli, Maria Kambanaros, and Kleanthes K. Grohmann

20. The Syntax of Sign Language and Universal Grammar
    Carlo Cecchetto

21. Looking for UG in Animals: A Case Study in Phonology
    Bridget D. Samuels, Marc D. Hauser, and Cedric Boeckx

References
Index of Authors
Index of Subjects



List of Figures and Tables

Figures

10.1 The red balls are a subset of the balls. If one is anaphoric to ball, it would be mysterious if all of the referents of the NPs containing one were red balls. Learners should thus conclude that one is anaphoric to red ball.

10.2 Syntactic predictions of the alternative hypotheses. A learner biased to choose the smallest subset consistent with the data should favor the one = N0 hypothesis.

16.1 A sample from Longobardi et al. (2015), showing 20 parameters and 40 languages. The full version of the table, showing all 75 parameters, can be found at www.oup.co.uk/companion/roberts.

16.2 Sample parametric distances (from Longobardi et al. 2015) between the 40 languages shown in Figure 16.1. The full version of the figure showing all parametric distances can be found at www.oup.co.uk/companion/roberts.

16.3 A Kernel density plot showing the distribution of observed distances from Figure 16.2 (grey curve), observed distances between cross-family pairs from Figure 16.2 (dashed-line curve), and randomly-generated distances (black curve). From Longobardi et al. (2015).

16.4 KITSCH tree generated from the parametric distances of Figure 16.2.

19.1 Impairment of communication for speech, language, and pragmatics.

19.2 A minimalist architecture of the grammar.

Tables

13.1 Referential vs. bound interpretations for null vs. overt pronouns in Kanno (1997).

14.1 Summary of values of parameters discussed in this section for English, Chinese, and Japanese.

19.1 Classification of aphasias based on fluency, language understanding, naming, repetition abilities, and lesion site (BA = Brodmann area).





List of Abbreviations

AS      Autonomy of Syntax
ASC     autism spectrum conditions
ASD     autism spectrum disorders
ASL     American Sign Language
BCC     Borer–Chomsky Conjecture
CALM    combinatory atomism and lawlike mappings
CCS     Comparative Creole Syntax
CL      Cartesian Linguistics
CRTM    Computational-Representational Theory of Mind
DGS     German Sign Language
DP      Determiner Phrase
ECM     Exceptional Case Marking
ENE     Early Modern English
EST     Extended Standard Theory
FE      Feature Economy
FL      language faculty
FLB     Faculty of Language in the Broad sense
FLN     Faculty of Language in the Narrow sense
FOFC    Final-Over-Final Constraint
GA      genetic algorithm
GB      Government and Binding
HC      Haitian Creole
HKSL    Hong Kong Sign Language
IG      Input Generalization
L1      first language, native language
L2      second language
L2A     Second Language Acquisition
LAD     Language Acquisition Device
LCA     Linear Correspondence Axiom
LF      Logical Form
LH      left hemisphere
LI      lexical item
LIS     Italian Sign Language
LSF     French Sign Language
ME      Middle English
MGP     Modularized Global Parametrization
MnC     Modern Chinese
MP      Minimalist Program
NE      Modern English
OC      Old Chinese
OCP     Obligatory Contour Principle
OT      Optimality Theory
P&P     Principles and Parameters
PCM     Parametric Comparison Method
PF      Phonological Form
PFF     pleiotropic formal features
PLD     Primary Linguistic Data
PP      prepositional phrase
QUD     question under discussion
RBP     Rule-Based Phonology
SAI     Subject–Auxiliary Inversion
SCM     structural case marking
SLI     specific language impairment
SMT     Strong Minimalist Thesis
SO      syntactic object
STM     short-term memory
SVC     Single Value Constraint
TİD     Turkish Sign Language
TLA     Triggering Learning Algorithm
TMA     Tense, Mood, and Aspect
TP      transitional probabilities
TSM     Taiwanese Southern Min
TVJT    truth value judgment task
UFS     Universal Feature Set
UG      Universal Grammar
VL      variational learning
WM      working memory





The Contributors

Enoch O. Aboh is Professor of Linguistics at the University of Amsterdam, where he investigates the learnability of human language with a special focus on comparative syntax, language creation, and language change. His main publications include The Emergence of Hybrid Grammars (2015) and The Morphosyntax of Head–Complement Sequences (2004). As a co-organizer of the African Linguistics School, where he also teaches, he is strongly engaged in working toward a better transfer of knowledge of linguistics to Africa.

Cedric Boeckx is Research Professor at the Catalan Institute for Advanced Studies (ICREA), and a member of the section of General Linguistics at the Universitat de Barcelona, as well as the Center for Complexity Science at the same university. He is the author of numerous books, including Islands and Chains (2003), Linguistic Minimalism (2006, OUP), Bare Syntax (2008, OUP), Language in Cognition (2010, Wiley-Blackwell), and Elementary Syntactic Structures (2014, CUP). He is the editor of OUP's Studies in Biolinguistics series.

Carlo Cecchetto is a graduate of the University of Milan and has held posts in Siena and Milan. He is currently Directeur de recherche at CNRS (UMR 7023, Paris VIII). He has published two monographs, has edited several collections of articles and has co-authored over 40 articles in academic journals on topics ranging from natural language syntax and semantics to sign language and the neuropsychology of language.

Michel DeGraff is Professor of Linguistics at MIT and Director of the 'MIT–Haiti Initiative' funded by the U.S. National Science Foundation. He is also a founding member of Haiti's Haitian Creole Academy. He studies Creole languages, focusing on his native Haitian Creole. His research deepens our understanding of the history and structures of Creole languages. His analyses show that Creole languages, often described as 'exceptional' or 'lesser,' are fundamentally on a par with non-Creole languages in terms of historical development, grammatical structures, and expressive capacity. His research projects bear on social justice as well. In DeGraff's vision, Creole languages and other so-called 'local' languages constitute a necessary ingredient for sustainable development, equal opportunity, and dignified citizenship for their speakers—a position that is often undermined by theoretical claims that contribute to the marginalization of these languages, especially in education and administration.

Janet Dean Fodor has a Ph.D. in linguistics from MIT, and has taught at the University of Connecticut as well as at the City University of New York, where she is Distinguished



Professor at The Graduate Center. She is the author of a textbook on semantics and of a 1979 monograph republished recently in Routledge Library Editions. Her research—in collaboration with many students and colleagues—includes studies of human sentence processing in a variety of languages, focusing most recently on the role of prosody in helping (or hindering) syntactic analysis, including the 'implicit prosody' that is mentally projected in silent reading. Another research interest, represented in this volume, is the modeling of grammar acquisition. She was a founder of the CUNY Conference on Human Sentence Processing, now in its 29th year. She is a former president of the Linguistic Society of America and a corresponding fellow of the British Academy for the Humanities and Social Sciences.

Eric Fuß graduated from the Goethe University Frankfurt and has held positions at the Universities of Stuttgart, Frankfurt, and Leipzig. He is currently Senior Researcher at the Institute for the German Language in Mannheim, Germany. He has written three monographs and (co-)edited several volumes of articles. His primary research interests are language change, linguistic variation, and the interface between syntax and morphology.

Kleanthes K. Grohmann received his Ph.D. from the University of Maryland and is currently Professor of Biolinguistics at the University of Cyprus. He has published widely in the areas of syntactic theory, comparative syntax, language acquisition, impaired language, and multilingualism. Among the books he has written and (co-)edited are Understanding Minimalism (with N. Hornstein and J. Nunes, 2005, CUP), InterPhases (2009, OUP), and The Cambridge Handbook of Biolinguistics (with Cedric Boeckx, 2013, CUP). He is founding co-editor of the John Benjamins book series Language Faculty and Beyond, editor of the open-access journal Biolinguistics, and Director of the Cyprus Acquisition Team (CAT Lab).

Cristina Guardiano is Associate Professor of Linguistics at the Università di Modena e Reggio Emilia. She specialized in historical syntax at the Università di Pisa, where she got her Ph.D., with a dissertation about the internal structure of DPs in Ancient Greek. She is active in research on the parametric analysis of nominal phrases, the study of diachronic and dialectal syntactic variation, crosslinguistic comparison, and phylogenetic reconstruction. She is a member of the Syntactic Structures of the World's Languages (SSWL) research team, and a project advisor in the ERC Advanced Grant 'LanGeLin'.

Maria Teresa Guasti is a graduate of the University of Geneva and has held posts in Geneva, Milano San Raffaele, and Siena. She is currently Professor of Linguistics and Psycholinguistics at the University of Milano-Bicocca. She is author of several articles in peer-reviewed journals, of book chapters, one monograph, one co-authored book, and one second-edition textbook. She is Associate Investigator at the ARC-CCD, Macquarie University, Sydney and Visiting Professor at the International Centre for Child Health, Haidan District, Beijing. She has participated in various European Actions and has been Principal Investigator of the Crosslinguistic Language Diagnosis project funded by the European Community.



Marc D. Hauser is the President of Risk-Eraser, LLC, a company that uses cognitive and brain sciences to impact the learning and decision making of at-risk children, as well as the schools and programs that support them. He is the author of several books, including The Evolution of Communication (1996, MIT Press), Wild Minds (2000, Henry Holt), Moral Minds (2006), and Evilicious (2013, Kindle Select, CreateSpace), as well as over 250 publications in refereed journals and books.

Wolfram Hinzen is Research Professor at the Catalan Institute for Advanced Studies and Research (ICREA) and is affiliated with the Linguistics Department of the Universitat Pompeu Fabra, Barcelona. He writes on issues in the interface of language and mind. He is the author of the OUP volumes Mind Design and Minimal Syntax (2006), An Essay on Names and Truth (2007), and The Philosophy of Universal Grammar (with Michelle Sheehan, 2013), as well as co-editor of The Oxford Handbook of Compositionality (OUP, 2012).

Anders Holmberg received his Ph.D. from Stockholm University in 1987 and is currently Professor of Theoretical Linguistics at Newcastle University, having previously held positions in Morocco, Sweden, and Norway. His main research interests are in the fields of comparative syntax and syntactic theory, with a particular focus on the Scandinavian languages and Finnish. His publications include numerous articles in journals such as Lingua, Linguistic Inquiry, Theoretical Linguistics, and Studia Linguistica, and several books, including (with Theresa Biberauer, Ian Roberts, and Michelle Sheehan) Parametric Variation: Null Subjects in Minimalist Theory (2010, CUP) and The Syntax of Yes and No (2016, OUP).

C.-T. James Huang received his Ph.D. from MIT in 1982 and has held teaching positions at the University of Hawai'i, National Tsing Hua University, Cornell University, and University of California before taking up his current position as Professor of Linguistics at Harvard University. He has published extensively, in articles and monographs, on subjects in syntactic analysis, the syntax–semantics interface, and parametric theory. He is a fellow of the Linguistic Society of America, an academician of Academia Sinica, and founding co-editor of Journal of East Asian Linguistics (1992–present).

Maria Kambanaros, a certified bilingual speech pathologist with 30 years clinical experience, is Associate Professor of Speech Pathology at Cyprus University of Technology. Her research interests are related to language and cognitive impairments across neurological and genetic pathologies (e.g., stroke, dementia, schizophrenia, multiple sclerosis, specific language impairment, syndromes). She has published in the areas of speech pathology, language therapy, and (neuro)linguistics, and directs the Cyprus Neurorehabilitation Centre.

Howard Lasnik is Distinguished University Professor of Linguistics at the University of Maryland, where he has also held the title Distinguished Scholar-Teacher. He is Fellow of the Linguistic Society of America and serves on the editorial boards of eleven journals. He has published eight books and over 100 articles, mainly on syntactic theory,



learnability, and the syntax–semantics interface, especially concerning phrase structure, anaphora, ellipsis, verbal morphology, Case, and locality constraints. He has supervised 57 completed Ph.D. dissertations, on morphology, on language acquisition, and, especially, on syntactic theory.

Jeffrey L. Lidz is Distinguished Scholar-Teacher and Professor of Linguistics at the University of Maryland. His main research interests are in language acquisition, syntax, and psycholinguistics and his many publications include articles in Proceedings of the National Academy of Sciences, Cognition, Journal of Memory and Language, Language Acquisition, Language Learning and Development, Language, Linguistic Inquiry, and Natural Language Semantics, as well as chapters in numerous edited volumes. He is currently editor in chief of Language Acquisition: A Journal of Developmental Linguistics and is the editor of the Oxford Handbook of Developmental Linguistics.

Terje Lohndal is Professor of English Linguistics (100%) at the Norwegian University of Science and Technology in Trondheim and Professor II (20%) at UiT The Arctic University of Norway. He received his Ph.D. from the University of Maryland in 2012. His work focuses on formal grammar and language variation, but he also has interests in philosophy of language and neuroscience. He has published a monograph with Oxford University Press, and many papers in journals such as Linguistic Inquiry, Journal of Semantics, and Journal of Linguistics. In addition to research and teaching, Lohndal is also involved with numerous outreach activities and is a regular columnist in Norwegian media on linguistics and the humanities.

Giuseppe Longobardi is Anniversary Professor of Linguistics at the University of York and Principal Investigator on the European Research Council (ERC) Advanced Grant 'Meeting Darwin's last challenge: Toward a global tree of human languages and genes' (2012–2017). He has done research in theoretical and comparative syntax, especially on the syntax/ontology relation in nominal expressions. He is interested in quantitative approaches to language comparison and phylogenetic linguistics, and is active in interdisciplinary work with genetic anthropologists. Over the past ten years he has contributed to the design of three innovative research programs (Topological Mapping theories, Parametric Minimalism, and the Parametric Comparison Method).

Peter Ludlow has published on a number of topics at the intersection of the philosophy of language and linguistics. His publications include The Philosophy of Generative Linguistics (OUP, 2011) and Living Words: Meaning Underdetermination and the Dynamic Lexicon (OUP, 2014). He has taught at State University of New York at Stony Brook, University of Michigan, University of Toronto, and Northwestern University.

James McGilvray is Professor Emeritus in Philosophy at McGill University in Montreal. He has published articles in the philosophy of language and philosophy of mind and has written and edited several books and articles on the works of Noam Chomsky and their philosophical, moral, and scientific foundations and implications. He is currently editing a second edition of The Cambridge Companion to Chomsky focusing primarily on change and progress in Chomsky's work during the last ten years.



Brett Miller is Visiting Professor at the University of Missouri–Columbia. His interests include the role of phonetics in explaining the behaviors of phonological features; historical phonology, especially where laryngeal contrasts are concerned; and pragmatic functions of syntax in Ancient Greek. He also enjoys teaching linguistic typology and the history of the English language.

Neil Myler is Assistant Professor of linguistics at Boston University. He received his doctorate from New York University in 2014, under the supervision of Prof. Alec Marantz, with a thesis entitled 'Building and Interpreting Possession Sentences.' His research interests include morphosyntax, morphophonology, microcomparative syntax (particularly with respect to English dialects and languages of the Quechua family), argument structure, and the morphosyntax and semantics of possession cross-linguistically.

Frederick J. Newmeyer specializes in syntax and the history of linguistics and has as his current research program the attempt to synthesize the results of formal and functional linguistics. He was Secretary-treasurer of the Linguistic Society of America (LSA) from 1989 to 1994 and its President in 2002. He has been elected Fellow of the LSA and the American Association for the Advancement of Science. In 2011 he received a Mellon Foundation Fellowship for Emeritus Faculty.

Luigi Rizzi teaches linguistics at the University of Geneva and at the University of Siena. He was on the faculty of MIT and the École Normale Supérieure (Paris). His research interests involve theoretical and comparative syntax, with special reference to the theory of locality, the study of syntactic invariance and variation, the cartography of syntactic structures, and the theory-guided study of language acquisition. His main publications include the monographs Issues in Italian Syntax and Relativized Minimality, and the article 'The Fine Structure of the Left Periphery.'

Ian Roberts is a graduate of the University of Southern California and has held posts in Geneva, Bangor, and Stuttgart. He is currently Professor of Linguistics at the University of Cambridge. He has published six monographs and two textbooks, and has edited several collections of articles. He was president of Generative Linguistics of the Old World (GLOW) in 1993–2001, and of the Societas Linguistica Europaea in 2012–2013. He is currently Principal Investigator on the European Research Council Advanced Grant 'Rethinking Comparative Syntax.'

William G. Sakas has an undergraduate degree in economics from Harvard University and a Ph.D. in computer science from the City University of New York (CUNY). He is currently Associate Professor and Chair of Computer Science at Hunter College and is on the doctoral faculties of Linguistics and Computer Science at the CUNY Graduate Center, where he is the Co-founding Director of the Computational Linguistics Masters and Doctoral Certificate Program. He has recently become active in computer science education both at the college and pre-college levels. His research focuses on computational modeling of human language: What are the consequential components of a computational model and how do they correlate with psycholinguistic data and human mental capacities?



Bridget D. Samuels is Senior Editor at the Center for Craniofacial Molecular Biology, University of Southern California. She is the author of Phonological Architecture: A Biolinguistic Perspective (2011, OUP), as well as numerous other publications in biological, historical, and theoretical linguistics. She received her Ph.D. in Linguistics from Harvard University in 2009 and has taught at Harvard, the University of Maryland, and Pomona College.

Bonnie D. Schwartz is Professor in the Department of Second Language Studies at the University of Hawai'i. Her research has examined the nature of second language acquisition from a generative perspective. More recently, she has focused on child second language acquisition and how it may differ from that of adults.

Rex A. Sprouse received his Ph.D. in Germanic linguistics from Princeton University. He has taught at Bucknell University, Eastern Oregon State College, and Harvard University and is now Professor of Second Language Studies at Indiana University. In the field of generative approaches to second language acquisition, he is perhaps best known for his research on the L2 initial state and on the role of Universal Grammar in the L2 acquisition of properties of the syntax–semantics interface.

Ianthi Maria Tsimpli is Professor of English and Applied Linguistics in the Department of Theoretical and Applied Linguistics at the University of Cambridge. She works on language development in the first and second language in children and adults, language impairment, attrition, bilingualism, language processing, and the interaction between language, cognitive abilities, and print exposure.

George Tsoulas graduated from Paris VIII in 2000 and is currently Senior Lecturer at the University of York. He has published extensively on formal syntax and the syntax–semantics interface. He has edited and authored books on quantification, distributivity, the interfaces of syntax with semantics and morphology and diachronic syntax. His work focuses on Greek and Korean syntax and semantics, as well as questions and the integration of syntax with gesture. He is currently working on a monograph on Greek particles.

Juan Uriagereka has been Professor of Linguistics at the University of Maryland since 2000. He has held visiting professorships at the universities of Konstanz, Tsukuba, and the Basque country. He is the author of Syntactic Anchors: On Semantic Structure and of Spell-Out and the Minimalist Program, and co-author, with Howard Lasnik, of A Course in Minimalist Syntax. He is also co-editor, with Massimo Piattelli-Palmarini and Pello Salaburu, of Of Minds and Language.

Bert Vaux is Reader in phonology and morphology at Cambridge University and Fellow of King's College. He is primarily interested in phenomena that shed light on the structure and origins of the phonological component of the grammar, especially in the realms of psychophonology, historical linguistics, microvariation, and nanovariation. He also enjoys working with native speakers to document endangered languages, especially varieties of Armenian, Abkhaz, and English.



Chapter 1

Introduction

Ian Roberts

1.1 Introduction

Birds sing, cats meow, dogs bark, horses neigh, and we talk. Most animals, or at least most higher mammals, have their own ways of making noises for their own purposes. This book is about the human noise-making capacity, or, to put it more accurately (since there's much more to it than just noise), our faculty of language. There are very good reasons to think that our language faculty is very different in kind and in consequences from birds' song faculty, dogs' barking faculty, etc. (see Hauser, Chomsky, and Fitch 2002, chapter 21, and the references given there). Above all, it is different in kind because it is unbounded in nature. Berwick and Chomsky (2016:1) introduce what they refer to as the Basic Property of human language in the following terms: 'a language is a finite computational system yielding an infinity of expressions, each of which has a definite interpretation in semantic-pragmatic and sensorimotor systems (informally, thought and sound).' Nothing of this sort seems to exist elsewhere in the animal kingdom (see again Hauser, Chomsky, and Fitch 2002). Its consequences are ubiquitous and momentous: Can it be an accident that the only creature with an unbounded vehicle of this kind for the storage, manipulation, and communication of complex thoughts is the only creature to dominate all others, the only creature with the capacity to annihilate itself, and the only creature capable of devising a means of leaving the planet? The link between the enhanced cognitive capacity brought about by our faculty for language and our advanced technological civilization, with all its consequences good and bad for us and the rest of the biosphere, is surely quite direct. Put simply, no language, then no spaceships, no nuclear weapons, no doughnuts, no art, no iPads, or iPods. In its broadest conception, then, this book is about the thing in our heads that brought all this about and got us—and the creatures we share the planet with, as well as perhaps the planet itself—where we are today.




1.1.1 Background

The concept of universal grammar has ancient pedigree, outlined by Maat (2013). The idea has its origins in Plato and Aristotle, and it was developed in a particular way by the medieval speculative grammarians (Seuren 1998:30–37; Campbell 2003:84; Law 2003:158–168; Maat 2013:401) and in the 17th century by the Port-Royal grammarians under the influence of Cartesian metaphysics and epistemology (Arnauld and Lancelot 1660/1676/1966). Chomsky (1964:16–25; 1965, chapter 1; 1966/2009; 1968/1972/2006, chapter 1) discusses his own view of some of the historical antecedents of his ideas about Universal Grammar (UG henceforth) and other matters, particularly in 17th-century Cartesian philosophy; see also chapter 4. Arnauld and Lancelot's (1660/1676/1966) Grammaire générale et raisonnée is arguably the key text in this connection and is discussed—from slightly differing perspectives—in chapters 2 and 4, and briefly here.

The modern notion of UG derives almost entirely from one individual: Noam Chomsky. Chomsky founded the approach to linguistic description and analysis known as generative grammar in the 1950s and has developed that approach, along with the related but distinct idea of UG, ever since. Many others, among them the authors of several of the chapters to follow, have made significant contributions to generative grammar and UG, but Chomsky all along has been the founder, leader, and inspiration.

The concept of UG initiated by Chomsky can be defined as the scientific theory of the genetic component of the language faculty (I give a more detailed definition in (2) below). It is the theory of that feature of the genetically given human cognitive capacity which makes language possible, and at the same time defines a possible human language. UG can be thought of as providing an intensional definition of a possible human language, or more precisely a possible human grammar (from now on I will refer to the grammar as the device which defines the set of strings or structures that make up a language; 'grammar' is therefore a technical term, while 'language' remains a pre-theoretical notion for the present discussion).

This definition clearly provides a characterization of a central and vitally important aspect of human cognition. All the evidence—above all the qualitative differences between human language and all known animal communication systems (see Hauser 1996; Hauser, Chomsky, and Fitch 2002; and chapter 21)—points to this being a cognitive capacity that only humans (among terrestrials) possess. This capacity manifests itself in early life with little prompting as long as the human child has adequate nutrition and its other basic physical needs are met, and if it is exposed to other humans talking or signing (see chapters 5, 10, and, in particular, 12). The capacity is universal in the sense that no putatively human ethnic group has ever been encountered or described that does not have language (modalities other than speech or sign, e.g., writing and whistling, are known, but these modalities convey what is nonetheless recognizably language; see Busnel and Classe 1976, Asher and Simpson 1994 on whistled languages). Finally, there is evidence that certain parts of the brain (in particular Broca's area, Brodmann areas 44 and 45) are centrally involved with language, but crucial aspects of the neurophysiological instantiation of language in the brain are poorly understood. More generally in this connection there is the problem of understanding how abstract computational representations and information flows can be in any way instantiated in brain tissue, which they must be, on pain of committing ourselves to dualism—see chapter 2, Berwick and Chomsky (2016:50), and the following discussion.

For all these reasons, UG is taken to be the theory of a biologically given capacity. In this respect, our capacity for grammatical knowledge is just like our capacity for upright bipedal motion (and our incapacity for ultraviolet vision, unaided flight, etc.). It is thus species-specific, although this does not imply that elements of this capacity are not found elsewhere in the animal kingdom; indeed, given the strange circumstances of the evolution of language (on which see Berwick and Chomsky 2016; and section 1.6), it would be surprising if this were not the case. Whether the human capacity for grammatical knowledge is domain-specific is another matter; see section 1.1.2 for discussion of how views on this matter have developed over the past thirty or forty years.

In order to avoid repeated reference to 'the human capacity for linguistic knowledge,' I will follow Chomsky's practice in many of his writings and use the term 'UG' to designate both the biological human capacity for grammatical knowledge itself and the theory of that capacity that we are trying to construct. Defined in this way, UG is related to but distinct from a range of other notions: biolinguistics, the faculty of language (both broad and narrow as defined by Hauser, Chomsky, and Fitch 2002), competence, I-language, generative grammar, language universals, and metaphysical universals. I will say more about each of these distinctions directly. But first a distinct clarification, and one which sheds some light on the history of linguistics, and of generative grammar, as well as taking us back to the 17th-century French concept of grammaire générale et raisonnée.

UG is about grammar, not logic. Since antiquity, the two fields have been seen as closely related (forming, along with rhetoric, the trivium of the medieval seven liberal arts). The 17th-century French grammarians formulated a general grammar, an idea we can take to be very close to universal grammar in designating the elements of grammar common to all languages and all peoples; in this their enterprise was close in spirit to contemporary work on UG. But it was different in that it was also seen as rational (raisonnée); i.e., reason lay behind grammatical structure. To put this a slightly different—and maybe tendentious—way: the categories of grammar are directly connected to the categories of reason. Grammar (i.e., general or universal grammar) and reason are intimately connected; hence, the grammaire générale et raisonnée is divided into two principal sections, one dealing with grammar and one with logic.

The idea that grammar may be understood in terms of reason, or logic, is one which has reappeared in various guises since the 17th century. With the development of modern formal logic by Frege and Russell just over a century ago, and the formalization of grammatical theory begun by the American structuralists in the 1930s and developed by Chomsky in the 1950s (as well as the development of categorial grammars of various kinds, in particular by Ajdukiewicz 1935), the question of the relation between formal logic and formal grammar naturally arose. For example, Bar-Hillel (1954) suggested that techniques directly derived from formal logic, especially from Carnap (1937 [1934]), should be introduced into linguistic theory.



In the 1960s, Montague took this kind of approach much further, using very powerful logical tools and giving rise to modern formal semantics; see the papers in Montague (1974) and the brief discussion in section 1.3. Chomsky's view is different. Logic is a fine tool for theory construction, but the question of the ultimate nature of grammatical categories, representations, and other constructs—the question of the basic content of UG as a biological object—is an empirical one. How similar grammar will turn out to be to logic is a matter for investigation, not decision. Chomsky made this clear in his response to Bar-Hillel, as the following quotation shows:

The correct way to use the insights and techniques of logic is in formulating a general theory of linguistic structure. But this does not tell us what sort of systems form the subject matter for linguistics, or how the linguist may find it profitable to describe them. To apply logic in constructing a clear and rigorous linguistic theory is different from expecting logic or any other formal system to be a model for linguistic behavior. (Chomsky 1955:45, cited in Tomalin 2008:137)

This attitude is also clear from the title of Chomsky’s Ph.D. dissertation, the foundational document of the field: The Logical Structure of Linguistic Theory (Chomsky 1955/​ 1975). Tomalin (2008:125–​139) provides a very illuminating discussion of these matters, noting that Chomsky’s views largely coincide with those of Zellig Harris in this respect. The different approaches of Chomsky and Bar-​Hillel resurfaced in yet another guise in the late 1960s in the debate between Chomsky and the generative semanticists, some of whom envisaged a way to once again reduce grammar to logic, this time with the technical apparatus of standard-​theory deep structure and the transformational derivation of surface structure by means of extrinsically ordered transformations (see Lakoff 1971, Lakoff and Ross 1976, McCawley 1976, and section 1.1.2 for more on the standard theory of transformational grammar). In a nutshell, and exploiting our ambiguous use of the term ‘UG’: UG as the theory of human grammatical knowledge must depend on logic; just like any theory of anything, we don’t want it to contain contradictions. But UG as human grammatical knowledge may or may not be connected to any given formalization of our capacity for reason; that is an empirical question (to which recent linguistic theory provides some intriguing sketches of an answer; see ­chapters 2 and 9). Let us now look at the cluster of related but largely distinct concepts which surround UG and sometimes lead to confusion. My goal here is above all to clarify the nature of UG, but at the same time the other concepts will be to varying degrees clarified. Biolinguistics.  One could, innocently, take this term to be like ‘sociolinguistics’ or ‘psycholinguistics’ in simply designating where the concerns of linguistics overlap with those of another discipline. Biolinguistics in this sense is just those parts of linguistics (looking at it from the linguist’s perspective) that overlap with biology. This overlap area presumably includes those aspects of human physiology that are directly connected to language, most obviously the vocal tract and the structure of the ear (thinking of sign language, perhaps also the physiology of manual gestures and our visual capacity to



Introduction   5 apprehend them), as well as the neural substrate for language in the brain. It may also include whatever aspects of the human genome subserve language and its development, both before and after birth. Furthermore, it may study the phylogenetic development of language, i.e., language evolution. In recent work, however, the term ‘biolinguistics’ has come to designate the approach to the study of the language faculty which, by supposing that human grammatical knowledge stems in part from some aspect of the human genome, directly grounds that study in biology. Approaches to UG (in both senses) of the kind assumed here are thus biolinguistic in this sense. But we can, for the sake of clarity, distinguish UG from biolinguistics. On the one hand, one could study biolinguistics without presupposing a common human basis for grammatical knowledge: basing the study of language in biology does not logically entail that the basic elements of grammar are invariant and shared by all humans, still less that they are innate. Language could have a uniquely human biological basis without any particular aspect of grammar being common to all humans; in that case, grammar would not be of any great interest for the concerns of biolinguistics. This scenario is perhaps unlikely, but not logically impossible; in fact, the position articulated in Evans and Levinson (2009) comes close to this, although these authors emphasize the role of culture rather than biology in giving rise to the general and uniquely human aspects of language. Conversely, one can formulate a theory of UG as an abstract Platonic object with no claim whatsoever regarding any physical instantiation it may have. This has been proposed by Katz (1981), for example. To the extent that the technical devices of the generative grammars that constitute UG are mathematical in nature, and that mathematical objects are abstract, perhaps Platonic, objects, this view is not at all incoherent. So we can see that biolinguistics and UG are closely related concepts, but they are logically and empirically distinct. Combining them, however, constitutes a strong hypothesis about human cognition and its relation to biology and therefore has important consequences for our view of both the ontogeny and phylogeny of language. UG is also distinct from the more general notion of faculty of language. This distinction partly reflects the general difference between the notions of ‘grammar’ and ‘language,’ although the non-​technical, pre-​theoretical notion of ‘language’ is very hard to pin down, and as such not very useful for scientific purposes. UG in the sense of the innate capacity for grammatical knowledge is arguably necessary for human language, and thus central to any conception of the language faculty, but it is not sufficient to provide a theoretical basis for understanding all aspects of the language faculty. For example, one might take our capacity to make computations of a Gricean kind regarding our interlocutor’s intentions (see Grice 1975) as part of our language faculty, but it is debatable whether this is part of UG (although, interestingly, such inferences are recursive and so may be quite closely connected to UG). In a very important article, much-​cited in the chapters to follow, Hauser, Chomsky, and Fitch (2002) distinguish the Faculty of Language in the Narrow sense (FLN) from the Faculty of Language in the Broad sense (FLB). 
They propose that the FLB includes all aspects of the human linguistic capacity, including much that is shared with other



FLN, on the other hand, refers to 'the abstract linguistic computational system alone, independent of the other systems with which it interacts and interfaces' (Hauser, Chomsky, and Fitch 2002:1570). This consists in just the operation which creates recursive hierarchical structures over an unbounded domain, Merge, which 'is recently evolved and unique to our species' (Hauser, Chomsky, and Fitch 2002:1572). Indeed, they suggest that Merge may have developed from recursive computational systems used in other cognitive domains, for example, navigation, by a shift from domain-specificity to greater domain-generality in the course of human evolution (Hauser, Chomsky, and Fitch 2002:1579).
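Merge, so conceived, is nothing more than recursive, binary, unordered set formation. A few lines of Python can make this concrete; the sketch below is purely illustrative (the toy lexical items and the derivation are invented for exposition and are not taken from the handbook):

    # Illustrative sketch only: Merge as recursive, binary set formation over
    # syntactic objects (lexical items are modeled as plain strings; the toy
    # derivation below is an invented example).

    def merge(a, b):
        """Combine two syntactic objects into the unordered set {a, b}."""
        return frozenset({a, b})

    dp = merge("the", "apple")   # {'the', 'apple'}
    vp = merge("eat", dp)        # {'eat', {'the', 'apple'}}: a hierarchy, not a string

    clause = vp
    for _ in range(3):                     # merge applies freely to its own output,
        clause = merge("think", clause)    # so embedding depth is unbounded

The use of frozenset reflects the standard conception of Merge as yielding unordered sets: the output is a hierarchical object with no intrinsic linear order, and nothing in the definition bounds the depth of embedding.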



UG is arguably distinct from both FLB and FLN. FLB may be construed so as to include pragmatic competence, for example (depending on one's exact view as to the nature of the conceptual-intentional interface) and so the point made earlier about pragmatic inferencing would hold. More generally, Chomsky's (2005) three factors in language design relate to FLB. These are:

(1) Factor One: the genetic endowment, UG.
    Factor Two: experience, Primary Linguistic Data for language acquisition (PLD).
    Factor Three: principles not specific to the faculty of language/non-domain-specific optimization strategies and general physical laws.

(See in particular chapter 6 for further discussion.) In this context, it is clear that UG is just one factor that contributes to FLB (but see Rizzi's (this volume) suggested distinction between 'big' and 'small' UG, discussed in section 1.1.2). If FLN consists only of Merge, then presumably there is more to UG, since more than just Merge makes up our genetically given capacity for language (e.g., the status of the formal features that participate in Agree relations in many minimalist approaches may be both domain-specific and genetically given; see section 1.5). Picking up on Hauser, Chomsky, and Fitch's final suggestion regarding Merge/FLN, the conclusion would be that all aspects of the biologically given human linguistic capacity are shared with other creatures. What is specific to humans is either the fact that all these elements uniquely co-occur in humans, or that they are combined in a particular way (this last point may be important for understanding the evolution of language, as Hauser, Chomsky, and Fitch point out—see also Berwick and Chomsky 2016:157–164 and section 1.6 for a more specific proposal).

Competence. The competence/performance distinction was introduced in Chomsky (1965:4) in the following well-known passage:

Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its [the speech community's] language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of this language in actual performance.

So competence represents the tacit mental knowledge a normal adult has of their native language. As such, it is an instantiation of UG. We can think of UG along with the third factors as determining the initial state of language acquisition, S0, presumably the state it is in when the individual is born (although there may be some in utero language acquisition; see chapter 12). Language acquisition transits through a series of states S1 … Sn whose nature is in some respects now quite well understood (see again chapter 12). At some stage in childhood, grammatical knowledge seems to crystallize and a final state Ss is reached (here some of the points made regarding non-native language acquisition in chapter 13 are relevant, and may complicate this picture). Ss corresponds to adult competence. In a principles-and-parameters approach, we can think of the various transitions from S0 to Ss as involving parameter-setting (see chapters 11, 14, and 18). These transitions certainly involve the interaction of all three factors in language design given in (1), and may be determined by them (perhaps in interaction with general and specific cognitive maturation). Adult competence, then, is partly determined by UG, but is something distinct from UG.

I-language. This notion, which refers to the intensional, internal, individual knowledge of language (see chapter 3 for relevant discussion), is largely coextensive with the earlier notion of competence, although competence may be and sometimes has been interpreted as broader (e.g., Hymes' 1966 notion of communicative competence). The principal difference between I-language and competence lies in the concepts they are opposed to: E-language and performance, respectively. E-language is really an 'elsewhere' concept, referring to all notions of language that are not I-language. Performance, on the other hand, is more specific in that it designates the actual use of competence in producing and understanding linguistic tokens: as such, performance relates to individual psychology, but directly implicates numerous aspects of non-linguistic cognition (short- and long-term memory, general reasoning, Theory of Mind, and numerous other capacities). UG is a more general notion than I-language; indeed, it can be defined as 'the general theory of I-language' (Berwick and Chomsky 2016:90).

Generative grammar. A generative grammar is a mathematical construct, a particular kind of Turing machine, that generates (enumerates) a set of structural descriptions over an unbounded domain, creating the ability to make infinite use of finite means. Applying a generative grammar to the description and analysis of natural languages thereby provides a mathematically based account of language's potentially unbounded expressive power. A generative grammar provides an intensional definition of a language (now construed in the technical sense as a set of strings or structures). Generative grammars are ideally suited, therefore, to capturing the Basic Property of the human linguistic capacity. A given generative grammar is 'the theory of an I-language' (Berwick and Chomsky 2016:90). As such, it is distinct from UG, which, as we have seen, is the theory of I-languages. A particular subset of the set of possible generative grammars is relevant for UG; in the late 1950s and early 1960s, important work was done by Chomsky and others in determining the classes of string-sets different kinds of generative grammar could generate (Chomsky 1956, 1959/1963; Chomsky and Miller 1963; Chomsky and Schützenberger 1963). This led to the 'Chomsky hierarchy' of grammars. The place of the set of generative grammars constituting UG on this hierarchy has not been easy to determine, although there is now some consensus that they are 'mildly context-sensitive' (Joshi 1985). Since, for mathematical, philosophical, or information-scientific purposes we can contemplate and formulate generative grammars that fall outside of UG, it is clear that the two concepts are not equivalent. UG includes a subset, probably a very small subset, of the class of generative grammars.
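To see concretely how a finite device can enumerate an unbounded set, consider the following minimal sketch (my illustration, not the handbook's): a two-rule context-free grammar, drawn from one of the levels of the Chomsky hierarchy, generating the textbook string-set a^n b^n.

    # Illustrative sketch: finite means, unbounded output. The toy grammar
    # S -> a S b | a b (invented for exposition) generates the context-free
    # string-set { a^n b^n : n >= 1 }, which no finite list could exhaust.

    def generate(n):
        """Derive a string by applying S -> a S b n times, then S -> a b."""
        s = "S"
        for _ in range(n):
            s = s.replace("S", "aSb", 1)   # recursive rewriting rule
        return s.replace("S", "ab", 1)     # terminating rewriting rule

    print([generate(n) for n in range(4)])   # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']

The set a^n b^n is the classic example of a string-set beyond the reach of finite-state devices, which is why it is standardly used to illustrate the lower steps of the hierarchy.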



Language universals. Postulating UG naturally invites consideration of language universals, and indeed, if the class of generative grammars permitted by UG is limited, then it follows that all known (and all possible) languages will conform to that class of generative grammars, and so formal universals (in the sense of Chomsky 1965) of a particular kind must exist. An example is hierarchical structure. If binary Merge is the core operation generating syntactic structure in UG, then 'flat,' or non-configurational, languages of the kind discussed by Hale (1983), for example, are excluded, as indeed are ternary or quaternary branching structures (and the like), as well as non-branching structures. The tradition of language typology, begun in recent times by Greenberg (1963/2007), has postulated many language universals (see, e.g., the Konstanz Universals Archive, http://typo.uni-konstanz.de/archive/). However, there is something of a disconnect between much that has been observed in the Greenbergian tradition and UG. Greenbergian universals tend to be surface-oriented and tend to have counter-examples (see chapter 15). Universals determined by UG have neither of these properties: clearly they must be exceptionless by definition, and, as the example of hierarchical structure and the existence of apparently non-configurational languages shows, they may not manifest themselves very clearly in surface phenomena. Nonetheless, largely thanks to the emphasis on comparative work which stems from the principles-and-parameters model, there has been a rapprochement between UG-driven comparative studies and work in the Greenbergian tradition (see chapters 14 and 15). This has led to some interesting proposals for universals that may be of genuine interest in both contexts (a possible example is the Final-over-Final Constraint; see Sheehan 2013 and Biberauer, Holmberg, and Roberts 2014).

Metaphysical universals. UG is not to be understood as a contribution to a philosophical doctrine of universals. Indeed, from a metaphysical perspective, UG may be somewhat parochial; it may merely constrain and empower human cognitive capacity. It appears not to be found in other currently existing terrestrial creatures, and there is no strong reason to expect it to be found in any extra-terrestrial life form we may discover (or which may discover us). This last point, although hardly an empirical one at present, is not quite as straightforward as it might at first appear, and I will return to it in section 1.3.



In addition to Merge, UG must contain some specification of the notion 'possible grammatical category' (or, perhaps equivalently, 'possible grammatical feature'). Whatever this turns out to be, and its general outlines remain rather elusive, it is in principle quite distinct from any notion of metaphysical category. Modern linguistic theory concerns the structure of language, not the structure of the world. But this does not exclude the possibility that certain notions long held to be central to understanding semantics, such as truth and reference, may have correlates in linguistic structure or, indeed, may be ultimately reducible to structural notions (see chapters 2 and 9).

These remarks are intended to clarify the relations among these concepts, and in particular the concept of UG. No doubt further clarifications are in order, and some are given in the chapters to follow (see especially chapters 3 and 5). Many interesting and difficult empirical and theoretical issues underlie much of what I have said here, and of course these can and should be debated. My intention here, though, has been to clarify things at the start, so as to put the chapters to follow in as clear a setting as possible. To summarize our discussion so far, we can give the following definition of UG:

(2) UG is the general theory of I-languages, taken to be constituted by a subset of the set of possible generative grammars, and as such characterizes the genetically determined aspect of the human capacity for grammatical knowledge.

As mentioned, the term 'UG' is sometimes used, here and in the chapters to follow, to designate the capacity for grammatical knowledge itself rather than, or in addition to, the theory of that capacity.

1.1.2 Three Stages in the History of UG

At the beginning of the previous section I touched on some of the historical background to the modern concept of UG (and see chapter 4 for more, particularly on Cartesian thought in relation to UG). Here I want to sketch out the ways in which the notion of UG has developed since the 1950s. It is possible to distinguish three main stages in the development of our conception of UG. The first, which was largely rule- or rule-system-based, lasted from the earliest formulations until roughly 1980. The second corresponds to the period in which the dominant theory of syntax was Government–Binding (GB) theory: the 1980s and into the early 1990s (Chomsky and Lasnik 1993 was the last general, non-textbook overview of GB theory). The third period runs from 1993 to the present, although Hauser, Chomsky, and Fitch (2002) and Chomsky (2005) represent very important refinements of an emerging, and still perhaps not fully settled, view. Each of these stages is informed by, and informs, how both universal and variant properties of languages are viewed.



Taking UG as defined in (2) at the end of the previous section (the general theory of I-languages, taken to be constituted by a subset of the set of possible generative grammars, and as such characterizing the genetically determined aspect of the human capacity for grammatical knowledge), the development of the theory has been a series of attempts to establish and delimit the class of humanly attainable I-languages.

The earliest formulations of generative grammars which were intended to do this, beginning with Chomsky (1955/1975), involved interacting rule systems. In the general formulation in Chomsky (1965), the basis of the Standard Theory, there were four principal rule systems. These were the phrase-structure (PS) or rewriting rules, which, together with the Lexicon and a procedure for lexical insertion, formed the 'base component,' whose output was deep structure. Surface structures were derived from deep structures by a different type of rule, transformations. PS-rules and transformations were significantly different in both form and function. PS-rules built structures, while transformations manipulated them according to various permitted forms of permutation (deletion, copying, etc.). The other two classes of rules were, in a broad sense, interpretative: a semantic representation was derived from deep structure (by Projection Rules of the kind put forward by Katz and Postal 1964), and phonological and, ultimately, phonetic representations were derived from surface structure (these were not described in detail in Chomsky 1965, but Chomsky and Halle 1968 put forward a full-fledged theory of generative phonology).

It was recognized right from the start that complex interacting rule systems of this kind posed serious problems for understanding both the ontogeny and the phylogeny of UG (i.e., the biological questions themselves). For ontogeny, i.e., language acquisition, it was assumed that the child was equipped with an evaluation metric which facilitated the choice of the optimal grammar (i.e., system of rule systems) from among those available and compatible with the PLD (see chapter 11). A fairly well-developed evaluation metric for phonological rules is proposed in Chomsky and Halle (1968, ch. 9). Regarding phylogeny, the evolution of UG construed as in (2) remained quite mysterious; Berwick and Chomsky (2016:5) point out that Lenneberg (1967, ch. 6) 'stands as a model of nuanced evolutionary thinking' but that 'in the 1950s and 1960s not much could be said about language evolution beyond what Lenneberg wrote.'

The development of the field since the late 1960s is well known and has been documented and discussed many times (see, e.g., Lasnik and Lohndal 2013). Starting with Ross (1967), efforts were made to simplify the rule systems. These culminated around 1980 with a theory of a somewhat different nature, bringing us to the second stage in the development of UG. By 1980, the PS- and transformational rule systems had been simplified very significantly. Stowell (1981) showed that the PS-rules of the base could be reduced to a simple and general X′-theoretic template of essentially two very general rules. The transformational rules were reduced to the single rule 'move-α' ('move any category anywhere'). These very simple rule systems massively overgenerated, and overgeneration was constrained by a number of conditions on representations and derivations, each characterizing independent but interacting modules (Case, thematic roles, binding, bounding, control, and so on).
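To make the notion of a rewriting-rule system concrete, here is a minimal sketch of a Standard-Theory-style 'base component' (the rule fragment and mini-lexicon are hypothetical, chosen purely for illustration): category symbols are rewritten until lexical insertion applies.

```python
import random

# A toy 'base component' in the Standard-Theory spirit: phrase-structure
# rules rewrite category symbols; lexical insertion replaces terminal
# categories with words. The grammar below is invented for illustration.

PS_RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"D": ["the"], "N": ["boy", "sunset"], "V": ["sees"]}

def rewrite(symbol):
    """Expand a category into a labeled bracketing, inserting lexical items."""
    if symbol in LEXICON:                         # lexical insertion
        return [symbol, random.choice(LEXICON[symbol])]
    expansion = random.choice(PS_RULES[symbol])   # choose a rewriting rule
    return [symbol] + [rewrite(part) for part in expansion]

print(rewrite("S"))
# e.g. ['S', ['NP', ['D', 'the'], ['N', 'boy']],
#            ['VP', ['V', 'sees'], ['NP', ['D', 'the'], ['N', 'sunset']]]]
```

Transformations would then permute, delete, or copy pieces of such structures; the GB-era simplification just described replaced language-particular rule inventories of this sort with the general X′ template and the single rule 'move-α.'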
The theory was in general highly modular in character. Adding to this the idea that the modules and rule systems could differ slightly from language to language led to the formulation of the principles-and-parameters approach, and thus to a very significant advance in understanding both cross-linguistic variation and what does not vary. It seemed possible to get a very direct view of UG through abstract language universals proposed in this way (see the discussion of UG and universals in the previous section, and chapter 14).

The architecture of the system also changed. Most significantly, work starting in the late 1960s and developing through the 1970s had demonstrated that many aspects of semantic interpretation (everything except argument structure) could not be read from deep structure. A more abstract version of surface structure, known as S-structure, was postulated, to which many of the aforementioned conditions on representations applied. There were still two interpretative components: Phonological Form (PF) and Logical Form (LF, intended as a syntactic representation able to be 'read' by semantic interpretative rules). The impoverished PS-rules generated D-structure (still the level of lexical insertion); move-α mapped D-structure to S-structure and S-structure to LF, the latter mapping being 'covert': since PF was independently derived from S-structure, LF was invisible/inaudible, its nature being inferred from aspects of semantic interpretation.

The GB approach, combined with the idea that modular principles and rule systems could be parametrized at the UG level, led to a new conception of language acquisition. It was now possible to eliminate the idea of searching a space of rule systems guided by an evaluation metric. In place of that approach, language acquisition could be formulated as searching a finite list of possible parameter settings and choosing those compatible with the PLD. This appeared to make the task of language acquisition much more tractable; see chapter 11 for discussion of the advantages of, and problems for, parameter-setting approaches to language acquisition.

Whilst the GB version of UG seemed to give rise to real advantages for linguistic ontogeny, it did not represent progress with the question of phylogeny. The very richness of the parametrized modular UG that made language acquisition so tractable in comparison with earlier models posed a very serious problem for language evolution. The GB model took the residual rule systems and the modules, both parametrized, as well as the overall architecture of the system, to be both species-specific and domain-specific. This seemed entirely reasonable, as it was very difficult to see analogs to such an elaborate system either in other species or in other human cognitive domains. The implication for language evolution was that all of this must have evolved in the history of our species, although how this could have happened, assuming the standard neo-Darwinian model of descent through modification of the genome, was entirely unclear.

The GB model thus provided an excellent model for the newly burgeoning fields of comparative syntax and language acquisition (and their intersection in comparative language acquisition; see chapter 12). However, its intrinsic complexity raised conceptual problems, and the evolution question was simply not addressed.
These two points in a sense reduce to one: GB theory seemed to give us, for the first time, a workable UG, in that comparative and acquisition questions could be fruitfully addressed in a unified way. There is, however, a deeper question: why do we find this particular UG in this particular species? If we take seriously the biolinguistic aspect of UG, i.e., the idea that UG is in some non-trivial sense an aspect of human biology, then this question must be addressed, but the GB model did not seem to point to any answers, or indeed to any promising avenues of research.

Considerations of this kind began to be addressed in the early 1990s, leading to the proposal in Chomsky (1993) for a minimalist program for linguistic theory. The minimalist program (MP) differs from GB in being essentially an attempt to reduce GB to its barest essentials. The motivations for this were in part methodological, essentially Occam's razor: reduce theoretical postulates to a minimum. But there was a further conceptual motivation: if we can reduce UG to its barest essentials, we can perhaps see it as optimized for the task of relating sound and meaning over an unbounded domain. This in turn may allow us to see why we have this particular UG and not some other one; UG conforms to a kind of conceptual necessity in that it only contains what it must contain. Simplifying UG also means that less is attributed to the genome, and correspondingly there is less to explain when it comes to phylogeny. The task then was to subject aspects of the GB architecture to a 'minimalist critique.' This approach was encapsulated by the Strong Minimalist Thesis (SMT), the idea that the computational system optimally meets the conditions imposed by the interfaces (where 'optimal' should be understood in terms of maximally efficient computation). Developing and applying these ideas has been a central concern in linguistic theory for more than two decades.

Accordingly, where GB assumed four levels of representation for every sentence (D-Structure, S-Structure, Phonological Form (PF), and Logical Form (LF)), the MP assumes just the two 'interpretative' levels. It seems unavoidable to assume these, given the Basic Property. The core syntax is seen as a derivational mechanism that relates these two, i.e., it relates sound (PF) and meaning (LF) over an unbounded domain (and hence contains recursive operations).

One of the most influential versions of the MP is that put forward in Chomsky (2001a). This makes use of three basic operations: Merge, Move, and Agree. Merge combines two syntactic objects to form a third object, a set consisting of the set of the two merged elements and their label. Thus, for example, a verb (V) and an object (O) may be merged to form the complex element {V,{V,O}}. The label of this complex element is V, indicating that V and O combine to form a 'big V,' or a VP. In this version of the MP labeling was stipulated as an aspect of Merge; more recently, Chomsky (2013, 2015) has proposed that Merge merely forms two-member sets (here {V,O}), with a separate labeling algorithm determining the labels of the objects so formed. The use of set-theoretic notation implies that V and O are not ordered by Merge, merely combined; the relative ordering of V and O is parametrized, and so order is handled by some operation distinct from the one which combines the two elements, usually thought to be a (PF) 'interpretative' operation of linearization (linear order being required for phonology, but not for semantic interpretation). In general, syntactic structure is built up by the recursive application of Merge.
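As a minimal illustration, Merge as just described can be rendered directly in terms of sets (a schematic sketch only; nothing in the code is part of the formal theory):

```python
# A minimal sketch of Merge as set formation. Unordered frozensets model the
# fact that Merge combines but does not linearize; the (label, set) pair
# mirrors the {V, {V, O}} notation in the text.

def merge(x, y, label):
    """Combine two syntactic objects into a labeled, unordered set."""
    return (label, frozenset({x, y}))

vp = merge("V", "O", label="V")    # ~ {V, {V, O}}: a 'big V,' i.e., a VP

# On the later view (Chomsky 2013, 2015), Merge just forms the two-member
# set, with a separate labeling algorithm supplying the label afterwards:
def bare_merge(x, y):
    return frozenset({x, y})

assert bare_merge("V", "O") == bare_merge("O", "V")   # no order is imposed
```

Recursive application of such an operation then builds ever larger structures; linearization, as noted above, is left to the PF side.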
Move is the descendant of the transformational component of earlier versions of generative syntax. Chomsky (2004a) proposed that Move is nothing more than a special case of Merge, where the two elements to be merged are an already constructed piece of syntactic structure S which non-exhaustively contains a given category C (i.e., [S … X … C … Y …], where X and Y are not both null), and C itself. This will create the new structure {L,{C,S}} (where L is the label of the resulting structure). Move, then, is a natural occurrence of Merge as long as Merge is not subjected to arbitrary constraints. We therefore expect generative grammars to have, in older terminology, a transformational component. The two cases of Merge, the one combining two distinct objects and the one combining an object from within a larger object to form a still larger one, are known as External and Internal Merge, respectively.

As its name suggests, Agree underlies a range of morphosyntactic phenomena related to agreement, case, and related matters. The essence of the Agree relation can be seen as a relation that copies 'missing' feature values onto certain positions which intrinsically lack them but will fail to be properly interpreted (by PF or LF) if their values are not filled in. For example, in English a subject NP agrees in number with the verb (the boys leave/the boy leaves). Number is an intrinsic property of Nouns, and hence of NPs, and so we say that boy is singular and boys is plural. More precisely, let us say that (count) Nouns have the attribute [Number] with (in English) the values {Singular, Plural}. Verbs lack an intrinsic number specification but, as an idiosyncrasy of English (shared by many, but not all, languages), have the [Number] attribute with no value. Agree ensures that the value of the subject NP is copied into the feature-matrix of the verb (if singular, this is realized in PF as the -s ending on present-tense verbs). It should be clear that Agree is the locus of a great deal of cross-linguistic morphosyntactic variation. Sorting out the parameters associated with the Agree relation is a major topic of ongoing research.

In addition to a lexicon, specifying the idiosyncratic properties of lexical items (including Saussurean arbitrariness), and an operation selecting lexical items for 'use' in a syntactic derivation, the operations of Merge (both External and Internal) and Agree form the core of minimalist syntax, as currently conceived. There is no doubt that this situation represents a simplification as compared to GB theory.

But there is more to the MP than just simplifying GB theory, as mentioned earlier. As we have seen, the question is: why do we find this UG, and not one of countless other imaginable possibilities? One approach to answering this question has been to attempt to bring to bear third-factor explanations of UG. To see what this means, consider again the GB approach to fixing parameter values in language acquisition. A given syntactic structure is acquired on the basis of the combination of a UG specification (e.g., what a verb is, what an object is) and experience (exposure to OV or VO order). UG and experience are, then, the first two factors making up adult competence. But there remains the possibility that domain-general factors such as optimal design, computational efficiency, etc., play a role.
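Returning briefly to Agree: the feature-copying logic just described can be sketched schematically as follows (an illustrative sketch only; the dictionary representation of features is invented for the example, not a proposal from the literature).

```python
# A schematic rendering of Agree as the copying of missing feature values:
# the verb carries the [Number] attribute with no value, and Agree fills it
# in from the subject NP.

def agree(probe, goal):
    """Copy a value for every attribute the probe carries but has not valued."""
    for attribute, value in probe.items():
        if value is None and goal.get(attribute) is not None:
            probe[attribute] = goal[attribute]
    return probe

subject_np = {"Number": "Plural"}   # 'the boys': Number is intrinsic to NPs
verb       = {"Number": None}       # 'leave': attribute present, unvalued

agree(verb, subject_np)
print(verb)   # {'Number': 'Plural'}; a Singular value surfaces as -s at PF
```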
In fact, it is a priori likely that such domain-general factors would play a role in UG; as Chomsky (2005:6) points out, principles of computational efficiency, for example, 'would be expected to be of particular significance for computational systems such as language.' Factors of this kind make up the third factor of the FLB in the sense of Hauser, Chomsky, and Fitch (2002); see chapter 6. In these terms, the MP can be viewed as asking the question 'How far can we progress in showing that all … language-specific technology is reducible to principled explanation, thus isolating core processes that are essential to the language faculty' (Chomsky 2005:11). The more we can bring third-factor properties into play, the less we have to attribute to pure, domain-specific, species-specific UG, and the MP leads us to attribute as little as possible to pure UG.

There has been some debate regarding the status of parameters of UG in the context of the MP (see Newmeyer 2005; Roberts and Holmberg 2010; Boeckx 2011a, 2014; and chapters 14 and 16). However, as I will suggest in section 1.4, it is possible to rethink the nature of parameters in such a way that they are naturally compatible with the MP (see again chapter 14). If so, then we can continue to regard language acquisition as involving parameter-setting as in the GB context; the difference in the MP must be that the parametric options are in some way less richly prespecified than was earlier thought; see section 1.4 and chapters 14 and 16 on this point.

The radical simplification of UG that the MP has brought about has potentially very significant advantages for our ability to understand language evolution. As Berwick and Chomsky (2016:40) point out, the MP boils down to a tripartite model of language, consisting of the core computational component (FLN in the sense of Hauser, Chomsky, and Fitch 2002), the sensorimotor interface (which 'externalizes' syntax), and the conceptual-intentional interface, linking the core computational component to thought. We can therefore break down the question of evolution into three separate questions, and each of these three components may well have a distinct evolutionary history. Furthermore, now echoing Hauser, Chomsky, and Fitch's FLB, it is highly likely that aspects of both interfaces are shared with other species; after all, it is clear that other species have vocal learning and production (and hearing) and that they have conceptual-intentional capacities (if perhaps rudimentary in comparison to humans). So the key to understanding the evolution of language may lie in understanding two things: the origin of Merge as the core operation of syntax, and the linking-up of the three components to form FLB. Berwick and Chomsky (2016:157–164) propose a sketch of an account of language evolution along these lines; see section 1.6. Whatever the merits and details of that account, it is clear that the MP offers greater possibilities for approaching the phylogeny of UG than the earlier approaches did.

This concludes our sketchy outline of the development of UG. Essentially we have a shift from complex rule systems to simplified rule systems interacting in a complex way with a series of conditions on representations and derivations, both in a multi-level architecture of grammar, followed by a radical simplification of both the architecture of the system and the conditions on representations and derivations (although the latter have not been entirely eliminated). The first shift moved in the direction of explanatory adequacy in the sense of Chomsky (1964), i.e., along with the principles-and-parameters formulation it gave us a clear way to approach both acquisition and cross-linguistic comparison (see chapter 5 for extensive discussion of the notion of explanatory adequacy).
The second shift is intended to take us beyond explanatory adequacy, in part by shifting some of the explanatory burden away from UG (the first factor in (1)) towards third factors of various kinds (see chapter 6). How successfully this has been or is being achieved is an open question at present. One area where, at least conceptually, there is a real prospect of progress is language phylogeny; this question was entirely opaque in the first two stages of UG, but with the move to minimalist explanation it may be at least tractable (see again Berwick and Chomsky 2016 and section 1.6).

As a final remark, it is worth considering an interesting terminological proposal made by Rizzi in note 5 to chapter 5. He suggests that we may want to distinguish 'UG in the narrow sense' from 'UG in the broad sense,' deliberately echoing Hauser, Chomsky, and Fitch's FLN–FLB distinction. UG in the narrow sense is just the first factor of (1), while UG in the broad sense includes third factors as well; Rizzi points out that:

As the term UG has come, for better or worse, to be indissolubly linked to the core of the program of generative grammar, I think it is legitimate and desirable to use the term 'UG in a broad sense' (perhaps alongside the term 'UG in a narrow sense'), so that much important research in the cognitive study of language won't be improperly perceived as falling outside the program of generative grammar.

Furthermore, Tsimpli, Kambanaros, and Grohmann find it useful to distinguish 'big UG' from 'small UG' in their discussion of certain language pathologies in chapter 19. The definition of UG given in (2) is in fact ambiguous between these two senses of UG, in that it states that UG 'characterizes the genetically determined aspect of the human capacity for grammatical knowledge.' This genetically determined property could refer narrowly to the first factor or more broadly to both the first and the third factors; the distinction is determined by domain-specificity, in that the first factor is specific to language and the third more general. Henceforth, as necessary, I will refer to first-factor-only UG as 'small' UG and to first-plus-third-factor UG as 'big' UG.

1.1.3 The Organization of this Book

The chapters to follow are grouped into five parts: the philosophical background, linguistic theory, language acquisition, comparative syntax, and wider issues. With the possible exception of the last, each of these headings is self-explanatory; the last heading is a catch-all intended to cover areas of linguistics for which UG is relevant but which do not fall under core linguistic theory, language acquisition, or comparative syntax. The topics covered in this part are by no means exhaustive, but include creoles (chapter 17), diachrony (chapter 18), language pathology (chapter 19), Sign Language (chapter 20), and the question of whether animals have UG (chapter 21). Language evolution is notably missing here, a matter I will comment on below.

The remainder of this introductory chapter deals with each part in turn. The goal here is not to provide chapter summaries, but to connect the chapters and the parts together, bringing out the central themes, and also to touch on areas that are not covered in the chapters. The latter goal is non-exhaustive, and I have selected issues which seem to be of particular importance and relevance to what is covered, although inevitably some topics are left out. The topic of universal grammar is so rich that even a book of this size is unable to do it full justice.

1.2 Philosophical Background

Part I tries to cover aspects of the philosophical background to UG. As already mentioned, the concept of universal grammar (in a general, non-technical sense) has a long pedigree. Like so many venerable ideas in the Western philosophical tradition, it has its origins in Plato. Indeed, following Whitehead's famous comment that 'the safest general characterization of the European philosophical tradition is that it consists of a series of footnotes to Plato' (Whitehead 1979:39), one could think of this entire book as a long, linguistic footnote. Certainly, the idea of universal grammar deserves a place in the European (i.e., Western) philosophical tradition (see again Itkonen 2013; Maat 2013). The reason for this is that positing universal grammar, and its technical congener UG as defined in (2) in section 1.1.1, makes contact with several important philosophical issues and traditions.

First, and most obviously (and most discussed, either explicitly or implicitly, in the chapters to follow), postulating that human grammatical knowledge has a genetically determined component makes contact with various doctrines of innate ideas, all of which ultimately derive from Plato. The implication of this view is that the human newborn is not a blank slate or tabula rasa; rather, aspects of knowledge are determined in advance of experience. This entails a conception of learning that involves the interaction of experience with what is predetermined, broadly a 'rationalist' view of learning. It is clear that all work on language acquisition that is informed by UG, and the various models of learning discussed by Fodor and Sakas in chapter 11, are rationalist in this sense. The three-factors approach makes this explicit too, while bringing in non-domain-specific third factors as well (some of which, as we have already mentioned, may be 'innate ideas' too). Hence, positing UG brings us into contact with the rich tradition of rationalist philosophy, particularly that of Continental Europe in the early modern period.

Second, the fact that UG consists of a class of generative grammars, combined with the fact that such grammars make infinite use of finite means owing to their recursive nature, means that we have an account of what in his earlier writings Chomsky referred to as 'the creative aspect of language use' (see, e.g., Chomsky 1964:7–8). This relates to the fact that our use of language is stimulus-free, but nonetheless appropriate to situations. In any given situation we are intuitively aware of the fact that there is no external factor which forces us to utter a given word, phrase, or sentence. Watching a beautiful sunset with a beloved friend, one might well be inclined to say 'What a beautiful sunset!' but nothing at all forces this, and such linguistic behavior cannot be predicted; one might equally well say 'Fancy a drink after this?' or 'I wonder who'll win the match tomorrow' or 'Wunderbar' or nothing at all, or any number of other things.



The creative aspect of language use thus reflects our free will very directly, and positing UG with a generative grammar in the genetic endowment of all humans makes a very strong claim about human freedom (here there is a connection with Chomsky's political thought), while at the same time connecting to venerable philosophical problems.

Third, if I-language is a biologically given capacity of every individual, it must be physically present somewhere. This raises the issue, touched on in the previous section, of how abstract mental capacities can be instantiated in brain tissue. On this point, Berwick and Chomsky (2016:50) say:

We understand very little about how even our most basic computational operations might be carried out in neural 'wetware.' For example, … the very first thing that any computer scientist would want to know about a computer is how it writes to memory and reads from memory—the essential operation of the Turing machine model and ultimately any computational device. Yet we do not really know how this most foundational element of computation is implemented in the brain….

Of course, future scientific discoveries may remove this issue from the realm of philosophy, but the problem is an extremely difficult one. One possibility is to deny the problem by assuming that higher cognitive functions, including I-language, do not in fact have a physical instantiation, but instead are objects of a metaphysically different kind: abstract, non-physical objects, or res cogitans ('thinking stuff') in Cartesian terms. This raises issues of metaphysical dualism that pose well-known philosophical problems, chief among them the 'mind–body problem.' (It is worth pointing out, though, that Kripke 1980 gives an interesting argument to the effect that we are all intuitively dualists.) So here UG-based linguistics faces central questions in philosophy of mind. As Hinzen puts it in chapter 2, section 2.2:

The philosophy of mind as it began in the 1950s (see e.g. Place (1956); Putnam (1960)) and as standardly practiced in Anglo-American philosophical curricula has a basic metaphysical orientation, with the mind–body problem at its heart. Its basic concern is to determine what mental states are, how the mental differs from the physical, and how it really fits into physical nature.

These issues are developed at length in that chapter, particularly in relation to the doctrine of 'functionalism,' and so I will say no more about them here.

Positing a mentally represented I-language also raises sceptical problems of the kind discussed in Kripke (1982). These are deep problems, and it is uncertain how they can be solved; these issues are dealt with in Ludlow's discussion in chapter 3, section 3.4.

The possibility that some aspect of grammatical knowledge is domain-specific raises the further question of modularity. Fodor (1983) proposed that the mind consists of various 'mental organs' which are ontogenetically and phylogenetically distinct. I-language might be one such organ. According to Fodor, mental modules, which subserve the domain-general central information processing involved in constructing beliefs and intentions, have eight specific properties: (i) domain specificity; (ii) informational encapsulation (modules operate with an autonomous 'machine language,' without reference to other modules or central processors); (iii) obligatory firing (modules operate independently of conscious will); (iv) they are fast; (v) they have shallow outputs, in that their output is very simple; (vi) they are of limited accessibility or totally inaccessible to consciousness; (vii) they have a characteristic ontogeny; and, finally, (viii) they have a fixed neural architecture.

UG, particularly in its second-stage GB conception, has many of these properties: (i) it is domain-specific by assumption; (ii) the symbols used in grammatical computation are encapsulated, in that they appear to be distinct from all other aspects of cognition; (iii) linguistic processing is involuntary: if someone speaks to you in your native language under normal conditions, you have no choice but to understand what is said; (iv) speed: linguistic processing takes place almost in real time, despite its evident complexity; (v) shallow outputs: the interpretative components seem to make do with somewhat impoverished representations; (vi) grammatical knowledge is clearly inaccessible to consciousness; (vii) first-language acquisition studies have shown us that I-languages have a highly characteristic ontogeny (see chapter 12); (viii) I-language may have a fixed neural architecture, as evidence regarding recovery from various kinds of aphasia in particular seems to show (see chapter 19). So there appears to be a case for regarding I-language as a Fodorian mental module.

But this conclusion does not comport well with certain leading ideas of the MP, notably the role of domain-general factors in determining both I-language and 'UG in the broad sense' as defined by Rizzi in chapter 5 and briefly discussed earlier. In particular, points (i), (ii), and (v) are questionable on a three-factor approach to I-language and UG. Further, the question of the phylogeny of modules is a difficult one; the difference between second-phase and third-phase UG with respect to the evolution of language was discussed in section 1.1.2, and essentially the same points hold here. The three-factor minimalist view does not preclude the idea of a language module, but this module would, on this view, be less encapsulated and domain-specific than was previously thought, and less so than a full-fledged Fodorian module (such as the visual-processing system, as Fodor argues at length). Pylyshyn (1999) argued that informational encapsulation was the real signature property of mental modules; even a minimalist I-language/UG has this property, to the extent that the computational system uses symbols such as 'Noun,' 'Verb,' etc., which seem to have no correlates outside language (and in fact raise non-trivial questions for phylogeny). So we see that positing UG raises interesting questions for this aspect of philosophy of mind.

Finally, if there is a 'language organ' or 'language module' of some kind in the mind, a natural question arises by extension concerning what other modules there might be, and what they might have in common with language. Much discussion of modularity has focused on visual processing, as already mentioned, but other areas spring to mind, notably music.
The similarities between language and music are well known and have often been commented on (indeed, Darwin 1871 suggested that language might have its origin in sexual selection for relatively good singing; Berwick and Chomsky 2016:3 call this his 'Caruso' theory of language evolution). First, both music and language are universal in human communities. Concerning music, Cross and Woodruff (2008:3) point out that 'all cultures of which we have knowledge engage in something which, from a western perspective, seems to be music.' They also observe that 'the prevalence of music in native American and Australian societies in forms that are not directly relatable to recent Eurasian or African musics is a potent indicator that modern humans brought music with them out of Africa' (2008:16). Mithen (2005:1) says 'appreciation of music is a universal feature of humankind; music-making is found in all societies.' According to Mithen, Blacking (1976) was the first to suggest that music is found in all human cultures (see also Bernstein 1976; Blacking 1995).

Second, music is unique to humans. Cross et al. (n.d.:7–8) conclude: 'Overall, current theory would suggest that the human capacities for musical, rhythmic, behaviour and entrainment may well be species-specific and apomorphic to the hominin clade, though … systematic observation of, and experiment on, non-human species' capacities remains to be undertaken.' They argue that entrainment (coordination of action around a commonly perceived, abstract pulse) is a uniquely human ability intimately related to music. Further, they point out that, although various species of great apes engage in drumming, they lack this form of group synchronization (12).

Third, music is readily acquired by children without explicit tuition. Hannon and Trainor (2007:466) say:

just as children come to understand their spoken language, most individuals acquire basic musical competence through everyday exposure to music during development … Such implicit musical knowledge enables listeners, regardless of formal music training, to tap and dance to music, detect wrong notes, remember and reproduce familiar tunes and rhythms, and feel the emotions expressed through music.

Fourth, both music and language, although universal and rooted in human cognition, diversify across the human population into culture-specific and culturally sanctioned instantiations: 'languages' and 'musical traditions/genres.' As Hannon and Trainor (2007:466) say: 'Just as there are different languages, there are many different musical systems, each with unique scales, categories and grammatical rules.' Our everyday words for languages ('French,' 'English,' etc.) often designate socio-political entities. Musical genres, although generally less well defined and less connected to political borders than differences among languages, are also cultural constructs. This is particularly clear in the case of highly conventionalized forms such as Western classical music, but it is equally true of a 'vernacular' form such as jazz (in all its geographical and historical varieties).

So a question one might ask is: is there a 'music module'? Another possibility, full discussion of which would take us too far afield here, is that musical competence (or I-music) is in some way parasitic on language, given the very general similarities between music and language; see, for slightly differing perspectives on this question, Lerdahl and Jackendoff (1983), Katz and Pesetsky (2011), and the contributions to Rebuschat, Rohrmeier, Hawkins, and Cross (2012). Other areas to which the same kind of reasoning regarding the basis of apparently species-specific, possibly domain-specific, knowledge as applied to language in the UG tradition could be relevant include mathematics (Dehaene 1997), morality (Hauser 2008), and religious belief (Boyer 1994). Furthermore, Chomsky has frequently discussed the idea of a human science-forming capacity along similar lines (Chomsky 1975, 1980, 2000a), and this idea has been developed in the context of the philosophical problems of consciousness by McGinn (1991, 1993). The kind of thinking UG-based theory has brought to language could lead to a new view of many human capacities; the implications of this for philosophy of mind may be very significant.

As already pointed out, the postulation of UG situates linguistic theory in the rationalist tradition of philosophy. In the early modern era, the chief rationalist philosophers were Descartes, Kant, and Leibniz. The relation between Cartesian thought and generative linguistics is well known, having been the subject of a book by Chomsky (1966/2009), and is treated in detail by McGilvray in chapter 4. McGilvray also discusses the lesser-known Cambridge Neo-Platonists, notably Cudworth, whose ideas are in many ways connected to Chomsky's. Accordingly, here I will leave the Cartesians aside and briefly discuss aspects of the thought of Kant and Leibniz in relation to UG.

Bryan Magee interviewed Chomsky on the BBC's program Men of Ideas in 1978. In his remarkably lucid introduction to Chomsky's thinking (see Magee 1978:174–5), Magee concludes by saying that Chomsky's views on language, particularly language acquisition (essentially his arguments for UG construed as in (2)), sound 'like a translation in linguistic terms of some of Kant's basic ideas' (Magee 1978:175). When it is put to him that 'you seem to be redoing, in terms of modern linguistics, what Kant was doing. Do you accept any truth in that?' (Magee 1978:191), Chomsky responds as follows:

I not only accept the truth in it, I've even tried to bring it out, in a certain way. However I haven't myself referred specifically to Kant very often, but rather to the seventeenth-century tradition of the continental Cartesians and the British Neoplatonists, who developed many ideas that are now much more familiar through the writings of Kant: for example the idea of experience conforming to our mode of cognition. And, of course, very important work on the structure of language, on universal grammar, on the theory of mind, and even on liberty and human rights grew from the same soil. (Magee 1978:191)

Chomsky goes on to say that 'this tradition can be fleshed out and made explicit by the sorts of empirical inquiry that are now possible.' At the same time, he points out that we now have 'no reason to accept the metaphysics of much of that tradition, the belief in a dualism of mind and body' (Magee 1978:191). (The interview can be found at https://www.youtube.com/watch?v=3LqUA7W9wfg.)

Aside from a generally rationalist perspective, there are possibly more specific connections between Chomsky and Kant. In chapter 2, Hinzen mentions that grammatical knowledge could be seen as a 'prime instance' of Kant's synthetic a priori. Kant made two distinctions concerning types of judgments: one between analytic and synthetic judgments, and one between a priori and a posteriori judgments. Analytic judgments are true in virtue of their formulation (a traditional way to say this is that the predicate is contained in the subject), and as such do not add to knowledge beyond explaining or defining terms or concepts. Synthetic judgments are true in virtue of some external justification, and as such they add to knowledge (when true). A priori judgments are not based on experience, while a posteriori judgments are. Given the two distinctions, there are four logical possibilities. Analytic a priori judgments include such things as logical truths ('not-not-p' is equivalent to 'p,' for example). Analytic a posteriori judgments cannot arise, given the nature of analytic judgments. Synthetic a posteriori judgments are canonical judgments about experience of a straightforward kind. But synthetic a priori judgments are of great interest, because they provide new information, yet independently of experience.

Our grammatical knowledge is of this kind; there is more to UG (however reduced in a minimalist conception) than analytic truths, in that the nature of Merge, Agree, and so forth could have been otherwise. On the other hand, this knowledge is available to us independent of experience, in that the nature of UG is genetically determined. In particular, given the SMT, the minimalist notion of UG having the form it has as a consequence of '(virtual) conceptual necessity' may lead us to question this, as it would place grammatical knowledge in the category of the analytic a priori, a point I will return to in section 1.3.

A further connection between Kant and Chomsky lies in the notion of 'condition of possibility.' Kant held that we can only experience reality because reality conforms to our perceptual and cognitive capacities. Similarly, one could see UG as imposing conditions of possibility on grammars. A human cannot acquire a language that falls outside the conditions of possibility imposed by UG, so grammars are the way they are as a condition of possibility for human language.

Turning now to Leibniz, I will base these brief comments on the fuller exposition in Roberts and Watumull (2015). Chomsky (1965:49–52) quotes Leibniz at length as one of his precursors in adopting the rationalist doctrine of innate ideas. While this is not in doubt, there are in fact a number of aspects of Leibniz's thought which anticipate generative grammar more specifically, including some features of current minimalist theory. Roberts and Watumull sketch out what these are, and here I will summarize the main points, concentrating on two things: Leibniz's rationalism in relation to language acquisition, and his formal system. Roberts and Watumull draw attention to the following quotation from Chomsky (1967b):

In the traditional view a condition for … innate mechanisms to become activated is that appropriate stimulation must be presented…. For Leibniz, what is innate is certain principles (in general unconscious), that 'enter into our thoughts, of which they form the soul and the connection.' 'Ideas and truths are for us innate as inclinations, dispositions, habits, or natural potentialities.' Experience serves to elicit, not to form, these innate structures…. It seems to me that the conclusions regarding the nature of language acquisition [as reached in generative grammar] are fully in accord with the doctrine of innate ideas, so understood, and can be regarded as providing a kind of substantiation and further development of this doctrine. (Chomsky 1967b:10)



Roberts and Watumull point out that the idea that language acquisition runs a form of minimum description length algorithm (see chapter 11) was implicit in Leibniz's discussion of laws of nature. Leibniz runs a Gedankenexperiment in which points are randomly distributed on a sheet of paper, and observes that for any such random distribution it would be possible to draw some 'geometrical line whose concept shall be uniform and constant, that is, in accordance with a certain formula, and which line at the same time shall pass through all of those points' (Leibniz, in Chaitin 2005:63). For any set of data, a general law can be constructed. In Leibniz's words, the true theory is 'the one which at the same time [is] the simplest in hypotheses and the richest in phenomena, as might be the case with a geometric line, whose construction was easy, but whose properties and effects were extremely remarkable and of great significance' (Leibniz, in Chaitin 2005:63). This is the logic of program size complexity or minimum description length, as determined by an evaluation metric of the kind discussed earlier and in chapter 11. This point is fleshed out in relation to the following quotation from Berwick (1982:6, 7, 8):

A general, formal model for the complexity analysis of competing acquisition … demands [can be] based on the notion of program size complexity[—]the amount of information required to 'fix' a grammar on the basis of external evidence is identified with the size of the shortest program needed to 'write down' a grammar. [O]ne can formalize the usual linguistic approach of assuming that there is some kind of evaluation metric (implicitly defined by a notational system) that equates 'shortest grammars' with 'simple grammars,' and simple grammars with 'easily acquired grammars.' [Formally, this] model identifies a notational system with some partial recursive function Φi (a Turing machine program) and a rule system as a program p for generating an observed surface set of data…. On this analysis, the more information that must be supplied to fix a rule system, the more marked or more complicated that rule system is…. This account thus identifies simplest with requiring the least additional information for specification[—]simplest = minimum extra information…. The program [model] also provides some insight into the analysis of the ontogenesis of grammar acquisition…. Like any computer program, a program for a rule system will have a definite control flow, corresponding roughly to an augmented flowchart that describes the implicational structure of the program. The flow diagram specifies … a series of 'decision points' that actually carry out the job of building the rule system to output. [The] implicational structure in a developmental model corresponds rather directly to the existence of implicational clusters in the theory of grammar, regularities that admit short descriptions. [T]his same property holds more generally, in that all linguistic generalizations can be interpreted as implying specific developmental 'programs.'
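The core of this logic is easy to render schematically (a toy sketch with invented numbers, not Berwick's actual model): among grammars compatible with the data, prefer the one whose total description length, the grammar plus the data encoded given the grammar, is smallest.

```python
# A schematic minimum-description-length comparison. The bit counts are
# hypothetical, chosen only to illustrate the trade-off: a rote list fits
# the data 'for free' but is enormous to state; a short general rule wins
# once the data need only be indexed against it.

def description_length(grammar_size_bits, data_cost_bits):
    """Total cost: bits to state the grammar + bits to fit the data to it."""
    return grammar_size_bits + data_cost_bits

candidates = {
    "list-of-sentences": description_length(grammar_size_bits=900,
                                            data_cost_bits=0),
    "general-rule":      description_length(grammar_size_bits=50,
                                            data_cost_bits=120),
}

best = min(candidates, key=candidates.get)
print(best)   # 'general-rule': simplest = minimum extra information
```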

They further connect this conception of language acquisition as driven by a form of program size complexity/minimum description length, with its precursors in Leibniz's thought, to the idea developed in Biberauer (2011, 2015) and Roberts (2012) that parameters organize themselves into hierarchies which are traversed in language acquisition following two general third-factor conditions: Feature Economy (posit as few features as possible, consistent with PLD) and Input Generalization (maximize available features); see section 1.4 and chapter 14 for more on these conditions and on parameter hierarchies in general. As they point out, 'the complexities of language acquisition and linguistic generalizations are determined by simple programs—a Leibnizian conclusion evident in the theory of parameter hierarchies' (Roberts and Watumull 2015:8).

Leibniz developed a formal system for combining symbols, as he believed that it was both possible and necessary to formalize the rules of reasoning. In this, he anticipated modern formal logic by well over a century. In the History of Western Philosophy, Russell (1946/2004:541) states that 'Leibniz was a firm believer in the importance of logic, not only in its own sphere, but as the basis of metaphysics. He did work on mathematical logic which would have been enormously important if he had published it; he would, in that case, have been the founder of mathematical logic, which would have become known a century and a half sooner than it did in fact.' In his formal system, Leibniz introduced an operator he wrote as '⊕', which Roberts and Watumull dub 'Lerge,' owing to its formal similarities to Merge:

(3) X ⊕ Y is equivalent to Y ⊕ X.

(4) X ⊕ Y = Z signifies that X and Y 'compose' or 'constitute' Z; this holds for any number of terms.

So ⊕ is a function that takes two arguments, α and β, and from them constructs the set {α,β}. In other words, it is exactly like Merge. Roberts and Watumull then invoke Leibniz's principle of the Identity of Indiscernibles, concluding that if Merge and Lerge are formally indiscernible, they are identical: Merge is Lerge. It thus appears that, in addition to developing central ideas of rationalist philosophy, Leibniz's formal theory anticipated aspects of modern set theory and Merge.

In this section, I have sketched some of the issues that link UG to philosophy, primarily philosophy of mind and philosophy of language. These and many other issues are treated in more detail by Hinzen in chapter 2, on philosophy of mind, and by Ludlow in chapter 3, on philosophy of language. In chapter 4 McGilvray discusses the historical and conceptual links between Cartesian philosophy and generative grammar.

1.3 Linguistic Theory

Part II of this volume deals with general linguistic theory in relation to UG. Here the central concept is that of explanation, and how UG contributes to it. In chapter 5, Rizzi discusses the classical conception of explanatory adequacy as first put forward in Chomsky (1964) and developed extensively in the second stage of UG work. Chapter 6 focuses exclusively on third-factor explanations, concentrating on domain-general aspects of linguistic structure. In chapter 7, Newmeyer contrasts formal and functional approaches to explanation. Chapters 8 and 9 deal with two central areas of linguistic theory, phonology and semantics, discussing how the concept of UG contributes to explanations in these areas.

The classical notion of explanatory adequacy can be described as follows. Chomsky (1964) made the distinction between observational, descriptive, and explanatory adequacy. Regarding observational adequacy, he says:

Suppose that the sentences

(i) John is easy to please.
(ii) John is eager to please.

are observed and accepted as well-formed. A grammar that achieves only the level of observational adequacy would … merely note this fact one way or another (e.g., by setting up appropriate lists). (Chomsky 1964:34)

But of course the natural question to ask at this point is why these sentences are different in the way we can observe. To answer this question, we need to move to the level of descriptive adequacy. On this, Chomsky says: 'To achieve the level of descriptive adequacy, however, a grammar would have to assign structural descriptions indicating that in (i) John is the direct object of please, while in (ii) it is the logical subject of please.' In fact, we could assign the two examples the structural descriptions in (5):

(5) a. John_i is [AP easy [CP Op_i [TP PRO_arb to please t_i ]]]
    b. John_i is [AP eager [CP [TP PRO_i to please pro_arb ]]]

These representations capture the basic difference between the two sentences alluded to in the quotation from Chomsky just given, as well as a number of other facts (e.g., that the subject of the infinitive in (5a) is arbitrary in reference, while in (5b) it is the notional object of please that is arbitrary in reference). But linguistic theory must also tell us why these structural descriptions are the way they are. For example, the notations used in (5), CP, PRO, AP, etc., must be explicated. This brings us to explanatory adequacy: we have to explain how the structural descriptions are determined by UG. If we can do this, we also explain how, in principle, the grammatical properties indicated by representations like those in (5) are acquired (and hence how we, as competent adult native speakers, have the intuitions we have).

In the second stage of UG, the development of the principles-and-parameters approach made it possible to see the grammar of a given language as an instantiation of UG with parameters fixed. So the properties of the English easy-to-please construction can ultimately be explained in these terms. For example, among the parameters relevant to determining the representation of this construction in (5a) are the following: CP follows the head A (rather than preceding it, as in a head-final language; see chapter 14, section 14.3, and below on the head parameter); English has infinitives, and indeed infinitives of this type; arbitrary null pronouns can appear in this context with the set of properties that we observe them to have; the trace is a wh-trace; and so on.
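Structural descriptions like (5a) are, at bottom, labeled, co-indexed trees, and can be rendered as data. The following sketch encodes a simplified version of (5a) (the copula is omitted and the node inventory pared down purely for illustration; the encoding is invented, not a standard format) and recovers the elements sharing the index i:

```python
# A schematic encoding of (5a): nodes are (label, children...) tuples, and
# indexed items are (form, index) pairs.

SD_5A = ("TP", ("John", "i"),
         ("AP", "easy",
          ("CP", ("Op", "i"),
           ("TP", ("PRO", "arb"),
            ("VP", "to please", ("t", "i"))))))

def coindexed(tree, index):
    """Collect every indexed element in the tree bearing the given index."""
    found = []
    if isinstance(tree, tuple):
        if len(tree) == 2 and tree[1] == index and isinstance(tree[0], str):
            found.append(tree[0])
        for child in tree:
            found.extend(coindexed(child, index))
    return found

print(coindexed(SD_5A, "i"))   # ['John', 'Op', 't']: the co-indexed positions
```

Explaining why only trees of this shape are available, i.e., deriving the node labels, the null categories, and the indexing from UG, is what explanatory adequacy demands.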



More generally, the second-stage UG approach offered, really for the first time, a plausible framework in which to capture the similarities and differences among languages within a rigorous formal theory. It also offered a way to approach language acquisition in terms of parameter-setting that was, at least in principle, very straightforward. This can be seen most clearly if we take the view that parametric variation exhausts the possible variation among languages and further assume that there is a finite set of binary parameters. In that case, a given language's set of parameter settings can be reduced to a binary number n. Concomitantly, the task of the language acquirer is to extrapolate n from the PLD. Abstractly, then, the learner can be seen as a function from a set of language tokens (a text, in the sense of the formal learning theory stemming from Gold 1967; see chapter 11) to n. In this way, we have the conceptual framework for the attainment of explanatory adequacy.

As we have already seen, the adoption of the goals of the MP requires us to rethink the principles-and-parameters approach. First, what we might term 'methodological minimalism' requires us to reduce our theoretical machinery as much as possible. This has entailed the elimination of many UG principles, and hence of the traditional locus of parametrization. Second, there is the matter of 'substantive minimalism.' This goes beyond merely a severe application of Occam's Razor by asking the question of why UG, with the properties we think it has, is that way and not some other way. Here, the third-factor-driven approach offers intriguing directions towards an answer. We want to say that I-languages are the way they are because of the way the three factors—'small' UG, PLD and the acquirer's characteristic mode of interaction with it, and third-factor strategies of computational optimization, as well as physical laws—interact. If this can be achieved, and Lohndal and Uriagereka point in some interesting directions in chapter 6, then, in an obvious sense, this takes us 'beyond explanatory adequacy,' to a potentially deeper level of understanding of the general nature of I-language and therefore of UG.

In this connection, Guardiano and Longobardi's discussion of notions of explanation in chapter 16, section 16.2, is instructive. They begin their chapter by listing the questions that any parametrized theory of UG must answer, as follows (their (1)):

(6) a. What are the actual parameters of UG?
    b. What is the format of a possible parameter?
    c. What kind of input sets each of the UG parameters?
    d. How do parameter values distribute in space and time?

As they point out, (6a) relates to Chomsky's (1964) notion of explanatory adequacy, as just introduced, because it concerns the form of UG (on a standard construal of 'where' parameters are to be stated; see chapter 14 and section 14.1.5 on this). As they also note, and as both chapters 14 and 15 discuss, the principles-and-parameters approach has attained 'an intermediate level that we can call crosslinguistic descriptive adequacy, i.e., in accounting for grammatical diversity across languages' (Guardiano and Longobardi, chapter 16, p. 378), but they echo one of the conclusions of chapter 11: there is at present no fully adequate account of how parameter values are set on the basis of PLD in language acquisition.
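To see concretely what such an account would have to deliver, here is an idealized parameter-setting learner in the sense sketched above (a deliberately toy sketch: the two parameters, the PLD encoding, and the triggers are all invented for illustration; chapter 11 reviews actual proposals and their problems).

```python
# With a finite set of binary parameters, a grammar is a binary number n,
# and the learner is a function from a text (a sequence of PLD tokens,
# here simplified to dictionaries of observed surface properties) to n.

PARAMETERS = ["head_final", "null_subject"]      # a hypothetical two-parameter UG

def learn(text):
    """Scan the text and return the grammar as the binary number n."""
    settings = {p: 0 for p in PARAMETERS}        # default (unmarked) values
    for token in text:                           # each token may trigger a reset
        for parameter in PARAMETERS:
            if token.get(parameter):
                settings[parameter] = 1
    bits = "".join(str(settings[p]) for p in PARAMETERS)
    return int(bits, 2)

# A hypothetical mini-text exhibiting OV order and a dropped subject:
text = [{"head_final": True}, {"null_subject": True}]
print(learn(text))   # 3, i.e. binary '11': both parameters set to 1
```

The hard, open questions are precisely the ones this sketch assumes away: which properties of a real utterance count as an unambiguous trigger for a given parameter, and what the learner should do when tokens are noisy or ambiguous between settings.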



26   Ian Roberts They further point out that (6b) and (6d) go ‘beyond explanatory adequacy,’ in that they are concerned with what Longobardi (2003) called ‘evolutionary adequacy’ and ‘historical adequacy,’ respectively. Somewhat different answers to question (6b) are discussed in ­chapters 14 and 16. Finally, as they point out, the Parametric Comparison Method (PCM), which is the main subject matter of c­ hapter 16, is designed precisely to answer question (6d), a question not addressed in earlier work on principles and parameters. Another way to approach the question of what it may mean to go beyond explanatory adequacy is to think in terms of conceptual necessity, again bearing in mind the SMT. For example, it seems conceptually necessary to posit the two interpretative components of the grammar, since the fundamental aspect of language is that it relates sound and meaning, and of course we need a device which actually generates the structures. So the tripartite architecture of the grammar is justified on these grounds, while ‘internal’ levels of representation such as the earlier deep/​D-​structure and surface/​S-​structure are not required on the same grounds. One way to think about conceptual necessity, and indeed substantive minimalism more generally, is to ask oneself the following question: How different from what it actually is could UG have been? Taking a cue from Stephen Jay Gould: if we could rerun the tape of language evolution, how differently could UG turn out? Note that both first-​and second-​stage UG would allow it to be dramatically different (different systems of rule systems; different parameters and conditions on representations/​derivations). As already suggested, the tripartite organization seems conceptually necessary, and Merge seems conceptually necessary in that there must be some kind of combinatorial device and simply combining elements to form sets is about the simplest combinatorial operation possible (Watumull 2015 argues that binary Merge is optimally simple in that Merge must combine more than one thing and binary combination is the simplest form of combination). The grammatical categories and features, as well as operations like Agree, however, seem less obviously conceptually necessary. Here the key issue is arguably that the nature and inventory of grammatical categories and features is poorly understood; in the current state of knowledge we barely have an extensional definition of them, still less an intensional one. However, it is clear in principle that the criterion of conceptual necessity may take us to a higher level of explanation. In this connection, it is worth reconsidering the Kantian synthetic a priori again. If we could successfully reduce the whole of UG to actual conceptual necessity, presumably the system would no longer represent this category of knowledge, but would instead fall under the analytic a priori. This consequence may be of philosophical interest, and it does rather change the complexion of how we might think about grammatical knowledge. In this context, a related question (thinking again of rerunning the tape of evolution) is: how universal is UG? Montague, as already briefly mentioned, had what at first sight seems to be a different concept from Chomsky’s. For Montague, ‘universal grammar’ refers to a general theory of syntax and semantics, which could apply to any kind of system capable of expressing true or false propositions. 
So, as Montague (1970) says: ‘There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians; indeed, I consider it possible to comprehend the syntax



and semantics of both kinds of languages within a single natural and mathematically precise theory' (Montague's position echoes that of Bar-Hillel 1954, discussed in section 1.1.1, although it is significantly stronger). For Montague, natural languages could be described, and perhaps in some sense explained, as formal systems. This conception of universal grammar is mathematical, rather than psychological or biological; as such, it is universal, and the ontological question of what kind of thing universal grammar is becomes a case of the more general question about mathematical objects (which, as we have seen, may have a Platonic answer). Like any mathematical system, Montague's universal grammar would have to be maximally simple and elegant. This approach appears to be very different from, and maybe incompatible with, the idea of a biologically given UG. But if, given the MP, the biological UG has the properties it has out of conceptual necessity, then it might be closer to Montague's conception than would at first sight appear.

Could evolution give rise to a mathematically optimal system? Here we approach the very general question of convergent evolution. Conway-Morris (2003) puts forward a very strongly convergentist position, arguing that the development of complex, intelligent life forms is all but inevitable given the right initial conditions (which may be cosmically very rare, hence the eerie silence of extraterrestrials). In his words:

Evolution is indeed constrained, if not bound. Despite the immensity of biological hyperspace I shall argue that nearly all of it must remain for ever empty, not because our chance drunken walk failed to wander into one domain rather than another but because the door could never open, the road was never there, the possibilities were from the beginning for ever unavailable. (Conway-Morris 2003:12)

Conway-Morris hardly discusses language, but he does say that 'a universal grammar may have evolved as a result of natural selection that optimizes the exploration of a "language space" in terms of rule-based systems' (Conway-Morris 2003:253). From an MP perspective, one could interpret this comment as supporting the role of third factors in language evolution. As Conway-Morris points out at length, a consequence of his view is that extraterrestrial biology—where it exists at all—will be very similar to terrestrial biology. If language evolved in the same way as other biological systems (and this does not have to imply that natural selection is the only shaping force; see the discussion in Berwick and Chomsky 2016:16–49), then it would follow from Conway-Morris' position, in a way that is fully consistent with the MP notion of conceptual necessity, that extraterrestrial UG would be very similar to terrestrial UG. UG may be more universal than we once thought.

1.4  Language Acquisition

The idea that we may learn something about universal grammar by observing children acquiring language is also an old one. Herodotus relates the story of the Pharaoh



Psammetichus (or Psamtik) I, who sought to discover the origin of language by giving two newborn babies to a shepherd, with the instructions that no one should speak to them, but that the shepherd should feed and care for them while listening to determine their first words. When one of the children said 'Bekos,' the shepherd concluded that the word was Phrygian, because that was the Phrygian word for 'bread.' Hence it was concluded that Phrygian was the original language. It is not known whether the story is true, but it represents a valid line of thought regarding the determination of the origin of language and possibly of universal grammar, as well as a very strong version of the innateness hypothesis. More recently, children brought up without normal interaction with adults have also been alleged to provide evidence for some version of the idea that the language faculty is innate (see, e.g., Rymer 1994 on the 'wild child' Genie, as well as Kegl, Senghas, and Coppola 1999 on the development of a new sign language among a group of deaf children).

In the present context, the importance and relevance of the circumstances of language acquisition are self-evident. Accordingly, chapters 10 through 13, making up Part III of the volume, deal explicitly with this topic from different perspectives, and the topic is taken up in several other chapters too (see in particular chapters 5, 14, 17, 18, and 19).

Chapter 10 deals with the most important argument for UG, the argument from the poverty of the stimulus. The poverty-of-the-stimulus argument is based on the observation that there is a significant 'gap' between what seems to be the experience facilitating first-language acquisition and the nature of the linguistic knowledge which results. An early statement of this argument is given by Chomsky (1965:58) as follows:

A consideration of the character of the grammar that is acquired, the degenerate quality and narrowly limited extent of the available data, the striking uniformity of the resulting grammars, and their independence of intelligence, motivation, and emotional state, over wide ranges of variation, leave little hope that much of the structure of the language can be learned by an organism initially uninformed as to its general character.

Similarly, in his very lucid introduction to the general question of the nature of the learning problem for natural languages, Niyogi (2006) points out that, although children's learning resources are finite, the target is not completely 'unknown': it falls within a limited hypothesis space. As Niyogi points out, it is an accepted result of learning theory that this must be the case: the alternative of a tabula rasa imposes an insuperable learning problem on the child (a toy illustration of this point is sketched below). This is the essence of the argument; it is discussed in detail, along with various counterarguments that have been put forward, by Lasnik and Lidz in chapter 10.

In chapter 11, Fodor and Sakas discuss the question of learnability: what kind of evidence is needed for what kind of system such that a learner can converge on a grammatical system. This question turns out to be more difficult than was once thought, as the authors show. In chapter 12, Guasti summarizes what is known about first-language acquisition, and in chapter 13 Sprouse and Schwartz compare non-native acquisition with first-language acquisition, arguing that the two cases are more similar than is often thought.
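To fix ideas about why a limited hypothesis space matters, here is a toy sketch in Python. It is invented purely for illustration and is not from Niyogi (2006); the 'grammars' and 'sentences' are stand-ins, not models of real languages. If the target grammar is guaranteed to lie in a small finite space of candidates, simple elimination on the basis of positive data converges; an unconstrained space offers no such guarantee.

    # Toy illustration of learning within a finite hypothesis space.
    # The two binary 'parameters' and the strings they license are invented.
    from itertools import product

    def grammar(p1, p2):
        # p1: head-initial (VO) vs. head-final (OV); p2: null subjects or not
        licensed = {'VO' if p1 else 'OV'}
        if p2:
            licensed.add('drops-subject')
        return licensed

    # Hypothesis space: the four possible settings of the two parameters
    space = {s: grammar(*s) for s in product([0, 1], repeat=2)}

    def learn(pld):
        """Eliminate every candidate grammar inconsistent with the PLD."""
        candidates = dict(space)
        for datum in pld:
            candidates = {s: g for s, g in candidates.items() if datum in g}
        return candidates

    print(learn(['VO']))                   # two candidates survive
    print(learn(['VO', 'drops-subject']))  # converges on the setting (1, 1)

Because the learner starts from four candidates rather than from an unbounded space, every datum cuts the space down and convergence on finite data is guaranteed; nothing comparable holds for a tabula rasa learner.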



The three stages of UG that we introduced in section 1.1.2 are, perhaps not surprisingly, mirrored in the development of acquisition studies. In the first stage, the emphasis was on acquisition of rule systems and the nature of the evaluation metric; in the former connection, Wexler and Culicover (1980) was a major learnability result, as Fodor and Sakas discuss in chapter 11. In the second stage, parameter-setting was the major concern. Naturally, in this connection comparative acquisition studies came to the fore, led by the pioneering work of Hyams (1983, 1986). In the third stage, there has been more discussion of third factors. In this connection, work by Biberauer (2011, 2015) and Roberts (2012) emphasizing the role of the third-factor conditions Feature Economy (FE) and Input Generalization (IG) has been important. FE and IG were introduced briefly earlier, and can be more fully stated as follows:

(7) (i) Feature Economy (FE) (based on Roberts and Roussou 2003:201):
        Postulate as few formal features as possible.
    (ii) Input Generalization (IG) (based on Roberts 2007:275):
        Maximize available features.

Together, these conditions form an optimal search/optimization strategy of a minimal kind. Roberts (2012) argues that these third factors interact with the other two factors, 'small' UG and the way in which the child interacts with the PLD, to give rise to parametric variation (expressed in the form of parameter hierarchies) as an emergent property, answering the otherwise difficult 'where' question regarding parameters in a minimalist architecture; see the discussion in chapter 14. A potentially very important development of this idea is found in Biberauer (2011, 2015), Biberauer and Roberts (2015a,b, and in press). There it is proposed that not just parameters (seen as a subset of the set of formal features of UG, see chapter 14), but also the features themselves may be emergent in this way. If this conjecture turns out to be correct, then 'small' UG will be further emptied, and indeed the aspect of UG that perhaps most clearly represents the synthetic a priori will be removed from it, moving us further in the direction of conceptual necessity and a genuinely universal UG.
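To make the intended logic of FE and IG concrete, here is a toy sketch in Python. It is purely illustrative and is not an implementation of Roberts (2012) or of Biberauer's proposals; the heads, feature names, and encoding of the PLD are all invented. The learner postulates a formal feature only when the input forces it to (FE), and, having postulated it, generalizes it to every head unless the input contradicts this (IG):

    # Toy learner biased by Feature Economy (FE) and Input Generalization (IG).
    # Heads, features, and the 'PLD' encoding are invented for the sketch.

    def acquire(heads, positive, counter):
        # positive: (head, feature) pairs attested in the PLD
        # counter:  (head, feature) pairs the PLD rules out
        grammar = {h: set() for h in heads}   # FE: postulate nothing initially
        for (h, f) in positive:
            grammar[h].add(f)                 # forced by the data
            for other in heads:               # IG: maximize the feature
                if (other, f) not in counter:
                    grammar[other].add(f)
        return grammar

    # Hypothetical mini-PLD: verbs visibly agree; prepositions visibly do not
    heads = ['V', 'N', 'P']
    positive = {('V', 'phi-agreement')}
    counter = {('P', 'phi-agreement')}
    print(acquire(heads, positive, counter))
    # {'V': {'phi-agreement'}, 'N': {'phi-agreement'}, 'P': set()}

The 'nothing unless forced, then everywhere unless contradicted' behavior is one way of picturing how maximally general (macro-like) settings could arise before more fine-grained retreats under counterevidence, i.e., how hierarchies of progressively narrower options could emerge from the interaction of FE, IG, and the PLD.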

1.5 Comparative Syntax

Since UG is the general theory of I-languages, a major concern of linguistic theory is to develop a characterization of a possible human grammar (Chomsky 1957:50). Hence the question of language universals and comparative linguistics plays a central role (but see 1.1.1 on how the notions of 'language universal' and 'UG' are nonetheless distinct). Part IV of the volume treats comparative syntax. Syntax has a whole part to itself since it is above all in this area that the principles-and-parameters approach to UG, both in its second and third stages, has been developed and applied; the details of this are given by Huang and Roberts in chapter 14. In chapter 15 Holmberg discusses language



typology, and shows in detail how the Greenbergian cross-linguistic descriptive tradition is connected to and distinct from the Chomskyan tradition of looking for language universals as possible direct manifestations of UG ('big' or 'small'). As already discussed, Guardiano and Longobardi introduce, defend, and illustrate the Parametric Comparison Method (PCM) in chapter 16, a method which, as we saw in the previous section, may afford a new kind of historical adequacy for linguistic theory.

The idea that all languages are, as it were, cut from the same cloth is not new. This is not the place, and I am not the person, to give even an approximate account of the development of ideas about language universals in the Western tradition of linguistic thought. Suffice it to say, however, that the tension between the easily observable arbitrary nature of linguistic form (famously emphasized by de Saussure 1916, but arguably traceable to the Platonic dialogue Cratylus; see Robins 1997:24; Seuren 1998:5–9; Law 2003:20–23; Graffi 2013; Itkonen 2013; Maat 2013; Magnus 2013; Mufwene 2013) and the widely accepted fact that there is a single independent non-linguistic reality 'out there' which we seem to be able to talk to each other about using language, has long been recognized in different ways in thought about language. To put it very simplistically, if there is a single reality and if we are able to speak about it in roughly the same way whether, like Plato, we speak Classical Greek, or, like de Saussure, Genevan French, or, like me, Modern British English, then something must be universal, or at least very general, to language. More specifically, if logic reflects universal laws of thought, and if logical notions can be expressed through language, then we can observe something universal in language; negation may be a good example of this, in that negation is arguably a universal category of thought which receives arbitrarily differing formal expression in different languages.

If something is universal, then it becomes legitimate to ask how much is universal—this is one way of asking what is universal. And of course it is easy to observe that not everything is universal. It only takes the briefest perusal of a grammar or dictionary of a foreign language to recognize that there must be many features of English which distinguish it from some or all other languages. What these are is a matter for empirical investigation. It is here that we see how the study of how languages differ is the other side of the coin of the study of universals (on this point, see also Kayne 2005a:3), and hence the importance of work on, and theories of, comparative syntax for understanding UG (again, 'big' or 'small').

Another way of approaching the question of universals is to look at the origin of languages. If all languages developed from a common ancestor, or if they have all developed in the same way, or if we could show that language was somehow bestowed on humanity by some greater power (the name-giver of the Cratylus, God, Darwinian forces of selection and mutation, conceptually necessary laws governing the design of complex systems, inexorable cosmic forces of convergent evolution, etc.), then our question would surely be answered.
We would be able to identify as universal what remains of the original proto-language in all currently existing languages, or what is created by universal processes of change, or the remnants of what was given to us by the higher power (on language change and UG, see chapter 18).

Just as we can isolate three stages in the development of UG since the 1950s, we can observe three stages in the development of comparative syntax. In the first stage, it was



supposed that differences among languages could be accounted for in terms of differing rule systems. For example, a VO language might have the PS-rule VP → V NP, while an OV language would have VP → NP V. Clearly there is little in the way of a theory of cross-linguistic variation here. In the second phase of UG, the highly simplified X′-schema which regulates the generation of D-Structures was parametrized. In particular, the Head Parameter was proposed (see chapter 14, section 14.3.1, for discussion and references):

(8) In X′, X {precedes/follows} its complement YP.

The grammar of each language makes a selection between precede and follow, giving rise to VO (and general head-initiality) or OV (and general head-finality). (8) itself is part of UG. (A toy illustration of the effect of such a setting is given at the end of this section.) The goal of comparative syntax at this stage was to discover the class of parameters, and thereby the form of parameters, and ultimately to enumerate their values for each language (see also chapter 16 and the discussion around (6)).

In the third stage, the development of the MP led to some difficulties for this notion of parameter (see in particular Boeckx 2011a, 2014). Building on an earlier proposal in Borer (1984), Chomsky (1995b) proposed that parameters were lexically encoded as differing values of a subset of formal features of functional heads (e.g., 'φ-features,' i.e., person, number, and gender, Case, categorial and other features; as already mentioned, no exhaustive list of these features has been proposed). This, combined with empirical results that largely indicated that the large-scale clustering of abstract morphosyntactic properties predicted by the second-stage concept of parameter of UG was not readily found, led to the development of 'micro-parametric' approaches to variation (see in particular Kayne 2005a, Baker 2008b, and chapters 14 and 16). Although clearly descriptively adequate, it is unclear whether this approach has the explanatory value of the second-stage notion of parameter, notably in that it leads to a proliferation of parameters and therefore of parameter values and therefore of grammars. The range of cross-linguistic variation thus seems to be excessively unconstrained.

Beginning with Gianollo, Guardiano, and Longobardi (2008), a different approach began to arise, which, following Baker (1996, 2008b), recognized the need for macroparameters in addition to microparameters. Roberts (2012) initiated a particular concept of parameter hierarchy (developing an idea first proposed in Baker 2001). As described earlier, this approach treats parameters and parameter hierarchies as emergent properties created by the interaction of ('small') UG, the PLD, and the two third-factor conditions in (7): Feature Economy (FE) and Input Generalization (IG). For more details, see the discussion and references in chapter 14. Guardiano and Longobardi, in chapter 16, section 16.9, develop further the concept of Parameter Schemata, first proposed in Gianollo, Guardiano, and Longobardi (2008).

More recently, Biberauer and Roberts (2015c) and Roberts (2015) have suggested a rather different departure from a purely microparametric approach. Pursuing the long-noted analogy between parameters and genes (see the discussion of Jacob in Chomsky 1980:67), they suggest that, just as there are 'master genes' (genes which control other



genes), so there are 'master parameters.' Taking a term from genetics, they call these pleiotropic parameters. Such parameters are 'deep' parameters which profoundly influence the overall shape of a grammatical system. They call features having this parametric property pleiotropic formal features (PFFs), and identify a small number of them (Person, Tense, Case, and Order). This is a very tentative proposal at an early stage of development, but it may represent an important development for the theory of parameters, and therefore of ('big') UG.
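As a concrete, purely illustrative picture of what a directionality parameter like (8) does, the following Python sketch linearizes a toy head–complement structure according to a single precedes/follows setting. The categories and the example structure are invented, and the sketch abstracts away from everything just discussed (specifiers, movement, and the micro-/macroparametric distinction):

    # Toy illustration of the Head Parameter in (8): one binary setting
    # linearizes every head-complement pair. The example tree is invented.

    def linearize(node, head_initial=True):
        """node is either a word (str) or a (head, complement) pair."""
        if isinstance(node, str):
            return [node]
        head, comp = node
        h = linearize(head, head_initial)
        c = linearize(comp, head_initial)
        return h + c if head_initial else c + h

    # [VP read [DP the book]] as nested (head, complement) pairs
    vp = ('read', ('the', 'book'))

    print(' '.join(linearize(vp, head_initial=True)))
    # read the book   (uniform head-initiality: VO, English-like)
    print(' '.join(linearize(vp, head_initial=False)))
    # book the read   (uniform head-finality: OV, with D also following NP)

A single binary choice that fixes the order of every head–complement pair is precisely the kind of large-scale clustering prediction that the second-stage parameter made, and that micro-parametric work subsequently found to be only partially borne out.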

1.6 Wider Issues

Part V deals with a range of wider issues that are of importance for UG, and where UG can inform, and has informed, discussion and research. This includes creoles and creolization (Aboh and deGraff, chapter 17), diachronic syntax (Fuß, chapter 18), language pathology (Tsimpli, Kambanaros, and Grohmann, chapter 19), Sign Language (Cecchetto, chapter 20), and the question of whether animals have UG or some correlate (Boeckx, Hauser, and Samuels, chapter 21). There are numerous interconnections among these chapters, and with the chapters in Parts I–IV, and they attest to the fecundity of the idea of UG across a wide range of domains.

Of course, as I already mentioned, the topics covered are not exhaustive. Two areas not included which spring to mind are music and neurolinguistics in relation to UG (but see the remarks on music in section 1.2; neurolinguistics is discussed in relation to pathology in chapter 19). But the largest lacuna is language evolution, a question whose importance for linguistic theory and any notion of UG is self-evident. Although there is no separate chapter on language evolution, the topic has been touched on several times in the foregoing discussion, and is also discussed, to varying levels of detail, in chapters 2, 4, 6, 14, 16, 18, and 21. Furthermore, there is an entire volume devoted to language evolution in this very series (Tallerman and Gibson 2012; see in particular chapters 1, 13, 15, 31, 52, 56, and 65 of that volume). Most relevant in the current context, however, is Berwick and Chomsky (2016).

Berwick and Chomsky begin by recognizing the difficulties that the question of language evolution has always posed. There are basically three reasons for this. First, language is unique to humans, and so cross-species comparative work cannot tell us very much (although see Hauser, Chomsky, and Fitch 2002 and chapter 21 for some indications). Second, the parts of the human anatomy which subserve language (brains, larynxes, etc.) do not survive in the fossil record, meaning that inferences have to be made on the basis of such things as surviving pieces of crania or hyoid bones. Third, it seems that the evolution of language was a rather sudden event, and this goes against the traditional Darwinian idea of evolution proceeding through an accumulation of small changes. Berwick and Chomsky critique the idea that evolution can only proceed incrementally, pointing out that many evolutionary developments (including the origin of complex cells, and eyes; Berwick and Chomsky 2016:37) are of a similarly



non-micromutational nature, and that modern evolutionary theory can and does take account of this. The general account they give of language evolution can be summarized by the following extended quotation:

In some completely unknown way, our ancestors developed human concepts. At some time in the very recent past, apparently some time before 80,000 years ago if we can judge from associated symbolic proxies, individuals in a small group of hominids in East Africa underwent a minor biological change that provided the operation Merge—an operation that takes human concepts as computational atoms and yields structured expressions that, systematically interpreted by the conceptual system, provide a rich language of thought. These processes might be computationally perfect, or close to it, hence the result of physical laws independent of humans. The innovation had obvious advantages and took over the small group. At some later stage, the internal language of thought was connected to the sensorimotor system, a complex task that can be solved in many different ways and at different times. (Berwick and Chomsky 2016:87)

From this quotation we see the role of third factors ('physical laws independent of humans'), and the all-important point that the problem is broken down into three parts: the development of concepts (which remains unexplained), the development of Merge ('a minor biological change'), and the development of 'externalization,' i.e., the link to the articulatory-perceptual interface (roughly speaking, in spoken language, morphology, phonology, and phonetics). The evolution of ('small') UG then amounts to the aforementioned 'minor biological change.' On this view, then, the three factors of language design all have separate histories: the first factor is that 'minor biological change' giving rise to Merge (essentially seen as the sole content of UG; see Hauser, Chomsky, and Fitch 2002); the second factor could not arise until the system was externalized (note that once PLD became possible through externalization, language change could start—see chapter 18); the third factors are, at this level of generality, rooted in physical law and hence not subject to biological evolution.

In their fourth chapter, Berwick and Chomsky (2016:109ff) treat language evolution as a mystery, and thus the narrative is a whodunnit: what, who, where, when, how, and why. Their solution to the whodunnit is summarized as follows, and again I quote at length:

• 'What' boils down to the Basic Property of human language—the ability to construct a digitally infinite array of hierarchically structured expressions with determinate interpretations at the interfaces with other organic systems [footnote omitted]
• 'Who' is us—anatomically modern humans—neither chimpanzees nor gorillas nor songbirds
• 'Where' and 'When' point to sometime between the first appearance of anatomically modern humans in southern Africa roughly 200,000 years ago, prior to the last African exodus approximately 60,000 years ago….



• 'How' is the neural implementation of the Basic Property—little understood, but recent empirical evidence suggests that this could be compatible with some 'slight rewiring of the brain,' as we have put it elsewhere.
• 'Why' is language's use for internal thought, as the cognitive glue that binds together other perceptual and information-processing systems. (Berwick and Chomsky 2016:110–111)

The remainder of the chapter fleshes out each of these points in detail, with a particularly interesting proposal regarding the 'slight rewiring of the brain' mentioned under 'How?' (see Berwick and Chomsky 2016:157–164). If the assiduous and diligent reader of this volume, on reaching the end of chapter 21, feels a need to complete the experience by reading a chapter on UG and language evolution, I recommend they turn to chapter 4 of Berwick and Chomsky (2016).
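Since the 'Basic Property' just quoted centres on Merge as simple set formation, a minimal sketch in Python may help fix ideas. It illustrates only the general notion of binary, set-forming Merge discussed in this chapter; it is not code from any of the works cited, and the lexical items are invented:

    # Minimal illustration of binary Merge as set formation: combining two
    # syntactic objects yields the set containing just them, and reapplying
    # the operation to its own output gives unbounded hierarchical structure.

    def merge(a, b):
        # frozenset, so that merged objects can themselves be merged
        return frozenset({a, b})

    dp = merge('the', 'book')   # {the, book}
    vp = merge('read', dp)      # {read, {the, book}}
    print(vp)

Note that the output is unordered: on the view summarized above, linear order is a matter of externalization, imposed only when the internal structures are connected to the sensorimotor system.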

1.7 Conclusion

One thing which should be apparent to the reader who has got this far is that the area of study that surrounds, supports, questions or elaborates the notion (or notions) of universal grammar is extremely rich. My remarks in this introductory chapter have been extremely sketchy for the most part. The chapters to follow treat their topics in more depth, but each is just an overview of a particular area. It is no exaggeration to say that one could spend a lifetime just reading about UG in its various guises.

And this is how it should be. The various ideas of UG and universal grammar entertained here represent serious attempts to understand arguably the most extraordinary feature of our species: our ability to produce and understand language (in a stimulus-free yet appropriate manner) in order to articulate, develop, communicate and store complex thoughts.

Like any bold and interesting idea, UG has its critics. In recent years, some have been extremely vocal (see in particular Evans and Levinson 2009; Tomasello 2009). Again this is as it should be: all good ideas can and should be challenged. Even if these critics turn out to be correct (although I think it is fair to say that reports of the 'death of UG' are somewhat premature), at the very minimum the idea has served as an outstandingly useful heuristic for finding out more about language. But one is naturally led to wonder whether the idea can really be wholly wrong if it is able to yield so much, aside from any intrinsic explanatory value it may have. Both that fecundity and the explanatory force of the idea are amply attested in the chapters to follow.



Part I

PHILOSOPHICAL BACKGROUND





Chapter 2

Universal Grammar and Philosophy of Mind
Wolfram Hinzen

2.1 Introduction

The traditional term 'universal grammar' has acquired a narrower technical sense in 20th-century linguistics, namely that of the theory of the genetic basis for language acquisition. However, the purpose of contextualizing this enterprise with respect to philosophy invites us to consider it also in its more general and traditional sense. In this sense the project of a universal grammar, dating back to the earliest beginnings of human scientific reflection (Matilal 1990), is that of a 'general' or 'philosophical' grammar, which as such is not a prescriptive grammar and views itself as a science. It aims to go beyond description so as to achieve a rational explanation for linguistic phenomena. Its basic concern is the nature of grammar as such, rather than any language-specific grammar, as we would expect from any scientific approach to human language. Empirically speaking, the universality of grammatical organization as processed in the brains of neurotypical humans is not in doubt. A general grammar is thus an obvious interest we may have.

Such an interest is related to but cannot be equated with an interest in linguistic universals, the interest of the typologist. In typology we aim to detect elements or patterns existing in all human languages, without necessarily an interest in accounting for their existence in a general theory of grammar. It is also not the same as a search for the genetic basis of human language, which is an inquiry into the biological source of the patterns that our general grammar will come up with. It is therefore also not inherently tied to some hypothesis of 'innateness.' Rather, a general grammar as such merely comprises a rational pattern of linguistic organization characterized in principled terms. How such patterns come to be acquired in infancy, how they evolved in hominid evolution, how and where they are processed in the human brain, and what their genetic or neurophysiological basis is are all related but further questions. A theory of grammar will only



partially depend on their respective answers, and those who reject a language-specific genetic endowment in humans (e.g., Tomasello 2008; Evans and Levinson 2009) still owe us a general grammar in the above sense, which competes with other such grammars on its own empirical grounds, however the issue of acquisition is settled.

Why should a philosopher be interested in a general grammar? A philosopher, for example, would clearly not normally have an interest in the grammar of, say, Korean, which in part is a contingent matter of history and convention, of no greater interest to the philosopher than the grammar of Russian. But a philosopher would naturally have an interest in language as such because of the foundational role it plays in human existence and in reflecting general properties of our minds, in which universal aspects of language may be grounded, or whose universal aspects language may ground. The general grammars of the 17th century, culminating in the landmark Grammaire du Port Royal (1660), sought to ground grammar in the rational and universal principles of human thought and in logic, thus illustrating the former option; some paradigms of late Paninian linguistics in Ancient India (Matilal 1990; Chaturvedi 2009), as well as arguably Modistic universal grammar (Covington 2009), illustrate the latter.

Independently of such specific takes, it is clear that our sapiens-specific form of mental organization is paradigmatically revealed in language, of which grammar is one of the two primary organizational principles, the other being the lexicon. Put differently, there are words, and then there are the relations between them. Hence a philosopher might ask about the nature and significance of grammatical organization and what it tells us about our minds. Moreover, the 'concepts' that philosophers and psychologists commonly take to be the units making up our thoughts are in turn closely related to words, which we might think of as their names, i.e., to lexical organization. Issues of the specificity of both our mindset and of language will enhance this interest, given current evidence that grammatical organization is highly limited in animal communication systems in the wild1 and beyond the scope of what even the most extensive 'language training' regimes have been able to implant in the chimpanzee mind (see Tomasello 2008 for review; and chapter 21). Without grammar, however, there can be no language in the human sense, and without language, as distinct from communication, there would be no human culture and history; and if we subtracted language from human social interactions, these would barely look the same. Having language, much of our communication may be nonverbal, yet without a foundation of language, what would our minds and communications be like? Would there be knowledge? In what sense? What knowledge do we have that animals lack, which makes us acquire language?

1 Birdsong, say, while having a (limited) formal syntax (Berwick et al. 2011), has no grammar in the sense of anything that goes with the forms of meaning found in human language. Suzuki et al. (2016) claim to have the 'first unambiguous experimental evidence for compositional syntax in a non-human vocal system,' namely Japanese great tits (Parus minor). However, the evidence concerns composite behavior, not compositional meaning, and involves no evidence for headedness, an inherent aspect of word combination in humans. Arnold and Zuberbuehler's (2008) evidence for 'meaningful combinatorial signals' in primate communication (putty-nosed monkeys) involves no compositionality insofar as the meaning of individual calls is not preserved.



These questions are philosophical ones, which we expect to impact on the philosophies of mind and of language, and on epistemology. How modern linguistics has contributed to their answers and interfaced with the philosophy of mind is the primary topic in this chapter. From its inception, Chomsky's work on what came to be called 'generative grammar' has been linked to the hope that the 'technical study of language structure can contribute to an understanding of human intelligence' (Chomsky 1968/1972/2006:xiv)—in other words, that the formal study of language structure will reveal general properties of mind that define human nature. A review of the extent to which this hope has been borne out over the last half-century will be a central concern in section 2.3, after reviewing some basic paradigms in the philosophy of mind in the following one. In section 2.4, our focus is more on the future and current visions for where and how linguistics might prove to have a transformative influence on philosophy.

2.2  What Is Philosophy of Mind?

The philosophy of mind as it began in the 1950s (see, e.g., Place 1956; Putnam 1960) and as standardly practiced in Anglo-American philosophical curricula has a basic metaphysical orientation, with the mind–body problem at its heart. Its basic concern is to determine what mental states are, how the mental differs from the physical, and how it really fits into physical nature. One of its major paradigms is functionalism, which had formed by the 1970s (e.g., Putnam 1960; Fodor 1968), seeking to specifically define mental states, not in terms of some special 'stuff' ('mind stuff'), but in terms of what they do—i.e., the function they carry out, no matter what they are made of. Thus, a state of your watch functions so as to indicate the time—it does not matter whether it is made of plastic, steel, aluminium, or rubber. Similarly, one of your internal states—such as thinking of your friend Alice and regretting her departure—might be defined by its cognitive function or role. As such it happens to be carried out by your physical brain, yet it perhaps need not be, and a non-physical creature such as a computer, hypothetical Martian, or android might have the same mental state, as long as their internal states carry out the same functional roles.

A function is as such something abstract. If characterized as a way of transforming a certain physically defined input state (e.g., marks on paper, in the case of a computer) into an equally physically defined output state (other marks on paper), it is surely nothing 'mental.' Therefore, the riddle posed by the mental is solved. As something abstract, the mind is non-physical, yet it is physically 'realized,' and potentially multiply so. Recent metaphysical debate has especially revolved around the exact relation of abstract, functionally characterized 'mental state types' to their contingent physical embodiments.

This answer to the question of what mental states are illustrates the essentially metaphysical agenda of the philosophy of mind. No actual definitions of any mental states in purely functional terms have ever been given, to my knowledge. Rather the concept is defended by appeal to the intuition that feeling pain might just mean something like being in a state with particular causal antecedents and consequences



(e.g., bodily injury mechanically causing the belief that something is wrong with the body, then the desire to be out of that state, then wincing, etc.). The discussion, in short, is about what it might be, in general and in foundational terms, to feel pain, and how to think of the ontology of such a state.

The essential starting point is, as with the identity theory of mind and brain that preceded functionalism, the agenda of refuting the metaphysical dualism of Descartes, who did, in the modern reading of the philosophy of mind, distinguish the mental as a distinct kind of 'stuff' (the 'mental substance') from the physical one. Rejecting this dualism is virtually a condition for philosophical respectability in the philosophy of mind today (for some dissenters in philosophy, though, see the 'non-Cartesian substance dualism' of Meixner 2004 or Lowe 2008; for some dissenters within physics, see Penrose 1994 and papers collected in Hinzen 2006c). Functionalism remains closely related to behaviorism (e.g., Skinner 1957), to which it is similar not only in (i) its basic focus on the functional analysis of behavior, but also in (ii) its metaphysics (inimical to the mental as distinct from the physical) and (iii) its externalism (i.e., the tendency to see the content of internal states of organisms as reducing to their external causal connections, though in functionalism such internal states can also be the causes and effects of other such internal states).

Closely connected to the philosophy of mind's metaphysical agenda of refuting Descartes and making materialism viable has been an agenda in the philosophy of science, which is concerned with the relation between scientific theories, and in particular with what it means to reduce one theory to another—chemistry to physics, say, or, closer to the case at hand, psychology (dealing with mental states) to biology (dealing with organic states). Traditional materialism is generally redefined as 'physicalism,' the doctrine that physical states (as defined by whatever physics takes them to be) are the only states there are. On this view biological, psychological, linguistic, moral, or social states do not add anything to the ontology of the real and need to be encompassed somehow by the physical (reduce to, depend on, or 'supervene' on it, or else simply not exist).

Although the physicalist consensus in the second half of the 20th century has been almost unanimous in philosophy, it is worth noting that it is not actually based on any particular empirical scientific discoveries, though advances in the formal sciences proved crucial.2 There was no specific discovery in the brain sciences that led U. T. Place to proclaim the mind/brain identity theory in 1956, according to which particular mental states are identical to certain states of the brain. That the mind depends on the brain, in particular—as strokes, dementia, alcohol, acute states of being in love, and bumps to the head all reveal in different ways—cannot have been an observation lost on Descartes, who was a practicing scientist spending most of his time working out the organic functions of the human body.3 Rather, the basic foundation for the modern philosophy of

2 In particular, Turing's (1950) way of making the concept of a universal calculating machine precise, which influenced functionalism but which Chomsky consistently interpreted as not supporting any particular metaphysics of the mind at all (see, e.g., Chomsky 2000a:44–45, 114).
3 Hinzen (2006c) is a special issue partially dedicated to showing that within science (physics and psychology), forms of dualism actually have been defended on a number of empirical grounds.



mind from the 1950s is that somehow there is a basic problem with the mental: among the things there are in nature, the mental somehow does not fit. So something needs to be done about it—eliminate it, reduce it to, or identify it with the physical, or reduce it to a mere 'stance.' The spectrum of options eventually broadened so as to now include:

(i) Instrumentalism, which denied that states such as 'thinking' can form the basis of a scientific investigation of the mind and brain, being useful methodological abstractions instead, which we can use 'as if' they depicted real states (the 'intentional stance' of Dennett 1987/1996).
(ii) Eliminative materialism, which questions this utility, pleading for the elimination of mentalistic vocabulary in the investigation of the brain and human behavior altogether (Churchland 1981).4
(iii) Anomalous monism (Davidson 1980): the view that mental events are identical with physical events, yet the mental is anomalous in that mental events have to be irreducibly described in mentalistic and normative vocabulary, which prevents strict physical laws from characterizing the mental domain, which involves rule-following in a social-normative sense instead (Kripke 1982).

Before going into the question of what generative grammar had to contribute to these paradigms, it is finally worth noting not only that none of these paradigms is based on specific scientific discoveries, as noted, but also that they do not seem to suggest any specific ways in which the mind should actually be scientifically studied. Indeed, it strikingly seems that they all to some degree create a basic problem for such a 'science of the mind': behaviorism and eliminative materialism, because they deny the existence of mental states; anomalous monism, because a mind science is precisely implied not to be possible on this theory; instrumentalism, because mental states only have an 'as if' reality; and functionalism, because it is not clear how there should be a science of abstract functional 'mind states'—an idea as rarefied as the idea that biology should study 'abstract organism types' that are contingently 'realized' as actual organisms, though this is not essential to them.

As we shall see now, on the other hand, the idea of such a mind science is the actual starting point of generative grammar in its modern scientific sense. As Chomsky has argued for a half-century (most systematically in Chomsky 2000a), each of the paradigms above, though presented as an expression of 'philosophical naturalism,' is in fact an expression of a 'methodological dualism' inimical to the study of the mind in naturalistic terms, or as an object of nature, like any other such object. In other words, metaphysical naturalism, which we have illustrated above, and methodological naturalism have parted ways—as explained in detail in the next section. It did not go down well in 20th-century philosophy of mind (though in truth it was barely noticed) when Chomsky

4 This view comes even closer to behaviorism (Skinner 1957), which equally makes the foundational assumption that mental states don't exist and mental state talk has no place in science, which in the case of psychology can be concerned with descriptions of observable behavior alone.



(1966/2009, 2000a, 1968/1972/2006) began arguing that in this way, contemporary philosophy has fallen behind the standards of the natural philosophers of the 17th century, whose scientific vision he characterized as essentially sound.

2.3  Foundational Assumptions in Generative Grammar

This section is about what we might call the 'philosophy of generative grammar,' yet it turns out that this philosophy is so different from any of the paradigms above that it is worth asking whether it deserves the name 'philosophy' at all. This fact is remarkable in light of the other fact that, as philosopher of mind William Lycan (2003) notes,

Chomsky's (1957, 1965) expressly computational view of language processing was a major inspiration for Functionalism in the philosophy of mind, as founded by Hilary Putnam and Jerry Fodor.

What is meant here, on the other hand, by the term 'computational view of language processing' is not clear. Chomsky inaugurated modern linguistics as the study of internalized principles on which our knowledge of language rests—rather than as the study of linguistic behavior (Skinner 1957) or of corpora of linguistic productions, which on Chomsky's view are only a source of evidence for the principles in question rather than the actual object of study.

Undoubtedly such knowledge of language exists: speakers of English know (or judge) that 'glink' but not 'glnik' is a possible word of English; that in 'John expects to kill Mary,' the expected killer is John, while in 'I wonder who John expects to kill Mary,' the killer may or may not be John; or that in the last sentence, 'I' refers to the speaker and 'John' does not. Such knowledge is unbounded in the technical sense that no finite set of sentences can non-arbitrarily be specified, beyond which it does not extend; and it must be based on a finite mechanism implanted in the human brain. A theory of this mechanism, a grammar in Chomsky's technical sense, is 'generative' if it is a formally explicit account of (or characterizes) the potential infinity of structures that can enter into the creative aspect of language use and the expression of semantic content in speech (or sign) (Chomsky 1968/1972/2006:20). It is an account of the structures, not their functions or how they are caused, or how they are put to use on an occasion. Effectively Chomsky classed the problem of how they are caused as a non-problem, since normal human behaviors have no non-circularly identifiable mechanical cause (Chomsky 1959); the problem of how they are put to use, by contrast, is classed as a 'mystery': the mystery of why, on an occasion, we decide to say what we do, for which no scientific theory exists. Generative grammar is thus not a theory of how speakers that are in particular physical states move to other physical states when reacting causally to a certain input.



Moreover, it is not an 'expressly computational view of language processing,' since it is not a theory of processing at all but rather a theory of a form of knowledge that humans factually and universally possess. It gives a 'computational' account of this knowledge not in any behavioral sense but in the sense that by the 1950s, a clear general understanding of finite generative systems with unbounded scope had been obtained in the formal sciences, which could now be used to give a precise analysis of the traditional Humboldtian notion of making 'infinite use of finite means.'5 It is clear from this that no prior metaphysics is implied in this enterprise: no view of mental states being functional states, no commitment to functions characterizing cognitive processes as transforming physically describable states into other physically describable states.

In fact, it proved to be a crucial element of the generative viewpoint that the perceptual reality of linguistic distinctions does not need to correspond to a physical reality. Thus, native speakers of English will rather unanimously agree that Flying airplanes can be dangerous has two possible meanings, one equivalent to 'Airplanes that fly can be dangerous,' the other to 'It can be dangerous to fly airplanes.' The meanings arise from a difference in the structural analysis that our brain gives to what is physically or perceptually the exact same sentence. The meaning distinction is real, but it is not a physical distinction in the input. In a similar way, 'blackboard' and 'black board' differ in the primary stress assigned to the first and second syllables, respectively. In a more complex example, 'John's blackboard eraser,' finer distinctions in four distinct stress values emerge, with the strongest (1) on 'black,' the next strongest (2) on 'John's,' the weakest (4) on 'board,' and the next to weakest (3) on 'a' in 'eraser' (see Chomsky 1968/1972/2006:117 for a derivation of this stress pattern):

  2        1     4      3
John's blackboard eraser

As Chomsky notes, stress contours such as this form a perceptual reality, which trained observers will reach a high degree of unanimity about. There is, however, little reason to suppose that these contours represent a physical reality. It may very well be the case that stress contours are not represented in the physical signal in anything like the perceived detail.

That is, the perceptual reality follows from the principles of the internal knowledge applied by human beings to such an input (and an infinity of others). But the independent physical reality of the latter does not need to bear out such distinctions, any more than a physical description of their brain physiology does, where we may find

5 Chomsky (1968/1972/2006) links the emergence of structural linguistics in the early 20th century out of historical-comparative linguistics in the 19th century to the unavailability in this period of these formal methods, which did not allow one to make the idea of generativity or infinite use of finite means precise.



neural correlates of stress contours, but not stress contours in their perceptual qualities. Linguistic theory, in short, investigates a mental reality, even in investigating something as low-level as speech sounds. It is a science of the mental. But what does 'mental' here mean? Does it mean 'non-physical'? What is the metaphysical commitment? Why could it not be the functionalist one? At this juncture, we need to recall that Chomsky over many decades has urged a deflationary understanding of terms such as 'mental':

The term 'mental' here is informal and descriptive, pretty much on a par with such loose descriptive terms as 'chemical,' 'electrical,' 'optical,' and others that are used to focus attention on particular aspects of the world that seem to have an integrated character and to be worth abstracting for special investigation, but without any illusion that they carve nature at the joints. (Chomsky 1968/1972/2006:iix)

In other words, when talking about 'mental' aspects of organisms we are doing no more than singling out a domain of inquiry. In the intended informal sense, there uncontroversially are such mental aspects of organisms: understanding language, seeing colours, missing someone, or regretting one's body weight. Moreover, it seems clear that engaging in such activities will involve internal mechanisms of the organism as much as external ones, the latter relating to the organism's embedding in a wider physical and social world. Denying this would be like claiming, in the case of plants, that the growth pattern of a plant is solely due to environmental input and embedding, i.e., stimulation by light, rain, soil, etc. So we will assume for what we might call 'mental organs' what anyone assumes for physical ones: that to a significant degree, nativism will be true of them (no one even defends nativism about plants or hearts). A methodological monism will be applied. The internal structures in question in the case of language, moreover, it transpires, have the character of a system of knowledge as illustrated above, which proves to be rather systematic and deductively structured and entails consequences for both the sound and meaning of a potential infinity of linguistic expressions. The task to begin with should therefore be to characterize this knowledge in an explicit fashion, bracketing the question of ontology: reaching descriptive adequacy, and then to ask about its development, physiological basis, and evolution. Again irrespective of ontology, an epistemological consequence will arise, since our stance now entails that basic forms of knowledge (of language, mathematics, music, morality, etc.) are there by nature, hence no more due to normative assessment or 'justification' than any other natural object could be. Being a creature of a certain kind, certain things will seem obvious or intelligible to us, with no further justification possible than to say that this is how our minds work.

This, overall, illustrates applying a methodological naturalism to the study of naturally evolving systems of knowledge in humans—seemingly uncontentiously, because while such a research program can fail to bear fruits, there is no lack of motivation for starting it. Is this metaphysical naturalism, though? So far, and crucially, no metaphysical issues have been addressed or have needed to be. The formulation of the scientific problem preceded the metaphysical discussion. Yet is the latter independent of the



methodological naturalist's agenda? With the exception of eliminative materialism, none of the standard positions in the philosophy of mind is of course inconsistent with the existence of mental states or properties in different organisms. Rather, for these views the same task would now arise: to tell a story of how these various states relate to physical nature. The philosopher could thus suggest that his central 'un-Cartesian' challenge simply remains. The challenge of the anomalous monist remains too: to tell a story of how, if grammar contains a semantic component, as it does on Chomsky's account, meaning can be subjected to the same standards of inquiry as any physical domain (Davidson 2004). According to the anomalous monist, the essential normativity of meaning forbids this, and a methodological naturalism cannot in principle bear any fruit in this domain.

Yet we are not at the end of the dialectic yet. For without saying what 'physical' means there is no challenge, trivially. We could thus aim to define this notion, and in this way approach the problem as Descartes had done. Like other 17th-century representatives of the 'mechanical philosophy,' including Galileo, he laid down a priori what is to count as a scientifically respectable notion of matter (or the physical). In the case of Descartes, this strategy was not metaphysical in the contemporary sense but, at least arguably (Hinzen 2006b), the result of a rigorous redefinition of scientific method and the setting of a new standard of intelligibility for the natural philosopher (i.e., scientist). Specifically, Descartes resolved to allow himself nothing but reason and experiment to lead to scientific conclusions, and to disallow any such conclusions appealing to properties of matter that violated this new standard. This in particular implied banning 'occult forces' of 'repulsion and attraction' within matter, which had been part of the historical Aristotelian heritage and baggage. Matter (a 'body') could be nothing other than extension as defined through the three dimensions of Euclidean space: length, breadth, and depth. It has 'modes' including shape and motion, which cannot be separated from it, with motion only induced mechanically and by contact.

It is in the nature of a scientific discovery that with matter thus defined, the world then exhibits aspects that are in conflict with the principles of the mechanical philosophy—a conflict that appears principled and not merely a result of deficient knowledge on our part. Most clearly, for Descartes, these aspects are seen in the normal use of language so as to creatively express one's thoughts. Mechanical nature in the form of a machine could not (and has not) replicated that. The mechanical philosophy, therefore, has a principled boundary, which calls for a second principle in the organization of nature: the 'creative principle,' which defined the workings of another substance, the mental. The mind–body problem was born, in the shape in which it would define the agenda of 20th-century analytic philosophy of mind. Chomsky's 'Cartesian linguistics' (1966/2009) crucially endorses the plausibility of this conclusion relative to its empirical and methodological assumptions (see also chapter 4).
Even independently of them and its historical context, it suggests, there is something about the normal use of language that escapes, perhaps even in principle, from what mechanically defined input–​output relations can capture, of the kind that we could implement in a computer or a learning machine. This does not prevent a methodological



naturalism as applied to systems of knowledge, and it would not even be inconsistent with a metaphysical dualism of the kind that Descartes endorsed, were it not for the fact that no definition of the 'physical' of the kind that Descartes proposed can be given today. No such definition, Chomsky claims, could be given ever since the mechanical philosophy collapsed at the end of the very century in which it flourished: Newton's Principia failed to vindicate the notion of matter or body on which mind–body dualism depended, reintroducing an attracting force acting at a distance and stating explicitly that no mechanical explanation of the cause of gravity could be given (Chomsky 2000a, 1968/1972/2006:6–8). With mechanical explanation denied for bodies, mind–body dualism collapses and becomes unformulable. The mental creates no special problem in nature for a naturalism to address, when the body creates the same problem. So there is, on this view, no such problem as what was thought to be the starting problem of the 20th-century philosophy of mind. Nor did major scientists of the 18th and 19th centuries think so, when they concluded, in light of Newton's new mathematical physics, that:

Properties 'termed mental' are the result of 'such an organical structure as that of the brain' (chemist–philosopher Joseph Priestley). … Darwin asked rhetorically why 'thought, being a secretion of the brain,' should be considered 'more wonderful than gravity, a property of matter.' (Chomsky 1968/1972/2006:9)

Nor do issues of reduction appear a promising path to pursue, when, with new and emerging disciplines in the history of science, reduction is barely the rule. In the case of chemistry, often cited by Chomsky, it was the established science, physics, which had to 'upgrade' to encompass the generalizations of the new science: what happened was the 'unification [not reduction] of a virtually unchanged chemistry with a radically revised physics' (Chomsky 1968/1972/2006:174). There appears to be little reason for expecting in the case of linguistics, either, that reduction should take place as an aspect of the normal course of scientific progress, or indeed that this should be the default for emerging disciplines.

With gravity taken for granted as an ingredient of matter, in the absence of a mechanical explanation, why is the 'creative principle' not? What prevented the systematic development, over the subsequent centuries, of the ideas of the rationalistic psychologists of the 17th century, or the systematic study of the properties and organization of mind and natural knowledge that they targeted? An 'enormous disparity,' Chomsky notes (1968/1972/2006:7), 'in the power of the explanatory theories that were developed' of gravity and the res cogitans, respectively, is an obvious explanation. Yet the importance of the argument stands: the basis for the dissatisfaction with the new physics, namely its breaching of the standard of intelligibility that the mechanical philosophy had set, forcefully expressed by Leibniz, Locke, and by Newton himself, is the same basis on which dualistic rationalistic psychology was rejected. With the former becoming standard scientific coin, an opportunity for developing the latter was created. But it was only systematically taken up in the 20th century, when new tools from the formal sciences had made it possible. This history is why Chomsky plausibly resists calling it the 'cognitive revolution': it



was no such revolution but the rediscovery of psychological paradigms well articulated 300 years before, although in a completely different context, focused on a different scientific problem: the creative powers of the human mind, unconnected to ‘physicalist’ metaphysical paradigms, and with language at the centre of the problem. With this we reach our first conclusion in this chapter. The rediscovery of universal grammar in the mid-20th century was meant to be much more than that, namely a vision for how philosophical psychology as a whole could be rethought and realigned with its 17th-century vision, now that it had become formally and scientifically viable. That, we can now see, has barely happened. Rationalistic psychology had a revival in generative linguistics, though with a sharply reduced scope, an essential focus on syntax, and little philosophical participation. As a discussion of Davidson (2004) illustrates, Chomsky’s work is philosophically viewed as contributing to ‘formal syntax’ (or phonology), which does not bear on the interests of the philosopher, who claims meaning as his proprietary domain and does not see the formal study of syntax as contributing to the latter. Where Chomsky strayed from his assigned territory, commenting on epistemology and meaning, he faced prominent opposition from philosophers such as N. Goodman, W. V. O. Quine, and H. Putnam, who sharply opposed the rationalist ingredients of the new paradigm, taking empiricism (for no empirical reasons) to be as mandatory as philosophers of mind would later take ‘physicalism’ (for early objections, many of which recur in today’s empiricist paradigms, see Goodman 1967, Putnam 1967, and Harman 1967, all following the publication of Chomsky 1965). The physicalism and naturalism that would define much of 20th-century philosophy must in turn, by Chomsky’s lights, appear as a counter-historical curiosity, which posed various obstacles in the path of naturalistic inquiry while propounding the naturalization of philosophy. Meanwhile, elsewhere, philosophers such as J. A. Fodor endorsed generative grammar as the foundation for a new philosophical framework, the ‘Computational-Representational Theory of Mind’ (CRTM), which depicts itself as a ‘Cartesian rationalism’ (Fodor 1975, 2008). Yet core ingredients of it are in sharp conflict with core dimensions of the generative paradigm (see Rey 2003; Chomsky 2003a), including its functionalism, its externalist notion of ‘representation,’ its ‘cognitivism’ (which draws the criticism of those insisting on the ‘embodiment of mind’), its modularity, and the assumption of a ‘Language of Thought,’ all of which Chomsky has unambiguously questioned. The ‘nativism’ of the CRTM, too, is widely taken to be a substantive and controversial philosophical doctrine, when in Chomsky’s view it is meant to be no less trivial than nativism about plants. If plants or an embryo face a ‘poverty of stimulus’ problem with regard to their respective growth and development (Chomsky 1968/1972/2006:11), why would language-acquiring infants not?6 This sets a problem for scientific investigation, but is misclassified as a controversial ‘philosophical view.’

6 As Chomsky puts it in lecture notes from 2010:

The expectation that language is like everything else in the organic world and therefore is based on a genetically determined initial state that distinguishes, say, my granddaughter from my pets. That assumption has been called the innateness hypothesis … . The literature has a curious character. There are lots of condemnations of the hypothesis but it’s never formulated. And nobody defends it. Its alleged advocates, of whom I am one, have no idea what the hypothesis is. Everyone has some innateness hypothesis concerning language, at least everyone who is interested in the difference between an infant and say her pets. Furthermore the invented term—innateness hypotheses—is completely meaningless. There is no specific innateness hypothesis; rather there are various hypotheses about what might be the initial genetically determined state. These hypotheses are of course constantly changing as more is learned. That all should be obvious. Confusion about this matter has reached such extreme levels that it is becoming hard even to unravel, but I put this aside.

See https://hotbookworm.wordpress.com/2010/01/11/noam-chomsky-the-biolinguistic-turn-lecturenotes-part-two/



However classified, philosophical epistemology has proved immune to the idea, and whole textbooks and handbooks on epistemology do not mention it, the paradigm being to define knowledge as justified true belief rather than to view it as something that can grow by nature and that, due to its non-analytic content, could be classed as a prime instance of ‘synthetic a priori’ knowledge in the sense of Kant. Even today, for most philosophers, Chomsky’s detailed discussions in the philosophy of mind are unknown or perplexing, and few systematic discussions of them exist (one being Lycan 2003, in a volume devoted to Chomsky and his Critics). Generative linguistics is not taught in philosophy departments and, accordingly, knowledge of linguistics even in the philosophy of language is minimal and is bound to remain so.7 The status of language in philosophy at large tends to be peripheral, with philosophers of mind focused on perception, on consciousness, and on ‘qualia’ such as pain or seeing red, and with philosophers of language focused on logic. Prospects for a ‘philosophical grammar’ of the kind that 17th-century rationalist psychologists imagined look as dim today as they did in the 1950s, or indeed as they have for several hundred years. The actual influence within philosophy of technical work in universal grammar in its modern shape on the study of general properties of mind has thus remained minor. Nonetheless, in the next section I will set myself the challenge of suggesting where, apart from its methodological reorientation, this contribution has in fact lain. I will take it that doctrines such as ‘nativism’ would be the wrong answer to that question, for the reason given; nor does the deflationary attitude towards the study of mind depicted above suggest dignifying it with the title of a new ‘philosophy of mind,’ which it clearly does not aim to be.

2.4  What the Linguistic Contribution to the Philosophy of Mind has Been

Restricting the scope of an emerging scientific enterprise to a particular domain and methodology may seem a strength, and it will often be a necessity. Generative grammar


7 As even textbooks in the ‘philosophy of generative grammar’ attest: see Ott (2012) on Ludlow (2011).



with its central focus on syntactic structure viewed in abstraction from semantic interpretation is a case in point. It was motivated by the insight that a universal grammar could not be developed in independently specified semantic terms—say, a theory of reference or of communication, none of which predict or explain constraints we find in language (see chapters 3 and 9 for further discussion). For example, noun phrases exhibit restrictions with respect to the structural position in which they can appear in a sentence, but the constraints do not appear to be semantic ones, at least as semantics was understood at the time. It thus makes sense to aim to develop a purely formal theory specifying these constraints, which came to be known as (structural) Case theory. Over the decades, Case has remained a paradigmatically ‘uninterpretable’ aspect of grammatical organization, and attempts to ‘rationalize’ it in semantic or other terms have consistently failed, making it a prime illustration of the apparent ‘autonomy of syntax.’ Small wonder that, to my knowledge, there is not a single contribution by philosophers of language to Case theory (see Hinzen 2014a for discussion). More generally, a domain centrally characterized by principles such as the Case Filter cannot, it must appear, matter to philosophical inquiry. This methodological restriction also illustrates in what way universal grammar in its 20th-century shape differs from its 17th-century predecessor. According to the grammarians of Port-Royal, grammar is non-arbitrary (i.e., rational, general, and scientific) precisely insofar as it reflects the ‘manner in which men use [signs] for signifying their thoughts’ (Arnauld and Lancelot 1660/1676/1966:41). Thought, in short, with its logical structure, is taken for granted, and grammar is grounded therein. Chomsky (1957, 1965) took a more cautious stance, seeking a more principled account of grammar in its own terms, defining the ‘deep structure’ of an expression in linguistic rather than logical terms and leaving open the question of its ‘interface’ with non-linguistic thought. Primary attention to the syntactic component of universal grammar should, however, not distract from the fact that a semantic component is an inherent part of such a grammar in Chomsky’s technical sense:

The grammar of a language … establishes a certain relation between sound and meaning—between phonetic and semantic representations. (Chomsky 1968/1972/2006:113)

A linguistic expression is something with meaning inherently—​subtracting the meaning does not leave the expression behind. A ‘universal semantics’ will thus be as much part of a universal grammar as a ‘universal phonetics’ (Chomsky 1968/​1972/​ 2006:111), though both phonetics and semantics are ‘interpretive’ in the sense that phonetic and semantic interpretations are assigned to structures that the syntax provides. In Aspects (1965), Chomsky also makes it clear that for him a complete theory of grammaticality will have to derive forms of semantic deviance as well, such as the one exhibited by a sentence like Colorless green ideas sleep furiously, which, though deviant, still has coherent uses like any other grammatical sentence, such as poetic



ones.8 Even the ‘boundary separating syntax and semantics (if there is one),’ he suggests, should ‘remain open until these fields are much better understood’ (Chomsky 1965:77, 159). So meaning clearly remains part of the enterprise, and has more recently come to the fore, as explained in section 2.5. The new methodology follows that of a century of linguistics in which the early Cartesian rationalist grounding of the universality of grammar in the universality of thought was called into question, leading to the downfall of the universal grammar enterprise (Graffi 2001). In the rationalist tradition, thought is rational and universal by definition, structured by logic. The same is assumed as a matter of course in analytic philosophy today, where logic courses in the first or second year teach bewildered philosophy undergraduates to write down ‘logical forms’ in first-order predicate logic for various natural language sentences. The student will then also be told, with reference to Russell, that grammar can be ‘systematically misleading’ and is in any case too variable to suit a philosopher’s purpose. In short, a philosopher simply follows his intuitions on what the meaning of a sentence is (never mind where these intuitions come from) and formalizes that, ignoring, if need be, the sentence’s grammar. And why indeed should he care about the contingent ways in which languages have evolved to express logical forms? In a related way, Jespersen wondered whether

grammatical categories are purely logical categories, or are they merely linguistic categories? If the former, then it is evident that they are universal, i.e., belong to all languages in common; if the latter, then they, or at any rate some of them, are peculiar to one or more languages as distinct from the rest. Our question is thus the old one: Can there be such a thing as a universal (or general) grammar? (Jespersen 1924:46–47)

And he answered, predictably: grammatical categories are too different from logical ones and too variable to be universal; they do not mirror the immutable logical and semantic structure of thought or reality. The structure of the argument is thus clear: first a system is posited, i.e., thought, which sets a standard, and then grammar fails that standard. So the theory of grammar, if it is to ‘state facts, not desires’ (Jespersen 1924:54–55), has to confine itself to historical-comparative study. Thought, on the other hand, is presumably not arbitrary, and it is relevantly universal and logical. But there never seems to have been a naturalistic ‘science of thought,’ as the queerness of this very term suggests (Friedrich Max Müller’s ‘Science of Thought’ of 1887 being virtually unknown). There is no such thing as a discipline of the ‘evolution/acquisition/neuroscience of thought,’ either. This is a problem that commonly goes unrecognized when language is rationalized as ‘expressing thought’ but no account of (human-specific) thought is given.

8 ‘It can only be the thought of verdure to come, which prompts us in the autumn to buy these dormant white lumps of vegetable matter covered by a brown papery skin, and lovingly to plant them and care for them. It is a marvel to me that under this cover they are labouring unseen at such a rate within to give us the sudden awesome beauty of spring flowering bulbs. While winter reigns the earth reposes but these colorless green ideas sleep furiously’ (C. M. Street).



Thus Chomsky (1968/1972/2006:18) cites the ‘distinguished American linguist William Dwight Whitney,’ who greatly influenced structural linguistics, as maintaining that:

language in the concrete sense … is … the sum of words and phrases by which any man expresses his thought; the task of the linguist, then, is to list these linguistic forms and to study their individual histories.

In contrast to philosophical grammar, Whitney argued that there is nothing universal about the form of language and that one can learn nothing about the general properties of human intelligence from the study of the arbitrary agglomeration of forms that constitutes a human language.

Even in contemporary versions of this idea, which dispense with the idea of a genetically endowed universal grammar (Tomasello 2008), no theory of thought is provided or even recognized as an explanandum, which it would indeed not be on the still common assumption that human and non-human thought is much the same, though it is expressed in the one case but not in the other. In short, even for those rejecting universal grammar, there is thought, never mind that there is no theory of it, and there is language, which is a conventional means for expressing thought and which, as such, supports no science and has no philosophical significance. But how, then, could grammar be non-arbitrary, when at the same time, as in Chomsky’s case, the universality of grammar is not meant to be grounded in universal thought or in meaning, which is the domain in which, Neo-Whorfian qualifications aside, anyone agrees that universals lie? To illustrate this problem, let us take Case theory again. It is a prime example of an apparently ‘arbitrary’ constraint on grammar. So if the nature of human thought does not illuminate it, what grounds it, giving it a rationale? Truth be told, three decades after the onset of the first mature Case theory in the Chomskyan paradigm (Chomsky 1981a), Case still lacks a rationale, and its status in the theory of grammar is unclear: we do not know whether there is a universal ‘Case Filter’ in the original sense, or whether Case theory can be eliminated from the theory of grammar, as some have proposed (cf. Marantz 1991; McFadden 2004; Landau 2006; Sigurðsson 2012; Diercks 2012). Chomsky (2000b) attempts to rationalize the Case Filter on the grounds that it represents a mechanism for eliminating uninterpretable features in syntax, yet this, too, would barely rationalize Case, as its rationale would then be to eliminate what itself has no rationale: uninterpretable features in syntax. So the problem is this: having separated language from thought or meaning, how can grammar be both arbitrary and universal? Why would we expect it to be? By a genetic accident? One way of reading the Minimalist Program (Chomsky 1995b), 20 years after its onset, is as addressing this very problem. The richer the universal grammar we posit, the more properties of language will, to that extent, remain unexplained. They will just be that: general properties of language, which we cannot explain in terms of anything else we know, and which must therefore be language-specific, genetically endowed ones. So we should instead ask how minimal a universal grammar we could get by with. The less language-specific it looks, the less we have to explain. Based on this, Minimalism offers new tools of analysis and principles for grounding the non-arbitrariness of grammar



(see Hinzen 2012 for a synthetic presentation). It develops an idea present in Chomsky’s thinking since the earliest days, namely that we might be able to attribute a ‘complex human achievement’ not merely to biological evolution (hence to universal grammar in the technical sense of a genetic endowment for language), but to ‘principles of neural organization that may be more deeply grounded in physical law’ (Chomsky 1965:59). Where that is the case, genetic accidents are not needed; or rather, such accidents will happen within channels predetermined by physical and chemical law. Where, in particular, linguistic computations are governed by economy and minimality principles as captured by such labels as ‘least effort,’ ‘last resort,’ or ‘shortest move,’ there is nothing to explain: for the design of language proves to be what we would rationally expect it to be in an object of (physical) nature. Scientific explanations are asked for when there is a prediction error: things are not as we would expect them to be. Nor would we ask for further explanations when things are as they have to be by conceptual necessity. Consider ‘D(eep)-Structures’ as posited in the Extended Standard Theory. As a level of representation, they are internal to the grammar and certainly not a conceptual necessity—there is no architectural reason why the grammar needs such a level. If it exists, this has to be a contingent empirical finding, and it needs to be justified. But a semantic interface between the syntactic and semantic components does not: without it, no language system would arise, as linguistic expressions could not be used to express thoughts. So from an explanatory point of view, we obtain the interface for free. The Government and Binding framework of the 1980s not only postulated D-structure, but assumed five independent generative systems as inherent to the workings of grammar: the first mapping items from the lexicon to D-structure, subject to X-bar theory; the second mapping D-structure to S-structure, which was meant to be the result of transformational operations displacing constituents to other positions in the phrase marker assembled at D-structure; the third leading from S-structure to a new level, ‘LF’ (for Logical Form), where the same computational operations apply, though now with no phonetic effect (Huang 1982 [1998]); the fourth mapping S-structure to the interface with the sound systems—the phonetic component, PC; and, finally, the fifth mapping an LF-representation to a semantic representation, SEM, of the thought expressed by a sentence, as constructed in the semantic component, SC. Each of these systems is meant to operate cyclically, in the sense that within predefined chunks of structure (e.g., complex verb phrases) a number of operations can apply, which have to be applied in a predetermined order until all of them have applied; the systems are also compositional, in the sense that the mappings depend on the part–whole structure of each of the representations involved. A system with five cycles is biologically possible, but the question immediately arises: ‘Why five?’ The problem is aggravated in light of the fact that there appear to be great redundancies in the mappings involved, and that essentially the same operation—the basic combinatorial operation, Merge—applies in all of these independent generative systems.
In short, best-​design considerations would suggest, in the absence of contrary evidence, a single-​cycle generation (Chomsky 2007a, 2008), which dispenses with all cycles except one: the one that is conceptually necessary and generates a semantic representation from lexical items



only, based on the single operation Merge. What is minimally required cannot depend on further justification; if it satisfies all other conditions of empirical adequacy, a real explanatory advance has been made. The system will now appear in a principled light. Universal grammar as a language-specific and unexplained genetic endowment is reduced, and arbitrariness disappears. This leads to the second conclusion of this chapter. In the pursuit of greater explanatory depth, the generative program has converged on an approach that delivers arguably the most surprising message of the paradigm yet: that ‘design questions’ for human cognitive systems can be asked, and questions of ‘perfect design’ in particular. To what extent is there no more to the design of the language faculty than there has to be in order for there to be such a faculty at all? Human nature is not only the subject matter of naturalistic inquiry here; such inquiry also has the power to reveal cognition to be subject to generalizations of a deeper sort than contingent genetic history would lead us to expect, suggesting that the human mind is grounded in natural law. This conclusion is so striking and novel that even raising it as a possibility and pursuing the Minimalist Program empirically is arguably the most innovative contribution made to philosophy in a very long time. It is a possibility that, ironically, brings out the truth in ‘physicalism’ in a completely unsuspected manner (Hinzen 2006b and Mukherji 2011 develop this perspective philosophically).
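To fix ideas, the single-cycle picture can be given a toy rendering. The sketch below (in Python) is an expository device of mine, not the chapter’s formalism: it treats Merge simply as the operation forming the set {X, Y} from two syntactic objects, and builds a clause bottom-up from lexical items in one derivational cycle; the labels ‘v,’ ‘T,’ and ‘C’ are illustrative assumptions.

# A toy rendering of the single combinatorial operation Merge: it takes
# two syntactic objects and forms the unordered set {X, Y}. Recursive
# application to lexical items yields unbounded hierarchical structure
# in a single generative cycle, with no further levels of representation.

def merge(x, y):
    """Merge two syntactic objects into an unordered pair."""
    return frozenset([x, y])

# A toy bottom-up derivation of 'Sally met Harry':
vp = merge("v", merge("met", "Harry"))   # verbal layer (the vP phase)
tp = merge("T", merge("Sally", vp))      # tense layer
cp = merge("C", tp)                      # clausal layer (the CP phase)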

2.5  Where the Contribution Might Lie: The Future

In fact, the vision is deeper than I have so far implied, and promises a still deeper impact on philosophy. The reason has to do with the fact that universal grammar has always had to negotiate the universal with the language-particular. For example, stress patterns as illustrated in passing above are specific to particular languages, and while we would like the phonetic component of universal grammar to contain principles determining which such patterns are possible, the patterns are not themselves a part of such a grammar. What is part of this grammar is what is not language-particular, such as the idea that stress patterns are derived in a cyclic fashion, applying particular types of rules in order. Similarly, it is a statement about universal grammar that syntactic derivations are organized not into five distinct cycles but into a single one, in the sense of the previous section. If, moreover, we identify cycles with the ‘phases’ of Chomsky (2007a, 2008), then there will be maximally three: CP, vP, and DP, necessarily ordered in a part–whole fashion, in the sense that deriving a CP architecturally presupposes a vP as a proper part, which presupposes a DP as a proper part. In that sense, what phase we are deriving is a consequence of the stage of the derivation we are in. In this context, Hinzen (2014b) further notes that in crossing each phasal boundary, namely D, v, or C, the formal ontology of meaning of the expression changes, in the



following technical sense: within DP, we are confined to deriving object-like denotations; within vP (with Aspect, but no Tense) we derive what are formally events; while only in CP do we derive propositions with a truth value that can be evaluated in context. The distinctions between objects, events, and propositions are formal-ontological distinctions: they arise with respect to the formal ontology of what types of objects we refer to when using language, depending on what type of grammatical complexity we generate. Again, they are ordered in part–whole relations, since any proposition (say, that Sally met Harry yesterday) architecturally presupposes an event (of Sally meeting Harry), which presupposes event participants (Sally and Harry). Necessarily, denoting propositions/facts/truth values is a process that exhibits the most (indeed maximal) grammatical complexity. Ontological notions such as ‘object’ and so forth are vague, as seen from long-standing conceptual discussions in philosophy such as whether events are really objects or whether objects can be (perhaps slow) events. This vagueness we would expect if the formal-ontological distinctions in question really are grammatical rather than conceptual distinctions, contrary to what a long metaphysical tradition has maintained. On this novel view, they concern what types of denotations arise in the grammatical process. This perspective would cohere with the fact that at the level of lexical concepts as such, no formal ontology as yet exists, or at least none is fixed. Thus the lexical concept SMILE can be used to refer to a smile, as in Mary’s smile, in which case it is an object; or it can be used to refer to an event, as in Mary smiles. These two expressions do not differ in terms of their substantive lexical content. Rather, they differ in how we use given lexical concepts to refer to the world, and in whether the referential perspective we adopt is that of an object or an event. They also, and correlatively, differ in grammar. It is in virtue of this grammatical difference that the latter expression denotes an event, specifies Tense, and has a truth value, while the former is only object-referential. Referential distinctions in this sense are thus grammatical distinctions, not lexical ones; nor are they semantic, since the external scene in which we use one or the other expression can of course be exactly the same and hence does not distinguish between the two. Intuitive differences between cycles/phases are formal differences in how we refer, not in what we refer to (the substantive or semantic content). Ontological distinctions, from this point of view, are grammatical ones in metaphysical disguise; and reference, crucially, is a grammatical concept, not a lexical one. Referentiality, moreover, is an inherent aspect of grammatical organization, which we encounter the very moment that grammatical complexity unfolds; and grammatical complexity never unfolds without giving rise to reference. Even Colorless green ideas sleep furiously can formally and meaningfully denote a fact, in an appropriate context, as noted. That, in some investigations of syntax, we may want to abstract from meaning should not distract from this intrinsic connection between grammatical organization, phases, truth, and reference in humans.
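The part–whole architecture of the phases can also be rendered schematically. The sketch below is again an expository device rather than a piece of the theory: it encodes only the claim that each phase presupposes the previous one as a proper part and that each adds a new type of denotation.

# An expository sketch of the phase hierarchy: deriving a CP presupposes
# a vP as a proper part, which presupposes a DP, and each phase adds a
# new type of denotation.

PHASES = [
    ("DP", "object"),        # e.g., 'Mary's smile'
    ("vP", "event"),         # e.g., an event of Sally meeting Harry
    ("CP", "proposition"),   # e.g., that Sally met Harry yesterday
]

def formal_ontology(phase):
    """Denotation types presupposed by deriving the given phase."""
    labels = [p for p, _ in PHASES]
    return [d for _, d in PHASES[: labels.index(phase) + 1]]

print(formal_ontology("CP"))   # ['object', 'event', 'proposition']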
It should be emphasized that, empirically, there is no such thing as a grammatical expression that does not come with a particular kind of referential meaning, and no such thing as an ungrammatical expression that is semantically well formed. Since human knowledge presupposes the relevant forms of meaning, or the generation of



appropriate thought contents, we derive, on this path, the viewpoint of Modistic universal grammar that grammar is a condition on human knowledge, representing the world in the format of language, or as known (Covington 2009; Leiss 2009; Hinzen and Sheehan 2013:ch. 1). This perspective also directly addresses a number of ‘deep puzzles’ in philosophy, which appear as pseudo-problems at first sight, while at the same time it has for a century proven hard to demonstrate what makes them so. For example, Frege notes that proper names can appear in predicative positions: e.g., Vienna in Trieste is no Vienna. So what does this mean? Is ‘Vienna,’ as a proper name, not ‘object-denoting,’ denoting a property instead? But what then about Vienna is no Trieste, where ‘Vienna’ does appear to be referential? The trivial solution to this long-standing puzzle could now be that the ontology changes with the grammar. ‘Denoting an object’ or ‘denoting a property’ are not lexical distinctions that can be associated with a word, but grammatical ones. Hence ‘Vienna’ denotes a property when it is in a predicative position and an object when it is in a referential one. The same approach eliminates the puzzle created by Frege’s (1892) assertion that ‘the concept “horse” is not a concept.’ A ‘concept,’ in Frege’s technical terms, is mapped from a predicative position. But in this expression, ‘the concept “horse”’ is grammatically a referential one. Hinzen and Sheehan (2013) develop this concept of what grammatical organization entails in full. Language use cannot but be referential: we cannot use language without using words to refer to the world. Words as such are never referential, as noted: they only have lexical concepts capturing, we here assume, perceptual (and also more abstract) classes. Based on such concepts, environmental stimuli of a certain kind pattern into distinct perceptual classes such as MAN, WOMAN, BOY, GIRL, and so forth. But based on no such classes can we make such distinctions as between the man I met yesterday, a man I saw, the ideal man I would like to meet, men in general, man-meat, manhood, or mankind. These are grammatical distinctions. Specifically, nominal reference comprises generic, indefinite-quantificational, indefinite-specific, definite-specific, rigid, deictic, and personal forms. Intuitively, these forms of reference are ordered into a hierarchy ranging from the weakest (most descriptive or lexical) to the strongest (least descriptive and most grammatical). The first person pronoun stands at the top of this hierarchy, lacking any descriptive content, gender, and (arguably) number, and, unlike demonstratives, failing to take a nominal complement (*I man vs. that man). These forms of reference co-vary with grammatical configurations in a systematic way (Martin and Hinzen 2014; Hinzen, Sheehan, and Reichard 2014). They are not known to occur in any non-grammatical cognitive system. Declarative pointing gestures by pre-linguistic infants from 10 months, in particular, while clearly referential in the sense intended here, are arguably closely integrated with language development, and they have no equivalents in non-human primates (Povinelli, Bering, and Giambrone 2003; Iverson and Goldin-Meadow 2005; Cartmill, Hunsicker, and Goldin-Meadow 2014; Mattos and Hinzen 2015).
The point of all this is that reference is not only the most central notion in the philosophy of language and closely connected to the notion of intentionality in the philosophy



of mind, but that referentiality more generally is a core feature of the kind of contents that we take human thoughts to have. The existence of these kinds of contents needs to follow from something—exactly as the formal ontology of meaning that we find in language does. Such an ontology, which is a presupposition for thought in any human-specific sense (Davidson 2004), does not come for free. The proposal is now on the table: if grammar generates these distinctions, and we have developed a universal theory of it based on the notion of derivation by phase, there is nothing to stop us from programmatically identifying the structure of thought with the grammatical structure that corresponds to its content. All of the distinctions we need at the level of thought, after all, fall out from what we obtain from grammatical organization, on the new view of what the latter entails; and we obtain the substantive lexical contents from the lexical concepts involved. In that case, there is no ‘semantic interface,’ no ‘mapping’ of a ‘syntactic representation’ to a non- or pre-linguistic ‘semantic component.’ Whatever pre-linguistic thought there is, it is different from what we find in humans; the latter arises with grammatical organization, and hence grammatical organization cannot itself be constrained to express already given thought contents, as early Minimalism maintained. Language expresses thought, but it is not merely an expressive tool. So now we have turned the tables: language is not rationalized as mirroring an independently given thought system, which is the viewpoint of 17th-century Cartesian linguistics; nor is it an independent, genetically based system viewed as autonomous from the organization of thought; rather, it is the sapiens-specific thought system, of which no science as yet exists. Note that, minimally, such a science would have to posit a generative system, since thoughts of the relevant type are internally articulated and unbounded in number: they contain a number of lexicalized concepts in a particular structural arrangement, with relations between them that, if not grammatical in nature, need to mirror grammatical structure one to one. Thought is intensional (with an ‘s’), and any grammatical difference makes a difference to the thought articulated. If our theory of grammar has independently arrived at a conception of grammatical structure building that provides the structures relevant to such an articulated and generative thought system, the option arises that the two systems simply coincide. This prospect converges with one of Chomsky’s, derived on partially independent grounds: that positing a ‘language of thought’ in Fodor’s sense is unnecessary and redundant:

Emergence of unbounded Merge in human evolutionary history provides what has been called a ‘language of thought,’ an internal generative system that constructs thoughts of arbitrary richness and complexity, exploiting conceptual resources that are already available or may develop with the availability of structured expressions. (Chomsky 2007b:22)

However, this account of the emergence of thought becomes substantive only once an account of content has been given—​for it is intrinsic to thought that it carries a content



of a certain type. Intrinsic to this type are forms of reference. If these are byproducts of the phasal derivation of grammatical complexity based on lexical concepts, the story evolves into a new position on the nature of the thought capacity in humans. In this way, the study of universal grammar has led us into quite different territory, far beyond the confines of the initial methodological restrictions. There is, on this view, no language faculty viewed as distinct from a thought capacity; there is, rather, a distinctly human thought capacity, of which language is an inherent part. On Chomsky’s specific view, the external expression of this thought system using sound or sign is virtually irrelevant to its internal design and content; this is critically not assumed in Hinzen and Sheehan (2013). Indeed, core principles of grammatical organization such as the system of Person make intrinsic reference to speech/sign. Thus in He smiles, the third person marked on ‘he’ means that for a certain event of smiling there is an agent, who smiles, and there is a speech act in which this smiling person is identical to neither the speaker nor the addressee. With the first person, the agent is identical to the speaker. Person is, in this sense, the locus in grammar where grammatical structure and context are brought into contact. Person features have been said to be uninterpretable on verbs, because verbs are not persons. But Person is a relational category. It is the relation that is interpreted, in the way just indicated, relative to a speech act, as and when that act takes place. Person features, where they are visible, signal this relation morphologically. But there is nothing uninterpretable about such features, whether marked on verbs or on nouns, and they are interpreted on neither: it is the grammar (the relation) that is meaningful, not the features. Hinzen (2014a) argues, controversially, that in the end, and from this novel viewpoint, there is nothing uninterpretable about the structural Cases either. These, too, are relations, which can but need not be marked morphologically through Case features, and which, if they are so marked, can be marked by different features across different languages (the mapping between syntax and morphology is one-to-many: Sigurðsson 2012). The relations arise as nominal arguments cross phasal boundaries, entering different phases in which they become part of a different formal ontology, in the above terms. As Chomsky (2001a) himself maintains, the phases are Case-domains—domains in which Case features, in his terms, have to be ‘checked.’ If phases are Case-domains, and phases are interpretable in terms of the formal ontology of semantics as arising grammatically, Case will be interpretable—just not in lexical or thematic-role terms, but in terms of grammatical semantics, and the notion of reference in particular. Cases license referential arguments that have become part of either verbal or clausal domains, turning them into grammatical arguments. Without argument relations there would be no propositions. Without propositions, there would be no rational thought. Case is rational in these terms. Hinzen (2014a) argues that this hypothesis predicts the Case anomalies long noted for arguments such as PRO, since these, being non-overt, necessarily lack referential independence.
It also predicts that nominal arguments, being paradigmatically referential, should be the prime targets of Case relations, and that clauses should be exempt, except precisely when they are in a referential position (Bošković 1995). It also makes



sense of the typological observation that languages that do not (or barely) mark Cases on nouns, such as English or German, nonetheless mark them on pronouns (English) and on determiners (German), but not the reverse (Case marking on bare nouns but not on pronouns, or on predicative nouns but not on referential ones): pronouns are paradigmatically referential devices, and determiners have long been regarded as regulating the referentiality of the common nouns to which they attach. Moreover, accusative and dative morphology shows up obligatorily in the interpretations of strongly referential (deictic and personal) object clitic pronouns in Romance, rather than in clitics picking up the reference of weakly referential or predicative nominals, or of embedded clauses that function as predicates (Martín 2012). Next, we expect that ‘exceptional’ Case marking (ECM) would show up in clauses that are referentially ‘weak,’ incomplete, and dependent, lacking finiteness, referential independence, and deictic anchoring in the speech context: the phase boundary of the embedded clause remains penetrable from the outside—an independent Case domain is not demarcated—and therefore an exceptional Case, assigned by v from within the next phase, can enter. But this does not happen in factive non-finite clausal complements, as Kiparsky and Kiparsky (1970) noted. Finally, where we do see some Case independence in non-finite embedded clauses with a PRO subject, as in Russian, it makes sense that, as Landau (2008) argues, this phenomenon should be restricted to tensed non-finite clausal complements, which are referentially stronger and grammatically more complex, in Sheehan and Hinzen’s (2011) terms. In short, the claim is that while Case theory began as a paradigmatic instance of the autonomy of grammar and has been philosophically ignored for this reason, the notion of meaning invoked in claims about the uninterpretability of Case has always been a non-grammatically semantic one. If, on the other hand, there is a different type of meaning which uniquely patterns with grammar, namely referential meaning, then even paradigmatically ‘uninterpretable’ relations such as Case might start to make new and more principled sense. Grammar, universally, will be a system in which a small number of grammatical relations can be established as holding between lexicalized concepts: thematic relations, Person, Gender, and Number, Tense/Aspect, and, if the above should prove to be right, Case. With these relations in place, a deictic space will be set up in which all of human thinking and referring can take place: the space in which ‘I,’ the thinker/speaker, addressing a ‘you,’ speaks about an ‘it’/‘he’/‘she’ viewed as distinct and independent of the I and the you, in an act of speech in which Tense establishes a relation between referenced events and the speech act, and a thought is expressed which is objectively true or false. This is the fine structure of human rationality. If human grammar has the exact resources to put it in place, and if every act of thinking and speaking involves the relevant relations, there can be nothing outside of this space: there is no thought (of the same kind) outside of grammar. Grammar is as intrinsic to this space as geometry is to physical space. Speakers talking inside of this space, moreover, are persons. Persons are, in the paradigmatic case (i.e., when not comatose, newborn, mentally ill, etc.),
things that say ‘I,’ distinguish themselves from a ‘you’ they are addressing, and talk about a world that is independent of themselves and their acts of thinking. Ever since Descartes, the inherent



connections between personhood, consciousness, and a ‘first person perspective’ have been virtually taken for granted. Yet in line with a purely expressive view of language (‘Cartesian’ in the above terms), the fact that ‘the first person’ is actually a grammatical notion has barely been commented upon, and it has been widely taken for granted that there is ‘first person thought’ independently of whether or not there is language. Language, the idea is, only expresses such thought; it is not the reason that such thought exists. Yet what if it is? There is little doubt, at least, that a narrative self, which requires language, is intrinsic to how we behave as persons. There is also well-established evidence that losing a first person perspective—as in the preference of many children on the autism spectrum for third-personal ways of referring to themselves (see, e.g., Shield, Meier, and Tager-Flusberg 2015)—does not merely involve a pragmatic inconvenience but a mental change or deviance (Hinzen et al. 2015). Moreover, in order for ‘the first person’ to be only contingently a grammatical term, it should be possible to spell out what this notion means in terms not invoking any grammatical distinctions. To my knowledge, no such attempt has succeeded, or even been made.9 As we have proceeded in this chapter, then, we have seen the foundational significance of grammar become deeper and deeper. As we think this line of thought through to its logical conclusion, grammatical organization may be intrinsic not only to productive thought and its specific type of content in humans, but also to the very way in which humans are persons. This, in fact, is what we would expect, since no thought and no utterance makes sense in the absence of a grasp of grammatical person distinctions. As Kant put it, for any thought or utterance there must be an implicit ‘I think that …’ or ‘I say that …’ accompanying it. In grammatical terms, all thinking and speaking is subordinated to the first person (see Ross 1970). Without it, thoughts would float free of their deictic anchoring: I would seem to be thinking something, but I would not know whose thought it is: mine, yours, or simply a fact. The approach also makes an empirical prediction, the last point in this section. We know that thought, while relevantly universal in the healthy human population (whether or not Neo-Whorfian claims are true), is nonetheless not uniform: thought changes in systematic and systematically different ways as we move from one clinical population to another, such as individuals with Autism Spectrum Conditions (ASC), schizophrenia, bipolar disease, or Huntington’s Disease. These changes are in part genetically conditioned. The prediction is that when and if thought changes, language should change too: the two systems should not dissociate. Moreover, the linguistic changes we see across all of these conditions should systematically match the clinical profile of the relevant populations (for two initial attempts to review the field in schizophrenia and ASC in the light of this line of thought, see Hinzen and Rossello 2015 and Hinzen et al. 2015, respectively). This invites linguists to expand the scope of comparative linguistics.
In typology, we are looking at the possibly single-rooted tree of languages, all of which



characterize the same human cognitive type and genotype. Yet it might also be that this tree has little sub-trees (variants) within itself, which are linguistically different in ways that correlate empirically with different cognitive types. Universal grammar need not be uniform. Language can change, and so can our mind.

9 Writing about first-person access to oneself and arriving at this conclusion on a different path, Jaszczolt (2013) relatedly speculates that ‘language not only reflects this complicated state of self-awareness but also participates in creating it.’

2.6 Conclusions

While this chapter simply had to begin on a somewhat solemn note, given the history of 20th-century interactions between generative grammar (or Chomsky specifically) and the philosophers of language and mind, I hope by now to have turned these first impressions around. The philosophical significance of universal grammar has de facto been limited. Philosophers of language or mind couldn’t for the life of them see the point of struggling to understand the ‘subjacency principle,’ X-bar theory, the Case Filter, or why what they were doing all day was supposed to make no sense. Nor has so-called ‘nativism’ been received in philosophy in what I have suggested are the right terms: not as a controversial and profound ‘innateness hypothesis,’ but as an indication of a new area of study, to be pursued with the attitude of a methodological naturalism and monism with regard to the mental. However, with the advent of the Minimalist Program, the initial focus on the formal study of syntax has given rise to what must appear as a completely novel insight from a philosophical point of view: the mind truly is an object of nature, in the sense that principles that we expect to operate in physical systems are operative here as well, shaping core cognitive systems such as language in a far deeper way than convention and history do. Whatever may ultimately come of the research program itself, it is perhaps an idea too intriguing not to pursue. In the course of pursuing it, questions about the relation between language and thought have newly come to be asked, with the result that, in the limit, the language–thought dichotomy may cease to be a useful one, theoretically and empirically: it may be a single and inherently integrated system that we are looking at, to which grammar and thought may be as essential as their external expression in speech or sign. That single and integrated system, moreover, may be able to change or break down, revealing new paths of inquiry into mental disease and thought pathology. This perspective, strikingly, also brings us back to the earliest beginnings of a scientific approach to grammar, i.e., universal grammar, in India. There, at a time when philosophy and linguistics were much more united than they have been over the past centuries, the late ancient philosopher-linguist Bhartrhari pursued the intuition that is in essence the one depicted in the previous section. As Matilal put it in his seminal book (Matilal 1990:85): ‘language is not the vehicle of meaning or the conveyor belt of thought.’ Rather, it is its generative principle: thought anchors language and language anchors thought. Sabdana, ‘languageing,’ is thinking; and thought ‘vibrates’ through language.



Chapter 3

Universal Grammar and Philosophy of Language

Peter Ludlow

Since the initial development of the theory of Universal Grammar (hereafter UG), it has come into contact with the philosophy of language at a number of points, in part because of competing visions about the nature of language and in some cases because of shared philosophical and technical problems. Space does not permit me to explore all of these points of contact (in particular the very many interesting technical issues involving topics that range from the theory of descriptions, to the proper analysis of tense, to the theory of conditionals), so I will instead focus on a particular thread that runs through contemporary philosophy of language. Specifically I’m going to focus on the nature of normative rule following, and explore its relationship to the thesis that UG underwrites a certain kind of linguistic knowledge or competence. I’m going to make the case that there is room within the theory of grammar for a notion of individual norms (note, individual and not social norms) that are determined by a parametric state of UG. While I am pursuing a single thread at the intersection of linguistics and the philosophy of language, we will find that when we pull on this thread a number of the leading issues in the philosophy of language will emerge. For example, we will need to address the nature of linguistic intuitions, the nature of normative guidance in language, the competence/​performance distinction, the puzzles about rule following, and finally the dispute between externalism and internalism about content and form. I’ll begin (section 3.1) with a primer on the nature of UG that is aimed at the philosopher of language. In section 3.2 I will turn to the role of linguistic intuitions or judgments and their relationship to UG, and will hypothesize that our having such judgments plays a role in our linguistic rule governance. In section 3.3 I will try to make sense of the notion of individual norms that are not accessible to us, and in section 3.4 I will address the rule following considerations raised by Wittgenstein and Kripke. As we will see, this leads us (section 3.5) into the question of externalism about form, with reference to Chomsky’s distinction between I-​language and E-​language.




3.1  UG for Philosophers of Language

Universal Grammar (UG) is the biological system that accounts for the different individual grammars that humans have. UG is thus not to be confused with the theory of grammar itself; rather, UG is an object of study in the theory of grammar. For example, if we think of UG as the initial state of a parametric system that humans are biologically endowed with, then individual grammars are the result of the parameters being set. I have the grammar that I do—call it G_PL, using my initials—because of the way the parameters were set in response to the linguistic data I was exposed to. Let’s also not confuse the grammar that I have with the resulting state of the parameter setting of UG in me. Let’s call the resulting state of my parameters being set UG_PL. We can say that I have the grammar G_PL because I am in parametric state UG_PL. As a further preliminary, let’s say that a grammar generates a language (or language narrowly construed in the sense of Hauser, Chomsky, and Fitch 2002). We can now make a distinction between the language narrowly construed that is generated by my grammar G_PL—we can call this language L_GPL—and other phenomena that we might pre-theoretically take to be linguistic, or part of my ‘language’ loosely speaking. Let’s call this pretheoretical collection of phenomena that involve my language L_PL. To illustrate the distinction, consider the contrast between the following two sentences involving center embedding.

(1) The cat the dog bit ran away

(2) The mouse the cat the dog bit chased ran away

We might hypothesize that although I judge (2) to be unacceptable, it is still well formed or legible in L_GPL; perhaps I merely judge it to be unacceptable because of processing difficulties. So (2) is well formed according to L_GPL but not acceptable in L_PL. Accordingly, L_GPL should not be expected to line up with all of the phenomena that we pre-theoretically take to be linguistic or part of my language. The range of phenomena in L_GPL is determined by theoretical investigation, and it at best overlaps with the range of phenomena in L_PL. There can be disagreement about the range of phenomena that fall under L_GPL. I think it is fair to say that most generative linguists today take the range of phenomena explained by UG in isolation to be limited. However, it also seems fair to say that the interaction of UG with other systems can contribute to the explanation of a broad range of phenomena—perhaps even most of the phenomena falling under L_PL. More generally, let’s say then that L_PL is a function of L_GPL + processing considerations + pragmatic considerations + socio-cultural factors, etc. As I noted, L_GPL is the product of a parametric setting of UG. But how does this work? One answer would be that parametric settings of UG establish data structures



that engage directly with the performance system. That is a possible view, but there is an alternative view according to which UG is not part of a performance theory but rather underwrites our linguistic competence. In this case we would rely upon linguistic judgments to tell us whether we had diverged from the rules established by the parametric settings of UG. If this is right, then linguistic judgments play two distinct roles. First, they play a role in how we are normatively guided by UG; second, they also play a role as evidence in our investigation of the nature of UG. This second role deserves further consideration before we return to the role of judgments in our normative guidance. If linguistic judgments play a role in our investigation of UG, they do this by providing evidence for linguistic ‘phenomena’ or ‘facts’ (I use the terms interchangeably here). For the moment let’s stick with the term ‘facts’ and make a distinction between two kinds of facts (or at least two ways of individuating facts). Let’s say that there are surfacey facts (S-facts) and explanatory facts (X-facts) about L_PL. S-facts are facts like this: ‘Who did you hear the story that Bill hit’ is not acceptable. X-facts incorporate information about the explanations for these surfacey linguistic facts—for example this: ‘Who did you hear the story that Bill hit’ is unacceptable because it violates subjacency. To understand the role of linguistic judgments in this picture, it will be useful to get clear on the difference between linguistic theory, linguistic phenomena or facts, and linguistic data. Following Bogen and Woodward (1988), I will take phenomena to be stable and replicable effects or processes that are potential objects of explanation and prediction for scientific theories. In this case the phenomena will include the pre-theoretical domain of language-related facts. While pre-theoretically we can’t say which facts provide evidence for the theory of grammar (the theory of UG), some facts will provide such evidence (more or less directly). We will also say that the theory contributes to the explanation and prediction of these facts. We can also say that I have knowledge of some of the linguistic phenomena. I will take data to be observational evidence for claims about phenomena. The data come from token events of observation and experimentation. For example, an act of measuring the freezing temperature of a liquid might yield the datum that the fluid froze at n degrees (this is not to be confused with a written record of the measurement—we can call this a record of the datum). This datum is a piece of evidence for the more general phenomenon (fact) that the fluid freezes at n degrees. The data are token-based, and the phenomena are type-based. We can, of course, aggregate data. So, for example, we might aggregate the results of several observations to show that the average freezing temperature in our experiments is n degrees. It still counts as data on my view because we are aggregating over token experiments/observations. Let’s make this a bit more concrete with some specific examples from linguistics. Consider the linguistic rule subjacency, and the case where an act of judgment by me is the source of a datum. As noted earlier, we do not have judgments about rules like subjacency, nor do we have judgments that a particular linguistic form violates subjacency.



Rather, our judgments of acceptability provide evidence for the existence of these phenomena. We can illustrate the idea as follows:

Grammatical Rule for GPL
Subjacency: Moved elements can’t jump an NP and an S node without an intervening landing site.

Explanatory fact about LGPL (potential object of theoretical knowledge for PL)
‘[S whoi did you hear the story that Bill hit ei]’ violates subjacency.

Explanatory fact about LPL (potential object of theoretical knowledge for PL)
‘who did you hear the story that Bill hit’ is unacceptable in LPL because it violates subjacency.

Surfacey fact about LPL (potential object of knowledge for PL)
‘who did you hear the story that Bill hit’ is unacceptable for PL.

Datum (content judged by PL)
That a particular tokening of ‘who did you hear the story that Bill hit?’ is unacceptable to PL.

Source of datum (act of judgment by PL)
PL’s act of judging that ‘who did you hear the story that Bill hit?’ is unacceptable.
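As a purely illustrative aside (the encoding and the names below are mine, not the chapter’s), the explanatory fact can be restated in executable form, as a check that the single movement step in the example crosses both an NP and an S node:

    # A toy rendering of the subjacency configuration, assuming the rule
    # as stated above. The hand-listed path of crossed nodes is invented
    # for this sketch; nothing here does real linguistic work.
    BOUNDING_NODES = {"NP", "S"}

    # Nodes crossed in one step by 'who' in
    # '[S who did you hear [NP the story [S that Bill hit e]]]':
    crossed = ["S", "NP"]   # the embedded clause and the complex NP

    def violates_subjacency(crossed_nodes):
        # barred: jumping an NP and an S with no intervening landing site
        return BOUNDING_NODES <= set(crossed_nodes)

    assert violates_subjacency(crossed)   # hence the explanatory fact above

The sketch simply mechanizes the X-fact; as the text stresses, speakers judge only the surfacey fact, not anything like this check.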

Summarizing thus far, grammatical rules generate linguistic facts—facts about our language. We have knowledge about some of those facts (mostly the surfacey facts), and often our knowledge is underwritten by the judgments that we make about those surfacey facts. We can get things wrong, of course, but this doesn’t undermine our knowledge. But what is the nature of these judgments?

3.2  Intuitions/Judgments

I reject the standardly held view in the philosophy of language, which is that the data (typically linguistic ‘intuitions’) directly involve notions like grammaticality, binding, and other theoretical notions. Devitt (2006), who believes the view is widely held in linguistics, describes it as follows:

We should start by clarifying what we mean by ‘linguistic intuitions.’ We mean fairly immediate unreflective judgments about the syntactic and semantic properties of linguistic expressions, metalinguistic judgments about acceptability, grammaticality, ambiguity, coreference/binding, and the like. (95)

I do not believe this description is faithful either to linguistic practice or to the way linguists understand their practice. Linguists typically do not claim to have judgments of grammaticality and certainly not of binding facts. Rather, they claim that we have judgments of acceptability and (in some cases) possible interpretations of linguistic forms.



These judgments provide evidence for linguistic phenomena (like binding) and the theory of grammar in turn explains these phenomena. In other words, we don’t have judgments about linguistic rules. Those rules are discovered by sophisticated higher level theorizing.

Notice that I have distinguished between the source of linguistic data—acts of judgment—and the data, which are the contents or targets of those judgments. That is, the linguistic judgments and the linguistic facts are not the same thing. Some theorists have confused the two notions. For example, Stich (1971, 1972) argues that linguistic theory is fundamentally an attempt to account for our faculty of linguistic judgments/intuitions by systematizing or axiomatizing our intuitions. There might be some interest in such a project, but it does not seem to me to be the project that linguists are engaged in. For linguists, acts of judging yield data that support linguistic facts; acts of judging are not the primary object of study in generative linguistics (which is not to say that this capacity to judge is not worthy of study, nor that understanding the nature and limits of it is not useful to a linguist). Chomsky (1982b), for example, is pretty explicit about this:

To say that linguistics is the study of introspective judgments would be like saying that physics is the study of meter readings, photographs, and so on, but nobody says that…. It just seems absurd to restrict linguistics to the study of introspective judgments, as is very commonly done. Some philosophers who have worked on language have defined the field that way. (pp. 33–34)

A number of philosophers and linguists interpret the practice of using linguistic intuitions as involving some inner ‘voice of competence.’ Devitt (2006:96), for example, is very explicit about this: ‘I need a word for such special access to facts. I shall call it “Cartesian”.’ I want to stress that such a view of linguistic intuition is mistaken. I am going to take a leaf from Williamson (2004), talking about so-called intuitions in other areas, and endorse his view that we ought to scrap this talk of ‘introspection’ and ‘intuition’ altogether:

What are called ‘intuitions’ … are just applications of our ordinary capacities for judgement. We think of them as intuitions when a special kind of scepticism about those capacities is salient. Like scepticism about perception, scepticism about judgement pressures us into conceiving our evidence as facts about our internal psychological states: here, facts about our conscious inclinations to make judgements about some topic rather than facts about the topic itself. But the pressure should be resisted, for it rests on bad epistemology: specifically, on an impossible ideal of unproblematically identifiable evidence. (p. 109)

Part of the controversy about linguistic intuitions, I think, is simply due to a pair of misunderstandings. The first misunderstanding is the idea that linguistic intuitions are objects in the Cartesian theater of the mind—qualia of acceptability, as it were.



Others might dispense with the qualia but still fall into the second misunderstanding, which is that linguistic judgments can give us direct access to the linguistic rules that we have. I would argue that linguistic judgments are not judgments about rules, or even rule compliance (understood in the sense that we judge that we are in compliance with a particular rule or set of rules that is transparent to us). They are simply judgments about surfacey linguistic facts, and these facts are determined in part by the linguistic rules.

In this section I’ve made the case that linguistic theories in general and UG in particular are not theories that axiomatize linguistic judgments, but rather that our judgments are evidence for linguistic facts and that our linguistic theories are designed to play a role in the explanation of these linguistic facts. But this raises the question of why we have any judgments at all about such things. Why should we have judgments about the acceptability of linguistic forms? One possibility is that our ability to make these judgments is a fluke or a bit of luck, but as I suggested earlier, there is another possibility. It might be that they play a role in our ability to follow the rules of our grammar—they might provide normative guidance for our linguistic performance. We can successfully comply with our linguistic rules because we are capable of judging whether what we are saying is acceptable (and what it means). The judgments thus could play a kind of regulative or optimizing role in our linguistic behavior at the same time that they provide insights into the language faculty. Does it make sense to think of the rules of generative linguistics as being normative? And how could they be if we don’t even know what those rules are?

3.3  Normative Rule Governance

Chomsky has frequently spoken of our having ‘knowledge’ of grammar, but this is a peculiar way to speak, if, like philosophers, we take knowledge to be justified true belief. Linguistic rules are not the sorts of things that people believe even upon reflection, nor is it clear that linguistic rules are the sorts of things that are true or correct, since there isn’t a question of having the wrong linguistic rules. You just have the rules that you do. Finally, talk of justification doesn’t make sense for grammatical rules; there is no issue of my getting the right rule in the wrong way (say from an unreliable teacher or peer). So linguistic rules aren’t believed, they aren’t true in any sense, and even if they were true and we did believe them there is no reason to think those beliefs would be justified. That is 0 for 3.

Things are different for a prescriptive grammarian, for in that case there is a question of knowing the prescriptive rules of your language. Maybe some sort of official academy of language or someone in a power relation with respect to you determines what those rules are. In that case I might come to believe a rule like ‘never split an infinitive,’ and (according to the prescriptivist) there could be a question of whether it is a ‘correct’ rule for English, and there is even a question about whether I am justified in believing the rule (did I get it from a reliable source?).



But for a generative linguist this picture is deeply confused. Generative linguists are engaged in an enterprise that is both descriptive and explanatory. It is descriptive in that they are interested in the linguistic rules that individual people actually have, and it is explanatory in that they are interested in why those people have the rules that they do, and UG plays a major role in this explanation. Whatever we choose to call our relation to the resulting body of linguistic rules, ‘knowledge’ is not a happy term.

Chomsky does not always speak of ‘knowledge’ of grammar. Chomsky (1980:69–70) also introduced the term ‘cognize’—suggesting that it is a kind of technical precisification of the term ‘knowledge,’ and he offers that we might also say that we ‘cognize grammars.’ While ‘cognize’ may have started out as a sharpening of ‘knowledge,’ its use has drifted in the linguistics literature to the point where it simply means we mentally represent grammars. This weakening seems to retire the worries about using ‘knowledge’ (i.e., that rules of grammar aren’t true and we don’t believe them), but it raises questions of its own. Suitably watered down, ‘cognize’ suggests that linguistic rules are merely structures in the computational system that is the language faculty. (For example, we could think of the rules as being like lines of code that are accessed by a natural language processing system.) This makes sense for an account of linguistics that takes linguistics to be a performance theory, but it seems inadequate if we take linguistics to be (as Chomsky does) a competence theory. The problem with the term ‘cognize’ is that once it is watered down to mean what ‘encodes’ does, the meaning seems too thin. A competence theory suggests that linguistic rules are more than just data structures or lines of code involved in our computations.

We have a dilemma here. On the one hand, ‘knowledge’ just doesn’t correctly describe our relation to linguistic rules. It is too thick a notion. On the other hand, ‘cognize,’ without further elaboration, is too thin a notion, which is to say that it is too thin to play a role in a competence theory. One advantage of the term ‘knowledge’—and presumably Chomsky’s original motivation for using it—is that knowledge would play the right kind of role in a competence theory; our competence would consist in a body of knowledge which we have and which we may or may not act upon—our performance need not conform to the linguistic rules that we know.

Is there a way out of the dilemma? Perhaps the best way to talk about grammatical rules is simply to say that we have them. That doesn’t sound very deep, but saying that we have individual rules leaves room for individual norm guidance in a way that ‘cognize’ does not. I’ll say a bit more about the details of this (like what it means to have a linguistic rule), but for now I just want to be clear on how this avoids our dilemma. The problem with ‘knows’ was that it was too thick and introduced features that are simply not appropriate for the rules that generative linguists are concerned with. We don’t believe that we have rules like subjacency, nor is there some sense in which subjacency is a true rule for us. But it is certainly appropriate to say that we have subjacency (or that my idiolect has the subjacency rule or some parametric variation of it).
Saying we have a rule like subjacency is also thicker than merely saying we cognize it. Saying I have such a rule invites the interpretation that it is a rule for me—that I am normatively guided by it. The competence theory thus becomes a theory of the rules that we have. Whether we follow those rules is another matter entirely.



Linguists and philosophers of linguistics don’t often think of linguistic rules as being normative, but that doesn’t mean the idea is a complete nonstarter. One problem, of course, is that even rules of the sort envisioned in the 1960s are so abstract that few people would be in a position to consciously entertain them, much less reflect on their normative pull. This, in effect, is why thinking about normative linguistic rules is not very attractive to linguists. It certainly doesn’t make sense to think of us following these rules in any sort of reflective capacity. It is much easier to think of linguistic principles as being part of a project of describing linguistic competence, rather than normatively guiding linguistic competence. But we can still make sense of the idea.

Earlier we looked at linguistic judgments and the role they play in our linguistic theorizing. Could we also think of linguistic judgments as playing a role in directing or monitoring our linguistic competence? For a prescriptive grammarian this makes perfect sense. Certain rules are ingrained in you (for example: ‘don’t end a sentence with a preposition’), you come to have judgments that comport with those rules, and they guide your linguistic practice. Within the theory of UG, this idea is so confused it would be a task to even sort out and enumerate all the mistakes. Clearly, we aren’t interested in artificial prescriptive rules, and when we get to rules like subjacency, it seems implausible to think that we have transparent judgments about such things, much less that judgments of that form could provide normative guidance. So how can we be normatively guided by our grammar when it is construed as an abstract object that is the product of a parametric state of the language faculty? Consider the following passage from John Lawlor’s online course notes, where he discusses the role that Ross island constraints might play in our linguistic planning and performance:

Violations of Ross Constraints are very ungrammatical. Most people never encounter them. We appear to formulate our discourse to avoid them. Occasionally, we get in a bind and see one looming at the end of the clause, and have to do something quick. What we do is often illuminating about the relative importance of syntactic rules. For instance, consider the following:

?That’s the booki [that Bill married the womanj [whoj illustrated iti]].
*That’s the booki [that Bill married the womanj [whoj illustrated __i]].

Neither sentence is terrifically grammatical, but the first seems more appropriate (and common as a type) than the second, though the last word in the first sentence still feels strange. The ordinary rule of relative clause formation operating on the last clause should result in its deletion at the end of the clause (and thus the sentence). However, it appears inside another relative, an island, and is thus safe from such ‘movement’ by the Complex NP Constraint.



Sentences like the first one are generated when, at the last minute, the speaker realizes what is going to result, and cancels the deletion, substituting an alternative relative-formation rule (called a Resumptive Pronoun in the trade), which merely pronominalizes the coreferential NP, instead of deleting it in the object position. This is not the way English forms its relative clauses (though other languages use it frequently, e.g., Hebrew), and the sentence is thus ungrammatical. But this turns out to be a venial syntactic sin by comparison with a violation of a Ross constraint, which typically produces extreme ungrammaticality.

Lawlor is not saying that we are consciously aware of Ross constraints, nor even that we judge that there has been compliance with such rules. Rather he is saying that we can see that something is wrong, and that we act so as to make it right. We make it as right as we can, and in so doing (and unbeknownst to all but theoretical linguists) we have acted so as to avoid a violation of the Complex NP Constraint. What I am suggesting is that this is a case in which we are normatively guided by the Ross constraint even though we have no conscious knowledge of such guidance.

This is tricky. If it is true that we have judgments that play a role in the normative guidance of our linguistic performance, we don’t want to be in a position where only true believers can be so guided. These should be judgments that not only are available to all competent linguistic agents, but in fact are used by them as ways of checking or regulating their linguistic performance. All linguistic agents. Even agents that believe their judgments are not grounded in cognized rules but are actually judgments about language construed as an external social object—even agents that don’t believe they have linguistic rules of any form. Given all this, the idea of linguistic rules providing normative guidance must be pretty hopeless, no?

I’m going to suggest that the problem is difficult but not hopeless at all. The puzzle is to figure out how we can have judgments that can guide us in producing linguistic forms that are well formed according to a linguistic rule system even if we have no knowledge of the rule system. This might seem hopeless, but it is really a quite widespread phenomenon.

An example of the phenomenon comes from work in ethics by Arpaly (2003) and Railton (2006). They discuss cases where an agent is following an ethical principle but does not recognize this, and even interprets their behavior as ethically unprincipled and indeed morally wrong. Arpaly illustrates this idea with a literary example from Mark Twain’s book The Adventures of Huckleberry Finn. In the example, Huck decides not to turn in the escaped slave Jim, even though he thinks the moral option is to do precisely that. Huck believes that Jim is someone else’s property, after all. He judges that turning in Jim is not the thing to do, but he takes his judgment to have a non-moral etiology. Why does Huck refuse to turn in Jim? Well, Huck is really not able to articulate the reason. As Railton (2006) has described such situations, perhaps Huck just has a nagging feeling of discomfort at the idea of doing it. He doesn’t feel that this discomfort or his decision of what to do is based on an ethical principle, but Arpaly and Railton argue that this is precisely what Huck is doing—he is following a moral principle, but he describes his action as being immoral. Indeed he thinks he is a bad boy for that very reason.



Now let’s return to the case of judgments of acceptability for linguistic forms. We can imagine someone in the position of Huck, only with respect to linguistic rules rather than moral rules. Let’s call this hypothetical agent ‘Michael.’ Michael has a grammar as part of his cognitive architecture, and he has judgments of linguistic acceptability that he uses to guide his linguistic performance. Yet Michael, like Huck, is deeply confused. Although his judgments guide him in such a way that he generally follows the rules/principles of grammar, he does not recognize that he is so guided. Michael, like Huck who believes he is immoral, believes he is ignorant of language in the relevant sense. But on Arpaly and Railton’s view we don’t need to feel bad for either Huck or Michael. Huck really is acting on ethical principles; he is not really a bad boy after all. Similarly, Michael really is following grammatical rules—in spite of what Michael insists.

As I said before, this gambit is subtle. One needs to develop a position on the normativity of language which can allow that linguistic rules are very abstract and currently outside the reach of our best linguistic theorizing yet have some normative pull on us. Railton (2006) formalizes the notion of rule guidance as follows, where for our purposes N can be thought of as a represented rule like subjacency.

(RG) Agent A’s conduct C is guided by the norm N only if C is a manifestation of A’s disposition to act in a way conducive to compliance with N, such that N plays a regulative role in A’s C-ing, where this involves some disposition on A’s part to notice failures to comply with N, to feel discomfort when this occurs, and to exert effort to establish conformity with N even when the departure from N is unsanctioned and non-consequential.

And his notion of regulative explanation is by analogy to the role of regulators in engineering.

regulative explanation: For an engineer, a regulator is a device with a distinctive functional character. One component continuously monitors the state of the system—the regulated system—relative to an externally set value, e.g. temperature, water pressure, or engine velocity. If the system departs from the set-point value, the monitor sends an ‘error signal’ to a second component, which modulates the inputs into the system… until the set-point value is restored.
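Railton’s analogy is easy to make concrete. The following toy feedback loop is my own illustration, with invented names, and not anything from Railton:

    # A minimal sketch of a regulator: one component monitors the state
    # against a set-point and emits an error signal; the other modulates
    # the input until the set-point value is restored.
    def regulate(state, set_point, gain=0.5, steps=20, tolerance=0.01):
        for _ in range(steps):
            error = set_point - state       # the monitor's 'error signal'
            if abs(error) <= tolerance:     # nothing to correct
                break
            state += gain * error           # modulate input toward the set-point
        return state

    regulate(25.0, set_point=20.0)          # a perturbed system settles near 20.0

On the intended analogy, the represented norm N plays the role of the set-point, and the agent’s felt discomfort plays the role of the error signal.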

In effect, the idea is that the agent represents a rule that may not be consciously accessible to the agent, but which plays a regulative role in the agent’s behavior. This is not to say that the agent will comply with the rule; it is simply that a failure to comply, or a divergence from an optimal value, may well be judged by the agent as not right and perhaps as calling for correction or excuse.



We may have found a coherent notion of individual normative rule governance in generative linguistics, but there is a penalty for that—we now run straight into the teeth of Wittgenstein and Kripke’s rule following argument.

3.4  Rule Following

What fact can one appeal to in order to determine that a particular rule used in the past is identical to the rule being used now? Take Kripke’s (1982) example of someone (let’s say Jones) who determines that 68+57 = 125. The skeptic then asks Jones how she knows that the rule denoted by ‘plus’ right now is the same rule she was using in the past. Suppose that Jones has never before added two numbers totaling greater than 57. Is it not possible that in the past Jones was using ‘plus’ to refer to another rule altogether, perhaps one which we might call ‘quus,’ which is like plus in all respects except that when two numbers totaling greater than 57 are quussed, the answer is 5, rather than 125? Jones would surely be right in saying that the skeptic is crazy. But what fact about Jones (or the world) could we point to in order to show that she was in fact using the rule plus before? If we cannot succeed, then there is not merely a problem for past uses of ‘plus’ but for current ones as well. When we say that we are using a rule, what fact does our assertion correspond to? As Kripke puts the problem:

Since the sceptic who supposes that I meant quus cannot be answered, there is no fact about me that distinguishes my meaning plus and my meaning quus. Indeed there is no fact that distinguishes my meaning a definite function by ‘plus’ (which determines my responses in new cases) and my meaning nothing at all. (Kripke 1982:21)
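The skeptic’s challenge can be restated in executable form. This sketch is mine, and it follows the chapter’s formulation, on which quus diverges once the two numbers total more than 57:

    def plus(x, y):
        return x + y

    def quus(x, y):
        # agrees with plus when the two numbers total 57 or less;
        # otherwise the answer is simply 5
        return x + y if x + y <= 57 else 5

    # Jones's past practice involved only totals of at most 57, so every
    # case in her history is consistent with both definitions:
    history = [(x, y) for x in range(29) for y in range(29)]   # totals <= 56
    assert all(plus(x, y) == quus(x, y) for x, y in history)

    # The two 'rules' come apart only on the new case:
    assert plus(68, 57) == 125 and quus(68, 57) == 5

No finite record of past behavior decides between the two definitions; that is the skeptic’s wedge.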

It is natural to think we can get around the skeptical argument by saying that the addition function is in turn defined by the rule for counting, and the quus rule violates the counting rule at a more basic level. But here the Kripkean skeptic can reply that there will be a non-standard interpretation of the rule denoted by ‘count.’ Perhaps Jones is really using the rule quount. What are the consequences of the argument if it is sound? Kripke’s conclusion is that talk of rules and representations is somehow illegitimate, and should be exorcised from the cognitive sciences:

… if statements attributing rule-following are neither to be regarded as stating facts, nor to be thought of as explaining our behavior…, it would seem that the use of the idea of rules and of competence in linguistics needs serious reconsideration, even if these notions are not rendered meaningless. (Kripke 1982:31, n22)



But does the argument hold up? To answer this question we need to take a closer look. There are three different elements that commentators have found (or think they found) in the argument. Let’s label them as follows.

i) The normativity question
ii) The knowledge question
iii) The determination question

The normativity question comes to this. Even if you could show that an agent like Jones has a disposition to act in conformity with a clearly identifiable rule (e.g., addition and not quaddition), there is a serious question about finding some fact that makes it true she is rule-guided. Here is how Kripke puts the point:

What is the relation of this supposition [the supposition that I mean addition by ‘+’] to the question how I will respond to the problem ‘68 + 57’? The dispositionalist gives a descriptive account of this relation: if ‘+’ meant addition, then I will answer ‘125.’ But this is not the proper account of the relation, which is normative, not descriptive. The point is not that, if I meant addition by ‘+’, I will answer ‘125,’ but if I intend to accord with my past meaning of ‘+’, I should answer ‘125.’ (Kripke 1982:37)

For a number of generative linguists, this won’t be an issue, because they don’t take linguistic rules to be normative. But on the story I am sketching in this chapter, we can make sense of the idea of linguistic competence by appealing to the idea that we have linguistic rules by being in a certain parametric state of UG and that they have a certain kind of normative force. It also seems that, following Railton, we have a plausible story about the facts that underwrite the normative aspect of rule following.

Railton offers the example of an agent called Fred, who has a disposition to have a snack every day mid-morning, but on occasion he fails to have a snack. In Railton’s terminology Fred has a ‘default plan,’ and his behavior is certainly rule-described, but it isn’t rule-guided, because in those cases when Fred failed to have his morning snack he merely shrugged his shoulders (‘Oh well, I guess I forgot’). Had it been a case of normative rule guidance, Fred would have judged that he was in some sense against this having happened. Perhaps he would feel a sense of moral discomfort at failing to have his morning snack (as he does when, for example, he fails to validate his bus ticket), or perhaps his inner monologue would involve attempts to justify or bargain over his omission (‘I’ll have two snacks tomorrow’ or ‘I was extra super busy’). If this is right then there is a phenomenological ingredient to rule following beyond merely having a disposition to act in a way conducive to compliance with a rule. There is a sense of discomfort at failure to do so.

The second component—the epistemic question—may or may not be a component of Kripke’s argument (there is some dispute about this), but we can frame the worry like this: Surely if I am following a rule (say plus) I must be in a position to know that I am doing so.



Now it is certainly the case that for some kinds of rule following we do know that we are following the rule (the rule plus is a case in point), and maybe it is even crucial that we know we are following the rule in order to follow it (this would be difficult to establish), but the linguistic rule-following cases do not appear to be structured in this way. Knowledge of linguistic rules like subjacency is the product of scientific investigation and not available to the vast majority of language users, but we still can make sense of rules like subjacency normatively guiding us. The knowledge question is simply not applicable in the linguistic case.

This leads us to the determination question of Kripke’s argument: What fact about an agent metaphysically determines that the agent is following a particular rule N, and not some other rule N′ which has the same result for all previously analyzed cases? Here we really do get to a worry that directly targets the linguistic case, and indeed the analysis of rule governance that we borrowed from Railton. We can concede that any number of dispositions might be consistent with previous conduct (say addition or speaking a language), but we must be careful not to overlook the norm N, which is represented by the agent and which plays a role in A’s conduct. So, to turn to the mathematical example, what separates Jones the adder from Jones the quadder is that there are different norms that are represented in the two cases. To a first approximation, we might think of this as involving two distinct lines of code or two distinct computer programs. Let’s call this the computationalist response to the determination question. This sort of solution is suggested by Fodor (1975):

The physics of the machine thus guarantees that the sequences of states and operations it runs through in the course of its computations respect the semantic constraints on formulae in its internal language. What takes the place of a truth definition for the machine language is simply the engineering principles which guarantee this correspondence. (Fodor 1975:66)

… it is worth mentioning that, whatever Wittgenstein proved, it cannot have been that it is impossible that a language should be private in whatever sense the machine language of a computer is, for there are such things as computers, and whatever is actual is possible. (Fodor 1975:68)

Fodor’s point is this: Anytime we attribute rule-following behavior to a system, computer or otherwise, we are offering, in part, a theory about the program that the system is executing. Since there is a brute fact about what program the system is executing, that is the fact about the world that underlies our attribution of the rule to the system.

But matters are not so simple, and Kripke rejects the computationalist alternative. His most potentially damaging objection can be put as follows: Is there really a fact of the matter about which program a system (computer or not) is executing?

I cannot really insist that the values of the function are given by the machine. First, the machine is a finite object, accepting only finitely many numbers as input and yielding only finitely many as output—others are simply too big. Indefinitely many programs extend the actual finite behavior of the machine. Usually this is ignored because the designer of the machine intended it to fulfill just one program, but in the present context such an approach to the intentions of the designer simply gives the skeptic his wedge to interpret in a non-standard way.



(Indeed, the appeal to the designer’s program makes the physical machine superfluous; only the program is relevant. The machine as physical object is of value only if the intended function can somehow be read off from the physical object alone). (Kripke 1982:34)

The upshot is that appeal to computers will not work, for the skeptical argument can be extended to them as well. In short, there is no fact of the matter about what rules and representations they are using. But is it really plausible to think that there is a possible world in which all the relevant nonintentional facts hold and I am not in the same computational state? Soames (1998), objecting to Kripke, addresses this question for conscious rule following, and frames his objection thus:

Would the result change if we enlarged the set of potential meaning-determining truths still further to include not only all truths about my dispositions to verbal behavior, but also all truths about (i) the internal physical states of my brain, (ii) my causal and historical relationships to things in my environment, (iii) my (nonintentionally characterized) interactions with other members of my linguistic community, (iv) their dispositions to verbal behavior, and so on? Is there a possible world in which someone conforms to all those facts—precisely the facts that characterize me in the actual world—and yet that person does not mean anything by ‘+’? I think not. Given my conviction that in the past I did mean addition by ‘+’, and given also my conviction that if there are intentional facts, then they don’t float free of everything else, I am confident that there is no such world. Although I cannot identify the smallest set of nonintentional facts about me in the actual world on which meaning facts supervene, I am confident that they do supervene. Why shouldn’t I be? (Soames 1998:229)

Soames’ point is that if we are concerned with metaphysical determination, then we are concerned with whether there are some nonintentional facts that necessarily determine the rule I am following. But as we learned in Kripke (1980), it does not follow that I should be in a position to know these facts (for example, water might necessarily be H2O, but it doesn’t follow that we will know this). The rule following argument (or at least the determination question) appears to be refuted by Kripke’s own observations about the necessary a posteriori.

3.5  Externalism about Syntax?

If we block the Kripkean considerations raised in the previous section by casting our net widely on the determination question, we move to a position where meaning facts might supervene on nonintentional relational facts.



For example, the fact about what rule I am utilizing may supervene on (among other things) nonintentional facts about my environmental embedding. If the Kripkean argument applies to all forms of rule following (as it is supposed to), this seems to push us to a view in which syntactic states, not just semantic states, are sensitive to embedding environmental conditions. That is, we could be pushed towards externalism about syntax. But doesn’t this push us away from Chomsky’s conception of I-language and towards what he called E-language? It does not, but it pushes us towards a third position.

Chomsky (1986b) drew a distinction between two distinct conceptions of the nature of language, calling them I-language and E-language respectively. I-language is understood as referring to the language faculty, construed as a chapter in cognitive psychology and ultimately human biology. E-language, on the other hand, is a cover term for a loose collection of theories that take language to be a shared social object, established by convention, and developed for purposes of communication. Chomsky’s choice of the term ‘I-language’ plays off several interesting features of the language faculty—what we might call the three i’s of I-language. First, he is addressing the fact that we are interested in the language faculty as a function in intension—we aren’t interested in the set of expressions that are determined by the language faculty (an infinite set we could never grasp), but in the specific function (in effect, individual grammar) that determines that set of expressions. Second, the theory is more interested in idiolects—parametric states of the language faculty can vary from individual to individual (indeed, across time slices of each individual). Thirdly, Chomsky thinks of the language faculty as being individualistic—that is, as having to do with properties of human beings that do not depend upon relations to other objects, but rather properties that supervene on events that are circumscribed by what transpires within the head of a human agent (or at least within the skin).

I think that this last i can be separated out of the mix. The distinction between individualistic and relational properties can be illustrated by the distinction between one’s rest mass and one’s weight. My rest mass is what it is simply because of my physical constitution. My weight depends on the mass of the planet or moon I am standing on. It is arguable that many of our psychological states, ranging from perceptual states to our propositional attitude states, are individuated by reference to the external world. As Kripke’s rule following argument suggests, even computational states might be individuated widely (what the determination problem may show is that what a machine computes is a function of its embedding environment and not the physics of the machine in isolation). I intend the term ‘Ψ-language’ to be neutral as to whether psychological states are individuated widely or narrowly (or both). My point is that the move to relational facts at most commits us to the Ψ-language position and not the E-language position. Of course, while psychological states may include external objects as part of their individuating conditions, almost all of the interesting questions involve the issue of how we represent those external objects and the cognitive mechanisms involved in how we come to represent them as we do.



I consider it plausible that generative linguistics should be concerned with the linguistic rules that we have, and that this in turn should be understood in terms of a computational system in our cognitive psychology, but I don’t see that such a view entails that we have to understand human psychology individualistically. The case for I-language has always placed a great deal of emphasis on the language faculty—UG—as a chapter in our psychology, but nowhere does the positive argument seem to entail that psychology must be individualistic. We can happily talk about computing over rules without sliding into individualism. As with I-language, we can take Ψ-language to be a computational system that is part of our biological endowment; we simply remain neutral as to whether the relevant computational states supervene exclusively on individualistic properties.

3.6 Conclusion

I have argued that UG is a parametric system that establishes individual norms for us. This is the sense in which UG is part of a competence theory instead of a performance theory. By establishing individual norms it does not control our linguistic behavior but acts as a kind of regulator that warns us when we diverge from an optimal set-point value. These warnings take the form of linguistic judgments. These linguistic judgments are surfacey—they don’t tell us what rules have been violated; they merely tell us when something is amiss, or which of two forms is more optimal. While this leads us into the rule following arguments of Wittgenstein and Kripke, it does not force us to an E-language perspective. To the contrary, it at most pushes us to adopt what I have called the Ψ-language perspective on the nature of language.



Chapter 4

On the History of Universal Grammar

James McGilvray

4.1 Introduction

‘Universal Grammar’ (UG) is a technical term with changing applications in Chomsky’s decades-long attempt to construct a biologically-based natural science of language—of its development in the individual (growth) and the human species (evolution). Its basic sense is something like this: the hidden but scientifically discoverable innate component of the human mind/brain that allows a neonate’s mind with minimal input to develop a computational/derivational system that yields an unbounded number of the hierarchically structured conceptual complexes characteristic of a human language (see chapters 5, 10, and 12). Like any term in a natural science that has made progress, its theory-specified referents have changed, sometimes in major ways. And like virtually every term in formalized natural sciences—for example, ‘force’ in current physics—its connection with the senses and referents of commonsense and like terms (‘general grammar,’ ‘universal grammar’ without the capitals) is of no scientific interest. The scientifically interesting issue for the history of this technical term is whether the current account of UG is, by the independent measure of the success of a natural science, better than the initial one. A history of UG as a technical term can focus on progress towards that goal. It should, since reaching that goal—Chomsky claims—has from the beginning been the aim of his efforts to construct a science of language (Chomsky and McGilvray 2012:38; hereafter C&M). That history begins over sixty years ago and ends for the purposes of this chapter at the time of writing. It will be brief, focusing primarily on the goal and early and most recent stages.

Some readers might expect an account of immediate influences on Chomsky—how, for example, the various intellectual projects of Goodman, Harris, Church, Skinner, Quine, Bloomfield, and others influenced Chomsky’s early and later efforts. Several have attempted this, such as Newmeyer (1986), Matthews (1993), and Tomalin (2008); see also chapter 1.



I will not. A responsible attempt needs more scope than a chapter affords. I aim instead to emphasize progress at offering a natural science of UG. And—as does Chomsky—I explore some of the historical origins and limits of natural science methods when applied to the mind, especially with regard to language.

In books and articles from the early 1960s onwards Chomsky discussed the works of various polymaths, philosophers, linguists, and poets who between the 17th century and the 19th studied language or closely observed its use. His aim, apparently, was to elucidate the assumptions on which studies of language and mind proceeded, then and now, commending those assumptions that coordinate with the basic methods of natural science and that led to progress. His most expansive effort (and among the earliest) is found in his Cartesian Linguistics (CL). A brief but rich work, CL very selectively placed the studies it discusses in what Chomsky elsewhere called a ‘rationalist’ group, where studies of the mind presuppose that the mind’s ‘parts’—vision, language, and so forth—are best studied as natural objects that have the natures they do primarily because they are products of mind/brain internal growth agendas (nativism and internalism).1 Or they are placed in an empiricist camp with practitioners who study the mind assuming that its ‘parts’ are primarily the products of external influence (anti-nativism and externalism).2 CL’s basic message is that internalist and nativist assumptions lead to good natural sciences of mind; externalist and anti-nativist ones do not.

Descartes had a special place in CL. He did not attempt a natural science of language and says little of interest about language itself. But with Galileo he invented natural science methods. He offered a plausible account of innateness (nativism), brain- and other-based. His observations concerning language use advocate an internalist approach to mind and indicate the limits of natural science methods when applied to the mind. And at least one of his protégés initiated a natural science of language.

In what follows, I outline the aims of natural science and sketch progress towards constructing a natural science of language, UG in particular. Then I sketch CL’s—and Descartes’s—contributions.

1  Now different terms sometimes serve for ‘rationalism’: ‘methodological monism,’ ‘normal science,’ ‘Galilean science,’ ‘biolinguistics.’ The natural science methods and their limits remain the same.
2  Except in note 3, Chomsky himself does not in CL organize the views he discusses around the rationalist/empiricist distinction; he does, though, in several places before and after. Nevertheless, it is a useful way to capture the differences. It is difficult to understand the motivations behind empiricist methods for the study of mental systems and language in particular; they do not appear to be those of the natural scientist. Perhaps they derive from the commonsense belief that language is a social institution located outside the head (externalism) that is learned through training or by application of some kind of general-purpose statistical sampling technique (anti-nativism).




4.2  Natural Science and Universal Grammar

Natural sciences typically deal with phenomena out of reach of direct observation and—more generally—of our everyday (commonsense) conceptions of things and events. The physicist’s forces are out of reach of almost everyone. The articles and pictures of quarks and gluons that appear in Scientific American and like publications appeal to our everyday mechanism-focused intuitions, but you need to have the formal-mathematical theory well in hand to really understand what (say) strong interaction is and why there is no grand theory of forces, at least so far. Similarly for natural sciences of the mind, including language: one must abandon commonsense intuitions about language, sentences, words, and meaning in order to understand what a natural science of language ‘says.’ Fortunately, though, a prominent aspect of language as understood in natural science is not only reasonably accessible to educated audiences, but is of central importance.

Natural sciences are postulated formal theories of hidden phenomena. The sciences succeed when they meet certain conditions. Expanding on Chomsky (1965), these are descriptive adequacy, explanatory adequacy, explicit formal statement, simplicity, accommodation to other sciences, objectivity, universality, and—especially—making progress in improving a science along one or more and ideally all of the preceding dimensions.

To clarify, descriptive adequacy has little to do with observation; it is a matter of providing terms that characterize the entities and properties that the theory postulates—adequate in the case of UG to state at the least what the uniquely linguistic (biologically) innate component is that allows a mind/brain to develop a language. Explanatory adequacy could include almost everything else on the list of desiderata, but we can restrict it to explaining observable but puzzling facts such as why humans alone have languages and why languages appear to develop automatically. Chomsky until the early 1980s conceived of explanatory adequacy as providing a theory that explains swift and effortless language acquisition. After the 1980s introduction of the Principles and Parameters program (see chapter 14) and with increasing confidence that linguistic differences arise only on the way to externalization (speech or sign), he treats this matter as sufficiently well dealt with that in recent years he speaks of moving ‘beyond explanation’ (Chomsky 2001)—to dealing with accommodation to biology (evolution and development) and thus addressing evolution and why language is the way it is. We can treat these as explanatory issues too.

Universality in UG-relevant cases is universality of language acquisition systems across the current human species, not uniformity in stable outcomes. And simplicity—Galileo’s and Goodman’s central concern—is not merely an epistemological constraint on inquiry and theory construction, but an ontological/metaphysical principle that even in the kludgy-looking biological domain has proven to yield successful natural science theories. Perhaps it is also what Descartes had in mind when he spoke of the ‘light of nature.’ Whether it is or not, we do not need the assurance of a kind Cartesian god to underwrite the principle. We may not know why, but it yields good natural sciences—good by the standards above.



The most promising attempts so far by these standards at capturing the nature of language come in abstract packages: formalized computational/derivational theories that treat language as a mind-brain internal system operating automatically in accord with what appear to be faculty-unique principles or laws—in effect, computational theories of a mental module. It is only recently that it has been possible to address implementations of such systems (Embick and Poeppel 2015).

Note that the desiderata for a natural science outlined above constitute an ideal, one that some fields of inquiry—including some natural sciences—are unlikely ever to emulate. An example is meteorology, which involves complex contributions of domain-specific micro-events. Sciences can deal with the micro-events in ways that approximate the ideal, but with meteorology the focus is on their interactions and macro effects, phenomena that we with our limited minds and natural science methods will never manage to capture within a single theory, unless perhaps one believes that system dynamics or ‘chaos theory’ are such theories. A related but importantly different example is complex organism behavior, including and especially human linguistic behavior. We easily deal with it with our native folk psychology, or what sometimes goes by the title ‘theory of mind.’ But it is out of reach of natural science.

If linguistic action/use is out of reach of the methods of natural science, the language system is not. Chomsky’s early works (1955/1975, 1957) focused on developing an appropriate formalism that could simplify the apparent complexities of individual languages and bring them within the scope of a relatively small number of principles that provide infinite range from finite means. Pre-Chomsky work did not employ the mathematical-formal tools of recursion applied to language-appropriate abstracta, and their lack is one of several reasons for beginning the history of UG as a technical concept in the 1950s. The formalizations offered then and two decades after represented important advances, even if supplanted by later simpler ones better able to deal with explanation and accommodation to biology, especially in post-2000 Minimalist form. Explanatory adequacy was accomplished in a messy way: children’s minds are supplied with an internal simplicity measure that ‘chooses’ the simplest grammar for a set of data, where simplicity is measured by a form of symbol count, assuming a domain for its operation. What is the domain? 1962’s ‘Explanatory Models in Linguistics’ (p. 536) indicates that one must presuppose—and of course get a good theoretical grip on—a ‘highly intricate and [language-]specific initial structure’ that provides the grammars. At the time, there had been some progress in describing that structure and making it more nearly universal, but not much. Some apparently grammar-specific universals and some promissory notes were gathered in what Chomsky (C&M) now calls a ‘format’ for grammar. It included a principle of cyclicity of derivation/computation and a computational/derivational picture with several levels (DS, SS, etc.). A simple UG looked out of reach. The beginnings of a solution to that and other explanatory issues appeared only later.



A science of language must find and characterize the universal and innate language-relevant component that allows for ready acquisition. But it must also speak to differences between languages and make sense of how an infant could acquire any natural language quickly. Input matters, so some kind of learning or perhaps triggering takes place, leading at the least to differences in accent and prosody, in morphology, in word order. Some differences appear systematic: different languages sometimes display what appear to be systematically different characters. Early efforts within the 1980s Principles and Parameters (P&P) program encapsulated some of these differences in parametric options to universal principles, building them into UG. A plausible-looking example is the phrasal ‘head parameter’: XP = X — YP, with ‘—’ unordered. It provides for languages with phrasal heads (N, V, etc.) placed before (English) or after (Japanese) complement phrases (see chapter 14, section 14.3, for more discussion and illustration of this and other early parameters). Parameters—especially binary ones like this—appear easy to ‘set’ on the basis of input, perhaps statistically assessed (Yang 2004). P&P clearly improved on the early picture of language acquisition, advancing explanatory adequacy.

There is also no doubt that a natural science’s account of UG must deal with evolution: one must explain why humans alone have language. There was discussion of the issue even in the 1960s (Lenneberg 1967, 1975; Chomsky 1968/1972/2006). But so long as UG remained complex, evolution, and especially the saltational account needed to make sense of the facts (and the lack of evidence for a gradualist and kludgy alternative), remained mere speculations, not reasonable guesses. Unfortunately, a parameter-laden UG remains too cluttered and complex to be something that might have evolved in a way compatible with the reasonable assumption that UG evolved in a very brief period, likely in a single step. The obvious solution: retain the P&P program’s solution to acquisition and isolate UG from those aspects of language that must be learnt. A reasonable (by the standards of natural science methods) strategy to do this appeared after 2000.

There was progress after Chomsky’s 1965 Aspects in simplifying and universalizing linguistic structure while improving descriptive adequacy. Examples are found in Ross on islands, X-bar theory, ‘move-α’, and the 1980s’ Government and Binding program, Bare Phrase Structure, and early Minimalist attempts at eliminating artifacts such as deep and surface structures. But it was not until after 2000 that the Minimalist Program offered a demonstrably uniform and simple UG, divorced from a plausible P&P-inspired way to speak to swift acquisition of specific I-languages, along with a plausible account of evolution—and thus accommodation to biology. Recent Minimalism began with the introduction of a single syntactic/compositional recursive operation Merge (replacing separate Merge and Move), followed by work on phases, ‘third factor’ contributions (see chapter 6), edges, projection, and so forth (Chomsky 2000b, 2001a,b, 2004a,b,c, 2005, 2008, 2013).
Intuitively, the picture that emerged was that of a language system with three ‘parts.’ One is a compositional operation—Merge—that takes ‘objects’ (for language, lexical items (LIs) or minimally concepts), puts them together to make an unordered set, and along with a structure-internal copying operation produces hierarchically ordered and structured complexes that—if the operation joins ‘words’ or lexical items, or even just atomic concepts—we can think of as an internally represented sentential expression. That operation is called ‘Merge’ and it comes in two basically equivalent forms, External and Internal.
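As an illustrative aside (mine, not anything from the Minimalist literature itself), the set-formation idea can be sketched in a few lines, with strings standing in for lexical items and with labels, features, and phases all ignored:

    # A minimal sketch of Merge as binary set formation: syntactic objects
    # are atoms or the unordered two-membered sets built from them.
    def merge(x, y):
        # External Merge: combine two syntactic objects into an unordered set
        return frozenset([x, y])

    def contains(obj, part):
        # does the syntactic object obj contain part as a subpart?
        if obj == part:
            return True
        return isinstance(obj, frozenset) and any(contains(m, part) for m in obj)

    def internal_merge(obj, part):
        # Internal Merge re-merges a subpart of obj with obj itself,
        # yielding displacement ('movement') as a copy at the edge
        assert contains(obj, part)
        return merge(part, obj)

    dp = merge("the", "book")       # {the, book}
    vp = merge("read", dp)          # {read, {the, book}}: hierarchy, no linear order
    moved = internal_merge(vp, dp)  # {{the, book}, {read, {the, book}}}

The point of the toy is only that hierarchical structure and displacement fall out of one operation; externalization would then have to impose linear order on such sets.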



The second component is at the least a supply of concepts—packages of what Chomsky calls ‘semantic features’—that can reasonably be seen as the products of a system that was already in place when the combinatory system was evolutionarily introduced. The third is a resource of phonological, morphological, and relevant other features that deal with sound/sign perception and production (expression, externalization of a conceptual complex produced by Merge), a resource that when packaged and associated with a concept in a ‘word’ constitutes a lexical item or perhaps lexeme.

Matters are not settled—they rarely are in natural science—but if a proposal called the ‘Strong Minimalist Thesis’ (SMT) that is characteristic of recent Minimalist Program efforts is correct by the standards of natural science, UG could be the recursive operation Merge, period. There is evidence (Hauser, Chomsky, and Fitch 2002) that Merge is unique to humans, as is language (and mathematics); it is also obvious that language (along with the capacity to count) exploits and is apparently coeval with Merge, and that—with a qualification—it serves as language’s primary computational-compositional operation.3 The qualification is that—as noted—Merge must rely on at least a stock of concepts (likely thought of as ‘atomic’ packages of semantic features) in order to count as an operation that yields the conceptually complex cognitive ‘perspectives’ that provide the creativity, cognitive advantages, and flexibility that humans enjoy and exploit. If so, Merge + concepts plus minimal search (a third factor contribution) would suffice for language, conceived of from the point of view of the SMT, to be ‘selected,’ even without externalization—or consciousness, which requires re-internalization. Externalization and its linearization operating through some sensory medium could have come later. It is likely that the means for externalization existed when Merge was introduced, but the relevant mechanisms would have had to be configured in ways that accord with these expressive resources—and obviously, there are various ways. The upshot: expression (and communication and consciousness of ‘content’) turns out to be a secondary feature of language, the production of articulated meanings (‘thoughts’) primary (C&M; Chomsky 2013:36).

This picture (with Merge + concepts universal, and with means of expression and linear order particular) resonates with the views of CL’s Port-Royal grammarians. They assumed that linguistic meanings (now seen as conceptual complexes (SEMs) produced by Merge, not CL’s and Aspects’ Deep Structures) are universal. No one has a naturalistic science of concepts, although it is reasonable to assume that there can be one: they seem to result automatically from some kind of productive procedure. That has been apparent to rationalists for centuries (see C&M), restated (but lumbered with misguided externalist baggage) by Fodor (1998). Note that abduction (concept acquisition) and perception (application) are closely linked—a point Chomsky attributes to the rationalists he discusses in CL (pp. 102–103). Similar claims reappear now in work by Gallistel (2006) and others on innate domain-specific ‘learning’ mechanisms.

3  I ignore probe–goal and related matters here.



Similar claims reappear now in work by Gallistel (2006) and others on innate domain-specific 'learning' mechanisms.

What about the swift acquisition of language particularities? Keep three things in mind. First, input or experience matters: children's minds adjust to local linguistic conditions. Second, it has become ever more apparent since the beginning of the Principles and Parameters framework in the early 1980s that particularities are found in language 'linearization'—basically, in how languages variably manage to place a structured conceptual complex containing lots of redundancies into a 'signal' expressed in a sensory modality over a temporal period. Phonology, morphology, and the like constitute our resources for doing so.4 Third, developmental progress exhibits some predictable patterns both in developmental stages and the ranges permitted in initial and final states (see chapter 12). As suggested, that led to P&P's early thought that UG includes parametric options. And perhaps a few of what have come to be called 'macroparameters' (Baker 2008b) are a part of UG, with their setting a matter to be resolved—presumably—through an epigenetic account that includes response to the environment. But that possible concession may be the only residue of the idea that UG's principles include some options. There is little reason to think that all the fine-grained differences reflected in microparameters (Kayne 2000) and claimed to be built into lexical items are (see chapter 14 for relevant discussion). Further, ideally for the SMT, no parametric options would be included in UG/Merge.

There is, however, another contributor to a language's final state: third-factor physical, chemical, computational laws, and the like no doubt play a role. They are certainly innate (Cherniak 2005), although not by virtue of genetic control. They can be sensitive to differences, including fine ones. And their contribution has to be acknowledged in any case: minimal search plays a role in Merge's operation. They can help deal with swift acquisition, presumably (see chapter 6). Assuming so, the language faculty with genetic, epigenetic, and third-factor considerations taken into account is a nature-based 'mechanism' that yields a specific version (an 'I-language') of a natural language in accord with biological and non-biological natural law that 'channels' development, a procedure that receives needed input (experience) and yields a stable final state in a child's mind/brain by the age of 3;6 or 4. The final state is thus the result of (1) genetic specification (biology), (2) 'input' or 'experience,' and (3) other non-genetic natural constraints on development, such as constraints on effective computation. These are Chomsky's 'three factors.'

Accounting for the evolution of UG and language, and accommodating them to biology, is now easy: all one must account for is the introduction of Merge, and that can be seen as the result of a single saltational event in a single breeding hominin. And we can now account for the fact that any normal infant can acquire any human language, and make sense of why nothing seems to have changed for the last 50,000 years or so. Explanatory adequacy is greatly improved by including epigenetic and physical, chemical, computational, and other constraints among the contributions to the development of a biological system (Thompson 1917/1942; Turing 1992; Kauffman 1993; Carroll 2005; among others), and by making Merge alone UG.

4  Of course there are multiple differences in how languages are used in thought and speech, but they are irrelevant to a science of language. For reasons, see the discussion of CL below.



Further, objectivity and universality are advanced. So the natural science dictum, 'seek simplicity' (generally in nature and here, in UG) has much to recommend it, even when pushed to the extent that it is in the strong minimalist thesis, a thesis that Chomsky remarks might even be true (Chomsky and McGilvray 2012:54). Clearly, by the standards of natural science research, there has been considerable progress towards a natural science of language.
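To make the combinatory picture concrete, Merge as binary, unordered set-formation can be rendered in a minimal computational sketch. Everything below is an expository assumption rather than a formal proposal: lexical items are modeled as bare strings, syntactic objects as frozen sets, and Internal Merge as re-merge of an object already contained in the structure.

```python
# A minimal sketch of Merge as binary, unordered set-formation.
# Lexical items are modeled as strings and syntactic objects as
# frozensets; these representational choices are illustrative only.

def merge(x, y):
    """External Merge: combine two syntactic objects into an unordered set."""
    return frozenset({x, y})

def contains(obj, part):
    """True if part occurs somewhere inside the syntactic object obj."""
    if obj == part:
        return True
    if isinstance(obj, frozenset):
        return any(contains(member, part) for member in obj)
    return False

def internal_merge(obj, part):
    """Internal Merge: re-merge something already inside obj with obj itself,
    yielding a second occurrence (a 'copy') of that element."""
    assert contains(obj, part), "Internal Merge targets material inside obj"
    return merge(part, obj)

# External Merge builds hierarchy with no intrinsic linear order:
vp = merge("eat", merge("the", "apple"))          # {eat, {the, apple}}
clause = merge(merge("the", "boy"), merge("will", vp))

# Internal Merge yields a structure in which 'will' occurs twice,
# once in its base position and once at the edge:
question = internal_merge(clause, "will")
print(contains(question, "will"))                  # True
```

Nothing in the sets themselves encodes left-to-right order; on the picture sketched above, linearization is imposed only when a conceptual complex is externalized in some sensory modality.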

4.3  The Contribution of Cartesian Linguistics

4.3.1  What Kind of Work is Cartesian Linguistics?

One might think CL is antiquarian intellectual history that tries to capture the actual views of historical figures. First, there is the subtitle: 'A Chapter in the History of Rationalist Thought.' Second, there is the remark in Chomsky's introduction (CL, p. 57)5 that what follows in the text can be seen as a 'discussion of the history of linguistics in the modern period,' followed by the suggestion that the part of the 'modern period' that he aims to discuss has been ignored by 'modern linguistics' (meaning that of Bloomfield, Harris, Joos, etc.). However, just below (pp. 57–58) that remark is an outline of what Chomsky actually attempts in the volume. It is highly selective, with choices driven by then current (and continuing) 'concerns and problems' of someone like himself, engaged in constructing a science of language as he was and is. He says:

I … limit myself here to … a preliminary and fragmentary sketch of some of the leading ideas of Cartesian linguistics. … Questions of current [i.e., 1965–1966] interest will … determine the general form of this sketch, that is, I will make no attempt to characterize Cartesian linguistics as it saw itself, but rather will concentrate on the development of ideas that have reemerged, quite independently in current work.

Further, in some remarks in his 1970s televised and then published discussion with Foucault, he says of his approach to the texts of Descartes and other historical figures who figure in CL:

I approach classical rationalism not really as a historian of science or … philosophy, but from the rather different point of view of someone who has a certain range of scientific notions and is interested in seeing how at an earlier stage people may have been groping towards these notions, possibly without even realizing what they were groping towards. (Elders 1974:143)

5  Page numbers are to CL's 2009 edition.




So Cartesian Linguistics is not an antiquarian work. Rather, as suggested above, it discusses ways in which various individuals contributed to (and in the case of empiricists undermined) the institution of, and refinements in, a naturalistic research strategy for the scientific study of mind and language initiated (Chomsky's title suggests) at around the time of Descartes, and—to an extent that Chomsky considers significant—by Descartes. Its linguistic focus is fixed by work current in 1965 and—given now the history that has been sketched—its more successful successors. And it illuminates both historical figures and works, and current work.

4.3.2  Why Descartes?

Descartes's connection to linguistics might appear remote. He had little of interest to say about language itself. He did, though, make important observations concerning language use, and he detailed the implications of these observations for a natural science of the mind. He was also, with Galileo and a few others, one of the originators of the methods of modern formal naturalistic theory construction. And he made some interesting and relevant claims concerning innateness, including some that appear to demand what we would now think of as biological (or at least biological, physical, chemical, computational, and so forth) explanation. Perhaps it is these three contributions taken together that led Chomsky to call his study 'Cartesian Linguistics.' All contributions are novel with Descartes in some identifiable way, and each deserves attention.

1.  Descartes on the methods of natural science: In his Discourse and in slightly different forms elsewhere, Descartes offered what is plausibly the most articulate and relevant of the very few efforts available at his time to describe natural science methodology. Some might hold that that honor belongs to Bacon. But Bacon's dicta bear primarily on gathering and organizing data or evidence, not on the more central matters of constructing theories of hidden phenomena—indicating what they should aim for and accomplish. Others might point out that Descartes left things out, such as Galileo's emphasis on simplicity. But Descartes did not ignore simplicity. In fact, he did quite well at approximating the desiderata for natural science I listed above, with one prominent exception: progress. For Descartes, natural science amounted to his contact mechanics, which he believed applied to all 'extended substance.' He was much too confident about his theory's longevity, as Newton's law of gravitation fifty years later showed. Newton's law postulated what was for Descartes an arcane force far removed from the 'mechanical philosophy's' (and common sense's) notion of effect through contact.

Exercising critical scrutiny and some charity, one can find a majority of these desiderata in the Discourse's four rules. Descartes's much-disputed first rule is not to accept as true anything unless one is certain of it, where certainty amounts to having the claim 'present itself to my mind so clearly and distinctly that I had not occasion to doubt it' (Descartes 1984–1985:120). Certainty—especially Descartes's psychological-looking version of it—does not belong in natural science. Clarity and distinctness, however, accord with some of the seven desiderata and with current practice in the natural sciences. They accord with the demand to break things down into simples (what in Descartes's time would be called 'corpuscles'), to employ precise and explicit formalization (for Descartes, mathematics and geometry), and to seek simplicity in the natures that one aims to describe. A contemporary version of some of these desiderata appears in Colin McGinn's (1994) CALM ('combinatory atomism and lawlike mappings').

Notice that insisting on 'clarity and distinctness' distorts matters when applied to objects and events as understood in the commonsense domain, where interests of humans play central roles in 'defining' things. Into what does one analyze a shirt—sleeves, shoulders, main body, collar, cuffs? What about buttons and zippers? Individual threads? The strands that make up the threads? The molecules of which cotton and Lycra are composed? The point: do not start; you lose the shirt. Shirts are 'tools' of a sort manufactured for human use, and their parts—whatever they might in any specific case be—subserve the relevant purpose. Analysis into 'simples' misses the point. With the natural science of language, on the other hand, the lexical item shirt might be analyzed into 'sound' (phonological) and 'meaning' (semantic) features, perhaps—depending on one's view of syntax and expression—formal features too. At each level of grain, one needs arguments that that degree of simplification is adequate for the purposes of the theory and its law-like principles: with 'word meanings,' for example, it is useful to break a word's meaning/concept down into more primitive semantic features because this allows for a plausible account of swift meaning/concept acquisition, and provides a way to account for differences in application (including metaphor) and differences between I-languages, even though from the point of view of syntax, a word's cluster of semantic features might be treated as a unitary whole.6

The second rule adds little to the first: 'divide each of the difficulties I examined into as many parts as possible as may be required in order to resolve them better.' Again, this applies usefully in science (with 'possible' depending on theoretical purposes), not to commonsense objects and commonsense actions and events.

6  After the invention of natural science in the 17th century, no one should be surprised—for reasons like those above and others—that the entities and concepts of the natural sciences (or at least, those of the more advanced forms) are remote from common sense and its form of understanding the world, or what Descartes called 'bon sens.' Descartes did not fully appreciate the difference himself. He argued that good sense or common sense must be supplemented with his method (which yields science) to reach truth; see, for example, 'The Search for Truth' in Descartes (1984–1985) and its portrayal of Polyander's development. That the results of following the method surprise the person of unaided common sense should indicate that they have as a result come to understand in a way that they had not before. Descartes recognized the distinction in another form, however. He distinguished the concepts ('ideas') that appear in science from the innate 'common notions' of common sense. The distinction appears in several places, including his reply to objections to the sixth meditation, and it is implicit in his reply to Regius in Comments on a Certain Broadsheet. In a letter to Mersenne in Descartes (1991:182–183), he explains with the example of the sun. There is for common sense the 'common notion' we have of the sun (rising and setting, etc.) and for science the 'invented' or 'made up' concept the scientist creates in creating a natural science theory. That example would be significant to his audience.



A walk to the store is not understood as n numbers of steps, or m number of left knee joint flexions, etc. 'John walked slowly on Saturday to the store,' on the other hand, might from the point of view of Pietroski's minimalist-inspired semantic theory (2005) be understood in a neo-Davidsonian way: event e is a walk, and e is slow, and agent is John, and so forth.

The third rule is to 'direct … thoughts in an orderly manner, by beginning with the simplest and most easily known and by supposing some order even among objects that have no natural order of precedence.' Descartes no doubt associated this rule with his foundationalism; we must reject foundationalism. The rule is also, however, a way of anticipating the notion of the unity of the sciences—something Descartes insisted on in his Rules too. If it is read as a demand for some kind of reductionism in the sciences, it is unwarranted, given what is actually exhibited by scientific practice. Softened to something like a demand for accommodation between appropriate sciences, however, we find in this rule an early version of the desideratum that a science be accommodated to another, or others. It is a central aim of current practice in biolinguistics to do this, making it possible to accommodate language to biology and a plausible account of evolution and third-factor considerations.

And the fourth and last: 'throughout to make enumerations so complete, and reviews so comprehensive, that I could be sure of leaving nothing out' (Descartes 1984–1985:120). Here there is little need for charity: this looks like the demand that a theory be descriptively and explanatorily adequate to its domain. Of course, as mentioned, Descartes thought that natural science had a single domain—the 'extended'—but we can ignore the several problems with that. And we can and should recognize that one can have natural sciences of the mind that are descriptively and explanatorily adequate to their specific domains—vision, perhaps, or language—although not language use. As noted, the four rules do not touch on progress; Descartes naively believed that his method quickly yields settled truth. Nor do they bear explicitly on objectivity understood as 'built into nature'—or at least, nature as we can understand it. That can be assumed, however; it is presupposed in what he has to say about his contact mechanics and its scope—apparently, everything in the domain of 'body' or extension. Progress and perhaps objectivity aside, then, his rules anticipate our current view of natural science methodology.

2.  The limits of the methods: Descartes's second unique contribution to the natural science of language (and mind) lies in his observations concerning the creativity of language use and the implications he drew for the natural science of mind. Appearing in Part V of the 1637 Discourse (with no precedents of which I am aware) and expanded on by Cordemoy (1666), the creative aspect observations amount to three. (1) Language use appears to be free of internal or external stimulus control: I can speak or think linguistically about anything, anywhere, under any description, without regard to external or internal causal influence. Stimuli of either kind might 'incite and incline' me to think or speak one way or another, without in any way causing it. In effect, an audience or I might find in some prior thought or external condition a reason for me to think or say such-and-such. But neither prior nor simultaneous conditions constitute causes. Those committed to a mental cause program (e.g., Fodor's 'computational theory of mind') will claim a cause of some sort, but it is at best a claim on behalf of 'rational causation' (person-causation), which is inconsistent with both these observations and natural science methods. (2) Language use appears also to come in any of an unbounded number of sentences. The unbounded sentences can be—and in a generative theory are—the computational products of recursive operations on a finite set of lexical items. There is no upper limit, even with regard to a specific 'inciting or inclining' circumstance. Someone stumbling over another person's foot could be described appropriately in many ways, with no upper bound on the ways that make it appropriate. (3) While not being caused and while taking any number of forms, given a specific circumstance or question posed (or some other 'reason to speak or think'), what a person says or thinks remains appropriate, or 'reasonable.'

The creative aspect observations are to be taken as a group. One can program a computer to produce any number of 'well-formed' language-appearing outputs, and do so randomly, thereby modeling a form of stimulus freedom and unboundedness. But there has been no success in programming a computer that can unrestrictedly meet all three conditions at the same time. Notice further that it is important that these observations bear not on the nature of whatever computational system generates sentential expressions, but on the use to which such expressions are put by human beings. Humans can (and do) have computational systems in their heads that could yield endless numbers of sentences or expressions (linguistic competence), and sciences of language focus on this competence. But the use of the expressions by humans appears to be beyond the reach of the sciences of language and mind, at least as we humans understand the nature of natural science and its methods. In effect, we can and do have sciences of language and its development in an infant, but no science of language use. Only an internalist ('in the head') approach to language—without regard to use by people—allows for an adequate (by the standards above) natural science of human I-languages and the ways that they develop. No adequate natural science of language has dealt, or likely can deal, with the uses to which a specific person puts what his or her computational system provides. These facts about use and the scope of a science of language conflict with deeply held views of philosophers, psychologists, and other cognitive scientists, including many in the 16th and 17th centuries and still now. Perhaps that is because many remain in the thrall of commonsense externalist and anti-nativist views of mind and language.7

7  When Marr introduced his view of a computational system, he misguidedly used J. J. Gibson's account of vision to claim that the 'problem' that the visual computational system is supposed to 'solve' amounts to recovering the environment. In fact, the problem it solves is that of computing a retinotopic (altitude, azimuth, depth) 'map' with colored 'pixels' from the intensity values of an array of light-sensitive cones and rods.



3.  Innateness: Descartes made another important contribution to the science of language and mind. He was not the first to offer some form of poverty of the stimulus observations; perhaps Plato was. But he was arguably among the first to think about what kinds of constraints these observations imposed on the study of mind—a job done in greater detail soon afterwards by Ralph Cudworth and pursued in more detail with regard to language by Humboldt and later thinkers. Descartes noted that being innate is compatible with needing some kind of 'triggering' event or 'input' in order to develop; these are what he called 'adventitious' ideas. Further, assuming his analogy to diseases that might arise in some families (Descartes 1984–1985:303–304), a concept's development might not only require input, but might require a course of nature-based development or growth, one that we (not he) would seek to describe and explain by use of biology and the other sciences involved in 'evo-devo,' including 'third factor' contributions. In addition, he held on reasonable grounds that the nature of a concept that develops depends not on the event(s) that trigger or begin its development, but depends instead on the internal system(s) that fix its nature. And finally, he noted that not all concepts are innate; some are 'invented,' or 'made up.' The latter include the important concepts invented by the scientist when (s)he constructs a theory. His example (Descartes 1991:182–183) contrasts the innate although adventitious 'common notion' of the sun with the concept the scientist invents. The invented concepts can be taken to be products of his method, which can also be called a procedure for what Chomsky calls 'science formation,' a procedure guided only by the 'light of nature' we all have available, but only occasionally follow. The concepts that are innate ('anticipated' by the mind, as in the 17th-century Cudworth's (1996) 'proleptic') limit the domain of common sense. Scientific concepts are a different matter.

Descartes was likely unique in bringing together for the first time these observations and points. They support a nativist approach to mind along with an insistence on distinguishing science from common sense. I am not suggesting that Descartes himself or any of the others portrayed as rationalists in CL recognized all of the implications of these contributions, or that any of them prior to Chomsky and the introduction of the relevant formalism, the development of the science of biology, and the focused efforts of many working within a particular field actually managed to develop a natural science of language—or even realize that that was what work within the rationalist strategy as applied to language in the head could lead to. Notoriously, Descartes denied that there could be a natural science of mind or language at all. Rather, I am suggesting that these contributions to the natural science of language as we now understand it make sense of why CL is 'Cartesian.' They explain why Chomsky chose to give Descartes the honor of initiating a natural science methodology that leads to a remarkable degree of success in the study of language and mind. And in an indirect way, Descartes even initiated a natural science of language. He did so through a contribution of one of his protégés, Arnauld. The following section indicates how.




4.3.3  CL's Critics

I will be very selective in the individuals and critiques discussed. I do not discuss Hans Aarsleff's two attempts to denounce CL or George Lakoff's early review. I also ignore many later efforts to criticize CL (they continue to appear) that are derivative or simply misunderstand. Some reviewers did read CL carefully and attempted to address it, or at least parts of it. The two I focus on are Robin Lakoff (1969), with her review of a then-new edition of Arnauld and Lancelot's (1660/1676/1966) Grammaire générale et raisonnée, and Vivian Salmon (1969), reviewing CL itself. Karl Zimmer (1968) deserves discussion too, but space limitations prevent anything but mention. I emphasize that I have had the advantage—and as the discussion above should indicate, it is an advantage—of seeing what has become of the study of language after six decades of exercising a rationalist research strategy for the study of mind that adopts natural science methodology. I can see what even Chomsky in 1966 could not have seen—how important certain aspects of Descartes's contributions proved to be. If Chomsky could not fully appreciate them in the 1960s, Robin Lakoff and Salmon would appreciate them less. I will comment only on their success at showing that the Grammaire offers nothing new or distinctly Cartesian.

Lakoff argues that Lancelot developed some of the Grammaire's insights (among other things, concerning ellipsis and deletion) in an earlier work, and emphasizes that in a late edition of that work, Lancelot attributed his insights in turn to the 16th century's Sanctius, not Descartes. She also points to precedents for the distinction between hidden processes and surface ones. On these and like grounds, she claimed that the Grammaire was neither original nor Cartesian in its claims about language. Salmon in her historical tour de force traces the Grammaire's notions of generality/universality and hidden structure to a wide variety of pre-Grammaire non-Cartesian precedents, sometimes remote. She also mentions Campanella's claim that he was aiming towards a 'scientific' account of language. Her erudition and scope are impressive, and she makes a reasonable case against saying that generality/universality, the postulation of 'hidden' elements, or an aim towards something Campanella called 'science' with regard to language originated with the Grammaire.

On precedents to various points in the Grammaire, Lakoff and Salmon were no doubt, by reasonable standards of relevance in the field of intellectual history, correct. After all, claims of generality with regard to language could go back to Plato and Pāṇini, although admittedly with decreasing relevance—or interest. Focusing on one characteristic makes it particularly easy, especially with recent prior examples. And Chomsky himself acknowledges general and particular influences (Chomsky 1966/2009:134n67):

Apart from its Cartesian origins, the Port-Royal theory of language, with its distinction between deep and surface structure, can be traced to scholastic and renaissance grammar; in particular, to the theory of ellipsis and "ideal types" that reached its fullest development in Sanctius's Minerva (1587).



But if not these, then what does Chomsky see as the 'Cartesian origins' of Arnauld and Lancelot's Grammaire? Clearly, it was not because Descartes offered anything in the way of a science of language. Recalling Chomsky's list of Descartes's distinctive contributions above, if he contributed, he did so through the 'package' of rules that captured natural science methods, and/or through his emphasis on creative language use, and/or through his dispositional, nature-based account of innateness. The rules dominate: I suggest that Chomsky saw in the Grammaire and other Port-Royal work the initiation of natural science methods applied to a mental system. Probably the bulk of Descartes's methodological contribution to a science of language came through Arnauld, who corresponded with Descartes and was impressed by and tried to honor Descartes's rules in the Logique that he published with Nicole two years after the Grammaire. Because of Descartes's influence on Arnauld, Arnauld and Lancelot were on the way to natural science explanations. They were unlikely to be aware that they had launched the study of language on a path different from anything offered before, yielding—eventually—formalized computational/derivational sciences of internal mental 'modules.' But it is still plausible that they did. They could not do so, of course, until after Descartes made his contributions.

After Descartes, there was what amounted to a test for a natural science of mind: a candidate must show evidence of adopting a rationalist natural science research strategy and provide some indication that it can progress towards satisfying the desiderata of such research. Consider the speculative grammarian Campanella, whom Salmon mentions (1969:172–173) as among those who anticipated the Grammaire's insights on universality and who claimed to be offering a science. Being influenced by Galileo (if he was) would not be enough; Galileo's view of science had little to offer concerning the mind. To meet the test, Campanella's efforts would have to indicate that he was aiming to satisfy the desiderata of natural science and that he proceeded with the internalist and nativist assumptions required for natural sciences of the mind, assumptions underwritten by Descartes's works, not Galileo's. I say this despite the fact that Descartes himself held (on the basis of the creative aspect of language use) that the mind is out of reach of natural science. As suggested in the paragraphs that follow, he did so because he attributed the source of sentences that can be used creatively not to an internal native system, as he should, but to what he called 'reason.'8

After several decades of work on a rationalist natural science of language, it is all the easier for us now to see that the Grammaire was not, by the standards of natural science, a great success. Chomsky points out some failings in CL—among them, being insufficiently explanatory in what we can see as a natural science way (pp. 96–97). We can add a lack of explicit formalism and several other items on the list of desiderata.

8  Despite his restrictions, Descartes proceeded to offer the rudiments of an internalist and nativist computational theory of a mental system. In the Optics and elsewhere he noted that ocular convergence yielded visual depth (effectively, what the mind yields a person for use) and pointed out that the blind can approximate the relevant calculations by using variably converging sticks. For discussion, see part III of my introduction to CL (2009).



We can see all this clearly now, because we can also see the failings of Aspects and its Deep Structures—what Chomsky had achieved by 1966, and his object of comparison at the time. But the failings that Chomsky lists in CL—and these and others more obvious now—do not challenge my primary point. The Grammaire's novelty lay in beginning to conceive of the science of language as a rationalist form of natural science of mind. Linguistics could not be Cartesian until after Descartes and reasonable evidence that his methods, creativity observations (and their implications), and dispositional account of innateness affected the work of others. Lancelot and Arnauld tried to honor some of his methods, at least, and even though they had nothing worth mentioning to offer concerning creative use and innateness, the Grammaire appears to respect these considerations. Fuller recognition of innateness and creative use came with later 'Cartesian linguists.' Descartes's poverty of the stimulus observations (and related cluster of observations) are central to Cudworth's work, even though a full implementation of a dispositional account of innateness is properly dealt with only by recent work on evolution and development (evo-devo), only now beginning to be explored in detail. As for creativity, due to later Cartesian linguists' work and Chomsky's, we now can plausibly hold that the way to deal with the creativity observations is to ensure that language proper (as opposed to its use) is captured by a theory of the operations of a modular internal system that is not creative as such itself, but that can yield an indefinitely large number of understandable (though not necessarily usable) sententially-expressed 'perspectives.' As pointed out, there is no science of creative language use, only of an isolated generative system that makes creative use by people possible.

I should emphasize that taking language-based creativity into account properly has important consequences for the discussions of language and mind in the 17th century, and before and since. Since it is people who think and reason (typically using language to do so), there is no natural science of thinking and reasoning, only of some of the forms that our natures provide for doing so. The right way to look at the matter is to see language as making linguistically expressed thought possible. The reader can easily draw the implications for the unexamined assumption—exhibited in the work of the majority from Aristotle to Descartes and still now—that the right way to look for universal principles on which to base a theory of language or mind is to look to reason, thought, and logic. It is not; it is to look at the internal 'machinery' of the mind, revealed only through natural science methods. The creativity observations demand a view of language and mind that breaks with a long and misguided tradition.




4.4  Conclusion

It is clear that the natural science of language has made considerable progress since the 1950s. And it is only recently that linguistics has come to deal fully with Descartes's contributions to the study of mind and language. Nevertheless, the Grammaire apparently did start something new and Cartesian. In depicting the partial and halting beginnings of a rationalist and naturalistic science of language, Cartesian Linguistics is aptly named.





PART II

LINGUISTIC THEORY





Chapter 5

The Concept of Explanatory Adequacy

Luigi Rizzi

From the point of view that I adopt here, the fundamental empirical problem of linguistics is to explain how a person can acquire knowledge of language. (Chomsky 1977:81)

5.1  Introduction

The quote that introduces this chapter underscores the central role that the problem of language acquisition has had throughout the history of generative grammar (see also chapters 10, 11, and 12). Chomsky felt it was appropriate to start his foundational paper 'Conditions on Transformations' (Chomsky 1973), the first systematic attempt to structure a theory of Universal Grammar, by highlighting the importance of the acquisition issue. A few years earlier the central role of acquisition had been expressed in an even more fundamental, if programmatic, way: a particular, technical specification of the intuitive notion of explanation, 'explanatory adequacy,' was linked to the acquisition issue. An analysis of a linguistic phenomenon was said to meet 'explanatory adequacy' when it came with a reasonable account of how the phenomenon is acquired by the language learner (Chomsky 1964).

In this chapter, I would like to first illustrate the technical notion of 'explanatory adequacy' in the context of the other forms of empirical adequacy envisaged in the history of generative grammar. I will then discuss the relevance of arguments from the poverty of the stimulus, which support the view that the adult knowledge of language cannot be achieved via unstructured procedures of induction and analogy recording and organizing knowledge in a tabula which is rasa initially (see also chapter 10). These arguments bear on the complexity of the task that every language learner successfully accomplishes, hence they define critical cases for evaluating the explanatory adequacy of a linguistic analysis. After illustrating the impact that parametric models had on the possibility of achieving explanatory adequacy over a large scale (see chapter 14), I will then address the role that explanatory adequacy plays in the context of the Minimalist Program, and the interplay that the concept has with the further explanation 'beyond explanatory adequacy' that minimalist analysis seeks. I will conclude with a brief discussion of the connections and possible tensions arising between explanatory adequacy and simplicity, an essential ingredient of the intuitive notion of explanation.

5.2  Observational, Descriptive, and Explanatory Adequacy

Chomsky (1964) distinguished three levels of empirical adequacy that a formal linguistic analysis can meet. Given a sample of linguistic data, a corpus of sentences that the linguist takes as a starting point for his description of the language, a fragment of generative grammar can meet:

1. Observational adequacy: the relevant fragment correctly generates the sentences observed in the corpus.

2. Descriptive adequacy: the relevant fragment correctly generates the sentences in the corpus, correctly captures the linguistic intuitions of the native speaker and 'specifies the observed data … in terms of significant generalizations that express underlying regularities in the language' (Chomsky 1964:63).

3. Explanatory adequacy: the relevant fragment reaches descriptive adequacy, and is selected by Universal Grammar over other alternative fragments also consistent with the observed corpus.

The distinction between observational and descriptive adequacy is related to the distinction between weak and strong generative capacity: weak generative capacity has to do with the generation of the right sequences of words; strong generative capacity deals with the generation of sequences of words with appropriate structural descriptions (see further on). For observational adequacy, it does not matter what structural description (if any) the fragment of grammar associates to each sentence: the only important thing is that it generates the right sequence of words, hence, in traditional terminology, is adequate in terms of weak generative capacity. In order to meet descriptive adequacy, on the other hand, the fragment of grammar must also assign to the sentence the correct structural descriptions, able to capture certain intuitions of the native speaker and express certain generalizations: it must be adequate also in terms of strong generative capacity. An example will immediately clarify the distinction.



Suppose we are building a grammar of English able to generate the following sentence among others:

(1) The boy will eat the apple.

And we try to do so with a grammar involving a very restrictive version of Merge (if the reader will excuse the little anachronism in terminology and formalism), call it X–YP Merge, which only permits merging a new word with a phrase, but not two phrases already formed. This grammar will be able to generate (1) with the following structural description, obtained by successively merging each word to the phrase already formed (eat and the apple, will and eat the apple, etc.):

(2) [ the [ boy [ will [ eat [ the [ apple ]]]]]]

In terms of observational adequacy, our grammar does its job, as it generates the sequence of words in (1); but in terms of descriptive adequacy, it fails: it does not capture the fact that 'the boy' behaves as a unit with respect to a number of possible manipulations (movement, deletion, etc.), is interpreted as a unit by both interfaces (on the meaning side, it refers to an argument taking part in the event, and on the sound side it is treated as a unit in the assignment of the intonational contour), is perceived as an unbreakable unit, as shown by classical 'click' experiments (Fodor, Bever, and Garrett 1974) in experimental psycholinguistics, etc. In order to meet all these empirical constraints, we need a fragment of grammar capable of merging complete phrases (call it XP–YP Merge), hence able to assign (1) the correct representation (3):

(3) [ [the [ boy]] [ will [ eat [ the [ apple ]]]]]

The phrase will eat the apple is generated and placed in a temporary buffer; the phrase the boy is generated, and then the two phrases are merged together by XP–YP Merge. So, the grammar endowed with XP–YP Merge is descriptively adequate, while the grammar endowed uniquely with X–YP Merge is not; on the other hand both are observationally adequate for the minute fragment of English that we are considering. Notice also that the two grammars may be equivalent in weak generative capacity, as they may generate the same sequence of words, but they are not equivalent in strong generative capacity, because the XP–YP Merge grammar can generate structural description (3), which the X–YP Merge grammar cannot.

Another important conceptual dichotomy that the distinction between observational and descriptive adequacy relates to is the dichotomy between E(xternal)-language and I(nternal)-language (see also chapter 3, section 3.5). An observationally adequate grammar describes an object of the external world, an E-language (or a fragment thereof): a sentence or a corpus of sentences. A descriptively adequate grammar describes a representational system internal to the speaker, an I-language: a generative function capable of generating an unbounded set of sentences and structural descriptions capturing basic generalizations of the language, for example, that the sequence determiner–noun patterns as a unit with respect to innumerable formal manipulations in English. So, the notion of observational adequacy is of very limited relevance, if any, for the study of language as a cognitive capacity in a context of cognitive science: the critical notion is descriptive adequacy, which concerns the empirical adequacy of a theory of the speaker-internal entity which is the object of inquiry: the representational system that every speaker possesses, and which allows him to produce and understand new sentences over an unbounded domain.
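The contrast between weak and strong generative capacity lends itself to a mechanical check. The following toy sketch is an expository assumption, not Rizzi's formalism: nested lists stand in for binary-branching constituents, and the function names (x_yp_merge, yield_of) are hypothetical.

```python
# Two 'grammars' that are weakly equivalent (same word string) but not
# strongly equivalent (different structural descriptions). Nested lists
# encode binary-branching constituents; leaves are words.

def x_yp_merge(words):
    """X-YP Merge only: each new word merges with the phrase built so far,
    yielding the purely right-branching structure in (2)."""
    tree = words[-1]
    for w in reversed(words[:-1]):
        tree = [w, tree]
    return tree

def yield_of(tree):
    """The word string a structure generates (weak generative capacity)."""
    if isinstance(tree, str):
        return [tree]
    return [w for sub in tree for w in yield_of(sub)]

words = ["the", "boy", "will", "eat", "the", "apple"]

# (2): the structure the X-YP Merge grammar assigns to (1)
flat = x_yp_merge(words)

# (3): XP-YP Merge also allows merging two completed phrases
subject = ["the", "boy"]
correct = [subject, ["will", ["eat", ["the", "apple"]]]]

print(yield_of(flat) == yield_of(correct))   # True: same word sequence
print(flat == correct)                        # False: different structures
print(subject in correct)                     # True: 'the boy' is a unit in (3)
```

The final checks reproduce the point in the text: the two grammars cannot be told apart by the word strings they generate, only by the structural descriptions they assign, and only the second makes 'the boy' available as a unit for movement, deletion, and interface interpretation.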

5.3  Explanatory Adequacy

Of equally important cognitive significance is the distinction between descriptive and explanatory adequacy. While observational and descriptive adequacy express levels of empirical adequacy that a particular grammar can meet, explanatory adequacy has to do with the relation between a particular grammar and Universal Grammar (UG), the general system that limits the class of particular grammars. A particular descriptively adequate analysis meets explanatory adequacy when UG provides general principled reasons for choosing it over imaginable alternatives. Here too, an example will immediately illustrate the point. Given any Subject–Verb–Object structure, a priori three structural organizations can be envisaged:

(4) a. [S [V O]]
    b. [[S V] O]
    c. [S V O]

(4a) assumes a V–O constituent, which is merged with the subject; (4b) assumes an S–V constituent which is merged with the object; and (4c) assumes a flat, ternary structure. Every first-year student in linguistics knows that the 'Aristotelian' structure (4a), expressing the subject–predicate relation, is the correct representation, as is shown by innumerable kinds of evidence. For instance, the subject is systematically higher in the tree than the object: in technical terms, the subject asymmetrically c-commands the object.1

1  I adopt here Chomsky's (2000b) definition of c-command:

(i) A c-commands B iff B is dominated by the sister node of A.

For example, in the following:

(ii) … [A [X … B … ] ]

B is dominated by X, which is the sister node of A. So, A c-commands B, according to (i). The notion was originally introduced by Reinhart (1976), where the formal definition was slightly different.



An anaphor in object position can be bound by the subject, but not vice versa, and only (4a) expresses the empirically correct c-command relations:

(5) a. John saw himself.
    b. *Himself saw John.

A liberal enough UG (like most theories of phrase structure based on rewriting rules), consistent with the three representations in (4), would not offer any principled reason for selecting (4a), hence would not immediately meet explanatory adequacy here. A more restrictive theory of UG, restricted to generating binary branching structures with specifiers preceding complements (such as Kayne's 1994 antisymmetric approach), would only be consistent with (4a), hence it would directly meet explanatory adequacy here.

The connection with acquisition becomes clear at this point. Consider the situation of a language learner who must acquire the phrase structure of his language, say English. If the learner is equipped with an 'anything-goes' UG, consistent a priori with (4a), (4b), and (4c), the choice cannot be made on principled grounds, and the learner must choose on the basis of the evidence available, the primary linguistic data. For instance, the fact of hearing (5a) already leads the learner to exclude (4b), which would be inconsistent with it (assuming here, for the sake of the argument, that the learner 'knows' independently that the antecedent must c-command the anaphor). But (5a) is consistent with both (4a) and (4c), hence excluding the latter is more tricky, particularly if the learner has no direct access to negative evidence (such as the information '(5b) is ungrammatical'). One cannot exclude, in this particular case, that other kinds of positive evidence may lead the child to choose the correct structure; but the fact is clear that the permissive UG does not offer any guidance for the choice: in the particular toy situation that we have envisaged, an analysis based on a permissive UG does not reach explanatory adequacy (this is particularly clear in 'poverty of stimulus' situations, on which see section 5.4).

In contrast, consider a restrictive UG endowed with binary branching constraints and ordering constraints of the kind 'specifier precedes complement.' A language learner endowed with such a restrictive UG automatically selects (4a) as the representation for transitive sentences, with correct consequences for the binding facts in (5) and for innumerable other cases. An analysis of such facts based on the restrictive UG thus meets explanatory adequacy. The language learner chooses representation (4a) because the system he is endowed with offers him no other choice.

Explanatory adequacy is thus intimately connected to the 'fundamental empirical problem' of language acquisition. Humans acquire a natural language early in life, without specific instruction, apparently in a non-intentional manner, with limited individual variation in spite of the fragmentary and individually variable courses of experience that ground individual knowledge of language. A restrictive theory of UG can thus offer a plausible account of the rapidity and relative uniformity of language acquisition: children endowed with a restrictive UG can converge quickly and efficiently on the adult grammar because they have only a few options to choose from; the convergence can thus take place in the empirical conditions (of time and limited exposure to the data) which are observed in the study of actual language development (see also chapter 12).
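For concreteness, the definition of c-command quoted in note 1 can be computed directly over a toy encoding of (4a). The tuple representation and function names below are illustrative assumptions only.

```python
# A sketch of c-command (note 1: A c-commands B iff B is dominated by
# the sister node of A), applied to the binary structure (4a).
# Nested tuples encode binary-branching nodes; leaves are words.

def dominates(node, target):
    """A node dominates target if target occurs properly inside it."""
    if not isinstance(node, tuple):
        return False
    return any(child == target or dominates(child, target) for child in node)

def c_commands(a, b, root):
    """True if, somewhere in root, a sister of a dominates (or is) b."""
    if not isinstance(root, tuple):
        return False
    if a in root:  # a is an immediate daughter of this node
        if any(s != a and (s == b or dominates(s, b)) for s in root):
            return True
    return any(c_commands(a, b, child) for child in root)

# (4a): [S [V O]] -- the subject merges with the verb-object constituent.
subject, verb, anaphor = "John", "saw", "himself"
clause_4a = (subject, (verb, anaphor))

print(c_commands(subject, anaphor, clause_4a))   # True
print(c_commands(anaphor, subject, clause_4a))   # False
```

The asymmetry returned by the two checks is exactly what licenses the binding in (5a) while excluding (5b): in (4a), but not in the flat ternary structure (4c), the subject asymmetrically c-commands the object position.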

5.4  On Poverty of Stimulus

The arguments for a restrictive UG are made particularly cogent in special situations, some of which have been illustrated and discussed in detail in the literature, the so-called 'poverty of stimulus' situations. In such situations the 'primary linguistic data,' the data available to the language learner, would be consistent with the postulation of a number of grammatical mechanisms over and above the ones that adult speakers seem to unerringly converge on. In such cases, it appears to be legitimate to attribute the choice of a particular mechanism to an internal pressure of the learning system, rather than to a data-driven induction (see chapters 10 and 11). In order to acquire some concrete argumentative force, such arguments must typically envisage and compare two hypotheses, say A and B, both with some initial plausibility and appeal, and both plausibly consistent with the data available to the child: then, if it can be shown that B is always part of the adult linguistic knowledge, while A is systematically discarded, it is legitimate to conclude that the selection of B by the learner must be based on some general principle that is part of the initial endowment of the mind.

One example that has very often been taken as an effective illustration of this situation, both in presentations of the issue and in critical appraisals, is the so-called 'structure dependency' of rules. In line with modern linguistic analysis, I would like to illustrate it in terms of the acquisition of particular constraints on movement, rather than in terms of the form of a particular construction-specific rule (as in the original discussion; see Chomsky 1968/1972/2006 for the original formulation and much recent discussion, in particular Crain et al. 2010, and Berwick et al. 2011; with responses to critiques of the argument in Lewis and Elman 2001, Reali and Christiansen 2005, and Perfors, Tenenbaum, and Regier 2006). Consider familiar cases of subject–auxiliary inversion in English, characterizing questions and a few other constructions:

(6) a. The man is sick.
    b. Is the man sick?

Informally, the relevant movement operation takes an occurrence of the copula (or of other functional verbs, auxiliaries, modals, etc.) and moves it to the front. Which occurrence of the copula is selected? In simple cases like (6) there is no choice to be made, but in more complex cases, some locality constraint is needed for choosing between alternatives. If two potential candidates are involved, the one that is closer to the front is selected, i.e., is_p in (7b), and the operation cannot take a 'distant' occurrence, such as is_q in (7c); put slightly differently, a distant occurrence of a certain kind of element cannot 'jump over' an intervening occurrence, and the closer occurrence always wins:

(7) a. The man is_p aware that he is_q sick.
    b. Is_p the man ___p aware that he is_q sick?
    c. *Is_q the man is_p aware that he ___q sick?

The locality effect illustrated by (7) can be seen as a special case of a very general and natural intervention effect which can be captured by a formal principle roughly expressible in the following terms:

(8) In a configuration like:
    … X … Z … Y …
    X cannot attract Y if there is a closer potential attractee Z that intervenes between X and Y.

(This formulation is something of a synthesis between Relativized Minimality, Rizzi 1990, and the Minimal Link Condition, Chomsky 1995b, etc.) So, (7c) is ruled out because in the source structure (7a) is_p (Z) intervenes between the clause-initial complementizer, the attractor (X), and is_q (Y):2

(9) C    The man is_p    aware that he is_q    sick
    X             Z                     Y
    (*: the attraction of Y to X across the intervening Z)

How is 'intervention' calculated here? A very simple idea that immediately comes to mind is that the calculation is performed linearly: so, is_p linearly intervenes in the sequence of words between C and is_q, so that the latter cannot be attracted to C:

(10) Linear intervention: Z intervenes between X and Y when X precedes Z and Z precedes Y in the sequence of words.

2  This analysis can be immediately generalized to cases not involving the copula: if in general C attracts T, the position expressing tense, the attraction must involve the closest occurrence of T; in other words, an intervening occurrence of T, whatever its morpholexical realization (as a copula, auxiliary, modal, or as simple verbal inflection), determines an intervention effect. It should be noticed here that in at least certain versions of Phase Theory (Chomsky 2001a), (9) is independently excluded by Phase Impenetrability because is_q is in a lower phase.



This linear definition of intervention is very simple and seems to work for cases like (7), but clearly it does not express the 'right' concept of intervention for natural language syntax in general, as is shown by examples like the following (and innumerable others):

(11) a. The man who is_p here is_q sick.
     b. *Is_p the man who ___p here is_q sick?
     c. Is_q the man who is_p here ___q sick?

Here is_p is closer to the front than is_q linearly (it is separated from the beginning of the sentence by three words vs. five), and still the 'distant' occurrence is_q is selected, as in (11c), and the linearly intervening is_p does not yield any intervention effect. Why is that so? It appears that the notion of 'intervention' relevant for natural language syntax is hierarchical, expressed in terms of c-command, rather than linear precedence. For example:

(12) Z intervenes between X and Y when X c-commands Z and Z c-commands Y.

If we think of the relevant structural representations, it is clear that is_p intervenes hierarchically between C and is_q in (7c), but not in (11c), where it is embedded within the relative clause, hence it does not c-command is_q:3

(7′) c. *Is_q [ the man [ is_p aware [ that he ___q sick ]]]?

(11′) c. Is_q [ [the man [who is_p here]] ___q sick ]?

3  The hierarchical definition (12) accounts for the lack of intervention effect in (11c), but it does not say anything about the impossibility of (11b). In fact, in this case neither occurrence of is c-commands the other; therefore, no intervention effect is expected. The conclusion that (11b) is not ruled out by intervention is in fact desirable, because it is excluded independently; the relative clause is an island, from which nothing can be extracted (Ross 1967 and much subsequent work). For instance, the locative here cannot be replaced by its interrogative counterpart where and questioned (we illustrate the phenomenon with an indirect question to avoid complications with inversion):

(i) a. The man who is here is sick.
    b. *(I wonder) where [the man [who is ___]] is sick.

So, the non-extractability of is in (11b) follows from the island character of the construction (Berwick and Chomsky 2011), and there is no need to appeal to intervention here (and no possibility, if intervention is hierarchically defined, as in (12)). On the other hand, in (7) the embedded complement clause is not a general island, as a wh-phrase is extractable from it:

(ii) How sick is the man aware that he is ___?

So intervention is relevant to rule out (7c) (but see note 2, this chapter, for a possible independent reason to rule out (7c)).
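The divergence between the linear definition (10) and the hierarchical definition (12) can also be checked mechanically. In the sketch below, the tuple encodings of (7a) and (11a) are deliberately simplified, so it should be read as an illustration of the two definitions, not as a full syntactic analysis.

```python
# Linear intervention (10) vs. hierarchical intervention (12).
# Nested tuples encode simplified constituent structure; the two
# occurrences of the copula are distinguished as 'is_p' and 'is_q'.

def dominates(node, target):
    if not isinstance(node, tuple):
        return False
    return any(child == target or dominates(child, target) for child in node)

def c_commands(a, b, root):
    if not isinstance(root, tuple):
        return False
    if a in root and any(s != a and (s == b or dominates(s, b)) for s in root):
        return True
    return any(c_commands(a, b, child) for child in root)

def linear_intervenes(z, y, words):
    """(10): Z intervenes iff X precedes Z and Z precedes Y in the string.
    The attractor X is the clause-initial C, so it precedes everything."""
    return words.index(z) < words.index(y)

# (7a): The man is_p aware that he is_q sick
words_7 = ["the", "man", "is_p", "aware", "that", "he", "is_q", "sick"]
tree_7 = (("the", "man"), ("is_p", ("aware", ("that", ("he", ("is_q", "sick"))))))

# (11a): The man who is_p here is_q sick
words_11 = ["the", "man", "who", "is_p", "here", "is_q", "sick"]
tree_11 = ((("the", "man"), ("who", ("is_p", "here"))), ("is_q", "sick"))

# In (7), both definitions make is_p an intervener for attracting is_q:
print(linear_intervenes("is_p", "is_q", words_7))    # True
print(c_commands("is_p", "is_q", tree_7))            # True

# In (11), they part ways: is_p precedes is_q, but, buried inside the
# relative clause, it c-commands nothing outside it:
print(linear_intervenes("is_p", "is_q", words_11))   # True (wrong prediction)
print(c_commands("is_p", "is_q", tree_11))           # False (correct)
```

Only the hierarchical check matches the adult judgments on (7c) and (11c), which is the crux of the poverty-of-stimulus argument developed next.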



The Concept of Explanatory Adequacy    105 Here comes the acquisition question. Adults unerringly opt for hierarchical intervention, as everybody has crystal-​clear intuitions on such contrasts as (7c)–​(11c) and the like. Moreover, as Crain and Nakayama (1987) showed, children at the age of three already sharply make such distinctions and unerringly opt for hierarchical formulations of rules and constraints. So, how does the child come to know that syntactic properties like intervention must be calculated hierarchically, not linearly? Most of the data available to the learner, that is, simple alternations like (6), are consistent with both linear and hierarchical analyses, and linear notions are immediately given by the input, while hierarchical notions require the construction of complex and abstract structures. So, why does the language learner unerringly opt for the complex hierarchical notions, the evidence for which is absent or at best very marginal in the primary data, and discard the much simpler linear notions? The natural conclusion seems to be to attribute the choice of hierarchical notions to the inner structure of the mind, rather than to experience. Opting for hierarchical computations is not a property that the learner must figure out inductively from the primary data, but is an inherent necessity of his cognitive capacities for language: language, as a mental computational capacity, is designed in terms of mechanisms (such as Merge) producing hierarchical structures, so what the mind sees and uses in linguistic computations is hierarchical information. There is nothing to learn here: other notions, no matter how simple and plausible they may look, are discarded a priori by the language learner for linguistic computations. The structure dependency of movement rules is a classical case for the illustration of the poverty of stimulus arguments, but such cases are ubiquitous (see again c­ hapter 10). To introduce a little variation on a classical theme, I will consider a second case, having to do with the interface between syntax and semantics: the constraints on coreference. Every speaker of English has intuitive knowledge of the fact that a pronoun and a noun phrase can corefer (refer to the same individual(s)) in some structural environments but not in others: for instance, in (13) and (15), but not in (14) and (16) (following standard practice, I express coreference by assigning the same index to the expressions to be interpreted as coreferential): (13)

(13) Johni thinks that hei will win the race.
(14) *Hei thinks that Johni will win the race.
(15) Johni's opinion of hisi father is surprising.
(16) *Hisi opinion of Johni's father is surprising.

(14) and (16) are of course possible if the pronominal forms he and his refer to some other individual, Peter for instance, but coreference with John is barred (this is what the asterisk expresses in these cases). Clearly, speakers of English tacitly possess some procedure for the interpretation of pronouns that they can efficiently use to compute the network of possible coreference relations in new sentences. Again, a very simple possibility would refer to linear order (we define the generalization in negative terms, following Lasnik's insight, on which see further on in this section):

(17) Coreference is impossible when the pronoun precedes the NP in the linear order (and possible otherwise).

This linear principle is quite plausible and reasonable, it would seem: first the NP must introduce a referent, and then a pronoun can refer to it. Nevertheless, there are good reasons, in this case too, to discard a linear characterization in favor of a hierarchical one. The linear formulation is falsified by innumerable examples of the following types, in which the pronominal element precedes the noun phrase, and still coreference is fine:

(18) When hei wins, Johni is very happy.
(19) All the people who know himi well say that Johni can win the race.
(20) Hisi father thinks that Johni can win the race.

C-command plays a critical role here as well; in fact, the relation was introduced by Tanya Reinhart (see Reinhart 1976) in connection with the issue of referential dependencies; the relevant empirical generalization was identified by Lasnik (1976):

(21) Coreference is impossible when the pronoun c-commands the NP (and possible otherwise).

To illustrate this effect, in the following examples I have indicated by a pair of brackets the domain of c-command (or c-domain) of the pronoun:

(13′) John thinks that [he will win the race].
(14′) *[He thinks that John will win the race].
(15′) John's opinion of [his father] is surprising.
(16′) *[His opinion of John's father] is surprising.
(18′) When [he wins], John is very happy.
(19′) All the people who [know him well] say that John can win the race.

(20′) [His father] thinks that John can win the race.

What singles out (14′) and (16′) is that only in these structures does John fall within the c-domain of the pronoun, regardless of linear ordering, a generalization that Lasnik's statement (21) correctly captures.
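Because (21) is stated over configuration alone, it lends itself to a mechanical check. The following minimal sketch, an illustration rather than anything proposed in this chapter, computes a pronoun's c-domain over toy trees encoded as nested lists; the encoding and helper names are assumptions made here for concreteness:

```python
# Toy check of Lasnik's generalization (21): coreference is barred when
# the pronoun c-commands the NP. Trees are nested lists whose first
# element is a label; this encoding is an illustrative assumption.

def all_nodes(tree):
    """Every subtree and terminal dominated by (and including) tree."""
    if not isinstance(tree, list):
        return [tree]
    _, *children = tree
    return [tree] + [n for c in children for n in all_nodes(c)]

def c_domain(tree, target):
    """The c-domain of target: its sisters and everything they dominate."""
    _, *children = tree
    for i, child in enumerate(children):
        if child == target:
            sisters = children[:i] + children[i + 1:]
            return [n for s in sisters for n in all_nodes(s)]
    return [n for child in children if isinstance(child, list)
            for n in c_domain(child, target)]

def coreference_possible(tree, pronoun, np):
    """Generalization (21): impossible if the pronoun c-commands the NP."""
    return np not in c_domain(tree, pronoun)

# (13): John thinks that [he will win the race] -- coreference possible.
ex13 = ['TP', 'John', ['VP', 'thinks',
        ['CP', 'that', ['TP', 'he', ['VP', 'will win the race']]]]]
# (14): *[He thinks that John will win the race] -- 'he' c-commands 'John'.
ex14 = ['TP', 'he', ['VP', 'thinks',
        ['CP', 'that', ['TP', 'John', ['VP', 'will win the race']]]]]

print(coreference_possible(ex13, 'he', 'John'))  # True
print(coreference_possible(ex14, 'he', 'John'))  # False
```

A checker of this kind reproduces the judgments above irrespective of linear order, which is precisely the divide between (17) and (21).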



Let us now go back to the acquisition issue. The first question to ask is rather radical: why should the learner postulate a structurally-determined ban on coreference at all? Consider the evidence that learners have access to: they hear utterances containing pronouns corresponding to the sentence types in (13)–(20), and undoubtedly to innumerable others, for instance, 'John thinks he is sick,' sometimes with intended coreference, sometimes not (I am assuming that enough contextual information is available to the child to decide if coreference is intended or not in a number of actual utterances available to him), and that's all. As no negative information is directly provided to the learner (information of the type 'coreference is barred in this particular environment'), the positive evidence available to the child does not seem to offer any reason to postulate any ban on coreference. So, if induction and generalization from positive evidence were the only learning mechanism here, why wouldn't the language learner simply assume that coreference, patently a free option in some cases, is a free option in all cases? The very fact that every adult speaker unerringly postulates a ban on coreference in particular structural environments is more likely to stem from some pressure internal to the learning system than from an induction from experience.

Moreover, such an internal pressure must be quite specific to enforce a ban of a very particular structural form; the same question arises as in the case of movement rules: why does the learner always choose the hierarchical constraint (21) over the plausible and simpler linear constraint (17)? Again, this could hardly be determined on an inductive basis from the available evidence. So, the selection of (21) over (17) must be the consequence of some specific pressure internal to the learning system. Presumably, the inherently hierarchical nature of language is not just a property of the mental device generating structures: it is a property that pervades the mental representations of linguistic objects, and also deeply affects the interfaces with the interpretive systems. This conclusion is forcefully confirmed by experimental results such as those reported in Crain (1991), which show that the child is sensitive to the hierarchical non-coreference effect as soon as the relevant experimentation can be conducted, i.e., around the age of three, or even before (see also Guasti and Chierchia 1999/2000 for relevant evidence).

It should now be clear what conceptual link exists between the notion of explanatory adequacy and the arguments from the poverty of the stimulus. An analysis of the adult competence of the speaker of a given language reaches explanatory adequacy when the theory of UG is able to explain how that aspect is acquired by the language learner on the basis of the evidence available to him. Poverty of stimulus situations are cases in which the evidence available to the learner is insufficient to choose, over imaginable alternatives of some plausibility, one particular feature of the adult competence, for instance that locality constraints on movement and referential dependencies are computed on the basis of hierarchical, not linear, principles. In such cases, explanatory adequacy can be reached by assuming that the relevant feature follows from some pressure internal to the learning system, i.e., explanatory adequacy can be achieved by properly structuring the theory of UG.




5.5  Explanatory Adequacy, Invariance, and Variation

Can the level of explanatory adequacy be reached over a large scale? Natural languages involve invariant properties (linguistic universals) and properties that vary from language to language. The program of meeting explanatory adequacy in a systematic way thus requires plausible mechanisms for the acquisition of invariant and variable properties. If the knowledge of invariant properties may stem from the internal structure of UG, the acquisition of cross-linguistically variable properties inevitably involves the role of experience: the learner must figure out all sorts of language-specific properties, from the set of phonetic features with distinctive value, to the association of word forms with particular concepts, to morphological paradigms, properties of word order, etc. So, addressing the issue of explanatory adequacy over a large scale requires a full-fledged theory of language invariance and variation. The theory of principles and parameters introduced such a comprehensive framework.

Pre-parametric models, such as the Extended Standard Theory (EST) of the 1970s, were based on the concept of particular grammars, conceived of as systems of language-specific, construction-specific rules. A particular grammar, say the grammar of English or of Japanese, would be a system of rules specific to the particular language and giving a precise characterization of the language-particular way in which various linguistic constructions were expressed. So, the grammar of English would have phrase structure rules for the NP, the VP, and so forth, and transformational rules for the interrogative, relative, passive construction, etc. Universal Grammar was conceived of as a grammatical metatheory, expressing the format for phrase structure and transformational rules, hence providing the basic ingredients from which the language-particular rules could be built. UG would also provide some general constraints on rule application expressing island properties. Here is an illustrative example, taken from Baker (1978:102), an influential introductory textbook. English would have a transformational rule for passive with the following shape:

(22) Passive
     Structural description:  NP – Aux – V    – NP – X
                              1    2     3      4    5
     Structural change:       4    2     be+en+3     0    5+by+1

The rule would apply to a tree meeting the structural description of the rule (essentially, the tree corresponding to a transitive sentence), and would introduce a structural change consisting of the movement of the object to subject position, the demotion of the subject to a PP headed by by, and the insertion of the passive morphology.



The grammar would include rules expressed in a similar format for raising, question and relative clause formation, topicalization, and so forth.

As I stressed at the outset, language acquisition was a crucial issue at the time of the EST model, and in fact this framework went with a theory of acquisition, at least programmatically. The assumption was that the language learner, equipped with the format of rules provided by UG, would figure out inductively, on the basis of experience, the particular rule system that constitutes the grammar of the language he is exposed to. The learner would thus implicitly act as a 'little linguist,' formulating hypotheses within the class of formal options permitted by UG, and testing them on the empirical ground provided by the primary data. Among many other problems, this approach had to face a major obstacle concerning acquisition: no operative concept of 'rule induction' was introduced, so it remained unclear how the language learner could arrive at figuring out rules of the level of complexity of (22). Given this difficulty, the program of achieving explanatory adequacy over a large scale remained a distant goal.

Things changed rather dramatically, in this and other respects, with the introduction of the Principles and Parameters model around the late 1970s. According to the new model (Chomsky 1981a), Universal Grammar has a much more substantive role than just functioning as a grammatical metatheory, and directly provides the basic scaffolding of every particular grammar. UG is a system of universal principles, and specifies a finite number of parameters, binary choice points expressing different options that individual languages may take. The ambitious program was to reduce all morphosyntactic variation to such primitive elements, the values that parameters can take at each choice point. The program turned out to have an exceptional heuristic value for comparative studies, and comparative syntax flourished in the following decade and afterward (see chapters 14 and 16 for further discussion and illustration).

What is more important in the context of this chapter is that the Principles and Parameters approach provided the tools for meeting explanatory adequacy on a systematic basis. Universal properties could be connected to the structure of principles of Universal Grammar, and the fixation of binary choice points on the basis of experience offered a promising device to account for the acquisition of variable properties. The fixation of a substantial number of parameters is by no means a trivial task because of the complex interactions that may arise, possible ambiguities of the primary data in relation to distinct patterns of fixation, and so forth, a complexity which has been highlighted by the computational modeling of parameter fixation (e.g., Gibson and Wexler 1994; see chapter 11). Nevertheless, the very fact that precise computational models could be built on that basis shows that the parametric approach to variation offered an operative device to concretely address the issue of the acquisition of grammatical systems. Not surprisingly, the study of the development of syntax in the child received a major impulse from the introduction of parametric models: from the viewpoint of such models, the fundamental empirical question of the study of language development was to chart the temporal course of the process of parameter fixation, both experimentally and



through the naturalistic study of production corpora (see Hyams' 1986 seminal proposal along these lines and, among many recent references, Rizzi 2006; Thornton 2008 for discussion of some corpus-based results, and Gervain et al. 2008; Franck et al. 2013 for experimental research bearing on the fixation of fundamental word order parameters).
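To make the idea of computational models of parameter fixation concrete, here is a minimal sketch in the spirit of Gibson and Wexler's (1994) Triggering Learning Algorithm. The two-parameter space and the pattern generator are invented for illustration (their actual model uses a three-parameter word-order space): on an input the current grammar cannot parse, the learner flips a single randomly chosen parameter and keeps the change only if the input now parses.

```python
import random

# Toy illustration of trigger-based parameter fixation, in the spirit of
# Gibson and Wexler's (1994) Triggering Learning Algorithm. The
# two-parameter space below is an invented assumption for illustration.

def language(params):
    """Word-order patterns a grammar licenses (one transitive and one
    intransitive clause type).
    params[0]: 0 = VO, 1 = OV; params[1]: 0 = S-initial, 1 = S-final."""
    vp = ('V', 'O') if params[0] == 0 else ('O', 'V')
    if params[1] == 0:
        return {('S',) + vp, ('S', 'V')}
    return {vp + ('S',), ('V', 'S')}

def tla(target, steps=200, seed=1):
    rng = random.Random(seed)
    hypothesis = [rng.randint(0, 1), rng.randint(0, 1)]
    data = sorted(language(target))
    for _ in range(steps):
        sentence = rng.choice(data)
        if sentence in language(hypothesis):
            continue                        # input parses: no change
        trial = list(hypothesis)
        trial[rng.randrange(2)] ^= 1        # Single Value: flip one parameter
        if sentence in language(trial):     # Greediness: keep only if it helps
            hypothesis = trial
    return hypothesis

print(tla(target=[1, 0]))  # typically converges to the OV, S-initial grammar
```

Gibson and Wexler showed that greedy, single-value learners of this kind can be trapped in local maxima in some parameter spaces, one of the complexities of parameter fixation alluded to above.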

5.6  Explanatory Adequacy and Further Explanation

A properly structured UG can provide a realistic account of how a particular language is acquired and hence meet 'explanatory adequacy' in the technical sense discussed in this chapter. But then the further question arises of what explains the structure of UG. Why are linguistic computations universally organized the way they are and not in some other imaginable way? Here the question of explanation is raised at a further level: the explanandum is not the particular grammar that the adult speaker has acquired, but the structure of UG, the nature and properties of the human language faculty. The Principles and Parameters framework made it possible to achieve explanatory adequacy over a large range of phenomena; the Minimalist program has tried to ask the further explanatory question, going 'beyond explanatory adequacy,' in the technical sense (Chomsky 2004).

What could be the 'further explanation' of the properties of UG? If UG is a biological entity, a kind of mental organ, one explanatory dimension of its properties must be evolutionary: much as it is reasonable to investigate the evolutionary history of the shape and structure of the liver, or of the eye, so it would make sense to trace back the structure of UG to its evolutionary roots. The study of the human language faculty in a biological setting can't avoid coming to terms with the evolutionary aspects. The strategy that the Minimalist Program has adopted to pursue explanation 'beyond explanatory adequacy' is to isolate the different factors that enter into the growth of language in the individual. Three factors are identified in Chomsky (2005) and subsequent work, as follows:

(23) 1. Task-specific genetic endowment, which 'interprets part of the environment as linguistic experience … and which determines the general course of the development of the language faculty' (Chomsky 2005:6).
     2. Experience.
     3. Principles not specific to the language faculty.

The last category may include very diverse entities: principles of data analysis and organization proper to humans, higher mammals, or also to other forms of animal intelligence; principles of optimal computation, which may hold at different levels of generality for complex computational systems, within or outside the realm of cognitive



systems (see chapter 6). So, the question about explanation 'beyond explanatory adequacy' can be asked of each factor.

As for the first factor, the further explanation is of an evolutionary kind, so it makes sense to try to identify the evolutionary events that gave rise, presumably quite recently in human phylogeny, to the particular neural circuitry that makes human language possible with its unique characteristics among animal communication systems. As for the second factor, the data that the learner has access to, this has to do with the external world and its historical contingencies, wars, migrations, sociocultural stratifications, and so forth. Hence, it is not directly part of the study of cognitive capacities.4

As for the third factor, it potentially breaks down into a vast array of subfactors that may be quite different in nature, so that the 'further explanation' may take very different shapes. Principles of data analysis not specific to language and not specific to humans, but presumably shared with different forms of animal intelligence, evoke an evolutionary explanation, presumably over a very long temporal course. As for principles of optimal computation (principles of economy, locality, conservation), they may also demand an evolutionary explanation, to the extent to which they are specific to computational systems implemented in the biological world; or else, they may instantiate 'natural laws' operative in computational systems in general, even outside the sphere of biology. Needless to say, the very interesting questions raised by the attempt to distinguish first and third factors are extremely hard to state in scientifically satisfactory terms, hence they are at the border of current scientific inquiry on the topic (see again chapter 6).

The attempt to go beyond explanatory adequacy is a program of great intellectual fascination, and it is a merit of minimalism to have focused attention on such an ambitious goal for linguistic research. But the goal of achieving explanatory adequacy does not dissolve into finer goals and research questions in minimalist analysis. Rather, having an analytic apparatus capable of reaching explanatory adequacy over a large scale remains a necessary point of departure for asking minimalist questions. And poverty of stimulus arguments continue to underscore the role of inner constraints of the human mind, now broken down into different factors, in determining a richly structured body of linguistic knowledge (see Berwick and Chomsky 2011).5

4  Of course, the process through which the 'input' can become 'intake,' and the external data are integrated as 'experience,' is a cognitively relevant issue, presumably involving factors of the first and third kind.

5  A terminological note may be in order here. Explanatory adequacy evokes UG, but how does the classical concept of UG relate to the factors mentioned in (23)? The decision of how the term 'UG' cuts the cake in (23) may seem to be an arbitrary terminological matter, but terminological choices may not be entirely neutral and inconsequential in the context of a rich intellectual history. One possible terminological choice is to restrict the term UG to solely refer to the first factor, species-specific, task-specific genetic endowment (this seems to be the choice made by mainstream minimalism; see also chapter 6). Another possible terminological choice is to go along the externalism/internalism dichotomy and keep the term UG for the initial state of the cognitive system including both first and third factors, which are both internal and constitutive factors of the mental computational system for language; this system can be coherently defined, along the internal/external dichotomy, in opposition to the external data, which the internal system takes as input and operates on. It seems to me that this second terminological choice still permits us to construe the concept of UG in a way consistent with the intellectual history of generative grammar, i.e., as 'part of the innate schematism of mind that is applied to the data of experience' and that 'might reasonably be attributed to the organism itself as its contribution to the task of the acquisition of knowledge' (Chomsky 1971). The two terminological choices correspond, in essence, to what Hauser, Chomsky, and Fitch (2002) have called 'faculty of language in a narrow/broad sense,' the first identifying task-specific genetic endowment, the latter also including an array of non-task-specific properties and capacities, arising from a long evolutionary history and recruited for language at some point; analogously, one could think of a narrow and a broad characterization of UG, the latter including both task-specific and domain-general properties and principles that are operative in language, understood as a cognitive capacity. As the term UG has come, for better or worse, to be indissolubly linked to the core of the program of generative grammar, I think it is legitimate and desirable to use the term 'UG in a broad sense' (perhaps alongside the term 'UG in a narrow sense'), so that much important research in the cognitive study of language won't be improperly perceived as falling outside the program of generative grammar.




5.7  Explanatory Adequacy and Simplicity

In a certain sense, the intuitive notion of explanation is closely linked to simplicity. A better explanation of a certain domain of empirical data is a simpler explanation; we choose between two hypotheses or two models of comparable empirical adequacy on simplicity grounds, etc. Also the technical notion of explanatory adequacy is traditionally linked to simplicity: in the pre-parametric EST models, it was assumed that the language learner chooses between different fragments of grammars compatible with the primary data through an 'evaluation metric' choosing the simplest fragment (where simplicity can be computed in terms of number of symbols, of rules, etc.).

Nevertheless, explanatory adequacy is a level of empirical adequacy, not an a priori criterion. In certain cases, a tension may arise between the needs of explanatory adequacy and imaginable criteria of optimal simplicity. If we take 'simplicity' as corresponding to 'lack of structure,' i.e., an unstructured system is simpler than a more structured one, the tension is clear. Consider again, in this connection, the issue of the choice of the correct structural representation of the Subject Verb Object sequence (or, in fact, of any configuration of three elements, specifier, head, and complement). A restrictive theory of Merge, limited to binary applications (hence, ensuring binary branching structures) constrains the choice to (4a) and (4b), ruling out (4c). If the theory is further supplied with a version of Kayne's (1994) Linear Correspondence Axiom (or any principle stating or deriving the compulsory linear order Specifier–Head), then (4a) is the only possible choice for any transitive sentence like John loves Mary. So, a language learner equipped with binary Merge and the Linear Correspondence Axiom (or equivalent) has no choice: the 'correct' structure is enforced by the shape of his cognitive system, a desirable result from the viewpoint of the needs of explanatory adequacy. Clearly, a system not specifying the LCA (or equivalent) is simpler than a system with the LCA. And arguably, a system not putting any binary constraint on Merge, hence permitting n-ary Merge,



may be considered simpler than a system limiting Merge to binary applications. Such unstructured systems could arguably be considered simpler than more structured systems; still, they clearly put a heavier burden on the language learner, implying that the choice between (4a), (4b), and (4c), all a priori possible, must be determined through data analysis, a step in the wrong direction from the perspective of aiming at explanatory adequacy.

In some such cases the tension may be genuine; in others it may stem from the ambiguity of the concept of 'simplicity.' For instance, in our case, if we do not take 'simpler' as meaning 'involving less structure,' but as 'involving fewer computational resources,' then binary Merge, yielding systematic binary branching, may be taken as simpler than n-ary Merge, as the former involves at most two slots in operative memory, while the latter involves n slots. And perhaps partly analogous (if less straightforward) considerations can be made in connection with the LCA. So, it may well be that in some cases the apparent tension between explanatory adequacy and simplicity dissolves if we disentangle different facets of an ambiguous concept. In the following passage from 'Derivation by Phase,' Chomsky highlights the subtle border between a priori concepts like good design and simplicity, and empirical discovery: 'Even the most extreme proponents of deductive reasoning from first principles, Descartes for example, held that experiment was critically necessary to discover which of the reasonable options was instantiated in the actual world' (Chomsky 2001:1–2). The example of simplicity as oscillating between 'lack of structure' and 'computational parsimony' illustrates the point. The first interpretation is in conceptual tension with the level of empirical success that goes under the name of 'explanatory adequacy,' while the second is fully congruent with it.



Chapter 6

Third-Factor Explanations and Universal Grammar

Terje Lohndal and Juan Uriagereka

6.1 Introduction

The biolinguistic approach to generative grammar has in recent years emphasized the relevance of principles that are not specific to the Faculty of Language.1 These are taken to work together with both genetic endowment and experience to determine relevant I-languages. Chomsky (2005) labels these non-language-specific principles 'third factors,' and argues that computational efficiency is a core example of the notion.2 Although the study of third factors is novel to the Principles and Parameters approach, we show in this chapter that this perspective has historical antecedents in generative grammar. Nonetheless, it is only now that we are beginning to know enough about the structure of Universal Grammar to be able to ask real questions about what third factors might amount to. This perspective is in our view fruitful not just within linguistics, but more generally within (molecular) biology. At the same time, we will argue that we are far from having offered real third-factor explanations for linguistic phenomena.

This chapter is structured as follows. In section 6.2, we discuss the three factors that enter into the design of I-language(s) and discuss the historical roots of this viewpoint. We also situate the perspective within the Principles and Parameters approach. Section 6.3 offers examples of third factors suggested in the recent literature. We evaluate these critically, and argue that although they are suggestive, more work is needed to understand them fully. In section 6.4, we warn against overusing third factors. Conclusions are presented in section 6.5.

1  Terje Lohndal is Professor of English linguistics at the Norwegian University of Science and Technology in Trondheim and Professor II at UiT The Arctic University of Norway.
2  We are grateful to Robert Freidin, Ian Roberts, Bridget Samuels, and T. Daniel Seely for comments on a previous version of this chapter.

6.2  Three Factors in Biology and Three Factors in I-Language

Chomsky's perspective on generative grammar has always been biological. Since language is something humans have tacit knowledge of, Cartesian mentalism is immediately relevant (Chomsky 1966/2009). In turn, individual humans are taken to carry an I-language in their minds (Chomsky 1986b), which virtually entails a biological component. Universal Grammar only appears to exist in the human species, so something about the (epi)genetics of humans must be enabling the linguistic procedure. The classical poverty of the stimulus argument argues for this approach, even if much work lies ahead in understanding the bona fide mechanisms underlying language.3 However, given that language is biological in nature, already in Chomsky (1965) we find a remark that suggests something even deeper may be at work (pp. 58–59) (see also Freidin and Vergnaud 2001 for extensive discussion):

It is clear why the view that all knowledge derives solely from the senses by elementary operations of association and generalization should have had much appeal in the context of eighteenth-century struggles for scientific naturalism. However, there is surely no reason today for taking seriously a position that attributes a complex human achievement entirely to months (or at most years) of experience, rather than to millions of years of evolution or to principles of neural organization that may be even more deeply grounded in physical law.

As Freidin and Lasnik (2011) point out, the traditional evolutionary approach is being compared here with nonbiological principles of natural law. In fact, present-day conjectures on the evolution of language, or anatomically modern humans more generally, situate the emergence within the last couple of hundred thousand years at most (see Fitch 2010 and references there). This strongly suggests that general principles of nature may be even more important than the (epi)genetic component, as Chomsky (2007a) hints (see also Sigurðsson 2011a). While these sorts of considerations have been around for decades, it is only within the Minimalist Program that it has been possible to speculate about what 'principles non-specific to language' might amount to and how they determine linguistic structure. By the time Government and Binding (GB) theory had been fully developed at the end of the 1980s, linguists had a sufficiently well-understood set of principles to ask how these could reduce to their essentials.

3  See Berwick et al. (2011) for comprehensive discussion of several unsuccessful attempts at getting machines to learn significant linguistic structures.



The theory of phrase structure presents a good example of reduction to the barest essentials (see Lasnik and Lohndal 2013 for discussion; see also Lohndal 2014). Once we have accomplished this for a number of principles or structures, the goal becomes to actually understand why these conditions are the way they are. In Chomsky's (2004a:105) words: 'In principle, then, we can seek a level of explanation deeper than explanatory adequacy, asking not only what the properties of language are but also why they are that way.' The desideratum, of course, is not unique to linguistics, as physicist Steven Weinberg reminds us:

In all branches of science we try to discover generalizations about nature, and having discovered them we always ask why they are true … Why is nature that way? When we answer this question the answer is always found partly in contingencies, … but partly in other generalizations. And so there is a sense of direction in science, that some generalizations are 'explained' by others. (quoted in Boeckx 2006:114–115)

The Minimalist Program has attempted to enable the formulation of these why-questions. Whereas Chomsky (1965) was concerned with explanatory adequacy (to what extent the theory offers an account of how the child can acquire a language in the absence of sufficient evidence; see chapter 5), Chomsky (2004a) wants to go beyond explanatory adequacy in the way the quote above outlines. The goal is ambitious, leading us into uncharted territory for linguistic theory: the laws of natural science. As we will see next, it is hard to come up with ideas as to what these laws might be, as applied to the computational system of language as a model of knowledge of language. Chomsky (2004a, 2005) argues that three factors condition I-language design:

Assuming that the faculty of language has the general properties of other biological systems, we should, therefore, be seeking three factors that enter into the growth of language in the individual. (Chomsky 2005:6)

These three factors are outlined in (1).

(1) a. Genetic endowment
    b. Experience
    c. Principles not specific to the faculty of language

Based on Chomsky (2005:6), the genetic endowment is assumed to be more or less uniform for the species.4 This hard-wired structure presumably shapes the acquisition of language, helping the human child in the task. Thereby it presumably also imposes constraints on the kind of I-languages that can be acquired.

4  Of course, the matter is debatable for epigenetic conditions, but we may set these aside, supposing that they too are uniform in human societies.



Of course, without experience, the genetic component cannot do much. This is what determines that a child growing up in Japan will learn Japanese and a child growing up in Oslo, Norwegian (under normal circumstances). Finally, the third factor consists of principles that are not specific to the computational system underlying I-language(s) (Universal Grammar). Rather, these conditions are more general, reaching beyond human language, but can be employed by the language faculty. Chomsky argues that there are several subtypes of these general principles. The first consists of principles of data analysis that might be used in language acquisition and other domains. Another subtype consists of:

… principles of structural architecture and developmental constraints that enter into canalization, organic form, and action over a wide range, including principles of efficient computation, which would be expected to be of particular significance for computational systems such as language. (Chomsky 2005:6)

For this reason, the third factor has also been characterized as 'general properties of organic systems' (Chomsky 2004a:105). Chomsky (2005:6) suggests that these properties 'should be of particular significance in determining the nature of attainable languages.'5 The three factors are more general than they might appear. Gould (2002) discusses three similar factors that hold for organisms more generally. Gould provides the 'adaptive triangle' in (2) (Gould 2002:259):

(2) [Gould's adaptive triangle; its three vertices are:]
    Historical contingencies of phylogeny (1st factor)
    Functional active adaptation (2nd factor)
    Structural rules of structure (3rd factor)

Gould says that a current trait may arise from adaptation to whatever environment surrounds the organism, from a constraint that is not particular to the development of this organism ('architectural or structural principles, correlations to current adaptations'), or by inheritance of an ancestral form—a historical or a phylogenetic constraint. This sort of constraint is part of the genetic endowment of the organism. These three factors are argued to express the major influences on the genesis of form (Gould 2002:259). While most contemporary scientists accept Dobzhansky's dictum that 'nothing in biology makes sense except in the light of evolution,' the issue is to what extent this amounts just to natural selection. In Gould's triangle this notion constitutes the second factor, since it involves adaptation to the environment. But a growing literature, summarized in Hoelzer, Smith, and Pepper (2006), emphasizes the role of principles of self-organization. This is part of Gould's third factor. Alas, it has proven difficult to clarify what specific laws self-organization obeys and what role they play in shaping matter, life, or mind. Chomsky (1968/1972/2006:180) outlines the linguist's take on these issues:

The third factor includes principles of structural architecture that restrict outcomes, including principles of efficient computation, which would be expected to be of particular significance for computational systems such as language, determining the general character of attainable languages.

5  Although the Minimalist Program has centered around these concerns, Freidin and Lasnik (2011) point out that the reduction of the genetic factor is consistent with Chomsky's earlier view on language evolution. They point to the following quote: 'It does seem very hard to believe that the specific character of organisms can be accounted for purely in terms of random mutation and selectional controls. I would imagine that biology of 100 years from now is going to deal with evolution of organisms the way it now deals with evolution of amino acids, assuming that there is just a fairly small space of physically possible systems that can realize complicated structures' (Chomsky 1982b:23).

However, this raises the question: what kind of 'efficient computation' is Chomsky talking about here? We discuss the matter in section 6.3.1.

In summary, the biolinguistic enterprise raises the question of the role of third factors, which Thompson (1942) pointed out provide some fundamental explanations for the growth and form of biological entities (see Freidin and Vergnaud 2001 for detailed discussion). Next we consider some examples of third-factor considerations that have been suggested in the literature.

6.3  Examples of the Third Factor

The goal of this section is to discuss a few examples of third-factor conditions offered in the literature. The list is not long yet, perhaps because of the novelty of these ideas, the relatively small number of those actively researching them, or the difficulty in identifying substantive hypotheses. There are various ways to study third factors. One approach is to look for general principles that appear across various domains in nature. This assumes that there are more general principles governing the creation of form, and that these can be recruited by various general cognitive computations. Another way is to look at linguistic units and see if we find them in non-human species. If we find that principles of, for instance, human phonology appear in different animal species, that strongly suggests that the phonological operations are not unique to the Faculty of Language. This is the approach taken by Samuels (2009, 2011), and in chapter 21 (but see also chapter 8). In the remainder of this section, we concentrate on the first approach.




6.3.1 Computational Efficiency

As we saw in section 6.2, Chomsky takes 'computational efficiency' to be the hallmark of a third-factor effect. The implicit assumption is that computations in general should be as efficient as possible, and that this is a property that all computations share, regardless of what is being computed. There are not that many examples of efficient computation in the literature when it comes to I-language; but Chomsky (2008) mentions cyclicity considerations as an example. The Extension Condition states that (External and Internal) Merge of a new object targets the top of the tree.6 In order to see this, consider the trees in (3a)–(3c), rendered here in labeled bracket notation:

(3) a. [X [Z A B] C]
    b. [X β [X [Z A B] C]]
    c. [X [Z A B] [C C β]]

(3a) is the original tree. (3b) shows a derivation that obeys the Extension Condition because the new element β is merged at the top of the tree. The derivation in (3c) does not obey Extension because β is merged at the bottom of the tree. A related cyclicity condition is the No Tampering Condition. The No Tampering Condition states that Merge of X and Y leaves the two syntactic objects X and Y unchanged.7 The set {X, Y} created by Merge cannot be broken up and new features cannot be added (Chomsky 2008). So on this view, (3b) involves no tampering, since the old tree in (3a) still exists as a subtree of (3b), whereas (3c) involves tampering with the original structure. Chomsky (2008:138) sees the No Tampering Condition as a 'natural requirement for efficient computation.' This is an economy condition, as argued by Lasnik and Lohndal (2013): it is more economical to expand a structure than to go back and change a structure that has already been built.
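The economy intuition can be made concrete with a toy model of Merge as set formation; the encoding below is an assumption made here for illustration, not Chomsky's own formalism:

```python
# Toy rendering of Merge and the two cyclicity conditions discussed in
# the text. Syntactic objects are frozensets; this encoding is an
# illustrative assumption.

def merge(x, y):
    """(External or Internal) Merge: form the set {X, Y}."""
    return frozenset({x, y})

A, B, C, beta = 'A', 'B', 'C', 'beta'
Z = merge(A, B)
tree_3a = merge(Z, C)                 # (3a), the original tree

# (3b): merging beta at the root extends the tree, and (3a) survives
# untouched as a subterm -- Extension and No Tampering both satisfied.
tree_3b = merge(beta, tree_3a)
assert tree_3a in tree_3b

# (3c): merging beta with C tree-internally forces the containing sets
# to be rebuilt; the object (3a) no longer exists as such -- tampering.
tree_3c = merge(Z, merge(C, beta))
assert tree_3a not in tree_3c and tree_3a != tree_3c
```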

Yet another example is what Rizzi (1990) called Relativized Minimality, also discussed in chapter 5, section 5.4. Chomsky (1993) reinterpreted Rizzi's groundbreaking work in terms of least effort. Let us illustrate that here by way of a phenomenon called Superiority, which has often been analyzed as a Relativized Minimality effect. Consider:

(4) a. Guess who bought what?
    b. *Guess what who bought?

6  See Richards (2001) for arguments that this condition does not always hold.
7  Uriagereka (1998:264) calls it the Ban Against Overwriting.



In this situation, there might seem to be the option to front either who or what. As (4a) and (4b) show, only the former is licit. In such a situation, the wh-element closest to the target of movement is picked, as first observed by Chomsky (1973:246). Rizzi (2001:89) states Relativized Minimality as follows:

(5) In the configuration …X…Z…Y…, Y cannot be related to X if Z intervenes and Z has certain characteristics in common with X. So, in order to be related to X, Y must be in a minimal configuration with X, where Minimality is relativized to the nature of the structural relation to be established.
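For concreteness, intervention in the sense of (5) can be rendered as a simple check. The flat positional encoding and feature labels below are illustrative assumptions (actual intervention is computed over hierarchical configurations, as chapter 5 stresses):

```python
# Toy rendering of Relativized Minimality (5): Y cannot relate to X
# across an intervening Z of the same structural type.

def can_relate(positions, x, y):
    """positions: (name, type) pairs listed from X down to Y."""
    types = dict(positions)
    names = [name for name, _ in positions]
    i, j = names.index(x), names.index(y)
    interveners = names[min(i, j) + 1:max(i, j)]
    return all(types[z] != types[x] for z in interveners)

# Superiority (4): the interrogative C position attracts a wh-element;
# 'what' cannot relate to C across the closer wh-element 'who'.
clause = [('C', 'wh'), ('who', 'wh'), ('what', 'wh')]
print(can_relate(clause, 'C', 'who'))   # True: (4a)
print(can_relate(clause, 'C', 'what'))  # False: (4b), 'who' intervenes
```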

Put differently, one should minimize the 'distance traveled' by the moving element, an instance of economy of derivation. Once again, economy conditions are supposed to be quite general, as argued, for instance, by Fukui (1996) or Uriagereka (1998). However, in this case, the notion of 'distance' is certainly not trivial. This is captured in the name of the condition itself: distance is somehow relativized to the units across the path being considered. In this regard the following remark in Fukui (1996:61) seems quite relevant:

We are of course not suggesting that the economy principles of language are 'reducible' to the Principle of Least Action. The actual formulation of the principles appears to be highly specific to language. Nevertheless, the fundamental similarity between language and the inorganic world in this respect is so striking that it suggests that there is something deep in common between the two areas of inquiry.

In other words, we still want to understand why distance should matter—as it does in other realms of nature. In all languages that have been studied in this regard, some notion of distance, very much along the lines Rizzi and others unearthed, is certainly at work. Therefore it is important to investigate what distance reduces to—what are the basic properties of distance, and why do those particular properties matter, as opposed to other conceivable conditions?

One proposal in the literature is that distance is limited because the derivation only happens 'in chunks.' That is, during the derivation various parts of the syntactic structure are shipped off to the interfaces (Uriagereka 1999; Chomsky 2000b). These units, called phases or cycles, are among others motivated on grounds of computational efficiency. Here is a relevant quote:

Suppose we select L[exical]A[rray] as before … ; the computation need no longer access the lexicon. Suppose further that at each state of the derivation a subset LAi is extracted, placed in active memory (the workspace), and submitted to the procedure L. When LAi is exhausted, the computation may proceed if possible; or it may return to LA and extract LAj, proceeding as before. The process continues until it terminates. Operative complexity in some natural sense is reduced, with each stage of the derivation accessing only part of LA. (Chomsky 2000:106)

Put differently, phases reduce the computational complexity of a derivation. One question that immediately comes up is what phases are. Phases are defined by stipulating the phase heads: C and v. Chomsky (2000b) argues that these are the phase heads because they are propositional and they yield convergent derivations. In later work, unvalued features have been the defining properties of phase heads (Richards 2007; Chomsky 2008). Regardless of what the relevant property is, though, as long as phases are meant to reduce computational complexity, it would be nice to see at least a correlation between how computational complexity works and how the phase heads are defined. Put differently, we would expect that the phase heads fall out from properties that are independently known to relate to computation, whatever these may be. Whereas it is somewhat easier to imagine how this would work for the C head,8 it is unclear how the v phase would fall out. Further problems arise if DPs too induce phases, as argued by Svenonius (2004). Evidently it would be a welcome result if phases fell out from natural constraints on computational complexity, as these units remain a stipulation—much in the sense bounding nodes were in earlier theories (see Boeckx and Grohmann 2007b for discussion). If cyclicity, in a broad sense, is a deep property of the computational system, we would expect it to have a deep rationalization as well (see Freidin 1978, 1999). Attempting to provide one turns out to require revisions of several standard assumptions (see, e.g., Uriagereka 2011).

Another example of a third factor comes from Freidin and Lasnik (2011). They argue that interface constraints fall under the rubric of principles of efficient computation, providing the following argument for this view. Interface conditions (bare output constraints) are taken to be imposed on the grammar by other cognitive components. In particular, Freidin and Lasnik interpret the principle of Full Interpretation as a legibility requirement banning superfluous symbols in representations, assuming the meaning and sound interfaces cannot interpret the relevant structures. As such, the principle contributes to efficiency: the computation need not compute symbols that turn out to be superfluous. Freidin and Lasnik go on to argue that the Theta Criterion can be made to follow from the principle of Full Interpretation.9 This is taken to account for the data in (6):

(6) a. *John seems that Mary is happy.
    b. *John gave Mary a book to Bill.

An argument that does not have a theta role is uninterpretable at the semantic interface—hence superfluous, as in (7), where like only has two theta roles, whereas

8  This is assuming that something like a sentence is an independent computational unit.
9  This condition prohibits an argument that does not get a theta role and prohibits multiple theta roles being assigned to the same argument.



there are three nominal constituents. If these data violate Full Interpretation, portions of the Theta Criterion are therefore redundant. The Case Filter can be analyzed in the same way if Case features are uninterpretable at the interfaces.

(7) *John likes Mary Peter.

As Freidin and Lasnik argue, this approach progressively eliminates principles whose nature was taken to be part of the first factor (the genetic endowment), in favor of conditions that are outside of the Faculty of Language.10

Now, for perspective, computational efficiency need not be a third factor. The following is a case that may at first glance appear to be a third factor, but which was actually argued to be what we are now calling a first factor, as it involves the computational efficiency of parsers.11 Berwick and Weinberg (1984) argue that the cycles described by syntacticians constitute optimal units of parsing. Observe:

(8) [Whoi did [John say [ti that [Peter believed ti that … [Mary sent ti flowers/to Bill ]]]]]

The parser that Berwick and Weinberg are working with constructs one phrase marker and a discourse file that corresponds to it. The usual filler–gap problem emerges in a case like (8), where there is a wh-phrase. However, in the case of (8), the predicate send is syntactically ambiguous: it can be parsed with a different number of arguments. It is necessary for the parser to have access to the relevant predicate and its immediate context, plus the left-edge context. The latter provides information about the antecedent wh-phrase of the gap. But as the example in (8) indicates, the antecedent can be arbitrarily far removed from the variable that it binds. In order to account for this, Berwick and Weinberg suggest that the left context is present at every derivational cycle. Intermediate traces enable this.

An immediate question is why the cyclic nodes that Berwick and Weinberg assume coincide with those discovered in the past—as opposed to others that would seem equally plausible, parsing-wise. Why, for instance, can the next phrasal projection not constitute a cyclic node (Fodor 1985; van de Koot 1987)? Berwick and Weinberg did not attempt to answer that question; they simply argued that the parser works well if it is structured this way, even though it does not solve all cases of parsing ambiguity. Fodor (1985) criticizes this on evolutionary grounds, and Berwick and Weinberg (1985) counter by citing Gould's (1983) criticism of perfect design. So their approach is very much a first-factor approach: the parsing mechanisms are specific to the Faculty of Language, and the computational efficiency comes from adaptation; it is therefore species-specific.12

10  It remains to be seen in what sense relics of Full Interpretation are found in other domains of cognition, be it in humans or in non-humans, or in other areas of nature.
11  For more discussion of this case, see Uriagereka (2011).
12  In fact, there is no deep reason to assume that the parser works in exactly the same way across all languages, or that the cyclic nodes are universal (see Rizzi 1978).



Let us now return briefly to computational efficiency from a third-factor perspective. Even though it is pretty obvious that something like computational efficiency is a general property of computations, stating that does not answer the deeper question that one can ask: why is computational efficiency what it is?13 What properties of the structure of computations make them efficient?

6.3.2 The Fibonacci Sequence

A different, though ultimately related, argument for third factors stems from Fibonacci patterns of the sort seen in natural phenomena. Uriagereka (1998:485ff.) sketched an argument that we also find Fibonacci growth patterns in language, and thereafter some researchers have begun to produce specific results in this regard. Before we outline this sort of argument, consider Fibonacci sequences.14 Relevant structures manifest themselves either as a number of features falling into the series 0, 1, 1, 2, 3, 5, 8, … or as a logarithmic growth based on the limit of the ratio between successive terms in the Fibonacci series (1.618 …, the so-called golden ratio φ). The majority of plants that have been studied have been shown to follow Fibonacci growth patterns, and we see them also from the organization of skin pores (in tetrapods) to the way in which shells grow (in mollusks), among scores of other examples. In addition, the pattern has been recreated in controlled lab situations (Douady and Couder 1992).

Consider the latter case in some detail, since it shows that the structure can emerge under purely physical conditions, and not only under 'Darwinian conditions.' Douady and Couder slowly dropped a magnetized ferro-fluid on the center of a flat, rotating oil dish. The drops repel each other, but are constrained in velocity by the oil's viscosity. As the dropping rate increases, a characteristic Fibonacci pattern emerges. The relevant equilibrium can be conceptualized as involving a local and a global force pulling in opposite directions, and the issue is how these opposing forces balance each other out, such that the largest number of repelling droplets can fit within the plate at any given time, as they fall onto it. It turns out that an angle φ of divergence between each drop and the next achieves this dynamic equilibrium.15

13  Recall Fukui's (1996) suggestion that economy principles within linguistics resemble the Principle of Least Action in physics, suggesting a deeper physical basis behind computational efficiency.
14  What follows relies on material that is discussed in more detail in Piattelli-Palmarini and Uriagereka (2008).
15  We need not discuss the technical details of the experiment here, but the following link presents a curious video of the actual experiment: http://www.sciencenews.org/view/generic/id/8479.
16  For discussion of other third factors in phonology, see Samuels (2009, 2011) and chapter 8.

The first example of the Fibonacci pattern in language was proposed for the structure of syllables (Uriagereka 1998:ch. 6). We won't review that now, but rather point to another result from phonology involving metrical feet.16 Idsardi (2008) proves that the number of possible metrical parsings into feet for a string of n elements is Fib(2n), where Fib(n) is the nth Fibonacci number. In particular he observes that, if we disregard prominence relations within the feet,17 the possible footings for strings up to a length of three elements are as in (9) (Idsardi 2008:233). Below, matching parentheses indicate feet, and elements that are not contained within parentheses are unfooted ('unparsed' in Optimality Theory terminology):

(9) a. 1 element, 2 possible parsings:
       (x), x
    b. 2 elements, 5 possible parsings:
       (xx), (x)(x), (x)x, x(x), xx
    c. 3 elements, 13 possible parsings:
       (xxx), (xx)(x), (xx)x, (x)(xx), x(xx), (x)(x)(x), (x)(x)x, (x)x(x), x(x)(x), (x)xx, x(x)x, xx(x), xxx

As Idsardi (2008:234) observes, the number of possible footings is equal to every other member of the Fibonacci sequence (relevant parsings as in (9) boldfaced in (10)):

(10) Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, 21, …
     Intensionally: for n > 1, where Fib(n) is the nth number in the Fibonacci series, Fib(n) = Fib(n − 1) + Fib(n − 2)
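The count is easy to verify mechanically. The brute-force enumerator below, an illustration rather than Idsardi's own proof, generates every footing of a string of n elements and compares the totals with every other Fibonacci number:

```python
# Brute-force check of Idsardi's (2008) result: a string of n elements
# has Fib(2n) possible footings, with Fib indexed so that the sequence
# runs 1, 1, 2, 3, 5, 8, 13, ...

def footings(n):
    """All parses of n elements: each element is either unfooted ('x')
    or belongs to a foot, a parenthesized group of contiguous elements."""
    if n == 0:
        return ['']
    parses = ['x' + rest for rest in footings(n - 1)]       # unfooted element
    for k in range(1, n + 1):                               # initial foot of size k
        parses += ['(' + 'x' * k + ')' + rest for rest in footings(n - k)]
    return parses

def fib(n):
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(1, 6):
    print(n, len(footings(n)), fib(2 * n))
# prints: 1 2 2 / 2 5 5 / 3 13 13 / 4 34 34 / 5 89 89
```

The recursion mirrors the combinatorics directly: the first element is either unfooted, or it opens a foot of some size k, with the remainder of the string parsed independently.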

Idsardi then provides a proof for this result, which we will not go into now. In a follow-up paper, Idsardi and Uriagereka (2008) provide some rationale for why only half of the Fibonacci sequence is involved in these phonological parsings.

Uriagereka (1998:ch. 6) also indicated that we should expect conditions such as these in other parts of the grammar. Boeckx, Carnie, and Medeiros (2005), Medeiros (2008, 2012), and Soschen (2008) have argued that this is the case. We will here focus on Medeiros' work. Medeiros takes standard X-bar theory (Chomsky 1986a) as a point of departure:

(11) [XP specifier [X′ X0 complement]]

He then investigates maximal expansions of (11), as in (12), which observe binary branching (Kayne 1984; Chomsky 2000b). The expansion in (12) is optimal in the sense that the basic X-bar structure is present in all branchings, although of course that maximality is not necessary in linguistic representations.

17  With Halle and Idsardi (1995:440), Idsardi does not assume an exhaustive parsing of phonological elements, pace Halle and Vergnaud's (1987) Exhaustivity Condition.



(12) [Tree diagram: MaxP, the maximal binary expansion of the X-bar schema in (11), each branch itself an X-bar structure (XP, YP, WP, RP, ZP, SP, TP, etc.), with the following projection counts at successive expansions:]

     Maximal projections   Intermediate projections   Heads
     1                     0                          0
     1                     1                          0
     2                     1                          1
     3                     2                          1
     5                     3                          2
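The three columns can be generated from a single recurrence: maximal projections grow as the Fibonacci series, while each intermediate projection matches a maximal projection one expansion earlier and each head one two expansions earlier. The sketch below simply reproduces the table on these assumptions, which are read off the table itself rather than taken from Medeiros' own derivation:

```python
# Reproducing the projection counts in (12): maximal projections,
# intermediate projections, and heads each follow the Fibonacci series,
# offset by one stage. A sketch under stated assumptions.

def counts(stages):
    m = [1, 1]                       # maximal projections: 1, 1, 2, 3, 5, ...
    for _ in range(stages - 2):
        m.append(m[-1] + m[-2])
    rows = []
    for d in range(stages):
        intermediate = m[d - 1] if d >= 1 else 0
        heads = m[d - 2] if d >= 2 else 0
        rows.append((m[d], intermediate, heads))
    return rows

for row in counts(5):
    print(row)
# (1, 0, 0), (1, 1, 0), (2, 1, 1), (3, 2, 1), (5, 3, 2)
```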

At each successive full expansion of the tree in (11), there are Fibonacci numbers of maximal projections, intermediate projections, and heads. Medeiros further shows how the Fibonacci patterns force deeper phrasal embedding in the relevant cases. Medeiros then goes on to claim that the formation of linguistic tree structures is related to structural optimization: Merge makes the full spectrum of binary branching forms available. Medeiros argues that there is a computational burden associated with establishing relations based on containment and c-​command (2008:189). This entails that some derivational choices are better than others, and he argues that the form that is closest to the X-​bar schema in (11) is the computationally most optimal. This then coincides with the Fibonacci pattern as seen in (12). Computational efficiency and Fibonacci therefore seem to be related at a fairly deep and structural level, where one may suggest that, in some sense to be understood, the Fibonacci structuring is constraining the nature of the computation. If the above turns out to be on the right track, it immediately raises the following question: what is the connection between the Fibonacci patterns for syllables and the Fibonacci patterns for phrase structures? Carstairs-​McCarthy (1999) explicitly argues that there is one such connection when he claims that phrasal structure is a biological exaptation, in evolutionary terms, of earlier syllabification conditions. Such an idea is not entirely new; among others, Kaye, Lowenstamm, and Vergnaud (1985) and Levin (1985) argue for a close link between phrasal and syllabic structures. However, as Piattelli-​Palmarini and Uriagereka (2008) emphasize, it remains to be seen why a structural translation—​ultimately going from the realm of sound to that of structured meaning—​is in the nature of language. There are also many unanswered questions that should be acknowledged. Even if Fibonacci patterns do occur in linguistic representations (a difficult empirical matter to ascertain one way or the other), that does not answer why such patterns should occur in the relevant representations. Piattelli-​Palmarini and Uriagereka (2008:223) address this very issue when asking: What does it mean for linguistic forms to obey those conditions of growth, including such nuanced asymmetrical organizations as we have studied here? Why has natural law carved this particular niche among logically imaginable ones, thus yielding the attested sub-​case, throughout the world’s languages? What in the evolution of



the species directed the emergence of these forms, and how was that evolutionary path even possible, and in the end, successful? The biolinguistics take that we assume attempts to address these matters from the perspective of coupling results within contemporary linguistic theorizing with machinery from systems biology and biophysics more generally. Again, we do not fully understand the ‘embodiment’ of any of the F[ibonacci] patterns in living creatures. But the program seems clear: proceeding with hypothetical models based on various theoretical angles, from physical and bio-molecular ones, to grammatical studies isolating abstract patterns in the linguistic phenotype. A synthesis proceeding this way seems viable in the not so distant future, at the rate that new discoveries march.
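The counting claim embodied in the table in (12) can be verified mechanically. The following is a minimal sketch (our illustration, not code from Medeiros’ work), assuming only that every XP rewrites as a specifier (itself an XP) plus an X′, and every X′ rewrites as a head plus a complement (itself an XP):

# Minimal sketch: count node types per level in the fully expanded
# X-bar tree. Assumes each XP rewrites as [specifier X'] with the
# specifier itself an XP, and each X' rewrites as [X0 complement]
# with the complement itself an XP; heads are terminal.

def xbar_level_counts(levels):
    """Yield (maximal, intermediate, head) counts for successive levels."""
    maximal, intermediate, head = 1, 0, 0   # level 0: just the root XP
    for _ in range(levels):
        yield maximal, intermediate, head
        maximal, intermediate, head = (
            maximal + intermediate,   # every XP and every X' adds one XP below
            maximal,                  # every XP adds one X'
            intermediate,             # every X' adds one head
        )

for counts in xbar_level_counts(5):
    print(counts)
# (1, 0, 0), (1, 1, 0), (2, 1, 1), (3, 2, 1), (5, 3, 2): each column
# climbs through the Fibonacci sequence, matching the table in (12).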

6.3.3 Summary

We have looked at a few examples of third-factor proposals in the literature: efficient computation and Fibonacci patterns. In both cases, the proposals invoke principles that are assumed to be non-specific to language. In the case of Fibonacci patterns, it is obvious that they exist in many different domains of nature. It seems plausible to argue that there are laws of nature that yield Fibonacci sequences, since these sequences appear in the organic and inorganic world alike. We have also mentioned how Medeiros (2008, 2012) presents a close link between efficient or optimal computation and Fibonacci patterns. One may speculate that the principles underlying the Fibonacci patterns are somehow structuring the computation. Or it may be that the computation is simply structured so that such patterns appear based on this structure, more or less as an epiphenomenon (see Uriagereka 2011). Work of the sort that we have reviewed here has made it possible to ask new questions, even if definitive answers are still missing. Future work will hopefully enable us to delve deeper into these issues, and it may turn out that our understanding of the basic computation of language has to be modified in order to truly rationalize third factors (see Uriagereka 2011 in this regard).

6.4 Studying Third Factors

We would like to end this chapter with some reflection on the complex task of determining whether a given linguistic condition might be a third factor. On the PHON side of the interface, one can argue that none of the observable computational operations are human-specific (see Samuels 2009, 2011 and chapters 8 and 21). One may also argue that the computational operations on the phonological side are grounded in phonetic constraints. Perhaps these constraints are the way they are because of physio-anatomical considerations; for instance, phonetic patterns across languages that involve ease of articulation and perception (Blevins 2004; Hayes, Kirchner, and Steriade 2004). Couldn’t one then argue that the phonetic grounding



of phonology is a third factor, human physiology constraining possible phonological patterns? Since the foundations of human physiology presumably have little to do with how language is structured, on that view phonetic patterns are not specific to language. Whether an argument along these lines establishes the validity of this condition as a third factor depends on whether there is independent validity to the claim that phonetics is the way it is because of human physiology. For starters, such a claim requires a serious, and difficult, look at the relevant neurobiology, as Poeppel, Idsardi, and van Wassenhove (2008) emphasize for speech perception.18

Now while on the PHON side of things one at least has the advantage of relatively straightforward observables and decades upon decades of tradition to gear research, on the SEM side the task seems much harder. There are, no doubt, familiar arguments for relevant syntactic structures, and we know also that we understand meanings associated to those structures. However, little else is known, or directly observable, or even up for clever testing. One could certainly claim—on analogy with arguments about human physiology constraining possible phonological patterns—that human psychology constrains possible semantic patterns. However, it is far from obvious that the foundations of human psychology are totally independent of how language is structured, or that such a would-be claim is even testable with other animals. So it is not altogether clear what it would, then, mean to say that some third factor grounds SEM the way it might PHON.19 This, of course, does not mean that there could not be third factors on the semantic side—just that it is hard to make the case.

Let us discuss a possible third-factor effect exclusively for illustrative purposes. For Neo-Davidsonian approaches that assume conjunction to be the basic semantic composition principle (Schein 1993; Herburger 2000; Pietroski 2005, 2011), one could ask whether conjunction is, in fact, specific to language. Suppose it isn’t, and its essentials can be demonstrated for other species.20 We would still need to understand why, in the case of human language, not just anything gets conjoined for semantic composition; rather, in this view of things, predicates of a specific type are what conjoins, as shown in (13):

(13) a. Brutus stabbed Caesar quickly.
b. ∃e[Agent(e, Brutus) & Theme(e, Caesar) & stab(e) & quickly(e)]
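What conjunctive composition amounts to can be illustrated with a toy model, assuming (purely for exposition, not following any of the cited authors’ formalizations) that each one-place predicate in (13b) denotes a set of events and that ‘&’ is set intersection; the event domain and predicate extensions below are invented:

# Toy model (for illustration only): Neo-Davidsonian composition as
# predicate conjunction over a domain of events. Each one-place
# predicate in (13b) denotes a set of events; '&' is intersection.

events = set(range(100))              # an arbitrary toy domain of events

# Invented extensions for the four conjuncts of (13b):
agent_brutus = {e for e in events if e % 2 == 0}
theme_caesar = {e for e in events if e % 3 == 0}
stabbing     = {e for e in events if e % 5 == 0}
quick        = {e for e in events if e < 50}

# (13b) is true iff some event survives intersecting all four predicates:
witnesses = agent_brutus & theme_caesar & stabbing & quick
print(bool(witnesses), sorted(witnesses))   # True [0, 30]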

18 Albeit not with the goal of reducing phonology to phonetics, a matter orthogonal to that paper.
19 Imagine, for the sake of argument, that the best theory for semantics is model-theoretic (see chapters 2, 3, and 9 for relevant discussion). To appropriately make the case for such a condition being a third factor, one would have to demonstrate its independence from language, for instance in terms of other organisms making use of this system in ways similar to those in which humans use said mechanics for language.
20 We are not making any specific claim in this regard, but it wouldn’t strike us as implausible that basic animal psychology should be conjunctive. After all, most roughly iterative animal tasks one can think of (e.g., building a nest or a dam, or plotting a path to some goal and back home) would seem to entail at the very least some elementary conjunctive semantics associated to the iteration (see chapter 21).



Different proposals about how the syntax delivers the logical form in (13b) have been explored in Borer (2005); Hinzen (2006b); Uriagereka (2008); and Lohndal (2012, 2014). These are hypotheses—among others possible—on concretely how conjunction is actually employed depending on language-specific computations. Suppose a method is found to empirically validate one of these different theories, or some alternative. Would this then mean ipso facto that the psychology of conjunction is a third factor? Again, as in the case of phonetics, the answer to that question would depend on whether there is independent validity to the claim that semantics is the way it is because of human psychology. Very clearly a serious evaluation of such a claim would depend on matters of neurobiology for which, at present, there is effectively no understanding. Circumstances may change as discoveries bring us new insight into the brain. That being said, for evaluating whether a given semantic condition is a third factor, attitudes in that area of study will need to change. At present answers in this realm are often descriptive, and even questions of the sort we are now sketching are met with skepticism. But the main point of this section is to argue that, difficult as it is to determine what a third factor is, in our view it won’t do to just claim such a thing on essentially eliminative grounds.21

6.5 Conclusions

In recent years, the study of I-language has been divided into three factors: genetic endowment, experience, and principles of nature that are not specific to language. This chapter has outlined these three factors, discussing some possible third-factor approaches. There are not many third factors suggested in the literature. This is not surprising, as the concept has not been around for that long in linguistics. Future research should be able to take us further, providing more detailed accounts of how principles that are not specific to the Faculty of Language apply to it (or not). In closing, we would again like to emphasize how challenging the third-factor approach is, and how difficult it is to provide principled explanations. It should be clear that empirical generalizations such as cyclicity constitute the foundation of the third-factor approach—but this perspective forces us to go beyond that unavoidable empirical step.

21  To paraphrase Sherlock Holmes: ‘When you have eliminated the second factor, whatever remains, however improbable, must be a third factor.’ The fallacy is based on the fact that, for starters, the hypothesized condition may be a total mirage—​and blaming it on the third factor won’t give it more ontological bite, on the sheer basis of the claim.



Chapter 7

Formal and Functional Explanation

Frederick J. Newmeyer

Formalism and functionalism represent poles of a timeless dichotomy, each expressing a valid way of representing reality. Both poles can only be regarded as deeply right, and each needs the other because the full axis of the dichotomy operates as a lance thrown through, and then anchoring, the empirical world. If one pole ‘wins’ for contingent reasons of a transient historical moment, then the advantage can only be temporary and intellectually limited. (Gould 2002:312)

Internalist biolinguistic inquiry does not, of course, question the legitimacy of other approaches to language, any more than internalist inquiry into bee communication invalidates the study of how the relevant internal organization of bees enters into their social structure. The investigations do not conflict; they are mutually supportive. In the case of humans, though not other organisms, the issues are subject to controversy, often impassioned, and needless. (Chomsky 2001a:41)

7.1 The Great Rhetorical Divide in Linguistics

Perhaps the greatest rhetorical conflict that exists (and has long existed) among the linguists of the world is between those who practice some variety of ‘formal linguistics’ and those who practice some variety of ‘functional linguistics.’ The former advocate formal explanations of linguistic phenomena, and the latter functional explanations. But I have carefully chosen the wording ‘rhetorical conflict,’ since the principal purpose of this



chapter is to argue that there is no inconsistency in advocating (and practicing) both modes of explanation.

A major impediment to the task of characterizing and evaluating the two explanatory strategies is that the very words ‘formal’ and ‘functional’ are used differently by different linguists. For some an explanation is ‘formal’ if and only if it accords center stage to formal structure (as opposed to, say, discourse). Such is the use of the term in mainstream syntactic theory, namely the Principles-and-Parameters (P&P) approach (Chomsky 1981a) and those models directly antecedent and subsequent to it (Chomsky 1973, 1995b). For others, particularly those who work in constraint-based approaches like Head-Driven Phrase Structure Grammar (Sag, Wasow, and Bender 2003) and Lexical-Functional Grammar (Bresnan 2001), an explanation is ‘formal’ if and only if the theory is formalized in minute detail. So when Pullum (1989) challenges the right of P&P to call itself a ‘formal approach,’ he has the latter sense of the term ‘formal’ in mind.

We find analogous problems with the term ‘functional.’ For some an explanation is ‘functional’ if and only if its explanans are external to grammar per se. Most self-described ‘functionalists’ in North America use the term ‘functional’ in that way. But for others, it is sufficient to describe an explanation as ‘functional’ if it accounts for the functioning of the elements of the grammatical system internally—that is, with respect to each other. For example, André Martinet, a prominent French linguist active between the 1950s and 1970s, could entitle one of his books A functional view of language (Martinet 1962), when much of its content was devoted to illustrating purely structural oppositions within self-contained grammars.

And a final confusion arises from the question of where meaning fits into the picture. For most functionalists and for quite a few formalists, a semantically-based explanation is regarded as an alternative to a formal explanation, even though the semantic properties of language can be, and often are, formalized more thoroughly than the morphosyntactic properties. So Cognitive Linguistics (Lakoff 1987; Langacker 1987), whose goal is to explain morphosyntax in terms of the meanings of the items encoded, is generally regarded as a close ally of functional linguistics, rather than of formal linguistics.

My first task therefore is to propose criteria by which an explanation might be considered ‘formal’ and one by which it might be considered ‘functional.’ I characterize the modes of explanation as in (1) and (2) and will attempt to use them consistently in these senses in the remainder of the chapter:

(1) An explanation is formal if it derives properties of language structure from a set of principles formulated in a vocabulary of nonsemantic structural primitives.

(2) An explanation is functional if it derives properties of language structure from human attributes that are not specific to language.

One might immediately note an asymmetry between (1) and (2), or at least a vagueness in the formulation of (2). Bowing to common usage, definition (1) specifically excludes a semantically-based explanation from counting as ‘formal,’ whereas such



an explanation-type counts as ‘functional’ only if one takes a positive stance on the language-independence of meaning. Fortunately, little in the remainder of the chapter hangs on the resolution of this particular issue.

The chapter is organized as follows. Sections 7.2 and 7.3 focus on the properties of formal explanation and functional explanation respectively. Section 7.4 argues that both have their place in a full account of the properties of linguistic structure. Section 7.5 is a brief conclusion.

7.2 Formal Explanation

In this section I first outline the properties of formal explanation in linguistics (7.2.1). Section 7.2.2 replies to some frequently-raised criticisms about this mode of explanation, and 7.2.3 argues that, given the autonomous nature of syntax, formal explanation is not just possible, but is, in fact, necessary.

7.2.1 The Nature of Formal Explanation

As noted in (1), an explanation will be considered ‘formal’ if facts about language structure can be derived from a set of principles formulated in a vocabulary of grammar-internal primitives. Generativists say that the former are ‘explained’ by the latter. So, for example, one can imagine a language that formed questions by regularly inverting the order of all the words in the corresponding declarative (3b), rather than by fronting some particular constituent of the declarative (3c):

(3) a. You would like to stay home today.
b. *Today home stay to like would you?
c. Would you like to stay home today?

The theory of Universal Grammar (henceforth ‘UG’), however, prohibits the former option by its failure to provide a mechanism for carrying out such an inversion operation. That is, the following rule type, while perhaps simple and elegant in the abstract, is not allowed by UG, since in standard Minimalist Program approaches it is incompatible with the combinatorial Merge operation:

(4) W1 - W2 - W3 - … - Wn → Wn - … - W3 - W2 - W1

So we say that the ungrammaticality of sentence (3b) is explained by the fact that UG principles fail to allow it.
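The contrast can be made concrete with a toy computation; this illustrates the logic only (the ‘parse’ of the auxiliary is stipulated by hand, where a real grammar would identify it structurally):

# Toy illustration: rule (4) is trivial to state over flat strings, yet
# no attested language uses it. What languages do instead is
# structure-dependent: front a designated constituent, here the auxiliary.

declarative = "you would like to stay home today".split()

# The string-reversal 'question rule' of (4): simple in the abstract, but
# unformulable in a Merge-based grammar, which manipulates constituents
# rather than linear positions.
print(" ".join(reversed(declarative)))
# -> 'today home stay to like would you'   (= the impossible (3b))

# Structure-dependent auxiliary fronting; the auxiliary's position is
# stipulated here purely for illustration.
aux_index = 1
question = [declarative[aux_index]] + declarative[:aux_index] + declarative[aux_index + 1:]
print(" ".join(question))
# -> 'would you like to stay home today'   (= the actual (3c))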



But the notions ‘ungrammaticality’ and the formulation of Merge are themselves theory-internal. By what criteria, then, can we claim to have explained anything about English (or about human language in general)? After all, an internally-consistent deductive system could be designed to rule (3b) grammatical and (3c) ungrammatical. The answer is that one assumes a consistent—though not mechanical—correspondence between the predictions of the grammar and observations of language performance. Since the ungrammaticality of sentences like (3b) in English and other languages is matched by their unacceptability to native speakers, and Merge is called upon repeatedly in the analysis of other phenomena, formal linguists have no hesitation in claiming that the judgments of the native speaker are explained, in part, by the particular formulation of Merge, which excludes rule type (4).

Many explanations provided within generative grammar thus fit a weak form of the deductive-nomological model (Hempel 1965). That is, a general principle is formulated and paired with a set of initial conditions. A phenomenon is said to be ‘explained’ if it can be deduced from this pairing. Consider, as another example, the ‘Wh-Criterion,’ a UG principle that helps to account for the syntax of interrogative clauses in a number of languages:

(5) Wh-Criterion (Rizzi 1996:64, revising May 1985):
A. A wh-operator must be in a Spec–head configuration with X0 [+wh].
B. An X0 [+wh] must be in a Spec–head configuration with a wh-operator.

Independently arrived at assumptions about sentence structure lead to the conclusion that the wh-word who in the following sentence is not in a Spec–head configuration with an X0 [+wh]. Hence it is deduced that the following sentence will be ungrammatical:

(6) *I wonder you saw who.

Since (6) is indeed unacceptable to native speakers, we have evidence in favor of the Wh-Criterion, from which the deduction follows.
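The deductive step can itself be made explicit. The following toy encoding (not anything from Rizzi or May; the clause descriptions are stipulated by hand) states (5A) and (5B) as material implications and checks them against the two cases just discussed:

# Toy encoding of the Wh-Criterion in (5) as a deductive check over
# simplified clause descriptions: does the C head bear [+wh], and does
# a wh-operator occupy its specifier?

def wh_criterion(head_is_wh, spec_has_wh_operator):
    """Clauses A and B of (5), stated as material implications."""
    clause_a = (not spec_has_wh_operator) or head_is_wh
    clause_b = (not head_is_wh) or spec_has_wh_operator
    return clause_a and clause_b

# 'I wonder who you saw': wonder selects a [+wh] C, and 'who' has moved
# to its specifier, so both clauses are satisfied.
print(wh_criterion(head_is_wh=True, spec_has_wh_operator=True))    # True

# (6) '*I wonder you saw who': the embedded C is [+wh] but 'who' stays
# in situ, violating clause B -- the ungrammaticality is thus deduced.
print(wh_criterion(head_is_wh=True, spec_has_wh_operator=False))   # False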



The problems with exclusive reliance on the deductive-nomological mode of explanation are well known (for a review, see Salmon 1984, 1989). In a nutshell, for any complex phenomenon, the initial conditions and the data to be explained are themselves typically fluid enough that giving them a slight reinterpretation allows one to explain away any failed deduction without abandoning the proposed principle. Suppose, for example, that in one dialect of English we did find sentences of the form of (6) of unquestioned acceptability. Would the failure of the deduction then falsify the Wh-Criterion? Not necessarily. The problem might lie with the initial conditions—themselves hypotheses about the nature of grammar. For example, it might be the case that the grammar of this particular dialect contains a covert movement operation that leads to the Wh-Criterion being satisfied, despite appearances to the contrary. Or, as another possibility, one might argue that in this particular case speakers have some particular reason to accept sentences that their grammar does not generate.

Furthermore, we have no theory-independent way of knowing which facts are to be explained by which principles. Suppose we encountered a set of ungrammatical sentences bearing a resemblance to those that are recognized as Wh-Criterion violations, but which the Wh-Criterion (as currently formulated) could not exclude. Would that be sufficient to falsify the (current formulation of) the Wh-Criterion? Not necessarily—perhaps some other, possibly yet unformulated, principle is responsible for excluding them.

An analogous issue came up in the early 1980s. In the context of a critique of a trace-theoretic explanation of the contraction of want to to wanna, Postal and Pullum (1982) pointed out that the proposals in Jaeggli (1980) and Chomsky (1980) failed to explain (among other things) the impossibility of contraction in (7a–b):

(7) a. I want to dance and to sing.
b. *I wanna dance and to sing.

With some irony, they faulted Jaeggli and Chomsky for claiming ‘explanatory success’ when their proposals failed even to describe the set of facts that simple reflection might lead one to believe that they should describe. But in a reply to Postal and Pullum, Aoun and Lightfoot (1984) argued that the set of facts under the purview of a particular theoretical proposal is not given in advance:

[Postal and Pullum] find it theoretically suspicious that trace theory advocates can claim to have achieved explanatory success when in fact their descriptions fail. We would argue that one can explain some facts even if others are left undescribed; it is unreasonable to say that one has no explanation until all facts are described. In order to have an explanation (of greater or lesser depth) one needs to describe the relevant facts. It is important to note that there is no theory-independent way of establishing which facts are relevant. (Aoun and Lightfoot 1984:472)

Given the difficulties inherent in the deductive-nomological mode of explanation, how does one then determine whether a proposal (or, for that matter, an entire theory) is an explanatory one? There is no simple answer to this question. Most philosophers of science today would appeal, not to the successful deduction of a set of facts from a set of premises, but to more intangible matters, such as a theory’s internal consistency, its meeting standards of simplicity and elegance, and above all its ability to contribute to an understanding of the matter at hand.

To illustrate, consider two competing formal treatments of the English auxiliary: an item-and-arrangement phrase structure analysis of the type current in the 1950s and the analysis proposed in Syntactic Structures (Chomsky 1957). The former, by any criterion, is cumbersome and uninsightful. The latter analysis, however, treated the superficially discontinuous auxiliary morphemes have en and be ing as unit constituents generated by (a simplified set of) phrase structure rules, and posited a simple transformational rule to permute the affixal and verbal elements into their surface positions, thus predicting the



basic distribution of auxiliaries in simple declarative sentences. Moreover, Chomsky was able to show that the permutation rule, ‘the Auxiliary Transformation’ (later called ‘Affix Hopping’ by other linguists), interacts with rules forming simple negatives and simple yes–no questions to specify neatly the exact locations where ‘supportive’ do appears. Most grammarians agreed at the time that the Syntactic Structures account was more ‘explanatory’ than the alternative. But why? Certainly not because of its empirical coverage, which was no better than that of any purely phrase-structural account. The analysis was considered more explanatory because it was elegant, simple, and, by virtue of being able to bring new facts into its domain with only a minimum of additional assumptions, seemed to lead to a deeper understanding of English grammar as a whole.

7.2.2 Criticisms of Formal Explanation

Nobody can reasonably reject the idea that some generalizations are stateable in purely grammar-internal terms. For example, the reason for (i.e., the explanation of) the fact that subjects of passive verbs in English occur in the nominative case is the fact that all subjects of tensed verbs occur in the nominative case. In other words, we have provided a formal explanation, and one which surely no functionalist could dispute. One could go on to ask why subjects have a particular case marking, but that, of course, is a different question.

The most common criticism of formal modes of explanation, and one that is rampant throughout the functionalist literature, is that any model employing theory-internal constructs is in principle non-explanatory:

In essence, a formal model is nothing but a restatement of the facts at a tighter level of generalization … There is one thing, however, that a formal model can never do: It cannot explain a single thing … The history of transformational-generative linguistics boils down to nothing but a blatant attempt to represent the formalism as ‘theory,’ to assert that it ‘predicts a range of facts,’ that it ‘makes empirical claims,’ and that it somehow ‘explains’ …. (Givón 1979:5–6; emphasis in original)

Givón, however, confuses two types of models: descriptive (iconic) and theoretical (explanatory). The former indeed are ‘nothing but a restatement of the facts,’ and that is all they are intended to be. Zellig Harris, for example, who endeavored to provide a ‘compact one–one representation of the stock of utterances in the corpus’ (Harris 1951:366), was committed to a descriptive model. But ‘the function of a [theoretical] model is to form the basis of a theory, and a theory is intended to explain some phenomenon’ (Harré 1970:52). I conclude, then, that a formal explanation is, by any criterion, a ‘real’ explanation. The only questions to answer, therefore, are whether formal explanation is necessary and whether it is sufficient. The following section answers the first question in the affirmative, and section 7.3.1 answers the second question in the negative.




7.2.3 Empirical Support for Formal Explanation: The Autonomy of Syntax

If the form of language could be read off the functions of language or off the meanings that it conveys, then there would be no need for formal explanation. But just focusing on the latter, it is clear that such a ‘reading off’ is impossible. To begin, no language has ever been reported that lacks ambiguity or paraphrase. And also, in every language we find a host of lexical idiosyncrasies, as in (8–10):

(8) a. He is likely to be late.
b. *He is probable to be late. (likely, but not probable, allows raising)

(9) a. He allowed the rope to go slack.
b. *He let the rope to go slack. (let doesn’t take the infinitive marker)

(10) a. He isn’t sufficiently tall.
b. *He isn’t enough tall. / He isn’t tall enough. (enough is the only degree modifier that occurs post-adjectivally)

Likewise, in most languages there are many more thematic roles expressible than there are structural means to express them. So a single grammatical relation encodes more than one role:

(11) a. Mary threw the ball. [‘Mary’ is the Agent of the action]
b. Mary saw the play. [‘Mary’ is the Experiencer of an event]
c. Mary received a letter. [‘Mary’ is the goal/recipient of transfer]
d. Mary went from Chicago to Detroit. [‘Mary’ is an object undergoing transfer of position]

Despite their thematic differences, in each sentence of (11), Mary is the grammatical subject. There are also categorial mismatches, that is, cases where the same concept is encoded by different grammatical categories. Consider (12a–b):

(12) a. Bill has [a lot of] friends.
b. Bill has [many] friends.

Whatever one might want to call the quantificational nominal expression a lot of, it clearly does not have the formal properties of the quantifier many. Yet in relevant respects (12a) and (12b) have the same meaning (see Francis 1998 for discussion). And



finally there are displacements—frontings and extrapositions—that break up what are clearly semantic units:

(13) a. [Many objections to the new work rules] were raised.
b. [Many objections] were raised [to the new work rules].

The many–many relationship between form and meaning (and, as we will see, form and discourse function) was the key reason for adopting what Chomsky (1957) called ‘the autonomy of syntax,’ defined as in (14):

(14) Autonomy of Syntax (AS): The rules (principles, constraints, etc.) that determine the combinatorial possibilities of the formal elements of a language make no reference to constructs from meaning, discourse, or language use.

The need for formal explanation is of course logically entailed by AS. There are two parts to motivating AS empirically. To establish its correctness, one must demonstrate that:

(15) a. There exists an extensive set of purely formal generalizations orthogonal to generalizations governing meaning or discourse.
b. These generalizations ‘interlock’ in a system.

[In the original, (15a) is accompanied by a diagram listing independent formal generalizations FoG1, FoG2, FoG3, FoG4, FoG5, etc. (FoG = Formal Generalization), and (15b) by a diagram showing the same generalizations linked into a network.]

Both (15a) and (15b) are crucial. It is not enough just to point to formal generalizations, since all but the most extreme functionalists would agree that they exist. Rather, it is necessary to show that these generalizations are fully intertwined in a system. Support for AS is provided by displaced wh-phrases in English, that is, constructions like those in (16–19):

Wh-constructions in English:
Questions: (16) Whoi did you see ___i?
Relative Clauses: (17) the woman whoi I saw ___i
Free Relatives: (18) I’ll buy what(ever)i you are selling ___i.



Wh (Pseudo) Clefts: (19) Whati John lost ___i was his keys.

In each construction type, a wh-word is displaced from its normal position in phrase structure to the left edge of its clause, creating a dependency between it and a co-indexed gap, as indicated by ‘___i’ in (16–19). Here we find a nice example of a structural generalization that does not map smoothly onto a semantic or discourse-functional one. The structures of these constructions are all essentially the same. Despite the structural parallelism, however, the wh-words in the four constructions play very different semantic and discourse-functional roles. In simple wh-questions (16), sentence-initial position serves to focus the request for a piece of new information, where the entire clause is presupposed except for a single element. Relative clauses (17) have nothing to do with focusing. The fronted phrase is a topic, not a focus, merely repeating the information of the head noun woman. Free relatives (18), by definition, have no head noun. The fronted wh-phrase seems to fulfill the semantic functions of the missing head noun. And pseudo-clefts (19) are different still. The clause in which the wh-word is fronted represents information that the speaker can assume that the hearer is thinking about. But the function of the wh-word itself is not to elicit new information (as is the case with such phrases in questions), but rather to prepare the hearer for the focused (new) information in sentence-final position. In other words, there is a profound mismatch between a formal generalization on the one hand and meaning and function on the other hand.

Occasionally it is claimed that the function of a fronted wh-phrase is to set up an operator-variable configuration or to act as a scope marker. In the simplest cases, that might be true, as in questions like (20):

(20) What did Mary eat? = for what x, Mary ate x

But when the full range of wh-constructions is considered, there is no support for the idea that their semantic role is to encode operator-variable relations. So in (21a) there is no operator-variable configuration corresponding to the gap and its antecedent (which are coindexed with subscript i). Rather, the coreference relation is similar to that of a personal pronoun and its antecedent, not an operator-variable relation, as indicated by the paraphrase in (21b):

(21) a. Hillary Clinton, whoi ten million Americans voted for ___i in the primaries, was not nominated.
b. Although ten million Americans voted for her in the primaries, Hillary Clinton was not nominated.

Or take questions where the wh-​word itself is embedded in the phrase that moves. Which book and To whom in (22) are the phrases that are displaced, though they themselves are not the operators:



(22) a. Which book did you read? = for which x, x is a book, you read x
b. To whom did you write an angry letter? = for which x, x is a person or persons, you sent an angry letter to x

Nor does the fronted wh-element act reliably as a scope marker, as partial Wh-Movement in German and Romani, in which a main clause wh-element occurs within a subordinate clause, illustrates (McDaniel 1989). As the examples indicate, in German and Romani the elements translated as ‘with whom’ and ‘whom’ respectively have scope over the entire sentences, yet they occur in the embedded clauses:

(23) Wasi glaubt [IP Hans [CP mit wem]i [IP Jakob jetzt ti spricht]]]?
what think Hans with whom Jacob now speak
‘With whom does Hans believe Jacob is now talking?’

(24) Soi [[IP o Demìri mislinol [CP [kas]i [IP i Arìfa dikhl’a ti]]
what that Demir think whom the Arif saw
‘Whom does Demir think Arifa saw?’

In other words, wh-constructions exhibit a profound mismatch between a formal generalization and both meaning and function. What is more, there is a significant cross-linguistic disparity when it comes to the analogs of English wh-constructions. For example, many languages have different pronouns for relatives and interrogatives, or no pronouns at all in relative clauses. That fact as well indicates that the morphological similarity shown in English has no compelling discourse-functional motivation.

Wh-constructions support (15b) as well, since the relevant structural generalizations interlock with each other within the broader structural system of English syntax. For example, as is well known, for each wh-construction the wh-word can be indefinitely far from its associated gap:

(25) a. Whoi did you ask Mary to tell John to see ___i? (question)
b. the woman whoi I asked Mary to tell John to see ___i (relative clause)
c. I’ll buy what(ever)i you ask Mary to tell John to sell ___i. (free relative)
d. Whati John is afraid to tell Mary that he lost ___i is his keys. (wh-cleft)

But as (26) shows, wh-words cannot move out of a subordinate clause embedded within a complex noun phrase, a restriction predicted by the formal principle known as Subjacency:1

1 While Subjacency is a constraint on grammars, there seems little doubt that its origins lie in processing demands (see Hawkins 1999, 2004). Some scholars have argued that Subjacency effects are almost entirely processing-based (see Kluender and Kutas 1993; Hofmeister and Sag 2010), though such is not the majority position.



(26) a. *Whoi did you believe the claim that John saw ___i? (question)
b. *The woman whoi I believed the claim that John saw ___i (relative clause)
c. *I’ll buy what(ever)i Mary believes the claim that John is willing to sell ___i. (free relative)
d. *Whati John believes the claim that Mary lost ___i is his keys. (wh-cleft)

Importantly, Subjacency constrains movement operations that involve no (overt) wh-element at all. For example, Subjacency accounts for the ungrammaticality of (27):

(27) *Mary is taller than I believe the claim that Susan is.

Many other syntactic processes interlock structurally with wh-constructions, from Case Assignment to Subject–Auxiliary Inversion. In other words, such constructions provide evidence for the idea of an autonomous structural system, and hence the need for formal explanation.

Consider now an argument for AS based on the cross-linguistic typology of sentential negation. Presumably the semantics of negation is essentially the same from language to language. Yet the morphosyntactic realization of negation varies wildly across languages. For example, in some languages, such as Tongan, negation is encoded by a complement-taking verb. ‘Ikai behaves like a verb in the seem class (we know there is a complement because ke occurs only in embedded clauses):

Tongan (Churchward 1953:56; J. R. Payne 1985:208):

(28) a. Na ’e ‘alu ‘a Siale
ASP go ABSOLUTE Charlie
‘Charlie went’
b. Na ’e ‘ikai [S ke ‘alu ‘a Siale]
ASP NEG ASP go ABSOLUTE Charlie
‘Charlie did not go’

Other languages treat the negative marker as an auxiliary verb. In Ladakhi, a Tibeto-Burman language spoken in India, the auxiliaries yin and yod together mark present continuous tense, but in the negative the latter is replaced by its negative form med. The negative med replaces yod in the negative of all tense–aspect–modality categories where this form of the copula occurs:



Ladakhi (Koshal 1979; Miestamo 2005):

(29) a. pəlldənni ʂpečhə ɖi-yin-yot
Paldan.ERG book.ABS write-AUX-AUX
‘Paldan is writing a book’
b. pəlldənni ʂpečhə ɖi-yin-met
Paldan.ERG book.ABS write-AUX-NEG.AUX
‘Paldan is not writing a book’

In many—possibly most—languages, negation is represented by a derivational affix. This situation can be illustrated with an example from Turkish, where the negative morpheme is one in a series of such affixes:

Turkish (J. R. Payne 1985:227):

(30) acı-n-dır-ıl-ma-dı-k
grieve-REFL-CAUS-PASS-NEG-PAST-1ST PL
‘We were not made to grieve’

In Evenki, formerly known as Tungus, one negative element, a#cin, belongs to the class of nouns. Consider the following:

Evenki (J. R. Payne 1985:228):

(31) a. nuηan a#cin
he absence
‘he is not here’
b. nuηartin a#cir
they absence
‘they are not here’
c. nuηartin a#cir-du-tin
they absence-LOC-their
‘in their absence’

Example (31c) shows that a#cin has a plural form and takes case endings like ordinary nouns. Finally, in English, the word not is an adverb in the same class as never, always, just, and barely (Jackendoff 1972; C.  L. Baker 1991; Ernst 1992; Kim 2000; Newmeyer 2006). To summarize, languages have morphosyntactic systems that are not in lockstep with semantic or discourse properties, as is consistent with AS. It follows, then, that formal explanation is a justified mode of explanation in linguistics.




7.3 Functional Explanation

In this section I begin by outlining some commonly offered types of functional explanations in linguistics (7.3.1). Section 7.3.2 raises an inherent (though not insurmountable) difficulty with this mode of explanation, and 7.3.3 notes that even many formal linguists have put forward, or endorsed, the idea of functional explanation.

7.3.1 Types of Functional Explanations

Advocates of functional explanation typically appeal to either the pressures posed by the exigencies of communication, which presumably have invariable aspects from culture to culture, or to general human cognitive makeup (or to a combination of the two) to explain why something might show up in roughly the same way from language to language. For example, a theme pervading much work in the functionalist tradition is that language structure to a considerable degree has an ‘iconic’ motivation. Roughly, that means that the form, length, complexity, or interrelationship of elements in a linguistic representation reflects the form, length, complexity, or interrelationship of elements in the concept that that representation encodes. For example, it is well known that syntactic units tend also to be conceptual units. Or consider the fact that lexical causatives (e.g., kill) tend to convey a more direct causation than periphrastic causatives (e.g., cause to die). So, where cause and result are formally separated, conceptual distance is greater than when they are not. The grammar of possession also illustrates structure–concept iconicity. Many languages distinguish between alienable possession (e.g., John’s book) and inalienable possession (e.g., John’s heart). In no language is the grammatical distance between the possessor and the possessed greater for inalienable possession than for alienable possession.

Iconicity would seem to have its roots in the fact that comprehension is facilitated when syntactic units are isomorphic to units of meaning. Experimental evidence shows that the semantic interpretation of a sentence proceeds on line as the syntactic constituents are recognized. To illustrate the effects of iconicity by pointing to a concrete cross-linguistic universal, Moravcsik (1980:28) has claimed that if in a language the temporal order of syntactic constituents that describe events symbolizes anything about the actual order in which these events are said to take place, the symbolism is iconic in that the earlier-mentioned event is to be taken as the one that takes place earlier. Indeed, Event-Related Potential (ERP) studies have shown that non-iconic temporal ordering of clauses (e.g., ‘Before X, Y’) is both harder to process and involves different neural activation than iconic ordering (e.g., ‘After X, Y’) (Münte, Schlitz, and Kutas 1998).

Another commonly appealed-to form of functional explanation is based on the self-evident facts that language is used to communicate and that communication involves



the conveying of information. Consequently, it is argued, the nature of information flow should leave—and has left—its mark on grammatical structure. Consider, by way of example, a suggested explanation for why ergative clause patterns are so common cross-linguistically (that is, cases in which the subject of an intransitive verb and the object of a transitive verb receive the same [absolutive] morphological marking). Du Bois (1985, 1987) studied narratives in Sacapultec, an ergative language, and found that clauses with two full noun phrases are quite rare. Typically only one occurs in actual discourse: either the intransitive subject or the transitive object. Furthermore, it is just these argument types that are used to introduce new participants in the discourse. Du Bois suggests that the combination of a verb and absolutive NP is a ‘preferred argument structure’ and concludes: ‘We may thus hypothesize that the grammaticalization of morphological ergativity in Sacapultec is ultimately a crystallization of certain patterns in discourse’ (1985:353). In other words, ergativity is motivated by principles of information flow.

Information-flow-based principles have also been appealed to in order to explain the ordering of the major elements within a clause. So in Russian, the proposition ‘Daddy has brought a Christmas tree’ may be expressed in (at least) the following three ways (Vachek 1966:90):

(32) a. Pápa prinyós yólku. (SUBJ–VERB–OBJ)
b. Yólku prinyós pápa. (OBJ–VERB–SUBJ)
c. Yólku pápa prinyós. (OBJ–SUBJ–VERB)

It has been suggested that speakers of Russian exploit different word order possibilities depending on the relative information content contributed by the subject, verb, and object. A commonly expressed view is that information generally moves from the more thematic to the less, where thematic information is that shared by speaker and hearer and connected in a specific way to the ‘goal’ of the discourse.2 This idea has been expressed by Firbas (1964) in terms of the notion ‘Communicative Dynamism.’3

It has been claimed that the explanatory effects of this principle are also found in languages in which the speaker has little choice in the ordering of elements. Consider English, where the subject must rigidly appear before the verb—a ‘grammatical word order language,’ in the terms of Thompson (1978). Thompson claims that it is just such languages that have a number of what are described in the generative literature as ‘movement transformations’: processes that promote material from postverbal position to signal its functioning as a discourse theme, and that demote subjects to signal their playing the discourse

2 Alongside the opposition thematic vs. non-thematic (or rhematic), we also have old information vs. new information, ground vs. figure, topic vs. comment, and a number of others. These terminological distinctions are used differently by different investigators, and in some approaches coexist (in different senses) in one particular analysis.
3 Jakobson (1963) deemed this principle an iconic one, since the flow of information from old to new matches the flow in time of the speech act.



role of focus. Among the former are Passive, Tough-Movement, and Raising; among the latter are It-Cleft and Wh-Cleft:

(33) a. Tom’s latest theory has just been proven wrong.
b. Linda is fun to speak French with.
c. George is likely to need a ride.

(34) a. It’s Vicki who made the announcement.
b. What Robbie needs is someone to sign with.

In other words, we have a discourse-based explanation for the existence of such ‘transformational’ processes in languages like English.

Other types of functional explanations are rooted in processing efficiency. One of the first ever proposed is Zipf’s Law (Zipf 1935), which states that words occurring frequently in speech will be shorter on average than less frequent items. By way of illustration, note that articles, prepositions, auxiliaries, and conjunctions tend to be shorter than nouns and verbs.
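The frequency–length correlation is easy to check mechanically; the following toy computation uses an invented sample sentence purely for illustration:

# Toy check, on invented data, of Zipf's observation that frequent
# words tend to be shorter than infrequent ones.
from collections import Counter

text = ("the cat sat on the mat and the dog sat on the log "
        "while the philosopher contemplated the unforeseeable")
freq = Counter(text.split())

frequent = [w for w, c in freq.items() if c >= 2]   # repeated words
rare     = [w for w, c in freq.items() if c == 1]   # one-off words

def mean_length(words):
    return sum(len(w) for w in words) / len(words)

print(mean_length(frequent))   # ~2.7 characters
print(mean_length(rare))       # ~6.2 characters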

Along similar lines, there appears to be pressure to allow the hearer to identify the constituents of the sentence and its component phrases as rapidly as possible. The following universal would appear to have precisely that motivation:

(35) If a language has prepositions, then if the demonstrative follows the noun, then the relative clause follows the noun (Hawkins 1983:74).

Given the theory of Hawkins (1994, 2004), a prepositional language in which relatives preceded the noun but demonstratives followed it would create difficulties in terms of recovering the constituent structure, and hence the meaning, of the phrases that manifested that particular correlation.

Consider another example of a robust typological generalization. Hawkins (1983) proposed the following hierarchy:

(36) Prepositional Noun Modifier Hierarchy (PrNMH): If a language is prepositional, then if RelN then GenN, if GenN then AdjN, and if AdjN then DemN.

The PrNMH states that if a language allows long things to intervene between a preposition and its object, then it allows short things. This hierarchy predicts the possibility of prepositional phrases with the structures depicted in (37) (along with an exemplifying language):

(37) a. PP[P NP[___ N…] (Arabic, Thai)
b. PP[P NP[___ N…]; PP[P NP[Dem N…] (Masai, Spanish)
c. PP[P NP[___ N…]; PP[P NP[Dem N…]; PP[P NP[Adj N…] (Greek, Maya)
d. PP[P NP[___ N…]; PP[P NP[Dem N…]; PP[P NP[Adj N…]; PP[P NP[PossP N…] (Maung)
e. PP[P NP[___ N…]; PP[P NP[Dem N…]; PP[P NP[Adj N…]; PP[P NP[PossP N…]; PP[P NP[Rel N…] (Amharic)

The parsing-based explanation of the hierarchy is straightforward. The longer the distance between the P and the N in a structure like (38), the longer it takes to recognize all the constituents of the PP. Given the idea that grammars try to reduce the recognition time, the hierarchy follows.

(38) [PP P [NP X N]], where X is the prenominal material intervening between P and the head noun N
Since relative clauses tend to be longer than possessive phrases, which tend to be longer than adjectives, which tend to be longer than demonstratives, which are always longer than ‘silence,’ the hierarchy is predicted on parsing grounds.
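The logic can be illustrated with a simplified version of the recognition-time idea (a rough sketch only, not Hawkins’ actual Early Immediate Constituents metric; the modifier lengths are invented averages):

# Toy sketch in the spirit of Hawkins' parsing account: count the words
# a hearer must process, starting at P, before reaching the noun N that
# signals the NP's category, for each prenominal modifier type.

MODIFIER_LENGTH = {'Dem': 1, 'Adj': 1, 'Poss': 2, 'Rel': 5}  # invented averages

def recognition_span(modifiers):
    """Words from P through N in PP[P NP[modifiers... N]]."""
    return 1 + sum(MODIFIER_LENGTH[m] for m in modifiers) + 1  # P + mods + N

for mods in ([], ['Dem'], ['Adj'], ['Poss'], ['Rel']):
    print(mods or 'bare N', recognition_span(mods))
# Output: 2, 3, 3, 4, 7 -- longer prenominal material delays recognition
# of the PP's structure, so a grammar tolerating [P Rel N] (the worst
# case) should also tolerate all the shorter cases, as (37) predicts.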

7.3.2 Difficulties with Functional Explanation

As we will see in the following section, even some of the most ardent formal linguists see a place for functional explanation. Nevertheless, a significant percentage of linguists—for the most part formalists of one stripe or another—harbor the suspicion that intuitively plausible functional explanations are so abundant, and so free for the taking when appeal to one or another might prove useful, that there is little content to the claim that ‘Phenomenon P is explained by Functional Explanation FuE.’

To give a concrete example, consider again Communicative Dynamism, discussed in section 7.3.1. As it turns out, the Nilotic languages systematically violate it (D. L. Payne 1990:11). Other languages, such as Ojibwa (Tomlin and Rhodes 1979), Ute (Givón 1983), O’odham (Papago) (D. L. Payne 1987), and Cayuga (Mithun 1987), have more nonthematic-before-thematic structures than thematic-before-nonthematic structures. And the great majority of languages allow at least some structures of the former type. For example, in English and (I believe) most other languages, focused elements can occur in fronted position before topics. So if one hears a knocking at the door and has identified the caller, one would be more likely to say (39a) than (39b):

(39) a. Your brother is at the door.
b. At the door is your brother.



Why should this be if Communicative Dynamism is a force shaping syntactic structure? Talmy Givón suggests a countervailing force at work, namely one that he calls ‘Communicative Task Urgency,’ which has the following effects:

Given the preceding discourse context, less predictable information is fronted. … Given the thematic organization of the discourse, more important information is fronted. (Givón 1988:275)

According to Givón, Communicative Task Urgency also has a functional motivation, namely the natural human inclination to take care of more important things before less important things. In other words, we have two forces shaping grammars—both eminently plausible in and of themselves—that are diametrically opposed to each other. In the absence of a theory that explains the ‘balance of power’ between Communicative Dynamism and Communicative Task Urgency, that is, when and why one would exert its influence and not the other, appeal to either is all but vacuous. As noted in Newmeyer (1998:ch. 3), this situation is unfortunately the norm; most proffered external explanations have a counterpart whose effects work in the opposite direction.4

To be fair, most functionalists are well aware of the problem of potential vacuity caused by conflicting functional explanations. And more than a few have attempted to refine their analyses specifically with the view in mind of eliminating the danger of vacuity. So consider Du Bois’s account of ergativity, referred to in section 7.3.1. The obvious question to raise is why, given that the alignment of subjects of intransitives with objects of transitives is motivated by preferred information flow, all languages are not ergative. Du Bois appeals to a principle that competes with preferred information flow. This is the predisposition to mark cognitively similar entities (in this instance, human agentive topics) the same way (i.e., with nominative case) over long stretches of discourse. Du Bois claims to have avoided vacuity on the basis of the behavior of split-ergative languages: when there is a person split, 1st and 2nd person are nominative, and 3rd person ergative. Since 1st and 2nd person are never new participants in a discourse, but are always ‘given’ (Chafe 1976), his theory predicts that when ergativity is split, they will be marked nominatively, not ergatively. Thus the theory satisfies the goal of limiting language types: a split-ergative system with ergative 1st person and nominative 3rd person will never occur.

4  Optimality Theory has attempted to deal with the problem of opposing forces at work, though not successfully, as argued in Newmeyer (2005:ch. 5). One reason to prefer processing-​based explanations to other types is that there do not seem to exist alternatives with ‘equal and opposite’ effects (see Newmeyer 1998 for discussion). For interesting attempts to provide a non-​vacuous processing-​based account of the relative weight of Communicative Dynamism and Communicative Task Urgency, see Hawkins (1994:ch. 4) and Arnold, Wasow, Losongco, and Ginstrom (2000).




7.3.3 Formal Linguists and Functional Explanation

Very few formal linguists reject tout court the idea of functional explanation. The following decades-old quote from Paul Postal represents such a rejection, but I feel that the sentiment expressed, namely that pure fashion drives language change (and hence the properties of linguistic structures), represents only a tiny minority of formalist thinking:

There is no more reason for languages to change than there is for automobiles to add fins one year and remove them the next, for jackets to have three buttons one year and two the next, etc. (Postal 1968:283)

While Chomsky, needless to say, has never elaborated any particular functionalist theory, from his very earliest days he has shown himself amenable to the idea of functional explanation.5 In a reply to Searle (1972/1974), Chomsky (1975) wrote that ‘there are significant connections between structure and function’ (56), and to Searle’s contention that ‘it is reasonable to suppose that the needs of communication influenced [language] structure,’ he responded: ‘I agree’ (58). While he went on to express skepticism that any comprehensive external explanation of the formal properties of language is likely to be forthcoming, and pointed to a fundamental linguistic principle (the structure-dependency of rules) that would appear to have no grounding in communicative function, he is absolutely clear that there is no principled reason that aspects of the language faculty might not have been shaped to meet some external need.

Chomsky’s Lectures on Government and Binding actually proposed several function-based principles. For example, it noted that given the appropriate context, sentences like Johni said that Johni would win are acceptable, and suggests that under certain circumstances Principle C of the binding theory might be overridden by the ‘general discourse principle … Avoid repetition of R-expressions, except when conditions warrant’ (Chomsky 1981a:227). Chomsky went on to remark that this principle:

… quite possibly fall[s] together with the Avoid Pronoun principle, the principles governing gapping, and various left-to-right precedence conditions as principles that interact with grammar but do not strictly speaking constitute part of a distinct language faculty, or at least, are specific realizations in the language faculty of much more general principles involving ‘least effort’6 and temporal sequence. (Chomsky 1981a:227)

5 The psycholinguist Thomas Bever has long championed the idea of the compatibility of formal and functional explanations (see, i.a., Bever 1970, 1975, 2003).
6 Least effort-based explanations were to play an important role in the early development of the Minimalist Program (Chomsky 1995b), as is attested by principles such as Full Interpretation and the Last Resort condition on movement.



In the past decade, Chomsky, in the context of minimalism, has proposed a number of functional explanations for properties of grammars. For example, consider his explanation for the existence of dislocation. He distinguished between ‘deep semantics’ (thematic relations among elements) and ‘surface semantics’ (information-structure properties such as ‘topic’ and ‘focus’) and wrote that:

if surface semantics were signaled by inflection [like deep semantics is—FJN], the underlying morphological system would be complicated. For elements with Inherent Case, there would be double inflection if they have distinct surface-semantic properties; for elements lacking Inherent Case, they would have inflection only in this case. In contrast, if surface properties are signaled configurationally, at the edge, the morphological system is uniform throughout: a single Case inflection always (whether manifested phonologically or not). (Chomsky 2002:122)

In other words, dislocation exists for the purpose of simplifying the identification of the meaning-bearing elements of the clause. Terminology aside, such an account differs only in nuance from analogous attempts to explain the existence of frontings and postposings that were put forward by Prague School linguists more than a half century ago.

In a number of recent publications, Chomsky has called attention to three factors that enter into the growth of language in the individual:

1. Genetic endowment, apparently nearly uniform for the species … ; 2. Experience, which leads to variation, within a fairly narrow range … ; and 3. Principles not specific to the faculty of language. (Chomsky 2005:6)

The ‘third factor,’ needless to say, encompasses functional explanation (see chapter 6) and, indeed, Chomsky has appealed to the third factor in putting forward a functional explanation (different from the one cited in his 2002 book) for displacement. So why are items ‘commonly pronounced in one position but interpreted somewhere else as well’? Because:

[f]ailure to pronounce all but one occurrence follows from third factor considerations of efficient computation, since it reduces the burden of repeated application of the rules that transform internal structures to phonetic form—a heavy burden when we consider real cases. (Chomsky 2007b:21–22)

The new openness to functional explanation among formal linguists is perhaps best exemplified by the following quote from Cedric Boeckx, one of the most ardent defenders of the Minimalist Program (see also chapter 15 on the connections between Greenbergian typology and functionalist explanations):

It is worth noting that Greenberg’s universals are really surfacing properties of language that typically can be explained in functionalist terms and allow for a variety of exceptions. (Boeckx 2009b:196)




7.4 The Compatibility of Formal and Functional Explanation

As we have just seen, formalists do not generally reject the need for functional explanation as a complement to formal explanation. But the converse is not the case. A persistent view among functionalists is that once one characterizes a system as discrete and algebraic, as in formal grammar, one gives up any hope of explaining any aspect of that system functionally. The following quote from two leading functionalists is typical: The autonomy of syntax cuts off [sentence structure] from the pressures of communicative function. In the [formalist] vision, language is pure and autonomous, unconstrained and unshaped by purpose or function. (Bates and MacWhinney 1989:5)

Bates and MacWhinney's view is mistaken, however. Actually, it seems to be only linguists who have this odd idea. In every other discipline that I am aware of, formal and functional accounts are taken to be complementary, not contradictory. Take for example a formal system like the game of chess. There exists a finite number of discrete rules. Given the layout of the board, the pieces, and how the pieces move, one can 'generate' every possible game of chess. But functional considerations went into the design of the system, namely to make it a pleasurable pastime. And external factors can change the system. In fact, the rules of chess have changed over the centuries. And they can change again if the international chess authority decides that they should. Furthermore, in any game of chess, the moves are subject to the conscious will of the players, just as in the act of speaking, there is the conscious decision of the speaker to say something. In other words, chess is a formal system, and yet it is explained functionally. If chess can be that way, then why not language? Or consider a biological example. Take any bodily organ, say, the liver. The liver can be described as an autonomous structural system, but, still, it has been shaped by its function and its use. The liver evolved in response to selective pressure for a more efficient role in digestion. And the liver can be affected by external factors. A lifetime of heavy drinking of alcohol can shape the structure of the liver for the worse. So the liver, again, is an autonomous structural system, but one that has been shaped functionally. I therefore see no merit in pointing to the need for functional explanation as a wedge against the very idea of formal explanation. Even granting that both formal and functional explanations are needed, we are still left, for many particular cases, with the question of the 'balance of power' between the two. In particular, there are numerous cases in which the linguist appears to be forced to choose between one and the other. Consider a concrete example involving head–complement order. Clearly at one level a formal account is necessary: formal principles irreplaceable by functional principles specify, say, that English is head-initial and that Japanese is head-final.



But what tells us that, in general, if a language is verb–object, then it tends to be preposition–object, complementizer–clause, and so forth, and that if a language is object–verb, then it tends to be object–postposition, clause–complementizer, et cetera (Dryer 1992)? Here the classical formal and functional explanations differ. The former have generally involved a 'head parameter' provided by UG, which aligns heads and complements consistently across languages (Koopman 1984; Travis 1989; and, more recently, Baker 2001; Roberts 2007; see also chapter 14, section 14.3.1). There are a number of functionalist explanations for this phenomenon, but the most successful point to the parsing advantages that result from keeping complements consistently to the left or consistently to the right of their heads (Hawkins 1994, 2004). Suppose that the parameter-based approach and the parsing preference model made identical predictions. Which should we choose? One reasonable answer is 'both.' That is, it might turn out that at a certain level, both theories are correct. In such a scenario, our mental grammars encode parameters for directionality of Case and theta-role assignment. These parameters might be analyzed as grammaticalizations of the parsing principle. In fact, Hawkins hints that something along these lines might be right: Languages of all these types [discussed by Travis and Koopman] clearly exist, and grammars need to be written for them, possibly in [their] terms. (Hawkins 1994:109)

However, as he goes on to stress: Generative rule types and principles such as directionality of Case- and theta-role assignment may still be useful at the level of describing particular languages … and in predicting all and only the sentences in the relevant domain, but they are no longer a part of UG itself. (Hawkins 1997:18)7

There are numerous cases, however, where the formal explanation and the functional explanation are irrevocably in conflict. Consider, for example, what is known as the 'definiteness effect' in English sentences with existential there. An old observation is that the definite article normally sounds very strange in such sentences, as in (40a):

(40) a. ?There's the dog running loose somewhere in the neighborhood.
     b. There were the same people at both conferences.

Syntacticians have generally opted for a purely syntactic explanation for the oddness of the definite article (Milsark 1977; Safir 1985; Reuland 1985). However, Ward and Birner (1995) have argued, convincingly in my opinion, that the correct explanation is functional, since it is based on the pragmatics of the elements involved. As they argue, NPs in such sentences are required to represent a hearer-new entity. Hence (40b) is impeccable.
7 Cedric Boeckx agrees, writing that 'parametric clusters [are] tendencies, not to be accounted for in terms of UG principles' (Boeckx 2011a:216). For related views challenging the classic view of parameters, see Newmeyer (2005) and Hornstein (2009). See also chapters 14 and 16.



Many linguists who have no opposition to formal explanations in general have taken syntacticians to task for proposing such explanations when a functional explanation would be more empirically motivated (see, for example, Kuno 1987; Prince 1991; and Kuno and Takami 1993). I close this section with an interesting example from Lightfoot (1999) illustrating how difficult it can be to tease out the relative roles of formal and functional explanation in a particular situation. In a nutshell, Lightfoot demonstrates that a formal constraint, whose ultimate explanation is most likely functional, can nevertheless have dysfunctional consequences, leading grammars to resort to formal means (but different means for different languages) to overcome these consequences. Consider the principle of Lexical Government (41), a key element of the theory of Chomsky (1981a):

(41) Lexical Government: Traces of movement must be lexically governed.

This condition does a lot of work—for example, it accounts for the grammaticality distinction between (42a) and (42b):

(42) a. Whoi was it apparent [ei that [Kay saw ei]]?
     b. *Whoi was it apparent yesterday [ei that [Kay saw ei]]?

In (42b) the word yesterday blocks government of the intermediate trace (in boldface) by the adjective apparent. Or consider phrase (43):

(43) Jay's picture

(43) is at least three-ways ambiguous: Jay could be the owner of the picture, the agent of the production of the picture, or the person portrayed (the object reading). The derivation of the object reading is depicted in (44):

(44) [Jayi's [picture ei]]

Notice that the trace is governed by the noun picture. Now consider phrase (45):

(45) the picture of Jay's

(45) has the owner and agent readings, but not the object reading. That is, Jay cannot be the person depicted. The derivation, schematically illustrated in (46), explains why:

(46) *the picture of [Jayi's [e ei]]

The trace of Jay's is not lexically governed; rather it is governed by another empty element, understood as 'picture.'



Lightfoot is quite open to the possibility that condition (41) might be functionally motivated: … the general condition of movement traces … may well be functionally motivated, possibly by parsing considerations. In parsing utterances, one needs to analyze the positions from which displaced elements have moved, traces. The UG condition discussed restricts traces to certain well-defined positions, and that presumably facilitates parsing. (Lightfoot 1999:249)

However, he goes on to show that this condition—functionally motivated though it may be—has dysfunctional consequences. The problem is that it blocks the straightforward extraction of subjects:

(47) a. *Whoi do you think [ei that ei saw Fay]?
     b. *Whoi do you wonder [ei how [ei solved the problem]]?

Sentences (47a–b) are ungrammatical because the boldfaced subject traces are not lexically governed. Indeed, in the typical case, subjects will not be lexically governed. Nevertheless, it is safe to assume that it is in the interest of language users to question subjects, just as much as objects or elements in any other syntactic position. In other words, the lexical government condition is in part dysfunctional. Interestingly, languages have devised various ways of getting around the negative effects of the condition. They are listed in (48a–c):

(48) Strategies for undoing the damage caused by the lexical government condition:
     a. Adjust the complementizer to license the extraction.
     b. Use a resumptive pronoun in the extraction site.
     c. Move the subject first to a nonsubject position and then extract.

English uses strategy (48a):

(49) Who do you think saw Fay?

Swedish uses strategy (48b):

(50) Vilket ordi visste ingen [hur det/*ei stavas]?
     which word knew no one how it/e is-spelled
     'Which word did no one know how it is spelled?'

The resumptive pronoun det replaces the trace, so there is no lexically ungoverned trace to violate the lexical government condition. And Italian uses the third strategy (48c). In Italian, subjects can occur to the right of the verb, and this is the position from which they are extracted (as in (51)):

(51) Chii credi [che abbia telefonato ei]?
     who do-you-think that has telephoned
     'Who do you think has telephoned?'

What we have here, in other words, are functionally motivated formal patches for dysfunctional side effects of a formal principle that is itself functionally motivated. I suspect that this sort of complex interplay between formal and functional modes of explanation is the norm, rather than the exception, in syntactic theory.

7.5 Conclusion

An explanation in linguistics is considered 'formal' if it derives properties of language structure from a set of principles formulated in a vocabulary of nonsemantic structural primitives, and 'functional' if it derives properties of language structure from human attributes that are not specific to language. Both modes of explanation have their inherent strengths and weaknesses. The desirability of formal explanation is entailed by the empirical correctness of the hypothesis of the Autonomy of Syntax. Its principal drawback is that any formal account relies on a web of underlying hypotheses that are themselves often highly tentative. There is also considerable empirical evidence that aspects of linguistic structure are functionally motivated. The principal difficulty with functional explanation is that the over-availability of plausible external motivations, each potentially pulling in a different direction, can lead to the vacuity of any particular functional explanation. Nevertheless, it seems that both formal and functional explanations have their role to play in linguistic theory.



Chapter 8

Phonology in Universal Grammar

Brett Miller, Neil Myler, and Bert Vaux

8.1 Introduction: Definitions and Scope

In order to investigate the phonological component of Universal Grammar (UG), we must first clarify what exactly the concept of UG involves.1 The terms 'Universal Grammar' and 'Language Acquisition Device' (LAD) are often treated as synonymous,2 but we believe that it is important to distinguish between the two. We take a grammar to be a computational system that transduces conceptual-intentional representations into linear (but multidimensional) strings of symbols to be interpreted by the various physical systems employed to externalize linguistic messages. It thus includes the traditional syntactic, morphological, and phonological components, but not phonetics, which converts the categorical symbols output by the grammar into gradient representations implementable by the body. Bearing the above definition of 'grammar' in mind, we take 'Universal Grammar' to refer specifically to the initial state of this computational system that all normal humans bring to the task of learning their first language (cf. Hale and Reiss 2008:2; and chapters 5, 10, and 12). The phonological component of this initial state may contain, inter alia, rules (the 'processes' of Natural Phonology; Stampe 1979), violable constraints (as in Calabrese's 1988, 1995 marking statements or Optimality Theory (OT)'s markedness and



faithfulness constraints [Prince and Smolensky 1993/2004]), a set of parameters (e.g., Dresher's 2009 feature tree), or nothing at all (Chomsky and Halle 1968; Reiss 2008; what these call Universal Grammar falls under our definition of the LAD, to be set out in the next paragraph). Issues of interest in the phonological literature relevant to this conception of UG include (in OT) the contents of the universal constraint set CON, (in Natural Phonology) the set of natural processes such as coda devoicing, (in Calabrese's 1988/1995 model) the inventory and ranking of marking statements, and so on. We take the Language Acquisition Device, on the other hand, to be the complex of components of the mind/brain involved in hypothesizing grammar+lexicon pairs upon exposure to primary linguistic data (PLD), the subset of sensory percepts deemed by learners to be valid tokens of the language they are attempting to learn. This conception of the LAD includes (limiting ourselves for present purposes to items of phonological interest) the mechanisms that determine what aspects of sensory input are linguistic, the system(s) that transduce sensory input into linguistic symbols (distinctive features, prosodic units, linear precedence encoders), and the inventories of symbols, relations, and operators which these systems manipulate. The latter three inventories are commonly assumed to include a universal set of distinctive features, a set- or tree-theoretic hierarchy of relations between these features (feature geometry; Clements 1985), a universal set of logical operators such as ∀, ∃, and ¬ that can take scope over these elements (Reiss 2003; Fitzpatrick, Nevins, and Vaux 2004), and (in Rule-Based Phonology) operators encoding the notions SPREAD, DELINK, and INSERT (cf. McCarthy 1988) or (in OT) GEN (the function that generates a potentially infinite set of candidate outputs from a given input via an unspecified set of operations) and EVAL (the function that assigns violation marks to output candidates and selects winners). Issues of interest in the phonological literature relevant to this conception of the LAD include learning biases that constrain the transparency (Kiparsky 1973), granularity (Pierrehumbert 2001), extension (Hale and Reiss 2008; Ross 2011), and ordering (Koutsoudas, Sanders, and Noll 1974; Kiparsky 1985; inter alia) of phonological hypotheses. Also of interest are the inventories and internal organization of phonological symbols, the range of possibilities for combining these symbols, the ontology of recurrent constraints such as the Obligatory Contour Principle (Leben 1973), the Derived Environment Constraint (Kiparsky 1993), the Sonority Sequencing Principle (Clements 1990), and Final Consonant Extraprosodicity (Itō 1986), the mechanisms used to construct underlying representations (Vaux 2005; Nevins and Vaux 2008), and so on. Both UG and the LAD understood in the senses just described present a wide range of questions worthy of phonological investigation and debate. Since the semantic domain of the term 'UG' in the literature typically encompasses both UG and the LAD as defined earlier in this section (e.g., White 2003:xi, 2), and since what counts as part of UG vs.
the LAD differs depending on the linguistic architecture one assumes (e.g., the putative final voicing gap investigated in section 8.4.2 would be attributed by OT to mechanisms in UG, but in the LAD in classic Rule-Based Phonology [RBP; Kenstowicz 1994]), we touch on aspects of both in this chapter.

1 Authors are listed in alphabetical order. Thanks to James Clackson, Ahrong Lee, Dave Odden, Ian Roberts, Bridget Samuels, Oktor Skjærvø, Patrick Taylor, and Jeffrey Watumull for helpful comments on aspects of this chapter.
2 Compare for instance Chomsky's (1968, 1972, 2006) characterization of the LAD with his 2007 definition of UG as 'the theory of the initial state of FL [the faculty of language],' which encompasses the senses of both UG and the LAD defined in this section.



Research involving UG is typically cast in terms of whether or not such an entity exists (see, e.g., Cowie 1999; and the response by Fodor 2001), but we consider this to be a non-question, as learners demonstrably must bring a nontrivial array of prior knowledge to the learning task if anything at all is to be learned (cf. Yang 2004:451)—to put the matter baldly, an actual blank slate learns nothing whatsoever upon exposure to PLD. In practice, the questions being debated between the nativists and empiricists typically have to do with the exact nature and extent of this prior knowledge:

i. Are any of the mechanisms involved in language acquisition particular to the language faculty?
ii. Are any or all of these co-opted from other biological and cognitive systems?
iii. If so, which ones?

Debates within the nativist camp, on the other hand, center on questions of the substantive content of UG and the LAD: Does UG contain a Sonority Sequencing Principle, and is it violable or inviolable? Does UG provide an invariant initial ranking of markedness constraints (Davidson, Jusczyk, and Smolensky 2004)? Can phonology count (Paster 2012)? And so on. The literature in this area is so enormous that it would be futile to attempt a comprehensive review of it here. For this reason we concentrate on illustrative recent contributions to this debate and organize our discussion around specific themes of current interest rather than taking a historical view of the field. Much of this recent work bears directly on questions (i) and (ii) above, and specifically on whether phonology constitutes a counterexample to the idea, associated especially with Hauser, Chomsky, and Fitch (2002), that the species-specific component of the human language faculty is rather small, perhaps consisting only of recursion (see Pinker and Jackendoff 2005 for the argument that it is such a counterexample).

8.2 Rhetorical Typology and Practice

Arguments for and against UG and the LAD and their putative components tend to take a limited number of rhetorical forms in phonological contexts. Supporters of UG/LAD typically invoke some combination of cross-linguistic recurrence (e.g., with respect to natural processes in Natural Phonology, or constraints in OT) and gaps (see Odden 2005 on retroflexes, or Kiparsky 2006 on final voicing, which we examine in detail in section 8.4). Proponents of phonological elements in UG have also cited economy considerations; for example, Kenstowicz and Kisseberth (1979) and many others have argued that positing syllables as part of the repertoire enables simpler analyses of recurrent cross-linguistic patterns. Learnability considerations have been invoked as well, as with the suggestion that an OT grammar is more easily acquirable because learners need only respond to PLD by ranking a set of constraints provided by UG, rather than constructing rules from scratch as is required in classic RBP (Tesar and Smolensky 1998).



Adherents of OT commonly maintain that Emergence of the Unmarked effects provide evidence for specific universal constraints not motivated by the PLD (McCarthy and Prince 1994), a variant of the poverty of the stimulus argument (see also chapter 10). Along similar lines, phonologists working within an RBP framework have suggested that the spontaneous appearance of rules or constraints in first and second language acquisition when unmotivated by evidence from the first or second language may provide evidence for phonological elements of UG such as the Derived Environment Constraint (see Eckman, El Reyes, and Iverson 2001 on the acquisition of English by Korean and Spanish speakers) and Identity Avoidance (see Vaux and Nevins 2007 on nanovariation in English schm-reduplication). On the opposite side of the fence, scholars such as Hale and Reiss (2008) have argued that we should not be too quick to attribute cross-linguistic synchronic and diachronic patterns and restrictions to the invisible hand of UG; these may be equally well attributed to independently motivated properties of the human perception and production systems whose activities extend beyond language per se. Others have added that learning biases implicated in phonological patterns may surface in other animals as well (e.g., Kuhl and Miller 1975 on voice onset time in chinchillas), undermining UG for those who limit its contents to elements specific to humans (see chapter 21). It is also possible to mount a plausible argument that at least some typological patterns claimed in the literature are illusory and that the actual facts of language are more consistent with a model in which, for example, features are constructed upon exposure to PLD rather than predetermined by UG (Mielke 2004). Finally, Flemming (2005) and Samuels (2011) have pointed out that UG postulates such as syllables do not always allow for more economical formulations of phonological generalizations, and it is not clear that economy is a valid criterion for deciding membership in UG in the first place. In order to clarify how exactly the types of argumentation just reviewed are put into practice in UG debates, let us consider in more detail a typical nativist argument that employs the recurrence and gaps arguments: one of the most significant discoveries of 20th-century phonology is that the seemingly endless variation in sound patterns in the world's languages is not entirely random. Rather, some phonological patterns and processes are common in many unrelated language families, whereas other imaginable ones are rare or not attested at all. Processes that simultaneously target aspiration and voicing are commonplace, for instance—as in Thai final laryngeal neutralization (Clements 1985:235)—whereas processes simultaneously targeting aspiration and rounding are not.
The same asymmetry holds for repairs of ill-formed phonological configurations; it is often observed, for example, that aspiration and voicing specifications in syllable codas are typically repaired via devoicing and deaspiration, but rarely3



via epenthesis (Lombardi 2001; part of the larger Too Many Solutions Problem; Kager 1999; Steriade 2008). Such striking facts demand an explanation if valid, and many linguists, especially in the generative tradition, have argued that UG should bear the burden of this explanation. This viewpoint cuts across even deep theoretical divides in the generative literature, being manifest in Rule-Based Phonology from SPE onwards (see especially Chomsky and Halle 1968:ch. 9; Kenstowicz and Kisseberth 1979:251), and a key tenet of Natural Phonology (Stampe 1979; Donegan and Stampe 1979) and Classic OT (Prince and Smolensky 1993/2004; McCarthy 2002, 2008). The following statement from Kenstowicz's standard textbook on Rule-Based Phonology encapsulates this position: There are many recurrent aspects of phonological structure of a highly specific and rich character whose acquisition cannot be explained on the basis of analogy or stimulus generalization in any useful sense of these terms. These properties are also most naturally explained as reflections of UG. (1994:2)

3 Lombardi (2001) and many subsequent publications in Optimality Theory assert that deletion and epenthesis are never recruited to repair violations of constraints on laryngeal specifications in syllable codas, but deletion is attested in Chinese L2 English (Anderson 1983; Xu 2004) and epenthesis in the L2 English of speakers of Brazilian Portuguese (Major 1987), Chinese (Eckman 1981; Xu 2004), Korean (Kang 2003; Iverson and Lee 2006), and Vietnamese (Nguyen and Ingram 2004), as well as in first language acquisition (Major 2001). See Flynn (2007) for further examples and discussion.

An alternative view argues that such recurrent aspects of phonological structures can be explained by the interplay of phonetic constraints on articulation and perception and historical change (perhaps in conjunction with general cognitive constraints on learnability). Common phonological patterns and processes reflect common sound changes, and the frequency of a given sound change will depend on articulatory and perceptual factors. This viewpoint is particularly prominent in the work of John Ohala (e.g., Ohala 1971, 1972, 1975, 1981, 2005; Ohala and Lorentz 1977; Chang, Plauché, and Ohala 2001). It forms the basis of Evolutionary Phonology as pursued by Juliette Blevins in recent work, and has also been adopted by some generativists (Hale and Reiss 2000a; Pycha et al. 2003; Vaux and Samuels 2004; Blevins 2004; etc.). As emphasized by Vaux (2003) and Samuels (2011), such a view takes much of the explanatory burden of phonology away from UG and passes it on to phonetics, the language acquisition device, and diachronic linguistics. Samuels (2011) and Samuels, Hauser, and Boeckx (this volume, chapter 21), taking this view to its logical end point, argue that this may even allow the formal theory of phonology to be reduced to a small number of primitive operations and categories with counterparts in the cognitive capacities of other species. The rest of this chapter examines the state of the UG debate with respect to two specific areas in phonological theory where the arguments made are representative of what we find in phonological UG discussions as a whole. Section 8.3 considers the question of whether UG should account for typological generalizations—that is, whether there should be a direct match between the phenomena that the theory predicts to exist and those that are attested in the languages of the world. Section 8.4 reviews the arguments for and against incorporating a notion of segmental and processual markedness into UG, as argued for explicitly in recent Optimality-Theoretic work by de Lacy (2002, 2006a, 2006b); Kiparsky (2006, 2008); and de Lacy and Kingston (2013). The evidence and arguments reviewed lead us to conclude that there must be a phonological component of the LAD and perhaps UG as well (in the senses defined in



section 8.1), but that the evidence claimed for specific components of UG in particular is equivocal at best.

8.3 Constraining the Typological Space with UG

The desire that the theory of Universal Grammar should predict typological generalizations is present in SPE (see Chomsky and Halle 1968:4–5) and appears to have been a driving force in the move towards autosegmental phonology (begun in Goldsmith 1976) in the 1970s and '80s (witness McCarthy's 1988:84 hope that 'if the representations are right, then the rules will follow'). More recently, one of the most salient properties of Optimality Theory (Prince and Smolensky 1993/2004) is that typological predictions emerge automatically from the logically possible permutations of rankable constraints. Since the number of possible rankings of a given OT constraint set is the number of constraints factorial (with the six constraints to be discussed below, for example, there are 6! = 720 possible rankings), this property of the model is known as Factorial Typology. Factorial Typology is frequently cited by proponents of OT as a major argument in its favor, as when Féry and Fanselow (2002) state that OT 'offers a restrictive theory of linguistic variation: differences between languages can arise only as different rankings of universal principles in different languages' (see also Pater 1996 for similar claims). As an illustration of the workings of Factorial Typology and the use of UG machinery to account for typological generalizations, we take Pater's (1996) work on clusters of a nasal followed by a voiceless stop (which we shall notate as NT clusters henceforth).4 Pater notes that such sequences are disallowed in many languages, and where they would be created by affixation, phonological processes often eliminate the sequence. The processes that destroy such clusters vary in striking ways from language to language. For example, in many Austronesian languages, the voiceless stop disappears and the preceding nasal appears to assimilate to the place of the deleted stop (this phenomenon is termed 'Nasal Substitution'; to simplify the exposition, we omit reference to Pater's discussion of root-faithfulness, by which in some languages NT clusters are retained in root morphemes but are eliminated at boundaries between affixes).

(1) Indonesian (Pater 1996:2)
    /məN+pilih/ → [məmilih] 'to choose, to vote'
    /məN+tulis/ → [mənulis] 'to write'
    /məN+kasih/ → [məŋasih] 'to give'

4  We employ this case because it is perhaps the one most often invoked by proponents of Optimality Theory, but note that extensive and serious empirical and conceptual problems with Pater’s line of reasoning have been pointed out by Blust (2004).



In Mandar, on the other hand, nasals lose their nasality when they precede a voiceless stop:

(2) Mandar (Pater 1996:16)
    /maN+dundu/ → [mandundu] 'to drink'
    /maN+tunu/ → [mattunu] 'to burn'

Still another process is exemplified by Puyo Pungo Quechua, in which NT clusters created by affixation are eliminated via voicing of the stop:

(3) Puyo Pungo Quechua (Pater 1996:21; glosses corrected by NM)
    sinik-pa 'porcupine's'     kam-ba 'yours'
    saĉa-pi 'in the jungle'    hatum-bi 'in the big one'
    wasi-ta 'the house-ACC'    wakin-da 'the others-ACC'

Pater reports that Kelantan Malay exhibits a fourth option, deletion of the nasal and retention of the voiceless stop. Pater uses the typological fact that NT clusters are often subject to phonological processes, alongside phonetic and acquisitional data concerning these clusters, as evidence that they are marked. In OT terms, this means that there is a constraint in UG that assigns violation marks to NT clusters. Pater calls this constraint *NT:

(4) *NT (Pater 1996:5)
    No nasal/voiceless obstruent sequences.

This constraint conflicts with several different faithfulness constraints, each of which favors in different ways the preservation of underlying NT in the surface form. Pater analyzes Nasal Substitution as the merger of the stop and nasal, with the resulting segment retaining the place features of the stop and the [nasal] feature of the underlying nasal. Such merger violates a constraint which favors the maintenance of underlying ordering relationships amongst segments. This constraint is known as Linearity:

(5) Linearity (Pater 1996:9)
    S1 reflects the precedence structure of S2 and vice versa.

Denasalization, as seen in Mandar, violates a constraint that favors faithfulness to underlying nasality:

(6) Ident-IO[nasal] (adapted from Pater 1996:17)
    Any output correspondent of an input segment specified as [nasal] must be [nasal].



A faithfulness constraint which protects underlying voicing specifications in obstruents militates against postnasal voicing, which is seen in Puyo Pungo Quechua:

(7) Ident[obsvce] (Pater 1996:22)
    Correspondent obstruents are identical in their specification for [voice].

Deletion of the nasal, seen in Kelantan Malay, is punished by the Max constraint in (8).

(8) Max (adapted from McCarthy 2008:196)
    Assign a violation mark for every segment in the input that has no correspondent in the output.

Pater shows that different permutations of these constraints can model the different processes which affect NT clusters. For example, if *NT dominates Linearity, and all other faithfulness constraints also dominate Linearity, then Nasal Substitution will result, as shown in the tableau in (9) (numerical indices indicate correspondence relationships amongst input and output segments; the winning candidate is indicated by an arrow):

(9)

        /məN1-p2ilih/   IdIO[nas]   Max   Id[ObsVce]   *NT   Lin
    →   mem1,2ilih                                            *
        mep1p2ilih         *!
        mem1b2ilih                           *!
        mem1p2ilih                                      *!
        mep2ilih                     *!
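For readers who find it helpful, the mechanics of EVAL in such a tableau can be made fully explicit in procedural terms. The following sketch is our own illustration (the candidate labels and function names are invented for expository purposes, not part of Pater's proposal); it reproduces the computation behind (9):

```python
# Illustrative sketch of OT's EVAL for tableau (9).
# Violation profiles are transcribed from the tableau; each candidate
# violates exactly one constraint once.

VIOLATIONS = {  # candidate -> {constraint: number of violation marks}
    "memilih (Nasal Substitution)":  {"Lin": 1},
    "meppilih (denasalization)":     {"IdIO[nas]": 1},
    "membilih (postnasal voicing)":  {"Id[ObsVce]": 1},
    "mempilih (faithful NT)":        {"*NT": 1},
    "mepilih (nasal deletion)":      {"Max": 1},
}

def evaluate(ranking):
    """Return the candidate(s) surviving EVAL under the given ranking.

    Each constraint, taken in ranked order, eliminates every candidate
    that incurs more marks on it than the best-performing survivor
    (the '!' of a tableau).
    """
    survivors = list(VIOLATIONS)
    for constraint in ranking:
        best = min(VIOLATIONS[c].get(constraint, 0) for c in survivors)
        survivors = [c for c in survivors
                     if VIOLATIONS[c].get(constraint, 0) == best]
    return survivors

# The ranking of (9), with Lin at the bottom: Nasal Substitution wins.
print(evaluate(["IdIO[nas]", "Max", "Id[ObsVce]", "*NT", "Lin"]))
# -> ['memilih (Nasal Substitution)']
```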

Denasalization will result if Ident-IO[nasal] is dominated by all of the other constraints. Nasal deletion results from ranking Max the lowest, and postnasal voicing by ranking Ident[obsvce] the lowest. Finally, a language in which NT clusters are freely permitted (such as English) is derived by ranking *NT below all of the faithfulness constraints. This is a classic illustration of factorial typology at work: the attested range of variation is derived by different rankings of universal constraints. However, as Pater points out, a problem arises when one considers the possibility that epenthesis could also eliminate an NT cluster. This would occur if a constraint against epenthesis, known as Dep, were ranked below all of the constraints we have considered thus far:



(10) Dep (adapted from McCarthy 2008:197)
     Assign a violation mark for every segment in the output which has no correspondent in the input.

The result would be as in (11):

(11)

        /məN1-p2ilih/   IdIO[nas]   Max   Id[ObsVce]   *NT   Lin   Dep
    →   məŋ1əp2ilih                                                 *
        məm1,2ilih                                            *!
        məp1p2ilih         *!
        məm1b2ilih                           *!
        məm1p2ilih                                      *!
        məp2ilih                     *!
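The content of Factorial Typology can be made equally concrete. The sketch below, again purely our own illustration with invented names, enumerates all 6! = 720 rankings of the six constraints and records which candidate wins under each, showing that epenthesis is among the predicted winners:

```python
from itertools import permutations

# Violation profiles for the candidates of tableau (11).
VIOLATIONS = {
    "epenthesis":         {"Dep": 1},
    "nasal substitution": {"Lin": 1},
    "denasalization":     {"IdIO[nas]": 1},
    "postnasal voicing":  {"Id[ObsVce]": 1},
    "faithful NT":        {"*NT": 1},
    "nasal deletion":     {"Max": 1},
}
CONSTRAINTS = ["IdIO[nas]", "Max", "Id[ObsVce]", "*NT", "Lin", "Dep"]

def winner(ranking):
    survivors = list(VIOLATIONS)
    for c in ranking:
        best = min(VIOLATIONS[s].get(c, 0) for s in survivors)
        survivors = [s for s in survivors if VIOLATIONS[s].get(c, 0) == best]
    return survivors[0]

# Tally winners across all 720 rankings (the factorial typology).
tally = {}
for ranking in permutations(CONSTRAINTS):
    w = winner(ranking)
    tally[w] = tally.get(w, 0) + 1
print(tally)
# Each candidate, 'epenthesis' included, wins under 120 rankings,
# yet epenthesis repairs of NT clusters appear to be unattested.
```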

Since Dep is clearly a necessary constraint, Factorial Typology predicts the possibility of the constraint ranking in (11), and hence also the possibility of a language in which epenthesis breaks up NT clusters. The problem is that no such language appears to exist—the Factorial Typology thus predicts the existence of a superset of the actually attested phenomena. Critics of Optimality Theory (e.g., Hale and Reiss 2000a, 2008; Vaux 2003/2008) take this as evidence for the futility of trying to derive typological predictions from the formal grammar, arguing that what is learnable (sanctioned by UG) is almost certainly a superset of what is attestable, with the latter being explained by independent phonetic, acquisitional, and historical considerations, converging in this respect with work on Evolutionary Phonology (Blevins 2004), to be discussed later in this section. Interestingly, some research within the OT tradition has begun to converge on this conclusion as well. Myers (2002) shows that gaps in factorial typologies are 'pervasive' (p. 1), arguing that 'while there is often more than one attested way of avoiding a marked configuration, it is hardly ever the case that every way of avoiding the configuration is attested … it is the norm for some of the possible rankings to be unattested.' As well as the absence of epenthesis as a way of avoiding NT clusters, Myers points out that metathesis and lenition of the obstruent are also unattested (pp. 6–8). A similar problem besets Myers' earlier (1997) work on the ways in which adjacent identical tones are dealt with in the languages of the world (so-called Obligatory Contour Principle (OCP) effects). While various possibilities predicted by factorial typology are attested, such as tone retraction and deletion, there appears to be no language in which a vowel or



a new tone is inserted to break up the OCP-violating sequence (see Myers 2002:8–9 for discussion). Asking why it is that the pervasiveness of these problems had not been noticed previously, Myers attributes it to the narrow scope of much work on factorial typology: 'most factorial typologies presented in OT work involve a severely limited set of "relevant" constraints. It is only when we consider all the faithfulness constraints and all the ways that a representation could be changed to avoid markedness violations that it becomes clear that gaps are in fact the norm.' Considering various possible OT responses to this problem and rejecting them, Myers embraces the conclusion that such phenomena, while being allowed by UG, are unattested because 'the patterns they represent are unlikely to arise diachronically through natural sound changes on the basis of phonetic patterns.' This is precisely the view of Evolutionary Phonology, to which we now turn. The central premise of Evolutionary Phonology, as Blevins (2004:23) puts it, is that 'principled diachronic explanations for sound patterns have priority over competing synchronic explanations unless independent evidence demonstrates, beyond reasonable doubt, that a synchronic account is warranted.' The reason for this attitude is that, since a principled diachronic explanation is one which reduces the phenomenon to independent phonetic factors outside of the Language Faculty per se, any statement of the phenomenon at the level of UG constitutes an unnecessary duplication. This reasoning has interesting implications for OT in the light of Myers' (2002) paper: if phonetic and diachronic factors are needed to account for gaps in factorial typologies, then could they also be used to capture the effects of markedness and faithfulness constraints, and if so, is there any more motivation for claiming that the latter are part of UG? We address this question with respect to markedness in section 8.4. For now, we illustrate the Evolutionary Phonology approach by discussing how it accounts for the attested and unattested processes which affect underlying NT clusters of the sort discussed by Pater (1996). In Evolutionary Phonology, all synchronic phonological alternations are the results of sound change. Blevins (2004) follows Ohala (1971, 1972, 1975, 1981, 2005) in identifying the listener as the most important agent in sound change. In other words, frequent sound changes reflect frequent mishearings. Because the NT configuration is commonly altered by phonological processes, Evolutionary Phonology predicts that phonetic experimentation should show that NT is easily misheard as something else. Moreover, the attested processes which apply to eliminate NT—voice assimilation, Nasal Substitution, nasal deletion, and denasalization—should be the results of reasonably common misperceptions by experimental participants in the laboratory. The imaginable ways of eliminating NT sequences which are nonetheless unattested, such as epenthesis and metathesis, should turn out not to correspond to misperceptions of NT sequences uncovered by experimentation. Myers (2002:13–30) provides articulatory evidence that all of these predictions are plausible, although unfortunately he cites no data from perceptual experiments, which would be the most direct form of evidence for the Evolutionary Phonology position. In particular, he shows through an original articulatory experiment that stops undergo



more coarticulatory voicing after a nasal than after a vowel. This adds plausibility to the notion that voiced and voiceless stops are harder to distinguish in postnasal contexts, which could then be the source of a mishearing leading to a postnasal voicing alternation being introduced into a language. He also shows that in some languages (as documented for Pokomo by Huffman and Hinnebusch 1998) coarticulation in NT clusters leads to devoicing of the nasal rather than voicing of the stop. Such nasal devoicing obscures the formants characteristic of nasals, potentially leading to the nasal not being perceived at all. In such a case the language learner might either posit that nasal deletion has occurred, or interpret the silence of the nasal as part of the closure for a stop cluster, leading to the denasalization process attested in languages such as Mandar. The fact that epenthesis and metathesis are not found as repairs for NT clusters can also be explained. Epenthesis usually arises historically either (a) when overlap of phonetic gestures leads to the perception of an inserted segment (Ohala 1983; Browman and Goldstein 1990) or (b) when the release of a consonant is reinterpreted as a vowel. However, neither of these two scenarios is likely to occur in a nasal–obstruent cluster in a way that is dependent on voicing. If the nasal happened to be released before an obstruent, then this could give rise to epenthesis, but this would affect both NT clusters and clusters of a nasal followed by a voiced obstruent. Meanwhile, a nasal followed by a fricative could give rise to an intrusive stop (as in the common pronunciation of English prince as [pʰrɪnts], Clements 1987), but this produces an affricate, which is still an instance of a nasal–obstruent cluster. Hence, no common phonetic pattern gives rise to epenthesis separating an NT cluster (Myers 2002:25–26). As for metathesis, Blevins and Garrett (1998) have argued that it arises when the perceptual cues for a particular consonant are sufficiently extended to allow for interpretation as originating in a new position in the string of sounds. In the case of a vowel followed by a glottal stop, the glottalization can extend to the beginning of the vowel, potentially leading to the sequence being interpreted as a glottal stop followed by a vowel, giving rise to metathesis Vʔ > ʔV. Since the cues for obstruency and nasality would not extend in this way, it is extremely unlikely that a process metathesizing NT clusters could evolve (Myers 2002:26–27). It seems then that Evolutionary Phonology is well equipped to capture the typological data on processes affecting NT clusters without the need for direct reference to innate postulates (even, in the case of factorial typology, improving upon a proposal that does rely on innate postulates). However, a fierce debate has recently erupted concerning whether it is possible to do away with a notion of markedness in Universal Grammar, as we shall now see.

8.4 Markedness The notion of markedness in phonology originated with the work of the Prague Circle, most notably in the works of Jakobson and Trubetzkoy, and was taken up in earnest in the central document of early generative phonology (Chomsky and Halle 1968, The



164    Brett Miller, Neil Myler, and Bert Vaux Sound Pattern of English [SPE]:400–​435). Although the precise definition of markedness varies across different theories, the fundamental content of the notion is always that certain segments, feature values, prosodic structures, and so forth are in some sense dispreferred—​more marked—​compared to others. In SPE (and its derivational successors), markedness was encoded by a set of markedness conventions which were held to be part of UG. These conventions stated what the unmarked value of a feature was in the context of another group of features. For instance, Convention XXI of SPE (p. 406) tells us that the unmarked value for the feature [±voice] is [–​voice] in the context of the feature [–​sonorant]; in other words, the unmarked case for obstruents is to be voiceless, rather than voiced. Feature values which were unmarked could be left out of lexical representations and filled in by the markedness conventions. Hence, less marked feature values led to grammars which were more highly valued with respect to economy of lexical storage, and so should be preferred by learners, all else held equal. The markedness conventions were therefore an important plank in the generative approach to capturing typological facts. Later, Classic Optimality Theory (Prince and Smolensky 1993/​2004) gave a different formal definition to markedness: a given feature value, prosodic structure or segment is marked if and only if there exists a markedness constraint in UG that assigns violation marks to it. While differing from the classical generative view in allowing markedness constraints to be ranked with respect to each other (and with respect to faithfulness constraints), this view is still one in which the existence and nature of markedness are attributed to Universal Grammar. An alternative to this is suggested by Myers’ (2002:21) succinct statement (although he himself appears to prefer a theory in which markedness constraints are part of UG): ‘Marked phonological structures are those that tend to be misheard as something else.’ The view that synchronic markedness can in fact be reduced to this in conjunction with diachronic change has been pursued in much work by Ohala, Blevins, and certain generativists (viz. Ohala 1971, 1972, 1975, 1981, 2005; Ohala and Lorentz 1977; Hale and Reiss 2000a; Chang, Plauché, and Ohala 2001; Vaux and Samuels 2004; Blevins 2004). We have already seen from the case study of NT clusters in section 8.3 that this approach is successful in accounting for certain instances of markedness as it effects typology (Blevins 2004 catalogues many other examples). However, there has been some debate as to whether this approach is sufficient to capture all aspects of markedness. Many objections have been raised in the work of de Lacy and of Kiparsky (de Lacy 2002, 2006a, 2006b; Kiparsky 2006, 2008; de Lacy and Kingston 2013), to which we now turn.

8.4.1 Velar Epenthesis Some of de Lacy’s and Kiparsky’s arguments for the need to posit markedness constraints in UG depend on examples of marked configurations that are plausible results of sound change but which are claimed to be unattested and impossible. For example, de Lacy and Kingston (2013) assert that phonological velars are never epenthetic and



are never the outcomes of place neutralization, in spite of static distributional patterns in some languages suggesting that (for instance) velar neutralization has occurred as a sound change. De Lacy and Kingston (2013:293–294) illustrate this last pattern with Peruvian Spanish forms like those in (12) and (13); several similar cases are discussed in Howe (2004) and de Lacy and Kingston (2013).

(12) Peruvian Spanish static pattern: stop codas in loan words are velar
     pepsi [peksi]
     Hitler [xikler]

(13) Peruvian Spanish (some speakers)
     apto 'apt' [akto] (homophonous with acto 'act')
     abstracto 'abstract' [akstrakto]
     opcional 'optional' [oksjonal]

Howe (2004:17) mentions that Cuban Spanish possesses not only a similar static pattern of coda stops being velar but also related alternations, including the prefix sub- having velar realizations like [suk]- before a consonant vs. [suβ]- before a vowel. In order for this not to constitute an example of synchronic velar neutralization, it would be necessary to insist that the alternation is not phonological. Yet since velar preconsonantal codas are predictable in this dialect, such an insistence seems motivated by nothing other than the desire to preserve the putative universal that velar neutralization does not occur as a synchronic phonological process. Rice (1999) and Howe (2004) also mention Uradhi dialects reported with a word-final epenthetic segment realized as [ŋ] if the next consonant leftward is nasal, otherwise variably as [k] or [ŋ]. While acknowledging uncertainty about this example, de Lacy and Kingston (2013:309) suggest that the [k] allophone shows long-distance consonant harmony. Synchronically, however, it seems possible that the invariably nasal epenthetic allophone is the one harmonizing, and such harmony would only be long-distance if the intervening vowel were phonetically oral. De Lacy and Kingston further suggest that the supposedly long-distance-harmonizing [k] might be a phonologically predictable, semantically meaningless reduplicant morpheme. They mention Gafos (1998) treating some long-distance consonant harmonies as instances of reduplication, along with Rose and Walker (2004) modelling long-distance consonant harmonies with correspondence constraints like those that have been applied to reduplication in OT, but of course it does not follow that all cases of such harmony are reduplicative (see Hansson 2001). Further, the agreement-based analysis of consonant harmony encounters a number of serious problems that do not arise for an alternative using relativized locality (Nevins and Vaux 2004). De Lacy and Kingston (2013) effectively dismiss the remaining kinds of evidence in Howe (2004) as amounting to nothing more than sound changes possibly involving epenthesis of velars or neutralization to velars but not producing synchronic phonological patterns of velar epenthesis or neutralization. However, additional arguments



that [k]-epenthesis occurs as a synchronic pattern are marshalled by Vaux and Samuels (2012) (see also Blevins 2006b:253, citing Vaux 2003) in support of a broader claim that any segment can become epenthetic as the result of reanalysis of a historically earlier process deleting that segment in some environments. One of the better-known examples of this is the [r]-insertion found in many nonrhotic varieties of English, a hiatus prevention strategy derived by reanalysis of earlier coda [r]-deletion.

8.4.2 Final Voicing

In a similar controversy, Kiparsky has drawn attention to the dearth of examples of obstruents neutralizing to voiced segments in syllable- or word-final position, a process we hereafter term 'final voicing.' Plausible phonetic motivations for word-final devoicing are well known (see Blevins 2004:103–106 for a summary). However, Kiparsky (2008:45) asserts that final voicing does not occur. He argues that it could easily arise through several ordered sets of possible sound changes; thus, its non-occurrence must be due to a synchronic constraint somehow ruling out such diachronic scenarios, and Evolutionary Phonology is inadequate because, he argues, it lacks the necessary type of constraint. Kiparsky (2008:47) suggests two diachronic scenarios that would produce final voicing, shown in (14a–b). Kiparsky (2006:223) adds the scenario shown in (14c) (along with two others treated in (23)).

(14) Final voicing scenarios
     a. Scenario 1: chain shift resulting in markedness reversal
        Stage 1: tatta tata tat (*tatt) (gemination contrast)
        Stage 2: tata tada tad (*tat) (lenition)
        • Result at stage 2: new voicing contrast, word-final phonological voicing.
     b. Scenario 2: lenition plus apocope
        Stage 1: takta tada (*tata, *data, *tat, *dat) (allophonic V___V voicing, no final -C)
        Stage 2: takta tad (*tat, *dat, *dad, *dat) (apocope, unless final *-CC would result)
        • Result at stage 2: allophonic voicing of word-final stops.
     c. Scenario 3: lenition plus deletion
        Stage 1: tat tad dat dad (voicing contrast)
        Stage 2: tad tað dad dað (coda lenition)
        Stage 3: tad ta dad da (loss of weak fricatives)
        • Result at stage 3: only voiced stops occur in codas.



Even if these scenarios did lead to a system with final voicing, an important caveat must be sounded with respect to such diachronic arguments in general. In order for them to be convincing cases of overgeneration by the Evolutionary Phonology model, it must first be shown not only that each individual stage has a nonnegligible probability of occurrence, but that the probability of all stages occurring (= P(S1)×P(S2)×P(S3), assuming the changes are independent) is not vanishingly small. (To see the force of this point: if, purely for illustration, each stage had a 5% chance of occurring over the relevant period, the probability of the full chain would be 0.05 × 0.05 × 0.05 ≈ 1.3 × 10⁻⁴, roughly one language in 8,000.) Otherwise, the apparent absence of such systems in attested natural languages could reduce to the unlikeliness of the whole scenario taken together and would thus not constitute a challenge to Evolutionary Phonology. Unfortunately, it is not clear at this point how to establish the likelihood of these scenarios without making arbitrary assumptions about the probability of occurrence of each individual stage. Even if a way around this obstacle can be found, it seems to us that the scenarios in (14) produce postvocalic voicing rather than final voicing (see also Blevins 2006a:145, who makes this observation for the first two scenarios). To further confirm this point, we examine one language not mentioned in this debate—Middle Persian—which appears to offer an approximate equivalent to a combination of Kiparsky's second and third final voicing scenarios (see (14b–c)). In what follows, we use the Middle Persian facts as a means of unpacking the content and predictions of Kiparsky's model. Proto-Iranian possessed a two-way laryngeal opposition in obstruents word-initially (15a); this contrast was preserved in Middle Persian (here represented by transcriptions of the written form known as Pahlavi) (15b).

(15) selected5 word-initial laryngeal developments in Iranian

          a. P-Ir            b. Pahlavi6   c. Avestan7     gloss
     *b-  *brātā-            brād          brātā           brother
     *d-  *dāru-             dār           dāru-           wood
     *g-  *gāu-              gāw           gau-            cow
     *p-  *pitar-            pidar         pitar-          father
     *t-  *tanū-             tan           tanū-           body, self
     *k-  *kapauta-(ka-)     kabōd         OP kapautaka    grey-blue

One can also see in (15b) that by the time of Pahlavi, Proto-​Iranian final syllable rimes were lost in polysyllabic words;8 compare for example P-​Ir. *dāru-​‘tree’ with its Pahlavi

5 We omit the palatal affricates here, as their situation is more complicated (see Pisowicz 1984:22).
6 All Pahlavi forms are from Mackenzie (1971).
7 We provide cognates from Old Iranian languages (Avestan or Old Persian [OP]) to give the reader a better sense of why Iranists assume the particular Proto-Iranian forms presented here.
8 Iranists tend to label this process as final syllable loss (e.g., Schmitt 1989:60; Weber 2007:942; Korn 2009:207), missing the facts that the onset of the final syllable normally does not delete and that monosyllables do not undergo the same reduction.



descendant dār. This process of rime loss can be compared to Stage 2 of Kiparsky's Scenario 2 in (14). Proto-Iranian also contrasted voicing in stops in post-sonorant position (16a), but Pahlavi lost this contrast as a result of spirantization of original voiced stops in later Old Iranian (represented by the Young Avestan examples in (16ic); compare Stage 2 of Kiparsky's Scenario 3), followed by Pahlavi voicing of original voiceless stops in this environment (see (16iib)), probably by the fourth century AD and certainly by the ninth century (Mackenzie 1967, 1971).

(16) selected medial laryngeal developments in Iranian

              a. P-Ir       b. Pahlavi   c. Avestan   gloss
     i.   *b  *ābar-        āwar-        āβar-        bring
          *d  *madaka-      mayg         maðaxa-      locust
          *g  *yuga-        ǰuɣ          yaoga-       yoke
     ii.  *p  *āpa-         āb           āpa-         water
          *t  *āzāta-       āzād         āzāta-       free, noble
          *k  *parikā-      parīg        pairikā-     witch, demoness

Iranists such as Pisowicz (1984) and Sims-​Williams (1996) have suggested that the spirantization and voicing developments are connected, with the loss of voiced stops in this position allowing for the voiceless series to move into the vacated slot in a context where voicing is phonetically favored (i.e., after vowels, liquids, nasals, and perhaps voiced fricatives). Iranists typically formulate the voicing process reflected in (16iib) as happening intervocalically (Korn 2003:55; Paul 2008), postvocalically (Back 1978 and Korn 2009:208), or after voiced continuants (Weber 1997:613), rather than after voiced segments, the generalization we employ here. It is difficult to determine whether the process placed any restrictions on what could immediately follow the target consonant, i.e., whether it required a following vowel or could apply in word-​final position, because we lack Pahlavi forms derived from Proto-​Iranian forms ending in final voiceless consonants. We can refine the statement of what could immediately precede the targeted consonant, though, on the basis of forms of the sort in (17); the intervocalic and postvocalic generalizations mentioned at the beginning of this paragraph fail to account for the voicing that occurs after consonantal sonorants9 (as well as voiced fricatives according to Pisowicz 1984:18, though he provides no examples), and the post-​continuant theory makes the wrong predictions for nasal contexts (since nasals are [−continuant] yet trigger voicing).

9. This process appears to have been blocked by at least some morphological boundaries (cf. Weber 1997:613); e.g., *ham-prsa- 'consult', lit. 'ask together' > Avestan hampərəsa- 'to deliberate', Pahlavi hampursag 'consulting' (not *hambursag).



(17) Pahlavi post-consonantal voicing

                P-Ir           b. Pahlavi   c. Avestan          gloss
    a.  *nT     *spanta-       spandarmad   spəṇta- + ārmaiti   Holy Thought
                *antar(-aka)   andar(ag)    aṇtarə              between
                *panča-        panǰ         paṇča               five
        *rT     *martya-       mard         martya-             mortal, man
                *krp-          kirb         kəhrp-              body, form
                *wrka-         gurg         vəhrka-             wolf

Stops do not voice after voiceless segments; cf. P-​Ir. *hapta-​‘seven’ > Pahlavi haft; *wispa-​‘all’ > Pahlavi wisp, *wist ‘twenty’ > Pahlavi wīst.

In light of these data, we propose to formalize the voicing process with the general assimilation rule in (18).

(18) Pahlavi assimilation of stops10 to a preceding voiced segment

     [−cont] → [−stiff vocal folds] / [−stiff] _

           X           X
           |           |
       [−stiff]    [−cont]

     (i.e., the [−stiff] specification of the preceding slot spreads to the following [−cont] slot)
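The logic of (18) can be rendered as a minimal computational sketch. The toy feature table below is our own illustrative assumption, covering only the segments needed for the examples, not an analysis of the full Pahlavi inventory:

    # Sketch of rule (18): a [-cont] segment takes [-stiff] (voicing) from an
    # immediately preceding [-stiff] segment. Feature values are illustrative.
    FEATURES = {
        'a': {'stiff': False, 'cont': True},  'i': {'stiff': False, 'cont': True},
        'n': {'stiff': False, 'cont': False}, 'r': {'stiff': False, 'cont': True},
        'h': {'stiff': True,  'cont': True},  'f': {'stiff': True,  'cont': True},
        'p': {'stiff': True,  'cont': False}, 't': {'stiff': True,  'cont': False},
        'k': {'stiff': True,  'cont': False}, 'b': {'stiff': False, 'cont': False},
        'd': {'stiff': False, 'cont': False}, 'g': {'stiff': False, 'cont': False},
    }
    VOICED_COUNTERPART = {'p': 'b', 't': 'd', 'k': 'g'}

    def apply_rule_18(word: str) -> str:
        """[-cont] -> [-stiff vocal folds] / [-stiff] _ (vacuous for nasals, n. 10)."""
        out = []
        for i, seg in enumerate(word):
            if (i > 0
                    and not FEATURES[word[i - 1]]['stiff']   # preceding segment is voiced
                    and not FEATURES[seg]['cont']):          # target is a stop (or nasal)
                seg = VOICED_COUNTERPART.get(seg, seg)       # vacuous for b, d, g, n
            out.append(seg)
        return ''.join(out)

    print(apply_rule_18('pitar'))   # -> 'pidar': voicing after a vowel, cf. (15)
    print(apply_rule_18('anta'))    # -> 'anda':  voicing after a nasal, cf. (17)
    print(apply_rule_18('hafta'))   # -> 'hafta': no voicing after voiceless f

Note that contexts are evaluated against the input string, mimicking a single simultaneous application of the rule.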

But was it the fact that the voiceless stops followed a voiced segment that led them to undergo voicing, or the fact that they were in final position following the process of final rime loss mentioned earlier? If the latter, we might be approaching something like Kiparsky's final voicing scenario. Most of the Pahlavi forms presented thus far are compatible with both possibilities, but pidar, kabōd, spandarmad, and andarag, wherein voicing applies word-internally, are not. Forms like these and the ones in (19) demonstrate that the voicing process in Pahlavi must have been triggered by a preceding voiced segment, not by the fact of being in word-final position.

(19) relevant forms with intervocalic stops in Pahlavi

          a. P-Ir        b. Pahlavi   c. Avestan           gloss
    *p    *xšap-ika-     šabīg11      xšap-                darkness, night
    *t    *dātār-        dādār        dātāram (acc. sg.)   creator
    *k    *wikaya-       gugāy        vīkaya-              witness

10. Note that by the formulation in (18), the rule applies vacuously to nasals as well.
11. Mackenzie (1971) glosses this as 'Mazdean ritual undershirt'; compare the Armenian loan šapīk 'shirt.' Originally it presumably meant something like 'thing worn at night.'



The Pahlavi facts present a situation of approximately the sort outlined by Kiparsky (2006) in Scenarios 2 and 3; we schematize the parallelism in (20).

(20) Scenarios 2 and 3 and their Iranian parallel

    Scenario 2   Iranian                              Scenario 3
    —            voicing contrast (15a, 16a)          Stage 1
    —            postvocalic spirantization (16ci)    Stage 2
    Stage 2      final Rime deletion                  —
    Stage 1      post-voiced voicing (16bii)          —

    Result: post-voiced (and thus most final) stops are voiced; conditions are met for learners to postulate a post-voiced voicing process whose targets include coda stops.

To paraphrase Kiparsky (2006), what we see in the historical development of Pahlavi is a scenario that could produce the effect of a synchronic rule of post-voiced voicing. But did speakers of the Middle Persian variety represented in writing as Pahlavi actually postulate such a synchronic rule, or did the process die as soon as it had applied? The limited nature of the corpus makes it difficult to address this question, but a number of suggestive facts can be identified. The fact that Avestan liturgical loans such as ātaxš 'fire' (in contrast to the native outcome of the same root, ādur) do not undergo the voicing process (see Weber 1997:613) might seem to suggest that rule (18) was not synchronically active in Pahlavi, at least by the time these words were borrowed. However, we know from the work of Pierrehumbert (2006) and others that synchronic processes can remain active without applying to new forms, so the exceptional Avestan loans do not prove that rule (18) was not synchronically active in Pahlavi Middle Persian. Indeed, the existence in the language of extensive voicing alternations resulting from morpheme concatenation suggests that the voicing rule remained at least partially productive in the synchronic grammar of Pahlavi. Consider, for example, the suffixes in (21):

(21) voicing alternations in Pahlavi (affixed forms from Weber 2007)

    affix: infinitive -tan < PIr *-tanaiy
      a.  xuf-tan       sleep
          raf-tan       go
          kuš-tan       kill
          abrōx-tan     illuminate
          rēx-tan       flow
          bas-tan       bind
      b.  pursī-dan     ask
          dā-dan        give, create
          abzū-dan      increase
          bū-dan        be(come)
          ēstā-dan      stand
          kar-dan       do, make
          xwar-dan      eat
          za-dan        hit
          kan-dan       dig, destroy

    affix: past -t < PIr *-ta-
      c.  āwiš-t        sealed
          kar-d         made
          dā-d          gave, given
          guf-t         spoke(n)

    affix: superlative -tom < PIr *-tama-
      d.  wat-tom       worst
      e.  abar-dom      highest
          ab-dom        last, final
          bē-dom        furthermost
          fra-dom       first
          di-dom        second
          ni-dom        least, smallest

    affix: agentive -tār < PIr *-tar-
      f.  bōx-tār       savior
          dāš-tār       keeper
          guf-tār       speaker
          kas-tār       destroyer
      g.  amenī-dār     unthinking
          bur-dār       bearer, womb
          dā-dār        creator
          dī-dār        sight
          framā-dār     commander
          gā-dār        husband
          handēšī-dār   thoughtful
          kar-dār       worker



The reader might be objecting at this point that one could equally well postulate underlying representations with initial voiced stops for the affixes in (21), which in tandem with a rule spreading the voicing value of a segment to a following stop would yield the desired surface forms and obviate the need for a voicing rule of the sort in (18). A number of considerations suggest that a devoicing analysis of this type is not to be preferred. First, suffixes beginning with original voiced stops (e.g., -bān 'keeper of X,' -bed 'lord') do not show voice alternations, as we would expect them to if the grammar contained a devoicing rule of the required sort. Moreover, the superlative form of wad 'bad,' wattom 'worst' (21d), works nicely if we assume underlying forms /wad/ and /-tom/ and a garden variety voicing assimilation rule ordered before rule (18) that devoices the /d/ before the following /t/, but makes little sense if the underlying form of the suffix is /-dom/—in this case we should expect the superlative to be *[waddom].

A potential problem is posed by a small number of affixes that have non-alternating initial voiceless stops, including the comparative -tar < *-tara-. Derivations such as wad 'bad' → wattar 'worse' look at first blush exactly parallel to the above-mentioned wad 'bad' → wattom 'worst,' suggesting underlying forms /wad/ and /-tar/. Sonorant-final roots reveal that the situation is somewhat different, though; contrast, for example, abēr-tar 'much more' with abar-dom 'highest,' abzōn-tar 'more increasingly' with kan-dan 'dig.' Pahlavi thus presents a three-way contrast in voicing behavior, with non-alternating voiced stops (-bed 'lord'), non-alternating voiceless stops (comparative -tar), and stops that undergo voicing alternations (superlative -tom ~ -dom). This situation is reminiscent of what we find in Turkish, which possesses a contrast in stem-final position between non-alternating voiced stops (etüd 'étude': etüdüm 'my étude'), non-alternating voiceless stops (at 'horse': at-ım 'my horse'), and alternating stops (at 'name': ad-ım 'my name'). Inkelas and Orgun (1995) and Hale and Reiss (2008) propose to analyze ternary oppositions of this type in terms of equipollent features combined with underspecification; the Turkish non-alternating /d/, for example, would be underlyingly specified as [−stiff], the non-alternating /t/ as underlyingly [+stiff], and the alternating d~t as underlyingly unspecified for [stiff].

This sort of archiphonemic analysis can be extended to the Pahlavi case, but requires certain modifications (for both Turkish and Pahlavi) in order to account for the behavior of sequences of alternating consonants. By the logic applied in the Turkish case, the [b] of Pahlavi -bed should be underlyingly [−stiff], the [t] of -tar should be underlyingly [+stiff], and the [d]~[t] of -dom/-tom and of wad-/wat- should be underlyingly unspecified for [stiff]. The problem here is that a form like /waD-Dom/12 should then involve two adjacent coronal stops each unspecified for [stiff vocal folds], and it is unclear why they should be subsequently specified as [+stiff] in the surface form [wattom], rather than taking the [−stiff] specification of the preceding vowel, as is the case in /waD/ 'bad' → [wad]. Interestingly, the same problem arises in Turkish, as shown for example by alternating suffixes like ablative /-DAn/ (cf. at-tan 'horse-abl' vs. adam-dan 'man-abl'), which produce voiceless clusters when affixed to stems ending in an alternating stop, e.g., /kitaB/ 'book' → [kitap] 'book-nom,' [kitaba] 'book-dat,' [kitaptan] 'book-abl.'

12. We henceforth use capital letters to denote archiphonemic representations underlyingly underspecified for the feature under discussion.

We propose (relatively uncontroversially) that underspecified segments which do not receive a value for the feature in question from the application of other phonological rules (as the /D/ in Pahlavi /waD/ would from the preceding /a/ by rule (18)) are assigned the unmarked value for that feature ([+stiff] in the Pahlavi case) by default. But why does /waD-Dom/ receive [+stiff] by dint of this redundancy rule rather than [–stiff] from the preceding /a/ by application of rule (18)? Applying the redundancy rule before rule (18) would yield the desired result when combined with a leftward rule of voicing assimilation in stop clusters of the sort mentioned earlier. Two possibilities in the phonological literature may be applicable here. First, one could propose that rule (18) is preceded by a rule that coalesces strings of adjacent identical segments into linked structures, which could then lead to the cluster resisting application of rule (18) due to the Uniform Applicability Condition that has been claimed to produce geminate inalterability effects in numerous languages (Schein and Steriade 1986). A second possibility involves the fact that voiced geminates are said to be aerodynamically marked due to the longer closure increasing the probability of voicing failure (Hayes and Steriade 2004:6ff). In other words, the lack of *[waddom] can be captured with rule ordering, or related to a broader formal principle (geminate inalterability), or phonetically grounded (aerodynamics of voicing), illustrating competition between some of the same kinds of explanations (parochial, general-formal, and functional) that are discussed more broadly in the rest of this chapter.
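To see how the proposed ordering could work mechanically, here is a toy derivation. The string encoding, with 'D' for the archiphonemic coronal stop, and the exact sequencing of the cluster assimilation after (18) are our own illustrative choices; the linked-structure and aerodynamic accounts just mentioned are alternatives:

    # Ordered toy rules: default [+stiff] insertion feeds rule (18), which
    # feeds leftward cluster assimilation. Encoding is illustrative only.
    VOICED = set('aeiouwmnr')                      # vowels and sonorants

    def assign_default(word: str) -> str:
        """Redundancy rule: unspecified /D/ receives unmarked [+stiff]."""
        return word.replace('D', 't')

    def rule_18(word: str) -> str:
        """A stop takes [-stiff] from an immediately preceding voiced segment."""
        return ''.join('d' if c == 't' and i > 0 and word[i - 1] in VOICED else c
                       for i, c in enumerate(word))

    def leftward_assim(word: str) -> str:
        """A stop devoices before a following voiceless stop."""
        return ''.join('t' if c == 'd' and word[i + 1:i + 2] == 't' else c
                       for i, c in enumerate(word))

    def derive(underlying: str) -> str:
        return leftward_assim(rule_18(assign_default(underlying)))

    print(derive('waD'))     # -> 'wad':    rule (18) supplies [-stiff]
    print(derive('waDDom'))  # -> 'wattom': the cluster surfaces voiceless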
To summarize our discussion of Pahlavi, we have seen that the language appears to have followed the sort of historical trajectory that we find in Kiparsky's second and third scenarios (see (14b,c)), and that this trajectory yields not the cross-linguistically rare final voicing that Kiparsky suggests, but rather a common and natural synchronically productive rule of voicing assimilation that spreads [−stiff vocal folds] from a voiced segment to an immediately following stop, both word-internally and word-finally.

The fact that Kiparsky's first two scenarios produce postvocalic voicing rather than final voicing was already noticed by Blevins (2006a:145). The same can be said of Kiparsky's third scenario, where incidentally the final step of fricative loss is superfluous in that it does not help produce a pattern of having only voiced obstruents in codas. To see why these scenarios either fail or are unlikely to produce synchronic rules of final voicing, let us first consider what would happen to suffixed forms in Scenario 1. Let us suppose that this language contained one or more vowel-initial suffixes (a cross-linguistically common state of affairs in human languages), one of which we will assume for ease of exposition to be /-a/. Kiparsky's proto-form *tat would then have the suffixed counterpart *tat-a. This pair should yield {tad, tada} at Stage 2, following application of what Kiparsky appears to assume is a postvocalic lenition rule. Learners exposed to paradigms of this type would most likely postulate underlying forms /tad/ and /a/ for the relevant morphemes, with no rule of final devoicing; this simple system would directly generate the desired surface forms. If, on the other hand, they were to postulate /tat/ and /-a/ plus a rule of final voicing, the grammar would require an additional rule to generate the voicing in suffixed [tada]. One could imagine an aggressive learner postulating a rule of the latter type to account for the static fact that postvocalic stops are invariably voiced in this language, but such a rule would obviate the need for a rule of final voicing. One could hope that learners would induce a rule of final voicing if there were multiple suffixes consisting only of geminates which had reduced to singletons in word-final position in the parent stage, since in the daughter stage these suffixes would show voiceless and voiced singleton stops in word-medial and final position respectively. Yet it seems suspicious to make the final voicing rule rely on such suffixes: there could not be very many of them, and consequently their alternations might be learned without any phonological rule at all. The two more plausible grammars for Scenario 1, then, are one with no voicing rule at all, and one with a general rule of postvocalic voicing. In either case, it is highly unlikely (and perhaps impossible) that a rational learner would postulate a voicing rule restricted to final position.

Scenarios 2 and 3 force a similar conclusion. When one considers words of more than two syllables, it becomes clear that Kiparsky's scenarios would produce postvocalic voicing, not final voicing. Consider the sample forms in (22):

(22) Outcomes including longer words in Kiparsky's Scenarios 2 and 3
    a. Scenario 2
       • Stage 1: allophonic V_V voicing, no final -C
         takta, tada, tadada
       • Stage 2: apocope, unless final *-CC would result
         takta, tad, tadad
    b. Scenario 3
       • Stage 1: voicing contrast
         tat, tad, dat, dad, tatat, tadad …
       • Stage 2: coda lenition
         tad, tað, dad, dað, tadad, taðað …
       • Stage 3: loss of weak fricatives
         tad, ta, dad, da, tadad, taa (?) …

Here the key forms are original tadada in (22a) and tatat in (22b). The former emerges from Stage 2 as tadad, showing that stops are voiced not word-finally, but postvocalically; the exact same generalization is shown by tatat in Scenario 3, which emerges from Stage 3 as tadad.
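The difference between the two rule types can be made concrete with the toy functions below, applied to the schematic forms of (22) (the encoding is an illustrative assumption of ours):

    # Final voicing vs. postvocalic voicing over (22)-style toy forms.
    VOWELS = set('a')
    VOICE = {'t': 'd', 'k': 'g'}

    def final_voicing(word: str) -> str:
        """Voice a word-final stop only."""
        return word[:-1] + VOICE.get(word[-1], word[-1])

    def postvocalic_voicing(word: str) -> str:
        """Voice a stop after a vowel when prevocalic or word-final."""
        out = []
        for i, c in enumerate(word):
            after_v = i > 0 and word[i - 1] in VOWELS
            before_v_or_edge = i == len(word) - 1 or word[i + 1] in VOWELS
            out.append(VOICE.get(c, c) if after_v and before_v_or_edge else c)
        return ''.join(out)

    print(final_voicing('tatat'))        # -> 'tatad': only the final stop voiced
    print(postvocalic_voicing('tatat'))  # -> 'tadad': the outcome the scenarios produce
    print(postvocalic_voicing('takta'))  # -> 'takta': a cluster blocks voicing, cf. (22a)

Only the second function reproduces the diachronic outputs; a learner stating the pattern as final voicing would wrongly leave the medial stops of tatat-type forms voiceless.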



Blevins (2006a:144–153) suggests that six languages developed what amounts to a postvocalic voicing pattern and that Evolutionary Phonology is consequently right on target in postulating that UG lacks any constraints forbidding this pattern. Kiparsky (2006) in turn rejects all these examples. Here we review the ones in which further details, parallels, or corrections seem worth mentioning. We agree with Kiparsky that none of the six languages clearly support Blevins' (2006) claim, but we argue that this fact is not necessarily a problem for Evolutionary Phonology.

A number of ancient languages are thought by some writers to have undergone neutralization of post-sonorant word-final stops to voiced forms, including Old Latin (see Meillet and Vendryes 1968:146, Weiss 2009:155). Proto-Italic is thought to have changed earlier postvocalic final *-t to *-d, which survives in a number of early inscriptional forms in various Italic languages (Sihler 1995:228–229). Kiparsky (2006:230–232) suggests based on wider developments within Italic that this segment may have been a voiced fricative when (if ever) its divergence from /t/ constituted a synchronically active phonological pattern, but this does not necessarily affect its relevance to the question of final obstruent voicing. Certainly relevant, though, is Blevins' observation that 'while there are only a few morphemes for which the sound change is attested, there is no evidence in this case against a general final obstruent voicing process' (2006a:145–146). The basic problem here is that the data are consistent with Old Latin having undergone a process of final voicing, but the amount and type of data are insufficient to determine whether the process was synchronically active, and whether or not it extended beyond the few coronal cases for which it is attested.

In passing we note even more severe qualifications on the possibility of detecting final voicing in Hittite (see Melchert 1994:18) and in Proto-Indo-European itself (see Ringe 2006:143). First, any final voicing rule in Proto-Indo-European would probably need to exclude unsuffixed monosyllables; a voicing contrast in monosyllable-final stops is suggested by the relatively secure etyma *k̑erd 'heart' vs. *yekwr̥t 'liver,' reflected in Classical Armenian as sirt and leard respectively (thanks to James Clackson for pointing this out). Second, evidence for a synchronically active final voicing pattern in Proto-Indo-European seems limited to a few morphemes such as 3sg *-t- and 3pl *-nt-, which would have been voiced word-finally but voiceless when followed by the 'hic et nunc' -i (thus *bhéreti 'is carrying,' *bhéronti 'are carrying' with primary endings, vs. the non-hic-et-nunc equivalents *bhéred, *bhérond, which have secondary endings; forms from Ringe 2006:143). Within this limited set of data, the inference of final voicing itself depends on scant evidence like that of Italic already mentioned. Finally, in Hittite, the non-place stop contrast is recorded orthographically through somewhat inconsistent gemination, which has been interpreted in a variety of ways besides a voice distinction (see Melchert 1994).

Welsh is the next candidate for final voicing in Blevins (2006a:146–147). Kiparsky (2006:227–228) counterclaims that the stop contrast in this language is one of aspiration with variable phonetic voicing of the unaspirated series. Even if the phonological contrast involved [stiff vocal folds], though, the Welsh data would resist a final or even post-voiced or post-sonorant voicing analysis.
Blevins admits that the language has final voiceless stops but observes that in monosyllables where the rhyme is a vowel followed by a stop, the vowel is predictably short if the stop is one of /p/, /t/, or /k/ and long if it is one of /b/, /d/, or /g/. In fact, vowels in stressed monosyllables are predictably long before coda obstruents other than /p/, /t/ and /k/ (see Wood 1988:231–232), so any analysis positing final voicing after long vowels would have to limit it to stops. Though this might work, rather than entertaining the possibility, Blevins (2006a:147) claims that the important fact here is counterfactual: if not for loans ending in /p/, /t/, and /k/, Welsh would have had final voicing. Kiparsky (2006:228) understandably responds that real synchronic systems are precisely the topic here, not imaginary alternatives. However, words ending in /p/, /t/, or /k/ are not all loans; after nasals these segments are a regular native reflex except that the coronal stop has disappeared in clitics and variably in polysyllables. Examples include dant 'tooth,' pump 'five,' ieuanc 'young' in both Early Welsh (Strachan, Meyer, and Lewis 1937:8) and Modern Welsh (Evans 1852 and University of Wales 2012). Further, British word-final -lt(V(C)) yielded Welsh -llt, while otherwise an earlier post-liquid voiceless stop onset of a final syllable generally yielded a post-liquid word-final voiceless fricative in Welsh (see Schrijver 1995:349). This shows that Welsh underwent no sound change of final voicing per se, but at most only postvocalic or intervocalic voicing, assuming that the stops contrasted in voice rather than aspiration when this took place. We are unaware of evidence that this historical voicing process led to any synchronic phonological pattern of active voicing.

The next candidate for final voicing is Somali (Blevins 2006a:147–148), which has two contrastive stop series, one aspirated and the other unaspirated. Unaspirated stops at the bilabial, coronal, and velar places of articulation are medially voiced between voiced segments; except for the lone (coronal) implosive, they are also spirantized between vowels unless the second vowel is a phonetic schwa offglide as described further in the next two paragraphs (Edmondson, Esling, and Harris 2004; cf. Gabbard 2010:20–22). Word-initially, unaspirated stops other than the glottal and epiglottal-uvular are described by Edmondson, Esling, and Harris (2004) as partly voiced, with at least the bilabial being entirely voiceless in the speech of one informant. Gabbard (2010:7–11) shows 86–115 ms of voicing for the bilabial, non-implosive coronal, and velar unaspirated stops in apparently utterance-initial position (i.e., preceded by a flatline waveform), so perhaps the degree of word-initial voicing varies considerably by speaker. Gabbard (2010:7,10) generalizes that non-uvular voiceless stops are aspirated, but without providing any argumentation for voice in coda stops.

Edmondson, Esling, and Harris (2004:2–5) go into more detail. At the end of a word (perhaps when not followed by a vowel-initial word, though this is not stated), unaspirated stops other than the glottal stop are followed by a schwa offglide 'in careful, overly correct speech,' with non-uvular ones being voiced. In the same environment in 'conversational style,' stops apart from the implosive are voiceless glottalized; the examples are all unaspirated, and it is stated that aspirated stops are not found in codas. Coda unaspirated stops as the first member of a word-medial geminate are also identified as voiceless glottalized. This last point disagrees with Gabbard (2010:14, 28–29), who transcribes the geminates phonetically as voiced singleton stops but provides no experimental evidence on either closure duration or uniformity of closure voicing.
On the whole, regardless of how the laryngeal contrast in Somali stops is phonologically specified, it appears that only unaspirated stops occur in codas, that they are voiceless glottalized in ordinary speech (at least of Edmondson et al.'s consultants), and that in especially careful speech, some underlyingly final stops are voiced with a schwa offglide. The fact that underlyingly final epiglottal-uvular stops are followed by this schwa without being voiced makes it harder to argue that the schwa is merely a result of sustaining underlying voice through stop release. Conceivably, the schwa is an artefact of stop release itself in careful speech, and the more anterior stops have become voiced before it as a result of being phonetically intervocalic and prone to greater degrees of voice leak from a preceding vowel (on the aerodynamic correlation of voice with anteriority, see Hayes 1997; Ohala 1997; Helgason and Ringen 2008). This would entail that spirantization is more restricted than voicing since it would apply only between underlying vowels.

Leaving the treatment of Tundra Nenets to Kiparsky (2006), we turn now to Blevins' (2006a) last example of putative final voicing. Lezgian (Yu 2004) has an alternation between plain voiceless and voiced stops where the voiceless alternant occurs in onsets and the voiced one in codas at the end of a large set of monosyllabic stems (these codas are word-final unless followed by suffixes or by the second element of a compound). The alternating stops contrast with non-alternating voiced stops, voiceless aspirates, and ejectives, all of which are found in both onsets and codas including monosyllabic word-final position. The question is how to represent the alternating pair phonologically. If the onset alternant is taken as basic (as in Yu 2004; and Blevins 2006a:150–152), then Lezgian has a pattern in which otherwise plain voiceless stops are voiced in codas. Kiparsky (2006) instead takes the coda allophone as basic and underlyingly geminate, treating the alternation as a case of onset devoicing and degemination.

Yet while the coda alternant does appear to be the historically more conservative one, it is not clear whether Lezgian learners would consider it either underlying or geminate. As seen in Yu (2004), its closure duration is about a quarter longer than the duration of its voiceless intervocalic onset alternant, about a third longer than onset non-alternating voiced stops, and about a fifth longer than coda non-alternating voiced stops. Would these length differences provide a sufficient basis for treating the coda alternant as geminate while treating all the other sounds just mentioned as singletons? Kiparsky notes that onset devoicing occurs in other languages but does not provide examples where voiced or any other kind of geminate stops surface only when not followed by a vowel. In fact, Yu's (2004) historical analysis is that the coda alternants are and were singletons and that they geminated in onsets (which for independent reasons were generally pretonic), subsequently devoicing and then degeminating in Lezgian. The closure and voicing duration differences between alternating and non-alternating coda voiced stops—25 and 34 ms average in the tokens measured—show that they do not completely neutralize (Yu 2004:81–83). For a critique of research on 'partial' neutralization see Yu (2011), who favors treating it as a decrease in the set of cues available to express a contrast in particular environments rather than necessarily involving actual categorical neutralization in the phonological domain (while also discussing factors that have motivated analyses of the latter type).
If Lezgian does not involve categorical neutralization of alternating and non-​alternating coda voiced stops, then it is not a counterexample to Kiparsky’s claim that such neutralizations are non-​existent.



It would still involve final voicing of an underlyingly voiceless or laryngeally unspecified stop series if the onset alternants are taken as basic, but again it is unclear which alternant learners select as basic. Yu (2004:76–78, 87–88) notes that Lezgian has additional lexically restricted alternations between prevocalic ejectives and word-final voiced stops or aspirates. The word-final voiced alternants of prevocalic ejectives are virtually identical in closure and voicing duration to those of prevocalic plain voiceless stops (the final alternants of the prevocalic ejectives average 7 ms longer in closure duration and 10 ms shorter in voicing duration than those of the plain voiceless stops in the tokens measured; Yu 2004:81). At the same time, the restriction of both voicing alternations (with plain voiceless and ejective onsets) to particular sets of monosyllables within Yu's data could mean that neither alternation is synchronically productive (Yu 2004:93); monosyllables with non-alternating voiced stop codas include both obvious loans and less easily explained forms, and Yu does not discuss other factors that might indicate which patterns are currently productive (see Yu 2004:89–92). Thus we remain open to the possibility that the putative phonological neutralization to voice in Lezgian codas is neither productive, nor a neutralization, nor a process with a voiced outcome.

To summarize the discussion so far, Evolutionary Phonology does not predict that final voicing may be prohibited—merely that its likelihood of occurrence depends on sound change; to date, no single sound change is known to produce final voicing, and hypothetical sequences of sound change like the ones in (14) generate not final but intervocalic or postvocalic voicing patterns (Blevins 2004:108–110, 2006a:16,145). We have seen that synchronic postvocalic or perhaps coda voicing may exist in Somali careful speech and in Lezgian, though the details also seem open to other interpretations.

Besides the three diachronic scenarios in (14) which lead at best to postvocalic or post-voiced voicing patterns rather than final voicing proper, Kiparsky (2006:223–224) offers two other diachronic scenarios capable of producing synchronic final voicing patterns—but arguably at some cost to plausibility. These are shown in (23).

(23) Kiparsky's fourth and fifth final voicing scenarios

    a. Scenario 4: assimilation plus deletion
       Stage 1: tata tanta (no voicing contrast, only nasal codas)
       Stage 2: tata tanda (allophonic voicing after nasals)
       Stage 3: tata tand (apocope after heavy syllables)
       Stage 4: tata tad (loss of nasals before stops)

       Stage 2 is like Japanese. At Stage 3, final vowels are lost after heavy syllables, as in Old English. Finally, nasals are lost before voiced stops, as in Modern Greek.
       • Result at stage 4: word-final allophonic voicing.

    b. Scenario 5: sound change plus analogy
       Stage 1: saz atasa, saz dasa, sas tasa (final voicing assimilation)
       Stage 2: saz tasa, saz dasa, sas tasa (aphaeresis)
       Stage 3: saz tasa, saz dasa, saz tasa (analogical generalization of voicing)

       At Stage 1, final obstruents undergo voicing assimilation. At Stage 2 voicing assimilation becomes opaque because initial vowels that trigger it are lost. Then the voiced obstruent is analogically generalized to all environments.
       • Result at stage 3: word-final voicing.

Kiparsky's Scenario 4 does indeed produce a system wherein stops only occur in word-final position if they are voiced. The postnasal voicing process involved is extremely common cross-linguistically, leading one to think that this might be a promising candidate for producing a final voicing generalization. For the restriction of apocope to post-heavy position, however, the only typological parallel that Kiparsky mentions is a contested interpretation of Old English (see Minkova 1991 for a review of the relevant debate). The next step in the scenario is loss of nasals before (predictably voiced) stops including word-finally—a detail which seems auditorily peculiar since the cues of the nasal would be stronger than those of the stop in that position (see Wright 2004). As a parallel, Kiparsky cites Modern Greek, where nasals are lost before voiced stops in some speech varieties in medial codas and across word boundaries (Arvaniti and Joseph 2000), but it is unclear whether there are any examples with loss of nasals before predictably voiced word-final stops. More importantly, supposing that Scenario 4 is diachronically possible, its conditions seem narrow enough that our lack of any attestation of its outcome might be due to chance. We can hardly insist on the need for synchronic constraints to ban the outcome of a diachronic scenario which in and of itself does not seem very likely.

Scenario 5 also seems questionable because of the trajectory of analogy on which it depends. Consider the set of outcomes of this scenario in (24):

(24) Expansion of Scenario 5 outcomes
    • Stage 1: final voicing assimilation
      /saz/: saz atasa, saz dasa, sas tasa
      /sas/: sas atasa, saz dasa, sas tasa
    • Stage 2: aphaeresis
      /saz/: saz tasa, saz dasa, sas tasa
      /sas/: sas tasa, saz dasa, sas tasa
    • Stage 3: analogical generalization leaving only voiced obstruents finally
      /saz/: saz tasa, saz dasa, saz tasa
      /sas/: saz tasa, saz dasa, saz tasa



At stage 2, underlying word-final voiced obstruents surface as voiced except before an opaquely defined set of words with initial voiceless obstruents, while underlying word-final voiceless obstruents surface as voiceless except before initial voiced obstruents. Under such complex conditions, it is unclear whether learners would be at all likely to level the underlying contrast as needed to reach stage 3. Each of the candidate outcomes [sas] and [saz] is a minority allomorph for one or the other of the two underlying forms /sas/ and /saz/, so we question the likelihood of the forms becoming homophonous. If the problem were avoided by only levelling the allomorphs of original /saz/, this would leave voiceless surface codas in original sas atasa and sas tasa > [sas tasa], making it implausible for learners to postulate a final or coda voicing rule. Additionally, it is unclear why learners would proceed to stage 3 by leveling the underlying contrast in favor of the syntagmatically opaque and featurally marked option (final voiced obstruents); the voiceless alternative is globally just as common, and at stage 2 its distribution is not opaque and thus plausibly an easier basis for generalization.

Thus, out of Kiparsky's five diachronic scenarios for final voicing, three (1–3) do not actually produce surface forms compatible with a final voicing analysis, and two (4–5) seem unlikely to do so.

Along with these considerations we can raise a broader concern with the theoretical framework Kiparsky deploys to rule out final voicing as a possible synchronic phonological process. His analysis crucially relies on the assumption that constraints can single out marked feature values but not unmarked feature values, with the consequence that 'marked feature values [can be] suppressed [but not inserted] in "weak" prosodic positions' (2006:222). Iverson and Salmons (2011) point out that in this scheme, phonologically ex nihilo feature insertion or addition is impossible. Both Iverson and Salmons (2011) and Vaux and Samuels (2005) provide numerous cases of coda enhancement effects that falsify this analysis.

To summarize what we have seen in this section, in identifying areas where phonetics in conjunction with diachronic change predict the emergence of marked systems which fail to materialize, de Lacy and Kiparsky have opened up an important empirical front in the debate concerning whether UG principles of markedness are needed. Of course, these arguments are only as valid as the empirical claims they make. While none of the languages that Blevins suggested as synchronic examples of final or postvocalic voicing incontrovertibly display such phonological processes, Middle Persian is a plausible candidate for post-voiced voicing of obstruents including most word-final obstruent tokens in the language—the closest thing to final voicing for which we have identified a plausible diachronic origin. Kiparsky's claim that the absence of final voicing points to synchronic constraints outside the natural influence of the phonetic apparatus on sound change is weakened by the fact that his own diachronic scenarios either fail to produce it or do so via typologically questionable paths. In light of the above, it seems premature to conclude either that final voicing does occur or that it cannot.
A continued and concerted effort must be made to uncover examples of the supposedly absent structures before any final conclusions can be reached, as Kiparsky rightly notes (2006:234).




8.5 Conclusions

The final voicing phenomenon just discussed is ambiguous not solely with respect to whether or not it exists, but also as to whether or not its explanation would fall under the purview of UG. In the OT conception of UG currently employed by Kiparsky, whatever constraint(s) are responsible for the universal absence of final voicing would presumably fall under the rubric of CON, which constitutes the core of UG. If, however, final voicing is in fact impossible, as Kiparsky claims, the violable constraints of CON may prove insufficient to the task. Assume, for example, that we attempt to ban final voicing by employing the Generalized Alignment schema of McCarthy and Prince (1993) to formulate a constraint along the lines of Align(PWordR, [+stiff]), which would punish output candidates whose rightmost segment is voiced. A constraint such as this would prevent grammars wherein it was appropriately ranked from producing final voicing. By the OT tenet of Violability, though, such a constraint could also be outranked, for example, by a minimally different constraint Align(PWordR, [–stiff]), which if it also dominated the relevant faithfulness constraints would generate final voicing. In the absence of an explicit theory of the contents of CON, there is nothing to bar such a constraint and hence there is no explanation for the putative universal lack of final voicing processes. In light of this fact, a metaconstraint on GEN or CON may be required to ensure the correct typological result, and the membership of such a metaconstraint in UG vs. the LAD is not entirely clear.
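The violability problem can be rendered schematically as follows (the candidate set, the toy faithfulness constraint, and the encoding are all illustrative assumptions of ours, not a worked OT analysis):

    # Schematic OT evaluation: reranking the two alignment constraints
    # flips the winning candidate, illustrating Violability.
    def align_r_voiceless(cand, inp):   # Align(PWordR, [+stiff])
        return 1 if cand.endswith(('b', 'd', 'g')) else 0

    def align_r_voiced(cand, inp):      # Align(PWordR, [-stiff])
        return 1 if cand.endswith(('p', 't', 'k')) else 0

    def ident_stiff(cand, inp):         # toy faithfulness to the final segment
        return 1 if cand[-1] != inp[-1] else 0

    def evaluate(inp, candidates, ranking):
        """Winner = candidate with the lexicographically least violation profile."""
        return min(candidates, key=lambda c: [con(c, inp) for con in ranking])

    cands = ['tat', 'tad']
    print(evaluate('tat', cands, [align_r_voiceless, align_r_voiced, ident_stiff]))  # -> 'tat'
    print(evaluate('tat', cands, [align_r_voiced, align_r_voiceless, ident_stiff]))  # -> 'tad': final voicing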



Problems of this sort in delineating the domains of UG and the LAD led us at the outset of this chapter to suggest a new division of labor between the terms UG and LAD. In this scheme, the territory covered by the term 'UG' in most work in the field is split in two, with 'UG' referring specifically to the initial state of the grammar that all normal humans begin with, and 'LAD' referring to the range of entities that learners employ in tandem with PLD to generate a lexicon and transmute the initial state into a steady state grammar that can map entries from the lexicon onto the desired outputs, and vice versa. We saw that many theoretical constructs commonly attributed to UG, such as the set of distinctive features, may be more properly viewed as components of the LAD in this division of labor, though this depends on the particular phonological theory one adopts. We saw moreover that arguments over the existence of such components, and of UG in general, typically revolve around questions of their language-specificity, rather than assailing the existence of UG as empiricists and the media often claim. As such questions do not actually get at the (non)existence or nature of UG, we then looked in closer detail at two case studies that promised to actually shed light on the matter. Our investigations of *NT and final voicing illustrated how for at least these sorts of cases, the arguments for phonological elements of UG submit at least as well to historical, phonetic, or other extra-UG explanations, raising the possibility that the phonological content of UG might be significantly sparser than most phonologists assume.

What really needs to be investigated at this point is the class of phonological phenomena that cannot be so readily accounted for without recourse to UG and/or the LAD, such as the spontaneous appearance in first, second, and toy language acquisition of phonological phenomena including derived environment effects, identity avoidance effects, and local ordering effects (see Vaux 2012 on Korean language games). Such phenomena provide a promising and largely unexplored area for future research into UG and the LAD, which moreover promises to bring phonological theory closer to its neighbor, experimental psychology, and may provide novel and unexpected insights into the phonological component of our linguistic endowment.



Chapter 9

Semantics in Universal Grammar

George Tsoulas

A kind of Platonism without preexistence. (Chomsky 1966/2009)

9.1 Introduction

The traditional concerns of formal semantic theory in the Fregean mould are with reference and truth.1 The traditional concerns of the developers of Universal Grammar in the Chomskyan tradition have been with the structure-building and dependency-forming tools that are instantiated in the initial state of the language faculty, the tools that the child brings with her to the acquisition process.2 These sets of concerns have not overlapped a great deal. In his 'Reflections on semantics in generative grammar,' the late James Higginbotham wrote:

    Even now, however, many syntacticians are somewhat distrustful of semantics, and still more of the possibility of explaining anything, or anything that touches their professional concerns, about linguistic organization through semantic theory. (Higginbotham 1997:101)

1. Claiming a paper's errors and shortcomings for the author's own cabinet of errors and omissions is routine. In the present case the vastness of the topic made some of the omissions necessary. I hope to correct that in future work. In writing this chapter I have tried to keep the overlap with von Fintel and Matthewson (2008) to a minimum. They have written in a most lucid and illuminating way about universals in semantics and I urge the reader to consult that work at the same time. Thanks to Norman Yeo for discussion and help with various aspects of this chapter. I would also like to thank Ian Roberts for making me write this and for his saintly patience as an editor.
2. This is the technical sense of the term 'Universal Grammar.' I am, in fact, not aware of any non-technical sense that is not a misunderstanding.

Nearly two decades later, however, a lot has changed in the attitudes of syntacticians towards semantics and in the place semantics occupies, and is perceived to occupy, in the mind of those thinking about UG, for whom the main and most difficult task is to produce a specification of the initial state, whatever its contents turn out to be. The currently prevailing research strategy consists in attributing to UG as little as possible … but no less, one should add.3 Hauser et al. (2002) have caused a great deal of controversy by apparently suggesting that UG consisted solely in recursion. If that were the case there wouldn't be much to say about the role of semantics (or anything else for that matter) in UG. But this is not the case; the conception of UG that will be pursued in this chapter actually follows Hauser et al. (2002) closely, but not entirely. Let's take seriously their suggestion that: 'In fact, we propose in this hypothesis that FLN [Faculty of Language in the Narrow sense—GT] comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the interfaces [emphasis mine—GT]' (Hauser et al. 2002:1573). The last part of this quote seems to have been largely ignored, or at least not sufficiently explored in my view, in the discussion on the contents of UG. The minor departure from this conception will be that I will assume that the contents of UG are:

(1) a. The core recursive computational operation Merge.
    b. The universal set of linguistic features.4
    c. The mappings to the interfaces.

In this sense, very generally speaking, we can think of the system as a whole in the following terms: the Universal Feature Set (UFS) forms the repository of the building blocks (features) out of which lexical items (LIs = Sets of Features) are built. LIs are manipulated by Merge (and Agree, etc.) in the course of the syntactic derivation. The output of the syntactic derivation is the input to the formal semantic component which interprets the relevantly structured object and yields a representation which is, in turn, relevant to the language-external, but mind-internal, systems such as belief formation, reasoning, general inferential processes, and so on. This is the interface representation. Whether the output of the syntactic derivation is an actual level of representation is in fact a moot point as long as there are no conditions that apply at that particular level. If we choose to call this level LF, as long as there are no statements of the form 'At LF condition X must be satisfied' then there is no problem in thinking of the passage from the purely syntactic processes to the semantic ones as a seamless process where all that really changes is the type of operations and perhaps, given that the semantic component does not add material, the direction of travel, i.e., the syntax will work bottom-up and the semantics top-down. But, of course, these are ultimately metaphorical notions, though they do resonate with current practice.5

3. Echoing Einstein's famous 'make things as simple as possible but no simpler.'
4. I take features to be the atomic information carrying linguistic units. The overall system may contain either or both privative and valued features. Recent work (e.g., Biberauer 2015) has suggested that at least some features may be in some way emergent (see also chapter 14). Whether or not this turns out to be true it does not affect the system as presented here. If it does turn out to be true, the consequence for the universal feature set will be merely one of size.

Now, with this general picture in mind, the question of the semantic character of UG can be posed in the following way:

(2) At which points in the system do constraints of a semantic nature apply, as a matter of UG principles, and what are they?

Having posed the question in such wide terms I must, in the same breath, warn against any expectation that within the present chapter I will be able to either review or address the multitude of potential topics and empirical areas or do justice to the immensely varied and sophisticated semantic proposals and ideas available in the literature today. Instead of trying to discuss in much detail any specific proposals, I will look at the different areas where it is reasonable to expect UG-type semantic constraints to apply, lay out the conceptual case for them, and discuss some possibilities. At the same time I will adopt a very narrow approach and limit myself to the fundamental concepts of order, reference, truth, semantic computation, etc. The reason for this is, of course, that the landscape beyond is vast but the basic concepts cut across it. I will also adopt an approach that keeps things very close to the syntax. As mentioned earlier we are looking for the semantic constraints on the computational system.

The question of semantics in UG has a peculiar history, or rather a non-history. Much work in the generative tradition has assumed—without much argument—that most if not all of semantics of the relevant sort was UG. The informal reason for this is the assumption that semantic properties and generalizations are somehow too abstract to be subject to reasonable parameterization. In other words, there is little in the evidence available to the child to set parameters that would be formulated in primarily semantic terms. Though there is much truth in this position, at the same time there has been little effort to spell out exactly what the content of the semantic component of UG is. Equally, there is in fact only one semantic parameter that is formulated explicitly enough to allow an evaluation of the resulting theory, namely Chierchia's (1998b) Nominal Mapping Parameter, which has caused much debate and on which the jury is still out. Perhaps this relative lack of clarity is, in part at least, at the root of the 'distrust' Higginbotham saw on the part of syntacticians. At any rate, the hope is that we will be able to see a little more clearly in some corners of this landscape.

5. This general picture leaves aside obvious connections and interfaces internal to the system such as the interfaces with local and global pragmatic processes. On this matter, see Tsoulas (2015b) for some relevant commentary.



The rest of this chapter is organized as follows. Section 9.2 deals with questions of ordering of arguments and modifiers. Section 9.3 addresses issues of semantic computation and the fundamental combinatorial principles that form part of UG. Section 9.4 deals with conceptions of reference and truth. The question here is to what extent principles governing the distribution of referential values ought to be part of UG and in what form. Section 9.5 addresses the issue of the relation between syntax and semantics as mediated by the particular conception of the lexicon put forward here. Section 9.6 concludes the chapter.

9.2 Order

In this section we look at certain issues regarding ordering of elements in the structure. Before doing so, we set out a little background on the way we will understand the elements of the lexicon and their origin.

9.2.1 Merge and Concepts

Humans have a store of concepts some of which end up as the meanings of structured linguistic expressions, either simple lexical items or more complex structures, while their essence remains the same. This is how Murphy (2002) puts it: 'Concepts are a kind of mental glue, then, in that they tie our past experiences to our present interactions with the world. … Our concepts embody much of the world, telling us what things there are and what properties they have.' Without, say, the concept of chair, 'when we encounter new exemplars of this category we would be at a loss.' Concepts are abstract and criterial. In the Euthyphro Socrates discusses the nature of forms and specifically suggests that in answer to the question 'what is piety?' we must assume that 'there must be just one form of piety for all pious things (actions, people, and so on)' (Irwin 1999:145). The idea, then, is that the general form, through some process, comes to manifest itself as the meaning of linguistic elements. We can identify the general form with concepts.

The way in which concepts turn out to be meanings of linguistic elements is a matter of dispute. One idea goes like this: there is a very small set of primitive (semantic) features that can be used as a skeleton to which concepts (whatever they are) can be attached and create complex meanings. A variant of this idea is that language creates a framework where individual elements function as instructions to the cognitive systems to access concepts and create representations. Universal Grammar constrains the possible combinations by specifying the set of features that may be used and the way they may be combined. A different, and perhaps more illuminating, way in which we can cast this idea is in the familiar terminology of the interfaces. The proposal is that the lexicon is the interface between the UFS and the conceptual store. At the same time, the lexicon is also the interface between UFS and narrow syntax:



(3)
              Concepts
                 |
    UFS ----- Lexicon ----- Narrow Syntax

Making this a little more specific, the basic idea is that the relevant mapping is effected through the fundamental recursive operation Merge applying to elements of UFS (features). Let us, however, recall the general definition of Merge:6

(4) Let α and β be Syntactic Objects. Merge(α,β) = {α, {α,β}}, where α is the head of the construction and labels the projection.

This definition allows things of type 'Syntactic Object' (SO) to be merged. It is doubtful that features are SOs on their own. For the proposal above to work we therefore need an extended definition of Merge as follows:

(5) For any α, β: Merge(α,β) = {α,β}, where, for our purposes, α and β may be features drawn from UFS or SOs.

If it is correct that single features are not SOs on their own, let us further assume a Merge Like With Like principle that would prevent us from merging SOs directly with features and vice versa. One advantage of this redefinition of Merge in these very general terms is that it leaves all labelling issues to be decided separately by, say, Chomsky's (2013, 2015) labeling algorithm.

Summarizing, the general idea then is rather simple. Merge is responsible for the creation of all syntactic structure whether this is the structure of syntactic objects (words and phrases) or the structure of feature sets that constitutes lexical items. As a result, we can see the mapping between the UFS and the lexicon as a mapping that may be semantically constrained. The relevant constraints are prime candidates for being part of the semantic aspects of UG. If LIs are structured feature sets (corresponding to concepts) the structure will reflect order in the syntax. The first area where this can be seen is the domain of argument structure.

6. There are several definitions of Merge but their differences are immaterial for our purposes. The definition in (4) is general enough for the point in the text.
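For concreteness, the extended definition in (5) can be sketched as follows, with features rendered as strings and SOs as sets; the Merge Like With Like check and the sample feature names are our own illustrative assumptions:

    # Minimal sketch of (5): Merge(a, b) = {a, b}, restricted by Merge Like
    # With Like. Labeling is deliberately left out (cf. the text above).
    def is_feature(x) -> bool:
        return isinstance(x, str)

    def merge(a, b):
        """Merge(a, b) = {a, b}; a and b must both be features or both be SOs."""
        if is_feature(a) != is_feature(b):
            raise ValueError('Merge Like With Like: feature cannot merge with SO')
        return frozenset({a, b})

    # Featural Merge builds a lexical item as a structured feature set ...
    li = merge(merge('V', 'cause'), merge('V', 'telic'))   # hypothetical features
    # ... and Merge of SOs then builds phrases from such items:
    phrase = merge(li, merge('D', 'def'))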




9.2.2 Argument Structure

The main question here is the ordering of arguments. On the one hand there is the more or less (by now) traditional idea that the ordering of arguments results from some kind of hierarchy of thematic roles which translates a semantic representation into syntactic structure.7 The goal here is to predict, based on a semantic hierarchy of thematic roles assigned to different arguments, which of these arguments will end up as an object or a subject and so on. One of the earliest such hierarchies is due to Fillmore (1968), and his subject selection rule shows how the issue might be tackled algorithmically:

(6) If there is an Agent it becomes the subject; otherwise, if there is an Instrument, it becomes the subject; otherwise the subject is the Objective (Patient, Theme).

Whatever the actual empirical success of this rule, it gives us an idea of the way the mapping from lexical semantic properties to syntactic structure might be effected (a code sketch of the rule as a procedure is given after (7) below). There is no shortage in the literature of proposed thematic hierarchies. Levin and Rappaport Hovav (2005:162–163) give the following sample:8,9

(7) No mention of goal and location:
        Belletti and Rizzi (1988)     Agt > Exp > Th
        Fillmore (1968)               Agt > Inst > Obj

    Goal and location ranked above theme/patient:
        Grimshaw (1990)               Agt > Exp > G/S/L > Th
        Jackendoff (1972)             Agt > G/S/L > Th
        Van Valin (1990)              Agt > Eff > Exp > L > Th > Pat

    Goal and location ranked below theme/patient:
        Speas (1990)                  Agt > Exp > Th > G/S/L
        Carrier-Duncan (1985)         Agt > Th > G/S/L
        Jackendoff (1990)             Act > Pat/Ben > Th > G/S/L
        Larson (1988)                 Agt > Th > G > Obl (Man/Time)
        Baker (1989)                  Agt > Inst > Th/Pat > G/L

    Goal above patient/theme; location ranked below theme/patient:
        Bresnan and Kanerva (1989)    Agt > Ben > Rec/Exp > Inst > Th/Pat > L
        Givón (1984)                  Agt > Dat/Ben > Pat > L > Inst

7. For a detailed exposition of the data and issues surrounding argument hierarchies and argument realisation, I refer the reader to Levin and Rappaport Hovav (2005), which is, to my knowledge, the most complete survey of the issues and analyses to date.
8. The word sample is key here.
9. L = Location, S = Source, G = Goal, Man = Manner.
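As promised above, here is a toy procedural rendering of Fillmore's subject selection rule in (6); the role labels and the dictionary-based clause representation are our own illustrative assumptions:

    # Fillmore's (1968) subject selection rule in (6), as a procedure.
    def select_subject(arguments: dict):
        """Return the argument of the highest-ranked role present."""
        for role in ('Agent', 'Instrument', 'Objective'):
            if role in arguments:
                return arguments[role]
        return None

    print(select_subject({'Agent': 'the farmer', 'Instrument': 'the key',
                          'Objective': 'the door'}))   # -> 'the farmer'
    print(select_subject({'Instrument': 'the key',
                          'Objective': 'the door'}))   # -> 'the key'
    print(select_subject({'Objective': 'the door'}))   # -> 'the door'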

The question that arises from our perspective is what, if anything from the above, constitutes core semantic primitives that form part of the semantic character of UG. One answer to the question would be that one (or perhaps more) of the thematic hierarchies mentioned is a principle of UG to be supplemented with a mapping algorithm which can be of different types, i.e., an algorithm that translates (and preserves) thematic prominence directly onto hierarchical syntactic structure, or via a set of explicit linking rules. Some doubt on the view that attributes autonomous existence to the thematic hierarchy was cast by Hale and Keyser (1993:65) who write:

    While we feel that the grammatical effects commonly attributed to the thematic hierarchy are genuine, we are not committed to the idea that the hierarchy itself has any status in the theory of grammar—as an autonomous linguistic system that is. And we are sympathetic with the view (expressed by a number of scholars, often tacitly or indirectly) that questions the autonomous existence of thematic roles as well.

This type of distrust of the primacy and primitiveness of thematic notions (role, hierarchy) has led to a revival of the lexical decomposition approaches that were developed initially in the generative semantics literature.10 This so-called constructivist approach to argument realization and, by extension, to the creation of verbal meanings more generally has been developed among others by Hale and Keyser (1993); Harley (1995); Borer (2005); and Ramchand (2008, 2011). In this kind of approach to argument structure, argument interpretation is associated with specific positions in the structure. The fundamental insight that is shared to a greater or lesser extent by all these approaches is the central role played by events and their substructure in the account of argument structure. Argument structure is licensed by functional heads that correspond to parts of event structure, or eventuality descriptors as Ramchand (2011:453) puts it. Under this view, V(P) meanings are constructed from smaller pieces that are realized as functional heads. Consider as an illustration the structure Ramchand proposes (initP is the causing projection, procP the process projection and resP the result projection):

(8) John split the coconut open.

10. See for example Lakoff and Ross (1976).



190   George Tsoulas (9)

initP



DP3 John

init

procP

split DP2 the coconut

proc

resP

split DP1

res

AP

the coconut

split

open

While other proposals differ in the mechanisms involved and other aspects of the implementation, the general idea should be clear. The second fundamental insight that these approaches share is that the important semantic distinctions within that area of the VP are aspectual in nature. Again with differences of implementation concerning, for example, the status of telicity as derivative or primary and the way arguments and aspectual information interface (directly as in Ramchand’s and Borer’s views, or through separate principles, as for example in work influenced by Tenny’s 1992 Aspectual Interface Hypothesis, etc.). The details can be set aside. The question that arises here is how much of this information is properly lexical and how much is not. For the purposes of this discussion I understand functional categories as being properly lexical in the relevant and appropriate sense. They are not lexical in the sense that they do not host roots. They are lexical in that they are part of the lexicon and they encode specific meanings which do not arise configurationally or in some other way. This is what Levin and Rappaport Hovav (2005) call the projectionist vs. constructionist debate. It is, of course, impossible to resolve this debate in the confines of the present chapter. However, we can draw some conclusions, albeit conditional ones. First, whether argument structure and realization is implemented through a thematic hierarchy or a set of projections, the essential point is that UG will have to encode in one way or another the hierarchical ordering in question. Ceteris paribus, this would be either the thematic hierarchy itself or the order of projections which would presumably be implemented in terms of selection.11 To a certain extent, Hale and Keyser’s (1993) comments cited above notwithstanding, this is a technical decision that depends on the framework within which these generalizations are to be expressed. One way to express this ordering would be to encode the order directly in the lexical entries, in the way Adger (2010) suggests. Another way would be to suggest that, 11 

Ramchand’s proposals are somewhat different, and we return to them in the next section.



Semantics in Universal Grammar    191 if the earlier discussion of Merge as the operation that creates lexical items is correct, the order of arguments reflects the order of Merge of features. In other words, the proposal, under present assumptions is that the thematic hierarchy is a semantic constraint, part of UG, that dictates the order of featural Merge that creates LIs. This account embraces a version of lexicalism at odds with some of the strongly decompositional accounts mentioned above.12 We will return to the mechanics of the latter suggestion shortly.

9.2.3 More Order In order to be able to abstract sufficiently away from the technical issues, let us throw into the mix another case of hierarchical organization that may be semantically motivated. This concerns the functional hierarchy in the higher fields (above vP) such as those due to Rizzi (1997) (10) for the CP area, Beghelli and Stowell (1997) for scope related hierarchies (11), and Cinque’s (1999) hierarchy of adverb-​hosting functional projections (12):13 (10)

Rizzi’s (1997) CP area elements [ForceP [TopP* [FocP [TopP* [FinP [IP … ]]]]]]

(11)

Beghelli and Stowell’s (1997) scope related heads [RefP GQP [CP WhQP [AgrSP CQP [DistP DQP [ShareP GQP [NegP NQP [AgrOP CQP [VP … ]]]]]]]]

(12)

The universal hierarchy of clausal functional projections (Cinque 1999:106) [Moodspeech-​act frankly… [Moodevaluative fortunately… [Moodevidential allegedly… [Modepistemic probably… [Tpast once… [Tfuture then… [Modirrealis perhaps… [Modnecessity necessarily… [Modpossibility possibly… [Asphabitual usually… [Asprepetitive again… [Aspfrequentative(I) often… [Modvolitional intentionally… [Aspcelerative(I) quickly… [Tanterior already… [Aspterminative no longer… [Aspcontinuative still… [Aspperfect(?) always… [Aspretrospective just… [Aspproximative soon… [Aspdurative briefly… [Aspgeneric/​progressive characteristically… [Aspprospective almost… [Aspsg.completive(I) completely… [Asppl.completive tutto… [Voice… well… [Aspcelerative(II) fast/​early… [Asprepetitive(II) again… [Aspfrequentative(II) often… [Aspsg.completive(II) completely…]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]

12 

Having said that, as will become obvious, I think that the decomposition is real but due to the syntactic process. 13  Needless to say, the works cited have sparked considerable technical and conceptual as well as empirical debate and have given rise to important bodies of work by these authors and others. The general directions are well known. Space constraints prevent me from discussing specific proposals in detail but the general points remain valid.



192   George Tsoulas Although the order in which elements eventually appear is a matter for the syntax to implement, the hierarchies themselves are semantic in nature.14 The same intuition also guides Ernst’s (2001) theory of adverb and adjunct placement which does not employ the same functional heads as Cinque but assumes a scopal hierarchy. What appears to emerge from the discussion above is that an appeal to semantics is made either directly in order to account for the order of elements, both in the functional field and for arguments in the vP, or in an underlying fashion as determining the order of selection of functional features. It would appear then that at the most abstract level semantic principles in UG reflect order. Exactly how the ordering principles will be implemented is a different matter; what Universal Grammar provides is a strict (scopal) ordering of elements or a strict ordering of the way elements enter syntactic derivations. The ordering of modifiers is, however, different from that of arguments. The difference lies in the fact that, at the end of the day, within the v/​VP domain there is a unifying element (perhaps spread across a number of categories), namely, of course v/​V, whereas there appears to be nothing of the sort in the case of modifiers. One hopes that, perhaps the difference is not so great as to preclude an eventual unification. Consider for a moment the following idea: a standard criticism of theories of a broadly cartographic sort is based on complexity in cases where the entire sequence is not needed. The derivational conception of the lexicon that is put forward here allows us to think of the relevant features as bundled in each relevant phase head in a particular order (semantically constrained) and then unbundled syntactically15 in the same way as for arguments. If this is on the right track the first major area where universal semantic constraints apply is the derivation of specific LIs and takes the form of the relevant hierarchies. Formulating the exact nature of these hierarchies would take us too far afield but the concept is clear.16 We now turn to the second area, namely semantic computation.

9.3  Semantic Computation What I call the Computational Component of the semantic part of UG corresponds to the ways available for compositionality to be properly computed. The principle of compositionality itself in the form most widely accepted and used in natural language semantics is stated in (13): (13)

Compositionality of Meaning The meaning of a compound expression is a function of the meaning of its parts and the syntactic rule by which they are combined.

14 

This is arguably less so in the case of Rizzi’s proposals. More on this later. 16  Similar considerations apply to the nominal domain. It would take us equally far afield to discuss those but the conjecture is that the form of the constraints is the same. 15 



Semantics in Universal Grammar    193 Now, the first part of the principle is for the most part directly derivable from the structures that Merge produces as long as semantic computation is assumed to be local. To be clear about what we mean by derivable at this and later points: we take it that the input to the semantic component is syntactic trees which are interpreted in a local and incremental fashion. This is not a logical necessity, but under the inescapable assumption that syntactic structure is created by Merge it is the simplest and optimal option. If this is correct then the only statement that UG will need to include in this regard is: (14)

Principle 1 Interpretation proceeds locally.

The second part of (13) assumes that there is a semantic translation of the syntactic rules used to combine expressions. This was formalized by Bach (1976) as the rule-​to-​ rule approach or the parallelism thesis by Hintikka and Kulas (1983). Janssen (1996) expresses it as follows (15): (15)

For each syntactic rule, which tells how a complex expression Ɛ is formed from simpler ones say Ɛ1, Ɛ2, …, Ɛn there is a corresponding semantic rule which tells how the meaning of  Ɛ depends on the meanings of Ɛ1, Ɛ2, …, Ɛn.

Assuming something like (15) is, however, somewhat problematic. The theory of syntax that we are assuming here (and its predecessors going back to Chomsky 1986b, Lasnik and Saito 1992, etc.) does not countenance more than one syntactic rule, i.e., Merge.17 The question then is: do the different semantic rules (modes of composition) follow from Merge? There are two ways of going about this issue. First, we can assume that the only principle of semantic composition available is indeed functional application. This is what Heim and Kratzer (1998) call Frege’s conjecture and according to von Fintel and Matthewson (2008) the null hypothesis.18 Assuming this to be the case entails a number of complications in the analysis of various constructions, but we will leave this aside. From our perspective, if FA is the only principle and semantic interpretation is fully type-​driven (Klein and Sag 1985) then UG must also contain the principles that allow type-​theoretic adjustments. The second way is to accept more than a single composition principle. Heim and Kratzer (1998) propose a principle of predicate modification which they define as follows: (16)

Predicate Modification If α is a branching node and {β,γ} is the set of α’s daughters, then α is in the domain of [[ ]‌] if both β and γ are, and [[β]] and [[γ]] are both in D. In this case, [[α]] = λx ∈ De. [[β]](x) = [[γ]](x) = 1.

17 Or mutatis mutandis Move/​Affect-​α. 18 

See also Adger and Tsoulas (1999) who suggest that FA is the operation that corresponds to Merge.



194   George Tsoulas This principle pertains to the composition of structures that in simplified form look like the following: (17)

NP AP big



NP AP

N

blue book

More generally the composition of modifiers like blue and big whose denotations are is, according to Heim and Kratzer, best analyzed as given by predicate modification. The obvious conclusion to draw from this is that the rule-​to-​rule hypothesis must be refined somewhat in the sense that we would have to accept that the correspondence between semantic and syntactic rules must be one-​to-​many. The many semantic rules include at least Function Application and Predicate Modification. There are, however, other composition principles proposed in the literature. Chung and Ladusaw (2004) propose that a new operation should be added to the standard list of composition operations, namely Restrict, which works as follows: (18)

Restrict Restrict (λyλx[feed′(y)(x)], dog′) = λyλx[feed′(y)(x) ∧ dog′(y)]

In other words, Restrict is a mode of composition that allows two expressions to compose without one of them saturating an argument position of the other. The evidence that Chung and Ladusaw (2004) adduce in favor of Restrict relates to the determiner system of Maori and object incorporation in Chamorro. As von Fintel and Matthewson (2008) point out, it is interesting to note that although Chung and Ladusaw intend Restrict as a universally available principle, strong evidence is only available in a few languages.19 It should be pointed out, however, that given that not all languages should manifest all the available options that UG allows, then admitting Restrict among the candidates for UG principle is not problematic. All these principles20 are candidates for the semantic content of UG. Whether they are or not remains an empirical question. Intimately related to the question of semantic composition and the degree to which it is solely type-​driven or requires other operations too is the issue of type-​changing operations. The classic proposal in this area is found in Partee (1987) who argues that N/​DP 19  Von Fintel and Matthewson point at Matthewson (2007), which suggests that the Salish determiner system is compatible with Restrict but does not constitute strong evidence in its favor. In the same vein Tsoulas (2010) suggests that multiple-​subject constructions in Japanese, Korean, Arabic, etc., may provide further evidence in favor of the operation Restrict. 20  And others that we do not discuss in detail here such as function composition, event identification, etc.



Semantics in Universal Grammar    195 meanings can denote in three distinct model-​theoretic domains, namely the domain of entities , the domain of properties and that of generalized quantifiers and there is a small number of operations that allows type shifting between the different domains. Her schema and definitons are as follows: (19) LIFT

LOWER

ed Pr m No nt Ide a Iot

BE

A

TH

E



  The type shifting operations are defined as follows:21 (20) a.

lift: j → λP[P(j)]

b. lower: This operation maps a principal ultrafilter onto its generator: lower(lift(α)) = α c.

ident: j → λx[x = j]

d. iota: P → ιx[P(x)] e.

nom: P →∩P

f.

pred: x →∪x

Taking these mapping operations as basic we have an almost complete view of the semantic constraints and operations on the computational system. These operations can and should be conceived of as constraints on the mapping to the interfaces. However, these constraints on the mappings to semantic representations have recently come under serious criticism. The core of the criticism is that unlike what appears to be the case, the principles of semantic composition and compositionality place hardly any 21 

lift and lower, iota and ident, and pred and nom are inverses.



196   George Tsoulas constraints on human languages. This is so, the criticism goes, because the formal language (the λ-​calculus) is too powerful, it is too easy to satisfy the requirement (after all the meanings of the components of a complex expression and the rule that combines them is all there is) and, especially if type shifting is allowed, then the contribution of the principle is unclear (i.e., there is no constraint if we are allowed to change the types when it suits us). Criticisms such as the above in various degrees of strength have been voiced most recently by Pietroski (2005, 2008, 2011); Higginbotham (2007); Ramchand (2008, 2011). Higginbotham (2007) simply suggests: neither to insist upon some version of compositionality at all cost nor to propose that exactly this or that syntactic information is available to the interpretive component of a grammar; but rather to take compositionality in a strong form as a working hypothesis, and see what the consequences are.

Ramchand (2008, 2011) on the other hand suggests that at least insofar as the vP is concerned the principles involved are altogether different. The semantic glue that she proposes consists in the following: (21)

a.

Leads to/​Cause (→)

Subevental embedding

b.

Predication

Merge of DP specifier

c.

Event Identification (conjunction)

Merge of XP complement

This is an intriguing proposal, but even if we accept it (or a variant thereof) it is unclear that it would be replacing the other principles mentioned above unless we can find a way to extend this type of composition to the higher layers of the clause. Finally, Pietroski (2005, 2008, 2011) builds a very different system where all (or almost all) concepts are monadic and are put together through a conjunction operation Conjoin. Pietroski’s system requires a richer ontology. His system is complex, and we cannot discuss it properly here. If this proposal turns out to be successful and empirically adequate it would be the most minimal of all, though as already noted this minimality is offset by the richness of the ontology. We cannot discuss these criticisms of compositionality in detail. Suffice it to say that if we take it that assuming compositionality, or the locality of interpretation principle (14), does not necessarily entail that the λ-​calculus is the only formal tool for the job, and using something different like the principles mentioned above does not take away from the power of compositionality as a restrictive constraint. We also have nothing better at the moment. To conclude this section, principles that fall under the umbrella term of compositionality constrain significantly the ways that meanings are put together to produce interface representations. They are restricted in number and general enough to be good candidates for UG. In the next section we turn to two of the core defining concepts of semantics as we conceive it here, namely reference and truth.



Semantics in Universal Grammar    197

9.4  Reference, Truth, and the Like That natural language serves to talk about the world is undeniable. The recognition of this aspect of language use has put the notion of reference at the heart of formal semantics. At the same time, it is also one of the most problematic aspects of the theory. Chomsky has been one of the most vocal critics of the approach to semantics that puts reference center stage. However, he has not offered much by way of replacement. There have, however, been recent proposals that aim at reconciling the generative enterprise with referential semantics. The idea that is shared by these proposals is that not only is reference an essential part of language but also the way we refer to objects in the world is particularly constrained by the fact that it is done through language and language is grammatically organized in the way it is. Longobardi (1994) is one of the first serious attempts to connect specific syntactic processes to referential possibilities and understand these syntactic processes as referential strategies. In a nutshell, his proposal was that for elements that refer directly to individuals, like proper names, the syntactic movement of N to D is a prerequisite. From the point of view that matters most to us here, the core innovation of this proposal (and its subsequent development) lies less with the particular idea that N-​to-​D movement is necessary for object reference (i.e., non-​quantificational denotation) and more with the fact that it syntacticizes, so to speak, the notion of reference. At the same time, it separates the referring part of reference from the referent itself, elegantly sidestepping most of the arguments that have been raised against the notion of reference as a core part of language and linguistic analysis. A further elaboration of Longobardi’s proposal and extension to the clausal domain is due to Sheehan and Hinzen (2011) who suggest that: These denotational strategies allow for the broad generalization that as we move from the phase-​interior to the phase-​edge, intensional semantic information is converted into extensional semantic information. Reference, in the extensional sense, is in this way an ‘edge’ phenomenon.

These ideas are important for our understanding of the way syntactically structured units (SOs) can refer, what remains obscure is the identity of the instruments of reference which would allow an understanding of the distribution of referential values across the structure, in other words, allow for the establishment of anaphoric relations. Being capable of referring does not ensure that the relevant connection can be made independently. For many years the device that linguists have relied on is that of an index. But that concept has been the object of much suspicion in the last 20 years or so and, perhaps unsurprisingly, no good replacement has been offered. In the next few sections I will address the issue of indices from a general point of view. If the concept can be defended appropriately then the relation of reference will make a better candidate for UG status.



198   George Tsoulas

9.4.1 The Trouble with Indices 9.4.1.1 Reference and Quantification As a device indices, (notation: numerical subscripts) on pronouns and DPs, immediately suggest themselves for the notation of relations of identity (in the relevant sense).22 Consider for instance the opening sentence of Fiengo and May (1994): (22) In characterizing syntactic relations there is no notion more central than that of an index. This is because it is this notion that allows us to speak of various elements in a structure as ‘going together’ in some sense as being the same or different. Important though it is, this use of indices is in fact derivative. The primary purpose of an index is, of course, reference.23 A pair of coindexed elements co-​refers by virtue of the fact that a single indexed element refers. In assigning values to (unbound) pronouns the assignment function g is most commonly thought of as a function from the set of natural numbers to individuals. This state of affairs, however, does not automatically mean that indices are indispensable. They may be very useful but like many devices which present some notational convenience they may, eventually, not resist analysis. So the question that arises here is whether indices are necessary parts of the linguistic representation. There are several ways one can approach this question and certain traps ought to be avoided. If indexation is necessary to the theory of grammar and the primary purpose of indexation is reference, then reference is necessary to the grammar. Let us clarify a little further. Reference is a suspicious notion when seen strictly within the bounds of language–​world connections.24 There are essentially two ways in which reference can be considered suspect. First as a matter of narrow syntax. It does not seem to play much of a role within the computational system in the sense that there is little in the computational system that cares about the worldly entity associated with an expression Ɛ. The second source of suspicion is ontological commitment. If my saying (23): (23) Lenin is sleepy

22 

I am concerned in this chapter with referential indices alone. The general concept of an index assumed is the one given in the text. The type of index that I am not concerned with here is David Lewis’s concept of ‘index as an n-​tuple of features of context’ packaged together (the coordinates of the index). See Lewis (1981) for more details. 23  Baker (2003) comments in the same spirit on the subject of indices. His indices are in fact formally different as they are pairs of integers and have different conditions. Baker’s criterial essentialism merits much more discussion than I can include here. It raises conceptual and philosophical issues that would take us in a very different direction. On the general philosophical issues see Wiggins (2001). 24  This is a view that Chomsky has repeatedly expressed in many places and in different ways. A representative selection: Chomsky (1975, 1992, 1994,1995a). See also the remarks in Stemmer (1999) and elsewhere. For an interesting view that holds that referential semantics is indeed possible within Chomsky’s assumptions and compatible with them, see Ludlow (2003) and Chomsky’s response in the same volume (Chomsky 2003b); see also c­ hapter 3.



Semantics in Universal Grammar    199 commits me to the existence of the individual Lenin, then, as Chomsky has often remarked, a reference-​based theory leads us to the commitment to the existence of entities like flaws in arguments, Joe Sixpack, and so on to limit ourselves to non-​relational entities, lest we enter Meinong’s jungle. But, at the same time, ontological commitment ought not to be the barrier that Chomsky makes it to the development of a semantic theory based on the idea that linguistic expressions are mapped onto ontologically independent entities. Following, among others, Longobardi (2008) we can simply take for granted a ‘(mental) ontology and its syntax’ with different types of elements (namely, kinds, objects, and properties) which constitute the ontology of the C-​I system. The point here is the ontology in question is clearly an I-​ontology, and one might feel less worried about making commitments concerning the existence of particular kinds of things in that ontology. Now, if these assumptions are on the right track there must be a device to make the connection between elements of the Domain and expressions of the language. An alternative one might try to pursue would amount to developing a theory whereby referential expressions are associated with reference conditions, not actual referents.25 In a way this also makes commitment to odd entities disappear. But if that takes care of reference, our woes do not end here. There is also quantification. The Tarskian, satisfaction-​based understanding of quantification has forced us into what has now become the orthodox view of quantification, the so-​called objectual view, i.e., quantifiers range over actual objects. As Chomsky (1993:34 and note 42) correctly points out this idea is inappropriate for the semantics of natural language. Specifically, Chomsky writes: (24) […] some of the notions used here (e.g., objectual quantification) have no clear interpretation in the case of natural language, contrary to common practice. Natural language quantification looks a lot closer to what R. B. Marcus (1972) called substitutional quantification.26 Although it is beyond the scope of this chapter to get into the details of substitutional quantification suffice it to say that it has two particularly crucial properties. First, it carries no ontological commitment. Here is Marcus (1972) on this point: When ontological inflation is avoided, the apparent anomalies that arise in going from so simple a sentence as: A statue of Venus is in the Louvre to (∃x) (A statue of x is in the Louvre) are dispelled. Whatever the ontological status of Venus, it is not something conferred to by the operation of ∃-​quantification substitutionally conceived. 25 

An approach along these lines has recently been elaborated in Sainsbury (2005). For more on substitutional quantification, see Kripke (1976); Baldwin (1979); and references therein. 26 



200   George Tsoulas The second interesting property is the fact that substitutional quantifiers quantify over expressions rather than objects (hence in fact the lack of ontological commitment and existential import).27 Under the set of ideas briefly sketched above, it would then seem that any misgivings one might have in using indices for reference and quantification can be safely put aside. In other words, if UG provides a tool for reference, an index (whatever it may actually turn out to be) seems like a good bet. A second source of criticism, however, comes from Linguistic Pragmatism.

9.4.1.2 Linguistic Pragmatism Although connected in some way to the comments in the previous section, this one is a bit different. The basic concern that proponents of linguistic pragmatism have with indices is that they do not do justice to the complexity involved in reference assignment in utterance interpretation. In their view, invoking an index to assign reference or a context to specify a meaning for an utterance is in fact not possible because utterances don’t come with contexts and indices attached to them. Most of the work that hearers have to do is pragmatic and inferential. Here is how Stephen Neale expressed this point of view in a recent article (Neale 2004:81–​82): (25)

Idealization and abstraction from the details of particular speech situations or contexts is unavoidable if work is to proceed. To this extent, we may temporarily avail ourselves of the formal ‘indices’ or ‘contexts’ of indexical logics in order to anchor or co-​anchor the interpretations of indexical or anaphoric expressions that are not our primary concern at a certain point of the investigation. We should not take indices themselves particularly seriously, however. They are useful transitory tools, methodological or heuristic devices, not serious posits in any theory of utterance interpretation.

According to Neale then, an index is something that we use when we don’t want to bother with what it stands for in order to get on with other, more pressing matters. But whatever one might think of utterance interpretation the question of whether indices occur in linguistic representations is quite separate. Interestingly, in another paper, Neale comments directly on the use of indices in linguistic representations when they are used to represent binding relations. In Neale (2005:214), he writes: (26) We turn now to indexing. To say that α and β are co-​indexed is to say something of syntactic import that may or may not have interpretive consequences depending upon how indexing is elaborated. Let us suppose each DP is assigned some index which we might indicate as a subscript. What prevents co-​indexing being of ‘merely syntactic’ interest (if this even makes sense) is its interpretation. The whole 27  An alternative would be to embrace noneism and divorce completely quantification and existence. For a defence of noneism, see Priest (2005); for a historical perspective, see Priest (2008).



Semantics in Universal Grammar    201 point of indices would disappear if co-​indexing were not meant to indicate something of interpretive significance. When ‘himself ’ takes the same index as ‘John’ […] we think of the two expressions as linked for the purposes of interpretation. In a rather perverse manner it seems that Neale’s comment in (25) is far more consonant with the view that Chomsky has been advocating based on the inclusiveness condition. But Neale’s comment only in part refers (if at all really) to binding relations (he mentions co-​anchoring which is a more general notion), as opposed to the comment in (26) which is incompatible (at least at first sight) with inclusiveness.28 Before we get into a discussion of inclusiveness (the third source of trouble), I think we can also safely put aside the concerns arising from Neale’s linguistic pragmatism.

9.4.1.3 Inclusiveness The inclusiveness condition also known as the No Tampering Condition (NTC)29 (Chomsky 1995b:228) is given in (27): (27) A ‘perfect language’ should meet the condition of inclusiveness: any structure formed by computation (in particular π and λ [PF and LF]) is constituted of elements already present in the lexical items selected for [the numeration]; no new objects are added in the course of computation apart from rearrangements of lexical properties (in particular, no indices, bars, traces, lambdas). Why should a perfect language meet the IC? The idea here is that the IC/​NTC is motivated as a natural requirement on efficient computation, perhaps a third-​factor effect, and its status as such is not seriously in question. By this, I mean that it is certainly more efficient for the computation to proceed without manipulating the internal structure of lexical items or by creating intermediate projection levels and so on.30 One particular, often cited, much less properly discussed, and rather problematic consequence of inclusiveness is the ban on indices. Within the context of the previous sections, and setting the NTC/​IC temporarily aside, for the sake of the argument the question we need to ask is this: the NTC notwithstanding, wouldn’t indices be required by principles of efficient computation, assuming that such principles substantially constrain the efficient and successful mapping to the interface?31 I think that the answer is positive. Indexation, as 28 

Neale is aware of this and says so in a footnote (note 86, p. 214). As I understand it, inclusiveness is really entailed by the NTC, as, say, adding an index would be tampering with the Lexical Item. The NTC is a more wide-​ranging condition. 30  Although, valuation of unvalued features might, in certain circumstances, count as tampering but I will leave this aside. I assume we must accept at least that much tampering, 31  Notice also that the question is independent of a variety of other similar questions regarding, say, the best place for the binding theory to apply and so on. This is a basic question about the role of inclusiveness and its consequences for the role of reference in the grammar. Also, one should not think 29 



202   George Tsoulas a means to assign reference (to individuals of the C-​I system) to linguistic expressions, and derivatively, to signal co-​reference and binding relations, is necessary. So, we have now reached a paradox, namely: how can inclusiveness—​itself a principle of efficient computation entailed as it is by the notion of perfection—​preclude the use of indices when a perfectly efficient computation mapping a numeration to the C-​I interface would require them? In other words I do not assume that binding/​anaphora is altogether outside the domain of the grammar. Indeed, the only reason to assume something like that is that the analysis of binding/​anaphora, etc., tends to involve things that have been a priori ruled out (indices and the like). But we saw that this is not necessary and under an appropriately elucidated conception of the elements of the theory of grammar, there seems to be no reason to assume something like this. To sum up this general discussion then, it seems that from a rather general point of view, and under suitable assumptions and idealizations, there is a case against the use of devices such as indices. The case in question is twofold and responds to the two uses of indices.32 First for reference simply. Neale’s linguistic pragmatism purports to ultimately do away with indices as referential devices. Chomsky’s inclusiveness goes a step further and eliminates them also in their relational use. Chomsky’s stance is of course stronger and for him, eliminating indices in their relational function is tantamount to eliminating them altogether since he does not see reference per se, at least as it is generally defined, as part of the semantics of natural language (see also c­ hapters 2 and 3). However, what I hope to have shown here is that as soon as we start taking more seriously the more general framework of sentence interpretation for minimalist grammars, we are led to make a number of assumptions regarding reference and quantification that eventually remove all a priori problems with the use of indices. But I tried to go a step further. In fact, the argument went, the grammar needs indices in order to be a maximally efficient computational device mapping the numeration to C-​I. And yet indices are banned for the very same reasons. This is the paradox with which we end up. With that much by way of background, let’s now consider a possible solution to the paradox.

9.4.2 A Solution to the Paradox I want to claim here that a solution to the paradox just mentioned can be found if we put forward a different conception of what an index is. It would be pointless, it seems to me, to try to solve the paradox by inventing a completely different mechanism to represent what indices represent. Indices have special characteristics (most prominently their unbounded nature) that make it so that any other device will have to answer the same questions. The first step towards a solution is to understand the nature of indices that once indices are allowed then some sort of floodgates are open; if sufficient motivation exists, fine, otherwise not. 32 

I completely disregard here other uses of indices, such as movement indices, etc. I assume that these are indeed unnecessary.



Semantics in Universal Grammar    203 in representations. For simplicity, I will concentrate during the formulation of the proposal on indices applying to nominals only.

9.4.2.1 Constant and Variable Indices There are two different types of indices. They are not different in nature—​they are both indices (numbers if you wish). They are different in that they play different roles. Take first indices on simple pronouns as in (28): (28) She4 smiles. In this case, the index 4 is an interpreted index, given by an assignment function g, in a way that She4 = Nadia. But if the actual interpretation of the index is provided by the assignment function only, then the only thing that the syntax will require is some kind of variable index, say: (29) shex Call this a variable index. On the other hand, there are things that are not variable in the same way. Take for example the interpretation (perhaps gesturally supported) of the DP this man in (30): (30) Call the wagon, this man is dead. In this case, this man only refers to a specific man. Now, if this is true, then we already have to modify slightly the standard concept of an index and the way it gets interpreted. Pronouns and DPs will not be indexed numerically. The assignment function will map what I called here variable indices to numbers which in turn will refer to the elements in the ith position of individuals, arranged in a sequence. In a similar vein, consider indexical pronouns like I and you. These pronouns do not require indices in the classic sense because they, in a way, are indices.33 Call these constant indices.34,35 To put it differently, these are interpreted indices that are overtly realized in the form of pronouns/​demonstratives. We can further extend this idea of overt realization of indices to cases such as the following (31): (31)

A:  Which Peter? B:  That Peter.

33 

This is close to Peirce’s definition of an index sign (Eisele 1976): An index is a sign fit to be used as such because it is in a real relation with the object denoted. 34  The analogy that I am drawing here has its (obvious) limitations and requires further discussion. For my present purposes, however, it will do. 35  In keeping with Peircian terminology I might have called them dynamic constants.



204   George Tsoulas In (31) the question ranges over the index of the name Peter and the answer is given by an overt realization of the questioned index. The proposal here draws on previous work by Burge (1973) and discussion in Larson and Segal (1995) and Elbourne (2002). It is also close in nature to Postal’s classic proposal that pronouns are Ds (Postal 1966) but differs in implementation from his. The realization of pro-​index in an adjoined position is also motivated by examples like (32) alongside Postal’s examples in (33): (32) (33)

a

I Claudius

b

We the people

c

We communists

d Us linguists If this line of reasoning is on the right track, then I would now like to take it to its natural conclusion. If some pronouns, demonstratives, and determiners behave like or are indeed indices, then the null hypothesis to investigate is that all indices are some form of pronominal element accompanying nominals. More concretely, I would like to propose here that an index is in fact, in the general case, a pro which is merged with all DPs: (34)

DP pro-index

DP



We could further streamline the mapping to C-​I by maintaining the current understanding of assignment functions as functions from variable indices to integers, restricting the reference of pro-​index to the set of integers. The immediate effect of this proposal is that the paradox to which we were led to in the first part of this section is resolved. As a result we have a system where the mapping to the interface meets the efficiency criteria of perfection as the computational system delivers a structure that can be immediately interpreted without any further tampering by the semantics. Obviously, the interpretation will enrich the content in many ways but crucially not the structure. Concerning the No Tampering Condition, there is no tampering with lexical items. Inclusiveness is preserved, since a pro is nothing but a lexical item and it is added by Merge, which means that this is no more tampering that building a normal structure. As a result, indices as instruments of reference—​a semantic constraint part of UG—​are simply there, nothing more needs to be said, when they are properly understood. Having now dealt with the issue of referential indices and their status with respect to the IC/​NTC, the next step is of course to consider how these indices enter into the establishment/​representation of anaphoric relations. The next section turns to this issue.



Semantics in Universal Grammar    205

9.4.3  Asymmetry, Directionality, and the Probe–​Goal system The probe–​goal system, basic in dependency formation in minimalism, is also fundamentally asymmetric. Interestingly though, it is asymmetric in exactly the opposite direction from the one required by the binding theory. As is well known, a probe P is a probe in virtue of an uninterpretable or unvalued feature which needs to be valued by a matching goal G that is internal to the syntactic object headed by the probe. It is crucial, at least for Chomsky (2008), that P is not required to c-​command the goal G. P seeks G in its domain. Binding theory, on the other hand, has always been formulated in terms of the dependent being c-​commanded by the element it depends upon. Despite that discrepancy, I would like to try to take advantage of the fundamental asymmetry of the probe–​goal system to provide an index-​based asymmetric account of binding. To do so I want first to return to the nature of the pro-​index. Most importantly I want to further specify two things regarding its featural content. I will take the feature matrix of a pro-​ index to be constituted as follows, for the relevant part: (35)  

Uϕ +/– R

That pro-​index may contain φ-​features is not unexpected; it is pronominal in nature anyway. The [+/​–​R] feature, on the other hand, represents the difference that we pointed out earlier between variable and constant indices. A [+R] index corresponds to a constant index whereas a [–​R] one corresponds to a variable index, which can be given a value by the assignment function g. As for the φ-​features, they are similar to the φ-​features of Tense, in that they are uninterpretable and need to be valued by a goal in the domain of pro-​index. With this much in place in terms of conceptual and technical apparatus, we can move on now to the binding conditions themselves.

9.4.3.1 The Binding Conditions The binding conditions I will now try to reformulate are the classical ones given below in a general form: (36) The Binding Conditions a.

Condition A An anaphor must be bound in domain D.

b.

Condition B A pronoun must be free in domain D.

c.

Condition C A referential expression must be free.



206   George Tsoulas Let’s start with Condition A. As Chomsky (2008) has suggested, condition A may in fact be not a case of c-​command but rather a case of Agree. How exactly this suggestion can be implemented is a technical challenge to say the least. It is clear, however, that if we take the general line on the probe–​goal system that Chomsky (2008) suggests we will end up with a situation that would seem, intuitively at least, unsatisfactory. The situation I am referring to is as follows. According to Chomsky (2008) the way to understand Condition A as a case of Agree is by assuming that a reflexive in object position bound by an element in Spec,vP is in fact not bound directly by the element sitting in Spec,vP. Rather, v is the head that probes and presumably agrees with the reflexive and somehow the reflexive appears indirectly to be bound by Spec,vP. In fact what Chomsky seems to have in mind is a case where both the reflexive and the antecedent are goals of the same probe. The Agree processes can be shown schematically in (37). On the other hand, under my proposal we could conceive of the φ-​features of the pro-​index as the probe that would cause an agreement effect with the reflexive. One possible schematic representation of (38) would be (39): (37)

HP XP

HP LP

H ZP

L

XP

WP R

  (38) Hortense likes herself (39)

vP DP pro-index

v’ DP

v

Hortense

likes

VP V tlikes

DP pro-index

self

her

  A number of problems are immediately obvious with (39). To take the most important one: how does the subject, or more specifically the subject’s index, probe into the vP all the way to the reflexive? I want to suggest here that the relation is indirect. The probe of the pro-​index takes two separate paths, first it probes into its own domain, its sister constituent where it finds the φ-​set of the nominal and gets valued. At the same time, as the



Semantics in Universal Grammar    207 D head is being altered the whole projection is altered dynamically; this should be true given that under bare phrase structure there is no D0−D′−DP but D−D−D. Therefore no issues of feature percolation, etc., arise. Then there is transfer of properties to the phase head v which then probes the reflexive. Schematically, this is represented in (40). The curved lines represent the indirect agreement path and the straight lines the direct path. Notice also that I am taking the pronominal part of the reflexive to be in fact the overt (another one) realization of the pro-​index. Thus, reflexive anaphors of the type herself are represented simply as self; the fact that they are anaphors and therefore dependent is captured by the presence of the pro-​index: (40)

vP DP

v’ DP

v

Hortense

likes

pro-index

VP V tlikes

DP pro-index

self

her



Another perceived weakness of this approach is the fact that it seems to introduce again Spec–​Head relations. The only reason one might offer is that it seems that operations are not only driven by phase heads. But this is not exactly accurate. It is the phase head that drives the operations. Transfer of properties from the specifier is precisely because the specifier cannot drive operations in the same way. If this line of reasoning is along the right track, then we have a way to reconceptualize condition A in a natural manner as a case of Agree without direct reference to c-​command. Turning now to Condition B, there are two options. Either the pronoun has a [+R] feature or a [−R] one (a constant or a variable index respectively). The representation of a [+R] pronoun then would be (41): (41)



DP pro-index

DP

Uϕ +R

She ϕ

The pronoun’s Uφ probe is valued by the φ-​features of the lexical pronoun. As a result, this pronoun is closed so to speak and cannot be further probed (φ-​agree). No more needs to be said. In the case now of a pronoun with [−R], i.e., a pronoun with a variable index, again there is no possibility of the pronoun being directly probed and φ-​Agreeing



208   George Tsoulas with anything since it is closed. However, the [−R] feature can be probed and be bound by an operator, which does not (must not?) require φ-​agreement. If Adger (2005) is correct, there is even more reason to assume that operator binding ought to be disconnected from φ-​feature sharing. So this is the case of bound pronouns. Finally, Condition C is the easiest case. The representation of a name will be (42): (42)



DP pro-index

DP

Uϕ +R

Hortense ϕ

Obviously then, it can be the target of no probe, as expected. The only cases will be so-​ called mistaken identity cases which are the purest examples of coreference like (43): (43)

Hortense is Delia

Again, no more needs to be said.

9.4.3.2 The Instruments of (Co-​)Reference: Conclusion Assume now that the preceding development is indeed on the right track. The question is where does that leave us now? More precisely, is there a sense in which the proposal developed yields the effects of indexation insofar as the binding theory is concerned? The answer to this question turns out to be more complicated than one might initially expect. I believe that the truth is that indices, as standardly conceived, are, in this proposal, eliminated really. There are no numerical subscripts anywhere to be seen in the representations given above or any other that this proposal entails. The focus here is shifted to the possible relations that nominals may enter with pronouns and reflexives, as these are made possible by the standard tools of dependency formation, i.e., the probe–​goal system and the relation Agree. At the same time though, this proposal recognizes that these relations are not relations between pronouns/​reflexives per se and their antecedents. Just as in a theory that is enriched with standard indices, relations of identity reduce to relations between indices, I have represented this as relations between the pro-​index elements accompanying pronouns and nominals.36 Thus, in a clear sense this theory does yield indexation and the effects of co-​indexation. Furthermore, this theory allows a straightforward syntax–​semantics mapping and removes some of the mystery from the mechanisms of the syntax–​semantics interface. Significantly perhaps, 36  The present account shares a number of features with Kratzer’s (2009) account of pronouns, namely the indirectness of the relation between antecedents and reflexives/​pronouns. It is different however in the role that the functional heads play as the present view still maintains that it is part of the DP that actually enters in a relation with the anaphor or pronoun.



Semantics in Universal Grammar    209 we have reached through a different route a conclusion similar to the one Sheehan and Hinzen (2011) have reached, namely that reference is an edge phenomenon (see ­chapter 2). The details are different, the intuition ends up quite similar. In the terms of the present chapter, a pronominally realized index is a semantic tool that UG makes available.

9.4.4 Truth Let us now turn briefly to the notion of truth. The notion of truth is problematic in that, just like reference, it is supposed to be defined with respect to language-​and mind-​external situations and entities. This so-​called correspondence theory position is the standard whether or not it is explicitly acknowledged. The question is whether such a theory is ultimately consonant with the internalist perspective prevailing in the generative paradigm. Perhaps truth in itself plays no role at all and it is a property of sentences determined by systems external to the language faculty, i.e., those systems that use the ultimate representation that the language faculty produces. This is a reasonable position which runs into problems in a restricted set of cases mostly involving factive complements where somehow we must state that the truth of the complement clause must be presupposed. Another view is that there is some form of a truth predicate that is available in the computational system; it is a primitive, and in the terms that interest us here, a semantic primitive part of UG.

9.4.4.1 Truth and Internalism An internalist view or definition of truth requires us to provide an understanding of the concept without making any reference to external realities and situations. In other words, we need to forego a conception of truth that is essentially relational and attributes truth to a proposition or your favorite truth-​bearer in virtue of the facts of the world. Given that on the internalist view, a proposition p is true internalistically, there is, therefore, no a priori reason to suggest that truth is a relational concept. It does not mean that it is not either, but in any event it is not a concept that relates a proposition to an external fact, state of affairs or what have you. There aren’t many internalist theories of truth on the market at the moment. The most explicitly stated is the one presented by Wolfram Hinzen in Hinzen (2006a,b, 2007) and we will devote the next few sections to the examination of this theory (see also ­chapter 2).

9.4.4.2 Hinzen’s Theory of Truth In broad brushstrokes, Hinzen’s suggestion is to take the internalist view seriously and consider truth as a non-​relational concept. Thus, propositions are true (or false) not because something makes them so but rather because simply a sentence has truth in it. It is worth quoting from his Essay on Names and Truth (Hinzen 2007:199) at some length: (44) So, the claim, to repeat, is: a sentence has truth integrally: this is what a human being takes a truth judgement to suggest. The truth that it predicates of the sentence is not an object in its own right, it is the truth—​of a particular sentence. Even if that



210   George Tsoulas is a claim about cognition it may well tell us something about the metaphysics of truth, however. What we can say, firstly, is that a proposition does not relate to its truth in the way a formal symbolic structure relates to semantic values (designated denotational objects in the domain of a semantic model), through an interpretation function that arbitrarily connects them. If we are interested in the way the human mind configures a judgement of truth then truth does not attach to sentences in the way standard model-​theoretic semantics suggests. Mapping sentences to such denotational objects is useful to various purposes, of course, but if I am right, no contribution to an understanding of truth predication as a mental act that happens internally to a syntactic derivation, in an integral fashion. Some explanation is required here. Let’s begin with the claim that a sentence has truth integrally. The suggestion here is that a truth judgment has the same structure (and nature) as an integral predication in the sense of Hornstein et al. (2002). An integral relation between α and β is akin to a relation of inalienable possession or a relation of constitution underlying mass terms. As Hinzen puts it a little earlier in the same work (Hinzen 2007:181, emphasis in the original): (45) This is to say: a particular way of seeing the world—​one expressed in making an integral predication—​is closely linked to an intricate structural paradigm in human syntax, … If right, then it is also an empirical fact that, structurally speaking, in the case of truth judgements the way the mind relates a given proposition to truth is much the same as the way it relates a whole to a part or a substance to its constitution. Hinzen suggests that the structure in (46b) is the underlying structure of the set of expressions in (47): (46) a.

[SC [proposition] truth]

b.

T prop

T DP

be + D + T

D

some truth to

AgrP it

Agr Agr



SC prop

truth



Semantics in Universal Grammar    211 (47) a.

That the Earth is flat is true.

b.

The truth is that the Earth is flat.

c.

That the Earth is flat has (some) truth (to it).

d. There is some truth to the Earth’s being flat. e.

The truth of the Earth’s being flat.

f.

The truth is that of the Earth’s being flat.

While I  will not argue particularly with the judgments here, it is worth pointing out that speakers judge (47d) and (47e) quite marginal and (47f) ungrammatical. Whichever judgments are actually correct is, however, inconsequential to the argument. Furthermore, Hinzen takes truth to be a formal rather than a substantial concept. He also, somewhat strangely, suggests that as a result, there can be no theory of truth. So to summarize: (48) a.

Truth is a formal concept.

b.

Nothing makes sentences true, they have (or not) truth integrally.

c.

The relation of a proposition to its truth is a part–​whole relation which is created by the small clause structure in (46b).

9.4.5 Some Issues and Questions

To begin with, it is unclear how all the sentences in (47) could possibly be derived from the structure in (46b). Specifically, the problem arises with respect to the simple:

(49) That the Earth is flat is true.

It is hard to conclude that:

(50) [SC [that the earth is flat] true]

gives rise to some integral relation. The question is what stands in the integral relation to what. If we follow the argument, then it must be the proposition to true; but to the extent that a property (expressed by an adjective) is the right kind of thing that can be integrally had/possessed, these cases are no different from standard predication. Taking TRUTH qua possessum, for example, or part, would be closer to the mark, but (51) is ungrammatical:37

(51) *That the Earth is flat is truth.

The only reasonable analysis of (51) seems to be (52):

(52) True(that the earth is flat)

Secondly, note the following contrast, which Hinzen also discusses:

(53) Is p/that John is a dentist true?

Is (53) equivalent to (54)?

(54) Does p/that John is a dentist have truth (in it)?

Note first that these sentences are odd, but let us keep supposing that they are either fine or that the oddness is due to unrelated reasons. Note also that it is somewhat unclear whether in it is required and, if so, why (but see note 37, in which case it can be related to the locative element in have). But (53) and (54) are not equivalent. Should they be? A positive answer to (54) does not entail a positive answer to (53). In other words, (55) does not entail (56):

(55) p/that John is a dentist has truth (to it).

(56) p/that John is a dentist is true.

To put it in plain terms, (55) tells us that the proposition p cannot in fact be true as such—there is a need for a specification of contextual parameters that make it so that another proposition p′, whose meaning is close enough but not identical to the meaning of p, is true. It is in virtue of factors that have nothing to do with language, but rather with general world knowledge and reasoning, that a speaker may assume a connection of this type to actually obtain. The reasons are not directly relevant to an internalist analysis; indirectly, however, they may provide us with clues as to the right analysis. Consider further the following dialogue:

(57) A: Does p have (any) truth in it?
B: Of course, not only does it, it is true/*truth.

37 It is possible to relate (51) to (i):

(i) That the Earth is flat is the truth.

This suggests that truth is some constant that can be referred to directly and take part in certain types of integral predication.



If we assume that a proposition and its truth are related integrally like this:

(58) [diagram: the proposition/sentence P/S equated with its truth T, with the nature of the relation marked ‘?’]

then we run into the following issue: if the have questions inquire about the degree of truth in P, then what does B’s answer mean to convey? That P = T? This is familiar and wrong. Furthermore, Hinzen recognizes that there is a difference between (59) and (60):

(59) That the earth is flat has some truth to it.

(60) That the earth is flat is true.

He suggests that the latter is stronger than the former, which is true. He further says that by using (59) we seem to imply that there may be some false aspects to the proposition. Although this seems intuitively correct, it calls into question not so much the notion of truth and its possible coexistence with falsity as the notion of a proposition itself. It is unclear what notion of a proposition underlies the theory.

9.4.5.1 Truth and Falsity

Obviously, some sentences are true while others are false. The question here is how a proposition relates to its falsity. The issue is important in two ways. First, the word falsity seems not to behave in the same way as the word truth:

(61) That Stalin was kind is the truth.

(62) *That Stalin was kind is the falsity.

One would expect these items to behave alike. The fact that they don’t means either that truth is special in a way that falsity isn’t (but in what way?), or that there are some idiosyncratic properties of the English word truth, in which case we should probably not be making too much of them. The latter conclusion could even be reinforced by the fact that the word truth does not behave in the expected manner cross-linguistically, but we will set aside these considerations for now in order to briefly sketch an alternative.

9.4.5.2 Truth, Assertion, and the Periphery

Perhaps there is no more to truth than assertion. The fact that a sentence is asserted by a speaker in a particular context gives it what we commonly call truth. The idea is of course not new: in Tractatus 4.062, Wittgenstein (1922) suggests that

it would be possible to make ourselves understood with false propositions as we have done up till now with true ones … so long as it’s known that they are meant to be false.

Without engaging too much in Wittgensteinian exegesis, one way to interpret this statement is that truth and falsity are properties that are not fundamentally independent of assertion. We can then proceed to syntacticize truth in the same way as we did for reference. Assume for concreteness that the richly structured left periphery includes a head encoding point of view and relevant operators for illocutionary force, speaker, addressee, etc.38 We then construe truth as a property of propositional meanings encoded syntactically from an individual’s point of view, in other words an individual assertion.39 One can maintain an approach of this type while remaining agnostic as to whether truth with respect to the facts is actually a property that can be assigned independently to interface representations, though one expects this to be so. The approach sketched here has clear affinities (and also differences) with the one put forward by Sheehan and Hinzen (2011); ultimately the approaches should be compatible (see also chapter 2). The difference is of the same type as that with reference: in the present proposal there is, so to speak, a truth operator, just as there is a pro-index mediating reference, but the spirit is the same. Before concluding, I would like to turn briefly to some more speculative remarks on the relation between syntax and semantics as mediated by Merge.

38 There is a rich history of these notions from Katz and Fodor (1963) and Ross (1970) to a wide array of works relating to the grammar of perspective-sensitive items, predicates of personal taste, and so on. The literature is too large to even survey here. My personal take is based on an elaboration of Tsoulas and Kural (1999). That approach gives a syntactically more precise way to handle phenomena like those covered by the various types of Judge Parameter (Lasersohn 2005; Potts 2007; and references therein), though the exact implementation remains to be clarified.

39 Locating the truth predicate/operator in the left periphery bypasses what Geach (1965) called The Frege Point, i.e., the use of non-asserted propositions.

9.5 On the Relation of Semantics with Syntax

In this section I consider more closely the consequences of the idea that lexical items are construed by Merge under certain semantic conditions. The semantic conditions in question include thematic conditions as well as general semantic compatibility and coherence principles often captured by lexical rules. An idea that has been influential, albeit often left unformulated, in the development of generative syntax is that functional categories are each responsible for the introduction of a single piece of semantic (and morphological) information, the decompositional accounts alluded to earlier being somewhat extreme versions of this. Syntactically, this idea found clear expression in Nash and Rouveret’s (1997, 2002) Single Checking/Licensing condition:40

40 This is of course not the only place where the idea is expressed. This formulation is the most convenient for my purposes.



(63) Single Licensing Condition (Nash and Rouveret 2002)
A functional head can enter into a licensing relation with the feature content of only one terminal node in its checking domain.

and, in an even stricter way, in the general conjecture put forward in Kayne (2010b):

(64) Kayne’s Conjecture
UG imposes a maximum of one interpretable syntactic feature per lexical item.

The way these ideas are relevant to the present discussion is that they tell us more about the way the whole system is built. Consider first the earlier suggestion that lexical items are built by Merge in a way parallel to sentences. The question is where Merge stops: what constitutes a lexical item from this perspective? Semantically, we assume that an LI must be complete and coherent:

(65) Semantic Completeness for LIs
A lexical item must correspond to a well-formed concept.

(66) Semantic Coherence for LIs
Each step x_i of the ‘lexical’ derivation must be coherent with x_i−1.

(65) and (66) recapitulate the idea that the lexicon is the interface between the UFS and the conceptual store. This, however, entails greater complexity than the very local character of the syntax, as expressed in (63) and (64), can countenance. Take for instance the notion that verbal structure involves a series of elements (as in Ramchand’s proposal discussed earlier). Semantic completeness and coherence would require us to bundle those together, but the syntactic derivation would require us to take them apart. The same is true of the examples discussed by Kayne. Take for instance the case of few. Kayne suggests that it is an adjective that cannot, as entailed by (64), express the meaning:

(67) small number

This would contravene (64) in the sense that it would include two interpretable features rather than one, interpretable features being features with a semantic interpretation. Thus the meaning in (67) is expressed via (68):

(68) few NUMBER

… where NUMBER is an unpronounced nominal which is there only in order to carry the relevant meaning. The claim here is that semantic completeness and coherence would require us to keep these features together in the lexicon. If the above reasoning is correct, it follows that semantic constraints would require complex lexical items which may then be decomposed in the syntax as needed, rather than separate elements in the lexicon that would have to be combined in the syntax. How can we do this without further complicating the grammar? Here, I would like to suggest a different take on the basic idea that Nash and Rouveret (1997, 2002) have put forward. Namely, I want to propose that a specific type of Merge is responsible for taking apart complex lexical items.

9.5.1 Only Merge Can Undo What Merge Did

Although there exists an often implicit assumption that the elements that Merge applies to are distinct, it has been argued that Self-Merge, whereby an element α merges with/to itself, is an operation that must be countenanced (Guimarães 2000; Kayne 2010a; Adger 2013). Self-Merge can in principle take the following two forms:

(69) a. Merge(α, α) → { α }
b. Merge(α, α) → { α, α }

Generally speaking, the studies cited above concentrate on the outcome (69a) of Self-Merge.41 But what about (69b)? Does anything actually preclude it? In principle, no. A further question that arises with respect to Self-Merge is whether it is a case of Internal or External Merge. Assuming that α is a set of features, and that every set is a subset of itself, it follows that Merge(α, α) falls under Internal Merge, i.e., where what is Merged to α is already part of the structure. The interesting thing about this observation is that it follows that Self-Merge must involve a copy of α. Now suppose that, the naturalness and necessity of Self-Merge notwithstanding, there still is some requirement of distinctness operating in the grammar regarding the two elements subject to Merge, perhaps along the lines suggested by Richards (2010). This is what happens with almost all cases of Internal Merge anyway, in the form of non-pronunciation of lower copies: deleting the phonetic matrix of the lower copy satisfies this kind of distinctness. For the cases at hand, a way to make this more precise would be to suggest, in accordance with the main hypothesis that each syntactic unit/head carries/expresses/checks/enters into a licensing relation with one feature only, that when more than one feature is carried by a head, Self-Merge must take place. The result would be something like (70):

(70) Merge( α[F1 F2] , α[F1 F2] )

41 There is more to say here about the possibilities of Self-Merge. Tsoulas (2015a) considers in detail the relevant formal and technical consequences.

Substituting few for α and assuming that few has [SMALL] and [NUMBER], and leaving labelling issues to one side for now, we have (71):

(71) [{few, few} few[SMALL NUMBER] few[SMALL NUMBER]]

It looks like what we have as the result of this type of Self-Merge is a local version of feature fission. We might then go further and suggest that feature fission is nothing more than the outcome of (Internal) Self-Merge. Assuming this to be on the right track, the general picture is as follows: Merge applies to members of UFS (features) and assembles them into structured elements, LIs. Call this Merge Micro-Merge. The syntax, on the other hand, manipulates medium-sized categories that must carry one interpretable feature. This is achieved by successive applications of Self-Merge and standard Merge. Call this Meso-Merge. Finally, if we follow Roberts and Watumull (2014), who claim that larger, paragraph-sized units are also created by Merge, we have a case of Macro-Merge. The main semantic constraints of UG apply at the level of LI creation (Micro-Merge), determining the order of arguments and modifiers. At the Meso-Merge level the semantic constraints are concerned, on the one hand, with reference and the requirement to add the relevant pro-index, and, on the other, with the construction of the relevant peripheral structure that yields assertion and truth. Discourse constraints are relevant to Macro-Merge, but these are beyond the scope of this chapter.
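The mechanics of this proposal can be made concrete with a small computational sketch (purely illustrative; the class and function names below are mine, not part of the proposal): Internal Self-Merge copies a feature bundle, distinctness is satisfied by deleting the lower copy’s phonetic matrix, and fission leaves each copy carrying a single interpretable feature, in the spirit of (63)/(64) and the structure in (71).

```python
# Illustrative sketch only: a toy rendering of Internal Self-Merge
# and feature fission as operations on feature bundles.
from dataclasses import dataclass
from copy import deepcopy

@dataclass
class LI:
    form: str        # phonetic matrix; '' once deleted
    features: list   # interpretable features, e.g. ['SMALL', 'NUMBER']

def self_merge(alpha):
    """Internal Self-Merge as in (69b): Merge(a, a) -> {a, a-copy}."""
    lower = deepcopy(alpha)
    lower.form = ''  # distinctness via non-pronunciation of the lower copy
    return (alpha, lower)  # tuple standing in for the unordered set

def fission(pair):
    """Leave one interpretable feature per copy (one feature per head)."""
    higher, lower = pair
    higher.features, lower.features = higher.features[:1], higher.features[1:]
    return pair

few = LI('few', ['SMALL', 'NUMBER'])
print(fission(self_merge(few)))
# (LI(form='few', features=['SMALL']), LI(form='', features=['NUMBER']))
```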

9.6 Conclusion

At this point, we must conclude. In the article cited in the introduction, Higginbotham identified three necessary components of an understanding of what constitutes semantic knowledge:

(72) a. A conception of the formal syntactic information available to the semantics
b. A conception of the kinds of semantic values that are determined by the formal information in the syntax
c. A conception of the principles that mediate between the first two components

Our task here was to see which aspects of these components can be attributed to Universal Grammar directly. Our conception of syntax involved only the operation of Merge throughout the grammar. The semantic aspects of UG concern essentially the following. First, constraints on the way lexical items are built (Lexical Completeness and Coherence), which include information on the semantically based ordering of heads. Second, referential values for expressions that refer, based on specific syntactic configurations. Third, a syntactically mediated truth component. Fourth, combinatorial principles that mediate between the syntactic input and the interface representation that the semantic component produces. Many areas remain (as frequently pointed out in the course of the chapter), namely modality, embedding, the recursive sort of pragmatics advocated by Chierchia (2013), the role of principles such as Heim’s Maximize Presupposition, as well as virtually all aspects of dynamic semantics, and so on. My only hope is that the direction taken in this chapter will prove fruitful in addressing these other concepts and areas from a UG perspective, yielding, ultimately, an account of speakers’ knowledge of meaning. As ever, so much remains to be done.



Part III

LANGUAGE ACQUISITION





Chapter 10

The Argument from the Poverty of the Stimulus

Howard Lasnik and Jeffrey L. Lidz

10.1 Introduction

The problem of language acquisition has always been a central concern, perhaps the central concern, in generative grammar. The problem is that the learner, based on limited experience, projects a system that goes far beyond that experience. Already in Chomsky (1955), the founding document of the field, we find the following observation, which sets up a minimum explanatory criterion for a theory of linguistic knowledge and its acquisition:

A speaker of a language has observed a certain limited set of utterances in his language. On the basis of this finite linguistic experience he can produce an indefinite number of new utterances which are immediately acceptable to other members of his speech community. He can also distinguish a certain set of ‘grammatical’ utterances, among utterances that he has never heard and might never produce. He thus projects his past linguistic experience to include certain new strings while excluding others. (Chomsky 1955/1975: p. 61 of 1975 version)

This general characterization of the situation leads to what Chomsky (1978) called ‘the argument from poverty of the stimulus’: the argument that our experience far underdetermines our knowledge and hence that our biological endowment is responsible for much of the derived state. The argument from the poverty of the stimulus is essentially equivalent to the problem of induction. As Hume (1739) stated, ‘even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience’ (Hume 1739:139). Experience simply does not provide the basis for generalizing to the future. Chomsky’s idea,



following Descartes, is that the basis for generalization must come from the learner, not from the world.

Returning to the case of language, Chomsky went on to argue that the basis for generalization in language is in part distinct from the basis for generalization in other domains. The basic insight is that the character of linguistic representations is particular to just those representations, and hence learning those representations must involve a mechanism designed to construct just those. The argument can be framed either in terms of the set of sentences allowed by the grammar (its weak generative capacity) or the set of structures generated by the grammar (its strong generative capacity).

From the perspective of the weak generative capacity of the system, the critical observation is that there is an indefinite number of unexperienced strings which the speaker of a language can produce, understand, and identify as grammatical/ungrammatical. Because any finite set of utterances is compatible with an infinite set of languages in extension (Gold 1967), and because speakers of a language agree about the grammaticality/interpretation of nearly all novel sentences, there must be some contribution from the learner to determine which language is acquired.

From the perspective of the strong generative capacity of the system, the observation is that the kinds of grammatical representations that speakers of a language construct on the basis of their experience are widely shared, but far removed from the data of experience. Any finite set of data is compatible with a wide/infinite range of characterizing functions (i.e., grammars, I-languages, etc.). That we all build the same kinds leads to the conclusion that learners are biased to construct certain kinds of grammatical representations and not others.

Said differently, the input to the child is degenerate in two senses (Chomsky 1967a). First, it is degenerate in scope: the input cannot provide evidence about all possible sentences (sentence–meaning pairs, sentence structures, etc.) that the child will encounter. Second, it is degenerate in quality: the input itself does not contain information about the kinds of representations that should be used in building a generative grammar of the language.

These notions of degeneracy should not be confused with others. For example, Hornstein and Lightfoot (1981) note that speech to children contains speech errors, slips of the tongue, utterances produced by foreigners, etc., all of which might interfere with acquisition. While learners certainly do need to overcome this kind of degeneracy, what might be called the noise of the signal, the critical point about the degeneracy of the input from the perspective of the poverty of the stimulus is that the primary linguistic data (PLD) is (a) limited in scope and (b) uninformative with respect to choosing the appropriate representational vocabulary.

Chomsky (1971) used the phrase ‘poverty of experience’ for this same state of affairs, highlighting the degeneracy of the input relative to the character of the acquired knowledge. Interestingly, in that book the first case he discusses is one where the empiricist position that Chomsky rejects might seem to be the strongest—word learning:

Under normal conditions we learn words by a limited exposure to their use. Somehow, our brief and personal and limited contacts with the world suffice for us to determine what words mean. When we try to analyze any specific instance—say, such readily learned words as ‘mistake,’ or ‘try,’ or ‘expect,’ or ‘compare,’ or ‘die,’ or even common nouns—we find that rather rich assumptions about the world of fact and the interconnections of concepts come into play in placing the item properly in the system of language. This is by now a familiar observation, and I need not elaborate on it. But it seems to me to further dissipate the lingering appeal of an approach to acquisition of knowledge that takes empiricist assumptions as a point of departure for what are presumed to be the simplest cases. (Chomsky 1971:16–17)

Chomsky concludes this part of his discussion by stating that ‘… what little is known about the specificity and complexity of belief as compared with the poverty of experience leads one to suspect that it is at best misleading to claim that words that I understand derive their meaning from my experience’ (Chomsky 1971:17). Such a claim would be misleading because the experience can only provide partial and indirect evidence about how to build a word meaning that will generalize to all relevant cases. Of course, the evidence is relevant, and in all likelihood necessary, for learning to occur, but there is no sense in which the evidence directly determines the content of the representations. As Chomsky (1965:34) notes, it is important ‘to distinguish between these two functions of external data—the function of initiating or facilitating the operation of innate mechanisms and the function of determining in part the direction that learning will take.’

A particular illustration offered by Chomsky (1971) centers on what is often called the A-over-A constraint. Chomsky, informally, proposes an account of the active–passive relation (developed much further in Chomsky 1973) under which an NP following the main verb is fronted (along with other changes that won’t directly concern us here). This process gives relations like those in (1).

(1) a. I believe the dog to be hungry.
b. The dog is believed to be hungry.

Chomsky observes that if the NP following believe is complex, an NP containing another NP, it must be the containing NP that fronts:

(2) a. I believe the dog’s owner to be hungry.
b. The dog’s owner is believed to be hungry.
c. *The dog is believed ’s owner to be hungry.

Chomsky’s description bears directly on the ‘poverty’ issue:

The instruction for forming passives was ambiguous: the ambiguity is resolved by the overriding principle that we must apply the operation to the largest noun phrase that immediately follows the verb. This, again, is a rather general property of the formal operations of syntax. There has been some fairly intensive investigation of such conditions on formal operations in the past decade, and although we are far from a definitive formulation, some interesting things have been learned. It seems reasonably clear that these conditions must also be part of the schematism applied by the mind in language-learning. Again, the conditions seem to be invariant, insofar as they are understood at all, and there is little data available to the language learner to show that they apply. (Chomsky 1971:30)

It is essential to realize how important the ambiguity-​resolving property is here. A priori, one might expect there to be two ways of forming a passive, (2b) or (2c). Thus, evidence for the learner that (2a) is possible does not address the acquisition problem. Evidence about the existence of passives is silent with respect to the proper way of representing that construction. It should also be noted that it is not really crucial that Chomsky appeals specifically to the A-​over-​A condition. Any constraint that has the effect of allowing (2b) and excluding (2c) is subject to the same line of reasoning. In fact, as we will discuss in section 10.2, virtually any constraint excluding certain derivations or barring particular structure–​meaning pairings is the basis for a poverty argument. Consequently, there are hundreds, if not thousands, of such arguments, either implicit or explicit, in the literature.

10.2 The Form of the Argument

Pullum and Scholz (2002) identify five steps to a poverty argument:

i) that speakers acquire some aspect of grammatical representation;
ii) that the data the child is exposed to is consistent with multiple representations;
iii) that there is data that could be defined that would distinguish the true representation from the alternatives;
iv) that that data does not exist in the primary linguistic data;
v) conclusion: the aspect of the grammatical representation acquired in (i) is not determined by experience but by properties internal to the learner.

The critical first step of the argument is in identifying the target of acquisition, whether that is a word meaning, a transformational rule, or a constraint on transformations in general. In the case of the data in (2), the target of acquisition is whatever piece of knowledge is responsible for the grammaticality of (2b) and the ungrammaticality of (2c), either a constraint on the application of a passivization transformation or, more likely, a constraint (like A-over-A) on the application of any transformational rule.

The second step is that the data is consistent with multiple representations. Assuming that the majority of passivized sentences in the PLD involve simplex subjects, the data is equally compatible with (a) the correct grammar, (b) one which allows for movement of only the most embedded NP (producing only (2c)), or (c) one which allows for movement of either.

The third step involves defining what would be the relevant disambiguating evidence. In this case, the existence of (2b) is not sufficient, as this would rule out option (b) but would not rule out option (c), which allows for passivization of either NP.1 If the existence of the actual alternative is not sufficient evidence to rule out the competitors, then what is? One possibility often raised is that explicit negative evidence would suffice. If children were simply told that sentences like (2c) were ungrammatical (characterized in a way that was sufficiently explicit, and transparent to the learner, to identify just those cases that involved moving the contained NP in an NP-over-NP structure), or if they were corrected when they produced sentences like (2c), then that evidence would distinguish the correct from the incorrect grammar.

1 One might argue that the more restrictive option (a) is to be preferred over option (c) through the use of indirect negative evidence (Chomsky 1981a). Given that option (a) predicts only one type of passive and option (c) predicts two types, and only one type occurs in the PLD, option (a) is simply more likely. We return to the difficulty of identifying the scope of such arguments in section 10.5.

And finally, we have steps (iv) and (v) of the argument. Since it is obvious that such explicit instruction or correction does not occur, and that that is the only definable evidence that could distinguish the two grammars, it follows that the relevant constraint on the structure derives from properties internal to the learner and not from any aspect of their experience.

Now it is quite important to emphasize at this stage that saying that there is a constraint on possible grammars internal to children that bars them from considering the possibility that (2c) is a possible passive of (2a) is not equivalent to saying that there is no learning involved in the acquisition of English or any other language. Rather, the point is that when the learner has developed to the point of considering ways of constructing transformational rules that conform to the exposure language, there are certain hypotheses that simply will not enter into their calculations. Identifying that some construction is derived transformationally, and what the surface features of that construction are (e.g., the participial morphology on the verb), must happen through some interaction between the learner and the environment (see Lidz 2010, Viau and Lidz 2011, and Lidz and Gagliardi 2015 for extensive discussion).

10.3 Further Examples

Chomsky (1971) examines one additional property of complex passive sentences of the sort exemplified in (1b), a property further developed as the Tensed-S Condition in Chomsky (1973). Corresponding to the active (1a), we found the passive (1b). But when the clausal complement to the main verb is finite (‘tensed’) instead of infinitival, the passive becomes impossible:

(3) a. I believe the dog is hungry.
b. *The dog is believed is hungry.

Chomsky speculates that ‘nothing can be extracted from a tensed sentence.’ This is a narrowing of an earlier ‘clause-mate’ constraint on (some) processes.2 And again the logic of the situation is independent of the specifics of the constraint. Speakers know that (3b) is not possible, and there is no clear evidence in the input to a learner that the appropriate generalization of their experience should include the constraint that is responsible. Chomsky explores several processes that are impeded by the boundary of a finite clause, and formulates a more general version of the constraint:

… let us propose that no rule can involve the phrase X and the phrase Y, where Y is contained in a tensed sentence to the right of X: i.e., no rule can involve X and Y in the structure [ … X … [ … Y … ] … ], where [ … Y … ] is a tensed sentence. (Chomsky 1971:35)

2 This locality condition is incorporated into the notion governing category of Chomsky (1981).

This version of the constraint blocks not only extraction out of a finite clause, but also insertion into it, and even relating X and Y by some nonmovement rule or process. One of the most interesting instances of the latter sort was a semantic effect first investigated in Postal (1966) and Postal (1969) and called in the latter work the Inclusion Constraint. As we will immediately see, both the Inclusion Constraint (dubbed RI, for Rule of Interpretation, by Chomsky 1973) and the condition on its application provide potential poverty arguments. Postal points out a contrast between examples (4a) and (4b).

(4) a. When the men finally sat down, Harry began to speak softly.
b. The men were proud of Harry.

Postal (1969) observes that

In [(4a)] the possibility is not excluded that Harry is one of the men who sat down. In [(4b)] Harry cannot be one of the men who were proud. This is not a logical or a priori necessary fact since it is logically possible that [(4b)] could be interpreted to mean that a certain set of men were proud of one of their number, who was named Harry, and this individual Harry was proud of himself. (Postal 1969:416)



Postal goes on to show that this interpretive contrast correlates with a grammaticality contrast:

(5) a. When we finally sat down, I began to speak softly.
b. *We were proud of me.

As Postal hints, and as Chomsky (1971) and Chomsky (1973) claim, these are the same phenomenon. Chomsky (1971) states the generalization this way:

… some rule of interpretation assigns the property ‘strangeness’ to a sentence of the form: noun phrase—verb—noun phrase—X, where the two noun phrases intersect in reference. (Chomsky 1971:38)

Since we and I must overlap in reference, (5b) cannot avoid ‘strangeness.’ As for (4a) and (5a), the relevant interpretive rule is limited to a local domain (roughly, the clausemate domain). After surveying a range of processes, and the locality constraints on their operation, Chomsky suggests a poverty argument:

… there apparently are deep-seated and rather abstract principles of a very general nature that determine the form and interpretation of sentences. It is reasonable to formulate the empirical hypothesis that such principles are language universals. Quite probably the hypothesis will have to be qualified as research into the variety of languages continues. To the extent that such hypotheses are tenable, it is plausible to attribute the proposed language invariants to the innate language faculty which is, in turn, one component of the structure of mind. These are, I stress, empirical hypotheses. Alternatives are conceivable. For example, one might argue that children are specifically trained to follow the principles in question, or, more plausibly, that these principles are special cases of more general principles of mind. (Chomsky 1971:43)

Postal (1969), in the course of his discussion, makes an explicit poverty argument concerning another reference phenomenon. He observes that in (6), his cannot be understood as coreferential with killer.

(6) His killer was 6 feet tall.

‘… [(6)] is not a clever way of saying someone killed himself and was six feet tall’ (Postal 1969:421). Postal suggests an account in terms of principles of grammar, and observes that ‘facts like [(6)] are of interest in relation to language learning. They are exactly the sort of thing no adult does or could teach the child directly since adults are not aware of them’ (Postal 1969:422). In fact, it hardly seems likely that the child would have any evidence whatsoever for this. The same argument could have been made about the Inclusion Constraint.




10.4 Principle C and Indirect Negative Evidence

Indeed, Crain and McKee (1985) make such an argument about a descendant of the Inclusion Constraint, Principle C of the binding theory. Crain and McKee (1985) examined English-learning preschoolers’ knowledge of Principle C (Chomsky 1981a), asking whether children know that a pronoun can precede its antecedent but cannot c-command it. We refer to cases in which a pronoun precedes its antecedent as ‘backwards anaphora.’ In a truth value judgment experiment, children were presented with sentences like (7a) and (7b):

(7) a. While he was dancing, the Ninja Turtle ate pizza.
b. He ate pizza while the Ninja Turtle was dancing.

In this task, participants observe a story acted out by the experimenter with toys and props. At the end of the story a puppet makes a statement about the story. The participants’ task is to tell the puppet whether he was right or wrong. Crain and McKee (1985) presented children with these sentences following stories with two crucial features. First, the Ninja Turtle ate pizza while dancing. This makes true the interpretation in which the pronoun (he) and the referring expression (the Ninja Turtle) are coreferential. Second, there was an additional salient character who did not eat pizza while the Ninja Turtle danced. This aspect of the story makes false the interpretation in which the pronoun refers to a character not named in the test sentence. Thus, if children allow coreference in these sentences, they should accept them as true, but if children disallow coreference, they should reject them as false. The reasoning behind this manipulation is as follows. If children reject the coreference interpretation, then they must search for an additional extrasentential antecedent for the pronoun. Doing so, however, makes the sentence false.

The theoretical question is whether children know that backwards anaphora is possible in sentences like (7a) but not (7b). Crain and McKee found that, in these contexts, children as young as 3 years old accepted sentences like (7a), but overwhelmingly rejected sentences like (7b). The fact that they treated the two sentence types differently, rejecting coreference only in those sentences that violate Principle C, indicates that by 3 years of age English-learning children respect Principle C.

The observation that Principle C constrains children’s interpretations raises the question of the origin of this constraint. The fact that children as young as 3 years of age behave at adult-like levels in rejecting sentences that violate Principle C is often taken as strong evidence not just for the role of c-command in children’s representations, but also for the innateness of Principle C itself (Crain 1991). The reasoning behind the argument is that Principle C is a constraint on what is possible in language. It says that a given pairing between certain sentences and certain meanings is impossible. But, given that children do not have access to explicit evidence regarding what is not a possible form–meaning pairing in their language (see Marcus 1993 for a review), their acquisition of Principle C must be driven by internally generated constraints and not by experience alone (see Gelman and Williams 1998).

10.5 Indirect Negative Evidence and Bayesian Learning

In recent years, however, the possibility that children can learn in an indirect fashion on the basis of Bayesian learning algorithms has gained some prominence (Tenenbaum and Griffiths 2001; Regier and Gahl 2003; among others). On this view, the absence of a given form–meaning pairing might be informative about the structure of the grammar as a kind of indirect negative evidence (Chomsky 1981a). In the context of Bayesian models, learning via indirect negative evidence is coded as the size principle (Tenenbaum and Griffiths 2001), which states roughly that smaller hypotheses are more likely than larger ones. Bayesian learners choose hypotheses by comparing the likelihood of the observed data under each hypothesis. These models assume that learners bring to the task of learning a set of hypotheses H, each of which represents a possible explanation of the process that generated the data. In the case of language learning, this means that the class of possible grammars is defined by H, with each member h of that set representing a particular grammar. Given the observed data d, the learner’s goal is to identify how probable each possible hypothesis h is, i.e., to estimate P(h|d), the posterior distribution over hypotheses. The hypothesis with the highest posterior probability is the one that is most likely responsible for generating the observed data and hence is acquired by the learner as the correct hypothesis. Bayes’ Theorem states that the posterior can be reformulated as in (8):

(8) Bayes’ Theorem
P(h|d) = P(d|h) P(h) / P(d)

The likelihood, P(d | h), expresses how well the hypothesis explains the data; the prior, P(h), expresses how likely the hypothesis is antecedent to any observations of data. The evidence, P(d), represents the probability of the data across all hypotheses. P(d) functions as a normalizing factor that ensures that P(h|d) is a proper probability distribution, summing to 1 over all values of h, and is a constant that can often be safely ignored



when comparing the relative probability of one hypothesis to another. Thus, defining a Bayesian model usually involves three steps:

i) Defining the hypothesis space: Which hypotheses does the learner consider?
ii) Defining the prior distribution over hypotheses: Which hypotheses is the learner biased towards or against?
iii) Defining the likelihood function: How does the learner’s input affect the learner’s beliefs about which hypothesis is correct?

Reasoning by indirect negative evidence in such models involves comparing two (or more) hypotheses. If one hypothesis produces a subset of the data that the other hypothesis produces, the likelihood of the smaller hypothesis is greater than the likelihood of the larger one. Consequently, the posterior probability of the subset grammar (i.e., the grammar generating the subset language) is greater. To see why, consider the following figure representing two grammars standing in a subset–superset relation:

(9) [figure: two nested sets, Hypothesis A of size a contained within Hypothesis B of size a+b, with the data point d inside Hypothesis A]

In this figure, d represents a data point that is able to be produced by both Hypothesis A (the smaller grammar) and Hypothesis B (the larger one). For the purposes of this discussion, the size of each grammar is the number of sentences (or sentence–meaning pairs) that the grammar produces. In this case, we give the size of Hypothesis A as a and the size of Hypothesis B as a+b, where b is the number of sentences produced by Hypothesis B but not Hypothesis A. Hence, the likelihood that Hypothesis A produced d is 1/a, and the likelihood that Hypothesis B produced d is 1/(a+b). Since a is smaller than a+b, 1/a is larger than 1/(a+b). Consequently, the data point d is more likely to have been produced by Hypothesis A (the subset grammar) than by Hypothesis B (the superset grammar). Thus, as more data consistent with both grammars occurs, the posterior probability of Hypothesis A increases (even though both grammars could have produced that data).

This kind of reasoning resembles the Subset Principle of Berwick (1985), which claims that if one grammar produces a language that is a subset of the language produced by another grammar, the learner should choose the subset grammar, since only this hypothesis could be disconfirmed by positive data (see also Dell 1981; Manzini and Wexler 1987; Pinker 1989). It differs from the Subset Principle in two respects. First, in Berwick’s formulation the Subset Principle must be a hard-coded principle of grammar learning, whereas in the Bayesian formulation it is a general application of probability theory. Second, the Subset Principle reflects a discrete choice which can be overridden, whereas in the Bayesian formulation, the preference for the subset is a probabilistic decision whose strength increases as the amount of data consistent with both grammars increases.
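The arithmetic of the size principle can be illustrated with a short sketch (the grammar sizes and flat priors below are invented for illustration, not drawn from the text): with every further data point compatible with both grammars, the posterior probability of the subset grammar rises toward 1.

```python
# Toy illustration of the size principle with assumed numbers.
# Hypothesis A (subset grammar) licenses a sentences; Hypothesis B
# (superset grammar) licenses a + b. Each data point compatible with
# both contributes likelihood 1/a under A and 1/(a+b) under B.

a, b = 100, 50                 # assumed grammar sizes
prior_A, prior_B = 0.5, 0.5    # no initial bias between hypotheses

def posterior_A(n):
    """P(A | n data points, all compatible with both grammars)."""
    like_A = (1 / a) ** n
    like_B = (1 / (a + b)) ** n
    return (like_A * prior_A) / (like_A * prior_A + like_B * prior_B)

for n in (1, 5, 20):
    print(n, round(posterior_A(n), 4))
# 1 0.6
# 5 0.8836
# 20 0.9997
```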



Returning now to the particular case under discussion, imagine that Hypothesis A is a grammar with Principle C in it and that Hypothesis B is a grammar without Principle C. Hypothesis A produces a smaller set of interpretations than Hypothesis B, because any sentence to which Principle C applies has one more interpretation under Hypothesis B than it does under Hypothesis A. That is, sentences like (7b) allow either coreference between the pronoun and the name or disjoint reference under Hypothesis B, but only allow disjoint reference under Hypothesis A. As learners will hear such sentences only with the disjoint reference interpretation intended, Hypothesis A becomes more likely. Thus, the lack of data that is consistent with Hypothesis B but not Hypothesis A can be treated as indirect evidence in favor of Hypothesis A over Hypothesis B.

Now, it is important to recognize that this kind of approach would have to assume that the learner has the representational capacity to formulate Principle C innately. That is, the hypothesis space H must allow for the formulation of hypotheses in hierarchical terms; and the learner, in order to compare the likelihoods of the two hypotheses, must be able to recognize c-command relations among possibly coreferential expressions and must track the relative frequency of coreference vs. disjoint reference interpretations in these environments. Without these assumptions, the indirect learner could not even begin to learn in this fashion.3 Nonetheless, it does bring up the possibility that the existence of a constraint against certain form–meaning pairings is not by itself evidence that alternative grammars lacking that constraint are not formulable by Universal Grammar (contra the claims of Crain 1991).

Kazanina and Phillips (2001) addressed this issue by looking at the acquisition of backwards anaphora in Russian. Like every language, as far as we know, Russian obeys Principle C. Importantly, however, Russian exhibits a further constraint against backwards anaphora when the pronoun is contained in certain adverbial clauses but not others. These facts are illustrated in (10):

(10) a. Pux_i s’el jabloko, poka on_i čital knigu.
Pooh ate.perf apple while he read.imp book
‘Pooh ate an apple while he was reading a book.’
b. Poka Pux_i čital knigu, on_i s’el jabloko.
while Pooh read.imp book he ate.perf apple
‘While Pooh was reading a book, he ate an apple.’
c. *On_i s’el jabloko, poka Pux_i čital knigu.
he ate.perf apple while Pooh read.imp book
‘He ate an apple while Pooh was reading a book.’
d. *Poka on_i čital knigu, Pux_i s’el jabloko.
while he read.imp book Pooh ate.perf apple
‘While he was reading a book, Pooh ate an apple.’

3 More generally, as Lasnik (1989) notes, learning by indirect negative evidence should be possible only to the extent that the learner has a constrained representational vocabulary over which to build and compare hypotheses.



In (10) we see that forwards anaphora is completely free, as in English, but that backwards anaphora is more restricted than in English. In (10c), the pronoun both precedes and c-commands its antecedent, and so the sentence is ruled out by Principle C. But in (10d), the pronoun does not c-command its antecedent, yet the sentence is still ungrammatical, unlike its English counterpart. The restriction on backwards anaphora appears to be tied to certain adverbial clauses, as illustrated above with the temporal adverbial poka, ‘while.’ With different temporal adverbials, backwards anaphora is possible, as shown in (11):

(11) Do togo kak ona pereehala v Rossiyu, Masha zhila vo Francii
before she moved.perf to Russia, Masha was.living.imp in France
‘Before she moved to Russia, Masha lived in France.’

However the restriction on backwards anaphora in poka-clauses is to be formulated, it is clear that it is not a universal, since it does not hold in English (see Kazanina 2005 for details).

The existence of two kinds of constraint against backwards anaphora allows us to ask about the origins of Principle C. In particular, the existence of language-particular constraints undermines the argument that Principle C is innate (in the sense that it is necessarily a part of every grammar constructable by Universal Grammar simply because it is a constraint). The existence of constraints like the Russian poka-constraint, therefore, makes a Bayesian approach to constraint learning more plausible. Nonetheless, Kazanina and Phillips asked whether children learning Russian demonstrate the same knowledge of Principle C as their English-learning counterparts and whether they also demonstrate knowledge of the poka-constraint.

These researchers found a developmental dissociation between Principle C and the poka-constraint in Russian. While three-year-olds demonstrated adult-like knowledge for Principle C-violating sentences, children at this age appeared not to know the poka-constraint. By five years of age, however, the Russian children had acquired the poka-constraint. Because Principle C is a universal constraint but the constraint against backwards anaphora in Russian poka-clauses is specific to that language, Kazanina and Phillips suggest that this dissociation in acquisition derives from how the constraints are learned. Principle C is a universal, innate constraint on possible grammars and so does not need to be learned. Consequently, the effects of this constraint are visible in children at the earliest possible experimental observations (see also Lukyanenko, Conroy and Lidz 2014 and Sutton, Lukyanenko, and Lidz 2011). The poka-constraint, on the other hand, is specific to Russian and so must be learned from experience, perhaps on the basis of indirect negative evidence, as discussed earlier in this section and in section 10.4.

The fact that these constraints show a different learning trajectory is consistent with Principle C being innate but does not force this conclusion. In order to show that it is innate, we would need to show that the asymmetry in acquisition does not follow from asymmetries in the amount of data that learners are exposed to that provide opportunities for learning by indirect negative evidence. If the contexts for comparing a Principle C grammar against one without Principle C are more common than those for comparing a poka-constraint grammar against one without the poka-constraint, then the asymmetry in acquisition might also follow.

10.6 Structure Dependence and Polar Interrogatives

The most widely discussed poverty of the stimulus argument is based on what Chomsky (1965) called structure dependence (see also chapter 5, section 5.4). Chomsky characterized a property of transformational operations which demands that they rely on analysis into constituent structural units, and not on, say, linear sequences of words or morphemes. He argued that this property must be part of the initial state of the acquisition mechanism since, according to Chomsky, human languages invariably conform to it. The following passage provided the foundation for much further discussion and argument in subsequent years:

A theory that attributes possession of certain linguistic universals to a language-acquisition system, as a property to be realized under appropriate external conditions, implies that only certain kinds of symbolic systems can be acquired and used as languages by this device. Others should be beyond its language-acquisition capacity… In principle, one might try to determine whether invented systems that fail these conditions do pose inordinately difficult problems for language learning, and do fall beyond the domain for which the language acquisition system is designed. As a concrete example, consider the fact that, according to the theory of transformational grammar, only certain kinds of formal operations on strings can appear in grammars—operations that, furthermore, have no a priori justification. For example, the permitted operations cannot be shown in any sense to be the most ‘simple’ or ‘elementary’ ones that might be invented. In fact, what might in general be considered ‘elementary operations’ on strings do not qualify as grammatical transformations at all, while many of the operations that do qualify are far from elementary, in any general sense. Specifically, grammatical transformations are necessarily ‘structure-dependent’ in that they manipulate substrings only in terms of their assignment to categories. Thus it is possible to formulate a transformation that can insert all or part of the Auxiliary Verb to the left of a Noun Phrase that precedes it, independently of what the length or internal complexity of the strings belonging to these categories may be. It is impossible, however, to formulate as a transformation such a simple operation as reflection of an arbitrary string (that is, replacement of any string a_1 … a_n, where each a_i is a single symbol, by a_n … a_1), or interchange of the (2n-1)th word with the 2nth word throughout a string of arbitrary length, or insertion of a symbol in the middle of a string of even length. … Hence, one who proposes this theory would have to predict that although a language might form interrogatives, for example, by interchanging the order of certain categories (as in English), it could not form interrogatives by reflection, or interchange of odd and even words, or insertion of a marker in the middle of the sentence. Many other such predictions, none of them at all obvious in any a priori sense, can be deduced from any sufficiently explicit theory of linguistic universals that is attributed to a language-acquisition device as an intrinsic property. (Chomsky 1965:55–56)

In later discussions, Chomsky often focused on the English auxiliary-fronting phenomenon as a clear and accessible example of the general idea. Unfortunately, other scholars were frequently misled by this into taking one particular aspect of the aux-fronting paradigm as the principal structure dependence claim, or, worse still, as the principal poverty of the stimulus claim. We will return to further discussion of this, after first looking at the early explicit presentation in Chomsky (1968/1972/2006):

… grammatical transformations are invariably structure-dependent in the sense that they apply to a string of words [fn.: More properly, to a string of minimal linguistic units that may or may not be words.] by virtue of the organization of these words into phrases. It is easy to imagine structure-independent operations that apply to a string of elements quite independently of its abstract structure as a system of phrases. For example, the rule that forms the interrogatives of 71 from the corresponding declaratives of 72 (see note 10 [I should emphasize that when I speak of a sentence as derived by transformation from another sentence, I am speaking loosely and inaccurately. What I should say is that the structure associated with the first sentence is derived from the structure underlying the second.]) is a structure-dependent rule interchanging a noun phrase with the first element of the auxiliary.

71 a. Will the members of the audience who enjoyed the play stand?
b. Has Mary lived in Princeton?
c. Will the subjects who will act as controls be paid?

72 a. The members of the audience who enjoyed the play will stand.
b. Mary has lived in Princeton.
c. The subjects who will act as controls will be paid.

In contrast, consider the operation that inverts the first and last words of a sentence, or that arranges the words of a sentence in increasing length in terms of phonetic segments (‘alphabetizing’ in some specified way for items of the same length), or that moves the left-​most occurrence of the word ‘will’ to the extreme left—​call these O1, O2, and O3, respectively. Applying O1 to 72a, we derive 73a; applying O2 to 72b, we derive 73b; applying O3 to 72c, we derive 73c:



73 a. stand the members of the audience who enjoyed the play will4
b. in has lived Mary Princeton
c. will the subjects who act as controls will be paid

The operations O1, O2, and O3 are structure-independent. Innumerable other operations of this sort can be specified. There is no a priori reason why human language should make use exclusively of structure-dependent operations, such as English interrogation, instead of structure-independent operations, such as O1, O2, and O3. One can hardly argue that the latter are more ‘complex’ in some absolute sense; nor can they be shown to be more productive of ambiguity or more harmful to communicative efficiency. Yet no human language contains structure-independent operations among (or replacing) the structure-dependent grammatical transformations. The language-learner knows that the operation that gives 71 is a possible candidate for a grammar, whereas O1, O2, and O3, and any operations like them, need not be considered as tentative hypotheses. If we establish the proper ‘psychic distance’ from such elementary and commonplace phenomena as these, we will see that they really pose some nontrivial problems for human psychology. We can speculate about the reason for the reliance on structure-dependent operations … but we must recognize that any such speculation must involve assumptions regarding human cognitive capacities that are by no means obvious or necessary. And it is difficult to avoid the conclusion that whatever its function may be, the reliance on structure-dependent operations must be predetermined for the language-learner by a restrictive initial schematism of some sort that directs his attempts to acquire linguistic competence. (Chomsky 1968/1972/2006: p. 52 of the 1968 edition)

4 Chomsky’s point is clear, but this example seems not to be the correct one, as it evidently involves both the correct and the incorrect transformation, the latter following the former. More immediately relevant would be:

(i) stand members of the audience who enjoyed the play will the

And here is Chomsky’s explicit poverty argument based on auxiliary-fronting and structure dependence:

Notice further that we have very little evidence, in our normal experience, that the structure dependent operation is the correct one. It is quite possible for a person to go through life without having heard any relevant examples that would choose between the two principles. It is, however, safe to predict that a child who has had no such evidence would unerringly apply the structure-dependent operation the first time he attempts to form the question corresponding to the assertion ‘The dog that is in the corner is hungry.’ Though children make certain kinds of errors in the course of language learning, I am sure that none make the error of forming the question ‘Is the dog that in the corner is hungry?’ despite the slim evidence of experience and the simplicity of the structure-independent rule. Furthermore, all known formal operations in the grammar of English, or of any other language, are structure-dependent. This is a very simple example of an invariant principle of language, what might be called a formal linguistic universal or a principle of universal grammar. (Chomsky 1971:27–28)

In Piattelli-Palmarini (1980), there is a very interesting interchange between Chomsky and Hilary Putnam concerning this poverty argument. Chomsky formulates two imaginable versions of auxiliary fronting (related to some we have seen earlier in this section), calling H1 structure-independent and H2 structure-dependent:

(12)

H1: Process the declarative from beginning to end (left to right), word by word, until reaching the first occurrence of the words is, will, etc.; transpose this occurrence to the beginning (left), forming the associated interrogative.

H2: same as H1, but select the first occurrence of is, will, etc., following the first noun phrase of the declarative.

Chomsky observes that the following data refute H1 but are predicted by H2:

(13)

The man who is here is tall. – Is the man who is here tall?
The man who is tall will leave. – Will the man who is tall leave?

(14)

*Is the man who here is tall?
*Is the man who tall will leave?
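The difference between H1 and H2 can be made concrete in the same toy style. The sketch below is our own illustration; since locating ‘the first noun phrase’ genuinely requires a parse, H2 is simply handed the length of that phrase as a stand-in for structural analysis.

```python
# H1 vs. H2 from (12), applied to (13a). Names and representations are
# our own illustrative choices.
AUX = {"is", "will", "can", "has"}

def h1(words):
    """H1 (structure-independent): front the first auxiliary,
    scanning word by word from the left."""
    w = list(words)
    i = next(j for j, word in enumerate(w) if word in AUX)
    w.insert(0, w.pop(i))
    return w

def h2(words, np_len):
    """H2 (structure-dependent): front the first auxiliary FOLLOWING the
    first noun phrase; np_len stands in for the parse that identifies
    that phrase."""
    w = list(words)
    i = next(j for j, word in enumerate(w) if j >= np_len and word in AUX)
    w.insert(0, w.pop(i))
    return w

s = "the man who is here is tall".split()
print(" ".join(h1(s)))     # -> is the man who here is tall   (the error in (14))
print(" ".join(h2(s, 5)))  # -> is the man who is here tall   (the question in (13))
```

The extra argument to h2 is exactly what is at issue: only the structure-dependent rule needs an analysis of the string into phrases.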

Chomsky then asks how the child knows that H1 is false. It is surely not the case that he first hits on H1 (as a neutral scientist would) and then is forced to reject it on the basis of data such as [(13)]. No child is taught the relevant facts. Children make many errors in language learning, but none such as [(14)], prior to appropriate training or evidence. A person might go through much or all his life without ever having been exposed to relevant evidence, but he will nevertheless unerringly employ H2, never H1 on the first relevant occasion (assuming that he can handle the structures at all). … If humans were differently designed, they would acquire a grammar that incorporates H1 and would be none the worse for that. In fact, it would be difficult to know, by mere passive observation of a person's total linguistic performance, whether he was using H1 or H2. … Such observations suggest that it is a property of S0 [the initial state of the language faculty] that rules (or rules of some specific category, identifiable on quite general grounds by some genetically determined mechanism) are structure-​dependent. The child need not consider H1; it is ruled out by properties of his initial mental state, S0. (Piattelli-​Palmarini 1980:40)



Putnam, in rejecting Chomsky’s conclusion, responds, in part ‘H1 has never been “put forth” by anyone, nor would any sane person put it forth …’ (Piattelli-Palmarini 1980:287). But Chomsky has a powerful counter-response:

Putnam considers my two hypotheses H1 and H2, advanced to explain the formation of yes-or-no questions in English. He observes that the structure-independent rule H1 would not be put forth by any ‘sane person,’ which is quite true, but merely constitutes part of the problem to be solved. The question is: Why? The answer that I suggest is that the general principles of transformational grammar belong to S0 as part of a schematism that characterizes ‘possible human languages’. (Piattelli-Palmarini 1980:311)

Perfors et al. (2006) offer a related argument against this particular poverty argument. They summarize the situation and their argument as follows:

The Poverty of the Stimulus (PoS) argument holds that children do not receive enough evidence to infer the existence of core aspects of language, such as the dependence of linguistic rules on hierarchical phrase structure. We reevaluate one version of this argument with a Bayesian model of grammar induction, and show that a rational learner without any initial language-specific biases could learn this dependency given typical child-directed input.

Curiously, they wind up doing no such thing, nor do they actually even attempt any such thing. Instead, as pointed out by Lasnik and Uriagereka (2007), they wind up repeating one error that Putnam made. Their system, when presented with a context-free language, learns a context-free language. All of the grammars presented as targets of the learning are particular phrase structure grammars. Thus, as Chomsky observed in discussing a point Putnam made, to talk of the structure dependence of a phrase structure grammar is to make a category mistake. Structure dependence (and structure independence) are properties of transformations:

Note that both of my hypotheses, H1 and H2, present rules that apply to a sentence, deforming its internal structure in some way (to be precise, the rules apply to the abstract structures underlying sentences, but we may put this refinement aside). Both the structure-independent rule H1 and the structure-dependent rule H2 make use of the concepts ‘sentence,’ ‘word,’ ‘first,’ and others; they differ in that H2 requires in addition an analysis of the sentence into abstract phrases. A rule that does not modify the internal structure of a sentence is neither structure-dependent nor structure-independent. For example, a phrase structure rule, part of a phrase structure grammar in the technical sense of the term, is neither structure-dependent nor structure-independent. (Piattelli-Palmarini 1980:315)

Pullum and Scholz (2002), in their extensive discussion of poverty of stimulus arguments, call the auxiliary-​fronting argument discussed earlier in this section ‘the apparently strongest case of alleged learning from crucially inadequate evidence discussed in



the literature, and certainly the most celebrated.’ But they reject Chomsky’s claim of unavailability of relevant evidence for the learner, citing especially Sampson (1989). They say that it has not at all been established that children are not exposed to interrogative sentences where the subject contains an auxiliary verb, and where that auxiliary is not what has been fronted, and they present several occurring examples found in various searches. They state that ‘Chomsky’s assertion that you can go over a vast amount of data of experience without ever finding such a case is unfounded hyperbole. We have found relevant cases in every corpus we have looked in’ (Pullum and Scholz 2002:44). Pullum and Scholz (2002) conclude:

Our preliminary investigations suggest the percentage of relevant cases is not lower than 1 percent of the interrogatives in a typical corpus. By these numbers, even a welfare child would be likely to hear about 7,500 questions that crucially falsify the structure-independent auxiliary-fronting generalization, before reaching the age of 3. But assume we are wrong by a whole order of magnitude on this, so there are really only 750. Would exposure to 750 crucial examples falsifying the structure-independent generalization not be enough to support data-driven learning? (Pullum and Scholz 2002:45)

This argument misses two essential points. First, empirically, it is simply not correct. The corpora examined by Pullum and Scholz were all newspaper corpora, which are not representative of speech to children. When corpora of child-directed speech were examined (Legate and Yang 2002), the relevant disambiguating data occurred at rates of less than 0.07% of all utterances, a number substantially (nearly two orders of magnitude) lower than the rates for other constructions that are acquired by age 3. Second, as Lasnik and Uriagereka (2002) observe, essentially following Freidin (1991), this number of relevant sentences is largely beside the point. That is, while one might read Chomsky as implying that the availability of the relevant grammatical questions in the child’s data would undermine the poverty argument, that is not actually the case. Suppose the child is presented with data like (15).

(15) Is the dog that is in the corner hungry?

While this does seem to indicate the incorrectness of a rule demanding that the first auxiliary must be fronted to form an interrogative (16a) and the relative superiority of (16b), it provides no basis for excluding, say, a rule like (17a) or one like (17b).

(16)

a. Front the first auxiliary.
b. Front the auxiliary in the matrix Infl.

(17)

a. Front any auxiliary.
b. Front any finite auxiliary.



c. Front the last auxiliary.
d. Front either the first or last auxiliary.

As Lasnik and Uriagereka (2002) observe,5

… P&S are missing Freidin’s point: [(15)] is not direct evidence for anything with the effects of hypothesis [(16b)]. At best, [(15)] is evidence for something with the logical structure of [(16a)] or X, but certainly not for anything having to do with [(16b)] …. (Lasnik and Uriagereka 2002:149)

5  Pullum and Scholz (2002) are aware of Freidin’s argument, but choose to overlook its deep significance, saying: ‘We ignore this interesting point (which is due to Freidin 1991), because our concern here is to assess whether the inaccessibility claim is true.’

In sum, while there have been many attempts to undermine the argument from the poverty of the stimulus through analysis of auxiliary fronting rules, these generally miss the point in several ways. First, while this argument is a prominent one, it does not represent the strongest case (indeed, when Chomsky 1975:30 discusses this case, he calls it ‘the simplest one that is not entirely trivial’); nor does the entire argument depend on the validity of this one instantiation. Second, the argument is about the relation between sentences and their structures. It is not about the grammaticality of particular yes–​no questions; it is about the relation between these questions and the corresponding declaratives. Third, any poverty argument is based on a comparison of hypotheses. But, as noted above, any finite dataset is compatible with a potentially infinite number of grammars and so in essence a complete response to a poverty argument would have to compare perhaps an infinite number of hypotheses, and surely not just two. So, even if there is evidence in speech to children that favors the correct analysis over one explicit alternative, this does not decide the question against the poverty argument. In order to do so, it would be necessary to show that the available evidence would also rule out all other possible reformulations of the rule.

10.7  Structure Dependence and Statistical Analysis in Artificial Phrase Structure

In research on animal learning, one of the most informative approaches to separating the contribution of the learner from the contribution of the environment is to withhold certain kinds of experiences from the learner and to see what gets acquired. If what is ultimately acquired is not affected by removing certain kinds of experiences from the organism’s normal life, then it follows that those experiences played no role in shaping the knowledge attained. While experiments like these are unethical to conduct with human children (but see Feldman, Goldin-Meadow, and Gleitman 1978, Landau and Gleitman 1985, and Senghas and Coppola 2001 for some natural variants), we can recreate this kind of selective rearing experiment in the laboratory through the use of artificial languages. Takahashi (2009) conducted just such an experiment examining the structure dependence of transformational rules.

As is well known, constituent structure representations provide explanations for (at least) three kinds of facts. First, constituents provide the units of interpretation. Second, the fact that each constituent comes from a category of similar constituents (e.g., NP, VP, etc.) makes it possible for a single constituent type to be used multiple times within a sentence, as in (18):

(18)

[IP [NP the cat] [VP ate [NP the mouse]]]

Third, constituents provide the targets for grammatical operations such as movement and deletion:

(19)

a. I miss [the mouse]i that the cat ate __i.
b. The cat ate the mouse before the dog did [VP eat the mouse]

It is the third property which corresponds to the structure dependence of transformational rules. Thompson and Newport (2007) make a very interesting observation about phrase structure and its acquisition: because the rules of grammar that delete and rearrange constituents make reference to structure, these rules leave a kind of statistical signature of the structure in the surface form of the language. The continued co-​occurrence of certain categories and their consistent appearance and disappearance together ensures that the co-​occurrence likelihood of elements from within a constituent is higher than the co-​occurrence likelihood of elements from across constituent boundaries. Thompson and Newport (2007) go on to argue that this statistical footprint could be used by learners in the acquisition of phrase structure. And, they show that adult learners are able to use this statistical footprint in assigning constituent structure to an artificial language. But showing that learners are sensitive to the statistical features of the environment does not yet provide information about the acquired representations. It is impressive that learners learned about the constituent structure of an artificial language given only statistical information about that structure. But this demonstration remains silent about the character of the acquired representations and the inferences that these representations license.
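The statistical signature Thompson and Newport describe is easy to exhibit in a toy simulation. The sketch below is our own construction, not their materials: three two-word ‘constituents’ are each optionally present in a sentence, mimicking the appearance and disappearance of whole phrases, and the resulting transitional probabilities come out higher within a constituent than across a boundary.

```python
# Toy demonstration that optional whole constituents leave a statistical
# footprint: within-constituent transitional probability exceeds
# cross-boundary transitional probability.
import random
from collections import Counter

random.seed(0)
PHRASES = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]

def sentence():
    # each constituent is independently kept or dropped whole
    kept = [p for p in PHRASES if random.random() < 0.7]
    return [w for p in (kept or [PHRASES[0]]) for w in p]

corpus = [sentence() for _ in range(5000)]
bigrams = Counter((x, y) for s in corpus for x, y in zip(s, s[1:]))
nonfinal = Counter(x for s in corpus for x in s[:-1])

def tp(x, y):
    """Transitional probability P(y | x), over non-final occurrences of x."""
    return bigrams[(x, y)] / nonfinal[x]

print(round(tp("a1", "a2"), 2))  # within a constituent: 1.0
print(round(tp("a2", "b1"), 2))  # across a boundary: noticeably lower (about 0.77)
```

A learner tracking such statistics could in principle locate the constituent boundaries; whether that statistic also yields knowledge like ‘only constituents move’ is precisely the question taken up next.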



In order to determine whether the acquired representations have properties that derive from the structure of the learner, it is important to identify their deductive consequences. Do learners know things about constituent structure (even if this structure is acquired using statistical features of the environment) that are not evident in the statistics themselves? More narrowly, does the structure-dependence of grammatical rules follow from the fact that a grammar leaves a statistical signature on the input, or does it follow from structure imposed by the learner on that statistical signature?

In order to answer this question, Takahashi and Lidz (2008) constructed a miniature artificial grammar containing internally nested constituents. In addition, the grammar contained rules which allowed for the repetition of constituents of a certain type, the movement of certain constituents, and the substitution of certain constituents by pro-forms. They then created a corpus of sentences from this language in which these rules applied often enough to provide statistical evidence for the constituent boundaries. In other words, the language provided statistical cues to the internal structure of the sentences.

Their first question, using this artificial language, was whether adults and infants could acquire constituent structure using only statistical information. The language was presented in contexts that did not provide any referential information, so that no meaning could be assigned to any of the words. And there was no prosodic or phonological information of any kind that could serve as a cue to the phrase structure. So, to the extent that learners could acquire the phrase structure, they would have to do so through the statistical features of the exposure. In order to test whether the learners acquired the phrase structure, they asked whether the learners could distinguish novel sentences containing either moved constituents or moved nonconstituents. Since only constituents can move in natural languages, they reasoned that if learners could distinguish moved constituents from moved nonconstituents, it must be because they had learned the constituent structure of the artificial language. They found that both adults, after 36 minutes of exposure, and 18-month-old infants, after only two minutes of exposure, were able to do so (Takahashi and Lidz 2008; Takahashi 2009). Thus, the statistical footprint of constituent structure is detectable by learners and is usable in the acquisition of phrase structure.

Now, the exposure provided to the learners in this experiment included sentences containing movement. Although the particular sentences tested were novel, they exhibited structures that had been evident during the initial exposure to the language. Takahashi and Lidz thus went on to ask whether the inference that only constituents can move derives from the learner’s exposure to movement rules which apply only to constituents or whether this inference derives from the child’s antecedent knowledge about the nature of movement rules in natural language. To ask this question, they created a new corpus of sentences from their artificial language. In this novel corpus were included sentences in which (a) certain constituents were repeated in a sentence, (b) certain constituents were optionally absent from a



sentence, and (c) certain constituents were replaced by pro-forms. This combination of operations created a statistical signature of the phrase structure of the language such that it was possible to identify the constituent boundaries in the language. However, in this input corpus they included no examples of movement. This made it possible for them to identify the locus of the learner’s knowledge that only constituents can move. If this knowledge derives from the learner’s experience in seeing movement rules, then we would expect learners to be unable to distinguish moved constituents from moved nonconstituents. On the other hand, if the learner brings knowledge about what kinds of movement operations are possible in natural language to the learning task, then we would expect learners to correctly distinguish moved constituents from moved nonconstituents.

They found that both adults and 18-month-old infants displayed knowledge of the constraint that only constituents can move, even when their exposure to the artificial language contained no instances of movement whatsoever. Thus, we can conclude that some of what is acquired on the basis of statistical information is not itself reflected in the statistics. Since the learners in this experiment had seen no examples of movement, their knowledge of the constraint that only constituents can move could not have come from the exposure to language but rather must have come from the learners themselves. More generally, while learners may get evidence about constituency from movement operations, the possibility that only constituents can move comes not from those experiences, but from constraints on possible movement rules that are part of the architecture of the language faculty. In this case, the learners knew about the structure dependence of transformational rules even without having any experience of transformations whatsoever.

In sum, identifying the constituency of a language has consequences for novel sentences with structures never before encountered. These deductive consequences reveal the structure of the learner over and above any role of distributional learning. Distributional learning therefore functions as part of a process of mapping strings onto the grammar that generated them (perhaps in the Bayesian fashion described in section 10.5). But some properties of the identified grammar are contributed by the learner’s antecedent knowledge of the class of possible grammars.

10.8  One-Substitution and the Trouble with Indirect Negative Evidence

One further well-known illustration of the poverty of the stimulus argument concerns the hierarchical structure of NP and the anaphoric uses of one (Baker 1978; Hornstein and Lightfoot 1981; Lightfoot 1982; Hamburger and Crain 1984). Consider two hypotheses for the structure of NP, given in (20). Both would, in principle, be possible analyses of strings containing a determiner, adjective, and noun.



(20)

a. Flat structure hypothesis: [NP det adj N0], e.g., [NP the red ball]

b. Nested structure hypothesis:6 [NP det [N′ adj [N′ N0]]], e.g., [NP the [N′ red [N′ ball]]]

We know, on the basis of anaphoric substitution, that for adults (20b) is the correct representation (Baker 1978). In (21), the element one refers anaphorically to the constituent [red ball].

(21)

I’ll play with this red ball and you can play with that one.

Since anaphoric elements substitute only for constituents and since it is only under the nested structure hypothesis that the string red ball is represented as a constituent (i.e., with a single node containing only that string), it follows that (20b) is the correct structure.7 Now, although we know that the nested structure hypothesis reflects the correct adult grammar, and that one is anaphoric to N′ and not N0, how children acquire this knowledge is more mysterious. Consider the following learning problem (Hornstein and Lightfoot 1981). Suppose that a learner is exposed to small discourses like (21) in which one is anaphoric to some previously mentioned discourse entity and that the learner has recognized that one is anaphoric. In order to understand this use of one, the learner must know that it is anaphoric to the phrasal category N′, which is possible only under the nested structure hypothesis. However, the data to support this hypothesis is not available to the learner for the following reason. For positive assertions like (21), every situation that makes one = [N′ red ball] true also makes one = [N0 ball] true. Thus, if the learner had come to the hypothesis that one is anaphoric to N0 and not N′, evidence that this is wrong would be extremely difficult to come by.

6  The argument goes through in exactly the same way if we change the constituent labels of NP to DP and N′ to NP, as in Abney (1987).

7  Note also that one can be anaphoric to certain strings containing only a single noun, as in (i):

(i) I like the red ball but you like the blue one.

Here, however, one is still anaphoric to N′. This can be seen by examining the difference between cases where the NP contains an argument and those in which it contains an adjunct:

(ii) *I climbed the side of the building and you climbed the one of the mountain.
(iii) I climbed the tree with big branches and you climbed the one with little branches.

Because one cannot be anaphoric to a complement-taking noun (ii) without including the complement, it follows that one cannot be anaphoric to N0 (see (20)) and that cases in which it apparently is, such as (iii), represent cases of the head noun being contained in a larger N′ constituent.

Evidence that could support the N′ hypothesis over the N0 hypothesis comes from negative sentences like (22) in contexts in which Max has a blue ball.

(22) Chris has a red ball but Max doesn’t have one.

In such a situation, the learner who posited that one was anaphoric to the N0 ball would have to conclude that he had built the wrong grammar (or that the speaker was lying) and thus be led to change the hypothesis. Now, in order for learners to build the correct grammar, such situations would have to be common enough for them to show up at levels distinguishable from noise in every child’s linguistic environment. Since such situations are not likely to be so common, we conclude that neither the flat structure hypothesis nor the hypothesis that one is anaphoric to N0 could be part of the hypothesis space of the learner. If they were, then some learners might never come upon the evidence disconfirming those hypotheses and would therefore acquire the wrong grammar. Since there is no evidence that English speakers actually do have that grammar, it simply must never be considered.

The logic of the argument is unquestionable; however, it is based on the crucial assumption that the evidence that unambiguously supports the nested structure hypothesis does not occur often enough to impact learning. In addition, because it is an argument based on what adults know about their language, it is missing the important step of showing that at the earliest stages of syntactic acquisition, children do know that one is anaphoric to the phrasal category N′. Hamburger and Crain (1984), in response to Matthei (1982), addressed the latter issue by testing 4- to 6-year-old children and found that they do represent the NP with a nested structure and that they also know that one is anaphoric to the phrasal category N′. However compelling, evidence based on preschool-aged children cannot reveal the initial state of the learner or the mechanisms responsible for the acquisition of this syntactic structure. This type of evidence leaves open the possibility that learners begin the process of acquisition with a flat structure grammar, discover somehow that this structure is wrong, and subsequently arrive at the nested structure grammar to better capture the input. Lidz, Waxman, and Freedman (2003) addressed this concern by testing infants at the earliest stages of syntactic acquisition, since these infants are more likely to reveal the initial state of the learning mechanism.

Lidz, Waxman, and Freedman (2003) tested 18-month-old infants in a preferential looking study (Hirsh-Pasek and Golinkoff 1996) in order to determine whether children represent strings like the red ball as containing hierarchical structure. Each infant participated in four trials, each consisting of two phases. During the familiarization phase, an image of a single object (e.g., a red ball) was presented three times, appearing in alternating fashion on either the left or right side of the television monitor. Each presentation was accompanied by a recorded voice that named the object with a phrase consisting of a determiner, adjective, and noun (e.g., ‘Look! A red ball’). During the test phase, two



new objects appeared simultaneously on opposite sides of the television monitor (e.g., a red ball and a blue ball). Both objects were from the same category as the familiarization object, but only one was the same color. Infants were randomly assigned to one of two conditions which differed only in the linguistic stimulus. In the control condition, subjects heard a neutral phrase (‘Now look. What do you see now?’). In the anaphoric condition, subjects heard a phrase containing the anaphoric expression one (‘Now look. Do you see another one?’).

The assumption guiding the preferential looking method is that infants prefer to look at an image that matches the linguistic stimulus, if one is available (Spelke 1979). Given this methodological assumption, the predictions were as follows. In the control condition, where the linguistic stimulus does not favor one image over the other, infants were expected to prefer the novel image (the blue ball), as compared to the now-familiar image (the red ball). In the anaphoric condition, infants’ performance should reveal their representation of the NP. Here, there were two possible outcomes. If infants represent one as anaphoric to the category N0, then both images would be potential referents of the noun (ball). In this case, the linguistic stimulus is uninformative with regard to the test images, and so infants should reveal the same pattern of performance as in the control condition. However, if infants interpret one as anaphoric to N′, then they should reveal a preference for the (only) image that is picked out by N′ (the red ball).

Subjects in the control condition revealed the predicted preference for the novel image, devoting more attention to it than to the familiar image. This preference was reversed in the anaphoric condition, where infants devoted more attention to the familiar than to the novel image. This constitutes significant evidence for the hypothesis that by 18 months, infants interpret one as anaphoric to the category N′, despite the fact that nearly any instance of anaphoric one in the input is consistent with both the one = N0 hypothesis and the one = N′ hypothesis.

In addition to this behavioral data, Lidz, Waxman, and Freedman (2003) also conducted a corpus study aimed at determining whether it was really true that the evidence that would distinguish the one = N0 hypothesis from the one = N′ hypothesis did not occur in speech to children. They found that in a corpus of child-directed speech, unambiguous data that distinguishes the two hypotheses occurred at a rate of 0.2%, roughly half the rate of occurrence of ungrammatical sentences. Because learners should treat ungrammatical sentences as noise, they should not treat any data that occurs less often than that as useful. If they did, they would also learn that the ungrammatical sentences were grammatical, contrary to fact. Thus, the corpus analysis supports the step of the poverty argument concerning the unavailability of informative data.

Now, Regier and Gahl (2003) noted that the sparseness of unambiguous data could be overcome if learners could use ambiguous data in conjunction with the predictions of alternative hypotheses to evaluate them. They reasoned that if one were anaphoric to N0, then sentences containing one should allow a wider range of interpretations than they would if one were anaphoric to N′.
As noted, every red ball is also a ball, but there are balls that are not red; a grammar in which one was anaphoric to N0 would therefore allow interpretations that excluded the property denoted by the adjective in the antecedent NP, as depicted in Figure 10.1.



[Figure 10.1 near here. The diagram shows the red balls as a subset of all the balls (alongside blue, green, striped, and small balls), with the utterances ‘…red ball…one…’ picking out only red balls.]

Figure 10.1  The red balls are a subset of the balls. If one is anaphoric to ball, it would be mysterious if all of the referents of the NPs containing one were red balls. Learners should thus conclude that one is anaphoric to red ball.

[Figure 10.2 near here. The diagram shows the antecedent strings licensed under the one = N′ hypothesis (e.g., red ball, ball, purple bottle, bottle, ball behind his back) properly containing those licensed under the one = N0 hypothesis (e.g., ball, bottle).]

Figure 10.2  Syntactic predictions of the alternative hypotheses. A learner biased to choose the smallest subset consistent with the data should favor the one = N0 hypothesis.

Hence, if the property mentioned by the adjective in the antecedent always holds of the referent of the anaphoric NP containing one, this would be a conspicuous coincidence under the one = N0 hypothesis but is fully expected on the one = N′ hypothesis. Thus, a general learning mechanism that disfavors coincidences, like the Bayesian learner discussed in section 10.5, would come to prefer the N′ hypothesis over the N0 hypothesis. This idea shows the potential informativeness of indirect negative evidence. If learners are comparing the predictions of alternative hypotheses, then even if the data is ambiguous it might be more or less likely under each of the alternatives, providing some reason to favor one hypothesis over the other.

However, in the particular case at hand, this kind of learning mechanism has been shown not to work, primarily because it concerns itself only with the semantic predictions of each hypothesis. Because the semantic predictions of the one = N′ hypothesis are a subset of the one = N0 hypothesis, learners should be biased towards the former. However, as shown in Figure 10.2, the two hypotheses also make predictions about the syntactic properties of their antecedents. In particular, the set of strings covered by the one = N′ hypothesis is larger than the set of strings covered by the one = N0 hypothesis. Pearl and Lidz (2009) show, with a range of computational learning simulations, that considering both the semantic and syntactic consequences of the two hypotheses leads to a learner that favors the one = N0 hypothesis, largely due to the 95% of utterances



containing one that do not have an adjective in the antecedent. In those cases, the two hypotheses make the same semantic predictions, but the one = N0 hypothesis is favored because it is the smallest hypothesis consistent with the data.

The moral of this story is that when we consider the possibility of learning by indirect negative evidence, we must be sure to consider all of the syntactic and semantic predictions of each hypothesis under consideration. When we do so, we find that learning by indirect negative evidence may not be as effective as it seems when only considering a portion of the relevant data. Pearl and Lidz (2009) go on to show that the problem can be overcome by ignoring the vast majority of the data that the learner is exposed to (namely, all of the data in which the antecedent NP contains no modifiers). However, because this discounting of certain data is not motivated by general learning principles, it follows that a general purpose learning mechanism (e.g., one that uses indirect negative evidence) can function only with domain-specific constraints on either the hypotheses under consideration or the set of data that the learner takes to be informative.
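The size-principle reasoning behind this result can be made concrete with a deliberately schematic calculation. All the set sizes below are invented for illustration (this is not Pearl and Lidz’s actual model): each observation consistent with a hypothesis is assigned probability 1/|extension|, on both the semantic side (possible referents) and the syntactic side (possible antecedent strings).

```python
# Schematic size-principle comparison of one = N0 vs. one = N'.
# All set sizes are invented for illustration only.
import math

N_STRINGS = {"N0": 5, "Nprime": 20}   # antecedent strings each hypothesis licenses
N_REFS    = {"N0": 50, "Nprime": 3}   # possible referents for '...red ball ... one'

def log_like(hyp, n_utts=1000, frac_bare=0.95):
    """Log-likelihood of a corpus in which 95% of antecedents are bare nouns."""
    n_bare = int(n_utts * frac_bare)
    n_adj = n_utts - n_bare
    ll = n_bare * math.log(1 / N_STRINGS[hyp])          # syntax only
    ll += n_adj * (math.log(1 / N_STRINGS[hyp])         # syntax ...
                   + math.log(1 / N_REFS[hyp]))         # ... plus semantics
    return ll

print(log_like("N0") > log_like("Nprime"))   # True: the whole corpus favors N0

def log_like_adj_only(hyp, n_adj=50):
    """Likelihood restricted to antecedents containing an adjective."""
    return n_adj * (math.log(1 / N_STRINGS[hyp]) + math.log(1 / N_REFS[hyp]))

print(log_like_adj_only("Nprime") > log_like_adj_only("N0"))  # True: now N' wins
```

The two print statements mirror the two findings reported above: over the full corpus the smaller string hypothesis (N0) dominates, and only by discounting the unmodified antecedents does the learner come to favor N′.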

10.9  Conclusions

The argument from the poverty of the stimulus remains one of the foundational cornerstones of generative linguistics. Because a grammatical theory must contribute to our understanding of how children come to have grammars (the ‘explanatory adequacy’ of Chomsky 1965), questions of learnability are intimately tied up with the proper formulation of the theory of syntax (see also chapter 11). Because of this central place in the theory, it is important to understand the argument for what it is. Learning a language requires internalizing a grammar (i.e., a system for representing sentences). The internalized grammars have properties which do not follow from facts about the distributions of words and their contexts of use. Nor do these properties follow from independently understood features of cognition. Consequently, the way to force grammars to have these properties as opposed to others is to impose some constraints on the hypotheses that learners consider as to how to organize their experience into a system of grammatical knowledge.

The point of these arguments is not that there is no way of organizing or representing experience to get the facts to come out right. Rather, there must be something inside the learner which leads to that particular way of organizing experience. The puzzle is in defining what forces learners to organize their experience in a way that makes the right divisions. This organizing structure is what we typically refer to as Universal Grammar: the innate knowledge of language that (a) shapes the representation of all languages and (b) makes it possible for learners to acquire the complex system of knowledge that undergirds the ability to produce and understand novel sentences.

That said, it is just as important to note that claims about the poverty of the stimulus and the existence of constraints on possible grammars do not eliminate the environment as a critical causal factor in the acquisition of a particular grammar. A complete theory of



language development must show how the particular constraints of Universal Grammar (e.g., the necessary structure dependence of grammatical rules) make it possible for learners to leverage their experience in the identification of a grammar for the language they are exposed to (see chapter 12). Positing a universal grammar constrains the learning mechanism to be a selective one, rather than an instructive one, in the sense that learning involves using the data in the exposure language to find the best-fitting grammar of that language, subject to the constraints imposed by Universal Grammar (Fodor 1966; Pinker 1979; Lightfoot 1982; and chapter 11). Even if learners come fully loaded with innate knowledge about the range of abstract structures that are possibly utilized in language, they must still use evidence from the surface form of language to identify which particular abstract structures underlie any given sentence in the language to which they are exposed (Fodor 1966; Pinker 1979; Tomasello 2000; Viau and Lidz 2011; Lidz and Gagliardi 2015). But the fact that the input to children plays a causal role in the construction of a grammar does not undermine arguments from the poverty of the stimulus. Rather, the rich inferences that children make on the basis of partial and fragmentary data still provide strong arguments for the poverty of the stimulus and the contribution of innate principles of grammar in the acquisition of a language.



Chapter 11

Learnability

Janet Dean Fodor and William G. Sakas

11.1 Introduction

The study of natural language learnability necessarily draws on many different strands of research. The goal—at least as we will construe it here—is to construct a psychological model of the process of language acquisition, which reveals how it is possible for a human infant, in little more than four or five years, to arrive at a sophisticated command of the complex system that is a human language. Such a project straddles the fields of descriptive and theoretical linguistics, developmental psychology, the psycholinguistics of language comprehension and production, formal learning theory, and computational modeling. Building a model of language acquisition must take into account the cognitive capacities (such as attention, memory, analytic powers) of a preschool child, the facts of the language that the child is exposed to, the relationship between those facts and the language knowledge the child eventually arrives at, and the kinds of data manipulation processes that might in principle bridge the gap between input and end-state knowledge. While the emphasis in the current state of research is on devising any learning system that could achieve what children reliably achieve, any particular model will be the more convincing, the more closely its predictions fit the observed developmental course of language acquisition (on which, see chapter 12).

Progress has been made on many of these fronts in the last half-century or so, making this a lively and engrossing topic of research. But it must be acknowledged that despite the best of intentions, and some clever and innovative individual research programs, the contributing disciplines have not yet been able to integrate themselves to create a unitary picture of how children do what they do. Part of the reason for this, as will become clear, is that within each subdiscipline there are currently unsettled issues and competing views, so that building the overarching model we seek is like trying to piece together a jigsaw puzzle without knowing which of the puzzle pieces on the table belong to it.

A classical disagreement concerns nature versus nurture. Linguists investigate the extent to which natural languages have properties in common: Are there linguistic



universals? The answer to this could hint at whether learners have innate knowledge of what a language can be like. That in turn may reflect on whether or not the mental mechanisms for acquiring language resemble those for acquiring knowledge of the world in general. A crosscutting uncertainty arising within linguistic theory concerns how abstractly the structures of sentences are mentally represented in the adult grammar. The answer to that may have implications for whether learners can gain broad linguistic knowledge from specific observations, or whether they have to build up generalizations gradually over many observations.

11.2  Mathematical Theorems

The modern study of language learnability was founded in the 1960s following Chomsky and Miller’s remarkable collection of papers on the formal properties of natural language, including mathematical modeling of the relations between properties of languages and properties of the grammars that generate them (Chomsky 1959/1963; Chomsky and Miller 1963; Miller and Chomsky 1963).1

1  There was also a rich tradition in categorial grammar research beginning with Ajdukiewicz (1935), continuing with the work of Bar-Hillel (1953) and Lambek (1958). For recent work in this tradition, see Steedman and Baldridge (2011) and references there.

The learnability studies that ensued were mathematical also, and hence untrammeled by the demands of psychological plausibility that we impose today. Proofs were based on abstractions far removed from actual human languages (e.g., grammars were typically designated merely as g1, g2, g3 …, and the languages they generated as constituted of sentences s1, s2, s3 …). The groundbreaking work by Gold (1967) presented a general algorithm which, given an input sample from a target language, would hypothesize a grammar for that language. Gold showed that the success or failure of this or any other such algorithm depends not only on the properties of the target grammar itself but also on the properties of the range of alternative grammar hypotheses that might be entertained. (Technically, in that framework, what was learnable was thus not a language, but a class of languages.) This has been taken to underwrite the importance of there being an innate specification of the class of possible grammars for human languages (but see Johnson 2004). The most cited theorem resulting from Gold’s work, known as ‘Gold’s Theorem,’ establishes the nonlearnability of every class of languages in the ‘Chomsky hierarchy’ (see Chomsky 1956) from positive evidence only, i.e., from exposure to nothing but a sample of the sentences of the target language. This is called text presentation; it provides information about sentences that are in a target language, but no information about what is not in the language. This result resonated with linguists and psychologists who had observed the dearth of negative linguistic evidence available to children, providing anecdotes of children’s resistance to explicit correction (McNeill 1966; Brown and Hanlon 1970) and more recent systematic data concerning the nonspecificity of adult



Learnability   251 reformulations (Marcus 1993). Gold’s Theorem thereby fueled the conviction in some (psycho)linguistic circles that there must be innate constraints on the space of potential grammar hypotheses to compensate for the paucity of information in a learner’s input. (See discussion of ‘the argument from the poverty of the stimulus’ in ­chapter 10; see also ­chapters 5 and 12). This body of innately given constraints on the grammar space has come to be known as Universal Grammar (UG). Another important concept emanating from this research, still highly relevant today, is the problem posed by subset–​superset relations between languages, for learners without access to negative evidence. Suppose the set of sentences in one possible language is a proper subset of the set of sentences in another one (e.g., the former is identical to English except that it lacks reduced relative clauses). Then a learner that hypothesizes the more inclusive language, when the target is the less inclusive one, would be capable of generating sentences that are ungrammatical in the target—​an error that would be detectable by speakers of the target language, but not by the learner on the basis of the evidence available in positive-​only input. It is clear that this does not generally happen, because if superset choices could be freely made, languages would historically grow more and more inclusive, as the output of one speaker’s grammar is input to later learners. Gold noted that this kind of ‘superset’ error would be avoidable if the possible grammar hypotheses were prioritized, with every ‘subset grammar’ taking precedence over all of its ‘superset grammars.’2 Imagine, then, that learners have knowledge of this prioritization, and systematically test grammars in order of priority, checking whether or not a given grammar generates the current input sentence. The learner would move on from one grammar to the next only if the former fails to generate a sentence in the input sample. Under this strategy a learner will never pass over the target grammar in favor of one of its supersets. In Gold’s proofs, this beneficial partial ordering of grammars was folded into a total ordering, which he called an enumeration. If human learners employed such an enumeration, it would also be of benefit in another way: it would automatically register which grammar hypotheses have already been tested and failed on some input sentence, making it possible to avoid unnecessary retesting. Current models with psychological aspirations mostly do not presuppose a full enumeration, for reasons to be discussed. However, a partial ordering (possibly innate) of grammars which respects subset–​superset relations remains a central presupposition in theoretical studies of language acquisition, under the heading of the Subset Principle (Berwick 1985, Manzini and Wexler 1987; see later discussion in Fodor and Sakas 2005). Out of Gold’s work grew a whole field of research in mathematical learning theory, extending beyond human language learning. For language learnability, other mathematical studies followed Gold’s. Horning (1969) introduced a probabilistic approach and obtained a positive learning result: that the class of probabilistic context-​free grammars is learnable. Angluin (1980) identified some nontrivial classes of languages that cut 2 

Note that for convenience from now on we refer to subset relations between grammars as a shorthand for subset relations between the languages (sets of sentences) that the grammars generate.



Angluin (1980) identified some nontrivial classes of languages that cut across the Chomsky hierarchy, which are learnable on the basis of the original (nonprobabilistic) Gold paradigm. Osherson, Stob, and Weinstein (1986), also within the Gold tradition, contributed many additional theorems. Of particular interest to linguists was their proof that if a learner’s input sample of sentences from the target language could be ‘noisy,’ containing an unlimited number of intrusions of a finite number of word strings drawn from other languages in the domain, then learnability would be attainable only if the domain of languages were finite. This made a timely connection with the recent shift by Chomsky (1981a) from an open-ended class of rule-based grammars for natural languages to a finite class of parametrically defined grammars (see chapter 14).

These formal studies of learnability had the great merit of delivering provable theorems, with broad application to all learning situations of whatever type was defined in the premises of the proof. If research had been able to proceed along those lines, it could have led rapidly toward the goal of a single viable learning model for language. But that was not possible. In order for such broad-scope proofs to go through, their premises had to be highly abstract and general, divorced from all the particular considerations that enter into psychological feasibility: the actual size of the set of grammar hypotheses, the extent of learners’ memory for individual sentences, the specific degree of ambiguity of the input word strings with respect to the grammars they are compatible with, and so forth. While we may not be able to assess these factors with great precision at present, some ‘ballpark’ comparisons can be made between formal models and human reality, and they suggest that there is a considerable gulf between them.

Subsequent commentaries have pointed out respects in which Gold’s framing of the learning process, however rigorous and illuminating it is in formal respects, may not be psychologically plausible as a basis for modeling human language acquisition; see especially Pinker (1979). While innate prioritization of grammar hypotheses could indeed eliminate superset errors, Gold’s concept of an enumeration as a total ordering of the possible grammars was regarded as making implausible predictions if construed psycholinguistically. Some languages would be acquired significantly more slowly than others, just because they appear late in the innate enumeration. Regardless of how distinctive its sentences might be, a ‘late’ grammar could not be adopted until all prior grammars in the enumeration had been tried and falsified.3 On the other hand, grammar ordering might be regarded as a virtue, related to the linguistic concept of markedness, if the grammars earlier in the enumeration were optimal grammars, conforming most closely to central principles of natural language. But Pinker (1979:234) judged that even so, some ‘good’ languages would be intolerably slow to attain: ‘In general, the problem of learning by enumeration within a reasonable time bound is likely to be intractable.’4

3  Since Gold’s enumeration learner was able to eliminate many grammars on the basis of a single new input sentence, this would not necessarily cause significant delay. Translated into psychological terms, however, this would amount to parallel processing of multiple grammars, on a possibly massive scale, which would disqualify it on grounds of psychological feasibility.
4  This assessment was made before parameter theory (Chomsky 1981a) imposed a finite limit on the number of possible human grammars, with the expectation of a finite number of learning steps to acquire them. But problems of scale have nevertheless remained central to learnability research, as we will discuss here; see also chapter 14, section 14.2.



Perhaps the most telling objection against construing the Gold paradigm as the basis for a psychological model is that the grammars hypothesized by such a learner in response to input might bear no relation at all to the properties of that input. For example, consider Gold’s assumption that the learner proceeds along the enumeration until a compatible grammar is found. On the not unreasonable assumption that a human learner could test at most one new grammar against each input sentence (or entire input sample), an encounter with a sentence beyond the scope of the current grammar would cause a shift to the next grammar in the enumeration regardless of whether that grammar was any more compatible with the input than the current one—indeed it might be even less so. In contrast, we expect that when children change their grammar hypotheses in response to a novel input sentence, the properties of that sentence guide them toward a new grammar that could incorporate it. It would be odd indeed (cause for concern!) if a child shifted from one grammar to an arbitrarily different one, just because the former proved inadequate. Pinker made much the same point in advocating ‘heuristic language learning procedures’ which hold promise of learning more rapidly because they ‘draw … their power from the exploitation of detailed properties of the sample sentences instead of the exhaustive enumeration of a class of grammars’5 (1979:235). We return to this notion of ‘input-guided’ learning in section 11.6.

5  Pinker warns that the heuristic approach has its own pitfalls, however, if what is postulated is a large and unruly collection of ad hoc heuristics.

11.3  Psychological Feasibility

As learnability theory aimed to become a contributing partner in the modeling of real-life language learning, it had to become more specific in its assumptions, both about languages and about language learners. In place of broadly applicable theorems about learning in general, particular models had to be developed and evaluated. These expanded ambitions proved difficult to achieve.

The first noteworthy study in the new (psycho)linguistic tradition was that of Wexler and Hamburger (1973) and Hamburger and Wexler (1973, 1975), followed up by the heroic work of Wexler and Culicover (1980). The aim was to show that it was possible for a linguistically authentic grammar to be acquired by computational operations of a reasonably modest complexity, on the basis of a language sample consonant with the language that toddlers are exposed to. In other words, a learning model for natural language was to be developed which was psychologically feasible, computationally sound, and in tune with current linguistic theory. Wexler and Culicover’s demonstration of learnability made a very specific assumption about what kind of grammar was acquired: a transformational grammar in the general style of the (Extended) Standard Theory (as in Chomsky 1965, 1973), in which

Pinker warns that the heuristic approach has its own pitfalls, however, if what is postulated is a large and unruly collection of ad hoc heuristics.



surface structures of sentences were derived by cyclic application of a rich variety of transformations to deep structures defined by context-free phrase structure rules, and only deep structures were semantically interpreted. Wexler and Culicover (henceforth W&C) assumed that the class of possible deep structures was tightly constrained and universal, and also that a learner could establish the deep structure of any given surface sentence (word string) from its nonverbal context via a unique association between deep structures and meanings. This was a particularly strong assumption, whose role was to create a fixed starting point for the typically complex transformational derivation of a surface sentence.

Another crucial assumption concerned how the learner responded to a mismatch between the currently hypothesized grammar and the input. W&C proposed that on encountering an input sentence the learner would run through the transformational derivation from the known deep structure to whatever surface word string resulted from it by applying the currently hypothesized transformational rules. If that word string matched the input, no grammar change would be made; thus this learner was ‘error-driven’ (see also section 11.5.2). If it did not match, the learner would select at random a transformation to delete from the current grammar, regardless of its effect on the derived word string; or else it would add at random to the grammar any one (legitimate) transformational rule that would convert the predicted surface form into the actually observed surface form. As we noted in the case of enumeration-based learning, the resulting grammar might be less adequate than the one it replaced (e.g., loss of a needed transformation). This element of randomness in the revision of wrong grammars should be borne in mind as we proceed to other models that followed.

W&C were able to develop an intricate proof that such a learner, under these conditions, would eventually converge on a descriptively adequate grammar of any target language compatible with their version of the Standard Theory, on the basis of reasonably simple input (‘degree-2’ sentences, i.e., with a maximum depth of two embedded clauses), though only if certain constraints on transformational derivations were respected. One such constraint, which W&C dubbed the Binary Principle,6 was strongly reminiscent of the Subjacency Condition that Chomsky (1973) had argued for on purely linguistic grounds. Other constraints that were required in order for the learnability proof to go through also had some theoretical linguistic credentials. For example, the Freezing Principle7 was shown to cover a number of observed limits on transformational applications. The fact that independently motivated constraints on deep structures and on derivational operations were shown to be crucial contributions to learnability was greeted as further evidence for substantial innate pre-programming of humans for language acquisition.

6  ‘… a transformation can involve material only in the S on which it is cycling or the next S down’ (Wexler and Culicover 1980:310).

7  ‘If the immediate structure of a node in a phrase-marker is nonbase, that node is frozen’; ‘If a node A of a phrase-marker is frozen, no node dominated by A may be analyzed by a transformation’ (Wexler and Culicover 1980:119).
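The error-driven revision loop W&C assumed can also be caricatured in code. The sketch below is ours and drastically simplified: ‘transformations’ are drawn from a tiny fixed inventory of string operations, the deep structure is given directly, and, as in W&C’s procedure, a failed derivation triggers either a random deletion or the addition of a rule that repairs the mismatch.

```python
# Toy version of the error-driven random-revision learner (our
# simplification, not W&C's formal system).
import random
random.seed(1)

INVENTORY = {
    "front_aux": lambda w: [w[1], w[0]] + w[2:],        # swap first two words
    "drop_that": lambda w: [x for x in w if x != "that"],
}

def derive(deep, grammar):
    """Apply the hypothesized transformations (in a fixed order) to the
    known deep structure, yielding a predicted surface string."""
    words = list(deep)
    for name in sorted(grammar):
        words = INVENTORY[name](words)
    return words

def revise(grammar, deep, observed):
    """Error-driven: no change on a match; otherwise randomly delete a
    rule (regardless of its effect) or add one rule that repairs the
    derivation. The result may be worse than what it replaced."""
    if derive(deep, grammar) == observed:
        return grammar
    if grammar and random.random() < 0.5:
        return grammar - {random.choice(sorted(grammar))}
    fixes = [n for n in INVENTORY if derive(deep, grammar | {n}) == observed]
    return grammar | {random.choice(fixes)} if fixes else grammar

print(revise(set(), ["mary", "will", "leave"], ["will", "mary", "leave"]))
# -> {'front_aux'}
```

Even this caricature exhibits the property noted in the text: deletion is blind to its consequences, so the learner can temporarily move away from an adequate grammar.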



W&C’s work was a milestone in the search for a fully specified learning model that respected both linguistic and psychological considerations. It is true that in this early work some of the assumptions that bracketed the performance of the model were not the most stringent approximations to what might be assumed about child learners and their input (e.g., the necessity of some degree-2 input sentences). But for all that, this study clearly established a new research project for the future: proving natural language learnability within increasingly tighter and more realistic bounds of psycholinguistic feasibility.

11.4  Changing Grammars: The Move to Parameters

No sooner had W&C completed their learnability proof than the linguistic theory which it presupposed (Chomsky's Standard Theory, subsequently Extended Standard Theory) was abandoned. In 1981 Chomsky outlined a major revision of the theory of transformational grammar which had consequences both for the formal properties of the grammars that learners were assumed to acquire and for the mechanisms by which those grammars could be attained. In the course of introducing this new Government–Binding (GB) theory, Chomsky first proposed the concept of parameters as the means to codify cross-language differences in syntactic structure. This theory is therefore often referred to as the Principles and Parameters (P&P) theory, and this terminology (though loose) is useful since the concept of parameters has long been retained through various changes in the definition of government and other details of GB grammars, and even into the substantial theoretical revisions resulting in the Minimalist Program (Chomsky 1995b and since). Chomsky's motivations for this major change in 1981 included both linguistic (typological) matters and learning-related issues (see also chapter 14). He wrote:

The theory of UG must meet two obvious conditions. On the one hand, it must be compatible with the diversity of existing (indeed, possible) grammars. At the same time, UG must be sufficiently constrained and restrictive in the options it permits so as to account for the fact that each of these grammars develops in the mind on the basis of quite limited evidence. … What we expect to find, then, is a highly structured theory of UG based on a number of fundamental principles that sharply restrict the class of attainable grammars and narrowly constrain their form, but with parameters that have to be fixed by experience. (Chomsky 1981a:3–4)

P&P theory was actually the culmination of a growing trend throughout the 1970s to streamline grammars by eliminating the constraints previously included as part of each transformational rule, in favor of very general constraints on all rule applications (of the sort that figured in W&C's learnability proof). Individual transformations were eventually replaced entirely by the assumption that any possible transformational operation is free to apply at any point in a derivation except where specifically blocked by such constraints.



The one maximally general transformational rule that remained was known most famously as 'Move α' (Chomsky 1980) and in even broader form as 'Affect α' (Lasnik and Saito 1992). This development left no way to register variation between languages in terms of their different sets of transformational rules, but the newly introduced parameters took on this role. All aspects of the syntactic component were assumed to be innate and universal, except for a finite number of choice points offered by the parameters.8 For example, one basic parameter establishes whether syntactic phrases are head-initial (e.g., the verb precedes its object in a verb phrase) or head-final (the verb follows the object); another parameter controls whether a wh-phrase is moved to the top (the Complementizer projection) of a clause or is left in situ (see chapter 14, section 14.3). In order to have full command of the syntactic structure of the target language, a learner would need only to establish the values of the parameters relevant to it. Thus, the research focus now shifted from how a learner could formulate a correct set of transformational rules, to how a learner could identify the correct parameter settings on the basis of an input sample.

11.5  Implementing Parameter Setting: Domain Search

Chomsky himself was not very explicit about the mechanism for setting the syntactic parameters, but he drew attention to important aspects of it. Given a finite number of parameters, each with a finite number of values, the acquisition process would no longer be open-ended, but would be tightly focused. As Chomsky described it (1986b:146):

We may think of UG as an intricately structured system, but one that is only partially 'wired up.' The system is associated with a finite set of switches, each of which has a finite number of positions (perhaps two). Experience is required to set the switches.
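To make the switch metaphor concrete, the following sketch (our idealization, not a model from the literature) assumes precisely what later sections show to be problematic: a table of fully reliable, unambiguous triggers, one per parameter.

def set_switches(input_sentences, triggers, n_params):
    # triggers maps each parameter index to a predicate over sentences
    # that fires only on unambiguous evidence for the marked value --
    # a strong idealization (see sections 11.5 and 11.6).
    grammar = [0] * n_params           # default values, pre-set at the outset
    for sentence in input_sentences:
        for p, fires in triggers.items():
            if fires(sentence):        # one relevant fact is spotted ...
                grammar[p] = 1         # ... and flips the switch
    return grammar

# With n = 30 independent binary parameters there are 2**30 (over a
# billion) candidate grammars, yet at most 30 observations are needed.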

The learner would have to spot just one relevant fact about the language for each parameter value, or perhaps just one fact for each parameter if all parameters have default values which are pre-set at the outset of learning; for convenience of discussion we will assume default values in what follows. For n (independent) binary parameters, there would be 2^n distinct grammars for the learner to choose between, but at most n observations to be made. For example, no more than 30 facts would need to be observed to set 30 parameters licensing over a billion grammars.9

8 Parameters were at first regarded as variables in the statement of the constraints, e.g., specifying whether S or S′ counted as a bounding node for subjacency (Rizzi 1978, 1982). In later work, parameters were associated with lexical items, especially functional heads, e.g., the head-direction parameter could be recast in terms of the direction of government of a given head (V, P, or the Infl(ection)/T(ense) morpheme); see in particular Travis (1984), Koopman (1984), and chapter 14.



Chomsky implicitly assumed that each such fact would be simple and readily detectable in the input available to the learner. Once set, a parameter value could have far-reaching and complex consequences for the structure of the target language (e.g., Hyams 1983), so the learner's observation would not constitute evidence in a traditional sense; rather, it would function just as a 'trigger' that causes the value to be adopted by the learner. The radical simplification of what a child had to learn implied a corresponding simplification of the learning process, without need of any linguistic reasoning, generalizing, or integration of patterns across multiple sentences. The triggering of parameters was viewed as more or less automatic, effortless, and instantaneous, offering an explanation for how children acquire the target language in just a few years, during which their ability to acquire other complex skills (shoe tying, multiplication) typically progresses more slowly and haltingly. This P&P model was widely embraced by linguists and psycholinguists, at least those working within the Chomskyan tradition of generative grammar, which will be our main focus here. Nevertheless, it was a decade before any parameter-setting model was computationally implemented. This delay was certainly not due to any lack of interest. Perhaps the P&P learning model was perceived as so simple and mechanical that implementing it computationally was hardly necessary. Alternatively, with hindsight, one might surmise that attempts to embody it had discovered—sadly—that parameter setting cannot be automatic and effortless after all. Instead, as we now explain, it becomes ensnared by many of the complexities of human language that it was designed to escape. We will review here three notable implementations of parameter setting developed during the 1990s. While they differ from each other in various ways, they all depart from Chomsky's original conception of triggering, in that they portray learning as a process of testing out whole grammars until one is found that fits the facts of the target language. Instead of the n isolated observations needed to trigger n individual parameters, these approaches require the learner to search through the vast domain of 2^n grammars. In this respect these models resemble a Gold-type enumeration-based learner, although they employ other techniques for guiding the search, borrowing heuristics from computer science that were originally developed for purposes other than modeling language acquisition. Persuasive reasons were given at the time for this turn toward domain search, as we now review.

11.5.1 Clark's Genetic Algorithm

A series of papers by Clark (1990, 1992) first sounded the alarm that the linguistic relationships between parameter values and their triggers are often quite opaque. Clark presented examples of parametric ambiguity: sentence types which could be accommodated by resetting either one parameter or another.

9 For estimates of the actual number of parameters, see section 11.5.3, and chapters 14 and 16 for relevant discussion.



He noted that an accusative subject of an infinitival complement clause (e.g., John believes me to be smart) could be ascribed either to a parameter permitting exceptional case marking (ECM) by the matrix verb believes, or to a parameter permitting structural case marking (SCM) within the infinitival complement clause. He also pointed to interactions between different parameters, illustrated by an interaction between ECM/SCM and a binding domain parameter. With a negative value for the ECM parameter, an anaphor as subject of the complement clause (e.g., John_i believes himself_i to be smart) would demand a positive setting of the parameter licensing long-distance anaphora (LDA), but with a positive value of the ECM parameter (as is correct for English) it would not imply LDA. Ambiguities and interactions such as these create a minefield in which the learner might mis-set one parameter, which could distort the implications for another parameter, creating more errors. Over all this hangs the danger that some mistakes might lead to improper adoption of a superset grammar, from which no recovery is possible without corrective data. The conclusion that Clark drew from these important observations was that parameter setting cannot be a high-precision process but must involve a considerable amount of trial-and-error testing. Moreover, what must be tested is whole grammars, to avoid having to disentangle the interacting effects of individual parameters. The specific mechanism that Clark (1992) proposed drew from the computational literature on 'genetic algorithms,' applying their domain search procedures to the case of natural language. His genetic algorithm (GA) for syntax acquisition tested multiple grammars on each input sentence, ranking them for how well they succeeded in assigning a coherent syntactic structure to the word string. Each grammar in a batch (a 'population') is associated with a parsing device which is applied to the sentence, and the resulting parse is evaluated by a 'fitness metric,' which values economy of structure and which penalizes analyses that violate UG principles or the Subset Principle. After evaluating one population of grammars in this way, the GA would repeat the process with another and another, and would begin to 'breed' the highest-performing grammars from each. That is, their parameter values would be mixed into new combinations, creating new grammars which could then be evaluated in turn, in the hope that eventually just one grammar would stand out as superior to the others, while low-fitness grammars are removed from the pool. Clark also assumed a 'mutation' operation which changes the value of a parameter at random, as a means of introducing fresh variation into the pool. The GA's search for the target grammar is thus linguistically directed, with the pool of candidate grammars being progressively narrowed to those that best cover the input data. As Clark observed (1992:95):

It should be clear that a learner cannot blindly reset parameters. He or she must, in some sense, be aware of the potential effects of setting parameters and must have some means of moving toward the target, based on experience with the input text.

However, the amount of work involved in extracting relevant information from the input text is clearly far greater than was originally anticipated by the P&P theory.



The GA keeps track of large numbers of grammars and obtains rich information about the sentence structures assigned by their associated parsers. Inevitably the question arises as to whether these operations would exceed the computational capacity of a small child. It is not easy to imagine that each time a toddler hears a sentence uttered by a parent, (s)he attempts to parse it with many different grammars in the small amount of time before responding to that utterance or hearing another one. Also, and surprisingly in view of the substantial memory and computing powers ascribed to this learner, Clark reports that the GA fails to acquire the correct parameter values in some cases. This is because the grammar-breeding process is guided by whole-grammar fitness and is only approximately related to the correctness of the specific parameter values comprising those grammars. So the GA will sometimes favor grammars with an incorrect parameter value. If the correct value of that parameter is not present in the narrowed pool of highly valued grammars, then it will be difficult to recover it; only the mutation operator could do so, and the probability that it would target the one parameter in question is low. Clark notes that this danger grows as the pool of candidate grammars is whittled down in later stages of learning, i.e., as the learner is closely approaching the target grammar. This is called the 'endgame problem' and is well known in studies of genetic algorithms for other types of learning besides language. Making a virtue of this, Clark and Roberts (1993) observe that the susceptibility of GAs to misconvergence can be an advantage in accounting for language change, as has been argued for other models as well (e.g., Lightfoot 1991; see also chapter 18).
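The general shape of one GA generation over binary parameter vectors can be sketched as follows. The culling ratio and mutation rate are arbitrary placeholders, and the fitness function is assumed rather than defined: in Clark's model it was computed from the parses themselves, valuing economy of structure and penalizing UG and Subset Principle violations.

import random

def ga_generation(population, fitness, mutation_rate=0.01):
    # population: a list of grammars, each a list of binary parameter values.
    # fitness(grammar): assumed to score the parse the grammar assigns to
    # the current input (economy, UG violations, Subset Principle).
    scored = sorted(population, key=fitness, reverse=True)
    survivors = scored[:len(scored) // 2]        # remove low-fitness grammars
    children = []
    while len(survivors) + len(children) < len(population):
        mom, dad = random.sample(survivors, 2)
        cut = random.randrange(1, len(mom))      # 'breed': one-point crossover
        child = mom[:cut] + dad[cut:]            # mixes parameter values
        child = [1 - v if random.random() < mutation_rate else v
                 for v in child]                 # mutation: occasional random flip
        children.append(child)
    return survivors + children

The endgame problem is visible in this sketch: once the survivor pool has been narrowed, a correct value absent from it can re-enter only via the low-probability mutation step.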

11.5.2 Gibson and Wexler's Triggering Learning Algorithm

In a much-cited article, Gibson and Wexler (1994; henceforth G&W) presented a parameter-setting model which they called the Triggering Learning Algorithm (TLA). Like Clark's GA, the TLA did not reflect Chomsky's picture of 'automatic' parameter setting in which an encounter with a trigger datum in the input would simply flip the appropriate parameter switch to the correct value. Despite its name, the TLA had no knowledge of what the triggers are for the various parameters, and hence no means of detecting them in the input. Most significantly, it had no means of identifying which parameters could be reset in order to license a particular novel input sentence. The TLA was input-guided only in the minimal sense that, once a grammar hypothesis had been selected by the learner for testing on an input sentence, that sentence could support or disconfirm the hypothesis. The TLA proceeded by trial and error, making random guesses about what parameter values to try out, constrained by three general procedural constraints: change the current grammar only if it has failed to license the current input sentence; change to a new grammar only if it licenses the current input sentence; and change the value of at most one parameter in response to one input sentence. The first of these ('error-driven learning') was intended to provide stability in the sequence of grammar hypotheses; the second ('Greediness') was intended to yield improvement over the past grammar hypothesis; the third (the 'Single Value Constraint') was intended to conserve as much past learning as possible while updating the grammar in response to an as yet unlicensed input.



A major advantage of the TLA over the GA, with obvious relevance for psychological feasibility, is that the TLA places minimal demands on memory and processing capacity. Like the GA, the TLA uses the sentence parsing mechanism to try out a grammar on an input sentence. But where the GA tests multiple grammars on the same sentence, the TLA tests only one new grammar at a time, and then only if parsing with the current grammar has failed. Also, where the GA extracts rich information from each parse (e.g., the number of grammatical principles violated, the lengths of syntactic chains), the TLA receives only a simple notification of success/failure of the parse. Thus, the TLA represents an opposite extreme from the GA, employing just about the simplest imaginable mechanism. It walks around the domain of possible languages, one small step at a time, seeking to arrive at the target just by turning away from unsuccessful grammars and toward more successful ones. A likely consequence of reduced computational power is reduced success, and indeed G&W showed that the TLA would fail to converge on the target grammar in a number of cases. The problem is caused by what are known as local maxima (sometimes local minima), in which the learner's current hypothesis is wrong but the only other accessible hypotheses (limited by the Single Value Constraint) are no better. When the current hypothesis is locally the best one available, this learning mechanism cannot escape to explore other areas of the grammar domain which might afford even better hypotheses. This is not unrelated to the 'endgame' problem of Clark's GA. It is a well-known property of the class of 'hill-climbing' algorithms, which search a hypothesis space by making local changes in the direction of improvement. Such searches are too shortsighted to be able to detect an overall optimal hypothesis beyond the neighborhood currently being scoured. It is unclear just how great a price the TLA pays for its extreme economy of resources. But G&W reported that the TLA failed to converge on the correct parameter values for 3 out of 8 languages in a tiny artificial language domain defined by 3 binary parameters (initial/final subject, initial/final head in VP, +/−Verb Second). Niyogi and Berwick (1996) subsequently provided a more detailed probabilistic analysis demonstrating additional TLA failures. By contrast, Turkel (1996) showed that a GA did not succumb to any local maxima errors on this same domain. It is possible that the properties of the G&W 3-parameter domain are not characteristic of the far more extensive domain of natural languages (see chapters 14, 15, and 16). It can be hard to predict in general whether performance would be improved on a larger domain, due to richer input information, or would decline due to more parameter interactions. However, increased domain size did not assist the TLA when it was tested on a 12-parameter domain (Kohl 1999). It was found that only 57% of the 4,096 languages in that domain could be reliably attained without the TLA getting trapped in a local maximum, even given the most facilitative starting-state default parameter values.10

10 Also, Fodor and Sakas (2004) observed 88% non-convergence by the TLA on a 13-parameter domain of 3,072 languages unrelated to Kohl's domain.



Learnability   261 G&W explored various ways of avoiding these failures (e.g., giving up the Single Value Constraint and/​or Greediness constraints). Noting that the problematic target grammars in the eight-​language domain were all −V2 grammars, G&W contemplated designating −V2 as a default value, hence needing no triggers. The problem with this solution, for a nondeterministic (trial-​and-​error) system such as the TLA, is then to make sure that the default value is not prematurely given up by istake, which could spark further errors. (See Bertolo 1995a,b for discussion of possible remedies.) This model is not being actively pursued these days; however, a model akin to it but more resilient to local maxima has since been introduced by Yang.

11.5.3 Yang's Variational Learning Model

The new domain-search model developed by Yang (1999, 2002) has some of the strengths of both the GA and TLA models. Yang's Variational Learning model (VL) employs the more modest processing resources of the TLA, testing only one grammar on each input sentence, but like the GA it has a memory that registers a measure of past successes. In the case of the VL (unlike the GA), parameter values are assessed individually. As usual, parameters are assumed to be binary, with values 0 and 1. In the memory, a scale is associated with each parameter, and a pointer on the scale indicates the current weight of evidence for one value or the other. Whenever the 0 value of some parameter P is in a grammar that successfully parses an input sentence, P's pointer is moved a small amount in the direction of the 0 value (i.e., the 0 value is 'rewarded'); likewise, the 1 value of P is rewarded whenever it is part of a grammar that has just succeeded in parsing an input. When the 0 value or the 1 value of P is part of a grammar that has just failed to parse an input, that value is 'punished,' by moving the pointer a small way away from that value, towards the opposite value. If the pointer comes to within a very small margin of the 0-end or the 1-end of the scale, the parameter is deemed to have been set permanently to that value. When the VL encounters a new sentence, it selects a grammar to try parsing it with. The selection is probabilistic, based on the weighted scale-values of the various parameters in the domain. Grammars composed of previously successful parameter values are more likely to be selected for testing on a new input. But even low-valued grammars have some chance of being selected occasionally. Thus, the VL benefits from its past experience, as it accumulates evidence of which parameter values seem to suit the target language, but because it also samples a wide variety of other (less highly valued) grammars, it does not risk getting locked into an enticing but incorrect corner of the grammar space, as the TLA did. The occasional testing of low-valued grammars thus serves a similar purpose to the GA mutation operator in providing an escape route from local maxima. Yang has reported positive results in simulation experiments with the VL. In one study, Legate and Yang note that 'the learner converges in a reasonable amount of time. About 600,000 sentences were needed for converging on ten interacting parameters' (Legate and Yang 2002:51). A formal proof for that version of the VL shows that it is guaranteed to converge in any parametric domain except those containing subset–superset relationships (Straus 2008).



However, Straus also proves that in order to converge in some possible domains, the VL could consume a number of sentences exponential in the number of parameters. For any reasonable number of natural language parameters, learning would clearly be impracticable in this worst case. Yang (2012) presents a 'reward-only' variant of the model, which is shown to perform more efficiently than the original reward-and-punish version, especially in a language domain 'favorable to the learner,' i.e., with abundant unambiguous triggering data. The VL has psycholinguistic merits. Gradual learning is characteristic of the VL's performance in the simulation studies, and Yang points to the child language acquisition literature as showing that a child's transition to the target value of a parameter is gradual, with no sharp changes such as a classic 'switch-setting' model would imply. Also, the order of the VL's acquisition of different parameters is correlated with the frequency of evidence for them in children's linguistic environment. Yang (2012) notes that this too mirrors children's linguistic development: for seven familiar parameters (including, e.g., wh-fronting and verb raising) the time course of child acquisition is predicted by the amount of evidence available in child-directed speech. This commitment to matching the performance of an implemented computer model to the facts of child language development establishes an important goal for all future modeling endeavors. But the VL has some quirks as a psychological model. Like the TLA it selects a grammar hypothesis to test without first consulting the properties of the current input sentence, so it may try out parameter values that bear no relation to the needs of that sentence, e.g., testing the positive value of preposition stranding on a sentence with no preposition. Moreover, that totally irrelevant preposition-stranding parameter value will actually be rewarded if the grammar it is included in succeeds in parsing that preposition-less sentence. Also, the VL predicts significant failures of sentence comprehension by learners. It does not always parse input with its currently most highly valued grammar, because of its need to sample lower-valued grammars. A low-valued grammar would sometimes, perhaps often, fail to parse a target-language sentence. So if children behaved in the manner of the VL, there would be occasions on which a child would be unable to parse a sentence, and hence would fail to understand it—even if it were a sentence she had understood just moments before when she parsed it with her current 'best' grammar! Without hard data against which to test this aspect of the model it cannot be regarded as a fatal defect, but it does seem implausible that all normal children deliberately process language with grammars they believe to be incorrect, sacrificing comprehension in order to achieve safer learning.
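One VL update might be sketched as follows, treating each parameter's scale as a probability weight. The learning rate and the exact update rule are illustrative simplifications on our part; Yang's model states the scheme in terms of a linear reward mechanism of this general kind, and parses again stands in for the parsing mechanism.

import random

def vl_step(weights, sentence, parses, gamma=0.02):
    # weights[p] is the current probability of choosing value 1 for
    # parameter p; a grammar is sampled probabilistically, tested once,
    # and all of its parameter values are rewarded or punished together.
    grammar = [1 if random.random() < w else 0 for w in weights]
    success = parses(grammar, sentence)
    for p, v in enumerate(grammar):
        target = v if success else 1 - v             # reward the used value,
        weights[p] += gamma * (target - weights[p])  # or nudge to the other
    return weights

The 'free ride' quirk discussed above is visible in the sketch: every parameter value in the sampled grammar is rewarded or punished together, whether or not it did any work in the parse.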

11.5.4 Domain Search: Taking Stock

Three major approaches to modeling syntax acquisition, outlined here, follow Chomsky's lead in assuming that natural language grammars consist of innate principles and parameters, with only the parameter values needing to be established by the learner. But they all depart from Chomsky's original conception of acquisition as the triggering of individual parameters in a process that is psychologically plausible (resource-light), but is accurate and fast due to being strongly input-guided: on encountering a sentence of the target language, not yet licensed, the learning device would have direct knowledge of which parameters could or must be reset to accommodate it.



The GA is the closest to being input-guided but is not resource-light. The TLA and VL are resource-light but not input-guided. For reasonably sized natural language domains, none of them is fast, though the VL is accurate in the weak sense that it is guaranteed to converge on the target grammar eventually. Nevertheless, however much these models differ from classical triggering and from each other, they do all come to grips with the twin problems of parametric ambiguity and parametric interaction, as emphasized by Clark. They can survive parametric ambiguity because they are nondeterministic, so any temporary mis-settings do no permanent harm (as long as they do not lead to Subset Principle violations or local maxima). These models can also survive complex parametric interactions because they do not even attempt to establish the values of single parameters in isolation from other parameters in a grammar (see chapter 16 for abundant illustration of parameter interaction). In one way or another, they assess whole grammars rather than choosing between the two values of each of n parameters—thereby undermining the economical and widely heralded 'Twenty Questions' character of Chomsky's P&P proposal. These models have the advantage of drawing on known computational techniques, showing how general-purpose learning mechanisms can be adapted for language learning. As long as the UG principles and parameters are supplied as the knowledge base, there is no need to posit innate learning procedures specialized for human language. This is in keeping with the current emphasis on minimizing the complexity of the innate component of the language faculty (Hauser, Chomsky, and Fitch 2002), by shifting the burden to general cognitive mechanisms (see chapter 6). In particular, unlike the original switch-setting model, these learning models do not need to be innately equipped with knowledge of the triggers for the parameter values. This will be relevant to discussion in section 11.6. On the other hand, as we have noted, the scale of domain search over the full domain of natural languages is immense, and nondeterministic techniques multiply that problem by repeating some steps many times over. Linguists have offered different estimates of how large the domain of natural languages is. Early estimates that about 30 syntactic parameters (already yielding over a billion grammars if parameters are mutually independent) would suffice to codify all cross-language syntactic differences have given way in recent years to estimates many times larger. There may be several hundreds or thousands of microparameters (Kayne 2005a), which explode exponentially into trillions of trillions of candidate grammars. Promising proposals have been developed for systematizing and constraining the class of possible parameters (Gianollo, Guardiano, and Longobardi 2008; Roberts and Holmberg 2010; Roberts 2012, 2014) and for ordering them hierarchically so that some can be disregarded until others have been set (Villavicencio 2000; Baker 2008b; Biberauer, Holmberg, Roberts, and Sheehan 2010).
As a simple example, a parameter for partial wh-movement is irrelevant until a higher-level parameter licensing wh-movement of any kind has been set to its positive value.
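The effect of such a hierarchy on the learner's search space can be sketched as a simple dependency check. The dependency table and parameter names below are invented for illustration; they are not drawn from the proposals just cited.

# Hypothetical dependency table: a parameter is consulted only once its
# parent parameter has been set to the enabling value.
HIERARCHY = {
    "partial_wh_movement": ("wh_movement", 1),  # relevant only if wh-movement = 1
}

def active_parameters(settings, all_params):
    # Return the parameters the learner should currently attend to,
    # disregarding any whose parent is unset or set to the
    # non-enabling value.
    active = []
    for p in all_params:
        parent = HIERARCHY.get(p)
        if parent is None or settings.get(parent[0]) == parent[1]:
            active.append(p)
    return active

On this organization, the number of grammars in play at any moment can be far smaller than 2^n, since dormant parameters contribute no choice points until their parents are set.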



If such proposals succeed in radically limiting the set of parameters that must be consulted at any stage in the acquisition process, then there may be a future for search-based learning models. But it is unclear at present whether the problem of scale can be tamed sufficiently to permit a search model to converge without exceeding either the cognitive resources of child learners or the time frame of their progress.

11.6  Implementing Parameter Setting: Input Guidance

Despite the preponderance of trial-and-error domain-search learning models proposed during the 1990s, theoretical linguists were not greatly shaken in their allegiance to the original approach to parameter setting as envisaged in the 1980s, such that an input sentence not yet licensed by the learner's grammar could reveal to the learning mechanism, without resort to extensive trial and error, which parameter settings could license it. Direct guidance by the input as to which parameter(s) it would be profitable to reset still seemed to hold the greatest promise of rapid and accurate learning. Learning theorists' doubts about its feasibility did not deter syntacticians from continuing to propose unambiguous triggers for various parameters over the years (notably Hyams 1983, 1992 for the Null Subject parameter; Baker 1996 for the polysynthesis parameter; Pollock 1997 for the verb raising parameter). This optimistic perspective in theoretical syntax circles might have been expected to spur the development in computational modeling circles of input-guided parameter-setting systems in which Chomsky's switch-setting metaphor would be concretely implemented. In fact, it was phonology that led the way. Dresher and Kaye (1990) pioneered an input-guided learning system for a collection of phonological parameters responsible for stress assignment, such as bounded vs. unbounded foot size, and extrametrical syllables permitted at the right or left (Dresher and Kaye 1990; Dresher 1999; henceforth DK). DK were well aware of the challenges involved. They noted that this approach requires the learning model to have knowledge of the triggers (which they referred to as 'cues') for all the parameters, and that these cues must be fully reliable, not infected by the ambiguity and interaction problems discussed in section 11.5. This is made more difficult by the fact that the linguistically relevant triggers are typically abstract properties not directly observable in the raw input a learner receives. In terminology introduced by Chomsky (1986b), the perceptible input is E-language, but the learner needs to extract abstract I-language facts from it.11

11 The terms I-language and E-language were coined by Chomsky (1986b:19–22). I-language is 'internalized language' which, following Jespersen, is 'some element in the mind of the person who knows the language.' This contrasts with E-language, which is 'externalized language,' i.e., following Bloomfield, 'the totality of utterances that can be made in a speech community.' See further discussion that follows, and chapters 2 and 3.



DK were able to solve problems raised by interactions among the metrical parameters they studied by imposing an ordering on the parameters, such that a reliable cue for one parameter could be discerned only once some other parameter(s) had been set correctly. The importance of an orderly sequence of learning events is reflected in the title of Dresher's 1999 paper 'Charting the learning path.' The model also makes extensive use of defaults, both within parameters and between parameters, to establish priorities for the analysis of ambiguous inputs (e.g., in the absence of an unambiguous cue, assume the Quantity-insensitive value of the Quantity Sensitivity parameter). Some thorny problems remained, such as the unreliability of parameter values established on the basis of the default value of some other parameter, in case that default value was later overthrown by new input. One solution, adopted by DK in their 1990 implementation YOUPIE, was to assume non-incremental ('batch') learning, i.e., collecting up all relevant data before setting any parameters (contra psychological plausibility). An alternative, embraced by Dresher (1999), was to set back to their default values all parameters that had been set on the basis of the default value now overthrown. Another matter demanding reflection was how a learner could be constrained to adhere strictly to the learning path, without which there would be no way to avoid entanglement in troublesome parametric interactions. Assuming that infants could not discover the correct path unaided, it was presumed that it must be dictated by UG, along with the universal principles and parameter values and their cues. This was not incompatible with early views of the richness and specificity of innate linguistic knowledge, but it clashes with current attempts to minimize the extent to which biological specialization for language must be assumed (Hauser, Chomsky, and Fitch 2002). For syntax, one might set about developing an input-guided model of parameter setting in a similar fashion, by inspecting the parameters and triggers proposed by syntacticians on linguistic grounds and attempting to tidy away any damaging ambiguities or interactions between them. Following DK, this might be achieved by refining the triggers, by imposing default values, and/or by ordering the parameters. Any such devices discovered to be necessary to ensure convergence on the learner's target language might then be posited as part of the learner's innate endowment or, where possible, be attributed to what are now called 'third-factor' influences, not specific to language, such as least-effort tendencies or principles of efficient computation (Chomsky 2005 and chapter 6). We have made a start on this project, working with a modest collection of 13 familiar GB-style syntactic parameters in the context of a 'structural-triggers' learning model (Fodor 1998a; Sakas and Fodor 2012).12 In this model the parameter values are taken to be UG-specified I-language entities in the form of 'treelets.' A treelet is a substructure (a linked collection of syntactic nodes, typically underspecified in some respects) of sentential trees. A simple example would be a PP node immediately dominating a preposition and a nominal trace, signifying that a language can strand prepositions (as in English Who did you play with today?, as opposed to pied-piping in standard French Avec qui avez-vous joué aujourd'hui?).

12 Variants of this model can be found in Fodor (1998c) and in Fodor and Sakas (2004). Sakas (2000) and Sakas and Fodor (2001) present a more computational exposition of some of the issues.



The motivation for parametric treelets is that they can be used by learners to rescue a parse of a novel input sentence which has failed for lack of a needed parameter value; and in doing so they can reveal what new parameter value is needed. It is assumed on this approach that a child's primary aim is to understand what others are saying. So the child tries to parse every sentence, using whatever her currently 'best' grammar is. If that succeeds, the child is doing just what an adult would do. But if it fails—i.e., if at some point the parse tree can't be completed on the basis of the current grammar—the learning mechanism then consults the store of parametric treelets that UG makes available, seeking one that can bridge the gap in the parse tree. If some treelet proves itself useful, it is adopted into the learner's grammar for future use in sentence processing (or, if learning is gradual, the activation level of that treelet is slightly increased, making it incrementally more accessible for future sentence processing). For example, imagine a learner of English who is familiar with wh-question formation but has not yet acquired preposition stranding (Sugisaki and Snyder 2006), and who hears the sentence Who did you play with today?. The child would process this just as an adult does, up to the incoming word today. At that point the child's current grammar offers no means of continuing; it contains no treelet that will fit between with and today, and so today cannot be attached into the tree. The parser/learner will then search for a treelet in the wider pool of candidates made available by UG, to identify one which will fill that gap in the parse tree. Since with heads a prepositional phrase (PP), the relevant UG treelets to be considered are those dominated by a PP node; among them, a PP with a null object will allow the parser to move forward to the word today. In this way the specific properties of input sentences provide a word-by-word guide to the adoption of relevant parametric treelets, in a narrowly channeled search process (find a treelet containing a preposition not followed in the word string by a potential object). A child's sentence processing thus differs from an adult's just in the need to draw on more treelets from UG, because the child's current grammar is as yet incomplete. In other respects the child is relating to language just as adults do. No additional language acquisition device (LAD) needs to be posited. The parsing mechanism exists in any case, and is widely believed to be innate, accessible to infants as soon as they are able to recognize word combinations to apply it to. And the tricky task of inferring I-language structure from E-language word strings is exactly what the human sentence parsing mechanism is designed to do, and does every day in both children and adults; the transition from learner to fully competent adult speaker is thus seamless. In comparison with the domain search models discussed above, this structural-triggers learning-by-parsing approach has some efficiency advantages. It finds a grammar that can parse a novel input, unlike trial-and-error learners which first pick a grammar on some other basis and then find out whether or not it succeeds.
So it avoids wasting effort trying out arbitrarily chosen parameter values, or testing grammars that fail, and it avoids the loss of comprehension that such failures would cause. Learning by parsing thus predicts faster convergence (confirmed in simulation studies, e.g., Fodor and Sakas 2004) and fewer errors of 'commission' (Snyder 2011) en route to the target.



It also avoids the distracting free rides in which a parameter value is strengthened regardless of whether or not it contributed to a successful parse. Like the domain search models, it is compatible with the eliminative ambitions of recent linguistic theorizing, since it posits no more mechanism than is inherent in the ability to produce and understand language at all. It is also a potential source of third-factor influences, since the parser exhibits some economy tendencies in familiar parsing strategies such as Minimal Attachment and the Minimal Chain Principle, as well as being open to frequency influences. The major cost of this approach is that the parametric treelets must be innately specified. How burdensome that is, and how plausible it is from an evolutionary point of view, remains to be determined. Note, however, that the treelets do double duty as (a) definitions of the parameter values themselves and (b) I-triggers for detecting those parameter values in the input.13 Every parametric theory must do at least the first, i.e., say what parameters it assumes. E-triggers do not need to be innately specified on this approach since they follow from the interplay of the I-triggers with general UG structural principles in the course of parsing; a parameter value at the I-level is acquired just in case it solves an online problem that the learner/parser has encountered at the E-level. Another uncertainty at present is how the local treelets that would be efficient for parsing relate to the derivational representations of recent linguistic theory, such as the bottom-up derivations resulting from Merge (but see Chesi 2014).
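In outline, and simplifying considerably, the learning-by-parsing loop can be sketched as follows. parse_with, the treelet objects, and the category-matching step are our illustrative stand-ins for the parsing mechanism and the UG-supplied pool; they are not the implemented model.

def parse_and_learn(sentence, grammar_treelets, ug_treelets, parse_with):
    # parse_with returns (success, needed_category): the category of the
    # node that could not be completed, e.g. PP in the preposition-
    # stranding example above.
    success, needed = parse_with(sentence, grammar_treelets)
    if success:
        return grammar_treelets                    # child parses like an adult
    # Narrowly channeled search: consider only UG treelets of the
    # category needed to bridge the gap in the parse tree.
    for treelet in (t for t in ug_treelets if t.category == needed):
        success, _ = parse_with(sentence, grammar_treelets | {treelet})
        if success:
            return grammar_treelets | {treelet}    # adopt (or strengthen) it
    return grammar_treelets

The point of the sketch is that the input sentence itself, via the stalled parse, identifies which parameter value is needed; no grammar is tested that bears no relation to the current sentence.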

11.7  Learnability Theory and Linguistic Theories

The trajectory of this survey reflects the course of thinking over the past 50 years, primarily within the tradition of transformational generative grammar, about the nature of human language and how its breadth and intricacies could be acquired by children so accurately and rapidly. Modern linguistic and learnability theory dawned with the riveting discovery that natural languages have a formal structure that can be studied with mathematical rigor. The emphasis today has shifted to a conception of the language faculty as a biological organ, functioning in concert with other perceptual and conceptual systems. The learning-by-parsing approach to parameter setting discussed above fits well with this latter stance, so it is worth observing that learning by parsing is compatible with a wide swath of current linguistic theories, even if they diverge greatly with respect to how they characterize the ways in which natural language grammars can differ. The structural elements that vary from one language to another, whether they are called parameters or not, are what the learner must choose between.

13 Lightfoot (1991) also emphasizes the I-language status of syntactic triggers.



Theoretical linguistic frameworks can thus be broadly characterized from a learnability perspective by examining the tools they offer to a parser/learner. One obvious difference between linguistic theories is the size of their 'treelets'—the structural building blocks of sentential tree structures that are assumed to differentiate natural languages. The Minimalist Program (MP; Chomsky 1993, 1995b, 2001a, and since) represents the lean extreme on the scale of treelet size. The active elements that drive the course of a syntactic derivation are individual formal features (e.g., case features, agreement features) on functional heads. The 'probe–goal' apparatus which is the driving force of MP derivations establishes Agree relations between pairs of syntactic elements (a probe and a goal) in a tree structure, under which they can supply values for each other's unvalued features. In order for this to occur, a goal constituent must in some cases, but not all, move to the neighborhood of the probe, thereby creating cross-language word order differences. Whether or not movement is triggered is determined by whether or not the probe carries a strong feature (or EPP feature or edge feature). Significant details are omitted here, but the import of this aspect of the Minimalist Program for syntax acquisition is discernible. In place of the free-standing parameters of the Government–Binding theory tradition, cross-language syntactic variation in MP is controlled by the featural composition of certain specific lexical items, some of which may be phonologically null. What is needed for acquisition is then a parser which can apply such a grammar online, and which can extend the learner's inventory of formal features over time, as needed to build structure for the sentence patterns encountered in the input. By contrast, the structural units of cross-language variation in other theoretical frameworks can be quite large. In Tree Adjoining Grammars (TAGs), for example, the basic ingredients of syntactic structures are specified by the grammar as 'elementary trees,' which may span a complete clause. UG defines the set of possible elementary trees, and the adjunction and substitution operations by which they can be combined (Frank 2004). Construction Grammar, in its several variants,14 also admits syntactic building blocks with clausal scope. In the Cognitive Construction Grammar of Goldberg (2006), languages differ only with respect to which constructions they admit, where a construction is an abstractly characterized pattern that pairs aspects of form with a distinctive semantic or discourse role. Familiar examples are the passive construction, the subject–auxiliary inversion construction, and the ditransitive construction, which are integrated in a sentence such as Will Mary be given a book? Other aspects of linguistic theories which may have consequences for learnability include how numerous their (counterparts of) structural 'treelets' are, how diverse they are, how surface-recognizable they are, and what, if any, general principles constrain them. At the present stage of research it is unclear which grammar characteristics will prove to be the most important for learnability, but they may well include the following: Does the grammar format afford an effective psychological parser? (This is uncertain but under study at present for MP.)

14 Goldberg (2005:ch. 10) outlines similarities and differences among several theories which embrace constructions as a (or the) major descriptive device for characterizing natural languages.



How deep or shallow is the relation between E-language and I-language structures? (Construction Grammar might anticipate an advantage for shallower I-structures.) Do subset and superset grammars stand in some systematic formal relation which could be exploited by learners to avoid Subset Principle violations? (Sadly, no linguistic theory has this property as far as is known now.) Not all current linguistic theories have been the target of simulation studies or formal theorems to test their potential for accurate and efficient acquisition in the way of the parametric models discussed in this article. But there is clearly worthwhile learnability research to be done on all theoretical approaches. Psycholinguists are ever hopeful of finding a performance measure which differentiates between linguistic theories and can proclaim one to be more explanatory than the others. It is possible, and to be hoped, that learnability as a litmus of the psychological reality of formal linguistic theories will one day play a more significant role than it has to date.



Chapter 12

First Language Acquisition

Maria Teresa Guasti

12.1 Introduction

Language is a distinctive feature that humans acquire naturally, without specific instruction, simply by being exposed to it and by interacting with other human beings in the first years of life.1 This second point is important, because language cannot be acquired simply by looking at TV programs (Kuhl, Tsao, and Liu 2003) or paying attention to examples. Human beings cannot help but use language to express their thoughts and communicate, and if they don't have a model, but can interact with other human beings, they invent a rudimentary system of signs (home signs) that allows them to communicate, as in the cases of deaf children born to hearing parents studied by Goldin-Meadow and collaborators (see Goldin-Meadow 2003). Interestingly, these home signs, regardless of the specific surrounding culture (Chinese or American), display some typical properties found in existing natural languages (e.g., they mark arguments following an ergative pattern, whereby the subject of transitive verbs is treated differently from both the direct object of transitive verbs and the subject of intransitive verbs). But it could not be otherwise: the shape of human languages is forged by our biology, which imposes severe restrictions on the range of variability. This is so not only for language. In a very insightful book on reading and writing, Dehaene (2007) shows that the various writing systems in the world share a lot of common features (e.g., almost all written characters involve by and large three strokes, rarely more, as seen in the characters F and N, which are formed by exactly three strokes; all writing systems include a small repertoire of basic shapes that they combine; the shapes of characters are similar to shapes found in the environment). He goes on to show that these similarities are dictated by our genetic endowment: during evolution, characters were chosen whose shapes best fit the constraints imposed by our brain.

1 I would like to thank Chiara Branchini, Francesca Foppolo, Marina Nespor, and Mirta Vernice for comments on previous versions of this article.



Thus, there is no doubt that language is innate or has a genetic basis, but there is disagreement on what exactly is innate. For some scholars, it is a system of knowledge about the shape of possible languages that is part of our genetic endowment; for example, children 'know' that sentences have a hierarchical structure, that syntactic relations obey c-command, or that the rules of language are structure-dependent, etc. This system is called Universal Grammar (Chomsky 1981a; Crain and Thornton 1998) and guides children in the acquisition of language (see chapters 5, 10, and 11). For others, what is innate are mechanisms for acquiring language, which may not be language-specific, such as statistical mechanisms or mechanisms for reading the intentions of other human beings (Tomasello 2003). It is also possible that these mechanisms are language-specific or that they are designed to operate on linguistic objects in particular ways. Languages are acquired in similar manners and following a similar timetable across cultures: infants babble around six months, speak their first words at around 12 months, and combine words at around 24 months. Yet, just as there are variations among the various languages, there are variations among the ways specific aspects of languages are acquired. In this chapter, we will go through the process of acquisition starting from birth, and we will examine the acquisition of various pieces of linguistic knowledge, beginning with phonology and ending with pragmatics. We will point out similarities and differences and provide some explanation for these differences. We will also suggest which mechanisms underlie the acquisition of various aspects of language.

12.2  Knowledge of the Phonological System

Children already start to crack the linguistic code in the womb, where they hear their mother's voice and prosodic features of their language, as demonstrated by their preference, at birth, for listening to stories read by their mothers during the last 10 weeks of pregnancy (De Casper and Spence 1986). Two to five days after birth, neonates' left hemisphere (LH) is already organized to process speech stimuli. The LH, but not the right one, is activated when infants listen to normal utterances, but not when they listen to the same utterances played in the reversed direction, as established by Peña et al. (2003), using optical topography, a noninvasive imaging technique that estimates changes in cerebral blood volume and oxygen saturation. This means that human beings are born with brain organization tuned to speech stimuli and particularly responsive to utterances produced in the environment. Speech stimuli, and not other sounds, are characterized by some properties that make the LH resonate. At birth, infants can discriminate their mother language from a foreign language, and they are also able to discriminate two foreign languages from each other. Thus, this ability does not depend on familiarity with one language, but on some distinctive feature that sets languages apart and to which our biological system is particularly sensitive.



It turns out that this property is the rhythm of a language, which, roughly speaking, is given by the space occupied by vowels interrupted by the bursts created by consonants (Mehler et al. 1996; Ramus, Nespor, and Mehler 1999), and this depends on the syllabic structure of a given language. Thus, strictly speaking, babies don't discriminate between two individual languages, but between families of languages that have different rhythms: they discriminate syllable-timed languages (French, Italian, Spanish), that is, languages in which the rhythm is given by the regular recurrence of syllables, from stress-timed languages (Russian, English, Dutch), where the rhythm is given by the regular recurrence of stress. To see what we mean, consider the English sentence in (1) and the Italian one in (2), from Nespor, Peña, and Mehler (2003), and their representation in terms of consonants (C) and vowels (V). It is apparent that there are more consonants in English than in Italian, and that between one vowel and the next the number of consonants is more variable in English than in Italian. These differences give rise to the different rhythms that we perceive when we listen to these sentences.

(1) English: The next local elections will take place during the spring
    CVCVCCCCVVCVCVCVCCVCCCVCCVVCCCVVCCVCVCCVCCCVC
    ðǝnεkstlǝukǝlǝlεkʃǝnzwɪlteikpleisʤurɪŋðǝsprɪŋ

(2) Italian: Le prossime elezioni locali avranno luogo in primavera
    CVCCVCCVCVVCVCCCVCVCVCVCVVCCVCCVCCVCVVCCCVCVCVCV
    leprossimeεlεtsjo:niloka:liavrannolwɔgoinprimavε:ra

We can conjecture that, upon hearing stimuli like (1) and (2), infants build some abstract representation of the kind exemplified in terms of consonants and vowels and that on this basis they discriminate between distinct languages. In a very ingenious experiment, Nazzi, Bertoncini, and Mehler (1998) showed that newborns discriminated a mixture of Spanish and French, both syllable-timed languages, from a mixture of English and Dutch, both stress-timed languages, but did not discriminate a mixture of Spanish and English from a mixture of French and Dutch, as the two languages in each pair do not have the same rhythmic structure. In other words, infants can extract a common rhythmic structure when they hear English and Dutch and build a rhythmic representation that will turn out to be different from the representation underlying French and Spanish. As the rhythms of Spanish and English are different, there is no common representation that can be extracted, and nothing to support a discrimination behavior. Thus, newborns are attentive to the prosody of the language and figure out from this some aspects of its phonological organization. In doing so, newborns display an ability to extract an abstract representation from the speech stream, regardless of the specific input heard (be it familiar or not). This ability is remarkable, especially because infants can build an abstract representation of the speech stimuli after just 1,200 ms of an utterance (Dehaene-Lambertz and Houston 1998).



From at least two months, infants are sensitive not only to prosodic properties, but also to segmental properties. It has been shown that they can discriminate a large number of phonetic contrasts, even those that are not present in the ambient language. However, in the following months, this ability is shaped by the linguistic environment, and at six months, babies cease to be sensitive to the vowels of other languages and become tuned to the specific vowels present in their language (Kuhl 1991). Similarly, by 12 months of age, infants are no longer able to discriminate non-native consonantal contrasts, although they were at six months of age. This is demonstrated by behavioral studies (Werker and Tees 1986), but also by imaging studies, in which an event-related potential (the Mismatch Negativity) is elicited in the left hemisphere by changes in repetitive native and non-native contrasts at six months, but only in native contrasts at 12 months (Cheour-Luhtanen et al. 1995). This tuning to the sounds of the ambient language prepares infants to acquire words, a process that is supported by another ability, i.e., that of segmenting the continuous speech stream into chunks corresponding to words. Typically, when we speak, we do not pause between one word and the next; what we hear is a continuous uninterrupted sequence of words. Starting from seven months, babies display an ability to break this sequence into word candidates, based on various kinds of information (typical shapes of words, phonotactic regularities), one of which is transitional probabilities (TPs), namely statistical information about the co-occurrence of two adjacent elements (e.g., sounds or syllables). In speech, the probability that a syllable α is followed by a syllable β is higher if α and β belong to the same word than if they belong to two distinct words. Using an artificial language generated by a speech synthesizer and spoken in a monotone, Goodsitt, Morgan, and Kuhl (1993) and then Saffran, Aslin, and Newport (1996) established that seven-month-old babies keep track of and can use TPs to find word candidates (a hypothesis already alluded to by Chomsky 1975:ch. 6, n. 18). These experiments confirmed that infants are endowed with a powerful and non-language-specific mechanism to compute statistical dependencies among adjacent events (here syllables), a finding that engendered a lively discussion between the nativist and the non-nativist camps. On the one hand, the claim that the whole of language can be learned through statistical means was unwarranted, because it relied on an unsupported inference (if something can be learned in a certain way, everything can); on the other, the finding promoted exciting new research. Peña, Bonatti, Nespor, and Mehler (2002) showed that adults can segment the continuous speech stream using the TPs between non-adjacent consonants, but not between non-adjacent vowels. If they hear (3), they notice that, in that string, the TP between P and R is 1, while the TPs between the other consonants are lower, and thus only Pura, Pori, and Piro are likely to be words. To make the TPs easier to see, we put the consonants in capitals.

(3) PuRaDuKaMaLiPoRiKaDuLoMiPiRoDuLe



By contrast, if they hear (4), they are unable to figure out that the TP between U and A is 1, while the TPs between the other vowels are lower. Consequently, they cannot find words based on TPs between vowels.

(4) pUrAdUkOmArIbUtAlEpAtUgA

These findings suggest that to find words we rely on TPs between consonants and not between vowels, i.e., the statistical mechanism is sensitive to the nature of the linguistic representation and computes TPs on an abstract representation based on the consonantal tier. This result makes a lot of sense and does justice to some properties of languages. For example, in Semitic languages, consonants, but not vowels, make up morphological roots. Across languages, we observe that there are more consonants than vowels. In other words, consonants, but not vowels, are dedicated to the representation of lexical items (Nespor, Peña, and Mehler 2003). Thus, during the first year of life infants acquire the prosodic properties of their language and its segmental properties, and start to break the linguistic input into words. This tuning toward the input language is evident not only on the receptive side, but also in production. At around six months, babies engage in babbling, linguistic production lacking a meaning and consisting of sequences of the same syllable or of different syllables, such as mamama or mapamada. Soon after the start of the babbling phase, the vowels and consonants used in babbling are influenced by the surrounding environment (Boysson-Bardies, Hallé, Sagart, and Durand 1989; Boysson-Bardies and Vihman 1991). In summary, infants are born with the ability to set languages apart based on rhythm; they are initially able to discriminate between non-native and native sounds, an ability that is shaped by the linguistic environment through a process that can be characterized in terms of selection and that leads them at 12 months to discriminate only those sounds that have a phonemic value in their language (but see Kuhl 2000 for a different view). Infants are also endowed with a mechanism that tracks TPs, likely taking into account the linguistic objects over which these TPs are computed, i.e., consonants and vowels.
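The rhythm classes invoked at the beginning of this section have also been given quantitative correlates: Ramus, Nespor, and Mehler (1999) showed that syllable-timed languages tend to have a higher proportion of vocalic material (%V) and less variable consonantal intervals (ΔC) than stress-timed ones. The sketch below (Python; our illustration, using segment counts in place of the interval durations over which the published measures are actually defined) computes crude analogues of these two measures for the CV strings in (1) and (2).

```python
from statistics import pstdev

def rhythm_profile(cv):
    # Proportion of V slots (a count-based stand-in for %V) and the spread
    # of consonant-cluster lengths (a count-based stand-in for deltaC).
    clusters = [len(run) for run in cv.replace("V", " ").split()]
    return cv.count("V") / len(cv), pstdev(clusters)

english = "CVCVCCCCVVCVCVCVCCVCCCVCCVVCCCVVCCVCVCCVCCCVC"     # CV string of (1)
italian = "CVCCVCCVCVVCVCCCVCVCVCVCVVCCVCCVCCVCVVCCCVCVCVCV"  # CV string of (2)
print(rhythm_profile(english))  # lower vowel proportion, more variable clusters
print(rhythm_profile(italian))  # higher vowel proportion, steadier clusters
```

On such measures English patterns with the stress-timed family and Italian with the syllable-timed family, which is the distinction newborns appear to exploit.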

12.3  Knowledge of the Morphosyntactic System

When children start to put words together, they do it by respecting the word order of their mother language. Thus, English-speaking children use the SVO order and their Turkish peers SOV (see Christophe, Nespor, Guasti, and van Oyen 2003 for a hypothesis about how children acquire this property). Order is one of the first properties that a child notices and acquires. But words often include morphemes that express tense and agreement, and these also start to be acquired within the second year of life. Agreement morphemes on verbs are acquired in a relatively short time in languages like Catalan,



Italian, or Spanish (Hyams 1986; Guasti 1992/1993; Torrens 1995). By the age of two and a half, children speaking these languages have acquired the three singular morphemes of the present tense (1st, 2nd, and 3rd person) and soon after the plural ones. In other languages, like Dutch, French, German, and Swedish, children take up to three years to master the verbal agreement paradigm consistently, and up to this age they optionally use root or optional infinitives (RIs) (Rizzi 1993/1994; Wexler 1994). These are infinitive verbs that are used in main-clause declaratives. They behave syntactically like infinitives, as indicated by their relative position with respect to negation or their failure to co-occur with clitic subjects in French. However, they stand for finite verbs whose temporal interpretation is recovered from the context (see Avrutin 1999). During this period of life (2–3 years), children freely produce sentences like (5) or (6).

(5) Elle roule pas
    it-FEM rolls not
    'It does not roll.'

(6) Pas manger la poupée
    not eat-INF the-FEM doll
    'The doll does not eat.'

Notice, in these sentences, the different placement of the negation depending on the finite or infinitive status of the verb. In English, RIs are bare verbs without inflection (e.g., John drink). RIs are not found in the acquisition of Catalan, Italian, or Spanish (see Hyams 1986; Guasti 1992/1993; Torrens 1995). Why? Proposals trying to explain this difference capitalize on the null-subject nature of the languages that do not have RIs (Wexler 1999), on the failure of the infinitive verb, in those languages that have RIs, to move to higher hierarchical positions (Rizzi 1993/1994), or on the consistency of the evidence for finite marking (Legate and Yang 2007). For Rizzi (1993/1994), the fact that infinitives do not move and stay in the VP allows the truncation of the syntactic tree at the VP level or just above it. In languages in which infinitives raise to IP, as in Italian (Belletti 1990), truncation at the VP level is not possible. Be that as it may, one relevant property that somehow underlies these hypotheses is that languages lacking RIs are languages in which the verbal agreement paradigm does not include syncretisms, or is very regular and has a three-person morphological distinction in both the singular and the plural paradigm. By contrast, languages that manifest the RI phenomenon contain many syncretisms and are quite irregular (in the English present tense, the 1st and 2nd person singular and the 1st, 2nd, and 3rd person plural do not have any inflection; some auxiliary verbs have more distinctions, others even fewer; in German, the 1st and 3rd person plural are homophonous, as are the 2nd person singular and plural). Linguistic studies have clearly shown that there is a link between the richness of the agreement paradigm and the null-subject status of a language (Taraldsen 1978). Given that children are quick to acquire the verbal agreement paradigm, as stated earlier, we may suppose that they quickly establish whether their language is a null-subject language or not; accordingly they will or will not



use RIs (see Hyams 1986, Valian 1990, and Rizzi 1993/1994 for the properties of early null subjects and an account of them). The role of the richness of the agreement system is central in Legate and Yang's (2007) variational model of RIs. According to these authors, children who produce RIs are exercising an option made available by UG, which they identify with the production of [−Tense] verbal forms. This option is operative in Mandarin, where verbs are not inflected for tense. Since children who produce RIs also produce finite forms, they are also entertaining the hypothesis [+Tense]. This means that both options are employed by the child, and the incorrect one is gradually eliminated based on the input. The more the data available to children provide evidence for tense morphemes, the more the [+Tense] option increases in likelihood and the more the [−Tense] option decreases in likelihood (see chapter 11, section 11.6, on input-driven learning models). This approach makes it clear that a central point of this discussion, which is also at the heart of the RI phenomenon, is the faster acquisition of the agreement paradigm in languages with a richer paradigm than in languages with a poorer paradigm or with many syncretisms. Why does the course of language acquisition vary as a function of this property? This observation suggests that the system for acquiring language is endowed with a mechanism for seeking regularities; one regularity is the unique association of form and meaning. When every cell of a paradigm is filled with a distinct morpheme, acquisition proceeds smoothly, as each morpheme has a distinct meaning. Instead, when there is syncretism, the work of this mechanism is hindered, as the same morpheme is to be found in more than one cell and has different meanings: the form–meaning association is not unique in this case, and the system takes time to sort out the irregularities.
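The competition between the two options can be made concrete with a toy simulation. The sketch below (Python; our illustration, not Legate and Yang's own model, and with the deliberately crude assumption that a sentence either transparently shows tense, rewarding [+Tense], or looks bare, rewarding [−Tense]) implements a simple linear reward-penalty scheme of the kind their variational learner builds on.

```python
import random

def variational_learner(inputs, gamma=0.02, p=0.5):
    # p is the probability of the [+Tense] option; 1 - p that of [-Tense].
    # Each input is True if the sentence overtly manifests tense morphology.
    for shows_tense in inputs:
        chose_plus = random.random() < p         # probabilistically pick an option
        succeeded = (chose_plus == shows_tense)  # crude success criterion
        # Reward [+Tense] when it succeeds or when [-Tense] fails; else penalize it.
        if (chose_plus and succeeded) or (not chose_plus and not succeeded):
            p = p + gamma * (1 - p)
        else:
            p = p * (1 - gamma)
    return p

# Richly inflected input (Italian-like): 90% of clauses wear tense overtly.
print(variational_learner([random.random() < 0.9 for _ in range(5000)]))  # settles near 0.9
# Syncretism-heavy input (English-like): far fewer clauses transparently show tense.
print(variational_learner([random.random() < 0.4 for _ in range(5000)]))  # [-Tense] stays live
```

In a rich paradigm, [+Tense] quickly comes to dominate; where syncretism makes much of the input uninformative, the [−Tense] option retains substantial probability for longer, which is the variational rendering of the protracted RI stage.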



Between two and three years, children also start to use pronouns. Particular attention has been devoted to the acquisition of clitic pronouns. These are elements that are in some way deficient (Cardinaletti and Starke 1999): they cannot be used in isolation, for instance as answers to questions, and they must appear in specific positions, attaching to the verb (Kayne 1975). They must be used when the antecedent is highly accessible. Typically, clitic pronouns (or simply clitics) have been studied in the acquisition of Romance languages. An example from French is in (7); (7a) introduces a context that makes the use of the clitic in (7b) pragmatically plausible.

(7) a. Qu'a fait-elle la poule au chat?
       what has done-she the chicken to+the cat
       'What did the chicken do to the cat?'
    b. Elle l'a chassé.
       she him has chased
       'She has chased him.'

In (7b) we have a subject clitic (elle) and an object clitic (l'). Here we will only discuss direct object clitics. Investigations in French, Spanish, Catalan, and Italian (Schaeffer 1997; Jakubowicz and Rigaut 2000; Wexler, Gavarró, and Torrens 2004; Pérez-Leroux, Pirvulescu, and Roberge 2008) have demonstrated that initially clitics are optionally omitted and children produce sentences without a direct object, as in (8).

(8) (elle) — a chassé
    (she) — has chased
    'She has chased it' (intended)

Gradually clitics come to be used, and target-like levels are reached by the age of 4–5 years. Different proposals have been advanced to explain this omission; here we will mention only some of them. Schaeffer (2000) proposes that children omit clitics because their pragmatic knowledge is faulty; in particular, they do not take into account that the hearer's background may be different from theirs, and that what is highly accessible for them may not be so for their interlocutor. Avrutin (2004) claims that omission depends on a lack of processing resources and that the missing information is recovered from the context. Omission of clitics is not a monolithic phenomenon. In European Portuguese (Costa and Lobo 2007), even at the age of five years, children omit quite a number of direct objects. On the basis of what is known from other languages, it is unlikely that these omissions are instances of clitic omission. In fact, European Portuguese, beyond the option of expressing an accessible element through a pronoun, has the possibility of expressing it through a null object, which alternates with the pronoun, as in (9):

(9) Talking about Pedro:
    Encontrei(-o) ontem.
    I-met(-CL) yesterday
    'I met (him) yesterday.'

Costa, Lobo, Carmona, and Silva (2008) have proposed that at the age of five years Portuguese-speaking children do not omit clitics, on a par with children speaking other languages, but rather generalize null objects, even to syntactic contexts in which this is not possible, such as islands. Thus, while (11) is ungrammatical for adults, this is not so for children. In addition, Portuguese children not only produce sentences like (11), but they also attribute a transitive interpretation to the verb, which means that they are assuming that a null object is present in the representation.

(10) A menina está feliz, porque a mãe está a penteá-la.
     the girl is happy because the mother is combing her

(11) *A menina está feliz, porque a mãe está a pentear Ø.
     the girl is happy because the mother is combing Ø

As we have just seen, omission of clitics does not follow the same trajectory in all early languages. The mirror image of the Portuguese case is represented by Spanish-​ and Romanian-​speaking children, who omit clitics for a shorter period of time than



Catalan-, French-, or Italian-speaking children (Wexler, Gavarró, and Torrens 2004; Babyonyshev and Marin 2005). While the latter reach target levels of accuracy at 4–5 years, the former do so at 3 years. In the light of these findings, Wexler, Gavarró, and Torrens (2004) proposed an interesting generalization:

(12) Clitics are omitted for a longer period in languages that display past participle agreement with the preverbal clitic.

In a nutshell and very approximately, Wexler et al.'s account of clitic omission is based on a uniqueness constraint holding in child language, to the effect that a single operation can be performed: either insert the clitic or compute past participle agreement. With respect to this uniqueness constraint, the Catalan-, French-, or Italian-speaking children are obliged to omit the clitic, while this is not so for children speaking Spanish or Romanian. It follows from this account that, in principle, children could also choose to just compute past-participle agreement and omit the clitic. This prediction is not fulfilled, however (Tedeschi 2009; Belletti and Guasti 2015). Beyond clitics, articles are also omitted for some time. Generally, the period of article omission ends before that of clitic omission. At least in some languages, at the age of three and a half, articles are no longer omitted, even though some articles are homophonous with some clitics. As in the case of clitics, omission of articles is found in the various languages investigated, but the path of development is not the same. Guasti, Gavarró, de Lange, and Caprin (2008) demonstrated that articles or their phonetic approximations are already employed at a higher rate from age two in Catalan and Italian than in Dutch. By the time omission of articles has almost stopped in the former two languages, it is still high in the latter (see De Lange 2008). This difference among languages is probably due to some confounding factor, as in the case of Portuguese clitics. There, it was the option of null objects that influenced children's behavior. For articles, it is the availability in a language of bare nouns in all syntactic positions. Dutch, like English and unlike Catalan and Italian, allows the use of bare nouns in generic contexts, as in (13), and in contexts in which a mass noun is used, as seen in (14).

(13) Honden blaffen meestal.
     dogs bark usually
     'Usually dogs bark.'

(14) Ik wil melk.
     I want milk

The equivalent of (13) in Italian, given in (15), requires an article in front of the noun. Bare nouns are ungrammatical in the preverbal position; in the postverbal position, mass nouns and bare plural nouns can be used without an article, but not all speakers accept them to the same degree. Thus, I find (16a) to be a preferred option over (16b).



(15) I cani abbaiano, generalmente.
     the dogs bark usually
     'Usually dogs bark.'

(16) a. Voglio del latte.
        want.1SG some milk
        'I want milk.'
     b. Voglio latte.
        want.1SG milk
        'I want milk.'

Although bare nouns are possible in Italian or Catalan, their use is highly restricted syntactically (they occur only in complement position) and, possibly, even in those cases that are legitimate they are not highly preferred. It can be argued that Italian or Catalan children quickly come to the conclusion that articles are always compulsory, as they hear an input in which articles are generally required in all syntactic positions. Thus, after an initial stage in which they optionally omit them, whether for phonological reasons or for reasons of complexity, they converge on the target system. Children speaking Dutch also know that articles are used in their language, but, in addition, they know that bare nouns are a legitimate option. The availability of both options confuses them and leads them to err for a longer period of time than Catalan and Italian children. In conclusion, we can notice a parallelism between the acquisition of clitics and that of articles: convergence to the target system is delayed by the existence of other options.

12.4  Knowledge of the Syntactic System

In this section, we will concentrate on one topic that has been widely investigated across various languages: the acquisition of wh-movement as it is expressed in questions. Since in the 1960s scholars concentrated mainly on English, the first finding was that children had no trouble in moving the wh-element to the sentence-initial position, as in (17), but that their acquisition of Subject–Auxiliary Inversion (SAI), whereby the auxiliary verb is fronted, also as in (17), proceeded in steps.

(17) Whati isk the boy tk reading ti?

Bellugi (1971) proposed the following stages for the acquisition of SAI, which were investigated in several later works (see Stromswold 1990 for a detailed discussion).



(18) Failure to perform SAI
     SAI is first applied in positive yes/no questions only
     SAI is also applied in positive wh-questions
     SAI is applied in both positive and negative questions

Although not all studies found evidence for these stages, especially for such a clear-cut division, it is clear that acquisition of SAI is not straightforward for English-speaking children (see Guasti, Thornton, and Wexler 1995 for evidence concerning the third stage; see Hiramatsu and Lillo-Martin 1998 for challenges). Difficulties with SAI are also manifested by the omission of the auxiliary, resulting in questions like (19) (see Guasti and Rizzi 1996):

(19) What the boy said/say?

To better appreciate what might go wrong in English, it is worth turning our attention to other languages. A major finding of more recent investigation is that wh-movement is performed from early on in those languages in which it is required (Dutch, German, Italian, Swedish). Wh-in-situ is also employed in those languages in which it is a possibility, such as French (Hamann 2006), as in (20).

(20) Tu veux quoi?
     you want what?

However, in contrast to what has been found for English, fronting of the verb is not problematic in languages that require it, such as German, Dutch, and Swedish (Haegeman 1995; Clahsen, Kursawe, and Penke 1996; Santelmann 1998), as exemplified by the German example in (21).

(21) Wohin fährst du mit dem Zug?
     where go you with the train?
     'Where are you going on the train?'

Then the question arises as to how this difference comes about. We may notice that those languages in which fronting of the verb is not problematic are languages in which movement of the verb to second position (V2) is generalized to all main clauses (questions, but also declaratives), and is generalized to all verbs, auxiliaries and lexical verbs alike, unlike in English, where only auxiliaries are fronted. Whether both factors matter or just one does can only be decided by looking at languages in which only one factor holds. Italian is a case in point. Consider a question of the type that children and adults typically form:



(22) A chi darà un libro Gianni?
     to whom will-give.3SG a book Gianni-SUBJ?
     'To whom will Gianni give a book?'

In Italian wh-questions, the subject cannot intervene between the wh-element and the verb, a fact that has been interpreted by saying that the verb moves to a position adjacent to the wh-element in C (Rizzi 1996). This is so for both auxiliaries and lexical verbs. In (22), the subject is located in a right-peripheral position. Children have no trouble forming questions with the order in (22) (Guasti 1996) or producing declarative clauses with the order SVO, which is the unmarked order in Italian. Thus, although V2 is not generalized in Italian, movement of the verb to C is not problematic. This suggests that the trouble for English children is the presence of an idiosyncratic class of auxiliaries that displays a peculiar behavior: that of being the only verbs in the language that move to C. This finding leads us to reiterate an observation made earlier for articles and clitics: when an option is regular and generalized to (almost) all contexts, acquisition goes relatively smoothly; otherwise it proceeds slowly. Again, this suggests that one mechanism for acquiring language is a regularity seeker. So far, we have been concerned with the formation of wh-questions in general. Now we turn our attention to a concern less discussed in the past, but of growing interest nowadays: the asymmetry observed in the acquisition of subject and object questions. Although subject and object who-questions appear at the same time, at least in English (Stromswold 1995), object who-questions, exemplified in (23), are more challenging than subject questions for children speaking a variety of languages. First, the former are more frequently produced than the latter in English (Stromswold 1995):

(23) Who does the woman carry?

Second, elicited production of subject questions is initially more accurate than that of object questions in English. O'Grady (2005), citing Yoshinaga (1996), shows that subject who-questions are 100% accurate by age two, while accuracy is 8% for object who-questions. Improvements are seen at age four, with an almost equal rate of accuracy in the production of both types of questions (see also van der Lely and Battell 2003). All questions elicited contained reversible verbs with two animate elements, as in (24):

(24) a. Who washes the elephant?
     b. Who does the elephant wash?

An explanation of this difficulty is in terms of processing: the wh-element and its trace are closer in subject than in object questions, as seen in (25), and thus the wh-element



needs to be maintained in memory for a longer time in the latter than in the former case, taxing our memory system (De Vincenzi 1991):

(25) a. Whoi ti washes the elephant?
     b. Whoi does the elephant wash ti?

This is not the end of the story, though. A different picture emerges if we look at Italian: object questions are still more problematic than subject questions even at the age of 4–5 years in production (Guasti, Branchini, and Arosio 2012), and in comprehension they are difficult up to age 9–11 (De Vincenzi, Arduino, Ciccarelli, and Job 1999). Italian learners tend to produce a subject rather than an object question, or they mistakenly interpret an object question as a subject question. What goes wrong with Italian-speaking children? Consider subject and object questions in Italian:

(26) Chi bagna gli elefanti?
     who wash-3SG the elephants?
     'Who washes the elephants?'

(27) Chi bagnano gli elefanti?
     who wash-3PL the elephants?
     'Who do the elephants wash?'

If we compare Italian in (26) and (27) with English in (24), we may notice that, in Italian, a subject and an object question have the same word order. When both NPs are animate, as in the examples, a common way to disambiguate is through subject–verb agreement (which is possible only if the two NPs differ in number). In English, it is the different order of NPs in subject and object questions that differentiates the two. Returning to the difficulty encountered by Italian-speaking children, we can say that, although Italian children know the process of subject–verb agreement, they are misled by this process when it occurs in questions and/or with a postverbal subject; sometimes they take the postverbal NP as being the object rather than the subject. In this way, the wh-element is taken to be the subject, and an object question is turned into a subject question. Things are different in English object questions, where the occurrence of the subject in a preverbal position unambiguously indicates that an object question is being uttered or must be comprehended (see Guasti et al. 2012 for a technical implementation of this idea).
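The agreement-based disambiguation at issue can be stated as a toy decision procedure (Python; a deliberately simplified illustration, not the technical implementation of Guasti et al. 2012):

```python
def interpret_chi_question(verb_number, np_number):
    # Toy parse of Italian 'chi V NP?' strings, cf. (26)-(27).
    # chi is singular, so agreement reveals which argument the verb agrees with.
    if verb_number == "sg" and np_number == "sg":
        return "ambiguous: agreement cannot decide"
    if verb_number == "sg":
        return "subject question"   # verb agrees with chi, as in (26)
    if verb_number == np_number:
        return "object question"    # verb agrees with the postverbal NP, as in (27)
    return "ungrammatical string"

print(interpret_chi_question("sg", "pl"))  # (26) Chi bagna gli elefanti?
print(interpret_chi_question("pl", "pl"))  # (27) Chi bagnano gli elefanti?
```

Children's comprehension errors amount to skipping the agreement check in the third branch and defaulting to the subject-question parse, which is the behavior reported above.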

12.5  Knowledge of the Semantic System

One of the central properties of the semantic system is the interpretation of scopally ambiguous sentences like (28a):

(28) a. Every horse didn't jump over the fence



This sentence contains a universal quantifier (every) and negation, and can be interpreted as in (28b) or (28c):

(28) b. Every horse is such that it didn't jump over the fence
        (surface scope: every > not = 'none')
     c. It is not the case that every horse is such that it jumped over the fence
        (inverse scope: not > every = 'not all')

On the first interpretation (surface scope), no horse jumps over the fence; on the second (inverse scope), some of the horses could have jumped over the fence. Musolino, Crain, and Thornton (2000) demonstrated that five-year-old English-speaking children prefer the first interpretation over the second. Lidz and Musolino (2002) confirmed the same finding with a different kind of quantifier, numerals. Children can also interpret every under the scope of negation, but only when the universal quantifier is in object position and follows negation, as in (29):

(29) The Smurf didn't buy every orange.

Here children have no trouble in assigning narrow scope to every and wide scope to negation. On the basis of these findings, it has been claimed that children assign to scopally ambiguous sentences an isomorphic interpretation, whereby negation and quantifiers are interpreted on the basis of their position in overt syntax. By this it is meant that the element that receives wider scope is also the element that c-commands, and is hierarchically higher than, the element with narrow scope. However, in English the isomorphic interpretation can also be obtained by relying on linear order rather than on syntactic structure. This objection has been addressed by Lidz and Musolino (2002) based on a study of the interpretation of quantified negative sentences in Kannada. Kannada is an SOV language of the Dravidian family. It differs from English in that negation comes at the end of the sentence and the quantified object linearly precedes it, as in (30):

(30) Naanu eraDu pustaka ood-al-illa.
     I-nom two book read-inf-neg
     'I didn't read two books.'

However, in the hierarchical structure, negation c-commands the direct object, as displayed by the representation in (31):

(31) [IP Naanu [NegP [VP eraDu pustaka] ood-al-illa]]
         I-nom        two book   read-inf-neg

In other words, unlike in English, in Kannada linear order and hierarchical relations are not confounded. Therefore, one predicts that, if what matters is linear order, Kannada-speaking children's preferred interpretation would be one in which two books has wider



scope than negation, i.e., the sentence should preferentially mean that there are two books that I did not read. By contrast, if what matters is c-command and the hierarchical organization underlying sentences, we expect that children will interpret the sentence as meaning that it is not the case that I read two books. Lidz and Musolino (2002) found that Kannada-speaking children prefer the second interpretation over the first, as do English-speaking children with the corresponding sentence in their language. These facts support the conclusion that English- and Kannada-speaking children compute scope relations on the basis of c-command and thus represent sentences not as mere strings of adjacent words, but as hierarchical objects. This result, which linguists take for granted, is quite remarkable, because there is no evident cue to the hierarchical organization in the string of words that one hears; yet human beings rely on such representations when they compute meaning or produce sentences (other findings supporting hierarchical organization are discussed in Crain and Thornton 1998; Guasti and Chierchia 1999/2000; among others). The data from Kannada also confirm that children display a strong preference for the isomorphic reading. In contrast, adults can readily access both readings, in (28b) and (28c) (Lidz and Musolino 2002). Why do children differ from adults? There is evidence showing that we are not faced with a grammatical difference. Children can access the inverse scope reading (not > every in (28c)) in supportive contexts. For example, Musolino and Lidz (2006) found that familiarizing children with the intended domain of quantification enhances access to the inverse scope reading. In the story narrated to the child, all the horses first jumped over the log; then they tried to jump over the fence, but only some of them succeeded. The puppet described the story with sentence (32).

(32) Every horse jumped over the log but every horse didn't jump over the fence.

In these conditions, children accessed the not > every reading much more than in a situation described with (28a), in which the horses were only involved in an event of jumping over the fence. Thus, the children's grammar allows both interpretations; yet the inverse one is more difficult. There is consensus on the idea that the difficulty is caused by pragmatic factors. Divergences exist as to the exact nature of the factors involved and as to whether pragmatics alone is responsible. According to Gualmini et al. (2008), the preference for the isomorphic reading originates from a pragmatic requirement that a given interpretation be selected as an answer to a question under discussion (QUD). For example, the context used to describe (32) makes the question in (33) clearly relevant, and the answer is no in a situation in which only some horses jumped over the fence.

(33) Did every horse also jump over the fence?

By correctly answering no, children access the inverse scope interpretation, because in the context this interpretation meets the expectations raised by the first part of the sentence. These expectations are not met in those situations, typically used in the first experiments by Musolino et al., in which children did not access the inverse



scope reading. In these situations children had to infer the QUD from contextual cues, and this may be problematic. Another, more articulated view is presented in Viau, Lidz, and Musolino (2010). These authors agree on the idea that pragmatic factors play a causal role in favoring the isomorphic interpretation, but they also assert that processing considerations related to ambiguity resolution are involved in explaining children's behavior. In fact, they showed that the inverse scope reading can be primed by a previously heard unambiguous sentence. The experiment goes as follows. In the priming condition, children were first exposed to three different events, in which only two of three characters succeeded in some task. These events were then described with unambiguous sentences like the one in (34) (priming trials):

(34) Not every horse jumped over the fence.

The last three trials, instead, consisted of structurally similar events, but these were described with ambiguous sentences like the one in (35):

(35) Every horse didn't jump over the fence.

In a control condition, instead, children were only exposed to trials of the last type. Children accessed the inverse scope reading in the priming condition much more than in the control condition. This result suggests that the inverse scope reading can be primed through previously heard unambiguous sentences. As priming is a typical effect one observes in situations of ambiguity resolution, one has to conclude that processing factors can facilitate or hamper the emergence of the inverse scope reading. In conclusion, children can access the inverse scope reading when the pragmatic setup is taken care of and when this reading is primed. On this view, the difference between children and adults would consist in adults possessing a more efficient parsing system and a more experienced pragmatic system.
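For concreteness, the two readings of (28a) that children and adults negotiate throughout this section can be stated in standard first-order notation (our rendering; the chapter itself does not use logical formulas):

```latex
% Surface scope (every > not): no horse jumped over the fence
\forall x \, [\mathit{horse}(x) \rightarrow \lnot \mathit{jump}(x)]

% Inverse scope (not > every): not all of the horses jumped over the fence
\lnot \forall x \, [\mathit{horse}(x) \rightarrow \mathit{jump}(x)]
```

The first formula entails the second but not vice versa, which is why a context in which only some of the horses fail to jump verifies only the inverse scope reading.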

12.6  Knowledge of the Pragmatic System

We will investigate children's acquisition of pragmatics through the case of scalar implicatures. A scalar implicature (SI), in the framework of a neo-Gricean approach, is an inference that is added to the uttered proposition, based on principles of rational conversation. Consider an example involving or. If a speaker utters (36a), her interlocutor would normally infer (36b):

(36) a. Angela invited Sue or Lyn to the party.
     b. Angela invited Sue or Lyn to the party, not both.



The added expression not both in (36b) is an implicature; that is, it is not part of the propositional content of the speaker's utterance, but an inference that the hearer draws from the speaker's use of or. When speaking, we choose certain items that make our contribution as informative as is required, so as to conform to the Cooperative Principle and the other conversational maxims that rule our talk exchanges (Grice 1989). Elements like or and some are part of scales ordered according to informational strength: ⟨or, and⟩, where or is the less informative element on the scale, and ⟨some, most, all⟩, where some is less informative than the other two items. Choosing the weaker (less informative) element in the scale means that one does not have sufficient evidence to use the stronger (more informative) element, or that one knows that it does not apply; otherwise one would have used it. Thus, by hearing A or B, the hearer will infer that NOT (A and B), and by hearing some, the hearer will infer NOT ALL. Although children know the meaning of or and some in felicitous contexts, they do not behave like adults, as they accept the sentence A or B when both A and B are true (Braine and Rumain 1981), or they accept that Some elephants are eating is true in a context in which all are (Smith 1980). However, the experimental setting matters (see also Katsos and Bishop 2011). Guasti et al. (2005) tested seven-year-old children in two different experimental settings. One employed a statement evaluation task in which children were presented with sentences like Some elephants have trunks and were asked to say whether they agreed or not (see Noveck 2001). The second adopted a Truth Value Judgment Task (TVJT) (Crain and Thornton 1998), in which the experimenter acted out a story in front of the child, at the end of which a puppet had to describe what had happened in the story. The child was then asked to evaluate the puppet's statement, saying whether it was a good or a bad description of the story. For example, the puppet described a story in which five out of five Smurfs went for a trip by boat by saying that Some Smurfs went on a boat, which, in the situation, is a true but underinformative statement. While seven-year-olds were worse than adults in the statement evaluation task, this difference disappeared with the TVJT. Thus, the kind of task employed may elicit different types of responses, and this suggests that children's pragmatic ability in deriving implicatures is clearly present, but is not always put to use. Although the experimental setting matters, this cannot be the only factor that explains differences between children and adults, since, using the TVJT, Papafragou and Musolino (2003) found that five-year-old Greek-speaking children rejected underinformative statements much less frequently than adults. This result has now been replicated in a number of languages (see Hurevitz et al. 2006; Katsos and Bishop 2011 for English; Noveck 2001 for French; Papafragou and Musolino 2003 for Greek; Guasti et al. 2005 for Italian). In addition, for Italian, Foppolo, Guasti, and Chierchia (2012) have shown that with the TVJT five-year-olds overaccept underinformative sentences more than seven-year-olds do. Taken together, these results suggest that there is a developmental change in the derivation of pragmatic inferences. It can be excluded that children do not know the meaning of these words, as they reject Some elephants are eating when none is, or they accept it when only a subset is.
It can also be excluded that children do not know which of two statements is more informative. When asked to say which of two characters described better a picture in which all elephants are eating, they do not hesitate in choosing the



them to put words together in infinite meaningful ways for the purpose of specific conversations. Children build abstract rhythmic representations of their languages; they organize the sequences of sounds into words; and, what is remarkable, they arrange these words into abstract hierarchical representations (ones that respect c-command), in spite of the fact that no evident cue for hierarchy is present in the speech stream. All this shows that children start to build various abstract representations of their language, depending on the linguistic component (phonology, syntax), and that they can do so in a very short time. However, language acquisition does not happen all at once. Children differ from adults for a while: they omit certain elements (articles, clitics, agreement morphemes) for reasons that may have to do with phonology, processing, or pragmatics. They have a preference for a certain interpretation, but they may eventually get other interpretations when the context is supportive; that is, they do not seem to lack knowledge, but to use it less effectively. The course of language acquisition presents similarities, but also differences that result from the specific properties of the language: children omit more articles in Dutch than in Italian because other options in the former language get in the way of the acquisition process. Similarly for clitics in Portuguese. According to a Universal Grammar perspective (Guasti forthcoming), the course of acquisition takes the shape it does because there exist specific mechanisms and constraints supporting this process (a claim that does not exclude the existence of non-language-specific mechanisms). Although children are endowed with a structured capacity for language, the environment plays its role too in forging our adult linguistic knowledge, as is evident from the specific course that acquisition takes in the various languages.



Chapter 13

The Role of Universal Grammar in Nonnative Language Acquisition

Bonnie D. Schwartz and Rex A. Sprouse

There is a striking gap between the primary linguistic data (PLD) available to language-​ acquiring children and the rich systems of linguistic knowledge such children acquire (without systematic instruction and negative evidence). Furthermore, despite the haphazard and heterogeneous presentation of the PLD, children exposed to a given target language (TL) develop strikingly similar grammars of that language well before early puberty. The standard assumption in mainstream generative grammar has long been that children develop uniform grammars vastly underdetermined by their PLD because native language (L1) acquisition is guided and constrained by a set of innate domain-​ specific cognitive structures, generally referred to as Universal Grammar (UG); see in particular ­chapters 5 and 10.1 In light of this, the study of L1 acquisition has much to offer to our understanding of the nature of UG, almost by definition (see c­ hapter 12). However, it is an empirical question whether UG plays a similar role in (adult) nonnative language (L2) acquisition. On the surface of it, adult L2 acquisition might appear to be very different from L1 acquisition. Based almost exclusively on this intuition and a rather brief passage in Lenneberg’s (1967) treatise on the biological basis of human language, early research in generative grammar assumed that language acquisition was subject to a ‘critical period’ (the offset of which originally hypothesized as coincidental with the onset of puberty, but now often (much) earlier; see, e.g., Abrahamsson and Hyltenstam 2009), and that 1 

Given the scope of this chapter, we will not attempt to provide further justification for this assumption. A mountain of evidence exists in support of this assumption, and attempts at refuting it generally ignore most of the relevant linguistic and developmental evidence available. For recent discussion, see Schwartz and Sprouse (2013); the contributions in Piattelli-​Palmarini and Berwick (2013); and references cited therein.



290    Bonnie D. Schwartz and Rex A. Sprouse any attempts at language acquisition past the critical period would be (to some extent) frustrated by the absence of UG. A community of researchers interested in rigorous testing of specific hypotheses about the role of UG in L2 acquisition only began to form in the early 1980s. Two comments about that pre-​1980s period of L2 research seem appropriate here. The first is that thinking about the ‘critical period’ often conflated (and even today conflates) two very different learning circumstances: (1) acquisition of an L1 and (2) acquisition of additional languages. It would of course be a criminal violation of ethics to deprive a child of all PLD for the purpose of studying L1 acquisition with a post-​puberty onset. Curtiss’s (1977) classic study of Genie, a child who was isolated from direct human contact until the age of 13 by psychopathic parents and was, after her rescue, unable to acquire native-​like English, is suggestive, but Genie was also subject to many forms of nonlinguistic deprivation, and it is thus impossible to assert with confidence the precise effect of late exposure to PLD on her post-​rescue linguistic development. More recent studies of the late acquisition of signed language by young deaf children are perhaps more probative, again suggesting that native-​like outcomes require very early onset of exposure to PLD (e.g., Mayberry 1993). It may well be the case that if the cognitive structures and/​or learning mechanisms devoted to language in which UG is embedded are not activated by PLD during a window in early childhood, then the brain somehow discards them. However, it is logically premature to extrapolate from these results to the acquisition of new languages by adults who have acquired an L1 in the usual way in childhood. Our primary point here is that it does not logically follow from the establishment of a critical period for L1 acquisition that the same ‘critical period’ is relevant for (adult) L2 acquisition. A second comment regarding pre-​1980s thinking about L2 acquisition and UG concerns the school of thought known as Creative Construction (see Krashen 1981 and references cited therein.) Creative Construction viewed adult L2 acquisition as essentially cognitively identical with child L1 acquisition, and it spawned pedagogical approaches that centered on providing L2 learners (L2ers) with ‘comprehensible input,’ eschewing the earlier behaviorist-​based use of drills and explicit correction in order to cultivate TL-​appropriate habit formation. However, Krashen and his fellow Creative Constructionists were not particularly attuned to the specifics of UG, merely to its existence as a set of cognitive mechanisms that enable language acquisition. In the final analysis, there was extremely little scholarly interaction between the (mostly) pedagogically oriented Creative Construction movement and the theoretically oriented linguists seeking to discover the nature of UG. In generative approaches to L2 acquisition, one of the central research questions is what the nature of ‘L2 grammars’ is, more specifically, what the role of UG is in L2 acquisition. In a highly influential article, Bley-​Vroman (1990:6–​13) lists ten observational differences between child L1 acquisition and adult L2 acquisition. As Bley-​Vroman notes, not only are target-​like outcomes not guaranteed in adult L2 acquisition, they appear to be extremely rare, if not impossible. 
Adult L2 acquisition seems to exhibit much more variation in success and developmental course than L1 acquisition does, and L2ers display a variation in goals with no counterpart in L1 children. Studies of, for



example, global proficiency suggest that the younger one begins to acquire a nonnative language (at least when immersed in a TL environment), the more proficient one is likely to become. L2 acquisition often levels off asymptotically long before convergence on the TL grammar ('fossilization'), a phenomenon typically unknown in L1 acquisition.2 Bley-Vroman claims that unlike native speakers, 'even very advanced non-native speakers seem to lack clear grammaticality judgements' (p. 10). Bley-Vroman also notes the effect of instruction, negative evidence, and affective factors on outcomes in L2 acquisition, again phenomena playing no role in the L1 acquisition of the core elements of the grammatical system. From these observations arises Bley-Vroman's Fundamental Difference Hypothesis, the proposal that adult L2 acquisition is not guided directly by UG, but rather by L1 knowledge and general problem-solving systems. Since L1 knowledge must instantiate much of the content of UG, L2 systems resemble natural languages in many ways; nevertheless, in terms of their ultimate ontology, developing and end-state adult L2 systems are 'fundamentally different' from developing and end-state L1 systems. If L2 acquisition is not guided and constrained by UG, we might then expect to find 'rogue grammars,' L2 systems that are not merely TL-deviant, but that also violate well-motivated principles of UG. Indeed, some L2 research in the 1980s and 1990s claimed the existence of a few such rogue grammars, but none of these claims has gone unchallenged. Consider three examples. Clahsen and Muysken (1986) claimed that at several developmental stages, L1-Romance L2ers of German (Romance–German L2ers) posit grammars that violate principles of UG, such as Emonds' (1976) Structure Preservation principle. However, duPlessis, Solin, Travis, and White (1987) and Schwartz and Tomaselli (1990) offer alternative analyses of these data, showing that each of the stages of Romance–German L2 development is amenable to independently motivated UG-compatible accounts. Schachter (1989) and Johnson and Newport (1991) claimed that native speakers of languages without overt wh-movement acquiring English posit grammars with movement operations violating Subjacency. However, Martohardjono (1993) and White and Juffs (1998) report data showing clear sensitivity to constraints on wh-movement in the English systems of precisely such L2ers. Klein (1993) claimed that L2ers of English often posit grammars that illicitly allow null prepositions in interrogatives (e.g., *What did John put the book Ø?), a phenomenon Klein maintained is unattested in any natural language grammar. However, Dekydtspotter, Sprouse, and Anderson (1998) point out that the relevant 'null prep' structures may arise through a UG-compatible process of preposition incorporation—and indeed that the phenomenon Klein claimed to be ruled out by UG is in fact attested in Yoruba. Recent years have seen few if any claims in the literature to the effect that L2ers develop rogue grammars. There are indications that some developing L2 systems/grammars ('Interlanguages') make use of options that UG makes available, but which are deployed neither in the

2  This may not be entirely correct, if language change is in fact at least in part initiated in children; see chapters 17 and 18.



L1 nor in the TL. One such example is Schwartz and Sprouse's (1994) L2 study of verb placement in German, a verb-second (V2) language, using a corpus of naturalistic production data collected over two years from Cevdet, a Turkish-speaking teenager living in Germany. At the beginning of the nonnative German corpus, Cevdet, an essentially untutored L2er, produces no main clauses with postverbal subjects, and at the end of the corpus, Cevdet produces main clauses with both pronominal and nonpronominal subjects in postverbal position. However, for a period of approximately one year, Cevdet robustly produces main clauses with postverbal subjects, but only when the subject is pronominal; when the subject is nonpronominal, it always precedes the verb (resulting in both licit SV … and illicit XSV … orders). Thus, Cevdet passed through developmental stages: Stage 1 with no postverbal subjects; Stage 2 with only pronominal postverbal subjects; Stage 3 with both pronominal and nonpronominal postverbal subjects. Curiously, the distinction in Stage 2 is suggested neither by Turkish, the L1, nor by German, the TL: Turkish exhibits neither verb fronting nor clitic pronouns; as a V2 language, German allows verb fronting in matrix clauses to a position before the subject, regardless of the subject's pronominal/nonpronominal status. Yet, Cevdet's asymmetry at Stage 2 is identical to the pattern found in French interrogatives, where postverbal pronominal subjects are licensed, but not postverbal nonpronominal subjects. Of course, both pronominal and nonpronominal postverbal subjects would have been robustly attested in the German PLD to which Cevdet was exposed. A straightforward interpretation is that during his Stage 2, Cevdet had settled on one of the options for licensing subjects made available by UG (and independently instantiated in French). Grammatical development is not instantaneous, and it took additional exposure to German input before Cevdet moved on to Stage 3, no doubt in response to many utterances in the PLD to which Cevdet's Stage 2 grammar could not assign a licit representation. If there is no convincing evidence that (adult) L2ers develop rogue grammars and there is evidence that at least some of their non-TL-convergent Interlanguages seem to draw on specific UG-provided options used in neither the L1 nor the TL, one might ask whether there is evidence that Interlanguages regularly exhibit further hallmarks of UG. Following Schwartz and Sprouse (2013), we assume that the most compelling arguments for the role of UG in language acquisition are instances of extreme poverty of the stimulus (what Sprouse 2006 refers to as instances of the 'bankruptcy of the stimulus'). Consider a two-by-two paradigm where the available PLD provide exemplars of three cells, as in (1):

(1)
                               Condition or Property 1A   Condition or Property 1B
    Condition or Property 2A   attested in the input      attested in the input
    Condition or Property 2B   attested in the input



Abstracting away from the child's ability to ignore occasional noise in the PLD, the child would be led to treat this input as evidence for the grammaticality of these three cells in (2):

(2)
                               Condition or Property 1A   Condition or Property 1B
    Condition or Property 2A   OK                         OK
    Condition or Property 2B   OK

We know that natural analogical extension must operate in a myriad of such cases, because every natural language grammar generates an infinite number of sentences (⟨form, meaning⟩ pairings), despite the fact that any given child is exposed to only a finite and haphazard set of exemplars in the PLD. Natural analogical extension would therefore lead to filling in the lower right cell with 'OK' as in (3):

(3)
                               Condition or Property 1A   Condition or Property 1B
    Condition or Property 2A   OK                         OK
    Condition or Property 2B   OK                         OK

In a huge number of cases, such analogical extension is fully consistent with the generalizations underlying the TL grammar and makes it possible for a child to acquire that grammar without the need to hear exemplars of every imaginable combination of conditions or properties. However, a case of the bankruptcy of the stimulus obtains precisely where this does not occur, precisely where the child correctly (that is, ultimately in conformity with the intuitions of adult native speakers) ends up somehow knowing that the lower right cell is in fact ungrammatical, as in (4):

(4)
                               Condition or Property 1A   Condition or Property 1B
    Condition or Property 2A   OK                         OK
    Condition or Property 2B   OK                         *

Since there is nothing in the PLD to inform the child of this ungrammaticality, and since general problem-solving would lead the child to conclude the opposite, we are left with UG



as the only plausible explanation.3,4 Because the generative syntax and semantics literature is replete with precisely such cases of the bankruptcy of the stimulus in natural language grammars, we can safely assume that UG includes many highly domain-specific properties.5 Turning to nonnative language acquisition, it is an empirical question whether UG is in fact operative in adult L2 acquisition. In addressing this question, many researchers pursuing generative approaches to L2 acquisition have adopted a 'gold standard' for the linguistic phenomena that would be most probative, outlined in (5):

(5) The relevant property must
    a. be a genuine poverty (bankruptcy) of the stimulus problem,
    b. not be directly replicable in or derivable from the learners' L1, and
    c. not be the object of instruction.

We should add here that the research practice and culture of L2 acquisition research within the generative paradigm generally shares much with that of L1 acquisition research and departs from certain standard practices in general linguistics. In particular, L2 acquisition research relies on either naturalistic production data or data from carefully designed tasks. Although the tasks may include acceptability/plausibility judgments, one virtually never finds L2 studies based on the authors' own Interlanguage intuitions or on interviews where researchers directly ask consultants about their L2 intuitions. In fact, every effort is made to design and administer tasks such that they do not reveal to the participants exactly which linguistic phenomena are under investigation. These tasks are often implemented via a computer program that allows only a relatively brief time for the participants

3  Intensive research on child L1 acquisition has shown that children simply do not entertain hypotheses that would (incorrectly) generate strings (⟨form, meaning⟩ pairings) exemplifying the cell in the lower right of the relevant paradigms (note that this does not mean that L1 children never posit grammars that diverge from those of their input providers; see chapter 18). And since children do not produce such examples themselves, they never receive corrective feedback (negative data) from older speakers. Furthermore, native speakers often have no metalinguistic awareness of these illicit gaps in the paradigms; hence, explicit instruction cannot be the source of this knowledge (see chapters 5, 11, and 12 for more on negative evidence in L1 acquisition).

4  It is important here to distinguish between the existence of UG and particular hypotheses about the structure and content of UG. This distinction is very reminiscent of that between the fact that current terrestrial life is the result of biological evolution and competing models or specific hypotheses about precisely how evolution unfolds.

5  This general and well-rehearsed line of argumentation holds only when the conditions/properties at issue are exclusively linguistic in nature, i.e., they cannot be reduced to more general conditions/principles of cognition. Indeed, one might object that the spirit of Chomsky's (1995) Minimalist Program is to eliminate the stipulation of domain-specific properties from our models of UG. However, the essence of the Minimalist Program lies in reducing the properties of the computational system to the 'optimal solution' for the problem of enabling interfaces between cognitive and motor systems with their own highly domain-specific properties. Thus, even if the computations of the 'syntax proper' are reducible to principles of elementary concatenation and simplicity, the larger system (including surface form and interpretation) still yields paradigms that cannot be readily accounted for in terms of general cognition. In short, domain specificity is still a defining characteristic of language. See also chapter 6.



The Role of Universal Grammar    295 to register their responses and does not allow them to revisit an item once the subsequent item has been presented. Even when no time limit is enforced, instructions generally encourage participants to give their initial ‘feelings’ about the items and to refrain from backtracking. Some L2 researchers prefer to use purely ‘psycholinguistic’ tasks that rely on, for instance, the comparison of reaction times for identifying as identical two grammatical strings presented on a computer screen vs. two ungrammatical strings (with trials presenting nonmatching strings as distractors). Here the assumption is that it will take less time to recognize the identity of two strings generated by a given participant’s grammar than the identity of two strings not generated by the same participant’s grammar. The traditional justification for practices such as these is not that L2ers are in principle incapable of linguistic introspection, but rather that such introspection is likely to be fraught with linguistic insecurity and (in the case of instructed learners) a search for a pedagogical rule to inform their introspection. While such considerations may confound research on native grammars as well, they have long been judged much more acute in the investigation of nonnative grammars. A consequence of these practices is that the data in most generative L2 studies are quantitative and the interpretation of these data relies on statistically significant differences between sets of minimally paired sentences or interpretations. Generally, the tasks are also administered to a group of adult native speakers of the TL (hence predating the current push in theoretical linguistics to ‘experimental’ syntax and semantics). The primary purpose of this is to validate the instrument as capable of capturing the possible vs. impossible (or plausible vs. implausible) asymmetry at the heart of the study. In most cases, the response rates of neither native adults nor L2ers are 100% for the grammatical items and 0% for the ungrammatical items. This response pattern can seem puzzling to linguists focused on the formal properties of UG and accustomed to the presentation of more idealized judgments. Indeed, we believe that it is instructive that the response patterns of native adults on many of these tasks are not bifurcated into ceiling vs. floor responses. This is the nature of performance on a task. Both the introspective methodology of traditional generative linguistics and the task-​based quantitative methodology of (L1 and) L2 acquisition research have the potential to reveal much about the subtleties of mentally represented linguistic knowledge. In both instances, what is relevant is the empirical demonstration of possible vs. impossible (or plausible vs. implausible) asymmetries that emerge in carefully designed comparisons of certain minimal pairs, but, crucially, not of other minimal pairs. Here we summarize two studies that fulfill the criteria listed under (5) and offer evidence that adult L2 acquisition is guided and constrained by (at least some of) the principles of UG implicated in L1 acquisition.6 Kanno’s (1997) basic research question is whether the Overt Pronoun Constraint (OPC) is operative in—​i.e., limits the hypothesis space in—​adult L2 acquisition. Based on Montalbetti (1984), Hong (1985), Saito and Hoji (1985), and Zamparelli (1996), the OPC may be stated as in (6). 6 

6 We stress that these are only two representative studies. For a much broader overview of research on the role of UG in L2 acquisition through early 2003, we refer the reader to White (2003).



(6) Overt Pronoun Constraint7
    In languages that permit null arguments, an overt pronoun cannot be interpreted as a bound variable.

The OPC provides a straightforward account of the following asymmetry in the interpretive possibilities of null (7a–b) vs. overt (7c–d) pronouns in Japanese.

(7) a. Tanaka-sani wa [Øi/j kaisya de itiban da to] itte-iru.
       Tanaka-Mr. TOP company in best is that saying-is
       'Mr. Tanaka says that I am/you are/he is/she is (etc.) the best in the company.'
    b. Darei ga [Øi/j sore o mita to] itta no?
       who NOM that ACC saw that said Q
       'Who said that I/you/he/she (etc.) saw that?'
    c. Tanaka-sani wa [karei/j ga kaisya de itiban da to] itte-iru.
       Tanaka-Mr. TOP he NOM company in best is that saying-is
       'Mr. Tanaka says that he is the best in the company.'
    d. Darei ga [karej/*i ga sore o mita to] itta no?
       who NOM he NOM that ACC saw that said Q
       'Who said that he saw that?'
    (examples based on Kanno 1997:266, (3); 270, (9), (10))

In (7a), the null subject of the embedded clause can be interpreted referentially, as coreferential either with the matrix subject or with an extra-sentential antecedent. In (7b), the null subject of the embedded clause can be interpreted either referentially or as a bound variable. In (7c), kare 'he,' the overt pronominal subject of the embedded clause, can be interpreted referentially, again as coreferential either with the matrix subject or with an extra-sentential antecedent. However, while kare in (7d) can also be interpreted referentially (i.e., with an extra-sentential antecedent), the bound-variable interpretation is ungrammatical. This is a simple and elegant example of bankruptcy of the stimulus, as sketched in (1)–(4). Consider the direct evidence available in the input from utterances of the type in (7), summarized in (8):

(8)                  Referential interpretation    Bound interpretation
    Null pronoun     attested in the input         attested in the input
    Overt pronoun    attested in the input         —

7  Kanno (1997:267, (6)) states the OPC as follows: ‘In languages that permit null arguments, an overt pronominal must not have a quantified NP as antecedent.’



This should provide a child exposed to Japanese PLD with evidence for the grammaticality of these three cells, as in (9):

(9)                  Referential interpretation    Bound interpretation
    Null pronoun     OK                            OK
    Overt pronoun    OK                            —

Natural analogical extension would therefore lead to the conclusion that the overt pronoun can also receive a bound interpretation, as in (10):

(10)                 Referential interpretation    Bound interpretation
     Null pronoun    OK                            OK
     Overt pronoun   OK                            OK

However, this does not happen. Rather, children exposed to Japanese know that the interpretation of an overt pronoun as a bound variable is ungrammatical, as in (11):

(11)                 Referential interpretation    Bound interpretation
     Null pronoun    OK                            OK
     Overt pronoun   OK                            *

This conclusion about the impossible bound-variable interpretation occurs in the absence of any PLD directly relevant to just that interpretation. Japanese children receive no metalinguistic instruction on this paradigm, and (unless enrolled in courses that examine the syntax–semantics interface in Japanese) neither do adult L2ers of Japanese. Furthermore, and specifically related to the Kanno L2 study, the Overt Pronoun Constraint is not directly operative in English, as shown in (12).

(12) a. *Who said that Ø saw that?
     b. Whoi said that hei/j saw that?

English does not allow null arguments in tensed clauses (12a); thus, nothing prevents the overt pronoun in (12b) from taking either the referential interpretation or, importantly, the bound-variable interpretation (cf. (7d)).

With this background in mind, Kanno (1997) administered an interpretation task (consisting of 5 tokens each of the 4 types in (7)) to a group of native speakers of Japanese (n = 20) and a group of fourth-semester L1-English-Japanese L2ers (n = 28). The relevant results are summarized in Table 13.1, as acceptance rates for the interpretation crossed with the form of the subject pronoun of the embedded clause. The columns marked 'referential' refer to the reading in which the embedded pronoun is coreferential with the matrix subject; the columns marked 'bound' refer to the bound-variable reading of the embedded pronoun.

Table 13.1 Referential vs. bound interpretations for null vs. overt pronouns in Kanno (1997)

                     Native Japanese (n = 20)      English-Japanese L2ers (n = 28)
                     Referential    Bound          Referential    Bound
     Null pronoun    100%           83%            81.5%          78.5%
     Overt pronoun   47%            2%             42%            13%

Both natives and L2ers exhibited a certain disinclination to associate overt embedded pronouns with matrix-subject antecedents (acceptance rates of 47% and 42%, respectively), slightly preferring extra-sentential antecedents instead. However, this cannot explain the much stronger effect of disallowing the bound-variable interpretation of overt embedded pronouns (2% acceptance for natives; 13% acceptance for L2ers).8

Recall that Kanno's L2ers, just like L1 Japanese-acquiring children, are exposed to no PLD directly relevant to this interpretive asymmetry and receive no relevant instruction on it; moreover, these L2ers cannot draw on the interpretation of null pronouns from their L1 (given the absence of this null pronoun in English). However, Kanno's data show that despite this threefold absence of evidence for the incompatibility of overt pronouns with a bound-variable interpretation in Japanese, adult English-Japanese L2ers indeed exhibit sensitivity to the Overt Pronoun Constraint. The only plausible remaining source for this knowledge is UG.
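To make the quantitative logic concrete, consider a minimal sketch of how such an asymmetry might be tested. The counts below are not Kanno's raw data; they are hypothetical, reconstructed from the Table 13.1 percentages on the assumption of 5 tokens × 28 L2 participants per condition, and the choice of a chi-square test is for illustration only.

from scipy.stats import chi2_contingency

# Hypothetical judgment counts for the L2 group (reconstructed from the
# percentages in Table 13.1, assuming 5 tokens x 28 participants = 140
# judgments per condition; not Kanno's raw data).
n_judgments = 140
accept_referential = 59   # roughly 42% acceptance
accept_bound = 18         # roughly 13% acceptance

# 2 x 2 contingency table: interpretation x (accept, reject)
table = [
    [accept_referential, n_judgments - accept_referential],
    [accept_bound, n_judgments - accept_bound],
]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
# A very small p-value indicates that overt pronouns are accepted at
# reliably different rates under the two readings, i.e., the OPC-style
# asymmetry is unlikely to be sampling noise.

Kanno's reported contrasts (see note 8) reflect tests of this general kind, whatever the specific statistic used, computed per group and crossed with pronoun type.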

Our second representative study concerns the L2 acquisition of word order variation in German, both what is possible and what is not. As is well known, German is an SOV language with V2 in finite main clauses; it is also a language that appears to permit a great deal of flexibility in the order of nonverbal constituents. The displacement of constituents is sensitive to discourse context and subject to certain syntactic and semantico-pragmatic conditions. Two distinct types of displacement in German are topicalization, as in (13), and scrambling, as in (15).

(13) Topicalization in German main clauses
     a.  XP V[+finite] S …
     b. *XP S V[+finite] …
     c. *XP S … V[+finite]

8  Kanno (1997:274) reports statistical significance values on the order of p =  0.0001 for all the relevant contrasts.



On standard analyses, topicalization moves the XP to the Spec,CP position, inducing V2 as in (13a), because the finite verb must move to C. The order in (13a) is obligatory, as shown by the illicit main-clause orders in (13b) and (13c). In embedded clauses (with an overt complementizer in the C position), by contrast, topicalization is not possible, i.e., an XP cannot move to a position to the left of the complementizer, as schematized in (14).

(14) Topicalization in German embedded clauses
     * … XP complementizer S … V[+finite]

Scrambling in German, by contrast, displaces constituents, within the appropriate discourse, to (various) positions somewhere to the right of the C position, irrespective of the distinction between main clause and embedded clause. Orders such as those in (15), illustrated with embedded-clause schemas, are the result of scrambling.

(15) Scrambling in German (embedded clauses)
     a. … complementizer S DO (Adv) IO tDO (V[–finite]) V[+finite]
     b. … complementizer S IO (Adv) tIO DO (V[–finite]) V[+finite]

In both the topicalization of (13a) and the scrambling of (15), the complete, intact XP has been displaced. Yet German allows displacement of an even more interesting kind, known as remnant movement. Remnant movement consists of moving an XP (the remnant) from which a phrasal constituent (call it YP) has already first moved. This is schematized in (16):

(16) Remnant movement in German
     [XP tYP] ... YP ... tXP   (drawn from Hopp 2005:38, (5))

However, such remnant movement is possible only when the two types of movement are distinct (Müller 1996, 1998). For instance, remnant topicalization after scrambling is possible (17c), but not remnant scrambling after scrambling or remnant topicalization after topicalization. This is illustrated in the paradigm in (17). (17a) exemplifies intact scrambling, and (17b) intact topicalization. The licit (17c) is remnant topicalization: first, the DO den Wagen 'the car' scrambles out of den Wagen zu reparieren 'to repair the car' in the lower clause, and then the remnant of this (t1 zu reparieren) topicalizes into the main clause.9 By contrast, (17d) and (17e) are illicit, because here remnant movement involves movement of the same type: remnant scrambling after scrambling in (17d) and remnant topicalization after topicalization in (17e).

9 The 't′2' in the lower clause's initial position (Spec,CP) in (17c) triggers V2. It is for this reason that in (17c) hat 'has' must precede Peter. (Also, the meaning glosses in (17b) and (17c) are ours.)



(17) a. Intact scrambling:
        Ich glaube, dass [den Wagen zu reparieren]1 Peter schon t1 versucht hat.
        I think that the car to repair Peter already tried has
        'I think that Peter already tried to repair the car.'
     b. Intact topicalization:
        [Den Wagen zu reparieren]1 hat Peter schon t1 versucht.
        the car to repair has Peter already tried
        'Repairing the car (is what) Peter already tried (to do).'
     c. Remnant topicalization across a scrambled phrase:
        [t1 Zu reparieren]2 glaube ich t′2 hat Peter [den Wagen]1 schon t2 versucht.
        to repair think I has Peter the car already tried
        'Repairing is what I think Peter has already tried to do to the car.'
     d. *Remnant scrambling across a scrambled phrase:
        *Ich glaube, dass [t1 zu reparieren]2 Peter [den Wagen]1 schon t2 versucht hat.
        I think that to repair Peter the car already tried has
     e. *Remnant topicalization across a topicalized phrase:10
        *[t1 Zu reparieren]2 glaube ich [den Wagen]1 hat Peter schon t2 versucht.
        to repair think I the car has Peter already tried
     (adapted from Hopp 2005:40–41, (10a), (10b), (10d), (10e1), (10f))

The main question Hopp's (2002, 2005) L2 research addresses, building on work by Schreiber and Sprouse (1998), is whether adult native speakers of English, a language without scrambling, (can) come to have knowledge of the distinctions between grammatical vs. ungrammatical word orders of the type in (17). If they do, this constitutes evidence for the continued role of UG in adult L2 acquisition. The logic here parallels the logic presented earlier for bankruptcy-of-the-stimulus problems—but with the important added twist of (necessarily) taking into consideration the properties of the L1 grammar.

For German native speakers, the grammaticality distinctions of the kind in (17) pose a (severe) poverty-of-the-stimulus problem. There is no doubt that tokens of intact scrambling of noncomplex XPs (see (15)) and of intact topicalization (17b) are not uncommon in the language surrounding learners. However, as Hopp underscores, referring to work by Hoberg (1981); Schlesewsky, Fanselow, Kliegl, and Krems (2000); and Bornkessel, Schlesewsky, and Friederici (2002):

[C]orpus studies demonstrate that the noncanonical word orders in ([17]), in particular scrambling of complex XPs [e.g., (17a)] and remnant movement [e.g., (17c)], are highly infrequent in spoken and written German. . . . The relative statistical difference between infrequent sentences and non-occurring ungrammatical sentences [e.g., (17d) and (17e)] is thus very small. Therefore, observing the relative discourse frequency of noncanonical orders is unlikely to lead to a reliable distinction between rare licit and non-instantiated illicit sentences. (Hopp 2005:42)

10 The string in (17e) is independently ruled out because topicalization itself creates a strong island in German, thereby blocking subsequent movement from its clause.
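A back-of-the-envelope calculation makes Hopp's point vivid. Assuming, purely for illustration, that a licit structure occurs once per million sentences, its complete absence from even a sizable input sample is the expected outcome, so absence alone cannot separate the rare-but-grammatical from the ungrammatical:

import math

# Toy numbers: a licit structure occurring at rate r per sentence.
rate = 1e-6  # hypothetical: one token per million sentences

for n_sentences in (10_000, 100_000, 1_000_000):
    p_zero = math.exp(-rate * n_sentences)  # Poisson approximation to (1 - r)**N
    print(f"N = {n_sentences:>9,}: P(no tokens at all) = {p_zero:.3f}")

# N =    10,000: P(no tokens at all) = 0.990
# N =   100,000: P(no tokens at all) = 0.905
# N = 1,000,000: P(no tokens at all) = 0.368
# Even after a million input sentences, a learner has better than a 1-in-3
# chance of never encountering the structure; zero attestation is therefore
# weak evidence of ungrammaticality.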



Furthermore, on the basis of tokens of the sort in (17a)–(17c) that present evidence for variable word orders, learners might be led to expect (by natural analogical extension) additional word order possibilities, such as that in (17d), illustrated in the familiar four cells in (18):

(18)                   Intact movement    Remnant movement after scrambling
     Topicalization    OK                 OK
     Scrambling        OK                 OK

But this is not what happens: native German speakers consistently distinguish the grammatical from the ungrammatical word orders of the types in (17), as in (19):

(19)                   Intact movement    Remnant movement after scrambling
     Topicalization    OK                 OK
     Scrambling        OK                 *

The source of this knowledge is not external to the learner (i.e., not in the PLD), which leaves an internal source as the only possibility. Since the explanation for the range of phenomena here implicates categories (e.g., constituents), operations (e.g., distinct movement types), and a restriction (no remnant movement involving movements of the same type—or a functional equivalent to this) that pertain only to language, UG is the ideal candidate for the source of this knowledge.

The same conclusion would hold of L1-English L2ers of German, if they, too, come to make this distinction between grammatical vs. ungrammatical orders, but with two crucial extra steps: first, given the absence of the scrambling movement type in English, the L1, knowledge of such word order impossibilities in German cannot stem from the L1 grammar; second, German language instruction does not broach the issue of remnant movement. Thus, with these two potential knowledge sources excluded, the English–German L2er faces essentially the same learning scenario here as that of the L1-German acquirer, set out earlier. And indeed, Schreiber and Sprouse (1998) found that advanced English–German L2ers do make the relevant distinctions sketched in (17) in a written, contextualized acceptability judgment task, as did Hopp (2002, 2005), testing intermediate to very advanced English–German L2ers.



Hopp's experiment substantially extended the enquiry; in his acceptability judgment task, not only were additional paradigms of possible and impossible (remnant) movement in German included, but the critical items (and fillers), following discourse-favorable contexts, were also presented bimodally, in writing and recordings (the latter carefully controlling for intonation naturalness, with both grammatical and ungrammatical items). His main finding is that adult native English-speaking L2ers of German, just like his adult German native participants, reliably make the relative distinctions between grammatical and ungrammatical orders, at the group level and the individual level. These results, in sum, offer clear-cut evidence of target-like adult L2 acquisition under poverty/bankruptcy of the stimulus, thereby implicating UG.

Results such as these argue for the continued presence of Universal Grammar in adult L2 acquisition; this perhaps makes even more curious the (typical if not inevitable) fact of nonnative nonconvergence on the target grammar, the principal characteristic motivating the postulation of a critical period (in the L2 context). Why, if UG constrains adult L2 acquisition, does the ultimate attainment of adult L2ers, even under optimal environmental circumstances, not simply mirror that of L1 acquisition? One type of proposal links this nonconvergence, one way or another, to the L1 grammar. Many linguists conceptualize UG not only as a set of constraints, but also as a toolkit for grammar construction. For example, UG might provide the set of functional categories from which any particular grammar can draw. On this view, it might be possible, in principle, that L2 acquisition is constrained by principles of UG, while (adult) L2ers, after constructing their L1 grammar, no longer have access to the toolkit of basic building blocks. In fact, one particularly influential family of models of L2 acquisition makes precisely this claim. According to the Failed Functional Features Hypothesis of Hawkins and Chan (1997:189), '[b]eyond the critical period[,] the functional features, with the exception of those already encoded in the [L1] entries for specific lexical items, become inaccessible to modification.' Tsimpli and Dimitrakopoulou (2007:217) narrow this in terms of uninterpretable features (in the sense of Chomsky's 1995b Minimalist Program) by claiming that 'interpretable features are accessible to the L2 learner[,] whereas uninterpretable features are difficult to identify and analyse in the [TL] input due to persistent, maturationally-based, L1 effects on adult L2 grammars.'

The basic idea of these models is that adult L2 Interlanguages maximize the properties of the L1 grammar, extending them to fit, as much as possible, the TL input. For instance, the Hawkins and Chan (1997) study argues that a (parametric) difference in functional features between English and Cantonese explains Cantonese speakers' non-target-like acquisition of English restrictive relative clauses (RRCs): English has a [±wh] feature; Cantonese does not. Assuming that Cantonese RRCs are instances of (null) topicalization (generated in Spec,CP) binding a (null/resumptive) pronominal (Xu and Langendoen 1985; Xu 1986), Hawkins and Chan claim that Cantonese speakers analyze English RRCs in the same way, i.e., as a topicalization–pronominal binding relation rather than the target wh-movement–trace relation.



In this sense, the Interlanguage English grammars of Cantonese adult L2ers can only ever approximate the TL grammar, and therefore certain nontarget-like characteristics in their English, even at the most advanced stages, are inevitable (e.g., non-adherence to Subjacency). However, even if during the course of Interlanguage development L2ers do posit such alternative, nontarget analyses (that stem from the L1 grammar), the conclusion about the inevitability of nonconvergence in regard to wh-movement and Subjacency cannot be true: as previously noted, there are studies showing that adult L2ers whose L1 lacks overt wh-movement do (come to) observe Subjacency in the TL (e.g., Martohardjono 1993; White and Juffs 1998; for an overview, see Belikova and White 2009).

While the latter findings argue for the continued operation of UG in adult L2 acquisition (and the importance of considering TL proficiency in conducting L2 research), they also point to two important, related areas that go beyond the issue of the nature of L2 grammars: learnability and processing. Learnability concerns the transitioning from stage to stage (see chapter 11 for discussion and overview). What pushes an (L1 or) L2 acquirer to abandon a current analysis (i.e., a grammar) in favor of some other analysis (i.e., a different grammar)? Note that the subsequent analysis could be empirically more adequate, as in the case of Cevdet's Interlanguage German moving from a grammar that (incorrectly) allows only pronominal postverbal subjects to a V2 grammar that (correctly) allows both pronominal and nonpronominal postverbal subjects, or the subsequent analysis could seemingly be equally empirically adequate, as in the case of—importantly—possible RRCs/wh-questions in the Interlanguage English of Cantonese/Chinese speakers. In the former, a type of presumably abundant PLD that had erstwhile been ignored, viz. postverbal nonpronominal subjects, is for some reason no longer ignored; in the latter, a topicalization–pronominal binding analysis for some reason gives way to a wh-movement–trace analysis (to account for a change from Subjacency non-adherence to Subjacency adherence, notably not in the input).

In short, while the review on the preceding pages has focused on the nature of adult L2 systems, summarizing arguments and empirical data that lead to the conclusion that UG does constrain adult L2 acquisition, there is very little research that tackles how learning, i.e., stage-to-stage transitioning, actually takes place. The prevailing theoretical position on this within generative linguistics is that 'grammar learning' is a consequence of receptive sentence processing, that is, parsing (Fodor 1998c; see chapter 11, section 11.6). While the details of models differ (e.g., failure-driven parsing-to-learn vs. success-driven parsing-to-learn), a common thread is that only parsed (partial) strings (i.e., (partial) strings assigned structural and semantic representations) feed the grammar-building process. The L1 acquisition literature has engaged this issue more head-on than the L2 acquisition literature has thus far.



For instance, in a recent article examining the relation between parser development and morphosyntactic development, Omaki and Lidz (2015) note that, at least for children, information that comes early in an input string may be more likely to effect grammar change than information that comes later (Choi and Trueswell 2010); this is because, given the incrementality of sentence processing (in even very young children), earlier information is more likely to be parsed, and any later information that requires revision of the initial parse becomes unavailable, since, in general, immature parsers are poor at such reanalysis. While Omaki and Lidz speculate that for children this may ultimately be due to the underdevelopment of 'domain general, cognitive control mechanisms' (p. 169), such a story is untenable in the adult L2 context. On the other hand, a similar account might hold of adult L2ers if the ability to revise parses is a function of (for instance) L2 proficiency: lower-level L2ers are less able to revise than higher-level L2ers (Roberts 2013), perhaps because, ultimately, the cognitive burden of using an L2 is greater for lower-level L2ers than for higher-level L2ers. Only future research will tell.

Still, what remains uncontroversial for generative approaches to L2 acquisition, we contend, is the path L2 research needs to forge: investigating the development of L2 processing abilities, especially as they relate to the development of L2 grammars, focusing on the roles of the TL input, the L1 grammar, the L1 parser, the L2 parser, and, of course, Universal Grammar.
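The incrementality argument can be schematized with a deliberately minimal sketch (not Omaki and Lidz's model; the sentence and analyses below are hypothetical): a parser that commits early and cannot revise never registers a disambiguating cue that arrives late in the string, so that cue cannot feed grammar change.

def parse_incrementally(words, ranked_analyses, can_reanalyze):
    """Commit to the highest-ranked analysis compatible with the input so far."""
    committed = None
    for i in range(1, len(words) + 1):
        prefix = tuple(words[:i])
        viable = [a for a in ranked_analyses if a["licenses"](prefix)]
        # An immature parser commits once and never backtracks.
        if viable and (committed is None or can_reanalyze):
            committed = viable[0]
    return committed

# Two hypothetical analyses of 'put the frog on the napkin in the box':
# 'goal' is preferred but its licensing cue arrives only at the final word.
goal     = {"name": "goal",     "licenses": lambda p: p[-1] == "box"}
modifier = {"name": "modifier", "licenses": lambda p: True}

words = "put the frog on the napkin in the box".split()
for flag in (False, True):
    result = parse_incrementally(words, [goal, modifier], flag)
    print("reanalysis" if flag else "no reanalysis", "->", result["name"])
# no reanalysis -> modifier   (the late-arriving cue never reaches the grammar)
# reanalysis    -> goal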



Part IV

COMPARATIVE SYNTAX





Chapter 14

Principles and Parameters of Universal Grammar

C.-T. James Huang and Ian Roberts

14.1 Introduction

The Principles and Parameters Theory (P&P), which took shape in the early 1980s, marked an important step forward in the history of generative grammatical studies.1 It offered a plausible framework in which to capture both the similarities and differences among languages within a rigorous formal theory. It led to the discovery of important patterns of variation across languages. Most important of all, it offered an explanatory model for the empirical analyses which opened a way to meet the challenge of 'Plato's Problem' posed by children's effortless—yet completely successful—acquisition of their grammars under the conditions of the poverty of the stimulus (see chapters 5, 10, 11, and 12). Specifically, the P&P model led linguists to expand their scope of inquiry and enabled them to look at an unprecedented number of languages from the perspective of the formal theory of syntax, not only in familiar traditional domains of investigation; it also opened up some new frontiers, at the same time raising new questions about the nature of language which could not even have been formulated earlier. Another consequence was that it became possible to discover properties of one language (say English) by studying aspects of a distinct, genetically unrelated language (say, Chinese or Gungbe), and vice versa.

Most of the original proposals for parameters in the early days of P&P were of the form that we would now, with the benefit of hindsight, think of as macroparameters. They have the characteristic property of capturing the fact that parametric variations occur in clusters. As the theory developed, it became clear that such a model is

1 This work was partly supported by the ERC Advanced Grant 269752 Rethinking Comparative Syntax (ReCoS), Principal Investigator: I. Roberts.



inadequate for the description of micro-scale parametric variation across languages. In addition, certain correlations that were predicted by well-known proposed macroparameters turned out not to hold as more languages were brought into consideration. In the meantime, considerations of theoretical parsimony led to widespread adoption of the lexical parameterization hypothesis (known now as the Borer–Chomsky conjecture), ruling out much of the theoretical vocabulary used in earlier macroparametric proposals. These developments led to some doubts about the existence of macroparameters, and even the feasibility of the P&P program; see in particular Newmeyer (2005).

In this chapter, developing recent work, we will support the position that both macroparameters and microparameters exist (and indeed other levels of parametric variation—see section 14.5), and that there is really no adequate alternative to a parameter-setting model of language acquisition. Consistent with the 'three-factors' conception of language design (Chomsky 2005; and chapter 6), parametric variation can be seen as an emergent property of the three factors of language design (Roberts and Holmberg 2010; Biberauer 2011; Roberts 2012; and references given there): the first factor is a radically underspecified UG, the second the Primary Linguistic Data (PLD) for language acquisition (see chapters 5, 10, 11, and 12), and the third general learning strategies based on computational conservatism (see chapter 6). Using the facts of Chinese as a paradigm case (i.e., the macroparametric contrasts with English, the macroparametric changes since Old/Archaic Chinese, and the microvariation among dialects), we show that both macroparameters and microparameters exist, and that the tension between descriptive and explanatory adequacy is resolved by the view that macroparameters are aggregates of microparameters acting in concert, with correlating values driven by the third-factor learning strategies (Roberts and Holmberg 2010; Roberts 2012).

14.2 The Principles and Parameters Theory

Principles-and-Parameters theory emerged as a way of tackling what Chomsky (1986b) referred to as 'Plato's Problem.' This is the basic observation that children acquire the intricacies of their native language in early life with little apparent effort, despite being confronted with an impoverished stimulus (see chapters 5, 10, 11, and 12 for more details). As an illustration of the complexity of the task of language acquisition, consider the following sentences (this exposition is largely based on Roberts 2007:14–19):

(1) a. The clowns expect (everyone) to amuse them.
    b. The clowns expected (everyone) to amuse themselves.

If everyone is omitted in (1a), the pronoun them cannot correspond to the clowns, while if everyone is included, this is possible. If we simply change them to the reflexive pronoun



themselves, as in (1b), exactly the reverse results. In (1b), if everyone is included, the pronoun themselves must correspond to it. If everyone is left out, themselves must correspond to the clowns. The point here is not how these facts are to be analyzed, but rather the precision and the subtlety of the grammatical knowledge at the native speaker's disposal. It is legitimate to ask where such knowledge comes from. Another striking case involves the interpretation of missing material, as in (2):

(2) John will go to the party, and Bill will—too.

Here there is a notional gap following will, which we interpret as go to the party; this is the phenomenon known as VP-ellipsis. In (3), we have another example of VP-ellipsis:

(3) John said he would come to the party, and Bill said he would—too.

Here there is a further complication, as the pronoun he can, out of context, correspond to either John or Bill (or an unspecified third party). Now consider (4):

(4) John loves his mother, and Bill does—too.

Here the gap is interpreted as loves his mother. What is interesting is that the missing pronoun (the occurrence of his that isn't there following does) has exactly the three-way ambiguity of he in (3): it may correspond to John, to Bill, or to a third party. Example (4) shows we have the capacity to apprehend the ambiguity of a pronoun which we cannot hear. Again, a legitimate and, it seems, profound question is where this knowledge comes from.

The cases just discussed are examples of native grammatical knowledge. The basic point in each case is that native speakers of a language constantly hear and produce novel sentences in that language, and yet are able to distinguish well-formed sentences from ill-formed ones and make subtle interpretative distinctions of the kind illustrated in (4). The existence of this kind of knowledge is readily demonstrated and not in doubt. But it raises the question of its origin: where does this come from? How does it develop in the growing person? This is Plato's problem, as Chomsky called it, otherwise known as the logical problem of language acquisition (Hornstein and Lightfoot 1981). It is seen as a logical problem because there appears to be a profound mismatch between the richness and intricacy of adult linguistic competence, illustrated by the examples given in (1), and the rather short time taken by language acquisition, coupled with small children's seemingly limited cognitive capacities.

This latter point brings us to the argument from the poverty of the stimulus. Here we briefly summarize this argument (for a more detailed presentation, see chapter 10, as well as Smith 1999:40–41; Jackendoff 2003:82–87; and, in particular, Guasti 2002:5–18). As its name implies, the poverty-of-the-stimulus argument is based on the observation that there is a significant gap between what seems to be the experience facilitating first-language acquisition (the input or 'stimulus') and the nature of the linguistic knowledge



which results from first-language acquisition, i.e., one's knowledge of one's native language. The following quotation summarizes the essence of the argument:

The astronomical variety of sentences any natural language user can produce and understand has an important implication for language acquisition … A child is exposed to only a small proportion of the possible sentences in its language, thus limiting its database for constructing a more general version of that language in its own mind/brain. This point has logical implications for any system that attempts to acquire a natural language on the basis of limited data. It is immediately obvious that given a finite array of data, there are infinitely many theories consistent with it but inconsistent with one another. In the present case, there are in principle infinitely many target systems … consistent with the data of experience, and unless the search space and acquisition mechanisms are constrained, selection among them is impossible…. No known 'general learning mechanism' can acquire a natural language solely on the basis of positive or negative evidence, and the prospects for finding any such domain-independent device seem rather dim. The difficulty of this problem leads to the hypothesis that whatever system is responsible must be biased or constrained in certain ways. Such constraints have historically been termed 'innate dispositions,' with those underlying language referred to as 'universal grammar.' (Hauser, Chomsky, and Fitch 2002:1576–1577)

Hence we are led to a biological model of grammar. The argument from the poverty of the stimulus leads us to the view that there are innate constraints on the possible form a grammar of a human language can take; the theory of these constraints is Universal Grammar (UG). But of course it is clear that experience plays a role; no one is suggesting that English or Chinese is innate. So UG provides some kind of bias, limit, or schema for possible grammars, and exposure to people speaking provides the experience causing this latent capacity to be realized as competence in a given actual human language. Adult competence, as illustrated for English speakers by data such as that in (1)–(4), is the result of nature (UG) and nurture (exposure to people speaking).

The P&P model is a specific instantiation of this general approach to Plato's problem. The view of first-language acquisition is that the child, armed with innate constraints on possible grammars furnished by UG, is exposed to Primary Linguistic Data (PLD, i.e., people speaking) and develops its particular grammar, which will be recognized in a given cultural context (e.g., in London, Boston, or Beijing) as the grammar of a particular language, English or Chinese. But it should be immediately apparent that London and Boston, or Beijing and Taipei, are not linguistically identical. Concepts such as 'English' and 'Chinese' are highly culture-bound and essentially prescientific. An individual's mature competence, the end product of the process of first-language acquisition just sketched, is not really 'English' or 'Chinese,' but rather an individual, internal grammar, technically an I-grammar. We use the terms 'English' or 'Chinese' to designate different variants of I-grammar, but these terms are really only approximations (as are more narrowly defined terms such as 'Standard Southern British English' or 'Standard Northern Mandarin Chinese'; neither exactly corresponds to the I-grammar of Smith, Roberts, Li, or Huang).



What makes the different I-grammars of, to revert to prescientific terms for convenience, English or Chinese? This is where the notion of parameters of UG comes in. Since UG is an innate capacity, it must be invariant across the species: Smith, Roberts, Li, and Huang (as well as Saito, Rizzi, and Sportiche) are all the same in this regard. But these individuals were exposed to different forms of speech when they were small and hence reached the different final states of adult competence that we designate as English, Chinese, Japanese, Italian, or French. These cognitive states are all instantiations of UG, but they differ in parameter-settings, abstract patterns of variation in the restricted set of grammars allowed by UG. So, on the Principles and Parameters conception of UG, differing PLD was sufficient to cause Roberts to set his UG parameters one way, so as to become an English speaker, while Huang set his another way and became a Chinese speaker, Saito another way, Rizzi still another, and so forth. On this view, language acquisition is seen as the process of fixing the parameter values left open by UG, on the basis of experience determined by the PLD.
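The bare mechanics of parameter-setting can be sketched as a search procedure. The following toy learner is loosely reminiscent of trigger-based models from the acquisition literature and is not a proposal defended in this chapter; all names and the single 'head-final' parameter are illustrative assumptions.

import random

def acquire(parses, data, n_params, steps=10_000):
    """parses(grammar, sentence) -> bool; data: a list of input sentences."""
    # A grammar is just a vector of binary parameter values.
    grammar = [random.choice([False, True]) for _ in range(n_params)]
    for _ in range(steps):
        sentence = random.choice(data)
        if parses(grammar, sentence):
            continue                      # current grammar licenses the input
        i = random.randrange(n_params)    # on failure, try resetting one parameter
        candidate = grammar.copy()
        candidate[i] = not candidate[i]
        if parses(candidate, sentence):   # greedy: adopt the change only if it helps
            grammar = candidate
    return grammar

# Toy instance: one hypothetical 'head-final' parameter; the data come from a
# head-initial target, so the learner converges on grammar[0] == False.
target_data = ["VO"]
parses = lambda g, s: (s == "OV") == g[0]
print(acquire(parses, target_data, n_params=1))   # -> [False]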

The P&P model is a very powerful model of both linguistic diversity and language universals. More specifically, it provides a solution to Plato's problem, the logical problem of language acquisition, in that the otherwise formidable task of language acquisition is reduced to a matter of parameter-setting. Moreover, it makes predictions about language typology: parameters make predictions about (possible) language types, as we will see in more detail in section 14.3 (see also chapter 15). Furthermore, it sets the agenda for research on language change, in that syntactic change can be seen as parameter change (see chapter 18). Finally, it draws research on different languages together as part of a general enterprise of discovering the precise nature of UG: we can discover properties of the English grammatical system (a particular set of parameter values) by investigating Chinese (or any other language), without knowing a word of English at all (and vice versa, of course). Let us now begin to look at the progress that has been made in this endeavor in more detail.

14.3 Principles and Parameters in GB

In this section, we will briefly review some notable examples of parameters that were put forward in the first phase of research in the P&P model in the 1980s, using the general framework of Government–Binding (GB) theory.

14.3.1 The Head Parameter

First proposed in Stowell (1981), and developed in Huang (1982), Koopman (1984), and Travis (1984), the head parameter can be stated as follows:

(5) In X′, X {precedes/follows} its complement YP.



This parameter regulates one of the most pervasive and well-studied instances of cross-linguistic variation: the variation in the linear order of heads and complements. Stated as (5), it predicts that all languages will be either rigidly head-initial (like English, the Bantu languages, the Romance languages, and the Celtic languages, among many others) or rigidly head-final (like Japanese, Korean, the Turkic languages, and the Dravidian languages). Of course many languages, including notably Chinese, show mixed, or disharmonic, word order, suggesting that (5) needs to be relativized to categories, a matter we return to in section 14.4.1.

The simplest statement of this parameter is along the lines of (5), which assumes, as was standard in GB theory, that linear precedence and hierarchical relations (defined in terms of X′-theory) are entirely separate. In fact, X′-theory was held to be invariant, a matter of UG principles (or deriving from UG principles), while linear order was subject to parametric variation. Since Kayne (1994), other approaches to linearization have been put forward (starting with Kayne's Linear Correspondence Axiom), some of which, like Kayne's, connect precedence and hierarchy directly. The head parameter must then be reformulated accordingly. In Kayne's (1994) approach, for example, complement–head order cannot be directly generated, but must be derived by leftward movement (in the simplest case, of complements). The parameter in (5) must therefore be restated so as to regulate this leftward movement. Takano (1996), Fukui and Takano (1998), and Haider (2012:5), on the other hand, propose that complement–head order is the more basic option, with surface head–complement order being derived by head movement. In that case, (5) may be connected to head movement (and the availability of landing sites for such movement, according to Haider).
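The across-the-board character of (5) can be made concrete with a small, purely illustrative sketch: a single binary value linearizes every head–complement pair in the same direction, which is exactly why (5) predicts rigid harmony, and why disharmonic languages such as German or Mandarin lie beyond its reach.

# A schematic illustration (not a claim about any specific theory): one
# head-parameter value fixes the order of every head-complement pair.

def linearize(head, complement, head_initial):
    return f"{head} {complement}" if head_initial else f"{complement} {head}"

pairs = [("V", "DP"), ("P", "DP"), ("N", "PP"), ("C", "TP")]

for setting, label in ((True, "head-initial (English-like)"),
                       (False, "head-final (Japanese-like)")):
    print(label, [linearize(h, c, setting) for h, c in pairs])

# head-initial (English-like) ['V DP', 'P DP', 'N PP', 'C TP']
# head-final (Japanese-like) ['DP V', 'DP P', 'PP N', 'TP C']
# Mixed (disharmonic) orders are precisely what this single switch cannot generate.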

14.3.2 The Null Subject Parameter

The basic observation motivating the postulation of this parameter is that some languages allow a definite pronominal subject of a finite clause to remain unexpressed, while others always require it to be expressed as a nominal bearing the subject function. Traditional grammars of languages such as Latin and Greek relate this to the fact that personal endings on the verb distinguish the person and number of the subject, thereby making a subject pronoun redundant. Languages that allow null subjects are very common: most of the older Indo-European languages fall into this category, as do most of the Modern Romance languages (with the exception of some varieties of French and some varieties of Rhaeto-Romansch; see Roberts 2010a), the Celtic languages, with certain restrictions in the case of Modern Irish (see McCloskey and Hale 1984; and, for arguments that Colloquial Welsh is not a null subject language, Tallerman 1987), and West and South Slavic, but probably not East Slavic (these appear to be 'partial' null subject languages in the sense of Holmberg, Nayudu, and Sheehan 2009, Holmberg 2010b; see Duguine and Madariaga 2015 on Russian). Indeed, it seems that languages that allow null subjects are significantly more widespread than those which do not (Gilligan 1987, cited in Newmeyer 2005:85).



Since Rizzi (1986), it has been widely assumed that the null subject parameter involves the ability of Infl (or T, or AgrS) to license a null pronoun, pro, and so can be stated as in (6):

(6) T {licenses/does not license} pro in its Specifier.

Perlmutter (1971) observed that languages that allow null subjects also allow wh-movement of the subject from a finite embedded clause across a complementizer (this observation has since become known as 'Perlmutter's generalization'). Rizzi (1982) linked this to the possibility of so-called 'free inversion,' leading to the following parametric cluster:

(7) a. The possibility of a silent, referential, definite subject of finite clauses.
    b. 'Free subject inversion.'
    c. The absence of complementizer–trace effects.

Rizzi showed that Italian has all of these properties while English lacks all of them. As with the head parameter, though, this cluster has empirical problems; see Gilligan (1987); Newmeyer (2005); and section 14.4.1.

14.3.3 The Null Topic Parameter

Huang (1984) observed that certain languages allow arguments to drop if they are construed as topics. In Chinese, a question about the whereabouts of Lisi, or about whether anyone has seen him, may be answered by either of the sentences in (8):

(8) a. Zhangsan kanjian-le.
       Zhangsan see-PERF
       'Zhangsan saw [him].'
    b. Zhangsan shuo ta mei kanjian.
       Zhangsan say he not see
       'Zhangsan said that he didn't see [him].'

Huang argued that the understood object in each case is first topicalized before it drops. This conception is supported by parallel facts in German. Thus, a similar question about Lisi can be answered by either of (9) (see Ross 1982 and Huang 1984 for more examples):

(9) a. [e] hab' ich schon gesehen.
       have I already seen
       'I have already seen [him].'
    b. [e] hab' ich in der Bibliothek gestern gesehen.
       have I in the library yesterday seen
       'I saw him at the library yesterday.'

Note that the missing pronoun ihn 'him' (referring to Lisi) is licensed by virtue of being in the first (hence topic) position, as witnessed by the ill-formedness of *ich hab' [e] schon gesehen, where the topic position is filled by ich. The missing argument is thus not licensed by any formal feature of T (as it is in the case of null subjects). The Null Subject Parameter and the Null Topic Parameter thus jointly distinguish four language types:

(10) a. [+null subject, –null topic]: Italian, Spanish, etc.
     b. [+null subject, +null topic]: Chinese, Japanese, European Portuguese, etc.
     c. [–null subject, –null topic]: English, Modern French, etc.
     d. [–null subject, +null topic]: German, Swedish, etc.

(See Raposo 1985 on European Portuguese, and Sigurðsson 2011b on Swedish and Icelandic.)

14.3.4 The Wh-Movement Parameter

This parameter, first proposed in Huang (1982), regulates the option of preposing a wh-constituent or leaving it in place ('in situ') in wh-questions. English is a language which requires movement of such constituents, as shown in (11a), while Chinese and Japanese are standard examples of 'wh-in-situ' languages, illustrated by (11b,c):

(11) a. What did John eat twhat?
     b. Hufei chi-le sheme (ne)?
        Hufei eat-ASP what Qwh
        'What did Hufei eat?'
        (Cheng 1991:112–113)
     c. John-ga dare-o butta-ka?
        John-NOM who-ACC hit-Q
        'Who did John hit?'
        (Baker 2001:184)

(11c) shows the standard, neutral SOV order of Japanese (cf. John-ga Bill-o butta 'John hit Bill' [Baker 2001]), while (11a) illustrates that in English the object wh-constituent what is obligatorily fronted to the SpecCP position, and (11b) illustrates wh-in-situ in SVO Chinese. To be more precise, English requires that exactly one wh-phrase be fronted in wh-questions. In multiple wh-questions, all wh-phrases except one stay in situ (and there are intricate constraints on which ones can or must be moved, as well as how they are interpreted in relation to one another). Some languages require all wh-expressions to move in multiple questions. This is typical of the Slavonic languages, as (12), from Bulgarian (Rudin 1988), shows:

(12) Koj kogo e vidjal?
     who whom AUX saw-3S
     'Who saw whom?'

There appears to be a further dimension of variation here (but see Bošković 2002 for a different view).

14.3.5 The Nonconfigurationality Parameter

This parameter was put forward by Hale (1983) to account for a range of facts in languages that show highly unconstrained (sometimes rather misleadingly referred to as 'free') word order, such as Warlpiri and other Australian languages, as well as Latin and other conservative Indo-European languages. Hale's proposal was that the phrase structure of such languages was 'flat' (i.e., it did not show the 'configurational' pattern familiar from languages such as English). This accounts directly for the 'free' word order of these languages, as well as the existence of 'discontinuous constituents,' i.e., cases where a nominal modifier may be separated from the noun it modifies by intervening material which is clearly extraneous to the NP (or DP). Hale (1983) connected two further properties, the extensive use of null anaphora and the unavailability of A-movement operations such as passive, to this parameter. The precise formulation of the parameter was as follows:

(13) a. In configurational languages, the projection principle holds of the pair (LS, PS).
     b. In nonconfigurational languages, the projection principle holds of LS alone.

Here 'LS' refers to Lexical Structure, a level of representation at which the lexical requirements of predicates are represented, and 'PS' refers to standard phrase structure. The projection principle requires lexical selection (c-selection and/or s-selection) properties of predicates to be structurally represented. Hence in nonconfigurational languages, according to this approach, phrase structure does not have to directly instantiate argument structure, with the consequence that arguments can be freely omitted, there are no structural asymmetries among arguments, and there are no syntactic operations 'converting' one grammatical function into another. Hale (1983) argued for a number of other consequences of this parameter, focusing in particular on Warlpiri.

14.3.6 The Polysynthesis Parameter

This parameter was argued for at length in Baker (1996). In fact, it can be broken up into two distinct parts. One aspect of it has to do with whether a language requires all arguments to show overt agreement with the main predicate (usually a verb); Baker (1996:17)



formulates this in terms of whether a language requires its arguments to be morphologically or syntactically visible for θ-role assignment. A further option is whether a language allows (robust) noun incorporation. English and Chinese allow neither of these options, while Mohawk allows both. Navajo has the former but not the latter property. Noun incorporation is restricted to languages that satisfy the visibility requirement morphologically (Baker 1996:18), and so there are predicted to be no languages which have noun incorporation without fully generalized agreement. Baker connects the following cluster of properties to the polysynthesis parameter (as well as a further six; see Baker 1996:498–499, Table 11.1):

(14) a. Syntactic Noun Incorporation;
     b. obligatory object agreement;
     c. free pro-drop;
     d. free word order;
     e. no NP reflexives;
     f. no true quantifiers;
     g. no true determiners.

Baker's parameter gives an elegant account of the major typological differences between languages of the Mohawk type (known as head-marking nonconfigurational languages) and those of the English/Chinese type.

14.3.7 The Nominal Mapping Parameter

This parameter was put forward by Chierchia (1998a,b), and concerns, as its name implies, an aspect of the mapping from syntax to semantics. Chierchia observes that two features characterize the general semantic properties of nominals across languages: they can be argumental or predicative, i.e., [±arg(ument)] and [±pred(icate)], reflecting the general fact that nominals can function as arguments or predicates, as in Johnarg is [a doctorpred]. The parametric variation lies in which of the three possible combinations of values of these features a given language allows (the fourth logical possibility, negative values for both features, is ruled out, as nominals of this sort would have no denotation at all). In a [+arg, –pred] language, every nominal is of type <e>, i.e., nominals denote individuals rather than predicates. Languages with this parameter setting have the following properties (Chierchia 1998b:354):

(15) i. generalized bare arguments;
     ii. the extension of all nouns is mass;
     iii. no plural marking;
     iv. generalized classifier system.



Languages with this value for the nominal mapping parameter include Chinese and Japanese. Nominals appear as bare arguments in these languages as a direct consequence of being of type <e>; hence count nouns can function directly as arguments with no article or quantifier (giving the equivalent of I saw cat, meaning 'I saw a/the cat(s)'). Chierchia argues that all nouns have fundamentally mass denotations, so unadorned nouns will have property (ii); more generally, there is no mass–count distinction in these languages. Further, since mass nouns cannot pluralize, there is no plural marking; and, finally, special devices have to be used in order to individuate noun denotations for counting, which is what underlies the classifier system.

In a [–arg, +pred] language, on the other hand, all nominals are predicates (type <e,t>). It follows that bare nouns can never be arguments. This, as is well known, is the situation in French and, with certain complications, the other Romance languages (see Longobardi 1994). Such languages can have plural marking and lack classifiers. Finally, [+arg, +pred] languages allow mass nouns and plurals as bare arguments, but not singular count nouns; they have plural marking and lack classifiers. (Singular bare count nouns can function as predicates, as in We elected John president.) This is the English, and, more broadly, Germanic, setting for this parameter.

14.3.8 The Relativized X-Bar Parameter

Fukui (1986) presents a general theory of functional categories, arguing that only these categories project above X′, and hence only these categories have Specifiers. He further proposes that functional categories can be absent as a parametric option. He analyzes Japanese as lacking D and C, and as having a very defective I (or T), in particular one lacking agreement features. It follows that Japanese has no landing site for wh-movement (and hence is a wh-in-situ language, as we saw in (11c)), no dedicated subject position of the English type (with the concomitant possibility of multiple nominative (-ga) marked arguments in a single clause), and no position in nominals for articles (with the concomitant possibility of multiple genitive (-no) marked nominals inside a single complex nominal, and of stacked appositive relatives).

14.3.9 Parametric Typology

The parameters we have briefly reviewed can be put together to give a characterization of the major grammatical properties of different languages, as shown in Table 14.1. Table 14.1 illustrates, albeit in a rather approximate and (in certain cases, e.g., whether Chinese is head-final in DP) debatable form, how a reasonable number of parameters can give us a synoptic and highly informative characterization of the salient grammatical features of a system. Note that our three languages here all differ in their values for each parameter discussed, except polysynthesis (which of course has a distinct value in languages such as Mohawk).



Table 14.1 Summary of values of parameters discussed in this section for English, Chinese, and Japanese

              Head-final?  Null subjects?  Null topics?  Wh-movement?  Nonconfigurational?  Polysynthesis?  N = <e,t>?  X′?
    English   No           No              No            Yes           No                   No              Yes         Yes
    Chinese   No           Yes             Yes           No            No                   No              No          ??
    Japanese  Yes          Yes             ??            No            ??                   No              No          No

An approach of this general kind, known as the Parametric Comparison Method, has been developed in detail by Giuseppe Longobardi and his associates; see in particular Longobardi (2003, 2005); Gianollo, Guardiano, and Longobardi (2008); Colonna et al. (2010); and chapter 16, especially Figure 16.1.
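The comparative logic behind such tables can be rendered schematically. The sketch below is a toy version only: it codes languages as vectors of the Table 14.1 values and computes a normalized Hamming distance, whereas the actual Parametric Comparison Method also weighs implicational dependencies among parameters.

# A toy rendering of parameter-based comparison (not the actual Parametric
# Comparison Method). Values follow Table 14.1; '??' is coded as None.

LANGS = {
    #            head-final  null-subj  null-topic  wh-move  nonconfig  polysyn
    "English":  (False,      False,     False,      True,    False,     False),
    "Chinese":  (False,      True,      True,       False,   False,     False),
    "Japanese": (True,       True,      None,       False,   None,      False),
}

def distance(a, b):
    """Normalized Hamming distance over mutually defined parameter values."""
    comparable = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x != y for x, y in comparable) / len(comparable)

for l1, l2 in (("English", "Chinese"), ("English", "Japanese"), ("Chinese", "Japanese")):
    print(l1, l2, round(distance(LANGS[l1], LANGS[l2]), 2))
# English Chinese 0.5; English Japanese 0.75; Chinese Japanese 0.25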

14.4 Macroparameters, Microparameters, and Parametric Clusters

14.4.1 Macroparameters and Clustering

Most of the parameters proposed in the GB era, of which those discussed in the previous section are a representative sample, have the character of being macroparameters, in that their effects are readily observed across the board in almost any sentence in any language. This is easiest to see in the case of the head parameter and the nonconfigurationality parameter, but any finite clause with a definite pronominal subject can express the null subject parameter, any wh-interrogative expresses the wh-parameter, any realization of arguments expresses the polysynthesis parameter, any nominal containing a singular count noun the nominal mapping parameter, and any nominal or clause Fukui's functional-category parameter. The effects of these parameters are thus pervasive. This also means that their settings are salient in the PLD, making them, presumably, easy for acquirers to observe and thereby fix (see chapters 11 and 12). It means, further, that observed variations are predicted to cluster together. For example, the head parameter (all else being equal) predicts that V-final, N-final, P-final, and A-final orders will all co-occur (in addition to, depending on what one assumes about functional categories, T-final, C-final, and D-final orders). As noted in section 14.3.2, the classical null subject parameter predicts the clustering of null subjects, free inversion, and apparent long subject-extraction as in (7) (as well as, possibly, differences between French and Italian in long clitic-climbing and infinitival V-movement; Kayne 1989, 1991). Chierchia's nominal mapping parameter predicts the cluster of surface properties given in (15), and Baker's



Polysynthesis Parameter that in (14). Similarly, the DP/NP parameter, more recently proposed by Bošković (2008), predicts that left branch extraction (as in Whose did you read book?), adjunct extraction from NP, scrambling, adnominal double genitives, superlatives with a 'more than half' reading, and other properties cluster together. This, as was first pointed out in Chomsky (1981a), gives macroparameters their potential explanatory value: an acquirer need only observe one of the clustering properties which express the parameter to get all the others 'for free,' as an automatic consequence of their UG-mandated clustering. For example, merely recognizing that finite clauses allow definite pronominal subjects not to appear overtly automatically guarantees that the much more recondite property of long wh-extraction of subjects over complementizers is thereby acquired. In this way, the principles and parameters approach brought biolinguistics and language typology together; this point emerges particularly clearly when groups of parameters are presented together, as in Table 14.1 and, much more strikingly, Figure 16.1.

However, since the mid-1980s it has gradually emerged that there are problems with the conception of macroparameters. These are of both a theoretical and an empirical nature. On the empirical side, it has emerged that many of the typological predictions made by macroparameters are not borne out. This is particularly clear in the case of the Head Parameter which, formulated as in (5), predicts that all languages will be either rigidly, harmonically head-initial or rigidly, harmonically head-final. It is, of course, well known that this is not true: German, Mandarin, and Latin are all clear examples of very well-studied languages which show disharmonic orders (and see Cinque 2013 for the suggestion that fully harmonic systems may be very rare). However, at the same time it is not true that just anything goes: on the one hand, languages tend toward cross-categorial harmony (as first shown in detail in Hawkins 1983; see also Dryer 1992 and chapter 15); on the other, there appear to be general constraints on possible combinations of head-initial and head-final structures (see for example Biberauer, Holmberg, and Roberts 2014). Concerning the predictions made by the putative cluster associated with the classical null subject parameter, see the extensive critique in Newmeyer (2005) and the response in Roberts and Holmberg (2010). Similar comments could be made about the other parameters listed in section 14.3.

From a theoretical point of view, there are two basic problems with macroparameters. First, they put an extra burden on linguistic theory, in that they have to be stated somewhere in the model. The original conception of parameters as variable properties associated with invariant UG principles dealt elegantly with this question, but most of the parameters listed in section 14.3 do not seem to be straightforwardly formulable in this way. Second, it is not clear why just these parameters are what they are; there is, in other words, a certain arbitrariness in where variation may or may not occur which is not explained by any aspect of the theory. In short, macroparameters, while having great potential merit from the perspective of explanatory adequacy, have often fallen short in descriptive terms by making excessively strong empirical predictions.
Moreover, there has been no natural intensional characterization of the notion of what a possible macroparameter can be, rendering their theoretical status somewhat questionable.




14.4.2 Microparameters

Here the key theoretical proposal is the lexical parameterization hypothesis (Borer 1984; Chomsky 1995b). This can be thought of, following Baker (2008a:3, 2008b:155–156), as the ‘Borer–Chomsky conjecture,’ or BCC:

(16) All parameters of variation are attributable to differences in the features of particular items (e.g., the functional heads) in the Lexicon.

More precisely, we can restrict parameters of variation to a particular class of features, namely formal features in the sense of Chomsky (1995b) (Case, φ, and categorial features) or, perhaps still more strongly, to attraction/repulsion features (EPP features, Edge Features, etc.). This view has a number of advantages, especially as compared with the earlier view that parameters were points of variation associated with UG principles. First, it is clearly a highly restrictive theory: reducing all parametric variation to features of (functional) lexical items means that many possible parameters simply cannot be stated. An example might be a putative parameter concerning the ‘arity’ of Merge, i.e., how many elements can be combined by a single operation of Merge. Such a parameter might restrict some languages to binary Merge but allow others to have ternary or n-ary Merge, perhaps giving rise to the effects of nonconfigurationality along the lines of Hale (1983) (see section 14.3.5). The second advantage of a microparametric approach has to do with language acquisition. As originally pointed out by Borer, ‘associating parameter values with lexical entries reduces them to the one part of a language which clearly must be learned anyway: the lexicon’ (Borer 1984:29). Third, the microparametric approach implies a restriction on the form of parameters, along roughly the lines of (17):

(17) For some formal feature F, P = ±F.

Here are some concrete, rather plausible, examples instantiating the schema in (17):

(18) a. T is ±φ;
     b. N is ±Num;
     c. T is ±EPP.

(18a) captures the difference between a language in which verbs inflect for person and number, such as English (in a limited way) and most other European languages, on the one hand, and languages like Chinese and Japanese, on the other, in which they do not. This may have many consequences for the syntactic properties of verbs and subjects (cf. the discussion of Japanese in Fukui 1986 mentioned in section 14.3.8). (18b) captures the difference between a language in which number does not have to be marked on (count) nouns, such as Mandarin Chinese, and one in which it does, as in English; this difference may be connected to the nominal mapping parameter (see section 14.3.7). (18c)



determines the position of the overt subject; in conjunction with V-to-T movement, a negative value of this parameter gives VSO word order, providing a minimal difference between, for example, Welsh and French (see McCloskey 1996; Roberts 2005). This simplicity of formulation of microparameters, along with the general conception of the BCC, should be compared to the theoretical objections to macroparameters discussed at the end of the previous section. It seems clear that microparameters represent a theoretically preferable approach to the macroparametric one illustrated in section 14.3. Fourth, the microparametric view allows us to put an upper bound on the set of grammars. Suppose that we have two potential parameter values per formal feature (i.e., each feature offers a binary parametric choice as stated in (17)); then we can define the quantity n as follows:

(19) n = |F|, the cardinality of the set of formal features.

It then follows that the cardinality of the set of parameter values |P| is 2n and the cardinality of the set of grammatical systems |G| is 2^n. So, if |F| = 30, then |P| = 60 and |G| = 2^30, or 1,073,741,824. Or if, following Kayne (2005a:14), |F| = 100, then |G| = 1,267,650,600,228,229,401,496,703,205,376. Kayne states that ‘[t]here is no problem here (except, perhaps, for those who think that linguists must study every possible language)’ (2005a:14). However, one consequence is clear: the learning device must be able to search this huge space very efficiently, otherwise selection among such a large range of options would be impossible for acquirers (see chapter 11, section 11.5, for the problems that this kind of space poses for ‘search-based’ parameter-setting). It may be, though, that the observation of this extremely large space brings to light a fatal weakness of the microparametric approach. To see this, consider a thought experiment (variants of this have been presented in Roberts 2001 and 2014). Suppose that at present approximately 5,000 languages are spoken and that this figure has been constant throughout human history (back to the emergence of the language faculty in modern Homo sapiens; see the brief discussion of the evolution of language in chapter 1). Suppose further that every language changes in at least one parameter value with every generation. Then, if we have a new generation every 25 years, we have 20,000 languages per century. Finally, suppose that modern humans with modern UG have existed for 100,000 years, i.e., 1,000 centuries. It then follows that 20,000,000 languages have been spoken in the whole of human history, i.e., 2 × 10^7. This number is roughly 23 orders of magnitude smaller than the number of possible grammatical systems arising from the postulation of 100 independent binary parameters. While there are many problems with the detailed assumptions just presented (several of them related to the Uniformitarian Principle, the idea that linguistic prehistory must have been essentially similar to recorded linguistic history; see Roberts forthcoming for discussion and a more refined statement of the argument), the conclusion is that, if the parameter space is as large as Kayne suggests, there simply has not been enough time since the emergence of the species (and therefore, we assume, of UG) for anything other than a tiny fraction of the total range of possibilities offered by UG to be realized. This implies that we could never know whether a language of the past corresponded to



the UG of the present or not, since the overwhelming likelihood is that such languages would be typologically different from any language that existed before or since, perhaps radically so. More generally, even with a UG containing just 100 independent parameters we should expect that languages appear to ‘differ from each other without limit and in unpredictable ways,’ in the famous words of Joos (1957:96). But of course, we can observe language types, and note diachronic drift from one type to another. We conclude that, despite the clear merits of the microparametric approach, it appears that a way must be found to lower the upper bound on the number of parameters, on a principled basis.
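The arithmetic behind this argument is easy to check directly. The following sketch (ours; Python is used purely for illustration, and the chapter’s own figures for language counts, generation length, and the age of modern UG are hard-coded as assumptions) recomputes the quantities involved:

```python
import math

# Upper bound on the grammar space under the microparametric view:
# n independent binary parameters yield 2n parameter values and 2**n grammars.
def grammar_space(n_features: int) -> int:
    return 2 ** n_features

for n in (30, 100):
    print(f"|F| = {n}: |P| = {2 * n}, |G| = {grammar_space(n):,}")
# |F| = 30: |P| = 60, |G| = 1,073,741,824
# |F| = 100: |P| = 200, |G| = 1,267,650,600,228,229,401,496,703,205,376

# The thought experiment's assumptions (deliberately generous):
languages_at_once = 5_000        # assumed constant throughout human history
generations = 100_000 // 25      # 4,000 generations of 25 years each
languages_ever = languages_at_once * generations
print(f"{languages_ever:,}")     # 20,000,000 languages in all of human history

shortfall = math.log10(grammar_space(100) / languages_ever)
print(f"about {shortfall:.0f} orders of magnitude short")  # about 23
```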

14.5 Principles and Parameters in Minimalism

The exploratory program for linguistic theory known as the Minimalist Program (MP henceforth) has as its principal goal to go ‘beyond explanatory adequacy,’ that is, beyond explaining the ‘poverty of stimulus’ problem (see in particular Chomsky 2004a and chapters 5, 6, and 10). This goal has both theoretical and empirical aspects. On the theoretical side, the goal is to fulfill the Galilean ideal of maximally simple explanation (see also Chen-Ning Yang 1982). On the empirical side, the goal is to explain the ‘brevity of evolution’ problem. Estimates regarding the date of the origin of language vary widely, with anything between 200,000 and 50,000 years ago being proposed (Tallerman and Gibson 2012:239–245 and chapter 1). It is not necessary to take a precise view on the date of the origin of language here, because anywhere within this range is a very short period for the development of such a seemingly complex cognitive capacity. It seems that there has been little time for the processes of random mutation and natural selection to operate so as to give rise to this capacity, unless we view the origin of the language faculty as due to a relatively small set of mutations which spread through a small, genetically homogeneous population in a very short time (in evolutionary terms). Hence, from the biological or neurological perspective, the core properties of the language faculty must be rather few. Combining this with the Galilean desideratum just mentioned, we then expect the contents of UG, at least the domain-specific aspects of cognition which are essential to language, to be few and simple. In trying to approach these goals, then, there has been an endeavor to reduce the ‘size,’ complexity, and overall contents of UG; see Mobbs (2015) for an excellent discussion and overview. One important conceptual shift in this direction was Chomsky’s (2005) articulation of the three factors of language design. These are as follows:

(20) a. Genetic endowment: UG.
     b. Experience: PLD.
     c. Other independent, non-domain-specific cognitive systems, such as other cognitive abilities (logical reasoning, memory), computational efficiency, minimality, and general laws of nature.



From this perspective, many things which were previously attributed directly to UG as principles of grammar can be ascribed to the third factor (see in particular chapter 6 for discussion). Regarding the question of parametric variation, since there are few or no UG principles to be parametrized along the earlier, GB-style lines, all parameters must be stated as microparameters, and indeed in general the BCC has been the dominant view of where parameters fit into a minimalist approach (see in particular Baker 2008b). More generally, the nature of the rather speculative and, at least in principle, restrictive and programmatic proposals of the MP has meant in practice that there are numerous empirical problems that have been known since the GB era or before which have been largely left untouched. For example, many of the results of the intensive technical work on phenomena associated with the Empty Category Principle in the GB era, particularly those developing the proposals in Chomsky (1986b), have not been carried forward, in part because some of the mechanisms and notions introduced earlier have been made unavailable (notably the various concepts of government, proper government, head/lexical government, and antecedent government; see Huang 1982; Lasnik and Saito 1984, 1992; Cinque 1991; and references given there). To a degree, the GB notion of parameter, as summarized and illustrated in section 14.3, has suffered a similar fate. Traditional macroparameters cannot be stated within Minimalist vocabulary, and so all parametric variation must be seen as microparametric variation, stated as variations in the nature of formal features of individual functional categories. So the ‘traditional’ macroparameters are completely excluded as such. This, combined with the empirical problems associated with clusters discussed in section 14.4.1, has led many to conclude that the entire P&P enterprise should have been abandoned (see especially Boeckx 2014), although no clear alternative proposals for how to deal with synchronic and diachronic linguistic diversity have emerged. So the question that arises is whether macroparameters really exist, and if so, how they can be accommodated in a minimalist UG. Furthermore, as our brief discussion of microparameters at the end of the previous section shows, given the large number of microparameters based on individual formal features, a question we have to ask is whether Plato’s Problem arises again. How can the acquirer search a space containing 1,267,651 trillion trillion possible grammars in the few years of first-language acquisition (see chapter 11 on the question of searching the grammatical space, and chapter 12 on the time-course of first-language acquisition)? Do we not risk sacrificing the earlier notion of explanatory adequacy in our attempt to go beyond it? Perhaps surprisingly, these questions have not been at the forefront of theoretical discussion in the context of the MP. Nonetheless, some interesting views have been articulated recently. Here we will briefly discuss those of Kayne (2005a, 2013, i.a.); Baker (2008b); Gianollo, Guardiano, and Longobardi (2008); Holmberg (2010b); Roberts and Holmberg (2010); and Biberauer and Roberts (2012, 2015a,b, in press). Kayne (2005a, 2013, i.a.) emphasizes the fact that there is no doubt as to the existence of microparameters.
The particular value of this approach lies in the idea that, in looking very carefully at very closely related languages or dialects (e.g., the Italo-​Romance varieties), we detect many useful generalizations that would not have been visible on a macroparametric



approach. Microparametric research has two methodological advantages. First, it gives us a restrictive theory of parametric variation, as the BCC clearly illustrates (see the discussion in section 14.4.2). Second, it permits something close to a ‘controlled experiment’: by looking at very closely related varieties, we control for as many potential variable factors (which may obscure the facet of variation we are interested in) as possible, making it possible to focus on the single variant property, or at least relatively few properties of interest. To give a very simple example (for which Kayne is not responsible), if we observe differences in verb/clitic orders across two or more Romance varieties (which is very easy to do; see Kayne 1991 and Roberts 2016), we are unlikely to treat these as reflexes of more general (‘macro’) differences in word order parameters, since all the Romance languages share the same very strong tendency to harmonic head-initial order. Furthermore, many or most (perhaps all) macroparameters can be broken up into microparameters. As Kayne (2013:137n23) points out:

It might also be that all ‘large’ language differences, e.g. polysynthetic vs. non- (cf. Baker (1996)) or analytic vs. non- (cf. Huang 2010 [= 2013]), are understandable as particular arrays built up of small differences of the sort that might distinguish one language from another very similar one, in other words that all parameters are microparameters [emphasis added].

This last idea was developed by Roberts (2012); see also the discussion of Biberauer and Roberts (2015a,b, in press) later in this section. Baker (2008a,b) argues for the need for macroparameters in addition to microparameters. He argues that certain macroparameters go a long way towards reducing the range of actually occurring variation:

The strict microparametric view predicts that there will be many more languages that look like roughly equal mixtures of two properties than there are pure languages, whereas the macroparametric-plus-microparametric approach predicts that there will be more languages that look like pure or almost pure instances of the extreme types, and fewer that are roughly equal mixtures. (Baker 2008b:361)

On the other hand, the macroparametric view predicts, falsely, rigid division of all languages into clear types (head-​initial vs. head-​final, etc.). Regarding this possibility, Baker comments (2008b:359) that ‘[w]‌e now know beyond any reasonable doubt that this is not the true situation.’ Baker further observes that, combining macroparameters and microparameters, we expect to find a bimodal distribution: languages should tend to cluster around one type or another, with a certain amount of noise and a few outliers from either one of the principal patterns. And, as he points out, this often appears to be the case, for example regarding the correlation originally proposed by Greenberg (1963/​2007) between verb–​ object order and preposition–​object order. The figures from the most recent version of The World Atlas of Language Structures (WALS) are as follows (these figures leave aside a range of minority patterns such as ‘inpositions,’ languages lacking adpositions, and the cases Dryer classifies as ‘no dominant order’ in either category):



(21)
     OV & Po(stpositions)  472
     OV & Pr(epositions)    14
     VO & Po                41
     VO & Pr               454
                                 (Dryer 2013a,b)
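As a quick sanity check on how skewed these counts are, the proportions can be computed directly (a minimal Python sketch, ours, with Dryer’s figures from (21) hard-coded):

```python
# Dryer's (2013a,b) counts from (21): how strongly do languages cluster
# around the two harmonic types?
counts = {
    ("OV", "Po"): 472,
    ("OV", "Pr"): 14,
    ("VO", "Po"): 41,
    ("VO", "Pr"): 454,
}
total = sum(counts.values())                             # 981 languages
harmonic = counts[("OV", "Po")] + counts[("VO", "Pr")]   # 926
print(f"harmonic: {harmonic}/{total} = {harmonic / total:.1%}")        # 94.4%
print(f"disharmonic: {total - harmonic}/{total} = "
      f"{(total - harmonic) / total:.1%}")                             # 5.6%
```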

It is very clear that here we see the kind of bimodal distribution predicted by a combination of macro- and microparameters. Baker therefore concludes that the theory of comparative syntax needs some notion of macroparameter alongside microparameters. He also makes the important point that many macroparameters could probably never have been discovered simply by comparing dialects of Indo-European languages. Gianollo, Guardiano, and Longobardi (2008) propose a distinction between parameters themselves, construed along the lines of the BCC, and hence microparameters, and parameter schemata (see also chapter 16, section 16.9). On this view, UG makes available a small set of parameter schemata, which, in conjunction with the PLD, create the parameters that determine the non-universal aspects of the grammatical system. They suggest the following schemata, where in each case F is a formal feature of a functional head, lexically encoded as such in line with the BCC:

(22) a. Grammaticalization: Is F grammaticalized?
     b. Checking: Is F, a grammaticalized feature, checked by X, X a category?
     c. Spread: Is F, a grammaticalized feature, spread on Y, Y a category?
     d. Strength: Is F, a grammaticalized feature checked by X, strong? (i.e., does it overtly attract X?)
     e. Size: Is F, a grammaticalized feature, checked by a head X (or something bigger)?

Gianollo, Guardiano, and Longobardi (2008:121–122) illustrate the workings of these schemata for the [definiteness] feature in relation to 47 parameters concerning the internal structure of DP (e.g., is there a null article? is there an enclitic article? are demonstratives in SpecDP? do demonstratives combine with articles? what is the position of adnominal adjectives? etc.) across 24 languages (this is an example of Modularized Parametric Comparison; see chapter 16). A very important aspect of Gianollo, Guardiano, and Longobardi’s position, taken up by Roberts and Holmberg (2010) and Biberauer and Roberts (2012, 2015a,b, in press), is the idea that parameters are not primitives in a minimalist system, but derive from other aspects of the system. Holmberg (2010b:8) makes a further important observation: it is possible to consider parameters as underspecifications in UG, entirely in line with minimalist considerations. He says:

A parameter is what we get when a principle of UG is underdetermined with respect to some property. It is a principle minus something, namely a specification of a feature value, or a movement, or a linear order, etc.



(In fact, Kayne argued for this view in the 1980s; see Uriagereka 1998:539.) Roberts and Holmberg (2010:53) combine these last two ideas and suggest that the existence of parameter variation, and in fact the parameters themselves, are emergent properties, resulting from the three factors of language design given in (20). They propose that, formally, parameters involve generalized quantification over formal features, as in (23):

(23) Qf, f ∈ C [P(f)]

Here Q is a quantifier, f is a formal feature, C is a class of grammatical categories providing the restriction on the quantifier, and P is a set of predicates defining formal operations of the system (‘Agrees,’ ‘has an EPP feature,’ ‘attracts a head,’ etc.). In these terms, one of the standard formulations of the null subject parameter, as involving ‘pronominal’ T/Infl, instantiates the general schema as follows:

(24) ∃f, f ∈ D [S(D, T_Fin)]

(24) reads ‘for some feature f of class D, D is a sublabel of finite T’ (where ‘sublabel’ is understood as in Chomsky 1995b:268). On this view, UG does not even provide the parameter schemata. As Roberts and Holmberg put it:

In essence, parameters reduce to the quantificational schema in [(23)], in which UG contributes the elements quantified over (formal features), the restriction (grammatical categories) and the nuclear scope (predicates defining grammatical operations such as Agree, etc). The quantification relation itself is not given by UG, since we take it that generalized quantification—the ability to compute relations among sets—is an aspect of general human computational abilities not restricted to language. So even the basic schema for parameters results from an interaction of UG elements and general computation. (Roberts and Holmberg 2010:60)
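Read procedurally, (23) is ordinary generalized quantification over a feature set, and can be made concrete in a few lines. The toy rendering below is ours, not Roberts and Holmberg’s, and its feature inventories are invented placeholders rather than claims about any actual grammar:

```python
# A toy rendering of schema (23), Qf, f ∈ C [P(f)]: quantify (Q) over the
# formal features f of some class C and ask whether predicate P holds.
def schema(Q, C, P):
    """Evaluate Q f ∈ C [P(f)] for quantifier Q (any/all), class C, predicate P."""
    return Q(P(f) for f in C)

# Instantiating (24): is some D-feature among the sublabels of finite T?
D_CLASS = {"D", "person", "number"}        # hypothetical features of class D
T_FIN_SUBLABELS = {"tense", "D", "EPP"}    # hypothetical sublabels of T(fin)

null_subject = schema(any, D_CLASS, lambda f: f in T_FIN_SUBLABELS)
print(null_subject)  # True: this toy finite T is 'pronominal' in the relevant sense
```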

The role of the second and third factors is developed and clarified in Roberts (2012) and, in particular, in Biberauer and Roberts (2012, 2015a,b, in press), summarizing and developing earlier work (see the references given). The third factor principles are seen as principles manifesting optimal use of cognitive resources, i.e., general computational conservativity. In particular, the following two acquisition strategies are proposed: (25)

(i) Feature Economy (FE) (see Roberts and Roussou 2003:201): Postulate as few formal features as possible.
(ii) Input Generalization (IG) (see Roberts 2007:275): Maximize available features.

Biberauer and Roberts (2014:7) say:

From an acquirer’s perspective, FE requires the postulation of the minimum number of formal features consistent with the input. IG embodies the logically invalid, but



heuristically useful learning mechanism of moving from an existential to a universal generalisation. Like FE, it is stated as a preference, since it is always defeasible by the PLD. More precisely, we do not see the PLD as an undifferentiated mass, but we take the acquirer to be sensitive to particular aspects of PLD such as movement, agreement, etc., readily encountered in simple declaratives, questions and imperatives. So we see that the interaction of the second (PLD) and third (FE, IG) factors is crucial.

The effect of parametric variation arises from this interaction of PLD and FE/IG with the underspecification of the formal features of functional heads in UG. In further work, Biberauer (2011) in fact suggests that the formal features themselves may represent emergent properties, with UG contributing merely the general notion of ‘(un)interpretable formal feature’ rather than an inventory of features to be selected from; see also Biberauer and Roberts (2015a). This clearly represents a further step towards general minimalist desiderata of overall simplicity, as well as arguably going beyond explanatory adequacy. This emergentist approach has two interesting consequences. One is that it leads to the postulation of a learning path along the following lines: acquirers will always by default postulate that no heads bear a given feature F; this maximally satisfies FE and IG. Once F is detected in the PLD, IG requires that that feature be generalized to all relevant heads (of course this violates FE, but PLD will defeat the third-factor strategies). As a third step, if a head which does not bear F is detected, the learner retreats from the maximal generalization and postulates that some heads bear F. This creates a distinction between the set of heads bearing F and its complement set, and the procedure is iterated for the subset (this procedure is very similar to Dresher’s 2009, 2013 Successive Division Algorithm, as well as to learning procedures observed in other domains, as Biberauer and Roberts 2014 show in detail). Related to the NO>ALL>SOME procedure is a finer-grained distinction among classes of parameters (originating in Biberauer and Roberts 2012), as follows:

(26) For a given value vi of a parametrically variant feature F:
     a. Macroparameters: all heads of the relevant type, e.g., all probes, all phase heads, etc., share vi;
     b. Mesoparameters: all heads of a given natural class, e.g., [+V] or a core functional category, share vi;
     c. Microparameters: a small, lexically definable subclass of functional heads (e.g., modal auxiliaries, subject clitics) shows vi;
     d. Nanoparameters: one or more individual lexical items is/are specified for vi.

Biberauer and Roberts (2015b) illustrate and support these distinctions in relation to parametric changes in the history of English. It is clear that the kinds of parameters defined in (26) fall into a hierarchy, an idea developed beginning with Roberts and Holmberg (2010) and continuing through Roberts (2012); Biberauer and Roberts (2012, 2014, 2015a,b, in press); and numerous references given there (notably, but not only, Biberauer, Holmberg, Roberts and



Sheehan 2014 and Sheehan 2014, to appear; see also the references at http://recos-dtal.mml.cam.ac.uk/papers). One advantage of parameter hierarchies is that they reduce the space of possible grammars created by parameters by making certain parameter values interdependent; see Biberauer, Holmberg, Roberts, and Sheehan (2014) for more discussion. We will return to some further implications of parameter hierarchies in section 14.9. So we see that the change in theoretical perspective brought about by the MP does not, in itself, invalidate the aims, methods, or the results achieved in the GB era, nor is it inconsistent with P&P theory, once parameters are seen as points of underspecification in UG, with other aspects of parametrization resulting from the interaction of UG so conceived with the second and third factors. In what follows, we give a case study of parametric variation both within varieties of Chinese (synchronically and diachronically), and between (mostly Mandarin) Chinese and English. This case study is intended to provide empirical support for the following claims and proposals:

A: Both macroparameters and microparameters are needed in linguistic theory.
B: Macroparameters are simply aggregates of microparameters acting in concert on the basis of a conservative learning strategy (see the discussion of work by Biberauer and Roberts in the preceding paragraphs).
C: The (micro)parameters are themselves hierarchically organized (again see the discussion in the foregoing); we will also tentatively identify a candidate mesoparametric cluster, supporting the idea that there is hierarchy ‘all the way down.’

In the next three sections, we will develop and support each of Points A–C in turn.

14.6 Evidence for Macroparameters and Microparameters

14.6.1 Synchronic Variation: Macroparametric Contrasts between Modern Chinese and English

Modern Chinese shows a number of properties that Huang (2015) characterizes as indicating a general property of ‘high analyticity’:

(i) Chinese has light-verb constructions where English has (typically denominal) unergative intransitives:

(27) a. Chinese: da yu ‘do fish’, da dianhua ‘do phone’, da penti ‘do sneeze’ …
     b. English: to fish, to phone, to sneeze …



(ii) Chinese has ‘pseudo-incorporation’ (Massam 2001a), otherwise known as phrasal compound verbs, where English has simple transitives or intransitives:

(28) a. Chinese: zhuo yu ‘catch fish,’ chi fan ‘eat rice,’ bo pi ‘peel skin’ …
     b. English: to fish, to feed, to skin …

(iii) Chinese typically has compound and phrasal accomplishment verbs, where English has simple verbs:

(29) a. Chinese: da-po ‘hit-broken,’ nong-po ‘make broken,’ ti-po ‘kick-broken,’ etc.
     b. English: break, etc.

(iv) Chinese requires overt classifiers for count nouns:

(30) a. Chinese: san ben shu ‘three CL book’ (‘three books’)
     b. English: three books

(v) Chinese needs overt localizers to express locations:

(31)

a. Chinese: zou dao zhuozi pangbian ‘walk to table’s side’
b. English: walked to the table

(vi)  Chinese has the canonical ‘Kaynean word order’: Subject–​Adjunct–​Verb–​Complement: (32)

Zhangsan zuijin changchang bu neng hui jia chi fan.
Zhangsan recently often not can return home eat rice
‘Recently Zhangsan often cannot come home for dinner.’

(vii) Chinese has wh-in-situ (instead of overt wh-movement), cf. (11a,b), repeated here:

(11) a. What did John eat what?
     b. Hufei chi-le sheme (ne)
        Hufei eat-asp what Qwh
        ‘What did Hufei eat?’

(viii) Chinese has no forms equivalent to nobody or each other:

(33) Negative quantifiers:
     a. John did not see anybody.
     b. John saw nobody.
     c. Zhangsan mei you kanjian renhe ren.
        Zhangsan not have see any person
     d. *Zhangsan kanjian-le meiyou ren.
         Zhangsan see-PERF no person



(34) Reciprocals:
     a. They each criticized the other(s).
     b. They criticized each other.
     c. Tamen ge piping-le duifang.
        they each criticize-PERF other
     d. *Tamen piping-le bici.
         they criticize-PERF each-other

(ix)  Chinese is restricted to ‘analytic’ adverbial and adjectival modification, unlike English. Regarding adverbial modification, examples such as (35a,b) are essentially synonymous in English: (35)

a. Jennifer types fast, Dorothy drives fast, etc.
b. Jennifer is a fast typist, Dorothy is a fast driver, etc.

On the other hand, in Chinese adverbs equivalent to English fast can only modify the verb, not the derived noun (see Lin and Liu 2005):

(36) a. Zhangsan shi yi-ge (da zi) da-de hen kuai de daziyuan.
        Zhangsan be one-CL (type) type very fast DE typist
        ‘Zhangsan is a typist who types very fast.’
     b. *Zhangsan shi yi-ge hen kuai de daziyuan.
         Zhangsan be one-CL very fast DE typist

Regarding adjectival modification, in English (37) is ambiguous (see Cinque 2010 for extensive discussion):

(37) Jennifer is a beautiful singer.

This example is ambiguous between the reading ‘Jennifer is beautiful and a singer,’ and ‘Jennifer sings beautifully.’ In Chinese, on the other hand, these two readings must be expressed by quite different structures, in the one case with hen piaolang (‘very beautiful’) modifying ‘singer,’ in the other case with it modifying ‘sing’:

(38) a. Amei shi yi-ge hen piaolang de geshou.
        Amei be one-CL very beautiful DE singer
        ‘Amei is a singer who is beautiful.’
     b. Amei shi yi-ge chang-de hen piaolang de geshou.
        Amei be one-CL sing very beautifully DE singer
        ‘Amei is a singer who sings beautifully.’



(x) Chinese has no equivalents of English articles (although it has the equivalents of numeral one and demonstrative this, that).

(xi) Chinese lacks ‘coercion’ in the sense of Pustejovsky (1995). In English, a sentence like (39a) can be understood, depending on the context and what we know about John, as any of (39b–d):

(39) a. John began a book.
     b. John began reading a book.
     c. John began writing a book.
     d. John began editing a book.

On the other hand, in Chinese the equivalent of (39a) is ungrammatical; the implicit subordinate verb must be overtly expressed (see Lin and Liu 2005):

(40) a. *Zhangsan kaishi yi-ben shu.
         Zhangsan begin one-CL book
     b. Zhangsan kaishi kan yi-ben shu.
        Zhangsan begin read one-CL book
        ‘Zhangsan began to read a book.’
     c. Zhangsan kaishi xie yi-ben shu.
        Zhangsan begin write one-CL book
        ‘Zhangsan began to write a book.’
     d. Zhangsan kaishi bian yi-ben shu.
        Zhangsan begin edit one-CL book
        ‘Zhangsan began to edit a book.’

(xii) Chinese lacks (canonical) gapping: (41)

a. John eats rice, and Bill spaghetti.
b. *Zhangsan chi fan, Lisi mian.
    Zhangsan eat rice, Lisi noodles

(xiii)  Chinese has no ‘ga–​no conversion,’ i.e., nominative–​genitive alternation, as often found in languages with prenominal relatives. Thus in Japanese object relatives, the subject of the relative clause may be case-​marked with nominative -​ga or genitive -​no, indicating the influence of the nominal phrase that dominates it. (42)

John-ga/no katta sakana
John-Nom/Gen bought fish
‘the fish that John bought’



This phenomenon is commonly attributed to the ‘strong’ nominal nature of the relative-clause CP and TP (see Ochi 2001 and references there, among many others). In Chinese the subject cannot bear genitive case:

(43) Zhangsan (*de) mai de yu
     Zhangsan (*’s) bought REL fish
     ‘the fish that Zhangsan (*’s) bought’

(xiv) Chinese lacks gerundive nominalization with a genitive subject:

(44) a. John’s buying that car was a stupid decision.
     b. Zhangsan (*de) mai na-bu che shi ge yuchunde jueding.
        Zhangsan (*’s) buy that-CL car be CL stupid decision

(xv) Chinese shows a series of syntax–semantics mismatches (see Huang 1997 et seq.). One famous case is when a pseudo-noun-incorporation construction is separated by a low adverbial after the verb is raised:

(45) ta chi-le yi-ge zhongtou (de) fan, hai mei chi-bao.
     he eat-PERF one-CL hour (’s) rice, still not finish
     ‘He ate for a whole hour, and is still not done.’
     (Literally: He ate a whole hour’s rice, and is still not done with eating.)

(xvi) Chinese has analytic passivization, with the so-called ‘bei passive’ being somewhat akin to the English get-passive. Instead of employing passive morphology that intransitivizes an active transitive verb, Chinese forms a passive by superimposing a semi-lexical verb bei (whose meaning approximates ‘undergo’) on the main predicate without passivizing the latter:

(46) Zhangsan bei [Lisi qipian-le liang ci]
     Zhangsan bei Lisi deceived two time
     ‘Zhangsan got twice deceived by Lisi.’

The important thing to observe here is that these sixteen properties cluster together in Chinese and are jointly absent in English. (Other properties could be added to this list, including those related to argument structure, as argued in Huang 2006 for Mandarin resultatives, and in Lin 2001 et seq. on noncanonical subjects and objects; see also Barrie and Li 2015 for related discussion.) Some of these properties have previously been attributed to macroparameters (e.g., the Wh-Movement Parameter and Nominal Mapping Parameters mentioned in section 14.3), but the degree of clustering shown here had not been observed prior to Huang (2005, 2015) and indicates a macroparameter of high analyticity; following Huang (2005, 2015) this macroparameter can be opposed to Baker’s Polysynthesis Parameter (in fact, in terms of the Biberauer and Roberts-style NO>ALL>SOME learning path/parameter hierarchy, they can be seen as representing the two extreme NO vs. ALL options for some UG-underspecified



property; we develop this idea below in section 14.11). So this is a clear case of macroparametric clustering.

14.6.2 Macroparametric Properties of Old Chinese (vs. Modern Chinese)

Rather like the contrasts between Modern Chinese and English, we can observe a number of syntactic properties which distinguish Old (or Archaic) Chinese (OC: 500 BC to AD 200) from Modern Chinese (MnC). These are as follows:

(i) OC lacks light verbs, but instead has denominalized unergative intransitives: yu ‘to fish’ (instead of da yu);

(ii) OC lacks pseudo-incorporation: fan ‘have rice’ (instead of chi fan ‘eat rice’);

(iii) OC has simplex accomplishments: po ‘break’ (instead of da-po ‘make break’);

(iv) OC does not have overt classifiers for count nouns: san ren ‘three persons,’ er yang ‘two sheep’ (see Peyraube 1996 among others);

(v) OC does not have overt localizers, as illustrated in the famous line from the Confucian Analects (see Peyraube 2003 and Huang 2009 for other examples):

(47) 八侑舞於庭, 是可忍也, 孰不可忍也? (論語: 八侑)
     bayou wu yu ting, shi ke ren ye, shu bu ke ren ye? (Analects: Bayou)
     8x8 dance at hall this can tolerate Prt, what not can tolerate Prt
     ‘To hold the 8x8 court dance in his own court, if this can be tolerated, what else cannot be tolerated?’

Note yu ting ‘in the court,’ instead of yu ting-zhong ‘at court’s inside.’

(vi) OC has passive-like sentences that are arguably derived by NP-movement:

(48) 勞心者治人, 勞力者治于人。(孟子: 滕文公)
     laoxinzhe zhi ren, laolizhe zhi yu ren
     mental-workers govern people, physical-workers govern by others
     ‘Mental workers govern people; physical workers are governed by people.’

(vii) OC has overt wh-movement (although to an apparently clause-medial rather than a left-peripheral position):

(49) 吾谁欺? 欺天乎? (論語: 子罕)
     wu shei qi, qi tian hu? (Analects: Zihan)
     I whom deceive, deceive heaven Prt
     ‘Who do I deceive? Do I deceive the Heavens?’



(viii) OC relatives involve operator movement of a relative pronoun, the particle suo:

(50) 魚, 我所欲也; 熊掌, 亦我所欲也。(孟子: 告子)
     yu, wo suo yu ye; xiongzhang, yi wo suo yu ye. (Mencius: Gaozi)
     fish, I which want Prt; bear-paw, also I which want Prt
     ‘Fish is what I want; bear paws are also what I would like to have.’

(ix) OC had focus movement: an object focused by wei ‘only’ is preposed to a Spec,FocusP position:

(51) 唯命是从 (左傳: 召公12年)
     wei ming shi cong (Zuozhuan: Zhaogong 12)
     only order this follow
     ‘only the order have I followed’

(x) OC allowed postverbal adjuncts:

(52) 易之以羊 (孟子: 梁惠王上)
     yi zhi yi yang (Mencius: Lianghuiwang I)
     replace it with sheep
     ‘replace it with a sheep’

(xi) OC shows canonical gapping, as shown by Wu (2002):

(53) 為客治飯而自Ø藜藿。《淮南子·說林》
     wei ke zhi fan er zi Ø lihuo (Huainanzi: Shuolin)
     for guest cook rice and self grass
     ‘For guests cook rice, but for oneself [cook] grass.’

(xii) OC exhibits nominative–genitive alternation in prenominal relatives. In (54) we have two (free) relative clauses whose subjects are Genitive-marked by zhi, indicating that the relative CPs are highly nominal:

(54) 是聰耳之所不能聽也, 明目之所不能見也, … . (荀子: 儒效)
     shi cong-er zhi suo bu neng ting ye, ming-mu zhi suo bu neng jian ye. (Xunzi: Ruxiao)
     this sharp-ear Gen what not can hear Prt, bright-eye Gen what not can see Prt
     ‘This is what a sharp ear cannot hear, and what a bright eye cannot see, …’



(xiii) OC allows extensive use of gerundive constructions with genitive subjects, again revealing the nominal nature of the embedded CP:

(55) 寡人之有五子, 猶心之有四支。(晏子內篇諫上)
     guaren zhi you wu zi, you xin zhi you si zhi.
     Self Gen have five son, like heart Gen have four support
     ‘My having five sons is like the heart’s having four supports.’

We observe the same clustering of properties in OC that distinguishes this system en bloc from MnC. In fact, OC seems to pattern consistently like English regarding these properties, and against MnC. Again, this clustering is macroparametric. We conclude, on the basis of the evidence presented in this and the preceding section, that macroparametric variation exists. Therefore our theory of variation must capture these kinds of clusterings of properties.

14.7 Evidence for Microparameters: Synchronic Microvariation among Chinese Dialects

There is considerable microvariation among the various ‘dialects’ of Chinese. Here we list a few striking examples of syntactic microvariation which have been discussed in the recent literature, mainly regarding differences among Mandarin, Cantonese, and Taiwanese Southern Min (TSM). A first set of differences involves classifier stranding (see Cheng and Sybesma 2005). This operation allows for deletion of the numeral associated with a classifier under the relevant conditions. It is schematically illustrated in (56):

(56) yi ben shu ‘one classifier book’ → ben shu ‘classifier book’

The dialects of Chinese vary as to the syntactic positions which allow for this kind of deletion. In Mandarin, it is allowed in object position but not subject position:

(57) Mandarin: ok Object, *Subject
     a. wo yao mai ge roubaozi lai chi.
        I want buy CL meat-bun to eat
        ‘I want to buy a meat bun to eat.’



b. *ge roubaozi tai xian le.
    CL meat-bun too salty Prt
    ‘A/the meat bun is too salty.’

In Cantonese, it is allowed in both subject and object position:

(58) Cantonese: ok Object, ok Subject
     a. ngo yiu maai go zyuyuk-baau lai sik.
        I want buy CL meat-bun to eat
     b. go zyuyuk-baau taai ham la.
        CL meat-bun too salty SFP

In TSM, it is not allowed in either position: (59)

TSM: *Object, *Subject
     a. *gua be boe liap bapao-a lai tsia.
         I want buy CL meat-bun to eat
     b. *liap bapao-a ukau kiam
         CL meat-bun very salty

This looks rather similar to the distribution of bare nominals in European languages: Italian allows them in object but not subject position (Longobardi 1994): *Latte è buono/Qui si beve latte (‘Milk is good/Here one drinks milk’); Germanic allows them in both positions: Milk is good/I drink milk; French doesn’t allow them in either position: *Lait est bon/*Je bois lait (equivalent to the English examples just given). There may thus be a parallel between the incidence of bare nominals in European languages and the incidence of classifier stranding in Chinese varieties. Clearly this observation merits further explanation. Second, dialects differ in the extent to which they make use of postverbal suffixes. Mandarin has some aspectual suffixes (e.g., the progressive zhe, the perfective le, and the experiential guo). Cantonese has a considerably more elaborate system, employing additional postverbal suffixes like saai, dak, and ngaang for expressions of exhaustivity, exclusivity, and obligation (see Tang 2006:14–15):

(60) a. keoi tai-saai bun syu.
        he read-up CL book
        ‘He finished reading the entire book.’
     b. keoi tai-dak jat-bun syu.
        he read-only one-CL book
        ‘He only read one book.’

c. keoi tai-ngaang nei-bun syu.
   he read-should this-CL book
   ‘He should read this book.’



Some of the suffixes may stack, indicating the considerable height of the verb, for example with the exhaustive on the experiential:

(61) keoi tai-go-saai nei-di syu.
     he read-Exp-Exhaust these book
     ‘He has read up all these books.’

On the other hand, TSM is much more restricted. While the experiential kuei may arguably be a suffix in TSM as it is in Mandarin, the cognates of Mandarin progressive zhe and perfective le are not. Instead, the progressive and the perfective are rendered with preverbal auxiliaries, an analytic strategy:

(62) a. gua ti khoann tiansi.
        I Prog watch TV
        ‘I am watching the TV.’

b. li u chia-pa bou?
   you have eat-full not
   ‘Have you finished eating?’

wo I

??gua I

kan-​le read-​Perf

shu le. book SFP

khoann-​kuei tshe a. read-​Exp book SFP

??ngo (bun) syu tai-​zo. I CL book read-​Perf ‘??I the book have read.’ wo I

shu book

kan-​le. read-​Perf-​SFP

gua tshe khoann-​kuei a. I book read-​Exp SFP

Fourth, there is variation regarding the position of the motion verb qu ‘go’ (see Lamarre 2008). Corresponding to the English sentence ‘Zhangsan went to Beijing,’ Mandarin allows both the ‘analytic’ strategy (64a) and the ‘synthetic’ strategy (64b): (64) a.

b.

Zhangsan dao Beijing qu le. Zhangsan to Beijing go Perf ‘Zhangsan to Beijing went.’ Zhangsan qu-​le Beijing. Zhangsan go-​Perf Beijing ‘Zhangsan went Beijing.’



338    C.-T. James Huang and Ian Roberts Cantonese allows only the synthetic strategy, whereas Pre-​Modern Chinese (as illustrated in textbooks used during Ming–​Qing dynasties) allows only the analytic strategy. Assuming that (64b) is derived by V-​movement to a null light verb position otherwise occupied by dao in (64a), this pattern shows that V–​v movement is obligatory in Cantonese, optional in Mandarin, but did not take place in Pre-​Modern Chinese. We conclude then that there is clear empirical evidence from varieties of Chinese that, alongside macroparameters of the kind illustrated in the previous section, microparameters also exist, with varying (but lesser) degrees of clustering. We will see more examples of microparameters in section 14.10.5.

14.8  Macroparameters as Aggregates of Microparameters The idea that macroparameters are not primitive aspects of UG, but rather derive from more primitive elements, was first suggested in Kayne (2005a:10). It is also mentioned by Baker (2008b:354n2). However, it has been developed in various ways in recent work, starting from Roberts and Holmberg (2010) and Roberts (2012), by Biberauer and Roberts (2012, 2014, 2015a,b, in press), Biberauer, Holmberg, Roberts, and Sheehan (2014), Sheehan (2014, to appear); see again the references at http://​recos-​dtal.mml.cam. ac.uk/​papers. On this view, macroparameters are seen as aggregates of microparameters with correlating values: a macroparametric effect arises when a group of microparameters act together (clearly, meso-​parameters, as in (26), can be defined in a parallel fashion). Hence macroparameters are in a sense epiphenomenal; each microparameter that makes up a macroparameter falls under the BCC, limiting variation to formal features of functional heads. The microparameters act in concert for reasons of markedness, related to the general conservatism of the learner, and therefore arguably to the third factor (see ­chapter 6). The two principal markedness constraints are Feature Economy and Input Geeralization, as given in (25), repeated here: (25)

(i) Feature Economy (FE) (see Roberts and Roussou 2003:201): Postulate as few formal features as possible. (ii) Input Generalization (IG) (see Roberts 2007:275): Maximize available features.

Together these constitute a minimax search and optimization strategy: assume as little as possible and use it as much as possible. As Biberauer and Roberts (2014) show, there are analogs to this strategy in phonology (Dresher 2009, 2013) and in other cognitive domains (see in particular Jaspers 2012). Note also that IG generalizes the known to the unknown, and so can be seen as a form of bootstrapping. The interaction of FE and IG



Principles and Parameters of Universal Grammar    339 gives rise to the NO>ALL>SOME learning path described in section 14.5. We can now present that idea in a more precise fashion as follows (see also Biberauer, Holmberg, Roberts, and Sheehan 2014:111): (65)

(i) default assumption: ¬∃h [ F(h)] (ii) if F(h) is detected, generalize F to all relevant cases (∃h [ F(h)]→ ∀h [ F(h)]); (iii) if ∃h ¬[ F(h)] is detected, restrict h and go back to (i); (iv) if no further F(h) is detected, stop.

Here h designates functional heads, and F is the predicate ‘feature-​of,’ so F(h) means ‘formal feature of a head H.’ As we have said, the procedure in (65) says that acquirers first postulate NO heads bearing feature F. This maximally satisfies FE and IG. Then, once F is detected in the PLD, that feature is generalized to ALL relevant heads, satisfying IG but not FE. This step, in other words the operation of the third-​factor strategy IG, gives rise to clustering effects, i.e., aggregates of microparameters acting in concert as macroparameters. The existence of macroparameters and clustering, and therefore many large-​scale typological generalizations such as the tendency towards harmonic word order, or high analyticity as in MnC, follows from the interaction of the three factors in language design in a way which is entirely compatible with both the letter and the spirit of minimalism. This establishes Point B in section 14.5.

14.9  The Hierarchical Organization of Parameters The idea of a hierarchy of parameters was first put forward in Baker (2001:170). Baker suggested a single hierarchy, and, while his specific proposal had some empirical problems, the proposal had two principal merits, both of which are intrinsic to the concept of a hierarchy. First, it forces us to think about the relations among parameter settings, both conceptually in terms of how they interact in relation to the architecture of the grammar (do we want to connect parameters of stress to parameters of word order, for example? See ­chapter 12 for relevant discussion in relation to first language acquisition), how they interact logically (it is impossible to have inflected infinitives in a system which lacks infinitives, for example), and empirically on the basis of typological observations (e.g., to account for the lack of SVO ergative languages, as observed by Mahajan 1994, among others). Second, parameter hierarchies can restrict the space of possible grammars, and hence reduce the predicted amount of typological variation and simplify the task for a search-​based learner (see ­chapter 11). Given a hierarchical approach, the cardinality of G, the set of grammars, is equivalent to the cardinality of P, the set of parameters, plus 1, to the power of the number of hierarchies. So, if, for example, there are just 5 hierarchies with 20 parameters each. Then |G| is 215, or 4,084,101 for 5 × 20 = 100 possible choice



340    C.-T. James Huang and Ian Roberts points. Compared to 2100, this is a very small number, entailing the concomitant simplification of the task of a search-​based learner (see again ­chapter 11, section 11.6). Roberts and Roussou (2003:210–​213) suggested organizing the following set of options relating to a given formal feature F on the basis of their proposal that grammaticalization is a diachronic operation affecting functional categories: (66)  

F? (formal feature?) No

yes

STOP

Does F Agree? No

yes

STOP

Does F have an EPP feature? No

Yes

(head-initial) Does F trigger head-movement?

(head-final) Is F realized by external Merge?

No

Yes

No

Yes

STOP

Does every F

STOP

Agglutinating

High analyticity trigger movement? No

Yes

Synthesis

Polysynthesis

Notice how this hierarchy derives the four traditionally recognized morphological types (Sapir 1921). It also connects analyticity and head-​initiality on the one hand, and agglutination and head-​finality on the other (see also Julien 2002 on the latter). Gianollo, Guardiano, and Longobardi (2008, see ­chapter 16) developed the Roberts and Roussou approach¸ and, as we have seen, introduced the very important idea that the parameters are not primitives of UG, but created by the hierarchies (‘schemata’ in their terminology). Roberts and Holmberg (2010) proposed two distinct hierarchies for word order and null argument phenomena, and Roberts (2012) and Biberauer, Holmberg, Roberts, and Sheehan (2014) proposed three more, dealing with word structure



Principles and Parameters of Universal Grammar    341 (polysynthesis/​analyticity and microparametric options in between giving various kinds of fusional systems), A′-​movement (wh-​movement, scrambling, topicalization, and focalization), and alignment. In connection with the last of these, Sheehan (2014, to appear) has developed several hierarchies relating to ergativity, causatives, and ditransitives, and Sheehan and Roberts (2015) have developed a hierarchy for passives. Each of this last class of hierarchies has the general form in (67): (67) Does H a head have F (i.e. is F in the system at all?) NO

YES: is F generalized to all H? YES

NO: is F limited to a subset of transitive H? YES

NO: is F extended to a further subset? YES

NO: does H have an EPP feature? NO



YES: does H have an ‘extra’ Case/Φ feature?

Sheehan (2014, to appear) shows that a hierarchy of this kind applies to F an inherent Case feature of v (for ergativity), F a feature of Appl (causatives/​ditransitives) and F a feature of Voice (passives; see Sheehan and Roberts 2015). Other hierarchies have been proposed for Person, Tense, and Negation (on the latter, see Biberauer 2011). These hierarchies are empirically successful in capturing wide typological variation of both the macro-​and microparametric kind (for example, Sheehan and Roberts’ passive hierarchy covers Yoruba, Thai, Yidiɲ, Turkish, Dutch, German, Latin, Danish, Norwegian, Hebrew, Spanish, French, English, Swedish, Jamaican Creole, and Sami). As already mentioned, this hierarchical organization of the elements of parametrization reduces the potential number of options that a child has, thereby easing the learning procedure. Hence, Plato’s problem is solved. It is important to emphasize that the macroparameters, and the parameter hierarchies, are not primitives: they are created by the interaction of FE and IG. UG’s role is reducible to a bare minimum: it simply leaves certain options open. In this way, we approach the minimalist desideratum of moving beyond explanatory adequacy (see ­chapters 5 and 6). Note also that if Biberauer’s (2011, 2015) proposal that the formal features themselves are emergent properties resulting from the interaction of the three factors is adopted, then a still further step is taken in this direction. We now illustrate these ideas concretely, taking the variation discussed in section 14.6 in Modern Chinese, Old Chinese, and Modern Chinese dialects as case studies.



342    C.-T. James Huang and Ian Roberts

14.10  Back to Chinese 14.10.1 Summary and a First Attempt to Characterize Chinese in Parametric Terms To summarize so far, we have made the following proposals. First, there are macroparameters, which capture important, sometimes sweeping, clusters of typological properties. Second, there are also microparameters with fewer or no clustering effects. The macroparameters give rise to general patterns, while the microparameters give rise to the mixed, exceptional cases, plus details of variation or change. Third, macroparameters are aggregates of microparameters acting in concert driven by FE and IG. Fourth, parameters conceived as cases of underspecification do not add to the burdens of UG and are consistent with minimalist theorizing. Fifth, parameters are hierarchically organized so the number of occurring options to choose from is greatly reduced, as is the burden on the learner. Sixth, a small number of parameter hierarchies (which appear to be fairly isomorphic in general, if Sheehan is right) is enough to account for a large amount of cross-​linguistic variation (languages, dialects, idiolects). Now let us consider the macroproperties of Chinese. Viewed synchronically, Modern Chinese exhibits high analyticity in a macroparametric way, in systematic contrast to many other languages (including English, which compared to many other Indo-​ European languages is often described as somewhat analytic in a pre-​theoretical sense). Moreover, Chinese is analytic at all levels: lexical, functional, and at the level of argument structure. Viewed diachronically: Old Chinese underwent macroparametric change from a substantially synthetic language (typologically closer to English, as we observed in section 14.6.2) to a highly analytic language, at all levels, with analyticity peaking at the end of the Six-​Dynasties period and the Tang–​Song period, followed by small-​scale new changes that result in the major dialects of Modern Chinese, with some varying degrees of small-​scale synthesis. We thus observe a partial diachronic cycle: synthetic to analytic to synthetic. (For the shift from synthetic to analytic, see also Mei 2003; Xu 2006; and Peyraube 2014). The question now is how to characterize the macroparametric properties. One possibility would be a simple ‘analytic–​synthetic’ parameter, with the features [±analytic], [±synthetic], so that Chinese is [+analytic, −synthetic], Old Chinese (and say English) are [+analytic, +synthetic], and some other languages (say Romance) are [−analytic, +synthetic]. This is not a good approach, we believe, for two main reasons. First, such a view is purely descriptive and does not reveal the real nature of linguistic variation. For one thing, there are exceptions that must be accounted for, and such exceptions must resort to microparametric descriptions. A binary-​value parameter cannot reveal the nature of the gradation that characterizes cross-​linguistic variation and diachronic changes. This is the basic problem with many macroparameters that formed the basis of Newmeyer’s



Principles and Parameters of Universal Grammar    343 (2005) critique. Second, such a view makes use of concepts unavailable in the theoretical vocabulary of a minimalist grammar: what are the features [±analytic], [±synthetic]? While it may have been possible to countenance such features in GB, it is against the spirit, and arguably the letter, of minimalist theorizing.

14.10.2 A Second Approach: Lexical and Phrasal Domains Suppose, following Hale and Keyser (1993), Chomsky (1995b), and much subsequent work (see in particular Borer 2005; Ramchand 2008)  that transitive and unergative predicates involve a form of ‘VP shell.’ In English, unergatives like telephone, transitives like peel and verbs which can freely alternate between the two like fish are associated with a basic structure like that in (68): (68)

vP v’

DPEA



v

NP

DO DO DO

telephone fish peel

Here DO is an abstract predicate assigning an Agent θ-role to the external argument (EA) in its Specifier (Dowty 1979, 1991; Borer 2005; Folli and Harley 2007; Ramchand 2008). The head of the complement of v incorporates into v, giving rise to a derived verb. Head movement is the operation which gives rise to synthetic structures (and, of course, maximally generalized head movement gives rise to polysynthesis, according to Baker 1996). Hence we understand the pretheoretical terms 'synthetic' and 'analytic' to mean, respectively, having/lacking head movement. This applies across various domains, as we will see. With verbs showing the anti-causative alternation in languages like English, we posit a CAUSE head above VP, as in (69) (see again Folli and Harley 2007):

(69) [vP DPEA [v' CAUSE [VP break … ]]]



And for the transitive version of denominal verbs, we have a further CAUSE head above vP as in (70), for the transitive feed, i.e., 'EA causes IA to do food/eat':

(70) [vP DPEA [v' CAUSE [VP DPIA [V' DO [NP food ]]]]]

English v may be in the form of a phonetically null light verb DO or CAUSE, which are assumed to have the following properties: they both have formal features which need to Agree, do not contain EPP, and do trigger head movement (these properties may all be connected in terms of the general approach to head movement developed in Roberts 2010d). Head movement equates to synthesis, and English abounds in simplex denominal verbs like telephone, fish, and peel, and simplex causatives like break or feed. In Modern Chinese, on the other hand, v is occupied by an overt light verb such as da for an unergative, or by a 'cognate verb' for pseudo-incorporation:

(71) [vP DPEA [v' v [NP N ]]]
     v = da, da, bo 'peel,' nian 'read'
     N = dianhua 'telephone,' yu 'fish,' pi 'skin,' shu 'book'



For causatives, either an inchoative verb combines with a light/cognate verb to form a compound (rather than moving into a null v to form a simplex causative):

(72) [vP DPEA [v' da/nong 'do/make' [VP po 'break' ]]]

Or we have a periphrastic causative, with heavy verbs like shi 'cause,' rang 'let,' and so forth:

(73) [vP DPEA [v' rang 'let' [VP DPIA [V' chi 'eat' [NP fan 'rice' ]]]]]
     'let someone eat rice'

Unlike English, Chinese does not have the phonetically null CAUSE and DO. Instead, it resorts to lexical (light or heavy) verbs which do not trigger head movement (though they may trigger compounding), leading to high analyticity. Instead of simplex denominalized action verbs or simplex causatives, Chinese resorts to more complex expressions, and abounds in light verb constructions, pseudo-incorporation, resultative compounds or phrases, and periphrastic causatives. The high analyticity of Chinese derives from the absence of incorporation into the abstract DO and CAUSE. These labels are really shorthand for certain event- and θ-role-related features of v, whose exact nature need not detain us here; these features are lexically instantiated in Chinese by verbs such as da and rang which, as lexical roots in this language, repel head movement.

Let us now look at how IG can give rise to macroparametric clustering. By IG, if v can attract a head, then, all other things being equal, n, a, and p also have that property (this represents the unmarked option, as it conforms to IG). Chinese has lexical classifiers, nominal localizers, an adjectival degree marker, and (discontinuous) prepositions, while English generally has such categories in null or affixal form. So high analyticity generalizes across all the principal lexical categories in Chinese.

Looking at the specific cases, Chinese count nouns are formed by an overt 'light noun' (i.e., a classifier):

(74) [CL ben [NP shu ]] = count noun

By IG, the light noun does not trigger head movement, so ben shu is the Chinese 'count noun,' i.e., an analytic 'count noun phrase.' On the other hand, English count nouns are formed by incorporating the noun root into an empty CL-head (see Borer 2005):

(75) [CL CL [NP book ]] → [CL book+CL [NP t ]] = count noun

By IG, CL has a formal feature that Agrees, has no EPP, and triggers head movement, so the count noun is synthetic. As we saw in section 14.6, Chinese forms locational NPs with overt localizers (see also Biggs 2014):

(76) [zhuozi [nali]]
     table   place
     'the table's location'

The word nali means 'place.' Here too there is no head movement, and so the locative expression is analytic in the sense we have defined. English forms such NPs by incorporating silent PLACE (see Kayne 2005b):

(77) [table [PLACE]]

Hence table here is synthetic. Chinese adjectives have lexical hen ('very'), which marks absolute degree: hen hao ('very good'). Kennedy (2005, 2007) proposes treating a gradable adjective as being headed by a Deg0 in the form of covert pos, e.g., [DegP pos [AP happy]], which we may think of as HEN, the covert counterpart of hen. English adjectives incorporate into null HEN and are synthetic, but Chinese adjectives do not incorporate and remain analytic.



The Deg0 head hen or HEN turns a state adjective into a degree word, which is then able to combine with comparatives and superlatives, much as a classifier turns a mass or kind into a count noun so that it can be combined with a number word. (See Dong 2005 and Liu 2010 for relevant discussion.) Chinese complex PPs take a 'discontinuous' form:

(78) zai zhuozi pangbian
     at  table  side
     'by the table side'

Again, this is an analytic construction. English complex PPs are formed by incorporation, as can be fairly transparently seen in some cases, e.g., beside:

(79) be- the table -side → beside the table

Here side incorporates to be (presumably a morphologically conditioned variant of by), an example of a synthetic preposition (see Svenonius 2010 and the other papers in Cinque and Rizzi 2010 for cartographic analyses of the extended PP). Similarly, one could analyze English in the box, along with its Chinese counterpart, as AT the box's in(side), thus taking all locative prepositions to be underlyingly headed by the light preposition AT.

14.10.3 The Clausal, Inflectional Domain

Mandarin Chinese has aspectual suffixes that are functional heads (they are grammaticalized verbs) and instantiate formal features of those heads. As such, they enter into Agree with appropriate verb stems. However, they do not trigger (overt) head movement:

(80) Zhangsan [ASP PERF] [VP zuotian qu-le Kaohsiung]
     Zhangsan            yesterday go-le Kaohsiung
     'Zhangsan went to Kaohsiung yesterday.'   (PERF Agrees with le)

English T and Asp heads are similar to Mandarin in this respect. These clausal heads are functional: they enter into Agree with the inflected verb, but they do not attract it:

(81) John [T TNS] [VP often kisses Mary in the kitchen]   (TNS Agrees with kisses)

In Romance languages, as has been well known since Pollock (1989), T and Asp attract lexical verbs (see Schifano 2015 for an extensive analysis of verb movement across a range of Romance languages, which effectively supports this conclusion, with some important provisos). Thus, while English is synthetic in the v-domain, it is not synthetic in the T-domain: only some, but not all, Fs trigger head movement (in this respect English may be more marked than either Romance or Mandarin; Biberauer, Holmberg, Roberts, and Sheehan 2014:126 arrive at the same conclusion comparing English to other languages). Chinese is more consistently analytic than English is synthetic; hence it is less marked in this regard than English.
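The three head properties invoked throughout this discussion (Agree, EPP, head movement) do the real work in this system, so it may help to see them laid out explicitly. The following is a minimal illustrative sketch, not part of the chapter's formalism, and the particular feature values are simplified from the discussion above; it encodes each head as a bundle of the three properties and reads off which domains are analytic (= lacking head movement).

# A schematic encoding (illustrative only, not the authors' formalism) of
# the head properties discussed above: whether a head Agrees, whether it
# has an EPP feature, and whether it triggers head movement. 'Analytic'
# in a given domain is read off as the absence of head movement.
from dataclasses import dataclass

@dataclass
class Head:
    label: str           # e.g., 'v', 'T'
    agrees: bool         # enters into Agree
    epp: bool            # has an EPP feature
    head_movement: bool  # attracts a head (= synthesis in that domain)

# English: null DO/CAUSE in v triggers head movement; T Agrees but does not.
english  = [Head('v', True, False, True),  Head('T', True, False, False)]
# Mandarin: overt light verbs and aspectual suffixes Agree but repel movement.
mandarin = [Head('v', True, False, False), Head('T/Asp', True, False, False)]
# Romance (simplified): both v and T attract the verb (cf. Pollock 1989).
romance  = [Head('v', True, False, True),  Head('T', True, False, True)]

def analytic_domains(heads):
    """Domains whose head lacks head movement count as analytic."""
    return [h.label for h in heads if not h.head_movement]

for name, heads in [('English', english), ('Mandarin', mandarin),
                    ('Romance', romance)]:
    print(name, 'analytic in:', analytic_domains(heads))
# Mandarin comes out consistently analytic and Romance consistently
# synthetic, while English is mixed -- the markedness asymmetry noted above.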

14.10.4 Old Chinese

Let us turn now to Old Chinese, looking first at the lexical domain. In this domain, Old Chinese is similar to English (as we observed in section 14.6.2). Like English, Old Chinese possessed null DO and null CAUSE as higher lexical heads (both reconstructed as *s- by Tsu-Lin Mei 1989, 2012 and references given there) which trigger head movement (see also Feng 2005, 2015 for extensive other examples of head movement in OC). This gives rise to the following properties:

(82) a. No light verb, but denominalization: yu 'to fish': [VP *s- [NP yu]] → yu (verb)
     b. No pseudo-incorporation: fan 'have rice': [VP *s- [NP fan]] → fan (verb)
     c. No compounds: synthetic accomplishments: po 'break': [VP *s- [VP-inchoative po]] → po (causative verb)

And by IG, the properties in (83):

(83) a. No overt classifiers for count nouns (no need for a 'light noun');
     b. No need for overt localizers (no need for a 'light noun').

Turning now to the clausal functional heads, Old Chinese TP differs from Modern Chinese in the nature of at least one clausal functional head (probably more than one) in the TP region, immediately below the subject. Let us call this FP (possibly standing for focus phrase). F has an unvalued feature that requires it to Agree with an appropriate element, and an EPP feature requiring XP movement. This gives rise to the following XP movements in OC:

(84) a. Wh-movement;
     b. suo-movement for relatives;
     c. focus-movement (of only-phrases);
     d. postverbal adjuncts.

Furthermore, it is possible that F also triggered head movement, giving rise to canonical gapping (Wu 2002; He 2010), assuming, following Johnson (1994) and Tang (2001), that gapping is across-the-board V-movement from a coordinated v/VP. The MnC-OC contrast follows from the general lack of v-movement beyond vP in MnC, and the availability of such movement (e.g., into FP) in OC.

14.10.5 Mesoparametric Variation in Modern Chinese Dialects

Here we observe microvariation with small degrees of clustering among Mandarin, Cantonese, and TSM, creating, at least as regards the contrasts between Mandarin and TSM, a mesoparametric effect. Relatively speaking, Cantonese has undergone more grammaticalization and is the least analytic (or most synthetic) of the three dialects. Mandarin (and Shanghai) has developed some suffixes but remains analytic in that these suffixes do not trigger (overt) head movement. TSM remains the most analytic, having developed the fewest suffixes. Clearly, we have here only scratched the surface of the dialectal variation to be found in 'Chinese.' Such microparametric differences are sure to increase when more dialects are examined, whether contemporary dialects or dialects at any given historical stage. Hence, although there is the appearance of macroparametric change from OC to Modern Chinese, the truth must be that the actual changes took place on a microparametric level.

Let us look more closely at some of the microparametric differences between TSM and Mandarin. Together with those we have touched upon, we can identify ten differences that distinguish them:

(i) Classifier stranding. As mentioned more generally in section 14.7, while Mandarin allows deletion of an unstressed yi 'one' in certain positions, thereby stranding a classifier, TSM does not allow classifier stranding. Compare the following, repeated from (57a) and (59a):

(85) Mandarin:
     wo yao mai (yi) ge roubaozi lai chi.
     I want buy (one) CL meat-bun to eat
     'I want to buy a meat bun to eat.'

     TSM:
     gua be boe *(tsit) liap bapao-a lai tsia.
     I want buy *(one) CL meat-bun to eat
     'I want to buy a meat bun to eat.'

(ii) Aspectual suffix vs. auxiliary. While the perfective aspect in Mandarin employs the suffix le, TSM resorts to a lexical auxiliary u 'have.' Compare:

(86) Mandarin:
     ni chi-bao-le ma?
     you eat-full-Perf Q
     'Have you finished eating?'

     TSM:
     li u tsia-pa bou?
     you have eat-full Q



That is, in Mandarin the Asp head holds an Agree relation with the verb, while in TSM a lexical auxiliary does away with the Agree relation. The use of u 'have' as an auxiliary is in fact generalized to all other categories, expressing existence of the main predicate's denotation. Thus, as an auxiliary of a telic vP, it expresses perfectivity (as in (86)). It may also be used with an atelic VP, or with an AP, PP, or AspP predicate, expressing existence of the relevant eventuality:

(87) a. li u tsia hun bou?                (VP)
        you have eat tobacco Q
        'Do you smoke?'
     b. gua bou ai tsit-nia sann.         (VP)
        I not-have like this-CL shirt
        'I don't like this shirt.'
     c. in u te kong tsit-hang taitsi.    (AspP)
        they have at discuss this-CL thing
        'They have been discussing this thing.'
     d. i tsima bou ti tshu.              (PP)
        he now not-have at home
        'He is presently not at home.'
     e. tsit-tiunn too u sui.             (AP)
        this-CL picture have pretty
        'This picture is pretty.'

(iii) Aspectual suffix vs. resultative verb. While Mandarin perfective le is a suffix denoting a viewpoint aspect, the corresponding item in TSM, liau, is still a resultative verb meaning 'finished.'

(88) Mandarin:
     ta chi-le fan le.
     he eat-Perf rice Prt
     'He has eaten / He ate.'

     TSM:
     i chia-liau peng a.
     he eat-finished rice SFP
     'He finished the rice.'

(iv) Null vs. lexical light verb. In Mandarin, there is an interesting 'possessive agent' construction, illustrated here:

(89) a. ni tan nide gangqin, ta kan tade xiaoshuo.
        you play your piano, he read his novels
        'You did your playing piano; he did his reading novels.'
     b. ta ku tade, ni shui nide.
        he cry his, you sleep your
        'He did his crying; you did your sleeping.'



In (89a), the possessives nide 'your' and tade 'his/her' do not denote the possessor of the NP they modify (a piano or a novel). And in (89b), the possessives are presented without a possessee head noun. In each case, the genitive pronoun is understood as the agent of an event, represented as a gerundive phrase in the translation. Huang (1997) argued that these sentences involve a null light verb DO taking a gerundive phrase as its complement. The surface form is obtained when the verb moves out of the gerund into the position of DO.

(90) a. ni DO nide [GerundP [VP tan gangqin]]
        you DO your play piano
     b. ta DO tade [GerundP [VP ku]]
        he DO his cry

These examples thus illustrate a limited kind of denominalization (whereby a verb moves out of a gerundive into DO). Interestingly, the corresponding expressions in TSM take the form of a lexical tso, literally 'do,' in place of the null DO, thus repelling head movement:

(91) a. li tso [li tuann kengkhim]; i tso [i khuann siosuat]
        you do you play piano; he do he read novel
        'You do your piano-playing; he does his novel reading.'
     b. i tso [i khao]; li tso [li khun]
        he do he cry; you do you sleep
        'He went on crying, and you went on sleeping.'

(v) Position of (definite) bare objects. As indicated, Mandarin allows a definite object in postverbal position, while TSM prefers a preverbal object. This preference is particularly strong with bare nouns with definite reference:

(92) Mandarin:
     ta mei zhao-dao shu.
     he not seek-get book
     'He did not find the book.'

     TSM:
     i tshe tshuei-bou.
     he book seek-not-have
     'He did not find the book.'

(In the TSM example, placing the object tshe after the verb would render it non-referential, meaning 'he didn't find any book.')



(vi) Objects of verb-resultative constructions. In Mandarin they may appear after the main verb, but in TSM they are strongly preferred in preverbal position with ka:

(93) Mandarin:
     wo ma-de ta ku-le qi-lai.
     I scold-to he cry-Perf begin
     'I scolded him to tears.'

     TSM:
     gua ka yi me-ka khau a.
     I ka he scold-to cry Prt
     'I scolded him to tears.'
     (*?gua me-ka yi khao.)

(vii) Complex causative constructions. Mandarin allows V-to-CAUSE raising in forming complex causative constructions, while in TSM such constructions are strictly periphrastic, with a lexical causative verb. This is not just a strong preference.

(94) Mandarin:
     zhe-jian shi gaoxing-de ta liuchu-le yanlei.
     this-CL thing happy-to he flow-Perf tears
     'This thing pleased him to tears.'

     TSM:
     tsit-tsan taitsi hoo i huannhi-ka lao-baksai.
     this-CL thing cause he happy-to tears
     'This thing caused him to be happy to tears.'
     (*tsit-tsan taitsi huannhi-ka i lao-baksai.)
       this-CL thing pleased-to him tears

(viii) Outer objects and applicative arguments. In Mandarin the verb may raise above an outer or applicative object, but in TSM it must be licensed by the applicative head ka preverbally:

(95) Mandarin:
     wo da-le Zhangsan yi-ge erguang.
     I hit-PERF Zhangsan one-CL slap
     'I slapped Zhangsan once.'

     TSM:
     gua ka Abing sian tsit-e tshui-phuei.
     I KA Abing slap one slap
     'I slapped Abing once.'
     (*gua sian Abing tsit-e tshui-phuei.)
       I slap Abing one slap

(ix) Noncanonical double-object construction. Both Mandarin and TSM have double-object constructions in the form of V-DP1-DP2. In Mandarin, DP1 can denote a recipient (the canonical DOC) or an affectee (the 'noncanonical DOC,' after Tsai 2007). TSM, however, has only the canonical DOC. Thus, (96) in Mandarin has both the 'lend' and the 'borrow' reading, but (97) in TSM has only the 'lend' reading:

(96) ta jie-le wo liang-ben shu.
     he lend/borrow-PERF me two-CL book
     a. 'He lent me two books.'
     b. 'He borrowed two books from me.'

(97) i tsio gua neng-pun tshe.
     he lent me two-CL book
     'He lent me two books.'

For the 'borrow' meaning, the affectee (or source) DP1 must be introduced by the applicative ka head:

(98) i ka gua tsio neng-pun tshe.
     he ka me borrow two-CL book
     'He borrowed two books from me.'

The contrast shows that the main verb may raise to a null applicative head position in Mandarin, but not in TSM.

(x) ka vs. ba. The above observations also lead us to the fact that, although Mandarin ba (as used in the well-known ba-construction) is often equated with, and usually translated by, TSM ka, the latter has a much wider semantic 'bandwidth' than the former. Generally, the Mandarin ba-construction is used only with a preverbal low-level object (Theme or Patient), but the TSM ka-construction occurs with other, 'non-core' arguments, including affectees of varying heights—low and mid applicatives as illustrated above, and high applicatives—adversatives or (often sarcastically) benefactives, as illustrated here:

(99) i tshittsapetsa to ka gua tsao-teng-khi.
     he 7-early-8-early already KA me go-back
     'He quit and went home on me at such an early time!'

(100) li to-ai ka gua kha kuai-le o.
      you should KA me more obedient SFP
      'You should be more obedient for my sake, okay?'

We see, then, that while certain higher functional heads in the vP domain may be null in Mandarin, they seem to be consistently lexical in TSM. Arguably, in all these cases of differences between Mandarin and TSM, we see some small-scale clustering. In fact, we may be dealing here with one or two mesoparameters as defined in (26). Again, we see the pervasive effects of IG. If we take each difference as indicative of one microparameter, then we have observed ten microparameters. Logically there could be 2^10 = 1,024 independent TSM dialects that differ from each other by at least one parameter value. But it is unlikely that these parametric values are equally distributed. Rather, the likely norm is that they cluster together with respect to certain values. Hence here we have a mesoparameter, expressing special cases of TSM as consistently more analytic than Mandarin, i.e., a range of heads in TSM lacks the formal features giving rise to Agree or head movement in the corresponding cases in Mandarin. Finally, not all speakers agree on the observations made in the preceding discussion, reflecting dialectal and idiolectal differences. This is not surprising, as microvariation typically arises among individual speakers. Here we may also find cases of nanovariation.
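The arithmetic behind this count can be checked directly. The following back-of-the-envelope sketch (illustrative only, not part of the chapter's analysis) enumerates the space of ten binary microparameters and shows how the space collapses under perfect clustering, i.e., a single mesoparameter:

# Ten binary microparameters yield 2**10 grammars if they vary
# independently, but only two fully consistent ones if they cluster
# perfectly (all-analytic vs. all-synthetic). Illustrative only.
from itertools import product

settings = list(product([0, 1], repeat=10))  # all 10-bit value vectors
print(len(settings))        # 1024 logically possible dialect types

# Perfect clustering: every head takes the same value, leaving just
# the two consistent vectors.
consistent = [s for s in settings if len(set(s)) == 1]
print(len(consistent))      # 2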

14.11  Summary and Conclusion

We began by sketching and exemplifying the GB conception of a parameter of Universal Grammar as a parametrized principle, where the principles, the parameters, and the possible settings of the parameters were all considered to be innate. We described how there was a gradual move away from this view, with the introduction of microparameters and also the lexical parametrization hypothesis (the 'Borer-Chomsky Conjecture'). We briefly discussed some of the conceptual difficulties with this view, particularly concerning the hyperastronomical number of grammars it predicts and the concomitant problems this poses for typology, diachrony, and, in particular, acquisition, where the explanatory value of the whole approach may be called into question. We summarized some of the arguments, notably that put forward in Baker (2008b), for combining macro- and microparameters. We suggested that all (or at least the great majority of) parameters can be described in terms of lexical features (hence following the BCC, and reducing them effectively to microparameters), but we pointed out, following Baker, that we nonetheless do see large macroparametric patterns in the form of clusters. In this connection, we looked at Chinese, showing the remarkable extent of clustering. Here each property can be described by a microparameter, both with respect to synchronic variation and diachronic change, and with respect to typological differences and dialectal variation. The solution we proposed was to adopt the emergentist approach recently developed by Roberts and Holmberg (2010), Roberts (2012), and Biberauer and Roberts (2012, 2014, 2015a,b), and we demonstrated how this approach can elegantly describe and explain the observed variation, achieving a high level of explanatory adequacy in the traditional sense (i.e., solving Plato's Problem), while at the same time, in emptying UG of any statement regarding parameters beyond simple feature-underspecification, taking us in the desired direction, beyond explanatory adequacy.



Chapter 15

Linguistic Typology

Anders Holmberg

15.1 Introduction

Linguistic typology, as usually conceived, is a research programme combining certain aims and objectives with certain methods and a preferred theoretical framework.[1] The aim is to understand linguistic variation. More specifically, the aim is to distinguish between properties which are shared across languages for historical reasons (either shared parenthood or language contact) and properties which are shared for other reasons, to do with 'the nature of language' in some sense, and ultimately to understand what those other reasons are. The preferred method is large-scale comparison of as many languages as possible, sampled so as to control for genealogical and areal biases. The outcome is a typology or typologies of languages along a variety of parameters such as word order, stress placement, vowel systems, case systems, possessive constructions, and so forth. The theoretical challenge is to explain the cross-linguistic generalizations that are not due to historical contingencies. The preferred mode of explanation is in terms of functional rather than formal notions (see chapter 7 for more on the two approaches). This chapter will describe the basic properties of the research programme, some of its strengths and weaknesses, and some recent trends. It is written from the perspective of an outsider (a generative linguist), but a mainly sympathetic outsider interested in promoting closer collaboration between typological and generative linguistic research.

The driving force behind typological research is the belief that there are systematic similarities among languages, leading to patterns of variation which are not due to historical contingencies but to 'the nature of language,' in some sense, and that these patterns and shared properties can be detected and mapped on the basis of careful comparison of languages, and explained at least in part on the basis of functional considerations, mainly to do with efficiency of processing and communication. The research program has undergone some significant development in the last twenty years, where the new elements are: a much better picture of areal effects on the form of languages (so much so that linguistic typology and linguistic geography can now be seen as facets of the same research program) and a move away from the search for Greenbergian implicational universals towards a purely descriptive enterprise. In what follows, I will comment on these developments.

Linguistic typological research can point to some indisputable and quite astounding results. As a result of this research we now have detailed knowledge of the distribution of a very wide range of grammatical/linguistic properties, over the full range of languages described in some cases, and over large subsets of languages in other cases. The most concrete result of this research is the World Atlas of Linguistic Structures (WALS), a database that is available as a book (Haspelmath et al. 2005) and online (the most recent edition at the time of writing is Haspelmath et al. 2013). It contains and systematizes a huge number of facts about 192 grammatical properties (called features), in samples of the world's languages of varying sizes (over 1,500 languages for the largest samples), and including data from all in all 2,679 languages at the time of writing. WALS is characterized by Baker (2010) as 'wonderful and frustrating.' I will comment on both aspects.

[1] Thanks to Martin Haspelmath and Ian Roberts for their comments on the chapter.

15.2  History: The Greenbergian Paradigm

Modern linguistic typological research starts with Joseph Greenberg's paper 'Some universals of grammar with particular reference to the order of meaningful elements' (Greenberg 1963/2007). Linguistic typology existed before this paper; in particular, there is a body of work on morphological types, that is, synthetic vs. analytic languages and varieties thereof, going back to the 19th century (see Greenberg 1974; Croft 1999, 2003:39-42). However, Greenberg's (1963) paper very clearly defined a new program for linguistic research. The paper has all the major components of most subsequent research on linguistic typology: the global, comparative perspective and the aim to discover cross-linguistic generalizations that are not the result of genetic relatedness or language contact, which requires selection of a representative and balanced sample of languages for comparison. Greenberg's paper also exemplifies the statistical approach which is typical of most subsequent typological work: the generalizations are not required to be absolute. Finally, there is the ambition to get beyond the generalizations to a theoretical explanation in terms of more fundamental properties of language.

What Greenberg did was put forward a set of 45 cross-linguistic generalizations, termed 'universals,' based on (primarily) a sample of 30 languages, selected with a view to getting a certain spread in terms of genetic groupings. The universals include the famous word order universals, stating correlations between the ordering of the lexical categories and their complements, including Universals 3, 4, and 5:

Universal 3: Languages with dominant VSO order are always prepositional.

Universal 4: With overwhelmingly greater than chance frequency, languages with SOV order are postpositional.

Universal 5: If a language has dominant SOV order and the genitive follows the governing noun, then the adjective likewise follows the noun.

The other universals concern morphology, including Universals 27 and 34:

Universal 27: If a language is exclusively suffixing, it is postpositional; if it is exclusively prefixing, it is prepositional.

Universal 34: No language has a trial number unless it has a dual. No language has a dual unless it has a plural.

It is obvious now, and it was obvious to Greenberg at the time, that the sample of languages is not a balanced sample representing all the families and regions in the world in a principled fashion (for instance, as many as six of the languages are Indo-European), but the ambition is nevertheless present; thus all the continents (Africa, the Americas, Eurasia, and Australia/Oceania) are represented. At the end of the paper Greenberg also presents, quite tentatively, a theoretical explanation of a subset of the universals, namely the word order universals. The explanation is in terms of the notions 'dominance' and 'harmony.' This also set the tone for much subsequent work in linguistic typology and, a bit later, generative grammar.

There were two interrelated aspects of Greenberg's work which, in particular, made it such an inspiration for linguistic research in typology as well as, later, in generative linguistics. One is the formulation of cross-linguistic implicational generalizations: if a language has property P, it also has property Q. Even though, as noted by Haspelmath (2008), implicational statements are implicit in basically all typological work ('a language which has agglutinating verbal morphology has agglutinating nominal morphology'), Greenberg formulated them as a set of explicit, interrelated, and, as it looked, easily testable hypotheses. The second is the fact that a set of the generalizations, in particular the word order universals, seemed to hang together, centering on the order of S, O, and V. This strongly suggested that languages form 'holistic types' of a sort that had not been observed before. There was unity, simplicity, and some sort of rationality underlying the seemingly unconstrained surface variation.

There was a flurry of works in the late 1960s and 1970s taking Greenberg (1963) as a starting point, extending the program to other languages, proposing explanations of the universals, and applying them to diachronic linguistics (Lehmann 1973a,b, 1978; Venneman 1974, 1984; Greenberg et al. 1978; Hawkins 1979, 1983). In generative grammar the importance of Greenberg's universals became apparent when X-bar theory was generalized to all categories, making the phrase structure rule component redundant (Stowell 1981; Chomsky 1982a, 1986a), along with the introduction of the parametric theory of linguistic variation (Chomsky 1981a). This made possible a view of the cross-linguistic word order universals as effects of the setting of the head-complement parameter, together with certain other auxiliary parameters (Travis 1984; Koopman 1984); see also chapter 14, section 14.3. A proper extension and re-evaluation of Greenberg's work on the word order universals finally came with Dryer (1992), putting the project on a more secure footing both in terms of the number of languages (625 instead of 30) and the sampling of languages, and in terms of the categories being considered. Dryer showed that some of Greenberg's universals do hold up (as statistical universals) under these more stringent conditions, while others do not, being instead areal effects not detectable in Greenberg's small sample. As will be discussed in what follows, within linguistic typology today interest in research on word order universals has faded to a significant extent.

15.3  Explaining Typological Generalizations

There are basically two lines of explanation. One is the functionalist line, according to which cross-linguistic generalizations which are not historically based are (mainly) the result of a variety of functional factors to do with efficient processing and efficient communication, broadly speaking. Language has evolved to be as efficient as possible for communication (see for example Comrie 1989:124; Hawkins 1994, 2004; Newmeyer 1998:105ff.; Haspelmath 2008:98ff.).[2] The other is a formalist line, according to which the cross-linguistic patterns are due to universal formal properties of language, at least some of which are ultimately biologically based, the result of genetically determined properties of the human language faculty. See Newmeyer (1999:95-164) for a critical review of internal and external (mostly function-based) explanations in linguistics; see also chapter 7.

There has always been a strong tendency within research on linguistic typology to gravitate towards functional explanations rather than explanations in terms of innate universals; so much so, in fact, that linguistic typology has been seen as almost synonymous with linguistic functionalism.[3] My own view on this is that there is no logically necessary connection between typology and functionally-based explanation, but there is a practical one, to do with the methodology: the favored method in linguistic typological research is comparative surveys on a large scale, in order to cover as much as possible of the existing variation and in order to establish, as far as possible, valid global generalizations. This has meant that the grammatical properties that are investigated and compared are by necessity all easily observable 'surfacy' properties, of the kind which are recorded even in sketchy descriptive grammars. One result of this is that the generalizations discovered have been probabilistic, riddled with exceptions, rather than absolute, because surfacy properties are subject to unpredictable variation to a greater extent than more abstract properties (as will be discussed). This disfavors explanations in terms of universal, genetically determined properties of the language faculty, and favors explanations in terms of 'functional pressure,' which are expected to allow for exceptions.

In generative grammar the aim is to uncover the universal properties of the human language faculty, including universal properties which are purely formal and ultimately based in the human genome. Such universal properties are typically not encoded in surface form, and the focus has therefore traditionally been on more abstract structural properties, which often cannot be observed directly but typically rely on negative evidence, i.e., require making distinctions between grammatical and ungrammatical sentences, which in turn requires extensive access to native speakers' intuitions (ideally the researcher is himself a native speaker of the language). This sort of data is seldom found in descriptive grammars, and in part for this reason there has been only limited interest in typological research among generative grammarians.

More recently there has been a certain rapprochement between the functionalist and the biological approaches, from the formalist side. Within the Minimalist Program the prevalent view now is that there are few genetically determined properties that are specific to language. Instead, the form of language is largely determined by more general constraints and conditions on computational systems, even including some very general conditions on 'systems design' in the natural world (see Hauser, Chomsky, and Fitch 2002; Chomsky 2005; Piatelli-Palmarini and Uriagereka 2008; and chapter 6). This approach to UG is likely to lead to more openness towards functional explanations within generative grammar.[4]

[2] There is also a cognitive-linguistic approach along the lines of Langacker (1987), a variety of the functionalist approach that develops the idea that linguistic structure more or less directly reflects, or represents, conceptual structure. See Croft (2001:108ff. and passim).

[3] See Croft (1995), who argues that there is a logical connection between typology and functionalism; see Newmeyer (1998:348-349) for a rebuttal.

15.4  On the Methodology of Typological Research

The task of linguistic typology is to investigate linguistic variation. It should observe and record existing variation and establish valid generalizations about the variation among the languages of the world. What is important, therefore, is not the number of languages investigated per se, but the variety: the investigation should include as wide a variety of languages as possible.

The sampling of the languages is therefore crucial. Given the aim of discovering generalizations or patterns which are not the result of genealogical relationship or the result of extensive language contact, the set of languages compared should include languages, preferably in equal proportions, from every language family, down to as fine-grained a genealogical classification as possible, and from every corner of the world. The genealogical class which is now (following Dryer 1989, 1992) commonly the basis for global comparison is the genus, denoting a division at a time depth of no more than 3,500-4,000 years. The standard subfamilies of Indo-European (Germanic, Slavic, Celtic, etc.) would be genera, although Dryer (2013d) adds that 'Celtic is perhaps a clearer example than Germanic or Slavic, both of which have a time depth considerably less than 3,500 years.' Thus a survey purporting to represent the languages of the world should ideally include at least one representative from every genus of every language family. Although one may ensure an even distribution by including exactly one language from each genus, an alternative is to include any number of languages from all genera, but to count only genera (adopting some principle to decide which property is representative of the genus in cases where they are not consistent); see Dryer (1988, 1992) and Rijkhoff and Bakker (1998).

In order to control for the effects of extensive language contact, the sample should include representatives, in roughly equal proportions, from all over the world, i.e., from every continent and every region of every continent, defined in some principled fashion. The importance of this methodological principle was amply demonstrated by Dryer's work on noun-adjective order (Dryer 1988), and has been confirmed by much work ever since (see Nichols 1990, 1992). Greenberg (1963) had proposed that AdjN (adjective-noun word order) correlates with OV, and NAdj with VO (Greenberg 1963:100).[5] What Dryer found, when applying a principled sampling method, was that OV correlated strongly with AdjN order in Eurasia, while OV correlated, just as strongly, with NAdj in languages outside Eurasia. 'In short, the previously believed tendency for OV languages to be AdjN is simply an Asian areal phenomenon …' (Dryer 1988:188). Correspondingly, there was no global correlation between VO and NAdj order, but again there were areal effects: in all the African language families SVO correlated with NAdj order ('an instance of an apparent pan-African tendency to place modifiers after the noun regardless of the order of verb and object,' Dryer 1988:189), while in the case of VSO order, the families/phyla where NAdj was the dominant word order were all in northwestern North America. Apparently noun-adjective order is a property that is sensitive to language contact, leading to regional convergence.

One of the most interesting developments of linguistic typological research in the last twenty years or so is, indeed, the discovery that shared grammatical properties can be spread over very large areas, crossing genetic language boundaries, so that they cannot be explained by common ancestry, but can still be geographically restricted, so that they cannot be explained by universal grammar, leaving extensive language contact as the only possible explanation. This is the topic of the next section.

[4] This is not to imply that functional explanations would not earlier have been recognized within generative grammar; see Newmeyer (1998:154-157) and chapter 7. But it is fair to say that there has not been much interest in developing this approach, except within Optimality Theory; see Haspelmath (2008:87-92) for discussion.

[5] Greenberg (1963/2007) proposed two generalizations, neither of which entails a direct correlation between object-verb and adjective-noun order. One is Universal 5 (see text earlier in this section). The other is Universal 17: 'With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun.' In addition, Greenberg claimed that there was a general (universal) tendency for NAdj order, complicating the picture. See Dryer (1988) for discussion.
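To make the genus-based counting procedure described above concrete, here is a minimal illustrative sketch. The languages, genera, and feature values are hypothetical, and the majority rule used is just one possible choice of the 'principle to decide which property is representative of the genus' mentioned above:

# Toy implementation of genus-based counting: group languages by genus,
# take one value per genus (here: the majority value), and count genera
# rather than languages. All data below are made up for the illustration.
from collections import Counter, defaultdict

sample = [
    # (language, genus, value of some feature, e.g., adposition order)
    ('Lang1', 'GenusA', 'postpositions'),
    ('Lang2', 'GenusA', 'postpositions'),
    ('Lang3', 'GenusA', 'prepositions'),
    ('Lang4', 'GenusB', 'prepositions'),
    ('Lang5', 'GenusC', 'postpositions'),
]

by_genus = defaultdict(list)
for language, genus, value in sample:
    by_genus[genus].append(value)

# One data point per genus: its most common value.
genus_counts = Counter(Counter(values).most_common(1)[0][0]
                       for values in by_genus.values())
print(genus_counts)  # counts 3 genera, not 5 languages:
                     # postpositions=2, prepositions=1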




15.5  Areal Features

It has long been known that there are areal effects in linguistic variation, such that certain grammatical properties are shared by genetically unrelated or only distantly related languages in a region: the so-called Sprachbund phenomenon. A famous case is that of the Balkan languages (including South Slavic languages, Romanian, Albanian, and Greek, all Indo-European but only distantly related within this family), which share a number of features, for example loss of infinitives. Another well-known case is the Indian subcontinent, where the genetically unrelated Indo-Aryan and Dravidian languages have a number of properties in common, including retroflex stops and SOV order. The explanation for the Sprachbund phenomenon is, by all accounts, extensive language contact: people have intermingled within these regions, and bilingualism/multilingualism and language shift have been widespread, resulting in the sharing of features, although the social and political situation has nevertheless permitted languages to survive and continue to evolve as different languages.

One of the most interesting findings of typological research based on large-scale global comparison, which could not have been discovered with any other method, is the distribution of linguistic features over very large areas, in some cases continent-wide or even spanning continents, and, importantly, in many cases spanning several language families, thus basically ruling out the possibility of a phylogenetic explanation (Nichols 1992). In section 15.4 I mentioned Dryer's discovery that adjective-noun order is subject to continent-wide areal effects. Dryer (1998) shows that SOV order is not just an areal feature of the Indian subcontinent, but an Asian phenomenon more generally: apart from Europe and South-East Asia, which both have strong VO preponderance, the Eurasian continent is solidly (S)OV. It is striking, when comparing the maps in WALS showing the distribution of grammatical features, how many of them show at least some areal effects which do not follow phylogenetic lines. The following are a few more examples:

Tone (Maddieson 2013): 527 investigated languages are divided into three types with respect to (lexical) tone[6]—languages with no tones (307), with a simple tone system (132), and with a complex tone system (88) (Maddieson 2013). The distribution is striking: the complex tone languages are all found in Subsaharan Africa, China and mainland Southeast Asia, or New Guinea, with a few representatives in the Americas, particularly Mesoamerica. Eurasia, apart from Southeast Asia, is a tone-free area, apart from simple tone systems showing up in a few places (Japan, Scandinavia), as is Australia. In Africa, tone languages are found in all of the major language families: Afro-Asiatic, Niger-Congo, Nilo-Saharan, and Khoisan.

[6] Tone is defined as 'the use of pitch patterns to distinguish individual words or the grammatical forms of words, such as the singular and plural forms of nouns or different tenses of verbs' (Maddieson 2013).



Numeral classifiers (Gil 2013): 400 languages are divided into three types—no numeral classifiers (260), optional (62), and obligatory (78). Obligatory classifiers are a Pacific Rim phenomenon (see Nichols 1992:198-200, 227-229), mainly found in China and South-East Asia, Japan, Micronesia, Polynesia, Mesoamerica, and along the Pacific coast in the Americas. There are also a few representatives in Central and West Africa, but none in the rest of the world.

Comparative constructions (Stassen 2014): 167 languages are divided into four types—Locational Comparatives ('He is old(er) from me'; 78), Exceed Comparatives ('He is old exceed me'; 33), Conjoined Comparatives ('He is old, I am young'; 34), and Particle Comparatives ('He is older than me'; 22). Take the case of the Exceed Comparatives, which 'have as their characteristic that the standard NP is constructed as the direct object of a transitive verb with the meaning "to exceed" or "to surpass"' (Stassen 2014). They are exclusively found in two areas: Subsaharan Africa and South-East Asia, extending to Polynesia. No examples are found in the rest of the world.[7] The Particle type is basically only found in Europe. According to Stassen, instances of this type in other parts of the world (a few languages in Maritime South-East Asia and the Americas) may be due to influence from English or Spanish. The Locative type, on the other hand, is distributed more or less evenly across the world.

There are also many features that do not show any clear geographical distribution, such as articleless NPs (Dryer 2013a) and second position question particles (Dryer 2013b).

These findings are extremely interesting for several reasons. For one thing, they provide interesting evidence of the early history of humans, the spread of populations across the globe, and ancient contact between peoples (Nichols 1990, 1992, 1998; Nichols and Peterson 1998).[8] But they can also give valuable insights into universal grammar and linguistic variation, as they can show how grammatical properties are more or less susceptible to variation, change, and diffusion (Nichols 1992:279).

Wichmann and Holman (2009) have carried out an interesting investigation of the relative diachronic stability of the linguistic features in WALS. Their paper first assesses some different methods for modelling stability. They decide in favor of estimating the stability of a feature by 'assessing the extent to which phylogenetically related languages are more similar with respect to the feature than are unrelated languages.' Applied to the features in WALS, this metric assigns a numerical stability value (expressed as a percentage) to each feature. With the range divided four ways (very stable, stable, unstable, and very unstable), and considering the features discussed earlier, tone is stable, as are numeral classifiers, while comparative constructions come out as very stable. The word order features discussed earlier (VO vs. OV and NAdj vs. AdjN) also come out as very stable.[9] Among the features which come out as very unstable are, for example, consonant inventories and fixed stress locations (among phonological features), and definite and indefinite articles and features to do with the expression of negation (among syntactic features). A notable finding is that the stable features are no less susceptible to diffusion than unstable features; stability means more resistance to change internal to a language, but not more resistance to diffusion.

[7] Newmeyer (1998:329-330) insists that Classical Greek, Latin, and Classical Tibetan 'manifest a wide range of comparatives of the "exceed" type.' He takes this as a paradigm example of the problem of relying on secondary sources (descriptive grammars) in typological research. The problem in this case would be that the 'exceed' expressions are not the primary form of comparatives in these languages, and therefore are not mentioned in standard descriptive grammars.

[8] Nichols (1992) claims that viewing the distribution of grammatical features as a population-typological or geographical issue 'can take us farther back in time than the comparative method can and can see graspable facts and patterns where the comparative-historical method has nothing at all to work with' (Nichols 1992:280). The comparative method referred to here is the traditional method of tracing genetic relations by comparing cognate morphemes and words. See Longobardi, Gianollo, and Guardiano (2008), Longobardi and Guardiano (2009), and chapter 16, for extension of the comparative-historical method to syntactic features. One rationale for this extension is, indeed, that it has the potential to take us farther back in history than lexically-based comparison.
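The general idea behind this kind of stability measure can be sketched schematically. The following is a toy reconstruction of the idea only, not Wichmann and Holman's actual, considerably more sophisticated procedure: a feature counts as stable to the extent that related languages agree on its value more often than unrelated languages do.

# Toy stability score: compare agreement on a feature within language
# families against agreement across families. Data are invented.
from itertools import combinations

def agreement(pairs, values):
    """Share of language pairs having the same feature value."""
    if not pairs:
        return 0.0
    return sum(values[a] == values[b] for a, b in pairs) / len(pairs)

def stability(values, families):
    """values: language -> feature value; families: language -> family."""
    langs = list(values)
    pairs = list(combinations(langs, 2))
    related = [p for p in pairs if families[p[0]] == families[p[1]]]
    unrelated = [p for p in pairs if families[p[0]] != families[p[1]]]
    # Expressed as a percentage; higher = related languages agree more.
    return 100 * (agreement(related, values) - agreement(unrelated, values))

# Toy data: a feature uniform within families comes out maximally stable.
values = {'A1': 'tone', 'A2': 'tone', 'B1': 'no tone', 'B2': 'no tone'}
families = {'A1': 'FamA', 'A2': 'FamA', 'B1': 'FamB', 'B2': 'FamB'}
print(stability(values, families))  # 100.0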

15.6  The Problem of Limited Data

A methodological problem which has always dogged large-scale survey-based research is limited data. These investigations have to rely largely on existing grammars, which can be of highly variable quality, often being based on fieldwork carried out under difficult conditions within a limited amount of time, by researchers who are often without adequate linguistic training. The rules and generalizations are, therefore, often incomplete, vaguely formulated, and insufficiently exemplified. This is less of a problem, however, when the phenomenon described is easily observable. Consequently, typological research is typically engaged with such properties: word-stress placement, prepositions or postpositions, question particles in initial, final, or second position, and so forth. Even so, there is a tendency within typological research to present results with rather more confidence than they actually warrant. And there is also a tendency within linguistics more generally to treat the results of typological research with more reverence than they deserve (especially when the results happen to support a favored theory). Obviously, the further linguistic theory progresses, with more and more languages being subjected to competent, systematic, and detailed investigation and description, the more confident we can be that the observations are accurate, and the more abstract the properties that can be subjected to typological research.

For example, WALS is quite conservative in the choice of variables that are included, generally keeping to surface-observable phenomena.[10] Even so, for some of the grammatical properties/features that WALS includes, it is questionable whether they lend themselves to the sort of large-scale, surface-based analysis that WALS is based on. I will take an example which I happen to have some knowledge of, having investigated the phenomenon in some detail: the null subject or pro-drop property. I would claim that pro-drop is not a member of the class of easily observable phenomena which can be surveyed and counted on the basis of grammatical descriptions of the usual kind.

WALS includes 'Expression of pronominal subjects' (Dryer 2013c) as one of its 192 features. An impressive total of 711 languages are investigated, and five types are distinguished, plus a mixed type:

1. Obligatory pronouns in subject position (82 languages);
2. Subject affixes on verb (437);
3. Subject clitics on variable hosts (32);
4. Subject pronouns in different position (67);
5. Optional pronouns in subject position (61);
6. Mixed (32).

[9] That order of V and O should be a very stable feature is not obvious from a European perspective, since many of the languages in Europe have recently (in the last 1,000 years) undergone change from OV to VO. Apparently this is not such a common phenomenon in a global perspective, though.

[10] Furthermore, WALS online is an interactive website, where readers are encouraged to comment on the factual claims. In this way, the typological claims should be gradually improving in accuracy.

Type 1 is the typical non-pro-drop type (languages where finite sentences 'normally if not obligatorily contain a pronoun in subject position' [Dryer 2013c]). Type 2, by far the most common, is the agreement pro-drop type, where rich subject agreement substitutes for a subject pronoun or licenses pro-drop (depending on one's theory and analysis; see Holmberg 2005). Type 3 consists of languages with pronominal clitics. Type 4 has subject pronouns doubling a lexical subject. Type 5 comprises 'radical pro-drop' languages, where subject pronouns can be null without any agreement or other morphological expression of the subject, including Japanese, Thai, etc. Type 6 is a mixed type, which has pro-drop but only in some definable part of the system, as, for example, Finnish, which has pro-drop of 1st and 2nd person pronouns but not 3rd.

The figures probably capture a tendency which is real, but I would not recommend attaching much importance to the exact figures. Type 1 includes not just rigidly pro-drop-free languages, but also languages in which it is 'grammatically possible to have simple sentences without anything in subject position, but in which this option is seldom taken in actual usage' (Dryer 2013c). This is a wise decision, since probably very few languages completely reject any instances of missing or silent subjects in finite clauses. For example, (spoken) English and the written English of certain clearly definable registers such as diaries do not reject them (see Haegeman 1999), yet there are good reasons to think that the syntax of English null subjects is in important respects different from the syntax of null subjects in, for example, Italian or Arabic. Drawing the line between Type 1, so conceived, and Type 2 is not easy to do in practice, though, on the basis of information typically found in descriptive grammars.

I have picked out three among the languages in Type 2 that caught my attention, since I happen to have at least some knowledge of languages in the relevant families, and because I suspect, on that basis, that they are wrongly classified, or, worse, that they show that the typology does not make the right distinctions. These are Assamese and Punjabi, two Indo-Aryan languages, and Estonian, a Finno-Ugric language.



Linguistic Typology   365 Type 2 consists of languages in which ‘the normal expression of pronominal subjects is by means of affixes on the verb.’ What does ‘normal’ mean? For many languages it is easy to determine: the subject pronoun is basically always null except when it is focused/​contrasted or expresses a new topic, or perhaps when, for some reason, agreement is unavailable. I would recommend, as a methodological rule of thumb in a case like this, not to rely solely on whatever assertion(s) the author of the grammar makes concerning expression of pronominal subjects, but to check the use of pronominal subjects in examples found throughout the book. If it is a true Type 2 language, examples with pronominal subjects in finite sentences throughout the grammar, especially 1st and 2nd person subjects (as these are typically not dependent on a linguistic antecedent), should have a null subject. If most of them have a spelled-​out subject, there is cause to be skeptical. Take the case of Aja and Kresh, two languages included in the list of Type 2 language in WALS, based on Santandrea (1976).11 The author writes (1976:59) ‘[i]‌f the subject is a personal pronoun … it is regularly omitted, for in most of these languages (including Aja and Kresh—​AH), verbs are conjugated, and thus inflexions or tones make up for the omission.’ The characterization ‘regularly omitted’ is quite unambiguous. But in addition, and crucially, this omission is evidenced in sentence after sentence, exemplifying various aspects of the grammar of the languages. The same is found in any grammar of Arabic or Turkish or Spanish, just to mention some more familiar, well-​researched Type 2 languages.12 Assamese (Asamiya) is also classified as a Type 2 language. In the paper which the classification in WALS is based on (Goswami and Tamuli 2003) I could find no explicit mention relating to null subjects.13 However, the paper contains a number of examples with pronominal subjects, several of which exemplify varieties of narrow focus on nonsubject constituents, hence have a defocused subject. Only two out of about a dozen examples (Goswani and Tamuli 2003:482) have a null subject. This is not indicative of a Type 2 language. As for Punjabi, the work that the classification in WALS is based on is Gill and Gleason (1963). I have access only to the 1969 edition of the book. On the issue of expression of subject pronouns, the book mentions that such pronouns are omitted in connected discourse under topic continuity and provides an example of a narrative demonstrating this. A search of examples with 1st person singular subjects throughout the book confirms, however, that nearly all of them have an overt pronoun. Bhatia (1993), the most comprehensive syntactic description of Punjabi to date, writes on the expression of subject pronouns: ‘They are generally dropped if they are traceable either from the verb or from the context.’ He continues: ‘In non-​perfective tenses, the verb agrees with the subject in number, gender, and person; therefore, in such instances, pronouns are often dropped’ (Bhatia 1993:222). The qualification ‘often’ suggests that we may not be dealing with a consistent 11 

I have picked these languages purely for the sake of convenience: my local university library happened to have this book. 12  See, for example, Harrell (1962); Kornfilt (1997); Zagona (2002). 13  WALS includes references, with page numbers, for every language-​feature ascription. In the case of Assamese subject pronoun expression the page reference is clearly wrong, though.



The qualification 'often' suggests that we may not be dealing with a consistent Type 2 language. But more to the point, examples of finite sentences with pronominal subjects (of which there are many), including those with nonperfective tenses, invariably have a spelled-out subject. It may well be the case that in some of these sentences the subject would be null in natural, connected discourse. Nevertheless, this is indicative of a system different from the one in Aja and Kresh (or Arabic, Turkish, Spanish, etc.).14

For Estonian, the WALS reference is de Sivers (1969). On the pages referred to (47–48), the personal pronouns of Estonian are listed along with the agreement inflection on the verb, together with a set of examples drawn from a corpus of spoken Estonian. The verb inflection in Estonian is rich enough, with six distinct forms (three persons, two numbers). However, all three (probably) non-emphatic examples of the first person pronoun were overt, and three (probably) non-emphatic examples of the 2sg pronoun were overt, with one null. A consultation with an Estonian colleague (Anne Tamm, p.c.) confirms that Estonian, in particular the spoken language, is by no means a clearcut case of a null-subject language. It is, in this respect, similar to its close relative Finnish. Although written Finnish, conforming to normative pressure, has regular pro-drop of 1st and 2nd person subjects, this is not the case in spoken Finnish except in certain contexts.

There is a perfectly reasonable rationale for the Type 1–Type 2 distinction in Dryer's classification: for a large class of languages the subject agreement inflection on the verb or auxiliary in some sense is the subject, or at least can express the subject function on its own: it is (or can be) a pronominal category capable of carrying/expressing the thematic role and case of the subject. These would be the Type 2 languages. This means (a) that there is no null or deleted subject pronoun in the relevant sentences in these languages, and (b) that when there is an overt subject, it is an adjunct, only indirectly linked to the subject role.15 There are good reasons to think that there are languages like this; see Barbosa (1995, 2009); Alexiadou and Anagnostopoulou (1998); Holmberg (2005). However, in Holmberg (2005) I show that there is at least one language, namely Finnish, that has rich subject–verb agreement and has overtly subjectless sentences, even quite regularly, but where there is a null or deleted subject occupying a position in the core sentence. The claim is that this is a more general phenomenon: there is a class of languages, including Finnish, Brazilian Portuguese, at least some of the Indo-Aryan languages, and probably Estonian, which are called partial null subject languages in Holmberg (2005), Holmberg, Nayudu, and Sheehan (2009), and Biberauer et al. (2010), distinct from consistent null-subject languages, i.e., the more prototypical Type 2 languages in Dryer's taxonomy.

14  I am grateful to Dr. Raja Nasim Akhtar (p.c.), a specialist on Punjabi, who also prefers not to classify the language as a null-subject language. Western dialects of Punjabi have a system of pronominal enclitics or suffixes attached to verbs doubling the subject, where '[u]se of pronominal suffixes normally presupposes suppression of the full pronominal forms capable of expressing the same syntactic functions' (Shackle 2005:673). The examples cited in Shackle's work bear this out. In terms of the WALS taxonomy this should be a reason to classify the relevant varieties of Punjabi as Type 3, though.
15  Dryer (p.c.) rejects the idea that overt subject NPs would be adjuncts in all Type 2 languages. While accepting that it may hold true of 'languages with more flexible word order, for languages with more fixed word order, I think the NP is still in some sort of syntactic subject position.'



Characteristic of the partial pro-drop languages is that referential subject pro-drop is never obligatory, as it is in consistent null-subject languages when there is a local antecedent and no emphasis on the subject, and it is generally more restricted syntactically. Another characteristic of partial null subject languages is that they have (or rather, can have) a null inclusive generic subject pronoun ('null one') in active sentences, where the consistent null subject languages always seem to resort to some overt strategy. This classification is based on a detailed investigation of a small number of languages, so far representing only a few language families (Indo-European, Finno-Ugric, Semitic/Afro-Asiatic). The hypothesis proposed is that the class-defining properties are due to a difference in the feature composition of the sentential functional head I(NFL), expressed as agreement inflection: in consistent pro-drop languages I has a referential feature, in partial pro-drop languages it does not.16

It takes a considerable amount of detailed investigation to establish where such languages stand with respect to expression of subject pronouns. Classification of a language as either a consistent or partial pro-drop language cannot, unfortunately, be done on the basis of token assertions regarding subject pronoun expression in descriptive grammars. Frequency of pro-drop may be an indicator: if almost every example sentence with a pronominal subject in the grammar (particularly 1st and 2nd person, which are not to the same degree context-dependent as 3rd person) has a null subject, then we are probably dealing with a consistent pro-drop language, a canonical Type 2 language in Dryer's taxonomy.17 But what if half of them do? Given that the null-subject option exists, as it does in most languages, there are a variety of factors, including sociolinguistic factors, which may influence how this option is employed in spoken or written language, so frequency is at best a weak indicator of class membership.

The partial pro-drop languages should also be kept distinct from languages like English, which allow null subjects in certain syntactic contexts, and employ them, even quite frequently, in ordinary discourse, subject to regional, social, and stylistic variation. A characteristic of English and the other Germanic languages is that the null subjects do not occur in embedded contexts (Haegeman 1999; Haegeman and Ihsane 1999). This is the sort of observation which may very well escape a researcher writing the first grammatical description of a language, and often will not be gleanable from the typically scarce example sentences in the grammar. Such an oversight is hardly likely in grammars written today, but is perfectly possible in the case of grammars written, say, fifty years ago.

16  See also Phimsawat (2011), who shows that Thai, along with several other languages of Type 5 in WALS—that is, 'radical pro-drop languages' with extensive pro-drop but no agreement—shares with partial pro-drop languages the property that the (inclusive) generic pronoun is null. This is predicted by the theory in Holmberg (2001, 2010a): they have no nominal features in I, and hence no referential feature.
17  Finnish happens to be an exception, though: due to normative pressure in the written language, 1st and 2nd person null subjects are massively overrepresented in examples in descriptive grammars (e.g., Sulkala and Karjalainen 1992).



Two conclusions follow from the discussion above. The first is that the taxonomy in WALS makes too coarse a distinction between Type 1 (languages where sentences with pronominal subjects 'normally if not obligatorily' have an overt pronoun in subject position) and Type 2 languages. The second is the methodological point that 'expression of subject pronouns' is too abstract a phenomenon to be subject to a broad-based typological investigation at the present time, given the descriptions of most languages that we have at our disposal.

The online interactive database Syntactic Structures of the World's Languages (SSWL) implements a different approach from WALS, intended to avoid the problem of limited data. First, it is wholly based on information supplied by language experts directly to the database. Second, the information about syntactic features is coded, according to a strict protocol, as 'yes' (the language has the property) or 'no' (it does not have the property). This means avoiding, as far as possible, any indeterminacy and ambiguity in the use of grammatical terminology, and it also means that the database provides explicit negative evidence. The drawback is that the database is wholly dependent on voluntary contributions from linguistic experts, and its development has therefore been rather slow.
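To make the coding protocol concrete, here is a minimal sketch, in Python, of what an SSWL-style record might look like; the language names, property names, and values below are invented for illustration and are not drawn from the actual database:

languages = {
    "Language_X": {
        "neutral_word_order_SVO": "yes",
        "null_referential_subject": "yes",
        "null_subject_in_embedded_clause": "no",   # explicit negative evidence
    },
    "Language_Y": {
        "neutral_word_order_SVO": "no",
        "null_referential_subject": "no",
        "null_subject_in_embedded_clause": "no",
    },
}

# Because every expert answers the same fixed list of yes/no questions,
# absence of a property is recorded explicitly rather than inferred from
# a grammar's silence, and languages can be compared property by property.
shared = [p for p in languages["Language_X"]
          if languages["Language_X"][p] == languages["Language_Y"][p]]
print(shared)  # -> ['null_subject_in_embedded_clause']

The point of the strict yes/no protocol is visible in the toy comparison: a 'no' answer is itself a datum, of exactly the kind that descriptive grammars rarely supply.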

15.7  Some Recent Trends

In many ways the research program of Greenbergian linguistic typology has been a success story.18 As a result of it we now have a vast amount of knowledge about grammatical properties of a large part of the existing languages of the world, including detailed knowledge about common properties and existing variation within most observable subsystems of grammar, phonological, morphological, and syntactic. The extent of large-scale areal spread of linguistic features has been discovered. On the other hand, the research program can be said to have failed to achieve one of its goals, that of uncovering universally valid and theoretically interesting generalizations on the basis of systematic comparison of large numbers of languages. As a consequence, an influential school of thought within modern typological research has all but given up the quest for universals along Greenberg's lines, settling instead for mere description of patterns of variation:

    Large datasets almost invariably reveal exceptions to universals, and this, together with a substantial increase of newly described languages assisted by prominent conceptual argumentation … has practically done away with notions of absolute universals and impossibilities. Modern studies of typological distributions involve statistical methods, from association tests to multivariate scaling methods … The general assumption is that if there are large-scale connections between linguistic structures, or between linguistic structures and geography, they consist in probabilistic (and therefore exception-ridden) correlations between independently measured variables. (Bickel 2007:245)

18  The content in this section is heavily based on Baker (2010).



Some recent, radical representatives of this view which have received a certain amount of attention are Evans and Levinson (2009) and Dunn et al. (2011). These works claim to demonstrate the absence of formal or function-based universals of any kind. Evans and Levinson work their way through a set of grammatical concepts and generalizations which, they say, have at one time or other been claimed to have universal validity, purporting to show that there are counterexamples to all of them and that they are therefore disconfirmed as theoretical hypotheses. Their conclusion is that there is no evidence of a biological basis for linguistic properties. Dunn et al. (2011) make essentially the same point, purporting to show specifically that there is no evidence for the Greenbergian word order universals, not even the reduced set in Dryer (1992). Instead, they argue, the proposed correlations of the sort that researchers in the Greenbergian tradition typically have assumed follow family lines. The conclusion is that variation is unpredictable and unconstrained, except for genealogical/historical constraints (related languages have more features in common than unrelated languages).19

In this context, Baker (2010) makes the following important point. The Chomskyan view of UG is that it is a property of human cognition that makes possible acquisition of one or more languages, each of them a staggeringly complex system of lexical items, categories, rules, and principles, in the space of a few years, on the basis of ordinary linguistic experience with its well-known limitations: absence of negative data, highly variable exposure to data, and absence of instruction (see chapters 5, 10, and 12). UG is the initial state which, when provided with primary data, eventually yields the final state, the adult grammar and lexicon, which makes possible use of language for communication and verbalized thought (see again chapter 12). Variation among languages is due to variation in the primary data, but the variation is restricted by various factors, one of which is properties of the genetically determined initial state, UG. Chomsky's more recent position is that the initial state is less a matter of UG in the sense of a cognitive faculty dedicated to language and more a matter of general properties of computational/combinatorial systems (see chapter 6), but nevertheless, there is a biologically given, mind-internal faculty which makes possible acquisition of a language, given sufficient primary data, as found in any normal human society. But since there is variation in the data (basically as much as is allowed by UG and other constraining factors), the outcome is different languages and dialects (see chapter 14).

As discussed, typological research in the Greenbergian tradition is, for practical reasons, largely restricted to surface-observable properties, such as word order and overt morphological properties. These properties are observable to the descriptive grammarian, so they are also observable to the language learner in the data (s)he encounters. That means that they can, on that account, be learned on the basis of experience. This, in turn, means that they are among the linguistic facts that we can expect to vary across languages.
19  See Baker (2011b) for a discussion of Dunn et al. (2011) demonstrating that their overall conclusions do not follow from the statistical facts they discover. See the comments on Evans and Levinson's (2009) target paper, many of which show that they are massively overstating their case. See also Rooryck (2011).



The properties that we expect not to vary are all those abstract properties which are not directly observable in the primary data. That is to say, the properties, rules, categories, and so forth that are best amenable to Greenbergian typological investigation are precisely those where we expect not to find absolute, exceptionless generalizations, but at most tendencies of varying strength. Correspondingly, from a Chomskyan perspective, we expect to find absolute generalizations regarding properties that are more abstract, not directly observable in the data. Baker (2010) mentions the existence of VP as an example of a (potential) universal which is not directly observable in the primary data, since most languages have a variety of constructions where the verb and the object are separated (in VSO languages the subject systematically intervenes between the verb and its object). Another case that he mentions is island constraints on movement, such as the complex NP constraint and the coordinate structure constraint. Island constraints can basically only be detected by negative examples, and these do not occur in ordinary linguistic experience, yet there are good reasons to think that they are universal, presumably reflecting the limits of human linguistic computing capacity, even though the precise formulation of the various islands is still open to debate. Even more fundamental and more abstract properties that are, in all likelihood, universal are hierarchical syntactic structure (possibly strictly binary branching) and the role of c-command as a regulator of the scope of operators and modifiers as well as of movement, binding, and agreement relations.20 There may be some variation in the opacity of certain categorial boundaries, and the precise formulation of c-command is still open to debate and refinement, but there is very good reason to think that the relation plays essentially the same crucial role in every language. Consider the following quotation from Haspelmath (2008):

    But there does seem to be a widespread sense in the field of (non-generative) typology that cross-domain correlations do not exist and should not really be expected. After the initial success of word order typology, there have been many attempts to link word order (especially VO/OV) to other aspects of language structure. … But such attempts have either failed completely, or have produced only weak correlations that are hard to distinguish from areal effects. (Haspelmath 2008:95)21

There is a reason for the failure to find exceptionless correlations based on VO/OV order: we expect to find such correlations at the level of syntactic structure, including syntactic features and operations. But surface word order is only a weak predictor of syntactic structure. SVO order can be derived in a variety of ways (for example by V-movement in a language which otherwise has SOV order, as in German and Vata; see Koopman 1984), as can SOV order.


20  Needless to say, these notions are still subject to debate in the literature. See for example Nunes (2004) on c-command and movement, and Culicover and Jackendoff (2005) on the role of c-command generally.
21  Haspelmath (2008) does not claim that there are no universals. His argument concerns what he calls cross-domain universals, by which he means correlations between phonological and syntactic properties, or between VP-properties and NP-properties, or order in the VP and type of negation, etc. Universals that do exist are what he calls intra-domain universals. They typically have the form of universal hierarchies or scales, where languages can vary with regard to the placement of a feature. Keenan and Comrie's (1977) accessibility hierarchy for relativization is one example. The hierarchy is: Subject > Direct Object > Oblique > Possessor. If a language can relativize on any position on this hierarchy it can also relativize on every higher position (Haspelmath 2008:96).



Even for languages that are rigidly either SOV or SVO there is no a priori reason to think that they necessarily have identical sentential syntactic structure; see Holmberg (1998). V-initial order can also be derived in a variety of ways. There are very good reasons to think that there are at least two entirely different types of V-initial order among the languages of the world (see Carnie, Dooley, and Harley 2005). One type is exemplified by Celtic and Mediterranean VS order (Berber, Arabic, Spanish, Greek). In these languages VS order is derived by V-movement. A different type is exemplified by Niuean and (very likely) other Oceanic (and perhaps more generally Austronesian) VS languages. As shown by Massam (2001b, 2005), Niuean is not just V-initial, but more generally predicate-initial. VS order is not derived by V-movement, but by VP-movement, which yields an entirely different syntactic structure, but often the same word order as V-movement.22 From a syntactic point of view, it is not obvious that the two VS types would systematically share any properties that they would not share with other languages, apart from (sometimes) identical surface order, while we do expect V-initial languages which derive VS order by V-movement to (possibly) share certain properties, namely properties that are linked to V-movement.

A point made by Dryer (1998b) and also Newmeyer (1998) is that we can never establish absolute universals on the basis of just cross-linguistic comparison, simply because currently existing languages constitute only a fraction of all possible languages: countless languages have disappeared without a trace, and countless languages have not yet appeared. What we can expect from cross-linguistic comparison is clues about possible linguistic universals. Establishing whether they are truly universal, in the sense of being necessary components of human cognition, requires other kinds of data, such as experimental data showing that a particular structure or rule is unlearnable under natural conditions (see also chapter 10).

22  See Chung (2005), Holmer (2005), and Otsuka (2005) for discussion of VP-raising as a source of V-initial order in a variety of Austronesian languages.

15.8  Comparing Categories Across Languages

It is crucial for any comparative cross-linguistic research to know that we are comparing like with like. Does the subject in a given English sentence have the same structural import as the subject in the roughly corresponding Arabic sentence? Is the adposition meaning 'on' in English the same category as the case suffix meaning 'on' in Finnish?



Ever since Greenberg it has been taken for granted, or at least assumed as a heuristic in typological research, that formal, syntactic, and phonological categories can be compared across languages; this is the basis for most of the research behind WALS, for example. Greenberg (1963:74) concedes that he is basically employing semantic criteria when identifying grammatical relations and word classes. He notes (ibid.) that there are 'very probably formal similarities which permit us to equate such phenomena in different languages,' but refrains from entering any such discussion, adding: 'In fact there was never any real doubt in the languages treated about such matters.' In more recent years the issue has, again, come under scrutiny, and influential typologists such as Matthew Dryer, William Croft, and Martin Haspelmath have argued against the existence of cross-linguistic formal/structural categories. There are obviously similarities among grammatical items and categories across languages, but not identity. In a sense, this is almost a return to the position of the American structuralists, as most famously articulated in Boas (1911), who claimed that each language was a system unto itself, describable only on its own terms; see Croft (2001:34), Haspelmath (2008). It is not quite a return to this position, since the modern functional typologists obviously do not deny the possibility and usefulness of comparative research. The claim is that the comparison can be, and should be, done on the basis of 'comparative concepts' (Haspelmath's 2010 term), that is, categories which are drawn from traditional and/or formal grammar, but which are only loosely defined, typically on the basis of meaning or function, in a pragmatic fashion, the definitions being only just formal enough to permit stating broad cross-linguistic generalizations. For example, 'adjective' as a comparative concept is defined as 'a lexeme that denotes a descriptive property and that can be used to narrow the reference of a noun' (Haspelmath 2010:670). Crucially, the definition does not specify the categorial features of the lexeme (verb, noun, or neither). As such, cross-linguistic generalizations can be stated over this comparative concept, including Greenberg's word order universals (see above) and Cinque's (2010) hierarchical universals, which hold true whether the words in question are verbal, nominal, or neither. Thus while noun-modifying items in individual languages have particular categorial features, they are also realizations of a broader, language-neutral 'comparative concept.'

Newmeyer (2007) critiques the claim that typology can, and should, do without formal categories. Newmeyer makes essentially two points: first, that typological research has always relied on formal categories, with considerable success, and still does so, as the categories now called 'comparative concepts' are still in part formally defined, even when the definitions are eclectic and purposely ad hoc. Second, replacing formal syntactic and morphological criteria by semantic criteria has its own problems. The argument is that semantic categories are (a) universal and (b) not subject to theoretical controversy the way syntactic categories are. At least the last claim is wrong, Newmeyer points out, as semantic theory is every bit as controversial as syntactic theory.23

23  Greenberg's (1963/2007) criteria are in fact only in part semantic. The word order universals would lose much of their generality if the notions 'subject' or 'genitive,' for example, were defined just on semantic grounds; see Newmeyer (2007).



The obvious optimal solution would seem to be that development of formal syntactic theory proceeds hand in hand with typological investigation. Formal syntactic theory needs reliable comparative data to test its hypotheses, but this will typically require finer formal distinctions to be made than are used in WALS, for example. And if the ambition, on the typologists' side, is to get beyond the present state of understanding of linguistic variation, this may well require finer formal distinctions, too. For example, proceeding from the typology reported in Dryer (2013c) concerning expression of subject pronouns to a finer classification of Type 2 (along the lines suggested above) would seem to require a more detailed analysis of the subject–verb agreement element, presumably in terms of φ-features, arguably including features such as [±referential] or [±definite]. This is not to deny that certain cross-linguistic generalizations may actually be most appropriately stated in terms of semantically defined categories.

15.9  The Middle Way: Selective Global Comparison

How big does a sample of languages need to be to serve as a viable testing ground for hypotheses about the nature of language? Given the limited scope of descriptive grammars and the almost inescapable unreliability of the data and the generalizations in them, it would be an advantage if careful sampling could substitute for large numbers. Let us say that, instead of 100 or 1,000 languages selected for the comparison of a particular feature, there were 10 carefully selected languages. There are two obvious advantages: first, the languages in the sample could then be subjected to more detailed scrutiny, to avoid descriptive gaps and false generalizations and to make sure that the descriptive framework used to describe the languages, and the theoretical assumptions underlying it, are identical, or at least compatible (on this problem, see Newmeyer 1998:337ff.). Second, the investigation would not need to be restricted to 'surface-observable' properties; ideally negative data would be available as well. But importantly, the selection of the ten languages would be such that they represent the different (major) families and different areas of the world, thus avoiding any obvious genetic or geographical bias. This method is advocated by Baker and McCloskey (2007), under the name of the Middle Way. They see it as a way to combine the virtues of linguistic typology with generative linguistic research, hopefully with results that will be seen as interesting and relevant to both research programs.

An additional sampling criterion would be that the restricted sample should include a variety of types expected to show variation with respect to the feature investigated. Assume that the investigation is about generic pronouns, a topic which I happen to be interested in. The sample should include one or more languages of Type 1 in Dryer (2013c) (non-pro-drop languages) as well as Type 2 (pro-drop languages), and among the Type 2 languages, consistent as well as partial ones (see section 15.6). There should be pro-drop languages without agreement (Dryer's Type 5).



There should be languages with an overt dedicated generic pronoun, like English one, and languages where some other overt indefinite pronoun ('anyone') is employed to convey generic meaning. The sample should include one or more languages with a null generic pronoun with no special morphological conditions such as a particular impersonal verb inflection, and preferably one where a null generic pronoun is licensed by impersonal inflection. Unless the investigation is restricted to generic subject pronouns, the principal variation among object generic pronouns should also be covered (null vs. pronounced, dedicated pronoun or not, object agreement or not). These typological characteristics will overlap in various ways. The sample is based in part on prior knowledge of what factors make a difference, in part on educated guesswork. The expectation is that there will be patterns in the variation discernible even in a small but carefully selected sample, which can be probed further in the languages under investigation, possible because the number of languages is small, and/or can be tested on a larger sample.

This method, combining a genetically and regionally balanced global mini-sample with a directed search for representatives of a variety of types, presupposes some knowledge of the distribution and the variation of the feature that we are interested in among the languages of the world, as well as a certain amount of knowledge of the formal properties of the phenomenon investigated. That is to say, it presupposes investigation of the large-scale comparative type, as a first step, as well as a preliminary investigation of the formal properties of the phenomenon, preferably in more than one or two languages, to have a preliminary idea of what properties it is desirable to include in a broader comparative investigation. This is, then, a kind of investigation which can be done successfully, at this stage, by a combination of the methods of linguistic typology and generative linguistics.

In fact, something like this method, combining a broad survey with a more directed investigation of a smaller sample, has been employed in typological research. Haspelmath (1997) is a good example. In his investigation of indefinite pronouns he employs two samples. One consists of 40 languages, selected because there are grammatical descriptions of these languages which are detailed enough and reliable enough for the purposes of his investigation, and in addition native speakers can be contacted. This sample is heavily skewed towards European languages. The other sample contains 100 languages and is sampled so as to be representative of the world's languages (Haspelmath 1997:16–17). The questions that are asked of these two samples are correspondingly different: the big sample can only supply information about a few superficial properties, while the small sample can provide information on issues of a more abstract nature.
Haspelmath points out that indefinite pronouns appear to be a diachronically unstable phenomenon, exhibiting considerable variation even among closely related languages, and for this reason, too, use of a genetically and geographically biased sample is not as detrimental as might otherwise be the case (ibid.).24

24  On the other hand, indefinite pronoun systems show a clear continent-size areal pattern, with interrogative-based indefinite pronouns in the languages of Eurasia, America, and Australia, and generic-noun-based indefinites in Africa and Oceania, which appears to contradict the claim that they are diachronically unstable (Haspelmath 1997:241).



In fact, the variability of a feature on the micro-level (between closely related languages and dialects) is the sort of information which can easily go unnoticed in an investigation based on a globally representative sample—and which may well turn out to be important for the proper understanding of the phenomenon. This indicates that Middle Way-type investigations based on globally representative mini-samples may need to be combined with micro-level investigation to minimize the risk of missing important generalizations.

15.10  In Conclusion

Discussing the difference between research on language universals and linguistic typology, Comrie (1989) writes: '[T]he only difference [is] that language universals research is concerned primarily with limits on [the variation within human language—AH], whereas typological research is concerned more directly with possible variation.' And he continues: 'However, neither conceptually nor methodologically is it possible to isolate the one study from the other' (Comrie 1989:34). In this vein, I have stressed in this chapter the complementary nature of research in typological linguistics and generative, formal linguistics. The great strength of the typological program is its method of investigation, the large-scale comparison, these days supported by increasingly sophisticated methods of processing data. The successes of the program in this regard are indisputable. There are also some rather obvious problems pertaining to large-scale comparative investigations, in part stemming from the fact that few languages have so far been described in sufficient detail to allow large-scale comparison of more abstract grammatical properties. The functionalist orientation which is characteristic of linguistic typology stems at least in part from conditions on the method. If the properties compared are necessarily surface-observable, they will thereby be subject to some degree of unpredictable variation. Generalizations will therefore be exception-ridden, and thus often easier to explain in functional than formal terms. What the right explanation is, in any given case, is an empirical issue, though. I happily agree with Newmeyer (2005), Haspelmath (2008), and Hawkins (2013) that it is in the interest of everyone to distinguish between functionally and formally motivated properties of grammar; see chapter 7 in this connection. If a phenomenon has a credible functional explanation, then we can move on from there and look at the next phenomenon, without prejudice.

There is no principled reason why generative, formally-oriented linguistics should not engage in broad-based comparison, following the methodology of modern typological research. There are some good examples of this practice. Cinque (1999) on adverbs, Cinque (2010) on adjectives, and Julien (2002) on tense and aspect marking are three well-known examples. Each of them is based on surveys of several hundred languages, with a view to constructing a universally valid theory of the functional sequence in the sentence (Cinque 1999, Julien 2002) and the noun phrase (Cinque 2010), with consequences for formal hypotheses such as the Mirror Principle (Baker 1985) and the status of the notion 'word' in syntax.



Giuseppe Longobardi's project researching genealogical relations based on mass comparison of syntactic properties internal to the DP in large sets of languages could also be mentioned here (Gianollo, Guardiano, and Longobardi 2008; Longobardi and Guardiano 2009; see also chapter 16). Unfortunately these works remain rather isolated examples. There are many signs of increased, serious interest in typology within generative grammar, though, such as the ReCoS project at Cambridge (Roberts 2012). The SSWL database, created by Chris Collins and Richard Kayne, is also a step in that direction.



Chapter 16

Parameter Theory and Parametric Comparison

Cristina Guardiano and Giuseppe Longobardi

16.1  Introduction

A surface look at the syntax of human languages may give the impression that it contains only a few more universals than the eminently arbitrary domain of lexical expressions. In this vein, Evans and Levinson (2009:429) make the following statement: 'there are vanishingly few universals of language in the direct sense that all languages exhibit them,' and present it as conflicting with the claim that 'languages are all built to a common pattern.' In this chapter, however, we argue that both statements are true, and that they are far from irreconcilable: the conclusion that there is idiosyncratic variation is mainly the result of too concrete and close-to-surface a notion of 'universal.' The apparently wide diversity of human syntax is in fact the consequence of the intricate interaction of three levels of universally constrained concepts, whose limited variability is not directly and biuniquely reflected in the external linguistic structures: (a) a limited number of parameter schemata and some general implications among them; (b) specific implications within sets of parameters; (c) the classical clustering of co-varying manifestations under single parameters. All these levels contain a large amount of cross-linguistically invariant information. In order to capture it and its complex relation to visible diversity, it is necessary to explore the structure of a general theory of parametric linguistics.1 The subfield of parametric linguistics can be minimally defined by four fundamental questions:


1  This work was partly supported by the ERC Advanced Grant 295733 LanGeLin (Principal Investigator: G. Longobardi). We are grateful to A. Ceolin for performing computational experiments.



(1)  a. What are the actual parameters needed to discriminate human languages?
     b. What is the format of a possible parameter?
     c. How do parameter values distribute in space and time?
     d. What kind of input sets each of the parameters allowed by UG?

To concretely answer instances of question (1a), Longobardi (2003) proposed a heuristic methodology called Modularized Global Parametrization (MGP). In order to address question (1b), a minimalist critique of the general form of parameters was suggested by Longobardi (2005), sketching the programmatic lines of a Principles & Schemata model of UG; this was subsequently refined in Gianollo, Guardiano, and Longobardi (2008) and Longobardi (2016), and will be discussed later in this chapter. Lastly, to address (1c), Guardiano and Longobardi (2005) and Longobardi and Guardiano (2009) proposed the Parametric Comparison Method (PCM), a parametric approach to the study of phylogenetic relationships among languages: they claimed that the synchronic and the historical study of formal grammar can be ultimately related within a unified framework, made available precisely by the rise of parametric linguistics. The pursuit of these lines of research is enhancing the empirical knowledge of parameters and parameter values across languages, thereby increasing the possibility of answering (1d) as well. In what follows, we focus on two aspects of this domain: (a) the pervasive role of the implicational structure of parameters in producing the surface appearance of unconstrained diversity; (b) the fact that the reality of parameter systems as theories of grammatical variation and its implicational structure receives novel support from their success with historical issues.

16.2  Parametric Linguistics and Levels of Adequacy

Parameter theory was proposed as a way to solve the tension between descriptive and explanatory adequacy (Chomsky 1964) posed by language diversity (Chomsky 1995b:7; see chapters 5 and 14). In fact, parametric analyses have been very successful in attaining an intermediate level that we can call cross-linguistic descriptive adequacy, i.e., in accounting for grammatical diversity across languages (see Kayne 2000; Biberauer 2008; Biberauer et al. 2010; etc.) with the simplest axiomatization possible in each case: as such, they represent the most sophisticated theories of grammatical variation to date. As far as the level of explanatory adequacy is concerned, there is no doubt that parametric theories represent a conceptually plausible program, probably the only one so far proposed, but they have not yet been worked out as a realistic model of language acquisition. That is, no one has yet been able to implement a parameter setting system over a sufficiently large number of parameters (see Fodor 2001; Yang 2002; Wexler 2015; and chapter 11).



Anyway, the cross-linguistic descriptive empirical evidence gathered so far, once systematically organized, is already sufficient to provide some general insights into the scope and structure of parametric variation. The PCM is an attempt to formally organize part of this body of knowledge. In principle, any particular instantiation of the PCM is a quintuple:

(2)  a. a set of parameters;
     b. a set of UG principles defining the scope and interactions of such parameters;
     c. a set of triggers for parameter states as potentially available to linguists (presumably a superset of those available to first-language learners);
     d. an algorithm for parameter setting;
     e. an algorithm to calculate distances between strings of parameter states.

As a result of the application of (2a–e) to language data, the core grammar of every natural language is represented by a string of binary symbols, each coding the state of a parameter: such strings can easily be collated and used to define exact correspondence sets for theoretical or historical purposes.
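As a concrete illustration of step (2e), the sketch below compares two such parameter strings in Python. It is a minimal sketch under simple assumptions: a normalized Hamming-style distance computed only over parameters that are actually set in both languages. It is not the distance measure defined for the PCM itself (for which see Longobardi and Guardiano 2009), and the toy strings are invented:

def parametric_distance(lang_a: str, lang_b: str) -> float:
    """Distance between two parameter strings over the alphabet {+, -, 0}.

    A '0' marks a neutralized parameter, which carries no independent
    information, so such positions are excluded from the comparison.
    """
    assert len(lang_a) == len(lang_b), "strings must cover the same parameter list"
    same = diff = 0
    for a, b in zip(lang_a, lang_b):
        if a == "0" or b == "0":
            continue  # parameter not independently set in one of the languages
        if a == b:
            same += 1
        else:
            diff += 1
    comparable = same + diff
    return diff / comparable if comparable else 0.0

# Toy five-parameter strings (hypothetical values):
print(parametric_distance("+-+0+", "--0++"))  # 3 comparable states, 1 mismatch -> 0.333...

Pairwise application over a set of languages yields a distance matrix of the kind used for the historical comparisons mentioned above.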

16.3  The Domain of Data

One of the most successful methods of identifying and describing syntactic variation within a parametric approach is to take two or a few more languages, otherwise rather similar, and to focus on varying properties, grouping them together, also on the grounds of poverty-of-stimulus arguments and simple covariation observations. This method provides deep and often correct grammatical insights, but, alone, may lead little further than the study of single parameters in relatively few languages. Another method is using fully extended typological coverage, in the style of Greenberg (1963), Hawkins (1983), and so forth; the application of this method comes up against two types of shortcomings. First, the depth of grammatical investigation required by a sound parametric analysis is not consistent with traditional typological descriptions. For example, descriptions in terms of Dixon's (1998) Basic Linguistic Theory are often insufficient for a parametrically-oriented linguist to determine the setting of actual parametric values. Similarly, parametrically relevant information can hardly be gleaned from massive collections of data such as the World Atlas of Language Structures/WALS (Baker 2008b; see also the discussion in chapter 15, section 15.6). Second, the method is likely to be insufficient if investigation is not guided by strong abductive and theory-oriented considerations, even if hypothetically extended to all languages, because the cardinality of the universal parameter set is so large as to generate a number of possible languages of which the actual existing or known ones represent an extremely small sample (cf. Bortolussi et al. 2011; chapter 14, section 14.4.2).



The methodological dilemma raised by such inadequacies is the crucial issue to be addressed in order to establish a sound parametric linguistics. To resolve it, it is advisable to adopt a Modularized Global Parametrization (MGP, Longobardi 2003), an operational strategy that secures typological coverage and depth of insight, at the acceptable cost of limiting the analysis of the parameter system to specific modules. The MGP consists of the following combination of strategies:

(3)  a. starting from a relatively wide sample of languages and language families, but crucially comprising a substantive core of structurally similar varieties exhibiting clear minimal contrasts;
     b. focusing in depth on a limited subset of the parameter system, with its parameters hopefully subject to limited perturbation by the values of others outside the subset (but controllable and revealing interactions with each other within the subset).

In line with this strategy, several PCM experiments have been conducted on the internal structure of Determiner Phrases (Guardiano and Longobardi 2005; Longobardi and Guardiano 2009; Longobardi et al. 2013, 2015; Guardiano et al. 2016). A good amount of syntactic polymorphy of the DP module was encoded in successive empirical approximations to a type of representation, conventionally named Table A, in Figure 16.1 (with 75 parameters and 40 languages). Note that Figure 16.1 provides only part of this table: the full version is available online at www.oup.co.uk/companion/roberts. In Figure 16.1 each parameter is identified by a progressive number (first column) and by a combination of three capital letters (second column). We built a cross-linguistic morphosyntactic difference into Figure 16.1 as a parameter if and only if it entailed any of four types of surface phenomena: (a) the presence of obligatory formal expression for a semantic or morphological distinction (grammaticalization, i.e., the obligatory presence of a feature in the computation to obtain the relevant interpretation and its coupling with an uninterpretable counterpart); (b) the variable form of a category depending on the syntactic context (selection and licensing); (c) the position of a category (reducible to movement, i.e., + overt attraction, triggered by grammaticalized features, under a universal base hypothesis); (d) the availability in the lexicon of certain functional features/morphemes. Within the DP-module, further subdomains can be distinguished in Figure 16.1: the status of various features typically associated with the functional category that heads nominal structures (D), such as Person, Number, definiteness, …; the syntactic properties of nominal modifiers (adjectives and relative clauses), of genitival arguments, possessives, and demonstratives; the type and scope of 'N-movement' (referring to the movement of the relevant projection headed or characterized by N). In Figure 16.1, the alternative parameter states are encoded as + and –: such symbols have no ontological value, but only serve to indicate oppositions. The parameter states were set in the languages listed in Figure 16.1 on the basis of the administration of purpose-specific questionnaires (trigger lists, i.e., a list of the sets of potential triggers identified for each parameter). Triggers have been formulated using English as a metalanguage.



In defining the notion of trigger, we follow Clark and Roberts (1993:317): 'A sentence s expresses a parameter pi just in case a grammar must have pi set to some definite state in order to assign a well-formed representation to s.' Such a sentence (or phrase) s is thus a trigger for parameter pi.

16.4  The Implicational Structure of Parametric Diversity

Observable syntactic properties are often not independent of each other. Two levels of deductive structure are to be considered in the first place, one related to the classical concept of parameter, since Chomsky's first proposals at the end of the 1970s, the other to more recent empirical and theoretical work.

As for the first point, syntactic parameters must already intrinsically encode the rich implicational relations supposedly connecting distinct observable phenomena at the level of abstract cognitive entities: their robust deductive structure captures the cross-linguistically systematic covariation of many superficial properties (see chapter 14 for more discussion and illustration).

A further major formal feature of parameter sets (Fodor 2001:735; Baker 2001; Longobardi 2003; Biberauer and Roberts 2013; see also chapter 14, section 14.10) is the fact that parameter values are tightly interrelated. Such an implicational aspect of parametric systems challenges the independence of characters: one particular value of a certain parameter, but not the other, often entails the irrelevance of another parameter, whose consequences (corresponding surface patterns) become predictable. Therefore, the latter parameter will not be set at all in some languages and will represent completely implied information in the language.

The investigation of the DP-module through the PCM, and the rich amount of information collected in Figure 16.1, has provided evidence of a remarkably intricate implicational structure of parameter systems and has revealed how pervasive the phenomenon is in a module of grammar of a minimally realistic size (Guardiano and Longobardi 2005; Longobardi and Guardiano 2009; Bortolussi et al. 2011; Longobardi et al. 2013). The use of a compact module thus turned out to be helpful to maximize control of the dependency structure of the parametric system, since the consequences of cross-parametric implications are more likely to be detectable in close syntactic spaces: at least all the pairs of the parameters of the sample used have been checked conceptually and empirically for the possibility of bearing implications. Implications are sometimes 'empirical,' such as Greenberg's (1963) Universal 36: Gender can be grammaticalized (parameter 5, FGG) only in languages in which Number is grammaticalized as well (parameter 3, FGN). Others are 'logical,' or at least suggested by virtual conceptual necessity: in languages in which the functional (postadjectival) genitive projection called 'GenO' is not available (−54, GFO), it will be impossible to ask whether it interacts with other categories, for example, if it is crossed over by N; therefore, it will be irrelevant to set parameter 65 (NGO).




Figure 16.1  A sample from Longobardi et al. (2015), showing 20 parameters and 40 languages. The full version of the table, showing all 75 parameters, can be found at www.oup.co.uk/companion/roberts.

In Figure 16.1, the irrelevance of a certain parameter in the language as a consequence of the partial interactions among parameters is marked by a 0. The conditions which must hold for each parameter not to be neutralized are indicated in the third column, after the name of the parameter itself. In expressing parameter states, a 0 is used if and only if:

(4)  a. as a consequence of interactions with parameter m, no construction which could trigger parameter n is ever manifested in the weakly generated language;
     b. the interaction between m and n necessarily produces surface consequences which are indeed identical to those obtained when n is set to + or to −, respectively.

Figure 16.1 encodes the set of parameters in a partial order. The criterion followed in choosing such an order is the direction of dependence for interacting parameters: as can be seen from the numbers in the labels, conditioning parameters always precede those which depend on their values (p2 is settable iff p1 is set to +; p3 is settable iff p2 is set to +, and so on). It is important to stress that the full set of implicational relations among parameters appears to be far from the simple parametric hierarchies of the type familiar from Biberauer and Roberts (2013; see also chapter 14, section 14.10).




In the latter cases, the settability of one parameter depends on one and not the opposite value of a single other parameter, potentially in a recursive fashion:

Parameter 1?
├── N
└── Y → Parameter 2?
        ├── N
        └── Y → Parameter 3?
                ├── N
                └── Y → …

In fact, many parametric implications fall outside this felicitous format. As illustrated directly below, a parameter's settability may depend (1) on the conjunction of values of more than one parameter at a time; (2) on the disjunction of values of other parameters, i.e., either on the value of another parameter or on the value of a third parameter; (3) on the absence of a certain value of another parameter, i.e., on the presence of the opposite value or also of 0, expressible through the negation of a value. Such a more complex and realistic feature geometry can be coded in Boolean form, i.e., representing implicational conditions either as simple states (+ and –) of another parameter, or as conjunctions (written ','), disjunctions (or), or negations (¬) thereof. The huge number of 0s in Figure 16.1 is a first witness to the high level of potential redundancy that a complete parametric list would present a language learner with: out of 28 × 75 = 2,100 parameter states, 818 are null, i.e., 39% of the information is redundant.
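To make this Boolean feature geometry concrete, the sketch below shows one hypothetical way (not the authors' implementation) in which such settability conditions could be encoded and evaluated in Python; the example condition is the one given for parameter 13 (NSD) below, '(+3 FGN, ¬+7 FSN) or +11 DGR,' and the language states are invented:

from typing import Callable, Dict

# A language is a mapping from parameter number to its state: '+', '-', or '0'.
State = Dict[int, str]
Cond = Callable[[State], bool]

def is_(n: int, v: str) -> Cond:
    """Condition: parameter n has exactly the state v."""
    return lambda s: s.get(n) == v

def neg(c: Cond) -> Cond:
    """'¬': the condition fails (opposite value or 0)."""
    return lambda s: not c(s)

def conj(*cs: Cond) -> Cond:
    """',': all conditions hold."""
    return lambda s: all(c(s) for c in cs)

def disj(*cs: Cond) -> Cond:
    """'or': at least one condition holds."""
    return lambda s: any(c(s) for c in cs)

# Settability of parameter 13 (NSD): (+3, ¬+7) or +11
settable_13 = disj(conj(is_(3, "+"), neg(is_(7, "+"))), is_(11, "+"))

lang: State = {3: "+", 7: "-", 11: "-"}        # hypothetical values
print(settable_13(lang))                        # True: parameter 13 can be set
print(settable_13({3: "-", 7: "0", 11: "-"}))   # False: 13 would be marked 0

Evaluating conditions of this kind over a whole table, and writing a 0 wherever a condition fails, is what yields counts like the 818 null states cited above.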



The size of this redundancy with respect to the number of potentially generated languages is discussed in 16.5. An example of the depth of the implicational structure of parameters within the DP module is provided by the consequences of parameter 11, DGR. Parameter 11 (DGR) governs the requirement that the definite reading of a nominal argument be marked overtly. In languages with a definite article, such as the Romance languages, this parameter is set to +; instead, it is set to – in, say, Russian or Hindi. Now, the value – for parameter 11 (DGR) neutralizes, or contributes to neutralizing, the relevance of 9 other parametric choices, and indirectly (i.e., by transitivity) that of 15 additional parametric choices.

In particular, parameters 12 (CGR), 16 (DOR), and 74 (PDC) require that parameter 11 (DGR) be set to + in order to be relevant. Parameter 12's (CGR) most visible surface manifestation is the presence of an obligatory marker of a singular indefinite count noun in argument position, distinct from those used for definites and mass indefinites. Languages with –12 (CGR) do not syntactically distinguish, in argument position, indefinite singular nominals interpreted as mass from those interpreted as count. This property correlates with empty Ds non-locally interpreted as definite (Crisma 2015) through: (a) definiteness suffixes occurring on non-phrase-initial elements (Icelandic), (b) non-phrase-initial definite Genitives (Icelandic, Celtic, Semitic, …). In languages with +12 (CGR), indefinite nominals mark the count reading through a determiner-like element, most often a development of the numeral 'one,' and correspondingly (a) definiteness-bearing items must occur in the D area, (b) definite genitives may not transmit definiteness from a non-initial position (German). Languages not grammaticalizing definiteness seem not to distinguish count nouns from the rest in any context, hence the condition +11 (DGR). Parameter 16 (DOR) asks if the definiteness of the head of a relative clause is marked on the introducer of the relative clause itself, i.e., the latter overtly agrees in definiteness with the head (as in Arabic, +16, DOR), or not (–16, DOR). This parameter logically presupposes that definiteness is grammaticalized, hence its dependency on +11 (DGR). Parameter 74 (PDC) governs the option of using possessives as definite determiners (+74, PDC; Giorgi and Longobardi 1991) or not (–74, PDC). It opposes languages like French, where possessives occur without any visible article (mon livre vs. *le mon livre), to those like Italian, in which a visible determiner is possible and normally required instead (il/un mio libro vs. *mio libro). Only in languages like French does the possessive itself entail the definite reading of the whole DP (mon livre vs. il mio libro). This parameter conceptually and typologically depends on full grammaticalization of definiteness too (+11, DGR).

Parameter 20 (BAT) requires parameter 11 (DGR) to be set to –. In languages with +20 (BAT), unbounded nouns require a specific morpheme that is necessary to cancel number-neutral readings. Since definite articles, when present, are not ambiguous between singular and plural readings, the parameter is relevant only in languages with no definite articles; hence the implication –11 (DGR).



Parameter 27 (DGP) requires parameter 11 (DGR) not to be set to +: it can only be set if 11 (DGR) has the value – or is in turn neutralized by the value –, or by a 0, of parameter 2 (FGP). Parameter 27 (DGP) asks whether the definite reading of a nominal argument must be marked in the subset of cases in which it designates an entity already explicitly introduced in the previous discourse (anaphoric definiteness; +27, DGP), or not (–27, DGP). Therefore, it is relevant only for languages that do not have an obligatory marker for general definiteness (which includes anaphoric definiteness, too); hence the condition ¬+11 (DGR).

Parameters 13 (NSD), 15 (DCN), and 32 (TSP) have a disjunctive implicational condition: namely, to be relevant, they require that parameter 11 (DGR) be set to +, unless parameter 3 (FGN) is set to + and parameter 7 (FSN) is not set to +. Parameter 3 (FGN) distinguishes languages where Number distinctions are obligatory at least on some part of a DP from those which do not mark such distinctions systematically on DPs. In +3 (FGN) languages, superficially determinerless nominal arguments are possible only if Number is contextually identified on the head noun. Parameter 7 (FSN) precisely governs the existence of feature (in particular Number) concord between the D position and the head noun. The lack of adequate number exponence on the head noun in fact prevents the existence of bare argument nouns (Delfitto and Schroten 1991), forcing the generalized presence of some overt determiner (an article essentially neutral between definite and indefinite reading), even in languages which appear not to grammaticalize definiteness (e.g., Basque). Parameters 13 (NSD), 15 (DCN), and 32 (TSP) can be set only in the presence of a definite (+11, DGR) or a 'general' (+3, FGN and ¬+7, FSN) article in this sense.

Parameter 13 (NSD) defines whether attraction to the D area of referential nominal material (e.g., proper names; Longobardi 1994) is overt (e.g., Romance, Basque; +13, NSD) or not (e.g., Germanic, Celtic; –13, NSD). Movement can be replaced by the insertion of a filler in D, with expletive function, usually homophonous with the definite article (cf. Roma antica vs. l'antica Roma). In languages where N is always trapped in a low position (e.g., Greek), proper names can never overtly move to D; thus, the expletive remains the only viable alternative (Guardiano 2011). For principled Economy reasons (Longobardi 2008), common nouns seem universally unable to overtly raise to D; therefore, in +13 (NSD) languages the expletive represents their only possibility to be referential (i.e., kind-referring) names, rather than indefinites. Languages with –13 (NSD) do not require D to be filled in either of the aforementioned cases. The visible effects of this parameter appear to be neutralized in languages where D may remain empty; hence the condition (+FGN, ¬+FSN) or +DGR.

Parameter 15 (DCN) asks if the definite marker of a language is a bound morpheme cliticizing on the head noun, as e.g. in Romanian (+15, DCN), or not, as, for instance, in the rest of Romance (–15, DCN). The morphological merging of the definiteness affix with N may follow from overt N-movement and is morphologically quite uniform. In some languages, these suffixed nouns can never surface after an adjective (e.g., Romanian and Bulgarian). In others, they end up in a position below adjectives (i.e., Scandinavian).
In Romanian and Bulgarian, if an adjective occurs prenominally, the suffix still occurs on a DP-initial category, indeed on the first adjective itself. In Scandinavian the adjectives, always prenominal, precede suffixed nouns. The difference is likely to be typologically connected to parameter 13 (NSD): in +13 (NSD) languages (Romanian, Bulgarian) definite suffixes would contain Person specification and thus be overtly attracted to D along with their host; in –13 (NSD) languages (Scandinavian), the suffix would be unspecified for Person and unable as such to go to D with its host.



The occurrence on adjectives of definite suffixes is supposed to be contingent on them being specified for Person. This is excluded in Scandinavian, which is left with two main subcases for connecting definiteness to D over an A: if a language is +strong article (+12, CGR) a free-morpheme definite article is inserted in D (Mainland Scandinavian); if a language is –12 (CGR), D remains empty and long-distance interpreted as definite (Icelandic, Crisma 2015). The parameter logically presupposes the existence of an article; hence the condition (+FGN, ¬+FSN) or +DGR.

Parameter 32 (TSP) governs the possibility for demonstratives to replace (as, e.g., in Italian or English; +32, TSP) rather than co-occur with (as, e.g., in Celtic; –32, TSP) articles, hence it presupposes the existence of the latter, as formalized in the disjunctive condition in question.

Parameter 75 (ACL) has a disjunctive implicational condition, too; to be relevant, it requires parameter 11 (DGR) not to be set to +: it can only be set if parameter 11 (DGR) has the value – or a 0, unless parameter 74 (PDC) is set to –. Parameter 75 (ACL) governs the option, for possessives, to cliticize on adjectives (+75, ACL) or not (–75, ACL), only available in languages with structured adjectives (+35, AST) but without adjectival possessives (–72, APO). A language active for this parameter must also have a negative value for D-checking possessives (–74, PDC), unless it does not grammaticalize definite articles, hence the condition ¬+11 (DGR).

A further category of implications includes parameters whose values depend on parameter 11 (DGR) by transitivity, at various levels of embedding: this is a much richer set of parameters. A simple case is, for instance, parameter 21 (FGC), which can only be set if parameter 20 (BAT), which in turn depends on –11 (DGR), is not set to +; a further example is parameter 25 (DNN), which can be set only if parameter 15 (DCN), which has a disjunctive implication containing +11 (DGR), is set to –, and parameter 13 (NSD), which also has a disjunctive implication containing +11 (DGR), is set to +; an example of a more indirect implicational dependency on parameter 11 (DGR) is represented by parameter 22 (GBC), which depends on +21 (FGC), which in turn depends on 20 (BAT), which, as shown, can only be set in –11 (DGR) languages.

The complex deductive structure of any realistic set of parameters contributes to make non-transparent the relationship between most single parametric values of an I-language (Chomsky 1986b) and its actual manifestations in the corresponding E-language. But even sets of unimplied parameters interact in such a way as to determine nonbiunique relations between single parametric values and data: the linear sequence possessive–proper name in articleless argument nominals in Basque (gure Ion/*Ion gure) instantiates an opposite value for parameter 13 (NSD) with respect to the identical sequence in English (our John/*John our), but exactly the same parameter value as in Italian, in which instead only the reverse order is grammatical (Gianni nostro/*nostro Gianni).



The interaction of the values of parameters 6 (NOD) and 13 (NSD), which do not imply each other, suffices to produce a situation in which the same sequence in two different languages expresses the opposite values of the same parameter.
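Relevance conditions of this kind are explicit enough to be stated mechanically as predicates over partial value assignments. As a purely illustrative encoding (ours, not the actual machinery behind Figure 16.1), a handful of the conditions discussed above can be written in Python as follows; parameter labels follow the text, and a value assignment maps labels to '+', '-', or '0' (neutralized):

# Illustrative encoding of some of the implicational conditions above.
conditions = {
    'CGR': lambda v: v['DGR'] == '+',     # 12 requires +11
    'DOR': lambda v: v['DGR'] == '+',     # 16 requires +11
    'PDC': lambda v: v['DGR'] == '+',     # 74 requires +11
    'BAT': lambda v: v['DGR'] == '-',     # 20 requires -11
    'DGP': lambda v: v['DGR'] != '+',     # 27 requires not-+11
    'NSD': lambda v: (v['DGR'] == '+'     # 13: +11 ...
                      or (v['FGN'] == '+' and v['FSN'] != '+')),  # ... or (+3, not-+7)
}

def relevant(param, values):
    """A parameter with no listed condition is treated as always settable."""
    return conditions.get(param, lambda v: True)(values)

values = {'DGR': '-', 'FGN': '+', 'FSN': '-'}   # a Basque-like setting
print(relevant('CGR', values))   # False: 12 (CGR) is neutralized (value 0)
print(relevant('NSD', values))   # True: the disjunct (+FGN, not-+FSN) holds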

16.5  Encoding Implicational Constraints: Distances and Possible Languages

The relevance and impact of this system of universal implicational constraints can be better appreciated through a mathematical evaluation of their effects. Indeed, through the methods proposed in Longobardi and Guardiano (2009) and refined in Bortolussi et al. (2011), it was first of all possible to precisely measure the syntactic distance of all the language pairs generated from Figure 16.1. The Jaccard formula (i.e., the number of differences divided by the sum of identities and differences) was chosen to represent syntactic distances, because it almost completely neutralizes some numerical drawbacks of cross-parametric dependencies. Distances among the languages of Figure 16.1 are listed in Figure 16.2.

With the aid of an algorithm to sample uniformly from the space of parameter-value tuples satisfying all their implicational constraints, specifically devised by Bortolussi et al. (2011), it has become possible to calculate the number of possible languages generated by such a list of parameters as Figure 16.1, and to compare their distribution with that of the actual languages parametrically encoded in Figure 16.1. This was the first step towards understanding implicationally constrained grammatical systems, and towards measuring the impact of the implicational structure of parameters on the whole space of language variation.

A priori, the number of potential grammars generated by a system of n independent parameters would be 2ⁿ. Instead, in a simple, though pervasive, implicational structure (i.e., Biberauer and Roberts' 2013 system of parameter hierarchies), where each parameter depends on the setting of the previous one (otherwise it cannot be set), the cardinality of the set of generated grammars is n + 1. These cases represent two extremes along a continuum of implicational constraints of possible grammars. Bortolussi et al.'s algorithm (2011) was explicitly worked out in order to calculate the number of possible strings of parameter values (languages) generated according to a system of parametric implications. The number of admissible languages generated by the 75 parameters of Figure 16.1 respecting their acknowledged cross-parametric implications is 3.3 × 10¹⁵, a reduction of seven orders of magnitude, compared to the 2⁷⁵ = 3.78 × 10²² predicted under character independence (L. Bortolussi, A. Ceolin, p.c.). This skew makes it patent that implicational universals among parameters play a key role in constraining the number of possible languages, and provides the first empirical demonstration of the impressive downsizing of the possible space of syntactic variation induced by the implicational structure of parameters (see also chapter 14, section 14.10).
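The distance computation itself is simple to state. The sketch below implements the differences/(identities + differences) formula, skipping parameters that are unset (0) in either language, as described above; it is an illustration rather than the published implementation, and the parameter strings in the example are invented:

def parametric_distance(lang1, lang2):
    """Jaccard-style distance over strings of parameter values ('+', '-', '0').
    Positions where either language has 0 (parameter neutralized) are
    excluded from the comparison; assumes at least one comparable pair."""
    identities = differences = 0
    for a, b in zip(lang1, lang2):
        if a == '0' or b == '0':
            continue
        if a == b:
            identities += 1
        else:
            differences += 1
    return differences / (identities + differences)

print(parametric_distance('++-0+-', '+-0-++'))  # 0.5: 2 identities, 2 differences, 2 skipped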

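The two extremes just mentioned, 2ⁿ grammars for n independent parameters versus n + 1 for a fully chained hierarchy, can be verified by brute force on small systems. The toy enumerator below only illustrates the combinatorics and is unrelated to Bortolussi et al.'s (2011) actual sampling algorithm:

def count_languages(n, settable):
    """Count value strings over {+, -, 0} in which parameter i is 0 exactly
    when settable(i, earlier_values) fails, and freely +/- otherwise."""
    def extend(state):
        i = len(state)
        if i == n:
            return 1
        if not settable(i, state):
            return extend(state + ['0'])
        return extend(state + ['+']) + extend(state + ['-'])
    return extend([])

independent = lambda i, s: True                    # no implications
chain = lambda i, s: i == 0 or s[i - 1] == '+'     # strict hierarchy

print(count_languages(10, independent))  # 1024 = 2**10
print(count_languages(10, chain))        # 11 = 10 + 1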



Figure 16.2  Sample parametric distances (from Longobardi et al. 2015) between the 40 languages shown in Figure 16.1. The full version of the figure showing all parametric distances can be found at www.oup.co.uk/companion/roberts.

16.6  Testing Parameters through Historical Adequacy

PCM implementations are the most systematic attempts to answer (1a), i.e., to show that (a) parameters attain cross-linguistic descriptive adequacy on a large scale, (b) they crucially do so through a number of abstract universal principles and implications, and (c) the relation between such abstract structures of UG and the observed language variability is necessarily mediated by a long deductive chain. To independently support this view of the theory of syntactic variation it is important to show how it can address other questions in parametric linguistics. Questions (1b) and (1c) represent an attempt to go 'beyond explanatory adequacy' (Chomsky 2001b), by pursuing two further levels of adequacy, termed in Longobardi (2003) evolutionary and historical adequacy. Specifically, (1c) pertains to the level of historical adequacy, the most promising domain of application of the PCM.

In order for a comparison method, either typological or historical, to be fully successful, it must be based on comparanda able to identify precise correspondences between two or more languages (Guardiano and Longobardi's [to appear] correspondence problem). As observed by Watkins (1976), in order to establish syntactic correspondences among languages, one must compare items clearly falling into equivalence classes.



Following Roberts (1998), Longobardi (2003) suggested that parameter values provide such required systematic comparanda; owing to their universal character, in a comparison based on parameters, in principle there should not be any doubt about what is to be compared with what: a parameter in a given language must and can be aligned with the value of exactly the same parameter in other languages.

Parameters promise to be better indicators of general historical trends than many cultural features in at least three other respects. First, they are virtually immune from natural selection and, in general, from environmental factors: indeed, while lexical items can be borrowed along with the borrowing of the object they designate, or adapted in meaning to new material and social environments, nothing of the sort seems to happen with abstract syntactic properties. Thomason and Kaufman (1988) have suggested that, while lexical borrowing may take place heavily and independently of other types of interference in a variety of cultural contact situations, morphosyntactic properties are transmitted into another language under only two circumstances: in the case of language shift on the part of a population (language replacement, as studied in, e.g., Renfrew 1987) or in the case of close and centuries-long contact involving salient levels of bilingualism. Both situations, even though not strictly phylogenetic in nature, are extremely significant from a historical point of view and thus naturally recorded by any effective method of historical linguistic analysis. Second, parameter values appear to be unconsciously and rather uniformly set by all the speakers of the same community in the course of acquisition; therefore they are largely unaffected by deliberate individual change which, according to Cavalli-Sforza (2000), may influence the history of other culturally transmitted properties. Third, at least in some obvious cases, parameter values seem able to exhibit 'long-term' historical persistence: the inertial view of diachronic syntax, advocated in Longobardi (2001b) as a specific minimalist implementation of Keenan's (1994) Inertia, proposes that parametric syntax should be among the most conservative aspects of a language, thus the most apt to preserve traces of original identity. This is based on the assumption that, while semantic and phonological features can be sources of primitive changes (i.e., spontaneous deviations from the target language in acquisition), formal features are deeper, and their change can only be either a direct or indirect consequence of some interface/lexical innovation, or of some direct but even more external influence (e.g., borrowing) on the grammar.

The historical reliability of the PCM was statistically evaluated according to two different criteria: the analysis of the distribution of the calculated distances, and the probability of relatedness for the closest language pairs. To be historically probative, distances should present a statistically significant distribution, i.e., they should be scattered across different degrees of similarities. In order to check this, the distribution of the real-world language pairs from the 40 languages of Figure 16.1 was plotted against that of the randomly generated pairs using Bortolussi et al.'s (2011) algorithm (Longobardi et al. 2016).
The result, modulo scaling the sizes of the two curves, is the graph represented in Figure 16.3: here, the actual distances (grey curve) are scattered across a large space of variability (ranging from 0 to 0.625), suggesting that their distribution is non-random, and thus calls for a (historical) explanation.



Figure 16.3  A Kernel density plot showing the distribution of observed distances from Figure 16.2 (grey curve), observed distances between cross-family pairs from Figure 16.2 (dashed-line curve), and randomly generated distances (black curve). From Longobardi et al. (2015).

Our analysis of probabilistic significance is inspired by the notion of individual-identifying evidence (Nichols 1996). Longobardi and Guardiano (2009) already argued, under elementary binomial evaluations, that the probability of relatedness for the closest language pairs of their Table A met Nichols' requirements. Bortolussi et al.'s (2011) statistical procedures confirm such a probability as significant.

A further test of the historical informativeness of the calculated distances, again made possible by Bortolussi et al.'s (2011) experiment, is based on Guardiano and Longobardi's (2005) Anti-Babelic principle ('similarities among languages can be due either to historical causes or to chance; differences can only be due to chance: no one ever made languages diverge on purpose'). In a system of binary equiprobable differences, in order for syntactic distances to display a historically significant distribution, we must expect the following empirical tendencies: (1) pairs of languages known to be closely related should clearly exhibit a distance between 0 and 0.5; (2) other pairs should (at least under a monophyletic assumption) tend toward distance 0.5; (3) distances between 0.5 and 1 should tend toward nonexistence. The Anti-Babelic Principle predicts that two completely unrelated languages should exhibit a distance closer to 0.5 than to 1: therefore, a system of parametric comparison producing too huge a number of distances higher than 0.5 is unlikely to correctly represent historical reality. Such an expectation could be tested for the first time precisely as an outcome of the PCM, because it has measurable effects only in discrete and universally bounded domains like parametric syntax, but hardly in virtually infinite ones like the lexicon. The distribution of the actual syntactic distances calculated from Figure 16.1 correctly meets these expectations. Indeed, only eight pairs exceed the median of the random distance distribution (0.5). Most importantly, no real language pair exhibits a distance higher than 0.842, i.e., the distance above which it is very unlikely for a language pair to appear by chance, given this set of parameters (its 'Babelic threshold').

These results already lead us to expect that the number of historically incorrect taxonomic relations provided by such a parametric sample will be limited. Since Longobardi and Guardiano (2009), various language taxonomies have been produced using parametric distances, with the aid of distance-based computational algorithms provided in the PHYLIP package (Felsenstein 1993). Figure 16.4 represents a taxonomic tree built from Figure 16.2 using KITSCH. All the experiments performed so far provided largely correct tree-like phylogenies and networks (Longobardi et al. 2013; Guardiano et al. 2016). Here, most genealogical relations among the 40 languages are correctly represented. As predicted, Indo-European is recognized as a unit, with the exception of Farsi.2 Within Indo-European, all the established genealogical subgroupings (Romance, Germanic, Slavic, Greek, Indo-Aryan, Celtic) are correctly identified. Outside Indo-European, as expected, the clusterings of the three Finno-Ugric languages, the two Altaic, the two Semitic, and the two Sinitic ones are also recognized. This proves that the PCM succeeds in empirically detecting historical signals within well-established families. A further good result is that the isolates are kept distinct from the core groups and from each other.

Figure 16.4  KITSCH tree generated from the parametric distances of Figure 16.2. (Leaves, top to bottom: Spanish, Portuguese, French, Sicilian, Italian, Romanian, Danish, Norwegian, Icelandic, English, German, Serbo-Croatian, Slovenian, Polish, Russian, Bulgarian, Greek (standard), Cypriot Greek, Marathi, Hindi, Pashto, Irish, Welsh, Estonian, Finnish, Hungarian, Turkish, Buryat, Basque (central), Basque (western), Farsi, Inuktitut, Arabic, Hebrew, Cantonese, Mandarin, Kadiweu, Japanese, Wolof, Kuikúro.)

2 Farsi turned out to be problematic for classification in many respects, not only for parametric taxonomies (Longobardi et al. 2013), but partly also for lexical ones (Dyen et al. 1992).
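The notion of a 'Babelic threshold' can be illustrated with an elementary binomial computation of the sort alluded to above. The sketch below assumes, purely for illustration, k independent binary parameters on which two unrelated languages differ with probability 1/2 each; the actual estimate in Bortolussi et al. (2011) instead samples from the implicationally constrained space, and the significance level used here is an arbitrary choice:

from math import comb

def tail(k, d):
    """P(at least d differences out of k comparable binary parameters),
    under the null model where each differs independently with prob. 1/2."""
    return sum(comb(k, j) for j in range(d, k + 1)) / 2 ** k

def babelic_threshold(k, alpha=0.001):
    """Smallest normalized distance d/k whose chance probability is below alpha."""
    for d in range(k + 1):
        if tail(k, d) < alpha:
            return d / k
    return 1.0

print(babelic_threshold(60))  # ~0.717 with these toy settings: 43+ differences out of 60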

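The taxonomic step can likewise be approximated with any distance-based clustering algorithm. As a self-contained stand-in for KITSCH (a Fitch–Margoliash method assuming a molecular clock), the sketch below applies UPGMA, a simpler clock-based agglomerative method, to a toy matrix; the distances are invented placeholders, not the values of Figure 16.2:

def upgma(names, dist):
    """Minimal UPGMA: repeatedly merge the closest pair of clusters,
    averaging inter-cluster distances weighted by cluster size.
    Returns a parenthesized (Newick-style) tree as a string."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    nxt = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)                 # closest pair of clusters
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        del d[frozenset((a, b))]
        for c in clusters:                       # size-weighted average distances
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((nxt, c))] = (na * dac + nb * dbc) / (na + nb)
        clusters[nxt] = ('({},{})'.format(ta, tb), na + nb)
        nxt += 1
    return next(iter(clusters.values()))[0]

# Invented toy distances (NOT the published values):
names = ['Spanish', 'Italian', 'French', 'German', 'English']
dist = [[0.00, 0.10, 0.15, 0.45, 0.40],
        [0.10, 0.00, 0.12, 0.44, 0.42],
        [0.15, 0.12, 0.00, 0.46, 0.41],
        [0.45, 0.44, 0.46, 0.00, 0.20],
        [0.40, 0.42, 0.41, 0.20, 0.00]]
print(upgma(names, dist))  # (((Spanish,Italian),French),(German,English))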


To sum up, in spite of Lightfoot's (1979, 1991, 2006) and Newmeyer's (2005) differently motivated skepticism, parameters appear to carry a historical signal, indeed a chronologically deep, statistically robust, and prevailingly vertical one. These results validate the PCM as a useful tool for measuring linguistically significant similarity, and in turn support parametric grammar as a realistic model of historical language transmission.

16.7  Parameters and Horizontal Transmission

Further historical support for the parametric model of variation is provided by the application of the PCM to the study of dialect microvariation. In principle, two types of historical explanations are available for non-accidental linguistic similarity: vertical divergence from a common ancestor and horizontal (areal) convergence (see also chapter 15, especially section 15.5). The results described in the previous sections show that the PCM is able to capture an amount of genealogical information (vertical signal) higher than prima facie expected and, correspondingly, that horizontal transmission of parameter settings is a less widespread phenomenon than might superficially appear.

To better understand the very nature of parameter borrowing, Guardiano et al. (2016) explored syntactic dialectal microvariation in two strong contact areas—Southern Italy and Asia Minor—and argued that parameter systems tend to resist changes motivated by external pressures (and thus not already banned by other restrictive conditions hypothesized on acquisition/diachronic resetting, such as Inertia): the Resistance Principle (Guardiano et al. 2016) poses a minimal condition on horizontal transmission in syntax that actually complements and further restricts Inertia, and predicts that the resetting of a parameter under the influence of interference data is possible only if the new triggers are 'familiar' to the borrowing language (though not sufficient on their own to trigger the new value).

The study of microvariation also provides evidence that parameters of different 'sizes' (see Roberts 2012 and chapter 14) may capture different degrees of closeness between languages. Indeed, the exploration of syntactic variation under a finer-grained perspective reveals aspects of variation that can only be detected within single groups of dialects: the parameters required for this type of investigation (as recognized by Guardiano et al. 2016) belong to Roberts' (2012) category of microparameters, unlike most of the 75 parameters of Figure 16.1, which instead belong to the class of mesoparameters (see chapter 14, section 14.5). Thus, pairs of independently (diachronically, geographically, sociolinguistically) distant languages and pairs of independently close languages differ with respect to sizes and sorts of syntactic properties: this reflects the fact that not all parameters have the same stability, and therefore different parametric changes can non-accidentally trace splits of variable historical depth.




16.8  Parameter Schemata

Summing up, parametric analyses represent the most promising approaches to resolve the tension between descriptive and explanatory adequacy. They attempt to do so by reducing most of the surface syntactic variation to a deductively intricate combination of parameter values, further limited by invariant implicational constraints. This system of abstract universals and open choices could be laid down explicitly and tested through the PCM against independent evidence, namely its capacity to attain historical adequacy. To further validate such parametric theories, one must test if they can also represent conceptually plausible models of the faculty of language and its natural history. This means being able to address question (1b).

Question (1b) has to do with a new type of tension, this time between descriptive and evolutionary adequacy: if parameters are included in a theory of UG, minimization of the genetic endowment in the faculty of language must probably amount to minimizing the number of parameters as well, thus resulting, if anything, in a reduction, rather than in the observable extension, of the space of variation (see again chapter 14, especially section 14.5). A possible indication of the solution to this puzzle comes from a minimalist critique of the format of parameters (Longobardi 2005), in order to explore whether they obey principled restrictions imposed by general properties of computational efficiency and conditions at the interfaces. As a second step, a minimalist critique of the format of parameters should address the problem of a principled explanation for these schemata, i.e., of the conditions which are supposed to be optimally satisfied by such a parametric system in order to meet evolutionary adequacy.

Fundamentally accepting Borer's (1984) insight that parameters are always properties of functional heads of the lexicon as a point of departure, Longobardi (2005; now also see Gianollo, Guardiano, and Longobardi 2008, Longobardi 2016) proposed that the format of parameters can be reduced to a set of general and pervasive abstract schemata, such as those listed here:

(5) a. Is F, F a feature, grammaticalized?
b. Does F, F a grammaticalized feature, Agree with X, X a category (i.e., probes X)?
c. Is F, F a grammaticalized feature, 'strong' (i.e., overtly attracts X, probes X with an EPP feature)?
d. Is F, F a grammaticalized feature, spread on Y, Y a category?
e. Does a functional category (a set of lexically co-occurring grammaticalized features) X have a phonological matrix φ?
f. Does F, F a grammaticalized feature, probe the minimal accessible category of type X (or is pied-piping possible)?
g. Are f1 and f2, the respective values of two grammaticalized features, associated on X, X a category?
h. Are f1 and f2, two feature values associated on X, optionally associated?
i. Does a functional feature (set) exist in the vocabulary as a bound/free morpheme?



The nine schemata define, then, nine corresponding types of parameters:

(6) a. Grammaticalization parameters
b. Probing parameters
c. Strength (or EPP) parameters
d. Spreading parameters
e. Null category parameters
f. Pied-piping parameters
g. Association parameters
h. (Inclusive) disjunction parameters
i. Availability parameters

‘Grammaticalized’ in (5a) means that the feature must obligatorily occur and be valued in a grammatically (generally) rather than lexically (idiosyncratically) definable context, e.g., the definite/​indefinite interpretation of D is obligatorily valued and marked in argument DPs in certain languages (say English or Italian), but not in others (say Russian or Latin). This does not mean that even the latter languages cannot have lexical items occasionally used to convey the semantic meaning of definiteness (presumably demonstratives and universal quantifiers can convey such a meaning in every language), but in this case the feature ‘definiteness’ would be regarded as a lexical, not a grammatical one. (5b) asks whether a certain feature requires establishing a relation with a specific category in the structure, creating a dependency (acts as a probe searching a certain syntactic space for a goal, in Chomsky’s 2001 terminology). Optimally, the domain of probing (i.e., the scope of application of Agree) should be determined by universal properties of features and categories, and from variation affecting the latter (arising from schemata such as (5g) and (5h)); hence, (5b) could perhaps be eventually eliminated from parameter schemata and the relative labor divided, such as between (5a) and (5c). However, some dimension of variation in that spirit probably has to be maintained at the level of externalization properties, especially governing whether head movement takes place in a language to form, say, N+enclitic article or V+T clusters. Further questions arise with respect to clitics in general (Roberts and Roussou 2003, Roberts 2010d). (5c) corresponds to the traditional schema inaugurated by Huang (1982) for wh-​ questions, asking whether a dependency of the type mentioned in (5b) involves overt displacement of X, i.e., remerging of X next to F, or not. Innumerable cases of cross-​linguistic variation of this type have been pointed out. (5d) asks if a feature which is interpreted in a certain structural position also has uninterpretable counterparts on other categories which depend on it for valuation. This is meant to cover the widespread phenomenon of concord, such as in φ-​features: attributive adjectives agree in gender and number with determiners in, say, Italian, though not in English, or nouns agree in number with determiners in English, though not in Basque. Though ultimately morphological, these differences may trigger salient



(5e) defines whether some bundle of universal meaning features is always null in the lexicon of a certain language: for example wh-operators in comparative clauses seem to be null in English, overt in Italian. Similarly, English appears to have a null version of the complementizer that (in both declaratives and relatives) which is unknown in French (or German). Recent work by Kayne (2005b) has made several inspiring proposals in this sense.

(5f) is inspired by work by Biberauer and Richards (2006). If pied-piping is allowed in a specific construction, then ideally other bounding conditions should establish if it occurs optionally or obligatorily (probably with a general marked status of optional pied-piping).

As for (5g) and its specification (5h), Gianollo, Guardiano, and Longobardi (2008:120) suggested that:

the other potential candidate for schema status is represented by lexico-syntactic parametrization regarding the encoding of some universally definable features—say, [+pronominal], [+anaphoric], [+variable], [+definite], [+deictic] and so on—in different categories. [. . .] This latter schema was [. . .] used by Sportiche (1986), to account for the peculiarities of Japanese zibun and kare as opposed to English anaphors and pronouns.

Sportiche (1986) suggested that different languages may distribute certain valued features on different bundles of other valued features (basically, the feature +Bound Variable seems associated also with –Anaphoric, +Pronominal in English, but only with +Anaphoric, −Pronominal in Japanese).

(5i) asks which features from our encyclopedia, apart from the grammaticalized ones which will be obligatory in defined contexts, can be expressed by a functional, closed-class, bound, or free morpheme in a given language, whether or not it has other consequences, such as probing.

The role of a parameter schemata model becomes evident once a full and coherent subsystem of parameters is extensively investigated: in fact, the theoretical formulation and the empirical validation of the DP-parameters in Figure 16.1 enable us to investigate the functioning of the parameter schemata system on a relatively stable and exhaustive basis. As a matter of fact, up to 68 out of the 75 parameters in Figure 16.1 seem reducible to the schemata presented (seven are still uncertain in status and may derive from segmental or prosodic phonological properties). For example, the already mentioned parameter 11 (DGR) is an instance of type (6a): it asks whether a certain functional feature is grammaticalized or not. Parameter 15 (DCN) belongs to type (6b), because the (definite) article phonologically probes a [+N] head. Parameter 13 (NSD) belongs to type (6c), because it asks about the strength of the referential feature person, hosted in the D category. Parameter 16 (DOR) asks whether introducers of relative clauses must overtly agree in definiteness with the head noun, namely whether the functional feature definiteness is spread on the category 'relative clauses'; thus, it instantiates a case of type (6d).



An instance of type (6e) is given by parameter 25 (DNN), which governs the possibility of a phonologically null head. Parameter 42 (NPP) belongs to type (6f); it defines a particular type of overt noun raising: N successive-cyclically moves to higher functional layers of the DP by pied-piping at each step all the material it has come to precede at the previous one. Parameter 74 (PDC) asks whether the functional feature definiteness is associated to the meaning feature 'possessives,' thus representing an instance of type (6g). While there are no instances of type (6h) among the 75 parameters of Figure 16.1, the last type, (6i), is instantiated for instance by parameters 51 to 54, which define the availability of functional projections and morphemes in nominal arguments.

This way, parameter schemata derive actual parameters, which can be literally constructed out of functional features, lexical categories, and parameter schemata, and set under standard assumptions. In a format like Figure 16.1, which ideally proposes a complete parametric representation of the cross-linguistic structure of DPs, we expect to find different subsystems, each associated to the syntactic impact of the features encoded in the DP-structure cross-linguistically, and each reflecting, for the relevant feature, the possibility of its grammaticalization, the inventory of lexical categories that can check it, the inventory of lexical items that may host a copy of it, the visibility of the mechanism fulfilling the checking requirement, and the size of the moved element.

Yet, in fact, Figure 16.1 is far from instantiating all the predictable possibilities. This may be due in part to the particular sample of languages, which may accidentally fail to exhibit all possible variation of that domain; but other gaps may have deeper causes: it is not necessary for all parameter schemata to be realized for every possible functional feature and all potentially relevant categories. This follows from two factors. First, schemata may, again logically or empirically, predict interactions of specific parameters with each other: for instance, a possible interesting empirical hypothesis is that grammaticalization of a feature (schema (5a)) is a precondition for the instantiation of some other schemata with respect to the same feature. Logically, value + for a parameter of schema (5b), ± F-checking X, is a precondition for the corresponding parameter of schema (5c), ± strong F, and value + for a parameter of schema (5c) is, in turn, a precondition for its (5f) correspondent, ± minimal size of X. At least a substantial part of the redundancy caused by partial interaction among specific parameters is then simply deducible from the more general and abstract interaction among the relevant schemata to which parameters belong. Second, there might be specific principles of UG altogether forbidding variation of an a priori admitted format for particular combinations of features and categories: for example, if it is really the case that in all languages sentences have subjects, then T(ense) must have a feature that is universally grammaticalized (and perhaps strong).

The approach sketched may have consequences for the conception of UG itself: it becomes unnecessary to suppose that the initial state of the mind comprises highly specific parameters.
Rather, it may just consist of a more restricted number of parameter schemata, which combine with the appropriate elements of the lexicon (features and categories) under the relevant triggers in the primary data to yield both the necessary parameters (formulate the questions) and set their values (provide the answers) for each language (Longobardi 2016):



(7) Principles & Schemata model: UG = principles and parameter schemata. Parameter schemata at the initial state of language acquisition S0, closed parameters at the final/steady state of language acquisition SS.

The apparently huge number of possible core parameters depends, in principle, on the more limited numbers of functional features F and of lexical categories X, Y, combined with the tiny class of parameter schemata. The Principles & Schemata model determines the possibility of huge arithmetical simplification in the primitive axioms of the theory of grammatical variation: exactly as parameters were adopted as cross-constructional generalizations, significantly reducing the number of atomic points of variation, parameter schemata, in the intended sense, are more abstract, cross-parametric entities, allowing further reduction of the primitives of variation.
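The arithmetic of (7), schemata crossed with features and categories, can be pictured with a toy generator. Everything in the sketch below is a placeholder illustration: the inventories are hypothetical, only three of the nine schemata are included, and in the model itself UG principles and implicational conditions would prune and condition the resulting set:

FEATURES = ['definiteness', 'number', 'person']     # hypothetical inventory
CATEGORIES = ['D', 'N', 'A']                        # hypothetical inventory
SCHEMATA = {  # (question template, does the schema mention a category?)
    '(6a) grammaticalization': ('Is {f} grammaticalized?', False),
    '(6c) strength': ('Is {f} strong (does it overtly attract {c})?', True),
    '(6d) spreading': ('Is {f} spread on {c}?', True),
}

def derive_parameters():
    """Cross schemata with features (and categories where relevant); in the
    Principles & Schemata model, UG principles would prune this raw product."""
    for label, (template, needs_category) in SCHEMATA.items():
        for f in FEATURES:
            for c in (CATEGORIES if needs_category else [None]):
                yield label, template.format(f=f, c=c)

params = list(derive_parameters())
print(len(params))          # 3 + 3*3 + 3*3 = 21 candidate parameters
for label, question in params[:4]:
    print(label, '->', question)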

16.9  Conclusion and Perspectives

Parametric approaches to syntactic variation are being systematically implemented and can be tested through the PCM. So far, they have been successful from the viewpoint of wide cross-linguistic descriptive adequacy and of historical adequacy; furthermore, they lend themselves well to reduction to more minimalist concepts, like a system of parameter schemata, thus promisingly moving toward a level of evolutionary adequacy.

Parametric theories propose a very abstract model of the relationship between invariance and polymorphy of grammars. They suppose that UG comprises at least:

(8) a. a limited set of variation schemata
b. a set of principles further constraining such variation, divided into:
i. absolute
ii. general implicational
iii. particular implicational

Absolute principles restrict the scope of variation, as in the case of sentential subjects recalled in 16.8; general implicational principles govern the relations among schemata, and particular implicational principles the relations among specific parameters. Figure 16.1 formalizes a vast number of such implications.

Surface polymorphy of human syntax is also very heavily influenced by the interactions of parameter manifestations with each other, often between parameters of different size (i.e., from macro- to nanoparameters in Roberts' 2012 sense; see also chapter 14): as a result, in some cases, the surface manifestations of the same value of the same parameter in different languages may even be disjoint sets, if defined in terms of linear orders of supposed exactly matching categories (this seems to be the case with respect to, say, value + at parameter 13, NSD between Romance and Basque); and, as noticed, the same linear pattern may even be the manifestation of the opposite values of the same parameter in different languages.



Thus, the actual connection between the nontrivial set of UG structures referred to in (8b) and observable linguistic constructions is necessarily mediated by a long deductive chain. Through the parametric structure of variation any axiom of the theory of grammar becomes 'proteiform' on the surface, in the sense of its 'physical' manifestations being relativized to the whole set of values of the specific language. Therefore, owing to this implicational structure, it is possible for the human faculty of language to possess a number of invariant properties (conceivably all the implications notated in Figure 16.1 are universal as implicational principles), though it is hardly the case that they emerge in the data 'in the direct sense that all languages exhibit' (Evans and Levinson 2009:429) the same visible manifestations for them.



Part V

WIDER ISSUES





Chapter 17

A Null Theory of Creole Formation Based on Universal Grammar

Enoch O. Aboh and Michel DeGraff

Creole languages in the Caribbean are among the outcomes of peculiar historical processes linking Europe, Africa, and the Americas: these languages are the linguistic side effects of global economies based on the forced migration and labor of enslaved Africans toiling in European colonies in the Americas.1

Because the postulated processes of 'Creole formation' are most controversial (perhaps even more so than Universal Grammar), section 17.1 addresses terminological and methodological preliminaries. After a brief historical survey of early Creole studies, we revisit some of the initial definitions of 'Creoles' in order to highlight the various biases that these definitions may have introduced into linguistics from the start. Many of these biases go against the spirit of Universal Grammar (UG). The claims to be overviewed in this section will show the persistence of certain mistaken tropes in Creole studies. These tropes force a certain degree of polemics in any state-of-the-art survey of the field, especially a survey like ours where some of the basic foundations of UG are confronted with theoretical claims with antecedents and correlates in anti-universalist views on language variation.

1 The names of the co-authors are listed in alphabetical order. This chapter is the outcome of ongoing and long-term collaboration. We bear equal responsibility for the strengths as well as shortcomings of the chapter. Part of this research was supported by Aboh's 2011–2012 fellowship at the Netherlands Institute for Advanced Study in the Humanities and Social Sciences. We are grateful to the editor for inviting us to contribute to this handbook, and we are deeply indebted to Trevor Bass, Bob Berwick, Noam Chomsky, Morris Halle, and Salikoko Mufwene for most thorough and constructive discussions. Salikoko deserves a special medal of friendship for his support, intellectual and otherwise, throughout our long ongoing task of understanding what Creole languages alongside their source languages and other sorts of comparative data can teach us about language contact, language acquisition, and language change and how all of this relates to the human language capacity.



These polemics will take us to section 17.2, where we evaluate the hypotheses introduced in section 17.1, with a focus on recurrent claims about the relative lack of grammatical complexity in Creoles and on various attempts at establishing an exceptional 'Creole typology' that lies outside the scope of the comparative method in historical linguistics. In other words, Creoles are claimed as a type of language consisting of 'orphans' that dwell outside the family-tree model of language change (Taylor 1956; Thomason and Kaufman 1988; Bakker et al. 2011; etc.).

Section 17.3 offers the sketch of a framework for what we call a Null Theory of Creole Formation (NTC).2 This null theory does away with any sui generis stipulation that applies only to Creole languages. Instead it is rooted in basic assumptions and findings about UG that apply to all languages. Section 17.4 concludes the chapter with some open-ended questions for future research on the place of Creole formation within larger patterns of contact-induced language change with both children and adults engaged in language acquisition viewed as a UG-constrained (re)construction process with, as input, socio-historically contingent Primary Linguistic Data (PLD).

17.1  Terminological and Methodological Preliminaries from a Historical Perspective

17.1.1 Basic Caveats: What's in the Name?

Let us first clarify our objects of study and their label. In this chapter, we use the phrase Creole languages as an ostensive label to refer to a set of languages extensionally defined, keeping in mind Mufwene's (2008:40–58) caveats to the effect that creolization should be taken as a socio-historical, and not a linguistic, concept. Our main objects of study in this chapter come from the set of classic Creoles: the Creole languages of the Caribbean (see DeGraff 2009). This well-circumscribed and uncontested set of Creole languages will suffice to make the points we need to make, especially in light of our cautious epistemological stance whereby 'we should not expect any specific sociohistorical or structural claim about any subset of languages known as "Creoles" (e.g., Caribbean Creoles or French-based Creoles) to be straightforwardly extrapolated to all other languages known as "Creole" across time and across space' (DeGraff 2009:894).3 With these caveats in mind, we use data from Haitian Creole (HC) to make our case against various claims about Creole languages as a class with stipulated pan-Creole structural characteristics.

2 This 'null theory' label for our framework was suggested by Beatrice Santorini (p.c., July 2009) with reference to Guglielmo Cinque's (1993) 'null theory of phrase and compound stress.' Cinque's 'null theory' dispenses with language-specific provisos for stress. Similarly, and as pointed out by Santorini, our views about Creole formation make superfluous any Creole-specific proviso for Creole formation.

3 As in DeGraff (2009:894), we 'consider it a fallacy to a priori expect any specific structural or developmental feature of a given Creole (e.g., Haitian Creole in the Caribbean) to necessarily have an analogue in some other Creole (e.g., Reunionese Creole in the Indian Ocean) "simply" because both languages have been called "Creole".' This is in keeping with Mufwene's (2001:138, 2004:460) notion that 'Creole' should be considered, at best, a socio-historical label with blurry boundaries. Consider, for example, the fact (to be discussed in the main text) that the term originated, not with language, but with people (e.g., the 'Creole' people of the Caribbean) and other living species ('Creole' cows and 'Creole' rice; i.e., varieties of cows and rice that are indigenous to the New World). Then again, there are 'Creole people' (e.g., in Cuba) who never spoke any language called 'Creole.' (See Mufwene 1997, Palmié 2006, Roberts 2008 for thorough discussions.)



From its genesis onward, the notion Creole in linguistics and related fields (e.g., ethnography, anthropology, and cultural studies) has been shrouded in a mist of terminological and theoretical confusion (Chaudenson and Mufwene 2001; Stewart 2007; Roberts 2008). Our hunch is that this confusion is partly rooted in the fact that the concept Creole, arguably from the Portuguese crioulo and Spanish criollo (from criar 'to raise, to breed' in Spanish and Portuguese), first emerged in the 16th century, not as a linguistic term, but as a geopolitically-rooted classificatory label that acquired ethnographic significance in the midst of European imperialism in the Americas, especially Latin America (for extensive discussion, see Mufwene 1997; Chaudenson and Mufwene 2001; Palmié 2006; Stewart 2007; Roberts 2008).

The term Creole first applied to biological entities, namely flora, fauna, and humans, that were 'raised' in the then-recently discovered 'New World' though their ancestors were from the 'Old World.' This 'New World,' though new to the Europeans, was, of course, not new to the indigenous Amerindians who inhabited it prior to Columbus's arrival. But this Caribbean world did become 'new' after the European colonists who laid claim to it eliminated, through disease and warfare, much of the Amerindian population there, and then brought in indentured workers from Europe and enslaved Africans as laborers to turn their New World colonies into settlements that produced immense wealth for Europe.4 These enslaved laborers brought with them a wide range of typologically diverse African languages, mostly from the Niger-Congo area. The European settlers also spoke a variety of languages, even when they pledged allegiance to a single flag. It is in this milieu of conquest, global economy, and language contact that new languages emerged that were subsequently labelled as Creoles. These new varieties were then enlisted as instruments of that conquest and global economy, both through their uses as linguae francae and through their descriptions by European scholars whose prestige and funding relied, by and large, on the forced labor—and ultimately the dehumanization—of Creole speakers.

Consider Saint-Domingue (the colonial name of Haiti). There the French settlers spoke a range of French dialects including patois varieties from Normandy, Picardy, Saintonge, Poitou, Anjou, and so forth (Alleyne 1969; Brasseur 1986; Fattier 1998; Chaudenson and Mufwene 2001).
4 After slavery was abolished, certain plantation owners turned to Asia and focused also on some parts of Africa, notably Nigeria and the collapsing Kongo Kingdom, from where they could import contract laborers. Because we focus on the early years of Creole formation, we do not consider the effect of these latecomers on the emergent language.



It is in this context that new speech varieties were created that were perceived as related, but distinct from and inferior to, the French spoken by French settlers (Girod-Chantrans 1785; Moreau de Saint-Méry 1797; Ducœurjoly 1802; Descourtilz 1809). These new varieties were referred to as 'Creole,' on a par with other (non-indigenous) colonial phenomena (e.g., 'Creole' cows and 'Creole' rice as in note 3) that were perceived as distinct from their counterparts in Europe or Africa. These new 'Creole' varieties became associated with, often as an emblem, the Creole people (i.e., people born in Saint-Domingue, with non-indigenous parents—that is, with parents from Europe or Africa; but see note 4). Moreau de Saint-Méry, for example, made it clear that the most fluent Creole is spoken by the Creole people. But we're getting ahead of ourselves. So let's first dwell on the societal uses of the term Creole since these uses preceded the strictly linguistic ones.

17.1.2 A Brief History of the Label 'Creole'

Let's first draw attention to the resemblance between, on the one hand, the conquest and language-contact milieu of the colonial Caribbean, which gave rise to Creole languages, and, on the other hand, the analogous milieu in the Roman Empire, which gave rise to the Romance languages as non-Roman tribes in various parts of Europe shifted to varieties of Latin. Such similarity will be important to keep in mind throughout this chapter. For now, there's one basic ethnographic fact to highlight as we discuss the foundations of Creole studies: Creole people in the Caribbean were distinguished both from the indigenous inhabitants (i.e., Amerindians) and from the then relatively new arrivals from Europe and Africa.

In the Caribbean, the term Creole subsequently evolved to encode various social biases related to now outdated notions of racial hierarchy contrasting Europeans to non-Europeans. This racial hierarchy is most clearly articulated in Moreau de Saint-Méry's (1797) description of Saint-Domingue, where the author states that 'for all tasks, it is the Creole slaves that are preferred; their worth is always a quarter more than that of the Africans' (1797:40). Saint-Méry (1797) further argues that Creole blacks 'are born with physical and moral qualities that truly give them the right to be superior over Blacks that have been brought from Africa'; 'domesticity has embellished the [Black] species' (Moreau de Saint-Méry 1797:39). For Saint-Méry, like for many observers since then, the gold standards for humanity, cultures, languages, and so forth, are dictated by race- and class-based hierarchies—the same hierarchies that motivated Europe's mission civilisatrice in Africa and the Americas.

Thus, from its very first ethnographic usage, the term Creole already had an exceptionalist flavor attached to it. This exceptionalist flavor was carried along to the linguistic realm when the term was applied to the new speech varieties emblematic of the recently created communities in Caribbean colonies. These speech varieties were eventually attributed structural or developmental characteristics that were perceived as sui generis (this is the core thesis of 'Creole Exceptionalism').



In the colonial era, the often explicit goal was to fit Creole languages into linguistic categories consistent with the race-related assumptions that prevailed during the Creole-formation period and were also used to justify the enslavement of Africans. The writings of Saint-Méry and of many other scholars of his and later periods mistakenly suggest that Creole languages lie somewhere between the language of civilisation spoken by the colonists and the primitive tongues spoken by enslaved Africans in the colony (see DeGraff 2005a for an overview).

17.1.3 Racial Hierarchies and Linguistic Structure in Creole Studies

One central factor in the early debate on the formation of Caribbean Creoles is related to the Europeans' assumption about the Africans' cognitive ability to acquire European languages. The numerically most important group of adults engaged in the acquisition of European languages in the colonial milieu was the Africans. In a worldview where languages were used to measure the intellectual and moral advancement of nations, the speech varieties of the enslaved Africans had to be ranked as inferior to those of the European colonists. Often this inferiority was explicitly theorized on a racial basis with African minds considered primitive and European minds advanced, as in this definition by Julien Vinson in the 1889 Dictionnaire des Sciences Anthropologiques:

Creole languages result from the adaptation of a language, especially some Indo-European language, to the (so to speak) phonetic and grammatical genius of a race that is linguistically inferior. The resulting language is composite, truly mixed in its vocabulary, but its grammar remains essentially Indo-European, albeit extremely simplified. (Vinson 1889:345–346)

Note here the claim about 'extreme simplification' which, in various guises, has dogged Creole studies from its inception onward through the writings of linguists such as Antoine Meillet, Otto Jespersen, Leonard Bloomfield, Louis Hjelmslev, Albert Valdman, Derek Bickerton, and Pieter Seuren (see DeGraff 2001a,b, 2005a, 2009 for overviews). To this day, many linguists and other scholars from various fields still assume that Creoles are at the bottom of variously defined hierarchies of structural complexity (see e.g., Bakker et al. 2011; McWhorter 2011; Hurford 2011). This view is now reified in linguistics textbooks as well, where it is sometimes taken to an extreme as in Dixon's (2010:21) claim that '… of the well-documented creoles, none equals the complexity … of a non-creole language.'

Contrary to the view just quoted that Creole languages are 'essentially Indo-European,' we find, among European scholars of the same period, the view that Creole languages are peculiar 'hybrids' of European and African languages. From this perspective as well, Creoles are not only structurally simpler than European languages, but also, as noted by Mufwene (2008), bad or unfitting, according to the ideology of race and language purity that prevailed in the 19th century.



Hybrids were then considered maladaptive compared to pure species. One oft-quoted exponent of this view is Frenchman Lucien Adam, for whom the French-derived Creoles of Guyane and Trinidad were 'Negro-Aryan dialects' created by Blacks from West Africa who 'took French words [even as they] conserved, as much as possible, the phonetics and grammar of their mother tongues' (1883:5). In Adam's scenario, the Africans cannot reproduce the grammatical properties of the European target languages: the latter are too complex for the primitive minds of the African learners who can only replicate the words of the European language. Adam was in the avant-garde of 'biolinguistics' in a loose metaphorical sense: he framed his race-based research project in an explicitly biological perspective, namely Hybridologie Linguistique. In this perspective, languages, like plants, hybridize. In the case of languages, the structural results of hybridization are bounded by the least complex languages due to the lower cognitive capacities of their speakers, namely speakers of African languages.

Adam is an early proponent of the still current 'substratist' view, as instantiated, for example, in the Relexification Hypothesis, according to which the Atlantic Creole languages (i.e., those that emerged around the Atlantic Ocean) embody Niger-Congo grammars spelled out with morphemes whose forms are derived from Romance or Germanic languages. Though contemporary linguists do not adhere to Adam's racial biases, many substratist theories promote Creole-formation scenarios similar to his. A case in point is Suzanne Sylvain who, though her book documented influences from both French and African languages in the formation of Haitian Creole, concluded her description with the famous description of the language as 'French cast in the mold of African syntax or … an Ewe tongue with a French lexicon' (1936:178). Similarly Lefebvre's (1998) reformulation of Muysken's (1981) relexification hypothesis suggests that Haitian Creole is constituted of Gbe grammar relexified with French-derived phonetic strings (see DeGraff 2002 for a critique).

In another set of popular proposals with intellectual antecedents in the 19th century, Creole languages are considered ab ovo linguistic creations that offer exceptional windows on the prehistoric foundations of language in the human species. Certain aspects of this line of argument go at least as far back as the 1872 book by Alfred and Auguste de Saint-Quentin on the Creole of Guyane—also in a 'biolinguistic' perspective that postulates a minimum of cognitive capacities and cultural characteristics among the creators of Creole languages:

[Creole] is, therefore, a spontaneous, hasty and unconscious product of the human mind, freed from any kind of intellectual culture. For this reason only, it would be remarkable to find in this language anything but a confused collection of deformed French phrases. But when one studies its structure, one is so very surprised, so very charmed by its rigor and simplicity that one wonders if the creative genius of the most knowledgeable linguists would have been able to give birth to anything that so completely reaches its goal, that imposes so little strain on memory and that calls for so little effort from those with limited intelligence. An in-depth analysis has convinced me of something that seems paradoxical: namely, if one wanted to create from complete scratch an all-purpose language that would allow, after only a few days of study, a clear and consistent exchange of simple ideas, one would not be able to adopt more logical and more productive structures than those found in Creole syntax. (Saint-Quentin 1872:lviii–lix)




Unlike Vinson’s and Adam’s views sketched in this section, this ab ovo perspective on Creole formation, especially in some of its contemporary instantiations, draws a sharp line between the genealogy of Creole languages and that of Indo-​European and African languages. The most extreme implementation of this hypothesis can be found in Derek Bickerton’s Language Bioprogram Hyothesis, which takes Creole formation to resemble the initial evolutionary steps of language in the human species, especially in the putative catastrophic evolution from the Pidgin (qua ‘protolanguage’) to the Creole stage under the agency of children exposed to extraordinarily impoverished PLD. The three 19th-​century views sketched here, with illustrative quotes from SaintQuentin (1872), Adam (1883), and Vinson (1889), all make specific ‘biolinguistic’ claims about Creole languages as extraordinarily simple languages—​much simpler than the European languages from which the Creoles selected their lexica. This idea of linguistic structural simplicity associated with the alleged cognitive limitations of Creole speakers runs through the gamut of pre-​20th century Creole studies. As described in DeGraff (2005a), this peculiar exceptionalist mode of thinking about languages and their speakers was part and parcel of pre-​20th century ‘normal’ scholarship (‘normal’ in the sense of Kuhn 1970:10–​34). Given the title of this Handbook of Universal Grammar (UG), these claims are incompatible with a theory of UG that leaves no room for grammatical distinctions to be rooted in alleged racial charactistics. UG is truly ‘universal’ in the sense that it entertains basic ingredients and operations (e.g., abstract grammatical features and Merge) and constraints (e.g., structure dependence) that apply to all human languages notwithstanding their history of formation, the race of their speakers, and so on. The data and observations in this chapter will further invalidate Creole Exceptionalism claims.

17.2  A Primer against Creole Exceptionalism

The belief that Creole languages manifest the most extreme structural simplicity is often related to, among other things, an alleged ‘break in transmission’ due to the emergence of a structurally reduced Pidgin, spoken as a lingua franca immediately prior to their formation.5 This Pidgin would constitute a bottleneck for the transmission of complex structures from the languages in contact. The first Creole speakers are assumed to have been the first children exposed to the Pidgin in the course of language acquisition. These children would have created the grammars of their native languages based on the Pidgin input, which allegedly explains the drastic simplicity of the emergent Creole structures (see Bloomfield 1933:472–474; Hall 1962; Bickerton 1981, 1984, 1988, 1990, 1999, 2008; and others). This is the ‘Pidgin-to-Creole life-cycle’ that is found in most contemporary introductory linguistics textbooks (see, e.g., O’Grady et al. 2010:503–504).

5  This communal sense of ‘Pidgin’ qua lingua franca is distinct from the use of ‘Pidgin’ in DeGraff (1999a,b, 2009), where the label is used in an individual and internal sense, as a cover term for the (early) interlanguages of adult learners in the language-contact setting of Creole formation. In the framework that we sketch in this chapter, such interlanguages, at various stages of development (from early to advanced), do play a key role in Creole formation and in all other instances of contact-induced language change (see sections 17.3 and 17.4).

Related to this ‘break in transmission’ claim is the assumption that Creole languages do not bear any genealogical affiliation with any prior languages. In other words, the results of Creole formation are language varieties that are outside the branches of well-established language families. Creoles are not even genealogically affiliated to any of the languages whose contact triggered their emergence. The contemporary locus classicus for this claim is Thomason and Kaufman (1988), where Creole languages are considered fundamentally distinct from non-Creole languages to the extent that Creoles are strictly outside the purview of the comparative method. For Thomason and Kaufman and for many other linguists, Creoles are taken as languages without genealogical affiliation due to their ‘abrupt formation.’ Given the history and evidence to be overviewed in this chapter, the popularity of this exclusionary approach to Creole languages makes ‘Creole Exceptionalism … a set of sociohistorically rooted dogmas with foundations in (neo-)colonial power relations’ in modern linguistics (DeGraff 2005a:576).

17.2.1 Some Historical Background and Preliminary Data

As early as 1655, in the French Caribbean colonies of Martinique, Guadeloupe, and Marie-Galante, the Jesuit missionary Pierre Pelleprat was already comparing the verbal system of Caribbean French-lexicon Creoles with that of French and making quotable comments about Creole structures, comments that are still rehashed by 21st-century creolists. Pelleprat’s attention was drawn to the apparent simplicity of the Creole verbal system, which he attributed to the enslaved Africans’ failure to learn French:

We wait until they learn French before we start evangelizing them. It is French that they try to learn as soon as they can, in order to communicate with their masters, on whom they depend for all their needs. We adapt ourselves to their mode of speaking. They generally use the infinitive form of the verb [instead of the inflected forms—EA, MdG] … adding a word to indicate the future or the past. … With this way of speaking, we make them understand all that we teach them. This is the method we use at the beginning of our teaching … Death won’t care to wait until they learn French. (Pelleprat 1655 [1965, 30–31], our translation)



Pelleprat’s observations offer some insights about the ways in which African adult learners in the colonial Caribbean may have reanalyzed certain verbal patterns from 17th-century French according to general and now well-documented strategies of second-language acquisition (e.g., non-retention of inflectional morphology, preference for analytical verbal periphrases over synthetic constructions for the expression of tense, mood, aspect, etc., as discussed later in this section). But Pelleprat’s and his colleagues’ 17th-century thinking was not about universal strategies of language acquisition. It was rooted in the belief that Blacks ‘lacked intelligence and were slow learners, thus required lots of patience and work from their teachers’ (Pelleprat 1655:56). Such thinking was imbued with a mission civilisatrice (e.g., being enslaved by the French was then described as preferable to ‘enslavement by Satan,’ that is, slavery was lauded as a means for the Africans to enjoy ‘the freedom given to God’s children,’ Pelleprat 1655:56). Therefore, it is no surprise that Creole verbal patterns were then considered reflexes of the Africans’ inferior humanity, even though similar patterns (e.g., preference for periphrastic constructions with invariant verbal forms) are also found in popular varieties of French as described in Frei (1929) and Gougenheim (1929). No effort was made, back then, to analyze Creole grammars as autonomous systems with their own internal logic, some of which was influenced both by French varieties and by the Africans’ native languages. In contemporary terms, Pelleprat’s view would be translated as the claim that Creole speech was a manifestation of early stages in second language acquisition—this view is found most recently in Plag (2008a,b), where Creoles are described as ‘conventionalized interlanguages of an early stage.’

In the particular case of the Creole verbal system that attracted Pelleprat’s attention, he didn’t notice that some of the words that ‘indicate the future or the past’ are actually derived, in both distribution and interpretation, from vernacular French periphrastic verbal constructions of the 17th and 18th centuries. For example, Haitian Creole (henceforth HC) te for anterior marking, as in Mwen te rive anvan ou ‘I had arrived before you (sg),’ is derived from forms of the imperfect of the French copula être such as était and étais, as in J’étais arrivé avant vous ‘I had arrived before you (sg).’ Similarly, HC ap for progressive marking, as in Mwen t(e) ap danse ‘I was dancing,’ is derived from the French preposition après, as in J’étais après danser ‘I was dancing.’ As it turns out, French verbal periphrastic constructions, which are very common in the spoken language (Gougenheim 1929; Frei 1929), often employ forms that are either non-inflected (e.g., the infinitive as in J’étais après danser) or less inflected (e.g., the participle as in J’étais arrivé). For most French verbs (i.e., the French verbs with infinitives ending in -er such as chanter ‘to sing’), both the infinitive and the past participle end with a suffix pronounced /e/, written as -er for the infinitive and -é(e)(s) for the participle. In written French, the participle of verbs in -er shows gender agreement (-é for masculine and -ée for feminine) and number agreement (with a word-final -s for the plural), but these gender and plural orthographic markings usually have no reflex in spoken French.
It is thus that the suffix -e has become a general verbal suffix in HC via reanalysis in second language acquisition of the sort already adumbrated in Pelleprat (1655) (see DeGraff 2005b for further details and references). These reanalysis patterns, based on the French



infinitival and participial forms, will play a major role in our account of some fascinating properties of HC clausal syntax in section 17.3.

Other HC preverbal markers include: (i) the irrealis (a)va from forms of the French verb aller ‘to go’ (e.g., vas in the 2sg present indicative or va in the 3sg present indicative); (ii) the completive marker fin(i) from forms of the French verb finir ‘to finish’; (iii) the marker of recent past sòt from forms of the French verb sortir ‘to leave’; (iv) the modal marker dwe from forms of the French verb devoir ‘to owe’; and others (see DeGraff 2005b, 2007 for further details). All of these preverbal markers for Tense, Mood, and Aspect (TMA), which belong to the grammatical layers of the VP domain in HC syntax, have straightforward etyma in French morphemes in periphrastic verbal constructions whose meanings often overlap with those of the corresponding TMA+V combinations in HC—thus providing additional evidence for the genealogy of HC as a descendant of French, according to the comparative method.

Furthermore, these TMA markers in preverbal positions in HC are subject to complex combinatorics. If we consider only the three TMA markers te, ap, and (a)va, we already get eight possible combinations, with distinct semantics for each, and HC has at least a dozen such auxiliary-like elements with complex distributions, co-occurrence restrictions, and semantic specifications (see Magloire-Holly 1981; Koopman and Lefebvre 1982; Fattier 1998, 2003; Sterlin 1999; Howe 2000; DeGraff 2005b, 2007; Fon Sing 2010 for data samples and diachronic and synchronic analyses). Consider, say, the following word-order/interpretive correlations involving the tense marker te and the modal dwe: Jinyò te dwe vini ‘Jinyò was obliged to come’ (literally: Jinyò ANT must come) vs. Jinyò dwe te vini ‘It is likely that Jinyò had come’ (literally: Jinyò must ANT come). Like HC dwe, the French verb devoir, which is the etymon of HC dwe, is also ambiguous between an epistemic and a deontic interpretation when used in verbal periphrases, on a par with English must, to wit: Jean doit être là ‘John must be there,’ which can be either deontic or epistemic. But it is not a straightforward matter to decide, without any theoretical analysis, whether the syntax–semantics interaction in HC constructions with dwe, where the verb is not inflected, is any less complex than its analogs with French devoir, where the verb is inflected.

In 1655, Pelleprat’s remarks on a French-based Creole verbal system focused on the ‘general use of the infinitive form.’ Pelleprat found this pattern lacking in comparison to French. His approach exemplifies a more general trend in Creole studies whereby isolated aspects of Creole languages are claimed to be ‘simple’ independently of their internal workings as part of a larger complex system and independently of their apparent analogs in historically related languages (see DeGraff 2001a,b, 2005, 2009 for surveys of other examples of this approach). In addition, the use of main verbs alongside preverbal TMA markers is, to some degree, analogous to the verbal system in the Gbe languages spoken by many of the Africans in Saint-Domingue during the formation of HC.
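To make concrete the combinatorics just noted for te, ap, and (a)va, here is a minimal sketch in Python (our own illustration, not part of the analyses cited above): with each of the three markers independently present or absent, two choices for each of three slots yield 2 × 2 × 2 = 8 options. The fixed linear order in the list is assumed purely for exposition; the actual HC ordering and co-occurrence restrictions are the subject of the references above.

from itertools import product

# Each marker is independently present (True) or absent (False); glosses
# follow the text: te = anterior, ap = progressive/future, ava = irrealis.
# The linear order assumed here is for illustration only.
markers = ["te", "ap", "ava"]

for choice in product([False, True], repeat=len(markers)):
    combo = [m for m, keep in zip(markers, choice) if keep]
    print(" ".join(combo) if combo else "(bare verb)")

Running the sketch prints the eight combinations, from the bare verb to the full te ap ava sequence; each of these, per the text, receives a distinct interpretation in HC.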
Given this contact situation, the patterns described by Pelleprat can be better understood as reflecting general learning strategies in L2 acquisition (cf. the role of nonnative acquisition in the diachronic emergence of new English varieties, including the rise in the 16th century of non-inflected English modals



from previously inflected main verbs). In section 17.3, we will revisit such structural and socio-historical analogs between Creole and non-Creole formation as we clarify basic methodological issues.

17.2.2 Empirical Issues with ‘Simplicity’ and ‘Creole Typology’ Claims

Pelleprat’s early focus on a Creole verbal system is all the more striking given that this empirical domain of inquiry has led in the 20th and 21st centuries to controversial claims about a Creole typology. In what may be the most famous such claim, the TMA markers in Creole languages, among other features, are taken as a pan-Creole manifestation of a ‘Language Bioprogram’ that surfaces relatively intact when the learner’s linguistic environment is extremely impoverished (Bickerton 1981, 1984). Bickerton postulated that the distribution and semantics of Creole TMA markers, which he took to be similar across Creoles, are the manifestation of a genetically wired ‘Language Bioprogram.’ More generally, the argument implies that Creole languages constitute a particular typology, one that generally does not show any structure whose acquisition requires exposure to positive evidence from the PLD. This scenario suggests that Creoles are in some sense pristine in comparison with other languages (Bickerton 1988:274). The most recent versions of this claim state that Creole languages as a class do not manifest complexity levels that surpass those of older, non-Creole languages (see, e.g., McWhorter 2001, 2011; Bakker et al. 2011; and Hurford 2012).

But this claim is straightforwardly defeated by evidence from HC, where noun phrases exhibit a complex set of syntactic and morpho-phonological properties that are absent in French and in the Gbe languages that participated in its formation: (i) a prenominal indefinite determiner and a postnominal definite determiner: yon chat ‘a cat’ vs. chat la ‘the cat’; (ii) a set of (at least) five allomorphs for the definite determiner: la, lan, a, an, nan. In addition, bare (i.e., determiner-less) noun phrases and noun phrases with the definite determiner manifest semantic options that are not attested in contemporary dialects of French and Gbe. (See Aboh and DeGraff 2014 for further details.) These characteristics are not outliers: DeGraff (2001b:284–285) produces a list of structural complexities that are found in Creoles but not in various non-Creoles. Likewise, Aboh and Smith (2009) and Aboh (2015) provide a variety of empirical and theoretical arguments highlighting ‘complex processes in new languages.’ One such process, which is analyzed in Aboh (2015), is agreement between the determiner and the complementizer of the relative clause modifying the noun, as in Di fisi di mi tata kisi bigi ‘The (singular) fish that (singular) my father caught is big’ vs. Dee fisi dee mi tata kisi bigi ‘The (plural) fish that (plural) my father caught are big.’ Such agreement processes, which are also found in non-Creole languages such as Dutch (e.g., Booij 2005:108–109), directly contradict Plag’s (2008a,b) claim that, as ‘conventionalized early interlanguages,’ Creoles lack inter-phrasal information exchange.



Proponents of the Pidgin-to-Creole Life Cycle do not usually describe any Pidgin source for Caribbean Creoles. Yet one occasionally finds descriptions such as the following in Bickerton (2008:216) for Hawaiian Pidgin as spoken in 1887.

(1) Mi ko kaonu polo Kukuihale, kaukau bia mi nuinui sahio
    me go town large Kukuihale drink beer me plenty-plenty drunk
    ‘I went to the big town Kukuihale, drank beer, and got very drunk.’

In introducing this example, Bickerton (2008) remarks that ‘by 1887 many people in Hawaii were speaking a pidgin that mixed Hawaiian and English words indiscriminately.’ The Pidgin is further described as ‘word salad,’ ‘macaronic,’ without ‘any consistent grammatical structure,’ a ‘linguistic meltdown,’ ‘almost totally devoid of complex sentences,’ etc. (Bickerton 2008:217–218, 223). Yet a cursory look at this sentence shows that its words were not jumbled together as ‘word salad.’ First, the speaker has access to a coordination strategy that is similar to that of English, as is evident from the translation. Here the first clause, unlike the other two, starts with an overt subject. Second, though bare, the noun phrase kaonu polo Kukuihale ‘town large Kukuihale’ does not seem to come out of free concatenation: the noun is adjacent to its modifier polo ‘large,’ and the noun–modifier sequence precedes the proper name. This seems a systematic grouping of the type [[N–Modifier]–Proper Name] or [N–Modifier]–[Proper Name]. A ‘macaronic’ sequence could have been one whereby the noun and its modifier are arbitrarily separated, as in kaonu Kukuihale polo (town Kukuihale large), but this is not what this Pidgin speaker produces. In addition, it is important to realize that, in many languages, constituent structures in sequences like (1) are intimately related to prosody (absent from Bickerton’s description). Assuming the right prosody, one can get a similar sequence in English:

(2) I went to a small town, Fort Valley, drank a whole lot of beer, and got very very drunk.

So once we factor in prosody, the sequence in (1) appears to be on a par with the English example in (2), at least with respect to combinatorial possibilities. Unlike English, though, (1) displays noun–modifier order. But this is nothing exceptional given that such an ordering is commonly found cross-linguistically. This is for instance the case in Gungbe and most Kwa languages (cf. Aboh and Essegbey 2010). Consider the following example from Gungbe.

(3) Ùn yì Djɛ̀gàn ɖàxó. Ùn nù bíà dón, bò mú kpɛ́ɖɛ́ kpɛ́ɖɛ́
    1sg go Djɛ̀gàn ɖàxó 1sg drink beer there and drunk very
    ‘I went to Djɛ̀gàn ɖàxó. I drank beer there and got very drunk.’

As the reader can see from the gloss, what is presented as ‘word salad’ in (1) is not only relatively close to English structure but shows a number of noteworthy similarities with Gungbe. Furthermore, both languages transform English beer into bia. This comparison



of (1) with patterns in English and Gungbe suggests that the word order and phonological changes attested in (1) are made available by UG. Furthermore, in the case of Hawaii, it can be argued that those shifting to English in the corresponding language-contact setting also paid attention to its word order patterns.
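As a toy illustration of the ‘free concatenation’ point (our own sketch, not Bickerton’s or this chapter’s method), one can enumerate the six logically possible linearizations of the three words in kaonu polo Kukuihale. A genuinely ‘macaronic’ generator would produce any of the six with equal readiness; the speaker’s output is exactly the one order that keeps the noun before its adjacent modifier and places the pair before the proper name.

from itertools import permutations

# The three words of the bare noun phrase in (1): noun, modifier, proper name.
words = ("kaonu", "polo", "Kukuihale")

# Print all six logically possible orders; only the first line printed
# matches the attested [[N-Modifier]-Proper Name] grouping discussed above.
for order in permutations(words):
    print(" ".join(order))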

17.2.3 Methodological Issues with ‘Simplicity’ and ‘Creole Typology’ Claims

Creole simplicity has been argued for largely on the basis of hypothetical pre-Creole Pidgins as lingua francas with drastically reduced and unstable structures. Yet there isn’t, to the best of our knowledge, any documentation of any such pre-Creole Pidgin in the history of the colonial Caribbean. The lack of evidence for pre-Creole Pidgins with ‘massive structural reduction’ is admitted by McWhorter (2011:30–31, 70; cf. Alleyne 1971; Chaudenson and Mufwene 2001; Bakker 2003:26; Mufwene 2008:ch. 3), who nonetheless argues that it is ‘the linguistic facts [that] strongly suggest that Atlantic creoles arose as structurally reduced pidgin varieties’ (2011:31). What are these linguistic facts? The discussion of Bickerton’s (2008) Pidgin example in (1) already suggests that diagnosing Creole simplicity on the basis of alleged Pidgins is not a straightforward task. Yet McWhorter (2011:31–39) proposes four tell-tale signs of Creoles’ hypothetical Pidgin ancestry: (i) generalization of the infinitive; (ii) absence of a copula; (iii) no case distinctions among pronouns; (iv) preverbal placement of the clausal negation marker. In the absence of theoretically grounded definitions, these ‘Pidgin’ characteristics appear overly vague: how do we determine synchronically whether an ‘infinitive’ has been ‘generalized’? Is such generalization of the infinitive qualitatively different from the patterns observed, for instance, by Frei (1929) and Gougenheim (1929), where spoken varieties of French show a strong tendency toward invariant verbal forms? What are the morphosyntactic criteria for a ‘copula’? If we define ‘infinitive’ as a least inflected form of the verb, and ‘copula’ as an overt linking morpheme between the subject and certain non-verbal predicates, Vietnamese as described in Dryer and Haspelmath (2011) would qualify as a quasi-Pidgin, notwithstanding its long history. And Vietnamese seems even more ‘Pidgin’-like than the earliest documented varieties of Creole in 18th-century Haiti (see later in this section).

Even more problematic is the fact that the postulated criteria in (i)–(iv) are disconfirmed by data from Pidgins that have been documented outside the Caribbean (see, e.g., Bakker 2003; Thomason 2007). For example, Kenya Pidgin Swahili, Pidgin Ojibwe, Taymir Pidgin Russian, and Fanakalo show tense inflectional morphology on verbs, while Pidgin Ojibwe, Central Hiri Motu Pidgin, Arafundi-Enga Pidgin, and Lingala have morphological subject agreement (Bakker 2003:20–21). Bakker (2003:20) further remarks that ‘some cross-linguistically uncommon inflectional affixes can also be found [in Pidgins—EA, MdG], such as reciprocal, and negative past.’ Such language-contact patterns seem like reflexes of their specific ecology and are evidence against a cookie-cutter approach to Creole formation whereby all



Pidgins must look alike. The classic Creoles of the Caribbean, having emerged from languages that are often at the low end of the cline of inflectional richness, are unsurprisingly at the low end of that cline as well—and with even fewer inflectional affixes than their source languages, given the well-documented effect of second language acquisition on inflectional paradigms. Be that as it may, the general cross-Pidgin typology contemplated by McWhorter has no empirical basis in the history of Caribbean Creoles.

Based on the linguistic evidence from a sample of Pidgins worldwide and the observation that the morphology of Pidgins seems quite distinct from that of Creoles, Bakker (2003:24) considers the possibility that Creoles need not necessarily derive from Pidgins, noting in addition that ‘[t]‌here are no cases where we have adequate documentation of a (non-extended) pidgin and a creole in the same area’ (2003:26) (cf. Mufwene 2008:ch. 3 for the complementary geographical distribution of Pidgins and Creoles). And we certainly have no evidence for any Pidgin in the history of HC—a ‘radical’ Creole in the sense of Bickerton (1984), where radical Creoles are postulated to have emerged on the basis of radically reduced Pidgin input. On the contrary, whatever evidence we have about the earliest documented varieties of HC contradicts claims about a drastically reduced Pidgin as an essential ingredient in Creole formation.

Consider, say, McWhorter’s claims about the lack of case distinctions in Pidgins. We have documentation of 18th-century Creole varieties in Haiti that show robust case distinctions in pronouns, such as nominative 1sg mo vs. accusative 1sg moé, and nominative 2sg to vs. accusative 2sg toué. Here are two examples from Ducœurjoly (1802:353), with the French translations given there: HC To va bay moué nouvelles / French Tu m’en diras des nouvelles ‘You (sg) will give me news’ vs. HC Mo te byen di toué / French Je te l’avais bien dit ‘I had told you so.’ Such case distinctions in early HC are also reported in Anonymous (1811), Sylvain (1936:62f), and Goodman (1964:34–36). The latter shows that similar case distinctions also apply to other French-based Creoles such as Louisiana Creole and Mauritian Creole. These morphological distinctions have now disappeared in contemporary HC, thus suggesting that Creole varieties closer to French (so-called ‘acrolectal’ varieties) must have been more prevalent in the earlier stages of Creole formation, but were later replaced by varieties structurally more divergent from French (so-called ‘basilectal’ varieties). This is consistent with observations about the history of other Caribbean Creoles, as in Jamaica (Lalla and D’Costa 1989) and Guyana (Bickerton 1996). (See note 8.)

Saramaccan is another ‘radical’ Creole (actually, the most radical Creole according to Bickerton 1984:179 and a ‘prototypical’ one according to McWhorter 1998). Yet it too manifests case distinctions. In this case, we have a nominative vs. accusative opposition in the 3rd person singular: a ‘3sg nominative’ vs. en ‘3sg accusative’ (Bickerton 1984:180; Aboh 2006a:5). This contrast too is contrary to expectations based on McWhorter’s Pidgin criteria. In a related vein, Creoles with verbal paradigms that go beyond ‘a generalization of the infinitive’ are documented in Holm (2008) and Luís (2008), with data from Portuguese Creoles that manifest inflectional verbal suffixes.



Once Creoles are analyzed holistically, taking into account much more than the four isolated patterns arbitrarily chosen by McWhorter, it becomes doubtful that there ever was a structureless Pidgin in their history—especially one so reduced that it would have massively blocked the transmission of features from the languages in contact into the emergent Creole. Because so-called Pidgins are human creations, we expect them to display structural properties that are made available by UG, even if these properties may seem rare cross-linguistically. In this regard, the available literature on Pidgins provides a list of seemingly ‘exotic’ features, ‘exotic’ to the extent that they are lacking in many an ‘old’ language. These features include: evidential markers in Chinese Pidgin Russian; noun-class markers in Fanagalo, Kitúba, and Lingala; tense suffixes in Kitúba and Lingala; gender marking and agreement in the Mediterranean Lingua Franca; OSV and SOV word orders in Ndjuka-Trio Pidgin; lexically and morphosyntactically contrasting tones in Nubi Arabic and Lingala; etc. (see DeGraff 2001b:250f for references). As Pidgins are second languages for the majority of their speakers, they are susceptible to structural transfers from their speakers’ native languages. In effect, such an observation entails that there is, a priori, no such thing as an essential ‘Pidgin’ or ‘Creole’ type of language: the structural profile of each Pidgin will, to some degree, reflect the contingent ecology of its formation, including the structures of the respective languages in contact.

The evidence in Thomason (1997a) and Bakker (2003) from Pidgins with non-European ancestor languages illustrates the ways in which the specific native languages of Pidgin speakers, including certain cross-linguistically rare structural properties of said native languages, do influence the structural make-up of Pidgins. Bakker likewise relates the structural profiles of Pidgins and Creoles to the respective ecology of each language-contact situation. One corollary of these observations is that, be it called ‘Pidgin’ or ‘Creole’ or ‘language change,’ the eventual outcome of language acquisition in the context of language contact carries along various properties from the languages in contact (Müller 1998; Hulk and Müller 2000; Müller and Hulk 2001; Notley, van der Linden, and Hulk 2007; Mufwene 2008:149–153; and references cited there). What specific properties are transferred to the new variety depends on a variety of factors: socio-historical factors such as population structure and dynamics; linguistic-structural factors such as typological variation and markedness among the languages in contact; and psycholinguistic factors such as saliency and transparency of available features. Given that the languages in contact are usually assumed to be ‘old’ languages, the outcome of language contact will inherit various features from these, and this is exactly what we see in comprehensive surveys of language-contact phenomena such as those cited in DeGraff (2001b:250–259). Such instances of feature transfer can thus induce various increments of local complexity in the outcome of language contact (Aboh 2006b, 2009; cf. DeGraff 2009:963n8).6

6  We stress the ‘local’ in ‘local complexity’ here (and, elsewhere, in ‘local simplicity’) to highlight our skepticism vis-​à-​vis various claims that certain languages can be in toto more (or less) complex than others.



The available evidence about the complex ecology of language contact should also help lay to rest the now-popular ‘fossils of language’ scenarios (along the lines of Bickerton 1990:69–71, 181–185) that liken Pidgins to some hypothetical structureless protolanguage spoken by Homo sapiens’ immediate hominid ancestors. These scenarios also liken Creoles to the earliest and most primitive incarnation of modern human language. Firstly, the linguistic ecology of Pidgin speakers—in the midst of modern and complex human languages—is radically distinct from the ecology of our hominid ancestors who, presumably, did not have competence in anything that was structurally like the grammar of any human language. In any case, Pidgin speakers as modern humans have brains/minds very much unlike those of our hominid ancestors. So, even if we were to grant the validity of the Pidgin-to-Creole cycle, this cycle would have little bearing on the transition from, say, Homo erectus protolanguage to Homo sapiens language, a transition that most likely would have been accompanied by some reorganization of the brain from one stage to the next (Mufwene 2008:ch. 5).

These ongoing observations about the relationship between linguistic ecology and structural complexity suggest how important it is to beware of the effects of sampling biases on Creole-simplicity claims. This sampling problem, already noted by Thomason and Kaufman (1988:154), Bakker (2003:26), Mufwene (2008:143–153), and Kouwenberg (2010), especially affects those claims that try to isolate Creole languages into one small corner of linguistic typology, with grammars that fit a narrowly defined uniform structural template that, in turn, is placed at the bottom of some arbitrarily defined hierarchy of complexity (McWhorter 2001, 2011; Parkvall 2008; Bakker et al. 2011; etc.).

More concretely, let’s consider the basic data and method in Parkvall (2008), which, in turn, have been adduced to support the claims in Bakker et al. (2011), McWhorter (2011), and others. The main argument is that Creoles are typologically distinct from non-Creoles, with grammars that are among the world’s simplest. Parkvall’s database consists of 155 languages as documented in the World Atlas of Language Structures (WALS, Haspelmath et al. 2005) alongside 34 Pidgins and Creoles—2 from WALS and 32 of Parkvall’s own choosing, 18 of them from Holm and Patrick’s (2007) Comparative Creole Syntax: Parallel Outlines of 18 Creole Grammars (CCS).

The data samples in Parkvall (2008) make room for a variety of confounds. Most of the Creoles in this study are historically related to typologically similar European lexifiers (typically Germanic or Romance) with relatively little affixal morphology and with few cross-linguistically rare features, and to African substrates (mostly Niger-Congo) that fall in narrow bands of typological variation as well, as noted in, e.g., Alleyne (1980:146–180); Thomason and Kaufman (1988:154); Bakker (2003:26); Mufwene (2008:136–153); and Holm (2008:319–320). This particular selection of Creole languages constitutes an extremely biased sample from the start. Furthermore, putting such a restricted and biased Creole sample side-by-side with the much larger set of non-Creole languages in WALS, languages that come from much more diverse stocks, both genetically and typologically, makes for a tendentious comparison (Kouwenberg 2010).
In effect, such a biased comparison is akin to the following: comparing the heights of 15-year-old male basketball players with the heights of males of other ages from the general population, then



concluding erroneously that 15-year-old males are in general taller than males of other ages. Such conclusions are nothing but an artifact of sampling biases.7

7  The example with the 15-year-old basketball players is due to Trevor Bass and is discussed in greater detail in DeGraff et al. 2013. (Thank you, Trevor, for this example and for much more.)

A complexity metric that is based on such a small and arbitrary set of morphosyntactic distinctions, forms, and constructions can only impose a biased, artificial ranking. As often noted (e.g., in Alleyne 1980), the languages in contact during the formation of these Creole languages are in the set-union of Germanic, Romance, and Niger-Congo and have relatively similar profiles—within a relatively small window of typological variation. Given such major overlaps across sets of ancestor languages, plus the well-known effect of second language acquisition on phonological and morphosyntactic paradigms (Bunsen 1864; Meillet 1958:76–101; Weinreich 1958; etc.), it is not surprising that the Creole sample in Parkvall (2008) shows the similarities and the ranking that it does, owing to the particular ‘bits’ in his complexity metrics.
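For readers who want to see the confound in miniature, here is a rough simulation in Python (our own illustration, with made-up numbers): drawing a subsample from one narrow stratum of a population yields an ‘exceptional-looking’ group even though every individual comes from the very same underlying distribution.

import random

random.seed(0)
# Hypothetical population of heights in cm; the parameters are made up.
population = [random.gauss(170, 10) for _ in range(100_000)]
# A biased subsample: only individuals from one narrow stratum are kept,
# mimicking a Creole sample drawn from typologically similar source languages.
subsample = [h for h in population if h > 185]

mean = lambda xs: sum(xs) / len(xs)
print(round(mean(population), 1))  # close to 170
print(round(mean(subsample), 1))   # well above 170, purely by construction

The gap between the two printed means is an artifact of how the subsample was selected, not a property of the individuals themselves, which is exactly the structure of the basketball-player fallacy described above.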

17.2.4 Conceptual and Theoretical Issues with ‘Simplicity’ and ‘Creole Typology’ Claims

Another fundamental theoretical flaw in the ‘simplicity’ literature on Creoles is the absence of a rigorous and falsifiable theory of ‘complexity.’ Consider, for example, Creole-simplicity claims where complexity amounts to ‘bit complexity’ as defined in DeGraff (2001b:265–274). Such overly simplistic metrics consist of counting overt markings for a relatively small and arbitrary set of morphological and syntactic features (see, e.g., McWhorter 2001, 2011; Parkvall 2008; Bakker et al. 2011). In effect, any language’s complexity score amounts to a count of overt distinctions (e.g., for gender, number, person, perfective, evidentiality) and of the cardinality of various sets of signals (e.g., number of vowels and consonants, number of genders), forms (e.g., suppletive ordinals, obligatory numeral classifiers), and ‘constructions’ (e.g., passive, antipassive, applicative, alienability distinction, difference between nominal and verbal conjunction). The problem is that such indices of bit complexity resemble a laundry list without any theoretical justification: ‘[T]‌he differences in number of types of morphemes make no sense in terms of morphosyntactic complexity, unless they tell us exactly how overt morphemes and covert morphemes interact at the interfaces, and how they may burden or alleviate syntactic processing by virtue of being overt or covert’ (Aboh and Smith 2009:7). The problem is worsened when bit-complexity metrics are mostly based on the sort of overt morphological markings that are relatively rare in the Germanic, Romance, and Niger-Congo languages that were in contact during the formation of Caribbean Creoles.

Parkvall (2008) defines ‘a complex language [as] a language with more complex constructions’ (269), with ‘an expression [being] more complex than another if it involves



more rules’ (265n1). However, the notions ‘construction’ and ‘rule’ only make sense as part of a larger linguistic theory. Compare, say, the ‘passive construction’ in Transformational Grammar vs. its counterparts (or the absence thereof) in the Minimalist framework, and then compare such a ‘passive construction’ to its analogs in Generalized Phrase Structure Grammar and its descendants such as Head-Driven Phrase Structure Grammar. The complexity metric in Parkvall (2008) is thus devoid of any theoretical content as regards ‘constructions’ or ‘rules,’ and its actual complexity evaluation seems quite arbitrary. Here it’s worth stressing that the sense of complexity depends not on the data per se, but on the particular counting method, or absence thereof. Let’s take a closer look.

The complexity score for each language in Parkvall’s data set is based on a total of 53 features and constructions. The terms ‘feature’ and ‘construction’ in Parkvall are used in a strictly superficial sense, that is, without any analysis of the ‘rules’ that may be involved in deriving, or accounting for the properties of, said features or constructions. There is, therefore, no way to systematically compute whether ‘construction’ X in language Y ‘involves more rules’ than some (analogous?) ‘construction’ W in language Z. In other words, the complexity metric in Parkvall (2008) is based strictly on the presence vs. absence of a ‘feature’ or ‘construction’ and on the counting of overt ‘forms,’ without any ‘looking under the hood’ (so to speak) of these features, constructions, or forms. As the author states: ‘for all the traits listed, I consider their presence (or the presence in larger numbers) to add to the overall complexity of a language’ (Parkvall 2008:270). It should now be clear that the bit-complexity markers in Parkvall (2008) do not fall into any theoretically motivated hierarchy of complexity. These features are simply a subset of those available through WALS (among those features that could be counted), plus certain features that Parkvall ‘happened to have access to’ (Parkvall 2008:273). Even more problematic is the fact that the selected bits belong to narrow domains of morphosyntax, thus ignoring other grammatical modules and the interactions among them (Kouwenberg 2010b).

The problem is more general: any complexity metric that is stipulated without a theory of complexity grounded in linguistic theory or in the psycholinguistics of language acquisition or processing can too easily become a self-fulfilling prophecy based on one’s subjective expectation as to what should count as less, or more, complex (cf. Hawkins 2009). Indeed, the result of any comparison will depend on which bits are included in the comparanda. So bit complexity is too unconstrained an approach, especially in light of the fact that language is a tightly knit computational system with intricate channels of interaction across modules. If so, then one would a priori expect certain sources of complexity in any given module of grammar to interact, potentially, with other sources of complexity in other modules, with various increases or decreases in complexity resulting from interactions among modules. To date, no theory has been proposed that would adequately weigh the contributions of each module of grammar, plus the contribution of their mutual interaction, to a specific grammar’s overall complexity (for additional comments, see DeGraff 2001b:265–274; Aboh and Smith 2009).
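To see how little such a metric constrains the outcome, here is a minimal sketch (our own reconstruction for illustration; neither the procedure nor the toy feature inventories below are Parkvall’s actual ones): a score that merely counts which overt ‘bits’ a grammar marks will rank the same two languages in opposite orders depending on which bits the analyst happens to include.

# A naive 'bit complexity' score: count how many features from a chosen
# inventory a grammar overtly marks. All feature sets below are hypothetical.
def bit_complexity(language, inventory):
    return sum(1 for feature in inventory if feature in language)

lang_a = {"gender", "case", "evidentiality"}           # hypothetical language
lang_b = {"tone", "serial_verbs", "noun_classifiers"}  # hypothetical language

morphology_bits = ["gender", "case", "evidentiality"]
other_bits = ["tone", "serial_verbs", "noun_classifiers"]

print(bit_complexity(lang_a, morphology_bits), bit_complexity(lang_b, morphology_bits))  # 3 0
print(bit_complexity(lang_a, other_bits), bit_complexity(lang_b, other_bits))            # 0 3

The flip between the two print-outs is the whole point: without a theoretical justification for the feature inventory, the ‘complexity’ ranking is an artifact of the inventory itself.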
In the particular case of Parkvall’s claims, one must ask: Why this bias in favor of overt morphological markings with much less weight accorded to other possible sources of



complexity such as phonology, syntax, and semantics? ‘To prove the claim [that Creole grammars are overall simpler than non-Creole grammars—EA, MdG], one would need to show that for every single subdomain of grammar (not just for an eclectic range of subdomains) all creoles score lower or equal to all non-creoles’ (Deutscher 2009:250; also see Faraclas and Klein 2009 and Kouwenberg 2010:372 for related comments).

17.2.5  The Broken Pieces of ‘Break in Transmission’ Claims

Linguistics textbooks often cite Pidgin-based ‘broken transmission’ scenarios in their definitions of Creole formation. These scenarios exclude Creole languages from the class of languages with normal ancestors, that is, from the class of ‘genetic languages’ (i.e., those that have emerged via ‘normal transmission’). The opposition here is between creolization viewed as ‘abnormal’ vs. language change viewed as ‘normal’ (Bickerton 1988). These scenarios bring to mind the late 19th-century debate opposing Hugo Schuchardt to Max Müller about ‘mixed’ languages: Is every language ‘mixed’ to some extent (Schuchardt’s position) or is mixed-ness an aberration (Müller’s position)? These are the opening questions in Thomason and Kaufman’s (1988) book on language contact and genetic linguistics. Their answer is to admit the existence of mixed languages such as Creoles, but to argue that these languages cannot be assigned any genetic classification. This echoes Taylor’s (1956) position that Creoles are genetic ‘orphans.’ They conclude that Creole languages fall outside the purview of the traditional comparative method.

We first argue that basic principles of the comparative method place Caribbean Creole languages on the phylogenetic branches of their European ancestors (e.g., French in the case of HC). Similar arguments go as far back as Meillet (1951, 1958), Weinreich (1958), and others (see DeGraff 2009 for an overview). These results contradict the ‘break in transmission’ claims in Creole studies and allow us to uncover basic fallacies in the use of computational phylogenetic methods à la Bakker et al. (2011) for classifying Creole languages as a unique type.

17.2.5.1 Creole Languages Are Bona Fide Genetic Languages in the Scope of the Comparative Method

Let’s consider again HC, once characterized as a ‘radical’ Creole (Bickerton 1984). The core properties we discuss in this section are characteristics of the earliest (proto-)HC varieties and, thus, cannot be dismissed as post-creolization features that would have entered the language only recently, via contact with French long after the Creole-formation period (DeGraff 2001b:291–294, 2009:940–941).8

8  The very concept of ‘decreolization’ in the history of Caribbean Creoles is problematic to the extent that the ‘acrolectal’ varieties that are often taken to be the results of decreolization (i.e., those varieties that are structurally closest to the European lexifier) would have been among the earliest to emerge, only subsequently co-existing with the more ‘basilectal’ varieties (i.e., those that are structurally most removed from the European lexifier). In other words, the earliest Creole varieties were closer to the lexifier than the later ones. This is carefully documented in Lalla and D’Costa (1989) (also see Alleyne 1971; Bickerton 1996; and Chaudenson and Mufwene 2001). In the case of Haiti, contact with French was reduced to a minimum after independence in 1804, with most Creole speakers being monolingual and having little, if any, contact with French speakers (DeGraff 2001b:229–232).



In establishing the genetic affiliation of a non-Creole language, what is usually taken as confirming evidence consists of a system of correspondences between this language and some other languages—correspondences that suggest inheritance of related features from a common ancestor (or set of ancestors). This system of correspondences must meet some ‘individual-identifying’ threshold (Nichols 1996). That is, it must contain enough ‘language-particular idiosyncratic properties,’ or ‘faits particuliers’ in Meillet’s (1951, 1958) terminology, in order to reliably rule out chance correspondences, borrowings, homologous developments, and so forth. HC offers robust evidence of straightforward correspondences with French, and similar evidence is straightforwardly available from dictionaries and descriptive grammars for other Caribbean Creoles. Müller et al. (2010) look at lexical similarity among half of the world’s languages, including Caribbean Creoles, and the latter show systematic correspondences with their European ancestors. Such correspondences between Caribbean Creoles and their European ancestors generously meet Nichols’s ‘individual-identifying’ threshold (for HC, see Fattier 1998, 2003; DeGraff 2001a, 2002, 2007, 2009). For example, we find the following arrays of ‘faits particuliers’ in HC with systematic correspondences vis-à-vis French, including the majority of affixes, the majority of paradigmatic lexical sets (including items from Swadesh lists), all grammatical morphemes, pronouns, deictic elements, and so forth. Here is a small sample to illustrate HC items that are inherited from French with various degrees of modification:

• All HC cardinal numbers are derived from French: en ‘1,’ de ‘2,’ twa ‘3,’ kat ‘4,’ … san ‘100,’ … mil ‘1,000’ … from French un, deux, trois, quatre … cent … mille …

• All HC ordinal numbers, including the suffix /-jɛm/ and its morphophonology (sandhi, suppletion, etc.), are derived from French: premye ‘1st,’ dezyèm ‘2nd,’ twazyèm ‘3rd,’ katryèm ‘4th,’ … santyèm ‘100th’ … milyèm ‘1,000th’ … from French premier, deuxième, troisième, quatrième, … centième, … millième …

• All HC kinship terms are derived from French: for example, frè ‘brother,’ sè ‘sister,’ kouzen ‘cousin,’ kouzin ‘cousin (feminine)’ … from French frère, soeur, cousin, cousine …

• All color terms are derived from French: blan ‘white,’ nwa ‘black,’ rouj ‘red’ … from French blanc, noir, rouge …

• All body-part terms are derived from French: for example, cheve ‘hair,’ zòrèy ‘ear,’ je ‘eye,’ nen ‘nose,’ bouch ‘mouth,’ dan ‘tooth,’ lang ‘tongue’ … from French cheveux, oreille, yeux, nez, bouche, dent, langue …



• All TMA markers are derived from French: te ANT, ap PROG/FUT, ava IRREALIS, fini COMPLETIVE … from French étais/était/été (imperfect and participle of être ‘to be’), après ‘after,’ va(s) ‘go+3sg/2sg+PRES,’ finir/fini(s) ‘to finish’ and its various participial and finite forms …

• All prepositions are derived from French: nan ‘in,’ pou ‘for,’ apre ‘after,’ anvan ‘before,’ devan ‘in front of,’ dèyè ‘behind’ … from French dans, pour, après, avant, devant, derrière …

• All determiners, demonstratives, etc., are derived from French: yon ‘a,’ la ‘the,’ sa ‘this/that’ from French un, la/là, ça.

• All pronouns are derived from French: m(wen) 1sg, ou 2sg, li 3sg, nou 1pl/2pl, yo 3pl … from French moi, vous, lui, nous, eux …

• All complementizers are derived from French: ke ‘that,’ si ‘if,’ pou ‘for’ … from French que, si, pour …

• Almost all HC derivational morphemes have inherited their distribution and semantics from French—with modification, of course: for example, HC de- as in deboutonnen ‘to unbutton’ and dezose ‘to debone’ from French dé-, which, like HC de-, has inversive and privative uses.

• HC morphophonological phenomena with French ancestry, such as the liaison phenomena in an Bèljik ‘in Belgium’ vs. ann Ayiti ‘in Haiti’; de zan /de zã/ ‘two years,’ twa zan /twa zã/ ‘three years,’ san tan /sã tã/ ‘one hundred years’ … (cf. the pronunciation of the HC and French ordinal and cardinal numbers above; see Cadely 2002 for further examples of HC–French correspondences in phonology)

In Nichols’s terminology, these sets would count as individual-identifying ‘lexical categories with some of their (phonologically specific) member lexemes.’ As it turns out, HC even instantiates Nichols’s example of ‘the miniparadigm of good and better … as diagnostic of relatedness.’ To wit, HC bon and miyò straightforwardly derive from French bon and meilleur. This HC example is all the more telling in that the last vowel in miyò (written millor in Ducœurjoly’s language manual, 1802:330) reflects an Old and Middle French pronunciation of the word meilleur as meillor (Nyrop 1903, II:312), a pronunciation that is partly retained as mèlyor in Franco-Provençal dialects (Stich 2001), thus indicating that this paradigm was inherited from French and is not a late (‘decreolization’) feature of HC. This hunch is confirmed by Ducœurjoly’s (1802) analysis in his Creole language manual: he translates French meilleur as Creole miyor and French filleul ‘godson’ as Creole fillol. Compare the latter with fillol in 17th-century French (Nyrop 1899, I:158), fiyòl/fiyèl in contemporary HC, and filyol in contemporary Franco-Provençal (Stich 2001:585). These sound–meaning correspondences between 17th-century French and HC are further confirmed by the fact that the French agentive suffix -eur /œr/ often maps in HC to an alternation between -è /ɛ/ and -ò /ɔ/, as in vòlè/vòlò ‘thief,’ mantè/mantò ‘liar,’ and flatè/flatò ‘flatterer’ from earlier pronunciations of French voleur, menteur, and flatteur, pronunciations still attested in Franco-Provençal (Stich 2001:221). These HC doublets thus reflect a phonological property of early varieties of French as spoken in colonial Haiti.
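The force of Nichols’s ‘individual-identifying’ threshold can be conveyed with a back-of-the-envelope sketch (our own, with an assumed probability rather than Nichols’s actual figures): if a single accidental sound–meaning match between two unrelated languages were as likely as, say, 1 in 100, the chance that an entire paradigm matches by accident collapses geometrically with the size of the paradigm.

# Assumed probability that one sound-meaning pairing matches by pure chance
# between two unrelated languages (an illustrative figure, not an estimate).
p_single = 0.01

# Chance that a whole paradigm of n independent items matches accidentally.
for n in (1, 3, 5, 10):
    print(n, p_single ** n)

This is why paradigmatic sets like the numerals, kinship terms, and TMA markers listed above, rather than isolated look-alike words, are what establish descent.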



As carefully documented in Fattier (1998, 2002, 2003), there is a great variety of related morphophonological phenomena and lexical patterns that robustly show that Haitian affixes, alongside much else in Haitian grammar, were inherited early on from colonial varieties of French (see note 8). In other words, HC emerged with bona fide structural ‘faits particuliers’ suggesting its genetic affiliation with French. Such systematic correspondences are incompatible with the existence of an extraordinarily reduced (and affixless) Pidgin as the immediate ancestor of HC (Alleyne 1971; see Fattier 1998, 2003 and DeGraff 2001b:291–294 for further details on the origins of HC morphophonology).

As for the whole of the HC lexicon, the vast majority of HC morphemes (bound or free) are etymologically French. Fattier’s (1998) six-volume dialect atlas and subsequent publications (notably Fattier 2002, 2003) establish the French stock of the HC lexicon and morphosyntax beyond any doubt. These Creole-vs.-French correspondences are attested from the earliest documentation of proto-HC, including the passage from Pelleprat (1655) quoted earlier and the very first language manual for 18th-century learners of Creole (Ducœurjoly 1802). In the latter, we find examples with French-derived TMA markers, pronouns, articles, verbs, nouns, and so forth, that are similar to contemporary HC. The Creole examples in Ducœurjoly are all the more striking in that they are given side-by-side with their French translations. Consider these two examples from Ducœurjoly (1802:292): Mo va tendre ly ‘I will wait for him’ (translated as French Je vais l’attendre) and Yo trape nion volor ‘They have caught a thief’ (translated as French On a attrapé un voleur; note the suffix -or in volor ‘thief’ from Old and Middle French, which is identical to Franco-Provençal volor in Stich 2001:1316). These examples constitute additional evidence against proposals that French-derived HC morphemes would have entered the language as ‘late borrowings.’

If we do take the aforementioned individual-identifying evidence at face value, then we must conclude that HC, though it shows substrate influence (e.g., from Gbe, as discussed in section 17.3), is indeed genealogically related to French. This conclusion, in turn, entails that HC’s French-derived lexemes are ‘native and cognate until shown otherwise’ (again borrowing Nichols’s terms). Once such genealogical relatedness is taken as established (‘relatedness by descent,’ as in better-studied Stammbaumtheorie branches such as the evolution of Latin into Romance), it becomes clear that ‘break in transmission’ scenarios à la Bickerton, Thomason and Kaufman, McWhorter, Bakker et al., and so forth cannot hold. Then again, one may argue that Caribbean Creoles such as HC show a more ‘significant discrepancy’ between their lexical vs. grammatical correspondences vis-à-vis their respective lexifiers than French does vis-à-vis Latin (Thomason and Kaufman 1988); any argument to that effect is already undermined by Meillet’s observation long ago that French is of a grammatical type distinct from Latin, even though French can be considered as duly descended from Latin according to the comparative method.
When we compare HC and French using the structural parameters identified by Meillet (1958:148) to show that French ‘fall[s] into a typological class that is quite remote from the structural type represented by Latin,’ our comparison shows that HC and French, especially colloquial



17th- and 18th-century varieties, are typologically closer to each other than French and Latin are—with respect to word order, case morphology, definite determiners, and so forth. (These arguments are taken up in more detail in DeGraff 2001b, 2009:919–922 and some of the references cited there.)9 By the same token, the notion of global Creole simplicity falls apart: though French may look ‘simpler’ than Latin on the surface (e.g., absence of nominal declensions), it developed structural devices (e.g., articles) that are absent in Latin. Similarly, HC developed structural devices (e.g., a TMA system, focus marking, predicate copying, etc.) that are by and large absent in French, even though some of the basic ingredients in these innovations (e.g., the distribution and semantics of individual TMA markers) have straightforward ancestry in French (more on this in section 17.3).

9  Wichmann and Holman (2010) provide extensive empirical and quantitative observations, alongside methodological caveats, about the relationship, and lack thereof, between structural similarity and genealogical relatedness—observations and caveats that raise further doubt on the reliability of Bakker et al.’s claims about typology and relatedness. See also chapter 16.

17.2.5.2 Nonnative Acquisition in Stammbaumtheorie Genetic Branches

One other argument that has often been leveled against classifying Caribbean Creoles as Germanic or Romance languages is that genealogical relatedness among diachronically well-behaved Indo-European (IE) languages entails unbroken sequences of native language acquisition (‘NLA’). Recent exponents of this position include Ringe et al. (2002:63) and Labov (2007:346). This NLA-based position seems to assume that IE languages have all evolved via unbroken NLA. But this is contradicted by the crucial role of second language acquisition by adults in, say, the emergence of the Romance languages from Latin and in other IE cases (i.e., English) where language contact (thus, by definition, non-NLA) played a key role. In these cases (e.g., in the evolution of French from Latin), we do find, as in the HC and other Caribbean Creole cases, robust systems of lexical correspondences between the outcomes of non-NLA acquisition, on the one hand, and, on the other hand, the target/ancestor language. As in the Caribbean Creole cases, these correspondences include paradigms of bound morphemes, paradigmatic lexical sets, and other systems of faits particuliers that would have been inherited from the ancestor language through instances of transmission that include nonnative language acquisition by adults.

The emergence of French from Latin, initially in the context of language contact among adults, seems a good example of non-NLA in genealogical branches of Indo-European. In the history of French, as in the history of Haitian Creole, we do find documentation of language contact and second language acquisition by adults, yet we do not assume that such nonnative acquisition would exclude French from the Romance family, and we do not speak of Latin lexemes being ‘borrowed’ into early French. In a related vein, we know of English dialects that descended from the English learned imperfectly by Scandinavian settlers (Kroch et al. 2000; Ringe et al. 2002). As far as



we can tell by looking at their phonology, lexica, and morphosyntax, such dialects of English still count as West Germanic—notwithstanding the broken sequence of NLA in their history due to language contact. Once we take the aforementioned French and English cases at face value, alongside the HC case as analyzed here, genealogical relatedness cannot be taken to be strictly coextensive with unbroken NLA sequences. French counts as a Romance language, just as English and its related dialects count as Germanic. Therefore, NLA cannot count as a deciding factor for genealogical relatedness. After all, the comparative method is about the correspondences of linguistic forms. The comparative method is not about the history of the speakers of the corresponding languages (Weinreich 1958:375), which is precisely why Meillet (1951, 1958) and Nichols (1996) warn us about correspondences that do not suggest genetic kinship (see also Mufwene 2008). Similar issues arise in the many cases of ‘indigenized varieties’ of Germanic and Romance languages in postcolonial contexts in Africa and Asia. Those varieties as well can be reasonably considered Germanic and Romance (e.g., French in West Africa or English in West Africa and India) even though they are often learnt as nonnative second languages. (See DeGraff 2009:923–929 and, especially, Weinreich 1958, Mufwene 2001:ch. 4, 2004, and Campbell and Poser 2008 for comprehensive arguments against the use of non-linguistic factors in evaluating genetic relatedness.)

The facts mentioned here in favor of the genetic affiliation of Haitian Creole (HC) with French also count as counter-evidence to other exceptionalist views on Creole formation, such as Lefebvre’s Relexification Hypothesis (see DeGraff 2002, 2009 for full-fledged details of this argument).

17.2.5.3 Theoretical and Empirical Issues in Computational Phylogenetics in Creole Studies

The most prominent exemplar of such computational methods is the 2011 paper by Bakker et al. claiming that 'Creoles are typologically distinct from non-Creoles.' They use the sort of computational phylogenetic algorithms described in Dunn et al. (2008). The application of these algorithms to Creole languages is riddled with empirical and conceptual problems.

One foremost challenge is the circularity and the data problems in the definition of Creoles. Bakker et al., 'in order to avoid circularity in [their] definition,' consider a socio-historical definition, that is, Creoles as 'nativized or vernacularized developments of pidgins, which are makeshift languages used in some contact situations' (p. 10). But, in the absence of documentation for such Pidgins in the history of the classic Caribbean Creoles, this definition triggers the 'data problem' described in Bakker (2003:26): 'There are no cases where we have adequate documentation of a (non-extended) pidgin and a creole in the same area.' In this regard, Mufwene (2008:34–35) provides a map showing the 'geographical complementary distribution between the territories where creoles developed and those where pidgins emerged.' Given this complementary distribution of Pidgins and Creoles across the world, Bakker et al.'s treatment is circular, since they attribute the putatively simple properties of Creoles to their emerging from hypothetical, but undocumented, Pidgins qua 'simplified forms of interethnic makeshift languages [that] were insufficient for communication' (2011:36).

Another set of methodological and theoretical problems concerns Bakker et al.'s assumptions about the use of computational methods in establishing historical relatedness among languages. Though Bakker et al. rely on Dunn et al.'s framework, they misapply it to their sample of Creole languages. Firstly, Dunn et al.'s study of the isolate Papuan languages in Island Melanesia gives clear methodological priority to the classic comparative method with its vocabulary-based sound–meaning correspondences. The comparative method remains the 'gold standard for historical linguistics,' to be applied whenever cognate sets can be reasonably established within the limited time depth of the comparative method, which is estimated at some 10,000 years (Dunn et al. 2008:710–712; cf. Wichmann and Saunders 2007:378). In fact, Dunn et al. first calibrate their computational structure-based comparison of isolate Papuan languages against the prior vocabulary-based results obtained by the comparative method's 'gold standard' as applied to the Oceanic languages of the same area. Then, and only then, do they apply their structure-based computational methods to their sample of Papuan languages (Dunn et al. 2008:734). The reason why these language isolates are outside the scope of the comparative method is that they 'separated so long ago that any surface traces of cognacy have been eroded' (Dunn et al. 2008:712).

One major issue with Bakker et al.'s application of these structure-based methods to Creoles is that the latter are claimed to be among the world's youngest languages, certainly far younger than the allowable 10,000-year time depth for the comparative method. Furthermore, Creole languages are certainly not language isolates by any means. There's plenty of lexical evidence available to trace these languages' genealogical classification to their European sources, notwithstanding borrowings through language contact as in the documented history of other Indo-European languages. As already mentioned, some of the evidence for genealogical classification is even available in texts dating back to the early emergence of these languages (e.g., Pelleprat 1655; Ducœurjoly 1802).

It's been claimed that the networks produced by computational phylogenetic methods are 'completely objective and thus not influenced by any preconceptions and prejudices' (Bakker et al. 2011:12). As we already pointed out with regard to the bit-complexity method in Parkvall (2008), the outcome of the phylogenetic computations is, among other things, a direct result of what features (or 'characters,' see Nichols and Warnow 2008) are chosen for the comparison (see also chapter 16). Of course, some finite choice has to be made when comparing languages, and the relevant set of potential features for any language is not exhausted by available reference grammars. But the key issue here is how to ensure that the initial choices do not undermine the results of our comparisons: on what theoretical basis are small sets of features selected from specific domains of grammar? How do we ensure that certain domains (e.g., isolated areas of phonology and morphosyntax) are not assigned higher priority than other domains (e.g., the lexicon or various areas of syntax, semantics, and discourse)? The point is that, given the availability of a vast range of structural features to compare between any two languages, the choice of any relatively small set of features is certainly open to 'preconceptions and prejudices.' More generally:

[t]he choice of characters for use in a phylogenetic analysis is of great importance, and has often been one of the main issues involved in critiquing a phylogenetic analysis: which characters did the authors use, and what are the consequences of that choice? (Nichols and Warnow 2008:769)
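To make the point concrete, here is a minimal toy computation. It is emphatically not Bakker et al.'s or Dunn et al.'s actual pipeline: the binary feature vectors for the four hypothetical languages (Lang_A through Lang_D) are invented, and the sketch only shows how the grouping returned by a distance-based comparison can flip depending on which subset of characters is chosen:

LANGS = {
    "Lang_A": [1, 1, 1, 0, 0, 0],
    "Lang_B": [1, 1, 1, 1, 1, 1],
    "Lang_C": [0, 0, 0, 0, 0, 0],
    "Lang_D": [0, 0, 0, 1, 1, 1],
}

def hamming(u, v, idx):
    # Distance counted only over the chosen character indices.
    return sum(u[i] != v[i] for i in idx)

def nearest_neighbors(idx):
    # For each language, find the closest other language under subset idx.
    out = {}
    for a, va in LANGS.items():
        out[a] = min((hamming(va, vb, idx), b)
                     for b, vb in LANGS.items() if b != a)[1]
    return out

print(nearest_neighbors([0, 1, 2]))  # Lang_A pairs with Lang_B; Lang_C with Lang_D
print(nearest_neighbors([3, 4, 5]))  # Lang_A pairs with Lang_C; Lang_B with Lang_D

Real phylogenetic algorithms are of course far more sophisticated than this Hamming-distance toy, but they remain downstream of the same initial decision about which characters to count, which is exactly Nichols and Warnow's point.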

Nichols and Warnow's review should be required reading for any creolist interested in computational linguistic phylogeny. Their review concludes that 'data selection (both of characters and languages) and the encoding of the character data have the potential to significantly impact the resultant phylogenetic estimation.'10 Dunn et al. (2008) are aware of these and related issues, and they adopt the following strategies, all of which are lacking in Bakker et al.'s comparison of Creoles with non-Creoles:

(i) Dunn et al.'s phylogenetic computations are based on 'the combination of structural features from different domains of a grammar (phonology, morphology, syntax, semantics)' (p. 715).

(ii) 'As many abstract structural features from as many parts of the grammar as possible should be investigated' (p. 716).

(iii) They used 115 features for a sample of 22 Papuan languages (pp. 728, 730), with features ranging over phonology, morphology, and syntax, so as to provide 'a large body of basic features for each language, which together give a broad typological profile, regardless of whether any given feature seems typologically significant. The resultant phylogenies are thus not likely to reflect a sampling bias.' (Also see Wichmann and Saunders 2007:383 on how areal effects introduce noise in the data when the comparison is based on a small set of features.)

(iv) They 'avoid the charge of "hand-picking" features by including in [their] sample the widest feasible range of noninterdependent typological phenomena' (p. 733; cf. Wichmann and Saunders 2007:376, 382, 385n7).

These methodological caveats are all flouted in Bakker et al.'s comparison of Creole versus non-Creole languages, starting with the size and the nature of the features used in the comparison. Bakker et al. use features from two previous publications: Holm and Patrick's (2007) Comparative Creole Syntax (CCS) and Parkvall's (2008) aforementioned study on Creole simplicity.

10  'In addition, the characters must then be coded for each language—a step that is itself critically important as it is often here that mistakes are made, even by trained linguists…' (Nichols and Warnow 2008:769). See Kouwenberg (2010, 2012), Parkvall (2012), Fon Sing and Leoue (2012), and DeGraff et al. (2013) for examples of such mistakes in the coding of characters.



One major problem is that the 97 features in CCS are massively interdependent; Holm (2007:xi) warns the reader that the CCS feature set is indeed 'redundant' (more details below). As for the Bakker et al. study that is based on Parkvall (2008), it uses only 43 features from limited areas of grammar to compare some 200 languages—compare Dunn et al.'s use of 115 features for their 22 Papuan languages. Like the CCS features, the features in Parkvall (2008) violate the ban against interdependency and are taken from narrow and superficial areas of morphosyntax—mostly having to do with overt morphology, as noted in Aboh (2009) and Kouwenberg (2010, 2012) (see section 17.3.3).

Let's take a closer look at the Bakker et al. study that is based on the 97 CCS features. It is claimed that these features make the Creole languages in the Bakker et al. sample 'stand out.' But the CCS features were explicitly chosen because they were viewed as a good candidate set for pan-Creole features, if any such set could exist. So it is no surprise that such features would make these Creoles 'stand out': the CCS features had been cherry-picked with the express goal of trying to group Atlantic Creoles and Niger-Congo languages together, in contrast with the Creoles' European superstrate languages (Holm 2007:vi). Bakker et al.'s choice of the CCS features therefore contradicts Dunn et al.'s caution against sampling biases.

There's yet another problem in Bakker et al.'s use of CCS features. Holm (2007:vi) and Patrick (2007:xii) make it very clear that the feature values in each and every Creole language in the CCS sample only make sense as part of 'a set of interconnected systems.' Things become even trickier when CCS readers are warned at the outset that the rating of particular features across CCS languages suffers from 'varying definitions and operationalizations' (Patrick 2007:xii). A good example of this is in DeGraff's discussion of HC in Holm and Patrick (2007), where he notes that features such as 'adjectival verbs' and 'copula' cannot be taken as stable properties that can be compared with a '+' (for presence) or a '–' (for absence) across languages, exactly because their underlying syntax is part of larger intricate systems (see, e.g., DeGraff 2007:103–104, 112–115). To illustrate: in HC it is the 'absence' of a copula in certain constructions (e.g., those that involve adjectives, as in Jezila bèl 'Jezila is beautiful') that may give the impression that bèl is a verbal predicate (e.g., on a par with the verb danse, as in Jezila danse 'Jezila has danced'). However, there are distributional tests that do distinguish verbs and adjectives in HC and that show that bèl is adjectival, not verbal. Contrast, for example, these HC comparative constructions as discussed in DeGraff (2007): Jezila pi bèl pase Mari 'Jezila is more beautiful than Mary' vs. Jezila mache plis pase Mari 'Jezila has walked more than Mary' (cf. *Jezila plis bèl pase Mari and *Jezila pi mache pase Mari). These examples show two key properties of the features 'copula' and 'adjectival verb': (i) they are interdependent; (ii) they only make sense within the larger syntax of the given language. Indeed, the distinction between verbs and adjectives in HC can only be made on the basis of diagnostics that are, to some degree, internal to HC grammar. (See Seuren 1986 for related arguments about verbs vs. adjectives in Sranan.) Another example relates to the TMA system.
Among the 97 features in Holm and Patrick (2007), 40 are related to the TMA and verbal system, with 24 of them related to temporal interpretation; among the other features, some 20 are related to the nominal system (Véronique 2009:153–154). As Holm and Patrick (2007:vi) correctly put it:

… the logic of this set of systems is clear only in terms of itself as a totality. For this reason it is unenlightening to compare, for example, a particular tense marker in twelve different Creoles without also explaining how this tense marker fits into the overall verbal system of each language.

And comparing particular markers without explaining their functions within each language is exactly what Bakker et al. (2011) do—not for 12, but for some 200 languages. Not only do their claims about a 'Creole type' go against the spirit of Holm and Patrick's work, but they also flout Dunn et al.'s (2008) caveat that non-interdependent features from diverse domains of grammar are a sine qua non for reliable computational phylogenetics.11

Our point, thus far, is that no matter how powerful the computational algorithms that underlie these phylogenetic methods, the initial choice of features for comparison, as well as the size and nature of the samples of features and languages, will exert a key influence on the outcome of said comparisons and the reliability thereof. In this regard, it is not surprising that the Creole sample in Bakker et al. (2011) shows certain typological similarities, especially given the sampling problem that Bakker (2003) warns against. As in Parkvall (2008), the comparisons in Bakker et al. (2011) are based on samples where most of the Creoles have Germanic or Romance lexifiers or Niger-Congo substrates. For example, of the 18 Creole languages in Holm and Patrick (2007), 15 have Germanic or Romance lexifiers and 14 include Niger-Congo languages among their substrates. The focus on narrow morphosyntactic domains for broad typological claims makes the sampling bias even worse. For example, the TMA systems of Caribbean Creoles are all influenced by typological tendencies in Niger-Congo with certain analogs in Romance and Germanic (see section 17.3). So what we may be dealing with here is the sort of Sprachbund phenomena that are also found across distinct phylogenetic branches (e.g., in the Balkans). In the Caribbean Creole cases, this can be termed a 'Trans-Atlantic Sprachbund' (Aboh and DeGraff 2014).12

11  Bakker et al.'s (2011) claims become even more worrisome when it's discovered that their empirical generalizations are logically incompatible with one another, contradict well-known facts from both Creole and non-Creole languages, contradict data documented in their own publications, and contradict some of their own references, including WALS (see DeGraff et al. 2013).

12  Our 'Sprachbund' concept is different from the 'scattered Sprachbund' in Kihm (2011), where Pidgins and Creoles are postulated to share Sprachbund-like similarities due to their common emergence via untutored second language acquisition (e.g., the Basic Variety as discussed in Klein and Perdue 1997). Instead, the trans-Atlantic Sprachbund, as used here, is based on attested features in the specific languages that were in contact in the colonial Caribbean and whose speakers participated in Creole formation. As we've mentioned earlier, this would mean that Creoles will display distinct structural properties depending on the ecology they emerged in. As it turns out, some of the Sprachbund features highlighted in Kihm's study of Guinée-Bissau Kriyol and Nubi Arabic (e.g., inflectional morphology for passives and for plural number) are absent in Atlantic Creoles (see Kihm 2011:53, 17, 74, 80–81). Therefore, such features cannot be claimed as pan-Creole Sprachbund features. For related caveats (e.g., regarding Klein and Perdue's Basic Variety as an explanation of pan-Creole similarities), see DeGraff (2011b:249–250); also see note 10.
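The interdependence problem can be made concrete with a similarly minimal sketch. The feature labels below are invented stand-ins, not actual CCS characters: four of the five surface 'characters' are deterministic reflexes of a single underlying choice about preverbal TMA marking, so a raw count of matching characters overstates the independent evidence that two languages share.

def surface_profile(preverbal_tma, plural_suffix):
    # Four of the five characters below covary with the single TMA choice.
    return {
        "anterior_marker_preverbal": preverbal_tma,
        "irrealis_marker_preverbal": preverbal_tma,
        "tma_markers_stackable":     preverbal_tma,
        "bare_verb_past_reading":    preverbal_tma,
        "plural_suffix_on_nouns":    plural_suffix,
    }

lang_x = surface_profile(preverbal_tma=True, plural_suffix=False)
lang_y = surface_profile(preverbal_tma=True, plural_suffix=True)

matches = sum(lang_x[f] == lang_y[f] for f in lang_x)
print(f"{matches}/5 surface characters match")  # 4/5, from agreement on
# just 1 of the 2 independent underlying choices

Two hypothetical languages that agree on only one of two independent grammatical choices come out matching on four of five 'characters': precisely the kind of double-counting that Holm's warning about the 'redundant' CCS feature set points to.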



Let's now ask a broader question: why would Bakker et al. choose structural-typological features when establishing phylogenetic trees for Creole languages, even though it is lexical and morphophonological features that have been used to establish phylogenetic relatedness at much greater time depths than in Creole formation, as in the history of Indo-European? Such a choice introduces a methodological double standard from the start. Indeed, the most successful cases of computationally derived phylogenetic trees have been drawn on the basis of vocabulary, not structural-typological features (this is noted in, for example, Bakker et al. 2011:13; see also Mufwene 2003 for related issues). The reliability of phylogenetic trees is much more fragile when the comparison is based on structural-typological features alone (Nichols and Warnow 2009; Donohue et al. 2011; see chapter 16 for a defense of parameter-based phylogenies). Bakker et al. consider the structure-based phylogenetic results in Dunn et al. (2005) 'quite remarkable' (p. 13), and they further argue that 'structural features may be safely used for evolutionary studies' (p. 21). But what is not noted is that the structure-based results reported in Dunn et al. (2005) about the Papuan languages have been argued to be incompatible with previously established subgroupings (e.g., Nichols and Warnow 2009:799). Nichols and Warnow attribute the lack of accuracy in Dunn et al. (2005) to their exclusive use of structural-typological features. Donohue et al. (2011) have argued that phylogenetic relationships cannot be reliably established in the absence of morphophonological and lexical correspondences (p. 378). They thus argue in favor of the classic comparative method and against the results of Dunn et al. (2008), and they come to the more cautious conclusion that 'there is not a direct link between the typology-constructed tree and the linguistic phylogeny, but rather … both of them covary according to linguistic geographic distance' (p. 373). From this perspective, 'the analysis of abstract typological features is a valuable detection tool in that the results serve as an accurate proxy for distance, rather than a proxy for phylogenetic results such as would result from the application of the comparative method to a group of languages' (p. 374). Their fundamental insight (p. 378) is that it's 'linguistic geography, rather than phylogenetic identity [that] determines typological clusters' in the sort of networks that are produced in Dunn et al. (2008). These conclusions are compatible with our own observations here that the oft-noted structural similarities among Caribbean Creoles are of the quasi-Sprachbund13 type, due to their origins in contact among overlapping sets of Niger-Congo and Indo-European languages of relatively similar types, and not the result of any catastrophic diachronic event such as radical pidginization.14

13  The term 'quasi' in 'quasi-Sprachbund' is used advisedly: not to suggest that Caribbean Creoles have converged to similar patterns via mutual borrowings, but only to highlight the presence of similar patterns that seem due to their ancestry in a relatively narrow band of typologically similar Romance, Germanic, and Niger-Congo languages.

14  Donohue et al.'s concept of 'linguistic geography' is misunderstood by Daval-Markussen and Bakker (2012:90), who take their results to 'go against the conclusions of Donohue et al. (2011), who claim that the various clusterings observable in phylogenetic networks are due to the effects of areality and geography rather than to genealogy.' Donohue et al. (2011:369) define 'linguistic geography [as] the network of contact and diffusion that postdates a proto-language, in most cases corresponding to geographic distance' [emphasis added]. More generally, linguistic geography can be defined as 'the spatially measured network of social interactions,' which, in turn, entails 'the diffusion (spatial or social) of linguistic traits' [emphases added]. Such diffusion is compatible with the sort of quasi-Sprachbund effects that, in the case of Caribbean Creoles, were caused by overlapping sets of Niger-Congo substrates and Germanic or Romance superstrates. It is important to note that such (long-distance) areal effects via socio-historically determined diffusion of certain typological features are compatible with genealogical relationships as determined by the comparative method. According to the latter (and as argued in the main text), Caribbean Creoles are genealogically related to their respective European lexifiers, contrary to Daval-Markussen and Bakker.



The controversy surrounding phylogenies based on structural-typological features is unsurprising in light of Meillet's caveats that genetic affiliation does not track typological similarity (as in the famous Latin-to-French case already discussed in section 17.2.5.2). These caveats highlight the recurrent double standard that has long been applied to Creole languages, with claims such as:

Creoles typically show lexical continuity with their lexifiers, but only limited continuity in their structural make-up, making it strictly seen impossible to consider a creole language a genetic descendant of its lexifier. (Bakker et al. 2011:14)

It seems worth repeating that French as well shows vis-​à-​vis Latin ‘only limited continuity in [its] structural make-​up.’ Yet French is a genetic descendant of Latin even if it belongs to ‘a typological class that is quite remote from the structural type represented by Latin’ (Meillet 1958:148). Therefore, in establishing phylogenetic relatedness, typological-​structural features cannot be taken to override the sort of lexical and morphophonological correspondences that can be reliably established in the history of Creole languages (contra Bakker et al. 2011:21). In the case of Creole languages (languages whose emergence is much more recent than that of most other Indo-​European languages), the origins of the vocabulary items and the related cognates and morphophonological correspondences are relatively straightforward, so there seems to be no need to shy away from them.

17.3 A Null Theory of Creole Formation (NTC)

Let's summarize our essay so far: Creole languages have traditionally been excluded by fiat from the scope of the comparative method in historical linguistics—they are thus considered to lie outside Stammbaumtheorie (this dogma is noted in, for example, Noonan 2010:60). This exclusion is based on the belief that Creoles emerged through a break in linguistic transmission and represent an exceptional case of language evolution: they emerged in the absence of normal linguistic input. Given this view, various exceptionalist theories postulate a catastrophic emergence for Creole languages as an explanation of their supposedly simple structural make-up. Our ongoing critique has suggested that these claims are all mistaken, and we argue that Caribbean Creoles duly fall within the scope of the comparative method as languages genealogically affiliated with their European ancestors, notwithstanding the documentation of quasi-Sprachbund phenomena due to the pervasive influence of overlapping substrate and superstrate languages such as Germanic, Romance, and subsets of Niger-Congo languages (see note 13). Such quasi-Sprachbund phenomena are similar to borrowing patterns that are now well documented in established Stammbaumtheorie branches such as Indo-European; on the latter, see Ringe et al. (2002) and Nakhleh et al. (2005).

In this section we sketch a framework for a null theory of Creole formation, a theory that we believe is empirically more adequate than the popular exceptionalist claims surveyed in sections 17.1 and 17.2. Our null theory does away with any sui generis stipulation that applies to Creole languages only. Instead it is rooted in basic assumptions and findings about UG, that is, assumptions and findings that apply to all languages and to how learners acquire these languages. In this approach, the emergence of any new language or language variety in the context of language contact sheds light on the interplay of first and second language acquisition as new grammars are built from complex and variable input. The effects of this interplay are similar across familiar cases of Creole formation, such as the creation of HC, and familiar cases of language change, such as the history of French and English. Our null theory undermines various traditional claims about 'Creole simplicity' and 'Creole typology' whereby Creoles are considered exceptional languages of the lowest complexity. Our approach also makes for a better integration of Creole phenomena into our general understanding of the cognitive bases of language change (see DeGraff 1999b, 2002, 2009; Mufwene 2001, 2008; and Aboh 2006b, 2009, 2015 for further discussion; see also chapter 18).

17.3.1 The Rationale from a Language Acquisition Perspective Informed by History To begin with, we depart from the common view that Creoles developed as a consequence of a catastrophic break in transmission or radical transmission failure. The history of Haiti, for instance, suggests that a key factor in Creole formation was the role of language acquisition among a multilingual community involving speakers with different profiles—​a situation typical of other cases of contact-​induced language change as in the history of Germanic or Romance. So we need to first take stock of these learners’ profiles and their role in Creole formation. Let’s recall Pelleprat’s aforementioned quote about the 17th-​century French colonial Caribbean, repeated here for convenience. This quote shows that there was no break in the transmission of French as long as we keep in mind that second language acquisition



432    Enoch O. Aboh and Michel DeGraff by adults with varying exposure to the target L2 was also operative in the early history of, say, Romance languages: We wait until they learn French before we start evangelizing them. It is French that they try to learn as soon as they can, in order to communicate with their masters, on whom they depend for all their needs. We adapt ourselves to their mode of speaking. They generally use the infinitive form of the verb [instead of the inflected forms—​ EA, MdG] … adding a word to indicate the future or the past.… With this way of speaking, we make them understand all that we teach them. This is the method we use at the beginning of our teaching…. Death won’t care to wait until they learn French. (Pelleprat 1655 [1965, 30–​31], our translation)

A couple of noteworthy paradoxes emerge here that are often ignored in Creole studies. Without calling it 'Creole,' Pelleprat introduces the enslaved Africans' 'mode of speaking' French as a variety that arises as these Africans try to learn French 'as soon as they can.' Pelleprat's remarks also suggest that biblical teaching, and presumably other sorts of instruction, were carried out in this emerging variety: 'with this way of speaking, we make them understand all that we teach them. This is the method we use at the beginning of our teaching … Death won't care to wait until they learn French.' It appears from this citation that the mode of speaking which was subsequently to be referred to as 'Creole' was used as the language of instruction, and was therefore accepted in what could be considered, back then, relatively formal contexts. This hypothesis is further corroborated by official declarations in Saint-Domingue Creole by no less than Napoleon Bonaparte in 1799, as reported in Denis (1935:353–354) (see Aboh 2015 for discussion):

Paris, 17 Brimer, an 10, Répiblique francé, yon et indivisible … Consuls la Répiblique Francé a tout zabitans Saint-Domingue … Qui ça vous tout yé, qui couleur vous yé, qui côté papa zote vini, nous pas gardé ça: nous savé tan seleman que zote tout libre, que zote tout égal, douvant bon Dieu et dans zyé la Répiblique. Dans tan révolution, la France voir tout plein misère, dans la même que tout monde te fere la guerre contre Français. Français levé les ens contre les otes. Mes jordi là tout fini, tout fere paix, tout embrassé Français; tout Français zami; tout hémé gouverneman, tout obéi li … Signé: Bonaparte

[Our translation: Paris, 17 Brumaire, year 10. French Republic, one and indivisible…. From the consuls of the French Republic to the entire population of Saint-Domingue. Whoever you are, whatever your skin color, wherever your ancestors are from, that does not matter to us: we only know that you are all free and all equal before God and before the Republic. During the Revolution, France experienced a lot of suffering because every other country fought against the French. The French were fighting each other. But today, all of that is over. All people have made peace. All people have embraced the French. All the French people are friends; they all love and obey the government. Signed: Bonaparte]

In a related vein, Ducœurjoly’s (1803) Manuel des habitans de Saint-​Domingue contains a language primer intended for newly arrived colonists. There it is explicitly stated, in the table of contents, that this primer is the first ‘Creole dictionary, with conversations translated in French and in Creole to give an idea of this language and to make oneself understood by the Blacks.’ This objective suggests that the Creole was used by almost everyone in the colony including the white colonists. Given this function of the Creole as a lingua franca, the enslaved Africans arriving in the colony must have considered the Creole as language acquisition target, possibly in addition to other local varieties. Taken together, these observations, from Pelleprat to Ducœurjoly, are inconsistent with the ‘break in transmission’ assumption often evoked in creolistics, whereby Creole languages would emerge out of a structureless ‘macaronic pidgin’ (Bickerton 1981). As the alert reader would have noticed, a structureless Pidgin can hardly serve as means for proper biblical instruction and political propaganda. Instead, what these documents suggest is that the colonial precursor of HC was used in both everyday and (quasi-​)official contexts by the Church, by government officials and by colonists as the one language accessible to most inhabitants of the island. Given the fact that the Creole was used as language of instruction and for political propaganda as well as commercial and plantation activities, it was in all likelihood spoken by members of the emerging Creole society in increasing numbers and with diverse degrees of fluency—​both by speakers of the substrate languages and by speakers of the local varieties of French. And this popularity of Creole in colonial Haiti was explicitly noted by the anonymous author of Idylles et Chansons ou essais de Poésie Créole par un Habitant d’Hayiti, who defined the ‘Langue Créole’ as a sort of ‘corrupted French’ that was ‘generally spoken by the Blacks, the Creoles [i.e., the Caribbean-​born—​EA, MdG] and by most of the colonists in our islands in the Americas’ (Anonymous 1811:2). Similar remarks about the widespread use of early Creole varieties are noted in Hazaël-​Massieux (2008:42–​43). In a related vein, and as already documented in DeGraff (2001:251), quoting Schuchardt, ‘[t]‌he slaves spoke the creole not only with the Whites but also among themselves while their mother tongue was still in existence, the latter being moreover constantly revived to some extent by the continual immigration from Africa.’ Historical records indicate that a significant group of such African speakers were the Gbe people from the Slave Coast (see Singler 1996; Aboh 2015). These Gbe speakers were also L2 speakers of the Creole. Like adult language learners everywhere, their native languages would certainly influence their approximation of the target language (see ­chapter 13). Thus arises the well-​documented substrate influence in HC—​and in other Caribbean Creole languages (see Mufwene 2010 for a critical overview). Beside these two types of L2 speakers, including Europeans and Africans, there were the locally-​born (i.e., the so-​called Creole) children. White children probably grew up bilingual in the Creole and the local variety of French, as is evident with the Beke



434    Enoch O. Aboh and Michel DeGraff population today in Martinique, while some black children must have grown up bi-​or multilingual in the Creole and some of the Niger-​Congo languages. Some black children, alongside mixed race children (so-​called ‘mulattoes’) and white children also spoke the local variety of the colonial language. These bi-​and multilingual speakers would contribute their diverse innovations to the mix, as influenced by their parents’ heritage languages. Finally, we should not forget those enslaved African children who were brought to the colony relatively young and became early L2 learners of the then-​ incipient Creole varieties. As suggested in DeGraff (2002, 2009), the interaction between these different types of learners in the early history of HC created an ‘L2–​L1 acquisition cascade’ whereby newly arrived L2 learners and newly-​born L1 learners were exposed to PLD from a mix of Creole varieties—​PLD that are partly fed by previously arrived L2 learners (often by the ‘seasoned slaves’), alongside older native Creole speakers. In terms of this hypothesis, L1 learners contribute to the emerging Creole by, among other things, setting the relevant parameters of their own, native and stable, idiolects on the basis of patterns that, in turn, are influenced by L2 learners’ reanalyses and innovations (see ­chapters 12, 13, and 18). It is thus that the early norms of Creole varieties would emerge and become more and more stable as more and more native Creole speakers would converge on overlapping sets of parameters for their Creole idiolects.15 We now look at specific examples of such patterns arising through this L2–​L1 cascade. This hypothesis is compatible with views on contact-​induced diachronic changes. Meisel (2011:125f), for instance, argues that: As far as the setting of parameters is concerned, ‘transmission failure’ is unlikely to happen in simultaneous first language acquisition. Only in successive acquisition of bilingualism might L2 learners fail to reconstruct the target grammar based on the information provided by the primary linguistic data. Alternatively, monolingual or bilingual children may develop a grammar distinct from that of the previous generation if they are exposed to an L2 variety of the target language.
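The dynamics of such an L2–L1 cascade can be illustrated with a deliberately crude simulation. All rates, cohort sizes, and the ten binary 'parameters' below are invented for illustration; this is a sketch of the hypothesis, not a model we are committed to:

import random

random.seed(1)
TARGET = [1] * 10                         # ten binary 'parameters' of the lexifier
pool = [TARGET[:] for _ in range(20)]     # the initial model speakers

def l2_learn(models):
    # Adult L2 learner: approximates one randomly chosen model, but flips
    # each parameter with some probability (early-interlanguage instability,
    # L1 influence, etc.).
    g = random.choice(models)
    return [1 - v if random.random() < 0.15 else v for v in g]

def l1_learn(models):
    # Child L1 learner: converges on the majority value of each parameter
    # in its PLD (a crude stand-in for convergent parameter setting).
    n = len(models)
    return [1 if sum(m[i] for m in models) * 2 >= n else 0 for i in range(10)]

for generation in range(5):
    l2_cohort = [l2_learn(pool) for _ in range(30)]              # newly arrived adults
    l1_cohort = [l1_learn(pool + l2_cohort) for _ in range(10)]  # locally born children
    pool = pool + l2_cohort + l1_cohort                          # next cohort's mixed PLD

print(l1_learn(pool))  # the emergent communal norm

With these (symmetric) noise rates, the majority-based L1 norm ends up reproducing the lexifier settings despite the unstable L2 input; this is one way of picturing the robust French inheritance discussed in section 17.2. A systematically biased flip, e.g. substrate-influenced reanalysis pushing a given parameter in one direction, could instead let an innovation spread into the communal norm.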

In addition to substrate-influenced patterns in Creole formation, we also need to think about the fact, mentioned in section 17.2, that HC grammar, including its morphophonology, syntax, and semantics, shows systematic correspondences with French, even in the domain of affixes and inflectional heads such as TMA markers, complementizers, determiners, and the like. Recall that these correspondences include diverse morphological idiosyncrasies such as affix distribution, the semantic ambiguity of certain affixes, sandhi phenomena, and morphological suppletion. As we consider the various categories of learners who contributed to the creation of HC, we may wonder who among those learners were the main channels for the inheritance of French-derived forms and structures in the emerging Creole.

15  Here we use the term 'parameters' as a shorthand, as we remain agnostic on the ontology of these 'parameters' and the actual path and mechanics of the setting of 'parameters' in the course of language acquisition (see chapters 11, 14, and 16 for relevant discussion).



In colonial Haiti (i.e., Saint-Domingue), the most likely groups to create early Creole varieties with robust lexical and structural French inheritance were not the field slaves on large and segregated plantations, but a socio-economically privileged group constituted by Africans on 'homesteads' and by others with direct and regular contact with speakers of French varieties (e.g., African and European L2 speakers of the local varieties). In this privileged group the Creole people of Saint-Domingue would stand out, with Creole taken in its ethnographic sense (i.e., locally born of non-indigenous parents). Having been born in Saint-Domingue, these Creoles were in the best position to attain native fluency in the emergent Creole varieties, alongside any available French varieties (DeGraff 2009:940–941). Also important is the fact that among the non-white population, the Creoles, many of whom were of mixed European–African descent, were generally the ones with the highest social capital. This advantage was quantified by Moreau de Saint-Méry as worth 'a quarter more than that of the Africans.' So their native Creole varieties, including the French inheritances therein, would have had enough prestige to subsequently spread through the population at large into communal norms of Creole speech—or, more accurately, communal norms of French-derived varieties that were increasingly perceived as autonomous from French and symbolic of the new 'Creole' community and its emerging identity.

Alongside substrate influence and superstrate inheritance, we also need to take into account novel aspects of Creole grammars—aspects without analogs in any of the ancestor languages. Here, some fundamental considerations are in order about the general role of language acquisition in language change (see also chapter 18). Given the fact that idiolects are like linguistic fingerprints, the I-languages of any learner's models, in Creole formation and everywhere else, will unavoidably be pairwise distinct. So it's logically impossible for any learner to replicate the I-languages of all available models (DeGraff 2009:906–907, 914–916). Therefore, language acquisition always makes room for innovations that, in turn, either endure beyond the innovators' own I-languages or have a transient short life restricted to the span of said I-languages or the early stages therein. The endurance or transience of any given innovation depends on whether or not these innovations, which originate in individual learners' I-languages, subsequently spread into new communal norms via widespread language use alongside further instances of language acquisition—acquisition that is relatively convergent vis-à-vis the particular innovation or a substantial component therein. As an innovation spreads through larger groups of speakers, there's also the possibility that its domain of application is extended to larger and larger linguistic environments. The spread of innovations is sensitive to complex ecological, socio-historical, and linguistic-structural factors (see DeGraff 2009:899–914 for some discussion, including innovation via grammaticalization, and some important caveats against the conflation of spread and innovation in analyses of Creole formation and language change; cf. Hale 1998). The key point here is that the rise of innovations through language acquisition is not a hallmark of Creole formation alone.
Innovations are part and parcel of language acquisition throughout our species (see Crain et al.'s 2006 article aptly titled 'Language acquisition is language change'). We therefore conclude that what matters in a context of change is not so much whether there has been a break in transmission, but rather a combination of the following: (i) the profiles of the learners (e.g., bilingual L1 learners vs. early/late L2 learners), and (ii) the input to which the various learners are exposed, and in what ratios (e.g., input from native vs. nonnative speakers of the target language). In this regard, Creoles do not represent an exception: they developed in the same set of conditions that have generally led to language change.

With respect to second language acquisition (L2A), it has been argued that certain innovations in adult learners' interlanguages are due to general strategies of L2A, while others are due to patterns in the learners' native languages (their L1s); see chapter 13 for discussion. Therefore substrate influence in Creole formation is par for the course. One well-established fact in the literature on L2A is that very early interlanguages are structurally reduced and unstable. This is a stage that all adult L2 learners go through, no matter how competent and no matter the nature of their input—and this includes L2 learners in the history of any language. But in any language contact situation that involves young and adult learners (e.g., the ecology of Creole formation), not all L2 learners will be starting their respective L2A paths at the same time and proceeding through the very same stages at the same pace. At any moment, the patterns from early interlanguages, with their reduced structures and their L1-influenced patterns, will constitute only one subset of the total ecology, alongside patterns from later stages of L2A and from L1A. In effect, both L1A and L2A would produce a mix of native and native-like patterns based on European(-derived) varieties as acquisition targets. The structural patterns in this mix and the stochastic properties thereof would be determined by contingent demographic and socio-historical factors. Accordingly, Creole languages cannot be the mere fossilized interlanguages that Plag (2008a,b) takes them to be (cf. Mufwene 2010, DeGraff 2009:948–958, DeGraff et al. 2013, and Aboh 2015 for critiques).

In order to appreciate the extent of L1 influence in L2A in the course of Creole formation, we first need to understand the basic facts of substrate influence, alongside superstrate inheritance, in the diachrony of particular Creole languages. As in the rest of this chapter, we resist explanations that take scope over Creole languages as a class. Instead we prefer to focus on particular case studies as they shed light on specific theoretical analyses and the complexities thereof—complexities that are often obscured by exceptionalist approaches to Creole languages.

17.3.2 Superstrate Inheritance, Substrate Influence, and Innovation in HC Formation In the particular case of HC, here are a set of key facts, as surveyed in the preceding sections, that need explanation. These facts involve certain similarities with the French lexifier, and the Niger-​Congo languages, most notably the Gbe languages (e.g., Fongbe, Gungbe, Ewegbe) of the Kwa family. These similarities often come with innovations, including both simplification, as in the reduction of verbal inflectional morphology, and local complexification, as in, e.g., complementation structures and related properties



A Null Theory of Creole Formation Based on Universal Grammar    437 (see note 5). We now turn to these facts. (See Aboh 2006a,b, 2009, DeGraff 2009:916–929, 938–940, 948, etc., for related empirical and theoretical details and for references.)

17.3.2.1 Reanalysis in the Clausal Domain

Since Sylvain (1936) there has been an impressive list of studies documenting both similarities and dissimilarities between French and HC, with some of the dissimilarities apparently related to substrate influence. There is not enough space here to comprehensively review all the morphosyntactic similarities between French and HC, but the following examples from DeGraff (2007:109) should suffice to make our point. Similarly to French, HC displays a null complementizer in certain nonfinite subordinate clauses (4):

(4) a. Tout  moun   vle   ale  nan  syèl.    (HC)
       every person want  go   to   heaven
       'Everyone wants to go to heaven.'

    b. Tout  le   monde  veut      aller  au  ciel.    (French)
       every det  people want.3sg  go     to  heaven
       'Everyone wants to go to heaven.'

Aside from differences in nominal morphosyntax, which are discussed in detail in Aboh and DeGraff (2012), and in finite verbal inflectional morphology, which we return to in 17.3.2.2, it appears that in (4) French and HC display a parallel morphosyntax when it comes to nonfinite complementation.16 Also note that every single morpheme in (4a) finds its etymon in the French example in (4b). Such correspondences hold in every domain of HC lexicon and even grammar, including the functional domain. This evidence supports our position that HC is genealogically related to French, consistent with the traditional application of the comparative method. Interestingly, the complementation structures in French and HC manifest different properties from the substrate Gbe languages, where the nonfinite clause is obligatorily introduced by a prepositional complementizer. A Gungbe example is given in (5):

(5) Mὲ     lέ  kpó  wὲ   jró   *(ná)  yì  lɔˊn
    people pl  all  foc  want  prep   go  heaven
    'Everyone wants to go to heaven.'

16  See Koopman and Lefebvre (1982) and Mufwene and Dijkhoff (1989) for discussions of whether Creoles morphologically distinguish between finite and nonfinite verbs. Here we take 'finiteness' to have syntactic correlates. So a language like HC can still show finite vs. nonfinite syntactic distinctions (in terms of, say, structure and binding domains) independently of morphological distinctions (or lack thereof) on verbal forms (see DeGraff 2009:67f for observations and references).



In a sense, this Gungbe prepositional complementizer resembles French pour 'for,' whose HC cognate is pou, as in example (6):

(6) Annou    vote  [PP pou [kandida    nou  vle   a]].
    let-1pl  vote      for  candidate  1pl  want  det
    'Let's vote for the candidate we want.'

In addition to selecting nominal projections and projecting run-of-the-mill PPs as in (6), HC pou, in one of its many uses, can also select for subjectless nonfinite purpose clauses on a par with its French etymon pour:

(7) a. Kouto  sa   a    pa   fèt   pou  koupe  pen.    (HC)
       knife  dem  det  neg  make  for  cut    bread
       'This knife is not made for cutting bread.'

    b. Ce   couteau  n'est       pas  fait  pour  couper  le   pain.    (Fr)
       dem  knife    neg-be.3sg  neg  make  for   cut     det  bread
       'This knife is not made for cutting bread.'

The Haitian example in (7a) displays the same word order as the French sentence in (7b). Yet DeGraff (2007) also shows that HC pou, unlike French pour, selects for full finite clauses of the types given in (8):

(8) a. Kouto  sa   a    pa   fèt   pou  li   koupe  pen.    (HC)
       knife  dem  det  neg  make  for  3sg  cut    bread
       'This knife is not made for cutting bread.'

    b. Li   te   ale  nan  fèt    la   pou  li   te   ka       fè  yon  ti      danse,
       3sg  ant  go   to   party  det  for  3sg  ant  capable  do  det  little  dance
       men  lè    li   rive,    pa   te   gen   mizik.    (HC)
       but  when  3sg  arrive   neg  ant  have  music
       '(S)he went to the party to dance a bit, but when (s)he arrived there was no music.'

It therefore appears from the HC examples in (7) and (8) that HC pou as a complementizer can introduce both nonfinite and finite clauses. This is not the case for the complementizer pour in Modern French, which only selects for nonfinite clauses without any overt subject. As it turns out, there's one obvious difference between HC and French that supports our L2–L1 cascade hypothesis about the formation of HC: the verbs in our HC examples are, unlike their French counterparts, devoid of affixes for TMA and subject–verb agreement.

This difference can be explained in a perspective that considers L2A a key process in Creole formation and in contact-induced language change more generally, with verbal inflectional affixes as a frequent casualty in L2A, especially in the early stages—and much more so than in L1A (Weinreich 1953; Archibald 2000; Prévost and White 2000; Wexler 2002; Ionin and Wexler 2002). This morphological difference between French and HC seems related to a syntactic difference over a larger domain, namely the clausal domain and the selectional properties of French pour vs. HC pou. Recall the examples in (8), where HC pou can introduce finite clauses. This selectional property of HC pou is not available for French pour. The only way for the latter to embed a finite clause is through the intermediacy of the complementizer que, as in the following example with a subjunctive:

(9) Jean  a    acheté  un  livre  pour  *(que)  son  fils  puisse           le  lire.
    John  has  bought  a   book   for     that  his  son   can.subjunctive  it  read
    'John has bought a book so that his son can read it.'

At this point, we need to go beyond Standard Modern French and highlight the fact that the preposition pour in Middle French, unlike its modern descendant, could select for nonfinite clauses with overt subjects, as in … et lour donna rentes pour elles vivre … '… and gave them a stipend for them to live …' and Quelle fureur peut estre tant extrême … pour l'appetit chasser la volonté? 'What fury can be so extreme … for hunger to chase the will?' (Nyrop 1930, VI:219; also see Frei 1929:95). Frei (1929:93–94) and Nyrop (1930, VI:220) also mention that certain modern dialects show a similar pattern, as in Apportez-moi du lait pour les enfants boire 'Bring me milk for the children to drink' and … un oreille pour moi dormir et un saucisson pour moi manger '… a bonnet for me to sleep and a sausage for me to eat.' The tonic (non-nominative) pronoun moi as subject of dormir 'to sleep' and manger 'to eat' indicates that the embedded subject position is not assigned nominative Case—it either receives Case from the preposition pour or realizes default case.

There's one key difference between HC examples such as (8b) Li te ale nan fèt la pou li te ka fè and these Middle French examples with overt subjects in the embedded infinitival clauses, namely the fact that the embedded clause in the HC example is finite whereas the embedded clause in the Middle French examples is nonfinite. But we've already noted the fact that HC finite verbs are often homophonous with French infinitives or past participles. This pattern was even more pronounced in the 17th and 18th centuries, when the final /r/ of French infinitives such as finir, courir, ouvrir, voir, boire, etc., was often silent (Nyrop 1899, I:293–294, 1903, II:62; Brunot 1906, II:273; Gougenheim 1951:30; Hazaël-Massieux 2008:19; but see Nyrop 1903, III:153–154 and Brunot and Bruneau 1913, IV:208–210 for evidence of variation). The pattern in (8b) would then emerge as the outcome of a reanalysis process: French nonfinite small clauses (e.g., as the nonfinite complement of pour) were reanalyzed by HC speakers as full-fledged finite clauses where the main invariant verb could co-occur with preverbal TMA markers, on the model of the French verbal periphrases that are popular in the spoken varieties of French described in Gougenheim (1929).



In effect, such a reanalysis, from nonfinite small clauses in French to full-fledged finite clauses as complements of prepositional complementizers, led to embedding possibilities in HC that are now more complex than their analogs in both Middle and Modern French. This is yet another counterexample to the Creole-simplicity claims that are so prevalent in Creole studies.

In a related vein, let's take another look at (9), where the finite embedded clause is obligatorily introduced by the combination pour + que—with que 'that' directly selecting for the finite clause. Here there's another point of comparison with HC that may support our reanalysis scenario for the emergence of the clausal structure selected by HC pou. One important fact about the translation of (9) into HC is that HC ke, unlike its Modern French etymon que, is optional:

(10) Jan   achte   yon  liv   pou  (ke)  pitit  li   ka   li    li.
     John  bought  a    book  for  that  son    3sg  can  read  3sg
     'John has bought a book such that his son can read it.'

The optionality of HC ke is found in other instances (DeGraff 2007:109):

(11) Jinyò  konnen  (ke)  Jezila  renmen  l    anpil.
     Jinyò  know    comp  Jezila  love    3sg  much
     'Jinyò knows (that) Jezila loves her a lot.'

Here too we find a contrast with Modern French, where the complementizer que in similar examples cannot be omitted:

(12) Jeanne  sait   *(que)  Jeannette  l'aime     beaucoup.
     Jane    knows    comp  Jeannette  her-loves  much
     'Jane knows that Janet loves her very much.'

What about earlier varieties of French? As it turns out, the complementizer que was optional in those earlier varieties, somewhat on a par with HC ke: the optionality of HC ke is not surprising once we consider 17th-century French, where que was also optional (Nyrop 1930, VI:159). To summarize: while HC and English share null complementizers, this is not the case in Modern French; and while both French pour and English for select for nonfinite clauses, only HC pou selects for either finite or nonfinite clauses. The selectional properties of HC pou can be analyzed as the outcome of reanalysis based on patterns in earlier varieties of French. If we assume that (apparent?) optionality in the realization of a complementizer induces an increase in complexity, then HC is, in this particular respect, more complex than Modern French.

We now look at the function of HC pou as a modality marker (see Koopman and Lefebvre 1982; Sterlin 1989).



The latter use of pou is reminiscent of patterns in both French and the Gbe languages. Consider the example in (13a), where HC pou indicates deontic modality, in a way similar to English to or to the French (quasi-)modal periphrastic construction être pour + infinitival V, as in (13b). The latter construction is documented in the history and dialectology of French (with examples both from the 17th century and from dialects spoken in Picardy, the Midi, Provence, and Canada in Gougenheim 1971:120–121).

(13) a. Se   Jinyò  ki    pou  te   vini.    (HC)
        foc  Jinyò  comp  mod  ant  come
        'It's Jinyò who had to come.'

     b. Je  suis  pour  me  marier  la   semaine  prochaine.    (Canadian French)
        I   am    for   me  marry   the  week     next
        'I am to get married next week.'

We find related constructions in the Gbe languages as well (Aboh 2006a). The following Gengbe examples indicate that the dative preposition né in (14a) can also be used as a conditional mood marker, as in (14b), and as an injunctive mood marker, as in (14c) (see Aboh 2015 for discussion):

(14) a. Kòfí  pò   tómὲ    né  Kɔˋjó.
        Kofi  hit  ear-in  né  Kojo
        'Kofi slapped Kojo's face.'

     b. Né  Kòfí  vá    á    mì   yrɔˋ-ὲ.
        né  Kofi  come  top  2pl  call-3sg
        'If/when Kofi comes, call him.'

     c. Kɔˋjó  glɔˋn  bé    Kwésí  né  vá.
        Kojo   said   that  Kwesi  né  come
        'Kojo said that Kwesi should come.'

These examples indicate that both French and the Gbe languages contributed to the emergence of modal pou in HC. (Aboh 2006a discusses related facts in the grammar of Saramaccan; also see Corne 1999 for related facts in a diverse range of French-based Creoles, under the umbrella label of 'congruence.')

Here we've introduced a concrete sample of HC/French (dis)similarities related to verbal inflectional morphology and to the use of prepositional and finite complementizers in HC and French. The word-order and semantic similarities are pervasive, while the dissimilarities seem to touch on well-delimited aspects of morphosyntax, such as the selectional properties and optional pronunciation of certain complementizers and the realization of verbal inflectional morphology. These HC patterns can be analyzed as the outcome of reanalysis in combination with substrate influence via second language acquisition, thus incorporating influences from both the French superstrate and the Gbe substrates.



These patterns contradict any break-in-transmission scenario that postulates a reduced-pidgin stage in the history of HC. Furthermore, this combination of superstrate cum substrate influence seems to introduce a certain amount of local complexity in the relevant domains, once certain crucial assumptions are made about the relevant complexity metric (see note 6). This is another illustration that 'complexity' cannot be determined in the absence of a theory of grammar. For example, the lexical entry for pou needs to be more 'complex' than that of its French etymon pour, to the extent that 'complexity' can be measured by the inventory of combinatory possibilities or the amount of structure entailed by said combinations: the complement selected by HC pou can be either a nonfinite small clause or a full-fledged finite clause, while that of French pour is a nonfinite clause, unless pour first takes a CP headed by que. In effect, HC pou has one more option for its complement (namely, a finite TP) than French pour (which does not select for finite TPs). Such a local increase of 'complexity,' given a particular set of theoretical assumptions, is a running theme in our ongoing survey.
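This inventory-counting notion of local complexity is simple enough to state mechanically. The sketch below is merely our gloss on the metric just described; the complement-type labels are simplified stand-ins, not a formal proposal:

SELECTS = {
    # French pour: nominal complements and subjectless nonfinite clauses;
    # finite clauses only via an intervening que-headed CP.
    "pour": {"DP", "nonfinite_clause", "CP_que"},
    # HC pou: the same options plus directly selected finite clauses, cf. (8).
    "pou":  {"DP", "nonfinite_clause", "CP_ke", "finite_TP"},
}

def complexity(item):
    # 'Complexity' = the size of the inventory of selectable complement types.
    return len(SELECTS[item])

for item in ("pour", "pou"):
    print(item, complexity(item))  # pour 3 / pou 4

On any such counting metric, the diachronic path from pour to pou is a local complexification, not a simplification.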

17.3.2.2 From V-​to-​T to V-​in-​Situ: Restructuring vs. Simplification Now we consider the profile of verbal inflectional morphology in HC and its consequences for its morphosyntax. In exceptionalist theories of creolization, the absence of inflectional morphology in Creole languages is taken as strong evidence for postulating some drastically reduced Pidgin or some fossilized early interlanguage as the ancestor of the Creole (e.g., Bickerton 1981, 1999; McWhorter 2001; Plag 2008a,b). Such views fail to provide any insight into the structural properties of Creoles and into the deep similarities between Creole formation and general patterns of language change. A case in point is the loss of inflectional morphology and V-​to-​T in the history of English and Mainland Scandinavian languages. Consider the Early Modern English (ENE) sentences in (15a–​d). In these examples, the finite verb in boldface precedes the negative marker as depicted in (15e): (15)

(15) a. It serveth not. [Middle English]

     b. Wepyng and teres counforteth not dissolute laghers. (Roberts 1993)

     c. Quene Ester looked never with swich an eye. (Kroch 1989)

     d. … if man grounde not his doinges altogether upon nature. (Kroch 1989)

     e. Verb placement in ENE: Subject … Vfinite … not/never …

The patterns in (15) contrast with Modern English (NE), where the inflected verb cannot precede negation; hence the ungrammaticality of (16a) as opposed to (16b):

(16) a. *He speaks not English.

     b. He does not speak English.



The contrast in (15) vs. (16) indicates that while the lexical verb can precede negation in Middle English (ME), this is impossible in NE, where lexical verbs must follow negation. In such contexts, any tense or agreement specification is expressed by pleonastic do as in (16b) or by a relevant auxiliary/modal. The lexical verb, on the other hand, occurs in its nonfinite form when it follows negation, as in (17):

(17) a. John will not sell his new car.

     b. John must never sell his new car.

Unlike lexical verbs, auxiliary verbs such as be and have precede negation in NE. The schema in (18c) describes verb placement in NE.

(18) a. John has not bought a new car.

     b. John has never bought any car.

     c. Verb placement in NE: Subject … Aux/Mod/do … not/never … Vnonfinite …
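The placement schemas in (15e) and (18c) are simple linear-precedence statements; the following minimal sketch, with clauses idealized as pre-tagged token sequences (the tags are our own shorthand), makes the contrast explicit:

```python
# A minimal order-checker for the schemas in (15e) and (18c).
def matches_ene(tags):
    """(15e): Subject ... Vfinite ... not/never ..."""
    try:
        return tags.index("SUBJ") < tags.index("VFIN") < tags.index("NEG")
    except ValueError:
        return False

def matches_ne(tags):
    """(18c): Subject ... Aux/Mod/do ... not/never ... Vnonfinite ..."""
    try:
        return (tags.index("SUBJ") < tags.index("AUX")
                < tags.index("NEG") < tags.index("VNONFIN"))
    except ValueError:
        return False

assert matches_ene(["SUBJ", "VFIN", "NEG"])           # (15a) It serveth not
assert matches_ne(["SUBJ", "AUX", "NEG", "VNONFIN"])  # (16b) He does not speak English
assert not matches_ne(["SUBJ", "VFIN", "NEG"])        # (16a) *He speaks not English
```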

In the literature on diachronic changes in English, the path from the examples in (15) to those in (18) is commonly analyzed in terms of V-to-T movement vs. lack thereof. There are many competing proposals for the implementation of this basic idea (Kroch 1989; Roberts 1993, 1999; Vikner 1997; Rohrbacher 1999; Han and Kroch 2000; Bobaljik 2002; among others; see chapter 18 for discussion of models of syntactic change). Here we adopt the minimal set of assumptions without committing ourselves to any specific implementation. Starting with a clause structure similar to (19a), we take (15a) to derive from movement of the verb to T (i.e., above the negative phrase), giving the order Subject–Verb–Negation in (19b). Furthermore, we take V-to-T movement to be related to the nature of verbal inflectional morphology and the licensing thereof. In addition, the negative element not is analyzed as a negative adverbial (on a par with never) that realizes the specifier position of NegP, the functional projection responsible for the expression of sentential negation.

(19) a. [TP [T [NegP not [Neg [VP ]]]]]

     b. [TP it [T' serve-th [NegP not [Neg' Neg [VP ]]]]]



The impossibility of (16a) suggests that V-to-T movement is lost in NE. Here, the lexical verb cannot raise to T. As a consequence, sentential negation requires the presence of a pleonastic element (e.g., do), a modal, or a non-modal auxiliary which expresses tense or agreement specification. The sentence under (16b) is represented as follows:

(20) [TP he [T' does [NegP not [Neg' Neg [VP speak English]]]]]

ENE and NE thus differ in that the former allowed V-​to-​T movement unlike the latter where the lexical verb must stay within V. What also got lost in NE is a subset of verbal agreement affixes (e.g., the 2nd singular suffix -​est as in Thou singest) and the possibility of stacking of tense and agreement affixes as in Thou showedest (Kroch 1989:238; cf. Bobaljik 2002). The loss of various verbal affixal combinations is itself understood as a reflex of language contact phenomena (e.g., second language acquisition by Scandinavian invaders; cf. Kroch et al. 2000). With this in mind, let us now return to the diachronic path from French to HC in the context of language contact with both French and Niger-​Congo languages such as Gbe. Much of the discussion here recapitulates the findings in DeGraff (1997, 2005) to which we refer the reader for a detailed discussion. The facts about verb placement in ME are comparable to the following French data (adapted from Pollock’s 1989 seminal comparative study of Romance and Germanic languages). (21)

a. Joujou (ne) parle pas Créole.
   Joujou neg speak negadv Creole
   'Joujou doesn't speak Creole.'

b. Joujou (ne) parle jamais Créole.
   Joujou neg speak negadv Creole
   'Joujou never speaks Creole.'

In French, finite lexical verbs precede negative adverbials (i.e., pas/jamais), as in the ME data but unlike in NE, where lexical verbs generally follow negative adverbials (i.e., not/never). Another difference is that Standard French, unlike English, allows the



combination of the negative clitic ne and the negative adverb pas. However, this clitic is optional in most spoken French. Example (22) therefore indicates the position of the French finite verb vis-à-vis the negative adverb pas (comparable to English not).

(22) Verb placement in French: Subject … Vfinite … pas/jamais …

Verb placement in French is like in ME: it exhibits V-to-T movement. In addition, French is similar to ME in that it displays a larger set of inflectional suffix combinations on the verbs than NE does, including the stacking of tense and agreement suffixes as in aim-er-ai (love+FUT+1sg).

With these observations in mind, let us now turn to HC. As already discussed in DeGraff (1993, 1997, 2005), HC lacks V-to-T movement: the lexical verb must follow the negative marker as indicated in (23a), the equivalent of the French example (21a). The sentence in (23b) shows that a sequence comparable to that of French is ungrammatical in HC:

(23) a. Jinyò pa pale Kreyòl.
        Jinyò neg speak Creole
        'Jinyò doesn't speak Creole.'

     b. *Jinyò pale pa Kreyòl.
        Jinyò speak neg Creole

In addition, the HC negative particle pa precedes the negative adverb janm 'never', whose etymon is French jamais 'never' (24). Both negative elements precede the verb, thus indicating that HC lexical verbs are pronounced in a low structural position.

(24) Jinyò pa janm pale kreyòl.
     Jinyò neg never speak Creole
     'Jinyò never speaks Creole.'

Thus, while finite lexical verbs precede the negative elements pas/jamais in French, the HC equivalents pa/janm precede the lexical verb, as depicted in (25):

(25) Verb placement in HC: Subject … pa/janm … Vfinite

The contrast between French (22) and HC (25) reminds us of the diachronic path from ME with V-to-T movement to NE without V-to-T movement. In addition, both French and ME display a set of inflectional affixal combinations that is larger than in HC and NE, respectively. Yet it must also be noted, as we already have, that popular colloquial varieties of French (i.e., the varieties that were the terminus a quo of



HC) show a preference for analytic periphrastic constructions with invariant verbal forms (either infinitival or participial forms) of the sort described in Frei (1929) and Gougenheim (1929). Be that as it may, HC shows even less inflection than these nonstandard varieties of French. Building on previous discussion, we propose that the absence of V-to-T in HC, like in NE, correlates with a reduction in verbal inflectional morphology, notwithstanding a possible time lag between the two sets of phenomena (see DeGraff 1997, 2005 and references therein for additional details on the emergence of the HC patterns, with special attention to their cognates in French periphrastic constructions).

For exceptionalist views of creolization, the reduction in verbal inflectional morphology in HC is evidence for structural simplification as a result of prior pidginization. Yet the comparison with English indicates that things might not be so simple. Here we discuss two facts that suggest that in both English and HC the reduction in inflectional morphology and the absence of V-to-T movement may have triggered a local increase of complexity vis-à-vis one particular aspect of the morphosyntax of these languages.

Let's start with English. Recall that Roberts (1985, 1993), Kroch (1989a,b), and Han and Kroch (2000), among others, have argued that the loss of V-to-T and the reduction in verbal inflectional morphology are related to the rise of do-support. One common explanation is that do-support is a morphosyntactic device that licenses inflectional specifications in the absence of V-to-T movement. This can be seen in structure (20), where do is inserted under T and bears tense and agreement morphology. V-to-T movement aside, there is no structural difference between the ME representation in (19b) and the NE one in (20). The parallel between these two structures indicates that the reduction in verbal inflectional morphology and the loss of V-to-T movement do not trigger a simplification in the basic clause structure. Instead, what we see here is that the absence of V-to-T movement entails a new morphosyntactic strategy, that is, do-support. The latter entails a new set of morphosyntactic constraints to be acquired by learners of NE (e.g., do-support in interrogative and emphatic constructions, negative imperatives, and negative sentences; see Han and Kroch 2000 for discussion). Any approach that focuses solely on the loss of inflectional morphology from ME to NE will miss the fact that such a loss did trigger a new battery of morphosyntactic strategies that certainly needs to be taken into account in any evaluation of complexity differences between ME and NE.

Similar issues arise in the comparison between French and HC. It is indeed arguable that, as in the history of English, the reduction in verbal inflectional morphology from French to HC (e.g., the disappearance of tense and agreement suffixes such as -ais in Je dansais 'I was dancing') and the concomitant loss of V-to-T movement correlates with the emergence of a series of preverbal TMA markers that fulfill functions similar to those of TMA suffixes in French (to wit: HC Mwen t ap danse 'I was dancing', with t(e) as the anterior marker and ap as the progressive marker). In this regard, the following examples show that the HC negative marker pa is higher in the structure than



its English counterpart not: whereas the tense specification in English is realized in a position preceding not, as in John did not dance, HC pa must precede all TMA elements, including the anterior marker te (cf. DeGraff 1993:63):

(26) a. Jan pa t- av- ale nan mache.
        John neg ant irr go in market
        'John would not have gone to the market.'

     b. *Jan te- pa (av-) ale nan mache.
        John ant neg irr go in market

The sentences under (27) involve negative concord, where the negative marker pa is combined with janm 'never' in addition to other TMA markers. In all these examples, pa precedes all the TMA markers, which in turn precede the verb.

(27) a. Jinyò pa te janm ale Miami.
        Jinyò neg ant never go Miami
        'Jinyò never went to Miami.'

     b. Jinyò pa t- ava janm ale Miami.
        Jinyò neg ant irr never go Miami
        'It was never the case that Jinyò was likely to go to Miami.'

     c. Jinyò pa t- ap janm ale Miami.
        Jinyò neg ant irr never go Miami
        'Jinyò would have never gone to Miami.'

Examples of this sort indicate that the syntax of the HC negative marker is different from that of its French etymon pas, which follows the finite verb of its clause. According to DeGraff (1993), HC pa heads the higher NegP, from where it dominates the series of TMA markers and the verb:

(28) [ Jinyò [NegP pa [TP t- [ModP ava [NegP janm [Neg' Neg [VP ale Miami ]]]]]]]
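The rigid linear order encoded in (26)–(28) can be stated as a precedence hierarchy over the preverbal field. A toy validator along the following lines (with rank values standing in for the projections in (28); the encoding is our simplification) captures the contrast between (27b) and (26b):

```python
# A toy validator for the preverbal order in (26)-(28):
# pa < te < ava/ap < janm, all preceding the verb.
RANK = {"pa": 0, "te": 1, "t": 1, "ava": 2, "av": 2, "ap": 2, "janm": 3}

def well_ordered(markers):
    """True iff the preverbal markers respect the hierarchy in (28)."""
    ranks = [RANK[m] for m in markers]
    return ranks == sorted(ranks)

assert well_ordered(["pa", "t", "ava", "janm"])  # (27b) Jinyò pa t-ava janm ale ...
assert not well_ordered(["te", "pa", "av"])      # (26b) *Jan te pa av-ale ...
```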



We observe from (28) that reduction in inflectional morphology and loss of V-to-T movement do not correlate with simplification of structure. Instead, the lack of V-to-T movement in HC corresponds to a clause structure where TMA markers license the relevant projections. The emergence of these TMA markers comes with its own increment in local complexity, due to their combinatorics, the ordering constraints therein, and the distinct semantics of the various combinations. Witness the subtle semantic distinctions between (27b) and (27c) and the contrast previously described in 17.2.1 with regard to te-dwe versus dwe-te sequences.

In this respect, another type of local complexity in HC concerns temporal interpretation. Indeed, the absence of V-to-T movement in HC means that the verb itself does not bear temporal specifications. Instead, these specifications are deduced from the combination of TMA markers and the lexical aspect of the verb. Put differently, temporal specification is computed based on TMA markers and Aktionsart. HC, like many Creoles and Niger-Congo languages (including Gbe; see (31)), displays an asymmetry between eventive/dynamic verbs and stative verbs: when they occur without any TMA marker, eventive/dynamic verbs are interpreted as perfective, while stative verbs are interpreted as present (this asymmetry has been well known in Creole studies since Bickerton 1981). In the literature on African languages going back to the 1960s, this phenomenon has been labeled the 'factative effect' (see Déchaine 1994 for an overview). An illustration of this effect is given in (29), taken from DeGraff (1993:77–79). As is shown by the French translation, the interpretation of a bare eventive verb (29a) is comparable to the French passé composé, even though such sequences are commonly translated with past tense marking in English. On the other hand, bare stative verbs are interpreted as present. This is illustrated by the stative verb of the matrix clause in (29b):

(29) a. Prèske pèsonn pa vote pou Manigat.
        almost nobody neg vote for Manigat
        'Presque personne n'a voté pour Manigat.' [French]
        'Almost nobody voted for Manigat.' [English]

     b. Mwen pa kwè pèsonn ap vini.
        1sg neg believe nobody fut come
        'I don't believe that anybody will come.'

Contrary to French, where temporal interpretation can be read off the morphology of the verbs, HC therefore requires temporal interpretation to be computed based on both the lexical aspect of verbs and the combinations of TMA markers. In turn, the combinations of TMA markers are regulated by complex constraints, and so is their semantics (Fattier 1998, 2003; Howe 2000; Fon Sing 2010; etc.). In addition, lexical aspect interacts with argument structure, and HC displays subtle interpretative nuances that are sensitive to the form and semantics of the internal argument. As is shown by the example in (30) from Déchaine (1994), a bare noun phrase with non-individuated generic reference allows a habitual reading (30a) while a determined noun phrase triggers a perfective reading (30b):

(30) a. Jinyò vann chat.
        Jinyò sell cat
        'Jinyò sells cats.'

     b. Jinyò vann chat la.
        Jinyò sell cat det
        'Jinyò sold the cat.'
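These generalizations amount to a small decision procedure over lexical aspect and the form of the internal argument. The sketch below is a deliberately coarse encoding of (29)–(30), not a full semantics of HC:

```python
# Toy encoding of the factative effect plus the object-definiteness
# contrast in (30); verb-class labels are our simplification.
def bare_verb_reading(verb_class, definite_object=None):
    """Reading of a TMA-less HC clause."""
    if verb_class == "stative":
        return "present"                 # (29b) kwè 'believe'
    if verb_class == "eventive":
        if definite_object is False:
            return "habitual"            # (30a) Jinyò vann chat
        return "perfective"              # (29a); (30b) Jinyò vann chat la
    raise ValueError(verb_class)

assert bare_verb_reading("stative") == "present"
assert bare_verb_reading("eventive", definite_object=True) == "perfective"
assert bare_verb_reading("eventive", definite_object=False) == "habitual"
```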

So far we have surveyed an intricate set of distributional and interpretive facts in HC, all related to verbal inflectional morphology and the clausal domain. From our perspective in this chapter, popular exceptionalist claims to the effect that Creoles are the simplest languages (as, for example, in McWhorter 2001 and related work) are misleading because they merely rely on presence versus absence of various sorts of overt morphology without taking into account the (covert) structural correlates of said morphology vs. lack thereof. Besides the question of complexity, it is also of interest here whether our observations about temporal interpretations in HC have analogs in phenomena that are attested beyond Creole languages. As noted in Avolonto (1992) and Déchaine (1994), the asymmetry just described between eventive/dynamic verbs and stative verbs is also found in the Gbe languages involved in the contact situation that led to the development of HC. The following example illustrates this asymmetry in Gungbe (Aboh 2004):

(31) a. Súrù sà àsé lɔˊ.
        Suru sell cat det
        'Suru sold the cat.'

     b. Súrù nyɔˊn àsé lɔˊ.
        Suru know cat det
        'Suru knows the cat.'

Given these similarities between HC and the Gbe languages, we can reasonably hypothesize that the latter, as the native languages of adult learners of French or (proto-​)HC, influenced the emergence of the Haitian TMA system. As a consequence, HC appears to have morphosyntactic properties in common with both French and the Gbe languages—​with patterns from all the languages contributing to various local domains of complexity in the emergent Creole (e.g., in the Creole’s clause structure). This leads us to the discussion of superstrate inheritance combined with substrate influence and concomitant increases of local complexity, specifically in the nominal domain.




17.3.2.3 Superstrate Inheritance, Substrate Influence, and Innovation in the Nominal Domain

We now turn to relative clauses in HC as compared to French and Gungbe. This discussion focuses on restrictive relative clauses and recapitulates some of the findings in Aboh (2006b) and Aboh and DeGraff (2014). Starting with French relatives, we observe that they display the word order pattern in (32a): the relative clause follows the noun, which itself is introduced by the determiner. Because Modern French does not allow singular bare noun phrases, the determiner is compulsory in relative clauses, as indicated by the ungrammatical example (32b):

(32) a. Le cheval que j'ai acheté
        the horse that 1sg-have bought
        'The horse that I bought'

     b. *Cheval que j'ai acheté
        horse that 1sg-have bought

Roodenburg (2006) shows that French allows coordinate plural nouns of the type in (33a) but even such noun phrases exclude relativization, hence the ungrammatical sentence (33b): (33)

a. Étudiants et professeurs manifestent à Montréal.
   students and professors demonstrate in Montreal
   'Students and professors are demonstrating in Montreal.'

b. *Étudiants et professeurs qui ont manifesté à Montréal
   students and professors who have demonstrated in Montreal

Let us assume the format of French relative clauses as in (34a). We adopt Kayne's (1994) complementation analysis of relative clauses in (34b), where the determiner introduces the relative clause and the relativized noun phrase occurs in Spec,CP. As for French DP structure, we assume that it involves at least two functional projections, DP and NumP, where the latter is responsible for the expression of number. We restrict ourselves to this partial representation, and we refer the reader to De Vries (2002) for a careful discussion.

(34) a. [DP [D [NumP [Num [CP [C ]]]]]]

     b. [DP le [NumP +SG [CP cheval [C' que j'ai acheté ]]]]

With this description in mind, let us now look at relative clauses in the Gbe languages. Aboh (2002, 2004, 2005) gives ample details on relative clauses in Gungbe, some of



which we recapitulate here. In Gungbe and other Gbe languages bare noun phrases (i.e., determiner-less noun phrases) are allowed in all argument positions, unlike in Modern French and most contemporary varieties of Romance and Germanic languages. In this respect, Gbe is similar to earlier varieties of French where determiner-less nouns, up until the 17th century, could occur in argument positions as well—unlike their distribution in Modern French (see Mathieu 2009 for data and references).17 The congruence of the distribution of argumental bare nouns in both Gbe and earlier varieties of French would have favored their distribution in HC as well, as described here and in Aboh and DeGraff (2014).

Bare noun phrases in Gbe can optionally be accompanied with nominal modifiers such as adjectives. The object noun phrase in example (35a) illustrates this. Similarly to such noun phrases, noun phrases including determiners (i.e., the specificity marker lɔˊ and the number marker lέ) can also occur in the same set of argument positions. This is shown by the object noun phrase in (35b). Example (35c) summarizes the sequencing of the noun phrase in Gungbe as discussed in Aboh (2002, 2004) and subsequently. In these examples, we translate the specificity-marked noun phrases as 'N in question' but we refer the reader to Aboh and DeGraff (2014) for detailed discussion of bare and determined noun phrases in Gungbe and Haitian Creole.

(35) a. Súrù xɔˋ sɔˊ (ɖàxó).
        Suru buy horse big
        'Suru bought (a) (big) horse(s).'

     b. Súrù xɔˋ sɔˊ ɖàxó àwè lɔˊ lέ.
        Suru buy horse big two det pl
        'Suru bought the two big horses (in question).'

     c. Noun–[Modifier]–Determiner–Number

As is clear from (35c), Gbe noun phrases display structures where the noun systematically precedes its modifiers, which in turn precede the determiner and the number marker. This ordering is fixed, and the relative clause (here in square brackets) also occurs within the slot of modifiers, where it is followed by the determiner. This yields the order in (36b), where the determiner associated with the head noun occurs on the right edge of the relative clause (see Aboh 2002, 2005, 2010).

(36) a. Súrù xɔˋ sɔˊ ɖàxó [ɖě Sàgbó wlé].
        Suru buy horse big rel Sagbo catch
        'Suru bought the big horses that Sagbo caught.'

17  Mathieu’s (2009) analysis makes bare NPs contingent on the availability of interpretable φ-​features on the N. These φ-​features are, in turn, related to overt agreement features on N. The latter are present in Old French but absent in both HC and Gbe whose morphological profiles seem to constitute a challenge for Mathieu’s analysis.



     b. Súrù xɔˋ sɔˊ ɖàxó ɖě Sàgbó wlé lɔˊ lέ.
        Suru buy horse big rel Sagbo catch det pl
        'Suru bought the big horses (in question) that Sagbo caught.'

     c. Noun–[Modifier]–[relative clause]–Determiner–Number

In accounting for the word order properties of relative clauses in Gungbe, Aboh (2002, 2005) argues that Gbe languages differ from Romance (e.g., French) and Germanic (e.g., English) because the relative clause must raise to the specifier positions of the number marker and the specificity marker, as depicted in (37):

(37) [DP spec [D' lɔˊ [NumP spec [Num' lέ [CP sɔˊ ɖàxó [C' ɖě Sàgbó wlé ]]]]]]

This movement serves to license specificity and number in Gbe. Without getting into the details of the analysis in Aboh (2002, 2005), what our description of relative clauses in French and Gungbe suggests is that while it is possible to postulate a similar underlying structure for both languages, Gbe languages differ from French in two crucial respects: (i) the former languages display noun–modifier sequences and (ii) they require successive movements of the relative clause CP to Spec,NumP and Spec,DP.

These contrasts between Gbe and French can help us understand the formation of relative clauses in HC. Indeed, HC is similar to Gungbe and earlier varieties of French to the extent that it allows bare nouns in all argument positions, similarly to determined noun phrases. But HC and Gbe, unlike all varieties of French, have their definite determiners follow the noun, as illustrated in (38). The sequencing in (38c) shows the parallels and differences among HC, Gungbe, and French. Indeed, HC is like French in exhibiting both prenominal and postnominal adjectives, though the noun in HC must precede the definite determiner, unlike the noun in French:

(38) a. Jinyò achte (gwo) chwal (blan).
        Jinyò bought (big) horse(s) (white)



     b. Jinyò achte (gwo) chwal (blan) yo.
        'Jinyò bought the big white horses (in question).'

     c. [Modifier]–Noun–[Modifier]–Determiner

With regard to relative clauses, HC also displays bare relative clauses of the type illustrated in Gungbe:

(39) a. Moun ki p ap travay p ap manje.
        people who do not work will not eat
        'People who don't work will not eat.'

     b. [Moun ki p ap travay la] p ap manje.
        people who do not work det will not eat
        'The person who doesn't work will not eat.'

     c. Noun–[relative clause]–(Determiner)

A number of properties arise that differentiate HC and French from Gungbe. Indeed, both French and HC display a prenominal indefinite article (yon liv 'a book') and prenominal modifiers, as in gwo chwal 'big horse(s),' alongside postnominal modifiers, as in chwal blan 'white horses' (cf. DeGraff 2007), while Gungbe (and Gbe generally) exhibits postnominal modifiers and postnominal determiners only. And it is striking that the classes of pre- and postnominal adjectives are similar across HC and French (Sauveur 1999). On the other hand, Gungbe and HC pattern alike with regard to relative clauses. In these languages, the relative clause is to the left of the determiner that right-bounds the DP, while it is to the right of the DP-initial determiner in French. Aboh (2006b) argues on this basis that HC involves a derivation similar to that in (37), where the CP relative clause raises to Spec,NumP and Spec,DP, as illustrated in (40), which represents the bracketed sequence in example (39b):

DP D’

spec D la

NumP

spec

Num’ Num

CP spec C ki

C’ p ap travay



Aboh (2006b:239/240) argues that 'the parallels between Haitian and Gungbe determiner phrases can be regarded as an instance of pattern transmission because both languages share similar properties with regard to the function and syntax of the nominal left peripheral elements, such as the specificity markers lɔˊ/la and the number markers lέ/yo.' The similarities between these languages are further reinforced by the fact that both languages display constructions that Aboh (2005, 2010) characterized as event relativization. In such structures, the verb is relativized but doubles such that there are two tokens of the same verb in the clause (see Glaude and Zribi-Hertz 2012 for discussion). This is unlike French (or English for that matter) where such constructions can only be formed by the noun phrase the fact…/le fait… underlined in the translations:

(41) a. [Yì ɖě Pɔˊl yì] wà lέblánú ná mì.
        go rel Paul go do sadness for 1sg
        'The fact that Paul left made me sad.'

     b. Yo di [pati Pol pati a] fè Elsi tris.
        3pl say leave Paul leave det make Elsi sad
        'They said that the fact that Paul left makes Elsi sad.'
        (Glaude and Zribi-Hertz 2012:84)

These constructions are structurally similar to verb focus with doubling (known as 'predicate cleft'), which both HC and Gungbe display, contrary to French:

(42) a. Yì wὲ Pɔˊl yì.
        go foc Paul go
        'Paul left.'

     b. Se pati Pol pati.
        foc leave Paul leave
        'Paul left.'

The constructions in (42) differ from the event relativization in (41) in two ways: firstly, they lack the factive reading that is highlighted by the 'the fact that …' translation in (41); secondly, they involve a focus marker that is absent in (41): se in HC and wὲ in Gungbe. We will not discuss the morphosyntax of these constructions here; the reader is referred to DeGraff (1995), Aboh and Dyakonova (2009), Glaude and Zribi-Hertz (2012), and references therein. Assuming that these constructions are properties of the left periphery, we therefore reach the conclusion that HC and Gungbe share significant properties with regard to the morphosyntax of their left periphery: the left periphery of the clause and the left periphery of the noun phrase. Yet HC resembles French when it comes to the ordering of nominal modifiers vis-à-vis the head noun.



As is clear from this discussion, the emergence of such a 'hybrid' system in HC grammar necessarily comes with its share of local complexification, such as: (i) the relation between verb focus with doubling (or predicate cleft) and relativization; (ii) the relation between left peripheral markers of the nominal domain and those of the clausal domain, which in turn allows the nominal specificity marker to be used as a clausal determiner both in Gungbe and in HC. The HC examples are taken from Glaude and Zribi-Hertz (2012:91n10):

(43) [context]: Pou ki sa ou leve? 'Why are you getting up?'

     a. M ale.
        1sg go
        'I'm going.'

     b. M ale a.18
        1sg go det
        'I'm going, as you knew I would.'

ùn 1sg

ɖɔˋ say

ná wè ɖɔˋ fut 2sg that

Súrù ná gɔˋ lɔˊ Suru fut return det

à má ɖì, bò yì kpɔˊ n 2sg neg believe coord go look

hɔˋ ntò outside

à mɔˋ n-​ὲ flέn. 2sg.fut look-​3sg there ‘When I told you that Suru will return (as you certainly remember) you didn’t believe me. You can look outside and you will see him there.’ As discussed in Aboh and DeGraff (2014) these examples show that the licensing properties of certain determiners in these languages cut across the morphosyntax of the nominal and clausal domains in intricate ways.

18  The HC definite determiner la has allomorphs lan, nan, a, an: chat la ‘the cat’, chanm lan ‘the bedroom,’ machin nan ‘the car,’ dan an ‘the tooth,’ bra a ‘the arm.’ The French etymon is the deictic locative adverbial and discourse particle là in Spoken French as in T’as vu ce chat-​là là ‘Did you see that cat there, yeah?’ (with the first là as locational deictic adverbial and the second là as discourse particle).



What this description shows is that HC displays a morphosyntax that shares properties of both French and Gbe. Yet, not only does the HC pattern represent an innovation, but it also involves apparent cases of local complexification as compared to both French and Gbe. Indeed, the learners of HC must acquire a complex of DP- and CP-related morphosyntactic properties that are not found in French or in Gbe—for example, the presence of both pre- and postnominal modifiers and both pre- and postnominal articles (such patterns seem marked from the perspective of Greenbergian universals; see chapter 15). The overall point here is that the specificities and intricacies of HC grammar, once considered as part of a system, cannot be captured—and certainly not analyzed—in any approach that takes Creole languages to all belong to the same type, with Creole formation boiling down to simplification qua reduction of morphological markers.

17.4  Creole Formation as Normal Language Change: A Recursive L2A–L1A Cascade

At this stage of the chapter, the question facing us is: How did the diverse learners in the language contact ecology of the colonial Caribbean contribute to the formation of the relatively stable and uniform sets of I-languages that now go by the label Creole languages? If we focus on the particular case of Haiti, it is now well established that HC is a relatively homogeneous language, notwithstanding the dialectal differences across geographical areas (Fattier 1998). Yet, given the history of HC, it is expected that the first proto-Creole varieties in Caribbean colonies would have manifested the structural influences of a variety of substrate languages. What would have prevailed in the earliest stages of HC is a set of structurally distinct proto-HC varieties, each showing primary influence from a specific set of substrate languages, depending on the ethnic composition of the corresponding area.

One fact that is revealed in Fattier's (1998) extensive dialect atlas for Haitian Creole is that, despite class- and region-based variations, HC is relatively uniform, especially in its morphosyntax. What's striking is that the documented dialectal differences seem largely orthogonal to the inter-substrate differences that would have prevailed at the earliest stages of HC formation. In other words, in Haiti today one would be hard pressed to identify, say, a Gbe-influenced HC dialect vs. a Bantu-influenced HC dialect. What we do find are Gbe-influenced patterns (e.g., postnominal determiners) and Bantu-influenced patterns (e.g., morphemes with Congo cognates) in all dialects of HC.

In light of these observations, it thus appears that L2A did play a key role in Creole formation, with both the native languages of the L2 learners and the general strategies of L2A influencing the shapes of their respective interlanguages and the ultimate outcome of Creole formation. Our hunch is that L2A plays a similar role in other instances of



language change, as in the history of English (see Kroch et al. 2000 and chapter 18). L1A would have also played a key role in Creole formation, as it does in other instances of language change: the Caribbean-born (Creole) children would have created stable and relatively homogeneous I-languages such that any prior substrate-influenced cross-dialectal differences would have been leveled off through successive L1A by larger and larger groups of Creole children. The latter, no matter the languages spoken by their parents, would have created their own Creole I-languages (I-Creoles in the terminology introduced in DeGraff 1999a:8–9). The emergence of these I-Creoles in the minds of these early Creole (i.e., Caribbean-born) speakers was conditioned by PLD containing proto-Creole patterns influenced by a diverse set of substrate and superstrate languages and by mutual accommodation across boundaries of these diverse heritage languages. These languages were the L1s of the older non-Creole generations—be they speakers of Niger-Congo languages or speakers of French(-derived) varieties, including proto-HC varieties. It is through successive L1A instances by Creole children that patterns influenced by specific substrate languages would have spread throughout the population at large. And it is also through such L1A that the proto-Creole varieties would acquire stable norms as natively spoken varieties by larger and larger groups of native speakers—Creole speakers with increasing socio-political influence. Thus arises the L2A–L1A cascade in Creole formation.

Similar homogenization processes (or 'normalization' in the terminology of Chaudenson and Mufwene 2001) have been documented in real time by Newport (1999) and Kegl et al. (1999). These two studies convincingly show the capacity of children to regularize certain patterns in their PLD. A caveat is in order: we do not consider these two studies to be replicas of Creole-formation scenarios, and we do not commit ourselves to the structural details and analyses in these studies: the socio-historical circumstances in Newport and Kegl et al.'s sign language studies differ greatly from what obtained in the case of Caribbean Creole formation, and the nature of the input and output in the sign-language and Creole cases is also different in some crucial aspects—partly due to differences in modalities (spoken vs. signed). But what these studies help us evaluate is the role of children vs. adults when exposed to language input that seems unstable and non-native to varying degrees (see DeGraff 1999b:483–487 for related caveats; cf. Mufwene 2008:ch. 5 for implications vis-à-vis the emergence of communal norms at the population level). Indeed, Newport and Kegl et al. focus on learners of sign languages who are creating their L1s from PLD that is nonfluent and unstable. Such PLD does not provide evidence of certain combinations—in, for example, the morphosyntax for TMA marking. Furthermore, the PLD patterns show inconsistent variability. What the children in these studies did is to process this unusually sparse and inconsistent PLD in order to create a stable system with certain combinations that were missing in the PLD. Similar patterns of regularization by children are documented in S. J. Roberts' (1999) study of the Hawaiian Creole TMA system (see DeGraff 2009:912–914, 934–936, 945).
Such studies give further evidence in favor of a particular role of L1A in the L2A–​L1A cascade that we’re positing here as crucial to Creole formation.



Though we use the metaphor of a 'cascade,' it may be more appropriate to speak of a 'recursive cascade' or a 'series of overlapping cascades' where the utterances produced by both L1 and L2 learners feed into the PLD for subsequent L1 and L2 learners, and then the latter's utterances in turn feed the PLD of newly born L1 learners and newly arrived L2 learners, and so on. It is through these 'recursive L2A–L1A cascades' that certain patterns among the output of L1-influenced interlanguages become selected, through prior competition, as key triggers for the subsequent setting of stable properties in the I-Creoles (see Mufwene 2008:ch. 7 for a discussion of the complex ecological factors—psycholinguistic, structural, typological, social, and demographic—that may count toward the comparative weighting of patterns in competition in the course of language change, including Creole formation).

The fact that the setting of (internal) properties in the Creole I-languages is based on (external) patterns in necessarily heterogeneous PLD automatically creates room for: (i) the appearance of substrate transfer; (ii) individual-level internal innovations such as reanalysis (or 'selection with modification' in Mufwene's terms). In section 17.3, we identified phenomena within the clausal and nominal left peripheries of HC (e.g., the emergence of prepositional and modal complementizers and determiners in HC) where patterns emerged in HC based on reanalysis of superstrate patterns with influence from certain substrate patterns (for related ideas in a different framework, see Mufwene 2008, which has inspired some of our own work). In terms of current cartographic views (Rizzi 1997; Aboh 2004), these layers in the clausal and nominal domains represent interfaces between, on the one hand, the predicate and its extended projections and, on the other hand, the discourse. Given this characterization, our discussion suggests that these zones of 'interface' (e.g., in the left periphery of the nominal and clausal domains) are more open to innovations based on apparent 'recombination' of superstrate and substrate properties (Aboh 2006b). When the parameters to be set involved these interface zones, it's as if learners, as they processed mutually conflicting input from the PLD (input influenced by L1s with distinct parameter settings—for example, with respect to word order and semantics in the DP), converged on a 'third way' with an emergent grammar whose output appears to combine in a novel way certain patterns from the source languages (see Aboh and DeGraff 2012 for a DP-related case study).

Our approach thus lends itself to identifying grammatical areas where Creoles innovate new parametric values and where local complexification arises as a result of PLD that are unusually complex due to the language contact situation. As far as we know, this is a novel approach to Creole formation to the extent that its basic UG-based assumptions and its faithfulness to historical details make it prone to identify, and to account for, such areas of local complexification, alongside potential areas that may seem 'simple' due to certain superficial consequences of adult learners' strategies.
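The leveling dynamics of such recursive cascades can be illustrated with a deliberately idealized simulation in which each cohort of L1 learners regularizes over a heterogeneous PLD and its output then constitutes the next cohort's PLD; all quantities below are illustrative, not estimates of the historical situation:

```python
# Highly idealized simulation of the recursive L2A-L1A cascade.
import random

def next_cohort(pld, n_learners=100, sample_size=20):
    cohort = []
    for _ in range(n_learners):
        sample = random.choices(pld, k=sample_size)
        cohort.append(max(set(sample), key=sample.count))  # regularization
    return cohort

random.seed(1)
pld = ["A"] * 60 + ["B"] * 40   # competing variants in the proto-Creole PLD
for generation in range(5):
    pld = next_cohort(pld)
    print(generation, pld.count("A") / len(pld))  # the majority variant is leveled in
```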



Chapter 18

Language Change

Eric Fuß

18.1 Introduction

In the Principles and Parameters framework, (syntactic) variation between grammatical systems is attributed to different settings for a limited number of parameters that are associated with invariable principles of Universal Grammar (UG) (Chomsky 1981a, 1986b, 1995b; and chapter 14). The latter identify the set of possible human languages and thus define the upper limits of linguistic diversity. The task of acquiring a given grammar then consists of filling in the gaps left open by the principles of UG, that is, detecting the parameter settings which are reflected by the linguistic input the child is confronted with (see chapters 11 and 12). The stimulus that serves to set a given parameter one way or the other is usually called the trigger experience or cue (see Lightfoot 1999 and note 9 on the latter notion).

Apart from accounting for synchronic differences between a set of individual languages, this approach can also be used to describe diachronic differences between historical stages of a single language in terms of parameter values that vary over time (Lightfoot 1991, 1999; Kroch 2001; Roberts 2007). From this perspective, language change is constrained by the requirement that its outcome must meet the specifications imposed by UG. However, it seems that the pervasiveness of change cannot be directly attributed to properties of the human language faculty, in contrast to other universal properties of language. Today, most scholars agree that the possibility (and omnipresence) of change is to be attributed to another 'external' universal property of language, namely the fact that language transmission is necessarily discontinuous (see Paul 1880 and Meillet 1904/1905; for relevant statements in early generative work, cf. Halle 1962; Klima 1964, 1965; Kiparsky 1965, 1968; see also Lightfoot 1979, Janda and Joseph 2003). During the process of first language (L1) acquisition, children do not have direct access to (abstract) properties of the target grammar; rather, they construct a 'new' grammar based on the linguistic input they receive. From this perspective, language change takes place when linguistic features fail to be transmitted in the course of language acquisition. However, at least in a monolingual, homogeneous speaker community, this



explanation seems to lead to a paradox: how can a target grammar G1 produce an output that differs in significant ways from the input that led to the acquisition of G1 in the previous generation? This is sometimes called the 'logical problem of language change' (see Clark and Roberts 1993; Niyogi and Berwick 1998; Roberts 2007:230–231; see section 18.3 for discussion); it is also at the heart of what Weinreich, Labov, and Herzog (1968:102) call the 'actuation problem,' arguably the central explanandum in historical linguistics:

(1) The actuation problem
    'What factors can account for the actuation of changes? Why do changes in a structural feature take place in a particular language at a given time, but not in other languages with the same feature, or in the same language at other times?'

An answer to the actuation problem presupposes a thorough grasp of the relationship between the linguistic experience presented to the child and the grammar constructed on the basis of this evidence. Unfortunately, our understanding of these matters is still quite limited, despite recent advances in the formal (and quantitative) study of L1 acquisition (see Guasti 2002, Clark 2009, and chapters 11 and 12 for overviews; Niyogi and Berwick 1997, 1998, Yang 2002, and Niyogi 2006 for formal/computational approaches to the relationship between language acquisition and change). It is generally assumed that under normal circumstances, L1 acquisition results in an accurate reproduction of the target grammar. However, the very fact of change shows that this assumption is in need of some qualification. In this context, cases of language change constitute a natural laboratory for the investigation of the relative weight of nature (the genetic endowment, i.e., UG) and nurture (the linguistic experience/environment). Diachronic variation can be taken to reflect the 'limits to attainable grammars' (Lightfoot 1991:172), in the sense that change reveals how certain linguistic choices which give rise to individual grammars are selected on the basis of the evidence available to the learner (Lightfoot 1979, 1991, 1999; Kroch 2001; Hale 2003, 2007; Roberts and Roussou 2003; Roberts 2007). In addition, the study of change might provide us with a better understanding of the role of general cognitive principles not specific to language (sometimes called 'third factors'), in particular '(a) principles of data analysis that might be used in language acquisition and other domains; (b) principles of structural architecture and developmental constraints … including principles of efficient computation' (Chomsky 2005:6; see chapter 6).

This chapter explores a couple of selected issues that arise when language change is studied from a UG-based perspective, focusing on the following two questions:

1. To what extent can a UG-based perspective contribute to our understanding of the phenomenon of language change?

2. What can the study of language change contribute to our understanding of properties of the human language faculty/UG?

The chapter is organized as follows. Section 18.2 is concerned with a set of general issues that arise when language change is studied from a UG-based perspective. In particular,



it is argued that the proper object of a formal study of language change should be identified as grammar change, that is, a set of discrete differences between the target grammar and the grammar acquired by the learner (see Hale 2007). It also gives an outline of the generative approach to (syntactic) change in terms of a change in parameter values. Section 18.3 focuses on acquisition-based answers to the actuation problem and addresses the question of how the discontinuous nature of language transmission can be reconciled with the traditional observation of long-term, directional changes (drift, Sapir 1921). A related issue is dealt with in section 18.4, which discusses the fact that the transition from one historical language state to another typically involves linguistic variation and diachronic gradualness. Section 18.5 argues that the study of grammaticalization phenomena may offer insights into properties of UG and the theory of functional categories, in particular. Section 18.6 provides a concluding summary.

18.2  Basic Notions

18.2.1 'Language Change' vs. 'Grammar Change'

Generative linguists usually agree that the proper object of the scientific study of language is I-language (or simply 'grammar'), that is, a knowledge state in the mind of an individual (see Chomsky 1986b). This is usually contrasted with E-language, which is sometimes defined as the set of actual (or potential) expressions that are in use in a linguistic community. Thus, linguistic evidence consists of E-language data; it is the goal of the linguist (and the language learner) to detect properties of I-language by inspecting the linguistic behavior of individuals. Any principle or rule of grammar that is posited by the linguist is to be seen as a piece of I-language. Universal Grammar (UG) is then construed as a theory of formal universals of human language that identifies the set of possible I-languages/grammars (see Chomsky 1986b:23). Adopting this view for the study of language change implies that the proper object of formal historical linguistics must be I-language(s)—or grammar(s)—as well. In the following I will briefly review some consequences of this position (see Lightfoot 1999, Hale 2007, Roberts 2007 for in-depth discussion).

First of all, an I-language perspective on language change highlights the fact that language transmission between generations of speakers is necessarily discontinuous: 'a language is not some gradually and imperceptibly changing object which smoothly floats through space and time' (Kiparsky 1968:175; see also Lightfoot 1979:148). Rather, the grammar of a language—I-language—is always created anew in the mind of an individual child when he/she engages in the task of language acquisition. From this perspective, change takes place when the grammar acquired by the learner differs in some property from the target grammar(s) that generated the input the learner was exposed to. The proper scientific object of a linguistic theory of change should therefore be defined in terms of a set of discrete differences between I-languages, that is, between the target grammar and the grammar eventually acquired by the learner. Note that this



perspective is at odds with views held in the traditional literature on language change, where language change is often used to refer to a (sociolinguistic) process in which a given change spreads through a speech community. To avoid confusion with traditional uses of the term language change, Hale (1998, 2007) and Lightfoot (1999) introduce the notion of grammar change to refer to the proper scientific object of (generative) historical linguistics (for discussion see also Janda and Joseph 2003; Roberts 2007). Thus, we must differentiate between the following aspects of any given change:

(2) a. Innovation (grammar change, abrupt)

    b. Diffusion (a grammar change gradually gaining a wider distribution in a speech community, often perceived as 'language change')1

As pointed out by Hale (2007:29), the fact that ‘some of the most hotly debated issues in traditional historical linguistics (is change “gradual” or “abrupt”?, is phonological change “regular”?, etc.) have arisen and remained obscure’ can ultimately be attributed to the failure to distinguish properly between these two aspects of diachronic variation. It is likely that the confusion of different notions of ‘change’ arises at least partially from the fact that what we perceive as ‘language change’ is normally the result of diffusion. In fact, there are presumably myriads of changes that never showed up in the records since they were confined to a single speaker and never spread to other speakers, let alone to the whole community of speakers. Still, it seems that we must focus on the first kind of change, that is, innovations, if we aim at developing a restrictive linguistic theory of language change.2 This point can be illustrated with the following example adopted from Hale (2007:39): (3)

a. Middle English lutter 'pure'
b. Modern English pure 'pure'

1 According to Hale (1996:16), the difference between innovation (i.e., grammar change) and diffusion can be defined in somewhat more formal terms as follows (see also Hale 2007:36):

(i) Innovation
    a. The target grammar that generates the PLD has properties X, Y, Z.
    b. The grammar acquired by the learner has properties X, Y, W.

(ii) Diffusion
    a. There is a 'mixed' PLD generated by a grammar with properties X, Y, Z and another grammar with properties A, Y, W.
    b. The grammar acquired by the learner has properties X, Y, W.

Thus, cases of diffusion in fact do not represent instances of change, since the learner 'has accurately adopted a linguistic feature from some speaker' (Hale 2007:36, original emphasis).

2 Of course, the study of diffusion can also reveal important insights in that it tells us something about social aspects of language, for example the factors that govern the diffusion of forms, the social stratification of speech communities, social factors that govern linguistic variation, etc. Crucially, however, it does not tell us much about language, that is, grammar, itself. See Niyogi and Berwick (1997, 1998) and Niyogi (2006) for formal approaches that model the diffusion of a change through a speech community in terms of dynamical systems theory.



Of course, the (lexical) change from lutter to pure cannot be explained in terms of a restrictive theory of possible sound changes (e.g., /l/ → /p/ is a very unlikely type of sound change). Rather, the change from (3a) to (3b) is an example of borrowing due to language contact with French. As a result, the original English word meaning 'pure' was replaced by the loanword pure. As has repeatedly been pointed out in the literature, there are presumably no linguistic constraints on borrowing (see, e.g., Thomason and Kaufman 1988, Harris and Campbell 1995, Curnow 2001; but also see chapter 16, section 16.8).3 Furthermore, note that borrowing represents an instance of diffusion (the input contained both pure and lutter, and over time, more and more learners acquired pure instead of lutter as the realization of the concept 'pure'). Thus, while we can formulate a constrained theory of possible sound changes (ruling out a sequence of grammar changes leading from (3a) to (3b)), it seems quite unlikely that we can develop a restrictive linguistic theory of possible diffusion/borrowing events. Any change can diffuse, and there are presumably no strong linguistic constraints at work here.

From an I-language perspective on change, the relationship between language acquisition and change can be represented as follows (see Hale 2007:28):

(4) A model of language acquisition and change

    G1 (target grammar) → output of G1 / input to the learner → S0 (= the initial state of the learner; UG) → S1 → S2 → S3 → … → G2 (fixed knowledge state/grammar eventually acquired by the learner)

    Change: a set of (discrete) differences between G1 and G2

3 However, see Heine and Kuteva (2005, 2008) for an opposing view.



Starting out from S0 (the initial state of grammar, usually taken to be an expression of the genes, which can be modeled in terms of a system of abstract principles, UG), the learner constructs a number of intermediate knowledge stages during the acquisition process based on the evidence provided by the input (where Sn is revised to Sn+1 if the learner becomes aware of the relevant evidence necessary to trigger a certain property of the grammar).4 Eventually, the process of grammar construction leads to a fixed knowledge state which represents the grammar acquired by the learner in the course of language acquisition.5

18.2.2 Grammar Change as Parametric Change The model represented in (4) highlights that changes resulting from misacquisition involve a set of discrete differences between the target grammar G1 and the acquirer’s grammar G2. As already noted in the introduction, these differences (i.e., grammar change) can be modeled in terms of shifting values of individual parameters.6 However, the traditional notion that linguistic variation reflects parameterized principles of UG (e.g., in terms of two or more options with regard to some property) does not seem to sit comfortably with current minimalist theorizing (cf., e.g., Boeckx 2011a; see Roberts and Holmberg 2010, Holmberg 2010a, and ­chapter 14 for discussion). First, the assumption of a richly structured UG with a fairly large number of principles (and associated parameters) does not seem to be compatible with evolutionary considerations: it is commonly assumed that language arose in humans rather recently, that is, during the last 100,000–​200,000 years, which is a blink of an eye in evolutionary terms (see, e.g., Corballis 2003). On the assumption that complex systems can only develop via time-​consuming natural selection, the rapid development of the human 4  The timing of the intermediate stages is usually taken to be shaped by processes of cognitive maturation that control certain aspects of language development (see, e.g., Borer and Wexler 1987, 1992; Guasti 2002; and c­ hapter 12). The idea that linguistic competence matures in the course of language acquisition can be used to explain certain differences between the child language and the target grammar such as the availability of so-​called Root Infinitives (see Wexler 1994, Rizzi 1994, Roberts 2007 for discussion). Still, it is generally assumed that each of the intermediate stages must represent a possible human grammar, that is, ‘there are no dead ends in language acquisition’ (Chomsky 2002:130–​131). 5  As pointed out by Hale (2007), conflicting evidence that is encountered after L1 acquisition has terminated does not lead to revisions of G2, but rather triggers the acquisition of an additional grammar, leading to multilingualism. Another possibility not discussed by Hale is to assume that conflicting evidence that cannot be associated with an additional grammar is simply ignored by the learner (e.g., if it is not frequent or systematic enough; see c­ hapters 11 and 12). 6  This does not amount to saying that there are no changes affecting the syntax of a given language apart from parametric change. For example, there might be changes that have to do with the frequency of a given structural choice, or properties of individual, open-​class lexical items (e.g., changes affecting the subcategorization frames of verbs, adpositions, etc.). Importantly for our purposes, however, it is only parametric changes (i.e., differences in the featural properties of a closed class of functional categories, as will be argued shortly) that are of theoretical interest and significance, in the sense that only they can reveal something about the structure and the workings of the language faculty/​UG; see Lightfoot (1991, 1999), Roberts (2007), and Hale (2007) for discussion.



Language Change   465 language faculty (FL) seems to preclude the possibility that FL has a rich internal structure. Rather, it is more likely that language came into existence as a result of a small number of evolutionary innovations, with other properties of language determined by general cognitive structure and design principles (see, e.g., Hauser, Chomsky, and Fitch 2002; Chomsky 2005; Hornstein 2009). Second, the traditional notion of parameter is at odds with the minimalist assumption that the computational system (CHL), that is, the mechanisms that build up hierarchical structures by combining individual lexical items, is universal and therefore cross-​linguistically and diachronically invariable (Chomsky 1995b, 2000b, 2005). Thus, strictly speaking, there is no such thing as ‘syntactic change’: the properties of the syntactic component of grammar remain constant over time (see Hale 1998, 2007; Longobardi 2001a; Keenan 2002). On this assumption, linguistic variation is limited to the lexicon (the Lexical Parametrization Hypothesis, going back to Borer 1984, Manzini and Wexler 1987; for discussion see, e.g., Ouhalla 1991; Chomsky 1995b, 2000b; Roberts and Roussou 2003; Holmberg 2010a; Boeckx 2011a). More precisely, parametric variation is associated with lexical properties of a closed class of functional categories which trigger syntactic operations in order to license their (abstract) morphological content (including minimally C, T, ν, and D; see, e.g., Rizzi 1997 and Cinque 1999 for more elaborate inventories of functional categories; see section 18.5 for diachronic evidence that bears on this issue). On this assumption, syntactic change is to be identified with changes affecting the feature content of functional categories (e.g., via phonological erosion or grammaticalization processes, see Longobardi 2001a; Roberts and Roussou 2003; Roberts 2007). The fact that syntactic change is usually highly systematic (in contrast to other types of change, which may be sporadic, i.e., confined to a single lexical element) can then be accounted for if we adopt the assumption that the set of core functional categories must be present in every well-​formed syntactic representation as the basic ‘building blocks’ or ‘skeleton’ of clause structure (see von Fintel 1995; Chomsky 1995b, 2000b; Hale 1998). Moreover, if overt inflectional morphology is taken to reflect the (abstract) feature content of functional categories, it is possible to construe a correlation between syntactic change and morphological change, that is, to provide a principled explanation for the traditional observation that changes affecting the inflectional morphology of a given language often go hand in hand with syntactic change (see, e.g., Sapir 1921; see also Lightfoot 1991, 1999; Roberts 1993a, 2007; Haeberli 1999, 2004; and the papers in Kemenade and Vincent 1997 and Lightfoot 2002).7 From this perspective, grammar change takes place when there is an innovative value vj of a parameter pk (i.e., a lexical property/​feature value linked to a certain functional category) which is for some reason more ‘accessible’ in the input data than the relevant value vi which is part of the target grammar. However, as already noted, this gives rise to 7 

7 Ideally, a small change in the featural properties of (core) functional categories should yield a number of (at times dramatic) distinct changes on the syntactic surface, which is a hallmark of parametric change (see Lightfoot 1979, 1991, 1999). See Holmberg (2010) for a minimalist attempt to account for the traditional assumption that a single parameter determines a complex of syntactic properties; see also chapters 14 and 16.



However, as already noted, this gives rise to an apparent paradox, which is sometimes called 'the logical problem of language change' (see Clark and Roberts 1993, 1994; Niyogi and Berwick 1998; Roberts 2007:230–231):8

The logical problem of language change:
'if the trigger experience of one generation, say g1, permits members of g1 to set parameter pk to value vi, why is the trigger experience produced by g1 insufficient to cause the next generation to set pk to vi?' (Clark and Roberts 1994:12)

As already pointed out in section 18.1, this paradox is intimately connected to the question of what factors can account for the actuation of changes (Weinreich et al. 1968). In current theoretical approaches to language change, it is generally assumed that a solution to the actuation problem must be based on a closer inspection of the set of factors that may impede perfect transmission of linguistic features during L1 acquisition.

18.3 Language Acquisition and the Actuation Problem

Under the assumption that language acquisition is a deterministic process (that is, a given set of input data always gives rise to the same grammar, so that a different grammar can only arise from different input), the possibility of change can be related to shifts in the Primary Linguistic Data (PLD), that is, the set of (partially parsed) linguistic signals on the basis of which the learner constructs a grammar: change occurs when learners fail to detect a trigger/cue for a certain property of the target grammar G1 in the linguistic input they are exposed to.9 From this perspective, the actuation problem concerns the question of how and why the PLD generated by a target grammar G1 may trigger a grammar G2, where G1 ≠ G2.

8 This paradox is intensified by the belief widely held in recent language-acquisition studies that children set parameters correctly very early and acquire basic properties of the target grammar in an almost flawless fashion (see, e.g., Wexler 1999), which has led some generative linguists to assume that language change cannot be explained in terms of L1 acquisition (see, e.g., Weerman 2008). However, see Yang (2010) for critical discussion and Cournane (2014) on lexical mapping errors in the acquisition of English modals which suggest that learners do play an active role in processes of language change (further reports on delayed/non-targetlike acquisition of morphosyntactic phenomena include, e.g., Fritzenschaft et al. 1990 on verb placement in German; Håkansson and Dooley Collberg 1994 and Waldmann 2014 on non-targetlike embedded V-Neg patterns in Swedish; and Bohnacker 2004 on determiner use in Swedish); see again chapters 11 and 12.
9 The notion of 'cue-based acquisition' is developed in Lightfoot (1999), based on earlier proposals by Dresher and Kaye (1990) and Dresher (1999) to model the acquisition of phonological properties. The basic assumption is that UG contains not only a set of parameters, but also specifies for each parameter a cue that serves to switch the parameter one way or the other (see Fodor 1998a for a related approach; and chapter 11 for discussion). If the learner detects a cue that is attested robustly, this will activate a given parameter or syntactic operation in the learner's grammar. Language change results either if a given linguistic feature fails to be cued or if it starts to be cued, in contrast to the target grammar.
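The cue-based model sketched in note 9 lends itself to a simple quantitative illustration: a parameter is set only if its cue is robustly attested in the parsed input. The Python fragment below is a minimal sketch under assumed values; the cue label, the toy input, and the 30% robustness threshold (anticipating Lightfoot's figure discussed later in this section) are expository assumptions, not part of any worked-out proposal.

    # A toy cue-based learner: a parameter is activated iff the relative
    # frequency of its cue in the parsed input meets a robustness threshold.
    # Cue label, input, and threshold are illustrative assumptions.

    THRESHOLD = 0.3  # cf. the 30% figure attributed to Lightfoot in section 18.3

    def parameter_is_set(pld, cue, threshold=THRESHOLD):
        """True iff the cue is robustly attested in the (parsed) input."""
        hits = sum(1 for utterance in pld if cue in utterance)
        return hits / len(pld) >= threshold

    # Four parsed utterances, only one exhibiting the V2 cue 'XP-Vfin-subj':
    pld = [{"XP-Vfin-subj"}, {"subj-Vfin"}, {"subj-Vfin"}, {"subj-Vfin"}]
    print(parameter_is_set(pld, "XP-Vfin-subj"))  # False: 1/4 = 0.25 < 0.3

On this picture, change is actuated when shifts in usage push the cue's frequency below the threshold, so that the next generation no longer sets the parameter.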



Moreover, answers to this apparent paradox must reconcile the phenomenon of language change with the standard assumption that language acquisition is highly accurate under normal circumstances. However, the omnipresence of change might be taken to suggest that the standard model of L1 acquisition is perhaps overly simplistic. To facilitate change between two grammars G1 and G2, we must look for factors that might give rise to differences between PLD1 that led to the construction of G1 and PLD2 feeding the construction of G2.10 One likely source of differences between PLD1 and PLD2 is grammar-external factors such as language contact or conscious changes adopted by adult speakers of G1, for example the use of linguistic features associated with a prestige dialect, or the avoidance of features that are not part of the prestige dialect (see, e.g., Thomason and Kaufman 1988; for relevant examples in connection with standardization processes, see Weiß 2001 on the loss of multiple negation and doubly-filled Comp in Standard German, Hoeksema 1994 on the loss of verb projection raising in [written] Dutch, and Nevalainen and Tieken-Boon van Ostade 2006 on the history of English). As a result, the evidence for a certain property of the target grammar may be obscured by the fact that the child usually receives input from different speakers with possibly different grammars. The learner must determine which output string is generated by which grammar, a non-trivial task. It is at least conceivable that in such a situation, the child may mistakenly attribute a certain output string to the wrong grammar, which in turn may give rise to a new grammar with properties that differ from those of the target grammar (see Hale 2007:38–39 for discussion). A relevant example comes from Kroch and Taylor's (1997) analysis of the loss of V2 in the Middle English period. Kroch and Taylor attribute the loss of V2 to a mixed dialect situation where speakers of a northern V2 grammar came into contact with speakers of a southern variety in which subject pronouns regularly intervened between a fronted XP and the finite verb. According to Kroch and Taylor, the resulting mixed input (in particular, the systematic deviations from V2 generated by the southern grammars) led to the acquisition of a grammar that also generated V3 patterns, leading to the loss of V2 in the northern variety:

(5) a. Output string generated by southern grammar: XP – pronoun – Vfin
    b. Output string generated by northern grammar: XP – Vfin – pronoun

The relevant grammar change would then result from a misanalysis in which learners mistakenly attributed output string (5a) to the northern grammar, which originally was a strict V2 grammar. However, one might argue that this outcome was actually not an instance of grammar change, but rather the result of diffusion (see note 1). As already mentioned, there are presumably no strong linguistic constraints on contact-induced change. The same goes for changes triggered by sociolinguistic factors.

10 Note that this raises a number of further questions, in particular concerning the way the language acquisition device (LAD) converts information conveyed by the PLD into a grammar G with a set of properties {P1 … Pn}, which cannot be addressed here in detail (but see recent work by Charles Yang and the discussion in chapter 11).



In what follows, we will therefore focus on (internal) triggers of grammar change.11 Relevant proposals typically assume that for some reason, the PLD the learner is confronted with differs from the PLD that gave rise to the target grammar, due to factors such as (stylistically motivated) shifts in the frequencies with which certain constructions are used, (morpho-)phonological erosion, linguistic variation inherent to speech production, or 'noise in the channel' that blurs the evidence for certain properties of the target grammar in the linguistic input the learner receives (see, e.g., Lightfoot 1979, 1991, 1999; Hale 1998, 2003, 2007; Roberts 2007). It is a widespread assumption that stylistically motivated changes in language use may lead to significant changes in the make-up of the triggering experience (see Lightfoot 1991, 1999). As a result, the evidence necessary to trigger a certain property of the grammar may cease to be robustly expressed in the input data. When a certain threshold (e.g., in frequency) is crossed, this may lead to grammar change. Examples discussed in the literature include the reanalysis of surface VO patterns (in terms of underlying VO order) due to an overuse of NP postposition from an OV base (see Stockwell 1977; and van Kemenade 1987 on English), the rise of ergative/absolutive case marking via a reanalysis of frequently used passive constructions (see Anderson 1977, 1980), or the loss of V2 due to an increased frequency of subject-initial clauses (see Lightfoot 1991, 1997; Roberts 1993a; Clark and Roberts 1993). Of course, this raises the question of what counts as robust quantitative evidence signaling (the absence of) a certain parameter value. Lightfoot (1997, 1999) links the loss of V2 patterns to the observation that in stable V2 languages such as German or Swedish, at least 30% of main clauses exhibit subject–verb inversion (XP–Vfin–subj.). Lightfoot is led to conclude that a positive value for the V2 parameter can be acquired as long as the frequency of examples with inversion does not drop below the threshold of 30%.12

11 For contact-induced change, see Thomason and Kaufman (1988) and the bulk of work put together by Anthony Kroch and his collaborators on changes in the history of English (see, e.g., Kroch and Taylor 1997, 2000; Pintzuk 1999; Kroch, Taylor, and Ringe 2000; Yang 2000; Kroch 2001; Trips 2002; see also chapter 17 for a discussion of this kind of change in relation to the diachrony of Haitian and other Creoles).
12 Yang (2002, 2010, 2015) develops a formal model of the relationship between language acquisition and language change (the so-called 'variational learning model') that pays attention to the frequencies with which linguistic expressions are attested in the input data. Roughly put, Yang shows that there is a close connection between relevant frequencies and the speed (and robustness) with which a certain grammatical property is acquired. In Yang's model, the dynamics of learning are formalized in a way that resembles the mathematical treatment of the dynamics of selection in an evolutionary system (see also Clark and Roberts 1993). From this perspective, L1 acquisition is based on a process of selection that successively reduces the number of grammatical hypotheses which are not compatible with the linguistic environment.
According to Yang (2010), this acquisition strategy is sensitive to the traditional distinction between core and periphery: selecting properties of the 'core' linguistic system from a narrow range of options (the set of parameter values provided by UG) 'is sensitive to token frequencies of specific linguistic data that are quite far removed from surface level patterns' (Yang 2010:20). For example, Yang argues that the option of topic drop in early child English is gradually driven out of the grammar by the small number (1.2%) of expletive subjects in child-directed speech. In contrast, the acquisition of language-specific generalizations such as productive phonological rules (e.g., umlaut, glottalization, final devoicing, etc.) and morphological properties (e.g., plural, past tense, agreement, etc.) seems to be sensitive to type frequency (e.g., the number of English verbs that form the past tense with a suffixed /-d/). In addition, the acquisition of the periphery is taken to be constrained by third factors (Chomsky 2005) such as the Elsewhere Condition (see also section 18.3.1). A question of some interest which is not addressed by Yang concerns the possible relationship between core and peripheral properties (e.g., inflection) in acquisition and change. See Heycock and Wallenberg (2013) for an application of Yang's learning model to the loss of V-to-T movement in the Scandinavian languages. See chapter 11 for further discussion of Yang's proposals and related matters.
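The variational model described in note 12 can be made concrete with a short simulation: two competing grammars are selected probabilistically, and a linear reward-penalty scheme shifts probability mass towards whichever grammar parses the incoming data more reliably. The sketch below is a minimal illustration; the parse rates, the learning rate, and the grammar labels are toy assumptions, not Yang's actual figures.

    import random

    # A minimal variational learner: p is the weight of grammar G1; on each
    # input token the learner picks a grammar, and probability mass shifts
    # towards grammars that successfully parse the input. The parse rates
    # and the learning rate GAMMA are illustrative assumptions.

    GAMMA = 0.001                       # learning rate
    P_PARSE = {"G1": 0.95, "G2": 0.80}  # toy probability of parsing a token

    def learn(n_tokens=200_000, p=0.5):
        for _ in range(n_tokens):
            picked_g1 = random.random() < p
            parses = random.random() < P_PARSE["G1" if picked_g1 else "G2"]
            if picked_g1 == parses:        # G1 rewarded (or G2 punished)
                p = p + GAMMA * (1.0 - p)
            else:                          # G1 punished (or G2 rewarded)
                p = (1.0 - GAMMA) * p
        return p

    print(round(learn(), 2))  # fluctuates around c2/(c1+c2) = 0.20/0.25 = 0.8,
                              # where ci is grammar Gi's failure rate

Note that with these settings the learner converges on a stable mixture rather than outright victory for G1; this is the property that allows the model to capture both gradual replacement and persistent variation, with change going to completion only when one grammar's advantage drives its competitor's weight towards zero.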



It is commonly assumed that fluctuations in the frequency with which certain patterns are attested in the PLD may give rise to reanalysis, that is, a process in which a given surface string is assigned an underlying structure that differs from the relevant structure in the target grammar.13 While reanalyses do not alter the surface manifestation of the relevant syntactic patterns, they may give rise to other changes, in particular, the obsolescence of forms generated by earlier grammars (in the examples mentioned, the loss of OV orders, and the loss of subject–verb inversion [i.e., a V2 grammar], respectively). The idea that grammar change is triggered by shifts in E-language predicts that language change is a contingent and unpredictable process that depends on the availability (and frequency) of certain cues in the linguistic input the learner is exposed to (which may be subject to random fluctuations). This conclusion seems to be a logical consequence if we assume that incomplete transmission of linguistic features cannot be attributed to built-in factors/tendencies (such as speaker/hearer economy) that drive change, but results solely from the interaction between properties of the PLD and UG. However, purely frequency-based approaches have been criticized by Kroch (2001), who points out that usage frequencies may remain stable over long periods of time. Another set of problems comes from the traditional observation that language change does not seem to be random in the way expected. On the one hand, there seem to be cases of drift (see, e.g., Sapir 1921:160ff.), where languages are apparently subject to long-term, highly gradual directional changes (e.g., from the inflected to the isolating morphological type). On the other hand, it is a well-known fact that certain types of change are more likely than others (e.g., a change from basic OV order to VO seems to be much more frequent than the reverse change from VO to OV; see Kiparsky 1996, Roberts 2007) and that there are pathways of change that shape the historical development of languages cross-linguistically (e.g., in grammaticalization processes, Hopper and Traugott 2003). See section 18.3.1 on how these observations can possibly be captured by UG-based approaches to language change.

13 Reanalysis is an instance of what Andersen (1973) calls 'abductive change.' The concept of abductive change goes back to the notion of abduction, a method of logical inference introduced by Charles Sanders Peirce, where an inquirer makes (learned) guesses about possible explanations for a set of (seemingly connected) empirical facts. Thus, abductive change occurs when learners entertain a hypothesis concerning the grammar underlying the linguistic evidence they encounter that deviates from the actual target grammar.
Note that approaches to change based solely on abduction (e.g., without a theory of what counts as robust evidence in the PLD) do not provide a satisfying answer to the actuation problem. This has to do with the fact that abduction is all about facilitating change (by introducing an inherent instability into the process of language transmission), but it does not have much to offer when it comes to explaining change. In particular, it does not address the question of why a given change takes place at a certain point in the historical development of a language, which is central to the actuation problem. See Kroch (2001) and Roberts (2007:123–124) for discussion.



An alternative way to tackle the actuation problem is to assume that grammar change may be triggered by independently motivated changes in other parts of the grammar. This line of research has been (and continues to be) very productive as a means of explaining syntactic change by correlating it with (independently triggered) changes affecting properties of inflectional morphology (see Sapir 1921 for an early statement concerning the connection between the loss of case marking and the rise of SVO word order).14 Relevant generative case studies include the impact of the loss of verbal inflection on the availability of verb movement (for the history of English, see Roberts 1993a; for the Scandinavian languages, see Platzack 1988 and Holmberg and Platzack 1995) and pro-drop (for the history of French, see Roberts 1993a, Vance 1997; for present-day varieties of French, see Roberts 2010a; for Swedish, Falk 1993; for the Scandinavian languages in general, see Holmberg and Platzack 1995; for English, Allen 1995, Haeberli 1999), the relation between the loss of nominal inflections (i.e., case) and changes affecting word order and the rise of Exceptional Case-Marking constructions (for the history of English, see Lightfoot 1979, 1991, 1999; van Kemenade 1987; Kiparsky 1996, 1997; Roberts 1997; Haeberli 1999, 2004; Biberauer and Roberts 2005, 2008), or changes affecting the inventory of C-related clausal particles and the rise of generalized V2 syntax in Germanic (see Ferraresi 2005 on Gothic; and Axel 2007 on Old High German).15,16 Other causal connections that are posited to address the actuation problem include the rise of phonemic contrasts due to the loss of a segment that originally triggered a relevant phonological rule ('phonemicization,' see, e.g., Hyman 1976), the reanalysis of prosodically triggered ordering properties in terms of syntactic operations (see Stockwell and Minkova 1994, Dewey 2006 on the rise of V2 in Germanic), and the interaction of morphosyntactic change with shifts in the semantic/pragmatic function of expressions. The latter is usually based on the assumption that the semantic/pragmatic function of linguistic forms may become opaque over time (e.g., due to overuse or the development of competing, more 'expressive' forms), leading to a loss or reanalysis of forms (see, e.g., Givón 1976 on the rise of subject agreement via a reanalysis of clitics in left-dislocation structures; Stockwell 1977, van Kemenade 1987 on the rise of VO orders via a reanalysis of NP postposition; Hinterhölzl 2004, 2009 on the change from OV to VO order; Roberts 2007:276–277 on a possible scenario for the rise of OV order via a reanalysis of discourse-driven leftward movement of objects; Fuß 2008, Hinterhölzl and Petrova 2009, Trips and Fuß 2009, and Walkden 2015 on the diachrony of V2 in Germanic).17

14 Fischer (2010) puts forward a differing proposal according to which word order change may set off grammaticalization processes (e.g., the rise of periphrastic constructions), which are then followed by the loss of inflectional morphology.
15 See section 18.5 below on the reverse change, that is, ways in which inflectional morphology arises historically via processes of grammaticalization (see, e.g., Roberts 1993b, 2010b; Roberts and Roussou 1999, 2003; Heine and Kuteva 2002; Hopper and Traugott 2003 for a generative perspective).
16 Under the assumption (Kayne 1994) that OV order is derived by leftward movement of the verb's complements (and other VP-internal material) for licensing purposes, the loss of surface OV orders can be connected to independently motivated lexical or morphological changes (e.g., the erosion of the formerly rich system of inflections in the ME period; see, e.g., Roberts 1997, 2007; van der Wurff 1997, 1999; Fischer et al. 2000; Ingham 2002; Biberauer and Roberts 2005, 2006, 2008; see Hróarsdóttir 1996, 2000 on the loss of OV orders in the history of Icelandic).



Attempts to link parametric change to independent changes in other parts of the grammar potentially offer an internal solution to the actuation problem; in addition they may reveal insights into the interaction between different components of the grammar (and thus properties of UG) which possibly cannot be reached from a purely synchronic perspective.18 However, one might argue that this approach merely shifts the burden of explanation (with respect to actuation) to another part of the grammar, that is, we may ask why learners at some point failed to acquire certain morphological or phonological properties. Solutions to the latter problem often capitalize on the fact that the acquisition of morpho-phonological properties is a complex and difficult task due to the often messy and chaotic character of the acoustic input the learner receives. This is what Hale (2007:53ff.) calls 'noise in the channel.' As is well known, the output of the articulatory system may be highly variable, even for a single individual (due to random factors such as speed of pronunciation, a cold, etc.). Moreover, it can be shown (e.g., by methods of instrumental phonetics such as spectrographic analyses) that even a single speaker seldom realizes one and the same linguistic sign (sounds, morphemes, words, sentences) in exactly the same way. The variation inherent in the target realization may obscure properties of the target grammar G1. If the range of variation in the phonological realization of a given underlying target structure crosses a certain threshold, this may cause the learner to posit an underlying form that differs from the relevant structure in the target grammar (see Hale 2003, 2007 and in particular the work of John Ohala [Ohala 2003 for an overview] for relevant considerations concerning phonetic aspects of phonological change; see also Blevins 2004 and chapter 8 for discussion).19

17 A different stance is taken by Simpson (2004), who argues that movement operations are never lost from the grammar. Rather, EPP features are always available for the language learner as a formal means to cope with movement operations encountered in the input where the original semantic/pragmatic trigger has been lost in the course of time (e.g., due to overuse of a certain construction). As a consequence, movement operations are not lost if the original trigger disappears, but rather they are converted into fossilized, purely syntactic movement (see Fuß 2008 for a relevant analysis of the development of generalized V2 in the history of German).
18 Though it might be that the link between, for instance, morphophonological changes and syntactic change is less direct than is often assumed. For example, it is conceivable that speakers tend to avoid ambiguity in their speech production after the loss of case distinctions by using word order as a means to distinguish between different arguments of the verb. This in turn leads to shifts in the PLD which at some point may give rise to parametric change (cf. Kroch 2001; see Bobaljik 2002, Alexiadou and Fanselow 2002, and Hróarsdóttir et al. 2006 for a critical view on the supposed link between rich verbal inflection and verb movement; but see Koeneman and Zeijlstra 2014 for a recent defense of the claim that V-to-I movement is conditioned by rich verbal agreement).
19 For example, see Ohala (1983) and Solé (2009) on cases of spontaneous nasalization which can be attributed to a tendency for hearers/learners to posit nasals (absent in the target grammar) in front of (final) voiced stops (see Ohala and Busà 1995 for an inverse change in which listeners fail to detect nasals before voiceless fricatives).



In this way, the messy character of the incoming data may give rise to misparses/misanalyses that lead to wrong conclusions concerning properties of the target grammar. If these conclusions fail to be corrected (e.g., by further evidence to the contrary), they will eventually become part of the acquirer's steady state grammar, an instance of grammar change. Even under the assumption that language acquisition itself is a deterministic process (see chapters 11 and 12), it is fairly clear that the complex mapping from G1 to properties of G2 introduces quite a number of random factors ('noise in the channel') that may prevent a flawless transmission of features from G1 to G2. Moreover, from the fact that the make-up of the input seems to be different for each individual speaker we may conclude that change is not a rare phenomenon (as suggested by the logical problem of language change), but rather a necessary consequence of the way human languages are transmitted over time. From this perspective, the linguistic diversity we find in the world today (according to some estimations at least 6,500 different languages, belonging to at least 250 larger families, see Nettle 1999:1) can be directly attributed to the discontinuous nature of language transmission: imperfect replication of grammars gives rise to small-scale inter-speaker variation which might grow into dialects or even mutually unintelligible languages as a result of social selection and geographical isolation (see Nettle 1999 for in-depth discussion). At this point, it is interesting to note that the wide range of linguistic diversity that we can observe today (not to mention the languages that have vanished before now) developed in a comparatively short period of time (100,000–200,000 years, or 4,000–8,000 generations if we assume a generation time of twenty-five years; see also chapter 14, section 14.4.2). The massive diversification of languages is a testimony to the pervasiveness and rapidity of linguistic change; it also highlights that the evolution of languages and the evolution of biological species operate at different speeds, a finding which can be traced back to the fact that only the latter is based on a direct transmission of properties (via inheritance).20 We have seen that the likelihood and directionality of (different types of) sound change may be attributed to purely phonetic factors (see, e.g., Hale 2007). However, it is still unclear how UG-based approaches can handle directionality effects and phenomena such as drift in other domains of the grammar.

18.3.1 Drift, Trajectories of Change, and the Role of 'Third Factors'

In general, UG-based approaches to the actuation problem should lead us to expect that language change is basically a random phenomenon, that is, a 'random walk through the range of possibilities defined by UG,' as Roberts (2007:348) puts it.

20 Another factor (in addition to discontinuous transmission) that contributes to the swiftness of language change is that parents may pass on to their offspring linguistic features that they have acquired during their lifetime (i.e., the historical development of language(s) exhibits Lamarckian properties; for discussion see, e.g., Nettle 1999; Mufwene 2002; Andersen 2006).



Again, this expectation can be ultimately traced back to the discontinuous nature of language transmission; compare the following quote from David Lightfoot:

'Languages are learned and grammars constructed by the individuals of each generation. They do not have racial memories such that they know in some sense that their language has gradually been developing from, say, an SOV and towards an SVO type, and that it must continue along that path. After all, if there were a prescribed hierarchy of changes to be performed, how could a child, confronted with a language exactly half-way along this hierarchy, know whether the language was changing from type x to type y, or vice versa?' (Lightfoot 1979:391)

However, this prediction is clearly at odds with the bulk of evidence accumulated in historical linguistics in support of the existence of (universal) pathways of change, including probable, possible, and impossible sound changes (see, e.g., Campbell 2004; Blevins 2004), grammaticalization clines (i.e., the observation that the development of functional categories from former lexical elements proceeds via identical stages across time and languages, e.g., full verb → modal verb → auxiliary → inflection; see, e.g., Heine and Kuteva 2002; Hopper and Traugott 2003), or the observation that basic OV order is often replaced by VO while the reverse change seems to be quite rare (Faarlund 1990:50; Kiparsky 1996:140).21 Attempts to reconcile the observation of diachronic pathways with the notion of parametric change often converge with another line of thinking to solve the actuation problem, namely the assumption that there are properties of the acquisition device that may promote changes or determine the direction of change in case the evidence contained in the PLD is ambiguous or insufficient (see, e.g., Roberts and Roussou 2003:3ff. and Roberts 2007:345ff. for discussion).22

21 In the history of many languages, we can observe a change from basic OV to VO order (see, e.g., van Kemenade 1987, Lightfoot 1991, Pintzuk 1999 on English; Delsing 2000 on Swedish; Rögnvaldsson 1996, Hróarsdóttir 2000 on Icelandic; Grewendorf and Poletto 2005 on German language isolates in Northern Italy; Gerritsen 1984 and Kiparsky 1996 on the Germanic languages in general; Taylor 1994 on Ancient Greek; Devine and Stephens 2006 on Latin/Vulgar Latin/Romance; Rinke 2007 on Portuguese; Williamson 1986, Gensler 1994 on the Niger-Congo languages; Kiparsky 1996 on the Finno-Ugric languages [Finnish and Estonian, in particular]). In contrast, the reverse change from VO to OV seems to be rather rare, and relevant examples cited in the literature can mostly be attributed to language contact (see, e.g., Lehmann 1978 on Sinhalese [under contact with Dravidian languages]; Leslau 1945 and Biberauer, Newton, and Sheehan 2009 on South/Ethiopian Semitic languages such as Tigre and Tigrinya [under contact with Cushitic]; Givón 2001 on Akkadian [under contact with Sumerian]; Ratcliffe 2005 on Bukhara Arabic [under contact with Tajik and Uzbek]; Ross 2001 on Oceanic languages of North-West Melanesia [under contact with Papuan languages]; rare examples of apparent endogenous change from VO to OV include Georgian [Harris 2000] and the Mande languages [Claudi 1994]). As noted by Newmeyer (2000), the apparent rarity of VO-to-OV leads to an interesting paradox, given that the majority of the world's languages are SOV.
22 Biberauer, Newton, and Sheehan (2009) argue that the set of possible diachronic pathways is also shaped by general properties of phrase structure. More specifically, they explore diachronic consequences of Holmberg's (2000:124) Final-Over-Final Constraint (FOFC), which rules out configurations where a head-initial projection is embedded under a head-final projection. For example, FOFC predicts (correctly, it seems; see, e.g., Kiparsky 1996) that the often-observed change from OV to VO must proceed in a 'top-down' fashion (i.e., a change in headedness must first affect higher functional projections such as TP/IP before it can affect the basic order in the VP). In contrast, the (much rarer) change from VO to OV is expected to proceed in a 'bottom-up' fashion, first affecting VP, before it can reverse the headedness of higher functional categories.



It is usually assumed that these factors are not specific to the faculty of language. Hence, they are third factors in linguistic variation according to the typology introduced by Chomsky (2005:6; see section 18.1 and chapter 6). From this perspective, ambiguity in the PLD regarding the expression of certain parameter values is only a necessary condition for a change to take place, while the actuation of a change is attributed to learning strategies that select the most economic variant compatible with the input (relevant early generative proposals include Lightfoot's 1979 Transparency Principle, or the Least Effort Strategy proposed in Clark and Roberts 1993).23 In general, these economy considerations come in two varieties. First, following early generative work on L1 acquisition (see Wexler and Culicover 1980, Berwick 1985 for discussion), it is often assumed that there are marked and unmarked (or default) parameter values and that the learner assigns a given parameter the unmarked value if no decision can be made based on the evidence available in the input.24 Alternatively, the decision in question is assumed to be sensitive to the notion of derivational/representational economy (see, e.g., Clark and Roberts 1993), in the sense that the learner assigns a given input string the most economical representation/derivation that is compatible with the evidence. Economy-driven approaches to change typically focus on the diachronic loss of movement operations (see, e.g., Clark and Roberts 1993; Roberts 1993a, 1993b), while markedness considerations can be used to capture other changes such as the loss of pro-drop. More recently, Roberts and Roussou (2003) and Roberts (2007) have proposed a synthesis of these two approaches that is based on the idea that marked parameter values give rise to more complex syntactic derivations. Under the assumption that marked parametric choices correspond to the presence of formal features (associated with functional heads) which drive syntactic operations (Move, Agree), this idea can be formalized by the following simplicity metric (Roberts 2007:235; see also Longobardi 2001a):

(6) Given two structural representations R and R′ for a substring of input text S, R is simpler than R′ if R contains fewer formal features than R′.

23 Alternatively one might entertain a weaker position, claiming that principles of economy actually do not trigger changes, but merely restrict the set of possible changes. Note that from this point of view, economy considerations do not provide an answer to the actuation problem, but are relevant for the so-called 'constraints problem' concerning 'the set of possible changes and possible conditions for changes which can take place in a structure of a given type' (Weinreich et al. 1968:101).
24 A related idea lies behind the Subset Principle (Berwick 1985:37; Manzini and Wexler 1987:61), which ensures that if there are two possible values A and B for a given parameter, and A generates a subset of the sentences generated by B, the learner will acquire the more restrictive setting A in the absence of decisive evidence for setting B. In this sense, A can be viewed as the unmarked setting (see Biberauer and Roberts 2009, who invoke the Subset Principle to account for the loss of optional word order patterns in the history of English). Note that an approach in terms of default settings is not always compatible with the Subset Principle. For example, Hyams (1986) claims that the default setting for the pro-drop parameter is [+pro-drop], which is clearly in conflict with the Subset Principle, since '[+pro-drop] permits a set of grammatical sentences that includes the set generated by the setting [–pro-drop]' (see O'Grady 1997 for discussion; and chapter 11).



(See also the discussion of Feature Economy in chapter 14, section 14.9.) (6) ensures that in cases where the PLD underdetermines the target grammar, learners will generally prefer grammars that contain fewer formal features and thus yield simpler representations/derivations with a smaller number of movement operations. This can then be used to explain the phenomenon of drift in terms of a series of discrete and independent changes that successively reduce the inventory of (marked) formal features present in the grammar and in this way may give rise to the impression of generation-spanning, directional change (see Biberauer and Roberts 2005, 2006 for the change from OV to VO in the history of English as a combination of the loss of (i) EPP features and (ii) pied-piping of verbal projections; see Roberts 2007 for further discussion). In a similar vein, grammaticalization clines can be explained as reflecting a series of reanalyses in which movement operations linking positions in a fixed hierarchical sequence of functional heads (see, e.g., Cinque 1999) are converted into Merge operations that target the head of the former movement chain (Roberts and Roussou 2003). As a result, grammaticalization is expected to proceed upwards along the clausal spine, as exemplified by the cross-linguistically widespread development of future markers from lexical verbs, where the latter first turn into modals expressing obligation or necessity (Mod_necessity°/Mod_obligation°), which are then reanalyzed as a realization of T_future°.25 In this way, the study of grammaticalization phenomena can also provide insights into the make-up of the clausal architecture and the nature of functional heads (see Roberts and Roussou 2003, ch. 5; Roberts 2010b; see section 18.5 for further discussion).
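The effect of the simplicity metric in (6) can be illustrated with a small sketch in which two candidate analyses of the same string are compared by counting the formal features they require, so that direct Merge of an exponent in a high functional head beats Merge in a low position plus a movement-triggering feature. The feature labels below are hypothetical placeholders chosen for exposition, not a worked-out analysis from the literature.

    # A toy implementation of the simplicity metric in (6): prefer the
    # candidate representation with fewer formal features. The feature
    # labels are illustrative placeholders.

    def feature_count(representation):
        return sum(len(features) for features in representation.values())

    def simpler(r1, r2):
        """(6): R is simpler than R' iff R contains fewer formal features."""
        return feature_count(r1) < feature_count(r2)

    # Candidate A: exponent merged in V and moved to T (extra movement trigger on T).
    move_analysis = {"T": {"tense", "move_V"}, "V": {"theta"}}
    # Candidate B: the same exponent directly merged in T (the grammaticalized reanalysis).
    merge_analysis = {"T": {"tense"}}

    print(simpler(merge_analysis, move_analysis))  # True: the learner picks the reanalysis

Iterating this preference across generations yields the upward, feature-reducing trajectory just described: whenever the input is ambiguous between the two analyses, each acquisition step removes a movement trigger.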

18.4 Diachronic Gradualness, Real and Imagined: The Transition Problem

The transition from one historical state to another is often described as a gradual process that is typically accompanied by a degree of linguistic variation which is not found in stable languages (see also Labov 1994 on the connection between variation and change). Changes do not manifest themselves instantaneously in the historical records. Rather, the replacement of old forms by new forms over time follows an S-shaped curve (which corresponds to the logistic function, also used in population biology, e.g., to model population growth). Thus, changes typically start slowly, gather momentum (up to a point where the growth is approximately exponential), and then lose speed, tailing off slowly to completion (see Bailey 1973; Kroch 1989).

25 An interesting question that comes up here concerns the mechanisms that underlie the traditional observation (see, e.g., Jespersen 1917) that language change (and grammaticalization, in particular) often proceeds in a cyclic fashion. Thus, the reduction (and loss) of linguistic forms by phonological erosion and analogical leveling is often compensated for by grammaticalization processes that create new phonological exponents, which are subsequently again subject to erosion (for discussion see, e.g., Jäger 2008; and the contributions in van Gelderen 2009 and Willis et al. 2013).
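The S-shaped time course just described can be written out explicitly. In the standard logistic model, the proportion p(t) of the innovative form at time t is p(t) = 1/(1 + e^-(k + s*t)), with slope s (the rate of change) and intercept k (fixing the midpoint). The short sketch below simply tabulates such a curve; the parameter values are illustrative, not estimates from any corpus.

    import math

    # Logistic time course of replacement: p(t) = 1 / (1 + e^-(k + s*t)).
    # Slope s and intercept k are illustrative values, not corpus estimates.

    def p(t, s=0.05, k=-5.0):
        return 1.0 / (1.0 + math.exp(-(k + s * t)))

    for t in (0, 50, 100, 150, 200):
        print(t, round(p(t), 3))
    # Prints 0.007, 0.076, 0.5, 0.924, 0.993: a slow onset, a rapid middle
    # phase around the midpoint t = 100, and a slow tail to completion.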



These observations are at the heart of the so-called 'transition problem' highlighted by Weinreich et al. (1968:153), which concerns 'the route by which a linguistic change is proceeding to completion.' To reconcile the apparently gradual character of language change with the notion that (grammar) change is necessarily abrupt (involving a set of discrete differences between the target grammar and the grammar acquired by the learner), it is often assumed that the impression of gradualness arises from a set of independent factors which traditional approaches failed to recognize (Kroch 1989, 2001; Hale 1998, 2007; Lightfoot 1999:ch. 4; Roberts 2007:ch. 4):

(i) the distinction between grammar change and its diffusion (Hale 2007; see section 18.2.1);
(ii) the possibility of intra-speaker variation, which can be modeled as a form of bilingualism (grammar competition, Kroch 1989, 2001);
(iii) the use of a more fine-grained system of parametric choices ('microparameters,' Kayne 2000, 2005a; see Roberts 2007, 2010b; and chapters 14 and 16); and
(iv) 'true' optionality of syntactic operations (Biberauer and Roberts 2005).

First of all, the impression of gradualness may arise from not distinguishing properly between the actual change and its (highly gradual) diffusion within a population/speech community. As pointed out by Hale (1998:3), the actual (parametric) change 'has no temporal properties—it is a set of differences' (original emphasis), governed solely by the interaction of cognitive processes (including properties of UG) with the linguistic experience. In contrast, the gradual diffusion of a change through a speech community necessarily has a temporal dimension. It is governed by sociolinguistic factors (prestige, register choice, etc.) and can be modeled by mathematical methods of population biology (see, e.g., Niyogi and Berwick 1998; Niyogi 2006). However, work by Anthony Kroch and his collaborators has shown that linguistic change is typically accompanied by another kind of variation which cannot be attributed to (incomplete) diffusion/inter-speaker variation. Relevant examples involve linguistic variation within one and the same text; compare the relative order of verbs and their complements in the following passage taken from the (northern) Early Middle English Ormulum:

(7) Forr þatt I wollde bliþelig þatt all Ennglisshe lede wiþþ ære shollde
    For that I would gladly that all English people with ear should

    lisstenn itt, wiþþ herte shollde itt trowwenn, wiþþ tunge shollde
    listen it, with heart should it trust, with tongue should

    spellenn itt, wiþþ dede shollde itt follghenn.
    spell it, with deed should it follow.
    (CMORM,DED.L113.33; Trips 2002:112)

According to Kroch, cases such as (7) represent genuine instances of intra-speaker variation, where speakers have command over two (or more) internalized grammars



which differ with respect to a small set of parametric choices, giving rise to a wider range of linguistic variation (so-called 'grammar competition,' see, e.g., Kroch 1989, Han and Kroch 2000, and Ecay 2015 on the rise of do-support in English; Pintzuk 1999, Pintzuk and Taylor 2006 on word order change in the history of English; Santorini 1992 on Yiddish; Taylor 1994 on Ancient Greek; see also Kroch 1994, 2001 and Yang 2000; see Hale 2007:172ff. for critical discussion).26 Thus, it is claimed that language change typically proceeds via a stage of 'internal diglossia' up to a point where one grammar (or parametric choice) eventually wins out over the other.27 The replacement of one grammar/parametric option by another is taken to follow the same S-curve which characterizes change in a population; moreover, it is assumed that different surface manifestations of a single underlying parameter change replace competing (older) forms at the same rate in all contexts (the so-called Constant Rate Effect, CRE). Under this assumption, estimations of the rate of change can also be used as a diagnostic to identify clusterings of surface changes that can be attributed to a single underlying change (see Kroch 1989, and more recently Ecay 2014; see Fruehwald et al. 2013 on Constant Rate Effects in phonological rule change).28,29

26 It is generally assumed that grammar competition is triggered in cases where the PLD contains robust evidence for conflicting parametric choices that cannot be part of a single grammar (see, e.g., Kroch 1994; Lightfoot 1999:92). For example, Haeberli (2004) argues that grammar competition is triggered by mismatches between morphology and syntax (e.g., a grammar with syntactic evidence for verb movement but without rich verbal inflection), giving rise to a second internalized grammar where the mismatch is resolved. Over time, the more harmonic or economic variant may then eliminate its competitor. However, as pointed out by Hale (1998, 2007), this scenario (and the idea of grammar competition in general) seems to be at odds with the fact that speakers confronted with incompatible triggers/cues in the PLD either ignore a subset of the input data (see Anderson 1980, Bobaljik 2002, and Koeneman and Zeijlstra 2014 on relevant examples and the relative significance of syntactic and morphological triggers/cues) or acquire two fully distinct grammars, the use of which is typically linked to different registers (which arguably cannot be used to account for examples like (7)).
27 Kroch (1994) argues that blocking effects imposed by UG restrict the co-existence of grammars that differ only minimally with respect to a set of parameter doublets (i.e., coexisting competing values for one parameter), thereby guaranteeing that one grammar will eventually win out over its competitors (see Wallenberg and Fruehwald 2013 for further discussion and the claim that the notion of competing grammars can also be used to account for [stable] syntactic optionality). However, as pointed out by Hale (2007:173), it is not entirely clear how morphological blocking, which is usually considered a grammar-internal process, can effect the loss of parametric variants that are part of different grammars. This seems to suggest that it is actually more appropriate to assume that grammar competition involves competing parametric options within a single grammar rather than across multiple grammars.
28 But see Janda and Joseph (2003:140–141) for a critique of the CRE, which is based on the conviction that 'the order in which changes appear in written language need not reflect the order in which they first appeared in colloquial speech. In particular, we believe that novel patterns which arise individually in spoken language may cumulate for a long period of time before they jointly achieve a breakthrough, as a set, into writing.'

29 More recently, the research program initiated by Anthony Kroch has sparked a number of new studies that combine a generative approach to language change with advanced corpus linguistic and statistical methods (see, e.g., Wallenberg 2009; Fruehwald et al. 2013; Wallenberg and Fruehwald 2013; Bacovcin 2013; and Ecay 2014, 2015).
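The Constant Rate Effect lends itself to the same logistic formulation used above: surface contexts reflecting a single underlying change share one slope s and differ only in their intercepts k, so that in logit terms logit(p) = k + s*t. The sketch below tabulates three such shifted curves; the context labels loosely echo those of Kroch's do-support study, but all numerical values are invented for illustration, not his estimates.

    import math

    # Constant Rate Effect, sketched: contexts reflecting one underlying
    # change share the slope s and differ only in intercepts k.
    # Context labels and parameter values are illustrative.

    def p(t, s, k):
        return 1.0 / (1.0 + math.exp(-(k + s * t)))

    s = 0.05  # a single rate for the single underlying change
    intercepts = {"negative declaratives": -6.0,
                  "negative questions": -4.5,
                  "affirmative questions": -3.0}

    for context, k in intercepts.items():
        print(context, [round(p(t, s, k), 2) for t in (0, 60, 120, 180)])
    # The three curves are time-shifted copies of one another. Fitting a
    # separate logistic regression per context and testing for a shared
    # slope is the diagnostic for a single underlying change (Kroch 1989).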



In addition, the impression of gradualness (and variation) might be due to grammar-internal factors. Roberts (2007:300ff., 2010b) discusses the possibility that the phenomenon of gradualness can be reconciled with the notion of parametric change if we replace the traditional notions of 'parameter' and 'syntactic category' with a system of more fine-grained microparametric choices (either involving a multitude of functional heads or a more elaborate system of morphosyntactic features), allowing for the possibility of 'lexical diffusion'30 of formal features (such as EPP-features) through a rich system of functional heads.31 Under this assumption, 'a series of discrete changes to the formal features of a set of functional categories taking place over a long period [may give] the impression of a single, large, gradual change' (Roberts 2007:300). An alternative grammar-internal source of variation is true optionality of syntactic operations (see Biberauer and Richards 2006; for a relevant proposal to account for word order variation in Old and Middle English, see Biberauer and Roberts 2005). We can therefore conclude that apparent diachronic gradualness does not pose a challenge for UG-based approaches to linguistic change. Rather, change between grammars is necessarily abrupt and instantaneous, with the impression of gradualness arising from a set of independent grammar-external and grammar-internal factors. The next section discusses the phenomenon of grammaticalization, the prime example of an allegedly gradual change, and shows how this diachronic process can not only be subsumed under the model of language change outlined so far, but may also provide interesting evidence bearing on the nature of functional categories.

18.5 Grammaticalization and the Nature of Functional Categories

It is a well-known observation (going back at least to 19th-century grammarians such as Franz Bopp 1816) that grammatical categories like determiners, inflections, conjunctions, or auxiliaries evolve historically from formerly (free) substantial lexical categories such as nouns or verbs.32

30 The notion of 'lexical diffusion' is commonly used to refer to the idea that changes may spread through the lexicon gradually, affecting one lexical item at a time.
31 See Guardiano and Longobardi (2005), Longobardi and Guardiano (2009), and Longobardi et al. (2013) for a new method of measuring historical relatedness and genetic affiliation of languages based on a fine-grained system of microparameters (relating to the syntax of noun phrases/DPs; this approach is presented in detail in chapter 16). See also Walkden (2014) for a generative approach to syntactic reconstruction.
32 From the 1980s on, the study of grammaticalization processes has become a main focus of descriptive diachronic/typological linguistics, leading to a wealth of new data and a set of generalizations on the course and defining properties of grammaticalization (see, e.g., Heine and Kuteva 2002; Lehmann 2002; Hopper and Traugott 2003; Heine 2003; and Narrog and Heine 2011). For a generative perspective on relevant phenomena see, e.g., Roberts 1993b; von Fintel 1995; Newmeyer 1998; Roberts and Roussou 1999, 2003; and van Gelderen 2004, 2011.



Studies of grammaticalization phenomena deal with (i) the historical pathways along which lexical elements develop into grammatically functional elements and then further into 'more grammaticalized' functional elements and (ii) the (pragmatic, morphosyntactic) contexts where this development takes place. Descriptive/typological studies often emphasize that grammaticalization is a separate type of change (in addition to well-known types such as reanalysis, or analogy/extension) in which the syntactic category of a lexical item changes gradually and irreversibly33 along universal (historical) pathways, called grammaticalization clines or paths (see Hopper and Traugott 2003; Heine 2003); compare the following cline that characterizes the development of agreement markers (over several generations of speakers):

(8) independent pronoun → weak pronoun → clitic pronoun → affixal (agglutinative) agreement marker → fused agreement marker → ∅

According to some researchers, the particular properties of grammaticalization phenomena pose a challenge to theoretical views widely held among generative linguists. For example, it has been argued that the seemingly gradual transition from one syntactic category to another is not compatible with the assumption of discrete syntactic categories (see Hopper and Traugott 2003, Heine 2003; but see Newmeyer 1998 for an opposing view). Moreover, the long-term, unidirectional character of clines seems to be incompatible with the fact that language transmission is necessarily discontinuous: learners do not have any historical knowledge that tells them that they currently take part in a generation-spanning grammaticalization process and that they must proceed further down the cline. Rather, the new grammar is created anew in the mind of the child, solely on the basis of (i) the linguistic experience the child is exposed to, and (ii) properties of the human language faculty. The challenge posed by grammaticalization phenomena has also been taken up in generative work (see note 32 for references). For example, Roberts and Roussou (2003) argue forcefully that the properties of grammaticalization can be accounted for in more formal terms if we adopt the following assumption (see also von Fintel 1995; Roberts 2007, 2010b):34

(9) Grammaticalization involves the reanalysis of substantial lexical elements as phonological exponents of (higher) functional categories/heads.

33 In other words, it is claimed that grammaticalization is a unidirectional process. However, it has been pointed out that there exist exceptions to the purported irreversibility of grammaticalization processes, involving, e.g., changes where former inflections (such as case affixes) turn into clitics (see Campbell 1991; Newmeyer 1998; Allen 2003; Willis 2007; Norde 2009; and most recently Jung and Migdalski 2015).
34 In some cases, the reanalysis may also affect functional elements that turn into other functional elements, as in the case of clitics (presumably realizations of D°) that turn into agreement markers (see Roberts and Roussou 2003; Fuß 2005).
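One way to see what the unidirectionality claim amounts to formally is to treat a cline like (8) as an ordered scale on which licit change may only move rightward. The sketch below encodes this directly; the stage labels follow (8), and the hard constraint itself is, of course, the descriptive claim at issue (the exceptions listed in note 33 are precisely violations of it).

    # A cline like (8) treated as an ordered scale; the unidirectionality
    # claim amounts to allowing only rightward steps. Stage labels follow (8).

    CLINE = ["independent pronoun", "weak pronoun", "clitic pronoun",
             "affixal agreement marker", "fused agreement marker", "zero"]

    def licit_change(old_stage, new_stage, cline=CLINE):
        """True iff the change moves strictly rightward along the cline."""
        return cline.index(new_stage) > cline.index(old_stage)

    print(licit_change("clitic pronoun", "affixal agreement marker"))  # True
    print(licit_change("affixal agreement marker", "clitic pronoun"))  # False

The generative account developed below does not stipulate such a constraint; rather, it aims to derive the rightward bias from (9) together with economy principles.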



From this perspective, grammaticalization is taken to involve the reanalysis of a movement dependency where the phonological features of an element moved to a higher functional head F (or to the specifier of F35) are analyzed as the phonological realization of F. Accordingly, grammaticalization prototypically proceeds in an upwards fashion (Roberts and Roussou 2003): the phonological exponent [π] of a hierarchically lower lexical category is reanalyzed as the exponent of a higher functional head. The unidirectional character of this type of change can then be attributed to economy principles (i.e., third factors in the sense of Chomsky 2005) that favor less complex structures and derivations and drive grammaticalization processes in case the input data is ambiguous (see Roberts 1993b; Roberts and Roussou 1999, 2003; van Gelderen 2004).36 It can be shown that under this view, grammaticalization processes are not only fully compatible with standard generative assumptions, but can also serve to deepen our understanding of the nature of functional categories, in particular concerning (i) their featural make-up and (ii) the hierarchical organization of functional projections in the structure of the clause. The following discussion focuses on a set of general properties of grammaticalization phenomena that have been described in the literature (see, e.g., Lehmann 2002; Hopper and Traugott 2003), including the gradual loss of phonological and semantic content, the development from free into bound forms, the loss of variation concerning the use and placement of forms, as well as the rise of selectional restrictions and properties characteristic of elements that are organized into (inflectional) paradigms. The reduction of phonological substance can be characterized as the loss of segments or of marked phonological features (e.g., the reduction of Latin ille to Romance le, see Vincent 1997). Phonological reduction is usually accompanied by a loss of prosodic independence which proceeds along the following pathway:

(10) content item > function word > clitic > inflectional affix > phoneme > ∅

Adopting the basic assumption in (9), phonological attrition can be attributed to independently motivated, universal properties of functional categories. It is a well-known fact that functional elements such as complementizers, determiners, auxiliaries, and inflections are prosodically/phonologically deficient. In contrast to lexical categories, they cannot bear stress and are preferably monosyllabic/moraic elements (see Kenstowicz 1994 on English; Vogel 1999 on Italian).37

35 Note that changes in which elements occupying the specifier of a phrase (usually via movement) are reanalyzed as the head of that phrase are quite frequent (demonstratives > determiners, relative pronouns > complementizers, negative adverbs > sentential negation, etc.; see van Gelderen 2004 for detailed discussion).
36 Fuß (2005) argues that grammaticalization phenomena are in addition shaped by morphological third factors (i.e., blocking effects induced by some form of the Elsewhere Condition) which restrict grammaticalization to cases where the newly coined exponents signal more featural distinctions than competing older forms.
37 Note, moreover, that it is generally assumed that many functional heads (such as T in finite clauses without an auxiliary in English, or D in languages without overt determiners such as Tagalog) are phonologically empty, which is the endpoint of phonological deficiency.



Thus, we might suppose that the absence of stress and a reduced segmental make-up are presumably necessary preconditions for the reanalysis as a functional element (see Roberts and Roussou 2003 for discussion). After the initial reanalysis, it is predicted that the element is prone to undergo further phonological reduction due to the deficient phonological nature of functional categories, accounting for the pathway described in (10). The inherent prosodic deficiency of functional categories can also be used to account for the development of bound forms from formerly free forms (sometimes referred to as 'coalescence,' see Lehmann 2002). Again, this clearly parallels the behavior of functional categories (e.g., inflections), which are often realized by affixes or clitic elements that require the presence of a (lexical) host they can attach to (clearly a consequence of the general phonological deficiency of functional categories, henceforth FCs).38 Typically, phonological reduction is accompanied by the loss of semantic content, sometimes labeled bleaching. Informally, the notion of semantic bleaching refers to the erosion of substantial lexical meaning (see Eckhardt 2006 for in-depth discussion and a thorough critique of the notion of 'bleaching'). For example, we can observe that nouns evolving into nominalizing affixes and pronouns evolving into agreement markers lose their referential (and descriptive) potential, while verbs evolving into auxiliaries lose their thematic (predicational) properties:

(11) German -heit < Old High German heit, Gothic haidus 'person, nature, form, rank'

(12) Finnish
     a. 1pl agr -me < 1pl pronoun me
     b. 2pl agr -te < 2pl pronoun te

(13) English shall: future marker < '(root) modal' < OE sceal, full verb meaning 'subject has to pay an amount of money or has to return something to somebody' (Lehmann 2002:114)

However, it has been pointed out that the semantic changes characteristic of grammaticalization do not affect the semantic properties of a given lexical item in a random fashion (von Fintel 1995; Roberts and Roussou 2003; Eckhardt 2006). First, grammaticalization typically involves the loss of substantial lexical meanings, in the sense of, for example, 'predicative' or 'descriptive' content:

(14) a. Verbs lose their argument structure (or their thematic properties);
     b. Nouns lose their referential (and descriptive) content;
     c. Prepositions lose their capacity to designate spatial relations.

38 The existence of free tense, aspect, and mood markers (as in many Creole languages, for example; see chapter 17) indicates that the development of bound forms is not a necessary consequence of the reanalysis of lexical material as exponents of functional heads, but rather a tendency, caused by the phonological deficiency of functional categories.



Second, another set of semantic properties is preserved in, or added to, the formerly substantial lexical category as a result of the grammaticalization process. This set of semantic properties involves abstract ‘logical meanings’:

(15) a. Modals/auxiliaries preserve the modal content of the relevant former full verbs (as in the case of Old English sceal noted in (13));
     b. addition of the feature [±definiteness] in the case of determiners evolving from nouns or numerals;
     c. addition/preservation of abstract meanings typically expressed by an operator–variable relationship (development of (i) wh-pronouns from former indefinites, (ii) quantifiers such as English many from the former OE adjective manig).39

Thus, semantic changes typical of grammaticalization are not adequately described by the term semantic bleaching, since they show a set of clearly structured properties and, more importantly, the retention or even addition of logical semantic content. In this way, the study of grammaticalization phenomena can inform us about semantic properties of functional categories. For example, Roberts and Roussou (2003) argue that elements subject to grammaticalization typically develop quantificational properties, which they assume to be a characteristic of functional categories.40 Moreover, if FCs are confined to a certain kind of semantic content, then it is expected that the transition from lexical to functional category requires (i) the loss of those semantic properties that are not compatible with the universal make-up of FCs, that is, nonlogical (i.e., predicative or descriptive) content, and (ii) the retention or addition of logical content. The loss of semantic content is typically accompanied by changes affecting selectional properties, such as a simplified subcategorization frame and a development from syntactic selection towards morphological selection (labeled condensation by Lehmann 2002). An often-discussed example comes from the development of the Romance synthetic future (Roberts 1993b; Roberts and Roussou 2003):

(16) a. Main verb (habere) that took a nonfinite clause as its complement →
     b. future-indicating auxiliary selecting a verbal projection →
     c. inflectional suffix m-selecting a verb stem.

39  According to von Fintel (1995:185), the quantifying determiner many developed from a former adjective (OE manig) which originally combined with plural nouns to “[identify] those sets/groups that have many members” (examples like the many Englishes spoken in the world today show that many can still be used as an adjectival element). Cases where ‘many’+NP could be associated with existential readings (probably involving a phonologically empty determiner) eventually gave rise to a reanalysis in which many turned into a new phonological realization of D0 which combined existential force with the selectional requirements (plural) of the adjective ‘many.’
40  Note that on these assumptions, the category ‘agreement’ seemingly does not qualify as a separate functional category in the same sense as D, T, or C, since it is not quantificational (see Chomsky 1995b for a related conclusion).



Again, this feature of grammaticalization processes can tell us something about the universal profile of functional categories (and the distinction between functional and lexical elements with respect to selection). For example, we might conclude that diachronic data support the idea that only functional categories can select for lexical categories, whereas lexical categories cannot select for other lexical categories (Roberts and Roussou 1999). Furthermore, the study of grammaticalization phenomena may contribute new evidence to the debate on whether the inventory of functional categories is subject to language-specific parameterization or whether the same set of functional categories is universally present in all languages (see, e.g., Thráinsson 1996; Bobaljik and Thráinsson 1998). In this context, the observation that forms that were previously optional tend to become obligatory when subject to grammaticalization is potentially relevant. A well-known example comes from the rise of determiners, which cross-linguistically (e.g., in Romance [Vincent 1997; Giusti 2001] or Germanic [van Gelderen 2007]) develop from former demonstratives. While demonstratives are typically restricted to contexts where they add an indexical meaning to nouns, the resulting determiners are always obligatory. This follows if (at least a core set of) functional categories are taken to be the building blocks of syntax, which are universally present (either with phonological content or zero). Thus, if a determiner (D0) has a PF-realization /π/, then /π/ will show up in all DPs, since D0 is a necessary component of a nominal expression (see von Fintel 1995; but see van Gelderen 2004 for the claim that grammaticalization processes may enrich the inventory of non-core functional categories). In a similar vein, the fact that grammaticalized elements tend to occur in a fixed position (which Lehmann 2002 terms fixation) can be attributed to properties of FCs if we assume (i) that the order/hierarchical position of functional categories is universally fixed (Cinque 1999; Chomsky 2000b) and (ii) that the relative order of inflectional affixes is determined by the syntactic structure (see, e.g., the Mirror Principle, Baker 1988). As already pointed out in section 18.3.1, on the additional assumption that the relevant reanalyses convert movement dependencies into Merge operations targeting the head of the former movement chain, grammaticalization clines may then be taken to reflect the (universal) hierarchy of functional categories in the structure of the clause (see Roberts 2010b). Finally, grammaticalization processes may also provide insights into differences between lexical and functional categories at the interface between syntax and morpho-phonology. Grammaticalization typically involves the development of open-class lexical elements into elements that belong to a closed class of functional words or are integrated into a morphological paradigm (typically exhibiting a certain degree of formal, functional, and semantic homogeneity). This is what Lehmann (2002) calls paradigmatization. Fuß (2005) argues that paradigmatization can be attributed to universal properties of FCs under the assumption that paradigmaticity results from the way FCs are supplied with phonological content in the morphological component of grammar. More specifically, the phenomenon of paradigmatization is taken to reflect a crucial difference between the phonological realization of lexical and functional categories.



Lexical material typically involves an arbitrary, idiomatic pairing of form and meaning. This can be modeled either by assuming early, presyntactic insertion of lexical roots that include phonological features (Harley and Noyer 1999), or by assuming that substantial lexical heads (N, V, A, P) can be realized by any Vocabulary item that matches the category specification (e.g., in English, N may be realized by /kæt/, /dɑg/, /bʊʃ/, etc., depending on the choice of the speaker; see Marantz 1995, 1997). In contrast, the phonological realization of functional categories is non-arbitrary: it involves a competition between Vocabulary items that are specified for a (common) subset of the inflectional features contained in the functional head. The item realizing the greatest number of inflectional features is then chosen for insertion (the Subset Principle, Halle 1997). Under these assumptions, the traditional notion of a paradigm can be reconstructed in more formal terms by assuming that the insertion procedure for functional categories differs significantly from the way Vocabulary Insertion proceeds in the case of lexical heads. The impression that certain items ‘belong to the same paradigm’ arises from the fact that they are specified for an identical subset of morphosyntactic features (and thus form a natural class). In this way, increasing paradigmaticity (as a result of grammaticalization) can be systematically linked to the reanalysis of exponents of lexical items as exponents of functional categories.
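To make the competition just described concrete, here is a minimal Python sketch of Subset-Principle-driven insertion. The exponents and feature bundles are invented for illustration (loosely echoing the Finnish agreement markers in (12)); the sketch is expository only, not an implementation proposed in the chapter.

```python
# Toy model of Vocabulary Insertion under the Subset Principle (Halle 1997):
# an item is eligible only if its features are a subset of the features on
# the functional head; among eligible items, the one realizing the most
# features wins. Exponents and features below are purely illustrative.

def insert(head_features: frozenset, vocabulary: dict) -> str:
    """Return the exponent realizing the largest feature subset of the head."""
    eligible = {exp: feats for exp, feats in vocabulary.items()
                if feats <= head_features}            # Subset Principle
    if not eligible:
        return "-0-"                                  # zero/default exponent
    return max(eligible, key=lambda exp: len(eligible[exp]))

vocabulary = {
    "-me": frozenset({"1", "pl"}),   # cf. Finnish 1pl -me in (12a)
    "-te": frozenset({"2", "pl"}),   # cf. Finnish 2pl -te in (12b)
    "-t":  frozenset({"pl"}),        # hypothetical underspecified plural marker
}

head = frozenset({"1", "pl", "present"})   # features on the functional head
print(insert(head, vocabulary))            # -> -me (beats underspecified -t)
```

Stated this way, the lexical/functional contrast is easy to see: for a lexical head, any category-matching Vocabulary item would do, whereas for a functional head the winner is fully determined by the feature competition.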

18.6  Concluding Summary

It is a truism that the data set available to the historical linguist is very small if we compare it with the empirical sources available to linguists working on present-day languages. There are only a restricted number of historical records, and we do not have access to speaker judgments, or any kind of negative evidence. Thus, no matter how carefully we make use of the evidence available to us, we still have to face the fact that there are major gaps and discontinuities in the historical records. Given the state of the empirical evidence, a key question for historical linguistics is how to bridge the gaps in our knowledge of the past. Going back at least to Jakobson’s (1931) Prinzipien der historischen Phonologie [Principles of historical phonology], an influential line of thinking suggests that formal approaches to language and language change which are based on abstract theoretical notions offer ways to (partially) fill in the gaps left by the historical evidence by formulating precise diachronic analyses and, ideally, a restrictive theory of linguistic change which delimits the set of transmission failures that can occur in a language with a given set of structural properties (this is what Weinreich et al. 1968 call the ‘constraints problem’).41 In this chapter, I have argued that generative,

41  For example, Jakobson (1931) demonstrated that the (abstract) structuralist concept of the phoneme makes available an approach to sound change and linguistic reconstruction which is empirically and conceptually superior to the purely ‘phonetic’ version of the comparative method developed by the Neogrammarians.



UG-based approaches not only sharpen our understanding of the past; in addition, the study of language change can inform us about properties of the human language faculty as well, by making available evidence that cannot be gathered by purely synchronic investigations. Section 18.2 introduced the notion of grammar change (a set of discrete differences between the target grammar and the grammar acquired by the learner) as the proper object of a restrictive approach to language change, arguing that only from this perspective can we hope to discover restrictions on possible changes imposed by properties of UG and the workings of language acquisition. In addition, it was shown how grammar change can be modeled in terms of parameter change, that is, diachronic variation in lexical properties of a closed class of functional categories. In section 18.3, I addressed the actuation problem, suggesting that upon closer inspection, change is not a rare and paradoxical phenomenon, but rather a not unlikely outcome of the process of language acquisition if learners are highly sensitive to small fluctuations in the linguistic input they receive. To reconcile this perspective with the fact that there appear to be recurrent pathways of change that can be observed in many different languages, it has been suggested that the acquisition process is shaped by general cognitive principles of data analysis and efficient computation (‘third factors’ in the sense of Chomsky 2005; see chapter 6) that tip the scales in favor of less complex structures in case the evidence contained in the PLD is not sufficient to determine the value of a given parameter. Section 18.4 discussed what Weinreich et al. (1968) call the ‘transition problem,’ that is, the question of how a language moves from one state to a succeeding state, focusing on the problems posed by diachronic gradualness and linguistic variation. We have argued that the impression of gradualness arises from a set of independent factors, including inter-speaker variation (due to the gradual diffusion of a change in a speech community) and intra-speaker variation (i.e., grammar competition), microparametric change, and the possibility of true formal optionality of syntactic operations. Section 18.5 was concerned with the formal analysis of grammaticalization phenomena, arguing that from the perspective of a restrictive theory, this type of change may potentially reveal insights into the nature and (possibly universal) inventory of functional categories, which constitute major topics of interest in current research into the make-up of the human language faculty/UG. In this way, the study of language change appears as an integral part of the generative enterprise, which hopefully will become even more productive when future historical linguists are no longer confined to written records of the past, but have access to a richer data set made available by large corpora and long-term investigations of developments in (present-day) colloquial speech.42
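The acquisition-driven picture of actuation summarized here (learners sensitive to small input fluctuations, with a third-factor default when the evidence is insufficient) lends itself to a toy simulation. The Python sketch below rests entirely on invented assumptions: one binary parameter, a fixed rate of unambiguous cues produced only by marked-value speakers, and a hard evidential threshold below which the learner falls back on the unmarked economy default. None of the numbers is empirically motivated; this is an expository illustration, not a model proposed in the chapter.

```python
import random

# Crude simulation of parameter loss: each learner samples PLD from the
# current speech community; if unambiguous cues for the marked parameter
# value fall below a threshold, a third-factor economy bias defaults the
# learner to the unmarked value. All settings are invented for exposition.

random.seed(42)

def acquire(p_marked, n_utterances=100, cue_rate=0.15, threshold=14):
    """One learner: count unambiguous cues heard and set the parameter."""
    cues = sum(random.random() < p_marked * cue_rate
               for _ in range(n_utterances))
    return 1 if cues >= threshold else 0      # 0 = unmarked economy default

def next_generation(p_marked, n_learners=500):
    return sum(acquire(p_marked) for _ in range(n_learners)) / n_learners

p = 0.95   # initially, nearly all speakers have the marked value
for gen in range(1, 8):
    p = next_generation(p)
    print(f"generation {gen}: marked value at {p:.2f}")

# With these toy settings the expected cue count sits near the threshold,
# so small statistical fluctuations are amplified across generations and
# the marked value typically erodes; raising cue_rate stabilizes it.
```

The knife-edge behavior is the point of the sketch: whether the marked value survives depends on how reliably the PLD crosses the learner's evidential threshold, which is exactly where small fluctuations can actuate a change.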

42  The work on sound change in (Northeastern) American English conducted by William Labov and his colleagues (see Labov et al. 2013 for a recent overview) is an impressive demonstration that the study of change in progress offers a level of detail unmatched by research based on purely historical data.



Chapter 19

Language Pathology

Ianthi Maria Tsimpli, Maria Kambanaros, and Kleanthes K. Grohmann

Universal Grammar (UG) denotes the species-specific faculty of language, presumed to be invariant across individuals. Over the years, it has shrunk from a full-blown set of principles and parameters to a much smaller set of properties, possibly as small as just containing the linguistic structure-building operation Merge, which in turn derives the uniquely human language property of recursion (Hauser et al. 2002). UG qua human faculty of language is further assumed to constitute the ‘optimal solution to minimal design specifications’ (Chomsky 2001a:1), a perfect system for language. Unfortunately, human physiology does not always run smoothly or in an optimal fashion. There are malfunctions, malformations, and other aberrations throughout. The language system is no exception. This chapter will present language pathology from the perspective of the underlying system: what can non-intact language tell us about UG?

19.1 Introduction

Pathology has long played an important role in the study of language in general, and of the human faculty of language in particular. Modern linguistics has benefited greatly from insights gained through investigating language breakdown, for example, supporting the original theory of movement and theta theory with evidence from agrammatic aphasia (trace deletion: Grodzinsky 1986, 1990, 1995) or employing some postulated mechanics to first language acquisition (truncation/tree pruning: Ouhalla 1993; Rizzi 1994; Friedmann and Grodzinsky 1997; Friedmann 1998, 2002). That is, theoretical implementations in analyses of impaired language utilized and thereby buttressed linguistic theory; and at the same time, the types of impaired language found in different pathologies across languages and the patterns attested—as well as those not attested, often ruled out by the theory—were taken to reflect directly what is possible in human



language and what is not (UG, principles and parameters, and so on). In other words, the linguistic study of language pathologies has long held one constant: the invariant human faculty of language. Working on this assumption, data fed theory and vice versa. However, it is now time for the leverage of investigations of pathological language and cognition to be raised. Future research could perhaps steer the focus of its potential contribution to finding out more about the underlying language faculty. We take the central questions about such a contribution of research on language pathology to research on Universal Grammar to be the following:

(A) Can pathology affect core language abilities (i.e., the use of universal operations and primitives), or does pathology only affect language-specific properties (giving rise to optionality or variability)? (In other words, is there any evidence that language operations such as External or Internal Merge are unavailable or otherwise impaired in any population with language pathology?)

(B) Does language pathology affect linguistic competence (in language-specific options), or does the variability in language use depend on accessing this knowledge due to affected mechanisms mediating language use (such as working memory resources)?

(C) Does language pathology affect language use differently depending on whether we are dealing with an acquired or a developmental language disorder? (Associated with this question: could we distinguish between language use vs. language knowledge issues as a function of acquired vs. developmental language disorders?)

(D) Can we disentangle the contribution of language-external factors, whether in human cognition (such as executive control) or in the environment (such as the input frequency of a particular token, type, or structure), from language-impaired performance in developmental or acquired language disorders?

We will address each of these questions as we go along. The chapter is structured as follows. Section 19.3 surveys research on developmental language disorders, with emphasis on specific language impairment, where we will try to find some answers to questions (A) and (D) in particular. The study of linguistic savantism, introduced in section 19.4, addresses the interplay of language and cognition in the context of language pathology, focusing on question (D). Section 19.5 singles out the most widespread acquired language disorder, aphasia, which allows us to address all of (A)–(D). Finally, section 19.6 sketches paths towards a comparative biolinguistics through phenotypic comparisons, which also provides an outlook for future research, especially on language pathology as a window into variation in UG and the human faculty of language, respectively. The chapter will be briefly concluded in section 19.7. But first we provide some background on linguists’ conceptions of the human faculty of language, complementing many other contributions to this handbook and relating these to impaired language; this section also contains suggestions for further investigations that are beyond the scope of this chapter.




19.2  Impaired Language and the Invariance of the Language Faculty

Theoretical assumptions regarding UG, and whether or how it is affected in developmental or acquired language impairment, have also raised issues about the links or dissociations among speech, language, and communication disorders. This is because human communication can include speech (the sensorimotor system: the spell-out of communication and the input to language perception), language (the elementary tools: words, syntactic rules, sentence meaning), and pragmatics (appropriateness: why a particular communication/social act happens at this time and place). The distinction among the three terms is fundamental in diagnosis, intervention, and prognosis for clinical practice and research, although the tacit assumption is that impairment in speech, language, and communication is rarely selective, affecting more than one of the three abilities (Bishop and Adams 1989; Bishop 1990; and much research since, recently summarized by Benítez-Burraco 2013). For example, specific language impairment is often comorbid with dyslexia (Smith et al. 1996; Catts et al. 2005), speech-sound disorder (Shriberg et al. 1999), or autism spectrum disorders (Norbury 1995; Tager-Flusberg 2006). It is thus usually the case that when speech impairment is diagnosed as the main area of deficit, communication problems ensue. Similarly, when problems in expressive language are identified, impairment will affect communication as well, albeit not as the prime cause. Considering a diagram such as the one depicted in Figure 19.1, it is obvious that interactions between various components of speech, language, and communication are not only possible but perhaps inevitable as well. In view of the general question we seek to address in this chapter, namely the link between the contributions that a UG-based framework can make to language pathology, theory, and practice (and vice versa), we suggest that, with respect to questions (A)–(D) just listed, two main issues arise from the picture in Figure 19.1. The first has to do with the status of UG and its assumed location in this diagram, which the remainder of this section will largely deal with. The second issue concerns the possibility of ‘selective’ impairment in speech, language, and communication in already known types of language disorders, given the intricate, and inevitable, interactions among them in the process of language production and processing (linguistic performance); this is the topic of the following sections.

[Figure 19.1 here. Recoverable labels: Impairment; Lexicon; Language (Grammar, Semantics); Speech (Perception, Production); Communication (Pragmatics: relevance of communicative act to time and place).]

Figure 19.1  Impairment of communication for speech, language, and pragmatics.



To complicate the picture further, linguists have successfully shown that ‘language’ is more of an umbrella term, including various levels of linguistic analysis such as morphology, phonology, semantics, and syntax, which in turn are fed into by a lexicon viewed as an interface component par excellence: abstract feature specifications of a morphosyntactic or semantic nature interface with input from the phonological, semantic, syntactic, and encyclopedic levels, allowing pragmatics, speech, and verbal communication to contribute (or be contributed to). Thus, in language impairment, both developmental and acquired, dissociations are found among levels of linguistic analysis, too, not only among speech, language, and communication as such. For instance, lexical abilities dissociate from morphosyntax in individuals with specific language impairment (see van der Lely and Stollwerck 1997; Tomblin and Zhang 2006) or with aphasia, exhibiting word-finding problems but not morphosyntactic problems and vice versa (cf. Grodzinsky 2000; Mendez et al. 2003). Similarly, in types of dementia, lexical access problems are attested with no or minimal morphosyntactic problems (Hodges et al. 1992). In autistic individuals, problems with phonetics–phonology dissociate from problems with grammar (morphosyntax) and communication deficits (pragmatics). Specifically, large variation is found in individuals with autism spectrum disorders insofar as language impairment is concerned (Kjelgaard and Tager-Flusberg 2001), distinguishing between Autism with Language Impairment and Autism with Normal Language (Bishop 2002; De Fossé et al. 2004; Hodge et al. 2010; Tomblin 2011), while phonetics and phonology are usually spared. On the other hand, impaired pragmatics and social skills are assumed to be universal characteristics of autistic individuals, evincing a communication impairment. (We concentrate on lexical and morphosyntactic issues and do not cover the relevance of semantics for language pathologies from a UG perspective in this chapter, but for a current review of relevant issues and additional references, see, e.g., Hinzen 2008, as well as chapters 2 and 9; likewise, for details on phonology and UG, see chapter 8). This complex picture of dissociations in language pathology in seemingly diverse populations has been used to argue against a UG-approach to language: the complex interactions between different levels of linguistic analysis with communication and speech are claimed to illustrate the non-autonomous nature of language, arguing against a mental organ with that name. What we hope to show, however, is that these complex and highly intricate patterns of linguistic behavior in developmental and acquired language disorders allow us to disentangle the locus of UG-driven problems from other, interacting components (speech and pragmatics on one hand, nonverbal cognition on the other) which lead to impaired communication skills in diverse populations.
When asking question (A), then, whether a given pathology can affect the core of language abilities, one important assumption must be kept in mind: the study of human language as set out from the earliest days of generative grammar (Chomsky 1955/1975, 1956, 1957)—whether it involves a Language Acquisition Device (Chomsky 1965), a Faculty of Language (Chomsky 1975), or Universal Grammar (Chomsky 1981a) to



capture the species’ tacit, underlying ‘knowledge of language’ (Chomsky 1981b, 1986b)—can only be successful if we assume invariance across the species. If we allow for healthy, typically developing individuals to acquire language with the help of different faculties, then there is no constant against which differences and, possibly more importantly, similarities observed in the acquisition and subsequent developmental process could ever be evaluated (see also chapters 5, 6, 10, 12, and 14, as well as many discussions in Norbert Hornstein’s Faculty of Language blog at http://facultyoflanguage.blogspot.com). Looking back on the past six decades of generative research and the core assumptions that have been put forth concerning UG and the language faculty, it becomes clear that such tacit, underlying knowledge of language is indeed a significant consideration. Take the quote that starts off chapter 5, the opening sentence of Chomsky (1973:232): ‘From the point of view that I adopt here, the fundamental empirical problem of linguistics is to explain how a person can acquire knowledge of language’ (republished in Chomsky 1977:81). This conceptualization predates Chomsky’s (1986b) five questions on the knowledge of language, the first three of which already appeared in Chomsky (1981b), which we will return to presently. And it ties in with longer-held assumptions about language acquisition:

Having some knowledge of the characteristics of the acquired grammars and the limitations on the available data, we can formulate quite reasonable and fairly strong empirical hypotheses regarding the internal structure of the language-acquisition device that constructs the postulated grammars from the given data. (Chomsky 1968/1972/2006:113)

Specifically, UG is ‘a hypothesis about the initial state of the mental organ, the innate capacity of the child, and a particular grammar conforming to this theory is a hypothesis about the final state, the grammar eventually attained’ (Lightfoot 1982:27). UG is thus understood as ‘the biological system that accounts for the different individual grammars that humans have,’ though ‘not to be confused with the theory of grammar itself; rather UG is an object of study in the theory of grammar’ (chapter 3, p. 62). More explicitly, UG ‘is taken to be a characterization of the child’s pre-linguistic initial state’ (Chomsky 1981a:7) and—especially in the Principles and Parameters era of generative grammar (Chomsky 1981a, 1986b; cf. Chomsky and Lasnik 1993)—consists of ‘a system of principles with parameters to be fixed, along with a periphery of marked exceptions’ (Chomsky 1986b:150–151). In this context, the distinction between ‘core grammar’ and ‘peripheral grammar’ is introduced. While the latter ‘is made up of quirks and irregularities of language,’ ‘core grammar entails a set of universal principles, which apply in all languages, and a set of parameters which may vary from language to language’ (Gentile 1995:54). As Gentile continues, ‘[t]he theory of UG must observe two conditions’ (Gentile 1995:54):



On the one hand, it must be compatible with the diversity of existing (indeed possible) grammars. At the same time, UG must be sufficiently constrained and restrictive in the options it permits so as to account for the fact that each of these grammars develops in the mind on the basis of quite limited evidence … What we expect to find, then, is a highly structured theory of UG based on a number of fundamental principles that sharply restrict the class of attainable grammars and narrowly constrain their form, but with parameters that have to be fixed by experience. (Chomsky 1986:3–4)

To the extent that the core/periphery distinction is still valid, one way of approaching language pathology might be to separate impaired competence and performance into these categories and then to focus on core grammar as a window into a possibly compromised UG. Having said that, though, it becomes immediately obvious that it is often in the grammatical periphery that we see revealing problems. For example, discourse-driven structures such as topicalization or dislocation go hand in hand with subtle interpretative differences which might not be picked up by individuals who experience problems in the pragmatics component. Preempting our brief presentation of the architecture of grammar in section 19.3, this might indicate impairment of the syntax–pragmatics interface, for example, going well beyond core grammar. Approaches such as the Interpretability Hypothesis (Tsimpli 2001) and others are sensitive to this aspect of impaired language, and more recent research ties such problems to other cognitive deficits (addressing questions (B) and (D)); unfortunately, this has not yet been systematically investigated across the board (therefore not allowing a definite answer to (C)). Another way to tackle questions (A)–(D) would proceed along the aforementioned ‘five questions’ familiar from current perspectives on the field of biolinguistics, the study of the ‘biological foundations of language’ (Lenneberg 1967). Expanding on Jenkins (2000), Boeckx and Grohmann (2007a) connected Chomsky’s (1986b) five questions on ‘knowledge of language’ to Tinbergen’s (1963) four questions on ‘the aims and methods of ethology.’ Boeckx (2010) is a more recent attempt to flesh out this research program at (text)book length, and the five questions have each been dubbed specific ‘problems,’ denoting their possible intellectual origin (from Leivada 2012:35–36):

(1) What is knowledge of language? (Humboldt’s problem; Chomsky 1965)
(2) How is that knowledge acquired? (Plato’s problem; Chomsky 1986b)
(3) How is that knowledge put to use? (Descartes’s problem; Chomsky 1997)
(4) How is that knowledge implemented in the brain? (Broca’s problem; Boeckx 2009a)
(5) How did that knowledge emerge in the species? (Darwin’s problem; Jewett 1914)

There is a direct relationship between the five questions and language pathology. While (1) is usually investigated from the vantage point of healthy, adult speakers, impaired language may give us further clues as to which aspects of language belong to the realm of tacit, underlying knowledge. Likewise, looking at developmental language impairments may tell us more about (2), in addition to studying typical language development. The exploration of (3) would gain tremendously from the examination of language-​impaired



populations, especially for a yet outstanding unified competence–performance model of language, possibly one that also encompasses the aforementioned aspects of speech and communication. (4) is a very obvious concern for language pathology, especially impaired language which results from damage to the brain. And (5), lastly, which has recently been redubbed ‘Wallace’s problem’ (Berwick and Chomsky 2016), is the question that not only opens up so much room for speculation in normal language; it also concerns language pathology which arises from genetic malfunction, for example. However, each of these questions merits a chapter of its own with respect to language pathology; moreover, the relationship to UG as such may be more indirect at this point and not as clear-cut as one would wish for. So, rather than pursuing this route (briefly returned to in section 19.6), we will proceed with more direct consequences of the study of impaired language for UG.

19.3 Developmental Language Impairments

Although generative researchers admit that some variation in language abilities can be found among monolingual healthy adults, particularly in areas of language that belong to the periphery or are primarily affected by education levels (larger vocabularies, more complex structures), the core of language knowledge is supposed to be shared by native speakers of the same language (Chomsky 1955/1975). Continuing along the lines sketched in the previous sections, developing knowledge of language is the outcome of typical first language acquisition, which shows overwhelming similarities across children and across languages, despite variation in the pace of development within predictable time limits (Chomsky 1981a and chapter 12). Based on research on typical first language acquisition, researchers have concentrated on different stages of development, looking for triggers in the input, as well as poverty of the stimulus arguments which downplay the role of the input and appraise the role of the innate endowment (Lightfoot 1991; Poeppel and Wexler 1993; Crain and Thornton 1998; Anderson and Lightfoot 2002; Han et al. 2016; see also chapters 5, 10, and 12). Since Lenneberg (1967), generative research on first language acquisition has explicitly assumed that during the preschool years, the core of grammatical knowledge has been acquired by the typically developing child, and that this is to a certain extent determined by human genetics. The special status of the faculty of language as a partly autonomous computational system in the human brain was thus associated with a developmental pattern that was specific to that system, biologically determined and unique to the species. The time frame in which language development takes place is consistent with this innate predisposition for language (Guasti 2002). Since the early 1980s, the generative approach to language acquisition has inspired and been inspired by studies on developmental language disorders, with particular



interest focusing on a type of language disorder revealing a strong asymmetry: an unexpectedly delayed or deviant profile of language development in the absence of any hearing loss or any cognitive, emotional, social, or neurological damage (Leonard 1989, 2014; Levy and Kavé 1999). Although this type of developmental language disorder has been referred to under different names, specific language impairment (SLI) is the term most widely used in the literature. In early work on SLI, emphasis was placed on the considerable delay in the emergence of language use as well as in lexical and morphosyntactic development, both in comprehension and production. A debate concerning a domain-specific (i.e., language-specific) vs. a domain-general deficit in SLI has been central in the literature, usually creating a boundary between generative and nongenerative approaches to SLI (cf. Clahsen 1989; van der Lely and Stollwerck 1996; Leonard 1998; Bishop et al. 2000; van der Lely et al. 2004; Ullman and Pierpont 2005; Bishop et al. 2006; Archibald and Gathercole 2006; Leonard et al. 2007). In the extensive research on SLI that has been conducted since, the question about the domain-specific vs. domain-general deficit has become considerably more refined in view of the heterogeneity of SLI and the variation that can be found across subgroups in terms of the areas of language that seem to be more or less affected. For instance, van der Lely et al. (2004) argued for a particular type of SLI, referred to as G(rammatical)-SLI, for children who, unlike other SLI groups, show a deficit in grammatical ability and core domains of grammar such as hierarchical structures and computations of dependencies (van der Lely 1998, 2005). Other generative approaches argued for a deficit or a delay in the acquisition of or access to specific grammatical features, agreement processes, or computational complexity of the syntactic derivation (Jakubowicz et al. 1998; Wexler et al. 1998; Tsimpli 2001; Clahsen 2008; Rothweiler et al. 2012). Some of these proposals may in fact be taken to support an affirmative answer to question (A)—for example, that the operation Move is impaired—but more recent considerations suggest an interplay of different factors, moving in the direction of question (B), such as the role of working memory resources on the syntactic computation. Regardless of whether we are dealing with an acquired or a developmental language disorder, disentangling the contribution of language-external factors has thus become one principal issue of experimental concern (cf. (D)). Assuming that there are indeed discrete sub-types of SLI (van der Lely 2005; Friedmann and Novogrodsky 2008), such as lexical SLI, phonological SLI, or syntactic SLI, these sub-types ideally signal an impairment at the relevant interface level. Thus, viewed against a typical minimalist architecture of the sort displayed in Figure 19.2, PhonSLI would indicate some impairment to the interface between the language-internal Phonetic Form (PF) and the language-external sensorimotor system (SM), SemSLI and perhaps also PragSLI to the interface between Logical Form (LF) and the conceptual-intentional system (CI), LexSLI to the interface between the mental storage unit and the computational component, and so on.
Arguably the most significant advantage of an interface approach is the potential to tease apart language-internal (linguistic) from language-external effects (cognitive, motor, etc.), and SLI sub-types are predestined to provide the relevant evidence.



[Figure 19.2 here. Recoverable labels: Lexicon; Spell-Out; PF ==== SM; LF ==== CI, i.e., the Lexicon feeds a derivation that branches at Spell-Out into PF (interfacing with SM) and LF (interfacing with CI).]

Figure 19.2  A minimalist architecture of the grammar.
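The architecture in Figure 19.2 and the sub-type-to-interface mapping suggested above can be restated as a small data structure. The Python sketch below is purely expository: the sub-type and interface labels follow the text, but the encoding itself is an illustrative assumption, not the authors' proposal.

```python
# Expository restatement of the Y-model in Figure 19.2 together with the
# hypothesized SLI sub-type-to-interface mapping from the text.
# Only the labels come from the chapter; the encoding is illustrative.

INTERFACES = {
    "PF-SM": ("Phonetic Form (language-internal)",
              "sensorimotor system (language-external)"),
    "LF-CI": ("Logical Form (language-internal)",
              "conceptual-intentional system (language-external)"),
    "LEX":   ("lexicon (mental storage)", "computational component"),
}

SLI_SUBTYPES = {
    "PhonSLI": "PF-SM",
    "SemSLI":  "LF-CI",
    "PragSLI": "LF-CI",   # also located at the LF-CI interface in the text
    "LexSLI":  "LEX",
}

for subtype, interface in SLI_SUBTYPES.items():
    inner, outer = INTERFACES[interface]
    print(f"{subtype}: impairment at the interface of {inner} and {outer}")
```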

More detailed investigations are in order, though, which might also bear on the distinction between the faculty of language in the broad vs. the narrow sense (cf. Fitch 2009; Balari and Lorenzo 2015; and the more programmatic Barceló-Coblijn et al. 2015), something we will return to in section 19.6. After all, it is generally assumed that PF and LF form part of the language faculty (first-factor properties), while SM and CI refer to corresponding neural structures and operations in the brain (relating to the third factor; see chapter 6). Of course, the devil lies in the detail. Coming from a separationist perspective such as Distributed Morphology (Halle and Marantz 1993; Marantz 1997; Arregi and Nevins 2012), for example, one could hold that there is no discrete component ‘Lexicon’ as such; in that case, the impaired interface could denote the link between Vocabulary and/or Encyclopedia and the computational system. The point is that conceptual and empirical knowledge could work better hand in hand, across disciplines, and inform conceptual theory (confirming the architecture, for example, or requiring revisions)—just as we apply theoretical advances to impaired language data and the other way around. An important aspect of SLI research which illustrates the mutually beneficial relation between linguistic theory and the study of developmental language disorders is the large number of studies investigating lexical, grammatical, semantic, and pragmatic properties of different languages in the performance of children with SLI. The motivation is both theoretical and clinical. Efforts to identify the language-specific, vulnerable domains in atypical language development can lead to the evaluation of alternative approaches to SLI—but also to the evaluation of typological generalizations favored by linguistic theory concerning the nature and status of features, structures, and derivations. From a clinical perspective, identifying the vulnerable domains for each language which induce extensive delays in the language development of children with SLI helps speech and language therapists isolate diagnostic markers and design appropriate interventions. The cross-linguistic study of children with SLI has revealed that the patterns of acquisition that distinguish among typically developing monolingual children also distinguish among children with SLI speaking those languages. For instance, the earlier acquisition of finiteness by French compared to English monolingual children is also attested in French- and English-speaking children with SLI (Rice and Wexler 1996;



Conti-Ramsden et al. 2001; Jakubowicz and Nash 2001; Thordardottir and Namazi 2007; Hoover et al. 2012). Similarly, pronominal object clitics have been used as diagnostic markers for French-speaking children with SLI, as they are particularly prone to omission (for a recent summary, see Varlokosta et al. 2015), while in Cypriot Greek enclitics do not seem to be problematic for children with the same diagnosis (Petinou and Terzi 2002; Theodorou and Grohmann 2015). This distinction parallels the difference in the precocious development of enclitics in Cypriot Greek-speaking typically developing children compared with French monolingual L1 data (Grohmann et al. 2012). The value of cross-linguistic studies on children with SLI also extends to the comparison between children with SLI speaking different varieties of the same language. Thus, for children with SLI speaking Standard Modern Greek, a language with proclisis in finite contexts, the evidence is conflicting, with many studies showing clitic omission in production or processing as well as problems with clitic comprehension, and some showing no omission problems in production (cf. Tsimpli and Stavrakaki 1999; Tsimpli 2001; Tsimpli and Mastropavlou 2008; Stavrakaki and van der Lely 2010; Manika et al. 2011; Chondrogianni et al. 2015). More recently, research on SLI has included bilingual or dual language children with the aim of identifying similarities and differences between the developmental patterns and outcomes of typically developing bilingual learners and those of monolingual and bilingual children with SLI (Paradis 2007). This research is also largely driven by two opposing views: the domain-specific account of SLI, whereby language is the primary domain of the deficit, and the domain-general account, according to which general processing and conceptual deficits are responsible for problems in nonverbal cognition and for the profound language problems in children with SLI (Leonard et al. 1992; Miller et al. 2001; Kohnert and Windsor 2004). On the latter account, learning a second language is expected to cause some type of double delay in children with SLI (e.g., Steenge 2006; Orgassa and Weerman 2008), such that bilingual children with SLI will be outperformed both by typically developing monolingual and bilingual children and by monolingual children with SLI. In recent years, many studies on different combinations of languages in bilingual children with SLI have been carried out and shed light on the debate between the domain-specific and the domain-general nature of SLI, as well as on the question of whether bilingualism affects the phenotypic profile of SLI in either a positive or negative way. Many recent studies on bilingual children with SLI converge in showing that bilingualism either causes no additional deficits in their linguistic performance or improves some aspects of it (Paradis et al. 2003; Paradis 2010; Armon-Lotem 2012; Kambanaros et al. 2013, 2014; Tsimpli et al. 2016). One example, based on research on the lexicon in a multilingual child with SLI, revealed that there was no marked difference in lexical retrieval abilities across her three spoken languages (Kambanaros et al. 2015).
Although research on bilingual children with SLI is still limited, the findings concerning the absence of further deficits in their language ability not only support a domain-specific approach to SLI but also suggest that whatever language deficit is involved in children with SLI, it does not seem to be so severe that it prevents children from developing a



second language. From the generative perspective, this conclusion implies that SLI does not affect the basic properties and operations of the language faculty itself (i.e., a negative answer to question (A)), but instead delays development so extensively that maturational instructions associated with sensitive or critical periods for language development can, ultimately, become inoperative (Rice et al. 2009). Even if our current state of knowledge about SLI does not allow us to offer insights into the precise structure of UG, there is an obvious theoretical import that concerns the interface-driven architecture of grammar and the minimalist program of linguistic research. Furthermore, the study of developmental language disorders in relation to bilingualism and in relation to the comorbidity they present with other processing, cognitive, or socio-emotional deficits enriches our knowledge by directing research to the investigation of the (behavioral and neurological) links between language and other aspects of human cognition.

19.4  Linguistic Savantism

When concentrating on language impairment and, in particular, developmental language disorders from the perspective of a UG-based approach, we ask questions about the levels, operations, features, or derivations that are affected, leading to deviant output. At the same time, we ask questions about the contribution of nonlinguistic, cognitive systems (memory, executive functions, general intelligence) to language use and deviant language outputs (cf. question (D)). The underlying motivation for both sets of questions is the asymmetry observed: an individual with SLI appears to have impaired language in the absence of other, major cognitive deficits in general intelligence, emotional development, or social skills. Thus, language impairment in individuals with SLI reflects an asymmetry between language ability and other (nonlinguistic, cognitive) abilities. The special status of language as a mental organ, autonomous and encapsulated from other aspects of cognitive abilities, can also be examined through asymmetries with the opposite pattern: impaired communication, emotional, and executive control abilities, with language being unaffected or, more strikingly, a talent. This asymmetrical pattern has been extensively studied in the case of a polyglot-savant, Christopher (Smith and Tsimpli 1995; Smith et al. 2010). To appreciate the status of language as a talent in an individual with otherwise impaired cognition, we will briefly present savantism: a profile built on an atypical and strongly asymmetrical pattern of behavior (Tsimpli 2013). In savants, verbal and nonverbal performance is at or below the lower end of the average scale, and communication skills are severely deficient. In contrast, savants excel in some special skill: music, mental arithmetic and calendrical calculations, drawing or sketching (Hermelin 2002). Importantly for the status of language, as noted by Howlin et al. (2009), savants are more frequently found among individuals with autism spectrum conditions than in any other group. Moreover, the majority of savants reported in the literature are autistic.



Autism does not present a single profile; it is a spectrum of communication disorders associated with impairments in three major domains: social interaction, social communication and imagination, and a restricted repertoire of activities and interests (Wing 1997). Deficient social interaction is perhaps the defining feature of individuals with autism spectrum disorders (ASD). As suggested in the previous section, problems in social interaction can dissociate from linguistic and cognitive abilities. Linguistic abilities show considerable variation across individuals with ASD (Kjelgaard and Tager-Flusberg 2001), ranging from nonverbal individuals to those with normal language skills. Of the autistic children with some but not normal language ability, many exhibit language impairments affecting both vocabulary and morphosyntax. There are different cognitive approaches to autism, some of which offer alternative explanations and others complementary ones. One of the dominant views of autism involves the notion of ‘mind-blindness’ (Baron-Cohen 1995), which treats autism as a deficit in theory of mind. Theory of mind can be understood as the ability to interpret other people’s intentions, beliefs, and attitudes through interpreting discourse, context, facial expressions, and body language. The theory of ‘weak central coherence’ (Frith 1989) views autism as a deficit in the integration of relevant pieces of information drawn from perceptual evidence and/or from previous knowledge, preventing the autistic individual from seeing the ‘big picture’ and allowing him to focus on the detail. Thus, the typical autist shows excellent attention to detail in terms of input processing and memory of detail and patterns. The more recent version of weak central coherence suggests that this attention to detail is due to a local processing bias. A third approach to autism suggests that the deficit lies in executive functions, leading to limited or no cognitive flexibility (Ozonoff et al. 1991). Among other things, individuals with ASD show an inability to abandon decisions or misguided interpretations of actions as a result of this executive dysfunction. All three approaches concentrate on the deviance of autistic behavior from neurotypical individuals and attempt to account for it on the basis of some deficit: in theory of mind, in weak central coherence, or in executive functions. Linking these accounts to language impairment, we expect weak central coherence to be closest to an account of communication, and pragmatics in particular, as making sense of the global picture allows individuals to interpret nonliteral and figurative language, among other things. However, weak central coherence would not necessarily account for the comorbidity of autism and language impairment in a mild or a severe form. Executive dysfunctions would also predict problems with communication and coherence both at the level of language and of nonverbal cognition. Variation in the patterns of language impairment in individuals with autism still remains largely unpredictable on this type of account. A deficit in theory of mind would account for social and communication skills, problems with figurative and nonliteral language, and problems with social interaction.
Although it has been suggested that good theory of mind presupposes a certain level of language development, such as lexical abilities (Happe 1995) or subordination (de Villiers 2007), the fact that complex theory of mind skills may be missing even when language abilities



have reached the expected threshold indicates that language is necessary but not sufficient. Overall, all three approaches seek to account for the main cognitive cause of autism and the way in which individuals with ASD diverge from the norm. Nevertheless, the fact that savantism and autism appear to frequently co-occur challenges the earlier theories: what is the link between ‘talent’ and autism? Baron-Cohen et al. (2009) analyze the excellent attention to detail, a characteristic of individuals with ASD regardless of whether they are high- or low-functioning, as strong systemizing. Systemizing is a higher cognitive ability which helps the extraction of patterns, rules, and generalizations from perceptual input. Autistic individuals are good at extracting repeated patterns in stimuli, while strong systemizing is also evident in their actions and overall (verbal and nonverbal) behavior. Baron-Cohen et al. (2009) suggest that savants exhibit a stronger version of the common autistic trait of strong systemizing: sensory hypersensitivity, hyperattention to detail, and hypersystemizing. Sensory hypersensitivity concerns low-level perception, aspects of which (tactile, visual, auditory) seem to be enhanced in savants. Hyperattention to detail is necessary for extracting the detailed ingredients of an association; these enter into modus ponens (p → q) patterns from a complicated and highly integrated input. Hypersystemizing constitutes a basic inferential ability which, if meticulously applied to perceptually accessible systems, leads to excellent knowledge of the system’s operations and categories. A memory system dedicated to the domain of hypersystemizing in the savant enables the buildup, storage, and retrieval of a highly sophisticated and detailed system of knowledge in this cognitive domain. This is reminiscent of Ericsson and Kintsch’s (1995) proposal of a long-term working memory for experts in a specific field, such as professional chess players; long-term memory is an ability that ‘requires a large body of relevant knowledge and patterns for the particular type of information involved’ (Ericsson and Kintsch 1995:215). It would seem appropriate to characterize hypersystemizing as a possible development of a trait in typical individuals with high expertise in a particular domain. Music, arithmetic, and the fine arts (drawing, sculpture) are highly complex systems appropriate for the savant hypersystemizing mind. Most savants presented in the literature excel in one of these systems. The question now turns to language. The complexity of human language is usually based on operations such as recursion and discrete infinity, which characterize other aspects of human cognition, such as music and mathematics (Berwick and Chomsky 2011, 2016). At the same time, language enjoys a privileged position in the human mind as a vehicle of thought and, secondarily, communication. Since autism typically involves a deficit in social interaction and communication but also a deficit in thought processes, language is not expected to be a domain of hypersystemizing in savants (cf. Howlin et al. 2009). Christopher is a unique case of a polyglot-savant (Smith and Tsimpli 1995; Smith et al. 2010; and references there). Christopher is unable to look after himself, and he has thus been institutionalized in his adult life. He is severely apraxic, with visuo-spatial deficits, very poor drawing skills, and poor motor coordination.
His working memory ability is poor but, at the same time,



unusual in terms of its delayed consolidation phase: his memory for details appears to improve, rather than deteriorate, with time. His behavior shows autistic traits in the limited use of facial expression and intonation, in the avoidance of eye contact, and in resistance to social, verbal, and nonverbal interaction. His performance on theory of mind tasks is inconsistent, as he usually fails versions of the Sally–Ann task and passes the Smarties tasks. However, when these tasks are varied in terms of certain variables, such as accessibility of encyclopedic knowledge (‘Smarties tubes contain Smarties’), Christopher’s performance deteriorates (Smith et al. 2010). Turning to language, Christopher’s profile is very different. Christopher speaks English as his native language. He understands, speaks, and judges English sentences of varying lexical and syntactic complexity like a native speaker of English. He can appropriately judge long-distance wh-questions, island violations, sequence-of-tense phenomena, morphosyntactic agreement patterns, negative inversion, and other ‘core’ phenomena of English grammar in a fast and determined way. The only phenomena where Christopher behaves unlike native English speakers are those that rely on the processing of information structure, such as topicalization and dislocation structures. In addition, he fails to recover from garden path violations, indicating that his ability to inhibit a first, incorrect parse is rather deficient (Smith and Tsimpli 1995; Smith et al. 2010; and references therein). Consistent with his autistic behavior is Christopher’s inability to interpret irony and metaphor, metalinguistic negation, and figurative language in general. At the same time, Christopher has learned around twenty languages, most of which he learned without spoken interaction. Through his ‘obsessive’ interest in written language, he has been able to develop large vocabularies in most of these languages, knowledge of derivational and inflectional morphology (with overgeneralizations evident in many of the ‘second’ languages in which he has achieved a good level of proficiency), and, to some extent, syntactic variation. At the same time, he reads for comprehension, as he has clearly gathered considerable encyclopedic information on a variety of topics ranging from football and politics to history and metalanguage. Christopher was taught an invented language, Epun, from mostly written input. Epun had both natural and unnatural rules of grammar, the latter based on linear rather than hierarchical ordering, hence not UG-constrained. Unsurprisingly from a generative perspective, Christopher was unable to discover the unnatural rules but was able to learn Epun’s natural, UG-based rules of grammar (Smith et al. 1991; Smith and Tsimpli 1995). His learning of Epun illustrates a learner profile compatible with the autonomous status of a language faculty, which in Christopher’s case remained intact (cf. question (A)), despite a seriously challenged general cognitive profile. Teaching Christopher British Sign Language (BSL) was a risk; his limited eye contact, insensitivity to facial expression, and the fact that BSL lacks a written form were all reasons to predict failure. Nevertheless, Christopher’s learning of BSL showed a strong asymmetry between comprehension and production—in favor of comprehension.
Moreover, his inability to develop good knowledge of BSL classifiers stood in sharp contrast with his learning of other domains of morphosyntax (agreement, negation, questions), where he performed similarly to the control group of adult, good language learners. The learner profile Christopher revealed in his BSL development was interesting precisely because of the asymmetries it involved: his inability to learn classifiers, but not other aspects of BSL grammar, indicates that classifiers draw heavily on representations mediated by visuo-spatial cognition, which is deficient in Christopher’s case (Smith et al. 2010). As such, his poor performance points to the role of language modality (signed vs. spoken) in language acquisition and to the role of the interfaces in each. For instance, the form of classifiers in BSL seems to involve an interface between the visuo-spatial processor and language, while in spoken languages this interface does not seem relevant, as shown by Christopher’s unproblematic acquisition of spatial prepositions in spoken languages (Smith et al. 2010).

We could thus conclude that Christopher’s unique savantism in language (itself an obvious answer to question (C)) might be attributed to his hyperattention to linguistic detail, hypersensitivity to written language input, and hypersystemizing in the discovery of rules, patterns, and generalizations that make languages similar to each other—and at the same time different from each other. Crucially, Christopher’s case demonstrates multiple dissociations among language, thought, and communication, pointing to a certain degree of autonomy of the language faculty itself.
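The linear/hierarchical contrast that Epun was built around can be made concrete. In the minimal sketch below, the two rules are invented stand-ins of our own devising (they are not Smith et al.’s actual Epun rules): the first negates a sentence by counting words, a type of rule unattested in natural language, while the second targets a grammatical category, the kind of structure-sensitive rule UG permits.

# Invented stand-ins for Epun-like rules (not Smith et al.'s actual materials):
# an 'unnatural' rule defined over linear positions vs. a structure-dependent one.
def negate_linear(words):
    """Unnatural rule: insert NEG after the third word, counting linearly."""
    return words[:3] + ["NEG"] + words[3:]

def negate_structural(clause):
    """Natural-style rule: attach NEG to the verb of the clause,
    wherever that verb happens to sit in the string."""
    return [w + "-NEG" if role == "V" else w for w, role in clause]

print(negate_linear(["the", "old", "man", "sleeps"]))
# ['the', 'old', 'man', 'NEG', 'sleeps']  <- position fixed by counting words
print(negate_structural([("man", "N"), ("apple", "N"), ("eat", "V")]))
# ['man', 'apple', 'eat-NEG']             <- position fixed by grammatical role

On this way of framing things, Christopher’s failure with Epun’s unnatural rules, against his success with the natural ones, is exactly what a learner constrained by UG would be expected to show.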

19.5  Acquired Language Disorders

Every human being uses language to communicate needs, knowledge, and emotions. In everyday life, talking, finding the right words, understanding what is being said, reading and writing, and making gestures are all part of using language effectively to communicate. If, as a result of brain damage, one or more parts of language use stop functioning properly, the result is aphasia. The word aphasia comes from the ancient Greek aphatos, meaning ‘without speech,’ but this does not mean that people who suffer from aphasia are speechless. In theoretical and clinical terms, aphasia is a linguistic impairment due to brain injury (usually a stroke in the left hemisphere), which results in difficulties with both comprehension and production of spoken and written language. Aphasia is classified into different types according to performance on production, comprehension, naming, and repetition tasks. The different aphasia types and the associated lesioned brain areas are reported in Table 19.1.

Overall, the salient symptoms of aphasia are interpreted as disruptions of normal language processing and production, affecting multiple levels of linguistic description (lexicon, morphology, phonology, semantics, and syntax). Yet, individuals with aphasia form a highly heterogeneous group with large individual differences in post-stroke linguistic profiles, severity of aphasia type, and recovery patterns. Cognitive disorders such as working memory limitations and/or executive-function deficits may also influence the impact of—and recovery from—aphasia (Lazar and Antoniello 2008).



Table 19.1  Classification of aphasias based on fluency, language understanding, naming and repetition abilities, and lesion site (BA = Brodmann area)

Aphasia Type   Fluency      Comprehension   Naming     Repetition   Lesion (BA)
Broca’s        non-fluent   preserved       impaired   impaired     BA 44, 45
Wernicke’s     fluent       impaired        impaired   impaired     BA 22, 37, 42
Anomic         fluent       preserved       impaired   preserved    BA 21, 22, 37, 42
Conduction     fluent       preserved       impaired   impaired     BA 22, 37, 42
Global         non-fluent   impaired        impaired   impaired     BA 42, 44, 45
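Table 19.1 can be read as a decision table. The minimal sketch below (our illustration, not a clinical instrument) encodes the five profiles as a lookup from the four behavioral dimensions to the classical syndrome label; actual diagnosis, of course, also weighs severity, lesion information, and individual variability.

# Minimal sketch: Table 19.1 as a lookup table (illustrative only).
# Keys are (fluency, comprehension, naming, repetition) profiles.
APHASIA_PROFILES = {
    ("non-fluent", "preserved", "impaired", "impaired"):  "Broca's",
    ("fluent",     "impaired",  "impaired", "impaired"):  "Wernicke's",
    ("fluent",     "preserved", "impaired", "preserved"): "Anomic",
    ("fluent",     "preserved", "impaired", "impaired"):  "Conduction",
    ("non-fluent", "impaired",  "impaired", "impaired"):  "Global",
}

def classify(fluency, comprehension, naming, repetition):
    """Return the classical aphasia label for a behavioral profile, if any."""
    return APHASIA_PROFILES.get((fluency, comprehension, naming, repetition),
                                "unclassified")

print(classify("fluent", "preserved", "impaired", "preserved"))  # Anomic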

When focusing on acquired language disorders from the perspective of a UG-based approach, a number of questions become pertinent about the mechanisms mediating language use after language breakdown in relation to linguistic knowledge (semantic, syntactic, phonological, pragmatic), such as the possibility of impaired core operations (cf. question (A) above) or the distinction between language use and linguistic knowledge (cf. question (B)). Furthermore, it remains to be determined whether linguistic competence per se is impaired or whether language difficulties are the consequence of a breakdown in nonlinguistic cognitive systems (e.g., working and/or short-term memory, executive functions) affecting language use (also relating to question (D)). Moreover, whether language pathology affects language use differently, depending on how or when in life the language breakdown occurred, deserves further exploration as well (cf. question (C)). We will now address each of these questions.

Let us start with question (B), language use vs. language knowledge in aphasia. For the aims of this chapter, and due to space restrictions, our focus lies on describing the effects of a specific language deficit (for words, syntactic rules, sentence meaning, etc.), such as agrammatic Broca’s aphasia, on the language faculty. Agrammatic aphasia is very well studied and is characterized within and across modalities (comprehension and production) by a mix of impaired and preserved syntactic abilities. Individuals with post-stroke agrammatic aphasia show no major cognitive deficits in general intelligence or any severe impairment in activities of daily living and social skills. In this case, we have impaired language and communication with cognitive and social abilities generally intact.

The theoretical implications for analyses of impaired and spared grammatical abilities in agrammatic aphasia stem from the extensively reported asymmetries between grammatical knowledge and the use of grammatically complex structures on the one hand, and between comprehension and production on the other. We will briefly present recent research on the topic as it contributes to issues debated in current syntactic theory, research which suggests a breakdown of language, not of UG. What these theories have in common is the same culprit: linguistic processing complexity is responsible for the agrammatic profile, in light of generally intact linguistic representations. Nevertheless, the notion of agrammatism per se remains open to cross-linguistic investigation and is far from resolved.

One large body of studies has shown that comprehension of sentences with canonical word order remains intact in agrammatic aphasia (e.g., active sentences, subject wh-questions, subject relative clauses, and subject clefts), while comprehension of sentences with noncanonical word order is impaired (e.g., passives, object wh-questions, object relative clauses, and object cleft sentences). Several accounts have been proposed for the underlying deficit in the comprehension of canonical vs. noncanonical word order, which can be grouped as computational, syntactic, compositional, movement-derived, or related to processing effects (see Varlokosta et al. 2014 and references there). Also, the frequency of certain attested complex constructions—that is, how often the construction appears in a language (e.g., verb movement or transitivity)—does not seem to impact negatively on comprehension of word order (Bastiaanse et al. 2009).

A second body of work on agrammatism has revealed that the deficit in producing inflectional morphemes is selective. For example, tense-marking has been found to be more impaired than subject–verb agreement across morphologically distinct languages (such as Arabic, Hebrew, German, Greek, and Korean, to name but a few; see, e.g., Friedmann et al. 2013; Varlokosta et al. 2014). It has been suggested that this finding may be accommodated within a general hypothesis about impairment of interpretable features (Fyndanis et al. 2012), motivated by the framework of the Minimalist Program (Chomsky 1995b, 2000b), such as the Interpretability Hypothesis (Tsimpli 2001). This could mean that in agrammatic aphasia there is a loss of interpretable features (tense), while uninterpretable features (agreement) are preserved.

There is also increasing work on the production of negation markers across languages to explain the linguistic deficit in agrammatism. Several different hypotheses have been postulated to predict the findings, including hierarchical, syntactic, and computational accounts. The reader is referred to Koukoulioti (2010) for an excellent review of the current predictions of the different hypotheses on the basis of published findings on the use of negation cross-linguistically by agrammatic speakers. Here we would just like to mention that one way of addressing question (A) might be to adopt the ‘weak syntax’ approach of Avrutin (2006), possibly in terms of a ‘slow syntax’ characterization, according to which the operation Merge is ‘critically delayed in the agrammatic system’ (Burkhardt et al. 2008:122–123). However, since ‘slow syntax’ refers to a temporal delay, it is not at all clear that this would constitute an answer to (A) after all; the hypothesized limited processing capacity rather points to (B), affected mechanisms mediating language use, if not even to (D), language-external factors.

What remains fundamental to most accounts of either comprehension or production deficits for grammatically complex structures in agrammatism is that the overall language disorder reflects processing limitations, or syntactic representations underspecified with respect to some features of grammatically complex structures—and not a breakdown in core linguistic knowledge (Kolk 1998; Grillo 2009).
A second part to question (B), thereby also touching on (D), concerns the exact relationship between language use and the memory systems, short-term memory (STM) and working memory (WM), after language breakdown. Many studies have highlighted the strong connection between language and STM impairments in aphasia, using both tasks that require a spoken response (e.g., digit repetition) and tasks that are nonverbal (e.g., pointing span). Similarly, WM impairments have been argued to correlate with language impairments (word-level and sentence-level processing) in aphasia. The reader is referred to Salis et al. (2015) for a recent update of the literature on the close link between aphasia and STM/WM impairments.

Given the focus in this chapter on UG, it is fair to say that very little has been done to evaluate the relationship between WM and linguistic complexity in aphasia. One preliminary study (Wright et al. 2007) reported a syntactic WM deficit (measured by n-back tasks) for patients with different types of aphasia (Broca’s, conduction, and anomic) who showed the most difficulty comprehending complex noncanonical word order vs. canonical word order. The authors claim that language-specific WM deficits are a promising avenue of research with regard to the specific type of processing required (syntactic, semantic, phonological) for sentence comprehension (see also Gvion and Friedmann 2012). Yet, the main controversies in relation to WM and aphasia are (i) whether a deficit in WM necessarily leads to a deficit in language understanding and (ii) the nature of WM for different language processes.

With regard to question (C), we will focus on lexicon-based research coming out of our own work using the noun–verb distinction in production to probe lexical retrieval deficits in SLI and anomic aphasia. Both SLI and anomic aphasia share a number of characteristics in relation to their linguistic profiles (e.g., word-finding problems, fluent speech) but differ significantly in other areas (e.g., aetiology). Aphasia, for example, may range from mild to severe, but it almost universally affects the ability to find words, a condition known as anomia. Patients diagnosed with anomic aphasia often struggle in structured confrontation naming tasks and in conversational speech when retrieving the target word. In their attempt to produce the correct response, they often generate a wide range of errors, including semantically or phonetically related words, descriptions of the word, unrelated or jargon words, or no response at all. Similarly, lexical SLI has been proposed as a separate sub-type of SLI by Friedmann and Novogrodsky (2008) based on their description of Hebrew-speaking children who, after rigorous testing, met the criteria for lexical SLI: they failed a naming test, a word-to-definition task, and two verbal fluency tasks, but showed preserved abilities on grammatical (e.g., relative clause comprehension and production) and phonological testing (e.g., repetition of words and nonwords).

In our research (Kambanaros 2010; Kambanaros et al. 2014; Kambanaros and van Steenbrugge 2013), we have shown that children with SLI and adults with anomic aphasia display a similar pattern of verb–noun performance in relation to context effects (picture naming vs. connected speech). In fact, the verb-specific naming deficit observed in both language-impaired groups during a picture confrontation task involving concrete verbs (and nouns) did not resurface in connected speech (Kambanaros and van Steenbrugge 2013).
Overall, there seems to be no direct, one-to-one relationship between the production of verbs/nouns in connected speech and the retrieval and naming of these word types in isolation. Children with SLI and adults with anomic aphasia have intact comprehension for verbs and nouns but show a differential retrieval performance according to context, with poorer naming in isolation than word retrieval in connected speech. It is conceivable that multiple factors influence the greater retrieval success for verbs in connected speech. These can include the lexical characteristics of the target words (e.g., lexical-semantic heaviness or specificity), the interactions among activations at different linguistic levels (i.e., semantic, phonological), and the cognitive demands of the experimental tasks on verb use (e.g., contextual influences on lexical retrieval). Nevertheless, verb deficits cannot serve as a distinctive characteristic for either language-impaired group but only as a clinical descriptor in the context of naming.

The bigger picture concerns the classic competence–performance distinction. We suggest using insights from such work on the lexical–syntactic interface to transform competence-based research into a performance-sensitive model: it is not an issue of being able to name a particular action with a single, well-known verb—it is a matter of activating and retrieving the concept speedily and successfully within the human language faculty and its interface with the performance systems, for which there is a model. Put simply, in the current minimalist framework (Chomsky 2000b et seq.), there is a well-defined pathway from the lexicon to the conceptual-intentional interface (vocabulary, syntax, semantics) with the add-on component of sound (phonological structure, instructions to the articulatory-perceptual/sensorimotor system), roughly as depicted in Figure 19.2. In the research just cited (see also Kambanaros and Grohmann 2015), we show that language pathology can affect language use in a similar pattern, at least for the lexicon and for the populations under investigation, irrespective of whether the language impairment is developmental (e.g., SLI) or acquired (e.g., aphasia).

In closing this section on aphasia, it is important to note that little is known about the relationship between aphasia treatment and the pattern of language impairment (speech, understanding, reading, writing), individual patient profiles (age, gender, education, languages), and stroke (severity, location, and time since onset). Unfortunately, these gaps in knowledge limit the accuracy of diagnosis, the design of optimum rehabilitation interventions, and the precision of prognosis for people with aphasia following stroke, their families, and the health and social care professionals working with them.

19.6  Language Pathology within Comparative Biolinguistics

In addition to the five foundational questions (Chomsky 1986b) addressed in section 19.2, Chomsky (2005) suggested three factors that are crucially involved in the design of language: genetic (UG), epigenetic (experience), and third-factor considerations (not language-specific). As McGilvray puts it in chapter 4, the final state of a child’s linguistic development is ‘the result of (1) genetic specification (biology), (2) “input” or “experience,” and (3) other nongenetic natural constraints on development, such as constraints on effective computation,’ that is, ‘Chomsky’s “three factors” ’ (see also chapters 5, 6, 7, and 14).

The null hypothesis concerning UG is that it, qua faculty of language, is invariant across the species (see section 19.2). This perspective has recently been challenged in a very interesting proposal by Balari and Lorenzo (2015), who argue for a gradient view of language, ‘an aggregate of cognitive abilities’ (Balari and Lorenzo 2015:8), rather than a language faculty as such (see also Hernández et al. 2013; Kambanaros and Grohmann 2015; Grohmann and Kambanaros 2016):

Developmentally speaking, the resulting gradient view purports that language is not circumscribed to a particular compartment of our mind/brain, but spreads on a complexly interactive system of bodily capacities subject to the impact of a correspondingly complex array of developmental influences, both endogenous and exogenous. Such a picture makes very unlikely the idea of a faculty of language as an epitome of sorts, from the point of view of which impaired, lessened or even enhanced variants must be deemed exceptionally deviant. (Balari and Lorenzo 2015:34)

Such a conceptualization of the biological factors underlying language fits well into the current debates concerning the size of UG; see, for example, the many discussions on Norbert Hornstein’s Faculty of Language blog (http://facultyoflanguage.blogspot.com). These, in turn, were fed by Noam Chomsky himself opening up the possibility that a faculty dedicated to language need not be unique to humans; what is unique is the particular combinatorial system required for computing and interfacing language. This is Hauser et al.’s (2002) distinction between ‘Faculty of Language in the Broad Sense’ (FLB) and ‘Faculty of Language in the Narrow Sense’ (FLN); for useful follow-up, see in particular Fitch et al. (2005); Fitch (2009); Miyagawa et al. (2013); Berwick and Chomsky (2016); and Miyagawa (to appear). What this means is that the question of how language pathologies may inform UG and vice versa receives a new twist as well, one too new to permit a learned overview here. But it certainly gives rise to interesting new questions.

One such question concerns the neurobiological basis of language (see, e.g., Schlesewsky and Bornkessel-Schlesewsky 2013; and other chapters in Boeckx and Grohmann 2013). Addressing this question is obviously relevant for directly brain damage-induced language pathologies such as aphasias, independently of whether we assume a full-fledged faculty of language in the traditional sense (‘big UG’), a highly reduced one (‘small UG’), or the distinction between FLB and FLN (see chapter 1, section 1.1, and chapter 5, section 5.6, especially note 5). However, one very important barrier to investigating the relationship of language breakdown and UG in this case is what Embick and Poeppel (2006) dubbed the ‘granularity mismatch problem’ (see also Poeppel and Embick 2005), ‘a serious problem for those aiming to find brain correlates for the primitives of [the human faculty of language]’ (Hornstein 2009:7n14):

The Granularity Mismatch Problem (Poeppel and Embick 2005:104–105): Linguistic and neuroscientific studies of language operate with objects of different granularity. In particular, linguistic computation involves a number of fine-grained distinctions and explicit computational operations. Neuroscientific approaches to language operate in terms of broader conceptual distinctions.

That is, if we understand language pathology to inform researchers on UG, or the other way around, then trying to find the neural correlate of a possibly impaired Merge or Move operation will not be easy, to put it mildly (notwithstanding the exciting recent findings reported in Ding et al. 2015). As Grohmann (2013:343) points out, this issue is also addressed by Boeckx (2010:159), who actually turns the tables: ‘[U]ntil neurolinguists try to look for units that match what theoretical linguists hypothesize [note with references omitted—KKG], the conundrum we are in will not go away.’

Regardless of the outcome of these developments, UG viewed from the perspective of language pathology may open new windows into the human faculty of language as conceived today, windows that may not have been available in earlier stages of theoretically informed language research. Having highlighted some relevant linguistic profiles across selected language pathologies, we suggest that one route for future research could be to refine these profiles, collect more data, and tie impaired language more closely to current linguistic theorizing (and vice versa). However, there is also a broader, larger message behind the research directions already mentioned. A particular avenue of research that investigates more closely the commonalities behind genetic, developmental, and acquired language pathologies may be couched within what Benítez-Burraco and Boeckx (2014) refer to as ‘comparative biolinguistics.’ Looking at variation in language pathology allows us to consider in more depth ‘inter- and intra-species variation that lies well beneath the surface variation that is the bread and butter of comparative linguistics’ (Boeckx 2013:5–6; see also Samuels 2015 and chapter 21).

This is a larger research enterprise (for a first sketch, see Grohmann and Kambanaros 2016, which parts of this section draw on). The primary aim is to obtain distinctive linguistic profiles regarding lexical and grammatical abilities, for example, concomitant with the goal of developing cognitive profiles across a range of genetically and nongenetically different populations who are monolingual, multilingual, or somewhere in between, as well as populations with or without comorbid linguistic and/or cognitive impairments as part of their genotype. While individual variability is clinically crucial, population-based research can advance cognitive–linguistic theory through behavioral testing that acknowledges the brain bases involved. This will offer a unique opportunity for researchers to collaborate in fields as different as (but not restricted to) genetic biology, neurobiology of the brain, cognitive neuroscience, cognitive and developmental psychology, speech and language therapy or pathology, psycho-, neuro-, and clinical linguistics, and language development—as well as theoretical linguistics. In addition, it may inform us better about the underlying faculty (or faculties) involved, with the role of UG in pathology of particular concern, of course. Some recent work goes in this direction, if only partially, such as emergent perspectives on autism phenotypes (Bourguignon et al. 2012), the biological nature of human language and the underlying genetic programs (Di Sciullo et al. 2010), genetic factors in the organization of language and the role of the two cerebral hemispheres (Hancock and Bever 2013), and the idea that syntactic networks may constitute an endophenotype of developmental language disorders (Barceló-Coblijn et al. 2015).

One concrete research project could involve child populations with developmental language disorders that are language-based (SLI and beyond), behavior-based (such as ASD), or the result of a genetic syndrome (not discussed in this chapter). It would investigate language competence and performance with a range of tools (from lexical to syntactic) as well as nonverbal, cognitive tasks (such as executive control); additional information will be yielded by comparing mono- and bilingual participants. Ideal groups could include the following:

• Specific Language Impairment (SLI): SLI is considered a language disorder in children exhibiting difficulties acquiring grammar, phonological skills, semantic knowledge, and vocabulary, despite having a nonverbal IQ within the normal range.

• Developmental Dyslexia (DD): Children with DD experience problems learning to read, write, and spell, performing below their chronological age, despite having a nonverbal IQ within the normal range.

• Autism Spectrum Disorders (ASD): Children with a high-functioning ASD, such as Asperger’s, have problems with language and communication; they also show repetitive and/or restrictive thoughts and patterns of behavior, despite having a nonverbal IQ within the normal range.

• Down Syndrome (DS): DS is caused by three copies of chromosome 21 instead of the normal two; children present with language and cognitive deficits, though differently from WS.

• Williams Syndrome (WS): Individuals with WS are missing around 28 genes from one copy of chromosome 7; children present with language and cognitive deficits, though differently from DS.

• Fragile X Syndrome (FXS): In FXS, a particular piece of genetic code has been multiplied several times on one copy of the X chromosome; children present with language and cognitive deficits.

19.7  Conclusion

In a sense, the purpose of this chapter was twofold. First, we wanted to provide a rough-guide overview of the present state of the art concerning linguistic accounts of some pertinent language pathologies. We believe that much can be gained from an interface approach that looks at the interaction of different components of linguistic analysis. The second goal, however, is less tangible and directly links the study of language pathology to the theme of this handbook: What can non-intact language tell us about UG?

Looking at explicit accounts in the current literature, one could reach the conclusion that, despite decades of linguistically informed investigations into impaired language, not much progress has been made in addressing the fundamental, underlying issue: the nature of the human faculty of language (UG) and its role in genetically damaged language, impaired language development, or acquired language disorders. It is fair to say that most of the discussion on language pathology suggests a breakdown of language (possibly alongside speech and/or communication), not of UG. In the past, allusions to some deeper breakdown may have been made. However, to our knowledge no study has been conducted to tease apart direct connections between language breakdown and UG; when UG principles were evoked, it was typically as an epiphenomenal attempt to explain independent findings. An additional contributing factor seems to be the confusion that surrounds the notion of ‘UG,’ especially in more recent years—in particular, whether the language faculty should be considered FLB or FLN (Hauser et al. 2002) and whether we assume a ‘big UG’ or a ‘small UG’ (Clark 2012) (but see Fitch 2009 for clarifications on both, and this volume, chapter 1, section 1.1, and chapter 5, section 5.6, note 5), and perhaps whether it is static or gradient (Balari and Lorenzo 2015).

More recent advances in biolinguistics (Boeckx and Grohmann 2013) also provide new ways of thinking about, and investigating, the links between typical and impaired knowledge of language, language acquisition, language use, and its implementation in the brain (and the organism at large). These are exciting new starting points for dedicated research into the relationship between language pathologies and UG.



Chapter 20

The Syntax of Sign Language and Universal Grammar

Carlo Cecchetto

20.1  Introduction

Until the early sixties of the last century, a book on universal grammar could not possibly have included a chapter on sign languages, not only because the very concept of universal grammar was still being developed, but also because up to that time linguists considered sign languages impoverished gestural systems of communication, not real languages. This misconception was due to the (wrong) impression that sign languages lack ‘duality of patterning,’ namely the property that the meaningful units of language, say words or morphemes, are made up of meaningless units, say phonemes (see Hockett 1963 for the claim that duality of patterning is one of the defining features of human languages and can distinguish them from animal communication systems). The reason why sign languages seemed to lack duality of patterning is that signs were taken to be transparent gestures that transmitted meaning by virtue of being holistic and iconic. This misconception started being refuted with the publication of Stokoe (1960), who for the first time identified the equivalent of phonemes in American Sign Language (ASL), minimal units that distinguish minimal pairs of signs (in Stokoe’s early work, these units were the hand configuration, the location of the sign, and the movement1 with which the sign is produced).

Much water has flowed under the bridge since the pioneering work of Stokoe, and today no professional linguist questions the fact that sign languages are full-fledged linguistic systems. For example, it is established that they have the same expressive power as spoken languages and that their phonology, morphology, and syntax are rich and complex (Sandler and Lillo-Martin 2006; Pfau, Steinbach, and Woll 2012). Still, sign languages are (or at least look) very different, since they exploit the visual-spatial modality in contrast to the vocal-auditory modality. Furthermore, until very recently, linguistic research has been severely biased towards spoken languages, and this is also true for the debate about language universals and/or universal grammar. For example, the two distinct ways of looking at the universal traits of human languages, the typological tradition that goes back to Joseph Greenberg (see chapter 15) and the generative (Universal Grammar) tradition that goes back to Noam Chomsky, were based on data and observations coming from spoken languages only. Therefore, it is important to ask whether the conclusions that had been reached based on the study of spoken languages hold across modalities.

In this chapter, I cannot be exhaustive and, in particular, I will not focus on sign language typology, since this research field has just grown out of its infancy. Instead, my main focus will be reporting on works that investigate questions like the following: Do sign languages have the hierarchical and recursive organization that spoken languages have been claimed to have? Is syntactic movement attested in sign languages and, if it is, do general constraints on syntactic movement (i.e., the fact that the landing site must c-command the base position, island constraints, etc.) hold? Do the putative universal interpretative constraints that govern anaphora in spoken languages (Binding Theory, in Chomsky’s 1981a terminology) extend to sign languages? After answering these questions (mostly positively), I will focus on some properties that make sign languages special and, at least prima facie, seem to make them resistant to an analysis based on categories developed for spoken languages.

This chapter is organized as follows: Section 20.2 is devoted to the hierarchical organization of the clause. Section 20.3 focuses on subordination as a case of recursion. Section 20.4 discusses constraints on syntactic movement, and section 20.5 deals with binding (especially Principle C and Vehicle Change). Section 20.6 is devoted to the challenges posed by sign languages to the theory of grammar, and section 20.7 contains a short conclusion.

1 To avoid any terminological confusion, in this chapter I will call ‘movement’ the actual movement of the hand(s) when a sign is produced and ‘syntactic movement’ the operation of displacement of a syntactic category.

20.2  Hierarchical Organization

Although there have been some claims that sign languages do not rely on an abstract hierarchical structure (cf. Bouchard 1996), there is clear evidence (see Sandler and Lillo-Martin 2006 for an overview) that it is possible to single out in the clausal structure of sign languages the three structural layers that have been identified in decades of research based on spoken languages: the most embedded layer (VP or vP), where thematic relations are established; the intermediate (extended) IP layer, where properties like tense, aspect, and agreement are determined; and the external CP layer, specialized for subordinating particles and for discourse-related features like interrogative force, topic, and focus.

Showing that sign languages have a hierarchical structure can be done by adapting to sign languages tests that have been developed for spoken languages. To be sure, this adaptation is not always straightforward, given the modality-specific properties shown by sign languages (and given the fact that fieldwork on under-studied languages is never trivial!). However, as I will show in this section, no special challenge comes from sign languages for the hypothesis that the clause has an abstract underlying hierarchical structure, contrary to what is explicitly claimed by Evans and Levinson (2009:438). Evans and Levinson also claim that ‘many proposed universals of language ignore the existence of sign languages—the languages of the deaf.’ Unfortunately, they do not say which universals of language would be falsified by sign languages. Certainly, hierarchical organization is not, as I will show.

A classical way to establish that the clause has a structural organization involves constituency tests like anaphora and ellipsis. I will apply the ellipsis test by using Cecchetto, Checchetto, Geraci, Santoro, and Zucchi’s (2015) description of clausal ellipsis in LIS as a guide. In LIS (Italian Sign Language), a predicate can go unuttered if a suitable antecedent is present. For example, the predicate can be elided2 if the elliptical clause contains an adverbial sign like same, meaning ‘too’/‘as well’ (cf. (1)). LIS is a head-final language, so the verb break follows the direct object vase in (1).3

__________topic
(1) dining-room gianni vase break. mario same    [LIS]
    ‘Gianni broke a vase in the dining room and Mario did so too.’
    (adapted from Cecchetto et al. 2015)

In (1), the missing constituent in the elliptical clause corresponds to the entire VP in the antecedent clause, namely vase break plus (the trace of) the locative phrase dining-room (in order to make the sentence fully natural, dining-room is topicalized into the left periphery of the antecedent clause in (1), as indicated by the topic’s non-manual-marking, roughly raised eyebrows).

2 I use the terms ‘elided’ and ‘deletion’ simply to indicate that a category is interpreted but is not pronounced. It is not important for our purposes to discuss whether this is due to actual phonological deletion at PF or is due to a mechanism of LF copying that provides a null category with semantic content.

3 Following common practice, I will gloss the signs by using words in small caps. For the reader’s convenience, I will use English words throughout and I will indicate the language of the sentence to the right of each gloss. The glosses do not do justice to the full morphological and phonological complexities of the signs, but I will indicate the non-manual markings that are relevant for the phenomenon that I am discussing by a continuous line over the signs the non-manual marking is co-articulated with. I refer to the original papers for access to videos or photo clips.



Crucially, the missing constituent can correspond to a smaller segment of the VP. For example, sentence (2) is also acceptable:

__________topic
(2) dining-room gianni vase break. mario same kitchen    [LIS]
    ‘Gianni broke a vase in the dining room. Also Mario did, but in the kitchen.’
    (adapted from Cecchetto et al. 2015)

In (2) the missing category is just the segment [vase break], as shown by the fact that it is modified by a locative expression (kitchen) different from the locative phrase in the antecedent clause (dining-room). Simple as it is, the ellipsis diagnostic is important because it shows that the VP in the elliptical clause cannot be flat (as in the schematic representation in (3)) but must have a hierarchical organization (as in (4)). As a matter of fact, assuming a hierarchical structure, one can easily explain the different size of the ellipsis site in (1) and (2) by saying that in (1) the entire VP is elided under parallelism with the antecedent VP (the outer constituent in the schematic representation (4)), while in (2) only the complement plus the verb is elided (the inner constituent in the schematic representation (4)).

(3) [vase break kitchen]
    (flat structure: no constituent contains vase and break to the exclusion of kitchen)

(4) [[vase break] kitchen]
    (hierarchical structure: the inner constituent [vase break] is nested within the larger constituent that also contains kitchen)
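Schematic bracketings like (3) and (4) can be made computationally concrete. The following minimal sketch (our illustration, not part of Cecchetto et al.’s analysis) encodes the hierarchical VP in (4) as a nested list; its subconstituents are exactly the two attested ellipsis sites, whereas the flat structure in (3) contains no constituent made up of just the verb and its object.

# Hypothetical encoding: a constituent is a list whose members are words
# (strings) or smaller constituents (lists).
flat_vp = ["vase", "break", "kitchen"]        # (3): a single flat constituent
hier_vp = [["vase", "break"], "kitchen"]      # (4): [vase break] nested in the VP

def constituents(node):
    """Yield every constituent (every sub-list) of a tree, root included."""
    if isinstance(node, list):
        yield node
        for child in node:
            yield from constituents(child)

# In (4), both the whole VP and [vase break] are candidate ellipsis sites:
print(list(constituents(hier_vp)))  # [[['vase', 'break'], 'kitchen'], ['vase', 'break']]
# In (3), the only constituent is the whole VP, so eliding vase break alone
# would not be the deletion of a constituent:
print(list(constituents(flat_vp)))  # [['vase', 'break', 'kitchen']]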

The ellipsis diagnostic can confirm the hierarchical (as opposed to flat) structure of other clausal layers in LIS. For example, an auxiliary can, but does not need to, be omitted in the elliptical clause (remember that the auxiliary for future, indicated by fut in (5), follows the VP, since LIS is head-final). By the same logic illustrated with the previous examples, one can say that in (5a) the layer formed by the auxiliary plus the VP is elided under parallelism with the antecedent clause, while in (5b) only the VP is elided and the inflectional node hosting fut survives.4 Needless to say, a representation with a flat structure could not easily explain why a smaller portion is elided in (5b) than in (5a).

(5) a. gianni bean eat fut. piero same    [LIS]
    b. gianni bean eat fut. piero fut same    [LIS]
       ‘Gianni will eat beans, and Piero will too.’
       (adapted from Cecchetto et al. 2015)

4 An alternative analysis is possible for this specific example, namely (5b) might be a case of English-type VP ellipsis (‘Gianni will eat beans and Piero will too’) while (5a) might be a case of stripping (‘Gianni will eat beans and Piero too’). See Cecchetto et al. (2015) for discussion.

The ellipsis diagnostic can be applied up to the highest level of the clausal organization. For example, LIS shows sluicing, as in (6). Under canonical approaches (Merchant 2001), in sluicing the entire IP out of which the wh-phrase has been extracted is elided under parallelism with the antecedent IP. Again, this analysis presupposes that the clause is hierarchically organized and that the upper layer (the CP) can be preserved when the lower layer (the IP) undergoes deletion.

___wh
(6) gianni someone meet but who i-know not    [LIS]
    ‘Gianni met someone but I do not know who.’
    (adapted from Cecchetto et al. 2015)

So we see that standard constituency tests can be fruitfully applied to sign languages. Admittedly, this may appear far from surprising to some of the readers of this book. Still, it is important to establish these basic facts, in light of observations like the one I reported at the beginning of this section, according to which sign languages would be a challenge to the hypothesis that languages have an intrinsic hierarchical and recursive organization. I switch to recursion in the next section.

20.3  Recursion (Subordination)

Another defining feature of natural languages is the fact that they display recursive structures. For example, subordinate clauses can be embedded an unbounded number of times (‘John thinks that Mary said that Robert is convinced that Tony believes that Oscar doubts that …’). In fact, a defining feature of the syntax of natural languages is the combination of hierarchical structure and recursion. In principle, these properties are independent of each other, since it is easy to find structures that are produced by a
recursive rule that are not hierarchical, or structures that are hierarchical but are not produced by a recursive rule. An example of a rule that is recursive (in the sense that it can be applied to the result of its previous application) but does not produce a hierarchical structure is a command like: ‘put a copy of the third letter in the next sequence between the first and the second letter of the same sequence.’ This produces sequences like (7), which in principle are infinitely long but nonetheless lack a hierarchical organization.

(7) [d][b][q]
    [d][q][b][q]
    [d][b][q][b][q]
    [d][q][b][q][b][q]
    [d][b][q][b][q][b][q]
    …

An example of a structure which is hierarchical but not recursive is easily detectable in the language domain: as observed by Nespor and Vogel (1986), the rules that construct the phonological hierarchy are not recursive (for example, a syllable has a hierarchical organization, but no syllable is found inside another syllable).

In the previous section, I reviewed evidence that the clausal organization of sign languages is hierarchical. Is it possible to show that a hierarchical sign language structure (say, a clause) can be embedded within another hierarchical sign language structure of the same type? For example, can a CP be embedded under another CP? Not surprisingly, the answer to this question is positive, but reaching this conclusion forced sign language linguists to make sure that there are reliable tests to distinguish simple juxtaposition of clausal units (akin to coordination) from real embedding (subordination).

A case of subordination that has been investigated is complement clauses. As the sign languages studied up to now typically do not have manual subordination markers introducing complement clauses, linguists had to find more indirect markers. Padden (1988), in her seminal work on subordination in ASL, identified some. I will mention here the phenomenon called Subject Pronoun Copy. In ASL (but the fact is also attested in other sign languages), a pronoun that refers to the main clause subject may occur at the end of the sentence. As Padden shows, Subject Pronoun Copy can be used as a test to differentiate between subordination and coordination, since copying is allowed in the former but not in the latter. In cases of subordination like (8), the sentence-final pronoun 1index is coreferential with the subject of the main clause 1index (importantly, the structure is monoclausal, since there is no intonational break):

(8) 1index decide 1index should idrivej see children 1index    [ASL]
    ‘I decided he ought to drive over to see his children, I did.’
    (Padden 1988:86–88)



However, in cases of coordination like (9), the subject pronoun copy can only be coreferential with the subject of the second conjunct, not with the subject of the first conjunct, as shown here:

(9) *1hiti iindex tattle mother 1index    [ASL]
    ‘I hit him and he told his mother, I did.’
    (Padden 1988:86–88)

Tang and Lau (2012) discuss other possible ways to distinguish genuine complement clauses from coordination in cases in which the absence of a subordination particle makes the structure potentially ambiguous. These tests include the distribution of non-manual-marking and constraints on extraction.

Another area where clear cases of subordination have been described is adverbial clauses, since the relevant sign language literature, especially that on relative clauses, is fairly extensive. It should be stressed that, for our purposes in this chapter, embedding of an adverbial clause inside a matrix clause is a genuine case of recursion, as, at the abstract level, this is still a case of a CP embedded inside another CP. With adverbial clauses, manual subordination markers are more easily detectable. These include signs corresponding to ‘if’ and relativization markers. As discussed by Branchini et al. (2007) and Tang and Lau (2012), data from at least four different sign languages—ASL, LIS, HKSL (Hong Kong Sign Language), and DGS (German Sign Language)—show that different sign languages use different relativization strategies that fit well into the typological pattern identified for spoken languages (for example, both externally headed and internally headed relatives are found in these languages). A marker of relativization that is found systematically across these sign languages, and across LSC (Catalan Sign Language) and TİD (Turkish Sign Language), two other languages for which a systematic description of relative clauses is available, is a specific non-manual-marking that occurs over the relative clause and tends to be similar to topic marking. However, all these languages also display manual signs that function as relativization markers, which tend to be similar to (and possibly derived from) determiners or indexical signs used in the language. The obligatoriness of the non-manual-marking varies from language to language, but the picture that emerges overall is that relative clauses are clearly distinguished from coordinate structures.5

The literature on other types of adverbial clauses, like conditionals, reason clauses, and purpose clauses, is more limited in the sign language field but, even on the basis of the short overview that I have offered, it is safe to conclude that sign languages do have clear markers of subordination. When a manual sign is not present, a grammaticalized non-manual-marking (typically, a construction-specific and sign-language-specific facial expression) indicates the type of subordination that is involved.

5 For relative clauses in ASL: Liddell (1978); for relative clauses in LIS: Cecchetto, Geraci, and Zucchi (2006), Branchini and Donati (2009), and Branchini (2014); for relative clauses in DGS: Pfau and Steinbach (2005); for relative clauses in LSC: Mosella (2011); for relative clauses in HKSL: Tang and Lau (2012); for relative clauses in TİD: Kubuş (2010).




20.4  The Displacement Property and Constraints on Syntactic Movement

Another universal trait of natural languages, as opposed to artificial systems of communication, is the so-called displacement property, namely the fact that syntactic categories (words or phrases) do not necessarily show up in the position where they are interpreted but may be displaced to another position in the clause. The displacement property is manifested by syntactic movement and by the related phenomenon of the creation of syntactic dependencies or ‘chains.’ A well-known case is wh-movement, where a wh-item typically moves to a dedicated position in the left periphery in languages like English. An interesting fact about the displacement property is that there is no obvious motivation for why languages should display it. Artificially designed systems do not necessarily include this property, and in fact for the paradigmatic cases of syntactic movement it is easy to find languages that express the same grammatical information without exploiting the syntactic movement option (this is the case of wh-in-situ languages, for example; see chapter 14, section 14.3.4). Still, languages in which no type of syntactic movement is present do not seem to be attested (incidentally, this raises the question of why the displacement property is a language universal and a component of Universal Grammar; see Chomsky 1995b:222 and much subsequent work for discussion). Furthermore, syntactic movement has some general properties that hold cross-linguistically: in principle, the displaced item and its base position can be separated by arbitrarily many intervening clause boundaries, provided that the landing site c-commands the base position. However, there are certain categories that block syntactic movement (called ‘islands,’ starting from the seminal work of Ross 1967). The precise taxonomy of islands may vary from language to language, but there is a core set of structures (including if-clauses and relative clauses) out of which syntactic movement (at least overt syntactic movement) is always banned. Another very general condition on syntactic movement is the fact that minimality effects are observed, namely, no category can intervene between the landing site and the base position if the intervening category is similar in its featural specification to the category that has moved (cf. Rizzi 1990, who shows that intervention is defined in terms of c-command, not in terms of linear intervention).

Does this picture, largely based on spoken languages, extend to sign languages? Research on sign languages has established that syntactic movement is attested, the two best-studied cases being wh-movement and topic movement. The earliest research focused on ASL and established that, although wh-signs can remain in situ in direct questions, they can also be displaced to the sentence periphery, where the landing site c-commands the base position. However, in ASL, when the wh-sign moves, it can appear either in the left or in the right periphery (or in both). The precise conditions under which a wh-sign is allowed to appear in the right or the left periphery are the subject of an intense controversy in the field (Petronio and Lillo-Martin 1997 argue that Spec,CP is linearized on the left in ASL, while Neidle, Kegl, MacLaughlin, Bahan, and Lee 2000 argue that it
is linearized on the right; this disagreement is based on a different assessment of the data and/or dialectal variation among the consultants of the two research groups). The presence of wh-movement (as well as the in-situ option) is expected if the same general principles govern question formation in sign and spoken languages. The presence of the wh-phrase in the right periphery is less expected, since spoken languages in which wh-phrases systematically occur at the right edge of the sentence are extremely rare or even unattested (see Cecchetto 2012 for discussion). However, the case for Spec,CP being linearized on the right can be made more forcefully on the basis of sign languages in which wh-phrases can move only to the right periphery. One example is LIS. In fact, Cecchetto, Geraci, and Zucchi (2009) have proposed that there is macrotypological variation between sign and spoken languages, since in the former Spec,CP can be linearized to the right, while this is not possible in spoken languages. Ultimately they derive this difference from the presence in sign languages of a non-manual-marking which is specific to wh-chains (in ASL, LIS, and many other sign languages, wh-questions are marked by furrowed eyebrows). They assume (following Ackema and Neeleman 2002, among others) that specifiers are normally linearized on the left because this facilitates processing, since only if Spec positions are linearized to the left does a filler precede its gap. However, they claim that the presence of non-manual wh-marking helps the parser to single out wh-chains even when the filler follows its gap. This makes possible the linearization of Spec,CP to the right, which is normally excluded in the absence of a specific non-manual wh-marking. Be that as it may, although the issue of Spec,CP linearization in sign languages is still open, and it is even possible that this is an area of macrotypological variation between sign and spoken languages, the presence of wh-movement to a dedicated position in the clausal periphery is not under discussion.

The next question is whether wh-movement is unbounded in principle. For example, is wh-extraction allowed from an embedded clause, provided that the latter is not an island? Answering this question is not straightforward, because many researchers have noted that long-distance wh-movement is rarely observed in natural conversation in sign languages. However, both Petronio and Lillo-Martin (1997) and Neidle, Kegl, MacLaughlin, Bahan, and Lee (2000) report examples of long wh-extraction in ASL that are accepted by most (or all) of their informants. An example is (10), in which the subject of the embedded clause moves to the right periphery of the main clause, as indicated by the bracketing (that who moves to the periphery of the matrix clause, as opposed to the periphery of the embedded clause, is indicated by the fact that (10) is a matrix, not an embedded, question).

_________________wh
(10) [teacher expect [ ti pass test ] whoi ]    [ASL]
     ‘Who does the teacher expect to pass the test?’
     (adapted from Neidle et al. 2000)

The next question is whether wh-movement in sign languages obeys island constraints. Unfortunately, this issue is understudied, so much caution is needed here. In an early work, Lillo-Martin (1992) claims that extraction from embedded clauses in
ASL is subject to a stricter version of boundedness than extraction in English. Geraci, Aristodemo, and Santoro (2014) report a complex pattern in which, although adjunct-island effects are observed, they can be partially obviated both by strategies available in spoken languages (ameliorating effects are found with D-linked wh-phrases) and by sign-language-specific strategies (spatial agreement).

Another attested case of syntactic movement in sign languages is topic movement, which (unlike wh-movement) uniformly targets the left periphery, at least in the sign languages in which topicalization has been studied up to now. Topic movement can take place long-distance, as shown by the ASL example in (11), discussed by Padden (1988).

(11) exercise classi, 1index hope sister succeed persuade mother take-up ti    [ASL]
     ‘The exercise class, I hope my sister manages to persuade my mother to take (it).’
     (adapted from Padden 1988)

Topic movement has been shown to be sensitive to an island, since it obeys Ross’s (1967) Coordinate Structure Constraint, which states that no element can be moved out of a conjunct unless an identical constituent is extracted from the other conjunct in the coordinate structure. (12) is a case of Coordinate Structure Constraint violation: the direct object of the verb ‘give’ in the first conjunct is topicalized, while the object of the verb ‘give’ in the second conjunct is not (in technical jargon, extraction does not take place across-​the-​board in (12)). No sign for conjunction is present in (12) because in ASL, as in other sign languages, coordination does not need to be marked by a manual sign (a pause or a non-​manual-​marking like a headshake is enough to signal coordination): (12)

*flower, 2give1 money, jgivei    [ASL]
*‘Flowers, you gave me money but she gave me.’
(adapted from Padden 1988:93)

(13) shows that, when it takes place across-​the-​board, topic extraction is possible, as expected under the Coordinate Structure Constraint (in (13), a coordination particle is present):6 (13)

that moviei, steve like ti but julie dislike ti    [ASL]
‘That movie, Steve likes but Julie dislikes.’
(adapted from Lillo-Martin 1991:60)

6  See Tang and Lau (2012) for further discussion of the Coordinate Structure Constraint in sign languages. They show that HKSL patterns with ASL as far as topic movement is concerned, but that wh-​movement is never allowed out of a conjunct structure, not even when wh-​movement takes place across-​the-​board. See also Lillo-​Martin (1991), who claims that topic extraction is possible from an embedded clause only if the verb is ‘agreeing.’ If the verb is ‘plain,’ an overt pronoun is required to fill the gap (for the distinction between ‘plain’ and ‘agreeing verbs’ in sign languages, see Padden 1988; and Mathur and Rathmann 2012).



I summarize this section by pointing out that the studies available up to now allow us to conclude that sign languages display the displacement property, which has been identified as a landmark of natural languages and as a component of Universal Grammar. However, more research is needed, since questions that have been at the center of the debate in generative grammar, like the issue of islands and of intervention effects, have not yet been systematically studied in sign languages.

20.5  Pronouns and Binding

Another area in which it has been proposed that there are constraints that are universal, and that are plausible candidates for being part of Universal Grammar, is anaphoric relations between a pronoun and the noun phrase it depends on. A well-known attempt to capture these interpretative constraints is the systematization proposed by Chomsky (1981a) in terms of Principle A, Principle B, and Principle C of the Binding Theory (see Reuland 2011 for a recent discussion). It is worth asking whether the interpretative constraints that have been proposed based on spoken language research hold for sign languages as well. This is even more important given the rather radical claim that most sign languages would lack pronouns (Evans and Levinson 2009:431, 435).

Presumably, this claim refers to important differences between the paradigm of pronouns in languages like English (where they are inflected for first, second, and third person, singular and plural) and pronominal expressions in sign languages. In the sign languages studied up to now, personal pronouns take the form of pointing signs. First person pronouns are usually pointing signs directed towards the signer’s chest (but not always; for example, in Nihon Shuwa, Japanese Sign Language, they are directed towards the signer’s nose). Second person pronouns are directed towards the interlocutor of the signer. As for pointing signs that refer to some other referent, if this referent is physically present, the signer points to its location. If it is not present, the signer may establish a point in space for that referent that can be arbitrary. Importantly, once a location in space for a referent has been established, that same location will be used to refer to that referent in the following discourse. Therefore, location in space becomes a crucial device to track (co)reference. This introduces interesting differences with spoken languages, since the latter lack these spatial devices.

While pointing signs to the signer and to his/her interlocutor can be more easily assimilated to first/second person pronouns, assimilating pointing signs to another referent to third person pronouns is more difficult. The reason is that in principle there are infinitely many loci in the signing space with which the signer can associate the referents (s)he is talking about. Since the system seems gradient rather than categorical (with potentially infinite loci instead of a discrete number of morphological forms), it is controversial whether pointing signs are analogous to the three-person system familiar from spoken languages. This has raised a lively and interesting debate in the sign language literature (Lillo-Martin and Klima 1990; Meier
1990; among others). Incidentally, this debate cannot be captured by the simplifying claim that sign languages 'lack pronouns.' However, instead of summarizing this important debate any further, I will ask whether the general principles that govern anaphora hold across modalities, notwithstanding the important differences introduced by the fact that in sign languages anaphoric relations can be mediated by the use of space. I will focus here on Principle C of the Binding Theory, since to the best of my knowledge no clear counterexample to it has been identified up to now.7 As is well known, Principle C roughly states that coreference is not allowed between a pronoun and an NP when the pronoun c-commands the NP. Again I will take Cecchetto et al.'s (2015) analysis of LIS as a guide. An example of a Principle C violation is (14): in the first clause the proper names gianni and maria are articulated in two distinct positions in the signing space (indicated by the subscripts 'a' and 'b'). If the indexical sign ix in the second clause is articulated in the same position as maria (to indicate that the pronoun and the proper name it c-commands have the same semantic value), the sentence is ungrammatical. Importantly, the sentence would be acceptable if the pointing sign ix were articulated in a position in the signing space different from the position where maria is articulated, which is indicated by the subscript 'b' in (14). We can conclude that the second clause in (14) cannot mean that Maria loves herself, because this reading would trigger a Principle C violation:

(14)  giannia mariab love. *ixb mariab love same                [LIS]
      'Gianni loves Maria. *She loves Maria, too.'
      (Modified from Cecchetto et al. 2015)
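The logic of Principle C can also be stated procedurally, which makes it easy to track the LIS and English cases in parallel. The following sketch is only our illustration under stated assumptions: trees are encoded as (label, children) pairs, and the index slot stands in for a referential index in English or a locus in signing space in LIS; neither the encoding nor the helper names come from Cecchetto et al. (2015).

    # A toy Principle C checker. A tree is (label, children); a terminal's
    # label is a (category, form, index) triple. The `index` slot plays the
    # role of a referential index (English) or a locus in signing space (LIS).
    # This encoding is our own assumption, purely for illustration.

    def leaf(category, form, index=None):
        return ((category, form, index), [])

    def terminals(node):
        """All terminal nodes dominated by `node`."""
        label, children = node
        if not children:
            return [node]
        return [t for child in children for t in terminals(child)]

    def principle_c_violations(node):
        """A pronoun c-commands its sisters and everything they dominate;
        flag any coindexed R-expression inside that c-command domain."""
        label, children = node
        found = []
        for i, child in enumerate(children):
            child_label, grandchildren = child
            if not grandchildren:                      # child is a terminal
                category, form, index = child_label
                if category == 'pron' and index is not None:
                    for sister in children[:i] + children[i + 1:]:
                        for term in terminals(sister):
                            cat2, form2, idx2 = term[0]
                            if cat2 == 'r-expr' and idx2 == index:
                                found.append((form, form2))
            found += principle_c_violations(child)
        return found

    # The offending reading of the second clause of (14), abstracting away
    # from LIS word order: ix-b c-commands the coindexed maria-b.
    clause = ('S', [leaf('pron', 'ix', 'b'),
                    ('VP', [leaf('verb', 'love'),
                            leaf('r-expr', 'maria', 'b')])])
    print(principle_c_violations(clause))  # [('ix', 'maria')] -> ruled out

If ix is instead assigned a different locus (say, 'c'), the checker reports nothing, matching the acceptability of the non-coreferential reading.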

What about Principle A and Principle B of the Binding Theory? Do they apply to sign languages? I cannot discuss reflexives for reasons of space, but I can report the facts about so-called Vehicle Change (Fiengo and May 1994), a case in which Principle B and Principle C interact. This case may seem to be an interpretive quirk, but it is interesting for us not only because it indirectly shows that Principle B effects are attested, but also because it reveals that the same interpretive principles govern anaphora in sign and spoken languages down to a level of detail that might a priori seem surprising. I first describe Vehicle Change using English examples, for readers who are not familiar with the phenomenon. The English sentence (15) is ungrammatical under the intended reading (indicated by coindexing), and this is expected, since the elided VP contains a referential expression ('John') which is c-commanded by a coindexed pronoun. This triggers a Principle C violation, as indicated in (15′), the representation in which the elided VP appears in angle brackets:

(15)  *Maryy admires Johnx and hex does too.
(15′) *Maryy admires Johnx and hex does ⟨admire Johnx⟩ too.

7  In their reply to Luigi Rizzi's commentary (in which Rizzi mentions Principle C as a clear case of a language universal), Evans and Levinson cite Guugu Yimidhirr (spoken in Northern Queensland, Australia) as a counterexample. However, in the original paper in which binding in Guugu Yimidhirr is discussed, just one example is reported as a counterexample; the author says of this sentence that it 'is apparently fine, if contrastive in emphasis' and comments that 'although Condition C cannot be held to hold at a grammatical level, it does in fact quite well describe a usage preference' (Levinson 1987:392). The single Guugu Yimidhirr example is therefore not a compelling counterexample, as far as I can see.

The surprising fact that goes under the name of 'Vehicle Change' is the grammaticality of (16):

(16)  Maryy admires Johnx, and hex thinks that Sallyz does too.

(16) should also be a Principle C violation, if the representation before deletion were (16′):

(16′) Maryy admires Johnx, and hex thinks that Sallyz does ⟨admire Johnx⟩ too.

In order to make sense of the grammaticality of (16), Fiengo and May (1994) suggest that the referential expression 'John' undergoes Vehicle Change: a pronoun replaces the referential expression in the elided VP but preserves its indexical information. So the structure of (16) before deletion would be (16″), rather than (16′):

(16″) Maryy admires Johnx, and hex thinks that Sallyz does ⟨admire himx⟩ too.

Crucially, the Vehicle Change analysis can still explain why (15) is ungrammatical in the intended reading. True, if Vehicle Change applied, (15) would stop being a Principle C violation, but it would become a Principle B violation, as shown in (15″):

(15″) *Maryy admires Johnx and hex does ⟨admire himx⟩ too.

In LIS, exactly the same pattern can be reproduced in the elliptical construction mentioned in section 20.2. A sentence like (17) cannot mean that Maria loves herself, and this is expected, since the relevant reading would trigger a Principle C violation. In fact, the structure of (17) before deletion of the VP is the sentence (14), which, as we know, is a Principle C violation:

(17)  *giannia mariab love. ixb same8

The example showing Vehicle Change is (18). As the translation indicates, (18) can mean that Gianni loves Maria and Maria thinks that Piero loves her (= Maria):

(18)  giannia mariab love. ixb think pieroc same
      'Gianni loved Maria and she thinks that Piero did too.'

8  A caveat: (17) is grammatical under the reading 'Gianni loves Maria and Maria loves Gianni, too.' This is expected, since under this reading no binding principle is violated.



The availability of this reading shows that Vehicle Change takes place, as made clear in (18′), the representation before deletion under Vehicle Change. Indeed, without Vehicle Change the intended reading of (18) would trigger a Principle C violation, as shown in (18″), the representation before deletion without Vehicle Change (as in the English examples, angle brackets mark the elided structure):

(18′) giannia mariab love. ixb think pieroc ⟨ixb love⟩ same
(18″) giannia mariab love. ixb think pieroc ⟨mariab love⟩ same

The reader may have noticed that the loci in the signing space with which the referential expressions and the pronoun are associated (indicated by the subscripts 'a,' 'b,' and 'c') are devices of reference tracking similar to the referential indexes used in the English examples (indicated by the subscripts 'x,' 'y,' and 'z'). In fact, it has been proposed (see Lillo-Martin and Klima 1990; among others) that loci in sign languages are the overt manifestation of referential indexes, which in spoken languages remain covert. Schlenker (2011), building on this insight, claims that the overt nature of indices in sign language makes it possible to bring overt evidence to bear on classic debates in formal semantics. So, issues that may remain elusive if investigated in spoken languages can be better understood once sign languages are included in the picture. One issue that, according to Schlenker, may be better understood through sign languages is so-called donkey sentences, exemplified in (19), in which a pronoun is bound in the absence of c-command. He claims that ASL and LSF (French Sign Language) data indicate that the dynamic approach initiated by Kamp (1981) and Heim (1982) is better suited to handling donkey sentences than the alternative E-type analysis (cf. Evans 1980):

(19)  If a farmer owns a donkey, he beats it.
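To give a sense of what 'dynamic' means here, the toy sketch below (our own cartoon of the core idea, not Kamp's or Heim's actual formalism) treats indefinites as updates that add discourse referents to a context, and pronouns as searches over that context; no c-command relation between antecedent and pronoun is involved.

    # A toy dynamic-semantics treatment of (19): indefinites introduce
    # discourse referents; pronouns are resolved against the context.
    # Lexicon and helper names are illustrative assumptions only.
    context = []

    def indefinite(noun):
        referent = {'noun': noun}
        context.append(referent)      # 'a farmer' updates the context
        return referent

    def pronoun(matches):
        # resolve to the most recently introduced accessible referent
        for referent in reversed(context):
            if matches(referent):
                return referent
        raise LookupError('no accessible referent')

    farmer = indefinite('farmer')     # 'a farmer ...'
    donkey = indefinite('donkey')     # '... owns a donkey'
    he = pronoun(lambda r: r['noun'] == 'farmer')
    it = pronoun(lambda r: r['noun'] == 'donkey')
    print(he is farmer, it is donkey)  # True True: bound without c-command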

Based on what I have said in this section, we can conclude that the clear differences between spoken and sign languages in the domain of anaphoric relations, which are due to the meaningful use of space in sign languages, do not prevent the use of the same general analytical categories across modalities. Furthermore, applying these categories to sign languages can even lead to a better understanding of the grammar of spoken languages.

20.6  Challenges

The discussion reported up to now should not give the impression that the analytical categories initially elaborated for spoken languages can be applied to sign languages automatically. On the contrary, such an application is possible only if these categories are refined and/or reshaped to capture the peculiarities of language



in the visual-gestural modality. In fact, the challenges are even greater than this. There are indeed some aspects of sign language grammar that seem to resist analysis in terms of traditional linguistic categories. This is very interesting, but it does not necessarily speak against universalist approaches. It might well be an indication that the huge bias towards spoken languages has limited the explanatory power of the linguist's toolkit, and that the new focus on sign languages may lead to significant refinements, which will ultimately allow a better understanding of the mechanisms of universal grammar across modalities.

As an example, I will briefly discuss classifier predicate constructions. Although classifiers are present in spoken languages as well, sign language classifier predicate constructions have very interesting peculiarities that, at least prima facie, set them apart from the constructions familiar from spoken languages. Classifiers in sign languages are manual configurations that identify a class of objects, typically by visually representing some property that these objects share, i.e., their size, shape, or the way they are handled. When a classifier (a manual configuration) is associated with a movement that iconically reproduces the actual movement of the object the classifier refers to, a classifier predicate is formed. For example, in ASL the handshape-3 (namely, the configuration the hand assumes to represent the number 3) refers to the class of vehicles and can be used to talk about the movement of a car, a bicycle, a boat, or any other vehicle. If the signer's hand with the handshape-3 stops halfway while moving from left to right and restarts after a short pause, this means that the car (or bicycle, boat, etc.) stopped halfway before reaching its final destination.

There is experimental evidence (Emmorey and Herzig 2003) that in classifier predicate constructions the variations of the movement of the hand in the signing space reflect differences in the movement of the real-world object in a gradient rather than a categorical way. This evidence is a challenge for linguistic analysis, because discreteness is thought to be a defining characteristic of natural languages. In principle, one can deny that classifier predicates are truly linguistic and endorse the view that they belong to the gestural system of communication (cf. Cogill-Koez 2000). However, this view, at least in its simplest form, is not without problems either, since there is fairly solid evidence that the handshape part of the classifier predicate construction is linguistic. For one thing, in the same paper Emmorey and Herzig show that variations in the handshape display categorical properties for signers: a given classifier size corresponds to a whole range of sizes of the real-world object. Second, Benedicto and Brentari (2004) have proposed that the choice of handshape in classifier predicate constructions determines the syntactic nature of the predicate (for example, a body-part classifier introduces an ergative predicate while a whole-object classifier introduces an unaccusative predicate). Therefore, if the movement component in classifier predicates is gestural, we are bound to conclude that the very same sign in a sentence, the classifier predicate, conflates two modes of communication, one linguistic and the other nonlinguistic.
The literature on classifier predicates is fairly large and I cannot do justice to it here. Various interesting proposals have been advanced. I will mention an idea proposed



by Zucchi (2011), and extended by Davidson (2015) to the case of role shift,9 mainly because it illustrates rather well the kind of virtuous feedback that can be established between sign- and spoken-language research when even the most recalcitrant aspects of sign languages are taken seriously. Zucchi's idea is that a classifier predicate incorporates a demonstrative component, much like the English sentence 'A car moved in a way similar to this,' and must be accompanied by a demonstration, namely a gesture that illustrates the kind of movement the car performed. If so, classifier predicates are demonstrative predicates, and the movement part in them is not a linguistic morpheme but the demonstration required by the predicate. If this hypothesis is on the right track, a theory describing the interactions between the linguistic and the gestural systems is called for, but this is an area that has been largely neglected by formal linguists. Ultimately, the development of such a theory would be beneficial not only for the development of sign language grammars but also for an improved understanding of those spoken language constructions in which the gestural component is required (rather than being an optional extra). It might well be that some cases of obligatory co-occurrence of gestures with language have been too quickly dismissed as extralinguistic, and that this has weakened the theory of grammar.

Another area in which sign languages differ from spoken languages is that they appear to be more iconic. The seminal work of Stokoe (1960) for sign language phonology and of Klima and Bellugi (1979) for sign language psycholinguistics opened a research tradition that has established that the role of iconicity at the lexical and sub-lexical level should not be overestimated. Even when their iconic origin is transparent, iconicity plays a minor role in the actual processing of lexical signs, because signs are decomposed into small phonological units, with the effect that the iconic component (which is based on the holistic representation) is lost. Evidence for this sign decomposition, in which iconicity dissolves, includes the following phenomena: 'slips of the hand,' where the sublexical features of adjacent signs influence one another (Klima and Bellugi 1979); the fact that in short-term memory tasks for signs most errors involve phonological substitutions (Bellugi, Klima, and Siple 1975); the fact that the paraphasic errors made by aphasic signers are determined by the phonological structure of ASL signs (Corina 2000); the 'tip of the fingers' phenomenon, in which signers are sure they know a sign and can guess some of its phonological features but cannot retrieve it (Thompson, Emmorey, and Gollan 2005); the occurrence of phonological priming effects in a primed lexical decision paradigm (Mayberry and Witcher 2005); and the fact that iconicity plays no comparable role in most of children's phonological errors (Pichler 2012).

9  Role shift is a strategy common across sign languages in which the signer takes the perspective of the quoted person by slightly shifting his/her body toward the position in the signing space with which the quoted person is associated (role shift usually also involves a change in the position of the head and breaking eye contact with the addressee). Role shift is another case that challenges the mechanical application to sign languages of categories developed for spoken languages, because, despite initial appearances, it cannot be assimilated to the direct discourse report familiar from spoken languages (for example, role shift can occur even in the absence of a verb of saying or a propositional attitude verb, as pointed out by Zucchi 2004). See Quer (2005) for discussion.



This notwithstanding, the effects of iconicity might be more substantial at the clausal level. We have already mentioned the example of classifier predicates, but iconicity is also visible in simpler cases. For example, in many sign languages the sign for EAT is produced by moving the dominant hand towards the mouth. The iconicity effect lies in the fact that the more rapid the movement of the hand, the quicker the eating process (and, possibly, the bigger the quantity of food that is ingested). These iconicity effects are pervasive in sign languages (Cuxac 1999) but are very rare in spoken languages.10

Schlenker, Lamberton, and Santoro (2013) have started developing an approach they call 'formal semantics with iconicity,' with the explicit aim of explaining iconicity effects with the tools offered by formal approaches. The core idea is that logical variables can be simplified pictures of what they denote. This is accomplished by assuming that some geometric properties of signs must be preserved by the interpretation function. Schlenker, Lamberton, and Santoro raise the question of whether the modifications to the apparatus of formal semantics needed to account for iconicity are useless (or even inappropriate) for spoken languages. They sketch two possible answers. The first is that spoken language semantics might be viewed as a semantics for sign language from which most iconic elements have been removed; from this perspective, sign language semantics is richer than spoken language semantics. An alternative possibility is the one that already emerged at the end of the discussion of classifier predicates: it might be that when co-speech gestures are reintegrated into the study of spoken language, sign and spoken languages end up displaying roughly the same expressive possibilities, including iconicity effects. Schlenker, Lamberton, and Santoro offer some arguments that favor the second possibility, but clearly more research is needed to clarify this important issue.

20.7  Conclusion

In this chapter, I first argued that research on sign languages has made clear that the clause has a hierarchical (as opposed to a flat) structure. This was illustrated by using the ellipsis test as a probe. Furthermore, I claimed that it is possible to show that in sign languages a hierarchical unit can be embedded within another hierarchical unit of the same type. Therefore, sign languages display hierarchical structures produced by recursive rules. Since hierarchy+recursion is a core component of Universal Grammar as this concept has been developed in the generative tradition, sign languages pose no special challenge to this tradition in this respect. Then, I asked whether displacement (syntactic movement) is visible in sign languages, and I answered in the affirmative, although I pointed out that the constraints on syntactic movement may be partially different (tighter) in sign languages.

10  Not unattested, though. For example, by uttering the sentence 'the movie was loooooong,' one conveys the idea that one has suffered through the film's excessive duration.



The next question concerned the binding principles, and I claimed that, although the pronominal system of sign languages differs in clear ways from that of spoken languages, there are general principles governing anaphora (notably Principle C) that hold across modalities. Finally, I turned to the challenges that sign languages pose to grammatical theory, and I discussed classifier predicate constructions and iconicity effects. My tentative conclusion is that these challenges might call for significant revision of the linguist's toolkit. However, it is likely that such changes will also affect the grammatical theory of spoken languages, rather than weighing against the hypothesis that an important core of grammatical properties is shared by languages across modalities.



Chapter 21

Looking for UG in Animals: A Case Study in Phonology

Bridget D. Samuels, Marc D. Hauser, and Cedric Boeckx

21.1  Introduction

Do animals have Universal Grammar?1 The short answer must be 'no.' Otherwise, why (as Noam Chomsky has repeatedly asked) do human children learn language with strikingly little conscious effort, while no other animal has even come close to approximating human language, even with extensive training (e.g., apes, dolphins) or exposure (e.g., dogs) to massive linguistic input? But we must qualify this answer, particularly in light of the fact that many of the cognitive capacities that clearly serve our linguistic ability—rich conceptual systems, vocal imitation, categorical perception, and so on—are shared with other species, including some of our closest living relatives. This suggests that the question is more complicated than it might first appear. In the present work, we use phonology as a case study to show what type of cross-species evidence may bear—now and in future work—on the issue of whether animals have (various components of) UG, which we construe here broadly as the systems that are recruited by language but need not be specific to it. Phonology is a particularly interesting area to study with an eye towards determining how much can be attributed to mechanisms that are present in other cognitive areas and in other species.

1  We thank the editor for the opportunity to contribute to this volume. In the six years since we initially drafted this manuscript, our perspectives have changed and in some cases diverged. This work represents a snapshot of our thinking on these matters. A substantial body of work has also appeared in the recent literature, only some of which we are able to include here. We therefore encourage readers to consult our other works, which may interpret the currently available body of evidence in different ways.



Here there is a tension in the literature between Pinker and Jackendoff (2005:212), who claim that 'major characteristics of phonology are specific to language (or to language and music), [and] uniquely human,' and Chomsky (2004a, 2008), who argues that phonology is an afterthought. In Chomsky's view, an externalization system was applied to an already fully functional internal language (i.e., syntax and semantics); phonological systems are 'doing the best they can to satisfy the problem they face: to map to the [Sensory-Motor system—BS, MH, CB] interface syntactic objects generated by computations that are "well-designed" to satisfy [Conceptual-Intentional system—BS, MH, CB] conditions' but unsuited to communicative purposes (Chomsky 2008:136). Chomsky's position accords with the evolutionary scenario developed by Hauser et al. (2002) and Fitch et al. (2005), who hypothesize that language may have emerged rather suddenly as a result of minimal genetic changes with far-reaching consequences. If this is correct, phonology might make extensive use of abilities that had already found applications in other cognitive domains at the time externalized language emerged. One way of testing this hypothesis is to see how many of the mechanisms that potentially underlie phonology can be found in other species, including mechanisms and abilities that may or may not show up in the communication systems of the species being investigated. To the extent that we can show that other species can do what phonological computations require—that is, the more we can minimize the need for evolving language- or phonology-specific abilities—the evolutionary account developed by Hauser et al. gains credibility from an evolutionary/biological standpoint (Hornstein and Boeckx 2009; see already Lindblom, MacNeilage, and Studdert-Kennedy 1984:187). We suggest, in line with this view, that the operations and representations that underlie phonology may have been exapted, or recruited from other cognitive or perceptuo-motor domains, for the purpose of externalizing language.2 If we are on the right track, comparing humans and other species may provide an interesting perspective on components of UG.

21.2  Animal Phonology: What Do We Look For?

Hauser et al. (2002:1573) list a number of approaches to investigating the properties of the Sensory-Motor system (shown in (1)), all of which are taken to be shared with other species and/or with cognitive domains other than language:

(1) a. Vocal imitation and invention: Tutoring studies of songbirds, analyses of vocal dialects in whales, spontaneous imitation of artificially created sounds in dolphins;

2  On language as an exaptation, see among others Piattelli-​Palmarini (1989); Uriagereka (1998); Hauser et al. (2002a); Boeckx and Piattelli-​Palmarini (2005); Fitch et al. (2005).



    b. Neurophysiology of action–perception systems: Studies assessing whether mirror neurons, which provide a core substrate for the action–perception system, may subserve gestural and (possibly) vocal imitation;
    c. Discriminating the sound patterns of language: Operant conditioning studies of categorical perception and the prototype magnet effect in mammals and birds;
    d. Constraints imposed by vocal tract anatomy: Studies of vocal tract length and formant dispersion in birds and primates;
    e. Biomechanics of sound production: Studies of primate vocal production, including the role of mandibular oscillations;
    f. Modalities of language production and perception: Cross-modal perception and sign language in humans versus unimodal communication in animals.

While all of these undoubtedly deserve attention, they address two areas—how auditory categories are learned and how speech is produced—that are peripheral to the core of phonological computation and tell us little about how phonological objects are represented or manipulated.3 Yip (2006a,b) outlines a complementary set of research aims, suggesting that we should investigate whether other species are capable of the following:4

(2) a. Grouping by natural classes;
    b. Grouping sounds into syllables, feet, words, phrases;
    c. Calculating statistical distributions from transitional probabilities (a toy sketch of this computation follows list (3) below);
    d. Learning arbitrary patterns of distribution;
    e. Learning/producing rule-governed alternations;
    f. Computing identity (total, partial, adjacent, non-adjacent).

This list can be divided roughly into three parts (with some overlap): (2a,b) are concerned with how representations are organized, (2c,d) with how we arrive at generalizations about the representations, and (2e,f) with the operations that are used to manipulate the representations. There are at least three more areas to investigate in nonlinguistic domains and in other species:

(3) g. Exhibiting preferences for contrast/rhythmicity;
    h. Performing numerical calculations (parallel individuation and ratio comparison);
    i. Using computational operations relevant to phonology.
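Capacity (2c) is the most directly computational item on these lists, so a toy version may be useful. The sketch below is our own illustration rather than a description of any particular experiment: it estimates forward transitional probabilities over a syllable stream, the statistic whose dips are taken to signal word boundaries in statistical-learning studies.

    from collections import Counter

    def transitional_probabilities(stream):
        """Estimate forward transitional probabilities P(next | current)
        over a sequence of syllables."""
        pair_counts = Counter(zip(stream, stream[1:]))
        first_counts = Counter(stream[:-1])
        return {(x, y): n / first_counts[x]
                for (x, y), n in pair_counts.items()}

    # A toy familiarization stream built from two 'words'; the syllables
    # are arbitrary illustrations, not stimuli from any actual study.
    stream = 'tu pi ro go la bu tu pi ro tu pi ro go la bu go la bu'.split()
    tps = transitional_probabilities(stream)
    print(tps[('tu', 'pi')])  # 1.0: word-internal transition
    print(tps[('ro', 'go')])  # ~0.67: transition across a likely word boundary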

3  See Samuels (2011, 2012) for discussion of (c) and (f), which are relevant to questions of phonological acquisition and the building of phonological categories.
4  Yip mentions two additional items that also appear on Hauser et al.'s list: categorical perception/perceptual magnet effects and accurate production of sounds (mimicry).



In the sections to follow, we present evidence that a wide range of animal species can solve experimental tasks bearing on features (a–i), though it may be the case that there is no single species except ours in which all these abilities cluster in exactly this configuration. In other words, it may be that what underlies human phonology is a unique combination of abilities, while the individual abilities themselves may be found in many other species—a point made long ago by Charles Hockett, though with attention focused on different features. In section 21.3, we focus on the mechanisms underlying capacities (a), (b), and (h)—that is, how phonological material is grouped. Next, in section 21.4, we turn to abilities (c–g), that is, the capacity to identify and produce patterns. In section 21.5, we discuss capacities (e) and (i), focusing on symbolic computation. Finally, we briefly discuss other linguistic modules and conclude in section 21.6.

21.3  Grouping

Since the hypothesis put forward by Hauser et al. (2002) takes recursion to be one of the central unique properties of human language, much attention has been paid to groupings, particularly recursive ones, in language. Human beings are masters at grouping and at making inductive generalizations. In setting up the comparative approach to this problem, Cheney and Seyfarth (2007:118) make the point eloquently: 'the tendency to chunk is so pervasive that human subjects will work to discover an underlying rule even when the experimenter has—perversely—made sure there is none.' This point holds true across the board, not just for linguistic patterns. While phonology is widely considered to be free of recursion,5 grouping (of features, of segments, and of larger strings) is nonetheless an integral part of phonology, and there is evidence that infants perform grouping or 'chunking' in nonlinguistic domains as well (see for example Feigenson and Halberda 2004). Additionally, segmenting the speech stream into words, morphemes, and syllables depends on the converse of grouping, namely edge detection. We will discuss edge detection and pattern extraction in section 21.4. Since the 1970s, categorical perception experiments on a wide range of species have provided evidence that animals are sensitive to the phonetic building blocks—features and segments—which are grouped into larger constituents in human phonology. Many studies, beginning with Kuhl and Miller's (1975) pioneering work on chinchillas, show that mammals (who largely share our auditory system) are sensitive to many of the

5  Some authors have argued for recursion in the higher levels of the prosodic hierarchy (e.g., at the Prosodic Word level or above). See Truckenbrodt (1995) for a representative proposal concerning recursion at the Phonological Phrase level. Even if this is correct (though see Samuels 2011), the recursive groupings in question are mapped from syntactic structure and are therefore not created by the phonological system alone. Furthermore, this type of recursive structure is also quite different from the type found in syntax (e.g., sentential embedding), which is limited in its depth only by performance factors.



acoustic parameters that define phonemic categories in human language (for further discussion, see Samuels 2011, 2012). Moreover, it has been shown that some songbirds categorize the notes of their song in a context-dependent manner, just as humans do when they categorize phones in a linguistic context (Lachlan and Nowicki 2015). We can then begin to ask whether animals might be able to group segments into larger units such as syllables, words, and phrases, all of which are utilized by phonological processes in humans. One way of approaching the question of whether animals can group sensory stimuli in ways that are relevant to phonology is to see whether their own vocalizations contain internal structure. The organization of birdsong is particularly clear, though it is not obvious exactly whether or how analogies to human language should be drawn. Yip (2006a) discusses how zebra finch songs are structured, building on work by Doupe and Kuhl (1999) and others. The songs of many passerine songbirds consist of sequences of one to three notes (or 'songemes,' as Coen 2006 calls them) arranged into a 'syllable.' The syllables, which can be up to one second in length, are organized into motifs, which Yip considers equivalent to prosodic words but which others equate with phrases, and there are multiple motifs within a single song. The structure can be represented as follows, where M stands for motif, σ stands for syllable, and n stands for note (modified from Yip 2006a):

(4)  [Song [M1 [σ1 n1 n2] [σ2 n3 n4] [σ3 n5 n6]]
           [M2 [σ1 n1 n2] [σ2 n3 n4] [σ3 n5 n6]]
           [M3 ...]]
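Because the structure in (4) is so shallow, it can be written down directly as nested lists. The sketch below (our own illustration, with note names following (4)) mechanically verifies two properties taken up in the discussion that follows: the depth of the structure is fixed, and the motif is a repeated unit.

    # The zebra finch song in (4) as nested lists:
    # song -> motifs -> syllables -> notes.
    motif = [['n1', 'n2'], ['n3', 'n4'], ['n5', 'n6']]   # σ1, σ2, σ3
    song = [motif, motif, motif]                         # M1, M2, M3 ...

    def depth(x):
        """Nesting depth: 0 for a note, 1 for a syllable, and so on."""
        if not isinstance(x, list):
            return 0
        return 1 + max(depth(element) for element in x)

    print(depth(song))                      # 3: song > motif > syllable > note
    print(all(m == song[0] for m in song))  # True: the motif simply repeats

Nothing in this representation allows a motif to contain another motif, which is one way of seeing that the structure is flat rather than recursive.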

There are a few important differences between this birdsong structure and those found in human phonology, some of which are not apparent from the diagram. First, as Yip points out, there is no evidence for binary branching in this structure, which suggests that the combinatorial mechanism used by birds, whatever it is, cannot be equated with binary Merge. It could, however, be more along the lines of adjunction or concatenation, processes that create a flat structure (see for example Samuels and Boeckx 2009). Second, a syllable in birdsong is defined as a series of notes/songemes bordered by silence (Williams and Staples 1992; Coen 2006). This is very unlike syllables, or indeed any other phonological categories, in human language. Third, the examples from



numerous species in Slater (2000) show that the motif is typically a domain of repetition (as we have represented it in (4)); the shape of a song is ((ax)(by)(cz))w, with a string of syllables a, b, c repeated in order. This is quite reminiscent of reduplication. Payne (2000) shows that virtually the same can be said of humpback whale songs, which take the shape (a…n)w, where the number of repeated components, n, can be up to around ten. We should underscore the fact that the (a…n)w patterns just discussed are combinatorial rules that define what counts as a well-formed song: that is, they describe a type of syntax. Here the distinction made by Anderson (2004), and suggested in earlier work by Marler (1977), is useful: a number of species have a 'phonological' syntax to their vocalizations, but only humans have a 'semantic' or 'lexical' syntax which is compositional and recursive in terms of its meaning. Again, this reiterates Hauser et al.'s hypothesis that what is special about human language is the mapping from syntax to the interfaces (following Chomsky 2004a, 2008, particularly the semantic interface), not the externalization system.

Concerning the development of phonological syntax, Fehér et al. (2009) conducted a study on zebra finches, which are close-ended learners and can only learn their songs during a critical period of development. They set out to test whether zebra finch isolates—birds deprived of proper tutoring from adults during development—could evolve wild-type song over the course of generations in an isolated community. The experiment succeeded: after only three or four generations, the community of isolates spontaneously evolved a song which approached the wild-type song of their species. Out of the chaos of stunted song, a well-behaved system emerged, stemming only from the birds themselves and the input they received from other individuals in the isolate colony. Such rapid self-organization of structure in vocalizations is truly remarkable and suggests to us that a similar process could be operative in creating human phonological systems. This is consistent with observations of emerging phonological structure in new sign languages, such as Nicaraguan Sign Language (Senghas et al. 2005) and Al-Sayyid Bedouin Sign Language (Aronoff et al. 2008; Sandler et al. 2011).

It should also be noted that both birdsong and whale song structures are 'flat' or 'linearly hierarchical' (on these terms, see Neeleman and van de Koot 2006; Cheney and Seyfarth 2007)—their depth of embedding is limited to a one-dimensional string which has been delimited into groups, as in (5). Samuels (2011) has argued that the same is true of human phonology. It is interesting to note in conjunction with this observation that baboon social knowledge also appears to be of this type, as Cheney and Seyfarth (2007) have suggested. Baboons within a single social group (of up to about eighty individuals) obey a strict, transitive dominance hierarchy. But this hierarchy is divided up into matrilines: individuals from a single matriline occupy adjacent spots in the hierarchy, with mothers, daughters, and sisters from the matriline next to one another. So an abstract representation of their linear dominance hierarchy would look something like this, with each x representing an individual and parentheses delimiting matrilines:

(5)  (xxx)(xx)(xxxx)(xxx)(xxxxxxx)(xxx)(x)(xxxx)
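One way to see what 'linearly hierarchical' means: a string like (5) involves exactly one level of grouping over a flat sequence, so a regular expression (i.e., a finite-state device) suffices to parse it, and no self-embedding is involved. The snippet below is our own illustration.

    import re

    hierarchy = '(xxx)(xx)(xxxx)(xxx)(xxxxxxx)(xxx)(x)(xxxx)'

    # One level of bracketing over a linear string: a regular pattern
    # recovers the matrilines, since nothing is embedded in anything else.
    matrilines = re.findall(r'\((x+)\)', hierarchy)
    print([len(m) for m in matrilines])      # [3, 2, 4, 3, 7, 3, 1, 4]
    print(sum(len(m) for m in matrilines))   # 27 individuals in strict rank order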



Not only does human language group segments into syllables and larger chunks; there are also melodies (prosodic or intonational contours) that play out over domains larger than the word, as well as languages in which each syllable bears a particular tonal pattern. Processing such melodies requires sensitivity to both relative and absolute pitch. There is evidence that rhesus monkeys, like us, treat a melody which is transposed by one or two octaves as more similar to the original than one which is transposed by a different interval (Wright et al. 2000). Rhesus monkeys can also distinguish rising pitch contours from falling ones, which is likewise required to perceive pitch accent, lexical tone, and intonational patterns in human speech (Brosch et al. 2004). However, most nonhuman animals are generally more sensitive to absolute pitch than to relative pitch; the opposite is true for humans (see Patel 2008). We discuss additional evidence that tamarins can discriminate cross-linguistic prosodic differences in section 21.4.

There is evidence to suggest that, as in phonology (but strikingly unlike narrow syntax), the amount of hierarchy capable of being represented by animals is quite limited. In the wild, apes and monkeys rarely perform actions that are hierarchically structured with sub-goals and sub-routines, and this is true even when attempts are made to train them to do so. Byrne (2007) offers one notable exception, namely the food-processing techniques of gorillas. Byrne provides a flow chart detailing a routine, complete with several decision points and optional steps, that mountain gorillas use to harvest and eat nettle leaves. This routine comprises a minimum of five steps, and Byrne reports that the routines used to process other foods are of similar complexity. Byrne further notes that 'all genera of great apes acquire feeding skills that are flexible and have syntax-like organisation, with hierarchical structure…. Perhaps, then, the precursors of linguistic syntax should be sought in primate manual abilities rather than in their vocal skills' (Byrne 2007:12; emphasis his). We agree that manual routines provide a potentially interesting source of comparanda for the syntax of human language, broadly construed (i.e., including the syntax of phonology), but they must be treated cautiously, so as not to confuse patterns that are merely consistent with a syntactic structure with evidence that this structure is actually represented and implemented by the animal. Fujita (2007) has suggested along these lines the possibility that Merge evolved from an action grammar of the type that would underlie apes' foraging routines. This said, even if such computations are to be found in the manual routines of animals during foraging, a further puzzle arises: why haven't these computations been deployed in other motor domains, including communicative facial and body gestures, as well as vocal gestures? One possible answer is that, unlike in humans, the capacities that evolve in animals are locked into particular (adaptive) contexts, lacking any form of generality to new domains or contexts. If this is correct, it would say something about the nature of the interfaces, and how they have evolved, perhaps uniquely within the genus Homo.
At this point, we simply know too little about either the representations or computations underlying motor routines in animals, not to mention how these routines might be instantiated or generalized to other domains. There are, however, experiments suggesting that nonhuman primates appear highly limited in the complexity of their routines, shedding light on key comparative



differences with humans. For example, Johnson-Pynn et al. (1999) used bonobos, capuchin monkeys, and chimpanzees in a study similar to one done on human children by Greenfield et al. (1972) (see also the discussion of these two studies by Conway and Christiansen 2001). These experiments investigated how the subjects manipulated a set of three nesting cups (call them A, B, and C, in increasing order of size). The subjects' actions were categorized as belonging to the 'pairing,' 'pot,' or 'subassembly' strategies, which exhibit varying degrees of embedding:6

(7) a. Pairing strategy: Place cup B into cup C. Ignore cup A.
    b. Pot strategy: First, place cup B into cup C. Then place cup A into cup B.
    c. Subassembly strategy: First, place cup A into cup B. Then place cup B into cup C.

The pairing strategy is the simplest, requiring only a single step. This was the predominant strategy for human children up to twelve months of age, and for all the other primates—but the capuchins required watching the human model play with the cups before they produced even this kind of combination. The pot strategy requires two steps, but it is simpler than the subassembly strategy in that the latter, but not the former, requires treating the combination of cups A + B as a unit in the second step.7 Human children use the pot strategy as early as eleven months (the youngest age tested) and begin to incorporate the subassembly strategy at about twenty months. In stark contrast, the nonhuman primates continued to prefer the pairing strategy, and when they stacked all three cups, they still relied on the pot strategy even though the experimenter demonstrated only the subassembly strategy for them. Though we should be careful not to discount the possibility that different experimental methodologies or the laboratory context, rather than genuine cognitive limitations, are responsible for the nonhumans' performance, the results are consistent with the hypothesis that humans can represent deeper hierarchies than other primates—a difference that is not merely quantitative, but qualitative in terms of computations and representations. This is, of course, what we predict if only humans are endowed with the recursive engine that allows for infinite syntactic embedding, as well as other operations (Hauser et al. 2002). Many other types of experimental studies have also been used to investigate how animals group objects. It is well known that a wide variety of animals, including rhesus monkeys, have the ability to perform precise comparisons of small numbers (